The simplest way to specify Kite and Hadoop dependencies is to use the Kite App Parent POM. This ensures that you inherit a compatible set of dependencies and that the Kite plugins are suitably configured. Add the following to your POM:

<parent>
  <groupId>org.kitesdk</groupId>
  <artifactId>kite-app-parent-cdh4</artifactId>
  <version>1.1.0</version>
</parent>

Alternatively, if you choose not to use the Kite App Parent POM, add the Cloudera repository to your Maven POM:

<repository>
  <id>cdh.repo</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  <name>Cloudera Repositories</name>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>

Then add a dependency for each module you want to use, taking the coordinates from the Dependency Information pages listed below. Those pages also show the transitive dependencies for each module.
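
As a rough sketch, a module dependency might look like the following. The artifact ID kite-data-core is an assumption chosen for illustration, and the version simply mirrors the 1.1.0 used elsewhere on this page; confirm both against the module's Dependency Information page before using them.

<dependency>
  <groupId>org.kitesdk</groupId>
  <!-- Assumed artifact ID, for illustration only -->
  <artifactId>kite-data-core</artifactId>
  <!-- Assumed to match the parent POM version shown above -->
  <version>1.1.0</version>
</dependency>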

Hadoop Component Dependencies

As a general rule, Kite modules mark Hadoop component dependencies as having provided scope, since in many cases the dependencies are provided by the container that the code is running in.

For example,

  • Kite Data has a provided dependency on the core Hadoop libraries
  • Kite Crunch has a provided dependency on Crunch and the core Hadoop libraries
  • Kite HCatalog has a provided dependency on Hive

The following containers provide the dependencies listed:

  • The hadoop jar command provides the core Hadoop dependencies.
  • The MapReduce task environment provides the core Hadoop dependencies.
  • When used from the Kite App Parent POM, the Kite Maven Plugin provides the Hadoop, HBase and Hive dependencies. If you are not using the Kite App Parent POM, specify these dependencies in the plugin’s dependencies section of the POM.
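
For the last case, a plugin-level dependencies section could be sketched as below. This is an illustration rather than a verified configuration: the plugin artifact ID kite-maven-plugin and its version are assumptions, and the CDH 5 grouping dependency (described later in this section) is only one possible choice.

<plugin>
  <groupId>org.kitesdk</groupId>
  <!-- Assumed plugin coordinates; check them against your Kite release -->
  <artifactId>kite-maven-plugin</artifactId>
  <version>1.1.0</version>
  <dependencies>
    <!-- Supplies the Hadoop libraries the plugin needs when the parent POM is not used -->
    <dependency>
      <groupId>org.kitesdk</groupId>
      <artifactId>kite-hadoop-cdh5-dependencies</artifactId>
      <version>1.1.0</version>
      <type>pom</type>
    </dependency>
  </dependencies>
</plugin>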

However, there are some cases where you may have to provide the relevant Hadoop component dependencies yourself, and Kite has grouping dependencies for this purpose.

There is a grouping dependency for each flavor of Hadoop distribution. They differ only by Maven artifact ID:

  • Apache Hadoop 2 (the default), kite-hadoop2-dependencies
  • Apache Hadoop 1, kite-hadoop1-dependencies
  • CDH 4, kite-hadoop-cdh4-dependencies
  • CDH 5, kite-hadoop-cdh5-dependencies

For example, to depend on the CDH 5 grouping dependency:

<dependency>
  <groupId>org.kitesdk</groupId>
  <artifactId>kite-hadoop-cdh5-dependencies</artifactId>
  <version>1.1.0</version>
  <type>pom</type>
  <scope>compile</scope>
</dependency>

There is an analogous set of grouping dependencies for HBase:

  • Apache Hadoop 2 (the default), kite-hbase2-dependencies
  • Apache Hadoop 1, kite-hbase1-dependencies
  • CDH 4, kite-hbase-cdh4-dependencies
  • CDH 5, kite-hbase-cdh5-dependencies
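
By analogy with the Hadoop example above, a dependency on the CDH 5 HBase grouping dependency would presumably look like this. The pom type and compile scope are assumed to carry over from the Hadoop case, and the version is assumed to match the one used elsewhere on this page.

<dependency>
  <groupId>org.kitesdk</groupId>
  <artifactId>kite-hbase-cdh5-dependencies</artifactId>
  <!-- Version, type and scope assumed by analogy with the Hadoop grouping dependency -->
  <version>1.1.0</version>
  <type>pom</type>
  <scope>compile</scope>
</dependency>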

Here are some scenarios in which you need to provide the Hadoop component dependencies yourself:

  • Crunch jobs, even those running in the containers listed above. However, if you use
    the Kite App Parent POM, Crunch is provided (example)
  • Standalone Java programs that are not run using kite:run-tool or hadoop jar (example)
  • Web apps (example)

Kite Data Modules

Kite Morphlines Modules

Kite Tools Modules