Class JobClasspathHelper

public class JobClasspathHelper
extends Object

This class is a helper that copies the jars needed by a job into the distributed cache.

This tool helps set up the job classpath at runtime. It allows jobs to share libraries, which results in faster job setup, since most of the time the libraries have already been uploaded to HDFS. Before submitting a job, you use this tool to declare the classes that the job uses.

The tool finds the jar(s) containing those classes, or creates them, uploads them to a "library" path in HDFS, and creates an md5 file alongside each uploaded jar.

To find a jar, or to create the job's jar, it uses a modified version of org.apache.hadoop.util.JarFinder, which is found in Hadoop 0.23.
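The lookup step can be illustrated with a minimal sketch using only the JDK. This is not the code of JarFinder or of this helper; the class name JarLocator and its methods are hypothetical, and the sketch skips details a real implementation would need, such as URL-decoding the path. It asks the class loader where a class file was loaded from and, if the answer is a "jar:" URL, extracts the path of the jar.

```java
import java.net.URL;

// Hypothetical illustration; not part of JobClasspathHelper or Hadoop.
final class JarLocator {

    /** Resource path of a class file, e.g. "java/lang/String.class". */
    static String classResourcePath(Class<?> clazz) {
        return clazz.getName().replace('.', '/') + ".class";
    }

    /**
     * Extracts the jar path from a URL such as
     * "jar:file:/lib/guava.jar!/com/google/common/hash/HashFunction.class".
     * Returns null when the URL does not use the "jar:" scheme, for example
     * when the class was loaded from a plain directory.
     */
    static String jarPathFromUrl(String url) {
        if (!url.startsWith("jar:")) {
            return null;
        }
        String path = url.substring("jar:".length());
        int bang = path.indexOf('!');
        if (bang >= 0) {
            path = path.substring(0, bang);
        }
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        return path;
    }

    /** Returns the jar containing the class, or null if it is not in a jar. */
    static String findContainingJar(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            // Classes from the bootstrap loader report a null class loader.
            loader = ClassLoader.getSystemClassLoader();
        }
        URL url = loader.getResource(classResourcePath(clazz));
        return url == null ? null : jarPathFromUrl(url.toString());
    }

    public static void main(String[] args) {
        System.out.println(findContainingJar(JarLocator.class));
    }
}
```

A null result corresponds to the case described above in which no jar exists yet and one has to be created from the class files.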

If another job needs the same jar and provides the same "library" path, the tool discovers the existing jar and uses it, saving the time the upload would otherwise take.

If the jar does not exist in the "library" path, the tool uploads it. If the jar is already in the "library" path, the tool computes the md5 of the jar and compares it with the one found in HDFS; if they differ, the jar is uploaded again.
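The upload decision described above can be sketched as follows. This is an illustration under stated assumptions, not the helper's actual code: the class Md5UploadCheck and its methods are hypothetical, the jar content is passed as a byte array, and the md5 stored in the "library" path is passed as a hex string (null when the jar has not been uploaded yet).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not part of JobClasspathHelper.
final class Md5UploadCheck {

    /** Returns the md5 digest of the data as a lowercase hex string. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Decides whether a jar must be uploaded.
     *
     * @param localJar  content of the jar found or created locally
     * @param storedMd5 md5 recorded next to the jar in the library path,
     *                  or null if the jar is not there yet (assumption)
     */
    static boolean needsUpload(byte[] localJar, String storedMd5) {
        if (storedMd5 == null) {
            return true; // jar missing from the library path
        }
        return !md5Hex(localJar).equalsIgnoreCase(storedMd5.trim());
    }

    public static void main(String[] args) {
        byte[] jar = "example jar content".getBytes(StandardCharsets.UTF_8);
        System.out.println(needsUpload(jar, null));        // not uploaded yet
        System.out.println(needsUpload(jar, md5Hex(jar))); // unchanged jar
    }
}
```

Comparing digests rather than file names is what lets a changed jar with the same name replace the stale copy.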

If it creates a jar (for example, from the classes of the job itself or from the classes in your workspace), it uploads the created jar to the "library" path and cleans up the created jars after the JVM exits.
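As a rough sketch of the jar-creation case, the following packs a directory of class files into a temporary jar and registers it for deletion at JVM exit. It is an assumption about how such a step could look, not the helper's implementation. The class TempJarBuilder is hypothetical, and Path here is java.nio.file.Path, not the Hadoop Path used in the method signature.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not part of JobClasspathHelper.
final class TempJarBuilder {

    /** Packs every regular file under classesDir into a temporary jar. */
    static File jarDirectory(Path classesDir) {
        try {
            File jar = File.createTempFile("job-classes-", ".jar");
            jar.deleteOnExit(); // removed when the JVM terminates normally

            List<Path> files;
            try (Stream<Path> walk = Files.walk(classesDir)) {
                files = walk.filter(Files::isRegularFile)
                            .sorted()
                            .collect(Collectors.toList());
            }

            try (OutputStream out = Files.newOutputStream(jar.toPath());
                 JarOutputStream jos = new JarOutputStream(out)) {
                for (Path file : files) {
                    // Jar entry names always use '/' as the separator.
                    String name = classesDir.relativize(file).toString()
                            .replace(File.separatorChar, '/');
                    jos.putNextEntry(new JarEntry(name));
                    Files.copy(file, jos);
                    jos.closeEntry();
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(jarDirectory(dir));
    }
}
```

Note that deleteOnExit only takes effect on normal JVM termination, so a crashed process can leave the temporary jar behind.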

Here's an example for a job class TestTool.class that requires HashFunction from Guava.

 new JobClasspathHelper().prepareClasspath(getConf(), new Path("/lib/path"), new Class[] { TestTool.class, HashFunction.class});

Author: tbussier

Constructor Summary
JobClasspathHelper()

Method Summary
 void prepareClasspath(Configuration conf, Path libDir, Class<?>... classesToInclude)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public JobClasspathHelper()

Method Detail


public void prepareClasspath(Configuration conf,
                             Path libDir,
                             Class<?>... classesToInclude)
                      throws Exception
Parameters:
conf - Configuration object for the job. Used to get the FileSystem associated with it.
libDir - Destination directory in the FileSystem (usually HDFS) in which to look for and upload the libraries.
classesToInclude - Classes needed by the job. JarFinder looks for the jars containing these classes.

Copyright © 2013–2014. All rights reserved.