public class CrunchDatasets extends Object
A helper class for exposing Datasets and Views as Crunch
ReadableSources or Targets.
| Constructor and Description |
|---|
| CrunchDatasets() |
| Modifier and Type | Method and Description |
|---|---|
| static <E> ReadableSource<E> | asSource(String uri, Class<E> type) |
| static <E> ReadableSource<E> | asSource(URI uri, Class<E> type) |
| static <E> ReadableSource<E> | asSource(View<E> view) - Expose the given View as a Crunch ReadableSource. |
| static Target | asTarget(String uri) |
| static Target | asTarget(URI uri) |
| static <E> Target | asTarget(View<E> view) |
| static <E> PCollection<E> | partition(PCollection<E> collection, Dataset<E> dataset) - Partitions collection to be stored efficiently in dataset. |
| static <E> PCollection<E> | partition(PCollection<E> collection, Dataset<E> dataset, int numWriters) - Deprecated. Will be removed in 0.19.0; use partition(PCollection, View, int). |
| static <E> PCollection<E> | partition(PCollection<E> collection, View<E> view) - Partitions collection to be stored efficiently in View. |
| static <E> PCollection<E> | partition(PCollection<E> collection, View<E> view, int numWriters) - Partitions collection to be stored efficiently in View. |

public static <E> ReadableSource<E> asSource(View<E> view)
Expose the given View as a Crunch ReadableSource.
Type Parameters:
E - the type of entity produced by the source
Parameters:
view - the view to read from
Returns:
a ReadableSource for the view

public static <E> ReadableSource<E> asSource(URI uri, Class<E> type)
Type Parameters:
E - the type of entity produced by the source
Parameters:
uri - the URI of the view or dataset to read from
type - the Java type of the entities in the dataset
Returns:
a ReadableSource for the view

public static <E> ReadableSource<E> asSource(String uri, Class<E> type)
Type Parameters:
E - the type of entity produced by the source
Parameters:
uri - the URI of the view or dataset to read from
type - the Java type of the entities in the dataset
Returns:
a ReadableSource for the view

public static <E> Target asTarget(View<E> view)
Type Parameters:
E - the type of entity stored in the view
Parameters:
view - the view to write to
Returns:
a Target for the view

public static Target asTarget(String uri)
Parameters:
uri - the dataset or view URI
Returns:
a Target for the dataset or view

public static Target asTarget(URI uri)
Parameters:
uri - the dataset or view URI
Returns:
a Target for the dataset or view

public static <E> PCollection<E> partition(PCollection<E> collection, View<E> view)
Partitions collection to be stored efficiently in View.
This restructures the parallel collection so that all of the entities that will be stored in a given partition will be processed by the same writer.
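
To illustrate that guarantee, here is a minimal, library-free sketch of grouping entities by partition key so that each partition's entities end up with a single writer. It is a toy model only, not Kite's or Crunch's implementation: the class PartitionGroupingSketch, the method groupByPartition, and the date-prefix partition key are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of partition-aware grouping; NOT the Kite implementation. */
final class PartitionGroupingSketch {

  /**
   * Groups entities by partition key, so every entity destined for the
   * same partition lands in the same group (one group per writer).
   * The partitionKey function is a hypothetical stand-in for a dataset's
   * partition strategy.
   */
  static <E, K> Map<K, List<E>> groupByPartition(
      List<E> entities, Function<E, K> partitionKey) {
    // LinkedHashMap keeps partitions in first-seen order.
    Map<K, List<E>> groups = new LinkedHashMap<K, List<E>>();
    for (E entity : entities) {
      K key = partitionKey.apply(entity);
      List<E> group = groups.get(key);
      if (group == null) {
        group = new ArrayList<E>();
        groups.put(key, group);
      }
      group.add(entity);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Hypothetical events, partitioned by the date prefix of each string.
    List<String> events = Arrays.asList(
        "2015-01-01:login", "2015-01-02:click", "2015-01-01:logout");
    Map<String, List<String>> groups =
        groupByPartition(events, e -> e.substring(0, 10));
    // Each map entry models the input of one writer.
    System.out.println(groups);
  }
}
```

In this model, a writer that receives one group only ever opens files for a single partition, which is the property the description above refers to.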
Type Parameters:
E - the type of entities in the collection and underlying dataset
Parameters:
collection - a collection of entities
view - a View of a dataset to partition the collection for

public static <E> PCollection<E> partition(PCollection<E> collection, Dataset<E> dataset)
Partitions collection to be stored efficiently in dataset.
This restructures the parallel collection so that all of the entities that will be stored in a given partition will be processed by the same writer.
Type Parameters:
E - the type of entities in the collection and underlying dataset
Parameters:
collection - a collection of entities
dataset - a dataset to partition the collection for

public static <E> PCollection<E> partition(PCollection<E> collection, View<E> view, int numWriters)
Partitions collection to be stored efficiently in View.
This restructures the parallel collection so that all of the entities that will be stored in a given partition will be processed by the same writer.
If the dataset is not partitioned, then this will structure all of the
entities to produce a number of files equal to numWriters.
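
For the unpartitioned case, the effect of numWriters can be pictured with the small, library-free sketch below. It is only a toy model under stated assumptions: the source does not say how entities are distributed among writers, so the round-robin assignment here is an illustrative choice, and the class NumWritersSketch and method spreadAcrossWriters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of spreading entities over a fixed number of writers. */
final class NumWritersSketch {

  /**
   * Distributes entities over numWriters buckets; each bucket models the
   * input of one writer, and therefore one output file.
   * Round-robin is an assumption made for illustration only.
   */
  static <E> List<List<E>> spreadAcrossWriters(List<E> entities, int numWriters) {
    if (numWriters < 1) {
      throw new IllegalArgumentException("numWriters must be at least 1");
    }
    List<List<E>> writers = new ArrayList<List<E>>(numWriters);
    for (int i = 0; i < numWriters; i++) {
      writers.add(new ArrayList<E>());
    }
    for (int i = 0; i < entities.size(); i++) {
      // Entity i goes to writer (i mod numWriters).
      writers.get(i % numWriters).add(entities.get(i));
    }
    return writers;
  }

  public static void main(String[] args) {
    List<String> entities = Arrays.asList("a", "b", "c", "d", "e");
    // Two writers, so this models two output files.
    System.out.println(spreadAcrossWriters(entities, 2));
  }
}
```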
Type Parameters:
E - the type of entities in the collection and underlying dataset
Parameters:
collection - a collection of entities
view - a View of a dataset to partition the collection for
numWriters - the number of writers that should be used
See Also:
partition(PCollection, View)

@Deprecated
public static <E> PCollection<E> partition(PCollection<E> collection, Dataset<E> dataset, int numWriters)
Deprecated. Will be removed in 0.19.0; use partition(PCollection, View, int).
Partitions collection to be stored efficiently in dataset.
This restructures the parallel collection so that all of the entities that will be stored in a given partition will be processed by the same writer.
If the dataset is not partitioned, then this will structure all of the
entities to produce a number of files equal to numWriters.
Type Parameters:
E - the type of entities in the collection and underlying dataset
Parameters:
collection - a collection of entities
dataset - a dataset to partition the collection for
numWriters - the number of writers that should be used
See Also:
partition(PCollection, Dataset)

Copyright © 2013–2015. All rights reserved.