public class Datasets extends Object
Methods for working with Dataset instances.
URIs
All methods require a URI that identifies a dataset, view, or
repository. The URI must begin with the scheme dataset:,
view:, or repo:. The remainder of the URI is
implementation specific, depending on the dataset scheme.
For example, the URI dataset:hive:movies/ratings
references a dataset named ratings in the
movies namespace, stored in Hive.
The URI view:hive:movies/ratings?year=2015&month=3
references a view of the same ratings dataset. The view
is filtered to include records from only March, 2015.
See Dataset and View URIs for the available URI patterns.
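As a rough illustration of the URI structure described above, the following sketch uses only java.net.URI from the JDK, not the Kite API, to separate the scheme from the implementation-specific remainder and to read the filter constraints of a view URI. The class KiteUriSketch and its methods kind and constraints are hypothetical helpers introduced for this example; Kite's own URI handling may differ.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Kite: inspects dataset:, view:, and repo: URIs.
public class KiteUriSketch {

    /** Returns "dataset", "view", or "repo"; rejects any other scheme. */
    public static String kind(URI uri) {
        String scheme = uri.getScheme();
        if ("dataset".equals(scheme) || "view".equals(scheme) || "repo".equals(scheme)) {
            return scheme;
        }
        throw new IllegalArgumentException("Not a dataset, view, or repo URI: " + uri);
    }

    /** Returns the key=value constraints that follow '?', in order; empty if there are none. */
    public static Map<String, String> constraints(URI uri) {
        Map<String, String> result = new LinkedHashMap<>();
        // These URIs are opaque to java.net.URI, so getQuery() is null; split by hand.
        String rest = uri.getRawSchemeSpecificPart();
        int q = rest.indexOf('?');
        if (q < 0) {
            return result;
        }
        for (String pair : rest.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                result.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI view = URI.create("view:hive:movies/ratings?year=2015&month=3");
        System.out.println(kind(view) + " " + constraints(view));
    }
}
```

The sketch only classifies and splits the URI string; resolving the remainder (for example hive:movies/ratings) to actual storage is left to the dataset implementation.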
Dataset Descriptors
Some methods require a DatasetDescriptor that encapsulates metadata
about a dataset. Descriptors are built using a
descriptor builder.
Entities
Entities are analogous to records in database terminology. The term is used in the API to emphasize that an entity can include not only primitive objects, but also complex objects such as hash maps.
Some methods accept an entity class that will be used by Kite when returning entities from a dataset or view.
| Constructor and Description |
|---|
| Datasets() |
| Modifier and Type | Method and Description |
|---|---|
| static <V extends View<GenericRecord>> V | create(String uri, DatasetDescriptor descriptor) - Create a Dataset for the given dataset or view URI string. |
| static <E,V extends View<E>> V | create(String uri, DatasetDescriptor descriptor, Class<E> type) - Create a Dataset for the given dataset or view URI string. |
| static <V extends View<GenericRecord>> V | create(URI uri, DatasetDescriptor descriptor) - Create a Dataset for the given dataset or view URI. |
| static <E,V extends View<E>> V | create(URI uri, DatasetDescriptor descriptor, Class<E> type) - Create a Dataset for the given dataset or view URI. |
| static boolean | delete(String uri) - Delete a Dataset identified by the given dataset URI string. |
| static boolean | delete(URI uri) - Delete a Dataset identified by the given dataset URI. |
| static boolean | exists(String uri) - Check whether a Dataset identified by the given URI string exists. |
| static boolean | exists(URI uri) - Check whether a Dataset identified by the given URI exists. |
| static Collection<URI> | list(String uri) - List the Dataset URIs in the repository identified by the URI string. |
| static Collection<URI> | list(URI uri) - List the Dataset URIs in the repository identified by the URI. |
| static <V extends View<GenericRecord>> V | load(String uriString) - Load a Dataset or View for the given URI. |
| static <E,V extends View<E>> V | load(String uriString, Class<E> type) - Load a Dataset or View for the given URI. |
| static <V extends View<GenericRecord>> V | load(URI uri) - Load a Dataset or View for the given URI. |
| static <E,V extends View<E>> V | load(URI uri, Class<E> type) - Load a Dataset or View for the given URI. |
| static <D extends Dataset<GenericRecord>> D | update(String uri, DatasetDescriptor descriptor) - Update a Dataset for the given dataset URI string. |
| static <E,D extends Dataset<E>> D | update(String uri, DatasetDescriptor descriptor, Class<E> type) - Update a Dataset for the given dataset URI string. |
| static <D extends Dataset<GenericRecord>> D | update(URI uri, DatasetDescriptor descriptor) - Update a Dataset for the given dataset URI. |
| static <E,D extends Dataset<E>> D | update(URI uri, DatasetDescriptor descriptor, Class<E> type) - Update a Dataset for the given dataset URI. |
public static <E,V extends View<E>> V load(URI uri, Class<E> type)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset.
If you use a view URI, load returns a View configured to
read a subset of the dataset.
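The difference between loading a dataset URI and loading a view URI can be pictured with a small in-memory model. This is only a sketch using plain JDK collections: the class ViewFilterSketch, its methods, and the representation of records as string maps are assumptions made for illustration and are not how Kite stores or reads data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, not the Kite API: a view is the dataset plus equality constraints.
public class ViewFilterSketch {

    /** Builds a record from alternating field names and values. */
    public static Map<String, String> record(String... keyValues) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            r.put(keyValues[i], keyValues[i + 1]);
        }
        return r;
    }

    /** Parses the constraints after '?'; a URI without '?' has none. */
    public static Map<String, String> constraints(String uri) {
        Map<String, String> result = new LinkedHashMap<>();
        int q = uri.indexOf('?');
        if (q < 0) {
            return result;
        }
        for (String pair : uri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                result.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return result;
    }

    /** Returns every record for a dataset URI, or only the matching records for a view URI. */
    public static List<Map<String, String>> read(String uri, List<Map<String, String>> records) {
        Map<String, String> wanted = constraints(uri);
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> rec : records) {
            boolean matches = true;
            for (Map.Entry<String, String> c : wanted.entrySet()) {
                if (!c.getValue().equals(rec.get(c.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                out.add(rec);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> ratings = new ArrayList<>();
        ratings.add(record("year", "2015", "month", "3"));
        ratings.add(record("year", "2015", "month", "4"));
        ratings.add(record("year", "2014", "month", "3"));
        System.out.println(read("dataset:hive:movies/ratings", ratings).size());
        System.out.println(read("view:hive:movies/ratings?year=2015&month=3", ratings).size());
    }
}
```

In this model the dataset URI carries no constraints, so every record is returned, while the view URI narrows the result to the records that satisfy all of its constraints.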
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of View expected
Parameters:
uri - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
NullPointerException - if any arguments are null
IllegalArgumentException - if uri is not a dataset or view URI
public static <V extends View<GenericRecord>> V load(URI uri)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset.
If you use a view URI, load returns a View configured to
read a subset of the dataset.
Type Parameters:
V - the type of View expected
Parameters:
uri - a Dataset or View URI
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
NullPointerException - if any arguments are null
IllegalArgumentException - if uri is not a dataset or view URI
public static <E,V extends View<E>> V load(String uriString, Class<E> type)
Load a Dataset or View for the given URI string.
URIs must begin with dataset: or view:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset.
If you use a view URI, load returns a View configured to
read a subset of the dataset.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of View expected
Parameters:
uriString - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
NullPointerException - if any arguments are null
IllegalArgumentException - if uriString is not a dataset or view URI
public static <V extends View<GenericRecord>> V load(String uriString)
Load a Dataset or View for the given URI string.
URIs must begin with dataset: or view:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset.
If you use a view URI, load returns a View configured to
read a subset of the dataset.
Type Parameters:
V - the type of View expected
Parameters:
uriString - a Dataset or View URI
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
NullPointerException - if any arguments are null
IllegalArgumentException - if uriString is not a dataset or view URI
public static <E,V extends View<E>> V create(URI uri, DatasetDescriptor descriptor, Class<E> type)
Create a Dataset for the given dataset or view URI.
create returns an empty dataset. You can use DatasetWriter
to populate your dataset.
URIs must begin with dataset: or view:. The remainder of
the URI is implementation specific, depending on the dataset scheme. If the
URI is a view URI, this method creates the underlying dataset and returns a
view of it.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a Dataset responsible for the given URI
Throws:
NullPointerException - if uri, descriptor, or type is null
IllegalArgumentException - if uri is not a dataset or view URI
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <V extends View<GenericRecord>> V create(URI uri, DatasetDescriptor descriptor)
Create a Dataset for the given dataset or view URI.
create returns an empty dataset. You can use DatasetWriter
to populate your dataset.
URIs must begin with dataset: or view:. The remainder of
the URI is implementation specific, depending on the dataset scheme. If the
URI is a view URI, this method creates the underlying dataset and returns a
view of it.
Type Parameters:
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI
Returns:
a Dataset responsible for the given URI
Throws:
NullPointerException - if uri or descriptor is null
IllegalArgumentException - if uri is not a dataset or view URI
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <E,V extends View<E>> V create(String uri, DatasetDescriptor descriptor, Class<E> type)
Create a Dataset for the given dataset or view URI string.
create returns an empty dataset. You can use DatasetWriter
to populate your dataset.
URIs must begin with dataset: or view:. The remainder of
the URI is implementation specific, depending on the dataset scheme. If the
URI is a view URI, this method creates the underlying dataset and returns a
view of it.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI string
type - a Java class that represents an entity in the dataset
Returns:
a Dataset responsible for the given URI
Throws:
NullPointerException - if uri, descriptor, or type is null
IllegalArgumentException - if uri is not a dataset or view URI
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <V extends View<GenericRecord>> V create(String uri, DatasetDescriptor descriptor)
Create a Dataset for the given dataset or view URI string.
create returns an empty dataset. You can use DatasetWriter
to populate your dataset.
URIs must begin with dataset: or view:. The remainder of
the URI is implementation specific, depending on the dataset scheme. If the
URI is a view URI, this method creates the underlying dataset and returns a
view of it.
Type Parameters:
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI string
Returns:
a Dataset responsible for the given URI
Throws:
NullPointerException - if uri or descriptor is null
IllegalArgumentException - if uri is not a dataset or view URI
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <E,D extends Dataset<E>> D update(URI uri, DatasetDescriptor descriptor, Class<E> type)
Update a Dataset for the given dataset URI.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it
based on an existing descriptor. Use
DatasetDescriptor.Builder(DatasetDescriptor) to
build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
E - the type used for readers and writers created by this Dataset
D - the type of Dataset expected
Parameters:
uri - a Dataset URI
type - a Java class that represents an entity in the dataset
Returns:
a Dataset for the given URI
Throws:
NullPointerException - if uri, descriptor, or type is null
IllegalArgumentException - if uri is not a dataset URI
DatasetNotFoundException - if there is no dataset for the given URI
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static <D extends Dataset<GenericRecord>> D update(URI uri, DatasetDescriptor descriptor)
Update a Dataset for the given dataset URI.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it
based on an existing descriptor. Use
DatasetDescriptor.Builder(DatasetDescriptor) to
build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
D - the type of Dataset expected
Parameters:
uri - a Dataset URI
Returns:
a Dataset for the given URI
Throws:
NullPointerException - if uri or descriptor is null
IllegalArgumentException - if uri is not a dataset URI
DatasetNotFoundException - if there is no dataset for the given URI
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static <E,D extends Dataset<E>> D update(String uri, DatasetDescriptor descriptor, Class<E> type)
Update a Dataset for the given dataset URI string.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it
based on an existing descriptor. Use
DatasetDescriptor.Builder(DatasetDescriptor) to
build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
E - the type used for readers and writers created by this Dataset
D - the type of Dataset expected
Parameters:
uri - a Dataset URI string
type - a Java class that represents an entity in the dataset
Returns:
a Dataset for the given URI
Throws:
NullPointerException - if uri, descriptor, or type is null
IllegalArgumentException - if uri is not a dataset URI
DatasetNotFoundException - if there is no dataset for the given URI
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static <D extends Dataset<GenericRecord>> D update(String uri, DatasetDescriptor descriptor)
Update a Dataset for the given dataset URI string.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it
based on an existing descriptor. Use
DatasetDescriptor.Builder(DatasetDescriptor) to
build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
D - the type of Dataset expected
Parameters:
uri - a Dataset URI string
Returns:
a Dataset for the given URI
Throws:
NullPointerException - if uri or descriptor is null
IllegalArgumentException - if uri is not a dataset URI
DatasetNotFoundException - if there is no dataset for the given URI
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static boolean delete(URI uri)
Delete a Dataset identified by the given dataset URI.
When you call this method using a dataset URI, both data and metadata are deleted. After you call this method, the dataset no longer exists, unless an exception is thrown.
When you call this method using a view URI, data in that view is deleted.
The dataset's metadata is not changed. This can throw an
UnsupportedOperationException if the delete requires additional
work. For example, if some, but not all, of the data in an underlying data
file must be removed, then the implementation is allowed to reject the
deletion rather than copy the remaining records to a new file.
An implementation must document under what conditions it accepts deletes,
and under what conditions it rejects them.
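The rejection rule described above can be sketched as a per-file decision. The class PartialDeleteSketch and its method deleteFromFile are hypothetical and use only the JDK; they model one possible policy for illustration, and a real implementation may accept or reject deletes under different, documented conditions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical policy model, not the Kite API: a data file is removed whole or not at all.
public class PartialDeleteSketch {

    /**
     * Decides what a view delete does to one data file.
     * Returns true if every record is in the view (the file can be removed),
     * false if no record is in the view (nothing to remove), and throws
     * UnsupportedOperationException if only some records are in the view.
     */
    public static <E> boolean deleteFromFile(List<E> fileRecords, Predicate<? super E> inView) {
        long matching = fileRecords.stream().filter(inView).count();
        if (matching == 0) {
            return false;
        }
        if (matching == fileRecords.size()) {
            return true;
        }
        // Removing part of a file would mean copying the remaining records to a new file.
        throw new UnsupportedOperationException(
            "Delete would remove " + matching + " of " + fileRecords.size() + " records in a file");
    }

    public static void main(String[] args) {
        // Each integer stands for the month of one record; the view selects month 3.
        Predicate<Integer> march = month -> month == 3;
        System.out.println(deleteFromFile(Arrays.asList(3, 3, 3), march));
        System.out.println(deleteFromFile(Arrays.asList(4, 5), march));
        try {
            deleteFromFile(Arrays.asList(3, 4), march);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this policy a delete succeeds when the view lines up with whole files, which is typically the case when the view constraints match the way the data is partitioned.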
URIs must begin with dataset:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI
Returns:
true if any data or metadata is removed, false otherwise
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a dataset URI
public static boolean delete(String uri)
Delete a Dataset identified by the given dataset URI string.
When you call this method using a dataset URI, both data and metadata are deleted. After you call this method, the dataset no longer exists, unless an exception is thrown.
When you call this method using a view URI, data in that view is deleted.
The dataset's metadata is not changed. This can throw an
UnsupportedOperationException if the delete requires additional
work. For example, if some, but not all, of the data in an underlying data
file must be removed, then the implementation is allowed to reject the
deletion rather than copy the remaining records to a new file.
An implementation must document under what conditions it accepts deletes,
and under what conditions it rejects them.
URIs must begin with dataset:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI string
Returns:
true if any data or metadata is removed, false otherwise
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a dataset URI
public static boolean exists(URI uri)
Check whether a Dataset identified by the given URI exists.
URIs must begin with dataset:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI
Returns:
true if the dataset exists, false otherwise
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a dataset URI
public static boolean exists(String uri)
Check whether a Dataset identified by the given URI string exists.
URIs must begin with dataset:. The remainder of
the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI string
Returns:
true if the dataset exists, false otherwise
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a dataset URI
public static Collection<URI> list(URI uri)
List the Dataset URIs in the repository identified by the URI.
URI formats are defined by Dataset implementations. The repository
URIs you pass to this method must begin with repo:. For example, to
list the Dataset URIs for the Hive repository, provide the URI
repo:hive.
Parameters:
uri - a DatasetRepository URI
Returns:
the Dataset URIs in the DatasetRepository identified by the URI
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a repository URI
public static Collection<URI> list(String uri)
List the Dataset URIs in the repository identified by the URI string.
URI formats are defined by Dataset implementations. The repository
URIs you pass to this method must begin with repo:. For example, to
list the Dataset URIs for the Hive repository, provide the URI
repo:hive.
Parameters:
uri - a DatasetRepository URI string
Returns:
the Dataset URIs in the DatasetRepository identified by the URI string
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a repository URI
Copyright © 2013–2015. All rights reserved.