public class Datasets extends Object
Methods for working with Dataset instances.
URIs
All methods require a URI that identifies a dataset, view, or repository. The URI must begin with the scheme dataset:, view:, or repo:. The remainder of the URI is implementation specific, depending on the dataset scheme.
For example, the URI dataset:hive:movies/ratings references a dataset named ratings in the movies namespace, stored in Hive.
The URI view:hive:movies/ratings?year=2015&month=3 references a view of the same ratings dataset. The view is filtered to include only records from March 2015.
See Dataset and View URIs for the available URI patterns.
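The prefix rule and the two example URIs can be explored with nothing more than java.net.URI. The sketch below is not Kite's own parser. The class KiteUriSketch and its methods kind, remainder, and constraints are hypothetical names introduced for illustration, and it assumes that view constraints always follow a single ? as key=value pairs joined by &, as in the example URI.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Kite: takes a Kite-style URI apart with java.net.URI. */
public class KiteUriSketch {

    /** Returns the outer scheme, which must be "dataset", "view", or "repo". */
    public static String kind(String uri) {
        String scheme = URI.create(uri).getScheme();
        if (!"dataset".equals(scheme) && !"view".equals(scheme) && !"repo".equals(scheme)) {
            throw new IllegalArgumentException("Not a dataset, view, or repo URI: " + uri);
        }
        return scheme;
    }

    /** Returns the implementation-specific remainder, without any ?constraints part. */
    public static String remainder(String uri) {
        String rest = URI.create(uri).getSchemeSpecificPart();
        int q = rest.indexOf('?');
        return q < 0 ? rest : rest.substring(0, q);
    }

    /** Returns the key=value pairs that follow '?' in a view URI, in their original order. */
    public static Map<String, String> constraints(String uri) {
        Map<String, String> result = new LinkedHashMap<>();
        String rest = URI.create(uri).getSchemeSpecificPart();
        int q = rest.indexOf('?');
        if (q < 0) {
            return result; // no constraints: the URI names an unfiltered dataset
        }
        for (String pair : rest.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                result.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String view = "view:hive:movies/ratings?year=2015&month=3";
        System.out.println(kind(view));        // view
        System.out.println(remainder(view));   // hive:movies/ratings
        System.out.println(constraints(view)); // {year=2015, month=3}
    }
}
```

Because the part after the outer scheme does not begin with a slash, java.net.URI treats these URIs as opaque, so the query has to be split off by hand rather than read with getQuery().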
Dataset Descriptors
Some methods require a DatasetDescriptor that encapsulates metadata about a dataset. Descriptors are built using a descriptor builder.
Entities
Entities are analogous to records in database terminology. The term is used in the API to emphasize that an entity can include not only primitive objects, but also complex objects such as hash maps.
Some methods accept an entity class that will be used by Kite when returning entities from a dataset or view.
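As an illustration of what an entity class might look like, the sketch below defines a plain Java class with primitive fields and one map field. The class name Rating and all of its fields are hypothetical and are not taken from Kite or from any dataset schema. Whether a particular field type can be stored depends on the schema in the dataset's descriptor.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical entity class; the name and fields are illustrative only. */
public class Rating {
    private long userId;
    private long movieId;
    private int stars;
    // A complex field: entities are not limited to primitive values.
    private Map<String, String> tags = new HashMap<>();

    /** No-argument constructor, so the class can be instantiated without field values. */
    public Rating() {
    }

    public Rating(long userId, long movieId, int stars) {
        this.userId = userId;
        this.movieId = movieId;
        this.stars = stars;
    }

    public long getUserId() { return userId; }
    public long getMovieId() { return movieId; }
    public int getStars() { return stars; }
    public Map<String, String> getTags() { return tags; }

    public static void main(String[] args) {
        Rating rating = new Rating(42L, 7L, 5);
        rating.getTags().put("genre", "drama");
        System.out.println(rating.getStars() + " stars, tags " + rating.getTags());
    }
}
```

A class like this would be passed as the type argument, for example Rating.class, to the methods that take a Class<E> parameter.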
Constructor and Description |
---|
Datasets() |
Modifier and Type | Method | Description |
---|---|---|
static <V extends View<GenericRecord>> V | create(String uri, DatasetDescriptor descriptor) | Create a Dataset for the given dataset or view URI string. |
static <E,V extends View<E>> V | create(String uri, DatasetDescriptor descriptor, Class<E> type) | Create a Dataset for the given dataset or view URI string. |
static <V extends View<GenericRecord>> V | create(URI uri, DatasetDescriptor descriptor) | Create a Dataset for the given dataset or view URI. |
static <E,V extends View<E>> V | create(URI uri, DatasetDescriptor descriptor, Class<E> type) | Create a Dataset for the given dataset or view URI. |
static boolean | delete(String uri) | Delete a Dataset identified by the given dataset URI string. |
static boolean | delete(URI uri) | Delete a Dataset identified by the given dataset URI. |
static boolean | exists(String uri) | Check whether a Dataset identified by the given URI string exists. |
static boolean | exists(URI uri) | Check whether a Dataset identified by the given URI exists. |
static Collection<URI> | list(String uri) | List the Dataset URIs in the repository identified by the URI string. |
static Collection<URI> | list(URI uri) | List the Dataset URIs in the repository identified by the URI. |
static <V extends View<GenericRecord>> V | load(String uriString) | Load a Dataset or View for the given URI. |
static <E,V extends View<E>> V | load(String uriString, Class<E> type) | Load a Dataset or View for the given URI. |
static <V extends View<GenericRecord>> V | load(URI uri) | Load a Dataset or View for the given URI. |
static <E,V extends View<E>> V | load(URI uri, Class<E> type) | Load a Dataset or View for the given URI. |
static <D extends Dataset<GenericRecord>> D | update(String uri, DatasetDescriptor descriptor) | Update a Dataset for the given dataset URI string. |
static <E,D extends Dataset<E>> D | update(String uri, DatasetDescriptor descriptor, Class<E> type) | Update a Dataset for the given dataset URI string. |
static <D extends Dataset<GenericRecord>> D | update(URI uri, DatasetDescriptor descriptor) | Update a Dataset for the given dataset URI. |
static <E,D extends Dataset<E>> D | update(URI uri, DatasetDescriptor descriptor, Class<E> type) | Update a Dataset for the given dataset URI. |
public static <E,V extends View<E>> V load(URI uri, Class<E> type)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset. If you use a view URI, load returns a View configured to read a subset of the dataset.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of View expected
Parameters:
uri - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
NullPointerException - if any arguments are null
IllegalArgumentException - if uri is not a dataset or view URI
public static <V extends View<GenericRecord>> V load(URI uri)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset. If you use a view URI, load returns a View configured to read a subset of the dataset.
Type Parameters:
V - the type of View expected
Parameters:
uri - a Dataset or View URI
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
NullPointerException - if any arguments are null
IllegalArgumentException - if uri is not a dataset or view URI
public static <E,V extends View<E>> V load(String uriString, Class<E> type)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset. If you use a view URI, load returns a View configured to read a subset of the dataset.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of View expected
Parameters:
uriString - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
NullPointerException - if any arguments are null
IllegalArgumentException - if uriString is not a dataset or view URI
public static <V extends View<GenericRecord>> V load(String uriString)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset. If you use a view URI, load returns a View configured to read a subset of the dataset.
Type Parameters:
V - the type of View expected
Parameters:
uriString - a Dataset or View URI
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
NullPointerException - if any arguments are null
IllegalArgumentException - if uriString is not a dataset or view URI
public static <E,V extends View<E>> V create(URI uri, DatasetDescriptor descriptor, Class<E> type)
Create a Dataset for the given dataset or view URI.
create returns an empty dataset. You can use DatasetWriter to populate your dataset.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme. If the URI is a view URI, this method creates the underlying dataset and returns a view of it.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a Dataset responsible for the given URI
Throws:
NullPointerException - if uri, descriptor, or type is null
IllegalArgumentException - if uri is not a dataset or view URI
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <V extends View<GenericRecord>> V create(URI uri, DatasetDescriptor descriptor)
Create a Dataset for the given dataset or view URI.
create returns an empty dataset. You can use DatasetWriter to populate your dataset.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme. If the URI is a view URI, this method creates the underlying dataset and returns a view of it.
Type Parameters:
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI
Returns:
a Dataset responsible for the given URI
Throws:
NullPointerException - if uri or descriptor is null
IllegalArgumentException - if uri is not a dataset or view URI
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <E,V extends View<E>> V create(String uri, DatasetDescriptor descriptor, Class<E> type)
Create a Dataset for the given dataset or view URI string.
create returns an empty dataset. You can use DatasetWriter to populate your dataset.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme. If the URI is a view URI, this method creates the underlying dataset and returns a view of it.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI string
type - a Java class that represents an entity in the dataset
Returns:
a Dataset responsible for the given URI
Throws:
NullPointerException - if uri, descriptor, or type is null
IllegalArgumentException - if uri is not a dataset or view URI
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <V extends View<GenericRecord>> V create(String uri, DatasetDescriptor descriptor)
Create a Dataset for the given dataset or view URI string.
create returns an empty dataset. You can use DatasetWriter to populate your dataset.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme. If the URI is a view URI, this method creates the underlying dataset and returns a view of it.
Type Parameters:
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI string
Returns:
a Dataset responsible for the given URI
Throws:
NullPointerException - if uri or descriptor is null
IllegalArgumentException - if uri is not a dataset or view URI
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <E,D extends Dataset<E>> D update(URI uri, DatasetDescriptor descriptor, Class<E> type)
Update a Dataset for the given dataset URI.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it based on an existing descriptor. Use DatasetDescriptor.Builder(DatasetDescriptor) to build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
E - the type used for readers and writers created by this Dataset
D - the type of Dataset expected
Parameters:
uri - a Dataset URI
type - a Java class that represents an entity in the dataset
Returns:
the Dataset for the given URI
Throws:
NullPointerException - if uri, descriptor, or type is null
IllegalArgumentException - if uri is not a dataset URI
DatasetNotFoundException - if there is no dataset for the given URI
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static <D extends Dataset<GenericRecord>> D update(URI uri, DatasetDescriptor descriptor)
Update a Dataset for the given dataset URI.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it based on an existing descriptor. Use DatasetDescriptor.Builder(DatasetDescriptor) to build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
D - the type of Dataset expected
Parameters:
uri - a Dataset URI
Returns:
the Dataset for the given URI
Throws:
NullPointerException - if uri or descriptor is null
IllegalArgumentException - if uri is not a dataset URI
DatasetNotFoundException - if there is no dataset for the given URI
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static <E,D extends Dataset<E>> D update(String uri, DatasetDescriptor descriptor, Class<E> type)
Update a Dataset for the given dataset URI string.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it based on an existing descriptor. Use DatasetDescriptor.Builder(DatasetDescriptor) to build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
E - the type used for readers and writers created by this Dataset
D - the type of Dataset expected
Parameters:
uri - a Dataset URI string
type - a Java class that represents an entity in the dataset
Returns:
the Dataset for the given URI
Throws:
NullPointerException - if uri, descriptor, or type is null
IllegalArgumentException - if uri is not a dataset URI
DatasetNotFoundException - if there is no dataset for the given URI
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static <D extends Dataset<GenericRecord>> D update(String uri, DatasetDescriptor descriptor)
Update a Dataset for the given dataset URI string.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it based on an existing descriptor. Use DatasetDescriptor.Builder(DatasetDescriptor) to build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
D - the type of Dataset expected
Parameters:
uri - a Dataset URI string
Returns:
the Dataset for the given URI
Throws:
NullPointerException - if uri or descriptor is null
IllegalArgumentException - if uri is not a dataset URI
DatasetNotFoundException - if there is no dataset for the given URI
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static boolean delete(URI uri)
Delete a Dataset identified by the given dataset URI.
When you call this method using a dataset URI, both data and metadata are deleted. After you call this method, the dataset no longer exists, unless an exception is thrown.
When you call this method using a view URI, data in that view is deleted. The dataset's metadata is not changed. This can throw an UnsupportedOperationException if the delete requires additional work. For example, if some, but not all, of the data in an underlying data file must be removed, then the implementation is allowed to reject the deletion rather than copy the remaining records to a new file.
An implementation must document under what conditions it accepts deletes, and under what conditions it rejects them.
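The partial-file case can be pictured with a small model. The sketch below is not Kite's implementation. The class ViewDeleteSketch, the method deleteAll, and the representation of a data file as a list of month values are all hypothetical, and the sketch assumes an implementation that chooses to reject a delete instead of rewriting a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of file-granular view deletes; not Kite's implementation. */
public class ViewDeleteSketch {

    /**
     * Removes every file whose records all fall inside the view. Rejects the
     * whole request, changing nothing, if any file is only partly covered,
     * because removing those records would mean rewriting the file.
     * Returns true if at least one file was removed.
     */
    public static boolean deleteAll(List<List<Integer>> files, Predicate<Integer> inView) {
        List<List<Integer>> toRemove = new ArrayList<>();
        for (List<Integer> file : files) {
            long matching = file.stream().filter(inView).count();
            if (matching == 0) {
                continue; // file is entirely outside the view
            }
            if (matching < file.size()) {
                throw new UnsupportedOperationException(
                    "Delete would require rewriting a partially matching file");
            }
            toRemove.add(file); // file is entirely inside the view
        }
        return files.removeAll(toRemove);
    }

    public static void main(String[] args) {
        // Each inner list stands for one data file; each value is a record's month.
        List<List<Integer>> files = new ArrayList<>();
        files.add(Arrays.asList(3, 3));
        files.add(Arrays.asList(4, 4));
        System.out.println(deleteAll(files, month -> month == 3)); // true
        System.out.println(files);                                 // [[4, 4]]
    }
}
```

In this model a view that lines up with whole files is deleted cheaply, while a view that cuts through a file is refused before anything is removed.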
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI
Returns:
true if any data or metadata is removed, false otherwise
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a dataset URI
public static boolean delete(String uri)
Delete a Dataset identified by the given dataset URI string.
When you call this method using a dataset URI, both data and metadata are deleted. After you call this method, the dataset no longer exists, unless an exception is thrown.
When you call this method using a view URI, data in that view is deleted. The dataset's metadata is not changed. This can throw an UnsupportedOperationException if the delete requires additional work. For example, if some, but not all, of the data in an underlying data file must be removed, then the implementation is allowed to reject the deletion rather than copy the remaining records to a new file.
An implementation must document under what conditions it accepts deletes, and under what conditions it rejects them.
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI string
Returns:
true if any data or metadata is removed, false otherwise
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a dataset URI
public static boolean exists(URI uri)
Check whether a Dataset identified by the given URI exists.
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI
Returns:
true if the dataset exists, false otherwise
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a dataset URI
public static boolean exists(String uri)
Check whether a Dataset identified by the given URI string exists.
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI string
Returns:
true if the dataset exists, false otherwise
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a dataset URI
public static Collection<URI> list(URI uri)
List the Dataset URIs in the repository identified by the URI.
URI formats are defined by Dataset implementations. The repository URIs you pass to this method must begin with repo:. For example, to list the Dataset URIs for the Hive repository, provide the URI repo:hive.
Parameters:
uri - a DatasetRepository URI
Returns:
the Dataset URIs in the DatasetRepository
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a repository URI
public static Collection<URI> list(String uri)
List the Dataset URIs in the repository identified by the URI string.
URI formats are defined by Dataset implementations. The repository URIs you pass to this method must begin with repo:. For example, to list the Dataset URIs for the Hive repository, provide the URI repo:hive.
Parameters:
uri - a DatasetRepository URI string
Returns:
the Dataset URIs in the DatasetRepository
Throws:
NullPointerException - if uri is null
IllegalArgumentException - if uri is not a repository URI
Copyright © 2013–2015. All rights reserved.