java.lang.Object
  org.kitesdk.data.spi.AbstractDatasetRepository
    org.kitesdk.data.hcatalog.HCatalogDatasetRepository

public class HCatalogDatasetRepository
A DatasetRepository that uses the Hive/HCatalog metastore for metadata,
and stores data in a Hadoop FileSystem.
The location of the data directory is either chosen by Hive/HCatalog (so-called
"managed tables") or specified when creating an instance of this class by providing
a FileSystem and a root directory in the constructor ("external tables").
The primary methods of interest will be
create(String, DatasetDescriptor), load(String), and
delete(String) which create a new dataset, load an existing
dataset, or delete an existing dataset, respectively. Once a dataset has been created
or loaded, users can invoke the appropriate Dataset methods to get a reader
or writer as needed.
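
The create, load, and delete lifecycle described above can be illustrated with a small, self-contained sketch. It does not use the Kite SDK: InMemoryRepositorySketch is a hypothetical in-memory stand-in, a plain String takes the place of DatasetDescriptor, and standard Java exceptions take the place of the repository's own exception types. Only the method names and the overall contract come from this page.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in; NOT part of the Kite SDK.
// A plain String stands in for DatasetDescriptor.
class InMemoryRepositorySketch {
  private final Map<String, String> descriptors = new LinkedHashMap<>();

  // Mirrors create(String, DatasetDescriptor): duplicate names are rejected.
  String create(String name, String descriptor) {
    if (descriptors.containsKey(name)) {
      throw new IllegalStateException("Dataset already exists: " + name);
    }
    descriptors.put(name, descriptor);
    return descriptor;
  }

  // Mirrors load(String): asking for a missing dataset is an error.
  String load(String name) {
    String descriptor = descriptors.get(name);
    if (descriptor == null) {
      throw new IllegalArgumentException("No such dataset: " + name);
    }
    return descriptor;
  }

  // Mirrors exists(String).
  boolean exists(String name) {
    return descriptors.containsKey(name);
  }

  // Mirrors delete(String): true if an entry was removed, false otherwise.
  boolean delete(String name) {
    return descriptors.remove(name) != null;
  }

  // Mirrors list(): an empty collection when nothing is stored.
  Collection<String> list() {
    return new ArrayList<>(descriptors.keySet());
  }
}

class RepositoryLifecycleDemo {
  public static void main(String[] args) {
    InMemoryRepositorySketch repo = new InMemoryRepositorySketch();
    repo.create("events", "schema-v1");
    System.out.println("exists: " + repo.exists("events"));
    System.out.println("loaded: " + repo.load("events"));
    System.out.println("deleted: " + repo.delete("events"));
    System.out.println("remaining: " + repo.list());
  }
}
```

With the real class, the same sequence of calls would go through an HCatalogDatasetRepository instance, and the returned Dataset would be used to obtain a reader or writer.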
See Also: DatasetRepository, Dataset

| Nested Class Summary | |
|---|---|
| static class | HCatalogDatasetRepository.Builder: A fluent builder to aid in the construction of HCatalogDatasetRepository instances. |
| Method Summary | | |
|---|---|---|
| <E> Dataset<E> | create(String name, DatasetDescriptor descriptor) | Create a Dataset with the supplied descriptor. |
| boolean | delete(String name) | Delete the named Dataset. |
| boolean | exists(String name) | Checks if there is a Dataset in this repository named name. |
| Collection<String> | list() | List the names of the Datasets in this DatasetRepository. |
| <E> Dataset<E> | load(String name) | Get the latest version of a named Dataset. |
| <E> Dataset<E> | update(String name, DatasetDescriptor descriptor) | Update an existing Dataset to reflect the supplied descriptor. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method Detail |
|---|
public <E> Dataset<E> create(String name,
                             DatasetDescriptor descriptor)

Description copied from interface: DatasetRepository

Create a Dataset with the supplied descriptor. Depending on
the underlying dataset storage, some schema types or configurations may
not be supported. If an illegal schema is supplied, an exception will be
thrown by the implementing class. It is illegal to create more than one
dataset with a given name. If a duplicate name is provided, an exception is
thrown.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other
properties of the dataset
public <E> Dataset<E> update(String name,
                             DatasetDescriptor descriptor)

Description copied from interface: DatasetRepository

Update an existing Dataset to reflect the supplied descriptor. The
common case is updating a dataset schema. Depending on
the underlying dataset storage, some updates may not be supported,
such as a change in format or partition strategy.
Any attempt to make an unsupported or incompatible update will result in an
exception being thrown and no change being made to the dataset.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the
dataset
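
The guarantee that a rejected update leaves the dataset unchanged can be met by validating the new descriptor before replacing the stored one. The following is a minimal sketch of that ordering, not the Kite implementation: DescriptorSketch and UpdateSketch are hypothetical names, the descriptor is reduced to a schema string and a format string, and the only rule enforced is the format example mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DatasetDescriptor: only a schema and a format.
final class DescriptorSketch {
  final String schema;
  final String format;

  DescriptorSketch(String schema, String format) {
    this.schema = schema;
    this.format = format;
  }
}

// Hypothetical repository fragment; NOT the Kite implementation.
class UpdateSketch {
  private final Map<String, DescriptorSketch> descriptors = new HashMap<>();

  // Test helper that seeds the store directly, bypassing any create() checks.
  void put(String name, DescriptorSketch descriptor) {
    descriptors.put(name, descriptor);
  }

  DescriptorSketch get(String name) {
    return descriptors.get(name);
  }

  // All checks run before the map is touched, so a thrown exception
  // leaves the stored descriptor exactly as it was.
  DescriptorSketch update(String name, DescriptorSketch updated) {
    DescriptorSketch current = descriptors.get(name);
    if (current == null) {
      throw new IllegalArgumentException("No such dataset: " + name);
    }
    if (!current.format.equals(updated.format)) {
      throw new UnsupportedOperationException(
          "Format change not supported: " + current.format + " -> " + updated.format);
    }
    descriptors.put(name, updated);
    return updated;
  }
}

class UpdateDemo {
  public static void main(String[] args) {
    UpdateSketch repo = new UpdateSketch();
    repo.put("events", new DescriptorSketch("schema-v1", "avro"));

    // A schema-only change is accepted.
    repo.update("events", new DescriptorSketch("schema-v2", "avro"));

    // A format change is rejected and the stored descriptor is untouched.
    try {
      repo.update("events", new DescriptorSketch("schema-v3", "parquet"));
    } catch (UnsupportedOperationException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println("schema now: " + repo.get("events").schema);
  }
}
```

Which updates a real repository accepts depends on the underlying storage, so the single format check here is only illustrative.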
public <E> Dataset<E> load(String name)

Description copied from interface: DatasetRepository

Get the latest version of a named Dataset. If no dataset with the
provided name exists, a DatasetNotFoundException is thrown.

Parameters:
name - The name of the dataset.

public boolean delete(String name)

Description copied from interface: DatasetRepository

Delete the named Dataset. If no dataset with the
provided name exists, a DatasetNotFoundException is thrown.

Parameters:
name - The name of the dataset.

Returns:
true if the dataset was successfully deleted, false if the
dataset does not exist.

public boolean exists(String name)

Description copied from interface: DatasetRepository

Checks if there is a Dataset in this repository named name.

Parameters:
name - a Dataset name to check the existence of

Returns:
true if a Dataset named name exists, false otherwise

public Collection<String> list()

Description copied from interface: DatasetRepository

List the names of the Datasets in this DatasetRepository.
If there is not at least one Dataset in this repository, an empty
list will be returned.

Returns:
Collection of Dataset names (Strings)