java.lang.Object
  org.kitesdk.data.spi.AbstractDatasetRepository
    org.kitesdk.data.hcatalog.HCatalogDatasetRepository
public class HCatalogDatasetRepository
A DatasetRepository that uses the Hive/HCatalog metastore for metadata and stores data in a Hadoop FileSystem.
The location of the data directory is either chosen by Hive/HCatalog (so-called "managed tables") or specified when creating an instance of this class by providing a FileSystem and a root directory in the constructor ("external tables").
The primary methods of interest are create(String, DatasetDescriptor), load(String), and delete(String), which create a new dataset, load an existing dataset, or delete an existing dataset, respectively. Once a dataset has been created or loaded, users can invoke the appropriate Dataset methods to get a reader or writer as needed.
See Also: DatasetRepository, Dataset
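The create, load, and delete contract described above can be modeled without a Hive metastore. The following is a minimal sketch, not the Kite implementation: `LifecycleSketchRepository` and `SketchDatasetNotFoundException` are hypothetical names, a plain `String` stands in for both `DatasetDescriptor` and `Dataset`, and the exception type thrown on a duplicate name is an assumption (the documentation only says that an exception is thrown). For `delete`, the sketch follows the documented return value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the documented DatasetNotFoundException;
// making it unchecked is an assumption of this sketch.
class SketchDatasetNotFoundException extends RuntimeException {
    SketchDatasetNotFoundException(String message) {
        super(message);
    }
}

// Hypothetical in-memory model of the create/load/delete contract.
// A String stands in for both DatasetDescriptor and Dataset.
class LifecycleSketchRepository {
    private final Map<String, String> datasets = new HashMap<>();

    // Creating a second dataset under an existing name is rejected.
    String create(String name, String descriptor) {
        if (datasets.containsKey(name)) {
            throw new IllegalStateException("Dataset already exists: " + name);
        }
        datasets.put(name, descriptor);
        return descriptor;
    }

    // Loading an unknown name fails loudly instead of returning null.
    String load(String name) {
        String found = datasets.get(name);
        if (found == null) {
            throw new SketchDatasetNotFoundException("No such dataset: " + name);
        }
        return found;
    }

    // Returns true if something was removed, false if the name was unknown.
    boolean delete(String name) {
        return datasets.remove(name) != null;
    }

    public static void main(String[] args) {
        LifecycleSketchRepository repo = new LifecycleSketchRepository();
        repo.create("users", "schema-v1");
        System.out.println("loaded: " + repo.load("users"));
        System.out.println("deleted: " + repo.delete("users"));
        System.out.println("deleted again: " + repo.delete("users"));
    }
}
```

Callers that cannot tolerate an exception from `load` would typically check for the dataset first or catch the not-found exception explicitly.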
Nested Class Summary | |
---|---|
static class | HCatalogDatasetRepository.Builder: A fluent builder to aid in the construction of HCatalogDatasetRepository instances. |
Method Summary | ||
---|---|---|
<E> Dataset<E> | create(String name, DatasetDescriptor descriptor) | Create a Dataset with the supplied descriptor. |
boolean | delete(String name) | Delete the named Dataset. |
boolean | exists(String name) | Checks if there is a Dataset in this repository named name. |
Collection<String> | list() | List the names of the Datasets in this DatasetRepository. |
<E> Dataset<E> | load(String name) | Get the latest version of a named Dataset. |
<E> Dataset<E> | update(String name, DatasetDescriptor descriptor) | Update an existing Dataset to reflect the supplied descriptor. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public <E> Dataset<E> create(String name, DatasetDescriptor descriptor)
Description copied from: DatasetRepository
Create a Dataset with the supplied descriptor. Depending on the underlying dataset storage, some schema types or configurations may not be supported. If an illegal schema is supplied, an exception will be thrown by the implementing class. It is illegal to create more than one dataset with a given name. If a duplicate name is provided, an exception is thrown.
Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
public <E> Dataset<E> update(String name, DatasetDescriptor descriptor)
Description copied from: DatasetRepository
Update an existing Dataset to reflect the supplied descriptor. The common case is updating a dataset schema. Depending on the underlying dataset storage, some updates may not be supported, such as a change in format or partition strategy. Any attempt to make an unsupported or incompatible update will result in an exception being thrown and no change being made to the dataset.
Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
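The "exception thrown and no change made" guarantee for update is usually achieved by validating the proposed descriptor before mutating any state. The following is a minimal sketch under stated assumptions, not the Kite implementation: `UpdateSketchDescriptor` and `UpdateSketchRepository` are hypothetical names, the descriptor is reduced to a format name and a schema version, and the choice of `UnsupportedOperationException` is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily reduced descriptor: a format name and a schema version.
final class UpdateSketchDescriptor {
    final String format;
    final int schemaVersion;

    UpdateSketchDescriptor(String format, int schemaVersion) {
        this.format = format;
        this.schemaVersion = schemaVersion;
    }
}

// Hypothetical in-memory model of the update contract.
class UpdateSketchRepository {
    private final Map<String, UpdateSketchDescriptor> descriptors = new HashMap<>();

    void put(String name, UpdateSketchDescriptor descriptor) {
        descriptors.put(name, descriptor);
    }

    UpdateSketchDescriptor get(String name) {
        return descriptors.get(name);
    }

    // Validate first, mutate last: a rejected update leaves the stored
    // descriptor exactly as it was.
    UpdateSketchDescriptor update(String name, UpdateSketchDescriptor proposed) {
        UpdateSketchDescriptor current = descriptors.get(name);
        if (current == null) {
            throw new IllegalArgumentException("No such dataset: " + name);
        }
        if (!current.format.equals(proposed.format)) {
            // Assumption: a format change is treated as an unsupported update.
            throw new UnsupportedOperationException(
                "Cannot change format from " + current.format + " to " + proposed.format);
        }
        descriptors.put(name, proposed);
        return proposed;
    }
}
```

Because the only write happens after every check has passed, a caller that catches the exception can keep using the previously stored descriptor without any rollback step.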
public <E> Dataset<E> load(String name)
Description copied from: DatasetRepository
Get the latest version of a named Dataset. If no dataset with the provided name exists, a DatasetNotFoundException is thrown.
Parameters:
name - The name of the dataset.
public boolean delete(String name)
Description copied from: DatasetRepository
Delete the named Dataset. If no dataset with the provided name exists, a DatasetNotFoundException is thrown.
Parameters:
name - The name of the dataset.
Returns:
true if the dataset was successfully deleted, false if the dataset does not exist.
public boolean exists(String name)
Description copied from: DatasetRepository
Checks if there is a Dataset in this repository named name.
Parameters:
name - a Dataset name to check the existence of
Returns:
true if a Dataset named name exists, false otherwise
public Collection<String> list()
Description copied from: DatasetRepository
List the names of the Datasets in this DatasetRepository. If there is not at least one Dataset in this repository, an empty list will be returned.
Returns:
a Collection of Dataset names (Strings)
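Because list() is documented to return an empty collection rather than null, callers can iterate over the result without a null check, and exists(String) can guard a later load. The following is a minimal sketch of that behavior, not the Kite implementation: `ListingSketchRepository` and its `add` helper are hypothetical, and returning a sorted, unmodifiable copy is an assumption of the sketch.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the exists/list contract.
class ListingSketchRepository {
    private final Set<String> names = new TreeSet<>();

    // Hypothetical helper that registers a dataset name.
    void add(String name) {
        names.add(name);
    }

    boolean exists(String name) {
        return names.contains(name);
    }

    // Never returns null: an empty repository yields an empty collection.
    // Assumption: callers receive an unmodifiable snapshot, so later changes
    // to the repository do not affect a collection already handed out.
    Collection<String> list() {
        return Collections.unmodifiableSet(new TreeSet<>(names));
    }

    public static void main(String[] args) {
        ListingSketchRepository repo = new ListingSketchRepository();
        System.out.println("empty repository lists " + repo.list().size() + " datasets");
        repo.add("users");
        for (String name : repo.list()) {
            System.out.println("dataset: " + name);
        }
    }
}
```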