java.lang.Object
org.kitesdk.data.spi.AbstractDatasetRepository
org.kitesdk.data.filesystem.FileSystemDatasetRepository
public class FileSystemDatasetRepository
A DatasetRepository that stores data in a Hadoop FileSystem.
Given a FileSystem, a root directory, and a MetadataProvider,
this DatasetRepository implementation can load and store
Datasets on both local filesystems and the Hadoop Distributed
FileSystem (HDFS). Users may directly instantiate this class with the three
dependencies above and then perform dataset-related operations using any of
the provided methods. The primary methods of interest will be
create(String, org.kitesdk.data.DatasetDescriptor),
load(String), and
delete(String) which create a new dataset, load an existing
dataset, or delete an existing dataset, respectively. Once a dataset has been created
or loaded, users can invoke the appropriate Dataset methods to get a reader
or writer as needed.
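The create, load, and delete contract described above can be illustrated with a small in-memory stand-in. This is only a sketch: `SketchRepository` and its plain-String "descriptor" are hypothetical names invented for illustration and are not Kite classes. The real repository works against a Hadoop FileSystem and documents Kite-specific exceptions (such as DatasetNotFoundException), for which the sketch substitutes standard Java exceptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical in-memory stand-in that mirrors the documented contract.
// It is NOT the Kite implementation and touches no filesystem.
class SketchRepository {
    // Dataset name -> descriptor (a plain String stands in for DatasetDescriptor).
    private final Map<String, String> datasets = new LinkedHashMap<>();

    // Create a dataset; a duplicate name is rejected with an exception.
    String create(String name, String descriptor) {
        if (datasets.containsKey(name)) {
            throw new IllegalStateException("Dataset already exists: " + name);
        }
        datasets.put(name, descriptor);
        return descriptor;
    }

    // Load an existing dataset; an unknown name raises an exception.
    // NoSuchElementException stands in for Kite's DatasetNotFoundException.
    String load(String name) {
        String descriptor = datasets.get(name);
        if (descriptor == null) {
            throw new NoSuchElementException("No such dataset: " + name);
        }
        return descriptor;
    }

    boolean exists(String name) {
        return datasets.containsKey(name);
    }

    // Follows the documented return value: true if removed, false if unknown.
    boolean delete(String name) {
        return datasets.remove(name) != null;
    }

    // An empty repository yields an empty collection, never null.
    Collection<String> list() {
        return new ArrayList<>(datasets.keySet());
    }

    public static void main(String[] args) {
        SketchRepository repo = new SketchRepository();
        repo.create("events", "avro-schema-v1");
        System.out.println("datasets: " + repo.list());
        System.out.println("loaded:   " + repo.load("events"));
        System.out.println("deleted:  " + repo.delete("events"));
        System.out.println("exists:   " + repo.exists("events"));
    }
}
```

Checking exists(name) before create or delete is one way for callers to avoid relying on exception handling for the common cases.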
DatasetWriter instances returned from this
implementation have the following flush() method semantics. For Avro
files, flush() will invoke HDFS hflush,
which guarantees that client buffers are flushed, so new readers will see all
entries written up to that point. For Parquet files, flush() has no
effect.
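The difference between the two flush() behaviours can be sketched with a toy writer. `BufferedSketchWriter` is a hypothetical class, not part of Kite or Hadoop: it models only the visibility rule described above, using in-memory lists in place of HDFS client buffers. That close() makes all records visible is an assumption of the sketch, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a writer whose records become visible on flush().
class BufferedSketchWriter {
    private final List<String> clientBuffer = new ArrayList<>(); // written, not yet visible
    private final List<String> visible = new ArrayList<>();      // what a new reader would see
    private final boolean flushSupported;

    // flushSupported = true models the Avro case, false models the Parquet case.
    BufferedSketchWriter(boolean flushSupported) {
        this.flushSupported = flushSupported;
    }

    void write(String record) {
        clientBuffer.add(record);
    }

    // Avro-like: push buffered records out to readers. Parquet-like: no effect.
    void flush() {
        if (!flushSupported) {
            return;
        }
        visible.addAll(clientBuffer);
        clientBuffer.clear();
    }

    // Assumption of this sketch: closing makes everything written visible.
    void close() {
        visible.addAll(clientBuffer);
        clientBuffer.clear();
    }

    // Snapshot of what a reader opened right now would see.
    List<String> readerView() {
        return new ArrayList<>(visible);
    }

    public static void main(String[] args) {
        BufferedSketchWriter avroLike = new BufferedSketchWriter(true);
        avroLike.write("a");
        avroLike.flush();
        System.out.println("avro-like after flush:    " + avroLike.readerView());

        BufferedSketchWriter parquetLike = new BufferedSketchWriter(false);
        parquetLike.write("a");
        parquetLike.flush();
        System.out.println("parquet-like after flush: " + parquetLike.readerView());
    }
}
```

In practice this means code that needs readers to see partial output should not rely on flush() when the dataset format is Parquet.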
See Also:
DatasetRepository,
Dataset,
DatasetDescriptor,
PartitionStrategy,
MetadataProvider

| Nested Class Summary | |
|---|---|
| static class | FileSystemDatasetRepository.Builder - A fluent builder to aid in the construction of FileSystemDatasetRepository instances. |

| Constructor Summary | |
|---|---|
| FileSystemDatasetRepository(Configuration conf, MetadataProvider metadataProvider) | Construct a FileSystemDatasetRepository that uses the given MetadataProvider for metadata storage. |

| Method Summary | ||
|---|---|---|
| <E> Dataset<E> | create(String name, DatasetDescriptor descriptor) | Create a Dataset with the supplied descriptor. |
| boolean | delete(String name) | Delete the named Dataset. |
| boolean | exists(String name) | Checks if there is a Dataset in this repository named name. |
| MetadataProvider | getMetadataProvider() | |
| Collection<String> | list() | List the names of the Datasets in this DatasetRepository. |
| <E> Dataset<E> | load(String name) | Get the latest version of a named Dataset. |
| static PartitionKey | partitionKeyForPath(Dataset dataset, URI partitionPath) | Get a PartitionKey corresponding to a partition's filesystem path represented as a URI. |
| String | toString() | |
| <E> Dataset<E> | update(String name, DatasetDescriptor descriptor) | Update an existing Dataset to reflect the supplied descriptor. |

| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public FileSystemDatasetRepository(Configuration conf,
MetadataProvider metadataProvider)
Construct a FileSystemDatasetRepository for the given
MetadataProvider for metadata storage.
Parameters:
conf - a Configuration for FileSystem access
metadataProvider - the provider for metadata storage

| Method Detail |
|---|
public <E> Dataset<E> create(String name,
DatasetDescriptor descriptor)
Description copied from interface: DatasetRepository
Create a Dataset with the supplied descriptor. Depending on
the underlying dataset storage, some schema types or configurations may
not be supported. If an illegal schema is supplied, an exception will be
thrown by the implementing class. It is illegal to create more than one
dataset with a given name. If a duplicate name is provided, an exception is
thrown.
Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other
properties of the dataset

public <E> Dataset<E> update(String name,
DatasetDescriptor descriptor)
Description copied from interface: DatasetRepository
Update an existing Dataset to reflect the supplied descriptor. The
common case is updating a dataset schema. Depending on
the underlying dataset storage, some updates may not be supported,
such as a change in format or partition strategy.
Any attempt to make an unsupported or incompatible update will result in an
exception being thrown and no change being made to the dataset.
Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the
dataset
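
The "exception thrown and no change made" guarantee for update can be pictured as a validate-then-swap step. The following is a minimal sketch under stated assumptions: `SketchDescriptor` and `UpdateSketch` are hypothetical classes, the descriptor is reduced to a format name and a schema version, and the only rule checked is "the format may not change". The actual compatibility checks performed by Kite are not described on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified descriptor: just a format name and a schema version.
final class SketchDescriptor {
    final String format;
    final int schemaVersion;

    SketchDescriptor(String format, int schemaVersion) {
        this.format = format;
        this.schemaVersion = schemaVersion;
    }
}

// Hypothetical metadata store showing all-or-nothing updates.
class UpdateSketch {
    private final Map<String, SketchDescriptor> descriptors = new HashMap<>();

    void put(String name, SketchDescriptor descriptor) {
        descriptors.put(name, descriptor);
    }

    SketchDescriptor get(String name) {
        return descriptors.get(name);
    }

    // Validate the proposed descriptor first; store it only if every check passes,
    // so a rejected update leaves the existing descriptor untouched.
    SketchDescriptor update(String name, SketchDescriptor proposed) {
        SketchDescriptor current = descriptors.get(name);
        if (current == null) {
            throw new IllegalArgumentException("No such dataset: " + name);
        }
        if (!current.format.equals(proposed.format)) {
            throw new UnsupportedOperationException(
                "Format change not supported: " + current.format + " -> " + proposed.format);
        }
        descriptors.put(name, proposed);
        return proposed;
    }

    public static void main(String[] args) {
        UpdateSketch repo = new UpdateSketch();
        repo.put("events", new SketchDescriptor("avro", 1));
        repo.update("events", new SketchDescriptor("avro", 2)); // schema change: accepted
        try {
            repo.update("events", new SketchDescriptor("parquet", 3)); // format change: rejected
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("schema version is still " + repo.get("events").schemaVersion);
    }
}
```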
public <E> Dataset<E> load(String name)
Description copied from interface: DatasetRepository
Get the latest version of a named Dataset. If no dataset with the
provided name exists, a DatasetNotFoundException is thrown.
Parameters:
name - The name of the dataset.

public boolean delete(String name)
Description copied from interface: DatasetRepository
Delete the named Dataset. If no dataset with the
provided name exists, a DatasetNotFoundException is thrown.
Parameters:
name - The name of the dataset.
Returns:
true if the dataset was successfully deleted, false if the
dataset does not exist.

public boolean exists(String name)
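Description copied from interface: DatasetRepository
Checks if there is a Dataset in this repository named name.
Parameters:
name - a Dataset name to check the existence of
Returns:
true if a Dataset named name exists, false otherwise

public Collection<String> list()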
Description copied from interface: DatasetRepository
List the names of the Datasets in this DatasetRepository.
If the repository contains no Datasets, an empty
list is returned.
Returns:
a Collection of Dataset names (Strings)

public static PartitionKey partitionKeyForPath(Dataset dataset,
URI partitionPath)
Get a PartitionKey corresponding to a partition's filesystem path
represented as a URI. If the path is not a valid partition,
then IllegalArgumentException is thrown. Note that the partition does not
have to exist.
Parameters:
dataset - the filesystem dataset
partitionPath - a directory path where the partition data is stored
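
Why a partition need not exist becomes clearer if the mapping from path to key is treated as pure path arithmetic. The sketch below is an assumption-laden illustration, not the Kite algorithm: `PartitionPathSketch` is a hypothetical class, and the "field=value" directory naming is assumed for the example rather than taken from this page. It derives partition values from the part of the partition URI that lies under the dataset root and rejects anything else with IllegalArgumentException, without touching the filesystem.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionPathSketch {
    // Derive partition values from a partition directory under the dataset root.
    // Assumes (hypothetically) one directory level per partition field, each named
    // "field=value". Only URI paths are compared; scheme and authority are ignored.
    static List<String> valuesForPath(URI datasetRoot, URI partitionPath, List<String> fieldNames) {
        String rootPath = datasetRoot.normalize().getPath();
        String partPath = partitionPath.normalize().getPath();
        if (!rootPath.endsWith("/")) {
            rootPath = rootPath + "/";
        }
        if (!partPath.startsWith(rootPath)) {
            throw new IllegalArgumentException("Not under dataset root: " + partitionPath);
        }
        String[] dirs = partPath.substring(rootPath.length()).split("/");
        if (dirs.length > fieldNames.size()) {
            throw new IllegalArgumentException("Too many partition levels: " + partitionPath);
        }
        List<String> values = new ArrayList<>();
        for (int i = 0; i < dirs.length; i++) {
            String prefix = fieldNames.get(i) + "=";
            // Each level must name the expected field and carry a non-empty value.
            if (!dirs[i].startsWith(prefix) || dirs[i].length() == prefix.length()) {
                throw new IllegalArgumentException("Not a valid partition directory: " + dirs[i]);
            }
            values.add(dirs[i].substring(prefix.length()));
        }
        return values;
    }

    public static void main(String[] args) {
        URI root = URI.create("hdfs://namenode/data/events");
        URI partition = URI.create("hdfs://namenode/data/events/year=2014/month=3");
        // No filesystem access happens here, so the directory need not exist.
        System.out.println(valuesForPath(root, partition, Arrays.asList("year", "month")));
    }
}
```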
public String toString()
Overrides:
toString in class Object

public MetadataProvider getMetadataProvider()
Returns:
the MetadataProvider being used by this repository.