FileSystemDatasetRepository (Kite Development Kit 0.11.0 API)

org.kitesdk.data.filesystem
Class FileSystemDatasetRepository

java.lang.Object
  org.kitesdk.data.spi.AbstractDatasetRepository
      org.kitesdk.data.filesystem.FileSystemDatasetRepository

All Implemented Interfaces: DatasetRepository

public class FileSystemDatasetRepository
extends org.kitesdk.data.spi.AbstractDatasetRepository

A DatasetRepository that stores data in a Hadoop FileSystem.

Given a FileSystem, a root directory, and a MetadataProvider, this DatasetRepository implementation can load and store Datasets on both local filesystems and the Hadoop Distributed FileSystem (HDFS). Users may directly instantiate this class with the three dependencies above and then perform dataset-related operations using any of the provided methods. The primary methods of interest are create(String, org.kitesdk.data.DatasetDescriptor), load(String), and delete(String), which create a new dataset, load an existing dataset, and delete an existing dataset, respectively. Once a dataset has been created or loaded, users can invoke the appropriate Dataset methods to get a reader or writer as needed.
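The create, load, and delete lifecycle described above can be sketched with a small in-memory stand-in. This is not Kite code: InMemoryRepositorySketch is a hypothetical class, a plain String stands in for DatasetDescriptor and Dataset, and the choice to throw on a duplicate create or a missing load is an assumption made for illustration only. Only the method names and the boolean return types of delete and exists are taken from the method summary on this page.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a dataset repository; NOT part of the Kite API.
final class InMemoryRepositorySketch {
    // Dataset name -> descriptor text (a String stands in for DatasetDescriptor).
    private final Map<String, String> descriptors = new LinkedHashMap<String, String>();

    // Mirrors create(String, DatasetDescriptor). Rejecting duplicates is an assumption.
    String create(String name, String descriptor) {
        if (descriptors.containsKey(name)) {
            throw new IllegalStateException("Dataset already exists: " + name);
        }
        descriptors.put(name, descriptor);
        return descriptor;
    }

    // Mirrors load(String). Failing on an unknown name is an assumption.
    String load(String name) {
        String descriptor = descriptors.get(name);
        if (descriptor == null) {
            throw new IllegalStateException("No such dataset: " + name);
        }
        return descriptor;
    }

    // Mirrors delete(String): reports whether anything was removed.
    boolean delete(String name) {
        return descriptors.remove(name) != null;
    }

    // Mirrors exists(String).
    boolean exists(String name) {
        return descriptors.containsKey(name);
    }

    // Mirrors list(): returns a copy so callers cannot mutate internal state.
    Collection<String> list() {
        return new ArrayList<String>(descriptors.keySet());
    }
}
```

A caller would typically check exists(name) before create, hold on to the value returned by create or load, and treat a false result from delete as "nothing was there to remove".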

DatasetWriter instances returned from this implementation have the following flush() method semantics. For Avro files, flush() will invoke HDFS hflush, which guarantees that client buffers are flushed, so new readers will see all entries written up to that point. For Parquet files, flush() has no effect.
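The difference between the two formats can be pictured with a toy model. The sketch below is hypothetical and does not use Kite or Hadoop classes: FlushSemanticsSketch, SketchWriter, and visibleToNewReaders are invented names, and the two in-memory lists merely stand in for client buffers and for data that a newly opened reader could see.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the flush() semantics described above; NOT Kite code.
final class FlushSemanticsSketch {
    enum Format { AVRO, PARQUET }

    static final class SketchWriter {
        private final Format format;
        // Records written but still held in the client-side buffer.
        private final List<String> buffered = new ArrayList<String>();
        // Records that a newly opened reader would be able to see.
        private final List<String> visible = new ArrayList<String>();

        SketchWriter(Format format) {
            this.format = format;
        }

        void write(String record) {
            buffered.add(record);
        }

        // Avro: push buffered records out so new readers see them (models hflush).
        // Parquet: deliberately does nothing, matching "flush() has no effect".
        void flush() {
            if (format == Format.AVRO) {
                visible.addAll(buffered);
                buffered.clear();
            }
        }

        // Snapshot of what a reader opened right now would observe.
        List<String> visibleToNewReaders() {
            return new ArrayList<String>(visible);
        }
    }
}
```

In this model, two writes followed by flush() leave two visible records for an AVRO writer and none for a PARQUET writer.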

See Also: DatasetRepository, Dataset, DatasetDescriptor, PartitionStrategy, MetadataProvider

Nested Class Summary
`static class`	`FileSystemDatasetRepository.Builder` A fluent builder to aid in the construction of `FileSystemDatasetRepository` instances.

Constructor Summary
`FileSystemDatasetRepository(Configuration conf, MetadataProvider metadataProvider)` Construct a `FileSystemDatasetRepository` that uses the given `MetadataProvider` for metadata storage.

Method Summary

<E> Dataset<E> create(String name, DatasetDescriptor descriptor)
Create a Dataset with the supplied descriptor.

boolean delete(String name)
Delete the named Dataset.

boolean exists(String name)
Checks if there is a Dataset in this repository named name.

MetadataProvider getMetadataProvider()

Collection<String> list()
List the names of the Datasets in this DatasetRepository.

<E> Dataset<E> load(String name)
Get the latest version of a named Dataset.

static PartitionKey partitionKeyForPath(Dataset dataset, URI partitionPath)
Get a PartitionKey corresponding to a partition's filesystem path represented as a URI.

String toString()

<E> Dataset<E> update(String name, DatasetDescriptor descriptor)
Update an existing Dataset to reflect the supplied descriptor.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Constructor Detail

FileSystemDatasetRepository

public FileSystemDatasetRepository(Configuration conf,
                                   MetadataProvider metadataProvider)

Construct a FileSystemDatasetRepository that uses the given MetadataProvider for metadata storage.

Parameters: conf - a Configuration for FileSystem access; metadataProvider - the provider for metadata storage
Since: 0.8.0

Method Detail