org.kitesdk.data.filesystem
Class FileSystemDatasetRepository

java.lang.Object
  extended by org.kitesdk.data.spi.AbstractDatasetRepository
      extended by org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
          extended by org.kitesdk.data.filesystem.FileSystemDatasetRepository
All Implemented Interfaces:
DatasetRepository, org.kitesdk.data.spi.TemporaryDatasetRepositoryAccessor

public class FileSystemDatasetRepository
extends org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository

A DatasetRepository that stores data in a Hadoop FileSystem.

Given a FileSystem, a root directory, and a MetadataProvider, this repository loads and stores Datasets on both local filesystems and the Hadoop Distributed FileSystem (HDFS). You can instantiate this class directly with those three dependencies and then perform dataset operations using the provided methods. The primary methods of interest are AbstractDatasetRepository.create(String, org.kitesdk.data.DatasetDescriptor), AbstractDatasetRepository.load(String), and FileSystemDatasetRepository.delete(String), which create a new dataset, load an existing dataset, and delete an existing dataset, respectively. Once you have created or loaded a dataset, call the appropriate Dataset methods to obtain a reader or writer.
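The create/load/delete lifecycle described above can be illustrated with a hypothetical, stdlib-only toy model. It is not the Kite API and uses neither FileSystem nor MetadataProvider. It assumes, purely for illustration, that each dataset is a directory named after the dataset under the root directory, and that deleting a missing dataset returns false rather than failing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Hypothetical toy model of a filesystem-backed repository (NOT the Kite API).
 * Assumption: one directory per dataset, directly under the root directory.
 */
final class ToyFileSystemRepository {
    private final Path root;

    public ToyFileSystemRepository(Path root) {
        this.root = root;
    }

    /** Creates the dataset directory; fails if the dataset already exists. */
    public Path create(String name) throws IOException {
        Path dir = root.resolve(name);
        if (Files.exists(dir)) {
            throw new IllegalStateException("Dataset already exists: " + name);
        }
        return Files.createDirectories(dir);
    }

    /** Returns the directory of an existing dataset; fails if it is missing. */
    public Path load(String name) {
        Path dir = root.resolve(name);
        if (!Files.isDirectory(dir)) {
            throw new IllegalStateException("No such dataset: " + name);
        }
        return dir;
    }

    /** Recursively deletes the dataset; returns false if it did not exist. */
    public boolean delete(String name) throws IOException {
        Path dir = root.resolve(name);
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse path order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        ToyFileSystemRepository repo =
            new ToyFileSystemRepository(Files.createTempDirectory("repo"));
        Path users = repo.create("users");
        Files.writeString(users.resolve("part-0.avro"), "placeholder");
        System.out.println(repo.load("users").equals(users));
        System.out.println(repo.delete("users"));
        System.out.println(repo.delete("users"));
    }
}
```

In the real class, the MetadataProvider also records each dataset's descriptor; the toy deliberately omits that so the directory lifecycle stays visible.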

DatasetWriter instances returned by this implementation have the following flush() semantics. For Avro files, flush() invokes HDFS hflush, which guarantees that client buffers are flushed, so that new readers see all entries written up to that point. For Parquet files, flush() has no effect.
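The flush() contract above can be sketched as a small in-memory model. ModelWriter, Format, and visibleToNewReaders() are hypothetical names; no Kite or HDFS calls are made. Treating close() as the point at which Parquet records become visible is an assumption of the model, motivated by Parquet writing its file footer only when a file is closed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of the documented flush() semantics (no Kite/HDFS calls).
 * "visible" stands in for what a newly opened reader would see.
 */
final class FlushSemanticsSketch {
    enum Format { AVRO, PARQUET }

    static final class ModelWriter {
        private final Format format;
        private final List<String> buffered = new ArrayList<>();
        private final List<String> visible = new ArrayList<>();

        ModelWriter(Format format) {
            this.format = format;
        }

        void write(String record) {
            buffered.add(record);
        }

        void flush() {
            if (format == Format.AVRO) {
                // Models hflush: buffered records become visible to new readers.
                publish();
            }
            // PARQUET: flush() is a no-op, as the documentation states.
        }

        void close() {
            // Assumption of this model: closing publishes everything written so far.
            publish();
        }

        List<String> visibleToNewReaders() {
            return Collections.unmodifiableList(visible);
        }

        private void publish() {
            visible.addAll(buffered);
            buffered.clear();
        }
    }

    public static void main(String[] args) {
        for (Format f : Format.values()) {
            ModelWriter w = new ModelWriter(f);
            w.write("a");
            w.write("b");
            w.flush();
            System.out.println(f + " after flush: " + w.visibleToNewReaders());
            w.close();
            System.out.println(f + " after close: " + w.visibleToNewReaders());
        }
    }
}
```

Running main shows the Avro writer exposing both records after flush(), while the Parquet writer exposes nothing until close().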

See Also:
DatasetRepository, Dataset, DatasetDescriptor, PartitionStrategy

Nested Class Summary
static class FileSystemDatasetRepository.Builder
          A fluent builder to aid in the construction of FileSystemDatasetRepository instances.
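The fluent-builder pattern named above can be illustrated with a minimal, hypothetical sketch. RepositoryConfig, its setters, and its defaults are invented for illustration and are not the methods of FileSystemDatasetRepository.Builder; the sketch only shows chained setters that return the builder and a build() step that validates required settings.

```java
import java.util.Objects;

/**
 * Hypothetical illustration of the fluent-builder pattern. The names and
 * defaults below are invented; they are not FileSystemDatasetRepository.Builder.
 */
final class RepositoryConfig {
    final String fileSystem;       // assumed default: "file" (local filesystem)
    final String rootDirectory;    // required; build() fails without it
    final String metadataProvider; // assumed default: "filesystem"

    private RepositoryConfig(Builder b) {
        this.fileSystem = b.fileSystem;
        this.rootDirectory = b.rootDirectory;
        this.metadataProvider = b.metadataProvider;
    }

    static final class Builder {
        private String fileSystem = "file";
        private String rootDirectory;
        private String metadataProvider = "filesystem";

        // Each setter returns this, so calls can be chained fluently.
        Builder fileSystem(String fileSystem) {
            this.fileSystem = Objects.requireNonNull(fileSystem);
            return this;
        }

        Builder rootDirectory(String rootDirectory) {
            this.rootDirectory = Objects.requireNonNull(rootDirectory);
            return this;
        }

        Builder metadataProvider(String metadataProvider) {
            this.metadataProvider = Objects.requireNonNull(metadataProvider);
            return this;
        }

        // Validates required settings once, at the end of the chain.
        RepositoryConfig build() {
            if (rootDirectory == null) {
                throw new IllegalStateException("rootDirectory is required");
            }
            return new RepositoryConfig(this);
        }
    }

    public static void main(String[] args) {
        RepositoryConfig config = new RepositoryConfig.Builder()
            .fileSystem("hdfs")
            .rootDirectory("/data")
            .build();
        System.out.println(config.fileSystem + " " + config.rootDirectory
            + " " + config.metadataProvider);
    }
}
```

Deferring validation to build() lets callers set options in any order while still rejecting an incomplete configuration.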
 
Method Summary
Methods inherited from class org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
create, delete, exists, getTemporaryRepository, getUri, list, load, partitionKeyForPath, toString, update
 
Methods inherited from class org.kitesdk.data.spi.AbstractDatasetRepository
create, load, update
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Copyright © 2013–2014. All rights reserved.