org.kitesdk.data.filesystem
Class FileSystemDatasetRepository
java.lang.Object
org.kitesdk.data.spi.AbstractDatasetRepository
org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
org.kitesdk.data.filesystem.FileSystemDatasetRepository
- All Implemented Interfaces:
- DatasetRepository
public class FileSystemDatasetRepository
- extends org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
A DatasetRepository that stores data in a Hadoop
FileSystem.
Given a FileSystem, a root directory, and a
MetadataProvider,
this repository loads and stores
Datasets on both local filesystems and the Hadoop
Distributed FileSystem (HDFS). You can instantiate this class directly with
the three dependencies above, then perform dataset-related operations using
any of the provided methods. The primary methods of interest are
FileSystemDatasetRepository.create(String, org.kitesdk.data.DatasetDescriptor),
FileSystemDatasetRepository.load(String), and
FileSystemDatasetRepository.delete(String), which create a new dataset, load an existing
dataset, or delete an existing dataset, respectively. Once you create or load
a dataset, you can invoke the appropriate Dataset
methods to get a reader or writer as needed.
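The create, load, and delete lifecycle described above can be sketched with a small in-memory model. This is not Kite code: SketchRepository and SketchDataset are hypothetical stand-ins, the schema is reduced to a plain label, and the behavior on a duplicate or unknown name is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the create/load/delete lifecycle; not Kite code. */
public class RepositoryLifecycleSketch {

    /** Stand-in for a dataset: just a name and a schema label. */
    static final class SketchDataset {
        final String name;
        final String schema;

        SketchDataset(String name, String schema) {
            this.name = name;
            this.schema = schema;
        }
    }

    /** Stand-in repository keyed by dataset name. */
    static final class SketchRepository {
        private final Map<String, SketchDataset> datasets = new HashMap<>();

        /** Registers a new dataset; assumed to reject a name that is already taken. */
        SketchDataset create(String name, String schema) {
            if (datasets.containsKey(name)) {
                throw new IllegalStateException("Dataset already exists: " + name);
            }
            SketchDataset dataset = new SketchDataset(name, schema);
            datasets.put(name, dataset);
            return dataset;
        }

        /** Returns an existing dataset; assumed to fail when the name is unknown. */
        SketchDataset load(String name) {
            SketchDataset dataset = datasets.get(name);
            if (dataset == null) {
                throw new IllegalArgumentException("No such dataset: " + name);
            }
            return dataset;
        }

        /** Removes a dataset; returns true only if something was deleted. */
        boolean delete(String name) {
            return datasets.remove(name) != null;
        }

        /** Reports whether a dataset with this name is registered. */
        boolean exists(String name) {
            return datasets.containsKey(name);
        }
    }

    public static void main(String[] args) {
        SketchRepository repo = new SketchRepository();
        repo.create("users", "User");
        System.out.println(repo.load("users").schema);
        System.out.println(repo.delete("users"));
        System.out.println(repo.exists("users"));
    }
}
```

In the real class, the dataset returned by create or load is the object from which readers and writers are obtained; the stand-in omits that step.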
DatasetWriter instances returned from this
implementation have the following flush() semantics:
For Avro files, flush() invokes HDFS hflush,
which guarantees that client buffers are flushed, so new readers see all
entries written up to that point. For Parquet files, flush()
has no effect.
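The difference between the two formats can be illustrated with a minimal model. None of these types come from Kite or Hadoop: SharedFile, AvroLikeWriter, and ParquetLikeWriter are hypothetical names, and the assumption that the Parquet-style writer publishes its records only on close() is added for illustration and is not stated above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the flush() semantics described above; not Kite code. */
public class FlushSemanticsSketch {

    /** Holds the records that a newly opened reader would observe. */
    static final class SharedFile {
        private final List<String> visible = new ArrayList<>();

        List<String> readAll() {
            return new ArrayList<>(visible);
        }

        void publish(List<String> records) {
            visible.addAll(records);
        }
    }

    interface SketchWriter {
        void write(String record);
        void flush();
        void close();
    }

    /** Models the Avro case: flush() pushes buffered records out to readers. */
    static final class AvroLikeWriter implements SketchWriter {
        private final SharedFile file;
        private final List<String> buffer = new ArrayList<>();

        AvroLikeWriter(SharedFile file) {
            this.file = file;
        }

        public void write(String record) {
            buffer.add(record);
        }

        public void flush() {
            file.publish(buffer);
            buffer.clear();
        }

        public void close() {
            flush();
        }
    }

    /** Models the Parquet case: flush() does nothing; close() publishes (assumption). */
    static final class ParquetLikeWriter implements SketchWriter {
        private final SharedFile file;
        private final List<String> buffer = new ArrayList<>();

        ParquetLikeWriter(SharedFile file) {
            this.file = file;
        }

        public void write(String record) {
            buffer.add(record);
        }

        public void flush() {
            // Intentionally empty: flush() has no effect for this format.
        }

        public void close() {
            file.publish(buffer);
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        SharedFile avroFile = new SharedFile();
        SketchWriter avro = new AvroLikeWriter(avroFile);
        avro.write("a");
        avro.flush();
        System.out.println("Avro-like, visible after flush: " + avroFile.readAll().size());

        SharedFile parquetFile = new SharedFile();
        SketchWriter parquet = new ParquetLikeWriter(parquetFile);
        parquet.write("a");
        parquet.flush();
        System.out.println("Parquet-like, visible after flush: " + parquetFile.readAll().size());
    }
}
```

A caller that needs readers to see records before the writer is closed would, under this model, have to use the Avro-style writer.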
- See Also:
DatasetRepository,
Dataset,
DatasetDescriptor,
PartitionStrategy
Fields inherited from class org.kitesdk.data.spi.AbstractDatasetRepository:
REPOSITORY_URI_PROPERTY_NAME
Methods inherited from class org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository:
create, delete, exists, getUri, list, load, partitionKeyForPath, toString, update
Methods inherited from class org.kitesdk.data.spi.AbstractDatasetRepository:
addRepositoryUri
Copyright © 2013–2014. All rights reserved.