org.kitesdk.data.filesystem
Class FileSystemDatasetRepository
java.lang.Object
org.kitesdk.data.spi.AbstractDatasetRepository
org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
org.kitesdk.data.filesystem.FileSystemDatasetRepository
- All Implemented Interfaces:
- DatasetRepository, org.kitesdk.data.spi.TemporaryDatasetRepositoryAccessor
public class FileSystemDatasetRepository
- extends org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
A DatasetRepository that stores data in a Hadoop FileSystem.
Given a FileSystem, a root directory, and a MetadataProvider, this
DatasetRepository loads and stores Datasets on both local filesystems and the
Hadoop Distributed File System (HDFS). You can instantiate this class directly
with the three dependencies above, then perform dataset-related operations
using any of the provided methods. The primary methods of interest are
AbstractDatasetRepository.create(String, org.kitesdk.data.DatasetDescriptor),
AbstractDatasetRepository.load(String), and
FileSystemDatasetRepository.delete(String), which create a new dataset, load
an existing dataset, and delete an existing dataset, respectively. Once you
create or load a dataset, you can invoke the appropriate Dataset methods to
get a reader or writer as needed.
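The create/load/delete life cycle described above can be pictured with a small
in-memory stand-in. This is only a sketch and not Kite code: the class
ToyDatasetRepository, its use of a plain String in place of a
DatasetDescriptor, and the exceptions it throws are all hypothetical and chosen
for illustration. The real class persists datasets and metadata through the
FileSystem and MetadataProvider it is given.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a dataset repository (not the Kite API). */
final class ToyDatasetRepository {
    // Maps a dataset name to its descriptor; a String stands in for DatasetDescriptor.
    private final Map<String, String> datasets = new HashMap<>();

    /** Registers a new dataset. Rejecting a duplicate name is an assumption of this sketch. */
    String create(String name, String descriptor) {
        if (datasets.containsKey(name)) {
            throw new IllegalStateException("Dataset already exists: " + name);
        }
        datasets.put(name, descriptor);
        return descriptor;
    }

    /** Returns an existing dataset. Failing on an unknown name is an assumption of this sketch. */
    String load(String name) {
        String descriptor = datasets.get(name);
        if (descriptor == null) {
            throw new IllegalArgumentException("No such dataset: " + name);
        }
        return descriptor;
    }

    /** Removes a dataset and reports whether anything was actually removed. */
    boolean delete(String name) {
        return datasets.remove(name) != null;
    }
}
```

In this model, a caller creates a dataset once, loads it by name whenever a
reader or writer is needed, and deletes it by name when it is no longer wanted.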
DatasetWriter instances returned from this implementation have the following
flush() method semantics. For Avro files, flush() invokes HDFS hflush, which
guarantees that client buffers are flushed, so new readers see all entries
written up to that point. For Parquet files, flush() has no effect.
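The difference between the two flush() behaviours can be illustrated with a toy
model. The sketch below is not Kite code: ToyFile, ToyWriter, AvroLikeWriter,
and ParquetLikeWriter are hypothetical names, and the assumption that the
Parquet-like writer makes its records visible on close() is made here only so
the example has something to compare against.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a file: readers only see records that have been published. */
final class ToyFile {
    private final List<String> visible = new ArrayList<>();

    void publish(List<String> records) {
        visible.addAll(records);
    }

    /** Returns a snapshot of what a newly opened reader would see. */
    List<String> read() {
        return new ArrayList<>(visible);
    }
}

/** Hypothetical writer contract for this sketch; not the Kite DatasetWriter interface. */
interface ToyWriter {
    void write(String record);
    void flush();
    void close();
}

/** Models the Avro case: flush() pushes buffered records so new readers see them. */
final class AvroLikeWriter implements ToyWriter {
    private final ToyFile file;
    private final List<String> buffer = new ArrayList<>();

    AvroLikeWriter(ToyFile file) { this.file = file; }

    public void write(String record) { buffer.add(record); }

    public void flush() {
        file.publish(buffer);
        buffer.clear();
    }

    public void close() { flush(); }
}

/** Models the Parquet case: flush() has no effect. */
final class ParquetLikeWriter implements ToyWriter {
    private final ToyFile file;
    private final List<String> buffer = new ArrayList<>();

    ParquetLikeWriter(ToyFile file) { this.file = file; }

    public void write(String record) { buffer.add(record); }

    public void flush() {
        // Intentionally empty: flushing does not make records visible.
    }

    // Assumption of this sketch: buffered records become visible on close().
    public void close() {
        file.publish(buffer);
        buffer.clear();
    }
}
```

A caller that needs readers to see records while a writer is still open can
rely on flush() only in the Avro-like case; in the Parquet-like case the
records stay invisible to new readers after flush() returns.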
- See Also:
- DatasetRepository, Dataset, DatasetDescriptor, PartitionStrategy
Methods inherited from class org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
create, delete, exists, getTemporaryRepository, getUri, list, load, partitionKeyForPath, toString, update
Methods inherited from class org.kitesdk.data.spi.AbstractDatasetRepository
create, load, update
Copyright © 2013–2014. All rights reserved.