FileSystemDatasetRepository (Kite Development Kit 0.13.0 API)

org.kitesdk.data.filesystem
Class FileSystemDatasetRepository

java.lang.Object
  org.kitesdk.data.spi.AbstractDatasetRepository
      org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
          org.kitesdk.data.filesystem.FileSystemDatasetRepository

All Implemented Interfaces: DatasetRepository

public class FileSystemDatasetRepository
extends org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository

A DatasetRepository that stores data in a Hadoop FileSystem.

Given a FileSystem, a root directory, and a MetadataProvider, this DatasetRepository implementation loads and stores Datasets on both local filesystems and the Hadoop Distributed FileSystem (HDFS). You can instantiate this class directly with the three dependencies above, then perform dataset-related operations using any of the provided methods. The primary methods of interest are FileSystemDatasetRepository.create(String, org.kitesdk.data.DatasetDescriptor), FileSystemDatasetRepository.load(String), and FileSystemDatasetRepository.delete(String), which create a new dataset, load an existing dataset, and delete an existing dataset, respectively. Once you have created or loaded a dataset, you can invoke the appropriate Dataset methods to get a reader or writer as needed.
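
The create, load, and delete lifecycle described above can be illustrated without a Hadoop or Kite dependency. The following is a minimal sketch only: `DirectoryRepositorySketch` is a hypothetical stand-in written for this example, not the Kite class, and it assumes that each dataset maps to one directory under the root, that creating an existing name fails, and that loading a missing name fails. It stores no descriptor or metadata, and the real repository's behavior may differ in all of these respects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in (NOT the Kite class): one directory per dataset under a root. */
final class DirectoryRepositorySketch {
    private final Path root;

    DirectoryRepositorySketch(Path root) {
        this.root = root;
    }

    /** Convenience factory for the example: uses a fresh temporary root directory. */
    static DirectoryRepositorySketch inTempDirectory() {
        try {
            return new DirectoryRepositorySketch(Files.createTempDirectory("repo-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates the dataset directory; rejects a name that is already in use (assumed behavior). */
    Path create(String name) {
        Path dir = root.resolve(name);
        if (Files.exists(dir)) {
            throw new IllegalStateException("Dataset already exists: " + name);
        }
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the dataset directory; rejects an unknown name (assumed behavior). */
    Path load(String name) {
        Path dir = root.resolve(name);
        if (!Files.isDirectory(dir)) {
            throw new IllegalArgumentException("No such dataset: " + name);
        }
        return dir;
    }

    /** Deletes an empty dataset directory; returns false if nothing was deleted. */
    boolean delete(String name) {
        try {
            return Files.deleteIfExists(root.resolve(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean exists(String name) {
        return Files.isDirectory(root.resolve(name));
    }

    public static void main(String[] args) {
        DirectoryRepositorySketch repo = inTempDirectory();
        repo.create("events");
        System.out.println("exists after create: " + repo.exists("events"));
        System.out.println("deleted: " + repo.delete("events"));
        System.out.println("exists after delete: " + repo.exists("events"));
    }
}
```

The sketch keeps the three operations keyed by dataset name, mirroring the String parameter of the methods listed above; everything else about it is an illustrative simplification.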

DatasetWriter instances returned from this implementation have the following flush() method semantics. For Avro files, flush() invokes HDFS hflush, which guarantees that client buffers are flushed, so new readers see all entries written up to that point. For Parquet files, flush() has no effect.
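
The visibility guarantee described for Avro files can be pictured with a local-file analogy. The sketch below uses only `java.io` and `java.nio.file`, not HDFS hflush or a Kite DatasetWriter, and the class name `FlushVisibilitySketch` is hypothetical. It assumes the usual JDK behavior in which a small write stays in the writer's in-memory buffer until `flush()` is called, so the file size observed by another reader changes only after the flush.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Local-file analogy for flush visibility; not the Kite or HDFS implementation. */
final class FlushVisibilitySketch {

    /** Returns {bytes visible on disk before flush, bytes visible after flush}. */
    static long[] visibleBytesBeforeAndAfterFlush() {
        try {
            Path file = Files.createTempFile("flush-sketch", ".txt");
            try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                writer.write("record-1\n");      // held in the client-side buffer
                long before = Files.size(file);  // what a new reader could see now
                writer.flush();                  // push buffered characters to the file
                long after = Files.size(file);   // what a new reader can see after flush
                return new long[] {before, after};
            } finally {
                Files.deleteIfExists(file);      // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = visibleBytesBeforeAndAfterFlush();
        System.out.println("visible before flush: " + sizes[0] + " bytes");
        System.out.println("visible after flush:  " + sizes[1] + " bytes");
    }
}
```

By the same analogy, a writer whose `flush()` has no effect, as stated above for Parquet files, would leave readers unable to see buffered entries until the writer is closed.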

See Also: DatasetRepository, Dataset, DatasetDescriptor, PartitionStrategy

Nested Class Summary
`static class`	`FileSystemDatasetRepository.Builder` A fluent builder to aid in the construction of `FileSystemDatasetRepository` instances.

Field Summary

Fields inherited from class org.kitesdk.data.spi.AbstractDatasetRepository
`REPOSITORY_URI_PROPERTY_NAME`

Method Summary

Methods inherited from class org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
`create, delete, exists, getUri, list, load, partitionKeyForPath, toString, update`

Methods inherited from class org.kitesdk.data.spi.AbstractDatasetRepository
`addRepositoryUri`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`