org.kitesdk.data.hcatalog
Class HCatalogDatasetRepository

java.lang.Object
  extended by org.kitesdk.data.spi.AbstractDatasetRepository
      extended by org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
          extended by org.kitesdk.data.hcatalog.HCatalogDatasetRepository
All Implemented Interfaces:
DatasetRepository

public class HCatalogDatasetRepository
extends org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository

A DatasetRepository that uses the Hive/HCatalog metastore for metadata, and stores data in a Hadoop FileSystem.

The location of the data directory is either chosen by Hive/HCatalog (so-called "managed tables") or specified when creating an instance of this class by providing a FileSystem and a root directory in the constructor ("external tables").

The primary methods of interest are create(String, DatasetDescriptor), FileSystemDatasetRepository.load(String), and delete(String), which create a new dataset, load an existing dataset, and delete an existing dataset, respectively. Once a dataset has been created or loaded, users can invoke the appropriate Dataset methods to get a reader or writer as needed.

See Also:
DatasetRepository, Dataset

Nested Class Summary
static class HCatalogDatasetRepository.Builder
          A fluent builder to aid in the construction of HCatalogDatasetRepository instances.
 
Field Summary
 
Fields inherited from class org.kitesdk.data.spi.AbstractDatasetRepository
REPOSITORY_URI_PROPERTY_NAME
 
Method Summary
<E> Dataset<E>
create(String name, DatasetDescriptor descriptor)
          Create a Dataset with the supplied descriptor.
 boolean delete(String name)
          Delete data for the Dataset named name and remove its DatasetDescriptor from the underlying metadata provider.
 
Methods inherited from class org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
exists, getUri, list, load, partitionKeyForPath, toString, update
 
Methods inherited from class org.kitesdk.data.spi.AbstractDatasetRepository
addRepositoryUri
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Method Detail

create

public <E> Dataset<E> create(String name,
                             DatasetDescriptor descriptor)
Description copied from interface: DatasetRepository
Create a Dataset with the supplied descriptor. Depending on the underlying dataset storage, some schema types or configurations might not be supported. If you supply an illegal schema, the implementing class throws an exception. It is illegal to create more than one dataset with the same name. If you provide a duplicate name, the implementing class throws an exception.

Specified by:
create in interface DatasetRepository
Overrides:
create in class org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
Returns:
The newly created dataset
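
The two rules in the description above (an illegal schema is rejected, and a name can be used only once) can be illustrated with a small sketch. This is a hypothetical in-memory stand-in, not a Kite SDK class: CreateContractSketch, the use of a plain String in place of DatasetDescriptor and Dataset, and the exception types IllegalArgumentException and IllegalStateException are all assumptions made for illustration, since the documentation only says that the implementing class throws an exception.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the create() contract; not a Kite SDK class.
class CreateContractSketch {
    // Maps dataset name to schema text, standing in for the metastore.
    private final Map<String, String> schemas = new HashMap<>();

    // Registers a dataset, rejecting empty schemas and duplicate names.
    // Returns the name as a placeholder for the newly created dataset.
    String create(String name, String schema) {
        if (schema == null || schema.trim().isEmpty()) {
            // Assumed exception type; the real type is not stated in the source.
            throw new IllegalArgumentException("Illegal schema for dataset: " + name);
        }
        if (schemas.containsKey(name)) {
            // Assumed exception type; the real type is not stated in the source.
            throw new IllegalStateException("Dataset already exists: " + name);
        }
        schemas.put(name, schema);
        return name;
    }

    // Reports whether a dataset with this name has been created.
    boolean exists(String name) {
        return schemas.containsKey(name);
    }

    public static void main(String[] args) {
        CreateContractSketch repo = new CreateContractSketch();
        repo.create("users", "{\"type\": \"string\"}");
        try {
            // Second create with the same name must fail.
            repo.create("users", "{\"type\": \"string\"}");
        } catch (IllegalStateException e) {
            System.out.println("Rejected duplicate: " + e.getMessage());
        }
    }
}
```

Callers that cannot be sure a name is unused may prefer to check exists(String) first, or to catch the exception and fall back to loading the existing dataset.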

delete

public boolean delete(String name)
Description copied from interface: DatasetRepository
Delete data for the Dataset named name and remove its DatasetDescriptor from the underlying metadata provider. After this method is called, there is no Dataset with the given name, unless an exception is thrown. If either data or metadata is removed, this method returns true. If there is no Dataset corresponding to the given name, this method makes no changes and returns false.

Specified by:
delete in interface DatasetRepository
Overrides:
delete in class org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository
Parameters:
name - The name of the dataset to delete.
Returns:
true if any data or metadata is removed, false if no action is taken.
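
The return-value rule above can be sketched as follows. This is a hypothetical in-memory model, not a Kite SDK class: DeleteContractSketch and its two sets (one standing in for the metastore, one for the data directories) are assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of the delete() contract; not a Kite SDK class.
class DeleteContractSketch {
    // Names with a descriptor in the (simulated) metadata provider.
    private final Set<String> metadata = new HashSet<>();
    // Names with a (simulated) data directory.
    private final Set<String> data = new HashSet<>();

    // Registers both metadata and data for a dataset name.
    void create(String name) {
        metadata.add(name);
        data.add(name);
    }

    // Removes metadata and data; true if either one was actually removed.
    boolean delete(String name) {
        boolean removedMetadata = metadata.remove(name);
        boolean removedData = data.remove(name);
        return removedMetadata || removedData;
    }

    // Reports whether metadata exists for this name.
    boolean exists(String name) {
        return metadata.contains(name);
    }

    public static void main(String[] args) {
        DeleteContractSketch repo = new DeleteContractSketch();
        repo.create("events");
        System.out.println(repo.delete("events")); // something was removed
        System.out.println(repo.delete("events")); // nothing left to remove
    }
}
```

Because false means only that nothing was removed, a caller can treat delete as safe to repeat: a second call on the same name makes no changes.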


Copyright © 2013–2014. All rights reserved.