org.kitesdk.data.hcatalog
Class HCatalogDatasetRepository

java.lang.Object
  extended by org.kitesdk.data.spi.AbstractDatasetRepository
      extended by org.kitesdk.data.hcatalog.HCatalogDatasetRepository
All Implemented Interfaces:
DatasetRepository

public class HCatalogDatasetRepository
extends org.kitesdk.data.spi.AbstractDatasetRepository

A DatasetRepository that uses the Hive/HCatalog metastore for metadata, and stores data in a Hadoop FileSystem.

The location of the data directory is either chosen by Hive/HCatalog (so-called "managed tables") or specified when creating an instance of this class by providing a FileSystem and a root directory in the constructor ("external tables").
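The two cases can be illustrated with a hypothetical model of the decision. This is not Kite code. When a root directory is supplied, the table is external and, as an assumption, its data lives in a subdirectory named after the dataset. Otherwise the table is managed and Hive/HCatalog picks the location. The names `TableLocation`, `Kind`, and `resolve` are illustrative, and `java.nio.file.Path` stands in for a Hadoop `Path`.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical model of choosing a data location; not the Kite implementation.
final class TableLocation {
  enum Kind { MANAGED, EXTERNAL }

  final Kind kind;
  // Empty for managed tables: Hive/HCatalog chooses the directory itself.
  final Optional<Path> dataDirectory;

  private TableLocation(Kind kind, Optional<Path> dataDirectory) {
    this.kind = kind;
    this.dataDirectory = dataDirectory;
  }

  // Assumption: an external dataset is stored in <root>/<datasetName>.
  static TableLocation resolve(Optional<Path> rootDirectory, String datasetName) {
    if (rootDirectory.isPresent()) {
      return new TableLocation(Kind.EXTERNAL,
          Optional.of(rootDirectory.get().resolve(datasetName)));
    }
    return new TableLocation(Kind.MANAGED, Optional.<Path>empty());
  }
}
```

Passing an empty `Optional` models constructing the repository without a root directory.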

The primary methods of interest are create(String, DatasetDescriptor), load(String), and delete(String), which create a new dataset, load an existing dataset, and delete an existing dataset, respectively. Once a dataset has been created or loaded, users can call the appropriate Dataset methods to obtain a reader or writer as needed.
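The create/load/delete lifecycle can be sketched with a hypothetical in-memory stand-in. This is not the Kite API. `InMemoryRepository` is illustrative, a descriptor is reduced to a schema string, and `IllegalStateException` and `NoSuchElementException` stand in for the exceptions the real repository throws. `delete` follows the documented return value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical in-memory stand-in for a dataset repository, not Kite code.
final class InMemoryRepository {
  private final Map<String, String> descriptors = new HashMap<>();

  // Create: rejects a name that is already taken.
  String create(String name, String descriptor) {
    if (descriptors.containsKey(name)) {
      throw new IllegalStateException("Dataset already exists: " + name);
    }
    descriptors.put(name, descriptor);
    return descriptor;
  }

  // Load: fails when no dataset has this name.
  String load(String name) {
    String descriptor = descriptors.get(name);
    if (descriptor == null) {
      throw new NoSuchElementException("Dataset not found: " + name);
    }
    return descriptor;
  }

  // Delete: true if a dataset was removed, false if none existed.
  boolean delete(String name) {
    return descriptors.remove(name) != null;
  }
}
```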

See Also:
DatasetRepository, Dataset

Nested Class Summary
static class HCatalogDatasetRepository.Builder
          A fluent builder to aid in the construction of HCatalogDatasetRepository instances.
 
Method Summary
 <E> Dataset<E> create(String name, DatasetDescriptor descriptor)
          Create a Dataset with the supplied descriptor.
 boolean delete(String name)
          Delete the named Dataset.
 boolean exists(String name)
          Checks whether this repository contains a Dataset with the given name.
 Collection<String> list()
          List the names of the Datasets in this DatasetRepository.
 <E> Dataset<E> load(String name)
          Get the latest version of a named Dataset.
 <E> Dataset<E> update(String name, DatasetDescriptor descriptor)
          Update an existing Dataset to reflect the supplied descriptor.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

create

public <E> Dataset<E> create(String name,
                             DatasetDescriptor descriptor)
Description copied from interface: DatasetRepository
Create a Dataset with the supplied descriptor. Depending on the underlying dataset storage, some schema types or configurations may not be supported. If an illegal schema is supplied, the implementing class throws an exception. Creating more than one dataset with a given name is illegal; if a duplicate name is provided, an exception is thrown.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
Returns:
The newly created dataset
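The two failure modes of create, an unsupported schema and a duplicate name, can be sketched as checks that run before any state is kept. This is a hypothetical model, not Kite code. `GuardedCreate` is illustrative, the `schemaSupported` predicate stands in for whatever the storage actually accepts, and the metastore is the real arbiter of name uniqueness. `ConcurrentMap.putIfAbsent` makes the duplicate check atomic, so only one of two concurrent creators can succeed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

// Hypothetical model of the create contract; not the Kite implementation.
final class GuardedCreate {
  private final ConcurrentMap<String, String> datasets = new ConcurrentHashMap<>();
  // Stand-in for the storage's notion of a supported schema.
  private final Predicate<String> schemaSupported;

  GuardedCreate(Predicate<String> schemaSupported) {
    this.schemaSupported = schemaSupported;
  }

  String create(String name, String schema) {
    // Reject unsupported schemas before touching shared state.
    if (!schemaSupported.test(schema)) {
      throw new IllegalArgumentException("Unsupported schema for " + name);
    }
    // putIfAbsent is atomic: it returns the existing value if the name is taken.
    if (datasets.putIfAbsent(name, schema) != null) {
      throw new IllegalStateException("Dataset already exists: " + name);
    }
    return schema;
  }
}
```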

update

public <E> Dataset<E> update(String name,
                             DatasetDescriptor descriptor)
Description copied from interface: DatasetRepository
Update an existing Dataset to reflect the supplied descriptor. The most common case is updating a dataset's schema. Depending on the underlying dataset storage, some updates, such as a change in format or partition strategy, may not be supported. Any unsupported or incompatible update results in an exception, and the dataset is left unchanged.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
Returns:
The updated dataset
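The guarantee that a rejected update changes nothing follows from validating everything before mutating anything. The sketch below is a hypothetical model, not Kite code. `Descriptor` keeps only the three properties the text mentions, the format and partition-strategy strings are illustrative, and the assumption that both are immutable comes from the examples above ("may not be supported"). Schema changes pass through unchecked; real schema evolution has its own compatibility rules that are not modeled here.

```java
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical descriptor holding only the properties the update contract mentions.
final class Descriptor {
  final String schema;
  final String format;
  final String partitionStrategy;

  Descriptor(String schema, String format, String partitionStrategy) {
    this.schema = schema;
    this.format = format;
    this.partitionStrategy = partitionStrategy;
  }
}

final class GuardedUpdate {
  // Validate first, mutate last, so a rejected update leaves the store untouched.
  static Descriptor update(Map<String, Descriptor> store, String name, Descriptor proposed) {
    Descriptor current = store.get(name);
    if (current == null) {
      throw new NoSuchElementException("Dataset not found: " + name);
    }
    if (!current.format.equals(proposed.format)) {
      throw new IllegalArgumentException(
          "Format change not supported: " + current.format + " -> " + proposed.format);
    }
    if (!current.partitionStrategy.equals(proposed.partitionStrategy)) {
      throw new IllegalArgumentException("Partition strategy change not supported");
    }
    // Schema changes are accepted as-is in this model.
    store.put(name, proposed);
    return proposed;
  }
}
```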

load

public <E> Dataset<E> load(String name)
Description copied from interface: DatasetRepository
Get the latest version of a named Dataset. If no dataset with the provided name exists, a DatasetNotFoundException is thrown.

Parameters:
name - The name of the dataset.
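Because load signals a missing dataset with an exception, a caller that treats absence as normal can convert that one failure into an empty `Optional`. With the real API you would catch `DatasetNotFoundException`. In this hypothetical, JDK-only sketch, a `Function` stands in for `load` and `NoSuchElementException` stands in for the not-found exception. `LoadIfPresent` is an illustrative name. Catching the exception avoids the race of calling `exists` first and then `load`.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.function.Function;

final class LoadIfPresent {
  // Turns a "not found" failure into an empty Optional; other failures still propagate.
  static <T> Optional<T> loadIfPresent(Function<String, T> load, String name) {
    try {
      return Optional.ofNullable(load.apply(name));
    } catch (NoSuchElementException notFound) {
      return Optional.empty();
    }
  }
}
```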

delete

public boolean delete(String name)
Description copied from interface: DatasetRepository
Delete the named Dataset. If no dataset with the provided name exists, a DatasetNotFoundException is thrown.

Parameters:
name - The name of the dataset.
Returns:
true if the dataset was successfully deleted, false if the dataset does not exist.
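The description above says a missing dataset causes a DatasetNotFoundException, while the return value documents false for the same case. A caller that wants to be correct under either reading can ask `exists` first and call `delete` only when the dataset is present. In this hypothetical sketch, two `Predicate`s stand in for the repository's `exists` and `delete` methods, and `DeleteIfExists` is an illustrative name.

```java
import java.util.function.Predicate;

final class DeleteIfExists {
  // Calls delete only when exists reports the dataset; && short-circuits otherwise.
  static boolean deleteIfExists(Predicate<String> exists, Predicate<String> delete, String name) {
    return exists.test(name) && delete.test(name);
  }
}
```

This is a check-then-act sequence. If another client removes the dataset between the two calls, `delete` can still observe a missing dataset.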

exists

public boolean exists(String name)
Description copied from interface: DatasetRepository
Checks whether this repository contains a Dataset with the given name.

Parameters:
name - the name of the Dataset to look for
Returns:
true if a Dataset with the given name exists, false otherwise
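A common use of `exists` is a create-or-load step that reuses a dataset if it is already there. In this hypothetical, JDK-only sketch, functional parameters stand in for the repository's `exists`, `load`, and `create` methods, and a type parameter `D` stands in for `DatasetDescriptor`. `CreateOrLoad` is an illustrative name.

```java
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

final class CreateOrLoad {
  // Loads the dataset if it exists; otherwise creates it from the descriptor.
  static <D, T> T createOrLoad(Predicate<String> exists, Function<String, T> load,
                               BiFunction<String, D, T> create, String name, D descriptor) {
    return exists.test(name) ? load.apply(name) : create.apply(name, descriptor);
  }
}
```

When the existing dataset is reused, the supplied descriptor is ignored. Like any check-then-act sequence, this can race with a concurrent creator. An alternative is to attempt `create` and fall back to `load` when the duplicate-name exception is thrown.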

list

public Collection<String> list()
Description copied from interface: DatasetRepository
List the names of the Datasets in this DatasetRepository. If the repository contains no Datasets, an empty collection is returned.

Returns:
a Collection of Dataset names (Strings)
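Because an empty repository yields an empty collection rather than null, callers can iterate the result without a null check. The sketch below filters names by prefix. It is a hypothetical helper: `ListByPrefix` is an illustrative name, and the `names` argument stands in for the collection returned by `list()`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class ListByPrefix {
  // No null check needed: the documented contract returns an empty collection instead.
  static List<String> withPrefix(Collection<String> names, String prefix) {
    List<String> matches = new ArrayList<>();
    for (String name : names) {
      if (name.startsWith(prefix)) {
        matches.add(name);
      }
    }
    return matches;
  }
}
```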


Copyright © 2013–2014. All rights reserved.