org.kitesdk.data
Interface DatasetRepository

All Known Subinterfaces:
RandomAccessDatasetRepository
All Known Implementing Classes:
org.kitesdk.data.spi.AbstractDatasetRepository, FileSystemDatasetRepository, HCatalogDatasetRepository

@Immutable
public interface DatasetRepository

A logical repository (storage system) of Datasets.

Implementations of DatasetRepository are storage systems that contain zero or more Datasets. A repository acts as both a factory and a registry of datasets. Users can create a new Dataset with a name and schema by calling create(String, DatasetDescriptor), or retrieve a handle to an existing dataset by name by calling load(String). While not expressly forbidden, most repositories are expected to support only a single concrete Dataset implementation.

No guarantees are made as to the durability, reliability, or availability of the underlying storage. That is, a DatasetRepository could be on disk, in memory, or some combination. See the implementation class for details about the guarantees it provides.

Implementations of DatasetRepository are immutable.
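
The factory and registry roles described above can be illustrated with a small in-memory sketch. It does not use the real Kite classes: SketchDataset and InMemoryRepositorySketch are hypothetical stand-ins, the descriptor is reduced to a plain schema string, and standard JDK exceptions stand in for the Kite exception types.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a dataset handle; not a Kite class.
final class SketchDataset {
    final String name;
    final String schema;

    SketchDataset(String name, String schema) {
        this.name = name;
        this.schema = schema;
    }
}

// Hypothetical in-memory repository that mirrors the documented method names.
final class InMemoryRepositorySketch {
    private final Map<String, SketchDataset> datasets = new LinkedHashMap<>();

    // Factory role: build a dataset and register it under its name.
    SketchDataset create(String name, String schema) {
        if (datasets.containsKey(name)) {
            // Stand-in for the documented "dataset already exists" failure.
            throw new IllegalStateException("Dataset already exists: " + name);
        }
        SketchDataset dataset = new SketchDataset(name, schema);
        datasets.put(name, dataset);
        return dataset;
    }

    // Registry role: return the handle registered under the name.
    SketchDataset load(String name) {
        SketchDataset dataset = datasets.get(name);
        if (dataset == null) {
            // Stand-in for the documented "dataset not found" failure.
            throw new NoSuchElementException("No dataset named " + name);
        }
        return dataset;
    }

    boolean exists(String name) {
        return datasets.containsKey(name);
    }

    // Returns a copy of the names; empty when nothing is registered.
    Collection<String> list() {
        return new ArrayList<>(datasets.keySet());
    }
}
```

A real implementation would accept a DatasetDescriptor rather than a string and would persist its metadata somewhere; the sketch only shows how the four calls relate to one another.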

See Also:
Dataset, DatasetDescriptor

Method Summary
<E> Dataset<E>
create(String name, DatasetDescriptor descriptor)
          Create a Dataset with the supplied descriptor.
 boolean delete(String name)
          Delete the named Dataset.
 boolean exists(String name)
          Checks if there is a Dataset in this repository named name.
 Collection<String> list()
          List the names of the Datasets in this DatasetRepository.
<E> Dataset<E>
load(String name)
          Get the latest version of a named Dataset.
<E> Dataset<E>
update(String name, DatasetDescriptor descriptor)
          Update an existing Dataset to reflect the supplied descriptor.
 

Method Detail

load

<E> Dataset<E> load(String name)
Get the latest version of a named Dataset. If no dataset with the provided name exists, a DatasetNotFoundException is thrown.

Parameters:
name - The name of the dataset.
Throws:
DatasetNotFoundException - If there is no dataset named name
DatasetRepositoryException
Since:
0.7.0
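
Because load signals a missing dataset with an exception, callers that treat absence as a normal outcome may prefer to wrap the call. The following is a sketch under stated assumptions: SketchNotFoundException stands in for DatasetNotFoundException, LoadOnlyRepositorySketch is a hypothetical map-backed repository that returns a schema string instead of a Dataset, and tryLoad is an illustrative helper that is not part of the interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for DatasetNotFoundException.
final class SketchNotFoundException extends RuntimeException {
    SketchNotFoundException(String message) {
        super(message);
    }
}

// Hypothetical repository exposing only load, keyed by dataset name.
final class LoadOnlyRepositorySketch {
    private final Map<String, String> schemasByName = new HashMap<>();

    LoadOnlyRepositorySketch(Map<String, String> initial) {
        schemasByName.putAll(initial);
    }

    // Mirrors the documented behavior: a missing name raises an exception.
    String load(String name) {
        String schema = schemasByName.get(name);
        if (schema == null) {
            throw new SketchNotFoundException("No dataset named " + name);
        }
        return schema;
    }
}

final class LoadHelpers {
    private LoadHelpers() {}

    // Converts the not-found exception into an empty Optional.
    static Optional<String> tryLoad(LoadOnlyRepositorySketch repo, String name) {
        try {
            return Optional.of(repo.load(name));
        } catch (SketchNotFoundException e) {
            return Optional.empty();
        }
    }
}
```

An alternative is to call exists(String) before load(String), at the cost of a second round trip and a possible race if another client deletes the dataset in between.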

create

<E> Dataset<E> create(String name,
                      DatasetDescriptor descriptor)
Create a Dataset with the supplied descriptor. Depending on the underlying dataset storage, some schema types or configurations may not be supported. If an illegal schema is supplied, an exception will be thrown by the implementing class. It is illegal to create more than one dataset with a given name. If a duplicate name is provided, an exception is thrown.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
Returns:
The newly created dataset
Throws:
IllegalArgumentException - If name or descriptor is null
DatasetExistsException - If a Dataset named name already exists.
ConcurrentSchemaModificationException - If the Dataset schema is updated concurrently.
IncompatibleSchemaException - If the schema is not compatible with existing datasets with shared storage (e.g. in the same HBase table).
DatasetRepositoryException
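
The argument and duplicate-name checks listed above could be ordered as in the sketch below. It is not taken from any Kite implementation: CreateSketch is a hypothetical class, the descriptor is reduced to a string, and SketchExistsException stands in for DatasetExistsException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DatasetExistsException.
final class SketchExistsException extends RuntimeException {
    SketchExistsException(String message) {
        super(message);
    }
}

// Hypothetical repository showing only the checks made by create.
final class CreateSketch {
    private final Map<String, String> descriptorsByName = new HashMap<>();

    String create(String name, String descriptor) {
        // Reject null arguments before touching any state.
        if (name == null || descriptor == null) {
            throw new IllegalArgumentException("name and descriptor must not be null");
        }
        // putIfAbsent leaves the map unchanged and returns the existing
        // value when the name is already taken.
        if (descriptorsByName.putIfAbsent(name, descriptor) != null) {
            throw new SketchExistsException("Dataset already exists: " + name);
        }
        return descriptor;
    }

    boolean exists(String name) {
        return descriptorsByName.containsKey(name);
    }

    String descriptorOf(String name) {
        return descriptorsByName.get(name);
    }
}
```

Checking and registering in one step means a rejected duplicate never overwrites the original descriptor.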

update

<E> Dataset<E> update(String name,
                      DatasetDescriptor descriptor)
Update an existing Dataset to reflect the supplied descriptor. The common case is updating a dataset schema. Depending on the underlying dataset storage, some updates may not be supported, such as a change in format or partition strategy. Any attempt to make an unsupported or incompatible update will result in an exception being thrown and no change being made to the dataset.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
Returns:
The updated dataset
Throws:
IllegalArgumentException - If name is null
DatasetNotFoundException - If there is no dataset named name
UnsupportedOperationException - If descriptor updates are not supported by the implementation.
ConcurrentSchemaModificationException - If the Dataset schema is updated concurrently.
IncompatibleSchemaException - If the schema is not compatible with previous schemas, or with existing datasets with shared storage (e.g. in the same HBase table).
DatasetRepositoryException
Since:
0.3.0
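
The guarantee that a rejected update leaves the dataset unchanged can be met by validating before mutating. The sketch below assumes a deliberately simple compatibility rule (the new schema must keep every existing field), which is an illustration only and not Kite's actual schema-compatibility check. UpdateSketch and SketchIncompatibleSchemaException are hypothetical names, and NoSuchElementException stands in for DatasetNotFoundException.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical stand-in for IncompatibleSchemaException.
final class SketchIncompatibleSchemaException extends RuntimeException {
    SketchIncompatibleSchemaException(String message) {
        super(message);
    }
}

// Hypothetical repository that models a schema as a set of field names.
final class UpdateSketch {
    private final Map<String, Set<String>> fieldsByName = new HashMap<>();

    void register(String name, Set<String> fields) {
        fieldsByName.put(name, new HashSet<>(fields));
    }

    // Read-only view of the current fields of an existing dataset.
    Set<String> fieldsOf(String name) {
        return Collections.unmodifiableSet(fieldsByName.get(name));
    }

    Set<String> update(String name, Set<String> newFields) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        Set<String> current = fieldsByName.get(name);
        if (current == null) {
            throw new NoSuchElementException("No dataset named " + name);
        }
        // Validate first so that a rejected update changes nothing.
        if (!newFields.containsAll(current)) {
            throw new SketchIncompatibleSchemaException(
                "Update would drop existing fields of " + name);
        }
        fieldsByName.put(name, new HashSet<>(newFields));
        return fieldsOf(name);
    }
}
```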

delete

boolean delete(String name)
Delete the named Dataset. If no dataset with the provided name exists, a DatasetNotFoundException is thrown.

Parameters:
name - The name of the dataset.
Returns:
true if the dataset was successfully deleted, false if the dataset does not exist.
Throws:
IllegalArgumentException - If name is null
DatasetNotFoundException - If the Dataset location cannot be determined because no metadata exists.
ConcurrentSchemaModificationException - If the Dataset schema is updated concurrently.
DatasetRepositoryException
Since:
0.7.0
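
The description above mentions an exception for a missing dataset, while the Returns entry mentions a false result. The sketch below assumes the false-return behavior and is not taken from any Kite implementation; DeleteSketch is a hypothetical class with descriptors reduced to strings. Callers of a real repository may want to handle both outcomes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical repository showing only the boolean contract of delete.
final class DeleteSketch {
    private final Map<String, String> descriptorsByName = new HashMap<>();

    void register(String name, String descriptor) {
        descriptorsByName.put(name, descriptor);
    }

    boolean exists(String name) {
        return descriptorsByName.containsKey(name);
    }

    // Returns true when something was removed, false when the name was
    // unknown (assumed behavior; see the note above this block).
    boolean delete(String name) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        return descriptorsByName.remove(name) != null;
    }
}
```

Under this assumption a second delete of the same name returns false rather than failing, so repeated calls are safe.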

exists

boolean exists(String name)
Checks if there is a Dataset in this repository named name.

Parameters:
name - a Dataset name to check the existence of
Returns:
true if a Dataset named name exists, false otherwise
Throws:
IllegalArgumentException - If name is null
DatasetRepositoryException
Since:
0.7.0

list

Collection<String> list()
List the names of the Datasets in this DatasetRepository. If the repository contains no Datasets, an empty collection is returned.

Returns:
a Collection of Dataset names (Strings)
Throws:
DatasetRepositoryException
Since:
0.7.0


Copyright © 2013–2014. All rights reserved.