org.kitesdk.data
Interface DatasetRepository

All Known Subinterfaces:
RandomAccessDatasetRepository
All Known Implementing Classes:
org.kitesdk.data.spi.AbstractDatasetRepository, org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository, FileSystemDatasetRepository, HCatalogDatasetRepository

@Immutable
public interface DatasetRepository

A logical repository (storage system) of Datasets.

Implementations of DatasetRepository are storage systems that contain zero or more Datasets. A repository acts as both a factory and a registry of datasets: you can create a new Dataset with a name and schema using create(String, DatasetDescriptor), or retrieve a handle to an existing dataset by name using load(String). While not expressly forbidden, most repositories are expected to support only a single concrete Dataset implementation.

No guarantees are made as to the durability, reliability, or availability of the underlying storage. That is, a DatasetRepository could be on disk, in memory, or some combination. See the implementation class for details about the guarantees it provides.

Implementations of DatasetRepository are immutable.
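The following is a minimal sketch of the factory and registry roles, and of how an immutable repository object can still create datasets: the object holds only a final reference, and the datasets live in the backing store. The classes SimpleDataset and SimpleRepository are hypothetical, not part of the Kite SDK; a dataset is reduced to a name plus a schema string, and an in-memory Map stands in for storage that a real implementation might keep on disk, in memory, or some combination.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a concrete Dataset: a name plus a schema string.
final class SimpleDataset {
  final String name;
  final String schema;

  SimpleDataset(String name, String schema) {
    this.name = name;
    this.schema = schema;
  }
}

// Hypothetical repository. The object itself is immutable (one final field);
// every dataset lives in the backing store supplied at construction.
final class SimpleRepository {
  private final Map<String, SimpleDataset> store;

  SimpleRepository(Map<String, SimpleDataset> store) {
    this.store = Objects.requireNonNull(store);
  }

  // Factory role: build a new dataset and register it under its name.
  SimpleDataset create(String name, String schema) {
    SimpleDataset ds = new SimpleDataset(name, schema);
    if (store.putIfAbsent(name, ds) != null) {
      throw new IllegalStateException("Dataset already exists: " + name);
    }
    return ds;
  }

  // Registry role: return a handle to an existing dataset by name.
  SimpleDataset load(String name) {
    SimpleDataset ds = store.get(name);
    if (ds == null) {
      throw new IllegalStateException("No dataset named " + name);
    }
    return ds;
  }
}
```

Because the repository carries no mutable state of its own, two SimpleRepository objects built over the same store see the same datasets, which is one way to read the immutability guarantee above.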

See Also:
Dataset, DatasetDescriptor

Method Summary
<E> Dataset<E>
create(String name, DatasetDescriptor descriptor)
          Create a Dataset with the supplied descriptor.
<E> Dataset<E>
create(String name, DatasetDescriptor descriptor, Class<E> type)
          Create a Dataset with the supplied descriptor.
 boolean delete(String name)
          Delete data for the Dataset named name and remove its DatasetDescriptor from the underlying metadata provider.
 boolean exists(String name)
          Checks if there is a Dataset in this repository named name.
 URI getUri()
          Return the URI of this repository.
 Collection<String> list()
          List the names of the Datasets in this DatasetRepository.
<E> Dataset<E>
load(String name)
          Get the latest version of a named Dataset.
<E> Dataset<E>
load(String name, Class<E> type)
          Get the latest version of a named Dataset.
<E> Dataset<E>
update(String name, DatasetDescriptor descriptor)
          Update an existing Dataset to reflect the supplied descriptor.
<E> Dataset<E>
update(String name, DatasetDescriptor descriptor, Class<E> type)
          Update an existing Dataset to reflect the supplied descriptor.
 

Method Detail

load

<E> Dataset<E> load(String name)
Get the latest version of a named Dataset. If no dataset with the provided name exists, a DatasetNotFoundException is thrown.

Parameters:
name - The name of the dataset.
Throws:
DatasetNotFoundException - if there is no dataset named name
DatasetRepositoryException
Since:
0.7.0

load

<E> Dataset<E> load(String name,
                    Class<E> type)
Get the latest version of a named Dataset. If no dataset with the provided name exists, a DatasetNotFoundException is thrown.

Parameters:
name - The name of the dataset.
type - the Java type of entities in the dataset
Throws:
DatasetNotFoundException - if there is no dataset named name
DatasetRepositoryException
Since:
0.15.0
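One way to picture what the Class&lt;E&gt; argument buys the caller is sketched below, assuming the repository records each dataset's entity class and checks it against the requested type once, at load time, so the returned handle is already typed. TypedRepository and its methods are hypothetical; this page does not describe how Kite performs the check, and a real repository would throw DatasetNotFoundException rather than IllegalArgumentException for a missing name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical repository that remembers the entity class of each dataset.
final class TypedRepository {
  private final Map<String, Class<?>> entityTypes = new HashMap<>();
  private final Map<String, List<Object>> data = new HashMap<>();

  <E> void create(String name, Class<E> type, List<E> entities) {
    entityTypes.put(name, type);
    data.put(name, new ArrayList<Object>(entities));
  }

  // Typed load: reject an incompatible type up front, then cast each entity.
  <E> List<E> load(String name, Class<E> type) {
    Class<?> stored = entityTypes.get(name);
    if (stored == null) {
      // Stand-in for DatasetNotFoundException in this sketch.
      throw new IllegalArgumentException("No dataset named " + name);
    }
    if (!type.isAssignableFrom(stored)) {
      throw new ClassCastException(
          name + " holds " + stored.getName() + ", not " + type.getName());
    }
    List<E> view = new ArrayList<>();
    for (Object entity : data.get(name)) {
      view.add(type.cast(entity)); // cannot fail: checked above
    }
    return Collections.unmodifiableList(view);
  }
}
```

Requesting a supertype (for example CharSequence for a dataset of String) succeeds, while an unrelated type fails before any entity is read.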

create

<E> Dataset<E> create(String name,
                      DatasetDescriptor descriptor)
Create a Dataset with the supplied descriptor. Depending on the underlying dataset storage, some schema types or configurations might not be supported; if you supply an illegal schema, the implementing class throws an exception. Dataset names must be unique within a repository: if a dataset with the given name already exists, the implementing class throws DatasetExistsException.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
Returns:
The newly created dataset
Throws:
IllegalArgumentException - if name or descriptor is null
DatasetExistsException - if a Dataset named name already exists.
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently.
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table).
DatasetRepositoryException
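The checks that create performs can be sketched as follows: null arguments are rejected first, then the uniqueness rule is enforced. CreatingRepository is hypothetical, descriptors are reduced to strings, and DatasetExistsException here is a local stand-in that only mirrors the name used on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for the exception named in the Throws list; not Kite's class.
class DatasetExistsException extends RuntimeException {
  DatasetExistsException(String message) {
    super(message);
  }
}

// Hypothetical repository showing argument validation and name uniqueness.
final class CreatingRepository {
  private final Map<String, String> descriptors = new HashMap<>();

  void create(String name, String descriptor) {
    if (name == null || descriptor == null) {
      throw new IllegalArgumentException("name and descriptor must not be null");
    }
    // putIfAbsent checks and registers in one call; a non-null result means
    // the name was already taken, so the existing entry is left untouched.
    if (descriptors.putIfAbsent(name, descriptor) != null) {
      throw new DatasetExistsException("Dataset already exists: " + name);
    }
  }

  String descriptorOf(String name) {
    return descriptors.get(name);
  }
}
```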

create

<E> Dataset<E> create(String name,
                      DatasetDescriptor descriptor,
                      Class<E> type)
Create a Dataset with the supplied descriptor. Depending on the underlying dataset storage, some schema types or configurations might not be supported; if you supply an illegal schema, the implementing class throws an exception. Dataset names must be unique within a repository: if a dataset with the given name already exists, the implementing class throws DatasetExistsException.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
type - the Java type of entities in the dataset
Returns:
The newly created dataset
Throws:
IllegalArgumentException - if name or descriptor is null
DatasetExistsException - if a Dataset named name already exists.
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently.
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table).
DatasetRepositoryException
Since:
0.15.0

update

<E> Dataset<E> update(String name,
                      DatasetDescriptor descriptor)
Update an existing Dataset to reflect the supplied descriptor. The most common use is updating a dataset's schema. Depending on the underlying dataset storage, some updates, such as a change in format or partition strategy, might not be supported. An unsupported or incompatible update throws an exception and leaves the dataset unchanged.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
Returns:
The updated dataset
Throws:
IllegalArgumentException - if name is null
DatasetNotFoundException - if there is no dataset named name
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table).
DatasetRepositoryException
Since:
0.3.0
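The all-or-nothing behavior described for update can be sketched by running every check against the current descriptor before the single write that commits the change. Descriptor and UpdatingRepository are hypothetical, and both rules are deliberately crude assumptions: the format may not change, and a new schema must keep every existing field (this is not Avro's actual schema-resolution logic). The exception types are illustrative choices.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical descriptor: a storage format plus a set of field names.
final class Descriptor {
  final String format;
  final Set<String> fields;

  Descriptor(String format, Set<String> fields) {
    this.format = format;
    this.fields = Collections.unmodifiableSet(new HashSet<>(fields));
  }
}

final class UpdatingRepository {
  private final Map<String, Descriptor> descriptors = new HashMap<>();

  void create(String name, Descriptor descriptor) {
    descriptors.put(name, descriptor);
  }

  Descriptor describe(String name) {
    return descriptors.get(name);
  }

  Descriptor update(String name, Descriptor proposed) {
    Descriptor current = descriptors.get(name);
    if (current == null) {
      throw new IllegalArgumentException("No dataset named " + name);
    }
    // Validate everything first; nothing has been modified if a check fails.
    if (!current.format.equals(proposed.format)) {
      throw new UnsupportedOperationException("Changing format is not supported");
    }
    if (!proposed.fields.containsAll(current.fields)) {
      throw new IllegalArgumentException("Update would drop existing fields");
    }
    descriptors.put(name, proposed); // single write commits the update
    return proposed;
  }
}
```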

update

<E> Dataset<E> update(String name,
                      DatasetDescriptor descriptor,
                      Class<E> type)
Update an existing Dataset to reflect the supplied descriptor. The most common use is updating a dataset's schema. Depending on the underlying dataset storage, some updates, such as a change in format or partition strategy, might not be supported. An unsupported or incompatible update throws an exception and leaves the dataset unchanged.

Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
type - the Java type of entities in the dataset
Returns:
The updated dataset
Throws:
IllegalArgumentException - if name is null
DatasetNotFoundException - if there is no dataset named name
UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table).
DatasetRepositoryException
Since:
0.15.0

delete

boolean delete(String name)
Delete data for the Dataset named name and remove its DatasetDescriptor from the underlying metadata provider. Unless an exception is thrown, no Dataset with the given name exists after this method returns. If any data or metadata is removed, this method returns true. If there is no Dataset corresponding to the given name, this method makes no changes and returns false.

Parameters:
name - The name of the dataset to delete.
Returns:
true if any data or metadata is removed, false if no action is taken.
Throws:
IllegalArgumentException - if name is null
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently.
DatasetRepositoryException
Since:
0.7.0
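The boolean contract of delete can be sketched with data and metadata held in two separate stores, returning true if either removal changed anything. DeletingRepository is hypothetical, with in-memory maps standing in for the storage and the metadata provider.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical repository with separate data and metadata stores.
final class DeletingRepository {
  private final Map<String, String> metadata = new HashMap<>(); // name -> descriptor
  private final Map<String, byte[]> data = new HashMap<>();     // name -> stored bytes

  void create(String name, String descriptor) {
    metadata.put(name, descriptor);
    data.put(name, new byte[0]);
  }

  boolean exists(String name) {
    return metadata.containsKey(name);
  }

  boolean delete(String name) {
    if (name == null) {
      throw new IllegalArgumentException("name must not be null");
    }
    // Two separate statements, not one "||" expression: short-circuiting
    // would skip the metadata removal whenever data was removed.
    boolean removedData = data.remove(name) != null;
    boolean removedMetadata = metadata.remove(name) != null;
    return removedData || removedMetadata;
  }
}
```

A second call for the same name finds nothing in either store, so it returns false without making changes.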

exists

boolean exists(String name)
Checks if there is a Dataset in this repository named name.

Parameters:
name - the name of the Dataset to check for
Returns:
true if a Dataset named name exists, false otherwise
Throws:
IllegalArgumentException - if name is null
DatasetRepositoryException
Since:
0.7.0
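A common client pattern built on exists is "load or create". Because exists and create are separate calls, another client can create the dataset in between, so this sketch also treats DatasetExistsException as a signal to load instead. MiniRepository and LoadOrCreate are hypothetical, descriptors stand in for dataset handles, and DatasetExistsException is a local stand-in mirroring the name on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for the exception named on this page; not Kite's class.
class DatasetExistsException extends RuntimeException {
  DatasetExistsException(String message) {
    super(message);
  }
}

// Minimal hypothetical repository: datasets are reduced to their descriptors.
final class MiniRepository {
  private final Map<String, String> descriptors = new HashMap<>();

  boolean exists(String name) {
    if (name == null) {
      throw new IllegalArgumentException("name must not be null");
    }
    return descriptors.containsKey(name);
  }

  String load(String name) {
    String descriptor = descriptors.get(name);
    if (descriptor == null) {
      throw new IllegalStateException("No dataset named " + name);
    }
    return descriptor;
  }

  String create(String name, String descriptor) {
    if (descriptors.putIfAbsent(name, descriptor) != null) {
      throw new DatasetExistsException("Dataset already exists: " + name);
    }
    return descriptor;
  }
}

final class LoadOrCreate {
  // Return the existing dataset, creating it with the descriptor if absent.
  static String loadOrCreate(MiniRepository repo, String name, String descriptor) {
    if (repo.exists(name)) {
      return repo.load(name);
    }
    try {
      return repo.create(name, descriptor);
    } catch (DatasetExistsException e) {
      // Someone else created it between exists() and create(); use theirs.
      return repo.load(name);
    }
  }
}
```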

list

Collection<String> list()
List the names of the Datasets in this DatasetRepository. If the repository contains no Datasets, an empty collection is returned.

Returns:
a Collection of Dataset names (Strings)
Throws:
DatasetRepositoryException
Since:
0.7.0
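Because list returns an empty collection rather than null, callers can iterate over the result directly. This page does not state an ordering, so the sketch below sorts a copy when a stable order is needed. ListingRepository and ListingDemo are hypothetical classes for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical repository exposing only what list() needs.
final class ListingRepository {
  private final Map<String, String> descriptors = new HashMap<>();

  void create(String name, String descriptor) {
    descriptors.put(name, descriptor);
  }

  // Snapshot of the names; an empty repository yields an empty collection.
  Collection<String> list() {
    return Collections.unmodifiableList(new ArrayList<>(descriptors.keySet()));
  }
}

final class ListingDemo {
  // Copy into a mutable list before sorting; list() promises no order.
  static List<String> sortedNames(ListingRepository repo) {
    List<String> names = new ArrayList<>(repo.list());
    Collections.sort(names);
    return names;
  }
}
```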

getUri

URI getUri()
Return the URI of this repository. Passing this URI to DatasetRepositories.open(java.net.URI) (or DatasetRepositories.openRandomAccess(java.net.URI)) returns a DatasetRepository equivalent to this one.

Returns:
the URI of this repository
Since:
0.12.0
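The round-trip property of getUri can be sketched with a repository identified only by its root directory. FileRepo and its static open method are hypothetical stand-ins for a real implementation and for DatasetRepositories.open(java.net.URI); the "repo:file:" URI layout is an assumption made for this example.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical file-backed repository identified only by its root path.
final class FileRepo {
  private final String root;

  FileRepo(String root) {
    this.root = Objects.requireNonNull(root);
  }

  // Assumed URI layout: "repo:file:" followed by the root path.
  URI getUri() {
    return URI.create("repo:file:" + root);
  }

  // Hypothetical stand-in for opening a repository from its URI.
  static FileRepo open(URI uri) {
    // For an opaque URI such as "repo:file:/data", this is "file:/data".
    String ssp = uri.getSchemeSpecificPart();
    if (!"repo".equals(uri.getScheme()) || !ssp.startsWith("file:")) {
      throw new IllegalArgumentException("Unsupported URI: " + uri);
    }
    return new FileRepo(ssp.substring("file:".length()));
  }

  // Two repositories are equivalent when they point at the same root.
  @Override
  public boolean equals(Object o) {
    return o instanceof FileRepo && ((FileRepo) o).root.equals(root);
  }

  @Override
  public int hashCode() {
    return root.hashCode();
  }
}
```

Equivalence here means equal configuration, not the same object: open builds a new FileRepo that compares equal to the original.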


Copyright © 2013–2014. All rights reserved.