java.lang.Object
  org.kitesdk.data.DatasetRepositories
public class DatasetRepositories
Convenience methods for working with DatasetRepository instances.
Constructor Summary | |
---|---|
DatasetRepositories() |
Method Summary | |
---|---|
static DatasetRepository | open(String uri): Synonym for open(java.net.URI) for String URIs. |
static DatasetRepository | open(URI repositoryUri): Open a DatasetRepository for the given URI. |
static RandomAccessDatasetRepository | openRandomAccess(String uri): Synonym for openRandomAccess(java.net.URI) for String URIs. |
static RandomAccessDatasetRepository | openRandomAccess(URI repositoryUri): Synonym for open(java.net.URI) that returns a RandomAccessDatasetRepository. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public DatasetRepositories()
Method Detail |
---|
public static DatasetRepository open(String uri)
Synonym for open(java.net.URI) for String URIs.
Parameters:
uri - a String URI
Throws:
IllegalArgumentException - If the String cannot be parsed into a valid URI.
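The failure mode above can be illustrated with java.net.URI alone. This is a minimal sketch, assuming the String is parsed in a way comparable to URI.create, which this page does not confirm; the class UriParseCheck and its method are hypothetical names and not part of the Kite API.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the Kite API.
final class UriParseCheck {

  /** Returns true if java.net.URI accepts the string, false if it rejects it. */
  static boolean isParsable(String uri) {
    try {
      // URI.create wraps URISyntaxException in IllegalArgumentException.
      URI.create(uri);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // A well-formed repository URI taken from the examples on this page.
    System.out.println(isParsable("repo:file:///data"));
    // An unescaped space is not legal URI syntax.
    System.out.println(isParsable("repo:file:/my path"));
  }
}
```

Checking a user-supplied String this way before calling open(String) may let a caller report a bad URI with a friendlier message than the raw exception.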
public static DatasetRepository open(URI repositoryUri)
Open a DatasetRepository for the given URI.
This method provides a way to connect to a DatasetRepository while providing configuration options. For almost all cases, this is the preferred method for retrieving an instance of a DatasetRepository.
The format of a repository URI is as follows.
repo:[storage component]
The [storage component] indicates the underlying metadata and, in some cases, physical storage of the data, along with any options. The supported storage backends are:
file:[path] where [path] is a relative or absolute filesystem path to be used as the dataset repository root directory in which to store dataset data. When specifying an absolute path, the null authority form (i.e. file:///my/path) can be used. Alternatively, the authority section can be omitted entirely (e.g. file:/my/path). Either way, it is illegal to provide an authority (e.g. file://this-part-is-illegal/my/path). This storage backend produces a DatasetRepository that stores both data and metadata on the local operating system filesystem. See FileSystemDatasetRepository for more information.
hdfs://[host]:[port]/[path] where [host] and [port] indicate the location of the Hadoop NameNode, and [path] is the dataset repository root directory in which to store dataset data. This form loads the Hadoop configuration information per the usual methods (that is, searching the process's classpath for the various configuration files). This storage backend produces a DatasetRepository that stores both data and metadata in HDFS. See FileSystemDatasetRepository for more information.
hive and hive://[metastore-host]:[metastore-port]/ connect to the Hive MetaStore. Dataset locations are determined by Hive as managed tables.
hive:/[path] and hive://[metastore-host]:[metastore-port]/[path] also connect to the Hive MetaStore, but tables are external and stored under [path]. The repository storage layout is the same as hdfs and file repositories. HDFS connection options can be supplied by adding hdfs-host and hdfs-port query options to the URI (see examples).
repo:hbase:[zookeeper-host1]:[zk-port],[zookeeper-host2],... opens an HBase-backed DatasetRepository. The same URI can also be passed to openRandomAccess(URI) to obtain a RandomAccessDatasetRepository.
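The relationship between the repo: prefix and the storage component, and the authority rule for file: URIs, can be sketched with java.net.URI. The class RepoUriParts and its methods are hypothetical names introduced for illustration; this is not how Kite itself is known to parse URIs, only one way to inspect them under the rules stated above.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the Kite API.
final class RepoUriParts {

  /** Strips the leading "repo:" scheme and returns the storage component as a URI. */
  static URI storageComponent(String repoUri) {
    URI outer = URI.create(repoUri);
    if (!"repo".equals(outer.getScheme())) {
      throw new IllegalArgumentException("Not a repo: URI: " + repoUri);
    }
    // java.net.URI treats repo:... as an opaque URI, so everything after
    // the first colon is the scheme-specific part.
    return URI.create(outer.getSchemeSpecificPart());
  }

  /** True when a file: storage component carries a non-empty authority. */
  static boolean hasIllegalFileAuthority(URI storage) {
    return "file".equals(storage.getScheme()) && storage.getAuthority() != null;
  }

  public static void main(String[] args) {
    String[] samples = {
      "repo:file:///my/path",                     // null authority form
      "repo:file:/my/path",                       // authority omitted
      "repo:file://this-part-is-illegal/my/path"  // authority present
    };
    for (String s : samples) {
      URI storage = storageComponent(s);
      System.out.println(s + " -> illegal authority: " + hasIllegalFileAuthority(storage));
    }
  }
}
```

Both legal absolute forms yield a null authority in java.net.URI, which is why a single null check is enough to separate them from the illegal form.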
URI | Result |
---|---|
repo:file:foo/bar | Store data+metadata on the local filesystem in the directory ./foo/bar. |
repo:file:///data | Store data+metadata on the local filesystem in the directory /data. |
repo:hdfs://localhost:8020/data | Same as above, but stores data+metadata on HDFS. |
repo:hive | Connects to the Hive MetaStore and creates managed tables. |
repo:hive://meta-host:9083/ | Connects to the Hive MetaStore at thrift://meta-host:9083, and creates managed tables. This only matches when the path is /. Any non-root path matches the external Hive URIs. |
repo:hive:/path?hdfs-host=localhost&hdfs-port=8020 | Connects to the default Hive MetaStore and creates external tables stored in hdfs://localhost:8020/ at path. hdfs-host and hdfs-port are optional. |
repo:hive://meta-host:9083/path?hdfs-host=localhost&hdfs-port=8020 | Connects to the Hive MetaStore at thrift://meta-host:9083/ and creates external tables stored in hdfs://localhost:8020/ at path. hdfs-host and hdfs-port are optional. |
repo:hbase:zk1,zk2,zk3 | Connects to HBase via the given ZooKeeper quorum nodes. |
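The Hive examples above distinguish managed from external tables by path and carry HDFS settings as query options. The sketch below reads those pieces with java.net.URI. HiveRepoUriSketch and its methods are hypothetical names, and the managed/external rule is transcribed from the table rather than taken from Kite's own matching code.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Kite API.
final class HiveRepoUriSketch {

  /** Returns the query options (such as hdfs-host, hdfs-port) of a repo: URI. */
  static Map<String, String> queryOptions(String repoUri) {
    URI storage = URI.create(URI.create(repoUri).getSchemeSpecificPart());
    Map<String, String> options = new LinkedHashMap<>();
    String query = storage.getQuery();
    if (query == null) {
      return options; // no options supplied; both are optional
    }
    for (String pair : query.split("&")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        options.put(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }
    return options;
  }

  /** Applies the table's rule: a non-root path means external tables. */
  static boolean usesExternalTables(String repoUri) {
    String storage = URI.create(repoUri).getSchemeSpecificPart();
    if (storage.equals("hive")) {
      return false; // bare repo:hive creates managed tables
    }
    String path = URI.create(storage).getPath();
    return path != null && !path.isEmpty() && !path.equals("/");
  }

  public static void main(String[] args) {
    String uri = "repo:hive://meta-host:9083/path?hdfs-host=localhost&hdfs-port=8020";
    System.out.println(queryOptions(uri));
    System.out.println(usesExternalTables(uri));
  }
}
```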
Parameters:
repositoryUri - The repository URI
Returns:
DatasetRepository
public static RandomAccessDatasetRepository openRandomAccess(String uri)
Synonym for openRandomAccess(java.net.URI) for String URIs.
Parameters:
uri - a String URI
Returns:
RandomAccessDatasetRepository
Throws:
IllegalArgumentException - If the String cannot be parsed into a valid URI.
public static RandomAccessDatasetRepository openRandomAccess(URI repositoryUri)
Synonym for open(java.net.URI) that returns a RandomAccessDatasetRepository.
This method provides a way to connect to a DatasetRepository the same way open(java.net.URI) does, but instead returns an implementation of type RandomAccessDatasetRepository.
You should use this method when you need to access a RandomAccessDataset to use random access methods, such as RandomAccessDataset.put(Object).
The format of a repository URI is as follows.
repo:[storage component]
The [storage component] indicates the underlying metadata and, in some cases, physical storage of the data, along with any options. The supported storage backends are:
repo:hbase:[zookeeper-host1]:[zk-port],[zookeeper-host2],... opens an HBase-backed DatasetRepository. Opening this URI with openRandomAccess(URI) returns a RandomAccessDatasetRepository.
URI | Result |
---|---|
repo:hbase:zk1,zk2,zk3 | Connects to HBase via the given ZooKeeper quorum nodes. |
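To make the shape of the HBase URI concrete, the sketch below splits the quorum list out of a repo:hbase: URI using plain string handling. HBaseQuorumSketch is a hypothetical name, and the splitting rule (comma-separated nodes, each optionally followed by :port) is read off the URI pattern above rather than taken from Kite's parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Kite API.
final class HBaseQuorumSketch {

  private static final String PREFIX = "repo:hbase:";

  /** Returns the ZooKeeper quorum entries, each as host or host:port. */
  static List<String> quorumNodes(String repoUri) {
    if (!repoUri.startsWith(PREFIX)) {
      throw new IllegalArgumentException("Not a repo:hbase: URI: " + repoUri);
    }
    List<String> nodes = new ArrayList<>();
    for (String node : repoUri.substring(PREFIX.length()).split(",")) {
      if (!node.isEmpty()) {
        nodes.add(node);
      }
    }
    return nodes;
  }

  public static void main(String[] args) {
    System.out.println(quorumNodes("repo:hbase:zk1,zk2,zk3"));
  }
}
```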
Parameters:
repositoryUri - The repository URI
Returns:
RandomAccessDatasetRepository