org.kitesdk.data
Class PartitionStrategy

java.lang.Object
  extended by org.kitesdk.data.PartitionStrategy

@Immutable
public class PartitionStrategy
extends Object

The strategy used to determine how a dataset is partitioned.

A PartitionStrategy is configured with one or more FieldPartitioners upon creation. When a Dataset is configured with a partition strategy, that dataset is considered partitioned. Any entity written to a partitioned dataset is evaluated with the dataset's PartitionStrategy, which in turn produces a PartitionKey that the dataset implementation uses to select the proper partition.

You should use the inner PartitionStrategy.Builder to create new instances.
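
The evaluation flow described above (entity in, ordered partition values out) can be illustrated with a minimal, library-free sketch. Everything in it is hypothetical and stands in for the real types: `PartitionFlowSketch`, its `Partitioner` class, and `keyFor` are not part of the Kite API, and the hash-into-4-buckets function is only an assumed example of a partition function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the flow, NOT the Kite API.
final class PartitionFlowSketch {

    // Stand-in for a field partitioner: a partition name plus a function
    // that derives one partition value from an entity.
    static final class Partitioner {
        final String name;
        final Function<Map<String, Object>, Object> fn;

        Partitioner(String name, Function<Map<String, Object>, Object> fn) {
            this.name = name;
            this.fn = fn;
        }
    }

    // Stand-in for evaluating an entity with a strategy: apply every
    // partitioner in order and collect the results as the key.
    static List<Object> keyFor(List<Partitioner> strategy, Map<String, Object> entity) {
        List<Object> key = new ArrayList<>(strategy.size());
        for (Partitioner p : strategy) {
            key.add(p.fn.apply(entity));
        }
        return Collections.unmodifiableList(key);
    }

    // Builds a two-level strategy (hash bucket, then country) and evaluates one entity.
    static List<Object> demo() {
        List<Partitioner> strategy = Arrays.asList(
            new Partitioner("user_hash", e -> Math.floorMod(e.get("userId").hashCode(), 4)),
            new Partitioner("country", e -> e.get("country")));

        Map<String, Object> entity = new HashMap<>();
        entity.put("userId", 42L);
        entity.put("country", "NZ");
        return keyFor(strategy, entity);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the order of the partitioners fixes the order of the key components, which is what lets a storage layer map a key to a nested partition.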

See Also:
FieldPartitioner, PartitionKey, DatasetDescriptor, Dataset

Nested Class Summary
static class PartitionStrategy.Builder
          A fluent builder to aid in the construction of PartitionStrategy instances.
 
Method Summary
 boolean equals(Object o)
           
 int getCardinality()
           Return the cardinality produced by the contained field partitioners.
 List<org.kitesdk.data.spi.FieldPartitioner> getFieldPartitioners()
           Get the list of field partitioners used for partitioning.
 int hashCode()
           
 PartitionKey partitionKey(Object... values)
           Construct a partition key with a variadic array of values corresponding to the field partitioners in this partition strategy.
 PartitionKey partitionKeyForEntity(Object entity)
           Construct a partition key for the given entity.
 PartitionKey partitionKeyForEntity(Object entity, PartitionKey reuseKey)
           Construct a partition key for the given entity, reusing the supplied key if not null.
 String toString()
           
 String toString(boolean pretty)
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Method Detail

getFieldPartitioners

public List<org.kitesdk.data.spi.FieldPartitioner> getFieldPartitioners()

Get the list of field partitioners used for partitioning.

FieldPartitioners are returned in the same order they are used during partition selection.


getCardinality

public int getCardinality()

Return the cardinality produced by the contained field partitioners.

This can be used to aid in calculating resource usage during certain operations. For example, when writing data to a partitioned dataset, you can use this method to estimate (or discover exactly, depending on the partition functions) how many leaf partitions exist.

Warning: The value returned by this method may be inaccurate and should be treated only as a hint. Some partition functions are fixed (for example, hash modulo number of buckets), while others are open-ended (for example, discrete value) and depend on the input data.

Returns:
The estimated (or possibly concrete) number of leaf partitions.
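
As a rough illustration of why the result is only a hint, the sketch below assumes that the overall figure is the product of a per-partitioner cardinality, with open-ended partitioners contributing a caller-supplied guess. That multiplication rule is an assumption made for the example, and `CardinalitySketch` and `estimateLeafPartitions` are hypothetical names, not Kite methods.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical estimate, NOT the Kite implementation.
final class CardinalitySketch {

    // Multiplies the cardinality of each partitioning level. For fixed
    // functions (for example, hash into N buckets) the value is exact;
    // for open-ended ones it is whatever guess the caller supplies.
    static long estimateLeafPartitions(List<Integer> perFieldCardinalities) {
        long total = 1;
        for (int c : perFieldCardinalities) {
            if (c <= 0) {
                throw new IllegalArgumentException("cardinality must be positive: " + c);
            }
            // multiplyExact throws ArithmeticException instead of silently overflowing.
            total = Math.multiplyExact(total, (long) c);
        }
        return total;
    }

    public static void main(String[] args) {
        // 10 hash buckets, 12 months, 31 days (hypothetical layout).
        System.out.println(estimateLeafPartitions(Arrays.asList(10, 12, 31)));
    }
}
```

If any level is a guess, the product is a guess too, so the result is best used for sizing buffers or estimating open writers rather than for correctness checks.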

partitionKey

public PartitionKey partitionKey(Object... values)

Construct a partition key with a variadic array of values corresponding to the field partitioners in this partition strategy.

It is permitted to supply fewer values than there are field partitioners, in which case the key matches all subpartitions in the unspecified parts of the key.

Null values are not permitted.
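
The partial-key rule can be pictured as prefix matching. The sketch below models keys as plain lists and is not the Kite implementation: `PartialKeySketch`, `key`, and `matches` are hypothetical names, and throwing `IllegalArgumentException` on a null value is an assumed behavior chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of partial keys, NOT the Kite API.
final class PartialKeySketch {

    // Builds a key from up to fieldCount values; nulls are rejected.
    static List<Object> key(int fieldCount, Object... values) {
        if (values.length > fieldCount) {
            throw new IllegalArgumentException("too many values: " + values.length);
        }
        for (Object v : values) {
            if (v == null) {
                throw new IllegalArgumentException("null values are not permitted");
            }
        }
        return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(values)));
    }

    // A shorter key matches every full key that starts with the same values.
    static boolean matches(List<Object> partial, List<Object> full) {
        return partial.size() <= full.size()
            && full.subList(0, partial.size()).equals(partial);
    }

    public static void main(String[] args) {
        // Hypothetical year/month/day strategy with three levels.
        List<Object> november = key(3, 2013, 11);
        System.out.println(matches(november, key(3, 2013, 11, 5)));
        System.out.println(matches(november, key(3, 2013, 12, 5)));
    }
}
```

Under this model a key with two of three values selects a whole month of daily partitions, while a key with all three values selects exactly one.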


partitionKeyForEntity

public PartitionKey partitionKeyForEntity(Object entity)

Construct a partition key for the given entity.

This is a convenient way to find the partition that a given entity is written to, or to find a partition using objects from the entity domain.


partitionKeyForEntity

public PartitionKey partitionKeyForEntity(Object entity,
                                          @Nullable
                                          PartitionKey reuseKey)

Construct a partition key for the given entity, reusing the supplied key if not null.

This is a convenient way to find the partition that a given entity is written to, or to find a partition using objects from the entity domain.
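
Reusing a key is a common way to avoid allocating one object per entity in a tight write loop. The sketch below shows that general pattern with stand-in types. `ReuseKeySketch`, `MutableKey`, `keyFor`, and `demoPartitioners` are hypothetical and do not reflect how Kite implements `PartitionKey`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of key reuse, NOT the Kite implementation.
final class ReuseKeySketch {

    // Stand-in for a key whose slots can be overwritten in place.
    static final class MutableKey {
        final Object[] values;

        MutableKey(int size) {
            this.values = new Object[size];
        }
    }

    // Fills reuseKey when one is supplied, otherwise allocates a new key.
    static MutableKey keyFor(List<Function<Object, Object>> partitioners,
                             Object entity,
                             MutableKey reuseKey) {
        MutableKey key = (reuseKey != null) ? reuseKey : new MutableKey(partitioners.size());
        if (key.values.length != partitioners.size()) {
            throw new IllegalArgumentException("key size does not match the strategy");
        }
        for (int i = 0; i < partitioners.size(); i++) {
            key.values[i] = partitioners.get(i).apply(entity);
        }
        return key;
    }

    // Two assumed partition functions over String entities: length, then first letter.
    static List<Function<Object, Object>> demoPartitioners() {
        List<Function<Object, Object>> fns = new ArrayList<>();
        fns.add(e -> ((String) e).length());
        fns.add(e -> ((String) e).substring(0, 1));
        return fns;
    }

    public static void main(String[] args) {
        List<Function<Object, Object>> fns = demoPartitioners();
        MutableKey key = null;
        for (String entity : new String[] {"apple", "kiwi"}) {
            key = keyFor(fns, entity, key); // allocates once, then overwrites
            System.out.println(java.util.Arrays.toString(key.values));
        }
    }
}
```

With this pattern the previous contents of the key are overwritten on every call, so a caller that needs to retain a key (for example, as a map key) should pass null or copy it first.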


equals

public boolean equals(Object o)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

toString

public String toString()
Overrides:
toString in class Object

toString

public String toString(boolean pretty)
Parameters:
pretty - true to indent and format JSON
Returns:
this PartitionStrategy as its JSON representation


Copyright © 2013–2014. All rights reserved.