edu.umd.cloud9.io
Class Tuple

java.lang.Object
  extended by edu.umd.cloud9.io.Tuple
All Implemented Interfaces:
Comparable<Tuple>, Writable, WritableComparable<Tuple>

public class Tuple
extends Object
implements WritableComparable<Tuple>

Writable representing an arbitrary tuple. Tuples are instantiated from a Schema. The Tuple class implements WritableComparable, so it can be used directly as MapReduce keys and values. The natural sort order of tuples is defined by an internally generated byte representation and is not based on field values. This class, combined with ArrayListWritable and HashMapWritable, allows the user to define arbitrarily complex data structures.

Every field can be indexed either by its integer position or by its field name. Each field is typed, and its type can be determined via getFieldType(int). A field contains either an object of the specified type or a special symbol (a String). The method containsSymbol(int) checks whether a field contains a special symbol. If it does, get(int) returns null; if it does not, getSymbol(int) returns null.
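A minimal sketch of the field contract described above, in plain Java with no Cloud9 or Hadoop dependency. SketchTuple and its constructor are hypothetical; real Tuples are created through Schema.instantiate(Object...). Rejecting values that do not match the declared field type in set is an assumption of this sketch, not documented Tuple behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a typed tuple whose fields hold either a
// value of the declared type or a special symbol. Not the Cloud9 implementation.
class SketchTuple {
    private final String[] names;
    private final Class<?>[] types;
    private final Object[] values;
    private final String[] symbols;
    private final Map<String, Integer> positions = new HashMap<>();

    SketchTuple(String[] names, Class<?>[] types) {
        if (names.length != types.length) {
            throw new IllegalArgumentException("names and types must have equal length");
        }
        this.names = names.clone();
        this.types = types.clone();
        this.values = new Object[names.length];
        this.symbols = new String[names.length];
        for (int i = 0; i < names.length; i++) {
            positions.put(names[i], i); // name-based access resolves to a position
        }
    }

    private int position(String field) {
        Integer i = positions.get(field);
        if (i == null) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return i;
    }

    int getFieldCount() { return names.length; }
    Class<?> getFieldType(int i) { return types[i]; }
    Class<?> getFieldType(String field) { return getFieldType(position(field)); }

    // Assumption: mismatched types are rejected. Setting a value clears any symbol.
    void set(int i, Object o) {
        if (!types[i].isInstance(o)) {
            throw new IllegalArgumentException("field " + names[i] + " expects " + types[i].getName());
        }
        values[i] = o;
        symbols[i] = null;
    }
    void set(String field, Object o) { set(position(field), o); }

    // Setting a symbol clears any value previously stored in the field.
    void setSymbol(int i, String s) {
        symbols[i] = s;
        values[i] = null;
    }
    void setSymbol(String field, String s) { setSymbol(position(field), s); }

    boolean containsSymbol(int i) { return symbols[i] != null; }
    boolean containsSymbol(String field) { return containsSymbol(position(field)); }

    // get returns null when the field holds a symbol; getSymbol returns null
    // when it holds a value, mirroring the contract described above.
    Object get(int i) { return containsSymbol(i) ? null : values[i]; }
    Object get(String field) { return get(position(field)); }
    String getSymbol(int i) { return symbols[i]; }
    String getSymbol(String field) { return getSymbol(position(field)); }
}
```

Resolving names to positions once, in the constructor, keeps the by-name overloads as thin wrappers over the positional ones.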

Here is a typical usage scenario for special symbols: suppose you have tuples that represent count(a, b), where a and b are tokens you observe. There is often a need to compute count(a, *), for example, to derive conditional probabilities. In this case, you can use a special symbol to represent the * and thereby distinguish it from the lexical token '*'.
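The scenario can be sketched in plain Java, independent of Tuple itself. The classes Term and PairCounts are hypothetical; they only model the idea that a marginal marker stored as a symbol never collides with an observed token that happens to be the string "*". The conditional probability is computed as P(b | a) = count(a, b) / count(a, *).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical: a field is either a lexical token or a special symbol.
final class Term {
    final String text;
    final boolean symbol;

    private Term(String text, boolean symbol) { this.text = text; this.symbol = symbol; }
    static Term token(String t) { return new Term(t, false); }
    static Term symbol(String s) { return new Term(s, true); }

    // Equality includes the symbol flag, so token "*" != symbol "*".
    @Override public boolean equals(Object o) {
        if (!(o instanceof Term)) return false;
        Term other = (Term) o;
        return symbol == other.symbol && text.equals(other.text);
    }
    @Override public int hashCode() { return Objects.hash(text, symbol); }
}

final class PairCounts {
    static final Term STAR = Term.symbol("*");

    // a -> (b -> count); the entry keyed by STAR holds the marginal count(a, *).
    final Map<String, Map<Term, Integer>> counts = new HashMap<>();

    void observe(String a, String b) {
        Map<Term, Integer> row = counts.computeIfAbsent(a, k -> new HashMap<>());
        row.merge(Term.token(b), 1, Integer::sum); // count(a, b)
        row.merge(STAR, 1, Integer::sum);           // count(a, *)
    }

    // P(b | a) = count(a, b) / count(a, *); 0.0 if a was never observed.
    double conditional(String a, String b) {
        Map<Term, Integer> row = counts.getOrDefault(a, Map.of());
        int joint = row.getOrDefault(Term.token(b), 0);
        int marginal = row.getOrDefault(STAR, 0);
        return marginal == 0 ? 0.0 : (double) joint / marginal;
    }
}
```

Observing the pairs (a, '*'), (a, b), (a, b), (a, c) yields count(a, *) = 4, P(b | a) = 0.5, and P('*' | a) = 0.25 for the lexical token, which stays separate from the marginal.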

The natural sort order of the Tuple is defined by Comparable.compareTo(Object). Tuples are sorted by field, with special symbols always appearing first within each field.
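The field-by-field ordering can be sketched as a Comparator over plain Object arrays. Sym, FieldOrder, and the sorted helper are hypothetical. Comparing two symbols as Strings, comparing two values by their natural ordering, and requiring both arrays to have the same length are assumptions, since the text above only fixes that symbols come first within each field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical marker wrapping a special symbol stored in a field.
final class Sym {
    final String name;
    Sym(String name) { this.name = name; }
}

// Compares tuples field by field, first to last; within a field a symbol
// sorts before any value. Assumes both arrays share one schema (same length).
final class FieldOrder implements Comparator<Object[]> {
    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public int compare(Object[] x, Object[] y) {
        for (int i = 0; i < x.length; i++) {
            boolean xSym = x[i] instanceof Sym;
            boolean ySym = y[i] instanceof Sym;
            int c;
            if (xSym && ySym) {
                c = ((Sym) x[i]).name.compareTo(((Sym) y[i]).name); // assumed: String order
            } else if (xSym) {
                c = -1; // symbol before value
            } else if (ySym) {
                c = 1;
            } else {
                c = ((Comparable) x[i]).compareTo(y[i]); // assumed: natural order of the type
            }
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    static List<Object[]> sorted(List<Object[]> rows) {
        List<Object[]> copy = new ArrayList<>(rows);
        copy.sort(new FieldOrder());
        return copy;
    }
}
```

Under this ordering, a tuple holding the symbol * in its last field sorts ahead of every tuple that shares its earlier fields, which is one way a marginal such as count(a, *) can be seen before the individual count(a, b) tuples.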

Author:
Jimmy Lin
See Also:
ArrayListWritable, HashMapWritable, Schema

Constructor Summary
Tuple()
          Creates an empty Tuple.
 
Method Summary
 int compareTo(Tuple that)
           Defines a natural sort order for the Tuple class.
 boolean containsSymbol(int i)
          Determines if a particular field (by position) contains a special symbol.
 boolean containsSymbol(String field)
          Determines if a particular field (by name) contains a special symbol.
static Tuple createFrom(DataInput in)
          Factory method for deserializing a Tuple object.
 Object get(int i)
          Returns the object at a particular field (by position) in this Tuple.
 Object get(String field)
          Returns the object at a particular field (by name) in this Tuple.
 int getFieldCount()
          Returns the number of fields in this Tuple.
 Class<?> getFieldType(int i)
          Returns the type of a particular field (by position).
 Class<?> getFieldType(String field)
          Returns the type of a particular field (by name).
 String getSymbol(int i)
          Returns the special symbol at a particular field (by position).
 String getSymbol(String field)
          Returns the special symbol at a particular field (by name).
 int hashCode()
          Returns a hash code for this Tuple.
 void readFields(DataInput in)
          Deserializes the Tuple.
 void set(int i, Object o)
          Sets the object at a particular field (by position) in this Tuple.
 void set(String field, Object o)
          Sets the object at a particular field (by name) in this Tuple.
 void setSymbol(int i, String s)
          Sets a special symbol at a particular field (by position) in this Tuple.
 void setSymbol(String field, String s)
          Sets a special symbol at a particular field (by name) in this Tuple.
 String toString()
          Generates human-readable String representation of this Tuple.
 void write(DataOutput out)
          Serializes this Tuple.
 
Methods inherited from class java.lang.Object
equals, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Tuple

public Tuple()
Creates an empty Tuple. This constructor is needed by the Hadoop framework for deserializing Writable objects. The preferred way to instantiate tuples is through Schema.instantiate(Object...).
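Why an empty constructor is needed can be sketched in plain Java: a framework that only knows a class at run time creates an empty instance reflectively and then asks it to populate itself from the input stream (Hadoop does this, for example, via ReflectionUtils.newInstance). The interface SimpleWritable and the classes Point and MiniFramework are hypothetical stand-ins so the sketch runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for org.apache.hadoop.io.Writable.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class Point implements SimpleWritable {
    int x, y;

    public Point() {} // empty instance; readFields fills it in
    Point(int x, int y) { this.x = x; this.y = y; }

    public void write(DataOutput out) throws IOException { out.writeInt(x); out.writeInt(y); }
    public void readFields(DataInput in) throws IOException { x = in.readInt(); y = in.readInt(); }
}

final class MiniFramework {
    static byte[] serialize(SimpleWritable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            w.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Create an empty instance via the no-arg constructor, then let it read itself.
    static <T extends SimpleWritable> T deserialize(Class<T> cls, byte[] bytes) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            obj.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return obj;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(cls.getName() + " needs a public no-arg constructor", e);
        }
    }
}
```

Without `public Point() {}`, the reflective call would fail, because declaring Point(int, int) suppresses the compiler-generated default constructor.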

Method Detail

createFrom

public static Tuple createFrom(DataInput in)
                        throws IOException
Factory method for deserializing a Tuple object.

Parameters:
in - raw byte source of the Tuple
Returns:
a new Tuple
Throws:
IOException

set

public void set(int i,
                Object o)
Sets the object at a particular field (by position) in this Tuple.

Parameters:
i - field position
o - object to set at the specified field

set

public void set(String field,
                Object o)
Sets the object at a particular field (by name) in this Tuple.

Parameters:
field - field name
o - object to set at the specified field

setSymbol

public void setSymbol(int i,
                      String s)
Sets a special symbol at a particular field (by position) in this Tuple.

Parameters:
i - field position
s - special symbol to set at specified field

setSymbol

public void setSymbol(String field,
                      String s)
Sets a special symbol at a particular field (by name) in this Tuple.

Parameters:
field - field name
s - special symbol to set at specified field

get

public Object get(int i)
Returns the object at a particular field (by position) in this Tuple. Returns null if the field contains a special symbol.

Parameters:
i - field position
Returns:
object at field, or null if the field contains a special symbol

get

public Object get(String field)
Returns the object at a particular field (by name) in this Tuple. Returns null if the field contains a special symbol.

Parameters:
field - field name
Returns:
object at field, or null if the field contains a special symbol

getSymbol

public String getSymbol(int i)
Returns the special symbol at a particular field (by position). Returns null if the field does not contain a special symbol.

Parameters:
i - field position
Returns:
special symbol at field, or null if the field does not contain a special symbol

getSymbol

public String getSymbol(String field)
Returns the special symbol at a particular field (by name). Returns null if the field does not contain a special symbol.

Parameters:
field - field name
Returns:
special symbol at field, or null if the field does not contain a special symbol

containsSymbol

public boolean containsSymbol(int i)
Determines if a particular field (by position) contains a special symbol.

Parameters:
i - field position
Returns:
true if the field contains a special symbol, or false otherwise

containsSymbol

public boolean containsSymbol(String field)
Determines if a particular field (by name) contains a special symbol.

Parameters:
field - field name
Returns:
true if the field contains a special symbol, or false otherwise

getFieldType

public Class<?> getFieldType(int i)
Returns the type of a particular field (by position).

Parameters:
i - field position
Returns:
type of the field

getFieldType

public Class<?> getFieldType(String field)
Returns the type of a particular field (by name).

Parameters:
field - field name
Returns:
type of the field

getFieldCount

public int getFieldCount()
Returns the number of fields in this Tuple.

Returns:
number of fields in this Tuple

readFields

public void readFields(DataInput in)
                throws IOException
Deserializes the Tuple.

Specified by:
readFields in interface Writable
Parameters:
in - source for raw byte representation
Throws:
IOException

write

public void write(DataOutput out)
           throws IOException
Serializes this Tuple.

Specified by:
write in interface Writable
Parameters:
out - where to write the raw byte representation
Throws:
IOException
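The write/readFields contract, together with the createFrom factory, can be illustrated with a round trip through an in-memory stream. WireTuple and its byte layout (a field count, then a tag byte and payload per field) are hypothetical; the documentation above does not specify Tuple's actual serialized format, and this sketch supports only String and int values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wire format: int field count, then per field a tag byte
// (0 = symbol as UTF, 1 = String value as UTF, 2 = int value) and its payload.
final class WireTuple {
    private Object[] values;  // String or Integer; null when the field holds a symbol
    private String[] symbols; // non-null when the field holds a symbol

    public WireTuple() { this(0); } // empty instance for readFields
    WireTuple(int n) { values = new Object[n]; symbols = new String[n]; }

    void set(int i, Object o) { values[i] = o; symbols[i] = null; }
    void setSymbol(int i, String s) { symbols[i] = s; values[i] = null; }
    Object get(int i) { return values[i]; }
    String getSymbol(int i) { return symbols[i]; }
    int getFieldCount() { return values.length; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (int i = 0; i < values.length; i++) {
            if (symbols[i] != null) {
                out.writeByte(0);
                out.writeUTF(symbols[i]);
            } else if (values[i] instanceof String) {
                out.writeByte(1);
                out.writeUTF((String) values[i]);
            } else if (values[i] instanceof Integer) {
                out.writeByte(2);
                out.writeInt((Integer) values[i]);
            } else {
                throw new IOException("unsupported field " + i + ": " + values[i]);
            }
        }
    }

    // Replaces all state, so one instance can be reused across records.
    public void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        values = new Object[n];
        symbols = new String[n];
        for (int i = 0; i < n; i++) {
            byte tag = in.readByte();
            switch (tag) {
                case 0: symbols[i] = in.readUTF(); break;
                case 1: values[i] = in.readUTF(); break;
                case 2: values[i] = in.readInt(); break;
                default: throw new IOException("bad tag " + tag + " at field " + i);
            }
        }
    }

    // Factory in the spirit of createFrom(DataInput): empty instance + readFields.
    static WireTuple createFrom(DataInput in) throws IOException {
        WireTuple t = new WireTuple();
        t.readFields(in);
        return t;
    }

    static WireTuple roundTrip(WireTuple t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            t.write(new DataOutputStream(bytes));
            return createFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because readFields overwrites every field, a framework that reuses a single key or value object across many records still sees correct data for each record.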

toString

public String toString()
Generates human-readable String representation of this Tuple.

Overrides:
toString in class Object
Returns:
human-readable String representation of this Tuple

compareTo

public int compareTo(Tuple that)

Defines a natural sort order for the Tuple class. Following standard convention, this method returns a value less than zero, a value greater than zero, or zero if this Tuple should be sorted before, sorted after, or is equal to that. Tuples are sorted by field, with special symbols always appearing first within each field.

Specified by:
compareTo in interface Comparable<Tuple>
Parameters:
that - the Tuple to compare against
Returns:
a value less than zero, a value greater than zero, or zero if this Tuple should be sorted before, sorted after, or is equal to that

hashCode

public int hashCode()
Returns a hash code for this Tuple.

Overrides:
hashCode in class Object
Returns:
hash code for this Tuple
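Overriding hashCode matters when Tuples serve as MapReduce keys: Hadoop's default HashPartitioner assigns a key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so tuples with equal contents must hash identically to reach the same reducer. The content-based hash below is an assumption for illustration, not Tuple's documented algorithm, and HashSketch is a hypothetical name.

```java
import java.util.Objects;

final class HashSketch {
    // Assumed content hash: mixes a symbol flag and the field's content, so
    // equal contents hash equally and symbol "*" differs from token "*".
    static int tupleHash(Object[] values, String[] symbols) {
        int h = 1;
        for (int i = 0; i < values.length; i++) {
            boolean isSym = symbols[i] != null;
            h = 31 * h + (isSym ? 1 : 0);
            h = 31 * h + Objects.hashCode(isSym ? symbols[i] : values[i]);
        }
        return h;
    }

    // Same formula as Hadoop's default HashPartitioner; masking the sign bit
    // keeps the result in [0, numReduceTasks) even for negative hash codes.
    static int partition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Masking with Integer.MAX_VALUE, rather than calling Math.abs, avoids the edge case where Math.abs(Integer.MIN_VALUE) is still negative.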