edu.umd.cloud9.collection.trec
Class TrecDocument

java.lang.Object
  extended by edu.umd.cloud9.collection.Indexable
      extended by edu.umd.cloud9.collection.trec.TrecDocument
All Implemented Interfaces:
Writable

public class TrecDocument
extends Indexable

Object representing a TREC document.

Author:
Jimmy Lin

Field Summary
static String XML_END_TAG
          End delimiter of the document, which is </DOC>.
static String XML_START_TAG
          Start delimiter of the document, which is <DOC>.
 
Constructor Summary
TrecDocument()
          Creates an empty TrecDocument object.
 
Method Summary
 String getContent()
          Returns the content of the document.
 String getDocid()
          Returns the globally unique String identifier of the document within the collection (e.g., LA123190-0134).
static void readDocument(TrecDocument doc, String s)
          Reads a raw XML string into a TrecDocument object.
 void readFields(DataInput in)
          Deserializes this object.
 void write(DataOutput out)
          Serializes this object.
 
Methods inherited from class edu.umd.cloud9.collection.Indexable
getDisplayContent, getDisplayContentType
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

XML_START_TAG

public static final String XML_START_TAG
Start delimiter of the document, which is <DOC>.

See Also:
Constant Field Values

XML_END_TAG

public static final String XML_END_TAG
End delimiter of the document, which is </DOC>.

See Also:
Constant Field Values
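TREC collection files typically concatenate many documents, each wrapped in the two delimiters above. The following is a minimal sketch, not Cloud9 code, of how those delimiters can be used to cut a raw collection string into individual <DOC>...</DOC> records. The class TrecSplitter and its split method are hypothetical, and the delimiter values are copied from the field descriptions rather than read from TrecDocument itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Cloud9): splits a string holding several
// TREC documents on the <DOC> / </DOC> delimiters.
class TrecSplitter {
    // Local copies of the documented XML_START_TAG and XML_END_TAG values.
    static final String START = "<DOC>";
    static final String END = "</DOC>";

    static List<String> split(String collection) {
        List<String> docs = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = collection.indexOf(START, from);
            if (start < 0) {
                break; // no more documents
            }
            int end = collection.indexOf(END, start + START.length());
            if (end < 0) {
                break; // unterminated document: ignore the remainder
            }
            // Keep both delimiters so each piece is a complete record.
            docs.add(collection.substring(start, end + END.length()));
            from = end + END.length();
        }
        return docs;
    }

    public static void main(String[] args) {
        String raw = "<DOC>\n<DOCNO> A-1 </DOCNO>\n</DOC>\n<DOC>\n<DOCNO> B-2 </DOCNO>\n</DOC>\n";
        for (String doc : split(raw)) {
            System.out.println(doc);
        }
    }
}
```

Text between records (here, the newlines) is skipped, and a trailing document without a closing </DOC> is dropped rather than returned partially.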
Constructor Detail

TrecDocument

public TrecDocument()
Creates an empty TrecDocument object.

Method Detail

write

public void write(DataOutput out)
           throws IOException
Serializes this object.

Throws:
IOException

readFields

public void readFields(DataInput in)
                throws IOException
Deserializes this object.

Throws:
IOException
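write and readFields implement Hadoop's Writable contract: readFields must consume exactly the bytes that write produced. The sketch below illustrates that round trip with plain java.io types so it runs without Hadoop. The class RawDocSketch is hypothetical, and its layout (a 4-byte length followed by the UTF-8 bytes of the raw document) is an assumption, not necessarily the format TrecDocument uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a Writable-style document. It follows the same
// write/readFields contract but uses only java.io, so Hadoop is not needed.
class RawDocSketch {
    private String raw = ""; // assumed: the full <DOC>...</DOC> text

    void set(String raw) { this.raw = raw; }

    String get() { return raw; }

    // Serializes this object: a 4-byte length, then the UTF-8 bytes (assumed layout).
    void write(DataOutput out) throws IOException {
        byte[] bytes = raw.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Deserializes this object by reading back exactly what write() produced.
    void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        raw = new String(bytes, StandardCharsets.UTF_8);
    }

    // Serializes doc to an in-memory buffer and reads it into a fresh instance.
    static RawDocSketch roundTrip(RawDocSketch doc) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            doc.write(new DataOutputStream(buffer));
            RawDocSketch copy = new RawDocSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Hadoop code commonly writes the length prefix with WritableUtils.writeVInt to save space; a fixed writeInt is used here only to keep the sketch dependency-free. Encoding explicitly as UTF-8 on both sides avoids depending on the platform default charset.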

getDocid

public String getDocid()
Returns the globally unique String identifier of the document within the collection (e.g., LA123190-0134).

Specified by:
getDocid in class Indexable
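In TREC collections the identifier is normally stored in the document's <DOCNO> element, for example <DOCNO> LA123190-0134 </DOCNO>. How TrecDocument itself extracts it is not described here; the hypothetical helper DocnoSketch.extractDocid below shows one plausible approach using plain string search, as an illustration only.

```java
// Hypothetical helper (not Cloud9 code): pulls the document identifier
// out of the <DOCNO> element of a raw TREC document.
class DocnoSketch {
    static final String DOCNO_START = "<DOCNO>";
    static final String DOCNO_END = "</DOCNO>";

    /** Returns the trimmed DOCNO text, or null if the element is missing or unterminated. */
    static String extractDocid(String raw) {
        int start = raw.indexOf(DOCNO_START);
        if (start < 0) {
            return null;
        }
        start += DOCNO_START.length();
        int end = raw.indexOf(DOCNO_END, start);
        if (end < 0) {
            return null;
        }
        // TREC files commonly pad the identifier with spaces, so trim it.
        return raw.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String raw = "<DOC>\n<DOCNO> LA123190-0134 </DOCNO>\n<TEXT>...</TEXT>\n</DOC>";
        System.out.println(extractDocid(raw)); // prints LA123190-0134
    }
}
```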

getContent

public String getContent()
Returns the content of the document.

Specified by:
getContent in class Indexable

readDocument

public static void readDocument(TrecDocument doc,
                                String s)
Reads a raw XML string into a TrecDocument object.

Parameters:
doc - the TrecDocument object
s - raw XML string
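readDocument is static and fills an object supplied by the caller instead of returning a new one. That shape appears to fit the common Hadoop practice of reusing a single record instance across many inputs to avoid per-record allocation. The sketch below demonstrates the pattern with a hypothetical, simplified class, MiniTrecDocument; it is not the Cloud9 implementation, and treating the content as the full raw string is an assumption.

```java
// Hypothetical, simplified analogue of TrecDocument (not the Cloud9 class).
// It mirrors the documented readDocument(doc, s) shape: fill an existing object.
class MiniTrecDocument {
    private String content;
    private String docid;

    String getContent() { return content; }

    String getDocid() { return docid; }

    static void readDocument(MiniTrecDocument doc, String s) {
        if (doc == null || s == null) {
            throw new IllegalArgumentException("doc and s must be non-null");
        }
        doc.content = s; // assumed: content is the raw <DOC>...</DOC> string
        int start = s.indexOf("<DOCNO>");
        int end = start < 0 ? -1 : s.indexOf("</DOCNO>", start);
        doc.docid = end < 0 ? null : s.substring(start + "<DOCNO>".length(), end).trim();
    }

    public static void main(String[] args) {
        String[] records = {
            "<DOC><DOCNO> A-1 </DOCNO>first</DOC>",
            "<DOC><DOCNO> B-2 </DOCNO>second</DOC>",
        };
        MiniTrecDocument doc = new MiniTrecDocument(); // one instance, reused for every record
        for (String record : records) {
            readDocument(doc, record); // overwrites the previous record's fields
            System.out.println(doc.getDocid());
        }
    }
}
```

Because every call overwrites all fields, a caller that needs to keep an earlier record must copy its values before the next readDocument call.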