edu.umd.cloud9.collection.trecweb
Class TrecWebDocument

java.lang.Object
  extended by edu.umd.cloud9.collection.Indexable
      extended by edu.umd.cloud9.collection.trecweb.TrecWebDocument
All Implemented Interfaces:
Writable

public class TrecWebDocument
extends Indexable


Field Summary
static String XML_END_TAG
          End delimiter of the document, which is </DOC>.
static String XML_START_TAG
          Start delimiter of the document, which is <DOC>.
 
Constructor Summary
TrecWebDocument()
          Creates an empty TrecWebDocument object.
 
Method Summary
 String getContent()
          Returns the content of this document.
 String getDocid()
          Returns the docid of this document.
static void readDocument(TrecWebDocument doc, String s)
          Reads a raw XML string into a TrecWebDocument object.
 void readFields(DataInput in)
          Deserializes this object.
static boolean readNextTrecWebDocument(TrecWebDocument doc, DataInputStream stream)
          Reads the next document from the given stream into doc.
 void write(DataOutput out)
          Serializes this object.
 
Methods inherited from class edu.umd.cloud9.collection.Indexable
getDisplayContent, getDisplayContentType
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

XML_START_TAG

public static final String XML_START_TAG
Start delimiter of the document, which is <DOC>.

See Also:
Constant Field Values

XML_END_TAG

public static final String XML_END_TAG
End delimiter of the document, which is </DOC>.

See Also:
Constant Field Values
Constructor Detail

TrecWebDocument

public TrecWebDocument()
Creates an empty TrecWebDocument object.

Method Detail

write

public void write(DataOutput out)
           throws IOException
Serializes this object.

Throws:
IOException

readFields

public void readFields(DataInput in)
                throws IOException
Deserializes this object.

Throws:
IOException
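
The write and readFields methods form a serialization pair: write emits the object's state to a DataOutput, and readFields restores it from a DataInput in the same order. The sketch below illustrates that round trip with a hypothetical stand-in class, SketchDoc, built only on java.io. It is not the Cloud9 implementation; the field names and the length-prefixed UTF-8 wire format are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a Writable-style document; not the Cloud9 class.
class SketchDoc {
    String docid = "";
    String content = "";

    // Serializes both fields as length-prefixed UTF-8 (assumed wire format).
    void write(DataOutput out) throws IOException {
        writeString(out, docid);
        writeString(out, content);
    }

    // Deserializes the fields in the same order they were written.
    void readFields(DataInput in) throws IOException {
        docid = readString(in);
        content = readString(in);
    }

    private static void writeString(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    private static String readString(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Writes a document to a byte array and reads it back into a fresh object.
    static SketchDoc roundTrip(SketchDoc original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            SketchDoc copy = new SketchDoc();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading fields back in exactly the order they were written is the essential part of the contract; any mismatch silently corrupts the restored object.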

getDocid

public String getDocid()
Returns the docid of this document.

Specified by:
getDocid in class Indexable

getContent

public String getContent()
Returns the content of this document.

Specified by:
getContent in class Indexable

readDocument

public static void readDocument(TrecWebDocument doc,
                                String s)
Reads a raw XML string into a TrecWebDocument object.

Parameters:
doc - the TrecWebDocument object
s - raw XML string
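
As a rough illustration of what parsing a raw document string can involve, the sketch below pulls a docid and body out of a TREC-style record. The class RawDocSketch and its methods are hypothetical, and the layout it assumes (a DOCNO element holding the docid, and a DOCHDR block preceding the body) is an assumption about the input, not a description of what readDocument actually does.

```java
// Hypothetical helper; shows one way a readDocument-style parser might work.
class RawDocSketch {
    static final String END_TAG = "</DOC>";

    // Returns the trimmed text between the first open/close pair,
    // or null if either delimiter is missing.
    static String between(String s, String open, String close) {
        int i = s.indexOf(open);
        if (i < 0) {
            return null;
        }
        int from = i + open.length();
        int j = s.indexOf(close, from);
        if (j < 0) {
            return null;
        }
        return s.substring(from, j).trim();
    }

    // Assumes the docid is wrapped in <DOCNO>...</DOCNO>.
    static String docid(String raw) {
        return between(raw, "<DOCNO>", "</DOCNO>");
    }

    // Assumes the body is everything between </DOCHDR> and </DOC>.
    static String body(String raw) {
        return between(raw, "</DOCHDR>", END_TAG);
    }
}
```

Returning null for a missing delimiter keeps malformed records distinguishable from records with an empty field.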

readNextTrecWebDocument

public static boolean readNextTrecWebDocument(TrecWebDocument doc,
                                              DataInputStream stream)
                                       throws IOException
Reads the next document from the given stream into doc.

Throws:
IOException
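
The boolean return type suggests the usual "read until exhausted" loop, where one document object is reused across calls. The sketch below shows that pattern with a hypothetical class, DocStreamSketch, which scans for <DOC> and </DOC> delimiters. It is an assumption-laden illustration, not the library's code: it reads from a BufferedReader rather than a DataInputStream, assumes each delimiter sits on its own line, and assumes false means no complete document remains.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader; mirrors the reuse-one-object, return-boolean pattern.
class DocStreamSketch {
    // Fills 'doc' with the next <DOC>...</DOC> block. Returns false when the
    // input ends before a complete block is found (assumed semantics).
    static boolean readNext(StringBuilder doc, BufferedReader in) throws IOException {
        doc.setLength(0);
        boolean inside = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (!inside) {
                if (!line.trim().equals("<DOC>")) {
                    continue; // skip anything before the start delimiter
                }
                inside = true;
            }
            doc.append(line).append('\n');
            if (line.trim().equals("</DOC>")) {
                return true;
            }
        }
        return false;
    }

    // Collects every complete block found in the given text.
    static List<String> readAll(String text) {
        List<String> docs = new ArrayList<>();
        StringBuilder doc = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            while (readNext(doc, in)) {
                docs.add(doc.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return docs;
    }
}
```

Reusing a single buffer across calls avoids allocating a new object per document, which matters when iterating over a large collection.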