edu.umd.cloud9.collection.trecweb
Class TrecWebDocumentInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<K,V>
      extended by edu.umd.cloud9.collection.IndexableFileInputFormat<LongWritable,TrecWebDocument>
          extended by edu.umd.cloud9.collection.trecweb.TrecWebDocumentInputFormat
All Implemented Interfaces:
InputFormat<LongWritable,TrecWebDocument>

public class TrecWebDocumentInputFormat
extends IndexableFileInputFormat<LongWritable,TrecWebDocument>

Hadoop InputFormat for processing TREC web collections. Each record is a TrecWebDocument keyed by a LongWritable.

Author:
Jimmy Lin

Nested Class Summary
static class TrecWebDocumentInputFormat.TrecWebRecordReader
          Hadoop RecordReader for reading TREC-formatted documents.
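As a rough illustration of what a TREC record reader does, the following stdlib-only sketch scans a character stream for <DOC>...</DOC> blocks and reports each one with the character offset of its start tag. It is a hypothetical simplification, not Cloud9's TrecWebRecordReader. TrecDocScanner, Record, and readAll are illustrative names. The sketch assumes the <DOC> and </DOC> delimiters of TREC-formatted files, ignores input-split boundaries, byte encodings, and compression, and counts offsets in characters rather than bytes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified illustration of splitting TREC-formatted text into
 * <DOC>...</DOC> records. Not Cloud9's TrecWebRecordReader: it ignores input
 * splits, byte encodings, and compression, and measures offsets in chars.
 */
public class TrecDocScanner {
    private static final String START_TAG = "<DOC>";
    private static final String END_TAG = "</DOC>";

    /** One complete record: char offset of its start tag, plus its full text. */
    public static final class Record {
        public final long offset;
        public final String text;

        Record(long offset, String text) {
            this.offset = offset;
            this.text = text;
        }
    }

    /** Returns every complete <DOC>...</DOC> record in the stream, in order. */
    public static List<Record> readAll(Reader in) {
        List<Record> records = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a record
        long pos = 0;                 // chars consumed so far
        long recordStart = -1;        // offset of the current record's <DOC>
        int matched = 0;              // chars of the awaited tag matched so far
        try {
            int c;
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                pos++;
                // Outside a record we wait for <DOC>; inside, for </DOC>.
                String tag = (current == null) ? START_TAG : END_TAG;
                if (current != null) {
                    current.append(ch);
                }
                // Restarting on mismatch this simply is correct here because
                // '<' occurs only as the first character of each tag.
                if (ch == tag.charAt(matched)) {
                    matched++;
                } else {
                    matched = (ch == tag.charAt(0)) ? 1 : 0;
                }
                if (matched == tag.length()) {
                    matched = 0;
                    if (current == null) {
                        recordStart = pos - START_TAG.length();
                        current = new StringBuilder(START_TAG);
                    } else {
                        records.add(new Record(recordStart, current.toString()));
                        current = null;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A trailing record without </DOC> is discarded.
        return records;
    }

    public static void main(String[] args) {
        String sample = "junk<DOC><DOCNO>d1</DOCNO>a</DOC>\n<DOC><DOCNO>d2</DOCNO></DOC>";
        for (Record r : readAll(new StringReader(sample))) {
            System.out.println(r.offset + " " + r.text);
        }
    }
}
```

Records missing a closing </DOC> are dropped, so a truncated input yields only its complete documents. The sketch reads one char at a time, so a file-backed Reader should be wrapped in a BufferedReader. A production reader would typically work on bytes from a FileSplit and respect the split's boundaries instead.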
 
Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
TrecWebDocumentInputFormat()
           
 
Method Summary
 RecordReader<LongWritable,TrecWebDocument> getRecordReader(InputSplit inputSplit, JobConf conf, Reporter reporter)
          Returns a RecordReader that reads TrecWebDocument records, keyed by LongWritable, from the given input split.
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, getInputPathFilter, getInputPaths, getSplits, setInputPathFilter, setInputPaths, setInputPaths
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TrecWebDocumentInputFormat

public TrecWebDocumentInputFormat()

Method Detail

getRecordReader

public RecordReader<LongWritable,TrecWebDocument> getRecordReader(InputSplit inputSplit,
                                                                  JobConf conf,
                                                                  Reporter reporter)
                                                           throws IOException
Returns a RecordReader that reads TrecWebDocument records, keyed by LongWritable, from the given input split.

Specified by:
getRecordReader in interface InputFormat<LongWritable,TrecWebDocument>
Specified by:
getRecordReader in class FileInputFormat<LongWritable,TrecWebDocument>
Throws:
IOException
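The reader returned by getRecordReader follows the old org.apache.hadoop.mapred.RecordReader contract. The caller obtains a key and a value once through createKey() and createValue(), then calls next(key, value) repeatedly; each call overwrites those same objects until it returns false. Inside a MapReduce job the framework runs this loop itself. The sketch below only makes the contract concrete without a Hadoop dependency. SimpleRecordReader, InMemoryReader, MutableLong, and MutableText are hypothetical stand-ins that mirror the Hadoop method shapes; they are not LongWritable, TrecWebDocument, or the Hadoop interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordReaderLoopDemo {
    /**
     * Hypothetical stand-in whose methods mirror the shapes of
     * org.apache.hadoop.mapred.RecordReader; not the Hadoop type itself.
     */
    interface SimpleRecordReader<K, V> {
        boolean next(K key, V value) throws IOException;
        K createKey();
        V createValue();
        long getPos() throws IOException;
        float getProgress() throws IOException;
        void close() throws IOException;
    }

    /** Hypothetical mutable key holder (the role LongWritable plays in Hadoop). */
    static final class MutableLong {
        long value;
    }

    /** Hypothetical mutable value holder (a stand-in for a document object). */
    static final class MutableText {
        final StringBuilder text = new StringBuilder();
    }

    /** Toy reader over an in-memory list; the key is the record's index. */
    static final class InMemoryReader implements SimpleRecordReader<MutableLong, MutableText> {
        private final List<String> docs;
        private int index = 0;

        InMemoryReader(List<String> docs) {
            this.docs = docs;
        }

        @Override
        public boolean next(MutableLong key, MutableText value) {
            if (index >= docs.size()) {
                return false; // end of input
            }
            // Overwrite the caller's objects instead of allocating new ones.
            key.value = index;
            value.text.setLength(0);
            value.text.append(docs.get(index));
            index++;
            return true;
        }

        @Override public MutableLong createKey() { return new MutableLong(); }
        @Override public MutableText createValue() { return new MutableText(); }
        @Override public long getPos() { return index; }
        @Override public float getProgress() {
            return docs.isEmpty() ? 1.0f : (float) index / docs.size();
        }
        @Override public void close() { }
    }

    /** Consumer loop: allocate key/value once and reuse them on every next() call. */
    static List<String> drain(SimpleRecordReader<MutableLong, MutableText> reader) {
        List<String> out = new ArrayList<>();
        MutableLong key = reader.createKey();
        MutableText value = reader.createValue();
        try {
            try {
                while (reader.next(key, value)) {
                    // Copy out what you keep: the same objects are refilled next time.
                    out.add(key.value + ":" + value.text);
                }
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(new InMemoryReader(Arrays.asList("first doc", "second doc"))));
    }
}
```

Reusing one key and one value object avoids an allocation per record, which is why the contract hands caller-owned objects to next(). The flip side is that any record data a consumer wants to keep must be copied before the next call.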