edu.umd.cloud9.collection.line
Class TextDocumentInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<LongWritable,TextDocument>
      extended by edu.umd.cloud9.collection.line.TextDocumentInputFormat
All Implemented Interfaces:
InputFormat<LongWritable,TextDocument>, JobConfigurable

public class TextDocumentInputFormat
extends FileInputFormat<LongWritable,TextDocument>
implements JobConfigurable

Hadoop InputFormat for processing a simple line-oriented collection. Each document in the collection occupies a single line of text: the docid, followed by a tab, followed by the document contents. Because the tab and the newline serve as delimiters, the document contents cannot contain embedded tabs or newlines.
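
The line layout described above can be illustrated with a small standalone sketch. It does not use Hadoop or Cloud9 and is not the code of this class or its record reader; the class name LineDocumentSketch and the methods toLine and parse are hypothetical, and both replacing embedded tabs and newlines with a space and splitting at the first tab are assumptions made for illustration.

```java
// Hypothetical helper, not part of Cloud9: illustrates the
// "docid<TAB>contents" one-document-per-line layout.
final class LineDocumentSketch {

    private LineDocumentSketch() {}

    /**
     * Builds one collection line. Assumption: runs of embedded tabs,
     * carriage returns, and newlines in the contents are collapsed to a
     * single space so the result stays on one line with a single separator.
     */
    static String toLine(String docid, String contents) {
        if (docid.isEmpty() || docid.matches(".*[\\t\\r\\n].*")) {
            throw new IllegalArgumentException("docid must be non-empty and free of tabs/newlines");
        }
        String cleaned = contents.replaceAll("[\\t\\r\\n]+", " ");
        return docid + "\t" + cleaned;
    }

    /**
     * Splits a collection line at the first tab (an assumption) and
     * returns {docid, contents}.
     */
    static String[] parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("no tab separator in line: " + line);
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String line = toLine("doc1", "hello\tworld\nagain");
        String[] parts = parse(line);
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

A writer-side check of this kind is one way to make sure a collection satisfies the no-embedded-tabs-or-newlines constraint before it is handed to this InputFormat.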

Author:
Jimmy Lin

Nested Class Summary
static class TextDocumentInputFormat.TextDocumentLineRecordReader
           
 
Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
TextDocumentInputFormat()
           
 
Method Summary
 void configure(JobConf conf)
           
 RecordReader<LongWritable,TextDocument> getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter)
           
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, getInputPathFilter, getInputPaths, getSplits, setInputPathFilter, setInputPaths, setInputPaths
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextDocumentInputFormat

public TextDocumentInputFormat()

Method Detail

configure

public void configure(JobConf conf)
Specified by:
configure in interface JobConfigurable

getRecordReader

public RecordReader<LongWritable,TextDocument> getRecordReader(InputSplit genericSplit,
                                                               JobConf job,
                                                               Reporter reporter)
                                                        throws IOException
Specified by:
getRecordReader in interface InputFormat<LongWritable,TextDocument>
Specified by:
getRecordReader in class FileInputFormat<LongWritable,TextDocument>
Throws:
IOException