edu.umd.cloud9.collection.line
Class TextDocumentInputFormat
java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<LongWritable,TextDocument>
    edu.umd.cloud9.collection.line.TextDocumentInputFormat
- All Implemented Interfaces:
- InputFormat<LongWritable,TextDocument>, JobConfigurable
public class TextDocumentInputFormat
- extends FileInputFormat<LongWritable,TextDocument>
- implements JobConfigurable
Hadoop InputFormat for processing a simple line-oriented collection. Each
document of the collection occupies a single line of text: the docid,
followed by a tab, followed by the document contents. Note that the document
contents cannot contain embedded tabs or newlines.
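The record format described above can be illustrated with a minimal, stdlib-only sketch that splits one line into its docid and contents. The class name LineDocument and its parse method are hypothetical stand-ins chosen for this example; they are not Cloud9's TextDocument API. Because neither the docid nor the contents may contain a tab, the sketch treats any line without exactly one tab as malformed.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for one document of the collection (not Cloud9's
 * TextDocument): a docid plus the document contents.
 */
final class LineDocument {
    final String docid;
    final String content;

    LineDocument(String docid, String content) {
        this.docid = docid;
        this.content = content;
    }

    /**
     * Parses one line of the form "docid<TAB>contents". Since the format
     * forbids embedded tabs and newlines, a valid line has exactly one tab.
     */
    static LineDocument parse(String line) {
        Objects.requireNonNull(line, "line");
        if (line.indexOf('\n') >= 0 || line.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("document must fit on a single line");
        }
        int tab = line.indexOf('\t');
        if (tab < 0 || line.indexOf('\t', tab + 1) >= 0) {
            throw new IllegalArgumentException("expected exactly one tab: " + line);
        }
        return new LineDocument(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        LineDocument doc = parse("doc42\tthe quick brown fox");
        // Prints: doc42 -> the quick brown fox
        System.out.println(doc.docid + " -> " + doc.content);
    }
}
```

Rejecting extra tabs, rather than splitting on the first one, is a design choice of this sketch that mirrors the stated constraint; the actual class may handle such input differently.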
- Author:
- Jimmy Lin
TextDocumentInputFormat
public TextDocumentInputFormat()
configure
public void configure(JobConf conf)
- Specified by:
configure in interface JobConfigurable
getRecordReader
public RecordReader<LongWritable,TextDocument> getRecordReader(InputSplit genericSplit,
JobConf job,
Reporter reporter)
throws IOException
- Specified by:
getRecordReader in interface InputFormat<LongWritable,TextDocument>
- Specified by:
getRecordReader in class FileInputFormat<LongWritable,TextDocument>
- Throws:
IOException
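getRecordReader returns (LongWritable, TextDocument) pairs. The following sketch models that contract in plain Java under an explicit assumption: as with Hadoop's TextInputFormat, the LongWritable key is taken to be the byte offset at which each line starts. This page does not document the key, so treat that as unverified. OffsetRecord and LineDocumentReader are hypothetical names. The sketch reads a whole in-memory UTF-8 buffer, ignores input splits, and assumes LF line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical (key, value) pair mirroring RecordReader<LongWritable, TextDocument>. */
final class OffsetRecord {
    final long offset;    // assumed key: byte offset of the line's first byte
    final String docid;
    final String content;

    OffsetRecord(long offset, String docid, String content) {
        this.offset = offset;
        this.docid = docid;
        this.content = content;
    }
}

final class LineDocumentReader {
    /**
     * Splits UTF-8 bytes on '\n' and turns each non-empty line into a record
     * keyed by the byte offset where that line begins. Empty lines are skipped.
     */
    static List<OffsetRecord> read(byte[] data) {
        List<OffsetRecord> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            if (i == data.length || data[i] == '\n') {
                if (i > start) {
                    String line = new String(data, start, i - start, StandardCharsets.UTF_8);
                    int tab = line.indexOf('\t');
                    if (tab < 0) {
                        throw new IllegalArgumentException("missing tab at offset " + start);
                    }
                    records.add(new OffsetRecord(start, line.substring(0, tab), line.substring(tab + 1)));
                }
                start = i + 1;  // next line begins after the newline byte
            }
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "d1\talpha\nd2\tbeta gamma\n".getBytes(StandardCharsets.UTF_8);
        for (OffsetRecord r : read(data)) {
            System.out.println(r.offset + "\t" + r.docid + "\t" + r.content);
        }
    }
}
```

Counting offsets in bytes rather than characters matters once documents contain multi-byte UTF-8 text. Comparing raw bytes against '\n' is safe here because UTF-8 continuation bytes never equal 0x0A.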