edu.umd.cloud9.collection.wikipedia
Class WikipediaPageInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<K,V>
      extended by edu.umd.cloud9.collection.IndexableFileInputFormat<LongWritable,WikipediaPage>
          extended by edu.umd.cloud9.collection.wikipedia.WikipediaPageInputFormat
All Implemented Interfaces:
InputFormat<LongWritable,WikipediaPage>

public class WikipediaPageInputFormat
extends IndexableFileInputFormat<LongWritable,WikipediaPage>

Hadoop InputFormat for processing Wikipedia pages from the XML dumps. Records are keyed by LongWritable and carry WikipediaPage values.

Author:
Jimmy Lin

Nested Class Summary
static class WikipediaPageInputFormat.WikipediaPageRecordReader
          Hadoop RecordReader for reading Wikipedia pages from the XML dumps.
 
Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
WikipediaPageInputFormat()
           
 
Method Summary
 RecordReader<LongWritable,WikipediaPage> getRecordReader(InputSplit inputSplit, JobConf conf, Reporter reporter)
          Returns a RecordReader for this InputFormat.
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, getInputPathFilter, getInputPaths, getSplits, setInputPathFilter, setInputPaths, setInputPaths
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WikipediaPageInputFormat

public WikipediaPageInputFormat()

Method Detail

getRecordReader

public RecordReader<LongWritable,WikipediaPage> getRecordReader(InputSplit inputSplit,
                                                                JobConf conf,
                                                                Reporter reporter)
                                                         throws IOException
Returns a RecordReader for this InputFormat. The reader produces LongWritable keys and WikipediaPage values from the given input split.

Specified by:
getRecordReader in interface InputFormat<LongWritable,WikipediaPage>
Specified by:
getRecordReader in class FileInputFormat<LongWritable,WikipediaPage>
Throws:
IOException
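
Example (illustrative sketch only):
The page above does not describe how the record reader locates pages inside a dump, so the following is a minimal, library-free sketch of the general idea, not the Cloud9 implementation. It assumes each page is delimited by literal <page> and </page> tags, and it assumes the key is the character offset at which the page starts. The names PageSplitSketch, PageRecord, and readPages are hypothetical and exist only for this illustration. Plain Java types (long, String) stand in for LongWritable and WikipediaPage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: NOT part of Cloud9 or Hadoop.
class PageSplitSketch {

    // One record: where the page starts (assumed key) and its raw XML (assumed value).
    static final class PageRecord {
        final long offset;
        final String pageXml;

        PageRecord(long offset, String pageXml) {
            this.offset = offset;
            this.pageXml = pageXml;
        }
    }

    // Assumption: pages are delimited by these literal tags.
    static final String START_TAG = "<page>";
    static final String END_TAG = "</page>";

    // Scans the dump text and returns one record per complete <page>...</page> block.
    static List<PageRecord> readPages(String dump) {
        List<PageRecord> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = dump.indexOf(START_TAG, pos);
            if (start < 0) {
                break; // no further pages
            }
            int end = dump.indexOf(END_TAG, start + START_TAG.length());
            if (end < 0) {
                break; // truncated final page: skip it rather than emit a partial record
            }
            end += END_TAG.length();
            records.add(new PageRecord(start, dump.substring(start, end)));
            pos = end; // continue scanning after the page just emitted
        }
        return records;
    }

    public static void main(String[] args) {
        String dump = "<mediawiki>\n"
                + "<page><title>A</title></page>\n"
                + "<page><title>B</title></page>\n"
                + "</mediawiki>";
        for (PageRecord r : readPages(dump)) {
            System.out.println(r.offset + "\t" + r.pageXml);
        }
    }
}
```

In an actual job, the pairs would instead arrive in a mapper as LongWritable keys and WikipediaPage values, supplied by the RecordReader that getRecordReader returns. Splitting on page boundaries rather than on lines is what lets each map call see one whole page.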