edu.umd.cloud9.collection.wikipedia
Class BuildWikipediaDocnoMapping

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by edu.umd.cloud9.collection.wikipedia.BuildWikipediaDocnoMapping
All Implemented Interfaces:
Configurable, Tool

public class BuildWikipediaDocnoMapping
extends Configured
implements Tool

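Tool for building the mapping between Wikipedia internal ids (docids) and sequentially-numbered ints (docnos). The program takes four command-line arguments, which in the sample invocation below are, in order: the path to the Wikipedia XML dump, a temporary output directory, the path of the docno mapping file to write, and an integer parameter (100 in the sample).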

Here's a sample invocation:

 hadoop jar cloud9.jar edu.umd.cloud9.collection.wikipedia.BuildWikipediaDocnoMapping \
   -libjars bliki-core-3.0.15.jar,commons-lang-2.5.jar \
   /user/jimmy/Wikipedia/raw/enwiki-20101011-pages-articles.xml tmp \
   /user/jimmy/Wikipedia/docno-en-20101011.dat 100
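
The idea of mapping docids to docnos can be illustrated with a small standalone sketch. It assumes that docnos are assigned in ascending docid order starting from 1 and that docids are distinct; this page states neither, so treat both as assumptions. The class and method names (DocnoMappingSketch, getDocno, getDocid) are hypothetical and are not part of Cloud9, and the sketch does not read or write the mapping file produced by this tool.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of a docid <-> docno mapping.
 * Assumption: docnos are 1-based positions in the ascending list of docids.
 */
final class DocnoMappingSketch {
    // Distinct docids in ascending order; index i holds the docid for docno i + 1.
    private final int[] sortedDocids;

    DocnoMappingSketch(int[] docids) {
        // Copy before sorting so the caller's array is left untouched.
        this.sortedDocids = Arrays.copyOf(docids, docids.length);
        Arrays.sort(this.sortedDocids);
    }

    /** Returns the 1-based docno for a docid, or -1 if the docid is unknown. */
    int getDocno(int docid) {
        int index = Arrays.binarySearch(sortedDocids, docid);
        return index < 0 ? -1 : index + 1;
    }

    /** Returns the docid for a 1-based docno. */
    int getDocid(int docno) {
        if (docno < 1 || docno > sortedDocids.length) {
            throw new IllegalArgumentException("docno out of range: " + docno);
        }
        return sortedDocids[docno - 1];
    }

    public static void main(String[] args) {
        // Made-up docids, used only to exercise the sketch.
        DocnoMappingSketch mapping = new DocnoMappingSketch(new int[] {25, 12, 39, 290});
        System.out.println("docno of docid 39: " + mapping.getDocno(39));
        System.out.println("docid of docno 1: " + mapping.getDocid(1));
    }
}
```

Keeping the docids in a sorted array makes the docno-to-docid lookup a direct index and the docid-to-docno lookup a binary search, which suits a mapping that is built once and then only read.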
 

Author:
Jimmy Lin

Constructor Summary
BuildWikipediaDocnoMapping()
           
 
Method Summary
static void main(String[] args)
           
 int run(String[] args)
           
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

BuildWikipediaDocnoMapping

public BuildWikipediaDocnoMapping()
Method Detail

run

public int run(String[] args)
        throws Exception
Specified by:
run in interface Tool
Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception