edu.umd.cloud9.collection.clue
Class DemoCountClueWarcRecords

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by edu.umd.cloud9.collection.clue.DemoCountClueWarcRecords
All Implemented Interfaces:
Configurable, Tool

public class DemoCountClueWarcRecords
extends Configured
implements Tool

Simple demo program to count the number of records in the ClueWeb09 collection, from either the original source WARC files or repacked SequenceFiles (controlled by the first command-line parameter). The program also verifies the docid to docno mappings.

The program takes four command-line arguments. Here's a sample invocation:

 hadoop jar cloud9.jar edu.umd.cloud9.collection.clue.DemoCountClueWarcRecords \
   original /umd/collections/ClueWeb09 1 /umd/collections/ClueWeb09/docno-mapping.dat
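
As a rough illustration of the argument shape in the invocation above, the following sketch validates four positional arguments before any work starts. It is not the tool's actual parsing code: `ClueCountArgs` and its field names are hypothetical, `repacked` is an assumed name for the SequenceFile mode (only `original` appears in the sample), and the third argument is kept as an opaque string because its meaning is not stated here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Cloud9: holds the four positional arguments. */
final class ClueCountArgs {
    // "original" comes from the sample invocation; "repacked" is an assumed mode name.
    static final List<String> FORMATS = Arrays.asList("original", "repacked");

    final String format;          // first argument: selects the input format
    final String collectionPath;  // second argument: assumed to be the collection path
    final String third;           // third argument: meaning not stated, kept verbatim
    final String mappingFile;     // fourth argument: assumed to be the docno mapping file

    private ClueCountArgs(String format, String collectionPath, String third, String mappingFile) {
        this.format = format;
        this.collectionPath = collectionPath;
        this.third = third;
        this.mappingFile = mappingFile;
    }

    /** Rejects a wrong argument count or an unknown format value. */
    static ClueCountArgs parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException("expected 4 arguments, got " + args.length);
        }
        if (!FORMATS.contains(args[0])) {
            throw new IllegalArgumentException("unknown format: " + args[0]);
        }
        return new ClueCountArgs(args[0], args[1], args[2], args[3]);
    }
}
```

Failing fast on a malformed command line avoids submitting a cluster job that can only fail later.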
 

Author:
Jimmy Lin

Constructor Summary
DemoCountClueWarcRecords()
          Creates an instance of this tool.
 
Method Summary
static void main(String[] args)
          Dispatches command-line arguments to the tool via the ToolRunner.
 int run(String[] args)
          Runs this tool.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

DemoCountClueWarcRecords

public DemoCountClueWarcRecords()
Creates an instance of this tool.

Method Detail

run

public int run(String[] args)
        throws Exception
Runs this tool.

Specified by:
run in interface Tool
Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Dispatches command-line arguments to the tool via the ToolRunner.

Throws:
Exception
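
The relationship between main and run can be sketched as follows. So that the snippet compiles with the JDK alone, `MiniTool` and `MiniRunner` are hypothetical stand-ins for Hadoop's `Tool` and `ToolRunner`, and `CountDemo` is a toy tool rather than this class. The real `ToolRunner` additionally parses generic Hadoop options and applies them to the configuration before calling run. Passing the return value to `System.exit` is a common convention and is assumed here, not confirmed for this class.

```java
/** Hypothetical stand-in for Hadoop's Tool interface (same run signature). */
interface MiniTool {
    int run(String[] args) throws Exception;
}

/** Hypothetical stand-in for ToolRunner; the real one handles generic options first. */
final class MiniRunner {
    static int run(MiniTool tool, String[] args) throws Exception {
        return tool.run(args);
    }
}

/** Toy tool: returns 0 on success and -1 on a usage error. */
final class CountDemo implements MiniTool {
    @Override
    public int run(String[] args) {
        if (args.length != 4) {
            System.err.println("usage error: expected four arguments");
            return -1;
        }
        // A real tool would configure and submit its job here.
        return 0;
    }

    /** Dispatches the command-line arguments to the tool via the runner. */
    public static void main(String[] args) throws Exception {
        System.exit(MiniRunner.run(new CountDemo(), args));
    }
}
```

Keeping main this thin leaves all job logic in run, so the same tool can be driven from a test or from another program without going through the command line.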