edu.umd.cloud9.collection.wikipedia
Class WikipediaDocnoMapping

java.lang.Object
  extended by edu.umd.cloud9.collection.wikipedia.WikipediaDocnoMapping
All Implemented Interfaces:
DocnoMapping

public class WikipediaDocnoMapping
extends Object
implements DocnoMapping

Provides a mapping between Wikipedia internal ids (docids) and sequentially-numbered ints (docnos).

The main method of this class provides a simple command-line program for accessing docno mappings. Command-line arguments are as follows:

Author:
Jimmy Lin

Constructor Summary
WikipediaDocnoMapping()
          Creates a WikipediaDocnoMapping object.
 
Method Summary
 String getDocid(int docno)
          Returns the docid for a particular docno.
 int getDocno(String docid)
          Returns the docno for a particular docid.
 void loadMapping(Path p, FileSystem fs)
          Loads a mapping file containing the docid to docno mappings.
static void main(String[] args)
          Simple program that provides access to the docno/docid mappings.
static int[] readDocnoMappingData(Path p, FileSystem fs)
          Reads a mappings file into memory.
static void writeDocnoMappingData(String inputFile, int n, String outputFile)
          Creates a mappings file from the contents of a flat text file containing docid to docno mappings.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WikipediaDocnoMapping

public WikipediaDocnoMapping()
Creates a WikipediaDocnoMapping object.

Method Detail

getDocno

public int getDocno(String docid)
Description copied from interface: DocnoMapping
Returns the docno for a particular docid.

Specified by:
getDocno in interface DocnoMapping
Parameters:
docid - the docid
Returns:
the docno for the docid
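A minimal sketch of how a docid could be resolved to a docno, assuming (this page does not confirm it) that the loaded Wikipedia docids are numeric, held in an int[] sorted in ascending order, and that each docid's array index is its docno, as described under readDocnoMappingData. The class name DocnoLookupSketch and the -1 "not found" convention are hypothetical; this is not the Cloud9 implementation.

```java
import java.util.Arrays;

// Hypothetical sketch, not the Cloud9 implementation.
final class DocnoLookupSketch {
    // Assumption: docids are numeric and sorted ascending; index == docno.
    private final int[] docids;

    DocnoLookupSketch(int[] sortedDocids) {
        this.docids = sortedDocids;
    }

    // Binary search over the sorted docids; returns -1 if the docid is absent
    // (the -1 convention is an assumption of this sketch).
    int getDocno(String docid) {
        int id = Integer.parseInt(docid);
        int pos = Arrays.binarySearch(docids, id);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        DocnoLookupSketch m = new DocnoLookupSketch(new int[] {10, 12, 25, 303});
        System.out.println(m.getDocno("25")); // 2
        System.out.println(m.getDocno("11")); // -1
    }
}
```

Binary search keeps the lookup at O(log n) without a separate hash table, which only works if the docids really are stored in sorted order.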

getDocid

public String getDocid(int docno)
Description copied from interface: DocnoMapping
Returns the docid for a particular docno.

Specified by:
getDocid in interface DocnoMapping
Parameters:
docno - the docno
Returns:
the docid for the docno
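The reverse direction can be sketched as a direct array lookup under the same assumption (index position == docno, as documented for readDocnoMappingData). The class name DocidLookupSketch and the choice to throw IllegalArgumentException for an out-of-range docno are hypothetical; the page does not say how invalid docnos are handled.

```java
// Hypothetical sketch, not the Cloud9 implementation.
final class DocidLookupSketch {
    // Assumption: docids[docno] holds the docid for that docno.
    private final int[] docids;

    DocidLookupSketch(int[] docids) {
        this.docids = docids;
    }

    // Returns the docid as a String, matching getDocid's return type;
    // rejects docnos outside the array (assumed behavior).
    String getDocid(int docno) {
        if (docno < 0 || docno >= docids.length) {
            throw new IllegalArgumentException("docno out of range: " + docno);
        }
        return Integer.toString(docids[docno]);
    }

    public static void main(String[] args) {
        DocidLookupSketch m = new DocidLookupSketch(new int[] {10, 12, 25, 303});
        System.out.println(m.getDocid(3)); // 303
    }
}
```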

loadMapping

public void loadMapping(Path p,
                        FileSystem fs)
                 throws IOException
Description copied from interface: DocnoMapping
Loads a mapping file containing the docid to docno mappings.

Specified by:
loadMapping in interface DocnoMapping
Parameters:
p - path to the mappings file
fs - appropriate FileSystem
Throws:
IOException

writeDocnoMappingData

public static void writeDocnoMappingData(String inputFile,
                                         int n,
                                         String outputFile)
                                  throws IOException
Creates a mappings file from the contents of a flat text file containing docid to docno mappings. This method is used by BuildWikipediaDocnoMapping internally.

Parameters:
inputFile - flat text file containing docid to docno mappings
outputFile - output mappings file
Throws:
IOException
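A rough sketch of converting a flat text mapping file into a compact binary one. Both file layouts are assumptions, not the documented Cloud9 formats: each input line is taken to be "docid<TAB>docno" with docnos 0, 1, 2, ... in file order, and the output is a 4-byte count followed by each docid as a 4-byte big-endian int. java.nio.file.Path stands in for the String file names, and the n parameter is left out because its meaning is not documented on this page. WriteMappingSketch and writeMappingSketch are hypothetical names.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: input and output layouts are assumptions.
final class WriteMappingSketch {
    static void writeMappingSketch(Path input, Path output) throws IOException {
        List<Integer> docids = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) continue; // skip blank lines
                String[] fields = line.split("\t");
                int docid = Integer.parseInt(fields[0]);
                int docno = Integer.parseInt(fields[1]);
                // Assumption: docnos appear in order, so each equals the list position.
                if (docno != docids.size()) {
                    throw new IOException("expected docno " + docids.size() + " but found " + docno);
                }
                docids.add(docid);
            }
        }
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(output)))) {
            out.writeInt(docids.size()); // header: number of docids
            for (int docid : docids) {
                out.writeInt(docid); // position in the file == docno
            }
        }
    }
}
```

Storing only the docids, with position implying the docno, keeps the file at four bytes per document.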

readDocnoMappingData

public static int[] readDocnoMappingData(Path p,
                                         FileSystem fs)
                                  throws IOException
Reads a mappings file into memory.

Parameters:
p - path to the mappings file
fs - appropriate FileSystem
Returns:
an array of docids; the index position of each docid is its docno
Throws:
IOException
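A matching reader sketch, assuming (not confirmed by this page) that the mappings file holds a 4-byte count followed by that many 4-byte big-endian docids. java.nio.file.Path replaces Hadoop's Path and FileSystem so the example runs with the standard library alone; ReadMappingSketch and readMappingSketch are hypothetical names. The returned array follows the documented contract: the index position of each docid is its docno.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: the binary layout is an assumption.
final class ReadMappingSketch {
    static int[] readMappingSketch(Path p) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(p)))) {
            int n = in.readInt(); // header: number of docids
            int[] docids = new int[n];
            for (int i = 0; i < n; i++) {
                docids[i] = in.readInt(); // index i is the docno of this docid
            }
            return docids;
        }
    }
}
```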

main

public static void main(String[] args)
                 throws IOException
Simple program that provides access to the docno/docid mappings.

Parameters:
args - command-line arguments
Throws:
IOException