edu.umd.cloud9.collection
Interface DocnoMapping

All Known Implementing Classes:
Aquaint2DocnoMapping, ClueWarcDocnoMapping, Gov2DocnoMapping, MedlineDocnoMapping, TextDocnoMapping, TrecDocnoMapping, WikipediaDocnoMapping, Wt10gDocnoMapping

public interface DocnoMapping

Interface for an object that maintains the mapping between docids and docnos. A docid is a globally unique String identifier for a document in the collection. Many information retrieval algorithms require documents to be numbered sequentially, so each document in the collection must also be assigned a unique integer identifier, which is its docno. Typically, the docid-to-docno mappings are stored in a mappings file, which concrete implementations of this interface load into memory.
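
As a minimal sketch of the idea, the classes below keep the docids in a sorted array, so a docid's docno is its position in that array plus one. SimpleDocnoMapping and SortedArrayDocnoMapping are hypothetical names, loadMapping is left out so the sketch runs without Hadoop on the classpath, and returning -1 for an unknown docid is an assumption; this interface does not specify that behavior.

```java
import java.util.Arrays;

// Hypothetical, Hadoop-free stand-in for DocnoMapping (loadMapping omitted).
interface SimpleDocnoMapping {
    int getDocno(String docid);
    String getDocid(int docno);
}

// Illustrative in-memory mapping: docids are sorted and numbered from 1,
// so docno = (index in the sorted array) + 1. Assumes docids are unique.
class SortedArrayDocnoMapping implements SimpleDocnoMapping {
    private final String[] docids;

    SortedArrayDocnoMapping(String... ids) {
        this.docids = ids.clone();
        Arrays.sort(this.docids);
    }

    @Override
    public int getDocno(String docid) {
        int index = Arrays.binarySearch(docids, docid);
        // Assumption: -1 signals an unknown docid.
        return index >= 0 ? index + 1 : -1;
    }

    @Override
    public String getDocid(int docno) {
        if (docno < 1 || docno > docids.length) {
            throw new IllegalArgumentException("docno out of range: " + docno);
        }
        return docids[docno - 1];
    }
}
```

Sorting makes the docno assignment deterministic for a given set of docids and lets getDocno use binary search instead of a separate hash table; other implementations may well number documents in collection order instead.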

Unless there are compelling reasons otherwise, docnos should be numbered starting from one rather than zero, because zero cannot be represented in many compression schemes commonly used in information retrieval (e.g., Golomb codes).
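
To make the numbering argument concrete, the sketch below compresses a postings list by gamma-coding the gaps between consecutive docnos. The first gap is the first docno itself, so a collection numbered from zero would produce a gap of 0 that the code cannot represent. It uses Elias gamma codes in place of the Golomb codes mentioned above, because they are shorter to show and are likewise defined only for integers of at least one; GammaPostings and its methods are hypothetical names.

```java
// Illustrative only: Elias gamma coding of d-gaps, swapped in for Golomb codes.
class GammaPostings {
    // Encodes x >= 1 as floor(log2 x) zeros followed by x in binary.
    static String gamma(int x) {
        if (x < 1) {
            throw new IllegalArgumentException("gamma code undefined for " + x);
        }
        String binary = Integer.toBinaryString(x);
        return "0".repeat(binary.length() - 1) + binary;
    }

    // Encodes a strictly increasing list of docnos as gamma-coded d-gaps.
    // The first gap equals the first docno, so docno 0 is unencodable.
    static String encodePostings(int[] docnos) {
        StringBuilder out = new StringBuilder();
        int previous = 0;
        for (int docno : docnos) {
            out.append(gamma(docno - previous));
            previous = docno;
        }
        return out.toString();
    }
}
```

For example, docnos 1, 3, 7 give gaps 1, 2, 4, which encode as "1", "010" and "00100"; a list beginning with docno 0 throws an IllegalArgumentException.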


Method Summary
 String getDocid(int docno)
          Returns the docid for a particular docno.
 int getDocno(String docid)
          Returns the docno for a particular docid.
 void loadMapping(Path p, FileSystem fs)
          Loads a mapping file containing the docid to docno mappings.
 

Method Detail

getDocno

int getDocno(String docid)
Returns the docno for a particular docid.

Parameters:
docid - the docid
Returns:
the docno for the docid

getDocid

String getDocid(int docno)
Returns the docid for a particular docno.

Parameters:
docno - the docno
Returns:
the docid for the docno
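
getDocno and getDocid are intended to be inverses over the collection's docnos. The hypothetical helper below checks that property for docnos 1 through n; it takes the two lookups as functions, so with a loaded mapping it could be called as roundTrips(n, mapping::getDocid, mapping::getDocno). Because DocnoMapping does not expose the number of documents, n is assumed to be supplied by the caller.

```java
import java.util.function.IntFunction;
import java.util.function.ToIntFunction;

// Hypothetical sanity check for a docid <-> docno mapping numbered from 1.
class DocnoMappingChecks {
    // Returns true if every docno in 1..n maps to a docid that maps back to it.
    static boolean roundTrips(int n, IntFunction<String> getDocid, ToIntFunction<String> getDocno) {
        for (int docno = 1; docno <= n; docno++) {
            String docid = getDocid.apply(docno);
            if (docid == null || getDocno.applyAsInt(docid) != docno) {
                return false;
            }
        }
        return true;
    }
}
```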

loadMapping

void loadMapping(Path p,
                 FileSystem fs)
                 throws IOException
Loads a mapping file containing the docid to docno mappings.

Parameters:
p - path to the mappings file
fs - the FileSystem on which the mappings file resides
Throws:
IOException - if the mappings file cannot be read
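
The on-disk format of the mappings file is left to each implementing class. As a sketch only, the class below assumes a hypothetical plain-text format with one docid per line, where line k holds the docid for docno k, and reads it from the local file system with java.nio; its Path is java.nio.file.Path, not the Hadoop Path in the signature above. An implementation of the real interface could apply the same parsing to the stream returned by fs.open(p).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader: one docid per line; line k (from 1) holds docno k's docid.
class LineFileDocnoMapping {
    private String[] docids = new String[0];
    private final Map<String, Integer> docnos = new HashMap<>();

    void loadMapping(Path p) throws IOException {
        List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8);
        docids = lines.toArray(new String[0]);
        docnos.clear();
        for (int i = 0; i < docids.length; i++) {
            docnos.put(docids[i], i + 1); // docnos start at 1
        }
    }

    int getDocno(String docid) {
        Integer docno = docnos.get(docid);
        return docno == null ? -1 : docno; // assumption: -1 for an unknown docid
    }

    String getDocid(int docno) {
        if (docno < 1 || docno > docids.length) {
            throw new IllegalArgumentException("docno out of range: " + docno);
        }
        return docids[docno - 1];
    }
}
```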