| Class Summary | |
|---|---|
| BuildWikipediaDocnoMapping | Tool for building the mapping between Wikipedia internal ids (docids) and sequentially-numbered ints (docnos). |
| BuildWikipediaForwardIndex | Tool for building a document forward index for Wikipedia. |
| BuildWikipediaLinkGraph | Tool for extracting the link graph out of Wikipedia. |
| DemoCountWikipediaPages | Tool for counting the number of pages in a particular Wikipedia XML dump file. |
| DumpWikipediaToPlainText | Tool for taking a Wikipedia XML dump file and writing the articles to a flat text file (article title and content, separated by a tab). |
| LookupWikipediaArticle | Tool for providing command-line access to page titles given either a docno or a docid. |
| RepackWikipedia | Tool for repacking Wikipedia XML dumps into SequenceFiles. |
| WikipediaDocnoMapping | Provides a mapping between Wikipedia internal ids (docids) and sequentially-numbered ints (docnos). |
| WikipediaForwardIndex | Forward index for Wikipedia collections. |
| WikipediaPage | A page from Wikipedia. |
| WikipediaPageInputFormat | Hadoop InputFormat for processing Wikipedia pages from the XML dumps. |
| WikipediaPageInputFormat.WikipediaPageRecordReader | Hadoop RecordReader for reading Wikipedia pages from the XML dumps. |
| WikipediaPagesBz2InputStream | Class for working with bz2-compressed Wikipedia article dump files on local disk. |

Provides classes for working with Wikipedia XML dumps.
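
The docid-to-docno mapping described in the class summary can be pictured with a small sketch. This is not the `WikipediaDocnoMapping` class from this package: the class name `DocnoMappingSketch`, its methods, the 1-based numbering, and the sample docids are all assumptions made for illustration. It keeps the docids in a sorted array, so a docid's docno is its position in that array and the reverse lookup is a plain array access.

```java
import java.util.Arrays;

// Hypothetical illustration only; NOT the WikipediaDocnoMapping API.
final class DocnoMappingSketch {
    // Sorted, unique docids. The docno of a docid is its 1-based position here
    // (starting at 1 is an assumption of this sketch).
    private final int[] sortedDocids;

    DocnoMappingSketch(int[] docids) {
        sortedDocids = docids.clone();   // defensive copy; caller's array is untouched
        Arrays.sort(sortedDocids);
    }

    // Returns the docno for a docid, or -1 if the docid is not in the collection.
    int getDocno(int docid) {
        int pos = Arrays.binarySearch(sortedDocids, docid);
        return pos < 0 ? -1 : pos + 1;
    }

    // Returns the docid for a docno, or -1 if the docno is out of range.
    int getDocid(int docno) {
        if (docno < 1 || docno > sortedDocids.length) {
            return -1;
        }
        return sortedDocids[docno - 1];
    }

    public static void main(String[] args) {
        // Sample docids are made up for the example.
        DocnoMappingSketch mapping = new DocnoMappingSketch(new int[] {25, 12, 39, 290});
        System.out.println(mapping.getDocno(39)); // prints 3
        System.out.println(mapping.getDocid(1));  // prints 12
    }
}
```

Sequential docnos let downstream jobs index documents with dense arrays instead of hash maps keyed by sparse docids; the sketch assumes every docid appears only once.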