edu.umd.cloud9.collection.wikipedia
Class WikipediaPagesBz2InputStream
java.lang.Object
edu.umd.cloud9.collection.wikipedia.WikipediaPagesBz2InputStream
public class WikipediaPagesBz2InputStream
- extends Object
Class for working with bz2-compressed Wikipedia article dump files on local
disk.
- Author:
- Jimmy Lin
WikipediaPagesBz2InputStream
public WikipediaPagesBz2InputStream(String file)
throws IOException
- Creates an input stream for reading Wikipedia articles from a
bz2-compressed dump file.
- Parameters:
file - path to the bz2-compressed dump file on local disk
- Throws:
IOException
readNext
public boolean readNext(WikipediaPage page)
throws IOException
- Reads the next Wikipedia page.
- Parameters:
page - WikipediaPage object to read into
- Returns:
true if a page was successfully read
- Throws:
IOException
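The readNext contract above suggests a read loop that reuses one page object until the method stops returning true. The following is a minimal sketch of that pattern, not code from the library: Page and PageSource are hypothetical stand-ins for WikipediaPage and WikipediaPagesBz2InputStream so that the example runs without Cloud9 or a dump file, and it assumes that a false return means no further pages are available.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadNextLoopSketch {
    /** Hypothetical stand-in for WikipediaPage: a reusable holder filled in by the reader. */
    static class Page {
        String content;
    }

    /** Hypothetical stand-in for the stream, mirroring the documented readNext shape. */
    static class PageSource {
        private final Iterator<String> pages;

        PageSource(List<String> pages) {
            this.pages = pages.iterator();
        }

        /** Fills the supplied page and returns true, or returns false when nothing is left. */
        boolean readNext(Page page) throws IOException {
            if (!pages.hasNext()) {
                return false;
            }
            page.content = pages.next();
            return true;
        }
    }

    /** Drains the source, copying each page's content out of the single reused Page object. */
    static List<String> readAll(PageSource source) throws IOException {
        List<String> contents = new ArrayList<>();
        Page page = new Page(); // allocated once, overwritten on every call
        while (source.readNext(page)) {
            contents.add(page.content);
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        PageSource source = new PageSource(Arrays.asList("<page>A</page>", "<page>B</page>"));
        for (String content : readAll(source)) {
            System.out.println(content);
        }
    }
}
```

With the real classes, the same loop would presumably be driven by a WikipediaPagesBz2InputStream constructed from the dump file path, with a WikipediaPage instance passed to readNext. Because the page object is overwritten on each call, anything needed later should be copied out before the next read.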
main
public static void main(String[] args)
throws Exception
- Throws:
Exception