edu.umd.cloud9.collection.wikipedia
Class WikipediaPagesBz2InputStream

java.lang.Object
  extended by edu.umd.cloud9.collection.wikipedia.WikipediaPagesBz2InputStream

public class WikipediaPagesBz2InputStream
extends Object

Class for working with bz2-compressed Wikipedia article dump files on local disk.

Author:
Jimmy Lin

Constructor Summary
WikipediaPagesBz2InputStream(String file)
          Creates an input stream for reading Wikipedia articles from a bz2-compressed dump file.
 
Method Summary
static void main(String[] args)
           
 boolean readNext(WikipediaPage page)
          Reads the next Wikipedia page.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WikipediaPagesBz2InputStream

public WikipediaPagesBz2InputStream(String file)
                             throws IOException
Creates an input stream for reading Wikipedia articles from a bz2-compressed dump file.

Parameters:
file - path to dump file
Throws:
IOException

Method Detail

readNext

public boolean readNext(WikipediaPage page)
                 throws IOException
Reads the next Wikipedia page from the dump file into the given WikipediaPage object.

Parameters:
page - WikipediaPage object to read into
Returns:
true if a page was successfully read; false otherwise
Throws:
IOException
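
The readNext signature suggests a common calling pattern: allocate one page object, pass it to readNext in a loop, and stop when the method returns false. The following is a minimal, self-contained sketch of that pattern only. It does not use the Cloud9 classes: Page and PageReader are hypothetical stand-ins for WikipediaPage and WikipediaPagesBz2InputStream, the input is plain uncompressed text rather than a bz2 file, and the line-based parsing is an assumption made for illustration, not the library's implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadNextSketch {

    /** Hypothetical stand-in for WikipediaPage: holds the raw XML of one page. */
    static class Page {
        String xml = "";

        /** Returns the text between <title> and </title>, or "" if absent. */
        String title() {
            int start = xml.indexOf("<title>");
            int end = xml.indexOf("</title>");
            if (start < 0 || end < start) return "";
            return xml.substring(start + "<title>".length(), end);
        }
    }

    /** Hypothetical stand-in for the input stream; reads plain, uncompressed text. */
    static class PageReader {
        private final BufferedReader in;

        PageReader(BufferedReader in) { this.in = in; }

        /** Fills the given page with the next page block; returns false when no further page starts. */
        boolean readNext(Page page) throws IOException {
            String line;
            // Skip ahead to the next opening <page> tag.
            while ((line = in.readLine()) != null) {
                if (line.trim().equals("<page>")) break;
            }
            if (line == null) return false;

            // Accumulate lines up to and including the closing </page> tag.
            StringBuilder sb = new StringBuilder(line).append('\n');
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
                if (line.trim().equals("</page>")) break;
            }
            page.xml = sb.toString();
            return true;
        }
    }

    /** Collects the titles of all pages found in the given XML text. */
    static List<String> readTitles(String xml) {
        PageReader reader = new PageReader(new BufferedReader(new StringReader(xml)));
        Page page = new Page();              // one object, reused for every call
        List<String> titles = new ArrayList<>();
        try {
            while (reader.readNext(page)) {  // loop ends when readNext returns false
                titles.add(page.title());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String dump = "<mediawiki>\n<page>\n<title>Anarchism</title>\n</page>\n"
                    + "<page>\n<title>Autism</title>\n</page>\n</mediawiki>\n";
        for (String title : readTitles(dump)) {
            System.out.println(title);
        }
    }
}
```

Because the same object is overwritten on each call, any values needed after the loop (here, the titles) are copied out before the next call to readNext.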

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception