edu.umd.cloud9.collection.wikipedia
Class WikipediaPage

java.lang.Object
  extended by edu.umd.cloud9.collection.Indexable
      extended by edu.umd.cloud9.collection.wikipedia.WikipediaPage
All Implemented Interfaces:
Writable

public class WikipediaPage
extends Indexable

A page from Wikipedia.

Author:
Jimmy Lin

Field Summary
static String XML_END_TAG
          End delimiter of the page, which is </page>.
static String XML_START_TAG
          Start delimiter of the page, which is <page>.
 
Constructor Summary
WikipediaPage()
          Creates an empty WikipediaPage object.
 
Method Summary
 List<String> extractLinkDestinations()
           
 String findInterlanguageLink(String lang)
          Returns the inter-language link to a specific language (if any).
 String getContent()
          Returns the contents of this page (title + text).
 String getDisplayContent()
           
 String getDisplayContentType()
           
 String getDocid()
          Returns the article title (i.e., the docid).
 String getRawXML()
          Returns the raw XML of this page.
 String getTitle()
          Returns the title of this page.
 String getWikiMarkup()
          Returns the text of this page.
 boolean isArticle()
          Checks to see if this page is an actual article, and not, for example, "File:", "Category:", "Wikipedia:", etc.
 boolean isDisambiguation()
          Checks to see if this page is a disambiguation page.
 boolean isEmpty()
          Checks to see if this page is an empty page.
 boolean isRedirect()
          Checks to see if this page is a redirect page.
 boolean isStub()
          Checks to see if this article is a stub.
 void readFields(DataInput in)
          Deserializes this object.
static void readPage(WikipediaPage page, String s)
          Reads a raw XML string into a WikipediaPage object.
 void write(DataOutput out)
          Serializes this object.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

XML_START_TAG

public static final String XML_START_TAG
Start delimiter of the page, which is <page>.

See Also:
Constant Field Values

XML_END_TAG

public static final String XML_END_TAG
End delimiter of the page, which is </page>.

See Also:
Constant Field Values

Constructor Detail

WikipediaPage

public WikipediaPage()
Creates an empty WikipediaPage object.

Method Detail

write

public void write(DataOutput out)
           throws IOException
Serializes this object to the given DataOutput.

Throws:
IOException

readFields

public void readFields(DataInput in)
                throws IOException
Deserializes this object from the given DataInput.

Throws:
IOException
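
The write/readFields pair follows the usual Writable round-trip pattern. The following is a minimal sketch of that pattern using only java.io; PageRecord is a hypothetical stand-in, not the Cloud9 class, and the length-prefixed UTF-8 layout is an assumption chosen for illustration rather than the actual on-disk format of WikipediaPage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WritableRoundTrip {
    // Hypothetical stand-in for a Writable page; NOT the Cloud9 WikipediaPage.
    static class PageRecord {
        String rawXml = "";

        // Serializes: writes the byte length, then the UTF-8 bytes (assumed layout).
        void write(DataOutput out) throws IOException {
            byte[] bytes = rawXml.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length);
            out.write(bytes);
        }

        // Deserializes: reads the fields back in the same order write() emitted them.
        void readFields(DataInput in) throws IOException {
            int length = in.readInt();
            byte[] bytes = new byte[length];
            in.readFully(bytes);
            rawXml = new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Writes a record to an in-memory buffer and reads it into a fresh object.
    static String roundTrip(String rawXml) {
        try {
            PageRecord original = new PageRecord();
            original.rawXml = rawXml;

            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));

            PageRecord copy = new PageRecord();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy.rawXml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<page><title>Example</title></page>"));
    }
}
```

The point of the sketch is the symmetry: readFields must consume exactly what write produced, in the same order.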

getDocid

public String getDocid()
Returns the article title (i.e., the docid).

Specified by:
getDocid in class Indexable

getContent

public String getContent()
Returns the contents of this page (title + text).

Specified by:
getContent in class Indexable

getDisplayContent

public String getDisplayContent()
Overrides:
getDisplayContent in class Indexable

getDisplayContentType

public String getDisplayContentType()
Overrides:
getDisplayContentType in class Indexable

getRawXML

public String getRawXML()
Returns the raw XML of this page.


getWikiMarkup

public String getWikiMarkup()
Returns the text of this page.


getTitle

public String getTitle()
Returns the title of this page.


isDisambiguation

public boolean isDisambiguation()
Checks to see if this page is a disambiguation page. A WikipediaPage is either an article, a disambiguation page, a redirect page, or an empty page.

Returns:
true if this page is a disambiguation page

isRedirect

public boolean isRedirect()
Checks to see if this page is a redirect page. A WikipediaPage is either an article, a disambiguation page, a redirect page, or an empty page.

Returns:
true if this page is a redirect page

isEmpty

public boolean isEmpty()
Checks to see if this page is an empty page. A WikipediaPage is either an article, a disambiguation page, a redirect page, or an empty page.

Returns:
true if this page is an empty page

isStub

public boolean isStub()
Checks to see if this article is a stub. The return value is meaningful only if this page is not a disambiguation page, a redirect page, or an empty page.

Returns:
true if this article is a stub

isArticle

public boolean isArticle()
Checks to see if this page is an actual article, and not, for example, "File:", "Category:", "Wikipedia:", etc.

Returns:
true if this page is an actual article
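
The classification methods above can be combined into a single decision, checking the stub flag only after the other page kinds have been ruled out. This is a sketch under stated assumptions: PageFlags is a hypothetical interface that mirrors the five boolean methods so the example runs without the library, and the order of the checks is an illustrative choice, not something the class prescribes.

```java
class PageKindSketch {
    // Hypothetical interface mirroring the boolean methods documented above.
    interface PageFlags {
        boolean isEmpty();
        boolean isRedirect();
        boolean isDisambiguation();
        boolean isArticle();
        boolean isStub();
    }

    // Classifies a page; isStub() is consulted only when it is meaningful.
    static String describe(PageFlags p) {
        if (p.isEmpty()) return "empty";
        if (p.isRedirect()) return "redirect";
        if (p.isDisambiguation()) return "disambiguation";
        if (!p.isArticle()) return "non-article";  // e.g. "File:", "Category:"
        return p.isStub() ? "stub" : "article";
    }

    // Test helper that builds a PageFlags from fixed values.
    static PageFlags flags(final boolean empty, final boolean redirect,
                           final boolean disambiguation, final boolean article,
                           final boolean stub) {
        return new PageFlags() {
            public boolean isEmpty() { return empty; }
            public boolean isRedirect() { return redirect; }
            public boolean isDisambiguation() { return disambiguation; }
            public boolean isArticle() { return article; }
            public boolean isStub() { return stub; }
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(flags(false, false, false, true, false)));
    }
}
```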

findInterlanguageLink

public String findInterlanguageLink(String lang)
Returns the inter-language link to a specific language (if any).

Parameters:
lang - language
Returns:
title of the article in the foreign language if link exists, null otherwise
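
To illustrate the contract (a title if the link exists, null otherwise), here is a minimal sketch that scans wiki markup for a link of the form [[lang:Title]]. The helper findLink and its string-scanning approach are assumptions for illustration; the source does not say how findInterlanguageLink is implemented.

```java
class InterlanguageLinkSketch {
    // Hypothetical helper: returns the target title of the first [[lang:Title]]
    // link in the markup, or null if there is no such link.
    static String findLink(String wikiMarkup, String lang) {
        String prefix = "[[" + lang + ":";
        int start = wikiMarkup.indexOf(prefix);
        if (start < 0) {
            return null;  // no link for this language
        }
        start += prefix.length();
        int end = wikiMarkup.indexOf("]]", start);
        if (end < 0) {
            return null;  // unterminated link; treat as absent
        }
        return wikiMarkup.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String markup = "Some text.\n[[de:Beispiel]]\n[[fr:Exemple]]";
        System.out.println(findLink(markup, "fr"));
        System.out.println(findLink(markup, "ja"));
    }
}
```

Callers should check for null before using the result, since most pages link to only a subset of languages.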

extractLinkDestinations

public List<String> extractLinkDestinations()

readPage

public static void readPage(WikipediaPage page,
                            String s)
Reads a raw XML string into a WikipediaPage object.

Parameters:
page - the WikipediaPage object
s - raw XML string
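
As a rough picture of what reading a raw XML string involves, the sketch below pulls the title out of a page delimited by the documented <page> and </page> tags. It is not the library's parser: the helpers between and extractTitle are hypothetical, and the <title> element name is assumed from the usual Wikipedia dump layout.

```java
class PageXmlSketch {
    // Same values as the documented XML_START_TAG and XML_END_TAG constants.
    static final String XML_START_TAG = "<page>";
    static final String XML_END_TAG = "</page>";

    // Hypothetical helper: text between the first `open` and the next `close`,
    // or null if either delimiter is missing.
    static String between(String s, String open, String close) {
        int start = s.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = s.indexOf(close, start);
        return end < 0 ? null : s.substring(start, end);
    }

    // Hypothetical helper: the page title, or null if the input is not a
    // complete page or has no <title> element (assumed element name).
    static String extractTitle(String rawXml) {
        String page = between(rawXml, XML_START_TAG, XML_END_TAG);
        return page == null ? null : between(page, "<title>", "</title>");
    }

    public static void main(String[] args) {
        String raw = "<page>\n  <title>Example</title>\n  <text>Body</text>\n</page>";
        System.out.println(extractTitle(raw));
    }
}
```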