java.lang.Object
  extended by edu.umd.cloud9.collection.clue.ClueCollectionPathConstants
public class ClueCollectionPathConstants
Class that provides convenience methods for processing portions of the Clue Web collection with Hadoop. Static methods in this class allow the user to easily "select" different portions of the collection to serve as input to a MapReduce job.
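As a rough picture of what each method selects, the sketch below models the selections as nested groups of paths relative to the collection base. The class CluePortions and its Portion enum are hypothetical and not part of Cloud9. The code does not touch Hadoop; it only records the directories and the file named in the method descriptions for this class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model (not part of Cloud9) of the collection portions that
 * ClueCollectionPathConstants can select. Each constant lists the paths,
 * relative to the collection base, that the Javadoc says its method adds.
 */
public class CluePortions {
    enum Portion {
        TEST_FILE, TINY, SMALL, COMPLETE;

        /** Relative paths covered by this portion, as described in the Javadoc. */
        List<String> relativePaths() {
            switch (this) {
                case TEST_FILE:
                    return Collections.singletonList("ClueWeb09_English_1/en0000/00.warc.gz");
                case TINY:
                    return Collections.singletonList("ClueWeb09_English_1/en0000/");
                case SMALL:
                    return Collections.singletonList("ClueWeb09_English_1/");
                case COMPLETE:
                default:
                    // The complete collection spans ClueWeb09_English_1/ through _10/.
                    List<String> parts = new ArrayList<>();
                    for (int i = 1; i <= 10; i++) {
                        parts.add("ClueWeb09_English_" + i + "/");
                    }
                    return parts;
            }
        }
    }

    public static void main(String[] args) {
        for (Portion p : Portion.values()) {
            System.out.println(p + " -> " + p.relativePaths());
        }
    }
}
```

The nesting is visible in the paths: the test file sits inside the tiny collection's directory, the tiny collection sits inside the small collection's directory, and the small collection is the first of the ten directories that make up the complete collection.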
| Method Summary | |
|---|---|
| static void | addEnglishCollectionPart(JobConf conf, String base, int i): Adds a part (segment) of the Clue Web English collection to a Hadoop JobConf object. |
| static void | addEnglishCompleteCollection(JobConf conf, String base): Adds the complete Clue Web English collection to a Hadoop JobConf object. |
| static void | addEnglishSmallCollection(JobConf conf, String base): Adds the first part (segment) of the Clue Web English collection to a Hadoop JobConf object. |
| static void | addEnglishTestFile(JobConf conf, String base): Adds a sample compressed WARC archive to a Hadoop JobConf object. |
| static void | addEnglishTinyCollection(JobConf conf, String base): Adds the first section of the Clue Web English collection to a Hadoop JobConf object. |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method Detail |
|---|
public static void addEnglishTestFile(JobConf conf,
String base)
Adds a sample compressed WARC archive to a Hadoop JobConf object.
The specific archive is ClueWeb09_English_1/en0000/00.warc.gz,
which contains 35,582 Web pages.
Parameters:
conf - Hadoop JobConf
base - base path for the Clue Web collection
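The documentation does not say how base is combined with the archive's relative path. The following hypothetical helper assumes a simple slash-separated join that tolerates a trailing slash on the base; the class name and the base path "/data/clue" are made up for illustration.

```java
/**
 * Hypothetical helper (not part of Cloud9): joins the collection base path
 * with a relative path such as the sample archive. How the real methods
 * join paths is not documented; this slash-based join is an assumption.
 */
public class ClueTestFilePath {
    static final String TEST_FILE = "ClueWeb09_English_1/en0000/00.warc.gz";

    static String resolve(String base, String relative) {
        if (base.isEmpty()) {
            return relative; // no base given: leave the path relative
        }
        // Drop trailing slashes so "base" and "base/" give the same result.
        String trimmed = base;
        while (trimmed.endsWith("/")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        return trimmed + "/" + relative;
    }

    public static void main(String[] args) {
        // Both lines print /data/clue/ClueWeb09_English_1/en0000/00.warc.gz
        System.out.println(resolve("/data/clue", TEST_FILE));
        System.out.println(resolve("/data/clue/", TEST_FILE));
    }
}
```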
public static void addEnglishTinyCollection(JobConf conf,
String base)
Adds the first section of the Clue Web English collection to a Hadoop
JobConf object. Specifically, this method adds the contents of
ClueWeb09_English_1/en0000/, which contains 3,382,356 pages.
Parameters:
conf - Hadoop JobConf
base - base path for the Clue Web collection
public static void addEnglishSmallCollection(JobConf conf,
String base)
Adds the first part (segment) of the Clue Web English collection to a
Hadoop JobConf object. Specifically, this method adds the contents of
ClueWeb09_English_1/, which contains 50,220,423 pages.
Parameters:
conf - Hadoop JobConf
base - base path for the Clue Web collection
public static void addEnglishCompleteCollection(JobConf conf,
String base)
Adds the complete Clue Web English collection to a Hadoop JobConf
object. Specifically, this method adds the contents of
ClueWeb09_English_1/ through ClueWeb09_English_10/, which together
contain 503,903,810 pages.
Parameters:
conf - Hadoop JobConf
base - base path for the Clue Web collection
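The page counts quoted for each selection give a sense of scale. This worked arithmetic uses only those counts to compute each selection's share of the complete English collection; the class name is illustrative.

```java
/**
 * Worked arithmetic on the page counts quoted in this documentation:
 * the fraction of the complete English collection each selection covers.
 */
public class ClueSelectionSizes {
    static final long TEST_FILE = 35_582L;
    static final long TINY = 3_382_356L;
    static final long SMALL = 50_220_423L;
    static final long COMPLETE = 503_903_810L;

    /** Share of the complete collection as a percentage (e.g. 9.97 means 9.97%). */
    static double percentOfComplete(long pages) {
        return 100.0 * pages / COMPLETE;
    }

    public static void main(String[] args) {
        System.out.printf("test file: %.4f%%%n", percentOfComplete(TEST_FILE)); // 0.0071%
        System.out.printf("tiny:      %.2f%%%n", percentOfComplete(TINY));      // 0.67%
        System.out.printf("small:     %.2f%%%n", percentOfComplete(SMALL));     // 9.97%
    }
}
```

The small collection (part 1) holds just under 10% of the pages, roughly what one of ten parts would hold, although the documentation does not state that the parts are equal in size.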
public static void addEnglishCollectionPart(JobConf conf,
String base,
int i)
Adds a part (segment) of the Clue Web English collection to a Hadoop
JobConf object. Part 1 corresponds to the contents of
ClueWeb09_English_1/ (i.e., the "small" collection), and the parts
continue through part 10. Adding all ten parts is equivalent to adding
the complete English collection.
Parameters:
conf - Hadoop JobConf
base - base path for the Clue Web collection
i - part number, from 1 through 10
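A minimal model of the part-based selection, assuming (the documentation states this only for part 1) that part i maps to the directory ClueWeb09_English_i/, and that values of i outside 1 through 10 should be rejected. InputPaths is a hypothetical stand-in for the list of input paths a JobConf would hold; it is not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the Cloud9 source) of part-based selection.
 * Assumes part i corresponds to ClueWeb09_English_i/.
 */
public class ClueEnglishParts {
    static final int NUM_PARTS = 10;

    /** Stand-in for a JobConf's input-path list (hypothetical). */
    static class InputPaths {
        final List<String> paths = new ArrayList<>();

        void addPart(int i) {
            // Assumption: only parts 1 through 10 exist.
            if (i < 1 || i > NUM_PARTS) {
                throw new IllegalArgumentException("part must be between 1 and " + NUM_PARTS + ": " + i);
            }
            paths.add("ClueWeb09_English_" + i + "/");
        }

        void addSmall() {
            paths.add("ClueWeb09_English_1/");
        }

        void addComplete() {
            for (int i = 1; i <= NUM_PARTS; i++) {
                paths.add("ClueWeb09_English_" + i + "/");
            }
        }
    }

    public static void main(String[] args) {
        InputPaths byParts = new InputPaths();
        for (int i = 1; i <= NUM_PARTS; i++) {
            byParts.addPart(i);
        }
        InputPaths complete = new InputPaths();
        complete.addComplete();
        System.out.println(byParts.paths.equals(complete.paths)); // true

        InputPaths part1 = new InputPaths();
        part1.addPart(1);
        InputPaths small = new InputPaths();
        small.addSmall();
        System.out.println(part1.paths.equals(small.paths)); // true
    }
}
```

Under these assumptions, adding parts 1 through 10 yields the same paths as the complete selection, and part 1 matches the small selection, mirroring the equivalences stated in the method description.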