text
text,
a dataset directory which
contains some texts.
Licensing:
The information on this web page is distributed under the MIT license.
Related Data and Programs:
chain_letters,
a dataset directory which
contains several examples of chain letters.
german,
a dataset directory which
contains some short German texts;
ngrams,
a dataset directory which
contains information about the observed frequency of "ngrams"
(particular sequences of n letters) in English text.
words,
a dataset directory which
contains lists of words;
Datasets:
-
abc.txt,
the lower-case alphabet.
-
adventures_of_sherlock_holmes.txt,
the text of "The Adventures of Sherlock Holmes"
by Arthur Conan Doyle.
-
alice_in_wonderland.txt,
the text of "Alice's Adventures in Wonderland" by Lewis Carroll.
-
before_the_law.txt,
the text of "Before the Law" by Franz Kafka.
-
carnival.txt,
the text of "The Facts Concerning the Recent Carnival of Crime
in Connecticut",
by Mark Twain (Samuel Clemens);
-
declaration_of_independence.txt,
the text of the Declaration of Independence.
-
desiderata.txt,
the text of "Desiderata",
by Max Ehrmann.
-
genesis.txt,
the text of the first five chapters of the Book of Genesis,
in the King James version.
-
gettysburg_address.txt,
the text of "The Gettysburg Address",
by Abraham Lincoln;
-
hamlet.txt,
the text of the "To be or not to be" soliloquy from "Hamlet",
by William Shakespeare;
-
lorem_ipsum.txt,
the "Lorem Ipsum" block of text used as printer's dummy text for
hundreds of years. The text was extracted from a work by Cicero,
but chopped up somewhat. In particular, the opening phrase
"Lorem ipsum" is actually pulled from Cicero's phrase "Neque porro
quisquam est, qui dolorem ipsum quia dolor sit amet,...";
-
major_general.txt,
the text of the Major-General's song from "The Pirates of Penzance",
by Gilbert and Sullivan.
-
moby_dick.txt,
the text of "Moby Dick"
by Herman Melville;
-
panjandrum.txt,
the text of "The Great Panjandrum", a nonsense paragraph
by Samuel Foote, written as a test for Charles Macklin, an actor
who claimed he could memorize and then recite any text after a
single reading.
-
pride_and_prejudice.txt,
the text of "Pride and Prejudice"
by Jane Austen;
-
quick_brown_fox.txt,
"The quick brown fox jumps over the lazy dog.", a sentence containing
every letter of the alphabet.
-
robinson_crusoe.txt,
the text of "Robinson Crusoe"
by Daniel Defoe;
-
sonnets.txt,
the text of the sonnets,
by William Shakespeare;
-
species.txt,
the last two paragraphs of "The Origin of Species",
by Charles Darwin;
-
the_raven.txt,
the text of "The Raven",
by Edgar Allan Poe;
-
wizard_of_oz.txt,
the text of "The Wonderful Wizard of Oz",
by L. Frank Baum.
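Example:
The short files in this directory are convenient test inputs for simple text-processing code. The following Python sketch is an illustration only and is not part of the dataset. It checks the claim that the quick_brown_fox.txt sentence contains every letter of the alphabet, and counts letter "ngrams" in the sense used by the related ngrams directory. The function names is_pangram and letter_ngrams are hypothetical, and the choice to count ngrams within single words (not across spaces) is an assumption.

```python
import string
from collections import Counter


def is_pangram(text: str) -> bool:
    """Return True if every letter a-z occurs in text, ignoring case."""
    letters = {ch for ch in text.lower() if ch in string.ascii_lowercase}
    return len(letters) == 26


def letter_ngrams(text: str, n: int) -> Counter:
    """Count sequences of n consecutive letters inside each word.

    Assumption: punctuation is dropped and ngrams do not span spaces.
    """
    counts = Counter()
    for word in text.lower().split():
        # Keep only the letters a-z of this word.
        word = "".join(ch for ch in word if ch in string.ascii_lowercase)
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


# The sentence is written inline so the sketch runs without any files.
sentence = "The quick brown fox jumps over the lazy dog."
print(is_pangram(sentence))
print(letter_ngrams(sentence, 2)["th"])
```

To apply the same functions to one of the files listed above, the inline sentence could be replaced by the contents of a locally downloaded copy, read for instance with open("quick_brown_fox.txt").read(); the local path is an assumption.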
Last revised on 09 October 2024.