text_to_wordlist, a Python code which reads a text file into a single long string and divides that string into individual words, so that an investigator can analyze the text for patterns.
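The read-then-split idea can be sketched as follows. This is a minimal illustration, not the distributed code: the function names read_text and words_from_string, the lowercase conversion, and the rule that a word is a run of letters with optional internal apostrophes are all assumptions made for this example.

```python
import re
from collections import Counter


def read_text(filename):
    """Read an entire text file into one long string (hypothetical helper)."""
    with open(filename, encoding="utf-8") as f:
        return f.read()


def words_from_string(text):
    """Divide a string into lowercase words (hypothetical helper).

    Assumption: a word is a run of the letters a-z, optionally containing
    internal apostrophes, as in "didn't". Digits, punctuation, and
    whitespace act as separators.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


if __name__ == "__main__":
    # A short in-memory sample stands in for the contents of a text file.
    sample = "The cat saw the dog. The dog didn't see the cat!"
    words = words_from_string(sample)
    print(words)
    # One simple pattern analysis: count how often each word occurs.
    print(Counter(words).most_common(3))
```

For a real file, the string returned by read_text(filename) would be passed to words_from_string in place of the sample. Other definitions of a word, such as keeping hyphenated terms or digits, would need a different regular expression.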
The information on this web page is distributed under the MIT license.
text_to_wordlist is available in a Python version.
markov_text, a Python code which uses a Markov Chain Monte Carlo (MCMC) process to sample an existing text file and create a new text that is randomized, but retains some of the structure of the original one.
ngrams, a dataset directory which contains information about the observed frequency of "ngrams" (particular sequences of n letters) in English text.
text, a dataset directory which contains some short English texts, such as Alice in Wonderland, the Gettysburg Address, Hamlet, Moby Dick, Robinson Crusoe, and the Wizard of Oz.