text_to_wordlist


text_to_wordlist, a Python code which shows how to start with a text file, read its contents into a single long string, and divide that string into individual words. The resulting word list allows an investigator to analyze the text for patterns.
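
The three steps described above (read the file, form one long string, split it into words) can be sketched roughly as follows. This is a minimal illustration, not the program's own source: the function names read_text, split_into_words and word_frequencies are hypothetical, and it assumes UTF-8 input and that a "word" is a run of ASCII letters, optionally joined by apostrophes.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, optionally joined by
# apostrophes (so "don't" stays one word); digits and punctuation are dropped.
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)*")


def read_text(filename):
    """Read an entire text file into a single long string (assumes UTF-8)."""
    with open(filename, encoding="utf-8") as f:
        return f.read()


def split_into_words(text):
    """Divide a string into a list of lowercase words."""
    return WORD_PATTERN.findall(text.lower())


def word_frequencies(words):
    """Count how often each word occurs, one simple pattern to look for."""
    return Counter(words)
```

For example, word_frequencies(split_into_words(read_text("alice.txt"))).most_common(10) would list the ten most common words in a file named alice.txt (a hypothetical filename). Lowercasing first means "The" and "the" are counted as the same word.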

Licensing:

The information on this web page is distributed under the MIT license.

Languages:

text_to_wordlist is available in a Python version.

Related Data and Programs:

markov_text, a Python code which uses a Markov Chain Monte Carlo (MCMC) process to sample an existing text file and create a new text that is randomized, but retains some of the structure of the original one.

ngrams, a dataset directory which contains information about the observed frequency of "ngrams" (particular sequences of n letters) in English text.
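
As a rough illustration of how a word list connects to n-gram statistics, the sketch below counts letter n-grams in a list of words. The function name letter_ngrams is hypothetical, and it assumes n-grams are taken within each word only, never across word boundaries; this is not necessarily how the ngrams dataset was compiled.

```python
from collections import Counter


def letter_ngrams(words, n):
    """Count every sequence of n consecutive letters inside each word.

    Assumption: n-grams do not cross word boundaries, so words shorter
    than n contribute nothing.
    """
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts
```

With n = 2, the words "the" and "then" give the bigram counts th: 2, he: 2, en: 1.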

text, a dataset directory which contains some short English texts, such as Alice in Wonderland, the Gettysburg Address, Hamlet, Moby Dick, Robinson Crusoe, and the Wizard of Oz.

Source Code:


Last modified on 09 October 2024.