text_strip


text_strip, a Python code which uses the "re" regular expression library to strip a text file of unwanted characters.
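
As a rough illustration of the idea, the following sketch uses re.sub() to delete unwanted characters from a string, and then applies that to a file. It is not the text_strip source itself: the function names strip_text and strip_file are hypothetical, and the definition of "unwanted" (anything other than ASCII letters, digits, whitespace, and a few common punctuation marks) is an assumption made for the example.

```python
import re

# Assumption for this sketch: a character is "unwanted" if it is not an
# ASCII letter, a digit, whitespace, or one of . , ; : ! ? ' -
UNWANTED = re.compile(r"[^A-Za-z0-9\s.,;:!?'-]")


def strip_text(text):
    """Return a copy of text with every unwanted character deleted.

    Hypothetical helper, not part of the original program.
    """
    return UNWANTED.sub("", text)


def strip_file(in_path, out_path):
    """Read in_path, strip its contents, and write the result to out_path.

    Hypothetical helper; assumes both files are UTF-8 encoded.
    """
    with open(in_path, "r", encoding="utf-8") as f:
        text = f.read()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(strip_text(text))


if __name__ == "__main__":
    print(strip_text("Hello, *world*! [42] #tag"))
```

Changing the character class in UNWANTED is enough to adapt the sketch to a different notion of which characters should be kept.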

Licensing:

The information on this web page is distributed under the MIT license.

Languages:

text_strip is available in a Python version.

Related Data and Programs:

markov_text, a Python code which uses a Markov Chain Monte Carlo (MCMC) process to sample an existing text file and create a new text that is randomized, but retains some of the structure of the original one.

ngrams, a dataset directory which contains information about the observed frequency of "ngrams" (particular sequences of n letters) in English text.

text_to_wordlist, a Python code which shows how to start with a text file, read its information into a single long string, and divide that string into individual words. This allows an investigator to analyze the text for patterns.

text, a dataset directory which contains some short English texts, such as Alice in Wonderland, the Gettysburg Address, Hamlet, Moby Dick, Robinson Crusoe, and the Wizard of Oz.

Source Code:


Last modified on 21 February 2026.