markov_text

markov_text, a Python code which uses a Markov chain process to sample an existing text file and create a new text that is randomized, but retains some of the structure of the original one.

The program is given a text file, a prefix length N, and a total text length M. Starting at a random point in the text, it selects N consecutive words, which are called the prefix. It then finds every word that immediately follows any occurrence of the prefix in the text, and chooses one at random as the suffix. The prefix is then updated by removing its first word and appending the suffix. The program stops after M words have been generated in this way.
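
The procedure described above can be sketched roughly as follows. This is a minimal illustration, not the distributed markov_text.py: the function names build_table and generate, the rng parameter, and the choice to count the initial prefix toward the M output words are all assumptions made for the example. It also assumes the text contains more than N words.

```python
import random
from collections import defaultdict


def build_table(words, n):
    """Map each n-word prefix (as a tuple) to the list of words that follow it."""
    table = defaultdict(list)
    for i in range(len(words) - n):
        prefix = tuple(words[i:i + n])
        table[prefix].append(words[i + n])
    return table


def generate(words, n, m, rng=None):
    """Return a list of at most m words produced by the prefix/suffix process.

    Hypothetical helper: the initial prefix is counted as part of the output.
    """
    rng = rng or random.Random()
    table = build_table(words, n)

    # Start at a random point in the text; this prefix always has a follower.
    start = rng.randrange(len(words) - n)
    prefix = tuple(words[start:start + n])
    output = list(prefix)

    while len(output) < m:
        followers = table.get(prefix)
        if not followers:
            # The prefix occurs only at the very end of the text, so stop early.
            break
        suffix = rng.choice(followers)
        output.append(suffix)
        # Drop the first word of the prefix and append the suffix.
        prefix = prefix[1:] + (suffix,)

    return output[:m]


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat ran".split()
    print(" ".join(generate(sample, 2, 8)))
```

Because repeated followers are stored as duplicates in each list, a word that follows a prefix more often in the source text is proportionally more likely to be chosen. To sample a real file such as alice_oz.txt, the word list could be obtained with open("alice_oz.txt").read().split().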

Licensing:

The information on this web page is distributed under the MIT license.

Languages:

markov_text is available in a Python version.

Related Programs:

ngrams, a Python code which analyzes a string or text against the observed frequency of ngrams (particular sequences of n letters) in English text.

text_to_wordlist, a Python code which shows how to start with a text file, read its information into a single long string, and divide that string into individual words. This allows an investigator to analyze the text for patterns.

Author:

Original Python version downloaded from "Rosetta Code".

References:

https://rosettacode.org/wiki/Markov_chain_text_generator
Claude Shannon,
A Mathematical Theory of Communication,
The Bell System Technical Journal,
July 1948, pages 379-423, October 1948, pages 623-656.

Source Code:

markov_text.py, the source code.
markov_text.sh, runs all the tests.
markov_text.txt, the output file.

alice_oz.txt, a text to be sampled, combining text from "Alice in Wonderland" and "The Wonderful Wizard of Oz".

Last revised on 18 March 2021.