markov_text


markov_text is a Python code which uses a Markov chain process to sample an existing text file and create a new text that is randomized but retains some of the structure of the original.

The program is given a text file, a prefix length N, and a total text length M. Starting at a random point in the text, it selects N consecutive words, which are called the prefix. It then finds every word that immediately follows any occurrence of the prefix in the text, and chooses one at random as the suffix. The suffix is added to the output, and the prefix is updated by removing its first word and appending the suffix. The program stops after M words have been generated in this way.
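The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the distributed source code: the names build_table and markov_text (as a function), the optional rng argument, and the choice to count the initial prefix toward the M output words are all assumptions made for the example. It also assumes the text contains at least N words.

```python
import random
from collections import defaultdict


def build_table(words, n):
    """Map each n-word prefix to the list of words that follow it in the text."""
    table = defaultdict(list)
    for i in range(len(words) - n):
        prefix = tuple(words[i:i + n])
        table[prefix].append(words[i + n])
    return table


def markov_text(words, n, m, rng=None):
    """Return a list of at most m words generated from n-word prefixes.

    Assumption: the n words of the starting prefix count toward m.
    """
    rng = rng or random.Random()
    table = build_table(words, n)

    # Start at a random position that leaves room for a full prefix.
    start = rng.randrange(len(words) - n + 1)
    prefix = tuple(words[start:start + n])
    output = list(prefix)

    while len(output) < m:
        followers = table.get(prefix)
        if not followers:
            # The prefix occurs only at the very end of the text.
            break
        suffix = rng.choice(followers)
        output.append(suffix)
        # Drop the first word of the prefix and append the suffix.
        prefix = prefix[1:] + (suffix,)

    return output[:m]
```

To apply this to a file, one might read it with open(...).read().split() to obtain the word list, then print " ".join(markov_text(words, 2, 100)). Because a word that follows the prefix several times appears several times in the follower list, a uniform random choice from that list reproduces the observed frequencies. If the current prefix appears only at the end of the text, this sketch stops early rather than restarting.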

Licensing:

The computer code and data files made available on this web page are distributed under the MIT license.

Languages:

markov_text is available in a Python version.

Related Programs:

ngrams, a Python code which analyzes a string or text against the observed frequency of ngrams (particular sequences of n letters) in English text.

Author:

Original Python code downloaded from "Rosetta Code".

References:

  1. https://rosettacode.org/wiki/Markov_chain_text_generator
  2. Claude Shannon,
    A Mathematical Theory of Communication,
    The Bell System Technical Journal,
    July 1948, pages 379-423, October 1948, pages 623-656.

Source Code:


Last revised on 18 March 2021.