markov_letters, an Octave code which counts the occurrences of letter pairs in a text.
The text
    The quick brown fox jumps over the lazy dog.
can be regarded as a string of symbols. To simplify matters, replace all uppercase letters by lowercase letters, and all non-alphabetic characters by spaces. Also, pad the string with an initial and final blank.
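The cleanup steps can be sketched in a few lines of Octave. This is only an illustration and not necessarily how markov_letters does it; the variable name s is an assumption, and "alphabetic" is taken to mean the 26 unaccented letters a through z.

```octave
% Illustrative sketch only; the variable name s is hypothetical.
s = 'The quick brown fox jumps over the lazy dog.';

% Replace uppercase letters by lowercase letters.
s = lower ( s );

% Replace every character outside 'a'..'z' by a blank.
s( s < 'a' | 'z' < s ) = ' ';

% Pad the string with an initial and final blank.
s = [ ' ', s, ' ' ];

disp ( [ '"', s, '"' ] );
```

After these steps the string contains only 27 distinct symbols: the 26 lowercase letters and the blank.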
The function reads the text, which will generally be stored in a file. It counts the occurrences of every possible pair of letters, creating table(), a 27x27 array of frequencies. Entry table(1,3) counts the number of times the letter 'a' was immediately followed by the letter 'c', for instance. The 27th row and column keep track of the blank or non-alphabetic characters.
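As a rough sketch of the counting step, the loop below builds the 27x27 array from a short string that is assumed to be already lowercase, blank-padded, and free of non-alphabetic characters. The sample string and the variable names s and idx are assumptions made for this example; the actual function may be organized differently.

```octave
% Illustrative sketch only; s and idx are hypothetical names.
s = ' back pack ';

% Map 'a'..'z' to indices 1..26 and the blank to index 27.
idx = double ( s ) - double ( 'a' ) + 1;
idx( s == ' ' ) = 27;

% table(i,j) counts how often symbol i is immediately followed by symbol j.
table = zeros ( 27, 27 );
for k = 1 : length ( s ) - 1
  table(idx(k),idx(k+1)) = table(idx(k),idx(k+1)) + 1;
end

% Number of times 'a' was immediately followed by 'c'.
printf ( 'table(1,3) = %d\n', table(1,3) );
```

A string of n symbols contributes n-1 pairs, so the entries of the array sum to n-1.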
If you divide each row of the table by its sum, then this normalized value of table(i,j) can be regarded as the probability that a randomly selected occurrence of the i-th letter will be immediately followed by the j-th letter. This suggests various statistical tests on texts, and a means of simulating a text at a very low level.
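The normalization, and one simple way to use it for simulation, might look like the following sketch. It rebuilds a small pair-count array from a sample string so that it can run on its own. All names here (s, idx, p, alphabet, state, out) are assumptions for illustration, and the sampling scheme is one common choice, not necessarily the one used by markov_letters. Rows for letters that never occur have sum zero and are left as zeros rather than divided.

```octave
% Illustrative sketch only; all variable names are hypothetical.
s = ' the quick brown fox jumps over the lazy dog  ';

% Count letter pairs: 'a'..'z' -> 1..26, blank -> 27.
idx = double ( s ) - double ( 'a' ) + 1;
idx( s == ' ' ) = 27;
table = accumarray ( [ idx(1:end-1)', idx(2:end)' ], 1, [ 27, 27 ] );

% Divide each row by its sum, leaving all-zero rows unchanged.
row_sum = sum ( table, 2 );
row_sum( row_sum == 0 ) = 1;
p = table ./ repmat ( row_sum, 1, 27 );

% Simulate n symbols, starting from the blank (state 27).
alphabet = char ( [ 97:122, 32 ] );   % 'a'..'z' followed by a blank
n = 40;
out = blanks ( n );
state = 27;
for k = 1 : n
  c = cumsum ( p(state,:) );
  c(end) = 1;                         % guard against roundoff
  state = find ( rand ( ) <= c, 1 );
  out(k) = alphabet(state);
end

disp ( out );
```

Because each symbol depends only on the one before it, the output tends to reproduce plausible letter pairs but not words, which is the sense in which the simulation is at a very low level.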
The computer code and data files described and made available on this web page are distributed under the MIT license.
markov_letters is available in a MATLAB version and an Octave version.
chrpak, an Octave code which works with characters and strings.
filum, an Octave code which can work with information in text files.