dictionary_code


dictionary_code, an Octave code which can apply a dictionary code to a text file.

A common feature of lossless compression schemes is the construction of a "dictionary" of the symbols or words occurring in the file, and the replacement of symbols by dictionary indices.

These functions illustrate that idea, starting from a version of the Gettysburg Address. To simplify our work, we remove punctuation and capitalization. Using MATLAB's textread() function, we can create a cell array in which each entry is a word in the file. Using MATLAB's unique() function, we can construct a "dictionary" that lists, in alphabetical order, every word occurring in the file. Using a surprisingly obscure MATLAB function, we can then replace every word in the text file by its dictionary index. This is the operation of the "dictionary_encode()" function.
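The encoding steps can be sketched in a few lines. This is only an illustration under stated assumptions, not the code of dictionary_encode() itself: the sample sentence is made up, the text is taken from a string rather than read from a file, and ismember() is used as one function that can look up each word's dictionary index, which may or may not be the function the package relies on.

```octave
% Hypothetical sample text; the package itself works on the Gettysburg Address.
text = 'The cat and the dog, and the bird.';

% Remove capitalization, then keep only runs of letters, which drops punctuation.
% The result is a 1-by-N cell array with one word per entry.
words = regexp ( lower ( text ), '[a-z]+', 'match' );

% The dictionary lists each distinct word once, in sorted order.
dictionary = unique ( words );

% Assumption: ismember() stands in for the unnamed lookup function.
% Its second output gives, for each word, its index in the dictionary.
[ ~, code ] = ismember ( words, dictionary );

disp ( dictionary );
disp ( code );
```

Here the dictionary is { and, bird, cat, dog, the } and the encoded text is the index sequence 5 3 1 5 4 1 5 2. The third output of unique() would give the same indices in a single call.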

To decode or uncompress the file, we need both the encoded file and the dictionary. For our example, the dictionary is stored as a separate file, although practical compression schemes typically pack the encoded text and the dictionary together. The function "dictionary_decode()" can then recover the original message.
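Decoding amounts to indexing the dictionary with the encoded indices. The lines below are a minimal sketch, not the code of dictionary_decode() itself: the dictionary and index vector are hypothetical values defined inline, whereas the package reads them from files.

```octave
% Hypothetical dictionary and encoded message, defined inline for illustration.
dictionary = { 'and', 'bird', 'cat', 'dog', 'the' };
code = [ 5, 3, 1, 5, 4, 1, 5, 2 ];

% Replace each index by the dictionary word it refers to.
words = dictionary ( code );

% Join the words with single blanks to rebuild the message.
message = strjoin ( words, ' ' );

disp ( message );
```

The recovered message is "the cat and the dog and the bird". Punctuation and capitalization are not restored, because they were removed before the text was encoded.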

Licensing:

The computer code and data files described and made available on this web page are distributed under the MIT license.

Languages:

dictionary_code is available in a MATLAB version and an Octave version.

Related Programs:

dictionary_code_test

atbash, an Octave code which applies the Atbash substitution cipher to a string of text.

caesar, an Octave code which can apply a Caesar Shift Cipher to a string of text.

chrpak, an Octave code which works with characters and strings.

filum, an Octave code which can work with information in text files.

monoalphabetic, an Octave code which can apply a monoalphabetic substitution cipher to a string of text.

rot13, an Octave code which can encipher a string using the ROT13 cipher for letters, and the ROT5 cipher for digits.

Source Code:


Last revised on 02 July 2023.