dictionary_code


dictionary_code, a Python code which applies a dictionary code to a text file.

A common feature of lossless compression schemes is the construction of a "dictionary" of the symbols or words occurring in the file, and the replacement of those symbols by their dictionary indices.

These functions illustrate that idea, starting with a version of the Gettysburg Address. To simplify the work, we remove punctuation and capitalization. We create an array in which each entry is a word in the file. After identifying the unique words in the array, we construct a "dictionary" that lists, in alphabetical order, every word occurring in the file. We then replace every word in the text by its dictionary index. This is the operation of the "dictionary_encode()" function.
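The encoding steps described above can be sketched as follows. This is a minimal illustration, not the source of dictionary_encode() itself: the name dictionary_encode_sketch, its argument, its return values, and the use of zero-based indices are all assumptions made for this example.

```python
import string


def dictionary_encode_sketch(text):
    # Hypothetical helper, not the package's dictionary_encode().
    # Remove capitalization and punctuation to simplify the text.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Each entry of the array is one word of the text.
    words = cleaned.split()
    # The dictionary lists every unique word in alphabetical order.
    dictionary = sorted(set(words))
    # Map each word to its (zero-based, by assumption) dictionary index.
    index_of = {word: i for i, word in enumerate(dictionary)}
    # Replace every word by its dictionary index.
    indices = [index_of[word] for word in words]
    return dictionary, indices


dictionary, indices = dictionary_encode_sketch(
    "Of the people, by the people, for the people."
)
print(dictionary)  # ['by', 'for', 'of', 'people', 'the']
print(indices)     # [2, 4, 3, 0, 4, 3, 1, 4, 3]
```

In this phrase, nine words are represented by a five-word dictionary plus nine small integers, which is where the potential saving comes from when words repeat often.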

To decode, or uncompress, the file, we need both the encoded file and the dictionary. For our example, the dictionary is stored as a separate file, although practical compression schemes typically pack the encoded text and the dictionary together. The function "dictionary_decode()" can then recover the original message.
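Decoding amounts to looking up each index in the dictionary. The following is a minimal sketch under stated assumptions, not the source of dictionary_decode(): the name dictionary_decode_sketch is hypothetical, the dictionary and the encoded text are held in memory as Python lists rather than read from files, and indices are assumed to be zero-based.

```python
def dictionary_decode_sketch(dictionary, indices):
    # Hypothetical helper, not the package's dictionary_decode().
    # Look up each index in the dictionary and rejoin the words with spaces.
    return " ".join(dictionary[i] for i in indices)


# Example dictionary (alphabetical) and encoded text, assumed for illustration.
dictionary = ["by", "for", "of", "people", "the"]
indices = [2, 4, 3, 0, 4, 3, 1, 4, 3]

message = dictionary_decode_sketch(dictionary, indices)
print(message)  # of the people by the people for the people
```

Because punctuation and capitalization are removed before encoding, the recovered message is the simplified text rather than the original typography.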

Licensing:

The information on this web page is distributed under the MIT license.

Languages:

dictionary_code is available in a MATLAB version, an Octave version, and a Python version.

Related Programs:

atbash, a Python code which applies the Atbash substitution cipher to a string of text.

caesar, a Python code which applies a Caesar Shift Cipher to a string of text.

chrpak, a Python code which works with characters and strings.

filum, a Python code which works with information in text files.

rot13, a Python code which enciphers a string using the ROT13 cipher for letters, and the ROT5 cipher for digits.

Source Code:


Last revised on 15 March 2025.