Apply a Dictionary Code to a Text File

DICTIONARY_CODE is a MATLAB library which can apply a dictionary code to a text file.

A common feature of lossless compression schemes is the construction of a "dictionary" of the symbols or words occurring in the file, and the replacement of each symbol by its dictionary index.

These functions illustrate that idea, starting with a version of the Gettysburg Address. To simplify our work, we remove punctuation and capitalization. Using MATLAB's textread() function, we can create a cell array in which each entry is a word in the file. Using MATLAB's unique() function, we can construct a "dictionary" that lists, in alphabetical order, every word occurring in the file. Using a surprisingly obscure MATLAB function, we can then replace every word in the text file by its dictionary index. This is the operation of the dictionary_encode() function.
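The encoding steps can be sketched as follows. This is a minimal illustration, not the code of dictionary_encode() itself: the sample sentence is made up, the text is held in a string rather than read from a file, and the indices are taken from the third output of unique(), which may or may not be the function the library relies on.

```matlab
% Hypothetical sample text (the library works on a file containing
% the Gettysburg Address instead).
text = 'The cat and the dog and the bird.';

% Simplify the text: convert to lower case, then delete every
% character that is not a lowercase letter or whitespace.
text = lower ( text );
text = regexprep ( text, '[^a-z\s]', '' );

% Split the text at whitespace into a cell array of words.
words = strsplit ( strtrim ( text ) );

% dict holds the distinct words in sorted order;
% index(i) is the position of words{i} within dict.
[ dict, ~, index ] = unique ( words );

% Assumption: ismember() is an alternative lookup, useful when
% the dictionary already exists. It returns the same indices.
[ ~, index2 ] = ismember ( words, dict );
```

For this sample, dict contains the five words and, bird, cat, dog, the, and the eight words of the sentence are replaced by the indices 5 3 1 5 4 1 5 2.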

To decode or uncompress the file, we need both the encoded file and the dictionary. For our example, the dictionary is stored as a separate file, although practical compression schemes typically pack the encoded text and the dictionary together. The function dictionary_decode() can then recover the message, in its simplified form without punctuation or capitalization.
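Decoding amounts to indexing the dictionary with the stored indices. The sketch below assumes the dictionary and the index list are already in memory, using made-up sample values; it is not the code of dictionary_decode(), which reads both from files.

```matlab
% Hypothetical dictionary and encoded message.
dict  = { 'and', 'bird', 'cat', 'dog', 'the' };
index = [ 5, 3, 1, 5, 4, 1, 5, 2 ];

% Look up each index in the dictionary to recover the words.
words = dict ( index );

% Join the words with single spaces to rebuild the text.
text = strjoin ( words, ' ' );
```

Here text is 'the cat and the dog and the bird'. Any punctuation, capitalization, and line breaks that were removed before encoding are not restored.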


Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.


Languages:

DICTIONARY_CODE is available in a MATLAB version.

Related Programs:

ATBASH, a MATLAB library which applies the Atbash substitution cipher to a string of text.

CAESAR, a MATLAB library which can apply a Caesar Shift Cipher to a string of text.

CHRPAK, a MATLAB library which works with characters and strings.

FILUM, a MATLAB library which can work with information in text files.

MONOALPHABETIC, a MATLAB library which can apply a monoalphabetic substitution cipher to a string of text.

ROT13, a MATLAB library which can encipher a string using the ROT13 cipher for letters, and the ROT5 cipher for digits.

Source Code:

Examples and Tests:


Last revised on 31 January 2016.