dictionary_code


dictionary_code, a MATLAB code which can apply a dictionary code to a text file.

A common feature of lossless compression schemes is the construction of a "dictionary" of the symbols or words occurring in the file, and the replacement of those symbols by their dictionary indices.

These functions illustrate that idea, starting from a version of the Gettysburg Address. To simplify our work, we remove punctuation and capitalization. Using MATLAB's textread() function, we can create a cell array in which each entry is a word in the file. Using MATLAB's unique() function, we can construct a "dictionary" that lists, in alphabetical order, every distinct word occurring in the file. Using a surprisingly obscure MATLAB function, we can then replace every word in the text file by its dictionary index. This is the operation of the dictionary_encode() function.
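The encoding steps above can be sketched as follows. This is only an illustration under stated assumptions, not the code of dictionary_encode() itself: the text is a short phrase held in memory rather than a file read with textread(), the variable names are hypothetical, and ismember() is used here as one way to look up indices, which may or may not be the function the original code relies on.

```matlab
% Sample text (assumption: a short phrase instead of the full data file).
raw = 'of the people, by the people, for the people';

% Remove capitalization and punctuation, keeping only letters and spaces.
clean = regexprep(lower(raw), '[^a-z ]', '');

% Split on whitespace into a cell array with one word per entry.
words = strsplit(strtrim(clean));

% Build the dictionary: the distinct words, in sorted order.
dict = unique(words);          % dict: by, for, of, people, the

% Replace each word by the position of that word in the dictionary.
[~, code] = ismember(words, dict);

disp(code)                     % code: 3 5 4 1 5 4 2 5 4
```

The third output of unique() returns the same index list in a single call, so the ismember() step is a choice made for readability rather than a necessity.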

To decode or uncompress the file, we need both the encoded file and the dictionary. For our example, the dictionary is stored as a separate file, although practical compression schemes usually pack the encoded text and the dictionary together. The dictionary_decode() function can then recover the original message, without the punctuation and capitalization that were removed before encoding.
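Decoding amounts to indexing the dictionary with the list of codes. The sketch below assumes the dictionary and the codes are already in memory, and its variable names are hypothetical; it does not reproduce the file handling of dictionary_decode().

```matlab
% Assumed inputs: a sorted dictionary and a list of dictionary indices.
dict = {'by', 'for', 'of', 'people', 'the'};
code = [3 5 4 1 5 4 2 5 4];

% Look up each index in the dictionary to recover the sequence of words.
words = dict(code);

% Join the words with single spaces to rebuild the message.
message = strjoin(reshape(words, 1, []), ' ');

disp(message)                  % of the people by the people for the people
```

The codes carry no meaning without the dictionary, which is why both must be available to the decoder.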

Licensing:

The computer code and data files described and made available on this web page are distributed under the MIT license.

Languages:

dictionary_code is available in a MATLAB version.

Related Programs:

atbash, a MATLAB code which applies the Atbash substitution cipher to a string of text.

caesar, a MATLAB code which can apply a Caesar Shift Cipher to a string of text.

chrpak, a MATLAB code which works with characters and strings.

dictionary_code_test

filum, a MATLAB code which can work with information in text files.

monoalphabetic, a MATLAB code which can apply a monoalphabetic substitution cipher to a string of text.

rot13, a MATLAB code which can encipher a string using the ROT13 cipher for letters, and the ROT5 cipher for digits.

Source Code:


Last revised on 05 January 2019.