XLS_IO, R programs which illustrate how data can be shared between Microsoft EXCEL and R, using XLS and CSV files.
Microsoft EXCEL data is typically stored in a file with the extension XLS. The XLS file format is a Microsoft proprietary scheme, and it is somewhat difficult for another program to extract data directly from an XLS file.
We will not try to have R read an EXCEL file directly. Instead, we will use the intermediate CSV (comma separated variable) format. In other words, given an XLS file, we will have EXCEL read it in, and then write it out as a CSV file. And if we have a CSV file, EXCEL can read it directly.
This means that EXCEL and R can "communicate" as long as the CSV files have the appropriate form.
One problem that can arise is that the CSV file might contain text, or strings, either as row or column labels, or as actual data. In order for strings to be handled properly, R will need them to appear inside quotes.
For example, the dataset parentht records the sex, height, mother's height and father's height for each of 40 children. If we have EXCEL read the file parentht.xls and then save the dataset as a CSV file parentht.csv, the first line of the CSV file contains labels for the columns, and reads:
Gender,Height,Mother's Height,Father's HeightBecause the labels are strings, and contain more than one word, R will not be able to read this data correctly. Since only this single line is a problem, we might fix the difficulty by simply using an editor to place double quotes around each label:
Gender, Height, "Mother's Height", "Father's Height"
The computer code and data files on this web page are distributed under the MIT license
CSV_IO R programs which illustrate how R can read or write CSV (Comma Separated Variable) files.
BODYTEMP is a dataset of 105 body temperatures in degrees Fahrenheit. There is only one variable, and there are no column labels, so this is a particularly easy dataset to handle.
PARENTHT is a dataset of 40 children, listing sex, height, and the height of both parents. Thus, there are four columns of data. Each column as a label, and these are text strings. When EXCEL outputs the CSV file, the text strings are not quoted, which will cause problems. Using an editor, the strings can be quoted, before being input to R.
CHMOVIE is a dataset of 50 children's movies, listing title, company, length in minutes, seconds of tobacco use, and seconds of alcohol use. There are five columns of data, each column as a label, and these are text strings. Using an editor, the strings in the CSV file output by EXCEL are all quoted before being passed to R.