Comma-Separated Value Files and R


Perhaps the most conventional file type for storing data, .csv files are ubiquitous. Reading and writing comma-separated value files is often the first and/or last step of a traditional data analysis project.

It is advantageous to use the readr package (a part of the tidyverse) over base R to read and write your .csv files for several reasons. readr functions use a more consistent naming scheme, offer the user more intuitive parsing, and are typically much faster than their base R equivalents such as read.csv. readr also automatically loads your data as a tibble. On the other hand, if you’re looking for even more speed, it may be useful to check out the data.table package instead.


library('readr')


Reading .csv Files

In its simplest form, a single call to read_csv with the file name may get the job done. It can be helpful to use RStudio’s Import Dataset GUI to quickly access the secondary arguments of read_csv.


my_data <- read_csv('mydata.csv')
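
One of those secondary arguments is col_types, which lets you state the column types yourself rather than relying on readr's guessing. Here is a minimal sketch of that idea. The file contents and column names (id, name, score) are made up for illustration, and the data is written to a temporary file only so the example runs anywhere.

```r
library(readr)

# Hypothetical example data, written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ann,3.5", "2,Bob,NA"), path)

# Declare the column types explicitly instead of letting readr guess
my_data <- read_csv(
  path,
  col_types = cols(
    id    = col_integer(),
    name  = col_character(),
    score = col_double()
  )
)

# List any values that could not be parsed as the declared type
problems(my_data)
```

Spelling out the types is a little more typing, but it means a malformed file shows up as a parsing problem instead of silently changing a column's type.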


Custom Input Arguments

To read files with a different delimiter, pass the arguments below to the read_delim function. This is useful when you’re working with tab separated files, or with European data (which often uses semicolons). Again, it’s probably best to use the RStudio GUI if you have access to it.

I like to choose my import options with the GUI, then copy the code in the bottom right hand corner of the interface into my R script. This increases reproducibility.


read_delim(
  "mydata.csv",               # file path and name always comes first
  delim = ",",                # single character field separator
  quote = "\"",               # single character to quote strings
  comment = "#",              # single character to signal comments
  col_names = c('a','b'),     # custom name columns on import
  na = ".",                   # string to signify missing values
  skip = 0,                   # number of lines to skip before read
  progress = show_progress()  # display a progress bar
  )
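
For the semicolon and tab cases mentioned above, readr also ships the shortcuts read_csv2 and read_tsv. The sketch below uses made-up data (the city and temp columns are assumptions) to show both, next to the equivalent read_delim call.

```r
library(readr)

# Hypothetical European-style file: ";" separates fields, "," is the decimal mark
semi_path <- tempfile(fileext = ".csv")
writeLines(c("city;temp", "Paris;21,5", "Berlin;19,0"), semi_path)

# read_csv2 assumes ";" as the delimiter and "," as the decimal mark
euro <- read_csv2(semi_path)

# The same read spelled out with read_delim and an explicit locale
euro_delim <- read_delim(
  semi_path,
  delim = ";",
  locale = locale(decimal_mark = ",")
)

# Hypothetical tab separated file
tab_path <- tempfile(fileext = ".tsv")
writeLines(c("city\ttemp", "Paris\t21.5"), tab_path)

# read_tsv is shorthand for read_delim with delim = "\t"
tabbed <- read_tsv(tab_path)
```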


Zipped Files

Compressed data files are treated just like uncompressed files. You don’t have to do anything extra!


my_data <- read_csv('mydata.csv.zip')
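
Here is a small round trip you can run to see this for yourself. It uses a .gz file rather than a .zip, because write_csv compresses its output when the file name ends in .gz, so the sketch needs no extra tools. The data frame is made up for illustration.

```r
library(readr)

# Hypothetical data, written to a gzip-compressed file in a temporary folder
gz_path <- tempfile(fileext = ".csv.gz")
write_csv(data.frame(x = 1:3, y = c("a", "b", "c")), gz_path)

# Reading the compressed file takes the same call as an uncompressed one
my_data <- read_csv(gz_path)
```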


Reading Files Outside of the Working Directory

It’s good practice to keep all of your input files together in an input folder. You can simply give the path to this folder, followed by the file name, in your read_csv call. read_csv also accepts a URL, so a file hosted online can be read the same way.


my_data <- read_csv('~/local/path/to/my/file/mydata.csv')
my_data <- read_csv('https://github.com/tidyverse/readr/raw/master/inst/extdata/mtcars.csv')


Writing .csv Files

Remember that the object in your R environment comes first, followed by the path and file name you’re saving to.


write_csv(
  data_name, 
  'path/to/filename.csv'
)
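
As a quick check of the argument order, this sketch writes a small made-up data frame to a temporary folder and prints the raw lines of the result. The na argument is optional. It is included here as an assumption about what you might want, so that missing values are written as empty fields instead of the default "NA".

```r
library(readr)

# Hypothetical object to save
data_name <- data.frame(id = 1:3, score = c(2.5, NA, 4))

# Object first, then the path and file name
out_path <- file.path(tempdir(), "filename.csv")
write_csv(data_name, out_path, na = "")

# Inspect the raw text that was written
readLines(out_path)
```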


Saving Custom Delimiters

The function write_delim can also be used to write a .csv file if you set the delim argument to ','. In the example below, delim is set to '\t' instead, so the tibble will be saved as a tab separated file.


write_delim(
  data_name,
  'path/to/filename.tsv',
  delim = '\t'
)
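
For tab separated output specifically, readr also provides write_tsv, which saves you from typing the delim argument. A minimal sketch, with a made-up data frame and a temporary file path:

```r
library(readr)

# Hypothetical object to save
data_name <- data.frame(a = 1:2, b = c("x", "y"))
tsv_path <- tempfile(fileext = ".tsv")

# Shorthand for write_delim(data_name, tsv_path, delim = "\t")
write_tsv(data_name, tsv_path)

# Read it back to confirm the file round-trips
round_trip <- read_tsv(tsv_path)
```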


Appending .csv Files

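Be careful not to append the column names as an observation! With append = TRUE, write_csv leaves out the header row by default, so only set col_names = TRUE yourself if the file you are appending to is empty.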


write_csv(
  your_object,
  'path/to/existing/filename.csv',
  append = TRUE
)
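
To see what appending does to the header, here is a small sketch that writes two made-up rows, appends a third, and then prints the raw file. The column names and values are assumptions chosen for illustration.

```r
library(readr)

csv_path <- tempfile(fileext = ".csv")

# First write creates the file, header included
write_csv(data.frame(id = 1:2, name = c("Ann", "Bob")), csv_path)

# Second write adds a row to the end of the same file
write_csv(data.frame(id = 3L, name = "Cy"), csv_path, append = TRUE)

# The header should appear only once, on the first line
lines <- readLines(csv_path)
lines
```

Note that the appended rows are written as they are, so the object you append should have its columns in the same order as the existing file.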


That’s all for now! Thanks for reading!

- Fisher


