Problem

We have a few CSV files that have the same columns. We want to merge them together while retaining information about which file each row came from.

Data

Let’s say the CSV files are file{1,2,3,4}.csv. This is file1.csv:

library(tidyverse)
read_csv("file1.csv")
ABCDEFGHIJ0123456789
Red
<dbl>
Blue
<dbl>
Green
<dbl>
Yellow
<dbl>
1638.5457987.76671241.4354
1687.4048847.45922031.2051
1616.90912550.75557888.0743
3628.5404502.794626064.2754

The other three files are similar.

Solution summary

This is the complete solution using tidyverse libraries:

library(tidyverse)
csv_names = paste0("file", 1:4, ".csv") 
csv_names %>% map(read_csv) %>% set_names(csv_names) %>% bind_rows(.id = "file")
ABCDEFGHIJ0123456789
file
<chr>
Red
<dbl>
Blue
<dbl>
Green
<dbl>
Yellow
<dbl>
file1.csv1638.5457987.76671241.4354
file1.csv1687.4048847.45922031.2051
file1.csv1616.90912550.75557888.0743
file1.csv3628.5404502.794626064.2754
file2.csv1740.4817855.78273921.4672
file2.csv1779.0659546.18523701.3121
file2.csv1495.42011824.84658297.7501
file2.csv3552.9414461.808625234.2801
file3.csv1741.4868141.41767311.4672
file3.csv1904.7248872.13929201.2273

Breaking it down

Making the list of files

The first step is to get all the file names in a list. If your filenames have a regular pattern, you can construct that easily with paste. paste0 is a special case of paste with the joining delimiter set to "".

csv_names = paste0("file", 1:4, ".csv") 
csv_names
[1] "file1.csv" "file2.csv" "file3.csv" "file4.csv"

Import each of them

Now we want to import each of these files into a data frame (technically a tibble since we’re using tidyverse). Fortunately purr has a map function that works very similarly to map function in Javascript, which I therefore found much more intuitive to use than the more popular lapply variety of functions in R.

csv_dfs = csv_names %>% map(read_csv)

length(csv_dfs)
[1] 4
csv_dfs[[3]]
ABCDEFGHIJ0123456789
Red
<dbl>
Blue
<dbl>
Green
<dbl>
Yellow
<dbl>
1741.4868141.41767311.4672
1904.7248872.13929201.2273
1494.95212617.70458018.0773
3541.4624369.841625084.2818

Joining them together

The bind_rows function binds rows together, and has a handy .id argument that creates a new column holding data that links back the rows to the data frames they came from.

csv_dfs %>% bind_rows(.id="filename")
ABCDEFGHIJ0123456789
filename
<chr>
Red
<dbl>
Blue
<dbl>
Green
<dbl>
Yellow
<dbl>
11638.5457987.76671241.4354
11687.4048847.45922031.2051
11616.90912550.75557888.0743
13628.5404502.794626064.2754
21740.4817855.78273921.4672
21779.0659546.18523701.3121
21495.42011824.84658297.7501
23552.9414461.808625234.2801

This is still not quite exactly what we want, because unless we name the elements in our list, bind_rows does not does not know what to put in the .id column, so it just uses a numeric sequence instead. lists in R are almost like dictionaries in python, but some elements only has a value and no key. Right now our list has only values and no keys. Setting the name for all of them can be done in one shot using set_names.

names(csv_dfs)
[1] "file1.csv" "file2.csv" "file3.csv" "file4.csv"

Now bind_rows can use these names to set the values in the .id column:

csv_dfs %>% bind_rows(.id="filename")
ABCDEFGHIJ0123456789
filename
<chr>
Red
<dbl>
Blue
<dbl>
Green
<dbl>
Yellow
<dbl>
file1.csv1638.5457987.76671241.4354
file1.csv1687.4048847.45922031.2051
file1.csv1616.90912550.75557888.0743
file1.csv3628.5404502.794626064.2754
file2.csv1740.4817855.78273921.4672
file2.csv1779.0659546.18523701.3121
file2.csv1495.42011824.84658297.7501
file2.csv3552.9414461.808625234.2801
file3.csv1741.4868141.41767311.4672
file3.csv1904.7248872.13929201.2273

We’re done!

Modifying each csv before merging

You may want to modify each individual CSV before merging. For example, your original csv files may have 100s of columns but you only want to keep a few, and you don’t want to merge first and then select for efficiency. It is easy to do so by creating a wrapper function around read_csv:


csv_extract = function(csv_name) {
  read_csv(csv_name) %>% 
    select(Red, Blue, Green)  # Dropping Yellow because we're not interested.
}
csv_names = paste0("file", 1:4, ".csv") 
csv_names %>% map(csv_extract) %>% set_names(csv_names) %>% bind_rows(.id = "file")
ABCDEFGHIJ0123456789
file
<chr>
Red
<dbl>
Blue
<dbl>
Green
<dbl>
file1.csv1638.5457987.7667124
file1.csv1687.4048847.4592203
file1.csv1616.90912550.7555788
file1.csv3628.5404502.79462606
file2.csv1740.4817855.7827392
file2.csv1779.0659546.1852370
file2.csv1495.42011824.8465829
file2.csv3552.9414461.80862523
file3.csv1741.4868141.4176731
file3.csv1904.7248872.1392920
LS0tCnRpdGxlOiAiTWVyZ2luZyBDU1YgZmlsZXMiCm91dHB1dDoKICBodG1sX25vdGVib29rOgogICAgY29kZV9mb2xkaW5nOiBzaG93CiAgICB0b2M6IHllcwogICAgdGhlbWU6IGNlcnVsZWFuCiAgaHRtbF9kb2N1bWVudDoKICAgIGRmX3ByaW50OiBwYWdlZAogICAgdG9jOiB5ZXMKICBwZGZfZG9jdW1lbnQ6IGRlZmF1bHQKLS0tCgo8c3R5bGUgdHlwZT0idGV4dC9jc3MiPgoKYm9keXsgLyogTm9ybWFsICAqLwogICAgICBmb250LXNpemU6IDE0cHg7CiAgfQp0ZCB7ICAvKiBUYWJsZSAgKi8KICBmb250LXNpemU6IDE0cHg7Cn0KaDEudGl0bGUgewogIGZvbnQtc2l6ZTogMzhweDsKfQpoMSB7IC8qIEhlYWRlciAxICovCiAgZm9udC1zaXplOiAyOHB4Owp9CmgyIHsgLyogSGVhZGVyIDIgKi8KICAgIGZvbnQtc2l6ZTogMjJweDsKfQpoMyB7IC8qIEhlYWRlciAzICovCiAgZm9udC1zaXplOiAxOHB4Owp9CmNvZGUucnsgLyogQ29kZSBibG9jayAqLwogICAgZm9udC1zaXplOiAxNHB4Owp9CnByZSB7IC8qIENvZGUgYmxvY2sgLSBkZXRlcm1pbmVzIGNvZGUgc3BhY2luZyBiZXR3ZWVuIGxpbmVzICovCiAgICBmb250LXNpemU6IDE0cHg7Cn0KPC9zdHlsZT4KCiMgUHJvYmxlbQpXZSBoYXZlIGEgZmV3IENTViBmaWxlcyB0aGF0IGhhdmUgdGhlIHNhbWUgY29sdW1ucy4gV2Ugd2FudCB0byBtZXJnZSB0aGVtIHRvZ2V0aGVyIHdoaWxlIHJldGFpbmluZyBpbmZvcm1hdGlvbiBhYm91dCB3aGljaCBmaWxlIGVhY2ggcm93IGNhbWUgZnJvbS4gCgojIERhdGEKTGV0J3Mgc2F5IHRoZSBDU1YgZmlsZXMgYXJlIGBmaWxlezEsMiwzLDR9LmNzdmAuIFRoaXMgaXMgYGZpbGUxLmNzdmA6CgpgYGB7ciBtZXNzYWdlPUZBTFNFfQpsaWJyYXJ5KHRpZHl2ZXJzZSkKcmVhZF9jc3YoImZpbGUxLmNzdiIpCmBgYApUaGUgb3RoZXIgdGhyZWUgZmlsZXMgYXJlIHNpbWlsYXIuIAoKIyBTb2x1dGlvbiBzdW1tYXJ5CgpUaGlzIGlzIHRoZSBjb21wbGV0ZSBzb2x1dGlvbiB1c2luZyB0aWR5dmVyc2UgbGlicmFyaWVzOgoKYGBge3IgbWVzc2FnZT1GQUxTRX0KbGlicmFyeSh0aWR5dmVyc2UpCmNzdl9uYW1lcyA9IHBhc3RlMCgiZmlsZSIsIDE6NCwgIi5jc3YiKSAKY3N2X25hbWVzICU+JSBtYXAocmVhZF9jc3YpICU+JSBzZXRfbmFtZXMoY3N2X25hbWVzKSAlPiUgYmluZF9yb3dzKC5pZCA9ICJmaWxlIikKYGBgCgojIEJyZWFraW5nIGl0IGRvd24gCgojIyBNYWtpbmcgdGhlIGxpc3Qgb2YgZmlsZXMgCgpUaGUgZmlyc3Qgc3RlcCBpcyB0byBnZXQgYWxsIHRoZSBmaWxlIG5hbWVzIGluIGEgbGlzdC4gSWYgeW91ciBmaWxlbmFtZXMgaGF2ZSBhIHJlZ3VsYXIgcGF0dGVybiwgeW91IGNhbiBjb25zdHJ1Y3QgdGhhdCBlYXNpbHkgd2l0aCBgcGFzdGVgLiBgcGFzdGUwYCBpcyBhIHNwZWNpYWwgY2FzZSBvZiBwYXN0ZSB3aXRoIHRoZSBqb2luaW5nIGRlbGltaXRlciBzZXQgdG8gYCIiYC4gCgpgYGB7cn0KY3N2X25hbWVzID0gcGFzdGUwKCJmaWxlIiwgMTo0LCAiLmNzdiIpIApjc3ZfbmFtZXMKYGBgCgojIyBJbXBvcnQgZWFjaCBvZiB0aGVtIAoKTm93IHdlIHdhbnQgdG8gaW1wb3J0IGVhY2ggb2YgdGhlc2UgZmlsZXMgaW50byBhIGRhdGEgZnJhbWUgKHRlY2huaWNhbGx5IGEgdGliYmxlIHNpbmNlIHdlJ3JlIHVzaW5nIHRpZHl2ZXJzZSkuIEZvcnR1bmF0ZWx5IGBwdXJyYCBoYXMgYSBtYXAgZnVuY3Rpb24gdGhhdCB3b3JrcyB2ZXJ5IHNpbWlsYXJseSB0byBtYXAgZnVuY3Rpb24gaW4gSmF2YXNjcmlwdCwgd2hpY2ggSSB0aGVyZWZvcmUgZm91bmQgbXVjaCBtb3JlIGludHVpdGl2ZSB0byB1c2UgdGhhbiB0aGUgbW9yZSBwb3B1bGFyIGBsYXBwbHlgIHZhcmlldHkgb2YgZnVuY3Rpb25zIGluIFIuICAKCmBgYHtyIG1lc3NhZ2U9RkFMU0V9CmNzdl9kZnMgPSBjc3ZfbmFtZXMgJT4lIG1hcChyZWFkX2NzdikKCmxlbmd0aChjc3ZfZGZzKQpjc3ZfZGZzW1szXV0KYGBgCgojIyBKb2luaW5nIHRoZW0gdG9nZXRoZXIKClRoZSBbYGJpbmRfcm93c2BdKGh0dHBzOi8vZHBseXIudGlkeXZlcnNlLm9yZy9yZWZlcmVuY2UvYmluZC5odG1sKSBmdW5jdGlvbiBiaW5kcyByb3dzIHRvZ2V0aGVyLCBhbmQgaGFzIGEgaGFuZHkgYC5pZGAgYXJndW1lbnQgdGhhdCBjcmVhdGVzIGEgbmV3IGNvbHVtbiBob2xkaW5nIGRhdGEgdGhhdCBsaW5rcyBiYWNrIHRoZSByb3dzIHRvIHRoZSBkYXRhIGZyYW1lcyB0aGV5IGNhbWUgZnJvbS4gCgpgYGB7ciByb3dzLnByaW50PTh9CmNzdl9kZnMgJT4lIGJpbmRfcm93cyguaWQ9ImZpbGVuYW1lIikKYGBgCgpUaGlzIGlzIHN0aWxsIG5vdCBxdWl0ZSBleGFjdGx5IHdoYXQgd2Ugd2FudCwgYmVjYXVzZSB1bmxlc3Mgd2UgbmFtZSB0aGUgZWxlbWVudHMgaW4gb3VyIGxpc3QsIGJpbmRfcm93cyBkb2VzIG5vdCBkb2VzIG5vdCBrbm93IHdoYXQgdG8gcHV0IGluIHRoZSBgLmlkYCBjb2x1bW4sIHNvIGl0IGp1c3QgdXNlcyBhIG51bWVyaWMgc2VxdWVuY2UgaW5zdGVhZC4gYGxpc3RzYCBpbiBSIGFyZSBhbG1vc3QgbGlrZSBkaWN0aW9uYXJpZXMgaW4gcHl0aG9uLCBidXQgc29tZSBlbGVtZW50cyBvbmx5IGhhcyBhIHZhbHVlIGFuZCBubyBrZXkuIFJpZ2h0IG5vdyBvdXIgbGlzdCBoYXMgb25seSB2YWx1ZXMgYW5kIG5vIGtleXMuIFNldHRpbmcgdGhlIG5hbWUgZm9yIGFsbCBvZiB0aGVtIGNhbiBiZSBkb25lIGluIG9uZSBzaG90IHVzaW5nIGBzZXRfbmFtZXNgLgoKYGBge3IgbWVzc2FnZT1GQUxTRX0KY3N2X2RmcyA9IGNzdl9uYW1lcyAlPiUgbWFwKHJlYWRfY3N2KSAlPiUgc2V0X25hbWVzKGNzdl9uYW1lcykKCm5hbWVzKGNzdl9kZnMpCmNzdl9kZnMkZmlsZTMuY3N2CmBgYAoKTm93IGBiaW5kX3Jvd3NgIGNhbiB1c2UgdGhlc2UgbmFtZXMgdG8gc2V0IHRoZSB2YWx1ZXMgaW4gdGhlIGAuaWRgIGNvbHVtbjogCgpgYGB7cn0KY3N2X2RmcyAlPiUgYmluZF9yb3dzKC5pZD0iZmlsZW5hbWUiKQpgYGAKCgpXZSdyZSBkb25lISAKCiMjICBNb2RpZnlpbmcgZWFjaCBjc3YgYmVmb3JlIG1lcmdpbmcKCllvdSBtYXkgd2FudCB0byBtb2RpZnkgZWFjaCBpbmRpdmlkdWFsIENTViBiZWZvcmUgbWVyZ2luZy4gRm9yIGV4YW1wbGUsIHlvdXIgb3JpZ2luYWwgY3N2IGZpbGVzIG1heSBoYXZlIDEwMHMgb2YgY29sdW1ucyBidXQgeW91IG9ubHkgd2FudCB0byBrZWVwIGEgZmV3LCBhbmQgeW91IGRvbid0IHdhbnQgdG8gbWVyZ2UgZmlyc3QgYW5kIHRoZW4gc2VsZWN0IGZvciBlZmZpY2llbmN5LiBJdCBpcyBlYXN5IHRvIGRvIHNvIGJ5IGNyZWF0aW5nIGEgd3JhcHBlciBmdW5jdGlvbiBhcm91bmQgYHJlYWRfY3N2YDogCgpgYGB7ciBtZXNzYWdlPUZBTFNFfQoKY3N2X2V4dHJhY3QgPSBmdW5jdGlvbihjc3ZfbmFtZSkgewogIHJlYWRfY3N2KGNzdl9uYW1lKSAlPiUgCiAgICBzZWxlY3QoUmVkLCBCbHVlLCBHcmVlbikgICMgRHJvcHBpbmcgWWVsbG93IGJlY2F1c2Ugd2UncmUgbm90IGludGVyZXN0ZWQuCn0KY3N2X25hbWVzID0gcGFzdGUwKCJmaWxlIiwgMTo0LCAiLmNzdiIpIApjc3ZfbmFtZXMgJT4lIG1hcChjc3ZfZXh0cmFjdCkgJT4lIHNldF9uYW1lcyhjc3ZfbmFtZXMpICU+JSBiaW5kX3Jvd3MoLmlkID0gImZpbGUiKQpgYGAK