Reading and writing CSV with Python
Batteries included is a blog series about the Python Standard Library. Each day, I share insights, ideas and examples for different parts of the library. Blaugust is an annual blogging festival in August where the goal is to write a blog post every day of the month.
Yesterday, I started with regular expressions and the
re
module. In it, I mentioned two ways
to parse input data: regular expressions and string operations. Before we go
any further, I’ll introduce a third one so you don’t get any ideas about
parsing the CSV files by hand using either of the aforementioned methods.
Python’s standard library includes a fantastic module csv that provides a Reader and Writer objects for reading and writing CSVs.
Example data
The example data I’m using today is a CSV file that contains information about public libraries in mainland Finland from 1999 to 2022. It was downloaded from StatFin and used/shared in accordance with its CC-BY 4.0 license.
The entire CSV file is available here and the first three lines look like this:
"Year","Main libraries (no.)","Branch libraries (no.)","Institutional libraries (no.)","Libraries total (no.)","Mobile libraries (no.)","Stops for mobile libraries (no.)","Library boats (no.)","Stops for library boats (no.)","Collections: books, (no. in thousands)","Collections: other library media (no. in thousands)","Collections total (no. in thousands)","Acquisitions: books (no. in thousands)","Acquisitions: other library media (no. in thousands)","Acquisitions total (no. in thousands)","Operating expenditure total (1 000 EUR)","Expenditure on staff (1 000 EUR)","Expenditure on library materials (1 000 e)","Other expenditure (1 000 EUR)","Expenditure / inhabitant (EUR/inhabitant)","Staff total (person years)","Person years hired by the library (person years)","Other person years (person years)","Library visits (no. in thousands)","Library visits / inhabitant (no. / inhabitant)","Web visits (no. in thousands)","Borrowers (no. in thousands)","Borrowers / inhabitant, %","Loans total (no. in thousands)","Kbook loans (no. in thousands)","Loans of other medias (no. in thousands)","Recieved inter-library loans (no. in thousands)","Given inter-library loans (no. in thousands)","Loans total / inhabitant (no. /inhabitant)"
"1999",436,401,111,948,203,17218,.,.,36940,3698,40638,1557,309,1866,201931,113234,34252,54445,39,4771,4157,614,63765,12,.,2417,47,99265,75895,23158,212,203,19
"2000",436,401,99,936,201,16960,1,10,37013,3884,40897,1607,310,1917,208125,117103,35159,55864,40,4760,4175,585,63978,12,.,2431,47,102197,77281,24690,226,212,20
Reading a CSV
A base way of to read a CSV file into Python with csv module looks like this
import csv
with open('libraries.csv', 'r') as csvfile:
data = csv.reader(csvfile)
headers = next(data)
for row in data:
for header, value in zip(headers, row):
print(f'{header}: {value}')
This prints out
Year: 1999
Main libraries (no.): 436
Branch libraries (no.): 401
Institutional libraries (no.): 111
Libraries total (no.): 948
Mobile libraries (no.): 203
[ -- removed long output -- ]
DictReader
csv also provides a DictReader
that
reads the CSV file directly into a list of dictionaries. For each dictionary,
the headers of the CSV file are used as keys.
import csv
with open("libraries.csv", "r") as csvfile:
reader = csv.DictReader(csvfile)
print(list(reader))
prints
[
{
'Year': '1999',
'Main libraries (no.)': '436',
'Branch libraries (no.)': '401',
'Institutional libraries (no.)': '111',
'Libraries total (no.)': '948',
...
},
...
]
Custom delimiters
By default, csv.reader
uses comma as a
delimiter. You can provide a
delimiter
keyword argument to change it
to any other delimiter. For example, to read a semi-colon delimited data, you
can do:
import csv
with open("libraries.csv", "r") as csvfile:
reader = csv.reader(csvfile, delimiter=';')
Writing data into a CSV
To write data into a CSV format, we initiate a Writer object and use writerows method to write a list of lists of data into a file.
import csv
data = [
['name', 'favorite book'],
['Juhis', 'Hitchhiker\'s Guide to the Galaxy'],
['Kado', 'Alice in Wonderland'],
['Ellie', 'Moby Dick']
]
with open('favorites.csv', 'w') as csvfile:
writer = csv.writer(csvfile)
writer.writerows(data)
results in
name,favorite book
Juhis,Hitchhiker's Guide to the Galaxy
Kado,Alice in Wonderland
Ellie,Moby Dick
Quotes
By default, the writer adds quotes around strings only when necessary.
Sometimes for extra added compatibility, I like to make sure they are added
around every item and that can be done by passing a
quoting
keyword argument:
import csv
data = [
['name', 'favorite book'],
['Juhis', 'Hitchhiker\'s Guide to the Galaxy'],
['Kado', 'Alice in Wonderland'],
['Ellie', 'Moby Dick']
]
with open('favorites.csv', 'w') as csvfile:
writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
writer.writerows(data)
results in
"name","favorite book"
"Juhis","Hitchhiker's Guide to the Galaxy"
"Kado","Alice in Wonderland"
"Ellie","Moby Dick"
DictWriter
To write dictionaries into csv format, you can use DictWriter.
To initialise the writer, you need to pass in a list of
fieldnames
that matches the keys in the
dictionaries passed to writerows
. A
ValueError
is raised when a key that is
not present in fieldnames
is
encountered.
import csv
data = [
{"name": "Juhis", "favorite book": "Hitchhiker's Guide to the Galaxy"},
{"name": "Kado", "favorite book": "Alice in Wonderland"},
{"name": "Ellie", "favorite book": "Moby Dick"},
]
with open("favorites.csv", "w") as csvfile:
fieldnames = ["name", "favorite book"]
writer = csv.DictWriter(csvfile, fieldnames, quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerows(data)
Bonus tool, csvpy
While this series is mainly about the standard library of Python, I want to highlight a really cool tool from the csvkit. Csvkit is a command-line toolkit that provides tools for many csv related tasks.
The one relevant here is csvpy which loads a CSV file passed to it into a csv reader and opens a Python REPL for you:
➜ csvpy libraries.csv
Welcome! "libraries.csv" has been loaded in an agate.csv.reader object named "reader".
>>> next(reader)
['Year', 'Main libraries (no.)', 'Branch libraries (no.)', 'Institutional libraries (no.)', 'Libraries total (no.)', 'Mobile libraries (no.)', 'Stops for mobile libraries (no.)', 'Library boats (no.)', 'Stops for library boats (no.)', 'Collections: books, (no. in thousands)', 'Collections: other library media (no. in thousands)', 'Collections total (no. in thousands)', 'Acquisitions: books (no. in thousands)', 'Acquisitions: other library media (no. in thousands)', 'Acquisitions total (no. in thousands)', 'Operating expenditure total (1 000 EUR)', 'Expenditure on staff (1 000 EUR)', 'Expenditure on library materials (1 000 e)', 'Other expenditure (1 000 EUR)', 'Expenditure / inhabitant (EUR/inhabitant)', 'Staff total (person years)', 'Person years hired by the library (person years)', 'Other person years (person years)', 'Library visits (no. in thousands)', 'Library visits / inhabitant (no. / inhabitant)', 'Web visits (no. in thousands)', 'Borrowers (no. in thousands)', 'Borrowers / inhabitant, %', 'Loans total (no. in thousands)', 'Kbook loans (no. in thousands)', 'Loans of other medias (no. in thousands)', 'Recieved inter-library loans (no. in thousands)', 'Given inter-library loans (no. in thousands)', 'Loans total / inhabitant (no. /inhabitant)']
If you want to load the data into a
DictReader
, you can pass an additional
--dict
flag to it:
➜ csvpy --dict favorites.csv
Welcome! "favorites.csv" has been loaded in an agate.csv.DictReader object named "reader".
>>> next(reader)
{'name': 'Juhis', 'favorite book': "Hitchhiker's Guide to the Galaxy"}
I find this tool so handy when I’m reading and managing data on the command line.
Under the hood, it uses agate’s CSV reader.