Juha-Matti Santala
Community Builder. Dreamer. Adventurer.

Reading and writing CSV with Python

Batteries included is a blog series about the Python Standard Library. Each day, I share insights, ideas and examples for different parts of the library. Blaugust is an annual blogging festival in August where the goal is to write a blog post every day of the month.

Yesterday, I started with regular expressions and the re module. In it, I mentioned two ways to parse input data: regular expressions and string operations. Before we go any further, I’ll introduce a third one so you don’t get any ideas about parsing the CSV files by hand using either of the aforementioned methods.

Python’s standard library includes a fantastic module csv that provides a Reader and Writer objects for reading and writing CSVs.

Example data

The example data I’m using today is a CSV file that contains information about public libraries in mainland Finland from 1999 to 2022. It was downloaded from StatFin and used/shared in accordance with its CC-BY 4.0 license.

The entire CSV file is available here and the first three lines look like this:

"Year","Main libraries (no.)","Branch libraries (no.)","Institutional libraries (no.)","Libraries total (no.)","Mobile libraries (no.)","Stops for mobile libraries (no.)","Library boats (no.)","Stops for library boats (no.)","Collections: books, (no. in thousands)","Collections: other library media (no. in thousands)","Collections total (no. in thousands)","Acquisitions: books (no. in thousands)","Acquisitions: other library media (no. in thousands)","Acquisitions total (no. in thousands)","Operating expenditure total (1 000 EUR)","Expenditure on staff (1 000 EUR)","Expenditure on library materials (1 000 e)","Other expenditure (1 000 EUR)","Expenditure / inhabitant (EUR/inhabitant)","Staff total (person years)","Person years hired by the library (person years)","Other person years (person years)","Library visits (no. in thousands)","Library visits / inhabitant (no. / inhabitant)","Web visits (no. in thousands)","Borrowers (no. in thousands)","Borrowers / inhabitant, %","Loans total (no. in thousands)","Kbook loans (no. in thousands)","Loans of other medias (no. in thousands)","Recieved inter-library loans (no. in thousands)","Given inter-library loans (no. in thousands)","Loans total / inhabitant (no. /inhabitant)"
"1999",436,401,111,948,203,17218,.,.,36940,3698,40638,1557,309,1866,201931,113234,34252,54445,39,4771,4157,614,63765,12,.,2417,47,99265,75895,23158,212,203,19
"2000",436,401,99,936,201,16960,1,10,37013,3884,40897,1607,310,1917,208125,117103,35159,55864,40,4760,4175,585,63978,12,.,2431,47,102197,77281,24690,226,212,20

Reading a CSV

A base way of to read a CSV file into Python with csv module looks like this

import csv

with open('libraries.csv', 'r') as csvfile:
  data = csv.reader(csvfile)
  headers = next(data)
  for row in data:
	  for header, value in zip(headers, row):
      print(f'{header}: {value}')

This prints out

Year: 1999
Main libraries (no.): 436
Branch libraries (no.): 401
Institutional libraries (no.): 111
Libraries total (no.): 948
Mobile libraries (no.): 203
[ -- removed long output -- ]

DictReader

csv also provides a DictReader that reads the CSV file directly into a list of dictionaries. For each dictionary, the headers of the CSV file are used as keys.

import csv

with open("libraries.csv", "r") as csvfile:
    reader = csv.DictReader(csvfile)
    print(list(reader))

prints

[
 {
  'Year': '1999',
  'Main libraries (no.)': '436', 
  'Branch libraries (no.)': '401', 
  'Institutional libraries (no.)': '111', 
  'Libraries total (no.)': '948', 
  ...
},
...
]

Custom delimiters

By default, csv.reader uses comma as a delimiter. You can provide a delimiter keyword argument to change it to any other delimiter. For example, to read a semi-colon delimited data, you can do:

import csv

with open("libraries.csv", "r") as csvfile:
    reader = csv.reader(csvfile, delimiter=';')

Writing data into a CSV

To write data into a CSV format, we initiate a Writer object and use writerows method to write a list of lists of data into a file.

import csv

data = [
  ['name', 'favorite book'],
  ['Juhis', 'Hitchhiker\'s Guide to the Galaxy'],
  ['Kado', 'Alice in Wonderland'],
  ['Ellie', 'Moby Dick']
]

with open('favorites.csv', 'w') as csvfile:
  writer = csv.writer(csvfile)
  writer.writerows(data)

results in

name,favorite book
Juhis,Hitchhiker's Guide to the Galaxy
Kado,Alice in Wonderland
Ellie,Moby Dick

Quotes

By default, the writer adds quotes around strings only when necessary. Sometimes for extra added compatibility, I like to make sure they are added around every item and that can be done by passing a quoting keyword argument:

import csv

data = [
  ['name', 'favorite book'],
  ['Juhis', 'Hitchhiker\'s Guide to the Galaxy'],
  ['Kado', 'Alice in Wonderland'],
  ['Ellie', 'Moby Dick']
]

with open('favorites.csv', 'w') as csvfile:
  writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
  writer.writerows(data)

results in

"name","favorite book"
"Juhis","Hitchhiker's Guide to the Galaxy"
"Kado","Alice in Wonderland"
"Ellie","Moby Dick"

DictWriter

To write dictionaries into csv format, you can use DictWriter.

To initialise the writer, you need to pass in a list of fieldnames that matches the keys in the dictionaries passed to writerows. A ValueError is raised when a key that is not present in fieldnames is encountered.

import csv

data = [
    {"name": "Juhis", "favorite book": "Hitchhiker's Guide to the Galaxy"},
    {"name": "Kado", "favorite book": "Alice in Wonderland"},
    {"name": "Ellie", "favorite book": "Moby Dick"},
]

with open("favorites.csv", "w") as csvfile:
    fieldnames = ["name", "favorite book"]
    writer = csv.DictWriter(csvfile, fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(data)

Bonus tool, csvpy

While this series is mainly about the standard library of Python, I want to highlight a really cool tool from the csvkit. Csvkit is a command-line toolkit that provides tools for many csv related tasks.

The one relevant here is csvpy which loads a CSV file passed to it into a csv reader and opens a Python REPL for you:

➜ csvpy libraries.csv
Welcome! "libraries.csv" has been loaded in an agate.csv.reader object named "reader".
>>> next(reader)
['Year', 'Main libraries (no.)', 'Branch libraries (no.)', 'Institutional libraries (no.)', 'Libraries total (no.)', 'Mobile libraries (no.)', 'Stops for mobile libraries (no.)', 'Library boats (no.)', 'Stops for library boats (no.)', 'Collections: books, (no. in thousands)', 'Collections: other library media (no. in thousands)', 'Collections total (no. in thousands)', 'Acquisitions: books (no. in thousands)', 'Acquisitions: other library media (no. in thousands)', 'Acquisitions total (no. in thousands)', 'Operating expenditure total (1 000 EUR)', 'Expenditure on staff (1 000 EUR)', 'Expenditure on library materials (1 000 e)', 'Other expenditure (1 000 EUR)', 'Expenditure / inhabitant (EUR/inhabitant)', 'Staff total (person years)', 'Person years hired by the library (person years)', 'Other person years (person years)', 'Library visits (no. in thousands)', 'Library visits / inhabitant (no. / inhabitant)', 'Web visits (no. in thousands)', 'Borrowers (no. in thousands)', 'Borrowers / inhabitant, %', 'Loans total (no. in thousands)', 'Kbook loans (no. in thousands)', 'Loans of other medias (no. in thousands)', 'Recieved inter-library loans (no. in thousands)', 'Given inter-library loans (no. in thousands)', 'Loans total / inhabitant (no. /inhabitant)']

If you want to load the data into a DictReader, you can pass an additional --dict flag to it:

➜ csvpy --dict favorites.csv
Welcome! "favorites.csv" has been loaded in an agate.csv.DictReader object named "reader".
>>> next(reader)
{'name': 'Juhis', 'favorite book': "Hitchhiker's Guide to the Galaxy"}

I find this tool so handy when I’m reading and managing data on the command line.

Under the hood, it uses agate’s CSV reader.