Juha-Matti Santala
Community Builder. Dreamer. Adventurer.

Learning Rust #4: Parsing JSON with strong types

Last December I finally started learning Rust and in January I built and published my first app with Rust: 235. Learning Rust is my new monthly blog series that is definitely not a tutorial but rather a place for me to keep track of my learning and write about things I've learned along the way.


One big difference between Rust and the JavaScript and Python I've been writing for most of my career is strong static typing. With JS and Python, I'd just call an API, get some data in JSON, parse that into a Python dict or JavaScript object and then access different values directly.

With Rust, I'm working with differently typed structures and while I understand the value of it, sometimes it feels very cumbersome to write.

I started with serde_json::Value

When I started working with Rust to build 235, I decided to use Serde, which seemed to be a popular library for serializing and deserializing data. While I was still learning the basics, it felt overwhelming to figure out the correct way of defining types for the entire API response. I decided to leave that as a refactoring exercise for a later day and started by using serde_json::Value, which is kind of a catch-all solution. It is an enum that represents any valid JSON value.

#[tokio::main]
async fn fetch_games() -> Result<serde_json::Value, Error> {
    let request_url = String::from("https://nhl-score-api.herokuapp.com/api/scores/latest");
    let response = reqwest::get(&request_url).await?;
    let scores: serde_json::Value = response.json().await?;

    Ok(scores)
}

Here's my original fetch_games function that called the API and parsed the data into a serde_json::Value. Nice and clean, easy to use.

But it left my code full of lines like this:

let games = scores["games"].as_array().unwrap();

let home_team = &game_json["teams"]["home"]["abbreviation"].as_str().unwrap();

let all_goals = game_json["goals"].as_array().unwrap_or(&empty);

I had to code very defensively because any type conversion with the as_ methods could fail and thus needed to be unwrapped, and in some cases I first had to make sure the keys even existed:

if (&game_json["teams"]).is_null() {
  return None;
}

let home_team = &game_json["teams"]["home"]["abbreviation"].as_str().unwrap();
let away_team = &game_json["teams"]["away"]["abbreviation"].as_str().unwrap();

Using serde_json::Value did help me build the first functioning versions of my application, but in the back of my head I knew that defining the structures as Rust structs would make my code better.

One key thing that kept me from doing that in the beginning was that I couldn't figure out how to deal with dynamic keys. The API I use has team abbreviations as keys in certain places and I had no idea how to handle that, so I left the issue for months.

How to define structs for JSON

Using custom structs to define expected types with JSON is rather straightforward if you're familiar with structs.

I'll start with the example from the documentation:

#[derive(Serialize, Deserialize)]
struct Person {
    name: String,
    age: u8,
    phones: Vec<String>,
}

Here we define a Person that has a name as a String, an age as a u8 and a list of phone numbers as a Vec<String>. Once we have this definition, all we have to do is use it as the type of the result, and serde_json and Rust will validate that the structure and types of the data match what we defined:

// Parse the string of data into a Person object.
let p: Person = serde_json::from_str(data)?;

// Do things just like with any other Rust data structure.
println!("Please call {} at the number {}", p.name, p.phones[0]);

Then we draw the rest of the owl

I found the base example very good and nice to follow. However, it didn't cover everything I needed, so next I had to figure out how to draw the rest of the owl.

snake_case of Rust with camelCase of JSON

First, Rust likes to use snake_case and gives you a warning by default if you don't. JSON keys are usually in camelCase so my editor and my shell were full of warnings and that was annoying. I learned that I can silence them by defining an allow attribute for each struct that needed it:

#[derive(Debug, Serialize, Deserialize)]
#[allow(non_snake_case)]
pub struct TeamResponse {
    pub abbreviation: String,
    pub id: u64,
    pub locationName: String,
    pub shortName: String,
    pub teamName: String,
}

I haven't yet found out if there's a way to use snake_cased names and automatically map them to the corresponding JSON keys. (Raymond Hettinger from the Python community has a nice talk on why it's valuable to have your code in idiomatic Python; much of it is applicable to any language.)

Edit 2021-05-06: Thanks to PatatasDelPapa's comment in the pull request, I was able to fix this:

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct TeamResponse {
    pub abbreviation: String,
    pub id: u64,
    pub location_name: String,
    pub short_name: String,
    pub team_name: String,
}

Organizing API response types into a separate file

Second, you'll see that in the snippet above, definitions start with pub. That's because I wanted to move my API response type definitions to their own file so they don't pollute my main.rs and are easier to find and change. So I made an api_types.rs file, moved them all there, added plenty of pubs and imported them with

mod api_types;
use api_types::{APIResponse, GameResponse, GoalResponse};

in main.rs.
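As a rough single-file sketch of the same layout, an inline module can stand in for api_types.rs. The two fields and the describe helper are hypothetical placeholders for illustration, not the real API shape, and the Serde derives are left out to keep the sketch dependency-free.

```rust
// Inline stand-in for a separate api_types.rs file. In a real project
// this body would live in src/api_types.rs and main.rs would only
// declare `mod api_types;`.
mod api_types {
    // `pub` on the struct makes the type visible outside the module...
    #[derive(Debug)]
    pub struct GoalResponse {
        // ...and `pub` on each field lets outside code read it directly.
        pub scorer: String, // hypothetical field
        pub minute: u8,     // hypothetical field
    }
}

use api_types::GoalResponse;

// Code outside the module can only touch the fields because both the
// struct and its fields are public. (Hypothetical helper.)
fn describe(goal: &GoalResponse) -> String {
    format!("{} scored at {} min", goal.scorer, goal.minute)
}
```

Dropping either pub turns the field access in describe into a compile error, which is why moving the types out of main.rs required adding so many of them.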

Dynamic keys

Third, I needed some support for dynamic keys. I was expecting to have to jump through some hoops but this ended up being one of the easiest steps in the process. Thanks to this GitHub issue, I found the solution: use HashMap<String, serde_json::Value> for any object that has dynamic keys.

#[derive(Debug, Serialize, Deserialize)]
#[allow(non_snake_case)]
pub struct GameResponse {
    pub status: StatusResponse,
    pub startTime: String,
    pub goals: Vec<GoalResponse>,
    pub scores: HashMap<String, serde_json::Value>,
    pub teams: TeamsResponse,
    pub preGameStats: PreGameStatsResponse,
    pub currentStats: CurrentStatsResponse,
}

In the documentation of the API, the scores field is defined as

scores object: each team’s goal count, plus one of these possible fields:
- overtime: set to true if the game ended in overtime, absent if it didn’t
- shootout: set to true if the game ended in shootout, absent if it didn’t

and looks like

"scores": {
  "BOS": 4,
  "CHI": 3,
  "overtime": true
}

The keys for each team are dynamic, so I use a HashMap for the whole object. The downside, which I wasn't able to work around, is that I'd like to put the overtime and shootout keys into the type definitions but can't.
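One possible workaround is to split the map into typed fields after parsing. The sketch below is only an illustration under assumptions: ScoreValue is a hypothetical two-variant stand-in for serde_json::Value, and Scores and split_scores are made-up names that are not part of 235.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for serde_json::Value, reduced to the two kinds
// of values that appear in the documented `scores` object.
#[derive(Debug, Clone, PartialEq)]
enum ScoreValue {
    Number(u64),
    Bool(bool),
}

// Typed view of `scores`: the known flags become fields and every other
// key is treated as a team abbreviation.
#[derive(Debug, PartialEq)]
struct Scores {
    goals: HashMap<String, u64>,
    overtime: bool,
    shootout: bool,
}

fn split_scores(raw: &HashMap<String, ScoreValue>) -> Scores {
    let mut scores = Scores {
        goals: HashMap::new(),
        overtime: false,
        shootout: false,
    };
    for (key, value) in raw {
        match (key.as_str(), value) {
            // The API docs say the flags are `true` when set, absent otherwise.
            ("overtime", ScoreValue::Bool(flag)) => scores.overtime = *flag,
            ("shootout", ScoreValue::Bool(flag)) => scores.shootout = *flag,
            // A flag key with an unexpected value type is skipped.
            ("overtime", _) | ("shootout", _) => {}
            // Any other numeric entry is a team's goal count.
            (team, ScoreValue::Number(count)) => {
                scores.goals.insert(team.to_string(), *count);
            }
            // Ignore anything else that does not fit the documented shape.
            _ => {}
        }
    }
    scores
}
```

With the example object above, this would put BOS and CHI into goals and set overtime to true, leaving shootout false because the key is absent.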

Optional keys

And finally, fourth, some keys are optional – or to be specific in this case, they are dependent on the situation.

I'm happy I already learned about Option types earlier (check out Learning Rust #2 if you haven't heard about them). And that's all you need with Rust and serde_json:

#[derive(Debug, Serialize, Deserialize)]
pub struct APIResponse {
    pub date: DateResponse,
    pub games: Vec<GameResponse>,
    pub errors: Option<HashMap<String, serde_json::Value>>,
}

In this example, which is the top level definition of my API response type, I've defined errors to be optional – it's only present if the API I use found some errors in the data it's using.

By declaring the type as Option, we get Some(value) if the key exists and None if it doesn't. For most cases in this application, it's not random which keys are there and which are not. So if I'm already in a branch of logic where the precondition has been checked, I can trust that the value exists and unwrap it without handling a potential None.
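For the places where the key really might be missing, here is a minimal sketch of reading an Option field without unwrap. The struct is a simplified stand-in for the response type: it uses String values instead of serde_json::Value so it runs without dependencies, and error_count and first_error are hypothetical helpers.

```rust
use std::collections::HashMap;

// Simplified stand-in for the top-level response type; only the
// optional field is kept.
struct APIResponse {
    errors: Option<HashMap<String, String>>,
}

// A missing `errors` key simply counts as zero errors.
fn error_count(response: &APIResponse) -> usize {
    response.errors.as_ref().map_or(0, |errors| errors.len())
}

// `if let` runs the branch only when the key was present in the JSON.
fn first_error(response: &APIResponse) -> Option<String> {
    if let Some(errors) = &response.errors {
        return errors.values().next().cloned();
    }
    None
}
```

Both helpers handle the None case explicitly, so they cannot panic the way an unwrap on a missing key would.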

Result? Much cleaner code

Refactoring the code with these definitions is not just writing code for the sake of writing code. The result of this work is that we push the problems to the boundary where the data is parsed, so we don't have to deal with them at the logic level anymore.

I already showed some examples in the beginning of the code I wasn't happy about. Let's see what they look like now:

Before

if (&game_json["teams"]).is_null() {
  return None;
}

let home_team = &game_json["teams"]["home"]["abbreviation"].as_str().unwrap();
let away_team = &game_json["teams"]["away"]["abbreviation"].as_str().unwrap();

After

let home_team = &game_json.teams.home.abbreviation;
let away_team = &game_json.teams.away.abbreviation;

Before

let empty: Vec<serde_json::Value> = Vec::new();
let all_goals = game_json["goals"].as_array().unwrap_or(&empty);

After

let all_goals = &game_json.goals;

I don't know about you but I'm very happy to see my code being much simpler and easier to read. I can also trust that once the data comes through from the API, it won't have any surprises.

Anything I don't like?

Writing type definitions for these kinds of nested structures sometimes feels overly complicated. There are some definitions that I don't need anywhere outside one value in the structure, and having to write a named struct for that feels a bit annoying.

One thing I did enjoy in TypeScript is the ability to write inline nested types and interfaces like this:

interface IEndpoints {
 auth: {
  login: string;
 }
}

In Rust, I would have needed to make two structs.
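A sketch of what that would look like. The struct names AuthEndpoints and Endpoints and the login_url helper are made up for illustration.

```rust
// Rust has no inline anonymous struct types, so the nested `auth` object
// from the TypeScript interface needs its own named struct.
#[derive(Debug)]
struct AuthEndpoints {
    login: String,
}

// The outer struct then refers to the inner one by name.
#[derive(Debug)]
struct Endpoints {
    auth: AuthEndpoints,
}

// Field access still reads the same as the nested TypeScript type would.
fn login_url(endpoints: &Endpoints) -> &str {
    &endpoints.auth.login
}
```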

Pull requests for all these changes

If you wanna check out all the changes and how the codebase cleaned up during this process, you can find them on GitHub: hamatti/nhl-235/pull/27.