Learning Rust #4: Parsing JSON with strong types
Last December I finally started learning Rust and in January I built and published my first app with Rust: 235. Learning Rust is my new monthly blog series that is definitely not a tutorial but rather a place for me to keep track of my learning and write about things I've learned along the way.
Learning Rust series
- Learning Rust #1: Pattern Matching
- Learning Rust #2: Option & Result
- Learning Rust #3: crates.io & publishing your package
- Learning Rust #4: Parsing JSON with strong types (you are here)
- Learning Rust #5: Rustlings
- Learning Rust #6: Understanding ownership in Rust
- Learning Rust #7: Learn from the community
- Learning Rust #8: What's next?
- Learning Rust #9: A talk about rustlings
- Learning Rust #10: Added new feature with a HashMap
One big difference between Rust and the JavaScript and Python I've been writing for most of my career is strong static typing. With JS and Python, I'd just call an API, get some data in JSON, parse that into a Python dict or JavaScript object and then access different values directly.
With Rust, I'm working with differently typed structures and while I understand the value of it, sometimes it feels very cumbersome to write.
I started with serde_json::Value
When I started working with Rust to build 235, I decided to use Serde, which seemed to be a popular library for serializing and deserializing data. While I was still learning the basics, it felt overwhelming to figure out the correct way of defining types for the entire API response. I decided to leave that as a refactoring exercise for a later day and started by using serde_json::Value, which is kind of a catch-all solution. It is an enum that represents any valid JSON value.
#[tokio::main]
async fn fetch_games() -> Result<serde_json::Value, Error> {
    let request_url = String::from("https://nhl-score-api.herokuapp.com/api/scores/latest");
    let response = reqwest::get(&request_url).await?;
    let scores: serde_json::Value = response.json().await?;
    Ok(scores)
}
Here's my original fetch_games function that called the API and parsed the data into a serde_json::Value. Nice and clean, easy to use.
But it left my code full of lines like this:
let games = scores["games"].as_array().unwrap();
let home_team = &game_json["teams"]["home"]["abbreviation"].as_str().unwrap();
let all_goals = game_json["goals"].as_array().unwrap_or(&empty);
I had to code very defensively because any type conversion with the as_ methods could fail and thus needed to be unwrapped, and in some cases I had to make sure the keys even existed in the first place:
if (&game_json["teams"]).is_null() {
    return None;
}
let home_team = &game_json["teams"]["home"]["abbreviation"].as_str().unwrap();
let away_team = &game_json["teams"]["away"]["abbreviation"].as_str().unwrap();
Using serde_json::Value did help me build the first functioning versions of my application, but in the back of my head I knew that defining the structures as Rust structs would make my code better.
One key thing that kept me from doing that in the beginning was that I couldn't figure out how to deal with dynamic keys. The API I use has team abbreviations as keys in certain places and I had no idea how to handle that, so I left the issue for months.
How to define structs for JSON
Using custom structs to define expected types with JSON is rather straightforward if you're familiar with structs.
I'll start with the example from the documentation:
#[derive(Serialize, Deserialize)]
struct Person {
    name: String,
    age: u8,
    phones: Vec<String>,
}
Here we define a Person that has a name as a String, an age as a u8 and a list of phone numbers as a Vec<String>. Once we have this definition, all we have to do is use it as the type of the result, and serde_json and Rust will validate that the structure and types of the data match what we defined:
// Parse the string of data into a Person object.
let p: Person = serde_json::from_str(data)?;
// Do things just like with any other Rust data structure.
println!("Please call {} at the number {}", p.name, p.phones[0]);
Then we draw the rest of the owl
I found the base example very good and nice to follow. However, that wasn't all that I needed so next I needed to figure out how to draw the rest of the owl.
snake_case of Rust with camelCase of JSON
First, Rust likes to use snake_case and gives you a warning by default if you don't. JSON keys are usually in camelCase so my editor and my shell were full of warnings and that was annoying. I learned that I can silence them by defining an allow attribute for each struct that needed it:
#[derive(Debug, Serialize, Deserialize)]
#[allow(non_snake_case)]
pub struct TeamResponse {
    pub abbreviation: String,
    pub id: u64,
    pub locationName: String,
    pub shortName: String,
    pub teamName: String,
}
I haven't yet found out if there's a way to use snake case'd names and automatically map them to corresponding JSON values. (Raymond Hettinger from the Python community has a nice talk for why it's valuable to have your code in idiomatic Python; much of it is applicable to any language).
Edit 2021-05-06: Thanks to PatatasDelPapa's comment in a pull request, I was able to fix this:
#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct TeamResponse {
    pub abbreviation: String,
    pub id: u64,
    pub location_name: String,
    pub short_name: String,
    pub team_name: String,
}
Organizing API response types into a separate file
Second, you'll see that in the snippet above, definitions start with pub. That's because I wanted to move my API response type definitions to their own file so they don't pollute my main.rs and it's easier to find and change them. So I made an api_types.rs file, moved them all there, added plenty of pubs and imported them with
mod api_types;
use api_types::{APIResponse, GameResponse, GoalResponse};
in main.rs.
Dynamic keys
Third, I needed some support for dynamic keys. I was expecting to have to jump through some hoops but this ended up being one of the easiest steps in the process. Thanks to this GitHub issue, I found the solution: use HashMap<String, serde_json::Value> for any object that has dynamic keys.
#[derive(Debug, Serialize, Deserialize)]
#[allow(non_snake_case)]
pub struct GameResponse {
    pub status: StatusResponse,
    pub startTime: String,
    pub goals: Vec<GoalResponse>,
    pub scores: HashMap<String, serde_json::Value>,
    pub teams: TeamsResponse,
    pub preGameStats: PreGameStatsResponse,
    pub currentStats: CurrentStatsResponse,
}
In the documentation of the API, the scores field is defined as
scores object: each team’s goal count, plus one of these possible fields:
- overtime: set to true if the game ended in overtime, absent if it didn’t
- shootout: set to true if the game ended in shootout, absent if it didn’t
and looks like
"scores": {
    "BOS": 4,
    "CHI": 3,
    "overtime": true
}
The keys for each team are gonna be dynamic, so using a HashMap lets me parse them without knowing the abbreviations in advance.
The downside of this, one that I wasn't able to figure out a way around, is that I'd like to put the overtime and shootout keys into the type definitions.
Optional keys
And finally, fourth, some keys are optional – or to be specific in this case, they are dependent on the situation.
I'm happy I already learned about Option types earlier (check out Learning Rust #2 if you haven't heard about them). And that's all you need with Rust and serde_json:
#[derive(Debug, Serialize, Deserialize)]
pub struct APIResponse {
    pub date: DateResponse,
    pub games: Vec<GameResponse>,
    pub errors: Option<HashMap<String, serde_json::Value>>,
}
In this example, which is the top level definition of my API response type, I've defined errors to be optional – it's only present if the API I use found some errors in the data it's using.
By saying that the type of the data is Option, we'll get Some(value) if the key exists and None if it doesn't. For most cases in this application, it's not random which keys are there and which are not. So if I'm already in a branch of logic where the prerequisite has been dealt with, I can trust that the value exists and can unwrap without dealing with potential Nones.
Result? Much cleaner code
Refactoring the code with these definitions is not just an exercise in writing code for the sake of writing code. The result of this work is that we push the problems to the boundary where the data is parsed, and we don't have to deal with them at the logic level anymore.
I already showed some examples in the beginning of the code I wasn't happy about. Let's see what they look like now:
Before
if (&game_json["teams"]).is_null() {
    return None;
}
let home_team = &game_json["teams"]["home"]["abbreviation"].as_str().unwrap();
let away_team = &game_json["teams"]["away"]["abbreviation"].as_str().unwrap();
After
let home_team = &game_json.teams.home.abbreviation;
let away_team = &game_json.teams.away.abbreviation;
Before
let empty: Vec<serde_json::Value> = Vec::new();
let all_goals = game_json["goals"].as_array().unwrap_or(&empty);
After
let all_goals = &game_json.goals;
I don't know about you but I'm very happy to see my code being much simpler and easier to read. I can also trust that once the data comes through from the API, it won't have any surprises.
Anything I don't like?
Writing type definitions for these kinds of nested structures sometimes feels overly complicated. There are some definitions that I don't need anywhere outside one value in the structure, and having to write a named struct for that feels a bit annoying.
One thing I did enjoy in TypeScript is a way to write in-line nested types and interfaces like this:
interface IEndpoints {
    auth: {
        login: string;
    }
}
In Rust, I would have needed to make two structs.
Pull requests for all these changes
If you wanna check out all the changes and how the codebase cleaned up during this process, you can find them on GitHub: hamatti/nhl-235/pull/27.