One of the most valuable techniques I’ve learned from strongly-typed functional programming is value classes. By wrapping common types in specialised case classes, we can improve semantic clarity and leverage compile-time checks to avoid bugs.
Different types of
Imagine a domain where we have a customer ID, and a number of orders. These are both integral values, so we can use
Int to model them.
val customerId: Int = 823826 val numOrders: Int = 5
But we can sum two
Int values, so why not add
val _ = customerId + numOrders // 🤯 Nonsense!
The compiler lets us add these values because they are both of type
Int. But this operation is nonsense: a number of
orders added to a customer ID is meaningless. I tend to think of this as being dimensionally inconsistent,
like adding a mass in kg to a length in metres.
Int in a new, dedicated type allows us to avoid this problem:
final case class CustomerId(asInt: Int) extends AnyVal final case class NumOrders(asInt: Int) extends AnyVal val customerId: CustomerId = CustomerId(823826) val numOrders: NumOrders = NumOrders(5) val _ = customerId + numOrders // 🙅 Doesn't compile
In this example,
NumOrders are “value classes”.
Scala’s imperfect implementation of value classes
Ideally, these value classes would allow us to create compile-time checks while being optimized away at runtime. In other languages, this technique is referred to as “newtypes”, and that’s exactly how it works.
Unfortunately, Scala’s value classes do generally come with a runtime cost. This blog post
provides an excellent explanation (and criticism) of the tradeoffs that have led to this behaviour. It’s worth noting
that the performance overheads of the
CustomerId wrapper over the top of the
Int are almost always negligible on the
JVM. Particularly for workloads like web application backends, we should just ignore this cost.
To the first order, then, we can think of Scala’s value classes as allowing us to create compile-time checks without any runtime cost.
A slightly-less-contrived example
The above example with
NumOrders is a bit contrived. I’ll give a real example where I’ve used this
technique to avoid a bug.
Consider this tab-separated file published by the Australian Electoral Commission. It includes information about candidates in the 2019 Australian election. We might model the csv row (for data ingestion), and the candidate data like:
final case class CandidateRow( state: String, divisionId: Int, divisionName: String, partyAbbreviation: String, partyName: String, candidateId: Int, surname: String, givenName: String, ) final case class Candidate( id: Int, surname: String, givenName: String, ) def rowToCandidate(row: CandidateRow): Candidate = Candidate( row.candidateId, row.givenName, row.surname, )
Can you spot the bug in
rowToCandidate? We have mixed up the order of
row.surname. This kind of
mistake is incredibly easy to make. What if we used strong types (including value classes)?
final case class CandidateRow( state: State, divisionId: DivisionId, divisionName: DivisionName, partyAbbreviation: PartyAbbreviation, partyName: PartyName, candidateId: CandidateId, surname: CandidateSurname, givenName: CandidateGivenName, ) final case class Candidate( id: CandidaateId, surname: CandidateSurname, givenName: CandidateGivenName, ) def rowToCandidate(row: CandidateRow): Candidate = Candidate( row.candidateId, row.givenName, // 🙅 Doesn't compile row.surname, // 🙅 Doesn't compile )
The compiler is now able to identify this bug.
Of course, this requires that we have correctly modelled the fields in
CandidateRow. But this complies with a more
general principle of strongly-typed functional programming: put compiler-enforced constraints on data as soon as it
enters your program (eg from a tsv file), and then rely on the compiler to help find bugs everywhere else.