One of the most valuable techniques I’ve learned from strongly-typed functional programming is value classes. By wrapping common types in specialised case classes, we can improve semantic clarity and leverage compile-time checks to avoid bugs.
Different types of Int
Imagine a domain where we have a customer ID, and a number of orders. These are both integral values, so we can use
Int
to model them.
val customerId: Int = 823826
val numOrders: Int = 5
But we can sum two Int
values, so why not add customerId
to numOrders
?
val _ = customerId + numOrders // 🤯 Nonsense!
The compiler lets us add these values because they are both of type Int
. But this operation is nonsense: a number of
orders added to a customer ID is meaningless. I tend to think of this as being dimensionally inconsistent,
like adding a mass in kg to a length in metres.
Wrapping Int
in a new, dedicated type allows us to avoid this problem:
final case class CustomerId(asInt: Int) extends AnyVal
final case class NumOrders(asInt: Int) extends AnyVal
val customerId: CustomerId = CustomerId(823826)
val numOrders: NumOrders = NumOrders(5)
val _ = customerId + numOrders // 🙅 Doesn't compile
In this example, CustomerId
and NumOrders
are “value classes”.
Scala’s imperfect implementation of value classes
Ideally, these value classes would allow us to create compile-time checks while being optimized away at runtime. In other languages, this technique is referred to as “newtypes”, and that’s exactly how it works.
Unfortunately, Scala’s value classes do generally come with a runtime cost. This blog post
provides an excellent explanation (and criticism) of the tradeoffs that have led to this behaviour. It’s worth noting
that the performance overheads of the CustomerId
wrapper over the top of the Int
are almost always negligible on the
JVM. Particularly for workloads like web application backends, we should just ignore this cost.
To the first order, then, we can think of Scala’s value classes as allowing us to create compile-time checks without any runtime cost.
A slightly-less-contrived example
The above example with CustomerId
and NumOrders
is a bit contrived. I’ll give a real example where I’ve used this
technique to avoid a bug.
Consider this tab-separated file published by the Australian Electoral Commission. It includes information about candidates in the 2019 Australian election. We might model the csv row (for data ingestion), and the candidate data like:
final case class CandidateRow(
state: String,
divisionId: Int,
divisionName: String,
partyAbbreviation: String,
partyName: String,
candidateId: Int,
surname: String,
givenName: String,
)
final case class Candidate(
id: Int,
surname: String,
givenName: String,
)
def rowToCandidate(row: CandidateRow): Candidate =
Candidate(
row.candidateId,
row.givenName,
row.surname,
)
Can you spot the bug in rowToCandidate
? We have mixed up the order of row.givenName
and row.surname
. This kind of
mistake is incredibly easy to make. What if we used strong types (including value classes)?
final case class CandidateRow(
state: State,
divisionId: DivisionId,
divisionName: DivisionName,
partyAbbreviation: PartyAbbreviation,
partyName: PartyName,
candidateId: CandidateId,
surname: CandidateSurname,
givenName: CandidateGivenName,
)
final case class Candidate(
id: CandidaateId,
surname: CandidateSurname,
givenName: CandidateGivenName,
)
def rowToCandidate(row: CandidateRow): Candidate =
Candidate(
row.candidateId,
row.givenName, // 🙅 Doesn't compile
row.surname, // 🙅 Doesn't compile
)
The compiler is now able to identify this bug.
Of course, this requires that we have correctly modelled the fields in CandidateRow
. But this complies with a more
general principle of strongly-typed functional programming: put compiler-enforced constraints on data as soon as it
enters your program (eg from a tsv file), and then rely on the compiler to help find bugs everywhere else.