Timothy McCarthy

One of the most valuable techniques I’ve learned from strongly-typed functional programming is the use of value classes. By wrapping common types in specialised case classes, we can improve semantic clarity and leverage compile-time checks to avoid bugs.

Different types of Int

Imagine a domain where we have a customer ID, and a number of orders. These are both integral values, so we can use Int to model them.

val customerId: Int = 823826
val numOrders: Int = 5

But we can sum two Int values, so why not add customerId to numOrders?

val _ = customerId + numOrders // 🤯 Nonsense!

The compiler lets us add these values because they are both of type Int. But this operation is nonsense: a number of orders added to a customer ID is meaningless. I tend to think of this as being dimensionally inconsistent, like adding a mass in kg to a length in metres.

Wrapping Int in a new, dedicated type allows us to avoid this problem:

final case class CustomerId(asInt: Int) extends AnyVal
final case class NumOrders(asInt: Int)  extends AnyVal

val customerId: CustomerId = CustomerId(823826)
val numOrders: NumOrders   = NumOrders(5)

val _ = customerId + numOrders // 🙅 Doesn't compile

In this example, CustomerId and NumOrders are “value classes”.
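
When arithmetic genuinely is intended, we unwrap the values explicitly, which makes the intent visible at the call site. A small sketch (`totalOrders` is an illustrative helper, not from the example above):

```scala
final case class CustomerId(asInt: Int) extends AnyVal
final case class NumOrders(asInt: Int)  extends AnyVal

// Intentional arithmetic is still possible, but now it's explicit:
// we unwrap, add the underlying Ints, and re-wrap.
def totalOrders(a: NumOrders, b: NumOrders): NumOrders =
  NumOrders(a.asInt + b.asInt)

val combined: NumOrders = totalOrders(NumOrders(2), NumOrders(3))
// combined == NumOrders(5)
```

The small ceremony of calling `.asInt` is the point: the compiler forces us to say which underlying values we mean to combine.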

Scala’s imperfect implementation of value classes

Ideally, these value classes would allow us to create compile-time checks while being optimized away at runtime. In other languages, this technique is referred to as “newtypes”, and that’s exactly how it works.

Unfortunately, Scala’s value classes generally do come with a runtime cost. This blog post provides an excellent explanation (and criticism) of the tradeoffs that led to this behaviour. It’s worth noting, though, that the performance overhead of the CustomerId wrapper over the underlying Int is almost always negligible on the JVM. For workloads like web application backends in particular, this cost can safely be ignored.

To the first order, then, we can think of Scala’s value classes as allowing us to create compile-time checks without any runtime cost.
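
For reference, the wrapper is avoided when the value class is used directly, but Scala does allocate it in generic contexts, for example when the value is used as a type argument or upcast to `Any`. A sketch:

```scala
final case class CustomerId(asInt: Int) extends AnyVal

// Used directly as a parameter type: no allocation,
// the underlying Int is passed around at runtime.
def describe(id: CustomerId): String = s"customer-${id.asInt}"

// Used as a type argument (List[CustomerId]): the wrapper is boxed.
val ids: List[CustomerId] = List(CustomerId(1), CustomerId(2))

// Upcast to Any: also boxed.
val any: Any = CustomerId(3)
```

Behaviour is identical either way; only the allocation profile differs, and as noted above the difference is rarely worth worrying about.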

A slightly-less-contrived example

The above example with CustomerId and NumOrders is a bit contrived. I’ll give a real example where I’ve used this technique to avoid a bug.

Consider this tab-separated file published by the Australian Electoral Commission. It includes information about candidates in the 2019 Australian election. We might model the TSV row (for data ingestion) and the candidate data like this:

final case class CandidateRow(
  state: String, 
  divisionId: Int, 
  divisionName: String, 
  partyAbbreviation: String, 
  partyName: String, 
  candidateId: Int, 
  surname: String, 
  givenName: String,
)

final case class Candidate(
  id: Int,
  surname: String,
  givenName: String,
)

def rowToCandidate(row: CandidateRow): Candidate = 
  Candidate(
    row.candidateId,
    row.givenName,
    row.surname,
  )

Can you spot the bug in rowToCandidate? We have mixed up the order of row.givenName and row.surname. This kind of mistake is incredibly easy to make. What if we used strong types (including value classes)?
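
This assumes a value class wrapper for each field, along these lines (a sketch; the field names inside each wrapper are my own convention):

```scala
// One dedicated value class per field in the row.
final case class State(asString: String)              extends AnyVal
final case class DivisionId(asInt: Int)               extends AnyVal
final case class DivisionName(asString: String)       extends AnyVal
final case class PartyAbbreviation(asString: String)  extends AnyVal
final case class PartyName(asString: String)          extends AnyVal
final case class CandidateId(asInt: Int)              extends AnyVal
final case class CandidateSurname(asString: String)   extends AnyVal
final case class CandidateGivenName(asString: String) extends AnyVal
```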

final case class CandidateRow(
  state: State, 
  divisionId: DivisionId, 
  divisionName: DivisionName, 
  partyAbbreviation: PartyAbbreviation, 
  partyName: PartyName, 
  candidateId: CandidateId, 
  surname: CandidateSurname, 
  givenName: CandidateGivenName,
)

final case class Candidate(
  id: CandidateId,
  surname: CandidateSurname,
  givenName: CandidateGivenName,
)

def rowToCandidate(row: CandidateRow): Candidate = 
  Candidate(
    row.candidateId,
    row.givenName, // 🙅 Doesn't compile
    row.surname,   // 🙅 Doesn't compile
  )

The compiler is now able to identify this bug.

Of course, this requires that we have correctly modelled the fields in CandidateRow. But this follows a more general principle of strongly-typed functional programming: put compiler-enforced constraints on data as soon as it enters your program (e.g. from a TSV file), and then rely on the compiler to help find bugs everywhere else.
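
As a sketch of what “constraints at the boundary” can look like, here is a parser that wraps raw fields the moment they enter the program. It uses a simplified three-field line rather than the full AEC row format, and `parseCandidate` is an illustrative name:

```scala
final case class CandidateId(asInt: Int)              extends AnyVal
final case class CandidateSurname(asString: String)   extends AnyVal
final case class CandidateGivenName(asString: String) extends AnyVal

final case class Candidate(
  id: CandidateId,
  surname: CandidateSurname,
  givenName: CandidateGivenName,
)

// Parse a tab-separated line straight into the strongly-typed model.
// Past this point, the compiler enforces the field types for us.
def parseCandidate(tsvLine: String): Option[Candidate] =
  tsvLine.split('\t') match {
    case Array(id, surname, givenName) =>
      id.toIntOption.map { i =>
        Candidate(
          CandidateId(i),
          CandidateSurname(surname),
          CandidateGivenName(givenName),
        )
      }
    case _ => None // wrong number of fields
  }
```

Returning `Option` keeps the messy, untrusted parsing confined to this one function: everywhere else in the program only ever sees a well-typed `Candidate`.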