Data Structures Are Antithetical to Functional Programming
Functional programmers are incredibly lazy.
More precisely, we defer commitment as late as possible. In extreme examples like Haskell, we defer the computation of every part of every expression until the last possible moment.
Even in less extreme cases, however, we push effects (such as input/output) to the edges of our program. With Monad Transformers Library (MTL) or FTL, we defer committing to specific effect types until our application’s main function.
This commitment to non-commitment tends to have several benefits:
- It improves comprehension, because low-level portions of the code needn’t be concerned with unnecessary and distracting details;
- It improves reusability, since higher-level code has more control over decisions;
- Sometimes, it even improves performance (for example, algorithm fusion under Haskell’s non-strict evaluation model).
Sadly, there’s something about even the most functional code that is inherently not lazy. Instead, almost all functional code eagerly commits to more than it needs to, obfuscating intent and decreasing reusability.
I’m referring to the tendency of functional code to use hard-coded data structures.
Devil in the Data
Let’s take the following code example, which tries to find an employee’s manager or returns an error message:
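A minimal sketch of such a function might look like the following (the Employee type and its fields are assumed here purely for illustration):

```scala
// Hypothetical Employee type, assumed for illustration.
final case class Employee(id: Int, managerId: Int, name: String)

// Finds e's manager in es, committing to a hard-coded Either for the result.
def findManager(es: List[Employee], e: Employee): Either[String, Employee] =
  es.find(_.id == e.managerId) match {
    case Some(m) => Right(m)
    case None    => Left("Could not find manager of " + e.name)
  }
```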
Notice how the code commits to returning the result in a hard-coded Either data structure.
This is far more specific than the code actually requires. All the function really needs is the ability to construct error and success cases, without the ability to know anything about the data structure or to deconstruct it.
The code is prematurely committing to a concrete data structure, which removes the choice from all higher-level code (you don’t have the choice to return the error in an IO or Future monad, for example), making the implementation both unnecessarily complex and more difficult to reuse.
In an ideal world, the findManager function would be polymorphic in the type of data structure it returned. Further, the function would not require the ability to both construct and deconstruct error/success values, since it depends only on construction.
In an ideal world, our code would commit to no data structures.
Constructing Polymorphic Data
To purge the blight of concrete data structures from our functional programs, our first step is going to be to separate the capability of construction from the capability of deconstruction.
Using an MTL-like approach, we’re going to describe the capability of constructing an Either-like structure:
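One way to sketch these capabilities is as a pair of type classes, one per constructor; the names LeftConstruct and RightConstruct are invented here for illustration:

```scala
// Capability to build an Either-like value from a "left" (error) value.
trait LeftConstruct[E[_, _]] {
  def left[A, B](a: A): E[A, B]
}

// Capability to build an Either-like value from a "right" (success) value.
trait RightConstruct[E[_, _]] {
  def right[A, B](b: B): E[A, B]
}
```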
These classes promise the ability to construct an Either-like structure from a left or a right value. However, the structure they construct could be anything, including a sum type with many more terms than just right and left, or even a Monad with an error case and a success case.
With this type class, we can now make the original code polymorphic in the type of data structure it returns:
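A sketch of the polymorphic version, reusing the hypothetical Employee type and construction capabilities described above:

```scala
// Illustrative definitions, repeated so the snippet stands alone.
final case class Employee(id: Int, managerId: Int, name: String)

trait LeftConstruct[E[_, _]]  { def left[A, B](a: A): E[A, B] }
trait RightConstruct[E[_, _]] { def right[A, B](b: B): E[A, B] }

// findManager no longer names a concrete result type: any E with the
// required construction capabilities will do.
def findManager[E[_, _]](es: List[Employee], e: Employee)(
    implicit L: LeftConstruct[E], R: RightConstruct[E]): E[String, Employee] =
  es.find(_.id == e.managerId) match {
    case Some(m) => R.right(m)
    case None    => L.left("Could not find manager of " + e.name)
  }
```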
This was an easy change to make, but what happens when we need not just the ability to construct a data structure, but also the ability to deconstruct it?
The solution to this problem is surprisingly elegant.
Deconstructing Polymorphic Data
The capability of deconstruction can be expressed directly as a catamorphism or fold from the representation of a data structure.
In the case of an Either-like data structure, we can represent this catamorphism with the following type class:
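A hedged sketch of that type class (the name EitherDestruct is invented here):

```scala
// Deconstruction as a catamorphism: given handlers for both cases,
// fold collapses an Either-like value into a single result.
trait EitherDestruct[E[_, _]] {
  def fold[A, B, Z](e: E[A, B])(left: A => Z, right: B => Z): Z
}
```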
Here’s how we could use this type class to construct a leftOrElse combinator, which extracts the left side of an either if it’s there, or else uses a specified default value:
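A possible implementation, assuming the EitherDestruct capability sketched above:

```scala
trait EitherDestruct[E[_, _]] {
  def fold[A, B, Z](e: E[A, B])(left: A => Z, right: B => Z): Z
}

// Returns the left value if present, otherwise the supplied default.
// Requires only the deconstruction capability, nothing more.
def leftOrElse[E[_, _], A, B](e: E[A, B], default: => A)(
    implicit E: EitherDestruct[E]): A =
  E.fold(e)(identity, _ => default)
```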
This code is completely polymorphic in the type of data structure it operates on, requiring only the ability to deconstruct an either-like data structure into its components.
However, it’s hard to look at this code without noticing there’s a lot of boilerplate. The catamorphism of deconstruction is eerily similar to the anamorphism of construction, yet we have two type classes, one for construction, and one for deconstruction.
With a little bit of abstraction, we can completely eliminate all this boilerplate and make generic abstractions for construction and deconstruction.
Generic Constructors & Deconstructors
As a first step, we’ll refactor the constructors and deconstructors for Either as follows:
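One hedged way to perform this refactoring is to reify each case of the structure as a first-class constructor/deconstructor pair (essentially a prism); all names here are illustrative:

```scala
// A single case of some structure S, carrying arguments of type A.
trait CaseOf[S, A] {
  def construct(a: A): S            // build the structure from this case
  def deconstruct(s: S): Option[A]  // Some if this case matches, None otherwise
}

// The Either-like capability is then just a pair of cases:
trait EitherCases[E[_, _]] {
  def leftCase[A, B]: CaseOf[E[A, B], A]
  def rightCase[A, B]: CaseOf[E[A, B], B]
}
```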
These classes let you build and match against either-like structures as follows:
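For instance, assuming the hypothetical CaseOf / EitherCases capabilities, we can build and match without ever naming a concrete type:

```scala
trait CaseOf[S, A] {
  def construct(a: A): S
  def deconstruct(s: S): Option[A]
}
trait EitherCases[E[_, _]] {
  def leftCase[A, B]: CaseOf[E[A, B], A]
  def rightCase[A, B]: CaseOf[E[A, B], B]
}

// Build without naming a concrete type:
def success[E[_, _]](n: Int)(implicit E: EitherCases[E]): E[String, Int] =
  E.rightCase[String, Int].construct(n)

// Match without pattern matching on a concrete type:
def render[E[_, _]](e: E[String, Int])(implicit E: EitherCases[E]): String =
  E.leftCase[String, Int].deconstruct(e)
    .map("error: " + _)
    .orElse(E.rightCase[String, Int].deconstruct(e).map("value: " + _))
    .getOrElse("impossible")
```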
Notice that we can abstract over all constructors of a given arity in straightforward fashion:
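With cases reified as values, one hedged sketch of this abstraction makes a constructor of any arity just a CaseOf whose argument type is a tuple (or Unit):

```scala
trait CaseOf[S, A] {
  def construct(a: A): S
  def deconstruct(s: S): Option[A]
}

// Hypothetical aliases: one family of cases covers every arity.
type Case0[S]       = CaseOf[S, Unit]      // nullary constructor
type Case1[S, A]    = CaseOf[S, A]         // unary constructor
type Case2[S, A, B] = CaseOf[S, (A, B)]    // binary constructor
```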
Here’s a concrete example of instances of these classes for constructing and deconstructing the Either data structure:
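For Scala’s own Either, the instances are direct (again assuming the illustrative CaseOf / EitherCases names):

```scala
trait CaseOf[S, A] {
  def construct(a: A): S
  def deconstruct(s: S): Option[A]
}
trait EitherCases[E[_, _]] {
  def leftCase[A, B]: CaseOf[E[A, B], A]
  def rightCase[A, B]: CaseOf[E[A, B], B]
}

implicit val eitherCases: EitherCases[Either] = new EitherCases[Either] {
  def leftCase[A, B]: CaseOf[Either[A, B], A] = new CaseOf[Either[A, B], A] {
    def construct(a: A): Either[A, B] = Left(a)
    def deconstruct(s: Either[A, B]): Option[A] = s.fold(Some(_), _ => None)
  }
  def rightCase[A, B]: CaseOf[Either[A, B], B] = new CaseOf[Either[A, B], B] {
    def construct(b: B): Either[A, B] = Right(b)
    def deconstruct(s: Either[A, B]): Option[B] = s.fold(_ => None, Some(_))
  }
}
```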
Finally, we can refactor our toy example to use this machinery as follows:
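A sketch of that refactoring, built purely from the generic case machinery (names remain illustrative):

```scala
final case class Employee(id: Int, managerId: Int, name: String)

trait CaseOf[S, A] {
  def construct(a: A): S
  def deconstruct(s: S): Option[A]
}
trait EitherCases[E[_, _]] {
  def leftCase[A, B]: CaseOf[E[A, B], A]
  def rightCase[A, B]: CaseOf[E[A, B], B]
}

// findManager expressed via the generic constructors:
def findManager[E[_, _]](es: List[Employee], e: Employee)(
    implicit E: EitherCases[E]): E[String, Employee] =
  es.find(_.id == e.managerId) match {
    case Some(m) => E.rightCase[String, Employee].construct(m)
    case None    => E.leftCase[String, Employee].construct("Could not find manager of " + e.name)
  }
```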
There you have it—code that’s fully polymorphic in construction and deconstruction. It works with any data structure that can provide the capabilities it needs, and only the capabilities it needs.
Now let’s revisit our original example.
Polymorphic Overload
Our original example, the one that motivated this development, was the findManager function.
If we modified this example to use a polymorphic return value, we’d end up with the following type signature:
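A sketch of that signature, using the hypothetical construction capabilities from earlier (implemented here so the example is complete):

```scala
final case class Employee(id: Int, managerId: Int, name: String)

trait LeftConstruct[E[_, _]]  { def left[A, B](a: A): E[A, B] }
trait RightConstruct[E[_, _]] { def right[A, B](b: B): E[A, B] }

// The return type is now any E with left/right construction capabilities:
def findManager[E[_, _]](es: List[Employee], e: Employee)(
    implicit L: LeftConstruct[E], R: RightConstruct[E]): E[String, Employee] =
  es.find(_.id == e.managerId)
    .fold(L.left[String, Employee]("Could not find manager of " + e.name))(R.right(_))
```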
This function could be used to return any type of structure that is at least as capable as Either’s constructors, including any Monad that has failure and success cases, or other sum types that have more than two terms.
While more flexible, the function is still monomorphic in the types List, Employee, and String. Taken to its logical conclusion, making this function totally polymorphic would yield something like this:
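A hedged sketch of total polymorphism, where every concrete type is replaced by a capability; every name here is invented for illustration:

```scala
trait LeftConstruct[E[_, _]]  { def left[A, B](a: A): E[A, B] }
trait RightConstruct[E[_, _]] { def right[A, B](b: B): E[A, B] }
trait FindIn[F[_]]            { def find[A](fa: F[A])(p: A => Boolean): Option[A] } // replaces List
trait HasIds[P, Id]           { def id(p: P): Id; def managerId(p: P): Id }         // replaces Employee
trait NotFoundError[P, Err]   { def notFound(p: P): Err }                           // replaces String

def findManager[E[_, _], F[_], P, Id, Err](es: F[P], e: P)(implicit
    L: LeftConstruct[E], R: RightConstruct[E],
    F: FindIn[F], P: HasIds[P, Id], Err: NotFoundError[P, Err]): E[Err, P] =
  F.find(es)(p => P.id(p) == P.managerId(e)) match {
    case Some(m) => R.right(m)
    case None    => L.left(Err.notFound(e))
  }
```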
If we change a bunch of names and use Unicode and type alias operators, we can make this horrible soup of letters and symbols a little more readable, possibly close to the following:
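One possible cosmetic cleanup, using context bounds and companion-object summoners to hide the implicit parameter list (still entirely hypothetical names, and fixing the id and error types for brevity):

```scala
trait EitherConstruct[E[_, _]] {
  def left[A, B](a: A): E[A, B]
  def right[A, B](b: B): E[A, B]
}
object EitherConstruct {
  def apply[E[_, _]](implicit E: EitherConstruct[E]): EitherConstruct[E] = E
}
trait Container[F[_]] { def find[A](fa: F[A])(p: A => Boolean): Option[A] }
object Container { def apply[F[_]](implicit F: Container[F]): Container[F] = F }
trait Person[P] { def id(p: P): Long; def managerId(p: P): Long; def name(p: P): String }
object Person { def apply[P](implicit P: Person[P]): Person[P] = P }

def findManager[E[_, _]: EitherConstruct, F[_]: Container, P: Person](es: F[P], e: P): E[String, P] =
  Container[F].find(es)(p => Person[P].id(p) == Person[P].managerId(e)) match {
    case Some(m) => EitherConstruct[E].right(m)
    case None    => EitherConstruct[E].left("Could not find manager of " + Person[P].name(e))
  }
```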
That’s about the best we can get, and let’s be honest: it’s not very good. There’s still a lot of boilerplate, and while all this polymorphism is very powerful, most code won’t need it.
It’s a heavy price to pay for fringe use cases. Too heavy, I’d say, at least with today’s programming languages.
Looking Forward
Ultimately, I’d argue that today’s programming languages are unnaturally obsessed with data. They are wedded to the idea of rigid layouts of bits in memory—for understandably pragmatic or historic reasons.
Yet, the functions we develop in our code don’t usually require bits. Instead, they require capabilities. Some to construct, some to deconstruct. If we expressed these capabilities precisely, we could free our code from having to know too much about the structures it operates on, while at the same time making our code much more reusable for different purposes.
We can model this degree of polymorphism with today’s programming languages, but it’s awfully cumbersome. However, if we were building a next-generation programming language, we could make it difficult or even impossible for code to depend on “layouts of bits”, and instead make it trivial to require only as much structure as each function needs.
For now, though, I recommend you stick to data structures when you must, and judiciously incorporate data polymorphism where it makes sense to do so, which will be cases where the opportunities for reuse vastly outweigh the overhead.