Static Typing – the dangers of incomplete info

Ok, so I’m going to use this post, Making Illegal States Unrepresentable, and add my own experience to it. For people that don’t know F# (or that don’t want to read the whole post to get the point), the idea is that the author is trying to construct a type that is only valid if a contact has at least an e-mail address or a postal address. He ends up with the following type (I’m “inventing” a notation that’s close to Scala, but easier to read for people that don’t know Scala, Haskell, or F#):

type Contact {
  Name: String
  AND Contact: ContactInfo
}

type ContactInfo {
  EmailOnly: EmailInfo
  OR PostOnly: PostalInfo
  OR EmailAndPost: EmailInfo with PostalInfo
}

// Types EmailInfo and PostalInfo have to be defined also

Then, he uses 13 lines to construct a ContactInfo, and another 12 to update a contact info. He ends up concluding that these complicated types are necessary because the logic is complicated. And that’s where we start to disagree.

Real world data

The original example really is just a single validation, and a very simple one. Let’s go with something more “real life” here: where I live, health plans need a way to identify a user. You can use the registration number or a document. The document can be your national ID, or a foreign one, for which you have to capture the country, document type, and number. How would it be represented?

type Identification {
  RegistrationNumber: RegistrationNumberInfo
  OR Document: ValidDocument
}

type ValidDocument {
  NationalID: String
  OR Other: ForeignDocument
}

type ForeignDocument {
  DocumentType: Enum[Passport, ID]
  AND Country: String
  AND Number: String
}

But now things start to become complicated. An ID is only valid if the country is one of the Mercosur countries, so “Country” needs to be an Enum, and it needs to enumerate ALL countries. To bypass this problem and still “make illegal states unrepresentable”, we need dependent types, so let’s suppose we have a language that can work with dependent types in a non-clunky way:

type ForeignDocument {
  Info: CountryAndType
  AND Number: String
}

type CountryAndType {
  MercosurType: MercosurCountryAndType
  OR NonMercosurType: NonMercosurCountryAndType
}

type MercosurCountryAndType {
  Country: String IN ["BR", "AR", "CO", ....]
  AND Type: Enum[Passport, ID]
}

type NonMercosurCountryAndType {
  Country: String IN NOT ["BR", "AR", "CO", ...]
  // Type is always "Passport" in this case
}

Ok, so how do we create a user info like this? Well, we:
1. Need to check if we have the Registration Number
2. If not, we need to check if we have the Document
3. If we do have the document, we’ll check if the country is one of the Mercosur countries (by hand, because it’s data that came from IO)
4. Then, we have to check if the document type is allowed for that country
5. If so, we convert the document type TO the specific enum type
6. Then, we finally check if the number is present
7. If so, we proceed to create the full Identification.
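To give an idea of how much checking that is, the steps above can be sketched in plain Clojure, with keyword-tagged vectors standing in for the typed variants. This is only an illustration under my own assumptions: the names `parse-identification` and `parse-foreign-document`, the shape of the raw map, and the error messages are all made up, and the country set only has the codes named in this post (the real list is longer).

```clojure
;; Partial list: only the countries named in the post, for illustration.
(def mercosur-countries #{"BR" "AR" "CO"})

(def document-types #{"passport" "id"})

(defn parse-foreign-document
  "Steps 3 to 6: returns [:ok document] or [:error message]."
  [{:keys [country type number]}]
  (let [mercosur? (contains? mercosur-countries country)] ; step 3
    (cond
      (not (string? country))
      [:error "country is required"]

      (not (contains? document-types type))
      [:error "unknown document type"]

      ;; step 4: outside Mercosur only a passport is accepted
      (and (not mercosur?) (not= type "passport"))
      [:error "only passport is allowed for this country"]

      ;; step 6: the number has to be present
      (not (string? number))
      [:error "number is required"]

      ;; step 5: convert the raw string into the variant-specific value
      mercosur?
      [:ok [:mercosur {:country country :type (keyword type) :number number}]]

      :else
      [:ok [:non-mercosur {:country country :number number}]])))

(defn parse-identification
  "Steps 1, 2 and 7: returns [:ok identification] or [:error message]."
  [{:keys [registration-number document]}]
  (cond
    ;; step 1: the registration number wins if present
    (string? registration-number)
    [:ok [:registration-number registration-number]]

    (string? (:national-id document))
    [:ok [:document [:national-id (:national-id document)]]]

    ;; step 2: otherwise we need a document
    (map? document)
    (let [[status result] (parse-foreign-document document)]
      (if (= status :ok)
        [:ok [:document [:foreign result]]] ; step 7
        [status result]))

    :else
    [:error "document or registration number is required"]))
```

Even in this untyped form, every branch maps to one of the variants of the type, so a typed version would need at least the same checks plus the conversions between the variants.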

All this work is for a single attribute of the user. Only one. And then, if a new document type is allowed for specific countries, we have to:
1. Add a new type
2. Check every pattern match we have, so that it handles the new type
3. Convert every “destructuring” we have in our code to check for this new pattern

The compiler will probably help us with some of these steps. But we’re forgetting that we still need to add types for the “database schema” and for the “JSON schema” that comes from the API. We also need to add validation messages, etc.

Clojure

Now, can we do it in Clojure? Yes, using the Malli library:

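;; Country codes that are part of Mercosur (list elided)
(def ^:private mercosur-countries
  #{"br" "co" "ar" ...})

;; Any document type is fine for Mercosur countries; otherwise only passports
(defn check-valid-type [{:keys [type country]}]
  (if (mercosur-countries country)
    true
    (= type :passport)))

;; At least one of the two forms of identification must be present
(defn- at-least-one-key [m]
  (or (contains? m :document)
      (contains? m :registration-number)))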

(def Identification
  [:and
   [:map
    [:registration-number {:optional true} string?]
    [:document {:optional true}
     [:and [:map
            [:number string?]
            [:type [:and keyword? [:enum :passport :id]]]
            [:country string?]]
      [:fn {:error/message "When country is not from mercosur, you can only choose passport"}
       check-valid-type]]]]
   [:fn
    {:error/message "Document or Registration Number is required"}
    at-least-one-key]])

Ok, so this allows us to represent invalid data. But invalid data will not pass validation, and the whole thing is way easier to construct. We also get coercers, validation, and error messages for free, if we allow “illegal states” to sometimes appear in our code. A simple:

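;; Assuming these aliases, which match the names used below
(require '[malli.core :as malli]
         '[malli.transform :as transform])

(def decode
  (malli/decoder Identification
    (transform/transformer
     transform/json-transformer
     transform/strip-extra-keys-transformer)))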

Defines a decoder for this data structure that removes extra keys and coerces the elements to their right types from JSON, so the decoded data is ready to be sent to validation. If it does not validate, you’ll get messages (in any language, since Malli has support for i18n) telling you what’s wrong.
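As a rough sketch of how decoding and validation fit together, here is a reduced version that only covers the document part. It assumes Malli is on the classpath and that the JSON was already parsed into a map with keyword keys. The names `Document`, `decode-document`, and `parse-document` are mine, just for this example, and the country set is partial.

```clojure
(require '[malli.core :as malli]
         '[malli.error :as error]
         '[malli.transform :as transform])

;; Partial list, for illustration only.
(def mercosur-countries #{"br" "co" "ar"})

(def Document
  [:and
   [:map
    [:number string?]
    [:type [:and keyword? [:enum :passport :id]]]
    [:country string?]]
   [:fn {:error/message "When country is not from mercosur, you can only choose passport"}
    (fn [{:keys [type country]}]
      (or (contains? mercosur-countries country)
          (= type :passport)))]])

(def decode-document
  (malli/decoder Document
    (transform/transformer
     transform/json-transformer
     transform/strip-extra-keys-transformer)))

(defn parse-document
  "Decodes `raw`, then returns {:ok value} or {:errors humanized-errors}."
  [raw]
  (let [value (decode-document raw)]
    (if (malli/validate Document value)
      {:ok value}
      {:errors (error/humanize (malli/explain Document value))})))

;; Per the description above: :unknown is stripped and :type is coerced
;; from the string "passport" to the keyword :passport.
(parse-document {:number "42" :type "passport" :country "us" :unknown "x"})

;; Fails validation: an :id document for a country outside the set.
(parse-document {:number "42" :type "id" :country "us"})
```

The point is that the “illegal” value does exist for a moment (between decoding and validating), but nothing past `parse-document` has to deal with it.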

Is it worth it?

The part of the original post I can’t agree with at all is: is it worth it? Writing software is a process: you first write a feature, then that feature will probably have bugs, and you’ll issue fixes. But software also evolves, and complicated types are harder to extend.

Let’s suppose that, in the original data from the F# post, you need to add “Telephone” as a valid contact. Now you have:

type ContactInfo {
  EmailOnly: EmailInfo
  OR PostOnly: PostalInfo
  OR TelephoneOnly: TelephoneInfo
  OR EmailAndPost: EmailInfo with PostalInfo
  OR TelephoneAndPost: TelephoneInfo with PostalInfo
  OR TelephoneAndEmail: TelephoneInfo with EmailInfo
  OR TelephoneAndEmailAndPost: TelephoneInfo with EmailInfo with PostalInfo
}

Now, every pattern match needs to add 4 more cases. A simple getEmail has to pattern-match on the four cases that carry an e-mail, just to get the e-mail. And then, if someone wants to add another kind of ContactInfo, this approach becomes simply impossible because of the number of combinations.
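The growth is easy to check: with n kinds of contact and the rule “at least one”, there is one case per non-empty subset, which means 2^n - 1 cases. A small sketch in plain Clojure that enumerates them (the function names are mine, and `:chat` is just a made-up fourth kind of contact):

```clojure
(defn non-empty-subsets
  "All non-empty subsets of `kinds`, each one as a set."
  [kinds]
  (->> (reduce (fn [subsets kind]
                 ;; every existing subset, plus a copy of each with `kind` added
                 (into subsets (map #(conj % kind)) subsets))
               [#{}]
               kinds)
       (remove empty?)))

(defn cases-with
  "Subsets that contain `kind`: the cases a getter for that kind has to match."
  [kind kinds]
  (filter #(contains? % kind) (non-empty-subsets kinds)))

(count (non-empty-subsets [:email :post]))                  ; 3
(count (non-empty-subsets [:email :post :telephone]))       ; 7
(count (cases-with :email [:email :post :telephone]))       ; 4
(count (non-empty-subsets [:email :post :telephone :chat])) ; 15
```

So each new kind of contact doubles the number of cases (plus one), and also doubles the number of cases every existing getter has to match.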

There’s also another issue: we are not in “typed-land” all the time. We receive JSON or YAML, we interop with databases that have different structures and types, and we need to define “types” for all these operations, plus lots of conversion rules. How far should we go? Should we represent SQL, or JSON, in a way that invalid states are also unrepresentable? So, sometimes you’ll have to limit how far you want to go, and that’s one of the issues. If one part of the software is “correct by default” because you can’t even represent an invalid state, and another is “incorrect”, it’s hard to keep track in your head of which part of the software you have to worry about.

And to be honest – I’m not yet sure that the “first part” is worth it.

PS: I’m not really a “static type expert” so if I made any mistake, feel free to correct me and I’ll try to fix the post or even do another post where I re-visit this subject!

Categories: Clojure

Tags: Design, functional, typing

2 Comments

Anon · 2021-07-19 at 12:56

F# hobbyist here, I want to address some of your points, but I’m not an expert so take my opinions with a grain of salt.

You mentioned that we are not always in “typed-land” and that is an important consideration. However, wouldn’t it be nice if we could ONLY have parsing errors at the outermost layers where we interact with un-typed data? For example, if we are creating a web API, we only have to validate the incoming request, and as long as that parses into our types, we know we can’t stumble into an unhandled case further down. This is the functional core or onion architecture, where your main domain/business logic is pure and immutable, and only your boundaries have to deal with real-world things like malformed data.

I also want to mention that F# is a very practically-minded language. If you don’t like the ceremony of defining an in-depth domain model with types, you don’t have to. You can pick and choose where it makes sense. In my opinion your Clojure example is comparing apples to oranges. It does something similar to what the F# version would do, but not equivalent. In F# we can choose to have no type safety whatsoever with the dynamic keyword, or full type safety like in the “designing with types” example, or anything in between.

Getting to the crux of your argument though, which seems to be whether or not designing with types (or type safety in general) is worth it… As with most things in life, I would say that it depends on a combination of many factors. For example, which bugs are likely to be introduced by a lack of types, what are the consequences of having those bugs, how easy is it to add strict types to prevent those bugs, etc. so it’s hard to give a generalized answer. If I were to try however, I would say that in general I feel like I get quite a lot of type safety for “free-ish” in F#, because of features like type inference. For example adding an “alias” to int called ProductNumber is a one-liner and ensures that I can never try to look up a product based on anything but a ProductNumber. Conversely in C# I have had to debug countless bugs where the wrong “type” of int (or guid) was passed into a function, so the cost of not using more fine-grained types than the basic built-in ones like int, string, guid, etc. is quite high.

Oh and as for your last ContactInfo example, it would probably be better represented as a list of (Email OR Post OR Telephone), and using a function to construct it similarly to “contactFromEmail” in the linked blog post, to ensure that it always has at least one entry. Or, depending on your exact domain, Email could be required but Post and Telephone are optional.

Maurício Szabo · 2021-08-09 at 18:11

Right, taking the same grain of salt here, because I don’t know too much of F# 🙂

I think your opinions and concerns here are completely valid, and I agree with all of them. To be honest, I never found a static typing system that I felt comfortable working with (and I probably wouldn’t want to meddle with .NET languages for multiple reasons; one, for example, is the constant attempt to push telemetry down my throat).

But the post was not trying to compare both languages, and I’m sorry if it felt this way. It was meant as a counter-argument to “type-based programming”, a thing that keeps growing in functional programming languages (Haskell, Elm, and now F# have it). It wasn’t meant to argue that type safety in general is bad, but to show how impractical that growing “I’ll make my whole program type-safe in a way that I can be sure that if it compiles, it’s bug-free” argument is.

As for the suggestion: is it possible, at the type level, to be sure that a list has at least one entry and that it has no duplicates by “type” (for example, two different contacts of type Email)? Because that’s what I’m arguing in the post: that making everything type-safe to the point I can’t even construct an invalid type is not practical…

Published by Maurício Szabo on 2021-02-02
