Ok, so I’m going to use this post, Making Illegal States Unrepresentable, and add my own experience to it. For people who don’t know F# (or who don’t want to read the whole post to see what the point is), the idea is that the author is trying to construct a type that is only valid if a contact has at least an e-mail address or a postal address. He ends up with the following type (I’m “inventing” a way to represent this type that’s close to Scala, but easier to read for people who don’t know Scala, Haskell, or F#):

type Contact {
  Name: String
  AND Contact: ContactInfo
}

type ContactInfo {
  EmailOnly: EmailInfo
  OR PostOnly: PostalInfo
  OR EmailAndPost: EmailInfo with PostalInfo
}

// Types EmailInfo and PostalInfo have to be defined also

Then, he uses 13 lines to construct a ContactInfo, and another 12 to update a contact info. He ends up concluding that these complicated types are necessary because the logic is complicated. And that’s where we start to disagree.

Real world data

It really is just a single validation, and a very simple one. Let’s go with something more “real life” here: where I live, health plans need a way to identify a user. You can use the registration number or a document. The document can be your national ID or a foreign one, in which case you have to capture the country, document type, and number. How would it be represented?

type Identification {
  RegistrationNumber: RegistrationNumberInfo
  OR Document: ValidDocument
}

type ValidDocument {
  NationalID: String
  OR Other: ForeignDocument
}

type ForeignDocument {
  DocumentType: Enum[Passport, ID]
  AND Country: String
  AND Number: String
}

But… now things start to become complicated. An ID is only valid if the country is one of the Mercosur countries, so “Country” needs to be an Enum that enumerates ALL countries. To bypass this problem and still “make illegal states unrepresentable”, we need dependent types, so let’s suppose we have a language that can work with dependent types in a non-clunky way:

type ForeignDocument {
  Info: CountryAndType
  AND Number: String
}

type CountryAndType {
  MercosurType: MercosurCountryAndType
  OR NonMercosurType: NonMercosurCountryAndType
}

type MercosurCountryAndType {
  Country: String IN ["BR", "AR", "CO", ....]
  AND Type: Enum[Passport, ID]
}

type NonMercosurCountryAndType {
  Country: String IN NOT ["BR", "AR", "CO", ...]
  // Type is always "Passport" in this case
}

Ok, so how do we create a user’s identification like this? Well, we:
1. Need to check if we have the Registration Number
2. If not, we need to check if we have the Document
3. If we do have the document, we’ll check if the country is one of the Mercosur countries (by hand, because it’s data that came from IO)
4. Then, we have to check if the document type is allowed for that country
5. If so, we convert the document type TO the specific enum type
6. Then, we finally check if the number is present
7. If so, we proceed to create the full Identification.
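To give an idea of how much branching those steps imply, here is a minimal sketch of a hand-written constructor that follows them. It is written in Clojure only to keep one language across the post's examples. The name build-identification, the [:ok ...] / [:error ...] return shape, the tagged vectors, and the three-country set are all assumptions made up for illustration, not something from the original F# post.

```clojure
;; Hypothetical subset: the full list of countries is elided in this post.
(def mercosur-countries #{"br" "ar" "co"})

(defn build-identification
  "Hand-written construction following the steps above.
   Returns [:ok value] or [:error message] (an assumed convention)."
  [{:keys [registration-number document]}]
  (cond
    ;; Step 1: do we have the registration number?
    (some? registration-number)
    [:ok [:registration-number registration-number]]

    ;; Step 2: if not, do we have the document?
    (nil? document)
    [:error "Document or Registration Number is required"]

    :else
    (let [{:keys [country type number]} document
          ;; Step 3: checked by hand, because the data came from IO
          mercosur? (contains? mercosur-countries country)]
      (cond
        ;; Step 4: is the document type allowed for that country?
        (and (not mercosur?) (not= type :passport))
        [:error "When country is not from mercosur, you can only choose passport"]

        ;; Step 6: is the number present?
        (nil? number)
        [:error "Document number is required"]

        ;; Steps 5 and 7: pick the specific variant and build the value
        mercosur?
        [:ok [:document [:mercosur {:country country :type type :number number}]]]

        :else
        [:ok [:document [:non-mercosur {:country country :number number}]]]))))

(build-identification {:registration-number "42"})
(build-identification {:document {:country "us" :type :id :number "1"}})
```

In a typed language each branch would also have to produce a value of the right variant type, which is where most of the extra code goes.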

All this work is for a single attribute of the user. Only one. And then, if a new document is allowed for specific countries, we have to:
1. Add a new type
2. Check all pattern-matches we have to check for that new type
3. Convert every “destructuring” we have in our code to check for this new pattern

The compiler will probably help us with some of these steps. But we’re forgetting that we still need to add types for the “database schema” and for the “JSON schema” that come from the API. We also need to add validation messages, etc…

Clojure

Now, can we do it in Clojure? Using the Malli library:

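;; Aliases used by the snippets in this post
(require '[malli.core :as malli]
         '[malli.transform :as transform])

(def ^:private mercosur-countries
  #{"br" "co" "ar" ...})

(defn check-valid-type
  "True when the country is a Mercosur one, or the document is a passport."
  [{:keys [type country]}]
  (or (contains? mercosur-countries country)
      (= type :passport)))

(defn- at-least-one-key
  "True when the map has a :document or a :registration-number."
  [m]
  (or (contains? m :document)
      (contains? m :registration-number)))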

(def Identification
  [:and
   [:map
    [:registration-number {:optional true} string?]
    [:document {:optional true}
     [:and [:map
            [:number string?]
            [:type [:and keyword? [:enum :passport :id]]]
            [:country string?]]
      [:fn {:error/message "When country is not from mercosur, you can only choose passport"}
       check-valid-type]]]]
   [:fn
    {:error/message "Document or Registration Number is required"}
    at-least-one-key]])

Ok, so this allows us to represent invalid data, but that data won’t validate, and the schema is way easier to construct. We also get coercers, validation, and error messages for free, if we allow “illegal states” to sometimes appear in our code. A simple:

(def decode
  (malli/decoder Identification
    (transform/transformer
     transform/json-transformer
     transform/strip-extra-keys-transformer)))

defines a decoder for this data structure that will remove extra keys and coerce the elements to their right types from JSON, so that your data structure is ready to be sent to validation. If it does not validate, you’ll get messages (in any language, since Malli has support for i18n) telling you what’s wrong.
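As a rough sketch of the whole round trip (decode, validate, explain), the snippet below repeats the schema so that it runs on its own. The three-country set stands in for the elided list and is an assumption, as are the sample input and the me alias for malli.error. It uses malli.core/validate, malli.core/explain, and malli.error/humanize.

```clojure
(require '[malli.core :as malli]
         '[malli.error :as me]
         '[malli.transform :as transform])

;; Hypothetical subset: the full list of countries is elided in this post.
(def mercosur-countries #{"br" "ar" "co"})

(defn check-valid-type [{:keys [type country]}]
  (or (contains? mercosur-countries country)
      (= type :passport)))

(defn at-least-one-key [m]
  (or (contains? m :document)
      (contains? m :registration-number)))

(def Identification
  [:and
   [:map
    [:registration-number {:optional true} string?]
    [:document {:optional true}
     [:and
      [:map
       [:number string?]
       [:type [:and keyword? [:enum :passport :id]]]
       [:country string?]]
      [:fn {:error/message "When country is not from mercosur, you can only choose passport"}
       check-valid-type]]]]
   [:fn {:error/message "Document or Registration Number is required"}
    at-least-one-key]])

(def decode
  (malli/decoder Identification
    (transform/transformer
     transform/json-transformer
     transform/strip-extra-keys-transformer)))

;; Made-up input, as it might look after parsing JSON with keyword keys
(def raw {:document {:number "123" :type "id" :country "us"}
          :unexpected "not part of the schema"})

;; The decoder coerces "id" to :id and drops :unexpected
(def decoded (decode raw))

;; An "illegal state" is representable here, but it does not validate
(malli/validate Identification decoded)

;; Human-readable errors, built from the :error/message entries in the schema
(me/humanize (malli/explain Identification decoded))
```

The invalid map exists for a moment between decoding and validation, which is exactly the trade-off being made here.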

Is it worth it?

The part of the original post I can’t agree with at all is: is it worth it? Because writing software is a process: you first write a feature, then that feature will probably have bugs, and you’ll issue fixes. But software also evolves, and complicated types are harder to extend.

Let’s suppose that, on the original data from the F# post, you need to add “Telephone” as a valid contact. Now you have:

type ContactInfo {
  EmailOnly: EmailInfo
  OR PostOnly: PostalInfo
  OR TelephoneOnly: TelephoneInfo
  OR EmailAndPost: EmailInfo with PostalInfo
  OR TelephoneAndPost: TelephoneInfo with PostalInfo
  OR TelephoneAndEmail: TelephoneInfo with EmailInfo
  OR TelephoneAndEmailAndPost: TelephoneInfo with EmailInfo with PostalInfo
}

Now, every pattern match needs to handle 4 more cases. A simple getEmail will have to pattern-match on the 4 cases that contain an e-mail, just to get the e-mail. And then, if someone wants to add another kind of ContactInfo, this approach becomes simply impossible because of the number of combinations: with n kinds of contact there are 2^n − 1 non-empty combinations, so 15 cases for four kinds.
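To make the growth concrete, here is a small sketch that enumerates the non-empty combinations a “one case per combination” type would need. The helper names non-empty-subsets and has-contact?, and the fourth kind :chat, are invented for the example.

```clojure
(defn non-empty-subsets
  "All non-empty combinations of the given contact kinds, as sets."
  [kinds]
  (let [kinds (vec kinds)]
    ;; Each mask from 1 to 2^n - 1 selects one combination of kinds
    (for [mask (range 1 (bit-shift-left 1 (count kinds)))]
      (set (keep-indexed (fn [i k] (when (bit-test mask i) k)) kinds)))))

(count (non-empty-subsets [:email :post]))                  ; 3 cases
(count (non-empty-subsets [:email :post :telephone]))       ; 7 cases
(count (non-empty-subsets [:email :post :telephone :chat])) ; 15 cases

;; With a plain map, the "at least one" rule stays a one-liner however
;; many kinds exist, and reading the e-mail never needs a pattern match.
(defn has-contact? [contact kinds]
  (boolean (some #(contains? contact %) kinds)))

(has-contact? {:telephone "555-0100"} [:email :post :telephone])
(:email {:email "a@example.com" :telephone "555-0100"})
```

The validation rule grows by one keyword per new kind of contact, while the sum type doubles in size.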

There’s also another issue: we are not in “typed-land” all the time. We receive JSON or YAML, we interop with databases that have different structures and types, and we need to define “types” for all these operations, plus lots of conversion rules. How far should we go? Should we represent SQL, or JSON, in a way that invalid states are also unrepresentable? So, sometimes you’ll have to limit how far you want to go, and that’s one of the issues. If one part of the software is “correct by default” because you can’t even represent an invalid state, and another is “incorrect”, it’s hard to keep track in your head of which part of the software you have to worry about.

And to be honest – I’m not yet sure that the “first part” is worth it.

PS: I’m not really a “static type expert” so if I made any mistake, feel free to correct me and I’ll try to fix the post or even do another post where I re-visit this subject!

