Names matter. That’s one of the things I learned from working on software for multiple years. When you have something called “integration”, for example, that means different things to different people, that’s a recipe for disaster.
That’s why I get quite triggered when people use Untyped / Typed, Weak / Strong typed, or Uni / Multi typed to describe programming languages: these terms either mean different things to different people or are simply wrong. So let’s dive into it:
Untyped languages, or languages without types, are quite rare. Assembly is the most interesting example (there are literally no types in Assembly: everything is a memory address, byte, word, etc.). Some virtual machines’ bytecodes are also untyped. Apart from these cases, most languages are typed. Ruby, for example, has types, and you can dispatch different behavior depending on the type of the receiver (because it’s a class-based object-oriented language). Clojure is also typed, because you can do type-based polymorphism with protocols, using defprotocol and defrecord for example. You can also use deftype, and if there’s a specific command to define types, how can we call it “untyped”?
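To make “dispatching on the type” concrete, here is a minimal sketch in Python (my choice for illustration; the examples above are about Ruby and Clojure). It uses functools.singledispatch from the standard library, and the function name describe is hypothetical:

```python
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback used when no more specific type is registered.
    return "something else"


@describe.register
def _(value: int):
    # Chosen when the argument is an int.
    return "an integer"


@describe.register
def _(value: str):
    # Chosen when the argument is a str.
    return "a string"


print(describe(42))
print(describe("hello"))
print(describe(1.5))
```

The language picks the implementation by looking at the run-time type of the argument, which would be impossible if there were no types at all.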
Weak / Strong typed is a term that’s hard to define. Some people say a language is weak typed when it has pointers, because you can bypass typing altogether; others, when the compiler erases typing at run-time; some say it’s when the language tries to coerce and implicitly convert from one type to another, to try to figure out what you want…
Now, a language that allows you to bypass typing because it has pointers is called “memory unsafe”; if the compiler erases typing information, that is called “type erasure”; coercion, of course, is called “coercion”, and it can be “automatic” or “manual” (for example, most languages coerce integers to decimals at some point). Even the ability to convert between types is controversial: in Ruby and Scala, operators are also methods of the class (and, in Ruby, you can rewrite methods on classes that already exist, so automatic conversion is a user-defined feature). So, please use the right term.
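A small sketch of these terms, written in Python as an illustration (Ruby and Scala are the languages named above; Python also treats operators as methods). The class Celsius and all variable names are hypothetical:

```python
# Automatic coercion: the int is widened to a float without being asked.
automatic = 1 + 2.5

# Python does not coerce between int and str automatically: TypeError.
try:
    1 + "2"
    refused = False
except TypeError:
    refused = True

# Manual coercion: the programmer converts explicitly.
manual = 1 + int("2")


class Celsius:
    """Hypothetical class: '+' is the __add__ method, so conversion is user-defined."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __add__(self, other):
        # We decide ourselves that plain numbers are converted to Celsius.
        if isinstance(other, (int, float)):
            other = Celsius(other)
        return Celsius(self.degrees + other.degrees)


total = Celsius(20) + 1.5
print(automatic, refused, manual, total.degrees)
```

The same language shows automatic coercion, refused coercion, manual coercion, and user-defined conversion, which is why a single “weak” or “strong” label says so little.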
Unityped means a language that has only one type. I believe only some markup languages fall into this classification. Calling a language that has multiple types unityped because someone imagined that all these types can somehow collapse into a single one is, at minimum, strange; and also, again, we do have a term for languages like Javascript, Ruby, Python, Smalltalk, and Clojure.
Don’t be ambiguous
Languages that check types before the program runs are called static-typed languages. Others do not check types ahead of time, and these are called dynamic-typed languages. Now, some dynamic-typed languages can also be duck-typed: that is, you can expect to be able to call a specific function / method / behaviour on whatever you receive, and have the language handle it for you. For example, in Ruby you can call .empty? on multiple objects that have no relation between them, and they will answer true if they are empty. Some dynamic-typed languages prefer to be more explicit, and do not use duck-typing (for example, in Common Lisp you use nth to get an element from a list, and aref to get one from a vector; in Clojure, both use nth).
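The Ruby .empty? idea can be sketched in Python, which is also duck-typed. This is only an illustration under my own assumptions: the helper is_empty and the class Inbox are hypothetical, and Python spells the behaviour as len() rather than .empty?:

```python
class Inbox:
    """Hypothetical class, unrelated to list, str or dict."""

    def __init__(self, messages):
        self.messages = messages

    def __len__(self):
        # Answering len() is all it takes to "quack" like a container.
        return len(self.messages)


def is_empty(thing):
    # No type check: anything that answers len() is accepted.
    return len(thing) == 0


print(is_empty([]))
print(is_empty("text"))
print(is_empty({}))
print(is_empty(Inbox([])))
```

None of these objects share a meaningful common ancestor; they only share the behaviour the function asks for.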
Dynamic-typed languages can be compiled or interpreted; another misconception is that they are always interpreted. Clojure, Elixir, and Erlang are all compiled languages. Some others compile to bytecode but do not emit a file in bytecode format, and some have an implementation that can be compiled to bytecode, like TruffleRuby for example. Some of these languages check things before you run the program (Clojure checks whether variables exist, and ClojureScript checks even more, like function arities at compile time), and some do not (you can have an undeclared variable in Ruby code, for example). Racket is an interesting example: it can be static or dynamic typed, and it can be interpreted or compiled.
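Python (CPython) is another example of both points: source is compiled to bytecode before it runs, and an undeclared variable is not reported at that stage. A minimal sketch, where the function broken and the name undeclared_name are made up for the example:

```python
source = "def broken():\n    return undeclared_name + 1\n"

# Compilation to bytecode succeeds even though undeclared_name does not exist.
code = compile(source, "<example>", "exec")
compiled_to_bytes = isinstance(code.co_code, bytes)

# Running the compiled module only defines the function.
namespace = {}
exec(code, namespace)

# The missing variable is noticed only when the function is called.
try:
    namespace["broken"]()
    failed_at_run_time = False
except NameError:
    failed_at_run_time = True

print(compiled_to_bytes, failed_at_run_time)
```

So “compiled” and “checks things before running” are separate properties, and neither one tells you whether a language is static or dynamic typed.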
There’s also gradual typing: some languages allow you to define type information for only some parts of your program. How the “dynamic” and “static” parts of the program interact depends on the language, with Racket again being the most notable example I’ve seen.
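Python’s optional type hints are one way to see how the two parts can interact (an illustration of my own, not the Racket approach mentioned above). In this sketch the function names are hypothetical; the annotations are available to external checkers, but the interpreter itself does not enforce them:

```python
def double(n: int) -> int:
    # Annotated: a static checker can verify calls to this function.
    return n * 2


def untyped_double(n):
    # Not annotated: this part of the program stays fully dynamic.
    return n * 2


# The interpreter ignores the annotation, so a str is accepted at run time.
result = double("ab")

# The type information is still there for tools that want to read it.
hints = double.__annotations__
print(result, hints)
```

A separate type checker would be expected to reject the call double("ab"), while the running program happily accepts it; other gradually typed languages make different choices at this boundary.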
Finally, there’s dependent typing: a new kind of static-typed language that uses not only the static typing information, but also additional data to further restrict what the program accepts. These languages allow you to encode things like “this function receives one integer, and returns another integer that’s lower than the input”. The most notable languages are Coq, Idris, and Formality.
But you understood what I meant…
Yes, people can probably understand, but it can be confusing. For a while, for example, the Ruby community used “strong typed” to describe Ruby, and “weak typed” to describe Java. Also, there are moments when someone uses “weak” and “un-*” as words to defend static-typed languages, whether that is their intention or not. Language and human psychology are weird: we are conditioned to believe that “strong” is better than “weak”, and that “having something” is better than “not having”.
Also, people come from diverse backgrounds: some are still learning English, so things can become confusing really fast. If one person says that “Clojure is a dynamic-typed language” and another that it’s “weak typed”, a learner probably won’t understand that both authors meant the same thing (and again, it’s common for people to say that Ruby is “strong typed”, so this same person will understand even less: if Clojure offers more guarantees than Ruby, why is it weaker?).
So, yes, “Javascript is a dynamic-typed language with automatic and implicit coercion” is a mouthful, but it’s better than using a misleading term and spending the next 20 minutes discussing whether it’s “indeed weak-typed or not”…