This is a good summary of the history, but there is one small error.
The sentence "Hoare seems to have independently come up with the idea of sum and product types." is not true.
C.A.R. Hoare first described his proposals in November 1965, in "Record Handling", a year after John McCarthy's proposal of October 1964, "Definition of new data types in ALGOL x".
Hoare and McCarthy were colleagues on the committee designing a successor to ALGOL 60. Both proposals were discussed there, and Hoare explicitly took the concepts of product types and union types (sum types) from McCarthy.
However, instead of McCarthy's "Cartesian" keyword, Hoare used the older keywords "record" and "record class", taken from COBOL 60, because "record class" and "Cartesian" really were the same method of deriving a new type. Hoare's term "record class" was shortened to "class" in SIMULA 67, from which all object-oriented programming languages took "class".
Hoare's proposal was explicitly based on McCarthy's, on COBOL 60, and on the concept of pointers previously introduced in CPL and Euler. As in those two languages, Hoare used the term "references" for pointers. IBM first used the term "pointer" a year later, when it added some of the features Hoare had proposed to PL/I.
To the records (called structures in PL/I), record classes (= product types), unions (sum types) and references (called pointers in PL/I) inherited from his predecessors, Hoare added a few new concepts, e.g. null pointers, enumerations (named finite sets), the "new" operator, and constructors and destructors (though not under those names). Overloaded operators had already been proposed by McCarthy.
The list of papers misses this one from 1970: https://personal.cis.strath.ac.uk/conor.mcbride/FVMcB-PhD.pd...
This is the thesis of Frederick McBride, father of Conor McBride, who does significant work on dependent types [1]. The thesis describes symbolic computation algorithms, applies them to computing derivatives of expressions with simplification, and provides not only the definition and construction of algebraic types but also pattern matching over their values.
[1] https://personal.cis.strath.ac.uk/conor.mcbride/
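For readers who haven't seen it, the kind of program the thesis describes is still the standard showcase for algebraic types. Here is a rough modern sketch in Rust; it is not McBride's notation or algorithm, `Expr`, `deriv` and `simplify` are hypothetical names, and the simplification rules are just a few obvious identities chosen for illustration:

```rust
// Hypothetical expression type: one constructor per kind of expression.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(i64),
    Var, // the single variable x
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

use Expr::*;

// d/dx by pattern matching: one rule per constructor.
fn deriv(e: &Expr) -> Expr {
    match e {
        Num(_) => Num(0),
        Var => Num(1),
        Add(a, b) => Add(Box::new(deriv(a)), Box::new(deriv(b))),
        // Product rule: (a*b)' = a'*b + a*b'
        Mul(a, b) => Add(
            Box::new(Mul(Box::new(deriv(a)), b.clone())),
            Box::new(Mul(a.clone(), Box::new(deriv(b)))),
        ),
    }
}

// Bottom-up simplification using a few algebraic identities.
fn simplify(e: &Expr) -> Expr {
    match e {
        Add(a, b) => match (simplify(a), simplify(b)) {
            (Num(0), x) | (x, Num(0)) => x,           // 0 + x = x + 0 = x
            (Num(m), Num(n)) => Num(m + n),           // fold constants
            (x, y) => Add(Box::new(x), Box::new(y)),
        },
        Mul(a, b) => match (simplify(a), simplify(b)) {
            (Num(0), _) | (_, Num(0)) => Num(0),      // 0 * x = x * 0 = 0
            (Num(1), x) | (x, Num(1)) => x,           // 1 * x = x * 1 = x
            (Num(m), Num(n)) => Num(m * n),           // fold constants
            (x, y) => Mul(Box::new(x), Box::new(y)),
        },
        other => other.clone(),
    }
}
```

For example, differentiating `x * x` and simplifying yields `x + x`, and `3 * x + 5` yields `3`.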
Interesting article, thanks for sharing.
> Niklaus Wirth uses “discriminated union” why Pascal doesn’t have sum types.
In the referenced paper (Wirth, 1975, "An assessment of the programming language PASCAL", ACM), Wirth refers to the "inspect when" statement of Simula 67, which is structurally identical to type discrimination with (exhaustive) pattern matching; so Simula essentially already had something like sum types (unified with inheritance). Wirth implemented the same concept in his later language Oberon (which also supports inheritance and offers a WITH statement similar to Simula's "inspect when"). Pascal variant records are also similar to sum types (even if Wirth didn't use that term): they have an explicit tag field, though Pascal enforces neither tag checking nor exhaustiveness checking. Wirth improved on this in Oberon, which can be used to meet both key guarantees of sum types.
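To make those two guarantees concrete, here is a small sketch in Rust (a modern language with sum types, not Pascal or Oberon; `Shape` and `area` are made-up names for illustration). The tag and payload cannot disagree, because the only way to reach `radius` is through the `Circle` arm, and the compiler rejects a `match` that leaves out a variant. A Pascal variant record, by contrast, lets code read the `radius` field while the tag says the value is a rectangle.

```rust
// Hypothetical sum type: a value is either a circle or a rectangle, never both.
#[derive(Debug)]
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

fn area(s: &Shape) -> f64 {
    // Guarantee 1: fields are only reachable inside the arm for their variant.
    // Guarantee 2: deleting either arm is a compile error (non-exhaustive match).
    match s {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}
```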
Since writing this, I've been informed of some gaps (mostly through email and a Lobsters [1] thread). Some of the main ones:
- McCarthy's "Direct Union" is probably conflating "disjoint union" and "direct sum".
- ML probably got the sum/product names from Dana Scott's work. It's unclear if Scott knew of McCarthy's paper or was inspired by it.
- I called ALGOL-68 a "curious dead end" but that's not true: Dennis Ritchie said that he was inspired by 68 when developing C. Also, 68 had exhaustive pattern matching earlier than ML.
- Hoare cites McCarthy in an earlier version of his record paper [2].
Also I kinda mixed up the words for "tagged unions" and "labeled unions". Hope that didn't confuse anybody!
[1] https://lobste.rs/s/ppm44i/very_early_history_algebraic_data...
[2] https://dl.acm.org/doi/10.5555/1061032.1061041
> Sum types are relatively rare in modern programming languages, outside functional programming and some places like Rust.
Is that true? I guess it depends on what "modern" means. But popular languages less than ~20 years old all seem to have them AFAIK, except Go, I think.
Swift is one I've used and like a lot (Swift's enum).
> ALGOL-68 would later implement both of them but also be a curious dead end in language history; it would have little impact on modern programming languages.
I chuckled at this one because I consider most modern languages to be homomorphic to Algol-68.
To all of Algol-68? Or just to a subset of it?
I think that there was a part of Algol-68 that took over the world. That part has been table stakes for a "real" language ever since. But, if I understand correctly, Algol-68 had some weird corners, too, and I'm not sure that other languages kept those.
Hmm, I always thought of "sum type" as abstract/type theoretic and "tagged/discriminated union" as one possible implementation of the concept for a finite memory model.
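That distinction can be shown directly. The sketch below, in Rust with hypothetical names (`Number`, `Tag`, `Payload`, `TaggedNumber`, `encode`, `decode`), writes the abstract sum type `Number` as an `enum`, then hand-builds one possible implementation: an explicit tag next to a `union` whose fields share storage. A compiler is free to lay out the `enum` this way or more compactly; the point is only that the tagged union is a representation, and its safety has to come from somewhere else.

```rust
// The abstract sum type: Int(i64) + Real(f64).
#[derive(Debug, PartialEq)]
enum Number {
    Int(i64),
    Real(f64),
}

// One hand-written representation: an explicit tag plus overlapping storage.
enum Tag {
    Int,
    Real,
}

union Payload {
    int: i64,
    real: f64,
}

struct TaggedNumber {
    tag: Tag,
    payload: Payload,
}

fn encode(n: Number) -> TaggedNumber {
    match n {
        Number::Int(i) => TaggedNumber { tag: Tag::Int, payload: Payload { int: i } },
        Number::Real(r) => TaggedNumber { tag: Tag::Real, payload: Payload { real: r } },
    }
}

fn decode(t: TaggedNumber) -> Number {
    // Reading a union field is `unsafe`: correctness depends on the tag matching
    // the field that was last written, which the compiler cannot check here.
    unsafe {
        match t.tag {
            Tag::Int => Number::Int(t.payload.int),
            Tag::Real => Number::Real(t.payload.real),
        }
    }
}
```

Round-tripping through `encode` and `decode` gives back the original value, but only because `decode` trusts the tag; nothing stops other code from writing `payload.real` while leaving the tag as `Int`.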
Great article! Many end up talking only about recent history, while this one traces much further back to the origins.
See https://news.ycombinator.com/item?id=45461480
Nice write-up! Ahistorically: if you know some category theory, the ideas map directly to what category theory calls sums and products, and the "category of algebras" is just the special category in which much of the semantics of programming takes place (an abstraction with operations just like addition and multiplication, but over any set, not just the numbers). Hence types are "algebraic" products and sums.
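One concrete way to see why the names fit: for finite types, the number of values of a sum is the sum of the counts, and the number of values of a product is the product of the counts. A quick Rust check, using made-up types (`Color` with 3 values, the built-in `bool` with 2, and a hypothetical sum type `BoolOrColor`):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Green,
    Blue,
}

const BOOLS: [bool; 2] = [false, true];
const COLORS: [Color; 3] = [Color::Red, Color::Green, Color::Blue];

// Hypothetical sum type: a value is EITHER a bool OR a Color.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BoolOrColor {
    B(bool),
    C(Color),
}

// Every inhabitant of the product type (bool, Color): 2 * 3 of them.
fn product_values() -> Vec<(bool, Color)> {
    let mut out = Vec::new();
    for &b in BOOLS.iter() {
        for &c in COLORS.iter() {
            out.push((b, c));
        }
    }
    out
}

// Every inhabitant of the sum type BoolOrColor: 2 + 3 of them.
fn sum_values() -> Vec<BoolOrColor> {
    let mut out: Vec<BoolOrColor> = BOOLS.iter().map(|&b| BoolOrColor::B(b)).collect();
    out.extend(COLORS.iter().map(|&c| BoolOrColor::C(c)));
    out
}
```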