Hindley–Milner type system - Alchetron, the free social encyclopedia

In type theory and functional programming, Hindley–Milner (HM), also known as Damas–Milner or Damas–Hindley–Milner, is a classical type system for the lambda calculus with parametric polymorphism, first described by J. Roger Hindley and later rediscovered by Robin Milner. Luis Damas contributed a close formal analysis and proof of the method in his PhD thesis.

Among HM's more notable properties is completeness and its ability to deduce the most general type of a given program without the need of any type annotations or other hints supplied by the programmer. Algorithm W is a fast algorithm, performing type inference in almost linear time with respect to the size of the source, making it practically usable to type large programs. HM is preferably used for functional languages. It was first implemented as part of the type system of the programming language ML. Since then, HM has been extended in various ways, most notably by type class constraints as used in Haskell.

Introduction

Organizing their original paper, Damas and Milner clearly separated two very different tasks. One is to describe what types an expression can have and another to present an algorithm actually computing a type. Keeping the two aspects separate allows one to focus separately on the logic (i.e. meaning) behind the algorithm, as well as to establish a benchmark for the algorithm's properties.

How expressions and types fit to each other is described by means of a deductive system. Like any proof system, it allows different ways to come to a conclusion and since one and the same expression arguably might have different types, dissimilar conclusions about an expression are possible. Contrary to this, the type inference method itself (Algorithm W) is defined as a deterministic step-by-step procedure, leaving no choice what to do next. Thus clearly, decisions not present in the logic might have been made constructing the algorithm, which demand a closer look and justifications but would perhaps remain non-obvious without the above differentiation.

Syntax

Logic and algorithm share the notions of "expression" and "type", whose form is made precise by the syntax.

The expressions to be typed are exactly those of the lambda calculus, enhanced by a let-expression. These are shown in the adjacent table. For readers unfamiliar with the lambda calculus, here is a brief explanation: The application e 1 e 2 represents applying the function e 1 to the argument e 2 , often written e 1 ( e 2 ) . The abstraction λ x . e represents an anonymous function that maps the input x to the output e . This is also called function literal, common in most contemporary programming languages, and sometimes written as f u n c t i o n ( x ) r e t u r n e e n d . The let expression l e t x = e 1 i n e 2 represents the result of substituting every occurrence of x in e 2 with e 1 .

Types as a whole are split into two groups, called mono- and polytypes.

The reason why the let-expression appears in the syntax is to allow let polymorphism (see below).

Monotypes

Monotypes τ are syntactically represented as terms. A monotype always designates a particular type, in the sense that it is equal only to itself and different from all others.

Examples of monotypes include type constants like i n t or s t r i n g , and parametric types like M a p ( S e t s t r i n g ) i n t . These types are examples of applications of type functions, for example, from the set { M a p 2 , S e t 1 , s t r i n g 0 , i n t 0 } , where the superscript indicates the number of type parameters. The complete set of type functions D is arbitrary in HM, except that it must contain at least → 2 , the type of functions. It is often written in infix notation for convenience. For example, a function mapping integers to strings has type i n t → s t r i n g ; here, the type i n t → s t r i n g is written in infix notation. In prefix notation, it would be ( → ) i n t s t r i n g .

Type variables are monotypes. Standing alone, a type variable α is meant to be as concrete as i n t or β , and clearly different from both. Type variables occurring as monotypes behave as if they were type constants whose identity is unknown. Correspondingly, a function typed α → α only maps values of the particular type α on itself. Such a function can only be applied to values having type α and to no others.

Polytype

Polytypes (or type schemes) are types containing variables bound by one or more for-all quantifiers, e.g. ∀ α . α → α .

A function with polytype ∀ α . α → α can map any value of the same type to itself, and the identity function is a value for this type.

As another example ∀ α . ( S e t α ) → i n t is the type of a function mapping all finite sets to integers. The count of members is a value for this type.

Note that quantifiers can only appear top level, i.e. a type ∀ α . α → ∀ α . α for instance, is excluded by the syntax of types. Note also that monotypes are included in the polytypes, thus a type has the general form ∀ α 1 … ∀ α n . τ , where τ is a monotype.

Free type variables

In a type ∀ α 1 … ∀ α n . τ , the symbol ∀ is the quantifier binding the type variables α i in the monotype τ . The variables α i are called quantified and any occurrence of a quantified type variable in τ is called bound and all unbound type variables in τ are called free. Like in the lambda calculus, the notion of free and bound variables is essential for the understanding of the meaning of types.

This is certainly the hardest part of HM, perhaps because polytypes containing free variables are not represented in programming languages like Haskell. Likewise, one does not have clauses with free variables in Prolog. In particular developers experienced with both languages and actually knowing all the prerequisites of HM, are likely to slip this point. In Haskell for example, all type variables implicitly occur quantified, i.e. a Haskell type a -> a means ∀ α . α → α here. Because a type like α → α , though it may practically occur in a Haskell program, cannot be expressed there, it can easily be confused with its quantified version.

So what function can have a type like e.g. ∀ β . β → α , i.e. a mixture of both bound and free type variables and what could the free type variable α therein mean?

Consider f o o in Example 1, with type annotations in brackets. Its parameter y is not used in the body, but the variable x bound in the outer context of f o o surely is. As a consequence, f o o accepts every value as argument, while returning a value bound outside and with it its type. b a r to the contrary has type ∀ α . ∀ β . α → ( β → α ) , in which all occurring type variables are bound. Evaluating, for instance b a r 1 , results in a function of type ∀ β . β → i n t , perfectly reflecting that foo's monotype α in ∀ β . β → α has been refined by this call.

In this example, the free monotype variable α in foo's type becomes meaningful by being quantified in the outer scope, namely in bar's type. I.e. in context of the example, the same type variable α appears both bound and free in different types. As a consequence, a free type variable cannot be interpreted better than stating it is a monotype without knowing the context. Turning the statement around, in general, a typing is not meaningful without a context.

Context and typing

Consequently, to get the yet disjoint parts of the syntax, expressions and types together meaningfully, a third part, the context is needed. Syntactically, it is a list of pairs x : σ , called assignments or assumptions, stating for each value variable x i therein a type σ i . All three parts combined gives a typing judgment of the form Γ ⊢ e : σ , stating, that under assumptions Γ , the expression e has type σ .

Now having the complete syntax at hand, one can finally make a meaningful statement about the type of f o o in example 1, above, namely x : α ⊢ λ y . x : ∀ β . β → α . Contrary to the above formulations, the monotype variable α no longer appears unbound, i.e. meaningless, but bound in the context as the type of the value variable x . The circumstance whether a type variable is bound or free in the context apparently plays a significant role for a type as part of a typing, so free ( Γ ) it is made precise in the side box.

Polymorphic type order

While the equality of monotypes is purely syntactical, polytypes offer a richer structure by being related to other types through a specialization relation σ ⊑ σ ′ expressing that σ ′ is more special than σ .

When being applied to a value a polymorphic function has to change its shape specializing to deal with this particular type of values. During this process, it also changes its type to match that of the parameter. If for instance the identity function having type ∀ α . α → α is to be applied on a number having type i n t , both simply cannot work together, because all the types are different and nothing fits. What is needed is a function of type i n t → i n t . Thus, during application, the polymorphic identity is specialized to a monomorphic version of itself. In terms of the specialization relation, one writes ∀ α . α → α ⊑ i n t → i n t

Now the shape shifting of polymorphic values is not fully arbitrary but rather limited by their pristine polytype. Following what has happened in the example one could paraphrase the rule of specialization, saying, a polymorphic type ∀ α . τ is specialized by consistently replacing each occurrence of α in τ and dropping the quantifier. While this rule works well for any monotype used as replacement, it fails when a polytype, say ∀ β . β is tried as a replacement, resulting in the non-syntactical type ∀ β . β → ∀ β . β .

The syntactic restriction to allow quantification only top-level is imposed to prevent generalization while specializing. Instead of ∀ β . β → ∀ β . β , the more special type ∀ β . β → β must be produced in this case.

One could undo the former specialization by specializing on some value of type ∀ α . α again. In terms of the relation one gains ∀ α . α → α ⊑ ∀ β . β → β ⊑ ∀ α . α → α as a summary, meaning that syntactically different polytypes are equal with respect to renaming their quantified variables.

Now focusing only on the question whether a type is more special than another and no longer what the specialized type is used for, one could summarize the specialization as in the box above. Paraphrasing it clockwise, a type ∀ α 1 … ∀ α n . τ is specialized by consistently replacing any of the quantified variables α i by arbitrary monotypes τ i gaining a monotype τ ′ . Finally, type variables in τ ′ not occurring free in the pristine type can optionally be quantified.

Thus the specialization rules makes sure that no free variable, i.e. monotype in the pristine type becomes unintentionally bound by a quantifier, but originally quantified variable can be replaced with whatever, even with types introducing new quantified or unquantified type variables.

Starting with a polytype ∀ α . α , the specialization could either replace the body by another quantified variable, actually a rename or by some type constant (including the function type) which may or may not have parameters filled either with monotypes or quantified type variables. Once a quantified variable is replaced by a type application, this specialization cannot be undone through another substitution as it was possible for quantified variables. Thus the type application is there to stay. Only if it contains another quantified type variable, the specialization could continue further replacing for it.

So the specialization introduces no further equivalence on polytype beside the already known renaming. Polytypes are syntactically equal up to renaming their quantified variables. The equality of types is a reflexive, antisymmetric and transitive relation and the remaining specializations of polytypes are transitive and with this the relation ⊑ is an order.

Deductive system

The syntax of HM is carried forward to the syntax of the inference rules that form the body of the formal system, by using the typings as judgments. Each of the rules define what conclusion could be drawn from what premises. Additionally to the judgments, some extra conditions introduced above might be used as premises, too.

A proof using the rules is a sequence of judgments such that all premises are listed before a conclusion. Please see the Examples 2 and 3 below for a possible format of proofs. From left to right, each line shows the conclusion, the [ N a m e ] of the rule applied and the premises, either by referring to an earlier line (number) if the premise is a judgment or by making the predicate explicit.

Typing rules

The side box shows the deduction rules of the HM type system. One can roughly divide them into two groups:

The first four rules [ V a r ] (variable or function access), [ A p p ] (application, i.e. function call with one parameter), [ A b s ] (abstraction, i.e. function declaration) and [ L e t ] (variable declaration) are centered around the syntax, presenting one rule for each of the expression forms. Their meaning is pretty obvious at the first glance, as they decompose each expression, prove their sub-expressions and finally combine the individual types found in the premises to the type in the conclusion.

The second group is formed by the remaining two rules [ I n s t ] and [ G e n ] . They handle specialization and generalization of types. While the rule [ I n s t ] should be clear from the section on specialization above, [ G e n ] complements the former, working in the opposite direction. It allows generalization, i.e. to quantify monotype variables that are not bound in the context. The necessity of this restriction α ∉ f r e e ( Γ ) is introduced in the section on free type variables.

The following two examples exercise the rule system in action

Example 2: A proof for Γ ⊢ D i d ( n ) : i n t where Γ = i d : ∀ α . α → α , n : i n t , could be written

1 : Γ ⊢ D i d : ∀ α . α → α [ V a r ] ( i d : ∀ α . α → α ∈ Γ ) 2 : Γ ⊢ D i d : i n t → i n t [ I n s t ] ( 1 ) , ( ∀ α . α → α ⊑ i n t → i n t ) 3 : Γ ⊢ D n : i n t [ V a r ] ( n : i n t ∈ Γ ) 4 : Γ ⊢ D i d ( n ) : i n t [ A p p ] ( 2 ) , ( 3 )

Example 3: To demonstrate generalization, ⊢ D let i d = λ x . x in i d : ∀ α . α → α is shown below:

1 : x : α ⊢ D x : α [ V a r ] ( x : α ∈ { x : α } ) 2 : ⊢ D λ x . x : α → α [ A b s ] ( 1 ) 3 : ⊢ D λ x . x : ∀ α . α → α [ G e n ] ( 2 ) , ( α ∉ f r e e ( ϵ ) ) 4 : i d : ∀ α . α → α ⊢ D i d : ∀ α . α → α [ V a r ] ( i d : ∀ α . α → α ∈ { i d : ∀ α . α → α } ) 5 : ⊢ D let i d = λ x . x in i d : ∀ α . α → α [ L e t ] ( 3 ) , ( 4 )

Principal type

As mentioned in the introduction, the rules allow one to deduce different types for one and the same expression. See for instance, Example 2, steps 1,2 and Example 3, steps 2,3 for three different typings of the same expression. Clearly, the different results are not fully unrelated, but connected by the type order. It is an important property of the rule system and this order that whenever more than one type can be deduced for an expression, among them is (modulo alpha-renaming of the type variables) a unique most general type in the sense, that all others are specialization of it. Though the rule system must allow to derive specialized types, a type inference algorithm should deliver this most general or principal type as its result.

Let-polymorphism

Not visible immediately, the rule set encodes a regulation under which circumstances a type might be generalized or not by a slightly varying use of mono- and polytypes in the rules [ A b s ] and [ L e t ] .

In rule [ A b s ] , the value variable of the parameter of the function λ x . e is added to the context with a monomorphic type through the premise Γ , x : τ ⊢ D e : τ ′ , while in the rule [ L e t ] , the variable enters the environment in polymorphic form Γ , x : σ ⊢ D e 1 : τ . Though in both cases the presence of x in the context prevents the use of the generalisation rule for any monotype variable in the assignment. This regulation forces the parameter x in a λ -expression to remain monomorphic, while in a let-expression, the variable could already be introduced polymorphic, making specializations possible.

As a consequence of this regulation, no type can be inferred for λ f . ( f true , f 0 ) since the parameter f is in a monomorphic position, while let f = λ x . x in ( f true , f 0 ) yields a type ( b o o l , i n t ) , because f has been introduced in a let-expression and is treated polymorphic therefore. Note that this behaviour is in strong contrast to the usual definition let x = e 1 in e 2 ::= ( λ x . e 2 ) e 1 and the reason why the let-expression appears in the syntax at all. This distinction is called let-polymorphism or let generalization and is a conception owed to HM.

Towards an algorithm

Now that the deduction system of HM is at hand, one could present an algorithm and validate it with respect to the rules. Alternatively, it might be possible to derive it by taking a closer look on how the rules interact and proof are formed. This is done in the remainder of this article focusing on the possible decisions one can make while proving a typing.

Degrees of freedom choosing the rules

Isolating the points in a proof, where no decision is possible at all, the first group of rules centered around the syntax leaves no choice since to each syntactical rule corresponds a unique typing rule, which determines a part of the proof, while between the conclusion and the premises of these fixed parts chains of [ I n s t ] and [ G e n ] could occur. Such a chain could also exist between the conclusion of the proof and the rule for topmost expression. All proofs must have the so sketched shape.

Because the only choice in a proof with respect of rule selection are the [ I n s t ] and [ G e n ] chains, the form of the proof suggests the question whether it can be made more precise, where these chains might be needed. This is in fact possible and leads to a variant of the rules system with no such rules.

Syntax-directed rule system

A contemporary treatment of HM uses a purely syntax-directed rule system due to Clement as an intermediate step. In this system, the specialization is located directly after the original [ V a r ] rule and merged into it, while the generalization becomes part of the [ L e t ] rule. There the generalization is also determined to always produce the most general type by introducing the function Γ ¯ ( τ ) , which quantifies all monotype variables not bound in Γ .

Formally, to validate, that this new rule system ⊢ S is equivalent to the original ⊢ D , one has to show that Γ ⊢ D e : σ ⇔ Γ ⊢ S e : σ , which falls apart into two sub-proofs:

Γ ⊢ D e : σ ⇐ Γ ⊢ S e : σ (Consistency)

Γ ⊢ D e : σ ⇒ Γ ⊢ S e : σ (Completeness)

While consistency can be seen by decomposing the rules [ L e t ] and [ V a r ] of ⊢ S into proofs in ⊢ D , it is likely visible that ⊢ S is incomplete, as one cannot show λ x . x : ∀ α . α → α in ⊢ S , for instance, but only λ x . x : α → α . An only slightly weaker version of completeness is provable though, namely

Γ ⊢ D e : σ ⇒ Γ ⊢ S e : τ ∧ Γ ¯ ( τ ) ⊑ σ

implying, one can derive the principal type for an expression in ⊢ S allowing to generalize the proof in the end.

Comparing ⊢ D and ⊢ S note that now only monotypes appear in the judgments of all rules.

Degrees of freedom instantiating the rules

Within the rules themselves, assuming a given expression, one is free to pick the instances for (rule) variables not occurring in this expression. These are the instances for the type variable in the rules. Working towards finding the most general type, this choice can be limited to picking suitable types for τ in [ V a r ] and [ A b s ] . The decision of a suitable choice cannot be made locally, but its quality becomes apparent in the premises of [ A p p ] , the only rule, in which two different types, namely the function's formal and actual parameter type have to come together as one.

Therefore, the general strategy for finding a proof would be to make the most general assumption ( α ∉ f r e e ( Γ ) ) for τ in [ A b s ] and to refine this and the choice to be made in [ V a r ] until all side conditions imposed by the [ A p p ] rules are finally met. Fortunately, no trial and error is needed, since an effective method is known to compute all the choices, Robinson's Unification in combination with the so-called Union-Find algorithm.

To briefly summarize the union-find algorithm, given the set of all types in a proof, it allows one to group them together into equivalence classes by means of a u n i o n procedure and to pick a representative for each such class using a f i n d procedure. Emphasizing on the word procedure in the sense of side effect, we're clearly leaving the realm of logic to prepare an effective algorithm. The representative of a u n i o n ( a , b ) is determined such, that if both a and b are type variables the representative is arbitrarily one of them, while uniting a variable and a term, the term becomes the representative. Assuming an implementation of union-find at hand, one can formulate the unification of two monotypes as follows:

unify(ta,tb): ta = find(ta) tb = find(tb) if both ta,tb are terms of the form D p1..pn with identical D,n then unify(ta[i],tb[i]) for each corresponding ith parameter else if at least one of ta,tb is a type variable then union(ta,tb) else error 'types do not match'

Algorithm W

The presentation of Algorithm W as shown in the side box does not only deviate significantly from the original but is also a gross abuse of the notation of logical rules, since it includes side effects. It is legitimized here, for allowing a direct comparison with ⊢ S while expressing an efficient implementation at the same time. The rules now specify a procedure with parameters Γ , e yielding τ in the conclusion where the execution of the premises proceeds from left to right. Alternatively to a procedure, it could be viewed as an attributation of the expression.

The procedure i n s t ( σ ) specializes the polytype σ by copying the term and replacing the bound type variables consistently by new monotype variables. ' n e w v a r ' produces a new monotype variable. Likely, Γ ¯ ( τ ) has to copy the type introducing new variables for the quantification to avoid unwanted captures. Overall, the algorithm now proceeds by always making the most general choice leaving the specialization to the unification, which by itself produces the most general result. As noted above, the final result τ has to be generalized to Γ ¯ ( τ ) in the end, to gain the most general type for a given expression.

Because the procedures used in the algorithm have nearly O(1) cost, the overall cost of the algorithm is close to linear in the size of the expression for which a type is to be inferred. This is in strong contrast to many other attempts to derive type inference algorithms, which often came out to be NP-hard, if not undecidable with respect to termination. Thus the HM performs as well as the best fully informed type-checking algorithms can. Type-checking here means that an algorithm does not have to find a proof, but only to validate a given one.

Efficiency is slightly reduced because the binding of type variables in the context has to be maintained to allow computation of Γ ¯ ( τ ) and enable an occurs check to prevent the building of recursive types during u n i o n ( α , τ ) . An example of such a case is λ x . ( x x ) , for which no type can be derived using HM. Practically, types are only small terms and do not build up expanding structures. Thus, in complexity analysis, one can treat comparing them as a constant, retaining O(1) costs.

Original presentation of Algorithm W

In the original paper, the algorithm is presented more formally using a substitution style instead of side effects in the method above. In the latter form, the side effect invisibly takes care of all places where a type variable is used. Explicitly using substitutions not only makes the algorithm hard to read‹See TfD›, because the side effect occurs virtually everywhere, but also gives the false impression that the method might be costly. When implemented using purely functional means or for the purpose of proving the algorithm to be basically equivalent to the deduction system, full explicitness is of course needed and the original formulation a necessary refinement.

Recursive definitions

A central property of the lambda calculus is, that recursive definitions are non-elemental, but can instead be expressed by a fixed point combinator. The original paper notes that recursion can be realized by this combinator's type f i x : ∀ α . ( α → α ) → α . A possible recursive definitions could thus be formulated as r e c v = e 1 i n e 2 ::= l e t v = f i x ( λ v . e 1 ) i n e 2 .

Alternatively an extension of the expression syntax and an extra typing rule is possible as:

Γ , Γ ′ ⊢ e 1 : τ 1 … Γ , Γ ′ ⊢ e n : τ n Γ , Γ ″ ⊢ e : τ Γ ⊢ r e c v 1 = e 1 a n d … a n d v n = e n i n e : τ [ R e c ]

where

Γ ′ = v 1 : τ 1 , … , v n : τ n

Γ ″ = v 1 : Γ ¯ ( τ 1 ) , … , v n : Γ ¯ ( τ n )

basically merging [ A b s ] and [ L e t ] while including the recursively defined variables in monotype positions where they occur left to the i n but as polytypes right to it. This formulation perhaps best summarizes the essence of let-polymorphism.

References

Hindley–Milner type system Wikipedia

(Text) CC BY-SA

Contents