in his 1907 doctoral thesis Brouwer advanced the view that the basis of mathematics and of logic is found in the capacity of human beings to carry out mental constructions. on the method of proof by contradiction he says, "The words of your mathematical demonstration merely accompany a mathematical construction that is effected without words. At the point where you enounce the contradiction, I simply perceive that the construction no longer goes, that the required structure cannot be imbedded in the given basic structure. And when I make this observation, I do not think of a principium contradictionis."

in the 1960s the mathematician E. Bishop carried out a sweeping development of mathematics along the lines conceived by Brouwer. Bishop's works, and those of his followers, have started a modern school of constructive mathematics that is in harmony with the influence of computing in mathematics. in Bishop's words, "The positive integers and their arithmetic are presupposed by the very nature of our intelligence.... Every mathematical statement ultimately expresses the fact that if we perform certain computations within the set of positive integers, we shall get certain results.... Thus even the most abstract mathematical statement has a computational basis."


Danny Gratzer

Proof Theory

Hypothetical Judgments

one important concept in proof theory - judgments with hypotheses - is best illustrated by trying to write the introduction and elimination rules for “implies” or “entailment”, written A ⊃ B

clearly A ⊃ B is supposed to mean that if we assume A true, then we can prove B true. in other words, we can construct a derivation of the form

 A true
 ——————
   .
   .
   .
 ——————
 B true

we can notate our rules then as:

 —————— 
 A true
 ——————
   .
   .
   .
 ——————
 B true           A ⊃ B true   A true
 ——————————       ———————————————————
 A ⊃ B true             B true

this notation is a bit clunky, so we’ll opt for a new one: Γ ⊢ J. in this notation Γ is some list of judgments we assume to hold and J is the thing we want to show holds. generally we’ll end up with the rule

J ∈ Γ
—————
Γ ⊢ J

which captures the fact that Γ contains assumptions we may (or may not) use to prove our goal. this specific rule may vary depending on how we want to express how assumptions work in our logic, but it's the most straightforward characterization of how they ought to behave

our hypothetical judgments come with a few rules which we call “structural rules”. they modify the structure of the judgment, rather than any particular proposition we’re trying to prove

weakening
  Γ ⊢ J
—————————
Γ, Γ' ⊢ J

  
contraction
Γ, A, A, Γ' ⊢ J
———————————————
 Γ, A, Γ' ⊢ J


exchange
Γ' = permute(Γ)   Γ' ⊢ A
————————————————————————
        Γ ⊢ A

finally, we get a substitution principle. this allows us to eliminate some of the assumptions we made to prove a theorem

Γ ⊢ A   Γ, A ⊢ B
————————————————
     Γ ⊢ B

these five rules give meaning to our hypothetical judgments. we can then restate our formulation of entailment in this less clunky notation:

A prop  B prop
——————————————
  A ⊃ B prop

Γ, A ⊢ B      Γ ⊢ A ⊃ B    Γ ⊢ A
—————————     ——————————————————
Γ ⊢ A ⊃ B           Γ ⊢ B

one thing in particular to note here is that entailment actually internalizes the notion of hypothetical judgments into our logic

given a proof that Γ ⊢ A and another derivation of Γ, A ⊢ B, we can produce a derivation of Γ ⊢ B

such a theorem is utterly crazy unless we can formalize what it means to derive something
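
to see why it’s not so crazy, we can read the whole setup through Curry-Howard: a context becomes a product of the assumed types, and a derivation of Γ ⊢ A becomes a function out of that product. a minimal sketch in Agda (the names intro and subst-principle are mine, nothing standard):


  -- a sketch: a context Γ is a product of assumptions, a derivation of
  -- Γ ⊢ A is a function Γ → A
  data _×_ (A B : Set) : Set where
    _,_ : A → B → A × B

  -- entailment internalized: Γ, A ⊢ B gives Γ ⊢ A ⊃ B (here ⊃ is →)
  intro : {Γ A B : Set} → (Γ × A → B) → (Γ → (A → B))
  intro f γ a = f (γ , a)

  -- the substitution principle: Γ ⊢ A and Γ, A ⊢ B give Γ ⊢ B
  subst-principle : {Γ A B : Set} → (Γ → A) → (Γ × A → B) → (Γ → B)
  subst-principle a b γ = b (γ , a γ)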

in every logic we keep circling back to two core objects: judgments and propositions. the best explanation of judgments I’ve read comes from Frank Pfenning:

A judgment is something we may know, that is, an object of knowledge.
A judgment is evident if we in fact know it.

so judgments are the things we’ll structure our logic around

you’ve definitely heard of one judgment: A true. this judgment signifies whether or not some proposition A is true

judgments can be much fancier though: we might have a whole bunch of judgments like n even, A possible or A resource

these judgments act across various syntactic objects. we’ll understand the meaning of a proposition by the ways we can prove it, that is, by the proofs that make A true evident

we prove a judgment J through inference rules. an inference rule takes the form

J₁ J₂ ... Jₙ
————————————
     J

which should be read as “when J₁, J₂, …, and Jₙ all hold, then so does J”. here the things above the line are premises and the one below is the conclusion. what we’ll do is define a bunch of these inference rules and use them to construct proofs of judgments

for example, we might have the inference rules

             n even
 ——————    ————————————
 0 even    S(S(n)) even

for the judgment n even. we can then form proofs to show that n even holds for some particular n

       ——————
       0 even
    ————————————
    S(S(0)) even
 ——————————————————
 S(S(S(S(0)))) even

this tree for example is evidence that 4 even holds
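
we can transcribe this judgment into Agda directly: each inference rule becomes a constructor of an inductive family, and the derivation tree above becomes a term. a small sketch (the names Even, even-z and even-ss are my own):


  data Nat : Set where
    zero : Nat
    suc  : Nat → Nat

  -- one constructor per inference rule for "n even"
  data Even : Nat → Set where
    even-z  : Even zero
    even-ss : {n : Nat} → Even n → Even (suc (suc n))

  -- the derivation tree for "4 even", written as a term
  four-even : Even (suc (suc (suc (suc zero))))
  four-even = even-ss (even-ss even-z)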

one judgment we’ll often see is A prop. it simply says that A is a "well-formed proposition", not necessarily true but syntactically well formed. this judgment is defined inductively over the structure of A

an example judgment would be:

A prop  B prop
——————————————
  A ∧ B prop

which says that A ∧ B (A and B) is a well-formed proposition if and only if A and B are. we can imagine a whole bunch of these rules

                A prop B prop
——————  ——————  ————————————— ...
⊤ prop  ⊥ prop    A ∨ B prop

that lay out the propositions of our logic

this doesn’t yet tell us how to prove any of these propositions to be true, but it’s a start. after we formally specify what sentences are propositions in our logic we need to discuss how to prove that one is true. we do this with a different judgment A true which is once again defined inductively

Soundness and Completeness

rules that let us “introduce” new proofs of propositions are introduction rules. once we have a proof, we can use it to construct other proofs. the rules for how we do that are called elimination rules

the introduction and elimination rules match up. anytime we prove something by an introduction rule followed by an elimination rule, we should be able to rewrite the proof to avoid this detour. this also hints that the rules aren’t too powerful: we can’t prove anything with the elimination rules that we didn’t have a proof for at some point already

for example, we might want to give meaning to the proposition A ∧ B

to do this we define its meaning through the inference rules for proving that A ∧ B true. to introduce A ∧ B true we prove both halves:

A true  B true
——————————————
  A ∧ B true

now that we know how to prove A ∧ B, what does having a proof of it imply? we should be able to “get out what we put in”, which suggests two elimination rules. they look like this:


A ∧ B true    A ∧ B true
——————————    ——————————
 A true        B true

and in pseudo-code:


    fst (a, b) ≡ a
    snd (a, b) ≡ b

these reductions are (local) soundness. completeness, on the other hand, says that

           D           D
         ————–       ————–
  D      A ∧ B       A ∧ B
————— ⇒  —————      ——————
A ∧ B      A           B
          ———————————————
               A ∧ B

and in code:


    d ≡ (fst d, snd d)

the first equations are reductions and the second is an expansion. these correspond to the β and η rules we expect a programming language to have
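
we can watch this play out in Agda. with an identity type and a data-defined pair (both along the lines of the definitions later in this post), the β rules hold by mere computation, while the η rule only holds once we pattern match. a sketch:


  data _×_ (A B : Set) : Set where
    _,_ : A → B → A × B

  fst : {A B : Set} → A × B → A
  fst (a , b) = a

  snd : {A B : Set} → A × B → B
  snd (a , b) = b

  data Id {A : Set} : A → A → Set where
    refl : (x : A) → Id x x

  -- β: fst (a , b) computes to a, so refl is accepted as-is
  beta : {A B : Set} (a : A) (b : B) → Id (fst (a , b)) a
  beta a b = refl a

  -- η: (fst p , snd p) equals p once we expose the pair by matching
  eta : {A B : Set} (p : A × B) → Id (fst p , snd p) p
  eta (a , b) = refl (a , b)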

Constructive Logic

the core idea of constructive logic is replacing the notion of truth found in classical logic with an intuitionist version

in a classical logic each proposition is either true or false, regardless of what we know about it

in a constructive system, a formula cannot be assigned either until we have direct evidence of it. it’s not that there’s a magical new boolean value, {true, false, i-don’t-know}, it’s just not a meaningful question to ask. it doesn’t make sense in these logics to say “A is true” without having a proof of A

the consequences of dealing with things in this way boil down to a few things. for example, we now know that if ∃ x . A(x) is provable then we can produce a term t such that A(t) is provable, and that if A ∨ B is provable then either A is provable or B is provable

these make sense when you realize that ∃ x . A(x) can only be proven if we have a direct example of it. we can’t indirectly reason that it really ought to exist or merely claim that it must be true in one of a set of cases. we actually need to introduce it by proving an example of it

one thing to note is that some constructive logics conflict a bit with intuitionism. while intuitionism provided some of the basis for constructive logics, people have gradually poked and pushed the boundaries away from just Brouwer’s intuitionism. for example, both Markov’s principle and Church’s thesis state something about all computable functions. while they may be reasonable statements, we can’t give a satisfactory constructive proof of them. this is a little confusing

there are a few explanations of constructive logic that basically describe it as “Classical logic - the law of excluded middle”. more verbosely, a constructive logic is just one that forbids the law of excluded middle, A ∨ ¬ A, and double negation elimination, ¬ (¬ A) → A, from being provable

I carefully chose the words “being provable” because we can easily introduce these as hypotheses in a proof and still have a sound system. indeed this is not uncommon when working in Coq or Agda. they’re just not a readily available tool. looking at them, this should be apparent: they both let us prove something without directly proving it. this isn’t really a defining aspect of constructivism, just a natural consequence. if we need a proof of A to show A to be true, then admitting A ∨ ¬ A by default defeats the point. and we could introduce A merely by showing ¬ (¬ A), which isn’t a proof of A! just a proof that it really ought to be true
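
in fact, while A ∨ ¬ A itself isn’t provable, its double negation is, which is one way to see that assuming it is harmless. a quick sketch in Agda, using definitions shaped like the ones later in this post:


  data False : Set where

  data _\/_ (A B : Set) : Set where
    inl : A -> A \/ B
    inr : B -> A \/ B

  not : Set -> Set
  not A = A -> False

  -- the law of excluded middle can't be refuted: ¬ ¬ (A ∨ ¬ A) holds
  lem-irrefutable : {A : Set} -> not (not (A \/ not A))
  lem-irrefutable k = k (inr (λ a → k (inl a)))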

Equality

equality seems like one of the simplest things to talk about in a theorem prover. after all, the notion of equality is something any small child can intuitively grasp. the sad bit is that, while it’s quite easy to hand-wave about, how equality is formalized turns out to be a rather complex topic

Definitional Equality

that two terms A and B are definitionally equal is a judgment, notated

Γ ⊢ A ≡ B

this is not a user level proof but rather a primitive, untyped judgment in the meta-theory of the language itself. the typing rules of the language will likely include a rule along the lines of

Γ ⊢ A ≡ B    Γ ⊢ x : A
——————————————————————
      Γ ⊢ x : B

so this isn’t an identity type you would prove something with, but a much more magical notion that two things are completely the same to the typechecker

in most type theories we have a slightly more powerful notion of definitional equality, where x ≡ y holds not only when x is y by definition but also when x computes to y. so in Coq for example

(2 + 2) ≡ 4

even though “definitionally” these are entirely separate entities. in most theories definitional equality means “equal after inlining all definitions and normalizing”, but not in all
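
we can see the typechecker doing this in Agda: because both sides normalize to the same term, plain refl is accepted as the proof. a self-contained sketch (with numerals spelled out to avoid builtins):


  data Nat : Set where
    zero : Nat
    suc  : Nat → Nat

  _+_ : Nat → Nat → Nat
  zero  + m = m
  suc n + m = suc (n + m)

  data Id {A : Set} : A → A → Set where
    refl : (x : A) → Id x x

  -- 2 + 2 and 4 are definitionally equal: both normalize to suc⁴ zero
  two-plus-two : Id (suc (suc zero) + suc (suc zero))
                    (suc (suc (suc (suc zero))))
  two-plus-two = refl (suc (suc (suc (suc zero))))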

in type theories that distinguish between the two, the judgment that x and y share a normal form is called “judgmental equality”. I won’t distinguish between the two further because most don’t, but it’s worth noting that they can be seen as separate concepts

Propositional Equality

propositional equality is a particular type constructor with the type/kind


  Id : (A : Set) → A → A → Type

we should be able to prove a number of definitions like


  reflexivity  : (A : Set)(x     : A) → Id x x
  symmetry     : (A : Set)(x y   : A) → Id x y → Id y x
  transitivity : (A : Set)(x y z : A) → Id x y → Id y z → Id x z

this is an entirely separate issue from definitional equality since propositional equality is a concept that users can hypothesize about

One very important difference is that we can make proofs like

sanity : Id 1 2 → ⊥

since the identity proposition is a type family which can be used just like any other proposition. This is in stark contrast to definitional equality which a user can’t even normally utter!
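
in Agda the proof is a one-liner: no constructor of the identity type can ever inhabit Id 1 2, so an absurd pattern suffices. a sketch, again with numerals spelled out:


  data Nat : Set where
    zero : Nat
    suc  : Nat → Nat

  data ⊥ : Set where

  data Id {A : Set} : A → A → Set where
    refl : (x : A) → Id x x

  -- refl can only ever inhabit Id x x, and 1 ≠ 2 by constructors
  sanity : Id (suc zero) (suc (suc zero)) → ⊥
  sanity ()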

Intensional

this is arguably the simplest form of equality. identity types are just normal inductive types with normal induction principles. the most common is the equality given by Martin-Löf


  data Id (A : Set) : A → A → Type where
    Refl : (x : A) → Id x x

this yields a simple induction principle


  id-ind : (P : (x y : A) → Id x y → Type)
           → ((x : A) → P x x (Refl x))
           → (x y : A)(p : Id x y) → P x y p

in other words, if we can prove that P holds for the reflexivity case, then P holds for any x and y where Id x y
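
as a small example of this induction principle at work, symmetry falls out by picking the right motive; the base case is exactly Refl. a sketch in the same informal notation as above:


  -- symmetry, derived from id-ind alone: take P x y _ = Id y x,
  -- so the base case (x : A) → P x x (Refl x) is just Refl itself
  sym : (x y : A) → Id x y → Id y x
  sym = id-ind (λ x y _ → Id y x) Refl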

we can actually phrase Id in a number of ways, including


  data Id (A : Set)(x : A) : A → Set where
    Refl : Id x x

this really makes a difference in the resulting induction principle


  j : (A : Set)(x : A)(P : (y : A) → Id x y → Set)
       → P x Refl
       → (y : A)(p : Id x y) → P y p

this clearly turned out a bit differently! In particular now P is only parametrized over one value of A, y. this particular elimination is traditionally named j

the fact that this only relies on simple inductive principles is also a win for typechecking. equality/substitution fall straight out of how normal inductive types are handled

the price we pay of course is that this is much more painful to work with. an intensional identity type means the burden of constructing our equality proofs falls on users. furthermore, we lose the ability to talk about observational equality

Observational Equality

observational equality is the idea that two “thingies” are indistinguishable by any test. it’s clear that we can prove that if Id x y, then f x = f y, but it’s less clear how to go the other way and prove something like


  fun_ext : (A B : Set)(f g : A → B) → ((x : A) → Id (f x) (g x)) → Id f g
  fun_ext A B f g p = ?

even though this is clearly desirable. if we know that f and g behave exactly the same way, we’d like our equality to be able to state that. however, we don’t know that f and g are constructed the same way, making this impossible to prove

this can be introduced as an axiom, but an axiom comes at a cost to our inductively defined equality type: a postulated proof has no computational behavior, so we give up canonicity, the guarantee that every closed proof computes down to a constructor

sometimes this has been avoided by regarding equality as defined by induction over the class of types, as in Martin-Löf’s intuitionistic type theory

Definitional + Extensional

some type theories go a different route to equality, giving us back the extensionality in the process. one of those type theories is extensional type theory

in the simplest formulation, we have intensional type theory with a new rule, reflection

Γ ⊢ p : Id x y
——————————————
  Γ ⊢ x ≡ y

this means that our normal propositional equality can be shoved back into the more magical definitional equality. this gives us a lot more power: all the typechecker's magic and support for definitional equality can be used with our equality types

it isn’t all puppies and kittens though: arbitrary reflection makes typechecking undecidable in general. for example, Martin-Löf’s system with extensional equality is undecidable

in practice no extensional type theory is implemented this way. instead they take a different approach to defining types themselves

in this model of ETT, types are regarded as partial equivalence relations (PERs) over unityped (untyped if you want to get in a flamewar) lambda calculus terms

these PERs precisely reflect the extensional equality at that “type” and we then check membership by reflexivity. so a:T is synonymous with (a,a)∈T. since we are dealing with a PER, we know that ∀ a . (a,a)∈T need not hold. this is reassuring, otherwise we’d be able to prove that every type was inhabited by every term

the actual NuPRL-and-friends theory is a little more complicated than that. it’s not entirely dependent on PERs and allows a few different ways of introducing types, but I find that PERs are a helpful idea

Propositional Extensionality

This is another flavor of extensional type theory which is really just intensional type theory plus some axioms

we can arrive at this type theory in a number of ways; the simplest is to add axiom K


  k : (A : Set)(x : A)(P : (x : A) → Id x x → Type)
    → P x (Refl x)
    → (p : Id x x) → P x p

this says that if we can prove P x (Refl x) for any property P, then P holds for any proof that Id x x. this is subtly different from straightforward induction on Id because here the property is parameterized over only one value of A, not two

this is horribly inconsistent with something like homotopy type theory but lends a bit of convenience to theories where we don’t give Id as much meaning

using k we can prove that for any p q : Id x y, Id p q holds. in Agda notation


  prop : (A : Set)(x y : A)(p q : x ≡ y) → p ≡ q
  prop A x .x refl q = k A x P refl q
      where P : (x : A) → x ≡ x → Set
            P _ p = refl ≡ p

this can be further refined to show that any match on a proof of Id x x can be eliminated, since every such proof acts like Refl x


  rec : (A : Set)(P : A → Set)(x y : A)(p : P x) → x ≡ y → P y
  rec A P x .x p refl = p

  rec-refl-is-useless : (A : Set)(P : A → Set)(x : A)
                        → (p : P x)(eq : x ≡ x) → p ≡ rec A P x x p eq
  rec-refl-is-useless A P x p eq with prop A x x eq refl
  rec-refl-is-useless A P x p .refl | refl = refl

this form of extensional type theory still leaves a clear distinction between propositional equality and definitional equality by avoiding a reflection rule. however, with rec-refl-is-useless we can do much of the same thing: whenever we have something that matches on an equality proof, we can just remove the match

we essentially have normal propositional equality, but with the knowledge that things can only be equal in one way, up to propositional equality

Heterogeneous Equality

the next form of equality we’ll talk about is slightly different from the previous ones. heterogeneous equality is designed to co-exist with an existing type theory and supplement its usual form of equality

heterogeneous equality is most commonly defined with John Major equality


  data JMeq : (A B : Set) → A → B → Set where
    JMrefl : (A : Set)(x : A) → JMeq A A x x

this is named after a British politician: while it promises that any two terms can be equal regardless of their class (type), in reality only two things from the same class can ever be equal

the above definition doesn’t typecheck in Agda. that’s because Agda is predicative, meaning that a type constructor can’t quantify over the same universe it occupies. we can, however, cleverly phrase JMeq so as to avoid this


  data JMeq (A : Set) : (B : Set) → A → B → Set where
    JMrefl : (a : A) → JMeq A A a a

now the constructor avoids quantifying over Set and therefore fits inside the same universe as A and B

JMeq is usually paired with an axiom to reflect heterogeneous equality back into our normal equality proof


  reflect : (A : Set)(x y : A) → JMeq A A x y → Id x y

this reflection looks like it should be provable, but it isn’t: it has to be postulated, for reasons similar to those that make k unprovable

at first it looks like heterogeneous equality is a lot more trouble than it’s worth. it really shines when we’re working with terms that we know must be equal but that require pattern matching or other jiggering to prove it

Building Proof Assistants

a lot of the ways we actually interact with type theories is not on the blackboard but through some proof assistant which mechanizes the tedious aspects of using a type theory. for formal type theory this is particularly natural: it’s decidable whether M : A holds, so the user just writes a term, says “hey, this is a proof of A”, and the computer can take care of all the work of checking it. this is the basic experience we get with Coq, Agda, Idris, and others. most of this is handled without us even thinking about it

with computational type theory life is a little sadder

we can’t just write terms like we would for a formal type theory because M ∈ A isn’t decidable

we need to help guide the computer through the process of validating that our term is well typed. this is the price we pay for having an exceptionally rich notion of M = N ∈ A and M ∈ A: there isn’t a snowball’s chance in hell of it being decidable

to make this work we switch gears and instead of trying to construct terms we start working with what’s called a program refinement logic, a PRL. a PRL is basically a sequent calculus with a central judgment of

H ≫ A ◁ e

this is going to be set up so that H ⊢ e ∈ A holds, but there’s a crucial difference. with H ⊢ e ∈ A, everything was an input: to mechanize it we would write a function accepting a context and two terms and checking whether one is a member of the other. with H ≫ A ◁ e only H and A are inputs; e should be thought of as an output. what we’ll do with this judgment is use a tactic language to construct a derivation of H ≫ A without ever really thinking about that ◁ e, and the system will use our proof to construct the term for us. so in Agda, when I want to write a sorting function, what I might do is say

    sort : List Nat → List Nat
    sort xs = ...

I just give the definition and Agda is going to do the grunt work to make sure that I don’t apply a nat to a string or something equally nutty. in a system like NuPRL what we do instead is define the type that our sorting function ought to have and use tactics to prove the existence of a realizer for it. by default we don’t really specify exactly what that realizer is

for example, if I was writing JonPRL maybe I’d say

    ||| Somehow this says a list of nats is a sorted version of another
    Operator sorting : (0; 0).

    Theorem sort : [(xs : List Nat) {ys : List Nat | is-sorting(ys; xs)}] {
      ||| Tactics go here.
    }

I specify a sufficiently strong type so that if I can construct a realizer for it then I clearly have constructed a sorting algorithm

of course we have tactics which let us say things like “I want to use this realizer”, and then we have to go off and show that the candidate realizer is a valid one. in that situation we’re actually acting as a type checker, constructing a derivation implying e ∈ A


Proofs in Agda


  data _×_ (A B : Set) : Set where
    _,_ : A → B → A × B

  fst : {A B : Set} → A × B → A
  fst (a , b) = a

  snd : {A B : Set} → A × B → B
  snd (a , b) = b

the definition of conjunction is nothing but the definition of the Cartesian product of two sets: an element of the Cartesian product is a pair of elements of the component sets:


  _/\_ : Set -> Set -> Set
  A /\ B = A × B

this is the Curry-Howard identification of conjunction and Cartesian product
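
as a tiny usage example (and-comm is my name for it), commutativity of conjunction is just swapping the components of the pair:


  and-comm : {A B : Set} -> A /\ B -> B /\ A
  and-comm p = (snd p , fst p)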

when we say that all proofs of A & B are pairs of proofs of A and proofs of B, we actually mean that all canonical proofs of A & B are pairs of canonical proofs of A and canonical proofs of B. this is the so-called Brouwer-Heyting-Kolmogorov (BHK) interpretation of logic, as refined and formalised by Martin-Löf

the definition of disjunction follows similar lines. according to the BHK-interpretation a (canonical) proof of A | B is either a canonical proof of A or a canonical proof of B:


  data _\/_ (A B : Set) : Set where
    inl : A -> A \/ B
    inr : B -> A \/ B

  case : {A B C : Set} -> A \/ B -> (A -> C) -> (B -> C) -> C
  case (inl a) f g = f a
  case (inr b) f g = g b 
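
a small usage example (or-comm is my name for it): commutativity of disjunction is a single use of case, with the injections swapped:


  or-comm : {A B : Set} -> A \/ B -> B \/ A
  or-comm p = case p inr inl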

truth, absurdity, and negation follow similar lines:


  data True : Set where
    <> : True

  data False : Set where

  nocase : {A : Set} -> False -> A
  nocase ()

  not : Set -> Set
  not A = A -> False
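
a small usage example (contradiction is my name for it): from a proof of A and a proof of not A we get False, and nocase then gives us anything at all:


  contradiction : {A C : Set} -> A -> not A -> C
  contradiction a na = nocase (na a)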

according to the BHK-interpretation, to prove an implication A ==> B is to provide a method for transforming a proof of A into a proof of B. Brouwer pioneered this idea about 100 years ago. in modern constructive mathematics in general, and in Martin-Löf type theory in particular, a “method” is usually understood as a computable function which transforms proofs. thus we define implication as a function space. to be clear, we introduce some new notation for implication:


  _==>_ : (A B : Set) -> Set
  A ==> B = A -> B

the above definition is not accepted in Martin-Lof’s own version of propositions-as-sets. the reason is that each proposition should be defined by stating what its canonical proofs are. a canonical proof should always begin with a constructor, but a function in A -> B does not (unless one considers the lambda-sign as a constructor). instead, Martin-Lof defines implication as a set with one constructor:


  data _==>_ (A B : Set) : Set where 
    fun : (A -> B) -> A ==> B

if ==> is defined in this way, a canonical proof of A ==> B always begins with the constructor fun

the rule of ==>-elimination (modus ponens) is now defined by pattern matching:


  apply : {A B : Set} -> A ==> B -> A -> B
  apply (fun f) a = f a

this finishes the definition of propositional logic inside Agda

Quantifiers in Agda

the BHK-interpretation of universal quantification (for all) ∀ x:A . B is similar to the BHK-interpretation of implication: to prove ∀ x:A . B we need to provide a method which transforms an arbitrary element a of the domain A into a proof of the proposition B[a/x], that is, the proposition B where the free variable x has been instantiated (substituted) by the term a. as usual we must avoid capturing free variables

  Forall : (A : Set) -> (B : A -> Set) -> Set
  Forall A B = (x : A) -> B x 

a universal quantifier can be viewed as the conjunction of a family of propositions. another common name is the “Π-set”, since Cartesian products of families of sets are often written as Π x:A . B

Martin-Lof does not accept the above definition in his version of the BHK-interpretation. instead he defines the universal quantifier as a data type with one constructor:


  data Forall (A : Set) (B : A -> Set) : Set where
    dfun : ((a : A) -> B a) -> Forall A B 

according to the BHK-interpretation, a proof of ∃ x:A . B consists of an element a of domain A and a proof of B[a/x]


  data Exists (A : Set) (B : A -> Set) : Set where
    _,_ : (a : A) -> B a -> Exists A B 

thinking in terms of Curry-Howard, this is also a definition of the dependent sum

an alternative name is then the disjoint union of a family of sets, since an existential quantifier can be viewed as the disjunction of a family of propositions. another common name is the “Σ-set”, since disjoint union of families of sets are often written as Σ x:A . B


  dfst : {A : Set} {B : A -> Set} -> Exists A B -> A
  dfst (a , b) = a

  dsnd : {A : Set} {B : A -> Set} -> (p : Exists A B) -> B (dfst p)
  dsnd (a , b) = b

  dcase : {A B : Set} -> {C : A \/ B -> Set} -> (z : A \/ B) ->
    ((x : A) -> C (inl x)) ->
    ((y : B) -> C (inr y)) -> C z
  dcase (inl a) d e = d a
  dcase (inr b) d e = e b

  dnocase : {A : False -> Set} -> (z : False) -> A z
  dnocase ()
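
we can also write the general elimination rule for the existential quantifier: given a proof of Exists A B and a way to reach C from any witness, we get C. the name exists-elim is mine:


  exists-elim : {A : Set} {B : A -> Set} {C : Set} ->
                Exists A B -> ((x : A) -> B x -> C) -> C
  exists-elim (a , b) f = f a b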

Relations in Agda


  data _≡_ {A : Set} : A -> A -> Set where
    refl : (x : A) -> x ≡ x

  data Id {A : Set} : A -> A -> Set where
    id : (x : A) -> Id x x

so that

  
  sym : ∀ {A : Set} {x y : A} -> x ≡ y -> y ≡ x
  sym (refl x) = refl x

  trans : {A : Set} {x y z : A} -> x ≡ y -> y ≡ z -> Id x z
  trans (refl x) (refl .x) = id x

  subst : {A : Set} -> {B : A -> Set} -> {x y : A} -> x ≡ y -> B x -> B y
  subst (refl x) z = z
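
to round off, a standard lemma in this style: every function respects equality. cong is the usual name for it; this version is a sketch against the explicit-argument refl above:


  -- congruence: matching on the proof makes x and y coincide, so we
  -- can conclude with refl at f x. it could also be derived from subst
  cong : {A B : Set} (f : A -> B) {x y : A} -> x ≡ y -> f x ≡ f y
  cong f (refl x) = refl (f x)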