in his 1907 doctoral thesis Brouwer advanced the view that the basis of mathematics and of logic is found in the capacity of human beings to carry out mental constructions. on the method of proof by contradiction he says, "The words of your mathematical demonstration merely accompany a mathematical construction that is effected without words. At the point where you enounce the contradiction, I simply perceive that the construction no longer goes, that the required structure cannot be imbedded in the given basic structure. And when I make this observation, I do not think of a principium contradictionis."
in the 1960's the mathematician E.Bishop carried out a sweeping development of mathematics along the lines conceived by Brouwer. Bishop's works, and those of his followers, have started a modern school of constructive mathematics that is in harmony with the influence of computing in mathematics. in Bishop's words, "The positive integers and their arithmetic are presupposed by the very nature of our intelligence.... Every mathematical statement ultimately expresses the fact that if we perform certain computations within the set of positive integers, we shall get certain results.... Thus even the most abstract mathematical statement has a computational basis."
one important concept in proof theory - judgments with hypotheses - is best illustrated by trying to write the introduction and elimination rules for “implies” or “entailment”, written A ⊃ B
clearly A ⊃ B
is supposed to mean we can prove B true
assume that A true
to be provable. in other words, we can construct a derivation of the form
A true
——————
.
.
.
——————
B true
we can notate our rules then as:
——————
A true
——————
.
.
.
——————
B true A ⊃ B A
—————————— ——————————
A ⊃ B true B true
this notation is a bit clunky, so we’ll opt for a new one: Γ ⊢ J
. in this notation Γ
is some list of judgments we assume to hold and J
is the thing we want to show holds. generally we’ll end up with the rule
J ∈ Γ
—————
Γ ⊢ J
which captures the fact that Γ contains assumptions we may (or may not) use to prove our goal. this specific rule may vary depending on how we want express how assumptions work in our logic. this is the most straightforward characterization of how this ought to work
our hypothetical judgments come with a few rules which we call “structural rules”. they modify the structure of judgment, rather than any particular proposition we’re trying to prove
weakeningΓ ⊢ J ————————— Γ, Γ' ⊢ J
contractionΓ, A, A, Γ' ⊢ J ——————————————— Γ, A, Γ' ⊢ J
exchangeΓ' = permute(Γ) Γ' ⊢ A ———————————————————————— Γ ⊢ A
finally, we get a substitution principle. this allows us to eliminate some of the assumptions we made to prove a theorem
Γ ⊢ A Γ, A ⊢ B
————————————————
Γ ⊢ B
these five rules define meaning to our hypothetical judgments. we can restate our formulation of entailment with less clunky notation then as:
A prop B prop
——————————————
A ⊃ B prop
Γ, A ⊢ B Γ ⊢ A ⊃ B Γ ⊢ A
————————— ——————————————————
Γ ⊢ A ⊃ B Γ ⊢ B
one thing in particular to note here is that entailment actually internalizes the notion of hypothetical judgments into our logic
given a proof that Γ ⊢ A
and another derivation of Γ, A ⊢ B
, we can produce a derivation of Γ ⊢ B
such a theorem is utterly crazy unless we can formalize what it means to derive something
in every logic we're keep circling back to two core objects: judgments and propositions. the best explanation of judgments I’ve read comes from Frank Pfenning:
A judgment is something we may know, that is, an object of knowledge.
A judgment is evident if we in fact know it.
so judgments are the things we’ll structure our logic around
you’ve definitely heard of one judgment: A true
. this judgment signifies whether or not some proposition A
is true
judgments can be much fancier though: we might have a whole bunch of judgments like n even
, A possible
or A resource
these judgments act across various syntactic objects. we’ll understand the meaning of a proposition by the ways we can prove it, that is the proofs that A true
is evident
we prove a judgment J
through inference rules. an inference rule takes the form
J₁ J₂ .... Jₓ
—————————————
J
which should be read as “when J₁, J₂ … and Jₓ hold, then so does J”. here the things above the line are premises and the ones below are conclusions. what we’ll do is define a bunch of these inference rules and use them to construct proofs of judgments
for example, we might have the inference rules
n even
—————— ————————————
0 even S(S(n)) even
for the judgment n even
. we can then form proofs to show that n even
holds for some particular n
——————
0 even
————————————
S(S(0)) even
——————————————————
S(S(S(S(0)))) even
this tree for example is evidence that 4 even
holds
one judgment we’ll often see is A prop
. it simply says that A
is a "well-formed proposition", not necessarily true but syntactically well formed. this judgment is defined inductively over the structure of A
an example judgment would be:
A prop B prop
——————————————
A ∧ B prop
which says that A ∧ B
(A and B) is a well formed proposition if and only if A
and B
are. we can imagine a whole bunch of these rules
A prop B prop
—————— —————— ————————————— ...
⊤ prop ⊥ prop A ∨ B prop
that lay out the propositions of our logic
this doesn’t yet tell us how prove any of these propositions to be true, but it’s a start. after we formally specify what sentences are propositions in our logic we need to discuss how to prove that one is true. we do this with a different judgment A true
which is once again defined inductively
rules that let us “introduce” new proofs of propositions are introduction rules. once we have a proof, we can use it to construct other proofs. the rules for how we do that are called elimination rules
the introduction and elimination rules match up. anytime we prove something by an introduction rule followed by an elimination rule, we should be able to rewrite to avoid this duplication. this also hints that the rules aren’t too powerful: we can’t prove anything with the elimination rules that we didn’t have a proof for at some point already
for example, we might want to give meaning to the proposition A ∧ B
to do this we define its meaning through the inference rules for proving that A ∧ B true
. we know what A ∧ B
means, but what does have a proof of it imply? we should be able to “get out what we put in” which would mean we’d have two inference rules. this proof looks like this:
A ∧ B true A ∧ B true
—————————— ——————————
A true B true
and in pseudo-code:
fst (a, b) ≡ a
snd (a, b) ≡ b
and completeness is that
D D
————– ————–
D A ∧ B A ∧ B
————— ⇒ ————— ——————
A ∧ B A B
———————————————
A ∧ B
and in code:
d ≡ (fst d, snd d)
the first equations are reductions and the second is expansion. these actually correspond the η and β rules we expect a programming language to have
the core idea of constructive logic is replacing the notion of truth found in classical logic with an intuitionist version
in a classical logic each proposition is either true or false, regardless of what we know about it
in a constructive system, a formula cannot be assigned either until we have direct evidence of it. it’s not that there’s a magical new boolean value, {true, false, i-don’t-know}, it’s just not a meaningful question to ask. it doesn’t make sense in these logics to say “A
is true” without having a proof of A
the consequences of dealing with things in this way can be boils down to a few things. for example, we now know that
∃ x . A(x)
can be proven, then there is some term which we can readily produce t
so that A(t)
is provableA ∨ B
can be proven then either A
or B
is provable and we know whichthese make sense when you realize that ∃ x . A(x)
can only be proven if we have a direct example of it. we can’t indirectly reason that it really ought to exist or merely claim that it must be true in one of a set of cases. we actually need to introduce it by proving an example of it
one thing to note that is that some constructive logics conflict a bit with intuitionism. while intuitionism might have provided some of the basis for constructive logics gradually people have poked and pushed the boundaries away from just Brouwer’s intuitionism. for example both Markov’s principle and Church’s thesis state something about all computable functions. while they may be reasonable statements we can’t give a satisfactory proof for them. this is a little confusing
there are a few explanations of constructive logic that basically describe it as “Classical logic - the law of excluded middle”. more verbosely, a constructive logic is just one that forbids
∀ A. A ∨ ¬ A
being provable (the law of excluded middle, LEM)∀ A. ¬ (¬ A) → A
being provable (the law of double negation)I carefully chose the words “being provable” because we can easily introduce these as a hypothesis to a proof and still have a sound system. indeed this is not uncommon when working in Coq or Agda. they’re just not a readily available tool. looking at them, this should be apparent as they both let us prove something without directly proving it. this isn’t really a defining aspect of constructivism, just a natural consequence. if we need a proof of A
to show A
to be true if we admit A ∨ ¬ A
by default it defeats the point. we can introduce A
merely by showing ¬ (¬ A)
which isn’t a proof of A
! just a proof that it really ought to be true
equality seems like one of the simplest things to talk about in a theorem prover. after all, the notion of equality is something any small child can intuitively grasp. the sad bit is, while it’s quite easy to hand-wave about, how equality is formalized seems to be a rather complex topic
two terms A
and B
are definitional equal is a judgment notated
Γ ⊢ A ≡ B
this is not a user level proof but rather a primitive, untyped judgment in the meta-theory of the language itself. the typing rules of the language will likely include a rule along the lines of
Γ ⊢ A ≡ B, Γ ⊢ x : A
————————————————————–
Γ ⊢ x : B
so this isn’t an identity type you would prove something with, but a much more magical notion that two things are completely the same to the typechecker
in most type theories we have a slightly more powerful notion of definitional equality where not only are x ≡ y
if x
is y
only by definition but also by computation. so in Coq for example
(2 + 2) ≡ 4
even though “definitionally” these are entirely separate entities. in most theories, definitionally equal means “inlining all definitions and with normalization”, but not all
in type theories that distinguish between the two, the judgment that when normalized x
is y
is called "judgmental equality". I won’t distinguish between the two further because most don’t, but it’s worth noting that they can be seen as separate concepts
propositional equality is a particular type constructor with the type/kind
Id : (A : Set) → A → A → Type
we should be able to prove a number of definitions like
reflexivity : (A : Set)(x : A) → Id x x
symmetry : (A : Set)(x y : A) → Id x y → Id y x
transitivity : (A : Set)(x y z : A) → Id x y → Id y z → Id x z
this is an entirely separate issue from definitional equality since propositional equality is a concept that users can hypothesis about
One very important difference is that we can make proofs like
sanity : Id 1 2 → ⊥
since the identity proposition is a type family which can be used just like any other proposition. This is in stark contrast to definitional equality which a user can’t even normally utter!
this is arguably the simplest form of equality. identity types are just normal inductive types with normal induction principles. the most common is equality given by Martin Lof
data Id (A : Set) : A → A → Type where
Refl : (x : A) → Id x x
this yields a simple induction principle
id-ind : (P : (x y : A) → Id x y → Type)
→ ((x : A) → P x x (Refl x))
→ (x y : A)(p : Id x y) → P x y p
in other words, if we can prove that P
holds for the reflexivity case, than P
holds for any x
and y
where Id x y
we can actually phrase Id
in a number of ways, including
data Id (A : Set)(x : A) : A → Set where
Refl : Id x x
this really makes a difference in the resulting induction principle
j : (A : Set)(x : A)(P : (y : A) → Id x y → Set)
→ P x Refl
→ (y : A)(p : Id x y) → P y p
this clearly turned out a bit differently! In particular now P
is only parametrized over one value of A
, y
. this particular elimination is traditionally named j
the fact that this only relies on simple inductive principles is also a win for typechecking. equality/substitution fall straight out of how normal inductive types are handled
the price we pay of course is that this is much more painful to work with. an intensional identity type means the burden of constructing our equality proofs falls on users. furthermore, we lose the ability to talk about observational equality
observational equality is the idea that two “thingies” are indistinguishable by any test. it’s clear that we can prove that if Id x y
, then f x = f y
, but it’s less clear how to go the other way and prove something like
fun_ext : (A B : Set)(f g : A → B) → ((x : A) → Id (f x) (g x)) → Id f g
fun_ext f g p = ??
even though this is clearly desirable. if we know that f
and g
behave exactly the same way, we’d like our equality to be able to state that. however, we don’t know that f
and g
are constructed the same way, making this impossible to prove
this can be introduced as an axiom but to maintain our inductively defined equality type we have to sacrifice one of the following
some this has been avoided by regarding equality as an induction over the class of types as in Martin Lof’s intuitionist type theory
some type theories go a different route to equality, giving us back the extensionality in the process. one of those type theories is extensional type theory
in the simplest formulation, we have intensional type theory with a new rule, reflection
Γ ⊢ p : Id x y
——————————–————
Γ ⊢ x ≡ y
this means that our normal propositional equality can be shoved back into the more magical definitional equality. this gives us a lot more power, all the typecheckers magic and support of definitional equality can be used with our equality types
it isn’t all puppies an kittens though, arbitrary reflection can also make things undecidable in general. for example Martin Lof’s system is undecidable with extensional equality
no extensional type theory is implemented this way. instead they’ve taken a different approach to defining types themselves
in this model of ETT types are regarded as a partial equivalence relation (PER) over unityped (untyped if you want to get in a flamewar) lambda calculus terms
these PERs precisely reflect the extensional equality at that “type” and we then check membership by reflexivity. so a:T
is synonymous with (a,a)∈T
. since we are dealing with a PER, we know that ∀ a . (a,a)∈T
the actual NuRPL&friends theory is a little more complicated than that. it’s not entirely dependent on PERs and allows a few different ways of introducing types, but I find that PERs are a helpful idea
This is another flavor of extensional type theory which is really just intensional type theory plus some axioms
we can arrive at this type theory in a number of ways, the simplest is to add axiom K
k : (A : Set)(x : A)(P : (x : A) → Id x x → Type)
→ P x (Refl x)
→ (p : Id x x) → P x p
this says that if we can prove that for any property P
, P x (Refl x)
holds, then it holds for any proof that Id x x
. this is subtly different than straightforward induction on Id
because here we’re not proving that a property parameterized over two different values of A
, but only one
this is horribly inconsistent in something like homotopy type theory but lends a bit of convenience to theories where we don’t give Id
as much meaning
using k
we can prove that for any p q : Id x y
, then Id p q
. in Agda notation
prop : (A : Set)(x y : A)(p q : x ≡ y) → p ≡ q
prop A x . x refl q = k A P (λ _ → refl) x q
where P : (x : A) → x ≡ x → Set
P _ p = refl ≡ p
this can be further refined to show that that we can eliminate all proofs that Id x x
are Refl x
rec : (A : Set)(P : A → Set)(x y : A)(p : P x) → x ≡ y → P y
rec A P x .x p refl = p
rec-refl-is-useless : (A : Set)(P : A → Set)(x : A)
→ (p : P x)(eq : x ≡ x) → p ≡ rec A P x x p eq
rec-refl-is-useless A P x p eq with prop A x x eq refl
rec-refl-is-useless A P x p .refl | refl = refl
this form of extensional type theory still leaves a clear distinction between propositional equality and definitional equality by avoiding a reflection rule. however, with rec-refl-is–useless
we can do much of the same things, whenever we have something that matches on an equality proof we can just remove it
we essentially have normal propositional equality, but with the knowledge that things can only be equal in one way, up to propositional equality
the next form of equality we’ll talk about is slightly different than previous ones. heterogeneous equality is designed to co-exist in some other type theory and supplement the existing form of equality
heterogeneous equality is most commonly defined with John Major equality
data JMeq : (A B : Set) → A → B → Set where
JMrefl : (A : Set)(x : A) → JMeq A A x x
this is termed after a British politician since while it promises that any two terms can be equal regardless of their class (type), only two things from the same class can ever be equal
the above definition doesn’t typecheck in Agda. that’s because Agda is predicative, meaning that a type constructor can’t quantify over the same universe it occupies. we can however, cleverly phrase JMeq
so to avoid this
data JMeq (A : Set) : (B : Set) → A → B → Set where
JMrefl : (a : A) → JMeq A A a a
now the constructor avoids quantifying over Set
and therefore fits inside the same universe as A
and B
JMeq
is usually paired with an axiom to reflect heterogeneous equality back into our normal equality proof
reflect : (A : Set)(x y : A) → JMeq x y → Id x y
this reflection doesn’t look necessary, but arises for similar reasons that dictate that k
is unprovable
it looks like this heterogeneous equality is a lot more trouble than it’s worth at first. it really shines when we’re working with terms that we know must be the same, but require pattern matching or other jiggering to prove
a lot of the ways we actually interact with type theories is not on the blackboard but through some proof assistant which mechanizes the tedious aspects of using a type theory. for formal type theory this is particularly natural. it’s decidable whether M:A
holds so the user just writes a term and says “Hey this is a proof of A
” and the computer can take care of all the work of checking it. this is the basic experience we get with Coq, Agda, Idris, and others. even ≡
is handled without us thinking about it
with computational type theory life is a little sadder
we can’t just write terms like we would for a formal type theory because M ∈ A
isn’t decidable
we need to help guide the computer through the process of validating that our term is well typed. this is the price we pay for having an exceptionally rich notion of M = N ∈ A
and M ∈ A
, there isn’t a snowball’s chance in hell of it being decidable
to make this work we switch gears and instead of trying to construct terms we start working with what’s called a program refinement logic, a PRL. a PRL is basically a sequent calculus with a central judgment of
H ≫ A ◁ e
this is going to be set up so that H ⊢ e ∈ A
holds, but there’s a crucial difference. with ∈
everything was an input. to mechanize it we would write a function accepting a context and two terms and checking whether one is a member of the other. with H ≫ A ◁ e
only H
and A
are inputs, e
should be thought of as an output. what we’ll do with this judgment is work with a tactic language to construct a derivation of H ≫ A
without even really thinking with that ◁ e
and the system will use our proof to construct the term for us. so in Agda when I want to write a sorting function what I might do is say
sort : List Nat → List Nat
sort xs = ...
I just give the definition and Agda is going to do the grunt work to make sure that I don’t apply a nat to a string or something equally nutty. in a system like NuPRL what we do instead is define the type that our sorting function ought to have and use tactics to prove the existence of a realizer for it. by default we don’t really specify what exactly that realizer
for example, if I was writing JonPRL maybe I’d say
||| Somehow this says a list of nats is a sorted version of another
Operator sorting : (0; 0).
Theorem sort : [(xs : List Nat) {ys : List Nat | is-sorting(ys; xs)}] {
||| Tactics go here.
}
I specify a sufficiently strong type so that if I can construct a realizer for it then I clearly have constructed a sorting algorithm
of course we have tactics which let us say things “I want to use this realizer” and then we have to go off and show that the candidate realizer is a validate realizer. in that situation we’re actually acting as a type checker, constructing a derivation implying e ∈ A
data _×_ (A B : Set) : Set where
_,_ : A → B → A × B
fst : {A B : Set} → A × B → A
fst (a , b) = a
snd : {A B : Set} → A × B → B
snd (a , b) = b
the definition of conjunction is nothing but the definition of the Cartesian product of two sets: an element of the Cartesian product is a pair of elements of the component sets:
_/\_ : Set -> Set -> Set
A /\ B = A x B
this is the Curry-Howard identification of conjunction and Cartesian product
when we say that all proofs of A & B are pairs of proofs of A and proofs of B, we actually mean that all canonical proofs of A & B are pairs of canonical proofs of A and canonical proofs of B. this is the so called Brouwer-Heyting-Kolmogorov (BHK)-interpretation of logic, as refined and formalised by Martin-Lof
the definition of disjunction follows similar lines. according to the BHK-interpretation a (canonical) proof of A | B is either a canonical proof of A or a canonical proof of B:
data _\/_ (A B : Set) : Set where
inl : A -> A \/ B
inr : B -> A \/ B
case : {A B C : Set} -> A v B -> (A -> C) -> (B -> C) -> C
case (inl a) f g = f a
case (inr b) f g = g b
and
data True : Set where
<> : True
data False : Set where
nocase : {A : Set} -> False -> A
nocase ()
not : Set -> Set
not A = A -> False
according to the BHK-interpretation, to prove an implication is to provide a method for transforming a proof of A into a proof of B. Brouwer pioneered this idea about 100 years ago. in modern constructive mathematics in general, and in Martin-Lof type theory in particular, a “method” is usually understood as a computable function which transforms proofs. thus we define implication as function space. to be clear, we introduce some new notation for implications:
_==>_ : (A B : Set) -> Set
A ==> B = A -> B
the above definition is not accepted in Martin-Lof’s own version of propositions-as-sets. the reason is that each proposition should be defined by stating what its canonical proofs are. a canonical proof should always begin with a constructor, but a function in A -> B
data _==>_ (A B : Set) : Set where
fun : (A -> B) -> A ==> B
if ==>
is defined in this way, a canonical proof of A ==> B
fun
the rule of ==>-elimination (modus ponens) is now defined by pattern matching:
apply : {A B : Set} -> A ==> B -> A -> B
apply (fun f) a = f a
this finishes the definition of propositional logic inside Agda
∀ x:A . B
is similar to the BHK-interpretation of implication: to prove ∀ x:A . B
Forall : (A : Set) -> (B : A -> Set) -> Set
Forall A B = (x : A) -> B x
a universal quantifier can be viewed as the conjunction of a family of propositions. another common name is the “Π-set”, since Cartesian products of families of sets are often written as Π x:A . B
Martin-Lof does not accept the above definition in his version of the BHK-interpretation. instead he defines the universal quantifier as a data type with one constructor:
data Forall (A : Set) (B : A -> Set) : Set where
dfun : ((a : A) -> B a) -> Forall A B
according to the BHK-interpretation, a proof of ∃ x:A . B
data Exists (A : Set) (B : A -> Set) : Set where
_,_ : (a : A) -> B a -> Exists A B
thinking in terms of Curry-Howard, this is also a definition of the dependent product
an alternative name is then the disjoint union of a family of sets, since an existential quantifier can be viewed as the disjunction of a family of propositions. another common name is the “Σ-set”, since disjoint union of families of sets are often written as Σ x:A . B
dfst : {A : Set} {B : A -> Set} -> Exists A B -> A
dfst (a , b) = a
dsnd : {A : Set} {B : A -> Set} -> (p : Exists A B) -> B (dfst p)
dsnd (a , b) = b
dcase : {A B : Set} -> {C : A v B -> Set} -> (z : A v B) ->
((x : A) -> C (inl x)) ->
((y : B) -> C (inr y)) -> C z
dcase (inl a) d e = d a
dcase (inr b) d e = e b
dnocase : {A : False -> Set} -> (z : False) -> A z
dnocase ()
data _≡_ {A : Set} : A -> A -> Set where
refl : (x : A) -> x ≡ x
data Id : A → A → Set where
id : (x : A) → Id x x
so that
sym : ∀ {A : Set} {x y : A} -> x ≡ y -> y ≡ x
sym refl = refl
trans : (x y z : A) -> x ≡ y -> y ≡ z -> Id x z
trans x _ _ refl refl = id x
subst : {A : Set} -> {B : A -> Set} -> {x y : A} -> x ≡ y -> B x -> B y
subst (refl x) z = z