đ *May 2, 2013*

*This post is a re-post from the now defunct blog of PPS. The date of this post is the same as the original post.*

The *proof by extraction* is a new proof technique, combining advantages of the proof by reflection and the proof by untrusted OCaml oracles. The idea is to add effects to the Coq language, so mutable variables, non-termination and exceptions are allowed. The execution is made efficient by extraction to OCaml which can pre-compute complex values. The results are then post-checked in Coq thanks to our theory of *simulable monads*.

We provide Cybele, a Coq plugin to do proof by extraction. You can start looking at the Get started and Examples sections. It is based on an original idea of Yann RĂ©gis-Gianas, and we are developing it with the help of Lourdes del Carmen GonzĂĄlez Huesca, and Beta Ziliani.

We consider the example of the equivalence decision between two terms given a list of known equivalences. Let us take a type `T`

with `~`

an equivalence relation and a list of hypothesis:

`H_1 : e_{i_1} ~ e_{j_1}`

- …
`H_n : e_{i_n} ~ e_{j_n}`

Our goal is to prove that `e_i ~ e_j`

using the reflexivity, symmetry and transitivity properties. On small instances we can obviously solve it by hand, or write a tactic to do it for us. We just need to apply the equivalence properties as many times as necessary. But on a bigger sample:

`H_{1,2} : e_1 ~ e_2`

- …
`H_{999,1000} : e_{999} ~ e_{1000}`

a proof of `e_1 ~ e_{1000}`

would consist of 998 applications of the transitivity rule. This is just huge. The solution of the proof by reflection technique is to replace a proof term by a *computation*. This computation has to be done by a proven correct decision procedure.

The union-find algorithm is an efficient way to solve the equivalence problem. Let us note `hs`

the list of hypothesis indexes `[(i_1, j_1), ..., (i_n, j_n)]`

and `(i, j)`

the indexes of our goal `e_i ~ e_j`

. Then a decision procedure in pseudo-code would look like:

```
decide (hs, (i, j)) : bool :=
let a = ref [] in
map (fun (i, j) -> union a i j) hs;
let i2 = find a i in
let j2 = find a j in
i2 = j2
```

We start with an empty array `a`

representing the fact that equivalence classes are *a priori* singleton sets. We merge them calling the `union`

procedure on each hypothesis. If the representatives `(i2, j2)`

of `(i, j)`

are equal then the property `e_i ~ e_j`

holds.

A nice way to show the soundness of `decide`

is to add an invariant to the array `a`

stating that if `i`

and `j`

are linked then `e_i ~ e_j`

. This can be done elegantly in Coq, which natively supports mixed proofs and programs (see the Program construct). At the end we just return a proof of `e_i ~ e_j`

if `i2`

equals `j2`

. An harder thing is to encode the union-find in a purely functional way since it critically relies on a mutable array, and show the termination of the `union`

and `find`

procedures.

We decided not to do it, and instead introduce a monad `M`

representing side-effects, non-termination and exceptions. We obtain a solution close to the pseudo-code:

```
Definition decide (known_eqs : equalities) (i j : T) : M (i ~ j).
refine (
let! a := tmp_ref s 0 nil in
do! List.iter (unify a) known_eqs in
let! Pi2 := find a i in
let (i2, Hii2) := Pi2 in
let! Pj2 := find a j in
let (j2, Hjj2) := Pj2 in
if eq_dec i2 j2 then
ret _
else
error "decide: the terms are not equal").
... (* some proof term *)
Defined.
```

On line 3, `let!`

is a notation for the bind operator. Line 9, we test the equality of `i2`

and `j2`

. Line 10, `ret _`

means we are returning a proof, namely the proof that `e_i ~ e_j`

. This proof is delayed until line 13, where it is made in proof mode. It uses the properties `Hii2`

and `Hjj2`

given by the invariant on `a`

, stating that `e_i ~ e_{i_2}`

and `e_j ~ e_{j_2}`

.

This is very much in the style of the IO monad of Haskell. Ours provides the following primitives.

- ref :
`forall i, T_i -> M (Ref.t T_i)`

(memory allocation) - read :
`Ref.t T -> M T`

- write :
`Ref.t T -> T -> M ()`

- fix :
`((A -> B) -> A -> B) -> A -> B`

- dependent_fix :
`((forall x : A, B) -> forall x : A, B) -> forall x : A, B`

- raise :
`string -> M T`

- try_with :
`(() -> M T) -> (string -> M T) -> M T`

- print :
`T -> M ()`

Obviously, this monad cannot be run natively in Coq. There is no run function of type `M A -> A`

or `M A -> option A`

. This is due to the general fixpoint operators which cannot be encoded in Coq since its typing rules enforce termination. Plus we want to run it as efficiently as possible.

To run our monad we combine the Coq and OCaml back-ends:

Compilation chain.

We use the extraction mechanism of Coq to get an equivalent program of our decision procedure in OCaml. All the monadic operators are replaced by native OCaml operators, since this language supports all the effects of our monad. We run it on a specific problem instance `x`

. It may not return a result in case of uncaught exception or infinite loop. If the execution is successful, we extract what we call a *prophecy*. This value is a guide which helps Coq to evaluate the monad efficiently with an extended run function of type:

```
run : prophecy -> M A -> option A
```

For the non-termination it contains the numbers of steps needed to evaluate a monadic value. It can also contain some pre-computed values in OCaml to save computational time in Coq. We provide a general mechanism to pass results from OCaml to Coq in the prophecy with the system of *input memory*. When you allocate a reference, you can mark it as an “input”. If you do so, its final value after the OCaml run will be given as its initial value to Coq. Any value can be passed thought the prophecy as long as you can serialize it. It does not work for functions or proofs, but does for all common data-structures such as integers or lists.

Notice that if you do not save results in the prophecy, computations are made twice: once in OCaml, then post-checked in Coq with the run function. This is the main limitation of our system, but it could be solved with a proven correct extraction mechanism. We could directly trust the execution made in OCaml and import the result in Coq.

We pretend that our programming model combines nicely safety and efficiency. Safety because of the strong type system of Coq and the ability to prove invariants on our programs. Efficiency because costly computations are delayed to OCaml. It supersedes what can be done with proof by reflection in pure Coq or with an OCaml plugin generating proof certificates. In a way, Cybele unifies these two techniques, and allow to combine both approach depending on the problem.

Trade-off efficiency vs correctness.

As future work we plan to add more monadic operators, especially for concurrency. It would also be great to have a trusted extraction mechanism for Coq.

blog comments powered by Disqus