Formalization of the Tezos protocol's interface in Coq

January 15, 2020

The protocol of Tezos is written in the OCaml language. It uses a restricted OCaml interface of a few thousands lines of code to access to its primitives. Using the compiler coq-of-ocaml, we generate a Coq formalization of most of the Tezos protocol’s interface (the missing OCaml constructs are listed here). We hope this work to be a first step to enable formal reasoning on the implementation of the Tezos protocol. In this blog post, we present what we added to coq-of-ocaml in order to support the Tezos interface.

This work was financed by Nomadic Labs with the aim to verify the OCaml implementation of the Tezos blockchain. coq-of-ocaml is a compiler from OCaml to Coq.

Polymorphic variants

In OCaml, we can tag sum types with the backquote operator as follows:

type json =
  [ `O of (string * json) list
  | `Bool of bool
  | `Float of float
  | `A of json list
  | `Null
  | `String of string ]

There are no direct equivalents in Coq and we generate a warning message when we encounter tags. Still, we generate some code as a first approximation. For type definitions we generate a Coq inductive:

Inductive json : Set :=
| Bool : bool -> json
| Null : json
| O : list (string * json) -> json
| Float : Z -> json
| String : string -> json
| A : list json -> json.

and for inlined variants:

val fold :
  t ->
  key ->
  init:'a ->
  f:([`Key of key | `Dir of key] -> 'a -> 'a Lwt.t) ->
  'a Lwt.t

we generate a sum type with the tags as comments:

Parameter fold : forall {a : Set},
  t -> key -> a -> ((* `Dir *) key + (* `Key *) key -> a -> Lwt.t a) ->
  Lwt.t a.

Types and values collisions

In Coq, in contrast to OCaml, types and values live in the same namespace. Thus there are often name collisions in the generated code. Typically, this is due to the use of values having the same name as their type. We prevent that for the common cases (list, string, …) by translating the value names with a _value suffix. For example, we convert:

val string : string encoding

to:

Parameter __string_value : encoding string.

Modules

We added the import of the definitions of module types in the .mli files. The code to do so is the same as for .ml files. For example, we import:

module type COMPARABLE = sig
  type t

  val compare : t -> t -> int
end

to:

Module COMPARABLE.
  Record signature {t : Set} := {
    t := t;
    compare : t -> t -> Z;
  }.
  Arguments signature : clear implicits.
End COMPARABLE.

We handle the import of modules declared with a signature by unfolding the signature. We do so as we do not represent signatures in Coq unless for first-class modules (using a record type). For example, we import:

module String : COMPARABLE with type t = string

to:

Module String.
  Definition t := string.

  Parameter compare : t -> t -> Z.
End String.

This idea is due to Mehdi Bouaziz.

We also handle the include keyword by doing a similar unfolding. The unfolding of include significantly increases the size of the Coq code for the Tezos protocol’s interface. Indeed, the number of generated lines for the interface was multiplied by two once we converted the include occurences.

First-class modules

We improved the detection of first-class modules. The detection of first-class modules is a challenge in coq-of-ocaml. Indeed, we need to distinguish between first-class and plain modules in order to generated either a dependent record or a Coq module. Moreover, there are no builtin ways to translate between a record and a module in Coq.

When we access to the field of a module, we consider this module as a Coq record when it:

  • has a named signature (to have a corresponding named record definition);
  • is locally opened in an expression.

We added the constraint of being locally opened in order to filter out some plain modules with a named signature. We may change this rule in the future and convert more plain modules to records by default, as this could factorize the generated code and help to handle functors.

We added the support of first-class modules with polymorphic abstract types. For that, we mark the arity of the abstract types. For example, for a map signature in OCaml:

module type MAP = sig
  type key
  type +'a t
  val empty : 'a t
  val is_empty : 'a t -> bool
  val mem : key -> 'a t -> bool
end

we generate:

Module MAP.
  Record signature {key : Set} {t : Set -> Set} := {
    key := key;
    t := t;
    empty : forall {a : Set}, t a;
    is_empty : forall {a : Set}, t a -> bool;
    mem : forall {a : Set}, key -> t a -> bool;
  }.
  Arguments signature : clear implicits.
End MAP.

Records

In Coq, the fields of a record are projection functions which live in the same namespace as the values. Thus Coq is more prone to name collisions for record fields than OCaml. We solve this issue by putting the record definitions into modules of the same name. For example, for the record:

type descr = {name : string; descr : string option}

we generate the following Coq code:

Module descr.
  Record record := {
    name : string;
    descr : option string }.
End descr.
Definition descr := descr.record.

We thus prevent a name collision between the record type descr and the record field descr.

In subsequent accesses to the record fields, we prefix the projections by the record name. For example, to access to the name field of a record instance r of descr, we write in Coq:

r.(descr.name)

Pretty-printing

We improved the pretty-printing for the types. In particular:

  • we limit the number of parenthesis by taking into account the precedence of the operators;
  • we use a more consistent indentation.

As an example, we moved from the following generated Coq:

| FLambda : Z -> (list ((Context.binder_annot Names.Name.t) * Constr.constr)) ->
  Constr.constr -> (Esubst.subs fconstr) -> fterm

to:

| FLambda :
  Z -> list (Context.binder_annot Names.Name.t * Constr.constr) ->
  Constr.constr -> Esubst.subs fconstr -> fterm

Conclusion

We generated around six thousands valid Coq lines. Some OCaml constructs were missing, such as the extensible types. We may not support them as we see no direct equivalents in Coq. There is also a lot of code duplication due to the repetitive inclusion of similar module signatures. We will try to find solutions to factorize the generated code.

Finally, we will see if the code for the Tezos interface is usable when applying coq-of-ocaml to the protocol implementation.

blog comments powered by Disqus