Formalization of Languages and Systems

Overview

Formalizing in a broad sense is an activity involving all members, in particular using assistants like Coq and Isabelle. We formalize the semantics of specification and programming languages, including concurrency, probability, time, mathematical/numerical operations, etc... This activity is inherently fundamental for all deductive and testing-based verification activities.

Our main directions of research, which are described further down in this page, include the following items:

Formalization of SQL and data-centric programming languages.
Formalization of object-oriented specification languages (UML/OCL).
Formalization of process-oriented specification languages (CSP, Circus, HLL).
Formalization of heterogenous timed choreography languages (TESL).
Formalization of the semantics of JavaScript (ECMAScript 5).
Formalization of IEEE floating-point arithmetic.
Formalization of real numbers, for application to real analysis.

People

Research axes

Data Intensive Systems Formalization (DataCert)

The Datacert thematic is highly interested in the certification of data intensive systems with the Coq proof assistant. As data, in all forms, is increasingly large in volume, as a result of computers capturing more and more of the companies activity, the scientists' work and also as a result of better and more powerful automatic data gathering tools (e.g. space telescopes, focused crawlers, archived experimental data mandatory in some types of government-funded research programs). The availability and reliability of such large data volumes is a gold mine for companies, scientists and simply citizens. It is then of crucial importance to protect data integrity and reliability by means of robust methods and tools.

The aim of this research track is to certify data intensive systems such as relational database systems, XQuery and semi structured engines using formal tools such as SMT provers (e.g., Alt-ergo) embedded in the Why(3) platform as well as interactive theorem provers (e.g., Coq).

Formal semantics of UML/OCL

The Unified Modeling Language (UML) is one of the few modeling languages that is widely used in industry. While UML is mostly known as diagrammatic modeling language (e.g., visualizing class models), it is complemented by a textual language, called Object Constraint Language (OCL). OCL is based on a four-valued logic that turns UML into a formal language. Unfortunately the semantics of this specification language, captured in the "Annex A" of the OCL standard, leads to different interpretations of corner cases. We formalize the core of OCL: denotational definitions, a logical calculus and operational rules that allow for the execution of OCL expressions by a mixture of term rewriting and code compilation. Our formalization reveals several inconsistencies and contradictions in the current version of the OCL standard 2.5. Overall, we provided a theory in Isabelle/HOL and a generated document which is intended to provide the basis for a machine-checked text "Annex A" of the OCL standard targeting at tool implementors.

Formal semantics of process-oriented specification languages.

The Circus specification language combines elements for complex data and behavior specifications, using an integration of Z and CSP with a refinement calculus. Its semantics is based on Hoare and He’s unifying theories of programming (UTP). We develop a machine-checked, formal semantics based on a “shallow embedding” of Circus in Isabelle/UTP (our semantic theory of UTP based on Isabelle/HOL). We derive proof rules from this semantics and implement tactic support that finally allows for proofs of refinement for Circus processes (involving both data and behavioral aspects).

In parallel, we developed a formalization in Isabelle/HOL of the work of Hoare and Roscoe on the denotational semantics of the Failure/Divergence Model of CSP. It follows essentially the presentation of CSP in Roscoe's Book "Theory and Practice of Concurrency" @{cite "roscoe:csp:1998" } and the semantic details in a joint Paper of Roscoe and Brooks "An improved failures model for communicating processes" originated by Brookes and Roscoe in 1985. This model is still the "Gold-Standard" of CSP semantics.

Formalization of heterogenous timed choreography languages.

We study the semantics of specification languages for the coordination of concurrent systems, which provides time, duration and causality constraints. The language TESL (developed by Boulanger et al in the LRI-ModHel group) is defined for the coordination of concurrent systems, and provides concepts for time, duration and causality constraints. The language is defined in terms of a denotational semantics. In order to be applicable in verification techniques, we also define a symbolic operational semantics that constructs the possible timelines. We present a novel way to link these two semantics by taking advantage of an unfolding principle of these timelines. Moreover, we apply our study to a tool for runtime-testing heterogeneous systems. We also define a dilation relation between timelines, which characterizes the addition of stuttering instants, and we prove that specifications in the language are invariant by stuttering. We give a formalization of these semantic models in the Isabelle/HOL proof assistant, together with formal proofs for soundness, completeness and progress, as well as local termination and invariance by stuttering.

Formal semantics of JavaScript

The JsCert project has provided a formalization of an almost-complete subset of the semantics of JavaScript, following the ECMAScript Language Specification, version 5. This semantics is expressed in Coq, using a pretty-big step semantics. In addition to the inductive evaluation rules, a reference interpreter has been proved correct. This interpreter may be used to run actual JavaScript program following the rules of the formal semantics. It has been used in particular to validate the formal semantics against official JavaScript test suites. This formalization of JavaScript can serve as a basis for developing formalization of secure subsets of JavaScript, for developing verified static analyses for JavaScript code, and for mechanically-verified translations from high-level languages into JavaScript. These applications are the fpcus of the ANR AJACS project.

Floating-point arithmetic

A high-level of guarantee on floating-point computations is obtained by formal proofs in Coq. We maintain and develop large Coq libraries (such as Flocq and PFF) for floating-point arithmetic: core definitions, axiomatic and computational rounding operations, high-level properties. They provide a framework for developers to formally certify numerical applications. More details can be found on the verified computer arithmetic page.

Real analysis

We have worked on easing proofs of differentiability and integrability in Coq. The use case was the existence of a solution to the wave equation thanks to D'Alembert's formula; the goal was to automate the process as much as possible. Our approach is based on the pervasive use of total functions. We aim at creating a modern formalization of the real numbers in Coq, with a focus on practicality. This is sorely needed to ease the verification of numerical applications, especially those involving advanced mathematics.

Weak memory model of Multicore OCaml

Multicore OCaml is an experimental variant of the OCaml programming language which features shared-memory multicore parallelism. It is planned to be merged in the main OCaml sources. At Vals, we explore the weak memory model which Multicore OCaml uses by formally verifying concurrent programs in this memory model and by experimenting the design of lock-free concurrent algorithms in Multicore OCaml.

Grants

ANR AJACS (JavaScript applications: certified analyses and security)
ANR VERASCO
ANR FastRelax
ANR Datacert

Software and proofs

Coquelicot is a Coq library dedicated to real analysis: differentiation, integration, and so on. It is a conservative extension of the standard library of Coq, but with a strong focus on usability.
JsCert: a formalization of the semantics of JavaScript, together with an executable evaluator.
Datacert: a Coq library for the relational data model Coq Documentation [CoqDoc file]

See also our gallery of verified programs.

Main publications

A Coq formalisation of SQL's execution engines, Véronique Benzaken, Évelyne Contejean, Chantal Keller, Eunice Martins, ITP 2018 - International Conference on Interactive Theorem Proving, Jul 2018, Oxford, United Kingdom. Springer, ITP 2018: Interactive Theorem Proving, 10895, pp.88-107, 2018, Lecture Notes in Computer Science
Certifying Standard and Stratified Datalog Inference Engines in SSReflect, Véronique Benzaken, Évelyne Contejean, Stefania Dumbrava, International Conference on Interective Theorem Proving, 2017, Brasilia, Brazil. 2017
A Coq formalization of the Relational Data Model V. Benzaken, E. Contejean, S. Dumbrava. 23rd European Symposium on Programming Languages (ESOP). 2014.
Achim D. Brucker, Frédéric Tuong, and Burkhart Wolff. Featherweight OCL: A Proposal for a Machine-Checked Formal Semantics for OCL 2.5. In Archive of Formal Proofs, 2014. HOL/OCL.
A Trusted Mechanised JavaScript Specification. Martin Bodin, Arthur Charguéraud, Daniele Filaretti, Philippa Gardner, Sergio Maffeis, Daiva Naudziuniene, Alan Schmitt and Gareth Smith. 41st Symposium on Principles of Programming Languages (POPL). January 2014.
Coquelicot: A user-friendly library of real analysis for Coq. Sylvie Boldo, Catherine Lelay, and Guillaume Melquiond. Mathematics in Computer Science, pages 1--22, June 2014. [ bib | DOI | full text on HAL ]

VALS

Verified Algorithms, Languages and Systems