Data-centric Languages and Systems
The data-centric languages and systems theme aims at designing and developing programming languages and systems that take complex and massive data seriously into account. The purpose is to build robust and efficient platforms on well-founded theoretical grounds.
Managing, querying and making sense of data have become major aspects
of our society. In the past forty years, the advance of
technology has allowed computer systems to store vast amounts of
data. This has, in turn, spurred novel ways of handling data.
AI and data analytics, for instance, have caused a paradigm shift:
data is nowadays massive, heterogeneous and unstructured, and is manipulated with
languages tied to specific application domains, such as Python (AI, physics, …) and R (bioinformatics, statistics).
These newer approaches typically manipulate vast amounts of data over long periods of time, which in turn makes testing and prototyping difficult. Consequently, there is a high demand for safer and more robust systems, designed from scratch with correctness and efficiency in mind.
This motivates the key research directions we follow:
- The formalization of SQL and data-centric programming languages, a joint effort with the Formalization of Languages and Systems theme
- The design, formalization and implementation of the BOLDR system
- Véronique Benzaken
- Evelyne Contejean
- Mohammed Hachmaoui
- Chantal Keller
- Julien Lopez
- Kim Nguyen
- Rébecca Zucchini
- Stefania Dumbrava (PhD candidate, defended in 2016)
- Hyeonseung Im (Postdoc)
- Eunice Martins (Master Internship)
- Romain Vernoux (Master Internship)
- Rebecca Zucchini (Master Internship)
Data Intensive Systems Formalization (DataCert)
This research direction is shared with the Formalization of Languages and Systems theme; see that theme's page for details.
Breaking Boundaries between Language and Database Runtimes (BOLDR)
The goal of this project is to create a uniform and universal query intermediate representation (QIR) that bridges the gap between programming language constructs and database queries.
The project defines the semantics of the QIR, its properties, and its
translation to database runtimes.
The ongoing implementation generates efficient database queries from SimpleScript (a toy language) or R code, targeting SQL databases (in ANSI syntax) or databases based on the Apache Hadoop framework, such as HBase and Hive. Support for Python (as a frontend) and Spark (as a backend) is being investigated.
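To give a flavour of what an intermediate representation sitting between host-language code and a database backend can look like, here is a minimal sketch in Python. It is not the actual QIR and does not reflect the BOLDR implementation: the node types (`Scan`, `Filter`, `Project`), the `to_sql` function and the `employees` table are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical node types for a toy query representation (not the real QIR).
@dataclass(frozen=True)
class Scan:
    """Read every row of a table."""
    table: str


@dataclass(frozen=True)
class Filter:
    """Keep the rows of `source` for which `column op value` holds."""
    source: "Node"
    column: str
    op: str
    value: Union[int, str]


@dataclass(frozen=True)
class Project:
    """Keep only the listed columns of `source`."""
    source: "Node"
    columns: Tuple[str, ...]


Node = Union[Scan, Filter, Project]

ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def literal(value: Union[int, str]) -> str:
    """Render a Python value as an SQL literal (strings are quoted and escaped)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def from_clause(node: Node, depth: int) -> str:
    """Use the table name directly for a Scan, an aliased subquery otherwise."""
    if isinstance(node, Scan):
        return node.table
    return f"({to_sql(node, depth + 1)}) AS q{depth}"


def to_sql(node: Node, depth: int = 0) -> str:
    """Translate a tree of nodes into one SQL query string.

    Table and column names are inserted as-is; a real system would
    validate or quote identifiers.
    """
    if isinstance(node, Scan):
        return f"SELECT * FROM {node.table}"
    if isinstance(node, Filter):
        if node.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {node.op}")
        source = from_clause(node.source, depth)
        return f"SELECT * FROM {source} WHERE {node.column} {node.op} {literal(node.value)}"
    if isinstance(node, Project):
        source = from_clause(node.source, depth)
        return f"SELECT {', '.join(node.columns)} FROM {source}"
    raise TypeError(f"unknown node: {node!r}")


query = Project(Filter(Scan("employees"), "salary", ">", 50000), ("name", "salary"))
print(to_sql(query))
```

The point of the sketch is the separation of concerns: a frontend only has to build the tree, and each backend only has to provide its own translation function, so adding a language or a database does not require touching the other side.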
Advanced type system for data-oriented and dynamic languages
- Sponsored Research Grant - Oracle (Deep Integration of Programming Languages and Databases with Truffle/Graal)
- ANR Project - Blanc Typex: Typeful certified XML: integrating language, logic, and data-oriented best practices.
- ANR Project - DEFIS Codex
- The implementation of BOLDR (developed by Julien Lopez)
- CDuce, an XML-centric functional programming language that is part of major Linux distributions (Debian/Ubuntu, Fedora, Mandriva), is used as a sandbox to investigate advanced type systems
Books and book chapters
- Chapter 1 : NoSQL Languages and Systems, Kim Nguyễn, NoSQL Data Models: Trends and Challenges, Olivier Pivert Editor, 2018, ISTE
- XML Typechecking, V. Benzaken, G. Castagna, H. Hosoya, B. C. Pierce and S. Vansummeren. Invited chapter in Encyclopedia of Database Systems, Springer Verlag, 2009.
- Fast in-memory XPath search using compressed indexes, Diego Arroyuelo, Francisco Claude, Sebastian Maneth, Veli Mäkinen, Gonzalo Navarro, Kim Nguyen, Jouni Sirén, Niko Välimäki, Softw., Pract. Exper., pp. 399-434, 2015
- Optimizing XML Querying using Type-based Document Projection. V. Benzaken, G. Castagna, D. Colazzo, K. Nguyen. In ACM Transactions on Database Systems (TODS), March 2013.
- Language-integrated queries: a BOLDR approach, Véronique Benzaken, Laurent Daynès, Giuseppe Castagna, Julien Lopez, Kim Nguyễn, Jérôme Siméon, Romain Vernoux, WWW, 2018
- Set-theoretic types for polymorphic variants, Giuseppe Castagna, Tommaso Petrucciani, Kim Nguyễn, 378-391, ICFP, 2016
- A Core Calculus for XQuery 3.0. Giuseppe Castagna, Hyeonseung Im, Kim Nguyễn and Véronique Benzaken. In ESOP'15, European Symposium on Programming Languages, ETAPS 2015: 11-18 April 2015, London, UK.
- Polymorphic Functions with Set-Theoretic Types. Part 2: Local Type Inference and Type Reconstruction. G. Castagna, K. Nguyễn, Z. Xu, and P. Abate. In POPL'15, 42nd ACM Symposium on Principles of Programming Languages, pp. 289-302, January 2015.
- Polymorphic Functions with Set-Theoretic Types. Part 1: Syntax, Semantics, and Evaluation. G. Castagna, K. Nguyễn, Z. Xu, H. Im, S. Lenglet, and L. Padovani. In POPL'14, 41st ACM Symposium on Principles of Programming Languages, January 2014.
- Static and dynamic semantics for NoSQL Languages. V. Benzaken, G. Castagna, K. Nguyen, J. Simeon. In ACM International Conference on Principles of Programming Languages (POPL), Rome, 2013.