Curious and enthusiastic, I want to get involved in the processing and analysis of public data.
I care about computer science from the implementation detail up to the theoretical questions, about mathematics since forever, and about open data as a citizen and a former student in statistics.
I maintain datasets aimed at all citizens, processed and hosted on servers that I manage.
- Theoretical computer science
- Source code
- The open data movement
- Machine Learning
- Tools: Everything that lives in a terminal (e.g. Git, Emacs, Grep, Jq, ...)
- Algorithms and data structures
- Used to public speaking
- Used to Open Source development (≈ 7 years)
|2014 -||Visiting Researcher at INRIA COATI Team (Sophia Antipolis)|
|2012 -||Researcher at CNRS in Computer Science / Université Paris Sud|
|2011 - 2012||Post-doctoral researcher in Discrete Geometry, Université Libre de Bruxelles (Belgium)|
|2008 - 2011||Teacher at Polytech'Nice (engineering school)|
|2008 - 2011||PhD Student, MASCOTTE Project, I3S/INRIA Labs, Sophia Antipolis (France)|
|2008 - 2011|| PhD in Computer Science, Université de Nice-Sophia Antipolis|
Title : Three years of Graphs and music
|2006 - 2008||Masters in ‘Statistics, Computer Science and Numerical analysis’|
|2003 - 2006|| Bachelor in Mathematics|
- 2017 Algorithms (2nd year)
- 2017 HTML/CSS (1st year)
- 2016 Bash (3rd year)
- 2011 Discrete Mathematics (3rd year)
- 2010 Algorithms and introduction to Java (3rd year)
- 2010 Games and Strategies (1st year)
- 2009 Introduction to UNIX (3rd year)
Open Data Projects
The Agence Nationale de la Recherche website lists the whole history of its funding on individual pages. This data deserves to be opened up to the scrutiny of all citizens: it is now extracted through web scraping and available as a JSON file.
Appointments to official positions decided by the French President or by his Ministers are published through the Journal Officiel. This data, unfortunately, is not structured and so cannot be processed automatically.
To this end, I build and maintain a database containing all those decisions (e.g. prefects, ambassadors, etc.) and make it available to anyone in XML/JSON/TSV format:
This database can be queried by name, but also by tag or by position type. It allows, for instance, the creation of an automatically updated list of prefects.
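As a rough illustration of the kind of query described above, here is a minimal Python sketch that filters appointment records by name, tag, or position type. The records and the field names (`name`, `position`, `tags`) are hypothetical and chosen for the example only; they are not the published schema of the database.

```python
import json

# Hypothetical sample: the people and the field names ("name", "position",
# "tags") are invented for illustration, not taken from the real database.
SAMPLE = """
[
  {"name": "Alice Martin", "position": "prefect", "tags": ["interior"]},
  {"name": "Bob Durand", "position": "ambassador", "tags": ["foreign-affairs"]},
  {"name": "Claire Petit", "position": "prefect", "tags": ["interior", "overseas"]}
]
"""


def query(records, name=None, tag=None, position=None):
    """Return the records matching every criterion that is not None."""
    result = []
    for record in records:
        # Case-insensitive substring match on the name.
        if name is not None and name.lower() not in record["name"].lower():
            continue
        if tag is not None and tag not in record["tags"]:
            continue
        if position is not None and record["position"] != position:
            continue
        result.append(record)
    return result


records = json.loads(SAMPLE)
# An "automatically updated list of prefects" boils down to one filter.
prefects = [r["name"] for r in query(records, position="prefect")]
print(prefects)
```

Re-running the same filter on each refreshed export is enough to keep such a list up to date.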
- A database of citations between scientific papers available on arXiv.org. This processing is performed by scripts written in Python, Perl or Bash. It requires the creation of several intermediate databases (names of scientific journals, abbreviations, editors, etc.).
- Neural Networks experiments with TensorFlow in order to perform speaker recognition in sound samples.
- PDF_table_scraper -- a Python script to extract tables included in PDF files. This script has been used by the Regards Citoyens association, which promotes Open Data in French institutions. https://github.com/regardscitoyens/PDF_table_scraper
I also helped this association with a simple D3.js page meant to visualise their accounting, and an attendance chart for the French members of parliament available at https://www.nosdeputes.fr/
- semantically_augmented_sage -- the draft of an interface between SageMath and DBpedia, the semantic database. Sage's Python objects are associated with a DBpedia item whose properties are obtained through SPARQL requests. https://github.com/nathanncohen/semantically_augmented_sage
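To give an idea of what a SPARQL request for the properties of a DBpedia item can look like, here is a small Python sketch that only builds the query and the request URL, without sending it. It is not taken from semantically_augmented_sage: the helper names `properties_query` and `request_url` are hypothetical, and the use of the public endpoint `https://dbpedia.org/sparql` with a `format` parameter is an assumption about how the service is queried.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed here; the project may use another).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def properties_query(resource, limit=10):
    """Build a SPARQL query listing properties of a DBpedia resource."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?property ?value "
        f"}} LIMIT {limit}"
    )


def request_url(resource, limit=10):
    """Return the GET URL that would submit the query to the endpoint."""
    params = {
        "query": properties_query(resource, limit),
        # Assumption: ask the endpoint for JSON-formatted results.
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


# Example: the DBpedia item one could attach to a Petersen graph object.
url = request_url("Petersen_graph", limit=5)
print(url)
```

Fetching this URL (for instance with `urllib.request.urlopen`) and decoding the JSON response would give the property/value pairs to attach to the corresponding Python object.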
I also host several other datasets in JSON format, regularly updated through web scraping. Among them are the timetables of several members of the government, and an index of all Impact Studies produced by the Parliament.
They are available at this address:
Software Development for Research
I use Sage (a general mathematics software) for research, and contributed to it anything I could possibly need from 2009 to 2016. I submitted 650+ patches and peer-reviewed 600+, and I was de facto in charge of code related to Graph Theory, Linear Programming and Combinatorial Designs. I also coauthored a book explaining the use of Sage to French undergraduate students.
Sage is written in Python and Cython (=Python+C), so that the most critical parts of the code can be pure C. I especially like to spend time on careful implementations of the most fundamental functions, trying to find the algorithm and data structures that will make them as efficient and simple as possible. Its graph library has a uniquely wide scope.
Graph Theory: Sage can build 150+ named graphs or graph families of theoretical interest. It contains hundreds of graph functions representing the main concepts known to researchers, including exact solvers for NP-Hard problems. Several of its features, in particular recognition algorithms, are not available in any other public software.
Linear Programming: Sage has its own interface with the most famous MILP solvers (GLPK, COIN-OR CBC, Gurobi, and CPLEX). Linear programming is Sage's main tool to solve NP-Hard graph problems, and an extremely powerful way to solve the combinatorial problems that researchers meet daily.
Combinatorial Designs and Strongly Regular Graphs are mathematical objects whose constructions are scattered across scientific literature. In two databases (integrated in Sage) I attempted to gather all known constructions, as well as ensure the reproducibility of the constructions.
These databases are actually a combination of both pure data and recursive constructions used in the literature.
Together with David Coudert (researcher at INRIA), I was awarded in 2016 a prize in Algorithms Engineering by Flinders University in Adelaide (Australia).
This was an international challenge on the Hamiltonian Cycle problem, a classical problem in theoretical computer science, famous for its toughness. It is closely related to the "Traveling Salesman Problem", an optimisation problem in which a Salesman must visit a given list of places in the most efficient order.
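For readers unfamiliar with the problem, the following Python sketch shows a plain backtracking search for a Hamiltonian cycle, i.e. a cycle visiting every vertex exactly once. It is only a textbook illustration with exponential worst-case running time, not the method used in the challenge; the function name and the adjacency-dictionary input format are choices made for this example.

```python
def hamiltonian_cycle(adjacency):
    """Return a Hamiltonian cycle as a list of vertices, or None.

    `adjacency` maps each vertex to an iterable of its neighbours
    (simple undirected graph assumed).
    """
    vertices = list(adjacency)
    if len(vertices) < 3:
        return None  # a cycle in a simple graph needs at least 3 vertices
    start = vertices[0]
    path = [start]
    visited = {start}

    def extend():
        # All vertices used: succeed only if the path closes into a cycle.
        if len(path) == len(vertices):
            return start in adjacency[path[-1]]
        for neighbour in adjacency[path[-1]]:
            if neighbour not in visited:
                visited.add(neighbour)
                path.append(neighbour)
                if extend():
                    return True
                # Dead end: undo the choice and try the next neighbour.
                path.pop()
                visited.remove(neighbour)
        return False

    return path if extend() else None


# A cycle on 5 vertices is Hamiltonian; a path on 3 vertices is not.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
path3 = {0: [1], 1: [0, 2], 2: [1]}
print(hamiltonian_cycle(cycle5))
print(hamiltonian_cycle(path3))
```

The search explores partial paths one vertex at a time, which is why instances of even moderate size quickly become hard without smarter pruning or a different formulation.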