Nathann Cohen
CNRS Researcher in Computer Science

6 February 1986 – French – Nice
nathann.cohen@gmail.com
https://www.steinertriples.fr/ncohen/
French, English (fluent)
+33 6 95 78 16 46

Scientific publications
https://goo.gl/M9YXJy
In short

Curious and enthusiastic, I want to get involved in the processing and analysis of public data.

I care about computer science from the implementation detail up to the theoretical questions, about mathematics since forever, and about open data as a citizen and a former student in statistics.

I maintain datasets aimed at all citizens, processed and hosted on servers that I manage.

Interests
  • Theoretical computer science
  • Mathematics
  • Algorithms
  • Source code
  • Optimisation
  • Data
  • The open data movement
  • Dataviz
  • Machine Learning
Skills
  • Bash, C, Cython, HTML/CSS, Python, JavaScript
  • Tools: Everything that lives in a terminal
    (e.g. Git, Emacs, Grep, Jq, ...)
  • Algorithms and data structures
  • Experienced in public speaking
  • Experienced in Open Source development (≈ 7 years)
Work experience
2014 - Visiting Researcher at INRIA COATI Team (Sophia Antipolis)
2012 - Researcher at CNRS in Computer Science / Université Paris Sud
2011 - 2012 Post-doctoral researcher in Discrete Geometry, Université Libre de Bruxelles (Belgium)
2008 - 2011 Teacher at Polytech'Nice (engineering school)
2008 - 2011 PhD Student, MASCOTTE Project, I3S/INRIA Labs, Sophia Antipolis (France)
University degrees
2008 - 2011 PhD in Computer Science, Université de Nice-Sophia Antipolis
Title: Three years of Graphs and music
2006 - 2008 Masters in ‘Statistics, Computer Science and Numerical analysis’
2003 - 2006 Bachelor in Mathematics
Teaching (≈ 250 hours)
  • 2017 Algorithms (2nd year)
  • 2017 HTML/CSS (1st year)
  • 2016 Bash (3rd year)
  • 2011 Discrete Mathematics (3rd year)
  • 2010 Algorithms and introduction to Java (3rd year)
  • 2010 Games and Strategies (1st year)
  • 2009 Introduction to UNIX (3rd year)

Open Data Projects

National Research Agency funding (JavaScript)

The Agence Nationale de la Recherche website lists the whole history of its funding on individual pages. This data deserves to be opened up to the scrutiny of all citizens: it is now extracted through web scraping and available as a JSON file.

https://www.steinertriples.fr/ncohen/data/ANR_archive/
A database for official positions (Bash, Makefile, JavaScript)

Appointments to official positions decided by the French President or by his Ministers are published in the Journal Officiel. This data, unfortunately, is not structured and so cannot be processed automatically.

To this end, I build and maintain a database containing all those decisions (e.g. prefects, ambassadors, etc.) and make it available to anyone in XML/JSON/TSV format:

https://www.steinertriples.fr/ncohen/data/nominations_JORF/

This database can be queried by name, but also by tag or by position type. It allows, for instance, the creation of an automatically updated list of prefects.

https://www.steinertriples.fr/ncohen/data/prefets_departements/
Past projects (Python, Perl, Bash, SPARQL)
  • A database of citations between scientific papers available on arXiv.org. This processing is performed by scripts written in Python, Perl or Bash. It requires the creation of several intermediate databases (names of scientific journals, abbreviations, editors, etc.).
  • Neural Networks experiments with TensorFlow in order to perform speaker recognition in sound samples.

  • PDF_table_scraper -- a Python script to extract tables included in PDF files. This script has been used by the Regards Citoyens association, which promotes Open Data in French institutions.

    https://github.com/regardscitoyens/PDF_table_scraper

    I also helped this association with a simple D3.js page meant to visualise their accounting, and with an attendance chart for the French members of parliament, available at https://www.nosdeputes.fr/

  • semantically_augmented_sage -- the draft of an interface between SageMath and DBpedia, the semantic database. Sage's Python objects are associated with a DBpedia item whose properties are obtained through SPARQL requests.

    https://github.com/nathanncohen/semantically_augmented_sage
  • I also host several other datasets in JSON format, regularly updated through web scraping. Among them are the timetables of several members of the government and an index of all Impact Studies produced by the Parliament.

    They are available at this address:

    https://www.steinertriples.fr/ncohen/data/

Software Development for Research

SageMath

I use Sage (general-purpose mathematics software) for research, and from 2009 to 2016 I contributed to it anything I could possibly need. I submitted 650+ patches and peer-reviewed 600+, and I was de facto in charge of the code related to Graph Theory, Linear Programming and Combinatorial Designs. I also coauthored a book explaining the use of Sage to French undergraduate students.

Sage is written in Python and Cython (=Python+C), so that the most critical parts of the code can be pure C. I especially like to spend time on careful implementations of the most fundamental functions, trying to find the algorithm and data structures that will make them as efficient and simple as possible. Its graph library has a uniquely wide scope.

  • Graph Theory: Sage can build 150+ named graphs or graph families of theoretical interest. It contains hundreds of graph functions representing the main concepts known to researchers, including exact solvers for NP-Hard problems. Several of its features, in particular recognition algorithms, are not available in any other public software.

  • Linear Programming: Sage has its own interface with the most famous MILP solvers (GLPK, CoinOR CBC, Gurobi, and CPLEX). Linear programming is Sage's main tool for solving NP-Hard graph problems, and an extremely powerful way to solve the combinatorial problems that researchers meet daily.
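
As an illustration of this approach (a textbook formulation given here as an example, not code taken from Sage), the NP-Hard maximum independent set problem on a graph G = (V, E) can be written as an integer linear program with one binary variable x_v per vertex:

```latex
\begin{align*}
\text{maximise}   \quad & \sum_{v \in V} x_v \\
\text{subject to} \quad & x_u + x_v \le 1   && \forall\, uv \in E \\
                        & x_v \in \{0, 1\} && \forall\, v \in V
\end{align*}
```

In any optimal solution, the vertices with x_v = 1 form a largest independent set of G.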

https://www.steinertriples.fr/ncohen/contributions.php
Mathematical databases (Python)

Combinatorial Designs and Strongly Regular Graphs are mathematical objects whose constructions are scattered across the scientific literature. In two databases (integrated in Sage) I attempted to gather all known constructions and to ensure their reproducibility.

These databases are actually a combination of pure data and recursive constructions used in the literature.

Combinatorial designs
https://goo.gl/IMRu5q
Strongly regular graphs
https://arxiv.org/abs/1601.00181
Flinders Hamiltonian Cycle Project Challenge (2016) (Python, Cython, C/C++, D3.js)

Together with David Coudert (researcher at INRIA), I was awarded a prize in Algorithm Engineering in 2016 by Flinders University in Adelaide (Australia).

This was an international challenge on the Hamiltonian Cycle problem, a classical problem in theoretical computer science, famous for its difficulty. It is closely related to the "Traveling Salesman Problem", an optimisation problem in which a salesman must visit a given list of places in the most efficient order.
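
To make the problem concrete, here is a minimal backtracking sketch in Python. It is an illustration only, not the method used in the challenge. The function name `hamiltonian_cycle` and the adjacency-dictionary representation of the graph are assumptions made for this example.

```python
from typing import Dict, List, Optional, Set

# Assumed representation: each vertex maps to the set of its neighbours.
Graph = Dict[int, Set[int]]


def hamiltonian_cycle(graph: Graph) -> Optional[List[int]]:
    """Return the vertices of a Hamiltonian cycle in order, or None.

    Plain backtracking search: exponential time in the worst case.
    """
    if len(graph) < 3:
        # A cycle in a simple graph needs at least three vertices.
        return None

    start = next(iter(graph))
    path = [start]
    visited = {start}

    def extend() -> bool:
        if len(path) == len(graph):
            # Every vertex is on the path: it closes into a cycle only
            # if the last vertex is adjacent to the first one.
            return start in graph[path[-1]]
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in visited:
                visited.add(neighbour)
                path.append(neighbour)
                if extend():
                    return True
                # Dead end: undo the choice and try the next neighbour.
                path.pop()
                visited.remove(neighbour)
        return False

    return path if extend() else None


# A cycle on five vertices is Hamiltonian; a star with three leaves is not.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(hamiltonian_cycle(cycle5))  # [0, 1, 2, 3, 4]
print(hamiltonian_cycle(star))    # None
```

This exhaustive search is only practical on small graphs, which is why the problem is a natural subject for an algorithm engineering challenge.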

https://goo.gl/d1udo1