
Hello! This is my blog, powered by Known. I post articles and links about coding and FOSS, but not only, in French (mostly) or in English (when I could not find anything related already).


Raspberry Pi install checklist

2 min read

This is a memo to myself, a checklist to run through whenever I set up a new Raspberry Pi that will run continuously (typically as a web server).

First, I start from the lite version of Raspbian.

After install:

  1. sudo apt-get update && sudo apt-get upgrade

  2. Run sudo raspi-config and tweak the settings according to my needs.

  3. Install some useful tools:

sudo apt-get install ack-grep fail2ban git heirloom-mailx htop libxml2-dev libxslt1-dev libyaml-dev moreutils msmtp-mta python-dev python-pip python3 python3-dev python3-pip screen vim zlib1g-dev

  4. Install RPi-Monitor. First, install its dependencies:

sudo apt-get install librrds-perl libhttp-daemon-perl libjson-perl libipc-sharelite-perl libfile-which-perl

  5. Then run cd $HOME; git clone; cd RPi-Monitor; sudo TARGETDIR=/ STARTUPSYS=systemd make install to install it. Be careful about a current bug with the systemd install.

  6. Some useful bash configuration: echo 'export PATH=$HOME/.local/bin:$PATH' >> $HOME/.bashrc; echo 'export EDITOR=vim' >> $HOME/.bashrc.

  7. Use NTP to keep the system clock in sync: sudo timedatectl set-ntp true.

  8. Load the ip_conntrack_ftp module: echo "ip_conntrack_ftp" | sudo tee -a /etc/modules-load.d/modules.conf. (A plain sudo echo "…" >> file would fail, since the redirection is performed by the unprivileged shell, not by sudo.)

  9. Set up an iptables systemd service à la Arch Linux. See this unit. Put the iptables configuration in /etc/iptables/ip{6,}tables.rules.

  10. Remove the file in /etc/sudoers.d which exempts the pi user from typing its password for sudo.

  11. Configure msmtp to be able to send emails through the mailserver on my main server.

  12. Harden the SSH configuration as you would for any server.

  13. Set a MAILTO address in crontab and edit aliases.
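As an illustration of the msmtp step above, here is roughly what an /etc/msmtprc relaying through a remote mailserver can look like. All hostnames, addresses and credentials below are placeholders, not my actual configuration:

```ini
# /etc/msmtprc -- minimal sketch, values are placeholders
defaults
auth           on
tls            on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
logfile        /var/log/msmtp.log

account        main
host           mail.example.org
port           587
from           pi@example.org
user           pi@example.org
password       secret

# default account used when none is specified
account default : main
```

With the msmtp-mta package installed, this also provides a sendmail-compatible interface, so cron and other tools can send mail transparently.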


Donation of the month, January: Framasoft

1 min read

I am continuing the monthly donations by giving €15 to Framasoft this month.

Framasoft is a network dedicated to promoting "libre" culture in general and free software in particular. It offers many innovative services and projects, made freely available to the general public, notably as part of their "de-googlization" campaign (free services, hosted by Framasoft, offering alternatives to the services provided by Google, Doodle, Facebook, Github, etc., and the list keeps growing!). Of course, these services can very easily be self-hosted, which they encourage through their CHATONS campaign.

In particular, their list of alternatives is very well made and very relevant.


Donation of the month, December: EFF

2 min read

I stumbled upon Sam & Max's donations of the month: each month, they donated to an organization providing products and services they used and which had been important to them over the past month, while writing a post on their blog to spread the word about the organization.

I recently migrated all the SSL certificates used on this domain and its subdomains from StartSSL to Let's Encrypt, mainly following this announcement from Mozilla. I have never paid for an SSL certificate since I got this domain name (StartSSL, like Let's Encrypt, provides them for free), whereas some certificate authorities charge up to $100 per certificate.

This month, $25 therefore go to the EFF, mainly for their support of Let's Encrypt and for their certbot, which makes managing one's certificates much easier. The EFF also fights for freedom of speech on the net, against software patents and DRM, and for privacy. They are also behind a number of software projects and extensions such as HTTPS Everywhere.


Moving from URxvt to st

2 min read

I had been using the URxvt terminal for a while, but recently ran into many issues with it. In particular, I had a weird locale issue leading to Unicode encoding errors whenever I copied accented characters using the primary selection, some odd problems caused by urxvt-tabbed, and it simply blew up when I tried to get newer Unicode characters (such as smileys) rendered correctly.

A friend told me about st, which may be quite daunting at first, especially since all the configuration is done statically in a C header file, but it works incredibly well and just does the job.

I have a mirror repo with my own configuration in case you want to have a look at it. This reproduces most of my URxvt user experience, except for two things:

  1. I don't have any tabs in st. But this is not a real issue, and I'd rather depend on another program to handle tabs, such as tmux or even i3.
  2. I don't have clickable URLs as I used to in URxvt. But once again, after a few weeks without this feature, I prefer selecting and copy-pasting URLs over clicking on them. This way, I don't open links unintentionally.

I was relying on a hack to get local notifications from my Weechat running over SSH + screen, using an extended escape sequence; if you are also using it, this commit implements the same behavior in st.



Improved back and forth between workspaces

2 min read

i3 has a feature enabling you to go back and forth between workspaces. Once enabled, if you are on workspace 1, switch to workspace 2, and then press mod+2 again, you will go back to workspace 1.

However, this feature is quite limited, as it does not remember more than one previous workspace. For example, say you are on workspace 1, switch to workspace 2 and then to workspace 3. Typing mod+3 will send you back to workspace 2, as expected. But then, typing mod+2 will send you back to workspace 3, whereas one may have expected it to switch to workspace 1 (as Weechat does with buffer switching, for instance).

This can be solved by wrapping the workspace-switching commands in the i3 config. I wrote this small script to handle it.

Basically, you have to start the script when you start i3 by putting

exec_always --no-startup-id "python PATH_TO_/"

in your .i3/config file.

Then, you can replace your bindsym commands to switch workspaces, calling the same script:

bindsym $mod+agrave exec "echo 10 | socat - UNIX-CONNECT:$XDG_RUNTIME_DIR/i3/i3-back-and-forth-enhanced.sock" (Replace $XDG_RUNTIME_DIR with /tmp if this environment variable is not defined on your system.)

This script maintains a queue of the 20 previously seen workspaces (so you can go back up to 20 workspaces in your history). This can be increased by editing the WORKSPACES_STACK = deque(maxlen=20) line according to your needs.
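The core bookkeeping of such a script can be sketched in a few lines of Python. This is a simplified model for illustration only (the names are mine, not the actual implementation): the real script additionally listens on the Unix socket and talks to i3 over its IPC interface.

```python
from collections import deque

# Stack of previously focused workspaces; the oldest entries fall off
# automatically once more than 20 workspaces have been visited.
WORKSPACES_STACK = deque(maxlen=20)
current = None


def switch_workspace(target):
    """Record the workspace we are leaving, then focus `target`."""
    global current
    if current is not None and current != target:
        WORKSPACES_STACK.append(current)
    current = target
    # The real script would send "workspace <target>" to i3 here.


def back():
    """Go back to the most recently seen workspace, popping the history."""
    global current
    if WORKSPACES_STACK:
        current = WORKSPACES_STACK.pop()
    return current
```

Switching 1 → 2 → 3 and then calling back() twice walks the history back to 2 and then to 1, which is exactly the behavior plain workspace_auto_back_and_forth cannot give you.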

Hope this helps! :)


Comparison of tools to fetch references for scientific papers

3 min read

EDIT: The impossible build of CERMINE turned out to be just a temporary issue, and they are distributing standalone JAR files, which is very interesting to easily ship it with another program. See this Github issue for more info. You might also be interested in the CERMINE paper, which presents some comparisons similar to the one below.


Recently, I tried to aggregate in a single place various pieces of code I had written to handle scientific papers. One feature I was missing and wanted to add was the ability to automatically fetch the references of a given paper. For arXiv papers, I had a simple solution using the LaTeX sources, but I wanted something more universal, taking a simple PDF file as input (thanks John for the suggestion, and Al for the tips on existing software solutions).

I compared three existing programs that extract references from a PDF file:

  • pdfextract from Crossref, very easy to use, written in Ruby.
  • Grobid, more advanced (using machine learning models), written in Java, but quite easy to use too.
  • Cermine, using the same approach as Grobid, but I could not get it to build on my computer. I used their REST service instead.

To compare them, I asked Antonin to build a list of the most important journals and take five papers from each such journal, from Dissemin. This gives us a JSON file containing around 500 papers.

I downloaded some articles to get a (hopefully) representative set of 147 different papers from various journals (I did not have access to some of them, so I could not fetch the full dataset). I ran pdfextract, Grobid and Cermine on each of them and compared the results.

The raw results are available here for each paper, and I generated a single-page comparison to ease the visual diff between the three results, available here (note that this webpage is very heavy, around 16 MB).

Briefly comparing the results, the machine-learning-based tools (Cermine and Grobid) seem to give far better results than the simple approach taken by pdfextract, at the expense of being more difficult to build and run. Cermine returns a lot of information, too much in my opinion, and I think Grobid gives the most reusable and complete results. Feel free to compare them yourself.


  • I also found ParsCit, which may be of interest. However, you first need to extract the text from your PDF file yourself. I have not tested it in depth yet.

  • This tweet tends to confirm the results I had: Grobid is the best one.

  • If it can be useful, here is a small web service written in Python that lets a user upload a paper, parses its citations and tries to assess the open-access availability of the cited papers. It uses CERMINE as it was the easiest way to go, especially since it offers a web API, which lets me distribute a simple working script without any additional requirements.



Localizing a webapp with webL10n.js

3 min read

I am currently working on a Velib webapp. With Élie, we modularized everything so that the backend API can be edited easily and adapted to any other bike-sharing system; hence we wanted it to be easily localizable, and we looked for solutions compatible with as many browsers as possible. We finally chose webL10n.js. Here are some explanations about it and how to use it.

Why webL10n.js?

First things first: why choose webL10n.js over anything else? We found basically four solutions: webL10n.js, L20n.js, Jed and a modified version of webL10n.js used in Gaia.

Jed takes a really different approach and, especially as we are not really familiar with localizing content, we found it more difficult to use and integrate.

The three others take a really simple approach. They use extra data-* attributes on any tag to replace, on the fly, the textContent of the node with a translation found in a formatted file. This is really easy to integrate, use and tweak. They support advanced features such as pluralization, HTML modifications, responsive localization (using a different localization file on small screens), etc.

WebL10n.js and the modified version in Gaia are basically the same, except that the one in Gaia dropped the hacks adding support for some old browsers such as IE6. Plus, webL10n.js lives in a separate git repo which is easy to track, so I'd rather go with this one. But the documentation is not really complete and the associated MDN wiki page is outdated, hence this blog post :) Don't worry about the lack of recent commits on webL10n.js: it is stable and usable (and still maintained and supported by kaze).

L20n.js is the latest Mozilla project, aiming at replacing webL10n.js. I had many problems with it, because the API keeps moving and no documentation is in sync with the code. The downloadable version exposes a totally different API than the one in the git repo, and the doc is not really helpful about which version should be considered stable. Plus, the l20n file format is really weird, and I'd rather not reinvent the wheel and stick with a standard properties file to ease translation for contributors.

Demo and extra infos

For more information, you can refer to the README.

For an app using it, you can have a look at our VelibFxos webapp, especially this folder. You can also see it in your browser at (under heavy work in progress, so it might break from time to time).

Note: there is a bug when mixing pluralization and innerHTML, which can be worked around. See this Github issue.

EDIT: Since I initially wrote this article, I came across Polyglot.js, a JS library by Airbnb which does exactly this. It looks really nice and trustworthy, especially since it is backed by Airbnb, which uses it heavily. One major drawback is that it uses a non-standard format for translations, instead of a po file or a properties file, but it should be easy to plug such a parser into it. It supports basically the same features as webL10n.js, except that it only has a JS API, without support for extra data-* attributes. This is OK if you render your templates in JS (using React or Angular for instance), but I find it more difficult than webL10n.js to use in other cases.



Let's add some metadata on arXiv!

7 min read

This article contains ideas and explanations around this code. Many references to it will be made throughout this article.

Disclaimer: The above code is here as a proof of concept and to back this article with some code. It is clearly not designed (nor scalable enough) to run in production. However, the reference_fetcher part was giving good results on the arXiv papers I tested it on.

Nowadays, most published scientific papers are available online, either directly on the publisher's website, or as preprints on open-access repositories. For physics and computer science, depending on the research topic, a large part of them is available on arXiv (a major, worldwide, open-access repository managed by Cornell). All published papers get a unique (global) identifier, called a DOI, which can be used to identify them and link to them. For instance, if you go to the doi.org URL for the DOI 10.1103/PhysRevB.47.7312, you are automatically redirected to the page of that paper on the Physical Review B website. This is really useful to target a paper and identify it uniquely, in a machine-readable way and in a way that will last. However, very little use seems to be made of this system. This is why I had the idea of putting some extra metadata on published papers, using such systems.

From now on, I will mainly focus on arXiv, for two main reasons. First, it is open access, so it is accessible everywhere (not depending on the subscriptions of a particular institution) and reusable; second, arXiv provides sources for most of the papers, which is of great interest as we will see below. arXiv gives a unique identifier to each preprint. The correspondence between DOIs and arXiv identifiers can be established quite easily, as some publishers push DOIs back to arXiv upon publication, and authors manually update the field on arXiv for the remaining publishers.
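As an illustration of this correspondence, the arXiv API exposes the DOI field (when it has been filled) in the Atom feed it returns for a given identifier. A minimal sketch, with function names of my own:

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces used by the arXiv API Atom feeds.
ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV = "{http://arxiv.org/schemas/atom}"


def extract_doi(atom_xml):
    """Return the DOI of the first entry of an arXiv Atom feed, if any."""
    entry = ET.fromstring(atom_xml).find(ATOM + "entry")
    if entry is None:
        return None
    doi = entry.find(ARXIV + "doi")
    return doi.text if doi is not None else None


def doi_from_arxiv(arxiv_id):
    """Query the arXiv API for a given identifier (network access needed)."""
    url = "http://export.arxiv.org/api/query?id_list=" + arxiv_id
    with urllib.request.urlopen(url) as resp:
        return extract_doi(resp.read())
```

When the publisher never pushed the DOI back and the author did not fill the field, extract_doi simply returns None, which is why the reverse direction (plain-text references resolved through Crossref) is needed too.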

Using services such as Crossref or the publisher's website, it is really easy to get a formatted bibliography (plaintext, BibTeX, …) from a given identifier (e.g. see some code producing BibTeX output from a DOI or an arXiv id). Then, writing a bibliography should be as easy as keeping track of a list of identifiers!
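For the DOI case, doi.org supports content negotiation, so asking for application/x-bibtex is enough to get a BibTeX entry back. A small sketch (the helper name is mine; the DOI is the one mentioned above):

```python
import urllib.request


def bibtex_request(doi):
    """Build a doi.org content-negotiation request asking for BibTeX."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/x-bibtex"},
    )


if __name__ == "__main__":
    # Network access required: doi.org redirects to the registration
    # agency, which renders the citation as BibTeX.
    req = bibtex_request("10.1103/PhysRevB.47.7312")
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))
```

The same mechanism serves other citation formats (e.g. text/x-bibliography), which is what makes "a list of identifiers" a sufficient bibliography.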

Let's make a graph of citations!

In scientific papers, references are usually a plaintext list of the papers used as references, at the end of the article. This list follows some rules and formats, but there exists a wide variety of formats, and it is often really difficult to parse them automatically (see this page for an example of reference formats).

If you want to automatically fetch the references of a given paper (to download them in batch, for instance), you basically have to parse a PDF file, find the references section, and parse each textual item, which is really difficult and error-prone. Some repositories, such as arXiv, offer the sources of the published preprints. In this case, one can deal with a LaTeX-formatted bibliography (a thebibliography environment, not full BibTeX though), which is a bit better, but still a pain to deal with. When referencing an article, nobody uses DOIs!

The first idea is then to try to automatically fetch references for arXiv preprints and mark them as relationships between articles.

Fortunately, arXiv provides bbl source files (LaTeX-formatted bibliographies) for most of the articles. We can then avoid having to parse a PDF file and directly get some structured text, but the bibliography is still plaintext, without any machine-readable identifier. Here comes Crossref, which offers a wonderful API to try to fetch a DOI from a plain-text reference. And it gives surprisingly good results!
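Such a lookup can be sketched with the Crossref REST API (api.crossref.org), whose query.bibliographic parameter matches free-form citations. The function names here are mine, not the article's code:

```python
import json
import urllib.parse
import urllib.request


def crossref_query_url(reference, rows=1):
    """Build the api.crossref.org query URL for a free-form reference."""
    params = urllib.parse.urlencode({
        "query.bibliographic": reference,
        "rows": rows,
    })
    return "https://api.crossref.org/works?" + params


def best_doi(response_dict):
    """Extract the top-ranked DOI from a decoded Crossref response."""
    items = response_dict.get("message", {}).get("items", [])
    return items[0]["DOI"] if items else None


if __name__ == "__main__":
    # Network access required.
    url = crossref_query_url("F. Bloch, Z. Phys. 52, 555 (1929)")
    with urllib.request.urlopen(url) as resp:
        print(best_doi(json.load(resp)))
```

Crossref ranks candidates by relevance score, so taking the first item is a reasonable heuristic, though a score threshold helps to reject spurious matches on garbled references.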

This automatic fetching of DOIs for the references of a given arXiv paper is available in this code.

Then, one can simply write a simple API accepting POST requests to add papers to a database, fetch the referenced papers, and mark relationships between them. This is how this project began.

If you post a paper to it, identified either by its DOI (if a valid associated arXiv id is found) or directly by its arXiv id, it will add it to the database, resolve its references and store the relationships between this paper and the referenced papers. One can then simply query the graph of "citations", in direct or reverse order, to get the papers cited by a given one, or citing a given one.

The only similar service I know of on the web is the one provided by SAO/NASA ADS. See, for instance, how it deals with the introductory paper. It is quite fantastic, giving both the papers citing this one and the papers cited by it, in a browsable form, but its core is not open source (or I did not find it), and I have no idea how it works in the background. There is no easily accessible API, and it only works in some very specific fields (typically physics).

Let's add even more relations!

Now that we have a base API to add papers and relationships between them to a database, we can imagine going one step further and marking any kind of relation between papers.

For instance, one may find that a given paper could be a relevant reference for another paper which does not cite it. We could then work collaboratively to put extra metadata on scientific papers, such as extra references, which would be useful to everyone.

Such relationships could also be similar_to, introductory_course, etc. This is quite limitless, and the above code can already handle it. :)

Let's go one step further and add tags!

So, by now, we can have uniquely identified papers, with any kind of relationships between them, which we can crowdsource. Let's take some time to look at how arXiv stores papers.

They classify them by "general categories" (e.g. cond-mat, a (very) large category called "Condensed Matter") and subcategories (e.g. cond-mat.quant-gas for "Quantum gases" under "Condensed Matter"). An RSS feed is offered for all these categories, and researchers usually follow the subcategories of their research area to keep up to date with published articles.

Although some articles are released under multiple categories, most of them have only one, very often because they do not fit anywhere else, but sometimes because the author did not think the paper could be relevant to another field. Moreover, some researchers work at the edge of two fields, and following everything published in both fields is a very time-consuming task.

The next step is then to collaboratively tag articles. We could make tags as targeted or as general as we want, and everyone could follow the tags they like. Plus, doing it collaboratively allows someone who finds an article interesting for their field, even though it was not the author's field, to make it appear in the feeds of their colleagues.


We finally have the tools to mark relations between papers, to annotate them, complete them, and tag them. And all of this collaboratively. With DOIs and similar unique identifiers, we have the ability to get rid of the painful plaintext citations and references and use easily machine-manageable identifiers, while still getting some nicely rendered BibTeX citations automagically.

People are already doing this kind of things for webpages (identified by their URL) with Reddit or HackerNews and so on, let's do the same for scientific papers! :)

A demo instance should be available at (it may not be very stable or highly available, though). Note that the Content-Type is that of a JSON API, and your browser may force you to download the response rather than displaying it. The easiest way to browse it is to use cURL, as described in the README.



Velib dataset

1 min read

Just a quick note to say that I am running a script that periodically dumps the data available from the Velib API (every 2 minutes).

The dump can be found here (sqlite3 database). It is generated by this script.

Please host your own copy if you plan on making many queries against the previous URL.
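A minimal sketch of such a periodic dump (not the actual script linked above): fetch the station list and append one row per station to an sqlite3 database. The URL and field names follow the public JCDecaux Velib API, but treat them as assumptions; you need your own API key.

```python
import json
import sqlite3
import urllib.request

# Placeholder: the JCDecaux open-data endpoint, with your own key.
API_URL = ("https://api.jcdecaux.com/vls/v1/stations"
           "?contract=Paris&apiKey=YOUR_API_KEY")


def init_db(path):
    """Open (or create) the dump database."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS stations (
                      number INTEGER,
                      available_bikes INTEGER,
                      available_stands INTEGER,
                      updated INTEGER)""")
    return db


def store(db, stations):
    """Append one row per station snapshot."""
    db.executemany(
        "INSERT INTO stations VALUES (?, ?, ?, ?)",
        [(s["number"], s["available_bikes"],
          s["available_bike_stands"], s["last_update"])
         for s in stations])
    db.commit()


if __name__ == "__main__":
    # Network access and a valid API key required; run this from cron
    # every 2 minutes to build the time series.
    db = init_db("velib.db")
    with urllib.request.urlopen(API_URL) as resp:
        store(db, json.load(resp))
```

Appending rows rather than updating them is what turns the API's instantaneous snapshot into a usable historical dataset.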


Doing low cost telepresence (for under $200)

8 min read

With a friend, we recently started building a low-cost telepresence robot (sorry, link in French only) at our local hackerspace.

The goal is to build a robot that can be used to move around a room remotely and stream audio and video in both directions. Our target budget is $200. We have a first working version (although it does not stream audio yet), so it is time for some explanations on the setup and on how to build your own =) All the instructions, code and necessary files can be found in our git repo.

Screen capture

3D model

Basic idea

When taking part in a group meeting remotely, using some videoconferencing solution, it is often frustrating not to be able to move around the room on the other side. This prevents us from having parallel discussions, and if the remote microphone is of poor quality, we often do not hear everybody clearly. Plus, someone speaking may be hidden by another speaker, and many other such problems happen.

The goal was then to find a solution for videoconferencing (streaming both audio and video in both directions) while being able to move on the other side, so as to see everyone and come closer to the current speaker. Commercial solutions exist, but they are really expensive (a few thousand dollars). We wanted the same basic features for $200, and it seems we almost achieved it!

Bill of Materials

The whole system is built around a Raspberry Pi and a PiCamera, which offer decent performance at a very fair price. The rest is really basic DIY stuff.

Here is the complete bill of materials:

Total: $140


  • We had to use a Raspberry Pi 2 for the nice performance boost on this model. Even more important is the increased number of GPIOs, with two usable hardware PWMs (provided you don't use the integrated sound card output). This is useful to control the two wheels with hardware PWM and get precise control of the movement. The camera holder can safely be controlled with a software PWM, and we did not experience any trouble doing so.
  • You can easily replace these parts with equivalent ones, as long as you keep in mind that the battery pack should be able to provide enough current for the Raspberry Pi and the servos. We used standard USB battery packs for simplicity and user-friendliness. However, they are more expensive than standard modelling lithium batteries and generally provide less current.
  • We had to use two battery packs. Indeed, the peak current drawn by the servos when starting was too much for the battery pack and was crashing the Raspberry Pi. With two separate power lines for the Raspberry Pi and the servos, we no longer have this problem, and this solution is easier than tweaking the power line until the Raspberry Pi stops freezing (which it may never do).

For the next version, we plan to add:

Total with these parts: $228


  • We used an HDMI screen because the official Raspberry Pi screen uses most of the GPIO pins, which we need. We decided to use a Bluetooth speaker because the integrated sound card was not usable, as we were using the two hardware PWM lines for motion. This way, we have a speaker with a built-in microphone, which is smaller than having the two of them separately.
  • The USB Bluetooth adapter is impressively expensive, but it is the only one we found at the moment that we were sure would be compatible with Linux without any problems. Plus, the other adapters we found were not much cheaper.
  • The total budget is $223 without shipping. It is a bit over the initial budget goal, but we can easily lower it to $200: we did not especially look for the cheapest parts. In particular, we bought the servos from Adafruit, and I think we can find servos for less (especially the camera-holder servo, for which a $5 micro servo should be enough). The Bluetooth adapter is quite expensive as well, and I think we could find a cheaper one. Shrinking the budget will be our next goal, once we have everything working.

Building the robot

All the necessary files are in our git repo (or its github mirror; both should be kept in sync). The repo contains three main directories:

  • blueprints, the models of the robot.
  • disty, the main server code running on the Raspberry Pi.
  • webview, the web controller served by the Raspberry Pi.

First of all, you should laser-cut the flat parts and 3D-print the other parts in the blueprints dir. The eps files in this directory are ready-to-cut files, whereas the svg files are the same in an easily editable format. You should laser-cut the top and bottom files.

You should 3D print:

  • the picam_case_* files for the camera case (licensed under CC BY-SA).
  • camera_servo_holder.stl, the plastic part holding the camera servo; you need to print it once.
  • wheel_servo_holder.stl, the plastic part holding the servos for the wheels; you need four of them.

teleprez.blend is the complete CAD model of the robot in Blender.

Assembling your Disty robot should be straightforward and easy to do if you look at the following pictures :) Use two ball transfer units to stabilize the robot, and lock them with a rubber band (or anything better than that). Adjust the height of the wheels tightly, so that the two wheels and the ball transfer units all touch the ground.




The GPIO pinout for the connections can be found below.

GPIO pinout

For the electrical wiring, we used a standard USB to Micro-USB cable to power the Raspberry Pi from one battery (located underneath the robot, to add weight on the ball transfer units and ensure contact with the surface). For the other battery, we just cut a USB to Micro-USB cable to plug into it and connected the servos directly to the battery through a piece of breadboard. We had to use two batteries to prevent the current drawn by the servos from rebooting the Raspberry Pi.

Here you are, you have a working Disty!

Running it

This may not be super user-friendly at the moment, we hope to improve this in the future.

Download any Linux image you want for your Raspberry Pi. Install uv4l and the uv4l-webrtc component. Enable the camera and ensure you can take pictures from the command line (there is a lot of documentation about this on the web).

Then, clone the git repo somewhere on your Raspberry Pi. You should build the main disty code, which is the server-side code. It handles the control of the servos (emitting PWMs, etc.) and listens on UDP port 4242 for instructions sent from the webview. Instructions to build it are located in the associated README. You will need cmake and a system-wide install of wiringPi to build the code.
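To give an idea of the webview-to-robot control channel, here is a sketch of a UDP listener on port 4242. The real disty code is written in C, and the plain-text "command argument" instruction format below is an assumption for illustration; see the repo for the actual protocol.

```python
import socket

PORT = 4242  # port the disty server listens on


def handle_instruction(data):
    """Decode a datagram into a (command, arguments) pair.

    The format here ("LEFT 10", "stop", ...) is hypothetical; the
    real server defines its own instruction set.
    """
    parts = data.decode("utf-8").strip().split()
    command = parts[0].lower()
    args = parts[1:]
    return command, args


if __name__ == "__main__":
    # Blocking loop: receive instructions from the webview and act on
    # them (the real code would translate them into PWM commands here).
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PORT))
    while True:
        data, addr = sock.recvfrom(1024)
        print(addr, handle_instruction(data))
```

UDP fits this use case well: a lost or late movement command is better dropped than replayed, which is exactly the trade-off UDP makes against TCP.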

You can then start the robot. Start by launching the disty program (as root, as you need access to the GPIOs), ./disty, and then start the webview, ./ (also as root, as it serves the webview on port 80, which is below 1024 and reserved for root). If you have ZeroConf on your Raspberry Pi (or a decent router), you can go to http://disty (or whatever hostname is set on your Raspberry Pi) to get the webview. Otherwise, use the IP address instead. Webview usage should be almost straightforward.

It should work out of the box on your local LAN. If you are behind a NAT, some black magic is needed (it is implemented, but may not be sufficient) to connect the remote user and the Disty camera. In any case, you need to be able to access the webview (Disty's port 80) from the remote side.


All contributions and feedback are more than welcome!

All the source code we wrote is under a beer-ware license, unless otherwise specified.

* --------------------------------------------------------------------------------
* "THE BEER-WARE LICENSE" (Revision 42):
* Phyks and Élie wrote this file. As long as you retain this notice you
* can do whatever you want with this stuff (and you can also do whatever you want
* with this stuff without retaining it, but that's not cool...). If we meet some
* day, and you think this stuff is worth it, you can buy us a beer
* in return.
*                                                                       hackEns
* ---------------------------------------------------------------------------------

If you need a more legally valid license, you can consider Disty to be under an MIT license.

Some sources of inspiration and documentation