
Hello! This is my blog, powered by Known. I post articles and links about coding, FOSS, and more, in French (mostly) or in English (when I could not find anything related already written).


Filtering ads with your Raspberry Pi

13 min read

TL;DR: Please have a look at the benchmark section below, to be aware of the limitations of this particular setup and decide whether to spend some time putting it in place or not.

I recently came across the Pi-Hole project, which claims to be "a black hole for Internet advertisements" (thanks nicofrand for making me discover this!). The idea was really attractive: have a simple Raspberry Pi on the network do all the ad filtering for the whole network, rather than maintaining a separate uBlock Origin install on each and every computer. It was also particularly attractive because having such an ad blocker on a smartphone requires a rooted device. Plus, there is a really nice web interface to control the whole ad blocking device.

While looking at it more in depth, I realized it was actually very limited:

  1. First, it is built around specific pieces of software and does some magical stuff with them, which makes it really painful to deviate from them. Basically, it uses dnsmasq to expose a DNS service, a standard hosts file to block the hosts serving ads, and a lighttpd webserver. The problem is that I already have a DNS resolver (Unbound) on this Raspberry Pi, and a web server (nginx). I did not want to spend a lot of time integrating it into my existing setup if it turned out not to be that powerful, so I decided to look at it in detail before installing.
  2. The second issue is that it relies on dnsmasq, a simple program that answers DNS queries using the hosts defined in /etc/hosts and forwards every other request to another DNS server (typically your ISP's DNS server). Pi-Hole lets you configure two DNS servers to forward queries to, the default being Google's :/. I already have a resolver on this Raspberry Pi and I do want to do the resolution myself, especially since my ISP's DNS servers lie, and I do not want to use a public DNS server on another network. So I would have had to hack on Pi-Hole to do the DNS resolution itself. About these issues, I'd like to point to two very interesting articles from Bortzmeyer: this one about Google DNS (in French) and this one about running your own DNS resolver (same). Also, since it acts as your DNS resolver, it may be cumbersome to disable it temporarily to load some website that absolutely requires its ads to be loaded.
  3. The last issue is that, contrary to uBlock which filters at the request level (and even sometimes at the HTML level), being basically an alternative DNS resolver means Pi-Hole can only filter at the domain level. That is, you either whitelist (the default) or blacklist a whole domain serving ads or malware, but you cannot differentiate between paths on a given domain. While browsing Libération's website, I can see uBlock blocking queries to specific paths on Amazon S3 hosts. Such queries cannot be blocked by Pi-Hole without blacklisting the whole Amazon S3 network.

Given these facts, I remembered Privoxy, which can be used as a filtering proxy, in a way similar to uBlock. Being a proxy, it can filter in detail, just as uBlock does, and you can very easily disable it (simply disable the proxy). Plus, almost any device offers a proxy setting, so it should work both on my Android phones and my computers. In this article I describe how I set up a Pi-Hole alternative based on Unbound (to have my own DNS resolver and block some things at the domain level) coupled with a Privoxy proxy to filter out ads.

Limitations: Contrary to Pi-Hole, the setup described here will be able to remove ads in a way similar to uBlock/AdBlock. If you follow the whole article to the end, it will also have the element hiding features. Being a proxy setting, it is easy to toggle on and off (either through Privoxy's toggle feature or by manually turning off the proxy on your device) if required. However, be aware of the remaining limitations with regard to HTTPS streams (section 4.15). As AdBlock/uBlock run in the browser, they can filter ads in HTTPS streams as well; Privoxy will not be as efficient without HTTPS interception (which is generally not a good idea). However, it should perform rather well in the vast majority of situations (also note that AdBlock for rooted Android devices is also a proxy, so for those, nothing changes).

I assume you already have a running Raspberry Pi with some basic install. See this previous article if this is not the case.

Set up a DNS resolver

Let's install a DNS resolver on the Raspberry Pi to answer DNS queries on the network. I am installing and configuring Unbound here.

$ sudo apt-get install unbound
$ curl -o /etc/unbound/root.hints
$ # (Optional) Set crontask to download root.hints file every six months

The last curl command fetches the root hints, used to resolve hosts that are not cached. See this section of the ArchWiki page for more info.
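As a sketch, such a crontab entry could look like the following (ROOT_HINTS_URL is a placeholder for whichever hints source you used above):

```
# Refresh the root hints at 04:00 on the first day of every sixth month
0 4 1 */6 *  curl -sf -o /etc/unbound/root.hints ROOT_HINTS_URL && systemctl reload unbound
```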

Then, you can create a basic configuration file for unbound.

$ cat /etc/unbound/unbound.conf.d/local.conf
    username: "unbound"
    interface:  # Listen on all interfaces
    root-hints: "/etc/unbound/root.hints"
    access-control: allow  # Access control, see the "Example" section of unbound.conf(5)
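For reference, a complete minimal version of that file could look like the following sketch; the LAN subnet is an assumption here, so adjust it to your own network:

```
server:
    username: "unbound"
    interface:               # listen on all interfaces
    root-hints: "/etc/unbound/root.hints"
    access-control: allow    # accept queries from the LAN...
    access-control: refuse        # ...and refuse everyone else
```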

Then, enable and start Unbound service at startup:

$ sudo systemctl start unbound && sudo systemctl enable unbound

You can now change the resolver to be used on your Raspberry Pi and on your whole network. To change it on your Raspberry Pi, have a look at this wiki page (for Raspbian). To set it as the default DNS resolver on your network, have a look at your router configuration, and set the DNS resolvers address to the address of your Raspberry Pi. Don't forget to open the ports in your firewall (53 tcp and udp).

To check everything is working fine, you can use dig (from dnsutils package on Debian-based distributions). Typically, dig should give you some results (in the ANSWER SECTION) and the IP address in the SERVER line should be the one of your Raspberry Pi.

Note: At this point, we should emphasize that having an open DNS resolver (that is, a DNS resolver that answers anyone) can be a security risk, especially since some DDoS attacks make use of them. Therefore, you should make sure that your Raspberry Pi DNS server is only accessible from your local network, and that no third party has access to it. This should be done through the access-control line in the above configuration, but it can also be enforced by the firewall running on your Raspberry Pi and the firewall on your router (typically, most routers provided by your ISP block all incoming connections by default, but check this).
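On the Raspberry Pi side, the firewall part could be sketched as follows (again assuming a LAN, which you should replace with your own network):

```
# Accept DNS queries from the local network only, drop everything else
iptables -A INPUT -p udp --dport 53 -s -j ACCEPT
iptables -A INPUT -p tcp --dport 53 -s -j ACCEPT
iptables -A INPUT -p udp --dport 53 -j DROP
iptables -A INPUT -p tcp --dport 53 -j DROP
```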

Block some domains based on hosts

Now, we would like Unbound to block some domains that are known to serve ads and malware, in a similar way to what Pi-Hole does. For this purpose, we will use the unbound-block-hosts script to import hosts files into the Unbound configuration. Basically, for every such domain, Unbound will return a predefined local address instead of the real one.

unbound-block-hosts is designed with Dan Pollock's hosts file in mind, whereas I wanted to be able to import any hosts file into Unbound. Here is a forked and patched version for this purpose (a very ugly patch, as I am not fluent in Perl :/).

We will create an includes dir in the Unbound configuration directory (mkdir /etc/unbound/includes/), and include the rules in the main configuration by appending include: "/etc/unbound/includes/*.conf" to the /etc/unbound/unbound.conf.d/local.conf previously created.

Now, you can run ./unbound-block-hosts --url="SOME_URL" --file=/etc/unbound/includes/FOOBAR-blocking.conf to generate a matching configuration for a given hosts list. Typically, I have a script doing:

#!/bin/sh
set -e

cd "$(dirname "$0")"

echo "Fetch Malware domains list and append to unbound"
./unbound-block-hosts --url="" --file=/etc/unbound/includes/malwaredomainlist-blocking.conf --address="YOUR_RASPBERRY_PI_IP"
echo "Fetch Yoyo ad servers list and append to unbound"
curl ";showintro=0&mimetype=plaintext" > /etc/unbound/includes/yoyoadservers-blocking.conf

systemctl reload unbound

which is crontask-ed to run every day. The default address is, which means the local host of the client machine. I do not want to get too many 404s on my local webservers, so I'd rather put the IP address of the Raspberry Pi and have a webserver answering the 404s on it.

Install a webserver

As simple as

sudo apt-get install nginx

Install and configure Privoxy

Now, you can install Privoxy:

sudo apt-get install privoxy

The default configuration should be mostly OK. You can look at the /etc/privoxy/config file to adapt it to your needs (the file is a model of a well-documented config file). Two options you might be interested in changing are the debug option (to enable logging, which is disabled by default) and listen-address. You will want to set the latter to:

listen-address  YOUR_RASPBERRY_PI_IP:8118

so that the Privoxy proxy is accessible from the rest of your LAN. As always, do not forget to configure your firewall to let the Privoxy connections through. At this point, you should try setting the proxy in your browser's preferences and check that everything is working fine. You should be able to browse any web page, but the proxy will not do anything else for the moment.

Note: At this point, we should emphasize that having such an open proxy is a security risk, as anyone with access to it can browse the web with your IP address (and you may be held liable for anything illegal done with it). Therefore, you should make sure that the Privoxy on your Raspberry Pi is only accessible from your local network, and that no third party has access to it. This should be enforced by the firewall running on your Raspberry Pi and the firewall on your router (typically, most routers provided by your ISP block all incoming connections by default, but check this).

Privoxy, as installed by the Raspbian package, enables a couple of filters out of the box. As we will be translating Adblock rules into Privoxy rules, we can disable them. Edit the /etc/privoxy/match-all.action file to get something like this:

# Id: match-all.action,v
# This file contains the actions that are applied to all requests and
# may be overruled later on by other actions files. Less experienced
# users should only edit this file through the actions file editor.
{ \
+change-x-forwarded-for{block} \
+client-header-tagger{css-requests} \
+client-header-tagger{image-requests} \
+filter{refresh-tags} \
+filter{webbugs} \
+filter{jumping-windows} \
+filter{ie-exploits} \
+hide-from-header{block} \
+hide-referrer{conditional-block} \
}
/ # Match all URLs

In particular, I disabled the img-reorder filter (which is really intensive for the Raspberry Pi and takes a few hundred milliseconds per page) and banners-by-size, as we will be importing Adblock rules which should give better results. deanimate-gifs and session-cookies-only are a matter of taste (the former replaces animated GIFs with their last frame, the latter only allows session cookies).

Block ads using Privoxy

We will now import Adblock rules into Privoxy. One way to do it is to use this Haskell script.

To install it directly on your Raspberry Pi, you need a recent enough model to build it. This was not my case, so I set up a builder on my server, running daily. Some rules are provided by the author, and my builds are available here.

The way to set up the resulting files into Privoxy is very well detailed on the page of the project.

Note: My builds are made with the Element Hiding feature enabled, and the domainCSS parameter set to my own host. You should replace any occurrence of it with the FQDN or IP address of your Raspberry Pi when importing them. Please do not put too much load on my hosted builds, and consider hosting your own.


Benchmark

All the tests were made with a first-generation Raspberry Pi with 512 MB of RAM (model 1B). The Raspberry Pi has a wired connection to the network (its Ethernet port is 100 Mbps only). My laptop is wired as well (gigabit Ethernet). The home connection to the internet is fiber (923 Mbps download, 250 Mbps upload, as reported by DSLReports).

Testing the DNS server

First, let's compare typical DNS resolution times with the various available resolvers:

$ # Using my ISP resolver
$ dig @
;; Query time: 7 msec

$ # Using Google DNS
$ dig @
;; Query time: 5 msec

$ # Using the DNS resolver on my Raspberry Pi
$ dig @
;; Query time: 505 msec

$ # Using it another time, now that the domain is in cache
$ dig @
;; Query time: 5 msec

These are typical times; each value is an average over a few runs.

We can see that there is some overhead when first accessing a domain, as the Pi has to do the full DNS resolution. Afterwards, the domain is kept in cache and it is as fast to use the DNS server from the Pi as it is to use any other one.

Testing the Privoxy setup

Now, let us focus on the performance of Privoxy on the Raspberry Pi. I tested it with a few websites, and the results were roughly the same everywhere. Here is a detailed example with the website of Libération, a French newspaper. This example is interesting as my uBlock setup on my laptop blocks 23 different elements when I use neither the DNS resolver nor the Privoxy proxy. It is also interesting because, out of those 23 blocked contents, only 14 could have been blocked at the DNS level (with the setup described above).

The main issue here is that Privoxy takes a very long time to process a page with all the filters enabled; it is way too heavy for my low-power first-generation Raspberry Pi.

The main HTML document of this page takes 7 seconds to load when passing through the proxy, mostly due to the processing time. When reloading the page, it only takes 400 ms as it is already in cache. As a comparison, it takes only 24 ms when loading it directly.

The complete setup looks equivalent to the µBlock setup on my laptop.

I don't have a more recent Raspberry Pi (typically a Raspberry Pi 3) to test the performance on such a more powerful system. If you can try it, let me know; I am curious how it handles the load, and I could publish an edit to this article.


Donation of the month, March: Jupyter

1 min read

I am continuing my monthly donations by giving $20 to Jupyter this month.

Jupyter (formerly IPython) comes from the split between IPython (the kernel) and the notebook part. Jupyter Notebook covers this second part, and supports various kernels besides Python (R, Haskell, Julia, Ruby, etc.). A good example of the rendering is available here (read-only).

A notebook is made of cells, and each cell can be either text (extended Markdown with LaTeX support for equations, rendered with MathJax) or code. It is a very good tool for a kind of literate programming where code and equations live together in a single coherent document. Moreover, Jupyter supports exporting to many formats (thanks to LaTeX and Pandoc), notably PDF, TeX and HTML, which makes it possible to generate a clean final document containing all the notes and simulations for a given project.


Donation of the month, February: i3wm

1 min read

I was quite busy last month and just realized that my article about the February donation of the month had remained an unpublished draft :/

So I am continuing my monthly donations by giving €15 to i3wm this month.

i3wm is a tiling window manager for X11 (Linux). It automatically arranges the windows on screen so that they never overlap but tile the available space.

I use it daily, and once you have tried it (and got used to it), it very quickly becomes indispensable.

Worth noting: there is an alternative (unrelated to i3wm) for Wayland: sway.


Raspberry Pi install checklist

2 min read

This is a memo for me, to use as a checklist whenever I set up a new Raspberry Pi that is to run continuously (typically as a webserver).

First, I start from the lite version of Raspbian.

After install:

  1. sudo apt-get update && sudo apt-get upgrade

  2. sudo raspi-config and tweak according to my needs.

  3. Install some useful tools:

sudo apt-get install ack-grep fail2ban git heirloom-mailx htop libxml2-dev libxslt1-dev libyaml-dev moreutils msmtp-mta python-dev python-pip python3 python3-dev python3-pip screen vim zlib1g-dev

  4. Install RPi-Monitor. First install its dependencies:

sudo apt-get install librrds-perl libhttp-daemon-perl libjson-perl libipc-sharelite-perl libfile-which-perl

  5. cd $HOME; git clone; cd RPi-Monitor; sudo TARGETDIR=/ STARTUPSYS=systemd make install to install it. Be careful about a current bug with the systemd install.

  6. Some useful bash config: echo 'export PATH=$HOME/.local/bin:$PATH' >> $HOME/.bashrc; echo 'export EDITOR=vim' >> $HOME/.bashrc.

  7. Use NTP to keep the system clock in sync: sudo timedatectl set-ntp true.

  8. Load the ip_conntrack_ftp module: echo "ip_conntrack_ftp" | sudo tee -a /etc/modules-load.d/modules.conf (a plain sudo echo with a redirection would fail, as the redirection is not run as root).

  9. Set up an iptables systemd service à la Arch Linux. See this unit. Put the iptables config in /etc/iptables/ip{6,}tables.rules.

  10. Remove the file in /etc/sudoers.d which exempts the pi user from typing its password.

  11. Configure msmtp to be able to send emails through the mailserver on my main server.

  12. Harden the SSH configuration as you would do for a server.

  13. Set a MAILTO address in the crontab and edit the aliases.


Donation of the month, January: Framasoft

1 min read

I am continuing my monthly donations by giving €15 to Framasoft this month.

Framasoft is a network dedicated to promoting "free" culture in general and free software in particular. It offers many innovative services and projects, freely available to the general public, notably as part of their "de-google-ify" campaign (free services, hosted by Framasoft, offering alternatives to the services provided by Google / Doodle / Facebook / Github etc., and the list keeps growing!). Of course, these services can very easily be self-hosted, and they encourage this through their CHATONS campaign.

In particular, their list of alternatives is very well made and very relevant.


Donation of the month, December: EFF

2 min read

I stumbled upon the monthly donations of Sam & Max, who gave each month to an organization providing products and services they used and which had been important to them over the past month, while writing a post on their blog to raise awareness of the organization.

I recently migrated all the SSL certificates used on this domain and its subdomains from StartSSL to Let's Encrypt, mainly following this announcement from Mozilla. I have never paid for an SSL certificate since I got this domain name (StartSSL, just like Let's Encrypt, provides them for free), whereas certificate authorities charge up to $100 per certificate.

This month, $25 therefore goes to the EFF, mainly for their support of Let's Encrypt and for their certbot, which makes managing certificates much easier. The EFF is also committed to defending free speech online, fighting software patents and DRM, and working on privacy issues. They are also behind a number of tools and extensions such as HTTPS Everywhere.


Moving from URxvt to st

2 min read

I had been using the URxvt terminal for a while, but was suffering from many issues with it recently. In particular, I had a weird locale issue leading to Unicode encoding errors whenever I copied accented characters using the primary selection, some weird issues due to urxvt-tabbed, and it just blew up when I tried to get new Unicode characters (such as smileys) rendered in it.

A friend told me about st, which may be quite daunting at first, especially since all the configuration is done statically in a C header file, but it works incredibly well and just does the job.

I have a mirror repo with my own configuration in case you want to have a look at it. This reproduces most of my URxvt user experience, except for two things:

  1. I don't have any tabs in st. But this is not a real issue and I'd rather depend on another program to handle tabs, such as tmux or even i3.
  2. I don't have clickable URLs as I used to have in URxvt. But once again, after a few weeks without this feature, I prefer selecting and copy/pasting URLs rather than clicking on them. This way, I don't open links unintentionally.

I was relying on a hack to get local notifications from my Weechat running through SSH + screen, using an extended escape sequence; if you are also using it, this commit implements this behavior in st.



Improved back and forth between workspaces

2 min read

i3 has a feature enabling back and forth between workspaces. Once enabled, if you are on workspace 1, switch to workspace 2 and then just press mod+2 again, you will go back to workspace 1.

However, this feature is quite limited, as it does not remember more than one previous workspace. For example, say you are on workspace 1, switch to workspace 2 and then to workspace 3. Then, typing mod+3 will send you back to workspace 2, as expected. But then, typing mod+2 will send you back to workspace 3, whereas one might have expected it to switch to workspace 1 (as Weechat does with buffer switching, for instance).

This can be solved by wrapping around the workspace switching in the i3 config. I wrote this small script to handle it.

Basically, you have to start the script when you start i3 by putting

exec_always --no-startup-id "python PATH_TO_/"

in your .i3/config file.

Then, you can replace your bindsym commands to switch workspaces, calling the same script:

bindsym $mod+agrave exec "echo 10 | socat - UNIX-CONNECT:$XDG_RUNTIME_DIR/i3/i3-back-and-forth-enhanced.sock" (Replace $XDG_RUNTIME_DIR by /tmp if this environment variable is not defined on your system.)

This script maintains a queue of the 20 previously seen workspaces (so you can go back up to 20 workspaces in your history). This can be increased by editing the WORKSPACES_STACK = deque(maxlen=20) line according to your needs.
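The core logic is easy to sketch. The following is a hypothetical simplification of the actual script (which additionally talks to i3 over its IPC socket):

```python
from collections import deque

# Remember up to 20 previously visited workspaces; the oldest is dropped first
WORKSPACES_STACK = deque(maxlen=20)


def next_workspace(target, current):
    """Return the workspace to actually switch to.

    Asking for the workspace you are already on pops the history and
    goes one step back, which gives the multi-step back-and-forth
    behavior described above.
    """
    if target == current:
        return WORKSPACES_STACK.pop() if WORKSPACES_STACK else current
    WORKSPACES_STACK.append(current)
    return target
```

With the 1 → 2 → 3 scenario above, asking for workspace 3 while on 3 goes back to 2, and asking for 2 while on 2 then goes back to 1, as one would expect.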

Hope this helps! :)


Comparison of tools to fetch references for scientific papers

3 min read

EDIT: Finally, the impossible build of CERMINE was just a temporary issue, and they are distributing standalone JAR files, which makes it very easy to ship it with another program. See this Github issue for more info. You might also be interested in the CERMINE paper, which presents comparisons similar to what I did below.


Recently, I tried to aggregate in a single place various pieces of code I had written to handle scientific papers. One feature I was missing and wanted to add was the ability to automatically fetch the references of a given paper. For arXiv papers, I had a simple solution using the LaTeX sources, but I wanted something more universal, taking a simple PDF file as input (thanks John for the suggestion, and Al for the tips on existing software solutions).

I tried a comparison of three existing software to extract references from a PDF file:

  • pdfextract from Crossref, very easy to use, written in Ruby.
  • Grobid, more advanced (using machine learning models), written in Java, but quite easy to use too.
  • Cermine, using the same approach as Grobid, but I could not get it to build on my computer. I used their REST service instead.

To compare them, I asked Antonin to build a list of the most important journals and take five papers from each, from Dissemin. This gives us a JSON file containing around 500 papers.

I downloaded some articles to get a (hopefully) representative set of 147 different papers from various journals (I did not have access to some of them, so I could not fetch the full dataset). I ran pdfextract, Grobid and Cermine on each of them and compared the results.

The raw results are available here for each paper, and I generated a single page comparison to ease the visual diff between the three results, available here (note that this webpage is very heavy, around 16MB).

Briefly comparing the results, the machine-learning-based tools (Cermine and Grobid) seem to give far better results than the simple approach taken by pdfextract, at the expense of being more difficult to build and run. Cermine returns a lot of information, too much in my opinion, and I think Grobid gives the most reusable and complete results. Feel free to compare them yourself.


  • I also found ParsCit, which may be of interest. However, you first need to extract the text from your PDF file, and I have not yet tested it in more depth.

  • This tweet tends to confirm the results I had, that Grobid is the best one.

  • If it can be useful, here is a small web service written in Python that lets a user upload a paper, parses its citations and tries to assess the open-access availability of the cited papers. It uses CERMINE as it was the easiest way to go, especially since it offers a web API, which allows me to distribute a simple working script without any additional requirements.



Localizing a webapp with webL10n.js

3 min read

I am currently working on a Velib webapp. With Élie, we modularized everything so that the backend API can be changed easily and adapted to any other bike-sharing system. Hence, we wanted it to be easily localizable and looked for solutions compatible with as many browsers as possible. We finally chose webL10n.js. Here are some explanations about it and how to use it.

Why webL10n.js?

First thing is: why choose webL10n.js instead of anything else? We found basically four solutions: webL10n.js, L20n.js, Jed and a modified version of webL10n.js used in Gaia.

Jed takes a really different approach and, especially as we are not really familiar with localizing content, we found it more difficult to use and integrate.

The three others take a really simple approach. They use extra data-* attributes on any tag to replace, on the fly, the textContent of the node with a translation found in a properties-formatted file. It is really easy to integrate, use and tweak. They support advanced features such as pluralization, HTML modifications, responsive localization (using a different localization file on small screens), etc.
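As an illustration of that mechanism (a Python sketch for brevity, since the actual library is JavaScript; plain dicts stand in for DOM nodes):

```python
def localize(elements, translations):
    """Mimic the data-* substitution approach: each node carrying a
    data-l10n-id attribute gets its textContent replaced by the
    matching translation, and is left untouched otherwise.
    """
    for element in elements:
        key = element.get("data-l10n-id")
        if key in translations:
            element["textContent"] = translations[key]
    return elements
```

The real library additionally handles locale negotiation, pluralization rules and fallback to the default language.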

webL10n.js and the modified version in Gaia are basically the same, except that the one in Gaia dropped the hacks supporting some old browsers such as IE6. Plus, webL10n.js lives in a separate git repo which is easy to track, so I'd rather go with that one. But the documentation is not really complete and the associated MDN wiki page is outdated. Hence this blog post :) Don't worry about the lack of recent commits on webL10n.js; it is stable and usable (and still maintained and supported by kaze).

L20n.js is the latest Mozilla project, aiming at replacing webL10n.js. I had many problems with it, because the API keeps moving and no doc is in sync with the code. The downloadable version exposes a totally different API from the one in the git repo, and the doc is not really helpful about which version should be considered stable. Plus, the l20n file format is really weird, and I'd rather not reinvent the wheel and go with a standard properties file to ease translation for contributors.

Demo and extra infos

For more information, you can refer to the README.

For an app using it, you can have a look at our VelibFxos webapp, especially this folder. You can also see it live in your browser (it is under heavy work in progress, so it might break from time to time).

Note: There is a bug when mixing pluralization and innerHTML, which can be worked around. See this Github issue.

EDIT: Since I initially wrote this article, I came across Polyglot.js, a JS library by Airbnb, which does exactly this. It looks really nice and trustworthy, especially since it is backed by Airbnb, which uses it heavily. One major drawback is that it uses a non-standard format for translations, instead of a po or properties file, but it should be easy to plug such a parser into it. It supports basically the same features as webL10n.js, except that it only has a JS API, without support for the extra data-* attributes. This is OK if you render your templates in JS (using React or Angular for instance), but I find it more difficult than webL10n.js to use in other cases.
