Filtering ads with your Raspberry Pi

Posted on April 04, 2017 in Selfhosting • 10 min read

TL;DR: Please have a look at the benchmark section below, to be aware of the limitations of this particular setup and decide whether to spend some time putting it in place or not.

I recently came across the Pi-Hole project, which claims to be “a black hole for Internet advertisements” (thanks nicofrand for making me discover this!). The idea was really attractive: having a simple Raspberry Pi on the network doing all the ad filtering for the whole network, rather than having to maintain a separate uBlock Origin install on each and every computer of the network. It was also particularly attractive as having such an ad blocker on a smartphone requires a rooted device. Plus, there was a really nice web interface to control the whole ad blocking device.

While looking at it more in depth, I realized it was actually very limited:

  1. First, it was built around specific software and was doing some magical stuff with it, which made it really painful to swap any piece out. Basically, it uses dnsmasq to expose a DNS service, a standard hosts file to block the hosts serving ads, and a lighttpd webserver. Problem is I already have a DNS resolver (unbound) on this Raspberry Pi, and a web server (nginx). I did not want to spend a lot of time trying to integrate it in my existing setup if in the end it was not that powerful, so I decided to look at it in detail before installing.
  2. Second issue was that it relies on dnsmasq. dnsmasq is a simple program that allows you to answer DNS queries by using the hosts defined in /etc/hosts and to forward every other request to another DNS server (typically your ISP DNS server). Pi-Hole lets you configure two DNS servers to forward queries to, the default one being 8.8.8.8 (Google :/). I already have a resolver on this Raspberry Pi and I do want to do the resolution myself, especially since my ISP DNS servers lie, and I do not want to use a public DNS server on another network. So I would have had to hack on Pi-Hole to do some DNS resolution. About these issues, I’d like to point to two very interesting articles from Bortzmeyer: this one about Google DNS (in French) and this one about having your own DNS resolver (same). Also, being a DNS resolver, it may be cumbersome to disable it temporarily to load some website that absolutely requires the ads to be loaded.
  3. Last issue was that, contrary to uBlock which filters at the request level (and even sometimes at the HTML level), the fact that it is basically an alternative DNS resolver means you can only filter at the domain level. That is, you either whitelist (default) or blacklist a domain which is serving ads or malware, but you cannot differentiate between paths for a given domain. While browsing the Libération website, I can see uBlock is blocking queries such as http://s3.amazonaws.com/files.wrapper.theadtech.com/native/placements/liberation.fr/pconfig?r=5a6f5d98b4d608. Such queries cannot be blocked by Pi-Hole without blacklisting the whole Amazon S3 network.
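
To make this last point concrete, here is a minimal shell sketch. The helper name url_host and the second URL are made up for this illustration, they are not part of Pi-Hole or uBlock. A DNS resolver only ever sees the host name, so the path that identifies the ad is already lost when the blocking decision is made:

```shell
#!/bin/sh
# Hypothetical helper: extract the host part of a URL, which is all a DNS
# resolver ever gets to see.
url_host() {
    # Strip the scheme, then everything from the first "/" on, then any port.
    printf '%s\n' "$1" | sed \
        -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
        -e 's|/.*$||' \
        -e 's|:[0-9]*$||'
}

# URL blocked by uBlock on the Libération website (taken from the text above).
ad_url='http://s3.amazonaws.com/files.wrapper.theadtech.com/native/placements/liberation.fr/pconfig?r=5a6f5d98b4d608'
# Made-up URL standing for any legitimate content hosted on the same domain.
other_url='http://s3.amazonaws.com/some-legitimate-bucket/picture.png'

url_host "$ad_url"      # prints s3.amazonaws.com
url_host "$other_url"   # prints s3.amazonaws.com as well
```

Both URLs boil down to the same DNS query, so a domain-level filter has to block both or neither.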

Given these facts, I remembered Privoxy, which can be used as a filtering proxy, in a way similar to uBlock. Given that it is a proxy, it can filter in detail, just as uBlock does, and you can very easily disable it (simply disable the proxy). Plus, almost every device offers a proxy setting, so it should work both on my Android phones and computers. In this article I describe how I set up a Pi-Hole alternative based on Unbound (to have my own DNS resolver and block some things at the domain level) coupled with a Privoxy proxy to filter out ads.

Limitations: So, contrary to Pi-Hole, the setup described here will be able to remove ads in a similar way to uBlock/AdBlock. If you go through the whole article till the end, it will also have the element hiding features. Being a proxy setting, it will be easy to toggle it on and off (either with the Privoxy toggling features or by manually turning off the proxy on your device), if required. However, be aware of the remaining limitations with regard to HTTPS streams (section 4.15). As AdBlock/uBlock runs in the browser, it can filter ads in HTTPS streams as well, whereas Privoxy will not be as efficient without HTTPS interception (which is generally not a good idea). However, it should perform rather well in the vast majority of situations (also note that AdBlock for rooted Android devices is also a proxy, so for them, it will not change anything).

I assume you already have a running Raspberry Pi with some basic install. If this is not the case, see this previous article.

Set up a DNS resolver

Let’s install a DNS resolver on the Raspberry Pi, to answer DNS queries on the network. I am installing and configuring Unbound here.

$ sudo apt-get install unbound
$ sudo curl -o /etc/unbound/root.hints https://www.internic.net/domain/named.cache
$ # (Optional) Set crontask to download root.hints file every six months

The curl command is used to fetch the root hints, which Unbound needs to resolve hosts that are not in its cache. See this section of the ArchWiki page for more information.

Then, you can create a basic configuration file for unbound.

$ cat /etc/unbound/unbound.conf.d/local.conf
server:
    username: "unbound"
    interface: 0.0.0.0  # Listen on all interfaces
    root-hints: "/etc/unbound/root.hints"
    access-control: 192.168.0.0/16 allow  # Only answer the private 192.168.x.x range, see "Example" section in https://www.unbound.net/documentation/unbound.conf.html

Then, start the Unbound service and enable it at startup:

$ sudo systemctl start unbound && sudo systemctl enable unbound

You can now change the resolver to be used on your Raspberry Pi and on your whole network. To change it on your Raspberry Pi, have a look at this wiki page (for Raspbian). To set it as the default DNS resolver on your network, have a look at your router configuration, and set the DNS resolver address to the address of your Raspberry Pi. Don’t forget to open the ports in your firewall (53, both TCP and UDP).

To check everything is working fine, you can use dig (from dnsutils package on Debian-based distributions). Typically, dig google.fr should give you some results (in the ANSWER SECTION) and the IP address in the SERVER line should be the one of your Raspberry Pi.

Note: At this point, we should emphasize that having an open DNS resolver (that is, a DNS resolver that answers anyone) is a security risk, especially since open resolvers are abused in some DDoS attacks. Therefore, you should make sure that your Raspberry Pi DNS server is only accessible from your local network, and that no third party has access to it. This should be done through the access-control line in the above configuration, but this can also be enforced by the firewall running on your Raspberry Pi and the firewall on your router (typically, most routers provided by your ISP block any incoming connections, check this).
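
As an illustration of the firewall part, here is a small sketch that only prints iptables rules, so that they can be reviewed before being applied. It assumes that iptables is the firewall in use on the Raspberry Pi and that the LAN is 192.168.0.0/24; the function name lan_only_rules is made up for this example. Adapt both to your network.

```shell
#!/bin/sh
# Print (do not apply) iptables rules that accept traffic to a given port
# from the loopback interface and from the LAN, and drop it from anywhere else.
# Assumptions: iptables is the firewall in use, and the range passed as first
# argument matches your local network.
lan_only_rules() {
    lan=$1 proto=$2 port=$3
    echo "iptables -A INPUT -i lo -p $proto --dport $port -j ACCEPT"
    echo "iptables -A INPUT -p $proto --dport $port -s $lan -j ACCEPT"
    echo "iptables -A INPUT -p $proto --dport $port -j DROP"
}

# DNS uses port 53 over both UDP and TCP.
lan_only_rules 192.168.0.0/24 udp 53
lan_only_rules 192.168.0.0/24 tcp 53
```

Once the printed rules look right, they can be run with root privileges. The order matters: the two ACCEPT rules must come before the DROP rule for the same port.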

Block some domains based on hosts

Now, we would like Unbound to block some domains that are known to serve ads and malware, in a similar way as Pi-Hole does. For this purpose, we will use the unbound-block-hosts script to import hosts files into the Unbound configuration. Basically, for every such domain, Unbound will return 127.0.0.1.

unbound-block-hosts is designed with Dan Pollock’s hosts file in mind, whereas I wanted to be able to import any hosts file in Unbound. Here is a forked and patched version for this purpose (very ugly patch, as I am not fluent in Perl :/).

We will create an includes dir in the Unbound configuration directory (mkdir /etc/unbound/includes/), and include the rules in the main configuration by appending include: "/etc/unbound/includes/*.conf" to the /etc/unbound/unbound.conf.d/local.conf previously created.
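
To give an idea of what such an included file contains, here is a minimal sketch of the hosts-to-Unbound transformation. It is an illustration only, not the actual unbound-block-hosts script: the function name hosts_to_unbound and the sample entries are made up, and the output uses Unbound's local-zone and local-data statements.

```shell
#!/bin/sh
# Convert hosts-file lines read on stdin ("IP hostname") into Unbound
# local-zone/local-data statements pointing at the address given as $1
# (127.0.0.1 if omitted). Illustration only, not unbound-block-hosts itself.
hosts_to_unbound() {
    awk -v addr="${1:-127.0.0.1}" '
        /^[ \t]*(#|$)/    { next }   # skip comments and blank lines
        $2 == "localhost" { next }   # never override localhost
        {
            printf "local-zone: \"%s\" redirect\n", $2
            printf "local-data: \"%s A %s\"\n", $2, addr
        }'
}

# Made-up sample hosts file, redirected to a Raspberry Pi at 192.168.0.1.
hosts_to_unbound 192.168.0.1 <<'EOF'
# sample hosts file
127.0.0.1  localhost
127.0.0.1  ads.example.com
0.0.0.0    tracker.example.net
EOF
```

Each blocked host yields one redirect zone and one A record, so every query for that host (or a subdomain of it) is answered with the chosen address.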

Now, you can run ./unbound-block-hosts --url="SOME_URL" --file=/etc/unbound/includes/FOOBAR-blocking.conf to generate a matching configuration for a given hosts list. Typically, I have a script doing:

#!/bin/sh
set -e

cd "$(dirname "$0")"

echo "Fetch Malware domains list and append to unbound"
./unbound-block-hosts --url="http://www.malwaredomainlist.com/hostslist/hosts.txt" --file=/etc/unbound/includes/malwaredomainlist-blocking.conf --address="YOUR_RASPBERRY_PI_IP"
echo "Fetch Yoyo ad servers list and append to unbound"
curl "https://pgl.yoyo.org/adservers/serverlist.php?hostformat=unbound;showintro=0&mimetype=plaintext" > /etc/unbound/includes/yoyoadservers-blocking.conf

systemctl reload unbound

which runs every day as a cron task. The default address is 127.0.0.1, which means the local host for the client machine. I do not want to get too many 404 errors on the local webservers of my machines, so I’d rather put the IP address of the Raspberry Pi and have a webserver answering a 404 on it.

Install a webserver

As simple as

sudo apt-get install nginx
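
The article does not give the nginx configuration, so here is one possible sketch of a catch-all server answering 404 to every request. It assumes that no other site on the Raspberry Pi is already declared as default_server on port 80 (otherwise, merge the return 404 into that one); the function name write_sinkhole_vhost is made up for this example.

```shell
#!/bin/sh
# Write a minimal nginx server block that answers 404 to every request.
# The destination path is passed as first argument, so that the file can be
# reviewed before being copied into the nginx configuration directory.
write_sinkhole_vhost() {
    cat > "$1" <<'EOF'
server {
    listen 80 default_server;
    server_name _;
    access_log off;   # blocked requests are numerous, do not log them
    return 404;
}
EOF
}

conf=$(mktemp)
write_sinkhole_vhost "$conf"
cat "$conf"   # review the result before installing it
```

On Debian-based systems the packaged nginx.conf includes /etc/nginx/conf.d/*.conf, so the reviewed file can be copied there (with a .conf suffix) before reloading nginx.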

Install and configure Privoxy

Now, you can install Privoxy:

sudo apt-get install privoxy

The default configuration should be mostly ok. You can look at the /etc/privoxy/config file to adapt it to your needs (the file is really an example of a well-documented config file). Two options you might be interested in changing are the debug option (to enable logging, which is disabled by default) and listen-address. You will want to set the latter to:

listen-address  127.0.0.1:8118
listen-address  YOUR_RASPBERRY_PI_IP:8118

so that the Privoxy proxy is accessible from the rest of your LAN. As always, do not forget to configure your firewall to let the Privoxy connections pass through. At this point, you should try to set the proxy in your browser’s preferences and check that everything is working fine. You should be able to browse to any web page, but the proxy will not do anything else for the moment.

Note: At this point, we should emphasize that having such a proxy is a security risk, as anyone having access to your proxy can browse the web with your IP address (and you may be held liable for anything illegal done with it). Therefore, you should make sure that the Privoxy instance on your Raspberry Pi is only accessible from your local network, and that no third party has access to it. This should be enforced by the firewall running on your Raspberry Pi and the firewall on your router (typically, most routers provided by your ISP block any incoming connections, check this).

Privoxy, as installed by the Raspbian package, enables a couple of filters out of the box. As we will be translating Adblock rules into Privoxy rules, we can disable them. Edit the /etc/privoxy/match-all.action file to get something like this:

#############################################################################
# Id: match-all.action,v
#
# This file contains the actions that are applied to all requests and
# may be overruled later on by other actions files. Less experienced
# users should only edit this file through the actions file editor.
#
#############################################################################
{ \
+change-x-forwarded-for{block} \
+client-header-tagger{css-requests} \
+client-header-tagger{image-requests} \
+filter{refresh-tags} \
+filter{webbugs} \
+filter{jumping-windows} \
+filter{ie-exploits} \
+hide-from-header{block} \
+hide-referrer{conditional-block} \
}
/ # Match all URLs

In particular, I disabled the filters img-reorder (which is really intensive for the Raspberry Pi, and takes a few hundred milliseconds to process a regular page) and banners-by-size, as we will be importing Adblock rules which should give better results. deanimate-gifs and session-cookies-only are a matter of taste (respectively, they replace animated GIFs by their last frame and only allow temporary cookies).

Block ads using Privoxy

We will now be importing Adblock rules into Privoxy. One way to do it is to use this Haskell script.

To install it directly on your Raspberry Pi, provided you have a recent Raspberry Pi:

  • Install Haskell Stack http://allocinit.io/haskell/haskell-on-raspberry-pi-3/ (if you are feeling adventurous)
  • Install adblock2privoxy https://projects.zubr.me/wiki/adblock2privoxy#from-sources

This was not my case, so I set up a builder on my server to run daily. Some rules are provided by the author and my builds are available here.

The way to set up the resulting files into Privoxy is very well detailed on the page of the project.

Note: My builds are made with the Element Hiding feature and example.com as the domainCSS parameter. You should replace any occurrence of example.com by the FQDN or IP address of your Raspberry Pi when importing them. Please do not put too much load on my hosted builds, and consider hosting your own.
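
The replacement can be scripted. Here is a minimal sketch, assuming the build was downloaded into a directory of your choice; the function name set_css_domain, the directory and the sample file are made up for this example. It only works for plain host names or IP addresses (no "/" or "&" in the replacement).

```shell
#!/bin/sh
# Replace the placeholder domain "example.com" by the address of the
# Raspberry Pi in every file below a directory of downloaded rules.
# $1: directory where the build was unpacked (assumption, adapt it)
# $2: FQDN or IP address of your Raspberry Pi
set_css_domain() {
    dir=$1 pi=$2
    # Only touch the files that actually contain the placeholder.
    grep -rl 'example\.com' "$dir" | while IFS= read -r f; do
        # Write to a temporary file first: "sed -i" is not portable.
        sed "s/example\.com/$pi/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Demonstration on a throwaway directory with a made-up file.
demo=$(mktemp -d)
printf 'http://example.com/ab.css\n' > "$demo/sample.filter"
set_css_domain "$demo" 192.168.0.1
cat "$demo/sample.filter"    # prints http://192.168.0.1/ab.css
rm -r "$demo"
```

Run it on a copy of the files first, and check the result before pointing Privoxy at them.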

Benchmark

All the tests were made with a first-model Raspberry Pi with 512 MB of RAM (model 1B). The Raspberry Pi has wired access to the internet (the Raspberry Pi only has a 100 Mbit/s port). My laptop is wired as well (gigabit ethernet). The home connection to the internet is a fiber access (923 Mbps download, 250 Mbps upload, as reported by DSLReports).

Testing the DNS server

Here are the query times without and with the DNS resolver running on the Raspberry Pi:

$ # Using my ISP resolver
$ dig @192.168.0.254 example.com
...
;; Query time: 7 msec
;; SERVER: 192.168.0.254#53(192.168.0.254)

$ # Using Google DNS
$ dig @8.8.8.8 example.com
...
;; Query time: 5 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)

$ # Using the DNS resolver on my Raspberry Pi
$ dig @192.168.0.1 example.com
...
;; Query time: 505 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)

$ # Using it another time, now that the domain is in cache
$ dig @192.168.0.1 example.com
...
;; Query time: 5 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)

These are typical times: each value is roughly the average over a few runs.

We can see that there is some overhead when first accessing a domain, as the Pi has to do the full DNS resolution. Afterwards, the domain is kept in cache and it is as fast to use the DNS server from the Pi as it is to use any other one.
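
For reference, averaging a few runs can be scripted. This is a minimal sketch, not the exact procedure used for the numbers above: the function name mean_query_time is made up, and the sample values in the demonstration are invented. It relies only on the "Query time" line of the dig output shown above.

```shell
#!/bin/sh
# Read dig output on stdin and print the mean of all "Query time" values.
mean_query_time() {
    # A matching line looks like ";; Query time: 7 msec", so the value is
    # the fourth whitespace-separated field.
    awk '/Query time:/ { sum += $4; n++ }
         END { if (n) printf "%.1f msec over %d runs\n", sum / n, n }'
}

# Demonstration on invented values (three runs).
mean_query_time <<'EOF'
;; Query time: 7 msec
;; SERVER: 192.168.0.254#53(192.168.0.254)
;; Query time: 5 msec
;; Query time: 6 msec
EOF
# prints: 6.0 msec over 3 runs
```

In practice, the input would come from a loop such as: for i in 1 2 3; do dig @192.168.0.1 example.com; done | mean_query_time. Keep in mind that only the first query of such a loop is uncached, so cold and warm timings should be measured separately.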

Testing the Privoxy setup

Now, let us focus on the performance of Privoxy on the Raspberry Pi. I tested it with a few websites, and results were roughly the same. Here is a detailed example with the website of Libération, a French newspaper. This example is interesting as the uBlock setup on my laptop blocks 23 different things when I use neither the DNS resolver nor the Privoxy proxy. It is also an interesting example as, out of the 23 blocked contents, only 14 could be blocked by DNS (with the setup described above).

The main issue here is that Privoxy takes a very long time to process the page with all the filters, which is way too heavy for my low-power first-model Raspberry Pi.

The main HTML document for this page takes 7 seconds to load when passing through the proxy, mainly due to the processing time. When reloading the page, it only takes 400ms as it is already in cache. As a comparison, it takes only 24ms when loading it directly.

In terms of blocked content, the complete setup looks equivalent to the uBlock setup on my laptop.

I don’t have a more recent version of the Raspberry Pi (typically a Raspberry Pi 3) to test how this setup performs on a more powerful system. If you can try it, let me know: I am curious about the way it handles the load, and I could publish an edit to this article.

EDIT: A related article of interest is https://www.shaftinc.fr/blocage-pubs-unbound.html.