Filtering ads with your Raspberry Pi
Posted on April 04, 2017 in Selfhosting • 10 min read
TL;DR: Please have a look at the benchmark section below, to be aware of the limitations of this particular setup and decide whether to spend some time putting it in place or not.
I recently came across this Pi-Hole project that claims to be “a black hole for Internet advertisements” (thanks nicofrand for making me discover this!). The idea was really attractive: having a simple Raspberry Pi on the network doing all the ad filtering for the whole network, rather than having to maintain a separate uBlock Origin install on each and every computer of the network. It was also particularly attractive as having such an ad blocker on a smartphone requires a rooted device. Plus there was a really nice web interface to control the whole ad blocking device.
While looking at it more in depth, I realized it was actually very limited:
- First, it was built around specific software and was doing some magical stuff with it, so it was really painful to get away from that software. Basically, it uses dnsmasq to expose a DNS service, a standard hosts file to block the hosts serving ads, and a lighttpd web server. Problem is I already have a DNS resolver (unbound) on this Raspberry Pi, and a web server (nginx). I did not want to spend a lot of time trying to integrate it into my existing setup if in the end it was not that powerful, so I decided to look at it in detail before installing.
- Second issue was that it relies on dnsmasq. dnsmasq is a simple program that allows you to answer DNS queries using the hosts defined in /etc/hosts and to forward every other request to another DNS server (typically your ISP DNS server). Pi-Hole lets you configure two DNS servers to forward queries to, the default one being 8.8.8.8 (Google :/). I already have a resolver on this Raspberry Pi and I do want to do the resolution myself, especially since my ISP DNS servers lie, and I do not want to use a public DNS server on another network. So I would have had to hack on Pi-Hole to do some DNS resolution. About these issues, I’d like to point to two very interesting articles from Bortzmeyer: this one about Google DNS (in French) and this one about having your own DNS resolver (same). Also, being a DNS resolver, it may be cumbersome to disable it temporarily to load some website that absolutely requires the ads to be loaded.
- Last issue was that, contrary to uBlock which filters at the request level (and even sometimes at the HTML level), the fact that it is basically an alternative DNS resolver means you can only filter at the domain level. That is, you either whitelist (default) or blacklist a domain which is serving ads or malware, but you cannot differentiate between paths for a given domain. While browsing the Libération website, I can see uBlock is blocking queries such as http://s3.amazonaws.com/files.wrapper.theadtech.com/native/placements/liberation.fr/pconfig?r=5a6f5d98b4d608. Such queries cannot be blocked by Pi-Hole without blacklisting the whole Amazon S3 network.
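To illustrate the last point, here is a small sketch (not taken from Pi-Hole or uBlock) showing that a DNS-level filter only ever sees the host part of a URL. The helper url_host and the second URL are made up for the example.

```sh
#!/bin/sh
# Extract the host part of a URL (hypothetical helper, for illustration only).
url_host() {
    host="${1#*://}"      # strip the scheme ("http://")
    host="${host%%/*}"    # strip everything from the first "/" onwards
    printf '%s\n' "$host"
}

# The ad request seen above, and a made-up legitimate file on the same host.
ad_url="http://s3.amazonaws.com/files.wrapper.theadtech.com/native/placements/liberation.fr/pconfig?r=5a6f5d98b4d608"
other_url="http://s3.amazonaws.com/some-bucket/legit-file.js"

url_host "$ad_url"      # prints s3.amazonaws.com
url_host "$other_url"   # prints s3.amazonaws.com
```

Both URLs resolve through the same DNS name, so a DNS-based blocker has to block both or neither, whereas a proxy or a browser extension sees the full path.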
Given these facts, I remembered Privoxy, which can be used as a filtering proxy, in a way similar to uBlock. Given that it is a proxy, it can filter in detail, just as uBlock does, and you can very easily disable it (simply disable the proxy). Plus, almost any device offers a proxy setting, so it should work both on my Android phones and on my computers. In this article I describe how I set up a Pi-Hole alternative based on Unbound (to have my own DNS resolver and block some things at the domain level) coupled with a Privoxy proxy to filter out ads.
Limitations: So, contrary to Pi-Hole, the setup described here will be able to remove ads in a similar way to uBlock/AdBlock. If you go through the whole article to the end, it will also have the element hiding features. Being a proxy setting, it will be easy to toggle it on and off if required (either with Privoxy's toggling features or by manually turning off the proxy on your device). However, be aware of the remaining limitations with regard to HTTPS streams (section 4.15). As AdBlock/uBlock runs in the browser, it can filter ads in HTTPS streams as well, whereas Privoxy will not be as efficient without HTTPS interception (which is generally not a good idea). However, it should perform rather well in the vast majority of situations (also note that AdBlock for rooted Android devices is also a proxy, so for them, it will not change anything).
I assume you already have a running Raspberry Pi with some basic install. If this is not the case, see this previous article.
Set up a DNS resolver
Let’s install a DNS resolver on the Raspberry Pi, to answer DNS queries on the network. I am installing and configuring Unbound here.
$ sudo apt-get install unbound
$ sudo curl -o /etc/unbound/root.hints https://www.internic.net/domain/named.cache
$ # (Optional) Set crontask to download root.hints file every six months
The last curl command is used to fetch the root hints, to query hosts that are not cached. See this section of the ArchWiki page for more info.
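For the optional six-month refresh, a cron entry along these lines could do the job. This is only a sketch: the file name /etc/cron.d/unbound-root-hints and the schedule (04:00 on the 1st of January and July) are my own assumptions. It downloads to a temporary name first so that a failed download does not overwrite the existing file.

```
# /etc/cron.d/unbound-root-hints (hypothetical file name)
# m h dom mon  dow user command
0 4 1 1,7 * root curl -sSf -o /etc/unbound/root.hints.new https://www.internic.net/domain/named.cache && mv /etc/unbound/root.hints.new /etc/unbound/root.hints
```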
Then, you can create a basic configuration file for unbound.
$ cat /etc/unbound/unbound.conf.d/local.conf
server:
    username: "unbound"
    interface: 0.0.0.0 # Listen on all interfaces
    root-hints: "/etc/unbound/root.hints"
    access-control: 192.168.0.0/16 allow # Only allow clients from the local network, see "Example" section in https://www.unbound.net/documentation/unbound.conf.html
Then, start the Unbound service and enable it at startup:
$ sudo systemctl start unbound && sudo systemctl enable unbound
You can now change the resolver to be used on your Raspberry Pi and on your whole network. To change it on your Raspberry Pi, have a look at this wiki page (for Raspbian). To set it as the default DNS resolver on your network, have a look at your router configuration, and set the DNS resolver address to the address of your Raspberry Pi. Don’t forget to open the port in your firewall (53, both TCP and UDP).
To check everything is working fine, you can use dig (from the dnsutils package on Debian-based distributions). Typically, dig google.fr should give you some results (in the ANSWER SECTION) and the IP address in the SERVER line should be the one of your Raspberry Pi.
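The SERVER check can be scripted. Below is a minimal sketch, assuming dig prints the ";; SERVER: ADDRESS#PORT(...)" line in the same format as in the outputs shown in this article; dig_server is a made-up helper name.

```sh
#!/bin/sh
# Print the resolver address found in the ";; SERVER:" line of dig output
# read from stdin (hypothetical helper, for illustration only).
dig_server() {
    sed -n 's/^;; SERVER: \([^#]*\)#.*/\1/p'
}

# Demo on a canned line, so that this runs without any network access.
printf ';; SERVER: 192.168.0.1#53(192.168.0.1)\n' | dig_server   # prints 192.168.0.1

# Real usage would be:
#   dig google.fr | dig_server
```

If the printed address is not the one of your Raspberry Pi, the client is still using another resolver.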
Note: At this point, we should emphasize that having an open DNS resolver (that is, a DNS resolver that answers anyone) can be a security risk, especially since some DDoS attacks rely on such resolvers. Therefore, you should make sure that your Raspberry Pi DNS server is only accessible from your local network, and that no third party has access to it. This should be done through the access-control line in the above configuration, but it can also be enforced by the firewall running on your Raspberry Pi and the firewall on your router (typically, most routers provided by ISPs block any incoming connection, but check this).
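As an example of the firewall side, here is a sketch that only prints iptables rules restricting a port to the LAN, so that you can review them before running anything. It assumes you use iptables and that your LAN is 192.168.0.0/16; lan_only_rules is a made-up helper, and the rules may need reordering to fit an existing ruleset.

```sh
#!/bin/sh
# Print (do not run) iptables rules accepting a port from the LAN only.
# Usage: lan_only_rules SUBNET PORT PROTO...   (hypothetical helper)
lan_only_rules() {
    subnet="$1"
    port="$2"
    shift 2
    for proto in "$@"; do
        # Accept packets coming from the LAN, drop everything else on that port.
        printf 'iptables -A INPUT -p %s --dport %s -s %s -j ACCEPT\n' "$proto" "$port" "$subnet"
        printf 'iptables -A INPUT -p %s --dport %s -j DROP\n' "$proto" "$port"
    done
}

# DNS uses port 53 over both UDP and TCP.
lan_only_rules 192.168.0.0/16 53 udp tcp
```

Once the printed rules look right, they can be run as root one by one.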
Block some domains based on hosts
Now, we would like unbound to block some domains that are known to serve ads and malware, in a similar way as Pi-Hole does. For this purpose, we will use the unbound-block-hosts script to import hosts files into the Unbound configuration. Basically, for every such domain, Unbound will return 127.0.0.1.
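To make the idea concrete, here is a minimal sketch of such a conversion. It is not the unbound-block-hosts script, only an illustration assuming a plain "IP domain" hosts format; the function name hosts_to_unbound and the domain ads.example.com are made up for the example.

```sh
#!/bin/sh
# Convert hosts-file lines read from stdin ("IP domain") into Unbound
# local-zone/local-data entries answering with the given address.
# Usage: hosts_to_unbound [ADDRESS]   (hypothetical helper, default 127.0.0.1)
hosts_to_unbound() {
    addr="${1:-127.0.0.1}"
    awk -v addr="$addr" '
        { sub(/\r$/, "") }                  # tolerate Windows line endings
        /^[ \t]*#/ || NF < 2 { next }       # skip comments and blank lines
        $2 == "localhost" { next }          # keep localhost resolvable
        {
            printf "local-zone: \"%s\" redirect\n", $2
            printf "local-data: \"%s A %s\"\n", $2, addr
        }'
}

# Demo on a tiny made-up hosts file.
hosts_to_unbound 192.168.0.1 <<'EOF'
# sample hosts file
127.0.0.1 localhost
127.0.0.1 ads.example.com
EOF
```

The demo prints a local-zone line and a local-data line for ads.example.com, which is the kind of entry Unbound then answers from instead of resolving the domain.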
unbound-block-hosts is designed with Dan Pollock’s hosts file in mind, whereas I wanted to be able to import any hosts file into Unbound. Here is a forked and patched version for this purpose (very ugly patch, as I am not fluent in Perl :/).
We will create an includes directory in the Unbound configuration directory (mkdir /etc/unbound/includes/), and include the rules in the main configuration by appending include: "/etc/unbound/includes/*.conf" to the /etc/unbound/unbound.conf.d/local.conf file previously created.
Now, you can run ./unbound-block-hosts --url="SOME_URL" --file=/etc/unbound/includes/FOOBAR-blocking.conf to generate a matching configuration for a given hosts list. Typically, I have a script doing:
#!/bin/sh
set -e
cd "$(dirname "$0")"
echo "Fetch Malware domains list and append to unbound"
./unbound-block-hosts --url="http://www.malwaredomainlist.com/hostslist/hosts.txt" --file=/etc/unbound/includes/malwaredomainlist-blocking.conf --address="YOUR_RASPBERRY_PI_IP"
echo "Fetch Yoyo ad servers list and append to unbound"
curl "https://pgl.yoyo.org/adservers/serverlist.php?hostformat=unbound;showintro=0&mimetype=plaintext" > /etc/unbound/includes/yoyoadservers-blocking.conf
systemctl reload unbound
This script is crontask-ed to run every day. The default address is 127.0.0.1, which means the local host for the client machine. I do not want to have too many 404s on my local web servers, so I’d rather put the IP address of the Raspberry Pi and have a web server answering a 404 on it.
Install a webserver
As simple as
sudo apt-get install nginx
Install and configure Privoxy
Now, you can install Privoxy:
sudo apt-get install privoxy
The default configuration should be mostly ok. You can look at the /etc/privoxy/config file to adapt it to your needs (the file is really an example of a well documented config file). Two options you might be interested in changing are the debug option (to enable logging, which is disabled by default) and listen-address. You will want to set the latter to:
listen-address 127.0.0.1:8118
listen-address YOUR_RASPBERRY_PI_IP:8118
so that the Privoxy proxy is accessible from the rest of your LAN. As always, do not forget to configure your firewall to let the Privoxy connections pass through. At this point, you should try to set the proxy in your browser’s preferences and check that everything is working fine. You should be able to browse to any web page, but the proxy will not do anything else for the moment.
Note: At this point, we should emphasize that having such a proxy is a security risk, as anyone having access to your proxy can browse the web with your IP address (and you may be held liable for anything illegal done with it). Therefore, you should make sure that your Raspberry Pi Privoxy is only accessible from your local network, and that no third party has access to it. This should be enforced by the firewall running on your Raspberry Pi and the firewall on your router (typically, most routers provided by ISPs block any incoming connection, but check this).
Privoxy, as installed by the Raspbian package, enables a couple of filters out of the box. As we will be translating Adblock rules into Privoxy rules, we can disable them. Edit the /etc/privoxy/match-all.action file to get something like this:
#############################################################################
# Id: match-all.action,v
#
# This file contains the actions that are applied to all requests and
# may be overruled later on by other actions files. Less experienced
# users should only edit this file through the actions file editor.
#
#############################################################################
{ \
+change-x-forwarded-for{block} \
+client-header-tagger{css-requests} \
+client-header-tagger{image-requests} \
+filter{refresh-tags} \
+filter{webbugs} \
+filter{jumping-windows} \
+filter{ie-exploits} \
+hide-from-header{block} \
+hide-referrer{conditional-block} \
}
/ # Match all URLs
In particular, I disabled the filters img-reorder (which is really intensive for the Raspberry Pi, and takes a few hundred milliseconds to process a regular page) and banners-by-size, as we will be importing Adblock rules which should give better results. deanimate-gifs and session-cookies-only are a matter of taste (respectively, they prevent animated GIFs by replacing them with their last frame, and only allow temporary cookies).
Block ads using Privoxy
We will now be importing Adblock rules into Privoxy. One way to do it is to use this Haskell script.
To install it directly on your Raspberry Pi, provided you have a recent Raspberry Pi:
- Install Haskell Stack http://allocinit.io/haskell/haskell-on-raspberry-pi-3/ (if you are feeling adventurous)
- Install adblock2privoxy https://projects.zubr.me/wiki/adblock2privoxy#from-sources
This was not my case, so I set up a builder on my server to run daily. Some rules are provided by the author and my builds are available here.
The way to set up the resulting files into Privoxy is very well detailed on the page of the project.
Note: My builds are made with the Element Hiding feature and example.com as the domainCSS parameter. You should replace any occurrence of example.com by the FQDN or IP address of your Raspberry Pi when importing them. Please do not put too much load on my hosted builds and consider hosting your own.
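The replacement of example.com can be done with sed. Here is a minimal sketch; set_css_domain is a made-up helper, and the sample line fed to it is only an illustration, not an actual line from the builds.

```sh
#!/bin/sh
# Replace every occurrence of example.com in the text read from stdin
# by the given FQDN or IP address (hypothetical helper).
set_css_domain() {
    sed "s/example\\.com/$1/g"
}

# Demo on a made-up line.
printf 'http://example.com/ab2p.css\n' | set_css_domain 192.168.0.1
# prints http://192.168.0.1/ab2p.css

# Real usage would be, for each downloaded file:
#   set_css_domain YOUR_RASPBERRY_PI_IP < DOWNLOADED_FILE > /etc/privoxy/TARGET_FILE
```

Note that this blindly rewrites every occurrence, which matches the instruction above but would also touch any rule that legitimately mentions example.com.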
Benchmark
All the tests were made with a first-generation Raspberry Pi with 512MB of RAM (model 1B). The Raspberry Pi has wired access to the internet (its Ethernet port is limited to 100 Mbps). My laptop is wired as well (gigabit Ethernet). The home internet connection is a fiber access (923 Mbps download, 250 Mbps upload, as reported by DSLReports).
Testing the DNS server
Here are the query times without and with the DNS server on the Pi:
$ # Using my ISP resolver
$ dig @192.168.0.254 example.com
...
;; Query time: 7 msec
;; SERVER: 192.168.0.254#53(192.168.0.254)
$ # Using Google DNS
$ dig @8.8.8.8 example.com
...
;; Query time: 5 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
$ # Using the DNS resolver on my Raspberry Pi
$ dig @192.168.0.1 example.com
...
;; Query time: 505 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
$ # Using it another time, now that the domain is in cache
$ dig @192.168.0.1 example.com
...
;; Query time: 5 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
These are typical times: each value is roughly the average obtained over a few runs.
We can see that there is some overhead when first accessing a domain, as the Pi has to do the full DNS resolution. Afterwards, the domain is kept in cache and it is as fast to use the DNS server from the Pi as it is to use any other one.
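Averaging over a few runs can be scripted as well. The sketch below is not the exact procedure used for the numbers above; it assumes the ";; Query time: N msec" line format shown in the dig outputs, and avg_query_time is a made-up helper.

```sh
#!/bin/sh
# Average the ";; Query time: N msec" values found in dig output read from
# stdin, truncated to an integer number of milliseconds (hypothetical helper).
avg_query_time() {
    awk '/Query time:/ { sum += $4; n++ }
         END { if (n) printf "%d\n", sum / n }'
}

# Demo on canned values (7 ms and 5 ms), so that this runs offline.
printf ';; Query time: 7 msec\n;; Query time: 5 msec\n' | avg_query_time   # prints 6

# Real usage would be:
#   for i in 1 2 3 4 5; do dig @192.168.0.1 example.com; done | avg_query_time
```

Keep in mind that only the first query for a domain is uncached, so cached and uncached timings should be averaged separately.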
Testing the Privoxy setup
Now, let us focus on the performance of Privoxy on the Raspberry Pi. I tested it with a few websites, and results were roughly the same. Here is a detailed example with the website of Libération, a French newspaper. This example is interesting as my µBlock setup on my laptop blocks 23 different things when I use neither the DNS server nor the Privoxy proxy. It is also an interesting example as, out of the 23 blocked contents, only 14 could be blocked by DNS (with the setup described above).
The main issue here is that Privoxy takes a very long time to process the page with all the filters, and it is way too heavy for my low-power first-generation Raspberry Pi.
The main HTML document for this page takes 7 seconds to load when passing through the proxy, mainly due to the processing time. When reloading the page, it only takes 400ms as it is already in cache. As a comparison, it takes only 24ms when loading it directly.
The complete setup looks equivalent to the µBlock setup on my laptop.
I don’t have a more recent version of the Raspberry Pi (typically a Raspberry Pi 3) to test how such a more powerful system performs. If you can try it, let me know: I am curious about the way it handles the load, and I could publish an edit to this article.
EDIT: A related article of interest is https://www.shaftinc.fr/blocage-pubs-unbound.html.