“You are not selfhosting”… that is more or less what Laurent Guerby (founder of Tetaneutral.net) told me last week after a presentation at the ENS. At first, it surprised me, because I am selfhosting. I have my own server to serve this blog, handle my emails, connect to IRC, handle my XMPP account, provide me with fully working network access, and so on. Basically, every part of my life on the Internet passes somehow through my server, so I am tempted to consider that I do selfhost.
Plus, I wrote some notes (in French) about the checklist to go through before selfhosting and the first step in selfhosting. What is the status of these documents if I am indeed not selfhosting?
“You need two machines to selfhost, not one”
Actually, most people forget it when they talk about selfhosting, but to fully selfhost, you need two machines, not just one. Indeed, if you use cloud services from Google, Facebook, etc., your data is replicated across their datacenters. It is stored multiple times, in multiple places around the world, and this limits the risk of data loss (whether temporary, due to network issues, or permanent).
On the contrary, if you selfhost on a single server, your data is stored in only one place, on one machine. This means that any problem with this machine will kill your entire digital life. Some people use two servers, but in the same place (a Synology and a Raspberry Pi at home, for instance). In this case, you do have data replication and can indeed recover from a hardware failure, but you will run into trouble if there is a site-wide problem, like a fire or a simple network outage.
Imagine you have a backup MX on a second machine behind the same network access. If one of the servers experiences problems, the other one will be able to handle email delivery; but if the problem is the network access itself, the backup MX is useless.
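For reference, a backup MX is just a second MX record with a lower priority (higher preference value) in your DNS zone. A minimal sketch, with hypothetical hostnames:

```zone
; DNS zone fragment for example.org (hostnames are hypothetical).
; The lower preference value is tried first; the other is the fallback.
example.org.  IN  MX  10 mail.example.org.     ; primary, on my network access
example.org.  IN  MX  20 mx2.friend-host.net.  ; backup MX, on a DIFFERENT network
```

The whole point is that `mx2` must sit behind a different network access than `mail`, otherwise both records fail together.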
The same thing happens if you have a fire at home and all your devices (the backed-up machine plus the server storing the backups) are there. You will simply lose everything.
So, if you care about the services you rely on, you should have two servers in distinct places. Otherwise, you will not have the same quality of service as with the standard cloud, and selfhosting becomes much less interesting and not worth the price. Note that you do not need to actually own two machines: you can also use part of a friend’s server, as discussed below.
For my part, I back up my server daily to a friend’s server (encrypted backups, so no privacy concerns), and we set up some redundancy, such as backup MX records. This way, I effectively have two machines, my downtime is reduced, and I can mitigate problems with my main server.
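The scheme can be sketched in a few lines of shell: archive, encrypt locally, and ship only the ciphertext off-site, so the friend storing it cannot read anything. This is a minimal sketch, not my actual script; all paths, the passphrase, and the remote hostname are hypothetical.

```shell
#!/bin/sh
# Sketch of a daily encrypted backup to a friend's server.
# Everything here (paths, passphrase, hostname) is a placeholder.
set -eu

WORK=$(mktemp -d)
SRC="$WORK/data"                      # stand-in for the real data to back up
mkdir -p "$SRC" && echo "hello" > "$SRC/note.txt"

ARCHIVE="$WORK/backup-$(date +%Y-%m-%d).tar.gz"
tar -czf "$ARCHIVE" -C "$WORK" data

# Symmetric encryption BEFORE anything leaves the machine:
# the friend hosting the file only ever sees ciphertext.
PASSFILE="$WORK/passphrase"
echo "correct horse battery staple" > "$PASSFILE"   # hypothetical passphrase
gpg --batch --yes --pinentry-mode loopback --passphrase-file "$PASSFILE" \
    --symmetric --cipher-algo AES256 -o "$ARCHIVE.gpg" "$ARCHIVE"
rm "$ARCHIVE"                         # never leave the cleartext archive around

ls "$ARCHIVE.gpg"
# Only the encrypted file gets shipped off-site, e.g.:
#   rsync -e ssh "$ARCHIVE.gpg" friend.example.net:backups/
```

Since only the passphrase holder can decrypt, the friend’s server needs no more trust than any random disk.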
“You need to be able to move quickly”
Another problem, often neglected (starting with my own notes), is the ability to move quickly to another place, and to fully disappear (so that no one can recover anything from your server). This, of course, depends on your hosting provider: if you have a one-year contract with them, it might not be that easy (or cheap, at least) to move to another server in the end.
The point of being able to move quickly is:
- To be able to move if you have trouble with your hosting provider, so that you are fully free and selfhosted, not tied to your hosting provider.
- To be able to set up a new server very quickly in case of problems (electrical failure, network problem, hard-drive failure, …), to reduce downtime.
Reconfiguring a server is a time-consuming task and a real pain, so you will want to do it as rarely as possible, which in turn increases your dependency on your hosting provider.
People handling thousands of servers in the cloud use tools like Puppet, Ansible, Chef, or simple shell scripts to spawn new machines on demand and replicate their configuration across all their machines. Even if you have only one or two machines, you should consider using them: they are rather easy to use and not reserved for fleets of thousands of machines. They will ease configuration, let you version it, and let you spawn a new server very quickly. Combined with data backups, you can virtually spawn a working new server in a few minutes, anywhere you want, anytime you want.
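The core idea behind all these tools is idempotence: describe the desired state in versioned files, and applying the description twice changes nothing the second time. A toy sketch of that idea in plain shell (real tools like Ansible do this far better; the paths here are hypothetical, and the target root is faked with a temporary directory):

```shell
#!/bin/sh
# Toy "provisioning" sketch: deploy versioned config files idempotently.
# ROOT stands in for the new server's filesystem root (hypothetical).
set -eu

ROOT="${ROOT:-$(mktemp -d)}"

install_file() {                      # copy only if missing or different
    src="$1"; dst="$ROOT/$2"
    mkdir -p "$(dirname "$dst")"
    if [ ! -f "$dst" ] || ! cmp -s "$src" "$dst"; then
        cp "$src" "$dst" && echo "changed: $2"
    fi
}

# In real life this file would live in a versioned config repository;
# here it is generated on the fly for the demo.
CONF=$(mktemp)
printf 'PermitRootLogin no\n' > "$CONF"

install_file "$CONF" etc/ssh/sshd_config   # first run: prints "changed: …"
```

Run the script a second time against the same `ROOT` and it prints nothing: the machine is already in the described state, which is exactly what lets you respawn a server anywhere from the same repository.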
Another point to consider is the ability to disappear quickly and prevent anyone from recovering your data, especially if you rent a server from your hosting provider. Such servers are not necessarily wiped between two customers, and the next one could potentially recover sensitive data (like emails) from the hard drive. A simple solution to this (at the cost of a bit of computational power) is to encrypt your disk, and to delete the sectors storing the keys before returning the server. This way, your hard drive will just look like garbage to the next owner, who will not be able to recover anything from it.
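With LUKS on Linux, for instance, this comes down to a couple of commands. This is a destructive sketch, not something to copy-paste: `/dev/sdX` is a placeholder, and `luksErase` irreversibly destroys access to the data.

```shell
# DESTRUCTIVE sketch; /dev/sdX is a placeholder device. Do not run as-is.
cryptsetup luksFormat /dev/sdX    # encrypt the disk when setting up the server
cryptsetup open /dev/sdX data     # unlock it at each boot (asks the passphrase)

# Before returning the server: wipe all key slots in the LUKS header.
# Without the keys, the rest of the disk is indistinguishable from random data.
cryptsetup luksErase /dev/sdX
```

Destroying only the key material is nearly instantaneous, whereas overwriting a whole multi-terabyte disk can take hours.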
Moreover, here are some other thoughts that came to me while I was thinking about the previous points.
Mutualization is not centralization
Most of the time, when we speak about selfhosting, we speak about self, which means the whole debate is focused on the user. We often oppose selfhosting (decentralization) to centralization, and focus only on the setup of one server for one user. This can be clearly seen in projects like Yunohost, which allows any user to easily set up their own personal server on an old machine or a Raspberry Pi, but offers nothing beyond that.
However, mutualization is not centralization, and selfhosted users could benefit a lot from mutualization. Distributions like Yunohost should build a “network” of users and offer them mutualization capabilities. For instance, one should be able to easily dedicate 10% of one’s hard drive to other users’ backups, and benefit in return from the ability to replicate one’s own backups on other servers. This way, by sharing a bit of storage, one gets a distributed backup store, resilient to network failures and server issues. This is especially easy as we already have technologies for distributed, encrypted file storage across multiple servers, such as Tahoe-LAFS. But such solutions are not promoted at all by distributions like Yunohost…
Indeed, maintaining two servers is quite a pain, and might cost a lot, especially if one of them is only there for backups. On the contrary, you might know someone who has a server and who is ready to exchange a bit of storage, bandwidth, and CPU power to mirror your services and handle backups. This network of trusted users (or not necessarily trusted ones, if you use encryption and distributed file storage) is a key element of selfhosting, in my opinion, and should be promoted and made easy.
Trust in the server
Last but not least, as soon as you rent a server from someone else, you need to trust them to give you the server they advertised, and not to abuse their power. If you pay for a server, you can expect to get the server you ordered, or the hosting provider’s reputation will suffer if the deception becomes known. But what about the configuration they set up for you? What about judicial requests (or less official ones)? Will they simply hand over the information, or access to your server in the datacenter, or will they warn you first? These are things that should be considered, with more or less care depending on why you want your server and the way you use it. Such concerns should not be brushed aside, and should be taken into account (although the situation will most likely still be better than with a cloud provider, in any case).
Possible mitigations for this point are to fully selfhost your server (which means buying a server or recycling an old PC and hosting it at home, if your network connection is good enough, or renting a rack in a datacenter), but this can cost much more. Another solution is to look at associative hosting providers, such as Tetaneutral.net, which offer to host your server no matter its format. As it is your machine, you know the hardware, and you can add the security level you want (such as killswitches and so on).