Parallel GPG decryption
Posted on January 14, 2023 in Dev • 3 min read
I recently had to decrypt about 500k asymmetrically GPG-encrypted strings and found that it can take quite a lot of time, while being less straightforward than one might think to process concurrently to accelerate the computation. Here is a quick workaround for parallel GPG decryption.
I was initially using `python-gnupg` in a simple Python `for` loop:
```python
import gc
import getpass
import logging
import shutil
import tempfile
from pathlib import Path

import gnupg


def _decrypt_text(encrypted_text, gpg):
    decrypted = gpg.decrypt(encrypted_text)
    assert decrypted.ok, 'Unable to decrypt!'
    return decrypted.data.decode()


private_key_path = '...'  # Path to the exported private key to use
encrypted_texts = [...]  # List of encrypted texts to decrypt

try:
    gnupghome = tempfile.mkdtemp()
    gpg = gnupg.GPG(gnupghome=gnupghome)
    private_key_path = str(Path(private_key_path).expanduser())
    logging.info('Importing private key from %s...', private_key_path)
    with open(private_key_path, 'rb') as fh:
        private_key = gpg.decrypt_file(fh, passphrase=getpass.getpass('Private key passphrase?'))
    assert private_key.ok, 'Unable to decrypt private key with provided passphrase!'
    import_key = gpg.import_keys(private_key.data)
    assert import_key.count > 0, 'No private key could be imported!'
    logging.info('The following %s keys were successfully imported: %s', import_key.count, ','.join(set(import_key.fingerprints)))
    decrypted_texts = [
        _decrypt_text(encrypted_text, gpg) for encrypted_text in encrypted_texts
    ]
finally:
    shutil.rmtree(gnupghome)
    gc.collect()
```
This was taking a substantial amount of time and (obviously) running on a single CPU core. Extrapolating from the run time to process 1k records (about 3 minutes), the full 500k would take approximately 25 hours.
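For scale, that serial estimate works out to roughly 180 ms per record (a quick back-of-the-envelope check, using the 25-hour figure above):

```python
total_records = 500_000
total_hours = 25  # serial estimate from the extrapolation above
ms_per_record = total_hours * 3600 * 1000 / total_records
print(ms_per_record)  # 180.0
```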
So I thought about using as many CPU cores as possible, but parallel processing turned out not to be as simple as wrapping my `for` loop with `joblib`. This is because GPG, since version 2, relies heavily on `gpg-agent`, which is a single process and acts as a bottleneck here. I'm not very familiar with the GPG code base, but there are a couple of issues around on this topic (this one for instance).
A quick workaround is to run multiple `gpg-agent` processes in parallel. This is easily achieved in practice since GPG creates one `gpg-agent` process per `GNUPGHOME`. So I could simply initialize a pool of `GPG` instances and rotate among them for parallel processing:
```python
import gc
import getpass
import logging
import shutil
import tempfile
from pathlib import Path

import gnupg
from joblib import Parallel, delayed

# _decrypt_text is the same helper as defined above

private_key_path = '...'  # Path to the exported private key to use
encrypted_texts = [...]  # List of encrypted texts to decrypt
n_jobs = 100  # Needs to be tuned to your machine specs

gpg_workers = {}
try:
    gpg = gnupg.GPG()
    private_key_path = str(Path(private_key_path).expanduser())
    with open(private_key_path, 'rb') as fh:
        private_key = gpg.decrypt_file(fh, passphrase=getpass.getpass('Private key passphrase?'))
    assert private_key.ok, 'Unable to decrypt private key with provided passphrase!'
    for i in range(n_jobs):
        tmp_gnupghome = tempfile.mkdtemp()
        logging.info('Using temporary GNUPGHOME %s...', tmp_gnupghome)
        gpg = gnupg.GPG(gnupghome=tmp_gnupghome)
        logging.info('Importing private key from %s...', private_key_path)
        import_key = gpg.import_keys(private_key.data)
        assert import_key.count > 0, 'No private key could be imported!'
        logging.info('The following %s keys were successfully imported: %s', import_key.count, ','.join(set(import_key.fingerprints)))
        gpg_workers[tmp_gnupghome] = gpg
    gpg_worker_pool = list(gpg_workers.values())
    decrypted_texts = Parallel(n_jobs=n_jobs)(
        delayed(_decrypt_text)(
            encrypted_text,
            gpg_worker_pool[i % n_jobs]
        )
        for i, encrypted_text in enumerate(encrypted_texts)
    )
finally:
    for tmp_gnupghome in gpg_workers:
        shutil.rmtree(tmp_gnupghome)
    gc.collect()
```
With this workaround, I was able to process the full 500k records in less than 15 minutes.
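Stripped of the crypto, the pool-and-rotate pattern can be sketched with plain stdlib stand-ins (a generic illustration, not the exact `joblib` call above; all names here are hypothetical, and `fake_decrypt` just mimics the shape of `_decrypt_text`):

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def fake_decrypt(text, worker):
    # Stand-in for _decrypt_text: a real worker would be a gnupg.GPG instance
    return f'{worker}:{text[::-1]}'

n_jobs = 4
workers = [f'agent-{i}' for i in range(n_jobs)]  # stand-ins for the GPG pool
rotation = itertools.cycle(workers)              # round-robin, like i % n_jobs

texts = [f'msg{i}' for i in range(10)]
with ThreadPoolExecutor(max_workers=n_jobs) as executor:
    # map zips the texts with the cycling worker pool, so task i
    # is handled by worker i % n_jobs
    results = list(executor.map(fake_decrypt, texts, rotation))
```

The same round-robin assignment works whether the workers are threads, processes, or, as here, independent `GNUPGHOME` directories each backed by their own `gpg-agent`.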
Note: The goal of this snippet is to demonstrate how to quickly get parallel decryption of GPG-encrypted texts. Much can be done to handle passphrases and secrets in a more secure manner, but this is beyond the scope of this article.