Parallel GPG decryption
Posted on January 14, 2023 in Dev • 3 min read
I recently had to decrypt about 500k asymmetrically GPG-encrypted strings and found that it can take quite a lot of time, while being less straightforward than one might think to process concurrently to accelerate the computation. Here is a quick workaround for parallel GPG decryption.
I was initially using `python-gnupg` in a simple Python `for` loop:
```python
import gc
import getpass
import logging
import shutil
import tempfile
from pathlib import Path

import gnupg


def _decrypt_text(encrypted_text, gpg):
    decrypted = gpg.decrypt(encrypted_text)
    assert decrypted.ok, 'Unable to decrypt!'
    return decrypted.data.decode()


private_key_path = '...'  # Path to the exported private key to use
encrypted_texts = [...]  # List of encrypted texts to decrypt

try:
    gnupghome = tempfile.mkdtemp()
    gpg = gnupg.GPG(gnupghome=gnupghome)
    private_key_path = str(Path(private_key_path).expanduser())
    logging.info('Importing private key from %s...', private_key_path)
    with open(private_key_path, 'rb') as fh:
        private_key = gpg.decrypt_file(fh, passphrase=getpass.getpass('Private key passphrase?'))
    assert private_key.ok, 'Unable to decrypt private key with provided passphrase!'
    import_key = gpg.import_keys(private_key.data)
    assert import_key.count > 0, 'No private key could be imported!'
    logging.info('The following %s keys were successfully imported: %s', import_key.count, ','.join(set(import_key.fingerprints)))
    decrypted_texts = [
        _decrypt_text(encrypted_text, gpg) for encrypted_text in encrypted_texts
    ]
finally:
    shutil.rmtree(gnupghome)
    gc.collect()
```
This was taking a substantial amount of time and (obviously) running on a single CPU core. Extrapolating from the run time to process 1k records (about 3 minutes), the full 500k would take approximately 25 hours.
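For scale, that serial estimate works out to roughly 180 ms per record (a quick back-of-the-envelope check, using the 25-hour figure above):

```python
total_records = 500_000
total_hours = 25  # serial estimate from the extrapolation above
ms_per_record = total_hours * 3600 * 1000 / total_records
print(ms_per_record)  # 180.0
```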
So I thought about using as many CPU cores as possible, but parallel processing turned out not to be as simple as wrapping my `for` loop with `joblib`. This is because GPG, since version 2, relies heavily on `gpg-agent`, which is a single process and acts as a bottleneck here. I'm not very familiar with the GPG code base, but there are a couple of issues around on this topic (this one for instance).
A quick workaround is to run multiple `gpg-agent` processes in parallel. This is easily achieved in practice since GPG creates one `gpg-agent` process per `GNUPGHOME`. So I could simply initialize a pool of `GPG` instances and rotate among them for parallel processing:
```python
import gc
import getpass
import logging
import shutil
import tempfile
from pathlib import Path

import gnupg
from joblib import Parallel, delayed

# _decrypt_text is the same helper as defined above

private_key_path = '...'  # Path to the exported private key to use
encrypted_texts = [...]  # List of encrypted texts to decrypt
n_jobs = 100  # Needs to be tuned to your machine specs

gpg_workers = {}
try:
    gpg = gnupg.GPG()
    private_key_path = str(Path(private_key_path).expanduser())
    with open(private_key_path, 'rb') as fh:
        private_key = gpg.decrypt_file(fh, passphrase=getpass.getpass('Private key passphrase?'))
    assert private_key.ok, 'Unable to decrypt private key with provided passphrase!'
    for i in range(n_jobs):
        tmp_gnupghome = tempfile.mkdtemp()
        logging.info('Using temporary GNUPGHOME %s...', tmp_gnupghome)
        gpg = gnupg.GPG(gnupghome=tmp_gnupghome)
        logging.info('Importing private key from %s...', private_key_path)
        import_key = gpg.import_keys(private_key.data)
        assert import_key.count > 0, 'No private key could be imported!'
        logging.info('The following %s keys were successfully imported: %s', import_key.count, ','.join(set(import_key.fingerprints)))
        gpg_workers[tmp_gnupghome] = gpg
    gpg_worker_pool = list(gpg_workers.values())
    decrypted_texts = Parallel(n_jobs=n_jobs)(
        delayed(_decrypt_text)(
            encrypted_text,
            gpg_worker_pool[i % n_jobs]
        )
        for i, encrypted_text in enumerate(encrypted_texts)
    )
finally:
    for tmp_gnupghome in gpg_workers:
        shutil.rmtree(tmp_gnupghome)
    gc.collect()
```
With this workaround, I was able to process the full 500k records in less than 15 minutes.
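Stripped of the crypto, the pool-and-rotate pattern can be sketched with plain stdlib stand-ins (a generic illustration, not the exact `joblib` call above; all names here are hypothetical, and `fake_decrypt` just mimics the shape of `_decrypt_text`):

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def fake_decrypt(text, worker):
    # Stand-in for _decrypt_text: a real worker would be a gnupg.GPG instance
    return f'{worker}:{text[::-1]}'

n_jobs = 4
workers = [f'agent-{i}' for i in range(n_jobs)]  # stand-ins for the GPG pool
rotation = itertools.cycle(workers)              # round-robin, like i % n_jobs

texts = [f'msg{i}' for i in range(10)]
with ThreadPoolExecutor(max_workers=n_jobs) as executor:
    # map zips the texts with the cycling worker pool, so task i
    # is handled by worker i % n_jobs
    results = list(executor.map(fake_decrypt, texts, rotation))
```

The same round-robin assignment works whether the workers are threads, processes, or, as here, independent `GNUPGHOME` directories each backed by their own `gpg-agent`.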
Note: The goal of this snippet is to demonstrate how to quickly get parallel decryption of GPG-encrypted texts. Much can be done to handle passphrases and secrets in a more secure manner, but this is beyond the scope of this article.