How to Decrypt PGP-Encrypted Files in Databricks

As a data engineer, you may come across a project where you need to decrypt PGP-encrypted files before you can read the data and apply transformation logic to it.

This typically happens on projects that handle highly confidential data, such as employee information management or payment management.

This blog post shows you how to do that in Databricks using shell commands. So, let’s get started.

What you need to decrypt the files:

  • Private key: the key that allows you to decrypt the files. It should be in ASCII-armored .asc format. It is generated together with the matching public key and its passphrase when the key pair is created, and the files are encrypted with that public key.
  • Passphrase: a word or phrase that protects the private key file. It prevents unauthorized users from using the private key to decrypt the files.

Both the private key and the passphrase should be provided by the party who set up the encryption.
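Before running the decryption, a quick pre-flight check can catch two common setup mistakes: uploading the public key instead of the private key, and a passphrase file whose first line is empty. The sketch below is a hypothetical helper (check_decryption_inputs is not part of GnuPG or Databricks); it only inspects the two files and never calls gpg. It assumes the key is ASCII-armored, so a private key contains a "-----BEGIN PGP PRIVATE KEY BLOCK-----" line, and relies on gpg --passphrase-file reading only the first line of the passphrase file.

```shell
# Hypothetical pre-flight check for the decryption inputs (does not call gpg).
# Arguments: path to the private key (.asc), path to the passphrase file.
# Prints "ok" on success; prints a reason to stderr and returns 1 otherwise.
check_decryption_inputs() {
  local key_file="$1" pass_file="$2"

  # The private key file must exist.
  if [ ! -f "$key_file" ]; then
    echo "missing private key: $key_file" >&2
    return 1
  fi

  # An ASCII-armored private key contains this armor header;
  # a public key would say "PUBLIC KEY BLOCK" instead.
  if ! grep -q -- '-----BEGIN PGP PRIVATE KEY BLOCK-----' "$key_file"; then
    echo "not an ASCII-armored private key: $key_file" >&2
    return 1
  fi

  # gpg --passphrase-file reads only the first line, so it must not be empty.
  if [ ! -s "$pass_file" ] || [ -z "$(head -n 1 "$pass_file")" ]; then
    echo "passphrase file is missing or its first line is empty: $pass_file" >&2
    return 1
  fi

  echo "ok"
}
```

In a %sh cell, define the function and call it in the same cell, for example: check_decryption_inputs /dbfs/mnt/Ization/PrivateKey.asc /dbfs/mnt/Ization/Passphrase.txt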

Steps to decrypt the files and store the decrypted files in the Datalake:

  1. Upload PrivateKey.asc to the Datalake.
  2. Create a .txt file containing the passphrase (on its first line) and upload it to the Datalake.
  3. Create a folder in the Datalake to store the decrypted files.
  4. Open a Databricks notebook.
  5. Run the following shell script in a notebook cell:
%sh
# Import the private key once, before looping over the files.
gpg --no-tty --batch --import /dbfs/mnt/Ization/PrivateKey.asc

# Loop over each .csv.pgp file in the source folder.
for entry in /dbfs/mnt/Ization/EncryptedFiles/*.csv.pgp
do
  # Skip the literal pattern if no file matches.
  [ -e "$entry" ] || continue

  # Name the decrypted file by stripping the trailing .pgp (e.g. sales.csv.pgp -> sales.csv).
  output=$(basename "$entry" .pgp)

  # Decrypt non-interactively, reading the passphrase from the first line of Passphrase.txt.
  # --ignore-mdc-error allows legacy files without integrity protection; remove it if your files do not need it.
  gpg --no-tty --batch --yes --ignore-mdc-error --pinentry-mode=loopback \
    --passphrase-file /dbfs/mnt/Ization/Passphrase.txt \
    --output "/dbfs/mnt/Ization/DecryptedFiles/$output" \
    --decrypt "$entry"
done

The script above decrypts .csv.pgp files. To decrypt .xlsx.pgp files instead, replace .csv with .xlsx in the code.
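After the loop finishes, it helps to confirm that every encrypted file produced a non-empty decrypted file, because a failed gpg call inside the loop does not stop the loop. Below is a minimal sketch with hypothetical helper names (decrypted_name and report_missing_outputs are not part of any library), assuming decrypted files are named by stripping only the trailing .pgp. Cutting a name at its first dot instead would turn report.2024.csv.pgp into report.csv, so files such as report.2024.csv.pgp and report.2025.csv.pgp would overwrite each other; stripping only .pgp keeps the full name and works unchanged for .xlsx.pgp files.

```shell
# Hypothetical helper: name of the decrypted file, stripping only the trailing ".pgp".
# e.g. sales.csv.pgp -> sales.csv, report.2024.csv.pgp -> report.2024.csv
decrypted_name() {
  basename "$1" .pgp
}

# Hypothetical helper: list encrypted files with no non-empty decrypted counterpart.
# Arguments: source folder, output folder, optional glob (defaults to *.csv.pgp).
# Prints one line per missing file, then "missing=N"; returns non-zero if N > 0.
report_missing_outputs() {
  local src_dir="$1" out_dir="$2" pattern="${3:-*.csv.pgp}"
  local missing=0 entry name

  for entry in "$src_dir"/$pattern; do
    # If nothing matches, the shell leaves the pattern unexpanded; skip it.
    [ -e "$entry" ] || continue

    name=$(decrypted_name "$entry")
    if [ ! -s "$out_dir/$name" ]; then
      echo "not decrypted: $name"
      missing=$((missing + 1))
    fi
  done

  echo "missing=$missing"
  [ "$missing" -eq 0 ]
}
```

Usage, in the same %sh cell: report_missing_outputs /dbfs/mnt/Ization/EncryptedFiles /dbfs/mnt/Ization/DecryptedFiles. For Excel files, pass '*.xlsx.pgp' (quoted) as the third argument.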

Thank you, and happy learning.
