KMS encrypted credentials with Dataflow on GCP
Encrypting credentials
Google KMS
Cloud Key Management Service (Cloud KMS) lets you create and manage encryption keys for use in compatible Google Cloud services and in your own applications. So first things first, we have to create a Google managed KMS keyring and associated key.
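The key ring and key can be created in the console or with the CLI, but it can also be done from Python. The following is a minimal sketch, assuming the `google-cloud-kms` client library is installed and application default credentials are configured. The function names `location_path` and `create_key_ring_and_key` are hypothetical and not part of the Beam Me Up package.

```python
def location_path(project_id: str, location_id: str) -> str:
    """Build the parent resource name under which a key ring is created."""
    return f"projects/{project_id}/locations/{location_id}"


def create_key_ring_and_key(project_id, location_id, key_ring_id, key_id):
    """Create a key ring and a symmetric encrypt/decrypt key inside it."""
    # Imported lazily so the path helper above works without the library.
    from google.cloud import kms

    client = kms.KeyManagementServiceClient()

    # Key rings are regional, so the parent is a project + location pair.
    key_ring = client.create_key_ring(
        request={
            "parent": location_path(project_id, location_id),
            "key_ring_id": key_ring_id,
            "key_ring": {},
        }
    )

    # A symmetric key that Cloud KMS generates and stores.
    purpose = kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT
    algorithm = (
        kms.CryptoKeyVersion.CryptoKeyVersionAlgorithm.GOOGLE_SYMMETRIC_ENCRYPTION
    )
    return client.create_crypto_key(
        request={
            "parent": key_ring.name,
            "crypto_key_id": key_id,
            "crypto_key": {
                "purpose": purpose,
                "version_template": {"algorithm": algorithm},
            },
        }
    )
```

Calling `create_key_ring_and_key` a second time with the same IDs will fail because the key ring already exists, so a real script would probably want to handle that case.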
Google-provided Dataflow templates
Some background on why we are doing this. Google provides open source Dataflow templates that you can use instead of writing pipeline code.
If we were to use the PostgreSQL to BigQuery one, for example, we find that the only way to provide the credentials securely is to encrypt them with a Google-managed KMS key.
Parameter | Description |
---|---|
username | Optional: The username to use for the JDBC connection. You can pass in this value encrypted by a Cloud KMS key as a Base64-encoded string. |
password | Optional: The password to use for the JDBC connection. You can pass in this value encrypted by a Cloud KMS key as a Base64-encoded string. |
We have created a template shell script that runs the gcloud CLI command for launching a Dataflow job.
In this shell script, we have the credentials as templated parameters:
gcloud dataflow flex-template run dtf-postgres-nfl-tandl-<<job_name>> \
...
--parameters connectionURL="<<connectionURL>>" \
--parameters username="<<username>>" \
--parameters password="<<password>>" \
But how do we get the encrypted versions of these into the template?
KMS helper function
We have written a series of helper functions to make this easier for us. First, we wrote a custom Beam Me Up package that includes a KMS module.
https://github.com/mortie23/beam-poetry-mono/blob/master/eng/lib/beammeup/beammeup/kms.py
The function within the module that creates the Base64-encoded encrypted credentials is the encode_dataflow_parameter function. Let’s look at what it does step by step.
# Uses another Beam Me Up function to encrypt any string value
encrypted_parameter = encrypt_symmetric(
    project_id=project_id,
    location_id=location_id,
    key_ring_id=key_ring_id,
    key_id=key_id,
    plaintext=parameter,
)
# Base64 encodes the returned ciphertext (which is in bytes) and then decodes the bytes to a string
base64_encrypted_parameter = base64.b64encode(
    encrypted_parameter.ciphertext
).decode("utf-8")
Let’s look at each part of this to figure out what is happening. The Dataflow template expects each credential to be encrypted with a Cloud KMS key and passed in as a Base64-encoded string.
Encrypted parameter
The result from the call to encrypt_symmetric is an object with multiple parts. We just need the ciphertext.
name: "projects/prj-xyz-prd-fruit/locations/australia-southeast1/keyRings/keyring-fruit/cryptoKeys/key-fruit/cryptoKeyVersions/1"
ciphertext: "\n$\000\304\227|\301\324\343z\227\227\206I\214^\310\326\301\234O`\243|A\n\342\275v\332|\314|(>\252;\243\022J\000\254\004\343\003\375[\3760\333\027>(\311|k\"\302\01426\334\207/\361\\\226-\001MP\375R0\017\270v\310\323\351\331\2140\217v2P9\242\333Y\262\307\225G\200\345a\373;)\257\260\263\344]1\323\322W\241\270J\t"
ciphertext_crc32c {
value: 2931620773
}
verified_plaintext_crc32c: true
protection_level: SOFTWARE
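The source of encrypt_symmetric lives in the linked repository rather than in this post. As a rough sketch of what such a helper might look like, assuming it wraps the `encrypt` method of the `google-cloud-kms` client: the name `encrypt_symmetric_sketch` and the optional `client` argument are hypothetical, and the CRC32C integrity checks that Google's own samples perform are left out for brevity.

```python
def encrypt_symmetric_sketch(
    project_id, location_id, key_ring_id, key_id, plaintext, client=None
):
    """Encrypt a string with a symmetric Cloud KMS key and return the response."""
    if client is None:
        # Imported lazily so a stand-in client can be injected for testing.
        from google.cloud import kms

        client = kms.KeyManagementServiceClient()

    # Fully qualified resource name of the key (no version suffix;
    # Cloud KMS encrypts with the key's primary version).
    key_name = (
        f"projects/{project_id}/locations/{location_id}"
        f"/keyRings/{key_ring_id}/cryptoKeys/{key_id}"
    )

    # The API works on bytes, so the string is encoded first.
    return client.encrypt(
        request={"name": key_name, "plaintext": plaintext.encode("utf-8")}
    )
```

The response returned here is the object printed above, so `.ciphertext` on it gives the raw encrypted bytes.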
Cipher text
The ciphertext itself is in bytes:
b'\n$\x00\xc4\x97|\xc1\xd4\xe3z\x97\x97\x86I\x8c^\xc8\xd6\xc1\x9cO`\xa3|A\n\xe2\xbdv\xda|\xcc|(>\xaa;\xa3\x12J\x00\xac\x04\xe3\x03\xfd[\xfe0\xdb\x17>(\xc9|k"\xc2\x0c26\xdc\x87/\xf1\\\x96-\x01MP\xfdR0\x0f\xb8v\xc8\xd3\xe9\xd9\x8c0\x8fv2P9\xa2\xdbY\xb2\xc7\x95G\x80\xe5a\xfb;)\xaf\xb0\xb3\xe4]1\xd3\xd2W\xa1\xb8J\t'
Base64 encoded
The result of Base64 encoding these bytes is bytes again.
b'CiQAxJd8wdTjepeXhkmMXsjWwZxPYKN8QQrivXbafMx8KD6qO6MSSgCsBOMD/Vv+MNsXPijJfGsiwgwyNtyHL/Fcli0BTVD9UjAPuHbI0+nZjDCPdjJQOaLbWbLHlUeA5WH7OymvsLPkXTHT0lehuEoJ'
Decoded
Decoding these bytes as UTF-8 leaves us with a string we can pass into the gcloud CLI as a Dataflow template parameter.
'CiQAxJd8wdTjepeXhkmMXsjWwZxPYKN8QQrivXbafMx8KD6qO6MSSgCsBOMD/Vv+MNsXPijJfGsiwgwyNtyHL/Fcli0BTVD9UjAPuHbI0+nZjDCPdjJQOaLbWbLHlUeA5WH7OymvsLPkXTHT0lehuEoJ'
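The bytes, Base64 bytes, and string steps can be reproduced on a small scale with only the standard library. This sketch uses just the first three bytes of the ciphertext shown above, which is enough to see where the leading `CiQA` of the final string comes from; the variable names are illustrative.

```python
import base64

# First three bytes of the ciphertext shown above.
ciphertext_prefix = b"\n$\x00"

encoded_bytes = base64.b64encode(ciphertext_prefix)  # still bytes
encoded_str = encoded_bytes.decode("utf-8")          # now a str

print(type(encoded_bytes).__name__, encoded_bytes)   # bytes b'CiQA'
print(type(encoded_str).__name__, encoded_str)       # str CiQA

# Reversing the two steps recovers the original bytes.
assert base64.b64decode(encoded_str) == ciphertext_prefix
```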
Filling in the parameters
A Python script imports the Beam Me Up package, uses our helper functions to encrypt a given credential read from a .dotenv file, and then replaces the placeholder values in the shell script.
encoded_username = encode_dataflow_parameter(
    project_id=cfg.project_id,
    location_id=cfg.location,
    key_ring_id=cfg.key_ring_id,
    key_id=cfg.key_id,
    parameter=username,
)
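The replacement step itself could be as simple as plain string substitution on the `<<...>>` placeholders. A minimal sketch under that assumption follows; the function name `fill_template` and the stand-in values `ENCODED_USERNAME` and `ENCODED_PASSWORD` are hypothetical, and in practice the values would be the strings returned by encode_dataflow_parameter.

```python
def fill_template(template: str, values: dict) -> str:
    """Replace each <<name>> placeholder in the template with its value."""
    for name, value in values.items():
        template = template.replace(f"<<{name}>>", value)
    return template


# Two lines from the shell script template shown earlier.
template = (
    '--parameters username="<<username>>" \\\n'
    '--parameters password="<<password>>" \\\n'
)

filled = fill_template(
    template,
    {"username": "ENCODED_USERNAME", "password": "ENCODED_PASSWORD"},
)
print(filled)
```

Standard Base64 output contains only letters, digits, `+`, `/`, and `=`, so the encoded values can sit inside the double quotes of the shell script without extra escaping.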