Copying a table from Teradata to Google BigQuery
For this demo I will be using a Windows PC and the following tools and services:
- Teradata Studio
- Teradata ClearScape
- Google BigQuery
- Google Dataflow
This will demonstrate the most basic copy of a table from a Teradata ClearScape instance to a Google Cloud BigQuery table.
It will not discuss things such as:
- Automation of schema creation for the target table in BigQuery
- Scheduling the pipeline
- Handling drop and recreate or truncate and load of the target table
- Other transformations
Sign up for Teradata ClearScape
Getting a cloud-hosted Teradata instance for testing is easy these days using the free trial of Teradata ClearScape.
Check environment
Once the environment is created, let’s use the API to check a few things available with the free trial. Click on Copy API Token.
Create a file called .env and paste the token into it like this:
API_TOKEN=<token>
Create the following Python script, install the required packages with pip (requests and python-dotenv), and then run it.
import requests
import os
import json
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get the bearer token from the environment variables
API_TOKEN = os.getenv('API_TOKEN')

url = 'https://api.clearscape.teradata.com/environments'
headers = {
    'accept': 'application/json',
    'Authorization': f'Bearer {API_TOKEN}'
}

# Request the list of environments and pretty-print the JSON response
response = requests.get(url, headers=headers)
response.raise_for_status()
data = response.json()
formatted_data = json.dumps(data, indent=4)
print(formatted_data)
The response should look something like this.
[
    {
        "state": "RUNNING",
        "region": "us-central",
        "services": [
            {
                "name": "Jupyterlab",
                "state": "RUNNING",
                "url": "https://dev-gcp-copy-<id>.env.clearscape.teradata.com:8443/lab/tree/README.md?token=<token>",
                "credentials": []
            },
            {
                "name": "Vantage",
                "state": "RUNNING",
                "url": "bteq .logon dev-gcp-copy-<id>.env.clearscape.teradata.com/demo_user",
                "credentials": [
                    {
                        "name": "username",
                        "value": "demo_user"
                    }
                ]
            },
            {
                "name": "Query Service",
                "state": "RUNNING",
                "url": "https://dev-gcp-copy-<id>.env.clearscape.teradata.com:1443/",
                "credentials": [
                    {
                        "name": "username",
                        "value": "demo_user"
                    }
                ]
            },
            {
                "name": "ModelOps",
                "state": "RUNNING",
                "url": "https://dev-gcp-copy-<id>.env.clearscape.teradata.com:8443/modelops",
                "credentials": [
                    {
                        "name": "username",
                        "value": "demo_user"
                    }
                ]
            }
        ],
        "name": "dev-gcp-copy",
        "ip": "<ip-address>",
        "dnsName": "dev-gcp-copy-<id>.env.clearscape.teradata.com",
        "owner": "christopher@mortimer.xyz",
        "type": "clearscape",
        "createDate": "2023-10-03T09:44:10.710Z"
    }
]
Connect with Teradata Studio
To verify that we can connect to the Teradata database, we will use Teradata Studio installed on our Windows client machine. The connection details come from the API response above: the dnsName is the host and demo_user is the username.
Create source table
Let’s create a very simple “hello world” demo table to copy to BigQuery.
create table demo_user.testing (
    col1 integer
    , col2 varchar(100)
);
insert into demo_user.testing values (1, 'hello');
insert into demo_user.testing values (2, 'world');
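Before moving on, a quick sanity check in Teradata Studio should return the two rows just inserted:

select * from demo_user.testing order by col1;
-- expect: (1, 'hello') and (2, 'world')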
BigQuery and Dataflow
Target table
For this example I have created a target table in BigQuery with the same schema as the source table, using the GUI. For simplicity, I have also created a BigQuery dataset with the same name as the source Teradata database.
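For reference, the equivalent BigQuery DDL would look something like this (a sketch; the Teradata integer and varchar(100) columns map to BigQuery’s int64 and string types):

create schema if not exists demo_user;

create table demo_user.testing (
    col1 int64
    , col2 string
);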
Cloud Storage for driver
I have created a Google Cloud Storage bucket, downloaded the latest Teradata JDBC driver jar file, and uploaded it to the bucket.
I have also added a temp folder in the bucket for later use by the Dataflow job.
Dataflow job
We will be using the JDBC to BigQuery Dataflow template. Create a new Dataflow job with the following configuration options.
JDBCConnectionURL=jdbc:teradata://dev-gcp-copy-<id>.env.clearscape.teradata.com/demo_user
JDBCDriverClassName=com.teradata.jdbc.TeraDriver
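Beyond the connection URL and driver class name, the template also needs the Cloud Storage path to the driver jar uploaded earlier, the source query, the output table, and the temp folder for the BigQuery load. A sketch of those values, with the bucket name, project ID, and jar file name as placeholders (the exact field labels may differ slightly in the Dataflow console):

CloudStoragePathsForJDBCDrivers=gs://<bucket>/terajdbc4.jar
JDBCSourceSQLQuery=select * from demo_user.testing
BigQueryOutputTable=<project>:demo_user.testing
TemporaryDirectoryForBigQueryLoadingProcess=gs://<bucket>/temp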
For the optional parameters we will need to add the username and password for the Teradata database.
This blog post does not consider how you might store the password as a secret in a key vault of some sort, or, at a minimum, base64 encode it so it is not visible in plain text.
Note: the two failed jobs were from before the username and password were added to the optional parameters. The error message can be seen below.
java.sql.SQLException: [Teradata JDBC Driver] [TeraJDBC 17.20.00.15] [Error 1032] [SQLState HY000] Single Sign-On NOT supported for Mechanism TD2.
Testing the results
After running the Dataflow job twice and querying the target table, we can see that the data has been successfully copied from the Teradata database to the BigQuery dataset.
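To confirm, a quick query against the target table in the BigQuery console should return the rows inserted on the Teradata side. Bear in mind that this demo does not truncate the target, so repeated successful runs will append duplicate rows:

select * from demo_user.testing order by col1;
-- expect (1, 'hello') and (2, 'world'), once per successful run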