Teradata, Data Engineering, GCP, BigQuery

Copying a table from Teradata to Google BigQuery

For this demo I will be using a Windows PC as the client machine.

This will demonstrate the most basic copy of a table from a Teradata ClearScape instance to a Google Cloud BigQuery table.

It will not discuss things such as:

  • Automation of schema creation for the target table in BigQuery
  • Scheduling the pipeline
  • Handling drop and recreate or truncate and load of the target table
  • Other transformations

Sign up for Teradata ClearScape

Getting a cloud-hosted Teradata instance for testing is easy these days using the free trial of Teradata ClearScape.

tdbq create environment

tdbq initializing environment

Check environment

Once the environment is created, let’s use the API to check a few of the things available with the free trial. Click on Copy API Token.

tdbq api token

Create a file called .env and paste the token into it like this:

API_TOKEN=<token>

Install the required packages with pip, then create and run the following Python script.
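
The script uses the requests and python-dotenv packages:

pip install requests python-dotenv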

import requests
import os
import json
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
# Get the bearer token from the environment variables
API_TOKEN = os.getenv('API_TOKEN')

url = 'https://api.clearscape.teradata.com/environments'
headers = {
    'accept': 'application/json',
    'Authorization': f'Bearer {API_TOKEN}'
}
# Call the environments endpoint and pretty-print the JSON response
response = requests.get(url, headers=headers)
response.raise_for_status()
data = response.json()
formatted_data = json.dumps(data, indent=4)
print(formatted_data)

The response should look something like this.

[
  {
    "state": "RUNNING",
    "region": "us-central",
    "services": [
      {
        "name": "Jupyterlab",
        "state": "RUNNING",
        "url": "https://dev-gcp-copy-<id>.env.clearscape.teradata.com:8443/lab/tree/README.md?token=<token>",
        "credentials": []
      },
      {
        "name": "Vantage",
        "state": "RUNNING",
        "url": "bteq .logon dev-gcp-copy-<id>.env.clearscape.teradata.com/demo_user",
        "credentials": [
          {
            "name": "username",
            "value": "demo_user"
          }
        ]
      },
      {
        "name": "Query Service",
        "state": "RUNNING",
        "url": "https://dev-gcp-copy-<id>.env.clearscape.teradata.com:1443/",
        "credentials": [
          {
            "name": "username",
            "value": "demo_user"
          }
        ]
      },
      {
        "name": "ModelOps",
        "state": "RUNNING",
        "url": "https://dev-gcp-copy-<id>.env.clearscape.teradata.com:8443/modelops",
        "credentials": [
          {
            "name": "username",
            "value": "demo_user"
          }
        ]
      }
    ],
    "name": "dev-gcp-copy",
    "ip": "<ip-address>",
    "dnsName": "dev-gcp-copy-<id>.env.clearscape.teradata.com",
    "owner": "christopher@mortimer.xyz",
    "type": "clearscape",
    "createDate": "2023-10-03T09:44:10.710Z"
  }
]

Connect with Teradata Studio

To test that we can connect to the Teradata database, we will use Teradata Studio installed on our Windows client machine.

tdbq studio connection

Create source table

Let’s create a very simple "hello world" demo table to copy to BigQuery.

create table demo_user.testing (
    col1 integer
  , col2 varchar(100)
)
;
insert into demo_user.testing values (1, 'hello');
insert into demo_user.testing values (2, 'world');
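
A quick select confirms the two rows are in place before we move on:

select * from demo_user.testing order by col1;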

tdbq create table

Dataflow to BigQuery

Target table

For this example I have created a target table in BigQuery, with the same schema as the source table, using the GUI. For simplicity, I have also created a BigQuery dataset with the same name as the source Teradata database.
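
For reference, an equivalent BigQuery DDL statement would look something like this, assuming a dataset named demo_user already exists in the project:

create table demo_user.testing (
    col1 INT64
  , col2 STRING
)
;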

tdbq create table bq

Cloud Storage for driver

I have created a Google Cloud Storage bucket, downloaded the latest Teradata JDBC driver jar file, and copied it up to the bucket.

I have also added a temp folder for the Dataflow job to use later.
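
One way to upload the driver from the command line is with gsutil (the bucket name here is a placeholder for your own):

gsutil cp terajdbc4.jar gs://<your-bucket>/terajdbc4.jar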

tdbq gcs td driver

Dataflow job

Create a new Dataflow job with the following configuration options.

tdbq dataflow jdbc to bigquery

We will be using the JDBC to BigQuery Dataflow template.

tdbq required params

JDBCConnectionURL=jdbc:teradata://dev-gcp-copy-<id>.env.clearscape.teradata.com/demo_user
JDBCDriverClassName=com.teradata.jdbc.TeraDriver

For the optional parameters, we will need to add the username and password for the Teradata database.

This blog post does not cover storing the password as a secret in a key vault of some sort, or, at a minimum, base64 encoding it so that it is not visible in plain text.
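
As a minimal illustration of the latter, the password could be base64 encoded in Python like this (the password value below is a placeholder, and base64 is obfuscation rather than real security):

import base64

# Placeholder password for illustration only; base64 is not encryption.
password = 'my_teradata_password'
encoded = base64.b64encode(password.encode('utf-8')).decode('utf-8')
print(encoded)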

tdbq dataflow optional

tdbq dataflow jobs

Note: the two failed jobs were run before the username and password were added to the optional parameters. The error message from those runs can be seen below.

java.sql.SQLException: [Teradata JDBC Driver] [TeraJDBC 17.20.00.15] [Error 1032] [SQLState HY000] Single Sign-On NOT supported for Mechanism TD2.

tdbq dataflow almost success

tdbq dataflow success

Testing the results

After running the Dataflow job twice and querying the target table, we can see that the data has been successfully copied from the Teradata database to the BigQuery dataset.
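
The check itself is just a simple query in the BigQuery console, using the dataset and table names from the example above:

select col1, col2
from demo_user.testing
order by col1;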

tdbq dataflow bq results