Data Engineering, Python, Teradata

Load to Teradata with a Python wrapper for TPT

PyTPT: A Python Wrapper for High-Performance Teradata Data Loading

This blog post is paired with an associated GitHub repository: https://github.com/mortie23/python-teradata

In my previous blog post about Connecting to Teradata from Python, I demonstrated how to connect to Teradata from Python using the teradatasql driver and load CSV data using the built-in teradata_read_csv function. While that approach works well for many use cases, it has limitations when dealing with larger datasets or when you need the full power of Teradata’s native loading tools.

In this post I introduce PyTPT, a Python package that wraps Teradata's TPT (Teradata Parallel Transporter) for bulk loading CSV files. This approach leverages Teradata's optimized loading engine while maintaining the simplicity and flexibility of Python.

What is TPT?

From the official docs:

Teradata Parallel Transporter (PT) is an object-oriented client application that provides scalable, high-speed, and parallel data:

  • Extraction
  • Loading
  • Updating

A TPT job requires at least a Job Script and Job Variables.

Writing and maintaining these files by hand for every table quickly becomes tedious. PyTPT closes this gap by automatically generating them from templates and orchestrating the entire loading process through Python.

Local Python Environment Setup

For this implementation, I’m using:

  • Python 3.13
  • Poetry for dependency management
  • Windows and Linux compatibility

Let’s set up the environment:

# Create your python virtual environment
# Linux
python -m venv ~/venv/pytpt
source ~/venv/pytpt/bin/activate

# Windows
python -m venv C:\venv\pytpt
C:\venv\pytpt\Scripts\Activate.ps1

# Install poetry and then packages
pip install poetry
poetry install

Prerequisites

Before we begin, you’ll need:

  • A Teradata machine (I’m using TD Express VM)
  • Teradata Studio (optional, but helpful for checking results)
  • Teradata Tools and Utilities (TTU) - This is crucial as it includes the tbuild binary that TPT requires

The TTU installation provides the tbuild command that PyTPT calls under the hood to execute TPT operations.
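
Since PyTPT shells out to tbuild, it is worth confirming the binary is actually reachable before attempting a load. Here is a quick, illustrative check from Python (this snippet is not part of the PyTPT package):

# Quick sanity check that the TTU tbuild binary is on the PATH.
# Illustrative only; not part of the PyTPT package.
import shutil

tbuild_path = shutil.which("tbuild")
if tbuild_path is None:
    raise RuntimeError(
        "tbuild not found on PATH - install Teradata Tools and Utilities (TTU) "
        "and make sure its bin directory is on your PATH"
    )
print(f"Found tbuild at: {tbuild_path}")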

Configuration Structure

PyTPT uses a hierarchical configuration system powered by Hydra, making it easy to manage different environments and datasets.

Project Configuration

The main project configuration lives in conf/config.yaml. The database default becomes the name of the next configuration file, and must match the name of the database you plan to load to.

# Main configuration file for pytpt examples
# This file sets up the default configuration structure

defaults:
  - database: PRD_ADS_PYTHON_NFL_DB
  - _self_

# Data directory configuration
data:
  dir: "../../data"

This configuration tells PyTPT where to find your CSV files and how they map to Teradata tables.
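
To give a feel for how this hangs together, here is a minimal sketch of reading the configuration with Hydra's compose API (the repository's app.py may use the @hydra.main decorator instead; this is purely for illustration):

# Minimal sketch of reading the configuration with Hydra's compose API.
# Assumes the conf/ layout shown above; the real app.py may differ.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="conf", version_base=None):
    cfg = compose(config_name="config")

print(cfg.data.dir)       # "../../data"
print(cfg.database.name)  # "PRD_ADS_PYTHON_NFL_DB"
print(OmegaConf.to_yaml(cfg.tables))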

Database Configuration

First, configure your Teradata connection in conf/database/PRD_ADS_PYTHON_NFL_DB.yaml:

# @package _global_
# Teradata NFL database configuration

database:
  ddl_host: "tdvm"
  target_host: "tdvm"
  working_database: "PRD_ADS_PYTHON_NFL_DB"
  name: "PRD_ADS_PYTHON_NFL_DB"
  ddl_logmech: "TD2"
  target_logmech: "TD2"

# Table configurations for NFL data
tables:
  game: "GAME"
  game_type: "GAME_TYPE"
  game_stats: "GAME_STATS"
  game_venue: "GAME_VENUE"
  player: "PLAYER"
  venue: "VENUE" 
  team_lookup: "TEAM_LOOKUP"
  weather: "WEATHER"

How PyTPT Works

PyTPT follows a simple workflow (the first two steps are sketched just after this list):

  1. Discovery: Scans the configured data directory for CSV files
  2. Mapping: Matches CSV files to configured table names
  3. Template Generation: Creates TPT job variables and scripts from Jinja2 templates
  4. Execution: Runs the TPT operations in sequence:
    • Drop existing table (if it exists)
    • Create new table structure
    • Load CSV data using TPT’s optimized bulk loading
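
The discovery and mapping steps are conceptually just a directory scan matched against the tables section of the config. A rough sketch, assuming the data.dir and tables settings from the Hydra config above (the real implementation in the repo may differ in detail):

# Rough sketch of the discovery and mapping steps. Assumes the data.dir
# and tables settings from the Hydra config shown earlier.
from pathlib import Path

def discover_csv_files(data_dir: str, tables: dict[str, str]) -> dict[str, Path]:
    """Match CSV files in data_dir to configured Teradata table names."""
    mappings: dict[str, Path] = {}
    for csv_file in sorted(Path(data_dir).glob("*.csv")):
        key = csv_file.stem.lower()  # e.g. venue.csv -> "venue"
        if key in tables:
            mappings[tables[key]] = csv_file  # "venue" -> "VENUE"
    return mappings

# Example result: {"VENUE": Path("../../data/venue.csv"), ...}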

Let’s look at the core loading function:

# Excerpt from pytpt/tpt_runner.py (run_tbuild is defined in the same module)
from loguru import logger


def load_table(
    table_name: str,
    csv_file_path: str,
) -> bool:
    """Load a single CSV file to a table using TPT."""
    from .templates import create_tpt_files

    logger.info(f"Starting table load process: {csv_file_path} -> {table_name}")

    # Create TPT files
    jvar_file, drop_file, create_file, load_file = create_tpt_files(
        table_name, csv_file_path
    )

    # Step 1: Drop table
    logger.info(f"Step 1/3: Dropping table {table_name}")
    drop_success = run_tbuild(jvar_file, drop_file, "drop")

    if not drop_success:
        logger.error(f"Failed to drop table {table_name}")
        return False

    # Step 2: Create table
    logger.info(f"Step 2/3: Creating table {table_name}")
    create_success = run_tbuild(jvar_file, create_file, "create")

    if not create_success:
        logger.error(f"Failed to create table {table_name}")
        return False

    # Step 3: Load data
    logger.info(f"Step 3/3: Loading data into {table_name}")
    load_success = run_tbuild(jvar_file, load_file, "load")

    if load_success:
        logger.success(f"Successfully completed table load: {table_name}")
    else:
        logger.error(f"Failed to load data to {table_name}")

    return load_success
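
load_table delegates the actual execution to run_tbuild, which is defined in the same module but not reproduced in full here. Conceptually it is a thin subprocess wrapper around the tbuild command you can see in the log output later in this post; a simplified sketch looks roughly like this:

# Simplified sketch of run_tbuild: shell out to tbuild with the job script
# (-f) and job variables file (-v), capture the output, and report success.
# The real function also parses row counts and writes combined log files.
import subprocess
from loguru import logger

def run_tbuild(jvar_file: str, script_file: str, operation: str) -> bool:
    cmd = ["tbuild", "-f", script_file, "-v", jvar_file]
    logger.info(f"Starting {operation} operation: {' '.join(cmd)}")
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        logger.error(f"{operation} operation failed:\n{result.stdout}\n{result.stderr}")
        return False
    logger.success(f"{operation} operation completed successfully")
    return True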

Template-Driven TPT Generation

PyTPT uses Jinja2 templates to generate TPT files dynamically. Here's an example of one of the DDL templates (this one issues the DROP TABLE step):

-- Create table template TPT
DEFINE JOB create_table_job
DESCRIPTION 'Create Table TPT JOB' (
  STEP drop_table (
        APPLY ('DROP TABLE {{working_database}}."{{table_name}}";')
        TO OPERATOR ($DDL);
    );
);

Here is the job variables template:

# Template Job variables file
DDLTdpId='{{ddl_host}}'
, DDLUserName='{{ddl_username}}'
, DDLUserPassword='{{ddl_password}}'
, DDLLogonMech='{{ddl_logon_mech}}'
, TargetTdpId='{{target_host}}'
, TargetUserName='{{target_username}}'
, TargetUserPassword='{{target_password}}'
, TargetLogonMech='{{target_logon_mech}}'
, TargetWorkingDatabase = '{{working_database}}'
, LoadTargetTable='{{working_database}}."{{table_name}}"'
, LoadCharacterSet = 'UTF8'
, FileReaderFileName='{{csv_file_path}}'
, FileReaderFormat='Delimited'
, FileReaderOpenmode='read'
, FileReaderTextDelimiter=','
, FileReaderSkipRows=1
, FileReaderCharSet = 'UTF8'
, FileReaderPrivateLogName='.\logs\tpt_{{table_name}}_filereader.log'
, FileReaderQuotedData = 'Optional'
, FileReaderMultipleRecordsInField = 'Y'
, FileReaderEscapeQuoteDelimiter = '"'
, FileReaderTruncateColumns = 'Y'
, FileReaderIndicatorMode = 'Y'
, LoadPrivateLogName='.\logs\tpt_{{table_name}}_load.log'
, DDLPrivateLogName='.\logs\tpt_{{table_name}}_ddl.log'
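
These templates are rendered with values that come from the Hydra config plus credentials for the run. A minimal sketch of the rendering step with Jinja2 (the template filename, output path and credentials below are illustrative; the repo's create_tpt_files handles this for real):

# Minimal sketch of rendering the job variables template with Jinja2.
# Template filename and credentials here are illustrative only.
from pathlib import Path
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("job_variables.jvar.j2")  # hypothetical filename

rendered = template.render(
    ddl_host="tdvm",
    target_host="tdvm",
    working_database="PRD_ADS_PYTHON_NFL_DB",
    table_name="VENUE",
    csv_file_path=r"..\..\data\venue.csv",
    ddl_logon_mech="TD2",
    target_logon_mech="TD2",
    # Credentials would normally come from environment variables or a secret store
    ddl_username="dbc",
    ddl_password="********",
    target_username="dbc",
    target_password="********",
)

Path("render_tmp").mkdir(exist_ok=True)
Path("render_tmp/VENUE.jvar").write_text(rendered)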

Execution and Monitoring

To run PyTPT, just execute this command:

python app.py

The application logs throughout the process, both to the console and to files in the logs directory. Here is an example of loading one table:

2025-09-29 17:11:13.883 | INFO     | __main__:main:73 - Processing 1 CSV files with table mappings
2025-09-29 17:11:13.884 | INFO     | __main__:main:82 - Processing: venue.csv -> VENUE
2025-09-29 17:11:13.884 | INFO     | pytpt.tpt_runner:load_table:144 - Starting table load process: C:\git\github\mortie23\python-teradata\eng\pytpt\..\..\data\venue.csv -> VENUE
2025-09-29 17:11:13.919 | INFO     | pytpt.tpt_runner:load_table:152 - Step 1/3: Dropping table VENUE
2025-09-29 17:11:13.920 | INFO     | pytpt.tpt_runner:run_tbuild:48 - Starting drop operation: tbuild -f render_tmp\VENUE_drop.tpt -v render_tmp\VENUE.jvar
2025-09-29 17:11:20.383 | SUCCESS  | pytpt.tpt_runner:run_tbuild:110 - drop operation completed successfully
2025-09-29 17:11:20.383 | INFO     | pytpt.tpt_runner:run_tbuild:111 - Complete TPT output saved to: logs\tpt_VENUE_drop_combined.log
2025-09-29 17:11:20.383 | INFO     | pytpt.tpt_runner:load_table:160 - Step 2/3: Creating table VENUE
2025-09-29 17:11:20.383 | INFO     | pytpt.tpt_runner:run_tbuild:48 - Starting create operation: tbuild -f render_tmp\VENUE_create.tpt -v render_tmp\VENUE.jvar
2025-09-29 17:11:26.824 | SUCCESS  | pytpt.tpt_runner:run_tbuild:110 - create operation completed successfully
2025-09-29 17:11:26.824 | INFO     | pytpt.tpt_runner:run_tbuild:111 - Complete TPT output saved to: logs\tpt_VENUE_create_combined.log
2025-09-29 17:11:26.824 | INFO     | pytpt.tpt_runner:load_table:168 - Step 3/3: Loading data into VENUE
2025-09-29 17:11:26.825 | INFO     | pytpt.tpt_runner:run_tbuild:48 - Starting load operation: tbuild -f render_tmp\VENUE_load.tpt -v render_tmp\VENUE.jvar
2025-09-29 17:11:43.988 | SUCCESS  | pytpt.tpt_runner:run_tbuild:110 - load operation completed successfully
2025-09-29 17:11:43.988 | INFO     | pytpt.tpt_runner:run_tbuild:111 - Complete TPT output saved to: logs\tpt_VENUE_load_combined.log
2025-09-29 17:11:43.988 | INFO     | pytpt.tpt_runner:run_tbuild:122 - Rows sent to RDBMS: 32
2025-09-29 17:11:43.989 | INFO     | pytpt.tpt_runner:run_tbuild:124 - Rows applied: 32
2025-09-29 17:11:43.989 | SUCCESS  | pytpt.tpt_runner:load_table:172 - Successfully completed table load: VENUE
2025-09-29 17:11:43.989 | SUCCESS  | __main__:main:86 - Successfully processed: venue.csv
2025-09-29 17:11:43.990 | INFO     | __main__:main:93 - Load process completed: 1/1 files loaded successfully
2025-09-29 17:11:43.990 | SUCCESS  | __main__:main:103 - All files loaded successfully!

PyTPT also captures detailed TPT output in log files for troubleshooting:

  • tpt_VENUE_create_combined.log
  • tpt_VENUE_load_combined.log
  • tpt_VENUE_drop_combined.log

Results

After running it on my machine, the tables were created and the records loaded.

Database Tables

Game Table Load Results

Error Handling and Robustness

After each load, PyTPT checks that the number of rows sent to the RDBMS equals the number of rows applied, providing confidence in data integrity.
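
That check is a straightforward comparison of the two counts pulled out of the tbuild output (the "Rows sent to RDBMS" and "Rows applied" figures you can see in the log above). A simplified sketch of that kind of validation (the exact wording matched by the patterns is an assumption here):

# Simplified sketch of the rows-sent vs rows-applied check. The exact
# wording matched by these patterns is an assumption; the real parsing
# lives inside pytpt's run_tbuild.
import re

def validate_row_counts(tbuild_output: str) -> bool:
    """Return True if rows sent to the RDBMS match rows applied."""
    sent = re.search(r"rows sent to rdbms:\s*(\d+)", tbuild_output, re.IGNORECASE)
    applied = re.search(r"rows applied:\s*(\d+)", tbuild_output, re.IGNORECASE)
    if not sent or not applied:
        return False  # counts not found; treat as a failure
    return int(sent.group(1)) == int(applied.group(1))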

Comparison with Previous Approach

| Feature        | teradatasql + teradata_read_csv  | PyTPT                          |
| -------------- | -------------------------------- | ------------------------------ |
| Performance    | Good for small-medium datasets   | Optimized for large datasets   |
| Complexity     | Simple, direct SQL               | Template-driven, more setup    |
| Scalability    | Limited by Python/ODBC           | Native TPT parallelism         |
| Error Handling | Basic SQL error messages         | Detailed TPT logs and metrics  |
| Configuration  | Hardcoded in scripts             | Flexible YAML configuration    |
| Dependencies   | teradatasql driver               | TTU installation required      |

Debugging

Some of the defaults I have chosen are the result of issues loading certain types of CSV files, where it took a long time to find the correct job variables.

For example, loading CSV data where text fields contain newlines, such as:

integer_field,text_field_with_newline
1,test
2,test
3,"this long line.
that has newlines"
4,test

requires these job variables:

, FileReaderQuotedData = 'Optional'
, FileReaderMultipleRecordsInField = 'Y'
, FileReaderEscapeQuoteDelimiter = '"'
, FileReaderTruncateColumns = 'Y'