Data Engineering, Python, Teradata

Load to Teradata with a Python wrapper for TPT

PyTPT: A Python Wrapper for High-Performance Teradata Data Loading

This blog post is paired with an associated GitHub repository: https://github.com/mortie23/python-teradata

In my previous blog post about Connecting to Teradata from Python, I demonstrated how to connect to Teradata from Python using the teradatasql driver and load CSV data using the built-in teradata_read_csv function. While that approach works well for many use cases, it has limitations when dealing with larger datasets or when you need the full power of Teradata’s native loading tools.

In this post I introduce PyTPT, a Python package that wraps Teradata's TPT (Teradata Parallel Transporter) for bulk loading CSV files. This approach leverages Teradata's optimized loading engine while maintaining the simplicity and flexibility of Python.

What is TPT?

From the official docs:

Teradata Parallel Transporter (PT) is an object-oriented client application that provides scalable, high-speed, and parallel data:

  • Extraction
  • Loading
  • Updating

A TPT job requires at least a Job Script and Job Variables.

Writing and maintaining these files by hand for every table quickly becomes tedious. PyTPT closes this gap by automatically generating them from templates and orchestrating the entire loading process through Python.

Local Python Environment Setup

For this implementation, I’m using:

  • Python 3.13
  • Poetry for dependency management
  • Windows and Linux compatibility

Let’s set up the environment:

# Create your python virtual environment
# Linux
python -m venv ~/venv/pytpt
source ~/venv/pytpt/bin/activate

# Windows
python -m venv C:\venv\pytpt
C:\venv\pytpt\Scripts\Activate.ps1

# Install poetry and then packages
pip install poetry
poetry install

Prerequisites

Before we begin, you’ll need:

  • A Teradata machine (I’m using TD Express VM)
  • Teradata Studio (optional, but helpful for checking results)
  • Teradata Tools and Utilities (TTU) - This is crucial as it includes the tbuild binary that TPT requires

The TTU installation provides the tbuild command that PyTPT calls under the hood to execute TPT operations.
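
Since PyTPT shells out to tbuild, it is worth confirming the binary is actually reachable before attempting a load. Here is a quick, illustrative check from Python (this snippet is not part of the PyTPT package):

# Quick sanity check that the TTU tbuild binary is on the PATH.
# Illustrative only; not part of the PyTPT package.
import shutil

tbuild_path = shutil.which("tbuild")
if tbuild_path is None:
    raise RuntimeError(
        "tbuild not found on PATH - install Teradata Tools and Utilities (TTU) "
        "and make sure its bin directory is on your PATH"
    )
print(f"Found tbuild at: {tbuild_path}")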

Configuration Structure

PyTPT uses a hierarchical configuration system powered by Hydra, making it easy to manage different environments and datasets.

Project Configuration

The main project configuration lives in conf/config.yaml. The database default becomes the name of the next configuration file, and must match the name of the database you plan to load to.

# Main configuration file for pytpt examples
# This file sets up the default configuration structure

defaults:
  - database: PRD_ADS_PYTHON_NFL_DB
  - _self_

# Data directory configuration
data:
  dir: "../../data"

This configuration tells PyTPT where to find your CSV files and how they map to Teradata tables.
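
To give a feel for how this hangs together, here is a minimal sketch of reading the configuration with Hydra's compose API (the repository's app.py may use the @hydra.main decorator instead; this is purely for illustration):

# Minimal sketch of reading the configuration with Hydra's compose API.
# Assumes the conf/ layout shown above; the real app.py may differ.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="conf", version_base=None):
    cfg = compose(config_name="config")

print(cfg.data.dir)       # "../../data"
print(cfg.database.name)  # "PRD_ADS_PYTHON_NFL_DB"
print(OmegaConf.to_yaml(cfg.tables))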

Database Configuration

First, configure your Teradata connection in conf/database/PRD_ADS_PYTHON_NFL_DB.yaml:

# @package _global_
# Teradata NFL database configuration

database:
  ddl_host: "tdvm"
  target_host: "tdvm"
  working_database: "PRD_ADS_PYTHON_NFL_DB"
  name: "PRD_ADS_PYTHON_NFL_DB"
  ddl_logmech: "TD2"
  target_logmech: "TD2"

# Table configurations for NFL data
tables:
  game: "GAME"
  game_type: "GAME_TYPE"
  game_stats: "GAME_STATS"
  game_venue: "GAME_VENUE"
  player: "PLAYER"
  venue: "VENUE" 
  team_lookup: "TEAM_LOOKUP"
  weather: "WEATHER"

How PyTPT Works

PyTPT follows a simple workflow (the first two steps are sketched just after this list):

  1. Discovery: Scans the configured data directory for CSV files
  2. Mapping: Matches CSV files to configured table names
  3. Template Generation: Creates TPT job variables and scripts from Jinja2 templates
  4. Execution: Runs the TPT operations in sequence:
    • Drop existing table (if it exists)
    • Create new table structure
    • Load CSV data using TPT’s optimized bulk loading
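
The discovery and mapping steps are conceptually just a directory scan matched against the tables section of the config. A rough sketch, assuming the data.dir and tables settings from the Hydra config above (the real implementation in the repo may differ in detail):

# Rough sketch of the discovery and mapping steps. Assumes the data.dir
# and tables settings from the Hydra config shown earlier.
from pathlib import Path

def discover_csv_files(data_dir: str, tables: dict[str, str]) -> dict[str, Path]:
    """Match CSV files in data_dir to configured Teradata table names."""
    mappings: dict[str, Path] = {}
    for csv_file in sorted(Path(data_dir).glob("*.csv")):
        key = csv_file.stem.lower()  # e.g. venue.csv -> "venue"
        if key in tables:
            mappings[tables[key]] = csv_file  # "venue" -> "VENUE"
    return mappings

# Example result: {"VENUE": Path("../../data/venue.csv"), ...}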

Let’s look at the core loading function:

# Excerpt from pytpt/tpt_runner.py (run_tbuild is defined in the same module)
from loguru import logger


def load_table(
    table_name: str,
    csv_file_path: str,
) -> bool:
    """Load a single CSV file to a table using TPT."""
    from .templates import create_tpt_files

    logger.info(f"Starting table load process: {csv_file_path} -> {table_name}")

    # Create TPT files
    jvar_file, drop_file, create_file, load_file = create_tpt_files(
        table_name, csv_file_path
    )

    # Step 1: Drop table
    logger.info(f"Step 1/3: Dropping table {table_name}")
    drop_success = run_tbuild(jvar_file, drop_file, "drop")

    if not drop_success:
        logger.error(f"Failed to drop table {table_name}")
        return False

    # Step 2: Create table
    logger.info(f"Step 2/3: Creating table {table_name}")
    create_success = run_tbuild(jvar_file, create_file, "create")

    if not create_success:
        logger.error(f"Failed to create table {table_name}")
        return False

    # Step 3: Load data
    logger.info(f"Step 3/3: Loading data into {table_name}")
    load_success = run_tbuild(jvar_file, load_file, "load")

    if load_success:
        logger.success(f"Successfully completed table load: {table_name}")
    else:
        logger.error(f"Failed to load data to {table_name}")

    return load_success
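
load_table delegates the actual execution to run_tbuild, which is defined in the same module but not reproduced in full here. Conceptually it is a thin subprocess wrapper around the tbuild command you can see in the log output later in this post; a simplified sketch looks roughly like this:

# Simplified sketch of run_tbuild: shell out to tbuild with the job script
# (-f) and job variables file (-v), capture the output, and report success.
# The real function also parses row counts and writes combined log files.
import subprocess
from loguru import logger

def run_tbuild(jvar_file: str, script_file: str, operation: str) -> bool:
    cmd = ["tbuild", "-f", script_file, "-v", jvar_file]
    logger.info(f"Starting {operation} operation: {' '.join(cmd)}")
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        logger.error(f"{operation} operation failed:\n{result.stdout}\n{result.stderr}")
        return False
    logger.success(f"{operation} operation completed successfully")
    return True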

Template-Driven TPT Generation

PyTPT uses Jinja2 templates to generate TPT files dynamically. Here's an example of one of the DDL templates (this one issues the DROP TABLE step):

-- Create table template TPT
DEFINE JOB create_table_job
DESCRIPTION 'Create Table TPT JOB' (
  STEP drop_table (
        APPLY ('DROP TABLE {{working_database}}."{{table_name}}";')
        TO OPERATOR ($DDL);
    );
);

Here is the job variables template:

# Template Job variables file
DDLTdpId='{{ddl_host}}'
, DDLUserName='{{ddl_username}}'
, DDLUserPassword='{{ddl_password}}'
, DDLLogonMech='{{ddl_logon_mech}}'
, TargetTdpId='{{target_host}}'
, TargetUserName='{{target_username}}'
, TargetUserPassword='{{target_password}}'
, TargetLogonMech='{{target_logon_mech}}'
, TargetWorkingDatabase = '{{working_database}}'
, LoadTargetTable='{{working_database}}."{{table_name}}"'
, LoadCharacterSet = 'UTF8'
, FileReaderFileName='{{csv_file_path}}'
, FileReaderFormat='Delimited'
, FileReaderOpenmode='read'
, FileReaderTextDelimiter=','
, FileReaderSkipRows=1
, FileReaderCharSet = 'UTF8'
, FileReaderPrivateLogName='.\logs\tpt_{{table_name}}_filereader.log'
, FileReaderQuotedData = 'Optional'
, FileReaderMultipleRecordsInField = 'Y'
, FileReaderEscapeQuoteDelimiter = '"'
, FileReaderTruncateColumns = 'Y'
, FileReaderIndicatorMode = 'Y'
, LoadPrivateLogName='.\logs\tpt_{{table_name}}_load.log'
, DDLPrivateLogName='.\logs\tpt_{{table_name}}_ddl.log'
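
These templates are rendered with values that come from the Hydra config plus credentials for the run. A minimal sketch of the rendering step with Jinja2 (the template filename, output path and credentials below are illustrative; the repo's create_tpt_files handles this for real):

# Minimal sketch of rendering the job variables template with Jinja2.
# Template filename and credentials here are illustrative only.
from pathlib import Path
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("job_variables.jvar.j2")  # hypothetical filename

rendered = template.render(
    ddl_host="tdvm",
    target_host="tdvm",
    working_database="PRD_ADS_PYTHON_NFL_DB",
    table_name="VENUE",
    csv_file_path=r"..\..\data\venue.csv",
    ddl_logon_mech="TD2",
    target_logon_mech="TD2",
    # Credentials would normally come from environment variables or a secret store
    ddl_username="dbc",
    ddl_password="********",
    target_username="dbc",
    target_password="********",
)

Path("render_tmp").mkdir(exist_ok=True)
Path("render_tmp/VENUE.jvar").write_text(rendered)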

Execution and Monitoring

To run PyTPT, just execute this command:

python app.py

The application logs throughout the process, both to the console and to files in the logs directory. Here is an example of loading one table:

2025-09-29 17:11:13.883 | INFO     | __main__:main:73 - Processing 1 CSV files with table mappings
2025-09-29 17:11:13.884 | INFO     | __main__:main:82 - Processing: venue.csv -> VENUE
2025-09-29 17:11:13.884 | INFO     | pytpt.tpt_runner:load_table:144 - Starting table load process: C:\git\github\mortie23\python-teradata\eng\pytpt\..\..\data\venue.csv -> VENUE
2025-09-29 17:11:13.919 | INFO     | pytpt.tpt_runner:load_table:152 - Step 1/3: Dropping table VENUE
2025-09-29 17:11:13.920 | INFO     | pytpt.tpt_runner:run_tbuild:48 - Starting drop operation: tbuild -f render_tmp\VENUE_drop.tpt -v render_tmp\VENUE.jvar
2025-09-29 17:11:20.383 | SUCCESS  | pytpt.tpt_runner:run_tbuild:110 - drop operation completed successfully
2025-09-29 17:11:20.383 | INFO     | pytpt.tpt_runner:run_tbuild:111 - Complete TPT output saved to: logs\tpt_VENUE_drop_combined.log
2025-09-29 17:11:20.383 | INFO     | pytpt.tpt_runner:load_table:160 - Step 2/3: Creating table VENUE
2025-09-29 17:11:20.383 | INFO     | pytpt.tpt_runner:run_tbuild:48 - Starting create operation: tbuild -f render_tmp\VENUE_create.tpt -v render_tmp\VENUE.jvar
2025-09-29 17:11:26.824 | SUCCESS  | pytpt.tpt_runner:run_tbuild:110 - create operation completed successfully
2025-09-29 17:11:26.824 | INFO     | pytpt.tpt_runner:run_tbuild:111 - Complete TPT output saved to: logs\tpt_VENUE_create_combined.log
2025-09-29 17:11:26.824 | INFO     | pytpt.tpt_runner:load_table:168 - Step 3/3: Loading data into VENUE
2025-09-29 17:11:26.825 | INFO     | pytpt.tpt_runner:run_tbuild:48 - Starting load operation: tbuild -f render_tmp\VENUE_load.tpt -v render_tmp\VENUE.jvar
2025-09-29 17:11:43.988 | SUCCESS  | pytpt.tpt_runner:run_tbuild:110 - load operation completed successfully
2025-09-29 17:11:43.988 | INFO     | pytpt.tpt_runner:run_tbuild:111 - Complete TPT output saved to: logs\tpt_VENUE_load_combined.log
2025-09-29 17:11:43.988 | INFO     | pytpt.tpt_runner:run_tbuild:122 - Rows sent to RDBMS: 32
2025-09-29 17:11:43.989 | INFO     | pytpt.tpt_runner:run_tbuild:124 - Rows applied: 32
2025-09-29 17:11:43.989 | SUCCESS  | pytpt.tpt_runner:load_table:172 - Successfully completed table load: VENUE
2025-09-29 17:11:43.989 | SUCCESS  | __main__:main:86 - Successfully processed: venue.csv
2025-09-29 17:11:43.990 | INFO     | __main__:main:93 - Load process completed: 1/1 files loaded successfully
2025-09-29 17:11:43.990 | SUCCESS  | __main__:main:103 - All files loaded successfully!

PyTPT also captures detailed TPT output in log files for troubleshooting:

  • tpt_VENUE_create_combined.log
  • tpt_VENUE_load_combined.log
  • tpt_VENUE_drop_combined.log

Results

After running it on my machine, the tables were created and the records loaded.

Database Tables

Game Table Load Results

Error Handling and Robustness

After each load, PyTPT checks that the number of rows sent to the RDBMS equals the number of rows applied, providing confidence in data integrity.
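
That check is a straightforward comparison of the two counts pulled out of the tbuild output (the "Rows sent to RDBMS" and "Rows applied" figures you can see in the log above). A simplified sketch of that kind of validation (the exact wording matched by the patterns is an assumption here):

# Simplified sketch of the rows-sent vs rows-applied check. The exact
# wording matched by these patterns is an assumption; the real parsing
# lives inside pytpt's run_tbuild.
import re

def validate_row_counts(tbuild_output: str) -> bool:
    """Return True if rows sent to the RDBMS match rows applied."""
    sent = re.search(r"rows sent to rdbms:\s*(\d+)", tbuild_output, re.IGNORECASE)
    applied = re.search(r"rows applied:\s*(\d+)", tbuild_output, re.IGNORECASE)
    if not sent or not applied:
        return False  # counts not found; treat as a failure
    return int(sent.group(1)) == int(applied.group(1))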

Comparison with Previous Approach

| Feature        | teradatasql + teradata_read_csv  | PyTPT                          |
| -------------- | -------------------------------- | ------------------------------ |
| Performance    | Good for small-medium datasets   | Optimized for large datasets   |
| Complexity     | Simple, direct SQL               | Template-driven, more setup    |
| Scalability    | Limited by Python/ODBC           | Native TPT parallelism         |
| Error Handling | Basic SQL error messages         | Detailed TPT logs and metrics  |
| Configuration  | Hardcoded in scripts             | Flexible YAML configuration    |
| Dependencies   | teradatasql driver               | TTU installation required      |

Debugging

Some of the defaults I have chosen are the result of issues loading certain types of CSV files, where it took a long time to find the correct job variables.

For example, loading CSV data where text fields contain newlines, such as:

integer_field,text_field_with_newline
1,test
2,test
3,"this long line.
that has newlines"
4,test

requires these job variables:

, FileReaderQuotedData = 'Optional'
, FileReaderMultipleRecordsInField = 'Y'
, FileReaderEscapeQuoteDelimiter = '"'
, FileReaderTruncateColumns = 'Y'