Load to Teradata with a Python wrapper for TPT

PyTPT: A Python Wrapper for High-Performance Teradata Data Loading
This blog post is paired with the associated GitHub repository: https://github.com/mortie23/python-teradata
In my previous blog post about Connecting to Teradata from Python, I demonstrated how to connect to Teradata from Python using the teradatasql driver and load CSV data using the built-in teradata_read_csv function. While that approach works well for many use cases, it has limitations when dealing with larger datasets or when you need the full power of Teradata’s native loading tools.
In this post I have implemented a Python package, PyTPT, which is essentially a wrapper around Teradata's TPT (Teradata Parallel Transporter) for bulk loading CSV files. This approach leverages Teradata's optimized loading engine while maintaining the simplicity and flexibility of Python.
What is TPT?
From the official docs:
Teradata Parallel Transporter (PT) is an object-oriented client application that provides scalable, high-speed, and parallel data:
- Extraction
- Loading
- Updating
A TPT job requires at least a Job Script and Job Variables.
PyTPT bridges this gap by automatically generating these files from templates and orchestrating the entire loading process through Python.
Local Python Environment Setup
For this implementation, I’m using:
- Python 3.13
- Poetry for dependency management
- Windows and Linux compatibility
Let’s set up the environment:
```sh
# Create your python virtual environment
# Linux
python -m venv ~/venv/pytpt
source ~/venv/pytpt/bin/activate
# Windows
python -m venv C:\venv\pytpt
C:\venv\pytpt\Scripts\Activate.ps1
# Install poetry and then packages
pip install poetry
poetry install
```
Prerequisites
Before we begin, you’ll need:
- A Teradata machine (I’m using TD Express VM)
- Teradata Studio (optional, but helpful for checking results)
- Teradata Tools and Utilities (TTU) - this is crucial as it includes the tbuild binary that TPT requires
The TTU installation provides the tbuild command that PyTPT calls under the hood to execute TPT operations.
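Under the hood this is essentially a subprocess call. Here is a minimal sketch of the idea (the real run_tbuild in the repository also handles logging and parses the TPT output); the command shape matches the tbuild -f ... -v ... invocations you will see in the logs later in this post:

```python
import subprocess
from pathlib import Path


def run_tbuild(jvar_file: str, tpt_file: str, operation: str) -> bool:
    """Sketch of shelling out to tbuild; the real run_tbuild adds logging and row-count parsing."""
    # Command shape: tbuild -f <job script> -v <job variables file>
    cmd = ["tbuild", "-f", tpt_file, "-v", jvar_file]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"{operation} operation finished with exit code {result.returncode}")

    # Keep the combined stdout/stderr for troubleshooting, similar to the *_combined.log files
    log_dir = Path("logs")
    log_dir.mkdir(exist_ok=True)
    (log_dir / f"tpt_{Path(tpt_file).stem}_combined.log").write_text(result.stdout + result.stderr)

    return result.returncode == 0
```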
Configuration Structure
PyTPT uses a hierarchical configuration system powered by Hydra, making it easy to manage different environments and datasets.
Project Configuration
The main project configuration lives in conf/config.yaml. The database default becomes the name of the next configuration file, and must match the name of the database you are planning to load to.
```yaml
# Main configuration file for pytpt examples
# This file sets up the default configuration structure
defaults:
  - database: PRD_ADS_PYTHON_NFL_DB
  - _self_

# Data directory configuration
data:
  dir: "../../data"
```
This configuration tells PyTPT where to find your CSV files and how they map to Teradata tables.
Database Configuration
First, configure your Teradata connection in conf/database/PRD_ADS_PYTHON_NFL_DB.yaml:
```yaml
# @package _global_
# Teradata NFL database configuration
database:
  ddl_host: "tdvm"
  target_host: "tdvm"
  working_database: "PRD_ADS_PYTHON_NFL_DB"
  name: "PRD_ADS_PYTHON_NFL_DB"
  ddl_logmech: "TD2"
  target_logmech: "TD2"

# Table configurations for NFL data
tables:
  game: "GAME"
  game_type: "GAME_TYPE"
  game_stats: "GAME_STATS"
  game_venue: "GAME_VENUE"
  player: "PLAYER"
  venue: "VENUE"
  team_lookup: "TEAM_LOOKUP"
  weather: "WEATHER"
```
How PyTPT Works
PyTPT follows a simple workflow:
- Discovery: Scans the configured data directory for CSV files
- Mapping: Matches CSV files to configured table names (see the sketch after this list)
- Template Generation: Creates TPT job variables and scripts from Jinja2 templates
- Execution: Runs the TPT operations in sequence:
- Drop existing table (if it exists)
- Create new table structure
- Load CSV data using TPT’s optimized bulk loading
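Here is a minimal sketch of the discovery and mapping steps, assuming the tables mapping from the database configuration is passed in as a plain dictionary (the function name is illustrative):

```python
from pathlib import Path


def discover_csv_files(data_dir: str, tables: dict[str, str]) -> dict[str, str]:
    """Map configured tables to CSV files, e.g. {'VENUE': '../../data/venue.csv'}."""
    mapping = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        key = csv_path.stem.lower()   # 'venue.csv' -> 'venue'
        if key in tables:             # only load files that have a configured table
            mapping[tables[key]] = str(csv_path)
    return mapping
```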
Let’s look at the core loading function:
```python
def load_table(
    table_name: str,
    csv_file_path: str,
) -> bool:
    """Load a single CSV file to a table using TPT."""
    from .templates import create_tpt_files

    logger.info(f"Starting table load process: {csv_file_path} -> {table_name}")

    # Create TPT files
    jvar_file, drop_file, create_file, load_file = create_tpt_files(
        table_name, csv_file_path
    )

    # Step 1: Drop table
    logger.info(f"Step 1/3: Dropping table {table_name}")
    drop_success = run_tbuild(jvar_file, drop_file, "drop")
    if not drop_success:
        logger.error(f"Failed to drop table {table_name}")
        return False

    # Step 2: Create table
    logger.info(f"Step 2/3: Creating table {table_name}")
    create_success = run_tbuild(jvar_file, create_file, "create")
    if not create_success:
        logger.error(f"Failed to create table {table_name}")
        return False

    # Step 3: Load data
    logger.info(f"Step 3/3: Loading data into {table_name}")
    load_success = run_tbuild(jvar_file, load_file, "load")
    if load_success:
        logger.success(f"Successfully completed table load: {table_name}")
    else:
        logger.error(f"Failed to load data to {table_name}")

    return load_success
```
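load_table delegates the file generation to create_tpt_files. Here is a minimal sketch of what that rendering step could look like with Jinja2 (the template file names and context keys here are assumptions; the actual templates are shown in the next section):

```python
from pathlib import Path

from jinja2 import Environment, FileSystemLoader


def create_tpt_files(table_name: str, csv_file_path: str) -> tuple[Path, Path, Path, Path]:
    """Illustrative sketch: render the job variables file and the three TPT job scripts."""
    env = Environment(loader=FileSystemLoader("templates"))  # template directory name is an assumption
    render_dir = Path("render_tmp")  # matches the render_tmp folder seen in the logs
    render_dir.mkdir(exist_ok=True)

    # Context keys mirror the template placeholders; a real implementation would also
    # pull the hosts, credentials and working database from the Hydra config.
    context = {"table_name": table_name, "csv_file_path": csv_file_path}

    jvar_file = render_dir / f"{table_name}.jvar"
    jvar_file.write_text(env.get_template("job_variables.jvar.j2").render(**context))

    tpt_files = []
    for operation in ("drop", "create", "load"):
        out_path = render_dir / f"{table_name}_{operation}.tpt"
        out_path.write_text(env.get_template(f"{operation}_table.tpt.j2").render(**context))
        tpt_files.append(out_path)

    return jvar_file, tpt_files[0], tpt_files[1], tpt_files[2]
```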
Template-Driven TPT Generation
PyTPT uses Jinja2 templates to generate TPT files dynamically. Here's an example of the create table template:
```
-- Create table template TPT
DEFINE JOB create_table_job
DESCRIPTION 'Create Table TPT JOB' (
  STEP drop_table (
    APPLY ('DROP TABLE {{working_database}}."{{table_name}}";')
    TO OPERATOR ($DDL);
  );
);
```
Here is the job variables template:
```
# Template Job variables file
DDLTdpId='{{ddl_host}}'
, DDLUserName='{{ddl_username}}'
, DDLUserPassword='{{ddl_password}}'
, DDLLogonMech='{{ddl_logon_mech}}'
, TargetTdpId='{{target_host}}'
, TargetUserName='{{target_username}}'
, TargetUserPassword='{{target_password}}'
, TargetLogonMech='{{target_logon_mech}}'
, TargetWorkingDatabase = '{{working_database}}'
, LoadTargetTable='{{working_database}}."{{table_name}}"'
, LoadCharacterSet = 'UTF8'
, FileReaderFileName='{{csv_file_path}}'
, FileReaderFormat='Delimited'
, FileReaderOpenmode='read'
, FileReaderTextDelimiter=','
, FileReaderSkipRows=1
, FileReaderCharSet = 'UTF8'
, FileReaderPrivateLogName='.\logs\tpt_{{table_name}}_filereader.log'
, FileReaderQuotedData = 'Optional'
, FileReaderMultipleRecordsInField = 'Y'
, FileReaderEscapeQuoteDelimiter = '"'
, FileReaderTruncateColumns = 'Y'
, FileReaderIndicatorMode = 'Y'
, LoadPrivateLogName='.\logs\tpt_{{table_name}}_load.log'
, DDLPrivateLogName='.\logs\tpt_{{table_name}}_ddl.log'
```
Execution and Monitoring
To run PyTPT, just execute this command:
```sh
python app.py
```
The application provides logging throughout the process to files in the logs directory. Here is an example of loading one table:

```
2025-09-29 17:11:13.883 | INFO | __main__:main:73 - Processing 1 CSV files with table mappings
2025-09-29 17:11:13.884 | INFO | __main__:main:82 - Processing: venue.csv -> VENUE
2025-09-29 17:11:13.884 | INFO | pytpt.tpt_runner:load_table:144 - Starting table load process: C:\git\github\mortie23\python-teradata\eng\pytpt\..\..\data\venue.csv -> VENUE
2025-09-29 17:11:13.919 | INFO | pytpt.tpt_runner:load_table:152 - Step 1/3: Dropping table VENUE
2025-09-29 17:11:13.920 | INFO | pytpt.tpt_runner:run_tbuild:48 - Starting drop operation: tbuild -f render_tmp\VENUE_drop.tpt -v render_tmp\VENUE.jvar
2025-09-29 17:11:20.383 | SUCCESS | pytpt.tpt_runner:run_tbuild:110 - drop operation completed successfully
2025-09-29 17:11:20.383 | INFO | pytpt.tpt_runner:run_tbuild:111 - Complete TPT output saved to: logs\tpt_VENUE_drop_combined.log
2025-09-29 17:11:20.383 | INFO | pytpt.tpt_runner:load_table:160 - Step 2/3: Creating table VENUE
2025-09-29 17:11:20.383 | INFO | pytpt.tpt_runner:run_tbuild:48 - Starting create operation: tbuild -f render_tmp\VENUE_create.tpt -v render_tmp\VENUE.jvar
2025-09-29 17:11:26.824 | SUCCESS | pytpt.tpt_runner:run_tbuild:110 - create operation completed successfully
2025-09-29 17:11:26.824 | INFO | pytpt.tpt_runner:run_tbuild:111 - Complete TPT output saved to: logs\tpt_VENUE_create_combined.log
2025-09-29 17:11:26.824 | INFO | pytpt.tpt_runner:load_table:168 - Step 3/3: Loading data into VENUE
2025-09-29 17:11:26.825 | INFO | pytpt.tpt_runner:run_tbuild:48 - Starting load operation: tbuild -f render_tmp\VENUE_load.tpt -v render_tmp\VENUE.jvar
2025-09-29 17:11:43.988 | SUCCESS | pytpt.tpt_runner:run_tbuild:110 - load operation completed successfully
2025-09-29 17:11:43.988 | INFO | pytpt.tpt_runner:run_tbuild:111 - Complete TPT output saved to: logs\tpt_VENUE_load_combined.log
2025-09-29 17:11:43.988 | INFO | pytpt.tpt_runner:run_tbuild:122 - Rows sent to RDBMS: 32
2025-09-29 17:11:43.989 | INFO | pytpt.tpt_runner:run_tbuild:124 - Rows applied: 32
2025-09-29 17:11:43.989 | SUCCESS | pytpt.tpt_runner:load_table:172 - Successfully completed table load: VENUE
2025-09-29 17:11:43.989 | SUCCESS | __main__:main:86 - Successfully processed: venue.csv
2025-09-29 17:11:43.990 | INFO | __main__:main:93 - Load process completed: 1/1 files loaded successfully
2025-09-29 17:11:43.990 | SUCCESS | __main__:main:103 - All files loaded successfully!
```
PyTPT also captures detailed TPT output in log files for troubleshooting:
- tpt_VENUE_create_combined.log
- tpt_VENUE_load_combined.log
- tpt_VENUE_drop_combined.log
Results
After I ran it on my machine, the tables were created and the records were loaded.
Error Handling and Robustness
The system validates that rows sent equal rows applied, providing confidence in data integrity.
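Here is a minimal sketch of that check, assuming the row-count lines in the combined tbuild output look like the ones in the log excerpt above (the exact wording can vary between TPT operators and versions):

```python
import re


def rows_match(tpt_output: str) -> bool:
    """Hypothetical helper: compare rows sent vs rows applied parsed from tbuild output."""
    sent = re.search(r"Rows Sent To RDBMS:\s*(\d+)", tpt_output, re.IGNORECASE)
    applied = re.search(r"Rows Applied:\s*(\d+)", tpt_output, re.IGNORECASE)
    if not (sent and applied):
        return False  # counts not found; treat as a failed validation
    return int(sent.group(1)) == int(applied.group(1))
```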
Comparison with Previous Approach
| Feature | teradatasql + teradata_read_csv | PyTPT |
|---|---|---|
| Performance | Good for small-medium datasets | Optimized for large datasets |
| Complexity | Simple, direct SQL | Template-driven, more setup |
| Scalability | Limited by Python/ODBC | Native TPT parallelism |
| Error Handling | Basic SQL error messages | Detailed TPT logs and metrics |
| Configuration | Hardcoded in scripts | Flexible YAML configuration |
| Dependencies | teradatasql driver | TTU installation required |
Debugging
Some of the defaults I have chosen are the result of issues loading certain types of CSV files, where it took a long time to find the correct job variables.
For example, loading CSV data where text fields have newlines such as:
```
integer_field,text_field_with_newline
1,test
2,test
3,"this long line.
that has newlines"
4,test
```
This requires these job variables:
```
, FileReaderQuotedData = 'Optional'
, FileReaderMultipleRecordsInField = 'Y'
, FileReaderEscapeQuoteDelimiter = '"'
, FileReaderTruncateColumns = 'Y'
```
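For reference, here is a small snippet that produces a CSV like the example above, with a quoted field containing embedded newlines, which is handy for reproducing the scenario when testing these job variables:

```python
import csv

rows = [
    (1, "test"),
    (2, "test"),
    (3, "this long line.\nthat has newlines"),
    (4, "test"),
]

with open("newline_test.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["integer_field", "text_field_with_newline"])
    writer.writerows(rows)  # the csv module quotes the field that contains the newline
```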