YndigoBlue · Teradata Extension

Enterprise full-text search, native to Teradata

Add BM25-ranked full-text search to any Teradata database with no middleware, no external search engine, and no changes to your application architecture.

Included with all YndigoBlue subscriptions
product_search.sql
-- Step 1: create a full-text index on your existing table
CALL fts_metadata.CREATE_FTS_INDEX_C(
    'products_idx',    -- unique index name
    'mydb',            -- schema
    'products',        -- table
    'product_id',      -- primary key
    'name,description' -- columns to index
    result_msg
);

-- Step 2: search with BM25 relevance scoring
SELECT p.product_id, p.name, p.description, scores.score
FROM (
    SELECT ti.document_id,
           SUM(fts_metadata.BM25_SCORE(
               ti.term_frequency, ds.doc_length,
               idx.avg_doc_length, ts.document_frequency,
               idx.document_count
           )) AS score
    FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless noise cancelling')) AS qt(token, pos)
    INNER JOIN fts_metadata.fts_term_index ti ON ti.term = qt.token AND ti.index_name = 'products_idx'
    INNER JOIN fts_metadata.fts_document_stats ds ON ds.index_name = ti.index_name 
        AND ds.document_id = ti.document_id
    INNER JOIN fts_metadata.fts_term_stats ts ON ts.index_name = ti.index_name AND ts.term = ti.term
    CROSS JOIN fts_metadata.fts_indexes idx WHERE idx.index_name = 'products_idx'
    GROUP BY ti.document_id
) scores
INNER JOIN mydb.products p ON CAST(p.product_id AS VARCHAR(512)) = scores.document_id
ORDER BY scores.score DESC SAMPLE 10;
BM25
Relevance Ranking
TD 17+20
Teradata Versions
Pure SQL
No Middleware
Included
With Subscription
Capabilities

Production-grade search inside Teradata

Built entirely from C UDFs and SQL stored procedures — no external systems, no data movement, no new infrastructure to manage.

BM25 Relevance Scoring

The same probabilistic ranking algorithm used by Elasticsearch and Apache Lucene. Term frequency saturation, document-length normalisation and inverse document frequency — all computed natively in Teradata.

Inverted Index Engine

A full inverted index stored directly in Teradata metadata tables — term frequencies, document statistics and global IDF weights — enabling fast, ranked lookups over millions of documents.

Incremental Updates

Index a single newly-inserted document, update a modified row, or delete a removed document — without rebuilding the entire index. Designed for live production workloads.

Pure SQL Interface

Search and manage indexes entirely through standard Teradata SQL — stored procedure calls for index management, direct SQL joins for querying. No new tools or APIs to learn.

Multi-Column Indexing

Index multiple text columns simultaneously — name, description, tags, notes — with a single index creation call. Queries automatically search across all indexed columns.

Resumable & Robust

Index creation is resumable after interruption. UTF-8 validation catches malformed data gracefully. Comprehensive error recovery keeps your indexes consistent under failure.

Choose Your Indexer

Two indexers, one right fit

Teradata Full-Text Search ships with two indexer variants. Pick the one that matches your data characteristics.

_C · CLOB Recommended default

TOKENIZE_TEXT_C

For documents of any size, including LOB columns. Handles invalid UTF-8 gracefully by skipping malformed sequences and continuing.

Document sizeUnlimited (CLOB)
UTF-8 handlingValidates & recovers
Best forArticles, notes, CLOBs
Teradata 17
Teradata 20
_V · VARCHAR For shorter documents

TOKENIZE_TEXT_V

Accesses VARCHAR buffers directly via UCS-2, avoiding LOB overhead. The faster option for data that fits within 32 KB per document.

Document sizeUp to 32,000 chars
UTF-8 handlingClean data only
Best forProduct names, titles
Teradata 17
Teradata 20
Code

Up and running in minutes

Deploy to any Teradata 17 or 20 instance, create your first index, and start returning ranked search results — all through standard SQL.

-- Create a CLOB-based index (recommended — handles any document size)
CALL fts_metadata.CREATE_FTS_INDEX_C(
    'products_idx',          -- globally unique index name
    'mydb',                  -- source schema
    'products',              -- source table
    'product_id',            -- primary key column
    'name,description',      -- comma-separated columns to index
    result_msg               -- OUT: status message
);
SELECT :result_msg;          -- e.g. "Index products_idx created (1,247 documents)"

-- Or the VARCHAR variant — 3x faster when documents are < 32 KB
CALL fts_metadata.CREATE_FTS_INDEX_V(
    'products_idx_fast', 'mydb', 'products', 'product_id', 'sku,name', result_msg
);

-- View all indexes and their statistics
SELECT index_name, source_schema, source_table,
       document_count, total_terms, avg_doc_length
FROM fts_metadata.v_fts_indexes;

-- Rebuild after a bulk data change
CALL fts_metadata.REBUILD_FTS_INDEX_C('products_idx', result_msg);

-- Drop an index when no longer needed
CALL fts_metadata.DROP_FTS_INDEX('products_idx', result_msg);
-- Teradata 20 only: SEARCH_FTS convenience procedure
-- One call replaces the full BM25 query above
CALL fts_metadata.SEARCH_FTS(
    'products_idx',                    -- index to search
    'wireless noise cancelling',       -- search query
    'mydb.products',                   -- fully-qualified source table
    'product_id',                      -- primary key column
    'product_id, sku, name',           -- columns to return
    10                                 -- maximum results
);

-- Results are written to the global temporary table fts_search_results
SELECT * FROM fts_search_results ORDER BY score DESC;

-- Unranked search (fastest — no BM25 calculation, just matching docs)
SELECT DISTINCT ti.document_id
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless')) AS qt(token, pos)
INNER JOIN fts_metadata.fts_term_index ti
    ON ti.term = qt.token AND ti.index_name = 'products_idx';

-- Debug tokenisation — see how your query terms are split
SELECT token, pos
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('noise-cancelling headphones (BT)')) AS t(token, pos)
ORDER BY pos;
-- A new product was inserted — index just that row (no full rebuild)
CALL fts_metadata.INDEX_SINGLE_DOCUMENT_C(
    'products_idx',
    CAST(1042 AS VARCHAR(512)),  -- the new product_id
    result_msg
);

-- An existing product was updated — re-index it
CALL fts_metadata.UPDATE_SINGLE_DOCUMENT_C(
    'products_idx',
    CAST(1042 AS VARCHAR(512)),
    result_msg
);

-- A product was deleted — remove it from the index
CALL fts_metadata.DELETE_SINGLE_DOCUMENT(
    'products_idx',
    CAST(1042 AS VARCHAR(512)),
    result_msg
);

-- After a large batch of changes, do a full rebuild instead
-- (automatically replaces index contents and recalculates statistics)
CALL fts_metadata.REBUILD_FTS_INDEX_C('products_idx', result_msg);
SELECT :result_msg;  -- "Index products_idx rebuilt (12,850 documents)"
# 1. Unpack the distribution for your Teradata version
unzip teradata-fts-td20.zip   # or teradata-fts-td17.zip

# 2. Edit credentials in the BTEQ script
nano bteq/install.bteq        # set your hostname, username, password

# 3. Run the deployment (creates fts_metadata schema + all objects)
./deploy.sh                   # Linux / macOS
deploy.bat                    # Windows

# 4. Verify installation
bteq << EOF
.LOGON myhost/myuser,mypassword
SELECT * FROM fts_metadata.v_fts_indexes;
.LOGOFF
EOF

# What gets installed into fts_metadata:
#   3 UDFs        — TOKENIZE_TEXT_C, TOKENIZE_TEXT_V, BM25_SCORE
#   9 procedures  — CREATE, DROP, REBUILD, INDEX/UPDATE/DELETE_SINGLE, SEARCH_FTS
#   4 tables      — fts_indexes, fts_term_index, fts_document_stats, fts_term_stats
#   3 views       — v_fts_indexes, v_fts_db_indexes, v_fts_index_columns
#   2 GTTs        — Working space for CLOB and VARCHAR indexing operations

Requires CREATE DATABASE privileges on the target Teradata instance and approximately 1 GB of permanent space for the fts_metadata schema. Full prerequisites are detailed in the documentation.

Architecture

Everything lives inside Teradata

No sidecar services, no Elasticsearch cluster, no ETL pipelines. The entire system — indexer, scorer, and metadata store — is deployed once into a single Teradata schema.

Your Application SQL calls only BTEQ / JDBC / ODBC TERADATA DATABASE Stored Procedures • CREATE_FTS_INDEX_C / _V • REBUILD_FTS_INDEX_C / _V • DROP_FTS_INDEX • INDEX_SINGLE_DOCUMENT_C • UPDATE_SINGLE_DOCUMENT_C • DELETE_SINGLE_DOCUMENT • SEARCH_FTS (TD20) C UDFs • TOKENIZE_TEXT_C • TOKENIZE_TEXT_V • BM25_SCORE fts_metadata Schema TABLES • fts_indexes Index registry & statistics • fts_term_index Inverted index (term → doc) • fts_document_stats Per-document length • fts_term_stats Global term frequencies VIEWS • v_fts_indexes • v_fts_db_indexes Your Tables products articles knowledge_base customer_notes
Compatibility

Teradata 17 and 20 support

All core indexing and search capabilities are available on both major Teradata versions. Teradata 20 gains the additional SEARCH_FTS convenience procedure.

Feature Teradata 17 Teradata 20
TOKENIZE_TEXT_C / _V UDFs
BM25_SCORE UDF
CREATE / DROP / REBUILD index
Incremental index updates
Direct SQL ranked search
Multi-column indexing
Resumable index creation
SEARCH_FTS convenience procedure
Velocity Framework

Full-text search through one unified API

The TeradataFts extension plugs directly into Velocity Framework, translating a single manager.FullTextRetrieval() call into native BM25-ranked SQL — the exact same method that drives full-text search on every other supported datasource.

No vendor knowledge required. Create a FullTextIndex, call FullTextRetrieval() — Velocity takes care of everything else. You never write a database-specific query.

Switch datasource, change nothing else. Move from SQL Server to Teradata by changing a single connection object. Every model, index definition, and search call in your application stays exactly as written.

BM25 on Teradata, transparently. The TeradataFts extension delivers the same high-quality relevance ranking available on other platforms — surfaced through the same Velocity API, with no extra work on your part.

Same API on all supported databases
Teradata · BM25 SQL Server · CONTAINS PostgreSQL · tsvector MySQL · FULLTEXT
using YndigoBlue.Velocity.Engine;
using YndigoBlue.Velocity.Model;

// Define a full-text index — the same object regardless of which database
// you are targeting. Velocity handles all vendor-specific DDL for you.
var idx = new FullTextIndex("product_search_idx", Product.Table)
{
    IndexColumn = "product_id",
    Columns     = ["name", "description"]
};

using var manager = new Manager(connection);

// Deploy — no knowledge of the underlying database required
manager.CreateFullTextIndex(idx, overwrite: false);

// Rebuild after a bulk data change
manager.UpdateFullTextIndex(idx);

TeradataFts is included alongside Velocity Framework in all YndigoBlue subscriptions. Learn about Velocity Framework →

Add search to your Teradata data today

Teradata Full-Text Search is included with all YndigoBlue subscriptions — no separate licence required. Deploy in minutes, search in seconds.