Skip to main content

Index Management Guide

Database indexes are critical for query performance. Taruvi's Data Service provides comprehensive index management for both flat_table and JSONB providers through Frictionless Table Schema extensions.

Overview

Indexes improve query performance by allowing the database to quickly locate rows without scanning the entire table. Without proper indexes, queries can become slow as your data grows.

When to Use Indexes

Create indexes when:

  • Filtering frequently on specific columns (WHERE status = 'active')
  • Joining tables on foreign key columns
  • Sorting results by specific columns (ORDER BY created_at)
  • Searching for unique values
  • Querying JSONB fields frequently

Avoid excessive indexes when:

  • Table has very few rows (< 1000)
  • Column values are not selective (e.g., boolean with 50/50 distribution)
  • Write performance is more critical than read performance
  • Storage space is limited

Index Definition Format

Indexes are defined in the indexes array within your Frictionless Table Schema:

{
"fields": [
{"name": "id", "type": "integer"},
{"name": "email", "type": "string"},
{"name": "status", "type": "string"},
{"name": "created_at", "type": "datetime"}
],
"primaryKey": ["id"],
"indexes": [
{
"name": "idx_users_email",
"fields": ["email"],
"unique": false,
"method": "btree",
"comment": "Fast email lookup"
}
]
}

Index Properties

PropertyTypeRequiredDescription
namestring✅ YesUnique index name (must follow PostgreSQL naming conventions)
fieldsarray✅ YesColumn names to index (or use expression)
uniquebooleanNoCreate UNIQUE index (default: false)
methodstringNoIndex type: btree, hash, gin, gist, brin (default: btree)
expressionstringNoSQL expression for expression indexes (e.g., LOWER(email))
wherestringNoWHERE clause for partial indexes (e.g., status = 'active')
commentstringNoHuman-readable description

Index Types

1. Simple Column Index

Index on a single column for fast equality and range queries:

{
"indexes": [
{
"name": "idx_users_email",
"fields": ["email"],
"method": "btree",
"comment": "Fast email lookup"
}
]
}

Use cases:

  • WHERE email = 'user@example.com'
  • WHERE email LIKE 'user%'
  • ORDER BY email

2. Composite Index

Index on multiple columns for queries filtering by multiple fields:

{
"indexes": [
{
"name": "idx_orders_status_created",
"fields": ["status", "created_at"],
"method": "btree",
"comment": "Filter orders by status and date"
}
]
}

Use cases:

  • WHERE status = 'active' AND created_at > '2024-01-01'
  • WHERE status = 'active' ORDER BY created_at DESC
Column Order Matters

For composite indexes, put the most selective column first. PostgreSQL can use a composite index for queries that filter only on the first column(s).

Example: (status, created_at) can be used for:

  • WHERE status = 'active'
  • WHERE status = 'active' AND created_at > '2024-01-01'
  • WHERE created_at > '2024-01-01' (cannot use this index)

3. Unique Index

Enforce uniqueness constraint with an index:

{
"indexes": [
{
"name": "idx_users_email_unique",
"fields": ["email"],
"unique": true,
"method": "btree",
"comment": "Ensure email uniqueness"
}
]
}

Alternative: Use field-level unique constraint (automatically creates index):

{
"fields": [
{
"name": "email",
"type": "string",
"constraints": {"unique": true}
}
]
}

4. Expression Index

Index on computed expressions for case-insensitive searches or transformations:

{
"indexes": [
{
"name": "idx_users_email_lower",
"fields": ["email"],
"expression": "LOWER(email)",
"unique": true,
"method": "btree",
"comment": "Case-insensitive unique email"
}
]
}

Use cases:

  • WHERE LOWER(email) = 'user@example.com' (case-insensitive search)
  • WHERE UPPER(name) = 'JOHN DOE'

Common expressions:

  • LOWER(column) - Case-insensitive text
  • UPPER(column) - Uppercase normalization
  • column::text - Type casting
  • extract(year from date_column) - Extract date parts

5. Partial Index

Index only a subset of rows to save space and improve performance:

{
"indexes": [
{
"name": "idx_orders_active_created",
"fields": ["created_at"],
"where": "status = 'active'",
"method": "btree",
"comment": "Index only active orders"
}
]
}

Use cases:

  • Frequently query only active/published records
  • Archive old data but only query recent data
  • Status-based filtering (e.g., only index non-deleted rows)

Benefits:

  • Smaller index size (faster updates, less storage)
  • Better cache utilization
  • Faster queries on filtered subset

6. GIN Index (for JSONB and Arrays)

Generalized Inverted Index for full-text search and JSONB queries:

{
"indexes": [
{
"name": "idx_users_metadata_gin",
"fields": ["metadata"],
"method": "gin",
"comment": "Fast JSONB queries"
}
]
}

Use cases:

  • JSONB containment: WHERE metadata @> '{"key": "value"}'
  • Key existence: WHERE metadata ? 'key'
  • Array operations: WHERE tags @> ARRAY['python', 'django']

FlatTable Provider Indexes

For normalized tables with typed columns, indexes are created as standard PostgreSQL indexes.

Example: E-commerce Schema

{
"fields": [
{"name": "id", "type": "integer"},
{"name": "email", "type": "string"},
{"name": "status", "type": "string"},
{"name": "category", "type": "string"},
{"name": "price", "type": "number"},
{"name": "created_at", "type": "datetime"}
],
"primaryKey": ["id"],
"indexes": [
{
"name": "idx_products_email_lower",
"fields": ["email"],
"expression": "LOWER(email)",
"unique": true,
"method": "btree",
"comment": "Case-insensitive unique email"
},
{
"name": "idx_products_category_price",
"fields": ["category", "price"],
"method": "btree",
"comment": "Filter by category and price range"
},
{
"name": "idx_products_active_created",
"fields": ["created_at"],
"where": "status = 'active'",
"method": "btree",
"comment": "Partial index for active products"
}
]
}

Generated SQL:

-- Expression index
CREATE UNIQUE INDEX idx_products_email_lower
ON products (LOWER(email));

-- Composite index
CREATE INDEX idx_products_category_price
ON products (category, price);

-- Partial index
CREATE INDEX idx_products_active_created
ON products (created_at)
WHERE status = 'active';

JSONB Provider Indexes

For JSONB storage, all data is in a single data JSONB column. Indexes use expression syntax with automatic type casting.

JSONB Index Expressions

The system automatically generates correct JSONB expressions:

Field TypeGenerated ExpressionExample
string(data->>'field')Text extraction
integer((data->>'field')::integer)Cast to integer
number((data->>'field')::numeric)Cast to numeric
boolean((data->>'field')::boolean)Cast to boolean
object/array (GIN)(data->'field')Keep as JSONB

Example: JSONB Schema with Indexes

{
"fields": [
{"name": "email", "type": "string"},
{"name": "status", "type": "string"},
{"name": "age", "type": "integer"},
{"name": "metadata", "type": "object"}
],
"indexes": [
{
"name": "idx_jsonb_email_status",
"fields": ["email", "status"],
"method": "btree",
"comment": "Composite JSONB index"
},
{
"name": "idx_jsonb_age",
"fields": ["age"],
"method": "btree",
"comment": "Integer field with casting"
},
{
"name": "idx_jsonb_metadata_gin",
"fields": ["metadata"],
"method": "gin",
"comment": "Full JSONB search"
}
]
}

Generated SQL:

-- Composite index with text extraction
CREATE INDEX idx_jsonb_email_status
ON users ((data->>'email'), (data->>'status'));

-- Integer field with type casting
CREATE INDEX idx_jsonb_age
ON users (((data->>'age')::integer));

-- GIN index for JSONB queries
CREATE INDEX idx_jsonb_metadata_gin
ON users USING GIN ((data->'metadata'));

JSONB Query Examples

With the indexes above, these queries are optimized:

-- Uses idx_jsonb_email_status
SELECT * FROM users
WHERE data->>'email' = 'user@example.com'
AND data->>'status' = 'active';

-- Uses idx_jsonb_age
SELECT * FROM users
WHERE (data->>'age')::integer > 21
ORDER BY (data->>'age')::integer;

-- Uses idx_jsonb_metadata_gin
SELECT * FROM users
WHERE data->'metadata' @> '{"plan": "premium"}';

Automatic Indexes

Taruvi automatically creates indexes for:

1. Primary Keys

{
"primaryKey": ["id"]
}

→ Automatic index: PRIMARY KEY (id)

2. Unique Constraints

{
"fields": [
{
"name": "email",
"type": "string",
"constraints": {"unique": true}
}
]
}

→ Automatic index: UNIQUE (email)

3. Foreign Keys (FlatTable)

{
"foreignKeys": [
{
"fields": ["user_id"],
"reference": {"resource": "users", "fields": ["id"]}
}
]
}

→ Automatic index: INDEX (user_id)

4. Foreign Keys (JSONB)

For JSONB provider, foreign key fields get expression indexes:

{
"fields": [
{"name": "user_id", "type": "integer"}
],
"foreignKeys": [
{
"fields": ["user_id"],
"reference": {"resource": "users", "fields": ["id"]}
}
]
}

→ Automatic index: INDEX ((data->>'user_id')::integer)

5. GIN Index on JSONB Column

For JSONB provider, a GIN index is automatically created on the data column:

CREATE INDEX {table_name}_data_gin_idx
ON {table_name} USING GIN (data);

This enables fast containment queries: WHERE data @> '{"status": "active"}'

Index Methods

PostgreSQL supports different index types optimized for different query patterns:

B-tree (Default)

Best for: Equality, range queries, sorting

{"method": "btree"}

Query types:

  • =, <, >, <=, >=
  • BETWEEN
  • IN
  • ORDER BY
  • MIN, MAX

Use cases: Most general-purpose indexing

Hash

Best for: Equality comparisons only

{"method": "hash"}

Query types:

  • = only

Use cases: Large equality-only lookups (rare, B-tree usually better)

GIN (Generalized Inverted Index)

Best for: Full-text search, JSONB, arrays

{"method": "gin"}

Query types:

  • @> (contains)
  • ? (key exists)
  • ?& (all keys exist)
  • ?| (any key exists)
  • @@ (full-text search)

Use cases:

  • JSONB queries
  • Array operations
  • Full-text search (with tsvector)

GiST (Generalized Search Tree)

Best for: Geometric data, ranges

{"method": "gist"}

Query types:

  • Geometric operators
  • Range overlaps
  • Nearest-neighbor searches

Use cases:

  • PostGIS geospatial queries
  • IP address ranges
  • Date/time ranges

BRIN (Block Range Index)

Best for: Very large tables with natural ordering

{"method": "brin"}

Query types:

  • Range queries on naturally ordered data

Use cases:

  • Timestamp columns on append-only tables
  • Sequential data (IoT sensors, logs)

Benefits: Very small index size, fast updates

Best Practices

1. Index Selectivity

Index columns with high selectivity (many distinct values):

Good candidates:

  • Email addresses (unique)
  • User IDs (unique)
  • Timestamps (high cardinality)
  • Status with many values

Poor candidates:

  • Boolean columns (only 2 values)
  • Gender (few distinct values)
  • is_deleted (mostly FALSE)

Exception: Use partial indexes for low-selectivity columns:

{
"name": "idx_active_orders",
"fields": ["id"],
"where": "status = 'active'",
"comment": "Only 10% of orders are active"
}

2. Composite Index Column Order

Order columns by selectivity (most selective first):

// ✅ GOOD: email is highly selective
{
"name": "idx_email_status",
"fields": ["email", "status"]
}

// ❌ BAD: status has low selectivity
{
"name": "idx_status_email",
"fields": ["status", "email"]
}

3. Avoid Over-Indexing

Each index has costs:

  • Storage: Indexes consume disk space
  • Write performance: INSERT/UPDATE/DELETE slower
  • Memory: More indexes = more cache needed

Rule of thumb: Only index columns frequently used in WHERE, JOIN, ORDER BY clauses.

4. Monitor Index Usage

Track which indexes are actually used:

-- PostgreSQL query to check index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan as scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE schemaname = 'your_schema'
ORDER BY idx_scan ASC;

Drop unused indexes:

  • idx_scan = 0 means index is never used
  • Consider removing if not used after sufficient time

5. Use EXPLAIN ANALYZE

Test if your index is being used:

EXPLAIN ANALYZE
SELECT * FROM users
WHERE status = 'active'
AND created_at > '2024-01-01';

Look for:

  • Index Scan or Index Only Scan (good)
  • Seq Scan (bad - index not used)

Common Patterns

Pattern 1: User Lookup by Email

{
"indexes": [
{
"name": "idx_users_email_lower",
"fields": ["email"],
"expression": "LOWER(email)",
"unique": true,
"method": "btree",
"comment": "Case-insensitive email lookup"
}
]
}

Query:

SELECT * FROM users WHERE LOWER(email) = LOWER('User@Example.com');

Pattern 2: Recent Active Records

{
"indexes": [
{
"name": "idx_posts_active_recent",
"fields": ["created_at"],
"where": "status = 'published'",
"method": "btree",
"comment": "Recently published posts"
}
]
}

Query:

SELECT * FROM posts
WHERE status = 'published'
ORDER BY created_at DESC
LIMIT 20;

Pattern 3: Multi-Tenant Data

{
"indexes": [
{
"name": "idx_data_tenant_created",
"fields": ["tenant_id", "created_at"],
"method": "btree",
"comment": "Tenant-scoped queries"
}
]
}

Query:

SELECT * FROM data
WHERE tenant_id = 123
ORDER BY created_at DESC;
{
"indexes": [
{
"name": "idx_users_settings_gin",
"fields": ["settings"],
"method": "gin",
"comment": "Search user settings"
}
]
}

Query:

SELECT * FROM users
WHERE settings @> '{"theme": "dark", "notifications": true}';

Validation and Errors

Valid Index Names

Index names must:

  • Start with a letter or underscore
  • Contain only letters, numbers, underscores
  • Be unique within the table
  • Follow PostgreSQL identifier rules

Valid:

  • idx_users_email
  • idx_orders_status_created
  • idx_posts_published_at

Invalid:

  • 123_index (starts with number)
  • idx-users-email (contains hyphen)
  • idx users email (contains spaces)

Common Errors

Missing Required Fields:

{
"indexes": [
{
"fields": ["email"],
"method": "btree"
}
]
}

→ Error: Index definition missing 'name' field

Invalid Method:

{
"indexes": [
{
"name": "idx_test",
"fields": ["email"],
"method": "invalid"
}
]
}

→ Error: Index 'idx_test': method 'invalid' not supported. Valid methods: btree, hash, gin, gist, brin

Non-existent Field:

{
"indexes": [
{
"name": "idx_test",
"fields": ["nonexistent_field"],
"method": "btree"
}
]
}

→ Error: Index 'idx_test': field 'nonexistent_field' does not exist in table

Migration Handling

When updating schemas with index changes, the system automatically:

Adding Indexes

// Before
{"indexes": []}

// After
{
"indexes": [
{"name": "idx_new", "fields": ["email"], "method": "btree"}
]
}

→ Generates: CREATE INDEX idx_new ON table (email);

Removing Indexes

// Before
{
"indexes": [
{"name": "idx_old", "fields": ["email"], "method": "btree"}
]
}

// After
{"indexes": []}

→ Generates: DROP INDEX idx_old;

Modifying Indexes

Index modifications require drop + recreate:

// Change method from btree to hash

→ Generates:

DROP INDEX idx_email;
CREATE INDEX idx_email ON table USING HASH (email);
Production Considerations

Index creation on large tables can take significant time and lock the table. Consider:

  • Use CREATE INDEX CONCURRENTLY for production (requires manual SQL)
  • Create indexes during maintenance windows
  • Monitor index build progress via pg_stat_progress_create_index

Performance Tips

1. Index-Only Scans

Create covering indexes that include all queried columns:

{
"name": "idx_users_email_name",
"fields": ["email", "name"],
"comment": "Covering index for email + name queries"
}

Query:

-- Can be satisfied entirely from index (no table access)
SELECT email, name FROM users WHERE email = 'user@example.com';

2. Partial Indexes for Boolean Flags

Instead of indexing boolean columns directly:

// ❌ BAD: Low selectivity
{
"name": "idx_is_deleted",
"fields": ["is_deleted"]
}

// ✅ GOOD: Partial index
{
"name": "idx_not_deleted",
"fields": ["id"],
"where": "is_deleted = FALSE"
}

3. Expression Indexes for Computed Values

For frequently computed values:

{
"name": "idx_full_name",
"fields": ["first_name", "last_name"],
"expression": "first_name || ' ' || last_name",
"comment": "Index concatenated full name"
}

4. Reindex Periodically

B-tree indexes can become bloated over time. Rebuild them:

REINDEX INDEX idx_users_email;
REINDEX TABLE users;

Resources

What's Next?