Index Management Guide
Database indexes are critical for query performance. Taruvi's Data Service provides comprehensive index management for both flat_table and JSONB providers through Frictionless Table Schema extensions.
Overview
Indexes improve query performance by allowing the database to quickly locate rows without scanning the entire table. Without proper indexes, queries can become slow as your data grows.
When to Use Indexes
✅ Create indexes when:
- Filtering frequently on specific columns (
WHERE status = 'active') - Joining tables on foreign key columns
- Sorting results by specific columns (
ORDER BY created_at) - Searching for unique values
- Querying JSONB fields frequently
❌ Avoid excessive indexes when:
- Table has very few rows (< 1000)
- Column values are not selective (e.g., boolean with 50/50 distribution)
- Write performance is more critical than read performance
- Storage space is limited
Index Definition Format
Indexes are defined in the indexes array within your Frictionless Table Schema:
{
"fields": [
{"name": "id", "type": "integer"},
{"name": "email", "type": "string"},
{"name": "status", "type": "string"},
{"name": "created_at", "type": "datetime"}
],
"primaryKey": ["id"],
"indexes": [
{
"name": "idx_users_email",
"fields": ["email"],
"unique": false,
"method": "btree",
"comment": "Fast email lookup"
}
]
}
Index Properties
| Property | Type | Required | Description |
|---|---|---|---|
name | string | ✅ Yes | Unique index name (must follow PostgreSQL naming conventions) |
fields | array | ✅ Yes | Column names to index (or use expression) |
unique | boolean | No | Create UNIQUE index (default: false) |
method | string | No | Index type: btree, hash, gin, gist, brin (default: btree) |
expression | string | No | SQL expression for expression indexes (e.g., LOWER(email)) |
where | string | No | WHERE clause for partial indexes (e.g., status = 'active') |
comment | string | No | Human-readable description |
Index Types
1. Simple Column Index
Index on a single column for fast equality and range queries:
{
"indexes": [
{
"name": "idx_users_email",
"fields": ["email"],
"method": "btree",
"comment": "Fast email lookup"
}
]
}
Use cases:
WHERE email = 'user@example.com'WHERE email LIKE 'user%'ORDER BY email
2. Composite Index
Index on multiple columns for queries filtering by multiple fields:
{
"indexes": [
{
"name": "idx_orders_status_created",
"fields": ["status", "created_at"],
"method": "btree",
"comment": "Filter orders by status and date"
}
]
}
Use cases:
WHERE status = 'active' AND created_at > '2024-01-01'WHERE status = 'active' ORDER BY created_at DESC
For composite indexes, put the most selective column first. PostgreSQL can use a composite index for queries that filter only on the first column(s).
Example: (status, created_at) can be used for:
- ✅
WHERE status = 'active' - ✅
WHERE status = 'active' AND created_at > '2024-01-01' - ❌
WHERE created_at > '2024-01-01'(cannot use this index)
3. Unique Index
Enforce uniqueness constraint with an index:
{
"indexes": [
{
"name": "idx_users_email_unique",
"fields": ["email"],
"unique": true,
"method": "btree",
"comment": "Ensure email uniqueness"
}
]
}
Alternative: Use field-level unique constraint (automatically creates index):
{
"fields": [
{
"name": "email",
"type": "string",
"constraints": {"unique": true}
}
]
}
4. Expression Index
Index on computed expressions for case-insensitive searches or transformations:
{
"indexes": [
{
"name": "idx_users_email_lower",
"fields": ["email"],
"expression": "LOWER(email)",
"unique": true,
"method": "btree",
"comment": "Case-insensitive unique email"
}
]
}
Use cases:
WHERE LOWER(email) = 'user@example.com'(case-insensitive search)WHERE UPPER(name) = 'JOHN DOE'
Common expressions:
LOWER(column)- Case-insensitive textUPPER(column)- Uppercase normalizationcolumn::text- Type castingextract(year from date_column)- Extract date parts
5. Partial Index
Index only a subset of rows to save space and improve performance:
{
"indexes": [
{
"name": "idx_orders_active_created",
"fields": ["created_at"],
"where": "status = 'active'",
"method": "btree",
"comment": "Index only active orders"
}
]
}
Use cases:
- Frequently query only active/published records
- Archive old data but only query recent data
- Status-based filtering (e.g., only index non-deleted rows)
Benefits:
- Smaller index size (faster updates, less storage)
- Better cache utilization
- Faster queries on filtered subset
6. GIN Index (for JSONB and Arrays)
Generalized Inverted Index for full-text search and JSONB queries:
{
"indexes": [
{
"name": "idx_users_metadata_gin",
"fields": ["metadata"],
"method": "gin",
"comment": "Fast JSONB queries"
}
]
}
Use cases:
- JSONB containment:
WHERE metadata @> '{"key": "value"}' - Key existence:
WHERE metadata ? 'key' - Array operations:
WHERE tags @> ARRAY['python', 'django']
FlatTable Provider Indexes
For normalized tables with typed columns, indexes are created as standard PostgreSQL indexes.
Example: E-commerce Schema
{
"fields": [
{"name": "id", "type": "integer"},
{"name": "email", "type": "string"},
{"name": "status", "type": "string"},
{"name": "category", "type": "string"},
{"name": "price", "type": "number"},
{"name": "created_at", "type": "datetime"}
],
"primaryKey": ["id"],
"indexes": [
{
"name": "idx_products_email_lower",
"fields": ["email"],
"expression": "LOWER(email)",
"unique": true,
"method": "btree",
"comment": "Case-insensitive unique email"
},
{
"name": "idx_products_category_price",
"fields": ["category", "price"],
"method": "btree",
"comment": "Filter by category and price range"
},
{
"name": "idx_products_active_created",
"fields": ["created_at"],
"where": "status = 'active'",
"method": "btree",
"comment": "Partial index for active products"
}
]
}
Generated SQL:
-- Expression index
CREATE UNIQUE INDEX idx_products_email_lower
ON products (LOWER(email));
-- Composite index
CREATE INDEX idx_products_category_price
ON products (category, price);
-- Partial index
CREATE INDEX idx_products_active_created
ON products (created_at)
WHERE status = 'active';
JSONB Provider Indexes
For JSONB storage, all data is in a single data JSONB column. Indexes use expression syntax with automatic type casting.
JSONB Index Expressions
The system automatically generates correct JSONB expressions:
| Field Type | Generated Expression | Example |
|---|---|---|
string | (data->>'field') | Text extraction |
integer | ((data->>'field')::integer) | Cast to integer |
number | ((data->>'field')::numeric) | Cast to numeric |
boolean | ((data->>'field')::boolean) | Cast to boolean |
object/array (GIN) | (data->'field') | Keep as JSONB |
Example: JSONB Schema with Indexes
{
"fields": [
{"name": "email", "type": "string"},
{"name": "status", "type": "string"},
{"name": "age", "type": "integer"},
{"name": "metadata", "type": "object"}
],
"indexes": [
{
"name": "idx_jsonb_email_status",
"fields": ["email", "status"],
"method": "btree",
"comment": "Composite JSONB index"
},
{
"name": "idx_jsonb_age",
"fields": ["age"],
"method": "btree",
"comment": "Integer field with casting"
},
{
"name": "idx_jsonb_metadata_gin",
"fields": ["metadata"],
"method": "gin",
"comment": "Full JSONB search"
}
]
}
Generated SQL:
-- Composite index with text extraction
CREATE INDEX idx_jsonb_email_status
ON users ((data->>'email'), (data->>'status'));
-- Integer field with type casting
CREATE INDEX idx_jsonb_age
ON users (((data->>'age')::integer));
-- GIN index for JSONB queries
CREATE INDEX idx_jsonb_metadata_gin
ON users USING GIN ((data->'metadata'));
JSONB Query Examples
With the indexes above, these queries are optimized:
-- Uses idx_jsonb_email_status
SELECT * FROM users
WHERE data->>'email' = 'user@example.com'
AND data->>'status' = 'active';
-- Uses idx_jsonb_age
SELECT * FROM users
WHERE (data->>'age')::integer > 21
ORDER BY (data->>'age')::integer;
-- Uses idx_jsonb_metadata_gin
SELECT * FROM users
WHERE data->'metadata' @> '{"plan": "premium"}';
Automatic Indexes
Taruvi automatically creates indexes for:
1. Primary Keys
{
"primaryKey": ["id"]
}
→ Automatic index: PRIMARY KEY (id)
2. Unique Constraints
{
"fields": [
{
"name": "email",
"type": "string",
"constraints": {"unique": true}
}
]
}
→ Automatic index: UNIQUE (email)
3. Foreign Keys (FlatTable)
{
"foreignKeys": [
{
"fields": ["user_id"],
"reference": {"resource": "users", "fields": ["id"]}
}
]
}
→ Automatic index: INDEX (user_id)
4. Foreign Keys (JSONB)
For JSONB provider, foreign key fields get expression indexes:
{
"fields": [
{"name": "user_id", "type": "integer"}
],
"foreignKeys": [
{
"fields": ["user_id"],
"reference": {"resource": "users", "fields": ["id"]}
}
]
}
→ Automatic index: INDEX ((data->>'user_id')::integer)
5. GIN Index on JSONB Column
For JSONB provider, a GIN index is automatically created on the data column:
CREATE INDEX {table_name}_data_gin_idx
ON {table_name} USING GIN (data);
This enables fast containment queries: WHERE data @> '{"status": "active"}'
Index Methods
PostgreSQL supports different index types optimized for different query patterns:
B-tree (Default)
Best for: Equality, range queries, sorting
{"method": "btree"}
Query types:
=,<,>,<=,>=BETWEENINORDER BYMIN,MAX
Use cases: Most general-purpose indexing
Hash
Best for: Equality comparisons only
{"method": "hash"}
Query types:
=only
Use cases: Large equality-only lookups (rare, B-tree usually better)
GIN (Generalized Inverted Index)
Best for: Full-text search, JSONB, arrays
{"method": "gin"}
Query types:
@>(contains)?(key exists)?&(all keys exist)?|(any key exists)@@(full-text search)
Use cases:
- JSONB queries
- Array operations
- Full-text search (with
tsvector)
GiST (Generalized Search Tree)
Best for: Geometric data, ranges
{"method": "gist"}
Query types:
- Geometric operators
- Range overlaps
- Nearest-neighbor searches
Use cases:
- PostGIS geospatial queries
- IP address ranges
- Date/time ranges
BRIN (Block Range Index)
Best for: Very large tables with natural ordering
{"method": "brin"}
Query types:
- Range queries on naturally ordered data
Use cases:
- Timestamp columns on append-only tables
- Sequential data (IoT sensors, logs)
Benefits: Very small index size, fast updates
Best Practices
1. Index Selectivity
Index columns with high selectivity (many distinct values):
✅ Good candidates:
- Email addresses (unique)
- User IDs (unique)
- Timestamps (high cardinality)
- Status with many values
❌ Poor candidates:
- Boolean columns (only 2 values)
- Gender (few distinct values)
- is_deleted (mostly FALSE)
Exception: Use partial indexes for low-selectivity columns:
{
"name": "idx_active_orders",
"fields": ["id"],
"where": "status = 'active'",
"comment": "Only 10% of orders are active"
}
2. Composite Index Column Order
Order columns by selectivity (most selective first):
// ✅ GOOD: email is highly selective
{
"name": "idx_email_status",
"fields": ["email", "status"]
}
// ❌ BAD: status has low selectivity
{
"name": "idx_status_email",
"fields": ["status", "email"]
}
3. Avoid Over-Indexing
Each index has costs:
- Storage: Indexes consume disk space
- Write performance: INSERT/UPDATE/DELETE slower
- Memory: More indexes = more cache needed
Rule of thumb: Only index columns frequently used in WHERE, JOIN, ORDER BY clauses.
4. Monitor Index Usage
Track which indexes are actually used:
-- PostgreSQL query to check index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan as scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE schemaname = 'your_schema'
ORDER BY idx_scan ASC;
Drop unused indexes:
idx_scan = 0means index is never used- Consider removing if not used after sufficient time
5. Use EXPLAIN ANALYZE
Test if your index is being used:
EXPLAIN ANALYZE
SELECT * FROM users
WHERE status = 'active'
AND created_at > '2024-01-01';
Look for:
Index ScanorIndex Only Scan(good)Seq Scan(bad - index not used)
Common Patterns
Pattern 1: User Lookup by Email
{
"indexes": [
{
"name": "idx_users_email_lower",
"fields": ["email"],
"expression": "LOWER(email)",
"unique": true,
"method": "btree",
"comment": "Case-insensitive email lookup"
}
]
}
Query:
SELECT * FROM users WHERE LOWER(email) = LOWER('User@Example.com');
Pattern 2: Recent Active Records
{
"indexes": [
{
"name": "idx_posts_active_recent",
"fields": ["created_at"],
"where": "status = 'published'",
"method": "btree",
"comment": "Recently published posts"
}
]
}
Query:
SELECT * FROM posts
WHERE status = 'published'
ORDER BY created_at DESC
LIMIT 20;
Pattern 3: Multi-Tenant Data
{
"indexes": [
{
"name": "idx_data_tenant_created",
"fields": ["tenant_id", "created_at"],
"method": "btree",
"comment": "Tenant-scoped queries"
}
]
}
Query:
SELECT * FROM data
WHERE tenant_id = 123
ORDER BY created_at DESC;
Pattern 4: JSONB Search
{
"indexes": [
{
"name": "idx_users_settings_gin",
"fields": ["settings"],
"method": "gin",
"comment": "Search user settings"
}
]
}
Query:
SELECT * FROM users
WHERE settings @> '{"theme": "dark", "notifications": true}';
Validation and Errors
Valid Index Names
Index names must:
- Start with a letter or underscore
- Contain only letters, numbers, underscores
- Be unique within the table
- Follow PostgreSQL identifier rules
✅ Valid:
idx_users_emailidx_orders_status_createdidx_posts_published_at
❌ Invalid:
123_index(starts with number)idx-users-email(contains hyphen)idx users email(contains spaces)
Common Errors
Missing Required Fields:
{
"indexes": [
{
"fields": ["email"],
"method": "btree"
}
]
}
→ Error: Index definition missing 'name' field
Invalid Method:
{
"indexes": [
{
"name": "idx_test",
"fields": ["email"],
"method": "invalid"
}
]
}
→ Error: Index 'idx_test': method 'invalid' not supported. Valid methods: btree, hash, gin, gist, brin
Non-existent Field:
{
"indexes": [
{
"name": "idx_test",
"fields": ["nonexistent_field"],
"method": "btree"
}
]
}
→ Error: Index 'idx_test': field 'nonexistent_field' does not exist in table
Migration Handling
When updating schemas with index changes, the system automatically:
Adding Indexes
// Before
{"indexes": []}
// After
{
"indexes": [
{"name": "idx_new", "fields": ["email"], "method": "btree"}
]
}
→ Generates: CREATE INDEX idx_new ON table (email);
Removing Indexes
// Before
{
"indexes": [
{"name": "idx_old", "fields": ["email"], "method": "btree"}
]
}
// After
{"indexes": []}
→ Generates: DROP INDEX idx_old;
Modifying Indexes
Index modifications require drop + recreate:
// Change method from btree to hash
→ Generates:
DROP INDEX idx_email;
CREATE INDEX idx_email ON table USING HASH (email);
Index creation on large tables can take significant time and lock the table. Consider:
- Use
CREATE INDEX CONCURRENTLYfor production (requires manual SQL) - Create indexes during maintenance windows
- Monitor index build progress via
pg_stat_progress_create_index
Performance Tips
1. Index-Only Scans
Create covering indexes that include all queried columns:
{
"name": "idx_users_email_name",
"fields": ["email", "name"],
"comment": "Covering index for email + name queries"
}
Query:
-- Can be satisfied entirely from index (no table access)
SELECT email, name FROM users WHERE email = 'user@example.com';
2. Partial Indexes for Boolean Flags
Instead of indexing boolean columns directly:
// ❌ BAD: Low selectivity
{
"name": "idx_is_deleted",
"fields": ["is_deleted"]
}
// ✅ GOOD: Partial index
{
"name": "idx_not_deleted",
"fields": ["id"],
"where": "is_deleted = FALSE"
}
3. Expression Indexes for Computed Values
For frequently computed values:
{
"name": "idx_full_name",
"fields": ["first_name", "last_name"],
"expression": "first_name || ' ' || last_name",
"comment": "Index concatenated full name"
}
4. Reindex Periodically
B-tree indexes can become bloated over time. Rebuild them:
REINDEX INDEX idx_users_email;
REINDEX TABLE users;
Resources
What's Next?
- Schema Management: Complete schema reference
- Querying Data: Optimize queries with indexes
- Migrations: Evolve schemas safely