Index Management Guide

Database indexes are critical for query performance. Taruvi's Data Service provides comprehensive index management for both flat_table and JSONB providers through Frictionless Table Schema extensions.

Overview

Indexes improve query performance by allowing the database to quickly locate rows without scanning the entire table. Without proper indexes, queries can become slow as your data grows.

When to Use Indexes

✅ Create indexes when:

Filtering frequently on specific columns (WHERE status = 'active')
Joining tables on foreign key columns
Sorting results by specific columns (ORDER BY created_at)
Searching for unique values
Querying JSONB fields frequently

❌ Avoid excessive indexes when:

Table has very few rows (< 1000)
Column values are not selective (e.g., boolean with 50/50 distribution)
Write performance is more critical than read performance
Storage space is limited

Index Definition Format

Indexes are defined in the indexes array within your Frictionless Table Schema:

{
  "fields": [
    {"name": "id", "type": "integer"},
    {"name": "email", "type": "string"},
    {"name": "status", "type": "string"},
    {"name": "created_at", "type": "datetime"}
  ],
  "primaryKey": ["id"],
  "indexes": [
    {
      "name": "idx_users_email",
      "fields": ["email"],
      "unique": false,
      "method": "btree",
      "comment": "Fast email lookup"
    }
  ]
}

Index Properties

Property	Type	Required	Description
`name`	string	✅ Yes	Unique index name (must follow PostgreSQL naming conventions)
`fields`	array	✅ Yes	Column names to index (or use `expression`)
`unique`	boolean	No	Create UNIQUE index (default: `false`)
`method`	string	No	Index type: `btree`, `hash`, `gin`, `gist`, `brin` (default: `btree`)
`expression`	string	No	SQL expression for expression indexes (e.g., `LOWER(email)`)
`where`	string	No	WHERE clause for partial indexes (e.g., `status = 'active'`)
`comment`	string	No	Human-readable description

Index Types

1. Simple Column Index

Index on a single column for fast equality and range queries:

{
  "indexes": [
    {
      "name": "idx_users_email",
      "fields": ["email"],
      "method": "btree",
      "comment": "Fast email lookup"
    }
  ]
}

Use cases:

WHERE email = 'user@example.com'
WHERE email LIKE 'user%'
ORDER BY email

2. Composite Index

Index on multiple columns for queries filtering by multiple fields:

{
  "indexes": [
    {
      "name": "idx_orders_status_created",
      "fields": ["status", "created_at"],
      "method": "btree",
      "comment": "Filter orders by status and date"
    }
  ]
}

Use cases:

WHERE status = 'active' AND created_at > '2024-01-01'
WHERE status = 'active' ORDER BY created_at DESC

Column Order Matters

For composite indexes, put the most selective column first. PostgreSQL can use a composite index for queries that filter only on the first column(s).

Example: (status, created_at) can be used for:

✅ WHERE status = 'active'
✅ WHERE status = 'active' AND created_at > '2024-01-01'
❌ WHERE created_at > '2024-01-01' (cannot use this index)

3. Unique Index

Enforce uniqueness constraint with an index:

{
  "indexes": [
    {
      "name": "idx_users_email_unique",
      "fields": ["email"],
      "unique": true,
      "method": "btree",
      "comment": "Ensure email uniqueness"
    }
  ]
}

Alternative: Use field-level unique constraint (automatically creates index):

{
  "fields": [
    {
      "name": "email",
      "type": "string",
      "constraints": {"unique": true}
    }
  ]
}

4. Expression Index

Index on computed expressions for case-insensitive searches or transformations:

{
  "indexes": [
    {
      "name": "idx_users_email_lower",
      "fields": ["email"],
      "expression": "LOWER(email)",
      "unique": true,
      "method": "btree",
      "comment": "Case-insensitive unique email"
    }
  ]
}

Use cases:

WHERE LOWER(email) = 'user@example.com' (case-insensitive search)
WHERE UPPER(name) = 'JOHN DOE'

Common expressions:

LOWER(column) - Case-insensitive text
UPPER(column) - Uppercase normalization
column::text - Type casting
extract(year from date_column) - Extract date parts

5. Partial Index

Index only a subset of rows to save space and improve performance:

{
  "indexes": [
    {
      "name": "idx_orders_active_created",
      "fields": ["created_at"],
      "where": "status = 'active'",
      "method": "btree",
      "comment": "Index only active orders"
    }
  ]
}

Use cases:

Frequently query only active/published records
Archive old data but only query recent data
Status-based filtering (e.g., only index non-deleted rows)

Benefits:

Smaller index size (faster updates, less storage)
Better cache utilization
Faster queries on filtered subset

6. GIN Index (for JSONB and Arrays)

Generalized Inverted Index for full-text search and JSONB queries:

{
  "indexes": [
    {
      "name": "idx_users_metadata_gin",
      "fields": ["metadata"],
      "method": "gin",
      "comment": "Fast JSONB queries"
    }
  ]
}

Use cases:

JSONB containment: WHERE metadata @> '{"key": "value"}'
Key existence: WHERE metadata ? 'key'
Array operations: WHERE tags @> ARRAY['python', 'django']

FlatTable Provider Indexes

For normalized tables with typed columns, indexes are created as standard PostgreSQL indexes.

Example: E-commerce Schema

{
  "fields": [
    {"name": "id", "type": "integer"},
    {"name": "email", "type": "string"},
    {"name": "status", "type": "string"},
    {"name": "category", "type": "string"},
    {"name": "price", "type": "number"},
    {"name": "created_at", "type": "datetime"}
  ],
  "primaryKey": ["id"],
  "indexes": [
    {
      "name": "idx_products_email_lower",
      "fields": ["email"],
      "expression": "LOWER(email)",
      "unique": true,
      "method": "btree",
      "comment": "Case-insensitive unique email"
    },
    {
      "name": "idx_products_category_price",
      "fields": ["category", "price"],
      "method": "btree",
      "comment": "Filter by category and price range"
    },
    {
      "name": "idx_products_active_created",
      "fields": ["created_at"],
      "where": "status = 'active'",
      "method": "btree",
      "comment": "Partial index for active products"
    }
  ]
}

Generated SQL:

-- Expression index
CREATE UNIQUE INDEX idx_products_email_lower
  ON products (LOWER(email));

-- Composite index
CREATE INDEX idx_products_category_price
  ON products (category, price);

-- Partial index
CREATE INDEX idx_products_active_created
  ON products (created_at)
  WHERE status = 'active';

JSONB Provider Indexes

For JSONB storage, all data is in a single data JSONB column. Indexes use expression syntax with automatic type casting.

JSONB Index Expressions

The system automatically generates correct JSONB expressions:

Field Type	Generated Expression	Example
`string`	`(data->>'field')`	Text extraction
`integer`	`((data->>'field')::integer)`	Cast to integer
`number`	`((data->>'field')::numeric)`	Cast to numeric
`boolean`	`((data->>'field')::boolean)`	Cast to boolean
`object`/`array` (GIN)	`(data->'field')`	Keep as JSONB

Example: JSONB Schema with Indexes

{
  "fields": [
    {"name": "email", "type": "string"},
    {"name": "status", "type": "string"},
    {"name": "age", "type": "integer"},
    {"name": "metadata", "type": "object"}
  ],
  "indexes": [
    {
      "name": "idx_jsonb_email_status",
      "fields": ["email", "status"],
      "method": "btree",
      "comment": "Composite JSONB index"
    },
    {
      "name": "idx_jsonb_age",
      "fields": ["age"],
      "method": "btree",
      "comment": "Integer field with casting"
    },
    {
      "name": "idx_jsonb_metadata_gin",
      "fields": ["metadata"],
      "method": "gin",
      "comment": "Full JSONB search"
    }
  ]
}

Generated SQL:

-- Composite index with text extraction
CREATE INDEX idx_jsonb_email_status
  ON users ((data->>'email'), (data->>'status'));

-- Integer field with type casting
CREATE INDEX idx_jsonb_age
  ON users (((data->>'age')::integer));

-- GIN index for JSONB queries
CREATE INDEX idx_jsonb_metadata_gin
  ON users USING GIN ((data->'metadata'));

JSONB Query Examples

With the indexes above, these queries are optimized:

-- Uses idx_jsonb_email_status
SELECT * FROM users
WHERE data->>'email' = 'user@example.com'
  AND data->>'status' = 'active';

-- Uses idx_jsonb_age
SELECT * FROM users
WHERE (data->>'age')::integer > 21
ORDER BY (data->>'age')::integer;

-- Uses idx_jsonb_metadata_gin
SELECT * FROM users
WHERE data->'metadata' @> '{"plan": "premium"}';

Automatic Indexes

Taruvi automatically creates indexes for:

1. Primary Keys

{
  "primaryKey": ["id"]
}

→ Automatic index: PRIMARY KEY (id)

2. Unique Constraints

{
  "fields": [
    {
      "name": "email",
      "type": "string",
      "constraints": {"unique": true}
    }
  ]
}

→ Automatic index: UNIQUE (email)

3. Foreign Keys (FlatTable)

{
  "foreignKeys": [
    {
      "fields": ["user_id"],
      "reference": {"resource": "users", "fields": ["id"]}
    }
  ]
}

→ Automatic index: INDEX (user_id)

4. Foreign Keys (JSONB)

For JSONB provider, foreign key fields get expression indexes:

{
  "fields": [
    {"name": "user_id", "type": "integer"}
  ],
  "foreignKeys": [
    {
      "fields": ["user_id"],
      "reference": {"resource": "users", "fields": ["id"]}
    }
  ]
}

→ Automatic index: INDEX ((data->>'user_id')::integer)

5. GIN Index on JSONB Column

For JSONB provider, a GIN index is automatically created on the data column:

CREATE INDEX {table_name}_data_gin_idx
  ON {table_name} USING GIN (data);

This enables fast containment queries: WHERE data @> '{"status": "active"}'

Index Methods

PostgreSQL supports different index types optimized for different query patterns:

B-tree (Default)

Best for: Equality, range queries, sorting

{"method": "btree"}

Query types:

=, <, >, <=, >=
BETWEEN
IN
ORDER BY
MIN, MAX

Use cases: Most general-purpose indexing

Hash

Best for: Equality comparisons only

{"method": "hash"}

Query types:

= only

Use cases: Large equality-only lookups (rare, B-tree usually better)

GIN (Generalized Inverted Index)

Best for: Full-text search, JSONB, arrays

{"method": "gin"}

Query types:

@> (contains)
? (key exists)
?& (all keys exist)
?| (any key exists)
@@ (full-text search)

Use cases:

JSONB queries
Array operations
Full-text search (with tsvector)

GiST (Generalized Search Tree)

Best for: Geometric data, ranges

{"method": "gist"}

Query types:

Geometric operators
Range overlaps
Nearest-neighbor searches

Use cases:

PostGIS geospatial queries
IP address ranges
Date/time ranges

BRIN (Block Range Index)

Best for: Very large tables with natural ordering

{"method": "brin"}

Query types:

Range queries on naturally ordered data

Use cases:

Timestamp columns on append-only tables
Sequential data (IoT sensors, logs)

Benefits: Very small index size, fast updates

Best Practices

1. Index Selectivity

Index columns with high selectivity (many distinct values):

✅ Good candidates:

Email addresses (unique)
User IDs (unique)
Timestamps (high cardinality)
Status with many values

❌ Poor candidates:

Boolean columns (only 2 values)
Gender (few distinct values)
is_deleted (mostly FALSE)

Exception: Use partial indexes for low-selectivity columns:

{
  "name": "idx_active_orders",
  "fields": ["id"],
  "where": "status = 'active'",
  "comment": "Only 10% of orders are active"
}

2. Composite Index Column Order

Order columns by selectivity (most selective first):

// ✅ GOOD: email is highly selective
{
  "name": "idx_email_status",
  "fields": ["email", "status"]
}

// ❌ BAD: status has low selectivity
{
  "name": "idx_status_email",
  "fields": ["status", "email"]
}

3. Avoid Over-Indexing

Each index has costs:

Storage: Indexes consume disk space
Write performance: INSERT/UPDATE/DELETE slower
Memory: More indexes = more cache needed

Rule of thumb: Only index columns frequently used in WHERE, JOIN, ORDER BY clauses.

4. Monitor Index Usage

Track which indexes are actually used:

-- PostgreSQL query to check index usage
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan as scans,
  idx_tup_read as tuples_read,
  idx_tup_fetch as tuples_fetched,
  pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE schemaname = 'your_schema'
ORDER BY idx_scan ASC;

Drop unused indexes:

idx_scan = 0 means index is never used
Consider removing if not used after sufficient time

5. Use EXPLAIN ANALYZE

Test if your index is being used:

EXPLAIN ANALYZE
SELECT * FROM users
WHERE status = 'active'
  AND created_at > '2024-01-01';

Look for:

Index Scan or Index Only Scan (good)
Seq Scan (bad - index not used)

Common Patterns

Pattern 1: User Lookup by Email

{
  "indexes": [
    {
      "name": "idx_users_email_lower",
      "fields": ["email"],
      "expression": "LOWER(email)",
      "unique": true,
      "method": "btree",
      "comment": "Case-insensitive email lookup"
    }
  ]
}

Query:

SELECT * FROM users WHERE LOWER(email) = LOWER('User@Example.com');

Pattern 2: Recent Active Records

{
  "indexes": [
    {
      "name": "idx_posts_active_recent",
      "fields": ["created_at"],
      "where": "status = 'published'",
      "method": "btree",
      "comment": "Recently published posts"
    }
  ]
}

Query:

SELECT * FROM posts
WHERE status = 'published'
ORDER BY created_at DESC
LIMIT 20;

Pattern 3: Multi-Tenant Data

{
  "indexes": [
    {
      "name": "idx_data_tenant_created",
      "fields": ["tenant_id", "created_at"],
      "method": "btree",
      "comment": "Tenant-scoped queries"
    }
  ]
}

Query:

SELECT * FROM data
WHERE tenant_id = 123
ORDER BY created_at DESC;

Pattern 4: JSONB Search

{
  "indexes": [
    {
      "name": "idx_users_settings_gin",
      "fields": ["settings"],
      "method": "gin",
      "comment": "Search user settings"
    }
  ]
}

Query:

SELECT * FROM users
WHERE settings @> '{"theme": "dark", "notifications": true}';

Validation and Errors

Valid Index Names

Index names must:

Start with a letter or underscore
Contain only letters, numbers, underscores
Be unique within the table
Follow PostgreSQL identifier rules

✅ Valid:

idx_users_email
idx_orders_status_created
idx_posts_published_at

❌ Invalid:

123_index (starts with number)
idx-users-email (contains hyphen)
idx users email (contains spaces)

Common Errors

Missing Required Fields:

{
  "indexes": [
    {
      "fields": ["email"],
      "method": "btree"
    }
  ]
}

→ Error: Index definition missing 'name' field

Invalid Method:

{
  "indexes": [
    {
      "name": "idx_test",
      "fields": ["email"],
      "method": "invalid"
    }
  ]
}

→ Error: Index 'idx_test': method 'invalid' not supported. Valid methods: btree, hash, gin, gist, brin

Non-existent Field:

{
  "indexes": [
    {
      "name": "idx_test",
      "fields": ["nonexistent_field"],
      "method": "btree"
    }
  ]
}

→ Error: Index 'idx_test': field 'nonexistent_field' does not exist in table

Migration Handling

When updating schemas with index changes, the system automatically:

Adding Indexes

// Before
{"indexes": []}

// After
{
  "indexes": [
    {"name": "idx_new", "fields": ["email"], "method": "btree"}
  ]
}

→ Generates: CREATE INDEX idx_new ON table (email);

Removing Indexes

// Before
{
  "indexes": [
    {"name": "idx_old", "fields": ["email"], "method": "btree"}
  ]
}

// After
{"indexes": []}

→ Generates: DROP INDEX idx_old;

Modifying Indexes

Index modifications require drop + recreate:

// Change method from btree to hash

→ Generates:

DROP INDEX idx_email;
CREATE INDEX idx_email ON table USING HASH (email);

Production Considerations

Index creation on large tables can take significant time and lock the table. Consider:

Use CREATE INDEX CONCURRENTLY for production (requires manual SQL)
Create indexes during maintenance windows
Monitor index build progress via pg_stat_progress_create_index

Performance Tips

1. Index-Only Scans

Create covering indexes that include all queried columns:

{
  "name": "idx_users_email_name",
  "fields": ["email", "name"],
  "comment": "Covering index for email + name queries"
}

Query:

-- Can be satisfied entirely from index (no table access)
SELECT email, name FROM users WHERE email = 'user@example.com';

2. Partial Indexes for Boolean Flags

Instead of indexing boolean columns directly:

// ❌ BAD: Low selectivity
{
  "name": "idx_is_deleted",
  "fields": ["is_deleted"]
}

// ✅ GOOD: Partial index
{
  "name": "idx_not_deleted",
  "fields": ["id"],
  "where": "is_deleted = FALSE"
}

3. Expression Indexes for Computed Values

For frequently computed values:

{
  "name": "idx_full_name",
  "fields": ["first_name", "last_name"],
  "expression": "first_name || ' ' || last_name",
  "comment": "Index concatenated full name"
}

4. Reindex Periodically

B-tree indexes can become bloated over time. Rebuild them:

REINDEX INDEX idx_users_email;
REINDEX TABLE users;

Resources

What's Next?

Schema Management: Complete schema reference
Querying Data: Optimize queries with indexes
Migrations: Evolve schemas safely

Overview​

When to Use Indexes​

Index Definition Format​

Index Properties​

Index Types​

1. Simple Column Index​

2. Composite Index​

3. Unique Index​

4. Expression Index​

5. Partial Index​

6. GIN Index (for JSONB and Arrays)​

FlatTable Provider Indexes​

Example: E-commerce Schema​

JSONB Provider Indexes​

JSONB Index Expressions​

Example: JSONB Schema with Indexes​

JSONB Query Examples​

Automatic Indexes​

1. Primary Keys​

2. Unique Constraints​

3. Foreign Keys (FlatTable)​

4. Foreign Keys (JSONB)​

5. GIN Index on JSONB Column​

Index Methods​

B-tree (Default)​

Hash​

GIN (Generalized Inverted Index)​

GiST (Generalized Search Tree)​

BRIN (Block Range Index)​

Best Practices​

1. Index Selectivity​

2. Composite Index Column Order​

3. Avoid Over-Indexing​

4. Monitor Index Usage​

5. Use EXPLAIN ANALYZE​

Common Patterns​

Pattern 1: User Lookup by Email​

Pattern 2: Recent Active Records​

Pattern 3: Multi-Tenant Data​

Pattern 4: JSONB Search​

Validation and Errors​

Valid Index Names​

Common Errors​

Migration Handling​

Adding Indexes​

Removing Indexes​

Modifying Indexes​

Performance Tips​

1. Index-Only Scans​

2. Partial Indexes for Boolean Flags​

3. Expression Indexes for Computed Values​

4. Reindex Periodically​

Resources​

What's Next?​

Overview

When to Use Indexes

Index Definition Format

Index Properties

Index Types

1. Simple Column Index

2. Composite Index

3. Unique Index

4. Expression Index

5. Partial Index

6. GIN Index (for JSONB and Arrays)

FlatTable Provider Indexes

Example: E-commerce Schema

JSONB Provider Indexes

JSONB Index Expressions

Example: JSONB Schema with Indexes

JSONB Query Examples

Automatic Indexes

1. Primary Keys

2. Unique Constraints

3. Foreign Keys (FlatTable)

4. Foreign Keys (JSONB)

5. GIN Index on JSONB Column

Index Methods

B-tree (Default)

Hash

GIN (Generalized Inverted Index)

GiST (Generalized Search Tree)

BRIN (Block Range Index)

Best Practices

1. Index Selectivity

2. Composite Index Column Order

3. Avoid Over-Indexing

4. Monitor Index Usage

5. Use EXPLAIN ANALYZE

Common Patterns

Pattern 1: User Lookup by Email

Pattern 2: Recent Active Records

Pattern 3: Multi-Tenant Data

Pattern 4: JSONB Search

Validation and Errors

Valid Index Names

Common Errors

Migration Handling

Adding Indexes

Removing Indexes

Modifying Indexes

Performance Tips

1. Index-Only Scans

2. Partial Indexes for Boolean Flags

3. Expression Indexes for Computed Values

4. Reindex Periodically

Resources

What's Next?