FlowIndex

Architecture

System architecture of FlowIndex -- dual ingesters, worker pipeline, and database schema.


FlowIndex follows a dual ingester + deriver architecture designed for high-throughput blockchain indexing with near-real-time derived data.

System Overview

```mermaid
flowchart LR
  FAN["Flow Access Nodes"] --> FI["Forward Ingester"]
  FAN --> BI["Backward Ingester"]
  FI -->|raw blocks, txs, events| DB[("PostgreSQL")]
  BI -->|raw blocks, txs, events| DB
  DB --> LD["Live Derivers"]
  LD -->|derived data| DB
  DB --> HD["History Deriver"]
  HD -->|derived data| DB
  DB --> API["API Server"]
  API -->|REST + WebSocket| FE["Frontend UI"]
```

Dual Ingester Pattern

Two independent ingesters run concurrently, each optimized for its workload:

Forward Ingester (main_ingester)

Tracks the chain head in real time. Processes blocks in ascending order.

  1. Reads its checkpoint from the database
  2. Fetches the latest sealed block height from a Flow Access Node
  3. Fills a batch using a concurrent worker pool
  4. Performs reorg detection by verifying parent hash continuity
  5. Saves the batch atomically in a single database transaction
  6. Broadcasts new blocks and transactions via WebSocket
  7. Triggers the forward Live Deriver for near-real-time derived data
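The batch-range arithmetic behind steps 1-3 can be sketched as follows; `nextBatch` and the batch size are illustrative stand-ins, not FlowIndex's actual API:

```go
package main

import "fmt"

// nextBatch computes the inclusive height range the forward ingester should
// fetch next, given its last committed checkpoint and the latest sealed
// height reported by an Access Node. The batch size here is illustrative.
func nextBatch(checkpoint, latestSealed, batchSize uint64) (from, to uint64, ok bool) {
	if checkpoint >= latestSealed {
		return 0, 0, false // already caught up to the chain head
	}
	from = checkpoint + 1
	to = from + batchSize - 1
	if to > latestSealed {
		to = latestSealed // never fetch past the sealed head
	}
	return from, to, true
}

func main() {
	// Checkpoint at height 100, head at 1000, batch size 50:
	// the next batch covers heights 101 through 150.
	from, to, ok := nextBatch(100, 1000, 50)
	fmt.Println(from, to, ok) // 101 150 true
}
```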

Backward Ingester (history_ingester)

Backfills historical data in descending order, silently filling the chain from the present back to genesis.

  1. Walks backwards from its checkpoint
  2. Handles spork boundaries automatically -- when a node returns "not found", the ingester adjusts the floor to the spork root height
  3. Supports per-spork node configuration via FLOW_HISTORIC_ACCESS_NODES
  4. No WebSocket broadcasts (silent backfill)
  5. Triggers the history Live Deriver for immediate derived data
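Per-spork node selection (step 3) might look like this sketch; the `Spork` struct, the hostnames, and the root heights are all illustrative, with real values coming from FLOW_HISTORIC_ACCESS_NODES:

```go
package main

import "fmt"

// Spork pairs a spork root height with the access node configured to serve
// it (illustrative; real values come from FLOW_HISTORIC_ACCESS_NODES).
type Spork struct {
	RootHeight uint64
	AccessNode string
}

// nodeFor picks the access node responsible for a height. sporks must be
// sorted by descending RootHeight; the first spork whose root is at or
// below the height wins. When a node returns "not found" at a boundary,
// the backward ingester drops its floor to the spork root height and
// switches nodes the same way.
func nodeFor(sporks []Spork, height uint64) (Spork, bool) {
	for _, s := range sporks {
		if height >= s.RootHeight {
			return s, true
		}
	}
	return Spork{}, false
}

func main() {
	sporks := []Spork{
		{RootHeight: 2_000_000, AccessNode: "mainnet-current.example:9000"},
		{RootHeight: 0, AccessNode: "mainnet-historic.example:9000"},
	}
	// Heights below the current spork root are served by the historic node.
	s, _ := nodeFor(sporks, 1_500_000)
	fmt.Println(s.AccessNode)
}
```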

Data Flow

```mermaid
sequenceDiagram
  participant FI as Forward Ingester
  participant BI as Backward Ingester
  participant Raw as raw.* tables
  participant LD as Live Derivers
  participant App as app.* tables
  participant API as API Server

  FI->>Raw: Save blocks, txs, events
  FI->>LD: NotifyRange(from, to)
  BI->>Raw: Save blocks, txs, events
  BI->>LD: NotifyRange(from, to)
  LD->>Raw: Read raw data
  LD->>App: Write derived data
  API->>Raw: Query raw data
  API->>App: Query derived data
```

Derivers and Workers

Derivers transform raw blockchain data into queryable derived tables. They operate in two phases per chunk of blocks:

Phase 1 -- Independent Processors (parallel)

| Worker | Purpose | Output Tables |
| --- | --- | --- |
| token_worker | Parse FT/NFT events into transfers | ft_transfers, nft_transfers, ft_tokens, nft_collections |
| evm_worker | Decode EVM transactions from Flow events | evm_tx_hashes |
| tx_contracts_worker | Extract contract imports, tag transactions | tx_contracts, tx_tags |
| accounts_worker | Catalog accounts from creation events | accounts, coa_accounts |
| meta_worker | Build address-transaction index, extract keys | address_transactions, account_keys, smart_contracts |
| tx_metrics_worker | Compute per-transaction metrics | tx_metrics |
| staking_worker | Parse staking and epoch events | staking_events, staking_nodes, epoch_stats |
| defi_worker | Parse DEX swap events | defi_events, defi_pairs |

Phase 2 -- Token-Dependent Processors (parallel, after Phase 1)

| Worker | Purpose | Output Tables |
| --- | --- | --- |
| ft_holdings_worker | Update FT balances from transfers | ft_holdings |
| nft_ownership_worker | Update NFT ownership from transfers | nft_ownership |
| daily_balance_worker | Aggregate daily FT deltas | daily_balances |

Queue-Based Workers (independent)

These operate outside the deriver pipeline, finding their own work in derived tables:

| Worker | Purpose |
| --- | --- |
| nft_item_metadata_worker | Fetch per-NFT metadata via Cadence scripts |
| nft_ownership_reconciler | Verify NFT ownership against chain state |
| token_metadata_worker | Fetch on-chain FT/NFT collection metadata |
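The shape of such a worker can be sketched as a claim-and-process loop; `claim` and `handle` are illustrative stand-ins for the real table queries:

```go
package main

import "fmt"

// drainQueue claims up to batch pending items at a time and handles each
// one, stopping when no work remains. Queue-based workers such as
// nft_item_metadata_worker follow this shape, claiming rows from derived
// tables rather than being driven by the deriver pipeline.
func drainQueue(claim func(limit int) []uint64, handle func(id uint64), batch int) int {
	handled := 0
	for {
		ids := claim(batch)
		if len(ids) == 0 {
			return handled // queue empty; the worker idles until the next poll
		}
		for _, id := range ids {
			handle(id)
			handled++
		}
	}
}

func main() {
	pending := []uint64{1, 2, 3, 4, 5} // e.g. NFT ids missing metadata
	claim := func(limit int) []uint64 {
		if limit > len(pending) {
			limit = len(pending)
		}
		out := pending[:limit]
		pending = pending[limit:]
		return out
	}
	n := drainQueue(claim, func(id uint64) {}, 2)
	fmt.Println(n) // 5
}
```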

Live Deriver vs. History Deriver

| | Live Deriver | History Deriver |
| --- | --- | --- |
| Trigger | Ingester callback on each batch | Periodic background scan |
| Chunk size | 10 blocks (configurable) | 1,000 blocks (configurable) |
| Latency | Near-real-time (~1 second) | Background safety net |
| Instances | Two (forward + history) | One |
| Purpose | Primary processing path | Catch missed ranges |

The forward Live Deriver is triggered by the forward ingester after each batch commit, ensuring derived data is available within seconds of a block being sealed.

The history Live Deriver is triggered by the backward ingester, processing newly backfilled blocks immediately.

The History Deriver runs as a safety net, scanning for any raw blocks that were not yet processed by the derivers.
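The safety-net scan can be sketched in miniature; the real History Deriver presumably finds unprocessed ranges with SQL against the checkpoint tables, but the logic amounts to the same gap search:

```go
package main

import "fmt"

// findGaps returns the height ranges in [from, to] that have raw data but
// no derived data yet -- the ranges the History Deriver would pick up.
// Representing derived heights as an in-memory set is purely illustrative.
func findGaps(derived map[uint64]bool, from, to uint64) [][2]uint64 {
	var gaps [][2]uint64
	var start uint64
	open := false
	for h := from; h <= to; h++ {
		switch {
		case !derived[h] && !open:
			start, open = h, true // a gap begins
		case derived[h] && open:
			gaps = append(gaps, [2]uint64{start, h - 1}) // the gap ends
			open = false
		}
	}
	if open {
		gaps = append(gaps, [2]uint64{start, to})
	}
	return gaps
}

func main() {
	derived := map[uint64]bool{1: true, 2: true, 5: true}
	fmt.Println(findGaps(derived, 1, 6)) // [[3 4] [6 6]]
}
```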

Database Schema

The database uses a dual-layer design:

Raw Layer (raw.*)

Stores blockchain data exactly as received from Flow Access Nodes. Append-only and partitioned by block height.

| Table | Contents |
| --- | --- |
| raw.blocks | Block headers (height, ID, parent ID, timestamp, tx/event counts) |
| raw.transactions | Full transaction data (script, arguments, authorizers, status) |
| raw.events | All events emitted by transactions |
| raw.tx_lookup | Transaction ID to block height mapping |
| raw.block_lookup | Block ID to height mapping |
| raw.scripts | Deduplicated transaction scripts (script_hash to script_text) |

App Layer (app.*)

Stores worker-derived projections optimized for queries.

| Table | Contents |
| --- | --- |
| app.ft_transfers | Fungible token transfers (sender, receiver, amount, token) |
| app.nft_transfers | NFT transfers (sender, receiver, collection, NFT ID) |
| app.ft_holdings | Current FT balances per account per token |
| app.nft_ownership | Current NFT ownership |
| app.accounts | Known accounts with creation metadata |
| app.smart_contracts | Deployed contracts with code and metadata |
| app.account_keys | Flow account public keys |
| app.address_transactions | Address-to-transaction index for fast lookups |
| app.evm_tx_hashes | Cadence transaction to EVM hash mappings |
| app.staking_nodes | Staking node state and delegation info |
| app.defi_pairs | DEX trading pairs and liquidity pools |
| app.indexing_checkpoints | Resumability state for all workers |
| app.worker_leases | Lease-based concurrency control |

Partitioning Strategy

Raw tables are range-partitioned by block_height with partition sizes of 5-10 million rows. The lookup tables (tx_lookup, block_lookup) map transaction and block IDs to heights, so point lookups by ID can target a single partition instead of scanning every one.

```mermaid
flowchart TB
  P["raw.transactions"] --> P1["Partition 0 - 5M"]
  P --> P2["Partition 5M - 10M"]
  P --> P3["Partition 10M - 15M"]
  P --> P4["..."]
```
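Locating the partition for a given height is then simple arithmetic; this sketch uses a 5M-row size, matching the lower end of the range above (the production size may differ):

```go
package main

import "fmt"

// partitionBounds returns the [from, to) height range of the fixed-size
// range partition containing height.
func partitionBounds(height, size uint64) (from, to uint64) {
	from = (height / size) * size // round down to the partition boundary
	return from, from + size
}

func main() {
	// Height 7.5M falls in the second 5M-row partition, [5M, 10M).
	from, to := partitionBounds(7_500_000, 5_000_000)
	fmt.Println(from, to) // 5000000 10000000
}
```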

Resumability

All ingesters and workers track progress via app.indexing_checkpoints. On restart, each component resumes from its last committed checkpoint. Writes are idempotent (upsert-based), so retries are safe.
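The interplay of checkpoints and idempotent writes can be sketched as follows; an in-memory map stands in for the database, and the commit semantics are illustrative:

```go
package main

import "fmt"

// ingest processes heights in batches, upserting each row and persisting
// the checkpoint only after a batch "commits". A crash loses at most one
// checkpoint update, so a restart replays at most one batch -- and because
// the writes are upserts, the replay is harmless.
func ingest(store map[uint64]bool, checkpoint *uint64, head, batch uint64) {
	for *checkpoint < head {
		to := *checkpoint + batch
		if to > head {
			to = head
		}
		for h := *checkpoint + 1; h <= to; h++ {
			store[h] = true // upsert: re-writing the same height is safe
		}
		*checkpoint = to // commit point: checkpoint advances with the batch
	}
}

func main() {
	store := map[uint64]bool{}
	var cp uint64
	ingest(store, &cp, 10, 3)
	cp = 6                    // simulate a crash that lost the last update
	ingest(store, &cp, 10, 3) // restart replays 7-10 without duplicates
	fmt.Println(len(store), cp) // 10 10
}
```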

Workers use a lease mechanism (app.worker_leases) to prevent duplicate processing across concurrent instances. Failed leases are automatically retried up to 20 times before being flagged for manual intervention.

Reorg Handling

The forward ingester performs parent hash verification on each batch. If a chain reorganization is detected:

  1. Affected blocks are surgically deleted (not truncated)
  2. Worker checkpoints are clamped to the rollback height
  3. Worker leases overlapping the rollback range are deleted for re-derivation
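The parent-hash check that triggers this rollback can be sketched as a walk over the batch; the `Block` fields are an illustrative subset of raw.blocks:

```go
package main

import "fmt"

// Block holds the fields needed for continuity checks (illustrative).
type Block struct {
	Height   uint64
	ID       string
	ParentID string
}

// firstBroken walks an ascending batch and returns the first height whose
// ParentID does not match the previous block's ID -- the point at which the
// rollback in steps 1-3 would begin. prevID is the ID of the last block
// already committed to the database.
func firstBroken(prevID string, batch []Block) (uint64, bool) {
	for _, b := range batch {
		if b.ParentID != prevID {
			return b.Height, true
		}
		prevID = b.ID
	}
	return 0, false
}

func main() {
	batch := []Block{
		{Height: 11, ID: "b11", ParentID: "b10"},
		{Height: 12, ID: "b12x", ParentID: "b11-fork"}, // broken parent link
	}
	h, reorg := firstBroken("b10", batch)
	fmt.Println(h, reorg) // 12 true
}
```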
