FlowIndex

Architecture

System architecture of FlowIndex -- dual ingesters, worker pipeline, and database schema.


FlowIndex follows a dual ingester + deriver architecture designed for high-throughput blockchain indexing with near-real-time derived data.

System Overview

```mermaid
flowchart LR
  FAN["Flow Access Nodes"] --> FI["Forward Ingester"]
  FAN --> BI["Backward Ingester"]
  FI -->|raw blocks, txs, events| DB[("PostgreSQL")]
  BI -->|raw blocks, txs, events| DB
  DB --> LD["Live Derivers"]
  LD -->|derived data| DB
  DB --> HD["History Deriver"]
  HD -->|derived data| DB
  DB --> API["API Server"]
  API -->|REST + WebSocket| FE["Frontend UI"]
```

Dual Ingester Pattern

Two independent ingesters run concurrently, each optimized for its workload:

Forward Ingester (main_ingester)

Tracks the chain head in real time. Processes blocks in ascending order.

  1. Reads its checkpoint from the database
  2. Fetches the latest sealed block height from a Flow Access Node
  3. Fills a batch using a concurrent worker pool
  4. Performs reorg detection by verifying parent hash continuity
  5. Saves the batch atomically in a single database transaction
  6. Broadcasts new blocks and transactions via WebSocket
  7. Triggers the forward Live Deriver for near-real-time derived data
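The batch-range arithmetic behind steps 1-3 can be sketched as follows; `nextBatch` and the batch size are illustrative stand-ins, not FlowIndex's actual API:

```go
package main

import "fmt"

// nextBatch computes the inclusive height range the forward ingester should
// fetch next, given its last committed checkpoint and the latest sealed
// height reported by an Access Node. The batch size here is illustrative.
func nextBatch(checkpoint, latestSealed, batchSize uint64) (from, to uint64, ok bool) {
	if checkpoint >= latestSealed {
		return 0, 0, false // already caught up to the chain head
	}
	from = checkpoint + 1
	to = from + batchSize - 1
	if to > latestSealed {
		to = latestSealed // never fetch past the sealed head
	}
	return from, to, true
}

func main() {
	// Checkpoint at height 100, head at 1000, batch size 50:
	// the next batch covers heights 101 through 150.
	from, to, ok := nextBatch(100, 1000, 50)
	fmt.Println(from, to, ok) // 101 150 true
}
```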

Backward Ingester (history_ingester)

Backfills historical data in descending order, silently filling the chain from the present back to genesis.

  1. Walks backwards from its checkpoint
  2. Handles spork boundaries automatically -- when a node returns "not found", the ingester adjusts the floor to the spork root height
  3. Supports per-spork node configuration via FLOW_HISTORIC_ACCESS_NODES
  4. No WebSocket broadcasts (silent backfill)
  5. Triggers the history Live Deriver for immediate derived data
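Per-spork node selection (step 3) might look like this sketch; the `Spork` struct, the hostnames, and the root heights are all illustrative, with real values coming from FLOW_HISTORIC_ACCESS_NODES:

```go
package main

import "fmt"

// Spork pairs a spork root height with the access node configured to serve
// it (illustrative; real values come from FLOW_HISTORIC_ACCESS_NODES).
type Spork struct {
	RootHeight uint64
	AccessNode string
}

// nodeFor picks the access node responsible for a height. sporks must be
// sorted by descending RootHeight; the first spork whose root is at or
// below the height wins. When a node returns "not found" at a boundary,
// the backward ingester drops its floor to the spork root height and
// switches nodes the same way.
func nodeFor(sporks []Spork, height uint64) (Spork, bool) {
	for _, s := range sporks {
		if height >= s.RootHeight {
			return s, true
		}
	}
	return Spork{}, false
}

func main() {
	sporks := []Spork{
		{RootHeight: 2_000_000, AccessNode: "mainnet-current.example:9000"},
		{RootHeight: 0, AccessNode: "mainnet-historic.example:9000"},
	}
	// Heights below the current spork root are served by the historic node.
	s, _ := nodeFor(sporks, 1_500_000)
	fmt.Println(s.AccessNode)
}
```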

Data Flow

```mermaid
sequenceDiagram
  participant FI as Forward Ingester
  participant BI as Backward Ingester
  participant Raw as raw.* tables
  participant LD as Live Derivers
  participant App as app.* tables
  participant API as API Server

  FI->>Raw: Save blocks, txs, events
  FI->>LD: NotifyRange(from, to)
  BI->>Raw: Save blocks, txs, events
  BI->>LD: NotifyRange(from, to)
  LD->>Raw: Read raw data
  LD->>App: Write derived data
  API->>Raw: Query raw data
  API->>App: Query derived data
```

Derivers and Workers

Derivers transform raw blockchain data into queryable derived tables. They operate in two phases per chunk of blocks:

Phase 1 -- Independent Processors (parallel)

| Worker | Purpose | Output Tables |
| --- | --- | --- |
| token_worker | Parse FT/NFT events into transfers | ft_transfers, nft_transfers, ft_tokens, nft_collections |
| evm_worker | Decode EVM transactions from Flow events | evm_tx_hashes |
| tx_contracts_worker | Extract contract imports, tag transactions | tx_contracts, tx_tags |
| accounts_worker | Catalog accounts from creation events | accounts, coa_accounts |
| meta_worker | Build address-transaction index, extract keys | address_transactions, account_keys, smart_contracts |
| tx_metrics_worker | Compute per-transaction metrics | tx_metrics |
| staking_worker | Parse staking and epoch events | staking_events, staking_nodes, epoch_stats |
| defi_worker | Parse DEX swap events | defi_events, defi_pairs |

Phase 2 -- Token-Dependent Processors (parallel, after Phase 1)

| Worker | Purpose | Output Tables |
| --- | --- | --- |
| ft_holdings_worker | Update FT balances from transfers | ft_holdings |
| nft_ownership_worker | Update NFT ownership from transfers | nft_ownership |
| daily_balance_worker | Aggregate daily FT deltas | daily_balances |

Queue-Based Workers (independent)

These operate outside the deriver pipeline, finding their own work in derived tables:

| Worker | Purpose |
| --- | --- |
| nft_item_metadata_worker | Fetch per-NFT metadata via Cadence scripts |
| nft_ownership_reconciler | Verify NFT ownership against chain state |
| token_metadata_worker | Fetch on-chain FT/NFT collection metadata |
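The shape of such a worker can be sketched as a claim-and-process loop; `claim` and `handle` are illustrative stand-ins for the real table queries:

```go
package main

import "fmt"

// drainQueue claims up to batch pending items at a time and handles each
// one, stopping when no work remains. Queue-based workers such as
// nft_item_metadata_worker follow this shape, claiming rows from derived
// tables rather than being driven by the deriver pipeline.
func drainQueue(claim func(limit int) []uint64, handle func(id uint64), batch int) int {
	handled := 0
	for {
		ids := claim(batch)
		if len(ids) == 0 {
			return handled // queue empty; the worker idles until the next poll
		}
		for _, id := range ids {
			handle(id)
			handled++
		}
	}
}

func main() {
	pending := []uint64{1, 2, 3, 4, 5} // e.g. NFT ids missing metadata
	claim := func(limit int) []uint64 {
		if limit > len(pending) {
			limit = len(pending)
		}
		out := pending[:limit]
		pending = pending[limit:]
		return out
	}
	n := drainQueue(claim, func(id uint64) {}, 2)
	fmt.Println(n) // 5
}
```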

Live Deriver vs. History Deriver

| | Live Deriver | History Deriver |
| --- | --- | --- |
| Trigger | Ingester callback on each batch | Periodic background scan |
| Chunk size | 10 blocks (configurable) | 1,000 blocks (configurable) |
| Latency | Near-real-time (~1 second) | Background safety net |
| Instances | Two (forward + history) | One |
| Purpose | Primary processing path | Catch missed ranges |

The forward Live Deriver is triggered by the forward ingester after each batch commit, ensuring derived data is available within seconds of a block being sealed.

The history Live Deriver is triggered by the backward ingester, processing newly backfilled blocks immediately.

The History Deriver runs as a safety net, scanning for any raw blocks that were not yet processed by the derivers.
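The safety-net scan can be sketched in miniature; the real History Deriver presumably finds unprocessed ranges with SQL against the checkpoint tables, but the logic amounts to the same gap search:

```go
package main

import "fmt"

// findGaps returns the height ranges in [from, to] that have raw data but
// no derived data yet -- the ranges the History Deriver would pick up.
// Representing derived heights as an in-memory set is purely illustrative.
func findGaps(derived map[uint64]bool, from, to uint64) [][2]uint64 {
	var gaps [][2]uint64
	var start uint64
	open := false
	for h := from; h <= to; h++ {
		switch {
		case !derived[h] && !open:
			start, open = h, true // a gap begins
		case derived[h] && open:
			gaps = append(gaps, [2]uint64{start, h - 1}) // the gap ends
			open = false
		}
	}
	if open {
		gaps = append(gaps, [2]uint64{start, to})
	}
	return gaps
}

func main() {
	derived := map[uint64]bool{1: true, 2: true, 5: true}
	fmt.Println(findGaps(derived, 1, 6)) // [[3 4] [6 6]]
}
```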

Database Schema

The database uses a dual-layer design:

Raw Layer (raw.*)

Stores blockchain data exactly as received from Flow Access Nodes. Append-only and partitioned by block height.

| Table | Contents |
| --- | --- |
| raw.blocks | Block headers (height, ID, parent ID, timestamp, tx/event counts) |
| raw.transactions | Full transaction data (script, arguments, authorizers, status) |
| raw.events | All events emitted by transactions |
| raw.tx_lookup | Transaction ID to block height mapping |
| raw.block_lookup | Block ID to height mapping |
| raw.scripts | Deduplicated transaction scripts (script_hash to script_text) |

App Layer (app.*)

Stores worker-derived projections optimized for queries.

| Table | Contents |
| --- | --- |
| app.ft_transfers | Fungible token transfers (sender, receiver, amount, token) |
| app.nft_transfers | NFT transfers (sender, receiver, collection, NFT ID) |
| app.ft_holdings | Current FT balances per account per token |
| app.nft_ownership | Current NFT ownership |
| app.accounts | Known accounts with creation metadata |
| app.smart_contracts | Deployed contracts with code and metadata |
| app.account_keys | Flow account public keys |
| app.address_transactions | Address-to-transaction index for fast lookups |
| app.evm_tx_hashes | Cadence transaction to EVM hash mappings |
| app.staking_nodes | Staking node state and delegation info |
| app.defi_pairs | DEX trading pairs and liquidity pools |
| app.indexing_checkpoints | Resumability state for all workers |
| app.worker_leases | Lease-based concurrency control |

Partitioning Strategy

Raw tables are range-partitioned by block_height with partition sizes of 5-10 million rows. The lookup tables (tx_lookup, block_lookup) map transaction and block IDs to heights, so point lookups by ID can target a single partition instead of scanning every one.

```mermaid
flowchart TB
  P["raw.transactions"] --> P1["Partition 0 - 5M"]
  P --> P2["Partition 5M - 10M"]
  P --> P3["Partition 10M - 15M"]
  P --> P4["..."]
```
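Locating the partition for a given height is then simple arithmetic; this sketch uses a 5M-row size, matching the lower end of the range above (the production size may differ):

```go
package main

import "fmt"

// partitionBounds returns the [from, to) height range of the fixed-size
// range partition containing height.
func partitionBounds(height, size uint64) (from, to uint64) {
	from = (height / size) * size // round down to the partition boundary
	return from, from + size
}

func main() {
	// Height 7.5M falls in the second 5M-row partition, [5M, 10M).
	from, to := partitionBounds(7_500_000, 5_000_000)
	fmt.Println(from, to) // 5000000 10000000
}
```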

Resumability

All ingesters and workers track progress via app.indexing_checkpoints. On restart, each component resumes from its last committed checkpoint. Writes are idempotent (upsert-based), so retries are safe.
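The interplay of checkpoints and idempotent writes can be sketched as follows; an in-memory map stands in for the database, and the commit semantics are illustrative:

```go
package main

import "fmt"

// ingest processes heights in batches, upserting each row and persisting
// the checkpoint only after a batch "commits". A crash loses at most one
// checkpoint update, so a restart replays at most one batch -- and because
// the writes are upserts, the replay is harmless.
func ingest(store map[uint64]bool, checkpoint *uint64, head, batch uint64) {
	for *checkpoint < head {
		to := *checkpoint + batch
		if to > head {
			to = head
		}
		for h := *checkpoint + 1; h <= to; h++ {
			store[h] = true // upsert: re-writing the same height is safe
		}
		*checkpoint = to // commit point: checkpoint advances with the batch
	}
}

func main() {
	store := map[uint64]bool{}
	var cp uint64
	ingest(store, &cp, 10, 3)
	cp = 6                    // simulate a crash that lost the last update
	ingest(store, &cp, 10, 3) // restart replays 7-10 without duplicates
	fmt.Println(len(store), cp) // 10 10
}
```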

Workers use a lease mechanism (app.worker_leases) to prevent duplicate processing across concurrent instances. Failed leases are automatically retried up to 20 times before being flagged for manual intervention.

Reorg Handling

The forward ingester performs parent hash verification on each batch. If a chain reorganization is detected:

  1. Affected blocks are surgically deleted (not truncated)
  2. Worker checkpoints are clamped to the rollback height
  3. Worker leases overlapping the rollback range are deleted for re-derivation
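The parent-hash check that triggers this rollback can be sketched as a walk over the batch; the `Block` fields are an illustrative subset of raw.blocks:

```go
package main

import "fmt"

// Block holds the fields needed for continuity checks (illustrative).
type Block struct {
	Height   uint64
	ID       string
	ParentID string
}

// firstBroken walks an ascending batch and returns the first height whose
// ParentID does not match the previous block's ID -- the point at which the
// rollback in steps 1-3 would begin. prevID is the ID of the last block
// already committed to the database.
func firstBroken(prevID string, batch []Block) (uint64, bool) {
	for _, b := range batch {
		if b.ParentID != prevID {
			return b.Height, true
		}
		prevID = b.ID
	}
	return 0, false
}

func main() {
	batch := []Block{
		{Height: 11, ID: "b11", ParentID: "b10"},
		{Height: 12, ID: "b12x", ParentID: "b11-fork"}, // broken parent link
	}
	h, reorg := firstBroken("b10", batch)
	fmt.Println(h, reorg) // 12 true
}
```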
