Full Build Explanation And AI Decision Log

Source: docs/architecture/FULL_BUILD_EXPLANATION_AND_AI_DECISION_LOG.md

Purpose

This document explains the current Earthbond build as it exists in this repository and live stack.

It is meant to answer four questions:

  1. what the platform currently does,
  2. which parts were directly requested by the user,
  3. where the assistant made engineering choices,
  4. what still needs to be built for deeper semantic processing.

This is a build-provenance document, not a marketing summary.

Authorship Record

For build-provenance purposes, this document records:

That means the build should be described as user-directed and assistant-implemented, not as a product concept that the assistant originated on its own.

Current Live State

As of 2026-03-30, the live demo_tenant Volve stack reports:

Coverage audit currently reports:

That means every staged package is represented in canonical tables and available to the application.

It does not mean every binary format has been fully semantically decoded down to native engineering meaning.
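The distinction between coverage and decode depth can be made concrete with a sketch of what a coverage audit computes. This is an illustrative set comparison, not the repository's actual audit code; the ids and field names are assumptions:

```python
def audit_coverage(staged_ids, canonical_ids):
    """Compare staged package ids against their canonical representation.

    "Covered" here means only that a staged package appears in canonical
    tables -- it says nothing about how deeply its native format was decoded.
    """
    staged = set(staged_ids)
    canonical = set(canonical_ids)
    covered = staged & canonical
    return {
        "covered": sorted(covered),
        "missing_from_canonical": sorted(staged - canonical),
        "orphaned_canonical": sorted(canonical - staged),
        "coverage_pct": 100.0 * len(covered) / len(staged) if staged else 100.0,
    }
```

A fully covered stack reports an empty `missing_from_canonical` list and 100% coverage, which is exactly the claim (and only the claim) the audit above makes.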

What Was Derived From The User

These requirements came directly from the user and should be treated as the primary product intent:

  1. build the system around real Volve data rather than hypothetical schemas,
  2. stage data for WAN-safe use through MinIO,
  3. validate before normalization,
  4. normalize all major data classes into canonical tables,
  5. rank reopening targets and estimate remaining barrels,
  6. keep the workflow visible in the UI,
  7. separate field-package workflow from legacy upload-session workflow,
  8. make outputs downloadable and documented,
  9. stop treating "minimum good enough" as the final standard,
  10. process all data, not only the highest-priority subsets.

Those requests are why the repository now contains field-package profiling, validation, phase-based normalization, canonical coverage audit, MinIO staging, scoring, volumetrics, and expanded artifact ingestion.

The user-directed brief was also broader than this list suggests.

It was an investigation into:

  1. whether open data alone could support a serious proof-of-concept,
  2. how difficult full canonicalization and cross-domain linkage would be in practice,
  3. what parts of the data would actually be useful for defendable payload-screening,
  4. what surfaced during development that changed the draft project structure,
  5. what would still separate this POC from a production-ready, monetizable application.

Several important findings came out of that investigation and were then folded into the build:

  1. provenance and package coverage needed to become first-class features rather than a background implementation detail,
  2. CRS/WGS84/ECEF normalization had to be treated as part of source truth,
  3. well identity and alias resolution became a core engineering problem,
  4. full artifact representation mattered even where full native semantic decode was not yet practical,
  5. ranking and barrel outputs had to be presented explicitly as heuristic decision-support layers.
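The CRS finding above can be illustrated concretely. A geodetic-to-ECEF conversion on the WGS84 ellipsoid is a textbook calculation; the sketch below uses the standard ellipsoid constants and is not tied to the repository's actual normalization code:

```python
import math

# WGS84 ellipsoid constants
A = 6378137.0             # semi-major axis (m)
F = 1 / 298.257223563     # flattening
E2 = F * (2 - F)          # first eccentricity squared

def wgs84_to_ecef(lat_deg, lon_deg, h_m):
    """Convert geodetic WGS84 coordinates to Earth-centered ECEF metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # prime vertical radius of curvature at this latitude
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - E2) + h_m) * math.sin(lat)
    return x, y, z
```

Treating this conversion as part of source truth means the canonical location row carries both the original CRS coordinates and the derived WGS84/ECEF values, rather than recomputing them downstream.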

Where The Assistant Made Engineering Choices

The assistant made implementation decisions in places where the user specified the goal but not the mechanism.

1. Storage model

Choice:

Reason:

2. Database layout

Choice:

Reason:

3. Canonical-first workflow

Choice:

Reason:

4. Phase-based ingestion

Choice:

Reason:

5. Coverage audit standard

Choice:

Reason:

6. Scoring and volumetric methods

Choice:

Reason:

Build Chronology By Capability

A. Foundation

Core schemas, auth, upload/session flow, raw registry, CRS, audit, and semantic spine were added first.

Main files:

B. Field-package profiling and validation

This layer inventories a real field package and decides what can be loaded.

Main files:

C. Canonical field-package normalization

Phase 1 canonicalized wells, aliases, locations, production, tops, completions, and structural context.
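Alias resolution of the kind Phase 1 required usually starts from a normalization key. The sketch below is illustrative only; the naming variants are typical NPD-style Volve identifiers, and the real repository logic may handle more cases:

```python
import re

def well_key(name):
    """Normalize a well identifier into a comparison key.

    Collapses case, whitespace, and separator differences so that variants
    such as '15_9-F-12', '15/9-F-12', and 'NO 15/9-F-12' resolve to the
    same key before alias rows are linked.
    """
    key = name.strip().upper()
    key = re.sub(r"^NO\s+", "", key)    # drop a leading country prefix
    key = key.replace("_", "/")         # unify quadrant/block separators
    key = re.sub(r"\s+", " ", key)      # collapse internal whitespace
    return key
```

Canonical well rows then store one key plus every observed alias, so production, tops, and completions from differently labelled sources attach to the same well.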

Main files:

D. Phase-2 log and reservoir support

This added interpreted log outputs, pay events, bypassed candidates, and dynamic reservoir support.

Main files:

E. Technical, WITSML, seismic, reservoir-body, and full artifact ingest

This expanded processing from selective domains into full package coverage.

Main files:

F. Scoring and remaining barrels

These layers convert canonical evidence into ranked candidates and low/mid/high barrel ranges.
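The low/mid/high ranges follow the standard volumetric form. A sketch, assuming a simple STOOIP-style calculation with three recovery-factor scenarios (the constants, defaults, and function name are illustrative, not the repository's actual method):

```python
def remaining_barrels(area_acres, net_pay_ft, porosity, sw, bo, produced_bbl,
                      rf_cases=(0.15, 0.30, 0.45)):
    """Estimate remaining recoverable barrels for low/mid/high recovery factors.

    Uses the standard oilfield volumetric identity
        STOOIP (stb) = 7758 * A * h * phi * (1 - Sw) / Bo
    where 7758 converts acre-feet to barrels. This is heuristic
    decision support, not a reserves booking.
    """
    stooip = 7758.0 * area_acres * net_pay_ft * porosity * (1.0 - sw) / bo
    return {case: max(stooip * rf - produced_bbl, 0.0)
            for case, rf in zip(("low", "mid", "high"), rf_cases)}
```

Keeping the recovery factors as explicit scenario inputs is what lets the output be presented as a heuristic layer rather than a single authoritative number.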

Main files:

G. Application and deployment path

The workflow is exposed through the data-plane API, client upload page, WAN proxy, and MinIO staging utilities.

Main files:

How To Read "Full Processing"

There are four different meanings of "fully processed." They should not be conflated.

1. Coverage complete

Every package is known, validated, staged, and represented in canonical tables.

Current state:

2. Canonical domain complete

The important structured facts from a domain are loaded into typed canonical tables.

Current state:

3. Semantic decode complete

The native meaning of every binary/text format is deeply extracted.

Current state:

Examples still not fully decoded:

4. Decision-complete

The system can answer the business question with strong confidence and low manual cleanup.

Current state:

What Was Computed Versus What Was Interpreted

Directly loaded from source data

Deterministically derived

Heuristic or model-based

Those heuristic layers are deliberately kept separate from raw and canonical factual layers.

What Still Needs To Be Built

If the target is "deep full semantic processing," these remain:

  1. full DLIS/LIS curve extraction and mnemonic/unit normalization,
  2. full SEG-Y header/sample decode into canonical seismic trace objects,
  3. deeper Eclipse parsing for schedules, vectors, and cell-level semantics,
  4. deeper RMS parsing for realization/property/cell semantics,
  5. structured parsing of technical HTML/PDF reports beyond XML daily reports,
  6. stronger well-to-compartment intersection logic using more model geometry and fewer proxies,
  7. explicit rights/attribution enforcement in download/export flows,
  8. formal repo licensing and third-party notice handling.
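For the SEG-Y item above, "full decode" ultimately means trace samples, but even the first step (reading the 400-byte binary file header that follows the 3200-byte textual header) is mechanical. A sketch against the SEG-Y rev 1 layout, using byte offsets from the standard rather than anything in this repository:

```python
import struct

def read_segy_binary_header(buf):
    """Parse a few well-known fields from a SEG-Y binary file header.

    `buf` is the whole file as bytes. The binary header occupies bytes
    3200..3599 of the file; values are big-endian per the standard.
    """
    hdr = buf[3200:3600]
    sample_interval_us, = struct.unpack(">h", hdr[16:18])  # file bytes 3217-3218
    samples_per_trace, = struct.unpack(">h", hdr[20:22])   # file bytes 3221-3222
    format_code, = struct.unpack(">h", hdr[24:26])         # file bytes 3225-3226
    return {
        "sample_interval_us": sample_interval_us,
        "samples_per_trace": samples_per_trace,
        "format_code": format_code,  # e.g. 1 = IBM float, 5 = IEEE float
    }
```

Full canonical trace objects would then require walking the per-trace 240-byte headers and decoding samples according to `format_code`, which is where the remaining work actually lies.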

Teaching Standard For Future Work

When adding new capabilities, use this order:

  1. prove the source exists,
  2. validate domain coverage,
  3. define the canonical target table,
  4. preserve raw provenance,
  5. separate deterministic facts from heuristics,
  6. expose the result in the UI and API,
  7. add coverage audit status,
  8. document legal/license boundaries.

If a new build step does not satisfy those eight points, it is not finished.
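One way to make that standard enforceable is a literal gate check. The gate names below mirror the eight points above; the function itself is a sketch, not actual repository code:

```python
# One gate per point in the teaching standard, in order.
GATES = (
    "source_proven",
    "domain_validated",
    "canonical_table_defined",
    "raw_provenance_preserved",
    "facts_separated_from_heuristics",
    "exposed_in_ui_and_api",
    "coverage_audit_status_added",
    "license_boundaries_documented",
)

def build_step_finished(status):
    """Return (finished, unmet_gates) for a dict mapping gate -> bool.

    A gate absent from `status` counts as unmet, so new capabilities
    fail closed rather than open.
    """
    unmet = [g for g in GATES if not status.get(g, False)]
    return (not unmet, unmet)
```

Failing closed on missing gates means a build step cannot be declared finished by simply omitting a checklist item.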

Bottom Line

This build is no longer just a schema exercise.

It is now:

What it is not yet: