System Boundary
This build should be described as a user-directed proof-of-value / proof-of-concept built on open data.
- It is not a production deployment.
- It is not a reserves-booking engine.
- It is not an integration into proprietary internal software.
- It is a database-backed normalization and decision-support system built to test whether defendable payload-screening can be achieved from open data alone.
Objective
The user objective was broader than a simple feature request. The build was intended to test whether open subsurface data could be turned into a defendable payload-screening system, how difficult that would be in practice, and what would have to exist before such a system could become part of a production, monetizable application.
The user-directed investigation asked the build to do several things at once:
- use open field data,
- normalize it into a source-of-truth data model,
- cross-reference wells, logs, production, technical context, WITSML, seismic, and models,
- produce payload-screening outputs that a technical team can inspect and challenge,
- show whether the approach can be defended technically rather than only demonstrated visually,
- surface what is still difficult, partial, or approximation-grade,
- and reveal what would be required to move from draft development into a product-grade application later.
This means the project should be described as an investigation into what can be used, what is worth canonicalizing, what signals are strong enough for decision support, and what additional governance or engineering would be needed before commercialization.
Several important facts surfaced during this investigation and were then folded back into the draft development project:
- open data alone is enough to build a serious canonical proof-of-concept if provenance is preserved,
- CRS and ECEF handling are central, not peripheral, to defendable spatial truth,
- cross-domain well identity resolution is one of the hardest and most important problems,
- package-level canonical coverage matters because “present” and “decision-ready” are not the same thing,
- full artifact representation is necessary even when full semantic decode is not yet complete,
- ranking and barrel estimates are useful only if their heuristic nature is stated explicitly.
The build therefore optimizes for provenance, reproducibility, explainability, and technical honesty rather than novelty claims.
End-To-End Decision Flow
- Open Source Data: Volve packages, reports, logs, models, seismic, WITSML
- Validation Gate: Inventory, readiness, parser debt, package coverage
- Canonical Load: Typed tables in PostgreSQL across wells, tops, production, technical, seismic, model, artifacts
- Derived Interpretation: QC, pay events, bypassed-pay, penetrations, support rows, geometry transforms
- Decision Outputs: Reopening ranking and low/mid/high remaining-barrel estimates
| Decision Step | What Happens | Why It Is Defendable |
|---|---|---|
| Validation | Confirms what is present, what is partial, and what is blocked. | Prevents silent assumptions and exposes missing coverage explicitly. |
| Canonicalization | Moves source data into typed domain tables. | Separates source-authored data from ad hoc script output. |
| Linking | Connects wells to aliases, logs, WITSML, production, reports, and models. | Cross-domain conclusions can be traced back to concrete supporting records. |
| Derived Interpretation | Computes QC, pay intervals, bypassed candidates, support rows, and penetrations. | Derived logic is explicit and inspectable rather than hidden. |
| Ranking | Scores wells relative to the loaded portfolio. | Outputs are broken into components instead of a single opaque number. |
| Barrel Estimate | Produces low/mid/high screening-grade ranges. | Communicates uncertainty honestly rather than claiming reserve-grade certainty. |
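The Ranking and Barrel Estimate steps can be illustrated with a minimal sketch. It is not the build's scoring logic: the component names, the weights, and the low/high factors are hypothetical placeholders chosen only to show how a score can stay decomposed and how a range can be labelled screening-grade.

```python
from dataclasses import dataclass

# Hypothetical component weights (assumption, not taken from the build).
WEIGHTS = {
    "pay_quality": 0.4,
    "bypassed_thickness": 0.3,
    "production_support": 0.2,
    "data_confidence": 0.1,
}


@dataclass(frozen=True)
class WellScore:
    well: str
    components: dict[str, float]  # each component normalised to 0..1

    def breakdown(self) -> dict[str, float]:
        """Weighted contribution of each component, kept visible for review."""
        return {name: WEIGHTS[name] * value for name, value in self.components.items()}

    @property
    def total(self) -> float:
        return sum(self.breakdown().values())


def rank(scores: list[WellScore]) -> list[tuple[int, WellScore]]:
    """Rank wells relative to the loaded portfolio, highest total first."""
    ordered = sorted(scores, key=lambda s: s.total, reverse=True)
    return list(enumerate(ordered, start=1))


@dataclass(frozen=True)
class BarrelRange:
    low: float
    mid: float
    high: float
    grade: str = "screening"  # this sketch never emits a reserve-grade label


def barrel_range(mid: float, low_factor: float = 0.5, high_factor: float = 1.5) -> BarrelRange:
    """Spread a mid estimate into low/mid/high using hypothetical factors."""
    if not 0.0 < low_factor <= 1.0 <= high_factor:
        raise ValueError("expected 0 < low_factor <= 1 <= high_factor")
    return BarrelRange(low=mid * low_factor, mid=mid, high=mid * high_factor)
```

Keeping `breakdown()` next to `total` means a reviewer can challenge any single component without reverse-engineering the final number.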
Why The Database Matters
The database is not just storage. It is the mechanism that makes the result defendable.
| Schema | Role | Decision Value |
|---|---|---|
| raw | Source bundles and objects | Proves what entered the system. |
| ops | Canonical operational truth | Provides the direct record base for decisions. |
| semantic | Entity projection | Supports graph-style query and future AI use without changing source truth. |
| audit | Traceability and evidence | Preserves how each result was created. |
| core | Users, access, tenancy | Separates operational governance from domain facts. |
The central defense is simple: the DB preserves source truth, derived truth, and decision outputs as separate layers.
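One way to picture the layer separation is a lineage walk from a decision output back to the source records that support it. The sketch below is an in-memory illustration only: the `Record` type, the layer labels, and the `derived_from` field are assumptions made for the example and do not describe the actual PostgreSQL tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    layer: str                          # "source", "derived", or "decision" (illustrative labels)
    derived_from: tuple[str, ...] = ()  # ids of the records this one was built from


def trace_to_source(record_id: str, store: dict[str, Record]) -> list[Record]:
    """Follow derived_from links and return every source-layer record behind record_id."""
    seen: set[str] = set()
    stack = [record_id]
    sources: list[Record] = []
    while stack:
        current = store[stack.pop()]
        if current.record_id in seen:
            continue  # a record can support several derived rows; visit it once
        seen.add(current.record_id)
        if current.layer == "source":
            sources.append(current)
        stack.extend(current.derived_from)
    # Sort so the result does not depend on traversal order.
    return sorted(sources, key=lambda r: r.record_id)
```

If a decision record traces back to no source-layer rows at all, that is a signal the output cannot yet be defended from the data.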
CRS And ECEF Source Of Truth
CRS normalization is not decoration in this build. It is part of the truth model.
- WGS84 coordinates are normalized into stable spatial fields.
- ECEF coordinates are computed deterministically.
- Transform chains are preserved so spatial assumptions can be inspected later.
This matters because wells, trajectories, surfaces, penetrations, and payload intervals cannot be defended if their spatial basis is hidden or inconsistent.
| Spatial Concern | How The Build Handles It | Why It Matters |
|---|---|---|
| CRS resolution | Tracks CRS and transform context instead of discarding it. | Prevents mixed-coordinate ambiguity. |
| WGS84 normalization | Creates a common geographic reference frame. | Allows cross-domain spatial joins. |
| ECEF conversion | Computes deterministic Cartesian coordinates. | Supports consistent geometry and distance logic. |
| Depth and spatial linkage | Connects tops, penetrations, trajectories, and intervals through canonical records. | Improves defendability of “where is the payload?” questions. |
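The ECEF conversion is deterministic because it is a closed-form function of latitude, longitude, ellipsoidal height, and the WGS84 ellipsoid constants. The sketch below uses the standard geodetic-to-ECEF formula. The function name is illustrative, and the build's own implementation may differ, for example in how height or vertical datum is handled.

```python
import math

# WGS84 ellipsoid constants (standard published values).
WGS84_A = 6378137.0                    # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lon_deg: float, height_m: float = 0.0) -> tuple[float, float, float]:
    """Convert WGS84 latitude/longitude (degrees) and ellipsoidal height (m) to ECEF X, Y, Z in metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    sin_lat = math.sin(lat)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * sin_lat * sin_lat)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + height_m) * sin_lat
    return x, y, z
```

The same inputs always give the same outputs, so storing the input CRS and transform chain alongside the ECEF values is enough for someone else to recompute and check them.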
Schema To Decision Map
| Decision Topic | Main Tables | What The Tables Prove |
|---|---|---|
| Which well is which? | ops.canonical_wells, ops.canonical_well_aliases, ops.canonical_well_source_links | Stable well identity and alias provenance. |
| Where is the well? | ops.canonical_well_locations | Spatial anchor for cross-domain joins. |
| What reservoir intervals are present? | ops.canonical_formation_tops, ops.canonical_structural_surfaces, ops.canonical_reservoir_bodies | Static subsurface context. |
| What was actually completed? | ops.canonical_completion_intervals | Completion/perforation context for bypassed-pay logic. |
| What happened over time? | ops.canonical_production_records, ops.canonical_dynamic_reservoir_context, ops.canonical_technical_daily_reports, ops.canonical_witsml_* | Dynamic and operational support. |
| What do the logs say? | ops.well_logs, ops.well_qc_cards, ops.well_interpretations, ops.well_pay_events, ops.well_bypassed_candidates | Petrophysical support and interval-level interpretation. |
| Which wells access which bodies? | ops.canonical_well_reservoir_penetrations | Reservoir-access evidence. |
| What is the ranked decision? | ops.well_reopening_targets | Explicit score components and rank. |
| What is the barrel range? | ops.remaining_barrel_estimates | Low/mid/high screening-grade payload estimate. |
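The "Which well is which?" row depends on mapping the differently spelled well names found across logs, production files, WITSML, and reports onto one canonical identity. The sketch below shows one possible shape for that step. The normalization rules, the function names, and the canonical ids are hypothetical and are not read from `ops.canonical_well_aliases`.

```python
import re
from typing import Optional


def normalize_alias(name: str) -> str:
    """Reduce a well name to a comparison key (assumed rules, for illustration only)."""
    key = name.strip().upper()
    key = re.sub(r"^(NO|WELL)\s+", "", key)   # drop a leading country or label prefix
    key = re.sub(r"[\s_/\-]+", "-", key)      # treat space, underscore, slash, hyphen alike
    return key


def build_alias_index(aliases: list[tuple[str, str]]) -> dict[str, str]:
    """Map normalized alias keys to canonical ids; refuse ambiguous aliases."""
    index: dict[str, str] = {}
    for canonical_id, alias in aliases:
        key = normalize_alias(alias)
        existing = index.setdefault(key, canonical_id)
        if existing != canonical_id:
            # Two canonical wells claiming the same alias needs human review.
            raise ValueError(f"alias {alias!r} maps to both {existing} and {canonical_id}")
    return index


def resolve(name: str, index: dict[str, str]) -> Optional[str]:
    """Return the canonical id for a name, or None when there is no match."""
    return index.get(normalize_alias(name))


# Illustrative aliases only; the ids "well-001" and "well-002" are made up.
index = build_alias_index([
    ("well-001", "15/9-F-12"),
    ("well-001", "NO 15/9-F-12"),
    ("well-002", "15/9-F-14"),
])
print(resolve("15_9-F-12", index))
```

Returning `None` for an unmatched name, and raising on a conflicting alias, keeps unresolved identity visible instead of silently attaching data to the wrong well.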
Human vs AI vs Derived
| Class | Examples | How To Describe It |
|---|---|---|
| Human-authored source data | production values, tops, completions, WITSML XML fields, technical reports, artifact metadata | Directly extracted from open source packages. |
| Human-directed build intent | use Volve, use MinIO, process all packages, export outputs, make the build defendable | Product direction came from Johann F.R Wilhelm. |
| Assistant-designed implementation | schema layout, migrations, workflow phases, scoring structure, coverage audit | Engineering mechanics supplied under user direction. |
| Deterministic transform | WGS84, ECEF, counts, geometry envelopes, canonical inventories | Mechanically derived and reproducible. |
| Rule-based heuristic | scores, confidence labels, pay candidates, remaining barrels | Authored logic, not hidden model inference. |
| Runtime AI/ML inference | none found in canonicalization/scoring path | Current live outputs are not generated by runtime LLM decision-making. |
IP And Copyright Position
| Area | Current Position | Defendable Statement |
|---|---|---|
| Project direction and build framing | User-originated | Project intent and narrowing decisions are attributed to Johann F.R Wilhelm. |
| Repository implementation | Mixed authorship | Code and docs include assistant-generated implementation under user direction. |
| Open dataset handling | Provenance preserved | Normalization does not erase source-license constraints. |
| Exports and reports | Improved but not fully closed | Metadata-first posture is stronger than reproducing source text, but legal/policy closure is still separate from engineering documentation. |
Colleague Defense Summary
- The build is grounded in real open data, not synthetic examples.
- The database separates source facts, canonical truth, and derived outputs.
- CRS and ECEF are explicit parts of the source-of-truth model.
- Cross-reference and ranking are built from inspectable schema and explicit logic.
- The runtime path is deterministic or rule-based, not live LLM decision-making.
- The result is defendable as a user-directed POC for decision support, not as a production reserves system.
Supporting Artifacts
- Build Rationale, Schema, And IP Defense
- Full Build Explanation And AI Decision Log
- IP And Copyright Compliance Review
- Well Reopening Scoring Schema And Worklist Design
- System Interaction Map
Schema provenance ledger:
- docs/operations/BUILD_SCHEMA_PROVENANCE_AUDIT_20260330.json
- docs/operations/BUILD_SCHEMA_PROVENANCE_AUDIT_20260330.md
- docs/operations/BUILD_SCHEMA_PROVENANCE_AUDIT_20260330.csv