Build Position
This build is a real-data field-package processing system centered on Volve. It is not a toy schema exercise and it is not a reserve-booking engine.
It should be described explicitly as a user-directed proof-of-value / proof-of-concept built from open data, not as a production deployment and not as an integration into proprietary internal software.
The user defined the objective and kept narrowing the scope until the build had a concrete shape. The assistant implemented the mechanics needed to execute that scope.
The practical design rule was: preserve provenance, validate before normalization, canonicalize before scoring, and mark derived outputs honestly.
What Was Determined By Human Direction
These were not merely isolated feature requests. They defined the user-directed structure of the investigation and therefore count as human-directed build intent.
- Use real Volve data instead of a hypothetical demo schema.
- Validate all data before normalization.
- Support WAN access through MinIO and the app UI.
- Normalize all major data classes, not only high-priority subsets.
- Keep the workflow explicit and visible in the UI.
- Export outputs so the build can be inspected and defended.
- Explain clearly what is human, what is derived, and what is AI.
- Review IP and copyright exposure.
The user was also directing the build toward a harder question: whether open data could support a defendable payload-screening proof-of-concept, how difficult that would be, what could realistically be used, and what would still be missing before the result could become part of a production, monetizable application.
These requirements therefore fixed the build boundary: use open data, keep secret software out of scope, make the result inspectable by colleagues, and optimize for defendability rather than novelty claims.
They also surfaced several draft-project conclusions that were added back into the build structure:
- Spatial truth needs explicit CRS/ECEF discipline.
- Identity resolution across wells and packages is a first-order problem.
- “Data present” is not the same as “decision-grade.”
- Artifact-level canonicalization matters even when deep native decode is incomplete.
- Ranking and volumetrics must be presented as heuristic outputs, not false certainties.
What Was Determined By Assistant Engineering Decisions
The user set the goals. The assistant chose the implementation mechanics.
- One PostgreSQL cluster with schema separation instead of several independent databases.
- MinIO object storage as the WAN-safe field-package source.
- Phase-based pipeline: validate, normalize, enrich, score, estimate.
- Canonical-first architecture instead of scoring from raw archives.
- Coverage audit as an explicit package completeness ledger.
- Rule-based scoring and low/mid/high volumetrics instead of pretending to have reserve-grade certainty.
- Semantic entity projection from canonical records.
These are engineering choices, not user-authored domain facts. Another competent team could have built a comparable proof-of-concept from the same requirements. The value here is explicitness and traceability, not exclusivity.
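The phase-based pipeline discipline above (validate, then normalize, then enrich, score, and estimate) can be sketched as a minimal ordering guard. This is an illustrative sketch only; class and function names are hypothetical, not the actual implementation.

```python
# Hypothetical sketch of the phase-ordered pipeline guard.
# Enforces validate-before-normalize, canonicalize-before-score.
PHASES = ["validate", "normalize", "enrich", "score", "estimate"]


class PhaseOrderError(RuntimeError):
    """Raised when a phase is attempted out of canonical order."""


class Pipeline:
    def __init__(self):
        self.completed = []

    def run(self, phase, fn, *args, **kwargs):
        # A phase may only run once every earlier phase has completed.
        idx = PHASES.index(phase)
        if self.completed != PHASES[:idx]:
            raise PhaseOrderError(
                f"cannot run {phase!r}; completed so far: {self.completed}")
        result = fn(*args, **kwargs)
        self.completed.append(phase)
        return result
```

The point of the guard is that the ordering is explicit and testable, rather than implied by script call order.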
Architecture Schematic
- Source Packages: Volve files in local staging and MinIO
- Validation Gate: package inventory, readiness, parser debt, coverage
- Canonical Normalization: wells, tops, completions, production, seismic, WITSML, reports, artifacts
- Derived Layers: QC, pay events, bypassed candidates, reservoir bodies, penetrations
- Decision Outputs: reopening ranking and remaining-barrel estimates
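The validation gate's coverage audit can be illustrated as a simple per-package completeness ledger. Domain names and the function shape below are hypothetical illustrations of the idea, not the real audit code.

```python
# Hypothetical sketch of the coverage-audit ledger: for each package,
# record which expected data domains are present or missing.
EXPECTED_DOMAINS = ["wells", "tops", "completions", "production",
                    "seismic", "witsml", "reports"]


def coverage_ledger(packages):
    """packages: mapping of package name -> set of domains found in it."""
    ledger = []
    for name, found in packages.items():
        missing = [d for d in EXPECTED_DOMAINS if d not in found]
        ledger.append({
            "package": name,
            "present": sorted(found & set(EXPECTED_DOMAINS)),
            "missing": missing,
            "complete": not missing,
        })
    return ledger
```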
How The Database Is Used
| Schema | Purpose | Why It Exists |
|---|---|---|
| raw | Source bundle/object registry | Proves what entered the system and from where. |
| ops | Canonical operational truth | Holds typed domain records used by the application and analysis logic. |
| semantic | Ontology projection | Creates entity/relationship structure for later query and AI use. |
| audit | Traceability and evidence | Allows the build to be defended and reproduced. |
| core | Users, permissions, tenancy | Separates platform control state from subsurface domain state. |
The database is therefore not just storage. It is the structure that preserves provenance, reproducibility, and separation between source facts and derived outputs.
It is also how CRS normalization and ECEF/WGS84 transforms become defendable source-of-truth fields rather than hidden script behavior.
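The WGS84-to-ECEF transform itself is standard geodesy, which is what makes it a deterministic, defendable field rather than hidden script behavior. A minimal sketch using the published WGS84 ellipsoid constants (the function name is illustrative):

```python
import math

# Published WGS84 ellipsoid constants.
WGS84_A = 6378137.0                   # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)  # first eccentricity squared


def wgs84_to_ecef(lat_deg, lon_deg, h_m):
    """Convert geodetic WGS84 coordinates to ECEF X/Y/Z in metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z
```

Because the transform is closed-form, the stored ECEF fields can be re-derived and checked against the stored geodetic coordinates at any time.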
High-Level Schema Overview
| Layer | Main Tables | What They Mean |
|---|---|---|
| Canonical identity | ops.canonical_wells, ops.canonical_well_aliases, ops.canonical_well_locations | Stable well master, aliases, spatial anchors. |
| Static subsurface | ops.canonical_formation_tops, ops.canonical_completion_intervals, ops.canonical_structural_surfaces | Tops, perforations, and structural context. |
| Dynamic and technical | ops.canonical_production_records, ops.canonical_dynamic_reservoir_context, ops.canonical_technical_daily_reports, ops.canonical_witsml_* | Time-series production and technical operational context. |
| Logs and interpretation | ops.well_logs, ops.well_qc_cards, ops.well_interpretations, ops.well_pay_events, ops.well_bypassed_candidates | Log normalization, QC, pay intervals, and candidate logic. |
| Reservoir and seismic support | ops.canonical_seismic_*, ops.canonical_reservoir_bodies, ops.canonical_well_reservoir_penetrations, ops.canonical_reservoir_model_artifacts | Survey/model representation and reservoir-access logic. |
| Decision outputs | ops.well_reopening_targets, ops.remaining_barrel_estimates | Ranked decisions and low/mid/high barrel estimates. |
| Semantic projection | semantic.entities, semantic.entity_links, semantic.entity_source_links | Entity graph projected from canonical truth. |
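The semantic projection step can be sketched as a pure function from canonical rows to entities plus provenance links. Field names echo the schema overview above, but the code is an illustrative sketch, not the real projection logic.

```python
# Hypothetical sketch of projecting canonical well rows into the
# semantic entity graph while preserving source provenance.
def project_well_entities(canonical_wells):
    entities, source_links = [], []
    for row in canonical_wells:
        entity = {
            "entity_id": f"well:{row['canonical_well_id']}",
            "entity_type": "Well",
            "display_name": row["well_name"],
        }
        entities.append(entity)
        # Every projected entity keeps a pointer back to the canonical
        # record, so the graph stays defendable against ops truth.
        source_links.append({
            "entity_id": entity["entity_id"],
            "source_table": "ops.canonical_wells",
            "source_pk": row["canonical_well_id"],
        })
    return entities, source_links
```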
Human vs AI vs Derived Overview
| Class | Examples | How To Defend It |
|---|---|---|
| Human source data | production values, tops, completions, WITSML XML fields, technical reports | Directly extracted from source packages. |
| Human-directed scope | use Volve, use MinIO, process all packages, expose workflow, export results | Product intent came from the user. |
| Assistant-designed implementation | schema layout, migrations, workflow phases, scoring mechanics, coverage audit | Engineering design choices made to satisfy user goals. |
| Deterministic transform | WGS84/ECEF, counts, geometry envelopes, package inventories | Mechanically derived from source facts and used to create a defendable spatial source of truth. |
| Rule-based heuristic | scores, confidence labels, bypassed-pay candidates, remaining barrels | Authored rules and weights, not runtime opaque model inference. |
| Runtime AI/ML use | none found in canonicalization/scoring path | Current live outputs are deterministic or heuristic, not live LLM decisions. |
How We Get To The Conclusions
- Inventory packages and confirm what domains exist.
- Validate readiness and identify parser debt or partial coverage.
- Normalize source data into typed canonical tables.
- Link records across wells, logs, WITSML, reports, seismic, and models.
- Compute derived outputs such as QC, pay events, penetrations, and support scores.
- Compute decision-support outputs such as reopening targets and remaining barrels.
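The decision-support step can be sketched in two parts: an authored-weight scoring rule and a screening-grade tank-model volumetric. The rule names, weights, and recovery-factor range below are hypothetical; the real build authors its own rules, but the shape is the same, with explicit weights and no opaque model inference.

```python
# Hypothetical authored scoring rules: explicit names and weights.
RULES = {
    "has_bypassed_pay_candidate": 0.4,
    "completion_interval_open": 0.2,
    "recent_pressure_support": 0.25,
    "seismic_body_penetrated": 0.15,
}


def reopening_score(flags):
    """Sum authored weights for the rules a well satisfies (0..1)."""
    return sum(w for rule, w in RULES.items() if flags.get(rule))


def remaining_barrels(grv_m3, ntg, phi, sw, bo, recovery=(0.05, 0.10, 0.20)):
    """Screening-grade low/mid/high estimate from the standard tank
    formula STOOIP = GRV * NTG * porosity * (1 - Sw) / Bo, scaled by a
    low/mid/high recovery-factor range. Output in stock-tank barrels."""
    M3_TO_BBL = 6.2898
    stooip_bbl = grv_m3 * ntg * phi * (1.0 - sw) / bo * M3_TO_BBL
    return tuple(stooip_bbl * rf for rf in recovery)
```

Because every weight and factor is authored and visible, the outputs can be presented honestly as heuristics rather than booked reserves.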
The conclusion is therefore evidence-driven. It is not “AI said so.” It is “source data was normalized, linked, and then processed by authored deterministic and heuristic logic.”
The product-origin boundary is equally important: this is a user-directed POC executed with assistant-authored implementation, not a product concept that the assistant originated on its own.
IP And Copyright Boundary
| Area | Current Position | Remaining Gap |
|---|---|---|
| Repository code | Traceable and documented | No top-level LICENSE or rights statement yet. |
| Third-party libraries | Bundled assets identified | No central third-party notices file yet. |
| Dataset handling | Provenance preserved and excerpts tightened | License propagation into exports/downloads is not fully explicit. |
| AI authorship posture | Build provenance documented | No formal repo-level AI authorship policy yet. |
Defense Summary For Colleagues
Use this exact line of argument:
- The build is grounded in real source data, not synthetic examples.
- The database separates raw source records from canonical truth and from derived outputs.
- Source-authored values are preserved with provenance.
- Derived values are explicitly marked as deterministic or heuristic.
- The system currently does not rely on runtime LLM/ML inference for canonicalization or ranking.
- The product direction came from Johann F.R Wilhelm; the assistant supplied implementation detail.
- The current barrel outputs are screening-grade heuristics, not booked reserves.
- IP posture is improving, but still requires repo license, third-party notices, and formal AI authorship policy.
Supporting Artifacts
- Full Build Explanation And AI Decision Log
- IP And Copyright Compliance Review
- Operations Report Index
Source-side artifacts for deeper validation:
- docs/operations/BUILD_SCHEMA_PROVENANCE_AUDIT_20260330.json
- docs/operations/BUILD_SCHEMA_PROVENANCE_AUDIT_20260330.md
- docs/operations/BUILD_SCHEMA_PROVENANCE_AUDIT_20260330.csv