Build Rationale, Schema, And IP Defense

Source: docs/architecture/BUILD_RATIONALE_SCHEMA_AND_IP_DEFENSE_MANUAL.html


Authorship And Rights Position

This documentation records Johann F.R Wilhelm as the author and creator of the project direction, problem framing, and intellectual build intent.

Assistant contributions in this repository are implementation assistance, documentation assistance, and engineering structure supplied within that user-directed frame.

This is an authorship and provenance statement for internal and colleague review. It does not replace formal legal registration or counsel review.

Build Position

This build is a real-data field-package processing system centered on Volve. It is not a toy schema exercise and it is not a reserve-booking engine.

It should be described explicitly as a user-directed proof-of-value / proof-of-concept built from open data, not as a production deployment and not as an integration into proprietary internal software.

The user defined the objective and kept narrowing the scope until the build had a concrete shape. The assistant implemented the mechanics needed to execute that scope.

The practical design rule was: preserve provenance, validate before normalization, canonicalize before scoring, and mark derived outputs honestly.

  • 40 canonical wells
  • 16,160 production records
  • 78 reservoir bodies
  • 18/18 packages canonical_loaded

What Was Determined By Human Direction

These were not merely isolated feature requests. They were the user-directed structure of the investigation and therefore count as human-directed build intent.

  • Use real Volve data instead of a hypothetical demo schema.
  • Validate all data before normalization.
  • Support WAN access through MinIO and the app UI.
  • Normalize all major data classes, not only high-priority subsets.
  • Keep the workflow explicit and visible in the UI.
  • Export outputs so the build can be inspected and defended.
  • Explain clearly what is human, what is derived, and what is AI.
  • Review IP and copyright exposure.

The user was also directing the build toward a harder question: whether open data could support a defendable payload-screening proof-of-concept, how difficult that would be, what could realistically be used, and what would still be missing before the result could become part of a production monetizable application.

These requirements therefore fixed the build boundary: use open data, keep secret software out of scope, make the result inspectable by colleagues, and optimize for defendability rather than novelty claims.

They also surfaced several draft-project conclusions that were added back into the build structure:

  • spatial truth needs explicit CRS/ECEF discipline,
  • identity resolution across wells and packages is a first-order problem,
  • “data present” is not the same as “decision-grade,”
  • artifact-level canonicalization matters even when deep native decode is incomplete,
  • ranking and volumetrics must be presented as heuristic outputs, not false certainties.
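
The CRS/ECEF discipline named above can be illustrated with the standard WGS84 geodetic-to-ECEF conversion. This is a generic sketch of the deterministic transform class the build relies on, not the repository's actual implementation; only the WGS84 ellipsoid constants are authoritative.

```python
import math

# WGS84 ellipsoid constants (standard published values)
WGS84_A = 6378137.0                    # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared

def wgs84_to_ecef(lat_deg: float, lon_deg: float, h_m: float = 0.0):
    """Convert geodetic WGS84 coordinates to ECEF metres (deterministic)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z
```

Because the transform is closed-form and deterministic, its outputs can be stored as auditable fields rather than recomputed ad hoc, which is the point of the "explicit CRS/ECEF discipline" rule.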

What Was Determined By Assistant Engineering Decisions

The user set the goals. The assistant chose the implementation mechanics.

  • One PostgreSQL cluster with schema separation instead of several independent databases.
  • MinIO object storage as the WAN-safe field-package source.
  • Phase-based pipeline: validate, normalize, enrich, score, estimate.
  • Canonical-first architecture instead of scoring from raw archives.
  • Coverage audit as an explicit package completeness ledger.
  • Rule-based scoring and low/mid/high volumetrics instead of pretending to have reserve-grade certainty.
  • Semantic entity projection from canonical records.

These are engineering choices, not user-authored domain facts. Another competent team could have built a comparable proof-of-concept from the same requirements. The value here is explicitness and traceability, not exclusivity.
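
The phase ordering listed above (validate, normalize, enrich, score, estimate) can be sketched as a simple ordered pipeline. Every function body, field name, and weight below is invented for illustration; only the phase names and their order come from the document.

```python
from typing import Callable

def validate(pkg: dict) -> dict:
    # Stand-in for the readiness/coverage gate
    pkg["validated"] = bool(pkg.get("files"))
    return pkg

def normalize(pkg: dict) -> dict:
    # Stand-in for canonical normalization of source files
    pkg["canonical"] = [f.lower() for f in pkg.get("files", [])]
    return pkg

def enrich(pkg: dict) -> dict:
    # Stand-in for cross-record linking
    pkg["links"] = len(pkg.get("canonical", []))
    return pkg

def score(pkg: dict) -> dict:
    # Stand-in for rule-based scoring (illustrative weight)
    pkg["score"] = pkg.get("links", 0) * 0.5
    return pkg

def estimate(pkg: dict) -> dict:
    # Outputs are always labelled heuristic, never reserve-grade
    pkg["estimate_grade"] = "heuristic"
    return pkg

PHASES: list[Callable[[dict], dict]] = [validate, normalize, enrich, score, estimate]

def run_pipeline(pkg: dict) -> dict:
    for phase in PHASES:
        pkg = phase(pkg)
    return pkg
```

The design value is that each phase leaves an inspectable marker on the record, so a reviewer can see which gate a package passed and which outputs are derived.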

Architecture Schematic

  1. Source Packages: Volve files in local staging and MinIO.
  2. Validation Gate: package inventory, readiness, parser debt, coverage.
  3. Canonical Normalization: wells, tops, completions, production, seismic, WITSML, reports, artifacts.
  4. Derived Layers: QC, pay events, bypassed candidates, reservoir bodies, penetrations.
  5. Decision Outputs: reopening ranking and remaining-barrel estimates.

Cross-cutting planes:

  • UI / WAN Access: upload page, manuals, admin workflow, MinIO prefix mode.
  • Data Plane: validation, normalization, export, scoring, coverage audit.
  • Storage: MinIO for package staging, PostgreSQL for canonical and audit state.

How The Database Is Used

Each row gives the schema, its purpose, and why it exists:

  • raw: source bundle/object registry. Proves what entered the system and from where.
  • ops: canonical operational truth. Holds typed domain records used by the application and analysis logic.
  • semantic: ontology projection. Creates entity/relationship structure for later query and AI use.
  • audit: traceability and evidence. Allows the build to be defended and reproduced.
  • core: users, permissions, tenancy. Separates platform control state from subsurface domain state.

The database is therefore not just storage. It is the structure that preserves provenance, reproducibility, and separation between source facts and derived outputs.

It is also how CRS normalization and ECEF/WGS84 transforms become defendable source-of-truth fields rather than hidden script behavior.
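
One way to make a transform a "source-of-truth field rather than hidden script behavior" is to store the derived value alongside an explicit derivation label and a provenance pointer. The record shape below is a hypothetical sketch; the actual columns of ops.canonical_well_locations are not asserted here.

```python
from dataclasses import dataclass, asdict

@dataclass
class WellLocationRecord:
    """Illustrative canonical-location row with explicit provenance fields."""
    well_id: str
    source_crs: str      # CRS as declared by the source package (human fact)
    lat_wgs84: float     # normalized coordinate (deterministic transform)
    lon_wgs84: float
    derivation: str      # e.g. "deterministic:wgs84_to_ecef" (method label)
    source_ref: str      # pointer back to the raw object that supplied it

def as_row(rec: WellLocationRecord) -> dict:
    """Flatten the record for insertion into a canonical table."""
    return asdict(rec)
```

Storing the method label in the row means a reviewer can distinguish a source-authored coordinate from a transformed one without reading pipeline code.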

High-Level Schema Overview

Each row gives the layer, its main tables, and what they mean:

  • Canonical identity: ops.canonical_wells, ops.canonical_well_aliases, ops.canonical_well_locations. Stable well master, aliases, spatial anchors.
  • Static subsurface: ops.canonical_formation_tops, ops.canonical_completion_intervals, ops.canonical_structural_surfaces. Tops, perforations, and structural context.
  • Dynamic and technical: ops.canonical_production_records, ops.canonical_dynamic_reservoir_context, ops.canonical_technical_daily_reports, ops.canonical_witsml_*. Time-series production and technical operational context.
  • Logs and interpretation: ops.well_logs, ops.well_qc_cards, ops.well_interpretations, ops.well_pay_events, ops.well_bypassed_candidates. Log normalization, QC, pay intervals, and candidate logic.
  • Reservoir and seismic support: ops.canonical_seismic_*, ops.canonical_reservoir_bodies, ops.canonical_well_reservoir_penetrations, ops.canonical_reservoir_model_artifacts. Survey/model representation and reservoir-access logic.
  • Decision outputs: ops.well_reopening_targets, ops.remaining_barrel_estimates. Ranked decisions and low/mid/high barrel estimates.
  • Semantic projection: semantic.entities, semantic.entity_links, semantic.entity_source_links. Entity graph projected from canonical truth.
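
The semantic projection layer can be sketched as a pure function from canonical rows to entity records. The field names below (entity_type, entity_key, label, source_table) are hypothetical, chosen only to illustrate the traceability link back to canonical truth.

```python
def project_entities(canonical_wells: list[dict]) -> list[dict]:
    """Project canonical well rows into a flat entity list.

    Illustrative sketch: real semantic.entities columns may differ.
    """
    entities = []
    for well in canonical_wells:
        entities.append({
            "entity_type": "well",
            "entity_key": well["well_id"],
            "label": well.get("name", well["well_id"]),
            # Back-pointer so every entity is traceable to its canonical source
            "source_table": "ops.canonical_wells",
        })
    return entities
```

Keeping the projection one-directional (canonical to semantic, never the reverse) preserves the rule that derived layers never overwrite source-of-truth records.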

Human vs AI vs Derived Overview

Each row gives the class, examples, and how to defend it:

  • Human source data: production values, tops, completions, WITSML XML fields, technical reports. Directly extracted from source packages.
  • Human-directed scope: use Volve, use MinIO, process all packages, expose workflow, export results. Product intent came from the user.
  • Assistant-designed implementation: schema layout, migrations, workflow phases, scoring mechanics, coverage audit. Engineering design choices made to satisfy user goals.
  • Deterministic transform: WGS84/ECEF, counts, geometry envelopes, package inventories. Mechanically derived from source facts and used to create a defendable spatial source of truth.
  • Rule-based heuristic: scores, confidence labels, bypassed-pay candidates, remaining barrels. Authored rules and weights, not runtime opaque model inference.
  • Runtime AI/ML use: none found in the canonicalization/scoring path. Current live outputs are deterministic or heuristic, not live LLM decisions.
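
A rule-based heuristic in the sense used above is just authored rules and weights, inspectable in full. The feature names, weights, and thresholds below are invented for illustration; the point is that the entire decision surface is readable, unlike opaque model inference.

```python
# Authored evidence rules with explicit weights (illustrative values only)
RULES = [
    ("has_untested_pay",  0.4),
    ("late_life_decline", 0.3),
    ("good_log_coverage", 0.2),
    ("seismic_support",   0.1),
]

def reopening_score(features: dict) -> float:
    """Weighted sum of boolean evidence flags; range 0.0 to 1.0."""
    return sum(weight for name, weight in RULES if features.get(name))

def confidence_label(score: float) -> str:
    """Map a score to a coarse confidence band (thresholds are authored)."""
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "mid"
    return "low"
```

Because every weight and threshold is a named constant, a colleague can audit, dispute, or re-weight the ranking without reverse-engineering anything.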

How We Get To The Conclusions

  1. Inventory packages and confirm what domains exist.
  2. Validate readiness and identify parser debt or partial coverage.
  3. Normalize source data into typed canonical tables.
  4. Link records across wells, logs, WITSML, reports, seismic, and models.
  5. Compute derived outputs such as QC, pay events, penetrations, and support scores.
  6. Compute decision-support outputs such as reopening targets and remaining barrels.
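
For the remaining-barrel step, a screening-grade low/mid/high estimate can be sketched with the standard volumetric original-oil-in-place formula. All input values and recovery factors below are invented, not Volve-derived; only the 7,758 barrels-per-acre-foot conversion and the formula shape are standard.

```python
BBL_PER_ACRE_FT = 7758.0  # stock-tank barrels per acre-foot (standard constant)

def ooip_bbl(area_acres: float, thickness_ft: float,
             porosity: float, sw: float, bo: float) -> float:
    """Original oil in place, stock-tank barrels (volumetric method)."""
    return BBL_PER_ACRE_FT * area_acres * thickness_ft * porosity * (1.0 - sw) / bo

def remaining_bbl(ooip: float, cum_prod_bbl: float,
                  rf_low: float = 0.15, rf_mid: float = 0.30,
                  rf_high: float = 0.45) -> dict:
    """Low/mid/high remaining barrels under assumed recovery factors.

    Heuristic screening output only; never a booked reserve figure.
    """
    return {
        case: max(ooip * rf - cum_prod_bbl, 0.0)
        for case, rf in (("low", rf_low), ("mid", rf_mid), ("high", rf_high))
    }
```

Presenting the result as a three-case range with named recovery-factor assumptions is what keeps the output honest as a heuristic rather than a false certainty.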

The conclusion is therefore evidence-driven. It is not “AI said so.” It is “source data was normalized, linked, and then processed by authored deterministic and heuristic logic.”

The product-origin boundary is equally important: this is a user-directed POC executed with assistant-authored implementation, not a product concept that originated independently from the assistant.

IP And Copyright Boundary

Each row gives the area, the current position, and the remaining gap:

  • Repository code: traceable and documented. No top-level LICENSE or rights statement yet.
  • Third-party libraries: bundled assets identified. No central third-party notices file yet.
  • Dataset handling: provenance preserved and excerpts tightened. License propagation into exports/downloads is not fully explicit.
  • AI authorship posture: build provenance documented. No formal repo-level AI authorship policy yet.

Defense Summary For Colleagues

Use this exact line of argument:

  1. The build is grounded in real source data, not synthetic examples.
  2. The database separates raw source records from canonical truth and from derived outputs.
  3. Source-authored values are preserved with provenance.
  4. Derived values are explicitly marked as deterministic or heuristic.
  5. The system currently does not rely on runtime LLM/ML inference for canonicalization or ranking.
  6. The product direction came from Johann F.R Wilhelm; the assistant supplied implementation detail.
  7. The current barrel outputs are screening-grade heuristics, not booked reserves.
  8. IP posture is improving, but still requires repo license, third-party notices, and formal AI authorship policy.

Supporting Artifacts

Source-side artifacts for deeper validation:

  • docs/operations/BUILD_SCHEMA_PROVENANCE_AUDIT_20260330.json
  • docs/operations/BUILD_SCHEMA_PROVENANCE_AUDIT_20260330.md
  • docs/operations/BUILD_SCHEMA_PROVENANCE_AUDIT_20260330.csv