Database Schema Defense Report

Source: docs/architecture/DATABASE_SCHEMA_DEFENSE_REPORT_MANUAL.html

Manual Index Client UI

Database Schema Defense Report

This report explains the database as the central truth-and-provenance system behind the current Earthbond proof-of-concept.

It is written for colleague review: why the schemas exist, what a row means, why the tables are separate, and why the current design is defendable for open-data decision support.

0028Alembic version
69Live tables audited
40Canonical wells
16160Production rows

Why The Database Exists

The database was not added as generic storage. It exists to separate source evidence, canonical truth, derived interpretation, decision outputs, and audit traceability.

Without this structure, the build could not answer which source created a value, whether a value is direct or derived, what spatial basis was used, or why a well received a given score.

Schema Roles

Schema Purpose Why It Exists Audited Tables
core Platform control and governance state. Users, tenants, access control, and CRS reference policy should not be mixed with subsurface facts. 10
raw Source bundle and object registry. Preserves what entered the system before any canonical interpretation occurs. 3
ops Canonical operational and decision-support truth. This is where open-data packages become typed, queryable facts and explicit derived outputs. 45
semantic Entity and relationship projection. Supports later graph/query/AI usage without replacing canonical truth as the source of record. 6
audit Traceability, evidence, and integrity. Preserves how results were produced so they can be defended and replayed. 5

Major Table Families

Family Main Tables What One Row Means Why It Was Built This Way
Source registry raw.source_bundles, raw.source_objects One row proves a source bundle or source object exists and was staged. Without this layer, canonical rows would lose their source-of-origin chain.
Well identity and aliases ops.canonical_wells, ops.canonical_well_aliases, ops.canonical_well_locations, ops.canonical_well_source_links One row represents one canonical well, one alias mapping, one well location, or one source-link supporting that well identity. Identity resolution is a first-order problem across inconsistent source naming.
Static subsurface ops.canonical_formation_tops, ops.canonical_completion_intervals, ops.canonical_structural_surfaces One row represents one top, one completion interval, or one structural surface/support object. These are different physical truth classes and should not be merged into one generic well row.
Dynamic and technical history ops.canonical_production_records, ops.canonical_technical_daily_reports, ops.canonical_witsml_* One row represents one production record, one technical report, one trajectory, one BHA run, or one WITSML message/event. Time-series and operational history need their own row grain to remain auditable.
Logs and interpretation ops.well_logs, ops.well_qc_cards, ops.well_interpretations, ops.well_pay_events, ops.well_bypassed_candidates One row represents one log run, one QC card, one interpretation, one pay event, or one bypassed-pay candidate. Raw log presence, QC, and interpreted meaning are not the same thing and must be separated.
Seismic and reservoir support ops.canonical_seismic_*, ops.canonical_reservoir_model_artifacts, ops.canonical_reservoir_bodies, ops.canonical_well_reservoir_penetrations One row represents one survey/artifact, one model artifact, one reservoir body, or one well-to-body penetration. Package presence, derived geometry, and penetration logic must remain distinguishable.
Coverage and workflow ops.field_package_normalizations, ops.field_package_ingest_coverage One row represents one normalization run or one package coverage status. Data present is not the same as data canonically processed; the workflow state must be explicit.
Decision outputs ops.well_reopening_targets, ops.remaining_barrel_estimation_runs, ops.remaining_barrel_estimates One row represents one reopening target or one well-level barrel estimate. Heuristic outputs must remain separate from factual source records.
Semantic projection semantic.entities, semantic.entity_links, semantic.entity_source_links One row represents one semantic entity or one semantic link projected from canonical data. This allows later graph-style usage without turning the semantic layer into the source-of-truth layer.
Audit audit.event_ledger, audit.evidence_packs, audit.integrity_checks One row represents one event, evidence pack, or integrity check in the system trail. This is the replay and defense layer for internal review.

Why Rows Matter

A row is the smallest defendable unit of truth in this build.

  • One row in ops.canonical_wells means one canonical well identity.
  • One row in ops.canonical_production_records means one production time record.
  • One row in ops.well_pay_events means one interpreted pay interval.
  • One row in ops.well_reopening_targets means one ranked decision output for one well.
  • One row in audit.event_ledger means one auditable system event.

That row-level design is deliberate. It makes results inspectable and challengeable instead of collapsing them into opaque file-level summaries.

CRS And ECEF Truth

Spatial normalization is part of the truth model, not a side utility.

WGS84 and ECEF are used so wells, trajectories, tops, surfaces, and penetrations can be compared inside one defendable spatial frame. CRS ambiguity would otherwise undermine the entire decision layer.

Table-By-Table Walkthrough

Table Role Primary Origin Runtime Inference Total Rows Project Rows Columns Why It Exists
audit.event_ledger audit and integrity ledger operational_system deterministic_rule_derived 547 9 Audit tables are system-generated traceability and integrity records.
audit.event_ledger_202602 audit and integrity ledger operational_system deterministic_rule_derived 179 9 Audit tables are system-generated traceability and integrity records.
audit.event_ledger_202603 audit and integrity ledger operational_system deterministic_rule_derived 368 9 Audit tables are system-generated traceability and integrity records.
audit.evidence_packs audit and integrity ledger operational_system deterministic_rule_derived 34 8 Audit tables are system-generated traceability and integrity records.
audit.integrity_checks audit and integrity ledger operational_system deterministic_rule_derived 0 6 Audit tables are system-generated traceability and integrity records.
core.password_reset_tokens platform identity and policy operational_system none 6 10 Core platform tables are system/auth/policy state, not Volve domain data.
core.policy_sets platform identity and policy operational_system none 4 7 Core platform tables are system/auth/policy state, not Volve domain data.
core.reference_crs_profiles platform identity and policy operational_system none 2 6 Core platform tables are system/auth/policy state, not Volve domain data.
core.roles platform identity and policy operational_system none 0 2 Core platform tables are system/auth/policy state, not Volve domain data.
core.tenant_schema_registry platform identity and policy operational_system none 3 5 Core platform tables are system/auth/policy state, not Volve domain data.
core.tenants platform identity and policy operational_system none 3 5 Core platform tables are system/auth/policy state, not Volve domain data.
core.user_page_access platform identity and policy operational_system none 15 5 Core platform tables are system/auth/policy state, not Volve domain data.
core.user_project_access platform identity and policy operational_system none 0 0 5 Core platform tables are system/auth/policy state, not Volve domain data.
core.user_roles platform identity and policy operational_system none 0 4 Core platform tables are system/auth/policy state, not Volve domain data.
core.users platform identity and policy operational_system none 3 14 Core platform tables are system/auth/policy state, not Volve domain data.
ops.canonical_completion_intervals perforations and completion intervals source_human_dataset none 12 12 23 Loaded from perforation/completion source files; status fields remain source-authored.
ops.canonical_dynamic_reservoir_context reservoir-model support signals deterministic_rule_derived rule_based_heuristic 40 40 15 Built from Eclipse/RMS references and parsed model content. Support scores are heuristic, not ML.
ops.canonical_formation_tops well tops and picks source_human_dataset none 409 409 19 Loaded from well-picks files. Spatial/depth transforms are deterministic when present.
ops.canonical_log_artifacts full log-estate artifact inventory source_human_dataset deterministic_transform 5344 5344 20 Artifact facts come from log packages; curve/sample counts are deterministic parses.
ops.canonical_package_artifacts generic package artifact inventory source_human_dataset deterministic_transform 5483 5483 17 Generic artifact registry for packages not deeply semantically decoded yet.
ops.canonical_production_records daily/monthly production history source_human_dataset none 16160 16160 25 Directly normalized from workbook and production package records.
ops.canonical_report_documents report document registry source_human_dataset deterministic_transform 2 2 18 Document-level facts are canonicalized from reports packages. Any extracted text fields are source-derived and need copyright controls.
ops.canonical_reservoir_bodies derived reservoir bodies deterministic_rule_derived rule_based_heuristic 78 78 28 Bodies are derived from tops, RMS point surfaces, and Eclipse support. Approximate areas/volumes are heuristic approximations.
ops.canonical_reservoir_model_artifacts RMS/Eclipse artifact inventory source_human_dataset deterministic_transform 5454 5454 18 Artifact facts come from model packages; linkage and parse status are computed.
ops.canonical_seismic_artifacts seismic artifact inventory source_human_dataset deterministic_transform 98 98 23 Artifact facts come from packages and headers; line extents/counts are deterministic summaries.
ops.canonical_seismic_surveys seismic survey registry source_human_dataset deterministic_transform 14 14 12 Survey entities are canonicalized from seismic support packages; not full trace semantics.
ops.canonical_structural_surfaces structural metadata envelope source_human_dataset deterministic_transform 92 92 19 Surface extents and counts are deterministic summaries of human-authored horizon/fault files.
ops.canonical_technical_daily_reports technical drilling daily reports source_human_dataset deterministic_transform 1555 1555 29 Extracted from technical XML. Report content is human-authored source data.
ops.canonical_well_aliases well identity resolver aliases deterministic_rule_derived rule_based_heuristic 496 496 11 Aliases are generated by normalization rules and content-based identity resolution authored in code.
ops.canonical_well_locations canonical spatial anchors source_human_dataset deterministic_transform 34 34 20 Human-authored source coordinates are preserved and augmented with deterministic CRS/WGS84/ECEF transforms.
ops.canonical_well_reservoir_penetrations well-to-reservoir intersections deterministic_rule_derived rule_based_heuristic 78 78 23 Penetrations are derived from top/base intervals, COMPDAT, and spatial matching. Not full cell-level simulation geometry.
ops.canonical_well_source_links well/source lineage links deterministic_rule_derived rule_based_heuristic 1048 1048 13 Links are created by content parsing and alias resolution. This is not source-authored data.
ops.canonical_wells canonical well identity master source_human_dataset none 40 40 13 Directly normalized from structured source identifiers such as production workbooks and well interpretation files.
ops.canonical_witsml_bha_runs canonical WITSML BHA runs source_human_dataset deterministic_transform 15 15 20 Loaded from WITSML XML with deterministic parsing and well linkage.
ops.canonical_witsml_messages canonical WITSML messages source_human_dataset deterministic_transform 582 582 22 Message text is direct source content. Linkage confidence is deterministic rule output.
ops.canonical_witsml_support WITSML support rollup deterministic_rule_derived rule_based_heuristic 8 8 17 Aggregated per-well support counts derived from canonical WITSML records.
ops.canonical_witsml_trajectories canonical WITSML trajectories source_human_dataset deterministic_transform 14 14 24 Trajectory station counts and ranges are deterministic summaries from WITSML XML.
ops.canonical_witsml_wellbores canonical WITSML wellbores source_human_dataset deterministic_transform 7 7 20 Source facts come from WITSML XML; canonical well linkage is rule-based.
ops.dead_letter application data operational_system deterministic_rule_derived 0 6 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.field_package_ingest_coverage package-level coverage audit operational_system deterministic_rule_derived 18 18 19 System-generated ledger showing canonical status of each staged package.
ops.field_package_normalizations application data operational_system deterministic_rule_derived 35 35 11 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.job_attempts application data operational_system deterministic_rule_derived 24 7 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.job_results application data operational_system deterministic_rule_derived 264 5 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.jobs application data operational_system deterministic_rule_derived 335 7 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.jobs_202602 application data operational_system deterministic_rule_derived 103 7 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.jobs_202603 application data operational_system deterministic_rule_derived 232 7 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.pointcloud_grids application data operational_system deterministic_rule_derived 5 5 11 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.pointcloud_tiles application data operational_system deterministic_rule_derived 29 29 22 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.remaining_barrel_estimates remaining-barrel estimates deterministic_rule_derived rule_based_heuristic 11 11 42 Low/mid/high barrel outputs are heuristic volumetric estimates, not booked reserves.
ops.remaining_barrel_estimation_runs barrel-estimation run ledger operational_system rule_based_heuristic 1 1 13 Run metadata and methodology for remaining-barrel estimation.
ops.reopen_score_profiles application data operational_system deterministic_rule_derived 1 1 12 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.scraper_schedules application data operational_system deterministic_rule_derived 16 8 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.upload_sessions application data operational_system deterministic_rule_derived 480 480 11 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.well_bypassed_candidates bypassed-pay candidates deterministic_rule_derived rule_based_heuristic 265 265 22 Candidate classification and score are heuristic outputs from coded completion/pay rules.
ops.well_data_gaps application data operational_system deterministic_rule_derived 72 72 10 No explicit table override defined; review table-specific logic before treating as source-authored.
ops.well_interpretations petrophysical interpretation runs deterministic_rule_derived rule_based_heuristic 40 40 14 Interpretation outputs are computed by formulas/cutoffs configured in code. No runtime LLM or statistical model is used.
ops.well_logs normalized well-log run registry source_human_dataset deterministic_transform 56 56 29 Run-level metadata comes from log files; CRS and curve summaries are deterministic.
ops.well_pay_events pay interval events deterministic_rule_derived rule_based_heuristic 269 269 27 Intervals are derived from interpreted logs and thickness cutoffs.
ops.well_qc_cards log QC summaries deterministic_rule_derived rule_based_heuristic 56 56 11 QC metrics and ratings are computed from completeness/washout rules, not from an AI model.
ops.well_reopening_targets reopening candidate ranking deterministic_rule_derived rule_based_heuristic 40 40 34 Scores, classifications, and recommendations are heuristic outputs from authored formulas and weights. No runtime LLM/ML model is used.
raw.extracted_fields raw source intake source_human_dataset none 480 8 Raw intake tables preserve uploaded/source package facts and extraction traces.
raw.source_bundles raw source intake source_human_dataset none 4 4 11 Raw intake tables preserve uploaded/source package facts and extraction traces.
raw.source_objects raw source intake source_human_dataset none 32 32 19 Raw intake tables preserve uploaded/source package facts and extraction traces.
semantic.entities semantic entity projection deterministic_rule_derived deterministic_semantic_projection 40 40 13 Semantic layer is projected from canonical data using the designed ontology. No runtime LLM inference observed.
semantic.entity_links semantic relationship projection deterministic_rule_derived deterministic_semantic_projection 0 0 11 Relationship graph is derived from canonical data and schema rules.
semantic.entity_source_links semantic/source lineage deterministic_rule_derived deterministic_semantic_projection 0 10 Traceability from semantic entities back to raw sources.
semantic.entity_types semantic projection deterministic_rule_derived deterministic_semantic_projection 19 7 Semantic layer projects canonical records into ontology entities and relations.
semantic.query_profiles semantic projection deterministic_rule_derived deterministic_semantic_projection 2 9 Semantic layer projects canonical records into ontology entities and relations.
semantic.relation_types semantic projection deterministic_rule_derived deterministic_semantic_projection 18 8 Semantic layer projects canonical records into ontology entities and relations.

Defense Summary

  1. The DB is the mechanism that separates source facts from heuristic outputs.
  2. Tables are separated by object meaning, not by file convenience.
  3. Rows are kept at the smallest useful unit so conclusions can be audited.
  4. CRS and ECEF are explicit because spatial truth is part of the defendable result.
  5. The result is a truth-and-provenance system for open-data decision support, not just storage.