Database Schema Defense Report
This report explains the database as the central truth-and-provenance system behind the current Earthbond proof-of-concept.
It is written for colleague review: why the schemas exist, what a row means, why the tables are separate, and why the current design is defendable for open-data decision support.
Why The Database Exists
The database was not added as generic storage. It exists to separate source evidence, canonical truth, derived interpretation, decision outputs, and audit traceability.
Without this structure, the build could not answer which source created a value, whether a value is direct or derived, what spatial basis was used, or why a well received a given score.
Schema Roles
| Schema | Purpose | Why It Exists | Audited Tables |
|---|---|---|---|
core |
Platform control and governance state. | Users, tenants, access control, and CRS reference policy should not be mixed with subsurface facts. | 10 |
raw |
Source bundle and object registry. | Preserves what entered the system before any canonical interpretation occurs. | 3 |
ops |
Canonical operational and decision-support truth. | This is where open-data packages become typed, queryable facts and explicit derived outputs. | 45 |
semantic |
Entity and relationship projection. | Supports later graph/query/AI usage without replacing canonical truth as the source of record. | 6 |
audit |
Traceability, evidence, and integrity. | Preserves how results were produced so they can be defended and replayed. | 5 |
Major Table Families
| Family | Main Tables | What One Row Means | Why It Was Built This Way |
|---|---|---|---|
| Source registry | raw.source_bundles, raw.source_objects | One row proves a source bundle or source object exists and was staged. | Without this layer, canonical rows would lose their source-of-origin chain. |
| Well identity and aliases | ops.canonical_wells, ops.canonical_well_aliases, ops.canonical_well_locations, ops.canonical_well_source_links | One row represents one canonical well, one alias mapping, one well location, or one source-link supporting that well identity. | Identity resolution is a first-order problem across inconsistent source naming. |
| Static subsurface | ops.canonical_formation_tops, ops.canonical_completion_intervals, ops.canonical_structural_surfaces | One row represents one top, one completion interval, or one structural surface/support object. | These are different physical truth classes and should not be merged into one generic well row. |
| Dynamic and technical history | ops.canonical_production_records, ops.canonical_technical_daily_reports, ops.canonical_witsml_* | One row represents one production record, one technical report, one trajectory, one BHA run, or one WITSML message/event. | Time-series and operational history need their own row grain to remain auditable. |
| Logs and interpretation | ops.well_logs, ops.well_qc_cards, ops.well_interpretations, ops.well_pay_events, ops.well_bypassed_candidates | One row represents one log run, one QC card, one interpretation, one pay event, or one bypassed-pay candidate. | Raw log presence, QC, and interpreted meaning are not the same thing and must be separated. |
| Seismic and reservoir support | ops.canonical_seismic_*, ops.canonical_reservoir_model_artifacts, ops.canonical_reservoir_bodies, ops.canonical_well_reservoir_penetrations | One row represents one survey/artifact, one model artifact, one reservoir body, or one well-to-body penetration. | Package presence, derived geometry, and penetration logic must remain distinguishable. |
| Coverage and workflow | ops.field_package_normalizations, ops.field_package_ingest_coverage | One row represents one normalization run or one package coverage status. | Data present is not the same as data canonically processed; the workflow state must be explicit. |
| Decision outputs | ops.well_reopening_targets, ops.remaining_barrel_estimation_runs, ops.remaining_barrel_estimates | One row represents one reopening target or one well-level barrel estimate. | Heuristic outputs must remain separate from factual source records. |
| Semantic projection | semantic.entities, semantic.entity_links, semantic.entity_source_links | One row represents one semantic entity or one semantic link projected from canonical data. | This allows later graph-style usage without turning the semantic layer into the source-of-truth layer. |
| Audit | audit.event_ledger, audit.evidence_packs, audit.integrity_checks | One row represents one event, evidence pack, or integrity check in the system trail. | This is the replay and defense layer for internal review. |
Why Rows Matter
A row is the smallest defendable unit of truth in this build.
- One row in
ops.canonical_wellsmeans one canonical well identity. - One row in
ops.canonical_production_recordsmeans one production time record. - One row in
ops.well_pay_eventsmeans one interpreted pay interval. - One row in
ops.well_reopening_targetsmeans one ranked decision output for one well. - One row in
audit.event_ledgermeans one auditable system event.
That row-level design is deliberate. It makes results inspectable and challengeable instead of collapsing them into opaque file-level summaries.
CRS And ECEF Truth
Spatial normalization is part of the truth model, not a side utility.
WGS84 and ECEF are used so wells, trajectories, tops, surfaces, and penetrations can be compared inside one defendable spatial frame. CRS ambiguity would otherwise undermine the entire decision layer.
Table-By-Table Walkthrough
| Table | Role | Primary Origin | Runtime Inference | Total Rows | Project Rows | Columns | Why It Exists |
|---|---|---|---|---|---|---|---|
audit.event_ledger |
audit and integrity ledger | operational_system |
deterministic_rule_derived |
547 | 9 | Audit tables are system-generated traceability and integrity records. | |
audit.event_ledger_202602 |
audit and integrity ledger | operational_system |
deterministic_rule_derived |
179 | 9 | Audit tables are system-generated traceability and integrity records. | |
audit.event_ledger_202603 |
audit and integrity ledger | operational_system |
deterministic_rule_derived |
368 | 9 | Audit tables are system-generated traceability and integrity records. | |
audit.evidence_packs |
audit and integrity ledger | operational_system |
deterministic_rule_derived |
34 | 8 | Audit tables are system-generated traceability and integrity records. | |
audit.integrity_checks |
audit and integrity ledger | operational_system |
deterministic_rule_derived |
0 | 6 | Audit tables are system-generated traceability and integrity records. | |
core.password_reset_tokens |
platform identity and policy | operational_system |
none |
6 | 10 | Core platform tables are system/auth/policy state, not Volve domain data. | |
core.policy_sets |
platform identity and policy | operational_system |
none |
4 | 7 | Core platform tables are system/auth/policy state, not Volve domain data. | |
core.reference_crs_profiles |
platform identity and policy | operational_system |
none |
2 | 6 | Core platform tables are system/auth/policy state, not Volve domain data. | |
core.roles |
platform identity and policy | operational_system |
none |
0 | 2 | Core platform tables are system/auth/policy state, not Volve domain data. | |
core.tenant_schema_registry |
platform identity and policy | operational_system |
none |
3 | 5 | Core platform tables are system/auth/policy state, not Volve domain data. | |
core.tenants |
platform identity and policy | operational_system |
none |
3 | 5 | Core platform tables are system/auth/policy state, not Volve domain data. | |
core.user_page_access |
platform identity and policy | operational_system |
none |
15 | 5 | Core platform tables are system/auth/policy state, not Volve domain data. | |
core.user_project_access |
platform identity and policy | operational_system |
none |
0 | 0 | 5 | Core platform tables are system/auth/policy state, not Volve domain data. |
core.user_roles |
platform identity and policy | operational_system |
none |
0 | 4 | Core platform tables are system/auth/policy state, not Volve domain data. | |
core.users |
platform identity and policy | operational_system |
none |
3 | 14 | Core platform tables are system/auth/policy state, not Volve domain data. | |
ops.canonical_completion_intervals |
perforations and completion intervals | source_human_dataset |
none |
12 | 12 | 23 | Loaded from perforation/completion source files; status fields remain source-authored. |
ops.canonical_dynamic_reservoir_context |
reservoir-model support signals | deterministic_rule_derived |
rule_based_heuristic |
40 | 40 | 15 | Built from Eclipse/RMS references and parsed model content. Support scores are heuristic, not ML. |
ops.canonical_formation_tops |
well tops and picks | source_human_dataset |
none |
409 | 409 | 19 | Loaded from well-picks files. Spatial/depth transforms are deterministic when present. |
ops.canonical_log_artifacts |
full log-estate artifact inventory | source_human_dataset |
deterministic_transform |
5344 | 5344 | 20 | Artifact facts come from log packages; curve/sample counts are deterministic parses. |
ops.canonical_package_artifacts |
generic package artifact inventory | source_human_dataset |
deterministic_transform |
5483 | 5483 | 17 | Generic artifact registry for packages not deeply semantically decoded yet. |
ops.canonical_production_records |
daily/monthly production history | source_human_dataset |
none |
16160 | 16160 | 25 | Directly normalized from workbook and production package records. |
ops.canonical_report_documents |
report document registry | source_human_dataset |
deterministic_transform |
2 | 2 | 18 | Document-level facts are canonicalized from reports packages. Any extracted text fields are source-derived and need copyright controls. |
ops.canonical_reservoir_bodies |
derived reservoir bodies | deterministic_rule_derived |
rule_based_heuristic |
78 | 78 | 28 | Bodies are derived from tops, RMS point surfaces, and Eclipse support. Approximate areas/volumes are heuristic approximations. |
ops.canonical_reservoir_model_artifacts |
RMS/Eclipse artifact inventory | source_human_dataset |
deterministic_transform |
5454 | 5454 | 18 | Artifact facts come from model packages; linkage and parse status are computed. |
ops.canonical_seismic_artifacts |
seismic artifact inventory | source_human_dataset |
deterministic_transform |
98 | 98 | 23 | Artifact facts come from packages and headers; line extents/counts are deterministic summaries. |
ops.canonical_seismic_surveys |
seismic survey registry | source_human_dataset |
deterministic_transform |
14 | 14 | 12 | Survey entities are canonicalized from seismic support packages; not full trace semantics. |
ops.canonical_structural_surfaces |
structural metadata envelope | source_human_dataset |
deterministic_transform |
92 | 92 | 19 | Surface extents and counts are deterministic summaries of human-authored horizon/fault files. |
ops.canonical_technical_daily_reports |
technical drilling daily reports | source_human_dataset |
deterministic_transform |
1555 | 1555 | 29 | Extracted from technical XML. Report content is human-authored source data. |
ops.canonical_well_aliases |
well identity resolver aliases | deterministic_rule_derived |
rule_based_heuristic |
496 | 496 | 11 | Aliases are generated by normalization rules and content-based identity resolution authored in code. |
ops.canonical_well_locations |
canonical spatial anchors | source_human_dataset |
deterministic_transform |
34 | 34 | 20 | Human-authored source coordinates are preserved and augmented with deterministic CRS/WGS84/ECEF transforms. |
ops.canonical_well_reservoir_penetrations |
well-to-reservoir intersections | deterministic_rule_derived |
rule_based_heuristic |
78 | 78 | 23 | Penetrations are derived from top/base intervals, COMPDAT, and spatial matching. Not full cell-level simulation geometry. |
ops.canonical_well_source_links |
well/source lineage links | deterministic_rule_derived |
rule_based_heuristic |
1048 | 1048 | 13 | Links are created by content parsing and alias resolution. This is not source-authored data. |
ops.canonical_wells |
canonical well identity master | source_human_dataset |
none |
40 | 40 | 13 | Directly normalized from structured source identifiers such as production workbooks and well interpretation files. |
ops.canonical_witsml_bha_runs |
canonical WITSML BHA runs | source_human_dataset |
deterministic_transform |
15 | 15 | 20 | Loaded from WITSML XML with deterministic parsing and well linkage. |
ops.canonical_witsml_messages |
canonical WITSML messages | source_human_dataset |
deterministic_transform |
582 | 582 | 22 | Message text is direct source content. Linkage confidence is deterministic rule output. |
ops.canonical_witsml_support |
WITSML support rollup | deterministic_rule_derived |
rule_based_heuristic |
8 | 8 | 17 | Aggregated per-well support counts derived from canonical WITSML records. |
ops.canonical_witsml_trajectories |
canonical WITSML trajectories | source_human_dataset |
deterministic_transform |
14 | 14 | 24 | Trajectory station counts and ranges are deterministic summaries from WITSML XML. |
ops.canonical_witsml_wellbores |
canonical WITSML wellbores | source_human_dataset |
deterministic_transform |
7 | 7 | 20 | Source facts come from WITSML XML; canonical well linkage is rule-based. |
ops.dead_letter |
application data | operational_system |
deterministic_rule_derived |
0 | 6 | No explicit table override defined; review table-specific logic before treating as source-authored. | |
ops.field_package_ingest_coverage |
package-level coverage audit | operational_system |
deterministic_rule_derived |
18 | 18 | 19 | System-generated ledger showing canonical status of each staged package. |
ops.field_package_normalizations |
application data | operational_system |
deterministic_rule_derived |
35 | 35 | 11 | No explicit table override defined; review table-specific logic before treating as source-authored. |
ops.job_attempts |
application data | operational_system |
deterministic_rule_derived |
24 | 7 | No explicit table override defined; review table-specific logic before treating as source-authored. | |
ops.job_results |
application data | operational_system |
deterministic_rule_derived |
264 | 5 | No explicit table override defined; review table-specific logic before treating as source-authored. | |
ops.jobs |
application data | operational_system |
deterministic_rule_derived |
335 | 7 | No explicit table override defined; review table-specific logic before treating as source-authored. | |
ops.jobs_202602 |
application data | operational_system |
deterministic_rule_derived |
103 | 7 | No explicit table override defined; review table-specific logic before treating as source-authored. | |
ops.jobs_202603 |
application data | operational_system |
deterministic_rule_derived |
232 | 7 | No explicit table override defined; review table-specific logic before treating as source-authored. | |
ops.pointcloud_grids |
application data | operational_system |
deterministic_rule_derived |
5 | 5 | 11 | No explicit table override defined; review table-specific logic before treating as source-authored. |
ops.pointcloud_tiles |
application data | operational_system |
deterministic_rule_derived |
29 | 29 | 22 | No explicit table override defined; review table-specific logic before treating as source-authored. |
ops.remaining_barrel_estimates |
remaining-barrel estimates | deterministic_rule_derived |
rule_based_heuristic |
11 | 11 | 42 | Low/mid/high barrel outputs are heuristic volumetric estimates, not booked reserves. |
ops.remaining_barrel_estimation_runs |
barrel-estimation run ledger | operational_system |
rule_based_heuristic |
1 | 1 | 13 | Run metadata and methodology for remaining-barrel estimation. |
ops.reopen_score_profiles |
application data | operational_system |
deterministic_rule_derived |
1 | 1 | 12 | No explicit table override defined; review table-specific logic before treating as source-authored. |
ops.scraper_schedules |
application data | operational_system |
deterministic_rule_derived |
16 | 8 | No explicit table override defined; review table-specific logic before treating as source-authored. | |
ops.upload_sessions |
application data | operational_system |
deterministic_rule_derived |
480 | 480 | 11 | No explicit table override defined; review table-specific logic before treating as source-authored. |
ops.well_bypassed_candidates |
bypassed-pay candidates | deterministic_rule_derived |
rule_based_heuristic |
265 | 265 | 22 | Candidate classification and score are heuristic outputs from coded completion/pay rules. |
ops.well_data_gaps |
application data | operational_system |
deterministic_rule_derived |
72 | 72 | 10 | No explicit table override defined; review table-specific logic before treating as source-authored. |
ops.well_interpretations |
petrophysical interpretation runs | deterministic_rule_derived |
rule_based_heuristic |
40 | 40 | 14 | Interpretation outputs are computed by formulas/cutoffs configured in code. No runtime LLM or statistical model is used. |
ops.well_logs |
normalized well-log run registry | source_human_dataset |
deterministic_transform |
56 | 56 | 29 | Run-level metadata comes from log files; CRS and curve summaries are deterministic. |
ops.well_pay_events |
pay interval events | deterministic_rule_derived |
rule_based_heuristic |
269 | 269 | 27 | Intervals are derived from interpreted logs and thickness cutoffs. |
ops.well_qc_cards |
log QC summaries | deterministic_rule_derived |
rule_based_heuristic |
56 | 56 | 11 | QC metrics and ratings are computed from completeness/washout rules, not from an AI model. |
ops.well_reopening_targets |
reopening candidate ranking | deterministic_rule_derived |
rule_based_heuristic |
40 | 40 | 34 | Scores, classifications, and recommendations are heuristic outputs from authored formulas and weights. No runtime LLM/ML model is used. |
raw.extracted_fields |
raw source intake | source_human_dataset |
none |
480 | 8 | Raw intake tables preserve uploaded/source package facts and extraction traces. | |
raw.source_bundles |
raw source intake | source_human_dataset |
none |
4 | 4 | 11 | Raw intake tables preserve uploaded/source package facts and extraction traces. |
raw.source_objects |
raw source intake | source_human_dataset |
none |
32 | 32 | 19 | Raw intake tables preserve uploaded/source package facts and extraction traces. |
semantic.entities |
semantic entity projection | deterministic_rule_derived |
deterministic_semantic_projection |
40 | 40 | 13 | Semantic layer is projected from canonical data using the designed ontology. No runtime LLM inference observed. |
semantic.entity_links |
semantic relationship projection | deterministic_rule_derived |
deterministic_semantic_projection |
0 | 0 | 11 | Relationship graph is derived from canonical data and schema rules. |
semantic.entity_source_links |
semantic/source lineage | deterministic_rule_derived |
deterministic_semantic_projection |
0 | 10 | Traceability from semantic entities back to raw sources. | |
semantic.entity_types |
semantic projection | deterministic_rule_derived |
deterministic_semantic_projection |
19 | 7 | Semantic layer projects canonical records into ontology entities and relations. | |
semantic.query_profiles |
semantic projection | deterministic_rule_derived |
deterministic_semantic_projection |
2 | 9 | Semantic layer projects canonical records into ontology entities and relations. | |
semantic.relation_types |
semantic projection | deterministic_rule_derived |
deterministic_semantic_projection |
18 | 8 | Semantic layer projects canonical records into ontology entities and relations. |
Defense Summary
- The DB is the mechanism that separates source facts from heuristic outputs.
- Tables are separated by object meaning, not by file convenience.
- Rows are kept at the smallest useful unit so conclusions can be audited.
- CRS and ECEF are explicit because spatial truth is part of the defendable result.
- The result is a truth-and-provenance system for open-data decision support, not just storage.