Database Schema Defense Report

This report explains the database as the central truth-and-provenance system behind the current Earthbond proof-of-concept.

It is written for colleague review: why the schemas exist, what a row means, why the tables are separate, and why the current design is defendable for open-data decision support.

0028Alembic version

69Live tables audited

40Canonical wells

16160Production rows

Why The Database Exists

The database was not added as generic storage. It exists to separate source evidence, canonical truth, derived interpretation, decision outputs, and audit traceability.

Without this structure, the build could not answer which source created a value, whether a value is direct or derived, what spatial basis was used, or why a well received a given score.

Schema Roles

Schema	Purpose	Why It Exists	Audited Tables
`core`	Platform control and governance state.	Users, tenants, access control, and CRS reference policy should not be mixed with subsurface facts.	10
`raw`	Source bundle and object registry.	Preserves what entered the system before any canonical interpretation occurs.	3
`ops`	Canonical operational and decision-support truth.	This is where open-data packages become typed, queryable facts and explicit derived outputs.	45
`semantic`	Entity and relationship projection.	Supports later graph/query/AI usage without replacing canonical truth as the source of record.	6
`audit`	Traceability, evidence, and integrity.	Preserves how results were produced so they can be defended and replayed.	5

Major Table Families

Family	Main Tables	What One Row Means	Why It Was Built This Way
Source registry	raw.source_bundles, raw.source_objects	One row proves a source bundle or source object exists and was staged.	Without this layer, canonical rows would lose their source-of-origin chain.
Well identity and aliases	ops.canonical_wells, ops.canonical_well_aliases, ops.canonical_well_locations, ops.canonical_well_source_links	One row represents one canonical well, one alias mapping, one well location, or one source-link supporting that well identity.	Identity resolution is a first-order problem across inconsistent source naming.
Static subsurface	ops.canonical_formation_tops, ops.canonical_completion_intervals, ops.canonical_structural_surfaces	One row represents one top, one completion interval, or one structural surface/support object.	These are different physical truth classes and should not be merged into one generic well row.
Dynamic and technical history	ops.canonical_production_records, ops.canonical_technical_daily_reports, ops.canonical_witsml_*	One row represents one production record, one technical report, one trajectory, one BHA run, or one WITSML message/event.	Time-series and operational history need their own row grain to remain auditable.
Logs and interpretation	ops.well_logs, ops.well_qc_cards, ops.well_interpretations, ops.well_pay_events, ops.well_bypassed_candidates	One row represents one log run, one QC card, one interpretation, one pay event, or one bypassed-pay candidate.	Raw log presence, QC, and interpreted meaning are not the same thing and must be separated.
Seismic and reservoir support	ops.canonical_seismic_*, ops.canonical_reservoir_model_artifacts, ops.canonical_reservoir_bodies, ops.canonical_well_reservoir_penetrations	One row represents one survey/artifact, one model artifact, one reservoir body, or one well-to-body penetration.	Package presence, derived geometry, and penetration logic must remain distinguishable.
Coverage and workflow	ops.field_package_normalizations, ops.field_package_ingest_coverage	One row represents one normalization run or one package coverage status.	Data present is not the same as data canonically processed; the workflow state must be explicit.
Decision outputs	ops.well_reopening_targets, ops.remaining_barrel_estimation_runs, ops.remaining_barrel_estimates	One row represents one reopening target or one well-level barrel estimate.	Heuristic outputs must remain separate from factual source records.
Semantic projection	semantic.entities, semantic.entity_links, semantic.entity_source_links	One row represents one semantic entity or one semantic link projected from canonical data.	This allows later graph-style usage without turning the semantic layer into the source-of-truth layer.
Audit	audit.event_ledger, audit.evidence_packs, audit.integrity_checks	One row represents one event, evidence pack, or integrity check in the system trail.	This is the replay and defense layer for internal review.

Why Rows Matter

A row is the smallest defendable unit of truth in this build.

One row in ops.canonical_wells means one canonical well identity.
One row in ops.canonical_production_records means one production time record.
One row in ops.well_pay_events means one interpreted pay interval.
One row in ops.well_reopening_targets means one ranked decision output for one well.
One row in audit.event_ledger means one auditable system event.

That row-level design is deliberate. It makes results inspectable and challengeable instead of collapsing them into opaque file-level summaries.

CRS And ECEF Truth

Spatial normalization is part of the truth model, not a side utility.

WGS84 and ECEF are used so wells, trajectories, tops, surfaces, and penetrations can be compared inside one defendable spatial frame. CRS ambiguity would otherwise undermine the entire decision layer.

Table-By-Table Walkthrough

Table	Role	Primary Origin	Runtime Inference	Total Rows	Project Rows	Columns	Why It Exists
`audit.event_ledger`	audit and integrity ledger	`operational_system`	`deterministic_rule_derived`	547		9	Audit tables are system-generated traceability and integrity records.
`audit.event_ledger_202602`	audit and integrity ledger	`operational_system`	`deterministic_rule_derived`	179		9	Audit tables are system-generated traceability and integrity records.
`audit.event_ledger_202603`	audit and integrity ledger	`operational_system`	`deterministic_rule_derived`	368		9	Audit tables are system-generated traceability and integrity records.
`audit.evidence_packs`	audit and integrity ledger	`operational_system`	`deterministic_rule_derived`	34		8	Audit tables are system-generated traceability and integrity records.
`audit.integrity_checks`	audit and integrity ledger	`operational_system`	`deterministic_rule_derived`	0		6	Audit tables are system-generated traceability and integrity records.
`core.password_reset_tokens`	platform identity and policy	`operational_system`	`none`	6		10	Core platform tables are system/auth/policy state, not Volve domain data.
`core.policy_sets`	platform identity and policy	`operational_system`	`none`	4		7	Core platform tables are system/auth/policy state, not Volve domain data.
`core.reference_crs_profiles`	platform identity and policy	`operational_system`	`none`	2		6	Core platform tables are system/auth/policy state, not Volve domain data.
`core.roles`	platform identity and policy	`operational_system`	`none`	0		2	Core platform tables are system/auth/policy state, not Volve domain data.
`core.tenant_schema_registry`	platform identity and policy	`operational_system`	`none`	3		5	Core platform tables are system/auth/policy state, not Volve domain data.
`core.tenants`	platform identity and policy	`operational_system`	`none`	3		5	Core platform tables are system/auth/policy state, not Volve domain data.
`core.user_page_access`	platform identity and policy	`operational_system`	`none`	15		5	Core platform tables are system/auth/policy state, not Volve domain data.
`core.user_project_access`	platform identity and policy	`operational_system`	`none`	0	0	5	Core platform tables are system/auth/policy state, not Volve domain data.
`core.user_roles`	platform identity and policy	`operational_system`	`none`	0		4	Core platform tables are system/auth/policy state, not Volve domain data.
`core.users`	platform identity and policy	`operational_system`	`none`	3		14	Core platform tables are system/auth/policy state, not Volve domain data.
`ops.canonical_completion_intervals`	perforations and completion intervals	`source_human_dataset`	`none`	12	12	23	Loaded from perforation/completion source files; status fields remain source-authored.
`ops.canonical_dynamic_reservoir_context`	reservoir-model support signals	`deterministic_rule_derived`	`rule_based_heuristic`	40	40	15	Built from Eclipse/RMS references and parsed model content. Support scores are heuristic, not ML.
`ops.canonical_formation_tops`	well tops and picks	`source_human_dataset`	`none`	409	409	19	Loaded from well-picks files. Spatial/depth transforms are deterministic when present.
`ops.canonical_log_artifacts`	full log-estate artifact inventory	`source_human_dataset`	`deterministic_transform`	5344	5344	20	Artifact facts come from log packages; curve/sample counts are deterministic parses.
`ops.canonical_package_artifacts`	generic package artifact inventory	`source_human_dataset`	`deterministic_transform`	5483	5483	17	Generic artifact registry for packages not deeply semantically decoded yet.
`ops.canonical_production_records`	daily/monthly production history	`source_human_dataset`	`none`	16160	16160	25	Directly normalized from workbook and production package records.
`ops.canonical_report_documents`	report document registry	`source_human_dataset`	`deterministic_transform`	2	2	18	Document-level facts are canonicalized from reports packages. Any extracted text fields are source-derived and need copyright controls.
`ops.canonical_reservoir_bodies`	derived reservoir bodies	`deterministic_rule_derived`	`rule_based_heuristic`	78	78	28	Bodies are derived from tops, RMS point surfaces, and Eclipse support. Approximate areas/volumes are heuristic approximations.
`ops.canonical_reservoir_model_artifacts`	RMS/Eclipse artifact inventory	`source_human_dataset`	`deterministic_transform`	5454	5454	18	Artifact facts come from model packages; linkage and parse status are computed.
`ops.canonical_seismic_artifacts`	seismic artifact inventory	`source_human_dataset`	`deterministic_transform`	98	98	23	Artifact facts come from packages and headers; line extents/counts are deterministic summaries.
`ops.canonical_seismic_surveys`	seismic survey registry	`source_human_dataset`	`deterministic_transform`	14	14	12	Survey entities are canonicalized from seismic support packages; not full trace semantics.
`ops.canonical_structural_surfaces`	structural metadata envelope	`source_human_dataset`	`deterministic_transform`	92	92	19	Surface extents and counts are deterministic summaries of human-authored horizon/fault files.
`ops.canonical_technical_daily_reports`	technical drilling daily reports	`source_human_dataset`	`deterministic_transform`	1555	1555	29	Extracted from technical XML. Report content is human-authored source data.
`ops.canonical_well_aliases`	well identity resolver aliases	`deterministic_rule_derived`	`rule_based_heuristic`	496	496	11	Aliases are generated by normalization rules and content-based identity resolution authored in code.
`ops.canonical_well_locations`	canonical spatial anchors	`source_human_dataset`	`deterministic_transform`	34	34	20	Human-authored source coordinates are preserved and augmented with deterministic CRS/WGS84/ECEF transforms.
`ops.canonical_well_reservoir_penetrations`	well-to-reservoir intersections	`deterministic_rule_derived`	`rule_based_heuristic`	78	78	23	Penetrations are derived from top/base intervals, COMPDAT, and spatial matching. Not full cell-level simulation geometry.
`ops.canonical_well_source_links`	well/source lineage links	`deterministic_rule_derived`	`rule_based_heuristic`	1048	1048	13	Links are created by content parsing and alias resolution. This is not source-authored data.
`ops.canonical_wells`	canonical well identity master	`source_human_dataset`	`none`	40	40	13	Directly normalized from structured source identifiers such as production workbooks and well interpretation files.
`ops.canonical_witsml_bha_runs`	canonical WITSML BHA runs	`source_human_dataset`	`deterministic_transform`	15	15	20	Loaded from WITSML XML with deterministic parsing and well linkage.
`ops.canonical_witsml_messages`	canonical WITSML messages	`source_human_dataset`	`deterministic_transform`	582	582	22	Message text is direct source content. Linkage confidence is deterministic rule output.
`ops.canonical_witsml_support`	WITSML support rollup	`deterministic_rule_derived`	`rule_based_heuristic`	8	8	17	Aggregated per-well support counts derived from canonical WITSML records.
`ops.canonical_witsml_trajectories`	canonical WITSML trajectories	`source_human_dataset`	`deterministic_transform`	14	14	24	Trajectory station counts and ranges are deterministic summaries from WITSML XML.
`ops.canonical_witsml_wellbores`	canonical WITSML wellbores	`source_human_dataset`	`deterministic_transform`	7	7	20	Source facts come from WITSML XML; canonical well linkage is rule-based.
`ops.dead_letter`	application data	`operational_system`	`deterministic_rule_derived`	0		6	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.field_package_ingest_coverage`	package-level coverage audit	`operational_system`	`deterministic_rule_derived`	18	18	19	System-generated ledger showing canonical status of each staged package.
`ops.field_package_normalizations`	application data	`operational_system`	`deterministic_rule_derived`	35	35	11	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.job_attempts`	application data	`operational_system`	`deterministic_rule_derived`	24		7	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.job_results`	application data	`operational_system`	`deterministic_rule_derived`	264		5	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.jobs`	application data	`operational_system`	`deterministic_rule_derived`	335		7	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.jobs_202602`	application data	`operational_system`	`deterministic_rule_derived`	103		7	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.jobs_202603`	application data	`operational_system`	`deterministic_rule_derived`	232		7	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.pointcloud_grids`	application data	`operational_system`	`deterministic_rule_derived`	5	5	11	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.pointcloud_tiles`	application data	`operational_system`	`deterministic_rule_derived`	29	29	22	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.remaining_barrel_estimates`	remaining-barrel estimates	`deterministic_rule_derived`	`rule_based_heuristic`	11	11	42	Low/mid/high barrel outputs are heuristic volumetric estimates, not booked reserves.
`ops.remaining_barrel_estimation_runs`	barrel-estimation run ledger	`operational_system`	`rule_based_heuristic`	1	1	13	Run metadata and methodology for remaining-barrel estimation.
`ops.reopen_score_profiles`	application data	`operational_system`	`deterministic_rule_derived`	1	1	12	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.scraper_schedules`	application data	`operational_system`	`deterministic_rule_derived`	16		8	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.upload_sessions`	application data	`operational_system`	`deterministic_rule_derived`	480	480	11	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.well_bypassed_candidates`	bypassed-pay candidates	`deterministic_rule_derived`	`rule_based_heuristic`	265	265	22	Candidate classification and score are heuristic outputs from coded completion/pay rules.
`ops.well_data_gaps`	application data	`operational_system`	`deterministic_rule_derived`	72	72	10	No explicit table override defined; review table-specific logic before treating as source-authored.
`ops.well_interpretations`	petrophysical interpretation runs	`deterministic_rule_derived`	`rule_based_heuristic`	40	40	14	Interpretation outputs are computed by formulas/cutoffs configured in code. No runtime LLM or statistical model is used.
`ops.well_logs`	normalized well-log run registry	`source_human_dataset`	`deterministic_transform`	56	56	29	Run-level metadata comes from log files; CRS and curve summaries are deterministic.
`ops.well_pay_events`	pay interval events	`deterministic_rule_derived`	`rule_based_heuristic`	269	269	27	Intervals are derived from interpreted logs and thickness cutoffs.
`ops.well_qc_cards`	log QC summaries	`deterministic_rule_derived`	`rule_based_heuristic`	56	56	11	QC metrics and ratings are computed from completeness/washout rules, not from an AI model.
`ops.well_reopening_targets`	reopening candidate ranking	`deterministic_rule_derived`	`rule_based_heuristic`	40	40	34	Scores, classifications, and recommendations are heuristic outputs from authored formulas and weights. No runtime LLM/ML model is used.
`raw.extracted_fields`	raw source intake	`source_human_dataset`	`none`	480		8	Raw intake tables preserve uploaded/source package facts and extraction traces.
`raw.source_bundles`	raw source intake	`source_human_dataset`	`none`	4	4	11	Raw intake tables preserve uploaded/source package facts and extraction traces.
`raw.source_objects`	raw source intake	`source_human_dataset`	`none`	32	32	19	Raw intake tables preserve uploaded/source package facts and extraction traces.
`semantic.entities`	semantic entity projection	`deterministic_rule_derived`	`deterministic_semantic_projection`	40	40	13	Semantic layer is projected from canonical data using the designed ontology. No runtime LLM inference observed.
`semantic.entity_links`	semantic relationship projection	`deterministic_rule_derived`	`deterministic_semantic_projection`	0	0	11	Relationship graph is derived from canonical data and schema rules.
`semantic.entity_source_links`	semantic/source lineage	`deterministic_rule_derived`	`deterministic_semantic_projection`	0		10	Traceability from semantic entities back to raw sources.
`semantic.entity_types`	semantic projection	`deterministic_rule_derived`	`deterministic_semantic_projection`	19		7	Semantic layer projects canonical records into ontology entities and relations.
`semantic.query_profiles`	semantic projection	`deterministic_rule_derived`	`deterministic_semantic_projection`	2		9	Semantic layer projects canonical records into ontology entities and relations.
`semantic.relation_types`	semantic projection	`deterministic_rule_derived`	`deterministic_semantic_projection`	18		8	Semantic layer projects canonical records into ontology entities and relations.

Defense Summary

The DB is the mechanism that separates source facts from heuristic outputs.
Tables are separated by object meaning, not by file convenience.
Rows are kept at the smallest useful unit so conclusions can be audited.
CRS and ECEF are explicit because spatial truth is part of the defendable result.
The result is a truth-and-provenance system for open-data decision support, not just storage.