Data Platform Storage, DB, And Processing Manual
This manual consolidates how the build stores immutable raw data, extracts metadata, persists processing state, promotes canonical truth, projects ontology entities, and exposes APIs. Parts of this were already described across the system-interaction map and schema-defense manuals. This report is the combined colleague-facing version.
Alembic version: 0028
End-To-End Data Flow
1. Raw bytes arrive through upload endpoints or field-package staging.
2. Raw objects are written to MinIO and registered in PostgreSQL.
3. Deterministic metadata is extracted and stored in PostgreSQL.
4. Processing jobs and results are tracked in PostgreSQL.
5. Canonical ingest transforms raw package content into typed operational tables.
6. Semantic projection links canonical truth into ontology-ready entities.
7. Decision layers read canonical and semantic data to produce scores and barrel estimates.
Proxy And Routing Layers
The deployment runs proxy servers; browsers never connect directly to the backend services. The routing model is layered on purpose so the WAN entrypoint stays on one hostname while internal services remain private.
| Layer | Technology | What It Routes | Why It Exists |
|---|---|---|---|
| Public proxy | Caddy | 443 -> edge-web, with /admin/ basic auth | TLS termination, public hostname binding, security headers, admin gate. |
| Internal edge proxy | nginx | /, /admin/, /data/, /control/, /minio/ | Single internal routing layer for UI, APIs, and object-store access. |
| App web proxy | nginx | same-origin /data/ and /control/ from client/admin web apps | Keeps browser API calls same-origin and avoids exposing raw internal service ports. |
| API gateway | FastAPI + httpx | /data/* to data-plane, /control/* to control-plane | Application-level routing, auth/header propagation, and service boundary separation. |
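The gateway row above describes prefix-based routing. The following is a minimal sketch of that idea only, not the gateway's actual code. The upstream hostnames and ports, the `resolve_upstream` helper, and the choice to strip the routing prefix before forwarding are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical upstream base URLs; the real service names and ports
# are not stated in this manual.
UPSTREAMS = {
    "/data/": "http://data-plane-api:8000",
    "/control/": "http://control-plane-api:8000",
}


def resolve_upstream(path: str) -> Optional[str]:
    """Map a gateway request path to an upstream URL, or None if unrouted."""
    for prefix, base in UPSTREAMS.items():
        if path.startswith(prefix):
            # Assumption: the routing prefix is stripped before forwarding.
            return base + "/" + path[len(prefix):]
    return None


if __name__ == "__main__":
    print(resolve_upstream("/data/uploads/presign"))
    print(resolve_upstream("/minio/raw-vault"))
```

A path that matches neither prefix returns None, which a real gateway would typically turn into a 404 rather than forwarding anywhere.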
Programming Language Stack
The build does use JavaScript. The backend and processing core are primarily Python, but the deployed web UI runs JavaScript in the browser.
| Layer | Primary Language | Where It Is Used | Why |
|---|---|---|---|
| Backend APIs | Python | apps/api-gateway, apps/data-plane-api, apps/control-plane-api | Main application logic, ingest orchestration, auth, canonicalization, scoring, exports. |
| Workers and ops scripts | Python and shell | apps/worker-*, scripts/ | Background processing, migrations, exports, audits, health checks. |
| Database | SQL | PostgreSQL + Alembic migrations | Canonical truth, provenance, queryability, auditability, and relational integrity. |
| Browser UI | JavaScript | apps/client-web/*.js, apps/admin-web/*.js | UI interaction, fetch calls, live workflow controls, rendered manuals. |
| Reverse proxies | Config languages | Caddyfile, nginx config | Routing, TLS, same-origin proxying, and exposure control. |
Defensible summary: the platform logic is Python-first, PostgreSQL-backed, and infrastructure-config driven. JavaScript is used for the web interface, not as the primary data-processing language.
How MinIO Was Built And How It Is Used
MinIO is deployed as the object store service in docker-compose.yml.
- Container: earthbond-minio
- Image: quay.io/minio/minio:latest
- Internal API endpoint: http://minio:9000
- Host S3 API port: 127.0.0.1:19000
- Host console port: 127.0.0.1:19001
- Host-backed storage mount: ./data/minio:/data
- Initialized buckets: raw-vault, evidence-packs
MinIO is used to keep raw bytes immutable enough for replay and audit while PostgreSQL stores searchable metadata, workflow state, canonical truth, semantic projections, and audit events.
Typical object-key patterns
- {project_id}/raw/{upload_id}/{filename} for direct uploads
- {project_id}/raw/{zip_upload_id}/{member_path} for zip member uploads
- {project_id}/field-packages/volve-all_files/... for staged field packages
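The first two key patterns can be sketched as small helper functions. This is an illustrative sketch: the function names are hypothetical, and the path-traversal check on zip member paths is an assumed safeguard that the manual does not describe.

```python
def direct_upload_key(project_id: str, upload_id: str, filename: str) -> str:
    """Build the object key for a direct upload."""
    return f"{project_id}/raw/{upload_id}/{filename}"


def zip_member_key(project_id: str, zip_upload_id: str, member_path: str) -> str:
    """Build the object key for one member of an uploaded zip archive."""
    # Normalise separators and drop empty or "." segments.
    parts = [p for p in member_path.replace("\\", "/").split("/") if p not in ("", ".")]
    # Assumed safeguard: refuse members that try to escape the upload prefix.
    if not parts or ".." in parts:
        raise ValueError(f"unsafe zip member path: {member_path!r}")
    return f"{project_id}/raw/{zip_upload_id}/" + "/".join(parts)


if __name__ == "__main__":
    print(direct_upload_key("volve", "u1", "a.las"))
    print(zip_member_key("volve", "z1", "logs//well_1/run.las"))
```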
Are APIs Used
Yes. The build uses APIs heavily. Browser and WAN interactions do not bypass the backend; they go through the web UI, the API gateway, and the data-plane API. Workers also participate in processing.
| API Group | Endpoints | Why They Exist |
|---|---|---|
| Auth and session | /auth/login | Establishes authenticated access before protected upload and field-package actions are allowed. |
| Direct upload and raw storage | /uploads/presign, /uploads/direct, /uploads/direct-zip, /projects/{project_id}/uploads, /uploads/detect-kind, /projects/{project_id}/upload-workflow-state | Writes raw bytes to MinIO, records upload session state, and registers raw source objects in PostgreSQL. |
| Field-package staging and validation | /projects/{project_id}/field-package/server-path-info, /projects/{project_id}/field-package/stage-to-storage, /projects/{project_id}/field-package/local-scan, /projects/{project_id}/field-package/browser-scan, /projects/{project_id}/field-package/coverage-audit, /projects/{project_id}/field-package/domain-report | Moves package data into MinIO, inventories the source, validates structure, and records coverage/readiness. |
| Canonical ingest and enrichment | /projects/{project_id}/field-package/normalize-phase1, /projects/{project_id}/field-package/ingest-technical-context, /projects/{project_id}/field-package/ingest-full-log-estate, /projects/{project_id}/field-package/ingest-report-documents, /projects/{project_id}/field-package/ingest-model-artifacts, /projects/{project_id}/field-package/ingest-misc-artifacts, /projects/{project_id}/field-package/ingest-logs-phase2, /projects/{project_id}/field-package/ingest-reservoir-context, /projects/{project_id}/field-package/ingest-witsml-context, /projects/{project_id}/field-package/ingest-seismic-support, /projects/{project_id}/field-package/ingest-reservoir-bodies, /projects/{project_id}/field-package/resolve-well-links | Promotes raw package content into canonical, semantic, and technical layers. |
| Inspection and decision outputs | /projects/{project_id}/field-package/canonical-summary, /projects/{project_id}/field-package/canonical-records, /projects/{project_id}/field-package/run-reopening-scoring, /projects/{project_id}/field-package/reopening-targets, /projects/{project_id}/field-package/run-remaining-barrels, /projects/{project_id}/field-package/remaining-barrels, /projects/{project_id}/field-package/export-domains | Exposes processed facts and decision-support outputs after canonicalization is complete enough to query. |
How Raw Data Is Stored And Metadata Is Extracted
Raw data is stored as immutable object bytes in MinIO. The database never replaces those source bytes with canonical outputs. Instead, PostgreSQL stores object registration, extracted metadata, job state, audit lineage, and later canonical tables.
The raw storage pattern is deliberate: raw bytes remain replayable, while extracted metadata becomes queryable.
Key raw tables
- raw.source_bundles: one bundle-level registration row.
- raw.source_objects: one object-level registry row for each stored object.
- raw.extracted_fields: one extracted field per object/field pair.
- ops.upload_sessions: one browser or upload-session control row.
- ops.jobs and ops.job_results: one processing run and one result payload.
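To illustrate how deterministic metadata can be derived from raw bytes without touching the stored object, here is a minimal sketch that computes a few of the raw.source_objects columns (file_name, file_ext, sha256, size_bytes). The `describe_raw_object` helper is hypothetical, and storing file_ext lowercased without the leading dot is an assumption, not a documented convention.

```python
import hashlib
from pathlib import PurePosixPath


def describe_raw_object(object_key: str, data: bytes) -> dict:
    """Derive deterministic registration metadata from an object's key and bytes."""
    name = PurePosixPath(object_key).name
    suffix = PurePosixPath(name).suffix
    return {
        "object_key": object_key,
        "file_name": name,
        # Assumption: extension stored lowercased, without the dot; None if absent.
        "file_ext": suffix.lstrip(".").lower() or None,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
    }


if __name__ == "__main__":
    print(describe_raw_object("volve/raw/u1/WELL.LAS", b"abc"))
```

Because the digest depends only on the bytes, re-running the extraction on a replayed object yields the same sha256 and size_bytes, which is what makes the metadata comparable across runs.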
How Processing Moves From Raw To Canonical And Semantic
Canonical processing is not a replacement for raw storage. It is a promoted layer that turns raw package content into typed facts.
- ops.canonical_wells, ops.canonical_well_aliases, ops.canonical_well_source_links resolve identity.
- ops.canonical_formation_tops, ops.canonical_completion_intervals, ops.canonical_structural_surfaces carry subsurface facts.
- ops.canonical_production_records, ops.canonical_technical_daily_reports, and ops.canonical_witsml_* carry technical and dynamic context.
- semantic.entities and semantic.entity_links project ontology-oriented entities from canonical truth.
Key Tables, IDs, And Stored Data Points
raw.source_bundles
Primary key columns: bundle_id
Role: raw source intake
Total rows: 4
Origin: source_human_dataset | Runtime inference: none
Raw intake tables preserve uploaded/source package facts and extraction traces.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| bundle_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| bundle_type | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| ingest_status | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| source_tag | text | True | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| authority_rank | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| metadata | jsonb | False | mixed_source_and_system | JSON container that may hold source values plus processing metadata. |
| created_by | text | True | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
raw.source_objects
Primary key columns: source_object_id
Role: raw source intake
Total rows: 32
Origin: source_human_dataset | Runtime inference: none
Raw intake tables preserve uploaded/source package facts and extraction traces.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| source_object_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| bundle_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| upload_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| object_key | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| file_name | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| file_ext | text | True | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| mime_type | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| source_type | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| parse_status | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| ocr_status | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| sha256 | text | True | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| size_bytes | bigint | True | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| authority_rank | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| metadata_profile | jsonb | False | mixed_source_and_system | JSON payload requiring field-level review. |
| created_by | text | True | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
raw.extracted_fields
Primary key columns: extracted_field_id
Role: raw source intake
Total rows: 480
Origin: source_human_dataset | Runtime inference: none
Raw intake tables preserve uploaded/source package facts and extraction traces.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| extracted_field_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| source_object_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| field_name | text | False | source_human_dataset | Directly sourced from human-authored input files or regulator/operator datasets. |
| field_value_text | text | True | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| field_value_json | jsonb | True | mixed_source_and_system | JSON payload requiring field-level review. |
| field_confidence | double precision | True | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| extraction_method | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
ops.upload_sessions
Primary key columns: upload_id
Role: application data
Total rows: 480
Origin: operational_system | Runtime inference: deterministic_rule_derived
No explicit table override defined; review table-specific logic before treating as source-authored.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| upload_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| filename | text | False | operational_system | Defaulted to platform/system state for this table. |
| content_type | text | False | operational_system | Defaulted to platform/system state for this table. |
| object_key | text | False | operational_system | Defaulted to platform/system state for this table. |
| status | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| created_by | text | True | operational_system | Defaulted to platform/system state for this table. |
| metadata | jsonb | False | mixed_source_and_system | JSON container that may hold source values plus processing metadata. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
ops.jobs
Primary key columns:
Role: application data
Total rows: 335
Origin: operational_system | Runtime inference: deterministic_rule_derived
No explicit table override defined; review table-specific logic before treating as source-authored.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| job_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| job_type | text | False | operational_system | Defaulted to platform/system state for this table. |
| status | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| payload | jsonb | False | mixed_source_and_system | JSON payload requiring field-level review. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
ops.job_results
Primary key columns: job_id
Role: application data
Total rows: 264
Origin: operational_system | Runtime inference: deterministic_rule_derived
No explicit table override defined; review table-specific logic before treating as source-authored.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| job_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| result | jsonb | False | mixed_source_and_system | JSON payload requiring field-level review. |
| output_ref | text | True | operational_system | Defaulted to platform/system state for this table. |
| processed_by | text | True | operational_system | Defaulted to platform/system state for this table. |
| processed_at | timestamp with time zone | False | operational_system | Defaulted to platform/system state for this table. |
ops.field_package_normalizations
Primary key columns: run_id
Role: application data
Total rows: 35
Origin: operational_system | Runtime inference: deterministic_rule_derived
No explicit table override defined; review table-specific logic before treating as source-authored.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| run_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| root_path | text | False | operational_system | Defaulted to platform/system state for this table. |
| normalized_domains | jsonb | False | mixed_source_and_system | JSON payload requiring field-level review. |
| counts | jsonb | False | mixed_source_and_system | JSON payload requiring field-level review. |
| issues | jsonb | False | mixed_source_and_system | JSON payload requiring field-level review. |
| status | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| actor | text | True | operational_system | System key, tenancy, run, or timestamp field. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
ops.field_package_ingest_coverage
Primary key columns: coverage_id
Role: package-level coverage audit
Total rows: 18
Origin: operational_system | Runtime inference: deterministic_rule_derived
System-generated ledger showing canonical status of each staged package.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| coverage_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| source_label | text | False | operational_system | Defaulted to platform/system state for this table. |
| package_name | text | False | operational_system | Defaulted to platform/system state for this table. |
| domain_key | text | True | operational_system | Defaulted to platform/system state for this table. |
| package_kind | text | False | operational_system | Defaulted to platform/system state for this table. |
| source_mode | text | False | operational_system | Defaulted to platform/system state for this table. |
| object_prefix | text | True | operational_system | Defaulted to platform/system state for this table. |
| root_path | text | True | operational_system | Defaulted to platform/system state for this table. |
| file_count | integer | False | operational_system | Defaulted to platform/system state for this table. |
| total_bytes | bigint | False | operational_system | Defaulted to platform/system state for this table. |
| canonical_status | text | False | operational_system | Defaulted to platform/system state for this table. |
| canonical_reason | text | True | operational_system | Defaulted to platform/system state for this table. |
| canonical_counts | jsonb | False | mixed_source_and_system | JSON container that may hold source values plus processing metadata. |
| required_next_step | text | True | operational_system | Defaulted to platform/system state for this table. |
| actor | text | True | operational_system | System key, tenancy, run, or timestamp field. |
| metadata | jsonb | False | mixed_source_and_system | JSON container that may hold source values plus processing metadata. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
ops.canonical_wells
Primary key columns: canonical_well_id
Role: canonical well identity master
Total rows: 40
Origin: source_human_dataset | Runtime inference: none
Directly normalized from structured source identifiers such as production workbooks and well interpretation files.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| canonical_well_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| well_name | text | False | source_human_dataset | Directly sourced from human-authored input files or regulator/operator datasets. |
| npd_well_bore_code | text | True | source_human_dataset | Directly sourced from human-authored input files or regulator/operator datasets. |
| npd_well_bore_name | text | True | source_human_dataset | Directly sourced from human-authored input files or regulator/operator datasets. |
| well_bore_code | text | True | source_human_dataset | Directly sourced from human-authored input files or regulator/operator datasets. |
| field_name | text | True | source_human_dataset | Directly sourced from human-authored input files or regulator/operator datasets. |
| field_code | text | True | source_human_dataset | Directly sourced from human-authored input files or regulator/operator datasets. |
| source_kind | text | False | source_human_dataset | Defaulted to direct source value for this canonical source table. |
| metadata | jsonb | False | mixed_source_and_system | JSON container that may hold source values plus processing metadata. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
ops.canonical_well_aliases
Primary key columns: alias_id
Role: well identity resolver aliases
Total rows: 496
Origin: deterministic_rule_derived | Runtime inference: rule_based_heuristic
Aliases are generated by normalization rules and content-based identity resolution authored in code.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| alias_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| canonical_well_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| alias | text | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| alias_slug | text | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| alias_kind | text | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| source_kind | text | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| confidence | text | False | rule_based_heuristic | Heuristic/rule-based value generated by pipeline logic; no runtime LLM/ML model observed. |
| evidence | jsonb | False | operational_system | Traceability and lineage payload generated by the platform. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
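The alias and alias_slug columns imply a normalization step that lets differently formatted well names resolve to the same key. The platform's actual normalization rules are not reproduced in this manual, so the sketch below uses a hypothetical rule (lowercase, collapse every non-alphanumeric run to a single hyphen) purely to show the idea. The well-name strings are illustrative inputs.

```python
import re


def alias_slug(alias: str) -> str:
    """Hypothetical slug rule: lowercase, non-alphanumeric runs become one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", alias.strip().lower())
    return slug.strip("-")


if __name__ == "__main__":
    # Two spellings of the same illustrative well name yield the same slug.
    print(alias_slug("15/9-F-1 C"))
    print(alias_slug(" 15_9-f-1_c "))
```

A deterministic slug like this is what allows alias matching to stay rule-based: the same input always produces the same key, with no model inference involved.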
ops.canonical_well_source_links
Primary key columns: source_link_id
Role: well/source lineage links
Total rows: 1048
Origin: deterministic_rule_derived | Runtime inference: rule_based_heuristic
Links are created by content parsing and alias resolution. This is not source-authored data.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| source_link_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| canonical_well_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| source_domain | text | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| source_object_key | text | False | source_human_dataset | Directly sourced from human-authored input files or regulator/operator datasets. |
| source_member | text | True | source_human_dataset | Directly sourced from human-authored input files or regulator/operator datasets. |
| linkage_method | text | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| linkage_confidence | text | False | rule_based_heuristic | Heuristic/rule-based value generated by pipeline logic; no runtime LLM/ML model observed. |
| matched_identifier | text | True | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| identifier_kind | text | True | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| evidence | jsonb | False | operational_system | Traceability and lineage payload generated by the platform. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
semantic.entities
Primary key columns: entity_id
Role: semantic entity projection
Total rows: 40
Origin: deterministic_rule_derived | Runtime inference: deterministic_semantic_projection
Semantic layer is projected from canonical data using the designed ontology. No runtime LLM inference observed.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| entity_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| entity_type_key | text | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| entity_key | text | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| display_name | text | True | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| confidence | text | False | rule_based_heuristic | Heuristic/rule-based value generated by pipeline logic; no runtime LLM/ML model observed. |
| is_preferred | boolean | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| canonical_ref | jsonb | False | operational_system | Traceability and lineage payload generated by the platform. |
| attributes | jsonb | False | mixed_source_and_system | JSON container that may hold source values plus processing metadata. |
| provenance | jsonb | False | operational_system | Traceability and lineage payload generated by the platform. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
semantic.entity_links
Primary key columns: entity_link_id
Role: semantic relationship projection
Total rows: 0
Origin: deterministic_rule_derived | Runtime inference: deterministic_semantic_projection
Relationship graph is derived from canonical data and schema rules.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| entity_link_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| project_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| relation_type_key | text | False | deterministic_rule_derived | Defaulted to rule-derived value for this table. |
| from_entity_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| to_entity_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| confidence | text | False | rule_based_heuristic | Heuristic/rule-based value generated by pipeline logic; no runtime LLM/ML model observed. |
| attributes | jsonb | False | mixed_source_and_system | JSON container that may hold source values plus processing metadata. |
| provenance | jsonb | False | operational_system | Traceability and lineage payload generated by the platform. |
| created_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
| updated_at | timestamp with time zone | False | operational_system | System key, tenancy, run, or timestamp field. |
audit.event_ledger
Primary key columns:
Role: audit and integrity ledger
Total rows: 547
Origin: operational_system | Runtime inference: deterministic_rule_derived
Audit tables are system-generated traceability and integrity records.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| event_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| event_time | timestamp with time zone | False | operational_system | Defaulted to platform/system state for this table. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| event_type | text | False | operational_system | Defaulted to platform/system state for this table. |
| actor_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| payload | jsonb | False | mixed_source_and_system | JSON payload requiring field-level review. |
| prev_hash | text | False | operational_system | Defaulted to platform/system state for this table. |
| curr_hash | text | False | operational_system | Defaulted to platform/system state for this table. |
| signature_ref | text | True | operational_system | Defaulted to platform/system state for this table. |
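The prev_hash and curr_hash columns suggest a hash-chained ledger, where each event's hash covers the previous event's hash. The manual does not state which fields are hashed or how, so the following is only a sketch of the general technique. The SHA-256 choice, the canonical JSON serialization, the all-zero genesis value, and both function names are assumptions.

```python
import hashlib
import json
from typing import Dict, List

# Assumed genesis value for the first event in a chain.
GENESIS_HASH = "0" * 64


def compute_curr_hash(prev_hash: str, event_type: str, payload: Dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the event."""
    body = json.dumps(
        {"event_type": event_type, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def verify_chain(events: List[Dict]) -> bool:
    """Return True only if every event links to its predecessor and rehashes cleanly."""
    expected_prev = GENESIS_HASH
    for event in events:
        if event["prev_hash"] != expected_prev:
            return False
        recomputed = compute_curr_hash(
            event["prev_hash"], event["event_type"], event["payload"]
        )
        if event["curr_hash"] != recomputed:
            return False
        expected_prev = event["curr_hash"]
    return True
```

With a chain like this, editing any stored payload changes its recomputed hash, so verification fails at the altered event and at nothing earlier, which localizes the tampering.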
audit.evidence_packs
Primary key columns: evidence_pack_id
Role: audit and integrity ledger
Total rows: 34
Origin: operational_system | Runtime inference: deterministic_rule_derived
Audit tables are system-generated traceability and integrity records.
| Column | Type | Nullable | Provenance | Notes |
|---|---|---|---|---|
| evidence_pack_id | uuid | False | operational_system | System key, tenancy, run, or timestamp field. |
| tenant_id | uuid | True | operational_system | System key, tenancy, run, or timestamp field. |
| dataset_id | text | False | operational_system | System key, tenancy, run, or timestamp field. |
| object_key | text | False | operational_system | Defaulted to platform/system state for this table. |
| generated_at | timestamp with time zone | False | operational_system | Defaulted to platform/system state for this table. |
| generator_subject | text | False | operational_system | Defaulted to platform/system state for this table. |
| signature_ref | text | True | operational_system | Defaulted to platform/system state for this table. |
| status | text | False | operational_system | System key, tenancy, run, or timestamp field. |
Full Table Appendix
Machine-readable exports accompanying this manual:
- docs/operations/DATA_PLATFORM_STORAGE_AND_PROCESSING_AUDIT_20260331.json
- docs/operations/DATA_PLATFORM_STORAGE_AND_PROCESSING_AUDIT_20260331.csv
| Table | Primary Key Columns | Role | Total Rows | Primary Origin | Runtime Inference |
|---|---|---|---|---|---|
| audit.event_ledger | | audit and integrity ledger | 547 | operational_system | deterministic_rule_derived |
| audit.event_ledger_202602 | | audit and integrity ledger | 179 | operational_system | deterministic_rule_derived |
| audit.event_ledger_202603 | | audit and integrity ledger | 368 | operational_system | deterministic_rule_derived |
| audit.evidence_packs | evidence_pack_id | audit and integrity ledger | 34 | operational_system | deterministic_rule_derived |
| audit.integrity_checks | check_id | audit and integrity ledger | 0 | operational_system | deterministic_rule_derived |
| core.password_reset_tokens | token_id | platform identity and policy | 6 | operational_system | none |
| core.policy_sets | policy_id | platform identity and policy | 4 | operational_system | none |
| core.reference_crs_profiles | profile_id | platform identity and policy | 2 | operational_system | none |
| core.roles | role_id | platform identity and policy | 0 | operational_system | none |
| core.tenant_schema_registry | tenant_id | platform identity and policy | 3 | operational_system | none |
| core.tenants | tenant_id | platform identity and policy | 3 | operational_system | none |
| core.user_page_access | user_id, page_key | platform identity and policy | 15 | operational_system | none |
| core.user_project_access | user_id, project_id | platform identity and policy | 0 | operational_system | none |
| core.user_roles | user_id, role_id, tenant_id | platform identity and policy | 0 | operational_system | none |
| core.users | user_id | platform identity and policy | 3 | operational_system | none |
| ops.canonical_completion_intervals | completion_interval_id | perforations and completion intervals | 12 | source_human_dataset | none |
| ops.canonical_dynamic_reservoir_context | dynamic_context_id | reservoir-model support signals | 40 | deterministic_rule_derived | rule_based_heuristic |
| ops.canonical_formation_tops | formation_top_id | well tops and picks | 409 | source_human_dataset | none |
| ops.canonical_log_artifacts | log_artifact_id | full log-estate artifact inventory | 5344 | source_human_dataset | deterministic_transform |
| ops.canonical_package_artifacts | package_artifact_id | generic package artifact inventory | 5483 | source_human_dataset | deterministic_transform |
| ops.canonical_production_records | production_record_id | daily/monthly production history | 16160 | source_human_dataset | none |
| ops.canonical_report_documents | report_document_id | report document registry | 2 | source_human_dataset | deterministic_transform |
| ops.canonical_reservoir_bodies | body_id | derived reservoir bodies | 78 | deterministic_rule_derived | rule_based_heuristic |
| ops.canonical_reservoir_model_artifacts | model_artifact_id | RMS/Eclipse artifact inventory | 5454 | source_human_dataset | deterministic_transform |
| ops.canonical_seismic_artifacts | artifact_id | seismic artifact inventory | 98 | source_human_dataset | deterministic_transform |
| ops.canonical_seismic_surveys | survey_id | seismic survey registry | 14 | source_human_dataset | deterministic_transform |
| ops.canonical_structural_surfaces | structural_surface_id | structural metadata envelope | 92 | source_human_dataset | deterministic_transform |
| ops.canonical_technical_daily_reports | technical_report_id | technical drilling daily reports | 1555 | source_human_dataset | deterministic_transform |
| ops.canonical_well_aliases | alias_id | well identity resolver aliases | 496 | deterministic_rule_derived | rule_based_heuristic |
| ops.canonical_well_locations | location_id | canonical spatial anchors | 34 | source_human_dataset | deterministic_transform |
| ops.canonical_well_reservoir_penetrations | penetration_id | well-to-reservoir intersections | 78 | deterministic_rule_derived | rule_based_heuristic |
| ops.canonical_well_source_links | source_link_id | well/source lineage links | 1048 | deterministic_rule_derived | rule_based_heuristic |
| ops.canonical_wells | canonical_well_id | canonical well identity master | 40 | source_human_dataset | none |
| ops.canonical_witsml_bha_runs | witsml_bha_run_id | canonical WITSML BHA runs | 15 | source_human_dataset | deterministic_transform |
| ops.canonical_witsml_messages | witsml_message_id | canonical WITSML messages | 582 | source_human_dataset | deterministic_transform |
| ops.canonical_witsml_support | witsml_support_id | WITSML support rollup | 8 | deterministic_rule_derived | rule_based_heuristic |
| ops.canonical_witsml_trajectories | witsml_trajectory_id | canonical WITSML trajectories | 14 | source_human_dataset | deterministic_transform |
| ops.canonical_witsml_wellbores | witsml_wellbore_id | canonical WITSML wellbores | 7 | source_human_dataset | deterministic_transform |
| ops.dead_letter | dead_letter_id | application data | 0 | operational_system | deterministic_rule_derived |
| ops.field_package_ingest_coverage | coverage_id | package-level coverage audit | 18 | operational_system | deterministic_rule_derived |
| ops.field_package_normalizations | run_id | application data | 35 | operational_system | deterministic_rule_derived |
| ops.job_attempts | attempt_id | application data | 24 | operational_system | deterministic_rule_derived |
| ops.job_results | job_id | application data | 264 | operational_system | deterministic_rule_derived |
| ops.jobs | | application data | 335 | operational_system | deterministic_rule_derived |
| ops.jobs_202602 | | application data | 103 | operational_system | deterministic_rule_derived |
| ops.jobs_202603 | | application data | 232 | operational_system | deterministic_rule_derived |
| ops.pointcloud_grids | grid_row_id | application data | 5 | operational_system | deterministic_rule_derived |
| ops.pointcloud_tiles | tile_id | application data | 29 | operational_system | deterministic_rule_derived |
| ops.remaining_barrel_estimates | estimate_id | remaining-barrel estimates | 11 | deterministic_rule_derived | rule_based_heuristic |
| ops.remaining_barrel_estimation_runs | run_id | barrel-estimation run ledger | 1 | operational_system | rule_based_heuristic |
| ops.reopen_score_profiles | scoring_profile_id | application data | 1 | operational_system | deterministic_rule_derived |
| ops.scraper_schedules | schedule_id | application data | 16 | operational_system | deterministic_rule_derived |
| ops.upload_sessions | upload_id | application data | 480 | operational_system | deterministic_rule_derived |
| ops.well_bypassed_candidates | candidate_id | bypassed-pay candidates | 265 | deterministic_rule_derived | rule_based_heuristic |
| ops.well_data_gaps | gap_id | application data | 72 | operational_system | deterministic_rule_derived |
| ops.well_interpretations | interpretation_id | petrophysical interpretation runs | 40 | deterministic_rule_derived | rule_based_heuristic |
| ops.well_logs | well_log_id | normalized well-log run registry | 56 | source_human_dataset | deterministic_transform |
| ops.well_pay_events | event_id | pay interval events | 269 | deterministic_rule_derived | rule_based_heuristic |
| ops.well_qc_cards | qc_card_id | log QC summaries | 56 | deterministic_rule_derived | rule_based_heuristic |
| ops.well_reopening_targets | reopening_target_id | reopening candidate ranking | 40 | deterministic_rule_derived | rule_based_heuristic |
| raw.extracted_fields | extracted_field_id | raw source intake | 480 | source_human_dataset | none |
| raw.source_bundles | bundle_id | raw source intake | 4 | source_human_dataset | none |
| raw.source_objects | source_object_id | raw source intake | 32 | source_human_dataset | none |
| semantic.entities | entity_id | semantic entity projection | 40 | deterministic_rule_derived | deterministic_semantic_projection |
| semantic.entity_links | entity_link_id | semantic relationship projection | 0 | deterministic_rule_derived | deterministic_semantic_projection |
| semantic.entity_source_links | entity_source_link_id | semantic/source lineage | 0 | deterministic_rule_derived | deterministic_semantic_projection |
| semantic.entity_types | entity_type_id | semantic projection | 19 | deterministic_rule_derived | deterministic_semantic_projection |
| semantic.query_profiles | query_profile_id | semantic projection | 2 | deterministic_rule_derived | deterministic_semantic_projection |
| semantic.relation_types | relation_type_id | semantic projection | 18 | deterministic_rule_derived | deterministic_semantic_projection |
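The appendix lists audit.event_ledger and ops.jobs alongside tables carrying a `_202602` or `_202603` suffix. Assuming those suffixed tables are monthly partitions of the unsuffixed parent (the manual does not state this explicitly), the parent row count should equal the sum of its partitions. The sketch below checks that arithmetic using the counts from the table above; the dictionary layout and the helper name `partition_mismatches` are illustrative only.

```python
# Row counts copied from the appendix table.
# Assumption: the suffixed tables are monthly partitions of the parent table.
PARTITIONED = {
    "audit.event_ledger": {
        "total": 547,
        "partitions": {
            "audit.event_ledger_202602": 179,
            "audit.event_ledger_202603": 368,
        },
    },
    "ops.jobs": {
        "total": 335,
        "partitions": {
            "ops.jobs_202602": 103,
            "ops.jobs_202603": 232,
        },
    },
}


def partition_mismatches(tables):
    """Return {parent: total minus partition sum} for parents that do not add up."""
    mismatches = {}
    for parent, info in tables.items():
        diff = info["total"] - sum(info["partitions"].values())
        if diff != 0:
            mismatches[parent] = diff
    return mismatches


print(partition_mismatches(PARTITIONED))  # {}
```

Both parents reconcile (179 + 368 = 547 and 103 + 232 = 335), so summing row counts across the whole appendix would count those rows twice unless either the parents or the suffixed tables are excluded.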