Human Input Log
This report treats the uploaded Earthbond workflow draw.io source as human-authored product input.
Recorded author: Johann F.R Wilhelm
Authoritative source file: /Users/robertwilhelm/Documents/New project/human_workflow.drawio
Recorded as:
- documentation log: docs/operations/HUMAN_INPUT_WORKFLOW_LOG_20260331.md
- immutable audit event: audit.event_ledger
Audit event type: human_input.workflow_reference_logged
Reference Workflow Extracted From Human Input
- Source Inputs: Open data, partners, client surface and subsurface data
- Validation + Ingest Job: Validate intake contract, then create ingest job in PostgreSQL
- Raw Write + Audit: Write bytes to MinIO, register object pointer, then create immediate audit trail
- Deterministic Extract: Extract headers and metadata, store metadata in PostgreSQL, preserve raw vault and metadata catalog separately
- Catalog / Quarantine Outcome: Searchable metadata in PostgreSQL, raw bytes in MinIO, quarantine on invalid or extract-fail while retaining raw object and issue flags

| Reference Step | Intent |
|---|---|
| Source inputs | Accept training/open, partner, and client data. |
| Ingest job creation | Create workflow control row in PostgreSQL as flow_a.ingest_job. |
| Intake contract validation | Validate source identity, format, schema, authority, and confidence before the happy path continues. |
| Raw object write | Store immutable bytes in MinIO. |
| Object pointer registration | Store object key, hash, type, size, and state in PostgreSQL as flow_a.object. |
| Immediate audit trail | Hash, timestamps, and source URI become traceable. |
| Deterministic metadata extraction | Read headers/metadata for SEG-Y, LAS, DLIS, WITSML, LiDAR, raster, CSV. |
| Extract metadata store | Write extraction outputs to PostgreSQL as flow_a.object_extract. |
| Raw dataset vault | Keep immutable source files in MinIO even when downstream issues occur. |
| Metadata object catalog | Searchable metadata stays in PostgreSQL. |
| Quarantine/review | Catch missing CRS, unreadable header, corrupt file, or schema failure. |
| Quarantine outputs | Retain the raw object and attach issue state / flags to the catalog side. |
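
The reference steps above can be sketched end to end in a few lines. This is a minimal illustration, not the build's implementation: the in-memory dictionaries RAW_VAULT and CATALOG are hypothetical stand-ins for MinIO and the PostgreSQL catalog, the "key=value;..." header format is a toy invented for the example, and the contract field names are assumptions. It also assumes the raw object is written before validation so that it is retained on every quarantine path.

```python
import hashlib

# Hypothetical in-memory stand-ins: RAW_VAULT plays the role of MinIO,
# CATALOG plays the role of the PostgreSQL object catalog.
RAW_VAULT: dict[str, bytes] = {}
CATALOG: dict[str, dict] = {}

# Assumed contract fields, loosely following the intake-contract row above.
REQUIRED_CONTRACT_FIELDS = ("source_id", "format", "authority", "confidence")


def validate_contract(contract: dict) -> list[str]:
    """Return one issue flag per missing intake-contract field."""
    return [f"missing_{name}" for name in REQUIRED_CONTRACT_FIELDS
            if contract.get(name) is None]


def extract_metadata(payload: bytes) -> dict:
    """Toy deterministic extractor: parse the first line as 'key=value;...'."""
    header = payload.split(b"\n", 1)[0].decode("ascii")
    pairs = [item.split("=", 1) for item in header.split(";") if item]
    if not pairs or any(len(pair) != 2 for pair in pairs):
        raise ValueError("unreadable header")
    return dict(pairs)


def ingest(contract: dict, payload: bytes) -> dict:
    """Write raw bytes, register a pointer row, then catalog or quarantine."""
    object_key = hashlib.sha256(payload).hexdigest()
    RAW_VAULT[object_key] = payload  # raw bytes are retained on every path
    row = {"object_key": object_key, "size": len(payload),
           "state": "cataloged", "flags": [], "metadata": {}}
    row["flags"] = validate_contract(contract)
    if not row["flags"]:
        try:
            row["metadata"] = extract_metadata(payload)
        except ValueError:
            row["flags"].append("extract_fail")
    if row["flags"]:
        row["state"] = "quarantined"
    CATALOG[object_key] = row
    return row


full_contract = {"source_id": "demo", "format": "csv",
                 "authority": "client", "confidence": 0.9}
ok = ingest(full_contract, b"crs=EPSG:4326;rows=2\n1,2\n")
bad = ingest({"source_id": "demo", "format": "csv"}, b"no header here")
```

In this sketch both quarantine triggers (an invalid contract and an extraction failure) end in the same state, and the catalog row carries the issue flags while the bytes stay in the vault.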
Current Build Alignment To The Reference Workflow
| Reference Step | Current Build Mapping | Status | Notes |
|---|---|---|---|
| Create ingest job in PostgreSQL | ops.jobs, ops.upload_sessions | Matched | Implemented through upload/job workflow in the early stack. |
| Validate intake contract | validation gate, field-package validation, project/tenant/page access, source-mode validation | Partial | Validation exists, but there is not yet one single strict universal contract row matching the exact flow_a reference shape across every format. |
| Write raw payload to MinIO | MinIO raw vault, staged Volve prefix, object storage pipeline | Matched | Raw bytes are staged and retained in MinIO. |
| Register object pointer in PostgreSQL | raw.source_bundles, raw.source_objects | Matched | Stores object key, file metadata, parse state, and source linkage. |
| Immediate audit trail | audit.event_ledger, audit.evidence_packs | Matched | Hash-chained event logging exists and is immutable. |
| Raw dataset vault | MinIO raw vault, staged field-package source retention | Matched | The source file is preserved even when validation or extraction produces issues. |
| Deterministic header / metadata extractor | raw.extracted_fields, ops.job_results, canonical artifact tables, format-specific parsers | Matched / Expanded | Implemented, but now spread across raw extraction plus deeper canonical format-specific ingest. The current build goes beyond the source diagram here. |
| Store extracted headers / metadata in PostgreSQL | raw.extracted_fields, raw.source_objects.metadata_profile, ops.job_results, canonical artifact metadata | Matched / Expanded | The build uses several tables instead of a single flow_a.object_extract table. |
| Metadata capture / object catalog | raw.source_objects, ops.canonical_*_artifacts, ops.field_package_ingest_coverage | Matched / Expanded | The object catalog became both a raw registry and a canonical coverage ledger. |
| Quarantine / review | validation gate, blocked/partial states, risk/quarantine language in pointcloud/welllog docs and flows | Partial | The draw.io source shows two explicit quarantine triggers: invalid / missing from intake validation and extract fail from metadata extraction. Current build behavior exists conceptually and in some runtime paths, but it is not yet one fully unified cross-domain quarantine state machine. |
| Retain raw object / issue state / flags | raw-object retention in MinIO plus state metadata on raw.source_objects and coverage/audit rows | Partial | The build preserves source objects and records issues, but does not yet express this as one explicit post-quarantine contract exactly like the source draw.io lane. |
| Outcome: raw in MinIO, searchable metadata in PostgreSQL | MinIO + PostgreSQL split architecture | Matched | This is one of the strongest matches between the reference workflow and the current build. |
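
The hash-chained audit trail mentioned in the table can be illustrated with a small sketch. It does not reproduce the actual audit.event_ledger schema: the entry fields, the all-zero genesis hash, and the function names are assumptions chosen for the example. The idea is only that each entry commits to its predecessor's hash, so any later modification of an earlier entry becomes detectable.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for an empty ledger


def event_hash(prev_hash: str, event_type: str, payload: dict) -> str:
    """Hash the previous hash together with the event content."""
    body = json.dumps(
        {"prev": prev_hash, "type": event_type, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(ledger: list[dict], event_type: str, payload: dict) -> dict:
    """Append an entry that is chained to the last entry in the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    entry = {
        "prev_hash": prev_hash,
        "type": event_type,
        "payload": payload,
        "hash": event_hash(prev_hash, event_type, payload),
    }
    ledger.append(entry)
    return entry


def verify(ledger: list[dict]) -> bool:
    """Recompute every hash in order; return False on the first mismatch."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != event_hash(prev_hash, entry["type"], entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


ledger: list[dict] = []
append_event(ledger, "human_input.workflow_reference_logged",
             {"source": "human_workflow.drawio"})
append_event(ledger, "object.registered", {"object_key": "example-key"})
```

A database-backed ledger would additionally need append-only enforcement at the storage level; the chain by itself only makes tampering detectable, it does not prevent it.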
Workflow Added Beyond The Reference
The uploaded workflow stops at raw storage, object cataloging, metadata extraction, and quarantine. The current build goes materially beyond that.
| Added Workflow | Why It Was Added |
|---|---|
| Canonical normalization | The current build moves from searchable metadata into typed operational truth tables. |
| Coverage audit | The build explicitly tracks whether packages are only present or canonically represented. |
| Alias and source-link resolution | Needed to join inconsistent well naming across packages. |
| Semantic projection | Added to support later entity/AI usage without replacing canonical truth. |
| Reservoir bodies and penetrations | Added to answer where wells access reservoir context. |
| Reopening scoring | Added as a portfolio decision-support layer, not present in the reference workflow. |
| Remaining-barrel estimation | Added as low/mid/high screening-grade output, not present in the reference workflow. |
| Documentation and export manuals | Added so the build can be defended to colleagues and reviewed externally. |
What Is Missing Or Still Partial Relative To The Reference
| Gap | Current State | Why It Matters |
|---|---|---|
| Single unified intake-contract object | Partial | The build validates source mode, project, page access, package readiness, and format-specific logic, but there is no single universal table/contract exactly matching flow_a.ingest_job + intake contract semantics from the draw.io source. |
| Single unified object-extract table | Partial | The current build spreads extraction across raw.extracted_fields, ops.job_results, metadata profiles, and canonical artifact tables rather than one flow_a.object_extract table. |
| Uniform quarantine state machine across all domains | Partial | Quarantine/blocking concepts exist, but the behavior is not yet one normalized pipeline for every format and domain with the exact branch semantics shown in the draw.io source. |
| Training-data branch from open data and partners | Missing as a distinct operational lane | The uploaded workflow explicitly shows a training-data input branch; the current build does not yet operate a separate ML-training data pipeline. |
| Uniform authority/confidence metadata across all raw objects | Partial | The fields exist in several places, but they are not normalized as one consistent cross-domain intake standard yet. |
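
One way to picture the missing unified quarantine state machine is an explicit transition table shared by every format and domain. The sketch below is hypothetical: the state names, the ALLOWED table, and the transition helper are assumptions and do not exist in the current build. It encodes only the two quarantine triggers described for the draw.io source (invalid or missing at intake validation, and extract fail at metadata extraction) and treats quarantine as terminal because the source does not describe a release path.

```python
from enum import Enum
from typing import Optional


class ObjectState(Enum):
    RECEIVED = "received"
    VALIDATED = "validated"
    EXTRACTED = "extracted"
    CATALOGED = "cataloged"
    QUARANTINED = "quarantined"


# Assumed transition table. Quarantine is reachable only from intake
# validation (RECEIVED) and from metadata extraction (VALIDATED).
ALLOWED = {
    ObjectState.RECEIVED: {ObjectState.VALIDATED, ObjectState.QUARANTINED},
    ObjectState.VALIDATED: {ObjectState.EXTRACTED, ObjectState.QUARANTINED},
    ObjectState.EXTRACTED: {ObjectState.CATALOGED},
    ObjectState.CATALOGED: set(),
    ObjectState.QUARANTINED: set(),  # terminal in this sketch
}


def transition(current: ObjectState, target: ObjectState,
               flags: list[str], issue: Optional[str] = None) -> ObjectState:
    """Move to the target state, recording an issue flag on quarantine."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if target is ObjectState.QUARANTINED:
        if issue is None:
            raise ValueError("quarantine requires an issue flag")
        flags.append(issue)
    return target


flags: list[str] = []
state = transition(ObjectState.RECEIVED, ObjectState.VALIDATED, flags)
state = transition(state, ObjectState.QUARANTINED, flags, issue="extract_fail")
```

Keeping the transitions in one table would let every parser report failures the same way, which is the consistency the gap table asks for.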
When The Workflow Expanded Beyond The Reference
| Migration / Phase | What It Added | Relation To Reference Workflow |
|---|---|---|
| 0004 | Immutable audit ledger | Matches and strengthens the reference audit-trail step. |
| 0008 | Upload sessions, jobs, job results | Matches ingest-job creation and metadata-result handling. |
| 0016 | raw.source_bundles, raw.source_objects, raw.extracted_fields | Matches raw-vault and object-catalog intent very closely. |
| 0019 | Canonical wells, tops, completions, production, structure | This is where the build moves beyond metadata capture into canonical truth. |
| 0017 and later semantic work | Semantic entity projection | Added beyond the reference workflow. |
| 0018 | Reopening scoring | Added beyond the reference workflow. |
| 0022 | Remaining-barrel estimates | Added beyond the reference workflow. |
| 0023-0028 | Seismic support, reservoir bodies, WITSML canonicalization, coverage audit, full artifact ingest | Expanded the build from ingest metadata into full package coverage and decision-support context. |
Conclusion
The current build matches the uploaded workflow strongly at the raw-ingest, object-store, object-pointer, audit, and searchable-metadata levels.
The current build then goes further than the reference by adding canonical truth tables, semantic projection, reservoir interpretation, ranking, and barrel-estimation layers.
The main remaining gaps are not basic ingest. They are consistency gaps: one universal intake contract, one universal extract catalog shape, and one unified quarantine model across all supported domains with explicit retain-object and issue-flag behavior.