Reference Workflow Alignment Report

Source: docs/architecture/REFERENCE_WORKFLOW_ALIGNMENT_REPORT_MANUAL.html

Human Input Log

This report treats the uploaded Earthbond workflow draw.io source as human-authored product input.

Recorded author: Johann F.R Wilhelm

Authoritative source file: /Users/robertwilhelm/Documents/New project/human_workflow.drawio

Recorded as:

  • documentation log: docs/operations/HUMAN_INPUT_WORKFLOW_LOG_20260331.md
  • immutable audit event: audit.event_ledger

Audit event type: human_input.workflow_reference_logged

Reference Workflow Extracted From Human Input

Source Inputs

Open data, partners, client surface and subsurface data

Validation + Ingest Job

Validate intake contract, then create ingest job in PostgreSQL

Raw Write + Audit

Write bytes to MinIO, register object pointer, then create immediate audit trail

Deterministic Extract

Extract headers and metadata, store metadata in PostgreSQL, preserve raw vault and metadata catalog separately

Catalog / Quarantine Outcome

Searchable metadata in PostgreSQL, raw bytes in MinIO, quarantine on invalid or extract-fail while retaining raw object and issue flags

Reference Step | Intent
Source inputs | Accept training/open, partner, and client data.
Ingest job creation | Create workflow control row in PostgreSQL as flow_a.ingest_job.
Intake contract validation | Validate source identity, format, schema, authority, and confidence before the happy path continues.
Raw object write | Store immutable bytes in MinIO.
Object pointer registration | Store object key, hash, type, size, and state in PostgreSQL as flow_a.object.
Immediate audit trail | Hash, timestamps, and source URI become traceable.
Deterministic metadata extraction | Read headers/metadata for SEG-Y, LAS, DLIS, WITSML, LiDAR, raster, CSV.
Extract metadata store | Write extraction outputs to PostgreSQL as flow_a.object_extract.
Raw dataset vault | Keep immutable source files in MinIO even when downstream issues occur.
Metadata object catalog | Searchable metadata stays in PostgreSQL.
Quarantine/review | Catch missing CRS, unreadable header, corrupt file, or schema failure.
Quarantine outputs | Retain the raw object and attach issue state / flags to the catalog side.
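
The reference steps above can be sketched end to end as a small in-memory pipeline. This is a minimal illustration, not the build's implementation: the dictionaries stand in for MinIO and the flow_a.object / flow_a.object_extract tables, and the function names, flag names, and the toy "KEY=VALUE" header format are all assumptions made for the example. It also assumes raw bytes are written before the quarantine decision so that both quarantine branches can retain the raw object; the reference workflow does not spell out that ordering.

```python
import hashlib

# In-memory stand-ins (assumptions): a dict for the MinIO raw vault and
# dicts for the PostgreSQL tables named in the reference workflow.
raw_vault = {}      # object_key -> bytes
object_rows = {}    # stands in for flow_a.object
extract_rows = {}   # stands in for flow_a.object_extract

# Contract fields taken from the "Intake contract validation" step.
REQUIRED_CONTRACT_FIELDS = ("source", "format", "schema", "authority", "confidence")


def validate_contract(contract):
    """Return one issue flag per missing intake-contract field."""
    return [f"missing_{name}" for name in REQUIRED_CONTRACT_FIELDS
            if contract.get(name) in (None, "")]


def extract_metadata(payload, fmt):
    """Toy deterministic extractor: treats the first line as 'KEY=VALUE'."""
    header = payload.split(b"\n", 1)[0].decode("ascii")  # may raise UnicodeDecodeError
    if "=" not in header:
        raise ValueError("unreadable header")
    key, value = header.split("=", 1)
    return {"format": fmt, key.strip(): value.strip()}


def ingest(payload, contract):
    """Write raw bytes, register the pointer, then catalog or quarantine."""
    digest = hashlib.sha256(payload).hexdigest()
    object_key = f"raw/{digest}"
    raw_vault.setdefault(object_key, payload)  # never overwrite stored bytes
    row = {"object_key": object_key, "sha256": digest, "size": len(payload),
           "type": contract.get("format"), "state": "stored", "flags": []}
    object_rows[object_key] = row

    issues = validate_contract(contract)
    if issues:  # quarantine trigger 1: invalid / missing intake contract
        row.update(state="quarantined", flags=issues)
        return row
    try:
        extract_rows[object_key] = extract_metadata(payload, contract["format"])
    except ValueError:  # quarantine trigger 2: extract fail (covers decode errors)
        row.update(state="quarantined", flags=["extract_fail"])
        return row
    row["state"] = "cataloged"
    return row
```

In this sketch a quarantined object still has its bytes in raw_vault and a row in object_rows carrying the issue flags, which mirrors the "retain the raw object and attach issue state / flags" outcome.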

Current Build Alignment To The Reference Workflow

Reference Step | Current Build Mapping | Status | Notes
Create ingest job in PostgreSQL | ops.jobs, ops.upload_sessions | Matched | Implemented through upload/job workflow in the early stack.
Validate intake contract | validation gate, field-package validation, project/tenant/page access, source-mode validation | Partial | Validation exists, but there is not yet a single strict universal contract row matching the exact flow_a reference shape across every format.
Write raw payload to MinIO | MinIO raw vault, staged Volve prefix, object storage pipeline | Matched | Raw bytes are staged and retained in MinIO.
Register object pointer in PostgreSQL | raw.source_bundles, raw.source_objects | Matched | Stores object key, file metadata, parse state, and source linkage.
Immediate audit trail | audit.event_ledger, audit.evidence_packs | Matched | Hash-chained event logging exists and is immutable.
Raw dataset vault | MinIO raw vault, staged field-package source retention | Matched | The source file is preserved even when validation or extraction produces issues.
Deterministic header / metadata extractor | raw.extracted_fields, ops.job_results, canonical artifact tables, format-specific parsers | Matched / Expanded | Implemented, but now spread across raw extraction plus deeper canonical format-specific ingest. The current build goes beyond the source diagram here.
Store extracted headers / metadata in PostgreSQL | raw.extracted_fields, raw.source_objects.metadata_profile, ops.job_results, canonical artifact metadata | Matched / Expanded | The build uses several tables instead of a single flow_a.object_extract table.
Metadata capture / object catalog | raw.source_objects, ops.canonical_*_artifacts, ops.field_package_ingest_coverage | Matched / Expanded | The object catalog became both a raw registry and a canonical coverage ledger.
Quarantine / review | validation gate, blocked/partial states, risk/quarantine language in pointcloud/welllog docs and flows | Partial | The draw.io source shows two explicit quarantine triggers: invalid / missing from intake validation and extract fail from metadata extraction. Current build behavior exists conceptually and in some runtime paths, but it is not yet one fully unified cross-domain quarantine state machine.
Retain raw object / issue state / flags | raw-object retention in MinIO plus state metadata on raw.source_objects and coverage/audit rows | Partial | The build preserves source objects and records issues, but does not yet express this as one explicit post-quarantine contract exactly like the source draw.io lane.
Outcome: raw in MinIO, searchable metadata in PostgreSQL | MinIO + PostgreSQL split architecture | Matched | This is one of the strongest matches between the reference workflow and the current build.
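
The "hash-chained" property noted for the audit trail can be illustrated with a short sketch. The report does not show the actual audit.event_ledger schema, so the field names (event_type, payload, prev_hash, hash), the all-zero genesis hash, and the function names below are assumptions chosen for the example. Each entry's hash covers the previous entry's hash, so editing any earlier event invalidates every later link.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for an empty ledger


def event_hash(prev_hash, event):
    """Hash the previous link together with a canonical JSON form of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(ledger, event_type, payload):
    """Append an event whose hash chains to the last entry in the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    event = {"event_type": event_type, "payload": payload}
    entry = {**event, "prev_hash": prev_hash, "hash": event_hash(prev_hash, event)}
    ledger.append(entry)
    return entry


def verify(ledger):
    """Recompute every link; return False on the first mismatch."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        event = {"event_type": entry["event_type"], "payload": entry["payload"]}
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != event_hash(prev_hash, event):
            return False
        prev_hash = entry["hash"]
    return True
```

In a database-backed ledger the same check would typically be paired with append-only permissions, since the chain detects tampering but does not by itself prevent it.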

Workflow Added Beyond The Reference

The uploaded workflow stops at raw storage, object cataloging, metadata extraction, and quarantine. The current build goes materially beyond that.

Added Workflow | Why It Was Added
Canonical normalization | The current build moves from searchable metadata into typed operational truth tables.
Coverage audit | The build explicitly tracks whether packages are only present or canonically represented.
Alias and source-link resolution | Needed to join inconsistent well naming across packages.
Semantic projection | Added to support later entity/AI usage without replacing canonical truth.
Reservoir bodies and penetrations | Added to answer where wells access reservoir context.
Reopening scoring | Added as a portfolio decision-support layer, not present in the reference workflow.
Remaining-barrel estimation | Added as low/mid/high screening-grade output, not present in the reference workflow.
Documentation and export manuals | Added so the build can be defended to colleagues and reviewed externally.
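
The coverage-audit idea in the table above (distinguishing objects that are only present from those that are canonically represented) can be sketched as a set comparison. The function name, the status labels, and the use of plain object keys are assumptions for illustration; the report does not describe how ops.field_package_ingest_coverage is actually populated.

```python
def coverage_status(raw_objects, canonical_objects):
    """Classify each object key by whether it has a canonical representation.

    raw_objects: keys registered in the raw catalog.
    canonical_objects: keys that produced canonical artifacts.
    """
    status = {}
    for key in sorted(set(raw_objects) | set(canonical_objects)):
        if key in raw_objects and key in canonical_objects:
            status[key] = "canonical"              # present and canonically represented
        elif key in raw_objects:
            status[key] = "present_only"           # stored, but not yet canonical
        else:
            status[key] = "canonical_without_raw"  # would indicate a lineage gap
    return status
```

For example, coverage_status({"a", "b"}, {"b"}) marks "a" as present_only and "b" as canonical.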

What Is Missing Or Still Partial Relative To The Reference

Gap | Current State | Why It Matters
Single unified intake-contract object | Partial | The build validates source mode, project, page access, package readiness, and format-specific logic, but there is no single universal table/contract exactly matching flow_a.ingest_job + intake contract semantics from the draw.io source.
Single unified object-extract table | Partial | The current build spreads extraction across raw.extracted_fields, ops.job_results, metadata profiles, and canonical artifact tables rather than one flow_a.object_extract table.
Uniform quarantine state machine across all domains | Partial | Quarantine/blocking concepts exist, but the behavior is not yet one normalized pipeline for every format and domain with the exact branch semantics shown in the draw.io source.
Training-data branch from open data and partners | Missing as a distinct operational lane | The uploaded workflow explicitly shows a training-data input branch; the current build does not yet operate a separate ML-training data pipeline.
Uniform authority/confidence metadata across all raw objects | Partial | The fields exist in several places, but they are not normalized as one consistent cross-domain intake standard yet.
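
As one possible shape for the uniform quarantine state machine listed above, the sketch below encodes the allowed transitions as a table with exactly two quarantine edges, one per trigger shown in the draw.io source (invalid / missing at intake validation, extract fail at metadata extraction). The state names, flag strings, and the choice to store raw bytes before validating are assumptions for illustration, not the build's current behavior.

```python
from enum import Enum


class State(Enum):
    RECEIVED = "received"
    STORED = "stored"          # raw bytes written and pointer registered
    VALIDATED = "validated"    # intake contract accepted
    EXTRACTED = "extracted"    # headers / metadata extracted
    CATALOGED = "cataloged"
    QUARANTINED = "quarantined"


# Allowed transitions; terminal states map to an empty set.
TRANSITIONS = {
    State.RECEIVED: {State.STORED},
    State.STORED: {State.VALIDATED, State.QUARANTINED},
    State.VALIDATED: {State.EXTRACTED, State.QUARANTINED},
    State.EXTRACTED: {State.CATALOGED},
    State.CATALOGED: set(),
    State.QUARANTINED: set(),
}

# The issue flag recorded depends on where the object was when it failed.
QUARANTINE_FLAG = {
    State.STORED: "intake_invalid",
    State.VALIDATED: "extract_fail",
}


def advance(current, target, flags):
    """Move to target if the transition is allowed; record a flag on quarantine."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if target is State.QUARANTINED:
        flags.append(QUARANTINE_FLAG[current])
    return target
```

Because QUARANTINED is only reachable from STORED or later, every quarantined object in this sketch already has retained raw bytes, and the flag list gives the catalog side a format-independent reason code.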

When The Workflow Expanded Beyond The Reference

Migration / Phase | What It Added | Relation To Reference Workflow
0004 | Immutable audit ledger | Matches and strengthens the reference audit-trail step.
0008 | Upload sessions, jobs, job results | Matches ingest-job creation and metadata-result handling.
0016 | raw.source_bundles, raw.source_objects, raw.extracted_fields | Matches raw-vault and object-catalog intent very closely.
0019 | Canonical wells, tops, completions, production, structure | This is where the build moves beyond metadata capture into canonical truth.
0017 and later semantic work | Semantic entity projection | Added beyond the reference workflow.
0018 | Reopening scoring | Added beyond the reference workflow.
0022 | Remaining-barrel estimates | Added beyond the reference workflow.
0023-0028 | Seismic support, reservoir bodies, WITSML canonicalization, coverage audit, full artifact ingest | Expanded the build from ingest metadata into full package coverage and decision-support context.

Conclusion

The current build matches the uploaded workflow strongly at the raw-ingest, object-store, object-pointer, audit, and searchable-metadata levels.

The current build then goes further than the reference by adding canonical truth tables, semantic projection, reservoir interpretation, ranking, and barrel-estimation layers.

The main remaining gaps are not basic ingest. They are consistency gaps: one universal intake contract, one universal extract catalog shape, and one unified quarantine model across all supported domains with explicit retain-object and issue-flag behavior.