Well Log Payload Architecture And Module Design

Source: docs/architecture/WELLLOG_PAYLOAD_ARCHITECTURE_AND_MODULE_DESIGN.md

Purpose

This document defines the data-first Earthbond architecture for:

  1. abstracting data from well logs and associated drilling records,
  2. identifying what is present, what is missing, and what can be normalized,
  3. separating deterministic outputs from assumptions,
  4. ranking review payloads for scientists and engineers.

The goal is not to make every well look usable.

The goal is to make each well land in the correct confidence state with a full evidence trail.

High-Level Design Architecture

The platform should be built around canonical truth generation, not around file upload or 3D display.

```mermaid
flowchart TD
    A["Source Intake"] --> B["Raw Vault + Source Bundle Registry"]
    B --> C["Metadata Extraction"]
    C --> D["Contract Validation"]
    D --> E["Gap Classification"]
    D --> F["Canonical Well Identity"]
    D --> G["Spatial Truth Engine"]
    D --> H["Log Normalization Engine"]
    G --> I["Trajectory / TVDSS / WGS84 / ECEF"]
    H --> J["Petrophysical Interpretation"]
    I --> K["Canonical Event Builder"]
    J --> K
    K --> L["Bypassed-Pay Candidate Builder"]
    E --> L
    L --> M["Scientist Review Queue"]
    K --> N["Evidence Pack"]
    N --> O["Audit Ledger"]
    I --> P["2D / 3D Publication"]
    L --> P
```

Why We Need The Data

The system is trying to answer six different questions, and each requires different data quality:

  1. Is this the correct well?
  2. Where is it really located in space?
  3. What depth reference is the data using?
  4. Do the logs show a technically interesting interval?
  5. Was that interval already completed, tested, or depleted?
  6. Is the interval strong enough to send to a scientist or engineer for deeper review?

Without the underlying data, the platform cannot do any of the following safely:

  1. compare wells across a field,
  2. align surface and subsurface layers,
  3. rank bypassed-pay opportunities,
  4. evaluate uncertainty,
  5. produce defensible outputs for technical or commercial decisions.

What Data Is Important

1. Well identity data

This prevents mismatching logs, completions, and trajectories.

Important fields:

  1. API number
  2. UWI
  3. permit number
  4. well name
  5. operator
  6. field / basin / lease
  7. county / state
  8. spud / completion / abandonment date

2. Surface location and spatial reference data

This determines whether the well can be correctly placed in 2D and 3D.

Important fields:

  1. latitude / longitude
  2. easting / northing
  3. source CRS
  4. resolved EPSG
  5. local grid name if used
  6. coordinate epoch if applicable
  7. vertical datum
  8. KB / GL / DF / RT elevation
  9. WGS84 geodetic output
  10. ECEF coordinates
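
The WGS84-to-ECEF output named above follows a standard closed-form geodetic conversion. A minimal sketch (the function and constant names are illustrative, not part of any specific library):

```python
import math

# WGS84 ellipsoid constants
WGS84_A = 6378137.0                    # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, h_m):
    """Convert WGS84 geodetic coordinates to ECEF X/Y/Z in metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    sin_lat = math.sin(lat)
    # prime vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * sin_lat * sin_lat)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h_m) * sin_lat
    return x, y, z
```

A sanity check at the equator returns (semi-major axis, 0, 0), which is one cheap plausibility test the Spatial Truth Engine could apply.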

3. Depth reference and survey data

This determines whether interval depths can be compared across wells.

Important fields:

  1. measured depth (MD)
  2. true vertical depth (TVD)
  3. true vertical depth subsea (TVDSS)
  4. deviation survey stations
  5. inclination
  6. azimuth
  7. survey method
  8. tie points
  9. reference point used for subsea conversion
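
The MD/inclination/azimuth fields above combine into TVD through the minimum-curvature method this document allows later. A hedged sketch of the per-station TVD increment (function name illustrative):

```python
import math

def min_curvature_tvd_step(md1, md2, inc1_deg, inc2_deg, azi1_deg, azi2_deg):
    """TVD increment between two survey stations via minimum curvature."""
    i1, i2 = math.radians(inc1_deg), math.radians(inc2_deg)
    a1, a2 = math.radians(azi1_deg), math.radians(azi2_deg)
    # dogleg angle between the two station direction vectors
    cos_dl = (math.cos(i2 - i1)
              - math.sin(i1) * math.sin(i2) * (1.0 - math.cos(a2 - a1)))
    dl = math.acos(max(-1.0, min(1.0, cos_dl)))
    # ratio factor corrects the straight-line average along the arc
    rf = 1.0 if dl < 1e-9 else (2.0 / dl) * math.tan(dl / 2.0)
    return ((md2 - md1) / 2.0) * (math.cos(i1) + math.cos(i2)) * rf
```

TVDSS then follows by subtracting accumulated TVD from the stated reference elevation (KB or RT), which is why that reference point must be recorded explicitly.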

4. Log curve data

This is the primary petrophysical input.

Important fields:

  1. Gamma ray (GR)
  2. Bulk density (RHOB)
  3. Neutron porosity (NPHI)
  4. Resistivity (RT, ILD, LLD, and similar families)
  5. Caliper (CAL)
  6. Sonic (DT)
  7. Spontaneous potential (SP)
  8. Photoelectric factor (PEF)
  9. environmental corrections
  10. sampling interval
  11. curve units
  12. null handling
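
Null handling and unit normalization can be sketched as a single pass over the samples. The sentinel values and the unit table below are illustrative assumptions, not a complete registry:

```python
NULL_SENTINELS = {-999.25, -999.0, -9999.0}  # common LAS-style null values

# illustrative conversions toward an assumed canonical unit set
UNIT_FACTORS = {
    ("FT", "M"): 0.3048,  # depth in feet to metres
}

def normalize_samples(values, from_unit, to_unit):
    """Replace null sentinels with None and rescale the remaining samples."""
    factor = 1.0 if from_unit == to_unit else UNIT_FACTORS[(from_unit, to_unit)]
    return [None if v in NULL_SENTINELS else v * factor for v in values]
```

An unknown unit pair raises a KeyError here rather than guessing, which matches the document's rule that ambiguity becomes a structured gap, never a silent assumption.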

5. Completion and operational data

This determines whether a good interval is actually actionable.

Important fields:

  1. perforated intervals
  2. tested intervals
  3. workovers
  4. stimulation history
  5. production history
  6. shut-in status
  7. abandonment status
  8. casing and cement notes
  9. pressure or test data

6. Geologic and drilling context

This helps separate a local log anomaly from a real reservoir interval.

Important fields:

  1. formation tops
  2. markers
  3. cuttings or lithology descriptions
  4. mud logs
  5. offset wells
  6. structural maps
  7. seismic and horizons (later phase)
  8. DEM / LiDAR / survey control points for surface truth

What The Petrophysicist Is After

The petrophysicist is not after the raw LAS file.

They are after a technically defensible answer to these questions:

  1. Is this interval reservoir or non-reservoir?
  2. How shaly is it?
  3. How much effective porosity is present?
  4. Is there evidence of hydrocarbons rather than water?
  5. How thick is the net interval?
  6. Was it already completed or properly tested?
  7. Is the data quality high enough to trust the conclusion?

For a bypassed-pay workflow, the scientist usually wants:

  1. top/base of interval,
  2. gross thickness,
  3. net pay,
  4. Vsh,
  5. PHIE,
  6. Rt,
  7. optional Sw,
  8. QC state,
  9. completion overlap result,
  10. recommendation,
  11. confidence and uncertainty.
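
The payload fields above can be captured as one explicit record rather than scattered notes. A hypothetical shape (all field and state names are assumptions for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PayIntervalReview:
    """Illustrative bypassed-pay review payload; field names are assumed."""
    top_md: float            # top of interval, measured depth
    base_md: float           # base of interval, measured depth
    net_pay: float           # net pay thickness
    vsh: float               # shale volume fraction
    phie: float              # effective porosity
    rt: float                # deep resistivity, ohm.m
    sw: Optional[float]      # water saturation, only where justified
    qc_state: str            # e.g. "pass" / "partial" / "fail"
    completion_overlap: str  # result of completion reconciliation
    recommendation: str
    confidence: str

    @property
    def gross_thickness(self) -> float:
        return self.base_md - self.top_md
```

Keeping gross thickness derived rather than stored avoids one class of contradiction between top/base and thickness fields.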

What The Scientist Looks For To Find Payload

The platform should not declare "payload found" from one curve.

It should combine geology, petrophysics, completion history, and spatial context.

The scientist is typically looking for:

  1. intervals with low-to-moderate shale,
  2. adequate porosity,
  3. supportive resistivity,
  4. enough net thickness,
  5. no convincing evidence the interval was already properly perforated or tested,
  6. acceptable log quality,
  7. acceptable spatial and depth-reference confidence.

The review output is therefore a ranked list of:

  1. wells,
  2. intervals,
  3. reasons they are interesting,
  4. reasons confidence is high or low,
  5. what additional data is still needed.

What We Normally Expect From Well Log Data

Modern digital wells

Usually available:

  1. LAS or DLIS file,
  2. run metadata,
  3. curve inventory,
  4. clearer header data,
  5. better deviation surveys,
  6. cleaner unit conventions,
  7. more complete completion context.

Older or legacy wells

Often missing or ambiguous:

  1. CRS,
  2. vertical datum,
  3. deviation survey,
  4. machine-readable completions,
  5. reliable identifiers,
  6. consistent mnemonics,
  7. exact unit definitions,
  8. clean digital source.

Paper-only wells

Usually require:

  1. OCR,
  2. header extraction,
  3. table extraction,
  4. image segmentation,
  5. manual review,
  6. confidence tagging before promotion.

What Is Missing Most Often

The most common historical gaps are:

  1. missing well identifier or conflicting aliases,
  2. missing source CRS,
  3. missing vertical datum,
  4. missing KB/RT elevation,
  5. missing deviation survey,
  6. missing completion/perforation intervals,
  7. missing caliper,
  8. unclear curve units,
  9. scanned rather than digital curves,
  10. multiple contradictory versions of the same record.

The system should explicitly capture these as structured gaps, not narrative notes.
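
A structured gap might look like the following sketch, loosely shaped after the ops.well_data_gaps table described later; the severity labels and field names are assumptions:

```python
from dataclasses import dataclass

# domains follow the Stage 4 list in this document; severities are assumed
GAP_DOMAINS = {"identity", "location", "crs", "depth", "curve",
               "completion", "qc", "chronology"}
GAP_SEVERITIES = {"blocking", "degrading", "informational"}

@dataclass(frozen=True)
class WellDataGap:
    """One structured gap row (hypothetical shape for ops.well_data_gaps)."""
    well_id: str
    domain: str
    severity: str
    promotion_action: str   # e.g. "quarantine", "downgrade", "allow"
    source_object: str      # link back to the raw source object
    next_step: str

    def __post_init__(self):
        # reject free-text domains so gaps stay queryable, not narrative
        assert self.domain in GAP_DOMAINS, f"unknown domain: {self.domain}"
        assert self.severity in GAP_SEVERITIES
```

Because each row links a domain, a severity, and a source object, review queues can filter and rank gaps with plain queries.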

Can We Adjust Missing Data

Yes, but only under controlled rules.

Allowed adjustments:

  1. mnemonic alias mapping,
  2. unit normalization,
  3. null cleanup,
  4. depth resampling,
  5. minimum-curvature TVD/TVDSS when a valid survey exists,
  6. vertical-well assumption only with downgrade,
  7. CRS transformation when the source definition is sufficiently known,
  8. authority ranking across multiple sources.
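
The first allowed adjustment, mnemonic alias mapping, reduces to a lookup against a curated registry. A minimal sketch with an illustrative, deliberately incomplete alias table:

```python
# illustrative alias table; real registries are far larger and vendor-aware
MNEMONIC_ALIASES = {
    "GR":   {"GR", "GRC", "SGR", "GAMMA"},
    "RHOB": {"RHOB", "DEN", "ZDEN"},
    "NPHI": {"NPHI", "NPOR", "CNC"},
    "RT":   {"RT", "ILD", "LLD"},
}

def map_mnemonic(raw: str):
    """Map a raw curve mnemonic to its canonical name, or None if unknown."""
    raw = raw.strip().upper()
    for canonical, aliases in MNEMONIC_ALIASES.items():
        if raw in aliases:
            return canonical
    return None  # unknown mnemonics become a structured gap, never a guess
```

Returning None instead of a best guess keeps this on the "allowed adjustments" side of the line the section draws.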

Not allowed as silent automation:

  1. inventing CRS,
  2. inventing vertical datum,
  3. silently converting MD to TVDSS,
  4. inventing completion intervals from vague narratives,
  5. merging wells purely by name similarity.

Entire Workflow

Stage 1: Intake

Inputs:

  1. LAS / DLIS / LIS,
  2. PDF / scanned reports,
  3. CSV / XLSX completion tables,
  4. survey files,
  5. spatial files,
  6. manual metadata entry.

Outputs:

  1. source bundle,
  2. raw object registry,
  3. upload session record.

Stage 2: Metadata Extraction

Tasks:

  1. detect file type,
  2. parse headers,
  3. extract identifiers,
  4. extract location fields,
  5. extract curve inventory,
  6. extract depth range,
  7. identify likely completion tables,
  8. assign preliminary authority ranking.

Outputs:

  1. extracted metadata fields,
  2. file profile,
  3. parse status.

Stage 3: Contract Validation

Validate:

  1. Contract A: identity + spatial anchor,
  2. Contract B: log run + curves,
  3. Contract C: completion + survey + context.

Outputs:

  1. pass / partial / fail per contract,
  2. promotion recommendation,
  3. initial gap list.
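
The pass/partial/fail outcome per contract can be computed mechanically once each contract's required fields are fixed. A sketch, with a hypothetical field set for Contract A:

```python
def validate_contract(required_fields, extracted):
    """Return 'pass', 'partial', or 'fail' for one contract's field set."""
    present = [f for f in required_fields if extracted.get(f) is not None]
    if len(present) == len(required_fields):
        return "pass"
    if present:
        return "partial"
    return "fail"

# hypothetical Contract A field set: identity + spatial anchor
CONTRACT_A = ["api_number", "well_name", "latitude", "longitude", "source_crs"]
```

The missing fields from a "partial" result feed the initial gap list directly, so validation and gap classification share one source of truth.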

Stage 4: Gap Classification

Classify all gaps into domains:

  1. identity,
  2. location,
  3. CRS,
  4. depth,
  5. curve,
  6. completion,
  7. QC,
  8. chronology.

Each gap gets:

  1. severity,
  2. promotion action,
  3. source object link,
  4. recommended next step.

Stage 5: Canonical Well Resolution

Tasks:

  1. resolve aliases,
  2. match source records into one canonical well,
  3. assign preferred source by authority,
  4. retain conflicting alternatives for audit.

Outputs:

  1. canonical well record,
  2. alias records,
  3. source-to-canonical lineage.

Stage 6: Spatial Truth Normalization

Tasks:

  1. resolve source CRS,
  2. validate coordinate plausibility,
  3. standardize vertical datum,
  4. convert to WGS84,
  5. convert to ECEF,
  6. compute transform status and notes.

Outputs:

  1. preferred location record,
  2. transform provenance,
  3. WGS84 and ECEF values,
  4. spatial confidence state.

Stage 7: Depth And Trajectory Normalization

Tasks:

  1. parse deviation survey,
  2. compute TVD and TVDSS,
  3. mark whether minimum curvature was used,
  4. downgrade if only vertical assumption is possible.

Outputs:

  1. trajectory header,
  2. station table,
  3. depth reference status,
  4. TVDSS confidence state.

Stage 8: Curve Normalization

Tasks:

  1. standard mnemonic mapping,
  2. unit conversion,
  3. resampling,
  4. null cleanup,
  5. presence/absence analysis,
  6. QC summaries.

Outputs:

  1. normalized curve catalog,
  2. sample storage,
  3. curve-level QC.

Stage 9: Deterministic Interpretation

Tasks:

  1. compute Vsh,
  2. compute porosity metrics,
  3. compute or estimate Sw where justified,
  4. identify candidate intervals,
  5. assign interval-level QC.

Outputs:

  1. pay-event records,
  2. derived-preview curves,
  3. interpretation traceability.
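
The Vsh step can be illustrated with the simplest published transform, the linear gamma-ray index; real workflows often substitute nonlinear corrections such as Larionov or Clavier, and the chosen formula and baselines belong in the evidence pack:

```python
def vsh_linear(gr, gr_clean, gr_shale):
    """Linear (gamma-ray index) shale volume, clipped to [0, 1].

    gr_clean and gr_shale are interpreter-picked clean-sand and shale
    baselines; everything here is deterministic and auditable.
    """
    igr = (gr - gr_clean) / (gr_shale - gr_clean)
    return max(0.0, min(1.0, igr))
```

Because the output is a pure function of the inputs and two recorded cutoffs, the same Vsh value can be reproduced from the evidence pack at any time.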

Stage 10: Completion Reconciliation

Tasks:

  1. map perforated intervals,
  2. compare candidate intervals to tested intervals,
  3. remove clearly exhausted/already tested events,
  4. classify remaining intervals.

Outputs:

  1. bypassed-pay candidates,
  2. recommended next actions,
  3. action confidence.
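
Comparing candidate intervals against perforated or tested intervals reduces to interval-overlap arithmetic. A minimal sketch assuming depth pairs share one reference and perforation intervals do not overlap each other:

```python
def overlap_fraction(candidate, perforations):
    """Fraction of a candidate (top, base) interval covered by perforations."""
    top, base = candidate
    covered = 0.0
    for p_top, p_base in perforations:
        lo, hi = max(top, p_top), min(base, p_base)
        if hi > lo:
            covered += hi - lo  # accumulate only genuine overlap
    return covered / (base - top)
```

A downstream rule might then classify, say, fractions near 1.0 as already tested and fractions near 0.0 as bypassed-pay candidates, with thresholds recorded as auditable cutoffs.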

Stage 11: Evidence And Audit

Tasks:

  1. collect source links,
  2. collect transform chain,
  3. collect formulas and cutoffs,
  4. collect output signatures,
  5. publish evidence pack.

Outputs:

  1. evidence pack,
  2. immutable audit entries,
  3. reproducibility record.

Stage 12: Scientist Review Queue

Tasks:

  1. rank wells by score and confidence,
  2. separate blocked wells from review-ready wells,
  3. expose top intervals and missing-data reasons,
  4. drive manual confirmation workflow.

Outputs:

  1. ranked review queue,
  2. promoted candidate list,
  3. blocked/quarantined list.
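
The queue tasks above can be sketched as one split-and-sort; the dict keys and the score-times-confidence weighting are illustrative assumptions, not a prescribed ranking model:

```python
def rank_review_queue(wells):
    """Split wells into review-ready vs blocked, ranking the ready set.

    Each well is a dict with assumed keys: 'id', 'score' (0-1),
    'confidence' (0-1), and 'blocked' (True when a blocking gap exists).
    """
    blocked = [w for w in wells if w["blocked"]]
    ready = [w for w in wells if not w["blocked"]]
    # rank by score weighted by confidence, best first
    ready.sort(key=lambda w: w["score"] * w["confidence"], reverse=True)
    return ready, blocked
```

Keeping blocked wells out of the ranked list, rather than ranking them low, enforces the rule that blocking gaps change behavior instead of just scores.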

Module Architecture Design

Module 1: Source Intake Module

Purpose:

  1. accept files and bundles,
  2. land them in raw storage,
  3. preserve source integrity.

Why it exists:

  1. raw data must never be overwritten,
  2. uploads often contain mixed file types,
  3. paper and digital sources must coexist.

Primary storage targets:

  1. MinIO raw vault,
  2. raw.source_bundles,
  3. raw.source_objects.

Module 2: Metadata Extraction Module

Purpose:

  1. classify the source,
  2. extract all detectable header and structural information,
  3. prepare the source for contract validation.

Why it exists:

  1. every downstream module depends on knowing what the file actually is,
  2. partial metadata is still useful,
  3. OCR and digital parse can be handled through the same field registry.

Primary outputs:

  1. extracted metadata fields,
  2. curve inventory,
  3. document profile,
  4. parse confidence.

Module 3: Contract Validation Module

Purpose:

  1. separate usable, partial, and unusable data,
  2. prevent silent promotion of bad wells.

Why it exists:

  1. most failures happen before interpretation,
  2. missing CRS or completions must change behavior,
  3. well identity errors are fatal.

Primary outputs:

  1. contract status by domain,
  2. validation errors,
  3. promotion recommendation.

Module 4: Gap Engine

Purpose:

  1. store what is missing or ambiguous in structured form,
  2. drive quarantine and review rules.

Why it exists:

  1. old wells are rarely complete,
  2. data gaps must be queryable,
  3. review queues depend on gap severity.

Primary outputs:

  1. ops.well_data_gaps,
  2. domain, severity, action,
  3. gap summary for UI and reports.

Module 5: Canonical Well Resolver

Purpose:

  1. unify aliases and repeated sources into one canonical well.

Why it exists:

  1. multiple files often refer to the same well differently,
  2. completion history and logs must be linked to the same well,
  3. the scientist should review one well record, not twenty fragments.

Primary outputs:

  1. canonical well row,
  2. alias table,
  3. source lineage map.

Module 6: Spatial Truth Module

Purpose:

  1. normalize horizontal and vertical references,
  2. compute WGS84 and ECEF,
  3. quarantine bad transforms.

Why it exists:

  1. 2D/3D alignment depends on reliable spatial anchoring,
  2. field-scale comparison fails if datum handling is wrong,
  3. old wells often have projection mistakes.

Primary outputs:

  1. preferred location record,
  2. transform provenance,
  3. geodetic and ECEF coordinates.

Module 7: Trajectory And Depth Module

Purpose:

  1. transform MD into TVD/TVDSS when valid,
  2. represent the well path in canonical depth terms.

Why it exists:

  1. bypassed-pay comparisons require interval consistency,
  2. deviated wells cannot be treated as vertical,
  3. many legacy wells have incomplete survey support.

Primary outputs:

  1. trajectory record,
  2. survey stations,
  3. depth reference status,
  4. uncertainty notes.

Module 8: Curve Normalization Module

Purpose:

  1. convert raw log curves into a deterministic canonical set.

Why it exists:

  1. mnemonics vary,
  2. units vary,
  3. null handling varies,
  4. interpretation requires consistent sampling.

Primary outputs:

  1. curve catalog,
  2. normalized curve values,
  3. curve-level QC summary.

Module 9: Interpretation Module

Purpose:

  1. run deterministic petrophysical workflows.

Why it exists:

  1. first-pass payload detection must not depend on black-box AI,
  2. scientists need explainable outputs,
  3. formulas and cutoffs must be auditable.

Primary outputs:

  1. pay-event rows,
  2. interpretation summaries,
  3. trace signatures.

Module 10: Completion Reconciliation Module

Purpose:

  1. classify whether a technically good interval is actually bypassed.

Why it exists:

  1. a promising interval is not payload if it was already completed and tested,
  2. completion history changes actionability.

Primary outputs:

  1. bypassed-pay candidates,
  2. scores,
  3. recommendations,
  4. next-action list.

Module 11: Evidence Pack Module

Purpose:

  1. assemble the full chain from raw source to output.

Why it exists:

  1. outputs must be reproducible,
  2. disagreements must be traceable,
  3. commercial trust depends on auditability.

Primary outputs:

  1. evidence pack record,
  2. hash-linked references,
  3. delivery-ready JSON / markdown packages.

Module 12: Scientist Work Queue Module

Purpose:

  1. route the best opportunities to experts.

Why it exists:

  1. not every well deserves the same human attention,
  2. the system must reduce workload, not increase it,
  3. payload identification is a ranking problem, not only a detection problem.

Primary outputs:

  1. ranked wells,
  2. ranked intervals,
  3. blocked wells,
  4. required follow-up actions.

Recommended Data Model Separation

Use four logical layers:

  1. core
  2. raw
  3. ops
  4. audit

This is the correct shape for PostgreSQL because it keeps:

  1. raw immutable,
  2. canonical normalized,
  3. derived outputs explicit,
  4. audit separate and defensible.

Bulletproof Operating Rule

There is no bulletproof geology.

There is a bulletproof workflow.

That workflow is:

  1. preserve raw,
  2. extract and classify,
  3. validate contracts,
  4. quantify gaps,
  5. normalize spatial truth,
  6. normalize curves,
  7. compute deterministic intervals,
  8. reconcile with completion history,
  9. rank review payload,
  10. publish evidence-backed outputs.

The system should never silently upgrade a well from uncertain to decision-grade.