Earthbond POV Technical QA Evaluation and Implementation Blueprint (v1)
1) Purpose
This document converts Earthbond_POV_Technical_QA.pdf into an implementation-grade, standalone POV plan that fits the code already built in this repository.
It answers four decisions explicitly:
- What is already implemented vs missing.
- Why each required item matters technically and commercially.
- Which formulas and controls are non-negotiable.
- How to structure PostgreSQL + object storage so AI can be added safely without rework.
---
2) Executive Verdict
Yes, this build is worthwhile, but only if we keep Earthbond differentiated on:
- Spatial truth controls (CRS/datum/epoch/vertical + ECEF canonicalization).
- Audit-grade lineage (input -> transform chain -> output event).
- Multi-source operational fusion (well logs + completion + spatial context + real-time signals).
Petrophysical calculations alone are not enough differentiation. Commercial software already does that well. The defensible value is:
- Cross-silo harmonization and QA at ingest.
- Deterministic, explainable event outputs with traceability.
- Fast operational decision packaging (QC card + bypass candidate + uncertainty context).
---
3) Current Build vs PDF Q1-Q12
| QA Item | Status in Current Repo | Notes |
|---|---|---|
| Q1 Output schema per well | Partial | Well events exist, but final Earthbond event fields (full ECEF-centric schema and confidence/action semantics) need tightening. |
| Q2 Ground truth validation | Partial | Integrity checks exist; geological validation metrics and formal acceptance gate automation need expansion. |
| Q3 Error tradeoff policy | Partial | Conservative triage behavior is present; thresholds and false-negative control should be hard-coded as policy profiles. |
| Q4 CRS/ECEF policy | Strong Partial | Strict CRS and ECEF logic exists in platform; well-log pipeline needs full per-event ECEF + vertical datum trace fields. |
| Q5 Metadata contracts A/B/C | Partial | Contract skeleton exists; required-field quarantine and typed contract validation need stricter enforcement and richer schemas. |
| Q6 Lat/lon enrichment | Deferred by design | Correct to defer external enrichments for core POV; schema extension points are already feasible. |
| Q7 Uncertainty cone spec | Gap | Concept exists, but full minimum-curvature + Monte Carlo P95 cone service for well trajectories is not complete. |
| Q8 Seismic scope | Out of scope now | Correctly deferred. Need schema keys only for future linkage. |
| Q9 Performance targets | Partial | Job framework exists; benchmark harness and SLO dashboards need to be formalized. |
| Q10 Security/compliance | Partial/Strong | Tenant roles, audit tables, and isolation exist; full RLS-by-tenant and compliance controls should be completed before enterprise scale. |
| Q11 Acceptance gates | Partial | Reports exist; signed gate workflow and stage-based acceptance API should be explicit. |
| Q12 Sizing/build sequence | Strong | Validator-first path is aligned with current architecture and should remain the first hardening priority. |
---
4) Item-by-Item Implementation Plan (Q1-Q12)
Q1) Exact decision output per well
Why it matters
Decision users need a deterministic, comparable event record. Without strict event schema, outputs are hard to audit and cannot be trusted across projects.
What to build
- Lock a canonical `bypassed_pay_event` contract (API + DB + object artifacts).
- Include both interpretable fields (`depth_top_tvdss`) and canonical spatial fields (`depth_top_ecef_z`).
- Keep `recommended_action`, `confidence`, and `cutoffs_applied` mandatory.
Required output layers
- Event table (one row per pay interval).
- Well Status Pack (well-level JSON + PDF/HTML).
- Portfolio scorecard (one row per well).
Practical example
If mean_phie passes by a narrow margin and caliper is missing, keep the event and down-rank confidence to LOW; do not auto-discard.
Example event JSON (target output contract)
```json
{
  "event_id": "WELL-042-E03",
  "well_id": "WELL-042",
  "depth_top_tvdss": -3124.5,
  "depth_base_tvdss": -3136.2,
  "depth_top_ecef_z": 3478921.332,
  "depth_base_ecef_z": 3478912.004,
  "gross_thickness_m": 11.7,
  "net_pay_m": 8.4,
  "mean_vsh": 0.29,
  "mean_phie": 0.11,
  "mean_rt": 13.8,
  "mean_sw": 0.42,
  "qc_flag": "CAUTION",
  "source_crs": "EPSG:26741",
  "ecef_transform_applied": true,
  "completion_status": "BYPASSED",
  "classification": "BYPASSED-PAY CANDIDATE",
  "confidence": "MODERATE",
  "recommended_action": "cased-hole_pulsed-neutron_evaluation",
  "cutoffs_applied": {
    "vsh_max": 0.35,
    "phie_min": 0.08,
    "rt_min": 10.0,
    "thickness_min_m": 1.0
  },
  "processing_version": "welllog-pov-v1.0.0"
}
```
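The mandatory-field semantics above can be enforced with a strict contract check at write time. The sketch below is illustrative, not the repo implementation; field names follow the example JSON, and the helper name `validate_event` is an assumption.

```python
# Strict event-contract check for the Q1 output (hypothetical helper).
# Returns machine-readable violation codes; an empty list means PASS.
REQUIRED_FIELDS = {
    "event_id", "well_id", "depth_top_tvdss", "depth_base_tvdss",
    "qc_flag", "source_crs", "classification", "confidence",
    "recommended_action", "cutoffs_applied", "processing_version",
}
ALLOWED_CONFIDENCE = {"LOW", "MODERATE", "HIGH"}

def validate_event(event: dict) -> list[str]:
    """Collect missing-field and bad-enum violations for one event row."""
    violations = [f"MISSING:{f}" for f in sorted(REQUIRED_FIELDS - event.keys())]
    if event.get("confidence") not in ALLOWED_CONFIDENCE:
        violations.append("BAD_ENUM:confidence")
    return violations
```

Wiring this into both the API layer and the DB insert path keeps the two representations from drifting apart.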
---
Q2) Ground truth for validation
Why it matters
Without objective validation gates, “looks right” can hide systemic errors.
Two-level validation model
- Pipeline integrity validation:
- deterministic replay
- cutoff consistency
- depth monotonicity
- ECEF range checks
- Geological validation:
- 3-5 known-outcome wells
- detection and false-positive targets
- reviewer sign-off attached to run id
Immediate implementation
- Add `ops.validation_wells` + `ops.validation_metrics`.
- Compute:
- detection rate
- false positive rate
- missed known-pay count
- Store per-run validation artifact as JSON + markdown report in object store.
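The three metrics above reduce to simple set arithmetic over interval identifiers from the known-outcome wells. A minimal sketch, assuming interval IDs as strings (the function name and argument names are illustrative):

```python
# Q2 geological validation metrics over validation-well intervals.
def validation_metrics(known_pay: set[str], known_wet: set[str],
                       predicted: set[str]) -> dict:
    """Detection rate, false-positive rate, and missed known-pay count."""
    detected = known_pay & predicted          # true positives
    false_pos = known_wet & predicted         # flagged intervals known to be wet
    return {
        "detection_rate": len(detected) / len(known_pay) if known_pay else 0.0,
        "false_positive_rate": len(false_pos) / len(known_wet) if known_wet else 0.0,
        "missed_known_pay": len(known_pay - predicted),
    }
```

The resulting dict can be serialized directly into the per-run JSON artifact stored alongside the markdown report.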
---
Q3) Error tradeoff policy (false negative vs false positive)
Why it matters
Triage systems should minimize missed opportunities first.
Policy to codify
- False negatives are more costly than false positives.
- Borderline intervals become `LOW` confidence candidates, not hard rejects.
- Hard reject only on data invalidity (not reservoir marginality).
Implementation detail
- Add policy profile table (`ops.policy_profiles`) with thresholds and tradeoff mode.
- Ensure classifier always emits one of:
  - `BYPASSED_PAY_CANDIDATE`
  - `TESTED_PAY`
  - `TESTED_WET`
  - `INCONCLUSIVE`
- Add "why" fields:
  - `reason_codes[]`
  - `confidence_drivers`
  - `data_gap_impact`
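The false-negative-averse policy can be sketched as a small routing function. This is an illustrative simplification, not the repo classifier: the borderline band width (`border`) and cutoff defaults mirror the Q1 example, and the reason codes are hypothetical.

```python
# Q3 tradeoff policy sketch: borderline intervals are down-ranked,
# never hard-rejected; hard INCONCLUSIVE only on data invalidity or
# a clear miss of the cutoffs.
def classify(mean_phie: float, mean_vsh: float, data_valid: bool,
             tested: bool = False, wet: bool = False,
             phie_min: float = 0.08, vsh_max: float = 0.35,
             border: float = 0.01):
    if not data_valid:
        return "INCONCLUSIVE", ["DATA_INVALID"]
    if tested:
        return ("TESTED_WET" if wet else "TESTED_PAY"), ["COMPLETION_MATCH"]
    if mean_phie >= phie_min and mean_vsh <= vsh_max:
        narrow = (mean_phie - phie_min) < border
        return "BYPASSED_PAY_CANDIDATE", (["NARROW_PHIE_MARGIN"] if narrow else [])
    if mean_phie >= phie_min - border:
        # Reservoir marginality: keep as candidate, down-rank confidence.
        return "BYPASSED_PAY_CANDIDATE", ["BORDERLINE_LOW_CONFIDENCE"]
    return "INCONCLUSIVE", ["BELOW_CUTOFFS"]
```

Keeping the thresholds as function parameters maps directly onto rows in `ops.policy_profiles`.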
---
Q4) CRS and ECEF policy
Why it matters
Cross-well spatial analytics fail if datum/epoch/vertical assumptions are wrong.
Non-negotiable chain
- Source CRS identification (`EPSG` code when possible, else named CRS + resolver notes).
- Vertical datum declaration (`KB`, `MSL`, local datum).
- MD -> TVD/TVDSS using deviation survey + minimum curvature.
- Horizontal CRS transform to geographic on WGS84 with correct datum transform.
- Geographic + height -> ECEF (EPSG:4978).
- Persist full transform provenance in audit evidence.
Formula anchors
Given geodetic latitude phi, longitude lambda, height h, with WGS84:
- `N = a / sqrt(1 - e^2 * sin(phi)^2)`
- `X = (N + h) * cos(phi) * cos(lambda)`
- `Y = (N + h) * cos(phi) * sin(lambda)`
- `Z = (N * (1 - e^2) + h) * sin(phi)`
Where a = 6378137.0 and e^2 = 6.69437999014e-3.
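The closed-form conversion above translates directly into code. A minimal sketch using only the standard library (a production pipeline would instead go through pyproj with an explicit EPSG:4978 transformer so the datum transform is recorded in provenance):

```python
# Geodetic (WGS84) -> ECEF, using the formula anchors from Q4.
import math

WGS84_A = 6378137.0            # semi-major axis a (m)
WGS84_E2 = 6.69437999014e-3    # first eccentricity squared e^2

def geodetic_to_ecef(lat_deg: float, lon_deg: float, h_m: float):
    """Geodetic latitude/longitude (degrees) + ellipsoidal height (m) -> ECEF (m)."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(phi) ** 2)
    x = (n + h_m) * math.cos(phi) * math.cos(lam)
    y = (n + h_m) * math.cos(phi) * math.sin(lam)
    z = (n * (1.0 - WGS84_E2) + h_m) * math.sin(phi)
    return x, y, z
```

Sanity anchors for tests: (0, 0, 0) maps to (a, 0, 0) and the north pole maps to Z equal to the semi-minor axis, about 6356752.314 m.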
Failure controls
- Missing CRS -> quarantine for 3D usage (`CRS_UNKNOWN`), still allow non-spatial petrophysics.
- Missing KB/vertical reference -> keep `MD_ONLY`, downgrade confidence.
- Mislabeled geographic/projected units -> hard fail and quarantine.
---
Q5) Minimum metadata contracts (A/B/C)
Why it matters
Most failures happen before analytics, during ambiguous ingest.
Contract enforcement model
- Contract A: well header + location/depth references.
- Contract B: required curves (`GR`, `RHOB`, `NPHI`, deep `RT`, `DEPTH`/`MD`).
- Contract C: completion/perforation history for bypass labeling.
Implementation detail
- Add JSON Schema files for A/B/C in `contracts/events/metadata_contracts/`.
- Add pre-processing validation endpoint returning:
  - `PASS`
  - `PARTIAL_PROCESS`
  - `QUARANTINE`
- Persist contract violations with machine-readable codes.
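The three-way routing for Contract B can be sketched as a pure function over the set of present curves. The partial-processing rule here (depth plus GR is the minimum for partial petrophysics) is an illustrative assumption, not a stated requirement:

```python
# Contract B gate sketch: route a well by required-curve coverage.
CONTRACT_B_CURVES = {"GR", "RHOB", "NPHI", "RT", "DEPTH"}

def contract_b_outcome(present_curves: set[str]):
    """Return (PASS | PARTIAL_PROCESS | QUARANTINE, machine-readable codes)."""
    missing = sorted(CONTRACT_B_CURVES - present_curves)
    codes = [f"MISSING_CURVE:{c}" for c in missing]
    if not missing:
        return "PASS", []
    # Assumption: DEPTH + GR is enough for a degraded, partial workflow.
    if {"DEPTH", "GR"} <= present_curves:
        return "PARTIAL_PROCESS", codes
    return "QUARANTINE", codes
```

Persisting the returned codes verbatim gives the machine-readable violation trail the contract model calls for.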
---
Q6) Lat/lon enrichment requirements
Why it matters
Enrichment is useful, but not required for initial bypassed-pay proof.
POV decision
- Keep enrichment optional in Phase 1.
- Build schema now for future joins by ECEF or geodetic keys:
- nearby wells
- lease boundaries
- DEM/LiDAR
- hazard zones
Implementation note
Do not block core workflow on external API connectivity.
---
Q7) Exact 3D uncertainty cone specification
Why it matters
A single deterministic trajectory line hides real positional uncertainty.
Core method
- Best estimate trajectory: minimum curvature on survey stations.
- Uncertainty propagation: Monte Carlo perturbation on inclination/azimuth.
- Envelope output: P50 and P95 lateral radius per depth station.
Key formulas
For two survey stations 1 and 2:
- Dogleg angle: `DL = acos(cos(I1)*cos(I2) + sin(I1)*sin(I2)*cos(A2-A1))`
- Ratio factor: `RF = 1` if `DL ~ 0`, else `RF = (2/DL) * tan(DL/2)`
- TVD increment: `dTVD = (dMD/2) * (cos(I1) + cos(I2)) * RF`
Monte Carlo:
- Perturb each station: `I' = I + Normal(0, sigma_inc)`, `A' = A + Normal(0, sigma_azi)`
- Recompute trajectory `N` times (POV uses 500).
- At each depth, compute radial quantiles of lateral offsets: `r50 = quantile(r, 0.50)`, `r95 = quantile(r, 0.95)`
Deliverable table
Per depth station:
- `ecef_x`, `ecef_y`, `ecef_z` best estimate
- `r50_m`, `r95_m`
- `position_confidence` derived from `r95_m` bands
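The core method can be sketched end to end in a few dozen lines. This is an illustrative simplification, not the cone service: angles are radians, the local north/east frame stands in for full ECEF, the sigma values are placeholders, and only the final-station radius is reported (a real implementation reports per-station quantiles).

```python
# Q7 sketch: minimum-curvature trajectory + Monte Carlo P50/P95 lateral radius.
import math
import random
import statistics

def min_curvature_step(md1, inc1, azi1, md2, inc2, azi2):
    """(dNorth, dEast, dTVD) between two stations; angles in radians."""
    dmd = md2 - md1
    cos_dl = (math.cos(inc1) * math.cos(inc2)
              + math.sin(inc1) * math.sin(inc2) * math.cos(azi2 - azi1))
    dl = math.acos(max(-1.0, min(1.0, cos_dl)))          # dogleg angle
    rf = 1.0 if dl < 1e-9 else (2.0 / dl) * math.tan(dl / 2.0)
    dn = dmd / 2.0 * (math.sin(inc1) * math.cos(azi1) + math.sin(inc2) * math.cos(azi2)) * rf
    de = dmd / 2.0 * (math.sin(inc1) * math.sin(azi1) + math.sin(inc2) * math.sin(azi2)) * rf
    dtvd = dmd / 2.0 * (math.cos(inc1) + math.cos(inc2)) * rf
    return dn, de, dtvd

def cone_radii(stations, sigma_inc=0.002, sigma_azi=0.005, n_iter=500, seed=42):
    """P50/P95 lateral offset of the perturbed final position vs best estimate."""
    rng = random.Random(seed)
    def position(survey):
        n = e = 0.0
        for s1, s2 in zip(survey, survey[1:]):
            dn, de, _ = min_curvature_step(*s1, *s2)
            n, e = n + dn, e + de
        return n, e
    n0, e0 = position(stations)
    radii = []
    for _ in range(n_iter):
        perturbed = [(md, i + rng.gauss(0, sigma_inc), a + rng.gauss(0, sigma_azi))
                     for md, i, a in stations]
        n, e = position(perturbed)
        radii.append(math.hypot(n - n0, e - e0))
    q = statistics.quantiles(radii, n=100)
    return q[49], q[94]   # r50_m, r95_m
```

For production use, the ISCWSA error models give correlated, depth-dependent sigmas rather than the independent per-station noise assumed here.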
---
Q8) Seismic scope
Why it matters
Scope control protects delivery speed.
POV rule
- Keep SEG-Y/horizon interpretation out of current delivery.
- Add only linkage keys:
  - `well_id`
  - `depth_tvdss_m`
  - optional future `seismic_horizon_id`
---
Q9) Operational performance targets
Why it matters
If runtime is unpredictable, user trust drops quickly.
Targets to enforce in CI/perf harness
- LAS ingest + QC card: < 60s per well.
- Full interpretation: < 5 min per well.
- Cone computation (500 iterations): < 2 min per well.
- Batch 50 wells: < 4h.
Implementation detail
- Add benchmark runner scripts and store results per commit.
- Add red/yellow/green run metadata in admin UI.
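A minimal sketch of what the benchmark runner could record per job, assuming the SLO numbers listed above and an illustrative 80% threshold for yellow (both the harness shape and the threshold are assumptions):

```python
# Q9 benchmark harness sketch: time a job callable against its SLO target
# and emit the red/yellow/green status shown in the admin UI.
import time

TARGETS_S = {"las_ingest_qc": 60, "full_interpretation": 300, "cone_500": 120}

def benchmark(job_name: str, fn, *args) -> dict:
    start = time.perf_counter()
    fn(*args)
    elapsed = time.perf_counter() - start
    target = TARGETS_S[job_name]
    if elapsed <= 0.8 * target:
        status = "green"
    elif elapsed <= target:
        status = "yellow"
    else:
        status = "red"
    return {"job": job_name, "elapsed_s": round(elapsed, 3),
            "target_s": target, "status": status}
```

Storing one such record per commit gives the trend data the SLO dashboard needs.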
---
Q10) Security and compliance constraints
Why it matters
Client trust and legal isolation are mandatory before scale.
POV baseline
- Tenant data isolation.
- Least-privilege roles per service type.
- Audit evidence per transformation run.
- Retention policy + deletion workflow evidence.
Postgres controls
- Role separation (`control`, `data`, `worker`, `readonly`).
- Row-level isolation policy by tenant/project where needed.
- Immutable ledger rules for audit events.
---
Q11) Acceptance gates
Why it matters
Formal gate sign-off converts technical output to business acceptance.
Gate structure
- Gate 1 Data acceptance (QC card correctness).
- Gate 2 Technical acceptance (geology plausibility + cutoff policy).
- Gate 3 Decision-grade confirmation (actionable candidate + audit sufficiency).
Build action
- Add `ops.acceptance_gates` table with actor, timestamp, notes, artifact refs.
- Add API endpoint for gate status and required pending items.
---
Q12) Dataset sizing and first build order
Why it matters
Correct sizing prevents over-engineering and missed deadlines.
Sizing decision
- Design for 100 wells in POV operations.
- Keep raw files in object store.
- Keep metadata/events in PostgreSQL.
- Use asynchronous jobs for heavy transforms.
First build sequence (still correct)
- Ingest validator.
- QC card + quarantine.
- Deterministic interpretation.
- Bypassed-pay event outputs.
- Cone model.
---
5) Data Architecture for AI Readiness
Can AI work from metadata only?
Yes for:
- Search, filtering, ranking, routing, and anomaly triage.
- Similarity retrieval using event-level features and embeddings.
No for:
- Full petrophysical re-derivation.
- Curve-shape interpretation.
- Raw waveform/trace-level modeling.
Conclusion:
Use both. Metadata is the control plane; raw/derived arrays are the signal plane.
Recommended storage tiers
- Raw zone (object storage): original LAS/DLIS/CSV/PDF, immutable.
- Curated derived zone (object storage): normalized arrays, interpretation outputs, cone artifacts.
- Metadata truth index (PostgreSQL/PostGIS): contracts, lineage, event rows, QC, policies, status.
- AI feature zone (PostgreSQL + optional vector index): compact features, embeddings, labels.
What can go wrong and mitigation
- Wrong CRS label:
- impact: false spatial joins and wrong nearby-well risk.
- mitigation: resolver + geodetic plausibility checks + quarantine.
- Unit mismatch (ft vs m, API vs normalized):
- impact: false pay detection.
- mitigation: explicit unit normalization and unit lineage fields.
- Missing completion history:
- impact: candidate misclassification.
- mitigation: classify as
UNKNOWNcompletion, do not overstate confidence.
- Data drift over time:
- impact: unstable decision thresholds.
- mitigation: versioned cutoff profiles + run reproducibility artifacts.
- Cross-tenant leakage:
- impact: legal/compliance failure.
- mitigation: strict tenant scoping, audited access, isolated namespaces.
---
6) Build-vs-Buy Comparison (What Existing Software Already Does)
| Capability | Typical commercial tools | Earthbond opportunity |
|---|---|---|
| Petrophysical interpretation | Strong (mature) | Not primary differentiator by itself |
| Well planning/survey workflows | Strong | Integrate results, do not rebuild entire planning suite |
| Enterprise data model standardization | Growing (OSDU adoption) | Strong value in governance + ingestion quality + cross-source operational readiness |
| Audit-grade CRS/ECEF chain tied to decisions | Usually fragmented | Core differentiator if enforced end-to-end |
| Fast bypassed-pay triage across messy legacy datasets | Partial/manual in many orgs | High-value immediate monetization path |
Bottom line
Do not compete head-on as a pure interpretation workstation. Win as the spatially-governed, auditable, multi-source decision layer on top of existing silos and tools.
---
7) Monetization Before Full AI
You can monetize before full LLM deployment:
- Paid ingest QA and CRS reconciliation service.
- Bypassed-pay triage reports with acceptance gates.
- Audit-ready evidence packs for technical due diligence.
- Continuous monitoring dashboard (status, risk flags, data gaps).
Then add AI for:
- candidate ranking refinement,
- analog retrieval,
- natural-language query over validated metadata + selected raw snippets.
---
8) Example Diagrams
A) End-to-end truth chain
B) Metadata vs raw data for AI
---
9) Implementation Roadmap (Pragmatic)
Sprint 1 (hardening)
- Lock Q1 event schema and contract tests.
- Enforce A/B/C metadata contracts and quarantine reason codes.
- Add validation metrics tables and API output.
Sprint 2 (spatial rigor)
- Finalize per-event ECEF + transform provenance fields.
- Add strict vertical datum handling and explicit assumptions.
- Add map sanity checks (region bounds and centroid plausibility).
Sprint 3 (uncertainty + acceptance)
- Implement Monte Carlo cone service and storage.
- Add Gate 1/2/3 sign-off workflow in API and admin UI.
- Publish decision-grade report templates from live outputs.
---
10) Repository-Level Implementation Delta (Next Coding Phase)
- Database migrations (`db/migrations/versions`)
  - add `0011_welllog_event_schema_hardening.py`:
    - enforce Q1-required event columns
    - add spatial provenance fields (`source_crs`, `transform_chain`, `ecef_transform_applied`)
    - add validation metrics tables and acceptance gate tables
- API contracts (`contracts/openapi/data-plane.yaml`)
  - tighten request/response schemas for:
    - `POST /well-logs/jobs/normalize`
    - `POST /well-logs/jobs/interpret`
    - `POST /well-logs/jobs/bypassed-pay`
    - `GET /projects/{project_id}/wells/{well_id}/status-pack`
  - add gate/validation endpoints for Q2 and Q11
- Worker logic (`apps/worker-ingest/src/worker_ingest/main.py`)
  - enrich normalize results with explicit contract validation outcomes (A/B/C)
  - persist transform provenance and quarantine reason codes
  - add cone calculation job type and artifact output
- Domain library (`libs/py/welllog_core/__init__.py`)
  - add strict event contract builder
  - add uncertainty cone computation helpers
  - add deterministic reason-code generation
- UI (`apps/client-web`)
  - add "Acceptance Gates" panel per project
  - add validation metrics widget (detection, false positive, missed known-pay)
  - add cone preview controls and event confidence explanation
- Operations/testing scripts (`scripts/dev`, `tests`)
  - add benchmark harness for Q9 targets
  - add golden-run replay tests for deterministic parity
  - add validation-well comparison test fixtures
---
11) Learning Resources and Comparable Platforms
Geodesy / CRS / ECEF (must-study)
- PROJ cartesian conversion docs: https://proj.org/en/stable/operations/conversions/cart.html
- pyproj Transformer API: https://pyproj4.github.io/pyproj/stable/api/transformer.html
- EPSG 9602 (geographic <-> geocentric method): https://epsg.io/9602-method
- IOGP guide on dynamic CRSs and coordinate epochs: https://www.iogp.org/bookstore/product/guide-to-coordinate-reference-systems-including-dynamic-datums-in-the-context-of-iot-and-digital-twins/
- NOAA VDatum (vertical datum transformation): https://vdatum.noaa.gov/
Well trajectory and uncertainty
- ISCWSA home/resources (error model authority): https://www.iscwsa.net/
- Well trajectory methods and uncertainty model documentation: https://jonnymaserati.github.io/welleng/
Data platform and standards
- OSDU Forum landing page: https://osduforum.org
- OSDU GitHub organization (public code/assets): https://github.com/osdu
- PostgreSQL partitioning: https://www.postgresql.org/docs/current/ddl-partitioning.html
- PostgreSQL row security: https://www.postgresql.org/docs/current/ddl-rowsecurity.html
- PostgreSQL BRIN indexes: https://www.postgresql.org/docs/current/brin.html
- PostGIS spatial indexing: https://postgis.net/workshops/uk/postgis-intro/indexing.html
Well-log data format references
- `lasio` (Python LAS parser with LAS/CWLS context): https://lasio.readthedocs.io/en/latest/
- Energistics RP66v1 (DLIS) specification: https://energistics.org/sites/default/files/RP66/V1/Toc/main.html
Comparable software references
- SLB Techlog: https://www.slb.com/products-and-services/delivering-digital-at-scale/software/techlog-wellbore-software/techlog
- S&P Global PETRA: https://www.spglobal.com/commodityinsights/en/ci/products/petra.html
- Landmark software family (Halliburton): https://www.halliburton.com
- OpendTect (open seismic interpretation baseline): https://www.opendtect.org/osr/
- Potree WebGL point-cloud viewer baseline: https://github.com/potree/potree
Video hubs and training starts
- ISCWSA media/videos: https://www.iscwsa.net/media/
- OSDU Forum updates/events hub: https://osduforum.org
- NOAA VDatum tutorial resources: https://vdatum.noaa.gov/
---
12) Decision Guidance: Are We Structuring for AI or Monetizing First?
Do both in sequence:
- Monetize now with deterministic outputs and audit-ready workflow.
- Structure now so AI can be added without re-architecture.
The correct design principle:
- Metadata drives trust and governance.
- Raw/derived data drives scientific signal extraction.
- AI should consume both through controlled, versioned pipelines.
If forced to choose priority now: prioritize deterministic QA + bypassed-pay outcomes + acceptance gates first. This generates client value immediately and creates reliable training data for later AI.