POV Technical QA Evaluation Blueprint

Source: docs/architecture/POV_TECHNICAL_QA_EVALUATION_IMPLEMENTATION_V1.md

Earthbond POV Technical QA Evaluation and Implementation Blueprint (v1)

1) Purpose

This document converts Earthbond_POV_Technical_QA.pdf into an implementation-grade, standalone POV plan that fits the code already built in this repository.

It answers four questions explicitly:

  1. What is already implemented vs missing.
  2. Why each required item matters technically and commercially.
  3. Which formulas and controls are non-negotiable.
  4. How to structure PostgreSQL + object storage so AI can be added safely without rework.

---

2) Executive Verdict

Yes, this build is worthwhile, but only if we keep Earthbond differentiated on:

  1. Spatial truth controls (CRS/datum/epoch/vertical + ECEF canonicalization).
  2. Audit-grade lineage (input -> transform chain -> output event).
  3. Multi-source operational fusion (well logs + completion + spatial context + real-time signals).

Petrophysical calculations alone are not enough differentiation. Commercial software already does that well. The defensible value is:

  1. Cross-silo harmonization and QA at ingest.
  2. Deterministic, explainable event outputs with traceability.
  3. Fast operational decision packaging (QC card + bypass candidate + uncertainty context).

---

3) Current Build vs PDF Q1-Q12

| QA Item | Status in Current Repo | Notes |
|---|---|---|
| Q1 Output schema per well | Partial | Well events exist, but final Earthbond event fields (full ECEF-centric schema and confidence/action semantics) need tightening. |
| Q2 Ground truth validation | Partial | Integrity checks exist; geological validation metrics and formal acceptance gate automation need expansion. |
| Q3 Error tradeoff policy | Partial | Conservative triage behavior is present; thresholds and false-negative control should be hard-coded as policy profiles. |
| Q4 CRS/ECEF policy | Strong Partial | Strict CRS and ECEF logic exists in platform; well-log pipeline needs full per-event ECEF + vertical datum trace fields. |
| Q5 Metadata contracts A/B/C | Partial | Contract skeleton exists; required-field quarantine and typed contract validation need stricter enforcement and richer schemas. |
| Q6 Lat/lon enrichment | Deferred by design | Correct to defer external enrichments for core POV; schema extension points are already feasible. |
| Q7 Uncertainty cone spec | Gap | Concept exists, but full minimum-curvature + Monte Carlo P95 cone service for well trajectories is not complete. |
| Q8 Seismic scope | Out of scope now | Correctly deferred. Need schema keys only for future linkage. |
| Q9 Performance targets | Partial | Job framework exists; benchmark harness and SLO dashboards need to be formalized. |
| Q10 Security/compliance | Partial/Strong | Tenant roles, audit tables, and isolation exist; full RLS-by-tenant and compliance controls should be completed before enterprise scale. |
| Q11 Acceptance gates | Partial | Reports exist; signed gate workflow and stage-based acceptance API should be explicit. |
| Q12 Sizing/build sequence | Strong | Validator-first path is aligned with current architecture and should remain the first hardening priority. |

---

4) Item-by-Item Implementation Plan (Q1-Q12)

Q1) Exact decision output per well

Why it matters

Decision users need a deterministic, comparable event record. Without a strict event schema, outputs are hard to audit and cannot be trusted across projects.

What to build

  1. Lock a canonical bypassed_pay_event contract (API + DB + object artifacts).
  2. Include both interpretable fields (depth_top_tvdss) and canonical spatial fields (depth_top_ecef_z).
  3. Keep recommended_action, confidence, and cutoffs_applied mandatory.

Required output layers

  1. Event table (one row per pay interval).
  2. Well Status Pack (well-level JSON + PDF/HTML).
  3. Portfolio scorecard (one row per well).

Practical example

If mean_phie passes by a narrow margin and caliper is missing, keep the event and down-rank confidence to LOW; do not auto-discard.

Example event JSON (target output contract)


```json
{
  "event_id": "WELL-042-E03",
  "well_id": "WELL-042",
  "depth_top_tvdss": -3124.5,
  "depth_base_tvdss": -3136.2,
  "depth_top_ecef_z": 3478921.332,
  "depth_base_ecef_z": 3478912.004,
  "gross_thickness_m": 11.7,
  "net_pay_m": 8.4,
  "mean_vsh": 0.29,
  "mean_phie": 0.11,
  "mean_rt": 13.8,
  "mean_sw": 0.42,
  "qc_flag": "CAUTION",
  "source_crs": "EPSG:26741",
  "ecef_transform_applied": true,
  "completion_status": "BYPASSED",
  "classification": "BYPASSED-PAY CANDIDATE",
  "confidence": "MODERATE",
  "recommended_action": "cased-hole_pulsed-neutron_evaluation",
  "cutoffs_applied": {
    "vsh_max": 0.35,
    "phie_min": 0.08,
    "rt_min": 10.0,
    "thickness_min_m": 1.0
  },
  "processing_version": "welllog-pov-v1.0.0"
}
```
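A minimal contract check over this event shape can be sketched in Python. The field set, allowed confidence values (HIGH is assumed, only LOW and MODERATE appear in this document), and violation-code format are illustrative, not the locked contract:

```python
# Illustrative contract check for the bypassed_pay_event shape above.
# Field names mirror the example JSON; violation codes are hypothetical.
REQUIRED_FIELDS = {
    "event_id", "well_id", "depth_top_tvdss", "depth_base_tvdss",
    "depth_top_ecef_z", "depth_base_ecef_z", "qc_flag", "source_crs",
    "classification", "confidence", "recommended_action",
    "cutoffs_applied", "processing_version",
}
ALLOWED_CONFIDENCE = {"LOW", "MODERATE", "HIGH"}  # HIGH is assumed

def validate_event(event: dict) -> list:
    """Return machine-readable violation codes; an empty list means pass."""
    violations = [f"MISSING_FIELD_{f}"
                  for f in sorted(REQUIRED_FIELDS - event.keys())]
    if event.get("confidence") not in ALLOWED_CONFIDENCE:
        violations.append("INVALID_CONFIDENCE")
    return violations
```

In practice this check would run at API and worker boundaries so every persisted event row satisfies the same contract.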

---

Q2) Ground truth for validation

Why it matters

Without objective validation gates, “looks right” can hide systemic errors.

Two-level validation model

  1. Pipeline integrity validation.
  2. Geological validation.

Immediate implementation

  1. Add ops.validation_wells + ops.validation_metrics.
  2. Compute validation metrics per run.
  3. Store per-run validation artifact as JSON + markdown report in object store.
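As a sketch of what computing validation metrics could mean at the well level, precision/recall over a labeled validation set (the function shape and boolean labeling are illustrative; table names like ops.validation_metrics come from this document):

```python
def validation_metrics(predictions: dict, labels: dict) -> dict:
    """Well-level confusion metrics for bypassed-pay detection.

    predictions / labels map well_id -> bool (candidate present).
    Recall is the headline number, since Q3 treats false negatives
    as the costlier error.
    """
    tp = sum(1 for w, p in predictions.items() if p and labels.get(w))
    fp = sum(1 for w, p in predictions.items() if p and not labels.get(w))
    fn = sum(1 for w, has in labels.items() if has and not predictions.get(w))
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}
```

The returned dict is what a per-run validation artifact would serialize alongside the markdown report.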

---

Q3) Error tradeoff policy (false negative vs false positive)

Why it matters

Triage systems should minimize missed opportunities first.

Policy to codify

  1. False negatives are more costly than false positives.
  2. Borderline intervals become LOW confidence candidates, not hard rejects.
  3. Hard reject only on data invalidity (not reservoir marginality).

Implementation detail

  1. Add policy profile table (ops.policy_profiles) with thresholds and tradeoff mode.
  2. Ensure the classifier always emits exactly one of a fixed set of classifications.
  3. Add "why" fields explaining each classification decision.
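The policy above can be sketched as a triage function. The classification labels follow the Q1 example; the 10% borderline band below the cutoff is an assumed tuning knob, not a documented threshold:

```python
def classify_interval(mean_phie: float, phie_min: float,
                      data_valid: bool, caliper_present: bool):
    """Triage per the Q3 policy: hard reject only on invalid data,
    borderline intervals become LOW-confidence candidates.
    Returns (classification, confidence, why)."""
    if not data_valid:
        return ("REJECTED_INVALID_DATA", None, "data integrity failure")
    if mean_phie < 0.9 * phie_min:  # assumed 10% borderline band
        return ("NOT_A_CANDIDATE", None, "clearly below phie cutoff")
    if mean_phie < phie_min or not caliper_present:
        # Keep the event, down-rank confidence instead of discarding.
        return ("BYPASSED-PAY CANDIDATE", "LOW",
                "borderline phie or missing caliper")
    return ("BYPASSED-PAY CANDIDATE", "MODERATE", "passes all cutoffs")
```

The "why" string is the kind of value the policy table's explanation fields would persist.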

---

Q4) CRS and ECEF policy

Why it matters

Cross-well spatial analytics fail if datum/epoch/vertical assumptions are wrong.

Non-negotiable chain

  1. Source CRS identification (EPSG when possible, else named CRS + resolver notes).
  2. Vertical datum declaration (KB, MSL, local datum).
  3. MD -> TVD/TVDSS using deviation survey + minimum curvature.
  4. Horizontal CRS transform to geographic on WGS84 with correct datum transform.
  5. Geographic + height -> ECEF (EPSG:4978).
  6. Persist full transform provenance in audit evidence.

Formula anchors

Given geodetic latitude phi, longitude lambda, height h, with WGS84:

  1. N = a / sqrt(1 - e^2 * sin(phi)^2)
  2. X = (N + h) * cos(phi) * cos(lambda)
  3. Y = (N + h) * cos(phi) * sin(lambda)
  4. Z = (N * (1 - e^2) + h) * sin(phi)

Where a = 6378137.0 and e^2 = 6.69437999014e-3.
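These anchors translate directly to code. A minimal sketch of the geodetic-to-ECEF step (for production, a vetted transform such as pyproj against EPSG:4978 should be preferred over hand-rolled math):

```python
import math

WGS84_A = 6378137.0          # semi-major axis (m)
WGS84_E2 = 6.69437999014e-3  # first eccentricity squared

def geodetic_to_ecef(lat_deg: float, lon_deg: float, h_m: float):
    """WGS84 geodetic (lat, lon in degrees, ellipsoidal height in m)
    to ECEF (X, Y, Z in m), using the formula anchors above."""
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    # Prime vertical radius of curvature N
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(phi) ** 2)
    x = (n + h_m) * math.cos(phi) * math.cos(lam)
    y = (n + h_m) * math.cos(phi) * math.sin(lam)
    z = (n * (1.0 - WGS84_E2) + h_m) * math.sin(phi)
    return x, y, z
```

Sanity checks for the audit evidence: at (0, 0, 0) the result is (a, 0, 0); at the pole, Z equals the semi-minor axis.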

Failure controls

  1. Missing CRS -> quarantine for 3D usage (CRS_UNKNOWN), still allow non-spatial petrophysics.
  2. Missing KB/vertical reference -> keep MD_ONLY, downgrade confidence.
  3. Mislabeled geographic/projected units -> hard fail and quarantine.

---

Q5) Minimum metadata contracts (A/B/C)

Why it matters

Most failures happen before analytics, during ambiguous ingest.

Contract enforcement model

  1. Contract A: well header + location/depth references.
  2. Contract B: required curves (GR, RHOB, NPHI, deep RT, DEPTH/MD).
  3. Contract C: completion/perforation history for bypass labeling.

Implementation detail

  1. Add JSON Schema files for A/B/C in contracts/events/metadata_contracts/.
  2. Add a pre-processing validation endpoint returning pass/quarantine status per contract.
  3. Persist contract violations with machine-readable codes.
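For example, Contract B (required curves) can be enforced with a small check that emits quarantine codes. The canonical mnemonic set and code format here are illustrative; real LAS files use varied mnemonics (e.g. ILD/LLD for deep resistivity), so production code would resolve aliases first:

```python
# Contract B: required curves per the list above. Mnemonics are assumed
# to be canonicalized before this check runs.
CONTRACT_B_REQUIRED = {"GR", "RHOB", "NPHI", "RT_DEEP", "DEPTH"}

def validate_contract_b(curve_mnemonics) -> dict:
    """Return pass/quarantine status plus machine-readable violation codes."""
    missing = sorted(CONTRACT_B_REQUIRED - set(curve_mnemonics))
    return {
        "status": "PASS" if not missing else "QUARANTINE",
        "violations": [f"MISSING_CURVE_{c}" for c in missing],
    }
```

The same shape extends to Contracts A and C by swapping the required-field set.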

---

Q6) Lat/lon enrichment requirements

Why it matters

Enrichment is useful, but not required for initial bypassed-pay proof.

POV decision

  1. Keep enrichment optional in Phase 1.
  2. Build the schema now for future joins by ECEF or geodetic keys.

Implementation note

Do not block core workflow on external API connectivity.

---

Q7) Exact 3D uncertainty cone specification

Why it matters

A single deterministic trajectory line hides real positional uncertainty.

Core method

  1. Best estimate trajectory: minimum curvature on survey stations.
  2. Uncertainty propagation: Monte Carlo perturbation on inclination/azimuth.
  3. Envelope output: P50 and P95 lateral radius per depth station.

Key formulas

For two survey stations 1 and 2 (inclination I, azimuth A, measured depth MD):

  1. Dogleg angle: beta = arccos(cos(I2 - I1) - sin(I1) * sin(I2) * (1 - cos(A2 - A1)))
  2. Ratio factor: RF = (2 / beta) * tan(beta / 2), with RF = 1 when beta = 0
  3. TVD increment: dTVD = (dMD / 2) * (cos(I1) + cos(I2)) * RF

Monte Carlo:

  1. I' = I + Normal(0, sigma_inc)
  2. A' = A + Normal(0, sigma_azi)
  3. Recompute trajectory N times (POV uses 500).
  4. At each depth, compute radial quantiles (P50 and P95) of lateral offsets.
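The method above can be sketched end to end. The sigma defaults and the final-station-only radius are simplifying assumptions; a real cone service would evaluate every station and use an ISCWSA-style error model rather than independent Gaussian perturbations:

```python
import math
import random

def min_curvature_step(md1, inc1, azi1, md2, inc2, azi2):
    """One minimum-curvature step between survey stations.
    Angles in degrees, depths in metres. Returns (dNorth, dEast, dTVD)."""
    i1, i2 = math.radians(inc1), math.radians(inc2)
    a1, a2 = math.radians(azi1), math.radians(azi2)
    dmd = md2 - md1
    cos_beta = (math.cos(i2 - i1)
                - math.sin(i1) * math.sin(i2) * (1 - math.cos(a2 - a1)))
    beta = math.acos(max(-1.0, min(1.0, cos_beta)))  # dogleg angle
    rf = 1.0 if beta < 1e-9 else (2.0 / beta) * math.tan(beta / 2.0)
    dn = (dmd / 2.0) * (math.sin(i1) * math.cos(a1)
                        + math.sin(i2) * math.cos(a2)) * rf
    de = (dmd / 2.0) * (math.sin(i1) * math.sin(a1)
                        + math.sin(i2) * math.sin(a2)) * rf
    dtvd = (dmd / 2.0) * (math.cos(i1) + math.cos(i2)) * rf
    return dn, de, dtvd

def lateral_radius_quantiles(stations, sigma_inc=0.1, sigma_azi=0.5,
                             n_iter=500, seed=42):
    """Monte Carlo P50/P95 lateral offset radius at the final station.

    stations: list of (md_m, inc_deg, azi_deg) rows.
    sigma values are assumed per-station 1-sigma errors in degrees.
    """
    def horizontal_position(rows):
        north = east = 0.0
        for (md1, i1, a1), (md2, i2, a2) in zip(rows, rows[1:]):
            dn, de, _ = min_curvature_step(md1, i1, a1, md2, i2, a2)
            north += dn
            east += de
        return north, east

    rng = random.Random(seed)
    base_n, base_e = horizontal_position(stations)
    radii = []
    for _ in range(n_iter):
        perturbed = [(md, inc + rng.gauss(0.0, sigma_inc),
                      azi + rng.gauss(0.0, sigma_azi))
                     for md, inc, azi in stations]
        p_n, p_e = horizontal_position(perturbed)
        radii.append(math.hypot(p_n - base_n, p_e - base_e))
    radii.sort()
    return radii[int(0.50 * n_iter)], radii[int(0.95 * n_iter)]
```

The returned r50/r95 pair maps directly onto the r50_m / r95_m columns of the deliverable table below.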

Deliverable table

Per depth station:

  1. ecef_x, ecef_y, ecef_z best estimate
  2. r50_m, r95_m
  3. position_confidence from r95_m bands

---

Q8) Seismic scope

Why it matters

Scope control protects delivery speed.

POV rule

  1. Keep SEG-Y/horizon interpretation out of current delivery.
  2. Add only linkage keys:

---

Q9) Operational performance targets

Why it matters

If runtime is unpredictable, user trust drops quickly.

Targets to enforce in CI/perf harness

  1. LAS ingest + QC card: < 60s per well.
  2. Full interpretation: < 5 min per well.
  3. Cone computation (500 iterations): < 2 min per well.
  4. Batch 50 wells: < 4h.

Implementation detail

  1. Add benchmark runner scripts and store results per commit.
  2. Add red/yellow/green run metadata in admin UI.
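A minimal benchmark wrapper for these targets might look like this; the stage keys and the 80%-of-target yellow threshold are assumptions, not part of the document's SLOs:

```python
import time

# Targets from the list above, in seconds; stage keys are hypothetical.
TARGETS_S = {
    "las_ingest_qc": 60,
    "full_interpretation": 300,
    "cone_500_iter": 120,
}

def timed_run(stage: str, fn, *args, **kwargs):
    """Run a pipeline stage and grade it red/yellow/green against its SLO."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    target = TARGETS_S[stage]
    if elapsed < 0.8 * target:      # comfortably inside the SLO
        status = "GREEN"
    elif elapsed < target:          # inside but close to the limit
        status = "YELLOW"
    else:
        status = "RED"
    return result, elapsed, status
```

The (elapsed, status) pair is what the per-commit benchmark store and admin UI run metadata would record.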

---

Q10) Security and compliance constraints

Why it matters

Client trust and legal isolation are mandatory before scale.

POV baseline

  1. Tenant data isolation.
  2. Least-privilege roles per service type.
  3. Audit evidence per transformation run.
  4. Retention policy + deletion workflow evidence.

Postgres controls

  1. Role separation (control, data, worker, readonly).
  2. Row-level isolation policy by tenant/project where needed.
  3. Immutable ledger rules for audit events.
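As a sketch, the tenant row-level isolation policy could take this shape. Table, column, and role names are assumptions about the repo's schema, shown as migration SQL strings in the Python migration style the repo uses:

```python
# Hypothetical migration fragment: tenant row-level security on event rows.
# Table/column names (events.bypassed_pay_event, tenant_id) are illustrative.
ENABLE_RLS = """
ALTER TABLE events.bypassed_pay_event ENABLE ROW LEVEL SECURITY;
"""

TENANT_POLICY = """
CREATE POLICY tenant_isolation ON events.bypassed_pay_event
    USING (tenant_id = current_setting('app.current_tenant')::uuid);
"""

# A worker session would set the tenant GUC before querying:
SET_TENANT = "SET app.current_tenant = %(tenant_id)s;"
```

With this pattern, the data and worker roles never see cross-tenant rows even if application-level filters are missed.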

---

Q11) Acceptance gates

Why it matters

Formal gate sign-off converts technical output to business acceptance.

Gate structure

  1. Gate 1 Data acceptance (QC card correctness).
  2. Gate 2 Technical acceptance (geology plausibility + cutoff policy).
  3. Gate 3 Decision-grade confirmation (actionable candidate + audit sufficiency).

Build action

  1. Add ops.acceptance_gates table with actor, timestamp, notes, artifact refs.
  2. Add API endpoint for gate status and required pending items.

---

Q12) Dataset sizing and first build order

Why it matters

Correct sizing prevents over-engineering and missed deadlines.

Sizing decision

  1. Design for 100 wells in POV operations.
  2. Keep raw files in object store.
  3. Keep metadata/events in PostgreSQL.
  4. Use asynchronous jobs for heavy transforms.

First build sequence (still correct)

  1. Ingest validator.
  2. QC card + quarantine.
  3. Deterministic interpretation.
  4. Bypassed-pay event outputs.
  5. Cone model.

---

5) Data Architecture for AI Readiness

Can AI work from metadata only?

Yes for:

  1. Search, filtering, ranking, routing, and anomaly triage.
  2. Similarity retrieval using event-level features and embeddings.

No for:

  1. Full petrophysical re-derivation.
  2. Curve-shape interpretation.
  3. Raw waveform/trace-level modeling.

Conclusion:

Use both. Metadata is the control plane; raw/derived arrays are the signal plane.

Recommended storage tiers

  1. Raw zone (object storage): original LAS/DLIS/CSV/PDF, immutable.
  2. Curated derived zone (object storage): normalized arrays, interpretation outputs, cone artifacts.
  3. Metadata truth index (PostgreSQL/PostGIS): contracts, lineage, event rows, QC, policies, status.
  4. AI feature zone (PostgreSQL + optional vector index): compact features, embeddings, labels.

What can go wrong and mitigation

  1. Wrong CRS label: quarantine for 3D usage per the Q4 failure controls; never silently assume a datum.
  2. Unit mismatch (ft vs m, API vs normalized): enforce explicit unit mapping at normalization and fail contract validation on ambiguity.
  3. Missing completion history: enforce Contract C; without it, bypass labeling cannot be confirmed and events stay candidate-grade.
  4. Data drift over time: version pipelines and keep per-run validation artifacts so regressions are detectable.
  5. Cross-tenant leakage: enforce row-level security and role separation per Q10.

---

6) Build-vs-Buy Comparison (What Existing Software Already Does)

| Capability | Typical commercial tools | Earthbond opportunity |
|---|---|---|
| Petrophysical interpretation | Strong (mature) | Not primary differentiator by itself |
| Well planning/survey workflows | Strong | Integrate results, do not rebuild entire planning suite |
| Enterprise data model standardization | Growing (OSDU adoption) | Strong value in governance + ingestion quality + cross-source operational readiness |
| Audit-grade CRS/ECEF chain tied to decisions | Usually fragmented | Core differentiator if enforced end-to-end |
| Fast bypassed-pay triage across messy legacy datasets | Partial/manual in many orgs | High-value immediate monetization path |

Bottom line

Do not compete head-on as a pure interpretation workstation. Win as the spatially-governed, auditable, multi-source decision layer on top of existing silos and tools.

---

7) Monetization Before Full AI

You can monetize before full LLM deployment:

  1. Paid ingest QA and CRS reconciliation service.
  2. Bypassed-pay triage reports with acceptance gates.
  3. Audit-ready evidence packs for technical due diligence.
  4. Continuous monitoring dashboard (status, risk flags, data gaps).

Then add AI for:

  1. candidate ranking refinement,
  2. analog retrieval,
  3. natural-language query over validated metadata + selected raw snippets.

---

8) Example Diagrams

A) End-to-end truth chain

```mermaid
flowchart LR
  A["Raw LAS/DLIS + Header"] --> B["Contract Validation (A/B/C)"]
  B -->|Pass| C["Normalize + Unit/Curve Mapping"]
  B -->|Fail| Q["Quarantine + Data Gap Queue"]
  C --> D["MD->TVDSS + CRS->ECEF Transform"]
  D --> E["Petrophysics + Event Detection"]
  E --> F["Bypassed-Pay Classification"]
  F --> G["Well Status Pack + Portfolio Scorecard"]
  D --> H["Uncertainty Cone (P95)"]
  H --> G
  G --> I["Audit Evidence + Acceptance Gates"]
```

B) Metadata vs raw data for AI

```mermaid
flowchart TD
  M["Metadata Index (PostgreSQL)"] --> R["Retrieval / Routing / Governance"]
  O["Raw + Derived Arrays (Object Store)"] --> S["Signal Computation / Reprocessing"]
  R --> X["LLM Assistant + Decision UI"]
  S --> X
```

---

9) Implementation Roadmap (Pragmatic)

Sprint 1 (hardening)

  1. Lock Q1 event schema and contract tests.
  2. Enforce A/B/C metadata contracts and quarantine reason codes.
  3. Add validation metrics tables and API output.

Sprint 2 (spatial rigor)

  1. Finalize per-event ECEF + transform provenance fields.
  2. Add strict vertical datum handling and explicit assumptions.
  3. Add map sanity checks (region bounds and centroid plausibility).

Sprint 3 (uncertainty + acceptance)

  1. Implement Monte Carlo cone service and storage.
  2. Add Gate 1/2/3 sign-off workflow in API and admin UI.
  3. Publish decision-grade report templates from live outputs.

---

10) Repository-Level Implementation Delta (Next Coding Phase)

  1. Database migrations (db/migrations/versions)
  2. API contracts (contracts/openapi/data-plane.yaml)
  3. Worker logic (apps/worker-ingest/src/worker_ingest/main.py)
  4. Domain library (libs/py/welllog_core/__init__.py)
  5. UI (apps/client-web)
  6. Operations/testing scripts (scripts/dev, tests)

---

11) Learning Resources and Comparable Platforms

Geodesy / CRS / ECEF (must-study)

  1. PROJ cartesian conversion docs: https://proj.org/en/stable/operations/conversions/cart.html
  2. pyproj Transformer API: https://pyproj4.github.io/pyproj/stable/api/transformer.html
  3. EPSG 9602 (geographic <-> geocentric method): https://epsg.io/9602-method
  4. IOGP guide on dynamic CRSs and coordinate epochs: https://www.iogp.org/bookstore/product/guide-to-coordinate-reference-systems-including-dynamic-datums-in-the-context-of-iot-and-digital-twins/
  5. NOAA VDatum (vertical datum transformation): https://vdatum.noaa.gov/

Well trajectory and uncertainty

  1. ISCWSA home/resources (error model authority): https://www.iscwsa.net/
  2. Well trajectory methods and uncertainty model documentation: https://jonnymaserati.github.io/welleng/

Data platform and standards

  1. OSDU Forum landing page: https://osduforum.org
  2. OSDU GitHub organization (public code/assets): https://github.com/osdu
  3. PostgreSQL partitioning: https://www.postgresql.org/docs/current/ddl-partitioning.html
  4. PostgreSQL row security: https://www.postgresql.org/docs/current/ddl-rowsecurity.html
  5. PostgreSQL BRIN indexes: https://www.postgresql.org/docs/current/brin.html
  6. PostGIS spatial indexing: https://postgis.net/workshops/uk/postgis-intro/indexing.html

Well-log data format references

  1. lasio (Python LAS parser with LAS/CWLS context): https://lasio.readthedocs.io/en/latest/
  2. Energistics RP66v1 (DLIS) specification: https://energistics.org/sites/default/files/RP66/V1/Toc/main.html

Comparable software references

  1. SLB Techlog: https://www.slb.com/products-and-services/delivering-digital-at-scale/software/techlog-wellbore-software/techlog
  2. S&P Global PETRA: https://www.spglobal.com/commodityinsights/en/ci/products/petra.html
  3. Landmark software family (Halliburton): https://www.halliburton.com
  4. OpendTect (open seismic interpretation baseline): https://www.opendtect.org/osr/
  5. Potree WebGL point-cloud viewer baseline: https://github.com/potree/potree

Video hubs and training starts

  1. ISCWSA media/videos: https://www.iscwsa.net/media/
  2. OSDU Forum updates/events hub: https://osduforum.org
  3. NOAA VDatum tutorial resources: https://vdatum.noaa.gov/

---

12) Decision Guidance: Are We Structuring for AI or Monetizing First?

Do both in sequence:

  1. Monetize now with deterministic outputs and audit-ready workflow.
  2. Structure now so AI can be added without re-architecture.

The correct design principle:

  1. Metadata drives trust and governance.
  2. Raw/derived data drives scientific signal extraction.
  3. AI should consume both through controlled, versioned pipelines.

If forced to choose priority now: prioritize deterministic QA + bypassed-pay outcomes + acceptance gates first. This generates client value immediately and creates reliable training data for later AI.