Earthbond POV Technical QA Evaluation and Implementation Blueprint (v1)
1) Purpose
This document converts Earthbond_POV_Technical_QA.pdf into an implementation-grade, standalone POV plan that fits the code already built in this repository.
It answers four decisions explicitly:
- What is already implemented vs missing.
- Why each required item matters technically and commercially.
- Which formulas and controls are non-negotiable.
- How to structure PostgreSQL + object storage so AI can be added safely without rework.
---
2) Executive Verdict
Yes, this build is worthwhile, but only if we keep Earthbond differentiated on:
- Spatial truth controls (CRS/datum/epoch/vertical + ECEF canonicalization).
- Audit-grade lineage (input -> transform chain -> output event).
- Multi-source operational fusion (well logs + completion + spatial context + real-time signals).
Petrophysical calculations alone are not enough differentiation. Commercial software already does that well. The defensible value is:
- Cross-silo harmonization and QA at ingest.
- Deterministic, explainable event outputs with traceability.
- Fast operational decision packaging (QC card + bypass candidate + uncertainty context).
---
3) Current Build vs PDF Q1-Q12
| QA Item | Status in Current Repo | Notes |
|---|---|---|
| Q1 Output schema per well | Partial | Well events exist, but final Earthbond event fields (full ECEF-centric schema and confidence/action semantics) need tightening. |
| Q2 Ground truth validation | Partial | Integrity checks exist; geological validation metrics and formal acceptance gate automation need expansion. |
| Q3 Error tradeoff policy | Partial | Conservative triage behavior is present; thresholds and false-negative control should be hard-coded as policy profiles. |
| Q4 CRS/ECEF policy | Strong Partial | Strict CRS and ECEF logic exists in platform; well-log pipeline needs full per-event ECEF + vertical datum trace fields. |
| Q5 Metadata contracts A/B/C | Partial | Contract skeleton exists; required-field quarantine and typed contract validation need stricter enforcement and richer schemas. |
| Q6 Lat/lon enrichment | Deferred by design | Correct to defer external enrichments for core POV; schema extension points are already feasible. |
| Q7 Uncertainty cone spec | Gap | Concept exists, but full minimum-curvature + Monte Carlo P95 cone service for well trajectories is not complete. |
| Q8 Seismic scope | Out of scope now | Correctly deferred. Need schema keys only for future linkage. |
| Q9 Performance targets | Partial | Job framework exists; benchmark harness and SLO dashboards need to be formalized. |
| Q10 Security/compliance | Partial/Strong | Tenant roles, audit tables, and isolation exist; full RLS-by-tenant and compliance controls should be completed before enterprise scale. |
| Q11 Acceptance gates | Partial | Reports exist; signed gate workflow and stage-based acceptance API should be explicit. |
| Q12 Sizing/build sequence | Strong | Validator-first path is aligned with current architecture and should remain the first hardening priority. |
---
4) Item-by-Item Implementation Plan (Q1-Q12)
Q1) Exact decision output per well
Why it matters
Decision users need a deterministic, comparable event record. Without strict event schema, outputs are hard to audit and cannot be trusted across projects.
What to build
- Lock a canonical `bypassed_pay_event` contract (API + DB + object artifacts).
- Include both interpretable fields (`depth_top_tvdss`) and canonical spatial fields (`depth_top_ecef_z`).
- Keep `recommended_action`, `confidence`, and `cutoffs_applied` mandatory.
Required output layers
- Event table (one row per pay interval).
- Well Status Pack (well-level JSON + PDF/HTML).
- Portfolio scorecard (one row per well).
Practical example
If mean_phie passes by a narrow margin and caliper is missing, keep the event and down-rank confidence to LOW; do not auto-discard.
Example event JSON (target output contract)
```json
{
  "event_id": "WELL-042-E03",
  "well_id": "WELL-042",
  "depth_top_tvdss": -3124.5,
  "depth_base_tvdss": -3136.2,
  "depth_top_ecef_z": 3478921.332,
  "depth_base_ecef_z": 3478912.004,
  "gross_thickness_m": 11.7,
  "net_pay_m": 8.4,
  "mean_vsh": 0.29,
  "mean_phie": 0.11,
  "mean_rt": 13.8,
  "mean_sw": 0.42,
  "qc_flag": "CAUTION",
  "source_crs": "EPSG:26741",
  "ecef_transform_applied": true,
  "completion_status": "BYPASSED",
  "classification": "BYPASSED-PAY CANDIDATE",
  "confidence": "MODERATE",
  "recommended_action": "cased-hole_pulsed-neutron_evaluation",
  "cutoffs_applied": {
    "vsh_max": 0.35,
    "phie_min": 0.08,
    "rt_min": 10.0,
    "thickness_min_m": 1.0
  },
  "processing_version": "welllog-pov-v1.0.0"
}
```
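The mandatory-field semantics above can be enforced with a strict contract check at write time. The sketch below is illustrative, not the repo implementation; field names follow the example JSON, and the helper name `validate_event` is an assumption.

```python
# Strict event-contract check for the Q1 output (hypothetical helper).
# Returns machine-readable violation codes; an empty list means PASS.
REQUIRED_FIELDS = {
    "event_id", "well_id", "depth_top_tvdss", "depth_base_tvdss",
    "qc_flag", "source_crs", "classification", "confidence",
    "recommended_action", "cutoffs_applied", "processing_version",
}
ALLOWED_CONFIDENCE = {"LOW", "MODERATE", "HIGH"}

def validate_event(event: dict) -> list[str]:
    """Collect missing-field and bad-enum violations for one event row."""
    violations = [f"MISSING:{f}" for f in sorted(REQUIRED_FIELDS - event.keys())]
    if event.get("confidence") not in ALLOWED_CONFIDENCE:
        violations.append("BAD_ENUM:confidence")
    return violations
```

Wiring this into both the API layer and the DB insert path keeps the two representations from drifting apart.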
---
Q2) Ground truth for validation
Why it matters
Without objective validation gates, “looks right” can hide systemic errors.
Two-level validation model
- Pipeline integrity validation:
- deterministic replay
- cutoff consistency
- depth monotonicity
- ECEF range checks
- Geological validation:
- 3-5 known-outcome wells
- detection and false-positive targets
- reviewer sign-off attached to run id
Immediate implementation
- Add `ops.validation_wells` + `ops.validation_metrics`.
- Compute:
- detection rate
- false positive rate
- missed known-pay count
- Store per-run validation artifact as JSON + markdown report in object store.
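The three metrics above reduce to simple set arithmetic over interval identifiers from the known-outcome wells. A minimal sketch, assuming interval IDs as strings (the function name and argument names are illustrative):

```python
# Q2 geological validation metrics over validation-well intervals.
def validation_metrics(known_pay: set[str], known_wet: set[str],
                       predicted: set[str]) -> dict:
    """Detection rate, false-positive rate, and missed known-pay count."""
    detected = known_pay & predicted          # true positives
    false_pos = known_wet & predicted         # flagged intervals known to be wet
    return {
        "detection_rate": len(detected) / len(known_pay) if known_pay else 0.0,
        "false_positive_rate": len(false_pos) / len(known_wet) if known_wet else 0.0,
        "missed_known_pay": len(known_pay - predicted),
    }
```

The resulting dict can be serialized directly into the per-run JSON artifact stored alongside the markdown report.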
---
Q3) Error tradeoff policy (false negative vs false positive)
Why it matters
Triage systems should minimize missed opportunities first.
Policy to codify
- False negatives are more costly than false positives.
- Borderline intervals become `LOW` confidence candidates, not hard rejects.
- Hard reject only on data invalidity (not reservoir marginality).
Implementation detail
- Add policy profile table (`ops.policy_profiles`) with thresholds and tradeoff mode.
- Ensure classifier always emits one of:
  - `BYPASSED_PAY_CANDIDATE`
  - `TESTED_PAY`
  - `TESTED_WET`
  - `INCONCLUSIVE`
- Add "why" fields:
  - `reason_codes[]`
  - `confidence_drivers`
  - `data_gap_impact`
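The false-negative-averse policy can be sketched as a small routing function. This is an illustrative simplification, not the repo classifier: the borderline band width (`border`) and cutoff defaults mirror the Q1 example, and the reason codes are hypothetical.

```python
# Q3 tradeoff policy sketch: borderline intervals are down-ranked,
# never hard-rejected; hard INCONCLUSIVE only on data invalidity or
# a clear miss of the cutoffs.
def classify(mean_phie: float, mean_vsh: float, data_valid: bool,
             tested: bool = False, wet: bool = False,
             phie_min: float = 0.08, vsh_max: float = 0.35,
             border: float = 0.01):
    if not data_valid:
        return "INCONCLUSIVE", ["DATA_INVALID"]
    if tested:
        return ("TESTED_WET" if wet else "TESTED_PAY"), ["COMPLETION_MATCH"]
    if mean_phie >= phie_min and mean_vsh <= vsh_max:
        narrow = (mean_phie - phie_min) < border
        return "BYPASSED_PAY_CANDIDATE", (["NARROW_PHIE_MARGIN"] if narrow else [])
    if mean_phie >= phie_min - border:
        # Reservoir marginality: keep as candidate, down-rank confidence.
        return "BYPASSED_PAY_CANDIDATE", ["BORDERLINE_LOW_CONFIDENCE"]
    return "INCONCLUSIVE", ["BELOW_CUTOFFS"]
```

Keeping the thresholds as function parameters maps directly onto rows in `ops.policy_profiles`.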
---
Q4) CRS and ECEF policy
Why it matters
Cross-well spatial analytics fail if datum/epoch/vertical assumptions are wrong.
Non-negotiable chain
- Source CRS identification (`EPSG` code when possible, else named CRS + resolver notes).
- Vertical datum declaration (`KB`, `MSL`, local datum).
- MD -> TVD/TVDSS using deviation survey + minimum curvature.
- Horizontal CRS transform to geographic on WGS84 with correct datum transform.
- Geographic + height -> ECEF (EPSG:4978).
- Persist full transform provenance in audit evidence.
Formula anchors
Given geodetic latitude phi, longitude lambda, height h, with WGS84:
- `N = a / sqrt(1 - e^2 * sin(phi)^2)`
- `X = (N + h) * cos(phi) * cos(lambda)`
- `Y = (N + h) * cos(phi) * sin(lambda)`
- `Z = (N * (1 - e^2) + h) * sin(phi)`
Where a = 6378137.0 and e^2 = 6.69437999014e-3.
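The closed-form conversion above translates directly into code. A minimal sketch using only the standard library (a production pipeline would instead go through pyproj with an explicit EPSG:4978 transformer so the datum transform is recorded in provenance):

```python
# Geodetic (WGS84) -> ECEF, using the formula anchors from Q4.
import math

WGS84_A = 6378137.0            # semi-major axis a (m)
WGS84_E2 = 6.69437999014e-3    # first eccentricity squared e^2

def geodetic_to_ecef(lat_deg: float, lon_deg: float, h_m: float):
    """Geodetic latitude/longitude (degrees) + ellipsoidal height (m) -> ECEF (m)."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(phi) ** 2)
    x = (n + h_m) * math.cos(phi) * math.cos(lam)
    y = (n + h_m) * math.cos(phi) * math.sin(lam)
    z = (n * (1.0 - WGS84_E2) + h_m) * math.sin(phi)
    return x, y, z
```

Sanity anchors for tests: (0, 0, 0) maps to (a, 0, 0) and the north pole maps to Z equal to the semi-minor axis, about 6356752.314 m.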
Failure controls
- Missing CRS -> quarantine for 3D usage (`CRS_UNKNOWN`), still allow non-spatial petrophysics.
- Missing KB/vertical reference -> keep `MD_ONLY`, downgrade confidence.
- Mislabeled geographic/projected units -> hard fail and quarantine.
---
Q5) Minimum metadata contracts (A/B/C)
Why it matters
Most failures happen before analytics, during ambiguous ingest.
Contract enforcement model
- Contract A: well header + location/depth references.
- Contract B: required curves (`GR`, `RHOB`, `NPHI`, deep `RT`, `DEPTH`/`MD`).
- Contract C: completion/perforation history for bypass labeling.
Implementation detail
- Add JSON Schema files for A/B/C in `contracts/events/metadata_contracts/`.
- Add pre-processing validation endpoint returning:
  - `PASS`
  - `PARTIAL_PROCESS`
  - `QUARANTINE`
- Persist contract violations with machine-readable codes.
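The three-way routing for Contract B can be sketched as a pure function over the set of present curves. The partial-processing rule here (depth plus GR is the minimum for partial petrophysics) is an illustrative assumption, not a stated requirement:

```python
# Contract B gate sketch: route a well by required-curve coverage.
CONTRACT_B_CURVES = {"GR", "RHOB", "NPHI", "RT", "DEPTH"}

def contract_b_outcome(present_curves: set[str]):
    """Return (PASS | PARTIAL_PROCESS | QUARANTINE, machine-readable codes)."""
    missing = sorted(CONTRACT_B_CURVES - present_curves)
    codes = [f"MISSING_CURVE:{c}" for c in missing]
    if not missing:
        return "PASS", []
    # Assumption: DEPTH + GR is enough for a degraded, partial workflow.
    if {"DEPTH", "GR"} <= present_curves:
        return "PARTIAL_PROCESS", codes
    return "QUARANTINE", codes
```

Persisting the returned codes verbatim gives the machine-readable violation trail the contract model calls for.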
---
Q6) Lat/lon enrichment requirements
Why it matters
Enrichment is useful, but not required for initial bypassed-pay proof.
POV decision
- Keep enrichment optional in Phase 1.
- Build schema now for future joins by ECEF or geodetic keys:
- nearby wells
- lease boundaries
- DEM/LiDAR
- hazard zones
Implementation note
Do not block core workflow on external API connectivity.
---
Q7) Exact 3D uncertainty cone specification
Why it matters
A single deterministic trajectory line hides real positional uncertainty.
Core method
- Best estimate trajectory: minimum curvature on survey stations.
- Uncertainty propagation: Monte Carlo perturbation on inclination/azimuth.
- Envelope output: P50 and P95 lateral radius per depth station.
Key formulas
For two survey stations 1 and 2:
- Dogleg angle: `DL = acos(cos(I1)*cos(I2) + sin(I1)*sin(I2)*cos(A2-A1))`
- Ratio factor: `RF = 1` if `DL ~ 0`, else `RF = (2/DL) * tan(DL/2)`
- TVD increment: `dTVD = (dMD/2) * (cos(I1) + cos(I2)) * RF`
Monte Carlo:
- Perturb each station: `I' = I + Normal(0, sigma_inc)`, `A' = A + Normal(0, sigma_azi)`
- Recompute trajectory `N` times (POV uses 500).
- At each depth, compute radial quantiles of lateral offsets: `r50 = quantile(r, 0.50)`, `r95 = quantile(r, 0.95)`
Deliverable table
Per depth station:
- `ecef_x`, `ecef_y`, `ecef_z` best estimate
- `r50_m`, `r95_m`
- `position_confidence` derived from `r95_m` bands
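The core method can be sketched end to end in a few dozen lines. This is an illustrative simplification, not the cone service: angles are radians, the local north/east frame stands in for full ECEF, the sigma values are placeholders, and only the final-station radius is reported (a real implementation reports per-station quantiles).

```python
# Q7 sketch: minimum-curvature trajectory + Monte Carlo P50/P95 lateral radius.
import math
import random
import statistics

def min_curvature_step(md1, inc1, azi1, md2, inc2, azi2):
    """(dNorth, dEast, dTVD) between two stations; angles in radians."""
    dmd = md2 - md1
    cos_dl = (math.cos(inc1) * math.cos(inc2)
              + math.sin(inc1) * math.sin(inc2) * math.cos(azi2 - azi1))
    dl = math.acos(max(-1.0, min(1.0, cos_dl)))          # dogleg angle
    rf = 1.0 if dl < 1e-9 else (2.0 / dl) * math.tan(dl / 2.0)
    dn = dmd / 2.0 * (math.sin(inc1) * math.cos(azi1) + math.sin(inc2) * math.cos(azi2)) * rf
    de = dmd / 2.0 * (math.sin(inc1) * math.sin(azi1) + math.sin(inc2) * math.sin(azi2)) * rf
    dtvd = dmd / 2.0 * (math.cos(inc1) + math.cos(inc2)) * rf
    return dn, de, dtvd

def cone_radii(stations, sigma_inc=0.002, sigma_azi=0.005, n_iter=500, seed=42):
    """P50/P95 lateral offset of the perturbed final position vs best estimate."""
    rng = random.Random(seed)
    def position(survey):
        n = e = 0.0
        for s1, s2 in zip(survey, survey[1:]):
            dn, de, _ = min_curvature_step(*s1, *s2)
            n, e = n + dn, e + de
        return n, e
    n0, e0 = position(stations)
    radii = []
    for _ in range(n_iter):
        perturbed = [(md, i + rng.gauss(0, sigma_inc), a + rng.gauss(0, sigma_azi))
                     for md, i, a in stations]
        n, e = position(perturbed)
        radii.append(math.hypot(n - n0, e - e0))
    q = statistics.quantiles(radii, n=100)
    return q[49], q[94]   # r50_m, r95_m
```

For production use, the ISCWSA error models give correlated, depth-dependent sigmas rather than the independent per-station noise assumed here.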
---
Q8) Seismic scope
Why it matters
Scope control protects delivery speed.
POV rule
- Keep SEG-Y/horizon interpretation out of current delivery.
- Add only linkage keys:
  - `well_id`
  - `depth_tvdss_m`
  - optional future `seismic_horizon_id`
---
Q9) Operational performance targets
Why it matters
If runtime is unpredictable, user trust drops quickly.
Targets to enforce in CI/perf harness
- LAS ingest + QC card: < 60s per well.
- Full interpretation: < 5 min per well.
- Cone computation (500 iterations): < 2 min per well.
- Batch 50 wells: < 4h.
Implementation detail
- Add benchmark runner scripts and store results per commit.
- Add red/yellow/green run metadata in admin UI.
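A minimal sketch of what the benchmark runner could record per job, assuming the SLO numbers listed above and an illustrative 80% threshold for yellow (both the harness shape and the threshold are assumptions):

```python
# Q9 benchmark harness sketch: time a job callable against its SLO target
# and emit the red/yellow/green status shown in the admin UI.
import time

TARGETS_S = {"las_ingest_qc": 60, "full_interpretation": 300, "cone_500": 120}

def benchmark(job_name: str, fn, *args) -> dict:
    start = time.perf_counter()
    fn(*args)
    elapsed = time.perf_counter() - start
    target = TARGETS_S[job_name]
    if elapsed <= 0.8 * target:
        status = "green"
    elif elapsed <= target:
        status = "yellow"
    else:
        status = "red"
    return {"job": job_name, "elapsed_s": round(elapsed, 3),
            "target_s": target, "status": status}
```

Storing one such record per commit gives the trend data the SLO dashboard needs.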
---
Q10) Security and compliance constraints
Why it matters
Client trust and legal isolation are mandatory before scale.
POV baseline
- Tenant data isolation.
- Least-privilege roles per service type.
- Audit evidence per transformation run.
- Retention policy + deletion workflow evidence.
Postgres controls
- Role separation (`control`, `data`, `worker`, `readonly`).
- Row-level isolation policy by tenant/project where needed.
- Immutable ledger rules for audit events.
---
Q11) Acceptance gates
Why it matters
Formal gate sign-off converts technical output to business acceptance.
Gate structure
- Gate 1 Data acceptance (QC card correctness).
- Gate 2 Technical acceptance (geology plausibility + cutoff policy).
- Gate 3 Decision-grade confirmation (actionable candidate + audit sufficiency).
Build action
- Add `ops.acceptance_gates` table with actor, timestamp, notes, artifact refs.
- Add API endpoint for gate status and required pending items.
---
Q12) Dataset sizing and first build order
Why it matters
Correct sizing prevents over-engineering and missed deadlines.
Sizing decision
- Design for 100 wells in POV operations.
- Keep raw files in object store.
- Keep metadata/events in PostgreSQL.
- Use asynchronous jobs for heavy transforms.
First build sequence (still correct)
- Ingest validator.
- QC card + quarantine.
- Deterministic interpretation.
- Bypassed-pay event outputs.
- Cone model.
---
5) Data Architecture for AI Readiness
Can AI work from metadata only?
Yes for:
- Search, filtering, ranking, routing, and anomaly triage.
- Similarity retrieval using event-level features and embeddings.
No for:
- Full petrophysical re-derivation.
- Curve-shape interpretation.
- Raw waveform/trace-level modeling.
Conclusion:
Use both. Metadata is the control plane; raw/derived arrays are the signal plane.
Recommended storage tiers
- Raw zone (object storage): original LAS/DLIS/CSV/PDF, immutable.
- Curated derived zone (object storage): normalized arrays, interpretation outputs, cone artifacts.
- Metadata truth index (PostgreSQL/PostGIS): contracts, lineage, event rows, QC, policies, status.
- AI feature zone (PostgreSQL + optional vector index): compact features, embeddings, labels.
What can go wrong and mitigation
- Wrong CRS label:
- impact: false spatial joins and wrong nearby-well risk.
- mitigation: resolver + geodetic plausibility checks + quarantine.
- Unit mismatch (ft vs m, API vs normalized):
- impact: false pay detection.
- mitigation: explicit unit normalization and unit lineage fields.
- Missing completion history:
- impact: candidate misclassification.
- mitigation: classify as
UNKNOWNcompletion, do not overstate confidence.
- Data drift over time:
- impact: unstable decision thresholds.
- mitigation: versioned cutoff profiles + run reproducibility artifacts.
- Cross-tenant leakage:
- impact: legal/compliance failure.
- mitigation: strict tenant scoping, audited access, isolated namespaces.
---
6) Build-vs-Buy Comparison (What Existing Software Already Does)
| Capability | Typical commercial tools | Earthbond opportunity |
|---|---|---|
| Petrophysical interpretation | Strong (mature) | Not primary differentiator by itself |
| Well planning/survey workflows | Strong | Integrate results, do not rebuild entire planning suite |
| Enterprise data model standardization | Growing (OSDU adoption) | Strong value in governance + ingestion quality + cross-source operational readiness |
| Audit-grade CRS/ECEF chain tied to decisions | Usually fragmented | Core differentiator if enforced end-to-end |
| Fast bypassed-pay triage across messy legacy datasets | Partial/manual in many orgs | High-value immediate monetization path |
Bottom line
Do not compete head-on as a pure interpretation workstation. Win as the spatially-governed, auditable, multi-source decision layer on top of existing silos and tools.
---
7) Monetization Before Full AI
You can monetize before full LLM deployment:
- Paid ingest QA and CRS reconciliation service.
- Bypassed-pay triage reports with acceptance gates.
- Audit-ready evidence packs for technical due diligence.
- Continuous monitoring dashboard (status, risk flags, data gaps).
Then add AI for:
- candidate ranking refinement,
- analog retrieval,
- natural-language query over validated metadata + selected raw snippets.
---
8) Example Diagrams
A) End-to-end truth chain
B) Metadata vs raw data for AI
---
9) Implementation Roadmap (Pragmatic)
Sprint 1 (hardening)
- Lock Q1 event schema and contract tests.
- Enforce A/B/C metadata contracts and quarantine reason codes.
- Add validation metrics tables and API output.
Sprint 2 (spatial rigor)
- Finalize per-event ECEF + transform provenance fields.
- Add strict vertical datum handling and explicit assumptions.
- Add map sanity checks (region bounds and centroid plausibility).
Sprint 3 (uncertainty + acceptance)
- Implement Monte Carlo cone service and storage.
- Add Gate 1/2/3 sign-off workflow in API and admin UI.
- Publish decision-grade report templates from live outputs.
---
10) Repository-Level Implementation Delta (Next Coding Phase)
- Database migrations (`db/migrations/versions`)
  - add `0011_welllog_event_schema_hardening.py`:
    - enforce Q1-required event columns
    - add spatial provenance fields (`source_crs`, `transform_chain`, `ecef_transform_applied`)
    - add validation metrics tables and acceptance gate tables
- API contracts (`contracts/openapi/data-plane.yaml`)
  - tighten request/response schemas for:
    - `POST /well-logs/jobs/normalize`
    - `POST /well-logs/jobs/interpret`
    - `POST /well-logs/jobs/bypassed-pay`
    - `GET /projects/{project_id}/wells/{well_id}/status-pack`
  - add gate/validation endpoints for Q2 and Q11
- Worker logic (`apps/worker-ingest/src/worker_ingest/main.py`)
  - enrich normalize results with explicit contract validation outcomes (A/B/C)
  - persist transform provenance and quarantine reason codes
  - add cone calculation job type and artifact output
- Domain library (`libs/py/welllog_core/__init__.py`)
  - add strict event contract builder
  - add uncertainty cone computation helpers
  - add deterministic reason-code generation
- UI (`apps/client-web`)
  - add "Acceptance Gates" panel per project
  - add validation metrics widget (detection, false positive, missed known-pay)
  - add cone preview controls and event confidence explanation
- Operations/testing scripts (`scripts/dev`, `tests`)
  - add benchmark harness for Q9 targets
  - add golden-run replay tests for deterministic parity
  - add validation-well comparison test fixtures
---
11) Learning Resources and Comparable Platforms
Geodesy / CRS / ECEF (must-study)
- PROJ cartesian conversion docs: https://proj.org/en/stable/operations/conversions/cart.html
- pyproj Transformer API: https://pyproj4.github.io/pyproj/stable/api/transformer.html
- EPSG 9602 (geographic <-> geocentric method): https://epsg.io/9602-method
- IOGP guide on dynamic CRSs and coordinate epochs: https://www.iogp.org/bookstore/product/guide-to-coordinate-reference-systems-including-dynamic-datums-in-the-context-of-iot-and-digital-twins/
- NOAA VDatum (vertical datum transformation): https://vdatum.noaa.gov/
Well trajectory and uncertainty
- ISCWSA home/resources (error model authority): https://www.iscwsa.net/
- Well trajectory methods and uncertainty model documentation: https://jonnymaserati.github.io/welleng/
Data platform and standards
- OSDU Forum landing page: https://osduforum.org
- OSDU GitHub organization (public code/assets): https://github.com/osdu
- PostgreSQL partitioning: https://www.postgresql.org/docs/current/ddl-partitioning.html
- PostgreSQL row security: https://www.postgresql.org/docs/current/ddl-rowsecurity.html
- PostgreSQL BRIN indexes: https://www.postgresql.org/docs/current/brin.html
- PostGIS spatial indexing: https://postgis.net/workshops/uk/postgis-intro/indexing.html
Well-log data format references
- `lasio` (Python LAS parser with LAS/CWLS context): https://lasio.readthedocs.io/en/latest/
- Energistics RP66v1 (DLIS) specification: https://energistics.org/sites/default/files/RP66/V1/Toc/main.html
Comparable software references
- SLB Techlog: https://www.slb.com/products-and-services/delivering-digital-at-scale/software/techlog-wellbore-software/techlog
- S&P Global PETRA: https://www.spglobal.com/commodityinsights/en/ci/products/petra.html
- Landmark software family (Halliburton): https://www.halliburton.com
- OpendTect (open seismic interpretation baseline): https://www.opendtect.org/osr/
- Potree WebGL point-cloud viewer baseline: https://github.com/potree/potree
Video hubs and training starts
- ISCWSA media/videos: https://www.iscwsa.net/media/
- OSDU Forum updates/events hub: https://osduforum.org
- NOAA VDatum tutorial resources: https://vdatum.noaa.gov/
---
12) Decision Guidance: Are We Structuring for AI or Monetizing First?
Do both in sequence:
- Monetize now with deterministic outputs and audit-ready workflow.
- Structure now so AI can be added without re-architecture.
The correct design principle:
- Metadata drives trust and governance.
- Raw/derived data drives scientific signal extraction.
- AI should consume both through controlled, versioned pipelines.
If forced to choose priority now: prioritize deterministic QA + bypassed-pay outcomes + acceptance gates first. This generates client value immediately and creates reliable training data for later AI.