IP And Copyright Compliance Review

Scope

This document reviews the intellectual-property and copyright posture of the repository and the current Volve field-package build.

It is an engineering compliance review.

It is not legal advice and should not replace counsel review.

Recorded Authorship Position

For repository and build-provenance purposes, this documentation records the following asserted position:

Johann F.R Wilhelm is the author and creator of the project direction, problem framing, and intellectual build intent.
Assistant-generated code and documentation in this repository were produced as implementation assistance within that user-directed frame.
This repository therefore should not be described as a purely machine-originated product artifact.

This section records an internal authorship and provenance position.

It does not, by itself, settle legal title, copyright registration, assignment, or enforceability.

Official Baseline Sources

These are the primary sources used for the legal framing in this review:

U.S. Copyright Office, Circular 61, Copyright Registration of Computer Programs

https://www.copyright.gov/circs/circ61.pdf

U.S. Copyright Office, Circular 33, Works Not Protected by Copyright

https://www.copyright.gov/circs/circ33.pdf

U.S. Copyright Office, AI policy/report materials on copyrightability and human authorship

https://www.copyright.gov/ai/

These sources matter because the build mixes:

user-authored requirements,
assistant-generated code,
third-party open-source libraries,
third-party datasets,
and machine-derived factual outputs.

Core Principles Applied To This Build

1. Facts and measurements are treated differently from expressive works

Examples in this system:

coordinates,
well names,
sample counts,
curve mnemonics,
production values,
top depths,
archive member paths.

These are generally factual or functional data points, not expressive prose.

2. Source code is protected as expression

The Python, TypeScript, HTML, CSS, SQL, and migration files in this repo are expressive code artifacts.

That means code ownership and licensing need an explicit policy.

3. AI-generated material is not automatically protected the same way as human-authored expression

This repository contains assistant-generated code and documentation.

Do not assume that machine-originated expression is registrable, or owned, in the same way as human-authored source unless a clear authorship and review position is on record.

4. Dataset license terms still apply even when data is normalized

Canonicalization does not erase the source license.

If source data is subject to attribution, noncommercial, share-alike, or resale restrictions, those constraints still apply to storage, redistribution, and downstream product behavior.

What The Current Build Gets Right

1. Raw provenance is preserved

The build keeps source-object, bundle, alias, source-link, and artifact-level provenance.

That is good for:

traceability,
attribution,
takedown scope,
and source-boundary enforcement.
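
How preserved provenance helps with takedown scope can be sketched as a walk over source links. This is a minimal illustration only: it assumes a hypothetical in-memory mapping from each artifact to the objects it was derived from, and the names SOURCE_LINKS and takedown_scope, along with the example ids, are invented for the sketch rather than taken from the repository.

```python
from collections import defaultdict

# Hypothetical provenance links: artifact id -> ids it was derived from.
SOURCE_LINKS = {
    "canonical/well_a": ["source/obj-001"],
    "canonical/well_b": ["source/obj-002"],
    "report/summary": ["canonical/well_a", "canonical/well_b"],
}


def takedown_scope(source_id, links):
    """Return every artifact that depends, directly or transitively, on source_id."""
    # Invert the links: parent id -> artifacts derived from it.
    children = defaultdict(set)
    for artifact, parents in links.items():
        for parent in parents:
            children[parent].add(artifact)

    # Depth-first walk from the source object through its descendants.
    affected, stack = set(), [source_id]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in affected:
                affected.add(child)
                stack.append(child)
    return affected
```

If a takedown request names one source object, the returned set is the list of canonical records and reports that would need review or removal.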

2. The system separates factual normalization from source documents

Much of the current canonical layer stores:

structured facts,
inventories,
derived geometry,
scores,
and artifact metadata.

That is materially better than republishing full third-party files or long source excerpts.

3. The dataset’s license presence is detectable

The Volve package includes a license file in the reservoir-model package.

The repo has already surfaced that the Volve dataset is distributed with license conditions and is not simply unrestricted public-domain material.

4. Access is gated

The app uses authentication, page access, and project access.

That matters because dataset rights enforcement is not only a documentation problem. It is also an access-control problem.

What The Build Still Does Not Fully Solve

1. No top-level repository license

There is currently no top-level LICENSE, NOTICE, or equivalent ownership statement in the repo root.

That means the code’s outward licensing position is incomplete.

Practical consequence:

you cannot honestly say the repo’s copyright/license posture is fully finished.

2. No explicit third-party notice file

The repository bundles third-party frontend assets under:

Those assets need explicit notice/attribution handling in a maintained third-party notice file.

3. AI authorship posture is not formally documented

The repo contains extensive assistant-generated code and docs.

There is no explicit internal record stating:

who reviewed it,
who approved it,
which portions are treated as human-authored derivative revisions,
and what ownership model applies.
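
One possible shape for such a record is sketched below. The class name AuthorshipRecord and its field names are assumptions made for illustration; they map one-to-one onto the four missing items listed above and do not describe anything that exists in the repository today.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class AuthorshipRecord:
    """Hypothetical signoff entry for one assistant-generated module or document."""

    path: str                    # file or module the record covers
    reviewed_by: str             # who reviewed it
    approved_by: str             # who approved it
    human_revised_portions: str  # portions treated as human-authored revisions
    ownership_model: str         # ownership model applied to this file
    signed_off: date             # date of the signoff

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the code it covers."""
        record = asdict(self)
        record["signed_off"] = self.signed_off.isoformat()
        return json.dumps(record, sort_keys=True)
```

Records like this could be kept in a single reviewed file so that the authorship position for each major module is auditable.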

4. Dataset-specific license enforcement is not yet complete in product flows

The system stages, validates, normalizes, and exports Volve data.

What still needs to be explicit:

attribution display,
export/download warnings,
noncommercial and non-resale boundary handling,
dataset-license propagation into manifests and artifacts.
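
License propagation into manifests could look roughly like the following sketch. The manifest layout, the key names, and the flag values in DATASET_LICENSE are placeholders: the real terms must be read from the license file shipped with the dataset, not from this example.

```python
import copy

# Placeholder terms for illustration only; take the real values from the dataset license file.
DATASET_LICENSE = {
    "dataset": "Volve",
    "attribution_required": True,
    "commercial_use_allowed": False,
    "resale_allowed": False,
}


def attach_license(manifest, license_terms):
    """Return a copy of manifest with the license stamped on it and on each artifact."""
    stamped = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    stamped["dataset_license"] = dict(license_terms)
    for artifact in stamped.get("artifacts", []):
        artifact["dataset_license"] = license_terms["dataset"]
    return stamped


def export_warnings(license_terms):
    """Build the notices an export or download screen would need to show."""
    warnings = []
    if license_terms.get("attribution_required"):
        warnings.append("Attribution to the dataset source is required.")
    if not license_terms.get("commercial_use_allowed", False):
        warnings.append("Commercial use is restricted.")
    if not license_terms.get("resale_allowed", False):
        warnings.append("Resale is restricted.")
    return warnings
```

Stamping a copy rather than mutating the input keeps the original manifest available for comparison, and missing flags default to the restrictive reading.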

5. Historical generated reports contained source text excerpts

The profiler previously embedded source text snippets from readme/license-like files into generated JSON/Markdown outputs.

That was an avoidable IP risk.

This review triggered a code change so future profiling outputs redact excerpts and store metadata instead.

Affected code path fixed:

field_package __init__.py
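
The redaction idea can be illustrated with a short sketch. It does not reproduce the actual profiler change; the function name redact_text_file and the output keys are assumptions chosen to show what "store metadata instead of excerpts" can mean in practice.

```python
import hashlib


def redact_text_file(name, text):
    """Describe a text file by metadata only; the text itself is never stored."""
    data = text.encode("utf-8")
    return {
        "name": name,
        "byte_length": len(data),
        "line_count": len(text.splitlines()),
        # The digest lets a reviewer confirm which file was profiled without copying it.
        "sha256": hashlib.sha256(data).hexdigest(),
        "excerpt": None,  # deliberately left empty
    }
```

The hash and size are enough to trace a report entry back to its source object, which keeps provenance intact while removing the copied text.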

Current Compliance Assessment By Asset Type

A. Repository source code

Status:

partial

Reason:

code exists and is traceable,
but top-level licensing/ownership notice is missing.

B. Third-party open-source libraries

Status:

partial

Reason:

bundled assets are identifiable,
but central notice/attribution inventory is missing.

C. Volve source datasets

Status:

partial

Reason:

source provenance is preserved,
license presence is visible,
but product-level rights enforcement and attribution policy are not yet fully codified.

D. Canonical normalized facts

Status:

stronger

Reason:

factual normalization is generally a lower-risk posture than redistributing source prose or binaries,
but it still inherits source-license constraints where contract/license terms apply.

E. Generated reports and exports

Status:

improving

Reason:

exports are mostly structural/domain summaries,
but excerpt capture had to be tightened.

What Is Copyright-Protected In This Build

Likely protected

source code in this repository,
architecture and operations documentation,
third-party bundled libraries,
third-party reports, PDFs, and narrative source documents,
source datasets to the extent they contain protected expressive material or are governed by license/contract terms,
curated selection/arrangement of materials where human authorship exists.

Usually not protected as such

raw facts,
measurements,
coordinates,
well identifiers,
purely functional field names,
simple counts and inventories,
mathematical formulas by themselves,
machine-derived factual summaries with little or no original expressive authorship.

What Must Be Upheld In Product Behavior

To claim that the build respects rights, the system should do all of the following:

preserve source provenance,
preserve dataset license metadata,
display attribution where required,
avoid default redistribution of source documents and raw proprietary binaries,
avoid embedding verbatim source text into convenience exports unless clearly allowed,
record third-party software notices,
document AI/human authorship review for repository code,
add a clear code license or private-use ownership statement at repo level.

Required Actions

Priority 1

add a top-level repo LICENSE or explicit private/internal rights statement,
add a top-level THIRD_PARTY_NOTICES.md,
add a DATASET_LICENSES.md that includes Volve attribution and usage restrictions,
ensure download/export screens show dataset-license context.
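
The file-level items in this list could be enforced by a small check, for example in CI. The sketch below assumes the three file names mentioned above; the function name missing_rights_files is illustrative, and the required list should follow whatever policy is actually adopted.

```python
from pathlib import Path

# File names taken from the Priority 1 list; adjust to the policy actually adopted.
REQUIRED_ROOT_FILES = ("LICENSE", "THIRD_PARTY_NOTICES.md", "DATASET_LICENSES.md")


def missing_rights_files(repo_root, required=REQUIRED_ROOT_FILES):
    """Return the required rights files that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in required if not (root / name).is_file()]
```

An empty result means the files exist; it says nothing about whether their contents are adequate, which remains a review task.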

Priority 2

regenerate current profile/validation outputs using redacted text metadata only,
review historical generated artifacts that may contain source excerpts,
mark internal-only artifacts that should not be publicly published.
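
The review of historical artifacts could start from a scan like the one sketched here. It assumes the generated reports are JSON files and that copied text sits under a small set of key names; SUSPECT_KEYS is a guess and must be aligned with the real report schema. Markdown outputs are not covered by this sketch.

```python
import json
from pathlib import Path

# Hypothetical key names that may hold copied source text; align with the real schema.
SUSPECT_KEYS = {"excerpt", "snippet", "text_preview"}


def find_excerpt_fields(node, path=""):
    """Yield the paths of non-empty string values stored under suspect keys."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key in SUSPECT_KEYS and isinstance(value, str) and value.strip():
                yield child
            else:
                yield from find_excerpt_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_excerpt_fields(item, f"{path}/{index}")


def scan_reports(report_dir):
    """Map each JSON report under report_dir to the suspect fields it contains."""
    findings = {}
    for report in sorted(Path(report_dir).rglob("*.json")):
        document = json.loads(report.read_text(encoding="utf-8"))
        hits = list(find_excerpt_fields(document))
        if hits:
            findings[str(report)] = hits
    return findings
```

Reports that appear in the result are candidates for regeneration or for marking as internal-only.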

Priority 3

add a repo-level AI contribution/authorship policy,
define who is the human approver of assistant-generated code,
keep signoff records for major generated modules and docs.

Bottom Line

The build is not in a reckless posture, because:

provenance is preserved,
access is gated,
normalized facts are separated from many source documents,
and excerpt handling has now been tightened.

But the build is not yet in a fully closed copyright/compliance posture either.

The blocking gaps are concrete:

no repo license/rights statement,
no third-party notices file,
no formal AI authorship policy,
no fully explicit dataset-license enforcement layer across exports/downloads.

That is the honest state.