IP And Copyright Compliance Review

Source: docs/operations/IP_AND_COPYRIGHT_COMPLIANCE_REVIEW.md

Scope

This document reviews the repository and the current Volve field-package build from an intellectual-property and copyright perspective.

It is an engineering compliance review.

It is not legal advice and should not replace counsel review.

Recorded Authorship Position

For repository and build-provenance purposes, this documentation records the following asserted position:

This section records an internal authorship and provenance position.

It does not, by itself, settle legal title, copyright registration, assignment, or enforceability.

Official Baseline Sources

These are the primary sources used for the legal framing in this review:

  1. U.S. Copyright Office, Circular 61, Copyright Registration of Computer Programs

https://www.copyright.gov/circs/circ61.pdf

  2. U.S. Copyright Office, Circular 33, Works Not Protected by Copyright

https://www.copyright.gov/circs/circ33.pdf

  3. U.S. Copyright Office, AI policy/report materials on copyrightability and human authorship

https://www.copyright.gov/ai/

These sources matter because the build mixes:

Core Principles Applied To This Build

1. Facts and measurements are treated differently from expressive works

Examples in this system:

These are generally factual or functional data points, not expressive prose.

2. Source code is protected as expression

The Python, TypeScript, HTML, CSS, SQL, and migration files in this repo are expressive code artifacts.

That means code ownership and licensing need an explicit policy.

3. AI-generated material is not automatically protected the same way as human-authored expression

This repository contains assistant-generated code and documentation.

That means machine-originated expression should not be assumed to be registrable or owned in the same way as purely human-authored source unless a clear authorship and review position is on record.

4. Dataset license terms still apply even when data is normalized

Canonicalization does not erase the source license.

If source data is licensed with attribution, noncommercial, share-alike, or resale restrictions, those constraints still matter for storage, redistribution, and downstream product behavior.
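One way to make this concrete is to attach the license descriptor to every canonical record so that normalization cannot drop it. The sketch below is illustrative only: `SourceLicense`, `CanonicalFact`, and `normalize` are hypothetical names, and the flags mirror the restriction types listed above rather than the actual Volve terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLicense:
    # Hypothetical descriptor; flags mirror the restriction types named in
    # the text, not the terms of any specific dataset license.
    license_id: str
    requires_attribution: bool = False
    noncommercial_only: bool = False
    share_alike: bool = False
    resale_restricted: bool = False


@dataclass(frozen=True)
class CanonicalFact:
    name: str
    value: float
    unit: str
    source_object_id: str
    license: SourceLicense  # carried over unchanged from the source


def normalize(raw: dict, license: SourceLicense) -> CanonicalFact:
    """Normalize a raw measurement while keeping its license attached."""
    return CanonicalFact(
        name=raw["name"].strip().lower(),
        value=float(raw["value"]),
        unit=raw["unit"].strip(),
        source_object_id=raw["source_object_id"],
        license=license,
    )
```

Because the record is frozen and the license is a required field, a canonical fact cannot be constructed or later mutated into a state where its license is missing.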

What The Current Build Gets Right

1. Raw provenance is preserved

The build keeps source-object, bundle, alias, source-link, and artifact-level provenance.

That is good for:

2. The system separates factual normalization from source documents

Much of the current canonical layer stores:

That is materially better than republishing full third-party files or long source excerpts.

3. The presence of a dataset license is detectable

The Volve package includes a license file in the reservoir-model package.

The repo has already surfaced that the Volve dataset is distributed with license conditions and is not simply unrestricted public-domain material.

4. Access is gated

The app uses authentication, page access, and project access.

That matters because dataset rights enforcement is not only a documentation problem. It is also an access-control problem.
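As a rough illustration of how the three gates combine, the following sketch denies access unless authentication, page access, and project access all pass. The `User` model and `can_view` function are assumptions made for this example and do not describe the app's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical user model; field names are illustrative.
    user_id: str
    authenticated: bool = False
    pages: set = field(default_factory=set)
    projects: set = field(default_factory=set)


def can_view(user: User, page: str, project: str) -> bool:
    """Allow access only when all three gates pass."""
    if not user.authenticated:
        return False
    if page not in user.pages:
        return False
    return project in user.projects
```

The default for every gate is denial, so a user record with missing grants cannot reach licensed data by omission.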

What The Build Still Does Not Fully Solve

1. No top-level repository license

There is currently no top-level LICENSE, NOTICE, or equivalent ownership statement in the repo root.

That means the code’s outward licensing position is incomplete.

Practical consequence:

2. No explicit third-party notice file

The repository bundles third-party frontend assets under:

Those assets need explicit notice/attribution handling in a maintained third-party notice file.
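A notice file is easier to keep current when it is rendered from a small inventory rather than edited by hand. The sketch below assumes a hypothetical `ThirdPartyAsset` record and `render_notices` helper. The asset names, versions, and paths used with it would be placeholders, not the repository's real bundled libraries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThirdPartyAsset:
    # Hypothetical inventory entry for one bundled asset.
    name: str
    version: str
    license_id: str
    path: str


def render_notices(assets) -> str:
    """Render a plain-text notice file, sorted by asset name."""
    lines = ["Third-Party Notices", ""]
    for asset in sorted(assets, key=lambda a: a.name.lower()):
        lines.append(f"{asset.name} {asset.version}")
        lines.append(f"  License: {asset.license_id}")
        lines.append(f"  Bundled at: {asset.path}")
        lines.append("")
    return "\n".join(lines)
```

Sorting by name keeps the rendered file stable across runs, so changes to the inventory show up as small diffs in review.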

3. AI authorship posture is not formally documented

The repo contains extensive assistant-generated code and docs.

There is no explicit internal record stating:

4. Dataset-specific license enforcement is not yet complete in product flows

The system stages, validates, normalizes, and exports Volve data.

What still needs to be explicit:

5. Historical generated reports contained source text excerpts

The profiler previously embedded source text snippets from readme/license-like files into generated JSON/Markdown outputs.

That was an avoidable IP risk.

This review triggered a code change so future profiling outputs redact excerpts and store metadata instead.

Affected code path fixed:

Current Compliance Assessment By Asset Type

A. Repository source code

Status:

Reason:

B. Third-party open-source libraries

Status:

Reason:

C. Volve source datasets

Status:

Reason:

D. Canonical normalized facts

Status:

Reason:

E. Generated reports and exports

Status:

Reason:

What Is Copyright-Protected In This Build

Likely protected

  1. source code in this repository,
  2. architecture and operations documentation,
  3. third-party bundled libraries,
  4. third-party reports, PDFs, and narrative source documents,
  5. source datasets to the extent they contain protected expressive material or are governed by license/contract terms,
  6. curated selection/arrangement of materials where human authorship exists.

Usually not protected as such

  1. raw facts,
  2. measurements,
  3. coordinates,
  4. well identifiers,
  5. purely functional field names,
  6. simple counts and inventories,
  7. mathematical formulas by themselves,
  8. machine-derived factual summaries with little or no original expressive authorship.

What Must Be Upheld In Product Behavior

To say the build is respecting rights, the system should do all of these:

  1. preserve source provenance,
  2. preserve dataset license metadata,
  3. display attribution where required,
  4. avoid default redistribution of source documents and raw proprietary binaries,
  5. avoid embedding verbatim source text into convenience exports unless clearly allowed,
  6. record third-party software notices,
  7. document AI/human authorship review for repository code,
  8. add a clear code license or private-use ownership statement at repo level.
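Several of these behaviors can be checked at the point where an export is requested. The following is a minimal sketch, not the product's export code: `ArtifactKind`, `ExportRequest`, and `check_export` are hypothetical, and for simplicity the sketch assumes attribution is always required.

```python
from dataclasses import dataclass
from enum import Enum


class ArtifactKind(Enum):
    CANONICAL_FACT = "canonical_fact"
    SOURCE_DOCUMENT = "source_document"
    RAW_BINARY = "raw_binary"


@dataclass(frozen=True)
class ExportRequest:
    kind: ArtifactKind
    license_id: str                       # empty string: license metadata missing
    attribution: str                      # assumed always required in this sketch
    contains_verbatim_text: bool = False
    redistribution_cleared: bool = False  # set only after explicit review


def check_export(req: ExportRequest) -> list:
    """Return reasons to block the export; an empty list means allowed."""
    problems = []
    if not req.license_id:
        problems.append("missing dataset license metadata")
    if not req.attribution:
        problems.append("missing attribution")
    restricted = (ArtifactKind.SOURCE_DOCUMENT, ArtifactKind.RAW_BINARY)
    if req.kind in restricted and not req.redistribution_cleared:
        problems.append("source documents and raw binaries are not exported by default")
    if req.contains_verbatim_text and not req.redistribution_cleared:
        problems.append("verbatim source text requires explicit clearance")
    return problems
```

Returning the full list of reasons, rather than failing on the first one, lets an export screen show the user everything that needs to be resolved.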

Required Actions

Priority 1

  1. add a top-level repo LICENSE or explicit private/internal rights statement,
  2. add a top-level THIRD_PARTY_NOTICES.md,
  3. add a DATASET_LICENSES.md that includes Volve attribution and usage restrictions,
  4. ensure download/export screens show dataset-license context.
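The first three items can be guarded by a simple presence check, for example in CI. The file names below come from the list above. The check itself is an assumption about how the list might be enforced, and it treats `LICENSE` as the chosen form of the rights statement.

```python
from pathlib import Path

# File names taken from the Priority 1 list; adjust if the rights statement
# ends up under a different name.
REQUIRED_ROOT_FILES = ("LICENSE", "THIRD_PARTY_NOTICES.md", "DATASET_LICENSES.md")


def missing_compliance_files(repo_root) -> list:
    """Return the required root-level files that are not present."""
    root = Path(repo_root)
    return [name for name in REQUIRED_ROOT_FILES if not (root / name).is_file()]
```

A CI job could fail whenever `missing_compliance_files(".")` returns a non-empty list. This only confirms that the files exist, not that their contents are adequate.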

Priority 2

  1. regenerate current profile/validation outputs using redacted text metadata only,
  2. review historical generated artifacts that may contain source excerpts,
  3. mark internal-only artifacts that should not be publicly published.
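The first two items can be approached with two small helpers: one that replaces an excerpt with metadata, and one that scans an existing report for fields that still hold raw text. This is a sketch under stated assumptions, not the profiler's actual code path. The function names and the key names `excerpt`, `snippet`, and `text` are hypothetical.

```python
import hashlib


def redact_text(text: str, source_name: str) -> dict:
    """Replace a source excerpt with non-expressive metadata."""
    return {
        "source_name": source_name,
        "redacted": True,
        "char_count": len(text),
        "line_count": len(text.splitlines()),
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }


def find_unredacted(report: dict, text_keys=("excerpt", "snippet", "text")) -> list:
    """List paths in a generated report that still hold raw text."""
    hits = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                child = f"{path}.{key}" if path else key
                if key in text_keys and isinstance(value, str) and value:
                    hits.append(child)
                else:
                    walk(value, child)
        elif isinstance(node, list):
            for i, item in enumerate(node):
                walk(item, f"{path}[{i}]")

    walk(report, "")
    return hits
```

The hash lets a reviewer confirm which source text a record refers to without the report carrying the text itself.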

Priority 3

  1. add a repo-level AI contribution/authorship policy,
  2. define who is the human approver of assistant-generated code,
  3. keep signoff records for major generated modules and docs.
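A signoff record can be as small as one structured line per reviewed module. The sketch below shows one possible shape. The `SignoffRecord` fields and the JSON Lines output are assumptions for illustration, not an existing repository convention.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class SignoffRecord:
    # Hypothetical record shape for one reviewed module or document.
    path: str
    origin: str          # "assistant", "human", or "mixed"
    human_approver: str
    reviewed_on: str     # ISO date, e.g. "2024-01-15"
    notes: str = ""

    def __post_init__(self):
        if self.origin not in ("assistant", "human", "mixed"):
            raise ValueError(f"unknown origin: {self.origin}")
        if not self.human_approver.strip():
            raise ValueError("a named human approver is required")
        date.fromisoformat(self.reviewed_on)  # raises ValueError if malformed


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)
```

Rejecting a blank approver at construction time means a record cannot exist without a named human reviewer.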

Bottom Line

The build is not in a reckless posture, because:

But the build is not yet in a fully closed copyright/compliance posture either.

The blocking gaps are concrete:

  1. no repo license/rights statement,
  2. no third-party notices file,
  3. no formal AI authorship policy,
  4. no fully explicit dataset-license enforcement layer across exports/downloads.

That is the honest state.