
Verifying AI-Generated Content at Scale

Ledgible Engineering·February 14, 2026·7 min read

TL;DR

  • Sign at generation time, not publication time — closing the gap between creation and signing is the single most important implementation decision
  • Batch verification via nightly audit jobs catches mismatches before they become incidents
  • The free tier handles 10,000 assets per month — enough to validate your integration end to end before committing to production
  • Manual provenance tracking breaks at around 1,000 assets per month — automate before you reach that threshold
  • The parent_hash field enables chain-of-custody tracking across post-processing pipelines, not just point-in-time signing

Where Manual Tracking Breaks Down

When a pipeline produces a few hundred AI-generated images a month, provenance is manageable with a spreadsheet and discipline. Someone tracks the source, the tool, the date, and the intended use. The system works because the volume is low enough that humans can maintain it.

At a thousand assets per month, the spreadsheet starts to fail. Records get missed. The person maintaining it changes. Fields are filled in inconsistently. By the time you need the provenance data — for a regulatory inquiry, a legal dispute, or a brand safety audit — the record is incomplete or untrustworthy.

At ten thousand assets per month, manual tracking has already broken completely. Organizations at this volume either have no provenance records, have records that are incomplete by design ("we only track campaign assets, not editorial"), or have records that are too inconsistent to rely on.

The threshold at which you need automated provenance infrastructure is lower than most organizations expect. If your pipeline produces more than a few hundred AI-generated assets per month, automation is not a future concern — it is a current operational requirement.

The Pattern That Works: Sign at Generation Time

The teams using Ledgible at scale follow a consistent pattern: sign at generation time, not at publication time.

The moment the model finishes rendering, the ingest call fires. The hash and signature are recorded before the file leaves the generation cluster. This means the provenance record reflects the state of the asset at its origin point — not at some later stage where the file may have been modified, resized, watermarked, or format-converted.

This matters for two reasons. First, it closes the window of tamper opportunity. A signature applied after a file has moved through a pipeline is a weaker claim than one applied at creation. Second, it is the defensible position under EU AI Act Article 50 and equivalent frameworks — the requirement is disclosure at the point of generation, not disclosure at the point of publication.

The implementation is a single function call wired into your generation pipeline:

import requests
import hashlib

def sign_at_generation(file_bytes, tool_id, creator_id, api_key):
    canonical_hash = "sha256:" + hashlib.sha256(file_bytes).hexdigest()

    response = requests.post(
        "https://ledgible.ai/api/v1/assets/ingest",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "asset_type": "image",
            "creator_id": creator_id,
            "tool_id": tool_id,
            "canonical_hash": canonical_hash,
            "metadata": {"ai_generated": True}
        },
        timeout=10,
    )
    response.raise_for_status()

    data = response.json()
    return data["asset_id"], data["provenance_token"]

Handling Post-Processing: The Parent Hash Pattern

Most enterprise content pipelines do not publish model outputs directly. Images get resized for different channels. Videos get transcoded. Documents get formatted. Watermarks get applied.

Each post-processing step produces a derivative asset with a different hash than the original. If you only sign the original and publish the processed version, the published asset's hash does not match the provenance record. Verification fails.
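The mismatch is guaranteed by the hash function itself: SHA-256 output changes completely when even a single byte of input changes. A minimal illustration with placeholder bytes:

```python
import hashlib

# Illustrative bytes only — any post-processing step changes the payload
original = b"\x89PNG... original model output"
processed = original + b" + watermark"

h_original = "sha256:" + hashlib.sha256(original).hexdigest()
h_processed = "sha256:" + hashlib.sha256(processed).hexdigest()

# A single changed byte is enough: the derivative never matches the record
assert h_original != h_processed
```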

The solution is to sign both the original and the processed version, linking them with the parent_hash field:

# Sign the original model output
original_asset_id, _ = sign_at_generation(
    original_bytes, "stable-diffusion@xl-2.0", "ai:pipeline-1", api_key
)
original_hash = "sha256:" + hashlib.sha256(original_bytes).hexdigest()

# Post-process (resize, watermark, format convert)
processed_bytes = post_process(original_bytes)
processed_hash = "sha256:" + hashlib.sha256(processed_bytes).hexdigest()

# Sign the processed version, linking to the original
response = requests.post(
    "https://ledgible.ai/api/v1/assets/ingest",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "asset_type": "image",
        "creator_id": "system:post-processor",
        "tool_id": "imagemagick@7.1",
        "canonical_hash": processed_hash,
        "parent_hash": original_hash,
        "metadata": {
            "derived_from": original_asset_id,
            "processing_steps": ["resize-1920x1080", "watermark", "convert-webp"]
        }
    },
    timeout=10,
)
response.raise_for_status()

The result is a verifiable chain of custody: model output → post-processed version → published asset. Each step is signed, and the parent-hash links form a traversable lineage from the published file back to the original generation event.

Batch Verification: Nightly Audit Jobs

Signing at generation time ensures every new asset has a provenance record. Nightly audit jobs ensure that existing published assets have not been tampered with since they were signed.

The pattern used by teams running Ledgible in production:

import hashlib
import requests

def compute_hash(file_path):
    # Recompute the canonical hash the same way it was computed at ingest
    with open(file_path, "rb") as f:
        return "sha256:" + hashlib.sha256(f.read()).hexdigest()

def nightly_provenance_audit(published_assets, api_key):
    mismatches = []

    for asset in published_assets:
        # Recompute hash of current file
        current_hash = compute_hash(asset["file_path"])

        # Verify against ledger — params handles URL-encoding the hash
        response = requests.get(
            "https://ledgible.ai/api/v1/verify",
            params={"hash": current_hash},
            timeout=10,
        )

        result = response.json()

        if not result["verified"]:
            mismatches.append({
                "asset_id": asset["id"],
                "file_path": asset["file_path"],
                "expected_hash": asset["stored_hash"],
                "current_hash": current_hash,
                "ledger_status": result.get("status", "not_found")
            })

    return mismatches

# Run nightly — alert if any mismatches surface
mismatches = nightly_provenance_audit(get_published_assets(), API_KEY)
if mismatches:
    send_alert(f"{len(mismatches)} provenance mismatches detected", mismatches)

In practice, mismatches are rare in well-managed pipelines. But their rarity is only meaningful if you are checking regularly. A mismatch that goes undetected for six months is a much more serious incident than one caught the morning after it occurred.

The verification endpoint is public and requires no authentication — you can run audit jobs against it without exposing your API key to the audit infrastructure.
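A minimal audit-side helper built on that property might look like the sketch below. The endpoint path and the `verified` response field follow the audit example above; the `timeout` and use of `params` (which URL-encodes the colon in the hash prefix) are defensive additions:

```python
import requests

def verify_hash(canonical_hash: str) -> bool:
    # Public endpoint — no API key, so audit infrastructure never
    # needs to hold signing credentials
    resp = requests.get(
        "https://ledgible.ai/api/v1/verify",
        params={"hash": canonical_hash},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("verified", False)
```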

Why Cloud Detection APIs Are Not a Substitute

A common question from teams evaluating content provenance infrastructure: "We already use AWS Rekognition / Google Vision AI to detect AI-generated content. Isn't that the same thing?"

It is not, and the distinction matters. Cloud AI detection APIs are probabilistic forensic tools. They analyze a file's characteristics and return a probability that the content is AI-generated. They are useful for screening content you did not produce. They are not provenance.

The specific limitations for enterprise use cases:

  • Detection accuracy degrades for content that has been post-processed, compressed, or distributed through social platforms
  • Detection APIs cannot tell you who generated the content, which tool was used, or when it was created
  • Detection results are not cryptographically verifiable — a 91% confidence score is not a signed record
  • Detection APIs cannot produce the machine-readable disclosure record required under EU AI Act Article 50

Provenance and detection answer different questions. Provenance says: "We signed this asset at generation time and here is the signed record." Detection says: "This asset looks like it might be AI-generated, with this confidence." For content your organization produces, provenance is the correct tool. Detection is for screening content produced by others.

How Ledgible Compares to Building Your Own Signing Infrastructure

| | Ledgible | Build Yourself |
| --- | --- | --- |
| Time to first signed asset | Minutes | 3–6 months |
| Append-only ledger enforcement | ✓ Built in | You design and maintain |
| Public verify endpoint | ✓ No auth required | You build and host |
| Multi-asset-type support | ✓ image, video, text, audio | You implement each |
| SOC 2 compliance | ✓ Phase 2 | You own the audit scope |
| Key rotation infrastructure | ✓ Built in | You design and maintain |
| Compliance export tooling | ✓ Built in | You build for each jurisdiction |
| Ongoing maintenance | Ledgible's problem | Your problem |

Building your own is a reasonable choice for organizations with specific sovereignty requirements or unique pipeline architectures that cannot be served by a standard API. For most enterprise content pipelines, it is three to six months of engineering for a capability that is not your product's core differentiator.

Getting Started: The 30-Minute Integration

For a team running a production AI content pipeline, the complete Ledgible integration takes approximately 30 minutes:

  • Generate an API key in your Ledgible dashboard — one key per pipeline
  • Wire the ingest call into your generation handler — one function call at model output
  • Add parent_hash signing if your pipeline has post-processing steps
  • Set up a nightly audit job to verify your published asset catalog
  • Test the verify endpoint with a sample asset hash
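The steps above can be exercised with a one-off smoke test that signs a sample asset and immediately verifies it through the public endpoint. A sketch under the same endpoint and field assumptions as the earlier examples; the creator and tool IDs are illustrative placeholders:

```python
import hashlib
import requests

def smoke_test(sample_bytes: bytes, api_key: str) -> bool:
    """Sign a sample asset, then confirm the public verify endpoint sees it."""
    canonical_hash = "sha256:" + hashlib.sha256(sample_bytes).hexdigest()

    # Ingest: authenticated, one call at "generation" time
    ingest = requests.post(
        "https://ledgible.ai/api/v1/assets/ingest",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "asset_type": "image",
            "creator_id": "ai:integration-test",
            "tool_id": "smoke-test@1.0",
            "canonical_hash": canonical_hash,
            "metadata": {"ai_generated": True},
        },
        timeout=10,
    )
    ingest.raise_for_status()

    # Verify: public, no credentials required
    verify = requests.get(
        "https://ledgible.ai/api/v1/verify",
        params={"hash": canonical_hash},
        timeout=10,
    )
    verify.raise_for_status()
    return verify.json().get("verified", False)
```

Running this once against the free tier confirms the full round trip — ingest, ledger write, public verification — before any production traffic depends on it.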

The free tier handles 10,000 assets per month — enough to run a full production validation before upgrading. Most teams reach a stable integration within a day and move to Pro once they have confirmed provenance records are flowing correctly end to end.
