Why SHA-256 Alone Is Not Enough for Asset Provenance

Ledgible Engineering · March 18, 2026 · 8 min read

TL;DR

  • SHA-256 answers one question: has this file changed? It answers nothing about who created it or when.
  • A hash without a signature is a fingerprint without an identity — anyone can compute the same hash and make the same claim.
  • Provenance requires binding the hash to an identity through a cryptographic signature.
  • Most content authentication tools stop at hashing. That is why they fail under legal or regulatory scrutiny.
  • Ledgible stores both the hash and the HMAC signature — integrity plus attribution in a single verifiable record.

The Question a Hash Answers

SHA-256 is one of the most widely deployed cryptographic primitives in existence. It produces a deterministic 256-bit fingerprint of any input — feed it the same file twice and you will always get the same output. Change a single byte in the file and the hash changes completely. This property makes it excellent for one specific purpose: detecting whether a file has been modified.

That is the full extent of what a hash tells you.

A hash is a statement about a file's content. It is not a statement about who produced that content, when they produced it, or under what authority. Two completely different organizations can hash the same file and get identical results. One of them made it; the other just got hold of a copy. The hash cannot tell them apart.
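The properties described above can be checked in a few lines. This is a minimal sketch using Python's standard hashlib module; the byte strings are stand-ins for real file contents, and the helper name sha256_hex is illustrative only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


original = b"campaign-image-bytes-v1"  # stand-in for real file contents
tampered = b"campaign-image-bytes-v2"  # same length, last byte changed

# Deterministic: hashing the same bytes twice gives the same digest.
assert sha256_hex(original) == sha256_hex(original)

# Changing a single byte yields a different digest.
assert sha256_hex(original) != sha256_hex(tampered)

# Two unrelated parties holding the same bytes compute the same digest,
# so the digest by itself does not say which of them made the file.
creator_digest = sha256_hex(original)
copier_digest = sha256_hex(bytes(original))
print(creator_digest == copier_digest)  # True
```

Nothing in the digest depends on who ran the function, which is the gap the rest of this post is about.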

Why This Matters for Enterprise Content Pipelines

Consider a practical scenario. A global media company uses AI tools to produce campaign imagery at scale: thousands of assets per month. They store a SHA-256 hash of each asset in their digital asset management (DAM) system. When a regulatory inquiry arrives asking them to demonstrate the provenance of a specific asset, what can they actually prove?

They can prove the file has not changed since it was hashed. They cannot prove:

  • Who produced the original file
  • Which AI tool or human creator made it
  • When it was created versus when it was hashed
  • Whether the hash was computed before or after post-processing

Under the EU AI Act's Article 50, under FTC guidance on AI content disclosure, and in any legal dispute over content ownership, these are precisely the questions that matter. A SHA-256 hash stored in a database is not a provenance record. It is an integrity check.

What Signing Adds — And Why It Changes Everything

Where hashing is a one-way function applied to data, signing binds that hash to a key — and a key is controlled by an identity.

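When Ledgible signs an asset with HMAC-SHA256, the record states: this specific hash was witnessed and recorded by the holder of this key, at this specific moment in time. The signature cannot be produced without the secret key. Because HMAC is a symmetric construction, checking a signature requires that same key, so third parties verify a record through the public verify endpoint rather than by recomputing the signature themselves. The timestamp is set server-side and is not modifiable by the caller. The record is written to an append-only ledger that cannot be altered after the fact.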
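To make the signing step concrete, here is a minimal sketch with Python's standard hmac module. The field order, the newline separator, and the function name sign_record are assumptions made for illustration; the post does not specify how Ledgible serializes a record before signing it.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def sign_record(secret_key: bytes, canonical_hash: str, creator_id: str,
                tool_id: str, signed_at: str) -> str:
    """Return an HMAC-SHA256 tag over the record fields.

    The message layout (fields joined by newlines) is hypothetical.
    """
    message = "\n".join([canonical_hash, creator_id, tool_id, signed_at]).encode("utf-8")
    tag = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return "hmac-sha256:" + tag


secret_key = b"demo-key-not-for-production"  # placeholder secret
file_hash = "sha256:" + hashlib.sha256(b"example asset bytes").hexdigest()

# In the flow described above the timestamp is chosen by the server,
# never supplied by the caller.
signed_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

signature = sign_record(secret_key, file_hash, "org:acme-corp", "adobe-firefly@3.0", signed_at)
other_key = sign_record(b"some-other-key", file_hash, "org:acme-corp", "adobe-firefly@3.0", signed_at)

# A different key produces a different tag for the same record.
print(signature != other_key)  # True
```

Including the timestamp and identity fields in the signed message is what ties the tag to more than the file content; signing the hash alone would leave those fields unprotected.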

This produces something qualitatively different from a hash: a verifiable attribution claim.

Why Most Content Authentication Tools Stop Too Early

There is a category of tools — watermarking services, hash registries, checksum databases — that are positioned as content authentication solutions. Most of them are, at their core, hash storage with a UI. They answer the integrity question and market it as the provenance question.

The distinction matters under scrutiny:

Hash registry approach

"We have a record that this hash existed in our system at this date."

  • Who computed the hash? Unknown.
  • Who controlled the original file? Unknown.
  • Is this the original file or a derivative? Unknown.
  • Can a regulator independently verify this without calling you? No.

Signed provenance record approach

"We have a cryptographically signed record that this specific file, identified by this hash, was submitted by this authenticated organization using this tool, at this timestamp, and the signature was computed with a secret key that no outside party holds."

  • Who submitted it? Verified via API key authentication.
  • Which tool produced it? Recorded in tool_id.
  • When was it signed? Server-side timestamp, not client-supplied.
  • Can anyone verify it? Yes — the public verify endpoint requires no authentication.

The second approach is what regulators, auditors, and legal teams are actually asking for when they request content provenance documentation.

The Ledgible Record: Both Hash and Signature

Every asset ingested through the Ledgible API produces a record containing:

{
  "asset_id": "uuid",
  "canonical_hash": "sha256:abc123...",
  "signature": "hmac-sha256:xyz789...",
  "creator_id": "org:acme-corp",
  "tool_id": "adobe-firefly@3.0",
  "signed_at": "2026-03-18T10:00:00Z",
  "status": "verified_automated"
}

The canonical_hash gives you integrity — proof the file has not been modified since signing. The signature gives you attribution — proof that the holder of the signing key witnessed this specific hash at this specific moment. Together they answer the complete provenance question: what is this file, and who says so?
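The two checks can be separated in code. The sketch below assumes the verifier holds the signing key and that the signature covers canonical_hash, creator_id, tool_id, and signed_at joined by newlines; that layout and the helper names are hypothetical, not Ledgible's documented format.

```python
import hashlib
import hmac


def expected_signature(secret_key: bytes, record: dict) -> str:
    """Recompute the HMAC-SHA256 tag for a record (hypothetical field layout)."""
    fields = [record["canonical_hash"], record["creator_id"],
              record["tool_id"], record["signed_at"]]
    message = "\n".join(fields).encode("utf-8")
    return "hmac-sha256:" + hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def check_record(secret_key: bytes, record: dict, file_bytes: bytes) -> dict:
    """Report integrity and attribution as two separate results."""
    file_hash = "sha256:" + hashlib.sha256(file_bytes).hexdigest()
    integrity = hmac.compare_digest(file_hash.encode(), record["canonical_hash"].encode())
    attribution = hmac.compare_digest(
        expected_signature(secret_key, record).encode(),
        record["signature"].encode(),
    )
    return {"integrity": integrity, "attribution": attribution}


key = b"demo-key-not-for-production"  # placeholder secret
asset = b"example asset bytes"
record = {
    "canonical_hash": "sha256:" + hashlib.sha256(asset).hexdigest(),
    "creator_id": "org:acme-corp",
    "tool_id": "adobe-firefly@3.0",
    "signed_at": "2026-03-18T10:00:00Z",
}
record["signature"] = expected_signature(key, record)

print(check_record(key, record, asset))
# {'integrity': True, 'attribution': True}

# Rewriting the creator leaves the file hash intact but breaks the signature.
forged = dict(record, creator_id="org:someone-else")
print(check_record(key, forged, asset))
# {'integrity': True, 'attribution': False}
```

The forged record is the case a hash-only system cannot detect: the file is untouched, yet the claim about who submitted it has changed.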

The record is publicly verifiable. Anyone with the asset's SHA-256 hash can query:

curl "https://ledgible.ai/api/v1/verify?hash=sha256:abc123..."

The response comes back directly from the public endpoint, with no support request to Ledgible, no authentication, and no access to your internal systems.
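The same lookup can be scripted. This sketch uses only Python's standard library and takes the endpoint URL and the hash query parameter from the curl example above; the assumption that the response body is JSON, and the function names, are illustrative rather than documented behavior.

```python
import json
import urllib.parse
import urllib.request

VERIFY_ENDPOINT = "https://ledgible.ai/api/v1/verify"  # from the curl example


def build_verify_url(canonical_hash: str) -> str:
    """Build the verify URL, percent-encoding the hash value."""
    query = urllib.parse.urlencode({"hash": canonical_hash})
    return f"{VERIFY_ENDPOINT}?{query}"


def fetch_verification(canonical_hash: str, timeout: float = 10.0) -> dict:
    """Query the public endpoint; assumes the body is a JSON object."""
    with urllib.request.urlopen(build_verify_url(canonical_hash), timeout=timeout) as response:
        return json.load(response)


print(build_verify_url("sha256:abc123"))
# https://ledgible.ai/api/v1/verify?hash=sha256%3Aabc123
```

The colon is percent-encoded as %3A here, which is an equivalent encoding of the query string used in the curl command. Calling fetch_verification with a real hash performs the network request; no API key header is set because the post describes the endpoint as unauthenticated.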

Where Hash-Only Approaches Fail Under Legal Scrutiny

Three scenarios where hash-only content authentication breaks down:

Scenario 1: Disputed ownership

Two parties claim to have produced the same asset. Both have a hash stored in their systems. Neither hash shows who held the file first. A signed provenance record adds a server-side timestamp, which establishes when each party registered the hash and under which identity.

Scenario 2: Regulatory audit

A regulator asks you to demonstrate that a specific AI-generated asset was disclosed as AI-produced at the time of publication. A hash shows the file has not changed since it was hashed. It does not prove when it was created, which tool produced it, or that the disclosure obligation was met at creation time rather than retroactively applied.

Scenario 3: Supply chain tampering

An asset moves through a multi-vendor pipeline — generation, post-processing, watermarking, format conversion. A hash of the final file proves the final file is unmodified. It does not prove the chain of custody from the original generation event to the final published version. A signed parent-hash chain in Ledgible does.
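A parent-hash chain of this kind can be sketched as follows. The parent_hash field name, the message layout, and the use of a single shared key are assumptions for illustration; the post does not describe how Ledgible represents the chain internally.

```python
import hashlib
import hmac


def step_signature(secret_key: bytes, canonical_hash: str, parent_hash: str) -> str:
    """Sign one pipeline step together with the hash of the step before it."""
    message = f"{parent_hash}\n{canonical_hash}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def build_chain(secret_key: bytes, file_versions: list[bytes]) -> list[dict]:
    """Create one record per pipeline stage, each pointing at its parent."""
    records = []
    parent_hash = ""  # the generation step has no parent
    for data in file_versions:
        canonical_hash = "sha256:" + hashlib.sha256(data).hexdigest()
        records.append({
            "canonical_hash": canonical_hash,
            "parent_hash": parent_hash,
            "signature": step_signature(secret_key, canonical_hash, parent_hash),
        })
        parent_hash = canonical_hash
    return records


def verify_chain(secret_key: bytes, records: list[dict]) -> bool:
    """Check that every record links to the previous one and is correctly signed."""
    expected_parent = ""
    for record in records:
        if record["parent_hash"] != expected_parent:
            return False
        expected = step_signature(secret_key, record["canonical_hash"], record["parent_hash"])
        if not hmac.compare_digest(expected, record["signature"]):
            return False
        expected_parent = record["canonical_hash"]
    return True


key = b"demo-key-not-for-production"  # placeholder secret
stages = [b"generated image", b"post-processed image", b"converted image"]
chain = build_chain(key, stages)

print(verify_chain(key, chain))                   # True
print(verify_chain(key, [chain[0], chain[2]]))    # False: a stage is missing
```

Dropping or reordering a stage breaks the link between parent_hash values, so a gap in custody shows up even when each remaining file is individually unmodified.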

The Migration Path for Teams Already Using Hash-Based Systems

If your current approach stores SHA-256 hashes without signatures, migrating to Ledgible does not require replacing your existing infrastructure. You add one API call at the point where you already compute the hash:

curl -X POST https://ledgible.ai/api/v1/assets/ingest \
  -H "X-API-Key: ldg_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "canonical_hash": "sha256:your-existing-hash",
    "creator_id": "org:your-org",
    "tool_id": "your-tool@version",
    "asset_type": "image"
  }'

Your existing hash becomes the canonical_hash in the Ledgible record. The ingest API adds the signature, the timestamp, and the attribution — transforming your integrity check into a provenance record.
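For pipelines that run Python rather than shell, the same call might be prepared as shown below. The endpoint, header name, and payload fields come from the curl example; the helper names, the streaming hash function, and the Content-Type header are assumptions. The sketch builds the request without sending it.

```python
import hashlib
import io
import json
import urllib.request

INGEST_ENDPOINT = "https://ledgible.ai/api/v1/assets/ingest"  # from the curl example


def hash_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large assets need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def build_ingest_request(api_key: str, canonical_hash: str, creator_id: str,
                         tool_id: str, asset_type: str = "image") -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the ingest endpoint."""
    payload = {
        "canonical_hash": canonical_hash,
        "creator_id": creator_id,
        "tool_id": tool_id,
        "asset_type": asset_type,
    }
    return urllib.request.Request(
        INGEST_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},  # Content-Type is assumed
        method="POST",
    )


# io.BytesIO stands in for a file opened with open(path, "rb").
existing_hash = hash_stream(io.BytesIO(b"example asset bytes"))
request = build_ingest_request("ldg_your_api_key", existing_hash, "org:your-org", "your-tool@version")
print(request.get_method(), request.full_url)
# POST https://ledgible.ai/api/v1/assets/ingest
```

Passing the prepared request to urllib.request.urlopen would submit it. If your system already stores hashes, hash_stream can be skipped and the stored value passed straight to build_ingest_request.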
