From: Kushal Das <kushal@sunet.se>
To: git@vger.kernel.org
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>
Subject: [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages
Date: Mon, 20 Apr 2026 10:59:05 +0200
Message-ID: <4d5d04e2-49c4-4781-a289-f8cf79570643@sunet.se>
Hi all,
Every `git commit -S` since v2.45.0 produces a permanently-BAD
signature when the commit message contains bytes that are not valid
UTF-8 AND `i18n.commitEncoding` is unset (i.e. the default case).
Verification fails under both `gpg --verify` and any non-GnuPG signer.
The failure is deterministic: it happens every time, on every
non-UTF-8 commit, no card or external tooling needed.
My best guess is that commit 6206089cbd0b1cb30a017ec904567f040ab4cea0
introduced this (though I may be completely wrong about the cause).
In pre-6206089cbd `commit_tree_extended`, `verify_utf8(&buffer)` ran
BEFORE `sign_with_header(&buffer, sign_commit)`. `verify_utf8` is not
a simple validator -- it mutates the strbuf in place, replacing
invalid-UTF-8 bytes with their Latin-1 -> UTF-8 two-byte form. The
signer therefore saw the transcoded bytes, and the same transcoded
bytes were then written to the object database. Signer and
verifier agreed.
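To make the in-place rewrite concrete, here is a small sketch of the Latin-1 -> UTF-8 mapping described above for a lone 0xa7 byte. It uses `iconv` as a stand-in: the helper name `latin1_to_utf8_hex` is made up for illustration, and this is not the code git runs, only the same byte-level mapping for this input.

```shell
# Hypothetical helper: transcode stdin from Latin-1 to UTF-8 and print the
# resulting bytes as one contiguous hex string.
latin1_to_utf8_hex() {
    iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 -v | tr -d ' \n'
    echo
}

# 'section ' followed by a lone 0xa7 (octal 247) and a newline.
printf 'section \247\n' | latin1_to_utf8_hex
# prints 73656374696f6e20c2a70a
```

The single `a7` byte becomes the two-byte sequence `c2 a7`, so the buffer grows by one byte and no longer matches anything hashed before the rewrite.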
After 6206089cbd, the sequence in `commit_tree_extended` is:
    write_commit_tree(&buffer, ...);
    sign_commit_to_strbuf(&sig, &buffer, sign_commit); /* pre-transcode */
    ...
    add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo);
    /* and then */
    if (encoding_is_utf8 && (!verify_utf8(&buffer) ||
                             !verify_utf8(&compat_buffer)))
        fprintf(stderr, _(commit_utf8_warn)); /* post-sign transcode */
    ...
    odb_write_object_ext(..., buffer.buf, buffer.len, OBJ_COMMIT, ret, ...);
The signature in `bufs[i].sig` covers the raw (non-UTF-8) buffer. The
`verify_utf8` call after `add_commit_signature` rewrites the message
portion of the stored object to UTF-8. The object that hits the ODB
therefore contains bytes that no longer match what the signer hashed,
and any verifier that reads the commit back and re-hashes the
sig-stripped buffer will find a mismatch.
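For anyone who wants to inspect the sig-stripped buffer by hand, the sketch below removes the `gpgsig` header (and its space-indented continuation lines) from a raw commit object. The helper name `strip_gpgsig` is hypothetical and the awk program only approximates what a verifier reconstructs; it is not git's own implementation.

```shell
# Hypothetical helper: read a raw commit object on stdin and print it
# without the gpgsig header. Only the header section (everything before
# the first blank line) is examined; the message body is passed through.
strip_gpgsig() {
    LC_ALL=C awk '
        BEGIN                   { in_headers = 1; in_sig = 0 }
        in_headers && /^$/      { in_headers = 0; in_sig = 0; print; next }
        in_headers && /^gpgsig/ { in_sig = 1; next }   # start of signature header
        in_sig && /^ /          { next }               # continuation line
                                { in_sig = 0; print }
    '
}

# Usage inside a repository whose HEAD is signed:
#   git cat-file commit HEAD | strip_gpgsig | od -An -tx1 -v | tail -n 1
```

Comparing that dump with the bytes captured at sign time should show whether the stored message still matches what was signed.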
To reproduce, I ran the following command with the bash script below:
`podman run --rm -it -v ./git_bug.sh:/git_bug.sh:Z fedora:41 bash -c
'dnf -y install git gnupg2 >/dev/null 2>&1 && /git_bug.sh 2>&1'`
```
#!/usr/bin/env bash
# git_bug.sh -- minimal in-container reproducer for the git 2.45+
# sign/store divergence on non-UTF-8 commit messages.
#
# Two cases:
# CASE A: default git config. On git 2.45+ the signer hashes the
# raw (Latin-1) bytes but the ODB stores the UTF-8-transcoded
# bytes. Verify fails with BAD signature.
# CASE B: same commit with `i18n.commitEncoding=iso-8859-1`. git
# tags the object with an `encoding` header and skips the
# in-place transcode, so sign and store bytes agree.
#
# Exit codes:
# 0 Not reproduced. Git either predates v2.45 or carries a fix.
# 1 Reproduced. CASE A BAD, CASE B GOOD (workaround confirmed).
# 2 Unexpected state or missing tool.
#
# Dependencies: git + gpg + awk + od + bash. The script mints its own
# throwaway Ed25519 key in a temp GNUPGHOME. Host state untouched.
#
set -euo pipefail
step() { printf '>>> %s\n' "$*"; }
say() { printf '%s\n' "$*"; }
die() { printf 'git_bug.sh: %s\n' "$*" >&2; exit 2; }
for tool in git gpg awk od; do
    command -v "$tool" >/dev/null || die "$tool not found on PATH"
done
SANDBOX=$(mktemp -d -t git-bug-XXXXXX)
export GNUPGHOME="$SANDBOX/gnupg"
mkdir -p "$GNUPGHOME"
chmod 700 "$GNUPGHOME"
# Pre-create agent + gpg configs so loopback pinentry works and no
# dirmngr / tty prompts can fire.
cat > "$GNUPGHOME/gpg-agent.conf" <<EOF
allow-loopback-pinentry
default-cache-ttl 3600
max-cache-ttl 3600
EOF
cat > "$GNUPGHOME/gpg.conf" <<EOF
pinentry-mode loopback
batch
trust-model always
EOF
cleanup() {
    gpgconf --homedir "$GNUPGHOME" --kill all >/dev/null 2>&1 || true
    rm -rf "$SANDBOX"
}
trap cleanup EXIT
PASS=""
step "detected: $(git --version), $(gpg --version | head -n1)"
# ---------------------------------------------------------------------
# Mint a throwaway Ed25519 signing key. Ed25519 signs in well under a
# second, unlike RSA-4096 which on some containers takes tens of
# seconds and easily reads as "the script is hung".
# ---------------------------------------------------------------------
step "minting throwaway Ed25519 key"
cat > "$SANDBOX/keygen" <<EOF
%no-protection
Key-Type: EDDSA
Key-Curve: ed25519
Key-Usage: sign,cert
Name-Real: git-bug-tester
Name-Email: bug@example.local
Expire-Date: 1y
%commit
EOF
gpg --batch --gen-key "$SANDBOX/keygen" 2>"$SANDBOX/keygen.err" \
    || { cat "$SANDBOX/keygen.err" >&2; die "key generation failed"; }
KEY_FP=$(gpg --list-keys --with-colons bug@example.local \
    | awk -F: '/^fpr:/ {print $10; exit}')
[ -n "$KEY_FP" ] || die "could not read key fingerprint"
say "key: $KEY_FP"
# Kill the gpg-agent spawned by keygen so the next invocation starts
# fresh with our loopback-pinentry policy.
gpgconf --kill gpg-agent >/dev/null 2>&1 || true
# gpg wrapper git will invoke as `gpg.program`. Branches on whether
# --verify is in argv so we can capture the two stdin streams to
# separate files without clobbering each other.
cat > "$SANDBOX/gpg-capture" <<WRAP
#!/usr/bin/env bash
OUT="$SANDBOX/sign_stdin.bin"
VERIFY=0
for a in "\$@"; do
    if [[ "\$a" == "--verify" ]]; then
        OUT="$SANDBOX/verify_stdin.bin"
        VERIFY=1
        break
    fi
done
if (( VERIFY )); then
    tee "\$OUT" | exec gpg "\$@"
else
    tee "\$OUT" | exec gpg \\
        --pinentry-mode loopback --passphrase "$PASS" --batch --yes "\$@"
fi
WRAP
chmod +x "$SANDBOX/gpg-capture"
# ---------------------------------------------------------------------
# Run one case: fresh repo, optional extra git config, sign a
# lone-0xa7 commit, dump the bytes, verify. Returns 0 on GOOD, 1 on
# BAD. Prints a progress line for every potentially-slow operation so
# a "is it stuck?" question is answerable from the terminal.
# ---------------------------------------------------------------------
run_case() {
    local label="$1"
    shift
    local extra=("$@")
    local repo="$SANDBOX/repo_${label}"
    rm -f "$SANDBOX/sign_stdin.bin" "$SANDBOX/verify_stdin.bin"
    mkdir -p "$repo"
    say ""
    say "=== CASE $label ==="
    say "extra git config: ${extra[*]:-<none>}"
    (
        cd "$repo"
        step "git init + config"
        git init --quiet
        git config user.name tester
        git config user.email bug@example.local
        git config user.signingkey "$KEY_FP"
        git config commit.gpgsign true
        git config gpg.program "$SANDBOX/gpg-capture"
        for kv in "${extra[@]}"; do
            git config "${kv%%=*}" "${kv#*=}"
        done
        printf 'section \xa7\n' > "$SANDBOX/msg.txt"
        echo content > f.txt
        git add f.txt
        step "git commit -F msg.txt (invokes gpg-capture to sign)"
        # The redirect and the error handler belong to the commit command.
        git commit -F "$SANDBOX/msg.txt" --quiet \
            2>"$SANDBOX/commit_${label}.err" \
            || { cat "$SANDBOX/commit_${label}.err" >&2; exit 1; }
    )
    say "-- msg.txt (bytes we asked git to commit): --"
    od -An -tx1 -v "$SANDBOX/msg.txt"
    say "-- sign_stdin (bytes git fed gpg at SIGN time, tail): --"
    od -An -tx1 -v "$SANDBOX/sign_stdin.bin" | tail -n 1
    say "-- commit object body (bytes in the ODB, tail): --"
    (cd "$repo" && git cat-file commit HEAD) | od -An -tx1 -v | tail -n 1
    step "git verify-commit HEAD"
    local rc=0
    (cd "$repo" && git verify-commit HEAD) >/dev/null 2>&1 || rc=$?
    if [[ -f "$SANDBOX/verify_stdin.bin" ]]; then
        say "-- verify_stdin (bytes git fed gpg at VERIFY time, tail): --"
        od -An -tx1 -v "$SANDBOX/verify_stdin.bin" | tail -n 1
    fi
    if (( rc == 0 )); then
        say "verify: GOOD (exit 0)"
        return 0
    else
        say "verify: BAD (exit $rc)"
        return 1
    fi
}
if run_case A; then
    STATE_A=GOOD
else
    STATE_A=BAD
fi
if run_case B i18n.commitEncoding=iso-8859-1; then
    STATE_B=GOOD
else
    STATE_B=BAD
fi
say ""
say "=== summary ==="
say "CASE A (default commitEncoding): $STATE_A"
say "CASE B (i18n.commitEncoding=iso-8859-1): $STATE_B"
if [[ "$STATE_A" == BAD && "$STATE_B" == GOOD ]]; then
    say "RESULT: reproduced. This git has the v2.45+ regression:"
    say " verify_utf8 runs after the signature is already computed,"
    say " so the signer sees raw bytes and the ODB stores transcoded"
    say " bytes. Workaround confirmed: i18n.commitEncoding=iso-8859-1."
    exit 1
fi
if [[ "$STATE_A" == GOOD && "$STATE_B" == GOOD ]]; then
    say "RESULT: not reproduced. Either git predates v2.45 or has a fix."
    exit 0
fi
say "RESULT: unexpected state (A=$STATE_A B=$STATE_B)."
exit 2
```
Output:
```
>>> detected: git version 2.52.0, gpg (GnuPG) 2.4.5
>>> minting throwaway Ed25519 key
key: 5DDD17B4C7A2BB447EAC16F5280A03FBE5C7A8DF
=== CASE A ===
extra git config: <none>
>>> git init + config
>>> git commit -F msg.txt (invokes gpg-capture to sign)
-- msg.txt (bytes we asked git to commit): --
73 65 63 74 69 6f 6e 20 a7 0a
-- sign_stdin (bytes git fed gpg at SIGN time, tail): --
a7 0a
-- commit object body (bytes in the ODB, tail): --
20 c2 a7 0a
>>> git verify-commit HEAD
-- verify_stdin (bytes git fed gpg at VERIFY time, tail): --
c2 a7 0a
verify: BAD (exit 1)
=== CASE B ===
extra git config: i18n.commitEncoding=iso-8859-1
>>> git init + config
>>> git commit -F msg.txt (invokes gpg-capture to sign)
-- msg.txt (bytes we asked git to commit): --
73 65 63 74 69 6f 6e 20 a7 0a
-- sign_stdin (bytes git fed gpg at SIGN time, tail): --
69 6f 6e 20 a7 0a
-- commit object body (bytes in the ODB, tail): --
74 69 6f 6e 20 a7 0a
>>> git verify-commit HEAD
-- verify_stdin (bytes git fed gpg at VERIFY time, tail): --
69 6f 6e 20 a7 0a
verify: GOOD (exit 0)
=== summary ===
CASE A (default commitEncoding): BAD
CASE B (i18n.commitEncoding=iso-8859-1): GOOD
RESULT: reproduced. This git has the v2.45+ regression:
verify_utf8 runs after the signature is already computed,
so the signer sees raw bytes and the ODB stores transcoded
bytes. Workaround confirmed: i18n.commitEncoding=iso-8859-1.
```
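Since only messages that are not valid UTF-8 take the affected path, it can help to check a message file before signing. The following is a rough sketch, assuming `iconv` is installed and exits non-zero on an invalid input sequence; the helper name `is_valid_utf8` is hypothetical and not part of git.

```shell
# Hypothetical helper: succeed only if the file named by $1 is valid UTF-8.
# iconv is asked to convert UTF-8 to UTF-8 and fails on invalid sequences.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

msg=$(mktemp)
printf 'section \247\n' > "$msg"    # lone 0xa7, as in the reproducer
if is_valid_utf8 "$msg"; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8: set i18n.commitEncoding or re-encode before signing"
fi
rm -f "$msg"
```

If the check fails, either re-encode the message as UTF-8 or set `i18n.commitEncoding` to the encoding actually used, as in CASE B.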
I tested this on several setups:
- git 2.53.0 from Fedora 43 reproduces.
- git 2.53.0 from ubuntu-latest (GitHub Actions) reproduces.
- git 2.43.0 from Ubuntu 24.04 does NOT reproduce (v2.44 is the last
  tag without 6206089cbd).
- Any signer triggers the bug -- tested with GnuPG 2.4.9 and with
  my non-GnuPG tclig [0]; both sign the same pre-transcode bytes that
  git pipes to the command-line signer, so both signatures are
  invalidated by the post-sign transcode.
[0] https://crates.io/crates/tumpa-cli
Kushal