Git development
* [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages
From: Kushal Das @ 2026-04-20  8:59 UTC
  To: git; +Cc: brian m. carlson

Hi all,

Every `git commit -S` since v2.45.0 produces a permanently-BAD
signature when the commit message contains bytes that are not valid
UTF-8 AND `i18n.commitEncoding` is unset (i.e. the default case).
Verification fails under both `gpg --verify` and any non-GnuPG signer.
The failure is deterministic: it happens every time, on every
non-UTF-8 commit, no card or external tooling needed.
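
To check whether a given message file falls into the affected class, a
rough structural UTF-8 check is enough. The bash sketch below is an
illustration only: `has_invalid_utf8` is a hypothetical helper, not part
of git, and it checks lead and continuation bytes without rejecting
overlong forms or surrogates, so it may be looser than git's own check.

```shell
# has_invalid_utf8 FILE: exit 0 if FILE contains a byte sequence that is
# not structurally valid UTF-8, exit 1 otherwise. (Hypothetical helper.)
has_invalid_utf8() {
    local need=0 hex b
    # od prints every byte as two hex digits; word splitting yields one per loop.
    for hex in $(od -An -v -tx1 "$1"); do
        b=$((16#$hex))
        if (( need > 0 )); then
            (( (b & 0xc0) == 0x80 )) || return 0    # expected a continuation byte
            need=$((need - 1))
        elif (( b < 0x80 )); then
            :                                       # plain ASCII
        elif (( (b & 0xe0) == 0xc0 )); then
            need=1                                  # lead byte of a 2-byte sequence
        elif (( (b & 0xf0) == 0xe0 )); then
            need=2                                  # lead byte of a 3-byte sequence
        elif (( (b & 0xf8) == 0xf0 )); then
            need=3                                  # lead byte of a 4-byte sequence
        else
            return 0                                # stray continuation or invalid lead
        fi
    done
    if (( need > 0 )); then
        return 0                                    # sequence truncated at end of input
    fi
    return 1
}

msg=$(mktemp)
printf 'section \xa7\n' > "$msg"
if has_invalid_utf8 "$msg"; then
    echo "message is not valid UTF-8"
fi
rm -f "$msg"
```

A message that trips this check is only affected when
`i18n.commitEncoding` is also unset in the repository.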

My best guess is that commit 6206089cbd0b1cb30a017ec904567f040ab4cea0
introduced this (though I may well be wrong about the cause).

In pre-6206089cbd `commit_tree_extended`, `verify_utf8(&buffer)` ran
BEFORE `sign_with_header(&buffer, sign_commit)`. `verify_utf8` is not
a simple validator -- it mutates the strbuf in place, replacing
invalid-UTF-8 bytes with their Latin-1 -> UTF-8 two-byte form. The
signer therefore saw the transcoded bytes, and the same transcoded
bytes were then written to the object database. Signer and
verifier agreed.
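
For a single byte in the 0x80-0xff range, the Latin-1 -> UTF-8 two-byte
form is plain bit arithmetic. A minimal bash sketch, assuming one byte
given as hex; the helper name `latin1_byte_to_utf8` is made up for
illustration and this is not git's code:

```shell
# latin1_byte_to_utf8 HEX: print the UTF-8 encoding (as hex bytes) of one
# Latin-1 byte. Bytes below 0x80 are ASCII and pass through unchanged.
latin1_byte_to_utf8() {
    local b=$((16#$1))
    if (( b < 0x80 )); then
        printf '%02x\n' "$b"
    else
        # 110000xx 10xxxxxx: top two bits go in the lead byte, low six in the tail.
        printf '%02x %02x\n' $(( 0xc0 | (b >> 6) )) $(( 0x80 | (b & 0x3f) ))
    fi
}

latin1_byte_to_utf8 a7    # prints "c2 a7"
```

This is the same `a7` -> `c2 a7` change that shows up between the signed
bytes and the stored bytes in the reproducer output.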

After 6206089cbd, the sequence in `commit_tree_extended` is:

   write_commit_tree(&buffer, ...);
   sign_commit_to_strbuf(&sig, &buffer, sign_commit);   /* pre-transcode */
   ...
   add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo);
   /* and then */
   if (encoding_is_utf8 && (!verify_utf8(&buffer) || !verify_utf8(&compat_buffer)))
       fprintf(stderr, _(commit_utf8_warn));            /* post-sign transcode */
   ...
   odb_write_object_ext(..., buffer.buf, buffer.len, OBJ_COMMIT, ret, ...);

The signature in `bufs[i].sig` covers the raw (non-UTF-8) buffer. The
`verify_utf8` call after `add_commit_signature` rewrites the message
portion of the stored object to UTF-8. The object that hits the ODB
therefore contains bytes that no longer match what the signer hashed,
and any verifier that reads the commit back and re-hashes the
sig-stripped buffer will find a mismatch.
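
The mismatch can be illustrated without git or gpg by hashing the two
byte sequences. This is a sketch under stated assumptions: SHA-256 via
`sha256sum` stands in for whatever digest the signer actually uses, only
the message bytes are hashed rather than the whole commit buffer, and
the helper names are hypothetical.

```shell
# payload_digest ESCAPED: hash the bytes described by a printf %b string.
# SHA-256 is an assumption; the real digest depends on the signer and key.
payload_digest() {
    printf '%b' "$1" | sha256sum | cut -d' ' -f1
}

# compare_payloads SIGNED STORED: print "match" or "mismatch".
compare_payloads() {
    if [[ "$(payload_digest "$1")" == "$(payload_digest "$2")" ]]; then
        echo match
    else
        echo mismatch
    fi
}

# Signed before the transcode, stored after it.
compare_payloads 'section \xa7\n' 'section \xc2\xa7\n'    # prints "mismatch"
# Signed and stored bytes identical.
compare_payloads 'section \xa7\n' 'section \xa7\n'        # prints "match"
```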


To reproduce, I ran the bash script below in a container with this
command:

`podman run --rm -it -v ./git_bug.sh:/git_bug.sh:Z fedora:41 bash -c 'dnf -y install git gnupg2 >/dev/null 2>&1 && /git_bug.sh 2>&1'`


```
#!/usr/bin/env bash
# git_bug.sh -- minimal in-container reproducer for the git 2.45+
# sign/store divergence on non-UTF-8 commit messages.
#
# Two cases:
#   CASE A: default git config. On git 2.45+ the signer hashes the
#           raw (Latin-1) bytes but the ODB stores the UTF-8-transcoded
#           bytes. Verify fails with BAD signature.
#   CASE B: same commit with `i18n.commitEncoding=iso-8859-1`. git
#           tags the object with an `encoding` header and skips the
#           in-place transcode, so sign and store bytes agree.
#
# Exit codes:
#   0   Not reproduced. Git either predates v2.45 or carries a fix.
#   1   Reproduced. CASE A BAD, CASE B GOOD (workaround confirmed).
#   2   Unexpected state or missing tool.
#
# Dependencies: git + gpg + awk + od + bash. The script mints its own
# throwaway Ed25519 key in a temp GNUPGHOME. Host state untouched.
#
set -euo pipefail

step() { printf '>>> %s\n' "$*"; }
say()  { printf '%s\n' "$*"; }
die()  { printf 'git_bug.sh: %s\n' "$*" >&2; exit 2; }

for tool in git gpg awk od; do
     command -v "$tool" >/dev/null || die "$tool not found on PATH"
done

SANDBOX=$(mktemp -d -t git-bug-XXXXXX)
export GNUPGHOME="$SANDBOX/gnupg"
mkdir -p "$GNUPGHOME"
chmod 700 "$GNUPGHOME"
# Pre-create agent + gpg configs so loopback pinentry works and no
# dirmngr / tty prompts can fire.
cat > "$GNUPGHOME/gpg-agent.conf" <<EOF
allow-loopback-pinentry
default-cache-ttl 3600
max-cache-ttl 3600
EOF
cat > "$GNUPGHOME/gpg.conf" <<EOF
pinentry-mode loopback
batch
trust-model always
EOF

cleanup() {
     gpgconf --homedir "$GNUPGHOME" --kill all >/dev/null 2>&1 || true
     rm -rf "$SANDBOX"
}
trap cleanup EXIT

PASS=""

step "detected: $(git --version), $(gpg --version | head -n1)"

# ---------------------------------------------------------------------
# Mint a throwaway Ed25519 signing key. Ed25519 signs in well under a
# second, unlike RSA-4096 which on some containers takes tens of
# seconds and easily reads as "the script is hung".
# ---------------------------------------------------------------------
step "minting throwaway Ed25519 key"
cat > "$SANDBOX/keygen" <<EOF
%no-protection
Key-Type: EDDSA
Key-Curve: ed25519
Key-Usage: sign,cert
Name-Real: git-bug-tester
Name-Email: bug@example.local
Expire-Date: 1y
%commit
EOF
gpg --batch --gen-key "$SANDBOX/keygen" 2>"$SANDBOX/keygen.err" \
     || { cat "$SANDBOX/keygen.err" >&2; die "key generation failed"; }
KEY_FP=$(gpg --list-keys --with-colons bug@example.local \
     | awk -F: '/^fpr:/ {print $10; exit}')
[ -n "$KEY_FP" ] || die "could not read key fingerprint"
say "key: $KEY_FP"

# Kill the gpg-agent spawned by keygen so the next invocation starts
# fresh with our loopback-pinentry policy.
gpgconf --kill gpg-agent >/dev/null 2>&1 || true

# gpg wrapper git will invoke as `gpg.program`. Branches on whether
# --verify is in argv so we can capture the two stdin streams to
# separate files without clobbering each other.
cat > "$SANDBOX/gpg-capture" <<WRAP
#!/usr/bin/env bash
OUT="$SANDBOX/sign_stdin.bin"
VERIFY=0
for a in "\$@"; do
     if [[ "\$a" == "--verify" ]]; then
         OUT="$SANDBOX/verify_stdin.bin"
         VERIFY=1
         break
     fi
done
if (( VERIFY )); then
     tee "\$OUT" | exec gpg "\$@"
else
     tee "\$OUT" | exec gpg \\
         --pinentry-mode loopback --passphrase "$PASS" --batch --yes "\$@"
fi
WRAP
chmod +x "$SANDBOX/gpg-capture"

# ---------------------------------------------------------------------
# Run one case: fresh repo, optional extra git config, sign a
# lone-0xa7 commit, dump the bytes, verify. Returns 0 on GOOD, 1 on
# BAD. Prints a progress line for every potentially-slow operation so
# a "is it stuck?" question is answerable from the terminal.
# ---------------------------------------------------------------------
run_case() {
     local label="$1"
     shift
     local extra=("$@")
     local repo="$SANDBOX/repo_${label}"
     rm -f "$SANDBOX/sign_stdin.bin" "$SANDBOX/verify_stdin.bin"
     mkdir -p "$repo"

     say ""
     say "=== CASE $label ==="
     say "extra git config: ${extra[*]:-<none>}"

     (
         cd "$repo"
         step "git init + config"
         git init --quiet
         git config user.name tester
         git config user.email bug@example.local
         git config user.signingkey "$KEY_FP"
         git config commit.gpgsign true
         git config gpg.program "$SANDBOX/gpg-capture"
         for kv in "${extra[@]}"; do
             git config "${kv%%=*}" "${kv#*=}"
         done

         printf 'section \xa7\n' > "$SANDBOX/msg.txt"
         echo content > f.txt
         git add f.txt

         step "git commit -F msg.txt (invokes gpg-capture to sign)"
         git commit -F "$SANDBOX/msg.txt" --quiet \
             2>"$SANDBOX/commit_${label}.err" \
             || { cat "$SANDBOX/commit_${label}.err" >&2; exit 1; }
     )

     say "-- msg.txt (bytes we asked git to commit): --"
     od -An -tx1 -v "$SANDBOX/msg.txt"

     say "-- sign_stdin (bytes git fed gpg at SIGN time, tail): --"
     od -An -tx1 -v "$SANDBOX/sign_stdin.bin" | tail -n 1

     say "-- commit object body (bytes in the ODB, tail): --"
     (cd "$repo" && git cat-file commit HEAD) | od -An -tx1 -v | tail -n 1

     step "git verify-commit HEAD"
     local rc=0
     (cd "$repo" && git verify-commit HEAD) >/dev/null 2>&1 || rc=$?

     if [[ -f "$SANDBOX/verify_stdin.bin" ]]; then
         say "-- verify_stdin (bytes git fed gpg at VERIFY time, tail): --"
         od -An -tx1 -v "$SANDBOX/verify_stdin.bin" | tail -n 1
     fi

     if (( rc == 0 )); then
         say "verify: GOOD (exit 0)"
         return 0
     else
         say "verify: BAD (exit $rc)"
         return 1
     fi
}

if run_case A; then
     STATE_A=GOOD
else
     STATE_A=BAD
fi

if run_case B i18n.commitEncoding=iso-8859-1; then
     STATE_B=GOOD
else
     STATE_B=BAD
fi

say ""
say "=== summary ==="
say "CASE A (default commitEncoding):         $STATE_A"
say "CASE B (i18n.commitEncoding=iso-8859-1): $STATE_B"

if [[ "$STATE_A" == BAD && "$STATE_B" == GOOD ]]; then
     say "RESULT: reproduced. This git has the v2.45+ regression:"
     say "  verify_utf8 runs after the signature is already computed,"
     say "  so the signer sees raw bytes and the ODB stores transcoded"
     say "  bytes. Workaround confirmed: i18n.commitEncoding=iso-8859-1."
     exit 1
fi
if [[ "$STATE_A" == GOOD && "$STATE_B" == GOOD ]]; then
     say "RESULT: not reproduced. Either git predates v2.45 or has a fix."
     exit 0
fi
say "RESULT: unexpected state (A=$STATE_A B=$STATE_B)."
exit 2
```

Output:

```
 >>> detected: git version 2.52.0, gpg (GnuPG) 2.4.5
 >>> minting throwaway Ed25519 key
key: 5DDD17B4C7A2BB447EAC16F5280A03FBE5C7A8DF

=== CASE A ===
extra git config: <none>
 >>> git init + config
 >>> git commit -F msg.txt (invokes gpg-capture to sign)
-- msg.txt (bytes we asked git to commit): --
  73 65 63 74 69 6f 6e 20 a7 0a
-- sign_stdin (bytes git fed gpg at SIGN time, tail): --
  a7 0a
-- commit object body (bytes in the ODB, tail): --
  20 c2 a7 0a
 >>> git verify-commit HEAD
-- verify_stdin (bytes git fed gpg at VERIFY time, tail): --
  c2 a7 0a
verify: BAD (exit 1)

=== CASE B ===
extra git config: i18n.commitEncoding=iso-8859-1
 >>> git init + config
 >>> git commit -F msg.txt (invokes gpg-capture to sign)
-- msg.txt (bytes we asked git to commit): --
  73 65 63 74 69 6f 6e 20 a7 0a
-- sign_stdin (bytes git fed gpg at SIGN time, tail): --
  69 6f 6e 20 a7 0a
-- commit object body (bytes in the ODB, tail): --
  74 69 6f 6e 20 a7 0a
 >>> git verify-commit HEAD
-- verify_stdin (bytes git fed gpg at VERIFY time, tail): --
  69 6f 6e 20 a7 0a
verify: GOOD (exit 0)

=== summary ===
CASE A (default commitEncoding):         BAD
CASE B (i18n.commitEncoding=iso-8859-1): GOOD
RESULT: reproduced. This git has the v2.45+ regression:
   verify_utf8 runs after the signature is already computed,
   so the signer sees raw bytes and the ODB stores transcoded
   bytes. Workaround confirmed: i18n.commitEncoding=iso-8859-1.
```
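
The CASE B workaround amounts to a single config setting. A minimal
sketch, assuming the messages really are ISO-8859-1 (declaring the wrong
encoding would mislabel the commits); `set_commit_encoding` is a
hypothetical wrapper, not a git command:

```shell
# set_commit_encoding REPO [ENCODING]: record the encoding git should
# assume for commit messages in REPO. Defaults to iso-8859-1, as in CASE B.
set_commit_encoding() {
    git -C "$1" config i18n.commitEncoding "${2:-iso-8859-1}"
}
```

Running `set_commit_encoding .` inside a repository is equivalent to
`git config i18n.commitEncoding iso-8859-1` there; commits made
afterwards carry an `encoding` header, as described for CASE B.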

I tested this in several environments:

- git 2.53.0 from Fedora 43 reproduces.
- git 2.53.0 from ubuntu-latest (GitHub Actions) reproduces.
- git 2.43.0 from Ubuntu 24.04 does NOT reproduce (v2.44 is the last
   tag without 6206089cbd).
- Any signer produces the bug -- tested with GnuPG 2.4.9 and with
   my non-GnuPG tclig [0]; both hash the same pre-transcode bytes that
   git pipes to the command-line signer, so both signatures are
   invalidated by the post-sign transcode.


[0] https://crates.io/crates/tumpa-cli

Kushal

