* [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages
@ 2026-04-20 8:59 Kushal Das
2026-04-20 22:11 ` brian m. carlson
0 siblings, 1 reply; 13+ messages in thread
From: Kushal Das @ 2026-04-20 8:59 UTC (permalink / raw)
To: git; +Cc: brian m. carlson
Hi all,
Every `git commit -S` since v2.45.0 produces a permanently-BAD
signature when the commit message contains bytes that are not valid
UTF-8 AND `i18n.commitEncoding` is unset (i.e. the default case).
Verification fails under both `gpg --verify` and any non-GnuPG signer.
The failure is deterministic: it happens every time, on every
non-UTF-8 commit, no card or external tooling needed.
My best guess is commit 6206089cbd0b1cb30a017ec904567f040ab4cea0
starting this (and I am maybe 100% wrong in identifying the cause).
In pre-6206089cbd `commit_tree_extended`, `verify_utf8(&buffer)` ran
BEFORE `sign_with_header(&buffer, sign_commit)`. `verify_utf8` is not
a simple validator -- it mutates the strbuf in place, replacing
invalid-UTF-8 bytes with their Latin-1 -> UTF-8 two-byte form. The
signer therefore saw the transcoded bytes, and the same transcoded
bytes were then written to the object database. Signer and
verifier agreed.
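To make the two-byte form concrete, here is a small sketch of the Latin-1 -> UTF-8 rewrite, working on hex byte values. `latin1_to_utf8_hex` is a made-up helper name, not a git function, and it assumes every byte >= 0x80 is a lone invalid byte (the real function leaves valid UTF-8 sequences alone).

```shell
#!/bin/sh
# Illustration only: latin1_to_utf8_hex is a hypothetical helper, not git code.
# Input and output are space-separated hex bytes.
latin1_to_utf8_hex() {
    out=""
    for h in $1; do
        b=$((0x$h))
        if [ "$b" -lt 128 ]; then
            out="$out $h"                    # ASCII passes through unchanged
        else
            hi=$((0xc0 | ($b >> 6)))         # lead byte, 0xc2 or 0xc3
            lo=$((0x80 | ($b & 0x3f)))       # continuation byte, 10xxxxxx
            out="$out $(printf '%02x %02x' "$hi" "$lo")"
        fi
    done
    printf '%s\n' "${out# }"
}

latin1_to_utf8_hex "20 a7 0a"   # prints: 20 c2 a7 0a
```

The lone 0xa7 grows into c2 a7, so the buffer is one byte longer after the call than before it.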
After 6206089cbd, the sequence in `commit_tree_extended` is:
    write_commit_tree(&buffer, ...);
    sign_commit_to_strbuf(&sig, &buffer, sign_commit);  /* pre-transcode */
    ...
    add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo);
    /* and then */
    if (encoding_is_utf8 && (!verify_utf8(&buffer) ||
                             !verify_utf8(&compat_buffer)))
        fprintf(stderr, _(commit_utf8_warn));  /* post-sign transcode */
    ...
    odb_write_object_ext(..., buffer.buf, buffer.len, OBJ_COMMIT, ret, ...);
The signature in `bufs[i].sig` covers the raw (non-UTF-8) buffer. The
`verify_utf8` call after `add_commit_signature` rewrites the message
portion of the stored object to UTF-8. The object that hits the ODB
therefore contains bytes that no longer match what the signer hashed,
and any verifier that reads the commit back and re-hashes the
sig-stripped buffer will find a mismatch.
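The effect of the ordering can be shown with a toy model. This is not git code: `cksum` stands in for the signer, `toy_sign` and `toy_verify` are made-up names, and the buffers are the message bytes written out as hex strings.

```shell
#!/bin/sh
# Toy model, not git code: cksum stands in for the real signer.
raw="73 65 63 74 69 6f 6e 20 a7 0a"        # message bytes the signer was fed
stored="73 65 63 74 69 6f 6e 20 c2 a7 0a"  # message bytes after the transcode

# Stand-in "signature": a checksum over exactly the bytes presented.
toy_sign() { printf '%s' "$1" | cksum; }

# toy_verify SIGNATURE BUFFER: re-hash the buffer and compare.
toy_verify() {
    if [ "$(toy_sign "$2")" = "$1" ]; then echo GOOD; else echo BAD; fi
}

toy_verify "$(toy_sign "$raw")" "$stored"      # sign, then transcode: BAD
toy_verify "$(toy_sign "$stored")" "$stored"   # transcode, then sign: GOOD
```

Any signature scheme behaves the same way here, since the verifier only ever sees the stored bytes.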
To reproduce, I ran the bash script below in a fresh Fedora 41
container with this command:
`podman run --rm -it -v ./git_bug.sh:/git_bug.sh:Z fedora:41 bash -c
'dnf -y install git gnupg2 >/dev/null 2>&1 && /git_bug.sh 2>&1'`
```
#!/usr/bin/env bash
# git_bug.sh -- minimal in-container reproducer for the git 2.45+
# sign/store divergence on non-UTF-8 commit messages.
#
# Two cases:
# CASE A: default git config. On git 2.45+ the signer hashes the
# raw (Latin-1) bytes but the ODB stores the UTF-8-transcoded
# bytes. Verify fails with BAD signature.
# CASE B: same commit with `i18n.commitEncoding=iso-8859-1`. git
# tags the object with an `encoding` header and skips the
# in-place transcode, so sign and store bytes agree.
#
# Exit codes:
# 0 Not reproduced. Git either predates v2.45 or carries a fix.
# 1 Reproduced. CASE A BAD, CASE B GOOD (workaround confirmed).
# 2 Unexpected state or missing tool.
#
# Dependencies: git + gpg + awk + od + bash. The script mints its own
# throwaway Ed25519 key in a temp GNUPGHOME. Host state untouched.
#
set -euo pipefail
step() { printf '>>> %s\n' "$*"; }
say() { printf '%s\n' "$*"; }
die() { printf 'git_bug.sh: %s\n' "$*" >&2; exit 2; }
for tool in git gpg awk od; do
command -v "$tool" >/dev/null || die "$tool not found on PATH"
done
SANDBOX=$(mktemp -d -t git-bug-XXXXXX)
export GNUPGHOME="$SANDBOX/gnupg"
mkdir -p "$GNUPGHOME"
chmod 700 "$GNUPGHOME"
# Pre-create agent + gpg configs so loopback pinentry works and no
# dirmngr / tty prompts can fire.
cat > "$GNUPGHOME/gpg-agent.conf" <<EOF
allow-loopback-pinentry
default-cache-ttl 3600
max-cache-ttl 3600
EOF
cat > "$GNUPGHOME/gpg.conf" <<EOF
pinentry-mode loopback
batch
trust-model always
EOF
cleanup() {
gpgconf --homedir "$GNUPGHOME" --kill all >/dev/null 2>&1 || true
rm -rf "$SANDBOX"
}
trap cleanup EXIT
PASS=""
step "detected: $(git --version), $(gpg --version | head -n1)"
# ---------------------------------------------------------------------
# Mint a throwaway Ed25519 signing key. Ed25519 signs in well under a
# second, unlike RSA-4096 which on some containers takes tens of
# seconds and easily reads as "the script is hung".
# ---------------------------------------------------------------------
step "minting throwaway Ed25519 key"
cat > "$SANDBOX/keygen" <<EOF
%no-protection
Key-Type: EDDSA
Key-Curve: ed25519
Key-Usage: sign,cert
Name-Real: git-bug-tester
Name-Email: bug@example.local
Expire-Date: 1y
%commit
EOF
gpg --batch --gen-key "$SANDBOX/keygen" 2>"$SANDBOX/keygen.err" \
|| { cat "$SANDBOX/keygen.err" >&2; die "key generation failed"; }
KEY_FP=$(gpg --list-keys --with-colons bug@example.local \
| awk -F: '/^fpr:/ {print $10; exit}')
[ -n "$KEY_FP" ] || die "could not read key fingerprint"
say "key: $KEY_FP"
# Kill the gpg-agent spawned by keygen so the next invocation starts
# fresh with our loopback-pinentry policy.
gpgconf --kill gpg-agent >/dev/null 2>&1 || true
# gpg wrapper git will invoke as `gpg.program`. Branches on whether
# --verify is in argv so we can capture the two stdin streams to
# separate files without clobbering each other.
cat > "$SANDBOX/gpg-capture" <<WRAP
#!/usr/bin/env bash
OUT="$SANDBOX/sign_stdin.bin"
VERIFY=0
for a in "\$@"; do
if [[ "\$a" == "--verify" ]]; then
OUT="$SANDBOX/verify_stdin.bin"
VERIFY=1
break
fi
done
if (( VERIFY )); then
tee "\$OUT" | exec gpg "\$@"
else
tee "\$OUT" | exec gpg \\
--pinentry-mode loopback --passphrase "$PASS" --batch --yes "\$@"
fi
WRAP
chmod +x "$SANDBOX/gpg-capture"
# ---------------------------------------------------------------------
# Run one case: fresh repo, optional extra git config, sign a
# lone-0xa7 commit, dump the bytes, verify. Returns 0 on GOOD, 1 on
# BAD. Prints a progress line for every potentially-slow operation so
# a "is it stuck?" question is answerable from the terminal.
# ---------------------------------------------------------------------
run_case() {
local label="$1"
shift
local extra=("$@")
local repo="$SANDBOX/repo_${label}"
rm -f "$SANDBOX/sign_stdin.bin" "$SANDBOX/verify_stdin.bin"
mkdir -p "$repo"
say ""
say "=== CASE $label ==="
say "extra git config: ${extra[*]:-<none>}"
(
cd "$repo"
step "git init + config"
git init --quiet
git config user.name tester
git config user.email bug@example.local
git config user.signingkey "$KEY_FP"
git config commit.gpgsign true
git config gpg.program "$SANDBOX/gpg-capture"
for kv in "${extra[@]}"; do
git config "${kv%%=*}" "${kv#*=}"
done
printf 'section \xa7\n' > "$SANDBOX/msg.txt"
echo content > f.txt
git add f.txt
step "git commit -F msg.txt (invokes gpg-capture to sign)"
git commit -F "$SANDBOX/msg.txt" --quiet \
2>"$SANDBOX/commit_${label}.err" \
|| { cat "$SANDBOX/commit_${label}.err" >&2; exit 1; }
)
say "-- msg.txt (bytes we asked git to commit): --"
od -An -tx1 -v "$SANDBOX/msg.txt"
say "-- sign_stdin (bytes git fed gpg at SIGN time, tail): --"
od -An -tx1 -v "$SANDBOX/sign_stdin.bin" | tail -n 1
say "-- commit object body (bytes in the ODB, tail): --"
(cd "$repo" && git cat-file commit HEAD) | od -An -tx1 -v | tail -n 1
step "git verify-commit HEAD"
local rc=0
(cd "$repo" && git verify-commit HEAD) >/dev/null 2>&1 || rc=$?
if [[ -f "$SANDBOX/verify_stdin.bin" ]]; then
say "-- verify_stdin (bytes git fed gpg at VERIFY time, tail): --"
od -An -tx1 -v "$SANDBOX/verify_stdin.bin" | tail -n 1
fi
if (( rc == 0 )); then
say "verify: GOOD (exit 0)"
return 0
else
say "verify: BAD (exit $rc)"
return 1
fi
}
if run_case A; then
STATE_A=GOOD
else
STATE_A=BAD
fi
if run_case B i18n.commitEncoding=iso-8859-1; then
STATE_B=GOOD
else
STATE_B=BAD
fi
say ""
say "=== summary ==="
say "CASE A (default commitEncoding): $STATE_A"
say "CASE B (i18n.commitEncoding=iso-8859-1): $STATE_B"
if [[ "$STATE_A" == BAD && "$STATE_B" == GOOD ]]; then
say "RESULT: reproduced. This git has the v2.45+ regression:"
say " verify_utf8 runs after the signature is already computed,"
say " so the signer sees raw bytes and the ODB stores transcoded"
say " bytes. Workaround confirmed: i18n.commitEncoding=iso-8859-1."
exit 1
fi
if [[ "$STATE_A" == GOOD && "$STATE_B" == GOOD ]]; then
say "RESULT: not reproduced. Either git predates v2.45 or has a fix."
exit 0
fi
say "RESULT: unexpected state (A=$STATE_A B=$STATE_B)."
exit 2
```
Output:
```
>>> detected: git version 2.52.0, gpg (GnuPG) 2.4.5
>>> minting throwaway Ed25519 key
key: 5DDD17B4C7A2BB447EAC16F5280A03FBE5C7A8DF
=== CASE A ===
extra git config: <none>
>>> git init + config
>>> git commit -F msg.txt (invokes gpg-capture to sign)
-- msg.txt (bytes we asked git to commit): --
73 65 63 74 69 6f 6e 20 a7 0a
-- sign_stdin (bytes git fed gpg at SIGN time, tail): --
a7 0a
-- commit object body (bytes in the ODB, tail): --
20 c2 a7 0a
>>> git verify-commit HEAD
-- verify_stdin (bytes git fed gpg at VERIFY time, tail): --
c2 a7 0a
verify: BAD (exit 1)
=== CASE B ===
extra git config: i18n.commitEncoding=iso-8859-1
>>> git init + config
>>> git commit -F msg.txt (invokes gpg-capture to sign)
-- msg.txt (bytes we asked git to commit): --
73 65 63 74 69 6f 6e 20 a7 0a
-- sign_stdin (bytes git fed gpg at SIGN time, tail): --
69 6f 6e 20 a7 0a
-- commit object body (bytes in the ODB, tail): --
74 69 6f 6e 20 a7 0a
>>> git verify-commit HEAD
-- verify_stdin (bytes git fed gpg at VERIFY time, tail): --
69 6f 6e 20 a7 0a
verify: GOOD (exit 0)
=== summary ===
CASE A (default commitEncoding): BAD
CASE B (i18n.commitEncoding=iso-8859-1): GOOD
RESULT: reproduced. This git has the v2.45+ regression:
verify_utf8 runs after the signature is already computed,
so the signer sees raw bytes and the ODB stores transcoded
bytes. Workaround confirmed: i18n.commitEncoding=iso-8859-1.
```
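For a quick pass/fail instead of reading hex tails, the two captured payloads can be compared byte for byte. A rough sketch: `same_payload` is a hypothetical helper, and the demo files below are stand-ins built from the CASE A message bytes.

```shell
#!/bin/sh
# Hypothetical helper: report whether the bytes fed to the signer and the
# bytes fed to the verifier are identical. cmp -s prints nothing and only
# sets the exit status.
same_payload() {
    if cmp -s "$1" "$2"; then
        echo "payloads match"
    else
        echo "payloads differ"
    fi
}

# Stand-alone demo: \247 is the raw 0xa7 byte, \302\247 is its UTF-8 form.
tmp=$(mktemp -d)
printf 'section \247\n'     > "$tmp/sign_stdin.bin"
printf 'section \302\247\n' > "$tmp/verify_stdin.bin"
same_payload "$tmp/sign_stdin.bin" "$tmp/verify_stdin.bin"   # payloads differ
rm -rf "$tmp"
```

In the reproducer it would be called as `same_payload "$SANDBOX/sign_stdin.bin" "$SANDBOX/verify_stdin.bin"` inside `run_case`, before the EXIT trap removes `$SANDBOX`.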
I tested this in several environments:
- git 2.53.0 from Fedora 43 reproduces.
- git 2.53.0 from ubuntu-latest (GitHub Actions) reproduces.
- git 2.43.0 from Ubuntu 24.04 does NOT reproduce (v2.44 is the last
  tag without 6206089cbd).
- Any signer triggers the bug -- tested with GnuPG 2.4.9 and with
  my non-GnuPG tclig [0]. Both sign the same pre-transcode bytes that
  git pipes to the command-line signer, so both signatures are
  invalidated by the post-sign transcode.
[0] https://crates.io/crates/tumpa-cli
Kushal
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages
2026-04-20 8:59 [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages Kushal Das
@ 2026-04-20 22:11 ` brian m. carlson
2026-04-20 22:14 ` [PATCH 1/2] commit: name UTF-8 function appropriately brian m. carlson
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: brian m. carlson @ 2026-04-20 22:11 UTC (permalink / raw)
To: Kushal Das; +Cc: git
On 2026-04-20 at 08:59:05, Kushal Das wrote:
> Hi all,
>
> Every `git commit -S` since v2.45.0 produces a permanently-BAD
> signature when the commit message contains bytes that are not valid
> UTF-8 AND `i18n.commitEncoding` is unset (i.e. the default case).
> Verification fails under both `gpg --verify` and any non-GnuPG signer.
> The failure is deterministic: it happens every time, on every
> non-UTF-8 commit, no card or external tooling needed.
I'm not sure that's a valid configuration. The commit message either
needs to be UTF-8 or you need to declare the encoding so Git can convert
it.
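One way to check a message before committing is sketched below. `is_utf8` is a hypothetical helper, not a git command, and it is deliberately simplified: it checks only lead and continuation byte patterns, so it may be looser than git's own check (overlong forms and surrogates are not rejected).

```shell
#!/bin/sh
# Hypothetical helper, not a git command: succeed if stdin looks like UTF-8.
# Simplified: lead/continuation byte patterns only.
is_utf8() {
    need=0                                   # continuation bytes still expected
    for h in $(od -An -tx1 -v); do
        b=$((0x$h))
        if [ "$need" -gt 0 ]; then
            [ $(($b & 0xc0)) -eq 128 ] || return 1   # must be 10xxxxxx
            need=$(($need - 1))
        elif [ "$b" -lt 128 ]; then
            :                                # ASCII
        elif [ "$b" -ge 194 ] && [ "$b" -le 223 ]; then
            need=1                           # lead byte 0xc2..0xdf
        elif [ "$b" -ge 224 ] && [ "$b" -le 239 ]; then
            need=2                           # lead byte 0xe0..0xef
        elif [ "$b" -ge 240 ] && [ "$b" -le 244 ]; then
            need=3                           # lead byte 0xf0..0xf4
        else
            return 1                         # stray continuation or bad lead
        fi
    done
    [ "$need" -eq 0 ]                        # a truncated sequence is invalid
}

# \247 is a lone 0xa7 byte; \302\247 is the same character in UTF-8.
if printf 'section \247\n' | is_utf8; then echo valid; else echo invalid; fi
if printf 'section \302\247\n' | is_utf8; then echo valid; else echo invalid; fi
```

The first line prints "invalid" and the second "valid". For a message that really is Latin-1, declaring it with `git config i18n.commitEncoding iso-8859-1` is the route the report confirmed as working.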
> My best guess is commit 6206089cbd0b1cb30a017ec904567f040ab4cea0 starting
> this (and I am maybe 100% wrong in identifying the cause).
It does bisect to that commit. I wrote that patch originally, but it
got modified and sent upstream by someone else. I'm not sure where it
got introduced, though.
> In pre-6206089cbd `commit_tree_extended`, `verify_utf8(&buffer)` ran
> BEFORE `sign_with_header(&buffer, sign_commit)`. `verify_utf8` is not
> a simple validator -- it mutates the strbuf in place, replacing
> invalid-UTF-8 bytes with their Latin-1 -> UTF-8 two-byte form. The
> signer therefore saw the transcoded bytes, and the same transcoded
> bytes were then written to the object database. Signer and
> verifier agreed.
The fact that we have a function called `verify_utf8` that does more
than verify is a problem. I'll send out a two-patch series in a minute
or two that first fixes that to be called `ensure_utf8` and then fixes
the issue.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* [PATCH 1/2] commit: name UTF-8 function appropriately
2026-04-20 22:11 ` brian m. carlson
@ 2026-04-20 22:14 ` brian m. carlson
2026-04-20 22:14 ` [PATCH 2/2] commit: sign commit after mutating buffer brian m. carlson
2026-04-22 15:10 ` [PATCH 1/2] commit: name UTF-8 function appropriately Elijah Newren
2026-04-21 7:39 ` [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages Kushal Das
` (2 subsequent siblings)
3 siblings, 2 replies; 13+ messages in thread
From: brian m. carlson @ 2026-04-20 22:14 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kushal Das
We have a function named verify_utf8, but it does more than verify, it
modifies the buffer if it is not UTF-8. This is different from what
most people would expect, so call the function ensure_utf8, since it
mutates the buffer in some cases.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
commit.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/commit.c b/commit.c
index 80d8d07875..790dd2faed 100644
--- a/commit.c
+++ b/commit.c
@@ -1637,12 +1637,12 @@ static int find_invalid_utf8(const char *buf, int len)
}
/*
- * This verifies that the buffer is in proper utf8 format.
+ * This ensures that the buffer is in proper utf8 format.
*
* If it isn't, it assumes any non-utf8 characters are Latin1,
* and does the conversion.
*/
-static int verify_utf8(struct strbuf *buf)
+static int ensure_utf8(struct strbuf *buf)
{
int ok = 1;
long pos = 0;
@@ -1819,7 +1819,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
}
/* And check the encoding. */
- if (encoding_is_utf8 && (!verify_utf8(&buffer) || !verify_utf8(&compat_buffer)))
+ if (encoding_is_utf8 && (!ensure_utf8(&buffer) || !ensure_utf8(&compat_buffer)))
fprintf(stderr, _(commit_utf8_warn));
if (r->compat_hash_algo) {
* [PATCH 2/2] commit: sign commit after mutating buffer
2026-04-20 22:14 ` [PATCH 1/2] commit: name UTF-8 function appropriately brian m. carlson
@ 2026-04-20 22:14 ` brian m. carlson
2026-04-22 15:10 ` Elijah Newren
2026-04-22 15:10 ` [PATCH 1/2] commit: name UTF-8 function appropriately Elijah Newren
1 sibling, 1 reply; 13+ messages in thread
From: brian m. carlson @ 2026-04-20 22:14 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kushal Das
The ensure_utf8 function can mutate the buffer to change its encoding,
so we must call it before signing the buffer so that we do not
invalidate the signature, which is made over raw bytes. Add a test for
this case as well using 0xfe and 0xff, which are never valid in UTF-8.
Reported-by: Kushal Das <kushal@sunet.se>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
commit.c | 12 ++++++++----
t/t7510-signed-commit.sh | 8 ++++++++
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/commit.c b/commit.c
index 790dd2faed..bc41859be1 100644
--- a/commit.c
+++ b/commit.c
@@ -1747,6 +1747,11 @@ int commit_tree_extended(const char *msg, size_t msg_len,
oidcpy(&parent_buf[i++], &p->item->object.oid);
write_commit_tree(&buffer, msg, msg_len, tree, parent_buf, nparents, author, committer, extra);
+
+ /* And check the encoding. */
+ if (encoding_is_utf8 && !ensure_utf8(&buffer))
+ fprintf(stderr, _(commit_utf8_warn));
+
if (sign_commit && sign_buffer(&buffer, &sig, sign_commit,
SIGN_BUFFER_USE_DEFAULT_KEY)) {
result = -1;
@@ -1780,6 +1785,9 @@ int commit_tree_extended(const char *msg, size_t msg_len,
free_commit_extra_headers(compat_extra);
free(mapped_parents);
+ if (encoding_is_utf8 && !ensure_utf8(&compat_buffer))
+ fprintf(stderr, _(commit_utf8_warn));
+
if (sign_commit && sign_buffer(&compat_buffer, &compat_sig,
sign_commit,
SIGN_BUFFER_USE_DEFAULT_KEY)) {
@@ -1818,10 +1826,6 @@ int commit_tree_extended(const char *msg, size_t msg_len,
}
}
- /* And check the encoding. */
- if (encoding_is_utf8 && (!ensure_utf8(&buffer) || !ensure_utf8(&compat_buffer)))
- fprintf(stderr, _(commit_utf8_warn));
-
if (r->compat_hash_algo) {
hash_object_file(r->compat_hash_algo, compat_buffer.buf, compat_buffer.len,
OBJ_COMMIT, &compat_oid_buf);
diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 1201c85ba6..071dbb3d39 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -462,4 +462,12 @@ test_expect_success 'custom `gpg.program`' '
git commit -S --allow-empty -m signed-commit
'
+test_expect_success GPG 'commit verifies with non-UTF-8 commit message' '
+ printf "I hate\\376\\377UTF-8\\n" >message &&
+ echo unusual-message >file &&
+ git add file &&
+ test_tick && git commit -S -F message &&
+ git verify-commit HEAD
+'
+
test_done
* Re: [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages
2026-04-20 22:11 ` brian m. carlson
2026-04-20 22:14 ` [PATCH 1/2] commit: name UTF-8 function appropriately brian m. carlson
@ 2026-04-21 7:39 ` Kushal Das
2026-04-21 22:13 ` brian m. carlson
2026-04-22 18:13 ` D. Ben Knoble
2026-04-27 22:18 ` [PATCH v2 1/2] commit: name UTF-8 function appropriately brian m. carlson
3 siblings, 1 reply; 13+ messages in thread
From: Kushal Das @ 2026-04-21 7:39 UTC (permalink / raw)
To: brian m. carlson, git
Hi,
On 4/21/26 12:11 AM, brian m. carlson wrote:
> On 2026-04-20 at 08:59:05, Kushal Das wrote:
>> Hi all,
>>
>> Every `git commit -S` since v2.45.0 produces a permanently-BAD
>> signature when the commit message contains bytes that are not valid
>> UTF-8 AND `i18n.commitEncoding` is unset (i.e. the default case).
>> Verification fails under both `gpg --verify` and any non-GnuPG signer.
>> The failure is deterministic: it happens every time, on every
>> non-UTF-8 commit, no card or external tooling needed.
>
> I'm not sure that's a valid configuration. The commit message either
> needs to be UTF-8 or you need to declare the encoding so Git can convert
> it.
>
It is not a valid configuration, but I am guessing there are more people
like me who never knew about this setting and just freaked out on seeing
bad signatures on their own commits :)
Thank you for the quick fix.
I am also wondering in the test harness for git signing, if you want to
include other tools than gnupg for testing.
Kushal
* Re: [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages
2026-04-21 7:39 ` [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages Kushal Das
@ 2026-04-21 22:13 ` brian m. carlson
0 siblings, 0 replies; 13+ messages in thread
From: brian m. carlson @ 2026-04-21 22:13 UTC (permalink / raw)
To: Kushal Das; +Cc: git
On 2026-04-21 at 07:39:11, Kushal Das wrote:
> I am also wondering in the test harness for git signing, if you want to
> include other tools than gnupg for testing.
The signing code is abstract at that point in the code, so it should
work identically with SSH or X.509 and I don't think a separate test is
necessary.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* Re: [PATCH 1/2] commit: name UTF-8 function appropriately
2026-04-20 22:14 ` [PATCH 1/2] commit: name UTF-8 function appropriately brian m. carlson
2026-04-20 22:14 ` [PATCH 2/2] commit: sign commit after mutating buffer brian m. carlson
@ 2026-04-22 15:10 ` Elijah Newren
1 sibling, 0 replies; 13+ messages in thread
From: Elijah Newren @ 2026-04-22 15:10 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Junio C Hamano, Kushal Das
On Mon, Apr 20, 2026 at 3:14 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> We have a function named verify_utf8, but it does more than verify, it
> modifies the buffer if it is not UTF-8. This is different from what
> most people would expect, so call the function ensure_utf8, since it
> mutates the buffer in some cases.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> commit.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index 80d8d07875..790dd2faed 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1637,12 +1637,12 @@ static int find_invalid_utf8(const char *buf, int len)
> }
>
> /*
> - * This verifies that the buffer is in proper utf8 format.
> + * This ensures that the buffer is in proper utf8 format.
> *
> * If it isn't, it assumes any non-utf8 characters are Latin1,
> * and does the conversion.
> */
> -static int verify_utf8(struct strbuf *buf)
> +static int ensure_utf8(struct strbuf *buf)
> {
> int ok = 1;
> long pos = 0;
> @@ -1819,7 +1819,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
> }
>
> /* And check the encoding. */
> - if (encoding_is_utf8 && (!verify_utf8(&buffer) || !verify_utf8(&compat_buffer)))
> + if (encoding_is_utf8 && (!ensure_utf8(&buffer) || !ensure_utf8(&compat_buffer)))
> fprintf(stderr, _(commit_utf8_warn));
>
> if (r->compat_hash_algo) {
Makes sense.
* Re: [PATCH 2/2] commit: sign commit after mutating buffer
2026-04-20 22:14 ` [PATCH 2/2] commit: sign commit after mutating buffer brian m. carlson
@ 2026-04-22 15:10 ` Elijah Newren
2026-04-24 20:17 ` brian m. carlson
0 siblings, 1 reply; 13+ messages in thread
From: Elijah Newren @ 2026-04-22 15:10 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Junio C Hamano, Kushal Das
On Mon, Apr 20, 2026 at 3:14 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> The ensure_utf8 function can mutate the buffer to change its encoding,
> so we must call it before signing the buffer so that we do not
> invalidate the signature, which is made over raw bytes. Add a test for
> this case as well using 0xfe and 0xff, which are never valid in UTF-8.
>
> Reported-by: Kushal Das <kushal@sunet.se>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> commit.c | 12 ++++++++----
> t/t7510-signed-commit.sh | 8 ++++++++
> 2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index 790dd2faed..bc41859be1 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1747,6 +1747,11 @@ int commit_tree_extended(const char *msg, size_t msg_len,
> oidcpy(&parent_buf[i++], &p->item->object.oid);
>
> write_commit_tree(&buffer, msg, msg_len, tree, parent_buf, nparents, author, committer, extra);
> +
> + /* And check the encoding. */
> + if (encoding_is_utf8 && !ensure_utf8(&buffer))
> + fprintf(stderr, _(commit_utf8_warn));
> +
> if (sign_commit && sign_buffer(&buffer, &sig, sign_commit,
> SIGN_BUFFER_USE_DEFAULT_KEY)) {
> result = -1;
> @@ -1780,6 +1785,9 @@ int commit_tree_extended(const char *msg, size_t msg_len,
> free_commit_extra_headers(compat_extra);
> free(mapped_parents);
>
> + if (encoding_is_utf8 && !ensure_utf8(&compat_buffer))
> + fprintf(stderr, _(commit_utf8_warn));
> +
So the users might see "commit message did not conform to UTF-8..."
twice? (Isn't compat_buffer likely to have invalid UTF-8 whenever
buffer does?) Do we want to avoid that double printing?
> if (sign_commit && sign_buffer(&compat_buffer, &compat_sig,
> sign_commit,
> SIGN_BUFFER_USE_DEFAULT_KEY)) {
> @@ -1818,10 +1826,6 @@ int commit_tree_extended(const char *msg, size_t msg_len,
> }
> }
>
> - /* And check the encoding. */
> - if (encoding_is_utf8 && (!ensure_utf8(&buffer) || !ensure_utf8(&compat_buffer)))
> - fprintf(stderr, _(commit_utf8_warn));
> -
Did the change in this patch also fix a short-circuiting error?
Previously, when both buffers had invalid UTF-8, we'd only call
ensure_utf8() on the first one and fix it, and then short-circuit and
not handle compat_buffer, right?
> if (r->compat_hash_algo) {
> hash_object_file(r->compat_hash_algo, compat_buffer.buf, compat_buffer.len,
> OBJ_COMMIT, &compat_oid_buf);
> diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
> index 1201c85ba6..071dbb3d39 100755
> --- a/t/t7510-signed-commit.sh
> +++ b/t/t7510-signed-commit.sh
> @@ -462,4 +462,12 @@ test_expect_success 'custom `gpg.program`' '
> git commit -S --allow-empty -m signed-commit
> '
>
> +test_expect_success GPG 'commit verifies with non-UTF-8 commit message' '
> + printf "I hate\\376\\377UTF-8\\n" >message &&
> + echo unusual-message >file &&
> + git add file &&
> + test_tick && git commit -S -F message &&
> + git verify-commit HEAD
> +'
Nice test.
* Re: [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages
2026-04-20 22:11 ` brian m. carlson
2026-04-20 22:14 ` [PATCH 1/2] commit: name UTF-8 function appropriately brian m. carlson
2026-04-21 7:39 ` [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages Kushal Das
@ 2026-04-22 18:13 ` D. Ben Knoble
2026-04-27 22:18 ` [PATCH v2 1/2] commit: name UTF-8 function appropriately brian m. carlson
3 siblings, 0 replies; 13+ messages in thread
From: D. Ben Knoble @ 2026-04-22 18:13 UTC (permalink / raw)
To: brian m. carlson, Kushal Das, git
On Mon, Apr 20, 2026 at 6:12 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2026-04-20 at 08:59:05, Kushal Das wrote:
> > Hi all,
> >
> > Every `git commit -S` since v2.45.0 produces a permanently-BAD
> > signature when the commit message contains bytes that are not valid
> > UTF-8 AND `i18n.commitEncoding` is unset (i.e. the default case).
> > Verification fails under both `gpg --verify` and any non-GnuPG signer.
> > The failure is deterministic: it happens every time, on every
> > non-UTF-8 commit, no card or external tooling needed.
>
> I'm not sure that's a valid configuration. The commit message either
> needs to be UTF-8 or you need to declare the encoding so Git can convert
> it.
>
> > My best guess is commit 6206089cbd0b1cb30a017ec904567f040ab4cea0 starting
> > this (and I am maybe 100% wrong in identifying the cause).
>
> It does bisect to that commit. I wrote that patch originally, but it
> got modified and sent upstream by someone else. I'm not sure where it
> got introduced, though.
According to the `amlog` notes ref: <20231002024034.2611-9-ebiederm@gmail.com>
--
D. Ben Knoble
* Re: [PATCH 2/2] commit: sign commit after mutating buffer
2026-04-22 15:10 ` Elijah Newren
@ 2026-04-24 20:17 ` brian m. carlson
0 siblings, 0 replies; 13+ messages in thread
From: brian m. carlson @ 2026-04-24 20:17 UTC (permalink / raw)
To: Elijah Newren; +Cc: git, Junio C Hamano, Kushal Das
On 2026-04-22 at 15:10:29, Elijah Newren wrote:
> On Mon, Apr 20, 2026 at 3:14 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > diff --git a/commit.c b/commit.c
> > index 790dd2faed..bc41859be1 100644
> > --- a/commit.c
> > +++ b/commit.c
> > @@ -1747,6 +1747,11 @@ int commit_tree_extended(const char *msg, size_t msg_len,
> > oidcpy(&parent_buf[i++], &p->item->object.oid);
> >
> > write_commit_tree(&buffer, msg, msg_len, tree, parent_buf, nparents, author, committer, extra);
> > +
> > + /* And check the encoding. */
> > + if (encoding_is_utf8 && !ensure_utf8(&buffer))
> > + fprintf(stderr, _(commit_utf8_warn));
> > +
> > if (sign_commit && sign_buffer(&buffer, &sig, sign_commit,
> > SIGN_BUFFER_USE_DEFAULT_KEY)) {
> > result = -1;
> > @@ -1780,6 +1785,9 @@ int commit_tree_extended(const char *msg, size_t msg_len,
> > free_commit_extra_headers(compat_extra);
> > free(mapped_parents);
> >
> > + if (encoding_is_utf8 && !ensure_utf8(&compat_buffer))
> > + fprintf(stderr, _(commit_utf8_warn));
> > +
>
> So the users might see "commit message did not conform to UTF-8..."
> twice? (Isn't compat_buffer likely to have invalid UTF-8 whenever
> buffer does?) Do we want to avoid that double printing?
Yeah, I'll fix that in v2.
> Did the change in this patch also fix a short-circuiting error?
> Previously, when both buffers had invalid UTF-8, we'd only call
> ensure_utf8() on the first one and fix it, and then short-circuit and
> not handle compat_buffer, right?
I believe it did, yes.
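The short-circuit can be shown with a shell analog of the two C conditions. This is a sketch, not git code: `fake_ensure` is a made-up stand-in that records that it ran and returns non-zero when it "had to convert" its buffer, mirroring a zero return from ensure_utf8.

```shell
#!/bin/sh
# Shell analog of the C conditions, not git code.
fake_ensure() { calls="$calls $1"; return "$2"; }

# Old shape: !ensure(buffer) || !ensure(compat_buffer)
old_check() {
    calls=""
    if ! fake_ensure buffer "$1" || ! fake_ensure compat_buffer "$2"; then
        :   # the warning would be printed here
    fi
    echo "ran:$calls"
}

# New shape: each buffer is handled on its own, before it is signed.
new_check() {
    calls=""
    fake_ensure buffer "$1" || :
    fake_ensure compat_buffer "$2" || :
    echo "ran:$calls"
}

old_check 1 1   # prints: ran: buffer
new_check 1 1   # prints: ran: buffer compat_buffer
```

With both buffers invalid, the old shape stops after the first call, so the second buffer is never converted.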
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* [PATCH v2 1/2] commit: name UTF-8 function appropriately
2026-04-20 22:11 ` brian m. carlson
` (2 preceding siblings ...)
2026-04-22 18:13 ` D. Ben Knoble
@ 2026-04-27 22:18 ` brian m. carlson
2026-04-27 22:18 ` [PATCH v2 2/2] commit: sign commit after mutating buffer brian m. carlson
3 siblings, 1 reply; 13+ messages in thread
From: brian m. carlson @ 2026-04-27 22:18 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kushal Das, Elijah Newren
We have a function named verify_utf8, but it does more than verify, it
modifies the buffer if it is not UTF-8. This is different from what
most people would expect, so call the function ensure_utf8, since it
mutates the buffer in some cases.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
commit.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/commit.c b/commit.c
index 80d8d07875..790dd2faed 100644
--- a/commit.c
+++ b/commit.c
@@ -1637,12 +1637,12 @@ static int find_invalid_utf8(const char *buf, int len)
}
/*
- * This verifies that the buffer is in proper utf8 format.
+ * This ensures that the buffer is in proper utf8 format.
*
* If it isn't, it assumes any non-utf8 characters are Latin1,
* and does the conversion.
*/
-static int verify_utf8(struct strbuf *buf)
+static int ensure_utf8(struct strbuf *buf)
{
int ok = 1;
long pos = 0;
@@ -1819,7 +1819,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
}
/* And check the encoding. */
- if (encoding_is_utf8 && (!verify_utf8(&buffer) || !verify_utf8(&compat_buffer)))
+ if (encoding_is_utf8 && (!ensure_utf8(&buffer) || !ensure_utf8(&compat_buffer)))
fprintf(stderr, _(commit_utf8_warn));
if (r->compat_hash_algo) {
* [PATCH v2 2/2] commit: sign commit after mutating buffer
2026-04-27 22:18 ` [PATCH v2 1/2] commit: name UTF-8 function appropriately brian m. carlson
@ 2026-04-27 22:18 ` brian m. carlson
2026-05-12 5:54 ` Junio C Hamano
0 siblings, 1 reply; 13+ messages in thread
From: brian m. carlson @ 2026-04-27 22:18 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kushal Das, Elijah Newren
The ensure_utf8 function can mutate the buffer to change its encoding,
so we must call it before signing the buffer so that we do not
invalidate the signature, which is made over raw bytes. Fix a bug which
caused the compatibility code to not convert the compatibility buffer if
the main buffer was invalid UTF-8. We expect both buffers to be valid
UTF-8 or both invalid, since the only data that would differ between
them would be hex object IDs, which are always valid UTF-8.
Add a test for this case using 0xfe and 0xff, which are never valid in
UTF-8.
Reported-by: Kushal Das <kushal@sunet.se>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 commit.c                 | 15 +++++++++++----
 t/t7510-signed-commit.sh | 10 ++++++++++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index 790dd2faed..e5d725fe93 100644
--- a/commit.c
+++ b/commit.c
@@ -1726,6 +1726,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 	struct repository *r = the_repository;
 	int result = 0;
 	int encoding_is_utf8;
+	bool warned = false;
 	struct strbuf buffer = STRBUF_INIT, compat_buffer = STRBUF_INIT;
 	struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
 	struct object_id *parent_buf = NULL, *compat_oid = NULL;
@@ -1747,6 +1748,13 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 		oidcpy(&parent_buf[i++], &p->item->object.oid);
 
 	write_commit_tree(&buffer, msg, msg_len, tree, parent_buf, nparents, author, committer, extra);
+
+	/* And check the encoding. */
+	if (encoding_is_utf8 && !ensure_utf8(&buffer)) {
+		fprintf(stderr, _(commit_utf8_warn));
+		warned = true;
+	}
+
 	if (sign_commit && sign_buffer(&buffer, &sig, sign_commit,
 				       SIGN_BUFFER_USE_DEFAULT_KEY)) {
 		result = -1;
@@ -1780,6 +1788,9 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 		free_commit_extra_headers(compat_extra);
 		free(mapped_parents);
 
+		if (encoding_is_utf8 && !ensure_utf8(&compat_buffer) && !warned)
+			fprintf(stderr, _(commit_utf8_warn));
+
 		if (sign_commit && sign_buffer(&compat_buffer, &compat_sig,
 					       sign_commit,
 					       SIGN_BUFFER_USE_DEFAULT_KEY)) {
@@ -1818,10 +1829,6 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 		}
 	}
 
-	/* And check the encoding. */
-	if (encoding_is_utf8 && (!ensure_utf8(&buffer) || !ensure_utf8(&compat_buffer)))
-		fprintf(stderr, _(commit_utf8_warn));
-
 	if (r->compat_hash_algo) {
 		hash_object_file(r->compat_hash_algo, compat_buffer.buf, compat_buffer.len,
 			OBJ_COMMIT, &compat_oid_buf);
diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 1201c85ba6..aa9108da54 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -462,4 +462,14 @@ test_expect_success 'custom `gpg.program`' '
 	git commit -S --allow-empty -m signed-commit
 '
 
+test_expect_success GPG 'commit verifies with non-UTF-8 commit message' '
+	printf "I hate\\376\\377UTF-8\\n" >message &&
+	echo unusual-message >file &&
+	git add file &&
+	test_tick && git commit -S -F message 2>err &&
+	git verify-commit HEAD &&
+	grep "commit message did not conform to UTF-8" err >lines &&
+	test_line_count = 1 lines
+'
+
 test_done
* Re: [PATCH v2 2/2] commit: sign commit after mutating buffer
2026-04-27 22:18 ` [PATCH v2 2/2] commit: sign commit after mutating buffer brian m. carlson
@ 2026-05-12 5:54 ` Junio C Hamano
0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2026-05-12 5:54 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Kushal Das, Elijah Newren
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> The ensure_utf8 function can mutate the buffer to change its encoding,
> so we must call it before signing the buffer so that we do not
> invalidate the signature, which is made over raw bytes. Fix a bug which
> caused the compatibility code to not convert the compatibility buffer if
> the main buffer was invalid UTF-8. We expect both buffers to be valid
> UTF-8 or both invalid, since the only data that would differ between
> them would be hex object IDs, which are always valid UTF-8.
>
> Add a test for this case using 0xfe and 0xff, which are never valid in
> UTF-8.
>
> Reported-by: Kushal Das <kushal@sunet.se>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> commit.c | 15 +++++++++++----
> t/t7510-signed-commit.sh | 10 ++++++++++
> 2 files changed, 21 insertions(+), 4 deletions(-)
This iteration hasn't seen any reaction, but comparing it with the
previous round and looking at the comments that round received, I
guess everybody who commented on the previous round is happy with
this version.
Let me mark the topic for 'next'.
Thanks.
Thread overview: 13+ messages
2026-04-20 8:59 [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages Kushal Das
2026-04-20 22:11 ` brian m. carlson
2026-04-20 22:14 ` [PATCH 1/2] commit: name UTF-8 function appropriately brian m. carlson
2026-04-20 22:14 ` [PATCH 2/2] commit: sign commit after mutating buffer brian m. carlson
2026-04-22 15:10 ` Elijah Newren
2026-04-24 20:17 ` brian m. carlson
2026-04-22 15:10 ` [PATCH 1/2] commit: name UTF-8 function appropriately Elijah Newren
2026-04-21 7:39 ` [BUG] v2.45+: git commit -S invalidates signature for non-UTF-8 messages Kushal Das
2026-04-21 22:13 ` brian m. carlson
2026-04-22 18:13 ` D. Ben Knoble
2026-04-27 22:18 ` [PATCH v2 1/2] commit: name UTF-8 function appropriately brian m. carlson
2026-04-27 22:18 ` [PATCH v2 2/2] commit: sign commit after mutating buffer brian m. carlson
2026-05-12 5:54 ` Junio C Hamano