git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>,
	Karthik Nayak <karthik.188@gmail.com>,
	 "brian m. carlson" <sandals@crustytoothpaste.net>,
	 Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	 shejialuo <shejialuo@gmail.com>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: [PATCH] refs/packed: fix BUG when seeking refs with UTF-8 characters
Date: Fri, 04 Apr 2025 12:58:38 +0200	[thread overview]
Message-ID: <20250404-b4-pks-packed-backend-seek-with-utf8-v1-1-6ceb694e3bd7@pks.im> (raw)

It was reported that using git-pull(1) in a repository whose remote
contains branches with emojis leads to the following bug:

    $ git pull
    remote: Enumerating objects: 161255, done.
    remote: Counting objects: 100% (55884/55884), done.
    remote: Compressing objects: 100% (5518/5518), done.
    remote: Total 161255 (delta 54253), reused 50509 (delta 50364),
    pack-reused 105371 (from 4)
    Receiving objects: 100% (161255/161255), 309.90 MiB | 16.87 MiB/s, done.
    Resolving deltas: 100% (118048/118048), completed with 13416 local objects.
    From github.com:github/github
       97ab7ae3f3745..8fb2f9fa180ed  master -> origin/master
    [...snip many screenfuls of updates to origin remotes...]
    BUG: refs/packed-backend.c:984: packed-refs backend yielded reference
    preceding its prefix
    error: fetch died of signal 6

This issue bisects to 22600c04529 (refs/iterator: implement seeking for
packed-ref iterators, 2025-03-12) where we have implemented seeking for
the packed-ref iterator. As part of that change we introduced a check
that verifies that the iterator only returns refnames bigger than the
prefix. In theory, this check should always hold: when a prefix is set
we know that we would've seeked that prefix first, so we should never
see a reference sorting before that prefix.

But in practice the check itself is misbehaving when handling unicode
characters. The particular issue triggered with a branch that got the
"shaved ice" unicode character in its name, which is composed of the
bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname
"refs/heads/<shaved-ice>" to something like "refs/heads/z", and it
specifically hits when comparing the first byte, "0xEE".

The root cause is that the most-significant bit of 0xEE is set. The
`refname` and `prefix` pointers that we use to compare bytes with one
another are both pointers to signed characters. As such, when we
dereference the 0xEE byte the result is a _negative_ value, and this
value will of course compare smaller than "z".

We can see that this issue is avoided in `cmp_packed_refname()`, where
we explicitly cast each byte to its unsigned form. Fix the bug by doing
the same in `packed_ref_iterator_advance()`.

Reported-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Hi,

this patch addresses the issue reported by Elijah at [1]. Thanks!

Patrick

[1]: <CABPp-BFBqC_t5QSexRQpYsqXBa11WK+OqGt167E=K=xod=buQw@mail.gmail.com>
---
 refs/packed-backend.c  |  4 ++--
 t/t1408-packed-refs.sh | 15 +++++++++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b4289a7d9ce..7e31904bd41 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -980,9 +980,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			continue;
 
 		while (prefix && *prefix) {
-			if (*refname < *prefix)
+			if ((unsigned char)*refname < (unsigned char)*prefix)
 				BUG("packed-refs backend yielded reference preceding its prefix");
-			else if (*refname > *prefix)
+			else if ((unsigned char)*refname > (unsigned char)*prefix)
 				return ITER_DONE;
 			prefix++;
 			refname++;
diff --git a/t/t1408-packed-refs.sh b/t/t1408-packed-refs.sh
index 41ba1f1d7fc..833477f0fa3 100755
--- a/t/t1408-packed-refs.sh
+++ b/t/t1408-packed-refs.sh
@@ -42,4 +42,19 @@ test_expect_success 'no error from stale entry in packed-refs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'list packed refs with unicode characters' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit --no-tag A &&
+		git update-ref refs/heads/ HEAD &&
+		git update-ref refs/heads/z HEAD &&
+		git pack-refs --all &&
+		printf "%s commit\trefs/heads/z\n" $(git rev-parse HEAD) >expect &&
+		git for-each-ref refs/heads/z >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done

---
base-commit: 5b97a56fa0e7d580dc8865b73107407c9b3f0eff
change-id: 20250404-b4-pks-packed-backend-seek-with-utf8-668c182ddcf7


             reply	other threads:[~2025-04-04 10:58 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-04 10:58 Patrick Steinhardt [this message]
2025-04-04 18:21 ` [PATCH] refs/packed: fix BUG when seeking refs with UTF-8 characters Elijah Newren
2025-04-04 20:57 ` Jeff King
2025-04-05  1:38   ` brian m. carlson
2025-04-08  6:46     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250404-b4-pks-packed-backend-seek-with-utf8-v1-1-6ceb694e3bd7@pks.im \
    --to=ps@pks.im \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=karthik.188@gmail.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    --cc=shejialuo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).