From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Collin Funk <collin.funk1@gmail.com>,
git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components()
Date: Mon, 16 Feb 2026 08:23:44 +0100 [thread overview]
Message-ID: <aZLGAMiDdZ_vplND@pks.im> (raw)
In-Reply-To: <20260215090744.GD695631@coredump.intra.peff.net>
On Sun, Feb 15, 2026 at 04:07:44AM -0500, Jeff King wrote:
> To strip path components from our refname string, we repeatedly call
> strrchr() to find the trailing slash, shortening the string each time by
> assigning NUL over it. This has two downsides:
>
> 1. Calling strrchr() in a loop is quadratic, since each call has to
> call strlen() under the hood to find the end of the string (even
> though we know exactly where it is from the last loop iteration).
Ah, indeed, that's something I missed.
> 2. We need a temporary buffer, since we're munging the string with NUL
> as we shorten it (which we must do, because strrchr() has no other
> way of knowing what we consider the end of the string).
Right, upon reading the preceding patch I figured that we can improve
this function even further and avoid the call to `xstrdup()` in the case
where we have less components than we're being asked to strip.
> Using memrchr() would let us fix both of these, but it isn't portable.
> So instead, let's just open-code the string traversal from back to
> front as we loop.
>
> I doubt that the quadratic nature is a serious concern. You can see it
> in practice with something like:
>
> git init
> git commit --allow-empty -m foo
> echo "$(git rev-parse HEAD) refs/heads$(perl -e 'print "/a" x 500_000')" >.git/packed-refs
> time git for-each-ref --format='%(refname:rstrip=-1)'
>
> That takes ~5.5s to run on my machine before this patch, and ~11ms
> after. But I don't think there's a reasonable way for somebody to infect
> you with such a garbage ref, as the wire protocol is limited to 64k
> pkt-lines. The difference is measurable for me for a 32k-component ref
> (about 19ms vs 7ms), so perhaps you could create some chaos by pushing a
> lot of them. But we also run into filesystem limits (if the loose
> backend is in use), and in practice it seems like there are probably
> simpler and more effective ways to waste CPU.
Agreed, not much of a concern, but good regardless to see it being
addressed.
> Likewise the extra allocation probably isn't really measurable. In fact,
> since our goal is to return an allocated string, we end up having to
> make the same allocation anyway (though it is sized to the result,
> rather than the input). My main goal was simplicity in avoiding the need
> to handle cleaning it up in the early return path.
Likewise.
> diff --git a/ref-filter.c b/ref-filter.c
> index 1008b2fd5a..ac32b0e6bb 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2213,17 +2213,15 @@ static const char *lstrip_ref_components(const char *refname, int len)
> static const char *rstrip_ref_components(const char *refname, int len)
> {
> int remaining = normalize_component_count(refname, len);
> - char *start = xstrdup(refname);
> + const char *end = refname + strlen(refname);
>
> - while (remaining-- > 0) {
> - char *p = strrchr(start, '/');
> - if (!p) {
> - free(start);
> + while (remaining > 0) {
> + if (end == refname)
> return xstrdup("");
> - } else
> - p[0] = '\0';
> + if (*--end == '/')
> + remaining--;
We start scannign from the trailing NUL byte, so this would also cause
us to detect if the refname had "/" as a suffix. But I assume that's a
case we don't even need to care about, as refs cannot end with a slash
anyway.
Another edge case is if we were passed the empty string, but as we
already abort in case we see that `end == refname` we're good there,
too.
> }
> - return start;
> + return xmemdupz(refname, end - refname);
> }
So overall this and all the preceding patches look good to me. Thanks!
Patrick
next prev parent reply other threads:[~2026-02-16 7:23 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-14 5:15 [PATCH] ref-filter: don't declare a strdup'd variable const before writing to it Collin Funk
2026-02-15 8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
2026-02-15 9:00 ` [PATCH 1/4] ref-filter: factor out refname component counting Jeff King
2026-02-17 18:07 ` Junio C Hamano
2026-02-19 11:21 ` Jeff King
2026-02-19 18:56 ` Junio C Hamano
2026-02-20 6:00 ` [PATCH] ref-filter: clarify lstrip/rstrip " Jeff King
2026-02-22 17:04 ` [PATCH 1/4] ref-filter: factor out refname " Karthik Nayak
2026-02-15 9:02 ` [PATCH 2/4] ref-filter: simplify lstrip_ref_components() memory handling Jeff King
2026-02-15 9:05 ` [PATCH 3/4] ref-filter: simplify rstrip_ref_components() " Jeff King
2026-02-15 9:07 ` [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components() Jeff King
2026-02-16 7:23 ` Patrick Steinhardt [this message]
2026-02-15 9:11 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
2026-02-15 22:23 ` Collin Funk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aZLGAMiDdZ_vplND@pks.im \
--to=ps@pks.im \
--cc=collin.funk1@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.