public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Collin Funk <collin.funk1@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components()
Date: Mon, 16 Feb 2026 08:23:44 +0100	[thread overview]
Message-ID: <aZLGAMiDdZ_vplND@pks.im> (raw)
In-Reply-To: <20260215090744.GD695631@coredump.intra.peff.net>

On Sun, Feb 15, 2026 at 04:07:44AM -0500, Jeff King wrote:
> To strip path components from our refname string, we repeatedly call
> strrchr() to find the trailing slash, shortening the string each time by
> assigning NUL over it. This has two downsides:
> 
>   1. Calling strrchr() in a loop is quadratic, since each call has to
>      call strlen() under the hood to find the end of the string (even
>      though we know exactly where it is from the last loop iteration).

Ah, indeed, that's something I missed.

>   2. We need a temporary buffer, since we're munging the string with NUL
>      as we shorten it (which we must do, because strrchr() has no other
>      way of knowing what we consider the end of the string).

Right, upon reading the preceding patch I figured that we can improve
this function even further and avoid the call to `xstrdup()` in the case
where we have less components than we're being asked to strip.

> Using memrchr() would let us fix both of these, but it isn't portable.
> So instead, let's just open-code the string traversal from back to
> front as we loop.
> 
> I doubt that the quadratic nature is a serious concern. You can see it
> in practice with something like:
> 
>   git init
>   git commit --allow-empty -m foo
>   echo "$(git rev-parse HEAD) refs/heads$(perl -e 'print "/a" x 500_000')" >.git/packed-refs
>   time git for-each-ref --format='%(refname:rstrip=-1)'
> 
> That takes ~5.5s to run on my machine before this patch, and ~11ms
> after. But I don't think there's a reasonable way for somebody to infect
> you with such a garbage ref, as the wire protocol is limited to 64k
> pkt-lines. The difference is measurable for me for a 32k-component ref
> (about 19ms vs 7ms), so perhaps you could create some chaos by pushing a
> lot of them. But we also run into filesystem limits (if the loose
> backend is in use), and in practice it seems like there are probably
> simpler and more effective ways to waste CPU.

Agreed, not much of a concern, but good regardless to see it being
addressed.

> Likewise the extra allocation probably isn't really measurable. In fact,
> since our goal is to return an allocated string, we end up having to
> make the same allocation anyway (though it is sized to the result,
> rather than the input). My main goal was simplicity in avoiding the need
> to handle cleaning it up in the early return path.

Likewise.

> diff --git a/ref-filter.c b/ref-filter.c
> index 1008b2fd5a..ac32b0e6bb 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2213,17 +2213,15 @@ static const char *lstrip_ref_components(const char *refname, int len)
>  static const char *rstrip_ref_components(const char *refname, int len)
>  {
>  	int remaining = normalize_component_count(refname, len);
> -	char *start = xstrdup(refname);
> +	const char *end = refname + strlen(refname);
>  
> -	while (remaining-- > 0) {
> -		char *p = strrchr(start, '/');
> -		if (!p) {
> -			free(start);
> +	while (remaining > 0) {
> +		if (end == refname)
>  			return xstrdup("");
> -		} else
> -			p[0] = '\0';
> +		if (*--end == '/')
> +			remaining--;

We start scannign from the trailing NUL byte, so this would also cause
us to detect if the refname had "/" as a suffix. But I assume that's a
case we don't even need to care about, as refs cannot end with a slash
anyway.

Another edge case is if we were passed the empty string, but as we
already abort in case we see that `end == refname` we're good there,
too.

>  	}
> -	return start;
> +	return xmemdupz(refname, end - refname);
>  }

So overall this and all the preceding patches look good to me. Thanks!

Patrick

  reply	other threads:[~2026-02-16  7:23 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-14  5:15 [PATCH] ref-filter: don't declare a strdup'd variable const before writing to it Collin Funk
2026-02-15  8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
2026-02-15  9:00   ` [PATCH 1/4] ref-filter: factor out refname component counting Jeff King
2026-02-17 18:07     ` Junio C Hamano
2026-02-19 11:21       ` Jeff King
2026-02-19 18:56         ` Junio C Hamano
2026-02-20  6:00           ` [PATCH] ref-filter: clarify lstrip/rstrip " Jeff King
2026-02-22 17:04         ` [PATCH 1/4] ref-filter: factor out refname " Karthik Nayak
2026-02-15  9:02   ` [PATCH 2/4] ref-filter: simplify lstrip_ref_components() memory handling Jeff King
2026-02-15  9:05   ` [PATCH 3/4] ref-filter: simplify rstrip_ref_components() " Jeff King
2026-02-15  9:07   ` [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components() Jeff King
2026-02-16  7:23     ` Patrick Steinhardt [this message]
2026-02-15  9:11   ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
2026-02-15 22:23   ` Collin Funk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZLGAMiDdZ_vplND@pks.im \
    --to=ps@pks.im \
    --cc=collin.funk1@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox