public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ref-filter: don't declare a strdup'd variable const before writing to it
@ 2026-02-14  5:15 Collin Funk
  2026-02-15  8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Collin Funk @ 2026-02-14  5:15 UTC (permalink / raw)
  To: git; +Cc: Collin Funk, Junio C Hamano

I generally don't like the casts like in rstrip_ref_components and
rstrip_ref_components because they force you to write this:

    free((char *)free_ptr);

And the const doesn't really benefit readability, in my opinion.

That is a bit of a seperate topic than fixing the warning, though, so
I left them as-is.

-- 8< --

With glibc-2.43 there is the following warning:

    ../ref-filter.c: In function ‘rstrip_ref_components’:
    ../ref-filter.c:2237:27: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
     2237 |                 char *p = strrchr(start, '/');
          |

We can remove the const from "start" since it is the result of strdup
and we end up writing to it through "p".

Signed-off-by: Collin Funk <collin.funk1@gmail.com>
---
 ref-filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index 3917c4ccd9..183cb6bbd7 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2214,7 +2214,7 @@ static const char *lstrip_ref_components(const char *refname, int len)
 static const char *rstrip_ref_components(const char *refname, int len)
 {
 	long remaining = len;
-	const char *start = xstrdup(refname);
+	char *start = xstrdup(refname);
 	const char *to_free = start;
 
 	if (len < 0) {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code
  2026-02-14  5:15 [PATCH] ref-filter: don't declare a strdup'd variable const before writing to it Collin Funk
@ 2026-02-15  8:57 ` Jeff King
  2026-02-15  9:00   ` [PATCH 1/4] ref-filter: factor out refname component counting Jeff King
                     ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Jeff King @ 2026-02-15  8:57 UTC (permalink / raw)
  To: Collin Funk; +Cc: git, Junio C Hamano

On Fri, Feb 13, 2026 at 09:15:57PM -0800, Collin Funk wrote:

> I generally don't like the casts like in rstrip_ref_components and
> rstrip_ref_components because they force you to write this:
> 
>     free((char *)free_ptr);
> 
> And the const doesn't really benefit readability, in my opinion.

Agreed. It is especially egregious in this case because the const
variable is called to_free, and so its only purpose is to be non-const. ;)

> That is a bit of a seperate topic than fixing the warning, though, so
> I left them as-is.

It is a separate topic, but I feel like this is a good opportunity to
make this code less horrible. That is, there are some obvious
low-hanging cleanups that make the code more readable, and as a side
effect we clean up the const confusion. In such cases I think it is
worth veering off the path a little.

I was going to catalog the numerous flaws I found, but by the time I
explained them, I had basically written patches and commit messages. So
here is what I would propose instead. I hope I'm not stealing your
thunder nor knocking us too far off our goal.

The first three I hope are no-brainers, and the final one fixes the
glibc const issue. The fourth is perhaps more risky.

  [1/4]: ref-filter: factor out refname component counting
  [2/4]: ref-filter: simplify lstrip_ref_components() memory handling
  [3/4]: ref-filter: simplify rstrip_ref_components() memory handling
  [4/4]: ref-filter: open-code slash search in rstrip_ref_components()

 ref-filter.c | 54 +++++++++++++++++-----------------------------------
 1 file changed, 17 insertions(+), 37 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] ref-filter: factor out refname component counting
  2026-02-15  8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
@ 2026-02-15  9:00   ` Jeff King
  2026-02-17 18:07     ` Junio C Hamano
  2026-02-15  9:02   ` [PATCH 2/4] ref-filter: simplify lstrip_ref_components() memory handling Jeff King
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2026-02-15  9:00 UTC (permalink / raw)
  To: Collin Funk; +Cc: git, Junio C Hamano

The "lstrip" and "rstrip" options to the %(refname) placeholder both
accept a negative length, which asks us to keep that many path
components (rather than stripping that many).

The code to count components and convert the negative value to a
positive was copied from lstrip to rstrip in 1a34728e6b (ref-filter: add
an 'rstrip=<N>' option to atoms which deal with refnames, 2017-01-10).

Let's factor it out into a separate function. This reduces duplication
and also makes the lstrip/rstrip functions much easier to follow, since
the bulk of their code is now the actual stripping.

Note that the computed "remaining" value is currently stored as a
"long", so in theory that's what our function should return. But this is
purely historical. When the variable was added in 0571979bd6 (tag: do
not show ambiguous tag names as "tags/foo", 2016-01-25), we parsed the
value from strtol(), and thus used a long. But these days we take "len"
as an int, and also use an int to count up components. So let's just
consistently use int here. This value could only overflow in a
pathological case (e.g., 4GB worth of "a/a/...") and even then will not
result in out-of-bounds memory access (we keep stripping until we run
out of string to parse).

The minimal Myers diff here is a little hard to read; with --patience
the code movement is shown much more clearly.

Signed-off-by: Jeff King <peff@peff.net>
---
I did generate this with --patience. Using --color-words also helps show
that it's a pure code movement.

 ref-filter.c | 56 +++++++++++++++++++++-------------------------------
 1 file changed, 22 insertions(+), 34 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 3917c4ccd9..9153331f42 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2173,28 +2173,32 @@ static inline char *copy_advance(char *dst, const char *src)
 	return dst;
 }
 
+static int normalize_component_count(const char *refname, int len)
+{
+	if (len < 0) {
+		int i;
+		const char *p = refname;
+
+		/* Find total no of '/' separated path-components */
+		for (i = 0; p[i]; p[i] == '/' ? i++ : *p++)
+			;
+		/*
+		 * The number of components we need to strip is now
+		 * the total minus the components to be left (Plus one
+		 * because we count the number of '/', but the number
+		 * of components is one more than the no of '/').
+		 */
+		len = i + len + 1;
+	}
+	return len;
+}
+
 static const char *lstrip_ref_components(const char *refname, int len)
 {
-	long remaining = len;
+	int remaining = normalize_component_count(refname, len);
 	const char *start = xstrdup(refname);
 	const char *to_free = start;
 
-	if (len < 0) {
-		int i;
-		const char *p = refname;
-
-		/* Find total no of '/' separated path-components */
-		for (i = 0; p[i]; p[i] == '/' ? i++ : *p++)
-			;
-		/*
-		 * The number of components we need to strip is now
-		 * the total minus the components to be left (Plus one
-		 * because we count the number of '/', but the number
-		 * of components is one more than the no of '/').
-		 */
-		remaining = i + len + 1;
-	}
-
 	while (remaining > 0) {
 		switch (*start++) {
 		case '\0':
@@ -2213,26 +2217,10 @@ static const char *lstrip_ref_components(const char *refname, int len)
 
 static const char *rstrip_ref_components(const char *refname, int len)
 {
-	long remaining = len;
+	int remaining = normalize_component_count(refname, len);
 	const char *start = xstrdup(refname);
 	const char *to_free = start;
 
-	if (len < 0) {
-		int i;
-		const char *p = refname;
-
-		/* Find total no of '/' separated path-components */
-		for (i = 0; p[i]; p[i] == '/' ? i++ : *p++)
-			;
-		/*
-		 * The number of components we need to strip is now
-		 * the total minus the components to be left (Plus one
-		 * because we count the number of '/', but the number
-		 * of components is one more than the no of '/').
-		 */
-		remaining = i + len + 1;
-	}
-
 	while (remaining-- > 0) {
 		char *p = strrchr(start, '/');
 		if (!p) {
-- 
2.53.0.438.gad17e1cd28


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] ref-filter: simplify lstrip_ref_components() memory handling
  2026-02-15  8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
  2026-02-15  9:00   ` [PATCH 1/4] ref-filter: factor out refname component counting Jeff King
@ 2026-02-15  9:02   ` Jeff King
  2026-02-15  9:05   ` [PATCH 3/4] ref-filter: simplify rstrip_ref_components() " Jeff King
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2026-02-15  9:02 UTC (permalink / raw)
  To: Collin Funk; +Cc: git, Junio C Hamano

We're walking forward in the string, skipping path components from
left-to-right. So when we've stripped as much as we want, the pointer we
have is a complete NUL-terminated string and we can just return it
(after duplicating it, of course). So there is no need for a temporary
allocated string.

But we do make an extra temporary copy due to f0062d3b74 (ref-filter:
free item->value and item->value->s, 2018-10-18). This is probably from
cargo-culting the technique used in rstrip_ref_components(), which
_does_ need a separate string (since it is stripping from the end and
ties off the temporary string with a NUL).

Let's drop the extra allocation. This is slightly more efficient, but
more importantly makes the code much simpler.

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 9153331f42..eb09fda21b 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2196,23 +2196,18 @@ static int normalize_component_count(const char *refname, int len)
 static const char *lstrip_ref_components(const char *refname, int len)
 {
 	int remaining = normalize_component_count(refname, len);
-	const char *start = xstrdup(refname);
-	const char *to_free = start;
 
 	while (remaining > 0) {
-		switch (*start++) {
+		switch (*refname++) {
 		case '\0':
-			free((char *)to_free);
 			return xstrdup("");
 		case '/':
 			remaining--;
 			break;
 		}
 	}
 
-	start = xstrdup(start);
-	free((char *)to_free);
-	return start;
+	return xstrdup(refname);
 }
 
 static const char *rstrip_ref_components(const char *refname, int len)
-- 
2.53.0.438.gad17e1cd28


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] ref-filter: simplify rstrip_ref_components() memory handling
  2026-02-15  8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
  2026-02-15  9:00   ` [PATCH 1/4] ref-filter: factor out refname component counting Jeff King
  2026-02-15  9:02   ` [PATCH 2/4] ref-filter: simplify lstrip_ref_components() memory handling Jeff King
@ 2026-02-15  9:05   ` Jeff King
  2026-02-15  9:07   ` [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components() Jeff King
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2026-02-15  9:05 UTC (permalink / raw)
  To: Collin Funk; +Cc: git, Junio C Hamano

We're stripping path components from the end of a string, which we do by
assigning a NUL as we parse each component, shortening the string. This
requires an extra temporary buffer to avoid munging our input string.

But the way that we allocate the buffer is unusual. We have an extra
"to_free" variable. Usually this is used when the access variable is
conceptually const, like:

   const char *foo;
   char *to_free = NULL;

   if (...)
           foo = to_free = xstrdup(...);
   else
           foo = some_const_string;
   ...
   free(to_free);

But that's not what's happening here. Our "start" variable always points
to the allocated buffer, and to_free is redundant. Worse, it is marked
as const itself, requiring a cast when we free it.

Let's drop to_free entirely, and mark "start" as non-const, making the
memory handling more clear. As a bonus, this also silences a warning
from glibc-2.43 that our call to strrchr() implicitly strips away the
const-ness of "start".

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index eb09fda21b..1008b2fd5a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2213,13 +2213,12 @@ static const char *lstrip_ref_components(const char *refname, int len)
 static const char *rstrip_ref_components(const char *refname, int len)
 {
 	int remaining = normalize_component_count(refname, len);
-	const char *start = xstrdup(refname);
-	const char *to_free = start;
+	char *start = xstrdup(refname);
 
 	while (remaining-- > 0) {
 		char *p = strrchr(start, '/');
 		if (!p) {
-			free((char *)to_free);
+			free(start);
 			return xstrdup("");
 		} else
 			p[0] = '\0';
-- 
2.53.0.438.gad17e1cd28


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components()
  2026-02-15  8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
                     ` (2 preceding siblings ...)
  2026-02-15  9:05   ` [PATCH 3/4] ref-filter: simplify rstrip_ref_components() " Jeff King
@ 2026-02-15  9:07   ` Jeff King
  2026-02-16  7:23     ` Patrick Steinhardt
  2026-02-15  9:11   ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
  2026-02-15 22:23   ` Collin Funk
  5 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2026-02-15  9:07 UTC (permalink / raw)
  To: Collin Funk; +Cc: git, Junio C Hamano

To strip path components from our refname string, we repeatedly call
strrchr() to find the trailing slash, shortening the string each time by
assigning NUL over it. This has two downsides:

  1. Calling strrchr() in a loop is quadratic, since each call has to
     call strlen() under the hood to find the end of the string (even
     though we know exactly where it is from the last loop iteration).

  2. We need a temporary buffer, since we're munging the string with NUL
     as we shorten it (which we must do, because strrchr() has no other
     way of knowing what we consider the end of the string).

Using memrchr() would let us fix both of these, but it isn't portable.
So instead, let's just open-code the string traversal from back to
front as we loop.

I doubt that the quadratic nature is a serious concern. You can see it
in practice with something like:

  git init
  git commit --allow-empty -m foo
  echo "$(git rev-parse HEAD) refs/heads$(perl -e 'print "/a" x 500_000')" >.git/packed-refs
  time git for-each-ref --format='%(refname:rstrip=-1)'

That takes ~5.5s to run on my machine before this patch, and ~11ms
after. But I don't think there's a reasonable way for somebody to infect
you with such a garbage ref, as the wire protocol is limited to 64k
pkt-lines. The difference is measurable for me for a 32k-component ref
(about 19ms vs 7ms), so perhaps you could create some chaos by pushing a
lot of them. But we also run into filesystem limits (if the loose
backend is in use), and in practice it seems like there are probably
simpler and more effective ways to waste CPU.

Likewise the extra allocation probably isn't really measurable. In fact,
since our goal is to return an allocated string, we end up having to
make the same allocation anyway (though it is sized to the result,
rather than the input). My main goal was simplicity in avoiding the need
to handle cleaning it up in the early return path.

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 1008b2fd5a..ac32b0e6bb 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2213,17 +2213,15 @@ static const char *lstrip_ref_components(const char *refname, int len)
 static const char *rstrip_ref_components(const char *refname, int len)
 {
 	int remaining = normalize_component_count(refname, len);
-	char *start = xstrdup(refname);
+	const char *end = refname + strlen(refname);
 
-	while (remaining-- > 0) {
-		char *p = strrchr(start, '/');
-		if (!p) {
-			free(start);
+	while (remaining > 0) {
+		if (end == refname)
 			return xstrdup("");
-		} else
-			p[0] = '\0';
+		if (*--end == '/')
+			remaining--;
 	}
-	return start;
+	return xmemdupz(refname, end - refname);
 }
 
 static const char *show_ref(struct refname_atom *atom, const char *refname)
-- 
2.53.0.438.gad17e1cd28

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code
  2026-02-15  8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
                     ` (3 preceding siblings ...)
  2026-02-15  9:07   ` [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components() Jeff King
@ 2026-02-15  9:11   ` Jeff King
  2026-02-15 22:23   ` Collin Funk
  5 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2026-02-15  9:11 UTC (permalink / raw)
  To: Collin Funk; +Cc: git, Junio C Hamano

On Sun, Feb 15, 2026 at 03:57:55AM -0500, Jeff King wrote:

> > That is a bit of a seperate topic than fixing the warning, though, so
> > I left them as-is.
> 
> It is a separate topic, but I feel like this is a good opportunity to
> make this code less horrible. That is, there are some obvious
> low-hanging cleanups that make the code more readable, and as a side
> effect we clean up the const confusion. In such cases I think it is
> worth veering off the path a little.
> 
> I was going to catalog the numerous flaws I found, but by the time I
> explained them, I had basically written patches and commit messages. So
> here is what I would propose instead. I hope I'm not stealing your
> thunder nor knocking us too far off our goal.
> 
> The first three I hope are no-brainers, and the final one fixes the
> glibc const issue. The fourth is perhaps more risky.
> 
>   [1/4]: ref-filter: factor out refname component counting
>   [2/4]: ref-filter: simplify lstrip_ref_components() memory handling
>   [3/4]: ref-filter: simplify rstrip_ref_components() memory handling
>   [4/4]: ref-filter: open-code slash search in rstrip_ref_components()
> 
>  ref-filter.c | 54 +++++++++++++++++-----------------------------------
>  1 file changed, 17 insertions(+), 37 deletions(-)

BTW, you might notice one further opportunity for cleanup: these
functions return an allocated string via a "const char *". But that
issue is endemic to the ref-filter code, courtesy of f0062d3b74
(ref-filter: free item->value and item->value->s, 2018-10-18), and we
should probably look into cleaning it up all at once.

And that crosses my line of "way off topic, let's leave it for another
day". See, I do have _some_ restraint. ;)

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code
  2026-02-15  8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
                     ` (4 preceding siblings ...)
  2026-02-15  9:11   ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
@ 2026-02-15 22:23   ` Collin Funk
  5 siblings, 0 replies; 14+ messages in thread
From: Collin Funk @ 2026-02-15 22:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

Jeff King <peff@peff.net> writes:

> On Fri, Feb 13, 2026 at 09:15:57PM -0800, Collin Funk wrote:
>
>> I generally don't like the casts like in rstrip_ref_components and
>> rstrip_ref_components because they force you to write this:
>> 
>>     free((char *)free_ptr);
>> 
>> And the const doesn't really benefit readability, in my opinion.
>
> Agreed. It is especially egregious in this case because the const
> variable is called to_free, and so its only purpose is to be non-const. ;)
>
>> That is a bit of a seperate topic than fixing the warning, though, so
>> I left them as-is.
>
> It is a separate topic, but I feel like this is a good opportunity to
> make this code less horrible. That is, there are some obvious
> low-hanging cleanups that make the code more readable, and as a side
> effect we clean up the const confusion. In such cases I think it is
> worth veering off the path a little.
>
> I was going to catalog the numerous flaws I found, but by the time I
> explained them, I had basically written patches and commit messages. So
> here is what I would propose instead. I hope I'm not stealing your
> thunder nor knocking us too far off our goal.

No need to worry about stealing my thunder.

> The first three I hope are no-brainers, and the final one fixes the
> glibc const issue. The fourth is perhaps more risky.
>
>   [1/4]: ref-filter: factor out refname component counting
>   [2/4]: ref-filter: simplify lstrip_ref_components() memory handling
>   [3/4]: ref-filter: simplify rstrip_ref_components() memory handling
>   [4/4]: ref-filter: open-code slash search in rstrip_ref_components()

The cleanups look good and certainly make the code more understandable.

Also, I confirm that the last one fixes the glibc-2.43 warning.

Collin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components()
  2026-02-15  9:07   ` [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components() Jeff King
@ 2026-02-16  7:23     ` Patrick Steinhardt
  0 siblings, 0 replies; 14+ messages in thread
From: Patrick Steinhardt @ 2026-02-16  7:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Collin Funk, git, Junio C Hamano

On Sun, Feb 15, 2026 at 04:07:44AM -0500, Jeff King wrote:
> To strip path components from our refname string, we repeatedly call
> strrchr() to find the trailing slash, shortening the string each time by
> assigning NUL over it. This has two downsides:
> 
>   1. Calling strrchr() in a loop is quadratic, since each call has to
>      call strlen() under the hood to find the end of the string (even
>      though we know exactly where it is from the last loop iteration).

Ah, indeed, that's something I missed.

>   2. We need a temporary buffer, since we're munging the string with NUL
>      as we shorten it (which we must do, because strrchr() has no other
>      way of knowing what we consider the end of the string).

Right, upon reading the preceding patch I figured that we can improve
this function even further and avoid the call to `xstrdup()` in the case
where we have less components than we're being asked to strip.

> Using memrchr() would let us fix both of these, but it isn't portable.
> So instead, let's just open-code the string traversal from back to
> front as we loop.
> 
> I doubt that the quadratic nature is a serious concern. You can see it
> in practice with something like:
> 
>   git init
>   git commit --allow-empty -m foo
>   echo "$(git rev-parse HEAD) refs/heads$(perl -e 'print "/a" x 500_000')" >.git/packed-refs
>   time git for-each-ref --format='%(refname:rstrip=-1)'
> 
> That takes ~5.5s to run on my machine before this patch, and ~11ms
> after. But I don't think there's a reasonable way for somebody to infect
> you with such a garbage ref, as the wire protocol is limited to 64k
> pkt-lines. The difference is measurable for me for a 32k-component ref
> (about 19ms vs 7ms), so perhaps you could create some chaos by pushing a
> lot of them. But we also run into filesystem limits (if the loose
> backend is in use), and in practice it seems like there are probably
> simpler and more effective ways to waste CPU.

Agreed, not much of a concern, but good regardless to see it being
addressed.

> Likewise the extra allocation probably isn't really measurable. In fact,
> since our goal is to return an allocated string, we end up having to
> make the same allocation anyway (though it is sized to the result,
> rather than the input). My main goal was simplicity in avoiding the need
> to handle cleaning it up in the early return path.

Likewise.

> diff --git a/ref-filter.c b/ref-filter.c
> index 1008b2fd5a..ac32b0e6bb 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2213,17 +2213,15 @@ static const char *lstrip_ref_components(const char *refname, int len)
>  static const char *rstrip_ref_components(const char *refname, int len)
>  {
>  	int remaining = normalize_component_count(refname, len);
> -	char *start = xstrdup(refname);
> +	const char *end = refname + strlen(refname);
>  
> -	while (remaining-- > 0) {
> -		char *p = strrchr(start, '/');
> -		if (!p) {
> -			free(start);
> +	while (remaining > 0) {
> +		if (end == refname)
>  			return xstrdup("");
> -		} else
> -			p[0] = '\0';
> +		if (*--end == '/')
> +			remaining--;

We start scannign from the trailing NUL byte, so this would also cause
us to detect if the refname had "/" as a suffix. But I assume that's a
case we don't even need to care about, as refs cannot end with a slash
anyway.

Another edge case is if we were passed the empty string, but as we
already abort in case we see that `end == refname` we're good there,
too.

>  	}
> -	return start;
> +	return xmemdupz(refname, end - refname);
>  }

So overall this and all the preceding patches look good to me. Thanks!

Patrick

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ref-filter: factor out refname component counting
  2026-02-15  9:00   ` [PATCH 1/4] ref-filter: factor out refname component counting Jeff King
@ 2026-02-17 18:07     ` Junio C Hamano
  2026-02-19 11:21       ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2026-02-17 18:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Collin Funk, git

Jeff King <peff@peff.net> writes:

> +	if (len < 0) {
> +		int i;
> +		const char *p = refname;
> +
> +		/* Find total no of '/' separated path-components */
> +		for (i = 0; p[i]; p[i] == '/' ? i++ : *p++)
> +			;

Sorry, but I have no idea what this loop (copied verbatim from the
original) is trying to do.

We start at the beginning of the refname string, and while we are in
the leading run of '/' we increment i to find the end of that
run. E.g., we start with refname="///foo", p points at the leftmost
'/', i runs from 0 to 3 at which point p[i] points at the first
non-'/' character, at which point we do *p++, to make p point at the
second slash?  Is the dereferencing of the pointer in *p++ a no-op
that is there only to confuse readers?

And then p moves to the right until p[i] points at the end of the
string.  It does count the number of slashes in 'i', but there is no
satisfying simple answer to this question: "what does p mean while
this loop runs?".

Anyway, the conversion looks very faithful to the original.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ref-filter: factor out refname component counting
  2026-02-17 18:07     ` Junio C Hamano
@ 2026-02-19 11:21       ` Jeff King
  2026-02-19 18:56         ` Junio C Hamano
  2026-02-22 17:04         ` [PATCH 1/4] ref-filter: factor out refname " Karthik Nayak
  0 siblings, 2 replies; 14+ messages in thread
From: Jeff King @ 2026-02-19 11:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, Collin Funk, git

On Tue, Feb 17, 2026 at 10:07:29AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +	if (len < 0) {
> > +		int i;
> > +		const char *p = refname;
> > +
> > +		/* Find total no of '/' separated path-components */
> > +		for (i = 0; p[i]; p[i] == '/' ? i++ : *p++)
> > +			;
> 
> Sorry, but I have no idea what this loop (copied verbatim from the
> original) is trying to do.
> 
> We start at the beginning of the refname string, and while we are in
> the leading run of '/' we increment i to find the end of that
> run. E.g., we start with refname="///foo", p points at the leftmost
> '/', i runs from 0 to 3 at which point p[i] points at the first
> non-'/' character, at which point we do *p++, to make p point at the
> second slash?  Is the dereferencing of the pointer in *p++ a no-op
> that is there only to confuse readers?
> 
> And then p moves to the right until p[i] points at the end of the
> string.  It does count the number of slashes in 'i', but there is no
> satisfying simple answer to this question: "what does p mean while
> this loop runs?".
> 
> Anyway, the conversion looks very faithful to the original.

Heh, I missed your message initially but was independently staring at
this because Coverity complained that the dereference in "*p++" is
useless. Which is...kind of right. It is a void context, so the
dereferenced char goes nowhere and it is a noop. But if you don't do it,
then gcc complains that the two sides of the ternary have mis-matched
types (an int and a pointer). Which is true, but since nobody looks at
the result, it does not matter.

Writing it like:

  int i = 0;
  while (p[i]) {
	if (p[i] == '/')
		i++;
	else
		p++;
  }

perhaps resolves the syntactic confusion. Leaving only the semantic
confusion. ;)

I guess the thinking was that "p+i" represents the traversal, with "i"
encoding the counted slashes (so we must increment _one_ of them each
time). But I cannot fathom how that is easier than counting the slashes
like:

  int slashes = 0;
  for (p = refname; *p; p++) {
	if (*p == '/')
		slashes++;
  }

Which made me wonder if I am missing some corner case, and it is not
just counting slashes. But it must be, because "i" is never incremented
except when we see a slash.

+cc Karthik, the original author, for any wisdom, but the commit is now
almost 10 years old.

Is it worth rewriting to the "slashes" form above for clarity? I was
afraid to touch it just to shut up Coverity, but now we have two
confused people.

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ref-filter: factor out refname component counting
  2026-02-19 11:21       ` Jeff King
@ 2026-02-19 18:56         ` Junio C Hamano
  2026-02-20  6:00           ` [PATCH] ref-filter: clarify lstrip/rstrip " Jeff King
  2026-02-22 17:04         ` [PATCH 1/4] ref-filter: factor out refname " Karthik Nayak
  1 sibling, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2026-02-19 18:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Karthik Nayak, Collin Funk, git

Jeff King <peff@peff.net> writes:

>> And then p moves to the right until p[i] points at the end of the
>> string.  It does count the number of slashes in 'i', but there is no
>> satisfying simple answer to this question: "what does p mean while
>> this loop runs?".
>> ...
> Which made me wonder if I am missing some corner case, and it is not
> just counting slashes. But it must be, because "i" is never incremented
> except when we see a slash.
>
> +cc Karthik, the original author, for any wisdom, but the commit is now
> almost 10 years old.
>
> Is it worth rewriting to the "slashes" form above for clarity? I was
> afraid to touch it just to shut up Coverity, but now we have two
> confused people.

Yup, I think the answer to my "what does p mean?" question is "by
itself p has *no* meaning, but (p-refname) is maintained to be the
number of non-slash bytes we scanned so far, while i is the number
of slashes."

And from that point of view, your "count slashes in the most stupid
way that even 5 year old understands" certainly does make the result
far easier to read.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ref-filter: clarify lstrip/rstrip component counting
  2026-02-19 18:56         ` Junio C Hamano
@ 2026-02-20  6:00           ` Jeff King
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2026-02-20  6:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, Collin Funk, git

On Thu, Feb 19, 2026 at 10:56:53AM -0800, Junio C Hamano wrote:

> > Is it worth rewriting to the "slashes" form above for clarity? I was
> > afraid to touch it just to shut up Coverity, but now we have two
> > confused people.
> 
> Yup, I think the answer to my "what does p mean?" question is "by
> itself p has *no* meaning, but (p-refname) is maintained to be the
> number of non-slash bytes we scanned so far, while i is the number
> of slashes."
> 
> And from that point of view, your "count slashes in the most stupid
> way that even 5 year old understands" certainly does make the result
> far easier to read.

Here it is in patch form. Probably not worth as many words as I wrote in
the commit message, but most of it is just summarizing our earlier
findings.

I do notice that this function may not do what we want for
"/absolute/ref/name" or for "refs//with//double//slashes". But I don't
think it should see either of those, as it would always get normalized
refnames from Git itself. So I think we can ignore it for now.

-- >8 --
Subject: [PATCH] ref-filter: clarify lstrip/rstrip component counting

When a strip option to the %(refname) placeholder is asked to leave N
path components, we first count up the path components to know how many
to remove. That happens with a loop like this:

	/* Find total no of '/' separated path-components */
	for (i = 0; p[i]; p[i] == '/' ? i++ : *p++)
		;

which is a little hard to understand for two reasons.

First, the dereference in "*p++" is seemingly useless, since nobody
looks at the result. And static analyzers like Coverity will complain
about that. But removing the "*" will cause gcc to complain with
-Wint-conversion, since the two sides of the ternary do not match (one
is a pointer and the other an int).

Second, it is not clear what the meaning of "p" is at each iteration of
the loop, as its position with respect to our walk over the string
depends on how many slashes we've seen. The answer is that by itself, it
doesn't really mean anything: "p + i" represents the current state of
our walk, with "i" counting up slashes, and "p" by itself essentially
meaningless.

None of this behaves incorrectly, but ultimately the loop is just
counting the slashes in the refname. We can do that much more simply
with a for-loop iterating over the string and a separate slash counter.

We can also drop the comment, which is somewhat misleading. We are
counting slashes, not components (and a comment later in the function
makes it clear that we must add one to compensate). In the new code it
is obvious that we are counting slashes here.

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index ac32b0e6bb..6bbb6fac18 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2176,19 +2176,20 @@ static inline char *copy_advance(char *dst, const char *src)
 static int normalize_component_count(const char *refname, int len)
 {
 	if (len < 0) {
-		int i;
-		const char *p = refname;
+		int slashes = 0;
+
+		for (const char *p = refname; *p; p++) {
+			if (*p == '/')
+				slashes++;
+		}
 
-		/* Find total no of '/' separated path-components */
-		for (i = 0; p[i]; p[i] == '/' ? i++ : *p++)
-			;
 		/*
 		 * The number of components we need to strip is now
 		 * the total minus the components to be left (Plus one
 		 * because we count the number of '/', but the number
 		 * of components is one more than the no of '/').
 		 */
-		len = i + len + 1;
+		len = slashes + len + 1;
 	}
 	return len;
 }
-- 
2.53.0.528.g678f28d038


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ref-filter: factor out refname component counting
  2026-02-19 11:21       ` Jeff King
  2026-02-19 18:56         ` Junio C Hamano
@ 2026-02-22 17:04         ` Karthik Nayak
  1 sibling, 0 replies; 14+ messages in thread
From: Karthik Nayak @ 2026-02-22 17:04 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Collin Funk, git

[-- Attachment #1: Type: text/plain, Size: 2734 bytes --]

Jeff King <peff@peff.net> writes:

> On Tue, Feb 17, 2026 at 10:07:29AM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> > +	if (len < 0) {
>> > +		int i;
>> > +		const char *p = refname;
>> > +
>> > +		/* Find total no of '/' separated path-components */
>> > +		for (i = 0; p[i]; p[i] == '/' ? i++ : *p++)
>> > +			;
>>
>> Sorry, but I have no idea what this loop (copied verbatim from the
>> original) is trying to do.
>>
>> We start at the beginning of the refname string, and while we are in
>> the leading run of '/' we increment i to find the end of that
>> run. E.g., we start with refname="///foo", p points at the leftmost
>> '/', i runs from 0 to 3 at which point p[i] points at the first
>> non-'/' character, at which point we do *p++, to make p point at the
>> second slash?  Is the dereferencing of the pointer in *p++ a no-op
>> that is there only to confuse readers?
>>
>> And then p moves to the right until p[i] points at the end of the
>> string.  It does count the number of slashes in 'i', but there is no
>> satisfying simple answer to this question: "what does p mean while
>> this loop runs?".
>>
>> Anyway, the conversion looks very faithful to the original.
>
> Heh, I missed your message initially but was independently staring at
> this because Coverity complained that the dereference in "*p++" is
> useless. Which is...kind of right. It is a void context, so the
> dereferenced char goes nowhere and it is a noop. But if you don't do it,
> then gcc complains that the two sides of the ternary have mis-matched
> types (an int and a pointer). Which is true, but since nobody looks at
> the result, it does not matter.
>
> Writing it like:
>
>   int i = 0;
>   while (p[i]) {
> 	if (p[i] == '/')
> 		i++;
> 	else
> 		p++;
>   }
>
> perhaps resolves the syntactic confusion. Leaving only the semantic
> confusion. ;)
>
> I guess the thinking was that "p+i" represents the traversal, with "i"
> encoding the counted slashes (so we must increment _one_ of them each
> time). But I cannot fathom how that is easier than counting the slashes
> like:
>
>   int slashes = 0;
>   for (p = refname; *p; p++) {
> 	if (*p == '/')
> 		slashes++;
>   }
>
> Which made me wonder if I am missing some corner case, and it is not
> just counting slashes. But it must be, because "i" is never incremented
> except when we see a slash.
>
> +cc Karthik, the original author, for any wisdom, but the commit is now
> almost 10 years old.
>

I'm embarrassed and frankly don't remember this code :) Your new patch
looks sensible to me.

> Is it worth rewriting to the "slashes" form above for clarity? I was
> afraid to touch it just to shut up Coverity, but now we have two
> confused people.
>
> -Peff

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-02-22 17:04 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-14  5:15 [PATCH] ref-filter: don't declare a strdup'd variable const before writing to it Collin Funk
2026-02-15  8:57 ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
2026-02-15  9:00   ` [PATCH 1/4] ref-filter: factor out refname component counting Jeff King
2026-02-17 18:07     ` Junio C Hamano
2026-02-19 11:21       ` Jeff King
2026-02-19 18:56         ` Junio C Hamano
2026-02-20  6:00           ` [PATCH] ref-filter: clarify lstrip/rstrip " Jeff King
2026-02-22 17:04         ` [PATCH 1/4] ref-filter: factor out refname " Karthik Nayak
2026-02-15  9:02   ` [PATCH 2/4] ref-filter: simplify lstrip_ref_components() memory handling Jeff King
2026-02-15  9:05   ` [PATCH 3/4] ref-filter: simplify rstrip_ref_components() " Jeff King
2026-02-15  9:07   ` [PATCH 4/4] ref-filter: avoid strrchr() in rstrip_ref_components() Jeff King
2026-02-16  7:23     ` Patrick Steinhardt
2026-02-15  9:11   ` [PATCH 0/4] cleaning up ref-filter lstrip/rstrip code Jeff King
2026-02-15 22:23   ` Collin Funk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox