* [PATCH] ref-filter: fix stale parsed objects
@ 2025-11-04 14:36 Patrick Steinhardt
2025-11-04 15:26 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Patrick Steinhardt @ 2025-11-04 14:36 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano
In 054f5f457e (ref-filter: parse objects on demand, 2025-10-23) we have
started to skip parsing some objects in case we don't need to access
their values in the first place. This was done by introducing a new
member `struct expand_data::maybe_object` that gets populated on demand
via `get_or_parse_object()`.
This has led to a regression though where the object now gets reused
because we don't reset it properly. The `oi` structure is declared in
global scope, and there is no single place where we reset it before
invoking `get_object()`. The consequence is that the `maybe_object`
member doesn't get reset across calls, so subsequent calls will end up
reusing the same object.
This is only an issue for a subset of retrieved values, as not all of
the infrastructure ends up calling `get_or_parse_object()`. So the
effect is limited, which is probably why the issue wasn't detected
earlier.
Fix the issue by resetting `maybe_object` in `get_object()`.
Reported-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
As reported by Junio in <xmqqo6pjt2wn.fsf@gitster.g>. This applies
directly on top of ps/ref-peeled-tags at 054f5f457e (ref-filter: parse
objects on demand, 2025-10-23)
Thanks!
Patrick
---
ref-filter.c | 2 ++
t/t7004-tag.sh | 20 ++++++++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/ref-filter.c b/ref-filter.c
index 7cfcd5c355..d8667c569a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2367,6 +2367,8 @@ static int get_object(struct ref_array_item *ref, int deref,
int eaten = 0;
int ret;
+ oi->maybe_object = NULL;
+
if (oi->info.contentp) {
/* We need to know that to use parse_object_buffer properly */
oi->info.sizep = &oi->size;
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 10835631ca..d1388cfdf4 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -2332,4 +2332,24 @@ test_expect_success 'If tag cannot be created then tag message file is not unlin
test_path_exists .git/TAG_EDITMSG
'
+test_expect_success 'annotated tag version sort' '
+ git tag -a -m "sample 1.0" vsample-1.0 &&
+ git tag -a -m "sample 2.0" vsample-2.0 &&
+ git tag -a -m "sample 10.0" vsample-10.0 &&
+ cat >expect <<-EOF &&
+ vsample-1.0
+ vsample-2.0
+ vsample-10.0
+ EOF
+
+ git tag --list --sort=version:tag vsample-\* >actual &&
+ test_cmp expect actual &&
+
+ # Ensure that we also handle this case alright in the case we have the
+ # peeled values cached e.g. via the packed-refs file.
+ git pack-refs --all &&
+ git tag --list --sort=version:tag vsample-\* &&
+ test_cmp expect actual
+'
+
test_done
---
base-commit: 0b2c6cbdc7ac42a10094102781112b122cbc2b88
change-id: 20251104-b4-pks-ref-filter-fixup-048013766635
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] ref-filter: fix stale parsed objects
2025-11-04 14:36 [PATCH] ref-filter: fix stale parsed objects Patrick Steinhardt
@ 2025-11-04 15:26 ` Junio C Hamano
2025-11-04 18:31 ` Junio C Hamano
2025-11-04 21:33 ` Jeff King
2 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2025-11-04 15:26 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Jeff King
Patrick Steinhardt <ps@pks.im> writes:
> This is only an issue for a subset of retrieved values, as not all of
> the infrastructure ends up calling `get_or_parse_object()`. So the
> effect is limited, which is probably why the issue wasn't detected
> earlier.
>
> Fix the issue by resetting `maybe_object` in `get_object()`.
Yup, I got lucky as --sort=version:tag is one of the things I use
at least once a week.
Will queue on top of that topic and immediately merge down to
'next'.
Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ref-filter: fix stale parsed objects
2025-11-04 14:36 [PATCH] ref-filter: fix stale parsed objects Patrick Steinhardt
2025-11-04 15:26 ` Junio C Hamano
@ 2025-11-04 18:31 ` Junio C Hamano
2025-11-04 21:11 ` Jeff King
2025-11-06 6:04 ` Patrick Steinhardt
2025-11-04 21:33 ` Jeff King
2 siblings, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2025-11-04 18:31 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Jeff King
Patrick Steinhardt <ps@pks.im> writes:
> In 054f5f457e (ref-filter: parse objects on demand, 2025-10-23) we have
> started to skip parsing some objects in case we don't need to access
> their values in the first place. This was done by introducing a new
> member `struct expand_data::maybe_object` that gets populated on demand
> via `get_or_parse_object()`.
>
> This has led to a regression though where the object now gets reused
> because we don't reset it properly. The `oi` structure is declared in
> global scope, and there is no single place where we reset it before
> invoking `get_object()`. The consequence is that the `maybe_object`
> member doesn't get reset across calls, so subsequent calls will end up
> reusing the same object.
>
> This is only an issue for a subset of retrieved values, as not all of
> the infrastructure ends up calling `get_or_parse_object()`. So the
> effect is limited, which is probably why the issue wasn't detected
> earlier.
>
> Fix the issue by resetting `maybe_object` in `get_object()`.
>
> Reported-by: Junio C Hamano <gitster@pobox.com>
> Based-on-patch-by: Jeff King <peff@peff.net>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> As reported by Junio in <xmqqo6pjt2wn.fsf@gitster.g>. This applies
> directly on top of ps/ref-peeled-tags at 054f5f457e (ref-filter: parse
> objects on demand, 2025-10-23)
>
> Thanks!
Thanks. As we stop reusing a stale maybe_object and instead start
parsing the right object when we need to, I wondered if the "on
demand" commit needs a new benchmark, but the example cited in the
message used %(raw) so it would not be affected, I guess.
Queued. Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ref-filter: fix stale parsed objects
2025-11-04 18:31 ` Junio C Hamano
@ 2025-11-04 21:11 ` Jeff King
2025-11-04 22:04 ` Jeff King
2025-11-06 6:04 ` Patrick Steinhardt
1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2025-11-04 21:11 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Patrick Steinhardt, git
On Tue, Nov 04, 2025 at 10:31:14AM -0800, Junio C Hamano wrote:
> > As reported by Junio in <xmqqo6pjt2wn.fsf@gitster.g>. This applies
> > directly on top of ps/ref-peeled-tags at 054f5f457e (ref-filter: parse
> > objects on demand, 2025-10-23)
> >
> > Thanks!
>
> Thanks. As we stop reusing a stale maybe_object and instead start
> parsing the right object when we need to, I wondered if the "on
> demand" commit needs a new benchmark, but the example cited in the
> message used %(raw) so it would not be affected, I guess.
Yeah. The point was to speed up stuff like %(raw) that needs the object
contents but doesn't want to do the parse. So it never looked at
maybe_object in the first place, which is why it got faster.
I actually wonder if there is any other placeholder that benefits at
all. The ref-filter code already tries to avoid doing unneeded work. The
most obvious there is not loading the object at all, which is why stuff
like "%(refname) %(objectname)" is faster than adding in %(raw), which
needs the object contents (but no parsing). And likewise stuff like
%(tag) needs parsing, and thus also triggers loading the object.
So 054f5f457e helps formats which require the object contents but _not_
parsing. I can't think of another placeholder besides %(raw) which would
benefit from that.
What _would_ help more formats is doing the parsing more progressively,
skipping parts that aren't needed. If you just want %(tree), for
example, then:
1. You don't need to look at non-commit objects.
2. You can stop parsing the commit objects after the "tree" line.
But doing that is more involved. So not an argument against the patch,
but just noting its limitations. ;)
I think the parser in pretty.c tries to be a bit more careful here. It
would be nice if we could unify these.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ref-filter: fix stale parsed objects
2025-11-04 14:36 [PATCH] ref-filter: fix stale parsed objects Patrick Steinhardt
2025-11-04 15:26 ` Junio C Hamano
2025-11-04 18:31 ` Junio C Hamano
@ 2025-11-04 21:33 ` Jeff King
2 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2025-11-04 21:33 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano
On Tue, Nov 04, 2025 at 03:36:13PM +0100, Patrick Steinhardt wrote:
> Fix the issue by resetting `maybe_object` in `get_object()`.
Thanks, this spot makes sense looking at the context.
> +test_expect_success 'annotated tag version sort' '
> + git tag -a -m "sample 1.0" vsample-1.0 &&
> + git tag -a -m "sample 2.0" vsample-2.0 &&
> + git tag -a -m "sample 10.0" vsample-10.0 &&
> + cat >expect <<-EOF &&
> + vsample-1.0
> + vsample-2.0
> + vsample-10.0
> + EOF
> +
> + git tag --list --sort=version:tag vsample-\* >actual &&
> + test_cmp expect actual &&
> +
> + # Ensure that we also handle this case alright in the case we have the
> + # peeled values cached e.g. via the packed-refs file.
> + git pack-refs --all &&
> + git tag --list --sort=version:tag vsample-\* &&
> + test_cmp expect actual
> +'
This test seems fine, though I think you can see the same thing even
more easily with just:
git for-each-ref --format='%(refname) %(tag)' refs/tags/vsample-\*
which shows each tag after the first with the same (wrong) tag.
Curiously if I run something similar in git.git like:
git for-each-ref --format='%(refname) %(tag)'
I get garbage uninitialized data on each of the tag lines. The
difference is that the first parsed object is a non-tag, so our stale
parsed state didn't actually fill in the tag values. Surprisingly ASan
doesn't complain, but it may be because we end up looking at memory with
a bogus type-cast (get_or_parse returns the stale "struct commit", but
we cast it to a "struct tag").
I didn't dig too deeply there since the fix here should make it all go
away (and seems to in my testing). I wondered if there was a way for a
broken repo to fool the code here (we think something is a tag, but in
the odb it's really a commit or something), but I don't think so. We
enter grab_tag_values() based on data->type being OBJ_TAG, and then we
feed that same value to parse_object_buffer(). So it will be the same
everywhere, and our cast can never do the wrong thing. And if we see
corruption (e.g., a tag refers to object X as a blob, and then another
tag refers to it as a commit), then lookup_commit() should return NULL
for us.
So I think this fix should be sufficient.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ref-filter: fix stale parsed objects
2025-11-04 21:11 ` Jeff King
@ 2025-11-04 22:04 ` Jeff King
0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2025-11-04 22:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Patrick Steinhardt, git
On Tue, Nov 04, 2025 at 04:11:30PM -0500, Jeff King wrote:
> I actually wonder if there is any other placeholder that benefits at
> all. The ref-filter code already tries to avoid doing unneeded work. The
> most obvious there is not loading the object at all, which is why stuff
> like "%(refname) %(objectname)" is faster than adding in %(raw), which
> needs the object contents (but no parsing). And likewise stuff like
> %(tag) needs parsing, and thus also triggers loading the object.
>
> So 054f5f457e helps formats which require the object contents but _not_
> parsing. I can't think of another placeholder besides %(raw) which would
> benefit from that.
BTW, the one other oddity I aw while looking at this is %(describe). It
asks for SOURCE_OBJ, which causes ref-filter to load the object. But we
don't need it! We're just going to call out to git-describe.
But just changing that SOURCE_ flag is not enough, since we call
grab_describe_values() from deep inside grab_values(), which we do after
getting the object content. We'd have to move that call further up, but
taking care to handle both deref=0 and deref=1 cases. Or maybe not? Does
git-describe always return the same answer when describing a tag versus
what it points to? In which case it should be more like how the
AHEAD_BEHIND atom works.
At any rate, I was sufficiently grossed out by the ref-filter code that
I stopped poking at it. Loading the objects is an unnecessary
inefficiency, but compared to spawning a separate git-describe process,
it is probably peanuts. So somebody who cares more can dig further if
they want.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ref-filter: fix stale parsed objects
2025-11-04 18:31 ` Junio C Hamano
2025-11-04 21:11 ` Jeff King
@ 2025-11-06 6:04 ` Patrick Steinhardt
1 sibling, 0 replies; 7+ messages in thread
From: Patrick Steinhardt @ 2025-11-06 6:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jeff King
On Tue, Nov 04, 2025 at 10:31:14AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > In 054f5f457e (ref-filter: parse objects on demand, 2025-10-23) we have
> > started to skip parsing some objects in case we don't need to access
> > their values in the first place. This was done by introducing a new
> > member `struct expand_data::maybe_object` that gets populated on demand
> > via `get_or_parse_object()`.
> >
> > This has led to a regression though where the object now gets reused
> > because we don't reset it properly. The `oi` structure is declared in
> > global scope, and there is no single place where we reset it before
> > invoking `get_object()`. The consequence is that the `maybe_object`
> > member doesn't get reset across calls, so subsequent calls will end up
> > reusing the same object.
> >
> > This is only an issue for a subset of retrieved values, as not all of
> > the infrastructure ends up calling `get_or_parse_object()`. So the
> > effect is limited, which is probably why the issue wasn't detected
> > earlier.
> >
> > Fix the issue by resetting `maybe_object` in `get_object()`.
> >
> > Reported-by: Junio C Hamano <gitster@pobox.com>
> > Based-on-patch-by: Jeff King <peff@peff.net>
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> > As reported by Junio in <xmqqo6pjt2wn.fsf@gitster.g>. This applies
> > directly on top of ps/ref-peeled-tags at 054f5f457e (ref-filter: parse
> > objects on demand, 2025-10-23)
> >
> > Thanks!
>
> Thanks. As we stop reusing a stale maybe_object and instead start
> parsing the right object when we need to, I wondered if the "on
> demand" commit needs a new benchmark, but the example cited in the
> message used %(raw) so it would not be affected, I guess.
I just did another benchmark, and relative numbers still look the same
as in the original one:
Benchmark 1: for-each-ref (revision = a29e2e8fe7e3935e23d2a03dc429cc9c2e68bfbe~)
Time (mean ± σ): 369.6 ms ± 0.5 ms [User: 311.9 ms, System: 56.3 ms]
Range (min … max): 368.7 ms … 370.1 ms 10 runs
Benchmark 2: for-each-ref (revision = a29e2e8fe7e3935e23d2a03dc429cc9c2e68bfbe)
Time (mean ± σ): 327.9 ms ± 0.5 ms [User: 279.9 ms, System: 46.6 ms]
Range (min … max): 327.3 ms … 328.8 ms 10 runs
Summary
for-each-ref (revision = a29e2e8fe7e3935e23d2a03dc429cc9c2e68bfbe) ran
1.13 ± 0.00 times faster than for-each-ref (revision = a29e2e8fe7e3935e23d2a03dc429cc9c2e68bfbe~)
Thanks!
Patrick
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-11-06 6:04 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-04 14:36 [PATCH] ref-filter: fix stale parsed objects Patrick Steinhardt
2025-11-04 15:26 ` Junio C Hamano
2025-11-04 18:31 ` Junio C Hamano
2025-11-04 21:11 ` Jeff King
2025-11-04 22:04 ` Jeff King
2025-11-06 6:04 ` Patrick Steinhardt
2025-11-04 21:33 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).