* [PATCH] ref-filter: handle nested tags in --points-at option @ 2023-07-01 20:57 Jan Klötzke 2023-07-02 12:56 ` Jeff King 0 siblings, 1 reply; 15+ messages in thread From: Jan Klötzke @ 2023-07-01 20:57 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller Tags are dereferenced until reaching a different object type to handle nested tags, e.g. on checkout. In contrast, "git tag --points-at=..." fails to list such nested tags because only one level of indirection is obtained in filter_refs(). Implement the recursive dereferencing for the "--points-at" option when filtering refs to unify the behaviour. Signed-off-by: Jan Klötzke <jan@kloetzke.net> --- ref-filter.c | 16 +++++++--------- t/t6302-for-each-ref-filter.sh | 2 ++ 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/ref-filter.c b/ref-filter.c index e0d03a9f8e..ad7f244414 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -2211,10 +2211,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter, * of oids. If the given ref is a tag, check if the given tag points * at one of the oids in the given oid array. * NEEDSWORK: - * 1. Only a single level of indirection is obtained, we might want to - * change this to account for multiple levels (e.g. annotated tags - * pointing to annotated tags pointing to a commit.) - * 2. As the refs are cached we might know what refname peels to without + * As the refs are cached we might know what refname peels to without * the need to parse the object via parse_object(). peel_ref() might be a * more efficient alternative to obtain the pointee. */ @@ -2222,18 +2219,19 @@ static const struct object_id *match_points_at(struct oid_array *points_at, const struct object_id *oid, const char *refname) { - const struct object_id *tagged_oid = NULL; struct object *obj; if (oid_array_lookup(points_at, oid) >= 0) return oid; obj = parse_object(the_repository, oid); + while (obj && obj->type == OBJ_TAG) { + oid = get_tagged_oid((struct tag *)obj); + if (oid_array_lookup(points_at, oid) >= 0) + return oid; + obj = parse_object(the_repository, oid); + } if (!obj) die(_("malformed object at '%s'"), refname); - if (obj->type == OBJ_TAG) - tagged_oid = get_tagged_oid((struct tag *)obj); - if (tagged_oid && oid_array_lookup(points_at, tagged_oid) >= 0) - return tagged_oid; return NULL; } diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh index 1ce5f490e9..af223e44d6 100755 --- a/t/t6302-for-each-ref-filter.sh +++ b/t/t6302-for-each-ref-filter.sh @@ -45,6 +45,8 @@ test_expect_success 'check signed tags with --points-at' ' sed -e "s/Z$//" >expect <<-\EOF && refs/heads/side Z refs/tags/annotated-tag four + refs/tags/doubly-annotated-tag An annotated tag + refs/tags/doubly-signed-tag A signed tag refs/tags/four Z refs/tags/signed-tag four EOF -- 2.39.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-01 20:57 [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke @ 2023-07-02 12:56 ` Jeff King 2023-07-02 16:25 ` René Scharfe ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Jeff King @ 2023-07-02 12:56 UTC (permalink / raw) To: Jan Klötzke Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller On Sat, Jul 01, 2023 at 10:57:02PM +0200, Jan Klötzke wrote: > Tags are dereferenced until reaching a different object type to handle > nested tags, e.g. on checkout. In contrast, "git tag --points-at=..." > fails to list such nested tags because only one level of indirection is > obtained in filter_refs(). Implement the recursive dereferencing for the > "--points-at" option when filtering refs to unify the behaviour. That seems reasonable to me. It is changing the definition of --points-at slightly, but I think in a way that should be less surprising to users (i.e., we can consider the old behavior a bug). The existing documentation is sufficiently vague about "points" that I don't think it needs to be updated (though arguably we could improve that here, too). Note that most other tag-peeling in Git (like the peeled values returned by upload-pack) errs in the opposite direction: they peel completely, and don't show the intermediate values. We _could_ switch to that here, but I think it would be a behavior regression (but see below on why we might entertain the thought). My biggest question would be whether this introduces any performance penalty for the more common cases (lightweight tags and single-level annotated tags). The answer is "no", I think; we are already paying the cost to parse every object to find out if it's a tag, and your new loop only does an extra parse if we see a tag-of-tag. Good. Let me go off on a tangent here, since I'm looking at the performance of this function. The current code is already rather pessimal here, as we could probably avoid parsing non-tags entirely. Some strategies there are: 1. We could check the object type via oid_object_info() before parsing. This carries a small penalty (two lookups) for tags but a big win (avoiding loading the object contents) for non-tags. An easy way to do this is to replace the parse_object() with parse_object_with_flags(PARSE_OBJECT_SKIP_HASH_CHECK), which tries to avoid loading object contents (especially using the commit-graph for commits, which presumably covers most non-tag refs). 2. We could be using the peel_iterated_oid() interface (this is the peel_ref() thing mentioned in the comment you touched, but it has since been renamed). But it does the "peel all the way" thing mentioned above (both because of its interface, but also because that's what the packed-refs peel lines store). So to do that we'd either have to enhance the packed-refs store (which would not be too hard to do in a backwards-compatible way), or switch --points-at to only match either the direct ref value or the fully-peeled value. I don't think either of those is something your patch needs to deal with. It is not making these kinds of optimizations any harder (it is the existing "peel only once" behavior that does so). I mostly wanted to get it written down while we are all looking at this function. > diff --git a/ref-filter.c b/ref-filter.c > index e0d03a9f8e..ad7f244414 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -2211,10 +2211,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter, > * of oids. If the given ref is a tag, check if the given tag points > * at one of the oids in the given oid array. > * NEEDSWORK: > - * 1. Only a single level of indirection is obtained, we might want to > - * change this to account for multiple levels (e.g. annotated tags > - * pointing to annotated tags pointing to a commit.) > - * 2. As the refs are cached we might know what refname peels to without > + * As the refs are cached we might know what refname peels to without > * the need to parse the object via parse_object(). peel_ref() might be a > * more efficient alternative to obtain the pointee. > */ Great, thanks for cleaning up this comment. > @@ -2222,18 +2219,19 @@ static const struct object_id *match_points_at(struct oid_array *points_at, > const struct object_id *oid, > const char *refname) > { > - const struct object_id *tagged_oid = NULL; > struct object *obj; > > if (oid_array_lookup(points_at, oid) >= 0) > return oid; > obj = parse_object(the_repository, oid); > + while (obj && obj->type == OBJ_TAG) { > + oid = get_tagged_oid((struct tag *)obj); > + if (oid_array_lookup(points_at, oid) >= 0) > + return oid; > + obj = parse_object(the_repository, oid); > + } OK, so we are doing the usual peeling loop here. I wondered if we might be able to use peel_object(), but it again suffers from the "peel all the way" syndrome. So we have to loop ourselves so that we can check at each level. Good. > if (!obj) > die(_("malformed object at '%s'"), refname); This will now trigger if refname points to a broken object, or if its tag does. I think the resulting message is OK in either case (and presumably lower level code would produce extra error messages, too). > - if (obj->type == OBJ_TAG) > - tagged_oid = get_tagged_oid((struct tag *)obj); > - if (tagged_oid && oid_array_lookup(points_at, tagged_oid) >= 0) > - return tagged_oid; This code is moved into the loop body, but your version there drops the "if (tagged_oid)" check. I think that is OK (and even preferable), though. In get_tagged_oid() we will die() if the tagged object is NULL (though even before switching to that function this check was questionable, because it is "tag->tagged" that may be NULL, and we were dereferencing that unconditionally). So the code looks good. > diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh > index 1ce5f490e9..af223e44d6 100755 > --- a/t/t6302-for-each-ref-filter.sh > +++ b/t/t6302-for-each-ref-filter.sh > @@ -45,6 +45,8 @@ test_expect_success 'check signed tags with --points-at' ' > sed -e "s/Z$//" >expect <<-\EOF && > refs/heads/side Z > refs/tags/annotated-tag four > + refs/tags/doubly-annotated-tag An annotated tag > + refs/tags/doubly-signed-tag A signed tag > refs/tags/four Z > refs/tags/signed-tag four > EOF And the test looks good, too. It is nice that we can rely on the existing setup for the doubly-* tags. Thanks for an easy-to-review patch. :) -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-02 12:56 ` Jeff King @ 2023-07-02 16:25 ` René Scharfe 2023-07-02 20:27 ` Jeff King 2023-07-02 22:02 ` Jeff King 2023-07-05 6:10 ` Junio C Hamano 2 siblings, 1 reply; 15+ messages in thread From: René Scharfe @ 2023-07-02 16:25 UTC (permalink / raw) To: Jeff King, Jan Klötzke Cc: git, Junio C Hamano, Steve Kemp, Stefan Beller Am 02.07.23 um 14:56 schrieb Jeff King: > On Sat, Jul 01, 2023 at 10:57:02PM +0200, Jan Klötzke wrote: > >> @@ -2222,18 +2219,19 @@ static const struct object_id *match_points_at(struct oid_array *points_at, >> const struct object_id *oid, >> const char *refname) >> { >> - const struct object_id *tagged_oid = NULL; >> struct object *obj; >> >> if (oid_array_lookup(points_at, oid) >= 0) >> return oid; >> obj = parse_object(the_repository, oid); >> + while (obj && obj->type == OBJ_TAG) { >> + oid = get_tagged_oid((struct tag *)obj); >> + if (oid_array_lookup(points_at, oid) >= 0) >> + return oid; >> + obj = parse_object(the_repository, oid); >> + } > > OK, so we are doing the usual peeling loop here. I wondered if we might > be able to use peel_object(), but it again suffers from the "peel all > the way" syndrome. So we have to loop ourselves so that we can check at > each level. Good. > >> if (!obj) >> die(_("malformed object at '%s'"), refname); > > This will now trigger if refname points to a broken object, or if its > tag does. I think the resulting message is OK in either case (and > presumably lower level code would produce extra error messages, too). > >> - if (obj->type == OBJ_TAG) >> - tagged_oid = get_tagged_oid((struct tag *)obj); >> - if (tagged_oid && oid_array_lookup(points_at, tagged_oid) >= 0) >> - return tagged_oid; > > This code is moved into the loop body, but your version there drops the > "if (tagged_oid)" check. I think that is OK (and even preferable), > though. In get_tagged_oid() we will die() if the tagged object is NULL > (though even before switching to that function this check was > questionable, because it is "tag->tagged" that may be NULL, and we were > dereferencing that unconditionally). The check is necessary in the current code because tagged_oid is NULL if obj is not a tag. The new code no longer needs it. > So the code looks good. I agree. René ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-02 16:25 ` René Scharfe @ 2023-07-02 20:27 ` Jeff King 0 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2023-07-02 20:27 UTC (permalink / raw) To: René Scharfe Cc: Jan Klötzke, git, Junio C Hamano, Steve Kemp, Stefan Beller On Sun, Jul 02, 2023 at 06:25:13PM +0200, René Scharfe wrote: > >> - if (obj->type == OBJ_TAG) > >> - tagged_oid = get_tagged_oid((struct tag *)obj); > >> - if (tagged_oid && oid_array_lookup(points_at, tagged_oid) >= 0) > >> - return tagged_oid; > > > > This code is moved into the loop body, but your version there drops the > > "if (tagged_oid)" check. I think that is OK (and even preferable), > > though. In get_tagged_oid() we will die() if the tagged object is NULL > > (though even before switching to that function this check was > > questionable, because it is "tag->tagged" that may be NULL, and we were > > dereferencing that unconditionally). > > The check is necessary in the current code because tagged_oid is NULL if > obj is not a tag. The new code no longer needs it. Oh, right. Probably: if (obj->type == OBJ_TAG) { const struct object_id *tagged_oid = get_tagged_oid((struct tag *)obj); if (oid_array_lookup(points_at, tagged_oid) >= 0) return tagged_oid; } would have been a better way to write the original, but the new one with the loop is better still. ;) I also notice that the function returns a pointer to an oid, even though the sole caller only cares about a boolean result. Not that big a deal, though the memory lifetime of the return value is confusing. We might return "tagged_oid" which points to a struct that will live forever, but we might also return "oid" directly, which points to memory passed in from the caller with a limited lifetime. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-02 12:56 ` Jeff King 2023-07-02 16:25 ` René Scharfe @ 2023-07-02 22:02 ` Jeff King 2023-07-02 22:33 ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King 2023-07-03 20:25 ` [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke 2023-07-05 6:10 ` Junio C Hamano 2 siblings, 2 replies; 15+ messages in thread From: Jeff King @ 2023-07-02 22:02 UTC (permalink / raw) To: Jan Klötzke Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller On Sun, Jul 02, 2023 at 08:56:11AM -0400, Jeff King wrote: > My biggest question would be whether this introduces any performance > penalty for the more common cases (lightweight tags and single-level > annotated tags). The answer is "no", I think; we are already paying the > cost to parse every object to find out if it's a tag, and your new loop > only does an extra parse if we see a tag-of-tag. Good. Reading more carefully, I think this does actually change the performance a bit, because we end up parsing the pointed-to commits, as well. So here's before and after your patch running "git for-each-ref --points-at=HEAD" on linux.git (785 refs, all but 3 are tags): Benchmark 1: ./git.old for-each-ref --points-at=HEAD Time (mean ± σ): 11.4 ms ± 0.2 ms [User: 6.5 ms, System: 4.9 ms] Range (min … max): 11.0 ms … 12.3 ms 239 runs Benchmark 2: ./git.new for-each-ref --points-at=HEAD Time (mean ± σ): 20.6 ms ± 0.5 ms [User: 10.4 ms, System: 10.2 ms] Range (min … max): 19.8 ms … 22.7 ms 133 runs Summary './git.old for-each-ref --points-at=HEAD' ran 1.80 ± 0.06 times faster than './git.new for-each-ref --points-at=HEAD' The absolute numbers are pretty small, but the percent change isn't great. I'll send some patches in a minute that can be applied on top to improve this case, as well as fix the other issues I pointed out in the existing code. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 0/3] a few --points-at optimizations/cleanups 2023-07-02 22:02 ` Jeff King @ 2023-07-02 22:33 ` Jeff King 2023-07-02 22:35 ` [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at() Jeff King ` (2 more replies) 2023-07-03 20:25 ` [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke 1 sibling, 3 replies; 15+ messages in thread From: Jeff King @ 2023-07-02 22:33 UTC (permalink / raw) To: Jan Klötzke Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller On Sun, Jul 02, 2023 at 06:02:43PM -0400, Jeff King wrote: > I'll send some patches in a minute that can be applied on top to > improve this case, as well as fix the other issues I pointed out in the > existing code. Here they are, on top of your patch. [1/3]: ref-filter: avoid parsing tagged objects in match_points_at() [2/3]: ref-filter: avoid parsing non-tags in match_points_at() [3/3]: ref-filter: simplify return type of match_points_at ref-filter.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at() 2023-07-02 22:33 ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King @ 2023-07-02 22:35 ` Jeff King 2023-07-02 22:37 ` [PATCH 2/3] ref-filter: avoid parsing non-tags " Jeff King 2023-07-02 22:38 ` [PATCH 3/3] ref-filter: simplify return type of match_points_at Jeff King 2 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2023-07-02 22:35 UTC (permalink / raw) To: Jan Klötzke Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller When we peel tags to check if they match a --points-at oid, we recursively parse the tagged object to see if it is also a tag. But since the tag itself tells us the type of the object it points to (and even gives us the appropriate object struct via its "tagged" member), we can use that directly. We do still have to make sure to call parse_tag() before looking at each tag. This is redundant for the outermost tag (since we did call parse_object() to find its type), but that's OK; parse_tag() is smart enough to make this a noop when the tag has already been parsed. In my clone of linux.git, with 782 tags (and only 3 non-tags), this yields a significant speedup (bringing us back where we were before the commit before this one started recursively dereferencing tags): Benchmark 1: ./git.old for-each-ref --points-at=HEAD --format="%(refname)" Time (mean ± σ): 20.3 ms ± 0.5 ms [User: 11.1 ms, System: 9.1 ms] Range (min … max): 19.6 ms … 21.5 ms 141 runs Benchmark 2: ./git.new for-each-ref --points-at=HEAD --format="%(refname)" Time (mean ± σ): 11.4 ms ± 0.2 ms [User: 6.3 ms, System: 5.0 ms] Range (min … max): 11.0 ms … 12.2 ms 250 runs Summary './git.new for-each-ref --points-at=HEAD --format="%(refname)"' ran 1.79 ± 0.05 times faster than './git.old for-each-ref --points-at=HEAD --format="%(refname)"' Signed-off-by: Jeff King <peff@peff.net> --- This could optionally be squashed into the original. If we leave it as its own patch, it might be worth replacing "the commit before this one" with the actual oid (once Junio has picked it up and it's stable). ref-filter.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/ref-filter.c b/ref-filter.c index ad7f244414..e091f056ab 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -2225,10 +2225,18 @@ static const struct object_id *match_points_at(struct oid_array *points_at, return oid; obj = parse_object(the_repository, oid); while (obj && obj->type == OBJ_TAG) { - oid = get_tagged_oid((struct tag *)obj); + struct tag *tag = (struct tag *)obj; + + if (parse_tag(tag) < 0) { + obj = NULL; + break; + } + + oid = get_tagged_oid(tag); if (oid_array_lookup(points_at, oid) >= 0) return oid; - obj = parse_object(the_repository, oid); + + obj = tag->tagged; } if (!obj) die(_("malformed object at '%s'"), refname); -- 2.41.0.586.g3c0cc15bc7 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/3] ref-filter: avoid parsing non-tags in match_points_at() 2023-07-02 22:33 ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King 2023-07-02 22:35 ` [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at() Jeff King @ 2023-07-02 22:37 ` Jeff King 2023-07-02 22:38 ` [PATCH 3/3] ref-filter: simplify return type of match_points_at Jeff King 2 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2023-07-02 22:37 UTC (permalink / raw) To: Jan Klötzke Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller When handling --points-at, we have to try to peel each ref to see if it's a tag that points at a requested oid. We start this process by calling parse_object() on the oid pointed to by each ref. The cost of parsing each object adds up, especially in an output that doesn't otherwise need to open the objects at all. Ideally we'd use peel_iterated_oid() here, which uses the cached information in the packed-refs file. But we can't, because our --points-at must match not only the fully peeled value, but any interim values (so if tag A points to tag B which points to commit C, we should match --points-at=B, but peel_iterated_oid() will only tell us about C). So the best we can do (absent changes to the packed-refs peel traits) is to avoid parsing non-tags. The obvious way to do that is to call oid_object_info() to check the type before parsing. But there are a few gotchas there, like checking if the object has already been parsed. Instead we can just tell parse_object() that we are OK skipping the hash check, which lets it turn on several optimizations. Commits can be loaded via the commit graph (so it's both fast and we have the benefit of the parsed data if we need it later at the output stage). Blobs are not loaded at all. Trees are still loaded, but it's rather rare to have a ref point directly to a tree (and since this is just an optimization, kicking in 99% of the time is OK). Even though we're paying for an extra lookup, the cost to avoid parsing the non-tags is a net benefit. In my git.git repository with 941 tags and 1440 other refs pointing to commits, this significantly cuts the runtime: Benchmark 1: ./git.old for-each-ref --points-at=HEAD Time (mean ± σ): 26.8 ms ± 0.5 ms [User: 24.5 ms, System: 2.2 ms] Range (min … max): 25.9 ms … 29.2 ms 107 runs Benchmark 2: ./git.new for-each-ref --points-at=HEAD Time (mean ± σ): 9.1 ms ± 0.3 ms [User: 6.8 ms, System: 2.2 ms] Range (min … max): 8.6 ms … 10.2 ms 308 runs Summary './git.new for-each-ref --points-at=HEAD' ran 2.96 ± 0.10 times faster than './git.old for-each-ref --points-at=HEAD' In a repository that is mostly annotated tags, we'd expect less improvement (we might still skip a few object loads, but that's balanced by the extra lookups). In my clone of linux.git, which has 782 tags and 3 branches, the run-time is about the same (it's actually ~1% faster on average after this patch, but that's within the run-to-run noise). Signed-off-by: Jeff King <peff@peff.net> --- ref-filter.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/ref-filter.c b/ref-filter.c index e091f056ab..f5ac486430 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -2223,7 +2223,8 @@ static const struct object_id *match_points_at(struct oid_array *points_at, if (oid_array_lookup(points_at, oid) >= 0) return oid; - obj = parse_object(the_repository, oid); + obj = parse_object_with_flags(the_repository, oid, + PARSE_OBJECT_SKIP_HASH_CHECK); while (obj && obj->type == OBJ_TAG) { struct tag *tag = (struct tag *)obj; -- 2.41.0.586.g3c0cc15bc7 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] ref-filter: simplify return type of match_points_at 2023-07-02 22:33 ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King 2023-07-02 22:35 ` [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at() Jeff King 2023-07-02 22:37 ` [PATCH 2/3] ref-filter: avoid parsing non-tags " Jeff King @ 2023-07-02 22:38 ` Jeff King 2 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2023-07-02 22:38 UTC (permalink / raw) To: Jan Klötzke Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller We return the oid that matched, but the sole caller only cares whether we matched anything at all. This is mostly academic, since there's only one caller, but the lifetime of the returned pointer is not immediately clear. Sometimes it points to an oid in a tag struct, which should live forever. And sometimes to the oid passed in, which only lives as long as the each_ref_fn callback we're called from. Simplify this to a boolean return which is more direct and obvious. As a bonus, this lets us avoid the weird pattern of overwriting our "oid" parameter in the loop (since we now only refer to the tagged oid one time, and can just inline the call to get it). Signed-off-by: Jeff King <peff@peff.net> --- ref-filter.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/ref-filter.c b/ref-filter.c index f5ac486430..0629435f08 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -2209,20 +2209,22 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter, /* * Given a ref (oid, refname), check if the ref belongs to the array * of oids. If the given ref is a tag, check if the given tag points - * at one of the oids in the given oid array. + * at one of the oids in the given oid array. Returns non-zero if a + * match is found. + * * NEEDSWORK: * As the refs are cached we might know what refname peels to without * the need to parse the object via parse_object(). peel_ref() might be a * more efficient alternative to obtain the pointee. */ -static const struct object_id *match_points_at(struct oid_array *points_at, - const struct object_id *oid, - const char *refname) +static int match_points_at(struct oid_array *points_at, + const struct object_id *oid, + const char *refname) { struct object *obj; if (oid_array_lookup(points_at, oid) >= 0) - return oid; + return 1; obj = parse_object_with_flags(the_repository, oid, PARSE_OBJECT_SKIP_HASH_CHECK); while (obj && obj->type == OBJ_TAG) { @@ -2233,15 +2235,14 @@ static const struct object_id *match_points_at(struct oid_array *points_at, break; } - oid = get_tagged_oid(tag); - if (oid_array_lookup(points_at, oid) >= 0) - return oid; + if (oid_array_lookup(points_at, get_tagged_oid(tag)) >= 0) + return 1; obj = tag->tagged; } if (!obj) die(_("malformed object at '%s'"), refname); - return NULL; + return 0; } /* -- 2.41.0.586.g3c0cc15bc7 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-02 22:02 ` Jeff King 2023-07-02 22:33 ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King @ 2023-07-03 20:25 ` Jan Klötzke 1 sibling, 0 replies; 15+ messages in thread From: Jan Klötzke @ 2023-07-03 20:25 UTC (permalink / raw) To: Jeff King Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller Am Sun, Jul 02, 2023 at 06:02:43PM -0400 schrieb Jeff King: > On Sun, Jul 02, 2023 at 08:56:11AM -0400, Jeff King wrote: > > > My biggest question would be whether this introduces any performance > > penalty for the more common cases (lightweight tags and single-level > > annotated tags). The answer is "no", I think; we are already paying the > > cost to parse every object to find out if it's a tag, and your new loop > > only does an extra parse if we see a tag-of-tag. Good. > > Reading more carefully, I think this does actually change the > performance a bit, because we end up parsing the pointed-to commits, as > well. So here's before and after your patch running "git for-each-ref > --points-at=HEAD" on linux.git (785 refs, all but 3 are tags): > > Benchmark 1: ./git.old for-each-ref --points-at=HEAD > Time (mean ± σ): 11.4 ms ± 0.2 ms [User: 6.5 ms, System: 4.9 ms] > Range (min … max): 11.0 ms … 12.3 ms 239 runs > > Benchmark 2: ./git.new for-each-ref --points-at=HEAD > Time (mean ± σ): 20.6 ms ± 0.5 ms [User: 10.4 ms, System: 10.2 ms] > Range (min … max): 19.8 ms … 22.7 ms 133 runs > > Summary > './git.old for-each-ref --points-at=HEAD' ran > 1.80 ± 0.06 times faster than './git.new for-each-ref --points-at=HEAD' > > The absolute numbers are pretty small, but the percent change isn't > great. I'll send some patches in a minute that can be applied on top to > improve this case, as well as fix the other issues I pointed out in the > existing code. I have to admit I was not entirely sure about the performance implications. The relative performance drop is indeed substantial. Thanks for the thorough review and swiftly taking care of these! -- Jan ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-02 12:56 ` Jeff King 2023-07-02 16:25 ` René Scharfe 2023-07-02 22:02 ` Jeff King @ 2023-07-05 6:10 ` Junio C Hamano 2023-07-05 12:41 ` Jeff King 2 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2023-07-05 6:10 UTC (permalink / raw) To: Jeff King Cc: Jan Klötzke, git, Steve Kemp, René Scharfe, Stefan Beller Jeff King <peff@peff.net> writes: > That seems reasonable to me. It is changing the definition of > --points-at slightly, but I think in a way that should be less > surprising to users (i.e., we can consider the old behavior a bug). OK. > The > existing documentation is sufficiently vague about "points" that I don't > think it needs to be updated (though arguably we could improve that > here, too). True, too. Having said that, as we lack a primitive exposed to the user script to only peel one level of a tag, it makes me wonder how much of this is a practical issue. Is a tag that points at another tag mere curiosity and subject of mental gymnastics, or is it a useful construct with real world use case? If the latter, isn't it more like "a tag to a tag _could_ be made useful if we also supported X and Y and Z" where X could be "There should be a syntax like A^{commit} that lets us peel only one level of tag"? > Note that most other tag-peeling in Git (like the peeled values returned > by upload-pack) errs in the opposite direction: they peel completely, > and don't show the intermediate values. Exactly. While allowing to feed the intermediate level as the argument to "points at" does sound like a good idea on surface, I am skeptical about the practical value it brings---if we cannot do this without any additional overhead (of course if there are tags to tags in the repository, I am perfectly fine to pay the cost of dereferencing the chain, but I am more worried about regressing the performance in repositories without tags to tags), I am not sure if it is worth it. > My biggest question would be whether this introduces any performance > penalty for the more common cases (lightweight tags and single-level > annotated tags). The answer is "no", I think; we are already paying the > cost to parse every object to find out if it's a tag, and your new loop > only does an extra parse if we see a tag-of-tag. Good. OK. > Let me go off on a tangent here, since I'm looking at the performance > of this function. The current code is already rather pessimal here, as > we could probably avoid parsing non-tags entirely. Some strategies > there are: > > 1. We could check the object type via oid_object_info() before > parsing. This carries a small penalty (two lookups) for tags but > a big win (avoiding loading the object contents) for non-tags. > > An easy way to do this is to replace the parse_object() with > parse_object_with_flags(PARSE_OBJECT_SKIP_HASH_CHECK), which > tries to avoid loading object contents (especially using the > commit-graph for commits, which presumably covers most non-tag > refs). > > 2. We could be using the peel_iterated_oid() interface (this is the > peel_ref() thing mentioned in the comment you touched, but it has > since been renamed). But it does the "peel all the way" thing > mentioned above (both because of its interface, but also because > that's what the packed-refs peel lines store). > > So to do that we'd either have to enhance the packed-refs store > (which would not be too hard to do in a backwards-compatible > way), or switch --points-at to only match either the direct ref > value or the fully-peeled value. > > I don't think either of those is something your patch needs to deal > with. It is not making these kinds of optimizations any harder (it is > the existing "peel only once" behavior that does so). I mostly wanted > to get it written down while we are all looking at this function. My preference would be to see these optimization done first and then add this new loop on top of it. That way, we can measure more easily what kind of additional overhead, if any, we are paying by adding the loop. Thanks. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-05 6:10 ` Junio C Hamano @ 2023-07-05 12:41 ` Jeff King 2023-07-05 17:16 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Jeff King @ 2023-07-05 12:41 UTC (permalink / raw) To: Junio C Hamano Cc: Jan Klötzke, git, Steve Kemp, René Scharfe, Stefan Beller On Tue, Jul 04, 2023 at 11:10:47PM -0700, Junio C Hamano wrote: > My preference would be to see these optimization done first and then > add this new loop on top of it. That way, we can measure more > easily what kind of additional overhead, if any, we are paying by > adding the loop. I ended up doing them on top, rather than before, but I think the size of the impact can easily be seen. The one thing that would actually make us a lot faster (by using the packed-refs peels) is to make full peels the only option, and do not bother letting --points-at match "B" in an A->B->C peel. But that would be removing something that is currently matched (even before the patch in this thread), so I stopped short of it in my optimizations. But even if we decide to do that, Jan's patch is not making anything worse there (in fact, it is making it better, because it is matching "C" which we do not currently match). So I'd be inclined to proceed with the patches I sent earlier, and then if we choose to later refactor again to drop "B", we can. Of course if somebody wants to do that refactor _now_ instead of the patches already posted, I'm OK with that. It just won't be me due to time constraints. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-05 12:41 ` Jeff King @ 2023-07-05 17:16 ` Junio C Hamano 2023-07-05 18:50 ` Jan Klötzke 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2023-07-05 17:16 UTC (permalink / raw) To: Jeff King Cc: Jan Klötzke, git, Steve Kemp, René Scharfe, Stefan Beller Jeff King <peff@peff.net> writes: > On Tue, Jul 04, 2023 at 11:10:47PM -0700, Junio C Hamano wrote: > >> My preference would be to see these optimization done first and then >> add this new loop on top of it. That way, we can measure more >> easily what kind of additional overhead, if any, we are paying by >> adding the loop. > > I ended up doing them on top, rather than before, but I think the size > of the impact can easily be seen. Ah, I should have first read (or at least skimmed) all messages before responding. Thanks for following through. > The one thing that would actually make us a lot faster (by using the > packed-refs peels) is to make full peels the only option, and do not > bother letting --points-at match "B" in an A->B->C peel. But that would > be removing something that is currently matched (even before the patch > in this thread), so I stopped short of it in my optimizations. Interesting. Right now, if I create a 'direct' tag that points directly at HEAD, and then create an 'indirect' tag that points at 'direct', i.e. $ git tag -a -m 'a direct tag to HEAD' direct HEAD $ git tag -a -m 'an indirect tag' indirect direct I would get a piece of advice message that encourages to correct the mistake with "git tag -f indirect direct^{}". Then I ask for tags that point at HEAD, I see only 'direct' and not 'indirect'. Your optimization would start showing both 'direct' and 'indirect' if they are packed. But you are correct to worry about the opposite case. If I ask for tags that point at 'direct', I currently see 'indirect', but of course 'indirect' will not appear as the peeled value of any ref, and the optimized version will stop saying that 'indirect' is a ref that points at 'direct'. That sounds like a regression. > But even > if we decide to do that, Jan's patch is not making anything worse there > (in fact, it is making it better, because it is matching "C" which we do > not currently match). > ... > So I'd be inclined to proceed with the patches I sent earlier, and then > if we choose to later refactor again to drop "B", we can. We generally avoid taking away anything once we give it to users; once the patch under discussion goes in, there is no taking it back, i.e. the new _behaviour_ closes the door to certain optimizations. I do not at all mind to see us decide and declare that it is a good thing to say that not just 'direct' but also 'indirect' points at HEAD, and that 'indirect' points at 'direct' and the patch under discussion makes the world a etter place, and we will not regret that decision. But the time to make such a decision is now, before we give a go-ahead to the patch. Thanks. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-05 17:16 ` Junio C Hamano @ 2023-07-05 18:50 ` Jan Klötzke 2023-07-05 20:15 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Jan Klötzke @ 2023-07-05 18:50 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, git, Steve Kemp, René Scharfe, Stefan Beller Am Wed, Jul 05, 2023 at 10:16:17AM -0700 schrieb Junio C Hamano: > > The one thing that would actually make us a lot faster (by using the > > packed-refs peels) is to make full peels the only option, and do not > > bother letting --points-at match "B" in an A->B->C peel. But that would > > be removing something that is currently matched (even before the patch > > in this thread), so I stopped short of it in my optimizations. > > Interesting. Right now, if I create a 'direct' tag that points > directly at HEAD, and then create an 'indirect' tag that points at > 'direct', i.e. > > $ git tag -a -m 'a direct tag to HEAD' direct HEAD > $ git tag -a -m 'an indirect tag' indirect direct > > I would get a piece of advice message that encourages to correct the > mistake with "git tag -f indirect direct^{}". Then I ask for tags > that point at HEAD, I see only 'direct' and not 'indirect'. Your > optimization would start showing both 'direct' and 'indirect' if > they are packed. But you are correct to worry about the opposite > case. If I ask for tags that point at 'direct', I currently see > 'indirect', but of course 'indirect' will not appear as the peeled > value of any ref, and the optimized version will stop saying that > 'indirect' is a ref that points at 'direct'. That sounds like a > regression. I agree. I purposely let the loop run through all the tags in the chain in my inital patch to keep the existing behaviour. Unless there is a compelling reason I would suggest do keep it this way. I wouldn't rule out that someone relies on being able to query for indirect tags by supplying a tag to the --points-at option. > > But even > > if we decide to do that, Jan's patch is not making anything worse there > > (in fact, it is making it better, because it is matching "C" which we do > > not currently match). > > ... > > So I'd be inclined to proceed with the patches I sent earlier, and then > > if we choose to later refactor again to drop "B", we can. > > We generally avoid taking away anything once we give it to users; > once the patch under discussion goes in, there is no taking it back, > i.e. the new _behaviour_ closes the door to certain optimizations. > > I do not at all mind to see us decide and declare that it is a good > thing to say that not just 'direct' but also 'indirect' points at > HEAD, and that 'indirect' points at 'direct' and the patch under > discussion makes the world a etter place, and we will not regret > that decision. But the time to make such a decision is now, before > we give a go-ahead to the patch. The reasoning why I came up with the patch in the first place was an odd behaviour that was reported to me in [1]. The user had a recipe that checked out a nested tag for a package but the Bob Build Tool thought the user messed with the working copy. Behind the scenes Bob checked out the indirect tag (foobar-3.13.1) successfully. But when the user examined the project state ("bob status"), Bob ran git tag --points-at HEAD in the working copy to get the list of tags for the current HEAD. That did not return the expected tag in the list, so the workspace state was flagged. OTOH, the tag _is_ visible when manually executing "git log" in the workspace. Clearly this is confusing and so was I when I tried to find the root cause. To sum up: $ git checkout nested-tag --> works $ git log --> shows nested tag in decoration $ git tag --points-at HEAD --> does *not* show nested tag AFAICT the nested tag in the mentioned bug report was not created on purpose. I haven't come across any sensible use case for them either. But as most git commands can handle nested tags they should better be supported consistently IMHO. -- Jan [1] https://github.com/BobBuildTool/bob/issues/520 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] ref-filter: handle nested tags in --points-at option 2023-07-05 18:50 ` Jan Klötzke @ 2023-07-05 20:15 ` Junio C Hamano 0 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2023-07-05 20:15 UTC (permalink / raw) To: Jan Klötzke Cc: Jeff King, git, Steve Kemp, René Scharfe, Stefan Beller Jan Klötzke <jan@kloetzke.net> writes: > Am Wed, Jul 05, 2023 at 10:16:17AM -0700 schrieb Junio C Hamano: >> Interesting. Right now, if I create a 'direct' tag that points >> directly at HEAD, and then create an 'indirect' tag that points at >> 'direct', i.e. >> >> $ git tag -a -m 'a direct tag to HEAD' direct HEAD >> $ git tag -a -m 'an indirect tag' indirect direct >> >> I would get a piece of advice message that encourages to correct the >> mistake with "git tag -f indirect direct^{}". > ... > But as most git commands can handle nested tags they should better be > supported consistently IMHO. Do they? Most git commands handle nested tags in only one way: by fully peeling. "git checkout --detach indirect" in the above scenario would handle nested tag "indirect" well, but it is done by making "direct" tag inaccessible when the only thing you have is the "indirect" tag. For example, you cannot create another "indirect" tag that points at "direct" tag with "git tag", with $ git tag -a -m 'another indirect' indirect-2 indirect^{} The resulting tag will be another direct tag to the underlying commit, and not a tag of the "direct" tag. In that sense, --points-at we currently have that only peels once is inconsistent with the others, but --points-at that peels repeatedly and allows the intermediate steps to match is also behaving inconsistently relative to most git commands. Combined with the fact that we seem to discourage such an indirect tag, we should either: (1) declare that indirect tags are not useful, turn the warning advice.nestedTag into a stronger error, devise appropriate transition plan to get rid of nested tag (e.g. eventually making it impossible to use "git tag" to create such a tag and let "git fsck" complain about them), and perhaps change "--points-at" to take only the fully peeled object into account so that optimization based on packed-refs becomes possible. Or (2) declare that indirect tags are useful thing to support, tone down the advice.nestedTag message, and enhance the support of indirect tags, starting with this "--points-at" enhancement. I am inclined to support (2), but then a consistent support would need to eventually include a "peel only a single level" primitive as well. That would be the first step to allow "most git commands" to support nested tags well, as they currently do not. Thanks for working on this. Let's queue it, together will Peff's patches (which I haven't studied fully yet). ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-07-05 20:15 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-01 20:57 [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke 2023-07-02 12:56 ` Jeff King 2023-07-02 16:25 ` René Scharfe 2023-07-02 20:27 ` Jeff King 2023-07-02 22:02 ` Jeff King 2023-07-02 22:33 ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King 2023-07-02 22:35 ` [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at() Jeff King 2023-07-02 22:37 ` [PATCH 2/3] ref-filter: avoid parsing non-tags " Jeff King 2023-07-02 22:38 ` [PATCH 3/3] ref-filter: simplify return type of match_points_at Jeff King 2023-07-03 20:25 ` [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke 2023-07-05 6:10 ` Junio C Hamano 2023-07-05 12:41 ` Jeff King 2023-07-05 17:16 ` Junio C Hamano 2023-07-05 18:50 ` Jan Klötzke 2023-07-05 20:15 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).