[PATCH] ref-filter: handle nested tags in --points-at option

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] ref-filter: handle nested tags in --points-at option
@ 2023-07-01 20:57 Jan Klötzke
  2023-07-02 12:56 ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Klötzke @ 2023-07-01 20:57 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Steve Kemp, René Scharfe,
	Stefan Beller

Tags are dereferenced until reaching a different object type to handle
nested tags, e.g. on checkout. In contrast, "git tag --points-at=..."
fails to list such nested tags because only one level of indirection is
obtained in filter_refs(). Implement the recursive dereferencing for the
"--points-at" option when filtering refs to unify the behaviour.

Signed-off-by: Jan Klötzke <jan@kloetzke.net>
---
 ref-filter.c                   | 16 +++++++---------
 t/t6302-for-each-ref-filter.sh |  2 ++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index e0d03a9f8e..ad7f244414 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2211,10 +2211,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
  * of oids. If the given ref is a tag, check if the given tag points
  * at one of the oids in the given oid array.
  * NEEDSWORK:
- * 1. Only a single level of indirection is obtained, we might want to
- * change this to account for multiple levels (e.g. annotated tags
- * pointing to annotated tags pointing to a commit.)
- * 2. As the refs are cached we might know what refname peels to without
+ * As the refs are cached we might know what refname peels to without
  * the need to parse the object via parse_object(). peel_ref() might be a
  * more efficient alternative to obtain the pointee.
  */
@@ -2222,18 +2219,19 @@ static const struct object_id *match_points_at(struct oid_array *points_at,
 					       const struct object_id *oid,
 					       const char *refname)
 {
-	const struct object_id *tagged_oid = NULL;
 	struct object *obj;
 
 	if (oid_array_lookup(points_at, oid) >= 0)
 		return oid;
 	obj = parse_object(the_repository, oid);
+	while (obj && obj->type == OBJ_TAG) {
+		oid = get_tagged_oid((struct tag *)obj);
+		if (oid_array_lookup(points_at, oid) >= 0)
+			return oid;
+		obj = parse_object(the_repository, oid);
+	}
 	if (!obj)
 		die(_("malformed object at '%s'"), refname);
-	if (obj->type == OBJ_TAG)
-		tagged_oid = get_tagged_oid((struct tag *)obj);
-	if (tagged_oid && oid_array_lookup(points_at, tagged_oid) >= 0)
-		return tagged_oid;
 	return NULL;
 }
 
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index 1ce5f490e9..af223e44d6 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -45,6 +45,8 @@ test_expect_success 'check signed tags with --points-at' '
 	sed -e "s/Z$//" >expect <<-\EOF &&
 	refs/heads/side Z
 	refs/tags/annotated-tag four
+	refs/tags/doubly-annotated-tag An annotated tag
+	refs/tags/doubly-signed-tag A signed tag
 	refs/tags/four Z
 	refs/tags/signed-tag four
 	EOF
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-01 20:57 [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke
@ 2023-07-02 12:56 ` Jeff King
  2023-07-02 16:25   ` René Scharfe
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jeff King @ 2023-07-02 12:56 UTC (permalink / raw)
  To: Jan Klötzke
  Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller

On Sat, Jul 01, 2023 at 10:57:02PM +0200, Jan Klötzke wrote:

> Tags are dereferenced until reaching a different object type to handle
> nested tags, e.g. on checkout. In contrast, "git tag --points-at=..."
> fails to list such nested tags because only one level of indirection is
> obtained in filter_refs(). Implement the recursive dereferencing for the
> "--points-at" option when filtering refs to unify the behaviour.

That seems reasonable to me. It is changing the definition of
--points-at slightly, but I think in a way that should be less
surprising to users (i.e., we can consider the old behavior a bug).  The
existing documentation is sufficiently vague about "points" that I don't
think it needs to be updated (though arguably we could improve that
here, too).

Note that most other tag-peeling in Git (like the peeled values returned
by upload-pack) errs in the opposite direction: they peel completely,
and don't show the intermediate values. We _could_ switch to that here,
but I think it would be a behavior regression (but see below on why we
might entertain the thought).

My biggest question would be whether this introduces any performance
penalty for the more common cases (lightweight tags and single-level
annotated tags). The answer is "no", I think; we are already paying the
cost to parse every object to find out if it's a tag, and your new loop
only does an extra parse if we see a tag-of-tag. Good.

  Let me go off on a tangent here, since I'm looking at the performance
  of this function. The current code is already rather pessimal here, as
  we could probably avoid parsing non-tags entirely. Some strategies
  there are:

    1. We could check the object type via oid_object_info() before
       parsing. This carries a small penalty (two lookups) for tags but
       a big win (avoiding loading the object contents) for non-tags.

       An easy way to do this is to replace the parse_object() with
       parse_object_with_flags(PARSE_OBJECT_SKIP_HASH_CHECK), which
       tries to avoid loading object contents (especially using the
       commit-graph for commits, which presumably covers most non-tag
       refs).

    2. We could be using the peel_iterated_oid() interface (this is the
       peel_ref() thing mentioned in the comment you touched, but it has
       since been renamed). But it does the "peel all the way" thing
       mentioned above (both because of its interface, but also because
       that's what the packed-refs peel lines store).

       So to do that we'd either have to enhance the packed-refs store
       (which would not be too hard to do in a backwards-compatible
       way), or switch --points-at to only match either the direct ref
       value or the fully-peeled value.

  I don't think either of those is something your patch needs to deal
  with. It is not making these kinds of optimizations any harder (it is
  the existing "peel only once" behavior that does so). I mostly wanted
  to get it written down while we are all looking at this function.

> diff --git a/ref-filter.c b/ref-filter.c
> index e0d03a9f8e..ad7f244414 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2211,10 +2211,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>   * of oids. If the given ref is a tag, check if the given tag points
>   * at one of the oids in the given oid array.
>   * NEEDSWORK:
> - * 1. Only a single level of indirection is obtained, we might want to
> - * change this to account for multiple levels (e.g. annotated tags
> - * pointing to annotated tags pointing to a commit.)
> - * 2. As the refs are cached we might know what refname peels to without
> + * As the refs are cached we might know what refname peels to without
>   * the need to parse the object via parse_object(). peel_ref() might be a
>   * more efficient alternative to obtain the pointee.
>   */

Great, thanks for cleaning up this comment.

> @@ -2222,18 +2219,19 @@ static const struct object_id *match_points_at(struct oid_array *points_at,
>  					       const struct object_id *oid,
>  					       const char *refname)
>  {
> -	const struct object_id *tagged_oid = NULL;
>  	struct object *obj;
>  
>  	if (oid_array_lookup(points_at, oid) >= 0)
>  		return oid;
>  	obj = parse_object(the_repository, oid);
> +	while (obj && obj->type == OBJ_TAG) {
> +		oid = get_tagged_oid((struct tag *)obj);
> +		if (oid_array_lookup(points_at, oid) >= 0)
> +			return oid;
> +		obj = parse_object(the_repository, oid);
> +	}

OK, so we are doing the usual peeling loop here. I wondered if we might
be able to use peel_object(), but it again suffers from the "peel all
the way" syndrome. So we have to loop ourselves so that we can check at
each level. Good.

>  	if (!obj)
>  		die(_("malformed object at '%s'"), refname);

This will now trigger if refname points to a broken object, or if its
tag does. I think the resulting message is OK in either case (and
presumably lower level code would produce extra error messages, too).

> -	if (obj->type == OBJ_TAG)
> -		tagged_oid = get_tagged_oid((struct tag *)obj);
> -	if (tagged_oid && oid_array_lookup(points_at, tagged_oid) >= 0)
> -		return tagged_oid;

This code is moved into the loop body, but your version there drops the
"if (tagged_oid)" check. I think that is OK (and even preferable),
though. In get_tagged_oid() we will die() if the tagged object is NULL
(though even before switching to that function this check was
questionable, because it is "tag->tagged" that may be NULL, and we were
dereferencing that unconditionally).

So the code looks good.

> diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> index 1ce5f490e9..af223e44d6 100755
> --- a/t/t6302-for-each-ref-filter.sh
> +++ b/t/t6302-for-each-ref-filter.sh
> @@ -45,6 +45,8 @@ test_expect_success 'check signed tags with --points-at' '
>  	sed -e "s/Z$//" >expect <<-\EOF &&
>  	refs/heads/side Z
>  	refs/tags/annotated-tag four
> +	refs/tags/doubly-annotated-tag An annotated tag
> +	refs/tags/doubly-signed-tag A signed tag
>  	refs/tags/four Z
>  	refs/tags/signed-tag four
>  	EOF

And the test looks good, too. It is nice that we can rely on the
existing setup for the doubly-* tags.

Thanks for an easy-to-review patch. :)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-02 12:56 ` Jeff King
@ 2023-07-02 16:25   ` René Scharfe
  2023-07-02 20:27     ` Jeff King
  2023-07-02 22:02   ` Jeff King
  2023-07-05  6:10   ` Junio C Hamano
  2 siblings, 1 reply; 15+ messages in thread
From: René Scharfe @ 2023-07-02 16:25 UTC (permalink / raw)
  To: Jeff King, Jan Klötzke
  Cc: git, Junio C Hamano, Steve Kemp, Stefan Beller

Am 02.07.23 um 14:56 schrieb Jeff King:
> On Sat, Jul 01, 2023 at 10:57:02PM +0200, Jan Klötzke wrote:
>
>> @@ -2222,18 +2219,19 @@ static const struct object_id *match_points_at(struct oid_array *points_at,
>>  					       const struct object_id *oid,
>>  					       const char *refname)
>>  {
>> -	const struct object_id *tagged_oid = NULL;
>>  	struct object *obj;
>>
>>  	if (oid_array_lookup(points_at, oid) >= 0)
>>  		return oid;
>>  	obj = parse_object(the_repository, oid);
>> +	while (obj && obj->type == OBJ_TAG) {
>> +		oid = get_tagged_oid((struct tag *)obj);
>> +		if (oid_array_lookup(points_at, oid) >= 0)
>> +			return oid;
>> +		obj = parse_object(the_repository, oid);
>> +	}
>
> OK, so we are doing the usual peeling loop here. I wondered if we might
> be able to use peel_object(), but it again suffers from the "peel all
> the way" syndrome. So we have to loop ourselves so that we can check at
> each level. Good.
>
>>  	if (!obj)
>>  		die(_("malformed object at '%s'"), refname);
>
> This will now trigger if refname points to a broken object, or if its
> tag does. I think the resulting message is OK in either case (and
> presumably lower level code would produce extra error messages, too).
>
>> -	if (obj->type == OBJ_TAG)
>> -		tagged_oid = get_tagged_oid((struct tag *)obj);
>> -	if (tagged_oid && oid_array_lookup(points_at, tagged_oid) >= 0)
>> -		return tagged_oid;
>
> This code is moved into the loop body, but your version there drops the
> "if (tagged_oid)" check. I think that is OK (and even preferable),
> though. In get_tagged_oid() we will die() if the tagged object is NULL
> (though even before switching to that function this check was
> questionable, because it is "tag->tagged" that may be NULL, and we were
> dereferencing that unconditionally).

The check is necessary in the current code because tagged_oid is NULL if
obj is not a tag.  The new code no longer needs it.

> So the code looks good.

I agree.

René

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-02 16:25   ` René Scharfe
@ 2023-07-02 20:27     ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2023-07-02 20:27 UTC (permalink / raw)
  To: René Scharfe
  Cc: Jan Klötzke, git, Junio C Hamano, Steve Kemp, Stefan Beller

On Sun, Jul 02, 2023 at 06:25:13PM +0200, René Scharfe wrote:

> >> -	if (obj->type == OBJ_TAG)
> >> -		tagged_oid = get_tagged_oid((struct tag *)obj);
> >> -	if (tagged_oid && oid_array_lookup(points_at, tagged_oid) >= 0)
> >> -		return tagged_oid;
> >
> > This code is moved into the loop body, but your version there drops the
> > "if (tagged_oid)" check. I think that is OK (and even preferable),
> > though. In get_tagged_oid() we will die() if the tagged object is NULL
> > (though even before switching to that function this check was
> > questionable, because it is "tag->tagged" that may be NULL, and we were
> > dereferencing that unconditionally).
> 
> The check is necessary in the current code because tagged_oid is NULL if
> obj is not a tag.  The new code no longer needs it.

Oh, right. Probably:

	if (obj->type == OBJ_TAG) {
		const struct object_id *tagged_oid = get_tagged_oid((struct tag *)obj);
		if (oid_array_lookup(points_at, tagged_oid) >= 0)
			return tagged_oid;
	}

would have been a better way to write the original, but the new one with
the loop is better still. ;)

I also notice that the function returns a pointer to an oid, even though
the sole caller only cares about a boolean result. Not that big a deal,
though the memory lifetime of the return value is confusing.  We might
return "tagged_oid" which points to a struct that will live forever, but
we might also return "oid" directly, which points to memory passed in
from the caller with a limited lifetime.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-02 12:56 ` Jeff King
  2023-07-02 16:25   ` René Scharfe
@ 2023-07-02 22:02   ` Jeff King
  2023-07-02 22:33     ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King
  2023-07-03 20:25     ` [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke
  2023-07-05  6:10   ` Junio C Hamano
  2 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2023-07-02 22:02 UTC (permalink / raw)
  To: Jan Klötzke
  Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller

On Sun, Jul 02, 2023 at 08:56:11AM -0400, Jeff King wrote:

> My biggest question would be whether this introduces any performance
> penalty for the more common cases (lightweight tags and single-level
> annotated tags). The answer is "no", I think; we are already paying the
> cost to parse every object to find out if it's a tag, and your new loop
> only does an extra parse if we see a tag-of-tag. Good.

Reading more carefully, I think this does actually change the
performance a bit, because we end up parsing the pointed-to commits, as
well. So here's before and after your patch running "git for-each-ref
--points-at=HEAD" on linux.git (785 refs, all but 3 are tags):

  Benchmark 1: ./git.old for-each-ref --points-at=HEAD
    Time (mean ± σ):      11.4 ms ±   0.2 ms    [User: 6.5 ms, System: 4.9 ms]
    Range (min … max):    11.0 ms …  12.3 ms    239 runs
  
  Benchmark 2: ./git.new for-each-ref --points-at=HEAD
    Time (mean ± σ):      20.6 ms ±   0.5 ms    [User: 10.4 ms, System: 10.2 ms]
    Range (min … max):    19.8 ms …  22.7 ms    133 runs
  
  Summary
    './git.old for-each-ref --points-at=HEAD' ran
      1.80 ± 0.06 times faster than './git.new for-each-ref --points-at=HEAD'

The absolute numbers are pretty small, but the percent change isn't
great. I'll send some patches in a minute that can be applied on top to
improve this case, as well as fix the other issues I pointed out in the
existing code.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 0/3] a few --points-at optimizations/cleanups
  2023-07-02 22:02   ` Jeff King
@ 2023-07-02 22:33     ` Jeff King
  2023-07-02 22:35       ` [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at() Jeff King
                         ` (2 more replies)
  2023-07-03 20:25     ` [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke
  1 sibling, 3 replies; 15+ messages in thread
From: Jeff King @ 2023-07-02 22:33 UTC (permalink / raw)
  To: Jan Klötzke
  Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller

On Sun, Jul 02, 2023 at 06:02:43PM -0400, Jeff King wrote:

> I'll send some patches in a minute that can be applied on top to
> improve this case, as well as fix the other issues I pointed out in the
> existing code.

Here they are, on top of your patch.

  [1/3]: ref-filter: avoid parsing tagged objects in match_points_at()
  [2/3]: ref-filter: avoid parsing non-tags in match_points_at()
  [3/3]: ref-filter: simplify return type of match_points_at

 ref-filter.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at()
  2023-07-02 22:33     ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King
@ 2023-07-02 22:35       ` Jeff King
  2023-07-02 22:37       ` [PATCH 2/3] ref-filter: avoid parsing non-tags " Jeff King
  2023-07-02 22:38       ` [PATCH 3/3] ref-filter: simplify return type of match_points_at Jeff King
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2023-07-02 22:35 UTC (permalink / raw)
  To: Jan Klötzke
  Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller

When we peel tags to check if they match a --points-at oid, we
recursively parse the tagged object to see if it is also a tag. But
since the tag itself tells us the type of the object it points to (and
even gives us the appropriate object struct via its "tagged" member), we
can use that directly.

We do still have to make sure to call parse_tag() before looking at each
tag. This is redundant for the outermost tag (since we did call
parse_object() to find its type), but that's OK; parse_tag() is smart
enough to make this a noop when the tag has already been parsed.

In my clone of linux.git, with 782 tags (and only 3 non-tags), this
yields a significant speedup (bringing us back where we were before the
commit before this one started recursively dereferencing tags):

  Benchmark 1: ./git.old for-each-ref --points-at=HEAD --format="%(refname)"
    Time (mean ± σ):      20.3 ms ±   0.5 ms    [User: 11.1 ms, System: 9.1 ms]
    Range (min … max):    19.6 ms …  21.5 ms    141 runs

  Benchmark 2: ./git.new for-each-ref --points-at=HEAD --format="%(refname)"
    Time (mean ± σ):      11.4 ms ±   0.2 ms    [User: 6.3 ms, System: 5.0 ms]
    Range (min … max):    11.0 ms …  12.2 ms    250 runs

  Summary
    './git.new for-each-ref --points-at=HEAD --format="%(refname)"' ran
      1.79 ± 0.05 times faster than './git.old for-each-ref --points-at=HEAD --format="%(refname)"'

Signed-off-by: Jeff King <peff@peff.net>
---
This could optionally be squashed into the original. If we leave it as
its own patch, it might be worth replacing "the commit before this one"
with the actual oid (once Junio has picked it up and it's stable).

 ref-filter.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index ad7f244414..e091f056ab 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2225,10 +2225,18 @@ static const struct object_id *match_points_at(struct oid_array *points_at,
 		return oid;
 	obj = parse_object(the_repository, oid);
 	while (obj && obj->type == OBJ_TAG) {
-		oid = get_tagged_oid((struct tag *)obj);
+		struct tag *tag = (struct tag *)obj;
+
+		if (parse_tag(tag) < 0) {
+			obj = NULL;
+			break;
+		}
+
+		oid = get_tagged_oid(tag);
 		if (oid_array_lookup(points_at, oid) >= 0)
 			return oid;
-		obj = parse_object(the_repository, oid);
+
+		obj = tag->tagged;
 	}
 	if (!obj)
 		die(_("malformed object at '%s'"), refname);
-- 
2.41.0.586.g3c0cc15bc7


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] ref-filter: avoid parsing non-tags in match_points_at()
  2023-07-02 22:33     ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King
  2023-07-02 22:35       ` [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at() Jeff King
@ 2023-07-02 22:37       ` Jeff King
  2023-07-02 22:38       ` [PATCH 3/3] ref-filter: simplify return type of match_points_at Jeff King
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2023-07-02 22:37 UTC (permalink / raw)
  To: Jan Klötzke
  Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller

When handling --points-at, we have to try to peel each ref to see if
it's a tag that points at a requested oid. We start this process by
calling parse_object() on the oid pointed to by each ref.

The cost of parsing each object adds up, especially in an output that
doesn't otherwise need to open the objects at all. Ideally we'd use
peel_iterated_oid() here, which uses the cached information in the
packed-refs file. But we can't, because our --points-at must match not
only the fully peeled value, but any interim values (so if tag A points
to tag B which points to commit C, we should match --points-at=B, but
peel_iterated_oid() will only tell us about C).

So the best we can do (absent changes to the packed-refs peel traits) is
to avoid parsing non-tags. The obvious way to do that is to call
oid_object_info() to check the type before parsing. But there are a few
gotchas there, like checking if the object has already been parsed.

Instead we can just tell parse_object() that we are OK skipping the hash
check, which lets it turn on several optimizations. Commits can be
loaded via the commit graph (so it's both fast and we have the benefit
of the parsed data if we need it later at the output stage). Blobs are
not loaded at all. Trees are still loaded, but it's rather rare to have
a ref point directly to a tree (and since this is just an optimization,
kicking in 99% of the time is OK).

Even though we're paying for an extra lookup, the cost to avoid parsing
the non-tags is a net benefit. In my git.git repository with 941 tags
and 1440 other refs pointing to commits, this significantly cuts the
runtime:

  Benchmark 1: ./git.old for-each-ref --points-at=HEAD
    Time (mean ± σ):      26.8 ms ±   0.5 ms    [User: 24.5 ms, System: 2.2 ms]
    Range (min … max):    25.9 ms …  29.2 ms    107 runs

  Benchmark 2: ./git.new for-each-ref --points-at=HEAD
    Time (mean ± σ):       9.1 ms ±   0.3 ms    [User: 6.8 ms, System: 2.2 ms]
    Range (min … max):     8.6 ms …  10.2 ms    308 runs

  Summary
    './git.new for-each-ref --points-at=HEAD' ran
      2.96 ± 0.10 times faster than './git.old for-each-ref --points-at=HEAD'

In a repository that is mostly annotated tags, we'd expect less
improvement (we might still skip a few object loads, but that's balanced
by the extra lookups). In my clone of linux.git, which has 782 tags and
3 branches, the run-time is about the same (it's actually ~1% faster on
average after this patch, but that's within the run-to-run noise).

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index e091f056ab..f5ac486430 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2223,7 +2223,8 @@ static const struct object_id *match_points_at(struct oid_array *points_at,

 	if (oid_array_lookup(points_at, oid) >= 0)
 		return oid;
-	obj = parse_object(the_repository, oid);
+	obj = parse_object_with_flags(the_repository, oid,
+				      PARSE_OBJECT_SKIP_HASH_CHECK);
 	while (obj && obj->type == OBJ_TAG) {
 		struct tag *tag = (struct tag *)obj;

-- 
2.41.0.586.g3c0cc15bc7

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] ref-filter: simplify return type of match_points_at
  2023-07-02 22:33     ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King
  2023-07-02 22:35       ` [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at() Jeff King
  2023-07-02 22:37       ` [PATCH 2/3] ref-filter: avoid parsing non-tags " Jeff King
@ 2023-07-02 22:38       ` Jeff King
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2023-07-02 22:38 UTC (permalink / raw)
  To: Jan Klötzke
  Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller

We return the oid that matched, but the sole caller only cares whether
we matched anything at all. This is mostly academic, since there's only
one caller, but the lifetime of the returned pointer is not immediately
clear. Sometimes it points to an oid in a tag struct, which should live
forever. And sometimes to the oid passed in, which only lives as long as
the each_ref_fn callback we're called from.

Simplify this to a boolean return which is more direct and obvious. As a
bonus, this lets us avoid the weird pattern of overwriting our "oid"
parameter in the loop (since we now only refer to the tagged oid one
time, and can just inline the call to get it).

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index f5ac486430..0629435f08 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2209,20 +2209,22 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 /*
  * Given a ref (oid, refname), check if the ref belongs to the array
  * of oids. If the given ref is a tag, check if the given tag points
- * at one of the oids in the given oid array.
+ * at one of the oids in the given oid array. Returns non-zero if a
+ * match is found.
+ *
  * NEEDSWORK:
  * As the refs are cached we might know what refname peels to without
  * the need to parse the object via parse_object(). peel_ref() might be a
  * more efficient alternative to obtain the pointee.
  */
-static const struct object_id *match_points_at(struct oid_array *points_at,
-					       const struct object_id *oid,
-					       const char *refname)
+static int match_points_at(struct oid_array *points_at,
+			   const struct object_id *oid,
+			   const char *refname)
 {
 	struct object *obj;
 
 	if (oid_array_lookup(points_at, oid) >= 0)
-		return oid;
+		return 1;
 	obj = parse_object_with_flags(the_repository, oid,
 				      PARSE_OBJECT_SKIP_HASH_CHECK);
 	while (obj && obj->type == OBJ_TAG) {
@@ -2233,15 +2235,14 @@ static const struct object_id *match_points_at(struct oid_array *points_at,
 			break;
 		}
 
-		oid = get_tagged_oid(tag);
-		if (oid_array_lookup(points_at, oid) >= 0)
-			return oid;
+		if (oid_array_lookup(points_at, get_tagged_oid(tag)) >= 0)
+			return 1;
 
 		obj = tag->tagged;
 	}
 	if (!obj)
 		die(_("malformed object at '%s'"), refname);
-	return NULL;
+	return 0;
 }
 
 /*
-- 
2.41.0.586.g3c0cc15bc7

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-02 22:02   ` Jeff King
  2023-07-02 22:33     ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King
@ 2023-07-03 20:25     ` Jan Klötzke
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Klötzke @ 2023-07-03 20:25 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Steve Kemp, René Scharfe, Stefan Beller

Am Sun, Jul 02, 2023 at 06:02:43PM -0400 schrieb Jeff King:
> On Sun, Jul 02, 2023 at 08:56:11AM -0400, Jeff King wrote:
> 
> > My biggest question would be whether this introduces any performance
> > penalty for the more common cases (lightweight tags and single-level
> > annotated tags). The answer is "no", I think; we are already paying the
> > cost to parse every object to find out if it's a tag, and your new loop
> > only does an extra parse if we see a tag-of-tag. Good.
> 
> Reading more carefully, I think this does actually change the
> performance a bit, because we end up parsing the pointed-to commits, as
> well. So here's before and after your patch running "git for-each-ref
> --points-at=HEAD" on linux.git (785 refs, all but 3 are tags):
> 
>   Benchmark 1: ./git.old for-each-ref --points-at=HEAD
>     Time (mean ± σ):      11.4 ms ±   0.2 ms    [User: 6.5 ms, System: 4.9 ms]
>     Range (min … max):    11.0 ms …  12.3 ms    239 runs
>   
>   Benchmark 2: ./git.new for-each-ref --points-at=HEAD
>     Time (mean ± σ):      20.6 ms ±   0.5 ms    [User: 10.4 ms, System: 10.2 ms]
>     Range (min … max):    19.8 ms …  22.7 ms    133 runs
>   
>   Summary
>     './git.old for-each-ref --points-at=HEAD' ran
>       1.80 ± 0.06 times faster than './git.new for-each-ref --points-at=HEAD'
> 
> The absolute numbers are pretty small, but the percent change isn't
> great. I'll send some patches in a minute that can be applied on top to
> improve this case, as well as fix the other issues I pointed out in the
> existing code.

I have to admit I was not entirely sure about the performance
implications. The relative performance drop is indeed substantial.
Thanks for the thorough review and swiftly taking care of these!

-- Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-02 12:56 ` Jeff King
  2023-07-02 16:25   ` René Scharfe
  2023-07-02 22:02   ` Jeff King
@ 2023-07-05  6:10   ` Junio C Hamano
  2023-07-05 12:41     ` Jeff King
  2 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2023-07-05  6:10 UTC (permalink / raw)
  To: Jeff King
  Cc: Jan Klötzke, git, Steve Kemp, René Scharfe,
	Stefan Beller

Jeff King <peff@peff.net> writes:

> That seems reasonable to me. It is changing the definition of
> --points-at slightly, but I think in a way that should be less
> surprising to users (i.e., we can consider the old behavior a bug).

OK.

> The
> existing documentation is sufficiently vague about "points" that I don't
> think it needs to be updated (though arguably we could improve that
> here, too).

True, too.

Having said that, as we lack a primitive exposed to the user script
to only peel one level of a tag, it makes me wonder how much of this
is a practical issue.  Is a tag that points at another tag mere
curiosity and subject of mental gymnastics, or is it a useful
construct with real world use case?  If the latter, isn't it more
like "a tag to a tag _could_ be made useful if we also supported X
and Y and Z" where X could be "There should be a syntax like
A^{commit} that lets us peel only one level of tag"?

> Note that most other tag-peeling in Git (like the peeled values returned
> by upload-pack) errs in the opposite direction: they peel completely,
> and don't show the intermediate values.

Exactly.  While allowing to feed the intermediate level as the
argument to "points at" does sound like a good idea on surface, I am
skeptical about the practical value it brings---if we cannot do this
without any additional overhead (of course if there are tags to tags
in the repository, I am perfectly fine to pay the cost of
dereferencing the chain, but I am more worried about regressing the
performance in repositories without tags to tags), I am not sure if
it is worth it.

> My biggest question would be whether this introduces any performance
> penalty for the more common cases (lightweight tags and single-level
> annotated tags). The answer is "no", I think; we are already paying the
> cost to parse every object to find out if it's a tag, and your new loop
> only does an extra parse if we see a tag-of-tag. Good.

OK.

>   Let me go off on a tangent here, since I'm looking at the performance
>   of this function. The current code is already rather pessimal here, as
>   we could probably avoid parsing non-tags entirely. Some strategies
>   there are:
>
>     1. We could check the object type via oid_object_info() before
>        parsing. This carries a small penalty (two lookups) for tags but
>        a big win (avoiding loading the object contents) for non-tags.
>
>        An easy way to do this is to replace the parse_object() with
>        parse_object_with_flags(PARSE_OBJECT_SKIP_HASH_CHECK), which
>        tries to avoid loading object contents (especially using the
>        commit-graph for commits, which presumably covers most non-tag
>        refs).
>
>     2. We could be using the peel_iterated_oid() interface (this is the
>        peel_ref() thing mentioned in the comment you touched, but it has
>        since been renamed). But it does the "peel all the way" thing
>        mentioned above (both because of its interface, but also because
>        that's what the packed-refs peel lines store).
>
>        So to do that we'd either have to enhance the packed-refs store
>        (which would not be too hard to do in a backwards-compatible
>        way), or switch --points-at to only match either the direct ref
>        value or the fully-peeled value.
>
>   I don't think either of those is something your patch needs to deal
>   with. It is not making these kinds of optimizations any harder (it is
>   the existing "peel only once" behavior that does so). I mostly wanted
>   to get it written down while we are all looking at this function.

My preference would be to see these optimization done first and then
add this new loop on top of it.  That way, we can measure more
easily what kind of additional overhead, if any, we are paying by
adding the loop.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-05  6:10   ` Junio C Hamano
@ 2023-07-05 12:41     ` Jeff King
  2023-07-05 17:16       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2023-07-05 12:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jan Klötzke, git, Steve Kemp, René Scharfe,
	Stefan Beller

On Tue, Jul 04, 2023 at 11:10:47PM -0700, Junio C Hamano wrote:

> My preference would be to see these optimization done first and then
> add this new loop on top of it.  That way, we can measure more
> easily what kind of additional overhead, if any, we are paying by
> adding the loop.

I ended up doing them on top, rather than before, but I think the size
of the impact can easily be seen.

The one thing that would actually make us a lot faster (by using the
packed-refs peels) is to make full peels the only option, and do not
bother letting --points-at match "B" in an A->B->C peel. But that would
be removing something that is currently matched (even before the patch
in this thread), so I stopped short of it in my optimizations. But even
if we decide to do that, Jan's patch is not making anything worse there
(in fact, it is making it better, because it is matching "C" which we do
not currently match).

So I'd be inclined to proceed with the patches I sent earlier, and then
if we choose to later refactor again to drop "B", we can.

Of course if somebody wants to do that refactor _now_ instead of the
patches already posted, I'm OK with that. It just won't be me due to
time constraints.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-05 12:41     ` Jeff King
@ 2023-07-05 17:16       ` Junio C Hamano
  2023-07-05 18:50         ` Jan Klötzke
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2023-07-05 17:16 UTC (permalink / raw)
  To: Jeff King
  Cc: Jan Klötzke, git, Steve Kemp, René Scharfe,
	Stefan Beller

Jeff King <peff@peff.net> writes:

> On Tue, Jul 04, 2023 at 11:10:47PM -0700, Junio C Hamano wrote:
>
>> My preference would be to see these optimization done first and then
>> add this new loop on top of it.  That way, we can measure more
>> easily what kind of additional overhead, if any, we are paying by
>> adding the loop.
>
> I ended up doing them on top, rather than before, but I think the size
> of the impact can easily be seen.

Ah, I should have first read (or at least skimmed) all messages
before responding.  Thanks for following through.

> The one thing that would actually make us a lot faster (by using the
> packed-refs peels) is to make full peels the only option, and do not
> bother letting --points-at match "B" in an A->B->C peel. But that would
> be removing something that is currently matched (even before the patch
> in this thread), so I stopped short of it in my optimizations.

Interesting.  Right now, if I create a 'direct' tag that points
directly at HEAD, and then create an 'indirect' tag that points at
'direct', i.e.

    $ git tag -a -m 'a direct tag to HEAD' direct HEAD
    $ git tag -a -m 'an indirect tag' indirect direct

I would get a piece of advice message that encourages to correct the
mistake with "git tag -f indirect direct^{}".  Then I ask for tags
that point at HEAD, I see only 'direct' and not 'indirect'.  Your
optimization would start showing both 'direct' and 'indirect' if
they are packed.  But you are correct to worry about the opposite
case.  If I ask for tags that point at 'direct', I currently see
'indirect', but of course 'indirect' will not appear as the peeled
value of any ref, and the optimized version will stop saying that
'indirect' is a ref that points at 'direct'.  That sounds like a
regression.

> But even
> if we decide to do that, Jan's patch is not making anything worse there
> (in fact, it is making it better, because it is matching "C" which we do
> not currently match).
> ...
> So I'd be inclined to proceed with the patches I sent earlier, and then
> if we choose to later refactor again to drop "B", we can.

We generally avoid taking away anything once we give it to users;
once the patch under discussion goes in, there is no taking it back,
i.e. the new _behaviour_ closes the door to certain optimizations.

I do not at all mind to see us decide and declare that it is a good
thing to say that not just 'direct' but also 'indirect' points at
HEAD, and that 'indirect' points at 'direct' and the patch under
discussion makes the world a etter place, and we will not regret
that decision.  But the time to make such a decision is now, before
we give a go-ahead to the patch.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-05 17:16       ` Junio C Hamano
@ 2023-07-05 18:50         ` Jan Klötzke
  2023-07-05 20:15           ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Klötzke @ 2023-07-05 18:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, Steve Kemp, René Scharfe, Stefan Beller

Am Wed, Jul 05, 2023 at 10:16:17AM -0700 schrieb Junio C Hamano:
> > The one thing that would actually make us a lot faster (by using the
> > packed-refs peels) is to make full peels the only option, and do not
> > bother letting --points-at match "B" in an A->B->C peel. But that would
> > be removing something that is currently matched (even before the patch
> > in this thread), so I stopped short of it in my optimizations.
> 
> Interesting.  Right now, if I create a 'direct' tag that points
> directly at HEAD, and then create an 'indirect' tag that points at
> 'direct', i.e.
> 
>     $ git tag -a -m 'a direct tag to HEAD' direct HEAD
>     $ git tag -a -m 'an indirect tag' indirect direct
> 
> I would get a piece of advice message that encourages to correct the
> mistake with "git tag -f indirect direct^{}".  Then I ask for tags
> that point at HEAD, I see only 'direct' and not 'indirect'.  Your
> optimization would start showing both 'direct' and 'indirect' if
> they are packed.  But you are correct to worry about the opposite
> case.  If I ask for tags that point at 'direct', I currently see
> 'indirect', but of course 'indirect' will not appear as the peeled
> value of any ref, and the optimized version will stop saying that
> 'indirect' is a ref that points at 'direct'.  That sounds like a
> regression.

I agree. I purposely let the loop run through all the tags in the chain
in my inital patch to keep the existing behaviour. Unless there is a
compelling reason I would suggest do keep it this way. I wouldn't rule
out that someone relies on being able to query for indirect tags by
supplying a tag to the --points-at option.

> > But even
> > if we decide to do that, Jan's patch is not making anything worse there
> > (in fact, it is making it better, because it is matching "C" which we do
> > not currently match).
> > ...
> > So I'd be inclined to proceed with the patches I sent earlier, and then
> > if we choose to later refactor again to drop "B", we can.
> 
> We generally avoid taking away anything once we give it to users;
> once the patch under discussion goes in, there is no taking it back,
> i.e. the new _behaviour_ closes the door to certain optimizations.
> 
> I do not at all mind to see us decide and declare that it is a good
> thing to say that not just 'direct' but also 'indirect' points at
> HEAD, and that 'indirect' points at 'direct' and the patch under
> discussion makes the world a etter place, and we will not regret
> that decision.  But the time to make such a decision is now, before
> we give a go-ahead to the patch.

The reasoning why I came up with the patch in the first place was an odd
behaviour that was reported to me in [1]. The user had a recipe that
checked out a nested tag for a package but the Bob Build Tool thought
the user messed with the working copy. Behind the scenes Bob checked out
the indirect tag (foobar-3.13.1) successfully. But when the user
examined the project state ("bob status"), Bob ran

  git tag --points-at HEAD

in the working copy to get the list of tags for the current HEAD. That
did not return the expected tag in the list, so the workspace state was
flagged. OTOH, the tag _is_ visible when manually executing "git log" in
the workspace. Clearly this is confusing and so was I when I tried to
find the root cause. To sum up:

 $ git checkout nested-tag   --> works
 $ git log                   --> shows nested tag in decoration
 $ git tag --points-at HEAD  --> does *not* show nested tag

AFAICT the nested tag in the mentioned bug report was not created on
purpose. I haven't come across any sensible use case for them either.
But as most git commands can handle nested tags they should better be
supported consistently IMHO.

-- Jan

[1] https://github.com/BobBuildTool/bob/issues/520

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ref-filter: handle nested tags in --points-at option
  2023-07-05 18:50         ` Jan Klötzke
@ 2023-07-05 20:15           ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2023-07-05 20:15 UTC (permalink / raw)
  To: Jan Klötzke
  Cc: Jeff King, git, Steve Kemp, René Scharfe, Stefan Beller

Jan Klötzke <jan@kloetzke.net> writes:

> Am Wed, Jul 05, 2023 at 10:16:17AM -0700 schrieb Junio C Hamano:
>> Interesting.  Right now, if I create a 'direct' tag that points
>> directly at HEAD, and then create an 'indirect' tag that points at
>> 'direct', i.e.
>> 
>>     $ git tag -a -m 'a direct tag to HEAD' direct HEAD
>>     $ git tag -a -m 'an indirect tag' indirect direct
>> 
>> I would get a piece of advice message that encourages to correct the
>> mistake with "git tag -f indirect direct^{}".
> ...
> But as most git commands can handle nested tags they should better be
> supported consistently IMHO.

Do they?  Most git commands handle nested tags in only one way: by
fully peeling.  "git checkout --detach indirect" in the above
scenario would handle nested tag "indirect" well, but it is done by
making "direct" tag inaccessible when the only thing you have is the
"indirect" tag.  For example, you cannot create another "indirect"
tag that points at "direct" tag with "git tag", with

    $ git tag -a -m 'another indirect' indirect-2 indirect^{}

The resulting tag will be another direct tag to the underlying
commit, and not a tag of the "direct" tag.

In that sense, --points-at we currently have that only peels once is
inconsistent with the others, but --points-at that peels repeatedly
and allows the intermediate steps to match is also behaving
inconsistently relative to most git commands.

Combined with the fact that we seem to discourage such an indirect
tag, we should either:

 (1) declare that indirect tags are not useful, turn the warning
     advice.nestedTag into a stronger error, devise appropriate
     transition plan to get rid of nested tag (e.g. eventually
     making it impossible to use "git tag" to create such a tag and
     let "git fsck" complain about them), and perhaps change
     "--points-at" to take only the fully peeled object into account
     so that optimization based on packed-refs becomes possible.  Or

 (2) declare that indirect tags are useful thing to support, tone
     down the advice.nestedTag message, and enhance the support of
     indirect tags, starting with this "--points-at" enhancement.

I am inclined to support (2), but then a consistent support would
need to eventually include a "peel only a single level" primitive as
well.  That would be the first step to allow "most git commands" to
support nested tags well, as they currently do not.

Thanks for working on this.  Let's queue it, together will Peff's
patches (which I haven't studied fully yet).

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-07-05 20:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-01 20:57 [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke
2023-07-02 12:56 ` Jeff King
2023-07-02 16:25   ` René Scharfe
2023-07-02 20:27     ` Jeff King
2023-07-02 22:02   ` Jeff King
2023-07-02 22:33     ` [PATCH 0/3] a few --points-at optimizations/cleanups Jeff King
2023-07-02 22:35       ` [PATCH 1/3] ref-filter: avoid parsing tagged objects in match_points_at() Jeff King
2023-07-02 22:37       ` [PATCH 2/3] ref-filter: avoid parsing non-tags " Jeff King
2023-07-02 22:38       ` [PATCH 3/3] ref-filter: simplify return type of match_points_at Jeff King
2023-07-03 20:25     ` [PATCH] ref-filter: handle nested tags in --points-at option Jan Klötzke
2023-07-05  6:10   ` Junio C Hamano
2023-07-05 12:41     ` Jeff King
2023-07-05 17:16       ` Junio C Hamano
2023-07-05 18:50         ` Jan Klötzke
2023-07-05 20:15           ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).