Linux kbuild/kconfig development
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Rasmus Villemoes <ravi@prevas.dk>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>,
	Masahiro Yamada <masahiroy@kernel.org>,
	linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org,
	git@vger.kernel.org
Subject: Re: [PATCH] setlocalversion: Add workaround for "git describe" performance issue
Date: Thu, 31 Oct 2024 07:42:10 -0400	[thread overview]
Message-ID: <20241031114210.GA593548@coredump.intra.peff.net> (raw)
In-Reply-To: <87bjz0k17c.fsf@prevas.dk>

On Thu, Oct 31, 2024 at 11:37:27AM +0100, Rasmus Villemoes wrote:

> and that "gave up" commit is v4.18-rc4, the eleventh commit
> encountered. That also explains why you have to add a "dummy" second
> --match to make --candidates=1 have the expected behaviour.
> 
> Perhaps the logic should instead be that as soon as match_cnt hits
> max_candidates (i.e. all the tags we're going to consider have actually
> been visited), we break out. That is, the last "else" above should
> instead be replaced by
> 
>   if (match_cnt == max_candidates) {
>     ... /* ? , gave_up_on is now a misnomer */
>     break;
>   }

Yes, I agree that is the right direction. Replacing the "else" entirely
feels a little weird, because it is part of the:

  if (!tags && !all && n->prio < 2)
	...
  else if (match_cnt < max_candidates)
	...
  else
	...

So we'd now run that check even if we triggered the first block. But I
don't think it should matter in practice. We only increment match_cnt in
the else-if here. So the "else" block could go away, and the check for
giving up could go inside the else-if.

It does seem like gave_up_on is now pointless, but I'm not sure I
understand all of the code here. I assumed that it was only used to
report "this is where we gave up", and to give you the extra bit of
information that there _were_ other candidates that we omitted (and not
just exactly max_candidates). Of course we don't show that without
--debug. So it seems silly to spend a bunch of extra CPU for that.

But the plot thickens.

What I was going to suggest is that if we wanted to retain that one bit
of information, what we could do instead is: independent of
max_candidates, see if we've found all of the possible names we expanded
from --match. Then max_candidates would work as it does now, but we'd
avoid fruitlessly searching when there are no more names to find.

Counting the number of expanded names is a little weird. We use them to
annotate the commits, but of course multiple names can point to a single
commit, and there's a priority override system. I think the final number
we can find is the number of entries in the "names" hash.

So I expected this to work:

diff --git a/builtin/describe.c b/builtin/describe.c
index 7330a77b38..70a11072de 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -380,6 +380,9 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 				c->object.flags |= t->flag_within;
 				if (n->prio == 2)
 					annotated_cnt++;
+
+				if (match_cnt == hashmap_get_size(&names))
+					break;
 			}
 			else {
 				gave_up_on = c;

but it's still slow! If we set "gave_up_on = c", then it gets fast. I'm
not sure why that is. Later we do:

        if (gave_up_on) {
                commit_list_insert_by_date(gave_up_on, &list);
                seen_commits--;
        }
        seen_commits += finish_depth_computation(&list, &all_matches[0]);

but I don't at all understand why adding gave_up_on lets that finish
sooner. So I'm worried we're missing something about how it is used.

One hack is to just, like the max_candidates case, let us look at one
_more_ commit before bailing. Like this:

diff --git a/builtin/describe.c b/builtin/describe.c
index 7330a77b38..177c8232f6 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -365,6 +365,11 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		struct commit_list *parents = c->parents;
 		struct commit_name **slot;
 
+		if (match_cnt == hashmap_get_size(&names)) {
+			gave_up_on = c;
+			break;
+		}
+
 		seen_commits++;
 		slot = commit_names_peek(&commit_names, c);
 		n = slot ? *slot : NULL;


That works, but I have a feeling that figured out what the heck is going
on with gave_up_on might produce a more elegant solution.

> Then as a further DWIM aid, wherever the initialization logic is could
> be updated so that, after expanding all the --match= wildcards, if the
> number of tags is less than max_candidates, automatically lower
> max_candidates to that number (which in the setlocalversion case will
> always be 1 because we're not actually passing a wildcard).

Yeah, I had the same thought (though if we do a separate hashmap check
as above, it wouldn't be needed).

-Peff

  reply	other threads:[~2024-10-31 11:42 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-31  1:20 [PATCH] setlocalversion: Add workaround for "git describe" performance issue Josh Poimboeuf
2024-10-31 10:37 ` Rasmus Villemoes
2024-10-31 11:42   ` Jeff King [this message]
2024-10-31 12:24     ` Jeff King
2024-10-31 14:43       ` Jeff King
2024-11-04 12:37         ` Benno Evers
2024-11-01 10:23       ` Rasmus Villemoes
2024-11-01 11:39         ` Jeff King
2024-10-31 11:43   ` Masahiro Yamada

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241031114210.GA593548@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jpoimboe@kernel.org \
    --cc=linux-kbuild@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=masahiroy@kernel.org \
    --cc=ravi@prevas.dk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox