From: Jeff King <peff@peff.net>
To: Rasmus Villemoes <ravi@prevas.dk>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>,
Masahiro Yamada <masahiroy@kernel.org>,
linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org,
git@vger.kernel.org
Subject: Re: [PATCH] setlocalversion: Add workaround for "git describe" performance issue
Date: Thu, 31 Oct 2024 07:42:10 -0400
Message-ID: <20241031114210.GA593548@coredump.intra.peff.net>
In-Reply-To: <87bjz0k17c.fsf@prevas.dk>
On Thu, Oct 31, 2024 at 11:37:27AM +0100, Rasmus Villemoes wrote:
> and that "gave up" commit is v4.18-rc4, the eleventh commit
> encountered. That also explains why you have to add a "dummy" second
> --match to make --candidates=1 have the expected behaviour.
>
> Perhaps the logic should instead be that as soon as match_cnt hits
> max_candidates (i.e. all the tags we're going to consider have actually
> been visited), we break out. That is, the last "else" above should
> instead be replaced by
>
>   if (match_cnt == max_candidates) {
>     ... /* ? , gave_up_on is now a misnomer */
>     break;
>   }
Yes, I agree that is the right direction. Replacing the "else" entirely
feels a little weird, because it is part of the:
  if (!tags && !all && n->prio < 2)
    ...
  else if (match_cnt < max_candidates)
    ...
  else
    ...
So we'd now run that check even if we triggered the first block. But I
don't think it should matter in practice. We only increment match_cnt in
the else-if here. So the "else" block could go away, and the check for
giving up could go inside the else-if.
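
As a rough illustration of that restructuring (a toy model, not the code in builtin/describe.c), the sketch below walks an array of made-up commits and stops as soon as the last candidate slot is filled. The names toy_commit and toy_walk are hypothetical, and the real loop tracks much more state than this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a commit in traversal order; not git's struct. */
struct toy_commit {
	int has_name;	/* non-zero if a tag name points at this commit */
	int prio;	/* 2 = annotated tag, lower = anything else */
};

/*
 * Walk commits[0..n) in order and return how many commits were visited
 * before the loop stopped. The number of candidates found is stored in
 * *match_cnt.
 */
static size_t toy_walk(const struct toy_commit *commits, size_t n,
		       int tags, int all, int max_candidates,
		       int *match_cnt)
{
	size_t seen = 0;

	assert(max_candidates > 0);
	*match_cnt = 0;
	for (size_t i = 0; i < n; i++) {
		const struct toy_commit *c = &commits[i];

		seen++;
		if (!c->has_name)
			continue;
		if (!tags && !all && c->prio < 2) {
			/* unannotated name, skipped by default */
			continue;
		} else if (*match_cnt < max_candidates) {
			(*match_cnt)++;
			/* the give-up check now lives inside the else-if */
			if (*match_cnt == max_candidates)
				break;
		}
	}
	return seen;
}
```

With max_candidates set to 1 and the only annotated tag on the second commit, the walk in this model visits two commits and stops, rather than continuing until it trips over another tagged commit.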
It does seem like gave_up_on is now pointless, but I'm not sure I
understand all of the code here. I assumed that it was only used to
report "this is where we gave up", and to give you the extra bit of
information that there _were_ other candidates that we omitted (and not
just exactly max_candidates). Of course we don't show that without
--debug. So it seems silly to spend a bunch of extra CPU for that.
But the plot thickens.
What I was going to suggest is that if we wanted to retain that one bit
of information, what we could do instead is: independent of
max_candidates, see if we've found all of the possible names we expanded
from --match. Then max_candidates would work as it does now, but we'd
avoid fruitlessly searching when there are no more names to find.
Counting the number of expanded names is a little weird. We use them to
annotate the commits, but of course multiple names can point to a single
commit, and there's a priority override system. I think the final number
we can find is the number of entries in the "names" hash.
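
To make the counting concrete, here is a small sketch that assumes the hash ends up with one entry per tagged commit, so several names pointing at the same commit count once. The toy_name struct and toy_count_targets function are hypothetical and use plain strings in place of object IDs; they are not git's hashmap API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (name, target) pair; target stands in for a commit ID. */
struct toy_name {
	const char *name;
	const char *target;
};

/* Count distinct targets among names[0..n); O(n^2), fine for a sketch. */
static size_t toy_count_targets(const struct toy_name *names, size_t n)
{
	size_t distinct = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		assert(names[i].target != NULL);
		/* look for an earlier name with the same target */
		for (j = 0; j < i; j++)
			if (!strcmp(names[i].target, names[j].target))
				break;
		if (j == i)
			distinct++;
	}
	return distinct;
}
```

In this model, three names where two share a target give a count of two, which is the most candidates a traversal could ever find.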
So I expected this to work:
diff --git a/builtin/describe.c b/builtin/describe.c
index 7330a77b38..70a11072de 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -380,6 +380,9 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 				c->object.flags |= t->flag_within;
 				if (n->prio == 2)
 					annotated_cnt++;
+
+				if (match_cnt == hashmap_get_size(&names))
+					break;
 			}
 			else {
 				gave_up_on = c;
but it's still slow! If we set "gave_up_on = c", then it gets fast. I'm
not sure why that is. Later we do:
  if (gave_up_on) {
    commit_list_insert_by_date(gave_up_on, &list);
    seen_commits--;
  }
  seen_commits += finish_depth_computation(&list, &all_matches[0]);
but I don't at all understand why adding gave_up_on lets that finish
sooner. So I'm worried we're missing something about how it is used.
One hack is to just, like the max_candidates case, let us look at one
_more_ commit before bailing. Like this:
diff --git a/builtin/describe.c b/builtin/describe.c
index 7330a77b38..177c8232f6 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -365,6 +365,11 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		struct commit_list *parents = c->parents;
 		struct commit_name **slot;
 
+		if (match_cnt == hashmap_get_size(&names)) {
+			gave_up_on = c;
+			break;
+		}
+
 		seen_commits++;
 		slot = commit_names_peek(&commit_names, c);
 		n = slot ? *slot : NULL;
That works, but I have a feeling that figuring out what the heck is going
on with gave_up_on might produce a more elegant solution.
> Then as a further DWIM aid, wherever the initialization logic is could
> be updated so that, after expanding all the --match= wildcards, if the
> number of tags is less than max_candidates, automatically lower
> max_candidates to that number (which in the setlocalversion case will
> always be 1 because we're not actually passing a wildcard).
Yeah, I had the same thought (though if we do a separate hashmap check
as above, it wouldn't be needed).
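
For reference, the clamping idea from the quoted paragraph could look roughly like the sketch below. toy_clamp_candidates is a hypothetical helper, not an existing git function, and leaving the limit alone when no names were expanded is an assumption made here to keep the sketch simple:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Lower max_candidates to the number of names expanded from --match
 * when fewer names exist than slots. Returns the (possibly lowered)
 * limit.
 */
static int toy_clamp_candidates(int max_candidates, size_t n_names)
{
	assert(max_candidates >= 0);
	if (n_names > 0 && n_names < (size_t)max_candidates)
		return (int)n_names;
	return max_candidates;
}
```

With a single non-wildcard --match, as in the setlocalversion case, this would reduce a limit of 10 to 1.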
-Peff