From: Jeff King <peff@peff.net>
To: Rasmus Villemoes <ravi@prevas.dk>
Cc: Benno Evers <benno@bmevers.de>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Masahiro Yamada <masahiroy@kernel.org>,
linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org,
git@vger.kernel.org
Subject: Re: [PATCH] setlocalversion: Add workaround for "git describe" performance issue
Date: Thu, 31 Oct 2024 10:43:51 -0400 [thread overview]
Message-ID: <20241031144351.GA1720940@coredump.intra.peff.net> (raw)
In-Reply-To: <20241031122456.GB593548@coredump.intra.peff.net>
On Thu, Oct 31, 2024 at 08:24:56AM -0400, Jeff King wrote:
> We have to feed at least one commit with the "within" flag into the
> traversal so that it can let us end things. But I don't think it really
> matters if that commit is the one we found, or if it's a parent of one
> that we happened to pass "within" bits down to.
>
> So I think we can just set "gave_up_on" to the final element we found
> (whether from max_candidates or from finding every possible name). I.e.,
> what I showed earlier, or what you were proposing.
Hmph. So I don't think this is quite true, but now I'm puzzled again.
It is accurate to say that we must make sure _some_ commit with the
those flag bits set remains in "list". And I don't think it matters if
it's the candidate we found, or its parent.
But there's other stuff happening in that loop, after we process that
max candidate (where we'd proposed to break) but before we hit the next
possible candidate. Stuff like adding onto the depth of the other
candidates. Josh's example doesn't show that because it only has one
candidate, but I could imagine a case where it does matter (though I
didn't construct one).
So I'd have thought that this:
diff --git a/builtin/describe.c b/builtin/describe.c
index 7330a77b38..b0f645c41d 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -366,6 +366,12 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
struct commit_name **slot;
seen_commits++;
+
+ if (match_cnt == max_candidates) {
+ gave_up_on = c;
+ break;
+ }
+
slot = commit_names_peek(&commit_names, c);
n = slot ? *slot : NULL;
if (n) {
@@ -381,10 +387,6 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
if (n->prio == 2)
annotated_cnt++;
}
- else {
- gave_up_on = c;
- break;
- }
}
for (cur_match = 0; cur_match < match_cnt; cur_match++) {
struct possible_tag *t = &all_matches[cur_match];
would do it, by just finishing out the loop iteration and bailing on the
next commit. After all, that commit _could_ be a candidate itself. But
it causes a test in t6120 to fail. We have a disjoint history like this:
B
o
\
o-----o---o----x
A
and we expect that "x" is described as "A-3" (because we are including
the disjoint B). But after the patch above and with --candidates=2
(since there are only two tags and part of our goal is to limit
candidates to the number of tags), we find "B-4". Which is worse (at
least by some metrics).
I think this comes from 30b1c7ad9d (describe: don't abort too early when
searching tags, 2020-02-26). And given the problem description there, I
can see how quitting early in a disjoint history will give you worse
answers. But the patch above is triggering a case that already _could_
trigger.
So it feels like 30b1c7ad9d is incomplete. Without any patches, if I
limit it to --candidates=2 but make A^ a tag, then it gets the same
wrong answer (for the exact same reason). And I don't see a way to make
it correct without losing the ability to break out of the traversal
early when we hit max_candidates (which is obviously a very important
optimization in general). But maybe I'm missing something.
I do think my patch above is not introducing a new problem that wasn't
already there. It's just that the toy repo, having so few tags, means
any logic to reduce max_candidates will trigger there.
+cc the author of 30b1c7ad9d for any wisdom
-Peff
next prev parent reply other threads:[~2024-10-31 14:43 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <309549cafdcfe50c4fceac3263220cc3d8b109b2.1730337435.git.jpoimboe@kernel.org>
2024-10-31 10:37 ` [PATCH] setlocalversion: Add workaround for "git describe" performance issue Rasmus Villemoes
2024-10-31 11:42 ` Jeff King
2024-10-31 12:24 ` Jeff King
2024-10-31 14:43 ` Jeff King [this message]
2024-11-04 12:37 ` Benno Evers
2024-11-06 19:22 ` [PATCH 0/4] perf improvements for git-describe with few tags Jeff King
2024-11-06 19:26 ` Jeff King
2024-11-06 21:16 ` [PATCH 1/4] t6120: demonstrate weakness in disjoint-root handling Jeff King
2024-11-06 21:17 ` [PATCH 2/4] t/perf: add tests for git-describe Jeff King
2024-11-06 21:17 ` [PATCH 3/4] describe: stop digging for max_candidates+1 Jeff King
2024-11-06 21:17 ` [PATCH 4/4] describe: stop traversing when we run out of names Jeff King
2024-12-04 23:15 ` [PATCH] fixup! " Josh Steadmon
2024-12-04 23:27 ` Jeff King
2024-12-04 23:54 ` Jeff King
2024-12-05 20:14 ` [PATCH] describe: drop early return for max_candidates == 0 Jeff King
2024-12-05 22:28 ` Josh Steadmon
2024-12-05 23:21 ` Jeff King
2024-12-06 3:01 ` Junio C Hamano
2024-12-06 3:28 ` Jeff King
2024-12-06 4:19 ` Junio C Hamano
2024-12-06 5:42 ` [PATCH] describe: split "found all tags" and max_candidates logic Jeff King
2024-11-26 5:05 ` [PATCH 0/4] perf improvements for git-describe with few tags Junio C Hamano
2024-12-04 23:04 ` Josh Steadmon
2024-12-04 23:20 ` Jeff King
2024-11-01 10:23 ` [PATCH] setlocalversion: Add workaround for "git describe" performance issue Rasmus Villemoes
2024-11-01 11:39 ` Jeff King
2024-10-31 11:43 ` Masahiro Yamada
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241031144351.GA1720940@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=benno@bmevers.de \
--cc=git@vger.kernel.org \
--cc=jpoimboe@kernel.org \
--cc=linux-kbuild@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=masahiroy@kernel.org \
--cc=ravi@prevas.dk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).