* [bug] "git bisect old v3.0" takes 21 mins on Linux repo
@ 2025-01-13 22:11 Askar Safin
2025-01-13 23:16 ` Askar Safin
2025-01-14 22:21 ` D. Ben Knoble
0 siblings, 2 replies; 10+ messages in thread
From: Askar Safin @ 2025-01-13 22:11 UTC (permalink / raw)
To: git
Hi. This is bug report. "git bisect" is unacceptable slow on Linux repo.
Steps to reproduce:
===
d-user@comp:/tmp/t$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Cloning into 'linux'...
remote: Enumerating objects: 13079335, done.
remote: Counting objects: 100% (153/153), done.
remote: Compressing objects: 100% (108/108), done.
remote: Total 13079335 (delta 84), reused 70 (delta 45), pack-reused 13079182
Receiving objects: 100% (13079335/13079335), 5.18 GiB | 13.72 MiB/s, done.
Resolving deltas: 100% (10454171/10454171), done.
Updating files: 100% (87234/87234), done.
d-user@comp:/tmp/t$ cd linux
d-user@comp:/tmp/t/linux$ git bisect start
status: waiting for both good and bad commits
d-user@comp:/tmp/t/linux$ git bisect new v6.13-rc7
status: waiting for good commit(s), bad commit known
d-user@comp:/tmp/t/linux$ time -p git bisect old v3.0
Bisecting: 535608 revisions left to test after this (roughly 19 steps)
[62606c224d72a98c35d21a849f95cccf95b0a252] Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
real 1293.32
user 1291.70
sys 1.41
===
1293.32 s (21 mins) is unacceptably slow. During "git bisect" execution process "git bisect--helper" occupies 100 % of CPU in "htop" output. (This means that "git bisect--helper" is not parallel program, overwise it would occupy significantly more than 100 %).
So, please, make "git bisect" faster. (Maybe it makes sence to make it parallel?)
My OS is Debian 12 Bookworm. Output of "uname -a" is "Linux comp 6.1.0-28-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.119-1 (2024-11-22) x86_64 GNU/Linux".
My git version is 2.39.5.
The above test was performed on tmpfs on real hardware without any kind of virtualization.
I will try to perform the same test with latest git version and will report my findings in the next mail (hopefully today).
--
Askar Safin
https://types.pl/@safinaskar
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
2025-01-13 22:11 [bug] "git bisect old v3.0" takes 21 mins on Linux repo Askar Safin
@ 2025-01-13 23:16 ` Askar Safin
2025-01-14 22:21 ` D. Ben Knoble
1 sibling, 0 replies; 10+ messages in thread
From: Askar Safin @ 2025-01-13 23:16 UTC (permalink / raw)
To: safinaskar; +Cc: git
The same thing happens with git 2.47.1.
It seems "git bisect" didn't change much from 2.47.1, so I think
there is no sense to test more versions (but I can if you ask)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
2025-01-13 22:11 [bug] "git bisect old v3.0" takes 21 mins on Linux repo Askar Safin
2025-01-13 23:16 ` Askar Safin
@ 2025-01-14 22:21 ` D. Ben Knoble
2025-01-16 10:52 ` Jeff King
1 sibling, 1 reply; 10+ messages in thread
From: D. Ben Knoble @ 2025-01-14 22:21 UTC (permalink / raw)
To: Askar Safin; +Cc: git
On Mon, Jan 13, 2025 at 5:11 PM Askar Safin <safinaskar@zohomail.com> wrote:
>
> Hi. This is bug report. "git bisect" is unacceptable slow on Linux repo.
That may be a bit inflammatory ;)
>
> Steps to reproduce:
>
> ===
> d-user@comp:/tmp/t$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> Cloning into 'linux'...
> remote: Enumerating objects: 13079335, done.
> remote: Counting objects: 100% (153/153), done.
> remote: Compressing objects: 100% (108/108), done.
> remote: Total 13079335 (delta 84), reused 70 (delta 45), pack-reused 13079182
> Receiving objects: 100% (13079335/13079335), 5.18 GiB | 13.72 MiB/s, done.
> Resolving deltas: 100% (10454171/10454171), done.
> Updating files: 100% (87234/87234), done.
> d-user@comp:/tmp/t$ cd linux
> d-user@comp:/tmp/t/linux$ git bisect start
> status: waiting for both good and bad commits
> d-user@comp:/tmp/t/linux$ git bisect new v6.13-rc7
> status: waiting for good commit(s), bad commit known
> d-user@comp:/tmp/t/linux$ time -p git bisect old v3.0
> Bisecting: 535608 revisions left to test after this (roughly 19 steps)
> [62606c224d72a98c35d21a849f95cccf95b0a252] Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> real 1293.32
> user 1291.70
> sys 1.41
> ===
FWIW:
$ time git rev-list --count v3.0...v6.13-rc7
1070175
git rev-list --count v3.0...v6.13-rc7 13,57s user 1,41s system 96%
cpu 15,466 total
That's a large number of revisions to bisect. Further,
# --force needed because my filesystem is case-insensitive :eyeroll:
$ time git checkout [--force] 62606c224d72a98c35d21a849f95cccf95b0a252
git checkout --force 62606c224d72a98c35d21a849f95cccf95b0a252 7,94s
user 18,54s system 96% cpu 27,360 total
Using pathspecs or a smaller commit range should help speed up the
start. (On a recent git, the helper is gone, so I'm not sure where the
time is spent—but I do notice that `git bisect start v6.13-rc7 v3.0`
is slow enough that I've killed it rather than wait.)
--
D. Ben Knoble
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
2025-01-14 22:21 ` D. Ben Knoble
@ 2025-01-16 10:52 ` Jeff King
2025-01-16 12:53 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2025-01-16 10:52 UTC (permalink / raw)
To: D. Ben Knoble; +Cc: Askar Safin, git
On Tue, Jan 14, 2025 at 05:21:20PM -0500, D. Ben Knoble wrote:
> FWIW:
>
> $ time git rev-list --count v3.0...v6.13-rc7
> 1070175
> git rev-list --count v3.0...v6.13-rc7 13,57s user 1,41s system 96%
> cpu 15,466 total
>
> That's a large number of revisions to bisect. Further,
Yeah. I'm not very familiar with the bisect code, but it looks like it's
quadratic. In do_find_bisection(), we have a big list of commits, and we
iterate like this:
for (p = list; p; p = p->next) {
if (p->item->object.flags & UNINTERESTING)
continue;
if (weight(p) != -2)
continue;
if (bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY)
BUG("shouldn't be calling count-distance in fp mode");
weight_set(p, count_distance(p));
clear_distance(list);
/* Does it happen to be at half-way? */
if (!(bisect_flags & FIND_BISECTION_ALL) &&
approx_halfway(p, nr))
return p;
counted++;
}
That clear_distance() call likewise iterates through the list to clear
the COUNTED flags from each. I guess we might be able to traverse down
from the tip of the commit we're operating on, clearing flags there.
Since that's how the flags are set in count_distance().
I suspect it's still quadratic, though, because count_distance() is
traversing separately for each (and in the worst case everything is
reachable from it). But it might still improve things in practice.
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
2025-01-16 10:52 ` Jeff King
@ 2025-01-16 12:53 ` Jeff King
2025-01-16 13:52 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2025-01-16 12:53 UTC (permalink / raw)
To: D. Ben Knoble; +Cc: Askar Safin, git
On Thu, Jan 16, 2025 at 05:52:46AM -0500, Jeff King wrote:
> That clear_distance() call likewise iterates through the list to clear
> the COUNTED flags from each. I guess we might be able to traverse down
> from the tip of the commit we're operating on, clearing flags there.
> Since that's how the flags are set in count_distance().
>
> I suspect it's still quadratic, though, because count_distance() is
> traversing separately for each (and in the worst case everything is
> reachable from it). But it might still improve things in practice.
Apparently I'm good at suspecting. Here's a patch to make
clear_distance() walk the same commits as count_distance(), including
the string-of-pearls recursion avoidance:
diff --git a/bisect.c b/bisect.c
index 1a9069c9ad..ecf656316b 100644
--- a/bisect.c
+++ b/bisect.c
@@ -69,12 +69,28 @@ static int count_distance(struct commit_list *entry)
return nr;
}
-static void clear_distance(struct commit_list *list)
+static void clear_distance(struct commit_list *entry)
{
- while (list) {
- struct commit *commit = list->item;
+ while (entry) {
+ struct commit *commit = entry->item;
+ struct commit_list *p;
+
+ if (commit->object.flags & UNINTERESTING)
+ break;
+ if (!(commit->object.flags & COUNTED))
+ break;
+
commit->object.flags &= ~COUNTED;
- list = list->next;
+
+ p = commit->parents;
+ entry = p;
+ if (p) {
+ p = p->next;
+ while (p) {
+ clear_distance(p);
+ p = p->next;
+ }
+ }
}
}
@@ -338,7 +354,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
if (bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY)
BUG("shouldn't be calling count-distance in fp mode");
weight_set(p, count_distance(p));
- clear_distance(list);
+ clear_distance(p);
/* Does it happen to be at half-way? */
if (!(bisect_flags & FIND_BISECTION_ALL) &&
It cuts Askar's case on my machine from 16m51s to 9m34s. So a big
improvement but still...not great.
I suspect that the whole bisection count algorithm needs to be rewritten
to all run in a single traversal. I guess if you iterate over the
commits in reverse-topo order, you should be able to just compute each
distance as "d(commit) = 1; d(commit) += d(p) for parents(commit)". But
it's not a problem I've thought a lot about, so I'm probably missing
some subtlety.
At any rate, an easier way to time this is:
git rev-list --bisect v3.0..v6.13-rc7
which is the expensive part of what git-bisect is doing under the hood.
-Peff
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
2025-01-16 12:53 ` Jeff King
@ 2025-01-16 13:52 ` Jeff King
2025-01-16 16:40 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2025-01-16 13:52 UTC (permalink / raw)
To: D. Ben Knoble; +Cc: Askar Safin, git
On Thu, Jan 16, 2025 at 07:53:13AM -0500, Jeff King wrote:
> I suspect that the whole bisection count algorithm needs to be rewritten
> to all run in a single traversal. I guess if you iterate over the
> commits in reverse-topo order, you should be able to just compute each
> distance as "d(commit) = 1; d(commit) += d(p) for parents(commit)". But
> it's not a problem I've thought a lot about, so I'm probably missing
> some subtlety.
Oh nevermind, that won't work, as it double-counts commits that are
reachable from each parent. Still, it feels like there ought to be a way
to compute it with a single traversal.
I think this is similar to the reachability bitmap computation, which
computes a bitmap for each commit (and then the weight of each commit is
the number of set bits, but we've removed the duplicates). We do that in
a single traversal these days, but it's pretty complex and heavyweight.
So I think there's room for improvement here, but it sounds non-trivial.
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
2025-01-16 13:52 ` Jeff King
@ 2025-01-16 16:40 ` Junio C Hamano
2025-01-17 5:31 ` Askar Safin
0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2025-01-16 16:40 UTC (permalink / raw)
To: Jeff King; +Cc: D. Ben Knoble, Askar Safin, git
Jeff King <peff@peff.net> writes:
> Oh nevermind, that won't work, as it double-counts commits that are
> reachable from each parent. Still, it feels like there ought to be a way
> to compute it with a single traversal.
It is somewhat embarrassing and amusing at the same time that the
"This is a truly stupid algorithm, but it's only used for bisection,
and we just don't care enough." comment Linus wrote long time ago is
still with us. Many men tried, they did not die but failed.
> I think this is similar to the reachability bitmap computation, which
> computes a bitmap for each commit (and then the weight of each commit is
> the number of set bits, but we've removed the duplicates). We do that in
> a single traversal these days, but it's pretty complex and heavyweight.
The original algorithm was done way before any of the recent
auxiliary data structures were invented. There may be approaches we
haven't tried and can now try to improve bisection, expoliting newer
things like reachability bitmaps.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
2025-01-16 16:40 ` Junio C Hamano
@ 2025-01-17 5:31 ` Askar Safin
2025-01-17 13:14 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Askar Safin @ 2025-01-17 5:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, D. Ben Knoble, git
I think "git bisect" is very important part of git.
Linux's "submitting-patches" contains this text:
> When dividing your change into a series of patches, take special care to ensure that the kernel builds and runs properly after each patch in the series. Developers using git bisect to track down a problem can end up splitting your patch series at any point; they will not thank you if you introduce bugs in the middle.
( https://docs.kernel.org/process/submitting-patches.html )
So, as you can see, existance of "git bisect" is rationale for contributor guidelines!
Contributor rules written the way they are because of "git bisect"!!!
--
Askar Safin
https://types.pl/@safinaskar
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
2025-01-17 5:31 ` Askar Safin
@ 2025-01-17 13:14 ` Jeff King
2025-01-17 16:50 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2025-01-17 13:14 UTC (permalink / raw)
To: Askar Safin; +Cc: Junio C Hamano, D. Ben Knoble, git
On Fri, Jan 17, 2025 at 09:31:56AM +0400, Askar Safin wrote:
> I think "git bisect" is very important part of git.
Me too. But that doesn't make it any easier to figure out a more
optimized algorithm. ;)
In the meantime, here are some other options:
1. You can manually pick a commit that is around the midpoint of
history and try it. That will quickly reduce the search space to
something more manageable. E.g., maybe try v4.0 and v5.0 and use
those as your initial good/bad starting points (depending on the
result). Those might not be the exact halfway point, but it's good
enough to get started.
2. In a branchy history like linux.git, you can make the problem space
much smaller by looking only at the history along the first parent.
E.g.:
git bisect start --first-parent
git bisect good v3.0
git bisect bad v6.13-rc7
That runs in about 7 seconds for me. It will probably give you a
merge commit rather than the exact culprit along the second-parent
history. But with that merge commit, you can start a new, much
smaller bisection with it as the "bad" and its first-parent as the
"good".
Both of those are trading a bit of accuracy in finding the exact
midpoint in the early steps. It's perhaps another possible option for
git-bisect itself: if we see a very large number of commits, we could
try to approximate rather than finding the exact answer. In most
histories I'd expect that taking the midpoint of a linearized topo-order
would get you a pretty reasonable outcome. E.g.:
total=$(git rev-list --count v3.0..v6.13-rc7)
git rev-list --topo-order v3.0..v6.13-rc7 |
tail -n +$((total / 2)) | head -n 1
runs in about 2s on my machine. The commit it finds, ed194d136769,
is pretty close to the middle:
$ git rev-list --count v3.0..ed194d136769
526863
$ git rev-list --count ed194d136769..v6.13-rc7
543312
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
2025-01-17 13:14 ` Jeff King
@ 2025-01-17 16:50 ` Junio C Hamano
0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2025-01-17 16:50 UTC (permalink / raw)
To: Jeff King; +Cc: Askar Safin, D. Ben Knoble, git
Jeff King <peff@peff.net> writes:
> Both of those are trading a bit of accuracy in finding the exact
> midpoint in the early steps. It's perhaps another possible option for
> git-bisect itself: if we see a very large number of commits, we could
> try to approximate rather than finding the exact answer.
Another thing the user may (but "bisect" itself cannot) try is to
use a path-limited bisection (that is, if you know your breakage is
inside one subsytem, you only check commits that touch the area).
> In most
> histories I'd expect that taking the midpoint of a linearized topo-order
> would get you a pretty reasonable outcome. E.g.:
>
> total=$(git rev-list --count v3.0..v6.13-rc7)
> git rev-list --topo-order v3.0..v6.13-rc7 |
> tail -n +$((total / 2)) | head -n 1
>
> runs in about 2s on my machine. The commit it finds, ed194d136769,
> is pretty close to the middle:
>
> $ git rev-list --count v3.0..ed194d136769
> 526863
> $ git rev-list --count ed194d136769..v6.13-rc7
> 543312
Interesting thought.
When I did the "single strand of pearls" optimization, I recall I
punted and said "we need to count the weight for all merges the
honest way".
One thing we may want to try is *not* to do the count_distance() for
all merges. For example, if we have 1000 commits in the range,
first you pick a merge M among them and count how many commits in
the range it can reach. Let's say it reachs 400 commits.
We are trying to find a commit that can reach as close to 500
commits, and we know any ancestor of M would reach fewer than 400
commits, so we know the score they will get would be worse than M
without running count_distance() on them. We should be able to
exploit this to optimize, shouldn't we? In order to count the
number for M, count_distance() must have traversed all the ancestor
commits of M before coming up with its answer, so by the time we see
M's score (1000/2 - 400) and realize that it is the best one we have
seen so far that we aim to improve, we know that the score for all
the commits we have seen during that traversal cannot be better than
M's, no?
If M can reach 700 commits (instead of 400), the argument goes the
other way---anything that can reach M can reach even more, so they
cannot be any closer to the middle than M. Knowing what can reach
M, however, needs something like reachability bitmap, though.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-01-17 16:51 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-13 22:11 [bug] "git bisect old v3.0" takes 21 mins on Linux repo Askar Safin
2025-01-13 23:16 ` Askar Safin
2025-01-14 22:21 ` D. Ben Knoble
2025-01-16 10:52 ` Jeff King
2025-01-16 12:53 ` Jeff King
2025-01-16 13:52 ` Jeff King
2025-01-16 16:40 ` Junio C Hamano
2025-01-17 5:31 ` Askar Safin
2025-01-17 13:14 ` Jeff King
2025-01-17 16:50 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).