git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Askar Safin <safinaskar@zohomail.com>,
	 "D. Ben Knoble" <ben.knoble@gmail.com>,
	 git <git@vger.kernel.org>
Subject: Re: [bug] "git bisect old v3.0" takes 21 mins on Linux repo
Date: Fri, 17 Jan 2025 08:50:57 -0800	[thread overview]
Message-ID: <xmqqv7udmlji.fsf@gitster.g> (raw)
In-Reply-To: <20250117131404.GD2893666@coredump.intra.peff.net> (Jeff King's message of "Fri, 17 Jan 2025 08:14:04 -0500")

Jeff King <peff@peff.net> writes:

> Both of those are trading a bit of accuracy in finding the exact
> midpoint in the early steps. It's perhaps another possible option for
> git-bisect itself: if we see a very large number of commits, we could
> try to approximate rather than finding the exact answer.

Another thing the user may (but "bisect" itself cannot) try is to
use a path-limited bisection (that is, if you know your breakage is
inside one subsytem, you only check commits that touch the area).

> In most
> histories I'd expect that taking the midpoint of a linearized topo-order
> would get you a pretty reasonable outcome. E.g.:
>
>   total=$(git rev-list --count v3.0..v6.13-rc7)
>   git rev-list --topo-order v3.0..v6.13-rc7 |
>   tail -n +$((total / 2)) | head -n 1
>
> runs in about 2s on my machine. The commit it finds, ed194d136769,
> is pretty close to the middle:
>
>   $ git rev-list --count v3.0..ed194d136769
>   526863
>   $ git rev-list --count ed194d136769..v6.13-rc7
>   543312

Interesting thought.

When I did the "single strand of pearls" optimization, I recall I
punted and said "we need to count the weight for all merges the
honest way".

One thing we may want to try is *not* to do the count_distance() for
all merges.  For example, if we have 1000 commits in the range,
first you pick a merge M among them and count how many commits in
the range it can reach.  Let's say it reachs 400 commits.

We are trying to find a commit that can reach as close to 500
commits, and we know any ancestor of M would reach fewer than 400
commits, so we know the score they will get would be worse than M
without running count_distance() on them.  We should be able to
exploit this to optimize, shouldn't we?  In order to count the
number for M, count_distance() must have traversed all the ancestor
commits of M before coming up with its answer, so by the time we see
M's score (1000/2 - 400) and realize that it is the best one we have
seen so far that we aim to improve, we know that the score for all
the commits we have seen during that traversal cannot be better than
M's, no?

If M can reach 700 commits (instead of 400), the argument goes the
other way---anything that can reach M can reach even more, so they
cannot be any closer to the middle than M.  Knowing what can reach
M, however, needs something like reachability bitmap, though.


      reply	other threads:[~2025-01-17 16:51 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-13 22:11 [bug] "git bisect old v3.0" takes 21 mins on Linux repo Askar Safin
2025-01-13 23:16 ` Askar Safin
2025-01-14 22:21 ` D. Ben Knoble
2025-01-16 10:52   ` Jeff King
2025-01-16 12:53     ` Jeff King
2025-01-16 13:52       ` Jeff King
2025-01-16 16:40         ` Junio C Hamano
2025-01-17  5:31           ` Askar Safin
2025-01-17 13:14             ` Jeff King
2025-01-17 16:50               ` Junio C Hamano [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqv7udmlji.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=ben.knoble@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=safinaskar@zohomail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).