git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Couder <chriscool@tuxfamily.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH] rev-list: estimate number of bisection step left
Date: Thu, 19 Feb 2009 06:16:18 +0100	[thread overview]
Message-ID: <200902190616.18747.chriscool@tuxfamily.org> (raw)
In-Reply-To: <7vljs58qul.fsf@gitster.siamese.dyndns.org>

Le mardi 17 février 2009, Junio C Hamano a écrit :
> Christian Couder <chriscool@tuxfamily.org> writes:
> > +static int estimate_bisect_steps(int all)
> > +{
> > +	int log2 = 0;
> > +	int left = (all >> 1) - 1;
> > +
> > +	if (left <= 0)
> > +		return 0;
> > +
> > +	do {
> > +		left = left >> 1;
> > +		log2++;
> > +	} while (left);
> > +
> > +	return log2;
> > +}
> > ...
> > diff --git a/git-bisect.sh b/git-bisect.sh
> > index 85db4ba..6b23439 100755
> > --- a/git-bisect.sh
> > +++ b/git-bisect.sh
> > @@ -500,7 +500,7 @@ bisect_next() {
> >  	# commit is also a "skip" commit (see above).
> >  	exit_if_skipped_commits "$bisect_rev"
> >
> > -	bisect_checkout "$bisect_rev" "$bisect_nr revisions left to test
> > after this" +	bisect_checkout "$bisect_rev" "$bisect_nr revisions left
> > to test after this (roughtly $bisect_steps steps)"
>
> "roughly".

Yes, thanks.

> all	left
> 0	0
> 1	0
> 2	0
> 3	0
> 4	1
> 5	1
> 6	2
> 7	2
> 8	2
> 9	2
>
> It seems that at the very low end the estimate is a bit too optimistic.
> How about showing this number from the Porcelain only when $bisect_steps
> is more than 2 (or all is more than 9)?

I think it's more consistent to always show it.

Now for the algorithm, first please note that we are looking for an estimate 
of the number of bisect steps left _after the current one_, and that git 
bisect currently only displays an estimate of the number of revisions left 
to test _after the current one_.

Here is a table to help analyse what should be the best estimate for
the number of bisect steps left.

N : linear case                    --> probabilities --> best | v1
------------------------------------------------------------------
1 : G-B                            --> 0             --> 0    | 0
2 : G-U1-B                         --> 0             --> 0    | 0
3 : G-U1-U2-B                      --> 0(1/3) 1(2/3) --> 1    | 0
4 : G-U1-U2-U3-B                   --> 1             --> 1    | 1
5 : G-U1-U2-U3-U4-B                --> 1(3/5) 2(2/5) --> 1    | 1
6 : G-U1-U2-U3-U4-U5-B             --> 1(2/6) 2(4/6) --> 2    | 2
7 : G-U1-U2-U3-U4-U5-U6-B          --> 1(1/7) 2(6/7) --> 2    | 2
8 : G-U1-U2-U3-U4-U5-U6-U7-B       --> 2             --> 2    | 2
9 : G-U1-U2-U3-U4-U5-U6-U7-U8-B    --> 2(7/9) 3(2/9) --> 2    | 2
10: G-U1-U2-U3-U4-U5-U6-U7-U8-U9-B --> 2(6/10)3(4/10)--> 2    | 3

In the column "N", there is the number of revisions that could _now_
be the first bad commit we are looking for.

The "linear case" column describes the linear history corresponding to
the number in column N. G means good, B means bad, and Ux means
unknown. Note that the first bad revision we are looking for can be
any Ux or B.

In the "probabilities" column, there are the different outcomes in
number of steps with the odds of each outcome in parenthesis
corresponding to the linear case.

The "best" column gives the most accurate estimate among the different
outcomes in the "probabilities" column.

The "v1" column gives the estimates according my first patch.

Now looking at the table, we have the following:

best(2^n) == n - 1

and for any x between 0 included and 2^n excluded, the probability for
n - 1 steps left looks like:

P(2^n + x) == (2^n - x) / (2^n + x) 

If P(2^n + x) < 0.5 we should return n and otherwise n - 1.

But P(2^n + x) < 0.5 means:

2 * (2^n - x) < (2^n + x)

that is: 2^n < 3x

So the improved algorithm could be something like:

static int estimate_bisect_steps(int all)
{
	int n, x, e;
	float p;

	if (all < 3)
		return 0;

	n = log2(all);
	e = exp2(n);
	x = all - e;

	return (e < 3 * x) ? n : n - 1 ;
}

But on Linux, log2 and exp2 are defined in "math.h" and available with:

_XOPEN_SOURCE >= 600 || _ISOC99_SOURCE; or cc -std=c99

and we must link with -lm, but I don't know about the other platforms.

So I don't know what to do about them. Please advise.

Thanks in advance,
Christian.

  reply	other threads:[~2009-02-19  5:18 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-17  5:09 [PATCH] rev-list: estimate number of bisection step left Christian Couder
2009-02-17  7:28 ` Junio C Hamano
2009-02-19  5:16   ` Christian Couder [this message]
2009-02-19  5:26     ` Christian Couder
2009-02-19  5:32     ` Christian Couder
2009-02-19  6:02       ` John Tapsell
2009-02-19  6:49         ` Christian Couder
2009-02-17 14:44 ` Johannes Schindelin
2009-02-17 15:11   ` John Tapsell
2009-02-17 15:31     ` Johannes Schindelin
2009-02-17 15:36       ` John Tapsell
2009-02-17 15:39         ` Johannes Schindelin
2009-02-17 15:39       ` Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200902190616.18747.chriscool@tuxfamily.org \
    --to=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mingo@elte.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).