git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2.1 resend] rev-list: estimate number of bisection step left
@ 2009-02-21  8:26 Christian Couder
  2009-02-21  9:27 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Christian Couder @ 2009-02-21  8:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Ingo Molnar, John Tapsell, Johannes Schindelin, Thomas Rast

This patch teaches "git rev-list --bisect-vars" to output an estimate
of the number of bisection step left _after the current one_ along with
the other variables it already outputs.

This patch also makes "git-bisect.sh" display this number of steps left
_after the current one_, along with the estimate of the number of
revisions left to test (after the current one).

Here is a table to help analyse what should be the best estimate for
the number of bisect steps left.

N : linear case                    --> probabilities --> best
-------------------------------------------------------------
1 : G-B                            --> 0             --> 0
2 : G-U1-B                         --> 0             --> 0
3 : G-U1-U2-B                      --> 0(1/3) 1(2/3) --> 1
4 : G-U1-U2-U3-B                   --> 1             --> 1
5 : G-U1-U2-U3-U4-B                --> 1(3/5) 2(2/5) --> 1
6 : G-U1-U2-U3-U4-U5-B             --> 1(2/6) 2(4/6) --> 2
7 : G-U1-U2-U3-U4-U5-U6-B          --> 1(1/7) 2(6/7) --> 2
8 : G-U1-U2-U3-U4-U5-U6-U7-B       --> 2             --> 2
9 : G-U1-U2-U3-U4-U5-U6-U7-U8-B    --> 2(7/9) 3(2/9) --> 2
10: G-U1-U2-U3-U4-U5-U6-U7-U8-U9-B --> 2(6/10)3(4/10)--> 2

In the column "N", there is the number of revisions that could _now_
be the first bad commit we are looking for.

The "linear case" column describes the linear history corresponding to
the number in column N. G means good, B means bad, and Ux means
unknown. Note that the first bad revision we are looking for can be
any Ux or B.

In the "probabilities" column, there are the different outcomes in
number of steps with the odds of each outcome in parenthesis
corresponding to the linear case.

The "best" column gives the most accurate estimate among the different
outcomes in the "probabilities" column.

We have the following:

best(2^n) == n - 1

and for any x between 0 included and 2^n excluded, the probability for
n - 1 steps left looks like:

P(2^n + x) == (2^n - x) / (2^n + x)

and P(2^n + x) < 0.5 means 2^n < 3x

So the algorithm used in this patch calculates 2^n and x, and then
choose between returning n - 1 and n.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin-rev-list.c |   45 +++++++++++++++++++++++++++++++++++++++++++--
 git-bisect.sh      |    2 +-
 2 files changed, 44 insertions(+), 3 deletions(-)

	There should be only a spurious space removed compared to version v2.

diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 436afa4..40d5fcb 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -574,6 +574,45 @@ static struct commit_list *find_bisection(struct commit_list *list,
 	return best;
 }
 
+static inline int log2i(int n)
+{
+	int log2 = 0;
+
+	for (; n > 1; n >>= 1)
+		log2++;
+
+	return log2;
+}
+
+static inline int exp2i(int n)
+{
+	return 1 << n;
+}
+
+/*
+ * Estimate the number of bisect steps left (after the current step)
+ *
+ * For any x between 0 included and 2^n excluded, the probability for
+ * n - 1 steps left looks like:
+ *
+ * P(2^n + x) == (2^n - x) / (2^n + x)
+ *
+ * and P(2^n + x) < 0.5 means 2^n < 3x
+ */
+static int estimate_bisect_steps(int all)
+{
+	int n, x, e;
+
+	if (all < 3)
+		return 0;
+
+	n = log2i(all);
+	e = exp2i(n);
+	x = all - e;
+
+	return (e < 3 * x) ? n : n - 1;
+}
+
 int cmd_rev_list(int argc, const char **argv, const char *prefix)
 {
 	struct commit_list *list;
@@ -688,12 +727,14 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			       "bisect_nr=%d\n"
 			       "bisect_good=%d\n"
 			       "bisect_bad=%d\n"
-			       "bisect_all=%d\n",
+			       "bisect_all=%d\n"
+			       "bisect_steps=%d\n",
 			       hex,
 			       cnt - 1,
 			       all - reaches - 1,
 			       reaches - 1,
-			       all);
+			       all,
+			       estimate_bisect_steps(all));
 			return 0;
 		}
 	}
diff --git a/git-bisect.sh b/git-bisect.sh
index 85db4ba..9ea1cdc 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -500,7 +500,7 @@ bisect_next() {
 	# commit is also a "skip" commit (see above).
 	exit_if_skipped_commits "$bisect_rev"
 
-	bisect_checkout "$bisect_rev" "$bisect_nr revisions left to test after this"
+	bisect_checkout "$bisect_rev" "$bisect_nr revisions left to test after this (roughly $bisect_steps steps)"
 }
 
 bisect_visualize() {
-- 
1.6.2.rc1.22.g3290.dirty

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2.1 resend] rev-list: estimate number of bisection step left
  2009-02-21  8:26 [PATCH v2.1 resend] rev-list: estimate number of bisection step left Christian Couder
@ 2009-02-21  9:27 ` Junio C Hamano
  2009-02-21 17:12   ` Christian Couder
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2009-02-21  9:27 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Ingo Molnar, John Tapsell, Johannes Schindelin, Thomas Rast

Christian Couder <chriscool@tuxfamily.org> writes:

> Here is a table to help analyse what should be the best estimate for
> the number of bisect steps left.
> ...
> So the algorithm used in this patch calculates 2^n and x, and then
> choose between returning n - 1 and n.

Two dumb questions about the math (because I suck at math).

 - Do you mean by "best estimate" the best case scenario, or something
   else (perhaps "the most accurate")?

 - Is computing based on linear history a good enough approximation?

I am not advocating to make things more precise nor saying your math is
flawed.  I'd prefer simpler code for things like this --- after all, my
understanding of this whole exercise is to give "this is more or less the
number you should expect" ballpack figure and nothing more (correct me if
that is not what you are aiming for).

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2.1 resend] rev-list: estimate number of bisection step left
  2009-02-21  9:27 ` Junio C Hamano
@ 2009-02-21 17:12   ` Christian Couder
  0 siblings, 0 replies; 3+ messages in thread
From: Christian Couder @ 2009-02-21 17:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Ingo Molnar, John Tapsell, Johannes Schindelin, Thomas Rast

Le samedi 21 février 2009, Junio C Hamano a écrit :
> Christian Couder <chriscool@tuxfamily.org> writes:
> > Here is a table to help analyse what should be the best estimate for
> > the number of bisect steps left.
> > ...
> > So the algorithm used in this patch calculates 2^n and x, and then
> > choose between returning n - 1 and n.
>
> Two dumb questions about the math (because I suck at math).
>
>  - Do you mean by "best estimate" the best case scenario, or something
>    else (perhaps "the most accurate")?

I mean the number of steps that has the highest frequency of occurrence, if 
we suppose that:

1) we are in the linear case, and that
2) the first bad revision has an equal chance of being any of the unknown 
(Ux) and the bad (B) revisions.

>  - Is computing based on linear history a good enough approximation?

I think so. Before I sent patch v1, I tested it on some non linear cases and 
it worked quite well in those cases as well. I think the bisect algorithm 
tries as best as possible to remove half the revisions from the set of the 
possible first bad revisions and this means that all the algorithms based 
on log2(all) should be quite good in any case.

> I am not advocating to make things more precise nor saying your math is
> flawed.  I'd prefer simpler code for things like this --- after all, my
> understanding of this whole exercise is to give "this is more or less the
> number you should expect" ballpack figure and nothing more (correct me if
> that is not what you are aiming for).

Yes, that's what I am aiming for. And I thought that my first patch was 
quite good at doing that, so when you said that "at the very low end the 
estimate is a bit too optimistic", I had another look and found out that 
for "all == 3" indeed the algorithm gave 0 when 1 was better, because 1 
should happen on average 2 times out of 3.

So I came up with the algorithm in v2 which is better because it should 
always give the best estimate (supposing 1) and 2)). It's also not much 
more complex compared to v1 or anything based on log2(all), so I don't see 
any reason to use something else.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2009-02-21 17:15 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-21  8:26 [PATCH v2.1 resend] rev-list: estimate number of bisection step left Christian Couder
2009-02-21  9:27 ` Junio C Hamano
2009-02-21 17:12   ` Christian Couder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).