* git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-18  6:31 UTC
  To: Git Mailing List

Ok, I just added a feature to "git-rev-list" that I hereby nominate to be
the coolest feature ever, namely the "--bisect" flag, which basically
tries to split the list of revisions in two, and prints out just the "most
middle" commit.

The idea is that you want to do binary-search for a bug, but since the git
history isn't a nice linear thing, you don't know where the hell the
mid-point is between two commits, and even if you _do_ know where it is,
it gets even more complicated when you have 3 points (on different
branches) that are known good, and one point that is known bad.

Now, assuming you have three known-good points, and one known bad point,
the way to get all commits in between is

    git-rev-list bad ^good1 ^good2 ^good3

and with the new flag you can now trivially get an estimate for what the
mid-point is in this list. So you just do

    git-rev-list --bisect bad ^good1 ^good2 ^good3

and you now have the point to test. If testing shows that the mid-way
point was bad, you just continue with that as a "new bad":

    git-rev-list --bisect new-bad ^good1 ^good2 ^good3

but if it shows that that point was still good, you instead just add it to
the list of uninteresting commits, and do

    git-rev-list --bisect bad ^good1 ^good2 ^good3 ^new-good

instead. Every iteration you basically cut down the number of commits by
half - you're bisecting the set, and binary-searching for the first bad
commit.

Note that git also makes resetting to the test-point really trivial: you
just do

    git-read-tree -m -u <prev-point> <test-point>

where "prev-point" was your previous state (ie usually "HEAD" the first
time, but the previous test-point otherwise), and "test-point" is the new
point you want to test.

I haven't scripted this, but it should basically be pretty easy to add a
couple of simple scripts to make the binary search totally automated,
assuming just a simple "mark test as failed/good" interface. Even without
the scripting, it shouldn't be that hard to do entirely by hand. The idea
being that if you notice that something doesn't work (usually in HEAD),
but you know it used to work in the previous release, you just start the
whole thing going with

    # set up initial state
    cat .git/HEAD > .git/BAD
    cat .git/HEAD > .git/CURRENT
    echo "^v2.6.12" > .git/GOOD

and then you just do something like:

    # iterate
    git-rev-list --bisect BAD $(cat .git/GOOD) > .git/HALFWAY
    if [ ! -s .git/HALFWAY ]; then
        echo FOUND IT:
        cat .git/CURRENT
        exit 1
    fi
    git-read-tree -m -u CURRENT HALFWAY
    cat .git/HALFWAY > .git/CURRENT
    make ..

    # if good
    echo '^'$(cat .git/CURRENT) >> .git/GOOD
    .. iterate ..

    # if bad
    cat .git/CURRENT > .git/BAD
    .. iterate ..

until it says "FOUND IT". (The above is untested, but the --bisect
behaviour itself I tested, and it seems to do the right thing).

NOTE! I'm not convinced I actually find a particularly good place to split
at, but I think my stupid "halfway point" decision is good enough in
practice. So it might not really be a real binary search, but I think it
comes pretty close to it unless you have some really odd cases.

        Linus

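Purely as an illustration of the shape of that loop, here is the same
untested recipe sketched in Python, driving the 2005-era hyphenated
commands (git-rev-list, git-read-tree) through subprocess. The control
flow, including the stop-when-rev-list-prints-nothing check, is copied
from the shell sketch above rather than verified against git; the
run_test() hook standing in for "make and check whether the bug is still
there" is an invention of this sketch, not something from the mail.

import subprocess

def git(*args):
    """Run one of the old hyphenated git commands and return its stdout, stripped."""
    return subprocess.run(list(args), capture_output=True, text=True,
                          check=True).stdout.strip()

def bisect_by_hand(bad, goods, run_test):
    """bad: known-bad commit id.  goods: list of known-good revs, e.g. ["v2.6.12"].
    run_test(): rebuild/boot the checked-out tree and return True if it is good."""
    current = bad
    while True:
        # ask for the halfway point of "bad ^good1 ^good2 ..."
        halfway = git("git-rev-list", "--bisect", bad, *("^" + g for g in goods))
        if not halfway:                    # same stop condition as the shell sketch above
            print("FOUND IT:", current)
            return current
        # move the working tree from the previous test point to the new one
        git("git-read-tree", "-m", "-u", current, halfway)
        current = halfway
        if run_test():
            goods.append(current)          # "new good": exclude it and its ancestors
        else:
            bad = current                  # "new bad": it becomes the new top
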
* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19  0:18 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

On 6/18/05, Linus Torvalds <torvalds@osdl.org> wrote:
>
> Ok, I just added a feature to "git-rev-list" that I hereby nominate to be
> the coolest feature ever, namely the "--bisect" flag, which basically
> tries to split the list of revisions in two, and prints out just the "most
> middle" commit.
>

Perhaps in answering this question, you can help me understand the
intended behaviour of your bisection algorithm a little better. The
question is this: for your purposes, given a rev-list output, why not
simply use the literal middle element of the output list?

jon.

* Re: git-rev-list: "--bisect" flag
  From: Jan Harkes @ 2005-06-19  3:38 UTC
  To: Git Mailing List

On Sun, Jun 19, 2005 at 10:18:45AM +1000, Jon Seymour wrote:
> On 6/18/05, Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > Ok, I just added a feature to "git-rev-list" that I hereby nominate to be
> > the coolest feature ever, namely the "--bisect" flag, which basically
> > tries to split the list of revisions in two, and prints out just the "most
> > middle" commit.
> >
>
> Perhaps in answering this question, you can help me understand the
> intended behaviour of your bisection algorithm a little better. The
> question is this: for your purposes, given a rev-list output, why not
> simply use the literal middle element of the output list?

A was known good, parallel development for commits B and C, finally
merged into D. A bug got introduced in B, which we discover by the time
we have a checked out copy of D.

       .--- B ---.
      /           \
     A             D
      \           /
       `--- C ---'

git-rev-list E ^A shows this as BCD. Pick the halfway point C, and this
one checks out ok. So at this point we would decide that the bug got
introduced by the C->D change.

My guess is that Linus's bisect algorithm considers the branches, so
once C is cleared as good, we get a rev-list that looks like BD, and as
such can still find the exact bad commit B.

Jan

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19  4:07 UTC
  To: Git Mailing List

On 6/19/05, Jan Harkes <jaharkes@cs.cmu.edu> wrote:
> On Sun, Jun 19, 2005 at 10:18:45AM +1000, Jon Seymour wrote:
>
> A was known good, parallel development for commits B and C, finally
> merged into D. A bug got introduced in B, which we discover by the time
> we have a checked out copy of D.
>
>        .--- B ---.
>       /           \
>      A             D
>       \           /
>        `--- C ---'
>
> git-rev-list E ^A shows this as BCD. Pick the halfway point C, and this

I assume you mean git-rev-list D ^A

> one checks out ok. So at this point we would decide that the bug got
> introduced by the C->D change.

I think you misunderstood the intent of my original question, which was
implicitly referring to the implementation of the bisection algorithm,
not to the bug blatt algorithm that makes use of the bisection
algorithm. My actual question was: "Why is the bisection algorithm
itself the way it is?"

A simple bisection algorithm which just chooses the middle of the
git-rev-list output picks a point which, one could argue in principle,
is as good as the one chosen by the current bisection algorithm.
However, I must admit that I don't quite understand the subtleties of
Linus' bisection algorithm, hence the question - why is his bisection
algorithm better than simply choosing the literal middle?

I'll post a patch that I have already sent to Linus which demonstrates
an implementation of --bisect that chooses the literal middle. This
patch also includes a demonstration of Linus' binary bug blatting
technique (see t/t6001-rev-list-merge-order.sh).

jon.

* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19  3:43 UTC
  To: jon; +Cc: Git Mailing List

On Sun, 19 Jun 2005, Jon Seymour wrote:
>
> Perhaps in answering this question, you can help me understand the
> intended behaviour of your bisection algorithm a little better. The
> question is this: for your purposes, given a rev-list output, why not
> simply use the literal middle element of the output list?

The literal middle tends to be in the middle _timewise_, but that's
irrelevant. It might be just a single step away from one of the commits
that has already been marked "good", and as such, testing that one might
well be totally pointless: if you have a thousand commits (not at all
unlikely), testing whether they are good or not one at a time is just not
reasonable.

What we want to get is the commit that basically bisects the other commits
from a "reachability graph" angle. We want something that when selected as
the new "top" will have roughly half the commits in it.

Put another way: let's say that we have those thousand commits that _may_
have introduced a bug, but we don't know which one. What do we want? We
want to find a new head that contains roughly five hundred of the unknown
commits. No more, no less. Maybe we won't find a 500/500 split, but we
definitely want a 600/400 split over a 1/999 split, since the 1/999 split
isn't likely to get us any closer to the answer.

Now, I say "roughly", because there may not _be_ any commit that exactly
bisects the case. For a trivial case, let's say that we know that 'A' is
bad, and 'B' is good, but the graph in between them looks like this:

      A
     /|
    a |
    | b
    | |\
    c d e
     \|/
      B

so the bug might have been introduced by any of a-e, and we just don't
know which one was the cause.

Now, selecting either 'a' or 'b' as the new tip is a good choice: 'a'
reaches two unknowns (a+c) while 'b' reaches 3 unknowns (b+d+e), but the
others only reach a single unknown. And notice how the two good bisection
places were both _recent_. They were not in the middle "time-wise".

Also note that we do NOT want to reach the maximal amount of commits: we
want to reach _half_ the commits. If we select a commit that reaches a lot
of other unknown commits ('A' is the maximal here, of course), then we're
not actually bisecting the thing at all: we're just testing close to a
"known bad" case.

But let's say that 'c' didn't actually exist (or, alternatively, we have
tested that one and know that "B+c" is good, which basically makes 'c'
irrelevant for bug-hunting): in that case there is no good bisection,
since whichever of the remaining ones we choose (a, b, d or e) we'll
always have split it up so that we basically test the addition or
subtraction of just one single commit. There's not anything "--bisect" can
do about that, and it will just happen to pick one of them. And that's
what I mean by "there may not be any commit that is nicely in the middle".

        Linus

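The selection rule described here can be made concrete with a small
brute-force sketch. This is illustrative Python, not git's actual C
implementation: it enumerates the commits that the bad tip reaches but no
good point does (what a plain "git-rev-list bad ^good1 ^good2 ..." would
print) and picks the one whose own reachable subset is closest to half of
that set. The dictionary-of-parents representation and the function names
are inventions of this sketch.

def reachable(start, parents):
    """All commits reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return seen

def pick_bisect(bad, goods, parents):
    """Brute-force version of the rule the --bisect flag aims for."""
    pruned = set()
    for g in goods:                                   # everything any good point can reach
        pruned |= reachable(g, parents)
    candidates = reachable(bad, parents) - pruned     # the still-unknown commits
    half = len(candidates) / 2.0
    def distance_from_half(c):
        mine = reachable(c, parents) & candidates     # unknowns a test of c would decide
        return abs(len(mine) - half)
    return min(candidates, key=distance_from_half)    # the commit splitting the set most evenly

Note how the bad tip itself always scores worst here: it reaches every
unknown commit, which is exactly the "testing close to a known bad case"
situation described above.
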
* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19  5:03 UTC
  To: jon; +Cc: Git Mailing List

On Sat, 18 Jun 2005, Linus Torvalds wrote:
>
> Now, I say "roughly", because there may not _be_ any commit that exactly
> bisects the case. For a trivial case, let's say that we know that 'A' is
> bad, and 'B' is good, but the graph in between them looks like this:

Let's make a different case, just to make it even more obvious.

Let's say that the graph is

         A
        / \
      a1   b1
       |    |
      a2   b2
       |    |
      a3   b3
       |    |
      a4   b4
       |    |
      a5   b5
       |    |
      a6   b6
       |    |
      a7   b7
       |    |
      a8   b8
       |    |
      a9   b9
        \  /
         B

and we know that "A" is bad, and "B" is good, but we don't know which of
a1-a9/b1-b9 are buggy.

Where do we start testing?

We start testing at either 'a1' or 'b1', because those are the two values
that bisect the list either into the "a1-a9" series, or the "b1-b9"
series. Any other starting point would be a bug.

Or, let's say that the graph is

         A
        / \
       |   b1
       |    |
       |   b2
       |    |
       |   b3
       |    |
      a1   b4
       |    |
      a2   b5
       |    |
      a3   b6
       |    |
       |   b7
       |    |
       |   b8
       |    |
       |   b9
        \  /
         B

and in this case the right place to pick is 'b4', because that's the one
that reaches 6 commits (b4-b9), while it leaves 6 commits unreachable
(a1-a3, b1-b3): that's a perfect bisection, and now it's totally
unambiguous (in the previous case a1 and b1 were equally good choices, in
this case there is only one valid choice).

It gets more interesting when you have intermediate merges:

         A
        / \
       |   b1
       |    |
       |   b2
       |    |
       |   b3
       |    |
      a1   b4
       |    |
      a2   b5
       |  / |
      a3   b6
       |    |
       |   b7
       |    |
       |   b8
       |    |
       |   b9
        \  /
         B

The above graph _looks_ very similar, but now the right place to bisect is
'b5', because that reaches six commits (a3 + b5-b9), and again there are
six commits unreachable (a1-a2 + b1-b4). Again, this is unambiguous -
there is one clearly superior choice.

The current --bisect algorithm may not always pick that best choice: I
think I should subtract one from the total number of commits, since right
now it counts the "good" commit too, ie 'A'. But I think it's at most
off-by-one.

The bigger problem is that the algorithm is something like O(N^3) in cost
- albeit with a fairly cheap constant factor. In other words, it might not
be worthwhile bisecting three years' worth of development with it, and you
might be better off starting with a rougher half-way-point algorithm
("let's try some release a year and a half ago first").

The performance problem seems to really be pretty theoretical: I can
bisect the development from 2.6.12-rc2 to current head in 0.59 seconds, so
it's not like it's horribly slow for something like a couple of months'
worth of development.

        Linus

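As a sanity check on that last diagram, the counting idea from the earlier
sketch can be spelled out directly on this graph. Again this is only
illustrative Python; the commit names are the ones in the ASCII art, with
'A' the bad tip, 'B' the good base, and the twelve commits in between as
the unknowns.

# child -> parents, transcribed from the third diagram above
parents = {
    "A":  ["a1", "b1"],
    "a1": ["a2"], "a2": ["a3"], "a3": ["B"],
    "b1": ["b2"], "b2": ["b3"], "b3": ["b4"], "b4": ["b5"],
    "b5": ["a3", "b6"],                   # the intermediate merge
    "b6": ["b7"], "b7": ["b8"], "b8": ["b9"], "b9": ["B"],
}
unknown = set(parents) - {"A"}            # a1-a3 and b1-b9: twelve candidates

def reaches(c):
    """How many unknown commits c can reach, counting c itself."""
    seen, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x in seen or x not in unknown:
            continue
        seen.add(x)
        stack.extend(parents[x])
    return len(seen)

for c in sorted(unknown, key=lambda c: abs(reaches(c) - len(unknown) / 2)):
    print(c, reaches(c))                  # b5 comes out on top, reaching exactly 6 of 12

Running this ranks b5 first (it reaches a3 and b5-b9, six of the twelve
unknowns), with b4 next at seven, matching the analysis in the message.
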
* Re: git-rev-list: "--bisect" flag
  From: David Lang @ 2005-06-19  5:05 UTC
  To: Linus Torvalds; +Cc: jon, Git Mailing List

On Sat, 18 Jun 2005, Linus Torvalds wrote:

> The performance problem seems to really be pretty theoretical: I can
> bisect the development from 2.6.12-rc2 to current head in 0.59 seconds, so
> it's not like it's horribly slow for something like a couple of months'
> worth of development.

if it takes you about as long to type the command (and scan it to make
sure you didn't mistype it) as it does to execute you don't have a
performance problem :-)

David Lang

-- 
There are two ways of constructing a software design. One way is to make
it so simple that there are obviously no deficiencies. And the other way
is to make it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare

* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19  5:30 UTC
  To: David Lang; +Cc: jon, Git Mailing List

On Sat, 18 Jun 2005, David Lang wrote:
>
> if it takes you about as long to type the command (and scan it to make
> sure you didn't mistype it) as it does to execute you don't have a
> performance problem :-)

Yeah, well, in all honesty, I haven't actually verified that my bisection
gives the right answer. I verified "visually" that it does something
half-way sane with this:

    git-rev-list --bisect HEAD ^v2.6.12-rc2 > .git/refs/tags/bisect-1
    git-rev-list --bisect HEAD ^v2.6.12-rc2 ^bisect-1 > .git/refs/tags/bisect-2
    git-rev-list --bisect HEAD ^v2.6.12-rc2 ^bisect-1 ^bisect-2 > .git/refs/tags/bisect-3
    git-rev-list --bisect HEAD ^v2.6.12-rc2 ^bisect-1 ^bisect-2 ^bisect-3 > .git/refs/tags/bisect-4
    git-rev-list --bisect HEAD ^v2.6.12-rc2 ^bisect-1 ^bisect-2 ^bisect-3 ^bisect-4 > .git/refs/tags/bisect-5
    gitk

and it looked sane (ie "bisect-1" should show up as a tag about half-way,
and the others should get progressively closer to the HEAD).

But when I now tried it again (I did the above originally with plain
2.6.12), I get errors from "gitk":

    ERROR: none of the pending commits can be done yet:
           bd6ae2f6d61da0f90c6b66e9a4ab6c53ef8c159a
           2512809255d018744fe6c2f5e996c83769846c07
           88d7bd8cb9eb8d64bf7997600b0d64f7834047c5
           b3214970abbe983cd89842ae24ea00e21bba79f6

which looks like the current kernel overflows gitk in some way (there's
suddenly a lot more parallelism, since there were a number of merges that
were pending, waiting for the 2.6.12 release).

        Linus

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19 10:15 UTC
  To: Git Mailing List

FWIW, I accept that Linus' bisection algorithm does have better worst
case performance (in terms of number of build/test iterations) on at
least some examples than the literal middle bisection algorithm, and
that the "literal middle" bisection algorithm has degenerate cases which
mean that its performance is sometimes worse than O(log2 N).

I don't actually have a good feel for what the actual bound on Linus'
algorithm is, but I'll measure it on the 2.6 kernel and post the
results.

Regards,

jon.

On 6/19/05, Linus Torvalds <torvalds@osdl.org> wrote:
>
> On Sat, 18 Jun 2005, Linus Torvalds wrote:
> >
> > Now, I say "roughly", because there may not _be_ any commit that exactly
> > bisects the case. For a trivial case, let's say that we know that 'A' is
> > bad, and 'B' is good, but the graph in between them looks like this:
>
> Let's make a different case, just to make it even more obvious.
>
> Let's say that the graph is
>
>          A
>         / \
>       a1   b1
>        |    |
>       a2   b2
>        |    |
>       a3   b3
>        |    |
>       a4   b4
>        |    |
>       a5   b5
>        |    |
>       a6   b6
>        |    |
>       a7   b7
>        |    |
>       a8   b8
>        |    |
>       a9   b9
>         \  /
>          B
>
> and we know that "A" is bad, and "B" is good, but we don't know which of
> a1-a9/b1-b9 are buggy.
>
> Where do we start testing?
>
> We start testing at either 'a1' or 'b1', because those are the two values
> that bisect the list either into the "a1-a9" series, or the "b1-b9"
> series. Any other starting point would be a bug.
>
> Or, let's say that the graph is
>
>          A
>         / \
>        |   b1
>        |    |
>        |   b2
>        |    |
>        |   b3
>        |    |
>       a1   b4
>        |    |
>       a2   b5
>        |    |
>       a3   b6
>        |    |
>        |   b7
>        |    |
>        |   b8
>        |    |
>        |   b9
>         \  /
>          B
>
> and in this case the right place to pick is 'b4', because that's the one
> that reaches 6 commits (b4-b9), while it leaves 6 commits unreachable
> (a1-a3, b1-b3): that's a perfect bisection, and now it's totally
> unambiguous (in the previous case a1 and b1 were equally good choices, in
> this case there is only one valid choice).
>
> It gets more interesting when you have intermediate merges:
>
>          A
>         / \
>        |   b1
>        |    |
>        |   b2
>        |    |
>        |   b3
>        |    |
>       a1   b4
>        |    |
>       a2   b5
>        |  / |
>       a3   b6
>        |    |
>        |   b7
>        |    |
>        |   b8
>        |    |
>        |   b9
>         \  /
>          B
>
> The above graph _looks_ very similar, but now the right place to bisect is
> 'b5', because that reaches six commits (a3 + b5-b9), and again there are
> six commits unreachable (a1-a2 + b1-b4). Again, this is unambiguous -
> there is one clearly superior choice.
>
> The current --bisect algorithm may not always pick that best choice: I
> think I should subtract one from the total number of commits, since right
> now it counts the "good" commit too, ie 'A'. But I think it's at most
> off-by-one.
>
> The bigger problem is that the algorithm is something like O(N^3) in cost
> - albeit with a fairly cheap constant factor. In other words, it might not
> be worthwhile bisecting three years' worth of development with it, and you
> might be better off starting with a rougher half-way-point algorithm
> ("let's try some release a year and a half ago first").
>
> The performance problem seems to really be pretty theoretical: I can
> bisect the development from 2.6.12-rc2 to current head in 0.59 seconds, so
> it's not like it's horribly slow for something like a couple of months'
> worth of development.
>
>         Linus
>

-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19 14:41 UTC
  To: Git Mailing List, Linus Torvalds

Linus,

Would I be correct in stating that an intuitive reason why your
algorithm is better than selecting the linear middle is the following:

If you concentrate on testing merges, rather than non-merges, the
chances are you are going to eliminate N-times as many possible good
commits as if you pick a random commit, where N is the average fan-out
of the commit graph.

With the linear middle algorithm, it doesn't really matter if you pick a
child of a merge - that is almost as good as picking the merge itself.
But if you happen to pick a parent of a merge, then if it is good and
the parent is also good, you have wasted the opportunity to clean up the
peer branches of the parent you chose.

So the heuristic should be:

  - focus on branches until there are no branches left, then keep
    bisecting the list that remains.

Building on this insight, though, I'd like to point out that if you want
to eliminate things on a large scale, the things to focus on first are
epoch boundaries. Epoch boundaries are defined in such a way that if an
epoch boundary is good then everything reachable from an epoch boundary
is also good [ assuming exactly one fault ].

If you happen to choose a child of an epoch boundary, that's ok on one
level, but on another level it's not ok. In particular, you haven't
eliminated everything beyond the epoch boundary, so the open list will
be longer than it strictly needs to be.

So, I propose a modification to my "linear middle" bisection algorithm
as follows: if there is more than one epoch boundary in the interesting
graph, choose the literal middle epoch boundary; otherwise, if there is
more than one merge in the interesting graph, choose the literal middle
merge; otherwise, if there are no merges in the interesting graph,
choose the literal middle of the remaining list.

I am going to build a version of my linear middle code that demonstrates
this algorithm and then test it against the Linux 2.6 kernel.

FWIW: my measurements of your algorithm thus far show that if the bug
exists in the first 1070 of the 2119 commits between HEAD and
v2.6.12-rc2 it consistently (very consistently) takes between 11 and 13
iterations of git-bug-blatt-script to find the bug.

Specifically: average (12.10), median (12), stdev(0.412), max(13),
min(11).

This compares with my naive literal middle algorithm (measurements only
for the first 619 commits):

    average(21), median(16), stdev(10.6), max(49), min(8).

The number of commits is 2118, the number with a fanout > 1 is 173. The
average fanout of those with fanout is 2.

My average is currently 2 times worse than yours and my worst case is
4-5 times worse than yours. So, you will be pleased to know that your
intuition about the correctness of your own algorithm has been
objectively verified.

My intuition at this point is that my revised algorithm won't
significantly differ from your algorithm. I am thinking it will require
~3 epoch boundary tests to identify the relevant epoch, 5 merge tests to
identify the relevant segment and 4 bisections of a linear segment to
identify the correct merge, which ends up being 12 iterations of the
bug-blatt algorithm - which really is no different to yours.

I suspect my algorithm might be better for really, really big
repositories with long histories, but I judge that the chance an
interesting bug will be deep in the long history is somewhat remote.
Both algorithms will benefit, however, if the intuition that most bugs
are recent is correct.

git-bug-blatt-script could be modified to test the first epoch boundary
before recursively bisecting things. This will make the typical bug
search ~9 iterations long (since the average epoch appears to be less
than 512 commits and more than 256 commits big).

jon.

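A rough sketch of the tiered rule proposed in the message above, written
in Python purely for illustration. It works on a rev-list-style ordered
list plus a child-to-parents map; the test for whether a commit is an
epoch boundary is deliberately left as a caller-supplied predicate, since
its precise definition comes from Jon's merge-order work rather than from
this thread, and the proposal does not say what to do when exactly one
merge remains, so this sketch simply falls through to the linear middle.

def literal_middle(seq):
    return seq[len(seq) // 2]

def pick_test_point(rev_list, parents, is_epoch_boundary):
    """rev_list: the interesting commits, newest first, as git-rev-list prints them.
    parents: child -> [parent, ...].  is_epoch_boundary: caller-supplied predicate."""
    boundaries = [c for c in rev_list if is_epoch_boundary(c)]
    if len(boundaries) > 1:
        return literal_middle(boundaries)       # tier 1: the literal middle epoch boundary
    merges = [c for c in rev_list if len(parents.get(c, [])) > 1]
    if len(merges) > 1:
        return literal_middle(merges)           # tier 2: the literal middle merge
    # single-merge and no-merge cases both fall through to the plain rule
    return literal_middle(rev_list)             # tier 3: the literal middle of the list
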
* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19 17:20 UTC
  To: jon; +Cc: Git Mailing List

On Mon, 20 Jun 2005, Jon Seymour wrote:
>
> Would I be correct in stating that an intuitive reason why your
> algorithm is better than selecting the linear middle is the following:
>
> If you concentrate on testing merges, rather than non-merges, the
> chances are you are going to eliminate N-times as many possible good
> commits as if you pick a random commit, where N is the average fan-out
> of the commit graph.

No. You really shouldn't concentrate on merges either.

The thing is, you do _not_ want to test as many commits as possible, or as
few commits as possible.

This is _not_ a "try to eliminate as many commits as possible" problem.
It's really one of trying to eliminate _half_ the commits. Not more, not
less.

So to some degree you want to _avoid_ merges, because they pick up a lot
of commits, and that might take you over the limit. But at the same time
you need to be very aware of merges, since if you ignore them, you'll pick
up too _few_ commits.

So in a very real sense, whether you pick a merge or not doesn't depend on
whether that one entry is a merge itself - it really depends on the whole
flow. You can't do any local measurements.

The reason my algorithm is so horrid (O(3)) is that you can't even cache
the dang thing sanely: you can't optimize it by remembering how many
interesting commits are reachable from one commit, and then doing "this
commit reaches commits X and Y, so I can reach X.count + Y.count commits
from it". That's just not true, because often (basically always) there are
lots of common commits in X.count and Y.count..

So that's why I do that strange full reachability thing for _each_ commit,
and then clear the reachability flags and start with the next commit. It's
horrid, but I can't see a sane way to avoid it (I could do the clearing
less often by introducing the notion of "generations" in the reachability,
but I didn't want to add a new counter to the data structures - and it
wouldn't really change anything fundamental).

In your measurements:

> FWIW: my measurements of your algorithm thus far show that if the bug
> exists in the first 1070 of the 2119 commits between HEAD and
> v2.6.12-rc2 it consistently (very consistently) takes between 11 and 13
> iterations of git-bug-blatt-script to find the bug.
>
> Specifically: average (12.10), median (12), stdev(0.412), max(13),
> min(11).

Yes. Consistency is the name of the game. A low standard deviation is what
it's all about, because that's how binary searches work.

The point about binary searching for a bug is that you're not really
looking for "the one" commit, you're really looking for the _range_ of two
adjacent commits: it doesn't even help if you happen to pick the buggy
commit on the first try, because the only thing that matters is when you
have zeroed in on the "buggy and previous" one.

In other words, in a traditional search, when you pick an entry, you know
that you got it right: you might be lucky and pick it first, and you'll be
happy. But in the "search for a bug" case, you always have to go the
_full_ "log2()" thing, because you will always have to not just pick the
right entry, you will also have had to "bisect to zero" so that you know
that the entry before it was not buggy.

This is because the "is it buggy" question is not a unique thing, it's
really just a partial ordering in the set (if you're really unlucky, you
might have two bugs that interact, and you'll not actually find "the
commit" at all, but the one you do end up zeroing in on should at least be
_one_ border condition for the bug, so it should help you somewhat
regardless).

So you basically cannot avoid doing "ceil(log2(N))" tests, and with 2119
commits, you should pretty much _always_ get 12, and basically a standard
deviation of zero. The reason you don't get that is that _occasionally_
you can't get close enough to "half-way", and depending on whether the bug
was in the smaller set or bigger set, you might be lucky or unlucky.

The problem here is that by definition you'll be unlucky more of the time:
the bigger set (the unlucky case) is _bigger_, so it's more likely to have
the bug, so what you should see statistically is that you can't actually
do _better_ than ceil(log2(N)).

("cannot do better" obviously is true only in the statistical sense. You
can always be lucky in any individual test, and if you are intelligent and
choose the commits to test based on the symptoms of the bug you can always
do better, of course).

NOTE NOTE NOTE! The above is all based on a random distribution of bugs,
where all commits count equally, which is obviously not even true. You
could try to do a "weighted bisection", where you weigh commits
differently: you might say, for example, that if the author field matches
the string "torvalds", then the likelihood of a bug is obviously
minuscule, so such a commit only counts as 0.1.

Or you could argue that a pure merge seldom introduces a bug (which is
probably not true, but hey, this is theory), and you could decide that
when you "count commits", then normal commits count as two, and merges
count as one, and the bisection is trying to get equal "weights" on both
sides, not "equal number of commits".

However, that's where "consistency" has another advantage: maybe it's
possible to get better results on average by taking statistical bug
distribution behaviour into account (like "bugs are likely new - try to
weigh the bisection 60% reachable / 40% unreachable instead of 50-50"),
but that means that then occasionally you do worse, and from a
psychological angle I think that's unacceptable. I think most bug hunters
would much prefer to know that "I need to reboot 7 times and I'll get it",
over "I probably need to reboot only 5 times, but it might be 10 if I'm
unlucky".

> This compares with my naive literal middle algorithm (measurements
> only for the first 619 commits):
>
> average(21), median(16), stdev(10.6), max(49), min(8).

Yes. I really don't believe you can do better than 12, unless you start
depending on knowledge of the distribution of bugs.

        Linus

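The "weighted bisection" idea is easy to express as a variation on the
brute-force counting sketch given after the earlier message: instead of
counting commits, sum per-commit weights and aim for half the total
weight. This is only an illustration of the idea floated above - git did
not implement it - and the example weighting (merges count one, ordinary
commits two) is simply the one from the paragraph above.

def reachable(start, parents):
    """All commits reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return seen

def pick_weighted(bad, goods, parents, weight):
    """Like the plain half-the-commits pick, but balancing summed weights."""
    pruned = set()
    for g in goods:                              # everything any good point can reach
        pruned |= reachable(g, parents)
    candidates = reachable(bad, parents) - pruned
    total = sum(weight(c) for c in candidates)
    def imbalance(c):
        mine = reachable(c, parents) & candidates
        return abs(sum(weight(x) for x in mine) - total / 2.0)
    return min(candidates, key=imbalance)

# Example weighting from the text: a merge counts 1, an ordinary commit counts 2.
def merge_light_weight(parents):
    return lambda c: 1 if len(parents.get(c, [])) > 1 else 2

A call would look like pick_weighted(bad, goods, parents,
merge_light_weight(parents)); with a constant weight it reduces to the
unweighted rule.
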
* Re: git-rev-list: "--bisect" flag
  From: Ingo Molnar @ 2005-06-19 17:36 UTC
  To: Linus Torvalds; +Cc: jon, Git Mailing List

* Linus Torvalds <torvalds@osdl.org> wrote:

> NOTE NOTE NOTE! The above is all based on a random distribution of bugs,
> where all commits count equally, which is obviously not even true. You
> could try to do a "weighted bisection", where you weigh commits
> differently: you might say, for example, that if the author field
> matches the string "torvalds", then the likelihood of a bug is
> obviously minuscule, so such a commit only counts as 0.1.

another assumption is that the number of test systems is a power of two
minus 1. With 2 or more test systems (and automated testing) you could
dissect the search space into 3, 5 or more roughly equal pieces in the
first step (2, 4, 8 ... sections are already supported via the bisect
flag).

To decrease the time needed to find a bug it makes sense to increase the
number of test systems, especially if it takes minutes to boot - or if
it takes minutes (or hours) to reproduce a bug.

If each box runs a separate kernel then statistically, if one of them
triggers the bug, only half of them have to be rebooted with new
kernels; the others would still be kept running in a "commit space of
interest". But i guess it's not a big degradation to just round the test
method to the nearest power-of-2 bisection method.

    Ingo

* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19 19:01 UTC
  To: Ingo Molnar; +Cc: jon, Git Mailing List

On Sun, 19 Jun 2005, Ingo Molnar wrote:
>
> another assumption is that the number of test systems is a power of two
> minus 1. With 2 or more test systems (and automated testing) you could
> dissect the search space into 3, 5 or more roughly equal pieces in the
> first step (2, 4, 8 ... sections are already supported via the bisect
> flag).

Yeah. I don't think it matters much, though, since the "more test systems"
thing will only really end up helping by a fairly small amount, at the
cost of much more complexity.

The difference between "log2(N)" and "log5(N)" isn't _that_ big, and it's
even smaller when you just do the power-of-two thing and use 4 systems
instead of 5 (ie now it's "log4(N)" vs "log5(N)").

Also, to use multiple test systems efficiently, they all end up having to
be synchronized (ie the next iteration needs to get the result from all
the test systems from the previous one), so it's quite a lot of bother for
not a lot of improvement. It also assumes you can partition the problem
space well into <n> roughly equal pieces - which may or may not be true.

More importantly, quite often the nasty problems are the ones that only
happen for one machine. Trying to make it easy for non-developers to test
different kernels is what this is about: I started thinking about this
when we had the "x86-64 SIGSEGV's for me" issue, where bisection was
fairly easy in the -mm series due to a nice linear series, but even then
you had to have tools to apply/unapply different patches etc.

        Linus

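The arithmetic behind "the difference isn't that big" is easy to check
with the thread's own figure of 2119 commits; this is just a worked
illustration, not anything taken from git:

import math

N = 2119                                  # the 2.6.12-rc2..HEAD commit count used earlier
for ways in (2, 4, 5):                    # 2-way (plain bisect), 4-way and 5-way splits per round
    print(ways, math.ceil(math.log(N, ways)))
# prints 2 12, 4 6, 5 5: going from 4-way to 5-way splits saves at most one round here
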
* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19 19:55 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

On 6/20/05, Linus Torvalds <torvalds@osdl.org> wrote:
>
> On Mon, 20 Jun 2005, Jon Seymour wrote:
> >
> > Would I be correct in stating that an intuitive reason why your
> > algorithm is better than selecting the linear middle is the following:
> >
> > If you concentrate on testing merges, rather than non-merges, the
> > chances are you are going to eliminate N-times as many possible good
> > commits as if you pick a random commit, where N is the average fan-out
> > of the commit graph.
>
> No. You really shouldn't concentrate on merges either.
>
> The thing is, you do _not_ want to test as many commits as possible, or as
> few commits as possible.
>
> This is _not_ a "try to eliminate as many commits as possible" problem.
> It's really one of trying to eliminate _half_ the commits. Not more, not
> less.
>

Yep, ok, so the node you are looking for is one that can reach as close
to half of the rest of the graph as possible - that's what you mean by
half-way reachability. If the graph was N nodes deep (no fan-out) that
would be the literal middle. If it was N nodes wide (e.g. fanout N),
there is no good node and you basically have to test everything since
one test doesn't imply anything about the other N-1 nodes.

A typical commit graph is worse than O(log2 N) by a factor that is
determined by some measure of the parallel branching in the graph.

> The reason my algorithm is so horrid (O(3)) is that you can't even cache
> the dang thing sanely: you can't optimize it by remembering how many
> interesting commits are reachable from one commit, and then doing "this
> commit reaches commits X and Y, so I can reach X.count + Y.count commits
> from it". That's just not true, because often (basically always) there are
> lots of common commits in X.count and Y.count..

I presume you mean O(N^3)?

Will you accept a patch that can reduce the worst case cost to some
lower order? I have a hunch (conceivably wrong, of course) that it can
be done in O(N).

> > This compares with my naive literal middle algorithm (measurements
> > only for the first 619 commits):
> >
> > average(21), median(16), stdev(10.6), max(49), min(8).
>
> Yes. I really don't believe you can do better than 12, unless you start
> depending on knowledge of the distribution of bugs.
>

Agreed.

jon.

* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-20  3:09 UTC
  To: jon; +Cc: Git Mailing List

On Mon, 20 Jun 2005, Jon Seymour wrote:
>
> Yep, ok, so the node you are looking for is one that can reach as close
> to half of the rest of the graph as possible - that's what you mean by
> half-way reachability. If the graph was N nodes deep (no fan-out) that
> would be the literal middle. If it was N nodes wide (e.g. fanout N),
> there is no good node and you basically have to test everything since
> one test doesn't imply anything about the other N-1 nodes.

Exactly.

> A typical commit graph is worse than O(log2 N) by a factor that is
> determined by some measure of the parallel branching in the graph.

Yes. In many cases it's ok to have parallelism, and given the fact that
normal merges will always be two-way (ie there's never a commit in
practice that has a very high fan-in), I do believe that you normally get
pretty close to that O(log2 N) behaviour. In fact, your numbers seemed to
say that we're normally within 10% of it.

> I presume you mean O(N^3)?

Yup again.

> Will you accept a patch that can reduce the worst case cost to some
> lower order? I have a hunch (conceivably wrong, of course) that it can
> be done in O(N).

Well, it depends on how nasty it ends up being. I thought my stupid
algorithm would suck horribly, but considering that it can trivially
bisect all of the git history so far in basically zero time, it doesn't
seem to be that bad. You're more likely to have problems with having to
bring in all the commits from disk in the cold-cache case than you are to
have problems with the current algorithm performance.

Even at O(N^3) it shouldn't be too nasty even if some crazy person tried
to bisect a year's worth of development (which really isn't very
reasonable behaviour anyway, it's just not sensible to not try a few
standard releases in between first).

        Linus

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-20  3:27 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

> Well, it depends on how nasty it ends up being. I thought my stupid
> algorithm would suck horribly, but considering that it can trivially
> bisect all of the git history so far in basically zero time, it doesn't
> seem to be that bad. You're more likely to have problems with having to
> bring in all the commits from disk in the cold-cache case than you are to
> have problems with the current algorithm performance.
>

I'll give you a sketch of the algorithm.

First, recall that the merge-order code uses an incremental topological
sort whose key idea is to employ a conservation of mass analogy. A
similar analogy can help here.

1. count all visible nodes [ i.e. nodes that git-rev-list would print ],
   call this value N
2. at the top node inject N units of mass
3. traverse the visible graph, in topological order
4. at each node, send all the mass received from parents minus 1 unit
   onto visible parents. Record how much mass you have sent downstream.
   Keep a record of the nodes that have seen nearest to half of that
   mass.
5. when the traversal completes, choose the node that saw closest to 1/2
   of the original mass [ or pick one at random if there is more than
   one ]. It must be able to reach close to 1/2 of the visible graph,
   'cos all that mass it has seen has to drain somewhere!

This algorithm is demonstrably O(V+E) because the traversal is O(V+E).

My intention is to add this algorithm to my refactored
epoch.c/traversal.c, since this is a much better framework for doing
traversals than the current epoch.c.

Do you have any objections to me doing that?

jon.

-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-20  3:30 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

> 1. count all visible nodes [ i.e. nodes that git-rev-list would print ],
>    call this value N
> 2. at the top node inject N units of mass
> 3. traverse the visible graph, in topological order
> 4. at each node, send all the mass received from parents minus 1 unit
>    onto visible parents. Record how much mass you have sent downstream.
>    Keep a record of the nodes that have seen nearest to half of that
>    mass.

Correction - at each node, send all mass received from _children_

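Here is one possible reading of that "conservation of mass" heuristic as
Python, with the correction above applied: each node receives mass from
its children, keeps one unit for itself, and passes the rest on to its
still-visible parents. The mails do not say how outgoing mass is divided
among several parents, so splitting it equally is an assumption of this
sketch, and the function and variable names are inventions of the sketch,
not Jon's actual patch.

from collections import deque

def mass_flow_pick(parents, tip):
    """parents: child -> [parent, ...], restricted to the 'visible' commits
    (the ones git-rev-list would print).  tip: the known-bad head where the
    mass is injected.  Returns the commit whose inflow was closest to N/2."""
    visible = set(parents)
    n = len(visible)

    children_left = {c: 0 for c in visible}        # visible children not yet processed
    for c in visible:
        for p in parents[c]:
            if p in visible:
                children_left[p] += 1

    inflow = {c: 0.0 for c in visible}
    inflow[tip] = float(n)                         # step 2: inject N units at the top
    ready = deque(c for c in visible if children_left[c] == 0)   # heads first
    while ready:
        c = ready.popleft()
        out = [p for p in parents[c] if p in visible]
        surplus = max(inflow[c] - 1.0, 0.0)        # every node keeps 1 unit for itself
        for p in out:
            inflow[p] += surplus / len(out)        # assumption: split equally among parents
            children_left[p] -= 1
            if children_left[p] == 0:
                ready.append(p)

    return min(visible, key=lambda c: abs(inflow[c] - n / 2.0))

On the first example graph earlier in the thread (A on top of the
parallel a1-a9 and b1-b9 chains, 19 visible commits) this picks a1 or b1,
matching Linus's analysis; the follow-up messages below describe where
the approximation goes wrong once pruning by known-good commits causes
"backflow".
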
* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-20 10:29 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

On 6/20/05, Jon Seymour <jon.seymour@gmail.com> wrote:
> > 1. count all visible nodes [ i.e. nodes that git-rev-list would print ],
> >    call this value N
> > 2. at the top node inject N units of mass
> > 3. traverse the visible graph, in topological order
> > 4. at each node, send all the mass received from parents minus 1 unit
> >    onto visible parents. Record how much mass you have sent downstream.
> >    Keep a record of the nodes that have seen nearest to half of that
> >    mass.
>
> Correction - at each node, send all mass received from _children_
>

This is a little too simple. It would work if all the nodes in the
visible graph were connected to the base of an epoch. However, the
pruning behaviour of known goods effectively installs blockages into the
network and causes "backflow", which complicates things somewhat - a
plumber's nightmare!

I am exploring another variation of the idea which is even simpler.
Experimentally it seems quite similar to your average case performance,
but doesn't yet approach your low std deviations. However, I think I
know why that is, so I will, after a few games of pool and a similar
number of beers, have another crack at it.

jon.

-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-20 14:37 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

> I am exploring another variation of the idea which is even simpler.
> Experimentally it seems quite similar to your average case
> performance, but doesn't yet approach your low std deviations.
> However, I think I know why that is, so I will, after a few games of
> pool and a similar number of beers, have another crack at it.
>

Ok, just for information, I have a linear bisection algorithm which
results in a bug-blatt algorithm with the following characteristics on
the Linux 2.6 kernel:

    AVERAGE 13.64589235
    MEDIAN  13
    MAX     28
    MIN     9
    STDEV   2.171923221

These results were derived by running the algorithm over the full 2118
commits of the current Linux 2.6 source. I'll document what this
algorithm is and what its flaws are.

The algorithm is as follows:

Consider the graph as a closed network of pipes with N reservoirs of 1
unit of volume. Rotate this graph so the head is at the top. Now, pour N
units of liquid into the top of the network of pipes. The liquid
percolates through the network of pipes and ends up in the reservoirs.
The selected bisection point is the node that saw as close to N/2 units
of liquid as any other node.

Now, why isn't this as good (in terms of average and stdev) as Linus'
algorithm? The flaw with this algorithm is that some liquid from peer
nodes reaches the lower half of the network before the liquid that went
through the selected node. As a result, the selected node is elevated
"above" the ideal bisection point, by a factor that I believe would be
sufficient to explain the higher average and also the greater standard
deviation.

Still, my algorithm is very simple and very linear, so it's a very good
start. As it happens, I believe I have (re-?)discovered a precise way to
find the bisection points in O(n). I am in the midst of implementing
that now.

jon.

Thread overview: 19+ messages

  2005-06-18  6:31 git-rev-list: "--bisect" flag  Linus Torvalds
  2005-06-19  0:18 ` Jon Seymour
  2005-06-19  3:38   ` Jan Harkes
  2005-06-19  4:07     ` Jon Seymour
  2005-06-19  3:43   ` Linus Torvalds
  2005-06-19  5:03     ` Linus Torvalds
  2005-06-19  5:05       ` David Lang
  2005-06-19  5:30         ` Linus Torvalds
  2005-06-19 10:15       ` Jon Seymour
  2005-06-19 14:41         ` Jon Seymour
  2005-06-19 17:20           ` Linus Torvalds
  2005-06-19 17:36             ` Ingo Molnar
  2005-06-19 19:01               ` Linus Torvalds
  2005-06-19 19:55             ` Jon Seymour
  2005-06-20  3:09               ` Linus Torvalds
  2005-06-20  3:27                 ` Jon Seymour
  2005-06-20  3:30                   ` Jon Seymour
  2005-06-20 10:29                     ` Jon Seymour
  2005-06-20 14:37                       ` Jon Seymour