* git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-18  6:31 UTC
  To: Git Mailing List

Ok, I just added a feature to "git-rev-list" that I hereby nominate to be
the coolest feature ever, namely the "--bisect" flag, which basically
tries to split the list of revisions in two, and prints out just the "most
middle" commit.

The idea is that you want to do binary-search for a bug, but since the git
history isn't a nice linear thing, you don't know where the hell the
mid-point is between two commits, and even if you _do_ know where it is,
it gets even more complicated when you have 3 points (on different
branches) that are known good, and one point that is known bad.

Now, assuming you have three known-good points, and one known bad point,
the way to get all commits in between is

    git-rev-list bad ^good1 ^good2 ^good3

and with the new flag you can now trivially get an estimate for what the
mid-point is in this list. So you just do

    git-rev-list --bisect bad ^good1 ^good2 ^good3

and you now have the point to test. If testing shows that the mid-way
point was bad, you just continue with that as a "new bad":

    git-rev-list --bisect new-bad ^good1 ^good2 ^good3

but if it shows that that point was still good, you instead just add it to
the list of uninteresting commits, and do

    git-rev-list --bisect bad ^good1 ^good2 ^good3 ^new-good

instead. Every iteration you basically cut down the number of commits by
half - you're bisecting the set, and binary-searching for the first bad
commit.

Note that git also makes resetting to the test-point really trivial: you
just do

    git-read-tree -m -u <prev-point> <test-point>

where "prev-point" was your previous state (ie usually "HEAD" the first
time, but the previous test-point otherwise), and "test-point" is the new
point you want to test.

I haven't scripted this, but it should basically be pretty easy to add a
couple of simple scripts to make the binary search totally automated,
assuming just a simple "mark test as failed/good" interface. Even without
the scripting, it shouldn't be that hard to do entirely by hand. The idea
being that if you notice that something doesn't work (usually in HEAD),
but you know it used to work in the previous release, you just start the
whole thing going with

    # set up initial state
    cat .git/HEAD > .git/BAD
    cat .git/HEAD > .git/CURRENT
    echo "^v2.6.12" > .git/GOOD

and then you just do something like:

    # iterate
    git-rev-list --bisect BAD $(cat .git/GOOD) > .git/HALFWAY
    if [ ! -s .git/HALFWAY ]; then
        echo FOUND IT:
        cat .git/CURRENT
        exit 1
    fi
    git-read-tree -m -u CURRENT HALFWAY
    cat .git/HALFWAY > .git/CURRENT
    make ..

    # if good
    echo '^'$(cat .git/CURRENT) >> .git/GOOD
    .. iterate ..

    # if bad
    cat .git/CURRENT > .git/BAD
    .. iterate ..

until it says "FOUND IT". (The above is untested, but the --bisect
behaviour itself I tested, and it seems to do the right thing).

NOTE! I'm not convinced I actually find a particularly good place to split
at, but I think my stupid "halfway point" decision is good enough in
practice. So it might not really be a real binary search, but I think it
comes pretty close to it unless you have some really odd cases.

        Linus

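Purely as an illustration of the shape of that loop, here is the same
untested recipe sketched in Python, driving the 2005-era hyphenated
commands (git-rev-list, git-read-tree) through subprocess. The control
flow, including the stop-when-rev-list-prints-nothing check, is copied
from the shell sketch above rather than verified against git; the
run_test() hook standing in for "make and check whether the bug is still
there" is an invention of this sketch, not something from the mail.

import subprocess

def git(*args):
    """Run one of the old hyphenated git commands and return its stdout, stripped."""
    return subprocess.run(list(args), capture_output=True, text=True,
                          check=True).stdout.strip()

def bisect_by_hand(bad, goods, run_test):
    """bad: known-bad commit id.  goods: list of known-good revs, e.g. ["v2.6.12"].
    run_test(): rebuild/boot the checked-out tree and return True if it is good."""
    current = bad
    while True:
        # ask for the halfway point of "bad ^good1 ^good2 ..."
        halfway = git("git-rev-list", "--bisect", bad, *("^" + g for g in goods))
        if not halfway:                    # same stop condition as the shell sketch above
            print("FOUND IT:", current)
            return current
        # move the working tree from the previous test point to the new one
        git("git-read-tree", "-m", "-u", current, halfway)
        current = halfway
        if run_test():
            goods.append(current)          # "new good": exclude it and its ancestors
        else:
            bad = current                  # "new bad": it becomes the new top
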
* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19  0:18 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

On 6/18/05, Linus Torvalds <torvalds@osdl.org> wrote:
>
> Ok, I just added a feature to "git-rev-list" that I hereby nominate to be
> the coolest feature ever, namely the "--bisect" flag, which basically
> tries to split the list of revisions in two, and prints out just the "most
> middle" commit.
>

Perhaps in answering this question, you can help me understand the
intended behaviour of your bisection algorithm a little better. The
question is this: for your purposes, given a rev-list output, why not
simply use the literal middle element of the output list?

jon.

* Re: git-rev-list: "--bisect" flag
  From: Jan Harkes @ 2005-06-19  3:38 UTC
  To: Git Mailing List

On Sun, Jun 19, 2005 at 10:18:45AM +1000, Jon Seymour wrote:
> On 6/18/05, Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > Ok, I just added a feature to "git-rev-list" that I hereby nominate to be
> > the coolest feature ever, namely the "--bisect" flag, which basically
> > tries to split the list of revisions in two, and prints out just the "most
> > middle" commit.
> >
>
> Perhaps in answering this question, you can help me understand the
> intended behaviour of your bisection algorithm a little better. The
> question is this: for your purposes, given a rev-list output, why not
> simply use the literal middle element of the output list?

A was known good, parallel development for commits B and C, finally
merged into D. A bug got introduced in B, which we discover by the time
we have a checked out copy of D.

       .--- B ---.
      /           \
     A             D
      \           /
       `--- C ---'

git-rev-list E ^A shows this as BCD. Pick the halfway point C, and this
one checks out ok. So at this point we would decide that the bug got
introduced by the C->D change.

My guess is that Linus's bisect algorithm considers the branches, so
once C is cleared as good, we get a rev-list that looks like BD, and as
such can still find the exact bad commit B.

Jan

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19  4:07 UTC
  To: Git Mailing List

On 6/19/05, Jan Harkes <jaharkes@cs.cmu.edu> wrote:
> On Sun, Jun 19, 2005 at 10:18:45AM +1000, Jon Seymour wrote:
>
> A was known good, parallel development for commits B and C, finally
> merged into D. A bug got introduced in B, which we discover by the time
> we have a checked out copy of D.
>
>        .--- B ---.
>       /           \
>      A             D
>       \           /
>        `--- C ---'
>
> git-rev-list E ^A shows this as BCD. Pick the halfway point C, and this

I assume you mean git-rev-list D ^A

> one checks out ok. So at this point we would decide that the bug got
> introduced by the C->D change.

I think you misunderstood the intent of my original question, which was
implicitly referring to the implementation of the bisection algorithm,
not to the bug blatt algorithm that makes use of the bisection
algorithm. My actual question was: "Why is the bisection algorithm
itself the way it is?"

A simple bisection algorithm which just chooses the middle of the
git-rev-list output picks a point which, one could argue in principle,
is as good as the one chosen by the current bisection algorithm.
However, I must admit that I don't quite understand the subtleties of
Linus' bisection algorithm, hence the question - why is his bisection
algorithm better than simply choosing the literal middle?

I'll post a patch that I have already sent to Linus which demonstrates
an implementation of --bisect that chooses the literal middle. This
patch also includes a demonstration of Linus' binary bug blatting
technique (see t/t6001-rev-list-merge-order.sh).

jon.

* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19  3:43 UTC
  To: jon; +Cc: Git Mailing List

On Sun, 19 Jun 2005, Jon Seymour wrote:
>
> Perhaps in answering this question, you can help me understand the
> intended behaviour of your bisection algorithm a little better. The
> question is this: for your purposes, given a rev-list output, why not
> simply use the literal middle element of the output list?

The literal middle tends to be in the middle _timewise_, but that's
irrelevant. It might be just a single step away from one of the commits
that has already been marked "good", and as such, testing that one might
well be totally pointless: if you have a thousand commits (not at all
unlikely), testing whether they are good or not one at a time is just not
reasonable.

What we want to get is the commit that basically bisects the other commits
from a "reachability graph" angle. We want something that when selected as
the new "top" will have roughly half the commits in it.

Put another way: let's say that we have those thousand commits that _may_
have introduced a bug, but we don't know which one. What do we want? We
want to find a new head that contains roughly five hundred of the unknown
commits. No more, no less. Maybe we won't find a 500/500 split, but we
definitely want a 600/400 split over a 1/999 split, since the 1/999 split
isn't likely to get us any closer to the answer.

Now, I say "roughly", because there may not _be_ any commit that exactly
bisects the case. For a trivial case, let's say that we know that 'A' is
bad, and 'B' is good, but the graph in between them looks like this:

      A
     /|
    a |
    | b
    | |\
    c d e
     \|/
      B

so the bug might have been introduced by any of a-e, and we just don't
know which one was the cause.

Now, selecting either 'a' or 'b' as the new tip is a good choice: 'a'
reaches two unknowns (a+c) while 'b' reaches 3 unknowns (b+d+e), but the
others only reach a single unknown. And notice how the two good bisection
places were both _recent_. They were not in the middle "time-wise".

Also note that we do NOT want to reach the maximal amount of commits: we
want to reach _half_ the commits. If we select a commit that reaches a lot
of other unknown commits ('A' is the maximal here, of course), then we're
not actually bisecting the thing at all: we're just testing close to a
"known bad" case.

But let's say that 'c' didn't actually exist (or, alternatively, we have
tested that one and know that "B+c" is good, which basically makes 'c'
irrelevant for bug-hunting): in that case there is no good bisection,
since whichever of the remaining ones we choose (a, b, d or e) we'll
always have split it up so that we basically test the addition or
subtraction of just one single commit. There's not anything "--bisect" can
do about that, and it will just happen to pick one of them. And that's
what I mean by "there may not be any commit that is nicely in the middle".

        Linus

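The selection rule described here can be made concrete with a small
brute-force sketch. This is illustrative Python, not git's actual C
implementation: it enumerates the commits that the bad tip reaches but no
good point does (what a plain "git-rev-list bad ^good1 ^good2 ..." would
print) and picks the one whose own reachable subset is closest to half of
that set. The dictionary-of-parents representation and the function names
are inventions of this sketch.

def reachable(start, parents):
    """All commits reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return seen

def pick_bisect(bad, goods, parents):
    """Brute-force version of the rule the --bisect flag aims for."""
    pruned = set()
    for g in goods:                                   # everything any good point can reach
        pruned |= reachable(g, parents)
    candidates = reachable(bad, parents) - pruned     # the still-unknown commits
    half = len(candidates) / 2.0
    def distance_from_half(c):
        mine = reachable(c, parents) & candidates     # unknowns a test of c would decide
        return abs(len(mine) - half)
    return min(candidates, key=distance_from_half)    # the commit splitting the set most evenly

Note how the bad tip itself always scores worst here: it reaches every
unknown commit, which is exactly the "testing close to a known bad case"
situation described above.
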
* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19  5:03 UTC
  To: jon; +Cc: Git Mailing List

On Sat, 18 Jun 2005, Linus Torvalds wrote:
>
> Now, I say "roughly", because there may not _be_ any commit that exactly
> bisects the case. For a trivial case, let's say that we know that 'A' is
> bad, and 'B' is good, but the graph in between them looks like this:

Let's make a different case, just to make it even more obvious.

Let's say that the graph is

         A
        / \
      a1   b1
       |    |
      a2   b2
       |    |
      a3   b3
       |    |
      a4   b4
       |    |
      a5   b5
       |    |
      a6   b6
       |    |
      a7   b7
       |    |
      a8   b8
       |    |
      a9   b9
        \  /
         B

and we know that "A" is bad, and "B" is good, but we don't know which of
a1-a9/b1-b9 are buggy.

Where do we start testing?

We start testing at either 'a1' or 'b1', because those are the two values
that bisect the list either into the "a1-a9" series, or the "b1-b9"
series. Any other starting point would be a bug.

Or, let's say that the graph is

         A
        / \
       |   b1
       |    |
       |   b2
       |    |
       |   b3
       |    |
      a1   b4
       |    |
      a2   b5
       |    |
      a3   b6
       |    |
       |   b7
       |    |
       |   b8
       |    |
       |   b9
        \  /
         B

and in this case the right place to pick is 'b4', because that's the one
that reaches 6 commits (b4-b9), while it leaves 6 commits unreachable
(a1-a3, b1-b3): that's a perfect bisection, and now it's totally
unambiguous (in the previous case a1 and b1 were equally good choices, in
this case there is only one valid choice).

It gets more interesting when you have intermediate merges:

         A
        / \
       |   b1
       |    |
       |   b2
       |    |
       |   b3
       |    |
      a1   b4
       |    |
      a2   b5
       |  / |
      a3   b6
       |    |
       |   b7
       |    |
       |   b8
       |    |
       |   b9
        \  /
         B

The above graph _looks_ very similar, but now the right place to bisect is
'b5', because that reaches six commits (a3 + b5-b9), and again there are
six commits unreachable (a1-a2 + b1-b4). Again, this is unambiguous -
there is one clearly superior choice.

The current --bisect algorithm may not always pick that best choice: I
think I should subtract one from the total number of commits, since right
now it counts the "good" commit too, ie 'A'. But I think it's at most
off-by-one.

The bigger problem is that the algorithm is something like O(N^3) in cost
- albeit with a fairly cheap constant factor. In other words, it might not
be worthwhile bisecting three years' worth of development with it, and you
might be better off starting with a rougher half-way-point algorithm
("let's try some release a year and a half ago first").

The performance problem seems to really be pretty theoretical: I can
bisect the development from 2.6.12-rc2 to current head in 0.59 seconds, so
it's not like it's horribly slow for something like a couple of months'
worth of development.

        Linus

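As a sanity check on that last diagram, the counting idea from the earlier
sketch can be spelled out directly on this graph. Again this is only
illustrative Python; the commit names are the ones in the ASCII art, with
'A' the bad tip, 'B' the good base, and the twelve commits in between as
the unknowns.

# child -> parents, transcribed from the third diagram above
parents = {
    "A":  ["a1", "b1"],
    "a1": ["a2"], "a2": ["a3"], "a3": ["B"],
    "b1": ["b2"], "b2": ["b3"], "b3": ["b4"], "b4": ["b5"],
    "b5": ["a3", "b6"],                   # the intermediate merge
    "b6": ["b7"], "b7": ["b8"], "b8": ["b9"], "b9": ["B"],
}
unknown = set(parents) - {"A"}            # a1-a3 and b1-b9: twelve candidates

def reaches(c):
    """How many unknown commits c can reach, counting c itself."""
    seen, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x in seen or x not in unknown:
            continue
        seen.add(x)
        stack.extend(parents[x])
    return len(seen)

for c in sorted(unknown, key=lambda c: abs(reaches(c) - len(unknown) / 2)):
    print(c, reaches(c))                  # b5 comes out on top, reaching exactly 6 of 12

Running this ranks b5 first (it reaches a3 and b5-b9, six of the twelve
unknowns), with b4 next at seven, matching the analysis in the message.
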
* Re: git-rev-list: "--bisect" flag
  From: David Lang @ 2005-06-19  5:05 UTC
  To: Linus Torvalds; +Cc: jon, Git Mailing List

On Sat, 18 Jun 2005, Linus Torvalds wrote:

> The performance problem seems to really be pretty theoretical: I can
> bisect the development from 2.6.12-rc2 to current head in 0.59 seconds, so
> it's not like it's horribly slow for something like a couple of months'
> worth of development.

if it takes you about as long to type the command (and scan it to make
sure you didn't mistype it) as it does to execute you don't have a
performance problem :-)

David Lang

-- 
There are two ways of constructing a software design. One way is to make
it so simple that there are obviously no deficiencies. And the other way
is to make it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare

* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19  5:30 UTC
  To: David Lang; +Cc: jon, Git Mailing List

On Sat, 18 Jun 2005, David Lang wrote:
>
> if it takes you about as long to type the command (and scan it to make
> sure you didn't mistype it) as it does to execute you don't have a
> performance problem :-)

Yeah, well, in all honesty, I haven't actually verified that my bisection
gives the right answer. I verified "visually" that it does something
half-way sane with this:

    git-rev-list --bisect HEAD ^v2.6.12-rc2 > .git/refs/tags/bisect-1
    git-rev-list --bisect HEAD ^v2.6.12-rc2 ^bisect-1 > .git/refs/tags/bisect-2
    git-rev-list --bisect HEAD ^v2.6.12-rc2 ^bisect-1 ^bisect-2 > .git/refs/tags/bisect-3
    git-rev-list --bisect HEAD ^v2.6.12-rc2 ^bisect-1 ^bisect-2 ^bisect-3 > .git/refs/tags/bisect-4
    git-rev-list --bisect HEAD ^v2.6.12-rc2 ^bisect-1 ^bisect-2 ^bisect-3 ^bisect-4 > .git/refs/tags/bisect-5
    gitk

and it looked sane (ie "bisect-1" should show up as a tag about half-way,
and the others should get progressively closer to the HEAD).

But when I now tried it again (I did the above originally with plain
2.6.12), I get errors from "gitk":

    ERROR: none of the pending commits can be done yet:
           bd6ae2f6d61da0f90c6b66e9a4ab6c53ef8c159a
           2512809255d018744fe6c2f5e996c83769846c07
           88d7bd8cb9eb8d64bf7997600b0d64f7834047c5
           b3214970abbe983cd89842ae24ea00e21bba79f6

which looks like the current kernel overflows gitk in some way (there's
suddenly a lot more parallelism, since there were a number of merges that
were pending, waiting for the 2.6.12 release).

        Linus

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19 10:15 UTC
  To: Git Mailing List

FWIW, I accept that Linus' bisection algorithm does have better worst
case performance (in terms of number of build/test iterations) on at
least some examples than the literal middle bisection algorithm, and
that the "literal middle" bisection algorithm has degenerate cases which
mean that its performance is sometimes worse than O(log2 N).

I don't actually have a good feel for what the actual bound on Linus'
algorithm is, but I'll measure it on the 2.6 kernel and post the
results.

Regards,

jon.

On 6/19/05, Linus Torvalds <torvalds@osdl.org> wrote:
>
> On Sat, 18 Jun 2005, Linus Torvalds wrote:
> >
> > Now, I say "roughly", because there may not _be_ any commit that exactly
> > bisects the case. For a trivial case, let's say that we know that 'A' is
> > bad, and 'B' is good, but the graph in between them looks like this:
>
> Let's make a different case, just to make it even more obvious.
>
> Let's say that the graph is
>
>          A
>         / \
>       a1   b1
>        |    |
>       a2   b2
>        |    |
>       a3   b3
>        |    |
>       a4   b4
>        |    |
>       a5   b5
>        |    |
>       a6   b6
>        |    |
>       a7   b7
>        |    |
>       a8   b8
>        |    |
>       a9   b9
>         \  /
>          B
>
> and we know that "A" is bad, and "B" is good, but we don't know which of
> a1-a9/b1-b9 are buggy.
>
> Where do we start testing?
>
> We start testing at either 'a1' or 'b1', because those are the two values
> that bisect the list either into the "a1-a9" series, or the "b1-b9"
> series. Any other starting point would be a bug.
>
> Or, let's say that the graph is
>
>          A
>         / \
>        |   b1
>        |    |
>        |   b2
>        |    |
>        |   b3
>        |    |
>       a1   b4
>        |    |
>       a2   b5
>        |    |
>       a3   b6
>        |    |
>        |   b7
>        |    |
>        |   b8
>        |    |
>        |   b9
>         \  /
>          B
>
> and in this case the right place to pick is 'b4', because that's the one
> that reaches 6 commits (b4-b9), while it leaves 6 commits unreachable
> (a1-a3, b1-b3): that's a perfect bisection, and now it's totally
> unambiguous (in the previous case a1 and b1 were equally good choices, in
> this case there is only one valid choice).
>
> It gets more interesting when you have intermediate merges:
>
>          A
>         / \
>        |   b1
>        |    |
>        |   b2
>        |    |
>        |   b3
>        |    |
>       a1   b4
>        |    |
>       a2   b5
>        |  / |
>       a3   b6
>        |    |
>        |   b7
>        |    |
>        |   b8
>        |    |
>        |   b9
>         \  /
>          B
>
> The above graph _looks_ very similar, but now the right place to bisect is
> 'b5', because that reaches six commits (a3 + b5-b9), and again there are
> six commits unreachable (a1-a2 + b1-b4). Again, this is unambiguous -
> there is one clearly superior choice.
>
> The current --bisect algorithm may not always pick that best choice: I
> think I should subtract one from the total number of commits, since right
> now it counts the "good" commit too, ie 'A'. But I think it's at most
> off-by-one.
>
> The bigger problem is that the algorithm is something like O(N^3) in cost
> - albeit with a fairly cheap constant factor. In other words, it might not
> be worthwhile bisecting three years' worth of development with it, and you
> might be better off starting with a rougher half-way-point algorithm
> ("let's try some release a year and a half ago first").
>
> The performance problem seems to really be pretty theoretical: I can
> bisect the development from 2.6.12-rc2 to current head in 0.59 seconds, so
> it's not like it's horribly slow for something like a couple of months'
> worth of development.
>
>         Linus
>

-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19 14:41 UTC
  To: Git Mailing List, Linus Torvalds

Linus,

Would I be correct in stating that an intuitive reason why your
algorithm is better than selecting the linear middle is the following:

If you concentrate on testing merges, rather than non-merges, the
chances are you are going to eliminate N-times as many possible good
commits as if you pick a random commit, where N is the average fan-out
of the commit graph.

With the linear middle algorithm, it doesn't really matter if you pick a
child of a merge - that is almost as good as picking the merge itself.
But if you happen to pick a parent of a merge, then if it is good and
the parent is also good, you have wasted the opportunity to clean up the
peer branches of the parent you chose.

So the heuristic should be:

  - focus on branches until there are no branches left, then keep
    bisecting the list that remains.

Building on this insight, though, I'd like to point out that if you want
to eliminate things on a large scale, the things to focus on first are
epoch boundaries. Epoch boundaries are defined in such a way that if an
epoch boundary is good then everything reachable from an epoch boundary
is also good [ assuming exactly one fault ].

If you happen to choose a child of an epoch boundary, that's ok on one
level, but on another level it's not ok. In particular, you haven't
eliminated everything beyond the epoch boundary, so the open list will
be longer than it strictly needs to be.

So, I propose a modification to my "linear middle" bisection algorithm
as follows: if there is more than one epoch boundary in the interesting
graph, choose the literal middle epoch boundary; otherwise, if there is
more than one merge in the interesting graph, choose the literal middle
merge; otherwise, if there are no merges in the interesting graph,
choose the literal middle of the remaining list.

I am going to build a version of my linear middle code that demonstrates
this algorithm and then test it against the Linux 2.6 kernel.

FWIW: my measurements of your algorithm thus far show that if the bug
exists in the first 1070 of the 2119 commits between HEAD and
v2.6.12-rc2 it consistently (very consistently) takes between 11 and 13
iterations of git-bug-blatt-script to find the bug.

Specifically: average (12.10), median (12), stdev(0.412), max(13),
min(11).

This compares with my naive literal middle algorithm (measurements only
for the first 619 commits):

    average(21), median(16), stdev(10.6), max(49), min(8).

The number of commits is 2118, the number with a fanout > 1 is 173. The
average fanout of those with fanout is 2.

My average is currently 2 times worse than yours and my worst case is
4-5 times worse than yours. So, you will be pleased to know that your
intuition about the correctness of your own algorithm has been
objectively verified.

My intuition at this point is that my revised algorithm won't
significantly differ from your algorithm. I am thinking it will require
~3 epoch boundary tests to identify the relevant epoch, 5 merge tests to
identify the relevant segment and 4 bisections of a linear segment to
identify the correct merge, which ends up being 12 iterations of the
bug-blatt algorithm - which really is no different to yours.

I suspect my algorithm might be better for really, really big
repositories with long histories, but I judge that the chance an
interesting bug will be deep in the long history is somewhat remote.
Both algorithms will benefit, however, if the intuition that most bugs
are recent is correct.

git-bug-blatt-script could be modified to test the first epoch boundary
before recursively bisecting things. This will make the typical bug
search ~9 iterations long (since the average epoch appears to be less
than 512 commits and more than 256 commits big).

jon.

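A rough sketch of the tiered rule proposed in the message above, written
in Python purely for illustration. It works on a rev-list-style ordered
list plus a child-to-parents map; the test for whether a commit is an
epoch boundary is deliberately left as a caller-supplied predicate, since
its precise definition comes from Jon's merge-order work rather than from
this thread, and the proposal does not say what to do when exactly one
merge remains, so this sketch simply falls through to the linear middle.

def literal_middle(seq):
    return seq[len(seq) // 2]

def pick_test_point(rev_list, parents, is_epoch_boundary):
    """rev_list: the interesting commits, newest first, as git-rev-list prints them.
    parents: child -> [parent, ...].  is_epoch_boundary: caller-supplied predicate."""
    boundaries = [c for c in rev_list if is_epoch_boundary(c)]
    if len(boundaries) > 1:
        return literal_middle(boundaries)       # tier 1: the literal middle epoch boundary
    merges = [c for c in rev_list if len(parents.get(c, [])) > 1]
    if len(merges) > 1:
        return literal_middle(merges)           # tier 2: the literal middle merge
    # single-merge and no-merge cases both fall through to the plain rule
    return literal_middle(rev_list)             # tier 3: the literal middle of the list
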
* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19 17:20 UTC
  To: jon; +Cc: Git Mailing List

On Mon, 20 Jun 2005, Jon Seymour wrote:
>
> Would I be correct in stating that an intuitive reason why your
> algorithm is better than selecting the linear middle is the following:
>
> If you concentrate on testing merges, rather than non-merges, the
> chances are you are going to eliminate N-times as many possible good
> commits as if you pick a random commit, where N is the average fan-out
> of the commit graph.

No. You really shouldn't concentrate on merges either.

The thing is, you do _not_ want to test as many commits as possible, or as
few commits as possible.

This is _not_ a "try to eliminate as many commits as possible" problem.
It's really one of trying to eliminate _half_ the commits. Not more, not
less.

So to some degree you want to _avoid_ merges, because they pick up a lot
of commits, and that might take you over the limit. But at the same time
you need to be very aware of merges, since if you ignore them, you'll pick
up too _few_ commits.

So in a very real sense, whether you pick a merge or not doesn't depend on
whether that one entry is a merge itself - it really depends on the whole
flow. You can't do any local measurements.

The reason my algorithm is so horrid (O(3)) is that you can't even cache
the dang thing sanely: you can't optimize it by remembering how many
interesting commits are reachable from one commit, and then doing "this
commit reaches commits X and Y, so I can reach X.count + Y.count commits
from it". That's just not true, because often (basically always) there are
lots of common commits in X.count and Y.count..

So that's why I do that strange full reachability thing for _each_ commit,
and then clear the reachability flags and start with the next commit. It's
horrid, but I can't see a sane way to avoid it (I could do the clearing
less often by introducing the notion of "generations" in the reachability,
but I didn't want to add a new counter to the data structures - and it
wouldn't really change anything fundamental).

In your measurements:

> FWIW: my measurements of your algorithm thus far show that if the bug
> exists in the first 1070 of the 2119 commits between HEAD and
> v2.6.12-rc2 it consistently (very consistently) takes between 11 and 13
> iterations of git-bug-blatt-script to find the bug.
>
> Specifically: average (12.10), median (12), stdev(0.412), max(13),
> min(11).

Yes. Consistency is the name of the game. A low standard deviation is what
it's all about, because that's how binary searches work.

The point about binary searching for a bug is that you're not really
looking for "the one" commit, you're really looking for the _range_ of two
adjacent commits: it doesn't even help if you happen to pick the buggy
commit on the first try, because the only thing that matters is when you
have zeroed in on the "buggy and previous" one.

In other words, in a traditional search, when you pick an entry, you know
that you got it right: you might be lucky and pick it first, and you'll be
happy. But in the "search for a bug" case, you always have to go the
_full_ "log2()" thing, because you will always have to not just pick the
right entry, you will also have had to "bisect to zero" so that you know
that the entry before it was not buggy.

This is because the "is it buggy" question is not a unique thing, it's
really just a partial ordering in the set (if you're really unlucky, you
might have two bugs that interact, and you'll not actually find "the
commit" at all, but the one you do end up zeroing in on should at least be
_one_ border condition for the bug, so it should help you somewhat
regardless).

So you basically cannot avoid doing "ceil(log2(N))" tests, and with 2119
commits, you should pretty much _always_ get 12, and basically a standard
deviation of zero. The reason you don't get that is that _occasionally_
you can't get close enough to "half-way", and depending on whether the bug
was in the smaller set or bigger set, you might be lucky or unlucky.

The problem here is that by definition you'll be unlucky more of the time:
the bigger set (the unlucky case) is _bigger_, so it's more likely to have
the bug, so what you should see statistically is that you can't actually
do _better_ than ceil(log2(N)).

("cannot do better" obviously is true only in the statistical sense. You
can always be lucky in any individual test, and if you are intelligent and
choose the commits to test based on the symptoms of the bug you can always
do better, of course).

NOTE NOTE NOTE! The above is all based on a random distribution of bugs,
where all commits count equally, which is obviously not even true. You
could try to do a "weighted bisection", where you weigh commits
differently: you might say, for example, that if the author field matches
the string "torvalds", then the likelihood of a bug is obviously
minuscule, so such a commit only counts as 0.1.

Or you could argue that a pure merge seldom introduces a bug (which is
probably not true, but hey, this is theory), and you could decide that
when you "count commits", then normal commits count as two, and merges
count as one, and the bisection is trying to get equal "weights" on both
sides, not "equal number of commits".

However, that's where "consistency" has another advantage: maybe it's
possible to get better results on average by taking statistical bug
distribution behaviour into account (like "bugs are likely new - try to
weigh the bisection 60% reachable / 40% unreachable instead of 50-50"),
but that means that then occasionally you do worse, and from a
psychological angle I think that's unacceptable. I think most bug hunters
would much prefer to know that "I need to reboot 7 times and I'll get it",
over "I probably need to reboot only 5 times, but it might be 10 if I'm
unlucky".

> This compares with my naive literal middle algorithm (measurements
> only for the first 619 commits):
>
> average(21), median(16), stdev(10.6), max(49), min(8).

Yes. I really don't believe you can do better than 12, unless you start
depending on knowledge of the distribution of bugs.

        Linus

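The "weighted bisection" idea is easy to express as a variation on the
brute-force counting sketch given after the earlier message: instead of
counting commits, sum per-commit weights and aim for half the total
weight. This is only an illustration of the idea floated above - git did
not implement it - and the example weighting (merges count one, ordinary
commits two) is simply the one from the paragraph above.

def reachable(start, parents):
    """All commits reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return seen

def pick_weighted(bad, goods, parents, weight):
    """Like the plain half-the-commits pick, but balancing summed weights."""
    pruned = set()
    for g in goods:                              # everything any good point can reach
        pruned |= reachable(g, parents)
    candidates = reachable(bad, parents) - pruned
    total = sum(weight(c) for c in candidates)
    def imbalance(c):
        mine = reachable(c, parents) & candidates
        return abs(sum(weight(x) for x in mine) - total / 2.0)
    return min(candidates, key=imbalance)

# Example weighting from the text: a merge counts 1, an ordinary commit counts 2.
def merge_light_weight(parents):
    return lambda c: 1 if len(parents.get(c, [])) > 1 else 2

A call would look like pick_weighted(bad, goods, parents,
merge_light_weight(parents)); with a constant weight it reduces to the
unweighted rule.
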
* Re: git-rev-list: "--bisect" flag
  From: Ingo Molnar @ 2005-06-19 17:36 UTC
  To: Linus Torvalds; +Cc: jon, Git Mailing List

* Linus Torvalds <torvalds@osdl.org> wrote:

> NOTE NOTE NOTE! The above is all based on a random distribution of bugs,
> where all commits count equally, which is obviously not even true. You
> could try to do a "weighted bisection", where you weigh commits
> differently: you might say, for example, that if the author field
> matches the string "torvalds", then the likelihood of a bug is
> obviously minuscule, so such a commit only counts as 0.1.

another assumption is that the number of test systems is a power of two
minus 1. With 2 or more test systems (and automated testing) you could
dissect the search space into 3, 5 or more roughly equal pieces in the
first step (2, 4, 8 ... sections are already supported via the bisect
flag).

To decrease the time needed to find a bug it makes sense to increase the
number of test systems, especially if it takes minutes to boot - or if
it takes minutes (or hours) to reproduce a bug.

If each box runs a separate kernel then statistically, if one of them
triggers the bug, only half of them have to be rebooted with new
kernels; the others would still be kept running in a "commit space of
interest". But i guess it's not a big degradation to just round the test
method to the nearest power-of-2 bisection method.

    Ingo

* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-19 19:01 UTC
  To: Ingo Molnar; +Cc: jon, Git Mailing List

On Sun, 19 Jun 2005, Ingo Molnar wrote:
>
> another assumption is that the number of test systems is a power of two
> minus 1. With 2 or more test systems (and automated testing) you could
> dissect the search space into 3, 5 or more roughly equal pieces in the
> first step (2, 4, 8 ... sections are already supported via the bisect
> flag).

Yeah. I don't think it matters much, though, since the "more test systems"
thing will only really end up helping by a fairly small amount, at the
cost of much more complexity.

The difference between "log2(N)" and "log5(N)" isn't _that_ big, and it's
even smaller when you just do the power-of-two thing and use 4 systems
instead of 5 (ie now it's "log4(N)" vs "log5(N)").

Also, to use multiple test systems efficiently, they all end up having to
be synchronized (ie the next iteration needs to get the result from all
the test systems from the previous one), so it's quite a lot of bother for
not a lot of improvement. It also assumes you can partition the problem
space well into <n> roughly equal pieces - which may or may not be true.

More importantly, quite often the nasty problems are the ones that only
happen for one machine. Trying to make it easy for non-developers to test
different kernels is what this is about: I started thinking about this
when we had the "x86-64 SIGSEGV's for me" issue, where bisection was
fairly easy in the -mm series due to a nice linear series, but even then
you had to have tools to apply/unapply different patches etc.

        Linus

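The arithmetic behind "the difference isn't that big" is easy to check
with the thread's own figure of 2119 commits; this is just a worked
illustration, not anything taken from git:

import math

N = 2119                                  # the 2.6.12-rc2..HEAD commit count used earlier
for ways in (2, 4, 5):                    # 2-way (plain bisect), 4-way and 5-way splits per round
    print(ways, math.ceil(math.log(N, ways)))
# prints 2 12, 4 6, 5 5: going from 4-way to 5-way splits saves at most one round here
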
* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-19 19:55 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

On 6/20/05, Linus Torvalds <torvalds@osdl.org> wrote:
>
> On Mon, 20 Jun 2005, Jon Seymour wrote:
> >
> > Would I be correct in stating that an intuitive reason why your
> > algorithm is better than selecting the linear middle is the following:
> >
> > If you concentrate on testing merges, rather than non-merges, the
> > chances are you are going to eliminate N-times as many possible good
> > commits as if you pick a random commit, where N is the average fan-out
> > of the commit graph.
>
> No. You really shouldn't concentrate on merges either.
>
> The thing is, you do _not_ want to test as many commits as possible, or as
> few commits as possible.
>
> This is _not_ a "try to eliminate as many commits as possible" problem.
> It's really one of trying to eliminate _half_ the commits. Not more, not
> less.
>

Yep, ok, so the node you are looking for is one that can reach as close
to half of the rest of the graph as possible - that's what you mean by
half-way reachability. If the graph was N nodes deep (no fan-out) that
would be the literal middle. If it was N nodes wide (e.g. fanout N),
there is no good node and you basically have to test everything since
one test doesn't imply anything about the other N-1 nodes.

A typical commit graph is worse than O(log2 N) by a factor that is
determined by some measure of the parallel branching in the graph.

> The reason my algorithm is so horrid (O(3)) is that you can't even cache
> the dang thing sanely: you can't optimize it by remembering how many
> interesting commits are reachable from one commit, and then doing "this
> commit reaches commits X and Y, so I can reach X.count + Y.count commits
> from it". That's just not true, because often (basically always) there are
> lots of common commits in X.count and Y.count..

I presume you mean O(N^3)?

Will you accept a patch that can reduce the worst case cost to some
lower order? I have a hunch (conceivably wrong, of course) that it can
be done in O(N).

> > This compares with my naive literal middle algorithm (measurements
> > only for the first 619 commits):
> >
> > average(21), median(16), stdev(10.6), max(49), min(8).
>
> Yes. I really don't believe you can do better than 12, unless you start
> depending on knowledge of the distribution of bugs.
>

Agreed.

jon.

* Re: git-rev-list: "--bisect" flag
  From: Linus Torvalds @ 2005-06-20  3:09 UTC
  To: jon; +Cc: Git Mailing List

On Mon, 20 Jun 2005, Jon Seymour wrote:
>
> Yep, ok, so the node you are looking for is one that can reach as close
> to half of the rest of the graph as possible - that's what you mean by
> half-way reachability. If the graph was N nodes deep (no fan-out) that
> would be the literal middle. If it was N nodes wide (e.g. fanout N),
> there is no good node and you basically have to test everything since
> one test doesn't imply anything about the other N-1 nodes.

Exactly.

> A typical commit graph is worse than O(log2 N) by a factor that is
> determined by some measure of the parallel branching in the graph.

Yes. In many cases it's ok to have parallelism, and given the fact that
normal merges will always be two-way (ie there's never a commit in
practice that has a very high fan-in), I do believe that you normally get
pretty close to that O(log2 N) behaviour. In fact, your numbers seemed to
say that we're normally within 10% of it.

> I presume you mean O(N^3)?

Yup again.

> Will you accept a patch that can reduce the worst case cost to some
> lower order? I have a hunch (conceivably wrong, of course) that it can
> be done in O(N).

Well, it depends on how nasty it ends up being. I thought my stupid
algorithm would suck horribly, but considering that it can trivially
bisect all of the git history so far in basically zero time, it doesn't
seem to be that bad. You're more likely to have problems with having to
bring in all the commits from disk in the cold-cache case than you are to
have problems with the current algorithm performance.

Even at O(N^3) it shouldn't be too nasty even if some crazy person tried
to bisect a year's worth of development (which really isn't very
reasonable behaviour anyway, it's just not sensible to not try a few
standard releases in between first).

        Linus

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-20  3:27 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

> Well, it depends on how nasty it ends up being. I thought my stupid
> algorithm would suck horribly, but considering that it can trivially
> bisect all of the git history so far in basically zero time, it doesn't
> seem to be that bad. You're more likely to have problems with having to
> bring in all the commits from disk in the cold-cache case than you are to
> have problems with the current algorithm performance.
>

I'll give you a sketch of the algorithm.

First, recall that the merge-order code uses an incremental topological
sort whose key idea is to employ a conservation of mass analogy. A
similar analogy can help here.

1. count all visible nodes [ i.e. nodes that git-rev-list would print ],
   call this value N
2. at the top node inject N units of mass
3. traverse the visible graph, in topological order
4. at each node, send all the mass received from parents minus 1 unit
   onto visible parents. Record how much mass you have sent downstream.
   Keep a record of the nodes that have seen nearest to half of that
   mass.
5. when the traversal completes, choose the node that saw closest to 1/2
   of the original mass [ or pick one at random if there is more than
   one ]. It must be able to reach close to 1/2 of the visible graph,
   'cos all that mass it has seen has to drain somewhere!

This algorithm is demonstrably O(V+E) because the traversal is O(V+E).

My intention is to add this algorithm to my refactored
epoch.c/traversal.c, since this is a much better framework for doing
traversals than the current epoch.c.

Do you have any objections to me doing that?

jon.

-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-20  3:30 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

> 1. count all visible nodes [ i.e. nodes that git-rev-list would print ],
>    call this value N
> 2. at the top node inject N units of mass
> 3. traverse the visible graph, in topological order
> 4. at each node, send all the mass received from parents minus 1 unit
>    onto visible parents. Record how much mass you have sent downstream.
>    Keep a record of the nodes that have seen nearest to half of that
>    mass.

Correction - at each node, send all mass received from _children_

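Here is one possible reading of that "conservation of mass" heuristic as
Python, with the correction above applied: each node receives mass from
its children, keeps one unit for itself, and passes the rest on to its
still-visible parents. The mails do not say how outgoing mass is divided
among several parents, so splitting it equally is an assumption of this
sketch, and the function and variable names are inventions of the sketch,
not Jon's actual patch.

from collections import deque

def mass_flow_pick(parents, tip):
    """parents: child -> [parent, ...], restricted to the 'visible' commits
    (the ones git-rev-list would print).  tip: the known-bad head where the
    mass is injected.  Returns the commit whose inflow was closest to N/2."""
    visible = set(parents)
    n = len(visible)

    children_left = {c: 0 for c in visible}        # visible children not yet processed
    for c in visible:
        for p in parents[c]:
            if p in visible:
                children_left[p] += 1

    inflow = {c: 0.0 for c in visible}
    inflow[tip] = float(n)                         # step 2: inject N units at the top
    ready = deque(c for c in visible if children_left[c] == 0)   # heads first
    while ready:
        c = ready.popleft()
        out = [p for p in parents[c] if p in visible]
        surplus = max(inflow[c] - 1.0, 0.0)        # every node keeps 1 unit for itself
        for p in out:
            inflow[p] += surplus / len(out)        # assumption: split equally among parents
            children_left[p] -= 1
            if children_left[p] == 0:
                ready.append(p)

    return min(visible, key=lambda c: abs(inflow[c] - n / 2.0))

On the first example graph earlier in the thread (A on top of the
parallel a1-a9 and b1-b9 chains, 19 visible commits) this picks a1 or b1,
matching Linus's analysis; the follow-up messages below describe where
the approximation goes wrong once pruning by known-good commits causes
"backflow".
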
* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-20 10:29 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

On 6/20/05, Jon Seymour <jon.seymour@gmail.com> wrote:
> > 1. count all visible nodes [ i.e. nodes that git-rev-list would print ],
> >    call this value N
> > 2. at the top node inject N units of mass
> > 3. traverse the visible graph, in topological order
> > 4. at each node, send all the mass received from parents minus 1 unit
> >    onto visible parents. Record how much mass you have sent downstream.
> >    Keep a record of the nodes that have seen nearest to half of that
> >    mass.
>
> Correction - at each node, send all mass received from _children_
>

This is a little too simple. It would work if all the nodes in the
visible graph were connected to the base of an epoch. However, the
pruning behaviour of known goods effectively installs blockages into the
network and causes "backflow", which complicates things somewhat - a
plumber's nightmare!

I am exploring another variation of the idea which is even simpler.
Experimentally it seems quite similar to your average case performance,
but doesn't yet approach your low std deviations. However, I think I
know why that is, so I will, after a few games of pool and a similar
number of beers, have another crack at it.

jon.

-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

* Re: git-rev-list: "--bisect" flag
  From: Jon Seymour @ 2005-06-20 14:37 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

> I am exploring another variation of the idea which is even simpler.
> Experimentally it seems quite similar to your average case
> performance, but doesn't yet approach your low std deviations.
> However, I think I know why that is, so I will, after a few games of
> pool and a similar number of beers, have another crack at it.
>

Ok, just for information, I have a linear bisection algorithm which
results in a bug-blatt algorithm with the following characteristics on
the Linux 2.6 kernel:

    AVERAGE 13.64589235
    MEDIAN  13
    MAX     28
    MIN     9
    STDEV   2.171923221

These results were derived by running the algorithm over the full 2118
commits of the current Linux 2.6 source. I'll document what this
algorithm is and what its flaws are.

The algorithm is as follows:

Consider the graph as a closed network of pipes with N reservoirs of 1
unit of volume. Rotate this graph so the head is at the top. Now, pour N
units of liquid into the top of the network of pipes. The liquid
percolates through the network of pipes and ends up in the reservoirs.
The selected bisection point is the node that saw as close to N/2 units
of liquid as any other node.

Now, why isn't this as good (in terms of average and stdev) as Linus'
algorithm? The flaw with this algorithm is that some liquid from peer
nodes reaches the lower half of the network before the liquid that went
through the selected node. As a result, the selected node is elevated
"above" the ideal bisection point, by a factor that I believe would be
sufficient to explain the higher average and also the greater standard
deviation.

Still, my algorithm is very simple and very linear, so it's a very good
start. As it happens, I believe I have (re-?)discovered a precise way to
find the bisection points in O(n). I am in the midst of implementing
that now.

jon.

Thread overview: 19+ messages

  2005-06-18  6:31 git-rev-list: "--bisect" flag  Linus Torvalds
  2005-06-19  0:18 ` Jon Seymour
  2005-06-19  3:38   ` Jan Harkes
  2005-06-19  4:07     ` Jon Seymour
  2005-06-19  3:43   ` Linus Torvalds
  2005-06-19  5:03     ` Linus Torvalds
  2005-06-19  5:05       ` David Lang
  2005-06-19  5:30         ` Linus Torvalds
  2005-06-19 10:15       ` Jon Seymour
  2005-06-19 14:41         ` Jon Seymour
  2005-06-19 17:20           ` Linus Torvalds
  2005-06-19 17:36             ` Ingo Molnar
  2005-06-19 19:01               ` Linus Torvalds
  2005-06-19 19:55             ` Jon Seymour
  2005-06-20  3:09               ` Linus Torvalds
  2005-06-20  3:27                 ` Jon Seymour
  2005-06-20  3:30                   ` Jon Seymour
  2005-06-20 10:29                     ` Jon Seymour
  2005-06-20 14:37                       ` Jon Seymour