From: Jon Seymour <jon.seymour@gmail.com>
To: Git Mailing List <git@vger.kernel.org>,
Linus Torvalds <torvalds@osdl.org>
Subject: Re: git-rev-list: "--bisect" flag
Date: Mon, 20 Jun 2005 00:41:41 +1000 [thread overview]
Message-ID: <2cfc40320506190741409f3a5@mail.gmail.com> (raw)
In-Reply-To: <2cfc403205061903155a6090db@mail.gmail.com>
Linus,
Would I be correct in stating that an intuitive reason why your
algorithm is better than selecting the linear middle is the following:
If you concentrate on testing merges, rather than non-merges, the
chances are you are going to eliminate N-times as many possible good
commits as if you pick a random commit, where N is the average fan-out
of the commit graph.
With the linear middle algorithm, it doesn't really matter if you pick
a child of a merge - that is almost as good as picking the merge
itself. But if you happen to pick a parent of merge, then if it is
good and the parent is also good, you have wasted the opportunity to
clean up the peer branches of the parent you chose.
So the heuristic should be:
- focus on branches until there are no branches left, then keep
bisecting the list that remains.
Building on this insight, though, I'd like to point out that if you
want to eliminate things on a large scale, the things to focus on
first are epoch boundaries. Epoch boundaries are defined in such a way
that if an epoch boundary is good then everything reachable from an
epoch boundary is also good [ assuming exactly one fault ].
If you happen to choose a child of an epoch boundary, that's ok on
one-level, but on another level it's not ok. In particular, you
haven't eliminated everything beyond the epoch boundary so the open
list will be longer than it strictly needs to be.
So, I propose a modification to my "linear middle" bisection algorithm
as follows:
If there is more than one epoch boundary in the interesting graph,
choose the literal middle epoch boundary, otherwise, if there is more
than one merge in the interesting graph, chose the literal middle
merge, otherwise if there are no merges in the interesting graph,
choose the literal middle of the remaining list.
I am going to build a version of my linear middle code that
demonstrates this algorithm and then test it against the Linux 2.6
kernel.
FWIW: my measurements of your algorithm thus far show that if the bug
exists in the first 1070 of the 2119 commits between HEAD and
v2.6.12-rc2 it consistently (very consistently) takes between 11 and
13 iterations of git-bug-blatt-script to find the bug.
Specifically: average (12.10), median (12), stdev(0.412), max(13), min(11).
This compares with my naive literal middle algorithm (measurements
only for the first 619 commits):
average(21), median(16), stdev(10.6), max(49), min(8).
The number of commits is 2118, the number with a fanout > 1 is 173.
The average fanout of those with fanout is 2. My average is currently
2 times worse than yours and my worst case is 4-5 times worse than
yours.
So, you will be pleased to know that your intuition about the
correctness of your own algorithm has been objectively verified.
My intuition at this point is that my revised algorithm won't
significantly differ from your algorithm. I am thinking it will
require ~ 3 epoch boundary tests to identify the relevant epoch, 5
merge tests to identify the relevant segment and 4 bisections of a
linear segment to identify the correct merge which ends up being 12
iterations of the bug-blatt algorithm which really is no different to
yours. I suspect my algorithm might be better for really, really, big
repositories, with long histories, but I judge that the chance an
interesting bug will be deep in the long history is somewhat remote.
Both algorithms will benefit, however, if the intuition that most bugs
are recent is correct. git-bug-blatt-script could be modified to test
the first epoch boundary before recursively bisecting things. This
will make the typical bug search ~ 9 iterations long (since the
average epoch appears to be less than 512 commits and more than 256
commits big).
jon.
next prev parent reply other threads:[~2005-06-19 14:36 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-06-18 6:31 git-rev-list: "--bisect" flag Linus Torvalds
2005-06-19 0:18 ` Jon Seymour
2005-06-19 3:38 ` Jan Harkes
2005-06-19 4:07 ` Jon Seymour
2005-06-19 3:43 ` Linus Torvalds
2005-06-19 5:03 ` Linus Torvalds
2005-06-19 5:05 ` David Lang
2005-06-19 5:30 ` Linus Torvalds
2005-06-19 10:15 ` Jon Seymour
2005-06-19 14:41 ` Jon Seymour [this message]
2005-06-19 17:20 ` Linus Torvalds
2005-06-19 17:36 ` Ingo Molnar
2005-06-19 19:01 ` Linus Torvalds
2005-06-19 19:55 ` Jon Seymour
2005-06-20 3:09 ` Linus Torvalds
2005-06-20 3:27 ` Jon Seymour
2005-06-20 3:30 ` Jon Seymour
2005-06-20 10:29 ` Jon Seymour
2005-06-20 14:37 ` Jon Seymour
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2cfc40320506190741409f3a5@mail.gmail.com \
--to=jon.seymour@gmail.com \
--cc=git@vger.kernel.org \
--cc=jon@blackcubes.dyndns.org \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).