git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ealdwulf Wuffinga <ealdwulf@googlemail.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Clemens Buchacher <drizzd@aon.at>,
	Christian Couder <chriscool@tuxfamily.org>,
	Sam Vilain <sam@vilain.net>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: RFE: "git bisect reverse"
Date: Thu, 28 May 2009 22:07:29 +0100	[thread overview]
Message-ID: <efe2b6d70905281407x56bb788aq3dba4b27eb91d7a6@mail.gmail.com> (raw)
In-Reply-To: <4A1E00F1.4030709@zytor.com>

On Thu, May 28, 2009 at 4:11 AM, H. Peter Anvin <hpa@zytor.com> wrote:

> Again, given a bisection, the information gain by "bisecting" at point x
>  where 0 < x < 1 is:
>
>        -(x log2 x)-((1-x) log2 (1-x))
>
> At x = 0.5 this gives the optimal 1 bit, but the curve is rather flat
> near the top.  You don't drop to 1/2 bit of information until
> x = 0.11 or 0.89, and it doesn't drop to 1/4 bit of information until
> x = 0.04 or 0.96.
>
> Thus, the lack of optimality in searching away from a skip point is much
> smaller than the potential cost of having to having to skip multiple
> nearby points.


I understand that. I didn't mean to imply that there was anything
wrong with your proposal, indeed, it makes sense for git-bisect.

What I am interested in is how to extend bisection to the case of
intermittent bugs; where a test which observes the fault means that it
cannot have been introduced in subsequent commits, but a test which
does not observe the fault cannot guarantee that it must have been
introduced in a subsequent commit.

The simplest way to deal with this is to try to reduce it to the
deterministic case
by repeating the test some number of times. It turns out, that this is
rather inefficient.

In bbchop, the search algorithm does not assume that the test is deterministic.
Therefore, it has to calculate the probabilities in order to know when it has
accumulated enough evidence to accuse a particular commit. It turns
out that it is not much more expensive to calculate which commit we
can expect to  gain the most information from by testing it next.

How can I incorporate your skipping feature into this model? The problem is that
while (just thinking about the linear case for the moment) there is a
fixed boundary at one end - where we actually saw a fault - on the
other side there are a bunch of fuzzy probabilities, ultimately
bounded by wherever we decided the limit of the search was.
So when we get a skip we could hop half way toward the limit. That
would be reasonable toward the beginning of the search, but towards
the end when most of the probability is concentrated in a small number
of commits, it would make no sense.

It would fit a lot better into this algorithm to have some model of
the probability that a commit will cause a skip. It doesn't actually
have to be a very good one, because if it's poor it will only make the
search slightly less efficient, not affect the reliability of the
final result.

Ealdwulf

  reply	other threads:[~2009-05-28 21:08 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-26 22:21 RFE: "git bisect reverse" H. Peter Anvin
2009-05-27  3:00 ` Sam Vilain
2009-05-27  4:20   ` H. Peter Anvin
2009-05-27  5:26     ` Christian Couder
2009-05-27 21:11       ` Ealdwulf Wuffinga
2009-05-27 21:18         ` Clemens Buchacher
2009-05-27 22:07           ` Ealdwulf Wuffinga
2009-05-27 23:08             ` Sam Vilain
2009-05-28 20:29               ` Ealdwulf Wuffinga
2009-05-29  4:20                 ` Sam Vilain
2009-05-31 22:41                   ` Ealdwulf Wuffinga
2009-05-28  3:11             ` H. Peter Anvin
2009-05-28 21:07               ` Ealdwulf Wuffinga [this message]
2009-05-28 21:54                 ` H. Peter Anvin
2009-05-31 22:18                   ` Ealdwulf Wuffinga
2009-05-27 20:11   ` Christian Couder
2009-05-27  8:22 ` Nanako Shiraishi
2009-05-27 20:26   ` Matthieu Moy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=efe2b6d70905281407x56bb788aq3dba4b27eb91d7a6@mail.gmail.com \
    --to=ealdwulf@googlemail.com \
    --cc=chriscool@tuxfamily.org \
    --cc=drizzd@aon.at \
    --cc=git@vger.kernel.org \
    --cc=hpa@zytor.com \
    --cc=sam@vilain.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).