From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ealdwulf Wuffinga Subject: Re: RFE: "git bisect reverse" Date: Thu, 28 May 2009 22:07:29 +0100 Message-ID: References: <4A1C6B70.4050501@zytor.com> <4A1CACB2.7000702@vilain.net> <4A1CBF7A.3090708@zytor.com> <200905270726.59883.chriscool@tuxfamily.org> <20090527211836.GA14841@localhost> <4A1E00F1.4030709@zytor.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Clemens Buchacher , Christian Couder , Sam Vilain , Git Mailing List To: "H. Peter Anvin" X-From: git-owner@vger.kernel.org Thu May 28 23:08:03 2009 Return-path: Envelope-to: gcvg-git-2@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by lo.gmane.org with esmtp (Exim 4.50) id 1M9mpU-0004a0-1a for gcvg-git-2@gmane.org; Thu, 28 May 2009 23:08:00 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1763530AbZE1VHt convert rfc822-to-quoted-printable (ORCPT ); Thu, 28 May 2009 17:07:49 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1762294AbZE1VHt (ORCPT ); Thu, 28 May 2009 17:07:49 -0400 Received: from mail-ew0-f176.google.com ([209.85.219.176]:40464 "EHLO mail-ew0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756638AbZE1VHs convert rfc822-to-8bit (ORCPT ); Thu, 28 May 2009 17:07:48 -0400 Received: by ewy24 with SMTP id 24so5819003ewy.37 for ; Thu, 28 May 2009 14:07:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :from:date:message-id:subject:to:cc:content-type :content-transfer-encoding; bh=amyoNWRFALE1lpdDdIMi5aOBAYiPW3rrrwg89PRN3hc=; b=HUmokYtaVPRMzVxhfBSI/EKS4QhtWYvquabvBafSCxqfIiil6RJFUMDYUPZZqcRJoo IF9qdoT0MgwEv8ujXnTk2KEV4qLELDewRIf1slewjDHK8ct5BOaQwp20gRVzpNM4MmFI HpAZTry/ha1DQhGbN+sACAK2r2DEHIWJZTGps= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; b=ZdsDQjiKYYv4HkIzxDWMgyjesrCrz/uwdduboLIfxk+47GXKfIhDOQXkycPdJLJFVR +QopPDWRypku1muyGnig/xWoCUYi/rwvxBKeWVZPH6uqoZ5aDvT3jCBU/nRna3ZhgheU KSEcglbweowAbIzP/gNaabuMwTzJTfW3gzSXA= Received: by 10.216.4.80 with SMTP id 58mr660259wei.173.1243544869085; Thu, 28 May 2009 14:07:49 -0700 (PDT) In-Reply-To: <4A1E00F1.4030709@zytor.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Thu, May 28, 2009 at 4:11 AM, H. Peter Anvin wrote: > Again, given a bisection, the information gain by "bisecting" at poin= t x > =A0where 0 < x < 1 is: > > =A0 =A0 =A0 =A0-(x log2 x)-((1-x) log2 (1-x)) > > At x =3D 0.5 this gives the optimal 1 bit, but the curve is rather fl= at > near the top. =A0You don't drop to 1/2 bit of information until > x =3D 0.11 or 0.89, and it doesn't drop to 1/4 bit of information unt= il > x =3D 0.04 or 0.96. > > Thus, the lack of optimality in searching away from a skip point is m= uch > smaller than the potential cost of having to having to skip multiple > nearby points. I understand that. I didn't mean to imply that there was anything wrong with your proposal, indeed, it makes sense for git-bisect. What I am interested in is how to extend bisection to the case of intermittent bugs; where a test which observes the fault means that it cannot have been introduced in subsequent commits, but a test which does not observe the fault cannot guarantee that it must have been introduced in a subsequent commit. The simplest way to deal with this is to try to reduce it to the deterministic case by repeating the test some number of times. It turns out, that this is rather inefficient. In bbchop, the search algorithm does not assume that the test is determ= inistic. Therefore, it has to calculate the probabilities in order to know when = it has accumulated enough evidence to accuse a particular commit. It turns out that it is not much more expensive to calculate which commit we can expect to gain the most information from by testing it next. How can I incorporate your skipping feature into this model? The proble= m is that while (just thinking about the linear case for the moment) there is a fixed boundary at one end - where we actually saw a fault - on the other side there are a bunch of fuzzy probabilities, ultimately bounded by wherever we decided the limit of the search was. So when we get a skip we could hop half way toward the limit. That would be reasonable toward the beginning of the search, but towards the end when most of the probability is concentrated in a small number of commits, it would make no sense. It would fit a lot better into this algorithm to have some model of the probability that a commit will cause a skip. It doesn't actually have to be a very good one, because if it's poor it will only make the search slightly less efficient, not affect the reliability of the final result. Ealdwulf