An idea for "git bisect" and a GSoC enquiry

git.vger.kernel.org archive mirror

* An idea for "git bisect" and a GSoC enquiry
@ 2014-02-26  8:28 Jacopo Notarstefano
From: Jacopo Notarstefano @ 2014-02-26  8:28 UTC
  To: git

Hey everyone,

my name is Jacopo, a student developer from Italy, and I'm interested
in applying to this year's Google Summer of Code. I set my eyes on the
project called "git-bisect improvements", in particular the subtask
about swapping the "good" and "bad" labels when looking for a
bug-fixing release.

I have a very simple proposal for that: add a new "mark" subcommand.
Here is an example of how it should work:

1) A developer wants to find in which commit a past regression was
fixed. She starts bisecting as usual with "git bisect start".
2) The current HEAD has the bugfix, so she marks it as fixed with "git
bisect mark fixed".
3) She knows that HEAD~100 had the regression, so she marks it as
unfixed with "git bisect mark unfixed".
4) Now that git knows what the two labels are, it starts bisecting as usual.

For compatibility with already written scripts, "git bisect good" and
"git bisect bad" will alias to "git bisect mark good" and "git bisect
mark bad" respectively.

Does this make sense? Did I overlook some details?

There were already several proposals on this topic, among which those
listed at https://git.wiki.kernel.org/index.php/SmallProjectsIdeas#git_bisect_fix.2Funfixed.
I'm interested in contacting the prospective mentor, Christian Couder,
to go over these. What's the proper way to ask for an introduction? I
tried asking on IRC, but had no success.

Cheers,
Jacopo Notarstefano

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-02-26 19:58 ` Junio C Hamano
From: Junio C Hamano @ 2014-02-26 19:58 UTC
  To: Jacopo Notarstefano; +Cc: git

Jacopo Notarstefano <jacopo.notarstefano@gmail.com> writes:

> Does this make sense? Did I overlook some details?

How does this solve the labels shown in "git bisect visualize"?

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-02-28  9:00   ` Jacopo Notarstefano
From: Jacopo Notarstefano @ 2014-02-28  9:00 UTC
  To: Junio C Hamano; +Cc: git

Hmm. I hadn't thought of that. I have no experience with Tk, so I'm
having trouble finding where the "good" and "bad" labels in the GUI
are generated.

I guess that a solution might involve writing a temporary file in
$GIT_DIR called something like BISECT_LABELS in which the chosen
labels are listed and reused across all tools that require them.

(Sorry for sending this email twice, I thought I had sent it to the
list as well!)

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-02-27 11:18 ` Michael Haggerty
From: Michael Haggerty @ 2014-02-27 11:18 UTC
  To: Jacopo Notarstefano; +Cc: git, Christian Couder, Junio C Hamano

On 02/26/2014 09:28 AM, Jacopo Notarstefano wrote:
> my name is Jacopo, a student developer from Italy, and I'm interested
> in applying to this year's Google Summer of Code. I set my eyes on the
> project called "git-bisect improvements", in particular the subtask
> about swapping the "good" and "bad" labels when looking for a
> bug-fixing release.

Hello and welcome!

> I have a very simple proposal for that: add a new "mark" subcommand.
> Here is an example of how it should work:
> 
> 1) A developer wants to find in which commit a past regression was
> fixed. She starts bisecting as usual with "git bisect start".
> 2) The current HEAD has the bugfix, so she marks it as fixed with "git
> bisect mark fixed".
> 3) She knows that HEAD~100 had the regression, so she marks it as
> unfixed with "git bisect mark unfixed".
> 4) Now that git knows what the two labels are, it starts bisecting as usual.
> 
> For compatibility with already written scripts, "git bisect good" and
> "git bisect bad" will alias to "git bisect mark good" and "git bisect
> mark bad" respectively.
> 
> Does this make sense? Did I overlook some details?

I don't understand the benefit of adding a new command "mark" rather
than continuing to use "good", "bad", plus new commands "unfixed" and
"fixed".  Does this solve any problems?

What happens if the user mixes, say, "good" and "fixed" in a single
bisect session?

I think it would be more convenient if "git bisect" would autodetect
whether the history went from "good" to "bad" or vice versa.  The
algorithm could be:

1. Wait until the user has marked one commit "bad" and one commit "good".

2. If a "good" commit is an ancestor of a "bad" one, then "git bisect"
should announce "I will now look for the first bad commit".  If
reversed, then announce "I will now look for the first good commit".  If
neither commit is an ancestor of the other, then explain the situation
and ask the user to run "git bisect find-first-bad" or "git bisect
find-first-good" or to mark another commit "bad" or "good".

3. If the user marks another commit, go back to step 2, also doing a
consistency check to make sure that all of the ancestry relationships go
in a consistent direction.

4. After the direction is clear, the old bisect algorithm can be used
(though taking account of the direction).  Obviously a lot of the output
would have to be adjusted, as would the way that a bisect is visualized.

I can't think of any fundamental problems with a scheme like this, and I
think it would be easier to use than the unfixed/fixed scheme.  But that
is only my opinion; other opinions are undoubtedly available :-)
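
A minimal Python sketch of steps 1-3, under stated assumptions: history
is modelled as a dict mapping each commit id to its parent ids, and
`detect_direction` and its return strings are hypothetical names, not
anything git provides. A real implementation would more likely ask git
itself, e.g. with `git merge-base --is-ancestor`.

```python
from typing import Dict, List, Set

# Hypothetical history model: commit id -> list of parent ids.
Parents = Dict[str, List[str]]


def ancestors(parents: Parents, commit: str) -> Set[str]:
    """Every commit reachable from `commit`, including `commit` itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def detect_direction(parents: Parents, good: List[str], bad: List[str]) -> str:
    """Decide which transition the user is hunting for (steps 1-3)."""
    if not good or not bad:
        return "wait"  # step 1: need at least one "good" and one "bad" mark
    found = set()
    for g in good:
        for b in bad:
            if g in ancestors(parents, b):
                found.add("first-bad")   # good -> bad: regression hunt
            elif b in ancestors(parents, g):
                found.add("first-good")  # bad -> good: fix hunt
    if len(found) > 1:
        # step 3: the ancestry relationships disagree about the direction
        raise ValueError("marks are inconsistent with any single direction")
    if not found:
        return "ask-user"  # step 2: no marked pair is ancestry-related
    return found.pop()
```

For a linear history a <- b <- c <- d, marking a "good" and d "bad"
yields "first-bad"; swapping the two marks yields "first-good".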

> There were already several proposals on this topic, among which those
> listed at https://git.wiki.kernel.org/index.php/SmallProjectsIdeas#git_bisect_fix.2Funfixed.
> I'm interested in contacting the prospective mentor, Christian Couder,
> to go over these. What's the proper way to ask for an introduction? I
> tried asking on IRC, but had no success.

Just CC Christian on your emails to the mailing list, like I've done
with this email.  As a rule of thumb all communications should go to the
mailing list *plus* any people who are likely to be personally
interested in the topic (e.g., because they have participated in the
thread).

By the way, although "git bisect fixed/unfixed" would be a very useful
improvement, and has gone unimplemented for a lamentably long time, my
personal feeling is that it has too little meat in it to constitute a GSoC
project by itself.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-02-27 12:09   ` Matthieu Moy
From: Matthieu Moy @ 2014-02-27 12:09 UTC
  To: Michael Haggerty
  Cc: Jacopo Notarstefano, git, Christian Couder, Junio C Hamano

----- Original Message -----
> I don't understand the benefit of adding a new command "mark" rather
> than continuing to use "good", "bad", plus new commands "unfixed" and
> "fixed".  Does this solve any problems?

I think it could be interesting to allow arbitrary words here. For
example, I recently walked through history to find a performance
regression; it would have been natural to use slow/fast instead of
bad/good (bad/good would actually do the job, but slightly less
naturally). One can also look for a change that is neither a fix nor a
bug (e.g. when did command foo start behaving like that? When did we
start using this or that feature in the code?).

I wouldn't fight for it, but I think it makes sense.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-02-28  9:03   ` Jacopo Notarstefano
From: Jacopo Notarstefano @ 2014-02-28  9:03 UTC
  To: Michael Haggerty; +Cc: git, Christian Couder, Junio C Hamano

On Thu, Feb 27, 2014 at 12:18 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> I don't understand the benefit of adding a new command "mark" rather
> than continuing to use "good", "bad", plus new commands "unfixed" and
> "fixed".  Does this solve any problems?
>

As Matthieu Moy remarked in a previous email, the main reason is
extensibility: I prefer having a single command to assign new
descriptive labels instead of having to patch git-bisect.sh to create
new labels like fixed, unfixed, fast, slow...

> What happens if the user mixes, say, "good" and "fixed" in a single
> bisect session?
>

I don't think that's an issue. If the user uses the label "fixed"
instead of "bad" she will have a hard time remembering to use it every
time she needs it, and maybe the output of "git bisect" will look very
confusing, but what can git do? This is a semantic user input error,
not a syntax one.

> I think it would be more convenient if "git bisect" would autodetect
> whether the history went from "good" to "bad" or vice versa.  The
> algorithm could be:
>
> 1. Wait until the user has marked one commit "bad" and one commit "good".
>
> 2. If a "good" commit is an ancestor of a "bad" one, then "git bisect"
> should announce "I will now look for the first bad commit".  If
> reversed, then announce "I will now look for the first good commit".  If
> neither commit is an ancestor of the other, then explain the situation
> and ask the user to run "git bisect find-first-bad" or "git bisect
> find-first-good" or to mark another commit "bad" or "good".
>
> 3. If the user marks another commit, go back to step 2, also doing a
> consistency check to make sure that all of the ancestry relationships go
> in a consistent direction.
>
> 4. After the direction is clear, the old bisect algorithm can be used
> (though taking account of the direction).  Obviously a lot of the output
> would have to be adjusted, as would the way that a bisect is visualized.
>
> I can't think of any fundamental problems with a scheme like this, and I
> think it would be easier to use than the unfixed/fixed scheme.  But that
> is only my opinion; other opinions are undoubtedly available :-)
>

I like this idea! It also looks fun to implement. A minor difference
is that I'd rather die with an error at point 2) if there's no
ancestry relationship between the two commits; if the user is asking
for such a thing then she has a fundamental misconception of the state
of her repository.

> By the way, although "git bisect fixed/unfixed" would be a very useful
> improvement, and has gone unimplemented for a lamentably long time, my
> personal feeling is that it has too little meat in it to constitute a GSoC
> project by itself.

Oh! Then in fact, as Christian Couder said, this project shouldn't be
marked as "easy".

(Sorry for sending this email twice! I thought I had sent it to the
list as well.)

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-02-28 18:31     ` Junio C Hamano
From: Junio C Hamano @ 2014-02-28 18:31 UTC
  To: Jacopo Notarstefano; +Cc: Michael Haggerty, git, Christian Couder

Jacopo Notarstefano <jacopo.notarstefano@gmail.com> writes:

> On Thu, Feb 27, 2014 at 12:18 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> I don't understand the benefit of adding a new command "mark" rather
>> than continuing to use "good", "bad", plus new commands "unfixed" and
>> "fixed".  Does this solve any problems?
>>
>
> As Matthieu Moy remarked in a previous email, the main reason is
> extensibility: I prefer having a single command to assign new
> descriptive labels instead of having to patch git-bisect.sh to create
> new labels like fixed, unfixed, fast, slow...
>
>> What happens if the user mixes, say, "good" and "fixed" in a single
>> bisect session?
>
> I don't think that's an issue. If the user uses the label "fixed"
> instead of "bad" she will have a hard time remembering to use it every
> time she needs it,...

I am not sure I understand what you are trying to say.  Are you
saying that we should stick to good/bad and allow the users to use
nothing else, because allowing "fixed" will be confusing?

> and maybe the output of "git bisect" will look very
> confusing, but what can git do? This is a semantic user input error,
> not a syntax one.

For a young tool or a feature, catering to perfect humans perfectly
is a good first goal---if it does not work well even for error-free
human input, it would be worthless.  However, its second goal after
achieving that first goal ought to be to help imperfect humans.

I can very well imagine somebody starting to hunt for an earlier bugfix
(perhaps to backport it to an older maintenance track), saying
"fixed", "broken", "broken", ..., continuing after a lunch break, and
then trying to mark the next version he tests as "bad" because it has
a bug.

It technically may be a user error, in the sense that in such a
"where is the fix?" session, you want to mark a "still-has-bug" one
as "broken" and mark a "no-longer-has-bug" one as "fixed" (just like
"still-broken" as "bad" and "no-longer-broken" as "good" in regular
bisection).  But at that point, the tool *knows* that the user
earlier used "fixed" (or "broken") to mark some commits *already*.

Why do you think there is nothing it can do to help the user?  Upon
seeing "bad", the tool should at least be able to say "Excuse me,
but you earlier said 'fixed' for one of the commits, so your
vocabulary now is limited to 'fixed' and 'broken'".  I think it also
should be able to add "Did you mean to say 'broken'?", or even "I'd
assume that you meant 'broken' and will continue."

I have always wondered if we can introduce value-neutral synonyms
to good and bad.  For a bisect session, we care only about two
states: "still-X" and "no-longer-X" where X may be 'working' for the
normal bug-hunting bisection and 'broken' for the fix-hunting one.

        $ git bisect still-working v1.6.0
        $ git bisect no-longer-working v1.8.0

would be a way to find a bug that was introduced during v1.6.0..v1.8.0,
while

        $ git bisect still-broken v1.6.0
        $ git bisect no-longer-broken v1.8.0

would be a way to find a fix in the same range.  The lowest-level
bisection machinery could just _ignore_ anything after still/no-longer
and do its thing, while the end-user facing layer could enforce,
once one commit is marked as still-X (or no-longer-X), that nothing
other than the same X is used, and issue an error message, perhaps
like this:

        $ git bisect still-broken v1.6.0
        $ git bisect still-working v1.8.0
        error: You earlier marked v1.6.0 as "still-broken" and
        error: are hunting for the first commit that can be marked
        error: as "no-longer-broken".  Say either "still-broken" or
        error: "no-longer-broken", not "still-working".

and that can be done without having to understand that "broken" is
the opposite of "working" (of course if we understood that, we could
even offer to guess that the user meant "no-longer-broken" by
"still-working", but we do not want to go there).
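
A minimal Python sketch of the front-end check described above,
assuming marks arrive as "still-X" / "no-longer-X" strings. The
`Session` class and its fields are hypothetical; only the prefix
decides the role of a commit ("still-" like "good", "no-longer-" like
"bad"), while X is merely pinned by the first mark and compared as an
opaque string.

```python
from typing import List, Optional, Tuple


def parse_term(term: str) -> Tuple[bool, str]:
    """Split "still-X" / "no-longer-X" into (is_still, X)."""
    for prefix, is_still in (("still-", True), ("no-longer-", False)):
        if term.startswith(prefix) and len(term) > len(prefix):
            return is_still, term[len(prefix):]
    raise ValueError(f"unrecognized term: {term!r}")


class Session:
    """Hypothetical bisect front end that pins X after the first mark."""

    def __init__(self) -> None:
        self.x: Optional[str] = None            # X chosen by the first mark
        self.first: Tuple[str, str] = ("", "")  # (revision, term) of first mark
        self.still: List[str] = []              # bottoms, like "good"
        self.no_longer: List[str] = []          # tops, like "bad"

    def mark(self, term: str, rev: str) -> None:
        is_still, x = parse_term(term)
        if self.x is None:
            self.x, self.first = x, (rev, term)
        elif x != self.x:
            first_rev, first_term = self.first
            raise ValueError(
                f'You earlier marked {first_rev} as "{first_term}". Say either '
                f'"still-{self.x}" or "no-longer-{self.x}", not "{term}".')
        (self.still if is_still else self.no_longer).append(rev)
```

With this sketch, marking v1.6.0 "still-broken" and then v1.8.0
"still-working" raises an error naming "no-longer-broken", without the
code ever knowing that "broken" and "working" are opposites.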

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-03-01 11:31       ` Jacopo Notarstefano
From: Jacopo Notarstefano @ 2014-03-01 11:31 UTC
  To: Junio C Hamano; +Cc: Michael Haggerty, Git Mailing List, Christian Couder

> I am not sure I understand what you are trying to say.  Are you
> saying that we should stick to good/bad and allow the users to use
> nothing else, because allowing "fixed" will be confusing?
>

No! Pretty much the opposite of that. My idea (the "mark" subcommand)
is to let people define their own pairs of labels to represent two
opposite states of a commit. My point was that, if people choose pairs
of words that are not opposites (such as "good" and "fixed") then it's
their error, not something that git should attempt to fix or detect.

> For a young tool or a feature, catering to perfect human perfectly
> is a good first goal---if it does not work well even for error-free
> human input, it would be worthless.  However, its second goal after
> achieving that first goal ought to be to help imperfect humans.
>

Loved this.

> Why do you think there is nothing it can do to help the user?  Upon
> seeing "bad", the tool should at least be able to say "Excuse me,
> but you earlier said 'fixed' for one of the commits, so your
> vocabulary now is limited to 'fixed' and 'broken'".  I think it also
> should be able to add "Did you mean to say 'broken'?", or even "I'd
> assume that you meant 'broken' and will continue."
>

I didn't say so explicitly, but this is pretty much what I had in mind.
Suppose a user wants to find a bugfix between HEAD and HEAD~10, this
is what she would do:

$ git bisect start
$ git bisect mark working HEAD
$ git bisect mark broken HEAD~10

[git will now start bisecting as usual. Suppose that she is now at HEAD~5]

$ git bisect mark bad
-> Error: unrecognized label 'bad'. You previously used 'working' and
'broken' to describe commits in this git bisect session. Please mark
commits with one of these labels.

I suppose that we could cater a little better to imperfect humans if
we had two predefined parallel lists of antonyms in which to search for
given labels and infer whether they are positive or negative labels,
but this is beyond the scope of my proposal.

> I have always wondered if we can introduce value-neutral synonyms
> to good and bad.  For a bisect session, we care only about two
> states: "still-X" and "no-longer-X" where X may be 'working' for the
> normal bug-hunting bisection and 'broken' for the fix-hunting one.
>
>         $ git bisect still-working v1.6.0
>         $ git bisect no-longer-working v1.8.0
>
> would be a way to find a bug that was introduced during v1.6.0..v1.8.0,
> while
>
>         $ git bisect still-broken v1.6.0
>         $ git bisect no-longer-broken v1.8.0
>
> would be a way to find a fix in the same range.  The lowest-level
> bisection machinery could just _ignore_ anything after still/no-longer
> and do its thing, [...]

This is remarkably similar to my proposal. Using "mark", these would be:

$ git bisect mark working v1.6.0
$ git bisect mark not-working v1.8.0

and

$ git bisect mark broken v1.6.0
$ git bisect mark not-broken v1.8.0

> while the end-user facing layer could enforce,
> once one commit is marked as still-X (or no-longer-X), that nothing
> other than the same X is used, and issue an error message, perhaps
> like this:
>
>         $ git bisect still-broken v1.6.0
>         $ git bisect still-working v1.8.0
>         error: You earlier marked v1.6.0 as "still-broken" and
>         error: are hunting for the first commit that can be marked
>         error: as "no-longer-broken".  Say either "still-broken" or
>         error: "no-longer-broken", not "still-working".
>
> and that can be done without having to understand that "broken" is
> the opposite of "working" (of course if we understood that, we could
> even offer to guess that the user meant "no-longer-broken" by
> "still-working", but we do not want to go there).

Here my proposal differs in that I have no way of knowing which label
is good and which label is bad: I blindly accept two distinct labels
and bisect with those. I gave an example of this behaviour above.

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-03-03 18:34         ` Junio C Hamano
From: Junio C Hamano @ 2014-03-03 18:34 UTC
  To: Jacopo Notarstefano; +Cc: Michael Haggerty, Git Mailing List, Christian Couder

Jacopo Notarstefano <jacopo.notarstefano@gmail.com> writes:

> Here my proposal differs in that I have no way of knowing which label
> is good and which label is bad: I blindly accept two distinct labels
> and bisect with those. I gave an example of this behaviour above.

I think you fundamentally cannot use two labels that are merely
"distinct" and bisect correctly.  You need to know which ones
(i.e. good) are to be excluded and the other (i.e. bad) are to be
included when computing the "remaining to be tested" set of commits.
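
To illustrate why the two roles are asymmetric, here is a minimal
Python sketch, assuming history is a dict from commit id to parent ids
(the helper names are hypothetical). The suspect set is everything
reachable from the "bad" commit minus everything reachable from any
"good" one, i.e. the set `git rev-list <bad> --not <good>...` lists.

```python
from typing import Dict, List, Set

Parents = Dict[str, List[str]]  # hypothetical model: commit -> parents


def reachable(parents: Parents, starts: List[str]) -> Set[str]:
    """All commits reachable from any of `starts`, including the starts."""
    seen: Set[str] = set()
    stack = list(starts)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def remaining(parents: Parents, bad: str, good: List[str]) -> Set[str]:
    """Commits still to be tested: included via `bad`, excluded via `good`."""
    return reachable(parents, [bad]) - reachable(parents, good)
```

With a linear history a <- b <- c <- d <- e, `remaining(p, "e", ["b"])`
is {"c", "d", "e"}, whereas `remaining(p, "b", ["e"])` is empty: the
same two labels, merely swapped, leave nothing to bisect.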

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-03-12  1:32           ` Jacopo Notarstefano
From: Jacopo Notarstefano @ 2014-03-12  1:32 UTC
  To: Junio C Hamano; +Cc: Michael Haggerty, Git Mailing List, Christian Couder

On Mon, Mar 3, 2014 at 7:34 PM, Junio C Hamano <gitster@pobox.com> wrote:
> I think you fundamentally cannot use two labels that are merely
> "distinct" and bisect correctly.  You need to know which ones
> (i.e. good) are to be excluded and the other (i.e. bad) are to be
> included when computing the "remaining to be tested" set of commits.

Good point. Yes, this isn't viable.

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-03-12 18:31             ` Junio C Hamano
From: Junio C Hamano @ 2014-03-12 18:31 UTC
  To: Jacopo Notarstefano; +Cc: Michael Haggerty, Git Mailing List, Christian Couder

Jacopo Notarstefano <jacopo.notarstefano@gmail.com> writes:

> On Mon, Mar 3, 2014 at 7:34 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> I think you fundamentally cannot use two labels that are merely
>> "distinct" and bisect correctly.  You need to know which ones
>> (i.e. good) are to be excluded and the other (i.e. bad) are to be
>> included when computing the "remaining to be tested" set of commits.
>
> Good point. Yes, this isn't viable.

But if you make them into --no-longer-X vs --still-X, then it will
be viable without us knowing what X means.

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-03-13 17:19               ` Michael Haggerty
From: Michael Haggerty @ 2014-03-13 17:19 UTC
  To: Junio C Hamano; +Cc: Jacopo Notarstefano, Git Mailing List, Christian Couder

On 03/12/2014 07:31 PM, Junio C Hamano wrote:
> Jacopo Notarstefano <jacopo.notarstefano@gmail.com> writes:
> 
>> On Mon, Mar 3, 2014 at 7:34 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> I think you fundamentally cannot use two labels that are merely
>>> "distinct" and bisect correctly.  You need to know which ones
>>> (i.e. good) are to be excluded and the other (i.e. bad) are to be
>>> included when computing the "remaining to be tested" set of commits.
>>
>> Good point. Yes, this isn't viable.
> 
> But if you make them into --no-longer-X vs --still-X, then it will
> be viable without us knowing what X means.

Yes, but who wants to type such long and inelegant option names?

It seems to me that we can infer which mark is which from the normal
bisect user interaction.  At the startup phase of a bisect, there are
only three cases:

1. There are fewer than two different types of marks on tested commits.
   For example, maybe one commit has been marked "bad".  Or two commits
   have both been marked "slow".  In this case we wait for the user to
   choose another commit manually, so we don't have to know the meaning
   of the mark.

2. There are two different types of marks, but no commits with
   differing marks are ancestors of each other.  In this case, we pick
   the merge base of two commits with differing marks and present it
   to the user for testing.  But we can do that without knowing which
   mark is "before the change" and which mark means "after the
   change".  So just defer the inference.

3. There are two different types of marks, and a commit with one mark
   is an ancestor of a commit with the other mark.  In this case, it is
   clear from the ancestry which mark means "before the change" and
   which mark means "after the change".  So record the "orientation" of
   the marks and continue like in the old days.

Of course, there are still details to be worked out, like how to tag the
commits before we know which mark means what.  But that is just a
clerical problem, not a fundamental one.
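
A minimal Python sketch of the three startup cases, assuming history is
a dict from commit id to parent ids and marks are a dict from commit id
to an arbitrary label. `classify` and `merge_bases` are hypothetical
names, and this sketch does not check that every ancestry-related pair
agrees on the orientation.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple, Union

Parents = Dict[str, List[str]]  # hypothetical model: commit -> parents


def reachable(parents: Parents, commit: str) -> Set[str]:
    """`commit` plus all of its ancestors."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def merge_bases(parents: Parents, a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that are not ancestors of another one."""
    common = reachable(parents, a) & reachable(parents, b)
    return {c for c in common
            if not any(c != d and c in reachable(parents, d) for d in common)}


def classify(parents: Parents, marks: Dict[str, str]
             ) -> Tuple[int, Union[None, Set[str], Tuple[str, str]]]:
    """Return (1, None), (2, merge bases to test) or (3, (before, after))."""
    labels = set(marks.values())
    if len(labels) > 2:
        raise ValueError("more than two different marks")
    if len(labels) < 2:
        return 1, None  # case 1: wait for the user to test another commit
    pairs = [(a, b) for a, b in combinations(marks, 2) if marks[a] != marks[b]]
    for a, b in pairs:
        if a in reachable(parents, b):
            return 3, (marks[a], marks[b])  # case 3: orientation is known
        if b in reachable(parents, a):
            return 3, (marks[b], marks[a])
    a, b = pairs[0]
    return 2, merge_bases(parents, a, b)  # case 2: offer a merge base to test
```

One consequence of this sketch: if x and y both fork from a, x is
marked "good", y "bad", and the offered merge base a is then also
marked "bad", `classify` returns `(3, ("bad", "good"))`, i.e. it
silently turns into a hunt for a fix.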

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

* Re: An idea for "git bisect" and a GSoC enquiry
@ 2014-03-13 18:47                 ` Junio C Hamano
From: Junio C Hamano @ 2014-03-13 18:47 UTC
  To: Michael Haggerty; +Cc: Jacopo Notarstefano, Git Mailing List, Christian Couder

Michael Haggerty <mhagger@alum.mit.edu> writes:

> It seems to me that we can infer which mark is which from the normal
> bisect user interaction.  At the startup phase of a bisect, there are
> only three cases:
>
> 1. There are fewer than two different types of marks on tested commits.
>    For example, maybe one commit has been marked "bad".  Or two commits
>    have both been marked "slow".  In this case we wait for the user to
>    choose another commit manually, so we don't have to know the meaning
>    of the mark.
>
> 2. There are two different types of marks, but no commits with
>    differing marks are ancestors of each other.  In this case, we pick
>    the merge base of two commits with differing marks and present it
>    to the user for testing.  But we can do that without knowing which
>    mark is "before the change" and which mark means "after the
>    change".  So just defer the inference.
>
> 3. There are two different types of marks, and a commit with one mark
>    is an ancestor of a commit with the other mark.  In this case, it is
>    clear from the ancestry which mark means "before the change" and
>    which mark means "after the change".  So record the "orientation" of
>    the marks and continue like in the old days.
>
> Of course, there are still details to be worked out, like how to tag the
> commits before we know which mark means what.  But that is just a
> clerical problem, not a fundamental one.

Yup, with an extra "state" kept somewhere in $GIT_DIR, we should in
principle be able to defer the "value judgement" (aka "which one
should be treated as a bottom of the range").

The first change that is needed for this scheme to be workable is to
decide how we mark such an unknown state at the beginning, though.
We assume that we need to keep track of a single top one ("bad", aka
"no-longer-good") while we have to keep track of multiple bottom
ones ("good").

There also is a safety valve in the current logic for transitioning
from case #2 to case #3; when a common ancestor is marked as "bad"
(aka "no-longer-good"), we notice that the original bisection is
screwy in the sense that the user is seeing not just a single state
flip that made something that used to be good into bad.

I am afraid that we may instead _silently_ decide that the user is
trying to locate a state flip that made something that used to be
bad (at the common ancestor) into good with the logic proposed
above.  From the point of view of the user who wanted to find a
regression by marking one as "bad" and the other "good", running
bisection whose semantics suddenly and silently changed into an
opposite "where was it fixed" hunt would be an unpleasant and
confusing experience.  I do not know, without knowing the meaning of
"slow" and "fast" (which implicitly tells us which way the user
intends to bisect), how well we can keep that safety valve.
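
A minimal sketch of this tension. It assumes two situations: the user's intended direction is known up front, as in a classic "good"/"bad" regression hunt, or it is unknown, as with neutral labels such as "slow"/"fast". `orient_from_merge_base` and `BisectError` are hypothetical names. The merge base is an ancestor of both tips, so whatever mark it receives must be the "before" mark. The safety valve can only fire when there is an expected direction to contradict.

```python
from typing import Optional, Tuple


class BisectError(Exception):
    """Raised when a mark contradicts the direction the user implied."""


def orient_from_merge_base(merge_base_mark: str, labels: Tuple[str, str],
                           expected: Optional[Tuple[str, str]] = None) -> Tuple[str, str]:
    """Infer (before, after) from the mark given to the merge base.

    expected: the (before, after) orientation implied by the labels, e.g.
    ("good", "bad") for a regression hunt, or None when the labels carry
    no meaning of their own (e.g. "slow"/"fast").
    """
    if merge_base_mark not in labels:
        raise ValueError("unknown mark: " + merge_base_mark)
    after = labels[1] if merge_base_mark == labels[0] else labels[0]
    inferred = (merge_base_mark, after)
    if expected is not None and inferred != expected:
        # Safety valve: refuse rather than silently turning a regression
        # hunt into a "where was it fixed" hunt.
        raise BisectError("merge base marked %r contradicts expected %r -> %r"
                          % (merge_base_mark, expected[0], expected[1]))
    return inferred
```

Without `expected`, marking the merge base "bad" quietly yields ("bad", "good"). That is exactly the silent reversal described above.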

Other than that, I like the idea.

[parent not found: <CAL0uuq3TGb2wjaqNxwXYa++E5rjVoozox5mZbzTaE17OKtsVTg@mail.gmail.com>]

[parent not found: <a8cf74b4-bae1-4511-a45e-d4ca90e3c3e1@email.android.com>]

* Re: An idea for "git bisect" and a GSoC enquiry
       [not found]     ` <a8cf74b4-bae1-4511-a45e-d4ca90e3c3e1@email.android.com>
@ 2014-02-28  9:07       ` Jacopo Notarstefano
  2014-02-28  9:13       ` Jacopo Notarstefano
  1 sibling, 0 replies; 17+ messages in thread
From: Jacopo Notarstefano @ 2014-02-28  9:07 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, Junio C Hamano, Christian Couder

This email was sent privately by Michael to me as a result of my
previous error. I'm quoting it in its entirety so that he doesn't have
to submit it twice.

On Thu, Feb 27, 2014 at 8:32 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> Please forgive my typos and brevity; this was typed on a phone.
>
> Michael
> On February 27, 2014 5:16:40 PM CET, Jacopo Notarstefano <jacopo.notarstefano@gmail.com> wrote:
>>On Thu, Feb 27, 2014 at 12:18 PM, Michael Haggerty
>><mhagger@alum.mit.edu> wrote:
>>> What happens if the user mixes, say, "good" and "fixed" in a single
>>> bisect session?
>>>
>>
>>I don't think that's an issue. If the user uses the label "fixed"
>>instead of "bad" she will have a hard time remembering to use it every
>>time she needs it, and maybe the output of "git bisect" will look very
>>confusing, but what can git do? This is a semantic user input error,
>>not a syntax one.
>
> - git could emit an error message and refuse to continue
> - git could interpret the command one way or the other, with or without a warning
>
> By my count that gives at least five possibilities. The feature cannot be implemented without choosing one.
>
>>> I think it would be more convenient if "git bisect" would autodetect
>>> whether the history went from "good" to "bad" or vice versa.  The
>>> algorithm could be:
>>>
>>> 1. Wait until the user has marked one commit "bad" and one commit "good".
>>>
>>> 2. If a "good" commit is an ancestor of a "bad" one, then "git bisect"
>>> should announce "I will now look for the first bad commit".  If
>>> reversed, then announce "I will now look for the first good commit".  If
>>> neither commit is an ancestor of the other, then explain the situation
>>> and ask the user to run "git bisect find-first-bad" or "git bisect
>>> find-first-good" or to mark another commit "bad" or "good".
>>>
>>> 3. If the user marks another commit, go back to step 2, also doing a
>>> consistency check to make sure that all of the ancestry relationships go
>>> in a consistent direction.
>>>
>>> 4. After the direction is clear, the old bisect algorithm can be used
>>> (though taking account of the direction).  Obviously a lot of the output
>>> would have to be adjusted, as would the way that a bisect is visualized.
>>>
>>> I can't think of any fundamental problems with a scheme like this, and I
>>> think it would be easier to use than the unfixed/fixed scheme.  But that
>>> is only my opinion; other opinions are undoubtedly available :-)
>>>
>>
>>I like this idea! It also looks fun to implement. A minor difference
>>is that I'd rather die with an error on point 2) if there's no
>>ancestorship relation between the two commits; if the user is asking
>>for such a thing then she has a fundamental misconception of the state
>>of her repository.
>
> That is not correct. If there is a bug on one branch but not another, it is legitimate to ask when the bug was introduced, and git bisect can indeed handle this case today (think about how this could work, and try it!)
>
>>> By the way, although "git bisect fixed/unfixed" would be a very useful
>>> improvement, and has gone unimplemented for a lamentably long time, my
>>> personal feeling is that it has too meat in it to constitute a GSoC
>>> project by itself.
>>>
>>
>>Oh! Then in fact, as Christian Couder said, this project shouldn't be
>>marked as "easy".
>
> Sorry for the typo; I meant to say "too LITTLE meat".
>
>
> --
> Michael Haggerty
> mhagger@alum.mit.edu
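
Steps 1 to 3 of the autodetection proposed in the quoted message can be sketched as follows. The sketch assumes the same kind of toy graph (commit id -> parent ids) and uses hypothetical function names. The announcement strings are the ones quoted above. "find-first-bad" and "find-first-good" were only proposed subcommands, so the sketch returns None when the user would have to choose.

```python
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, List[str]]  # hypothetical toy history: commit -> parents


def ancestors(graph: Graph, commit: str) -> Set[str]:
    """Return every commit reachable from `commit` via parents, itself included."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, []))
    return seen


def detect_direction(graph: Graph, good: Iterable[str], bad: Iterable[str]) -> Optional[str]:
    good_set, bad_set = set(good), set(bad)
    if not good_set or not bad_set:
        return None  # step 1: wait for at least one mark of each kind
    # good -> bad: some good commit is an ancestor of some bad commit.
    forward = any(g in ancestors(graph, b) for g in good_set for b in bad_set)
    # bad -> good: some bad commit is an ancestor of some good commit.
    backward = any(b in ancestors(graph, g) for g in good_set for b in bad_set)
    if forward and backward:
        # Step 3's consistency check: the ancestry must point one way only.
        raise ValueError("inconsistent marks: ancestry runs in both directions")
    if forward:
        return "I will now look for the first bad commit"
    if backward:
        return "I will now look for the first good commit"
    return None  # unrelated commits: explain, ask for a direction or another mark
```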

* Re: An idea for "git bisect" and a GSoC enquiry
       [not found]     ` <a8cf74b4-bae1-4511-a45e-d4ca90e3c3e1@email.android.com>
  2014-02-28  9:07       ` Jacopo Notarstefano
@ 2014-02-28  9:13       ` Jacopo Notarstefano
  1 sibling, 0 replies; 17+ messages in thread
From: Jacopo Notarstefano @ 2014-02-28  9:13 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, Junio C Hamano, Christian Couder

> - git could emit an error message and refuse to continue
> - git could interpret the command one way or the other, with or without a warning
>
> By my count that gives at least five possibilities. The feature cannot be implemented without choosing one.
>

Let me explain what I meant with an example.
1) The user starts bisecting with bisect start.
2) The user marks HEAD as good with git bisect mark good.
3) The user then marks HEAD~10 as fixed with git bisect mark fixed.
4) Git will then continue bisecting as usual with the labels "good"
and "fixed" instead of "bad" and "good" respectively.

This is very confusing, but is a result of a user semantic error, so
no warning is emitted. After all, this might have been what the user
wanted.
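
The count of "at least five possibilities" in the quoted reply can be checked by enumeration. This reads "one way or the other" as two interpretations, each with or without a warning; that reading is an assumption.

```python
from itertools import product

# Option 1: refuse the mixed label outright.
options = [("emit an error and refuse to continue",)]
# Options 2-5: accept it under one of two readings, with or without a warning.
options += product(("treat 'fixed' like 'good'", "treat 'fixed' like 'bad'"),
                   ("with a warning", "without a warning"))

for option in options:
    print(" / ".join(option))
print(len(options))
```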

> That is not correct. If there is a bug on one branch but not another, it is legitimate to ask when the bug was introduced, and git bisect can indeed handle this case today (think about how this could work, and try it!)
>

Interesting. I did not know that. Yes, I see how that might pan out,
and why my idea is worse.

> Sorry for the typo; I meant to say "too LITTLE meat".
>

Ok. Not a big issue for me: I might squash another project together in
my proposal. I've already seen one that piqued my interest: "Unifying
git branch -l, git tag -l, and git for-each-ref".

(Sorry for sending this email twice! I thought I had sent it to the
list as well.)

* Re: An idea for "git bisect" and a GSoC enquiry
  2014-02-26  8:28 An idea for "git bisect" and a GSoC enquiry Jacopo Notarstefano
  2014-02-26 19:58 ` Junio C Hamano
  2014-02-27 11:18 ` Michael Haggerty
@ 2014-02-27 14:47 ` Christian Couder
  2014-02-27 22:46   ` Andrew Ardill
  2 siblings, 1 reply; 17+ messages in thread
From: Christian Couder @ 2014-02-27 14:47 UTC (permalink / raw)
  To: Jacopo Notarstefano; +Cc: git

Hi,

On Wed, Feb 26, 2014 at 9:28 AM, Jacopo Notarstefano
<jacopo.notarstefano@gmail.com> wrote:
> Hey everyone,
>
> my name is Jacopo, a student developer from Italy, and I'm interested
> in applying to this year's Google Summer of Code. I set my eyes on the
> project called "git-bisect improvements", in particular the subtask
> about swapping the "good" and "bad" labels when looking for a
> bug-fixing release.
>
> I have a very simple proposal for that: add a new "mark" subcommand.
> Here is an example of how it should work:
>
> 1) A developer wants to find in which commit a past regression was
> fixed. She starts bisecting as usual with "git bisect start".
> 2) The current HEAD has the bugfix, so she marks it as fixed with "git
> bisect mark fixed".
> 3) She knows that HEAD~100 had the regression, so she marks it as
> unfixed with "git bisect mark unfixed".
> 4) Now that git knows what the two labels are, it starts bisecting as usual.
>
> For compatibility with already written scripts, "git bisect good" and
> "git bisect bad" will alias to "git bisect mark good" and "git bisect
> mark bad" respectively.
>
> Does this make sense? Did I overlook some details?

As Junio said, adding a "mark" command does not by itself solve the
difficult problems related to this project.
(By the way, I think it is misleading to state that this GSoC project is "easy".)

> There were already several proposals on this topic, among which those
> listed at https://git.wiki.kernel.org/index.php/SmallProjectsIdeas#git_bisect_fix.2Funfixed.
> I'm interested in contacting the prospective mentor, Christian Couder,
> to go over these. What's the proper way to ask for an introduction?

As Michael said, you can just CC me or send me a private email.

But I think the most important thing right now is first to gather as
much information as you can from the previous discussions on this
topic on this mailing list.
Perhaps you should also gather information on how git bisect works.

It will help you understand what the difficult problems are.

One of the problems, for example, is that git bisect can work using a
"good" commit that is not an ancestor of the "bad" commit.
In this case it will check out the merge bases between the good and the
bad commit. (And by the way, this is related to the bug that should
also be fixed as part of this project.)
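
A minimal sketch of why merge bases enter the picture. It assumes a toy graph (commit id -> parent ids) and hypothetical helper names, and it is not git's implementation. When the "good" commit is not an ancestor of the "bad" one, their best common ancestors are the commits that need testing before ordinary bisection can proceed. git itself can list them with `git merge-base --all <good> <bad>`.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # hypothetical toy history: commit -> parents


def ancestors(graph: Graph, commit: str) -> Set[str]:
    """Return every commit reachable from `commit` via parents, itself included."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, []))
    return seen


def merge_bases(graph: Graph, a: str, b: str) -> Set[str]:
    """Best common ancestors: common ancestors not reachable from another one."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {c for c in common
            if not any(c != d and c in ancestors(graph, d) for d in common)}


def commits_to_test_first(graph: Graph, good: str, bad: str) -> Set[str]:
    """Nothing extra when good is an ancestor of bad; otherwise the merge bases."""
    if good in ancestors(graph, bad):
        return set()
    return merge_bases(graph, good, bad)
```

For two branches G and B that forked at P, this returns {P}. Testing P tells whether the bug predates the fork.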

Then you are welcome to come back and ask questions, or suggest solutions.

> I tried asking on IRC, but had no success.

Sorry but I don't use IRC.

Thanks,
Christian.

* Re: An idea for "git bisect" and a GSoC enquiry
  2014-02-27 14:47 ` Christian Couder
@ 2014-02-27 22:46   ` Andrew Ardill
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Ardill @ 2014-02-27 22:46 UTC (permalink / raw)
  To: Christian Couder; +Cc: Jacopo Notarstefano, git

On 27 February 2014 06:47, Christian Couder <christian.couder@gmail.com> wrote:
> But I think the most important thing right now is first to gather as
> much information as you can from the previous discussions on this
> topic on this mailing list.
> Perhaps you should also gather information on how git bisect works.

I have also, at one time, started working on this problem, though I
never submitted any of my patches :(. I went the way of renaming the
internal logic to make it less tied to the good/bad distinction that
is currently hard coded in. That may not be the best starting point,
but let me summarise the thoughts I had at the time, particularly
around the different adjective pairs that we might use.

A general description of git bisect is that you start with a commit
that exhibits a given property, find a commit that does not have that
property, and then look for when the property was introduced. I think
of this property as the 'bisect property' of the bisect search. The
property is described with our adjective pair, currently 'bad' (with
the property) and 'good' (without the property). We assume that
commits with the property have an ancestor without the property, and
as this assumption is so essential to how git bisect works I think of
it as the 'bisect relationship' of the bisect search, and we care
about the direction of this relationship between commits.
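
A minimal sketch of the search described above, reduced to a linear history (a list ordered oldest to newest). It assumes the property is monotonic: once a commit has it, every later commit does too. `first_with` is a hypothetical name. Real `git bisect` works on a DAG and chooses commits that roughly halve the remaining candidates, which this ignores.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_with(history: Sequence[T], has_property: Callable[[T], bool]) -> int:
    """Return the index of the first commit that has the bisect property."""
    if has_property(history[0]) or not has_property(history[-1]):
        raise ValueError("need a 'without' commit first and a 'with' commit last")
    lo, hi = 0, len(history) - 1  # invariant: history[lo] without, history[hi] with
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_property(history[mid]):
            hi = mid
        else:
            lo = mid
    return hi
```

For example, `first_with(list(range(10)), lambda c: c >= 6)` returns 6. Only the predicate encodes the bisect property, so the adjective pair matters for how results are reported, not for the search itself.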

The proposed adjectives tend to be along the lines of the following:

- good->bad (current); good<->bad
The bisect property is currently always described as 'bad', the
introduction of a bug being the motivating use case. The problem with
this is that we often want to find when a 'good' behaviour was
introduced, or when a neutral change occurred.
A solution is to allow reversing our bisect relationship, by either
detecting the intended direction or allowing the user to choose. If we
reverse the direction our adjectives also flip, and so the bisect
property we are now looking for is 'good' instead of 'bad'. The terms
good and bad don't work well with neutral searches.

- unfixed->fixed
For this pair, the bisect property would always be described by the
'fixed' adjective. It seems odd to ever reverse the bisect
relationship, as we don't usually say something was 'fixed' and then
became 'unfixed'. The behaviour of this pair would thus be nearly
identical to current usage of 'good->bad', but with the bisect
property conceptually reversed (when was a bug fixed vs when was a bug
introduced).

- old->new
This pair avoids making any judgement on what type of bisect property
we have. The adjectives are thus simply describing the bisect
relationship, and the user is free to use any bisect property they
wish. The main problem with this is that it is possible to have
commits without the property (thus described as 'old') that were made
chronologically after a commit with the property ('new'). This has the
potential to cause confusion for users.

- without->with
This pair also avoids making a judgement on the bisect property, but
avoids potential chronological confusion that 'old->new' has. You
could potentially allow users to reverse the bisect relationship's
direction, but these adjectives allow you to easily invert the bisect
property without causing confusion. For example, 'without bug XYZ' can
instead be written as 'with bug XYZ fixed'.

----

My preference is for the without->with adjective pair, as I believe it
maps most closely to the concept of finding a commit that changed a
given property, and it allows that property to be negated without
introducing too much confusion. Reversing the relationship's direction
would also make sense; however, that is a significantly greater change
to the command's logic.

Thus, my initial work was to refactor the internal naming to use the
terms with and without, as that would make a better place from which
to add other features (such as reversing the relationship direction,
or adding new adjective pairs).
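
One way to picture this refactoring is a small table of term pairs mapped onto internal "without"/"with" sides. This is a hypothetical sketch: `TermPair`, `internal_side` and `reversed_terms` are illustrative names, not git code. Under that model, reversing the bisect relationship amounts to swapping the two adjectives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermPair:
    """User-facing adjectives for the two sides of a bisection (hypothetical)."""
    without: str  # commits lacking the bisect property
    with_: str    # commits having the bisect property


# The pairs discussed above, expressed in internal with/without terms.
GOOD_BAD = TermPair(without="good", with_="bad")
UNFIXED_FIXED = TermPair(without="unfixed", with_="fixed")
OLD_NEW = TermPair(without="old", with_="new")
WITHOUT_WITH = TermPair(without="without", with_="with")


def internal_side(terms: TermPair, user_mark: str) -> str:
    """Translate a user-supplied mark into the internal 'without'/'with' side."""
    if user_mark == terms.without:
        return "without"
    if user_mark == terms.with_:
        return "with"
    raise ValueError(f"{user_mark!r} is not one of {terms.without!r}/{terms.with_!r}")


def reversed_terms(terms: TermPair) -> TermPair:
    """Reversing the relationship swaps which adjective names the property."""
    return TermPair(without=terms.with_, with_=terms.without)
```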

Sorry if that is all confusing to read, or if I'm repeating things
that have been said before :)

Regards,

Andrew Ardill

end of thread, other threads:[~2014-03-13 18:47 UTC | newest]

Thread overview: 17+ messages
2014-02-26  8:28 An idea for "git bisect" and a GSoC enquiry Jacopo Notarstefano
2014-02-26 19:58 ` Junio C Hamano
2014-02-28  9:00   ` Jacopo Notarstefano
2014-02-27 11:18 ` Michael Haggerty
2014-02-27 12:09   ` Matthieu Moy
2014-02-28  9:03   ` Jacopo Notarstefano
2014-02-28 18:31     ` Junio C Hamano
2014-03-01 11:31       ` Jacopo Notarstefano
2014-03-03 18:34         ` Junio C Hamano
2014-03-12  1:32           ` Jacopo Notarstefano
2014-03-12 18:31             ` Junio C Hamano
2014-03-13 17:19               ` Michael Haggerty
2014-03-13 18:47                 ` Junio C Hamano
     [not found]   ` <CAL0uuq3TGb2wjaqNxwXYa++E5rjVoozox5mZbzTaE17OKtsVTg@mail.gmail.com>
     [not found]     ` <a8cf74b4-bae1-4511-a45e-d4ca90e3c3e1@email.android.com>
2014-02-28  9:07       ` Jacopo Notarstefano
2014-02-28  9:13       ` Jacopo Notarstefano
2014-02-27 14:47 ` Christian Couder
2014-02-27 22:46   ` Andrew Ardill
