* [BUG-ish] diff compaction heuristic false positive @ 2016-06-10 7:50 Jeff King 2016-06-10 8:31 ` Jeff King 2016-06-10 8:31 ` [BUG-ish] diff compaction heuristic false positive Michael Haggerty 0 siblings, 2 replies; 16+ messages in thread From: Jeff King @ 2016-06-10 7:50 UTC (permalink / raw) To: Stefan Beller; +Cc: Junio C Hamano, Jacob Keller, git I found a false positive with the new compaction heuristic in v2.9: -- >8 -- # start with a simple file cat >file.rb <<\EOF def foo do_foo_stuff() common_ending() end EOF git add file.rb git commit -m base # and then add another function with a similar ending cat >>file.rb <<\EOF def bar do_bar_stuff() common_ending() end EOF -- 8< -- I get this rather unfortunate diff: $ git diff diff --git a/file.rb b/file.rb index bd9d1cb..67fbeba 100644 --- a/file.rb +++ b/file.rb @@ -1,5 +1,11 @@ def foo do_foo_stuff() + common_ending() +end + +def bar + do_bar_stuff() + common_ending() end but without the compaction heuristic (or with an older git), I get: $ git -c diff.compactionHeuristic=false diff diff --git a/file.rb b/file.rb index bd9d1cb..67fbeba 100644 --- a/file.rb +++ b/file.rb @@ -3,3 +3,9 @@ def foo common_ending() end + +def bar + do_bar_stuff() + + common_ending() +end :( The problem is that the common bits are separated from the interesting bits by a blank line. This is simplified from a real-world example, but I think you could come up with the same example in C, like: if (foo) { do_foo(); something_else(); } -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG-ish] diff compaction heuristic false positive 2016-06-10 7:50 [BUG-ish] diff compaction heuristic false positive Jeff King @ 2016-06-10 8:31 ` Jeff King 2016-06-10 15:56 ` Junio C Hamano 2016-06-10 8:31 ` [BUG-ish] diff compaction heuristic false positive Michael Haggerty 1 sibling, 1 reply; 16+ messages in thread From: Jeff King @ 2016-06-10 8:31 UTC (permalink / raw) To: Stefan Beller; +Cc: Junio C Hamano, Jacob Keller, git On Fri, Jun 10, 2016 at 03:50:43AM -0400, Jeff King wrote: > I found a false positive with the new compaction heuristic in v2.9: > [...] And by the way, this is less "hey neat, I found a case" and more "wow, this is a lot worse than I thought". I diffed the old and new output for the top 10,000 commits in this particular ruby code base. There were 45 commits with changed diffs. Spot-checking them manually, a little over 1/3 of them featured this bad pattern. The others looked like strict improvements. That's a lot worse than the outcomes we saw on other code bases earlier. 1/3 bad is still a net improvement, so I dunno. Is this worth worrying about? Should we bring back the documentation for the knob to disable it? Should we consider making it tunable via gitattributes? I don't think that last one really helps; the good cases _and_ the bad ones are both in ruby code (though certainly the C code we looked at earlier was all good). It may also be possible to make it Just Work by using extra information like indentation. I haven't thought hard enough about that to say. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG-ish] diff compaction heuristic false positive 2016-06-10 8:31 ` Jeff King @ 2016-06-10 15:56 ` Junio C Hamano 2016-06-10 16:25 ` Stefan Beller 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2016-06-10 15:56 UTC (permalink / raw) To: Jeff King; +Cc: Stefan Beller, Jacob Keller, git Jeff King <peff@peff.net> writes: > On Fri, Jun 10, 2016 at 03:50:43AM -0400, Jeff King wrote: > >> I found a false positive with the new compaction heuristic in v2.9: >> [...] > > And by the way, this is less "hey neat, I found a case" and more "wow, > this is a lot worse than I thought". > > I diffed the old and new output for the top 10,000 commits in this > particular ruby code base. There were 45 commits with changed diffs. > Spot-checking them manually, a little over 1/3 of them featured this bad > pattern. The others looked like strict improvements. > > That's a lot worse than the outcomes we saw on other code bases earlier. > 1/3 bad is still a net improvement, so I dunno. Is this worth worrying > about? Should we bring back the documentation for the knob to disable > it? Should we consider making it tunable via gitattributes? > > I don't think that last one really helps; the good cases _and_ the bad > ones are both in ruby code (though certainly the C code we looked at > earlier was all good). > > It may also be possible to make it Just Work by using extra information > like indentation. I haven't thought hard enough about that to say. > > -Peff I recall saying "we'd end up being better in some and worse in others" at the very beginning. How about toggling the default back for the upcoming release, keeping the experimentation knob in the code, and try different heuristics like the "indentation" during the next cycle? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG-ish] diff compaction heuristic false positive 2016-06-10 15:56 ` Junio C Hamano @ 2016-06-10 16:25 ` Stefan Beller 2016-06-10 16:29 ` Jacob Keller 0 siblings, 1 reply; 16+ messages in thread From: Stefan Beller @ 2016-06-10 16:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Jacob Keller, git@vger.kernel.org On Fri, Jun 10, 2016 at 8:56 AM, Junio C Hamano <gitster@pobox.com> wrote: > Jeff King <peff@peff.net> writes: > >> On Fri, Jun 10, 2016 at 03:50:43AM -0400, Jeff King wrote: >> >>> I found a false positive with the new compaction heuristic in v2.9: >>> [...] >> >> And by the way, this is less "hey neat, I found a case" and more "wow, >> this is a lot worse than I thought". >> >> I diffed the old and new output for the top 10,000 commits in this >> particular ruby code base. There were 45 commits with changed diffs. >> Spot-checking them manually, a little over 1/3 of them featured this bad >> pattern. The others looked like strict improvements. >> >> That's a lot worse than the outcomes we saw on other code bases earlier. >> 1/3 bad is still a net improvement, so I dunno. Is this worth worrying >> about? Should we bring back the documentation for the knob to disable >> it? Should we consider making it tunable via gitattributes? >> >> I don't think that last one really helps; the good cases _and_ the bad >> ones are both in ruby code (though certainly the C code we looked at >> earlier was all good). >> >> It may also be possible to make it Just Work by using extra information >> like indentation. I haven't thought hard enough about that to say. >> >> -Peff > > I recall saying "we'd end up being better in some and worse in > others" at the very beginning. How about toggling the default back > for the upcoming release, keeping the experimentation knob in the > code, and try different heuristics like the "indentation" during the > next cycle? Sure. I thought about for a while now and by now I agree with Junio. No matter what kind of heuristic we can come up with it is easy to construct a counter example. That said, let's try the indentation thing, though I suspect one of the early motivating examples (an excerpt from a kernel config file) would not do well with it, as it had not an indentation scheme as programming languages do. Thanks, Stefan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG-ish] diff compaction heuristic false positive 2016-06-10 16:25 ` Stefan Beller @ 2016-06-10 16:29 ` Jacob Keller 2016-06-10 18:13 ` Re* " Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Jacob Keller @ 2016-06-10 16:29 UTC (permalink / raw) To: Stefan Beller; +Cc: Junio C Hamano, Jeff King, git@vger.kernel.org On Fri, Jun 10, 2016 at 9:25 AM, Stefan Beller <sbeller@google.com> wrote: > On Fri, Jun 10, 2016 at 8:56 AM, Junio C Hamano <gitster@pobox.com> wrote: >> Jeff King <peff@peff.net> writes: >> >>> On Fri, Jun 10, 2016 at 03:50:43AM -0400, Jeff King wrote: >>> >>>> I found a false positive with the new compaction heuristic in v2.9: >>>> [...] >>> >>> And by the way, this is less "hey neat, I found a case" and more "wow, >>> this is a lot worse than I thought". >>> >>> I diffed the old and new output for the top 10,000 commits in this >>> particular ruby code base. There were 45 commits with changed diffs. >>> Spot-checking them manually, a little over 1/3 of them featured this bad >>> pattern. The others looked like strict improvements. >>> >>> That's a lot worse than the outcomes we saw on other code bases earlier. >>> 1/3 bad is still a net improvement, so I dunno. Is this worth worrying >>> about? Should we bring back the documentation for the knob to disable >>> it? Should we consider making it tunable via gitattributes? >>> >>> I don't think that last one really helps; the good cases _and_ the bad >>> ones are both in ruby code (though certainly the C code we looked at >>> earlier was all good). >>> >>> It may also be possible to make it Just Work by using extra information >>> like indentation. I haven't thought hard enough about that to say. >>> >>> -Peff >> >> I recall saying "we'd end up being better in some and worse in >> others" at the very beginning. How about toggling the default back >> for the upcoming release, keeping the experimentation knob in the >> code, and try different heuristics like the "indentation" during the >> next cycle? > > Sure. I thought about for a while now and by now I agree with Junio. > No matter what kind of heuristic we can come up with it is easy to construct > a counter example. > > That said, let's try the indentation thing, though I suspect > one of the early motivating examples (an excerpt from a kernel config file) > would not do well with it, as it had not an indentation scheme as programming > languages do. > > Thanks, > Stefan I think we could use the indentation trick and it might help in this case. I agree, let's disable this for this cycle and experiment in the next one. Good catch, Peff. As others have said you will always be able to produce counter examples, that's the nature of heuristics. The idea is to see if we can come up with something simple that mostly improves the output, even if sometimes it might have a negative impact on the outputs. But I think we should avoid changing behavior unless it's mostly an improvement. Regards, Jake ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re* [BUG-ish] diff compaction heuristic false positive 2016-06-10 16:29 ` Jacob Keller @ 2016-06-10 18:13 ` Junio C Hamano 2016-06-10 18:21 ` Stefan Beller 2016-06-10 20:30 ` Jeff King 0 siblings, 2 replies; 16+ messages in thread From: Junio C Hamano @ 2016-06-10 18:13 UTC (permalink / raw) To: Jacob Keller; +Cc: Stefan Beller, Jeff King, git@vger.kernel.org Jacob Keller <jacob.keller@gmail.com> writes: > I think we could use the indentation trick and it might help in this > case. I agree, let's disable this for this cycle and experiment in the > next one. Good catch, Peff. > > As others have said you will always be able to produce counter > examples, that's the nature of heuristics. The idea is to see if we > can come up with something simple that mostly improves the output, > even if sometimes it might have a negative impact on the outputs. But > I think we should avoid changing behavior unless it's mostly an > improvement. OK, let's do this then for the upcoming release for now. I am tempted to flip it back on after the release in 'next', so that developers would be exposed to the heuristic by default, though. -- >8 -- Subject: [PATCH] diff: disable compaction heuristic for now http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net reports that a change to add a new "function" with common ending with the existing one at the end of the file is shown like this: def foo do_foo_stuff() + common_ending() +end + +def bar + do_bar_stuff() + common_ending() end when the new heuristic is in use. In reality, the change is to add the blank line before "def bar" and everything below, which is what the code without the new heuristic shows. Disable the heuristics by default and leave it as an experimental feature for now. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 05ca3ce..9116d9d 100644 --- a/diff.c +++ b/diff.c @@ -25,7 +25,7 @@ #endif static int diff_detect_rename_default; -static int diff_compaction_heuristic = 1; +static int diff_compaction_heuristic; /* experimental */ static int diff_rename_limit_default = 400; static int diff_suppress_blank_empty; static int diff_use_color_default = -1; -- 2.9.0-rc2-275-g493bdbb ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Re* [BUG-ish] diff compaction heuristic false positive 2016-06-10 18:13 ` Re* " Junio C Hamano @ 2016-06-10 18:21 ` Stefan Beller 2016-06-10 20:30 ` Jeff King 1 sibling, 0 replies; 16+ messages in thread From: Stefan Beller @ 2016-06-10 18:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jacob Keller, Jeff King, git@vger.kernel.org On Fri, Jun 10, 2016 at 11:13 AM, Junio C Hamano <gitster@pobox.com> wrote: > Subject: [PATCH] diff: disable compaction heuristic for now > > http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net > reports that a change to add a new "function" with common ending > with the existing one at the end of the file is shown like this: > > def foo > do_foo_stuff() > > + common_ending() > +end > + > +def bar > + do_bar_stuff() > + > common_ending() > end > > when the new heuristic is in use. In reality, the change is to add > the blank line before "def bar" and everything below, which is what > the code without the new heuristic shows. > > Disable the heuristics by default and leave it as an experimental > feature for now. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- Thanks, Stefan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Re* [BUG-ish] diff compaction heuristic false positive 2016-06-10 18:13 ` Re* " Junio C Hamano 2016-06-10 18:21 ` Stefan Beller @ 2016-06-10 20:30 ` Jeff King 2016-06-10 20:48 ` [PATCH v2] diff: disable compaction heuristic for now Junio C Hamano 1 sibling, 1 reply; 16+ messages in thread From: Jeff King @ 2016-06-10 20:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jacob Keller, Stefan Beller, git@vger.kernel.org On Fri, Jun 10, 2016 at 11:13:10AM -0700, Junio C Hamano wrote: > Jacob Keller <jacob.keller@gmail.com> writes: > > > I think we could use the indentation trick and it might help in this > > case. I agree, let's disable this for this cycle and experiment in the > > next one. Good catch, Peff. > > > > As others have said you will always be able to produce counter > > examples, that's the nature of heuristics. The idea is to see if we > > can come up with something simple that mostly improves the output, > > even if sometimes it might have a negative impact on the outputs. But > > I think we should avoid changing behavior unless it's mostly an > > improvement. > > OK, let's do this then for the upcoming release for now. I am > tempted to flip it back on after the release in 'next', so that > developers would be exposed to the heuristic by default, though. Thanks for the patch, and I agree flipping it back on in "next" is a good idea. It may be that I am making a fuss over nothing. As you say, we always knew that it might have false positives. Mostly I was just surprised how frequent they were in this example (I also wondered why the same pattern did not trigger in the C code we looked at). > -- >8 -- > Subject: [PATCH] diff: disable compaction heuristic for now Looks good. We probably want a patch to the release notes to note that it's not on by default. And we may want to advertise the experimental knob so that people actually try it (otherwise we won't get feedback from the masses). -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2] diff: disable compaction heuristic for now 2016-06-10 20:30 ` Jeff King @ 2016-06-10 20:48 ` Junio C Hamano 2016-06-10 20:53 ` Jeff King 2016-06-10 20:55 ` Junio C Hamano 0 siblings, 2 replies; 16+ messages in thread From: Junio C Hamano @ 2016-06-10 20:48 UTC (permalink / raw) To: Jeff King; +Cc: Jacob Keller, Stefan Beller, git@vger.kernel.org http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net reports that a change to add a new "function" with common ending with the existing one at the end of the file is shown like this: def foo do_foo_stuff() + common_ending() +end + +def bar + do_bar_stuff() + common_ending() end when the new heuristic is in use. In reality, the change is to add the blank line before "def bar" and everything below, which is what the code without the new heuristic shows. Disable the heuristics by default, and resurrect the documentation for the option and the configuration variables, while clearly marking the feature as still experimental. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Jeff King <peff@peff.net> writes: > We probably want a patch to the release notes to note that it's not on > by default. And we may want to advertise the experimental knob so > that people actually try it (otherwise we won't get feedback from the > masses). OK, I think that is sensible. The interdiff is not a strict reversion of 77085a61 (diff: undocument the compaction heuristic knobs for experimentation, 2016-05-02) but stresses that the feature is off by default and is experimental. Documentation/diff-config.txt | 5 +++++ Documentation/diff-options.txt | 7 +++++++ diff.c | 2 +- 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt index 6eaa452..6fb70c5 100644 --- a/Documentation/diff-config.txt +++ b/Documentation/diff-config.txt @@ -166,6 +166,11 @@ diff.tool:: include::mergetools-diff.txt[] +diff.compactionHeuristic:: + Set this option to `true` to enable an experimental heuristic that + shifts the hunk boundary in an attempt to make the resulting + patch easier to read. + diff.algorithm:: Choose a diff algorithm. The variants are as follows: + diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 3ad6404..9baf1ad 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -63,6 +63,13 @@ ifndef::git-format-patch[] Synonym for `-p --raw`. endif::git-format-patch[] +--compaction-heuristic:: +--no-compaction-heuristic:: + These are to help debugging and tuning an experimental + heuristic (which is off by default) that shifts the hunk + boundary in an attempt to make the resulting patch easier + to read. + --minimal:: Spend extra time to make sure the smallest possible diff is produced. diff --git a/diff.c b/diff.c index 05ca3ce..9116d9d 100644 --- a/diff.c +++ b/diff.c @@ -25,7 +25,7 @@ #endif static int diff_detect_rename_default; -static int diff_compaction_heuristic = 1; +static int diff_compaction_heuristic; /* experimental */ static int diff_rename_limit_default = 400; static int diff_suppress_blank_empty; static int diff_use_color_default = -1; -- 2.9.0-rc2-285-ge226c12 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2] diff: disable compaction heuristic for now 2016-06-10 20:48 ` [PATCH v2] diff: disable compaction heuristic for now Junio C Hamano @ 2016-06-10 20:53 ` Jeff King 2016-06-10 20:55 ` Junio C Hamano 1 sibling, 0 replies; 16+ messages in thread From: Jeff King @ 2016-06-10 20:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jacob Keller, Stefan Beller, git@vger.kernel.org On Fri, Jun 10, 2016 at 01:48:58PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > We probably want a patch to the release notes to note that it's not on > > by default. And we may want to advertise the experimental knob so > > that people actually try it (otherwise we won't get feedback from the > > masses). > > OK, I think that is sensible. The interdiff is not a strict > reversion of 77085a61 (diff: undocument the compaction heuristic > knobs for experimentation, 2016-05-02) but stresses that the > feature is off by default and is experimental. Looks good to me. Do we want to include in "experimental" that the option and command-line flag might go away in the future? I think probably not. I do not mind supporting this forever (it _has_ shown its usefulness, so I don't think it is any worse to maintain going forward than things like --patience, which people can tweak if their diff looks uglier than they would like). -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] diff: disable compaction heuristic for now 2016-06-10 20:48 ` [PATCH v2] diff: disable compaction heuristic for now Junio C Hamano 2016-06-10 20:53 ` Jeff King @ 2016-06-10 20:55 ` Junio C Hamano 2016-06-10 21:05 ` Jeff King 1 sibling, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2016-06-10 20:55 UTC (permalink / raw) To: Jeff King; +Cc: Jacob Keller, Stefan Beller, git@vger.kernel.org Junio C Hamano <gitster@pobox.com> writes: > Jeff King <peff@peff.net> writes: > > > We probably want a patch to the release notes to note that it's not on > > by default. And we may want to advertise the experimental knob so > > that people actually try it (otherwise we won't get feedback from the > > masses). > > OK, I think that is sensible. The interdiff is not a strict > reversion of 77085a61 (diff: undocument the compaction heuristic > knobs for experimentation, 2016-05-02) but stresses that the > feature is off by default and is experimental. This is for 'master' when the topic is merged (will keep it in stash for now). Documentation/RelNotes/2.9.0.txt | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/Documentation/RelNotes/2.9.0.txt b/Documentation/RelNotes/2.9.0.txt index 927cb9b..79b46e1 100644 --- a/Documentation/RelNotes/2.9.0.txt +++ b/Documentation/RelNotes/2.9.0.txt @@ -118,9 +118,11 @@ UI, Workflows & Features * HTTP transport clients learned to throw extra HTTP headers at the server, specified via http.extraHeader configuration variable. - * Patch output from "git diff" and friends has been tweaked to be - more readable by using a blank line as a strong hint that the - contents before and after it belong to logically separate units. + * The "--compaction-heuristic" option to "git diff" family of + commands enables a heuristic to make the patch output more readable + by using a blank line as a strong hint that the contents before and + after it belong to logically separate units. It is still + experimental. * A new configuration variable core.hooksPath allows customizing where the hook directory is. ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2] diff: disable compaction heuristic for now 2016-06-10 20:55 ` Junio C Hamano @ 2016-06-10 21:05 ` Jeff King 2016-06-10 21:46 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2016-06-10 21:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jacob Keller, Stefan Beller, git@vger.kernel.org On Fri, Jun 10, 2016 at 01:55:01PM -0700, Junio C Hamano wrote: > This is for 'master' when the topic is merged (will keep it in stash > for now). > > Documentation/RelNotes/2.9.0.txt | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/Documentation/RelNotes/2.9.0.txt b/Documentation/RelNotes/2.9.0.txt > index 927cb9b..79b46e1 100644 > --- a/Documentation/RelNotes/2.9.0.txt > +++ b/Documentation/RelNotes/2.9.0.txt > @@ -118,9 +118,11 @@ UI, Workflows & Features > * HTTP transport clients learned to throw extra HTTP headers at the > server, specified via http.extraHeader configuration variable. > > - * Patch output from "git diff" and friends has been tweaked to be > - more readable by using a blank line as a strong hint that the > - contents before and after it belong to logically separate units. > + * The "--compaction-heuristic" option to "git diff" family of > + commands enables a heuristic to make the patch output more readable > + by using a blank line as a strong hint that the contents before and > + after it belong to logically separate units. It is still > + experimental. > > * A new configuration variable core.hooksPath allows customizing > where the hook directory is. Looks good. I think your calendar calls for release 2.9.0 on Monday. Are you going to bump the schedule for this? I don't think it's very high risk, and wouldn't need to. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] diff: disable compaction heuristic for now 2016-06-10 21:05 ` Jeff King @ 2016-06-10 21:46 ` Junio C Hamano 0 siblings, 0 replies; 16+ messages in thread From: Junio C Hamano @ 2016-06-10 21:46 UTC (permalink / raw) To: Jeff King; +Cc: Jacob Keller, Stefan Beller, git@vger.kernel.org Jeff King <peff@peff.net> writes: > Looks good. > > I think your calendar calls for release 2.9.0 on Monday. Are you going > to bump the schedule for this? I don't think it's very high risk, and > wouldn't need to. I didn't plan to. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG-ish] diff compaction heuristic false positive 2016-06-10 7:50 [BUG-ish] diff compaction heuristic false positive Jeff King 2016-06-10 8:31 ` Jeff King @ 2016-06-10 8:31 ` Michael Haggerty 2016-06-10 8:41 ` Jeff King 1 sibling, 1 reply; 16+ messages in thread From: Michael Haggerty @ 2016-06-10 8:31 UTC (permalink / raw) To: Jeff King, Stefan Beller; +Cc: Junio C Hamano, Jacob Keller, git On 06/10/2016 09:50 AM, Jeff King wrote: > I found a false positive with the new compaction heuristic in v2.9: > [...] > I get this rather unfortunate diff: > > $ git diff > diff --git a/file.rb b/file.rb > index bd9d1cb..67fbeba 100644 > --- a/file.rb > +++ b/file.rb > @@ -1,5 +1,11 @@ > def foo > do_foo_stuff() > > + common_ending() > +end > + > +def bar > + do_bar_stuff() > + > common_ending() > end I've often thought that indentation would be a good, fairly universal signal for diff to use when deciding how to slide hunks around. Most source code is indented in a way that shows its structure. I propose the following heuristic: * Prefer to start and end hunks following lines with the least indentation. * Define the "indentation" of a blank line to be the indentation of the previous non-blank line minus epsilon. * In the case of a tie, prefer to slide the hunk down as far as possible. For the case above, the indentations for the candidate "before-the-hunk" lines and the resulting hunk would be > def foo > 2 do_foo_stuff() > 2-ε > 2 common_ending() > 0 end > 0-ε + > +def bar > + do_bar_stuff() > + > + common_ending() > +end I haven't tried testing this heuristic systematically but I have the feeling that it would be pretty effective and yet quite easy to implement. Michael ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG-ish] diff compaction heuristic false positive 2016-06-10 8:31 ` [BUG-ish] diff compaction heuristic false positive Michael Haggerty @ 2016-06-10 8:41 ` Jeff King 2016-06-10 11:00 ` Michael Haggerty 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2016-06-10 8:41 UTC (permalink / raw) To: Michael Haggerty; +Cc: Stefan Beller, Junio C Hamano, Jacob Keller, git On Fri, Jun 10, 2016 at 10:31:13AM +0200, Michael Haggerty wrote: > I've often thought that indentation would be a good, fairly universal > signal for diff to use when deciding how to slide hunks around. Most > source code is indented in a way that shows its structure. > > I propose the following heuristic: > > * Prefer to start and end hunks following lines with the least > indentation. > > * Define the "indentation" of a blank line to be the indentation of > the previous non-blank line minus epsilon. > > * In the case of a tie, prefer to slide the hunk down as far as > possible. Hmm. That might help this case, but the original motivation for this heuristic was something like: ## # foo def foo something end ## # bar def bar something_else end where we add the first function above the second. We end up with: diff --git a/file.rb b/file.rb index 1f9b151..f991c76 100644 --- a/file.rb +++ b/file.rb @@ -1,4 +1,10 @@ ## +# foo +def foo + something +end + +## # bar def bar something else I.e., crediting the "##" to the wrong spot (or in C, the "/*"). I don't think indentation helps us there (sliding-up would, but like sliding-down, it just depends on the order of the hunks). So I agree that adding indentation to the mix might help, but I don't think it can replace this heuristic. -Peff ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [BUG-ish] diff compaction heuristic false positive 2016-06-10 8:41 ` Jeff King @ 2016-06-10 11:00 ` Michael Haggerty 0 siblings, 0 replies; 16+ messages in thread From: Michael Haggerty @ 2016-06-10 11:00 UTC (permalink / raw) To: Jeff King; +Cc: Stefan Beller, Junio C Hamano, Jacob Keller, git On 06/10/2016 10:41 AM, Jeff King wrote: > On Fri, Jun 10, 2016 at 10:31:13AM +0200, Michael Haggerty wrote: > >> I've often thought that indentation would be a good, fairly universal >> signal for diff to use when deciding how to slide hunks around. Most >> source code is indented in a way that shows its structure. >> >> I propose the following heuristic: >> >> * Prefer to start and end hunks following lines with the least >> indentation. >> >> * Define the "indentation" of a blank line to be the indentation of >> the previous non-blank line minus epsilon. >> >> * In the case of a tie, prefer to slide the hunk down as far as >> possible. > > Hmm. That might help this case, but the original motivation for this > heuristic was something like: > > ## > # foo > def foo > something > end > > ## > # bar > def bar > something_else > end > > where we add the first function above the second. We end up with: > > diff --git a/file.rb b/file.rb > index 1f9b151..f991c76 100644 > --- a/file.rb > +++ b/file.rb > @@ -1,4 +1,10 @@ > ## > +# foo > +def foo > + something > +end > + > +## > # bar > def bar > something else > > I.e., crediting the "##" to the wrong spot (or in C, the "/*"). I don't > think indentation helps us there (sliding-up would, but like > sliding-down, it just depends on the order of the hunks). > > So I agree that adding indentation to the mix might help, but I don't > think it can replace this heuristic. Ummm, I think the indentation heuristic would work with that example, too, as long as we consider there to be an imaginary line "0" of the file (i.e., preceding the first real line) that has an indentation of -ε. Michael ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2016-06-10 21:47 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-06-10 7:50 [BUG-ish] diff compaction heuristic false positive Jeff King 2016-06-10 8:31 ` Jeff King 2016-06-10 15:56 ` Junio C Hamano 2016-06-10 16:25 ` Stefan Beller 2016-06-10 16:29 ` Jacob Keller 2016-06-10 18:13 ` Re* " Junio C Hamano 2016-06-10 18:21 ` Stefan Beller 2016-06-10 20:30 ` Jeff King 2016-06-10 20:48 ` [PATCH v2] diff: disable compaction heuristic for now Junio C Hamano 2016-06-10 20:53 ` Jeff King 2016-06-10 20:55 ` Junio C Hamano 2016-06-10 21:05 ` Jeff King 2016-06-10 21:46 ` Junio C Hamano 2016-06-10 8:31 ` [BUG-ish] diff compaction heuristic false positive Michael Haggerty 2016-06-10 8:41 ` Jeff King 2016-06-10 11:00 ` Michael Haggerty
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).