From: John Keeping <john@keeping.me.uk>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: [PATCH v2] diffcore-break: don't divide by zero
Date: Wed, 3 Apr 2013 20:24:05 +0100 [thread overview]
Message-ID: <20130403192405.GJ2222@serenity.lan> (raw)
In-Reply-To: <7va9pga73t.fsf@alter.siamese.dyndns.org>
When the source file is empty, the calculation of the merge score
results in a division by zero. In the situation:
== preimage == == postimage ==
F (empty file) F (a large file)
E (a new empty file)
it does not make sense to consider F->E as a rename, so it is better not
to break the pre- and post-image of F.
Bail out early in this case to avoid hitting the divide-by-zero. This
causes the merge score to be left at zero.
Signed-off-by: John Keeping <john@keeping.me.uk>
---
On Tue, Apr 02, 2013 at 03:41:10PM -0700, Junio C Hamano wrote:
> John Keeping <john@keeping.me.uk> writes:
>
> > The message for commit 6dd4b66 (Fix diffcore-break total breakage)
> > indicates that "don't bother to break small files" is wrong in some
> > cases, but it I wonder if "don't bother to break empty files" is okay.
>
> This has a rather subtle ramifications, and we would need to think
> carefully. "break" does two very different things, and the criteria
> you would want to use to decide to "break" a filepair depends on
> which one you are interested in.
>
> The very original motivation of "break" was to show a patch that
> rewrites an existing file completely in a way different from the
> usual "diff -u" output, which will try to minimize the output by
> finding (rare) lines that happen to be the same between the preimage
> and postimage, intersparsed in many lines of "-" (deletion) and "+"
> (addition). Such a change is often easier to understand when shown
> as a single block of "-" (deletion) of the entire original contents,
> followed by a single block of "+" (addition) of the entire new
> contents.
>
> A totally separate motivation of "break" is the one Linus talks
> about in the log message of the said commit. A path filename.h was
> moved to filename_32.h, and a new (and much smaller) filename.h was
> introduced, that "#include"s filename_32.h. "diff -M" that pairs
> deleted files with created files to compute renames in such a case
> would not consider the original filename.h as a possible source of
> filename_32.h that was created. You want to break modification of
> filename.h into (virtual) deletion and addition of filename.h.
>
> For the purpose of the former, you would want not to break a file
> too aggressively. If you started from a file with 1000 lines in it,
> and deleted 910 lines and added 10 lines to result in a file with
> 100 lines, you still have 90 lines of shared contents between the
> preimage and the postimage, and you do not want to show it as
> "delete 1000 lines and add 100 lines". You would want to base your
> decision on how much common material exists between the two.
>
> For the purpose of the latter, however, it is better to err on the
> side of breaking than not breaking. After breaking a suspicious
> modification into addition and deletion, if rename comparison does
> not find a suitable destination for the virtual deletion, the broken
> halves will be merged back, so breaking too little can hurt by
> missing potential renames, but breaking too much will not. You do
> want to break the "900 deleted out of 1000 and then added 10" case,
> as that 900 lines may have gone to another file that was created in
> the same commit. "But we have 90 lines of common material" does not
> matter for the decision to break.
>
> In today's code, the return value of should_break() is about the
> latter, while the score it stores in *merge_score is about the
> former.
Thanks for this explanation. Following it through, it seems that
bailing out early when the destination is empty will do the wrong thing,
whereas letting the code carry on should result in a merge score of
MAX_SCORE and the function returning 1.
So it seems that the patch you proposed is the best way to handle this.
diffcore-break.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/diffcore-break.c b/diffcore-break.c
index 44f8678..1d9e530 100644
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -68,6 +68,9 @@ static int should_break(struct diff_filespec *src,
if (max_size < MINIMUM_BREAK_SIZE)
return 0; /* we do not break too small filepair */
+ if (!src->size)
+ return 0; /* we do not let empty files get renamed */
+
if (diffcore_count_changes(src, dst,
&src->cnt_data, &dst->cnt_data,
0,
--
1.8.2.540.gf023cfe
next prev parent reply other threads:[~2013-04-03 19:24 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-02 19:50 [PATCH 0/2] Fixes for undefined behaviour John Keeping
2013-04-02 19:50 ` [PATCH 1/2] diffcore-break: don't divide by zero John Keeping
2013-04-02 21:15 ` Junio C Hamano
2013-04-02 21:36 ` John Keeping
2013-04-02 22:41 ` Junio C Hamano
2013-04-03 19:24 ` John Keeping [this message]
2013-04-02 19:50 ` [PATCH 2/2] bisect: avoid signed integer overflow John Keeping
2013-04-02 21:09 ` Junio C Hamano
2013-04-03 19:17 ` John Keeping
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130403192405.GJ2222@serenity.lan \
--to=john@keeping.me.uk \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).