* [PATCH 1/2] diff: fix handling of binary rewrite diffs @ 2008-12-09 8:12 Jeff King 2008-12-09 8:13 ` [PATCH 2/2] diff: respect textconv in " Jeff King 2008-12-10 8:34 ` [PATCH 1/2] diff: fix handling of binary " Junio C Hamano 0 siblings, 2 replies; 7+ messages in thread From: Jeff King @ 2008-12-09 8:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: git The current emit_rewrite_diff code always writes a text patch without checking whether the content is binary. This means that if you end up with a rewrite diff for a binary file, you get lots of raw binary goo in your patch. Instead, if we have binary files, then let's just skip emit_rewrite_diff altogether. We will already have shown the "dissimilarity index" line, so it is really about the diff contents. If binary diffs are turned off, the "Binary files a/file and b/file differ" message should be the same in either case. If we do have binary patches turned on, there isn't much point in making a less-efficient binary patch that does a total rewrite; no human is going to read it, and since binary patches don't apply with any fuzz anyway, the result of application should be the same. Signed-off-by: Jeff King <peff@peff.net> --- I couldn't think of a good reason to want a different format of binary patch for a rewrite versus a regular diff, but others may be able to. In that case, you have to mimic the binary codepath somewhat in emit_rewrite_diff; I started on it, but realized I was just copying the code. However, we should at least do something before 1.6.1; as it is, it spews binary garbage. I discovered this because one of my textconv'd files had a rewrite (which apparently happens by changing a few lines in a word document :)) and gave me a bogus diff. However, the problem is not unique to textconv; patch 1/2 handles binary files and 2/2 handles the textconv case. I didn't put the tests into the existing t4022-diff-rewrite because I wanted to keep the textconv tests together with these tests, and I think it makes sense to test features in isolation before testing them together. IOW: t4022 - rewrite works t4030 - textconv works t4031 - rewrite AND textconv work together diff.c | 4 ++- t/t4031-diff-rewrite-binary.sh | 42 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+), 1 deletions(-) create mode 100755 t/t4031-diff-rewrite-binary.sh diff --git a/diff.c b/diff.c index f644947..ea958a2 100644 --- a/diff.c +++ b/diff.c @@ -1376,7 +1376,9 @@ static void builtin_diff(const char *name_a, */ if ((one->mode ^ two->mode) & S_IFMT) goto free_ab_and_return; - if (complete_rewrite) { + if (complete_rewrite && + !diff_filespec_is_binary(one) && + !diff_filespec_is_binary(two)) { emit_rewrite_diff(name_a, name_b, one, two, o); o->found_changes = 1; goto free_ab_and_return; diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh new file mode 100755 index 0000000..4b522f7 --- /dev/null +++ b/t/t4031-diff-rewrite-binary.sh @@ -0,0 +1,42 @@ +#!/bin/sh + +test_description='rewrite diff on binary file' + +. ./test-lib.sh + +# We must be large enough to meet the MINIMUM_BREAK_SIZE +# requirement. +make_file() { + for i in 1 2 3 4 5 6 7 8 9 10; do + for j in 1 2 3 4 5 6 7 9 10; do + for k in 1 2 3 4 5; do + printf "$1\n" + done + done + done >file +} + +test_expect_success 'create binary file with changes' ' + make_file "\\0" && + git add file && + make_file "\\01" +' + +test_expect_success 'vanilla diff is binary' ' + git diff >diff && + grep "Binary files a/file and b/file differ" diff +' + +test_expect_success 'rewrite diff is binary' ' + git diff -B >diff && + grep "dissimilarity index" diff && + grep "Binary files a/file and b/file differ" diff +' + +test_expect_success 'rewrite diff can show binary patch' ' + git diff -B --binary >diff && + grep "dissimilarity index" diff && + grep "GIT binary patch" diff +' + +test_done -- 1.6.1.rc2.1.g8f945.dirty ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] diff: respect textconv in rewrite diffs 2008-12-09 8:12 [PATCH 1/2] diff: fix handling of binary rewrite diffs Jeff King @ 2008-12-09 8:13 ` Jeff King 2008-12-10 8:34 ` Junio C Hamano 2008-12-10 8:34 ` [PATCH 1/2] diff: fix handling of binary " Junio C Hamano 1 sibling, 1 reply; 7+ messages in thread From: Jeff King @ 2008-12-09 8:13 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Currently we just skip rewrite diffs for binary files; this patch makes an exception for files which will be textconv'd, and actually performs the textconv before generating the diff. Conceptually, rewrite diffs should be in the exact same format as the a non-rewrite diff, except that we refuse to share any context. Thus it makes very little sense for "git diff" to show a textconv'd diff, but for "git diff -B" to show "Binary files differ". Signed-off-by: Jeff King <peff@peff.net> --- diff.c | 48 ++++++++++++++++++++++++++++++---------- t/t4031-diff-rewrite-binary.sh | 21 ++++++++++++++++- 2 files changed, 56 insertions(+), 13 deletions(-) diff --git a/diff.c b/diff.c index ea958a2..6dc19bc 100644 --- a/diff.c +++ b/diff.c @@ -229,6 +229,8 @@ static void emit_rewrite_diff(const char *name_a, const char *name_b, struct diff_filespec *one, struct diff_filespec *two, + const char *textconv_one, + const char *textconv_two, struct diff_options *o) { int lc_a, lc_b; @@ -241,6 +243,8 @@ static void emit_rewrite_diff(const char *name_a, const char *reset = diff_get_color(color_diff, DIFF_RESET); static struct strbuf a_name = STRBUF_INIT, b_name = STRBUF_INIT; const char *a_prefix, *b_prefix; + const char *data_one, *data_two; + size_t size_one, size_two; if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) { a_prefix = o->b_prefix; @@ -262,8 +266,27 @@ static void emit_rewrite_diff(const char *name_a, diff_populate_filespec(one, 0); diff_populate_filespec(two, 0); - lc_a = count_lines(one->data, one->size); - lc_b = count_lines(two->data, two->size); + if (textconv_one) { + data_one = run_textconv(textconv_one, one, &size_one); + if (!data_one) + die("unable to read files to diff"); + } + else { + data_one = one->data; + size_one = one->size; + } + if (textconv_two) { + data_two = run_textconv(textconv_two, two, &size_two); + if (!data_two) + die("unable to read files to diff"); + } + else { + data_two = two->data; + size_two = two->size; + } + + lc_a = count_lines(data_one, size_one); + lc_b = count_lines(data_two, size_two); fprintf(o->file, "%s--- %s%s%s\n%s+++ %s%s%s\n%s@@ -", metainfo, a_name.buf, name_a_tab, reset, @@ -273,9 +296,9 @@ static void emit_rewrite_diff(const char *name_a, print_line_count(o->file, lc_b); fprintf(o->file, " @@%s\n", reset); if (lc_a) - copy_file_with_prefix(o->file, '-', one->data, one->size, old, reset); + copy_file_with_prefix(o->file, '-', data_one, size_one, old, reset); if (lc_b) - copy_file_with_prefix(o->file, '+', two->data, two->size, new, reset); + copy_file_with_prefix(o->file, '+', data_two, size_two, new, reset); } static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one) @@ -1334,6 +1357,11 @@ static void builtin_diff(const char *name_a, const char *a_prefix, *b_prefix; const char *textconv_one = NULL, *textconv_two = NULL; + if (DIFF_OPT_TST(o, ALLOW_TEXTCONV)) { + textconv_one = get_textconv(one); + textconv_two = get_textconv(two); + } + diff_set_mnemonic_prefix(o, "a/", "b/"); if (DIFF_OPT_TST(o, REVERSE_DIFF)) { a_prefix = o->b_prefix; @@ -1377,9 +1405,10 @@ static void builtin_diff(const char *name_a, if ((one->mode ^ two->mode) & S_IFMT) goto free_ab_and_return; if (complete_rewrite && - !diff_filespec_is_binary(one) && - !diff_filespec_is_binary(two)) { - emit_rewrite_diff(name_a, name_b, one, two, o); + (textconv_one || !diff_filespec_is_binary(one)) && + (textconv_two || !diff_filespec_is_binary(two))) { + emit_rewrite_diff(name_a, name_b, one, two, + textconv_one, textconv_two, o); o->found_changes = 1; goto free_ab_and_return; } @@ -1388,11 +1417,6 @@ static void builtin_diff(const char *name_a, if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) die("unable to read files to diff"); - if (DIFF_OPT_TST(o, ALLOW_TEXTCONV)) { - textconv_one = get_textconv(one); - textconv_two = get_textconv(two); - } - if (!DIFF_OPT_TST(o, TEXT) && ( (diff_filespec_is_binary(one) && !textconv_one) || (diff_filespec_is_binary(two) && !textconv_two) )) { diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh index 4b522f7..72e5bfd 100755 --- a/t/t4031-diff-rewrite-binary.sh +++ b/t/t4031-diff-rewrite-binary.sh @@ -7,13 +7,15 @@ test_description='rewrite diff on binary file' # We must be large enough to meet the MINIMUM_BREAK_SIZE # requirement. make_file() { + # common first line to help identify rewrite versus regular diff + printf "=\n" >file for i in 1 2 3 4 5 6 7 8 9 10; do for j in 1 2 3 4 5 6 7 9 10; do for k in 1 2 3 4 5; do printf "$1\n" done done - done >file + done >>file } test_expect_success 'create binary file with changes' ' @@ -39,4 +41,21 @@ test_expect_success 'rewrite diff can show binary patch' ' grep "GIT binary patch" diff ' +cat >dump <<'EOF' +#!/bin/sh +perl -e '$/ = undef; $_ = <>; s/./ord($&)/ge; print $_' < "$1" +EOF +chmod +x dump +test_expect_success 'setup textconv' ' + echo file diff=foo >.gitattributes && + git config diff.foo.textconv "$PWD"/dump +' + +test_expect_success 'rewrite diff respects textconv' ' + git diff -B >diff && + grep "dissimilarity index" diff && + grep "^-61" diff && + grep "^-0" diff +' + test_done -- 1.6.1.rc2.1.g8f945.dirty ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] diff: respect textconv in rewrite diffs 2008-12-09 8:13 ` [PATCH 2/2] diff: respect textconv in " Jeff King @ 2008-12-10 8:34 ` Junio C Hamano 2008-12-10 9:02 ` Jeff King 0 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2008-12-10 8:34 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > Currently we just skip rewrite diffs for binary files; this patch makes > an exception for files which will be textconv'd, and actually performs > the textconv before generating the diff. > > Conceptually, rewrite diffs should be in the exact same format as the a > non-rewrite diff, except that we refuse to share any context. Thus it > makes very little sense for "git diff" to show a textconv'd diff, but > for "git diff -B" to show "Binary files differ". Makes sense. > +cat >dump <<'EOF' > +#!/bin/sh > +perl -e '$/ = undef; $_ = <>; s/./ord($&)/ge; print $_' < "$1" > +EOF I'll squash in a change to make this part use $SHELL_PATH for consistency. Thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] diff: respect textconv in rewrite diffs 2008-12-10 8:34 ` Junio C Hamano @ 2008-12-10 9:02 ` Jeff King 2008-12-10 20:27 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Jeff King @ 2008-12-10 9:02 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, Dec 10, 2008 at 12:34:26AM -0800, Junio C Hamano wrote: > > +cat >dump <<'EOF' > > +#!/bin/sh > > +perl -e '$/ = undef; $_ = <>; s/./ord($&)/ge; print $_' < "$1" > > +EOF > > I'll squash in a change to make this part use $SHELL_PATH for > consistency. Thanks. It was cut-and-paste from t4030, so if we care, it might be worth changing there, too (and naming it "dump" instead of "hexdump", because it actually dumps in decimal :) ). But more importantly, the fixup you just pushed seems to have an extra ">dump": > +{ > + echo "#!$SHELL_PATH" > + cat >dump <<'EOF' > +perl -e '$/ = undef; $_ = <>; s/./ord($&)/ge; print $_' < "$1" > +EOF > +} >dump > +chmod +x dump -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] diff: respect textconv in rewrite diffs 2008-12-10 9:02 ` Jeff King @ 2008-12-10 20:27 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2008-12-10 20:27 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > But more importantly, the fixup you just pushed seems to have an extra > ">dump": Oops, sorry. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] diff: fix handling of binary rewrite diffs 2008-12-09 8:12 [PATCH 1/2] diff: fix handling of binary rewrite diffs Jeff King 2008-12-09 8:13 ` [PATCH 2/2] diff: respect textconv in " Jeff King @ 2008-12-10 8:34 ` Junio C Hamano 2008-12-10 9:04 ` Jeff King 1 sibling, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2008-12-10 8:34 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > Instead, if we have binary files, then let's just skip emit_rewrite_diff > altogether. We will already have shown the "dissimilarity index" line, > so it is really about the diff contents. If binary diffs are turned off, > the "Binary files a/file and b/file differ" message should be the same > in either case. If we do have binary patches turned on, there isn't much > point in making a less-efficient binary patch that does a total rewrite; > no human is going to read it, and since binary patches don't apply with > any fuzz anyway, the result of application should be the same. Makes sense. > diff.c | 4 ++- > t/t4031-diff-rewrite-binary.sh | 42 ++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 45 insertions(+), 1 deletions(-) > create mode 100755 t/t4031-diff-rewrite-binary.sh > > diff --git a/diff.c b/diff.c > index f644947..ea958a2 100644 > --- a/diff.c > +++ b/diff.c > @@ -1376,7 +1376,9 @@ static void builtin_diff(const char *name_a, > */ > if ((one->mode ^ two->mode) & S_IFMT) > goto free_ab_and_return; > - if (complete_rewrite) { > + if (complete_rewrite && > + !diff_filespec_is_binary(one) && > + !diff_filespec_is_binary(two)) { > emit_rewrite_diff(name_a, name_b, one, two, o); > o->found_changes = 1; > goto free_ab_and_return; And looks correct. > diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh > new file mode 100755 > index 0000000..4b522f7 > --- /dev/null > +++ b/t/t4031-diff-rewrite-binary.sh > @@ -0,0 +1,42 @@ > +#!/bin/sh > + > +test_description='rewrite diff on binary file' > + > +. ./test-lib.sh > + > +# We must be large enough to meet the MINIMUM_BREAK_SIZE > +# requirement. > +make_file() { > + for i in 1 2 3 4 5 6 7 8 9 10; do > + for j in 1 2 3 4 5 6 7 9 10; do > + for k in 1 2 3 4 5; do > + printf "$1\n" > + done > + done > + done >file > +} > + > +test_expect_success 'create binary file with changes' ' > + make_file "\\0" && > + git add file && > + make_file "\\01" > +' Hmm... "1 2 3 4 5 6 7 9 10"? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] diff: fix handling of binary rewrite diffs 2008-12-10 8:34 ` [PATCH 1/2] diff: fix handling of binary " Junio C Hamano @ 2008-12-10 9:04 ` Jeff King 0 siblings, 0 replies; 7+ messages in thread From: Jeff King @ 2008-12-10 9:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, Dec 10, 2008 at 12:34:49AM -0800, Junio C Hamano wrote: > > + for j in 1 2 3 4 5 6 7 9 10; do > Hmm... "1 2 3 4 5 6 7 9 10"? Heh. Oops. I see you fixed this up as 1 through 9. My intent was 1 through 10, but it really doesn't matter. The point is just to hit the minimum break size, and I left a lot of leeway in case it ever gets increased later. -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-12-10 20:28 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-12-09 8:12 [PATCH 1/2] diff: fix handling of binary rewrite diffs Jeff King 2008-12-09 8:13 ` [PATCH 2/2] diff: respect textconv in " Jeff King 2008-12-10 8:34 ` Junio C Hamano 2008-12-10 9:02 ` Jeff King 2008-12-10 20:27 ` Junio C Hamano 2008-12-10 8:34 ` [PATCH 1/2] diff: fix handling of binary " Junio C Hamano 2008-12-10 9:04 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).