* [PATCH 1/2] diff: fix handling of binary rewrite diffs
@ 2008-12-09 8:12 Jeff King
2008-12-09 8:13 ` [PATCH 2/2] diff: respect textconv in " Jeff King
2008-12-10 8:34 ` [PATCH 1/2] diff: fix handling of binary " Junio C Hamano
0 siblings, 2 replies; 7+ messages in thread
From: Jeff King @ 2008-12-09 8:12 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
The current emit_rewrite_diff code always writes a text
patch without checking whether the content is binary. This
means that if you end up with a rewrite diff for a binary
file, you get lots of raw binary goo in your patch.
Instead, if we have binary files, then let's just skip
emit_rewrite_diff altogether. We will already have shown the
"dissimilarity index" line, so it is really about the diff
contents. If binary diffs are turned off, the "Binary files
a/file and b/file differ" message should be the same in
either case. If we do have binary patches turned on, there
isn't much point in making a less-efficient binary patch
that does a total rewrite; no human is going to read it, and
since binary patches don't apply with any fuzz anyway, the
result of application should be the same.
Signed-off-by: Jeff King <peff@peff.net>
---
I couldn't think of a good reason to want a different format of binary
patch for a rewrite versus a regular diff, but others may be able to.
In that case, you have to mimic the binary codepath somewhat in
emit_rewrite_diff; I started on it, but realized I was just copying the
code.
However, we should at least do something before 1.6.1; as it is, it
spews binary garbage.
I discovered this because one of my textconv'd files had a rewrite
(which apparently happens by changing a few lines in a word document :))
and gave me a bogus diff. However, the problem is not unique to
textconv; patch 1/2 handles binary files and 2/2 handles the textconv
case.
I didn't put the tests into the existing t4022-diff-rewrite because I
wanted to keep the textconv tests together with these tests, and I think
it makes sense to test features in isolation before testing them
together. IOW:
t4022 - rewrite works
t4030 - textconv works
t4031 - rewrite AND textconv work together
diff.c | 4 ++-
t/t4031-diff-rewrite-binary.sh | 42 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+), 1 deletions(-)
create mode 100755 t/t4031-diff-rewrite-binary.sh
diff --git a/diff.c b/diff.c
index f644947..ea958a2 100644
--- a/diff.c
+++ b/diff.c
@@ -1376,7 +1376,9 @@ static void builtin_diff(const char *name_a,
*/
if ((one->mode ^ two->mode) & S_IFMT)
goto free_ab_and_return;
- if (complete_rewrite) {
+ if (complete_rewrite &&
+ !diff_filespec_is_binary(one) &&
+ !diff_filespec_is_binary(two)) {
emit_rewrite_diff(name_a, name_b, one, two, o);
o->found_changes = 1;
goto free_ab_and_return;
diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh
new file mode 100755
index 0000000..4b522f7
--- /dev/null
+++ b/t/t4031-diff-rewrite-binary.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description='rewrite diff on binary file'
+
+. ./test-lib.sh
+
+# We must be large enough to meet the MINIMUM_BREAK_SIZE
+# requirement.
+make_file() {
+ for i in 1 2 3 4 5 6 7 8 9 10; do
+ for j in 1 2 3 4 5 6 7 9 10; do
+ for k in 1 2 3 4 5; do
+ printf "$1\n"
+ done
+ done
+ done >file
+}
+
+test_expect_success 'create binary file with changes' '
+ make_file "\\0" &&
+ git add file &&
+ make_file "\\01"
+'
+
+test_expect_success 'vanilla diff is binary' '
+ git diff >diff &&
+ grep "Binary files a/file and b/file differ" diff
+'
+
+test_expect_success 'rewrite diff is binary' '
+ git diff -B >diff &&
+ grep "dissimilarity index" diff &&
+ grep "Binary files a/file and b/file differ" diff
+'
+
+test_expect_success 'rewrite diff can show binary patch' '
+ git diff -B --binary >diff &&
+ grep "dissimilarity index" diff &&
+ grep "GIT binary patch" diff
+'
+
+test_done
--
1.6.1.rc2.1.g8f945.dirty
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] diff: respect textconv in rewrite diffs
2008-12-09 8:12 [PATCH 1/2] diff: fix handling of binary rewrite diffs Jeff King
@ 2008-12-09 8:13 ` Jeff King
2008-12-10 8:34 ` Junio C Hamano
2008-12-10 8:34 ` [PATCH 1/2] diff: fix handling of binary " Junio C Hamano
1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2008-12-09 8:13 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Currently we just skip rewrite diffs for binary files; this
patch makes an exception for files which will be textconv'd,
and actually performs the textconv before generating the
diff.
Conceptually, rewrite diffs should be in the exact same
format as the a non-rewrite diff, except that we refuse to
share any context. Thus it makes very little sense for "git
diff" to show a textconv'd diff, but for "git diff -B" to
show "Binary files differ".
Signed-off-by: Jeff King <peff@peff.net>
---
diff.c | 48 ++++++++++++++++++++++++++++++----------
t/t4031-diff-rewrite-binary.sh | 21 ++++++++++++++++-
2 files changed, 56 insertions(+), 13 deletions(-)
diff --git a/diff.c b/diff.c
index ea958a2..6dc19bc 100644
--- a/diff.c
+++ b/diff.c
@@ -229,6 +229,8 @@ static void emit_rewrite_diff(const char *name_a,
const char *name_b,
struct diff_filespec *one,
struct diff_filespec *two,
+ const char *textconv_one,
+ const char *textconv_two,
struct diff_options *o)
{
int lc_a, lc_b;
@@ -241,6 +243,8 @@ static void emit_rewrite_diff(const char *name_a,
const char *reset = diff_get_color(color_diff, DIFF_RESET);
static struct strbuf a_name = STRBUF_INIT, b_name = STRBUF_INIT;
const char *a_prefix, *b_prefix;
+ const char *data_one, *data_two;
+ size_t size_one, size_two;
if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) {
a_prefix = o->b_prefix;
@@ -262,8 +266,27 @@ static void emit_rewrite_diff(const char *name_a,
diff_populate_filespec(one, 0);
diff_populate_filespec(two, 0);
- lc_a = count_lines(one->data, one->size);
- lc_b = count_lines(two->data, two->size);
+ if (textconv_one) {
+ data_one = run_textconv(textconv_one, one, &size_one);
+ if (!data_one)
+ die("unable to read files to diff");
+ }
+ else {
+ data_one = one->data;
+ size_one = one->size;
+ }
+ if (textconv_two) {
+ data_two = run_textconv(textconv_two, two, &size_two);
+ if (!data_two)
+ die("unable to read files to diff");
+ }
+ else {
+ data_two = two->data;
+ size_two = two->size;
+ }
+
+ lc_a = count_lines(data_one, size_one);
+ lc_b = count_lines(data_two, size_two);
fprintf(o->file,
"%s--- %s%s%s\n%s+++ %s%s%s\n%s@@ -",
metainfo, a_name.buf, name_a_tab, reset,
@@ -273,9 +296,9 @@ static void emit_rewrite_diff(const char *name_a,
print_line_count(o->file, lc_b);
fprintf(o->file, " @@%s\n", reset);
if (lc_a)
- copy_file_with_prefix(o->file, '-', one->data, one->size, old, reset);
+ copy_file_with_prefix(o->file, '-', data_one, size_one, old, reset);
if (lc_b)
- copy_file_with_prefix(o->file, '+', two->data, two->size, new, reset);
+ copy_file_with_prefix(o->file, '+', data_two, size_two, new, reset);
}
static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
@@ -1334,6 +1357,11 @@ static void builtin_diff(const char *name_a,
const char *a_prefix, *b_prefix;
const char *textconv_one = NULL, *textconv_two = NULL;
+ if (DIFF_OPT_TST(o, ALLOW_TEXTCONV)) {
+ textconv_one = get_textconv(one);
+ textconv_two = get_textconv(two);
+ }
+
diff_set_mnemonic_prefix(o, "a/", "b/");
if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
a_prefix = o->b_prefix;
@@ -1377,9 +1405,10 @@ static void builtin_diff(const char *name_a,
if ((one->mode ^ two->mode) & S_IFMT)
goto free_ab_and_return;
if (complete_rewrite &&
- !diff_filespec_is_binary(one) &&
- !diff_filespec_is_binary(two)) {
- emit_rewrite_diff(name_a, name_b, one, two, o);
+ (textconv_one || !diff_filespec_is_binary(one)) &&
+ (textconv_two || !diff_filespec_is_binary(two))) {
+ emit_rewrite_diff(name_a, name_b, one, two,
+ textconv_one, textconv_two, o);
o->found_changes = 1;
goto free_ab_and_return;
}
@@ -1388,11 +1417,6 @@ static void builtin_diff(const char *name_a,
if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
die("unable to read files to diff");
- if (DIFF_OPT_TST(o, ALLOW_TEXTCONV)) {
- textconv_one = get_textconv(one);
- textconv_two = get_textconv(two);
- }
-
if (!DIFF_OPT_TST(o, TEXT) &&
( (diff_filespec_is_binary(one) && !textconv_one) ||
(diff_filespec_is_binary(two) && !textconv_two) )) {
diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh
index 4b522f7..72e5bfd 100755
--- a/t/t4031-diff-rewrite-binary.sh
+++ b/t/t4031-diff-rewrite-binary.sh
@@ -7,13 +7,15 @@ test_description='rewrite diff on binary file'
# We must be large enough to meet the MINIMUM_BREAK_SIZE
# requirement.
make_file() {
+ # common first line to help identify rewrite versus regular diff
+ printf "=\n" >file
for i in 1 2 3 4 5 6 7 8 9 10; do
for j in 1 2 3 4 5 6 7 9 10; do
for k in 1 2 3 4 5; do
printf "$1\n"
done
done
- done >file
+ done >>file
}
test_expect_success 'create binary file with changes' '
@@ -39,4 +41,21 @@ test_expect_success 'rewrite diff can show binary patch' '
grep "GIT binary patch" diff
'
+cat >dump <<'EOF'
+#!/bin/sh
+perl -e '$/ = undef; $_ = <>; s/./ord($&)/ge; print $_' < "$1"
+EOF
+chmod +x dump
+test_expect_success 'setup textconv' '
+ echo file diff=foo >.gitattributes &&
+ git config diff.foo.textconv "$PWD"/dump
+'
+
+test_expect_success 'rewrite diff respects textconv' '
+ git diff -B >diff &&
+ grep "dissimilarity index" diff &&
+ grep "^-61" diff &&
+ grep "^-0" diff
+'
+
test_done
--
1.6.1.rc2.1.g8f945.dirty
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] diff: respect textconv in rewrite diffs
2008-12-09 8:13 ` [PATCH 2/2] diff: respect textconv in " Jeff King
@ 2008-12-10 8:34 ` Junio C Hamano
2008-12-10 9:02 ` Jeff King
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-12-10 8:34 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> Currently we just skip rewrite diffs for binary files; this patch makes
> an exception for files which will be textconv'd, and actually performs
> the textconv before generating the diff.
>
> Conceptually, rewrite diffs should be in the exact same format as the a
> non-rewrite diff, except that we refuse to share any context. Thus it
> makes very little sense for "git diff" to show a textconv'd diff, but
> for "git diff -B" to show "Binary files differ".
Makes sense.
> +cat >dump <<'EOF'
> +#!/bin/sh
> +perl -e '$/ = undef; $_ = <>; s/./ord($&)/ge; print $_' < "$1"
> +EOF
I'll squash in a change to make this part use $SHELL_PATH for
consistency. Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] diff: fix handling of binary rewrite diffs
2008-12-09 8:12 [PATCH 1/2] diff: fix handling of binary rewrite diffs Jeff King
2008-12-09 8:13 ` [PATCH 2/2] diff: respect textconv in " Jeff King
@ 2008-12-10 8:34 ` Junio C Hamano
2008-12-10 9:04 ` Jeff King
1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-12-10 8:34 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> Instead, if we have binary files, then let's just skip emit_rewrite_diff
> altogether. We will already have shown the "dissimilarity index" line,
> so it is really about the diff contents. If binary diffs are turned off,
> the "Binary files a/file and b/file differ" message should be the same
> in either case. If we do have binary patches turned on, there isn't much
> point in making a less-efficient binary patch that does a total rewrite;
> no human is going to read it, and since binary patches don't apply with
> any fuzz anyway, the result of application should be the same.
Makes sense.
> diff.c | 4 ++-
> t/t4031-diff-rewrite-binary.sh | 42 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+), 1 deletions(-)
> create mode 100755 t/t4031-diff-rewrite-binary.sh
>
> diff --git a/diff.c b/diff.c
> index f644947..ea958a2 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1376,7 +1376,9 @@ static void builtin_diff(const char *name_a,
> */
> if ((one->mode ^ two->mode) & S_IFMT)
> goto free_ab_and_return;
> - if (complete_rewrite) {
> + if (complete_rewrite &&
> + !diff_filespec_is_binary(one) &&
> + !diff_filespec_is_binary(two)) {
> emit_rewrite_diff(name_a, name_b, one, two, o);
> o->found_changes = 1;
> goto free_ab_and_return;
And looks correct.
> diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh
> new file mode 100755
> index 0000000..4b522f7
> --- /dev/null
> +++ b/t/t4031-diff-rewrite-binary.sh
> @@ -0,0 +1,42 @@
> +#!/bin/sh
> +
> +test_description='rewrite diff on binary file'
> +
> +. ./test-lib.sh
> +
> +# We must be large enough to meet the MINIMUM_BREAK_SIZE
> +# requirement.
> +make_file() {
> + for i in 1 2 3 4 5 6 7 8 9 10; do
> + for j in 1 2 3 4 5 6 7 9 10; do
> + for k in 1 2 3 4 5; do
> + printf "$1\n"
> + done
> + done
> + done >file
> +}
> +
> +test_expect_success 'create binary file with changes' '
> + make_file "\\0" &&
> + git add file &&
> + make_file "\\01"
> +'
Hmm... "1 2 3 4 5 6 7 9 10"?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] diff: respect textconv in rewrite diffs
2008-12-10 8:34 ` Junio C Hamano
@ 2008-12-10 9:02 ` Jeff King
2008-12-10 20:27 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2008-12-10 9:02 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Wed, Dec 10, 2008 at 12:34:26AM -0800, Junio C Hamano wrote:
> > +cat >dump <<'EOF'
> > +#!/bin/sh
> > +perl -e '$/ = undef; $_ = <>; s/./ord($&)/ge; print $_' < "$1"
> > +EOF
>
> I'll squash in a change to make this part use $SHELL_PATH for
> consistency. Thanks.
It was cut-and-paste from t4030, so if we care, it might be worth
changing there, too (and naming it "dump" instead of "hexdump", because
it actually dumps in decimal :) ).
But more importantly, the fixup you just pushed seems to have an extra
">dump":
> +{
> + echo "#!$SHELL_PATH"
> + cat >dump <<'EOF'
> +perl -e '$/ = undef; $_ = <>; s/./ord($&)/ge; print $_' < "$1"
> +EOF
> +} >dump
> +chmod +x dump
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] diff: fix handling of binary rewrite diffs
2008-12-10 8:34 ` [PATCH 1/2] diff: fix handling of binary " Junio C Hamano
@ 2008-12-10 9:04 ` Jeff King
0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2008-12-10 9:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Wed, Dec 10, 2008 at 12:34:49AM -0800, Junio C Hamano wrote:
> > + for j in 1 2 3 4 5 6 7 9 10; do
> Hmm... "1 2 3 4 5 6 7 9 10"?
Heh. Oops.
I see you fixed this up as 1 through 9. My intent was 1 through 10, but
it really doesn't matter. The point is just to hit the minimum break
size, and I left a lot of leeway in case it ever gets increased later.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] diff: respect textconv in rewrite diffs
2008-12-10 9:02 ` Jeff King
@ 2008-12-10 20:27 ` Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-12-10 20:27 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> But more importantly, the fixup you just pushed seems to have an extra
> ">dump":
Oops, sorry.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-12-10 20:28 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-09 8:12 [PATCH 1/2] diff: fix handling of binary rewrite diffs Jeff King
2008-12-09 8:13 ` [PATCH 2/2] diff: respect textconv in " Jeff King
2008-12-10 8:34 ` Junio C Hamano
2008-12-10 9:02 ` Jeff King
2008-12-10 20:27 ` Junio C Hamano
2008-12-10 8:34 ` [PATCH 1/2] diff: fix handling of binary " Junio C Hamano
2008-12-10 9:04 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).