* Commit/merge messages for binary files @ 2011-02-17 20:40 Piotr Krukowiecki 2011-02-18 13:53 ` Matthieu Moy 0 siblings, 1 reply; 12+ messages in thread From: Piotr Krukowiecki @ 2011-02-17 20:40 UTC (permalink / raw) To: git Hi, there's a different output when committing change and when merging change for a binary file. Do the insertions/deletions have any meaning for binary files? For example: $ git commit -m Updated [topic 5da30ce] Updated 1 files changed, 8 insertions(+), 103 deletions(-) rewrite blob.o (100%) $ git merge topic Updating 75ab259..5da30ce Fast-forward blob.o | Bin 25920 -> 4364 bytes 1 files changed, 0 insertions(+), 0 deletions(-) -- Piotrek ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files 2011-02-17 20:40 Commit/merge messages for binary files Piotr Krukowiecki @ 2011-02-18 13:53 ` Matthieu Moy 2011-02-18 20:30 ` Piotr Krukowiecki 0 siblings, 1 reply; 12+ messages in thread From: Matthieu Moy @ 2011-02-18 13:53 UTC (permalink / raw) To: Piotr Krukowiecki; +Cc: git Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes: > Hi, > > there's a different output when committing change and when merging > change for a binary file. > Do the insertions/deletions have any meaning for binary files? No. They're inserted/deleted *lines*, and that wouldn't make sense for binary files. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files 2011-02-18 13:53 ` Matthieu Moy @ 2011-02-18 20:30 ` Piotr Krukowiecki 2011-02-18 21:19 ` Matthieu Moy 0 siblings, 1 reply; 12+ messages in thread From: Piotr Krukowiecki @ 2011-02-18 20:30 UTC (permalink / raw) To: Matthieu Moy; +Cc: git W dniu 18.02.2011 14:53, Matthieu Moy pisze: > Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes: > >> Hi, >> >> there's a different output when committing change and when merging >> change for a binary file. >> Do the insertions/deletions have any meaning for binary files? > > No. They're inserted/deleted *lines*, and that wouldn't make sense for > binary files. > So it's a bug? -- Piotr Krukowiecki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files 2011-02-18 20:30 ` Piotr Krukowiecki @ 2011-02-18 21:19 ` Matthieu Moy 2011-02-18 21:42 ` Piotr Krukowiecki 0 siblings, 1 reply; 12+ messages in thread From: Matthieu Moy @ 2011-02-18 21:19 UTC (permalink / raw) To: Piotr Krukowiecki; +Cc: git Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes: > W dniu 18.02.2011 14:53, Matthieu Moy pisze: >> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes: >> >>> Hi, >>> >>> there's a different output when committing change and when merging >>> change for a binary file. >>> Do the insertions/deletions have any meaning for binary files? >> >> No. They're inserted/deleted *lines*, and that wouldn't make sense for >> binary files. > > So it's a bug? I don't see any bug. There were no insertion/deletion in text files, hence you see 0 for both. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files 2011-02-18 21:19 ` Matthieu Moy @ 2011-02-18 21:42 ` Piotr Krukowiecki 2011-02-19 7:06 ` Jeff King 0 siblings, 1 reply; 12+ messages in thread From: Piotr Krukowiecki @ 2011-02-18 21:42 UTC (permalink / raw) To: Matthieu Moy; +Cc: git W dniu 18.02.2011 22:19, Matthieu Moy pisze: > Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes: > >> W dniu 18.02.2011 14:53, Matthieu Moy pisze: >>> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes: >>> >>>> Hi, >>>> >>>> there's a different output when committing change and when merging >>>> change for a binary file. >>>> Do the insertions/deletions have any meaning for binary files? >>> >>> No. They're inserted/deleted *lines*, and that wouldn't make sense for >>> binary files. >> >> So it's a bug? > > I don't see any bug. There were no insertion/deletion in text files, > hence you see 0 for both. Maybe I wasn't clear. Both the commit and the merge was for the same binary file. In case of commit there were non-zero insertions/deletions. See example below - notice both update binary file blob.o: $ git commit -m Updated [topic 5da30ce] Updated 1 files changed, 8 insertions(+), 103 deletions(-) rewrite blob.o (100%) $ git merge topic Updating 75ab259..5da30ce Fast-forward blob.o | Bin 25920 -> 4364 bytes 1 files changed, 0 insertions(+), 0 deletions(-) Hope that helps, -- Piotr Krukowiecki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files 2011-02-18 21:42 ` Piotr Krukowiecki @ 2011-02-19 7:06 ` Jeff King 2011-02-19 8:03 ` Jeff King 0 siblings, 1 reply; 12+ messages in thread From: Jeff King @ 2011-02-19 7:06 UTC (permalink / raw) To: Piotr Krukowiecki; +Cc: Matthieu Moy, git On Fri, Feb 18, 2011 at 10:42:42PM +0100, Piotr Krukowiecki wrote: > Maybe I wasn't clear. Both the commit and the merge was for the same binary > file. > > In case of commit there were non-zero insertions/deletions. > See example below - notice both update binary file blob.o: > > $ git commit -m Updated > [topic 5da30ce] Updated > 1 files changed, 8 insertions(+), 103 deletions(-) > rewrite blob.o (100%) Yeah, that seems weird. I think it might have to do with break detection in the diff, which is on during the commit summary diff: $ git init $ dd if=/dev/urandom of=foo.rand bs=1k count=4 $ git add foo.rand $ git commit -m one [master (root-commit) 8c5783a] one 1 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 foo.rand $ git show --oneline --stat --summary 8c5783a one foo.rand | Bin 0 -> 4096 bytes 1 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 foo.rand So far so good. Now let's rewrite it. $ dd if=/dev/urandom of=foo.rand bs=1k count=4 $ git commit -a -m two [master 43961bc] two 1 files changed, 15 insertions(+), 19 deletions(-) rewrite foo.rand (100%) Now that seems wrong. What about doing another diff: $ git show --oneline --stat --summary 43961bc two foo.rand | Bin 4096 -> 4096 bytes 1 files changed, 0 insertions(+), 0 deletions(-) That's right. Now let's turn on break detection: $ git show --oneline --stat --summary -B 43961bc two foo.rand | 34 +++++++++++++++------------------- 1 files changed, 15 insertions(+), 19 deletions(-) rewrite foo.rand (100%) Broken again. So I guess we have some problem with making sure we treat broken filepairs as binary. -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files 2011-02-19 7:06 ` Jeff King @ 2011-02-19 8:03 ` Jeff King 2011-02-19 8:04 ` [PATCH 1/2] diff: handle diffstat of rewritten " Jeff King 2011-02-19 8:16 ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King 0 siblings, 2 replies; 12+ messages in thread From: Jeff King @ 2011-02-19 8:03 UTC (permalink / raw) To: Piotr Krukowiecki; +Cc: Junio C Hamano, Matthieu Moy, git On Sat, Feb 19, 2011 at 02:06:01AM -0500, Jeff King wrote: > Now that seems wrong. What about doing another diff: > > $ git show --oneline --stat --summary > 43961bc two > foo.rand | Bin 4096 -> 4096 bytes > 1 files changed, 0 insertions(+), 0 deletions(-) > > That's right. Now let's turn on break detection: > > $ git show --oneline --stat --summary -B > 43961bc two > foo.rand | 34 +++++++++++++++------------------- > 1 files changed, 15 insertions(+), 19 deletions(-) > rewrite foo.rand (100%) > > Broken again. So I guess we have some problem with making sure we treat > broken filepairs as binary. Yep, builtin_diffstat just didn't handle that case. Two patch series to follow: [1/2]: diff: handle diffstat of rewritten binary files [2/2]: diff: don't retrieve binary blobs for diffstat The first one fixes the bug, and the second is a convenient optimization I noticed while there. -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] diff: handle diffstat of rewritten binary files 2011-02-19 8:03 ` Jeff King @ 2011-02-19 8:04 ` Jeff King 2011-02-19 8:16 ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King 1 sibling, 0 replies; 12+ messages in thread From: Jeff King @ 2011-02-19 8:04 UTC (permalink / raw) To: Piotr Krukowiecki; +Cc: Junio C Hamano, Matthieu Moy, git The logic in builtin_diffstat assumes that a complete_rewrite pair should have its lines counted. This is nonsensical for binary files and leads to confusing things like: $ git diff --stat --summary HEAD^ HEAD foo.rand | Bin 4096 -> 4096 bytes 1 files changed, 0 insertions(+), 0 deletions(-) $ git diff --stat --summary -B HEAD^ HEAD foo.rand | 34 +++++++++++++++------------------- 1 files changed, 15 insertions(+), 19 deletions(-) rewrite foo.rand (100%) So let's reorder the function to handle binary files first (which from diffstat's perspective look like complete rewrites anyway), then rewrites, then actual diffstats. There are two bonus prizes to this reorder: 1. It gets rid of a now-superfluous goto. 2. The binary case is at the top, which means we can further optimize it in the next patch. Signed-off-by: Jeff King <peff@peff.net> --- diff.c | 24 ++++++++++++++---------- 1 files changed, 14 insertions(+), 10 deletions(-) diff --git a/diff.c b/diff.c index 5422c43..2ac0fe9 100644 --- a/diff.c +++ b/diff.c @@ -2079,25 +2079,30 @@ static void builtin_diffstat(const char *name_a, const char *name_b, data->is_unmerged = 1; return; } - if (complete_rewrite) { + + if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) { + if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) + die("unable to read files to diff"); + data->is_binary = 1; + data->added = mf2.size; + data->deleted = mf1.size; + } + + else if (complete_rewrite) { diff_populate_filespec(one, 0); diff_populate_filespec(two, 0); data->deleted = count_lines(one->data, one->size); data->added = count_lines(two->data, two->size); - goto free_and_return; } - if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) - die("unable to read files to diff"); - if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) { - data->is_binary = 1; - data->added = mf2.size; - data->deleted = mf1.size; - } else { + else { /* Crazy xdl interfaces.. */ xpparam_t xpp; xdemitconf_t xecfg; + if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) + die("unable to read files to diff"); + memset(&xpp, 0, sizeof(xpp)); memset(&xecfg, 0, sizeof(xecfg)); xpp.flags = o->xdl_opts; @@ -2105,7 +2110,6 @@ static void builtin_diffstat(const char *name_a, const char *name_b, &xpp, &xecfg); } - free_and_return: diff_free_filespec_data(one); diff_free_filespec_data(two); } -- 1.7.4.1.26.g3372c ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] diff: don't retrieve binary blobs for diffstat 2011-02-19 8:03 ` Jeff King 2011-02-19 8:04 ` [PATCH 1/2] diff: handle diffstat of rewritten " Jeff King @ 2011-02-19 8:16 ` Jeff King 2011-02-21 23:33 ` Junio C Hamano 1 sibling, 1 reply; 12+ messages in thread From: Jeff King @ 2011-02-19 8:16 UTC (permalink / raw) To: Piotr Krukowiecki; +Cc: Junio C Hamano, Matthieu Moy, git We only need the size, which is much cheaper to get, especially if it is a big binary file. Signed-off-by: Jeff King <peff@peff.net> --- This of course is only really helpful if you have marked the files as binary via gitattributes, since otherwise we have to pull in the blob to find out that it's binary. :) But in my real-world photo/video repo, which has media files marked via gitattributes as binary (but to textconv exif tags, of course). The commit in question has 26 files totalling 88 megabytes. $ time git show --stat >old real 0m0.428s user 0m0.392s sys 0m0.032s $ time git.jk.diffstat-binary show --stat >new real 0m0.005s user 0m0.004s sys 0m0.000s $ cmp old new && echo ok ok 8500% speedup isn't too bad. :) diff.c | 15 +++++++++++---- 1 files changed, 11 insertions(+), 4 deletions(-) diff --git a/diff.c b/diff.c index 2ac0fe9..6640857 100644 --- a/diff.c +++ b/diff.c @@ -245,6 +245,15 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one) return 0; } +/* like fill_mmfile, but only for size, so we can avoid retrieving blob */ +static unsigned long diff_filespec_size(struct diff_filespec *one) +{ + if (!DIFF_FILE_VALID(one)) + return 0; + diff_populate_filespec(one, 1); + return one->size; +} + static int count_trailing_blank(mmfile_t *mf, unsigned ws_rule) { char *ptr = mf->ptr; @@ -2081,11 +2090,9 @@ static void builtin_diffstat(const char *name_a, const char *name_b, } if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) { - if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) - die("unable to read files to diff"); data->is_binary = 1; - data->added = mf2.size; - data->deleted = mf1.size; + data->added = diff_filespec_size(two); + data->deleted = diff_filespec_size(one); } else if (complete_rewrite) { -- 1.7.4.1.26.g3372c ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] diff: don't retrieve binary blobs for diffstat 2011-02-19 8:16 ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King @ 2011-02-21 23:33 ` Junio C Hamano 2011-02-22 15:37 ` Jeff King 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2011-02-21 23:33 UTC (permalink / raw) To: Jeff King; +Cc: Piotr Krukowiecki, Junio C Hamano, Matthieu Moy, git Jeff King <peff@peff.net> writes: > We only need the size, which is much cheaper to get, > especially if it is a big binary file. > > Signed-off-by: Jeff King <peff@peff.net> > --- Nice ;-) Do we want a test or two for 1/2 by the way? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] diff: don't retrieve binary blobs for diffstat 2011-02-21 23:33 ` Junio C Hamano @ 2011-02-22 15:37 ` Jeff King 2011-02-22 19:00 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Jeff King @ 2011-02-22 15:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: Piotr Krukowiecki, Matthieu Moy, git On Mon, Feb 21, 2011 at 03:33:05PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > We only need the size, which is much cheaper to get, > > especially if it is a big binary file. > > > > Signed-off-by: Jeff King <peff@peff.net> > > --- > > Nice ;-) Do we want a test or two for 1/2 by the way? Yeah, it's probably a good idea. Can you squash this in, or should I resend? --- diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh index 7e7b307..7d7470f 100755 --- a/t/t4031-diff-rewrite-binary.sh +++ b/t/t4031-diff-rewrite-binary.sh @@ -44,6 +44,13 @@ test_expect_success 'rewrite diff can show binary patch' ' grep "GIT binary patch" diff ' +test_expect_success 'rewrite diff --stat shows binary changes' ' + git diff -B --stat --summary >diff && + grep "Bin" diff && + grep "0 insertions.*0 deletions" diff && + grep " rewrite file" diff +' + { echo "#!$SHELL_PATH" cat <<'EOF' ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] diff: don't retrieve binary blobs for diffstat 2011-02-22 15:37 ` Jeff King @ 2011-02-22 19:00 ` Junio C Hamano 0 siblings, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2011-02-22 19:00 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Piotr Krukowiecki, Matthieu Moy, git Jeff King <peff@peff.net> writes: >> Nice ;-) Do we want a test or two for 1/2 by the way? > > Yeah, it's probably a good idea. Can you squash this in,... Done; thanks. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-02-22 19:00 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-02-17 20:40 Commit/merge messages for binary files Piotr Krukowiecki 2011-02-18 13:53 ` Matthieu Moy 2011-02-18 20:30 ` Piotr Krukowiecki 2011-02-18 21:19 ` Matthieu Moy 2011-02-18 21:42 ` Piotr Krukowiecki 2011-02-19 7:06 ` Jeff King 2011-02-19 8:03 ` Jeff King 2011-02-19 8:04 ` [PATCH 1/2] diff: handle diffstat of rewritten " Jeff King 2011-02-19 8:16 ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King 2011-02-21 23:33 ` Junio C Hamano 2011-02-22 15:37 ` Jeff King 2011-02-22 19:00 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).