From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jay Soffian <jaysoffian@gmail.com>,
Michael J Gruber <git@drmicha.warpmail.net>,
git <git@vger.kernel.org>
Subject: Re: [PATCH 3/5] combine-diff: handle binary files as binary
Date: Mon, 30 May 2011 12:19:27 -0400
Message-ID: <20110530161927.GC24431@sigill.intra.peff.net>
In-Reply-To: <20110530143627.GC31490@sigill.intra.peff.net>

On Mon, May 30, 2011 at 10:36:27AM -0400, Jeff King wrote:
> 1. Grab each blob, check binary-ness, and free. This double-loads in
> the common, non-binary case.
> [...]
>
> I'll try to take a look at it this week and get some measurements on (1)
> versus (2) for both speed and peak memory usage. And then see if I can
> do better with (3), and implement the "peek" solution both here and in
> regular diff.

I was curious about this, so I stole a few minutes to do some
preliminary benchmarks this morning.

The first thing to look at is the performance of the original code, which
does not check binary-ness at all. It represents the best we can do with
any strategy. So I tried:

  git log -p --cc --merges origin/master

on git.git using both v1.7.5.3 and the jk/combine-diff-binary-etc
branch. And it turns out that the extra loads really don't make a
difference in practice. My best-of-5 timings for the two cases were:

  $ time git.v1.7.5.3 log -p --cc --merges origin/master >/dev/null
  real    0m59.518s
  user    0m58.672s
  sys     0m0.688s

  $ time git.jk.binary-combined-diff log -p --cc \
      --merges origin/master >/dev/null
  real    0m58.949s
  user    0m58.220s
  sys     0m0.572s
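
For anyone wanting to repeat a best-of-5 comparison like the one above,
a small harness along these lines might do. It is only a sketch: the
helper name best_of is made up for illustration, and it records
wall-clock time only (the "real" figure), not user or sys time.

```python
import subprocess
import time
from typing import List, Sequence


def best_of(cmd: Sequence[str], runs: int = 5) -> float:
    """Run cmd `runs` times; return the smallest wall-clock time in seconds."""
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard stdout, as the ">/dev/null" in the shell commands does.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical usage, run from inside a clone of git.git:
#   best_of(["git", "log", "-p", "--cc", "--merges", "origin/master"])
```

Taking the minimum rather than the mean is the usual way to reduce the
influence of unrelated system activity on a short series of runs.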

The new code actually came out slightly faster. One reason may be that
there are 3 combined diffs of git-gui/lib/git-gui.ico that we now skip
(and just say "Binary files differ"). That's not a lot, but it gives us
a very tiny edge (though that edge is about the same size as the noise
between runs). Still, I think it implies that the extra loads in the
common non-binary case are not actually measurable.

The peak memory use of the two versions should be the same (since we
free each blob immediately), but I didn't measure it.
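
One way the unmeasured peak memory could be checked is sketched below.
It is not something that was run for this message. The helper name
peak_rss_of is hypothetical, and it relies on the Unix-only resource
module.

```python
import resource
import subprocess
from typing import Sequence


def peak_rss_of(cmd: Sequence[str]) -> int:
    """Run cmd to completion, then return ru_maxrss for waited-for children.

    The value is the maximum over *all* children this process has waited
    for so far, so call this from a fresh interpreter for each command
    being compared. Units are platform-dependent (kilobytes on Linux,
    bytes on macOS).
    """
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Hypothetical usage, once per git build, each in its own interpreter:
#   peak_rss_of(["git", "log", "-p", "--cc", "--merges", "origin/master"])
```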

So I think in practice it's not a big deal. I'll still take a look at
the "peek" optimization later this week, since that can make a
difference in some corner cases. And as part of that, it will probably
make sense to keep the buffers around for small-ish files, so we'll get
the optimization I mentioned more or less for free. I'll also do the
check for duplicated sha1s that you mentioned.
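
To make the "peek" idea and the duplicated-sha1 check concrete, here is
a rough sketch. It is not the code from the patch series. It assumes
that binary-ness is decided by looking for a NUL byte in a bounded
prefix of the blob. The 8000-byte window, the names looks_binary and
any_binary, and the open_blob callback are all assumptions made for
illustration.

```python
import io
from typing import BinaryIO, Callable, Dict, Iterable

PEEK_BYTES = 8000  # assumed prefix size; not taken from the patch


def looks_binary(stream: BinaryIO, peek_bytes: int = PEEK_BYTES) -> bool:
    """Read at most peek_bytes and report whether a NUL byte appears."""
    return b"\0" in stream.read(peek_bytes)


def any_binary(sha1s: Iterable[str],
               open_blob: Callable[[str], BinaryIO]) -> bool:
    """Peek at each distinct sha1 once; stop at the first binary blob."""
    seen: Dict[str, bool] = {}
    for sha1 in sha1s:
        if sha1 not in seen:
            # open_blob is a hypothetical callback returning a readable,
            # closeable stream for the blob's contents.
            with open_blob(sha1) as stream:
                seen[sha1] = looks_binary(stream)
        if seen[sha1]:
            return True
    return False
```

Because only a prefix is read, a large blob never has to be loaded in
full just to be classified, and parents that share a sha1 are peeked at
only once.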

-Peff