From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jay Soffian <jaysoffian@gmail.com>,
	Michael J Gruber <git@drmicha.warpmail.net>,
	git <git@vger.kernel.org>
Subject: Re: [PATCH 3/5] combine-diff: handle binary files as binary
Date: Mon, 30 May 2011 12:19:27 -0400
Message-ID: <20110530161927.GC24431@sigill.intra.peff.net>
In-Reply-To: <20110530143627.GC31490@sigill.intra.peff.net>

On Mon, May 30, 2011 at 10:36:27AM -0400, Jeff King wrote:

>   1. Grab each blob, check binary-ness, and free. This double-loads in
>      the common, non-binary case.
> [...]
>
> I'll try to take a look at it this week and get some measurements on (1)
> versus (2) for both speed and peak memory usage. And then see if I can
> do better with (3), and implement the "peek" solution both here and in
> regular diff.

I was curious about this, so I stole a few minutes to do some
preliminary benchmarks this morning.

The first thing to look at is the performance of the original code, which
does not check binary-ness at all. That is the baseline: the best we can
do with any strategy. So I tried:

  git log -p --cc --merges origin/master

on git.git using both v1.7.5.3 and the jk/combine-diff-binary-etc
branch. And it turns out that the extra loads really don't make a
difference in practice. My best-of-5 timings for the two cases were:

  $ time git.v1.7.5.3 log -p --cc --merges origin/master >/dev/null
  real    0m59.518s
  user    0m58.672s
  sys     0m0.688s

  $ time git.jk.binary-combined-diff log -p --cc \
      --merges origin/master >/dev/null
  real    0m58.949s
  user    0m58.220s
  sys     0m0.572s

The new code actually came out slightly faster.  One reason may be that
there are 3 combined diffs of git-gui/lib/git-gui.ico that we avoid
doing (and just say "Binary files differ"). That's not a lot, but it
gives us a very tiny edge (though that edge is very close to the amount
of noise between runs). Still, I think it implies that the extra loads
in the common non-binary case are not actually measurable.
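
A best-of-5 comparison like the one above can be scripted rather than run by hand. The following is a minimal bash sketch, not the procedure actually used for the numbers in this message. The `best_of` helper is a hypothetical name, and it relies on bash's `time` keyword and `TIMEFORMAT` variable.

```shell
#!/usr/bin/env bash
# best_of N CMD [ARGS...]  (hypothetical helper, not part of git)
# Run CMD N times, discard its output, and print the smallest
# wall-clock time in seconds.
best_of() {
    local runs=$1; shift
    local TIMEFORMAT=%R   # make bash's `time` print only elapsed seconds
    local i
    for ((i = 0; i < runs; i++)); do
        # `time` reports on the shell's stderr; the outer 2>&1 captures it,
        # while the command's own output goes to /dev/null.
        { time "$@" >/dev/null 2>&1; } 2>&1
    done | sort -n | head -n 1
}

# Example, using the two binaries named above:
#   best_of 5 git.v1.7.5.3 log -p --cc --merges origin/master
#   best_of 5 git.jk.binary-combined-diff log -p --cc --merges origin/master
```

Taking the minimum rather than the mean is a deliberate choice here: it filters out runs disturbed by other activity on the machine, which matters when the difference being measured is close to the run-to-run noise.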

The peak memory use between the two should be the same (since we free
each blob immediately), but I didn't measure it.
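
For anyone who wants to check the peak-memory claim, a rough sketch follows. It assumes GNU time is installed as /usr/bin/time (the `-f`, `%M` and `-o` options are GNU extensions), and `peak_rss_kb` is a hypothetical helper name. No measurement was taken for this message, so no expected numbers are given.

```shell
#!/usr/bin/env bash
# peak_rss_kb CMD [ARGS...]  (hypothetical helper, not part of git)
# Print the maximum resident set size of CMD in kilobytes.
# Assumes GNU time at /usr/bin/time; BSD time does not accept -f or %M.
peak_rss_kb() {
    local report
    report=$(mktemp) || return 1
    # -o sends the report to a file so it cannot mix with CMD's own stderr.
    /usr/bin/time -f %M -o "$report" "$@" >/dev/null 2>&1
    # On a non-zero exit GNU time prepends a status line; keep the last line.
    tail -n 1 "$report"
    rm -f "$report"
}

# Example, using the two binaries named above:
#   peak_rss_kb git.v1.7.5.3 log -p --cc --merges origin/master
#   peak_rss_kb git.jk.binary-combined-diff log -p --cc --merges origin/master
```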

So I think in practice it's not a big deal. I'll still take a look at
the "peek" optimization later this week, since that can make a
difference in some corner cases. And as part of that, it will probably
make sense to keep the buffers around for small-ish files, so we'll get
the optimization I mentioned more or less for free. I'll also do the
check for duplicated sha1s that you mentioned.
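
One way to read the "peek" idea is to decide binary-ness from a short prefix of each blob instead of the whole thing. The sketch below illustrates only that check, in bash. It assumes that "binary" means "contains a NUL byte in the prefix", the 8000-byte default is an assumption of this sketch, and `looks_binary` is a hypothetical helper name, not a git function.

```shell
#!/usr/bin/env bash
# looks_binary [LIMIT]  (hypothetical helper, not part of git)
# Read data on stdin and succeed (exit 0) if a NUL byte appears within
# the first LIMIT bytes. LIMIT defaults to 8000, an assumed value.
looks_binary() {
    local limit=${1:-8000}
    # head stops after LIMIT bytes; od dumps them as two-digit hex words
    # (-v disables the "*" shorthand for repeated lines); grep -w matches
    # a standalone "00", i.e. a NUL byte.
    head -c "$limit" | od -An -v -tx1 | grep -qw 00
}

# Example (the object name is a placeholder):
#   git cat-file blob <sha1> | looks_binary && echo "Binary files differ"
```

Note that piping from `git cat-file` still makes git load the entire blob, so this shows the shape of the check, not the saving. The saving only appears when the prefix is read without inflating the rest of the object, which has to happen inside git itself.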

-Peff
