git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Sverre Rabbelier <srabbelier@gmail.com>
Cc: John <john@puckerupgames.com>, git@vger.kernel.org
Subject: Re: serious performance issues with images, audio files, and other "non-code" data
Date: Tue, 18 May 2010 15:27:58 -0400
Message-ID: <20100518192758.GC2383@coredump.intra.peff.net>
In-Reply-To: <AANLkTiltNPIR5gAWMOqZ2Y_azFUU93kH54ddHuCFFeCp@mail.gmail.com>

On Tue, May 18, 2010 at 09:10:58PM +0200, Sverre Rabbelier wrote:

> On Tue, May 18, 2010 at 21:07, Jeff King <peff@peff.net> wrote:
> > No, not to my knowledge. Even the "binary" attribute just says "this
> > file is binary, don't text diff it". I think we will always still do
> > rewrite-detection for operations like "git status" and the diff summary
> > of "git commit".
> 
> Would that not be a very sensible optimization that would help John
> (and other users of big files) a lot?

It might help some, but I worry about overloading the meaning of
"-delta". Right now it has a very clear meaning: don't delta for
packfiles. Setting it doesn't necessarily mean I never want to see
break detection (or inexact rename detection, for that matter) on that
file.

Large binary files shouldn't be taxing on regular diffs. If you have
marked a file as "binary" and we are not creating a binary diff (i.e.,
just printing "binary files differ"), then we shouldn't even need to
pull the blob from storage, since we can tell from the sha1 that it is
different. I haven't checked whether we actually do that simple
optimization. If you haven't marked the file with a binary attribute,
then obviously we do have to look at the blob to find out that it is
binary.
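
To make the idea concrete, here is a rough sketch in Python (not git's
actual code). The names summarize() and load_blob are hypothetical, and
the "NUL byte in the first 8000 bytes" check is only an assumed stand-in
for whatever binary heuristic is really used. The point is that when a
path is marked binary, differing object IDs are enough to print the
summary line, and neither blob has to be read.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Object ID of a blob: SHA-1 over a 'blob <size>\\0' header plus content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def looks_binary(data: bytes) -> bool:
    # Assumed heuristic for this sketch: a NUL byte near the start.
    return b"\0" in data[:8000]


def summarize(path, old_id, new_id, marked_binary, load_blob):
    """Return a summary line for one path, or None if nothing changed.

    load_blob is a hypothetical callable mapping an object ID to bytes;
    it stands in for the expensive "pull the blob from storage" step.
    """
    if old_id == new_id:
        return None  # same ID, same content
    message = f"Binary files a/{path} and b/{path} differ"
    if marked_binary:
        # The IDs differ, so the contents differ. No blob is loaded.
        return message
    # Not marked: we have to look at the content to learn it is binary.
    old, new = load_blob(old_id), load_blob(new_id)
    if looks_binary(old) or looks_binary(new):
        return message
    return f"text diff needed for {path}"
```

With marked_binary=True the load_blob callable is never invoked, which
is the saving described above. Without the attribute, both blobs are
read before we can even say "binary".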

So:

  1. I think it would need a separate attribute that is about diffing
     (possibly even just options to a custom diff filter).

  2. I am not clear exactly what options would work best. Do you want to
     disable diffing entirely? Disable just inexact rename detection and
     break detection? If break detection is disabled, do you assume it
     is _always_ a rewrite, or never?

So I am open to the idea, but I think we would need a more concrete
proposal and some timings to show how it is a benefit.

-Peff

