git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: Carl Baldwin <cnb@fc.hp.com>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: [PATCH] diff-delta: produce optimal pack data
Date: Fri, 24 Feb 2006 15:02:07 -0500 (EST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0602241438521.31162@localhost.localdomain> (raw)
In-Reply-To: <20060224192354.GC387@hpsvcnb.fc.hp.com>

On Fri, 24 Feb 2006, Carl Baldwin wrote:

> I see what you're saying about this data reuse helping to speed up
> subsequent cloning operations.  However, if packing takes this long and
> doesn't give me any disk space savings then I don't want to pay the very
> heavy price of packing these files even the first time nor do I want to
> pay the price incrementally.

Of course.  There is admitedly a problem here.  I'm just abusing a bit 
of your time to properly identify its parameters.

> The most I would tolerate for the first pack is a few seconds.  The most
> I would tolerate for any incremental pack is about 1 second.

Well that is probably a bit tight.  Ideally it should be linear with the 
size of the data set to process.  If you have 10 files 10MB each it 
should take about the same time to pack than 10000 files of 10KB each.  
Of course incrementally packing one additional 10MB file might take more 
than a second although it is only one file.
 
> BTW, git repack has been going for 30 minutes and has packed 4/36
> objects.  :-)

Pathetic.

> I think the right answer would be for git to avoid trying to pack files
> like this.  Junio mentioned something like this in his message.

Yes.  First we could add an additional parameter to the repacking 
strategy which is the undeltified but deflated size of an object.  That 
would prevent any deltas to become bigger than the simply deflated 
version.

Remains the delta performance issue.  I think I know what the problem 
is.  I'm not sure I know what the best solution would be though.  The 
patological data set is easy to identify quickly and one strategy might 
simply to bail out early when it happens and therefore not attempt any 
delta.

However, if you could let me play with two samples of your big file I'd 
be grateful.  If so I'd like to make git work well with your data set 
too which is not that uncommon after all.


Nicolas

  reply	other threads:[~2006-02-24 20:02 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-22  1:45 [PATCH] diff-delta: produce optimal pack data Nicolas Pitre
2006-02-24  8:49 ` Junio C Hamano
2006-02-24 15:37   ` Nicolas Pitre
2006-02-24 23:55     ` Junio C Hamano
2006-02-24 17:44   ` Carl Baldwin
2006-02-24 17:56     ` Nicolas Pitre
2006-02-24 18:35       ` Carl Baldwin
2006-02-24 18:57         ` Nicolas Pitre
2006-02-24 19:23           ` Carl Baldwin
2006-02-24 20:02             ` Nicolas Pitre [this message]
2006-02-24 20:40               ` Carl Baldwin
2006-02-24 21:12                 ` Nicolas Pitre
2006-02-24 22:50                   ` Carl Baldwin
2006-02-25  3:53                     ` Nicolas Pitre
2006-02-24 20:02             ` Linus Torvalds
2006-02-24 20:19               ` Nicolas Pitre
2006-02-24 20:53               ` Junio C Hamano
2006-02-24 21:39                 ` Nicolas Pitre
2006-02-24 21:48                   ` Nicolas Pitre
2006-02-25  0:45                   ` Linus Torvalds
2006-02-25  3:07                     ` Nicolas Pitre
2006-02-25  4:05                       ` Linus Torvalds
2006-02-25  5:10                         ` Nicolas Pitre
2006-02-25  5:35                           ` Nicolas Pitre
2006-03-07 23:48                             ` [RFH] zlib gurus out there? Junio C Hamano
2006-03-08  0:59                               ` Linus Torvalds
2006-03-08  1:22                                 ` Junio C Hamano
2006-03-08  2:00                                   ` Linus Torvalds
2006-03-08  9:46                                     ` Johannes Schindelin
2006-03-08 10:45                               ` [PATCH] write_sha1_file(): Perform Z_FULL_FLUSH between header and data Sergey Vlasov
2006-03-08 11:04                                 ` Junio C Hamano
2006-03-08 14:17                                   ` Sergey Vlasov
2006-02-25 19:18                           ` [PATCH] diff-delta: produce optimal pack data Linus Torvalds
2006-02-24 18:49       ` Carl Baldwin
2006-02-24 19:03         ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0602241438521.31162@localhost.localdomain \
    --to=nico@cam.org \
    --cc=cnb@fc.hp.com \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).