Re: [PATCH] diff-delta: produce optimal pack data

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Nicolas Pitre <nico@cam.org>
To: Carl Baldwin <cnb@fc.hp.com>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: [PATCH] diff-delta: produce optimal pack data
Date: Fri, 24 Feb 2006 16:12:14 -0500 (EST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0602241544270.31162@localhost.localdomain> (raw)
In-Reply-To: <20060224204022.GA15962@hpsvcnb.fc.hp.com>

On Fri, 24 Feb 2006, Carl Baldwin wrote:

> On Fri, Feb 24, 2006 at 03:02:07PM -0500, Nicolas Pitre wrote:
> > Well that is probably a bit tight.  Ideally it should be linear with the 
> > size of the data set to process.  If you have 10 files 10MB each it 
> > should take about the same time to pack than 10000 files of 10KB each.  
> > Of course incrementally packing one additional 10MB file might take more 
> > than a second although it is only one file.
> 
> Well, I might not have been fair here.  I tried an experiment where I
> packed each of the twelve large blob objects explicitly one-by-one using
> git-pack-objects.  Incrementally packing each single object was very
> fast.  Well under a second per object on my machine.
> 
> After the twelve large objects were packed into individual packs the
> rest of the packing went very quickly and git v1.2.3's date reuse worked
> very well.  This was sort of my attempt at simulating how things would
> be if git avoided deltification of each of these large files. I'm sorry
> to have been so harsh earlier I just didn't understand that
> incrementally packing one-by-one was going to help this much.

Hmmmmmmm....

I don't think I understand what is going on here.

You say that, if you add those big files and incrementally repack after 
each commit using git repack with no option, then it requires only about 
one second each time.  Right?

But if you use "git-repack -a -f" then it is gone for more than an hour?

I'd expect something like 2 * (sum i for i = 1 to 10) i.e. in the 110 
second range due to the combinatorial effect when repacking everything.  
This is far from one hour and something appears to be really really 
wrong.

How many files besides those 12 big blobs do you have?

> This gives me hope that if somehow git were to not attempt to deltify
> these objects then performance would be much better than acceptible.
> 
> [snip]
> > However, if you could let me play with two samples of your big file I'd 
> > be grateful.  If so I'd like to make git work well with your data set 
> > too which is not that uncommon after all.
> 
> I would be happy to do this.  I will probably need to scrub a bit and
> make sure that the result shows the same characteristics.  How would you
> like me to deliver these files to you?  They are about 25 MB deflated.

If you can add them into a single .tgz with instructions on how 
to reproduce the issue and provide me with an URL where I can fetch it 
that'd be perfect.


Nicolas

next prev parent reply	other threads:[~2006-02-24 21:12 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-22  1:45 [PATCH] diff-delta: produce optimal pack data Nicolas Pitre
2006-02-24  8:49 ` Junio C Hamano
2006-02-24 15:37   ` Nicolas Pitre
2006-02-24 23:55     ` Junio C Hamano
2006-02-24 17:44   ` Carl Baldwin
2006-02-24 17:56     ` Nicolas Pitre
2006-02-24 18:35       ` Carl Baldwin
2006-02-24 18:57         ` Nicolas Pitre
2006-02-24 19:23           ` Carl Baldwin
2006-02-24 20:02             ` Nicolas Pitre
2006-02-24 20:40               ` Carl Baldwin
2006-02-24 21:12                 ` Nicolas Pitre [this message]
2006-02-24 22:50                   ` Carl Baldwin
2006-02-25  3:53                     ` Nicolas Pitre
2006-02-24 20:02             ` Linus Torvalds
2006-02-24 20:19               ` Nicolas Pitre
2006-02-24 20:53               ` Junio C Hamano
2006-02-24 21:39                 ` Nicolas Pitre
2006-02-24 21:48                   ` Nicolas Pitre
2006-02-25  0:45                   ` Linus Torvalds
2006-02-25  3:07                     ` Nicolas Pitre
2006-02-25  4:05                       ` Linus Torvalds
2006-02-25  5:10                         ` Nicolas Pitre
2006-02-25  5:35                           ` Nicolas Pitre
2006-03-07 23:48                             ` [RFH] zlib gurus out there? Junio C Hamano
2006-03-08  0:59                               ` Linus Torvalds
2006-03-08  1:22                                 ` Junio C Hamano
2006-03-08  2:00                                   ` Linus Torvalds
2006-03-08  9:46                                     ` Johannes Schindelin
2006-03-08 10:45                               ` [PATCH] write_sha1_file(): Perform Z_FULL_FLUSH between header and data Sergey Vlasov
2006-03-08 11:04                                 ` Junio C Hamano
2006-03-08 14:17                                   ` Sergey Vlasov
2006-02-25 19:18                           ` [PATCH] diff-delta: produce optimal pack data Linus Torvalds
2006-02-24 18:49       ` Carl Baldwin
2006-02-24 19:03         ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0602241544270.31162@localhost.localdomain \
    --to=nico@cam.org \
    --cc=cnb@fc.hp.com \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).