From: Nicolas Pitre <nico@cam.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Linus Torvalds <torvalds@osdl.org>, git@vger.kernel.org
Subject: Re: [PATCH] pack-objects: reuse data from existing pack.
Date: Wed, 15 Feb 2006 22:41:24 -0500 (EST)
Message-ID: <Pine.LNX.4.64.0602152226130.5606@localhost.localdomain>
In-Reply-To: <7vbqx8m62q.fsf@assigned-by-dhcp.cox.net>

On Wed, 15 Feb 2006, Junio C Hamano wrote:

> When generating a new pack, notice if we already have the wanted
> object in existing packs.  If the object has a deltified
> representation, and its base object is also what we are going to
> pack, then reuse the existing deltified representation
> unconditionally, bypassing all the expensive find_deltas() and
> try_deltas() routines.
> 
> Also, when writing out such a deltified representation or an
> undeltified representation, if matching data already exists in
> an existing pack, just write it out without uncompressing &
> recompressing.

Great!
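
To make the reuse rule in the quoted description concrete, here is a
minimal sketch in Python (not the actual C code in pack-objects).  The
PackedEntry record and the can_reuse_delta() and plan() helpers are
hypothetical names invented for illustration.  The sketch models only
the decision to reuse an existing delta, not the verbatim copying of
already-compressed data.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class PackedEntry:
    """How an object is stored in an existing pack (hypothetical model)."""
    object_id: str
    delta_base: Optional[str]  # None when the object is stored undeltified


def can_reuse_delta(entry: Optional[PackedEntry], to_pack: Set[str]) -> bool:
    """Return True when the existing delta could be reused as-is."""
    if entry is None:
        # The object is not in any existing pack.
        return False
    if entry.delta_base is None:
        # Stored whole, so there is no delta to reuse.
        return False
    # The delta is only usable if its base also goes into the new pack.
    return entry.delta_base in to_pack


def plan(to_pack: Set[str],
         existing: Dict[str, PackedEntry]) -> Tuple[Set[str], Set[str]]:
    """Split objects into (reuse existing delta, needs the delta search)."""
    reuse = {oid for oid in to_pack
             if can_reuse_delta(existing.get(oid), to_pack)}
    return reuse, to_pack - reuse
```

Only the objects in the second set would still have to go through the
expensive delta search; the first set skips it entirely.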

> Without this patch:
> 
>     $ git-rev-list --objects v1.0.0 >RL
>     $ time git-pack-objects p <RL
> 
>     Generating pack...
>     Done counting 12233 objects.
>     Packing 12233 objects....................
>     60a88b3979df41e22d1edc3967095e897f720192
> 
>     real    0m32.751s
>     user    0m27.090s
>     sys     0m2.750s
> 
> With this patch:
> 
>     $ git-rev-list --objects v1.0.0 >RL
>     $ time ../git.junio/git-pack-objects q <RL
> 
>     Generating pack...
>     Done counting 12233 objects.
>     Packing 12233 objects.....................
>     60a88b3979df41e22d1edc3967095e897f720192
>     Total 12233, written 12233, reused 12177
> 
>     real    0m4.007s
>     user    0m3.360s
>     sys     0m0.090s
> 
> Signed-off-by: Junio C Hamano <junkio@cox.net>
> 
> ---
> 
>  * This may depend on one cleanup patch I have not sent out, but
>    I am so excited that I could not help sending this out first.
> 
>    Admittedly this is hot off the press, I have not had enough
>    time to beat this too hard, but the resulting pack from the
>    above passed unpack-objects, index-pack and verify-pack.

In fact, the resulting pack should be identical with or without this 
patch, shouldn't it?

FYI: I have a list of patches to produce even smaller (yet still
compatible) packs, or less dense ones with much reduced CPU usage.
It all depends on a new --speed argument to git-pack-objects.  I've been
able to produce 15-20% smaller packs with the same depth and window
size, at the cost of twice as much CPU time.  Combined with your
patch, one could repack the object store with maximum compression
even if that is expensive CPU-wise, and any later pull would benefit
from it at no additional cost.

I only need to find some time to finally clean and re-test those 
patches...


Nicolas
