From: Shawn Pearce <spearce@spearce.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Git Mailing List <git@vger.kernel.org>, Nicolas Pitre <nico@cam.org>
Subject: Re: 1.3.0 creating bigger packs than 1.2.3
Date: Thu, 20 Apr 2006 13:31:31 -0400
Message-ID: <20060420173131.GF31738@spearce.org>
In-Reply-To: <Pine.LNX.4.64.0604200954440.3701@g5.osdl.org>

I just spent some time bisecting this issue and it looks like the
following change by Junio may be the culprit:

  commit 1d6b38cc76c348e2477506ca9759fc241e3d0d46
  Author: Junio C Hamano <junkio@cox.net>
  Date:   Wed Feb 22 22:10:24 2006 -0800
  
      pack-objects: use full pathname to help hashing with "thin" pack.
      
      This uses the same hashing algorithm to the "preferred base
      tree" objects and the incoming pathnames, to group the same
      files from different revs together, while spreading files with
      the same basename in different directories.
      
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  
  :100644 100644 af3bdf5d358b8a47ed23bcb7e9721e956eb59d60 3a16b7e4ce25ec05c64817dfd92dd9d517ab9dd3 M      pack-objects.c
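
To make the idea in that commit message concrete, here is a minimal
sketch of the difference between keying objects on the basename alone
and keying them on the full pathname. It is not the hash used in
pack-objects.c: zlib.crc32 is only a stand-in, and basename_key and
full_path_key are hypothetical names chosen for illustration.

```python
import posixpath
import zlib


def basename_key(path: str) -> int:
    """Key derived from the final path component only (stand-in hash)."""
    return zlib.crc32(posixpath.basename(path).encode())


def full_path_key(path: str) -> int:
    """Key derived from the whole path, directories included (stand-in hash)."""
    return zlib.crc32(path.encode())


if __name__ == "__main__":
    # The same path seen in two revisions, plus two same-named files
    # that live in different directories.
    paths = ["t/Makefile", "t/Makefile", "Documentation/Makefile", "Makefile"]
    for p in paths:
        print(f"{p:24} basename={basename_key(p):08x} full={full_path_key(p):08x}")
```

With the basename key, every Makefile lands on the same value; with the
full-path key, only the two "t/Makefile" entries share one, which is the
grouping-versus-spreading behavior the commit message describes.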


Linus Torvalds <torvalds@osdl.org> wrote:
> 
> 
> On Thu, 20 Apr 2006, Shawn Pearce wrote:
> > 
> > Oddly enough repacking the v1.2.3 pack using 1.3.0.g56c1 created an
> > even smaller pack ("git-repack -a -d"):
> 
> That's "normal". Repacking without -f will always pack _more_, never less. 
> So a different packing algorithm can only improve (of course, usually not 
> by a huge margin, and it quickly diminishes).
> 
> > but then adding -f definitely gives us the 2x explosion again:
> > 
> >   Total 46391, written 46391 (delta 6649), reused 37894 (delta 0)
> >   129M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack
> 
> Right. Doing the -f will discard any old packing info, so if the new 
> packing algorithm has problems (and it obviously does), then using -f will 
> show them.
> 
> > > You could try to revert that change:
> > > 
> > > 	git revert eeef7135fed9b8784627c4c96e125241c06c65e1
> > 
> > Whoa.  I did that revert and fixup on top of 'next'.  The pack
> > from "git-repack -a -d -f" is now even larger due to even less
> > delta reuse:
> 
> Ok, so that wasn't it, and the new sort order is superior.
> 
> That means that it probably _is_ the delta changes themselves (probably 
> commit c13c6bf7 "diff-delta: bound hash list length to avoid O(m*n) 
> behavior"). You can try
> 
> 	git revert c13c6bf7
> 
> to see if that's it. Although Nico already showed interest, and if you 
> make the archive available to him, he's sure to figure it out.
> 
> > With --window=50 on 'next' (without the revert):
> > 
> >   Total 46391, written 46391 (delta 6666), reused 39723 (delta 0)
> >   129M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack
> 
> Yeah, that didn't do much. Slightly more deltas than without, but not a 
> lot, and it didn't matter much size-wise.
> 
> You can try "--depth=50" (slogan: more "hot delta on delta action"), but 
> it's looking less and less like a delta selection issue, and more and more 
> like the deltas themselves are deproved.
> 
> 			Linus
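
For comparing the two summary lines quoted above, a small sketch that
parses them and reports the difference in delta counts may help. The
regular expression assumes exactly the "Total ..., written ... (delta
...), reused ... (delta ...)" layout shown in this message, and the
field names (total, written, delta, reused, reused_delta) are labels
chosen here, not names taken from git.

```python
import re

# Matches the summary layout quoted in this message; other git versions
# may print something different.
SUMMARY = re.compile(
    r"Total (\d+), written (\d+) \(delta (\d+)\), reused (\d+) \(delta (\d+)\)"
)


def parse_summary(line: str) -> dict:
    """Turn one pack summary line into a dict of integer counts."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not a pack summary line: {line!r}")
    keys = ("total", "written", "delta", "reused", "reused_delta")
    return dict(zip(keys, map(int, m.groups())))


# The two runs quoted above: "git-repack -a -d -f", then the same with
# --window=50 on 'next'.
repack_f = parse_summary(
    "Total 46391, written 46391 (delta 6649), reused 37894 (delta 0)"
)
window50 = parse_summary(
    "Total 46391, written 46391 (delta 6666), reused 39723 (delta 0)"
)

extra_deltas = window50["delta"] - repack_f["delta"]

if __name__ == "__main__":
    print(f"extra deltas with --window=50: {extra_deltas}")
    for name, run in (("-f", repack_f), ("-f --window=50", window50)):
        print(f"{name:16} delta share: {run['delta'] / run['total']:.2%}")
```

The larger window yields only 17 more deltas out of 46391 objects, which
fits the observation that it made little difference to the pack size.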

-- 
Shawn.