From: Linus Torvalds <torvalds@linux-foundation.org>
To: Chris Lee <clee@kde.org>
Cc: git@vger.kernel.org
Subject: Re: Partitioned packs
Date: Tue, 3 Apr 2007 19:52:56 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0704031944540.6730@woody.linux-foundation.org>
In-Reply-To: <Pine.LNX.4.64.0704031858470.6730@woody.linux-foundation.org>



On Tue, 3 Apr 2007, Linus Torvalds wrote:
> 
> So trying to partition things doesn't help (because the objects are 
> already well sorted), and it does hurt.

Side note: I think that there *are* cases where partitioned packs can do 
better, but I think that in order to do better you should

 - partition by "recency", ie put objects that are not reachable from any 
   recent point in older packs.

 - make sure that the "packed_git" list is always sorted so that the older 
   data packs are at the end.

and that should actually speed up many loads, just because the recent 
objects are all in one pack, and because it's smaller, that pack can be 
looked up a bit faster.
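
A minimal sketch of why the ordering of the pack list matters, assuming
each pack is just a sorted list of object ids that is searched in list
order. The names `binary_search`, `lookup`, and the integer "ids" are
hypothetical; a real pack index also has a fan-out table, which is ignored
here.

```python
def binary_search(sorted_ids, oid):
    """Return (found, probes) for a plain binary search over a sorted list."""
    lo, hi = 0, len(sorted_ids)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_ids[mid] == oid:
            return True, probes
        if sorted_ids[mid] < oid:
            lo = mid + 1
        else:
            hi = mid
    return False, probes


def lookup(packs, oid):
    """Walk the packs in list order; return (pack_name, total_probes)."""
    total = 0
    for name, ids in packs:
        found, probes = binary_search(ids, oid)
        total += probes
        if found:
            return name, total
    return None, total


# Hypothetical data: a small "recent" pack and a much larger "old" pack.
recent = ("recent", list(range(0, 2000, 2)))    # 1000 even ids
old = ("old", list(range(1, 200001, 2)))        # 100000 odd ids

recent_first = [recent, old]   # older data packs at the end
old_first = [old, recent]      # the ordering to avoid

oid = 500  # an object that lives in the recent pack
print(lookup(recent_first, oid))
print(lookup(old_first, oid))
```

With the recent pack first, a lookup of a recent object never touches the
old pack; with the old pack first, every such lookup pays for a full miss
in the big pack before it finds anything.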

On the other hand, the power of a log(n) function like a binary search is 
that a lookup in one big pack that is four times the size of each of four 
smaller packs is really not all that much more expensive, so the advantage 
is probably pretty small.
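
To put rough numbers on that, here is a small sketch that approximates the
probe count of a binary search as ceil(log2(n)). The pack sizes are made-up
illustrative values, and `approx_probes` is a hypothetical helper, not
anything from git.

```python
def approx_probes(n):
    """Approximate worst-case binary-search probes: ceil(log2(n)) for n >= 2."""
    return max(1, (n - 1).bit_length())


n_small = 250_000        # entries in each of four small packs (assumed)
n_big = 4 * n_small      # entries in one combined pack

big = approx_probes(n_big)            # 20
small_best = approx_probes(n_small)   # 18, object is in the first pack tried
small_worst = 4 * small_best          # 72, object is in the last pack tried

print(big, small_best, small_worst)   # prints "20 18 72"
```

Quadrupling the pack size adds only two probes (log2(4) = 2), so the best
case for the small packs saves very little, while a lookup that has to fall
through all four small packs costs far more than the single big one.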

And for things that need old objects (and "git blame" does obviously very 
much tend to fall into that category), any partitioning is likely to be 
bad.

So I think partitioning is valid, but my suspicion is that you'd want to 
partition for *other* reasons than highest performance. Better reasons to 
have multiple packs:

 - just because you haven't repacked ;)
 - to keep "git repack" times down by marking old big packs as "keep" once 
   they get big enough (the space advantage of packing eventually flattens 
   out, so there's no real overwhelming reason to repack old stuff if you 
   have "enough")
 - filesystem and pack-file limitations (i.e. the 2**31 limit)

but I doubt performance is ever going to be a really compelling one.
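
As an illustration of the "keep" idea, here is a minimal sketch that
creates a `.keep` file next to every pack above a size threshold, so that a
later "git repack -a -d" leaves those packs alone. The function name
`mark_big_packs_keep` and the threshold are assumptions made for this
example; what counts as "big enough" is left to the reader.

```python
from pathlib import Path


def mark_big_packs_keep(pack_dir, min_bytes):
    """Create pack-*.keep beside each pack-*.pack of at least min_bytes.

    Returns the file names of the packs that were marked.
    """
    marked = []
    for pack in sorted(Path(pack_dir).glob("pack-*.pack")):
        if pack.stat().st_size >= min_bytes:
            # pack-<hash>.pack -> pack-<hash>.keep
            pack.with_suffix(".keep").touch(exist_ok=True)
            marked.append(pack.name)
    return marked
```

A call such as `mark_big_packs_keep(".git/objects/pack", 1 << 30)` would
mark every pack of 1 GiB or more; the 1 GiB figure is only a placeholder.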

You can obviously always optimize for some very *particular* load by 
packing optimally for just that one (keep exactly the objects you need in 
one particular pack, don't even touch any other packs), but I don't think 
any load is *so* special that you shouldn't think of other loads.

			Linus

