From: Linus Torvalds <torvalds@linux-foundation.org>
To: Chris Lee <clee@kde.org>
Cc: git@vger.kernel.org
Subject: Re: Partitioned packs
Date: Tue, 3 Apr 2007 19:52:56 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0704031944540.6730@woody.linux-foundation.org>
In-Reply-To: <Pine.LNX.4.64.0704031858470.6730@woody.linux-foundation.org>
On Tue, 3 Apr 2007, Linus Torvalds wrote:
>
> So trying to partition things doesn't help (because the objects are
> already well sorted), and it does hurt.
Side note: I think that there *are* cases where partitioned packs can do
better, but I think that in order to do better you should
 - partition by "recency", ie put objects that are not reachable from any
   recent point in older packs.
 - make sure that the "packed_git" list is always sorted so that the older
   data packs are at the end.
and that should actually speed up many loads, just because the recent
objects are all in one pack, and because it's smaller, that pack can be
looked up a bit faster.
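A minimal sketch of the two points above, not git's actual code: the `Pack` class, its `mtime` field, and both function names are made up for illustration, and each pack is modelled as nothing more than a sorted list of object ids. Packs are ordered newest first, and a lookup walks that list and stops at the first hit.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Pack:
    # Hypothetical stand-in for one pack: a name, a timestamp used as a
    # proxy for "recency", and a sorted list of the object ids it holds.
    name: str
    mtime: int
    ids: list = field(default_factory=list)

    def contains(self, oid):
        # Binary search in the sorted id list.
        i = bisect_left(self.ids, oid)
        return i < len(self.ids) and self.ids[i] == oid


def sort_newest_first(packs):
    # Keep the older data packs at the end of the list.
    return sorted(packs, key=lambda p: p.mtime, reverse=True)


def find_object(packs, oid):
    # Return (pack name, packs probed), or (None, packs probed) on a miss.
    for probed, pack in enumerate(packs, start=1):
        if pack.contains(oid):
            return pack.name, probed
    return None, len(packs)


recent = Pack("recent", 200, sorted(["a1", "c3"]))
old = Pack("old", 100, sorted(["b2", "d4"]))
packs = sort_newest_first([old, recent])
print(find_object(packs, "a1"))  # found in the first pack probed
print(find_object(packs, "d4"))  # found only after probing "recent" too
```

With the list ordered this way, a lookup for a recent object stops at the first pack, while an old object pays for every pack probed before its own.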
On the other hand, the power of a log(n) function like a binary search is
that lookup in one big pack that is four times the size of each of four
smaller packs is really not all that much more expensive, so the advantage
is probably pretty small.
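To put rough numbers on the log(n) argument, here is a small sketch that counts worst-case comparisons for a plain binary search. The pack sizes are invented for illustration and the function names are hypothetical. A real pack index also narrows the search with a fan-out table first, so actual counts would be lower, but the comparison comes out the same way.

```python
def worst_case_probes(n):
    """Worst-case comparisons for a binary search over n sorted entries.

    For n >= 1 this is floor(log2(n)) + 1, which equals n.bit_length().
    """
    return n.bit_length()


def one_big_pack(n_per_pack, n_packs):
    # All objects live in a single pack of n_per_pack * n_packs entries.
    return worst_case_probes(n_per_pack * n_packs)


def many_small_packs(n_per_pack, hit_index):
    # The object is found in pack number hit_index (1-based), after a
    # failed search in each of the packs before it.
    return hit_index * worst_case_probes(n_per_pack)


print(one_big_pack(250_000, 4))      # 20
print(many_small_packs(250_000, 1))  # 18: hit in the first (recent) pack
print(many_small_packs(250_000, 4))  # 72: hit only in the last (oldest) pack
```

Quadrupling the pack adds two comparisons (20 instead of 18), so the best case for partitioning saves very little, while an object in the last of four packs costs several times more.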
And for things that need old objects (and "git blame" does obviously very
much tend to fall into that category), any partitioning is likely to be
bad.
So I think partitioning is valid, but my suspicion is that you'd want to
partition for *other* reasons than highest performance. Better reasons to
have multiple packs:
 - just because you haven't repacked ;)
 - to keep "git repack" times down by marking old big packs as "keep" once
   they get big enough (the space advantage of packing eventually flattens
   out, so there's no real overwhelming reason to repack old stuff if you
   have "enough")
 - filesystem and pack-file limitations (ie the 2**31 limit)
but I doubt performance is ever going to be a really compelling one.
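As a sketch of the "keep" idea from the list above: git treats a pack as kept when a file with the same base name and a `.keep` suffix sits next to the `.pack` file. The helper below, with a hypothetical name and an arbitrary size threshold chosen by the caller, creates that marker for every pack at or above the threshold. What counts as "big enough" is an assumption left to whoever runs it.

```python
from pathlib import Path


def mark_big_packs_keep(pack_dir, min_bytes):
    """Create an empty .keep file beside each *.pack of at least min_bytes.

    Returns the file names of the packs that were newly marked.
    """
    marked = []
    for pack in sorted(Path(pack_dir).glob("*.pack")):
        keep = pack.with_suffix(".keep")  # pack-X.pack -> pack-X.keep
        if pack.stat().st_size >= min_bytes and not keep.exists():
            keep.touch()
            marked.append(pack.name)
    return marked
```

It could be pointed at a repository's `objects/pack` directory, for example `mark_big_packs_keep(".git/objects/pack", 1 << 30)`, before running a full repack.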
You can obviously always optimize for some very *particular* load by
packing optimally for just that one (keep exactly the objects you need in
one particular pack, don't even touch any other packs), but I don't think
any load is *so* special that you shouldn't think of other loads.
Linus