* How to unpack recent objects?
@ 2010-12-16 20:33 Phillip Susi
2010-12-16 20:40 ` Jonathan Nieder
2010-12-16 21:19 ` Nicolas Pitre
0 siblings, 2 replies; 6+ messages in thread
From: Phillip Susi @ 2010-12-16 20:33 UTC (permalink / raw)
To: git
It looks like you can use git-unpack-objects to unpack ALL objects, but
how can you unpack only recent ones that you are likely to use while
leaving the ancient stuff packed? Ideally I want to unpack all file
objects from the current commit, and a reasonable number of commit
objects going back into the history so accessing them with checkout,
diff, log, etc will be fast.
* Re: How to unpack recent objects?
From: Jonathan Nieder @ 2010-12-16 20:40 UTC (permalink / raw)
To: Phillip Susi; +Cc: git
Hi Phillip,
Phillip Susi wrote:
> It looks like you can use git-unpack-objects to unpack ALL objects, but
> how can you unpack only recent ones that you are likely to use while
> leaving the ancient stuff packed? Ideally I want to unpack all file
> objects from the current commit, and a reasonable number of commit
> objects going back into the history so accessing them with checkout,
> diff, log, etc will be fast.
Have you tried the experiment? You can pack all objects and then make
a few commits that do not reuse any blobs from before on top of that;
then "cp -a" the repository and use "git gc --aggressive" to get one
big pack as a control. Then it should be possible to time checkout,
diff, log, etc[1].
It would also be interesting to know what the nature of these objects
is, in case it is possible to speed things up some other way.
Jonathan
[1] My uninformed guess is that the packed version will be faster,
because of cache effects among other reasons. The point of loose
objects is to speed up writing objects rather than reading them.
But I'd be happy to be surprised.
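The experiment described above could be timed with something like the
following. This is only a rough sketch: the helper names (best_time,
compare, COMMANDS) and the repository paths in the usage note are made
up for illustration, and only read-only commands are listed because
checkout would modify the working tree between runs.

```python
import subprocess
import time


def best_time(cmd, cwd=None, repeats=5):
    """Run cmd `repeats` times in directory cwd; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard output so terminal speed does not skew the measurement.
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Read-only commands to compare; adjust to whatever you actually run.
COMMANDS = [
    ["git", "log", "--oneline"],
    ["git", "log", "-p", "-20"],
    ["git", "diff", "HEAD~5", "HEAD"],
]


def compare(repo_a, repo_b, commands=COMMANDS, repeats=5):
    """Return {command string: (seconds in repo_a, seconds in repo_b)}."""
    results = {}
    for cmd in commands:
        results[" ".join(cmd)] = (
            best_time(cmd, repo_a, repeats),
            best_time(cmd, repo_b, repeats),
        )
    return results
```

Calling compare("repo-loose", "repo-packed") (hypothetical paths for
the original repository and its "cp -a" copy after "git gc
--aggressive") gives one pair of timings per command. Taking the
fastest of several runs measures the warm-cache case, so it says little
about cold-cache IO cost.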
* Re: How to unpack recent objects?
From: Nicolas Pitre @ 2010-12-16 21:19 UTC (permalink / raw)
To: Phillip Susi; +Cc: git
On Thu, 16 Dec 2010, Phillip Susi wrote:
> It looks like you can use git-unpack-objects to unpack ALL objects, but
> how can you unpack only recent ones that you are likely to use while
> leaving the ancient stuff packed? Ideally I want to unpack all file
> objects from the current commit, and a reasonable number of commit
> objects going back into the history so accessing them with checkout,
> diff, log, etc will be fast.
What makes you think that unpacking them will actually make the access
to them faster? Instead, you should consider _repacking_ them,
ultimately using the --aggressive parameter with the gc command, if you
want faster accesses.
Nicolas
* Re: How to unpack recent objects?
From: Phillip Susi @ 2010-12-16 22:06 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
On 12/16/2010 4:19 PM, Nicolas Pitre wrote:
> What makes you think that unpacking them will actually make the access
> to them faster? Instead, you should consider _repacking_ them,
> ultimately using the --aggressive parameter with the gc command, if you
> want faster accesses.
Because decompressing and undeltifying the objects in the pack file
takes a fair amount of cpu time. It seems a waste to do this for the
same set of objects repeatedly rather than just keeping them loose.
* Re: How to unpack recent objects?
From: Jakub Narebski @ 2010-12-16 22:18 UTC (permalink / raw)
To: Phillip Susi; +Cc: Nicolas Pitre, git
Phillip Susi <psusi@cfl.rr.com> writes:
> On 12/16/2010 4:19 PM, Nicolas Pitre wrote:
> > What makes you think that unpacking them will actually make the access
> > to them faster? Instead, you should consider _repacking_ them,
> > ultimately using the --aggressive parameter with the gc command, if you
> > want faster accesses.
>
> Because decompressing and undeltifying the objects in the pack file
> takes a fair amount of cpu time. It seems a waste to do this for the
> same set of objects repeatedly rather than just keeping them loose.
Loose objects are also compressed.
Besides, git has some kind of delta cache, so when you are accessing a
few objects (e.g. when doing 'git log -p', which is log + diff) you don't
need to undeltify and uncompress the same objects repeatedly.
Also, in practice it is IO that is the bottleneck, not CPU. And having
many files is bad for the filesystem cache. Originally packfiles were for
network transfer, but it turned out that they are also better as the
on-disk format.
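To make the point about loose objects concrete, here is a small sketch
of how a loose blob is named and stored: a "blob <size>" header and a
NUL byte are prepended to the content, the result is hashed to get the
object name, and the same bytes are zlib-deflated before being written
to disk. The function name loose_object is made up for this example,
and the sketch assumes a SHA-1 repository.

```python
import hashlib
import zlib


def loose_object(content, kind="blob"):
    """Return (object name, bytes as stored on disk) for a loose object."""
    # Header is "<type> <size>" followed by a NUL byte.
    header = f"{kind} {len(content)}".encode("ascii") + b"\x00"
    raw = header + content
    # Assumption: SHA-1 object names, as used by default.
    oid = hashlib.sha1(raw).hexdigest()
    # The file on disk holds the zlib-deflated header plus content.
    return oid, zlib.compress(raw)


oid, stored = loose_object(b"hello\n")
# Reading the object back always means inflating it first.
assert zlib.decompress(stored) == b"blob 6\x00hello\n"
```

The file would live under .git/objects/ in a directory named after the
first two hex digits of the object name. Reading a loose object
therefore costs a zlib inflate, just as reading a non-delta object from
a pack does.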
--
Jakub Narebski
Poland
ShadeHawk on #git
* Re: How to unpack recent objects?
From: Nicolas Pitre @ 2010-12-16 23:12 UTC (permalink / raw)
To: Phillip Susi; +Cc: git
On Thu, 16 Dec 2010, Phillip Susi wrote:
> On 12/16/2010 4:19 PM, Nicolas Pitre wrote:
> > What makes you think that unpacking them will actually make the access
> > to them faster? Instead, you should consider _repacking_ them,
> > ultimately using the --aggressive parameter with the gc command, if you
> > want faster accesses.
>
> Because decompressing and undeltifying the objects in the pack file
> takes a fair amount of cpu time. It seems a waste to do this for the
> same set of objects repeatedly rather than just keeping them loose.
Well, here are a couple implementation details you might not know about:
1) Loose objects are compressed too. So you gain nothing on that front
by keeping objects loose.
2) Delta ordering is such that recent objects, i.e. those belonging to
the most recent commits, are not delta compressed but rather used as base
objects for "older" objects to delta against. So in practice, the
cost of undeltifying objects is pushed towards the objects that you're
least likely to access frequently.
3) Object placement within the pack is also optimized so that
objects belonging to recent commits are close together, and walking
them creates a linear IO access pattern which is much faster than
accessing random individual files as loose objects are.
4) Packed objects take considerably less space than loose ones, which
makes for much better usage of the file system cache in the operating
system. This largely outweighs the cost of undeltifying objects.
5) Git also keeps a cache of most frequently referenced objects when
replaying delta chains so deep deltas don't bring exponential costs.
And, in some cases, Git even picks up the content of an object from
its checked-out form in the working directory directly instead of
locating and decompressing the object data.
So you shouldn't have to worry on that front. Git is not the fastest
SCM out there just by luck.
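The effect of the cache mentioned in point 5 can be illustrated with a
toy model. This is not Git's implementation: the store layout, the
string "deltas" and all function names below are invented for the
example. Object 0 stands for the newest, non-delta object, and each
later id is an older object stored as a delta against the previous one.

```python
def make_chain(n):
    """Return {object id: (base id or None, payload)} for a linear delta chain."""
    store = {0: (None, "v0")}          # newest object, stored whole
    for i in range(1, n):
        store[i] = (i - 1, f"+{i}")    # toy delta: text appended to the base
    return store


def resolve(store, oid, cache, stats):
    """Rebuild an object's content, counting delta applications in stats."""
    if cache is not None and oid in cache:
        return cache[oid]
    base_id, payload = store[oid]
    if base_id is None:
        content = payload
    else:
        content = resolve(store, base_id, cache, stats) + payload
        stats["applied"] += 1
    if cache is not None:
        cache[oid] = content           # remember it as a future delta base
    return content


def count_delta_applications(n, use_cache):
    """Resolve every object in an n-long chain; return total deltas applied."""
    store = make_chain(n)
    stats = {"applied": 0}
    cache = {} if use_cache else None
    for oid in range(n):
        resolve(store, oid, cache, stats)
    return stats["applied"]
```

For a chain of 50 objects, the toy applies 1225 deltas without the
cache and 49 with it, since each cached base is reused instead of being
rebuilt from the start of the chain.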
Nicolas