From: Shawn Pearce <spearce@spearce.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: Why do base objects appear behind the delta in packs?
Date: Tue, 29 Aug 2006 12:27:47 -0400
Message-ID: <20060829162747.GA21729@spearce.org>
In-Reply-To: <ed1kn3$c3r$1@sea.gmane.org>
Jakub Narebski <jnareb@gmail.com> wrote:
> Shawn Pearce wrote:
>
> > From a data locality perspective putting the base object before
> > or after the delta shouldn't matter, as either way the delta
> > is useless without the base. So placing the base immediately
> > before the delta should perform just as well as placing it after.
> > Either way the OS should have the base in cache by the time the
> > delta is being accessed.
>
> _Should_ perform? Have you got any measurements of speed of creating "base
> before delta" pack, and reading objects from this kind of pack?
No, not yet. It just seemed odd to me that the base was put behind
the delta which then forces unpack-objects to hold a delta in memory
until it finds the corresponding base later in the stream when it
could have been just as simple to require the base appear before
the delta. I wondered what the rationale was for the additional
complexity in unpack-objects.
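
To make the memory cost concrete, here is a minimal sketch of a streaming unpacker. It is not the unpack-objects code; the entry layout, the function name unpack_stream, and the toy "delta" (text appended to the base) are all assumptions made for illustration. It only shows that a delta arriving before its base has to be held until the base shows up.

```python
def unpack_stream(entries):
    """Resolve a stream of (name, base, payload) entries.

    base is None for a full object; otherwise payload is a toy "delta"
    (text appended to the base), standing in for a real Git delta.
    Returns (resolved objects, peak number of deltas held in memory).
    """
    resolved = {}
    pending = {}   # base name -> [(name, payload)] still waiting for that base
    held = 0       # deltas currently buffered
    peak = 0       # largest number of deltas buffered at once

    def store(name, content):
        # Record an object, then apply every delta that was waiting on it.
        nonlocal held
        work = [(name, content)]
        while work:
            obj, data = work.pop()
            resolved[obj] = data
            for child, payload in pending.pop(obj, []):
                held -= 1
                work.append((child, data + payload))

    for name, base, payload in entries:
        if base is None:
            store(name, payload)
        elif base in resolved:
            store(name, resolved[base] + payload)
        else:
            # Base has not been seen yet: the delta must be kept around.
            pending.setdefault(base, []).append((name, payload))
            held += 1
            peak = max(peak, held)
    return resolved, peak
```

With the base first in the stream nothing is ever buffered; with the deltas first, every one of them sits in memory until the base arrives.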
Nicolas' reply pointed out that the current arrangement of base
after delta may actually offer improved performance due to the
OS performing read-ahead when you seek to the delta. But he also
pointed out this base after delta situation should be rather rare
as we try to delta older objects against newer objects and we try to
place newer objects at the front of the pack, so it likely shouldn't
matter that much.
I just instrumented builtin-pack-objects.c to count how many times
we put the delta before the base and then repacked a current Git
repo with `git repack -a -d -f`. 28167 objects, 19170 deltas. 6003
deltas appeared before their base objects. So about 31% of the deltas.
That's certainly not the common case but it does occur with some
frequency. However, re-sorting the output of verify-pack -v by offset
and visually inspecting the entries, you can clearly see it doesn't
happen very often early in the pack. Most of the objects in the
front of the pack are undeltafied commits.
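
The same count can be approximated without patching builtin-pack-objects.c by reading the output of verify-pack -v. The sketch below is not the instrumentation described above; it assumes the documented line layout (name, type, size, size in pack, offset, plus depth and base name for deltified objects), and count_deltas_before_base is a hypothetical helper name.

```python
OBJECT_TYPES = {"commit", "tree", "blob", "tag"}

def count_deltas_before_base(verify_pack_output):
    """Return (objects, deltas, deltas stored at a lower offset than their base)."""
    offsets = {}   # object name -> offset in the pack
    deltas = []    # (object name, base name) for every deltified entry
    for line in verify_pack_output.splitlines():
        fields = line.split()
        # Object lines have 5 fields, or 7 when the object is a delta;
        # summary lines such as "non delta: N objects" are skipped.
        if len(fields) not in (5, 7) or fields[1] not in OBJECT_TYPES:
            continue
        offsets[fields[0]] = int(fields[4])
        if len(fields) == 7:
            deltas.append((fields[0], fields[6]))
    before = sum(
        1
        for obj, base in deltas
        if base in offsets and offsets[obj] < offsets[base]
    )
    return len(offsets), len(deltas), before
```

Fed the text printed by verify-pack -v for a pack's .idx file, this returns the three numbers quoted above; for this repository the ratio would be 6003 / 19170, or roughly 31%.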
This particular Git repository has 6723 commits and 905 trees that
weren't deltafied. That's a total of 4 MiB of uncompressed data,
most of which appears at the front of the pack. Only 68 commits
were deltas but 8067 trees were made into deltas. The compressed
commits seemed to occupy the first 2 MiB of the pack file; that's
25% of the 8 MiB pack. A commit-specific pack local dictionary
could be interesting here as it might save some pack space.
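
One way such a dictionary could work is zlib's preset dictionary support. This is only an assumption about the mechanism, since nothing above says how a pack local dictionary would be built or stored; the dictionary contents and the sample commit text are hypothetical.

```python
import zlib

def compress_with_dictionary(data, dictionary):
    """Deflate data using a preset dictionary of byte strings expected to recur."""
    comp = zlib.compressobj(zdict=dictionary)
    return comp.compress(data) + comp.flush()

def decompress_with_dictionary(blob, dictionary):
    """Inflate data that was compressed with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

# Hypothetical dictionary: header keywords and an identity that many commits share.
COMMIT_DICTIONARY = (
    b"tree parent author committer "
    b"Shawn Pearce <spearce@spearce.org> "
)

# Hypothetical commit body, for illustration only.
SAMPLE_COMMIT = (
    b"tree 0000000000000000000000000000000000000000\n"
    b"author Shawn Pearce <spearce@spearce.org> 1156868867 -0400\n"
    b"committer Shawn Pearce <spearce@spearce.org> 1156868867 -0400\n"
    b"\n"
    b"Example commit message\n"
)
```

Comparing len(compress_with_dictionary(SAMPLE_COMMIT, COMMIT_DICTIONARY)) against len(zlib.compress(SAMPLE_COMMIT)) over a real set of commits would show whether the idea is worth pursuing; the reader would also need the same dictionary to decompress.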
I'm going to shut up now and not say anything further on the subject
unless I've got some hard results indicating a different organization
is better or worse than what we have right now.
--
Shawn.