git.vger.kernel.org archive mirror
From: Shawn Pearce <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: Why do base objects appear behind the delta in packs?
Date: Tue, 29 Aug 2006 13:44:48 -0400
Message-ID: <20060829174448.GD21729@spearce.org>
In-Reply-To: <7v8xl7moo7.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> 
> >> Shawn Pearce wrote:
> >> 
> >> > From a data locality perspective putting the base object before
> >> > or after the delta shouldn't matter, as either way the delta
> >> > is useless without the base.  So placing the base immediately
> >> > before the delta should perform just as well as placing it after.
> >> > Either way the OS should have the base in cache by the time the
> >> > delta is being accessed.
> >... 
> > I'm going to shut up now and not say anything further on the subject
> > unless I've got some hard results indicating a different organization
> > is better or worse than what we have right now.
> 
> I think that may be a sensible thing to do (no sarcasm -- I
> think this measurement is long overdue).
> 
> The code was initially proposed just as you suggested, but it is
> in its current form precisely to avoid back-seeks.  I distinctly
> remember asking Linus "does mmap() favor forward scan by doing
> readahead?  I thought its point was to allow random access" (the
> answer is "yes" and "yes but forward is the common case").
> 
> The pack-using side in sha1_file.c used to read a deltified object
> (both header and delta) in full, pick up and read the base, and
> apply the delta to the base.  This was thought to be memory hungry
> on a longer delta chain, so the current code reads only the header
> of a deltified object, reads the base, then reads the delta to
> apply.  The last step involves seeking back, and might make the
> back-seek avoidance less effective than before.
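
The two read strategies described above can be illustrated with a toy
model.  This is only a sketch, not git's code: the SeekCounter class,
the two helper functions, and every offset and length are hypothetical.
It counts a "back-seek" whenever a read starts before the current
cursor position, and nothing more.

```python
class SeekCounter:
    """Tracks a read cursor and counts reads that start behind it."""

    def __init__(self):
        self.pos = 0
        self.back_seeks = 0

    def read(self, offset, length):
        if offset < self.pos:
            self.back_seeks += 1
        self.pos = offset + length


def read_full_then_base(io, delta_off, header_len, delta_len,
                        base_off, base_len):
    """Older strategy: read header and delta in one go, then the base."""
    io.read(delta_off, header_len + delta_len)
    io.read(base_off, base_len)
    return io.back_seeks


def read_header_base_delta(io, delta_off, header_len, delta_len,
                           base_off, base_len):
    """Current strategy: header only, then the base, then the delta."""
    io.read(delta_off, header_len)
    io.read(base_off, base_len)
    io.read(delta_off + header_len, delta_len)
    return io.back_seeks


# Hypothetical layouts: (delta_off, header_len, delta_len, base_off, base_len)
BASE_AFTER_DELTA = (0, 20, 100, 120, 1000)
BASE_BEFORE_DELTA = (1000, 20, 100, 0, 1000)

if __name__ == "__main__":
    for name, layout in (("base after delta", BASE_AFTER_DELTA),
                         ("base before delta", BASE_BEFORE_DELTA)):
        old = read_full_then_base(SeekCounter(), *layout)
        new = read_header_base_delta(SeekCounter(), *layout)
        print(f"{name}: old={old} back-seeks, new={new} back-seeks")
```

In this model the older strategy reads strictly forward when the base
follows the delta, while the current strategy seeks back once to fetch
the delta.  It says nothing about real I/O cost, which depends on OS
readahead and caching.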

Thank you.  That was the sort of response I was looking for.  :-)

I know Jon wants to shrink that ~500 MB Mozilla pack to something
a lot smaller, and I'd like to help him do that without losing huge
amounts of read performance.  Very long delta chains (5000!) are
simply impossible to wade through for even one object; doing it
for an entire commit to check out the files is something I
wouldn't wish on anyone.
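
To make the chain-depth cost concrete, here is a rough sketch of why
one object at the end of a long chain is expensive to reconstruct.  It
assumes a toy delta format in which each delta simply appends bytes to
its base; git's real deltas use copy/insert instructions, and the
build_chain and resolve names are hypothetical.

```python
def build_chain(depth):
    """Build a toy object store: object 0 is a full base, and each
    object i > 0 is a delta against object i - 1."""
    objects = {0: ("base", None, b"root")}
    for i in range(1, depth + 1):
        objects[i] = ("delta", i - 1, b".")
    return objects


def resolve(objects, obj_id):
    """Reconstruct one object and report how many deltas were applied."""
    pending = []
    # Walk toward the base, remembering every delta on the way.
    while objects[obj_id][0] == "delta":
        _, base_id, payload = objects[obj_id]
        pending.append(payload)
        obj_id = base_id
    data = objects[obj_id][2]
    # Apply the deltas from the base outward (toy delta: append bytes).
    for payload in reversed(pending):
        data += payload
    return data, len(pending)


if __name__ == "__main__":
    data, steps = resolve(build_chain(5000), 5000)
    print(f"applied {steps} deltas to rebuild {len(data)} bytes")
```

Every delta on the path has to be located and applied before the
requested object exists at all, so the work for a single object grows
with the chain depth, and a checkout repeats that for each file.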

So I'm probably going to wind up spending some time doing research
and experimentation on pack storage.  I may just discover we're
as good as we can get.  Or I may find that doing something else
saves us only 5% at the cost of far too much complexity and thus
isn't really worth doing.  Or I may get lucky and discover a way
to improve on what we have.

More on this thread (maybe) in a few months.  I have other stuff
I should be doing right now.  :)

-- 
Shawn.
