From: linux@horizon.com
To: jonsmirl@gmail.com, linux@horizon.com
Cc: git@vger.kernel.org, gitzilla@gmail.com
Subject: Re: A look at some alternative PACK file encodings
Date: 7 Sep 2006 09:34:56 -0400
Message-ID: <20060907133456.24226.qmail@science.horizon.com>
In-Reply-To: <9e4733910609070557jd8cfc57nd4f7a8973b69f6ed@mail.gmail.com>
>> An alternative would be to create a small "placeholder" object that
>> just gives an ID, then refer to it by offset.
>>
>> That would avoid the need for an id/offset bit with every offset,
>> and possibly save more space if the same object was referenced
>> multiple times.
>>
>> And it just seems simpler.
> There are 2 million objects in the Mozilla pack. This table would take:
> 2M * (20b (sha) + 10b (object index/overhead)) = 60MB
> This 60MB is pretty much incompressible and increases download time.
>
> Much better if storage of the sha1s can be totally eliminated and
> replaced by something smaller. Alternatively this map could be
> stripped for transmission and rebuilt locally.
Um, I think I wasn't clear. Objects in a "thin" pack (for network
updating of a different pack) that are referred to but not included
would have stand-ins containing just the object ID. Objects that *are*
present would simply be present and referred to by offset as usual.
Imagine you have a "thin" pack containing a delta against an object that the
recipient already has, and which therefore isn't in the pack. The delta has to specify the
base object somehow. If the base object is in the pack, you can
specify it by offset. If it's not, you can either:
- Generalize the base object pointer to allow an object ID option, or
- Provide a pointer to a magic kind of "external reference" pointer
object.
I was proposing the latter.
For regular packs, such objects wouldn't even be present, because
all base objects are in the pack itself.
And, of course, you'd only create such objects if you needed to,
if there was at least one pointer to them.
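To make the proposal concrete, here is a toy model. Everything in it is hypothetical: the class names, the dict-of-offsets "pack", and the append-only "delta" are illustrations, not git's actual pack format. What it shows is the idea above: a delta always names its base by offset, and when the base is absent, that offset lands on a placeholder object carrying only the 20-byte ID, which the receiver looks up in its own repository.

```python
# Toy model of a thin pack with "external reference" placeholder objects.
# Hypothetical throughout: not git's real pack layout or delta encoding.
import hashlib
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Blob:
    data: bytes


@dataclass
class Delta:
    base_offset: int  # the base is ALWAYS named by offset, never by object ID
    suffix: bytes     # toy "delta": append these bytes to the base


@dataclass
class ExternalRef:
    object_id: bytes  # 20-byte ID of an object the receiver already has


PackObject = Union[Blob, Delta, ExternalRef]


def resolve(pack: Dict[int, PackObject], offset: int,
            local_store: Dict[bytes, bytes]) -> bytes:
    obj = pack[offset]
    if isinstance(obj, Blob):
        return obj.data
    if isinstance(obj, ExternalRef):
        # The extra indirection: the placeholder only names the object,
        # whose contents come from the receiver's existing repository.
        return local_store[obj.object_id]
    # Delta: resolve the base by offset, whether real object or placeholder.
    return resolve(pack, obj.base_offset, local_store) + obj.suffix


# Toy ID: plain SHA-1 of the contents (real git also hashes a type/size header).
base = b"hello "
base_id = hashlib.sha1(base).digest()
local = {base_id: base}

pack = {
    12: ExternalRef(base_id),                  # one placeholder, created on demand
    40: Delta(base_offset=12, suffix=b"world"),
    70: Delta(base_offset=12, suffix=b"there"),  # 2nd reference: just an offset
}

print(resolve(pack, 40, local))
print(resolve(pack, 70, local))
```

Both deltas point at offset 12, so the 20-byte ID is stored once no matter how many deltas use that base.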
Compared to putting the object ID directly in the pointer, it has:

Cost:    An extra offset pointer and object header.
         Extra time spent following the indirection when resolving the pointer.
Benefit: Non-indirect object pointers are a bit smaller.
         The code is simpler.
         Second and later references to the same external object cost only
         another offset, not another 20 bytes.
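The trade-off can be put in rough numbers. The sketch below counts bits for both schemes under simplified, hypothetical assumptions: fixed-width offsets, a fixed placeholder header size, and a one-bit "offset or ID" flag on every pointer in the inline-ID scheme. Real pack encodings are variable-length, so treat the figures as illustrative only.

```python
SHA1_BITS = 160  # a SHA-1 object ID is 20 bytes


def inline_id_bits(internal_refs: int, external_refs: int, offset_bits: int) -> int:
    """Inline scheme (hypothetical): every pointer carries a 1-bit flag;
    external pointers embed the full 20-byte ID instead of an offset."""
    return internal_refs * (1 + offset_bits) + external_refs * (1 + SHA1_BITS)


def placeholder_bits(internal_refs: int, external_refs: int,
                     distinct_external: int, offset_bits: int,
                     header_bits: int) -> int:
    """Placeholder scheme (hypothetical): every pointer is a plain offset;
    each DISTINCT external object costs one placeholder (header + ID)."""
    pointers = (internal_refs + external_refs) * offset_bits
    placeholders = distinct_external * (header_bits + SHA1_BITS)
    return pointers + placeholders


# Made-up parameters: 32-bit offsets, 16-bit placeholder header.
OFFSET_BITS, HEADER_BITS = 32, 16

# Many in-pack deltas; 50 external references to only 10 distinct bases.
shared = (inline_id_bits(1000, 50, OFFSET_BITS),
          placeholder_bits(1000, 50, 10, OFFSET_BITS, HEADER_BITS))

# Only external deltas, each base referenced exactly once.
unshared = (inline_id_bits(0, 10, OFFSET_BITS),
            placeholder_bits(0, 10, 10, OFFSET_BITS, HEADER_BITS))

print(shared)    # (41050, 35360)
print(unshared)  # (1610, 2080)
```

With these invented numbers the placeholder scheme wins when in-pack pointers dominate or external bases are shared, and loses when every external base is referenced exactly once, which matches the cost/benefit list above.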