git.vger.kernel.org archive mirror
From: A Large Angry SCM <gitzilla@gmail.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Jon Smirl <jonsmirl@gmail.com>, git@vger.kernel.org
Subject: Re: A look at some alternative PACK file encodings
Date: Wed, 06 Sep 2006 17:19:18 -0700
Message-ID: <44FF6586.8080206@gmail.com>
In-Reply-To: <Pine.LNX.4.64.0609061651500.27779@g5.osdl.org>

Linus Torvalds wrote:
> 
> On Wed, 6 Sep 2006, A Large Angry SCM wrote:
> 
>> Jon Smirl wrote:
>>> On 9/6/06, A Large Angry SCM <gitzilla@gmail.com> wrote:
>>>> TREE objects do not delta or deflate well.
>>> I can understand why they don't deflate, the path names are pretty
>>> much unique and the sha1s are incompressible. By why don't they delta
>>> well? Does sorting them by size mess up the delta process?
>> My guess would be the TREEs would only delta well against other TREE
>> versions for the same path.
> 
> That's what you'd normally have in a real project, though. I wonder if 
> your "pack mashup" lost the normal behaviour: we very much sort trees 
> together normally, thanks to the "sort-by-filename, then by size" 
> behaviour that git-pack-objects should have (for trees, the size normally 
> shouldn't change, so the sorting should basically boil down to "sort the 
> same directory together, keeping the ordering it had from git-rev-list").

The mashup is just all the projects in a single repository with a bushy
refs tree so I can view the updates in a single gitk window.

The sorting by name, then by size may be breaking the object-version
relationship for wide graphs.

> Btw, that "keeping the ordering it had" part I'm not convinced we actually 
> enforce. That would depend on the sort algorithm used by "qsort()", I 
> think. So there might be room for improvement there in order to keep 
> things in recency order.

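qsort() is not guaranteed to be stable; the C standard leaves the
relative order of elements that compare equal unspecified.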

>> Just looking at the structures in non-BLOBs, I see a lot of potential
>> for the use of a set of dictionaries when deflating TREEs and another set
>> of dictionaries when deflating COMMITs and TAGs. The low hanging fruit
>> is to create dictionaries of the most referenced IDs across all TREE or
>> COMMIT/TAG objects.
>
> Is there any way to get zlib to just generate a suggested dictionary from 
> a given set of input?

The docs suggest "no". zlib only accepts a caller-supplied preset
dictionary through deflateSetDictionary().


Thread overview: 37+ messages
2006-09-06 21:47 A look at some alternative PACK file encodings A Large Angry SCM
2006-09-06 23:23 ` Jon Smirl
2006-09-06 23:39   ` A Large Angry SCM
2006-09-06 23:56     ` Linus Torvalds
2006-09-07  0:10       ` Jon Smirl
2006-09-07  0:06         ` David Lang
2006-09-07  0:19       ` A Large Angry SCM [this message]
2006-09-07  0:45         ` Linus Torvalds
2006-09-07  0:37       ` Nicolas Pitre
2006-09-07  0:04     ` Jon Smirl
2006-09-07  5:41       ` Shawn Pearce
2006-09-07  5:34     ` Shawn Pearce
2006-09-07  0:40   ` Nicolas Pitre
2006-09-07  0:59     ` Jon Smirl
2006-09-07  2:30       ` Nicolas Pitre
2006-09-07  2:33       ` A Large Angry SCM
2006-09-07  1:11     ` Junio C Hamano
2006-09-07  2:47       ` Nicolas Pitre
2006-09-07  4:33     ` Shawn Pearce
2006-09-07  5:27       ` Junio C Hamano
2006-09-07  5:46         ` Shawn Pearce
2006-09-07 18:50           ` Junio C Hamano
2006-09-07  5:21   ` Shawn Pearce
2006-09-07  5:39   ` A Large Angry SCM
  -- strict thread matches above, loose matches on Subject: below --
2006-09-07  8:41 linux
2006-09-07 17:20 ` Nicolas Pitre
2006-09-07 19:16   ` linux
2006-09-07  9:07 linux
2006-09-07 12:57 ` Jon Smirl
2006-09-07 13:34   ` linux
2006-09-07 14:19     ` Jon Smirl
2006-09-07 15:01       ` linux
2006-09-07 14:39     ` Richard Curnow
2006-09-07 17:40       ` Junio C Hamano
2006-09-07 17:22   ` A Large Angry SCM
2006-09-07 17:32 ` Nicolas Pitre
2006-09-07 19:22   ` linux
