git.vger.kernel.org archive mirror
From: Thomas Rast <trast@student.ethz.ch>
To: Junio C Hamano <gitster@pobox.com>
Cc: <git@vger.kernel.org>, Thomas Gummerer <t.gummerer@gmail.com>,
	"David Michael Barr" <davidbarr@google.com>
Subject: Re: [PATCH 2/2] index-v4: document the entry format
Date: Mon, 30 Apr 2012 19:20:16 +0200
Message-ID: <87vckhuofj.fsf@thomas.inf.ethz.ch>
In-Reply-To: <xmqqpqas93sa.fsf_-_@junio.mtv.corp.google.com> (Junio C. Hamano's message of "Fri, 27 Apr 2012 16:02:45 -0700")

Hi Junio,

I seem to have completely missed the earlier series at

  http://thread.gmane.org/gmane.comp.version-control.git/194660

My bad.

Thomas Gummerer has been working on a prototype converter over the past few days,
with results similar to (but not quite as good as) your numbers

    $ ls -l .git/index*
    -rw-r----- 1 jch eng 25586488 2012-04-03 15:27 .git/index
    -rw-r----- 1 jch eng 14654328 2012-04-03 15:38 .git/index-4

while taking a different approach with different tradeoffs.

Nevertheless...

> +  (Version 4) In version 4, the entry path name is prefix-compressed
> +    relative to the path name for the previous entry (the very first
> +    entry is encoded as if the path name for the previous entry is an
> +    empty string).  At the beginning of an entry, an integer N in the
> +    variable width encoding (the same encoding as the offset is encoded
> +    for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
> +    by a NUL-terminated string S.  Removing N bytes from the end of the
> +    path name for the previous entry, and replacing it with the string S
> +    yields the path name for this entry.
[..]
> +  (Version 4) In version 4, the padding after the pathname does not
> +  exist.
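
To make the quoted entry format concrete, here is a minimal Python sketch of the path-name part only. It assumes the varint is the OFS_DELTA offset encoding that the patch text points to. The function names (encode_varint, decode_varint, compress_paths, expand_paths) are made up for this illustration and are not git's API, and a real index entry also carries stat data, the object name and flags, all of which are left out here.

```python
def encode_varint(value):
    """Encode a non-negative int in the OFS_DELTA offset style (big-endian,
    7 bits per byte, with the "minus one" bias on continuation bytes)."""
    out = [value & 0x7F]              # last byte: no continuation bit
    value >>= 7
    while value:
        value -= 1                    # bias removes redundant encodings
        out.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(out))


def decode_varint(buf, pos=0):
    """Return (value, next_position) for a varint starting at buf[pos]."""
    c = buf[pos]
    pos += 1
    val = c & 0x7F
    while c & 0x80:
        val += 1                      # undo the bias applied by the encoder
        c = buf[pos]
        pos += 1
        val = (val << 7) + (c & 0x7F)
    return val, pos


def compress_paths(paths):
    """Prefix-compress sorted path names: for each entry store N (bytes to
    drop from the end of the previous path) and the NUL-terminated tail S."""
    out = bytearray()
    prev = b""
    for path in paths:
        common = 0
        limit = min(len(prev), len(path))
        while common < limit and prev[common] == path[common]:
            common += 1
        out += encode_varint(len(prev) - common)
        out += path[common:] + b"\0"
        prev = path
    return bytes(out)


def expand_paths(data, count):
    """Inverse of compress_paths: rebuild `count` path names."""
    paths = []
    prev = b""
    pos = 0
    for _ in range(count):
        n, pos = decode_varint(data, pos)
        end = data.index(b"\0", pos)
        path = prev[:len(prev) - n] + data[pos:end]
        pos = end + 1
        paths.append(path)
        prev = path
    return paths
```

Note that expand_paths has to walk the entries in order, since every path depends on the one before it.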

I think there are actually several separate ideas here:

* The prefix compression.  Thomas is not using this idea; we've been
  toying with making the index bisectable (within each directory) for
  fast single-entry lookups, which inherently conflicts with this.  The
  directory-like layout partially achieves the same (elides common path
  components).

* The varint encoding (or offset encoding, but "varint" is something you
  can google :-).  David suggested using it on stat() data, combined
  with zigzag encoding and delta against the first entry in the
  directory, which gives some good compression results.  Profiling will
  have to say whether the extra decoding effort is worth the space
  savings.

* The lack of variable padding, which is a good idea -- in any case I
  seem to remember Shawn complaining about it.
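
As a rough illustration of the second point, here is a minimal Python sketch of zigzag plus varint encoding applied to one stat() field, delta-encoded against the first entry in a directory. Everything about the layout is an assumption made for the example: the varint here is the little-endian, protobuf-style variant rather than the OFS_DELTA offset encoding, the first value is stored as-is, and the helper names (put_uvarint, get_uvarint, pack_field, unpack_field) are hypothetical, not taken from the prototype.

```python
def zigzag_encode(n):
    """Map signed ints to unsigned so small magnitudes stay small:
    0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ..."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) if not z & 1 else -((z + 1) >> 1)


def put_uvarint(value):
    """Encode a non-negative int, 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def get_uvarint(buf, pos):
    """Return (value, next_position) for a varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def pack_field(values):
    """Store the first value as-is, the rest as zigzag deltas against it."""
    base = values[0]
    out = bytearray(put_uvarint(base))
    for v in values[1:]:
        out += put_uvarint(zigzag_encode(v - base))
    return bytes(out)


def unpack_field(data, count):
    """Inverse of pack_field for `count` values."""
    base, pos = get_uvarint(data, 0)
    values = [base]
    for _ in range(count - 1):
        z, pos = get_uvarint(data, pos)
        values.append(base + zigzag_decode(z))
    return values
```

With mtimes that lie close together, each entry after the first usually needs one or two bytes instead of a fixed four. The cost is the extra decoding work per entry, which is the part that profiling would have to judge.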

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

Thread overview: 27+ messages
2012-04-03 22:53 [PATCH 0/9] Prefix-compress on-disk index entries Junio C Hamano
2012-04-03 22:53 ` [PATCH 1/9] varint: make it available outside the context of pack Junio C Hamano
2012-04-03 22:53 ` [PATCH 2/9] cache.h: hide on-disk index details Junio C Hamano
2012-04-03 22:53 ` [PATCH 3/9] read-cache.c: allow unaligned mapping of the index file Junio C Hamano
2012-04-03 22:53 ` [PATCH 4/9] read-cache.c: make create_from_disk() report number of bytes it consumed Junio C Hamano
2012-04-03 22:53 ` [PATCH 5/9] read-cache.c: report the header version we do not understand Junio C Hamano
2012-04-03 22:53 ` [PATCH 6/9] read-cache.c: move code to copy ondisk to incore cache to a helper function Junio C Hamano
2012-04-03 22:53 ` [PATCH 7/9] read-cache.c: move code to copy incore to ondisk " Junio C Hamano
2012-04-03 22:53 ` [PATCH 8/9] read-cache.c: read prefix-compressed names in index on-disk version v4 Junio C Hamano
2012-04-03 22:53 ` [PATCH 9/9] read-cache.c: write index v4 format Junio C Hamano
2012-04-04  1:44 ` [PATCH 0/9] Prefix-compress on-disk index entries David Barr
2012-04-04 15:33   ` Junio C Hamano
2012-04-04 16:57     ` Junio C Hamano
2012-04-04 16:58       ` [PATCH 2/2] update-index: upgrade/downgrade on-disk index version Junio C Hamano
2012-04-04 12:34 ` [PATCH 0/9] Prefix-compress on-disk index entries Nguyen Thai Ngoc Duy
2012-04-04 18:44   ` Junio C Hamano
2012-04-06  8:41     ` David Barr
2012-05-02  1:58       ` Nguyen Thai Ngoc Duy
2012-05-02  4:26         ` David Barr
2012-04-27 22:58 ` [PATCH 1/2] unpack-trees: preserve the index file version of original Junio C Hamano
2012-04-27 23:02   ` [PATCH 2/2] index-v4: document the entry format Junio C Hamano
2012-04-30 17:20     ` Thomas Rast [this message]
2012-05-01  4:00       ` Junio C Hamano
2012-05-01 21:43         ` Thomas Rast
2012-05-02 15:12         ` Shawn Pearce
2012-05-02 17:04           ` Junio C Hamano
2012-05-02 17:13             ` Shawn Pearce
