git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <junkio@cox.net>
To: git@vger.kernel.org
Subject: [FYI] pack idx format
Date: Wed, 15 Feb 2006 00:39:23 -0800	[thread overview]
Message-ID: <7vd5hpm2x0.fsf@assigned-by-dhcp.cox.net> (raw)

This is still WIP but if anybody is interested...  Once done, it
should become Documentation/technical/pack-format.txt.

The reason I started doing this is to prototype this one:

	<7v4q3453qu.fsf@assigned-by-dhcp.cox.net>

-- >8 --

Idx file:

The idx file is to map object name SHA1 to offset into the
corresponding pack file.  There is the 'first-level fan-out'
table at the beginning, and then the main part of the index
follows.  This is a table whose entries are sorted by their
object name SHA1.  The file ends with some trailer information.

The main part is a table of 24-byte entries, and each entry is:

	offset : 4-byte network byte order integer.
	SHA1   : 20-byte object name SHA1.

The data for the named object begins at byte offset "offset" in
the corresponding pack file.

Before this main table, at the beginning of the idx file, there
is a table of 256 4-byte network byte order integers.  This is
called "first-level fan-out".  N-th entry of this table records
the offset into the main index for the first object whose object
name SHA1 starts with N+1.  fanout[255] points at the end of
main index.  The offset is expressed in 24-bytes unit.

Example:

	idx
	    +--------------------------------+
	    | fanout[0] = 2                  |-.
	    +--------------------------------+ |
	    | fanout[1]                      | |
	    +--------------------------------+ |
	    | fanout[2]                      | |
	    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
	    | fanout[255]                    | |
	    +--------------------------------+ |
main	    | offset                         | |
index	    | object name 00XXXXXXXXXXXXXXXX | |
table	    +--------------------------------+ | 
	    | offset                         | |
	    | object name 00XXXXXXXXXXXXXXXX | |
	    +--------------------------------+ |
	  .-| offset                         |<+
	  | | object name 01XXXXXXXXXXXXXXXX |
	  | +--------------------------------+
	  | | offset                         |
	  | | object name 01XXXXXXXXXXXXXXXX |
	  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	  | | offset                         |
	  | | object name FFXXXXXXXXXXXXXXXX |
	  | +--------------------------------+
trailer	  | | packfile checksum              |
	  | +--------------------------------+
	  | | idxfile checksum               |
	  | +--------------------------------+
          .-------.      
                  |
Pack file entry: <+

     packed object header:
	1-byte type (upper 4-bit)
	       size0 (lower 4-bit) 
        n-byte sizeN (as long as MSB is set, each 7-bit)
		size0..sizeN form 4+7+7+..+7 bit integer, size0
		is the most significant part.
     packed object data:
        If it is not DELTA, then deflated bytes (the size above
		is the size before compression).
	If it is DELTA, then
	  20-byte base object name SHA1 (the size above is the
	  	size of the delta data that follows).
          delta data, deflated.

             reply	other threads:[~2006-02-15  8:39 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-15  8:39 Junio C Hamano [this message]
2006-02-15 11:16 ` [FYI] pack idx format Johannes Schindelin
2006-02-15 16:46 ` Nicolas Pitre
2006-02-16  1:58   ` Junio C Hamano
2006-02-16  1:43 ` [PATCH] pack-objects: reuse data from existing pack Junio C Hamano
2006-02-16  1:45   ` [PATCH] packed objects: minor cleanup Junio C Hamano
2006-02-16  3:41   ` [PATCH] pack-objects: reuse data from existing pack Nicolas Pitre
2006-02-16  3:59     ` Junio C Hamano
2006-02-16  3:55   ` Linus Torvalds
2006-02-16  4:07     ` Junio C Hamano
2006-02-16  8:32   ` Andreas Ericsson
2006-02-16  9:13     ` Junio C Hamano
2006-02-17  4:30   ` Junio C Hamano
2006-02-17 10:37     ` [PATCH] pack-objects: finishing touches Junio C Hamano
2006-02-18  6:50       ` [PATCH] pack-objects: avoid delta chains that are too long Junio C Hamano
2006-02-17 15:39     ` [PATCH] pack-objects: reuse data from existing pack Linus Torvalds
2006-02-17 18:18       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vd5hpm2x0.fsf@assigned-by-dhcp.cox.net \
    --to=junkio@cox.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).