All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Nicolas Pitre" <nico@fluxnic.net>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH v3] Document pack v4 format
Date: Fri,  6 Sep 2013 09:14:03 +0700	[thread overview]
Message-ID: <1378433643-15638-1-git-send-email-pclouds@gmail.com> (raw)
In-Reply-To: <1377917393-28460-1-git-send-email-pclouds@gmail.com>


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Should be up to date with Nico's latest implementation and also cover
 additions to the format that everybody seems to agree on:

  - new types for canonical trees and commits
  - sha-1 table covering missing objects in thin packs

 Documentation/technical/pack-format.txt | 133 +++++++++++++++++++++++++++++++-
 1 file changed, 132 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8e5bf60..c5327ff 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -1,7 +1,7 @@
 Git pack format
 ===============
 
-== pack-*.pack files have the following format:
+== pack-*.pack files version 2 and 3 have the following format:
 
    - A header appears at the beginning and consists of the following:
 
@@ -36,6 +36,132 @@ Git pack format
 
   - The trailer records 20-byte SHA-1 checksum of all of the above.
 
+== pack-*.pack files version 4 have the following format:
+
+   - A header appears at the beginning and consists of the following:
+
+     4-byte signature:
+	The signature is: {'P', 'A', 'C', 'K'}
+
+     4-byte version number (network byte order): must be 4
+
+     4-byte number of objects contained in the pack (network byte order)
+
+   - A series of tables, described separately.
+
+   - The tables are followed by number of object entries, each of
+     which looks like below:
+
+     (undeltified representation)
+     n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+     data
+
+     (deltified representation)
+     n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+     base object name in SHA-1 reference encoding
+     compressed delta data
+
+     "type" is used to determine object type. Commit has type 1, tree
+     2, blob 3, tag 4, ref-delta 7, canonical-commit 9 (commit type
+     with bit 3 set), canonical-tree 10 (tree type with bit 3 set).
+     Compared to v2, ofs-delta type is not used, and canonical-commit
+     and canonical-tree are new types.
+
+     In undeltified format, blobs and tags ares compressed. Trees are
+     not compressed at all. Some headers in commits are stored
+     uncompressed, the rest is compressed. Tree and commit
+     representations are described in detail separately.
+
+     Blobs and tags are deltified and compressed the same way in
+     v3. Commits are not delitifed. Trees are deltified using
+     undeltified representation.
+
+     Trees and commits in canonical types are in the same format as
+     v2: in canonical format and deflated. They can be used for
+     completing thin packs or preserving somewhat ill-formatted
+     objects.
+
+  - The trailer records 20-byte SHA-1 checksum of all of the above.
+
+=== Pack v4 tables
+
+ - A table of sorted SHA-1 object names for all objects contained in
+   the on-disk pack.
+
+   Thin packs are used for transferring on the wire and may omit base
+   objects, expecting the receiver to add them before writing to
+   disk. The SHA-1 table in thin packs must include the omitted objects
+   as well.
+
+   This table can be referred to using "SHA-1 reference encoding": the
+   index, in variable length encoding, to this table.
+
+ - Ident table: the uncompressed length in variable encoding,
+   followed by zlib-compressed dictionary. Each entry consists of
+   two prefix bytes storing timezone followed by a NUL-terminated
+   string.
+
+   Entries should be sorted by frequency so that the most frequent
+   entry has the smallest index, thus most efficient variable
+   encoding.
+
+   The table can be referred to using "ident reference encoding": the
+   index number, in variable length encoding, to this table.
+
+ - Tree path table: the same format to ident table. Each entry
+   consists of two prefix bytes storing tree entry mode, then a
+   NUL-terminated path name. Same sort order recommendation applies.
+
+=== Commit representation
+
+  - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+
+  - Tree SHA-1 in SHA-1 reference encoding
+
+  - Parent count in variable length encoding
+
+  - Parent SHA-1s in SHA-1 reference encoding
+
+  - Author reference in ident reference encoding
+
+  - Author timestamp in variable length encoding
+
+  - Committer reference in ident reference encoding
+
+  - Committer timestamp, encoded as a difference against author
+    timestamp with the LSB used to indicate negative difference.
+
+  - Compressed data of remaining header and the body
+
+=== Tree representation
+
+  - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+
+  - Number of tree entries in variable length encoding
+
+  - A number of entries, each can be in either forms
+
+    - INT(path_index << 1)        INT(sha1_index)
+
+    - INT((entry_start << 1) | 1) INT(entry_count << 1)
+
+    - INT((entry_start << 1) | 1) INT((entry_count << 1) | 1) INT(base_sha1_index)
+
+    INT() denotes a number in variable length encoding. path_index is
+    the index to the tree path table. sha1_index is the index to the
+    SHA-1 table. entry_start is the first tree entry to copy
+    from. entry_count is the number of tree entries to
+    copy. base_sha1_index is the index to SHA-1 table of the base tree
+    to copy from.
+
+    The LSB of the first number indicates whether it's a plain tree
+    entry (LSB not set), or an instruction to copy tree entries from
+    another tree (LSB set).
+
+    For copying from another tree, is the LSB of the second number is
+    set, it will be followed by a base tree SHA-1. If it's not set,
+    the last base tree will be used.
+
 == Original (version 1) pack-*.idx files have the following format:
 
   - The header consists of 256 4-byte network byte order
@@ -160,3 +286,8 @@ Pack file entry: <+
     corresponding packfile.
 
     20-byte SHA-1-checksum of all of the above.
+
+== Version 3 pack-*.idx files support only *.pack files version 4. The
+   format is the same as version 2 except that the table of sorted
+   20-byte SHA-1 object names is missing in the .idx files. The same
+   table exists in .pack files and will be used instead.
-- 
1.8.2.83.gc99314b

  parent reply	other threads:[~2013-09-06  2:11 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-27  4:25 [PATCH 00/23] Preliminary pack v4 support Nicolas Pitre
2013-08-27  4:25 ` [PATCH 01/23] pack v4: initial pack dictionary structure and code Nicolas Pitre
2013-08-27 15:08   ` Junio C Hamano
2013-08-27 16:13     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 02/23] export packed_object_info() Nicolas Pitre
2013-08-27  4:25 ` [PATCH 03/23] pack v4: scan tree objects Nicolas Pitre
2013-08-27  4:25 ` [PATCH 04/23] pack v4: add tree entry mode support to dictionary entries Nicolas Pitre
2013-08-27  4:25 ` [PATCH 05/23] pack v4: add commit object parsing Nicolas Pitre
2013-08-27 15:26   ` Junio C Hamano
2013-08-27 16:47     ` Nicolas Pitre
2013-08-27 17:42       ` Junio C Hamano
2013-08-27  4:25 ` [PATCH 06/23] pack v4: split the object list and dictionary creation Nicolas Pitre
2013-08-27  4:25 ` [PATCH 07/23] pack v4: move to struct pack_idx_entry and get rid of our own struct idx_entry Nicolas Pitre
2013-08-27  4:25 ` [PATCH 08/23] pack v4: basic references encoding Nicolas Pitre
2013-08-27 15:29   ` Junio C Hamano
2013-08-27 15:53     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 09/23] pack v4: commit object encoding Nicolas Pitre
2013-08-27 15:39   ` Junio C Hamano
2013-08-27 16:50     ` Nicolas Pitre
2013-08-27 19:59     ` Nicolas Pitre
2013-08-27 20:15       ` Junio C Hamano
2013-08-27 21:43         ` Nicolas Pitre
2013-09-02 20:48   ` Duy Nguyen
2013-09-03  6:30     ` Nicolas Pitre
2013-09-03  7:41       ` Duy Nguyen
2013-09-05  3:50         ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 10/23] pack v4: tree " Nicolas Pitre
2013-08-27 15:44   ` Junio C Hamano
2013-08-27 16:52     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 11/23] pack v4: dictionary table output Nicolas Pitre
2013-08-27  4:25 ` [PATCH 12/23] pack v4: creation code Nicolas Pitre
2013-08-27 15:48   ` Junio C Hamano
2013-08-27 16:59     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 13/23] pack v4: object headers Nicolas Pitre
2013-08-27  4:25 ` [PATCH 14/23] pack v4: object data copy Nicolas Pitre
2013-08-27 15:53   ` Junio C Hamano
2013-08-27 18:24     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 15/23] pack v4: object writing Nicolas Pitre
2013-08-27  4:26 ` [PATCH 16/23] pack v4: tree object delta encoding Nicolas Pitre
2013-08-27  4:26 ` [PATCH 17/23] pack v4: load delta candidate for encoding tree objects Nicolas Pitre
2013-08-27  4:26 ` [PATCH 18/23] pack v4: honor pack.compression config option Nicolas Pitre
2013-08-27  4:26 ` [PATCH 19/23] pack v4: relax commit parsing a bit Nicolas Pitre
2013-08-27  4:26 ` [PATCH 20/23] pack index v3 Nicolas Pitre
2013-08-27  4:26 ` [PATCH 21/23] pack v4: normalize pack name to properly generate the pack index file name Nicolas Pitre
2013-08-27  4:26 ` [PATCH 22/23] pack v4: add progress display Nicolas Pitre
2013-08-27  4:26 ` [PATCH 23/23] initial pack index v3 support on the read side Nicolas Pitre
2013-08-31 11:45   ` Duy Nguyen
2013-09-03  6:09     ` Nicolas Pitre
2013-09-03  7:34       ` Duy Nguyen
2013-08-27 11:17 ` [PATCH] Document pack v4 format Nguyễn Thái Ngọc Duy
2013-08-27 18:25   ` Junio C Hamano
2013-08-27 18:53   ` Nicolas Pitre
2013-08-31  2:49   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2013-09-03  6:00     ` Nicolas Pitre
2013-09-03  6:46       ` Nicolas Pitre
2013-09-03 11:49         ` Duy Nguyen
2013-09-03 14:54           ` Duy Nguyen
2013-09-05  4:12             ` Nicolas Pitre
2013-09-05  4:19               ` Duy Nguyen
2013-09-05  4:40                 ` Nicolas Pitre
2013-09-05  5:04                   ` Duy Nguyen
2013-09-05  5:39                     ` Nicolas Pitre
2013-09-05 16:52                       ` Duy Nguyen
2013-09-05 17:14                         ` Nicolas Pitre
2013-09-05 20:26                           ` Junio C Hamano
2013-09-05 21:04                             ` Nicolas Pitre
2013-09-06  4:18                         ` Duy Nguyen
2013-09-06 13:19                           ` Nicolas Pitre
2013-09-06  2:14     ` Nguyễn Thái Ngọc Duy [this message]
2013-09-06  3:23       ` [PATCH v3] " Nicolas Pitre
2013-09-06  9:48         ` Duy Nguyen
2013-09-06 13:25           ` Nicolas Pitre
2013-09-06 13:44             ` Duy Nguyen
2013-09-06 16:44               ` Nicolas Pitre
2013-09-07  4:57                 ` Nicolas Pitre
2013-09-07  4:52       ` Nicolas Pitre
2013-09-07  8:05         ` Duy Nguyen
2013-08-27 15:03 ` [PATCH 00/23] Preliminary pack v4 support Junio C Hamano
2013-08-27 15:59   ` Nicolas Pitre
2013-08-27 16:44     ` Junio C Hamano
2013-08-28  2:30       ` Duy Nguyen
2013-08-28  2:58         ` Nicolas Pitre
2013-08-28  3:06           ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1378433643-15638-1-git-send-email-pclouds@gmail.com \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=nico@fluxnic.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.