From: Nicolas Pitre <nico@fluxnic.net>
To: git@vger.kernel.org
Subject: [PATCH 10/38] pack v4: commit object encoding
Date: Thu, 05 Sep 2013 02:19:33 -0400 [thread overview]
Message-ID: <1378362001-1738-11-git-send-email-nico@fluxnic.net> (raw)
In-Reply-To: <1378362001-1738-1-git-send-email-nico@fluxnic.net>
This goes as follows:
- Tree reference: either variable length encoding of the index
into the SHA1 table or the literal SHA1 prefixed by 0 (see
encode_sha1ref()).
- Parent count: variable length encoding of the number of parents.
This is normally going to occupy a single byte but doesn't have to.
- List of parent references: a list of encode_sha1ref() encoded
references, or nothing if the parent count was zero.
- Author reference: variable length encoding of an index into the author
identifier dictionary table which also covers the time zone. To make
the overall encoding efficient, the author table is sorted by usage
frequency so the most used names are first and require the shortest
index encoding.
- Author time stamp: variable length encoded. Year 2038 ready!
- Committer reference: same as author reference.
- Committer time stamp: same as author time stamp.
The remainder of the canonical commit object content is then zlib
compressed and appended to the above.
Rationale: The most important commit object data is densely encoded while
requiring no zlib inflate processing on access, and all SHA1 references
are most likely to be direct indices into the pack index file requiring
no SHA1 search into the pack index file.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
---
packv4-create.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 119 insertions(+)
diff --git a/packv4-create.c b/packv4-create.c
index 12527c0..d4a79f4 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -14,6 +14,9 @@
#include "pack.h"
#include "varint.h"
+
+static int pack_compression_level = Z_DEFAULT_COMPRESSION;
+
struct data_entry {
unsigned offset;
unsigned size;
@@ -274,6 +277,122 @@ static int encode_sha1ref(const unsigned char *sha1, unsigned char *buf)
return 1 + 20;
}
+/*
+ * This converts a canonical commit object buffer into its
+ * tightly packed representation using the already populated
+ * and sorted commit_name_table dictionary. The parsing is
+ * strict so to ensure the canonical version may always be
+ * regenerated and produce the same hash.
+ */
+void *pv4_encode_commit(void *buffer, unsigned long *sizep)
+{
+ unsigned long size = *sizep;
+ char *in, *tail, *end;
+ unsigned char *out;
+ unsigned char sha1[20];
+ int nb_parents, index, tz_val;
+ unsigned long time;
+ z_stream stream;
+ int status;
+
+ /*
+ * It is guaranteed that the output is always going to be smaller
+ * than the input. We could even do this conversion in place.
+ */
+ in = buffer;
+ tail = in + size;
+ buffer = xmalloc(size);
+ out = buffer;
+
+ /* parse the "tree" line */
+ if (in + 46 >= tail || memcmp(in, "tree ", 5) || in[45] != '\n')
+ goto bad_data;
+ if (get_sha1_lowhex(in + 5, sha1) < 0)
+ goto bad_data;
+ in += 46;
+ out += encode_sha1ref(sha1, out);
+
+ /* count how many "parent" lines */
+ nb_parents = 0;
+ while (in + 48 < tail && !memcmp(in, "parent ", 7) && in[47] == '\n') {
+ nb_parents++;
+ in += 48;
+ }
+ out += encode_varint(nb_parents, out);
+
+ /* rewind and parse the "parent" lines */
+ in -= 48 * nb_parents;
+ while (nb_parents--) {
+ if (get_sha1_lowhex(in + 7, sha1))
+ goto bad_data;
+ out += encode_sha1ref(sha1, out);
+ in += 48;
+ }
+
+ /* parse the "author" line */
+ /* it must be at least "author x <x> 0 +0000\n" i.e. 21 chars */
+ if (in + 21 >= tail || memcmp(in, "author ", 7))
+ goto bad_data;
+ in += 7;
+ end = get_nameend_and_tz(in, &tz_val);
+ if (!end)
+ goto bad_data;
+ index = dict_add_entry(commit_name_table, tz_val, in, end - in);
+ if (index < 0)
+ goto bad_dict;
+ out += encode_varint(index, out);
+ time = strtoul(end, &end, 10);
+ if (!end || end[0] != ' ' || end[6] != '\n')
+ goto bad_data;
+ out += encode_varint(time, out);
+ in = end + 7;
+
+ /* parse the "committer" line */
+ /* it must be at least "committer x <x> 0 +0000\n" i.e. 24 chars */
+ if (in + 24 >= tail || memcmp(in, "committer ", 7))
+ goto bad_data;
+ in += 10;
+ end = get_nameend_and_tz(in, &tz_val);
+ if (!end)
+ goto bad_data;
+ index = dict_add_entry(commit_name_table, tz_val, in, end - in);
+ if (index < 0)
+ goto bad_dict;
+ out += encode_varint(index, out);
+ time = strtoul(end, &end, 10);
+ if (!end || end[0] != ' ' || end[6] != '\n')
+ goto bad_data;
+ out += encode_varint(time, out);
+ in = end + 7;
+
+ /* finally, deflate the remaining data */
+ memset(&stream, 0, sizeof(stream));
+ deflateInit(&stream, pack_compression_level);
+ stream.next_in = (unsigned char *)in;
+ stream.avail_in = tail - in;
+ stream.next_out = (unsigned char *)out;
+ stream.avail_out = size - (out - (unsigned char *)buffer);
+ status = deflate(&stream, Z_FINISH);
+ end = (char *)stream.next_out;
+ deflateEnd(&stream);
+ if (status != Z_STREAM_END) {
+ error("deflate error status %d", status);
+ goto bad;
+ }
+
+ *sizep = end - (char *)buffer;
+ return buffer;
+
+bad_data:
+ error("bad commit data");
+ goto bad;
+bad_dict:
+ error("bad dict entry");
+bad:
+ free(buffer);
+ return NULL;
+}
+
static struct pack_idx_entry *get_packed_object_list(struct packed_git *p)
{
unsigned i, nr_objects = p->num_objects;
--
1.8.4.38.g317e65b
next prev parent reply other threads:[~2013-09-05 6:20 UTC|newest]
Thread overview: 124+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-05 6:19 [PATCH 00/38] pack version 4 basic functionalities Nicolas Pitre
2013-09-05 6:19 ` [PATCH 01/38] pack v4: initial pack dictionary structure and code Nicolas Pitre
2013-09-05 6:19 ` [PATCH 02/38] export packed_object_info() Nicolas Pitre
2013-09-05 6:19 ` [PATCH 03/38] pack v4: scan tree objects Nicolas Pitre
2013-09-05 6:19 ` [PATCH 04/38] pack v4: add tree entry mode support to dictionary entries Nicolas Pitre
2013-09-05 6:19 ` [PATCH 05/38] pack v4: add commit object parsing Nicolas Pitre
2013-09-05 10:30 ` SZEDER Gábor
2013-09-05 17:30 ` Nicolas Pitre
2013-09-05 6:19 ` [PATCH 06/38] pack v4: split the object list and dictionary creation Nicolas Pitre
2013-09-05 6:19 ` [PATCH 07/38] pack v4: move to struct pack_idx_entry and get rid of our own struct idx_entry Nicolas Pitre
2013-09-05 6:19 ` [PATCH 08/38] pack v4: basic SHA1 reference encoding Nicolas Pitre
2013-09-05 6:19 ` [PATCH 09/38] introduce get_sha1_lowhex() Nicolas Pitre
2013-09-05 6:19 ` Nicolas Pitre [this message]
2013-09-06 6:57 ` [PATCH 10/38] pack v4: commit object encoding Junio C Hamano
2013-09-06 21:28 ` Nicolas Pitre
2013-09-06 22:08 ` Junio C Hamano
2013-09-07 4:41 ` Nicolas Pitre
2013-09-05 6:19 ` [PATCH 11/38] pack v4: tree " Nicolas Pitre
2013-09-05 6:19 ` [PATCH 12/38] pack v4: dictionary table output Nicolas Pitre
2013-09-05 6:19 ` [PATCH 13/38] pack v4: creation code Nicolas Pitre
2013-09-05 6:19 ` [PATCH 14/38] pack v4: object headers Nicolas Pitre
2013-09-05 6:19 ` [PATCH 15/38] pack v4: object data copy Nicolas Pitre
2013-09-05 6:19 ` [PATCH 16/38] pack v4: object writing Nicolas Pitre
2013-09-05 6:19 ` [PATCH 17/38] pack v4: tree object delta encoding Nicolas Pitre
2013-09-05 6:19 ` [PATCH 18/38] pack v4: load delta candidate for encoding tree objects Nicolas Pitre
2013-09-05 6:19 ` [PATCH 19/38] packv4-create: optimize delta encoding Nicolas Pitre
2013-09-05 6:19 ` [PATCH 20/38] pack v4: honor pack.compression config option Nicolas Pitre
2013-09-05 6:19 ` [PATCH 21/38] pack v4: relax commit parsing a bit Nicolas Pitre
2013-09-05 6:19 ` [PATCH 22/38] pack index v3 Nicolas Pitre
2013-09-05 6:19 ` [PATCH 23/38] packv4-create: normalize pack name to properly generate the pack index file name Nicolas Pitre
2013-09-05 6:19 ` [PATCH 24/38] packv4-create: add progress display Nicolas Pitre
2013-09-05 6:19 ` [PATCH 25/38] pack v4: initial pack index v3 support on the read side Nicolas Pitre
2013-09-05 6:19 ` [PATCH 26/38] pack v4: object header decode Nicolas Pitre
2013-09-05 6:19 ` [PATCH 27/38] pack v4: code to obtain a SHA1 from a sha1ref Nicolas Pitre
2013-09-05 6:19 ` [PATCH 28/38] pack v4: code to load and prepare a pack dictionary table for use Nicolas Pitre
2013-09-05 6:19 ` [PATCH 29/38] pack v4: code to retrieve a name Nicolas Pitre
2013-09-05 6:19 ` [PATCH 30/38] pack v4: code to recreate a canonical commit object Nicolas Pitre
2013-09-05 6:19 ` [PATCH 31/38] sha1_file.c: make use of decode_varint() Nicolas Pitre
2013-09-05 7:35 ` SZEDER Gábor
2013-09-05 6:19 ` [PATCH 32/38] pack v4: parse delta base reference Nicolas Pitre
2013-09-05 6:19 ` [PATCH 33/38] pack v4: we can read commit objects now Nicolas Pitre
2013-09-05 6:19 ` [PATCH 34/38] pack v4: code to retrieve a path component Nicolas Pitre
2013-09-05 6:19 ` [PATCH 35/38] pack v4: decode tree objects Nicolas Pitre
2013-09-05 6:19 ` [PATCH 36/38] pack v4: get " Nicolas Pitre
2013-09-05 6:20 ` [PATCH 37/38] pack v4: introduce "escape hatches" in the name and path indexes Nicolas Pitre
2013-09-05 19:02 ` Nicolas Pitre
2013-09-05 21:48 ` Nicolas Pitre
2013-09-05 23:57 ` Duy Nguyen
2013-09-05 6:20 ` [PATCH 38/38] packv4-create: add a command line argument to limit tree copy sequences Nicolas Pitre
2013-09-07 10:43 ` [PATCH 00/12] pack v4 support in index-pack Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 01/12] pack v4: split pv4_create_dict() out of load_dict() Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 02/12] index-pack: split out varint decoding code Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 03/12] index-pack: do not allocate buffer for unpacking deltas in the first pass Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 04/12] index-pack: split inflate/digest code out of unpack_entry_data Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 05/12] index-pack: parse v4 header and dictionaries Nguyễn Thái Ngọc Duy
2013-09-08 2:14 ` Nicolas Pitre
2013-09-07 10:43 ` [PATCH 06/12] index-pack: make sure all objects are registered in v4's SHA-1 table Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 07/12] index-pack: parse v4 commit format Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 08/12] index-pack: parse v4 tree format Nguyễn Thái Ngọc Duy
2013-09-08 2:52 ` Nicolas Pitre
2013-09-07 10:43 ` [PATCH 09/12] index-pack: move delta base queuing code to unpack_raw_entry Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 10/12] index-pack: record all delta bases in v4 (tree and ref-delta) Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 11/12] index-pack: skip looking for ofs-deltas in v4 as they are not allowed Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 12/12] index-pack: resolve v4 one-base trees Nguyễn Thái Ngọc Duy
2013-09-08 3:28 ` Nicolas Pitre
2013-09-08 3:44 ` Duy Nguyen
2013-09-08 7:22 ` [PATCH v2 00/14] pack v4 support in index-pack Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 01/14] pack v4: split pv4_create_dict() out of load_dict() Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 02/14] pack v4: add pv4_free_dict() Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 03/14] index-pack: add more comments on some big functions Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 04/14] index-pack: split out varint decoding code Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 05/14] index-pack: do not allocate buffer for unpacking deltas in the first pass Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 06/14] index-pack: split inflate/digest code out of unpack_entry_data Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 07/14] index-pack: parse v4 header and dictionaries Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 08/14] index-pack: make sure all objects are registered in v4's SHA-1 table Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 09/14] index-pack: parse v4 commit format Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 10/14] index-pack: parse v4 tree format Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 11/14] index-pack: move delta base queuing code to unpack_raw_entry Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 12/14] index-pack: record all delta bases in v4 (tree and ref-delta) Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 13/14] index-pack: skip looking for ofs-deltas in v4 as they are not allowed Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 14/14] index-pack: resolve v4 one-base trees Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 00/11] pack v4 support in pack-objects Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 01/11] pack v4: allocate dicts from the beginning Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 02/11] pack v4: stop using static/global variables in packv4-create.c Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 03/11] pack v4: move packv4-create.c to libgit.a Nguyễn Thái Ngọc Duy
2013-09-08 20:56 ` Nicolas Pitre
2013-09-08 15:04 ` [PATCH 04/11] pack v4: add version argument to write_pack_header Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 05/11] pack-write.c: add pv4_encode_in_pack_object_header Nguyễn Thái Ngọc Duy
2013-09-08 20:51 ` Nicolas Pitre
2013-09-08 15:04 ` [PATCH 06/11] pack-objects: add --version to specify written pack version Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 07/11] list-objects.c: add show_tree_entry callback to traverse_commit_list Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 08/11] pack-objects: create pack v4 tables Nguyễn Thái Ngọc Duy
2013-09-09 10:40 ` Duy Nguyen
2013-09-09 13:07 ` Nicolas Pitre
2013-09-09 15:21 ` Junio C Hamano
2013-09-08 15:04 ` [PATCH 09/11] pack-objects: do not cache delta for v4 trees Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 10/11] pack-objects: exclude commits out of delta objects in v4 Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 11/11] pack-objects: support writing pack v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 00/16] pack v4 support in pack-objects Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 01/16] pack v4: allocate dicts from the beginning Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 02/16] pack v4: stop using static/global variables in packv4-create.c Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 03/16] pack v4: move packv4-create.c to libgit.a Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 04/16] pack v4: add version argument to write_pack_header Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 05/16] pack_write: tighten valid object type check in encode_in_pack_object_header Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 06/16] pack-write.c: add pv4_encode_object_header Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 07/16] pack-objects: add --version to specify written pack version Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 08/16] list-objects.c: add show_tree_entry callback to traverse_commit_list Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 09/16] pack-objects: do not cache delta for v4 trees Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 10/16] pack-objects: exclude commits out of delta objects in v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 11/16] pack-objects: create pack v4 tables Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 12/16] pack-objects: prepare SHA-1 table in v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 13/16] pack-objects: support writing pack v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 14/16] pack v4: support "end-of-pack" indicator in index-pack and pack-objects Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size Nguyễn Thái Ngọc Duy
2013-09-09 15:01 ` Nicolas Pitre
2013-09-09 18:34 ` Junio C Hamano
2013-09-09 18:46 ` Nicolas Pitre
2013-09-09 18:56 ` Junio C Hamano
2013-09-09 19:11 ` Nicolas Pitre
2013-09-09 19:30 ` Junio C Hamano
2013-09-09 19:56 ` Nicolas Pitre
2013-09-10 0:45 ` Duy Nguyen
2013-09-12 15:34 ` Nicolas Pitre
2013-09-09 13:58 ` [PATCH v2 16/16] index-pack: support completing thin packs v4 Nguyễn Thái Ngọc Duy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1378362001-1738-11-git-send-email-nico@fluxnic.net \
--to=nico@fluxnic.net \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).