Git development
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Cc: Nicolas Pitre <nico@fluxnic.net>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Jeff King <peff@peff.net>
Subject: [RFC/PATCH] define the way new representation types are encoded in the pack
Date: Thu, 27 Oct 2011 23:04:40 -0700	[thread overview]
Message-ID: <7v62j9veh3.fsf@alter.siamese.dyndns.org> (raw)

In addition to four basic types (commit, tree, blob and tag), the pack
stream can encode a few other "representation" types, such as REF_DELTA
and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
we do not have much room to add new representation types in place, but we
do have one value reserved for future expansion.

This patch is about defining how that reserved value is used.

The first byte in the pack stream data consists of the following for the
current representation types:

 - Bit 0-3 are used for the low 4-bit of "some" size (not necessarily the
   size of the representation);

 - Bit 4-6 are used for object types 0-7, but we have not used type 5 so
   far and reserved it for future expansion (we could also use type 0
   recorded in the pack stream for future expansion, just like how I
   convert 5 into the real "extended" representation type in this patch);

 - Bit 7 is used to signal if the second byte needs to be read for sizes
   that do not fit in the 4-bit.

When bit 4-6 encodes type 5, the first byte is used this way:

 - Bit 0-3 denotes the real "extended" representation type. Because types
   0-7 can already be encoded without using the extended format, we can
   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
   type 11 = 3 + 8);

 - Bit 4-6 has the value "5";

 - Bit 7 is used to signal if the _third_ byte needs to be read for larger
   size that cannot be represented with 8-bit.

As it is unlikely for us to pack things that do not need to record any
size, the second byte is always used in full to encode the low 8-bit of
the size.

I haven't started using type=8 and upwards for anything yet, but because
we have only one "future expansion" value left, I want us to be extremely
careful in order to avoid painting us into a corner that we cannot get out
of, so I am sending this out early for a preliminary review.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache.h     |    3 ++-
 sha1_file.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 2e6ad36..b02139b 100644
--- a/cache.h
+++ b/cache.h
@@ -380,9 +380,10 @@ enum object_type {
 	OBJ_TREE = 2,
 	OBJ_BLOB = 3,
 	OBJ_TAG = 4,
-	/* 5 for future expansion */
+	OBJ_EXT = 5, /* 5 for future expansion */
 	OBJ_OFS_DELTA = 6,
 	OBJ_REF_DELTA = 7,
+	OBJ_CAT_TREE = 8,
 	OBJ_ANY,
 	OBJ_MAX
 };
diff --git a/sha1_file.c b/sha1_file.c
index 27f3b9b..4dcd023 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1254,16 +1254,43 @@ static int experimental_loose_object(unsigned char *map)
 }
 
 unsigned long unpack_object_header_buffer(const unsigned char *buf,
-		unsigned long len, enum object_type *type, unsigned long *sizep)
+	unsigned long len, enum object_type *typep, unsigned long *sizep)
 {
 	unsigned shift;
 	unsigned long size, c;
 	unsigned long used = 0;
+	enum object_type type;
 
+	/*
+	 * MSB of the first byte is used to tell if the second byte
+	 * needs to be read for the size, so type field is only 3-bit
+	 * wide.
+	 */
 	c = buf[used++];
-	*type = (c >> 4) & 7;
-	size = c & 15;
-	shift = 4;
+	type = (c >> 4) & 7;
+
+	if (type != OBJ_EXT) {
+		/*
+		 * For basic types of object representations, the low
+		 * 4-bit of the first byte is used for the lowermost
+		 * 4-bit of the size. The MSB of the first byte tells
+		 * if the second byte needs to be read for size.
+		 */
+		size = c & 15;
+		shift = 4;
+	} else {
+		/*
+		 * For extended types, the low 4-bit of the first byte
+		 * is used for the representation type (offset by 8),
+		 * and the size begins at the second byte. The MSB of
+		 * the first byte is still used to indicate the next
+		 * byte (i.e. the third byte) needs to be read for the
+		 * size.
+		 */
+		type = (c & 15) + 8;
+		size = buf[used++];
+		shift = 8;
+	}
 	while (c & 0x80) {
 		if (len <= used || bitsizeof(long) <= shift) {
 			error("bad object header");
@@ -1274,6 +1301,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		shift += 7;
 	}
 	*sizep = size;
+	*typep = type;
 	return used;
 }
 

             reply	other threads:[~2011-10-28  6:04 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-28  6:04 Junio C Hamano [this message]
2011-10-28  6:12 ` [RFC/PATCH] define the way new representation types are encoded in the pack Junio C Hamano
2011-10-30  5:40   ` Nguyen Thai Ngoc Duy
2011-10-30  7:11     ` Junio C Hamano
2011-10-30  7:13       ` Junio C Hamano
2011-10-30  9:31       ` Nguyen Thai Ngoc Duy
2011-10-28 13:41 ` Jakub Narebski
2011-10-28 15:24   ` Junio C Hamano
2011-10-28 15:44 ` Shawn Pearce
2011-10-28 22:48   ` Nicolas Pitre
2011-10-28 23:07     ` Shawn Pearce
2011-10-28 23:30       ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7v62j9veh3.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=nico@fluxnic.net \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox