git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* pack v4 trees with a canonical base
@ 2013-09-09 19:25 Nicolas Pitre
  2013-09-10  0:52 ` Duy Nguyen
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2013-09-09 19:25 UTC (permalink / raw)
  To: git; +Cc: Nguyn Thái Ngc Duy

While reviewing the pack v4 thin support Ive realized that some base 
tree objects appended to a thin pack won't be in pack v4 format.  Hence 
the patch below to deal with that possibility.

An eventual optimization to index-pack when completing a pack would be 
to attempt the encoding of appended tree objects into the packv4 format 
using the existing dictionary table in the pack, and fall back to the 
canonical format if that table doesn't have all the necessary elements.

    packv4-parse.c: allow tree entry copying from a canonical tree object
    
    It is possible for a base tree object to be in the canonical
    representation.  This may happen if the encoder detected an irregularity
    preventing a loss free encoding to the pack v4 format, or if a thin pack
    was completed with such tree objects.
    
    Signed-off-by: Nicolas Pitre <nico@fluxnic.net>

diff --git a/packv4-parse.c b/packv4-parse.c
index c62c4ae..36942bb 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -10,6 +10,7 @@
 
 #include "cache.h"
 #include "packv4-parse.h"
+#include "tree-walk.h"
 #include "varint.h"
 
 const unsigned char *get_sha1ref(struct packed_git *p,
@@ -321,6 +322,45 @@ void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
 	return dst;
 }
 
+static int copy_canonical_tree_entries(struct packed_git *p, off_t offset,
+				       unsigned int start, unsigned int count,
+				       unsigned char **dstp, unsigned long *sizep)
+{
+	void *data;
+	const void *from, *end;
+	enum object_type type;
+	unsigned long size;
+	struct tree_desc desc;
+
+	data = unpack_entry(p, offset, &type, &size);
+	if (!data)
+		return -1;
+	if (type != OBJ_TREE) {
+		free(data);
+		return -1;
+	}
+
+	init_tree_desc(&desc, data, size);
+
+	while (start--)
+		update_tree_entry(&desc);
+
+	from = desc.buffer;
+	while (count--)
+		update_tree_entry(&desc);
+	end = desc.buffer;
+
+	if (end - from > *sizep) {
+		free(data);
+		return -1;
+	}
+	memcpy(*dstp, from, end - from);
+	*dstp += end - from;
+	*sizep -= end - from;
+	free(data);
+	return 0;
+}
+
 static int tree_entry_prefix(unsigned char *buf, unsigned long size,
 			     const unsigned char *path, unsigned mode)
 {
@@ -364,7 +404,12 @@ static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
 		while (*scp & 128)
 			if (++scp - src >= avail - 20)
 				return -1;
-		/* let's still make sure this is actually a tree */
+		/* is this a canonical tree object? */
+		if ((*scp & 0xf) == OBJ_TREE)
+			return copy_canonical_tree_entries(p, offset,
+							   start, count,
+							   dstp, sizep);
+		/* let's still make sure this is actually a pv4 tree */
 		if ((*scp++ & 0xf) != OBJ_PV4_TREE)
 			return -1;
 	}

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: pack v4 trees with a canonical base
  2013-09-09 19:25 pack v4 trees with a canonical base Nicolas Pitre
@ 2013-09-10  0:52 ` Duy Nguyen
  2013-09-10 20:13   ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Duy Nguyen @ 2013-09-10  0:52 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Git Mailing List

On Tue, Sep 10, 2013 at 2:25 AM, Nicolas Pitre <nico@fluxnic.net> wrote:
> An eventual optimization to index-pack when completing a pack would be
> to attempt the encoding of appended tree objects into the packv4 format
> using the existing dictionary table in the pack, and fall back to the
> canonical format if that table doesn't have all the necessary elements.

Yeah, it's on the improvement todo list. The way pack-objects creates
dictionaries right now, the tree dict should contain all elements the
base trees need so fall back is only necessary when trees are have
extra zeros in mode field.
-- 
Duy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: pack v4 trees with a canonical base
  2013-09-10  0:52 ` Duy Nguyen
@ 2013-09-10 20:13   ` Junio C Hamano
  2013-09-10 20:17     ` Nicolas Pitre
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2013-09-10 20:13 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Nicolas Pitre, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Sep 10, 2013 at 2:25 AM, Nicolas Pitre <nico@fluxnic.net> wrote:
>> An eventual optimization to index-pack when completing a pack would be
>> to attempt the encoding of appended tree objects into the packv4 format
>> using the existing dictionary table in the pack, and fall back to the
>> canonical format if that table doesn't have all the necessary elements.
>
> Yeah, it's on the improvement todo list. The way pack-objects creates
> dictionaries right now, the tree dict should contain all elements the
> base trees need so fall back is only necessary when trees are have
> extra zeros in mode field.

Careful.

There may be trees in the wild that record 100775 or 100777 in the
mode field for executable blobs, which also need to be special
cased.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: pack v4 trees with a canonical base
  2013-09-10 20:13   ` Junio C Hamano
@ 2013-09-10 20:17     ` Nicolas Pitre
  2013-09-10 20:45       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2013-09-10 20:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List

On Tue, 10 Sep 2013, Junio C Hamano wrote:

> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > On Tue, Sep 10, 2013 at 2:25 AM, Nicolas Pitre <nico@fluxnic.net> wrote:
> >> An eventual optimization to index-pack when completing a pack would be
> >> to attempt the encoding of appended tree objects into the packv4 format
> >> using the existing dictionary table in the pack, and fall back to the
> >> canonical format if that table doesn't have all the necessary elements.
> >
> > Yeah, it's on the improvement todo list. The way pack-objects creates
> > dictionaries right now, the tree dict should contain all elements the
> > base trees need so fall back is only necessary when trees are have
> > extra zeros in mode field.
> 
> Careful.
> 
> There may be trees in the wild that record 100775 or 100777 in the
> mode field for executable blobs, which also need to be special
> cased.

All the file mode bits are always preserved.  So this is not really a 
special case as far as the pack v4 encoding is concerned.


Nicolas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: pack v4 trees with a canonical base
  2013-09-10 20:17     ` Nicolas Pitre
@ 2013-09-10 20:45       ` Junio C Hamano
  2013-09-10 21:03         ` Nicolas Pitre
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2013-09-10 20:45 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Duy Nguyen, Git Mailing List

Nicolas Pitre <nico@fluxnic.net> writes:

very much appreciated to> On Tue, 10 Sep 2013, Junio C Hamano wrote:
>
>> Duy Nguyen <pclouds@gmail.com> writes:
>> 
>> > On Tue, Sep 10, 2013 at 2:25 AM, Nicolas Pitre <nico@fluxnic.net> wrote:
>> >> An eventual optimization to index-pack when completing a pack would be
>> >> to attempt the encoding of appended tree objects into the packv4 format
>> >> using the existing dictionary table in the pack, and fall back to the
>> >> canonical format if that table doesn't have all the necessary elements.
>> >
>> > Yeah, it's on the improvement todo list. The way pack-objects creates
>> > dictionaries right now, the tree dict should contain all elements the
>> > base trees need so fall back is only necessary when trees are have
>> > extra zeros in mode field.
>> 
>> Careful.
>> 
>> There may be trees in the wild that record 100775 or 100777 in the
>> mode field for executable blobs, which also need to be special
>> cased.
>
> All the file mode bits are always preserved.  So this is not really a 
> special case as far as the pack v4 encoding is concerned.

Ahh. OK.  It can theoretically be argued that you could further
squeeze 13 bits out per tree entry because you would need only 5
possible values (100644, 100755 120000, 40000, and 160000, all
octal) for the modes, but we will never know what other modes we
would want to use in the future, so not being over-tight and using
16-bit for this purpose is probably a good trade-off (squeezing 8
bits out per tree entry would make the shape of ident table entry
and tree path entry different and may hurt reusing the code to parse
these tables).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: pack v4 trees with a canonical base
  2013-09-10 20:45       ` Junio C Hamano
@ 2013-09-10 21:03         ` Nicolas Pitre
  0 siblings, 0 replies; 6+ messages in thread
From: Nicolas Pitre @ 2013-09-10 21:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List

On Tue, 10 Sep 2013, Junio C Hamano wrote:

> Nicolas Pitre <nico@fluxnic.net> writes:
> 
> > On Tue, 10 Sep 2013, Junio C Hamano wrote:
> >
> >> There may be trees in the wild that record 100775 or 100777 in the
> >> mode field for executable blobs, which also need to be special
> >> cased.
> >
> > All the file mode bits are always preserved.  So this is not really a 
> > special case as far as the pack v4 encoding is concerned.
> 
> Ahh. OK.  It can theoretically be argued that you could further
> squeeze 13 bits out per tree entry because you would need only 5
> possible values (100644, 100755 120000, 40000, and 160000, all
> octal) for the modes, but we will never know what other modes we
> would want to use in the future, so not being over-tight and using
> 16-bit for this purpose is probably a good trade-off

Absolutely.  I tried not to lose any of the currently available 
extension possibilities in the canonical object format.

> (squeezing 8 bits out per tree entry would make the shape of ident 
> table entry and tree path entry different and may hurt reusing the 
> code to parse these tables).

One could argue that 16 bits is much more than sufficient to encode a 
time zone offset too.  but again this didn't seem worth painting 
ourselves in a corner if ever some creative time zones are used.

Those table are also compressed.  So any repetition of the same mode bit 
pattern or sparseness in the tz bits is likely to be compressed well.


Nicolas

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-09-10 21:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-09 19:25 pack v4 trees with a canonical base Nicolas Pitre
2013-09-10  0:52 ` Duy Nguyen
2013-09-10 20:13   ` Junio C Hamano
2013-09-10 20:17     ` Nicolas Pitre
2013-09-10 20:45       ` Junio C Hamano
2013-09-10 21:03         ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).