* [PATCH] Update, and clear up the pack format documentation a bit
@ 2008-04-05 18:07 Peter Eriksen
2008-04-05 23:58 ` Junio C Hamano
2008-04-06 4:51 ` Shawn O. Pearce
0 siblings, 2 replies; 4+ messages in thread
From: Peter Eriksen @ 2008-04-05 18:07 UTC
To: git
The current documentation does not mention the ofs_delta pack
object type. This patch is also supposed to make the text a bit
more readable, since it moves the object entry header
description earlier.
It fixes one error in these lines:
If it is DELTA, then
20-byte base object name SHA1 (the size above is the
size of the delta data that follows).
The size given in the object header is actually the inflated size
of the delta data that follows, since the call chain goes like
this:
For delta objects:
unpack_entry()
unpack_object_header()
unpack_delta_entry()
unpack_compressed_entry()
For non-delta objects:
unpack_entry()
unpack_object_header()
unpack_compressed_entry()
unpack_compressed_entry() allocates a buffer of the size
given in its last argument, and inflates the data into
this buffer.
So all objects in fact have their inflated size given
in the packed object header.
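The claim above can be checked with a minimal Python sketch. It is not git's code: the helper names `encode_entry_header` and `decode_entry_header` are hypothetical, and the blob type code 3 is taken from git's source rather than from this message. The sketch builds one non-delta entry by hand and compares the size stored in the header with the inflated and deflated lengths.

```python
import zlib

OBJ_BLOB = 3  # type code assumed from git's source, not stated in this thread


def encode_entry_header(obj_type, size):
    """First byte: MSB flag, 3-bit type, low 4 size bits; then 7 bits per byte."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: another size byte follows
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def decode_entry_header(buf, pos=0):
    """Return (type, size, position of the first byte after the header)."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F  # size0: least significant 4 bits
    shift = 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift  # later bytes are more significant
        shift += 7
    return obj_type, size, pos


# Hand-built entry: header carrying the inflated length, then deflated data.
payload = b"hello pack format\n" * 50
deflated = zlib.compress(payload)
entry = encode_entry_header(OBJ_BLOB, len(payload)) + deflated

obj_type, size, data_start = decode_entry_header(entry)
inflated = zlib.decompress(entry[data_start:])
print(obj_type, size, len(inflated), len(deflated))
```

In this sketch the decoded size matches the length of the inflated data, which is what lets a reader allocate the output buffer before inflating.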
Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
---
Documentation/technical/pack-format.txt | 43 ++++++++++++++++--------------
1 files changed, 23 insertions(+), 20 deletions(-)
Did I understand this right, especially what the
length field in the packed object headers
means?
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index aa87756..35ee01d 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -19,15 +19,34 @@ GIT pack format
- The header is followed by number of object entries, each of
which looks like this:
+
+ An n-byte header encoding the
+ type of the object
+ length of the object before compression
+
+ The format of the header:
+ 1-byte size extension bit (MSB)
+ type (next 3 bit)
+ size0 (lower 4-bit)
+ n-byte sizeN (as long as MSB is set, each 7-bit)
+ size0..sizeN form 4+7+7+..+7 bit integer, size0
+ is the least significant part, and sizeN is the
+ most significant part.
+
- (undeltified representation)
- n-byte type and length (3-bit type, (n-1)*7+4-bit length)
+ The header is followed by:
+
+ (for object types: commit, tree, blob, and tag)
compressed data
- (deltified representation)
- n-byte type and length (3-bit type, (n-1)*7+4-bit length)
+ (for object type ref_delta)
20-byte base object name
compressed delta data
+
+ (for object type ofs_delta)
+ n-byte offset (n*7-bit as above, but with size0 being 7 bit)
+ compressed delta data
+
Observation: length of each object is encoded in a variable
length format and is not constrained to 32-bit or anything.
@@ -92,22 +111,6 @@ trailer | | packfile checksum |
|
Pack file entry: <+
- packed object header:
- 1-byte size extension bit (MSB)
- type (next 3 bit)
- size0 (lower 4-bit)
- n-byte sizeN (as long as MSB is set, each 7-bit)
- size0..sizeN form 4+7+7+..+7 bit integer, size0
- is the least significant part, and sizeN is the
- most significant part.
- packed object data:
- If it is not DELTA, then deflated bytes (the size above
- is the size before compression).
- If it is DELTA, then
- 20-byte base object name SHA1 (the size above is the
- size of the delta data that follows).
- delta data, deflated.
-
= Version 2 pack-*.idx files support packs larger than 4 GiB, and
have some other reorganizations. They have the format:
--
1.5.5-rc3.GIT
* Re: [PATCH] Update, and clear up the pack format documentation a bit
2008-04-05 18:07 [PATCH] Update, and clear up the pack format documentation a bit Peter Eriksen
@ 2008-04-05 23:58 ` Junio C Hamano
2008-04-06 4:51 ` Shawn O. Pearce
1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2008-04-05 23:58 UTC
To: Peter Eriksen; +Cc: git
"Peter Eriksen" <s022018@student.dtu.dk> writes:
> The current documentation does not mention the ofs_delta pack
> object type. This patch is also supposed to make the text a bit
> more readable, since it moves the object entry header
> description earlier.
>
> It fixes one error in these lines:
>
> If it is DELTA, then
> 20-byte base object name SHA1 (the size above is the
> size of the delta data that follows).
>
> The size given in the object header is actually the inflated size
> of the delta data that follows,...
Your understanding is correct. Throughout the pack-objects program,
delta_size is always expressed in uncompressed number of bytes. The
original description you quoted above does not even say "the size of the
delta data (compressed)", so in that sense I do not think the original
description is really an error; if the update makes the description
clearer that would be good.
> - The header is followed by number of object entries, each of
> which looks like this:
> +
> + An n-byte header encoding the
> + type of the object
Hmm.
This is just terminology, but I think calling ref-delta and ofs-delta
"type of object" is confusing. This "type" field is about object
representation in the pack.
There are "undeltified" representations (4 object types), "ref-delta" and
"ofs-delta" representations.
> + length of the object before compression
And this is the length of the representation-specific data.
- for undeltified representations of the four object types, this
size is the size of the _object_;
- for deltified representations, this is _NOT_ the size of the _object_
(i.e. final object data after applying the delta). This is the size of
the delta data to be applied to the delta base, and does not include
the base object name (for ref-delta) nor size to represent the offset
(for ofs-delta).
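The deltified case can be illustrated with a minimal Python sketch, under stated assumptions: `read_type_and_size` and `read_ref_delta` are hypothetical helper names, the ref-delta type code 7 comes from git's source, and the delta bytes are an opaque placeholder rather than valid delta instructions. It shows that the size in the header covers only the inflated delta data and not the 20-byte base object name.

```python
import zlib

OBJ_REF_DELTA = 7  # type code assumed from git's source, not stated in this thread


def read_type_and_size(buf, pos):
    """Decode the n-byte entry header; return (type, size, next position)."""
    byte = buf[pos]
    pos += 1
    kind, size, shift = (byte >> 4) & 0x07, byte & 0x0F, 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return kind, size, pos


def read_ref_delta(buf, pos=0):
    """Return (base object name, size from header, inflated delta data)."""
    kind, size, pos = read_type_and_size(buf, pos)
    if kind != OBJ_REF_DELTA:
        raise ValueError("not a ref-delta entry")
    base_name = buf[pos:pos + 20]  # not counted in the header size
    pos += 20
    delta = zlib.decompressobj().decompress(buf[pos:])
    return base_name, size, delta


# Hand-built entry. The delta bytes are a placeholder, not real delta data.
delta_data = b"\x00" * 10
base_name = bytes(range(20))
header = bytes([(OBJ_REF_DELTA << 4) | len(delta_data)])  # 10 fits in size0
entry = header + base_name + zlib.compress(delta_data)

name, size, delta = read_ref_delta(entry)
print(size, len(delta), len(name))
```

The size of the final object after applying the delta appears nowhere in this header; only the length of the delta data does.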
> +
> + The format of the header:
> + 1-byte size extension bit (MSB)
> + type (next 3 bit)
> + size0 (lower 4-bit)
> + n-byte sizeN (as long as MSB is set, each 7-bit)
> + size0..sizeN form 4+7+7+..+7 bit integer, size0
> + is the least significant part, and sizeN is the
> + most significant part.
> +
>
> + The header is followed by:
> +
> + (for object types: commit, tree, blob, and tag)
> compressed data
Correct.
> + (for object type ref_delta)
> 20-byte base object name
> compressed delta data
> +
> + (for object type ofs_delta)
> + n-byte offset (n*7-bit as above, but with size0 being 7 bit)
> + compressed delta data
Correct.
* Re: [PATCH] Update, and clear up the pack format documentation a bit
2008-04-05 18:07 [PATCH] Update, and clear up the pack format documentation a bit Peter Eriksen
2008-04-05 23:58 ` Junio C Hamano
@ 2008-04-06 4:51 ` Shawn O. Pearce
2008-04-06 6:16 ` Junio C Hamano
1 sibling, 1 reply; 4+ messages in thread
From: Shawn O. Pearce @ 2008-04-06 4:51 UTC
To: Peter Eriksen; +Cc: git
Peter Eriksen <s022018@student.dtu.dk> wrote:
> The current documentation does not mention the ofs_delta pack
> object type. This patch is also supposed to make the text a bit
> more readable, since it moves the object entry header
> description earlier.
...
> diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
> index aa87756..35ee01d 100644
> --- a/Documentation/technical/pack-format.txt
> +++ b/Documentation/technical/pack-format.txt
> compressed delta data
> +
> + (for object type ofs_delta)
> + n-byte offset (n*7-bit as above, but with size0 being 7 bit)
> + compressed delta data
> +
That is not correct. The ofs_delta is encoded as an n-byte offset
that is subtracted from the position of the current object's first
byte (the byte holding the type/representation field and first 4
bits of length).
The n-byte encoding for an ofs_delta is different than the one
used for the length. We add 1 for each byte where the MSB is 1.
We also store the data in big-endian form (the most significant
byte is first and the least significant byte is last).
See get_delta_base in sha1_file.c for the details of this.
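The encoding described above can be sketched in Python as follows. This is an illustration modeled on the description in this message, not a copy of get_delta_base; the names `encode_ofs`, `decode_ofs` and `base_position` are hypothetical.

```python
def decode_ofs(buf, pos=0):
    """Decode an ofs_delta offset; return (offset, next position).

    Bytes are big-endian, 7 bits each, MSB set on every byte but the last.
    One is added before each shift, for every byte that had its MSB set.
    """
    byte = buf[pos]
    pos += 1
    offset = byte & 0x7F
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        offset = ((offset + 1) << 7) + (byte & 0x7F)
    return offset, pos


def encode_ofs(offset):
    """Inverse of decode_ofs: build the bytes from least to most significant."""
    out = [offset & 0x7F]  # last byte, MSB clear
    offset >>= 7
    while offset:
        offset -= 1  # undo the +1 the decoder applies
        out.append(0x80 | (offset & 0x7F))
        offset >>= 7
    return bytes(reversed(out))


def base_position(entry_pos, offset):
    """The base starts `offset` bytes before the delta entry's first byte."""
    return entry_pos - offset


print(encode_ofs(127).hex(), encode_ofs(128).hex(), encode_ofs(16512).hex())
```

One consequence of the added 1 is that every offset has exactly one encoding: without it, the two bytes 0x80 0x00 would decode to 0, duplicating the single byte 0x00, whereas here they decode to 128.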
In pack v4 I planned on using this particular encoding in more
of the format than just here.
--
Shawn.
* Re: [PATCH] Update, and clear up the pack format documentation a bit
2008-04-06 4:51 ` Shawn O. Pearce
@ 2008-04-06 6:16 ` Junio C Hamano
0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2008-04-06 6:16 UTC
To: Shawn O. Pearce; +Cc: Peter Eriksen, git
"Shawn O. Pearce" <spearce@spearce.org> writes:
> Peter Eriksen <s022018@student.dtu.dk> wrote:
> ...
>> + (for object type ofs_delta)
>> + n-byte offset (n*7-bit as above, but with size0 being 7 bit)
>> + compressed delta data
>> +
>
> That is not correct. The ofs_delta is encoded as an n-byte offset
> that is subtracted from the current object's first byte (the byte
> holding the type/representation field and first 4 bits of length).
Right. Saying just "n-byte offset" can be mistaken as the offset from the
beginning of the file, and making it clear that it is relative is good.
> The n-byte encoding for an ofs_delta is different than the one
> used for the length. We add 1 for each byte where the MSB is 1.
> We also store the data in big-endian form (the most significant
> byte is first and the least significant byte is last).
Ah, I forgot about that one.