git.vger.kernel.org archive mirror
* How OBJ_REF_DELTA pack file size calculated
@ 2024-01-20  4:28 Farhan Khan
  2024-01-20 18:26 ` Junio C Hamano
  0 siblings, 1 reply; 2+ messages in thread
From: Farhan Khan @ 2024-01-20  4:28 UTC (permalink / raw)
  To: git

Hi,

I am trying to implement OBJ_REF_DELTA, but I am having some trouble with offsets (sizes in the pack file) that do not line up. Specifically, the size of the whole entry does not seem to add up.

For example, consider this run:

git verify-pack -v .git/objects/pack/pack-48269bdfe1d28d20f603c6b23eed5717b7474e76.pack  | grep 82daab01f43e34b9f7c8e0db81a9951933b04f1b

82daab01f43e34b9f7c8e0db81a9951933b04f1b commit 94 101 82749 1 ecd0e8c88ed8891da372f5630d542150b0a0531e

The size of the object is 94 bytes
The size of the entry is 101 bytes.

My patching/reconstruction of the object works, the compressed size is 97 bytes. However, I cannot figure out where the 101 comes from. The size of the object header is 2 bytes, the OBJ_REF_DELTA is 20 bytes (the SHA1), but that does not add up to 101 bytes.

I am trying to understand where the 101 bytes comes from.

If not, can you please point me to where in the code the offset size for OBJ_REF_DELTA is calculated? I tried myself, starting from builtin/verify-pack.c, but there seems to be some multi-threading/processing going on and I was not able to determine where the calculation happens.

Thanks!
--
Farhan Khan
PGP Fingerprint: 1312 89CE 663E 1EB2 179C 1C83 C41D 2281 F8DA C0DE



* Re: How OBJ_REF_DELTA pack file size calculated
  2024-01-20  4:28 How OBJ_REF_DELTA pack file size calculated Farhan Khan
@ 2024-01-20 18:26 ` Junio C Hamano
  0 siblings, 0 replies; 2+ messages in thread
From: Junio C Hamano @ 2024-01-20 18:26 UTC (permalink / raw)
  To: Farhan Khan; +Cc: git

"Farhan Khan" <farhan@farhan.codes> writes:

> 82daab01f43e34b9f7c8e0db81a9951933b04f1b commit 94 101 82749 1 ecd0e8c88ed8891da372f5630d542150b0a0531e
>
> The size of the object is 94 bytes
> The size of the entry is 101 bytes.

> My patching/reconstruction of the object works, the compressed
> size is 97 bytes.

What do you mean by this?

The dense object header expresses the inflated size of the object
(which should be 94 in your case).  By expressing it as a delta
against some other object in the pack and then deflating the delta,
we should get data that is much smaller than 94+20 if we choose
to express it in the OBJ_REF_DELTA representation.  Otherwise, with
such a suboptimal delta base, we would be better off expressing it
as a base object that is merely deflated.  Then we do not need the
20-byte base object name overhead, and when reconstructing the
object, readers do not need to inflate the base object and apply
the delta.

So I am not sure what you mean by "the compressed size is 97 bytes".

> However, I cannot figure out where the 101 comes
> from. The size of the object header is 2 bytes, the OBJ_REF_DELTA
> is 20 bytes (the SHA1), but that does not add up to 101 bytes.

$ git help format-pack

   - The header is followed by a number of object entries, each of
     which looks like this:

     (undeltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     compressed data

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     base object name if OBJ_REF_DELTA or a negative relative
	 offset from the delta object's position in the pack if this
	 is an OBJ_OFS_DELTA object
     compressed delta data

     Observation: the length of each object is encoded in a variable
     length format and is not constrained to 32-bit or anything.
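To make the "n-byte type and length" line above concrete, here is a
minimal sketch of a decoder for that header, written from the format
description quoted above.  The function name parse_entry_header is
made up for illustration and is not part of Git.

```python
def parse_entry_header(buf, pos=0):
    """Decode the n-byte type-and-length header of one pack entry.

    Returns (obj_type, size, header_len).  The first byte carries a
    continuation bit, a 3-bit type and the low 4 bits of the length;
    every following byte carries a continuation bit and 7 more bits.
    """
    byte = buf[pos]
    obj_type = (byte >> 4) & 0x7   # 3-bit type (7 is OBJ_REF_DELTA)
    size = byte & 0x0F             # low 4 bits of the length
    shift = 4
    n = 1
    while byte & 0x80:             # high bit set: another byte follows
        byte = buf[pos + n]
        size |= (byte & 0x7F) << shift
        shift += 7
        n += 1
    return obj_type, size, n


# A 2-byte header that encodes type 7 (OBJ_REF_DELTA) and length 94.
print(parse_entry_header(bytes([0xFE, 0x05])))  # (7, 94, 2)
```

A length of 94 does not fit in the 4 bits of the first byte, which is
why the header in this thread is 2 bytes long.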

So, if the object header for this object in this pack is 2 bytes
long as you observed above, then 101 bytes should be 2 bytes of
header, 20 bytes of base object name, and the remainder would be
deflated delta data that is 101 - 22 = 79 bytes.  Reading the base
object and applying that delta (which deflates to 79 bytes) would
reconstruct the original 94 bytes of the object.
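
The breakdown above can be checked mechanically.  The following is a
rough sketch, not how Git itself does it: the helper name
ref_delta_entry_sizes is hypothetical, it assumes a SHA-1 repository
(20-byte object names), and it takes the pack contents as a bytes
object plus the entry offset reported by verify-pack.

```python
import zlib

OBJ_REF_DELTA = 7  # type code from the pack format documentation


def ref_delta_entry_sizes(pack, offset):
    """Return (header_len, base_name_len, deflated_len) for the
    OBJ_REF_DELTA entry starting at `offset` in `pack` (bytes)."""
    pos = offset
    obj_type = (pack[pos] >> 4) & 0x7
    while pack[pos] & 0x80:        # skip continuation bytes of the header
        pos += 1
    pos += 1                       # step past the last header byte
    header_len = pos - offset
    if obj_type != OBJ_REF_DELTA:
        raise ValueError("not an OBJ_REF_DELTA entry")

    base_name_len = 20             # assumption: SHA-1 object names
    pos += base_name_len

    # Inflate the delta; whatever zlib did not consume belongs to the
    # next entry, so the difference is the deflated size of this one.
    inflater = zlib.decompressobj()
    inflater.decompress(pack[pos:])
    deflated_len = len(pack) - pos - len(inflater.unused_data)
    return header_len, base_name_len, deflated_len
```

For the entry discussed here, the three numbers would be expected to
be 2, 20 and 79, which sum to the 101 that verify-pack reports.
Slicing the rest of the pack is wasteful for large packs; it is only
done to keep the sketch short.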

