From: Junio C Hamano <gitster@pobox.com>
To: "Farhan Khan" <farhan@farhan.codes>
Cc: git@vger.kernel.org
Subject: Re: How OBJ_REF_DELTA pack file size calculated
Date: Sat, 20 Jan 2024 10:26:29 -0800
Message-ID: <xmqq1qabg8tm.fsf@gitster.g>
In-Reply-To: <afea6dc9-e557-4730-abe6-00947f77be06@app.fastmail.com> (Farhan Khan's message of "Fri, 19 Jan 2024 23:28:05 -0500")
"Farhan Khan" <farhan@farhan.codes> writes:
> 82daab01f43e34b9f7c8e0db81a9951933b04f1b commit 94 101 82749 1 ecd0e8c88ed8891da372f5630d542150b0a0531e
>
> The size of the object is 94 bytes
> The size of the entry is 101 bytes.
> My patching/reconstruction of the object works, the compressed
> size is 97 bytes.
What do you mean by this?
The dense object header expresses the inflated size of the object
(which should be 94 in your case). By expressing it as a delta
against some other object in the pack and then deflating the delta,
we should get data that is much smaller than 94+20 if we choose
to express it in the OBJ_REF_DELTA representation. Otherwise, with
such a suboptimal delta base, we would be better off expressing it
as a base object that is merely deflated: we would not need the
20-byte base object name overhead, and when reconstructing the
object, readers would not need to inflate the base object and apply
the delta.
So I am not sure what you mean by "the compressed size is 97 bytes".
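As a rough illustration of the trade-off described above, here is a
minimal sketch in Python. It is not how pack-objects actually decides;
the function name ref_delta_pays_off is made up for this example, and
it ignores the few bytes of entry header on either side.

```python
import zlib

HASH_LEN = 20  # length of a SHA-1 base object name


def ref_delta_pays_off(obj_data: bytes, delta_data: bytes) -> bool:
    """Return True if storing the deflated delta plus the 20-byte base
    object name takes less room than storing the deflated object itself.

    Simplification (assumption): entry header bytes are not counted.
    """
    whole = len(zlib.compress(obj_data))
    as_delta = HASH_LEN + len(zlib.compress(delta_data))
    return as_delta < whole
```

With a delta as large as the object itself, the 20 bytes of base
object name are pure overhead and the function returns False.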
> However, I cannot figure out where the 101 comes
> from. The size of the object header is 2 bytes, the OBJ_REF_DELTA
> is 20 bytes (the SHA1), but that does not add up to 101 bytes.
$ git help format-pack
   - The header is followed by a number of object entries, each of
     which looks like this:

     (undeltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     compressed data

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     base object name if OBJ_REF_DELTA or a negative relative
         offset from the delta object's position in the pack if this
         is an OBJ_OFS_DELTA object
     compressed delta data

     Observation: the length of each object is encoded in a variable
     length format and is not constrained to 32-bit or anything.
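The "n-byte type and length" field quoted above can be decoded along
these lines. This is only a sketch in Python, not Git's own code; the
function name parse_entry_header is hypothetical. The first byte
carries the 3-bit type and the low 4 bits of the length, and each
following byte (present while the most significant bit is set) adds 7
more bits.

```python
def parse_entry_header(buf: bytes, pos: int = 0):
    """Decode the n-byte type-and-length header of a pack entry.

    Returns (type, size, number_of_header_bytes).
    """
    byte = buf[pos]
    obj_type = (byte >> 4) & 0x07   # 3-bit type
    size = byte & 0x0F              # low 4 bits of the length
    shift = 4
    used = 1
    while byte & 0x80:              # MSB set: another byte follows
        byte = buf[pos + used]
        size |= (byte & 0x7F) << shift  # each further byte adds 7 bits
        shift += 7
        used += 1
    return obj_type, size, used
```

For example, a length of 94 does not fit in 4 bits, so such a header
takes two bytes: parse_entry_header(bytes([0xFE, 0x05])) gives type 7
(OBJ_REF_DELTA), size 94, and 2 header bytes.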
So, if the object header for this object in this pack is 2 bytes
long as you observed above, then the 101 bytes should be 2 bytes of
header, 20 bytes of base object name, and the remainder would be
the deflated delta data, which is 101 - 22 = 79 bytes. Reading the
base object and applying that delta (which deflates to 79 bytes)
would reconstruct the original 94 bytes of the object.
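The same breakdown can be checked mechanically. The following is a
minimal sketch in Python, assuming the entry bytes and the header
length are already known; the name split_ref_delta_entry is made up
for this example. It takes the 20-byte base object name, inflates the
zlib stream that follows, and reports how many deflated bytes that
stream occupied.

```python
import zlib

HASH_LEN = 20  # length of a SHA-1 base object name


def split_ref_delta_entry(entry: bytes, header_len: int):
    """Split an OBJ_REF_DELTA entry into its parts.

    `entry` starts at the entry's first header byte and may run past
    the end of the entry; anything after the zlib stream is ignored.
    Returns (base_name, deflated_len, inflated_delta).
    """
    base_name = entry[header_len:header_len + HASH_LEN]
    d = zlib.decompressobj()
    delta = d.decompress(entry[header_len + HASH_LEN:]) + d.flush()
    # Bytes left over after the end of the stream belong to whatever follows.
    deflated_len = len(entry) - header_len - HASH_LEN - len(d.unused_data)
    return base_name, deflated_len, delta
```

For the entry discussed here, header_len would be 2, and the expected
deflated_len is 101 - 2 - 20 = 79.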