From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>,
git@vger.kernel.org,
Christian Couder <christian.couder@gmail.com>,
Karthik Nayak <karthik.188@gmail.com>,
Justin Tobler <jltobler@gmail.com>,
Siddharth Asthana <siddharthasthana31@gmail.com>,
Ayush Chandekar <ayu.chandekar@gmail.com>,
Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>,
Phillip Wood <phillip.wood123@gmail.com>
Subject: Re: [PATCH] repack-promisor: add fake paths to oids when repacking promisor objects
Date: Tue, 31 Mar 2026 08:51:06 -0700
Message-ID: <xmqqqzp02e1h.fsf@gitster.g>
In-Reply-To: <actjaxIkDEXHJbyi@pks.im> (Patrick Steinhardt's message of "Tue, 31 Mar 2026 08:02:19 +0200")
Patrick Steinhardt <ps@pks.im> writes:
>> This will ensure they can be grouped by the type and existing pack
>> order which will make them end up close together in the sort, improving
>> delta compression.
>
> I think the general idea may be sound, but ideally we would have some
> benchmarks that demonstrate it actually is. Like, can you come up with
> scenarios where it will indeed improve the packfile size and show the
> advantage of this change? Are there scenarios that are likely to have a
> disadvantage because of this new ordering? Which of these scenarios do
> we expect to be more likely?
>
> Before answering these questions we basically just claim it's going to
> be an improvement without actually verifying.
While it is a very good point that a change that claims to improve
performance must come with verifiable data, it is quite hard to come
up with a useful benchmark in this area, because the packfile size
alone is not what you want to optimize for in the first place.
Back when I was working on packfile generation, we needed to
optimize for two things (luckily they are not competing goals). One
is to choose a good delta-base, which will contribute to an overall
pack size that is smaller. The ordering of the objects in a pack,
on the other hand, does not directly contribute to the size, but has
an impact on runtime performance, by keeping related things closer
together to reduce the need to "seek" in the pack stream.
Generally, two objects that appear next to each other in a
well-optimized packstream are not expected to be similar to each other.
They are more likely to be two unrelated files that appear in the
same tree object (i.e., they do not delta with each other well, but
at runtime, they are often needed together). So it may even be
detrimental to use the offset in packfile as a clue to choose among
potential delta bases.
Do we have name-hash data for the original pack available somewhere
so that the repacker can take advantage of it? If so, it may be a
more relevant thing to reuse.
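To illustrate what "name-hash" refers to here, below is a minimal
sketch modelled on the path-based hash that pack-objects uses to
group delta candidates. The function name name_hash_sketch() is made
up for this example, and the body is an approximation written from
memory, not a copy of the code in the tree, so treat it as an
assumption and not as the actual implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Turn a path into a sortable number in which the last characters
 * count the most, so that paths with a similar tail (e.g. the same
 * basename or the same ".c" suffix) end up near each other when
 * candidates are sorted by this value.  Each new character pushes
 * the earlier ones two bits to the right, so anything more than
 * sixteen non-whitespace characters from the end is shifted out.
 */
uint32_t name_hash_sketch(const char *name)
{
	uint32_t hash = 0;

	if (!name)
		return 0;

	for (; *name; name++) {
		unsigned char c = (unsigned char)*name;

		if (isspace(c))
			continue; /* whitespace does not contribute */
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}
```

With a hash of this shape, two paths that share their last sixteen
non-whitespace characters get the same value no matter which
directory they live in, which is the kind of similarity hint that
the offset of an object in an existing pack does not carry.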
I agree with the other points in your review, too. Thanks for
helping with this topic.
Thread overview: 4+ messages
2026-03-27 16:12 [PATCH] repack-promisor: add fake paths to oids when repacking promisor objects Abraham Samuel Adekunle
2026-03-31 6:02 ` Patrick Steinhardt
2026-03-31 9:53 ` Samuel Abraham
2026-03-31 15:51 ` Junio C Hamano [this message]