From: Patrick Steinhardt <ps@pks.im>
To: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Cc: git@vger.kernel.org,
Christian Couder <christian.couder@gmail.com>,
Karthik Nayak <karthik.188@gmail.com>,
Justin Tobler <jltobler@gmail.com>,
Siddharth Asthana <siddharthasthana31@gmail.com>,
Ayush Chandekar <ayu.chandekar@gmail.com>,
Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Phillip Wood <phillip.wood123@gmail.com>
Subject: Re: [PATCH] repack-promisor: add fake paths to oids when repacking promisor objects
Date: Tue, 31 Mar 2026 08:02:19 +0200 [thread overview]
Message-ID: <actjaxIkDEXHJbyi@pks.im> (raw)
In-Reply-To: <acasFS_UXC8NybtE@Adekunles-MacBook-Air.local>
On Fri, Mar 27, 2026 at 05:12:43PM +0100, Abraham Samuel Adekunle wrote:
> This change addresses the NEEDSWORK comment added by commit
> 5d19e81 (repack: repack promisor objects if -a or -A is set).
>
> When 'git-repack' repacks promisor objects, only the raw oids
> are sent to 'git-pack-objects'.
> This gives 'git-pack-objects' no information about the original
> pack order of those objects in the packfile so it must
> rely on its default strategy of sorting the objects by type and
> then by size over again. This can produce suboptimal packfiles
> because the objects that were previously stored in the same
> packfile can become separated.
>
> Provide a hint to 'git-pack-objects' when sorting, by using the
> packfile basename, and the offset of the object in the existing
> packfile as fake paths when writing the oids to 'git-pack-objects'.
>
> This will ensure they can be grouped by the type and existing pack
> order which will make them end up close together in the sort, improving
> delta compression.
I think the general idea may be sound, but ideally we would have some
benchmarks that demonstrate it actually is. Like, can you come up with
scenarios where it will indeed improve the packfile size and show the
advantage of this change? Are there scenarios that are likely to have a
disadvantage because of this new ordering? Which of these scenarios do
we expect to be more likely?
Before answering these questions we basically just claim it's going to
be an improvement without actually verifying.
> diff --git a/repack-promisor.c b/repack-promisor.c
> index 90318ce150..3f3034fb79 100644
> --- a/repack-promisor.c
> +++ b/repack-promisor.c
> @@ -12,25 +12,51 @@ struct write_oid_context {
> const struct git_hash_algo *algop;
> };
>
> +/**
> + * Build fake path for the objects to give pack-objects
> + * an ordering hint.
> + * For the packed objects: pack-basename/offset-padded
> + */
> +
Nit: this empty line can be removed.
> +static void build_ordering_hint(struct object_info *oi, struct strbuf *hint)
> +{
> + struct packed_git *pack;
> + unsigned long offset;
> +
> + if (oi->whence == OI_PACKED) {
> + pack = oi->u.packed.pack;
> + offset = oi->u.packed.offset;
> + strbuf_addf(hint, "%s/%05lu", pack_basename(pack), (unsigned long)offset);
> + } else
> + strbuf_addstr(hint, "loose");
Nit: our coding guidelines say that once an if statement requires curly
braces in one branch, all branches should have them.
I also have to wonder whether it's going to be a benefit to also specify
a hint for loose objects, or whether we should rather not write any hint
at all for them.
> +}
> +
> /*
> * Write oid to the given struct child_process's stdin, starting it first if
> * necessary.
> */
> static int write_oid(const struct object_id *oid,
> - struct object_info *oi UNUSED,
> + struct object_info *oi,
> void *data)
> {
> struct write_oid_context *ctx = data;
> struct child_process *cmd = ctx->cmd;
> + struct strbuf hint = STRBUF_INIT;
>
> if (cmd->in == -1) {
> if (start_command(cmd))
> die(_("could not start pack-objects to repack promisor objects"));
> }
>
> + build_ordering_hint(oi, &hint);
> +
> if (write_in_full(cmd->in, oid_to_hex(oid), ctx->algop->hexsz) < 0 ||
> + write_in_full(cmd->in, " ", 1) < 0 ||
> + write_in_full(cmd->in, hint.buf, hint.len) < 0 ||
> write_in_full(cmd->in, "\n", 1) < 0)
> die(_("failed to feed promisor objects to pack-objects"));
This now translate into at least four write(3p) syscalls per object. Can
we maybe reuse a buffer so that we can reduce the number of syscalls?
Thanks!
Patrick
next prev parent reply other threads:[~2026-03-31 6:02 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-27 16:12 [PATCH] repack-promisor: add fake paths to oids when repacking promisor objects Abraham Samuel Adekunle
2026-03-31 6:02 ` Patrick Steinhardt [this message]
2026-03-31 9:53 ` Samuel Abraham
2026-03-31 15:51 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=actjaxIkDEXHJbyi@pks.im \
--to=ps@pks.im \
--cc=abrahamadekunle50@gmail.com \
--cc=ayu.chandekar@gmail.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jltobler@gmail.com \
--cc=karthik.188@gmail.com \
--cc=lucasseikioshiro@gmail.com \
--cc=phillip.wood123@gmail.com \
--cc=siddharthasthana31@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox