From: Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
Derrick Stolee <stolee@gmail.com>, Patrick Steinhardt <ps@pks.im>,
Tian Yuchen <cat@malon.dev>,
Eric Sunshine <sunshine@sunshineco.com>,
Elijah Newren <newren@gmail.com>
Subject: Re: [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack
Date: Fri, 17 Apr 2026 02:30:52 +0200 [thread overview]
Message-ID: <aeF_PL5qYS-7Ogvd@lorenzo-VM> (raw)
In-Reply-To: <xmqqo6jolmla.fsf@gitster.g>
On Sat, Apr 11, 2026 at 11:27:45PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > +/*
> > + * Go through all .promisor files contained in repo (excluding those whose name
> > + * appears in not_repacked_basenames, which acts as a ignorelist), and copies
> > + * their content inside the destination file "<packtmp>-<dest_hex>.promisor".
> > + * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
> > + * in the write_promisor_file() function).
> > + * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
> > + * is the time (in Unix time) at which the .promisor file was last modified.
> > + * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
> > + * be copied.
> > + * The contents of all .promisor files are assumed to be correctly formed.
> > + */
> > +static void copy_promisor_content(struct repository *repo,
> > + const char *dest_hex,
> > + const char *packtmp,
> > + struct strset *not_repacked_basenames)
> > +{
> > + char *dest_idx_name;
> > + char *dest_promisor_name;
> > + FILE *dest;
> > + struct strset dest_content = STRSET_INIT;
> > + struct strbuf dest_to_write = STRBUF_INIT;
> > + struct strbuf source_promisor_name = STRBUF_INIT;
> > + struct strbuf line = STRBUF_INIT;
> > + struct object_id dest_oid;
> > + struct packed_git *dest_pack, *p;
> > + int err;
> > +
> > + dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
> > + get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
>
> This needs to prepare for a corrupt input in dest_hex, which would
> result in garbage dest_oid. The helper function should signal a
> failure with its return value, right?
Ack. I think the best way is to signal a `warning()`, and then simply
exit the helper function leaving the ".promisor" file empty.
> > + dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
>
> As you earlier mentioned, this use of parse_pack_index() is
> perfectly fine. The call chains that reach here are both from
> cmd_repack() that calls either repack_promisor_objects() or
> pack_geometry_repack_promisors(), and both ran "pack-objects" to
> create a new pack and called finish_repacking_promisor_objects(),
> which in turn calls us, so the dest_hex/packtmp we are dealing with
> point newly created packfile that is about to become but not yet
> completed as a part of this repository. We know we created it, and
> we know "pack-objects" did not fail, so parse_pack_index() being
> loose in validation does not pose a practical problem.
Exactly. I couldn't quite explain it as good as you right now. :)
> This still needs to prepare for parse_pack_index() to return NULL,
> though.
Ack.
> In the above two cases, we should make sure that dest_idx_name gets
> freed before we return control to the caller (possibly signaling an
> error by returning -1, but the current caller is not expecting to
> hear a failure from us and that may be OK).
Again, I think this should be treated the same as when `dest_hex` is
garbage.
> > + /* Open the .promisor dest file, and fill dest_content with its content */
> > + dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> > + dest = xfopen(dest_promisor_name, "r+");
> > + while (strbuf_getline(&line, dest) != EOF)
> > + strset_add(&dest_content, line.buf);
> > +
> > + repo_for_each_pack(repo, p) {
> > + FILE *source;
> > + struct stat source_stat;
> > +
> > + if (!p->pack_promisor)
> > + continue;
> > +
> > + if (not_repacked_basenames &&
> > + strset_contains(not_repacked_basenames, pack_basename(p)))
> > + continue;
> > +
> > + strbuf_reset(&source_promisor_name);
> > + strbuf_addstr(&source_promisor_name, p->pack_name);
> > + strbuf_strip_suffix(&source_promisor_name, ".pack");
> > + strbuf_addstr(&source_promisor_name, ".promisor");
> > +
> > + if (stat(source_promisor_name.buf, &source_stat))
> > + die(_("File not found: %s"), source_promisor_name.buf);
> > +
> > + source = xfopen(source_promisor_name.buf, "r");
> > +
> > + while (strbuf_getline(&line, source) != EOF) {
> > + struct string_list line_sections = STRING_LIST_INIT_DUP;
> > + struct object_id oid;
> > +
> > + /* Split line into <oid>, <ref> and <time> (if <time> exists) */
> > + string_list_split(&line_sections, line.buf, " ", 3);
>
> The strbuf's contents line.buf[] is read/write, so we could use
> line_sections that is initialized with NODUP and call
> split_in_place() to avoid unnecessary small allocations and
> deallocations, no?
I don't think so, because we still need the complete `line` when we
append the <time> to it (if we do so), and when we print it to the
`dest` file. This means that we can't use `split_in_place()` and
initialize it with `NODUP`, because then we would have the complete
`line`.
> More importantly, we say "split into up to 3 pieces". What happens
> if this is totally malformed and there is only one word? Should we
> still trust this line and try to carry it forward? I doubt it.
I think we should discard the line if it can't be split up into 2 or 3
pieces.
> > + /* Ignore the lines where <oid> doesn't appear in the dest_pack */
> > + get_oid_hex_algop(line_sections.items[0].string, &oid, repo->hash_algo);
>
> Or the first word split is not a sane hexadecimal string that
> get_oid_hex() fails?
Same, we should just discard it.
> It would be the simplest to ignore/skip the line, just like what you
> do to a correctly formated line about an irrelevant <oid> (iow, the
> if() statement immediately below).
Agreed.
> > + if (!find_pack_entry_one(&oid, dest_pack)) {
>
> Assuming that the object name was read correctly, if the pack we
> just created does not have the <oid> we read from the existing
> .promisor file, this line we just read has nothing to do with the
> repacked result, so we ignore it, which sounds fine.
>
> > + string_list_clear(&line_sections, 0);
> > + continue;
> > + }
> > +
> > + /* If <time> doesn't exist, retrieve it and add it to line */
> > + if (line_sections.nr < 3)
> > + strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
>
> Should we also validate line_sections[1] in some way? I am not sure
> if we want to call check_ref_format() on it.
>
> If we insist that .nr is at least 2 immediately after we split the
> string, and make sure the line begins with <oid> (i.e., parsable as
> hex object name) that might be sufficient. I dunno.
I think we should check <ref>. Found some success using:
`check_refname_format(<ref>, REFNAME_ALLOW_ONELEVEL)`
> > + /*
> > + * Add the finalized line to dest_to_write and dest_content if it
> > + * wasn't already present inside dest_content
> > + */
> > + if (strset_add(&dest_content, line.buf)) {
> > + strbuf_addbuf(&dest_to_write, &line);
> > + strbuf_addch(&dest_to_write, '\n');
> > + }
> > +
> > + string_list_clear(&line_sections, 0);
> > + }
> > +
> > + err = ferror(source);
> > + err |= fclose(source);
> > + if (err)
> > + die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
> > + }
> > +
> > + /* If dest_to_write is not empty, then there are new lines to append */
> > + if (dest_to_write.len) {
> > + if (fseek(dest, 0L, SEEK_END))
> > + die_errno(_("fseek failed"));
> > + fprintf(dest, "%s", dest_to_write.buf);
> > + }
> > +
> > + err = ferror(dest);
> > + err |= fclose(dest);
> > + if (err)
> > + die(_("Could not write '%s' promisor file"), dest_promisor_name);
> > +
> > + close_pack_index(dest_pack);
>
> As we discussed,
>
> free(dest_pack);
>
> is missing.
Ack.
> > + free(dest_idx_name);
> > + free(dest_promisor_name);
> > + strset_clear(&dest_content);
> > + strbuf_release(&dest_to_write);
> > + strbuf_release(&source_promisor_name);
> > + strbuf_release(&line);
> > +}
> > +
> > static void finish_repacking_promisor_objects(struct repository *repo,
> > struct child_process *cmd,
> > struct string_list *names,
next prev parent reply other threads:[~2026-04-17 0:30 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 1/3] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-22 2:04 ` Eric Sunshine
2026-03-22 18:50 ` Lorenzo Pegorari
2026-03-21 21:29 ` [GSoC PATCH 3/3] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-23 21:07 ` Junio C Hamano
2026-03-25 21:33 ` Lorenzo Pegorari
2026-03-22 19:18 ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-23 20:27 ` Eric Sunshine
2026-03-26 16:15 ` Lorenzo Pegorari
2026-03-23 21:30 ` Junio C Hamano
2026-03-26 2:01 ` Lorenzo Pegorari
2026-03-22 19:18 ` [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-23 21:48 ` Junio C Hamano
2026-03-26 2:12 ` Lorenzo Pegorari
2026-03-22 19:18 ` [GSoC PATCH v2 4/4] t7700: test for promisor file content " LorenzoPegorari
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-04-06 17:22 ` Tian Yuchen
2026-04-06 18:40 ` Lorenzo Pegorari
2026-04-06 21:17 ` Junio C Hamano
2026-04-07 21:46 ` Lorenzo Pegorari
2026-04-07 2:01 ` Junio C Hamano
2026-04-07 21:52 ` Lorenzo Pegorari
2026-04-07 22:03 ` Junio C Hamano
2026-04-06 21:34 ` Junio C Hamano
2026-04-07 22:07 ` Lorenzo Pegorari
2026-04-06 0:25 ` [GSoC PATCH v3 3/5] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-06 0:25 ` [GSoC PATCH v3 4/5] t7700: test for promisor file content " LorenzoPegorari
2026-04-06 22:05 ` Junio C Hamano
2026-04-07 23:28 ` Lorenzo Pegorari
2026-04-07 18:10 ` Junio C Hamano
2026-04-07 23:11 ` Lorenzo Pegorari
2026-04-08 0:38 ` Lorenzo Pegorari
2026-04-06 0:25 ` [GSoC PATCH v3 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-04-10 16:01 ` Junio C Hamano
2026-04-10 16:34 ` Lorenzo Pegorari
2026-04-10 18:10 ` [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime Junio C Hamano
2026-04-16 23:46 ` Elijah Newren
2026-04-17 4:25 ` Junio C Hamano
2026-04-10 15:03 ` [GSoC PATCH v4 3/5] repack-promisor: preserve content of promisor files after repack LorenzoPegorari
2026-04-10 15:04 ` [GSoC PATCH v4 4/5] t7700: test for promisor file content " LorenzoPegorari
2026-04-10 15:04 ` [GSoC PATCH v4 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-10 15:47 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack Junio C Hamano
2026-04-10 16:44 ` Lorenzo Pegorari
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
2026-04-10 22:54 ` [GSoC PATCH v5 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 22:55 ` [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
2026-04-10 23:30 ` Junio C Hamano
2026-04-11 1:59 ` Lorenzo Pegorari
2026-04-12 6:27 ` Junio C Hamano
2026-04-17 0:30 ` Lorenzo Pegorari [this message]
2026-04-10 22:55 ` [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-11 18:25 ` Tian Yuchen
2026-04-17 0:34 ` Lorenzo Pegorari
2026-04-10 22:55 ` [GSoC PATCH v5 4/6] t7700: test for promisor file content " LorenzoPegorari
2026-04-10 22:56 ` [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-11 18:49 ` Tian Yuchen
2026-04-17 0:46 ` Lorenzo Pegorari
2026-04-10 22:56 ` [GSoC PATCH v5 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 4/6] t7700: test for promisor file content " LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-05-12 6:49 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack Junio C Hamano
2026-04-10 23:05 ` [GSoC PATCH v4 0/5] " Junio C Hamano
2026-04-11 2:02 ` Junio C Hamano
2026-04-11 14:05 ` Lorenzo Pegorari
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aeF_PL5qYS-7Ogvd@lorenzo-VM \
--to=lorenzo.pegorari2002@gmail.com \
--cc=cat@malon.dev \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.