From: Junio C Hamano <gitster@pobox.com>
To: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
Derrick Stolee <stolee@gmail.com>,
Patrick Steinhardt <ps@pks.im>, Tian Yuchen <cat@malon.dev>,
Eric Sunshine <sunshine@sunshineco.com>,
Elijah Newren <newren@gmail.com>
Subject: Re: [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack
Date: Sat, 11 Apr 2026 23:27:45 -0700 [thread overview]
Message-ID: <xmqqo6jolmla.fsf@gitster.g> (raw)
In-Reply-To: <3558bb38956b522c91057598db645eb42ffb48b2.1775861047.git.lorenzo.pegorari2002@gmail.com> (LorenzoPegorari's message of "Sat, 11 Apr 2026 00:55:11 +0200")
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> +/*
> + * Go through all .promisor files contained in repo (excluding those whose name
> + * appears in not_repacked_basenames, which acts as a ignorelist), and copies
> + * their content inside the destination file "<packtmp>-<dest_hex>.promisor".
> + * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
> + * in the write_promisor_file() function).
> + * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
> + * is the time (in Unix time) at which the .promisor file was last modified.
> + * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
> + * be copied.
> + * The contents of all .promisor files are assumed to be correctly formed.
> + */
> +static void copy_promisor_content(struct repository *repo,
> + const char *dest_hex,
> + const char *packtmp,
> + struct strset *not_repacked_basenames)
> +{
> + char *dest_idx_name;
> + char *dest_promisor_name;
> + FILE *dest;
> + struct strset dest_content = STRSET_INIT;
> + struct strbuf dest_to_write = STRBUF_INIT;
> + struct strbuf source_promisor_name = STRBUF_INIT;
> + struct strbuf line = STRBUF_INIT;
> + struct object_id dest_oid;
> + struct packed_git *dest_pack, *p;
> + int err;
> +
> + dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
> + get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
This needs to prepare for a corrupt input in dest_hex, which would
result in garbage dest_oid. The helper function should signal a
failure with its return value, right?
> + dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
As you earlier mentioned, this use of parse_pack_index() is
perfectly fine. The call chains that reach here are both from
cmd_repack() that calls either repack_promisor_objects() or
pack_geometry_repack_promisors(), and both ran "pack-objects" to
create a new pack and called finish_repacking_promisor_objects(),
which in turn calls us, so the dest_hex/packtmp we are dealing with
point newly created packfile that is about to become but not yet
completed as a part of this repository. We know we created it, and
we know "pack-objects" did not fail, so parse_pack_index() being
loose in validation does not pose a practical problem.
This still needs to prepare for parse_pack_index() to return NULL,
though.
In the above two cases, we should make sure that dest_idx_name gets
freed before we return control to the caller (possibly signaling an
error by returning -1, but the current caller is not expecting to
hear a failure from us and that may be OK).
> + /* Open the .promisor dest file, and fill dest_content with its content */
> + dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> + dest = xfopen(dest_promisor_name, "r+");
> + while (strbuf_getline(&line, dest) != EOF)
> + strset_add(&dest_content, line.buf);
> +
> + repo_for_each_pack(repo, p) {
> + FILE *source;
> + struct stat source_stat;
> +
> + if (!p->pack_promisor)
> + continue;
> +
> + if (not_repacked_basenames &&
> + strset_contains(not_repacked_basenames, pack_basename(p)))
> + continue;
> +
> + strbuf_reset(&source_promisor_name);
> + strbuf_addstr(&source_promisor_name, p->pack_name);
> + strbuf_strip_suffix(&source_promisor_name, ".pack");
> + strbuf_addstr(&source_promisor_name, ".promisor");
> +
> + if (stat(source_promisor_name.buf, &source_stat))
> + die(_("File not found: %s"), source_promisor_name.buf);
> +
> + source = xfopen(source_promisor_name.buf, "r");
> +
> + while (strbuf_getline(&line, source) != EOF) {
> + struct string_list line_sections = STRING_LIST_INIT_DUP;
> + struct object_id oid;
> +
> + /* Split line into <oid>, <ref> and <time> (if <time> exists) */
> + string_list_split(&line_sections, line.buf, " ", 3);
The strbuf's contents line.buf[] is read/write, so we could use
line_sections that is initialized with NODUP and call
split_in_place() to avoid unnecessary small allocations and
deallocations, no?
More importantly, we say "split into up to 3 pieces". What happens
if this is totally malformed and there is only one word? Should we
still trust this line and try to carry it forward? I doubt it.
> + /* Ignore the lines where <oid> doesn't appear in the dest_pack */
> + get_oid_hex_algop(line_sections.items[0].string, &oid, repo->hash_algo);
Or the first word split is not a sane hexadecimal string that
get_oid_hex() fails?
It would be the simplest to ignore/skip the line, just like what you
do to a correctly formated line about an irrelevant <oid> (iow, the
if() statement immediately below).
> + if (!find_pack_entry_one(&oid, dest_pack)) {
Assuming that the object name was read correctly, if the pack we
just created does not have the <oid> we read from the existing
.promisor file, this line we just read has nothing to do with the
repacked result, so we ignore it, which sounds fine.
> + string_list_clear(&line_sections, 0);
> + continue;
> + }
> +
> + /* If <time> doesn't exist, retrieve it and add it to line */
> + if (line_sections.nr < 3)
> + strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
Should we also validate line_sections[1] in some way? I am not sure
if we want to call check_ref_format() on it.
If we insist that .nr is at least 2 immediately after we split the
string, and make sure the line begins with <oid> (i.e., parsable as
hex object name) that might be sufficient. I dunno.
> + /*
> + * Add the finalized line to dest_to_write and dest_content if it
> + * wasn't already present inside dest_content
> + */
> + if (strset_add(&dest_content, line.buf)) {
> + strbuf_addbuf(&dest_to_write, &line);
> + strbuf_addch(&dest_to_write, '\n');
> + }
> +
> + string_list_clear(&line_sections, 0);
> + }
> +
> + err = ferror(source);
> + err |= fclose(source);
> + if (err)
> + die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
> + }
> +
> + /* If dest_to_write is not empty, then there are new lines to append */
> + if (dest_to_write.len) {
> + if (fseek(dest, 0L, SEEK_END))
> + die_errno(_("fseek failed"));
> + fprintf(dest, "%s", dest_to_write.buf);
> + }
> +
> + err = ferror(dest);
> + err |= fclose(dest);
> + if (err)
> + die(_("Could not write '%s' promisor file"), dest_promisor_name);
> +
> + close_pack_index(dest_pack);
As we discussed,
free(dest_pack);
is missing.
> + free(dest_idx_name);
> + free(dest_promisor_name);
> + strset_clear(&dest_content);
> + strbuf_release(&dest_to_write);
> + strbuf_release(&source_promisor_name);
> + strbuf_release(&line);
> +}
> +
> static void finish_repacking_promisor_objects(struct repository *repo,
> struct child_process *cmd,
> struct string_list *names,
next prev parent reply other threads:[~2026-04-12 6:27 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 1/3] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-22 2:04 ` Eric Sunshine
2026-03-22 18:50 ` Lorenzo Pegorari
2026-03-21 21:29 ` [GSoC PATCH 3/3] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-23 21:07 ` Junio C Hamano
2026-03-25 21:33 ` Lorenzo Pegorari
2026-03-22 19:18 ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-23 20:27 ` Eric Sunshine
2026-03-26 16:15 ` Lorenzo Pegorari
2026-03-23 21:30 ` Junio C Hamano
2026-03-26 2:01 ` Lorenzo Pegorari
2026-03-22 19:18 ` [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-23 21:48 ` Junio C Hamano
2026-03-26 2:12 ` Lorenzo Pegorari
2026-03-22 19:18 ` [GSoC PATCH v2 4/4] t7700: test for promisor file content " LorenzoPegorari
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-04-06 17:22 ` Tian Yuchen
2026-04-06 18:40 ` Lorenzo Pegorari
2026-04-06 21:17 ` Junio C Hamano
2026-04-07 21:46 ` Lorenzo Pegorari
2026-04-07 2:01 ` Junio C Hamano
2026-04-07 21:52 ` Lorenzo Pegorari
2026-04-07 22:03 ` Junio C Hamano
2026-04-06 21:34 ` Junio C Hamano
2026-04-07 22:07 ` Lorenzo Pegorari
2026-04-06 0:25 ` [GSoC PATCH v3 3/5] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-06 0:25 ` [GSoC PATCH v3 4/5] t7700: test for promisor file content " LorenzoPegorari
2026-04-06 22:05 ` Junio C Hamano
2026-04-07 23:28 ` Lorenzo Pegorari
2026-04-07 18:10 ` Junio C Hamano
2026-04-07 23:11 ` Lorenzo Pegorari
2026-04-08 0:38 ` Lorenzo Pegorari
2026-04-06 0:25 ` [GSoC PATCH v3 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-04-10 16:01 ` Junio C Hamano
2026-04-10 16:34 ` Lorenzo Pegorari
2026-04-10 18:10 ` [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime Junio C Hamano
2026-04-16 23:46 ` Elijah Newren
2026-04-17 4:25 ` Junio C Hamano
2026-04-10 15:03 ` [GSoC PATCH v4 3/5] repack-promisor: preserve content of promisor files after repack LorenzoPegorari
2026-04-10 15:04 ` [GSoC PATCH v4 4/5] t7700: test for promisor file content " LorenzoPegorari
2026-04-10 15:04 ` [GSoC PATCH v4 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-10 15:47 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack Junio C Hamano
2026-04-10 16:44 ` Lorenzo Pegorari
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
2026-04-10 22:54 ` [GSoC PATCH v5 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 22:55 ` [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
2026-04-10 23:30 ` Junio C Hamano
2026-04-11 1:59 ` Lorenzo Pegorari
2026-04-12 6:27 ` Junio C Hamano [this message]
2026-04-17 0:30 ` Lorenzo Pegorari
2026-04-10 22:55 ` [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-11 18:25 ` Tian Yuchen
2026-04-17 0:34 ` Lorenzo Pegorari
2026-04-10 22:55 ` [GSoC PATCH v5 4/6] t7700: test for promisor file content " LorenzoPegorari
2026-04-10 22:56 ` [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-11 18:49 ` Tian Yuchen
2026-04-17 0:46 ` Lorenzo Pegorari
2026-04-10 22:56 ` [GSoC PATCH v5 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 4/6] t7700: test for promisor file content " LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-05-12 6:49 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack Junio C Hamano
2026-04-10 23:05 ` [GSoC PATCH v4 0/5] " Junio C Hamano
2026-04-11 2:02 ` Junio C Hamano
2026-04-11 14:05 ` Lorenzo Pegorari
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqo6jolmla.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=cat@malon.dev \
--cc=git@vger.kernel.org \
--cc=lorenzo.pegorari2002@gmail.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.