Git development
 help / color / mirror / Atom feed
From: Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
	Derrick Stolee <stolee@gmail.com>, Patrick Steinhardt <ps@pks.im>,
	Tian Yuchen <cat@malon.dev>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack
Date: Fri, 17 Apr 2026 02:30:52 +0200	[thread overview]
Message-ID: <aeF_PL5qYS-7Ogvd@lorenzo-VM> (raw)
In-Reply-To: <xmqqo6jolmla.fsf@gitster.g>

On Sat, Apr 11, 2026 at 11:27:45PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> 
> > +/*
> > + * Go through all .promisor files contained in repo (excluding those whose name
> > + * appears in not_repacked_basenames, which acts as a ignorelist), and copies
> > + * their content inside the destination file "<packtmp>-<dest_hex>.promisor".
> > + * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
> > + * in the write_promisor_file() function).
> > + * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
> > + * is the time (in Unix time) at which the .promisor file was last modified.
> > + * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
> > + * be copied.
> > + * The contents of all .promisor files are assumed to be correctly formed.
> > + */
> > +static void copy_promisor_content(struct repository *repo,
> > +					      const char *dest_hex,
> > +					      const char *packtmp,
> > +					      struct strset *not_repacked_basenames)
> > +{
> > +	char *dest_idx_name;
> > +	char *dest_promisor_name;
> > +	FILE *dest;
> > +	struct strset dest_content = STRSET_INIT;
> > +	struct strbuf dest_to_write = STRBUF_INIT;
> > +	struct strbuf source_promisor_name = STRBUF_INIT;
> > +	struct strbuf line = STRBUF_INIT;
> > +	struct object_id dest_oid;
> > +	struct packed_git *dest_pack, *p;
> > +	int err;
> > +
> > +	dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
> > +	get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
> 
> This needs to prepare for a corrupt input in dest_hex, which would
> result in garbage dest_oid.  The helper function should signal a
> failure with its return value, right?

Ack. I think the best way is to signal a `warning()`, and then simply
exit the helper function leaving the ".promisor" file empty.

> > +	dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
> 
> As you earlier mentioned, this use of parse_pack_index() is
> perfectly fine.  The call chains that reach here are both from
> cmd_repack() that calls either repack_promisor_objects() or
> pack_geometry_repack_promisors(), and both ran "pack-objects" to
> create a new pack and called finish_repacking_promisor_objects(),
> which in turn calls us, so the dest_hex/packtmp we are dealing with
> point newly created packfile that is about to become but not yet
> completed as a part of this repository.  We know we created it, and
> we know "pack-objects" did not fail, so parse_pack_index() being
> loose in validation does not pose a practical problem.

Exactly. I couldn't quite explain it as good as you right now. :)

> This still needs to prepare for parse_pack_index() to return NULL,
> though.

Ack.

> In the above two cases, we should make sure that dest_idx_name gets
> freed before we return control to the caller (possibly signaling an
> error by returning -1, but the current caller is not expecting to
> hear a failure from us and that may be OK).

Again, I think this should be treated the same as when `dest_hex` is
garbage.

> > +	/* Open the .promisor dest file, and fill dest_content with its content */
> > +	dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> > +	dest = xfopen(dest_promisor_name, "r+");
> > +	while (strbuf_getline(&line, dest) != EOF)
> > +		strset_add(&dest_content, line.buf);
> > +
> > +	repo_for_each_pack(repo, p) {
> > +		FILE *source;
> > +		struct stat source_stat;
> > +
> > +		if (!p->pack_promisor)
> > +			continue;
> > +
> > +		if (not_repacked_basenames &&
> > +			strset_contains(not_repacked_basenames, pack_basename(p)))
> > +			continue;
> > +
> > +		strbuf_reset(&source_promisor_name);
> > +		strbuf_addstr(&source_promisor_name, p->pack_name);
> > +		strbuf_strip_suffix(&source_promisor_name, ".pack");
> > +		strbuf_addstr(&source_promisor_name, ".promisor");
> > +
> > +		if (stat(source_promisor_name.buf, &source_stat))
> > +			die(_("File not found: %s"), source_promisor_name.buf);
> > +
> > +		source = xfopen(source_promisor_name.buf, "r");
> > +
> > +		while (strbuf_getline(&line, source) != EOF) {
> > +			struct string_list line_sections = STRING_LIST_INIT_DUP;
> > +			struct object_id oid;
> > +
> > +			/* Split line into <oid>, <ref> and <time> (if <time> exists) */
> > +			string_list_split(&line_sections, line.buf, " ", 3);
> 
> The strbuf's contents line.buf[] is read/write, so we could use
> line_sections that is initialized with NODUP and call
> split_in_place() to avoid unnecessary small allocations and
> deallocations, no?

I don't think so, because we still need the complete `line` when we
append the <time> to it (if we do so), and when we print it to the
`dest` file. This means that we can't use `split_in_place()` and
initialize it with `NODUP`, because then we would have the complete
`line`.

> More importantly, we say "split into up to 3 pieces".  What happens
> if this is totally malformed and there is only one word?  Should we
> still trust this line and try to carry it forward?  I doubt it.

I think we should discard the line if it can't be split up into 2 or 3
pieces.

> > +			/* Ignore the lines where <oid> doesn't appear in the dest_pack */
> > +			get_oid_hex_algop(line_sections.items[0].string, &oid, repo->hash_algo);
> 
> Or the first word split is not a sane hexadecimal string that
> get_oid_hex() fails?

Same, we should just discard it.

> It would be the simplest to ignore/skip the line, just like what you
> do to a correctly formated line about an irrelevant <oid> (iow, the
> if() statement immediately below).

Agreed.

> > +			if (!find_pack_entry_one(&oid, dest_pack)) {
> 
> Assuming that the object name was read correctly, if the pack we
> just created does not have the <oid> we read from the existing
> .promisor file, this line we just read has nothing to do with the
> repacked result, so we ignore it, which sounds fine.
> 
> > +				string_list_clear(&line_sections, 0);
> > +				continue;
> > +			}
> > +
> > +			/* If <time> doesn't exist, retrieve it and add it to line */
> > +			if (line_sections.nr < 3)
> > +				strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
> 
> Should we also validate line_sections[1] in some way?  I am not sure
> if we want to call check_ref_format() on it.
>
> If we insist that .nr is at least 2 immediately after we split the
> string, and make sure the line begins with <oid> (i.e., parsable as
> hex object name) that might be sufficient.  I dunno.

I think we should check <ref>. Found some success using:
        `check_refname_format(<ref>, REFNAME_ALLOW_ONELEVEL)`

> > +			/*
> > +			 * Add the finalized line to dest_to_write and dest_content if it
> > +			 * wasn't already present inside dest_content
> > +			 */
> > +			if (strset_add(&dest_content, line.buf)) {
> > +				strbuf_addbuf(&dest_to_write, &line);
> > +				strbuf_addch(&dest_to_write, '\n');
> > +			}
> > +
> > +			string_list_clear(&line_sections, 0);
> > +		}
> > +
> > +		err = ferror(source);
> > +		err |= fclose(source);
> > +		if (err)
> > +			die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
> > +	}
> > +
> > +	/* If dest_to_write is not empty, then there are new lines to append */
> > +	if (dest_to_write.len) {
> > +		if (fseek(dest, 0L, SEEK_END))
> > +			die_errno(_("fseek failed"));
> > +		fprintf(dest, "%s", dest_to_write.buf);
> > +	}
> > +
> > +	err = ferror(dest);
> > +	err |= fclose(dest);
> > +	if (err)
> > +		die(_("Could not write '%s' promisor file"), dest_promisor_name);
> > +
> > +	close_pack_index(dest_pack);
> 
> As we discussed, 
> 
> 	free(dest_pack);
> 
> is missing.

Ack.

> > +	free(dest_idx_name);
> > +	free(dest_promisor_name);
> > +	strset_clear(&dest_content);
> > +	strbuf_release(&dest_to_write);
> > +	strbuf_release(&source_promisor_name);
> > +	strbuf_release(&line);
> > +}
> > +
> >  static void finish_repacking_promisor_objects(struct repository *repo,
> >  					      struct child_process *cmd,
> >  					      struct string_list *names,

  reply	other threads:[~2026-04-17  0:30 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 1/3] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-22  2:04   ` Eric Sunshine
2026-03-22 18:50     ` Lorenzo Pegorari
2026-03-21 21:29 ` [GSoC PATCH 3/3] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
2026-03-22 19:16   ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-23 21:07     ` Junio C Hamano
2026-03-25 21:33       ` Lorenzo Pegorari
2026-03-22 19:18   ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-23 20:27     ` Eric Sunshine
2026-03-26 16:15       ` Lorenzo Pegorari
2026-03-23 21:30     ` Junio C Hamano
2026-03-26  2:01       ` Lorenzo Pegorari
2026-03-22 19:18   ` [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-23 21:48     ` Junio C Hamano
2026-03-26  2:12       ` Lorenzo Pegorari
2026-03-22 19:18   ` [GSoC PATCH v2 4/4] t7700: test for promisor file content " LorenzoPegorari
2026-04-06  0:23   ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
2026-04-06  0:24     ` [GSoC PATCH v3 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-06  0:24     ` [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-04-06 17:22       ` Tian Yuchen
2026-04-06 18:40         ` Lorenzo Pegorari
2026-04-06 21:17           ` Junio C Hamano
2026-04-07 21:46             ` Lorenzo Pegorari
2026-04-07  2:01           ` Junio C Hamano
2026-04-07 21:52             ` Lorenzo Pegorari
2026-04-07 22:03               ` Junio C Hamano
2026-04-06 21:34       ` Junio C Hamano
2026-04-07 22:07         ` Lorenzo Pegorari
2026-04-06  0:25     ` [GSoC PATCH v3 3/5] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-06  0:25     ` [GSoC PATCH v3 4/5] t7700: test for promisor file content " LorenzoPegorari
2026-04-06 22:05       ` Junio C Hamano
2026-04-07 23:28         ` Lorenzo Pegorari
2026-04-07 18:10       ` Junio C Hamano
2026-04-07 23:11         ` Lorenzo Pegorari
2026-04-08  0:38           ` Lorenzo Pegorari
2026-04-06  0:25     ` [GSoC PATCH v3 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-10 15:01     ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
2026-04-10 15:02       ` [GSoC PATCH v4 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 15:02       ` [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-04-10 16:01         ` Junio C Hamano
2026-04-10 16:34           ` Lorenzo Pegorari
2026-04-10 18:10           ` [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime Junio C Hamano
2026-04-16 23:46             ` Elijah Newren
2026-04-17  4:25               ` Junio C Hamano
2026-04-10 15:03       ` [GSoC PATCH v4 3/5] repack-promisor: preserve content of promisor files after repack LorenzoPegorari
2026-04-10 15:04       ` [GSoC PATCH v4 4/5] t7700: test for promisor file content " LorenzoPegorari
2026-04-10 15:04       ` [GSoC PATCH v4 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-10 15:47       ` [GSoC PATCH v4 0/5] preserve promisor files content after repack Junio C Hamano
2026-04-10 16:44         ` Lorenzo Pegorari
2026-04-10 22:54       ` [GSoC PATCH v5 0/6] " LorenzoPegorari
2026-04-10 22:54         ` [GSoC PATCH v5 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 22:55         ` [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
2026-04-10 23:30           ` Junio C Hamano
2026-04-11  1:59             ` Lorenzo Pegorari
2026-04-12  6:27           ` Junio C Hamano
2026-04-17  0:30             ` Lorenzo Pegorari [this message]
2026-04-10 22:55         ` [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-11 18:25           ` Tian Yuchen
2026-04-17  0:34             ` Lorenzo Pegorari
2026-04-10 22:55         ` [GSoC PATCH v5 4/6] t7700: test for promisor file content " LorenzoPegorari
2026-04-10 22:56         ` [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-11 18:49           ` Tian Yuchen
2026-04-17  0:46             ` Lorenzo Pegorari
2026-04-10 22:56         ` [GSoC PATCH v5 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-04-18 14:16         ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
2026-04-18 14:16           ` [GSoC PATCH v6 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-18 14:17           ` [GSoC PATCH v6 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
2026-04-18 14:17           ` [GSoC PATCH v6 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-18 14:17           ` [GSoC PATCH v6 4/6] t7700: test for promisor file content " LorenzoPegorari
2026-04-18 14:17           ` [GSoC PATCH v6 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-18 14:17           ` [GSoC PATCH v6 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-05-12  6:49           ` [GSoC PATCH v6 0/6] preserve promisor files content after repack Junio C Hamano
2026-04-10 23:05       ` [GSoC PATCH v4 0/5] " Junio C Hamano
2026-04-11  2:02         ` Junio C Hamano
2026-04-11 14:05           ` Lorenzo Pegorari

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aeF_PL5qYS-7Ogvd@lorenzo-VM \
    --to=lorenzo.pegorari2002@gmail.com \
    --cc=cat@malon.dev \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox