* [GSoC PATCH 0/3] preserve promisor files content after repack
@ 2026-03-21 21:28 LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 1/3] pack-write: add explanation to promisor file content LorenzoPegorari
` (3 more replies)
0 siblings, 4 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-03-21 21:28 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Taylor Blau, Karthik Nayak, Junio C Hamano
The goal of this patch is to solve the NEEDSWORK comment added by
5374a290 (fetch-pack: write fetched refs to .promisor, 14/10/2019). This
is done by adding a helper function that takes the content of all
.promisor files in the `repository`, and copies it inside the first
.promisor file created by the repack.
Also, I added a comment explaining what is the purpose of the content of
the .promisor files, since this wasn't explained anywhere (I found
information regarding this only in the message of the previously cited
commit).
I am not satisfied at all with this patch, and I would love to have some
feedback for the v2. The issues/questions that I had while writing these
patches are the following:
* Is there a way to not have to check line by line if the content of
.promisor files are already inside the destination .promisor file?
* Does it make sense to copy everything inside the first .promisor file
created by the repack? Is it worth the effort make sure to copy each
ref (and associatde hash) inside the .promisor file of the packfile
that contains that ref?
LorenzoPegorari (3):
pack-write: add explanation to promisor file content
pack-write: add helper to fill promisor file after repack
repack-promisor: preserve content of promisor files after repack
Documentation/git-repack.adoc | 4 +-
pack-write.c | 71 +++++++++++++++++++++++++++++++++++
pack.h | 1 +
repack-promisor.c | 23 ++++++++----
4 files changed, 89 insertions(+), 10 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH 1/3] pack-write: add explanation to promisor file content
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
@ 2026-03-21 21:28 ` LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack LorenzoPegorari
` (2 subsequent siblings)
3 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-03-21 21:28 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Taylor Blau, Karthik Nayak, Junio C Hamano
In the entire codebase there is no explanation as to why the ".promisor"
files may contain the ref names (and their associated hashes) that were
fetched at the time the corresponding packfile was downloaded.
Add comment explaining that these pieces of information are used only for
debugging reasons, and how they can be used while debugging.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
pack-write.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/pack-write.c b/pack-write.c
index 83eaf88541..6a2023327e 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -603,6 +603,15 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
int i, err;
FILE *output = xfopen(promisor_name, "w");
+ /*
+ * Write in the .promisor file the ref names and associated hashes,
+ * obtained by fetch-pack, at the point of generation of the
+ * corresponding packfile. These pieces of info are only used to make
+ * it easier to debug issues with partial clones, as we can identify
+ * what refs (and their associated hashes) were fetched at the time
+ * the packfile was downloaded, and if necessary, compare those hashes
+ * against what the promisor remote reports now.
+ */
for (i = 0; i < nr_sought; i++)
fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
sought[i]->name);
--
2.43.0
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 1/3] pack-write: add explanation to promisor file content LorenzoPegorari
@ 2026-03-21 21:28 ` LorenzoPegorari
2026-03-22 2:04 ` Eric Sunshine
2026-03-21 21:29 ` [GSoC PATCH 3/3] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
3 siblings, 1 reply; 78+ messages in thread
From: LorenzoPegorari @ 2026-03-21 21:28 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Taylor Blau, Karthik Nayak, Junio C Hamano
Create a `copy_all_promisor_files()` helper function used to copy the
contents of all ".promisor" files in a `repository` inside another
".promisor" file.
This function can be used to preserve the contents of all ".promisor"
files inside a new ".promisor" file, for example when a repack happens.
This function is written in such a way so that it will read all the
".promisor" files inside the given `repository` line by line, and copy
only the lines that are not already present in the destination file. This
is done to avoid copying the same lines multiple times that may come from
multiple (redundant) packfiles. A better way to achieve this might be (is
definitely) possible.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
pack-write.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++
pack.h | 1 +
2 files changed, 63 insertions(+)
diff --git a/pack-write.c b/pack-write.c
index 6a2023327e..3620e6bd02 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -621,3 +621,65 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
if (err)
die(_("could not write '%s' promisor file"), promisor_name);
}
+
+void copy_all_promisor_files(struct repository *repo, const char *promisor_name)
+{
+ struct strbuf promisor_source_name = STRBUF_INIT;
+ struct strbuf read_source = STRBUF_INIT, read_dest = STRBUF_INIT;
+ struct strbuf write_dest = STRBUF_INIT;
+ int err;
+
+ FILE *dest = xfopen(promisor_name, "r+");
+
+ struct packed_git *p;
+ repo_for_each_pack(repo, p) {
+ if (!p->pack_promisor)
+ continue;
+
+ strbuf_reset(&promisor_source_name);
+ strbuf_addstr(&promisor_source_name, p->pack_name);
+ strbuf_strip_suffix(&promisor_source_name, ".pack");
+ strbuf_addstr(&promisor_source_name, ".promisor");
+ FILE *source = xfopen(promisor_source_name.buf, "r");
+
+ /*
+ * For each line of the promisor source file, check if it already
+ * is in the promisor dest file. If not, add it to write_dest, so
+ * that it will be written in the dest file.
+ */
+ while (strbuf_getline(&read_source, source) != EOF) {
+ if (fseek(dest, 0L, SEEK_SET))
+ die_errno(_("fseek failed"));
+ int is_source_in_dest = 0;
+ while (strbuf_getline(&read_dest, dest) != EOF) {
+ if (!strbuf_cmp(&read_source, &read_dest)) {
+ is_source_in_dest = 1;
+ break;
+ }
+ }
+ if (!is_source_in_dest) {
+ strbuf_addbuf(&write_dest, &read_source);
+ strbuf_addstr(&write_dest, "\n");
+ }
+ }
+
+ if (write_dest.len) {
+ strbuf_strip_suffix(&write_dest, "\n");
+ if (fseek(dest, 0L, SEEK_END))
+ die_errno(_("fseek failed"));
+ fprintf(dest, "%s\n", write_dest.buf);
+ fflush(dest);
+ strbuf_reset(&write_dest);
+ }
+
+ err = ferror(source);
+ err |= fclose(source);
+ if (err)
+ die(_("could not read '%s' promisor file"), promisor_source_name.buf);
+ }
+
+ err = ferror(dest);
+ err |= fclose(dest);
+ if (err)
+ die(_("could not write '%s' promisor file"), promisor_name);
+}
diff --git a/pack.h b/pack.h
index ec76472e49..509e90edba 100644
--- a/pack.h
+++ b/pack.h
@@ -105,6 +105,7 @@ char *index_pack_lockfile(struct repository *r, int fd, int *is_well_formed);
struct ref;
void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_sought);
+void copy_all_promisor_files(struct repository *repo, const char *promisor_name);
char *write_rev_file(struct repository *repo,
const char *rev_name,
--
2.43.0
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH 3/3] repack-promisor: preserve content of promisor files after repack
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 1/3] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack LorenzoPegorari
@ 2026-03-21 21:29 ` LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
3 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-03-21 21:29 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Taylor Blau, Karthik Nayak, Junio C Hamano
When a repack involving promisor packfiles happens, the new ".promisor"
file is created empty, losing all the debug info that might be present
inside the ".promisor" files before the repack.
Use the "copy_all_promisor_files()" function created previously to
preserve the contents of all ".promisor" files inside the first
".promisor" file created by the repack.
Also, update the documentation accordingly.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
Documentation/git-repack.adoc | 4 ++--
repack-promisor.c | 23 +++++++++++++++--------
2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 673ce91083..33d3c8afbd 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -45,8 +45,8 @@ other objects in that pack they already have locally.
+
Promisor packfiles are repacked separately: if there are packfiles that
have an associated ".promisor" file, these packfiles will be repacked
-into another separate pack, and an empty ".promisor" file corresponding
-to the new separate pack will be written.
+into another separate pack, and a ".promisor" file corresponding to the
+new separate pack will be written (with arbitrary contents).
-A::
Same as `-a`, unless `-d` is used. Then any unreachable
diff --git a/repack-promisor.c b/repack-promisor.c
index 90318ce150..6670728669 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -40,6 +40,7 @@ static void finish_repacking_promisor_objects(struct repository *repo,
const char *packtmp)
{
struct strbuf line = STRBUF_INIT;
+ int is_first_promisor = 1;
FILE *out;
close(cmd->in);
@@ -55,19 +56,25 @@ static void finish_repacking_promisor_objects(struct repository *repo,
/*
* pack-objects creates the .pack and .idx files, but not the
- * .promisor file. Create the .promisor file, which is empty.
- *
- * NEEDSWORK: fetch-pack sometimes generates non-empty
- * .promisor files containing the ref names and associated
- * hashes at the point of generation of the corresponding
- * packfile, but this would not preserve their contents. Maybe
- * concatenate the contents of all .promisor files instead of
- * just creating a new empty file.
+ * .promisor file. Create the .promisor file.
*/
promisor_name = mkpathdup("%s-%s.promisor", packtmp,
line.buf);
write_promisor_file(promisor_name, NULL, 0);
+ /*
+ * Fetch-pack sometimes generates non-empty .promisor files
+ * containing the ref names and associated hashes at the point of
+ * generation of the corresponding packfile. These pieces of info
+ * are only used for debugging reasons. In order to preserve
+ * these, let's copy the contents of all .promisor files in the
+ * first promisor file created.
+ */
+ if (is_first_promisor) {
+ copy_all_promisor_files(repo, promisor_name);
+ is_first_promisor = 0;
+ }
+
item->util = generated_pack_populate(item->string, packtmp);
free(promisor_name);
--
2.43.0
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack
2026-03-21 21:28 ` [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack LorenzoPegorari
@ 2026-03-22 2:04 ` Eric Sunshine
2026-03-22 18:50 ` Lorenzo Pegorari
0 siblings, 1 reply; 78+ messages in thread
From: Eric Sunshine @ 2026-03-22 2:04 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Junio C Hamano
On Sat, Mar 21, 2026 at 5:29 PM LorenzoPegorari
<lorenzo.pegorari2002@gmail.com> wrote:
> Create a `copy_all_promisor_files()` helper function used to copy the
> contents of all ".promisor" files in a `repository` inside another
> ".promisor" file.
>
> This function can be used to preserve the contents of all ".promisor"
> files inside a new ".promisor" file, for example when a repack happens.
>
> This function is written in such a way so that it will read all the
> ".promisor" files inside the given `repository` line by line, and copy
> only the lines that are not already present in the destination file. This
> is done to avoid copying the same lines multiple times that may come from
> multiple (redundant) packfiles. A better way to achieve this might be (is
> definitely) possible.
>
> Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> ---
> diff --git a/pack-write.c b/pack-write.c
> @@ -621,3 +621,65 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
> +void copy_all_promisor_files(struct repository *repo, const char *promisor_name)
> +{
> + struct strbuf promisor_source_name = STRBUF_INIT;
> + struct strbuf read_source = STRBUF_INIT, read_dest = STRBUF_INIT;
> + struct strbuf write_dest = STRBUF_INIT;
These strbufs don't seem to be released, thus are leaked.
> + int err;
> +
> + FILE *dest = xfopen(promisor_name, "r+");
> +
> + struct packed_git *p;
Style nit: Place all the variable declarations together (without blank
lines), followed by a blank line.
> + repo_for_each_pack(repo, p) {
> + if (!p->pack_promisor)
> + continue;
> +
> + strbuf_reset(&promisor_source_name);
> + strbuf_addstr(&promisor_source_name, p->pack_name);
> + strbuf_strip_suffix(&promisor_source_name, ".pack");
> + strbuf_addstr(&promisor_source_name, ".promisor");
> + FILE *source = xfopen(promisor_source_name.buf, "r");
This project still frowns upon variable declaration after code. You
will want to declare `FILE *source;` at the top of this loop body and
then assign `source = xfopen(...)` here.
> + /*
> + * For each line of the promisor source file, check if it already
> + * is in the promisor dest file. If not, add it to write_dest, so
> + * that it will be written in the dest file.
> + */
> + while (strbuf_getline(&read_source, source) != EOF) {
> + if (fseek(dest, 0L, SEEK_SET))
> + die_errno(_("fseek failed"));
> + int is_source_in_dest = 0;
Ditto regarding variable declaration following code.
> + while (strbuf_getline(&read_dest, dest) != EOF) {
> + if (!strbuf_cmp(&read_source, &read_dest)) {
> + is_source_in_dest = 1;
> + break;
> + }
> + }
> + if (!is_source_in_dest) {
> + strbuf_addbuf(&write_dest, &read_source);
> + strbuf_addstr(&write_dest, "\n");
> + }
The commit message talks about this, and it is indeed very ugly that
this re-reads `dest` from the beginning for *every* `source` line. Is
there a reason you can't simply read `dest` into a `strset` (see Git's
`strmap.h`) in its entirety before entering the repo_for_each_pack()
loop and then merely check the strset for existence using
strset_add()?
> + }
> +
> + if (write_dest.len) {
> + strbuf_strip_suffix(&write_dest, "\n");
> + if (fseek(dest, 0L, SEEK_END))
> + die_errno(_("fseek failed"));
> + fprintf(dest, "%s\n", write_dest.buf);
> + fflush(dest);
> + strbuf_reset(&write_dest);
> + }
> +
> + err = ferror(source);
> + err |= fclose(source);
> + if (err)
> + die(_("could not read '%s' promisor file"), promisor_source_name.buf);
> + }
> +
> + err = ferror(dest);
> + err |= fclose(dest);
> + if (err)
> + die(_("could not write '%s' promisor file"), promisor_name);
> +}
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack
2026-03-22 2:04 ` Eric Sunshine
@ 2026-03-22 18:50 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-03-22 18:50 UTC (permalink / raw)
To: Eric Sunshine
Cc: git, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Junio C Hamano
On Sat, Mar 21, 2026 at 10:04:01PM -0400, Eric Sunshine wrote:
> On Sat, Mar 21, 2026 at 5:29 PM LorenzoPegorari
> <lorenzo.pegorari2002@gmail.com> wrote:
> > Create a `copy_all_promisor_files()` helper function used to copy the
> > contents of all ".promisor" files in a `repository` inside another
> > ".promisor" file.
> >
> > This function can be used to preserve the contents of all ".promisor"
> > files inside a new ".promisor" file, for example when a repack happens.
> >
> > This function is written in such a way so that it will read all the
> > ".promisor" files inside the given `repository` line by line, and copy
> > only the lines that are not already present in the destination file. This
> > is done to avoid copying the same lines multiple times that may come from
> > multiple (redundant) packfiles. A better way to achieve this might be (is
> > definitely) possible.
> >
> > Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> > ---
> > diff --git a/pack-write.c b/pack-write.c
> > @@ -621,3 +621,65 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
> > +void copy_all_promisor_files(struct repository *repo, const char *promisor_name)
> > +{
> > + struct strbuf promisor_source_name = STRBUF_INIT;
> > + struct strbuf read_source = STRBUF_INIT, read_dest = STRBUF_INIT;
> > + struct strbuf write_dest = STRBUF_INIT;
>
> These strbufs don't seem to be released, thus are leaked.
Of course... trivial mistake. Will fix it in v2.
> > + int err;
> > +
> > + FILE *dest = xfopen(promisor_name, "r+");
> > +
> > + struct packed_git *p;
>
> Style nit: Place all the variable declarations together (without blank
> lines), followed by a blank line.
Ack.
> > + repo_for_each_pack(repo, p) {
> > + if (!p->pack_promisor)
> > + continue;
> > +
> > + strbuf_reset(&promisor_source_name);
> > + strbuf_addstr(&promisor_source_name, p->pack_name);
> > + strbuf_strip_suffix(&promisor_source_name, ".pack");
> > + strbuf_addstr(&promisor_source_name, ".promisor");
> > + FILE *source = xfopen(promisor_source_name.buf, "r");
>
> This project still frowns upon variable declaration after code. You
> will want to declare `FILE *source;` at the top of this loop body and
> then assign `source = xfopen(...)` here.
Ack.
> > + /*
> > + * For each line of the promisor source file, check if it already
> > + * is in the promisor dest file. If not, add it to write_dest, so
> > + * that it will be written in the dest file.
> > + */
> > + while (strbuf_getline(&read_source, source) != EOF) {
> > + if (fseek(dest, 0L, SEEK_SET))
> > + die_errno(_("fseek failed"));
> > + int is_source_in_dest = 0;
>
> Ditto regarding variable declaration following code.
Ack.
> > + while (strbuf_getline(&read_dest, dest) != EOF) {
> > + if (!strbuf_cmp(&read_source, &read_dest)) {
> > + is_source_in_dest = 1;
> > + break;
> > + }
> > + }
> > + if (!is_source_in_dest) {
> > + strbuf_addbuf(&write_dest, &read_source);
> > + strbuf_addstr(&write_dest, "\n");
> > + }
>
> The commit message talks about this, and it is indeed very ugly that
> this re-reads `dest` from the beginning for *every* `source` line. Is
> there a reason you can't simply read `dest` into a `strset` (see Git's
> `strmap.h`) in its entirety before entering the repo_for_each_pack()
> loop and then merely check the strset for existence using
> strset_add()?
No reason at all, except for me to knowing about `strset`! Thanks for
suggesting it to me. Will use it in v2.
> > + }
> > +
> > + if (write_dest.len) {
> > + strbuf_strip_suffix(&write_dest, "\n");
> > + if (fseek(dest, 0L, SEEK_END))
> > + die_errno(_("fseek failed"));
> > + fprintf(dest, "%s\n", write_dest.buf);
> > + fflush(dest);
> > + strbuf_reset(&write_dest);
> > + }
> > +
> > + err = ferror(source);
> > + err |= fclose(source);
> > + if (err)
> > + die(_("could not read '%s' promisor file"), promisor_source_name.buf);
> > + }
> > +
> > + err = ferror(dest);
> > + err |= fclose(dest);
> > + if (err)
> > + die(_("could not write '%s' promisor file"), promisor_name);
> > +}
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH v2 0/4] preserve promisor files content after repack
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
` (2 preceding siblings ...)
2026-03-21 21:29 ` [GSoC PATCH 3/3] repack-promisor: preserve content of promisor files " LorenzoPegorari
@ 2026-03-22 19:16 ` LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
` (4 more replies)
3 siblings, 5 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-03-22 19:16 UTC (permalink / raw)
To: git
Cc: Elijah Newren, Patrick Steinhardt, Junio C Hamano, Taylor Blau,
Eric Sunshine
The goal of this patch is to solve the NEEDSWORK comment added by
5374a290 (fetch-pack: write fetched refs to .promisor, 14/10/2019). This
is done by adding a helper function that takes the content of all
.promisor files in the `repository`, and copies it inside the first
.promisor file created by the repack.
Also, I added a comment explaining what is the purpose of the content of
the .promisor files, since this wasn't explained anywhere (I found
information regarding this only in the message of the previously cited
commit).
Finally, I added a test to "t7700-repack.sh" that checks if the content
of .promisor files are correctly copied into the first .promisor file
created by a repack.
V2 DIFF:
* changed how the `copy_all_promisor_files()` function works, so that it
reads `dest` into a `strset` in its entirety before entering the
`repo_for_each_pack()` loop, and then checks the `strset` for
existence using `strset_add()` (as suggested by Eric Sunshine)
* correctly release `strbuf`s
* added test
LorenzoPegorari (4):
pack-write: add explanation to promisor file content
pack-write: add helper to fill promisor file after repack
repack-promisor: preserve content of promisor files after repack
t7700: test for promisor file content after repack
Documentation/git-repack.adoc | 4 +-
pack-write.c | 70 +++++++++++++++++++++++++++++++++++
pack.h | 1 +
repack-promisor.c | 23 ++++++++----
t/t7700-repack.sh | 12 ++++++
5 files changed, 100 insertions(+), 10 deletions(-)
Range-diff against v1:
1: 9bba49563e = 1: fec0c24897 pack-write: add explanation to promisor file content
2: 3c0702f81b < -: ---------- pack-write: add helper to fill promisor file after repack
-: ---------- > 2: 0bb031e744 pack-write: add helper to fill promisor file after repack
3: 6967066fe3 = 3: 3dab969a39 repack-promisor: preserve content of promisor files after repack
-: ---------- > 4: cb642d8225 t7700: test for promisor file content after repack
--
2.43.0
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
@ 2026-03-22 19:16 ` LorenzoPegorari
2026-03-23 21:07 ` Junio C Hamano
2026-03-22 19:18 ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
` (3 subsequent siblings)
4 siblings, 1 reply; 78+ messages in thread
From: LorenzoPegorari @ 2026-03-22 19:16 UTC (permalink / raw)
To: git
Cc: Elijah Newren, Patrick Steinhardt, Junio C Hamano, Taylor Blau,
Eric Sunshine
In the entire codebase there is no explanation as to why the ".promisor"
files may contain the ref names (and their associated hashes) that were
fetched at the time the corresponding packfile was downloaded.
Add comment explaining that these pieces of information are used only for
debugging reasons, and how they can be used while debugging.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
pack-write.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/pack-write.c b/pack-write.c
index 83eaf88541..6a2023327e 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -603,6 +603,15 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
int i, err;
FILE *output = xfopen(promisor_name, "w");
+ /*
+ * Write in the .promisor file the ref names and associated hashes,
+ * obtained by fetch-pack, at the point of generation of the
+ * corresponding packfile. These pieces of info are only used to make
+ * it easier to debug issues with partial clones, as we can identify
+ * what refs (and their associated hashes) were fetched at the time
+ * the packfile was downloaded, and if necessary, compare those hashes
+ * against what the promisor remote reports now.
+ */
for (i = 0; i < nr_sought; i++)
fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
sought[i]->name);
--
2.43.0
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
@ 2026-03-22 19:18 ` LorenzoPegorari
2026-03-23 20:27 ` Eric Sunshine
2026-03-23 21:30 ` Junio C Hamano
2026-03-22 19:18 ` [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files " LorenzoPegorari
` (2 subsequent siblings)
4 siblings, 2 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-03-22 19:18 UTC (permalink / raw)
To: git
Cc: Elijah Newren, Patrick Steinhardt, Junio C Hamano, Taylor Blau,
Eric Sunshine
Create a `copy_all_promisor_files()` helper function used to copy the
contents of all ".promisor" files in a `repository` inside another
".promisor" file.
This function can be used to preserve the contents of all ".promisor"
files inside a new ".promisor" file, for example when a repack happens.
This function is written in such a way so that it will read all the
".promisor" files inside the given `repository` line by line, and copy
only the lines that are not already present in the destination file. This
is done to avoid copying the same lines multiple times that may come from
multiple (redundant) packfiles. There might be another better/cleaner way
to achieve this.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
pack-write.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
pack.h | 1 +
2 files changed, 62 insertions(+)
diff --git a/pack-write.c b/pack-write.c
index 6a2023327e..583e40b423 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -13,6 +13,7 @@
#include "path.h"
#include "repository.h"
#include "strbuf.h"
+#include "strmap.h"
void reset_pack_idx_option(struct pack_idx_option *opts)
{
@@ -621,3 +622,63 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
if (err)
die(_("could not write '%s' promisor file"), promisor_name);
}
+
+void copy_all_promisor_files(struct repository *repo, const char *promisor_name)
+{
+ struct strset dest_content = STRSET_INIT;
+ struct strbuf read_line = STRBUF_INIT;
+ struct strbuf promisor_source_name = STRBUF_INIT;
+ struct strbuf write_dest = STRBUF_INIT;
+ FILE *dest, *source;
+ struct packed_git *p;
+ int err;
+
+ dest = xfopen(promisor_name, "r+");
+ while (strbuf_getline(&read_line, dest) != EOF)
+ strset_add(&dest_content, read_line.buf);
+
+ repo_for_each_pack(repo, p) {
+ if (!p->pack_promisor)
+ continue;
+
+ strbuf_reset(&promisor_source_name);
+ strbuf_addstr(&promisor_source_name, p->pack_name);
+ strbuf_strip_suffix(&promisor_source_name, ".pack");
+ strbuf_addstr(&promisor_source_name, ".promisor");
+ source = xfopen(promisor_source_name.buf, "r");
+
+ /*
+ * For each line of the promisor source file, check if it already
+ * is in the promisor dest file. If not, add it to write_dest, so
+ * that it will be written in the dest file.
+ */
+ while (strbuf_getline(&read_line, source) != EOF) {
+ if (strset_add(&dest_content, read_line.buf)) {
+ strbuf_addbuf(&write_dest, &read_line);
+ strbuf_addstr(&write_dest, "\n");
+ }
+ }
+
+ err = ferror(source);
+ err |= fclose(source);
+ if (err)
+ die(_("could not read '%s' promisor file"), promisor_source_name.buf);
+ }
+
+ if (write_dest.len) {
+ strbuf_strip_suffix(&write_dest, "\n");
+ if (fseek(dest, 0L, SEEK_END))
+ die_errno(_("fseek failed"));
+ fprintf(dest, "%s\n", write_dest.buf);
+ }
+
+ err = ferror(dest);
+ err |= fclose(dest);
+ if (err)
+ die(_("could not write '%s' promisor file"), promisor_name);
+
+ strbuf_release(&read_line);
+ strbuf_release(&promisor_source_name);
+ strbuf_release(&write_dest);
+ strset_clear(&dest_content);
+}
diff --git a/pack.h b/pack.h
index ec76472e49..509e90edba 100644
--- a/pack.h
+++ b/pack.h
@@ -105,6 +105,7 @@ char *index_pack_lockfile(struct repository *r, int fd, int *is_well_formed);
struct ref;
void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_sought);
+void copy_all_promisor_files(struct repository *repo, const char *promisor_name);
char *write_rev_file(struct repository *repo,
const char *rev_name,
--
2.43.0
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files after repack
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-22 19:18 ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
@ 2026-03-22 19:18 ` LorenzoPegorari
2026-03-23 21:48 ` Junio C Hamano
2026-03-22 19:18 ` [GSoC PATCH v2 4/4] t7700: test for promisor file content " LorenzoPegorari
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
4 siblings, 1 reply; 78+ messages in thread
From: LorenzoPegorari @ 2026-03-22 19:18 UTC (permalink / raw)
To: git
Cc: Elijah Newren, Patrick Steinhardt, Junio C Hamano, Taylor Blau,
Eric Sunshine
When a repack involving promisor packfiles happens, the new ".promisor"
file is created empty, losing all the debug info that might be present
inside the ".promisor" files before the repack.
Use the "copy_all_promisor_files()" function created previously to
preserve the contents of all ".promisor" files inside the first
".promisor" file created by the repack.
Also, update the documentation accordingly.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
Documentation/git-repack.adoc | 4 ++--
repack-promisor.c | 23 +++++++++++++++--------
2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 673ce91083..33d3c8afbd 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -45,8 +45,8 @@ other objects in that pack they already have locally.
+
Promisor packfiles are repacked separately: if there are packfiles that
have an associated ".promisor" file, these packfiles will be repacked
-into another separate pack, and an empty ".promisor" file corresponding
-to the new separate pack will be written.
+into another separate pack, and a ".promisor" file corresponding to the
+new separate pack will be written (with arbitrary contents).
-A::
Same as `-a`, unless `-d` is used. Then any unreachable
diff --git a/repack-promisor.c b/repack-promisor.c
index 90318ce150..6670728669 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -40,6 +40,7 @@ static void finish_repacking_promisor_objects(struct repository *repo,
const char *packtmp)
{
struct strbuf line = STRBUF_INIT;
+ int is_first_promisor = 1;
FILE *out;
close(cmd->in);
@@ -55,19 +56,25 @@ static void finish_repacking_promisor_objects(struct repository *repo,
/*
* pack-objects creates the .pack and .idx files, but not the
- * .promisor file. Create the .promisor file, which is empty.
- *
- * NEEDSWORK: fetch-pack sometimes generates non-empty
- * .promisor files containing the ref names and associated
- * hashes at the point of generation of the corresponding
- * packfile, but this would not preserve their contents. Maybe
- * concatenate the contents of all .promisor files instead of
- * just creating a new empty file.
+ * .promisor file. Create the .promisor file.
*/
promisor_name = mkpathdup("%s-%s.promisor", packtmp,
line.buf);
write_promisor_file(promisor_name, NULL, 0);
+ /*
+ * Fetch-pack sometimes generates non-empty .promisor files
+ * containing the ref names and associated hashes at the point of
+ * generation of the corresponding packfile. These pieces of info
+ * are only used for debugging reasons. In order to preserve
+ * these, let's copy the contents of all .promisor files in the
+ * first promisor file created.
+ */
+ if (is_first_promisor) {
+ copy_all_promisor_files(repo, promisor_name);
+ is_first_promisor = 0;
+ }
+
item->util = generated_pack_populate(item->string, packtmp);
free(promisor_name);
--
2.43.0
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v2 4/4] t7700: test for promisor file content after repack
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
` (2 preceding siblings ...)
2026-03-22 19:18 ` [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files " LorenzoPegorari
@ 2026-03-22 19:18 ` LorenzoPegorari
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
4 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-03-22 19:18 UTC (permalink / raw)
To: git
Cc: Elijah Newren, Patrick Steinhardt, Junio C Hamano, Taylor Blau,
Eric Sunshine
Add test that checks if the content of all ".promisor" files are copied
inside the first ".promisor" file created by a repack.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
t/t7700-repack.sh | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 63ef63fc50..10187d5954 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -904,4 +904,16 @@ test_expect_success 'pending objects are repacked appropriately' '
)
'
+test_expect_success 'check .promisor file content after repack' '
+ git init prom_test1 &&
+ test_commit -C prom_test1 temp &&
+ git clone prom_test1 prom_test2 --filter=blob:none --no-local &&
+
+ cp $(ls prom_test2/.git/objects/pack/pack-*.promisor) prom_content_before &&
+ git -C prom_test2 repack -a -d &&
+ cp $(ls prom_test2/.git/objects/pack/pack-*.promisor) prom_content_after &&
+
+ test_cmp prom_content_before prom_content_after
+'
+
test_done
--
2.43.0
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack
2026-03-22 19:18 ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
@ 2026-03-23 20:27 ` Eric Sunshine
2026-03-26 16:15 ` Lorenzo Pegorari
2026-03-23 21:30 ` Junio C Hamano
1 sibling, 1 reply; 78+ messages in thread
From: Eric Sunshine @ 2026-03-23 20:27 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano,
Taylor Blau
On Sun, Mar 22, 2026 at 3:18 PM LorenzoPegorari
<lorenzo.pegorari2002@gmail.com> wrote:
> Create a `copy_all_promisor_files()` helper function used to copy the
> contents of all ".promisor" files in a `repository` inside another
> ".promisor" file.
>
> This function can be used to preserve the contents of all ".promisor"
> files inside a new ".promisor" file, for example when a repack happens.
>
> This function is written in such a way so that it will read all the
> ".promisor" files inside the given `repository` line by line, and copy
> only the lines that are not already present in the destination file. This
> is done to avoid copying the same lines multiple times that may come from
> multiple (redundant) packfiles. There might be another better/cleaner way
> to achieve this.
>
> Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> ---
Thanks, I think this version addresses all my review comments[*] and
looks much better overall. Use of `strset` makes a big difference over
the previous attempt. A couple minor comments below...
[*]: https://lore.kernel.org/git/CAPig+cQSsMfvHJnwuXGQ1Je8ekz=Rqbaibn-3shbya5y-5xTKg@mail.gmail.com/
> diff --git a/pack-write.c b/pack-write.c
> @@ -621,3 +622,63 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
> +void copy_all_promisor_files(struct repository *repo, const char *promisor_name)
> +{
> + struct strset dest_content = STRSET_INIT;
> + struct strbuf read_line = STRBUF_INIT;
> + struct strbuf promisor_source_name = STRBUF_INIT;
> + struct strbuf write_dest = STRBUF_INIT;
> + FILE *dest, *source;
> + struct packed_git *p;
> + int err;
Nit: I probably would have declared `FILE *dest` within the scope of
the repo_for_each_pack() loop as suggested in the review, but it's not
worth a reroll.
> + dest = xfopen(promisor_name, "r+");
> + while (strbuf_getline(&read_line, dest) != EOF)
> + strset_add(&dest_content, read_line.buf);
> +
> + repo_for_each_pack(repo, p) {
> + if (!p->pack_promisor)
> + continue;
> +
> + strbuf_reset(&promisor_source_name);
> + strbuf_addstr(&promisor_source_name, p->pack_name);
> + strbuf_strip_suffix(&promisor_source_name, ".pack");
> + strbuf_addstr(&promisor_source_name, ".promisor");
> + source = xfopen(promisor_source_name.buf, "r");
> +
> + /*
> + * For each line of the promisor source file, check if it already
> + * is in the promisor dest file. If not, add it to write_dest, so
> + * that it will be written in the dest file.
> + */
> + while (strbuf_getline(&read_line, source) != EOF) {
> + if (strset_add(&dest_content, read_line.buf)) {
> + strbuf_addbuf(&write_dest, &read_line);
> + strbuf_addstr(&write_dest, "\n");
Not worth a reroll, but this could also be:
strbuf_addch(&write_dest, '\n');
> + }
> + }
> +
> + err = ferror(source);
> + err |= fclose(source);
> + if (err)
> + die(_("could not read '%s' promisor file"), promisor_source_name.buf);
> + }
> +
> + if (write_dest.len) {
> + strbuf_strip_suffix(&write_dest, "\n");
> + if (fseek(dest, 0L, SEEK_END))
> + die_errno(_("fseek failed"));
> + fprintf(dest, "%s\n", write_dest.buf);
> + }
Can you explain why you strip "\n" and then re-add it via fprintf()?
The reason is not immediately obvious.
> + err = ferror(dest);
> + err |= fclose(dest);
> + if (err)
> + die(_("could not write '%s' promisor file"), promisor_name);
> +
> + strbuf_release(&read_line);
> + strbuf_release(&promisor_source_name);
> + strbuf_release(&write_dest);
> + strset_clear(&dest_content);
> +}
Everything appears to be released. Good.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content
2026-03-22 19:16 ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
@ 2026-03-23 21:07 ` Junio C Hamano
2026-03-25 21:33 ` Lorenzo Pegorari
0 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-03-23 21:07 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Elijah Newren, Patrick Steinhardt, Taylor Blau,
Eric Sunshine
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> In the entire codebase there is no explanation as to why the ".promisor"
> files may contain the ref names (and their associated hashes) that were
> fetched at the time the corresponding packfile was downloaded.
>
> Add comment explaining that these pieces of information are used only for
> debugging reasons, and how they can be used while debugging.
>
> Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
A natural question any reader of the above (and below) would be
asking is: Who told you that these are only to aid debugging?
Please refer to the commit that brought in the reasoning behind the
comment to make it more convincing.
Something like this replacing the second paragraph,
As explained in the log message of the commit 5374a290
(fetch-pack: write fetched refs to .promisor, 2019-10-14), where
this loop originally came from, these ref values are not
actually used for anything in the production, but are solely
there to help debugging. Explain it in a new comment.
perhaps?
> + /*
> + * Write in the .promisor file the ref names and associated hashes,
> + * obtained by fetch-pack, at the point of generation of the
> + * corresponding packfile. These pieces of info are only used to make
> + * it easier to debug issues with partial clones, as we can identify
> + * what refs (and their associated hashes) were fetched at the time
> + * the packfile was downloaded, and if necessary, compare those hashes
> + * against what the promisor remote reports now.
> + */
I do not want to sound too pedantic, but we align '*' asterisks in
our multi-line comments, assuming tabwidth=8 and monospace:
/*
* Write in the .promisor ...
...
* against what the promisor remote reports now.
*/
Your second and subsequent lines lack a single whitespace after the
leading tab used for indent.
> for (i = 0; i < nr_sought; i++)
> fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
> sought[i]->name);
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack
2026-03-22 19:18 ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-23 20:27 ` Eric Sunshine
@ 2026-03-23 21:30 ` Junio C Hamano
2026-03-26 2:01 ` Lorenzo Pegorari
1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-03-23 21:30 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Elijah Newren, Patrick Steinhardt, Taylor Blau,
Eric Sunshine
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> Create a `copy_all_promisor_files()` helper function used to copy the
> contents of all ".promisor" files in a `repository` inside another
> ".promisor" file.
>
> This function can be used to preserve the contents of all ".promisor"
> files inside a new ".promisor" file, for example when a repack happens.
>
> This function is written in such a way so that it will read all the
> ".promisor" files inside the given `repository` line by line, and copy
> only the lines that are not already present in the destination file. This
> is done to avoid copying the same lines multiple times that may come from
> multiple (redundant) packfiles. There might be another better/cleaner way
> to achieve this.
In the previous step, we extablished that these "back then their ref
X used to point at object Y" records are there so that we can
identify which refs were fetched at the time the packfile was
downloaded to help debugging. When repacking, losing these records
certainly would lose information.
But would concatenating all into a single file help preserve the
useful information? Don't we need do better than that?
A NEEDSWORK comment, as was discussed in another thread or two in
the recent past, is not necessarily a well thought out fully
finished specification of an additional piece of work. "We know
this has a problem, we may need to do something about it, like
concatenating to save the contents, perhaps? We do not know the
answer, and we do not bother thinking it through right at this
moment. It is left to the future developers to figure it out" is
what a NEEDSWORK comment is about.
Your first response to such a comment may be "yeah, I agree that it
is bad to lose information we added to help debugging", but the
second one should be to wonder if the "like concatenating..." is the
best approach going forward.
In other words, we should take a NEEDSWORK comment as a mere
starting point, and what NEEDS your work begins at thinking what
needs to be done about the problem raised there.
By mixing them up all into a single list, you no longer can tell
when their ref X was observed to be pointing at object Y anymore.
You may have two packs originally, with a record for "ref X pointing
at object Y" in each of them, but by deduping them, you lose the
information that you cloned at one time, and made an additional
fetch on another day, and the fact the ref X was pointing at the
same value at both times. I am not sure if it is a good
implementation if the objective of this topic is to preserve
information that is useful for debugging.
I wonder if it helps to append to each line the file timestamp of
the .promisor file we took the record originally? For the sake of
completeness, we could consider adding the filename as well, but we
can quickly dismiss it as not so useful ;-)
If repacking already repacked promisor packfile, the records would
already contain such a timestamp at the end, so the code to copy
existing records must be prepared to see if the records are the
<ref, oid> tuple, or <ref, oid, timestamp> tuple, and act
accordingly.
I am *not* saying that without such a "preserve timestamp" column in
the record, copying existing records to a new .promisor file is
useless. But we do not see any explanation why the author thinks
that it is sufficient to copy existing records while silently
deduping. We can implement only one choice backed by series of
decisions like "timestamp might help" and "original filenames would
probably not help", and the design should describe what was
considered and rejected (as opposed to "we didn't think things
through---we just did what the original NEEDSWORK comment suggested
doing").
Thanks.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files after repack
2026-03-22 19:18 ` [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files " LorenzoPegorari
@ 2026-03-23 21:48 ` Junio C Hamano
2026-03-26 2:12 ` Lorenzo Pegorari
0 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-03-23 21:48 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Elijah Newren, Patrick Steinhardt, Taylor Blau,
Eric Sunshine
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> @@ -40,6 +40,7 @@ static void finish_repacking_promisor_objects(struct repository *repo,
> const char *packtmp)
> {
> struct strbuf line = STRBUF_INIT;
> + int is_first_promisor = 1;
> FILE *out;
> ...
> + /*
> + * Fetch-pack sometimes generates non-empty .promisor files
> + * containing the ref names and associated hashes at the point of
> + * generation of the corresponding packfile. These pieces of info
> + * are only used for debugging reasons. In order to preserve
> + * these, let's copy the contents of all .promisor files in the
> + * first promisor file created.
> + */
> + if (is_first_promisor) {
> + copy_all_promisor_files(repo, promisor_name);
> + is_first_promisor = 0;
> + }
> +
Here the underlying assumption seems to be that whichever one of the
two potential callers of this function, repack_promisor_objects()
and pack_geometry_repack_promisors(), would handle all the existing
packs with corresponding .promisor file so it is safe to coalesce
all the debugging comments from all the existing .promisor files
into one?
Is it really true, though? Especially with geometry repacking
enabled, wouldn't a regular repack coalesce only the smallish ones
into a single pack while leaving an already largeish ones intact, or
something?
Thanks.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content
2026-03-23 21:07 ` Junio C Hamano
@ 2026-03-25 21:33 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-03-25 21:33 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Elijah Newren, Patrick Steinhardt, Taylor Blau,
Eric Sunshine
On Mon, Mar 23, 2026 at 02:07:31PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > In the entire codebase there is no explanation as to why the ".promisor"
> > files may contain the ref names (and their associated hashes) that were
> > fetched at the time the corresponding packfile was downloaded.
> >
> > Add comment explaining that these pieces of information are used only for
> > debugging reasons, and how they can be used while debugging.
> >
> > Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
>
> A natural question any reader of the above (and below) would be
> asking is: Who told you that these are only to aid debugging?
>
> Please refer to the commit that brought in the reasoning behind the
> comment to make it more convincing.
>
> Something like this replacing the second paragraph,
>
> As explained in the log message of the commit 5374a290
> (fetch-pack: write fetched refs to .promisor, 2019-10-14), where
> this loop originally came from, these ref values are not
> actually used for anything in the production, but are solely
> there to help debugging. Explain it in a new comment.
>
> perhaps?
Makes perfect sense. I should have done this from the start. Thanks for
pointing that out.
> > + /*
> > + * Write in the .promisor file the ref names and associated hashes,
> > + * obtained by fetch-pack, at the point of generation of the
> > + * corresponding packfile. These pieces of info are only used to make
> > + * it easier to debug issues with partial clones, as we can identify
> > + * what refs (and their associated hashes) were fetched at the time
> > + * the packfile was downloaded, and if necessary, compare those hashes
> > + * against what the promisor remote reports now.
> > + */
>
> I do not want to sound too pedantic, but we align '*' asterisks in
> our multi-line comments, assuming tabwidth=8 and monospace:
>
> /*
> * Write in the .promisor ...
> ...
> * against what the promisor remote reports now.
> */
>
> Your second and subsequent lines lack a single whitespace after the
> leading tab used for indent.
You are not too pedantic! Ack.
> > for (i = 0; i < nr_sought; i++)
> > fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
> > sought[i]->name);
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack
2026-03-23 21:30 ` Junio C Hamano
@ 2026-03-26 2:01 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-03-26 2:01 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Elijah Newren, Patrick Steinhardt, Taylor Blau,
Eric Sunshine
On Mon, Mar 23, 2026 at 02:30:26PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > Create a `copy_all_promisor_files()` helper function used to copy the
> > contents of all ".promisor" files in a `repository` inside another
> > ".promisor" file.
> >
> > This function can be used to preserve the contents of all ".promisor"
> > files inside a new ".promisor" file, for example when a repack happens.
> >
> > This function is written in such a way so that it will read all the
> > ".promisor" files inside the given `repository` line by line, and copy
> > only the lines that are not already present in the destination file. This
> > is done to avoid copying the same lines multiple times that may come from
> > multiple (redundant) packfiles. There might be another better/cleaner way
> > to achieve this.
>
> In the previous step, we extablished that these "back then their ref
> X used to point at object Y" records are there so that we can
> identify which refs were fetched at the time the packfile was
> downloaded to help debugging. When repacking, losing these records
> certainly would lose information.
>
> But would concatenating all into a single file help preserve the
> useful information? Don't we need do better than that?
Yeah, we absolutely need to! That's why I said that I was not satisfied
at all with the patch (in cover letter of v1). I really needed some
feedback, because I knew that I was doing things wrong.
> A NEEDSWORK comment, as was discussed in another thread or two in
> the recent past, is not necessarily a well thought out fully
> finished specification of an additional piece of work. "We know
> this has a problem, we may need to do something about it, like
> concatenating to save the contents, perhaps? We do not know the
> answer, and we do not bother thinking it through right at this
> moment. It is left to the future developers to figure it out" is
> what a NEEDSWORK comment is about.
>
> Your first response to such a comment may be "yeah, I agree that it
> is bad to lose information we added to help debugging", but the
> second one should be to wonder if the "like concatenating..." is the
> best approach going forward.
>
> In other words, we should take a NEEDSWORK comment as a mere
> starting point, and what NEEDS your work begins at thinking what
> needs to be done about the problem raised there.
I fully understand this! Honestly, my biggest weakness that I've
discovered about myself as a dev (through past open-source experience,
e.g. GSoC'25) is that I get hesitant when I have to work on and submit
a patch if I don't have a lot of experience with the codebase. This
happens particularly when I have to take a decision, and not only
complete a task.
In fact, I decided to work on this specific NEEDSWORK issue to get more
experience on promisor remotes (the feature that I want to improve in my
GSoC proposal) before the GSoC coding period... if I get selected, of
course :-).
I will try my best to improve!
> By mixing them up all into a single list, you no longer can tell
> when their ref X was observed to be pointing at object Y anymore.
> You may have two packs originally, with a record for "ref X pointing
> at object Y" in each of them, but by deduping them, you lose the
> information that you cloned at one time, and made an additional
> fetch on another day, and the fact the ref X was pointing at the
> same value at both times. I am not sure if it is a good
> implementation if the objective of this topic is to preserve
> information that is useful for debugging.
My reasoning was based on the (wrong) assumption that it was impossible
for the same record fo "ref X is pointing at object Y" to appear
multiple times. Obviously then, deduping them is the wrong solution, as
it will lose some debugging information.
> I wonder if it helps to append to each line the file timestamp of
> the .promisor file we took the record originally? For the sake of
> completeness, we could consider adding the filename as well, but we
> can quickly dismiss it as not so useful ;-)
Related to what I said before about getting hesitant... I actually
thought about (pretty much) exactly this! My original idea was to add
a timestamp of the current time when the repack happened. I discarded it
because I didn't want to add any new information (for no particular
reason tbh) and because I didn't want ".promisor" file content to
potentially become too long if many repacks happen.
Your solution is much cleaner compared to what I originally thought of.
> If repacking already repacked promisor packfile, the records would
> already contain such a timestamp at the end, so the code to copy
> existing records must be prepared to see if the records are the
> <ref, oid> tuple, or <ref, oid, timestamp> tuple, and act
> accordingly.
Makes perfect sense.
> I am *not* saying that without such a "preserve timestamp" column in
> the record, copying existing records to a new .promisor file is
> useless. But we do not see any explanation why the author thinks
> that it is sufficient to copy existing records while silently
> deduping. We can implement only one choice backed by series of
> decisions like "timestamp might help" and "original filenames would
> probably not help", and the design should describe what was
> considered and rejected (as opposed to "we didn't think things
> through---we just did what the original NEEDSWORK comment suggested
> doing").
I 100% agree.
> Thanks.
Thank you Junio for your time and feedback,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files after repack
2026-03-23 21:48 ` Junio C Hamano
@ 2026-03-26 2:12 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-03-26 2:12 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Elijah Newren, Patrick Steinhardt, Taylor Blau,
Eric Sunshine
On Mon, Mar 23, 2026 at 02:48:21PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > @@ -40,6 +40,7 @@ static void finish_repacking_promisor_objects(struct repository *repo,
> > const char *packtmp)
> > {
> > struct strbuf line = STRBUF_INIT;
> > + int is_first_promisor = 1;
> > FILE *out;
> > ...
> > + /*
> > + * Fetch-pack sometimes generates non-empty .promisor files
> > + * containing the ref names and associated hashes at the point of
> > + * generation of the corresponding packfile. These pieces of info
> > + * are only used for debugging reasons. In order to preserve
> > + * these, let's copy the contents of all .promisor files in the
> > + * first promisor file created.
> > + */
> > + if (is_first_promisor) {
> > + copy_all_promisor_files(repo, promisor_name);
> > + is_first_promisor = 0;
> > + }
> > +
>
> Here the underlying assumption seems to be that whichever one of the
> two potential callers of this function, repack_promisor_objects()
> and pack_geometry_repack_promisors(), would handle all the existing
> packs with corresponding .promisor file so it is safe to coalesce
> all the debugging comments from all the existing .promisor files
> into one?
>
> Is it really true, though? Especially with geometry repacking
> enabled, wouldn't a regular repack coalesce only the smallish ones
> into a single pack while leaving an already largeish ones intact, or
> something?
>
> Thanks.
I will look into this. I'm going to drastically rework this patch
series, so that the next version will be much better and better
explained.
Thank you so much for the time,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack
2026-03-23 20:27 ` Eric Sunshine
@ 2026-03-26 16:15 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-03-26 16:15 UTC (permalink / raw)
To: Eric Sunshine
Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano,
Taylor Blau
On Mon, Mar 23, 2026 at 04:27:44PM -0400, Eric Sunshine wrote:
> On Sun, Mar 22, 2026 at 3:18 PM LorenzoPegorari
> <lorenzo.pegorari2002@gmail.com> wrote:
> > Create a `copy_all_promisor_files()` helper function used to copy the
> > contents of all ".promisor" files in a `repository` inside another
> > ".promisor" file.
> >
> > This function can be used to preserve the contents of all ".promisor"
> > files inside a new ".promisor" file, for example when a repack happens.
> >
> > This function is written in such a way so that it will read all the
> > ".promisor" files inside the given `repository` line by line, and copy
> > only the lines that are not already present in the destination file. This
> > is done to avoid copying the same lines multiple times that may come from
> > multiple (redundant) packfiles. There might be another better/cleaner way
> > to achieve this.
> >
> > Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> > ---
>
> Thanks, I think this version addresses all my review comments[*] and
> looks much better overall. Use of `strset` makes a big difference over
> the previous attempt. A couple minor comments below...
>
> [*]: https://lore.kernel.org/git/CAPig+cQSsMfvHJnwuXGQ1Je8ekz=Rqbaibn-3shbya5y-5xTKg@mail.gmail.com/
>
> > diff --git a/pack-write.c b/pack-write.c
> > @@ -621,3 +622,63 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
> > +void copy_all_promisor_files(struct repository *repo, const char *promisor_name)
> > +{
> > + struct strset dest_content = STRSET_INIT;
> > + struct strbuf read_line = STRBUF_INIT;
> > + struct strbuf promisor_source_name = STRBUF_INIT;
> > + struct strbuf write_dest = STRBUF_INIT;
> > + FILE *dest, *source;
> > + struct packed_git *p;
> > + int err;
>
> Nit: I probably would have declared `FILE *dest` within the scope of
> the repo_for_each_pack() loop as suggested in the review, but it's not
> worth a reroll.
Ack. I will fix it. Thanks!
> > + dest = xfopen(promisor_name, "r+");
> > + while (strbuf_getline(&read_line, dest) != EOF)
> > + strset_add(&dest_content, read_line.buf);
> > +
> > + repo_for_each_pack(repo, p) {
> > + if (!p->pack_promisor)
> > + continue;
> > +
> > + strbuf_reset(&promisor_source_name);
> > + strbuf_addstr(&promisor_source_name, p->pack_name);
> > + strbuf_strip_suffix(&promisor_source_name, ".pack");
> > + strbuf_addstr(&promisor_source_name, ".promisor");
> > + source = xfopen(promisor_source_name.buf, "r");
> > +
> > + /*
> > + * For each line of the promisor source file, check if it already
> > + * is in the promisor dest file. If not, add it to write_dest, so
> > + * that it will be written in the dest file.
> > + */
> > + while (strbuf_getline(&read_line, source) != EOF) {
> > + if (strset_add(&dest_content, read_line.buf)) {
> > + strbuf_addbuf(&write_dest, &read_line);
> > + strbuf_addstr(&write_dest, "\n");
>
> Not worth a reroll, but this could also be:
>
> strbuf_addch(&write_dest, '\n');
Ack.
> > + }
> > + }
> > +
> > + err = ferror(source);
> > + err |= fclose(source);
> > + if (err)
> > + die(_("could not read '%s' promisor file"), promisor_source_name.buf);
> > + }
> > +
> > + if (write_dest.len) {
> > + strbuf_strip_suffix(&write_dest, "\n");
> > + if (fseek(dest, 0L, SEEK_END))
> > + die_errno(_("fseek failed"));
> > + fprintf(dest, "%s\n", write_dest.buf);
> > + }
>
> Can you explain why you strip "\n" and then re-add it via fprintf()?
> The reason is not immediately obvious.
Stripping it and then adding it again is actually not necessary. I think
it was necessary in a previous iteration. Thanks for noticing, will fix!
> > + err = ferror(dest);
> > + err |= fclose(dest);
> > + if (err)
> > + die(_("could not write '%s' promisor file"), promisor_name);
> > +
> > + strbuf_release(&read_line);
> > + strbuf_release(&promisor_source_name);
> > + strbuf_release(&write_dest);
> > + strset_clear(&dest_content);
> > +}
>
> Everything appears to be released. Good.
Thank you so much for your help Eric,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH v3 0/5] preserve promisor files content after repack
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
` (3 preceding siblings ...)
2026-03-22 19:18 ` [GSoC PATCH v2 4/4] t7700: test for promisor file content " LorenzoPegorari
@ 2026-04-06 0:23 ` LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
` (5 more replies)
4 siblings, 6 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-06 0:23 UTC (permalink / raw)
To: git
Cc: Derrick Stolee, Patrick Steinhardt, Taylor Blau, Junio C Hamano,
Elijah Newren, Eric Sunshine
The goal of this patch is to solve the NEEDSWORK comment added by
5374a290 (fetch-pack: write fetched refs to .promisor, 14/10/2019). This
is done by adding a helper function that takes the content of all
.promisor files in the `repository`, and copies it inside the first
.promisor file created by the repack.
Also, I added a comment explaining what is the purpose of the content of
the .promisor files, since this wasn't explained anywhere (I found
information regarding this only in the message of the previously cited
commit).
Finally, I added some tests to "t7700-repack.sh" and
"t7703-repack-geometric.sh" that check if the content of .promisor files
are correctly copied into the .promisor files created by a repack.
This version is significantly different from the previous one. Maybe I
should have created a completely different patch series. Let me know for
the future.
IMPORTANT:
The "CodingGuidelines" explicitly state that:
"A C file must directly include the header files that declare the
functions and the types it uses, except for the functions and types
that are made available to it by including one of the header files
it must include by the previous rule"
where "the previous rule" is (if I understand correctly), the one related
to "<git-compat-util.h>". From what I understand then, I should have
added an include for "strmap.h" (which is needed for `strset`), correct?
And if I am correct, shouldn't "strbuf.h", "hash.h", "odb.h",
"string-list.h" and "strvec.h" also be included?
V3 DIFF:
* Made the helper function "copy_promisor_content()" a static function,
because, in my opinion, it is too specific to be used anywhere else
(at least in the near future).
* Modified the helper function to add a <time> piece of information when
coping the content of repacked ".promisor" files, as suggested by
Junio Hamano. This is done to give an additional piece of information
so that, after a repack, it is still possible to know when the ref
<ref> was observed to be pointing at object <oid>.
* Modified the helper function so that it copies each line only if the
<oid> appears in the pack associated with the newly created
".promisor" file. This is done so that it is possible correctly copy
each line in the correct newly created ".promisor" file, instead of
simply appending everything inside the first created ".promisor" file.
I have done some tests with a git repo that contains around 1M objects
and with ".promisor" files that contain in total around 100k lines,
and this operation doesn't seem to meaningfully increase the execution
time of the repack (it seems to add around 0.5% of execution time on
average).
* Modified the helper function to implement a "promisor ignorelist", so
that we can explicitly tell to ignore certain packs that we know that
we excluded (for example during a geometric repack).
* Implemented better tests.
LorenzoPegorari (5):
pack-write: add explanation to promisor file content
pack-write: add helper to fill promisor file after repack
repack-promisor: preserve content of promisor files after repack
t7700: test for promisor file content after repack
t7703: test for promisor file content after geometric repack
Documentation/git-repack.adoc | 4 +-
pack-write.c | 9 ++
repack-promisor.c | 149 +++++++++++++++++++++++++++++++---
t/t7700-repack.sh | 63 ++++++++++++++
t/t7703-repack-geometric.sh | 42 ++++++++++
5 files changed, 252 insertions(+), 15 deletions(-)
--
2.53.0.585.g1533fa96a8
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH v3 1/5] pack-write: add explanation to promisor file content
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
@ 2026-04-06 0:24 ` LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
` (4 subsequent siblings)
5 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-06 0:24 UTC (permalink / raw)
To: git
Cc: Derrick Stolee, Patrick Steinhardt, Taylor Blau, Junio C Hamano,
Elijah Newren, Eric Sunshine
In the entire codebase there is no explanation as to why the ".promisor"
files may contain the ref names (and their associated hashes) that were
fetched at the time the corresponding packfile was downloaded.
As explained in the log message of commit 5374a290 (fetch-pack: write
fetched refs to .promisor, 2019-10-14), where this loop originally came
from, these ref names (and associated hashes) are not used for anything
in the production, but are solely there to help debugging.
Explain this in a new comment.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
pack-write.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/pack-write.c b/pack-write.c
index 83eaf88541..b8ab9510ff 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -603,6 +603,15 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
int i, err;
FILE *output = xfopen(promisor_name, "w");
+ /*
+ * Write in the .promisor file the ref names and associated hashes,
+ * obtained by fetch-pack, at the point of generation of the
+ * corresponding packfile. These pieces of info are only used to make
+ * it easier to debug issues with partial clones, as we can identify
+ * what refs (and their associated hashes) were fetched at the time
+ * the packfile was downloaded, and if necessary, compare those hashes
+ * against what the promisor remote reports now.
+ */
for (i = 0; i < nr_sought; i++)
fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
sought[i]->name);
--
2.53.0.585.g1533fa96a8
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
@ 2026-04-06 0:24 ` LorenzoPegorari
2026-04-06 17:22 ` Tian Yuchen
2026-04-06 21:34 ` Junio C Hamano
2026-04-06 0:25 ` [GSoC PATCH v3 3/5] repack-promisor: preserve content of promisor files " LorenzoPegorari
` (3 subsequent siblings)
5 siblings, 2 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-06 0:24 UTC (permalink / raw)
To: git
Cc: Derrick Stolee, Patrick Steinhardt, Taylor Blau, Junio C Hamano,
Elijah Newren, Eric Sunshine
A ".promisor" file may contain ref names (and their associated hashes)
that were fetched at the time the corresponding packfile was downloaded.
This information is used for debugging reasons. This information is
stored as lines structured like this: "<oid> <ref>".
Create a `copy_promisor_content()` helper function that allows this
debugging info to not be lost after a `repack`, by coping it inside a new
".promisor" file.
The function logic is the following:
* Take all ".promisor" files contained inside the given `repo`.
* Ignore those whose name is contained inside the given `strset
not_repacked_names`, which basically acts as a "promisor ignorelist"
(intended to be used for packfiles that have not been repacked).
* Read each line of the remaining ".promisor" files, which can be:
* "<oid> <ref>" if the ".promisor" file was never repacked. If so,
add the time at which the ".promisor" file was last modified <time>
to the line to create the string: "<oid> <ref> <time>".
* "<oid> <ref> <time>" if the ".promisor" file was repacked. If so,
don't modify it.
* Ignore the line if its <oid> is not present inside the
"<packtmp>-<dest_hex>.idx" file.
* If the destination file "<packtmp>-<dest_hex>.promisor" does not
already contain the line, append it to the file.
The function assumes that the contents of all ".promisor" files are
correctly formed.
The time of last data modification is used in place of the time of file
creation, because the former is much easier to obtain than the latter
one.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
repack-promisor.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 119 insertions(+)
diff --git a/repack-promisor.c b/repack-promisor.c
index 90318ce150..6da452e8ff 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -34,6 +34,125 @@ static int write_oid(const struct object_id *oid,
return 0;
}
+/*
+ * Go through all .promisor files contained in repo (excluding those whose name
+ * appears in not_repacked_basenames, which acts as a ignorelist), and copies
+ * their content inside the destination file "<packtmp>-<dest_hex>.promisor".
+ * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
+ * in the write_promisor_file() function).
+ * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
+ * is the time at which the .promisor file was last modified.
+ * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
+ * be copied.
+ * The contents of all .promisor files are assumed to be correctly formed.
+ */
+static void copy_promisor_content(struct repository *repo,
+ const char *dest_hex,
+ const char *packtmp,
+ struct strset *not_repacked_basenames)
+{
+ char *dest_idx_name;
+ char *dest_promisor_name;
+ FILE *dest;
+ struct strset dest_content = STRSET_INIT;
+ struct strbuf dest_to_write = STRBUF_INIT;
+ struct strbuf source_promisor_name = STRBUF_INIT;
+ struct strbuf line = STRBUF_INIT;
+ struct object_id dest_oid;
+ struct packed_git *dest_pack, *p;
+ int err;
+
+ dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
+ get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
+ dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
+
+ /* Open the .promisor dest file, and fill dest_content with its content */
+ dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
+ dest = xfopen(dest_promisor_name, "r+");
+ while (strbuf_getline(&line, dest) != EOF)
+ strset_add(&dest_content, line.buf);
+
+ repo_for_each_pack(repo, p) {
+ FILE *source;
+ struct stat source_stat;
+
+ if (!p->pack_promisor)
+ continue;
+
+ if (not_repacked_basenames &&
+ strset_contains(not_repacked_basenames, pack_basename(p)))
+ continue;
+
+ strbuf_reset(&source_promisor_name);
+ strbuf_addstr(&source_promisor_name, p->pack_name);
+ strbuf_strip_suffix(&source_promisor_name, ".pack");
+ strbuf_addstr(&source_promisor_name, ".promisor");
+
+ if (stat(source_promisor_name.buf, &source_stat))
+ die(_("File not found: %s"), source_promisor_name.buf);
+
+ source = xfopen(source_promisor_name.buf, "r");
+
+ while (strbuf_getline(&line, source) != EOF) {
+ struct strbuf **parts;
+ struct object_id oid;
+
+ /* Split line into <oid>, <ref> and <time> (if <time> exists) */
+ parts = strbuf_split_max(&line, ' ', 3);
+
+ /* Ignore the lines where <oid> doesn't appear in the dest_pack */
+ strbuf_rtrim(parts[0]);
+ get_oid_hex_algop(parts[0]->buf, &oid, repo->hash_algo);
+ if (!find_pack_entry_one(&oid, dest_pack))
+ continue;
+
+ /* If <time> doesn't exist, retrieve it and add it to line */
+ if (!parts[2]) {
+ struct tm tm;
+ localtime_r(&source_stat.st_mtim.tv_sec, &tm),
+ strbuf_addch(&line, ' ');
+ strbuf_addftime(&line, "%Y/%m/%d-%H:%M:%S", &tm, 0, 0);
+ }
+
+ /*
+ * Add the finalized line to dest_to_write and dest_content if it
+ * wasn't already present inside dest_content
+ */
+ if (strset_add(&dest_content, line.buf)) {
+ strbuf_addbuf(&dest_to_write, &line);
+ strbuf_addch(&dest_to_write, '\n');
+ }
+
+ strbuf_list_free(parts);
+ }
+
+ err = ferror(source);
+ err |= fclose(source);
+ if (err)
+ die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
+ }
+
+ /* If dest_to_write is not empty, then there are new lines to append */
+ if (dest_to_write.len) {
+ if (fseek(dest, 0L, SEEK_END))
+ die_errno(_("fseek failed"));
+ fprintf(dest, "%s", dest_to_write.buf);
+ }
+
+ err = ferror(dest);
+ err |= fclose(dest);
+ if (err)
+ die(_("Could not write '%s' promisor file"), dest_promisor_name);
+
+ close_pack_index(dest_pack);
+ free(dest_idx_name);
+ free(dest_promisor_name);
+ strset_clear(&dest_content);
+ strbuf_release(&dest_to_write);
+ strbuf_release(&source_promisor_name);
+ strbuf_release(&line);
+}
+
static void finish_repacking_promisor_objects(struct repository *repo,
struct child_process *cmd,
struct string_list *names,
--
2.53.0.585.g1533fa96a8
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v3 3/5] repack-promisor: preserve content of promisor files after repack
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
@ 2026-04-06 0:25 ` LorenzoPegorari
2026-04-06 0:25 ` [GSoC PATCH v3 4/5] t7700: test for promisor file content " LorenzoPegorari
` (2 subsequent siblings)
5 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-06 0:25 UTC (permalink / raw)
To: git
Cc: Derrick Stolee, Patrick Steinhardt, Taylor Blau, Junio C Hamano,
Elijah Newren, Eric Sunshine
When a repack involving promisor packfiles happens, the new ".promisor"
file is created empty, losing all the debug info that might be present
inside the ".promisor" files before the repack.
Use the "copy_promisor_content()" function created previously to preserve
the contents of all ".promisor" files inside the first ".promisor" file
created by the repack.
For geometric repacking, we have to create a `strset` that contains the
basenames of all excluded packs. For "normal" repacking this is not
necessary, since there should be no excluded packs.
Also, update the documentation accordingly.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
Documentation/git-repack.adoc | 4 ++--
repack-promisor.c | 30 +++++++++++++++++-------------
2 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 673ce91083..33d3c8afbd 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -45,8 +45,8 @@ other objects in that pack they already have locally.
+
Promisor packfiles are repacked separately: if there are packfiles that
have an associated ".promisor" file, these packfiles will be repacked
-into another separate pack, and an empty ".promisor" file corresponding
-to the new separate pack will be written.
+into another separate pack, and a ".promisor" file corresponding to the
+new separate pack will be written (with arbitrary contents).
-A::
Same as `-a`, unless `-d` is used. Then any unreachable
diff --git a/repack-promisor.c b/repack-promisor.c
index 6da452e8ff..37502e0023 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -156,7 +156,8 @@ static void copy_promisor_content(struct repository *repo,
static void finish_repacking_promisor_objects(struct repository *repo,
struct child_process *cmd,
struct string_list *names,
- const char *packtmp)
+ const char *packtmp,
+ struct strset *not_repacked_basenames)
{
struct strbuf line = STRBUF_INIT;
FILE *out;
@@ -174,19 +175,15 @@ static void finish_repacking_promisor_objects(struct repository *repo,
/*
* pack-objects creates the .pack and .idx files, but not the
- * .promisor file. Create the .promisor file, which is empty.
- *
- * NEEDSWORK: fetch-pack sometimes generates non-empty
- * .promisor files containing the ref names and associated
- * hashes at the point of generation of the corresponding
- * packfile, but this would not preserve their contents. Maybe
- * concatenate the contents of all .promisor files instead of
- * just creating a new empty file.
+ * .promisor file. Create the .promisor file.
*/
promisor_name = mkpathdup("%s-%s.promisor", packtmp,
line.buf);
write_promisor_file(promisor_name, NULL, 0);
+ /* Now let's fill the content of the newly created .promisor file */
+ copy_promisor_content(repo, line.buf, packtmp, not_repacked_basenames);
+
item->util = generated_pack_populate(item->string, packtmp);
free(promisor_name);
@@ -226,7 +223,7 @@ void repack_promisor_objects(struct repository *repo,
return;
}
- finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
+ finish_repacking_promisor_objects(repo, &cmd, names, packtmp, NULL);
}
void pack_geometry_repack_promisors(struct repository *repo,
@@ -237,6 +234,7 @@ void pack_geometry_repack_promisors(struct repository *repo,
{
struct child_process cmd = CHILD_PROCESS_INIT;
FILE *in;
+ struct strset not_repacked_basenames = STRSET_INIT;
if (!geometry->promisor_split)
return;
@@ -250,9 +248,15 @@ void pack_geometry_repack_promisors(struct repository *repo,
in = xfdopen(cmd.in, "w");
for (size_t i = 0; i < geometry->promisor_split; i++)
fprintf(in, "%s\n", pack_basename(geometry->promisor_pack[i]));
- for (size_t i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++)
- fprintf(in, "^%s\n", pack_basename(geometry->promisor_pack[i]));
+ for (size_t i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++) {
+ const char *name = pack_basename(geometry->promisor_pack[i]);
+ fprintf(in, "^%s\n", name);
+ strset_add(¬_repacked_basenames, name);
+ }
fclose(in);
- finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
+ finish_repacking_promisor_objects(repo, &cmd, names, packtmp,
+ strset_get_size(¬_repacked_basenames) ? ¬_repacked_basenames : NULL);
+
+ strset_clear(¬_repacked_basenames);
}
--
2.53.0.585.g1533fa96a8
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v3 4/5] t7700: test for promisor file content after repack
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
` (2 preceding siblings ...)
2026-04-06 0:25 ` [GSoC PATCH v3 3/5] repack-promisor: preserve content of promisor files " LorenzoPegorari
@ 2026-04-06 0:25 ` LorenzoPegorari
2026-04-06 22:05 ` Junio C Hamano
2026-04-07 18:10 ` Junio C Hamano
2026-04-06 0:25 ` [GSoC PATCH v3 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
5 siblings, 2 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-06 0:25 UTC (permalink / raw)
To: git
Cc: Derrick Stolee, Patrick Steinhardt, Taylor Blau, Junio C Hamano,
Elijah Newren, Eric Sunshine
Add tests that checks if the content of ".promisor" files are correctly
copied inside the ".promisor" files created by a repack.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
t/t7700-repack.sh | 63 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 63ef63fc50..89a2116641 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -904,4 +904,67 @@ test_expect_success 'pending objects are repacked appropriately' '
)
'
+test_expect_success 'check one .promisor file content after repack' '
+ test_when_finished rm -rf prom_test &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ test_commit_bulk -C prom_test --start=1 1 &&
+
+ # Simulate .promisor file by creating it manually
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >$prom &&
+
+ # Save the current .promisor content, repack, and check if correct
+ prom_before_repack=$(cat $prom) &&
+ git -C prom_test repack -a -d &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ # $prom should contain "$prom_before_repack <date>"
+ test_grep "$prom_before_repack " $prom &&
+
+ # Save the current .promisor content, repack, and check if correct
+ cat $prom >prom_before_repack &&
+ git -C prom_test repack -a -d &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ # $prom should be exactly the same as prom_before_repack
+ test_cmp prom_before_repack $prom
+ )
+'
+
+test_expect_success 'check multiple .promisor file content after repack' '
+ test_when_finished rm -rf prom_test &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 2 packs and simulate .promisor files by creating them manually
+ test_commit_bulk -C prom_test --start=1 1 &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >$prom &&
+ prom_before_repack1=$(cat $prom) &&
+ test_commit_bulk -C prom_test --start=1 1 &&
+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >$prom &&
+ prom_before_repack2=$(cat $prom) &&
+
+ # Repack, and check if correct compared to previous saved .promisor content
+ git -C prom_test repack -a -d &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ # $prom should contain "$prom_before_repack1 <date>" & "$prom_before_repack2 <date>"
+ test_grep "$prom_before_repack1 " $prom &&
+ test_grep "$prom_before_repack2 " $prom &&
+
+ # Save the current .promisor content, repack, and check if correct
+ cat $prom >prom_before_repack &&
+ git -C prom_test repack -a -d &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ # $prom should be exactly the same as prom_before_repack
+ test_cmp prom_before_repack $prom
+ )
+'
+
test_done
--
2.53.0.585.g1533fa96a8
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v3 5/5] t7703: test for promisor file content after geometric repack
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
` (3 preceding siblings ...)
2026-04-06 0:25 ` [GSoC PATCH v3 4/5] t7700: test for promisor file content " LorenzoPegorari
@ 2026-04-06 0:25 ` LorenzoPegorari
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
5 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-06 0:25 UTC (permalink / raw)
To: git
Cc: Derrick Stolee, Patrick Steinhardt, Taylor Blau, Junio C Hamano,
Elijah Newren, Eric Sunshine
Add test that checks if the content of ".promisor" files are correctly
copied inside the ".promisor" files created by a geometric repack.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
t/t7703-repack-geometric.sh | 42 +++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 04d5d8fc33..231db98743 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -541,4 +541,46 @@ test_expect_success 'geometric repack works with promisor packs' '
)
'
+test_expect_success 'check .promisor file content after geometric repack' '
+ test_when_finished rm -rf prom_test &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 2 packs with 3 objs each, and manually create .promisor files
+ test_commit_bulk -C prom_test --start=1 1 && # 3 objects
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >$prom &&
+ prom_before_repack1=$(cat $prom) &&
+ test_commit_bulk -C prom_test --start=2 1 && # 3 objects
+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >$prom &&
+ prom_before_repack2=$(cat $prom) &&
+
+ # Create 2 packs with 12 and 24 objs, and manually create .promisor files
+ test_commit_bulk -C prom_test --start=3 4 && # 12 objects
+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >$prom &&
+ prom_before_repack3=$(cat $prom) &&
+ test_commit_bulk -C prom_test --start=7 8 && # 24 objects
+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >$prom &&
+ prom_before_repack4=$(cat $prom) &&
+
+ # Geometric repack, and check if correct compared to previous saved .promisor content
+ git -C prom_test repack --geometric 2 -d &&
+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
+ # $prom should have repacked only the first 2 small packs, so it should only contain
+ # the following: "$prom_before_repack1 <date>" & "$prom_before_repack2 <date>"
+ test_grep "$prom_before_repack1 " $prom &&
+ test_grep "$prom_before_repack2 " $prom &&
+ test_grep ! $prom_before_repack3 $prom &&
+ test_grep ! $prom_before_repack4 $prom
+ )
+'
+
test_done
--
2.53.0.585.g1533fa96a8
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-06 0:24 ` [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
@ 2026-04-06 17:22 ` Tian Yuchen
2026-04-06 18:40 ` Lorenzo Pegorari
2026-04-06 21:34 ` Junio C Hamano
1 sibling, 1 reply; 78+ messages in thread
From: Tian Yuchen @ 2026-04-06 17:22 UTC (permalink / raw)
To: LorenzoPegorari, git
Cc: Derrick Stolee, Patrick Steinhardt, Taylor Blau, Junio C Hamano,
Elijah Newren, Eric Sunshine
Hi,
On 4/6/26 08:24, LorenzoPegorari wrote:
> + while (strbuf_getline(&line, source) != EOF) {
> + struct strbuf **parts;
> + struct object_id oid;
> +
> + /* Split line into <oid>, <ref> and <time> (if <time> exists) */
> + parts = strbuf_split_max(&line, ' ', 3);
> +
> + /* Ignore the lines where <oid> doesn't appear in the dest_pack */
> + strbuf_rtrim(parts[0]);
> + get_oid_hex_algop(parts[0]->buf, &oid, repo->hash_algo);
> + if (!find_pack_entry_one(&oid, dest_pack))
> + continue;
Memory leak here;
> +
> + /* If <time> doesn't exist, retrieve it and add it to line */
> + if (!parts[2]) {
> + struct tm tm;
> + localtime_r(&source_stat.st_mtim.tv_sec, &tm),
Typo.
> + strbuf_addch(&line, ' ');
> + strbuf_addftime(&line, "%Y/%m/%d-%H:%M:%S", &tm, 0, 0);
> + }
> +
> + /*
> + * Add the finalized line to dest_to_write and dest_content if it
> + * wasn't already present inside dest_content
> + */
> + if (strset_add(&dest_content, line.buf)) {
> + strbuf_addbuf(&dest_to_write, &line);
> + strbuf_addch(&dest_to_write, '\n');
> + }
It looks good elsewhere, at least in this patch 2/5. (ゝ∀・)
Regards, Yuchen
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-06 17:22 ` Tian Yuchen
@ 2026-04-06 18:40 ` Lorenzo Pegorari
2026-04-06 21:17 ` Junio C Hamano
2026-04-07 2:01 ` Junio C Hamano
0 siblings, 2 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-06 18:40 UTC (permalink / raw)
To: Tian Yuchen
Cc: git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Junio C Hamano, Elijah Newren, Eric Sunshine
On Tue, Apr 07, 2026 at 01:22:16AM +0800, Tian Yuchen wrote:
> Hi,
>
> On 4/6/26 08:24, LorenzoPegorari wrote:
>
> > + while (strbuf_getline(&line, source) != EOF) {
> > + struct strbuf **parts;
> > + struct object_id oid;
> > +
> > + /* Split line into <oid>, <ref> and <time> (if <time> exists) */
> > + parts = strbuf_split_max(&line, ' ', 3);
> > +
> > + /* Ignore the lines where <oid> doesn't appear in the dest_pack */
> > + strbuf_rtrim(parts[0]);
> > + get_oid_hex_algop(parts[0]->buf, &oid, repo->hash_algo);
> > + if (!find_pack_entry_one(&oid, dest_pack))
> > + continue;
>
> Memory leak here;
Yep, `strbuf_list_free(parts)` is missing here. Ack.
> > +
> > + /* If <time> doesn't exist, retrieve it and add it to line */
> > + if (!parts[2]) {
> > + struct tm tm;
> > + localtime_r(&source_stat.st_mtim.tv_sec, &tm),
>
> Typo.
Ack.
>
> > + strbuf_addch(&line, ' ');
> > + strbuf_addftime(&line, "%Y/%m/%d-%H:%M:%S", &tm, 0, 0);
> > + }
> > +
> > + /*
> > + * Add the finalized line to dest_to_write and dest_content if it
> > + * wasn't already present inside dest_content
> > + */
> > + if (strset_add(&dest_content, line.buf)) {
> > + strbuf_addbuf(&dest_to_write, &line);
> > + strbuf_addch(&dest_to_write, '\n');
> > + }
>
> It looks good elsewhere, at least in this patch 2/5. (ゝ∀・)
>
> Regards, Yuchen
Thank you Yuchen!
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-06 18:40 ` Lorenzo Pegorari
@ 2026-04-06 21:17 ` Junio C Hamano
2026-04-07 21:46 ` Lorenzo Pegorari
2026-04-07 2:01 ` Junio C Hamano
1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-06 21:17 UTC (permalink / raw)
To: Lorenzo Pegorari
Cc: Tian Yuchen, git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com> writes:
> On Tue, Apr 07, 2026 at 01:22:16AM +0800, Tian Yuchen wrote:
>> Hi,
>>
>> On 4/6/26 08:24, LorenzoPegorari wrote:
>>
>> > + while (strbuf_getline(&line, source) != EOF) {
>> > + struct strbuf **parts;
>> > + struct object_id oid;
>> > +
>> > + /* Split line into <oid>, <ref> and <time> (if <time> exists) */
>> > + parts = strbuf_split_max(&line, ' ', 3);
>> > +
>> > + /* Ignore the lines where <oid> doesn't appear in the dest_pack */
>> > + strbuf_rtrim(parts[0]);
>> > + get_oid_hex_algop(parts[0]->buf, &oid, repo->hash_algo);
>> > + if (!find_pack_entry_one(&oid, dest_pack))
>> > + continue;
>>
>> Memory leak here;
>
> Yep, `strbuf_list_free(parts)` is missing here. Ack.
Also strbuf_split*() is a bad API. Unless you need all the parts[]
strbuf instances all editable at the same time, an array of strbuf
is a data structure that is way overkill. Splitting into string-list
may make it more palatable, I think.
We even went through a series of patches (and follow-up effort by
other contributors) [*] to rewrite callers that unnecessarily call
strbuf_split*().
[References]
https://lore.kernel.org/git/20250731225433.4028872-1-gitster@pobox.com/
https://lore.kernel.org/git/cover.1761217100.git.belkid98@gmail.com/
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-06 0:24 ` [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-04-06 17:22 ` Tian Yuchen
@ 2026-04-06 21:34 ` Junio C Hamano
2026-04-07 22:07 ` Lorenzo Pegorari
1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-06 21:34 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> A ".promisor" file may contain ref names (and their associated hashes)
> that were fetched at the time the corresponding packfile was downloaded.
> This information is used for debugging reasons. This information is
> stored as lines structured like this: "<oid> <ref>".
>
> Create a `copy_promisor_content()` helper function that allows this
> debugging info to not be lost after a `repack`, by coping it inside a new
> ".promisor" file.
"coping" -> "copying"
> The function logic is the following:
> * Take all ".promisor" files contained inside the given `repo`.
> * Ignore those whose name is contained inside the given `strset
> not_repacked_names`, which basically acts as a "promisor ignorelist"
> (intended to be used for packfiles that have not been repacked).
> * Read each line of the remaining ".promisor" files, which can be:
> * "<oid> <ref>" if the ".promisor" file was never repacked. If so,
> add the time at which the ".promisor" file was last modified <time>
> to the line to create the string: "<oid> <ref> <time>".
> * "<oid> <ref> <time>" if the ".promisor" file was repacked. If so,
> don't modify it.
> * Ignore the line if its <oid> is not present inside the
> "<packtmp>-<dest_hex>.idx" file.
> * If the destination file "<packtmp>-<dest_hex>.promisor" does not
> already contain the line, append it to the file.
>
> The function assumes that the contents of all ".promisor" files are
> correctly formed.
>
> The time of last data modification is used in place of the time of file
> creation, because the former is much easier to obtain than the latter
> one.
The time of file creation is not recorded anywhere if you are
dealing with the usual UNIX filesystems (ctime is not creation
time), so it is not the issue of "easier to obtain".
The reason why this design chooses to add time is because in a
never-repacked .promisor file, the modification time of the file
itself can be used when you compare the entries in it with entries
in another .promisor file that did get repacked. By having
timestamp, the debugger can tell at which time the refs at the
remote repository pointed at what object---the same ref may appear
twice in the same .promisor file and having timestamps would help
understanding what happened over time.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 4/5] t7700: test for promisor file content after repack
2026-04-06 0:25 ` [GSoC PATCH v3 4/5] t7700: test for promisor file content " LorenzoPegorari
@ 2026-04-06 22:05 ` Junio C Hamano
2026-04-07 23:28 ` Lorenzo Pegorari
2026-04-07 18:10 ` Junio C Hamano
1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-06 22:05 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> +test_expect_success 'check one .promisor file content after repack' '
> + test_when_finished rm -rf prom_test &&
> + git init prom_test &&
> + path=prom_test/.git/objects/pack &&
> +
> + (
> + test_commit_bulk -C prom_test --start=1 1 &&
> +
> + # Simulate .promisor file by creating it manually
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
So "prom" is a list of filenames; since $path does not have any
funny letters that interferes, later use of unquotd $prom will
list these files. OK.
> + oid=$(git -C prom_test rev-parse HEAD) &&
> + echo "$oid ref" >$prom &&
Oh, not quite. How are we guaranteeing that there is only one file
in the list of files in $prom?
In any case, quoting from Documentation/CodingGuidelines:
- Redirection operators should be written with space before, but no
space after them. In other words, write 'echo test >"$file"'
instead of 'echo test> $file' or 'echo test > $file'. Note that
even though it is not required by POSIX to double-quote the
redirection target in a variable (as shown above), our code does so
because some versions of bash issue a warning without the quotes.
(incorrect)
cat hello > world < universe
echo hello >$world
(correct)
cat hello >world <universe
echo hello >"$world"
> + # Save the current .promisor content, repack, and check if correct
> + prom_before_repack=$(cat $prom) &&
This is misleading, unless you plan to update the early part of this
test to store a more realistic data in the $prom file. Wouldn't it
be equivanent to
prom_before_repack="$oid ref" &&
at this point?
> + git -C prom_test repack -a -d &&
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
We expect that there is only one .pack and .promisor file. Why
are we listing .pack and turning them to .promisor, instead of doing
prom=$(ls $path/*.promisor) &&
here? Don't we expect that this "repack" to recreate .promisor file
as well (and if we do not see the file then we detected another bug,
which is a good thing)?
> + # $prom should contain "$prom_before_repack <date>"
> + test_grep "$prom_before_repack " $prom &&
I do not quite understand this test. Ahh, OK. We expect that there
was only a single entry in the original, because that is what we
placed in the original .promisor file.
Enclose $prom inside a pair of double quotes, as it is misleading
without. I wasted a few minutes wondering where you are expecting
these possibly multiple promisor files from.
> + # Save the current .promisor content, repack, and check if correct
> + cat $prom >prom_before_repack &&
cp "$prom" prom_before_repack &&
would be more standard.
> + git -C prom_test repack -a -d &&
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
The same comment about "don't we know .promisor file should exist,
and shouldn't we check it directly?" applies here.
> + # $prom should be exactly the same as prom_before_repack
> + test_cmp prom_before_repack $prom
> + )
> +'
Same comment applies from earlier to the next test piece, I suspect.
Let's take a look.
> +
> +test_expect_success 'check multiple .promisor file content after repack' '
> + test_when_finished rm -rf prom_test &&
> + git init prom_test &&
> + path=prom_test/.git/objects/pack &&
> +
> + (
> + # Create 2 packs and simulate .promisor files by creating them manually
> + test_commit_bulk -C prom_test --start=1 1 &&
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> + oid=$(git -C prom_test rev-parse HEAD) &&
> + echo "$oid ref" >$prom &&
> + prom_before_repack1=$(cat $prom) &&
> + test_commit_bulk -C prom_test --start=1 1 &&
> + prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
Do not pipe head into sed, as sed is more capable.
ls -t $path/*.pack | sed "s/.../;q"
> + oid=$(git -C prom_test rev-parse HEAD) &&
> + echo "$oid ref" >$prom &&
> + prom_before_repack2=$(cat $prom) &&
But more importantly, this may become a source of flakiness. These
two packfiles are likely to have very close timestamps and depending
on the timing, how heavily loaded the machine is, and the phase of
the moon, it is not guaranteed that you'd grab the name of the new
pack. Instead of sorting by type or getting the first one, which
would not work reliably, grab both and filter out what you already
have seen.
> + # Repack, and check if correct compared to previous saved .promisor content
> + git -C prom_test repack -a -d &&
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> + # $prom should contain "$prom_before_repack1 <date>" & "$prom_before_repack2 <date>"
> + test_grep "$prom_before_repack1 " $prom &&
> + test_grep "$prom_before_repack2 " $prom &&
> +
> + # Save the current .promisor content, repack, and check if correct
> + cat $prom >prom_before_repack &&
> + git -C prom_test repack -a -d &&
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> + # $prom should be exactly the same as prom_before_repack
> + test_cmp prom_before_repack $prom
> + )
> +'
> +
> test_done
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-06 18:40 ` Lorenzo Pegorari
2026-04-06 21:17 ` Junio C Hamano
@ 2026-04-07 2:01 ` Junio C Hamano
2026-04-07 21:52 ` Lorenzo Pegorari
1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-07 2:01 UTC (permalink / raw)
To: Lorenzo Pegorari
Cc: Tian Yuchen, git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com> writes:
> On Tue, Apr 07, 2026 at 01:22:16AM +0800, Tian Yuchen wrote:
>> Hi,
>>
>> On 4/6/26 08:24, LorenzoPegorari wrote:
>>
>> > + while (strbuf_getline(&line, source) != EOF) {
>> > + struct strbuf **parts;
>> > + struct object_id oid;
>> > +
>> > + /* Split line into <oid>, <ref> and <time> (if <time> exists) */
>> > + parts = strbuf_split_max(&line, ' ', 3);
>> > +
>> > + /* Ignore the lines where <oid> doesn't appear in the dest_pack */
>> > + strbuf_rtrim(parts[0]);
>> > + get_oid_hex_algop(parts[0]->buf, &oid, repo->hash_algo);
>> > + if (!find_pack_entry_one(&oid, dest_pack))
>> > + continue;
>>
>> Memory leak here;
>
> Yep, `strbuf_list_free(parts)` is missing here. Ack.
>
>> > +
>> > + /* If <time> doesn't exist, retrieve it and add it to line */
>> > + if (!parts[2]) {
>> > + struct tm tm;
>> > + localtime_r(&source_stat.st_mtim.tv_sec, &tm),
>>
>> Typo.
>
> Ack.
Not just an unintended use of comma operator, this is not portable
and breaks OSX build
https://github.com/git/git/actions/runs/24058681172/job/70170218891#step:4:213
>> > + strbuf_addch(&line, ' ');
>> > + strbuf_addftime(&line, "%Y/%m/%d-%H:%M:%S", &tm, 0, 0);
I suspect that storing seconds since epoch as a large integer would
be simpler and much less error prone than storing localtime in
textual form without even recording the timezone.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 4/5] t7700: test for promisor file content after repack
2026-04-06 0:25 ` [GSoC PATCH v3 4/5] t7700: test for promisor file content " LorenzoPegorari
2026-04-06 22:05 ` Junio C Hamano
@ 2026-04-07 18:10 ` Junio C Hamano
2026-04-07 23:11 ` Lorenzo Pegorari
1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-07 18:10 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> Add tests that checks if the content of ".promisor" files are correctly
> copied inside the ".promisor" files created by a repack.
>
> Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> ---
> t/t7700-repack.sh | 63 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 63 insertions(+)
>
> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 63ef63fc50..89a2116641 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -904,4 +904,67 @@ test_expect_success 'pending objects are repacked appropriately' '
> )
> '
>
> +test_expect_success 'check one .promisor file content after repack' '
> + test_when_finished rm -rf prom_test &&
> + git init prom_test &&
> + path=prom_test/.git/objects/pack &&
> +
> + (
> + test_commit_bulk -C prom_test --start=1 1 &&
> +
> + # Simulate .promisor file by creating it manually
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> + oid=$(git -C prom_test rev-parse HEAD) &&
> + echo "$oid ref" >$prom &&
> +
> + # Save the current .promisor content, repack, and check if correct
> + prom_before_repack=$(cat $prom) &&
> + git -C prom_test repack -a -d &&
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> + # $prom should contain "$prom_before_repack <date>"
> + test_grep "$prom_before_repack " $prom &&
> +
> + # Save the current .promisor content, repack, and check if correct
> + cat $prom >prom_before_repack &&
> + git -C prom_test repack -a -d &&
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> + # $prom should be exactly the same as prom_before_repack
> + test_cmp prom_before_repack $prom
> + )
> +'
> +
> +test_expect_success 'check multiple .promisor file content after repack' '
> +...
> +
> + # Repack, and check if correct compared to previous saved .promisor content
> + git -C prom_test repack -a -d &&
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> + # $prom should contain "$prom_before_repack1 <date>" & "$prom_before_repack2 <date>"
> + test_grep "$prom_before_repack1 " $prom &&
> + test_grep "$prom_before_repack2 " $prom &&
This test seems to be flakey.
https://github.com/git/git/actions/runs/24095497271/job/70292906676#step:10:5274
shows that $prom gets two file names, and because test_grep is
expecting a single source to grep inside, the first test_grep
fails.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-06 21:17 ` Junio C Hamano
@ 2026-04-07 21:46 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-07 21:46 UTC (permalink / raw)
To: Junio C Hamano
Cc: Tian Yuchen, git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
On Mon, Apr 06, 2026 at 02:17:02PM -0700, Junio C Hamano wrote:
> Also strbuf_split*() is a bad API. Unless you need all the parts[]
> strbuf instances all editable at the same time, an array of strbuf
> is a data structure that is way overkill. Splitting into string-list
> may make it more palatable, I think.
>
> We even went through a series of patches (and follow-up effort by
> other contributors) [*] to rewrite callers that unnecessarily call
> strbuf_split*().
>
> [References]
> https://lore.kernel.org/git/20250731225433.4028872-1-gitster@pobox.com/
> https://lore.kernel.org/git/cover.1761217100.git.belkid98@gmail.com/
Mhm makes perfect sense. I will rewrite it using `string_list`. Thanks!
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-07 2:01 ` Junio C Hamano
@ 2026-04-07 21:52 ` Lorenzo Pegorari
2026-04-07 22:03 ` Junio C Hamano
0 siblings, 1 reply; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-07 21:52 UTC (permalink / raw)
To: Junio C Hamano
Cc: Tian Yuchen, git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
On Mon, Apr 06, 2026 at 07:01:18PM -0700, Junio C Hamano wrote:
> Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com> writes:
> > On Tue, Apr 07, 2026 at 01:22:16AM +0800, Tian Yuchen wrote:
> >> On 4/6/26 08:24, LorenzoPegorari wrote:
> >> > +
> >> > + /* If <time> doesn't exist, retrieve it and add it to line */
> >> > + if (!parts[2]) {
> >> > + struct tm tm;
> >> > + localtime_r(&source_stat.st_mtim.tv_sec, &tm),
> >>
> >> Typo.
> >
> > Ack.
>
> Not just an unintended use of comma operator, this is not portable
> and breaks OSX build
>
> https://github.com/git/git/actions/runs/24058681172/job/70170218891#step:4:213
Yeah, I was shocked that it compiled at all on my system with no issue
whatsoever.
> >> > + strbuf_addch(&line, ' ');
> >> > + strbuf_addftime(&line, "%Y/%m/%d-%H:%M:%S", &tm, 0, 0);
>
> I suspect that storing seconds since epoch as a large integer would
> be simpler and much less error prone than storing localtime in
> textual form without even recording the timezone.
Yeah, maybe for this kinda specific debugging info being less error
prone is more important then "looking good". Will do that.
Thanks,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-07 21:52 ` Lorenzo Pegorari
@ 2026-04-07 22:03 ` Junio C Hamano
0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2026-04-07 22:03 UTC (permalink / raw)
To: Lorenzo Pegorari
Cc: Tian Yuchen, git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com> writes:
> On Mon, Apr 06, 2026 at 07:01:18PM -0700, Junio C Hamano wrote:
>> Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com> writes:
>> > On Tue, Apr 07, 2026 at 01:22:16AM +0800, Tian Yuchen wrote:
>> >> On 4/6/26 08:24, LorenzoPegorari wrote:
>> >> > +
>> >> > + /* If <time> doesn't exist, retrieve it and add it to line */
>> >> > + if (!parts[2]) {
>> >> > + struct tm tm;
>> >> > + localtime_r(&source_stat.st_mtim.tv_sec, &tm),
>> >>
>> >> Typo.
>> >
>> > Ack.
>>
>> Not just an unintended use of comma operator, this is not portable
>> and breaks OSX build
>>
>> https://github.com/git/git/actions/runs/24058681172/job/70170218891#step:4:213
>
> Yeah, I was shocked that it compiled at all on my system with no issue
> whatsoever.
>
>> >> > + strbuf_addch(&line, ' ');
>> >> > + strbuf_addftime(&line, "%Y/%m/%d-%H:%M:%S", &tm, 0, 0);
>>
>> I suspect that storing seconds since epoch as a large integer would
>> be simpler and much less error prone than storing localtime in
>> textual form without even recording the timezone.
>
> Yeah, maybe for this kinda specific debugging info being less error
> prone is more important then "looking good". Will do that.
>
> Thanks,
> Lorenzo
Before moving on, please fetch what I pushed out and see if
SQUASH??? commit I made on top (you should be able to find it out of
'seen') is any useful for your update.
Thanks.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack
2026-04-06 21:34 ` Junio C Hamano
@ 2026-04-07 22:07 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-07 22:07 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
On Mon, Apr 06, 2026 at 02:34:32PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > A ".promisor" file may contain ref names (and their associated hashes)
> > that were fetched at the time the corresponding packfile was downloaded.
> > This information is used for debugging reasons. This information is
> > stored as lines structured like this: "<oid> <ref>".
> >
> > Create a `copy_promisor_content()` helper function that allows this
> > debugging info to not be lost after a `repack`, by coping it inside a new
> > ".promisor" file.
>
> "coping" -> "copying"
Ack.
> > The function logic is the following:
> > * Take all ".promisor" files contained inside the given `repo`.
> > * Ignore those whose name is contained inside the given `strset
> > not_repacked_names`, which basically acts as a "promisor ignorelist"
> > (intended to be used for packfiles that have not been repacked).
> > * Read each line of the remaining ".promisor" files, which can be:
> > * "<oid> <ref>" if the ".promisor" file was never repacked. If so,
> > add the time at which the ".promisor" file was last modified <time>
> > to the line to create the string: "<oid> <ref> <time>".
> > * "<oid> <ref> <time>" if the ".promisor" file was repacked. If so,
> > don't modify it.
> > * Ignore the line if its <oid> is not present inside the
> > "<packtmp>-<dest_hex>.idx" file.
> > * If the destination file "<packtmp>-<dest_hex>.promisor" does not
> > already contain the line, append it to the file.
> >
> > The function assumes that the contents of all ".promisor" files are
> > correctly formed.
> >
> > The time of last data modification is used in place of the time of file
> > creation, because the former is much easier to obtain than the latter
> > one.
>
> The time of file creation is not recorded anywhere if you are
> dealing with the usual UNIX filesystems (ctime is not creation
> time), so it is not the issue of "easier to obtain".
That's what I found out during my researches, but I wasn't sure. Thanks
for confirming it.
> The reason why this design chooses to add time is because in a
> never-repacked .promisor file, the modification time of the file
> itself can be used when you compare the entries in it with entries
> in another .promisor file that did get repacked. By having
> timestamp, the debugger can tell at which time the refs at the
> remote repository pointed at what object---the same ref may appear
> twice in the same .promisor file and having timestamps would help
> understanding what happened over time.
Exactly. I'll improve the commit message to better explain the utility
of these timestamps.
Thanks,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 4/5] t7700: test for promisor file content after repack
2026-04-07 18:10 ` Junio C Hamano
@ 2026-04-07 23:11 ` Lorenzo Pegorari
2026-04-08 0:38 ` Lorenzo Pegorari
0 siblings, 1 reply; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-07 23:11 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
On Tue, Apr 07, 2026 at 11:10:02AM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > Add tests that checks if the content of ".promisor" files are correctly
> > copied inside the ".promisor" files created by a repack.
> >
> > Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> > ---
> > t/t7700-repack.sh | 63 +++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 63 insertions(+)
> >
> > diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> > index 63ef63fc50..89a2116641 100755
> > --- a/t/t7700-repack.sh
> > +++ b/t/t7700-repack.sh
> > @@ -904,4 +904,67 @@ test_expect_success 'pending objects are repacked appropriately' '
> > )
> > '
> >
> > +test_expect_success 'check one .promisor file content after repack' '
> > + test_when_finished rm -rf prom_test &&
> > + git init prom_test &&
> > + path=prom_test/.git/objects/pack &&
> > +
> > + (
> > + test_commit_bulk -C prom_test --start=1 1 &&
> > +
> > + # Simulate .promisor file by creating it manually
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > + oid=$(git -C prom_test rev-parse HEAD) &&
> > + echo "$oid ref" >$prom &&
> > +
> > + # Save the current .promisor content, repack, and check if correct
> > + prom_before_repack=$(cat $prom) &&
> > + git -C prom_test repack -a -d &&
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > + # $prom should contain "$prom_before_repack <date>"
> > + test_grep "$prom_before_repack " $prom &&
> > +
> > + # Save the current .promisor content, repack, and check if correct
> > + cat $prom >prom_before_repack &&
> > + git -C prom_test repack -a -d &&
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > + # $prom should be exactly the same as prom_before_repack
> > + test_cmp prom_before_repack $prom
> > + )
> > +'
> > +
> > +test_expect_success 'check multiple .promisor file content after repack' '
> > +...
> > +
> > + # Repack, and check if correct compared to previous saved .promisor content
> > + git -C prom_test repack -a -d &&
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > + # $prom should contain "$prom_before_repack1 <date>" & "$prom_before_repack2 <date>"
> > + test_grep "$prom_before_repack1 " $prom &&
> > + test_grep "$prom_before_repack2 " $prom &&
>
> This test seems to be flakey.
>
> https://github.com/git/git/actions/runs/24095497271/job/70292906676#step:10:5274
>
> shows that $prom gets two file names, and because test_grep is
> expecting a single source to grep inside, the first test_grep
> fails.
Uff yeah, I see.
I also saw your other mail regarding the "SQUASH???" commit you made
(inside the `seen` branch). I'm not so sure if it is useful to solve
this issue tho.
It looks like, for some reason, `repack -a` fails to repack everything
into a single pack, but I believe that `repack -a -f` should force it to
repack everything no matter what (I think??).
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 4/5] t7700: test for promisor file content after repack
2026-04-06 22:05 ` Junio C Hamano
@ 2026-04-07 23:28 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-07 23:28 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
On Mon, Apr 06, 2026 at 03:05:37PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > +test_expect_success 'check one .promisor file content after repack' '
> > + test_when_finished rm -rf prom_test &&
> > + git init prom_test &&
> > + path=prom_test/.git/objects/pack &&
> > +
> > + (
> > + test_commit_bulk -C prom_test --start=1 1 &&
> > +
> > + # Simulate .promisor file by creating it manually
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
>
> So "prom" is a list of filenames; since $path does not have any
> funny letters that interferes, later use of unquotd $prom will
> list these files. OK.
`test_commit_bulk` creates a single ".pack" file. We use `$prom` to
create the associated ".promisor" file.
> > + oid=$(git -C prom_test rev-parse HEAD) &&
> > + echo "$oid ref" >$prom &&
>
> Oh, not quite. How are we guaranteeing that there is only one file
> in the list of files in $prom?
Yes, because `test_commit_bulk` specifically creates a single pack.
> In any case, quoting from Documentation/CodingGuidelines:
>
> - Redirection operators should be written with space before, but no
> space after them. In other words, write 'echo test >"$file"'
> instead of 'echo test> $file' or 'echo test > $file'. Note that
> even though it is not required by POSIX to double-quote the
> redirection target in a variable (as shown above), our code does so
> because some versions of bash issue a warning without the quotes.
>
> (incorrect)
> cat hello > world < universe
> echo hello >$world
>
> (correct)
> cat hello >world <universe
> echo hello >"$world"
Ack.
> > + # Save the current .promisor content, repack, and check if correct
> > + prom_before_repack=$(cat $prom) &&
>
> This is misleading, unless you plan to update the early part of this
> test to store a more realistic data in the $prom file. Wouldn't it
> be equivanent to
>
> prom_before_repack="$oid ref" &&
>
> at this point?
Yeah, simply using "$oid ref" would be better at this point. One less
confusing variable.
> > + git -C prom_test repack -a -d &&
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
>
> We expect that there is only one .pack and .promisor file. Why
> are we listing .pack and turning them to .promisor, instead of doing
>
> prom=$(ls $path/*.promisor) &&
>
> here? Don't we expect that this "repack" to recreate .promisor file
> as well (and if we do not see the file then we detected another bug,
> which is a good thing)?
100% correct. I just copied it from above without thinking enough. Will
do this. Thanks!
> > + # $prom should contain "$prom_before_repack <date>"
> > + test_grep "$prom_before_repack " $prom &&
>
> I do not quite understand this test. Ahh, OK. We expect that there
> was only a single entry in the original, because that is what we
> placed in the original .promisor file.
Exactly.
> Enclose $prom inside a pair of double quotes, as it is misleading
> without. I wasted a few minutes wondering where you are expecting
> these possibly multiple promisor files from.
Will do that.
> > + # Save the current .promisor content, repack, and check if correct
> > + cat $prom >prom_before_repack &&
>
> cp "$prom" prom_before_repack &&
>
> would be more standard.
Ack.
> > + git -C prom_test repack -a -d &&
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
>
> The same comment about "don't we know .promisor file should exist,
> and shouldn't we check it directly?" applies here.
True. Ack.
> > + # $prom should be exactly the same as prom_before_repack
> > + test_cmp prom_before_repack $prom
> > + )
> > +'
>
>
> Same comment applies from earlier to the next test piece, I suspect.
> Let's take a look.
>
> > +
> > +test_expect_success 'check multiple .promisor file content after repack' '
> > + test_when_finished rm -rf prom_test &&
> > + git init prom_test &&
> > + path=prom_test/.git/objects/pack &&
> > +
> > + (
> > + # Create 2 packs and simulate .promisor files by creating them manually
> > + test_commit_bulk -C prom_test --start=1 1 &&
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > + oid=$(git -C prom_test rev-parse HEAD) &&
> > + echo "$oid ref" >$prom &&
> > + prom_before_repack1=$(cat $prom) &&
>
> > + test_commit_bulk -C prom_test --start=1 1 &&
> > + prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
>
> Do not pipe head into sed, as sed is more capable.
>
> ls -t $path/*.pack | sed "s/.../;q"
Ack.
> > + oid=$(git -C prom_test rev-parse HEAD) &&
> > + echo "$oid ref" >$prom &&
> > + prom_before_repack2=$(cat $prom) &&
>
> But more importantly, this may become a source of flakiness. These
> two packfiles are likely to have very close timestamps and depending
> on the timing, how heavily loaded the machine is, and the phase of
> the moon, it is not guaranteed that you'd grab the name of the new
> pack. Instead of sorting by type or getting the first one, which
> would not work reliably, grab both and filter out what you already
> have seen.
Makes sense. Ack
> > + # Repack, and check if correct compared to previous saved .promisor content
> > + git -C prom_test repack -a -d &&
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > + # $prom should contain "$prom_before_repack1 <date>" & "$prom_before_repack2 <date>"
> > + test_grep "$prom_before_repack1 " $prom &&
> > + test_grep "$prom_before_repack2 " $prom &&
> > +
> > + # Save the current .promisor content, repack, and check if correct
> > + cat $prom >prom_before_repack &&
> > + git -C prom_test repack -a -d &&
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > + # $prom should be exactly the same as prom_before_repack
> > + test_cmp prom_before_repack $prom
> > + )
> > +'
> > +
> > test_done
Thank you so much Junio,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v3 4/5] t7700: test for promisor file content after repack
2026-04-07 23:11 ` Lorenzo Pegorari
@ 2026-04-08 0:38 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-08 0:38 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Derrick Stolee, Patrick Steinhardt, Taylor Blau,
Elijah Newren, Eric Sunshine
On Wed, Apr 08, 2026 at 01:11:42AM +0200, Lorenzo Pegorari wrote:
> On Tue, Apr 07, 2026 at 11:10:02AM -0700, Junio C Hamano wrote:
> > LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> >
> > > Add tests that checks if the content of ".promisor" files are correctly
> > > copied inside the ".promisor" files created by a repack.
> > >
> > > Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> > > ---
> > > t/t7700-repack.sh | 63 +++++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 63 insertions(+)
> > >
> > > diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> > > index 63ef63fc50..89a2116641 100755
> > > --- a/t/t7700-repack.sh
> > > +++ b/t/t7700-repack.sh
> > > @@ -904,4 +904,67 @@ test_expect_success 'pending objects are repacked appropriately' '
> > > )
> > > '
> > >
> > > +test_expect_success 'check one .promisor file content after repack' '
> > > + test_when_finished rm -rf prom_test &&
> > > + git init prom_test &&
> > > + path=prom_test/.git/objects/pack &&
> > > +
> > > + (
> > > + test_commit_bulk -C prom_test --start=1 1 &&
> > > +
> > > + # Simulate .promisor file by creating it manually
> > > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > > + oid=$(git -C prom_test rev-parse HEAD) &&
> > > + echo "$oid ref" >$prom &&
> > > +
> > > + # Save the current .promisor content, repack, and check if correct
> > > + prom_before_repack=$(cat $prom) &&
> > > + git -C prom_test repack -a -d &&
> > > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > > + # $prom should contain "$prom_before_repack <date>"
> > > + test_grep "$prom_before_repack " $prom &&
> > > +
> > > + # Save the current .promisor content, repack, and check if correct
> > > + cat $prom >prom_before_repack &&
> > > + git -C prom_test repack -a -d &&
> > > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > > + # $prom should be exactly the same as prom_before_repack
> > > + test_cmp prom_before_repack $prom
> > > + )
> > > +'
> > > +
> > > +test_expect_success 'check multiple .promisor file content after repack' '
> > > +...
> > > +
> > > + # Repack, and check if correct compared to previous saved .promisor content
> > > + git -C prom_test repack -a -d &&
> > > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
> > > + # $prom should contain "$prom_before_repack1 <date>" & "$prom_before_repack2 <date>"
> > > + test_grep "$prom_before_repack1 " $prom &&
> > > + test_grep "$prom_before_repack2 " $prom &&
> >
> > This test seems to be flakey.
> >
> > https://github.com/git/git/actions/runs/24095497271/job/70292906676#step:10:5274
> >
> > shows that $prom gets two file names, and because test_grep is
> > expecting a single source to grep inside, the first test_grep
> > fails.
>
> Uff yeah, I see.
>
> I also saw your other mail regarding the "SQUASH???" commit you made
> (inside the `seen` branch). I'm not so sure if it is useful to solve
> this issue tho.
>
> It looks like, for some reason, `repack -a` fails to repack everything
> into a single pack, but I believe that `repack -a -f` should force it to
> repack everything no matter what (I think??).
Ok, I think that I have fixed it. Forked the repo and tested multiple
times with the GitHub Actions-based CI. Issue doesn't appear anymore.
Thank you super much for the help Junio.
May I ask you for a tiny bit more of your time to answer the following
question that I originally asked in the cover letter:
>The "CodingGuidelines" explicitly state that:
> "A C file must directly include the header files that declare the
> functions and the types it uses, except for the functions and types
> that are made available to it by including one of the header files
> it must include by the previous rule"
> where "the previous rule" is (if I understand correctly), the one related
> to "<git-compat-util.h>". From what I understand then, I should have
> added an include for "strmap.h" (which is needed for `strset`), correct?
> And if I am correct, shouldn't "strbuf.h", "hash.h", "odb.h",
> "string-list.h" and "strvec.h" also be included?
Thank you so much in advance. Pretty sure the next patch version will be
the last one.
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH v4 0/5] preserve promisor files content after repack
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
` (4 preceding siblings ...)
2026-04-06 0:25 ` [GSoC PATCH v3 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
@ 2026-04-10 15:01 ` LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
` (7 more replies)
5 siblings, 8 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 15:01 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Patrick Steinhardt, Derrick Stolee, Junio C Hamano,
Elijah Newren, Eric Sunshine, Tian Yuchen
The goal of this patch is to solve the NEEDSWORK comment added by
5374a290 (fetch-pack: write fetched refs to .promisor, 14/10/2019). This
is done by adding a helper function that takes the content of all
.promisor files in the `repository`, and copies it inside the first
.promisor file created by the repack.
Also, I added a comment explaining what is the purpose of the content of
the .promisor files, since this wasn't explained anywhere (I found
information regarding this only in the message of the previously cited
commit).
Finally, I added some tests to "t7700-repack.sh" and
"t7703-repack-geometric.sh" that check if the content of .promisor files
are correctly copied into the .promisor files created by a repack.
If Eric Sunshine, Tian Yuchen (for the patch 2/5 "pack-write: add helper
to fill promisor file after repack") and Junio Hamano (for all patches)
want to be added with a `<Reviewed-by>` tag, please let me know (and, of
course, thanks a lot for the help)!
QUESTION:
The "CodingGuidelines" explicitly state that:
"A C file must directly include the header files that declare the
functions and the types it uses, except for the functions and types
that are made available to it by including one of the header files
it must include by the previous rule"
where "the previous rule" is (if I understand correctly), the one related
to "<git-compat-util.h>". From what I understand then, I should have
added an include for "strmap.h" (which is needed for `strset`), correct?
And if I am correct, shouldn't "strbuf.h", "hash.h", "odb.h",
"string-list.h" and "strvec.h" also be included?
V4 DIFF:
* `copy_promisor_content()` now prints timestamps in Unix time format.
* `copy_promisor_content()` now doesn't use a list of `strbuf`, but
instead uses the more lightweight `string_list`.
* improved the tests.
* fixed issue (that showed up in the GitHub Actions-based CI) where
sometimes the 2 packs created in the second new test inside "t7700"
were not both repacked into a single new pack.
LorenzoPegorari (5):
pack-write: add explanation to promisor file content
pack-write: add helper to fill promisor file after repack
repack-promisor: preserve content of promisor files after repack
t7700: test for promisor file content after repack
t7703: test for promisor file content after geometric repack
Documentation/git-repack.adoc | 4 +-
pack-write.c | 9 +++
repack-promisor.c | 146 +++++++++++++++++++++++++++++++---
t/t7700-repack.sh | 60 ++++++++++++++
t/t7703-repack-geometric.sh | 33 ++++++++
5 files changed, 237 insertions(+), 15 deletions(-)
Range-diff against v3:
1: eb1964dca8 = 1: b4990fcdf0 pack-write: add explanation to promisor file content
2: 3cd1542919 ! 2: 34c4e79311 pack-write: add helper to fill promisor file after repack
@@ Commit message
stored as lines structured like this: "<oid> <ref>".
Create a `copy_promisor_content()` helper function that allows this
- debugging info to not be lost after a `repack`, by coping it inside a new
- ".promisor" file.
+ debugging info to not be lost after a `repack`, by copying it inside a
+ new ".promisor" file.
The function logic is the following:
* Take all ".promisor" files contained inside the given `repo`.
@@ Commit message
(intended to be used for packfiles that have not been repacked).
* Read each line of the remaining ".promisor" files, which can be:
* "<oid> <ref>" if the ".promisor" file was never repacked. If so,
- add the time at which the ".promisor" file was last modified <time>
- to the line to create the string: "<oid> <ref> <time>".
+ add the time (in Unix time) at which the ".promisor" file was last
+ modified <time> to the line, to obtain: "<oid> <ref> <time>".
* "<oid> <ref> <time>" if the ".promisor" file was repacked. If so,
don't modify it.
* Ignore the line if its <oid> is not present inside the
@@ Commit message
The function assumes that the contents of all ".promisor" files are
correctly formed.
- The time of last data modification is used in place of the time of file
- creation, because the former is much easier to obtain than the latter
- one.
+ The time of last data modification, for never-repacked ".promisor" file,
+ can be used when comparing the entries in it with entries in another
+ ".promisor" file that did get repacked. With these timestamps, the
+ debugger will be able to tell at which time the refs at the remote
+ repository pointed at what object. Also, when looking at already
+ repacked ".promisor" files, the same ref may appear multiple times, and
+ having timestamps will help understanding what happened over time.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
@@ repack-promisor.c: static int write_oid(const struct object_id *oid,
+ * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
+ * in the write_promisor_file() function).
+ * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
-+ * is the time at which the .promisor file was last modified.
++ * is the time (in Unix time) at which the .promisor file was last modified.
+ * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
+ * be copied.
+ * The contents of all .promisor files are assumed to be correctly formed.
@@ repack-promisor.c: static int write_oid(const struct object_id *oid,
+ source = xfopen(source_promisor_name.buf, "r");
+
+ while (strbuf_getline(&line, source) != EOF) {
-+ struct strbuf **parts;
++ struct string_list line_sections = STRING_LIST_INIT_DUP;
+ struct object_id oid;
+
+ /* Split line into <oid>, <ref> and <time> (if <time> exists) */
-+ parts = strbuf_split_max(&line, ' ', 3);
++ string_list_split(&line_sections, line.buf, " ", 3);
+
+ /* Ignore the lines where <oid> doesn't appear in the dest_pack */
-+ strbuf_rtrim(parts[0]);
-+ get_oid_hex_algop(parts[0]->buf, &oid, repo->hash_algo);
-+ if (!find_pack_entry_one(&oid, dest_pack))
++ get_oid_hex_algop(line_sections.items[0].string, &oid, repo->hash_algo);
++ if (!find_pack_entry_one(&oid, dest_pack)) {
++ string_list_clear(&line_sections, 0);
+ continue;
++ }
+
+ /* If <time> doesn't exist, retrieve it and add it to line */
-+ if (!parts[2]) {
-+ struct tm tm;
-+ localtime_r(&source_stat.st_mtim.tv_sec, &tm),
-+ strbuf_addch(&line, ' ');
-+ strbuf_addftime(&line, "%Y/%m/%d-%H:%M:%S", &tm, 0, 0);
-+ }
++ if (line_sections.nr < 3)
++ strbuf_addf(&line, " %lld", (long long int)source_stat.st_mtim.tv_sec);
+
+ /*
+ * Add the finalized line to dest_to_write and dest_content if it
@@ repack-promisor.c: static int write_oid(const struct object_id *oid,
+ strbuf_addch(&dest_to_write, '\n');
+ }
+
-+ strbuf_list_free(parts);
++ string_list_clear(&line_sections, 0);
+ }
+
+ err = ferror(source);
3: c16b1198fd = 3: 72ef2378b9 repack-promisor: preserve content of promisor files after repack
4: 8e58c1263d ! 4: 0aceaed480 t7700: test for promisor file content after repack
@@ Metadata
## Commit message ##
t7700: test for promisor file content after repack
- Add tests that checks if the content of ".promisor" files are correctly
+ Add tests that check if the content of ".promisor" files are correctly
copied inside the ".promisor" files created by a repack.
+ The `-f` flag is used when repacking to ensure that all the packs
+ (created with `test_commit_bulk`) are repacked into a single new pack.
+
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
## t/t7700-repack.sh ##
@@ t/t7700-repack.sh: test_expect_success 'pending objects are repacked appropriate
'
+test_expect_success 'check one .promisor file content after repack' '
-+ test_when_finished rm -rf prom_test &&
++ test_when_finished rm -rf prom_test prom_before_repack &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
-+ test_commit_bulk -C prom_test --start=1 1 &&
-+
++ test_commit_bulk -C prom_test 1 &&
++
+ # Simulate .promisor file by creating it manually
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
-+ echo "$oid ref" >$prom &&
++ echo "$oid ref" >"$prom" &&
+
-+ # Save the current .promisor content, repack, and check if correct
-+ prom_before_repack=$(cat $prom) &&
-+ git -C prom_test repack -a -d &&
-+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
-+ # $prom should contain "$prom_before_repack <date>"
-+ test_grep "$prom_before_repack " $prom &&
++ # Repack, and check if correct
++ git -C prom_test repack -a -d -f &&
++ prom=$(ls $path/*.promisor) &&
++ # $prom should contain "$oid ref <time>"
++ test_grep "$prom_before_repack " "$prom" &&
+
+ # Save the current .promisor content, repack, and check if correct
-+ cat $prom >prom_before_repack &&
-+ git -C prom_test repack -a -d &&
-+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
++ cp "$prom" prom_before_repack &&
++ git -C prom_test repack -a -d -f &&
++ prom=$(ls $path/*.promisor) &&
+ # $prom should be exactly the same as prom_before_repack
-+ test_cmp prom_before_repack $prom
++ test_cmp prom_before_repack "$prom"
+ )
+'
+
+test_expect_success 'check multiple .promisor file content after repack' '
-+ test_when_finished rm -rf prom_test &&
++ test_when_finished rm -rf prom_test prom_before_repack &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 2 packs and simulate .promisor files by creating them manually
-+ test_commit_bulk -C prom_test --start=1 1 &&
++ test_commit_bulk -C prom_test 1 &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
-+ oid=$(git -C prom_test rev-parse HEAD) &&
-+ echo "$oid ref" >$prom &&
-+ prom_before_repack1=$(cat $prom) &&
-+ test_commit_bulk -C prom_test --start=1 1 &&
-+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
-+ oid=$(git -C prom_test rev-parse HEAD) &&
-+ echo "$oid ref" >$prom &&
-+ prom_before_repack2=$(cat $prom) &&
++ oid1=$(git -C prom_test rev-parse HEAD) &&
++ echo "$oid1 ref1" >"$prom" &&
++ test_commit_bulk -C prom_test 1 &&
++ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom|d") &&
++ oid2=$(git -C prom_test rev-parse HEAD) &&
++ echo "$oid2 ref2" >"$prom" &&
+
-+ # Repack, and check if correct compared to previous saved .promisor content
-+ git -C prom_test repack -a -d &&
-+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
-+ # $prom should contain "$prom_before_repack1 <date>" & "$prom_before_repack2 <date>"
-+ test_grep "$prom_before_repack1 " $prom &&
-+ test_grep "$prom_before_repack2 " $prom &&
++ # Repack, and check if correct
++ git -C prom_test repack -a -d -f &&
++ prom=$(ls $path/*.promisor) &&
++ # $prom should contain "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
++ test_grep "$oid1 ref1 " "$prom" &&
++ test_grep "$oid2 ref2 " "$prom" &&
+
+ # Save the current .promisor content, repack, and check if correct
-+ cat $prom >prom_before_repack &&
-+ git -C prom_test repack -a -d &&
-+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
++ cp "$prom" prom_before_repack &&
++ git -C prom_test repack -a -d -f &&
++ prom=$(ls $path/*.promisor) &&
+ # $prom should be exactly the same as prom_before_repack
-+ test_cmp prom_before_repack $prom
++ test_cmp prom_before_repack "$prom"
+ )
+'
+
5: 1533fa96a8 ! 5: d9f6341481 t7703: test for promisor file content after geometric repack
@@ t/t7703-repack-geometric.sh: test_expect_success 'geometric repack works with pr
+ (
+ # Create 2 packs with 3 objs each, and manually create .promisor files
+ test_commit_bulk -C prom_test --start=1 1 && # 3 objects
-+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
-+ oid=$(git -C prom_test rev-parse HEAD) &&
-+ echo "$oid ref" >$prom &&
-+ prom_before_repack1=$(cat $prom) &&
++ prom1=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
++ oid1=$(git -C prom_test rev-parse HEAD) &&
++ echo "$oid1 ref1" >"$prom1" &&
+ test_commit_bulk -C prom_test --start=2 1 && # 3 objects
-+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
-+ oid=$(git -C prom_test rev-parse HEAD) &&
-+ echo "$oid ref" >$prom &&
-+ prom_before_repack2=$(cat $prom) &&
++ prom2=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d") &&
++ oid2=$(git -C prom_test rev-parse HEAD) &&
++ echo "$oid2 ref2" >"$prom2" &&
+
-+ # Create 2 packs with 12 and 24 objs, and manually create .promisor files
++ # Create 1 pack with 12 objs, and manually create .promisor file
+ test_commit_bulk -C prom_test --start=3 4 && # 12 objects
-+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
-+ oid=$(git -C prom_test rev-parse HEAD) &&
-+ echo "$oid ref" >$prom &&
-+ prom_before_repack3=$(cat $prom) &&
-+ test_commit_bulk -C prom_test --start=7 8 && # 24 objects
-+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
-+ oid=$(git -C prom_test rev-parse HEAD) &&
-+ echo "$oid ref" >$prom &&
-+ prom_before_repack4=$(cat $prom) &&
++ prom3=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d; \|$prom2|d") &&
++ oid3=$(git -C prom_test rev-parse HEAD) &&
++ echo "$oid3 ref3" >"$prom3" &&
+
-+ # Geometric repack, and check if correct compared to previous saved .promisor content
++ # Geometric repack, and check if correct
+ git -C prom_test repack --geometric 2 -d &&
-+ prom=$(ls -t $path/*.pack | head -n 1 | sed "s/\.pack/.promisor/") &&
-+ # $prom should have repacked only the first 2 small packs, so it should only contain
-+ # the following: "$prom_before_repack1 <date>" & "$prom_before_repack2 <date>"
-+ test_grep "$prom_before_repack1 " $prom &&
-+ test_grep "$prom_before_repack2 " $prom &&
-+ test_grep ! $prom_before_repack3 $prom &&
-+ test_grep ! $prom_before_repack4 $prom
++ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom3|d") &&
++ # $prom should have repacked only the first 2 small packs, so it should only
++ # contain the following: "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
++ test_grep "$oid1 ref1 " "$prom" &&
++ test_grep "$oid2 ref2 " "$prom" &&
++ test_grep ! "$oid3 ref3" "$prom"
+ )
+'
+
--
2.53.0.585.ge25071d955
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH v4 1/5] pack-write: add explanation to promisor file content
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
@ 2026-04-10 15:02 ` LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
` (6 subsequent siblings)
7 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 15:02 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Patrick Steinhardt, Derrick Stolee, Junio C Hamano,
Elijah Newren, Eric Sunshine, Tian Yuchen
In the entire codebase there is no explanation as to why the ".promisor"
files may contain the ref names (and their associated hashes) that were
fetched at the time the corresponding packfile was downloaded.
As explained in the log message of commit 5374a290 (fetch-pack: write
fetched refs to .promisor, 2019-10-14), where this loop originally came
from, these ref names (and associated hashes) are not used for anything
in the production, but are solely there to help debugging.
Explain this in a new comment.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
pack-write.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/pack-write.c b/pack-write.c
index 83eaf88541..b8ab9510ff 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -603,6 +603,15 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
int i, err;
FILE *output = xfopen(promisor_name, "w");
+ /*
+ * Write in the .promisor file the ref names and associated hashes,
+ * obtained by fetch-pack, at the point of generation of the
+ * corresponding packfile. These pieces of info are only used to make
+ * it easier to debug issues with partial clones, as we can identify
+ * what refs (and their associated hashes) were fetched at the time
+ * the packfile was downloaded, and if necessary, compare those hashes
+ * against what the promisor remote reports now.
+ */
for (i = 0; i < nr_sought; i++)
fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
sought[i]->name);
--
2.53.0.585.ge25071d955
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
@ 2026-04-10 15:02 ` LorenzoPegorari
2026-04-10 16:01 ` Junio C Hamano
2026-04-10 15:03 ` [GSoC PATCH v4 3/5] repack-promisor: preserve content of promisor files after repack LorenzoPegorari
` (5 subsequent siblings)
7 siblings, 1 reply; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 15:02 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Patrick Steinhardt, Derrick Stolee, Junio C Hamano,
Elijah Newren, Eric Sunshine, Tian Yuchen
A ".promisor" file may contain ref names (and their associated hashes)
that were fetched at the time the corresponding packfile was downloaded.
This information is used for debugging reasons. This information is
stored as lines structured like this: "<oid> <ref>".
Create a `copy_promisor_content()` helper function that allows this
debugging info to not be lost after a `repack`, by copying it inside a
new ".promisor" file.
The function logic is the following:
* Take all ".promisor" files contained inside the given `repo`.
* Ignore those whose name is contained inside the given `strset
not_repacked_names`, which basically acts as a "promisor ignorelist"
(intended to be used for packfiles that have not been repacked).
* Read each line of the remaining ".promisor" files, which can be:
* "<oid> <ref>" if the ".promisor" file was never repacked. If so,
add the time (in Unix time) at which the ".promisor" file was last
modified <time> to the line, to obtain: "<oid> <ref> <time>".
* "<oid> <ref> <time>" if the ".promisor" file was repacked. If so,
don't modify it.
* Ignore the line if its <oid> is not present inside the
"<packtmp>-<dest_hex>.idx" file.
* If the destination file "<packtmp>-<dest_hex>.promisor" does not
already contain the line, append it to the file.
The function assumes that the contents of all ".promisor" files are
correctly formed.
The time of last data modification, for never-repacked ".promisor" file,
can be used when comparing the entries in it with entries in another
".promisor" file that did get repacked. With these timestamps, the
debugger will be able to tell at which time the refs at the remote
repository pointed at what object. Also, when looking at already
repacked ".promisor" files, the same ref may appear multiple times, and
having timestamps will help understanding what happened over time.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
repack-promisor.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
diff --git a/repack-promisor.c b/repack-promisor.c
index 90318ce150..797314d7b9 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -34,6 +34,122 @@ static int write_oid(const struct object_id *oid,
return 0;
}
+/*
+ * Go through all .promisor files contained in repo (excluding those whose name
+ * appears in not_repacked_basenames, which acts as a ignorelist), and copies
+ * their content inside the destination file "<packtmp>-<dest_hex>.promisor".
+ * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
+ * in the write_promisor_file() function).
+ * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
+ * is the time (in Unix time) at which the .promisor file was last modified.
+ * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
+ * be copied.
+ * The contents of all .promisor files are assumed to be correctly formed.
+ */
+static void copy_promisor_content(struct repository *repo,
+ const char *dest_hex,
+ const char *packtmp,
+ struct strset *not_repacked_basenames)
+{
+ char *dest_idx_name;
+ char *dest_promisor_name;
+ FILE *dest;
+ struct strset dest_content = STRSET_INIT;
+ struct strbuf dest_to_write = STRBUF_INIT;
+ struct strbuf source_promisor_name = STRBUF_INIT;
+ struct strbuf line = STRBUF_INIT;
+ struct object_id dest_oid;
+ struct packed_git *dest_pack, *p;
+ int err;
+
+ dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
+ get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
+ dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
+
+ /* Open the .promisor dest file, and fill dest_content with its content */
+ dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
+ dest = xfopen(dest_promisor_name, "r+");
+ while (strbuf_getline(&line, dest) != EOF)
+ strset_add(&dest_content, line.buf);
+
+ repo_for_each_pack(repo, p) {
+ FILE *source;
+ struct stat source_stat;
+
+ if (!p->pack_promisor)
+ continue;
+
+ if (not_repacked_basenames &&
+ strset_contains(not_repacked_basenames, pack_basename(p)))
+ continue;
+
+ strbuf_reset(&source_promisor_name);
+ strbuf_addstr(&source_promisor_name, p->pack_name);
+ strbuf_strip_suffix(&source_promisor_name, ".pack");
+ strbuf_addstr(&source_promisor_name, ".promisor");
+
+ if (stat(source_promisor_name.buf, &source_stat))
+ die(_("File not found: %s"), source_promisor_name.buf);
+
+ source = xfopen(source_promisor_name.buf, "r");
+
+ while (strbuf_getline(&line, source) != EOF) {
+ struct string_list line_sections = STRING_LIST_INIT_DUP;
+ struct object_id oid;
+
+ /* Split line into <oid>, <ref> and <time> (if <time> exists) */
+ string_list_split(&line_sections, line.buf, " ", 3);
+
+ /* Ignore the lines where <oid> doesn't appear in the dest_pack */
+ get_oid_hex_algop(line_sections.items[0].string, &oid, repo->hash_algo);
+ if (!find_pack_entry_one(&oid, dest_pack)) {
+ string_list_clear(&line_sections, 0);
+ continue;
+ }
+
+ /* If <time> doesn't exist, retrieve it and add it to line */
+ if (line_sections.nr < 3)
+ strbuf_addf(&line, " %lld", (long long int)source_stat.st_mtim.tv_sec);
+
+ /*
+ * Add the finalized line to dest_to_write and dest_content if it
+ * wasn't already present inside dest_content
+ */
+ if (strset_add(&dest_content, line.buf)) {
+ strbuf_addbuf(&dest_to_write, &line);
+ strbuf_addch(&dest_to_write, '\n');
+ }
+
+ string_list_clear(&line_sections, 0);
+ }
+
+ err = ferror(source);
+ err |= fclose(source);
+ if (err)
+ die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
+ }
+
+ /* If dest_to_write is not empty, then there are new lines to append */
+ if (dest_to_write.len) {
+ if (fseek(dest, 0L, SEEK_END))
+ die_errno(_("fseek failed"));
+ fprintf(dest, "%s", dest_to_write.buf);
+ }
+
+ err = ferror(dest);
+ err |= fclose(dest);
+ if (err)
+ die(_("Could not write '%s' promisor file"), dest_promisor_name);
+
+ close_pack_index(dest_pack);
+ free(dest_idx_name);
+ free(dest_promisor_name);
+ strset_clear(&dest_content);
+ strbuf_release(&dest_to_write);
+ strbuf_release(&source_promisor_name);
+ strbuf_release(&line);
+}
+
static void finish_repacking_promisor_objects(struct repository *repo,
struct child_process *cmd,
struct string_list *names,
--
2.53.0.585.ge25071d955
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v4 3/5] repack-promisor: preserve content of promisor files after repack
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
@ 2026-04-10 15:03 ` LorenzoPegorari
2026-04-10 15:04 ` [GSoC PATCH v4 4/5] t7700: test for promisor file content " LorenzoPegorari
` (4 subsequent siblings)
7 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 15:03 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Patrick Steinhardt, Derrick Stolee, Junio C Hamano,
Elijah Newren, Eric Sunshine, Tian Yuchen
When a repack involving promisor packfiles happens, the new ".promisor"
file is created empty, losing all the debug info that might be present
inside the ".promisor" files before the repack.
Use the "copy_promisor_content()" function created previously to preserve
the contents of all ".promisor" files inside the first ".promisor" file
created by the repack.
For geometric repacking, we have to create a `strset` that contains the
basenames of all excluded packs. For "normal" repacking this is not
necessary, since there should be no excluded packs.
Also, update the documentation accordingly.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
Documentation/git-repack.adoc | 4 ++--
repack-promisor.c | 30 +++++++++++++++++-------------
2 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 673ce91083..33d3c8afbd 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -45,8 +45,8 @@ other objects in that pack they already have locally.
+
Promisor packfiles are repacked separately: if there are packfiles that
have an associated ".promisor" file, these packfiles will be repacked
-into another separate pack, and an empty ".promisor" file corresponding
-to the new separate pack will be written.
+into another separate pack, and a ".promisor" file corresponding to the
+new separate pack will be written (with arbitrary contents).
-A::
Same as `-a`, unless `-d` is used. Then any unreachable
diff --git a/repack-promisor.c b/repack-promisor.c
index 797314d7b9..0c373c8820 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -153,7 +153,8 @@ static void copy_promisor_content(struct repository *repo,
static void finish_repacking_promisor_objects(struct repository *repo,
struct child_process *cmd,
struct string_list *names,
- const char *packtmp)
+ const char *packtmp,
+ struct strset *not_repacked_basenames)
{
struct strbuf line = STRBUF_INIT;
FILE *out;
@@ -171,19 +172,15 @@ static void finish_repacking_promisor_objects(struct repository *repo,
/*
* pack-objects creates the .pack and .idx files, but not the
- * .promisor file. Create the .promisor file, which is empty.
- *
- * NEEDSWORK: fetch-pack sometimes generates non-empty
- * .promisor files containing the ref names and associated
- * hashes at the point of generation of the corresponding
- * packfile, but this would not preserve their contents. Maybe
- * concatenate the contents of all .promisor files instead of
- * just creating a new empty file.
+ * .promisor file. Create the .promisor file.
*/
promisor_name = mkpathdup("%s-%s.promisor", packtmp,
line.buf);
write_promisor_file(promisor_name, NULL, 0);
+ /* Now let's fill the content of the newly created .promisor file */
+ copy_promisor_content(repo, line.buf, packtmp, not_repacked_basenames);
+
item->util = generated_pack_populate(item->string, packtmp);
free(promisor_name);
@@ -223,7 +220,7 @@ void repack_promisor_objects(struct repository *repo,
return;
}
- finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
+ finish_repacking_promisor_objects(repo, &cmd, names, packtmp, NULL);
}
void pack_geometry_repack_promisors(struct repository *repo,
@@ -234,6 +231,7 @@ void pack_geometry_repack_promisors(struct repository *repo,
{
struct child_process cmd = CHILD_PROCESS_INIT;
FILE *in;
+ struct strset not_repacked_basenames = STRSET_INIT;
if (!geometry->promisor_split)
return;
@@ -247,9 +245,15 @@ void pack_geometry_repack_promisors(struct repository *repo,
in = xfdopen(cmd.in, "w");
for (size_t i = 0; i < geometry->promisor_split; i++)
fprintf(in, "%s\n", pack_basename(geometry->promisor_pack[i]));
- for (size_t i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++)
- fprintf(in, "^%s\n", pack_basename(geometry->promisor_pack[i]));
+ for (size_t i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++) {
+ const char *name = pack_basename(geometry->promisor_pack[i]);
+ fprintf(in, "^%s\n", name);
+ strset_add(¬_repacked_basenames, name);
+ }
fclose(in);
- finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
+ finish_repacking_promisor_objects(repo, &cmd, names, packtmp,
+ strset_get_size(¬_repacked_basenames) ? ¬_repacked_basenames : NULL);
+
+ strset_clear(¬_repacked_basenames);
}
--
2.53.0.585.ge25071d955
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v4 4/5] t7700: test for promisor file content after repack
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
` (2 preceding siblings ...)
2026-04-10 15:03 ` [GSoC PATCH v4 3/5] repack-promisor: preserve content of promisor files after repack LorenzoPegorari
@ 2026-04-10 15:04 ` LorenzoPegorari
2026-04-10 15:04 ` [GSoC PATCH v4 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
` (3 subsequent siblings)
7 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 15:04 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Patrick Steinhardt, Derrick Stolee, Junio C Hamano,
Elijah Newren, Eric Sunshine, Tian Yuchen
Add tests that check if the content of ".promisor" files are correctly
copied inside the ".promisor" files created by a repack.
The `-f` flag is used when repacking to ensure that all the packs
(created with `test_commit_bulk`) are repacked into a single new pack.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
t/t7700-repack.sh | 60 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 63ef63fc50..186a931ea7 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -904,4 +904,64 @@ test_expect_success 'pending objects are repacked appropriately' '
)
'
+test_expect_success 'check one .promisor file content after repack' '
+ test_when_finished rm -rf prom_test prom_before_repack &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ test_commit_bulk -C prom_test 1 &&
+
+ # Simulate .promisor file by creating it manually
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >"$prom" &&
+
+ # Repack, and check if correct
+ git -C prom_test repack -a -d -f &&
+ prom=$(ls $path/*.promisor) &&
+ # $prom should contain "$oid ref <time>"
+ test_grep "$prom_before_repack " "$prom" &&
+
+ # Save the current .promisor content, repack, and check if correct
+ cp "$prom" prom_before_repack &&
+ git -C prom_test repack -a -d -f &&
+ prom=$(ls $path/*.promisor) &&
+ # $prom should be exactly the same as prom_before_repack
+ test_cmp prom_before_repack "$prom"
+ )
+'
+
+test_expect_success 'check multiple .promisor file content after repack' '
+ test_when_finished rm -rf prom_test prom_before_repack &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 2 packs and simulate .promisor files by creating them manually
+ test_commit_bulk -C prom_test 1 &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid1=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid1 ref1" >"$prom" &&
+ test_commit_bulk -C prom_test 1 &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom|d") &&
+ oid2=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid2 ref2" >"$prom" &&
+
+ # Repack, and check if correct
+ git -C prom_test repack -a -d -f &&
+ prom=$(ls $path/*.promisor) &&
+ # $prom should contain "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
+ test_grep "$oid1 ref1 " "$prom" &&
+ test_grep "$oid2 ref2 " "$prom" &&
+
+ # Save the current .promisor content, repack, and check if correct
+ cp "$prom" prom_before_repack &&
+ git -C prom_test repack -a -d -f &&
+ prom=$(ls $path/*.promisor) &&
+ # $prom should be exactly the same as prom_before_repack
+ test_cmp prom_before_repack "$prom"
+ )
+'
+
test_done
--
2.53.0.585.ge25071d955
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v4 5/5] t7703: test for promisor file content after geometric repack
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
` (3 preceding siblings ...)
2026-04-10 15:04 ` [GSoC PATCH v4 4/5] t7700: test for promisor file content " LorenzoPegorari
@ 2026-04-10 15:04 ` LorenzoPegorari
2026-04-10 15:47 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack Junio C Hamano
` (2 subsequent siblings)
7 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 15:04 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Patrick Steinhardt, Derrick Stolee, Junio C Hamano,
Elijah Newren, Eric Sunshine, Tian Yuchen
Add test that checks if the content of ".promisor" files are correctly
copied inside the ".promisor" files created by a geometric repack.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
t/t7703-repack-geometric.sh | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 04d5d8fc33..a8e3e6ae3f 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -541,4 +541,37 @@ test_expect_success 'geometric repack works with promisor packs' '
)
'
+test_expect_success 'check .promisor file content after geometric repack' '
+ test_when_finished rm -rf prom_test &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 2 packs with 3 objs each, and manually create .promisor files
+ test_commit_bulk -C prom_test --start=1 1 && # 3 objects
+ prom1=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid1=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid1 ref1" >"$prom1" &&
+ test_commit_bulk -C prom_test --start=2 1 && # 3 objects
+ prom2=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d") &&
+ oid2=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid2 ref2" >"$prom2" &&
+
+ # Create 1 pack with 12 objs, and manually create .promisor file
+ test_commit_bulk -C prom_test --start=3 4 && # 12 objects
+ prom3=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d; \|$prom2|d") &&
+ oid3=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid3 ref3" >"$prom3" &&
+
+ # Geometric repack, and check if correct
+ git -C prom_test repack --geometric 2 -d &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom3|d") &&
+ # $prom should have repacked only the first 2 small packs, so it should only
+ # contain the following: "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
+ test_grep "$oid1 ref1 " "$prom" &&
+ test_grep "$oid2 ref2 " "$prom" &&
+ test_grep ! "$oid3 ref3" "$prom"
+ )
+'
+
test_done
--
2.53.0.585.ge25071d955
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v4 0/5] preserve promisor files content after repack
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
` (4 preceding siblings ...)
2026-04-10 15:04 ` [GSoC PATCH v4 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
@ 2026-04-10 15:47 ` Junio C Hamano
2026-04-10 16:44 ` Lorenzo Pegorari
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
2026-04-10 23:05 ` [GSoC PATCH v4 0/5] " Junio C Hamano
7 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-10 15:47 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Taylor Blau, Patrick Steinhardt, Derrick Stolee,
Elijah Newren, Eric Sunshine, Tian Yuchen
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> QUESTION:
> The "CodingGuidelines" explicitly state that:
> "A C file must directly include the header files that declare the
> functions and the types it uses, except for the functions and types
> that are made available to it by including one of the header files
> it must include by the previous rule"
> where "the previous rule" is (if I understand correctly), the one related
> to "<git-compat-util.h>". From what I understand then, I should have
> added an include for "strmap.h" (which is needed for `strset`), correct?
> And if I am correct, shouldn't "strbuf.h", "hash.h", "odb.h",
> "string-list.h" and "strvec.h" also be included?
If you are using any of the facilities declared in these header
files in your program, yes.
In practice many header files pull in other header files for
definitions they themselves use. For example, <X.h> that defines
"struct X" may include <Y.h> for the definition of "struct Y" because
the former embeds an instance of the latter, instead of having a pointer
to an on-heap instance of the latter.
If you use both "struct X" and "struct Y", your program may compile
with only <X.h> included without <Y.h> included in such a case, but
the guideline suggests against doing so, because it should not be
relied on. The implementation of "struct X" may change in the
future and stop depending on "struct Y", at which time <X.h> stops
including <Y.h> itself, and your program would start failing to
build, because you use "struct Y" but without including <Y.h>.
But in practice, use of strbuf is so widespread and the header is
included in some other headers that do not need to, so your build
may happen to work without including <strbuf.h>, for example.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack
2026-04-10 15:02 ` [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
@ 2026-04-10 16:01 ` Junio C Hamano
2026-04-10 16:34 ` Lorenzo Pegorari
2026-04-10 18:10 ` [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime Junio C Hamano
0 siblings, 2 replies; 78+ messages in thread
From: Junio C Hamano @ 2026-04-10 16:01 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Taylor Blau, Patrick Steinhardt, Derrick Stolee,
Elijah Newren, Eric Sunshine, Tian Yuchen
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> + /* If <time> doesn't exist, retrieve it and add it to line */
> + if (line_sections.nr < 3)
> + strbuf_addf(&line, " %lld", (long long int)source_stat.st_mtim.tv_sec);
It should be easy to see in the output of
$ git grep -e '%lld' -e 'st_mtim\.tv_sec'
that we do not use these constructs.
Write it like this instead
strbuf_addf(&line, " %" PRItime,
(timestamp_t)source_stat.st_mtime);
examples to mimick the uses of timestamp_t and PRItime are found in
many places; worktree.c, date.c, builtin/blame.c would give plenty.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack
2026-04-10 16:01 ` Junio C Hamano
@ 2026-04-10 16:34 ` Lorenzo Pegorari
2026-04-10 18:10 ` [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime Junio C Hamano
1 sibling, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-10 16:34 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Taylor Blau, Patrick Steinhardt, Derrick Stolee,
Elijah Newren, Eric Sunshine, Tian Yuchen
On Fri, Apr 10, 2026 at 09:01:16AM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > + /* If <time> doesn't exist, retrieve it and add it to line */
> > + if (line_sections.nr < 3)
> > + strbuf_addf(&line, " %lld", (long long int)source_stat.st_mtim.tv_sec);
>
> It should be easy to see in the output of
>
> $ git grep -e '%lld' -e 'st_mtim\.tv_sec'
>
> that we do not use these constructs.
>
> Write it like this instead
>
> strbuf_addf(&line, " %" PRItime,
> (timestamp_t)source_stat.st_mtime);
>
> examples to mimick the uses of timestamp_t and PRItime are found in
> many places; worktree.c, date.c, builtin/blame.c would give plenty.
Oh I see. I really need to get into this mechanism of constantly using
`git grep` to look for examples for pretty much everything. Still
learning. Thanks Junio!
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v4 0/5] preserve promisor files content after repack
2026-04-10 15:47 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack Junio C Hamano
@ 2026-04-10 16:44 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-10 16:44 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Taylor Blau, Patrick Steinhardt, Derrick Stolee,
Elijah Newren, Eric Sunshine, Tian Yuchen
On Fri, Apr 10, 2026 at 08:47:43AM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > QUESTION:
> > The "CodingGuidelines" explicitly state that:
> > "A C file must directly include the header files that declare the
> > functions and the types it uses, except for the functions and types
> > that are made available to it by including one of the header files
> > it must include by the previous rule"
> > where "the previous rule" is (if I understand correctly), the one related
> > to "<git-compat-util.h>". From what I understand then, I should have
> > added an include for "strmap.h" (which is needed for `strset`), correct?
> > And if I am correct, shouldn't "strbuf.h", "hash.h", "odb.h",
> > "string-list.h" and "strvec.h" also be included?
>
> If you are using any of the facilities declared in these header
> files in your program, yes.
Got it.
> In practice many header files pull in other header files for
> definitions they themselves use. For example, <X.h> that defines
> "struct X" may include <Y.h> for the definition of "struct Y" because
> the former embeds an instance of the latter, instead of having a pointer
> to an on-heap instance of the latter.
>
> If you use both "struct X" and "struct Y", your program may compile
> with only <X.h> included without <Y.h> included in such a case, but
> the guideline suggests against doing so, because it should not be
> relied on. The implementation of "struct X" may change in the
> future and stop depending on "struct Y", at which time <X.h> stops
> including <Y.h> itself, and your program would start failing to
> build, because you use "struct Y" but without including <Y.h>.
>
> But in practice, use of strbuf is so widespread and the header is
> included in some other headers that do not need to, so your build
> may happen to work without including <strbuf.h>, for example.
Yeah, I 100% understand this. I simply found it weird that there were
many missing headers, so I was scared that I was not understanding the
guidelines.
I will add a 6th patch that adds these missing headers, in order to
comply with the guidelines.
Thanks,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime
2026-04-10 16:01 ` Junio C Hamano
2026-04-10 16:34 ` Lorenzo Pegorari
@ 2026-04-10 18:10 ` Junio C Hamano
2026-04-16 23:46 ` Elijah Newren
1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-10 18:10 UTC (permalink / raw)
To: git
Cc: LorenzoPegorari, Taylor Blau, Patrick Steinhardt, Derrick Stolee,
Elijah Newren, Eric Sunshine, Tian Yuchen
Most unfortunately macOS does not support st_[amc]tim for timestamps
down to nanosecond resolution as POSIX systems.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Documentation/CodingGuidelines | 6 ++++++
1 file changed, 6 insertions(+)
diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
index 4992e52093..4e54139fd7 100644
--- c/Documentation/CodingGuidelines
+++ w/Documentation/CodingGuidelines
@@ -693,6 +693,12 @@ For C programs:
char *dogs[] = ...;
walk_all_dogs(dogs);
+ - For file timestamps, do not use "st_mtim" (and other timestamp
+ members in "struct stat") unconditionally; not everybody is POSIX
+ (grep for USE_ST_TIMESPEC). If you only need timestamp in whole
+ second resolution, "st_mtime" should work fine everywhere.
+
+
For Perl programs:
- Most of the C guidelines above apply.
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v5 0/6] preserve promisor files content after repack
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
` (5 preceding siblings ...)
2026-04-10 15:47 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack Junio C Hamano
@ 2026-04-10 22:54 ` LorenzoPegorari
2026-04-10 22:54 ` [GSoC PATCH v5 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
` (6 more replies)
2026-04-10 23:05 ` [GSoC PATCH v4 0/5] " Junio C Hamano
7 siblings, 7 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 22:54 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
The goal of this patch is to solve the NEEDSWORK comment added by
5374a290 (fetch-pack: write fetched refs to .promisor, 14/10/2019). This
is done by adding a helper function that takes the content of all
.promisor files in the `repository`, and copies it inside the first
.promisor file created by the repack.
Also, I added a comment explaining what is the purpose of the content of
the .promisor files, since this wasn't explained anywhere (I found
information regarding this only in the message of the previously cited
commit).
Finally, I added some tests to "t7700-repack.sh" and
"t7703-repack-geometric.sh" that check if the content of .promisor files
are correctly copied into the .promisor files created by a repack.
If Eric Sunshine, Tian Yuchen (for the patch 2/5 "pack-write: add helper
to fill promisor file after repack") and Junio Hamano (for all patches)
want to be added with a `<Reviewed-by>` tag, please let me know (and, of
course, thanks a lot for the help)!
V5 DIFF:
* fixed commit message (from `pack-write:` to `repack-promisor:`).
* fixed timestamp `fprintf()` format.
LorenzoPegorari (6):
pack-write: add explanation to promisor file content
repack-promisor add helper to fill promisor file after repack
repack-promisor: preserve content of promisor files after repack
t7700: test for promisor file content after repack
t7703: test for promisor file content after geometric repack
repack-promisor: add missing headers
Documentation/git-repack.adoc | 4 +-
pack-write.c | 9 ++
repack-promisor.c | 152 +++++++++++++++++++++++++++++++---
t/t7700-repack.sh | 60 ++++++++++++++
t/t7703-repack-geometric.sh | 33 ++++++++
5 files changed, 243 insertions(+), 15 deletions(-)
Range-diff against v4:
1: b4990fcdf0 = 1: b4990fcdf0 pack-write: add explanation to promisor file content
2: 34c4e79311 ! 2: 3558bb3895 pack-write: add helper to fill promisor file after repack
@@ Metadata
Author: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
## Commit message ##
- pack-write: add helper to fill promisor file after repack
+ repack-promisor add helper to fill promisor file after repack
A ".promisor" file may contain ref names (and their associated hashes)
that were fetched at the time the corresponding packfile was downloaded.
@@ repack-promisor.c: static int write_oid(const struct object_id *oid,
+
+ /* If <time> doesn't exist, retrieve it and add it to line */
+ if (line_sections.nr < 3)
-+ strbuf_addf(&line, " %lld", (long long int)source_stat.st_mtim.tv_sec);
++ strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
+
+ /*
+ * Add the finalized line to dest_to_write and dest_content if it
3: 72ef2378b9 = 3: b483be7558 repack-promisor: preserve content of promisor files after repack
4: 0aceaed480 = 4: f631993c89 t7700: test for promisor file content after repack
5: d9f6341481 = 5: ab307e68fe t7703: test for promisor file content after geometric repack
-: ---------- > 6: e8720aaf12 repack-promisor: add missing headers
--
2.53.0.584.ge8720aaf12
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH v5 1/6] pack-write: add explanation to promisor file content
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
@ 2026-04-10 22:54 ` LorenzoPegorari
2026-04-10 22:55 ` [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
` (5 subsequent siblings)
6 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 22:54 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
In the entire codebase there is no explanation as to why the ".promisor"
files may contain the ref names (and their associated hashes) that were
fetched at the time the corresponding packfile was downloaded.
As explained in the log message of commit 5374a290 (fetch-pack: write
fetched refs to .promisor, 2019-10-14), where this loop originally came
from, these ref names (and associated hashes) are not used for anything
in the production, but are solely there to help debugging.
Explain this in a new comment.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
pack-write.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/pack-write.c b/pack-write.c
index 83eaf88541..b8ab9510ff 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -603,6 +603,15 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
int i, err;
FILE *output = xfopen(promisor_name, "w");
+ /*
+ * Write in the .promisor file the ref names and associated hashes,
+ * obtained by fetch-pack, at the point of generation of the
+ * corresponding packfile. These pieces of info are only used to make
+ * it easier to debug issues with partial clones, as we can identify
+ * what refs (and their associated hashes) were fetched at the time
+ * the packfile was downloaded, and if necessary, compare those hashes
+ * against what the promisor remote reports now.
+ */
for (i = 0; i < nr_sought; i++)
fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
sought[i]->name);
--
2.53.0.584.ge8720aaf12
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
2026-04-10 22:54 ` [GSoC PATCH v5 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
@ 2026-04-10 22:55 ` LorenzoPegorari
2026-04-10 23:30 ` Junio C Hamano
2026-04-12 6:27 ` Junio C Hamano
2026-04-10 22:55 ` [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
` (4 subsequent siblings)
6 siblings, 2 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 22:55 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
A ".promisor" file may contain ref names (and their associated hashes)
that were fetched at the time the corresponding packfile was downloaded.
This information is used for debugging reasons. This information is
stored as lines structured like this: "<oid> <ref>".
Create a `copy_promisor_content()` helper function that allows this
debugging info to not be lost after a `repack`, by copying it inside a
new ".promisor" file.
The function logic is the following:
* Take all ".promisor" files contained inside the given `repo`.
* Ignore those whose name is contained inside the given `strset
not_repacked_names`, which basically acts as a "promisor ignorelist"
(intended to be used for packfiles that have not been repacked).
* Read each line of the remaining ".promisor" files, which can be:
* "<oid> <ref>" if the ".promisor" file was never repacked. If so,
add the time (in Unix time) at which the ".promisor" file was last
modified <time> to the line, to obtain: "<oid> <ref> <time>".
* "<oid> <ref> <time>" if the ".promisor" file was repacked. If so,
don't modify it.
* Ignore the line if its <oid> is not present inside the
"<packtmp>-<dest_hex>.idx" file.
* If the destination file "<packtmp>-<dest_hex>.promisor" does not
already contain the line, append it to the file.
The function assumes that the contents of all ".promisor" files are
correctly formed.
The time of last data modification, for never-repacked ".promisor" file,
can be used when comparing the entries in it with entries in another
".promisor" file that did get repacked. With these timestamps, the
debugger will be able to tell at which time the refs at the remote
repository pointed at what object. Also, when looking at already
repacked ".promisor" files, the same ref may appear multiple times, and
having timestamps will help understanding what happened over time.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
repack-promisor.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
diff --git a/repack-promisor.c b/repack-promisor.c
index 90318ce150..72677f8c9f 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -34,6 +34,122 @@ static int write_oid(const struct object_id *oid,
return 0;
}
+/*
+ * Go through all .promisor files contained in repo (excluding those whose name
+ * appears in not_repacked_basenames, which acts as a ignorelist), and copies
+ * their content inside the destination file "<packtmp>-<dest_hex>.promisor".
+ * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
+ * in the write_promisor_file() function).
+ * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
+ * is the time (in Unix time) at which the .promisor file was last modified.
+ * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
+ * be copied.
+ * The contents of all .promisor files are assumed to be correctly formed.
+ */
+static void copy_promisor_content(struct repository *repo,
+ const char *dest_hex,
+ const char *packtmp,
+ struct strset *not_repacked_basenames)
+{
+ char *dest_idx_name;
+ char *dest_promisor_name;
+ FILE *dest;
+ struct strset dest_content = STRSET_INIT;
+ struct strbuf dest_to_write = STRBUF_INIT;
+ struct strbuf source_promisor_name = STRBUF_INIT;
+ struct strbuf line = STRBUF_INIT;
+ struct object_id dest_oid;
+ struct packed_git *dest_pack, *p;
+ int err;
+
+ dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
+ get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
+ dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
+
+ /* Open the .promisor dest file, and fill dest_content with its content */
+ dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
+ dest = xfopen(dest_promisor_name, "r+");
+ while (strbuf_getline(&line, dest) != EOF)
+ strset_add(&dest_content, line.buf);
+
+ repo_for_each_pack(repo, p) {
+ FILE *source;
+ struct stat source_stat;
+
+ if (!p->pack_promisor)
+ continue;
+
+ if (not_repacked_basenames &&
+ strset_contains(not_repacked_basenames, pack_basename(p)))
+ continue;
+
+ strbuf_reset(&source_promisor_name);
+ strbuf_addstr(&source_promisor_name, p->pack_name);
+ strbuf_strip_suffix(&source_promisor_name, ".pack");
+ strbuf_addstr(&source_promisor_name, ".promisor");
+
+ if (stat(source_promisor_name.buf, &source_stat))
+ die(_("File not found: %s"), source_promisor_name.buf);
+
+ source = xfopen(source_promisor_name.buf, "r");
+
+ while (strbuf_getline(&line, source) != EOF) {
+ struct string_list line_sections = STRING_LIST_INIT_DUP;
+ struct object_id oid;
+
+ /* Split line into <oid>, <ref> and <time> (if <time> exists) */
+ string_list_split(&line_sections, line.buf, " ", 3);
+
+ /* Ignore the lines where <oid> doesn't appear in the dest_pack */
+ get_oid_hex_algop(line_sections.items[0].string, &oid, repo->hash_algo);
+ if (!find_pack_entry_one(&oid, dest_pack)) {
+ string_list_clear(&line_sections, 0);
+ continue;
+ }
+
+ /* If <time> doesn't exist, retrieve it and add it to line */
+ if (line_sections.nr < 3)
+ strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
+
+ /*
+ * Add the finalized line to dest_to_write and dest_content if it
+ * wasn't already present inside dest_content
+ */
+ if (strset_add(&dest_content, line.buf)) {
+ strbuf_addbuf(&dest_to_write, &line);
+ strbuf_addch(&dest_to_write, '\n');
+ }
+
+ string_list_clear(&line_sections, 0);
+ }
+
+ err = ferror(source);
+ err |= fclose(source);
+ if (err)
+ die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
+ }
+
+ /* If dest_to_write is not empty, then there are new lines to append */
+ if (dest_to_write.len) {
+ if (fseek(dest, 0L, SEEK_END))
+ die_errno(_("fseek failed"));
+ fprintf(dest, "%s", dest_to_write.buf);
+ }
+
+ err = ferror(dest);
+ err |= fclose(dest);
+ if (err)
+ die(_("Could not write '%s' promisor file"), dest_promisor_name);
+
+ close_pack_index(dest_pack);
+ free(dest_idx_name);
+ free(dest_promisor_name);
+ strset_clear(&dest_content);
+ strbuf_release(&dest_to_write);
+ strbuf_release(&source_promisor_name);
+ strbuf_release(&line);
+}
+
static void finish_repacking_promisor_objects(struct repository *repo,
struct child_process *cmd,
struct string_list *names,
--
2.53.0.584.ge8720aaf12
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files after repack
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
2026-04-10 22:54 ` [GSoC PATCH v5 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 22:55 ` [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
@ 2026-04-10 22:55 ` LorenzoPegorari
2026-04-11 18:25 ` Tian Yuchen
2026-04-10 22:55 ` [GSoC PATCH v5 4/6] t7700: test for promisor file content " LorenzoPegorari
` (3 subsequent siblings)
6 siblings, 1 reply; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 22:55 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
When a repack involving promisor packfiles happens, the new ".promisor"
file is created empty, losing all the debug info that might be present
inside the ".promisor" files before the repack.
Use the "copy_promisor_content()" function created previously to preserve
the contents of all ".promisor" files inside the first ".promisor" file
created by the repack.
For geometric repacking, we have to create a `strset` that contains the
basenames of all excluded packs. For "normal" repacking this is not
necessary, since there should be no excluded packs.
Also, update the documentation accordingly.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
Documentation/git-repack.adoc | 4 ++--
repack-promisor.c | 30 +++++++++++++++++-------------
2 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 673ce91083..33d3c8afbd 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -45,8 +45,8 @@ other objects in that pack they already have locally.
+
Promisor packfiles are repacked separately: if there are packfiles that
have an associated ".promisor" file, these packfiles will be repacked
-into another separate pack, and an empty ".promisor" file corresponding
-to the new separate pack will be written.
+into another separate pack, and a ".promisor" file corresponding to the
+new separate pack will be written (with arbitrary contents).
-A::
Same as `-a`, unless `-d` is used. Then any unreachable
diff --git a/repack-promisor.c b/repack-promisor.c
index 72677f8c9f..6d9590cd4e 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -153,7 +153,8 @@ static void copy_promisor_content(struct repository *repo,
static void finish_repacking_promisor_objects(struct repository *repo,
struct child_process *cmd,
struct string_list *names,
- const char *packtmp)
+ const char *packtmp,
+ struct strset *not_repacked_basenames)
{
struct strbuf line = STRBUF_INIT;
FILE *out;
@@ -171,19 +172,15 @@ static void finish_repacking_promisor_objects(struct repository *repo,
/*
* pack-objects creates the .pack and .idx files, but not the
- * .promisor file. Create the .promisor file, which is empty.
- *
- * NEEDSWORK: fetch-pack sometimes generates non-empty
- * .promisor files containing the ref names and associated
- * hashes at the point of generation of the corresponding
- * packfile, but this would not preserve their contents. Maybe
- * concatenate the contents of all .promisor files instead of
- * just creating a new empty file.
+ * .promisor file. Create the .promisor file.
*/
promisor_name = mkpathdup("%s-%s.promisor", packtmp,
line.buf);
write_promisor_file(promisor_name, NULL, 0);
+ /* Now let's fill the content of the newly created .promisor file */
+ copy_promisor_content(repo, line.buf, packtmp, not_repacked_basenames);
+
item->util = generated_pack_populate(item->string, packtmp);
free(promisor_name);
@@ -223,7 +220,7 @@ void repack_promisor_objects(struct repository *repo,
return;
}
- finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
+ finish_repacking_promisor_objects(repo, &cmd, names, packtmp, NULL);
}
void pack_geometry_repack_promisors(struct repository *repo,
@@ -234,6 +231,7 @@ void pack_geometry_repack_promisors(struct repository *repo,
{
struct child_process cmd = CHILD_PROCESS_INIT;
FILE *in;
+ struct strset not_repacked_basenames = STRSET_INIT;
if (!geometry->promisor_split)
return;
@@ -247,9 +245,15 @@ void pack_geometry_repack_promisors(struct repository *repo,
in = xfdopen(cmd.in, "w");
for (size_t i = 0; i < geometry->promisor_split; i++)
fprintf(in, "%s\n", pack_basename(geometry->promisor_pack[i]));
- for (size_t i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++)
- fprintf(in, "^%s\n", pack_basename(geometry->promisor_pack[i]));
+ for (size_t i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++) {
+ const char *name = pack_basename(geometry->promisor_pack[i]);
+ fprintf(in, "^%s\n", name);
+ strset_add(¬_repacked_basenames, name);
+ }
fclose(in);
- finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
+ finish_repacking_promisor_objects(repo, &cmd, names, packtmp,
+ strset_get_size(¬_repacked_basenames) ? ¬_repacked_basenames : NULL);
+
+ strset_clear(¬_repacked_basenames);
}
--
2.53.0.584.ge8720aaf12
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v5 4/6] t7700: test for promisor file content after repack
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
` (2 preceding siblings ...)
2026-04-10 22:55 ` [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
@ 2026-04-10 22:55 ` LorenzoPegorari
2026-04-10 22:56 ` [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
` (2 subsequent siblings)
6 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 22:55 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
Add tests that check if the content of ".promisor" files are correctly
copied inside the ".promisor" files created by a repack.
The `-f` flag is used when repacking to ensure that all the packs
(created with `test_commit_bulk`) are repacked into a single new pack.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
t/t7700-repack.sh | 60 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 63ef63fc50..186a931ea7 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -904,4 +904,64 @@ test_expect_success 'pending objects are repacked appropriately' '
)
'
+test_expect_success 'check one .promisor file content after repack' '
+ test_when_finished rm -rf prom_test prom_before_repack &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ test_commit_bulk -C prom_test 1 &&
+
+ # Simulate .promisor file by creating it manually
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >"$prom" &&
+
+ # Repack, and check if correct
+ git -C prom_test repack -a -d -f &&
+ prom=$(ls $path/*.promisor) &&
+ # $prom should contain "$oid ref <time>"
+ test_grep "$prom_before_repack " "$prom" &&
+
+ # Save the current .promisor content, repack, and check if correct
+ cp "$prom" prom_before_repack &&
+ git -C prom_test repack -a -d -f &&
+ prom=$(ls $path/*.promisor) &&
+ # $prom should be exactly the same as prom_before_repack
+ test_cmp prom_before_repack "$prom"
+ )
+'
+
+test_expect_success 'check multiple .promisor file content after repack' '
+ test_when_finished rm -rf prom_test prom_before_repack &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 2 packs and simulate .promisor files by creating them manually
+ test_commit_bulk -C prom_test 1 &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid1=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid1 ref1" >"$prom" &&
+ test_commit_bulk -C prom_test 1 &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom|d") &&
+ oid2=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid2 ref2" >"$prom" &&
+
+ # Repack, and check if correct
+ git -C prom_test repack -a -d -f &&
+ prom=$(ls $path/*.promisor) &&
+ # $prom should contain "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
+ test_grep "$oid1 ref1 " "$prom" &&
+ test_grep "$oid2 ref2 " "$prom" &&
+
+ # Save the current .promisor content, repack, and check if correct
+ cp "$prom" prom_before_repack &&
+ git -C prom_test repack -a -d -f &&
+ prom=$(ls $path/*.promisor) &&
+ # $prom should be exactly the same as prom_before_repack
+ test_cmp prom_before_repack "$prom"
+ )
+'
+
test_done
--
2.53.0.584.ge8720aaf12
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
` (3 preceding siblings ...)
2026-04-10 22:55 ` [GSoC PATCH v5 4/6] t7700: test for promisor file content " LorenzoPegorari
@ 2026-04-10 22:56 ` LorenzoPegorari
2026-04-11 18:49 ` Tian Yuchen
2026-04-10 22:56 ` [GSoC PATCH v5 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
6 siblings, 1 reply; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 22:56 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
Add test that checks if the content of ".promisor" files are correctly
copied inside the ".promisor" files created by a geometric repack.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
t/t7703-repack-geometric.sh | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 04d5d8fc33..a8e3e6ae3f 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -541,4 +541,37 @@ test_expect_success 'geometric repack works with promisor packs' '
)
'
+test_expect_success 'check .promisor file content after geometric repack' '
+ test_when_finished rm -rf prom_test &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 2 packs with 3 objs each, and manually create .promisor files
+ test_commit_bulk -C prom_test --start=1 1 && # 3 objects
+ prom1=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
+ oid1=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid1 ref1" >"$prom1" &&
+ test_commit_bulk -C prom_test --start=2 1 && # 3 objects
+ prom2=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d") &&
+ oid2=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid2 ref2" >"$prom2" &&
+
+ # Create 1 pack with 12 objs, and manually create .promisor file
+ test_commit_bulk -C prom_test --start=3 4 && # 12 objects
+ prom3=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d; \|$prom2|d") &&
+ oid3=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid3 ref3" >"$prom3" &&
+
+ # Geometric repack, and check if correct
+ git -C prom_test repack --geometric 2 -d &&
+ prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom3|d") &&
+ # $prom should have repacked only the first 2 small packs, so it should only
+ # contain the following: "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
+ test_grep "$oid1 ref1 " "$prom" &&
+ test_grep "$oid2 ref2 " "$prom" &&
+ test_grep ! "$oid3 ref3" "$prom"
+ )
+'
+
test_done
--
2.53.0.584.ge8720aaf12
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v5 6/6] repack-promisor: add missing headers
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
` (4 preceding siblings ...)
2026-04-10 22:56 ` [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
@ 2026-04-10 22:56 ` LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
6 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-10 22:56 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
According to the coding guidelines, a C file must directly include the
header files that declare the facilities it uses.
Directly include these missing headers, in order to comply with the
coding guidelines.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
repack-promisor.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/repack-promisor.c b/repack-promisor.c
index 6d9590cd4e..26055212a3 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -1,11 +1,17 @@
#include "git-compat-util.h"
#include "repack.h"
+#include "hash.h"
#include "hex.h"
+#include "odb.h"
#include "pack.h"
#include "packfile.h"
#include "path.h"
#include "repository.h"
#include "run-command.h"
+#include "strbuf.h"
+#include "string-list.h"
+#include "strmap.h"
+#include "strvec.h"
struct write_oid_context {
struct child_process *cmd;
--
2.53.0.584.ge8720aaf12
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v4 0/5] preserve promisor files content after repack
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
` (6 preceding siblings ...)
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
@ 2026-04-10 23:05 ` Junio C Hamano
2026-04-11 2:02 ` Junio C Hamano
7 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-10 23:05 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Taylor Blau, Patrick Steinhardt, Derrick Stolee,
Elijah Newren, Eric Sunshine, Tian Yuchen
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> The goal of this patch is to solve the NEEDSWORK comment added by
> 5374a290 (fetch-pack: write fetched refs to .promisor, 14/10/2019). This
> is done by adding a helper function that takes the content of all
> .promisor files in the `repository`, and copies it inside the first
> .promisor file created by the repack.
> ...
> V4 DIFF:
> * `copy_promisor_content()` now prints timestamps in Unix time format.
> * `copy_promisor_content()` now doesn't use a list of `strbuf`, but
> instead uses the more lightweight `string_list`.
> * improved the tests.
> * fixed issue (that showed up in the GitHub Actions-based CI) where
> sometimes the 2 packs created in the second new test inside "t7700"
> were not both repacked into a single new pack.
When merged to the tip of 'seen' (with a fixup to use st_mtime where
we need only whole second precision, to avoid using st_mtim on
platforms that do not have it), this seems to break linux-leaks and
linux-reftable-leaks CI jobs (t0410, t5616, and t5710).
This topic standalone, without interaction with other things in
'seen', breaks these three tests.
https://github.com/git/git/actions/runs/24267948258/job/70866907548
This is one commit directly on top of your topic that reduces CI
jobs down to just two "leaks" job, and removes many test scripts
to leave only these three breaking ones.
I managed to also locally reproduce this failure. Here is how
$ cd t && t5710-*.sh -i -v
dies:
Direct leak of 285 byte(s) in 1 object(s) allocated from:
#0 0x55ce48ca1d4d in malloc (git+0x8cd4d) (BuildId: 1aa6efa30b2fc4772028a3dd31aba3ced49bf128)
#1 0x55ce48fed3f2 in do_xmalloc wrapper.c:55:8
#2 0x55ce48fed3b6 in xmalloc wrapper.c:76:9
#3 0x55ce48eed314 in alloc_packed_git packfile.c:306:25
#4 0x55ce48eed209 in parse_pack_index packfile.c:326:25
#5 0x55ce48f65f03 in copy_promisor_content repack-promisor.c:67:14
...
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack
2026-04-10 22:55 ` [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
@ 2026-04-10 23:30 ` Junio C Hamano
2026-04-11 1:59 ` Lorenzo Pegorari
2026-04-12 6:27 ` Junio C Hamano
1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-10 23:30 UTC (permalink / raw)
To: LorenzoPegorari, Patrick Steinhardt
Cc: git, Taylor Blau, Derrick Stolee, Patrick Steinhardt, Tian Yuchen,
Eric Sunshine, Elijah Newren
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> + dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
parse_pack_index() has this comment:
/*
* Parse the pack idx file found at idx_path and create a packed_git struct
* which can be used with find_pack_entry_one().
*
* You probably don't want to use this function! It skips most of the normal
* sanity checks (including whether we even have the matching .pack file),
* and does not add the resulting packed_git struct to the internal list of
* packs. You probably want add_packed_git() instead.
*/
struct packed_git *parse_pack_index(struct repository *r, unsigned char *sha1,
const char *idx_path);
The function can return NULL, but this caller does not seem to be
prepared for it to return NULL (i.e., the loop introduced by the
repo_for_each_pack() macro we see below, nobody assumes dest_pack
could be NULL).
But what pack index file are we parsing here? Isn't it already part
of the running system that we should be able to find on the list of
packfiles in the packfile store? Is this because we lack "find a
packfile on this packfile store by name" API, because what we want
to find if each of <oid>s we have appear in the particular packfile
or not, and packfile_list_find_oid() is not sufficiently precise (i.e.
"the object appears in one of the packfile on the list" is not what
we want to know, "the object appears in this particular packfile" is)?
Patrick CC'ed primarily because this part of the API and the
data structures have been reshuffled to add quite a lot of
abstraction since I last looked at the area.
As close_pack_index(dest_pack) does not release resources held by
dest_pack itself (even though the region of mmaped memory that is
pointed at by its index_data member is unmapped), I think that is
where the memory leak is breaking the CI jobs (see my other message).
But I am not sure if the use of parse_pack_index() - close_pack_index()
API is the right thing to use here.
> + /* Open the .promisor dest file, and fill dest_content with its content */
> + dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> + dest = xfopen(dest_promisor_name, "r+");
> + while (strbuf_getline(&line, dest) != EOF)
> + strset_add(&dest_content, line.buf);
> +
> + repo_for_each_pack(repo, p) {
> + FILE *source;
> + struct stat source_stat;
> +
> + if (!p->pack_promisor)
> + continue;
> +
> + if (not_repacked_basenames &&
> + strset_contains(not_repacked_basenames, pack_basename(p)))
> + continue;
> +
> + strbuf_reset(&source_promisor_name);
> + strbuf_addstr(&source_promisor_name, p->pack_name);
> + strbuf_strip_suffix(&source_promisor_name, ".pack");
> + strbuf_addstr(&source_promisor_name, ".promisor");
> +
> + if (stat(source_promisor_name.buf, &source_stat))
> + die(_("File not found: %s"), source_promisor_name.buf);
> +
> + source = xfopen(source_promisor_name.buf, "r");
> +
> + while (strbuf_getline(&line, source) != EOF) {
> + struct string_list line_sections = STRING_LIST_INIT_DUP;
> + struct object_id oid;
> +
> + /* Split line into <oid>, <ref> and <time> (if <time> exists) */
> + string_list_split(&line_sections, line.buf, " ", 3);
> +
> + /* Ignore the lines where <oid> doesn't appear in the dest_pack */
> + get_oid_hex_algop(line_sections.items[0].string, &oid, repo->hash_algo);
> + if (!find_pack_entry_one(&oid, dest_pack)) {
> + string_list_clear(&line_sections, 0);
> + continue;
> + }
> +
> + /* If <time> doesn't exist, retrieve it and add it to line */
> + if (line_sections.nr < 3)
> + strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
> +
> + /*
> + * Add the finalized line to dest_to_write and dest_content if it
> + * wasn't already present inside dest_content
> + */
> + if (strset_add(&dest_content, line.buf)) {
> + strbuf_addbuf(&dest_to_write, &line);
> + strbuf_addch(&dest_to_write, '\n');
> + }
> +
> + string_list_clear(&line_sections, 0);
> + }
> +
> + err = ferror(source);
> + err |= fclose(source);
> + if (err)
> + die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
> + }
> +
> + /* If dest_to_write is not empty, then there are new lines to append */
> + if (dest_to_write.len) {
> + if (fseek(dest, 0L, SEEK_END))
> + die_errno(_("fseek failed"));
> + fprintf(dest, "%s", dest_to_write.buf);
> + }
> +
> + err = ferror(dest);
> + err |= fclose(dest);
> + if (err)
> + die(_("Could not write '%s' promisor file"), dest_promisor_name);
> +
> + close_pack_index(dest_pack);
> + free(dest_idx_name);
> + free(dest_promisor_name);
> + strset_clear(&dest_content);
> + strbuf_release(&dest_to_write);
> + strbuf_release(&source_promisor_name);
> + strbuf_release(&line);
> +}
> +
> static void finish_repacking_promisor_objects(struct repository *repo,
> struct child_process *cmd,
> struct string_list *names,
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack
2026-04-10 23:30 ` Junio C Hamano
@ 2026-04-11 1:59 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-11 1:59 UTC (permalink / raw)
To: Junio C Hamano
Cc: Patrick Steinhardt, git, Taylor Blau, Derrick Stolee, Tian Yuchen,
Eric Sunshine, Elijah Newren
On Fri, Apr 10, 2026 at 04:30:40PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > + dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
>
> parse_pack_index() has this comment:
>
> /*
> * Parse the pack idx file found at idx_path and create a packed_git struct
> * which can be used with find_pack_entry_one().
> *
> * You probably don't want to use this function! It skips most of the normal
> * sanity checks (including whether we even have the matching .pack file),
> * and does not add the resulting packed_git struct to the internal list of
> * packs. You probably want add_packed_git() instead.
> */
> struct packed_git *parse_pack_index(struct repository *r, unsigned char *sha1,
> const char *idx_path);
>
> The function can return NULL, but this caller does not seem to be
> prepared for it to return NULL (i.e., the loop introduced by the
> repo_for_each_pack() macro we see below, nobody assumes dest_pack
> could be NULL).
That's true. The function is created assuming that the promisor files
are correctly formed, but this is unrelated to the promisor files
content, so it should be checked to avoid any issue. I will add a
`die()` if `parse_pack_index()` returns `NULL`, since this means that
something bad happened considering that we literally just created the
pack that we want to retrieve.
> But what pack index file are we parsing here?
`parse_pack_index()` is used to create a temporary `packed_git` for the
just created pack, so that we can then check if <oid> (for each line of
each ".promisor" file) appears in it (and then append that line to the
".promisor" file of the just created pack).
> Isn't it already part
> of the running system that we should be able to find on the list of
> packfiles in the packfile store?
No, the new pack's `packed_git` doesn't appear in the
`repo_for_each_pack` macro loop, if that is what you are asking.
> Is this because we lack "find a
> packfile on this packfile store by name" API, because what we want
> to find if each of <oid>s we have appear in the particular packfile
> or not, and packfile_list_find_oid() is not sufficiently precise (i.e.
> "the object appears in one of the packfile on the list" is not what
> we want to know, "the object appears in this particular packfile" is)?
Yes, it's pretty much this.
> Patrick CC'ed primarily because this part of the API and the
> data structures have been reshuffled to add quite a lot of
> abstraction since I last looked at the area.
>
> As close_pack_index(dest_pack) does not release resources held by
> dest_pack itself (even though the region of mmaped memory that is
> pointed at by its index_data member is unmapped), I think that is
> where the memory leak is breaking the CI jobs (see my other message).
>
> But I am not sure if the use of parse_pack_index() - close_pack_index()
> API is the right thing to use here.
I struggled a lot to understand how to deal with a pack opened with
`parse_pack_index()`, considering that this function is only used once
in the codebase. In the end I decided to simply use `close_pack_index()`
as it is done in this other instance where `parse_pack_index()` is used.
Having done more research now, I realized that the issue is that the
code is missing a `free(dest_pack)`. With this line, no leak happens,
and all GitHub Actions-based CI tests are green.
Also, I think that `close_pack_index()` is indeed the correct way to
close a `packed_git` opened with `parse_pack_index()` (instead of using
the generic `close_pack()`, or not closing it at all).
Finally, having looked more deeply at that other instance where
`parse_pack_index()` is used, I believe there actually is a possible
memory leak that might rarely ever happen. Isn't there a `free()`
missing, like this:
---
http.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/http.c b/http.c
index 8ea1b9d1f6..e765852071 100644
--- a/http.c
+++ b/http.c
@@ -2446,8 +2446,10 @@ static int fetch_and_setup_pack_index(struct packfile_list *packs,
if (!ret)
close_pack_index(new_pack);
free(tmp_idx);
- if (ret)
+ if (ret) {
+ free(new_pack);
return -1;
+ }
packfile_list_prepend(packs, new_pack);
return 0;
--
> > + /* Open the .promisor dest file, and fill dest_content with its content */
> > + dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> > + dest = xfopen(dest_promisor_name, "r+");
> > + while (strbuf_getline(&line, dest) != EOF)
> > + strset_add(&dest_content, line.buf);
> > +
> > + repo_for_each_pack(repo, p) {
> > + FILE *source;
> > + struct stat source_stat;
> > +
> > + if (!p->pack_promisor)
> > + continue;
> > +
> > + if (not_repacked_basenames &&
> > + strset_contains(not_repacked_basenames, pack_basename(p)))
> > + continue;
> > +
Thanks Junio,
Lorenzo
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v4 0/5] preserve promisor files content after repack
2026-04-10 23:05 ` [GSoC PATCH v4 0/5] " Junio C Hamano
@ 2026-04-11 2:02 ` Junio C Hamano
2026-04-11 14:05 ` Lorenzo Pegorari
0 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-11 2:02 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Taylor Blau, Patrick Steinhardt, Derrick Stolee,
Elijah Newren, Eric Sunshine, Tian Yuchen
Junio C Hamano <gitster@pobox.com> writes:
> When merged to the tip of 'seen' (with a fixup to use st_mtime where
> we need only whole second precision, to avoid using st_mtim on
> platforms that do not have it), this seems to break linux-leaks and
> linux-reftable-leaks CI jobs (t0410, t5616, and t5710).
>
> This topic standalone, without interaction with other things in
> 'seen', breaks these three tests.
>
> https://github.com/git/git/actions/runs/24267948258/job/70866907548
>
> This is one commit directly on top of your topic that reduces CI
> jobs down to just two "leaks" job, and removes many test scripts
> to leave only these three breaking ones.
>
>
> I managed to also locally reproduce this failure. Here is how
>
> $ cd t && t5710-*.sh -i -v
>
> dies:
>
> Direct leak of 285 byte(s) in 1 object(s) allocated from:
> #0 0x55ce48ca1d4d in malloc (git+0x8cd4d) (BuildId: 1aa6efa30b2fc4772028a3dd31aba3ced49bf128)
> #1 0x55ce48fed3f2 in do_xmalloc wrapper.c:55:8
> #2 0x55ce48fed3b6 in xmalloc wrapper.c:76:9
> #3 0x55ce48eed314 in alloc_packed_git packfile.c:306:25
> #4 0x55ce48eed209 in parse_pack_index packfile.c:326:25
> #5 0x55ce48f65f03 in copy_promisor_content repack-promisor.c:67:14
> ...
FWIW, the tip of 'seen' as of this writing has queued this topic
(the latest round v5) plus the attached "SQUASH???" commit at its
tip.
The first hunk (die when dest_pack is NULL) is absolutely positively
a wrong thing to do. As I said, I do not know if we want to call
parse_pack_index() here or if we have a more appropriate helper
function to use, but assuming that parse_pack_index() is the right
thing to call, we should be prepared to see NULL returned. BUG("")
is reserved for detected programming errors, and it is absolutely a
wrong thing to call.
As the content copied by this function is supposed to be for
debugging only, I think dying when we cannot copy is not what we
want. Rather, it probably makes more sense to fall back to the
traditional behaviour (e.g., not copying and instead leaving an
empty file, if that is what we did before this patch series).
The second hunk just line-wraps an overly long line. There are
other overly long lines in this deeply indented block (which is a
sign that it might be worth to see if the block is better made into
a small static helper function) that should be line-wrapped in a
similar way, but I didn't bother.
THe last hunk is a real bugfix for the leak (again, provided that
parse_pack_index() is what we want to use).
repack-promisor.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/repack-promisor.c b/repack-promisor.c
index 26055212a3..c7025e97f2 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -71,6 +71,8 @@ static void copy_promisor_content(struct repository *repo,
dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
+ if (!dest_pack)
+ BUG("parse_pack_index() failed.");
/* Open the .promisor dest file, and fill dest_content with its content */
dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
@@ -115,7 +117,8 @@ static void copy_promisor_content(struct repository *repo,
/* If <time> doesn't exist, retrieve it and add it to line */
if (line_sections.nr < 3)
- strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
+ strbuf_addf(&line, " %" PRItime,
+ (timestamp_t)source_stat.st_mtime);
/*
* Add the finalized line to dest_to_write and dest_content if it
@@ -148,6 +151,7 @@ static void copy_promisor_content(struct repository *repo,
die(_("Could not write '%s' promisor file"), dest_promisor_name);
close_pack_index(dest_pack);
+ free(dest_pack);
free(dest_idx_name);
free(dest_promisor_name);
strset_clear(&dest_content);
--
2.54.0-rc1-178-ge3ec7964f3
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v4 0/5] preserve promisor files content after repack
2026-04-11 2:02 ` Junio C Hamano
@ 2026-04-11 14:05 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-11 14:05 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Taylor Blau, Patrick Steinhardt, Derrick Stolee,
Elijah Newren, Eric Sunshine, Tian Yuchen
On Fri, Apr 10, 2026 at 07:02:17PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > When merged to the tip of 'seen' (with a fixup to use st_mtime where
> > we need only whole second precision, to avoid using st_mtim on
> > platforms that do not have it), this seems to break linux-leaks and
> > linux-reftable-leaks CI jobs (t0410, t5616, and t5710).
> >
> > This topic standalone, without interaction with other things in
> > 'seen', breaks these three tests.
> >
> > https://github.com/git/git/actions/runs/24267948258/job/70866907548
> >
> > This is one commit directly on top of your topic that reduces CI
> > jobs down to just two "leaks" job, and removes many test scripts
> > to leave only these three breaking ones.
> >
> >
> > I managed to also locally reproduce this failure. Here is how
> >
> > $ cd t && t5710-*.sh -i -v
> >
> > dies:
> >
> > Direct leak of 285 byte(s) in 1 object(s) allocated from:
> > #0 0x55ce48ca1d4d in malloc (git+0x8cd4d) (BuildId: 1aa6efa30b2fc4772028a3dd31aba3ced49bf128)
> > #1 0x55ce48fed3f2 in do_xmalloc wrapper.c:55:8
> > #2 0x55ce48fed3b6 in xmalloc wrapper.c:76:9
> > #3 0x55ce48eed314 in alloc_packed_git packfile.c:306:25
> > #4 0x55ce48eed209 in parse_pack_index packfile.c:326:25
> > #5 0x55ce48f65f03 in copy_promisor_content repack-promisor.c:67:14
> > ...
>
>
> FWIW, the tip of 'seen' as of this writing has queued this topic
> (the latest round v5) plus the attached "SQUASH???" commit at its
> tip.
>
> The first hunk (die when dest_pack is NULL) is absolutely positively
> a wrong thing to do. As I said, I do not know if we want to call
> parse_pack_index() here or if we have a more appropriate helper
> function to use, but assuming that parse_pack_index() is the right
> thing to call, we should be prepared to see NULL returned. BUG("")
> is reserved for detected programming errors, and it is absolutely a
> wrong thing to call.
>
> As the content copied by this function is supposed to be for
> debugging only, I think dying when we cannot copy is not what we
> want. Rather, it probably makes more sense to fall back to the
> traditional behaviour (e.g., not copying and instead leaving an
> empty file, if that is what we did before this patch series).
Got it. I will add a `warning(_("..."))` if this happens, and then
immediately `return -1` to indicate that something went wrong and so the
".promisor" file will be left empty.
> The second hunk just line-wraps an overly long line. There are
> other overly long lines in this deeply indented block (which is a
> sign that it might be worth to see if the block is better made into
> a small static helper function) that should be line-wrapped in a
> similar way, but I didn't bother.
There are only I believe 3 lines of code that exceed the soft cap of 80
characters per line. I will line-wrap these long lines.
> THe last hunk is a real bugfix for the leak (again, provided that
> parse_pack_index() is what we want to use).
Yeah, I also noticed the issue, and reported it in an email pretty much
at the same time that you sent this one.
>
> repack-promisor.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/repack-promisor.c b/repack-promisor.c
> index 26055212a3..c7025e97f2 100644
> --- a/repack-promisor.c
> +++ b/repack-promisor.c
> @@ -71,6 +71,8 @@ static void copy_promisor_content(struct repository *repo,
> dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
> get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
> dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
> + if (!dest_pack)
> + BUG("parse_pack_index() failed.");
>
> /* Open the .promisor dest file, and fill dest_content with its content */
> dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> @@ -115,7 +117,8 @@ static void copy_promisor_content(struct repository *repo,
>
> /* If <time> doesn't exist, retrieve it and add it to line */
> if (line_sections.nr < 3)
> - strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
> + strbuf_addf(&line, " %" PRItime,
> + (timestamp_t)source_stat.st_mtime);
>
> /*
> * Add the finalized line to dest_to_write and dest_content if it
> @@ -148,6 +151,7 @@ static void copy_promisor_content(struct repository *repo,
> die(_("Could not write '%s' promisor file"), dest_promisor_name);
>
> close_pack_index(dest_pack);
> + free(dest_pack);
> free(dest_idx_name);
> free(dest_promisor_name);
> strset_clear(&dest_content);
> --
Thanks Junio.
Regarding the "Should we use `parse_pack_index()` here?" question, I
have also found success using some like this:
```
[...]
struct packed_git *dest_pack, *p;
struct odb_source_files *files;
int err;
dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
files = odb_source_files_downcast(repo->objects->sources);
dest_pack = packfile_store_load_pack(files->packed, dest_idx_name, 0);
[...]
```
This code passes all the tests without any leaks.
Now, in all honesty, I don't know the implications of using
`parse_pack_index()` compared to using `packfile_store_load_pack()` and
`odb_source_files_downcast()`. Unfortunately these functions are rarely
used in the code, so I don't have many examples to look at.
Let me know what is your opinion on this. In the meantime I will explore
this possibility further.
Thanks a lot,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files after repack
2026-04-10 22:55 ` [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
@ 2026-04-11 18:25 ` Tian Yuchen
2026-04-17 0:34 ` Lorenzo Pegorari
0 siblings, 1 reply; 78+ messages in thread
From: Tian Yuchen @ 2026-04-11 18:25 UTC (permalink / raw)
To: LorenzoPegorari, git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Eric Sunshine, Elijah Newren
On 4/11/26 06:55, LorenzoPegorari wrote:
> @@ -171,19 +172,15 @@ static void finish_repacking_promisor_objects(struct repository *repo,
>
> /*
> * pack-objects creates the .pack and .idx files, but not the
> - * .promisor file. Create the .promisor file, which is empty.
> - *
> - * NEEDSWORK: fetch-pack sometimes generates non-empty
> - * .promisor files containing the ref names and associated
> - * hashes at the point of generation of the corresponding
> - * packfile, but this would not preserve their contents. Maybe
> - * concatenate the contents of all .promisor files instead of
> - * just creating a new empty file.
> + * .promisor file. Create the .promisor file.
> */
> promisor_name = mkpathdup("%s-%s.promisor", packtmp,
> line.buf);
> write_promisor_file(promisor_name, NULL, 0);
>
> + /* Now let's fill the content of the newly created .promisor file */
> + copy_promisor_content(repo, line.buf, packtmp, not_repacked_basenames);
Here, the file opened by copy_promisor_content() is an empty file. Is
this line necessary? ;)
...hold on. I recall you mentioning in one of the versions that you had
downgraded this helper from a generic function to a static one. Since it
now only serves this particular business logic, I think the
implementation should be tweaked slightly as well.
> + /* Open the .promisor dest file, and fill dest_content with its content */
> + dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> + dest = xfopen(dest_promisor_name, "r+");
> + while (strbuf_getline(&line, dest) != EOF)
> + strset_add(&dest_content, line.buf);
If file contains a large number of unique lines, dest_to_write, which is
a strbuf, may keep realloc memory until the loop ends, at which point
all the memory is released. I wonder if this might be wasting some heap.
If it were me, I might write it like this:
struct strset seen_lines = STRSET_INIT;
dest = xfopen(dest_promisor_name, "w");
while (strbuf_getline(&line, source) != EOF) {
if (strset_add(&seen_lines, line.buf)) {
fprintf(dest, "%s\n", line.buf);
}
}
It also prevents file pointer misalignment.
(I think we still need to discuss what should ultimately become of this
helper; at the moment, it seems a bit disjointed, doesn’t it?)
Thank you, Yuchen
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack
2026-04-10 22:56 ` [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
@ 2026-04-11 18:49 ` Tian Yuchen
2026-04-17 0:46 ` Lorenzo Pegorari
0 siblings, 1 reply; 78+ messages in thread
From: Tian Yuchen @ 2026-04-11 18:49 UTC (permalink / raw)
To: LorenzoPegorari, git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Eric Sunshine, Elijah Newren
On 4/11/26 06:56, LorenzoPegorari wrote:
> Add test that checks if the content of ".promisor" files are correctly
> copied inside the ".promisor" files created by a geometric repack.
>
> Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> ---
> t/t7703-repack-geometric.sh | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 04d5d8fc33..a8e3e6ae3f 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -541,4 +541,37 @@ test_expect_success 'geometric repack works with promisor packs' '
> )
> '
>
> +test_expect_success 'check .promisor file content after geometric repack' '
> + test_when_finished rm -rf prom_test &&
> + git init prom_test &&
> + path=prom_test/.git/objects/pack &&
> +
> + (
> + # Create 2 packs with 3 objs each, and manually create .promisor files
> + test_commit_bulk -C prom_test --start=1 1 && # 3 objects
---
> + prom1=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
This approach seems a bit fragile.
- Perhaps you’ve heard the saying which goes like "never parse the
output of ls". In a nutshell, the output of this command is not
standardised;
- *.pack? This may produce multiple lines of output, which I don’t think
is what we want.
- $path instead of "$path", which cannot correctly handle spacing in
directory names;
- sed s command matches the "first" string it meets. We can’t guarantee
that the '.pack' part won’t appear in users’ path names, can we?
(Fun fact: There are approximately 8,000 people in the United States
with the surname 'Pack'. Source: 2010 Census ; - )
> + oid1=$(git -C prom_test rev-parse HEAD) &&
> + echo "$oid1 ref1" >"$prom1" &&
> + test_commit_bulk -C prom_test --start=2 1 && # 3 objects
> + prom2=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d") &&
> + oid2=$(git -C prom_test rev-parse HEAD) &&
> + echo "$oid2 ref2" >"$prom2" &&
> +
> + # Create 1 pack with 12 objs, and manually create .promisor file
> + test_commit_bulk -C prom_test --start=3 4 && # 12 objects
> + prom3=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d; \|$prom2|d") &&
> + oid3=$(git -C prom_test rev-parse HEAD) &&
> + echo "$oid3 ref3" >"$prom3" &&
> +
> + # Geometric repack, and check if correct
> + git -C prom_test repack --geometric 2 -d &&
> + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom3|d") &&
> + # $prom should have repacked only the first 2 small packs, so it should only
> + # contain the following: "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
> + test_grep "$oid1 ref1 " "$prom" &&
> + test_grep "$oid2 ref2 " "$prom" &&
> + test_grep ! "$oid3 ref3" "$prom"
> + )
> +'
> +
> test_done
Thanks, yuchen
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack
2026-04-10 22:55 ` [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
2026-04-10 23:30 ` Junio C Hamano
@ 2026-04-12 6:27 ` Junio C Hamano
2026-04-17 0:30 ` Lorenzo Pegorari
1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2026-04-12 6:27 UTC (permalink / raw)
To: LorenzoPegorari
Cc: git, Taylor Blau, Derrick Stolee, Patrick Steinhardt, Tian Yuchen,
Eric Sunshine, Elijah Newren
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> +/*
> + * Go through all .promisor files contained in repo (excluding those whose name
> + * appears in not_repacked_basenames, which acts as a ignorelist), and copies
> + * their content inside the destination file "<packtmp>-<dest_hex>.promisor".
> + * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
> + * in the write_promisor_file() function).
> + * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
> + * is the time (in Unix time) at which the .promisor file was last modified.
> + * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
> + * be copied.
> + * The contents of all .promisor files are assumed to be correctly formed.
> + */
> +static void copy_promisor_content(struct repository *repo,
> + const char *dest_hex,
> + const char *packtmp,
> + struct strset *not_repacked_basenames)
> +{
> + char *dest_idx_name;
> + char *dest_promisor_name;
> + FILE *dest;
> + struct strset dest_content = STRSET_INIT;
> + struct strbuf dest_to_write = STRBUF_INIT;
> + struct strbuf source_promisor_name = STRBUF_INIT;
> + struct strbuf line = STRBUF_INIT;
> + struct object_id dest_oid;
> + struct packed_git *dest_pack, *p;
> + int err;
> +
> + dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
> + get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
This needs to prepare for a corrupt input in dest_hex, which would
result in garbage dest_oid. The helper function should signal a
failure with its return value, right?
> + dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
As you earlier mentioned, this use of parse_pack_index() is
perfectly fine. The call chains that reach here are both from
cmd_repack() that calls either repack_promisor_objects() or
pack_geometry_repack_promisors(), and both ran "pack-objects" to
create a new pack and called finish_repacking_promisor_objects(),
which in turn calls us, so the dest_hex/packtmp we are dealing with
point newly created packfile that is about to become but not yet
completed as a part of this repository. We know we created it, and
we know "pack-objects" did not fail, so parse_pack_index() being
loose in validation does not pose a practical problem.
This still needs to prepare for parse_pack_index() to return NULL,
though.
In the above two cases, we should make sure that dest_idx_name gets
freed before we return control to the caller (possibly signaling an
error by returning -1, but the current caller is not expecting to
hear a failure from us and that may be OK).
> + /* Open the .promisor dest file, and fill dest_content with its content */
> + dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> + dest = xfopen(dest_promisor_name, "r+");
> + while (strbuf_getline(&line, dest) != EOF)
> + strset_add(&dest_content, line.buf);
> +
> + repo_for_each_pack(repo, p) {
> + FILE *source;
> + struct stat source_stat;
> +
> + if (!p->pack_promisor)
> + continue;
> +
> + if (not_repacked_basenames &&
> + strset_contains(not_repacked_basenames, pack_basename(p)))
> + continue;
> +
> + strbuf_reset(&source_promisor_name);
> + strbuf_addstr(&source_promisor_name, p->pack_name);
> + strbuf_strip_suffix(&source_promisor_name, ".pack");
> + strbuf_addstr(&source_promisor_name, ".promisor");
> +
> + if (stat(source_promisor_name.buf, &source_stat))
> + die(_("File not found: %s"), source_promisor_name.buf);
> +
> + source = xfopen(source_promisor_name.buf, "r");
> +
> + while (strbuf_getline(&line, source) != EOF) {
> + struct string_list line_sections = STRING_LIST_INIT_DUP;
> + struct object_id oid;
> +
> + /* Split line into <oid>, <ref> and <time> (if <time> exists) */
> + string_list_split(&line_sections, line.buf, " ", 3);
The strbuf's contents line.buf[] is read/write, so we could use
line_sections that is initialized with NODUP and call
split_in_place() to avoid unnecessary small allocations and
deallocations, no?
More importantly, we say "split into up to 3 pieces". What happens
if this is totally malformed and there is only one word? Should we
still trust this line and try to carry it forward? I doubt it.
> + /* Ignore the lines where <oid> doesn't appear in the dest_pack */
> + get_oid_hex_algop(line_sections.items[0].string, &oid, repo->hash_algo);
Or the first word split is not a sane hexadecimal string that
get_oid_hex() fails?
It would be the simplest to ignore/skip the line, just like what you
do to a correctly formated line about an irrelevant <oid> (iow, the
if() statement immediately below).
> + if (!find_pack_entry_one(&oid, dest_pack)) {
Assuming that the object name was read correctly, if the pack we
just created does not have the <oid> we read from the existing
.promisor file, this line we just read has nothing to do with the
repacked result, so we ignore it, which sounds fine.
> + string_list_clear(&line_sections, 0);
> + continue;
> + }
> +
> + /* If <time> doesn't exist, retrieve it and add it to line */
> + if (line_sections.nr < 3)
> + strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
Should we also validate line_sections[1] in some way? I am not sure
if we want to call check_ref_format() on it.
If we insist that .nr is at least 2 immediately after we split the
string, and make sure the line begins with <oid> (i.e., parsable as
hex object name) that might be sufficient. I dunno.
> + /*
> + * Add the finalized line to dest_to_write and dest_content if it
> + * wasn't already present inside dest_content
> + */
> + if (strset_add(&dest_content, line.buf)) {
> + strbuf_addbuf(&dest_to_write, &line);
> + strbuf_addch(&dest_to_write, '\n');
> + }
> +
> + string_list_clear(&line_sections, 0);
> + }
> +
> + err = ferror(source);
> + err |= fclose(source);
> + if (err)
> + die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
> + }
> +
> + /* If dest_to_write is not empty, then there are new lines to append */
> + if (dest_to_write.len) {
> + if (fseek(dest, 0L, SEEK_END))
> + die_errno(_("fseek failed"));
> + fprintf(dest, "%s", dest_to_write.buf);
> + }
> +
> + err = ferror(dest);
> + err |= fclose(dest);
> + if (err)
> + die(_("Could not write '%s' promisor file"), dest_promisor_name);
> +
> + close_pack_index(dest_pack);
As we discussed,
free(dest_pack);
is missing.
> + free(dest_idx_name);
> + free(dest_promisor_name);
> + strset_clear(&dest_content);
> + strbuf_release(&dest_to_write);
> + strbuf_release(&source_promisor_name);
> + strbuf_release(&line);
> +}
> +
> static void finish_repacking_promisor_objects(struct repository *repo,
> struct child_process *cmd,
> struct string_list *names,
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime
2026-04-10 18:10 ` [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime Junio C Hamano
@ 2026-04-16 23:46 ` Elijah Newren
2026-04-17 4:25 ` Junio C Hamano
0 siblings, 1 reply; 78+ messages in thread
From: Elijah Newren @ 2026-04-16 23:46 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, LorenzoPegorari, Taylor Blau, Patrick Steinhardt,
Derrick Stolee, Eric Sunshine, Tian Yuchen
On Fri, Apr 10, 2026 at 11:10 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Most unfortunately macOS does not support st_[amc]tim for timestamps
> down to nanosecond resolution as POSIX systems.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> Documentation/CodingGuidelines | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
> index 4992e52093..4e54139fd7 100644
> --- c/Documentation/CodingGuidelines
> +++ w/Documentation/CodingGuidelines
> @@ -693,6 +693,12 @@ For C programs:
> char *dogs[] = ...;
> walk_all_dogs(dogs);
>
> + - For file timestamps, do not use "st_mtim" (and other timestamp
> + members in "struct stat") unconditionally; not everybody is POSIX
> + (grep for USE_ST_TIMESPEC). If you only need timestamp in whole
> + second resolution, "st_mtime" should work fine everywhere.
> +
> +
> For Perl programs:
>
> - Most of the C guidelines above apply.
Looks good to me. As a minor nit, "need timestamp" -> "need a timestamp".
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack
2026-04-12 6:27 ` Junio C Hamano
@ 2026-04-17 0:30 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-17 0:30 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Taylor Blau, Derrick Stolee, Patrick Steinhardt, Tian Yuchen,
Eric Sunshine, Elijah Newren
On Sat, Apr 11, 2026 at 11:27:45PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
>
> > +/*
> > + * Go through all .promisor files contained in repo (excluding those whose name
> > + * appears in not_repacked_basenames, which acts as a ignorelist), and copies
> > + * their content inside the destination file "<packtmp>-<dest_hex>.promisor".
> > + * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
> > + * in the write_promisor_file() function).
> > + * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
> > + * is the time (in Unix time) at which the .promisor file was last modified.
> > + * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
> > + * be copied.
> > + * The contents of all .promisor files are assumed to be correctly formed.
> > + */
> > +static void copy_promisor_content(struct repository *repo,
> > + const char *dest_hex,
> > + const char *packtmp,
> > + struct strset *not_repacked_basenames)
> > +{
> > + char *dest_idx_name;
> > + char *dest_promisor_name;
> > + FILE *dest;
> > + struct strset dest_content = STRSET_INIT;
> > + struct strbuf dest_to_write = STRBUF_INIT;
> > + struct strbuf source_promisor_name = STRBUF_INIT;
> > + struct strbuf line = STRBUF_INIT;
> > + struct object_id dest_oid;
> > + struct packed_git *dest_pack, *p;
> > + int err;
> > +
> > + dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
> > + get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo);
>
> This needs to prepare for a corrupt input in dest_hex, which would
> result in garbage dest_oid. The helper function should signal a
> failure with its return value, right?
Ack. I think the best way is to signal a `warning()`, and then simply
exit the helper function leaving the ".promisor" file empty.
> > + dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
>
> As you earlier mentioned, this use of parse_pack_index() is
> perfectly fine. The call chains that reach here are both from
> cmd_repack() that calls either repack_promisor_objects() or
> pack_geometry_repack_promisors(), and both ran "pack-objects" to
> create a new pack and called finish_repacking_promisor_objects(),
> which in turn calls us, so the dest_hex/packtmp we are dealing with
> point newly created packfile that is about to become but not yet
> completed as a part of this repository. We know we created it, and
> we know "pack-objects" did not fail, so parse_pack_index() being
> loose in validation does not pose a practical problem.
Exactly. I couldn't quite explain it as good as you right now. :)
> This still needs to prepare for parse_pack_index() to return NULL,
> though.
Ack.
> In the above two cases, we should make sure that dest_idx_name gets
> freed before we return control to the caller (possibly signaling an
> error by returning -1, but the current caller is not expecting to
> hear a failure from us and that may be OK).
Again, I think this should be treated the same as when `dest_hex` is
garbage.
> > + /* Open the .promisor dest file, and fill dest_content with its content */
> > + dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> > + dest = xfopen(dest_promisor_name, "r+");
> > + while (strbuf_getline(&line, dest) != EOF)
> > + strset_add(&dest_content, line.buf);
> > +
> > + repo_for_each_pack(repo, p) {
> > + FILE *source;
> > + struct stat source_stat;
> > +
> > + if (!p->pack_promisor)
> > + continue;
> > +
> > + if (not_repacked_basenames &&
> > + strset_contains(not_repacked_basenames, pack_basename(p)))
> > + continue;
> > +
> > + strbuf_reset(&source_promisor_name);
> > + strbuf_addstr(&source_promisor_name, p->pack_name);
> > + strbuf_strip_suffix(&source_promisor_name, ".pack");
> > + strbuf_addstr(&source_promisor_name, ".promisor");
> > +
> > + if (stat(source_promisor_name.buf, &source_stat))
> > + die(_("File not found: %s"), source_promisor_name.buf);
> > +
> > + source = xfopen(source_promisor_name.buf, "r");
> > +
> > + while (strbuf_getline(&line, source) != EOF) {
> > + struct string_list line_sections = STRING_LIST_INIT_DUP;
> > + struct object_id oid;
> > +
> > + /* Split line into <oid>, <ref> and <time> (if <time> exists) */
> > + string_list_split(&line_sections, line.buf, " ", 3);
>
> The strbuf's contents line.buf[] is read/write, so we could use
> line_sections that is initialized with NODUP and call
> split_in_place() to avoid unnecessary small allocations and
> deallocations, no?
I don't think so, because we still need the complete `line` when we
append the <time> to it (if we do so), and when we print it to the
`dest` file. This means that we can't use `split_in_place()` and
initialize it with `NODUP`, because then we would have the complete
`line`.
> More importantly, we say "split into up to 3 pieces". What happens
> if this is totally malformed and there is only one word? Should we
> still trust this line and try to carry it forward? I doubt it.
I think we should discard the line if it can't be split up into 2 or 3
pieces.
> > + /* Ignore the lines where <oid> doesn't appear in the dest_pack */
> > + get_oid_hex_algop(line_sections.items[0].string, &oid, repo->hash_algo);
>
> Or the first word split is not a sane hexadecimal string that
> get_oid_hex() fails?
Same, we should just discard it.
> It would be the simplest to ignore/skip the line, just like what you
> do to a correctly formated line about an irrelevant <oid> (iow, the
> if() statement immediately below).
Agreed.
> > + if (!find_pack_entry_one(&oid, dest_pack)) {
>
> Assuming that the object name was read correctly, if the pack we
> just created does not have the <oid> we read from the existing
> .promisor file, this line we just read has nothing to do with the
> repacked result, so we ignore it, which sounds fine.
>
> > + string_list_clear(&line_sections, 0);
> > + continue;
> > + }
> > +
> > + /* If <time> doesn't exist, retrieve it and add it to line */
> > + if (line_sections.nr < 3)
> > + strbuf_addf(&line, " %" PRItime, (timestamp_t)source_stat.st_mtime);
>
> Should we also validate line_sections[1] in some way? I am not sure
> if we want to call check_ref_format() on it.
>
> If we insist that .nr is at least 2 immediately after we split the
> string, and make sure the line begins with <oid> (i.e., parsable as
> hex object name) that might be sufficient. I dunno.
I think we should check <ref>. Found some success using:
`check_refname_format(<ref>, REFNAME_ALLOW_ONELEVEL)`
> > + /*
> > + * Add the finalized line to dest_to_write and dest_content if it
> > + * wasn't already present inside dest_content
> > + */
> > + if (strset_add(&dest_content, line.buf)) {
> > + strbuf_addbuf(&dest_to_write, &line);
> > + strbuf_addch(&dest_to_write, '\n');
> > + }
> > +
> > + string_list_clear(&line_sections, 0);
> > + }
> > +
> > + err = ferror(source);
> > + err |= fclose(source);
> > + if (err)
> > + die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
> > + }
> > +
> > + /* If dest_to_write is not empty, then there are new lines to append */
> > + if (dest_to_write.len) {
> > + if (fseek(dest, 0L, SEEK_END))
> > + die_errno(_("fseek failed"));
> > + fprintf(dest, "%s", dest_to_write.buf);
> > + }
> > +
> > + err = ferror(dest);
> > + err |= fclose(dest);
> > + if (err)
> > + die(_("Could not write '%s' promisor file"), dest_promisor_name);
> > +
> > + close_pack_index(dest_pack);
>
> As we discussed,
>
> free(dest_pack);
>
> is missing.
Ack.
> > + free(dest_idx_name);
> > + free(dest_promisor_name);
> > + strset_clear(&dest_content);
> > + strbuf_release(&dest_to_write);
> > + strbuf_release(&source_promisor_name);
> > + strbuf_release(&line);
> > +}
> > +
> > static void finish_repacking_promisor_objects(struct repository *repo,
> > struct child_process *cmd,
> > struct string_list *names,
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files after repack
2026-04-11 18:25 ` Tian Yuchen
@ 2026-04-17 0:34 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-17 0:34 UTC (permalink / raw)
To: Tian Yuchen
Cc: git, Taylor Blau, Derrick Stolee, Junio C Hamano,
Patrick Steinhardt, Eric Sunshine, Elijah Newren
On Sun, Apr 12, 2026 at 02:25:50AM +0800, Tian Yuchen wrote:
> On 4/11/26 06:55, LorenzoPegorari wrote:
>
> > @@ -171,19 +172,15 @@ static void finish_repacking_promisor_objects(struct repository *repo,
> > /*
> > * pack-objects creates the .pack and .idx files, but not the
> > - * .promisor file. Create the .promisor file, which is empty.
> > - *
> > - * NEEDSWORK: fetch-pack sometimes generates non-empty
> > - * .promisor files containing the ref names and associated
> > - * hashes at the point of generation of the corresponding
> > - * packfile, but this would not preserve their contents. Maybe
> > - * concatenate the contents of all .promisor files instead of
> > - * just creating a new empty file.
> > + * .promisor file. Create the .promisor file.
> > */
> > promisor_name = mkpathdup("%s-%s.promisor", packtmp,
> > line.buf);
> > write_promisor_file(promisor_name, NULL, 0);
> > + /* Now let's fill the content of the newly created .promisor file */
> > + copy_promisor_content(repo, line.buf, packtmp, not_repacked_basenames);
>
> Here, the file opened by copy_promisor_content() is an empty file. Is this
> line necessary? ;)
>
> ...hold on. I recall you mentioning in one of the versions that you had
> downgraded this helper from a generic function to a static one. Since it now
> only serves this particular business logic, I think the implementation
> should be tweaked slightly as well.
Good point. It makes sense to modify the helper function to create the
empty ".promisor" file, and then fill it, so that we won't use
`write_promisor_file()` at all.
> > + /* Open the .promisor dest file, and fill dest_content with its content */
> > + dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
> > + dest = xfopen(dest_promisor_name, "r+");
> > + while (strbuf_getline(&line, dest) != EOF)
> > + strset_add(&dest_content, line.buf);
>
> If file contains a large number of unique lines, dest_to_write, which is a
> strbuf, may keep realloc memory until the loop ends, at which point all the
> memory is released. I wonder if this might be wasting some heap.
>
> If it were me, I might write it like this:
>
> struct strset seen_lines = STRSET_INIT;
> dest = xfopen(dest_promisor_name, "w");
> while (strbuf_getline(&line, source) != EOF) {
> if (strset_add(&seen_lines, line.buf)) {
> fprintf(dest, "%s\n", line.buf);
> }
> }
>
> It also prevents file pointer misalignment.
Makes sense. Will use this. Thanks!
> (I think we still need to discuss what should ultimately become of this
> helper; at the moment, it seems a bit disjointed, doesn’t it?)
I feel like keeping it a static function only used to generate the
".promisor" files after a repack makes the most sense.
> Thank you, Yuchen
Thanks,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack
2026-04-11 18:49 ` Tian Yuchen
@ 2026-04-17 0:46 ` Lorenzo Pegorari
0 siblings, 0 replies; 78+ messages in thread
From: Lorenzo Pegorari @ 2026-04-17 0:46 UTC (permalink / raw)
To: Tian Yuchen
Cc: git, Taylor Blau, Derrick Stolee, Junio C Hamano,
Patrick Steinhardt, Eric Sunshine, Elijah Newren
On Sun, Apr 12, 2026 at 02:49:05AM +0800, Tian Yuchen wrote:
> On 4/11/26 06:56, LorenzoPegorari wrote:
> > Add test that checks if the content of ".promisor" files are correctly
> > copied inside the ".promisor" files created by a geometric repack.
> >
> > Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> > ---
> > t/t7703-repack-geometric.sh | 33 +++++++++++++++++++++++++++++++++
> > 1 file changed, 33 insertions(+)
> >
> > diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> > index 04d5d8fc33..a8e3e6ae3f 100755
> > --- a/t/t7703-repack-geometric.sh
> > +++ b/t/t7703-repack-geometric.sh
> > @@ -541,4 +541,37 @@ test_expect_success 'geometric repack works with promisor packs' '
> > )
> > '
> > +test_expect_success 'check .promisor file content after geometric repack' '
> > + test_when_finished rm -rf prom_test &&
> > + git init prom_test &&
> > + path=prom_test/.git/objects/pack &&
> > +
> > + (
> > + # Create 2 packs with 3 objs each, and manually create .promisor files
> > + test_commit_bulk -C prom_test --start=1 1 && # 3 objects
>
> ---
>
> > + prom1=$(ls $path/*.pack | sed "s/\.pack/.promisor/") &&
>
> This approach seems a bit fragile.
>
> - Perhaps you’ve heard the saying which goes like "never parse the output of
> ls". In a nutshell, the output of this command is not standardised;
Didn't know that. I'm learning a lot! :)
I will rewrite this using `find`.
> - *.pack? This may produce multiple lines of output, which I don’t think is
> what we want.
`test_commit_bulk` specifically creates a single pack. With "*.pack" we
might get multiple lines of output, but only if we first created
multiple packs.
I think doing something like this makes sense:
```
[...]
path=prom_test/.git/objects/pack &&
# Create first pack
test_commit_bulk -C prom_test 1 &&
# Get first pack name, and then get its ".promisor" filename
prom1=$(find $path -name "*.pack" | sed "s/.pack$/.promisor/") &&
[...]
# Create second pack
test_commit_bulk -C prom_test 1 &&
# Get all packs names, then get their ".promisor" filenames, and finally
# remove the filename that we got before, to obtain the new filename
prom2=$(find $path -name "*.pack" | sed "s/.pack$/.promisor/; \|$prom1|d") &&
[...]
```
> - $path instead of "$path", which cannot correctly handle spacing in
> directory names;
Ack.
> - sed s command matches the "first" string it meets. We can’t guarantee that
> the '.pack' part won’t appear in users’ path names, can we?
>
> (Fun fact: There are approximately 8,000 people in the United States with
> the surname 'Pack'. Source: 2010 Census ; - )
True. I will use `sed "s/.pack$/.promisor/"`, which will only look for
".pack" substrings that appear at the end of the string.
>
> > + oid1=$(git -C prom_test rev-parse HEAD) &&
> > + echo "$oid1 ref1" >"$prom1" &&
> > + test_commit_bulk -C prom_test --start=2 1 && # 3 objects
> > + prom2=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d") &&
> > + oid2=$(git -C prom_test rev-parse HEAD) &&
> > + echo "$oid2 ref2" >"$prom2" &&
> > +
> > + # Create 1 pack with 12 objs, and manually create .promisor file
> > + test_commit_bulk -C prom_test --start=3 4 && # 12 objects
> > + prom3=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom1|d; \|$prom2|d") &&
> > + oid3=$(git -C prom_test rev-parse HEAD) &&
> > + echo "$oid3 ref3" >"$prom3" &&
> > +
> > + # Geometric repack, and check if correct
> > + git -C prom_test repack --geometric 2 -d &&
> > + prom=$(ls $path/*.pack | sed "s/\.pack/.promisor/; \|$prom3|d") &&
> > + # $prom should have repacked only the first 2 small packs, so it should only
> > + # contain the following: "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
> > + test_grep "$oid1 ref1 " "$prom" &&
> > + test_grep "$oid2 ref2 " "$prom" &&
> > + test_grep ! "$oid3 ref3" "$prom"
> > + )
> > +'
> > +
> > test_done
>
> Thanks, yuchen
Thanks,
Lorenzo
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime
2026-04-16 23:46 ` Elijah Newren
@ 2026-04-17 4:25 ` Junio C Hamano
0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2026-04-17 4:25 UTC (permalink / raw)
To: Elijah Newren
Cc: git, LorenzoPegorari, Taylor Blau, Patrick Steinhardt,
Derrick Stolee, Eric Sunshine, Tian Yuchen
Elijah Newren <newren@gmail.com> writes:
>> For Perl programs:
>>
>> - Most of the C guidelines above apply.
>
> Looks good to me. As a minor nit, "need timestamp" -> "need a timestamp".
Thanks. Will amend.
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH v6 0/6] preserve promisor files content after repack
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
` (5 preceding siblings ...)
2026-04-10 22:56 ` [GSoC PATCH v5 6/6] repack-promisor: add missing headers LorenzoPegorari
@ 2026-04-18 14:16 ` LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
` (6 more replies)
6 siblings, 7 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-18 14:16 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
The goal of this patch is to solve the NEEDSWORK comment added by
5374a290 (fetch-pack: write fetched refs to .promisor, 14/10/2019). This
is done by adding a helper function that takes the content of all
.promisor files in the `repository`, and copies it inside the first
.promisor file created by the repack.
Also, I added a comment explaining what is the purpose of the content of
the .promisor files, since this wasn't explained anywhere (I found
information regarding this only in the message of the previously cited
commit).
Finally, I added some tests to "t7700-repack.sh" and
"t7703-repack-geometric.sh" that check if the content of .promisor files
are correctly copied into the .promisor files created by a repack.
V6 DIFF:
* changed the name of the helper function to
`write_promisor_file_after_repack`.
* modified the helper function to create the ".promisor" file, so that
is not required anymore.
* modified the logic of the helper function (as suggested by Tian
Yuchen)
* modified the helper function to check for possible errors, and to
check if the lines of the ".promisor" files are correctly formed.
* fixed memory leak.
* improved comments.
LorenzoPegorari (6):
pack-write: add explanation to promisor file content
repack-promisor add helper to fill promisor file after repack
repack-promisor: preserve content of promisor files after repack
t7700: test for promisor file content after repack
t7703: test for promisor file content after geometric repack
repack-promisor: add missing headers
Documentation/git-repack.adoc | 4 +-
pack-write.c | 9 ++
repack-promisor.c | 194 ++++++++++++++++++++++++++++++----
t/t7700-repack.sh | 61 +++++++++++
t/t7703-repack-geometric.sh | 33 ++++++
5 files changed, 280 insertions(+), 21 deletions(-)
--
2.53.0.584.g6b87e8e9dd
^ permalink raw reply [flat|nested] 78+ messages in thread
* [GSoC PATCH v6 1/6] pack-write: add explanation to promisor file content
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
@ 2026-04-18 14:16 ` LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
` (5 subsequent siblings)
6 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-18 14:16 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
In the entire codebase there is no explanation as to why the ".promisor"
files may contain the ref names (and their associated hashes) that were
fetched at the time the corresponding packfile was downloaded.
As explained in the log message of commit 5374a290 (fetch-pack: write
fetched refs to .promisor, 2019-10-14), where this loop originally came
from, these ref names (and associated hashes) are not used for anything
in the production, but are solely there to help debugging.
Explain this in a new comment.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
pack-write.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/pack-write.c b/pack-write.c
index 83eaf88541..b8ab9510ff 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -603,6 +603,15 @@ void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_
int i, err;
FILE *output = xfopen(promisor_name, "w");
+ /*
+ * Write in the .promisor file the ref names and associated hashes,
+ * obtained by fetch-pack, at the point of generation of the
+ * corresponding packfile. These pieces of info are only used to make
+ * it easier to debug issues with partial clones, as we can identify
+ * what refs (and their associated hashes) were fetched at the time
+ * the packfile was downloaded, and if necessary, compare those hashes
+ * against what the promisor remote reports now.
+ */
for (i = 0; i < nr_sought; i++)
fprintf(output, "%s %s\n", oid_to_hex(&sought[i]->old_oid),
sought[i]->name);
--
2.53.0.584.g6b87e8e9dd
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v6 2/6] repack-promisor add helper to fill promisor file after repack
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
@ 2026-04-18 14:17 ` LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
` (4 subsequent siblings)
6 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-18 14:17 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
A ".promisor" file may contain ref names (and their associated hashes)
that were fetched at the time the corresponding packfile was downloaded.
This information is used for debugging reasons. This information is
stored as lines structured like this: "<oid> <ref>".
Create a `write_promisor_file_after_repack()` helper function that allows
this debugging info to not be lost after a `repack`, by copying it inside
a new ".promisor" file.
The function logic is the following:
* Take all ".promisor" files contained inside the given `repo`.
* Ignore those whose name is contained inside the given `strset
not_repacked_names`, which basically acts as a "promisor ignorelist"
(intended to be used for packfiles that have not been repacked).
* Read each line of the remaining ".promisor" files, which can be:
* "<oid> <ref>" if the ".promisor" file was never repacked. If so,
add the time (in Unix time) at which the ".promisor" file was last
modified <time> to the line, to obtain: "<oid> <ref> <time>".
* "<oid> <ref> <time>" if the ".promisor" file was repacked. If so,
don't modify it.
* Ignore the line if its <oid> is not present inside the
"<packtmp>-<dest_hex>.idx" file.
* If the destination file "<packtmp>-<dest_hex>.promisor" does not
already contain the line, append it to the file.
The time of last data modification, for never-repacked ".promisor" file,
can be used when comparing the entries in it with entries in another
".promisor" file that did get repacked. With these timestamps, the
debugger will be able to tell at which time the refs at the remote
repository pointed at what object. Also, when looking at already
repacked ".promisor" files, the same ref may appear multiple times, and
having timestamps will help understanding what happened over time.
The function tries its best to deal with malformed ".promisor" files,
ignoring those lines:
* That cannot be split into "<oid> <ref>" or "<oid> <ref> <time>".
* Whose <oid> is not a sane hexadecimal string.
* Whose <ref> does not have the correct format for a refname.
The function `parse_pack_index()`, which is loose in validation, can be
safely used to obtain the `packed_git` of the packs created during the
`repack` because, when `write_promisor_file_after_repack()` is called by
`finish_repacking_promisor_objects()`, we know for a fact that they were
just successfully created by `pack-objects` (also, these packs have not
yet been finalized, and so they are not part of the repository). Anyway,
if an error happens while trying to obtain the `packed_git`, the
".promisor" file will be created empty.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Tian Yuchen <cat@malon.dev>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
repack-promisor.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 149 insertions(+)
diff --git a/repack-promisor.c b/repack-promisor.c
index 90318ce150..8fc541d2cf 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -4,6 +4,7 @@
#include "pack.h"
#include "packfile.h"
#include "path.h"
+#include "refs.h"
#include "repository.h"
#include "run-command.h"
@@ -34,6 +35,154 @@ static int write_oid(const struct object_id *oid,
return 0;
}
+/*
+ * Go through all .promisor files contained in repo (excluding those whose name
+ * appears in not_repacked_basenames, which acts as a ignorelist), and copies
+ * their content inside the destination file "<packtmp>-<dest_hex>.promisor".
+ * Each line of a never repacked .promisor file is: "<oid> <ref>" (as described
+ * in the write_promisor_file() function).
+ * After a repack, the copied lines will be: "<oid> <ref> <time>", where <time>
+ * is the time (in Unix time) at which the .promisor file was last modified.
+ * Only the lines whose <oid> is present inside "<packtmp>-<dest_hex>.idx" will
+ * be copied.
+ * The contents of all .promisor files are assumed to be correctly formed.
+ */
+static void write_promisor_file_after_repack(struct repository *repo,
+ const char *dest_hex,
+ const char *packtmp,
+ struct strset *not_repacked_basenames)
+{
+ char *dest_promisor_name;
+ char *dest_idx_name;
+ FILE *dest;
+ struct object_id dest_oid;
+ struct packed_git *dest_pack, *p;
+ struct strbuf source_promisor_name = STRBUF_INIT;
+ struct strset seen_lines = STRSET_INIT;
+ struct strbuf line = STRBUF_INIT;
+ int err;
+
+ /* First of all, let's create and open the .promisor dest file */
+ dest_promisor_name = mkpathdup("%s-%s.promisor", packtmp, dest_hex);
+ dest = xfopen(dest_promisor_name, "w");
+
+ /*
+ * Now let's retrieve the destination pack.
+ * We use parse_pack_index() because dest_hex/packtmp point to the packfile
+ * that "pack-objects" just created, which is about to become part of this
+ * repository, but has not yet been finalized.
+ * If we are here, we know that "pack-objects" did not fail, so
+ * parse_pack_index() being loose in validation does not pose a problem.
+ * If an error happens, we simply leave the ".promisor" file empty.
+ */
+ if (get_oid_hex_algop(dest_hex, &dest_oid, repo->hash_algo)) {
+ warning(_("Promisor file left empty: '%s' not a hash"), dest_hex);
+ if (fclose(dest))
+ die(_("Could not close '%s' promisor file"), dest_promisor_name);
+ free(dest_promisor_name);
+ return;
+ }
+ dest_idx_name = mkpathdup("%s-%s.idx", packtmp, dest_hex);
+ dest_pack = parse_pack_index(repo, dest_oid.hash, dest_idx_name);
+ if (!dest_pack) {
+ warning(_("Promisor file left empty: couldn't open packfile '%s'"), dest_idx_name);
+ if (fclose(dest))
+ die(_("Could not close '%s' promisor file"), dest_promisor_name);
+ free(dest_promisor_name);
+ free(dest_idx_name);
+ return;
+ }
+
+ repo_for_each_pack(repo, p) {
+ FILE *source;
+ struct stat source_stat;
+
+ if (!p->pack_promisor)
+ continue;
+
+ if (not_repacked_basenames &&
+ strset_contains(not_repacked_basenames, pack_basename(p)))
+ continue;
+
+ strbuf_reset(&source_promisor_name);
+ strbuf_addstr(&source_promisor_name, p->pack_name);
+ strbuf_strip_suffix(&source_promisor_name, ".pack");
+ strbuf_addstr(&source_promisor_name, ".promisor");
+
+ if (stat(source_promisor_name.buf, &source_stat))
+ die(_("File not found: %s"), source_promisor_name.buf);
+
+ source = xfopen(source_promisor_name.buf, "r");
+
+ while (strbuf_getline(&line, source) != EOF) {
+ struct string_list line_sections = STRING_LIST_INIT_DUP;
+ struct object_id oid;
+
+ /* Split line into <oid>, <ref> and <time> (if <time> exists).
+ * Check that it was actually split into 2 or 3 parts. If it was
+ * not, then it is malformed, so skip it.
+ */
+ string_list_split(&line_sections, line.buf, " ", 3);
+ if (line_sections.nr != 2 && line_sections.nr != 3) {
+ string_list_clear(&line_sections, 0);
+ continue;
+ }
+
+ /* Skip the lines where <oid> is not a sane hexadecimal string */
+ if (get_oid_hex_algop(line_sections.items[0].string,
+ &oid, repo->hash_algo)) {
+ string_list_clear(&line_sections, 0);
+ continue;
+ }
+ /* Ignore the lines where <oid> doesn't appear in the dest_pack */
+ if (!find_pack_entry_one(&oid, dest_pack)) {
+ string_list_clear(&line_sections, 0);
+ continue;
+ }
+
+ /*
+ * Skip the lines where <ref> does not have the
+ * correct format for a refname.
+ */
+ printf("%s\n", line_sections.items[1].string);
+ if (check_refname_format(line_sections.items[1].string,
+ REFNAME_ALLOW_ONELEVEL)) {
+ string_list_clear(&line_sections, 0);
+ continue;
+ }
+
+ /* If <time> doesn't exist, retrieve it and add it to line */
+ if (line_sections.nr != 3)
+ strbuf_addf(&line, " %" PRItime,
+ (timestamp_t)source_stat.st_mtime);
+
+ /* If the finalized line is new, append it to dest */
+ if (strset_add(&seen_lines, line.buf))
+ fprintf(dest, "%s\n", line.buf);
+
+ string_list_clear(&line_sections, 0);
+ }
+
+ err = ferror(source);
+ err |= fclose(source);
+ if (err)
+ die(_("Could not read '%s' promisor file"), source_promisor_name.buf);
+ }
+
+ err = ferror(dest);
+ err |= fclose(dest);
+ if (err)
+ die(_("Could not write '%s' promisor file"), dest_promisor_name);
+
+ close_pack_index(dest_pack);
+ free(dest_pack);
+ free(dest_promisor_name);
+ free(dest_idx_name);
+ strbuf_release(&source_promisor_name);
+ strbuf_release(&line);
+ strset_clear(&seen_lines);
+}
+
static void finish_repacking_promisor_objects(struct repository *repo,
struct child_process *cmd,
struct string_list *names,
--
2.53.0.584.g6b87e8e9dd
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v6 3/6] repack-promisor: preserve content of promisor files after repack
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
@ 2026-04-18 14:17 ` LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 4/6] t7700: test for promisor file content " LorenzoPegorari
` (3 subsequent siblings)
6 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-18 14:17 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
When a `repack` involving promisor packfiles happens, the new ".promisor"
file is created empty, losing all the debug info that might be present
inside the ".promisor" files before the `repack`.
Use the previously created "write_promisor_file_after_repack()" function
to preserve the contents of all ".promisor" files inside the ".promisor"
files created by the `repack`.
For geometric repacking, we have to create a `strset` that contains the
basenames of all excluded packs. For "normal" repacking this is not
necessary, since there should be no excluded packs.
Also, update the documentation accordingly.
Helped-by: Tian Yuchen <cat@malon.dev>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
Documentation/git-repack.adoc | 4 ++--
repack-promisor.c | 39 ++++++++++++++++++-----------------
2 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 673ce91083..33d3c8afbd 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -45,8 +45,8 @@ other objects in that pack they already have locally.
+
Promisor packfiles are repacked separately: if there are packfiles that
have an associated ".promisor" file, these packfiles will be repacked
-into another separate pack, and an empty ".promisor" file corresponding
-to the new separate pack will be written.
+into another separate pack, and a ".promisor" file corresponding to the
+new separate pack will be written (with arbitrary contents).
-A::
Same as `-a`, unless `-d` is used. Then any unreachable
diff --git a/repack-promisor.c b/repack-promisor.c
index 8fc541d2cf..06393ef06e 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -186,7 +186,8 @@ static void write_promisor_file_after_repack(struct repository *repo,
static void finish_repacking_promisor_objects(struct repository *repo,
struct child_process *cmd,
struct string_list *names,
- const char *packtmp)
+ const char *packtmp,
+ struct strset *not_repacked_basenames)
{
struct strbuf line = STRBUF_INIT;
FILE *out;
@@ -196,7 +197,6 @@ static void finish_repacking_promisor_objects(struct repository *repo,
out = xfdopen(cmd->out, "r");
while (strbuf_getline_lf(&line, out) != EOF) {
struct string_list_item *item;
- char *promisor_name;
if (line.len != repo->hash_algo->hexsz)
die(_("repack: Expecting full hex object ID lines only from pack-objects."));
@@ -204,22 +204,16 @@ static void finish_repacking_promisor_objects(struct repository *repo,
/*
* pack-objects creates the .pack and .idx files, but not the
- * .promisor file. Create the .promisor file, which is empty.
- *
- * NEEDSWORK: fetch-pack sometimes generates non-empty
- * .promisor files containing the ref names and associated
- * hashes at the point of generation of the corresponding
- * packfile, but this would not preserve their contents. Maybe
- * concatenate the contents of all .promisor files instead of
- * just creating a new empty file.
+ * ".promisor" file. To create the "".promisor" file, we don't use the
+ * helper function write_promisor_file(), but instead we use the
+ * specific function write_promisor_file_after_repack(), which creates
+ * the file and appropriately fills it with the content of the
+ * ".promisor" files used for the repack.
*/
- promisor_name = mkpathdup("%s-%s.promisor", packtmp,
- line.buf);
- write_promisor_file(promisor_name, NULL, 0);
+ write_promisor_file_after_repack(repo, line.buf, packtmp,
+ not_repacked_basenames);
item->util = generated_pack_populate(item->string, packtmp);
-
- free(promisor_name);
}
fclose(out);
@@ -256,7 +250,7 @@ void repack_promisor_objects(struct repository *repo,
return;
}
- finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
+ finish_repacking_promisor_objects(repo, &cmd, names, packtmp, NULL);
}
void pack_geometry_repack_promisors(struct repository *repo,
@@ -267,6 +261,7 @@ void pack_geometry_repack_promisors(struct repository *repo,
{
struct child_process cmd = CHILD_PROCESS_INIT;
FILE *in;
+ struct strset not_repacked_basenames = STRSET_INIT;
if (!geometry->promisor_split)
return;
@@ -280,9 +275,15 @@ void pack_geometry_repack_promisors(struct repository *repo,
in = xfdopen(cmd.in, "w");
for (size_t i = 0; i < geometry->promisor_split; i++)
fprintf(in, "%s\n", pack_basename(geometry->promisor_pack[i]));
- for (size_t i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++)
- fprintf(in, "^%s\n", pack_basename(geometry->promisor_pack[i]));
+ for (size_t i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++) {
+ const char *name = pack_basename(geometry->promisor_pack[i]);
+ fprintf(in, "^%s\n", name);
+ strset_add(¬_repacked_basenames, name);
+ }
fclose(in);
- finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
+ finish_repacking_promisor_objects(repo, &cmd, names, packtmp,
+ strset_get_size(¬_repacked_basenames) ? ¬_repacked_basenames : NULL);
+
+ strset_clear(¬_repacked_basenames);
}
--
2.53.0.584.g6b87e8e9dd
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v6 4/6] t7700: test for promisor file content after repack
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
` (2 preceding siblings ...)
2026-04-18 14:17 ` [GSoC PATCH v6 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
@ 2026-04-18 14:17 ` LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
` (2 subsequent siblings)
6 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-18 14:17 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
Add tests that check if the content of ".promisor" files are correctly
copied inside the ".promisor" files created by a `repack`.
The `-f` flag is used when repacking to ensure that all the packs
(created with `test_commit_bulk`) are repacked into a single new pack.
Helped-by: Tian Yuchen <cat@malon.dev>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
t/t7700-repack.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 63ef63fc50..1decd7520a 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -904,4 +904,65 @@ test_expect_success 'pending objects are repacked appropriately' '
)
'
+test_expect_success 'check one .promisor file content after repack' '
+ test_when_finished rm -rf prom_test prom_before_repack &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 1 pack
+ test_commit_bulk -C prom_test 1 &&
+
+ # Simulate .promisor file by creating it manually
+ prom=$(find $path -name "*.pack" | sed "s/.pack$/.promisor/") &&
+ oid=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid ref" >"$prom" &&
+
+ # Repack, and check if correct
+ git -C prom_test repack -a -d -f &&
+ prom=$(find $path -name "*.promisor") &&
+ # $prom should contain "$oid ref <time>"
+ test_grep "$oid ref " "$prom" &&
+
+ # Save the current .promisor content, repack, and check if correct
+ cp "$prom" prom_before_repack &&
+ git -C prom_test repack -a -d -f &&
+ prom=$(find $path -name "*.promisor") &&
+ # $prom should be exactly the same as prom_before_repack
+ test_cmp prom_before_repack "$prom"
+ )
+'
+
+test_expect_success 'check multiple .promisor file content after repack' '
+ test_when_finished rm -rf prom_test prom_before_repack &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 2 packs and simulate .promisor files by creating them manually
+ test_commit_bulk -C prom_test 1 &&
+ prom=$(find $path -name "*.pack" | sed "s/.pack$/.promisor/") &&
+ oid1=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid1 ref1" >"$prom" &&
+ test_commit_bulk -C prom_test 1 &&
+ prom=$(find $path -name "*.pack" | sed "s/.pack$/.promisor/; \|$prom|d") &&
+ oid2=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid2 ref2" >"$prom" &&
+
+ # Repack, and check if correct
+ git -C prom_test repack -a -d -f &&
+ prom=$(find $path -name "*.promisor") &&
+ # $prom should contain "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
+ test_grep "$oid1 ref1 " "$prom" &&
+ test_grep "$oid2 ref2 " "$prom" &&
+
+ # Save the current .promisor content, repack, and check if correct
+ cp "$prom" prom_before_repack &&
+ git -C prom_test repack -a -d -f &&
+ prom=$(find $path -name "*.promisor") &&
+ # $prom should be exactly the same as prom_before_repack
+ test_cmp prom_before_repack "$prom"
+ )
+'
+
test_done
--
2.53.0.584.g6b87e8e9dd
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v6 5/6] t7703: test for promisor file content after geometric repack
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
` (3 preceding siblings ...)
2026-04-18 14:17 ` [GSoC PATCH v6 4/6] t7700: test for promisor file content " LorenzoPegorari
@ 2026-04-18 14:17 ` LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-05-12 6:49 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack Junio C Hamano
6 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-18 14:17 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
Add test that checks if the content of ".promisor" files are correctly
copied inside the ".promisor" files created by a geometric `repack`.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
t/t7703-repack-geometric.sh | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 04d5d8fc33..316247a3b9 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -541,4 +541,37 @@ test_expect_success 'geometric repack works with promisor packs' '
)
'
+test_expect_success 'check .promisor file content after geometric repack' '
+ test_when_finished rm -rf prom_test &&
+ git init prom_test &&
+ path=prom_test/.git/objects/pack &&
+
+ (
+ # Create 2 packs with 3 objs each, and manually create .promisor files
+ test_commit_bulk -C prom_test --start=1 1 && # 3 objects
+ prom1=$(find $path -name "*.pack" | sed "s/.pack$/.promisor/") &&
+ oid1=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid1 ref1" >"$prom1" &&
+ test_commit_bulk -C prom_test --start=2 1 && # 3 objects
+ prom2=$(find $path -name "*.pack" | sed "s/.pack$/.promisor/; \|$prom1|d") &&
+ oid2=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid2 ref2" >"$prom2" &&
+
+ # Create 1 pack with 12 objs, and manually create .promisor file
+ test_commit_bulk -C prom_test --start=3 4 && # 12 objects
+ prom3=$(find $path -name "*.pack" | sed "s/.pack$/.promisor/; \|$prom1|d; \|$prom2|d") &&
+ oid3=$(git -C prom_test rev-parse HEAD) &&
+ echo "$oid3 ref3" >"$prom3" &&
+
+ # Geometric repack, and check if correct
+ git -C prom_test repack --geometric 2 -d &&
+ prom=$(find $path -name "*.pack" | sed "s/.pack$/.promisor/; \|$prom3|d") &&
+ # $prom should have repacked only the first 2 small packs, so it should only
+ # contain the following: "$oid1 ref1 <time>" & "$oid2 ref2 <time>"
+ test_grep "$oid1 ref1 " "$prom" &&
+ test_grep "$oid2 ref2 " "$prom" &&
+ test_grep ! "$oid3 ref3" "$prom"
+ )
+'
+
test_done
--
2.53.0.584.g6b87e8e9dd
^ permalink raw reply related [flat|nested] 78+ messages in thread
* [GSoC PATCH v6 6/6] repack-promisor: add missing headers
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
` (4 preceding siblings ...)
2026-04-18 14:17 ` [GSoC PATCH v6 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
@ 2026-04-18 14:17 ` LorenzoPegorari
2026-05-12 6:49 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack Junio C Hamano
6 siblings, 0 replies; 78+ messages in thread
From: LorenzoPegorari @ 2026-04-18 14:17 UTC (permalink / raw)
To: git
Cc: Taylor Blau, Derrick Stolee, Junio C Hamano, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
According to the coding guidelines, a C file must directly include the
header files that declare the facilities it uses.
Directly include these missing headers, in order to comply with the
coding guidelines.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
repack-promisor.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/repack-promisor.c b/repack-promisor.c
index 06393ef06e..472aef0081 100644
--- a/repack-promisor.c
+++ b/repack-promisor.c
@@ -1,12 +1,18 @@
#include "git-compat-util.h"
#include "repack.h"
+#include "hash.h"
#include "hex.h"
+#include "odb.h"
#include "pack.h"
#include "packfile.h"
#include "path.h"
#include "refs.h"
#include "repository.h"
#include "run-command.h"
+#include "strbuf.h"
+#include "string-list.h"
+#include "strmap.h"
+#include "strvec.h"
struct write_oid_context {
struct child_process *cmd;
--
2.53.0.584.g6b87e8e9dd
^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [GSoC PATCH v6 0/6] preserve promisor files content after repack
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
` (5 preceding siblings ...)
2026-04-18 14:17 ` [GSoC PATCH v6 6/6] repack-promisor: add missing headers LorenzoPegorari
@ 2026-05-12 6:49 ` Junio C Hamano
6 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2026-05-12 6:49 UTC (permalink / raw)
To: git, Christian Couder
Cc: Taylor Blau, LorenzoPegorari, Derrick Stolee, Patrick Steinhardt,
Tian Yuchen, Eric Sunshine, Elijah Newren
LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> The goal of this patch is to solve the NEEDSWORK comment added by
> 5374a290 (fetch-pack: write fetched refs to .promisor, 14/10/2019). This
> is done by adding a helper function that takes the content of all
> .promisor files in the `repository`, and copies it inside the first
> .promisor file created by the repack.
>
> Also, I added a comment explaining what is the purpose of the content of
> the .promisor files, since this wasn't explained anywhere (I found
> information regarding this only in the message of the previously cited
> commit).
>
> Finally, I added some tests to "t7700-repack.sh" and
> "t7703-repack-geometric.sh" that check if the content of .promisor files
> are correctly copied into the .promisor files created by a repack.
>
> V6 DIFF:
> * changed the name of the helper function to
> `write_promisor_file_after_repack`.
> * modified the helper function to create the ".promisor" file, so that
> is not required anymore.
> * modified the logic of the helper function (as suggested by Tian
> Yuchen)
> * modified the helper function to check for possible errors, and to
> check if the lines of the ".promisor" files are correctly formed.
> * fixed memory leak.
> * improved comments.
>
> LorenzoPegorari (6):
> pack-write: add explanation to promisor file content
> repack-promisor add helper to fill promisor file after repack
> repack-promisor: preserve content of promisor files after repack
> t7700: test for promisor file content after repack
> t7703: test for promisor file content after geometric repack
> repack-promisor: add missing headers
>
> Documentation/git-repack.adoc | 4 +-
> pack-write.c | 9 ++
> repack-promisor.c | 194 ++++++++++++++++++++++++++++++----
> t/t7700-repack.sh | 61 +++++++++++
> t/t7703-repack-geometric.sh | 33 ++++++
> 5 files changed, 280 insertions(+), 21 deletions(-)
Lorenzo, it seems that not many people are reviewing this final
round, and then I noticed that the list of CC addresses lacks a big
name in the promisor remote topic, so I added Christian to the To:
line of this message. Christian, you have no obligation to review
these patches if they do not interest you, but just in case you
weren't aware of this effort, I thought it might interest you; I am
sure we all would benefit from your expertise.
Thanks.
^ permalink raw reply [flat|nested] 78+ messages in thread
end of thread, other threads:[~2026-05-12 6:49 UTC | newest]
Thread overview: 78+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 1/3] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-22 2:04 ` Eric Sunshine
2026-03-22 18:50 ` Lorenzo Pegorari
2026-03-21 21:29 ` [GSoC PATCH 3/3] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-23 21:07 ` Junio C Hamano
2026-03-25 21:33 ` Lorenzo Pegorari
2026-03-22 19:18 ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-23 20:27 ` Eric Sunshine
2026-03-26 16:15 ` Lorenzo Pegorari
2026-03-23 21:30 ` Junio C Hamano
2026-03-26 2:01 ` Lorenzo Pegorari
2026-03-22 19:18 ` [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-23 21:48 ` Junio C Hamano
2026-03-26 2:12 ` Lorenzo Pegorari
2026-03-22 19:18 ` [GSoC PATCH v2 4/4] t7700: test for promisor file content " LorenzoPegorari
2026-04-06 0:23 ` [GSoC PATCH v3 0/5] preserve promisor files " LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-06 0:24 ` [GSoC PATCH v3 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-04-06 17:22 ` Tian Yuchen
2026-04-06 18:40 ` Lorenzo Pegorari
2026-04-06 21:17 ` Junio C Hamano
2026-04-07 21:46 ` Lorenzo Pegorari
2026-04-07 2:01 ` Junio C Hamano
2026-04-07 21:52 ` Lorenzo Pegorari
2026-04-07 22:03 ` Junio C Hamano
2026-04-06 21:34 ` Junio C Hamano
2026-04-07 22:07 ` Lorenzo Pegorari
2026-04-06 0:25 ` [GSoC PATCH v3 3/5] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-06 0:25 ` [GSoC PATCH v3 4/5] t7700: test for promisor file content " LorenzoPegorari
2026-04-06 22:05 ` Junio C Hamano
2026-04-07 23:28 ` Lorenzo Pegorari
2026-04-07 18:10 ` Junio C Hamano
2026-04-07 23:11 ` Lorenzo Pegorari
2026-04-08 0:38 ` Lorenzo Pegorari
2026-04-06 0:25 ` [GSoC PATCH v3 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-10 15:01 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 1/5] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 15:02 ` [GSoC PATCH v4 2/5] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-04-10 16:01 ` Junio C Hamano
2026-04-10 16:34 ` Lorenzo Pegorari
2026-04-10 18:10 ` [PATCH] CodingGuidelines: st_mtimespec vs st_mtim vs st_mtime Junio C Hamano
2026-04-16 23:46 ` Elijah Newren
2026-04-17 4:25 ` Junio C Hamano
2026-04-10 15:03 ` [GSoC PATCH v4 3/5] repack-promisor: preserve content of promisor files after repack LorenzoPegorari
2026-04-10 15:04 ` [GSoC PATCH v4 4/5] t7700: test for promisor file content " LorenzoPegorari
2026-04-10 15:04 ` [GSoC PATCH v4 5/5] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-10 15:47 ` [GSoC PATCH v4 0/5] preserve promisor files content after repack Junio C Hamano
2026-04-10 16:44 ` Lorenzo Pegorari
2026-04-10 22:54 ` [GSoC PATCH v5 0/6] " LorenzoPegorari
2026-04-10 22:54 ` [GSoC PATCH v5 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-10 22:55 ` [GSoC PATCH v5 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
2026-04-10 23:30 ` Junio C Hamano
2026-04-11 1:59 ` Lorenzo Pegorari
2026-04-12 6:27 ` Junio C Hamano
2026-04-17 0:30 ` Lorenzo Pegorari
2026-04-10 22:55 ` [GSoC PATCH v5 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-11 18:25 ` Tian Yuchen
2026-04-17 0:34 ` Lorenzo Pegorari
2026-04-10 22:55 ` [GSoC PATCH v5 4/6] t7700: test for promisor file content " LorenzoPegorari
2026-04-10 22:56 ` [GSoC PATCH v5 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-11 18:49 ` Tian Yuchen
2026-04-17 0:46 ` Lorenzo Pegorari
2026-04-10 22:56 ` [GSoC PATCH v5 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack LorenzoPegorari
2026-04-18 14:16 ` [GSoC PATCH v6 1/6] pack-write: add explanation to promisor file content LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 2/6] repack-promisor add helper to fill promisor file after repack LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 3/6] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 4/6] t7700: test for promisor file content " LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 5/6] t7703: test for promisor file content after geometric repack LorenzoPegorari
2026-04-18 14:17 ` [GSoC PATCH v6 6/6] repack-promisor: add missing headers LorenzoPegorari
2026-05-12 6:49 ` [GSoC PATCH v6 0/6] preserve promisor files content after repack Junio C Hamano
2026-04-10 23:05 ` [GSoC PATCH v4 0/5] " Junio C Hamano
2026-04-11 2:02 ` Junio C Hamano
2026-04-11 14:05 ` Lorenzo Pegorari
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox