git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Bug in 2.48 with `git refs migrate`
@ 2025-01-13 13:56 brian m. carlson
  2025-01-13 15:45 ` Patrick Steinhardt
  2025-01-15 11:54 ` Karthik Nayak
  0 siblings, 2 replies; 21+ messages in thread
From: brian m. carlson @ 2025-01-13 13:56 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 1957 bytes --]

Hi,

I noticed that Git 2.48 has support for migrating refs when there are
reflogs and, as promised at Git Merge, I decided to try it out.
Unfortunately, I got an error:

----
% git refs migrate --ref-format=reftable --dry-run
error: reftable: transaction failure: corrupt reftable file
----

Here's a small reproduction case:

----
#!/bin/sh

rm -fr test-repo
git init -b dev test-repo
cd test-repo

# start first block
touch foo.txt
git add foo.txt
git commit -m +

head=$(git rev-parse HEAD)
seq 5000 | sed -Ee "s!^(.*)\$!create refs/heads/ref-\1 $head!" | git update-ref --stdin
# end first block

# start second block
echo abc >bar.txt
git add bar.txt
git commit -m +
head=$(git rev-parse HEAD)
seq 3000 | sed -Ee "s!^(.*)\$!update refs/heads/ref-\1 $head!" | git update-ref --stdin
# end second block

git refs migrate --ref-format=reftable
----

I can also reproduce this on the latest master.

If you remove the second block, it does not appear to reproduce.  Some
investigation led me to the conclusion that the difference is when
max_update_index is not 1, the header has the value 1 for it but the
trailer has the correct value, and so we flag the header and trailer as
mismatching and therefore it gets marked as corrupt.  I believe the
reason things work when removing the second block is because that value
remains 1, and so it works.

I haven't done anything else to investigate here, for which I apologize,
but I just wanted to mention it while it was fresh on my mind.

In case this is helpful, I did see this when attempting to migrate two
work repositories with lots of reflogs and many refs (the smaller has
2983 and the larger, 44832).  I obviously cannot send you these
repositories or things in them, but I'm happy to test patches against
them.

Please let me know if I can provide more useful information.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-13 13:56 Bug in 2.48 with `git refs migrate` brian m. carlson
@ 2025-01-13 15:45 ` Patrick Steinhardt
  2025-01-15 11:54 ` Karthik Nayak
  1 sibling, 0 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-01-13 15:45 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Karthik Nayak

On Mon, Jan 13, 2025 at 01:56:33PM +0000, brian m. carlson wrote:
> If you remove the second block, it does not appear to reproduce.  Some
> investigation led me to the conclusion that the difference is when
> max_update_index is not 1, the header has the value 1 for it but the
> trailer has the correct value, and so we flag the header and trailer as
> mismatching and therefore it gets marked as corrupt.  I believe the
> reason things work when removing the second block is because that value
> remains 1, and so it works.

Hm. Makes me wonder whether this is caused by the newly added code to
also migrate reflog entries, as we'd play around with the update index
only when migrating those. Cc'ing Karthik so he can have a look.

> I haven't done anything else to investigate here, for which I apologize,
> but I just wanted to mention it while it was fresh on my mind.
> 
> In case this is helpful, I did see this when attempting to migrate two
> work repositories with lots of reflogs and many refs (the smaller has
> 2983 and the larger, 44832).  I obviously cannot send you these
> repositories or things in them, but I'm happy to test patches against
> them.
> 
> Please let me know if I can provide more useful information.

Thanks for your report!

Patrick

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-13 13:56 Bug in 2.48 with `git refs migrate` brian m. carlson
  2025-01-13 15:45 ` Patrick Steinhardt
@ 2025-01-15 11:54 ` Karthik Nayak
  2025-01-15 17:07   ` Junio C Hamano
                     ` (3 more replies)
  1 sibling, 4 replies; 21+ messages in thread
From: Karthik Nayak @ 2025-01-15 11:54 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 8839 bytes --]

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

Hello brian,

> Hi,
>
> I noticed that Git 2.48 has support for migrating refs when there are
> reflogs and, as promised at Git Merge, I decided to try it out.
> Unfortunately, I got an error:
>

Thanks for trying it out and reporting.

>
> ----
> % git refs migrate --ref-format=reftable --dry-run
> error: reftable: transaction failure: corrupt reftable file
> ----
>
> Here's a small reproduction case:
>
> ----
> #!/bin/sh
>
> rm -fr test-repo
> git init -b dev test-repo
> cd test-repo
>
> # start first block
> touch foo.txt
> git add foo.txt
> git commit -m +
>
> head=$(git rev-parse HEAD)
> seq 5000 | sed -Ee "s!^(.*)\$!create refs/heads/ref-\1 $head!" | git update-ref --stdin
> # end first block
>
> # start second block
> echo abc >bar.txt
> git add bar.txt
> git commit -m +
> head=$(git rev-parse HEAD)
> seq 3000 | sed -Ee "s!^(.*)\$!update refs/heads/ref-\1 $head!" | git update-ref --stdin
> # end second block
>
> git refs migrate --ref-format=reftable
> ----
>
> I can also reproduce this on the latest master.
>

I can also reproduce it using the script provided here.

>
> If you remove the second block, it does not appear to reproduce.  Some
> investigation led me to the conclusion that the difference is when
> max_update_index is not 1, the header has the value 1 for it but the
> trailer has the correct value, and so we flag the header and trailer as
> mismatching and therefore it gets marked as corrupt.  I believe the
> reason things work when removing the second block is because that value
> remains 1, and so it works.
>

Indeed, you're absolutely correct here, this is because the
`max_update_index` is indeed different between the footer and the
header. This flow is triggered when we have multiple reftable blocks
being written with multiple reflog entries. The multiple blocks is
crucial because we update the `max_update_index` right before writing
the reflogs. So if there are refs being written and they spawn more than
one block, the header is written without the updated `max_update_index`,
causing the mismatch.

> I haven't done anything else to investigate here, for which I apologize,
> but I just wanted to mention it while it was fresh on my mind.
>

I think you've provided sufficient information and also enough to make
it easy for me to debug.

> In case this is helpful, I did see this when attempting to migrate two
> work repositories with lots of reflogs and many refs (the smaller has
> 2983 and the larger, 44832).  I obviously cannot send you these
> repositories or things in them, but I'm happy to test patches against
> them.
>

I'm attaching a patch below which should fixes the issue for me and also
adding a test to test against the same. I'd be grateful if you could
also test the patch against the repositoryies you mention.

> Please let me know if I can provide more useful information.
> --
> brian m. carlson (they/them or he/him)
> Toronto, Ontario, CA

-- >8 --

Subject: [PATCH] reftable: write correct max_update_index to header

In 297c09eabb (refs: allow multiple reflog entries for the same refname,
2024-12-16), the reftable backend learned to handle multiple reflog
entries within the same transaction. This was done modifying the
`update_index` for reflogs with multiple indices. During writing the
logs, the `max_update_index` of the writer was modified to ensure the
limits were raised to the modified `update_index`s.

However, since ref entries are written before the modification to the
`max_update_index`, if there are multiple blocks to be written, the
reftable backend writes the header with the old `max_update_index`. When
all logs are finally written, the footer will be written with the new
`min_update_index`. This causes a mismatch between the header and the
footer and causes the reftable file to be corrupted. The existing tests
only spawn a single block and since headers are lazily written with the
first block, the tests didn't capture this bug.

To fix the issue, the appropriate `max_update_index` limit must be set
even before the first block is written. Add a `max_index` field to the
transaction which holds the `max_index` within all its updates, then
propagate this value to the reftable backend, wherein this is used to
the set the `max_update_index` correctly.

Add a test which creates a few thousand reference updates with multiple
reflog entries, which should trigger the bug.

Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c                  |  7 +++++++
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 20 ++++++++++----------
 t/t1460-refs-migrate.sh | 12 ++++++++++++
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index 0f41b2fd4a..f7b6f0f897 100644
--- a/refs.c
+++ b/refs.c
@@ -1345,6 +1345,13 @@ int ref_transaction_update_reflog(struct
ref_transaction *transaction,
 	update->flags &= ~REF_HAVE_OLD;
 	update->index = index;

+	/*
+	 * Reference backends may need to know the max index to optimize
+	 * their writes. So we store the max_index on the transaction level.
+	 */
+	if (index > transaction->max_index)
+		transaction->max_index = index;
+
 	return 0;
 }

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 16550862d3..aaab711bb9 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -203,6 +203,7 @@ struct ref_transaction {
 	enum ref_transaction_state state;
 	void *backend_data;
 	unsigned int flags;
+	unsigned int max_index;
 };

 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 00d95a9a2f..289496058e 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -942,6 +942,7 @@ struct write_transaction_table_arg {
 	size_t updates_nr;
 	size_t updates_alloc;
 	size_t updates_expected;
+	unsigned int max_index;
 };

 struct reftable_transaction_data {
@@ -1428,7 +1429,6 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
-	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;

@@ -1438,7 +1438,12 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data

 	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);

-	reftable_writer_set_limits(writer, ts, ts);
+	/*
+	 * During reflog migration, we add indexes for a single reflog with
+	 * multiple entries. Each entry will contain a different update_index,
+	 * so set the limits accordingly.
+	 */
+	reftable_writer_set_limits(writer, ts, ts + arg->max_index);

 	for (i = 0; i < arg->updates_nr; i++) {
 		struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1540,12 +1545,6 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data
 				 */
 				log->update_index = ts + u->index;

-				/*
-				 * Note the max update_index so the limit can be set later on.
-				 */
-				if (log->update_index > max_update_index)
-					max_update_index = log->update_index;
-
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1609,8 +1608,6 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
-		reftable_writer_set_limits(writer, ts, max_update_index);
-
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;
@@ -1631,6 +1628,9 @@ static int reftable_be_transaction_finish(struct
ref_store *ref_store UNUSED,
 	struct reftable_transaction_data *tx_data = transaction->backend_data;
 	int ret = 0;

+	if (tx_data->args)
+		tx_data->args->max_index = transaction->max_index;
+
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index f59bc4860f..307b2998ef 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -227,6 +227,18 @@ do
 	done
 done

+test_expect_success 'multiple reftable blocks with multiple entries' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo first &&
+	printf "create refs/heads/ref-%d HEAD\n" $(test_seq 5000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_commit -C repo second &&
+	printf "update refs/heads/ref-%d HEAD\n" $(test_seq 3000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_migration repo reftable
+'
+
 test_expect_success 'migrating from files format deletes backend files' '
 	test_when_finished "rm -rf repo" &&
 	git init --ref-format=files repo &&
-- 
2.47.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-15 11:54 ` Karthik Nayak
@ 2025-01-15 17:07   ` Junio C Hamano
  2025-01-16  6:44     ` Karthik Nayak
  2025-01-16  2:05   ` brian m. carlson
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2025-01-15 17:07 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: brian m. carlson, git, Patrick Steinhardt

Karthik Nayak <karthik.188@gmail.com> writes:

> Subject: [PATCH] reftable: write correct max_update_index to header
>
> In 297c09eabb (refs: allow multiple reflog entries for the same refname,
> 2024-12-16), the reftable backend learned to handle multiple reflog
> entries within the same transaction. This was done modifying the

"done" -> "done by", I think.

> To fix the issue, the appropriate `max_update_index` limit must be set
> even before the first block is written. Add a `max_index` field to the
> transaction which holds the `max_index` within all its updates, then
> propagate this value to the reftable backend, wherein this is used to
> the set the `max_update_index` correctly.


> diff --git a/refs.c b/refs.c
> index 0f41b2fd4a..f7b6f0f897 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1345,6 +1345,13 @@ int ref_transaction_update_reflog(struct

Not the focus of this topic/fix, but I notice that the only caller
of this ref_transaction_update_reflog() function is a static
function migrate_one_reflog_entry() in the same file.  Do we expect
that we would add more callers?  Otherwise we should make it a file
scope static and remove it from <refs.h> file.

> ref_transaction *transaction,
>  	update->flags &= ~REF_HAVE_OLD;
>  	update->index = index;
>
> +	/*
> +	 * Reference backends may need to know the max index to optimize
> +	 * their writes. So we store the max_index on the transaction level.
> +	 */
> +	if (index > transaction->max_index)
> +		transaction->max_index = index;
> +
>  	return 0;
>  }

So from the problem description, whenever we consume an index by
assigning it to an update that belongs to a transaction, we must
make sure that transaction's max_index covers that value of the
index?  I was wondering if we should have a less error prone way to
do that by having a helper function that takes ref_update and
ref_transaction objects to do the above, but this is exclusively
used by reflog migration code path and nowhere else, so it may
probably be fine as-is.  I dunno.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-15 11:54 ` Karthik Nayak
  2025-01-15 17:07   ` Junio C Hamano
@ 2025-01-16  2:05   ` brian m. carlson
  2025-01-16  7:29     ` Karthik Nayak
  2025-01-16 23:21   ` brian m. carlson
  2025-01-23 13:56   ` [PATCH v2] reftable: write correct max_update_index to header Karthik Nayak
  3 siblings, 1 reply; 21+ messages in thread
From: brian m. carlson @ 2025-01-16  2:05 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

On 2025-01-15 at 11:54:51, Karthik Nayak wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> I'm attaching a patch below which should fixes the issue for me and also
> adding a test to test against the same. I'd be grateful if you could
> also test the patch against the repositoryies you mention.

Fantastic, I'll try to do that tomorrow and get back to you.  I really
appreciate such a prompt response.

> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index 16550862d3..aaab711bb9 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -203,6 +203,7 @@ struct ref_transaction {
>  	enum ref_transaction_state state;
>  	void *backend_data;
>  	unsigned int flags;
> +	unsigned int max_index;
>  };
> 
>  /*
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 00d95a9a2f..289496058e 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -942,6 +942,7 @@ struct write_transaction_table_arg {
>  	size_t updates_nr;
>  	size_t updates_alloc;
>  	size_t updates_expected;
> +	unsigned int max_index;

I wonder if this and the above should be `uint64_t` instead of `unsigned
int`.  From the file names and the data format, it looks like we
intentionally use a 64-bit integer.  That's good, because I have
unfortunately seen some people who have created giant test repositories
with really unreasonable numbers of commits and I could see us possibly
exceeding a 32-bit integer here.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-15 17:07   ` Junio C Hamano
@ 2025-01-16  6:44     ` Karthik Nayak
  0 siblings, 0 replies; 21+ messages in thread
From: Karthik Nayak @ 2025-01-16  6:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 2500 bytes --]

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Subject: [PATCH] reftable: write correct max_update_index to header
>>
>> In 297c09eabb (refs: allow multiple reflog entries for the same refname,
>> 2024-12-16), the reftable backend learned to handle multiple reflog
>> entries within the same transaction. This was done modifying the
>
> "done" -> "done by", I think.
>

Yup, thanks!

>> To fix the issue, the appropriate `max_update_index` limit must be set
>> even before the first block is written. Add a `max_index` field to the
>> transaction which holds the `max_index` within all its updates, then
>> propagate this value to the reftable backend, wherein this is used to
>> the set the `max_update_index` correctly.
>
>
>> diff --git a/refs.c b/refs.c
>> index 0f41b2fd4a..f7b6f0f897 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -1345,6 +1345,13 @@ int ref_transaction_update_reflog(struct
>
> Not the focus of this topic/fix, but I notice that the only caller
> of this ref_transaction_update_reflog() function is a static
> function migrate_one_reflog_entry() in the same file.  Do we expect
> that we would add more callers?  Otherwise we should make it a file
> scope static and remove it from <refs.h> file.
>

I don't think we expect any other callers now. Maybe someday we'd expose
creating reflogs to the users. But for now, making it static is more
worthwhile.

Let's follow the boy scout rule and clean this up, I'll add a commit to
do the same in v2 of my patch.

>> ref_transaction *transaction,
>>  	update->flags &= ~REF_HAVE_OLD;
>>  	update->index = index;
>>
>> +	/*
>> +	 * Reference backends may need to know the max index to optimize
>> +	 * their writes. So we store the max_index on the transaction level.
>> +	 */
>> +	if (index > transaction->max_index)
>> +		transaction->max_index = index;
>> +
>>  	return 0;
>>  }
>
> So from the problem description, whenever we consume an index by
> assigning it to an update that belongs to a transaction, we must
> make sure that transaction's max_index covers that value of the
> index?  I was wondering if we should have a less error prone way to
> do that by having a helper function that takes ref_update and
> ref_transaction objects to do the above, but this is exclusively
> used by reflog migration code path and nowhere else, so it may
> probably be fine as-is.  I dunno.

Yup, that's correct. It should be okay, but let me do it anyways, makes
it safer for the future.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-16  2:05   ` brian m. carlson
@ 2025-01-16  7:29     ` Karthik Nayak
  0 siblings, 0 replies; 21+ messages in thread
From: Karthik Nayak @ 2025-01-16  7:29 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 1956 bytes --]

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2025-01-15 at 11:54:51, Karthik Nayak wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>>
>> I'm attaching a patch below which should fixes the issue for me and also
>> adding a test to test against the same. I'd be grateful if you could
>> also test the patch against the repositoryies you mention.
>
> Fantastic, I'll try to do that tomorrow and get back to you.  I really
> appreciate such a prompt response.
>
>> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
>> index 16550862d3..aaab711bb9 100644
>> --- a/refs/refs-internal.h
>> +++ b/refs/refs-internal.h
>> @@ -203,6 +203,7 @@ struct ref_transaction {
>>  	enum ref_transaction_state state;
>>  	void *backend_data;
>>  	unsigned int flags;
>> +	unsigned int max_index;
>>  };
>>
>>  /*
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index 00d95a9a2f..289496058e 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -942,6 +942,7 @@ struct write_transaction_table_arg {
>>  	size_t updates_nr;
>>  	size_t updates_alloc;
>>  	size_t updates_expected;
>> +	unsigned int max_index;
>
> I wonder if this and the above should be `uint64_t` instead of `unsigned
> int`.  From the file names and the data format, it looks like we
> intentionally use a 64-bit integer.  That's good, because I have
> unfortunately seen some people who have created giant test repositories
> with really unreasonable numbers of commits and I could see us possibly
> exceeding a 32-bit integer here.

Ideally it should be okay, since this only comes into play when we're
migrating reflogs. So the index would be the number of reflog entries
for a given ref. I suppose even for large repositories this number
should be low. But I'd rather be safe here, so let me modify this in the
next version to be 'uint64_t'.

> --
> brian m. carlson (they/them or he/him)
> Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-15 11:54 ` Karthik Nayak
  2025-01-15 17:07   ` Junio C Hamano
  2025-01-16  2:05   ` brian m. carlson
@ 2025-01-16 23:21   ` brian m. carlson
  2025-01-17  0:04     ` Junio C Hamano
  2025-01-17  6:15     ` Karthik Nayak
  2025-01-23 13:56   ` [PATCH v2] reftable: write correct max_update_index to header Karthik Nayak
  3 siblings, 2 replies; 21+ messages in thread
From: brian m. carlson @ 2025-01-16 23:21 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 695 bytes --]

On 2025-01-15 at 11:54:51, Karthik Nayak wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> I'm attaching a patch below which should fixes the issue for me and also
> adding a test to test against the same. I'd be grateful if you could
> also test the patch against the repositoryies you mention.

I can confirm that the patch did indeed fix the problem.  I was able to
convert both repositories successfully (and very quickly, no less), and
they both work fine (I did normal development activity with them) with
an unmodified Git 2.48 after the migration process.

Thanks again for the quick fix.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-16 23:21   ` brian m. carlson
@ 2025-01-17  0:04     ` Junio C Hamano
  2025-01-17  6:17       ` Karthik Nayak
  2025-01-17  6:15     ` Karthik Nayak
  1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2025-01-17  0:04 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Karthik Nayak, git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2025-01-15 at 11:54:51, Karthik Nayak wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> I'm attaching a patch below which should fixes the issue for me and also
>> adding a test to test against the same. I'd be grateful if you could
>> also test the patch against the repositoryies you mention.
>
> I can confirm that the patch did indeed fix the problem.  I was able to
> convert both repositories successfully (and very quickly, no less), and
> they both work fine (I did normal development activity with them) with
> an unmodified Git 2.48 after the migration process.
>
> Thanks again for the quick fix.

Thanks, both.  Let's merge it down to 'next' and then to 'master',
then.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-16 23:21   ` brian m. carlson
  2025-01-17  0:04     ` Junio C Hamano
@ 2025-01-17  6:15     ` Karthik Nayak
  1 sibling, 0 replies; 21+ messages in thread
From: Karthik Nayak @ 2025-01-17  6:15 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 836 bytes --]

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2025-01-15 at 11:54:51, Karthik Nayak wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> I'm attaching a patch below which should fixes the issue for me and also
>> adding a test to test against the same. I'd be grateful if you could
>> also test the patch against the repositoryies you mention.
>
> I can confirm that the patch did indeed fix the problem.  I was able to
> convert both repositories successfully (and very quickly, no less), and
> they both work fine (I did normal development activity with them) with
> an unmodified Git 2.48 after the migration process.
>

Wohoo!

> Thanks again for the quick fix.

Thanks for reporting and testing! I'm glad it fixed the issue.

> --
> brian m. carlson (they/them or he/him)
> Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bug in 2.48 with `git refs migrate`
  2025-01-17  0:04     ` Junio C Hamano
@ 2025-01-17  6:17       ` Karthik Nayak
  0 siblings, 0 replies; 21+ messages in thread
From: Karthik Nayak @ 2025-01-17  6:17 UTC (permalink / raw)
  To: Junio C Hamano, brian m. carlson; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]

Junio C Hamano <gitster@pobox.com> writes:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> On 2025-01-15 at 11:54:51, Karthik Nayak wrote:
>>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>>> I'm attaching a patch below which should fixes the issue for me and also
>>> adding a test to test against the same. I'd be grateful if you could
>>> also test the patch against the repositoryies you mention.
>>
>> I can confirm that the patch did indeed fix the problem.  I was able to
>> convert both repositories successfully (and very quickly, no less), and
>> they both work fine (I did normal development activity with them) with
>> an unmodified Git 2.48 after the migration process.
>>
>> Thanks again for the quick fix.
>
> Thanks, both.  Let's merge it down to 'next' and then to 'master',
> then.

I was going to send some additional commits around the patch as a v2,
but I think they can come in as follow ups. This would ensure that the
fix is already in place and we can discuss over the other commits
slowly.

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] reftable: write correct max_update_index to header
  2025-01-15 11:54 ` Karthik Nayak
                     ` (2 preceding siblings ...)
  2025-01-16 23:21   ` brian m. carlson
@ 2025-01-23 13:56   ` Karthik Nayak
  2025-01-23 18:11     ` Junio C Hamano
  3 siblings, 1 reply; 21+ messages in thread
From: Karthik Nayak @ 2025-01-23 13:56 UTC (permalink / raw)
  To: karthik.188; +Cc: git, ps, gitster, sandals, Johannes.Schindelin

In 297c09eabb (refs: allow multiple reflog entries for the same refname,
2024-12-16), the reftable backend learned to handle multiple reflog
entries within the same transaction. This was done modifying the
`update_index` for reflogs with multiple indices. During writing the
logs, the `max_update_index` of the writer was modified to ensure the
limits were raised to the modified `update_index`s.

However, since ref entries are written before the modification to the
`max_update_index`, if there are multiple blocks to be written, the
reftable backend writes the header with the old `max_update_index`. When
all logs are finally written, the footer will be written with the new
`min_update_index`. This causes a mismatch between the header and the
footer and causes the reftable file to be corrupted. The existing tests
only spawn a single block and since headers are lazily written with the
first block, the tests didn't capture this bug.

To fix the issue, the appropriate `max_update_index` limit must be set
even before the first block is written. Add a `max_index` field to the
transaction which holds the `max_index` within all its updates, then
propagate this value to the reftable backend, wherein this is used to
the set the `max_update_index` correctly.

Add a test which creates a few thousand reference updates with multiple
reflog entries, which should trigger the bug.

Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Hello,

While this patch was merged to next, Dscho reported that it was flaky
on macos pipeline. On further investigation I found this was easily
reproducible when the leak sanitizer was turned on and the reftable
tests were run. The fix was simply to add the missing 0 initialization.

I also found that we didn't set the appropriate values for all elements
of the `reftable_transaction_data`. Which I've done now.

The patch is based on Maint f93ff170b9 (Git 2.48.1, 2025-01-13). 

Thanks

Changes from v1:
1. Correctly intialize the value to `0` for `write_transaction_table_arg.max_index`.
2. Set the `max_index` for all elements of `reftable_transaction_data`.
3. The range diff was made against the commit in Junio's branch
   `gitster/kn/reflog-migration-fix`, which included his 'S-o-b', which I've
   removed in this patch, since that is usually applied by Junio. Not sure if
   that was the right move. 

Range diff:
1:  bc67b4ab5f ! 1:  5489826b47 reftable: write correct max_update_index to header
    @@ Commit message
     
         Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## refs.c ##
     @@ refs.c: int ref_transaction_update_reflog(struct ref_transaction *transaction,
    @@ refs/reftable-backend.c: struct write_transaction_table_arg {
      };
      
      struct reftable_transaction_data {
    +@@ refs/reftable-backend.c: static int prepare_transaction_update(struct write_transaction_table_arg **out,
    + 		arg->updates_nr = 0;
    + 		arg->updates_alloc = 0;
    + 		arg->updates_expected = 0;
    ++		arg->max_index = 0;
    + 	}
    + 
    + 	arg->updates_expected++;
     @@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writer *writer, void *cb_data
      	struct reftable_log_record *logs = NULL;
      	struct ident_split committer_ident = {0};
    @@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writ
      		if (ret < 0)
      			goto done;
     @@ refs/reftable-backend.c: static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
    - 	struct reftable_transaction_data *tx_data = transaction->backend_data;
      	int ret = 0;
      
    -+	if (tx_data->args)
    -+		tx_data->args->max_index = transaction->max_index;
    -+
      	for (size_t i = 0; i < tx_data->args_nr; i++) {
    ++		tx_data->args[i].max_index = transaction->max_index;
    ++
      		ret = reftable_addition_add(tx_data->args[i].addition,
      					    write_transaction_table, &tx_data->args[i]);
    + 		if (ret < 0)
     
      ## t/t1460-refs-migrate.sh ##
     @@ t/t1460-refs-migrate.sh: do
---
 refs.c                  |  7 +++++++
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 20 ++++++++++----------
 t/t1460-refs-migrate.sh | 12 ++++++++++++
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index 0f41b2fd4a..f7b6f0f897 100644
--- a/refs.c
+++ b/refs.c
@@ -1345,6 +1345,13 @@ int ref_transaction_update_reflog(struct ref_transaction *transaction,
 	update->flags &= ~REF_HAVE_OLD;
 	update->index = index;
 
+	/*
+	 * Reference backends may need to know the max index to optimize
+	 * their writes. So we store the max_index on the transaction level.
+	 */
+	if (index > transaction->max_index)
+		transaction->max_index = index;
+
 	return 0;
 }
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 16550862d3..aaab711bb9 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -203,6 +203,7 @@ struct ref_transaction {
 	enum ref_transaction_state state;
 	void *backend_data;
 	unsigned int flags;
+	unsigned int max_index;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 00d95a9a2f..d39a14c5a4 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -942,6 +942,7 @@ struct write_transaction_table_arg {
 	size_t updates_nr;
 	size_t updates_alloc;
 	size_t updates_expected;
+	unsigned int max_index;
 };
 
 struct reftable_transaction_data {
@@ -1019,6 +1020,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		arg->updates_nr = 0;
 		arg->updates_alloc = 0;
 		arg->updates_expected = 0;
+		arg->max_index = 0;
 	}
 
 	arg->updates_expected++;
@@ -1428,7 +1430,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
-	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;
 
@@ -1438,7 +1439,12 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 
 	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
 
-	reftable_writer_set_limits(writer, ts, ts);
+	/*
+	 * During reflog migration, we add indexes for a single reflog with
+	 * multiple entries. Each entry will contain a different update_index,
+	 * so set the limits accordingly.
+	 */
+	reftable_writer_set_limits(writer, ts, ts + arg->max_index);
 
 	for (i = 0; i < arg->updates_nr; i++) {
 		struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1540,12 +1546,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				 */
 				log->update_index = ts + u->index;
 
-				/*
-				 * Note the max update_index so the limit can be set later on.
-				 */
-				if (log->update_index > max_update_index)
-					max_update_index = log->update_index;
-
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1609,8 +1609,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
-		reftable_writer_set_limits(writer, ts, max_update_index);
-
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;
@@ -1632,6 +1630,8 @@ static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
 	int ret = 0;
 
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		tx_data->args[i].max_index = transaction->max_index;
+
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
 		if (ret < 0)
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index f59bc4860f..307b2998ef 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -227,6 +227,18 @@ do
 	done
 done
 
+test_expect_success 'multiple reftable blocks with multiple entries' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo first &&
+	printf "create refs/heads/ref-%d HEAD\n" $(test_seq 5000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_commit -C repo second &&
+	printf "update refs/heads/ref-%d HEAD\n" $(test_seq 3000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_migration repo reftable
+'
+
 test_expect_success 'migrating from files format deletes backend files' '
 	test_when_finished "rm -rf repo" &&
 	git init --ref-format=files repo &&
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] reftable: write correct max_update_index to header
  2025-01-23 13:56   ` [PATCH v2] reftable: write correct max_update_index to header Karthik Nayak
@ 2025-01-23 18:11     ` Junio C Hamano
  2025-01-23 18:24       ` Junio C Hamano
  2025-01-24  4:06       ` [PATCH v2] reftable: write correct max_update_index to header Karthik Nayak
  0 siblings, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2025-01-23 18:11 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, sandals, Johannes.Schindelin

Karthik Nayak <karthik.188@gmail.com> writes:

> While this patch was merged to next, Dscho reported that it was flaky
> on macos pipeline. On further investigation I found this was easily
> reproducible when the leak sanitizer was turned on and the reftable
> tests were run. The fix was simply to add the missing 0 initialization.

If it is already _in_ 'next', please turn it into a relative patch
on top of it, instead of replacing it.

That will give you an opportunity to describe the breakage in the
original version, which everybody missed until it hit 'next'.  And
you can also credit the folks who reported the breakage, and
describe the fix.

The reason we do not revert out of 'next' lightly is because the
changes we merge to 'next' are supposed to be reviewed well enough,
which means that any bug we discover later is likely to have been
caused by mistakes any of us may repeat in the future, and it is
worth documenting in our history.

It is quite a different review philosophy if you compare the rules
we use for patches that haven't hit 'next'.  These uncooked patches
may have mistrakes that reviewers can easily spot and get corrected,
and these easy ones are not worth documenting as much.

> The patch is based on Maint f93ff170b9 (Git 2.48.1, 2025-01-13). 

Thanks.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] reftable: write correct max_update_index to header
  2025-01-23 18:11     ` Junio C Hamano
@ 2025-01-23 18:24       ` Junio C Hamano
  2025-01-24 14:02         ` [PATCH v3] refs: fix uninitialized memory access of `max_index` Karthik Nayak
  2025-01-24  4:06       ` [PATCH v2] reftable: write correct max_update_index to header Karthik Nayak
  1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2025-01-23 18:24 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, sandals, Johannes.Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> While this patch was merged to next, Dscho reported that it was flaky
>> on macos pipeline. On further investigation I found this was easily
>> reproducible when the leak sanitizer was turned on and the reftable
>> tests were run. The fix was simply to add the missing 0 initialization.
>
> If it is already _in_ 'next', please turn it into a relative patch
> on top of it, instead of replacing it.

For now, I have tentatively created the following and will queue on
a separate kn/reflog-migration-fix-fix topic (which would be ahead
of kn/reflog-migration-fix topic by this one commit), in the hope
that it can be replaced with a version with proper commit log
message that describes what bugs in the original "fix" are
addressed, how they are caused (e.g., how does it lead to the
breakage to forget clearing of arg->max_index in the first hunk
had?), and what their fixes are.

Thanks.

--- >8 ---
From: Karthik Nayak <karthik.188@gmail.com>
Date: Fri, 20 Dec 2024 13:58:37 +0100
Subject: [PATCH] SQUASH - needs to describe the breakage and fix in v1

---
 refs/reftable-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 68db2baa8f..bb658826fe 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -920,6 +920,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		arg->updates_nr = 0;
 		arg->updates_alloc = 0;
 		arg->updates_expected = 0;
+		arg->max_index = 0;
 	}
 
 	arg->updates_expected++;
@@ -1502,10 +1503,9 @@ static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
 	struct reftable_transaction_data *tx_data = transaction->backend_data;
 	int ret = 0;
 
-	if (tx_data->args)
-		tx_data->args->max_index = transaction->max_index;
-
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		tx_data->args[i].max_index = transaction->max_index;
+
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
 		if (ret < 0)
-- 
2.48.1-259-gf9754493bb


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] reftable: write correct max_update_index to header
  2025-01-23 18:11     ` Junio C Hamano
  2025-01-23 18:24       ` Junio C Hamano
@ 2025-01-24  4:06       ` Karthik Nayak
  2025-01-24 15:25         ` Junio C Hamano
  1 sibling, 1 reply; 21+ messages in thread
From: Karthik Nayak @ 2025-01-24  4:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, sandals, Johannes.Schindelin

[-- Attachment #1: Type: text/plain, Size: 3037 bytes --]

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> While this patch was merged to next, Dscho reported that it was flaky
>> on macos pipeline. On further investigation I found this was easily
>> reproducible when the leak sanitizer was turned on and the reftable
>> tests were run. The fix was simply to add the missing 0 initialization.
>
> If it is already _in_ 'next', please turn it into a relative patch
> on top of it, instead of replacing it.
>
> That will give you an opportunity to describe the breakage in the
> original version, which everybody missed until it hit 'next'.  And
> you can also credit the folks who reported the breakage, and
> describe the fix.
>
> The reason we do not revert out of 'next' lightly is because the
> changes we merge to 'next' are supposed to be reviewed well enough,
> which means that any bug we discover later is likely to have been
> caused by mistakes any of us may repeat in the future, and it is
> worth documenting in our history.
>
> It is quite a different review philosophy if you compare the rules
> we use for patches that haven't hit 'next'.  These uncooked patches
> may have mistrakes that reviewers can easily spot and get corrected,
> and these easy ones are not worth documenting as much.
>

Thanks Junio, I understand your reasoning here and it makes sense to me.
Do you think it is worthwhile to also add something like this to our
Documentation? I couldn't find anything there. I'll add a small patch to
the bottom of this mail.

>> The patch is based on Maint f93ff170b9 (Git 2.48.1, 2025-01-13).
>
> Thanks.

-- >8 --

Subject: [PATCH] doc: add guideline to tackle bugs in `next`

When fixing a bug in a topic already merged into `next`, there are no
strict guidelines to follow. While topics in `seen` can be reverted,
topics in `next` have undergone thorough review. Documenting fixes for
such topics is valuable, as it helps to clarify the issue and
contributes to preventing similar problems in the future.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/SubmittingPatches | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 958e3cc3d5..72454acf21 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -115,6 +115,13 @@ latest HEAD commit of `maint` or `master` based
on the following cases:
   new API features on the cutting edge that recently appeared in
   `master` but were not available in the released version).

+* If you're fixing a bug in a topic that's already been merged into
+  `next`, it's preferable to create a patch relative to that topic.
+  This approach allows you to describe the issue in the original version
+  that went unnoticed until it reached next. Additionally, it provides
+  an opportunity to credit those who reported the issue and document
+  the details of the fix.
+
 * Otherwise (such as if you are adding new features) use `master`.


-- 
2.47.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3] refs: fix uninitialized memory access of `max_index`
  2025-01-23 18:24       ` Junio C Hamano
@ 2025-01-24 14:02         ` Karthik Nayak
  2025-01-24 14:46           ` Patrick Steinhardt
  0 siblings, 1 reply; 21+ messages in thread
From: Karthik Nayak @ 2025-01-24 14:02 UTC (permalink / raw)
  To: gitster; +Cc: Johannes.Schindelin, git, karthik.188, ps, sandals

When migrating reflogs between reference backends, maintaining the
original order of the reflog entries is crucial. To achieve this, an
`index` field is stored within the `ref_update` struct.

In the reftable backend, before writing any references, the writer must
be configured with the minimum and maximum update index values. The
`max_update_index` is derived from the maximum `ref_update.index` value
in a transaction . The commit bc67b4ab5f (reftable: write correct
max_update_index to header, 2025-01-15) addressed this by propagating the
`max_update_index` value from the transaction to
`write_transaction_table_arg` and, ultimately, to
`reftable_writer_set_limits()`, which sets the min and max index for the
reftable writer.

However, that commit introduced an issue:

  - In `reftable_transaction_data`, which contains an array of
  `write_transaction_table_arg`, only the first element was assigned the
  `max_index` value.

As a result, any elements beyond the first in the array contained
uninitialized `max_index`. The writer contains multiple elements of
`write_transaction_table_arg` to correspond to different worktrees being
written. This uninitialized value was later used to set the
`max_update_index` for the writer, potentially causing overflow or
undefined behavior.

Fix this by:

  - Initializing the `max_index` field to 0.
  - Moving the assignment of `max_index` in
  `reftable_be_transaction_finish()` inside the loop, ensuring all
  elements of the array are correctly initialized.

Initializing `max_index` to `0` is not strictly necessary, as all
elements of `write_transaction_table_arg.max_index` are now assigned
correctly. However, this initialization is added for consistency and to
safeguard against potential future changes that might inadvertently
introduce uninitialized memory access.

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Hello,

As suggested, I've redone my patch to make this a relative patch on top of
'kn/reflog-migration-fix'. 

This is based on top of maint with 'kn/reflog-migration-fix' merged in.

Thanks
---
 refs/reftable-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 289496058e..d39a14c5a4 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1020,6 +1020,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		arg->updates_nr = 0;
 		arg->updates_alloc = 0;
 		arg->updates_expected = 0;
+		arg->max_index = 0;
 	}
 
 	arg->updates_expected++;
@@ -1628,10 +1629,9 @@ static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
 	struct reftable_transaction_data *tx_data = transaction->backend_data;
 	int ret = 0;
 
-	if (tx_data->args)
-		tx_data->args->max_index = transaction->max_index;
-
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		tx_data->args[i].max_index = transaction->max_index;
+
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
 		if (ret < 0)
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] refs: fix uninitialized memory access of `max_index`
  2025-01-24 14:02         ` [PATCH v3] refs: fix uninitialized memory access of `max_index` Karthik Nayak
@ 2025-01-24 14:46           ` Patrick Steinhardt
  2025-01-24 15:48             ` Karthik Nayak
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick Steinhardt @ 2025-01-24 14:46 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: gitster, Johannes.Schindelin, git, sandals

On Fri, Jan 24, 2025 at 03:02:03PM +0100, Karthik Nayak wrote:
> When migrating reflogs between reference backends, maintaining the
> original order of the reflog entries is crucial. To achieve this, an
> `index` field is stored within the `ref_update` struct.
> 
> In the reftable backend, before writing any references, the writer must
> be configured with the minimum and maximum update index values. The
> `max_update_index` is derived from the maximum `ref_update.index` value
> in a transaction . The commit bc67b4ab5f (reftable: write correct
> max_update_index to header, 2025-01-15) addressed this by propagating the
> `max_update_index` value from the transaction to
> `write_transaction_table_arg` and, ultimately, to
> `reftable_writer_set_limits()`, which sets the min and max index for the
> reftable writer.
> 
> However, that commit introduced an issue:
> 
>   - In `reftable_transaction_data`, which contains an array of
>   `write_transaction_table_arg`, only the first element was assigned the
>   `max_index` value.
> 
> As a result, any elements beyond the first in the array contained
> uninitialized `max_index`. The writer contains multiple elements of
> `write_transaction_table_arg` to correspond to different worktrees being
> written. This uninitialized value was later used to set the
> `max_update_index` for the writer, potentially causing overflow or
> undefined behavior.

It reads a bit funny as a bulleted list with a single item, only. A
suggestion for the above:

    However, we only set the update index for the first
    `write_transaction_table_arg`, even though there can be multiple
    such arguments. This is the case when we write to multiple stacks in
    a single transaction, e.g. when updating references in two different
    worktrees at once. And, if so, we wouldn't have initialized the
    update index for any but the first such argument. This uninitialized
    value was later used to set the `max_update_index` for the writer,
    potentially causing undefined behaviour.

Other than that this is nicely described, and the fix looks reasonable,
too.

Patrick

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] reftable: write correct max_update_index to header
  2025-01-24  4:06       ` [PATCH v2] reftable: write correct max_update_index to header Karthik Nayak
@ 2025-01-24 15:25         ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2025-01-24 15:25 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, sandals, Johannes.Schindelin

Karthik Nayak <karthik.188@gmail.com> writes:

> Thanks Junio, I understand your reasoning here and it makes sense to me.
> Do you think it is worthwhile to also add something like this to our
> Documentation? I couldn't find anything there. I'll add a small patch to
> the bottom of this mail.

Perhaps.  I think referring people to "Notes from the maintainer"
may also work without duplicating information.

Thanks.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] refs: fix uninitialized memory access of `max_index`
  2025-01-24 14:46           ` Patrick Steinhardt
@ 2025-01-24 15:48             ` Karthik Nayak
  2025-01-27 10:20               ` Patrick Steinhardt
  0 siblings, 1 reply; 21+ messages in thread
From: Karthik Nayak @ 2025-01-24 15:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: gitster, Johannes.Schindelin, git, sandals

[-- Attachment #1: Type: text/plain, Size: 2312 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Jan 24, 2025 at 03:02:03PM +0100, Karthik Nayak wrote:
>> When migrating reflogs between reference backends, maintaining the
>> original order of the reflog entries is crucial. To achieve this, an
>> `index` field is stored within the `ref_update` struct.
>>
>> In the reftable backend, before writing any references, the writer must
>> be configured with the minimum and maximum update index values. The
>> `max_update_index` is derived from the maximum `ref_update.index` value
>> in a transaction . The commit bc67b4ab5f (reftable: write correct
>> max_update_index to header, 2025-01-15) addressed this by propagating the
>> `max_update_index` value from the transaction to
>> `write_transaction_table_arg` and, ultimately, to
>> `reftable_writer_set_limits()`, which sets the min and max index for the
>> reftable writer.
>>
>> However, that commit introduced an issue:
>>
>>   - In `reftable_transaction_data`, which contains an array of
>>   `write_transaction_table_arg`, only the first element was assigned the
>>   `max_index` value.
>>
>> As a result, any elements beyond the first in the array contained
>> uninitialized `max_index`. The writer contains multiple elements of
>> `write_transaction_table_arg` to correspond to different worktrees being
>> written. This uninitialized value was later used to set the
>> `max_update_index` for the writer, potentially causing overflow or
>> undefined behavior.
>
> It reads a bit funny as a bulleted list with a single item, only. A
> suggestion for the above:
>
>     However, we only set the update index for the first
>     `write_transaction_table_arg`, even though there can be multiple
>     such arguments. This is the case when we write to multiple stacks in
>     a single transaction, e.g. when updating references in two different
>     worktrees at once. And, if so, we wouldn't have initialized the
>     update index for any but the first such argument. This uninitialized
>     value was later used to set the `max_update_index` for the writer,
>     potentially causing undefined behaviour.
>
> Other than that this is nicely described, and the fix looks reasonable,
> too.
>
> Patrick

Thanks Patrick for taking the time, this seems much better. Let me add
this in for the next version.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] refs: fix uninitialized memory access of `max_index`
  2025-01-24 15:48             ` Karthik Nayak
@ 2025-01-27 10:20               ` Patrick Steinhardt
  2025-01-27 16:24                 ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick Steinhardt @ 2025-01-27 10:20 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: gitster, Johannes.Schindelin, git, sandals

On Fri, Jan 24, 2025 at 07:48:43AM -0800, Karthik Nayak wrote:
> Thanks Patrick for taking the time, this seems much better. Let me add
> this in for the next version.

I've sent a v4 of this patch, but forgot to set the In-reply-to header.
The patch can be found at [1].

Patrick

[1]: <b7e3dd3cc870024f0e80dad26c5a7a96483c6cf4.1737970803.git.ps@pks.im>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] refs: fix uninitialized memory access of `max_index`
  2025-01-27 10:20               ` Patrick Steinhardt
@ 2025-01-27 16:24                 ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2025-01-27 16:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, Johannes.Schindelin, git, sandals

Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Jan 24, 2025 at 07:48:43AM -0800, Karthik Nayak wrote:
>> Thanks Patrick for taking the time, this seems much better. Let me add
>> this in for the next version.
>
> I've sent a v4 of this patch, but forgot to set the In-reply-to header.
> The patch can be found at [1].
>
> Patrick
>
> [1]: <b7e3dd3cc870024f0e80dad26c5a7a96483c6cf4.1737970803.git.ps@pks.im>

Ah, how very considerate of you.  List-archive spelunkers have
bidirectional links to see (1) from an older discussion, how the
issues were resolved, and (2) from the final attempt, where the
issues came from.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-01-27 16:24 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-13 13:56 Bug in 2.48 with `git refs migrate` brian m. carlson
2025-01-13 15:45 ` Patrick Steinhardt
2025-01-15 11:54 ` Karthik Nayak
2025-01-15 17:07   ` Junio C Hamano
2025-01-16  6:44     ` Karthik Nayak
2025-01-16  2:05   ` brian m. carlson
2025-01-16  7:29     ` Karthik Nayak
2025-01-16 23:21   ` brian m. carlson
2025-01-17  0:04     ` Junio C Hamano
2025-01-17  6:17       ` Karthik Nayak
2025-01-17  6:15     ` Karthik Nayak
2025-01-23 13:56   ` [PATCH v2] reftable: write correct max_update_index to header Karthik Nayak
2025-01-23 18:11     ` Junio C Hamano
2025-01-23 18:24       ` Junio C Hamano
2025-01-24 14:02         ` [PATCH v3] refs: fix uninitialized memory access of `max_index` Karthik Nayak
2025-01-24 14:46           ` Patrick Steinhardt
2025-01-24 15:48             ` Karthik Nayak
2025-01-27 10:20               ` Patrick Steinhardt
2025-01-27 16:24                 ` Junio C Hamano
2025-01-24  4:06       ` [PATCH v2] reftable: write correct max_update_index to header Karthik Nayak
2025-01-24 15:25         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).