* [PATCH 0/3] refs: small followups to the migration corruption fix
@ 2025-01-17 7:59 Karthik Nayak
2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
` (3 more replies)
0 siblings, 4 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-17 7:59 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, sandals, gitster
This is a follow up to the bug that was reported [1] around `git refs
migrate --ref-format=reftable` where the migration would fail for
repositories with reflogs with lots of entries. This was caused due to a
mismatch in the reftable's header and footer, specifically WRT the
'max_update_index'.
While there was a fix posted. This series is a small followup to fix
some of the topics discussed there:
1. To mark `ref_transaction_update_reflog()` as static since it is only
used internally within 'refs.c'.
2. To change the type of 'max_index' from 'unsigned int' to 'uint64_t'.
This would be much safer for large repositories with millions of files
and on 32bit systems.
3. To add a safeguard to prevent 'update_index' changes after first
block is written. This is a preventive measure to ensure such bugs don't
arise in the future.
This is based on top of master 757161efcc (Sync with Git 2.48.1,
2025-01-13) with 'kn/reflog-migration-fix' merged in.
[1]: https://lore.kernel.org/r/Z4UbkcmJAU1MT-Rs@tapette.crustytoothpaste.net
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Karthik Nayak (3):
refs: mark `ref_transaction_update_reflog()` as static
refs: use 'uint64_t' for 'ref_update.index'
reftable: prevent 'update_index' changes after header write
refs.c | 18 ++++++++++--------
refs.h | 14 --------------
refs/refs-internal.h | 4 ++--
refs/reftable-backend.c | 2 +-
reftable/writer.c | 7 +++++++
5 files changed, 20 insertions(+), 25 deletions(-)
---
---
base-commit: a5aa44e7930761cb900813d971b4105f901818fb
change-id: 20250117-461-corrupted-reftable-followup-eb0e4fd1a723
Thanks
- Karthik
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static
2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak
@ 2025-01-17 7:59 ` Karthik Nayak
2025-01-17 9:29 ` Patrick Steinhardt
2025-01-17 7:59 ` [PATCH 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
` (2 subsequent siblings)
3 siblings, 1 reply; 30+ messages in thread
From: Karthik Nayak @ 2025-01-17 7:59 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, sandals, gitster
The `ref_transaction_update_reflog()` function is only used within
'refs.c', so mark it as static.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs.c | 16 +++++++++-------
refs.h | 14 --------------
2 files changed, 9 insertions(+), 21 deletions(-)
diff --git a/refs.c b/refs.c
index f7b6f0f897eb58665e10a2efd3eb53c3f72abe61..1bb6f0356d5c5cae8bea9d6f4d5ff35164a03c64 100644
--- a/refs.c
+++ b/refs.c
@@ -1318,13 +1318,15 @@ int ref_transaction_update(struct ref_transaction *transaction,
return 0;
}
-int ref_transaction_update_reflog(struct ref_transaction *transaction,
- const char *refname,
- const struct object_id *new_oid,
- const struct object_id *old_oid,
- const char *committer_info, unsigned int flags,
- const char *msg, unsigned int index,
- struct strbuf *err)
+static int ref_transaction_update_reflog(struct ref_transaction *transaction,
+ const char *refname,
+ const struct object_id *new_oid,
+ const struct object_id *old_oid,
+ const char *committer_info,
+ unsigned int flags,
+ const char *msg,
+ unsigned int index,
+ struct strbuf *err)
{
struct ref_update *update;
diff --git a/refs.h b/refs.h
index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644
--- a/refs.h
+++ b/refs.h
@@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction,
unsigned int flags, const char *msg,
struct strbuf *err);
-/*
- * Similar to`ref_transaction_update`, but this function is only for adding
- * a reflog update. Supports providing custom committer information. The index
- * field can be utiltized to order updates as desired. When not used, the
- * updates default to being ordered by refname.
- */
-int ref_transaction_update_reflog(struct ref_transaction *transaction,
- const char *refname,
- const struct object_id *new_oid,
- const struct object_id *old_oid,
- const char *committer_info, unsigned int flags,
- const char *msg, unsigned int index,
- struct strbuf *err);
-
/*
* Add a reference creation to transaction. new_oid is the value that
* the reference should have after the update; it must not be
--
2.47.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 2/3] refs: use 'uint64_t' for 'ref_update.index'
2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
@ 2025-01-17 7:59 ` Karthik Nayak
2025-01-17 7:59 ` [PATCH 3/3] reftable: prevent 'update_index' changes after header write Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak
3 siblings, 0 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-17 7:59 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, sandals, gitster
The 'ref_update.index' variable is used to store an index for a given
reference update. This index is used to order the updates in a
predetermined order, while the default ordering is alphabetical as per
the refname.
For large repositories with millions of references, it should be safer
to use 'uint64_t'. Let's do that. This also is applied for all other
code sections where we store 'index' and pass it around.
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs.c | 4 ++--
refs/refs-internal.h | 4 ++--
refs/reftable-backend.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/refs.c b/refs.c
index 1bb6f0356d5c5cae8bea9d6f4d5ff35164a03c64..0de23f91b2e4401e93aee6940d4aa2abea4312cf 100644
--- a/refs.c
+++ b/refs.c
@@ -1325,7 +1325,7 @@ static int ref_transaction_update_reflog(struct ref_transaction *transaction,
const char *committer_info,
unsigned int flags,
const char *msg,
- unsigned int index,
+ uint64_t index,
struct strbuf *err)
{
struct ref_update *update;
@@ -2807,7 +2807,7 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
}
struct reflog_migration_data {
- unsigned int index;
+ uint64_t index;
const char *refname;
struct ref_store *old_refs;
struct ref_transaction *transaction;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96844755dfa600d37efdb25b30a0765..8894b43d1d1a327d404d3923c507d2d39649de19 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -120,7 +120,7 @@ struct ref_update {
* when migrating reflogs and we want to ensure we carry over the
* same order.
*/
- unsigned int index;
+ uint64_t index;
/*
* If this ref_update was split off of a symref update via
@@ -203,7 +203,7 @@ struct ref_transaction {
enum ref_transaction_state state;
void *backend_data;
unsigned int flags;
- unsigned int max_index;
+ uint64_t max_index;
};
/*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 289496058e6eb4d3e3aef96ca70219cd4ff78eae..6814c87bc618229ac8a70b904be3f850371ad876 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -942,7 +942,7 @@ struct write_transaction_table_arg {
size_t updates_nr;
size_t updates_alloc;
size_t updates_expected;
- unsigned int max_index;
+ uint64_t max_index;
};
struct reftable_transaction_data {
--
2.47.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 3/3] reftable: prevent 'update_index' changes after header write
2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
2025-01-17 7:59 ` [PATCH 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
@ 2025-01-17 7:59 ` Karthik Nayak
2025-01-17 9:29 ` Patrick Steinhardt
2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak
3 siblings, 1 reply; 30+ messages in thread
From: Karthik Nayak @ 2025-01-17 7:59 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, sandals, gitster
The function `reftable_writer_set_limits()` allows updating the
'min_update_index' and 'max_update_index' of a reftable writer. These
values are written to both the writer's header and footer.
Since the header is written during the first block write, any subsequent
changes to the update index would create a mismatch between the header
and footer values. The footer would contain the newer values while the
header retained the original ones. To fix this bug, we now prevent
callers from updating these values after the header has been written.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
reftable/writer.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/reftable/writer.c b/reftable/writer.c
index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out,
void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
uint64_t max)
{
+ /*
+ * The limits shouldn't be modified post writing the first block, else
+ * it would cause a mismatch between the header and the footer.
+ */
+ if (w->next)
+ BUG("update index modified after writing first block");
+
w->min_update_index = min;
w->max_update_index = max;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] reftable: prevent 'update_index' changes after header write
2025-01-17 7:59 ` [PATCH 3/3] reftable: prevent 'update_index' changes after header write Karthik Nayak
@ 2025-01-17 9:29 ` Patrick Steinhardt
2025-01-20 11:47 ` Karthik Nayak
0 siblings, 1 reply; 30+ messages in thread
From: Patrick Steinhardt @ 2025-01-17 9:29 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, sandals, gitster
On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote:
> diff --git a/reftable/writer.c b/reftable/writer.c
> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644
> --- a/reftable/writer.c
> +++ b/reftable/writer.c
> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out,
> void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> uint64_t max)
> {
> + /*
> + * The limits shouldn't be modified post writing the first block, else
> + * it would cause a mismatch between the header and the footer.
> + */
Can we make this *even* stricter? I think that this is something that is
easy to do wrong, and the fact that it only triggers in some situations
of misuse may easily make tests miss this issue. So ideally, we should
assert that `set_limits()` is always called before queueing any records
to the writer. This would make us error out in all situations where the
calling order is wrong.
There are two ways I can see us doing that:
- Detect any state written by `writer_add_record()` and error out if
it's set when `reftable_writer_set_limits()` is called.
- Adapt `reftable_writer_new()` so that it takes the update indices as
input and drop `reftable_writer_set_limits()` altogether.
The latter might be preferable as you basically want to set limits in
all (most?) situations anyway.
> + if (w->next)
> + BUG("update index modified after writing first block");
Let's not use BUG, but rather return a `REFTABLE_API_ERROR` error. It
requires a bit more plumbing because we'll also hvae to adapt all
callers to handle errors. But on the one hand we don't want to die in
library code. And on the other hand we don't want to keep on adding more
dependencies on "git-compat-util.h".
Patrick
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static
2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
@ 2025-01-17 9:29 ` Patrick Steinhardt
2025-01-20 11:17 ` Karthik Nayak
0 siblings, 1 reply; 30+ messages in thread
From: Patrick Steinhardt @ 2025-01-17 9:29 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, sandals, gitster
On Fri, Jan 17, 2025 at 08:59:12AM +0100, Karthik Nayak wrote:
> diff --git a/refs.h b/refs.h
> index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction,
> unsigned int flags, const char *msg,
> struct strbuf *err);
>
> -/*
> - * Similar to`ref_transaction_update`, but this function is only for adding
> - * a reflog update. Supports providing custom committer information. The index
> - * field can be utiltized to order updates as desired. When not used, the
> - * updates default to being ordered by refname.
> - */
Do we maybe want to move the comment over? The explanation of the index
field seems useful to me.
Patrick
> -int ref_transaction_update_reflog(struct ref_transaction *transaction,
> - const char *refname,
> - const struct object_id *new_oid,
> - const struct object_id *old_oid,
> - const char *committer_info, unsigned int flags,
> - const char *msg, unsigned int index,
> - struct strbuf *err);
> -
> /*
> * Add a reference creation to transaction. new_oid is the value that
> * the reference should have after the update; it must not be
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static
2025-01-17 9:29 ` Patrick Steinhardt
@ 2025-01-20 11:17 ` Karthik Nayak
0 siblings, 0 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-20 11:17 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, sandals, gitster
[-- Attachment #1: Type: text/plain, Size: 1392 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> On Fri, Jan 17, 2025 at 08:59:12AM +0100, Karthik Nayak wrote:
>> diff --git a/refs.h b/refs.h
>> index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644
>> --- a/refs.h
>> +++ b/refs.h
>> @@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction,
>> unsigned int flags, const char *msg,
>> struct strbuf *err);
>>
>> -/*
>> - * Similar to`ref_transaction_update`, but this function is only for adding
>> - * a reflog update. Supports providing custom committer information. The index
>> - * field can be utiltized to order updates as desired. When not used, the
>> - * updates default to being ordered by refname.
>> - */
>
> Do we maybe want to move the comment over? The explanation of the index
> field seems useful to me.
>
Yeah, that'd be useful indeed, let me move it.
> Patrick
>
>> -int ref_transaction_update_reflog(struct ref_transaction *transaction,
>> - const char *refname,
>> - const struct object_id *new_oid,
>> - const struct object_id *old_oid,
>> - const char *committer_info, unsigned int flags,
>> - const char *msg, unsigned int index,
>> - struct strbuf *err);
>> -
>> /*
>> * Add a reference creation to transaction. new_oid is the value that
>> * the reference should have after the update; it must not be
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] reftable: prevent 'update_index' changes after header write
2025-01-17 9:29 ` Patrick Steinhardt
@ 2025-01-20 11:47 ` Karthik Nayak
2025-01-20 12:18 ` Karthik Nayak
0 siblings, 1 reply; 30+ messages in thread
From: Karthik Nayak @ 2025-01-20 11:47 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, sandals, gitster
[-- Attachment #1: Type: text/plain, Size: 2709 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote:
>> diff --git a/reftable/writer.c b/reftable/writer.c
>> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644
>> --- a/reftable/writer.c
>> +++ b/reftable/writer.c
>> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out,
>> void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>> uint64_t max)
>> {
>> + /*
>> + * The limits shouldn't be modified post writing the first block, else
>> + * it would cause a mismatch between the header and the footer.
>> + */
>
> Can we make this *even* stricter? I think that this is something that is
> easy to do wrong, and the fact that it only triggers in some situations
> of misuse may easily make tests miss this issue. So ideally, we should
> assert that `set_limits()` is always called before queueing any records
> to the writer. This would make us error out in all situations where the
> calling order is wrong.
>
I agree here, it makes sense to make this stricter. Like you mentioned,
currently they are independent. The only way to enforce the limits is to
ensure that they are dependent.
> There are two ways I can see us doing that:
>
> - Detect any state written by `writer_add_record()` and error out if
> it's set when `reftable_writer_set_limits()` is called.
>
Yeah I think this would be simple to do. I guess we can check
`w->last_key` is set, since any record write would modify that.
> - Adapt `reftable_writer_new()` so that it takes the update indices as
> input and drop `reftable_writer_set_limits()` altogether.
>
This one is a bit harder to do because of our flow. Generally the writer
is provided to callers via a callback function passed to
`reftable_addition_add()`. I guess I could simply pass the data:
caller -> reftable_addition_add() -> reftable_writer_new()
Any direct users of `reftable_writer_new()` would simply pass the data
directly.
I'll play around and see if this is doable without too much refactoring
and have something in the next version.
> The latter might be preferable as you basically want to set limits in
> all (most?) situations anyway.
>
>> + if (w->next)
>> + BUG("update index modified after writing first block");
>
> Let's not use BUG, but rather return a `REFTABLE_API_ERROR` error. It
> requires a bit more plumbing because we'll also hvae to adapt all
> callers to handle errors. But on the one hand we don't want to die in
> library code. And on the other hand we don't want to keep on adding more
> dependencies on "git-compat-util.h".
>
Fair enough, thanks for explaining.
> Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] reftable: prevent 'update_index' changes after header write
2025-01-20 11:47 ` Karthik Nayak
@ 2025-01-20 12:18 ` Karthik Nayak
0 siblings, 0 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-20 12:18 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, sandals, gitster
[-- Attachment #1: Type: text/plain, Size: 2937 bytes --]
Karthik Nayak <karthik.188@gmail.com> writes:
> Patrick Steinhardt <ps@pks.im> writes:
>
>> On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote:
>>> diff --git a/reftable/writer.c b/reftable/writer.c
>>> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644
>>> --- a/reftable/writer.c
>>> +++ b/reftable/writer.c
>>> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out,
>>> void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>>> uint64_t max)
>>> {
>>> + /*
>>> + * The limits shouldn't be modified post writing the first block, else
>>> + * it would cause a mismatch between the header and the footer.
>>> + */
>>
>> Can we make this *even* stricter? I think that this is something that is
>> easy to do wrong, and the fact that it only triggers in some situations
>> of misuse may easily make tests miss this issue. So ideally, we should
>> assert that `set_limits()` is always called before queueing any records
>> to the writer. This would make us error out in all situations where the
>> calling order is wrong.
>>
>
> I agree here, it makes sense to make this stricter. Like you mentioned,
> currently they are independent. The only way to enforce the limits is to
> ensure that they are dependent.
>
>> There are two ways I can see us doing that:
>>
>> - Detect any state written by `writer_add_record()` and error out if
>> it's set when `reftable_writer_set_limits()` is called.
>>
>
> Yeah I think this would be simple to do. I guess we can check
> `w->last_key` is set, since any record write would modify that.
>
>> - Adapt `reftable_writer_new()` so that it takes the update indices as
>> input and drop `reftable_writer_set_limits()` altogether.
>>
>
> This one is a bit harder to do because of our flow. Generally the writer
> is provided to callers via a callback function passed to
> `reftable_addition_add()`. I guess I could simply pass the data:
>
> caller -> reftable_addition_add() -> reftable_writer_new()
>
> Any direct users of `reftable_writer_new()` would simply pass the data
> directly.
>
> I'll play around and see if this is doable without too much refactoring
> and have something in the next version.
It seems like `reftable_addition_add()` is internally and externally
called by a lot of functions and as such, modifying them all would be
tedious. Another idea is to modify the function pointer that
`reftable_addition_add()` receives so that the `write_table` argument
receives a function which would set the limits, but this seems like bad
design to me.
So the best/easiest way to do this is to error out if any state is set
before calling `reftable_writer_set_limits()`. I also realized that if
`reftable_writer_set_limits()` isn't called, the writer errors anyways.
So just ensuring that `reftable_writer_set_limits()` checks for any
state being set would work well.
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v2 0/3] refs: small followups to the migration corruption fix
2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak
` (2 preceding siblings ...)
2025-01-17 7:59 ` [PATCH 3/3] reftable: prevent 'update_index' changes after header write Karthik Nayak
@ 2025-01-21 3:34 ` Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
` (3 more replies)
3 siblings, 4 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-21 3:34 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, Junio C Hamano, brian m. carlson
This is a follow up to the bug that was reported [1] around `git refs
migrate --ref-format=reftable` where the migration would fail for
repositories with reflogs with lots of entries. This was caused due to a
mismatch in the reftable's header and footer, specifically WRT the
'max_update_index'.
While there was a fix posted. This series is a small followup to fix
some of the topics discussed there:
1. To mark `ref_transaction_update_reflog()` as static since it is only
used internally within 'refs.c'.
2. To change the type of 'max_index' from 'unsigned int' to 'uint64_t'.
This would be much safer for large repositories with millions of files
and on 32bit systems.
3. To add a safeguard to prevent 'update_index' changes post any record
addition. This is a preventive measure to ensure such bugs don't arise
in the future.
This is based on top of master 757161efcc (Sync with Git 2.48.1,
2025-01-13) with 'kn/reflog-migration-fix' merged in.
[1]: https://lore.kernel.org/r/Z4UbkcmJAU1MT-Rs@tapette.crustytoothpaste.net
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Changes in v2:
- Commit 1: Keep the function comments.
- Commit 3: Modify the requirement to be for any record writes instead
of the first block write. This ensures that the limits need to be set
before any records being added. Instead of using `BUG()`, return
`REFTABLE_API_ERROR`. Handle all callers as needed.
- Link to v1: https://lore.kernel.org/r/20250117-461-corrupted-reftable-followup-v1-0-70ee605ae3fe@gmail.com
---
Karthik Nayak (3):
refs: mark `ref_transaction_update_reflog()` as static
refs: use 'uint64_t' for 'ref_update.index'
reftable: prevent 'update_index' changes after adding records
refs.c | 24 ++++++++++++++++--------
refs.h | 14 --------------
refs/refs-internal.h | 4 ++--
refs/reftable-backend.c | 22 ++++++++++++++++------
reftable/reftable-error.h | 1 +
reftable/reftable-writer.h | 24 ++++++++++++++----------
reftable/stack.c | 6 ++++--
reftable/writer.c | 13 +++++++++++--
t/unit-tests/t-reftable-stack.c | 8 +++++---
9 files changed, 69 insertions(+), 47 deletions(-)
---
Range-diff versus v1:
1: 82a4f950dd ! 1: 5821ddfbe2 refs: mark `ref_transaction_update_reflog()` as static
@@ refs.c: int ref_transaction_update(struct ref_transaction *transaction,
- const char *committer_info, unsigned int flags,
- const char *msg, unsigned int index,
- struct strbuf *err)
++/*
++ * Similar to`ref_transaction_update`, but this function is only for adding
++ * a reflog update. Supports providing custom committer information. The index
++ * field can be utiltized to order updates as desired. When not used, the
++ * updates default to being ordered by refname.
++ */
+static int ref_transaction_update_reflog(struct ref_transaction *transaction,
+ const char *refname,
+ const struct object_id *new_oid,
2: 51ed04e95d = 2: 8025cfadf2 refs: use 'uint64_t' for 'ref_update.index'
3: 34e3001e29 < -: ---------- reftable: prevent 'update_index' changes after header write
-: ---------- > 3: 4b0416ce20 reftable: prevent 'update_index' changes after adding records
---
base-commit: a5aa44e7930761cb900813d971b4105f901818fb
change-id: 20250117-461-corrupted-reftable-followup-eb0e4fd1a723
Thanks
- Karthik
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static
2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak
@ 2025-01-21 3:34 ` Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
` (2 subsequent siblings)
3 siblings, 0 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-21 3:34 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, Junio C Hamano
The `ref_transaction_update_reflog()` function is only used within
'refs.c', so mark it as static.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs.c | 22 +++++++++++++++-------
refs.h | 14 --------------
2 files changed, 15 insertions(+), 21 deletions(-)
diff --git a/refs.c b/refs.c
index f7b6f0f897eb58665e10a2efd3eb53c3f72abe61..ad6d774717150f1fe68a59c629e05e49a469693f 100644
--- a/refs.c
+++ b/refs.c
@@ -1318,13 +1318,21 @@ int ref_transaction_update(struct ref_transaction *transaction,
return 0;
}
-int ref_transaction_update_reflog(struct ref_transaction *transaction,
- const char *refname,
- const struct object_id *new_oid,
- const struct object_id *old_oid,
- const char *committer_info, unsigned int flags,
- const char *msg, unsigned int index,
- struct strbuf *err)
+/*
+ * Similar to`ref_transaction_update`, but this function is only for adding
+ * a reflog update. Supports providing custom committer information. The index
+ * field can be utiltized to order updates as desired. When not used, the
+ * updates default to being ordered by refname.
+ */
+static int ref_transaction_update_reflog(struct ref_transaction *transaction,
+ const char *refname,
+ const struct object_id *new_oid,
+ const struct object_id *old_oid,
+ const char *committer_info,
+ unsigned int flags,
+ const char *msg,
+ unsigned int index,
+ struct strbuf *err)
{
struct ref_update *update;
diff --git a/refs.h b/refs.h
index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644
--- a/refs.h
+++ b/refs.h
@@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction,
unsigned int flags, const char *msg,
struct strbuf *err);
-/*
- * Similar to`ref_transaction_update`, but this function is only for adding
- * a reflog update. Supports providing custom committer information. The index
- * field can be utiltized to order updates as desired. When not used, the
- * updates default to being ordered by refname.
- */
-int ref_transaction_update_reflog(struct ref_transaction *transaction,
- const char *refname,
- const struct object_id *new_oid,
- const struct object_id *old_oid,
- const char *committer_info, unsigned int flags,
- const char *msg, unsigned int index,
- struct strbuf *err);
-
/*
* Add a reference creation to transaction. new_oid is the value that
* the reference should have after the update; it must not be
--
2.47.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v2 2/3] refs: use 'uint64_t' for 'ref_update.index'
2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
@ 2025-01-21 3:34 ` Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak
3 siblings, 0 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-21 3:34 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, brian m. carlson
The 'ref_update.index' variable is used to store an index for a given
reference update. This index is used to order the updates in a
predetermined order, while the default ordering is alphabetical as per
the refname.
For large repositories with millions of references, it should be safer
to use 'uint64_t'. Let's do that. This also is applied for all other
code sections where we store 'index' and pass it around.
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs.c | 4 ++--
refs/refs-internal.h | 4 ++--
refs/reftable-backend.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/refs.c b/refs.c
index ad6d774717150f1fe68a59c629e05e49a469693f..ef04f403a6a3a34f9156b4cf68c3daa29c9cbad6 100644
--- a/refs.c
+++ b/refs.c
@@ -1331,7 +1331,7 @@ static int ref_transaction_update_reflog(struct ref_transaction *transaction,
const char *committer_info,
unsigned int flags,
const char *msg,
- unsigned int index,
+ uint64_t index,
struct strbuf *err)
{
struct ref_update *update;
@@ -2813,7 +2813,7 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
}
struct reflog_migration_data {
- unsigned int index;
+ uint64_t index;
const char *refname;
struct ref_store *old_refs;
struct ref_transaction *transaction;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96844755dfa600d37efdb25b30a0765..8894b43d1d1a327d404d3923c507d2d39649de19 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -120,7 +120,7 @@ struct ref_update {
* when migrating reflogs and we want to ensure we carry over the
* same order.
*/
- unsigned int index;
+ uint64_t index;
/*
* If this ref_update was split off of a symref update via
@@ -203,7 +203,7 @@ struct ref_transaction {
enum ref_transaction_state state;
void *backend_data;
unsigned int flags;
- unsigned int max_index;
+ uint64_t max_index;
};
/*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 289496058e6eb4d3e3aef96ca70219cd4ff78eae..6814c87bc618229ac8a70b904be3f850371ad876 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -942,7 +942,7 @@ struct write_transaction_table_arg {
size_t updates_nr;
size_t updates_alloc;
size_t updates_expected;
- unsigned int max_index;
+ uint64_t max_index;
};
struct reftable_transaction_data {
--
2.47.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records
2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
@ 2025-01-21 3:34 ` Karthik Nayak
2025-01-21 6:56 ` Patrick Steinhardt
2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak
3 siblings, 1 reply; 30+ messages in thread
From: Karthik Nayak @ 2025-01-21 3:34 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps
The function `reftable_writer_set_limits()` allows updating the
'min_update_index' and 'max_update_index' of a reftable writer. These
values are written to both the writer's header and footer.
Since the header is written during the first block write, any subsequent
changes to the update index would create a mismatch between the header
and footer values. The footer would contain the newer values while the
header retained the original ones.
To fix this bug, prevent callers from updating these values after any
record is written. To do this, modify the function to return an error
whenever the limits are modified after any record adds. Check for record
adds within `reftable_writer_set_limits()` by checking the `last_key`
variable, which is set whenever a new record is added.
Modify all callers of the function to anticipate a return type and
handle it accordingly.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs/reftable-backend.c | 20 +++++++++++++++-----
reftable/reftable-error.h | 1 +
reftable/reftable-writer.h | 24 ++++++++++++++----------
reftable/stack.c | 6 ++++--
reftable/writer.c | 13 +++++++++++--
t/unit-tests/t-reftable-stack.c | 8 +++++---
6 files changed, 50 insertions(+), 22 deletions(-)
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 6814c87bc618229ac8a70b904be3f850371ad876..9cfb0cb26721a9425c3b4a374f7b41e192037315 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1443,7 +1443,9 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
* multiple entries. Each entry will contain a different update_index,
* so set the limits accordingly.
*/
- reftable_writer_set_limits(writer, ts, ts + arg->max_index);
+ ret = reftable_writer_set_limits(writer, ts, ts + arg->max_index);
+ if (ret < 0)
+ goto done;
for (i = 0; i < arg->updates_nr; i++) {
struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1766,7 +1768,9 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data)
deletion_ts = creation_ts = reftable_stack_next_update_index(arg->be->stack);
if (arg->delete_old)
creation_ts++;
- reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+ ret = reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+ if (ret < 0)
+ goto done;
/*
* Add the new reference. If this is a rename then we also delete the
@@ -2298,7 +2302,9 @@ static int write_reflog_existence_table(struct reftable_writer *writer,
if (ret <= 0)
goto done;
- reftable_writer_set_limits(writer, ts, ts);
+ ret = reftable_writer_set_limits(writer, ts, ts);
+ if (ret < 0)
+ goto done;
/*
* The existence entry has both old and new object ID set to the
@@ -2357,7 +2363,9 @@ static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_da
uint64_t ts = reftable_stack_next_update_index(arg->stack);
int ret;
- reftable_writer_set_limits(writer, ts, ts);
+ ret = reftable_writer_set_limits(writer, ts, ts);
+ if (ret < 0)
+ goto out;
ret = reftable_stack_init_log_iterator(arg->stack, &it);
if (ret < 0)
@@ -2434,7 +2442,9 @@ static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_da
if (arg->records[i].value_type == REFTABLE_LOG_UPDATE)
live_records++;
- reftable_writer_set_limits(writer, ts, ts);
+ ret = reftable_writer_set_limits(writer, ts, ts);
+ if (ret < 0)
+ return ret;
if (!is_null_oid(&arg->update_oid)) {
struct reftable_ref_record ref = {0};
diff --git a/reftable/reftable-error.h b/reftable/reftable-error.h
index f4048265629fe456207b88620658193f770a84f0..a7e33d964d0cfe5546f588d26c0fcb66ab326828 100644
--- a/reftable/reftable-error.h
+++ b/reftable/reftable-error.h
@@ -30,6 +30,7 @@ enum reftable_error {
/* Misuse of the API:
* - on writing a record with NULL refname.
+ * - on writing a record before setting the writer limits.
* - on writing a reftable_ref_record outside the table limits
* - on writing a ref or log record before the stack's
* next_update_inde*x
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out,
int (*flush_func)(void *),
void *writer_arg, const struct reftable_write_options *opts);
-/* Set the range of update indices for the records we will add. When writing a
- table into a stack, the min should be at least
- reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
-
- For transactional updates to a stack, typically min==max, and the
- update_index can be obtained by inspeciting the stack. When converting an
- existing ref database into a single reftable, this would be a range of
- update-index timestamps.
+/*
+ * Set the range of update indices for the records we will add. When writing a
+ * table into a stack, the min should be at least
+ * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+ *
+ * For transactional updates to a stack, typically min==max, and the
+ * update_index can be obtained by inspeciting the stack. When converting an
+ * existing ref database into a single reftable, this would be a range of
+ * update-index timestamps.
+ *
+ * The function should be called before adding any records to the writer. If not
+ * it will fail with REFTABLE_API_ERROR.
*/
-void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
- uint64_t max);
+int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+ uint64_t max);
/*
Add a reftable_ref_record. The record should have names that come after
diff --git a/reftable/stack.c b/reftable/stack.c
index 531660a49f0948c33041831ee0d740feacb22b2f..9649dbbb04c51e106ee752f14481bbad381cb348 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1058,8 +1058,10 @@ static int stack_write_compact(struct reftable_stack *st,
for (size_t i = first; i <= last; i++)
st->stats.bytes += st->readers[i]->size;
- reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
- st->readers[last]->max_update_index);
+ err = reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
+ st->readers[last]->max_update_index);
+ if (err < 0)
+ goto done;
err = reftable_merged_table_new(&mt, st->readers + first, subtabs_len,
st->opts.hash_id);
diff --git a/reftable/writer.c b/reftable/writer.c
index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -179,11 +179,20 @@ int reftable_writer_new(struct reftable_writer **out,
return 0;
}
-void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
- uint64_t max)
+int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+ uint64_t max)
{
+ /*
+ * The limits should be set before any records are added to the writer.
+ * Check if any records were added by checking if `last_key` was set.
+ */
+ if (w->last_key.len)
+ return REFTABLE_API_ERROR;
+
w->min_update_index = min;
w->max_update_index = max;
+
+ return 0;
}
static void writer_release(struct reftable_writer *w)
diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644
--- a/t/unit-tests/t-reftable-stack.c
+++ b/t/unit-tests/t-reftable-stack.c
@@ -103,7 +103,8 @@ static void t_read_file(void)
static int write_test_ref(struct reftable_writer *wr, void *arg)
{
struct reftable_ref_record *ref = arg;
- reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+ check(!reftable_writer_set_limits(wr, ref->update_index,
+ ref->update_index));
return reftable_writer_add_ref(wr, ref);
}
@@ -143,7 +144,8 @@ static int write_test_log(struct reftable_writer *wr, void *arg)
{
struct write_log_arg *wla = arg;
- reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
+ check(!reftable_writer_set_limits(wr, wla->update_index,
+ wla->update_index));
return reftable_writer_add_log(wr, wla->log);
}
@@ -961,7 +963,7 @@ static void t_reflog_expire(void)
static int write_nothing(struct reftable_writer *wr, void *arg UNUSED)
{
- reftable_writer_set_limits(wr, 1, 1);
+ check(!reftable_writer_set_limits(wr, 1, 1));
return 0;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records
2025-01-21 3:34 ` [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
@ 2025-01-21 6:56 ` Patrick Steinhardt
2025-01-21 11:44 ` Karthik Nayak
0 siblings, 1 reply; 30+ messages in thread
From: Patrick Steinhardt @ 2025-01-21 6:56 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On Tue, Jan 21, 2025 at 04:34:12AM +0100, Karthik Nayak wrote:
> The function `reftable_writer_set_limits()` allows updating the
> 'min_update_index' and 'max_update_index' of a reftable writer. These
> values are written to both the writer's header and footer.
>
> Since the header is written during the first block write, any subsequent
> changes to the update index would create a mismatch between the header
> and footer values. The footer would contain the newer values while the
> header retained the original ones.
>
> To fix this bug, prevent callers from updating these values after any
Nit: it's not really fixing a bug, but protecting us against it. Not
worth a reroll though, from my point of view.
> diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
> index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644
> --- a/reftable/reftable-writer.h
> +++ b/reftable/reftable-writer.h
> @@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out,
> int (*flush_func)(void *),
> void *writer_arg, const struct reftable_write_options *opts);
>
> -/* Set the range of update indices for the records we will add. When writing a
> - table into a stack, the min should be at least
> - reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
> -
> - For transactional updates to a stack, typically min==max, and the
> - update_index can be obtained by inspeciting the stack. When converting an
> - existing ref database into a single reftable, this would be a range of
> - update-index timestamps.
> +/*
> + * Set the range of update indices for the records we will add. When writing a
> + * table into a stack, the min should be at least
> + * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
> + *
> + * For transactional updates to a stack, typically min==max, and the
> + * update_index can be obtained by inspeciting the stack. When converting an
> + * existing ref database into a single reftable, this would be a range of
> + * update-index timestamps.
> + *
> + * The function should be called before adding any records to the writer. If not
> + * it will fail with REFTABLE_API_ERROR.
> */
Thanks for updating this. I think the reftable library is one of those
code areas where it makes sense to sneak in a formatting fix every now
and then because its coding style is quite alien to Git's own in some
places. We could also do it all in one go, but I strongly doubt that it
would be worth the churn.
> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> - uint64_t max);
> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> + uint64_t max);
>
> /*
> Add a reftable_ref_record. The record should have names that come after
> diff --git a/reftable/writer.c b/reftable/writer.c
> index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644
> --- a/reftable/writer.c
> +++ b/reftable/writer.c
> @@ -179,11 +179,20 @@ int reftable_writer_new(struct reftable_writer **out,
> return 0;
> }
>
> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> - uint64_t max)
> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> + uint64_t max)
> {
> + /*
> + * The limits should be set before any records are added to the writer.
> + * Check if any records were added by checking if `last_key` was set.
> + */
> + if (w->last_key.len)
> + return REFTABLE_API_ERROR;
Hm. Using the last key feels somewhat dangerous to me as it does get
reset at times, e.g. when finishing writing the current section. It
_should_ work, but overall it just feels a tad to disconnected from the
thing that we actually want to check.
How about we instead use `next`? This variable records the offset of the
next block we're about to write, and `writer_flush_nonempty_block()`
uses it directly to check whether we're currently writing the first
block in order to decide whether it needs to write a header or not. If
it's 0, we know that we haven't written the first block yet. That feels
much closer aligned with what we're checking.
> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
> index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644
> --- a/t/unit-tests/t-reftable-stack.c
> +++ b/t/unit-tests/t-reftable-stack.c
Can we maybe add a unit test that demonstrates the error?
Patrick
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records
2025-01-21 6:56 ` Patrick Steinhardt
@ 2025-01-21 11:44 ` Karthik Nayak
0 siblings, 0 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-21 11:44 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 5430 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> On Tue, Jan 21, 2025 at 04:34:12AM +0100, Karthik Nayak wrote:
>> The function `reftable_writer_set_limits()` allows updating the
>> 'min_update_index' and 'max_update_index' of a reftable writer. These
>> values are written to both the writer's header and footer.
>>
>> Since the header is written during the first block write, any subsequent
>> changes to the update index would create a mismatch between the header
>> and footer values. The footer would contain the newer values while the
>> header retained the original ones.
>>
>> To fix this bug, prevent callers from updating these values after any
>
> Nit: it's not really fixing a bug, but protecting us against it. Not
> worth a reroll though, from my point of view.
>
That's right, I'll add that in.
>> diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
>> index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644
>> --- a/reftable/reftable-writer.h
>> +++ b/reftable/reftable-writer.h
>> @@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out,
>> int (*flush_func)(void *),
>> void *writer_arg, const struct reftable_write_options *opts);
>>
>> -/* Set the range of update indices for the records we will add. When writing a
>> - table into a stack, the min should be at least
>> - reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
>> -
>> - For transactional updates to a stack, typically min==max, and the
>> - update_index can be obtained by inspeciting the stack. When converting an
>> - existing ref database into a single reftable, this would be a range of
>> - update-index timestamps.
>> +/*
>> + * Set the range of update indices for the records we will add. When writing a
>> + * table into a stack, the min should be at least
>> + * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
>> + *
>> + * For transactional updates to a stack, typically min==max, and the
>> + * update_index can be obtained by inspeciting the stack. When converting an
>> + * existing ref database into a single reftable, this would be a range of
>> + * update-index timestamps.
>> + *
>> + * The function should be called before adding any records to the writer. If not
>> + * it will fail with REFTABLE_API_ERROR.
>> */
>
> Thanks for updating this. I think the reftable library is one of those
> code areas where it makes sense to sneak in a formatting fix every now
> and then because its coding style is quite alien to Git's own in some
> places. We could also do it all in one go, but I strongly doubt that it
> would be worth the churn.
>
Generally I try to sneak in small fixes like this around code being
touched. I know it is a little more toll on reviewers, but small
improvements do add up.
>> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>> - uint64_t max);
>> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>> + uint64_t max);
>>
>> /*
>> Add a reftable_ref_record. The record should have names that come after
>
>> diff --git a/reftable/writer.c b/reftable/writer.c
>> index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644
>> --- a/reftable/writer.c
>> +++ b/reftable/writer.c
>> @@ -179,11 +179,20 @@ int reftable_writer_new(struct reftable_writer **out,
>> return 0;
>> }
>>
>> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>> - uint64_t max)
>> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>> + uint64_t max)
>> {
>> + /*
>> + * The limits should be set before any records are added to the writer.
>> + * Check if any records were added by checking if `last_key` was set.
>> + */
>> + if (w->last_key.len)
>> + return REFTABLE_API_ERROR;
>
> Hm. Using the last key feels somewhat dangerous to me as it does get
> reset at times, e.g. when finishing writing the current section. It
> _should_ work, but overall it just feels a tad to disconnected from the
> thing that we actually want to check.
>
> How about we instead use `next`? This variable records the offset of the
> next block we're about to write, and `writer_flush_nonempty_block()`
> uses it directly to check whether we're currently writing the first
> block in order to decide whether it needs to write a header or not. If
> it's 0, we know that we haven't written the first block yet. That feels
> much closer aligned with what we're checking.
>
The last version did use `next`. I changed it because `next` is only
modified once the first block has been written. This would still allow
limit modification post writing of first few records.
This should be okay however since we're concerned about header <> footer
mismatch. But from an ideological point, it makes sense to only allow
limit modification before _any_ records have been written.
I'm thinking if we should use both `if (w->next || w->last_key.len)`.
This way we capture all modifications.
>> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
>> index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644
>> --- a/t/unit-tests/t-reftable-stack.c
>> +++ b/t/unit-tests/t-reftable-stack.c
>
> Can we maybe add a unit test that demonstrates the error?
Good suggestion, will add it!
> Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 0/3] refs: small followups to the migration corruption fix
2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak
` (2 preceding siblings ...)
2025-01-21 3:34 ` [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
@ 2025-01-22 5:35 ` Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
` (2 more replies)
3 siblings, 3 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-22 5:35 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, Patrick Steinhardt, Junio C Hamano,
brian m. carlson
This is a follow up to the bug that was reported [1] around `git refs
migrate --ref-format=reftable` where the migration would fail for
repositories with reflogs with lots of entries. This was caused due to a
mismatch in the reftable's header and footer, specifically WRT the
'max_update_index'.
While there was a fix posted. This series is a small followup to fix
some of the topics discussed there:
1. To mark `ref_transaction_update_reflog()` as static since it is only
used internally within 'refs.c'.
2. To change the type of 'max_index' from 'unsigned int' to 'uint64_t'.
This would be much safer for large repositories with millions of files
and on 32bit systems.
3. To add a safeguard to prevent 'update_index' changes post any record
addition. This is a preventive measure to ensure such bugs don't arise
in the future.
This is based on top of master 757161efcc (Sync with Git 2.48.1,
2025-01-13) with 'kn/reflog-migration-fix' merged in.
[1]: https://lore.kernel.org/r/Z4UbkcmJAU1MT-Rs@tapette.crustytoothpaste.net
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Changes in v3:
- Commit 3: Modify the commit message to make it clearer that this is a
preventive measure. Modify the condition to check for both `last_key`
and `next`. Also add a unit test.
- Link to v2: https://lore.kernel.org/r/20250121-461-corrupted-reftable-followup-v2-0-37e26c7a79b4@gmail.com
Changes in v2:
- Commit 1: Keep the function comments.
- Commit 3: Modify the requirement to be for any record writes instead
of the first block write. This ensures that the limits need to be set
before any records being added. Instead of using `BUG()`, return
`REFTABLE_API_ERROR`. Handle all callers as needed.
- Link to v1: https://lore.kernel.org/r/20250117-461-corrupted-reftable-followup-v1-0-70ee605ae3fe@gmail.com
---
Karthik Nayak (3):
refs: mark `ref_transaction_update_reflog()` as static
refs: use 'uint64_t' for 'ref_update.index'
reftable: prevent 'update_index' changes after adding records
refs.c | 24 ++++++++++++------
refs.h | 14 -----------
refs/refs-internal.h | 4 +--
refs/reftable-backend.c | 22 ++++++++++++-----
reftable/reftable-error.h | 1 +
reftable/reftable-writer.h | 24 ++++++++++--------
reftable/stack.c | 6 +++--
reftable/writer.c | 15 +++++++++++-
t/unit-tests/t-reftable-stack.c | 54 ++++++++++++++++++++++++++++++++++++++---
9 files changed, 118 insertions(+), 46 deletions(-)
---
Range-diff versus v2:
1: 76e6da564a = 1: cafede9b10 refs: mark `ref_transaction_update_reflog()` as static
2: f8e2e81eb4 = 2: 737360d883 refs: use 'uint64_t' for 'ref_update.index'
3: ca92e29ecc ! 3: 01bf1d765f reftable: prevent 'update_index' changes after adding records
@@ Commit message
and footer values. The footer would contain the newer values while the
header retained the original ones.
- To fix this bug, prevent callers from updating these values after any
- record is written. To do this, modify the function to return an error
- whenever the limits are modified after any record adds. Check for record
- adds within `reftable_writer_set_limits()` by checking the `last_key`
- variable, which is set whenever a new record is added.
+ To protect against this bug, prevent callers from updating these values
+ after any record is written. To do this, modify the function to return
+ an error whenever the limits are modified after any record adds. Check
+ for record adds within `reftable_writer_set_limits()` by checking the
+ `last_key` and `next` variable. The former is updated after each record
+ added, but is reset at certain points. The latter is set after writing
+ the first block.
Modify all callers of the function to anticipate a return type and
- handle it accordingly.
+ handle it accordingly. Add a unit test to also ensure the function
+ returns the error as expected.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
@@ reftable/writer.c: int reftable_writer_new(struct reftable_writer **out,
}
-void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
-- uint64_t max)
+int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
-+ uint64_t max)
+ uint64_t max)
{
+ /*
-+ * The limits should be set before any records are added to the writer.
-+ * Check if any records were added by checking if `last_key` was set.
++ * Set the min/max update index limits for the reftable writer.
++ * This must be called before adding any records, since:
++ * - The 'next' field gets set after writing the first block.
++ * - The 'last_key' field updates with each new record (but resets
++ * after sections).
++ * Returns REFTABLE_API_ERROR if called after writing has begun.
+ */
-+ if (w->last_key.len)
++ if (w->next || w->last_key.len)
+ return REFTABLE_API_ERROR;
+
w->min_update_index = min;
@@ t/unit-tests/t-reftable-stack.c: static void t_reflog_expire(void)
return 0;
}
+@@ t/unit-tests/t-reftable-stack.c: static void t_reftable_stack_reload_with_missing_table(void)
+ clear_dir(dir);
+ }
+
++static int write_limits_after_ref(struct reftable_writer *wr, void *arg)
++{
++ struct reftable_ref_record *ref = arg;
++ check(!reftable_writer_set_limits(wr, ref->update_index, ref->update_index));
++ check(!reftable_writer_add_ref(wr, ref));
++ return reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
++}
++
++static void t_reftable_invalid_limit_updates(void)
++{
++ struct reftable_ref_record ref = {
++ .refname = (char *) "HEAD",
++ .update_index = 1,
++ .value_type = REFTABLE_REF_SYMREF,
++ .value.symref = (char *) "master",
++ };
++ struct reftable_write_options opts = {
++ .default_permissions = 0660,
++ };
++ struct reftable_addition *add = NULL;
++ char *dir = get_tmp_dir(__LINE__);
++ struct reftable_stack *st = NULL;
++ int err;
++
++ err = reftable_new_stack(&st, dir, &opts);
++ check(!err);
++
++ reftable_addition_destroy(add);
++
++ err = reftable_stack_new_addition(&add, st, 0);
++ check(!err);
++
++ /*
++ * write_limits_after_ref also updates the update indexes after adding
++ * the record. This should cause an err to be returned, since the limits
++ * must be set at the start.
++ */
++ err = reftable_addition_add(add, write_limits_after_ref, &ref);
++ check_int(err, ==, REFTABLE_API_ERROR);
++
++ reftable_addition_destroy(add);
++ reftable_stack_destroy(st);
++ clear_dir(dir);
++}
++
+ int cmd_main(int argc UNUSED, const char *argv[] UNUSED)
+ {
+ TEST(t_empty_add(), "empty addition to stack");
+ TEST(t_read_file(), "read_lines works");
+ TEST(t_reflog_expire(), "expire reflog entries");
++ TEST(t_reftable_invalid_limit_updates(), "prevent limit updates after adding records");
+ TEST(t_reftable_stack_add(), "add multiple refs and logs to stack");
+ TEST(t_reftable_stack_add_one(), "add a single ref record to stack");
+ TEST(t_reftable_stack_add_performs_auto_compaction(), "addition to stack triggers auto-compaction");
---
base-commit: a5aa44e7930761cb900813d971b4105f901818fb
change-id: 20250117-461-corrupted-reftable-followup-eb0e4fd1a723
Thanks
- Karthik
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static
2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak
@ 2025-01-22 5:35 ` Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
2 siblings, 0 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-22 5:35 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, Patrick Steinhardt, Junio C Hamano
The `ref_transaction_update_reflog()` function is only used within
'refs.c', so mark it as static.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs.c | 22 +++++++++++++++-------
refs.h | 14 --------------
2 files changed, 15 insertions(+), 21 deletions(-)
diff --git a/refs.c b/refs.c
index f7b6f0f897eb58665e10a2efd3eb53c3f72abe61..ad6d774717150f1fe68a59c629e05e49a469693f 100644
--- a/refs.c
+++ b/refs.c
@@ -1318,13 +1318,21 @@ int ref_transaction_update(struct ref_transaction *transaction,
return 0;
}
-int ref_transaction_update_reflog(struct ref_transaction *transaction,
- const char *refname,
- const struct object_id *new_oid,
- const struct object_id *old_oid,
- const char *committer_info, unsigned int flags,
- const char *msg, unsigned int index,
- struct strbuf *err)
+/*
+ * Similar to`ref_transaction_update`, but this function is only for adding
+ * a reflog update. Supports providing custom committer information. The index
+ * field can be utiltized to order updates as desired. When not used, the
+ * updates default to being ordered by refname.
+ */
+static int ref_transaction_update_reflog(struct ref_transaction *transaction,
+ const char *refname,
+ const struct object_id *new_oid,
+ const struct object_id *old_oid,
+ const char *committer_info,
+ unsigned int flags,
+ const char *msg,
+ unsigned int index,
+ struct strbuf *err)
{
struct ref_update *update;
diff --git a/refs.h b/refs.h
index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644
--- a/refs.h
+++ b/refs.h
@@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction,
unsigned int flags, const char *msg,
struct strbuf *err);
-/*
- * Similar to`ref_transaction_update`, but this function is only for adding
- * a reflog update. Supports providing custom committer information. The index
- * field can be utiltized to order updates as desired. When not used, the
- * updates default to being ordered by refname.
- */
-int ref_transaction_update_reflog(struct ref_transaction *transaction,
- const char *refname,
- const struct object_id *new_oid,
- const struct object_id *old_oid,
- const char *committer_info, unsigned int flags,
- const char *msg, unsigned int index,
- struct strbuf *err);
-
/*
* Add a reference creation to transaction. new_oid is the value that
* the reference should have after the update; it must not be
--
2.47.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 2/3] refs: use 'uint64_t' for 'ref_update.index'
2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
@ 2025-01-22 5:35 ` Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
2 siblings, 0 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-22 5:35 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, Patrick Steinhardt, brian m. carlson
The 'ref_update.index' variable is used to store an index for a given
reference update. This index is used to order the updates in a
predetermined order, while the default ordering is alphabetical as per
the refname.
For large repositories with millions of references, it should be safer
to use 'uint64_t'. Let's do that. This also is applied for all other
code sections where we store 'index' and pass it around.
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs.c | 4 ++--
refs/refs-internal.h | 4 ++--
refs/reftable-backend.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/refs.c b/refs.c
index ad6d774717150f1fe68a59c629e05e49a469693f..ef04f403a6a3a34f9156b4cf68c3daa29c9cbad6 100644
--- a/refs.c
+++ b/refs.c
@@ -1331,7 +1331,7 @@ static int ref_transaction_update_reflog(struct ref_transaction *transaction,
const char *committer_info,
unsigned int flags,
const char *msg,
- unsigned int index,
+ uint64_t index,
struct strbuf *err)
{
struct ref_update *update;
@@ -2813,7 +2813,7 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
}
struct reflog_migration_data {
- unsigned int index;
+ uint64_t index;
const char *refname;
struct ref_store *old_refs;
struct ref_transaction *transaction;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96844755dfa600d37efdb25b30a0765..8894b43d1d1a327d404d3923c507d2d39649de19 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -120,7 +120,7 @@ struct ref_update {
* when migrating reflogs and we want to ensure we carry over the
* same order.
*/
- unsigned int index;
+ uint64_t index;
/*
* If this ref_update was split off of a symref update via
@@ -203,7 +203,7 @@ struct ref_transaction {
enum ref_transaction_state state;
void *backend_data;
unsigned int flags;
- unsigned int max_index;
+ uint64_t max_index;
};
/*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 289496058e6eb4d3e3aef96ca70219cd4ff78eae..6814c87bc618229ac8a70b904be3f850371ad876 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -942,7 +942,7 @@ struct write_transaction_table_arg {
size_t updates_nr;
size_t updates_alloc;
size_t updates_expected;
- unsigned int max_index;
+ uint64_t max_index;
};
struct reftable_transaction_data {
--
2.47.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
@ 2025-01-22 5:35 ` Karthik Nayak
2025-01-22 12:12 ` Patrick Steinhardt
2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King
2 siblings, 2 replies; 30+ messages in thread
From: Karthik Nayak @ 2025-01-22 5:35 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, Patrick Steinhardt
The function `reftable_writer_set_limits()` allows updating the
'min_update_index' and 'max_update_index' of a reftable writer. These
values are written to both the writer's header and footer.
Since the header is written during the first block write, any subsequent
changes to the update index would create a mismatch between the header
and footer values. The footer would contain the newer values while the
header retained the original ones.
To protect against this bug, prevent callers from updating these values
after any record is written. To do this, modify the function to return
an error whenever the limits are modified after any record adds. Check
for record adds within `reftable_writer_set_limits()` by checking the
`last_key` and `next` variable. The former is updated after each record
added, but is reset at certain points. The latter is set after writing
the first block.
Modify all callers of the function to anticipate a return type and
handle it accordingly. Add a unit test to also ensure the function
returns the error as expected.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs/reftable-backend.c | 20 +++++++++++----
reftable/reftable-error.h | 1 +
reftable/reftable-writer.h | 24 ++++++++++--------
reftable/stack.c | 6 +++--
reftable/writer.c | 15 +++++++++++-
t/unit-tests/t-reftable-stack.c | 54 ++++++++++++++++++++++++++++++++++++++---
6 files changed, 99 insertions(+), 21 deletions(-)
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 6814c87bc618229ac8a70b904be3f850371ad876..9cfb0cb26721a9425c3b4a374f7b41e192037315 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1443,7 +1443,9 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
* multiple entries. Each entry will contain a different update_index,
* so set the limits accordingly.
*/
- reftable_writer_set_limits(writer, ts, ts + arg->max_index);
+ ret = reftable_writer_set_limits(writer, ts, ts + arg->max_index);
+ if (ret < 0)
+ goto done;
for (i = 0; i < arg->updates_nr; i++) {
struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1766,7 +1768,9 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data)
deletion_ts = creation_ts = reftable_stack_next_update_index(arg->be->stack);
if (arg->delete_old)
creation_ts++;
- reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+ ret = reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+ if (ret < 0)
+ goto done;
/*
* Add the new reference. If this is a rename then we also delete the
@@ -2298,7 +2302,9 @@ static int write_reflog_existence_table(struct reftable_writer *writer,
if (ret <= 0)
goto done;
- reftable_writer_set_limits(writer, ts, ts);
+ ret = reftable_writer_set_limits(writer, ts, ts);
+ if (ret < 0)
+ goto done;
/*
* The existence entry has both old and new object ID set to the
@@ -2357,7 +2363,9 @@ static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_da
uint64_t ts = reftable_stack_next_update_index(arg->stack);
int ret;
- reftable_writer_set_limits(writer, ts, ts);
+ ret = reftable_writer_set_limits(writer, ts, ts);
+ if (ret < 0)
+ goto out;
ret = reftable_stack_init_log_iterator(arg->stack, &it);
if (ret < 0)
@@ -2434,7 +2442,9 @@ static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_da
if (arg->records[i].value_type == REFTABLE_LOG_UPDATE)
live_records++;
- reftable_writer_set_limits(writer, ts, ts);
+ ret = reftable_writer_set_limits(writer, ts, ts);
+ if (ret < 0)
+ return ret;
if (!is_null_oid(&arg->update_oid)) {
struct reftable_ref_record ref = {0};
diff --git a/reftable/reftable-error.h b/reftable/reftable-error.h
index f4048265629fe456207b88620658193f770a84f0..a7e33d964d0cfe5546f588d26c0fcb66ab326828 100644
--- a/reftable/reftable-error.h
+++ b/reftable/reftable-error.h
@@ -30,6 +30,7 @@ enum reftable_error {
/* Misuse of the API:
* - on writing a record with NULL refname.
+ * - on writing a record before setting the writer limits.
* - on writing a reftable_ref_record outside the table limits
* - on writing a ref or log record before the stack's
* next_update_inde*x
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out,
int (*flush_func)(void *),
void *writer_arg, const struct reftable_write_options *opts);
-/* Set the range of update indices for the records we will add. When writing a
- table into a stack, the min should be at least
- reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
-
- For transactional updates to a stack, typically min==max, and the
- update_index can be obtained by inspeciting the stack. When converting an
- existing ref database into a single reftable, this would be a range of
- update-index timestamps.
+/*
+ * Set the range of update indices for the records we will add. When writing a
+ * table into a stack, the min should be at least
+ * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+ *
+ * For transactional updates to a stack, typically min==max, and the
+ * update_index can be obtained by inspeciting the stack. When converting an
+ * existing ref database into a single reftable, this would be a range of
+ * update-index timestamps.
+ *
+ * The function should be called before adding any records to the writer. If not
+ * it will fail with REFTABLE_API_ERROR.
*/
-void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
- uint64_t max);
+int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+ uint64_t max);
/*
Add a reftable_ref_record. The record should have names that come after
diff --git a/reftable/stack.c b/reftable/stack.c
index 531660a49f0948c33041831ee0d740feacb22b2f..9649dbbb04c51e106ee752f14481bbad381cb348 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1058,8 +1058,10 @@ static int stack_write_compact(struct reftable_stack *st,
for (size_t i = first; i <= last; i++)
st->stats.bytes += st->readers[i]->size;
- reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
- st->readers[last]->max_update_index);
+ err = reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
+ st->readers[last]->max_update_index);
+ if (err < 0)
+ goto done;
err = reftable_merged_table_new(&mt, st->readers + first, subtabs_len,
st->opts.hash_id);
diff --git a/reftable/writer.c b/reftable/writer.c
index 740c98038eaf883258bef4988f78977ac7e4a75a..76e24018172fc1d80a6535698757979e53b0e213 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -179,11 +179,24 @@ int reftable_writer_new(struct reftable_writer **out,
return 0;
}
-void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
uint64_t max)
{
+ /*
+ * Set the min/max update index limits for the reftable writer.
+ * This must be called before adding any records, since:
+ * - The 'next' field gets set after writing the first block.
+ * - The 'last_key' field updates with each new record (but resets
+ * after sections).
+ * Returns REFTABLE_API_ERROR if called after writing has begun.
+ */
+ if (w->next || w->last_key.len)
+ return REFTABLE_API_ERROR;
+
w->min_update_index = min;
w->max_update_index = max;
+
+ return 0;
}
static void writer_release(struct reftable_writer *w)
diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
index aeec195b2b1014445d71c5db39a9795017fd8ff2..c3f0059c346edbe1ad543c9832959c6fc0aa9180 100644
--- a/t/unit-tests/t-reftable-stack.c
+++ b/t/unit-tests/t-reftable-stack.c
@@ -103,7 +103,8 @@ static void t_read_file(void)
static int write_test_ref(struct reftable_writer *wr, void *arg)
{
struct reftable_ref_record *ref = arg;
- reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+ check(!reftable_writer_set_limits(wr, ref->update_index,
+ ref->update_index));
return reftable_writer_add_ref(wr, ref);
}
@@ -143,7 +144,8 @@ static int write_test_log(struct reftable_writer *wr, void *arg)
{
struct write_log_arg *wla = arg;
- reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
+ check(!reftable_writer_set_limits(wr, wla->update_index,
+ wla->update_index));
return reftable_writer_add_log(wr, wla->log);
}
@@ -961,7 +963,7 @@ static void t_reflog_expire(void)
static int write_nothing(struct reftable_writer *wr, void *arg UNUSED)
{
- reftable_writer_set_limits(wr, 1, 1);
+ check(!reftable_writer_set_limits(wr, 1, 1));
return 0;
}
@@ -1369,11 +1371,57 @@ static void t_reftable_stack_reload_with_missing_table(void)
clear_dir(dir);
}
+static int write_limits_after_ref(struct reftable_writer *wr, void *arg)
+{
+ struct reftable_ref_record *ref = arg;
+ check(!reftable_writer_set_limits(wr, ref->update_index, ref->update_index));
+ check(!reftable_writer_add_ref(wr, ref));
+ return reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+}
+
+static void t_reftable_invalid_limit_updates(void)
+{
+ struct reftable_ref_record ref = {
+ .refname = (char *) "HEAD",
+ .update_index = 1,
+ .value_type = REFTABLE_REF_SYMREF,
+ .value.symref = (char *) "master",
+ };
+ struct reftable_write_options opts = {
+ .default_permissions = 0660,
+ };
+ struct reftable_addition *add = NULL;
+ char *dir = get_tmp_dir(__LINE__);
+ struct reftable_stack *st = NULL;
+ int err;
+
+ err = reftable_new_stack(&st, dir, &opts);
+ check(!err);
+
+ reftable_addition_destroy(add);
+
+ err = reftable_stack_new_addition(&add, st, 0);
+ check(!err);
+
+ /*
+ * write_limits_after_ref also updates the update indexes after adding
+ * the record. This should cause an err to be returned, since the limits
+ * must be set at the start.
+ */
+ err = reftable_addition_add(add, write_limits_after_ref, &ref);
+ check_int(err, ==, REFTABLE_API_ERROR);
+
+ reftable_addition_destroy(add);
+ reftable_stack_destroy(st);
+ clear_dir(dir);
+}
+
int cmd_main(int argc UNUSED, const char *argv[] UNUSED)
{
TEST(t_empty_add(), "empty addition to stack");
TEST(t_read_file(), "read_lines works");
TEST(t_reflog_expire(), "expire reflog entries");
+ TEST(t_reftable_invalid_limit_updates(), "prevent limit updates after adding records");
TEST(t_reftable_stack_add(), "add multiple refs and logs to stack");
TEST(t_reftable_stack_add_one(), "add a single ref record to stack");
TEST(t_reftable_stack_add_performs_auto_compaction(), "addition to stack triggers auto-compaction");
--
2.47.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
@ 2025-01-22 12:12 ` Patrick Steinhardt
2025-01-22 17:50 ` Junio C Hamano
2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King
1 sibling, 1 reply; 30+ messages in thread
From: Patrick Steinhardt @ 2025-01-22 12:12 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote:
> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
> index aeec195b2b1014445d71c5db39a9795017fd8ff2..c3f0059c346edbe1ad543c9832959c6fc0aa9180 100644
> --- a/t/unit-tests/t-reftable-stack.c
> +++ b/t/unit-tests/t-reftable-stack.c
> @@ -1369,11 +1371,57 @@ static void t_reftable_stack_reload_with_missing_table(void)
> clear_dir(dir);
> }
>
> +static int write_limits_after_ref(struct reftable_writer *wr, void *arg)
> +{
> + struct reftable_ref_record *ref = arg;
> + check(!reftable_writer_set_limits(wr, ref->update_index, ref->update_index));
> + check(!reftable_writer_add_ref(wr, ref));
> + return reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
> +}
Nice.
> +static void t_reftable_invalid_limit_updates(void)
> +{
> + struct reftable_ref_record ref = {
> + .refname = (char *) "HEAD",
> + .update_index = 1,
> + .value_type = REFTABLE_REF_SYMREF,
> + .value.symref = (char *) "master",
> + };
> + struct reftable_write_options opts = {
> + .default_permissions = 0660,
> + };
Nit: it's unnecessary to pass write options. Other than that the test
looks good to me, and this nit isn't worth a reroll.
Thanks for working on this!
Patrick
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-01-22 12:12 ` Patrick Steinhardt
@ 2025-01-22 17:50 ` Junio C Hamano
2025-01-22 21:57 ` Junio C Hamano
0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2025-01-22 17:50 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Karthik Nayak, git
Patrick Steinhardt <ps@pks.im> writes:
>> +static void t_reftable_invalid_limit_updates(void)
>> +{
>> + struct reftable_ref_record ref = {
>> + .refname = (char *) "HEAD",
>> + .update_index = 1,
>> + .value_type = REFTABLE_REF_SYMREF,
>> + .value.symref = (char *) "master",
>> + };
>> + struct reftable_write_options opts = {
>> + .default_permissions = 0660,
>> + };
>
> Nit: it's unnecessary to pass write options. Other than that the test
> looks good to me, and this nit isn't worth a reroll.
This write_options opts is used later in this call.
err = reftable_new_stack(&st, dir, &opts);
and "git grep reftable_new_stack" finds many hits, almost all in
t/unit-tests/ hierarchy, only two among many of them passing NULL as
the "use the default set of options" signal. And majority of them
initialize their opts like so:
struct reftable_write_options opts = { 0 };
So I agree that this one should pass NULL to be more explicit that
we do not exercise any special features from the API, but so should
many other existing callers that pass such meaningless &opts, I
would think.
Thanks.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-01-22 17:50 ` Junio C Hamano
@ 2025-01-22 21:57 ` Junio C Hamano
0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2025-01-22 21:57 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Karthik Nayak, git
Junio C Hamano <gitster@pobox.com> writes:
>> Nit: it's unnecessary to pass write options. Other than that the test
>> looks good to me, and this nit isn't worth a reroll.
>
> This write_options opts is used later in this call.
>
> err = reftable_new_stack(&st, dir, &opts);
>
> and "git grep reftable_new_stack" finds many hits, almost all in
> t/unit-tests/ hierarchy, only two among many of them passing NULL as
> the "use the default set of options" signal. And majority of them
> initialize their opts like so:
>
> struct reftable_write_options opts = { 0 };
>
> So I agree that this one should pass NULL to be more explicit that
> we do not exercise any special features from the API, but so should
> many other existing callers that pass such meaningless &opts, I
> would think.
I may have been vague, but what I meant was that cleaning it up to
pass NULL when &opts is unnecessary can and probably should be left
outside the scope of this miniseries, and done with a clean-up patch
that is separate.
Thanks.
^ permalink raw reply [flat|nested] 30+ messages in thread
* undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
2025-01-22 12:12 ` Patrick Steinhardt
@ 2025-02-01 2:24 ` Jeff King
2025-02-01 10:33 ` Phillip Wood
` (2 more replies)
1 sibling, 3 replies; 30+ messages in thread
From: Jeff King @ 2025-02-01 2:24 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, Patrick Steinhardt
On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote:
> +static void t_reftable_invalid_limit_updates(void)
> +{
> + struct reftable_ref_record ref = {
> + .refname = (char *) "HEAD",
> + .update_index = 1,
> + .value_type = REFTABLE_REF_SYMREF,
> + .value.symref = (char *) "master",
> + };
> + struct reftable_write_options opts = {
> + .default_permissions = 0660,
> + };
> + struct reftable_addition *add = NULL;
> + char *dir = get_tmp_dir(__LINE__);
> + struct reftable_stack *st = NULL;
> + int err;
> +
> + err = reftable_new_stack(&st, dir, &opts);
> + check(!err);
> +
> + reftable_addition_destroy(add);
> +
> + err = reftable_stack_new_addition(&add, st, 0);
> + check(!err);
Coverity complains that this function may have undefined behavior. It's
an issue we have in a lot of other tests that have moved to the
unit-test framework. I've mostly been ignoring it, but this is a pretty
straight-forward example, so I thought I'd write a note.
The issue is that reftable_new_stack() might fail, leaving "st" as NULL.
And then we feed it to reftable_stack_new_addition(), which dereferences
it.
In normal production code, we'd expect something like:
if (err)
return -1;
to avoid running the rest of the function after the first error. But the
test harness check() function doesn't return. It just complains to
stdout and keeps running! So you'll get something like[1]:
$ t/unit-tests/bin/t-reftable-stack
ok 1 - empty addition to stack
ok 2 - read_lines works
ok 3 - expire reflog entries
# check "!err" failed at t/unit-tests/t-reftable-stack.c:1404
Segmentation fault
So...yes, we will probably notice that the test failed from the exit
code. But it's not great when the harness itself barfs so had. Plus a
compiler may be free to reorder things in a confusing way if it can see
that "st" must never be NULL.
It feels like we probably ought to return as soon as a check() fails.
That does create other headaches, though. E.g., we'd potentially leak
from an early return (which our LSan builds will complain about),
meaning that test code needs to start doing the usual "goto out" type of
cleanup.
So I dunno. Maybe we just live with it. But it feels pretty ugly.
-Peff
[1] This would happen in practice if malloc() failed, but you can
simulate it yourself like this, which is what I used to create the
output above:
diff --git a/reftable/stack.c b/reftable/stack.c
index 026a9f9742..fe77132102 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -861,6 +861,11 @@ int reftable_stack_new_addition(struct reftable_addition **dest,
int err = 0;
struct reftable_addition empty = REFTABLE_ADDITION_INIT;
+ if (flags & (1 << 16)) {
+ *dest = NULL;
+ return REFTABLE_OUT_OF_MEMORY_ERROR;
+ }
+
REFTABLE_CALLOC_ARRAY(*dest, 1);
if (!*dest)
return REFTABLE_OUT_OF_MEMORY_ERROR;
diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
index c3f0059c34..73ed9792a5 100644
--- a/t/unit-tests/t-reftable-stack.c
+++ b/t/unit-tests/t-reftable-stack.c
@@ -1400,7 +1400,7 @@ static void t_reftable_invalid_limit_updates(void)
reftable_addition_destroy(add);
- err = reftable_stack_new_addition(&add, st, 0);
+ err = reftable_stack_new_addition(&add, st, (1 << 16));
check(!err);
/*
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King
@ 2025-02-01 10:33 ` Phillip Wood
2025-02-03 5:41 ` Patrick Steinhardt
2025-02-03 15:37 ` Jeff King
2025-02-03 5:40 ` Patrick Steinhardt
2025-02-03 15:20 ` Karthik Nayak
2 siblings, 2 replies; 30+ messages in thread
From: Phillip Wood @ 2025-02-01 10:33 UTC (permalink / raw)
To: Jeff King, Karthik Nayak; +Cc: git, Patrick Steinhardt
Hi Peff
On 01/02/2025 02:24, Jeff King wrote:
> On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote:
>
> Coverity complains that this function may have undefined behavior. It's
> an issue we have in a lot of other tests that have moved to the
> unit-test framework. I've mostly been ignoring it, but this is a pretty
> straight-forward example, so I thought I'd write a note.
>
> The issue is that reftable_new_stack() might fail, leaving "st" as NULL.
> And then we feed it to reftable_stack_new_addition(), which dereferences
> it.
>
> In normal production code, we'd expect something like:
>
> if (err)
> return -1;
>
> to avoid running the rest of the function after the first error. But the
> test harness check() function doesn't return. It just complains to
> stdout and keeps running!
That is to allow the test to add more context with test_msg() or do
things like check all the members of a struct before returning. It is a
bug in the test if it does not return after finding a NULL pointer, the
correct usage is
if (!check(ptr))
return;
As we're in the process of switching to using clar which does exit the
text function if a check fails (that means there may be leaks on failure
but if the test is failing then I don't think we should be worrying
about leaks) I don't know if it is worth fixing these or not. I guess it
depends if there are the list of targets for Seyi's Outreachy project.
Best Wishes
Phillip
So you'll get something like[1]:
>
> $ t/unit-tests/bin/t-reftable-stack
> ok 1 - empty addition to stack
> ok 2 - read_lines works
> ok 3 - expire reflog entries
> # check "!err" failed at t/unit-tests/t-reftable-stack.c:1404
> Segmentation fault
>
> So...yes, we will probably notice that the test failed from the exit
> code. But it's not great when the harness itself barfs so had. Plus a
> compiler may be free to reorder things in a confusing way if it can see
> that "st" must never be NULL.
>
> It feels like we probably ought to return as soon as a check() fails.
> That does create other headaches, though. E.g., we'd potentially leak
> from an early return (which our LSan builds will complain about),
> meaning that test code needs to start doing the usual "goto out" type of
> cleanup.
>
> So I dunno. Maybe we just live with it. But it feels pretty ugly.
>
> -Peff
>
> [1] This would happen in practice if malloc() failed, but you can
> simulate it yourself like this, which is what I used to create the
> output above:
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 026a9f9742..fe77132102 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -861,6 +861,11 @@ int reftable_stack_new_addition(struct reftable_addition **dest,
> int err = 0;
> struct reftable_addition empty = REFTABLE_ADDITION_INIT;
>
> + if (flags & (1 << 16)) {
> + *dest = NULL;
> + return REFTABLE_OUT_OF_MEMORY_ERROR;
> + }
> +
> REFTABLE_CALLOC_ARRAY(*dest, 1);
> if (!*dest)
> return REFTABLE_OUT_OF_MEMORY_ERROR;
> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
> index c3f0059c34..73ed9792a5 100644
> --- a/t/unit-tests/t-reftable-stack.c
> +++ b/t/unit-tests/t-reftable-stack.c
> @@ -1400,7 +1400,7 @@ static void t_reftable_invalid_limit_updates(void)
>
> reftable_addition_destroy(add);
>
> - err = reftable_stack_new_addition(&add, st, 0);
> + err = reftable_stack_new_addition(&add, st, (1 << 16));
> check(!err);
>
> /*
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King
2025-02-01 10:33 ` Phillip Wood
@ 2025-02-03 5:40 ` Patrick Steinhardt
2025-02-03 15:20 ` Karthik Nayak
2 siblings, 0 replies; 30+ messages in thread
From: Patrick Steinhardt @ 2025-02-03 5:40 UTC (permalink / raw)
To: Jeff King; +Cc: Karthik Nayak, git
On Fri, Jan 31, 2025 at 09:24:09PM -0500, Jeff King wrote:
> On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote:
>
> > +static void t_reftable_invalid_limit_updates(void)
> > +{
> > + struct reftable_ref_record ref = {
> > + .refname = (char *) "HEAD",
> > + .update_index = 1,
> > + .value_type = REFTABLE_REF_SYMREF,
> > + .value.symref = (char *) "master",
> > + };
> > + struct reftable_write_options opts = {
> > + .default_permissions = 0660,
> > + };
> > + struct reftable_addition *add = NULL;
> > + char *dir = get_tmp_dir(__LINE__);
> > + struct reftable_stack *st = NULL;
> > + int err;
> > +
> > + err = reftable_new_stack(&st, dir, &opts);
> > + check(!err);
> > +
> > + reftable_addition_destroy(add);
> > +
> > + err = reftable_stack_new_addition(&add, st, 0);
> > + check(!err);
>
> Coverity complains that this function may have undefined behavior. It's
> an issue we have in a lot of other tests that have moved to the
> unit-test framework. I've mostly been ignoring it, but this is a pretty
> straight-forward example, so I thought I'd write a note.
>
> The issue is that reftable_new_stack() might fail, leaving "st" as NULL.
> And then we feed it to reftable_stack_new_addition(), which dereferences
> it.
>
> In normal production code, we'd expect something like:
>
> if (err)
> return -1;
>
> to avoid running the rest of the function after the first error. But the
> test harness check() function doesn't return. It just complains to
> stdout and keeps running! So you'll get something like[1]:
>
> $ t/unit-tests/bin/t-reftable-stack
> ok 1 - empty addition to stack
> ok 2 - read_lines works
> ok 3 - expire reflog entries
> # check "!err" failed at t/unit-tests/t-reftable-stack.c:1404
> Segmentation fault
>
> So...yes, we will probably notice that the test failed from the exit
> code. But it's not great when the harness itself barfs so had. Plus a
> compiler may be free to reorder things in a confusing way if it can see
> that "st" must never be NULL.
>
> It feels like we probably ought to return as soon as a check() fails.
> That does create other headaches, though. E.g., we'd potentially leak
> from an early return (which our LSan builds will complain about),
> meaning that test code needs to start doing the usual "goto out" type of
> cleanup.
>
> So I dunno. Maybe we just live with it. But it feels pretty ugly.
It's one of the pitfalls of our own testing framework, from my point of
view. It's somewhat unexpected that the test would just continue running
as this is the wrong thing to do in almost all cases. When assumptions
fail, nothing good will come of it.
In any case, issues like this will be fixed once we migrate those tests
to the clar unit test framework. The default there is to abort the tests
in case an assertion fails, which is much saner from my point of view.
So I'm not sure if it is worth fixing this now, or whether refactoring
the tests to use clar is the more sensible fix.
Patrick
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-02-01 10:33 ` Phillip Wood
@ 2025-02-03 5:41 ` Patrick Steinhardt
2025-02-03 14:11 ` Junio C Hamano
2025-02-03 15:37 ` Jeff King
1 sibling, 1 reply; 30+ messages in thread
From: Patrick Steinhardt @ 2025-02-03 5:41 UTC (permalink / raw)
To: phillip.wood; +Cc: Jeff King, Karthik Nayak, git
On Sat, Feb 01, 2025 at 10:33:13AM +0000, Phillip Wood wrote:
> Hi Peff
>
> On 01/02/2025 02:24, Jeff King wrote:
> > On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote:
> >
> > Coverity complains that this function may have undefined behavior. It's
> > an issue we have in a lot of other tests that have moved to the
> > unit-test framework. I've mostly been ignoring it, but this is a pretty
> > straight-forward example, so I thought I'd write a note.
> >
> > The issue is that reftable_new_stack() might fail, leaving "st" as NULL.
> > And then we feed it to reftable_stack_new_addition(), which dereferences
> > it.
> >
> > In normal production code, we'd expect something like:
> >
> > if (err)
> > return -1;
> >
> > to avoid running the rest of the function after the first error. But the
> > test harness check() function doesn't return. It just complains to
> > stdout and keeps running!
>
> That is to allow the test to add more context with test_msg() or do things
> like check all the members of a struct before returning. It is a bug in the
> test if it does not return after finding a NULL pointer, the correct usage
> is
>
> if (!check(ptr))
> return;
>
> As we're in the process of switching to using clar which does exit the text
> function if a check fails (that means there may be leaks on failure but if
> the test is failing then I don't think we should be worrying about leaks) I
> don't know if it is worth fixing these or not. I guess it depends if there
> are the list of targets for Seyi's Outreachy project.
Ah, yes, should've read your mail first, as you're saying basically the
same as I did :)
Patrick
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-02-03 5:41 ` Patrick Steinhardt
@ 2025-02-03 14:11 ` Junio C Hamano
0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2025-02-03 14:11 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: phillip.wood, Jeff King, Karthik Nayak, git
Patrick Steinhardt <ps@pks.im> writes:
>> if (!check(ptr))
>> return;
>>
>> As we're in the process of switching to using clar which does exit the text
>> function if a check fails (that means there may be leaks on failure but if
>> the test is failing then I don't think we should be worrying about leaks) I
>> don't know if it is worth fixing these or not. I guess it depends if there
>> are the list of targets for Seyi's Outreachy project.
>
> Ah, yes, should've read your mail first, as you're saying basically the
> same as I did :)
Yup, I 100% agree with both of you. Thanks.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King
2025-02-01 10:33 ` Phillip Wood
2025-02-03 5:40 ` Patrick Steinhardt
@ 2025-02-03 15:20 ` Karthik Nayak
2025-02-03 15:38 ` Jeff King
2 siblings, 1 reply; 30+ messages in thread
From: Karthik Nayak @ 2025-02-03 15:20 UTC (permalink / raw)
To: Jeff King; +Cc: git, Patrick Steinhardt
[-- Attachment #1: Type: text/plain, Size: 3925 bytes --]
Jeff King <peff@peff.net> writes:
> On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote:
>
>> +static void t_reftable_invalid_limit_updates(void)
>> +{
>> + struct reftable_ref_record ref = {
>> + .refname = (char *) "HEAD",
>> + .update_index = 1,
>> + .value_type = REFTABLE_REF_SYMREF,
>> + .value.symref = (char *) "master",
>> + };
>> + struct reftable_write_options opts = {
>> + .default_permissions = 0660,
>> + };
>> + struct reftable_addition *add = NULL;
>> + char *dir = get_tmp_dir(__LINE__);
>> + struct reftable_stack *st = NULL;
>> + int err;
>> +
>> + err = reftable_new_stack(&st, dir, &opts);
>> + check(!err);
>> +
>> + reftable_addition_destroy(add);
>> +
>> + err = reftable_stack_new_addition(&add, st, 0);
>> + check(!err);
>
> Coverity complains that this function may have undefined behavior. It's
> an issue we have in a lot of other tests that have moved to the
> unit-test framework. I've mostly been ignoring it, but this is a pretty
> straight-forward example, so I thought I'd write a note.
>
> The issue is that reftable_new_stack() might fail, leaving "st" as NULL.
> And then we feed it to reftable_stack_new_addition(), which dereferences
> it.
>
> In normal production code, we'd expect something like:
>
> if (err)
> return -1;
>
> to avoid running the rest of the function after the first error. But the
> test harness check() function doesn't return. It just complains to
> stdout and keeps running! So you'll get something like[1]:
>
> $ t/unit-tests/bin/t-reftable-stack
> ok 1 - empty addition to stack
> ok 2 - read_lines works
> ok 3 - expire reflog entries
> # check "!err" failed at t/unit-tests/t-reftable-stack.c:1404
> Segmentation fault
>
> So...yes, we will probably notice that the test failed from the exit
> code. But it's not great when the harness itself barfs so had. Plus a
> compiler may be free to reorder things in a confusing way if it can see
> that "st" must never be NULL.
>
> It feels like we probably ought to return as soon as a check() fails.
> That does create other headaches, though. E.g., we'd potentially leak
> from an early return (which our LSan builds will complain about),
> meaning that test code needs to start doing the usual "goto out" type of
> cleanup.
>
> So I dunno. Maybe we just live with it. But it feels pretty ugly.
>
Thanks for pointing it out, I didn't notice this, mostly as I was
copying from existing test cases and it does seem like this (wrong)
pattern exists in a lot of the tests.
Like Phillip and Patrick mentioned, this should go away since we're
moving to using the clar test framework. I think it makes sense to keep
this as is to stay consistent with the rest of code in this file for
now. It is ugly, but seems like that would be simpler while migrating.
> -Peff
>
> [1] This would happen in practice if malloc() failed, but you can
> simulate it yourself like this, which is what I used to create the
> output above:
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 026a9f9742..fe77132102 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -861,6 +861,11 @@ int reftable_stack_new_addition(struct reftable_addition **dest,
> int err = 0;
> struct reftable_addition empty = REFTABLE_ADDITION_INIT;
>
> + if (flags & (1 << 16)) {
> + *dest = NULL;
> + return REFTABLE_OUT_OF_MEMORY_ERROR;
> + }
> +
> REFTABLE_CALLOC_ARRAY(*dest, 1);
> if (!*dest)
> return REFTABLE_OUT_OF_MEMORY_ERROR;
> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
> index c3f0059c34..73ed9792a5 100644
> --- a/t/unit-tests/t-reftable-stack.c
> +++ b/t/unit-tests/t-reftable-stack.c
> @@ -1400,7 +1400,7 @@ static void t_reftable_invalid_limit_updates(void)
>
> reftable_addition_destroy(add);
>
> - err = reftable_stack_new_addition(&add, st, 0);
> + err = reftable_stack_new_addition(&add, st, (1 << 16));
> check(!err);
>
> /*
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-02-01 10:33 ` Phillip Wood
2025-02-03 5:41 ` Patrick Steinhardt
@ 2025-02-03 15:37 ` Jeff King
1 sibling, 0 replies; 30+ messages in thread
From: Jeff King @ 2025-02-03 15:37 UTC (permalink / raw)
To: phillip.wood; +Cc: Karthik Nayak, git, Patrick Steinhardt
On Sat, Feb 01, 2025 at 10:33:13AM +0000, Phillip Wood wrote:
> > In normal production code, we'd expect something like:
> >
> > if (err)
> > return -1;
> >
> > to avoid running the rest of the function after the first error. But the
> > test harness check() function doesn't return. It just complains to
> > stdout and keeps running!
>
> That is to allow the test to add more context with test_msg() or do things
> like check all the members of a struct before returning. It is a bug in the
> test if it does not return after finding a NULL pointer, the correct usage
> is
>
> if (!check(ptr))
> return;
>
> As we're in the process of switching to using clar which does exit the text
> function if a check fails (that means there may be leaks on failure but if
> the test is failing then I don't think we should be worrying about leaks) I
> don't know if it is worth fixing these or not. I guess it depends if there
> are the list of targets for Seyi's Outreachy project.
Ah, that's good to hear. I don't think there's any urgency here. These
have been popping up since people started adding more unit-tests/ last
summer. Waiting a few more months to switch to clar is probably not a
big deal.
I'm OK with ignoring a leak in a failing test. I do suspect that
Coverity might still complain about the leaks, because it is doing
static analysis to show that we _can_ leak (rather than the tests, which
are seeing if we leaked at runtime). But I'm not sure how much effort we
want to spend on making tests do cleanup on failure. Especially in a
language like C.
Anyway, I'll continue to ignore these Coverity results for now, then. ;)
-Peff
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records
2025-02-03 15:20 ` Karthik Nayak
@ 2025-02-03 15:38 ` Jeff King
0 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2025-02-03 15:38 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, Patrick Steinhardt
On Mon, Feb 03, 2025 at 07:20:16AM -0800, Karthik Nayak wrote:
> Like Phillip and Patrick mentioned, this should go away since we're
> moving to using the clar test framework. I think it makes sense to keep
> this as is to stay consistent with the rest of code in this file for
> now. It is ugly, but seems like that would be simpler while migrating.
Yeah, definitely not worth addressing this single case. I was more
interested in the overall trend, but it sounds like there are plans
there already.
-Peff
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2025-02-03 15:38 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
2025-01-17 9:29 ` Patrick Steinhardt
2025-01-20 11:17 ` Karthik Nayak
2025-01-17 7:59 ` [PATCH 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
2025-01-17 7:59 ` [PATCH 3/3] reftable: prevent 'update_index' changes after header write Karthik Nayak
2025-01-17 9:29 ` Patrick Steinhardt
2025-01-20 11:47 ` Karthik Nayak
2025-01-20 12:18 ` Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
2025-01-21 3:34 ` [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
2025-01-21 6:56 ` Patrick Steinhardt
2025-01-21 11:44 ` Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
2025-01-22 12:12 ` Patrick Steinhardt
2025-01-22 17:50 ` Junio C Hamano
2025-01-22 21:57 ` Junio C Hamano
2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King
2025-02-01 10:33 ` Phillip Wood
2025-02-03 5:41 ` Patrick Steinhardt
2025-02-03 14:11 ` Junio C Hamano
2025-02-03 15:37 ` Jeff King
2025-02-03 5:40 ` Patrick Steinhardt
2025-02-03 15:20 ` Karthik Nayak
2025-02-03 15:38 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).