* [PATCH 0/3] refs: small followups to the migration corruption fix @ 2025-01-17 7:59 Karthik Nayak 2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak ` (3 more replies) 0 siblings, 4 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-17 7:59 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, sandals, gitster This is a follow up to the bug that was reported [1] around `git refs migrate --ref-format=reftable` where the migration would fail for repositories with reflogs with lots of entries. This was caused due to a mismatch in the reftable's header and footer, specifically WRT the 'max_update_index'. While there was a fix posted. This series is a small followup to fix some of the topics discussed there: 1. To mark `ref_transaction_update_reflog()` as static since it is only used internally within 'refs.c'. 2. To change the type of 'max_index' from 'unsigned int' to 'uint64_t'. This would be much safer for large repositories with millions of files and on 32bit systems. 3. To add a safeguard to prevent 'update_index' changes after first block is written. This is a preventive measure to ensure such bugs don't arise in the future. This is based on top of master 757161efcc (Sync with Git 2.48.1, 2025-01-13) with 'kn/reflog-migration-fix' merged in. [1]: https://lore.kernel.org/r/Z4UbkcmJAU1MT-Rs@tapette.crustytoothpaste.net Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- Karthik Nayak (3): refs: mark `ref_transaction_update_reflog()` as static refs: use 'uint64_t' for 'ref_update.index' reftable: prevent 'update_index' changes after header write refs.c | 18 ++++++++++-------- refs.h | 14 -------------- refs/refs-internal.h | 4 ++-- refs/reftable-backend.c | 2 +- reftable/writer.c | 7 +++++++ 5 files changed, 20 insertions(+), 25 deletions(-) --- --- base-commit: a5aa44e7930761cb900813d971b4105f901818fb change-id: 20250117-461-corrupted-reftable-followup-eb0e4fd1a723 Thanks - Karthik ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static 2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak @ 2025-01-17 7:59 ` Karthik Nayak 2025-01-17 9:29 ` Patrick Steinhardt 2025-01-17 7:59 ` [PATCH 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak ` (2 subsequent siblings) 3 siblings, 1 reply; 30+ messages in thread From: Karthik Nayak @ 2025-01-17 7:59 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, sandals, gitster The `ref_transaction_update_reflog()` function is only used within 'refs.c', so mark it as static. Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- refs.c | 16 +++++++++------- refs.h | 14 -------------- 2 files changed, 9 insertions(+), 21 deletions(-) diff --git a/refs.c b/refs.c index f7b6f0f897eb58665e10a2efd3eb53c3f72abe61..1bb6f0356d5c5cae8bea9d6f4d5ff35164a03c64 100644 --- a/refs.c +++ b/refs.c @@ -1318,13 +1318,15 @@ int ref_transaction_update(struct ref_transaction *transaction, return 0; } -int ref_transaction_update_reflog(struct ref_transaction *transaction, - const char *refname, - const struct object_id *new_oid, - const struct object_id *old_oid, - const char *committer_info, unsigned int flags, - const char *msg, unsigned int index, - struct strbuf *err) +static int ref_transaction_update_reflog(struct ref_transaction *transaction, + const char *refname, + const struct object_id *new_oid, + const struct object_id *old_oid, + const char *committer_info, + unsigned int flags, + const char *msg, + unsigned int index, + struct strbuf *err) { struct ref_update *update; diff --git a/refs.h b/refs.h index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644 --- a/refs.h +++ b/refs.h @@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction, unsigned int flags, const char *msg, struct strbuf *err); -/* - * Similar to`ref_transaction_update`, but this function is only for adding - * a reflog update. Supports providing custom committer information. The index - * field can be utiltized to order updates as desired. When not used, the - * updates default to being ordered by refname. - */ -int ref_transaction_update_reflog(struct ref_transaction *transaction, - const char *refname, - const struct object_id *new_oid, - const struct object_id *old_oid, - const char *committer_info, unsigned int flags, - const char *msg, unsigned int index, - struct strbuf *err); - /* * Add a reference creation to transaction. new_oid is the value that * the reference should have after the update; it must not be -- 2.47.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static 2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak @ 2025-01-17 9:29 ` Patrick Steinhardt 2025-01-20 11:17 ` Karthik Nayak 0 siblings, 1 reply; 30+ messages in thread From: Patrick Steinhardt @ 2025-01-17 9:29 UTC (permalink / raw) To: Karthik Nayak; +Cc: git, sandals, gitster On Fri, Jan 17, 2025 at 08:59:12AM +0100, Karthik Nayak wrote: > diff --git a/refs.h b/refs.h > index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644 > --- a/refs.h > +++ b/refs.h > @@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction, > unsigned int flags, const char *msg, > struct strbuf *err); > > -/* > - * Similar to`ref_transaction_update`, but this function is only for adding > - * a reflog update. Supports providing custom committer information. The index > - * field can be utiltized to order updates as desired. When not used, the > - * updates default to being ordered by refname. > - */ Do we maybe want to move the comment over? The explanation of the index field seems useful to me. Patrick > -int ref_transaction_update_reflog(struct ref_transaction *transaction, > - const char *refname, > - const struct object_id *new_oid, > - const struct object_id *old_oid, > - const char *committer_info, unsigned int flags, > - const char *msg, unsigned int index, > - struct strbuf *err); > - > /* > * Add a reference creation to transaction. new_oid is the value that > * the reference should have after the update; it must not be ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static 2025-01-17 9:29 ` Patrick Steinhardt @ 2025-01-20 11:17 ` Karthik Nayak 0 siblings, 0 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-20 11:17 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, sandals, gitster [-- Attachment #1: Type: text/plain, Size: 1392 bytes --] Patrick Steinhardt <ps@pks.im> writes: > On Fri, Jan 17, 2025 at 08:59:12AM +0100, Karthik Nayak wrote: >> diff --git a/refs.h b/refs.h >> index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644 >> --- a/refs.h >> +++ b/refs.h >> @@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction, >> unsigned int flags, const char *msg, >> struct strbuf *err); >> >> -/* >> - * Similar to`ref_transaction_update`, but this function is only for adding >> - * a reflog update. Supports providing custom committer information. The index >> - * field can be utiltized to order updates as desired. When not used, the >> - * updates default to being ordered by refname. >> - */ > > Do we maybe want to move the comment over? The explanation of the index > field seems useful to me. > Yeah, that'd be useful indeed, let me move it. > Patrick > >> -int ref_transaction_update_reflog(struct ref_transaction *transaction, >> - const char *refname, >> - const struct object_id *new_oid, >> - const struct object_id *old_oid, >> - const char *committer_info, unsigned int flags, >> - const char *msg, unsigned int index, >> - struct strbuf *err); >> - >> /* >> * Add a reference creation to transaction. new_oid is the value that >> * the reference should have after the update; it must not be [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 2/3] refs: use 'uint64_t' for 'ref_update.index' 2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak 2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak @ 2025-01-17 7:59 ` Karthik Nayak 2025-01-17 7:59 ` [PATCH 3/3] reftable: prevent 'update_index' changes after header write Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak 3 siblings, 0 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-17 7:59 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, sandals, gitster The 'ref_update.index' variable is used to store an index for a given reference update. This index is used to order the updates in a predetermined order, while the default ordering is alphabetical as per the refname. For large repositories with millions of references, it should be safer to use 'uint64_t'. Let's do that. This also is applied for all other code sections where we store 'index' and pass it around. Reported-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- refs.c | 4 ++-- refs/refs-internal.h | 4 ++-- refs/reftable-backend.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/refs.c b/refs.c index 1bb6f0356d5c5cae8bea9d6f4d5ff35164a03c64..0de23f91b2e4401e93aee6940d4aa2abea4312cf 100644 --- a/refs.c +++ b/refs.c @@ -1325,7 +1325,7 @@ static int ref_transaction_update_reflog(struct ref_transaction *transaction, const char *committer_info, unsigned int flags, const char *msg, - unsigned int index, + uint64_t index, struct strbuf *err) { struct ref_update *update; @@ -2807,7 +2807,7 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con } struct reflog_migration_data { - unsigned int index; + uint64_t index; const char *refname; struct ref_store *old_refs; struct ref_transaction *transaction; diff --git a/refs/refs-internal.h b/refs/refs-internal.h index aaab711bb96844755dfa600d37efdb25b30a0765..8894b43d1d1a327d404d3923c507d2d39649de19 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -120,7 +120,7 @@ struct ref_update { * when migrating reflogs and we want to ensure we carry over the * same order. */ - unsigned int index; + uint64_t index; /* * If this ref_update was split off of a symref update via @@ -203,7 +203,7 @@ struct ref_transaction { enum ref_transaction_state state; void *backend_data; unsigned int flags; - unsigned int max_index; + uint64_t max_index; }; /* diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 289496058e6eb4d3e3aef96ca70219cd4ff78eae..6814c87bc618229ac8a70b904be3f850371ad876 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -942,7 +942,7 @@ struct write_transaction_table_arg { size_t updates_nr; size_t updates_alloc; size_t updates_expected; - unsigned int max_index; + uint64_t max_index; }; struct reftable_transaction_data { -- 2.47.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 3/3] reftable: prevent 'update_index' changes after header write 2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak 2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak 2025-01-17 7:59 ` [PATCH 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak @ 2025-01-17 7:59 ` Karthik Nayak 2025-01-17 9:29 ` Patrick Steinhardt 2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak 3 siblings, 1 reply; 30+ messages in thread From: Karthik Nayak @ 2025-01-17 7:59 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, sandals, gitster The function `reftable_writer_set_limits()` allows updating the 'min_update_index' and 'max_update_index' of a reftable writer. These values are written to both the writer's header and footer. Since the header is written during the first block write, any subsequent changes to the update index would create a mismatch between the header and footer values. The footer would contain the newer values while the header retained the original ones. To fix this bug, we now prevent callers from updating these values after the header has been written. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- reftable/writer.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/reftable/writer.c b/reftable/writer.c index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 --- a/reftable/writer.c +++ b/reftable/writer.c @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, uint64_t max) { + /* + * The limits shouldn't be modified post writing the first block, else + * it would cause a mismatch between the header and the footer. + */ + if (w->next) + BUG("update index modified after writing first block"); + w->min_update_index = min; w->max_update_index = max; } -- 2.47.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] reftable: prevent 'update_index' changes after header write 2025-01-17 7:59 ` [PATCH 3/3] reftable: prevent 'update_index' changes after header write Karthik Nayak @ 2025-01-17 9:29 ` Patrick Steinhardt 2025-01-20 11:47 ` Karthik Nayak 0 siblings, 1 reply; 30+ messages in thread From: Patrick Steinhardt @ 2025-01-17 9:29 UTC (permalink / raw) To: Karthik Nayak; +Cc: git, sandals, gitster On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote: > diff --git a/reftable/writer.c b/reftable/writer.c > index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 > --- a/reftable/writer.c > +++ b/reftable/writer.c > @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, > void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, > uint64_t max) > { > + /* > + * The limits shouldn't be modified post writing the first block, else > + * it would cause a mismatch between the header and the footer. > + */ Can we make this *even* stricter? I think that this is something that is easy to do wrong, and the fact that it only triggers in some situations of misuse may easily make tests miss this issue. So ideally, we should assert that `set_limits()` is always called before queueing any records to the writer. This would make us error out in all situations where the calling order is wrong. There are two ways I can see us doing that: - Detect any state written by `writer_add_record()` and error out if it's set when `reftable_writer_set_limits()` is called. - Adapt `reftable_writer_new()` so that it takes the update indices as input and drop `reftable_writer_set_limits()` altogether. The latter might be preferable as you basically want to set limits in all (most?) situations anyway. > + if (w->next) > + BUG("update index modified after writing first block"); Let's not use BUG, but rather return a `REFTABLE_API_ERROR` error. It requires a bit more plumbing because we'll also hvae to adapt all callers to handle errors. But on the one hand we don't want to die in library code. And on the other hand we don't want to keep on adding more dependencies on "git-compat-util.h". Patrick ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] reftable: prevent 'update_index' changes after header write 2025-01-17 9:29 ` Patrick Steinhardt @ 2025-01-20 11:47 ` Karthik Nayak 2025-01-20 12:18 ` Karthik Nayak 0 siblings, 1 reply; 30+ messages in thread From: Karthik Nayak @ 2025-01-20 11:47 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, sandals, gitster [-- Attachment #1: Type: text/plain, Size: 2709 bytes --] Patrick Steinhardt <ps@pks.im> writes: > On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote: >> diff --git a/reftable/writer.c b/reftable/writer.c >> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 >> --- a/reftable/writer.c >> +++ b/reftable/writer.c >> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, >> void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >> uint64_t max) >> { >> + /* >> + * The limits shouldn't be modified post writing the first block, else >> + * it would cause a mismatch between the header and the footer. >> + */ > > Can we make this *even* stricter? I think that this is something that is > easy to do wrong, and the fact that it only triggers in some situations > of misuse may easily make tests miss this issue. So ideally, we should > assert that `set_limits()` is always called before queueing any records > to the writer. This would make us error out in all situations where the > calling order is wrong. > I agree here, it makes sense to make this stricter. Like you mentioned, currently they are independent. The only way to enforce the limits is to ensure that they are dependent. > There are two ways I can see us doing that: > > - Detect any state written by `writer_add_record()` and error out if > it's set when `reftable_writer_set_limits()` is called. > Yeah I think this would be simple to do. I guess we can check `w->last_key` is set, since any record write would modify that. > - Adapt `reftable_writer_new()` so that it takes the update indices as > input and drop `reftable_writer_set_limits()` altogether. > This one is a bit harder to do because of our flow. Generally the writer is provided to callers via a callback function passed to `reftable_addition_add()`. I guess I could simply pass the data: caller -> reftable_addition_add() -> reftable_writer_new() Any direct users of `reftable_writer_new()` would simply pass the data directly. I'll play around and see if this is doable without too much refactoring and have something in the next version. > The latter might be preferable as you basically want to set limits in > all (most?) situations anyway. > >> + if (w->next) >> + BUG("update index modified after writing first block"); > > Let's not use BUG, but rather return a `REFTABLE_API_ERROR` error. It > requires a bit more plumbing because we'll also hvae to adapt all > callers to handle errors. But on the one hand we don't want to die in > library code. And on the other hand we don't want to keep on adding more > dependencies on "git-compat-util.h". > Fair enough, thanks for explaining. > Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] reftable: prevent 'update_index' changes after header write 2025-01-20 11:47 ` Karthik Nayak @ 2025-01-20 12:18 ` Karthik Nayak 0 siblings, 0 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-20 12:18 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, sandals, gitster [-- Attachment #1: Type: text/plain, Size: 2937 bytes --] Karthik Nayak <karthik.188@gmail.com> writes: > Patrick Steinhardt <ps@pks.im> writes: > >> On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote: >>> diff --git a/reftable/writer.c b/reftable/writer.c >>> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 >>> --- a/reftable/writer.c >>> +++ b/reftable/writer.c >>> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, >>> void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >>> uint64_t max) >>> { >>> + /* >>> + * The limits shouldn't be modified post writing the first block, else >>> + * it would cause a mismatch between the header and the footer. >>> + */ >> >> Can we make this *even* stricter? I think that this is something that is >> easy to do wrong, and the fact that it only triggers in some situations >> of misuse may easily make tests miss this issue. So ideally, we should >> assert that `set_limits()` is always called before queueing any records >> to the writer. This would make us error out in all situations where the >> calling order is wrong. >> > > I agree here, it makes sense to make this stricter. Like you mentioned, > currently they are independent. The only way to enforce the limits is to > ensure that they are dependent. > >> There are two ways I can see us doing that: >> >> - Detect any state written by `writer_add_record()` and error out if >> it's set when `reftable_writer_set_limits()` is called. >> > > Yeah I think this would be simple to do. I guess we can check > `w->last_key` is set, since any record write would modify that. > >> - Adapt `reftable_writer_new()` so that it takes the update indices as >> input and drop `reftable_writer_set_limits()` altogether. >> > > This one is a bit harder to do because of our flow. Generally the writer > is provided to callers via a callback function passed to > `reftable_addition_add()`. I guess I could simply pass the data: > > caller -> reftable_addition_add() -> reftable_writer_new() > > Any direct users of `reftable_writer_new()` would simply pass the data > directly. > > I'll play around and see if this is doable without too much refactoring > and have something in the next version. It seems like `reftable_addition_add()` is internally and externally called by a lot of functions and as such, modifying them all would be tedious. Another idea is to modify the function pointer that `reftable_addition_add()` receives so that the `write_table` argument receives a function which would set the limits, but this seems like bad design to me. So the best/easiest way to do this is to error out if any state is set before calling `reftable_writer_set_limits()`. I also realized that if `reftable_writer_set_limits()` isn't called, the writer errors anyways. So just ensuring that `reftable_writer_set_limits()` checks for any state being set would work well. [snip] [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v2 0/3] refs: small followups to the migration corruption fix 2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak ` (2 preceding siblings ...) 2025-01-17 7:59 ` [PATCH 3/3] reftable: prevent 'update_index' changes after header write Karthik Nayak @ 2025-01-21 3:34 ` Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak ` (3 more replies) 3 siblings, 4 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-21 3:34 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, ps, Junio C Hamano, brian m. carlson This is a follow up to the bug that was reported [1] around `git refs migrate --ref-format=reftable` where the migration would fail for repositories with reflogs with lots of entries. This was caused due to a mismatch in the reftable's header and footer, specifically WRT the 'max_update_index'. While there was a fix posted. This series is a small followup to fix some of the topics discussed there: 1. To mark `ref_transaction_update_reflog()` as static since it is only used internally within 'refs.c'. 2. To change the type of 'max_index' from 'unsigned int' to 'uint64_t'. This would be much safer for large repositories with millions of files and on 32bit systems. 3. To add a safeguard to prevent 'update_index' changes post any record addition. This is a preventive measure to ensure such bugs don't arise in the future. This is based on top of master 757161efcc (Sync with Git 2.48.1, 2025-01-13) with 'kn/reflog-migration-fix' merged in. [1]: https://lore.kernel.org/r/Z4UbkcmJAU1MT-Rs@tapette.crustytoothpaste.net Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- Changes in v2: - Commit 1: Keep the function comments. - Commit 3: Modify the requirement to be for any record writes instead of the first block write. This ensures that the limits need to be set before any records being added. Instead of using `BUG()`, return `REFTABLE_API_ERROR`. Handle all callers as needed. - Link to v1: https://lore.kernel.org/r/20250117-461-corrupted-reftable-followup-v1-0-70ee605ae3fe@gmail.com --- Karthik Nayak (3): refs: mark `ref_transaction_update_reflog()` as static refs: use 'uint64_t' for 'ref_update.index' reftable: prevent 'update_index' changes after adding records refs.c | 24 ++++++++++++++++-------- refs.h | 14 -------------- refs/refs-internal.h | 4 ++-- refs/reftable-backend.c | 22 ++++++++++++++++------ reftable/reftable-error.h | 1 + reftable/reftable-writer.h | 24 ++++++++++++++---------- reftable/stack.c | 6 ++++-- reftable/writer.c | 13 +++++++++++-- t/unit-tests/t-reftable-stack.c | 8 +++++--- 9 files changed, 69 insertions(+), 47 deletions(-) --- Range-diff versus v1: 1: 82a4f950dd ! 1: 5821ddfbe2 refs: mark `ref_transaction_update_reflog()` as static @@ refs.c: int ref_transaction_update(struct ref_transaction *transaction, - const char *committer_info, unsigned int flags, - const char *msg, unsigned int index, - struct strbuf *err) ++/* ++ * Similar to`ref_transaction_update`, but this function is only for adding ++ * a reflog update. Supports providing custom committer information. The index ++ * field can be utiltized to order updates as desired. When not used, the ++ * updates default to being ordered by refname. ++ */ +static int ref_transaction_update_reflog(struct ref_transaction *transaction, + const char *refname, + const struct object_id *new_oid, 2: 51ed04e95d = 2: 8025cfadf2 refs: use 'uint64_t' for 'ref_update.index' 3: 34e3001e29 < -: ---------- reftable: prevent 'update_index' changes after header write -: ---------- > 3: 4b0416ce20 reftable: prevent 'update_index' changes after adding records --- base-commit: a5aa44e7930761cb900813d971b4105f901818fb change-id: 20250117-461-corrupted-reftable-followup-eb0e4fd1a723 Thanks - Karthik ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static 2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak @ 2025-01-21 3:34 ` Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak ` (2 subsequent siblings) 3 siblings, 0 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-21 3:34 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, ps, Junio C Hamano The `ref_transaction_update_reflog()` function is only used within 'refs.c', so mark it as static. Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- refs.c | 22 +++++++++++++++------- refs.h | 14 -------------- 2 files changed, 15 insertions(+), 21 deletions(-) diff --git a/refs.c b/refs.c index f7b6f0f897eb58665e10a2efd3eb53c3f72abe61..ad6d774717150f1fe68a59c629e05e49a469693f 100644 --- a/refs.c +++ b/refs.c @@ -1318,13 +1318,21 @@ int ref_transaction_update(struct ref_transaction *transaction, return 0; } -int ref_transaction_update_reflog(struct ref_transaction *transaction, - const char *refname, - const struct object_id *new_oid, - const struct object_id *old_oid, - const char *committer_info, unsigned int flags, - const char *msg, unsigned int index, - struct strbuf *err) +/* + * Similar to`ref_transaction_update`, but this function is only for adding + * a reflog update. Supports providing custom committer information. The index + * field can be utiltized to order updates as desired. When not used, the + * updates default to being ordered by refname. + */ +static int ref_transaction_update_reflog(struct ref_transaction *transaction, + const char *refname, + const struct object_id *new_oid, + const struct object_id *old_oid, + const char *committer_info, + unsigned int flags, + const char *msg, + unsigned int index, + struct strbuf *err) { struct ref_update *update; diff --git a/refs.h b/refs.h index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644 --- a/refs.h +++ b/refs.h @@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction, unsigned int flags, const char *msg, struct strbuf *err); -/* - * Similar to`ref_transaction_update`, but this function is only for adding - * a reflog update. Supports providing custom committer information. The index - * field can be utiltized to order updates as desired. When not used, the - * updates default to being ordered by refname. - */ -int ref_transaction_update_reflog(struct ref_transaction *transaction, - const char *refname, - const struct object_id *new_oid, - const struct object_id *old_oid, - const char *committer_info, unsigned int flags, - const char *msg, unsigned int index, - struct strbuf *err); - /* * Add a reference creation to transaction. new_oid is the value that * the reference should have after the update; it must not be -- 2.47.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v2 2/3] refs: use 'uint64_t' for 'ref_update.index' 2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak @ 2025-01-21 3:34 ` Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak 3 siblings, 0 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-21 3:34 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, ps, brian m. carlson The 'ref_update.index' variable is used to store an index for a given reference update. This index is used to order the updates in a predetermined order, while the default ordering is alphabetical as per the refname. For large repositories with millions of references, it should be safer to use 'uint64_t'. Let's do that. This also is applied for all other code sections where we store 'index' and pass it around. Reported-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- refs.c | 4 ++-- refs/refs-internal.h | 4 ++-- refs/reftable-backend.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/refs.c b/refs.c index ad6d774717150f1fe68a59c629e05e49a469693f..ef04f403a6a3a34f9156b4cf68c3daa29c9cbad6 100644 --- a/refs.c +++ b/refs.c @@ -1331,7 +1331,7 @@ static int ref_transaction_update_reflog(struct ref_transaction *transaction, const char *committer_info, unsigned int flags, const char *msg, - unsigned int index, + uint64_t index, struct strbuf *err) { struct ref_update *update; @@ -2813,7 +2813,7 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con } struct reflog_migration_data { - unsigned int index; + uint64_t index; const char *refname; struct ref_store *old_refs; struct ref_transaction *transaction; diff --git a/refs/refs-internal.h b/refs/refs-internal.h index aaab711bb96844755dfa600d37efdb25b30a0765..8894b43d1d1a327d404d3923c507d2d39649de19 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -120,7 +120,7 @@ struct ref_update { * when migrating reflogs and we want to ensure we carry over the * same order. */ - unsigned int index; + uint64_t index; /* * If this ref_update was split off of a symref update via @@ -203,7 +203,7 @@ struct ref_transaction { enum ref_transaction_state state; void *backend_data; unsigned int flags; - unsigned int max_index; + uint64_t max_index; }; /* diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 289496058e6eb4d3e3aef96ca70219cd4ff78eae..6814c87bc618229ac8a70b904be3f850371ad876 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -942,7 +942,7 @@ struct write_transaction_table_arg { size_t updates_nr; size_t updates_alloc; size_t updates_expected; - unsigned int max_index; + uint64_t max_index; }; struct reftable_transaction_data { -- 2.47.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records 2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak @ 2025-01-21 3:34 ` Karthik Nayak 2025-01-21 6:56 ` Patrick Steinhardt 2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak 3 siblings, 1 reply; 30+ messages in thread From: Karthik Nayak @ 2025-01-21 3:34 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, ps The function `reftable_writer_set_limits()` allows updating the 'min_update_index' and 'max_update_index' of a reftable writer. These values are written to both the writer's header and footer. Since the header is written during the first block write, any subsequent changes to the update index would create a mismatch between the header and footer values. The footer would contain the newer values while the header retained the original ones. To fix this bug, prevent callers from updating these values after any record is written. To do this, modify the function to return an error whenever the limits are modified after any record adds. Check for record adds within `reftable_writer_set_limits()` by checking the `last_key` variable, which is set whenever a new record is added. Modify all callers of the function to anticipate a return type and handle it accordingly. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- refs/reftable-backend.c | 20 +++++++++++++++----- reftable/reftable-error.h | 1 + reftable/reftable-writer.h | 24 ++++++++++++++---------- reftable/stack.c | 6 ++++-- reftable/writer.c | 13 +++++++++++-- t/unit-tests/t-reftable-stack.c | 8 +++++--- 6 files changed, 50 insertions(+), 22 deletions(-) diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 6814c87bc618229ac8a70b904be3f850371ad876..9cfb0cb26721a9425c3b4a374f7b41e192037315 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -1443,7 +1443,9 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data * multiple entries. Each entry will contain a different update_index, * so set the limits accordingly. */ - reftable_writer_set_limits(writer, ts, ts + arg->max_index); + ret = reftable_writer_set_limits(writer, ts, ts + arg->max_index); + if (ret < 0) + goto done; for (i = 0; i < arg->updates_nr; i++) { struct reftable_transaction_update *tx_update = &arg->updates[i]; @@ -1766,7 +1768,9 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data) deletion_ts = creation_ts = reftable_stack_next_update_index(arg->be->stack); if (arg->delete_old) creation_ts++; - reftable_writer_set_limits(writer, deletion_ts, creation_ts); + ret = reftable_writer_set_limits(writer, deletion_ts, creation_ts); + if (ret < 0) + goto done; /* * Add the new reference. If this is a rename then we also delete the @@ -2298,7 +2302,9 @@ static int write_reflog_existence_table(struct reftable_writer *writer, if (ret <= 0) goto done; - reftable_writer_set_limits(writer, ts, ts); + ret = reftable_writer_set_limits(writer, ts, ts); + if (ret < 0) + goto done; /* * The existence entry has both old and new object ID set to the @@ -2357,7 +2363,9 @@ static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_da uint64_t ts = reftable_stack_next_update_index(arg->stack); int ret; - reftable_writer_set_limits(writer, ts, ts); + ret = reftable_writer_set_limits(writer, ts, ts); + if (ret < 0) + goto out; ret = reftable_stack_init_log_iterator(arg->stack, &it); if (ret < 0) @@ -2434,7 +2442,9 @@ static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_da if (arg->records[i].value_type == REFTABLE_LOG_UPDATE) live_records++; - reftable_writer_set_limits(writer, ts, ts); + ret = reftable_writer_set_limits(writer, ts, ts); + if (ret < 0) + return ret; if (!is_null_oid(&arg->update_oid)) { struct reftable_ref_record ref = {0}; diff --git a/reftable/reftable-error.h b/reftable/reftable-error.h index f4048265629fe456207b88620658193f770a84f0..a7e33d964d0cfe5546f588d26c0fcb66ab326828 100644 --- a/reftable/reftable-error.h +++ b/reftable/reftable-error.h @@ -30,6 +30,7 @@ enum reftable_error { /* Misuse of the API: * - on writing a record with NULL refname. + * - on writing a record before setting the writer limits. * - on writing a reftable_ref_record outside the table limits * - on writing a ref or log record before the stack's * next_update_inde*x diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644 --- a/reftable/reftable-writer.h +++ b/reftable/reftable-writer.h @@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out, int (*flush_func)(void *), void *writer_arg, const struct reftable_write_options *opts); -/* Set the range of update indices for the records we will add. When writing a - table into a stack, the min should be at least - reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. - - For transactional updates to a stack, typically min==max, and the - update_index can be obtained by inspeciting the stack. When converting an - existing ref database into a single reftable, this would be a range of - update-index timestamps. +/* + * Set the range of update indices for the records we will add. When writing a + * table into a stack, the min should be at least + * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. + * + * For transactional updates to a stack, typically min==max, and the + * update_index can be obtained by inspeciting the stack. When converting an + * existing ref database into a single reftable, this would be a range of + * update-index timestamps. + * + * The function should be called before adding any records to the writer. If not + * it will fail with REFTABLE_API_ERROR. */ -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, - uint64_t max); +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, + uint64_t max); /* Add a reftable_ref_record. The record should have names that come after diff --git a/reftable/stack.c b/reftable/stack.c index 531660a49f0948c33041831ee0d740feacb22b2f..9649dbbb04c51e106ee752f14481bbad381cb348 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -1058,8 +1058,10 @@ static int stack_write_compact(struct reftable_stack *st, for (size_t i = first; i <= last; i++) st->stats.bytes += st->readers[i]->size; - reftable_writer_set_limits(wr, st->readers[first]->min_update_index, - st->readers[last]->max_update_index); + err = reftable_writer_set_limits(wr, st->readers[first]->min_update_index, + st->readers[last]->max_update_index); + if (err < 0) + goto done; err = reftable_merged_table_new(&mt, st->readers + first, subtabs_len, st->opts.hash_id); diff --git a/reftable/writer.c b/reftable/writer.c index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644 --- a/reftable/writer.c +++ b/reftable/writer.c @@ -179,11 +179,20 @@ int reftable_writer_new(struct reftable_writer **out, return 0; } -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, - uint64_t max) +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, + uint64_t max) { + /* + * The limits should be set before any records are added to the writer. + * Check if any records were added by checking if `last_key` was set. + */ + if (w->last_key.len) + return REFTABLE_API_ERROR; + w->min_update_index = min; w->max_update_index = max; + + return 0; } static void writer_release(struct reftable_writer *w) diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644 --- a/t/unit-tests/t-reftable-stack.c +++ b/t/unit-tests/t-reftable-stack.c @@ -103,7 +103,8 @@ static void t_read_file(void) static int write_test_ref(struct reftable_writer *wr, void *arg) { struct reftable_ref_record *ref = arg; - reftable_writer_set_limits(wr, ref->update_index, ref->update_index); + check(!reftable_writer_set_limits(wr, ref->update_index, + ref->update_index)); return reftable_writer_add_ref(wr, ref); } @@ -143,7 +144,8 @@ static int write_test_log(struct reftable_writer *wr, void *arg) { struct write_log_arg *wla = arg; - reftable_writer_set_limits(wr, wla->update_index, wla->update_index); + check(!reftable_writer_set_limits(wr, wla->update_index, + wla->update_index)); return reftable_writer_add_log(wr, wla->log); } @@ -961,7 +963,7 @@ static void t_reflog_expire(void) static int write_nothing(struct reftable_writer *wr, void *arg UNUSED) { - reftable_writer_set_limits(wr, 1, 1); + check(!reftable_writer_set_limits(wr, 1, 1)); return 0; } -- 2.47.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records 2025-01-21 3:34 ` [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak @ 2025-01-21 6:56 ` Patrick Steinhardt 2025-01-21 11:44 ` Karthik Nayak 0 siblings, 1 reply; 30+ messages in thread From: Patrick Steinhardt @ 2025-01-21 6:56 UTC (permalink / raw) To: Karthik Nayak; +Cc: git On Tue, Jan 21, 2025 at 04:34:12AM +0100, Karthik Nayak wrote: > The function `reftable_writer_set_limits()` allows updating the > 'min_update_index' and 'max_update_index' of a reftable writer. These > values are written to both the writer's header and footer. > > Since the header is written during the first block write, any subsequent > changes to the update index would create a mismatch between the header > and footer values. The footer would contain the newer values while the > header retained the original ones. > > To fix this bug, prevent callers from updating these values after any Nit: it's not really fixing a bug, but protecting us against it. Not worth a reroll though, from my point of view. > diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h > index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644 > --- a/reftable/reftable-writer.h > +++ b/reftable/reftable-writer.h > @@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out, > int (*flush_func)(void *), > void *writer_arg, const struct reftable_write_options *opts); > > -/* Set the range of update indices for the records we will add. When writing a > - table into a stack, the min should be at least > - reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. > - > - For transactional updates to a stack, typically min==max, and the > - update_index can be obtained by inspeciting the stack. When converting an > - existing ref database into a single reftable, this would be a range of > - update-index timestamps. > +/* > + * Set the range of update indices for the records we will add. When writing a > + * table into a stack, the min should be at least > + * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. > + * > + * For transactional updates to a stack, typically min==max, and the > + * update_index can be obtained by inspeciting the stack. When converting an > + * existing ref database into a single reftable, this would be a range of > + * update-index timestamps. > + * > + * The function should be called before adding any records to the writer. If not > + * it will fail with REFTABLE_API_ERROR. > */ Thanks for updating this. I think the reftable library is one of those code areas where it makes sense to sneak in a formatting fix every now and then because its coding style is quite alien to Git's own in some places. We could also do it all in one go, but I strongly doubt that it would be worth the churn. > -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, > - uint64_t max); > +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, > + uint64_t max); > > /* > Add a reftable_ref_record. The record should have names that come after > diff --git a/reftable/writer.c b/reftable/writer.c > index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644 > --- a/reftable/writer.c > +++ b/reftable/writer.c > @@ -179,11 +179,20 @@ int reftable_writer_new(struct reftable_writer **out, > return 0; > } > > -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, > - uint64_t max) > +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, > + uint64_t max) > { > + /* > + * The limits should be set before any records are added to the writer. > + * Check if any records were added by checking if `last_key` was set. > + */ > + if (w->last_key.len) > + return REFTABLE_API_ERROR; Hm. Using the last key feels somewhat dangerous to me as it does get reset at times, e.g. when finishing writing the current section. It _should_ work, but overall it just feels a tad to disconnected from the thing that we actually want to check. How about we instead use `next`? This variable records the offset of the next block we're about to write, and `writer_flush_nonempty_block()` uses it directly to check whether we're currently writing the first block in order to decide whether it needs to write a header or not. If it's 0, we know that we haven't written the first block yet. That feels much closer aligned with what we're checking. > diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c > index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644 > --- a/t/unit-tests/t-reftable-stack.c > +++ b/t/unit-tests/t-reftable-stack.c Can we maybe add a unit test that demonstrates the error? Patrick ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records 2025-01-21 6:56 ` Patrick Steinhardt @ 2025-01-21 11:44 ` Karthik Nayak 0 siblings, 0 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-21 11:44 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git [-- Attachment #1: Type: text/plain, Size: 5430 bytes --] Patrick Steinhardt <ps@pks.im> writes: > On Tue, Jan 21, 2025 at 04:34:12AM +0100, Karthik Nayak wrote: >> The function `reftable_writer_set_limits()` allows updating the >> 'min_update_index' and 'max_update_index' of a reftable writer. These >> values are written to both the writer's header and footer. >> >> Since the header is written during the first block write, any subsequent >> changes to the update index would create a mismatch between the header >> and footer values. The footer would contain the newer values while the >> header retained the original ones. >> >> To fix this bug, prevent callers from updating these values after any > > Nit: it's not really fixing a bug, but protecting us against it. Not > worth a reroll though, from my point of view. > That's right, I'll add that in. >> diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h >> index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644 >> --- a/reftable/reftable-writer.h >> +++ b/reftable/reftable-writer.h >> @@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out, >> int (*flush_func)(void *), >> void *writer_arg, const struct reftable_write_options *opts); >> >> -/* Set the range of update indices for the records we will add. When writing a >> - table into a stack, the min should be at least >> - reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. >> - >> - For transactional updates to a stack, typically min==max, and the >> - update_index can be obtained by inspeciting the stack. When converting an >> - existing ref database into a single reftable, this would be a range of >> - update-index timestamps. >> +/* >> + * Set the range of update indices for the records we will add. When writing a >> + * table into a stack, the min should be at least >> + * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. >> + * >> + * For transactional updates to a stack, typically min==max, and the >> + * update_index can be obtained by inspeciting the stack. When converting an >> + * existing ref database into a single reftable, this would be a range of >> + * update-index timestamps. >> + * >> + * The function should be called before adding any records to the writer. If not >> + * it will fail with REFTABLE_API_ERROR. >> */ > > Thanks for updating this. I think the reftable library is one of those > code areas where it makes sense to sneak in a formatting fix every now > and then because its coding style is quite alien to Git's own in some > places. We could also do it all in one go, but I strongly doubt that it > would be worth the churn. > Generally I try to sneak in small fixes like this around code being touched. I know it is a little more toll on reviewers, but small improvements do add up. >> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >> - uint64_t max); >> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >> + uint64_t max); >> >> /* >> Add a reftable_ref_record. The record should have names that come after > >> diff --git a/reftable/writer.c b/reftable/writer.c >> index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644 >> --- a/reftable/writer.c >> +++ b/reftable/writer.c >> @@ -179,11 +179,20 @@ int reftable_writer_new(struct reftable_writer **out, >> return 0; >> } >> >> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >> - uint64_t max) >> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >> + uint64_t max) >> { >> + /* >> + * The limits should be set before any records are added to the writer. >> + * Check if any records were added by checking if `last_key` was set. >> + */ >> + if (w->last_key.len) >> + return REFTABLE_API_ERROR; > > Hm. Using the last key feels somewhat dangerous to me as it does get > reset at times, e.g. when finishing writing the current section. It > _should_ work, but overall it just feels a tad to disconnected from the > thing that we actually want to check. > > How about we instead use `next`? This variable records the offset of the > next block we're about to write, and `writer_flush_nonempty_block()` > uses it directly to check whether we're currently writing the first > block in order to decide whether it needs to write a header or not. If > it's 0, we know that we haven't written the first block yet. That feels > much closer aligned with what we're checking. > The last version did use `next`. I changed it because `next` is only modified once the first block has been written. This would still allow limit modification post writing of first few records. This should be okay however since we're concerned about header <> footer mismatch. But from an ideological point, it makes sense to only allow limit modification before _any_ records have been written. I'm thinking if we should use both `if (w->next || w->last_key.len)`. This way we capture all modifications. >> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c >> index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644 >> --- a/t/unit-tests/t-reftable-stack.c >> +++ b/t/unit-tests/t-reftable-stack.c > > Can we maybe add a unit test that demonstrates the error? Good suggestion, will add it! > Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 0/3] refs: small followups to the migration corruption fix 2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak ` (2 preceding siblings ...) 2025-01-21 3:34 ` [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak @ 2025-01-22 5:35 ` Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak ` (2 more replies) 3 siblings, 3 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-22 5:35 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, Patrick Steinhardt, Junio C Hamano, brian m. carlson This is a follow up to the bug that was reported [1] around `git refs migrate --ref-format=reftable` where the migration would fail for repositories with reflogs with lots of entries. This was caused due to a mismatch in the reftable's header and footer, specifically WRT the 'max_update_index'. While there was a fix posted. This series is a small followup to fix some of the topics discussed there: 1. To mark `ref_transaction_update_reflog()` as static since it is only used internally within 'refs.c'. 2. To change the type of 'max_index' from 'unsigned int' to 'uint64_t'. This would be much safer for large repositories with millions of files and on 32bit systems. 3. To add a safeguard to prevent 'update_index' changes post any record addition. This is a preventive measure to ensure such bugs don't arise in the future. This is based on top of master 757161efcc (Sync with Git 2.48.1, 2025-01-13) with 'kn/reflog-migration-fix' merged in. [1]: https://lore.kernel.org/r/Z4UbkcmJAU1MT-Rs@tapette.crustytoothpaste.net Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- Changes in v3: - Commit 3: Modify the commit message to make it clearer that this is a preventive measure. Modify the condition to check for both `last_key` and `next`. Also add a unit test. - Link to v2: https://lore.kernel.org/r/20250121-461-corrupted-reftable-followup-v2-0-37e26c7a79b4@gmail.com Changes in v2: - Commit 1: Keep the function comments. - Commit 3: Modify the requirement to be for any record writes instead of the first block write. This ensures that the limits need to be set before any records being added. Instead of using `BUG()`, return `REFTABLE_API_ERROR`. Handle all callers as needed. - Link to v1: https://lore.kernel.org/r/20250117-461-corrupted-reftable-followup-v1-0-70ee605ae3fe@gmail.com --- Karthik Nayak (3): refs: mark `ref_transaction_update_reflog()` as static refs: use 'uint64_t' for 'ref_update.index' reftable: prevent 'update_index' changes after adding records refs.c | 24 ++++++++++++------ refs.h | 14 ----------- refs/refs-internal.h | 4 +-- refs/reftable-backend.c | 22 ++++++++++++----- reftable/reftable-error.h | 1 + reftable/reftable-writer.h | 24 ++++++++++-------- reftable/stack.c | 6 +++-- reftable/writer.c | 15 +++++++++++- t/unit-tests/t-reftable-stack.c | 54 ++++++++++++++++++++++++++++++++++++++--- 9 files changed, 118 insertions(+), 46 deletions(-) --- Range-diff versus v2: 1: 76e6da564a = 1: cafede9b10 refs: mark `ref_transaction_update_reflog()` as static 2: f8e2e81eb4 = 2: 737360d883 refs: use 'uint64_t' for 'ref_update.index' 3: ca92e29ecc ! 3: 01bf1d765f reftable: prevent 'update_index' changes after adding records @@ Commit message and footer values. The footer would contain the newer values while the header retained the original ones. - To fix this bug, prevent callers from updating these values after any - record is written. To do this, modify the function to return an error - whenever the limits are modified after any record adds. Check for record - adds within `reftable_writer_set_limits()` by checking the `last_key` - variable, which is set whenever a new record is added. + To protect against this bug, prevent callers from updating these values + after any record is written. To do this, modify the function to return + an error whenever the limits are modified after any record adds. Check + for record adds within `reftable_writer_set_limits()` by checking the + `last_key` and `next` variable. The former is updated after each record + added, but is reset at certain points. The latter is set after writing + the first block. Modify all callers of the function to anticipate a return type and - handle it accordingly. + handle it accordingly. Add a unit test to also ensure the function + returns the error as expected. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> @@ reftable/writer.c: int reftable_writer_new(struct reftable_writer **out, } -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, -- uint64_t max) +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, -+ uint64_t max) + uint64_t max) { + /* -+ * The limits should be set before any records are added to the writer. -+ * Check if any records were added by checking if `last_key` was set. ++ * Set the min/max update index limits for the reftable writer. ++ * This must be called before adding any records, since: ++ * - The 'next' field gets set after writing the first block. ++ * - The 'last_key' field updates with each new record (but resets ++ * after sections). ++ * Returns REFTABLE_API_ERROR if called after writing has begun. + */ -+ if (w->last_key.len) ++ if (w->next || w->last_key.len) + return REFTABLE_API_ERROR; + w->min_update_index = min; @@ t/unit-tests/t-reftable-stack.c: static void t_reflog_expire(void) return 0; } +@@ t/unit-tests/t-reftable-stack.c: static void t_reftable_stack_reload_with_missing_table(void) + clear_dir(dir); + } + ++static int write_limits_after_ref(struct reftable_writer *wr, void *arg) ++{ ++ struct reftable_ref_record *ref = arg; ++ check(!reftable_writer_set_limits(wr, ref->update_index, ref->update_index)); ++ check(!reftable_writer_add_ref(wr, ref)); ++ return reftable_writer_set_limits(wr, ref->update_index, ref->update_index); ++} ++ ++static void t_reftable_invalid_limit_updates(void) ++{ ++ struct reftable_ref_record ref = { ++ .refname = (char *) "HEAD", ++ .update_index = 1, ++ .value_type = REFTABLE_REF_SYMREF, ++ .value.symref = (char *) "master", ++ }; ++ struct reftable_write_options opts = { ++ .default_permissions = 0660, ++ }; ++ struct reftable_addition *add = NULL; ++ char *dir = get_tmp_dir(__LINE__); ++ struct reftable_stack *st = NULL; ++ int err; ++ ++ err = reftable_new_stack(&st, dir, &opts); ++ check(!err); ++ ++ reftable_addition_destroy(add); ++ ++ err = reftable_stack_new_addition(&add, st, 0); ++ check(!err); ++ ++ /* ++ * write_limits_after_ref also updates the update indexes after adding ++ * the record. This should cause an err to be returned, since the limits ++ * must be set at the start. ++ */ ++ err = reftable_addition_add(add, write_limits_after_ref, &ref); ++ check_int(err, ==, REFTABLE_API_ERROR); ++ ++ reftable_addition_destroy(add); ++ reftable_stack_destroy(st); ++ clear_dir(dir); ++} ++ + int cmd_main(int argc UNUSED, const char *argv[] UNUSED) + { + TEST(t_empty_add(), "empty addition to stack"); + TEST(t_read_file(), "read_lines works"); + TEST(t_reflog_expire(), "expire reflog entries"); ++ TEST(t_reftable_invalid_limit_updates(), "prevent limit updates after adding records"); + TEST(t_reftable_stack_add(), "add multiple refs and logs to stack"); + TEST(t_reftable_stack_add_one(), "add a single ref record to stack"); + TEST(t_reftable_stack_add_performs_auto_compaction(), "addition to stack triggers auto-compaction"); --- base-commit: a5aa44e7930761cb900813d971b4105f901818fb change-id: 20250117-461-corrupted-reftable-followup-eb0e4fd1a723 Thanks - Karthik ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static 2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak @ 2025-01-22 5:35 ` Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak 2 siblings, 0 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-22 5:35 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, Patrick Steinhardt, Junio C Hamano The `ref_transaction_update_reflog()` function is only used within 'refs.c', so mark it as static. Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- refs.c | 22 +++++++++++++++------- refs.h | 14 -------------- 2 files changed, 15 insertions(+), 21 deletions(-) diff --git a/refs.c b/refs.c index f7b6f0f897eb58665e10a2efd3eb53c3f72abe61..ad6d774717150f1fe68a59c629e05e49a469693f 100644 --- a/refs.c +++ b/refs.c @@ -1318,13 +1318,21 @@ int ref_transaction_update(struct ref_transaction *transaction, return 0; } -int ref_transaction_update_reflog(struct ref_transaction *transaction, - const char *refname, - const struct object_id *new_oid, - const struct object_id *old_oid, - const char *committer_info, unsigned int flags, - const char *msg, unsigned int index, - struct strbuf *err) +/* + * Similar to`ref_transaction_update`, but this function is only for adding + * a reflog update. Supports providing custom committer information. The index + * field can be utiltized to order updates as desired. When not used, the + * updates default to being ordered by refname. + */ +static int ref_transaction_update_reflog(struct ref_transaction *transaction, + const char *refname, + const struct object_id *new_oid, + const struct object_id *old_oid, + const char *committer_info, + unsigned int flags, + const char *msg, + unsigned int index, + struct strbuf *err) { struct ref_update *update; diff --git a/refs.h b/refs.h index a0cdd99250e8286b55808b697b0a94afac5d8319..09be47afbee51e99f4ae49588cd65596ccfcb07e 100644 --- a/refs.h +++ b/refs.h @@ -771,20 +771,6 @@ int ref_transaction_update(struct ref_transaction *transaction, unsigned int flags, const char *msg, struct strbuf *err); -/* - * Similar to`ref_transaction_update`, but this function is only for adding - * a reflog update. Supports providing custom committer information. The index - * field can be utiltized to order updates as desired. When not used, the - * updates default to being ordered by refname. - */ -int ref_transaction_update_reflog(struct ref_transaction *transaction, - const char *refname, - const struct object_id *new_oid, - const struct object_id *old_oid, - const char *committer_info, unsigned int flags, - const char *msg, unsigned int index, - struct strbuf *err); - /* * Add a reference creation to transaction. new_oid is the value that * the reference should have after the update; it must not be -- 2.47.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 2/3] refs: use 'uint64_t' for 'ref_update.index' 2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak @ 2025-01-22 5:35 ` Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak 2 siblings, 0 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-22 5:35 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, Patrick Steinhardt, brian m. carlson The 'ref_update.index' variable is used to store an index for a given reference update. This index is used to order the updates in a predetermined order, while the default ordering is alphabetical as per the refname. For large repositories with millions of references, it should be safer to use 'uint64_t'. Let's do that. This also is applied for all other code sections where we store 'index' and pass it around. Reported-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- refs.c | 4 ++-- refs/refs-internal.h | 4 ++-- refs/reftable-backend.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/refs.c b/refs.c index ad6d774717150f1fe68a59c629e05e49a469693f..ef04f403a6a3a34f9156b4cf68c3daa29c9cbad6 100644 --- a/refs.c +++ b/refs.c @@ -1331,7 +1331,7 @@ static int ref_transaction_update_reflog(struct ref_transaction *transaction, const char *committer_info, unsigned int flags, const char *msg, - unsigned int index, + uint64_t index, struct strbuf *err) { struct ref_update *update; @@ -2813,7 +2813,7 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con } struct reflog_migration_data { - unsigned int index; + uint64_t index; const char *refname; struct ref_store *old_refs; struct ref_transaction *transaction; diff --git a/refs/refs-internal.h b/refs/refs-internal.h index aaab711bb96844755dfa600d37efdb25b30a0765..8894b43d1d1a327d404d3923c507d2d39649de19 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -120,7 +120,7 @@ struct ref_update { * when migrating reflogs and we want to ensure we carry over the * same order. */ - unsigned int index; + uint64_t index; /* * If this ref_update was split off of a symref update via @@ -203,7 +203,7 @@ struct ref_transaction { enum ref_transaction_state state; void *backend_data; unsigned int flags; - unsigned int max_index; + uint64_t max_index; }; /* diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 289496058e6eb4d3e3aef96ca70219cd4ff78eae..6814c87bc618229ac8a70b904be3f850371ad876 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -942,7 +942,7 @@ struct write_transaction_table_arg { size_t updates_nr; size_t updates_alloc; size_t updates_expected; - unsigned int max_index; + uint64_t max_index; }; struct reftable_transaction_data { -- 2.47.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak @ 2025-01-22 5:35 ` Karthik Nayak 2025-01-22 12:12 ` Patrick Steinhardt 2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King 2 siblings, 2 replies; 30+ messages in thread From: Karthik Nayak @ 2025-01-22 5:35 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, Patrick Steinhardt The function `reftable_writer_set_limits()` allows updating the 'min_update_index' and 'max_update_index' of a reftable writer. These values are written to both the writer's header and footer. Since the header is written during the first block write, any subsequent changes to the update index would create a mismatch between the header and footer values. The footer would contain the newer values while the header retained the original ones. To protect against this bug, prevent callers from updating these values after any record is written. To do this, modify the function to return an error whenever the limits are modified after any record adds. Check for record adds within `reftable_writer_set_limits()` by checking the `last_key` and `next` variable. The former is updated after each record added, but is reset at certain points. The latter is set after writing the first block. Modify all callers of the function to anticipate a return type and handle it accordingly. Add a unit test to also ensure the function returns the error as expected. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- refs/reftable-backend.c | 20 +++++++++++---- reftable/reftable-error.h | 1 + reftable/reftable-writer.h | 24 ++++++++++-------- reftable/stack.c | 6 +++-- reftable/writer.c | 15 +++++++++++- t/unit-tests/t-reftable-stack.c | 54 ++++++++++++++++++++++++++++++++++++++--- 6 files changed, 99 insertions(+), 21 deletions(-) diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 6814c87bc618229ac8a70b904be3f850371ad876..9cfb0cb26721a9425c3b4a374f7b41e192037315 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -1443,7 +1443,9 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data * multiple entries. Each entry will contain a different update_index, * so set the limits accordingly. */ - reftable_writer_set_limits(writer, ts, ts + arg->max_index); + ret = reftable_writer_set_limits(writer, ts, ts + arg->max_index); + if (ret < 0) + goto done; for (i = 0; i < arg->updates_nr; i++) { struct reftable_transaction_update *tx_update = &arg->updates[i]; @@ -1766,7 +1768,9 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data) deletion_ts = creation_ts = reftable_stack_next_update_index(arg->be->stack); if (arg->delete_old) creation_ts++; - reftable_writer_set_limits(writer, deletion_ts, creation_ts); + ret = reftable_writer_set_limits(writer, deletion_ts, creation_ts); + if (ret < 0) + goto done; /* * Add the new reference. If this is a rename then we also delete the @@ -2298,7 +2302,9 @@ static int write_reflog_existence_table(struct reftable_writer *writer, if (ret <= 0) goto done; - reftable_writer_set_limits(writer, ts, ts); + ret = reftable_writer_set_limits(writer, ts, ts); + if (ret < 0) + goto done; /* * The existence entry has both old and new object ID set to the @@ -2357,7 +2363,9 @@ static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_da uint64_t ts = reftable_stack_next_update_index(arg->stack); int ret; - reftable_writer_set_limits(writer, ts, ts); + ret = reftable_writer_set_limits(writer, ts, ts); + if (ret < 0) + goto out; ret = reftable_stack_init_log_iterator(arg->stack, &it); if (ret < 0) @@ -2434,7 +2442,9 @@ static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_da if (arg->records[i].value_type == REFTABLE_LOG_UPDATE) live_records++; - reftable_writer_set_limits(writer, ts, ts); + ret = reftable_writer_set_limits(writer, ts, ts); + if (ret < 0) + return ret; if (!is_null_oid(&arg->update_oid)) { struct reftable_ref_record ref = {0}; diff --git a/reftable/reftable-error.h b/reftable/reftable-error.h index f4048265629fe456207b88620658193f770a84f0..a7e33d964d0cfe5546f588d26c0fcb66ab326828 100644 --- a/reftable/reftable-error.h +++ b/reftable/reftable-error.h @@ -30,6 +30,7 @@ enum reftable_error { /* Misuse of the API: * - on writing a record with NULL refname. + * - on writing a record before setting the writer limits. * - on writing a reftable_ref_record outside the table limits * - on writing a ref or log record before the stack's * next_update_inde*x diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644 --- a/reftable/reftable-writer.h +++ b/reftable/reftable-writer.h @@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out, int (*flush_func)(void *), void *writer_arg, const struct reftable_write_options *opts); -/* Set the range of update indices for the records we will add. When writing a - table into a stack, the min should be at least - reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. - - For transactional updates to a stack, typically min==max, and the - update_index can be obtained by inspeciting the stack. When converting an - existing ref database into a single reftable, this would be a range of - update-index timestamps. +/* + * Set the range of update indices for the records we will add. When writing a + * table into a stack, the min should be at least + * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. + * + * For transactional updates to a stack, typically min==max, and the + * update_index can be obtained by inspeciting the stack. When converting an + * existing ref database into a single reftable, this would be a range of + * update-index timestamps. + * + * The function should be called before adding any records to the writer. If not + * it will fail with REFTABLE_API_ERROR. */ -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, - uint64_t max); +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, + uint64_t max); /* Add a reftable_ref_record. The record should have names that come after diff --git a/reftable/stack.c b/reftable/stack.c index 531660a49f0948c33041831ee0d740feacb22b2f..9649dbbb04c51e106ee752f14481bbad381cb348 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -1058,8 +1058,10 @@ static int stack_write_compact(struct reftable_stack *st, for (size_t i = first; i <= last; i++) st->stats.bytes += st->readers[i]->size; - reftable_writer_set_limits(wr, st->readers[first]->min_update_index, - st->readers[last]->max_update_index); + err = reftable_writer_set_limits(wr, st->readers[first]->min_update_index, + st->readers[last]->max_update_index); + if (err < 0) + goto done; err = reftable_merged_table_new(&mt, st->readers + first, subtabs_len, st->opts.hash_id); diff --git a/reftable/writer.c b/reftable/writer.c index 740c98038eaf883258bef4988f78977ac7e4a75a..76e24018172fc1d80a6535698757979e53b0e213 100644 --- a/reftable/writer.c +++ b/reftable/writer.c @@ -179,11 +179,24 @@ int reftable_writer_new(struct reftable_writer **out, return 0; } -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, uint64_t max) { + /* + * Set the min/max update index limits for the reftable writer. + * This must be called before adding any records, since: + * - The 'next' field gets set after writing the first block. + * - The 'last_key' field updates with each new record (but resets + * after sections). + * Returns REFTABLE_API_ERROR if called after writing has begun. + */ + if (w->next || w->last_key.len) + return REFTABLE_API_ERROR; + w->min_update_index = min; w->max_update_index = max; + + return 0; } static void writer_release(struct reftable_writer *w) diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c index aeec195b2b1014445d71c5db39a9795017fd8ff2..c3f0059c346edbe1ad543c9832959c6fc0aa9180 100644 --- a/t/unit-tests/t-reftable-stack.c +++ b/t/unit-tests/t-reftable-stack.c @@ -103,7 +103,8 @@ static void t_read_file(void) static int write_test_ref(struct reftable_writer *wr, void *arg) { struct reftable_ref_record *ref = arg; - reftable_writer_set_limits(wr, ref->update_index, ref->update_index); + check(!reftable_writer_set_limits(wr, ref->update_index, + ref->update_index)); return reftable_writer_add_ref(wr, ref); } @@ -143,7 +144,8 @@ static int write_test_log(struct reftable_writer *wr, void *arg) { struct write_log_arg *wla = arg; - reftable_writer_set_limits(wr, wla->update_index, wla->update_index); + check(!reftable_writer_set_limits(wr, wla->update_index, + wla->update_index)); return reftable_writer_add_log(wr, wla->log); } @@ -961,7 +963,7 @@ static void t_reflog_expire(void) static int write_nothing(struct reftable_writer *wr, void *arg UNUSED) { - reftable_writer_set_limits(wr, 1, 1); + check(!reftable_writer_set_limits(wr, 1, 1)); return 0; } @@ -1369,11 +1371,57 @@ static void t_reftable_stack_reload_with_missing_table(void) clear_dir(dir); } +static int write_limits_after_ref(struct reftable_writer *wr, void *arg) +{ + struct reftable_ref_record *ref = arg; + check(!reftable_writer_set_limits(wr, ref->update_index, ref->update_index)); + check(!reftable_writer_add_ref(wr, ref)); + return reftable_writer_set_limits(wr, ref->update_index, ref->update_index); +} + +static void t_reftable_invalid_limit_updates(void) +{ + struct reftable_ref_record ref = { + .refname = (char *) "HEAD", + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value.symref = (char *) "master", + }; + struct reftable_write_options opts = { + .default_permissions = 0660, + }; + struct reftable_addition *add = NULL; + char *dir = get_tmp_dir(__LINE__); + struct reftable_stack *st = NULL; + int err; + + err = reftable_new_stack(&st, dir, &opts); + check(!err); + + reftable_addition_destroy(add); + + err = reftable_stack_new_addition(&add, st, 0); + check(!err); + + /* + * write_limits_after_ref also updates the update indexes after adding + * the record. This should cause an err to be returned, since the limits + * must be set at the start. + */ + err = reftable_addition_add(add, write_limits_after_ref, &ref); + check_int(err, ==, REFTABLE_API_ERROR); + + reftable_addition_destroy(add); + reftable_stack_destroy(st); + clear_dir(dir); +} + int cmd_main(int argc UNUSED, const char *argv[] UNUSED) { TEST(t_empty_add(), "empty addition to stack"); TEST(t_read_file(), "read_lines works"); TEST(t_reflog_expire(), "expire reflog entries"); + TEST(t_reftable_invalid_limit_updates(), "prevent limit updates after adding records"); TEST(t_reftable_stack_add(), "add multiple refs and logs to stack"); TEST(t_reftable_stack_add_one(), "add a single ref record to stack"); TEST(t_reftable_stack_add_performs_auto_compaction(), "addition to stack triggers auto-compaction"); -- 2.47.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak @ 2025-01-22 12:12 ` Patrick Steinhardt 2025-01-22 17:50 ` Junio C Hamano 2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King 1 sibling, 1 reply; 30+ messages in thread From: Patrick Steinhardt @ 2025-01-22 12:12 UTC (permalink / raw) To: Karthik Nayak; +Cc: git On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote: > diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c > index aeec195b2b1014445d71c5db39a9795017fd8ff2..c3f0059c346edbe1ad543c9832959c6fc0aa9180 100644 > --- a/t/unit-tests/t-reftable-stack.c > +++ b/t/unit-tests/t-reftable-stack.c > @@ -1369,11 +1371,57 @@ static void t_reftable_stack_reload_with_missing_table(void) > clear_dir(dir); > } > > +static int write_limits_after_ref(struct reftable_writer *wr, void *arg) > +{ > + struct reftable_ref_record *ref = arg; > + check(!reftable_writer_set_limits(wr, ref->update_index, ref->update_index)); > + check(!reftable_writer_add_ref(wr, ref)); > + return reftable_writer_set_limits(wr, ref->update_index, ref->update_index); > +} Nice. > +static void t_reftable_invalid_limit_updates(void) > +{ > + struct reftable_ref_record ref = { > + .refname = (char *) "HEAD", > + .update_index = 1, > + .value_type = REFTABLE_REF_SYMREF, > + .value.symref = (char *) "master", > + }; > + struct reftable_write_options opts = { > + .default_permissions = 0660, > + }; Nit: it's unnecessary to pass write options. Other than that the test looks good to me, and this nit isn't worth a reroll. Thanks for working on this! Patrick ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-01-22 12:12 ` Patrick Steinhardt @ 2025-01-22 17:50 ` Junio C Hamano 2025-01-22 21:57 ` Junio C Hamano 0 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2025-01-22 17:50 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Karthik Nayak, git Patrick Steinhardt <ps@pks.im> writes: >> +static void t_reftable_invalid_limit_updates(void) >> +{ >> + struct reftable_ref_record ref = { >> + .refname = (char *) "HEAD", >> + .update_index = 1, >> + .value_type = REFTABLE_REF_SYMREF, >> + .value.symref = (char *) "master", >> + }; >> + struct reftable_write_options opts = { >> + .default_permissions = 0660, >> + }; > > Nit: it's unnecessary to pass write options. Other than that the test > looks good to me, and this nit isn't worth a reroll. This write_options opts is used later in this call. err = reftable_new_stack(&st, dir, &opts); and "git grep reftable_new_stack" finds many hits, almost all in t/unit-tests/ hierarchy, only two among many of them passing NULL as the "use the default set of options" signal. And majority of them initialize their opts like so: struct reftable_write_options opts = { 0 }; So I agree that this one should pass NULL to be more explicit that we do not exercise any special features from the API, but so should many other existing callers that pass such meaningless &opts, I would think. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-01-22 17:50 ` Junio C Hamano @ 2025-01-22 21:57 ` Junio C Hamano 0 siblings, 0 replies; 30+ messages in thread From: Junio C Hamano @ 2025-01-22 21:57 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Karthik Nayak, git Junio C Hamano <gitster@pobox.com> writes: >> Nit: it's unnecessary to pass write options. Other than that the test >> looks good to me, and this nit isn't worth a reroll. > > This write_options opts is used later in this call. > > err = reftable_new_stack(&st, dir, &opts); > > and "git grep reftable_new_stack" finds many hits, almost all in > t/unit-tests/ hierarchy, only two among many of them passing NULL as > the "use the default set of options" signal. And majority of them > initialize their opts like so: > > struct reftable_write_options opts = { 0 }; > > So I agree that this one should pass NULL to be more explicit that > we do not exercise any special features from the API, but so should > many other existing callers that pass such meaningless &opts, I > would think. I may have been vague, but what I meant was that cleaning it up to pass NULL when &opts is unnecessary can and probably should be left outside the scope of this miniseries, and done with a clean-up patch that is separate. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak 2025-01-22 12:12 ` Patrick Steinhardt @ 2025-02-01 2:24 ` Jeff King 2025-02-01 10:33 ` Phillip Wood ` (2 more replies) 1 sibling, 3 replies; 30+ messages in thread From: Jeff King @ 2025-02-01 2:24 UTC (permalink / raw) To: Karthik Nayak; +Cc: git, Patrick Steinhardt On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote: > +static void t_reftable_invalid_limit_updates(void) > +{ > + struct reftable_ref_record ref = { > + .refname = (char *) "HEAD", > + .update_index = 1, > + .value_type = REFTABLE_REF_SYMREF, > + .value.symref = (char *) "master", > + }; > + struct reftable_write_options opts = { > + .default_permissions = 0660, > + }; > + struct reftable_addition *add = NULL; > + char *dir = get_tmp_dir(__LINE__); > + struct reftable_stack *st = NULL; > + int err; > + > + err = reftable_new_stack(&st, dir, &opts); > + check(!err); > + > + reftable_addition_destroy(add); > + > + err = reftable_stack_new_addition(&add, st, 0); > + check(!err); Coverity complains that this function may have undefined behavior. It's an issue we have in a lot of other tests that have moved to the unit-test framework. I've mostly been ignoring it, but this is a pretty straight-forward example, so I thought I'd write a note. The issue is that reftable_new_stack() might fail, leaving "st" as NULL. And then we feed it to reftable_stack_new_addition(), which dereferences it. In normal production code, we'd expect something like: if (err) return -1; to avoid running the rest of the function after the first error. But the test harness check() function doesn't return. It just complains to stdout and keeps running! So you'll get something like[1]: $ t/unit-tests/bin/t-reftable-stack ok 1 - empty addition to stack ok 2 - read_lines works ok 3 - expire reflog entries # check "!err" failed at t/unit-tests/t-reftable-stack.c:1404 Segmentation fault So...yes, we will probably notice that the test failed from the exit code. But it's not great when the harness itself barfs so had. Plus a compiler may be free to reorder things in a confusing way if it can see that "st" must never be NULL. It feels like we probably ought to return as soon as a check() fails. That does create other headaches, though. E.g., we'd potentially leak from an early return (which our LSan builds will complain about), meaning that test code needs to start doing the usual "goto out" type of cleanup. So I dunno. Maybe we just live with it. But it feels pretty ugly. -Peff [1] This would happen in practice if malloc() failed, but you can simulate it yourself like this, which is what I used to create the output above: diff --git a/reftable/stack.c b/reftable/stack.c index 026a9f9742..fe77132102 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -861,6 +861,11 @@ int reftable_stack_new_addition(struct reftable_addition **dest, int err = 0; struct reftable_addition empty = REFTABLE_ADDITION_INIT; + if (flags & (1 << 16)) { + *dest = NULL; + return REFTABLE_OUT_OF_MEMORY_ERROR; + } + REFTABLE_CALLOC_ARRAY(*dest, 1); if (!*dest) return REFTABLE_OUT_OF_MEMORY_ERROR; diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c index c3f0059c34..73ed9792a5 100644 --- a/t/unit-tests/t-reftable-stack.c +++ b/t/unit-tests/t-reftable-stack.c @@ -1400,7 +1400,7 @@ static void t_reftable_invalid_limit_updates(void) reftable_addition_destroy(add); - err = reftable_stack_new_addition(&add, st, 0); + err = reftable_stack_new_addition(&add, st, (1 << 16)); check(!err); /* ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King @ 2025-02-01 10:33 ` Phillip Wood 2025-02-03 5:41 ` Patrick Steinhardt 2025-02-03 15:37 ` Jeff King 2025-02-03 5:40 ` Patrick Steinhardt 2025-02-03 15:20 ` Karthik Nayak 2 siblings, 2 replies; 30+ messages in thread From: Phillip Wood @ 2025-02-01 10:33 UTC (permalink / raw) To: Jeff King, Karthik Nayak; +Cc: git, Patrick Steinhardt Hi Peff On 01/02/2025 02:24, Jeff King wrote: > On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote: > > Coverity complains that this function may have undefined behavior. It's > an issue we have in a lot of other tests that have moved to the > unit-test framework. I've mostly been ignoring it, but this is a pretty > straight-forward example, so I thought I'd write a note. > > The issue is that reftable_new_stack() might fail, leaving "st" as NULL. > And then we feed it to reftable_stack_new_addition(), which dereferences > it. > > In normal production code, we'd expect something like: > > if (err) > return -1; > > to avoid running the rest of the function after the first error. But the > test harness check() function doesn't return. It just complains to > stdout and keeps running! That is to allow the test to add more context with test_msg() or do things like check all the members of a struct before returning. It is a bug in the test if it does not return after finding a NULL pointer, the correct usage is if (!check(ptr)) return; As we're in the process of switching to using clar which does exit the text function if a check fails (that means there may be leaks on failure but if the test is failing then I don't think we should be worrying about leaks) I don't know if it is worth fixing these or not. I guess it depends if there are the list of targets for Seyi's Outreachy project. Best Wishes Phillip So you'll get something like[1]: > > $ t/unit-tests/bin/t-reftable-stack > ok 1 - empty addition to stack > ok 2 - read_lines works > ok 3 - expire reflog entries > # check "!err" failed at t/unit-tests/t-reftable-stack.c:1404 > Segmentation fault > > So...yes, we will probably notice that the test failed from the exit > code. But it's not great when the harness itself barfs so had. Plus a > compiler may be free to reorder things in a confusing way if it can see > that "st" must never be NULL. > > It feels like we probably ought to return as soon as a check() fails. > That does create other headaches, though. E.g., we'd potentially leak > from an early return (which our LSan builds will complain about), > meaning that test code needs to start doing the usual "goto out" type of > cleanup. > > So I dunno. Maybe we just live with it. But it feels pretty ugly. > > -Peff > > [1] This would happen in practice if malloc() failed, but you can > simulate it yourself like this, which is what I used to create the > output above: > > diff --git a/reftable/stack.c b/reftable/stack.c > index 026a9f9742..fe77132102 100644 > --- a/reftable/stack.c > +++ b/reftable/stack.c > @@ -861,6 +861,11 @@ int reftable_stack_new_addition(struct reftable_addition **dest, > int err = 0; > struct reftable_addition empty = REFTABLE_ADDITION_INIT; > > + if (flags & (1 << 16)) { > + *dest = NULL; > + return REFTABLE_OUT_OF_MEMORY_ERROR; > + } > + > REFTABLE_CALLOC_ARRAY(*dest, 1); > if (!*dest) > return REFTABLE_OUT_OF_MEMORY_ERROR; > diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c > index c3f0059c34..73ed9792a5 100644 > --- a/t/unit-tests/t-reftable-stack.c > +++ b/t/unit-tests/t-reftable-stack.c > @@ -1400,7 +1400,7 @@ static void t_reftable_invalid_limit_updates(void) > > reftable_addition_destroy(add); > > - err = reftable_stack_new_addition(&add, st, 0); > + err = reftable_stack_new_addition(&add, st, (1 << 16)); > check(!err); > > /* > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-02-01 10:33 ` Phillip Wood @ 2025-02-03 5:41 ` Patrick Steinhardt 2025-02-03 14:11 ` Junio C Hamano 2025-02-03 15:37 ` Jeff King 1 sibling, 1 reply; 30+ messages in thread From: Patrick Steinhardt @ 2025-02-03 5:41 UTC (permalink / raw) To: phillip.wood; +Cc: Jeff King, Karthik Nayak, git On Sat, Feb 01, 2025 at 10:33:13AM +0000, Phillip Wood wrote: > Hi Peff > > On 01/02/2025 02:24, Jeff King wrote: > > On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote: > > > > Coverity complains that this function may have undefined behavior. It's > > an issue we have in a lot of other tests that have moved to the > > unit-test framework. I've mostly been ignoring it, but this is a pretty > > straight-forward example, so I thought I'd write a note. > > > > The issue is that reftable_new_stack() might fail, leaving "st" as NULL. > > And then we feed it to reftable_stack_new_addition(), which dereferences > > it. > > > > In normal production code, we'd expect something like: > > > > if (err) > > return -1; > > > > to avoid running the rest of the function after the first error. But the > > test harness check() function doesn't return. It just complains to > > stdout and keeps running! > > That is to allow the test to add more context with test_msg() or do things > like check all the members of a struct before returning. It is a bug in the > test if it does not return after finding a NULL pointer, the correct usage > is > > if (!check(ptr)) > return; > > As we're in the process of switching to using clar which does exit the text > function if a check fails (that means there may be leaks on failure but if > the test is failing then I don't think we should be worrying about leaks) I > don't know if it is worth fixing these or not. I guess it depends if there > are the list of targets for Seyi's Outreachy project. Ah, yes, should've read your mail first, as you're saying basically the same as I did :) Patrick ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-02-03 5:41 ` Patrick Steinhardt @ 2025-02-03 14:11 ` Junio C Hamano 0 siblings, 0 replies; 30+ messages in thread From: Junio C Hamano @ 2025-02-03 14:11 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: phillip.wood, Jeff King, Karthik Nayak, git Patrick Steinhardt <ps@pks.im> writes: >> if (!check(ptr)) >> return; >> >> As we're in the process of switching to using clar which does exit the text >> function if a check fails (that means there may be leaks on failure but if >> the test is failing then I don't think we should be worrying about leaks) I >> don't know if it is worth fixing these or not. I guess it depends if there >> are the list of targets for Seyi's Outreachy project. > > Ah, yes, should've read your mail first, as you're saying basically the > same as I did :) Yup, I 100% agree with both of you. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-02-01 10:33 ` Phillip Wood 2025-02-03 5:41 ` Patrick Steinhardt @ 2025-02-03 15:37 ` Jeff King 1 sibling, 0 replies; 30+ messages in thread From: Jeff King @ 2025-02-03 15:37 UTC (permalink / raw) To: phillip.wood; +Cc: Karthik Nayak, git, Patrick Steinhardt On Sat, Feb 01, 2025 at 10:33:13AM +0000, Phillip Wood wrote: > > In normal production code, we'd expect something like: > > > > if (err) > > return -1; > > > > to avoid running the rest of the function after the first error. But the > > test harness check() function doesn't return. It just complains to > > stdout and keeps running! > > That is to allow the test to add more context with test_msg() or do things > like check all the members of a struct before returning. It is a bug in the > test if it does not return after finding a NULL pointer, the correct usage > is > > if (!check(ptr)) > return; > > As we're in the process of switching to using clar which does exit the text > function if a check fails (that means there may be leaks on failure but if > the test is failing then I don't think we should be worrying about leaks) I > don't know if it is worth fixing these or not. I guess it depends if there > are the list of targets for Seyi's Outreachy project. Ah, that's good to hear. I don't think there's any urgency here. These have been popping up since people started adding more unit-tests/ last summer. Waiting a few more months to switch to clar is probably not a big deal. I'm OK with ignoring a leak in a failing test. I do suspect that Coverity might still complain about the leaks, because it is doing static analysis to show that we _can_ leak (rather than the tests, which are seeing if we leaked at runtime). But I'm not sure how much effort we want to spend on making tests do cleanup on failure. Especially in a language like C. Anyway, I'll continue to ignore these Coverity results for now, then. ;) -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King 2025-02-01 10:33 ` Phillip Wood @ 2025-02-03 5:40 ` Patrick Steinhardt 2025-02-03 15:20 ` Karthik Nayak 2 siblings, 0 replies; 30+ messages in thread From: Patrick Steinhardt @ 2025-02-03 5:40 UTC (permalink / raw) To: Jeff King; +Cc: Karthik Nayak, git On Fri, Jan 31, 2025 at 09:24:09PM -0500, Jeff King wrote: > On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote: > > > +static void t_reftable_invalid_limit_updates(void) > > +{ > > + struct reftable_ref_record ref = { > > + .refname = (char *) "HEAD", > > + .update_index = 1, > > + .value_type = REFTABLE_REF_SYMREF, > > + .value.symref = (char *) "master", > > + }; > > + struct reftable_write_options opts = { > > + .default_permissions = 0660, > > + }; > > + struct reftable_addition *add = NULL; > > + char *dir = get_tmp_dir(__LINE__); > > + struct reftable_stack *st = NULL; > > + int err; > > + > > + err = reftable_new_stack(&st, dir, &opts); > > + check(!err); > > + > > + reftable_addition_destroy(add); > > + > > + err = reftable_stack_new_addition(&add, st, 0); > > + check(!err); > > Coverity complains that this function may have undefined behavior. It's > an issue we have in a lot of other tests that have moved to the > unit-test framework. I've mostly been ignoring it, but this is a pretty > straight-forward example, so I thought I'd write a note. > > The issue is that reftable_new_stack() might fail, leaving "st" as NULL. > And then we feed it to reftable_stack_new_addition(), which dereferences > it. > > In normal production code, we'd expect something like: > > if (err) > return -1; > > to avoid running the rest of the function after the first error. But the > test harness check() function doesn't return. It just complains to > stdout and keeps running! So you'll get something like[1]: > > $ t/unit-tests/bin/t-reftable-stack > ok 1 - empty addition to stack > ok 2 - read_lines works > ok 3 - expire reflog entries > # check "!err" failed at t/unit-tests/t-reftable-stack.c:1404 > Segmentation fault > > So...yes, we will probably notice that the test failed from the exit > code. But it's not great when the harness itself barfs so had. Plus a > compiler may be free to reorder things in a confusing way if it can see > that "st" must never be NULL. > > It feels like we probably ought to return as soon as a check() fails. > That does create other headaches, though. E.g., we'd potentially leak > from an early return (which our LSan builds will complain about), > meaning that test code needs to start doing the usual "goto out" type of > cleanup. > > So I dunno. Maybe we just live with it. But it feels pretty ugly. It's one of the pitfalls of our own testing framework, from my point of view. It's somewhat unexpected that the test would just continue running as this is the wrong thing to do in almost all cases. When assumptions fail, nothing good will come of it. In any case, issues like this will be fixed once we migrate those tests to the clar unit test framework. The default there is to abort the tests in case an assertion fails, which is much saner from my point of view. So I'm not sure if it is worth fixing this now, or whether refactoring the tests to use clar is the more sensible fix. Patrick ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King 2025-02-01 10:33 ` Phillip Wood 2025-02-03 5:40 ` Patrick Steinhardt @ 2025-02-03 15:20 ` Karthik Nayak 2025-02-03 15:38 ` Jeff King 2 siblings, 1 reply; 30+ messages in thread From: Karthik Nayak @ 2025-02-03 15:20 UTC (permalink / raw) To: Jeff King; +Cc: git, Patrick Steinhardt [-- Attachment #1: Type: text/plain, Size: 3925 bytes --] Jeff King <peff@peff.net> writes: > On Wed, Jan 22, 2025 at 06:35:49AM +0100, Karthik Nayak wrote: > >> +static void t_reftable_invalid_limit_updates(void) >> +{ >> + struct reftable_ref_record ref = { >> + .refname = (char *) "HEAD", >> + .update_index = 1, >> + .value_type = REFTABLE_REF_SYMREF, >> + .value.symref = (char *) "master", >> + }; >> + struct reftable_write_options opts = { >> + .default_permissions = 0660, >> + }; >> + struct reftable_addition *add = NULL; >> + char *dir = get_tmp_dir(__LINE__); >> + struct reftable_stack *st = NULL; >> + int err; >> + >> + err = reftable_new_stack(&st, dir, &opts); >> + check(!err); >> + >> + reftable_addition_destroy(add); >> + >> + err = reftable_stack_new_addition(&add, st, 0); >> + check(!err); > > Coverity complains that this function may have undefined behavior. It's > an issue we have in a lot of other tests that have moved to the > unit-test framework. I've mostly been ignoring it, but this is a pretty > straight-forward example, so I thought I'd write a note. > > The issue is that reftable_new_stack() might fail, leaving "st" as NULL. > And then we feed it to reftable_stack_new_addition(), which dereferences > it. > > In normal production code, we'd expect something like: > > if (err) > return -1; > > to avoid running the rest of the function after the first error. But the > test harness check() function doesn't return. It just complains to > stdout and keeps running! So you'll get something like[1]: > > $ t/unit-tests/bin/t-reftable-stack > ok 1 - empty addition to stack > ok 2 - read_lines works > ok 3 - expire reflog entries > # check "!err" failed at t/unit-tests/t-reftable-stack.c:1404 > Segmentation fault > > So...yes, we will probably notice that the test failed from the exit > code. But it's not great when the harness itself barfs so had. Plus a > compiler may be free to reorder things in a confusing way if it can see > that "st" must never be NULL. > > It feels like we probably ought to return as soon as a check() fails. > That does create other headaches, though. E.g., we'd potentially leak > from an early return (which our LSan builds will complain about), > meaning that test code needs to start doing the usual "goto out" type of > cleanup. > > So I dunno. Maybe we just live with it. But it feels pretty ugly. > Thanks for pointing it out, I didn't notice this, mostly as I was copying from existing test cases and it does seem like this (wrong) pattern exists in a lot of the tests. Like Phillip and Patrick mentioned, this should go away since we're moving to using the clar test framework. I think it makes sense to keep this as is to stay consistent with the rest of code in this file for now. It is ugly, but seems like that would be simpler while migrating. > -Peff > > [1] This would happen in practice if malloc() failed, but you can > simulate it yourself like this, which is what I used to create the > output above: > > diff --git a/reftable/stack.c b/reftable/stack.c > index 026a9f9742..fe77132102 100644 > --- a/reftable/stack.c > +++ b/reftable/stack.c > @@ -861,6 +861,11 @@ int reftable_stack_new_addition(struct reftable_addition **dest, > int err = 0; > struct reftable_addition empty = REFTABLE_ADDITION_INIT; > > + if (flags & (1 << 16)) { > + *dest = NULL; > + return REFTABLE_OUT_OF_MEMORY_ERROR; > + } > + > REFTABLE_CALLOC_ARRAY(*dest, 1); > if (!*dest) > return REFTABLE_OUT_OF_MEMORY_ERROR; > diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c > index c3f0059c34..73ed9792a5 100644 > --- a/t/unit-tests/t-reftable-stack.c > +++ b/t/unit-tests/t-reftable-stack.c > @@ -1400,7 +1400,7 @@ static void t_reftable_invalid_limit_updates(void) > > reftable_addition_destroy(add); > > - err = reftable_stack_new_addition(&add, st, 0); > + err = reftable_stack_new_addition(&add, st, (1 << 16)); > check(!err); > > /* [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: undefined behavior in unit tests, was Re: [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records 2025-02-03 15:20 ` Karthik Nayak @ 2025-02-03 15:38 ` Jeff King 0 siblings, 0 replies; 30+ messages in thread From: Jeff King @ 2025-02-03 15:38 UTC (permalink / raw) To: Karthik Nayak; +Cc: git, Patrick Steinhardt On Mon, Feb 03, 2025 at 07:20:16AM -0800, Karthik Nayak wrote: > Like Phillip and Patrick mentioned, this should go away since we're > moving to using the clar test framework. I think it makes sense to keep > this as is to stay consistent with the rest of code in this file for > now. It is ugly, but seems like that would be simpler while migrating. Yeah, definitely not worth addressing this single case. I was more interested in the overall trend, but it sounds like there are plans there already. -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2025-02-03 15:38 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-17 7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak 2025-01-17 7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak 2025-01-17 9:29 ` Patrick Steinhardt 2025-01-20 11:17 ` Karthik Nayak 2025-01-17 7:59 ` [PATCH 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak 2025-01-17 7:59 ` [PATCH 3/3] reftable: prevent 'update_index' changes after header write Karthik Nayak 2025-01-17 9:29 ` Patrick Steinhardt 2025-01-20 11:47 ` Karthik Nayak 2025-01-20 12:18 ` Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak 2025-01-21 3:34 ` [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak 2025-01-21 6:56 ` Patrick Steinhardt 2025-01-21 11:44 ` Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak 2025-01-22 5:35 ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak 2025-01-22 12:12 ` Patrick Steinhardt 2025-01-22 17:50 ` Junio C Hamano 2025-01-22 21:57 ` Junio C Hamano 2025-02-01 2:24 ` undefined behavior in unit tests, was " Jeff King 2025-02-01 10:33 ` Phillip Wood 2025-02-03 5:41 ` Patrick Steinhardt 2025-02-03 14:11 ` Junio C Hamano 2025-02-03 15:37 ` Jeff King 2025-02-03 5:40 ` Patrick Steinhardt 2025-02-03 15:20 ` Karthik Nayak 2025-02-03 15:38 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).