git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Karthik Nayak <karthik.188@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records
Date: Tue, 21 Jan 2025 07:56:29 +0100	[thread overview]
Message-ID: <Z49FHQgsFSX6sTxu@pks.im> (raw)
In-Reply-To: <20250121-461-corrupted-reftable-followup-v2-3-37e26c7a79b4@gmail.com>

On Tue, Jan 21, 2025 at 04:34:12AM +0100, Karthik Nayak wrote:
> The function `reftable_writer_set_limits()` allows updating the
> 'min_update_index' and 'max_update_index' of a reftable writer. These
> values are written to both the writer's header and footer.
> 
> Since the header is written during the first block write, any subsequent
> changes to the update index would create a mismatch between the header
> and footer values. The footer would contain the newer values while the
> header retained the original ones.
> 
> To fix this bug, prevent callers from updating these values after any

Nit: it's not really fixing a bug, but protecting us against it. Not
worth a reroll though, from my point of view.

> diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
> index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644
> --- a/reftable/reftable-writer.h
> +++ b/reftable/reftable-writer.h
> @@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out,
>  			int (*flush_func)(void *),
>  			void *writer_arg, const struct reftable_write_options *opts);
>  
> -/* Set the range of update indices for the records we will add. When writing a
> -   table into a stack, the min should be at least
> -   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
> -
> -   For transactional updates to a stack, typically min==max, and the
> -   update_index can be obtained by inspeciting the stack. When converting an
> -   existing ref database into a single reftable, this would be a range of
> -   update-index timestamps.
> +/*
> + * Set the range of update indices for the records we will add. When writing a
> + * table into a stack, the min should be at least
> + * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
> + *
> + * For transactional updates to a stack, typically min==max, and the
> + * update_index can be obtained by inspeciting the stack. When converting an
> + * existing ref database into a single reftable, this would be a range of
> + * update-index timestamps.
> + *
> + * The function should be called before adding any records to the writer. If not
> + * it will fail with REFTABLE_API_ERROR.
>   */

Thanks for updating this. I think the reftable library is one of those
code areas where it makes sense to sneak in a formatting fix every now
and then because its coding style is quite alien to Git's own in some
places. We could also do it all in one go, but I strongly doubt that it
would be worth the churn.

> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> -				uint64_t max);
> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> +			       uint64_t max);
>  
>  /*
>    Add a reftable_ref_record. The record should have names that come after

> diff --git a/reftable/writer.c b/reftable/writer.c
> index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644
> --- a/reftable/writer.c
> +++ b/reftable/writer.c
> @@ -179,11 +179,20 @@ int reftable_writer_new(struct reftable_writer **out,
>  	return 0;
>  }
>  
> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> -				uint64_t max)
> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> +			       uint64_t max)
>  {
> +	/*
> +	 * The limits should be set before any records are added to the writer.
> +	 * Check if any records were added by checking if `last_key` was set.
> +	 */
> +	if (w->last_key.len)
> +		return REFTABLE_API_ERROR;

Hm. Using the last key feels somewhat dangerous to me as it does get
reset at times, e.g. when finishing writing the current section. It
_should_ work, but overall it just feels a tad to disconnected from the
thing that we actually want to check.

How about we instead use `next`? This variable records the offset of the
next block we're about to write, and `writer_flush_nonempty_block()`
uses it directly to check whether we're currently writing the first
block in order to decide whether it needs to write a header or not. If
it's 0, we know that we haven't written the first block yet. That feels
much closer aligned with what we're checking.

> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
> index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644
> --- a/t/unit-tests/t-reftable-stack.c
> +++ b/t/unit-tests/t-reftable-stack.c

Can we maybe add a unit test that demonstrates the error?

Patrick

  reply	other threads:[~2025-01-21  6:56 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-17  7:59 [PATCH 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-17  7:59 ` [PATCH 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
2025-01-17  9:29   ` Patrick Steinhardt
2025-01-20 11:17     ` Karthik Nayak
2025-01-17  7:59 ` [PATCH 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
2025-01-17  7:59 ` [PATCH 3/3] reftable: prevent 'update_index' changes after header write Karthik Nayak
2025-01-17  9:29   ` Patrick Steinhardt
2025-01-20 11:47     ` Karthik Nayak
2025-01-20 12:18       ` Karthik Nayak
2025-01-21  3:34 ` [PATCH v2 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-21  3:34   ` [PATCH v2 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
2025-01-21  3:34   ` [PATCH v2 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
2025-01-21  3:34   ` [PATCH v2 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
2025-01-21  6:56     ` Patrick Steinhardt [this message]
2025-01-21 11:44       ` Karthik Nayak
2025-01-22  5:35   ` [PATCH v3 0/3] refs: small followups to the migration corruption fix Karthik Nayak
2025-01-22  5:35     ` [PATCH v3 1/3] refs: mark `ref_transaction_update_reflog()` as static Karthik Nayak
2025-01-22  5:35     ` [PATCH v3 2/3] refs: use 'uint64_t' for 'ref_update.index' Karthik Nayak
2025-01-22  5:35     ` [PATCH v3 3/3] reftable: prevent 'update_index' changes after adding records Karthik Nayak
2025-01-22 12:12       ` Patrick Steinhardt
2025-01-22 17:50         ` Junio C Hamano
2025-01-22 21:57           ` Junio C Hamano
2025-02-01  2:24       ` undefined behavior in unit tests, was " Jeff King
2025-02-01 10:33         ` Phillip Wood
2025-02-03  5:41           ` Patrick Steinhardt
2025-02-03 14:11             ` Junio C Hamano
2025-02-03 15:37           ` Jeff King
2025-02-03  5:40         ` Patrick Steinhardt
2025-02-03 15:20         ` Karthik Nayak
2025-02-03 15:38           ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z49FHQgsFSX6sTxu@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=karthik.188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).