public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org
Subject: Re: [PATCH 07/17] bcachefs: Make journal replay more efficient
Date: Tue, 14 Nov 2023 08:19:04 -0500	[thread overview]
Message-ID: <ZVNzyMRK6BHeEIiS@bfoster> (raw)
In-Reply-To: <20231110163157.2736111-8-kent.overstreet@linux.dev>

On Fri, Nov 10, 2023 at 11:31:44AM -0500, Kent Overstreet wrote:
> Journal replay now first attempts to replay keys in sorted order,
> similar to how the btree write buffer flush path works.
> 
> Any keys that can not be replayed due to journal deadlock are then left
> for later and replayed in journal order, unpinning journal entries as we
> go.
> 
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> ---

Just a question and a nit..

>  fs/bcachefs/errcode.h  |  1 -
>  fs/bcachefs/recovery.c | 84 ++++++++++++++++++++++++++++--------------
>  2 files changed, 56 insertions(+), 29 deletions(-)
> 
...
> diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
> index 6eafff2557a4..3524d1e0003e 100644
> --- a/fs/bcachefs/recovery.c
> +++ b/fs/bcachefs/recovery.c
...
> @@ -166,26 +159,61 @@ static int bch2_journal_replay(struct bch_fs *c)
>  			goto err;
>  	}
>  
> -	for (i = 0; i < keys->nr; i++) {
> -		k = keys_sorted[i];
> +	/*
> +	 * First, attempt to replay keys in sorted order. This is more
> +	 * efficient, but some might fail if that would cause a journal
> +	 * deadlock.
> +	 */

What does "sorted order" mean here vs. when we sort by journal seq
below?

> +	for (size_t i = 0; i < keys->nr; i++) {
> +		cond_resched();
> +
> +		struct journal_key *k = keys->d + i;
> +
> +		ret = commit_do(trans, NULL, NULL,
> +				BTREE_INSERT_NOFAIL|
> +				BTREE_INSERT_JOURNAL_RECLAIM|
> +				(!k->allocated ? BTREE_INSERT_JOURNAL_REPLAY : 0),
> +			     bch2_journal_replay_key(trans, k));
> +		BUG_ON(!ret && !k->overwritten);
> +		if (ret) {
> +			ret = darray_push(&keys_sorted, k);
> +			if (ret)
> +				goto err;
> +		}
> +	}
> +
> +	/*
> +	 * Now, replay any remaining keys in the order in which they appear in
> +	 * the journal, unpinning those journal entries as we go:
> +	 */
> +	sort(keys_sorted.data, keys_sorted.nr,
> +	     sizeof(keys_sorted.data[0]),
> +	     journal_sort_seq_cmp, NULL);
>  
> +	darray_for_each(keys_sorted, kp) {
>  		cond_resched();
>  
> +		struct journal_key *k = *kp;
> +
>  		replay_now_at(j, k->journal_seq);
>  
> -		ret = bch2_trans_do(c, NULL, NULL,
> -				    BTREE_INSERT_NOFAIL|
> -				    (!k->allocated
> -				     ? BTREE_INSERT_JOURNAL_REPLAY|BCH_WATERMARK_reclaim
> -				     : 0),
> +		ret = commit_do(trans, NULL, NULL,
> +				BTREE_INSERT_NOFAIL|
> +				(!k->allocated
> +				 ? BTREE_INSERT_JOURNAL_REPLAY|BCH_WATERMARK_reclaim
> +				 : 0),
>  			     bch2_journal_replay_key(trans, k));
> -		if (ret) {
> -			bch_err(c, "journal replay: error while replaying key at btree %s level %u: %s",
> -				bch2_btree_id_str(k->btree_id), k->level, bch2_err_str(ret));
> +		bch_err_msg(c, ret, "while replaying key at btree %s level %u:",
> +			    bch2_btree_id_str(k->btree_id), k->level);
> +		if (ret)
>  			goto err;
> -		}
> +
> +		BUG_ON(!k->overwritten);
>  	}
>  
> +	bch2_trans_put(trans);
> +	trans = NULL;
> +

Nit: This looks spurious w/ the exit path, just FWIW.

Brian

>  	replay_now_at(j, j->replay_journal_seq_end);
>  	j->replay_journal_seq = 0;
>  
> @@ -196,10 +224,10 @@ static int bch2_journal_replay(struct bch_fs *c)
>  	if (keys->nr && !ret)
>  		bch2_journal_log_msg(c, "journal replay finished");
>  err:
> -	kvfree(keys_sorted);
> -
> -	if (ret)
> -		bch_err_fn(c, ret);
> +	if (trans)
> +		bch2_trans_put(trans);
> +	darray_exit(&keys_sorted);
> +	bch_err_fn(c, ret);
>  	return ret;
>  }
>  
> -- 
> 2.42.0
> 
> 


  reply	other threads:[~2023-11-14 13:18 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-10 16:31 [PATCH 00/17] btree write buffer & journal optimizations Kent Overstreet
2023-11-10 16:31 ` [PATCH 01/17] bcachefs: Kill journal pre-reservations Kent Overstreet
2023-11-10 16:31 ` [PATCH 02/17] bcachefs: track_event_change() Kent Overstreet
2023-11-10 16:31 ` [PATCH 03/17] bcachefs: Journal pins must always have a flush_fn Kent Overstreet
2023-11-13 15:22   ` Brian Foster
2023-11-13 16:36     ` Kent Overstreet
2023-11-13 17:08       ` Brian Foster
2023-11-10 16:31 ` [PATCH 04/17] bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res" Kent Overstreet
2023-11-10 16:31 ` [PATCH 05/17] bcachefs: Kill BTREE_UPDATE_PREJOURNAL Kent Overstreet
2023-11-13 15:29   ` Brian Foster
2023-11-13 16:49     ` Kent Overstreet
2023-11-14 13:17       ` Brian Foster
2023-11-10 16:31 ` [PATCH 06/17] bcachefs: Go rw before journal replay Kent Overstreet
2023-11-10 16:31 ` [PATCH 07/17] bcachefs: Make journal replay more efficient Kent Overstreet
2023-11-14 13:19   ` Brian Foster [this message]
2023-11-15  1:50     ` Kent Overstreet
2023-11-10 16:31 ` [PATCH 08/17] bcachefs: Don't flush journal after replay Kent Overstreet
2023-11-10 16:31 ` [PATCH 09/17] bcachefs: Unwritten journal buffers are always dirty Kent Overstreet
2023-11-10 16:31 ` [PATCH 10/17] bcachefs: journal->buf_lock Kent Overstreet
2023-11-10 16:31 ` [PATCH 11/17] bcachefs: bch2_journal_block_reservations() Kent Overstreet
2023-11-10 16:31 ` [PATCH 12/17] bcachefs: Clean up btree write buffer write ref handling Kent Overstreet
2023-11-10 16:31 ` [PATCH 13/17] bcachefs: bch2_btree_write_buffer_flush_locked() Kent Overstreet
2023-11-10 16:31 ` [PATCH 14/17] bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush() Kent Overstreet
2023-11-10 16:31 ` [PATCH 15/17] bcachefs: Improve btree write buffer tracepoints Kent Overstreet
2023-11-10 16:31 ` [PATCH 16/17] bcachefs: btree write buffer now slurps keys from journal Kent Overstreet
2023-11-21 10:56   ` Geert Uytterhoeven
2023-11-21 16:52     ` Kent Overstreet
2023-11-10 16:31 ` [PATCH 17/17] bcachefs: Inline btree write buffer sort Kent Overstreet
2023-11-10 16:42 ` [PATCH 00/17] btree write buffer & journal optimizations Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZVNzyMRK6BHeEIiS@bfoster \
    --to=bfoster@redhat.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox