linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chandan Babu R <chandanrlinux@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 5/8] xfs: CIL checkpoint flushes caches unconditionally
Date: Wed, 24 Feb 2021 12:46:18 +0530	[thread overview]
Message-ID: <87v9aiorzx.fsf@garuda> (raw)
In-Reply-To: <20210223033442.3267258-6-david@fromorbit.com>

On 23 Feb 2021 at 09:04, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Currently every journal IO is issued as REQ_PREFLUSH | REQ_FUA to
> guarantee the ordering requirements the journal has w.r.t. metadata
> writeback. THe two ordering constraints are:
>
> 1. we cannot overwrite metadata in the journal until we guarantee
> that the dirty metadata has been written back in place and is
> stable.
>
> 2. we cannot write back dirty metadata until it has been written to
> the journal and guaranteed to be stable (and hence recoverable) in
> the journal.
>
> These rules apply to the atomic transactions recorded in the
> journal, not to the journal IO itself. Hence we need to ensure
> metadata is stable before we start writing a new transaction to the
> journal (guarantee #1), and we need to ensure the entire transaction
> is stable in the journal before we start metadata writeback
> (guarantee #2).
>
> The ordering guarantees of #1 are currently provided by REQ_PREFLUSH
> being added to every iclog IO. This causes the journal IO to issue a
> cache flush and wait for it to complete before issuing the write IO
> to the journal. Hence all completed metadata IO is guaranteed to be
> stable before the journal overwrites the old metadata.
>
> However, for long running CIL checkpoints that might do a thousand
> journal IOs, we don't need every single one of these iclog IOs to
> issue a cache flush - the cache flush done before the first iclog is
> submitted is sufficient to cover the entire range in the log that
> the checkpoint will overwrite because the CIL space reservation
> guarantees the tail of the log (completed metadata) is already
> beyond the range of the checkpoint write.
>
> Hence we only need a full cache flush between closing off the CIL
> checkpoint context (i.e. when the push switches it out) and issuing
> the first journal IO. Rather than plumbing this through to the
> journal IO, we can start this cache flush the moment the CIL context
> is owned exclusively by the push worker. The cache flush can be in
> progress while we process the CIL ready for writing, hence
> reducing the latency of the initial iclog write. This is especially
> true for large checkpoints, where we might have to process hundreds
> of thousands of log vectors before we issue the first iclog write.
> In these cases, it is likely the cache flush has already been
> completed by the time we have built the CIL log vector chain.
>

Indeed, a single cache flush of the "data device" that is issued before
writing the first iclog of a CIL context is sufficient to make sure that the
metadata has really reached non-volatile storage.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>


> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index c5cc1b7ad25e..8bcacd463f06 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -656,6 +656,7 @@ xlog_cil_push_work(
>  	struct xfs_log_vec	lvhdr = { NULL };
>  	xfs_lsn_t		commit_lsn;
>  	xfs_lsn_t		push_seq;
> +	DECLARE_COMPLETION_ONSTACK(bdev_flush);
>
>  	new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_NOFS);
>  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
> @@ -719,10 +720,24 @@ xlog_cil_push_work(
>  	spin_unlock(&cil->xc_push_lock);
>
>  	/*
> -	 * pull all the log vectors off the items in the CIL, and
> -	 * remove the items from the CIL. We don't need the CIL lock
> -	 * here because it's only needed on the transaction commit
> -	 * side which is currently locked out by the flush lock.
> +	 * The CIL is stable at this point - nothing new will be added to it
> +	 * because we hold the flush lock exclusively. Hence we can now issue
> +	 * a cache flush to ensure all the completed metadata in the journal we
> +	 * are about to overwrite is on stable storage.
> +	 *
> +	 * This avoids the need to have the iclogs issue REQ_PREFLUSH based
> +	 * cache flushes to provide this ordering guarantee, and hence for CIL
> +	 * checkpoints that require hundreds or thousands of log writes no
> +	 * longer need to issue device cache flushes to provide metadata
> +	 * writeback ordering.
> +	 */
> +	xfs_flush_bdev_async(log->l_mp->m_ddev_targp->bt_bdev, &bdev_flush);
> +
> +	/*
> +	 * Pull all the log vectors off the items in the CIL, and remove the
> +	 * items from the CIL. We don't need the CIL lock here because it's only
> +	 * needed on the transaction commit side which is currently locked out
> +	 * by the flush lock.
>  	 */
>  	lv = NULL;
>  	num_iovecs = 0;
> @@ -806,6 +821,12 @@ xlog_cil_push_work(
>  	lvhdr.lv_iovecp = &lhdr;
>  	lvhdr.lv_next = ctx->lv_chain;
>
> +	/*
> +	 * Before we format and submit the first iclog, we have to ensure that
> +	 * the metadata writeback ordering cache flush is complete.
> +	 */
> +	wait_for_completion(&bdev_flush);
> +
>  	error = xlog_write(log, &lvhdr, tic, &ctx->start_lsn, NULL, 0, true);
>  	if (error)
>  		goto out_abort_free_ticket;


--
chandan

  reply	other threads:[~2021-02-24  7:17 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-23  3:34 [PATCH v2] xfs: various log stuff Dave Chinner
2021-02-23  3:34 ` [PATCH 1/8] xfs: log stripe roundoff is a property of the log Dave Chinner
2021-02-23 10:29   ` Chandan Babu R
2021-02-24 20:14   ` Darrick J. Wong
2021-02-25  8:32   ` Christoph Hellwig
2021-03-01 15:13   ` Brian Foster
2021-02-23  3:34 ` [PATCH 2/8] xfs: separate CIL commit record IO Dave Chinner
2021-02-23 12:12   ` Chandan Babu R
2021-02-24 20:34   ` Darrick J. Wong
2021-02-24 21:44     ` Dave Chinner
2021-02-24 23:06       ` Darrick J. Wong
2021-02-25  8:34       ` Christoph Hellwig
2021-02-25 20:47         ` Dave Chinner
2021-03-01  9:09           ` Christoph Hellwig
2021-03-03  0:11             ` Dave Chinner
2021-02-26  2:48         ` Darrick J. Wong
2021-02-28 16:36           ` Brian Foster
2021-02-28 23:46             ` Dave Chinner
2021-03-01 15:33               ` Brian Foster
2021-03-01 15:19   ` Brian Foster
2021-03-03  0:41     ` Dave Chinner
2021-03-03 15:22       ` Brian Foster
2021-03-04 22:57         ` Dave Chinner
2021-03-05  0:44       ` Dave Chinner
2021-02-23  3:34 ` [PATCH 3/8] xfs: move and rename xfs_blkdev_issue_flush Dave Chinner
2021-02-23 12:57   ` Chandan Babu R
2021-02-24 20:45   ` Darrick J. Wong
2021-02-24 22:01     ` Dave Chinner
2021-02-25  8:36   ` Christoph Hellwig
2021-02-23  3:34 ` [PATCH 4/8] xfs: async blkdev cache flush Dave Chinner
2021-02-23  5:29   ` Chaitanya Kulkarni
2021-02-23 14:02   ` Chandan Babu R
2021-02-24 20:51   ` Darrick J. Wong
2021-02-23  3:34 ` [PATCH 5/8] xfs: CIL checkpoint flushes caches unconditionally Dave Chinner
2021-02-24  7:16   ` Chandan Babu R [this message]
2021-02-24 20:57   ` Darrick J. Wong
2021-02-25  8:42   ` Christoph Hellwig
2021-02-25 21:07     ` Dave Chinner
2021-02-23  3:34 ` [PATCH 6/8] xfs: remove need_start_rec parameter from xlog_write() Dave Chinner
2021-02-24  7:17   ` Chandan Babu R
2021-02-24 20:59   ` Darrick J. Wong
2021-02-25  8:49   ` Christoph Hellwig
2021-02-25 20:55     ` Dave Chinner
2021-02-23  3:34 ` [PATCH 7/8] xfs: journal IO cache flush reductions Dave Chinner
2021-02-23  8:05   ` [PATCH 7/8 v2] " Dave Chinner
2021-02-24 12:27     ` Chandan Babu R
2021-02-24 20:32       ` Dave Chinner
2021-02-24 21:13     ` Darrick J. Wong
2021-02-24 22:03       ` Dave Chinner
2021-02-25  4:09     ` Chandan Babu R
2021-02-25  7:13       ` Chandan Babu R
2021-03-01  5:44       ` Dave Chinner
2021-03-01  5:56         ` Dave Chinner
2021-02-25  8:58     ` Christoph Hellwig
2021-02-25 21:06       ` Dave Chinner
2021-03-01 19:29     ` Brian Foster
2021-02-23  3:34 ` [PATCH 8/8] xfs: Fix CIL throttle hang when CIL space used going backwards Dave Chinner
2021-02-24 21:18   ` Darrick J. Wong
2021-02-24 22:05     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v9aiorzx.fsf@garuda \
    --to=chandanrlinux@gmail.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).