From: Chandan Babu R <chandanrlinux@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 5/8] xfs: CIL checkpoint flushes caches unconditionally
Date: Wed, 24 Feb 2021 12:46:18 +0530
Message-ID: <87v9aiorzx.fsf@garuda>
In-Reply-To: <20210223033442.3267258-6-david@fromorbit.com>
On 23 Feb 2021 at 09:04, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Currently every journal IO is issued as REQ_PREFLUSH | REQ_FUA to
> guarantee the ordering requirements the journal has w.r.t. metadata
> writeback. The two ordering constraints are:
>
> 1. we cannot overwrite metadata in the journal until we guarantee
> that the dirty metadata has been written back in place and is
> stable.
>
> 2. we cannot write back dirty metadata until it has been written to
> the journal and guaranteed to be stable (and hence recoverable) in
> the journal.
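
The two constraints can be read as predicates over log sequence numbers.
The sketch below assumes a simplified model in which every change carries a
single, monotonically increasing LSN; lsn_t, can_overwrite_journal() and
can_writeback_metadata() are hypothetical names, not XFS functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: every journalled change carries a log sequence
 * number (LSN) that only ever increases.
 */
typedef int64_t lsn_t;

/*
 * Rule #1: a journal region holding changes up to region_lsn may be
 * overwritten only once all of that metadata has been written back in
 * place and made stable, i.e. every change older than stable_tail_lsn.
 */
static bool can_overwrite_journal(lsn_t region_lsn, lsn_t stable_tail_lsn)
{
	return region_lsn < stable_tail_lsn;
}

/*
 * Rule #2: dirty metadata last modified by the transaction at item_lsn may
 * be written back in place only once that transaction is stable in the
 * journal, i.e. at or below journal_stable_lsn.
 */
static bool can_writeback_metadata(lsn_t item_lsn, lsn_t journal_stable_lsn)
{
	return item_lsn <= journal_stable_lsn;
}
```
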
>
> These rules apply to the atomic transactions recorded in the
> journal, not to the journal IO itself. Hence we need to ensure
> metadata is stable before we start writing a new transaction to the
> journal (guarantee #1), and we need to ensure the entire transaction
> is stable in the journal before we start metadata writeback
> (guarantee #2).
>
> The ordering guarantees of #1 are currently provided by REQ_PREFLUSH
> being added to every iclog IO. This causes the journal IO to issue a
> cache flush and wait for it to complete before issuing the write IO
> to the journal. Hence all completed metadata IO is guaranteed to be
> stable before the journal overwrites the old metadata.
>
> However, for long running CIL checkpoints that might do a thousand
> journal IOs, we don't need every single one of these iclog IOs to
> issue a cache flush - the cache flush done before the first iclog is
> submitted is sufficient to cover the entire range in the log that
> the checkpoint will overwrite because the CIL space reservation
> guarantees the tail of the log (completed metadata) is already
> beyond the range of the checkpoint write.
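
The argument that one flush covers the whole checkpoint rests on the space
reservation. A minimal sketch under a simplified, hypothetical model
(struct log_model and checkpoint_fits() are illustrative names, not XFS
code): journal positions are unwrapped byte offsets, and the tail is the
oldest position whose metadata has not yet been written back.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical journal model: positions are unwrapped byte offsets that only
 * ever grow; position p lives at physical offset p % size.
 */
struct log_model {
	uint64_t size;	/* journal size in bytes */
	uint64_t head;	/* position the checkpoint will start writing at */
	uint64_t tail;	/* oldest position whose metadata is not yet written back */
};

/*
 * Writing [head, head + len) reuses the physical space that last held
 * positions [head - size, head + len - size). That is safe only if every one
 * of those positions is older than the tail, i.e. head + len - size <= tail,
 * which rearranges to len <= size - (head - tail).
 */
static bool checkpoint_fits(const struct log_model *log, uint64_t len)
{
	/* Assumes the invariant tail <= head <= tail + size. */
	uint64_t used = log->head - log->tail;

	return len <= log->size - used;
}
```

In this model, if checkpoint_fits() holds when the push begins, everything
the checkpoint can overwrite was already written back, so a single flush
issued up front makes all of it stable and a REQ_PREFLUSH on each later
iclog adds nothing.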
>
> Hence we only need a full cache flush between closing off the CIL
> checkpoint context (i.e. when the push switches it out) and issuing
> the first journal IO. Rather than plumbing this through to the
> journal IO, we can start this cache flush the moment the CIL context
> is owned exclusively by the push worker. The cache flush can be in
> progress while we process the CIL ready for writing, hence
> reducing the latency of the initial iclog write. This is especially
> true for large checkpoints, where we might have to process hundreds
> of thousands of log vectors before we issue the first iclog write.
> In these cases, it is likely the cache flush has already been
> completed by the time we have built the CIL log vector chain.
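
Overlapping the flush with CIL processing is the usual "issue, work, wait"
pattern. The sketch below is a userspace analogue assuming POSIX threads:
the completion type and its init_completion(), complete() and
wait_for_completion() helpers only mimic the kernel API of the same names,
the flush is simulated rather than submitted to a device, and
push_checkpoint() is a hypothetical stand-in for xlog_cil_push_work().

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion (illustrative only). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Mark the completion done and wake every waiter. */
static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Block until complete() has been called; loops to tolerate spurious wakeups. */
static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Written by the flush thread before complete(), read after the wait. */
static bool flush_is_stable;

/* Simulated asynchronous device cache flush; no real I/O is issued. */
static void *cache_flush_worker(void *arg)
{
	struct completion *c = arg;

	flush_is_stable = true;
	complete(c);
	return NULL;
}

/*
 * Hypothetical push: start the flush, build the log vector chain while it
 * runs, then wait before the first journal write. Returns the number of
 * log vectors processed, or -1 if the flush thread cannot be started.
 */
static int push_checkpoint(int nr_vectors, bool *flushed_before_write)
{
	struct completion bdev_flush;
	pthread_t flusher;
	int processed = 0;

	flush_is_stable = false;
	init_completion(&bdev_flush);
	if (pthread_create(&flusher, NULL, cache_flush_worker, &bdev_flush) != 0) {
		pthread_cond_destroy(&bdev_flush.cond);
		pthread_mutex_destroy(&bdev_flush.lock);
		return -1;
	}

	for (int i = 0; i < nr_vectors; i++)
		processed++;	/* stand-in for formatting one log vector */

	wait_for_completion(&bdev_flush);
	/* The first iclog write would be issued here, after the flush. */
	*flushed_before_write = flush_is_stable;

	pthread_join(flusher, NULL);
	pthread_cond_destroy(&bdev_flush.cond);
	pthread_mutex_destroy(&bdev_flush.lock);
	return processed;
}
```

The wait sits immediately before the first write rather than right after
issuing the flush, so the flush latency is hidden behind the log vector
processing whenever that processing takes longer than the flush.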
>

Indeed, a single cache flush of the "data device", issued before the
first iclog of a CIL context is written, is sufficient to ensure that all
completed metadata writeback has reached non-volatile storage before the
checkpoint overwrites that part of the log.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log_cil.c | 29 +++++++++++++++++++++++++----
> 1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index c5cc1b7ad25e..8bcacd463f06 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -656,6 +656,7 @@ xlog_cil_push_work(
>  	struct xfs_log_vec lvhdr = { NULL };
>  	xfs_lsn_t commit_lsn;
>  	xfs_lsn_t push_seq;
> +	DECLARE_COMPLETION_ONSTACK(bdev_flush);
>
>  	new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_NOFS);
>  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
> @@ -719,10 +720,24 @@ xlog_cil_push_work(
>  	spin_unlock(&cil->xc_push_lock);
>
>  	/*
> -	 * pull all the log vectors off the items in the CIL, and
> -	 * remove the items from the CIL. We don't need the CIL lock
> -	 * here because it's only needed on the transaction commit
> -	 * side which is currently locked out by the flush lock.
> +	 * The CIL is stable at this point - nothing new will be added to it
> +	 * because we hold the flush lock exclusively. Hence we can now issue
> +	 * a cache flush to ensure all the completed metadata in the journal we
> +	 * are about to overwrite is on stable storage.
> +	 *
> +	 * This avoids the need to have the iclogs issue REQ_PREFLUSH based
> +	 * cache flushes to provide this ordering guarantee, and hence for CIL
> +	 * checkpoints that require hundreds or thousands of log writes no
> +	 * longer need to issue device cache flushes to provide metadata
> +	 * writeback ordering.
> +	 */
> +	xfs_flush_bdev_async(log->l_mp->m_ddev_targp->bt_bdev, &bdev_flush);
> +
> +	/*
> +	 * Pull all the log vectors off the items in the CIL, and remove the
> +	 * items from the CIL. We don't need the CIL lock here because it's only
> +	 * needed on the transaction commit side which is currently locked out
> +	 * by the flush lock.
>  	 */
>  	lv = NULL;
>  	num_iovecs = 0;
> @@ -806,6 +821,12 @@ xlog_cil_push_work(
>  	lvhdr.lv_iovecp = &lhdr;
>  	lvhdr.lv_next = ctx->lv_chain;
>
> +	/*
> +	 * Before we format and submit the first iclog, we have to ensure that
> +	 * the metadata writeback ordering cache flush is complete.
> +	 */
> +	wait_for_completion(&bdev_flush);
> +
>  	error = xlog_write(log, &lvhdr, tic, &ctx->start_lsn, NULL, 0, true);
>  	if (error)
>  		goto out_abort_free_ticket;
--
chandan