linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/8] xfs: separate CIL commit record IO
Date: Wed, 24 Feb 2021 15:06:28 -0800
Message-ID: <20210224230628.GG7272@magnolia>
In-Reply-To: <20210224214417.GB4662@dread.disaster.area>

On Thu, Feb 25, 2021 at 08:44:17AM +1100, Dave Chinner wrote:
> On Wed, Feb 24, 2021 at 12:34:29PM -0800, Darrick J. Wong wrote:
> > On Tue, Feb 23, 2021 at 02:34:36PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > To allow for iclog IO device cache flush behaviour to be optimised,
> > > we first need to separate out the commit record iclog IO from the
> > > rest of the checkpoint so we can wait for the checkpoint IO to
> > > complete before we issue the commit record.
> > > 
> > > This separation is only necessary if the commit record is being
> > > written into a different iclog to the start of the checkpoint as the
> > > upcoming cache flushing changes require completion ordering against
> > > the other iclogs submitted by the checkpoint.
> > > 
> > > If the entire checkpoint and commit is in the one iclog, then they
> > > are both covered by the one set of cache flush primitives on the
> > > iclog and hence there is no need to separate them for ordering.
> > > 
> > > Otherwise, we need to wait for all the previous iclogs to complete
> > > so they are ordered correctly and made stable by the REQ_PREFLUSH
> > > that the commit record iclog IO issues. This guarantees that if a
> > > reader sees the commit record in the journal, they will also see the
> > > entire checkpoint that commit record closes off.
> > > 
> > > This also provides the guarantee that when the commit record IO
> > > completes, we can safely unpin all the log items in the checkpoint
> > > so they can be written back because the entire checkpoint is stable
> > > in the journal.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_log.c      | 55 +++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_log_cil.c  |  7 ++++++
> > >  fs/xfs/xfs_log_priv.h |  2 ++
> > >  3 files changed, 64 insertions(+)
> > > 
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index fa284f26d10e..ff26fb46d70f 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -808,6 +808,61 @@ xlog_wait_on_iclog(
> > >  	return 0;
> > >  }
> > >  
> > > +/*
> > > + * Wait on any iclogs that are still flushing in the range of start_lsn to the
> > > + * current iclog's lsn. The caller holds a reference to the iclog, but otherwise
> > > + * holds no log locks.
> > > + *
> > > + * We walk backwards through the iclogs to find the iclog with the highest lsn
> > > + * in the range that we need to wait for and then wait for it to complete.
> > > + * Completion ordering of iclog IOs ensures that all prior iclogs to the
> > > + * candidate iclog we need to sleep on have completed by the time our
> > > + * candidate has completed its IO.
> > 
> > Hmm, I guess this means that iclog header lsns are supposed to increase
> > as one walks forwards through the list?
> 
> Yes, the iclogs are written sequentially to the log - we don't
> switch the log->l_iclog pointer to the current active iclog until we
> switch it out, and then the next iclog in the loop is physically
> located at a higher lsn than the one we just switched out.
> 
> > > + *
> > > + * Therefore we only need to find the first iclog that isn't clean within the
> > > + * span of our flush range. If we come across a clean, newly activated iclog
> > > + * with a lsn of 0, it means IO has completed on this iclog and all previous
> > > + * iclogs will have been completed prior to this one. Hence finding a newly
> > > + * activated iclog indicates that there are no iclogs in the range we need to
> > > + * wait on and we are done searching.
> > 
> > I don't see an explicit check for an iclog with a zero lsn?  Is that
> > implied by XLOG_STATE_ACTIVE?
> 
> It's handled by the XFS_LSN_CMP(prev_lsn, start_lsn) < 0 check.  If
> prev_lsn is zero because the iclog is clean, then this check
> will always be true.
> 
> > Also, do you have any idea what Christoph was talking about wrt devices
> > with no-op flushes the last time this patch was posted?  This change
> > seems straightforward to me (assuming the answers to my two questions are
> > 'yes') but I didn't grok what subtlety he was alluding to...?
> 
> He was wondering what devices benefited from this. It has no impact
> on high-speed devices that do not require flushes/FUA (e.g. high-end
> Intel Optane SSDs) but those are not the devices this change is
> aimed at. There are no regressions on these high-end devices,
> either, so they are largely irrelevant to the patch and what it
> targets...

Ok, that's what I thought.  It seemed fairly self-evident to me that
high speed devices wouldn't care.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
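
For readers following the two questions above, the backwards walk and the
zero-lsn case can be modelled in user space. This is a minimal sketch under
stated assumptions, not the code from the patch: lsn_t, make_lsn(), lsn_cmp(),
struct iclog, the ICLOG_* states and find_wait_target() are hypothetical names
invented for illustration, and the model assumes an lsn orders by cycle number
first and block number second. It only picks the iclog to wait on; the locking
and the actual sleep are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model lsn (assumption): cycle in the high 32 bits, block in the low 32. */
typedef uint64_t lsn_t;

static lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return ((lsn_t)cycle << 32) | block;
}

/* Three-way compare: cycle first, then block. Returns <0, 0 or >0. */
static int lsn_cmp(lsn_t a, lsn_t b)
{
	uint32_t ca = (uint32_t)(a >> 32), cb = (uint32_t)(b >> 32);
	uint32_t ba = (uint32_t)a, bb = (uint32_t)b;

	if (ca != cb)
		return ca < cb ? -1 : 1;
	if (ba != bb)
		return ba < bb ? -1 : 1;
	return 0;
}

/* Hypothetical, simplified iclog states for this model only. */
enum iclog_state {
	ICLOG_ACTIVE,	/* clean and reusable; header lsn reset to 0 */
	ICLOG_SYNCING,	/* IO submitted, not yet complete */
	ICLOG_DIRTY,	/* IO complete, not yet reactivated */
};

struct iclog {
	lsn_t			lsn;
	enum iclog_state	state;
	struct iclog		*prev;	/* previous iclog in the ring */
};

/*
 * Walk backwards from cur and return the first iclog inside the range
 * [start_lsn, cur] that still has IO in flight, or NULL if there is
 * nothing to wait for. A clean iclog has lsn 0, which compares below any
 * real start_lsn, so the range check also ends the walk at that point.
 */
static struct iclog *find_wait_target(struct iclog *cur, lsn_t start_lsn)
{
	assert(cur != NULL);

	for (struct iclog *prev = cur->prev; prev != cur; prev = prev->prev) {
		if (lsn_cmp(prev->lsn, start_lsn) < 0)
			return NULL;	/* before our range, or clean */
		if (prev->state == ICLOG_ACTIVE || prev->state == ICLOG_DIRTY)
			continue;	/* IO already completed */
		return prev;		/* highest-lsn iclog still in flight */
	}
	return NULL;
}
```

Because the walk starts at the newest iclog and moves towards older ones, the
first in-flight iclog it meets is the one with the highest lsn in the range,
which is the only one the caller would need to sleep on if completions are
ordered.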

