public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-xfs@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH 06/20] xfs: don't use REQ_PREFLUSH for split log writes
Date: Wed, 5 Jun 2019 08:45:44 +1000
Message-ID: <20190604224544.GB29573@dread.disaster.area>
In-Reply-To: <20190604161240.GA44563@bfoster>

On Tue, Jun 04, 2019 at 12:12:40PM -0400, Brian Foster wrote:
> On Mon, Jun 03, 2019 at 07:29:31PM +0200, Christoph Hellwig wrote:
> > If we have to split a log write because it wraps the end of the log we
> > can't just use REQ_PREFLUSH to flush before the first log write,
> > as the writes might get reordered somewhere in the I/O stack.  Issue
> > a manual flush in that case so that the ordering of the two log I/Os
> > doesn't matter.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index 3b82ca8ac9c8..646a190e5730 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -1941,7 +1941,7 @@ xlog_sync(
> >  	 * synchronously here; for an internal log we can simply use the block
> >  	 * layer state machine for preflushes.
> >  	 */
> > -	if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp)
> > +	if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp || split)
> >  		xfs_blkdev_issue_flush(log->l_mp->m_ddev_targp);
> 
> I'm curious if this is really necessary. The log record isn't
> recoverable until it's complete on disk (and thus the tail LSN stamped
> in the record header not relevant). As long as the cache flushes before
> the record is completely written, what difference does it make if it was
> made up of two out of order I/Os?

The problem is not whether the log write is recoverable, it's
whether what it overwrites is already on stable storage. i.e. the
tail of the log can be overwritten by the split write to the start
of the log before the cache flush attached to the first iclog IO has
made the metadata it is overwriting stable. For example:

	metadata write		-> volatile disk cache
	move log tail forwards	-> tail wraps back to start
<...>
	log write wrapping tail
	  iclog split
	   iclog write to end w/ PREFLUSH + FUA
	   			-> queued in request queue
	   iclog write to start w/ FUA
	   			-> queued in request queue
<....>
	request queue gets processed
	  dispatches write to start w/ FUA
				-> overwrites tail of log
<....>
	  dispatches write to end w/ PREFLUSH + FUA
	  			-> flushes metadata @ tail of log

If we have a power loss incident after the first FUA write to the
start of the log but before the second write issues/completes the
PREFLUSH, we have a situation on disk where the log tail has been
overwritten but the metadata that it overwrote had not yet been
committed to stable storage. That will result in either a corrupt
log (can't find tail) or a corrupt filesystem because metadata in
some structure was not recovered.

> Granted log wrapping is not a frequent operation, but the explicit flush
> is a synchronous operation in the log force path whereas the flush flag
> isn't.

We have the options of:

	1) issuing a synchronous flush before both writes and then
	doing them w/ FUA only; or
	2) issuing both log writes with PREFLUSH+FUA.

In the first case, the fact the cache flush is done synchronously
really doesn't affect anything - it's done in the CIL push kworker
context, so blocking here doesn't really add any extra latency to
anything except synchronous log force waiters. Typically, there is
nothing waiting on the log being flushed, so what little extra
latency there is mostly won't matter.

In the second case, one of the cache flushes is superfluous and for
busy filesystems with small logs where we frequently hit the wrap
case this may add up to quite a bit of avoidable IO overhead....

Either way works; it's not clear to me that one is always superior
to the other, so we just have to choose one....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
