public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Avi Kivity <avi@scylladb.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	andres@anarazel.de
Subject: Re: [RFC] xfs: reduce sub-block DIO serialisation
Date: Mon, 18 Jan 2021 08:34:01 +1100
Message-ID: <20210117213401.GB78941@dread.disaster.area>
In-Reply-To: <50362fc8-3d5e-cd93-4e55-f3ecddc21780@scylladb.com>

On Thu, Jan 14, 2021 at 08:48:36AM +0200, Avi Kivity wrote:
> On 1/13/21 10:38 PM, Dave Chinner wrote:
> > On Wed, Jan 13, 2021 at 10:00:37AM +0200, Avi Kivity wrote:
> > > On 1/13/21 12:13 AM, Dave Chinner wrote:
> > > > On Tue, Jan 12, 2021 at 10:01:35AM +0200, Avi Kivity wrote:
> > > > > On 1/12/21 3:07 AM, Dave Chinner wrote:
> > > > > > Hi folks,
> > > > > > 
> > > > > > This is the XFS implementation of the sub-block DIO optimisations
> > > > > > for written extents that I've mentioned on #xfs and a couple of
> > > > > > times now on the XFS mailing list.
> > > > > > 
> > > > > > It takes the approach of using the IOMAP_NOWAIT non-blocking
> > > > > > IO submission infrastructure to optimistically dispatch sub-block
> > > > > > DIO without exclusive locking. If the extent mapping callback
> > > > > > decides that it can't do the unaligned IO without extent
> > > > > > manipulation, sub-block zeroing, blocking or splitting the IO into
> > > > > > multiple parts, it aborts the IO with -EAGAIN. This allows the high
> > > > > > level filesystem code to then take exclusive locks and resubmit the
> > > > > > IO once it has guaranteed no other IO is in progress on the inode
> > > > > > (the current implementation).
> > > > > Can you expand on the no-splitting requirement? Does it involve only
> > > > > splitting by XFS (IO spans >1 extents) or lower layers (RAID)?
> > > > XFS only.
> > > 
> > > Ok, that is somewhat under control as I can provide an extent hint, and wish
> > > really hard that the filesystem isn't fragmented.
> > > 
> > > 
> > > > > The reason I'm concerned is that it's the constraint that the application
> > > > > has least control over. I guess I could use RWF_NOWAIT to avoid blocking my
> > > > > main thread (but last time I tried I'd get occasional EIOs that frightened
> > > > > me off that).
> > > > Spurious EIO from RWF_NOWAIT is a bug that needs to be fixed. Do you
> > > > have any details?
> > > > 
> > > I reported it in [1]. It's long since gone since I disabled RWF_NOWAIT. It
> > > was relatively rare, sometimes happening in continuous integration runs that
> > > take hours, and sometimes not.
> > > 
> > > 
> > > I expect it's fixed by now since io_uring relies on it. Maybe I should turn
> > > it on for kernels > some_random_version.
> > > 
> > > 
> > > [1] https://lore.kernel.org/lkml/9bab0f40-5748-f147-efeb-5aac4fd44533@scylladb.com/t/#u
> > Yeah, as I thought. Usage of REQ_NOWAIT with filesystem based IO is
> > simply broken - it causes spurious IO failures to be reported to IO
> > completion callbacks and so are very difficult to track and/or
> > retry. iomap does not use REQ_NOWAIT at all, so you should not ever
> > see this from XFS or ext4 DIO anymore...
> 
> What kernel version would be good?

For ext4, >= 5.5 should be safe; that was when it was converted to
the iomap DIO path. Before that it used the old DIO path, which sets
REQ_NOWAIT when IOCB_NOWAIT (i.e. RWF_NOWAIT) is set for the IO.

Btrfs is an even more recent convert to iomap-based dio (5.9?).

The REQ_NOWAIT behaviour was introduced into the old DIO path back
in 4.13 by commit 03a07c92a9ed ("block: return on congested block
device") and was intended to support RWF_NOWAIT on raw block
devices.  Hence it was not added to the iomap path as block devices
don't use that path.

Another example of how REQ_NOWAIT breaks filesystems: an io_uring
hack to force REQ_NOWAIT IO behaviour through filesystems via
"nowait block plugs" resulted in XFS filesystem shutdowns because
of unexpected IO errors during journal writes:

https://lore.kernel.org/linux-xfs/20200915113327.GA1554921@bfoster/

There have been patches proposed to add REQ_NOWAIT to the iomap DIO
code, but they've all been NACKed because doing so would break
filesystem-based RWF_NOWAIT DIO.

So, long story short: on XFS you are fine on all kernels. On all
other block-based filesystems you need < 4.13, except for ext4,
where >= 5.5 works correctly, and btrfs, where >= 5.9 does.
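As a rough illustration of acting on this guidance (the "turn it on
for kernels > some_random_version" idea above), here is a minimal C
sketch that decides per file whether to pass RWF_NOWAIT, using the
filesystem type from fstatfs(2) and the running kernel version from
uname(2). The helper names are hypothetical. Assumptions: the btrfs
cut-off inherits the "5.9?" uncertainty stated earlier, ext2/ext3
share ext4's superblock magic and are not told apart, and since
RWF_NOWAIT itself first appeared in 4.13, the "< 4.13" branch in
practice means the flag is expected to be rejected rather than used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/statfs.h>
#include <sys/utsname.h>

/* Superblock magic numbers as reported in statfs(2)'s f_type field. */
#define SKETCH_XFS_MAGIC   0x58465342UL /* "XFSB" */
#define SKETCH_EXT4_MAGIC  0x0000EF53UL /* shared by ext2, ext3 and ext4 */
#define SKETCH_BTRFS_MAGIC 0x9123683EUL

/* True if kernel version (major, minor) >= (want_major, want_minor). */
static bool kernel_at_least(int major, int minor, int want_major, int want_minor)
{
    return major > want_major || (major == want_major && minor >= want_minor);
}

/*
 * Encodes the guidance in the mail: RWF_NOWAIT DIO avoids the REQ_NOWAIT
 * problem on XFS on all kernels, on ext4 from 5.5 (iomap DIO), on btrfs
 * from 5.9 (marked uncertain in the mail), and on other block-based
 * filesystems only before 4.13.
 */
static bool nowait_dio_safe(unsigned long fs_magic, int major, int minor)
{
    switch (fs_magic) {
    case SKETCH_XFS_MAGIC:
        return true;
    case SKETCH_EXT4_MAGIC: /* assumption: ext2/ext3 not distinguished */
        return kernel_at_least(major, minor, 5, 5);
    case SKETCH_BTRFS_MAGIC:
        return kernel_at_least(major, minor, 5, 9);
    default:
        return !kernel_at_least(major, minor, 4, 13);
    }
}

/* Parses "major.minor" from a release string such as "5.10.0-9-amd64". */
static bool parse_release(const char *release, int *major, int *minor)
{
    assert(release != NULL && major != NULL && minor != NULL);
    return sscanf(release, "%d.%d", major, minor) == 2;
}

/* Per-file decision; failing to find out means "don't use RWF_NOWAIT". */
static bool should_use_rwf_nowait(int fd)
{
    struct statfs sfs;
    struct utsname uts;
    int major, minor;

    if (fstatfs(fd, &sfs) != 0 || uname(&uts) != 0 ||
        !parse_release(uts.release, &major, &minor))
        return false;
    /* Mask to 32 bits so the comparison works whatever f_type's width. */
    return nowait_dio_safe((unsigned long)sfs.f_type & 0xFFFFFFFFUL,
                           major, minor);
}
```

Distro kernels often backport fixes without changing the version
string, so a check like this is a heuristic, not a guarantee.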

> commit 4503b7676a2e0abe69c2f2c0d8b03aec53f2f048
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Mon Jun 1 10:00:27 2020 -0600
> 
>     io_uring: catch -EIO from buffered issue request failure
> 
>     -EIO bubbles up like -EAGAIN if we fail to allocate a request at the
>     lower level. Play it safe and treat it like -EAGAIN in terms of sync
>     retry, to avoid passing back an errant -EIO.
> 
>     Catch some of these early for block based file, as non-mq devices
>     generally do not support NOWAIT. That saves us some overhead by
>     not first trying, then retrying from async context. We can go straight
>     to async punt instead.
> 
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> but this looks to be an io_uring-specific fix (somewhat frightening too), not
> removal of REQ_NOWAIT.

That looks like a similar case to the one I mention above where
io_uring and REQ_NOWAIT aren't playing well with others....
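For the main-thread use case discussed earlier (RWF_NOWAIT so the
submitting thread never blocks), a minimal C sketch of the
try-then-fall-back pattern follows, assuming glibc's pwritev2()
wrapper. The function name is hypothetical, and the blocking pwrite()
retry stands in for handing the IO to a worker thread. Unlike the
io_uring commit quoted above, it does not treat EIO as retryable:
per this thread, spurious EIO from RWF_NOWAIT is a bug, so it is
reported rather than retried.

```c
#define _GNU_SOURCE /* for pwritev2() */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* value from <linux/fs.h> */
#endif

/*
 * Try the write without blocking first; if the kernel says it would
 * block (EAGAIN) or cannot honour RWF_NOWAIT for this file, redo it as
 * an ordinary blocking pwrite(). Returns bytes written (possibly short,
 * just as pwrite() can be) or -1 with errno set.
 */
static ssize_t write_nowait_or_fallback(int fd, const void *buf, size_t len,
                                        off_t off, bool *used_nowait)
{
    assert(used_nowait != NULL);

    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t ret = pwritev2(fd, &iov, 1, off, RWF_NOWAIT);
    if (ret >= 0) {
        *used_nowait = true;
        return ret;
    }

    switch (errno) {
    case EAGAIN:     /* would block: the expected, retryable case */
    case EOPNOTSUPP: /* file or kernel does not support RWF_NOWAIT */
    case EINVAL:     /* some kernels reject RWF_NOWAIT on buffered writes */
    case ENOSYS:     /* pwritev2() unavailable (libc-dependent) */
        break;
    default:
        return -1;   /* a real error such as EIO: report, don't retry */
    }

    /* Stand-in for punting the IO to a worker thread that may block. */
    *used_nowait = false;
    return pwrite(fd, buf, len, off);
}
```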

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
