Re: [PATCH] xfs: don't allow NOWAIT DIO across extent boundaries

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org, axboe@kernel.dk
Subject: Re: [PATCH] xfs: don't allow NOWAIT DIO across extent boundaries
Date: Fri, 20 Nov 2020 07:50:10 +1100	[thread overview]
Message-ID: <20201119205010.GA2842436@dread.disaster.area> (raw)
In-Reply-To: <20201119175526.GF9695@magnolia>

On Thu, Nov 19, 2020 at 09:55:26AM -0800, Darrick J. Wong wrote:
> On Thu, Nov 19, 2020 at 03:23:15PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Jens has reported a situation where partial direct IOs can be issued
> > and completed yet still return -EAGAIN. We don't want this to report
> > a short IO as we want XFS to complete user DIO entirely or not at
> > all.
> > 
> > This partial IO situation can occur on a write IO that is split
> > across an allocated extent and a hole, and the second mapping is
> > returning EAGAIN because allocation would be required.
> > 
> > The trivial reproducer:
> > 
> > $ sudo xfs_io -fdt -c "pwrite 0 4k" -c "pwrite -V 1 -b 8k -N 0 8k" /mnt/scr/foo
> > wrote 4096/4096 bytes at offset 0
> > 4 KiB, 1 ops; 0.0001 sec (27.509 MiB/sec and 7042.2535 ops/sec)
> > pwrite: Resource temporarily unavailable
> > $
> > 
> > The pwritev2(0, 8kB, RWF_NOWAIT) call returns EAGAIN having done
> > the first 4kB write:
> > 
> >  xfs_file_direct_write: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 0x2000
> >  iomap_apply:          dev 259:1 ino 0x83 pos 0 length 8192 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
> >  xfs_ilock_nowait:     dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
> >  xfs_iunlock:          dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
> >  xfs_iomap_found:      dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 8192 fork data startoff 0x0 startblock 24 blockcount 0x1
> >  iomap_apply_dstmap:   dev 259:1 ino 0x83 bdev 259:1 addr 102400 offset 0 length 4096 type MAPPED flags DIRTY
> > 
> > Here the first iomap loop has mapped the first 4kB of the file and
> > issued the IO, and we enter the second iomap_apply loop:
> > 
> >  iomap_apply: dev 259:1 ino 0x83 pos 4096 length 4096 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
> >  xfs_ilock_nowait:     dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
> >  xfs_iunlock:          dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
> > 
> > And we exit with -EAGAIN out because we hit the allocate case trying
> > to make the second 4kB block.
> > 
> > Then IO completes on the first 4kB and the original IO context
> > completes and unlocks the inode, returning -EAGAIN to userspace:
> > 
> >  xfs_end_io_direct_write: dev 259:1 ino 0x83 isize 0x1000 disize 0x1000 offset 0x0 count 4096
> >  xfs_iunlock:          dev 259:1 ino 0x83 flags IOLOCK_SHARED caller xfs_file_dio_aio_write
> > 
> > There are other vectors to the same problem when we re-enter the
> > mapping code if we have to make multiple mappinfs under NOWAIT
> > conditions. e.g. failing trylocks, COW extents being found,
> > allocation being required, and so on.
> > 
> > Avoid all these potential problems by only allowing IOMAP_NOWAIT IO
> > to go ahead if the mapping we retrieve for the IO spans an entire
> > allocated extent. This avoids the possibility of subsequent mappings
> > to complete the IO from triggering NOWAIT semantics by any means as
> > NOWAIT IO will now only enter the mapping code once per NOWAIT IO.
> > 
> > Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Hmm so I guess the whole point of this is speculative writing?

Non-blocking async writes, not speculative writes.

> As in, thread X either wants us to write impatiently, or to fail the
> whole IO so that it can hand the slow-mode write to some other thread or
> something?

Yes, that's exactly the sort of thing RWF_NOWAIT was originally
intended to support (e.g. the SeaStar IO framework).

> The manpage says something about short reads, but it doesn't
> say much about writes, but silently writing some but not all seems
> broken. :)

Well, that depends on how you look at it.

In the end, the current behaviour has a transient data state that is
finalised when the write gets re-run by the
application. The only issue here is efficiency, and we don't really
even care if there's a crash between the these two IOs that would
result in the silent write being exposed.

I say this because the write is always going to be done as two
separate IOs so there's always a failure window where the non-atomic
writes can result in only one of them hitting stable storage.

Hence this change doesn't actually affect anything from the data
integrity perspective - it just closes the window down slightly and
prevents unnecessary repeated IO and/or blocking a task we shouldn't
be blocking.

> If so then I guess this is fine, and to the (limited) extent that I grok
> the whole usecase,
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

     prev parent reply	other threads:[~2020-11-19 20:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-19  4:23 [PATCH] xfs: don't allow NOWAIT DIO across extent boundaries Dave Chinner
2020-11-19 17:55 ` Darrick J. Wong
2020-11-19 20:50   ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201119205010.GA2842436@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=axboe@kernel.dk \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox