From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: "Andreas Grünbacher" <andreas.gruenbacher@gmail.com>,
"Andreas Gruenbacher" <agruenba@redhat.com>,
"Christoph Hellwig" <hch@infradead.org>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Matthew Wilcox" <willy@infradead.org>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-ext4@vger.kernel.org, cluster-devel@redhat.com
Subject: Re: [RFC v6 08/10] iomap/xfs: Eliminate the iomap_valid handler
Date: Thu, 19 Jan 2023 08:42:23 +1100 [thread overview]
Message-ID: <20230118214223.GH360264@dread.disaster.area> (raw)
In-Reply-To: <Y8Q4FmhYehpQPZ3Z@magnolia>
On Sun, Jan 15, 2023 at 09:29:58AM -0800, Darrick J. Wong wrote:
> 2. Do we need to revalidate mappings for directio writes? I think the
> answer is no (for xfs) because the ->iomap_begin call will allocate
> whatever blocks are needed and truncate/punch/reflink block on the
> iolock while the directio writes are pending, so you'll never end up
> with a stale mapping. But I don't know if that statement applies
> generally...
The issue is not truncate/punch/reflink for either DIO or buffered
IO - the issue that leads to stale iomaps is async extent state.
i.e. IO completion doing unwritten extent conversion.
For DIO, AIO doesn't hold the IOLOCK at all when completion is run
(like buffered writeback), but non-AIO DIO writes hold the IOLOCK
shared while waiting for completion. This means that we can have DIO
submission and completion still running concurrently, and so stale
iomaps are a definite possibility.
From my notes when I looked at this:
1. the race condition for a DIO write mapping go stale is an
overlapping DIO completion and converting the block from unwritten
to written, and then the dio write incorrectly issuing sub-block
zeroing because the mapping is now stale.
2. DIO read into a hole or unwritten extent zeroes the entire range
in the user buffer in one operation. If this is a large range, this
could race with small DIO writes within that range that have
completed
3. There is a window between dio write completion doing unwritten
extent conversion (by ->end_io) and the page cache being
invalidated, providing a window where buffered read maps can be
stale and incorrect read behaviour exposed to userpace before
the page cache is invalidated.
These all stem from IO having overlapping ranges, which is largely
unsupported but can't be entirely prevented (e.g. backup
applications running in the background). Largely the problems are
confined to sub-block IOs. i.e. when sub-block DIO writes to the
same block are being performed, we have the possiblity that one
write completes whilst the other is deciding what to zero, unaware
that the range is now MAPPED rather than UNWRITTEN.
We currently avoid issues with sub-block dio writes by using
IOMAP_DIO_OVERWRITE_ONLY with shared locking. This ensures that the
unaligned IO fits entirely within a MAPPED extent so no sub-block
zeroing is required. If allocation or sub-block zeroing is required,
then we force the filesystem to fall back to exclusive IO locking
and wait for all concurrent DIO in flight to complete so that it
can't race with any other DIO write that might cause the map to
become stale while we are doing the zeroing.
This does not avoid potential issues with DIO write vs buffered
read, nor DIO write vs mmap IO. It's not totally clear to me
whether we need ->iomap_valid checks in the buffered read paths
to avoid the completion races with DIO writes, but there are windows
there where cached iomaps could be considered stale....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2023-01-18 21:43 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-08 19:40 [RFC v6 00/10] Turn iomap_page_ops into iomap_folio_ops Andreas Gruenbacher
2023-01-08 19:40 ` [RFC v6 01/10] iomap: Add __iomap_put_folio helper Andreas Gruenbacher
2023-01-08 19:40 ` [RFC v6 02/10] iomap/gfs2: Unlock and put folio in page_done handler Andreas Gruenbacher
2023-01-08 19:40 ` [RFC v6 03/10] iomap: Rename page_done handler to put_folio Andreas Gruenbacher
2023-01-08 19:40 ` [RFC v6 04/10] iomap: Add iomap_get_folio helper Andreas Gruenbacher
2023-01-08 21:33 ` Dave Chinner
2023-01-09 12:46 ` Andreas Gruenbacher
2023-01-10 8:46 ` Christoph Hellwig
2023-01-10 9:07 ` Andreas Grünbacher
2023-01-10 13:34 ` Matthew Wilcox
2023-01-10 15:24 ` Christoph Hellwig
2023-01-11 19:36 ` Matthew Wilcox
2023-01-11 20:52 ` Dave Chinner
2023-01-12 8:41 ` Christoph Hellwig
2023-01-15 17:01 ` Darrick J. Wong
2023-01-15 17:06 ` Darrick J. Wong
2023-01-16 5:46 ` Matthew Wilcox
2023-01-16 7:34 ` Christoph Hellwig
2023-01-16 13:18 ` Matthew Wilcox
2023-01-16 16:02 ` Christoph Hellwig
2023-01-08 19:40 ` [RFC v6 05/10] iomap/gfs2: Get page in page_prepare handler Andreas Gruenbacher
2023-01-31 19:37 ` Matthew Wilcox
2023-01-31 21:33 ` Andreas Gruenbacher
2023-01-08 19:40 ` [RFC v6 06/10] iomap: Add __iomap_get_folio helper Andreas Gruenbacher
2023-01-10 8:48 ` Christoph Hellwig
2023-01-08 19:40 ` [RFC v6 07/10] iomap: Rename page_prepare handler to get_folio Andreas Gruenbacher
2023-01-08 19:40 ` [RFC v6 08/10] iomap/xfs: Eliminate the iomap_valid handler Andreas Gruenbacher
2023-01-08 21:59 ` Dave Chinner
2023-01-09 18:45 ` Andreas Gruenbacher
2023-01-09 22:54 ` Dave Chinner
2023-01-10 1:09 ` Andreas Grünbacher
2023-01-15 17:29 ` Darrick J. Wong
2023-01-18 7:21 ` Christoph Hellwig
2023-01-18 9:11 ` Damien Le Moal
2023-01-18 19:04 ` Darrick J. Wong
2023-01-18 19:57 ` Andreas Grünbacher
2023-01-18 21:42 ` Dave Chinner [this message]
2023-01-10 8:51 ` Christoph Hellwig
2023-01-10 8:52 ` Christoph Hellwig
2023-01-08 19:40 ` [RFC v6 09/10] iomap: Rename page_ops to folio_ops Andreas Gruenbacher
2023-01-08 19:40 ` [RFC v6 10/10] xfs: Make xfs_iomap_folio_ops static Andreas Gruenbacher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230118214223.GH360264@dread.disaster.area \
--to=david@fromorbit.com \
--cc=agruenba@redhat.com \
--cc=andreas.gruenbacher@gmail.com \
--cc=cluster-devel@redhat.com \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox