* [PATCH] xfs: fix COW writeback race
From: Christoph Hellwig @ 2017-01-14 14:33 UTC
To: linux-xfs; +Cc: darrick.wong

Due to the way xfs_iomap_write_allocate tries to convert the whole
extent it found from delalloc to real space, we can run into a race
condition with multiple threads doing writes to the same extent.
For the non-COW case that is harmless, as the only thing that can
happen is that we call xfs_bmapi_write on an extent that has already
been converted to a real allocation. For COW writes, where we move the
extent from the COW fork to the data fork after I/O completion, the
race is, however, not quite as harmless. In the worst case we now call
xfs_bmapi_write on a region that contains a hole in the COW fork, which
will trip an assert in debug builds or lead to file system corruption
in non-debug builds. This seems to be reproducible with workloads of
small O_DSYNC writes, although so far I've not managed to come up with
an isolated reproducer.

The fix is relatively simple: tell xfs_bmapi_write that it is only
being asked to convert delayed allocations, and have it skip holes in
that case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/libxfs/xfs_bmap.c | 41 +++++++++++++++++++++++++++++------------
fs/xfs/libxfs/xfs_bmap.h | 3 +++
fs/xfs/xfs_iomap.c | 2 +-
3 files changed, 33 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 44773c9..ceae779 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4514,8 +4514,6 @@ xfs_bmapi_write(
 	int n; /* current extent index */
 	xfs_fileoff_t obno; /* old block number (offset) */
 	int whichfork; /* data or attr fork */
-	char inhole; /* current location is hole in file */
-	char wasdelay; /* old extent was delayed */
 
 #ifdef DEBUG
 	xfs_fileoff_t orig_bno; /* original block number value */
@@ -4603,22 +4601,41 @@ xfs_bmapi_write(
 	bma.firstblock = firstblock;
 
 	while (bno < end && n < *nmap) {
-		inhole = eof || bma.got.br_startoff > bno;
-		wasdelay = !inhole && isnullstartblock(bma.got.br_startblock);
+		bool need_alloc = false, wasdelay = false;
 
-		/*
-		 * Make sure we only reflink into a hole.
-		 */
-		if (flags & XFS_BMAPI_REMAP)
-			ASSERT(inhole);
-		if (flags & XFS_BMAPI_COWFORK)
-			ASSERT(!inhole);
+		/* in hole or beyond EOF? */
+		if (eof || bma.got.br_startoff > bno) {
+			if (flags & XFS_BMAPI_DELALLOC) {
+				/*
+				 * For the COW fork we can reasonably get a
+				 * request for converting an extent that races
+				 * with other threads already having converted
+				 * part of it, as converting COW blocks to
+				 * regular blocks is not protected by the
+				 * IOLOCK.
+				 */
+				ASSERT(flags & XFS_BMAPI_COWFORK);
+				if (!(flags & XFS_BMAPI_COWFORK)) {
+					error = -EIO;
+					goto error0;
+				}
+			} else {
+				need_alloc = true;
+			}
+		} else {
+			/*
+			 * Make sure we only reflink into a hole.
+			 */
+			ASSERT(!(flags & XFS_BMAPI_REMAP));
+			if (isnullstartblock(bma.got.br_startblock))
+				wasdelay = true;
+		}
 
 		/*
 		 * First, deal with the hole before the allocated space
 		 * that we found, if any.
 		 */
-		if (inhole || wasdelay) {
+		if (need_alloc || wasdelay) {
 			bma.eof = eof;
 			bma.conv = !!(flags & XFS_BMAPI_CONVERT);
 			bma.wasdel = wasdelay;
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index cecd094..8f805a7 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -110,6 +110,9 @@ struct xfs_extent_free_item
 /* Map something in the CoW fork. */
 #define XFS_BMAPI_COWFORK 0x200
 
+/* Only convert delalloc space, don't allocate entirely new extents */
+#define XFS_BMAPI_DELALLOC 0x400
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE, "ENTIRE" }, \
 	{ XFS_BMAPI_METADATA, "METADATA" }, \
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ca137b7..7ee8629 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -795,7 +795,7 @@ xfs_iomap_write_allocate(
 	xfs_trans_t *tp;
 	int nimaps;
 	int error = 0;
-	int flags = 0;
+	int flags = XFS_BMAPI_DELALLOC;
 	int nres;
 
 	if (whichfork == XFS_COW_FORK)
--
2.1.4

* Re: [PATCH] xfs: fix COW writeback race
From: Darrick J. Wong @ 2017-01-17 2:06 UTC
To: Christoph Hellwig; +Cc: linux-xfs
On Sat, Jan 14, 2017 at 03:33:13PM +0100, Christoph Hellwig wrote:
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index cecd094..8f805a7 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -110,6 +110,9 @@ struct xfs_extent_free_item
> /* Map something in the CoW fork. */
> #define XFS_BMAPI_COWFORK 0x200
>
> +/* Only convert delalloc space, don't allocate entirely new extents */
> +#define XFS_BMAPI_DELALLOC 0x400
> +
> #define XFS_BMAPI_FLAGS \
> { XFS_BMAPI_ENTIRE, "ENTIRE" }, \
> { XFS_BMAPI_METADATA, "METADATA" }, \
XFS_BMAPI_FLAGS needs an entry for XFS_BMAPI_DELALLOC so that tracing
continues to work properly.
Other than that this patch seems ok to me...
--D