From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
linux-xfs@vger.kernel.org,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Christoph Hellwig <hch@lst.de>, Brian Foster <bfoster@redhat.com>
Subject: [PATCH 4.9 46/51] xfs: fix COW writeback race
Date: Thu, 2 Feb 2017 19:38:05 +0100
Message-ID: <20170202183347.603177160@linuxfoundation.org>
In-Reply-To: <20170202183345.067336143@linuxfoundation.org>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig <hch@lst.de>
commit d2b3964a0780d2d2994eba57f950d6c9fe489ed8 upstream.
Due to the way xfs_iomap_write_allocate tries to convert whole extents
it finds from delalloc to real space, we can run into a race
condition with multiple threads doing writes to the same extent.
For the non-COW case that is harmless, as the only thing that can happen
is that we call xfs_bmapi_write on an extent that has already been
converted to a real allocation. For COW writes, where we move the extent
from the COW fork to the data fork after I/O completion, the race is,
however, not quite as harmless. In the worst case we are now calling
xfs_bmapi_write on a region that contains a hole in the COW fork, which
will trip up an assert in debug builds or lead to file system corruption
in non-debug builds. This seems to be reproducible with workloads of
small O_DSYNC writes, although so far I've not managed to come up with
an isolated reproducer.
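The interleaving described above can be pictured with a deliberately simplified, single-threaded C model. It is not XFS code: the COW fork is reduced to a per-block state array, and `enum cow_state`, `model_convert_whole_extent()`, `model_complete_cow_io()` and `model_range_has_hole()` are hypothetical names. The model only assumes what the text states: writeback converts the whole delalloc extent it finds, and COW I/O completion moves the blocks out of the COW fork.

```c
#include <stdbool.h>

/* Hypothetical per-block state of the COW fork (illustration only). */
enum cow_state { COW_HOLE, COW_DELALLOC, COW_REAL };

/*
 * Models writeback converting the *whole* delalloc extent it found,
 * not just the block that triggered the writeback.
 */
static void model_convert_whole_extent(enum cow_state *fork, int nblocks)
{
	for (int i = 0; i < nblocks; i++)
		if (fork[i] == COW_DELALLOC)
			fork[i] = COW_REAL;
}

/*
 * Models COW I/O completion: the blocks are moved from the COW fork
 * to the data fork, leaving a hole in the COW fork.
 */
static void model_complete_cow_io(enum cow_state *fork, int start, int len)
{
	for (int i = start; i < start + len; i++)
		fork[i] = COW_HOLE;
}

/*
 * True if any block in [start, start + len) is a hole in the COW fork,
 * i.e. a pre-fix xfs_bmapi_write call with XFS_BMAPI_COWFORK on this
 * range would hit its ASSERT(!inhole).
 */
static bool model_range_has_hole(const enum cow_state *fork, int start,
				 int len)
{
	for (int i = start; i < start + len; i++)
		if (fork[i] == COW_HOLE)
			return true;
	return false;
}
```

For example, start with eight delalloc blocks and let a second writer observe that blocks 4-5 still need conversion. If the first writer then converts the whole extent and its I/O completes, `model_range_has_hole(fork, 4, 2)` returns true for the range the second writer is about to hand to xfs_bmapi_write.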
The fix for the issue is relatively simple: tell xfs_bmapi_write
that we are only asked to convert delayed allocations and skip holes
in that case.
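The patched branch at the top of the xfs_bmapi_write loop boils down to a small decision table. The sketch below restates it as a standalone C function. Only the two flag values come from the xfs_bmap.h hunk; `enum bmapi_action`, `bmapi_decide()` and its boolean parameters are hypothetical, and the debug-only ASSERTs plus everything after the decision (trimming and updating the returned mapping) are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values as defined in the xfs_bmap.h hunk of this patch. */
#define XFS_BMAPI_COWFORK	0x200
#define XFS_BMAPI_DELALLOC	0x400

/* Hypothetical summary of what one loop iteration decides. */
enum bmapi_action {
	ACT_ALLOC,	/* hole, no DELALLOC: allocate (need_alloc) */
	ACT_CONVERT,	/* delalloc extent: convert it (wasdelay) */
	ACT_MAPPED,	/* real extent already there: nothing to allocate */
	ACT_SKIP,	/* hole with DELALLOC: do not allocate */
	ACT_STOP,	/* hole with DELALLOC past the last extent: break */
};

/*
 * Sketch of the patched decision.  in_hole stands for
 * "eof || bma.got.br_startoff > bno", past_last_extent for "eof", and
 * is_delalloc for isnullstartblock(bma.got.br_startblock).  Returns an
 * enum bmapi_action, or -EIO for a hole seen with DELALLOC but without
 * COWFORK.
 */
static int bmapi_decide(int flags, bool in_hole, bool past_last_extent,
			bool is_delalloc)
{
	if (in_hole) {
		if (!(flags & XFS_BMAPI_DELALLOC))
			return ACT_ALLOC;
		if (!(flags & XFS_BMAPI_COWFORK))
			return -EIO;
		return past_last_extent ? ACT_STOP : ACT_SKIP;
	}
	return is_delalloc ? ACT_CONVERT : ACT_MAPPED;
}
```

With `XFS_BMAPI_DELALLOC | XFS_BMAPI_COWFORK`, a hole yields `ACT_SKIP` (or `ACT_STOP` past the last extent) instead of `ACT_ALLOC`, while callers passing neither flag still get `ACT_ALLOC` for holes, so their behaviour is unchanged.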
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/xfs/libxfs/xfs_bmap.c | 44 ++++++++++++++++++++++++++++++++------------
fs/xfs/libxfs/xfs_bmap.h | 6 +++++-
fs/xfs/xfs_iomap.c | 2 +-
3 files changed, 38 insertions(+), 14 deletions(-)
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4607,8 +4607,6 @@ xfs_bmapi_write(
int n; /* current extent index */
xfs_fileoff_t obno; /* old block number (offset) */
int whichfork; /* data or attr fork */
- char inhole; /* current location is hole in file */
- char wasdelay; /* old extent was delayed */
#ifdef DEBUG
xfs_fileoff_t orig_bno; /* original block number value */
@@ -4694,22 +4692,44 @@ xfs_bmapi_write(
bma.firstblock = firstblock;
while (bno < end && n < *nmap) {
- inhole = eof || bma.got.br_startoff > bno;
- wasdelay = !inhole && isnullstartblock(bma.got.br_startblock);
+ bool need_alloc = false, wasdelay = false;
- /*
- * Make sure we only reflink into a hole.
- */
- if (flags & XFS_BMAPI_REMAP)
- ASSERT(inhole);
- if (flags & XFS_BMAPI_COWFORK)
- ASSERT(!inhole);
+ /* in hole or beyond EOF? */
+ if (eof || bma.got.br_startoff > bno) {
+ if (flags & XFS_BMAPI_DELALLOC) {
+ /*
+ * For the COW fork we can reasonably get a
+ * request for converting an extent that races
+ * with other threads already having converted
+ * part of it, as there converting COW to
+ * regular blocks is not protected using the
+ * IOLOCK.
+ */
+ ASSERT(flags & XFS_BMAPI_COWFORK);
+ if (!(flags & XFS_BMAPI_COWFORK)) {
+ error = -EIO;
+ goto error0;
+ }
+
+ if (eof || bno >= end)
+ break;
+ } else {
+ need_alloc = true;
+ }
+ } else {
+ /*
+ * Make sure we only reflink into a hole.
+ */
+ ASSERT(!(flags & XFS_BMAPI_REMAP));
+ if (isnullstartblock(bma.got.br_startblock))
+ wasdelay = true;
+ }
/*
* First, deal with the hole before the allocated space
* that we found, if any.
*/
- if (inhole || wasdelay) {
+ if (need_alloc || wasdelay) {
bma.eof = eof;
bma.conv = !!(flags & XFS_BMAPI_CONVERT);
bma.wasdel = wasdelay;
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -110,6 +110,9 @@ struct xfs_extent_free_item
/* Map something in the CoW fork. */
#define XFS_BMAPI_COWFORK 0x200
+/* Only convert delalloc space, don't allocate entirely new extents */
+#define XFS_BMAPI_DELALLOC 0x400
+
#define XFS_BMAPI_FLAGS \
{ XFS_BMAPI_ENTIRE, "ENTIRE" }, \
{ XFS_BMAPI_METADATA, "METADATA" }, \
@@ -120,7 +123,8 @@ struct xfs_extent_free_item
{ XFS_BMAPI_CONVERT, "CONVERT" }, \
{ XFS_BMAPI_ZERO, "ZERO" }, \
{ XFS_BMAPI_REMAP, "REMAP" }, \
- { XFS_BMAPI_COWFORK, "COWFORK" }
+ { XFS_BMAPI_COWFORK, "COWFORK" }, \
+ { XFS_BMAPI_DELALLOC, "DELALLOC" }
static inline int xfs_bmapi_aflag(int w)
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -681,7 +681,7 @@ xfs_iomap_write_allocate(
xfs_trans_t *tp;
int nimaps;
int error = 0;
- int flags = 0;
+ int flags = XFS_BMAPI_DELALLOC;
int nres;
if (whichfork == XFS_COW_FORK)
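The new `XFS_BMAPI_DELALLOC` bit (0x400) sits next to `XFS_BMAPI_COWFORK` (0x200), and the `XFS_BMAPI_FLAGS` table pairs each bit with a name, presumably so tracepoints can print the flags symbolically. A userspace sketch of that decoding, together with the flags xfs_iomap_write_allocate would pass after this patch, is shown here. Assumptions: `format_bmapi_flags()` and `iomap_alloc_flags()` are hypothetical helpers, only the two flags touched by this patch are modelled, and because the last hunk is cut off after `if (whichfork == XFS_COW_FORK)`, it is assumed (not shown in the patch) that this branch ORs in `XFS_BMAPI_COWFORK`.

```c
#include <stdbool.h>
#include <string.h>

/* Values from the xfs_bmap.h hunk; only the flags this patch touches. */
#define XFS_BMAPI_COWFORK	0x200
#define XFS_BMAPI_DELALLOC	0x400

/* Same {mask, name} shape as the XFS_BMAPI_FLAGS table. */
struct flag_name {
	unsigned int mask;
	const char *name;
};

static const struct flag_name bmapi_flag_names[] = {
	{ XFS_BMAPI_COWFORK,  "COWFORK" },
	{ XFS_BMAPI_DELALLOC, "DELALLOC" },
};

/* Writes "NAME|NAME" for every known bit set in flags; len must be >= 1. */
static void format_bmapi_flags(unsigned int flags, char *buf, size_t len)
{
	size_t n = sizeof(bmapi_flag_names) / sizeof(bmapi_flag_names[0]);

	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (!(flags & bmapi_flag_names[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, bmapi_flag_names[i].name, len - strlen(buf) - 1);
	}
}

/*
 * Flags xfs_iomap_write_allocate would hand to xfs_bmapi_write after
 * this patch.  ASSUMPTION: the truncated COW-fork branch adds COWFORK.
 */
static unsigned int iomap_alloc_flags(bool cow_fork)
{
	unsigned int flags = XFS_BMAPI_DELALLOC;

	if (cow_fork)
		flags |= XFS_BMAPI_COWFORK;
	return flags;
}
```

For a COW-fork allocation this yields "COWFORK|DELALLOC" and for the data fork just "DELALLOC". In other words, after this patch every xfs_iomap_write_allocate call asks only for delalloc conversion, and a hole found without COWFORK set is treated as an error (-EIO) rather than silently skipped.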
Thread overview: 55+ messages
2017-02-02 18:37 [PATCH 4.9 00/51] 4.9.8-stable review Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 01/51] r8152: fix the sw rx checksum is unavailable Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 02/51] netvsc: add rcu_read locking to netvsc callback Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 03/51] mlxsw: spectrum: Fix memory leak at skb reallocation Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 04/51] mlxsw: switchx2: Fix memory leak at skb reallocation Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 05/51] mlxsw: pci: Fix EQE structure definition Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 06/51] net: lwtunnel: Handle lwtunnel_fill_encap failure Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 07/51] net: ipv4: fix table id in getroute response Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 08/51] net: systemport: Decouple flow control from __bcm_sysport_tx_reclaim Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 09/51] tcp: fix tcp_fastopen unaligned access complaints on sparc Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 10/51] openvswitch: maintain correct checksum state in conntrack actions Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 11/51] mlx4: do not call napi_schedule() without care Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 12/51] ravb: do not use zero-length alignment DMA descriptor Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 13/51] ip6_tunnel: Account for tunnel header in tunnel MTU Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 14/51] ax25: Fix segfault after sock connection timeout Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 15/51] net sched actions: fix refcnt when GETing of action after bind Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 16/51] virtio: dont set VIRTIO_NET_HDR_F_DATA_VALID on xmit Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 17/51] virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 18/51] vxlan: fix byte order of vxlan-gpe port number Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 19/51] net: fix harmonize_features() vs NETIF_F_HIGHDMA Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 20/51] net: phy: bcm63xx: Utilize correct config_intr function Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 21/51] lwtunnel: fix autoload of lwt modules Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 22/51] ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 23/51] tcp: initialize max window for a new fastopen socket Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 24/51] net/mlx5e: Do not recycle pages from emergency reserve Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 25/51] bridge: netlink: call br_changelink() during br_dev_newlink() Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 26/51] net: mpls: Fix multipath selection for LSR use case Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 27/51] r8152: dont execute runtime suspend if the tx is not empty Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 28/51] af_unix: move unix_mknod() out of bindlock Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 30/51] net: Specify the owning module for lwtunnel ops Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 31/51] lwtunnel: Fix oops on state free after encap module unload Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 32/51] net: dsa: Bring back device detaching in dsa_slave_suspend() Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 33/51] xfs: bump up reserved blocks in xfs_alloc_set_aside Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 34/51] xfs: fix bogus minleft manipulations Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 35/51] xfs: adjust allocation length in xfs_alloc_space_available Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 36/51] xfs: dont rely on ->total in xfs_alloc_space_available Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 37/51] xfs: dont print warnings when xfs_log_force fails Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 38/51] xfs: make the ASSERT() condition likely Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 39/51] xfs: sanity check directory inode di_size Greg Kroah-Hartman
2017-02-02 18:37 ` [PATCH 4.9 40/51] xfs: add missing include dependencies to xfs_dir2.h Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 41/51] xfs: replace xfs_mode_to_ftype table with switch statement Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 42/51] xfs: sanity check inode mode when creating new dentry Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 43/51] xfs: sanity check inode di_mode Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 44/51] xfs: dont wrap ID in xfs_dq_get_next_id Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 45/51] xfs: fix xfs_mode_to_ftype() prototype Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 46/51] xfs: fix COW writeback race Greg Kroah-Hartman [this message]
2017-02-02 18:38 ` [PATCH 4.9 47/51] xfs: verify dirblocklog correctly Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 48/51] xfs: remove racy hasattr check from attr ops Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 49/51] xfs: extsize hints are not unlikely in xfs_bmap_btalloc Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 50/51] xfs: clear _XBF_PAGES from buffers when readahead page Greg Kroah-Hartman
2017-02-02 18:38 ` [PATCH 4.9 51/51] xfs: fix bmv_count confusion w/ shared extents Greg Kroah-Hartman
2017-02-02 20:38 ` [PATCH 4.9 00/51] 4.9.8-stable review Shuah Khan
2017-02-02 20:56 ` Greg Kroah-Hartman
2017-02-03 5:14 ` Guenter Roeck
2017-02-03 7:17 ` Greg Kroah-Hartman