From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>, linux-xfs@vger.kernel.org
Subject: [PATCH v2] xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done
Date: Tue, 11 Apr 2023 11:49:34 -0700 [thread overview]
Message-ID: <20230411184934.GK360889@frogsfrogsfrogs> (raw)
From: Darrick J. Wong <djwong@kernel.org>
While fuzzing the data fork extent count on a btree-format directory
with xfs/375, I observed the following (excerpted) splat:
XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_bmap.c, line: 1208
------------[ cut here ]------------
WARNING: CPU: 0 PID: 43192 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
Call Trace:
<TASK>
xfs_iread_extents+0x1af/0x210 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xchk_dir_walk+0xb8/0x190 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xchk_parent_count_parent_dentries+0x41/0x80 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xchk_parent_validate+0x199/0x2e0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xchk_parent+0xdf/0x130 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xfs_scrub_metadata+0x2b8/0x730 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xfs_scrubv_metadata+0x38b/0x4d0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xfs_ioc_scrubv_metadata+0x111/0x160 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xfs_file_ioctl+0x367/0xf50 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
__x64_sys_ioctl+0x82/0xa0
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
The cause of this is a race condition in xfs_ilock_data_map_shared,
which performs an unlocked access to the data fork to guess which lock
mode it needs:
Thread 0 Thread 1
xfs_need_iread_extents
<observe no iext tree>
xfs_ilock(..., ILOCK_EXCL)
xfs_iread_extents
<observe no iext tree>
<check ILOCK_EXCL>
<load bmbt extents into iext>
<notice iext size doesn't
match nextents>
xfs_need_iread_extents
<observe iext tree>
xfs_ilock(..., ILOCK_SHARED)
<tear down iext tree>
xfs_iunlock(..., ILOCK_EXCL)
xfs_iread_extents
<observe no iext tree>
<check ILOCK_EXCL>
*BOOM*
Fix this race by adding a flag to the xfs_ifork structure to indicate
that we have not yet read in the extent records and changing the
predicate to look at the flag state, not if_height. The memory barrier
ensures that the flag will not be set until the very end of the
function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v2: use smp_store_release per reviewer comments
---
fs/xfs/libxfs/xfs_bmap.c | 6 ++++++
fs/xfs/libxfs/xfs_inode_fork.c | 16 +++++++++++++++-
fs/xfs/libxfs/xfs_inode_fork.h | 6 ++++--
3 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 34de6e6898c4..f11ef331e5a4 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1171,6 +1171,12 @@ xfs_iread_extents(
goto out;
}
ASSERT(ir.loaded == xfs_iext_count(ifp));
+ /*
+ * Use release semantics so that we can use acquire semantics in
+ * xfs_need_iread_extents and be guaranteed to see a valid mapping tree
+ * after that load.
+ */
+ smp_store_release(&ifp->if_needextents, 0);
return 0;
out:
xfs_iext_destroy(ifp);
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 6b21760184d9..1bbe5ea3f00b 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -226,10 +226,15 @@ xfs_iformat_data_fork(
/*
* Initialize the extent count early, as the per-format routines may
- * depend on it.
+ * depend on it. Use release semantics to set needextents /after/ we
+ * set the format. This ensures that we can use acquire semantics on
+ * needextents in xfs_need_iread_extents() and be guaranteed to see a
+ * valid format value after that load.
*/
ip->i_df.if_format = dip->di_format;
ip->i_df.if_nextents = xfs_dfork_data_extents(dip);
+ smp_store_release(&ip->i_df.if_needextents,
+ ip->i_df.if_format == XFS_DINODE_FMT_BTREE ? 1 : 0);
switch (inode->i_mode & S_IFMT) {
case S_IFIFO:
@@ -282,8 +287,17 @@ xfs_ifork_init_attr(
enum xfs_dinode_fmt format,
xfs_extnum_t nextents)
{
+ /*
+ * Initialize the extent count early, as the per-format routines may
+ * depend on it. Use release semantics to set needextents /after/ we
+ * set the format. This ensures that we can use acquire semantics on
+ * needextents in xfs_need_iread_extents() and be guaranteed to see a
+ * valid format value after that load.
+ */
ip->i_af.if_format = format;
ip->i_af.if_nextents = nextents;
+ smp_store_release(&ip->i_af.if_needextents,
+ ip->i_af.if_format == XFS_DINODE_FMT_BTREE ? 1 : 0);
}
void
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index d3943d6ad0b9..96d307784c85 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -24,6 +24,7 @@ struct xfs_ifork {
xfs_extnum_t if_nextents; /* # of extents in this fork */
short if_broot_bytes; /* bytes allocated for root */
int8_t if_format; /* format of this fork */
+ uint8_t if_needextents; /* extents have not been read */
};
/*
@@ -260,9 +261,10 @@ int xfs_iext_count_upgrade(struct xfs_trans *tp, struct xfs_inode *ip,
uint nr_to_add);
/* returns true if the fork has extents but they are not read in yet. */
-static inline bool xfs_need_iread_extents(struct xfs_ifork *ifp)
+static inline bool xfs_need_iread_extents(const struct xfs_ifork *ifp)
{
- return ifp->if_format == XFS_DINODE_FMT_BTREE && ifp->if_height == 0;
+ /* see xfs_iformat_{data,attr}_fork() for needextents semantics */
+ return smp_load_acquire(&ifp->if_needextents) != 0;
}
#endif /* __XFS_INODE_FORK_H__ */
next reply other threads:[~2023-04-11 18:49 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-11 18:49 Darrick J. Wong [this message]
2023-04-11 22:53 ` [PATCH v2] xfsprogs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done Darrick J. Wong
2023-04-11 23:57 ` Dave Chinner
2023-04-11 23:54 ` [PATCH v2] xfs: " Dave Chinner
2023-04-12 12:02 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230411184934.GK360889@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox