From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 11/14] xfs: swap leaf buffer into path struct atomically during path shift
Date: Mon, 15 Feb 2016 17:18:22 +1100
Message-ID: <1455517105-20033-12-git-send-email-david@fromorbit.com>
In-Reply-To: <1455517105-20033-1-git-send-email-david@fromorbit.com>
From: Brian Foster <bfoster@redhat.com>
Source kernel commit 7df1c170b9a45ab3a7401c79bbefa9939bf8eafb
The node directory lookup code uses a state structure that tracks the
path of buffers used to search for the hash of a filename through the
leaf blocks. When the lookup encounters a block that ends with the
requested hash, but the entry has not yet been found, it must shift over
to the next block and continue looking for the entry (i.e., duplicate
hashes could continue over into the next block). This shift mechanism
involves walking back up and down the state structure, replacing buffers
at the appropriate btree levels as necessary.
When a buffer is replaced, the old buffer is released and the new buffer
read into the active slot in the path structure. Because the buffer is
read directly into the path slot, a buffer read failure can result in
setting a NULL buffer pointer in an active slot. This throws off the
state cleanup code in xfs_dir2_node_lookup(), which expects to release a
buffer from each active slot. Instead, a BUG occurs due to a NULL
pointer dereference:
BUG: unable to handle kernel NULL pointer dereference at 00000000000001e8
IP: [<ffffffffa0585063>] xfs_trans_brelse+0x2a3/0x3c0 [xfs]
...
RIP: 0010:[<ffffffffa0585063>] [<ffffffffa0585063>] xfs_trans_brelse+0x2a3/0x3c0 [xfs]
...
Call Trace:
[<ffffffffa05250c6>] xfs_dir2_node_lookup+0xa6/0x2c0 [xfs]
[<ffffffffa0519f7c>] xfs_dir_lookup+0x1ac/0x1c0 [xfs]
[<ffffffffa055d0e1>] xfs_lookup+0x91/0x290 [xfs]
[<ffffffffa05580b3>] xfs_vn_lookup+0x73/0xb0 [xfs]
[<ffffffff8122de8d>] lookup_real+0x1d/0x50
[<ffffffff8123330e>] path_openat+0x91e/0x1490
[<ffffffff81235079>] do_filp_open+0x89/0x100
...
This has been reproduced via a parallel fsstress and filesystem shutdown
workload in a loop. The shutdown triggers the read error in the
aforementioned codepath and causes the BUG in xfs_dir2_node_lookup().
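The failure mode described above can be illustrated outside the kernel with a minimal sketch. Everything in it is a hypothetical stand-in rather than XFS code: struct buf, struct path, mock_read(), shift_unsafe() and count_null_slots() are invented for illustration, and the mock read is assumed to store NULL through its output pointer on failure, as the description above says the real read can.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the XFS buffer and path structures. */
struct buf { int refcount; };

struct path_blk { struct buf *bp; };

#define MAX_DEPTH 5

struct path {
	int active;                     /* number of slots in use */
	struct path_blk blk[MAX_DEPTH];
};

/*
 * Mock read (assumption): on failure it stores NULL through bpp and
 * returns -5 (EIO), mimicking a read error after a shutdown.
 */
int mock_read(int fail, struct buf *backing, struct buf **bpp)
{
	if (fail) {
		*bpp = NULL;
		return -5;
	}
	backing->refcount++;
	*bpp = backing;
	return 0;
}

/* Pre-patch ordering: release the old buffer, then read into the slot. */
int shift_unsafe(struct path_blk *blk, int fail, struct buf *next)
{
	assert(blk->bp != NULL);
	blk->bp->refcount--;            /* old buffer is gone at this point */
	return mock_read(fail, next, &blk->bp);
}

/* Count active slots that a release-every-slot cleanup would trip over. */
int count_null_slots(const struct path *p)
{
	int n = 0;

	for (int i = 0; i < p->active; i++)
		if (p->blk[i].bp == NULL)
			n++;
	return n;
}
```

With a failing read, shift_unsafe() returns the error but leaves the active slot holding NULL, which is the state a cleanup loop that releases a buffer from every active slot cannot handle.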
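Change xfs_da3_path_shift() to update the active path slot atomically
with respect to the caller when a buffer is replaced: read the new
buffer into a local variable first, and only release the old buffer and
swap the new one into the slot once the read has succeeded. This ensures
that the caller always sees the old or new buffer in the slot and
prevents the NULL pointer dereference.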
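As a rough userspace sketch of the read-then-swap ordering (not the kernel implementation), the pattern looks like the following. The names struct buf, struct path_blk, mock_read() and shift_safe() are hypothetical, and reference counting stands in for xfs_trans_brelse().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real XFS types. */
struct buf { int refcount; };

struct path_blk {
	unsigned int blkno;
	struct buf *bp;
};

/* Mock read (assumption): stores NULL and returns -5 (EIO) on failure. */
int mock_read(int fail, struct buf *backing, struct buf **bpp)
{
	if (fail) {
		*bpp = NULL;
		return -5;
	}
	backing->refcount++;
	*bpp = backing;
	return 0;
}

/*
 * Read into a local pointer first. The slot is only modified after the
 * read succeeds, so the caller sees either the old or the new buffer.
 */
int shift_safe(struct path_blk *blk, unsigned int blkno, int fail,
	       struct buf *next)
{
	struct buf *bp;
	int error;

	assert(blk->bp != NULL);
	error = mock_read(fail, next, &bp);
	if (error)
		return error;           /* slot still holds the old buffer */

	blk->bp->refcount--;            /* release the old buffer */
	blk->blkno = blkno;
	blk->bp = bp;                   /* swap in the new buffer */
	return 0;
}
```

On a failed read the slot and its block number are left untouched, so cleanup code that releases one buffer per active slot still finds a valid pointer.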
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
libxfs/xfs_da_btree.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/libxfs/xfs_da_btree.c b/libxfs/xfs_da_btree.c
index 25072c7..f3c04ab 100644
--- a/libxfs/xfs_da_btree.c
+++ b/libxfs/xfs_da_btree.c
@@ -1822,6 +1822,7 @@ xfs_da3_path_shift(
 	struct xfs_da_args	*args;
 	struct xfs_da_node_entry *btree;
 	struct xfs_da3_icnode_hdr nodehdr;
+	struct xfs_buf		*bp;
 	xfs_dablk_t		blkno = 0;
 	int			level;
 	int			error;
@@ -1866,20 +1867,24 @@ xfs_da3_path_shift(
 	 */
 	for (blk++, level++; level < path->active; blk++, level++) {
 		/*
-		 * Release the old block.
-		 * (if it's dirty, trans won't actually let go)
+		 * Read the next child block into a local buffer.
 		 */
-		if (release)
-			xfs_trans_brelse(args->trans, blk->bp);
+		error = xfs_da3_node_read(args->trans, dp, blkno, -1, &bp,
+					  args->whichfork);
+		if (error)
+			return error;
 
 		/*
-		 * Read the next child block.
+		 * Release the old block (if it's dirty, the trans doesn't
+		 * actually let go) and swap the local buffer into the path
+		 * structure. This ensures failure of the above read doesn't set
+		 * a NULL buffer in an active slot in the path.
 		 */
+		if (release)
+			xfs_trans_brelse(args->trans, blk->bp);
 		blk->blkno = blkno;
-		error = xfs_da3_node_read(args->trans, dp, blkno, -1,
-					&blk->bp, args->whichfork);
-		if (error)
-			return error;
+		blk->bp = bp;
+
 		info = blk->bp->b_addr;
 		ASSERT(info->magic == cpu_to_be16(XFS_DA_NODE_MAGIC) ||
 		       info->magic == cpu_to_be16(XFS_DA3_NODE_MAGIC) ||
--
2.5.0