public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] xfs: more verifications!
@ 2018-06-05  2:43 Dave Chinner
  2018-06-05  2:43 ` [PATCH 1/3] xfs: catch bad stripe alignment configurations Dave Chinner
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  2:43 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

This patchset addresses a series of corruptions noticed in an image
provided by Wen Xu. The patches only address detecting corruptions
before they can do damage - it does not address/fix the crash the
image caused but instead prevents the bad information from getting
to the point where it can cause a crash.

The third patch addresses a general btree record validation issue;
we should probably dirve this inwards to each btree implementation
with a ->verify_record() callout, as this patch does not address
all the places that btree records are traversed during searches. It
will catch any attempt to use a bad record that a search lands on,
however.

Comments?

-Dave.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] xfs: catch bad stripe alignment configurations
  2018-06-05  2:43 [PATCH 0/3] xfs: more verifications! Dave Chinner
@ 2018-06-05  2:43 ` Dave Chinner
  2018-06-05  3:44   ` Darrick J. Wong
  2018-06-05  2:43 ` [PATCH 2/3] xfs: verify extent size hint is valid in inode verifier Dave Chinner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  2:43 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When stripe alignments are invalid, data alignment algorithms in the
allocator may not work correctly. Ensure we catch superblocks with
invalid stripe alignment setups at mount time. These data alignment
mismatches are now detected at mount time like this:

XFS (loop0): SB stripe unit sanity check failed
XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
XFS (loop0): Unmount and run xfs_repair
XFS (loop0): First 128 bytes of corrupted metadata buffer:
0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ...`.4..........
00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
XFS (loop0): SB validate failed with error -117.

And the mount fails.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_sb.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 54c5e14fdfda..fe9448862667 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -266,6 +266,22 @@ xfs_mount_validate_sb(
 		return -EFSCORRUPTED;
 	}
 
+	if (sbp->sb_unit) {
+		if (!xfs_sb_version_hasdalign(sbp) ||
+		    sbp->sb_unit > sbp->sb_width ||
+		    (sbp->sb_width % sbp->sb_unit) != 0) {
+			xfs_notice(mp, "SB stripe unit sanity check failed");
+			return -EFSCORRUPTED;
+		}
+	} else if (xfs_sb_version_hasdalign(sbp)) {
+		xfs_notice(mp, "SB stripe alignment sanity check failed");
+		return -EFSCORRUPTED;
+	} else if (sbp->sb_width) {
+		xfs_notice(mp, "SB stripe width sanity check failed");
+		return -EFSCORRUPTED;
+	}
+
+
 	if (xfs_sb_version_hascrc(&mp->m_sb) &&
 	    sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) {
 		xfs_notice(mp, "v5 SB sanity check failed");
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  2:43 [PATCH 0/3] xfs: more verifications! Dave Chinner
  2018-06-05  2:43 ` [PATCH 1/3] xfs: catch bad stripe alignment configurations Dave Chinner
@ 2018-06-05  2:43 ` Dave Chinner
  2018-06-05  4:08   ` Darrick J. Wong
  2018-06-05  2:43 ` [PATCH 3/3] xfs: validate btree records on retreival Dave Chinner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  2:43 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

There are rules for vald extent size hints. We enforce them when
applications set them, but fuzzers violate those rules and that
screws us over.

This results in alignment assertion failures when setting up
allocations such as this in direct IO:

XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
....
Call Trace:
 xfs_bmap_btalloc+0x415/0x910
 xfs_bmapi_write+0x71c/0x12e0
 xfs_iomap_write_direct+0x2a9/0x420
 xfs_file_iomap_begin+0x4dc/0xa70
 iomap_apply+0x43/0x100
 iomap_file_buffered_write+0x62/0x90
 xfs_file_buffered_aio_write+0xba/0x300
 __vfs_write+0xd5/0x150
 vfs_write+0xb6/0x180
 ksys_write+0x45/0xa0
 do_syscall_64+0x5a/0x180
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

And from xfs_db:

core.extsize = 10380288

Which is not an integer multiple of the block size, and so violates
Rule #7 for setting extent size hints. Validate extent size hint
rules in the inode verifier to catch this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index f5fff1ccb61d..be197c91307b 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -385,6 +385,7 @@ xfs_dinode_verify(
 	xfs_ino_t		ino,
 	struct xfs_dinode	*dip)
 {
+	xfs_failaddr_t		fa;
 	uint16_t		mode;
 	uint16_t		flags;
 	uint64_t		flags2;
@@ -501,6 +502,12 @@ xfs_dinode_verify(
 			return __this_address;
 	}
 
+	/* extent size hint validation */
+	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
+					mode, be32_to_cpu(dip->di_flags));
+	if (fa)
+		return fa;
+
 	/* only version 3 or greater inodes are extensively verified here */
 	if (dip->di_version < 3)
 		return NULL;
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] xfs: validate btree records on retreival
  2018-06-05  2:43 [PATCH 0/3] xfs: more verifications! Dave Chinner
  2018-06-05  2:43 ` [PATCH 1/3] xfs: catch bad stripe alignment configurations Dave Chinner
  2018-06-05  2:43 ` [PATCH 2/3] xfs: verify extent size hint is valid in inode verifier Dave Chinner
@ 2018-06-05  2:43 ` Dave Chinner
  2018-06-05  4:02   ` Darrick J. Wong
  2018-06-05  2:57 ` [PATCH 4/3] xfs: verify root inode more thoroughly Dave Chinner
  2018-06-05  3:38 ` [PATCH 0/3] xfs: more verifications! Darrick J. Wong
  4 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  2:43 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

So we don't check the validity of records as we walk the btree. When
there are corrupt records in the free space btree (e.g. zero
startblock/length or beyond EOAG) we just blindly use it and things
go bad from there. That leads to assert failures on debug kernels
like this:

XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450
....
Call Trace:
 xfs_alloc_fixup_trees+0x368/0x5c0
 xfs_alloc_ag_vextent_near+0x79a/0xe20
 xfs_alloc_ag_vextent+0x1d3/0x330
 xfs_alloc_vextent+0x5e9/0x870

Or crashes like this:

XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000
.....
BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
....
Call Trace:
 xfs_bmap_add_extent_hole_real+0x67d/0x930
 xfs_bmapi_write+0x934/0xc90
 xfs_da_grow_inode_int+0x27e/0x2f0
 xfs_dir2_grow_inode+0x55/0x130
 xfs_dir2_sf_to_block+0x94/0x5d0
 xfs_dir2_sf_addname+0xd0/0x590
 xfs_dir_createname+0x168/0x1a0
 xfs_rename+0x658/0x9b0

By checking that free space records pulled from the trees are
within the valid range, we catch many of these corruptions before
they can do damage.

This is a generic btree record checking deficiency. We need to
validate the records we fetch from all the different btrees before
we use them to catch corruptions like this.

This patch results in a corrupt record emitting an error message and
returning -EFSCORRUPTED, and the higher layers catch that and abort:

 XFS (loop0): Size Freespace BTree record corruption in AG 0 detected!
 XFS (loop0): start block 0x0 block count 0x0
 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x42a/0x670
 .....
 Call Trace:
  dump_stack+0x85/0xcb
  xfs_trans_cancel+0x19f/0x1c0
  xfs_create+0x42a/0x670
  xfs_generic_create+0x1f6/0x2c0
  vfs_create+0xf9/0x180
  do_mknodat+0x1f9/0x210
  do_syscall_64+0x5a/0x180
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
.....
 XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c.  Return address = ffffffff81500868
 XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c    | 18 ++++++++++++++--
 fs/xfs/libxfs/xfs_ialloc.c   | 31 ++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_refcount.c | 41 +++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_rmap.c     | 34 +++++++++++++++++++++++++++++-
 4 files changed, 115 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index a52ffb835b16..e9caa3d18f89 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -215,6 +215,8 @@ xfs_alloc_get_rec(
 	xfs_extlen_t		*len,	/* output: length of extent */
 	int			*stat)	/* output: success/failure */
 {
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_agnumber_t		agno = cur->bc_private.a.agno;
 	union xfs_btree_rec	*rec;
 	int			error;
 
@@ -222,12 +224,24 @@ xfs_alloc_get_rec(
 	if (error || !(*stat))
 		return error;
 	if (rec->alloc.ar_blockcount == 0)
-		return -EFSCORRUPTED;
+		goto out_bad_rec;
 
 	*bno = be32_to_cpu(rec->alloc.ar_startblock);
 	*len = be32_to_cpu(rec->alloc.ar_blockcount);
 
-	return error;
+	if (!xfs_verify_agbno(mp, agno, *bno) ||
+	    !xfs_verify_agbno(mp, agno, *bno + *len - 1))
+		goto out_bad_rec;
+
+	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"%s Freespace BTree record corruption in AG %d detected!",
+		cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno);
+	xfs_warn(mp,
+		"start block 0x%x block count 0x%x", *bno, *len);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index ec5ea02b5553..3f551eb29157 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -121,16 +121,45 @@ xfs_inobt_get_rec(
 	struct xfs_inobt_rec_incore	*irec,
 	int				*stat)
 {
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_agnumber_t			agno = cur->bc_private.a.agno;
 	union xfs_btree_rec		*rec;
 	int				error;
+	uint64_t			realfree;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
 	if (error || *stat == 0)
 		return error;
 
-	xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec);
+	xfs_inobt_btrec_to_irec(mp, rec, irec);
+
+	if (!xfs_verify_agino(mp, agno, irec->ir_startino))
+		goto out_bad_rec;
+	if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
+	    irec->ir_count > XFS_INODES_PER_CHUNK)
+		goto out_bad_rec;
+	if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
+		goto out_bad_rec;
+
+	/* if there are no holes, return the first available offset */
+	if (!xfs_inobt_issparse(irec->ir_holemask))
+		realfree = irec->ir_free;
+	else
+		realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
+	if (hweight64(realfree) != irec->ir_freecount)
+		goto out_bad_rec;
 
 	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"%s Inode BTree record corruption in AG %d detected!",
+		cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno);
+	xfs_warn(mp,
+"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
+		irec->ir_startino, irec->ir_count, irec->ir_freecount,
+		irec->ir_free, irec->ir_holemask);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index ed5704c7dcf5..bd778e8b809f 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -111,16 +111,47 @@ xfs_refcount_get_rec(
 	struct xfs_refcount_irec	*irec,
 	int				*stat)
 {
+	struct xfs_mount		*mp = cur->bc_mp;
+	xfs_agnumber_t			agno = cur->bc_private.a.agno;
 	union xfs_btree_rec		*rec;
 	int				error;
+	xfs_agblock_t			realstart;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
-	if (!error && *stat == 1) {
-		xfs_refcount_btrec_to_irec(rec, irec);
-		trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno,
-				irec);
+	if (error || !*stat)
+		return error;
+
+	xfs_refcount_btrec_to_irec(rec, irec);
+
+	agno = cur->bc_private.a.agno;
+	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
+		goto out_bad_rec;
+
+	/* handle special COW-staging state */
+	realstart = irec->rc_startblock;
+	if (realstart & XFS_REFC_COW_START) {
+		if (irec->rc_refcount != 1)
+			goto out_bad_rec;
+		realstart &= ~XFS_REFC_COW_START;
 	}
-	return error;
+
+	if (!xfs_verify_agbno(mp, agno, realstart))
+		goto out_bad_rec;
+	if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1))
+		goto out_bad_rec;
+	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
+		goto out_bad_rec;
+
+	trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec);
+	return 0;
+
+out_bad_rec:
+	xfs_warn(mp,
+		"Refcount BTree record corruption in AG %d detected!", agno);
+	xfs_warn(mp,
+		"Start block 0x%x, block count 0x%x, references 0x%x",
+		irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount);
+	return -EFSCORRUPTED;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 87711f9af625..69cfc92039e7 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -191,6 +191,8 @@ xfs_rmap_get_rec(
 	struct xfs_rmap_irec	*irec,
 	int			*stat)
 {
+	struct xfs_mount	*mp = cur->bc_mp;
+	xfs_agnumber_t		agno = cur->bc_private.a.agno;
 	union xfs_btree_rec	*rec;
 	int			error;
 
@@ -198,7 +200,37 @@ xfs_rmap_get_rec(
 	if (error || !*stat)
 		return error;
 
-	return xfs_rmap_btrec_to_irec(rec, irec);
+	if (xfs_rmap_btrec_to_irec(rec, irec))
+		goto out_bad_rec;
+
+	if (irec->rm_blockcount == 0)
+		goto out_bad_rec;
+	if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) {
+		if (irec->rm_owner != XFS_RMAP_OWN_FS)
+			goto out_bad_rec;
+	} else {
+		if (!xfs_verify_agbno(mp, agno, irec->rm_startblock))
+			goto out_bad_rec;
+		if (!xfs_verify_agbno(mp, agno,
+				irec->rm_startblock + irec->rm_blockcount - 1))
+			goto out_bad_rec;
+	}
+
+	if (irec->rm_owner == 0)
+		goto out_bad_rec;
+	if (irec->rm_owner > XFS_MAXINUMBER &&
+	    irec->rm_owner <= XFS_RMAP_OWN_MIN)
+		goto out_bad_rec;
+
+	return 0;
+out_bad_rec:
+	xfs_warn(mp,
+		"RMAP BTree record corruption in AG %d detected!", agno);
+	xfs_warn(mp,
+		"Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x",
+		irec->rm_owner, irec->rm_flags, irec->rm_startblock,
+		irec->rm_blockcount);
+	return -EFSCORRUPTED;
 }
 
 struct xfs_find_left_neighbor_info {
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/3] xfs: verify root inode more thoroughly
  2018-06-05  2:43 [PATCH 0/3] xfs: more verifications! Dave Chinner
                   ` (2 preceding siblings ...)
  2018-06-05  2:43 ` [PATCH 3/3] xfs: validate btree records on retreival Dave Chinner
@ 2018-06-05  2:57 ` Dave Chinner
  2018-06-05  4:06   ` Darrick J. Wong
  2018-06-05  3:38 ` [PATCH 0/3] xfs: more verifications! Darrick J. Wong
  4 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  2:57 UTC (permalink / raw)
  To: linux-xfs


From: Dave Chinner <dchinner@redhat.com>

When looking up the root inode at mount time, we don't actually do
any verification to check that the inode is allocated and accounted
for correctly in the INOBT. Make the checks on the root inode more
robust by making it an untrusted lookup. This forces the inode
lookup to use the inode btree to verify the inode is allocated
and mapped correctly to disk. This will also have the effect of
catching a significant number of AGI/INOBT related corruptions in
AG 0 at mount time.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

---
 fs/xfs/xfs_mount.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 189fa7b615d3..a3378252baa1 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -862,9 +862,12 @@ xfs_mountfs(
 	 * Get and sanity-check the root inode.
 	 * Save the pointer to it in the mount structure.
 	 */
-	error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
+	error = xfs_iget(mp, NULL, sbp->sb_rootino, XFS_IGET_UNTRUSTED,
+			 XFS_ILOCK_EXCL, &rip);
 	if (error) {
-		xfs_warn(mp, "failed to read root inode");
+		xfs_warn(mp,
+			"Failed to read root inode 0x%llx, error %d",
+			sbp->sb_rootino, -error);
 		goto out_log_dealloc;
 	}
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] xfs: more verifications!
  2018-06-05  2:43 [PATCH 0/3] xfs: more verifications! Dave Chinner
                   ` (3 preceding siblings ...)
  2018-06-05  2:57 ` [PATCH 4/3] xfs: verify root inode more thoroughly Dave Chinner
@ 2018-06-05  3:38 ` Darrick J. Wong
  2018-06-05  4:44   ` Dave Chinner
  4 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2018-06-05  3:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 12:43:10PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> This patchset addresses a series of corruptions noticed in an image
> provided by Wen Xu. The patches only address detecting corruptions
> before they can do damage - it does not address/fix the crash the
> image caused but instead prevents the bad information from getting
> to the point where it can cause a crash.
> 
> The third patch addresses a general btree record validation issue;
> we should probably dirve this inwards to each btree implementation
> with a ->verify_record() callout, as this patch does not address
> all the places that btree records are traversed during searches. It
> will catch any attempt to use a bad record that a search lands on,
> however.
> 
> Comments?

So, uh, does xfs_repair catch all these things and fix them?

I wonder if online repair catches and fixes them too, but I won't yet
start heckling people about that. :P

--D

> -Dave.
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] xfs: catch bad stripe alignment configurations
  2018-06-05  2:43 ` [PATCH 1/3] xfs: catch bad stripe alignment configurations Dave Chinner
@ 2018-06-05  3:44   ` Darrick J. Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2018-06-05  3:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 12:43:11PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When stripe alignments are invalid, data alignment algorithms in the
> allocator may not work correctly. Ensure we catch superblocks with
> invalid stripe alignment setups at mount time. These data alignment
> mismatches are now detected at mount time like this:
> 
> XFS (loop0): SB stripe unit sanity check failed
> XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
> XFS (loop0): Unmount and run xfs_repair
> XFS (loop0): First 128 bytes of corrupted metadata buffer:
> 0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
> 0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
> 000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
> 0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
> 00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
> 000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ...`.4..........
> 00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
> XFS (loop0): SB validate failed with error -117.
> 
> And the mount fails.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_sb.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 54c5e14fdfda..fe9448862667 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -266,6 +266,22 @@ xfs_mount_validate_sb(
>  		return -EFSCORRUPTED;
>  	}
>  
> +	if (sbp->sb_unit) {
> +		if (!xfs_sb_version_hasdalign(sbp) ||
> +		    sbp->sb_unit > sbp->sb_width ||
> +		    (sbp->sb_width % sbp->sb_unit) != 0) {
> +			xfs_notice(mp, "SB stripe unit sanity check failed");
> +			return -EFSCORRUPTED;
> +		}
> +	} else if (xfs_sb_version_hasdalign(sbp)) {
> +		xfs_notice(mp, "SB stripe alignment sanity check failed");
> +		return -EFSCORRUPTED;
> +	} else if (sbp->sb_width) {
> +		xfs_notice(mp, "SB stripe width sanity check failed");
> +		return -EFSCORRUPTED;
> +	}
> +
> +
>  	if (xfs_sb_version_hascrc(&mp->m_sb) &&
>  	    sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) {
>  		xfs_notice(mp, "v5 SB sanity check failed");
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] xfs: validate btree records on retreival
  2018-06-05  2:43 ` [PATCH 3/3] xfs: validate btree records on retreival Dave Chinner
@ 2018-06-05  4:02   ` Darrick J. Wong
  2018-06-05  4:39     ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2018-06-05  4:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 12:43:13PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> So we don't check the validity of records as we walk the btree. When
> there are corrupt records in the free space btree (e.g. zero
> startblock/length or beyond EOAG) we just blindly use it and things
> go bad from there. That leads to assert failures on debug kernels
> like this:
> 
> XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450
> ....
> Call Trace:
>  xfs_alloc_fixup_trees+0x368/0x5c0
>  xfs_alloc_ag_vextent_near+0x79a/0xe20
>  xfs_alloc_ag_vextent+0x1d3/0x330
>  xfs_alloc_vextent+0x5e9/0x870
> 
> Or crashes like this:
> 
> XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000
> .....
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
> ....
> Call Trace:
>  xfs_bmap_add_extent_hole_real+0x67d/0x930
>  xfs_bmapi_write+0x934/0xc90
>  xfs_da_grow_inode_int+0x27e/0x2f0
>  xfs_dir2_grow_inode+0x55/0x130
>  xfs_dir2_sf_to_block+0x94/0x5d0
>  xfs_dir2_sf_addname+0xd0/0x590
>  xfs_dir_createname+0x168/0x1a0
>  xfs_rename+0x658/0x9b0
> 
> By checking that free space records pulled from the trees are
> within the valid range, we catch many of these corruptions before
> they can do damage.
> 
> This is a generic btree record checking deficiency. We need to
> validate the records we fetch from all the different btrees before
> we use them to catch corruptions like this.
> 
> This patch results in a corrupt record emitting an error message and
> returning -EFSCORRUPTED, and the higher layers catch that and abort:
> 
>  XFS (loop0): Size Freespace BTree record corruption in AG 0 detected!
>  XFS (loop0): start block 0x0 block count 0x0
>  XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x42a/0x670
>  .....
>  Call Trace:
>   dump_stack+0x85/0xcb
>   xfs_trans_cancel+0x19f/0x1c0
>   xfs_create+0x42a/0x670
>   xfs_generic_create+0x1f6/0x2c0
>   vfs_create+0xf9/0x180
>   do_mknodat+0x1f9/0x210
>   do_syscall_64+0x5a/0x180
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> .....
>  XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c.  Return address = ffffffff81500868
>  XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c    | 18 ++++++++++++++--
>  fs/xfs/libxfs/xfs_ialloc.c   | 31 ++++++++++++++++++++++++++-
>  fs/xfs/libxfs/xfs_refcount.c | 41 +++++++++++++++++++++++++++++++-----
>  fs/xfs/libxfs/xfs_rmap.c     | 34 +++++++++++++++++++++++++++++-
>  4 files changed, 115 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index a52ffb835b16..e9caa3d18f89 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -215,6 +215,8 @@ xfs_alloc_get_rec(
>  	xfs_extlen_t		*len,	/* output: length of extent */
>  	int			*stat)	/* output: success/failure */
>  {
> +	struct xfs_mount	*mp = cur->bc_mp;
> +	xfs_agnumber_t		agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec	*rec;
>  	int			error;
>  
> @@ -222,12 +224,24 @@ xfs_alloc_get_rec(
>  	if (error || !(*stat))
>  		return error;
>  	if (rec->alloc.ar_blockcount == 0)
> -		return -EFSCORRUPTED;
> +		goto out_bad_rec;
>  
>  	*bno = be32_to_cpu(rec->alloc.ar_startblock);
>  	*len = be32_to_cpu(rec->alloc.ar_blockcount);
>  
> -	return error;
> +	if (!xfs_verify_agbno(mp, agno, *bno) ||
> +	    !xfs_verify_agbno(mp, agno, *bno + *len - 1))

What about overflows?

> +		goto out_bad_rec;
> +
> +	return 0;
> +
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"%s Freespace BTree record corruption in AG %d detected!",
> +		cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno);
> +	xfs_warn(mp,
> +		"start block 0x%x block count 0x%x", *bno, *len);
> +	return -EFSCORRUPTED;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index ec5ea02b5553..3f551eb29157 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -121,16 +121,45 @@ xfs_inobt_get_rec(
>  	struct xfs_inobt_rec_incore	*irec,
>  	int				*stat)
>  {
> +	struct xfs_mount		*mp = cur->bc_mp;
> +	xfs_agnumber_t			agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec		*rec;
>  	int				error;
> +	uint64_t			realfree;
>  
>  	error = xfs_btree_get_rec(cur, &rec, stat);
>  	if (error || *stat == 0)
>  		return error;
>  
> -	xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec);
> +	xfs_inobt_btrec_to_irec(mp, rec, irec);
> +
> +	if (!xfs_verify_agino(mp, agno, irec->ir_startino))
> +		goto out_bad_rec;
> +	if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
> +	    irec->ir_count > XFS_INODES_PER_CHUNK)
> +		goto out_bad_rec;
> +	if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
> +		goto out_bad_rec;
> +
> +	/* if there are no holes, return the first available offset */
> +	if (!xfs_inobt_issparse(irec->ir_holemask))
> +		realfree = irec->ir_free;
> +	else
> +		realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
> +	if (hweight64(realfree) != irec->ir_freecount)
> +		goto out_bad_rec;

Hmmm... there's a fair amount of shared logic between this and
xfs_scrub_iallocbt_rec().  Maybe the validation logic should be a new
xfs_btree_ops->verify_rec() function so that we can trap bad records as
they come in, and share the code with the btree scrubbers?

This also gets us to the broader question of where do we draw the line
between hot-path verifiers and online fsck?  This code (and its
counterparts through the rest of this patch) are very very similar to
the existing scrub/ record checking code, and as the fuzz attacks become
more sophisticated it makes more sense just to run online fsck.  Then we
gain the secondary side effect of pulling all of an AG's metadata into
memory right then and there, which (at least on fast storage) won't be
so painful.  Also if we know an AG is toast we could adapt xfs to handle
offlining of bad AGs while we fix them.

The awkward part is that online fsck is far short of a full deck...

>  
>  	return 0;
> +
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"%s Inode BTree record corruption in AG %d detected!",
> +		cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno);
> +	xfs_warn(mp,
> +"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
> +		irec->ir_startino, irec->ir_count, irec->ir_freecount,
> +		irec->ir_free, irec->ir_holemask);

I've also been wondering given the insane amount of dmesg output from
the dangerous_*repair xfstests if we should be turning these into
tracepoints so that we don't hose the system when people feed us garbage
filesystems.  The downside is that you'd then have to use ftrace to
capture the exactly buffer we fell over on.

> +	return -EFSCORRUPTED;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> index ed5704c7dcf5..bd778e8b809f 100644
> --- a/fs/xfs/libxfs/xfs_refcount.c
> +++ b/fs/xfs/libxfs/xfs_refcount.c
> @@ -111,16 +111,47 @@ xfs_refcount_get_rec(
>  	struct xfs_refcount_irec	*irec,
>  	int				*stat)
>  {
> +	struct xfs_mount		*mp = cur->bc_mp;
> +	xfs_agnumber_t			agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec		*rec;
>  	int				error;
> +	xfs_agblock_t			realstart;
>  
>  	error = xfs_btree_get_rec(cur, &rec, stat);
> -	if (!error && *stat == 1) {
> -		xfs_refcount_btrec_to_irec(rec, irec);
> -		trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno,
> -				irec);
> +	if (error || !*stat)
> +		return error;
> +
> +	xfs_refcount_btrec_to_irec(rec, irec);
> +
> +	agno = cur->bc_private.a.agno;
> +	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
> +		goto out_bad_rec;
> +
> +	/* handle special COW-staging state */
> +	realstart = irec->rc_startblock;
> +	if (realstart & XFS_REFC_COW_START) {
> +		if (irec->rc_refcount != 1)
> +			goto out_bad_rec;
> +		realstart &= ~XFS_REFC_COW_START;
>  	}
> -	return error;
> +
> +	if (!xfs_verify_agbno(mp, agno, realstart))
> +		goto out_bad_rec;
> +	if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1))
> +		goto out_bad_rec;
> +	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
> +		goto out_bad_rec;
> +
> +	trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec);
> +	return 0;
> +
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"Refcount BTree record corruption in AG %d detected!", agno);
> +	xfs_warn(mp,
> +		"Start block 0x%x, block count 0x%x, references 0x%x",
> +		irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount);
> +	return -EFSCORRUPTED;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> index 87711f9af625..69cfc92039e7 100644
> --- a/fs/xfs/libxfs/xfs_rmap.c
> +++ b/fs/xfs/libxfs/xfs_rmap.c
> @@ -191,6 +191,8 @@ xfs_rmap_get_rec(
>  	struct xfs_rmap_irec	*irec,
>  	int			*stat)
>  {
> +	struct xfs_mount	*mp = cur->bc_mp;
> +	xfs_agnumber_t		agno = cur->bc_private.a.agno;
>  	union xfs_btree_rec	*rec;
>  	int			error;
>  
> @@ -198,7 +200,37 @@ xfs_rmap_get_rec(
>  	if (error || !*stat)
>  		return error;
>  
> -	return xfs_rmap_btrec_to_irec(rec, irec);
> +	if (xfs_rmap_btrec_to_irec(rec, irec))
> +		goto out_bad_rec;
> +
> +	if (irec->rm_blockcount == 0)
> +		goto out_bad_rec;
> +	if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) {
> +		if (irec->rm_owner != XFS_RMAP_OWN_FS)
> +			goto out_bad_rec;
> +	} else {
> +		if (!xfs_verify_agbno(mp, agno, irec->rm_startblock))
> +			goto out_bad_rec;
> +		if (!xfs_verify_agbno(mp, agno,
> +				irec->rm_startblock + irec->rm_blockcount - 1))
> +			goto out_bad_rec;
> +	}
> +
> +	if (irec->rm_owner == 0)
> +		goto out_bad_rec;
> +	if (irec->rm_owner > XFS_MAXINUMBER &&
> +	    irec->rm_owner <= XFS_RMAP_OWN_MIN)
> +		goto out_bad_rec;
> +
> +	return 0;
> +out_bad_rec:
> +	xfs_warn(mp,
> +		"RMAP BTree record corruption in AG %d detected!", agno);
> +	xfs_warn(mp,
> +		"Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x",
> +		irec->rm_owner, irec->rm_flags, irec->rm_startblock,
> +		irec->rm_blockcount);
> +	return -EFSCORRUPTED;
>  }
>  
>  struct xfs_find_left_neighbor_info {
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/3] xfs: verify root inode more thoroughly
  2018-06-05  2:57 ` [PATCH 4/3] xfs: verify root inode more thoroughly Dave Chinner
@ 2018-06-05  4:06   ` Darrick J. Wong
  2018-06-05  5:30     ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2018-06-05  4:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 12:57:04PM +1000, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> When looking up the root inode at mount time, we don't actually do
> any verification to check that the inode is allocated and accounted
> for correctly in the INOBT. Make the checks on the root inode more
> robust by making it an untrusted lookup. This forces the inode
> lookup to use the inode btree to verify the inode is allocated
> and mapped correctly to disk. This will also have the effect of
> catching a significant number of AGI/INOBT related corruptions in
> AG 0 at mount time.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> ---
>  fs/xfs/xfs_mount.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 189fa7b615d3..a3378252baa1 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -862,9 +862,12 @@ xfs_mountfs(
>  	 * Get and sanity-check the root inode.
>  	 * Save the pointer to it in the mount structure.
>  	 */
> -	error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
> +	error = xfs_iget(mp, NULL, sbp->sb_rootino, XFS_IGET_UNTRUSTED,

One little quirk I've noticed with xfs_iget is that a corrupt inode
buffer's -EFSCORRUPTED gets turned into -EINVAL on the way out of iget.
There's no way to distinguish between "the inode number was crap" vs.
"the inode is marked in use but the buffer verifier failed".

This particularly roars its head in the (directory) parent pointer
scrubber where we try an untrusted iget of the .. entry and if the
alleged parent inode is corrupt we lack the ability to distinguish the
two (ideally we'd set SCRUB_OFLAG_CORRUPT if the .. entry is junk and
SCRUB_OFLAG_XCORRUPT if the inode buffer verifier failed).

I was just going to fix the parent pointer scrubber to check the inobt
directly, but since this has the same reporting problem I figured I'd
just air my dirty laundry on the list. :)

--D

> +			 XFS_ILOCK_EXCL, &rip);
>  	if (error) {
> -		xfs_warn(mp, "failed to read root inode");
> +		xfs_warn(mp,
> +			"Failed to read root inode 0x%llx, error %d",
> +			sbp->sb_rootino, -error);
>  		goto out_log_dealloc;
>  	}
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  2:43 ` [PATCH 2/3] xfs: verify extent size hint is valid in inode verifier Dave Chinner
@ 2018-06-05  4:08   ` Darrick J. Wong
  2018-06-05  4:24     ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2018-06-05  4:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 12:43:12PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There are rules for vald extent size hints. We enforce them when
> applications set them, but fuzzers violate those rules and that
> screws us over.
> 
> This results in alignment assertion failures when setting up
> allocations such as this in direct IO:
> 
> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> ....
> Call Trace:
>  xfs_bmap_btalloc+0x415/0x910
>  xfs_bmapi_write+0x71c/0x12e0
>  xfs_iomap_write_direct+0x2a9/0x420
>  xfs_file_iomap_begin+0x4dc/0xa70
>  iomap_apply+0x43/0x100
>  iomap_file_buffered_write+0x62/0x90
>  xfs_file_buffered_aio_write+0xba/0x300
>  __vfs_write+0xd5/0x150
>  vfs_write+0xb6/0x180
>  ksys_write+0x45/0xa0
>  do_syscall_64+0x5a/0x180
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> And from xfs_db:
> 
> core.extsize = 10380288
> 
> Which is not an integer multiple of the block size, and so violates
> Rule #7 for setting extent size hints. Validate extent size hint
> rules in the inode verifier to catch this.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index f5fff1ccb61d..be197c91307b 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -385,6 +385,7 @@ xfs_dinode_verify(
>  	xfs_ino_t		ino,
>  	struct xfs_dinode	*dip)
>  {
> +	xfs_failaddr_t		fa;
>  	uint16_t		mode;
>  	uint16_t		flags;
>  	uint64_t		flags2;
> @@ -501,6 +502,12 @@ xfs_dinode_verify(
>  			return __this_address;
>  	}
>  
> +	/* extent size hint validation */
> +	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
> +					mode, be32_to_cpu(dip->di_flags));

What if the cowextsize is garbage?  Do we handle that better, or do we
blow up there too?

--D

> +	if (fa)
> +		return fa;
> +
>  	/* only version 3 or greater inodes are extensively verified here */
>  	if (dip->di_version < 3)
>  		return NULL;
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  4:08   ` Darrick J. Wong
@ 2018-06-05  4:24     ` Dave Chinner
  2018-06-05  4:28       ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  4:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jun 04, 2018 at 09:08:17PM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2018 at 12:43:12PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > There are rules for vald extent size hints. We enforce them when
> > applications set them, but fuzzers violate those rules and that
> > screws us over.
> > 
> > This results in alignment assertion failures when setting up
> > allocations such as this in direct IO:
> > 
> > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > ....
> > Call Trace:
> >  xfs_bmap_btalloc+0x415/0x910
> >  xfs_bmapi_write+0x71c/0x12e0
> >  xfs_iomap_write_direct+0x2a9/0x420
> >  xfs_file_iomap_begin+0x4dc/0xa70
> >  iomap_apply+0x43/0x100
> >  iomap_file_buffered_write+0x62/0x90
> >  xfs_file_buffered_aio_write+0xba/0x300
> >  __vfs_write+0xd5/0x150
> >  vfs_write+0xb6/0x180
> >  ksys_write+0x45/0xa0
> >  do_syscall_64+0x5a/0x180
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > And from xfs_db:
> > 
> > core.extsize = 10380288
> > 
> > Which is not an integer multiple of the block size, and so violates
> > Rule #7 for setting extent size hints. Validate extent size hint
> > rules in the inode verifier to catch this.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > index f5fff1ccb61d..be197c91307b 100644
> > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > @@ -385,6 +385,7 @@ xfs_dinode_verify(
> >  	xfs_ino_t		ino,
> >  	struct xfs_dinode	*dip)
> >  {
> > +	xfs_failaddr_t		fa;
> >  	uint16_t		mode;
> >  	uint16_t		flags;
> >  	uint64_t		flags2;
> > @@ -501,6 +502,12 @@ xfs_dinode_verify(
> >  			return __this_address;
> >  	}
> >  
> > +	/* extent size hint validation */
> > +	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
> > +					mode, be32_to_cpu(dip->di_flags));
> 
> What if the cowextsize is garbage?  Do we handle that better, or do we
> blow up there too?

I haven't checked (it was a v4 image that I was looking at) - are
the rules the same?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  4:24     ` Dave Chinner
@ 2018-06-05  4:28       ` Darrick J. Wong
  2018-06-05  4:47         ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2018-06-05  4:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 02:24:53PM +1000, Dave Chinner wrote:
> On Mon, Jun 04, 2018 at 09:08:17PM -0700, Darrick J. Wong wrote:
> > On Tue, Jun 05, 2018 at 12:43:12PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > There are rules for vald extent size hints. We enforce them when
> > > applications set them, but fuzzers violate those rules and that
> > > screws us over.
> > > 
> > > This results in alignment assertion failures when setting up
> > > allocations such as this in direct IO:
> > > 
> > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > > ....
> > > Call Trace:
> > >  xfs_bmap_btalloc+0x415/0x910
> > >  xfs_bmapi_write+0x71c/0x12e0
> > >  xfs_iomap_write_direct+0x2a9/0x420
> > >  xfs_file_iomap_begin+0x4dc/0xa70
> > >  iomap_apply+0x43/0x100
> > >  iomap_file_buffered_write+0x62/0x90
> > >  xfs_file_buffered_aio_write+0xba/0x300
> > >  __vfs_write+0xd5/0x150
> > >  vfs_write+0xb6/0x180
> > >  ksys_write+0x45/0xa0
> > >  do_syscall_64+0x5a/0x180
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > 
> > > And from xfs_db:
> > > 
> > > core.extsize = 10380288
> > > 
> > > Which is not an integer multiple of the block size, and so violates
> > > Rule #7 for setting extent size hints. Validate extent size hint
> > > rules in the inode verifier to catch this.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > > index f5fff1ccb61d..be197c91307b 100644
> > > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > > @@ -385,6 +385,7 @@ xfs_dinode_verify(
> > >  	xfs_ino_t		ino,
> > >  	struct xfs_dinode	*dip)
> > >  {
> > > +	xfs_failaddr_t		fa;
> > >  	uint16_t		mode;
> > >  	uint16_t		flags;
> > >  	uint64_t		flags2;
> > > @@ -501,6 +502,12 @@ xfs_dinode_verify(
> > >  			return __this_address;
> > >  	}
> > >  
> > > +	/* extent size hint validation */
> > > +	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
> > > +					mode, be32_to_cpu(dip->di_flags));
> > 
> > What if the cowextsize is garbage?  Do we handle that better, or do we
> > blow up there too?
> 
> I haven't checked (it was a v4 image that I was looking at) - are
> the rules the same?

Similar, but not entirely the same.  See xfs_inode_validate_cowextsize. :)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] xfs: validate btree records on retreival
  2018-06-05  4:02   ` Darrick J. Wong
@ 2018-06-05  4:39     ` Dave Chinner
  2018-06-05  5:08       ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  4:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jun 04, 2018 at 09:02:32PM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2018 at 12:43:13PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > So we don't check the validity of records as we walk the btree. When
> > there are corrupt records in the free space btree (e.g. zero
> > startblock/length or beyond EOAG) we just blindly use it and things
> > go bad from there. That leads to assert failures on debug kernels
> > like this:
....
> > @@ -222,12 +224,24 @@ xfs_alloc_get_rec(
> >  	if (error || !(*stat))
> >  		return error;
> >  	if (rec->alloc.ar_blockcount == 0)
> > -		return -EFSCORRUPTED;
> > +		goto out_bad_rec;
> >  
> >  	*bno = be32_to_cpu(rec->alloc.ar_startblock);
> >  	*len = be32_to_cpu(rec->alloc.ar_blockcount);
> >  
> > -	return error;
> > +	if (!xfs_verify_agbno(mp, agno, *bno) ||
> > +	    !xfs_verify_agbno(mp, agno, *bno + *len - 1))
> 
> What about overflows?

I guess that also means I have to add a (*bno > *bno + *len) check
because it's all unsigned, right?

....

> > +	xfs_inobt_btrec_to_irec(mp, rec, irec);
> > +
> > +	if (!xfs_verify_agino(mp, agno, irec->ir_startino))
> > +		goto out_bad_rec;
> > +	if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
> > +	    irec->ir_count > XFS_INODES_PER_CHUNK)
> > +		goto out_bad_rec;
> > +	if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
> > +		goto out_bad_rec;
> > +
> > +	/* if there are no holes, return the first available offset */
> > +	if (!xfs_inobt_issparse(irec->ir_holemask))
> > +		realfree = irec->ir_free;
> > +	else
> > +		realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
> > +	if (hweight64(realfree) != irec->ir_freecount)
> > +		goto out_bad_rec;
> 
> Hmmm... there's a fair amount of shared logic between this and
> xfs_scrub_iallocbt_rec().

Probably, I did just wrote the checks from first principles.

> Maybe the validation logic should be a new
> xfs_btree_ops->verify_rec() function so that we can trap bad records as
> they come in, and share the code with the btree scrubbers?

Yeah, that's kinda what I was thinking is the next step. i.e this
patch starts us down the road, and then we drive it inwards to
capture more and more of the places we access the raw record data.

> This also gets us to the broader question of where do we draw the line
> between hot-path verifiers and online fsck?  This code (and its
> counterparts through the rest of this patch) are very very similar to
> the existing scrub/ record checking code, and as the fuzz attacks become
> more sophisticated it makes more sense just to run online fsck.

The problem is that we don't know we have a corruption if we don't
do checks like these, and hence have no trigger to tell us that
we need to scrub. And given the number of "my freespace tree got
corrupted, do you have any idea why?" reports we get, we need
something like this at runtime.

The other thing I was considering is that we do these checks in the
buffer verifier - we're already checksumming the metadata, so it's
hot in cache. Hence the overhead of checking each record isn't going
to be that bad, and it protects searches as well. That will probably
be a more efficient approach and it matches the way we currently do
most of the validation.

I'd still like to get this patch in first and then have some time to
work out how best to optimise it.

> Then we
> gain the secondary side effect of pulling all of an AG's metadata into
> memory right then and there, which (at least on fast storage) won't be
> so painful.  Also if we know an AG is toast we could adapt xfs to handle
> offlining of bad AGs while we fix them.
> 
> The awkward part is that online fsck is far short of a full deck...

I think we should aim for a robust, efficient corruption detection
mechanism first - repair comes after detection :P

> >  	return 0;
> > +
> > +out_bad_rec:
> > +	xfs_warn(mp,
> > +		"%s Inode BTree record corruption in AG %d detected!",
> > +		cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno);
> > +	xfs_warn(mp,
> > +"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
> > +		irec->ir_startino, irec->ir_count, irec->ir_freecount,
> > +		irec->ir_free, irec->ir_holemask);
> 
> I've also been wondering given the insane amount of dmesg output from
> the dangerous_*repair xfstests if we should be turning these into
> tracepoints so that we don't hose the system when people feed us garbage
> filesystems.  The downside is that you'd then have to use ftrace to
> capture the exactly buffer we fell over on.

The problem is that won't help us with the on-off corruption issues
or the people who cannot supply us with a metadump or reproducable
test case. Perhaps these should be ratelimited?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] xfs: more verifications!
  2018-06-05  3:38 ` [PATCH 0/3] xfs: more verifications! Darrick J. Wong
@ 2018-06-05  4:44   ` Dave Chinner
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  4:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jun 04, 2018 at 08:38:00PM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2018 at 12:43:10PM +1000, Dave Chinner wrote:
> > Hi folks,
> > 
> > This patchset addresses a series of corruptions noticed in an image
> > provided by Wen Xu. The patches only address detecting corruptions
> > before they can do damage - it does not address/fix the crash the
> > image caused but instead prevents the bad information from getting
> > to the point where it can cause a crash.
> > 
> > The third patch addresses a general btree record validation issue;
> > we should probably dirve this inwards to each btree implementation
> > with a ->verify_record() callout, as this patch does not address
> > all the places that btree records are traversed during searches. It
> > will catch any attempt to use a bad record that a search lands on,
> > however.
> > 
> > Comments?
> 
> So, uh, does xfs_repair catch all these things and fix them?

Don't know - I couldn't test repair on the damaged image because it
was a single AG and repair barfed on the primary superblock and
couldn't recover it.

Yes, both Eric and I have asked for this fuzzing to be done on 2 AG
filesystems so we can reliably test the userspace tools against
them, too, but we're not making any progress there...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] xfs: verify extent size hint is valid in inode verifier
  2018-06-05  4:28       ` Darrick J. Wong
@ 2018-06-05  4:47         ` Dave Chinner
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  4:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jun 04, 2018 at 09:28:36PM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2018 at 02:24:53PM +1000, Dave Chinner wrote:
> > On Mon, Jun 04, 2018 at 09:08:17PM -0700, Darrick J. Wong wrote:
> > > On Tue, Jun 05, 2018 at 12:43:12PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > There are rules for vald extent size hints. We enforce them when
> > > > applications set them, but fuzzers violate those rules and that
> > > > screws us over.
> > > > 
> > > > This results in alignment assertion failures when setting up
> > > > allocations such as this in direct IO:
> > > > 
> > > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > > > ....
> > > > Call Trace:
> > > >  xfs_bmap_btalloc+0x415/0x910
> > > >  xfs_bmapi_write+0x71c/0x12e0
> > > >  xfs_iomap_write_direct+0x2a9/0x420
> > > >  xfs_file_iomap_begin+0x4dc/0xa70
> > > >  iomap_apply+0x43/0x100
> > > >  iomap_file_buffered_write+0x62/0x90
> > > >  xfs_file_buffered_aio_write+0xba/0x300
> > > >  __vfs_write+0xd5/0x150
> > > >  vfs_write+0xb6/0x180
> > > >  ksys_write+0x45/0xa0
> > > >  do_syscall_64+0x5a/0x180
> > > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > 
> > > > And from xfs_db:
> > > > 
> > > > core.extsize = 10380288
> > > > 
> > > > Which is not an integer multiple of the block size, and so violates
> > > > Rule #7 for setting extent size hints. Validate extent size hint
> > > > rules in the inode verifier to catch this.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > > > index f5fff1ccb61d..be197c91307b 100644
> > > > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > > > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > > > @@ -385,6 +385,7 @@ xfs_dinode_verify(
> > > >  	xfs_ino_t		ino,
> > > >  	struct xfs_dinode	*dip)
> > > >  {
> > > > +	xfs_failaddr_t		fa;
> > > >  	uint16_t		mode;
> > > >  	uint16_t		flags;
> > > >  	uint64_t		flags2;
> > > > @@ -501,6 +502,12 @@ xfs_dinode_verify(
> > > >  			return __this_address;
> > > >  	}
> > > >  
> > > > +	/* extent size hint validation */
> > > > +	fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize),
> > > > +					mode, be32_to_cpu(dip->di_flags));
> > > 
> > > What if the cowextsize is garbage?  Do we handle that better, or do we
> > > blow up there too?
> > 
> > I haven't checked (it was a v4 image that I was looking at) - are
> > the rules the same?
> 
> Similar, but not entirely the same.  See xfs_inode_validate_cowextsize. :)

OK, I'll cook up a similar patch then.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] xfs: validate btree records on retreival
  2018-06-05  4:39     ` Dave Chinner
@ 2018-06-05  5:08       ` Dave Chinner
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  5:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 05, 2018 at 02:39:16PM +1000, Dave Chinner wrote:
> On Mon, Jun 04, 2018 at 09:02:32PM -0700, Darrick J. Wong wrote:
> > On Tue, Jun 05, 2018 at 12:43:13PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > So we don't check the validity of records as we walk the btree. When
> > > there are corrupt records in the free space btree (e.g. zero
> > > startblock/length or beyond EOAG) we just blindly use it and things
> > > go bad from there. That leads to assert failures on debug kernels
> > > like this:
> ....
> > > @@ -222,12 +224,24 @@ xfs_alloc_get_rec(
> > >  	if (error || !(*stat))
> > >  		return error;
> > >  	if (rec->alloc.ar_blockcount == 0)
> > > -		return -EFSCORRUPTED;
> > > +		goto out_bad_rec;
> > >  
> > >  	*bno = be32_to_cpu(rec->alloc.ar_startblock);
> > >  	*len = be32_to_cpu(rec->alloc.ar_blockcount);
> > >  
> > > -	return error;
> > > +	if (!xfs_verify_agbno(mp, agno, *bno) ||
> > > +	    !xfs_verify_agbno(mp, agno, *bno + *len - 1))
> > 
> > What about overflows?
> 
> I guess that also means I have to add a (*bno > *bno + *len) check
> because it's all unsigned, right?
> 
> ....
> 
> > > +	xfs_inobt_btrec_to_irec(mp, rec, irec);
> > > +
> > > +	if (!xfs_verify_agino(mp, agno, irec->ir_startino))
> > > +		goto out_bad_rec;
> > > +	if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
> > > +	    irec->ir_count > XFS_INODES_PER_CHUNK)
> > > +		goto out_bad_rec;
> > > +	if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
> > > +		goto out_bad_rec;
> > > +
> > > +	/* if there are no holes, return the first available offset */
> > > +	if (!xfs_inobt_issparse(irec->ir_holemask))
> > > +		realfree = irec->ir_free;
> > > +	else
> > > +		realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
> > > +	if (hweight64(realfree) != irec->ir_freecount)
> > > +		goto out_bad_rec;
> > 
> > Hmmm... there's a fair amount of shared logic between this and
> > xfs_scrub_iallocbt_rec().
> 
> Probably, I did just wrote the checks from first principles.

xfs_scrub_iallocbt_rec() is a lot more heavyweight. It does a lot
more stuff that what I just added, but I'm not sure it will improve
runtime corruption dtection all that much more. i.e I feel like the
additional checking falls on the wrong side of the cost/benefit
line.

Yeah, I know that line is kinda blurry....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/3] xfs: verify root inode more thoroughly
  2018-06-05  4:06   ` Darrick J. Wong
@ 2018-06-05  5:30     ` Dave Chinner
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2018-06-05  5:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jun 04, 2018 at 09:06:57PM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2018 at 12:57:04PM +1000, Dave Chinner wrote:
> > 
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When looking up the root inode at mount time, we don't actually do
> > any verification to check that the inode is allocated and accounted
> > for correctly in the INOBT. Make the checks on the root inode more
> > robust by making it an untrusted lookup. This forces the inode
> > lookup to use the inode btree to verify the inode is allocated
> > and mapped correctly to disk. This will also have the effect of
> > catching a significant number of AGI/INOBT related corruptions in
> > AG 0 at mount time.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > 
> > ---
> >  fs/xfs/xfs_mount.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 189fa7b615d3..a3378252baa1 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -862,9 +862,12 @@ xfs_mountfs(
> >  	 * Get and sanity-check the root inode.
> >  	 * Save the pointer to it in the mount structure.
> >  	 */
> > -	error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
> > +	error = xfs_iget(mp, NULL, sbp->sb_rootino, XFS_IGET_UNTRUSTED,
> 
> One little quirk I've noticed with xfs_iget is that a corrupt inode
> buffer's -EFSCORRUPTED gets turned into -EINVAL on the way out of iget.

That's in xfs_imap_to_bp(), right?

And the only place we care about this is xfs_nfs_get_inode() so that
we return ESTALE rather than EFSCORRUPTED, yes?

Ok, so let's fix that.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-06-05  5:30 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-06-05  2:43 [PATCH 0/3] xfs: more verifications! Dave Chinner
2018-06-05  2:43 ` [PATCH 1/3] xfs: catch bad stripe alignment configurations Dave Chinner
2018-06-05  3:44   ` Darrick J. Wong
2018-06-05  2:43 ` [PATCH 2/3] xfs: verify extent size hint is valid in inode verifier Dave Chinner
2018-06-05  4:08   ` Darrick J. Wong
2018-06-05  4:24     ` Dave Chinner
2018-06-05  4:28       ` Darrick J. Wong
2018-06-05  4:47         ` Dave Chinner
2018-06-05  2:43 ` [PATCH 3/3] xfs: validate btree records on retreival Dave Chinner
2018-06-05  4:02   ` Darrick J. Wong
2018-06-05  4:39     ` Dave Chinner
2018-06-05  5:08       ` Dave Chinner
2018-06-05  2:57 ` [PATCH 4/3] xfs: verify root inode more thoroughly Dave Chinner
2018-06-05  4:06   ` Darrick J. Wong
2018-06-05  5:30     ` Dave Chinner
2018-06-05  3:38 ` [PATCH 0/3] xfs: more verifications! Darrick J. Wong
2018-06-05  4:44   ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox