public inbox for linux-xfs@vger.kernel.org
* [PATCHSET v2 00/12] xfs: improve runtime refcountbt corruption detection
@ 2022-10-27 17:14 Darrick J. Wong
From: Darrick J. Wong @ 2022-10-27 17:14 UTC
  To: djwong; +Cc: Dave Chinner, linux-xfs

Hi all,

Fuzz testing of the refcount btree exposed a weakness in how refcount
btree records are validated at runtime.  The idea of using the upper bit
of the rc_startblock field to split the refcount records into one group
for shared space and another for CoW staging extents was added at the
last minute.  The incore struct kept this flag encoded in the upper bit
of the startblock field, which makes it all too easy for arithmetic on
startblock to overflow if the cowflag isn't detected properly.
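The overflow hazard is easy to see in a simplified userspace model.  This
is a hypothetical sketch, not kernel code: COWFLAG stands in for the
bit-31 flag (XFS_REFC_COW_START in xfs_format.h), and the helper names
are made up for illustration.  It contrasts computing an extent's end
while the flag may still be set with computing it on an unflagged
startblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t agblock_t;

/* Hypothetical stand-in for the on-disk CoW flag in bit 31 of startblock. */
#define COWFLAG ((uint32_t)1U << 31)

/* Old style: the incore startblock may still carry COWFLAG in bit 31. */
static agblock_t flagged_irec_end(agblock_t startblock, uint32_t blockcount)
{
	/* Unsigned 32-bit sum: wraps modulo 2^32 when bit 31 is set. */
	return startblock + blockcount;
}

/* True if the computed end landed below the start, i.e. the sum wrapped. */
static bool flagged_irec_wrapped(agblock_t startblock, uint32_t blockcount)
{
	return flagged_irec_end(startblock, blockcount) < startblock;
}

/*
 * New style: the domain lives in its own field, so startblock never has
 * bit 31 set.  Assuming blockcount also stays below 2^31, the sum fits.
 */
static agblock_t plain_irec_end(agblock_t startblock, uint32_t blockcount)
{
	assert(!(startblock & COWFLAG));
	assert(!(blockcount & COWFLAG));
	return startblock + blockcount;
}
```

For example, a CoW record whose raw start is COWFLAG | 0x7FFFFFF0 with
0x20 blocks comes back from flagged_irec_end() as 0x10, an "end" that
sits below its own start, which is exactly the kind of bogus value that
confuses later range checks.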

When I ran a norepair fuzz tester, I was able to crash the kernel on one
of these accidental overflows: fuzzing a key record in a node block broke
lookups.  To fix the problem, make the domain (shared/cow) a separate
field in the incore record.
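One way to picture the fix is an incore record that carries the domain
in its own enum field, with the flag folded back into bit 31 only at the
on-disk boundary.  This is a hypothetical userspace sketch: the names
refc_domain, refcount_irec, irec_from_disk, irec_disk_start and COWFLAG
are simplified stand-ins chosen for illustration, not the kernel's actual
identifiers or conversion helpers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t agblock_t;

/* Hypothetical stand-in for the on-disk CoW flag in bit 31. */
#define COWFLAG ((uint32_t)1U << 31)

enum refc_domain { REFC_DOMAIN_SHARED = 0, REFC_DOMAIN_COW };

/* Incore record: the domain has its own field; startblock is unflagged. */
struct refcount_irec {
	agblock_t startblock;
	uint32_t blockcount;
	uint32_t refcount;
	enum refc_domain domain;
};

/* Split an on-disk start block into a plain block number plus a domain. */
static void irec_from_disk(uint32_t disk_start, uint32_t blockcount,
			   uint32_t refcount, struct refcount_irec *irec)
{
	if (disk_start & COWFLAG) {
		irec->startblock = disk_start & ~COWFLAG;
		irec->domain = REFC_DOMAIN_COW;
	} else {
		irec->startblock = disk_start;
		irec->domain = REFC_DOMAIN_SHARED;
	}
	irec->blockcount = blockcount;
	irec->refcount = refcount;
}

/* Fold the domain back into bit 31 only when encoding for disk. */
static uint32_t irec_disk_start(const struct refcount_irec *irec)
{
	assert(!(irec->startblock & COWFLAG));
	if (irec->domain == REFC_DOMAIN_COW)
		return irec->startblock | COWFLAG;
	return irec->startblock;
}
```

Keeping the flag out of startblock means every comparison and addition
in between works on ordinary block numbers; only the encode/decode
helpers need to know about bit 31.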

Unfortunately, a customer also hit this once in production.  Due to bugs
in the kernel running on the VM host, writes to the disk image would
occasionally be lost.  Given sufficient memory pressure on the VM guest,
a refcountbt xfs_buf could be reclaimed and later reloaded from the
stale copy on the virtual disk.  The stale disk contents were a refcount
btree leaf block full of records for the wrong domain, and this caused
an infinite loop in the guest VM.
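The production failure shows why each record's domain needs checking
whenever it is accessed.  A minimal sketch, assuming records have already
been decoded into a separate domain field: check_domain and walk_domain
are hypothetical helpers, and ERR_CORRUPT is a placeholder for whatever
corruption error the caller would return.  With such a check, a stale
block full of wrong-domain records fails fast with an error instead of
feeding bogus records into the caller's loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum refc_domain { REFC_DOMAIN_SHARED = 0, REFC_DOMAIN_COW };

struct refcount_irec {
	uint32_t startblock;	/* plain block number, no flag bit */
	uint32_t blockcount;
	uint32_t refcount;
	enum refc_domain domain;
};

/* Hypothetical placeholder for a filesystem-corruption error code. */
#define ERR_CORRUPT (-1)

/* Reject a record whose domain differs from the one the caller wanted. */
static int check_domain(const struct refcount_irec *irec,
			enum refc_domain want)
{
	return irec->domain == want ? 0 : ERR_CORRUPT;
}

/*
 * Visit records in order, counting how many pass the domain check.
 * Stops at the first wrong-domain record and returns ERR_CORRUPT.
 */
static int walk_domain(const struct refcount_irec *recs, size_t nr,
		       enum refc_domain want, size_t *visited)
{
	assert(visited != NULL);
	*visited = 0;
	for (size_t i = 0; i < nr; i++) {
		int error = check_domain(&recs[i], want);

		if (error)
			return error;
		(*visited)++;
	}
	return 0;
}
```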

v2: actually include the refcount adjust loop invariant checking patch;
    move the deferred refcount continuation checks earlier in the series;
    break up the megapatch into smaller pieces; fix an uninitialized list
    head.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refcount-cow-domain-6.1
---
 fs/xfs/libxfs/xfs_format.h         |   22 ---
 fs/xfs/libxfs/xfs_refcount.c       |  297 ++++++++++++++++++++++++++----------
 fs/xfs/libxfs/xfs_refcount.h       |   40 ++++-
 fs/xfs/libxfs/xfs_refcount_btree.c |   15 +-
 fs/xfs/libxfs/xfs_types.h          |   30 ++++
 fs/xfs/scrub/refcount.c            |   74 ++++-----
 fs/xfs/xfs_trace.h                 |   48 +++++-
 7 files changed, 362 insertions(+), 164 deletions(-)



Thread overview: 30+ messages
2022-10-27 17:14 [PATCHSET v2 00/12] xfs: improve runtime refcountbt corruption detection Darrick J. Wong
2022-10-27 17:14 ` [PATCH 01/12] xfs: make sure aglen never goes negative in xfs_refcount_adjust_extents Darrick J. Wong
2022-10-27 20:41   ` Dave Chinner
2022-10-27 17:14 ` [PATCH 02/12] xfs: check deferred refcount op continuation parameters Darrick J. Wong
2022-10-27 20:49   ` Dave Chinner
2022-10-27 21:32     ` Darrick J. Wong
2022-10-27 21:42       ` Darrick J. Wong
2022-10-27 22:24       ` Dave Chinner
2022-10-27 23:25         ` Darrick J. Wong
2022-10-27 21:54   ` [PATCH v2.1 " Darrick J. Wong
2022-10-27 17:14 ` [PATCH 03/12] xfs: move _irec structs to xfs_types.h Darrick J. Wong
2022-10-27 17:14 ` [PATCH 04/12] xfs: refactor refcount record usage in xchk_refcountbt_rec Darrick J. Wong
2022-10-27 17:14 ` [PATCH 05/12] xfs: track cow/shared record domains explicitly in xfs_refcount_irec Darrick J. Wong
2022-10-27 21:03   ` Dave Chinner
2022-10-27 21:10     ` Darrick J. Wong
2022-10-27 17:14 ` [PATCH 06/12] xfs: report refcount domain in tracepoints Darrick J. Wong
2022-10-27 21:05   ` Dave Chinner
2022-10-27 17:14 ` [PATCH 07/12] xfs: refactor domain and refcount checking Darrick J. Wong
2022-10-27 21:07   ` Dave Chinner
2022-10-27 17:14 ` [PATCH 08/12] xfs: remove XFS_FIND_RCEXT_SHARED and _COW Darrick J. Wong
2022-10-27 21:11   ` Dave Chinner
2022-10-27 17:14 ` [PATCH 09/12] xfs: check record domain when accessing refcount records Darrick J. Wong
2022-10-27 21:15   ` Dave Chinner
2022-10-27 21:33     ` Darrick J. Wong
2022-10-27 17:14 ` [PATCH 10/12] xfs: fix agblocks check in the cow leftover recovery function Darrick J. Wong
2022-10-27 21:22   ` Dave Chinner
2022-10-27 17:15 ` [PATCH 11/12] xfs: fix uninitialized list head in struct xfs_refcount_recovery Darrick J. Wong
2022-10-27 21:24   ` Dave Chinner
2022-10-27 17:15 ` [PATCH 12/12] xfs: rename XFS_REFC_COW_START to _COWFLAG Darrick J. Wong
2022-10-27 21:25   ` Dave Chinner
