Re: Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll on kernel 6.3

public inbox for linux-xfs@vger.kernel.org

From: Dave Chinner <david@fromorbit.com>
To: Mike Pastore <mike@oobak.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll on kernel 6.3
Date: Wed, 3 May 2023 08:02:58 +1000
Message-ID: <20230502220258.GA3223426@dread.disaster.area>
In-Reply-To: <CAP_NaWaozOVBoJXtuXTRUWsbmGV4FQUbSPvOPHmuTO7F_FdA4g@mail.gmail.com>

On Tue, May 02, 2023 at 02:14:34PM -0500, Mike Pastore wrote:
> Hi folks,
> 
> I was playing around with some blockchain projects yesterday and had
> some curious crashes while syncing blockchain databases on XFS
> filesystems under kernel 6.3.
> 
>   * kernel 6.3.0 and 6.3.1 (ubuntu mainline)
>   * w/ and w/o the discard mount flag
>   * w/ and w/o -m crc=0
>   * ironfish (nodejs) and ergo (jvm)
> 
> The hardware is as follows:
> 
>   * Asus PRIME H670-PLUS D4
>   * Intel Core i5-12400
>   * 32GB DDR4-3200 Non-ECC UDIMM
> 
> In all cases the filesystems were newly-created under kernel 6.3 on an
> LVM2 stripe and mounted with the noatime flag. Here is the output of
> the mkfs.xfs command (after reverting back to 6.2.14—which I realize
> may not be the most helpful thing, but here it is anyway):
> 
> $ sudo lvremove -f vgtethys/ironfish
> $ sudo lvcreate -n ironfish -L 10G -i2 vgtethys /dev/nvme[12]n1p3
>   Using default stripesize 64.00 KiB.
>   Logical volume "ironfish" created.
> $ sudo mkfs.xfs -m crc=0 -m uuid=b4725d43-a12d-42df-981a-346af2809fad
> -s size=4096 /dev/vgtethys/ironfish
> meta-data=/dev/vgtethys/ironfish isize=256    agcount=16, agsize=163824 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=0        finobt=0, sparse=0, rmapbt=0
>          =                       reflink=0    bigtime=0 inobtcount=0
> data     =                       bsize=4096   blocks=2621184, imaxpct=25
>          =                       sunit=16     swidth=32 blks

Stripe aligned allocation is enabled. Does the problem go away
when you use mkfs.xfs -d noalign .... ?
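The stripe geometry in the mkfs output follows from the LVM layout: the
default 64 KiB stripe size divided by 4096-byte filesystem blocks gives
sunit=16, and two stripes (-i2) give swidth=32. With that geometry the
allocator tries to align some data allocations to stripe-unit boundaries;
"mkfs.xfs -d noalign" creates the filesystem without that stripe alignment.
The helpers below are a hypothetical, simplified model of the arithmetic
only, not XFS allocator code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: a simplified model of stripe geometry, not XFS code. */

/* Stripe unit in filesystem blocks: LVM stripe size / fs block size. */
static uint32_t stripe_unit_blocks(uint32_t stripe_bytes, uint32_t block_bytes)
{
	assert(block_bytes > 0 && stripe_bytes % block_bytes == 0);
	return stripe_bytes / block_bytes;
}

/* Stripe width in blocks: stripe unit times the number of data stripes. */
static uint32_t stripe_width_blocks(uint32_t sunit, uint32_t nstripes)
{
	return sunit * nstripes;
}

/* Round a block number up to the next stripe-unit boundary. */
static uint64_t align_up_to_sunit(uint64_t bno, uint32_t sunit)
{
	assert(sunit > 0);
	return (bno + sunit - 1) / sunit * sunit;
}
```

For example, stripe_unit_blocks(65536, 4096) gives 16, and the reported
agsize of 163824 blocks is an exact multiple of that stripe unit.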

> The applications crash with I/O errors. Here's what I see in dmesg:
> 
> May 01 18:56:59 tethys kernel: XFS (dm-28): Internal error bno + len >
> gtbno at line 1908 of file fs/xfs/libxfs/xfs_alloc.c.  Caller
> xfs_free_ag_extent+0x14e/0x950 [xfs]

        /*
         * If this failure happens the request to free this
         * space was invalid, it's (partly) already free.
         * Very bad.
         */
        if (XFS_IS_CORRUPT(mp, bno + len > gtbno)) {
                error = -EFSCORRUPTED;
                goto error0;
        }

That failure implies the btree records are corrupt in memory,
possibly due to memory corruption from something outside the XFS
code (e.g. use after free).
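To make the check concrete: when an extent [bno, bno + len) is freed, it
is compared against the neighbouring free-space records. If it overlaps
the free extent to its left (ltbno + ltlen > bno) or the one to its right
(bno + len > gtbno, the expression named in the log above), the space is
already (partly) free and the records cannot be trusted. The sketch below
is a hypothetical, simplified model that assumes both neighbours exist; it
is not the real xfs_free_ag_extent() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the neighbour checks made when [bno, bno + len)
 * is returned to free space. Left neighbour: free extent
 * [ltbno, ltbno + ltlen). Right neighbour: free extent starting at gtbno.
 */
enum free_check {
	FREE_OK,      /* no overlap with either neighbour */
	FREE_CORRUPT, /* extent is (partly) already free */
};

struct free_result {
	enum free_check status;
	bool merge_left;  /* left neighbour ends exactly at bno */
	bool merge_right; /* right neighbour starts exactly at bno + len */
};

static struct free_result check_free_extent(uint32_t ltbno, uint32_t ltlen,
					    uint32_t bno, uint32_t len,
					    uint32_t gtbno)
{
	struct free_result r = { FREE_OK, false, false };

	assert(len > 0);
	/* Overlap with the left neighbour: "ltbno + ltlen > bno". */
	if (ltbno + ltlen > bno) {
		r.status = FREE_CORRUPT;
		return r;
	}
	/* Overlap with the right neighbour: "bno + len > gtbno". */
	if (bno + len > gtbno) {
		r.status = FREE_CORRUPT;
		return r;
	}
	/* Exactly adjacent neighbours can be merged into one record. */
	r.merge_left = (ltbno + ltlen == bno);
	r.merge_right = (bno + len == gtbno);
	return r;
}
```

Freeing blocks 10-14 between free extents 0-9 and 15-... merges on both
sides; freeing 20-34 when the right neighbour starts at 30 reports
corruption, which is the kind of failure in the log above.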

> May 01 18:56:59 tethys kernel: CPU: 2 PID: 48657 Comm: node Tainted: P
>           OE      6.3.1-060301-generic #202304302031

The kernel being run has been tainted by out-of-tree proprietary
drivers (a common source of memory corruption bugs in my
experience). Can you reproduce this problem with an untainted
kernel?
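The "Tainted: P ... OE" header is a rendering of the kernel taint bitmask
(also readable from /proc/sys/kernel/tainted), with unset flags printed as
spaces: P is bit 0 (proprietary module), O is bit 12 (out-of-tree module)
and E is bit 13 (unsigned module). An untainted kernel reports 0. The
decoder below is a hypothetical helper covering only a few flags, as a
sketch of how the letters map to bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Subset of the kernel's taint flags: bit position, letter, meaning. */
struct taint_flag {
	unsigned bit;
	char letter;
	const char *meaning;
};

static const struct taint_flag taint_flags[] = {
	{ 0,  'P', "proprietary module loaded" },
	{ 9,  'W', "kernel warning occurred" },
	{ 12, 'O', "out-of-tree module loaded" },
	{ 13, 'E', "unsigned module loaded" },
};

/*
 * Write the letters of the set flags (without the space padding used in
 * oops headers) into buf, NUL-terminated and truncated to fit.
 * Returns the number of set flags found.
 */
static size_t decode_taint(unsigned long mask, char *buf, size_t buflen)
{
	size_t n = 0;

	assert(buf != NULL && buflen > 0);
	for (size_t i = 0; i < sizeof(taint_flags) / sizeof(taint_flags[0]); i++) {
		if ((mask >> taint_flags[i].bit) & 1UL) {
			if (n + 1 < buflen)
				buf[n] = taint_flags[i].letter;
			n++;
		}
	}
	buf[n < buflen ? n : buflen - 1] = '\0';
	return n;
}
```

A mask of 12289 (1 + 4096 + 8192) decodes to "POE", matching the header
in the report.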

....

> And here's what I see in dmesg after rebooting and attempting to mount
> the filesystem to replay the log:
> 
> May 01 21:34:15 tethys kernel: XFS (dm-35): Metadata corruption
> detected at xfs_inode_buf_verify+0x168/0x190 [xfs], xfs_inode block
> 0x1405a0 xfs_inode_buf_verify
> May 01 21:34:15 tethys kernel: XFS (dm-35): Unmount and run xfs_repair
> May 01 21:34:15 tethys kernel: XFS (dm-35): First 128 bytes of
> corrupted metadata buffer:
> May 01 21:34:15 tethys kernel: 00000000: 5b 40 e2 3a ae 52 a0 7a 17 1d

That's not an inode buffer. It's not recognisable as XFS metadata at
all, which indicates some other problem.

Oh, this was from a test with "mkfs.xfs -m crc=0 ...", right? Please
don't use "-m crc=0" - that format is deprecated partly because it
has unfixable on-disk format recovery issues. One of those issues
manifests as an inode recovery failure because the underlying inode
buffer allocation/init does not get replayed correctly before we
attempt to replay inode changes into the buffer (that has not been
initialised)....

i.e. one of those unfixable issues manifests exactly like the
recovery failure being reported here.

> Blockchain projects tend to generate pathological filesystem loads;
> the sustained random write activity and constant (re)allocations must
> be pushing on some soft spot here.

There was a significant allocator infrastructure rewrite in 6.3. If
running an untainted kernel on an unaligned, CRC-enabled filesystem
makes the problems go away, then it rules out known issues with the
rewrite.

Alternatively, if it is reproducible in a short time, you may be
able to bisect the XFS changes that landed between 6.2 and 6.3 to
find which change triggers the problem.
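Bisection is a binary search over the commit range: with a known-good
and a known-bad endpoint, each test halves the remaining range, so N
commits need roughly log2(N) reproducer runs. With git this might look
like "git bisect start v6.3 v6.2 -- fs/xfs", which limits the search to
commits touching fs/xfs. The sketch below models only the search itself;
the predicate and the bad_from() demo are hypothetical stand-ins for
"build, boot, run the reproducer".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: does the reproducer fail at commit index i? */
typedef bool (*is_bad_fn)(size_t commit, void *ctx);

/* Demo predicate: the regression appears at commit *(size_t *)ctx. */
static bool bad_from(size_t commit, void *ctx)
{
	return commit >= *(const size_t *)ctx;
}

/*
 * Return the first bad commit in [0, n), assuming commit 0 is good,
 * commit n - 1 is bad, and every commit after the first bad one is
 * also bad. *steps counts how many commits had to be tested.
 */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx,
			       unsigned *steps)
{
	size_t good = 0, bad = n - 1;

	assert(n >= 2);
	*steps = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		(*steps)++;
		if (is_bad(mid, ctx))
			bad = mid;  /* culprit is at or before mid */
		else
			good = mid; /* culprit is after mid */
	}
	return bad;
}
```

With 100 commits and the regression at index 37, the search finds it
after 6 tests rather than up to 98.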

> Reverting to kernel 6.2.14 and
> recreating the filesystems seems to have resolved the issue—so far, at
> least—but obviously this is less than ideal. If someone would be
> willing to provide a targeted list of desired artifacts I'd be happy
> to boot back into kernel 6.3.1 to reproduce the issue and collect
> them. Alternatively I can try to eliminate some variables (like LVM2,
> potential hardware instabilities, etc.) and provide step-by-step
> directions for reproducing the issue on another machine.

If you can find a minimal reproducer, that would help a lot in
diagnosing the issue.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 7+ messages
2023-05-02 19:14 Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll on kernel 6.3 Mike Pastore
2023-05-02 22:02 ` Dave Chinner [this message]
     [not found]   ` <CAP_NaWZEcv3B0nPEFguxVuQ8m93mO7te-bZDfwo-C8eN+f_KNA@mail.gmail.com>
2023-05-02 23:13     ` Dave Chinner
2023-05-23 21:32       ` Justin Forbes
2023-05-24  5:42         ` Dave Chinner
2023-05-25  2:24         ` Eric Sandeen
2023-05-25  2:15 ` Eric Sandeen
