linux-fsdevel.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Daniel Dao <dqminh@cloudflare.com>
Cc: linux-fsdevel@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	kernel-team <kernel-team@cloudflare.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	djwong@kernel.org
Subject: Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
Date: Thu, 27 Jul 2023 04:27:40 +0100
Message-ID: <ZMHkLA+r2K6hKsr5@casper.infradead.org>
In-Reply-To: <CA+wXwBRGab3UqbLqsr8xG=ZL2u9bgyDNNea4RGfTDjqB=J3geQ@mail.gmail.com>

On Fri, Jul 21, 2023 at 11:49:04AM +0100, Daniel Dao wrote:
> We do not have a reproducer yet, but we now have more debugging data which
> hopefully should help narrow this down. Details as follows:
> 
> 1. Kernel NULL pointer dereferences in __filemap_get_folio
> 
> This happened on a few different hosts, with a few different repeated addresses.
> The addresses are 0000000000000036, 0000000000000076, 00000000000000f6. This
> looks like the xarray is corrupted and we were trying to do some work on a
> sibling entry.

I think I have a fix for this one.  Please try the attached.

> 2. Kernel NULL pointer dereferences in xfs_read_iomap_begin
> 
>     BUG: unable to handle page fault for address: 0000000000034668
>     #PF: supervisor read access in kernel mode
>     #PF: error_code(0x0000) - not-present page
>     PGD 11cfd37067 P4D 11cfd37067 PUD b88086067 PMD 0
>     Oops: 0000 [#1] PREEMPT SMP NOPTI
>     CPU: 124 PID: 3831226 Comm: rocksdb:low Kdump: loaded Tainted: G        W  O L     6.1.27-cloudflare-2023.5.0 #1
>     Hardware name: HYVE EDGE-METAL-GEN11/HS1811D_Lite, BIOS V0.11-sig 12/23/2022
>     RIP: 0010:xfs_read_iomap_begin (fs/xfs/xfs_iomap.c:1200)
>     Code: 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 50 48
> 89 14 24 4c 89 44 24 08 65 48 8b 04 25 28 00 00 00 48 89 44 24 48 <48>
> 8b 87 >
>     All code
>     ========
>       0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>       5:   41 57                   push   %r15
>       7:   41 56                   push   %r14
>       9:   41 55                   push   %r13
>       b:   41 54                   push   %r12
>       d:   55                      push   %rbp
>       e:   53                      push   %rbx
>       f:   48 83 ec 50             sub    $0x50,%rsp
>       13:   48 89 14 24             mov    %rdx,(%rsp)
>       17:   4c 89 44 24 08          mov    %r8,0x8(%rsp)
>       1c:   65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
>       23:   00 00
>       25:   48 89 44 24 48          mov    %rax,0x48(%rsp)
>       2a:*  48                      rex.W           <-- trapping instruction
>       2b:   8b                      .byte 0x8b
>       2c:   87 00                   xchg   %eax,(%rax)
> 
>     Code starting with the faulting instruction
>     ===========================================
>       0:   48                      rex.W
>       1:   8b                      .byte 0x8b
>       2:   87 00                   xchg   %eax,(%rax)

This one is hard to understand because the decoding of the instruction
got cut off.  But ...

>     RSP: 0018:ffffa63810733a70 EFLAGS: 00010282
>     RAX: 78ac714f0997e100 RBX: ffffa63810733b40 RCX: 0000000000000000
>     RDX: 0000000000004000 RSI: 0000000000000000 RDI: 00000000000347a0

RDI (00000000000347a0) is close to the fault address (0000000000034668).
RDI carries the first argument in the x86-64 System V ABI, and the first
parameter to xfs_read_iomap_begin() is supposed to be a struct inode pointer.

I don't think this is related.

> We also have a deadlock reading a very specific file on this host. We managed to
> do a kdump on this host and extracted out the state of the mapping.

This is almost certainly a different bug, but also XArray-related, so
I'll keep looking at this one.


Thread overview: 10+ messages
2023-07-21 10:49 Kernel NULL pointer deref and data corruptions with xfs on 6.1 Daniel Dao
2023-07-24 11:23 ` Daniel Dao
2023-07-24 21:45   ` Dave Chinner
2023-07-24 22:04     ` Daniel Dao
2023-07-25  3:41     ` Matthew Wilcox
2023-07-27  3:27 ` Matthew Wilcox [this message]
2023-07-27 10:25   ` Daniel Dao
2023-07-27 12:27     ` Matthew Wilcox
2023-08-04 16:57       ` Frederick Lawler
2023-08-30 19:26         ` Frederick Lawler
