From: David Sterba <dsterba@suse.cz>
To: David Sterba <dsterba@suse.cz>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC] btrfs: raid56: extra debug for raid6 syndrome generation
Date: Wed, 21 Feb 2024 16:04:24 +0100
Message-ID: <20240221150424.GK355@twin.jikos.cz>
In-Reply-To: <20240214073855.GO355@twin.jikos.cz>
On Wed, Feb 14, 2024 at 08:38:55AM +0100, David Sterba wrote:
> On Fri, Jan 26, 2024 at 01:51:32PM +1030, Qu Wenruo wrote:
> > [BUG]
> > I have received at least two crash reports for RAID6 syndrome
> > generation. Whether the path is AVX2 or SSE2, they all seem to have
> > a similar call trace with a corrupted RAX:
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP PTI
> > Workqueue: btrfs-rmw rmw_rbio_work [btrfs]
> > RIP: 0010:raid6_sse21_gen_syndrome+0x9e/0x130 [raid6_pq]
> > RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248
> > RDX: 0000000000000000 RSI: ffffa0f74cfa3238 RDI: 0000000000000000
> > Call Trace:
> > <TASK>
> > rmw_rbio+0x5c8/0xa80 [btrfs]
> > process_one_work+0x1c7/0x3d0
> > worker_thread+0x4d/0x380
> > kthread+0xf3/0x120
> > ret_from_fork+0x2c/0x50
> > </TASK>
> >
> > [CAUSE]
> > In fact I don't have any clue.
> >
> > Recently I also hit this in the AVX512 path, even in a v5.15
> > backport that doesn't have any of my RAID56 rework.
> >
> > Furthermore according to the registers:
> >
> > RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248
> >
> > The RAX register holds the number of stripes (including PQ), and
> > its value (0) is not correct.
> > The other two registers both look sane.
> >
> > - RBX is the sectorsize
> > For x86_64 it should always be 4K and matches the output.
> >
> > - RCX is the pointers array
> > Which is from rbio->finish_pointers, and it looks like a sane
> > kernel address.
> >
> > [WORKAROUND]
> > For now, I can only add extra debug ASSERT()s before we call the raid6
> > gen_syndrome() helper and hope to catch the problem.
> >
> > The debug requires both CONFIG_BTRFS_DEBUG and CONFIG_BTRFS_ASSERT
> > enabled.
> >
> > My current guess is some use-after-free, but that doesn't make much
> > sense when every report has only a corrupted RAX and seemingly valid
> > pointers.
> >
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Reviewed-by: David Sterba <dsterba@suse.com>
>
> I haven't seen the crash for some time but with this patch I may add
> some info once it happens again.
For the record, I added this patch to for-next.
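
To illustrate the kind of sanity check the debug ASSERT()s described above
could perform before calling gen_syndrome(), here is a rough userspace
sketch. It is not the patch itself: struct rbio_sketch, its fields, and
rbio_sketch_is_sane() are hypothetical names made up for illustration, and
the rule that RAID6 has exactly two parity stripes (P and Q) on top of the
data stripes is the only layout assumption used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state handed to a syndrome generator. */
struct rbio_sketch {
	int real_stripes;	/* total stripes: data + P + Q */
	int nr_data;		/* data stripes only */
	void **pointers;	/* one buffer pointer per stripe */
};

/*
 * Return true only if the stripe counts and the pointer array look
 * consistent. A stripe count of 0, as seen in RAX in the reports,
 * would be rejected here.
 */
static bool rbio_sketch_is_sane(const struct rbio_sketch *r)
{
	if (r == NULL || r->pointers == NULL)
		return false;
	/* There must be at least one data stripe. */
	if (r->nr_data <= 0)
		return false;
	/* Assumption: RAID6 means exactly two extra stripes, P and Q. */
	if (r->real_stripes != r->nr_data + 2)
		return false;
	/* Every stripe needs a buffer to read from or write to. */
	for (int i = 0; i < r->real_stripes; i++) {
		if (r->pointers[i] == NULL)
			return false;
	}
	return true;
}
```

In a kernel build a check like this would presumably sit behind the debug
config options, so it costs nothing in production kernels.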