Linux Btrfs filesystem development
From: David Sterba <dsterba@suse.cz>
To: David Sterba <dsterba@suse.cz>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC] btrfs: raid56: extra debug for raid6 syndrome generation
Date: Wed, 21 Feb 2024 16:04:24 +0100
Message-ID: <20240221150424.GK355@twin.jikos.cz>
In-Reply-To: <20240214073855.GO355@twin.jikos.cz>

On Wed, Feb 14, 2024 at 08:38:55AM +0100, David Sterba wrote:
> On Fri, Jan 26, 2024 at 01:51:32PM +1030, Qu Wenruo wrote:
> > [BUG]
> > I have got at least two crash reports for RAID6 syndrome generation. No
> > matter if it's AVX2 or SSE2, they all seem to have a similar
> > calltrace with corrupted RAX:
> > 
> >  BUG: kernel NULL pointer dereference, address: 0000000000000000
> >  #PF: supervisor read access in kernel mode
> >  #PF: error_code(0x0000) - not-present page
> >  PGD 0 P4D 0
> >  Oops: 0000 [#1] PREEMPT SMP PTI
> >  Workqueue: btrfs-rmw rmw_rbio_work [btrfs]
> >  RIP: 0010:raid6_sse21_gen_syndrome+0x9e/0x130 [raid6_pq]
> >  RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248
> >  RDX: 0000000000000000 RSI: ffffa0f74cfa3238 RDI: 0000000000000000
> >  Call Trace:
> >   <TASK>
> >   rmw_rbio+0x5c8/0xa80 [btrfs]
> >   process_one_work+0x1c7/0x3d0
> >   worker_thread+0x4d/0x380
> >   kthread+0xf3/0x120
> >   ret_from_fork+0x2c/0x50
> >   </TASK>
> > 
> > [CAUSE]
> > In fact I don't have any clue.
> > 
> > Recently I also hit this in the AVX512 path, and that was even in a v5.15
> > backport, which doesn't have any of my RAID56 rework.
> > 
> > Furthermore according to the registers:
> > 
> >  RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248
> > 
> > The RAX register should hold the number of stripes (including PQ),
> > but its value (0) is not correct.
> > The remaining two registers are both sane.
> > 
> > - RBX is the sectorsize
> >   For x86_64 it should always be 4K and matches the output.
> > 
> > - RCX is the pointers array
> >   Which is from rbio->finish_pointers, and it looks like a sane
> >   kernel address.
> > 
> > [WORKAROUND]
> > For now, I can only add extra debug ASSERT()s before we call the raid6
> > gen_syndrome() helper and hope to catch the problem.
> > 
> > The debug requires both CONFIG_BTRFS_DEBUG and CONFIG_BTRFS_ASSERT
> > enabled.
> > 
> > My current guess is some use-after-free, but every report having only a
> > corrupted RAX with seemingly valid pointers doesn't make much sense.
> > 
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Reviewed-by: David Sterba <dsterba@suse.com>
> 
> I haven't seen the crash for some time but with this patch I may add
> some info once it happens again.

For the record, I added this patch to for-next.
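
For readers following the thread, the kind of pre-call sanity check
described in the quoted [WORKAROUND] section can be sketched in userspace
C as below. This is only an illustration under stated assumptions, not the
code from the patch: struct syndrome_args and syndrome_args_sane() are
hypothetical names, and the three fields simply mirror the values the
report discusses (stripe count, sectorsize, pointers array).

```c
/* Userspace sketch only: hypothetical names, not the code from the patch. */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bundle of the three values discussed in the quoted report. */
struct syndrome_args {
	int nr_stripes;      /* data stripes plus P and Q */
	size_t sectorsize;   /* bytes processed per call, 4K in the report */
	void **pointers;     /* one buffer per stripe */
};

/* Return true if the arguments look safe to hand to a syndrome generator. */
static bool syndrome_args_sane(const struct syndrome_args *args)
{
	/* RAID6 needs at least one data stripe plus P and Q. */
	if (args->nr_stripes < 3)
		return false;
	if (args->sectorsize == 0)
		return false;
	if (args->pointers == NULL)
		return false;
	/* Every stripe, including P and Q, must have a buffer. */
	for (int i = 0; i < args->nr_stripes; i++) {
		if (args->pointers[i] == NULL)
			return false;
	}
	return true;
}
```

A stripe count of 0, as seen in RAX in the report, would fail the first
check, so the problem would be reported before the syndrome code
dereferences anything.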
