From: Colgate Minuette <rabbit@minuette.net>
To: linux-raid@vger.kernel.org, Yu Kuai <yukuai1@huaweicloud.com>
Cc: "yukuai (C)" <yukuai3@huawei.com>,
	"yangerkun@huawei.com" <yangerkun@huawei.com>
Subject: Re: General Protection Fault in md raid10
Date: Sun, 28 Apr 2024 19:18:16 -0700	[thread overview]
Message-ID: <4561772.LvFx2qVVIh@sparkler> (raw)
In-Reply-To: <208eb375-4859-3b32-59d6-7243f9892f1e@huaweicloud.com>

On Sunday, April 28, 2024 6:02:30 PM PDT Yu Kuai wrote:
> Hi,
> 
> On 2024/04/29 3:41, Colgate Minuette wrote:
> > Hello all,
> > 
> > I am trying to set up an md raid-10 array spanning 8 disks using the
> > following command
> > 
> >> mdadm --create /dev/md64 --level=10 --layout=o2 -n 8 /dev/sd[efghijkl]1
> > 
> > The raid is created successfully, but as soon as the newly created
> > raid starts its initial sync, a general protection fault occurs. This
> > fault happens on kernels 6.1.85, 6.6.26, and 6.8.5 using mdadm version
> > 4.3. The raid is then completely unusable. After the fault, if I try to
> > stop the raid using
> >> mdadm --stop /dev/md64
> > 
> > mdadm hangs indefinitely.
> > 
> > I have tried raid levels 0 and 6, and both work as expected without any
> > errors on these same 8 drives. I also have a working md raid-10 with 4
> > disks on this system already (not related to this 8-disk array).
> > 
> > Other things I have tried: creating and syncing the raid from a Debian
> > live environment, and using the near, far, and offset layouts. Both
> > attempts hit the same protection fault. I also ran a memory test on the
> > computer, and it reported no errors after 10 passes.
> > 
> > Below is the output from the general protection fault. Let me know of
> > anything else to try or log information that would be helpful to
> > diagnose.
> > 
> > [   10.965542] md64: detected capacity change from 0 to 120021483520
> > [   10.965593] md: resync of RAID array md64
> > [   10.999289] general protection fault, probably for non-canonical address 0xd071e7fff89be: 0000 [#1] PREEMPT SMP NOPTI
> > [   11.000842] CPU: 4 PID: 912 Comm: md64_raid10 Not tainted 6.1.85-1-MANJARO #1 44ae6c380f5656fa036749a28fdade8f34f2f9ce
> > [   11.001192] Hardware name: ASUS System Product Name/TUF GAMING X670E-PLUS WIFI, BIOS 1618 05/18/2023
> > [   11.001482] RIP: 0010:bio_copy_data_iter+0x187/0x260
> > [   11.001756] Code: 29 f1 4c 29 f6 48 c1 f9 06 48 c1 fe 06 48 c1 e1 0c 48 c1 e6 0c 48 01 e9 48 01 ee 48 01 d9 4c 01 d6 83 fa 08 0f 82 b0 fe ff ff <48> 8b 06 48 89 01 89 d0 48 8b 7c 06 f8 48 89 7c 01 f8 48 8d 79 08
> > [   11.002045] RSP: 0018:ffffa838124ffd28 EFLAGS: 00010216
> > [   11.002336] RAX: ffffca0a84195a80 RBX: 0000000000000000 RCX: ffff89be8656a000
> > [   11.002628] RDX: 0000000000000642 RSI: 000d071e7fff89be RDI: ffff89beb4039df8
> > [   11.002922] RBP: ffff89bd80000000 R08: ffffa838124ffd74 R09: ffffa838124ffd60
> > [   11.003217] R10: 00000000000009be R11: 0000000000002000 R12: ffff89be8bbff400
> > [   11.003522] R13: ffff89beb4039a00 R14: ffffca0a80000000 R15: 0000000000001000
> > [   11.003825] FS:  0000000000000000(0000) GS:ffff89c5b8700000(0000) knlGS:0000000000000000
> > [   11.004126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   11.004429] CR2: 0000563308baac38 CR3: 000000012e900000 CR4: 0000000000750ee0
> > [   11.004737] PKRU: 55555554
> > [   11.005040] Call Trace:
> > [   11.005342]  <TASK>
> > [   11.005645]  ? __die_body.cold+0x1a/0x1f
> > [   11.005951]  ? die_addr+0x3c/0x60
> > [   11.006256]  ? exc_general_protection+0x1c1/0x380
> > [   11.006562]  ? asm_exc_general_protection+0x26/0x30
> > [   11.006865]  ? bio_copy_data_iter+0x187/0x260
> > [   11.007169]  bio_copy_data+0x5c/0x80
> > [   11.007474]  raid10d+0xcad/0x1c00 [raid10 1721e6c9d579361bf112b0ce400eec9240452da1]
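The address in the fault message, 0xd071e7fff89be, is the value held in RSI (000d071e7fff89be), and the bytes at the `<48>` marker on the Code: line, `48 8b 06`, decode to `mov (%rsi),%rax`, so bio_copy_data_iter faulted while loading through RSI. On x86-64 with 4-level paging (48-bit virtual addresses, assumed here; 5-level paging would use 57 bits), an address is canonical only when bits 63 through 47 are all equal. The small check below uses a hypothetical helper, `is_canonical`, to show that RSI fails this test while neighbouring kernel pointers from the same register dump pass.

```python
# Sketch: why 0x000d071e7fff89be is reported as a non-canonical address.
# Assumption: 48-bit virtual addresses (4-level paging).

def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)              # bit 47 and everything above it
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 17 one-bits for 48-bit VA
    return top == 0 or top == all_ones


# Register values copied from the oops.
registers = {
    "RSI": 0x000D071E7FFF89BE,  # source pointer of the faulting load
    "RCX": 0xFFFF89BE8656A000,  # destination pointer, looks like normal kernel memory
    "RBP": 0xFFFF89BD80000000,
}

for name, value in registers.items():
    status = "canonical" if is_canonical(value) else "NON-canonical"
    print(f"{name} = {value:#018x}: {status}")
```

A canonical but unmapped pointer would typically produce a page fault instead; a GP fault with this message points at a pointer value that is corrupted outright.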
> 
> Can you try to use addr2line or gdb to locate which code line this
> corresponds to?
> 
> I have never seen a problem like this before... And it would be great if
> you could bisect this, since you can reproduce the problem easily.
> 
> Thanks,
> Kuai
> 
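The bisect suggestion above refers to `git bisect` over the kernel source, between a known-good and a known-bad kernel; since 6.1.85, 6.6.26, and 6.8.5 all fault, a known-good kernel would have to be found first. The sketch below only illustrates the binary search behind it, assuming a linear history in which every commit after the first bad one is also bad. `first_bad`, the `is_bad` callback, and the commit names are hypothetical; real `git bisect` also handles merges and untestable commits (`git bisect skip`).

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search commits (oldest first) for the first bad one.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # In practice each test means: build, boot, create the array, start a resync.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    commits = [f"commit{i}" for i in range(10)]  # hypothetical history, oldest first
    print(first_bad(commits, lambda c: int(c[len("commit"):]) >= 6))  # prints commit6
```

Each iteration halves the remaining range, so roughly log2(N) build-and-boot cycles are needed for N commits.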

Can you provide guidance on how to do this? I have never debugged kernel
code before. I'm assuming this would be in the raid10.ko module, but I
don't know where to go from there.
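As a sketch of one possible answer: the frame `raid10d+0xcad/0x1c00` means the fault was 0xcad bytes into `raid10d`, a function 0x1c00 bytes long. Given a `raid10.ko` with debug info built for exactly the crashing kernel (6.1.85-1-MANJARO), the kernel tree's `scripts/faddr2line` or gdb's `list *(symbol+offset)` can map that to a source line; the RIP frame `bio_copy_data_iter+0x187/0x260` is built into the kernel, so it needs `vmlinux` instead of the module. The helpers `parse_frame` and `decode_commands` below are hypothetical: they only parse a Call Trace line and build those command lines without running them. The `raid10.ko` path is a placeholder, since distribution modules are usually stripped and may be compressed.

```python
import re
import shlex

# Matches frames like "raid10d+0xcad/0x1c00 [raid10 1721e6c9...]"
# or "? bio_copy_data_iter+0x187/0x260".
FRAME_RE = re.compile(
    r"(?P<func>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)[^\]]*\])?"
)


def parse_frame(line: str) -> dict:
    """Split one Call Trace frame into function, offset, size, and module."""
    m = FRAME_RE.search(line)
    if m is None:
        raise ValueError(f"no func+off/size frame in: {line!r}")
    return {
        "func": m.group("func"),
        "offset": int(m.group("off"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),  # None for code built into vmlinux
    }


def decode_commands(line: str, obj_path: str) -> list:
    """Build (but do not run) faddr2line and gdb command lines for one frame."""
    f = parse_frame(line)
    where = f"{f['func']}+{f['offset']:#x}"
    return [
        # scripts/faddr2line from the kernel source tree accepts func+off/size
        ["./scripts/faddr2line", obj_path, f"{where}/{f['size']:#x}"],
        # gdb's "list *ADDRESS" prints the source around that address
        ["gdb", "-batch", "-ex", f"list *({where})", obj_path],
    ]


if __name__ == "__main__":
    frame = ("[   11.007474]  raid10d+0xcad/0x1c00 "
             "[raid10 1721e6c9d579361bf112b0ce400eec9240452da1]")
    for cmd in decode_commands(frame, "raid10.ko"):
        print(shlex.join(cmd))
    # prints:
    #   ./scripts/faddr2line raid10.ko raid10d+0xcad/0x1c00
    #   gdb -batch -ex 'list *(raid10d+0xcad)' raid10.ko
```

Both commands must run against the same build that produced the oops; a module from a different build will map the offset to the wrong line.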

-Colgate 




Thread overview: 15+ messages
2024-04-28 19:41 General Protection Fault in md raid10 Colgate Minuette
2024-04-27 16:21 ` Paul E Luse
2024-04-28 20:07   ` Colgate Minuette
2024-04-27 18:22     ` Paul E Luse
2024-04-28 22:16       ` Colgate Minuette
2024-04-28 22:25         ` Roman Mamedov
2024-04-28 22:38           ` Colgate Minuette
2024-04-29  1:02 ` Yu Kuai
2024-04-29  2:18   ` Colgate Minuette [this message]
2024-04-29  3:12     ` Yu Kuai
2024-04-29  4:30       ` Colgate Minuette
2024-04-29  6:06         ` Yu Kuai
2024-04-29  6:39           ` Colgate Minuette
2024-04-29  7:06             ` Colgate Minuette
2024-04-29  7:52               ` Yu Kuai
