From: Bingbu Cao <bingbu.cao@linux.intel.com>
To: Marc Zyngier <maz@kernel.org>
Cc: linux-kernel@vger.kernel.org, johan+linaro@kernel.org,
hsinyi@chromium.org, nirmal.patel@linux.intel.com,
jonathan.derrick@linux.dev, david.e.box@linux.intel.com
Subject: Re: System boot failure related to commit 'irqdomain: Switch to per-domain locking'
Date: Thu, 2 Mar 2023 16:55:21 +0800 [thread overview]
Message-ID: <bc72258e-ce88-6812-08bf-0f16f15e02ce@linux.intel.com> (raw)
In-Reply-To: <87o7pch6lo.wl-maz@kernel.org>
Zyngier and Hovold,
On 3/1/23 10:46 PM, Marc Zyngier wrote:
> On Wed, 01 Mar 2023 11:17:21 +0000,
> Bingbu Cao <bingbu.cao@linux.intel.com> wrote:
>>
>>
>> On 2/28/23 8:45 AM, Marc Zyngier wrote:
>>> On 2023-02-27 10:46, Bingbu Cao wrote:
>>>> Hi, Johan and Zyngier,
>>>>
>>>> I am using a Dell XPS laptop(Intel Processor) just update my
>>>> Linux kernel to latest tag 6.2.0, and then I see that the kernel
>>>> cannot boot successfully, it reported:
>>>> --------------------------------------------
>>>> Gave up waiting for root file system device. Common problems:
>>>> - Boot args (cat /proc/cmdline)
>>>> - Check rootdelay= (did the system wait long enough?)
>>>> - Missing modules (cat /proc/modules; ls /dev)
>>>>
>>>> ALERT! UUID=xxxxxxx does not exist. Dropping to shell!
>>>> --------------------------------------------
>>>>
>>>> And then it drop into initramfs shell, I try to use 'blkid' to
>>>> get block devices information, but it showed nothing.
>>>>
>>>> I also tried add 'rootdelay' and 'rootwait' in bootargs, but it did
>>>> not work.
>>>>
>>>> I am sure that my previous kernel 6.2.0-rc4 work normally, so I
>>>> did some bisect and found the commit below cause the failure on
>>>> my system:
>>>>
>>>> 9dbb8e3452ab irqdomain: Switch to per-domain locking
>>>>
>>>> I really have no idea why it cause my problem, but I see just
>>>> reverting this commit really help me.
>>>>
>>>> Do you have any idea?
>>>
>>> Please provide us with a kernel boot log. It is very hard
>>> to figure out what is going on without it. It would also
>>> help if you indicated what sort of device is your root
>>> filesystem on (NVMe, SATA, USB...), as it would narrow the
>>> search for the culprit.
>>
>> Unfortunately, I have not find a way to capture the console log, no
>> serial for me.
>
> You don't need serial access. Since you're able to interact with the
> machine, you can save the dmesg log on some other mass storage. Just
> make sure that USB, for example is in your initramfs, and dump the log
> there.
I can dump the log by initramfs-tool now and checked that the change
https://lore.kernel.org/all/20230223083800.31347-1-jgross@suse.com/
can fix my problem, thanks for your help.
[ 1.581072] PM: Magic number: 15:512:8
[ 1.581667] memory memory52: hash matches
[ 1.582493] RAS: Correctable Errors collector initialized.
[ 1.586949] Freeing unused decrypted memory: 2036K
[ 1.588182] Freeing unused kernel image (initmem) memory: 4584K
[ 1.604878] Write protecting the kernel read-only data: 26624k
[ 1.606123] Freeing unused kernel image (rodata/data gap) memory: 928K
[ 1.614504] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[ 1.615193] Run /init as init process
[ 1.615867] with arguments:
[ 1.615868] /init
[ 1.615869] with environment:
[ 1.615869] HOME=/
[ 1.615870] TERM=linux
[ 1.615870] BOOT_IMAGE=/vmlinuz-6.2.0-ipu
[ 1.722303] wmi_bus wmi_bus-PNP0C14:02: WQBC data block query control method not found
[ 1.723991] hid: raw HID events driver (C) Jiri Kosina
[ 1.726080] BUG: kernel NULL pointer dereference, address: 0000000000000050
[ 1.726874] #PF: supervisor read access in kernel mode
[ 1.727687] #PF: error_code(0x0000) - not-present page
[ 1.728491] PGD 0 P4D 0
[ 1.729280] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1.730078] CPU: 3 PID: 154 Comm: systemd-udevd Not tainted 6.2.0-ipu #10
[ 1.730870] Hardware name: Dell Inc. XPS 9315/, BIOS 0.0.22 12/23/2021
[ 1.731670] RIP: 0010:irq_domain_create_hierarchy+0x2d/0x70
[ 1.732470] Code: 00 00 55 48 89 e5 41 55 49 89 fd 48 89 cf 41 54 53 89 f3 85 d2 74 3f 89 d6 31 c9 89 d2 e8 6b fd ff ff 49 89 c4 4d 85 e4 74 1e <49> 8b 45 50 41 09 5c 24 28 4c 89 e7 4d 89 ac 24 80 00 00 00 49 89
[ 1.733321] RSP: 0018:ffffb811c08e38f8 EFLAGS: 00010282
[ 1.734156] RAX: ffff975001456540 RBX: 0000000000000010 RCX: 0000000000000000
[ 1.734993] RDX: ffffffffadf8be90 RSI: ffffffffac7290a0 RDI: ffff975001456570
[ 1.735841] RBP: ffffb811c08e3910 R08: ffff975001452900 R09: ffff975001452900
[ 1.736676] R10: ffff975001452900 R11: ffff97510145206f R12: ffff975001456540
[ 1.737515] R13: 0000000000000000 R14: 0000000000000013 R15: ffff975011860628
[ 1.738349] FS: 00007f20175c08c0(0000) GS:ffff97537f8c0000(0000) knlGS:0000000000000000
[ 1.739198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.740042] CR2: 0000000000000050 CR3: 0000000111e1c004 CR4: 0000000000770ee0
[ 1.740892] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.741741] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 1.742592] PKRU: 55555554
[ 1.743415] Call Trace:
[ 1.744226] <TASK>
[ 1.745045] __msi_create_irq_domain+0xb8/0x180
[ 1.745863] msi_create_irq_domain+0x13/0x20
[ 1.746680] pci_msi_create_irq_domain+0x7a/0xe0
[ 1.747493] vmd_probe+0x85e/0x9a0 [vmd]
[ 1.748313] local_pci_probe+0x48/0xb0
[ 1.749133] pci_device_probe+0xc8/0x280
[ 1.749961] really_probe+0x186/0x3f0
[ 1.750779] __driver_probe_device+0x8a/0x190
[ 1.751596] driver_probe_device+0x23/0xb0
[ 1.752422] __driver_attach+0xc5/0x190
[ 1.753246] ? __pfx___driver_attach+0x10/0x10
[ 1.754075] bus_for_each_dev+0x7a/0xd0
[ 1.755273] driver_attach+0x1e/0x30
[ 1.756095] bus_add_driver+0x11c/0x230
[ 1.756916] driver_register+0x64/0x130
[ 1.758073] ? __pfx_init_module+0x10/0x10 [vmd]
[ 1.758890] __pci_register_driver+0x68/0x70
[ 1.759696] vmd_drv_init+0x23/0xff0 [vmd]
[ 1.760495] do_one_initcall+0x46/0x220
[ 1.761290] ? kmalloc_trace+0x2a/0xa0
[ 1.762079] do_init_module+0x52/0x230
[ 1.762876] load_module+0x2190/0x27c0
[ 1.763662] ? security_kernel_post_read_file+0x5c/0x70
[ 1.764453] __do_sys_finit_module+0xc8/0x140
[ 1.765246] ? __do_sys_finit_module+0xc8/0x140
[ 1.766039] __x64_sys_finit_module+0x1a/0x20
[ 1.766829] do_syscall_64+0x59/0x90
[ 1.767599] ? exit_to_user_mode_prepare+0x3d/0x190
[ 1.768369] ? syscall_exit_to_user_mode+0x26/0x50
[ 1.769146] ? __x64_sys_mmap+0x33/0x50
[ 1.769912] ? do_syscall_64+0x69/0x90
[ 1.770685] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1.771462] RIP: 0033:0x7f2017ca0c4d
[ 1.772244] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 83 f1 0d 00 f7 d8 64 89 01 48
[ 1.773071] RSP: 002b:00007fffd75e47b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 1.773904] RAX: ffffffffffffffda RBX: 0000555f7823dac0 RCX: 00007f2017ca0c4d
[ 1.774744] RDX: 0000000000000000 RSI: 00007f2017e22458 RDI: 0000000000000005
[ 1.775592] RBP: 00007f2017e22458 R08: 0000000000000000 R09: 0000000000000000
[ 1.776430] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
[ 1.777269] R13: 0000555f78233ab0 R14: 0000000000000000 R15: 0000555f78236f40
[ 1.778115] </TASK>
[ 1.778955] Modules linked in: intel_ishtp(+) idma64 xhci_pci_renesas vmd(+) hid cxl_acpi wmi cxl_core fjes(-) pinctrl_tigerlake
[ 1.779839] CR2: 0000000000000050
[ 1.780704] ---[ end trace 0000000000000000 ]---
[ 1.781573] RIP: 0010:irq_domain_create_hierarchy+0x2d/0x70
[ 1.782437] Code: 00 00 55 48 89 e5 41 55 49 89 fd 48 89 cf 41 54 53 89 f3 85 d2 74 3f 89 d6 31 c9 89 d2 e8 6b fd ff ff 49 89 c4 4d 85 e4 74 1e <49> 8b 45 50 41 09 5c 24 28 4c 89 e7 4d 89 ac 24 80 00 00 00 49 89
[ 1.783349] RSP: 0018:ffffb811c08e38f8 EFLAGS: 00010282
[ 1.784250] RAX: ffff975001456540 RBX: 0000000000000010 RCX: 0000000000000000
[ 1.785158] RDX: ffffffffadf8be90 RSI: ffffffffac7290a0 RDI: ffff975001456570
[ 1.786065] RBP: ffffb811c08e3910 R08: ffff975001452900 R09: ffff975001452900
[ 1.786972] R10: ffff975001452900 R11: ffff97510145206f R12: ffff975001456540
[ 1.787860] R13: 0000000000000000 R14: 0000000000000013 R15: ffff975011860628
[ 1.788748] FS: 00007f20175c08c0(0000) GS:ffff97537f8c0000(0000) knlGS:0000000000000000
[ 1.789646] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.790539] CR2: 0000000000000050 CR3: 0000000111e1c004 CR4: 0000000000770ee0
[ 1.791439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.792342] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 1.793239] PKRU: 55555554
[ 1.794123] note: systemd-udevd[154] exited with irqs disabled
[ 1.845219] ACPI: bus type thunderbolt registered
[ 1.846114] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 1.846128] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[ 1.847456] xhci_hcd 0000:00:0d.0: xHCI Host Controller
[ 1.849893] xhci_hcd 0000:00:14.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000100200009810
[ 1.851081] xhci_hcd 0000:00:0d.0: new USB bus registered, assigned bus number 2
[ 1.853686] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 1.854119] xhci_hcd 0000:00:0d.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810
[ 1.855088] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 3
[ 1.856121] xhci_hcd 0000:00:0d.0: xHCI Host Controller
[ 1.857265] xhci_hcd 0000:00:14.0: Host supports USB 3.1 Enhanced SuperSpeed
[ 1.857325] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.02
[ 1.858276] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
>
>> I am using a NVMe for my rootfs. By checking the screen log, I see
>> that 1 kernel message is missing:
>>
>> [ 4.193375] EXT4-fs (nvme0n1p3): mounted filesystem a9e1243b-332f-46ce-a5e7-cea86b44f797 with ordered data mode. Quota mode: none.
>
> OK, at least we know that NVMe is in the loop, but we don't know *why*
> yet. Please try and get the dmesg for us. I'm sure someone at Intel
> can help you with this.
>
> Thanks,
>
> M.
>
--
Best regards,
Bingbu Cao
next prev parent reply other threads:[~2023-03-02 8:55 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-27 10:46 System boot failure related to commit 'irqdomain: Switch to per-domain locking' Bingbu Cao
2023-02-27 11:02 ` Johan Hovold
2023-02-27 11:25 ` Bingbu Cao
2023-02-27 12:47 ` Johan Hovold
2023-02-28 0:45 ` Marc Zyngier
2023-03-01 11:17 ` Bingbu Cao
2023-03-01 14:46 ` Marc Zyngier
2023-03-02 8:55 ` Bingbu Cao [this message]
2023-03-02 9:02 ` Johan Hovold
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bc72258e-ce88-6812-08bf-0f16f15e02ce@linux.intel.com \
--to=bingbu.cao@linux.intel.com \
--cc=david.e.box@linux.intel.com \
--cc=hsinyi@chromium.org \
--cc=johan+linaro@kernel.org \
--cc=jonathan.derrick@linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=nirmal.patel@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.