From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: Juergen Gross <jgross@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: Linux 6.13-rc3 many different panics in Xen PV dom0
Date: Thu, 2 Jan 2025 19:54:50 +0100
Message-ID: <Z3bg-gvaBEdSIuRW@mail-itl>
In-Reply-To: <Z3aFdrygLF9yK2EK@mail-itl>

On Thu, Jan 02, 2025 at 01:24:21PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 02, 2025 at 12:30:10PM +0100, Juergen Gross wrote:
> > On 02.01.25 11:20, Jürgen Groß wrote:
> > > On 19.12.24 17:14, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > > 
> > > > It crashes on boot like below, most of the time. But sometimes (rarely)
> > > > it manages to stay alive. Below I'm pasting a few of the crashes that look
> > > > distinctly different; if you follow the links, you can find more of
> > > > them. IMHO it looks like some memory corruption bug somewhere. I also
> > > > tested Linux 6.13-rc2 before, and it had a very similar issue.
> > > 
> > > ...
> > > 
> > > > 
> > > > Full log:
> > > > https://openqa.qubes-os.org/tests/122879/logfile?filename=serial0.txt
> > > 
> > > I can reproduce a crash with 6.13-rc5 PV dom0.
> > > 
> > > What is really interesting in the logs: most crashes seem to happen right
> > > after a module being loaded (in my reproducer it was right after loading
> > > the first module).
> > > 
> > > I need to go through the 6.13 commits, but I think I remember having seen
> > > a patch optimizing module loading by using large pages for addressing the
> > > loaded modules. Maybe the case of no large pages being available isn't
> > > handled properly.
> > 
> > Seems I was right.
> > 
> > For me the following diff fixes the issue. Marek, can you please confirm
> > it fixes your crashes, too?
> 
> Thanks for looking into it!
> Will do, I've pushed it to
> https://github.com/QubesOS/qubes-linux-kernel/pull/662, CI will build it
> and then I'll post it to openQA.

It is much better!

Tests are still running, but I already see that many are green. There is
one issue (likely unrelated to this change): sys-usb (an HVM domU with USB
controllers passed through) crashes, but only on a system with a Raptor
Lake CPU. Other systems, including ADL and MTL, look fine:

[   75.770849] Bluetooth: Core ver 2.22
[   75.770866] Oops: general protection fault, probably for non-canonical address 0xc9d2315bc82c3bbd: 0000 [#1] PREEMPT SMP NOPTI
[   75.770880] CPU: 0 UID: 0 PID: 923 Comm: (udev-worker) Not tainted 6.13.0-0.rc5.2.qubes.1.fc41.x86_64 #1
[   75.770890] Hardware name: Xen HVM domU, BIOS 4.19.0 01/02/2025
[   75.770897] RIP: 0010:msft_monitor_device_del+0x93/0x170 [bluetooth]
[   75.770924] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 65 21 <26> 2b 8b ad 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   75.770943] RSP: 0000:ffffad644108fa40 EFLAGS: 00010246
[   75.770950] RAX: ffff93da8a149600 RBX: c9d2315bc82c3810 RCX: 0000000100000000
[   75.770958] RDX: 0000000000000001 RSI: ffff93da905e9180 RDI: ffff93da81404598
[   75.770967] RBP: ffffad644108fa58 R08: 0000000000000064 R09: 00000000000012ab
[   75.770975] R10: ffff93da81207000 R11: 0000000000000286 R12: ffffad644108fb00
[   75.770983] R13: ffffad644108fa68 R14: ffff93da9089b840 R15: ffff93da8c265100
[   75.770991] FS:  000078fa4cec4bc0(0000) GS:ffff93da97000000(0000) knlGS:0000000000000000
[   75.771000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   75.771007] CR2: 000074fa64aadc08 CR3: 00000000105d2006 CR4: 0000000000770ef0
[   75.771016] PKRU: 55555554
[   75.771019] Call Trace:
[   75.771024]  <TASK>
[   75.771028]  ? show_trace_log_lvl+0x1b0/0x2f0
[   75.771036]  ? show_trace_log_lvl+0x1b0/0x2f0
[   75.771042]  ? do_one_initcall+0x58/0x310
[   75.771048]  ? __die_body.cold+0x8/0x12
[   75.771053]  ? die_addr+0x3c/0x60
[   75.771059]  ? exc_general_protection+0x17d/0x400
[   75.771066]  ? asm_exc_general_protection+0x26/0x30
[   75.771074]  ? msft_monitor_device_del+0x93/0x170 [bluetooth]
[   75.771095]  ? bt_init+0x54/0x1d0 [bluetooth]
[   75.771114]  ? __pfx_bt_init+0x10/0x10 [bluetooth]
[   75.771131]  ? do_one_initcall+0x58/0x310
[   75.771137]  ? do_init_module+0x90/0x250
[   75.771142]  ? init_module_from_file+0x86/0xc0
[   75.771149]  ? idempotent_init_module+0x115/0x310
[   75.771156]  ? __x64_sys_finit_module+0x65/0xc0
[   75.771163]  ? do_syscall_64+0x82/0x160
[   75.771168]  ? backing_file_read_iter+0x156/0x1f0
[   75.771176]  ? ovl_read_iter+0x94/0xa0 [overlay]
[   75.771189]  ? __pfx_ovl_file_accessed+0x10/0x10 [overlay]
[   75.771199]  ? rseq_get_rseq_cs+0x1d/0x220
[   75.771205]  ? rseq_ip_fixup+0x8d/0x1d0
[   75.771210]  ? __seccomp_filter+0x303/0x520
[   75.771216]  ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
[   75.771224]  ? syscall_exit_to_user_mode+0x10/0x210
[   75.771231]  ? do_syscall_64+0x8e/0x160
[   75.771236]  ? do_sys_openat2+0x9c/0xe0
[   75.771241]  ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
[   75.771249]  ? syscall_exit_to_user_mode+0x10/0x210
[   75.771255]  ? do_syscall_64+0x8e/0x160
[   75.771260]  ? do_user_addr_fault+0x1ec/0x7b0
[   75.771267]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   75.771274]  </TASK>
[   75.771277] Modules linked in: bluetooth(+) rfkill snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 nf_reject_ipv4 nft_reject intel_rapl_msr intel_rapl_common nft_ct intel_uncore_frequency_common intel_pmc_core intel_vsec joydev nft_masq pmt_telemetry pmt_class nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni xhci_pci polyval_generic ghash_clmulni_intel xhci_hcd sha512_ssse3 sha256_ssse3 nf_tables sha1_ssse3 ehci_pci mei_me ehci_hcd pcspkr mei ata_generic pata_acpi i2c_piix4 i2c_smbus serio_raw xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn loop fuse nfnetlink overlay xen_blkfront
[   75.771370] ---[ end trace 0000000000000000 ]---
[   75.771376] RIP: 0010:msft_monitor_device_del+0x93/0x170 [bluetooth]
[   75.771397] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 65 21 <26> 2b 8b ad 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   75.771416] RSP: 0000:ffffad644108fa40 EFLAGS: 00010246
[   75.771422] RAX: ffff93da8a149600 RBX: c9d2315bc82c3810 RCX: 0000000100000000
[   75.771431] RDX: 0000000000000001 RSI: ffff93da905e9180 RDI: ffff93da81404598
[   75.771439] RBP: ffffad644108fa58 R08: 0000000000000064 R09: 00000000000012ab
[   75.771446] R10: ffff93da81207000 R11: 0000000000000286 R12: ffffad644108fb00
[   75.771454] R13: ffffad644108fa68 R14: ffff93da9089b840 R15: ffff93da8c265100
[   75.771463] FS:  000078fa4cec4bc0(0000) GS:ffff93da97000000(0000) knlGS:0000000000000000
[   75.771471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   75.771477] CR2: 000074fa64aadc08 CR3: 00000000105d2006 CR4: 0000000000770ef0
[   75.771485] PKRU: 55555554
[   75.771488] Kernel panic - not syncing: Fatal exception
[   75.771519] Kernel Offset: 0x3b800000 from 0xffffffff80200000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Full log inside
https://openqa.qubes-os.org/tests/124736/file/usbvm-var_log.tar.gz
(log/xen/console/guest-sys-usb.log)
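
For anyone reading the oops above, here is a minimal sketch (Python, not
kernel code) of what "non-canonical address" means for the values in the
dump. It assumes 4-level paging, i.e. 48-bit virtual addresses, which is
what the CR4 value in the log suggests (LA57 bit clear); the helper name
is_canonical is made up for this example. It also shows that the faulting
address is RBX plus a small offset, which would be consistent with the
displacement bytes "ad 03 00 00" in the Code: line, so RBX itself looks
like the bad pointer:

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all equal.

    va_bits=48 assumes 4-level paging; pass 57 for 5-level (LA57).
    """
    top = addr >> (va_bits - 1)                   # bits 63..va_bits-1
    all_ones = (1 << (64 - va_bits + 1)) - 1      # same bits, all set
    return top == 0 or top == all_ones


# Values copied from the oops above.
fault_addr = 0xC9D2315BC82C3BBD   # address reported in the GP fault line
rbx = 0xC9D2315BC82C3810          # RBX at the time of the fault
rax = 0xFFFF93DA8A149600          # RAX, an ordinary-looking kernel pointer

print(is_canonical(fault_addr))   # fault address
print(is_canonical(rbx))          # the base register
print(is_canonical(rax))          # a normal kernel pointer for comparison
print(hex(fault_addr - rbx))      # offset of the access relative to RBX
```

The check is only about the address format; it says nothing about where
the bogus value in RBX came from.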

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

