* [Xen-devel] PV dom0 crash: kernel NULL pointer dereference in evtchn_from_irq
From: Marek Marczykowski-Górecki @ 2020-03-19 2:26 UTC
To: xen-devel
Hi,
From time to time, during intensive tests, I get a dom0 crash like the
one below. This is a PV dom0, running on Xen nested inside KVM.
I don't really know when it started happening; I've seen it on at least
these versions:
- Xen 4.8.5 + Linux dom0 4.19.94
- Xen 4.13.0 + Linux dom0 5.4.25
- at least once also on physical hardware (Xen 4.13.0 + Linux dom0
5.4.x)
Contrary to the other issue, suspend is not involved here; it is just
intensive system usage: multiple VM startups involving some I/O,
network traffic, etc. This happens rather rarely (I'd say in about 1-3%
of tests).
To be honest, I'm not really sure if the bug is in Xen-related code at
all, or if Xen functions are in the call trace only because it is PV
dom0.
Full crash message:
[14474.613706] BUG: kernel NULL pointer dereference, address: 000000000000001c
[14474.615832] #PF: supervisor read access in kernel mode
[14474.617321] #PF: error_code(0x0000) - not-present page
[14474.618702] PGD 0 P4D 0
[14474.619452] Oops: 0000 [#1] SMP NOPTI
[14474.620452] CPU: 0 PID: 431254 Comm: rm Not tainted 5.4.25-1.qubes.x86_64 #1
[14474.622900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[14474.626322] RIP: e030:evtchn_from_irq+0x1f/0x40
[14474.627630] Code: 40 08 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 39 3d 95 ab 03 01 76 16 e8 3e e8 b2 ff 48 85 c0 74 08 48 8b 40 10 48 8b 40 08 <8b> 40 1c c3 89 fe 48 c7 c7 d5 9e 37 82 e8 7d 3d ac ff 0f 0b 31 c0
[14474.632719] RSP: e02b:ffffc90000c03f30 EFLAGS: 00010046
[14474.634143] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[14474.636259] RDX: 0000000000000028 RSI: 000000000000007f RDI: ffffffff8265b6e0
[14474.638224] RBP: ffff888052c8f428 R08: ffff888138324000 R09: ffff888138324220
[14474.640238] R10: 0000000000000000 R11: ffffffff8265b6e8 R12: ffff888052c8f4a4
[14474.642239] R13: 0000000000000000 R14: ffff88813b542000 R15: ffff88813fe26ddc
[14474.644254] FS: 00007a018057f580(0000) GS:ffff88813fe00000(0000) knlGS:0000000000000000
[14474.646563] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[14474.648331] CR2: 000000000000001c CR3: 000000004defa000 CR4: 0000000000000660
[14474.650352] Call Trace:
[14474.651042] <IRQ>
[14474.651647] disable_dynirq+0xd/0x30
[14474.652668] mask_ack_dynirq+0xe/0x20
[14474.653706] handle_edge_irq+0xfc/0x190
[14474.655241] generic_handle_irq+0x24/0x30
[14474.656450] __evtchn_fifo_handle_events+0x151/0x1a0
[14474.657886] __xen_evtchn_do_upcall+0x58/0x90
[14474.659093] xen_evtchn_do_upcall+0x27/0x40
[14474.660252] xen_do_hypervisor_callback+0x29/0x40
[14474.661538] </IRQ>
[14474.662150] RIP: e030:xen_hypercall_xen_version+0xa/0x20
[14474.663589] Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[14474.668929] RSP: e02b:ffffc9000be0be50 EFLAGS: 00000246
[14474.670357] RAX: 000000000004000d RBX: 0000000000000000 RCX: ffffffff8100122a
[14474.672348] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[14474.674376] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
[14474.676394] R10: 0000000000000000 R11: 0000000000000246 R12: ffffc9000be0bf58
[14474.678298] R13: 00005a5a44d448a0 R14: ffff88804dfc5540 R15: ffff888065b3a540
[14474.680207] ? xen_hypercall_xen_version+0xa/0x20
[14474.681498] ? xen_force_evtchn_callback+0x9/0x10
[14474.682757] ? check_events+0x12/0x20
[14474.683759] ? xen_irq_enable_direct+0x19/0x20
[14474.685048] ? do_user_addr_fault+0x152/0x450
[14474.686304] ? do_page_fault+0x31/0x110
[14474.687886] ? page_fault+0x3e/0x50
[14474.688839] Modules linked in: br_netfilter xt_physdev xen_netback bridge stp llc joydev loop ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm edac_mce_amd snd_timer parport_pc snd pcspkr soundcore parport e1000e i2c_piix4 xenfs ip_tables dm_thin_pool dm_persistent_data libcrc32c dm_bio_prison bochs_drm drm_kms_helper drm_vram_helper ttm drm ehci_pci virtio_scsi virtio_console ehci_hcd serio_raw ata_generic pata_acpi floppy qemu_fw_cfg xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput pkcs8_key_parser
[14474.705128] CR2: 000000000000001c
[14474.706182] ---[ end trace 19fc15c03d0b00c8 ]---
[14474.707485] RIP: e030:evtchn_from_irq+0x1f/0x40
[14474.708768] Code: 40 08 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 39 3d 95 ab 03 01 76 16 e8 3e e8 b2 ff 48 85 c0 74 08 48 8b 40 10 48 8b 40 08 <8b> 40 1c c3 89 fe 48 c7 c7 d5 9e 37 82 e8 7d 3d ac ff 0f 0b 31 c0
[14474.713814] RSP: e02b:ffffc90000c03f30 EFLAGS: 00010046
[14474.715400] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[14474.717360] RDX: 0000000000000028 RSI: 000000000000007f RDI: ffffffff8265b6e0
[14474.719337] RBP: ffff888052c8f428 R08: ffff888138324000 R09: ffff888138324220
[14474.721302] R10: 0000000000000000 R11: ffffffff8265b6e8 R12: ffff888052c8f4a4
[14474.723257] R13: 0000000000000000 R14: ffff88813b542000 R15: ffff88813fe26ddc
[14474.725232] FS: 00007a018057f580(0000) GS:ffff88813fe00000(0000) knlGS:0000000000000000
[14474.727530] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[14474.729222] CR2: 000000000000001c CR3: 000000004defa000 CR4: 0000000000000660
[14474.731177] Kernel panic - not syncing: Fatal exception in interrupt
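A note for readers decoding the oops above: in the Code: line, the bytes just before the marked instruction are `48 8b 40 10` and `48 8b 40 08` (two pointer loads through RAX), and the marked `<8b> 40 1c` is a 32-bit load from offset 0x1c off RAX. With RAX = 0, that load touches address 0x1c, which matches CR2. Below is a minimal sketch of that arithmetic. The struct is hypothetical and padded only so that a field lands at 0x1c; it is not the kernel's actual `struct irq_info` layout, and all `demo_` names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the kernel's struct irq_info: the padding is
 * chosen only so that the 32-bit field sits at offset 0x1c, the
 * displacement used by the faulting instruction in the oops. */
struct demo_irq_info {
    uint8_t  pad[0x1c];
    uint32_t evtchn;
};

/* Address that a load of base->evtchn would touch, computed without
 * dereferencing anything. */
static uintptr_t demo_fault_address(uintptr_t base)
{
    return base + offsetof(struct demo_irq_info, evtchn);
}
```

Calling `demo_fault_address(0)` gives the address a NULL base pointer would fault on, which under this assumed layout is the same 0x1c reported in CR2.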
Full (Xen) console log; sadly, it doesn't contain more of the Linux
output:
https://openqa.qubes-os.org/tests/6992/file/serial0.txt
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [Xen-devel] PV dom0 crash: kernel NULL pointer dereference in evtchn_from_irq
From: Jürgen Groß @ 2020-03-19 6:14 UTC
To: Marek Marczykowski-Górecki, xen-devel
On 19.03.20 03:26, Marek Marczykowski-Górecki wrote:
> Hi,
>
> From time to time, during intensive tests, I get a dom0 crash like the
> one below. This is a PV dom0, running on Xen nested inside KVM.
> I don't really know when it started happening; I've seen it on at least
> these versions:
> - Xen 4.8.5 + Linux dom0 4.19.94
> - Xen 4.13.0 + Linux dom0 5.4.25
> - at least once also on physical hardware (Xen 4.13.0 + Linux dom0
> 5.4.x)
>
> Contrary to the other issue, suspend is not involved here; it is just
> intensive system usage: multiple VM startups involving some I/O,
> network traffic, etc. This happens rather rarely (I'd say in about 1-3%
> of tests).
> To be honest, I'm not really sure if the bug is in Xen-related code at
> all, or if Xen functions are in the call trace only because it is PV
> dom0.
>
> Full crash message:
>
> [14474.613706] BUG: kernel NULL pointer dereference, address: 000000000000001c
> [14474.615832] #PF: supervisor read access in kernel mode
> [14474.617321] #PF: error_code(0x0000) - not-present page
> [14474.618702] PGD 0 P4D 0
> [14474.619452] Oops: 0000 [#1] SMP NOPTI
> [14474.620452] CPU: 0 PID: 431254 Comm: rm Not tainted 5.4.25-1.qubes.x86_64 #1
> [14474.622900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> [14474.626322] RIP: e030:evtchn_from_irq+0x1f/0x40
I have seen this while testing some event channel related patches and
thought my patches were introducing this case. It seems it can happen
even without them.
I'll send the fixup I've added to my series soon.
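The fixup itself is not part of this message. Purely as an illustration of the general defensive pattern for this kind of crash (and not a reconstruction of the actual patch), here is a minimal user-space sketch: look up the per-IRQ info, and return 0 (treated here as "no event channel") when the lookup yields NULL instead of dereferencing it. All `demo_` names, the table, and its size are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-IRQ bookkeeping structure. */
struct demo_irq_info {
    unsigned int evtchn;
};

#define DEMO_NR_IRQS 4

/* Hypothetical lookup table; a NULL slot models an IRQ whose info is
 * missing (never set up, or already torn down). */
static struct demo_irq_info *demo_table[DEMO_NR_IRQS];

static struct demo_irq_info *demo_info_for_irq(unsigned int irq)
{
    return irq < DEMO_NR_IRQS ? demo_table[irq] : NULL;
}

/* Guarded variant: check the pointer before reading the field, and
 * report 0 when there is nothing to read. */
static unsigned int demo_evtchn_from_irq(unsigned int irq)
{
    const struct demo_irq_info *info = demo_info_for_irq(irq);

    return info ? info->evtchn : 0;
}
```

With this guard, an interrupt arriving for an IRQ whose slot is empty gets 0 back rather than faulting; whether 0 is the right sentinel, and whether the caller handles it, is an assumption of this sketch.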
Juergen