* Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
@ 2016-07-18 10:21 linux
  2016-07-18 17:48 ` Andrew Cooper
  2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
  0 siblings, 2 replies; 13+ messages in thread
From: linux @ 2016-07-18 10:21 UTC
To: Xen-devel; +Cc: Jan Beulich

Hi Jan,

It seems that since your patch series starting with commit:
2016-06-22 x86/vMSI-X: defer intercept handler registration
74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798

The shutdown of a guest which has a PCI device passed through which uses MSI-X interrupts causes a host crash, see the splat below. Somehow it also doesn't reboot in 5 seconds as it is supposed to (i don't have no-reboot on the command line).

--
Sander

(XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable x86_64 debug=y Not tainted ]----
(XEN) [2016-07-16 16:03:17.069] CPU: 0
(XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
(XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082 CONTEXT: hypervisor (d0v0)
(XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40 rbx: ffff83055c685500 rcx: 0000000000000001
(XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000 rsi: 0000000000001ab0 rdi: ffff8305313b85a0
(XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78 rsp: ffff83009fd07c68 r8: ffff8305356dfff0
(XEN) [2016-07-16 16:03:17.069] r9: ffff8305356df480 r10: ffff830503420c50 r11: 0000000000000282
(XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000 r13: ffff83009fd07e48 r14: ffff8305313b8000
(XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8 cr0: 0000000080050033 cr4: 00000000000006e0
(XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000 cr2: 0000000000000000
(XEN) [2016-07-16 16:03:17.069] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de> (msixtbl_pt_unregister+0x7b/0xd9):
(XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b 0a 0f 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
(XEN) [2016-07-16 16:03:17.069] Xen stack trace from rsp=ffff83009fd07c68:
(XEN) [2016-07-16 16:03:17.069] 0000000000000000 ffff8305356df480 ffff83009fd07ce8 ffff82d08014c394
(XEN) [2016-07-16 16:03:17.069] 0000000000000001 ffff8305356df480 0000000000000293 ffff8305313b80cc
(XEN) [2016-07-16 16:03:17.069] 000000568012ffe5 ffff8305313b8000 ffff83009fd07cd8 ffff83009fd07e38
(XEN) [2016-07-16 16:03:17.070] 0000000000000000 ffff83054e5fc000 00007fc25a33e004 ffff8305313b8000
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07da8 ffff82d0801629c8 0000000000000000 ffff83053b1191f0
(XEN) [2016-07-16 16:03:17.070] 0000000000000246 ffff83009fd07d28 ffff82d0801300ae 000000000000000e
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d080171497 ffff83009fd07d78 000000020001d17b
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07d68 0000000000000000 ffff83009fd07d68 ffff82d080130280
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d08014d0aa 0000000000000202 0000000000000000
(XEN) [2016-07-16 16:03:17.070] ffff8305313b8000 ffff88005716d320 0000000000305000 00007fc25a33e004
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07ef8 ffff82d080104b2c 0000000000000206 0000000000000002
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07df8 ffff82d08018c9db 0000000000000cfe 0000000000000002
(XEN) [2016-07-16 16:03:17.070] 0000000000000002 ffff83054e5fc000 ffff83009fd07e48 ffff82d08019c119
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07e38 0000000080121177 ffff83009fd07e38 0000000000000cfe
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07f18 0000000000000206 0000000c00000030 000056082bb90013
(XEN) [2016-07-16 16:03:17.070] 0000000200000056 00007fc200000013 0000305600000000 000056082b87465d
(XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 00007fc25606b31f 0000000000000000 000056082b8746cf
(XEN) [2016-07-16 16:03:17.070] 0000000000001000 fee5600026820730 00007ffe26820740 000056082b8797be
(XEN) [2016-07-16 16:03:17.070] 00000000fee56000 0000430026820772 00007ffe26820740 0000000000003056
(XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 ffff83009ff8a000 00007ffe26820580 ffff88005716d320
(XEN) [2016-07-16 16:03:17.070] Xen call trace:
(XEN) [2016-07-16 16:03:17.070] [<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
(XEN) [2016-07-16 16:03:17.070] [<ffff82d08014c394>] pt_irq_destroy_bind+0x2be/0x3f0
(XEN) [2016-07-16 16:03:17.070] [<ffff82d0801629c8>] arch_do_domctl+0xc77/0x2414
(XEN) [2016-07-16 16:03:17.070] [<ffff82d080104b2c>] do_domctl+0x19db/0x1d26
(XEN) [2016-07-16 16:03:17.070] [<ffff82d0802426bd>] lstar_enter+0xdd/0x137
(XEN) [2016-07-16 16:03:17.070]
(XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
(XEN) [2016-07-16 16:03:17.070] L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) [2016-07-16 16:03:18.147]
(XEN) [2016-07-16 16:03:18.155] ****************************************
(XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
(XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
(XEN) [2016-07-16 16:03:18.200] [error_code=0000]
(XEN) [2016-07-16 16:03:18.214] Faulting linear address: 0000000000000000
(XEN) [2016-07-16 16:03:18.233] ****************************************
(XEN) [2016-07-16 16:03:18.252]
(XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
From: Andrew Cooper @ 2016-07-18 17:48 UTC
To: linux, Xen-devel; +Cc: Jan Beulich

On 18/07/16 11:21, linux@eikelenboom.it wrote:
> Hi Jan,
>
> It seems that since your patch series starting with commit:
> 2016-06-22 x86/vMSI-X: defer intercept handler registration
> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>
> The shutdown of a guest which has a PCI device passed through which
> uses MSI-X interrupts causes
> a host crash, see the splat below. Somehow it also doesn't reboot in 5
> seconds as it is supposed to (i don't have no-reboot on the command
> line).
>
> --
> Sander
>
> (XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>]
> msixtbl_pt_unregister+0x7b/0xd9
> <snip>
> (XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b 0a 0f
> 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
> <snip>

Can you paste the disassembly of msixtbl_pt_unregister() please? That is a dereference of %rdx which is NULL at this point, but I need to figure out which pointer it is supposed to be.

Thanks,

~Andrew
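
The "dereference of %rdx" reading can be checked by hand from the code bytes in the splat: the byte marked <48> starts the faulting instruction, so the instruction is 48 8b 0a. What follows is only a minimal sketch of that decoding, not anything taken from Xen; the helper names decode_modrm() and describe_mov_load() are made up for illustration, and the sketch deliberately handles just the one form needed here (REX.W + opcode 0x8b with a plain register-indirect operand).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit general purpose registers, indexed by their 4-bit encoding
 * (one extension bit from the REX prefix plus three bits from ModRM). */
static const char *const reg64[16] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15",
};

struct modrm { unsigned mod, reg, rm; };

/* Split a ModRM byte into its fields, applying REX.R and REX.B. */
static struct modrm decode_modrm(uint8_t rex, uint8_t modrm)
{
    struct modrm m;

    m.mod = modrm >> 6;
    m.reg = ((modrm >> 3) & 7) | ((rex & 0x4) ? 8 : 0);   /* REX.R */
    m.rm  = (modrm & 7) | ((rex & 0x1) ? 8 : 0);          /* REX.B */
    return m;
}

/* Hypothetical helper: render "REX.W 8b /r" with a register-indirect
 * source in AT&T syntax. Returns false for any other encoding. */
static bool describe_mov_load(const uint8_t insn[3], char *out, size_t len)
{
    struct modrm m;

    /* Need a REX prefix (0x40-0x4f) with W set, then opcode 0x8b. */
    if ( (insn[0] & 0xf0) != 0x40 || !(insn[0] & 0x08) || insn[1] != 0x8b )
        return false;

    m = decode_modrm(insn[0], insn[2]);

    /* mod == 0 is "(reg)" unless rm selects SIB (4) or RIP-relative (5). */
    if ( m.mod != 0 || (m.rm & 7) == 4 || (m.rm & 7) == 5 )
        return false;

    snprintf(out, len, "mov (%%%s),%%%s", reg64[m.rm], reg64[m.reg]);
    return true;
}
```

Feeding it the three bytes { 0x48, 0x8b, 0x0a } gives a load through %rdx into %rcx, and the splat shows rdx: 0000000000000000 with a faulting linear address of 0, which is consistent with the NULL dereference described above.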

* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
From: Sander Eikelenboom @ 2016-07-18 19:26 UTC
To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

Monday, July 18, 2016, 7:48:20 PM, you wrote:

> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>> <snip report and crash log>

> Can you paste the disassembly of msixtbl_pt_unregister() please? That
> is a dereference of %rdx which is NULL at this point, but I need to
> figure out which pointer it is supposed to be.

Hi Andrew,

# addr2line -e xen-syms ffff82d0801e3e7e
/usr/src/new/xen-unstable/xen/arch/x86/hvm/vmsi.c:535 (discriminator 1)

So the RIP points to:

void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
{
    struct irq_desc *irq_desc;
    struct msi_desc *msi_desc;
    struct pci_dev *pdev;
    struct msixtbl_entry *entry;

    ASSERT(pcidevs_locked());
    ASSERT(spin_is_locked(&d->event_lock));

    if ( !has_vlapic(d) )
        return;

    irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
    if ( !irq_desc )
        return;

    msi_desc = irq_desc->msi_desc;
    if ( !msi_desc )
        goto out;

    pdev = msi_desc->dev;

    list_for_each_entry( entry, &d->arch.hvm_domain.msixtbl_list, list )    <--- HERE
        if ( pdev == entry->pdev )
            goto found;

out:
    spin_unlock_irq(&irq_desc->lock);
    return;

found:
    if ( !atomic_dec_and_test(&entry->refcnt) )
        del_msixtbl_entry(entry);

    spin_unlock_irq(&irq_desc->lock);
}

Disassembly:

(gdb) info line msixtbl_pt_unregister
Line 513 of "vmsi.c" starts at address 0xffff82d0801e3e03 <msixtbl_pt_unregister> and ends at 0xffff82d0801e3e10 <msixtbl_pt_unregister+13>.
(gdb) disas 0xffff82d0801e3e03
Dump of assembler code for function msixtbl_pt_unregister:
   0xffff82d0801e3e03 <+0>:     push   %rbp
   0xffff82d0801e3e04 <+1>:     mov    %rsp,%rbp
   0xffff82d0801e3e07 <+4>:     push   %r12
   0xffff82d0801e3e09 <+6>:     push   %rbx
   0xffff82d0801e3e0a <+7>:     mov    %rdi,%r12
   0xffff82d0801e3e0d <+10>:    mov    %rsi,%rbx
   0xffff82d0801e3e10 <+13>:    callq  0xffff82d08014d585 <pcidevs_locked>
   0xffff82d0801e3e15 <+18>:    test   %al,%al
   0xffff82d0801e3e17 <+20>:    jne    0xffff82d0801e3e1b <msixtbl_pt_unregister+24>
   0xffff82d0801e3e19 <+22>:    ud2
   0xffff82d0801e3e1b <+24>:    lea    0xcc(%r12),%rdi
   0xffff82d0801e3e23 <+32>:    callq  0xffff82d080130544 <_spin_is_locked>
   0xffff82d0801e3e28 <+37>:    test   %eax,%eax
   0xffff82d0801e3e2a <+39>:    jne    0xffff82d0801e3e2e <msixtbl_pt_unregister+43>
   0xffff82d0801e3e2c <+41>:    ud2
   0xffff82d0801e3e2e <+43>:    testb  $0x1,0x9dc(%r12)
   0xffff82d0801e3e37 <+52>:    je     0xffff82d0801e3ed7 <msixtbl_pt_unregister+212>
   0xffff82d0801e3e3d <+58>:    mov    $0x0,%esi
   0xffff82d0801e3e42 <+63>:    mov    %rbx,%rdi
   0xffff82d0801e3e45 <+66>:    callq  0xffff82d0801743a4 <pirq_spin_lock_irq_desc>
   0xffff82d0801e3e4a <+71>:    mov    %rax,%rbx
   0xffff82d0801e3e4d <+74>:    test   %rax,%rax
   0xffff82d0801e3e50 <+77>:    je     0xffff82d0801e3ed7 <msixtbl_pt_unregister+212>
   0xffff82d0801e3e56 <+83>:    mov    0x10(%rax),%rax
   0xffff82d0801e3e5a <+87>:    test   %rax,%rax
   0xffff82d0801e3e5d <+90>:    je     0xffff82d0801e3e89 <msixtbl_pt_unregister+134>
   0xffff82d0801e3e5f <+92>:    mov    0x20(%rax),%rax
   0xffff82d0801e3e63 <+96>:    mov    0x5a0(%r12),%rdx
   0xffff82d0801e3e6b <+104>:   lea    0x5a0(%r12),%rdi
   0xffff82d0801e3e73 <+112>:   jmp    0xffff82d0801e3e7e <msixtbl_pt_unregister+123>
   0xffff82d0801e3e75 <+114>:   cmp    %rax,0x18(%rdx)
   0xffff82d0801e3e79 <+118>:   je     0xffff82d0801e3e94 <msixtbl_pt_unregister+145>
   0xffff82d0801e3e7b <+120>:   mov    %rcx,%rdx
   0xffff82d0801e3e7e <+123>:   mov    (%rdx),%rcx
   0xffff82d0801e3e81 <+126>:   prefetcht0 (%rcx)
   0xffff82d0801e3e84 <+129>:   cmp    %rdi,%rdx
   0xffff82d0801e3e87 <+132>:   jne    0xffff82d0801e3e75 <msixtbl_pt_unregister+114>
   0xffff82d0801e3e89 <+134>:   lea    0x24(%rbx),%rdi
   0xffff82d0801e3e8d <+138>:   callq  0xffff82d080130514 <_spin_unlock_irq>
   0xffff82d0801e3e92 <+143>:   jmp    0xffff82d0801e3ed7 <msixtbl_pt_unregister+212>
   0xffff82d0801e3e94 <+145>:   lock decl 0x10(%rdx)
   0xffff82d0801e3e98 <+149>:   sete   %al
   0xffff82d0801e3e9b <+152>:   test   %al,%al
   0xffff82d0801e3e9d <+154>:   jne    0xffff82d0801e3ece <msixtbl_pt_unregister+203>
   0xffff82d0801e3e9f <+156>:   mov    (%rdx),%rcx
   0xffff82d0801e3ea2 <+159>:   mov    0x8(%rdx),%rax
   0xffff82d0801e3ea6 <+163>:   mov    %rax,0x8(%rcx)
   0xffff82d0801e3eaa <+167>:   mov    %rcx,(%rax)
   0xffff82d0801e3ead <+170>:   movabs $0x200200200200200,%rax
   0xffff82d0801e3eb7 <+180>:   mov    %rax,0x8(%rdx)
   0xffff82d0801e3ebb <+184>:   lea    0x158(%rdx),%rdi
   0xffff82d0801e3ec2 <+191>:   lea    -0xac1(%rip),%rsi        # 0xffff82d0801e3408 <free_msixtbl_entry>
   0xffff82d0801e3ec9 <+198>:   callq  0xffff82d080122be0 <call_rcu>
   0xffff82d0801e3ece <+203>:   lea    0x24(%rbx),%rdi
   0xffff82d0801e3ed2 <+207>:   callq  0xffff82d080130514 <_spin_unlock_irq>
   0xffff82d0801e3ed7 <+212>:   pop    %rbx
   0xffff82d0801e3ed8 <+213>:   pop    %r12
   0xffff82d0801e3eda <+215>:   pop    %rbp
   0xffff82d0801e3edb <+216>:   retq
End of assembler dump.

--
Sander

> Thanks,
> ~Andrew
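
A few numbers in the splat and in the gdb output can be cross-checked with plain address arithmetic. The sketch below is only that, a cross-check under stated assumptions: the constants are copied from this thread, the helper names are invented for illustration, and it assumes the crashing binary and the one disassembled here share the same code layout for this function (the code bytes in the splat match the instructions around +123, which makes that plausible but does not prove it).

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the crash log and the gdb session in this thread. */
#define CRASH_RIP        0xffff82d0801e39deULL  /* RIP in the splat */
#define CRASH_R12        0xffff8305313b8000ULL  /* r12 in the splat */
#define CRASH_RDI        0xffff8305313b85a0ULL  /* rdi in the splat */
#define GDB_FUNC_START   0xffff82d0801e3e03ULL  /* function start per gdb */
#define GDB_FAULT_INSN   0xffff82d0801e3e7eULL  /* address given to addr2line */
#define FAULT_OFFSET     0x7bULL                /* "+0x7b" in the splat */
#define LIST_HEAD_OFFSET 0x5a0ULL               /* "lea 0x5a0(%r12),%rdi" */

/* Offset of an instruction from the start of its function. */
static uint64_t insn_offset(uint64_t insn, uint64_t func_start)
{
    return insn - func_start;
}

/* The address looked up with addr2line sits at the same +0x7b (= 123)
 * offset that the splat reports, although the two binaries place the
 * function at different absolute addresses. */
static bool same_fault_offset(void)
{
    return insn_offset(GDB_FAULT_INSN, GDB_FUNC_START) == FAULT_OFFSET;
}

/* Function start implied by the splat's RIP and offset. */
static uint64_t crash_func_start(void)
{
    return CRASH_RIP - FAULT_OFFSET;
}

/* r12 holds the first argument (saved by "mov %rdi,%r12" at +7), and
 * rdi is loaded with r12 + 0x5a0, the address compared against in the
 * loop, i.e. the list head being walked. */
static bool rdi_is_list_head(void)
{
    return CRASH_R12 + LIST_HEAD_OFFSET == CRASH_RDI;
}
```

Both predicates hold for the values above, so the registers in the splat fit the picture of %rdi pointing at the list head inside the domain structure while %rdx, loaded from that same location at +96, is 0.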

* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
From: Andrew Cooper @ 2016-07-18 20:57 UTC
To: Sander Eikelenboom; +Cc: Jan Beulich, Xen-devel

On 18/07/2016 20:26, Sander Eikelenboom wrote:
> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>
>> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>>> <snip report and crash log>
>> Can you paste the disassembly of msixtbl_pt_unregister() please? That
>> is a dereference of %rdx which is NULL at this point, but I need to
>> figure out which pointer it is supposed to be.
> Hi Andrew,

<snip>

Thanks. What has happened is that the msixtbl linked list is still uninitialised at this point. The only way I can see for this to happen is that msixtbl_init() hasn't been called, or hasn't passed its first if condition. The INIT_LIST_HEAD() visible in the context of the 2nd hunk of identified changeset is the line of code which changes the list from 0 to initialised, and I don't see anywhere which re-zeros it later.

This alone suggests that the VM in question isn't actually using MSI-X interrupts, even if the device passed through is capable.

Following the style of the identified changeset,

andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index e418b98..c533719 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     ASSERT(pcidevs_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
-    if ( !has_vlapic(d) )
+    if ( !d->arch.hvm_domain.msixtbl_list.next )
         return;
 
     irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);

should resolve your issue, although I am very tempted to replace the opencoded list logic with a msixtbl_initialised() predicate instead.

~Andrew
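
To make the failure mode concrete: a circular list head that was never passed through INIT_LIST_HEAD() and still holds zeroes has next == NULL, so the first step of a list walk follows a NULL pointer. The following is a minimal stand-alone sketch of that idea and of the kind of guard the diff above adds. It is not Xen's list.h; struct entry, list_initialised() and find_entry() are hypothetical names used only for illustration, with list_initialised() playing the role of the msixtbl_initialised() predicate mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a kernel-style circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

/* An initialised, empty list points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Append a node just before the head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical predicate: a zero-filled head has never been initialised. */
static bool list_initialised(const struct list_head *h)
{
    return h->next != NULL;
}

/* Hypothetical list element; pdev is an int here purely for brevity. */
struct entry {
    struct list_head list;
    int pdev;
};

/* Guarded lookup. Without the list_initialised() check, a zeroed head
 * would make pos NULL on the first iteration and the loop body would
 * dereference it. */
static struct entry *find_entry(struct list_head *head, int pdev)
{
    struct list_head *pos;

    if ( !list_initialised(head) )
        return NULL;

    for ( pos = head->next; pos != head; pos = pos->next )
    {
        struct entry *e =
            (struct entry *)((char *)pos - offsetof(struct entry, list));

        if ( e->pdev == pdev )
            return e;
    }

    return NULL;
}
```

With a zero-filled head, find_entry() returns NULL without touching head->next's target; after init_list_head() and list_add_tail() it finds the matching element as usual. Wrapping the check in a named predicate rather than testing .next inline is a readability choice only; the behaviour is the same.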

* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
From: linux @ 2016-07-18 22:03 UTC
To: Andrew Cooper; +Cc: Andrew Cooper, Jan Beulich, Xen-devel

On 2016-07-18 22:57, Andrew Cooper wrote:
> On 18/07/2016 20:26, Sander Eikelenboom wrote:
>> <snip earlier report, crash log and request for disassembly>
>> Hi Andrew,
>
> <snip>
>
> Thanks. What has happened is that the msixtbl linked list is still
> uninitialised at this point. The only way I can see for this to happen
> is that msixtbl_init() hasn't been called, or hasn't passed its first
> if condition. The INIT_LIST_HEAD() visible in the context of the 2nd hunk
> of identified changeset is the line of code which changes the list from
> 0 to initialised, and I don't see anywhere which re-zeros it later.
>
> This alone suggests that the VM in question isn't actually using MSI-X
> interrupts, even if the device passed through is capable.

Hmm didn't actually check this before, but you seem to be right (below is the lspci output from within the guest).
> Following the style of the identified changeset, > > andrewcoop@andrewcoop:/local/xen.git/xen$ git diff > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > index e418b98..c533719 100644 > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct > pirq *pirq) > ASSERT(pcidevs_locked()); > ASSERT(spin_is_locked(&d->event_lock)); > > - if ( !has_vlapic(d) ) > + if ( !d->arch.hvm_domain.msixtbl_list.next ) > return; > > irq_desc = pirq_spin_lock_irq_desc(pirq, NULL); > > should resolve your issue, although I am very tempted to replace the > opencoded list logic with a msixtbl_initialised() predicate instead. > > ~Andrew It does resolve the issue, thanks ! -- Sander 00:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 6570/7570/8550] (prog-if 00 [VGA controller]) Subsystem: PC Partner Limited / Sapphire Technology Turks PRO [Radeon HD 6570/7570/8550] Physical Slot: 5 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 68 Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M] Region 2: Memory at f3060000 (64-bit, non-prefetchable) [size=128K] Region 4: I/O ports at c100 [size=256] Expansion ROM at f3080000 [disabled] [size=128K] Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: 
CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #1, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Address: 00000000fee57000 Data: 4300 Kernel driver in use: radeon 00:06.0 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks/Whistler HDMI Audio [Radeon HD 6000 Series] Subsystem: PC Partner Limited / Sapphire Technology Turks/Whistler HDMI Audio [Radeon HD 6000 Series] Physical Slot: 6 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin B routed to IRQ 79 Region 0: Memory at f30b0000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #1, Speed 5GT/s, Width x16, ASPM L0s L1, Exit 
Latency L0s <64ns, L1 <1us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Address: 00000000fee56000 Data: 4300 Kernel driver in use: snd_hda_intel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts. 2016-07-18 22:03 ` linux @ 2016-07-18 22:07 ` Andrew Cooper 0 siblings, 0 replies; 13+ messages in thread From: Andrew Cooper @ 2016-07-18 22:07 UTC (permalink / raw) To: linux; +Cc: Xen-devel, Jan Beulich, Andrew Cooper On 18/07/2016 23:03, linux@eikelenboom.it wrote: > On 2016-07-18 22:57, Andrew Cooper wrote: >> On 18/07/2016 20:26, Sander Eikelenboom wrote: >>> Monday, July 18, 2016, 7:48:20 PM, you wrote: >>> >>>> On 18/07/16 11:21, linux@eikelenboom.it wrote: >>>>> Hi Jan, >>>>> >>>>> It seems that since your patch series starting with commit: >>>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration >>>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798 >>>>> >>>>> The shutdown of a guest which has a PCI device passed through which >>>>> uses MSI-X interrupts causes >>>>> a host crash, see the splat below. Somehow it also doesn't reboot >>>>> in 5 >>>>> seconds as it is supposed to (i don't have no-reboot on the command >>>>> line). 
>>>>> >>>>> -- >>>>> Sander >>>>> >>>>> >>>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable x86_64 >>>>> debug=y Not tainted ]---- >>>>> (XEN) [2016-07-16 16:03:17.069] CPU: 0 >>>>> (XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>] >>>>> msixtbl_pt_unregister+0x7b/0xd9 >>>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082 CONTEXT: >>>>> hypervisor (d0v0) >>>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40 rbx: >>>>> ffff83055c685500 rcx: 0000000000000001 >>>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000 rsi: >>>>> 0000000000001ab0 rdi: ffff8305313b85a0 >>>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78 rsp: >>>>> ffff83009fd07c68 r8: ffff8305356dfff0 >>>>> (XEN) [2016-07-16 16:03:17.069] r9: ffff8305356df480 r10: >>>>> ffff830503420c50 r11: 0000000000000282 >>>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000 r13: >>>>> ffff83009fd07e48 r14: ffff8305313b8000 >>>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8 cr0: >>>>> 0000000080050033 cr4: 00000000000006e0 >>>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000 cr2: >>>>> 0000000000000000 >>>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000 es: 0000 fs: 0000 gs: >>>>> 0000 ss: e010 cs: e008 >>>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de> >>>>> (msixtbl_pt_unregister+0x7b/0xd9): >>>>> (XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b >>>>> 0a 0f >>>>> 18 09 48 39 fa 75 ec 48 8d 7b 24 e8 >>>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from >>>>> rsp=ffff83009fd07c68: >>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000000 ffff8305356df480 >>>>> ffff83009fd07ce8 ffff82d08014c394 >>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000001 ffff8305356df480 >>>>> 0000000000000293 ffff8305313b80cc >>>>> (XEN) [2016-07-16 16:03:17.069] 000000568012ffe5 ffff8305313b8000 >>>>> ffff83009fd07cd8 ffff83009fd07e38 >>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000000 ffff83054e5fc000 >>>>> 00007fc25a33e004 
ffff8305313b8000 >>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07da8 ffff82d0801629c8 >>>>> 0000000000000000 ffff83053b1191f0 >>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000246 ffff83009fd07d28 >>>>> ffff82d0801300ae 000000000000000e >>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d080171497 >>>>> ffff83009fd07d78 000000020001d17b >>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d68 0000000000000000 >>>>> ffff83009fd07d68 ffff82d080130280 >>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d08014d0aa >>>>> 0000000000000202 0000000000000000 >>>>> (XEN) [2016-07-16 16:03:17.070] ffff8305313b8000 ffff88005716d320 >>>>> 0000000000305000 00007fc25a33e004 >>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07ef8 ffff82d080104b2c >>>>> 0000000000000206 0000000000000002 >>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07df8 ffff82d08018c9db >>>>> 0000000000000cfe 0000000000000002 >>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000002 ffff83054e5fc000 >>>>> ffff83009fd07e48 ffff82d08019c119 >>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07e38 0000000080121177 >>>>> ffff83009fd07e38 0000000000000cfe >>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07f18 0000000000000206 >>>>> 0000000c00000030 000056082bb90013 >>>>> (XEN) [2016-07-16 16:03:17.070] 0000000200000056 00007fc200000013 >>>>> 0000305600000000 000056082b87465d >>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 00007fc25606b31f >>>>> 0000000000000000 000056082b8746cf >>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000001000 fee5600026820730 >>>>> 00007ffe26820740 000056082b8797be >>>>> (XEN) [2016-07-16 16:03:17.070] 00000000fee56000 0000430026820772 >>>>> 00007ffe26820740 0000000000003056 >>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 ffff83009ff8a000 >>>>> 00007ffe26820580 ffff88005716d320 >>>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace: >>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801e39de>] >>>>> msixtbl_pt_unregister+0x7b/0xd9 >>>>> (XEN) [2016-07-16 16:03:17.070] 
[<ffff82d08014c394>] >>>>> pt_irq_destroy_bind+0x2be/0x3f0 >>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801629c8>] >>>>> arch_do_domctl+0xc77/0x2414 >>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d080104b2c>] >>>>> do_domctl+0x19db/0x1d26 >>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0802426bd>] >>>>> lstar_enter+0xdd/0x137 >>>>> (XEN) [2016-07-16 16:03:17.070] >>>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000: >>>>> (XEN) [2016-07-16 16:03:17.070] L4[0x000] = 0000000000000000 >>>>> ffffffffffffffff >>>>> (XEN) [2016-07-16 16:03:18.147] >>>>> (XEN) [2016-07-16 16:03:18.155] >>>>> **************************************** >>>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0: >>>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT >>>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000] >>>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address: >>>>> 0000000000000000 >>>>> (XEN) [2016-07-16 16:03:18.233] >>>>> **************************************** >>>>> (XEN) [2016-07-16 16:03:18.252] >>>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds... >>>>> >>>> Can you paste the disassembly of msixtbl_pt_unregister() please? That >>>> is a dereference of %rdx which is NULL at this point, but I need to >>>> figure out which pointer it is supposed to be. >>> Hi Andrew, >> >> <snip> >> >> Thanks. What has happened is that the msixtbl linked list is still >> uninitialised at this point. The only way I can see for this to happen >> is that msixtbl_init() hasn't been called, or hasn't passed its first if >> condition. The INIT_LIST_HEAD() visible in the context of the 2nd hunk >> of identified changeset is the line of code which changes the list from >> 0 to initialised, and I don't see anywhere which re-zeros it later. >> >> This alone suggests that the VM in question isn't actually using MSI-X >> interrupts, even if the device passed through is capable. 
> > Hmm didn't actually check this before, but you seem to be right > (below is the lspci output from within the guest). Both of those devices are using MSI interrupts - they don't even support MSI-X. > > >> Following the style of the identified changeset, >> >> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff >> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c >> index e418b98..c533719 100644 >> --- a/xen/arch/x86/hvm/vmsi.c >> +++ b/xen/arch/x86/hvm/vmsi.c >> @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct >> pirq *pirq) >> ASSERT(pcidevs_locked()); >> ASSERT(spin_is_locked(&d->event_lock)); >> >> - if ( !has_vlapic(d) ) >> + if ( !d->arch.hvm_domain.msixtbl_list.next ) >> return; >> >> irq_desc = pirq_spin_lock_irq_desc(pirq, NULL); >> >> should resolve your issue, although I am very tempted to replace the >> opencoded list logic with a msixtbl_initialised() predicate instead. >> >> ~Andrew > > It does resolve the issue, thanks ! Right - I will clean up the patch tomorrow using a more logical predicate. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
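For illustration, the failure mode described in the message above (a list head that is still all zeroes rather than initialised to point at itself) can be sketched in standalone C. This is a minimal sketch, not Xen code: `struct list_head` and `init_list_head()` are local stand-ins assumed to mirror the two-pointer layout and self-pointing empty state of Xen's list implementation, and `walk_would_fault()` is a hypothetical helper that reports a NULL link instead of dereferencing it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for a doubly linked list head (assumed two-pointer layout). */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialised, empty list points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Hypothetical helper: follow ->next the way a list walk does, but report a
 * NULL link instead of dereferencing it.  Returns true if a real walk would
 * have faulted.
 */
static bool walk_would_fault(const struct list_head *head)
{
    const struct list_head *pos;

    for ( pos = head->next; pos != head; pos = pos->next )
        if ( pos == NULL )
            return true;

    return false;
}
```

With a zero-filled head the very first `->next` is NULL, which is consistent with the NULL `%rdx` and the faulting linear address of 0 in the splat; an initialised but empty head ends the walk immediately because `head->next == head`.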
* [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
  2016-07-18 10:21 Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts linux
  2016-07-18 17:48 ` Andrew Cooper
@ 2016-07-21 10:18 ` Andrew Cooper
  2016-07-21 10:37   ` Sander Eikelenboom
                     ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Andrew Cooper @ 2016-07-21 10:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Sander Eikelenboom

c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
table infrastructure not to always be initialised, but it missed one path
which needed an is-initialised check.

If a device is passed through to a domain which is MSI capable but not MSI-X
capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
hypercall still calls into msixtbl_pt_unregister().  This follows the linked
list pointer which is still NULL.

Introduce an is-initialised check to msixtbl_pt_unregister().

Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
subtle.  Introduce an msixtbl_initialised() predicate instead, which makes its
purpose far more obvious.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Sander Eikelenboom <linux@eikelenboom.it>

Sander - would you mind double checking this patch?
---
 xen/arch/x86/hvm/vmsi.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index e418b98..ef1dfff 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -166,6 +166,16 @@ struct msixtbl_entry
 
 static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
 
+/*
+ * MSI-X table infrastructure is dynamically initialised when an MSI-X capable
+ * device is passed through to a domain, rather than unconditionally for all
+ * domains.
+ */
+static bool msixtbl_initialised(const struct domain *d)
+{
+    return !!d->arch.hvm_domain.msixtbl_list.next;
+}
+
 static struct msixtbl_entry *msixtbl_find_entry(
     struct vcpu *v, unsigned long addr)
 {
@@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     ASSERT(pcidevs_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
-    if ( !has_vlapic(d) )
+    if ( !msixtbl_initialised(d) )
         return;
 
     irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
@@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d)
     struct hvm_io_handler *handler;
 
     if ( !has_hvm_container_domain(d) || !has_vlapic(d) ||
-         d->arch.hvm_domain.msixtbl_list.next )
+         msixtbl_initialised(d) )
         return;
 
     INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
@@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d)
 {
     struct msixtbl_entry *entry, *temp;
 
-    if ( !d->arch.hvm_domain.msixtbl_list.next )
+    if ( !msixtbl_initialised(d) )
         return;
 
     spin_lock(&d->event_lock);
-- 
2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related [flat|nested] 13+ messages in thread
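As a rough illustration of the approach the patch takes, the sketch below models the "is it initialised?" predicate and the early return it enables. It is a toy model under stated assumptions, not the hypervisor code: `struct toy_domain` and every `toy_`-prefixed function are hypothetical stand-ins, locking and IRQ handling are omitted entirely, and the unregister function merely counts list entries so that its behaviour can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for a doubly linked list head (assumed two-pointer layout). */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, heavily reduced stand-in for the per-domain state. */
struct toy_domain {
    struct list_head msixtbl_list;  /* all zeroes until toy_msixtbl_init() */
};

/* A zero ->next means the list head was never set up for this domain. */
static bool toy_msixtbl_initialised(const struct toy_domain *d)
{
    return d->msixtbl_list.next != NULL;
}

/* Idempotent setup: an empty list head points at itself. */
static void toy_msixtbl_init(struct toy_domain *d)
{
    if ( toy_msixtbl_initialised(d) )
        return;

    d->msixtbl_list.next = &d->msixtbl_list;
    d->msixtbl_list.prev = &d->msixtbl_list;
}

/*
 * Returns -1 when bailing out early because nothing was ever initialised,
 * otherwise the number of list entries visited.  Without the early return
 * the loop would dereference a NULL ->next on an uninitialised domain.
 */
static int toy_msixtbl_pt_unregister(const struct toy_domain *d)
{
    const struct list_head *pos;
    int visited = 0;

    if ( !toy_msixtbl_initialised(d) )
        return -1;

    for ( pos = d->msixtbl_list.next; pos != &d->msixtbl_list; pos = pos->next )
        visited++;

    return visited;
}
```

Keying the check on the list head itself, rather than on a property of the domain such as `has_vlapic()`, ties the guard directly to the state the loop is about to touch, which appears to be the point of the predicate introduced in the patch.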
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices 2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper @ 2016-07-21 10:37 ` Sander Eikelenboom 2016-07-22 8:50 ` Sander Eikelenboom 2016-07-25 10:26 ` George Dunlap 2 siblings, 0 replies; 13+ messages in thread From: Sander Eikelenboom @ 2016-07-21 10:37 UTC (permalink / raw) To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich On July 21, 2016 12:18:37 PM GMT+02:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused >MSI-X >table infrastructure not to always be initialised, but it missed one >path >which needed an is-initialised check. > >If a devices is passed through to a domain which is MSI capable but not >MSI-X >capable, the call to msixtbl_init() is omitted, but a >XEN_DOMCTL_unbind_pt_irq >hypercall still calls into msixtbl_pt_unregister(). This follows the >linked >list pointer which is still NULL. > >Introduce an is-initalised check to msixtbl_pt_unregister(). > >Furthermore, the purpose of the open-coded msixtbl_list.next check is >rather >subtle. Introduce an msixtbl_initialised() predicate instead, which >makes its >purpose far more obvious. > >Reported-by: Sander Eikelenboom <linux@eikelenboom.it> >Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >--- >CC: Jan Beulich <JBeulich@suse.com> >CC: Sander Eikelenboom <linux@eikelenboom.it> > >Sander - would you mind double checking this patch? >--- Sure, will report back tomorrow. 
-- Sander > xen/arch/x86/hvm/vmsi.c | 16 +++++++++++++--- > 1 file changed, 13 insertions(+), 3 deletions(-) > >diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c >index e418b98..ef1dfff 100644 >--- a/xen/arch/x86/hvm/vmsi.c >+++ b/xen/arch/x86/hvm/vmsi.c >@@ -166,6 +166,16 @@ struct msixtbl_entry > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); > >+/* >+ * MSI-X table infrastructure is dynamically initialised when an MSI-X >capable >+ * device is passed through to a domain, rather than unconditionally >for all >+ * domains. >+ */ >+static bool msixtbl_initialised(const struct domain *d) >+{ >+ return !!d->arch.hvm_domain.msixtbl_list.next; >+} >+ > static struct msixtbl_entry *msixtbl_find_entry( > struct vcpu *v, unsigned long addr) > { >@@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct >pirq *pirq) > ASSERT(pcidevs_locked()); > ASSERT(spin_is_locked(&d->event_lock)); > >- if ( !has_vlapic(d) ) >+ if ( !msixtbl_initialised(d) ) > return; > > irq_desc = pirq_spin_lock_irq_desc(pirq, NULL); >@@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d) > struct hvm_io_handler *handler; > > if ( !has_hvm_container_domain(d) || !has_vlapic(d) || >- d->arch.hvm_domain.msixtbl_list.next ) >+ msixtbl_initialised(d) ) > return; > > INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list); >@@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d) > { > struct msixtbl_entry *entry, *temp; > >- if ( !d->arch.hvm_domain.msixtbl_list.next ) >+ if ( !msixtbl_initialised(d) ) > return; > > spin_lock(&d->event_lock); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices 2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper 2016-07-21 10:37 ` Sander Eikelenboom @ 2016-07-22 8:50 ` Sander Eikelenboom 2016-07-25 10:16 ` Andrew Cooper 2016-07-25 10:26 ` George Dunlap 2 siblings, 1 reply; 13+ messages in thread From: Sander Eikelenboom @ 2016-07-22 8:50 UTC (permalink / raw) To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel Thursday, July 21, 2016, 12:18:37 PM, you wrote: > c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X > table infrastructure not to always be initialised, but it missed one path > which needed an is-initialised check. > If a devices is passed through to a domain which is MSI capable but not MSI-X > capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq > hypercall still calls into msixtbl_pt_unregister(). This follows the linked > list pointer which is still NULL. > Introduce an is-initalised check to msixtbl_pt_unregister(). > Furthermore, the purpose of the open-coded msixtbl_list.next check is rather > subtle. Introduce an msixtbl_initialised() predicate instead, which makes its > purpose far more obvious. > Reported-by: Sander Eikelenboom <linux@eikelenboom.it> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > --- > CC: Jan Beulich <JBeulich@suse.com> > CC: Sander Eikelenboom <linux@eikelenboom.it> > Sander - would you mind double checking this patch? > --- Hi Andrew, Just got the chance to test and it works for me ! 
Thanks, Sander > xen/arch/x86/hvm/vmsi.c | 16 +++++++++++++--- > 1 file changed, 13 insertions(+), 3 deletions(-) > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > index e418b98..ef1dfff 100644 > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -166,6 +166,16 @@ struct msixtbl_entry > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock); > > +/* > + * MSI-X table infrastructure is dynamically initialised when an MSI-X capable > + * device is passed through to a domain, rather than unconditionally for all > + * domains. > + */ > +static bool msixtbl_initialised(const struct domain *d) > +{ > + return !!d->arch.hvm_domain.msixtbl_list.next; > +} > + > static struct msixtbl_entry *msixtbl_find_entry( > struct vcpu *v, unsigned long addr) > { > @@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq) > ASSERT(pcidevs_locked()); > ASSERT(spin_is_locked(&d->event_lock)); > > - if ( !has_vlapic(d) ) > + if ( !msixtbl_initialised(d) ) > return; > > irq_desc = pirq_spin_lock_irq_desc(pirq, NULL); > @@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d) > struct hvm_io_handler *handler; > > if ( !has_hvm_container_domain(d) || !has_vlapic(d) || > - d->arch.hvm_domain.msixtbl_list.next ) > + msixtbl_initialised(d) ) > return; > > INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list); > @@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d) > { > struct msixtbl_entry *entry, *temp; > > - if ( !d->arch.hvm_domain.msixtbl_list.next ) > + if ( !msixtbl_initialised(d) ) > return; > > spin_lock(&d->event_lock); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices 2016-07-22 8:50 ` Sander Eikelenboom @ 2016-07-25 10:16 ` Andrew Cooper 2016-07-25 10:19 ` Andrew Cooper 0 siblings, 1 reply; 13+ messages in thread From: Andrew Cooper @ 2016-07-25 10:16 UTC (permalink / raw) To: Sander Eikelenboom; +Cc: Jan Beulich, Xen-devel On 22/07/16 09:50, Sander Eikelenboom wrote: > Thursday, July 21, 2016, 12:18:37 PM, you wrote: > >> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X >> table infrastructure not to always be initialised, but it missed one path >> which needed an is-initialised check. >> If a devices is passed through to a domain which is MSI capable but not MSI-X >> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq >> hypercall still calls into msixtbl_pt_unregister(). This follows the linked >> list pointer which is still NULL. >> Introduce an is-initalised check to msixtbl_pt_unregister(). >> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather >> subtle. Introduce an msixtbl_initialised() predicate instead, which makes its >> purpose far more obvious. >> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> CC: Jan Beulich <JBeulich@suse.com> >> CC: Sander Eikelenboom <linux@eikelenboom.it> >> Sander - would you mind double checking this patch? >> --- > Hi Andrew, > > Just got the chance to test and it works for me ! > > Thanks, May I take that as a Test-by: then please? ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices 2016-07-25 10:16 ` Andrew Cooper @ 2016-07-25 10:19 ` Andrew Cooper 2016-07-25 10:23 ` Sander Eikelenboom 0 siblings, 1 reply; 13+ messages in thread From: Andrew Cooper @ 2016-07-25 10:19 UTC (permalink / raw) To: Sander Eikelenboom; +Cc: Jan Beulich, Xen-devel On 25/07/16 11:16, Andrew Cooper wrote: > On 22/07/16 09:50, Sander Eikelenboom wrote: >> Thursday, July 21, 2016, 12:18:37 PM, you wrote: >> >>> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X >>> table infrastructure not to always be initialised, but it missed one path >>> which needed an is-initialised check. >>> If a devices is passed through to a domain which is MSI capable but not MSI-X >>> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq >>> hypercall still calls into msixtbl_pt_unregister(). This follows the linked >>> list pointer which is still NULL. >>> Introduce an is-initalised check to msixtbl_pt_unregister(). >>> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather >>> subtle. Introduce an msixtbl_initialised() predicate instead, which makes its >>> purpose far more obvious. >>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >>> --- >>> CC: Jan Beulich <JBeulich@suse.com> >>> CC: Sander Eikelenboom <linux@eikelenboom.it> >>> Sander - would you mind double checking this patch? >>> --- >> Hi Andrew, >> >> Just got the chance to test and it works for me ! >> >> Thanks, > May I take that as a Test-by: then please? And of course, I meant Tested-by: ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices 2016-07-25 10:19 ` Andrew Cooper @ 2016-07-25 10:23 ` Sander Eikelenboom 0 siblings, 0 replies; 13+ messages in thread From: Sander Eikelenboom @ 2016-07-25 10:23 UTC (permalink / raw) To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel Monday, July 25, 2016, 12:19:55 PM, you wrote: > On 25/07/16 11:16, Andrew Cooper wrote: >> On 22/07/16 09:50, Sander Eikelenboom wrote: >>> Thursday, July 21, 2016, 12:18:37 PM, you wrote: >>> >>>> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X >>>> table infrastructure not to always be initialised, but it missed one path >>>> which needed an is-initialised check. >>>> If a devices is passed through to a domain which is MSI capable but not MSI-X >>>> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq >>>> hypercall still calls into msixtbl_pt_unregister(). This follows the linked >>>> list pointer which is still NULL. >>>> Introduce an is-initalised check to msixtbl_pt_unregister(). >>>> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather >>>> subtle. Introduce an msixtbl_initialised() predicate instead, which makes its >>>> purpose far more obvious. >>>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> >>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >>>> --- >>>> CC: Jan Beulich <JBeulich@suse.com> >>>> CC: Sander Eikelenboom <linux@eikelenboom.it> >>>> Sander - would you mind double checking this patch? >>>> --- >>> Hi Andrew, >>> >>> Just got the chance to test and it works for me ! >>> >>> Thanks, >> May I take that as a Test-by: then please? > And of course, I meant Tested-by: Yes, thanks for the quick fix ! -- Sander > ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
  2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
  2016-07-21 10:37 ` Sander Eikelenboom
  2016-07-22  8:50 ` Sander Eikelenboom
@ 2016-07-25 10:26 ` George Dunlap
  2 siblings, 0 replies; 13+ messages in thread
From: George Dunlap @ 2016-07-25 10:26 UTC
To: Andrew Cooper; +Cc: Sander Eikelenboom, Jan Beulich, Xen-devel

On Thu, Jul 21, 2016 at 11:18 AM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
> table infrastructure not to always be initialised, but it missed one path
> which needed an is-initialised check.
>
> If a device is passed through to a domain which is MSI capable but not MSI-X
> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
> hypercall still calls into msixtbl_pt_unregister(). This follows the linked
> list pointer which is still NULL.
>
> Introduce an is-initialised check to msixtbl_pt_unregister().
>
> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
> subtle. Introduce an msixtbl_initialised() predicate instead, which makes its
> purpose far more obvious.

Thanks for this bit.

> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>
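
The failure mode described in the commit message can be illustrated with a small standalone C sketch. This is not the Xen code and not the patch itself: `toy_domain`, `toy_entry` and all `toy_*` functions are hypothetical stand-ins, and the list is a simplified kernel-style circular list. The only points taken from the thread are that the list head stays NULL until the init function runs, that the unregister path walked it anyway, and that the fix is an is-initialised predicate checked before the walk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified circular doubly linked list head (assumption: kernel-style). */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical per-device entry; 'list' must stay the first member,
 * because the walk below casts a list_head pointer back to the entry. */
struct toy_entry {
    struct list_head list;
    int id;
};

/* Hypothetical per-domain state. The head is zero-filled (next == NULL)
 * until toy_msixtbl_init() has been called. */
struct toy_domain {
    struct list_head msixtbl_list;
};

/* Make the head point at itself, i.e. an empty but valid list. */
static void toy_msixtbl_init(struct toy_domain *d)
{
    d->msixtbl_list.next = &d->msixtbl_list;
    d->msixtbl_list.prev = &d->msixtbl_list;
}

/* Named predicate replacing an open-coded 'list.next' test. */
static bool toy_msixtbl_initialised(const struct toy_domain *d)
{
    return d->msixtbl_list.next != NULL;
}

/* Insert an entry directly after the head; requires an initialised list. */
static void toy_msixtbl_register(struct toy_domain *d, struct toy_entry *e)
{
    struct list_head *head = &d->msixtbl_list;

    e->list.next = head->next;
    e->list.prev = head;
    head->next->prev = &e->list;
    head->next = &e->list;
}

/* Unlink the entry with the given id. Returns 1 if one was removed,
 * 0 otherwise. Without the guard, an uninitialised head would make the
 * loop dereference a NULL 'pos' on its first iteration. */
static int toy_msixtbl_pt_unregister(struct toy_domain *d, int id)
{
    struct list_head *pos;

    if (!toy_msixtbl_initialised(d))
        return 0;

    for (pos = d->msixtbl_list.next; pos != &d->msixtbl_list; pos = pos->next) {
        struct toy_entry *e = (struct toy_entry *)pos;

        if (e->id == id) {
            pos->prev->next = pos->next;
            pos->next->prev = pos->prev;
            return 1;
        }
    }
    return 0;
}
```

In this sketch, calling `toy_msixtbl_pt_unregister()` on a zero-initialised `struct toy_domain` returns early instead of following the NULL `next` pointer, which mirrors the MSI-capable-but-not-MSI-X case in the report.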
end of thread, other threads:[~2016-07-25 10:26 UTC | newest]

Thread overview: 13+ messages
2016-07-18 10:21 Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts linux
2016-07-18 17:48 ` Andrew Cooper
2016-07-18 19:26 ` Sander Eikelenboom
2016-07-18 20:57 ` Andrew Cooper
2016-07-18 22:03 ` linux
2016-07-18 22:07 ` Andrew Cooper
2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
2016-07-21 10:37 ` Sander Eikelenboom
2016-07-22  8:50 ` Sander Eikelenboom
2016-07-25 10:16 ` Andrew Cooper
2016-07-25 10:19 ` Andrew Cooper
2016-07-25 10:23 ` Sander Eikelenboom
2016-07-25 10:26 ` George Dunlap