From: Andrew Cooper <andrew.cooper3@citrix.com>
To: linux@eikelenboom.it
Cc: Xen-devel <xen-devel@lists.xen.org>,
Jan Beulich <JBeulich@suse.com>,
Andrew Cooper <amc96@hermes.cam.ac.uk>
Subject: Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
Date: Mon, 18 Jul 2016 23:07:18 +0100
Message-ID: <96ebbed5-8540-16db-4794-b5dd44d3fcb3@citrix.com>
In-Reply-To: <7815e37ba427b58e6932417956a2413f@eikelenboom.it>
On 18/07/2016 23:03, linux@eikelenboom.it wrote:
> On 2016-07-18 22:57, Andrew Cooper wrote:
>> On 18/07/2016 20:26, Sander Eikelenboom wrote:
>>> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>>>
>>>> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>>>>> Hi Jan,
>>>>>
>>>>> It seems that since your patch series starting with commit:
>>>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>>>
>>>>> the shutdown of a guest which has a PCI device passed through that
>>>>> uses MSI-X interrupts causes a host crash; see the splat below.
>>>>> Somehow it also doesn't reboot in 5 seconds as it is supposed to (I
>>>>> don't have no-reboot on the command line).
>>>>>
>>>>> --
>>>>> Sander
>>>>>
>>>>>
>>>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable x86_64 debug=y Not tainted ]----
>>>>> (XEN) [2016-07-16 16:03:17.069] CPU: 0
>>>>> (XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082 CONTEXT: hypervisor (d0v0)
>>>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40 rbx: ffff83055c685500 rcx: 0000000000000001
>>>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000 rsi: 0000000000001ab0 rdi: ffff8305313b85a0
>>>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78 rsp: ffff83009fd07c68 r8: ffff8305356dfff0
>>>>> (XEN) [2016-07-16 16:03:17.069] r9: ffff8305356df480 r10: ffff830503420c50 r11: 0000000000000282
>>>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000 r13: ffff83009fd07e48 r14: ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8 cr0: 0000000080050033 cr4: 00000000000006e0
>>>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000 cr2: 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de> (msixtbl_pt_unregister+0x7b/0xd9):
>>>>> (XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b 0a 0f 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from rsp=ffff83009fd07c68:
>>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000000 ffff8305356df480 ffff83009fd07ce8 ffff82d08014c394
>>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000001 ffff8305356df480 0000000000000293 ffff8305313b80cc
>>>>> (XEN) [2016-07-16 16:03:17.069] 000000568012ffe5 ffff8305313b8000 ffff83009fd07cd8 ffff83009fd07e38
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000000 ffff83054e5fc000 00007fc25a33e004 ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07da8 ffff82d0801629c8 0000000000000000 ffff83053b1191f0
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000246 ffff83009fd07d28 ffff82d0801300ae 000000000000000e
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d080171497 ffff83009fd07d78 000000020001d17b
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d68 0000000000000000 ffff83009fd07d68 ffff82d080130280
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d08014d0aa 0000000000000202 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff8305313b8000 ffff88005716d320 0000000000305000 00007fc25a33e004
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07ef8 ffff82d080104b2c 0000000000000206 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07df8 ffff82d08018c9db 0000000000000cfe 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000002 ffff83054e5fc000 ffff83009fd07e48 ffff82d08019c119
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07e38 0000000080121177 ffff83009fd07e38 0000000000000cfe
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07f18 0000000000000206 0000000c00000030 000056082bb90013
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000200000056 00007fc200000013 0000305600000000 000056082b87465d
>>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 00007fc25606b31f 0000000000000000 000056082b8746cf
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000001000 fee5600026820730 00007ffe26820740 000056082b8797be
>>>>> (XEN) [2016-07-16 16:03:17.070] 00000000fee56000 0000430026820772 00007ffe26820740 0000000000003056
>>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 ffff83009ff8a000 00007ffe26820580 ffff88005716d320
>>>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d08014c394>] pt_irq_destroy_bind+0x2be/0x3f0
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801629c8>] arch_do_domctl+0xc77/0x2414
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d080104b2c>] do_domctl+0x19db/0x1d26
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0802426bd>] lstar_enter+0xdd/0x137
>>>>> (XEN) [2016-07-16 16:03:17.070]
>>>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
>>>>> (XEN) [2016-07-16 16:03:17.070] L4[0x000] = 0000000000000000 ffffffffffffffff
>>>>> (XEN) [2016-07-16 16:03:18.147]
>>>>> (XEN) [2016-07-16 16:03:18.155] ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address: 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:18.233] ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.252]
>>>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>>>
>>>> Can you paste the disassembly of msixtbl_pt_unregister() please? That
>>>> is a dereference of %rdx, which is NULL at this point, but I need to
>>>> figure out which pointer it is supposed to be.
>>> Hi Andrew,
>>
>> <snip>
>>
>> Thanks. What has happened is that the msixtbl linked list is still
>> uninitialised at this point. The only way I can see for this to happen
>> is that msixtbl_init() hasn't been called, or hasn't passed its first if
>> condition. The INIT_LIST_HEAD() visible in the context of the 2nd hunk
>> of the identified changeset is the line of code which changes the list
>> from zeroed to initialised, and I don't see anything which re-zeroes it
>> later.
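A minimal user-space sketch of why a zeroed list head faults while an initialised one does not. The `struct list_head`, `init_list_head()`, `list_count()` and `list_initialised()` here are simplified, hypothetical stand-ins modelled on the Linux/Xen list helpers, not Xen's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified doubly linked list head, modelled on the Linux/Xen
 * struct list_head (an illustrative assumption, not Xen's header). */
struct list_head {
    struct list_head *next, *prev;
};

/* Like INIT_LIST_HEAD(): an empty, initialised list points at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Walk the list the way list_for_each() does: start at head->next and
 * stop on arriving back at head.  On a zeroed (never initialised) head,
 * pos starts as NULL, so the first pos->next load dereferences NULL. */
static size_t list_count(const struct list_head *head)
{
    size_t n = 0;

    for ( const struct list_head *pos = head->next; pos != head;
          pos = pos->next )
        n++;

    return n;
}

/* Hypothetical guard: a head that was never initialised has next == NULL. */
static bool list_initialised(const struct list_head *head)
{
    return head->next != NULL;
}
```

Calling list_count() on a zeroed head would fault on its first step, so a walker has to check list_initialised() (or an equivalent) first; this is the same idea as testing `msixtbl_list.next` in the proposed fix.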
>>
>> This alone suggests that the VM in question isn't actually using MSI-X
>> interrupts, even if the device passed through is capable.
>
> Hmm, I didn't actually check this before, but you seem to be right
> (below is the lspci output from within the guest).
Both of those devices are using MSI interrupts - they don't even support
MSI-X.
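Whether a device advertises MSI or MSI-X can be confirmed by walking its PCI capability list, which is what lspci reports as `MSI:` or `MSI-X:` capability lines. This sketch assumes a 256-byte copy of a type 0 (endpoint) configuration space; `pci_has_cap()` is a hypothetical helper, while the offsets and IDs follow the PCI Local Bus specification (MSI is capability ID 0x05, MSI-X is 0x11).

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_STATUS          0x06  /* Status register offset */
#define PCI_STATUS_CAP_LIST 0x10  /* Status bit 4: capability list present */
#define PCI_CAPABILITY_LIST 0x34  /* Capabilities pointer (type 0 header) */
#define PCI_CAP_ID_MSI      0x05
#define PCI_CAP_ID_MSIX     0x11

/* Return true if capability `id` appears in the device's capability list. */
static bool pci_has_cap(const uint8_t cfg[256], uint8_t id)
{
    uint16_t status = cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8);

    if ( !(status & PCI_STATUS_CAP_LIST) )
        return false;

    /* The low two bits of each pointer are reserved. */
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;

    /* Bound the walk so a malformed, looping list cannot hang us; valid
     * capabilities live above the 64-byte standard header. */
    for ( int ttl = 48; pos >= 0x40 && ttl > 0; ttl-- )
    {
        if ( cfg[pos] == id )
            return true;
        pos = cfg[pos + 1] & 0xfc;   /* byte 1 is the next pointer */
    }

    return false;
}
```

A device whose list contains ID 0x05 but not 0x11, as here, can only be driven with MSI, so no MSI-X table is ever registered for it.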
>
>
>> Following the style of the identified changeset,
>>
>> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index e418b98..c533719 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
>>      ASSERT(pcidevs_locked());
>>      ASSERT(spin_is_locked(&d->event_lock));
>>
>> -    if ( !has_vlapic(d) )
>> +    if ( !d->arch.hvm_domain.msixtbl_list.next )
>>          return;
>>
>>      irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
>>
>> should resolve your issue, although I am very tempted to replace the
>> open-coded list logic with a msixtbl_initialised() predicate instead.
>>
>> ~Andrew
>
> It does resolve the issue, thanks!
Right - I will clean up the patch tomorrow using a more logical predicate.
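A sketch of what a msixtbl_initialised() predicate could look like. The structures are minimal stand-ins for Xen's `struct domain` (only the `arch.hvm_domain.msixtbl_list` path from the diff above is modelled), and both `msixtbl_initialised()` and `msixtbl_pt_unregister_sketch()` are hypothetical; the cleaned-up patch posted to the list may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the Xen structures involved; the real
 * definitions contain many more fields (illustrative assumption). */
struct list_head { struct list_head *next, *prev; };
struct hvm_domain { struct list_head msixtbl_list; };
struct arch_domain { struct hvm_domain hvm_domain; };
struct domain { struct arch_domain arch; };

/* Hypothetical predicate: per the analysis in this thread, msixtbl_init()
 * is what runs INIT_LIST_HEAD() on msixtbl_list, so a NULL .next means
 * it never ran for this domain. */
static bool msixtbl_initialised(const struct domain *d)
{
    return d->arch.hvm_domain.msixtbl_list.next != NULL;
}

/* Sketch of the early exit only; the real msixtbl_pt_unregister() also
 * takes a struct pirq and asserts the PCI-devices and event locks. */
static bool msixtbl_pt_unregister_sketch(struct domain *d)
{
    if ( !msixtbl_initialised(d) )
        return false;   /* nothing was ever registered: skip the walk */

    /* ... walk d->arch.hvm_domain.msixtbl_list here ... */
    return true;
}
```

Naming the check keeps the "was the list ever set up?" intent in one place, rather than repeating the `.next` test at every caller.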
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Thread overview: 13+ messages
2016-07-18 10:21 Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts linux
2016-07-18 17:48 ` Andrew Cooper
2016-07-18 19:26 ` Sander Eikelenboom
2016-07-18 20:57 ` Andrew Cooper
2016-07-18 22:03 ` linux
2016-07-18 22:07 ` Andrew Cooper [this message]
2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
2016-07-21 10:37 ` Sander Eikelenboom
2016-07-22 8:50 ` Sander Eikelenboom
2016-07-25 10:16 ` Andrew Cooper
2016-07-25 10:19 ` Andrew Cooper
2016-07-25 10:23 ` Sander Eikelenboom
2016-07-25 10:26 ` George Dunlap