From: Chao Gao <chao.gao@intel.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "Quan Xu" <xuquan8@huawei.com>,
xen-devel@lists.xensource.com, "Jan Beulich" <jbeulich@suse.com>,
"Kevin Tian" <kevin.tian@intel.com>,
"osstest service owner" <osstest-admin@xenproject.org>,
"Jun Nakajima" <jun.nakajima@intel.com>,
"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [xen-unstable test] 113959: regressions - FAIL
Date: Mon, 9 Oct 2017 19:18:01 +0800
Message-ID: <20171009111759.GA21574@op-computing>
In-Reply-To: <f8485fb4-dca9-28ad-a568-fd3282b28b20@citrix.com>
On Mon, Oct 09, 2017 at 12:03:53PM +0100, Andrew Cooper wrote:
>On 09/10/17 08:58, Chao Gao wrote:
>> On Mon, Oct 09, 2017 at 02:13:22PM +0800, Chao Gao wrote:
>>> On Tue, Oct 03, 2017 at 11:08:01AM +0100, Roger Pau Monné wrote:
>>>> On Tue, Oct 03, 2017 at 09:55:44AM +0000, osstest service owner wrote:
>>>>> flight 113959 xen-unstable real [real]
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/113959/
>>>>>
>>>>> Regressions :-(
>>>>>
>>>>> Tests which did not succeed and are blocking,
>>>>> including tests which could not be run:
>>>>> test-amd64-i386-libvirt-xsm 21 leak-check/check fail REGR. vs. 113954
>>>> This is due to cron running when the leak-check is executed.
>>>>
>>>>> test-armhf-armhf-xl-multivcpu 5 host-ping-check-native fail REGR. vs. 113954
>>>>> test-amd64-i386-xl-qemut-debianhvm-amd64 17 guest-stop fail REGR. vs. 113954
>>>> The test below has triggered the following ASSERT, CCing the Intel
>>>> guys.
>>>>
>>>> Oct 3 06:12:00.415168 (XEN) d15v0: intack: 2:30 pt: 38
>>>> Oct 3 06:12:19.191141 (XEN) vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000
>>>> Oct 3 06:12:19.199162 (XEN) PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>>> Oct 3 06:12:19.207160 (XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:367
>>>> Oct 3 06:12:19.215215 (XEN) ----[ Xen-4.10-unstable x86_64 debug=y Not tainted ]----
>>>> Oct 3 06:12:19.223124 (XEN) CPU: 1
>>>> Oct 3 06:12:19.223153 (XEN) RIP: e008:[<ffff82d0803022a5>] vmx_intr_assist+0x617/0x637
>>>> Oct 3 06:12:19.231185 (XEN) RFLAGS: 0000000000010292 CONTEXT: hypervisor (d15v0)
>>>> Oct 3 06:12:19.239163 (XEN) rax: ffff83022dfc802c rbx: ffff8300ccc65680 rcx: 0000000000000000
>>>> Oct 3 06:12:19.247169 (XEN) rdx: ffff83022df7ffff rsi: 000000000000000a rdi: ffff82d0804606d8
>>>> Oct 3 06:12:19.255127 (XEN) rbp: ffff83022df7ff08 rsp: ffff83022df7fea8 r8: ffff83022df90000
>>>> Oct 3 06:12:19.263114 (XEN) r9: 0000000000000001 r10: 0000000000000000 r11: 0000000000000001
>>>> Oct 3 06:12:19.271109 (XEN) r12: 00000000ffffffff r13: ffff82d0803cfba6 r14: ffff82d0803cfba6
>>>> Oct 3 06:12:19.279119 (XEN) r15: 0000000000000004 cr0: 0000000080050033 cr4: 00000000001526e0
>>>> Oct 3 06:12:19.279157 (XEN) cr3: 0000000214274000 cr2: 00005622a2184dbf
>>>> Oct 3 06:12:19.287123 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
>>>> Oct 3 06:12:19.295105 (XEN) Xen code around <ffff82d0803022a5> (vmx_intr_assist+0x617/0x637):
>>>> Oct 3 06:12:19.303150 (XEN) 41 bf 00 00 00 00 eb a0 <0f> 0b 89 ce 48 89 df e8 bb 20 00 00 e9 49 fe ff
>>>> Oct 3 06:12:19.311112 (XEN) Xen stack trace from rsp=ffff83022df7fea8:
>>>> Oct 3 06:12:19.311146 (XEN) ffff83022df7ff08 000000388030cf76 ffff82d0805a7570 ffff82d08057ad80
>>>> Oct 3 06:12:19.319131 (XEN) ffff83022df7ffff ffff83022df7fee0 ffff82d08023b9b6 ffff8300ccc65000
>>>> Oct 3 06:12:19.327115 (XEN) 000000000000000b 0000000000000020 00000000000000c2 0000000000000004
>>>> Oct 3 06:12:19.345094 (XEN) ffff880029eb4000 ffff82d080311c21 0000000000000004 00000000000000c2
>>>> Oct 3 06:12:19.345177 (XEN) 0000000000000020 000000000000000b ffff880029eb4000 ffffffff81adf0a0
>>>> Oct 3 06:12:19.351221 (XEN) 0000000000000000 0000000000000000 ffff88002d400008 0000000000000000
>>>> Oct 3 06:12:19.359439 (XEN) 0000000000000030 0000000000000000 00000000000003f8 00000000000003f8
>>>> Oct 3 06:12:19.367267 (XEN) ffffffff81adf0a0 0000beef0000beef ffffffff8138a5f4 000000bf0000beef
>>>> Oct 3 06:12:19.375222 (XEN) 0000000000000002 ffff88002f803e08 000000000000beef 000000000000beef
>>>> Oct 3 06:12:19.383198 (XEN) 000000000000beef 000000000000beef 000000000000beef 0000000000000001
>>>> Oct 3 06:12:19.391230 (XEN) ffff8300ccc65000 00000031ada20d00 00000000001526e0
>>>> Oct 3 06:12:19.399336 (XEN) Xen call trace:
>>>> Oct 3 06:12:19.399389 (XEN) [<ffff82d0803022a5>] vmx_intr_assist+0x617/0x637
>>>> Oct 3 06:12:19.407337 (XEN) [<ffff82d080311c21>] vmx_asm_vmexit_handler+0x41/0x120
>>>> Oct 3 06:12:19.407380 (XEN)
>>>> Oct 3 06:12:19.415246 (XEN)
>>>> Oct 3 06:12:19.415278 (XEN) ****************************************
>>>> Oct 3 06:12:19.415307 (XEN) Panic on CPU 1:
>>>> Oct 3 06:12:19.415332 (XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:367
>>>> Oct 3 06:12:19.423432 (XEN) ****************************************
>>> (CC Jan)
>>>
>>> Hi, Roger.
>>>
>>> I sent a patch to fix a possible cause of this bug; see
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg03254.html.
>>>
>>> Due to the Xen 4.9 release, I put this patch aside and later forgot to
>>> continue fixing this bug. Sorry for this. Of course, I will fix this
>>> bug.
>>>
>>> I thought the root cause was:
>>> When injecting a periodic timer interrupt in vmx_intr_assist(),
>>> the RTE is read more than once during one event delivery. For
>>> example, if a periodic timer interrupt comes from the PIT, the
>>> corresponding RTE is accessed in pt_update_irq() when setting the
>>> corresponding bit in vIRR. When this function returns, it accesses
>>> the RTE again to get the vector it set in vIRR. Between the two
>>> accesses, the content of the RTE may have been changed by another
>>> CPU, because no protection is in place. This can trigger the
>>> assertion failure in vmx_intr_assist().
>>>
>>> For example, in this case, we may set 0x30 in vIRR but return 0x38 to
>>> vmx_intr_assist(). When we try to inject an interrupt, we find that
>>> 0x38 is greater than the highest vector, and the assertion fails.
>>> I have an XTF test case that reproduces this bug; see
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg02906.html.
>>> But in Jan's opinion, the bug was unlikely to be triggered in
>>> OSSTEST by these unusual operations.
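
The double-read race described above can be modeled with a small toy program. This is only a sketch under stated assumptions: `toy_rte`, `toy_virr_set()`, `toy_virr_highest()` and `toy_pt_update_irq()` are hypothetical names invented for illustration, not Xen code, and the concurrent write from another CPU is simulated deterministically by a parameter rather than by a real second CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: hypothetical types and helpers, not the Xen implementation. */
struct toy_rte {
    uint8_t vector;               /* vector programmed into the redirection entry */
};

static uint8_t toy_virr[32];      /* 256-bit pending-interrupt bitmap */

static void toy_virr_set(uint8_t vec)
{
    toy_virr[vec / 8] |= (uint8_t)(1u << (vec % 8));
}

/* Highest vector currently pending in the toy vIRR, or -1 if none. */
static int toy_virr_highest(void)
{
    for (int vec = 255; vec >= 0; vec--)
        if (toy_virr[vec / 8] & (1u << (vec % 8)))
            return vec;
    return -1;
}

/*
 * Models the two unprotected reads of the RTE.
 * racing_vector >= 0 stands in for another CPU rewriting the RTE
 * between the first and the second read.
 */
static int toy_pt_update_irq(struct toy_rte *rte, int racing_vector)
{
    toy_virr_set(rte->vector);                /* first read: latch in vIRR */

    if (racing_vector >= 0)
        rte->vector = (uint8_t)racing_vector; /* simulated remote write */

    return rte->vector;                       /* second read: value reported */
}
```

Starting from an entry holding 0x30 and calling `toy_pt_update_irq(&rte, 0x38)` leaves 0x30 pending in the toy vIRR while 0x38 is returned, so a check of the form `highest pending vector >= returned vector` no longer holds. That mirrors the condition tested by the failing assertion.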
>>>
>>> After thinking it over, the bug can also be caused by pt_update_irq()
>>> returning 0x38 without setting 0x38 in vIRR, because the corresponding
>>> RTE is masked. Please refer to the code path:
>>> vmx_intr_assist() -> pt_update_irq() -> hvm_isa_irq_assert() ->
>>> assert_irq() -> assert_gsi() -> vioapic_irq_positive_edge().
>>> Note that in vioapic_irq_positive_edge(), if ent->fields.mask is set,
>>> the function returns without setting the corresponding bit in vIRR.
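
The masked-entry path described above can be sketched in the same toy style. Again, everything here is an assumption made for illustration: `toy_masked_rte`, `toy_positive_edge()` and `toy_pt_update_irq_masked()` are hypothetical names that only mimic the behaviour described for vioapic_irq_positive_edge() and pt_update_irq(); they are not the Xen functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the Xen implementation. */
struct toy_masked_rte {
    uint8_t vector;               /* vector programmed into the entry */
    bool    mask;                 /* true: delivery through this entry is masked */
};

static uint8_t toy_pending[32];   /* 256-bit stand-in for vIRR */

static void toy_pending_set(uint8_t vec)
{
    toy_pending[vec / 8] |= (uint8_t)(1u << (vec % 8));
}

static bool toy_pending_test(uint8_t vec)
{
    return toy_pending[vec / 8] & (1u << (vec % 8));
}

/* A masked entry returns early and latches nothing. */
static void toy_positive_edge(const struct toy_masked_rte *rte)
{
    if (rte->mask)
        return;
    toy_pending_set(rte->vector);
}

/* Reports the entry's vector whether or not anything was latched. */
static int toy_pt_update_irq_masked(const struct toy_masked_rte *rte)
{
    toy_positive_edge(rte);
    return rte->vector;
}
```

With `mask` set and `vector` equal to 0x38, the function still returns 0x38 but bit 0x38 stays clear in the toy bitmap. If only a lower vector such as 0x30 is pending from elsewhere, the caller sees a returned vector above the highest pending one, which matches the `intack: 2:30 pt: 38` line in the log.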
>> To verify this guess, I modified the above XTF test a little. The new
>> XTF test (attached) creates a guest with 2 vCPUs. vCPU0 sets up the PIT
>> to generate a timer interrupt every 1ms. It also boots vCPU1. vCPU1
>> continuously masks/unmasks the corresponding IOAPIC RTE and sends an IPI
>> (vector 0x30) to vCPU0. The bug happens as expected:
>
>On the XTF side of things, I really need to get around to cleaning up my
>SMP support work. There are an increasing number of tests which are
>creating ad-hoc APs.
>
>Recently, an APIC driver has been introduced, so you can probably drop
>1/3 of that code by using apic_init()/apic_icr_write(). I've also got a
>proto IO-APIC driver which I should clean up and upstream.
Thanks for your information. I will try to clean up this test and send
it out for review.
Thanks
Chao
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel