From: Chao Gao <chao.gao@intel.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "Quan Xu" <xuquan8@huawei.com>,
	xen-devel@lists.xensource.com, "Jan Beulich" <jbeulich@suse.com>,
	"Kevin Tian" <kevin.tian@intel.com>,
	"osstest service owner" <osstest-admin@xenproject.org>,
	"Jun Nakajima" <jun.nakajima@intel.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [xen-unstable test] 113959: regressions - FAIL
Date: Mon, 9 Oct 2017 19:18:01 +0800
Message-ID: <20171009111759.GA21574@op-computing>
In-Reply-To: <f8485fb4-dca9-28ad-a568-fd3282b28b20@citrix.com>

On Mon, Oct 09, 2017 at 12:03:53PM +0100, Andrew Cooper wrote:
>On 09/10/17 08:58, Chao Gao wrote:
>> On Mon, Oct 09, 2017 at 02:13:22PM +0800, Chao Gao wrote:
>>> On Tue, Oct 03, 2017 at 11:08:01AM +0100, Roger Pau Monné wrote:
>>>> On Tue, Oct 03, 2017 at 09:55:44AM +0000, osstest service owner wrote:
>>>>> flight 113959 xen-unstable real [real]
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/113959/
>>>>>
>>>>> Regressions :-(
>>>>>
>>>>> Tests which did not succeed and are blocking,
>>>>> including tests which could not be run:
>>>>>  test-amd64-i386-libvirt-xsm  21 leak-check/check         fail REGR. vs. 113954
>>>> This is due to cron running when the leak-check is executed.
>>>>
>>>>>  test-armhf-armhf-xl-multivcpu  5 host-ping-check-native  fail REGR. vs. 113954
>>>>>  test-amd64-i386-xl-qemut-debianhvm-amd64 17 guest-stop   fail REGR. vs. 113954
>>>> The test below has triggered the following ASSERT, CCing the Intel
>>>> guys.
>>>>
>>>> Oct  3 06:12:00.415168 (XEN) d15v0: intack: 2:30 pt: 38
>>>> Oct  3 06:12:19.191141 (XEN) vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000
>>>> Oct  3 06:12:19.199162 (XEN)  PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>>> Oct  3 06:12:19.207160 (XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:367
>>>> Oct  3 06:12:19.215215 (XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Not tainted ]----
>>>> Oct  3 06:12:19.223124 (XEN) CPU:    1
>>>> Oct  3 06:12:19.223153 (XEN) RIP:    e008:[<ffff82d0803022a5>] vmx_intr_assist+0x617/0x637
>>>> Oct  3 06:12:19.231185 (XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor (d15v0)
>>>> Oct  3 06:12:19.239163 (XEN) rax: ffff83022dfc802c   rbx: ffff8300ccc65680   rcx: 0000000000000000
>>>> Oct  3 06:12:19.247169 (XEN) rdx: ffff83022df7ffff   rsi: 000000000000000a   rdi: ffff82d0804606d8
>>>> Oct  3 06:12:19.255127 (XEN) rbp: ffff83022df7ff08   rsp: ffff83022df7fea8   r8:  ffff83022df90000
>>>> Oct  3 06:12:19.263114 (XEN) r9:  0000000000000001   r10: 0000000000000000   r11: 0000000000000001
>>>> Oct  3 06:12:19.271109 (XEN) r12: 00000000ffffffff   r13: ffff82d0803cfba6   r14: ffff82d0803cfba6
>>>> Oct  3 06:12:19.279119 (XEN) r15: 0000000000000004   cr0: 0000000080050033   cr4: 00000000001526e0
>>>> Oct  3 06:12:19.279157 (XEN) cr3: 0000000214274000   cr2: 00005622a2184dbf
>>>> Oct  3 06:12:19.287123 (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> Oct  3 06:12:19.295105 (XEN) Xen code around <ffff82d0803022a5> (vmx_intr_assist+0x617/0x637):
>>>> Oct  3 06:12:19.303150 (XEN)  41 bf 00 00 00 00 eb a0 <0f> 0b 89 ce 48 89 df e8 bb 20 00 00 e9 49 fe ff
>>>> Oct  3 06:12:19.311112 (XEN) Xen stack trace from rsp=ffff83022df7fea8:
>>>> Oct  3 06:12:19.311146 (XEN)    ffff83022df7ff08 000000388030cf76 ffff82d0805a7570 ffff82d08057ad80
>>>> Oct  3 06:12:19.319131 (XEN)    ffff83022df7ffff ffff83022df7fee0 ffff82d08023b9b6 ffff8300ccc65000
>>>> Oct  3 06:12:19.327115 (XEN)    000000000000000b 0000000000000020 00000000000000c2 0000000000000004
>>>> Oct  3 06:12:19.345094 (XEN)    ffff880029eb4000 ffff82d080311c21 0000000000000004 00000000000000c2
>>>> Oct  3 06:12:19.345177 (XEN)    0000000000000020 000000000000000b ffff880029eb4000 ffffffff81adf0a0
>>>> Oct  3 06:12:19.351221 (XEN)    0000000000000000 0000000000000000 ffff88002d400008 0000000000000000
>>>> Oct  3 06:12:19.359439 (XEN)    0000000000000030 0000000000000000 00000000000003f8 00000000000003f8
>>>> Oct  3 06:12:19.367267 (XEN)    ffffffff81adf0a0 0000beef0000beef ffffffff8138a5f4 000000bf0000beef
>>>> Oct  3 06:12:19.375222 (XEN)    0000000000000002 ffff88002f803e08 000000000000beef 000000000000beef
>>>> Oct  3 06:12:19.383198 (XEN)    000000000000beef 000000000000beef 000000000000beef 0000000000000001
>>>> Oct  3 06:12:19.391230 (XEN)    ffff8300ccc65000 00000031ada20d00 00000000001526e0
>>>> Oct  3 06:12:19.399336 (XEN) Xen call trace:
>>>> Oct  3 06:12:19.399389 (XEN)    [<ffff82d0803022a5>] vmx_intr_assist+0x617/0x637
>>>> Oct  3 06:12:19.407337 (XEN)    [<ffff82d080311c21>] vmx_asm_vmexit_handler+0x41/0x120
>>>> Oct  3 06:12:19.407380 (XEN) 
>>>> Oct  3 06:12:19.415246 (XEN) 
>>>> Oct  3 06:12:19.415278 (XEN) ****************************************
>>>> Oct  3 06:12:19.415307 (XEN) Panic on CPU 1:
>>>> Oct  3 06:12:19.415332 (XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:367
>>>> Oct  3 06:12:19.423432 (XEN) ****************************************
>>> (CC Jan)
>>>
>>> Hi, Roger.
>>>
>>> I sent a patch to fix a possible cause of this bug; see
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg03254.html.
>>>
>>> Due to the Xen 4.9 release, I put this patch aside and later forgot to
>>> continue working on it. Sorry for this. I will of course fix this
>>> bug.
>>>
>>> I thought the root cause was:
>>> When injecting a periodic timer interrupt in vmx_intr_assist(),
>>> the RTE is read more than once during a single event delivery. For
>>> example, if a periodic timer interrupt comes from the PIT, the
>>> corresponding RTE is accessed in pt_update_irq() when setting the
>>> corresponding bit in vIRR. When this function returns, it accesses
>>> the RTE again to get the vector it set in vIRR. Between the two
>>> accesses, the content of the RTE may have been changed by another CPU,
>>> since no protection is in place. This can trigger the
>>> assertion failure in vmx_intr_assist().
>>>
>>> For example, in this case, we may set 0x30 in vIRR but return 0x38 to
>>> vmx_intr_assist(). When we try to inject an interrupt, we find that
>>> 0x38 is greater than the highest pending vector, and the assertion
>>> fails. I have an XTF test case that reproduces this bug; see
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg02906.html.
>>> But in Jan's opinion, the bug was unlikely to have been
>>> triggered in OSSTEST by these unusual operations.
>>>
>>> After thinking it over, the bug can also be caused by pt_update_irq()
>>> returning 0x38 without setting 0x38 in vIRR because the corresponding
>>> RTE is masked. Please refer to the code path:
>>> vmx_intr_assist() -> pt_update_irq() -> hvm_isa_irq_assert() ->
>>> assert_irq() -> assert_gsi() -> vioapic_irq_positive_edge().
>>> Note that in vioapic_irq_positive_edge(), if ent->fields.mask is set,
>>> the function returns without setting the corresponding bit in vIRR.
>> To verify this guess, I modified the above XTF test a little. The new XTF
>> test (attached) creates a guest with 2 vCPUs. vCPU0 sets up the PIT
>> to generate a timer interrupt every 1ms. It also boots vCPU1. vCPU1
>> incessantly masks/unmasks the corresponding IOAPIC RTE and sends an IPI
>> (vector 0x30) to vCPU0. The bug happens as expected:
>
>On the XTF side of things, I really need to get around to cleaning up my
>SMP support work.  There are an increasing number of tests which are
>creating ad-hoc APs.
>
>Recently, an APIC driver has been introduced, so you can probably drop
>1/3 of that code by using apic_init()/apic_icr_write().  I've also got a
>proto IO-APIC driver which I should clean up and upstream.

Thanks for the information. I will try to clean up this test and send
it out for review.

Thanks
Chao
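
As a rough illustration of the two scenarios described in the quoted
analysis above (the RTE being rewritten between two reads, and the RTE
being masked), here is a toy Python model. It is not Xen code: Rte,
check_holds and racing_write are hypothetical names, and pt_update_irq
here is only a simplified stand-in for the real function. It assumes
vIRR can be modelled as a set of pending vectors and that the check
compares the highest pending vector against the returned timer vector.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Rte:
    """Toy IO-APIC redirection table entry (hypothetical model)."""
    vector: int
    masked: bool = False


def pt_update_irq(rte: Rte, virr: Set[int],
                  racing_write: Optional[Callable[[Rte], None]] = None) -> int:
    """Simplified stand-in: assert the timer IRQ, then report its vector."""
    # First read of the RTE: a masked entry sets nothing in vIRR.
    if not rte.masked:
        virr.add(rte.vector)
    # Window in which another vCPU may rewrite the RTE (assumed hook).
    if racing_write is not None:
        racing_write(rte)
    # Second read of the RTE: may no longer match what was set above.
    return rte.vector


def check_holds(rte: Rte, virr: Set[int],
                racing_write: Optional[Callable[[Rte], None]] = None) -> bool:
    """Model of the 'intack.vector >= pt_vector' condition."""
    pt_vector = pt_update_irq(rte, virr, racing_write)
    intack_vector = max(virr, default=-1)  # highest pending vector
    return intack_vector >= pt_vector


def retarget(rte: Rte) -> None:
    """Another vCPU changes the RTE vector from 0x30 to 0x38."""
    rte.vector = 0x38


# Scenario 1: RTE rewritten between the two reads.
race_ok = check_holds(Rte(0x30), set(), retarget)
# Scenario 2: RTE masked while an IPI (vector 0x30) is already pending.
masked_ok = check_holds(Rte(0x38, masked=True), {0x30})
# Baseline: unmasked RTE, no concurrent write.
normal_ok = check_holds(Rte(0x38), {0x30})
```

In this model the condition holds only in the baseline case. The second
scenario ends with 0x30 pending and 0x38 returned, which is the same
combination as the "intack: 2:30 pt: 38" line in the log.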
