From: Andrew Cooper <andrew.cooper3@citrix.com>
To: osstest service owner <osstest-admin@xenproject.org>,
xen-devel@lists.xensource.com
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Wei Liu <Wei.Liu2@citrix.com>, Jan Beulich <JBeulich@suse.com>,
Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [linux-linus bisection] complete test-amd64-amd64-xl-pvh-intel
Date: Mon, 20 Feb 2017 00:36:11 +0000 [thread overview]
Message-ID: <bdcd35cb-9d6e-ed48-b297-19e385569c99@citrix.com> (raw)
In-Reply-To: <44a5f150-aaf8-396d-6cbd-13d2fd2dcf7e@citrix.com>
On 20/02/2017 00:26, Andrew Cooper wrote:
> On 20/02/2017 00:20, Andrew Cooper wrote:
>> On 19/02/2017 23:20, osstest service owner wrote:
>>> branch xen-unstable
>>> xenbranch xen-unstable
>>> job test-amd64-amd64-xl-pvh-intel
>>> testid guest-start
>>>
>>> Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>>> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
>>> Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
>>> Tree: qemuu git://xenbits.xen.org/qemu-xen.git
>>> Tree: xen git://xenbits.xen.org/xen.git
>>>
>>> *** Found and reproduced problem changeset ***
>>>
>>> Bug is in tree: xen git://xenbits.xen.org/xen.git
>>> Bug introduced: ab914e04a62727b75782e401eaf2e8b72f717f61
>>> Bug not present: 2f4d2198a9b3ba94c959330b5c94fe95917c364c
>>> Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/105915/
>>>
>>>
>>> commit ab914e04a62727b75782e401eaf2e8b72f717f61
>>> Author: Jan Beulich <jbeulich@suse.com>
>>> Date: Fri Feb 17 15:51:03 2017 +0100
>>>
>>> x86: package up context switch hook pointers
>>>
>>> They're all solely dependent on guest type, so we don't need to repeat
>>> all the same three pointers in every vCPU control structure. Instead use
>>> static const structures, and store pointers to them in the domain
>>> control structure.
>>>
>>> Since touching it anyway, take the opportunity and expand
>>> schedule_tail() in the only two places invoking it, allowing the macro
>>> to be dropped.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> From
>> http://logs.test-lab.xenproject.org/osstest/logs/105917/test-amd64-amd64-xl-pvh-intel/serial-fiano0.log
>> around Feb 19 23:12:06.269706
>>
>> (XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
>> (XEN) CPU: 2
>> (XEN) RIP: e008:[<ffff82d08016795a>]
>> domain.c#__context_switch+0x1a3/0x3e3
>> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor (d1v0)
>> (XEN) rax: 0000000000000000 rbx: 0000000000000002 rcx: 0000000000000000
>> (XEN) rdx: 00000031fd44b600 rsi: 0000000000000003 rdi: ffff83007de27000
>> (XEN) rbp: ffff83027d78fdb0 rsp: ffff83027d78fd60 r8: 0000000000000000
>> (XEN) r9: 0000005716f6126f r10: 0000000000007ff0 r11: 0000000000000246
>> (XEN) r12: ffff83007de27000 r13: ffff83027fb74000 r14: ffff83007dafd000
>> (XEN) r15: ffff83027d7c8000 cr0: 000000008005003b cr4: 00000000001526e0
>> (XEN) cr3: 000000007dd05000 cr2: 0000000000000008
>> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
>> (XEN) Xen code around <ffff82d08016795a>
>> (domain.c#__context_switch+0x1a3/0x3e3):
>> (XEN) 85 68 07 00 00 4c 89 e7 <ff> 50 08 4c 89 ef e8 36 e1 02 00 41 80
>> bd 78 08
>> (XEN) Xen stack trace from rsp=ffff83027d78fd60:
>> (XEN) ffff83027d78ffff 0000000000000003 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 ffff83007de27000 ffff83007dafd000 ffff83027fb74000
>> (XEN) 0000000000000002 ffff83027d7c8000 ffff83027d78fe20 ffff82d08016bf1f
>> (XEN) ffff82d080131ae2 ffff83027d78fde0 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 ffff83027d78fe20 ffff83007dafd000
>> (XEN) ffff83007de27000 0000005716f5e5da ffff83027d796148 0000000000000001
>> (XEN) ffff83027d78feb0 ffff82d08012def9 ffff83027d7955a0 ffff83027d796160
>> (XEN) 0000000200000004 ffff83027d796140 ffff83027d78fe70 ffff82d08014af39
>> (XEN) ffff83027d78fe70 ffff83007de27000 0000000001c9c380 ffff82d0801bf800
>> (XEN) 000000107dafd000 ffff82d080322b80 ffff82d080322a80 ffffffffffffffff
>> (XEN) ffff83027d78ffff ffff83027d780000 ffff83027d78fee0 ffff82d08013128f
>> (XEN) ffff83027d78ffff ffff83007dd4c000 ffff83027d7c8000 00000000ffffffff
>> (XEN) ffff83027d78fef0 ffff82d0801312e4 ffff83027d78ff10 ffff82d080167582
>> (XEN) ffff82d0801312e4 ffff83007dafd000 ffff83027d78fdc8 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) ffffffff82374000 0000000000000000 0000000000000000 ffffffff81f59180
>> (XEN) 0000000000000000 0000000000000200 ffffffff82390000 0000000000000000
>> (XEN) 0000000000000000 02ffff8000000000 0000000000000000 0000000000000000
>> (XEN) Xen call trace:
>> (XEN) [<ffff82d08016795a>] domain.c#__context_switch+0x1a3/0x3e3
>> (XEN) [<ffff82d08016bf1f>] context_switch+0x147/0xf0d
>> (XEN) [<ffff82d08012def9>] schedule.c#schedule+0x5ba/0x615
>> (XEN) [<ffff82d08013128f>] softirq.c#__do_softirq+0x7f/0x8a
>> (XEN) [<ffff82d0801312e4>] do_softirq+0x13/0x15
>> (XEN) [<ffff82d080167582>] domain.c#idle_loop+0x55/0x62
>> (XEN)
>> (XEN) Pagetable walk from 0000000000000008:
>> (XEN) L4[0x000] = 000000027d7cd063 ffffffffffffffff
>> (XEN) L3[0x000] = 000000027d7cc063 ffffffffffffffff
>> (XEN) L2[0x000] = 000000027d7cb063 ffffffffffffffff
>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 2:
>> (XEN) FATAL PAGE FAULT
>> (XEN) [error_code=0000]
>> (XEN) Faulting linear address: 0000000000000008
>> (XEN) ****************************************
>> (XEN)
>>
>> We have called through the ->to() hook on a domain whose ctxt_switch
>> pointer is NULL (confirmed by the disassembly). n is derived from
>> current, which is d1v0, which means we are trying to context switch to
>> a vcpu before its domain structure has been fully constructed.
>>
>> The problem is with hvm_domain_initialise()
>>
>> int hvm_domain_initialise(struct domain *d)
>> {
>> ...
>> if ( is_pvh_domain(d) )
>> {
>> register_portio_handler(d, 0, 0x10003, handle_pvh_io);
>> return 0;
>> }
>> ...
>> rc = hvm_funcs.domain_initialise(d);
>> ...
>> }
>>
>> So PVH domains return from hvm_domain_initialise() before the
>> vendor-specific initialisation hooks are ever called.
>>
>> Rather than fixing this specific issue, can I suggest we properly kill
>> PVH v1 at this point? Given what else it skips in
>> hvm_domain_initialise(), it clearly hasn't functioned properly in the past.
> P.S. Ian: Why did this failure not block at the push gate?
>
> It is a completely repeatable host crash, yet master has been pulled up
> to match staging.
P.P.S.
We also have a cascading failure during the crash path which we should fix.
(XEN) Assertion 'current == idle_vcpu[smp_processor_id()]' failed at
domain.c:2178
(XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 2
(XEN) RIP: e008:[<ffff82d08016cd3d>] __sync_local_execstate+0x44/0x67
(XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor (d1v0)
<snip>
(XEN) Xen call trace:
(XEN) [<ffff82d08016cd3d>] __sync_local_execstate+0x44/0x67
(XEN) [<ffff82d080196d8d>] invalidate_interrupt+0x40/0x7d
(XEN) [<ffff82d080176112>] do_IRQ+0x8c/0x60f
(XEN) [<ffff82d0802470f7>] common_interrupt+0x67/0x70
(XEN) [<ffff82d080196865>] machine_halt+0x1d/0x32
(XEN) [<ffff82d0801476c1>] panic+0x10b/0x115
(XEN) [<ffff82d0801a1955>] do_page_fault+0x424/0x4f8
(XEN) [<ffff82d0802471f8>] entry.o#handle_exception_saved+0x66/0xa4
(XEN) [<ffff82d08016795a>] domain.c#__context_switch+0x1a3/0x3e3
(XEN) [<ffff82d08016bf1f>] context_switch+0x147/0xf0d
(XEN) [<ffff82d08012def9>] schedule.c#schedule+0x5ba/0x615
(XEN) [<ffff82d08013128f>] softirq.c#__do_softirq+0x7f/0x8a
(XEN) [<ffff82d0801312e4>] do_softirq+0x13/0x15
(XEN) [<ffff82d080167582>] domain.c#idle_loop+0x55/0x62
We really shouldn't be enabling interrupts in machine_halt(), because
there is no guarantee that it is safe to do so.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Thread overview:
2017-02-19 23:20 [linux-linus bisection] complete test-amd64-amd64-xl-pvh-intel osstest service owner
2017-02-20 0:20 ` Andrew Cooper
2017-02-20 0:26 ` Andrew Cooper
2017-02-20 0:36 ` Andrew Cooper [this message]
2017-02-20 10:42 ` Removing PVHv1 code (was: Re: [linux-linus bisection] complete test-amd64-amd64-xl-pvh-intel) Roger Pau Monne
2017-02-21 13:51 ` Removing PVHv1 code Boris Ostrovsky