From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>, Wei Liu <wei.liu2@citrix.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
osstest service owner <osstest-admin@xenproject.org>
Subject: Re: [xen-unstable test] 94442: regressions - FAIL
Date: Tue, 17 May 2016 14:08:07 +0100 [thread overview]
Message-ID: <573B17B7.5090606@citrix.com> (raw)
In-Reply-To: <573B152C02000078000EC11D@prv-mh.provo.novell.com>
On 17/05/16 11:57, Jan Beulich wrote:
>>>> On 16.05.16 at 11:24, <wei.liu2@citrix.com> wrote:
>> On Mon, May 16, 2016 at 02:57:13AM +0000, osstest service owner wrote:
>>> flight 94442 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/94442/
>> [...]
>>> test-amd64-i386-qemuu-rhel6hvm-intel 9 redhat-install fail REGR. vs. 94368
>> The changes in this flight shouldn't cause failure like this. See below.
>>
>> It is more likely to be caused by SMEP/SMAP fix, which are now in
>> master. It seems that previous run didn't discover this.
>>
>> Log file at:
>>
>> http://logs.test-lab.xenproject.org/osstest/logs/94442/test-amd64-i386-qemuu-rhel
>> 6hvm-intel/serial-italia0.log
>>
>> May 15 22:07:44.023500 (XEN) Xen BUG at entry.S:221
>> May 15 22:07:47.455549 (XEN) ----[ Xen-4.7.0-rc x86_64 debug=y Not tainted ]----
>> May 15 22:07:47.463500 (XEN) CPU: 0
>> May 15 22:07:47.463531 (XEN) RIP: e008:[<ffff82d0802411c7>] cr4_pv32_restore+0x37/0x40
>> May 15 22:07:47.463567 (XEN) RFLAGS: 0000000000010287 CONTEXT: hypervisor (d0v3)
>> May 15 22:07:47.471503 (XEN) rax: 0000000000000000 rbx: 00000000cf195e50 rcx: 0000000000000001
>> May 15 22:07:47.479496 (XEN) rdx: ffff8300be907ff8 rsi: 0000000000007ff0 rdi: 000000000022287e
>> May 15 22:07:47.487498 (XEN) rbp: 00007cff416f80c7 rsp: ffff8300be907f08 r8: ffff83023df8a000
>> May 15 22:07:47.495498 (XEN) r9: ffff83023df8a000 r10: 00000000deadbeef r11: 0000000000800000
>> May 15 22:07:47.503510 (XEN) r12: ffff8300bed32000 r13: ffff83023df8a000 r14: 0000000000000000
>> May 15 22:07:47.503549 (XEN) r15: ffff83023df72000 cr0: 0000000080050033 cr4: 00000000001526e0
>> May 15 22:07:47.511501 (XEN) cr3: 00000002383d7000 cr2: 00000000b71ff000
>> May 15 22:07:47.519493 (XEN) ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0000 cs: e008
>> May 15 22:07:47.527520 (XEN) Xen code around <ffff82d0802411c7> (cr4_pv32_restore+0x37/0x40):
>> May 15 22:07:47.535491 (XEN) 3b 05 03 87 0a 00 74 02 <0f> 0b 5a 31 c0 c3 0f 1f 00 f6 42 04 01 0f 84 26
>> May 15 22:07:47.535531 (XEN) Xen stack trace from rsp=ffff8300be907f08:
>> May 15 22:07:47.543502 (XEN) 0000000000000000 ffff82d080240f22 ffff83023df72000 0000000000000000
>> May 15 22:07:47.551559 (XEN) ffff83023df8a000 ffff8300bed32000 00000000cf195e6c 00000000cf195e50
>> May 15 22:07:47.559494 (XEN) 0000000000800000 00000000deadbeef ffff83023df8a000 0000000000000206
>> May 15 22:07:47.567496 (XEN) 0000000000000001 0000000000000001 0000000000000000 0000000000007ff0
>> May 15 22:07:47.575503 (XEN) 000000000022287e 0000010000000000 00000000c1001027 0000000000000061
>> May 15 22:07:47.575543 (XEN) 0000000000000246 00000000cf195e44 0000000000000069 000000000000beef
>> May 15 22:07:47.583508 (XEN) 000000000000beef 000000000000beef 000000000000beef 0000000000000000
>> May 15 22:07:47.591503 (XEN) ffff8300bed30000 0000000000000000 00000000001526e0
>> May 15 22:07:47.599493 (XEN) Xen call trace:
>> May 15 22:07:47.599522 (XEN) [<ffff82d0802411c7>] cr4_pv32_restore+0x37/0x40
> I think I see the problem the introduction of caching in v3 introduced:
> In compat_restore_all_guest we have (getting patched in by altinsn
> patching):
>
> .Lcr4_alt:
> testb $3,UREGS_cs(%rsp)
> jpe .Lcr4_alt_end
> mov CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax
> and $~XEN_CR4_PV32_BITS, %rax
> mov %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
> mov %rax, %cr4
> .Lcr4_alt_end:
>
> If an NMI occurs between the updating og the cached value and the
> actual CR4 write, the NMI handling will cause the cached value to get
> SMEP+SMAP enabled again (in both cache and CR4), and once we
> get back here, we will clear it in just CR4.
>
> We don't want to undo the caching, as that gave us performance back
> at least for 64-bit PV guests.
>
> We also can't simply swap the two instructions: If we did, an NMI
> between the two would itself trigger the BUG in cr4_pv32_restore
> (as the check there assumes that CR4 always has no less of the
> bits of interest set than the cached value).
>
> The options I see are:
>
> 1) Ditch the debug check altogether, for being false positive in
> exactly one corner case.
>
> 2) Make the NMI handler recognize the single critical pair of
> instructions.
>
> 3) Change the code sequence above to
>
> .Lcr4_alt:
> testb $3,UREGS_cs(%rsp)
> jpe .Lcr4_alt_end
> mov CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax
> and $~XEN_CR4_PV32_BITS, %rax
> 1:
> mov %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
> mov %rax, %cr4
> /* (suitable comment goes here) */
> cmp %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
> jne 1b
> .Lcr4_alt_end:
>
> (assuming that an insane flood of NMIs not allowing this loop to
> be exited would be sufficiently problematic in other ways).
>
> I dislike 1, and between 2 and 3 I think I'd prefer the latter, unless
> someone else sees something wrong with such an approach.
+1 for option 3. If we have a flood of NMIs, we have larger problems
than this loop.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
prev parent reply other threads:[~2016-05-17 13:09 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-16 2:57 [xen-unstable test] 94442: regressions - FAIL osstest service owner
2016-05-16 9:24 ` Wei Liu
2016-05-16 9:29 ` Andrew Cooper
2016-05-16 9:39 ` Wei Liu
2016-05-16 9:42 ` Andrew Cooper
2016-05-17 8:59 ` Jan Beulich
2016-05-17 9:01 ` Andrew Cooper
2016-05-17 9:08 ` Jan Beulich
2016-05-17 9:06 ` Jan Beulich
2016-05-17 10:57 ` Jan Beulich
2016-05-17 13:08 ` Andrew Cooper [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=573B17B7.5090606@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=JBeulich@suse.com \
--cc=osstest-admin@xenproject.org \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.