Re: [xen-unstable test] 94442: regressions - FAIL

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>, Wei Liu <wei.liu2@citrix.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	osstest service owner <osstest-admin@xenproject.org>
Subject: Re: [xen-unstable test] 94442: regressions - FAIL
Date: Tue, 17 May 2016 14:08:07 +0100	[thread overview]
Message-ID: <573B17B7.5090606@citrix.com> (raw)
In-Reply-To: <573B152C02000078000EC11D@prv-mh.provo.novell.com>

On 17/05/16 11:57, Jan Beulich wrote:
>>>> On 16.05.16 at 11:24, <wei.liu2@citrix.com> wrote:
>> On Mon, May 16, 2016 at 02:57:13AM +0000, osstest service owner wrote:
>>> flight 94442 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/94442/ 
>> [...]
>>>  test-amd64-i386-qemuu-rhel6hvm-intel  9 redhat-install    fail REGR. vs. 94368
>> The changes in this flight shouldn't cause failure like this. See below.
>>
>> It is more likely to be caused by SMEP/SMAP fix, which are now in
>> master. It seems that previous run didn't discover this.
>>
>> Log file at:
>>
>> http://logs.test-lab.xenproject.org/osstest/logs/94442/test-amd64-i386-qemuu-rhel 
>> 6hvm-intel/serial-italia0.log
>>
>> May 15 22:07:44.023500 (XEN) Xen BUG at entry.S:221
>> May 15 22:07:47.455549 (XEN) ----[ Xen-4.7.0-rc  x86_64  debug=y  Not tainted ]----
>> May 15 22:07:47.463500 (XEN) CPU:    0
>> May 15 22:07:47.463531 (XEN) RIP:    e008:[<ffff82d0802411c7>] cr4_pv32_restore+0x37/0x40
>> May 15 22:07:47.463567 (XEN) RFLAGS: 0000000000010287   CONTEXT: hypervisor (d0v3)
>> May 15 22:07:47.471503 (XEN) rax: 0000000000000000   rbx: 00000000cf195e50   rcx: 0000000000000001
>> May 15 22:07:47.479496 (XEN) rdx: ffff8300be907ff8   rsi: 0000000000007ff0   rdi: 000000000022287e
>> May 15 22:07:47.487498 (XEN) rbp: 00007cff416f80c7   rsp: ffff8300be907f08   r8:  ffff83023df8a000
>> May 15 22:07:47.495498 (XEN) r9:  ffff83023df8a000   r10: 00000000deadbeef   r11: 0000000000800000
>> May 15 22:07:47.503510 (XEN) r12: ffff8300bed32000   r13: ffff83023df8a000   r14: 0000000000000000
>> May 15 22:07:47.503549 (XEN) r15: ffff83023df72000   cr0: 0000000080050033   cr4: 00000000001526e0
>> May 15 22:07:47.511501 (XEN) cr3: 00000002383d7000   cr2: 00000000b71ff000
>> May 15 22:07:47.519493 (XEN) ds: 007b   es: 007b   fs: 00d8   gs: 0033   ss: 0000   cs: e008
>> May 15 22:07:47.527520 (XEN) Xen code around <ffff82d0802411c7> (cr4_pv32_restore+0x37/0x40):
>> May 15 22:07:47.535491 (XEN)  3b 05 03 87 0a 00 74 02 <0f> 0b 5a 31 c0 c3 0f 1f 00 f6 42 04 01 0f 84 26
>> May 15 22:07:47.535531 (XEN) Xen stack trace from rsp=ffff8300be907f08:
>> May 15 22:07:47.543502 (XEN)    0000000000000000 ffff82d080240f22 ffff83023df72000 0000000000000000
>> May 15 22:07:47.551559 (XEN)    ffff83023df8a000 ffff8300bed32000 00000000cf195e6c 00000000cf195e50
>> May 15 22:07:47.559494 (XEN)    0000000000800000 00000000deadbeef ffff83023df8a000 0000000000000206
>> May 15 22:07:47.567496 (XEN)    0000000000000001 0000000000000001 0000000000000000 0000000000007ff0
>> May 15 22:07:47.575503 (XEN)    000000000022287e 0000010000000000 00000000c1001027 0000000000000061
>> May 15 22:07:47.575543 (XEN)    0000000000000246 00000000cf195e44 0000000000000069 000000000000beef
>> May 15 22:07:47.583508 (XEN)    000000000000beef 000000000000beef 000000000000beef 0000000000000000
>> May 15 22:07:47.591503 (XEN)    ffff8300bed30000 0000000000000000 00000000001526e0
>> May 15 22:07:47.599493 (XEN) Xen call trace:
>> May 15 22:07:47.599522 (XEN)    [<ffff82d0802411c7>] cr4_pv32_restore+0x37/0x40
> I think I see the problem the introduction of caching in v3 introduced:
> In compat_restore_all_guest we have (getting patched in by altinsn
> patching):
>
> .Lcr4_alt:
>         testb $3,UREGS_cs(%rsp)
>         jpe   .Lcr4_alt_end
>         mov   CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax
>         and   $~XEN_CR4_PV32_BITS, %rax
>         mov   %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
>         mov   %rax, %cr4
> .Lcr4_alt_end:
>
> If an NMI occurs between the updating og the cached value and the
> actual CR4 write, the NMI handling will cause the cached value to get
> SMEP+SMAP enabled again (in both cache and CR4), and once we
> get back here, we will clear it in just CR4.
>
> We don't want to undo the caching, as that gave us performance back
> at least for 64-bit PV guests.
>
> We also can't simply swap the two instructions: If we did, an NMI
> between the two would itself trigger the BUG in cr4_pv32_restore
> (as the check there assumes that CR4 always has no less of the
> bits of interest set than the cached value).
>
> The options I see are:
>
> 1) Ditch the debug check altogether, for being false positive in
> exactly one corner case.
>
> 2) Make the NMI handler recognize the single critical pair of
> instructions.
>
> 3) Change the code sequence above to
>
> .Lcr4_alt:
>         testb $3,UREGS_cs(%rsp)
>         jpe   .Lcr4_alt_end
>         mov   CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax
>         and   $~XEN_CR4_PV32_BITS, %rax
> 1:
>         mov   %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
>         mov   %rax, %cr4
>         /* (suitable comment goes here) */
>         cmp   %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
>         jne   1b
> .Lcr4_alt_end:
>
> (assuming that an insane flood of NMIs not allowing this loop to
> be exited would be sufficiently problematic in other ways).
>
> I dislike 1, and between 2 and 3 I think I'd prefer the latter, unless
> someone else sees something wrong with such an approach.

+1 for option 3.  If we have a flood of NMIs, we have larger problems
than this loop.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

     prev parent reply	other threads:[~2016-05-17 13:09 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-16  2:57 [xen-unstable test] 94442: regressions - FAIL osstest service owner
2016-05-16  9:24 ` Wei Liu
2016-05-16  9:29   ` Andrew Cooper
2016-05-16  9:39     ` Wei Liu
2016-05-16  9:42       ` Andrew Cooper
2016-05-17  8:59     ` Jan Beulich
2016-05-17  9:01       ` Andrew Cooper
2016-05-17  9:08         ` Jan Beulich
2016-05-17  9:06       ` Jan Beulich
2016-05-17 10:57   ` Jan Beulich
2016-05-17 13:08     ` Andrew Cooper [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=573B17B7.5090606@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=osstest-admin@xenproject.org \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.