xen-devel.lists.xenproject.org archive mirror
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Ed Swierk <eswierk@skyportsystems.com>,
	david.vrabel@citrix.com, jgross@suse.com
Cc: xen-devel@lists.xensource.com
Subject: Re: PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
Date: Mon, 23 May 2016 16:13:42 -0400
Message-ID: <57436476.9000100@oracle.com>
In-Reply-To: <20160523141523.GB9487@char.us.oracle.com>

On 05/23/2016 10:15 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, May 20, 2016 at 04:58:09PM -0700, Ed Swierk wrote:
>> (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802286c3 create_bounce_frame+0x12b/0x13a
>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> (XEN) ----[ Xen-4.5.4-pre  x86_64  debug=n  Not tainted ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e033:[<ffffffff81053cbd>]
>> (XEN) RFLAGS: 0000000000000206   EM: 1   CONTEXT: pv guest (d0v0)
>> (XEN) rax: 0000000000000022   rbx: 00000000ffffffff   rcx: 0000000000000000
>> (XEN) rdx: 0000000000000022   rsi: 0000000000000003   rdi: 0000000000000000
>> (XEN) rbp: ffffffff81b67ea8   rsp: ffffffff81b67e68   r8:  0000000000000001
>> (XEN) r9:  0000000000000001   r10: ffffffff81b67f20   r11: 6c61765f74617020
>> (XEN) r12: 0000000000000000   r13: 0000000000000003   r14: 0000000000000000
>> (XEN) r15: ffffffff81b67ebb   cr0: 000000008005003b   cr4: 00000000001526b0
>> (XEN) cr3: 00000001b16eb000   cr2: 0000000000000000
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
>> (XEN) Guest stack trace from rsp=ffffffff81b67e68:
>> (XEN)    0000000000000000 6c61765f74617020 ffffffff81053cbd 000000010000e030
>> (XEN)    0000000000010006 ffffffff81b67ea8 000000000000e02b ffffffff81b67f20
>> (XEN)    ffffffff81b67f10 ffffffff8105b339 55ffffff81b67f10 5520204355202043
>> (XEN)    5520204355202043 5520204355202043 0020204355202043 0000000000000000
>> (XEN)    0000000000000000 ffffffff81b67f38 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 ffffffff81b67ff0 ffffffff82010d0a 0000000000000000
>> (XEN)    000306f200000000 fed8320300010800 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 ffffffff81b68008 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 00000000fffedb08
>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>
>> The crash occurs in pat_init_cache_modes(), called by
>> xen_start_kernel().  The pat value from MSR_IA32_CR_PAT is 0.
>> Strangely, the same kernel and Xen boot just fine on VMware Fusion
>> 8.1.1, even though the MSR is 0 there as well.
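
The register dump above carries a clue beyond the addresses: several words are printable ASCII when read as little-endian bytes. r11 (0x6c61765f74617020) reads " pat_val". The stack bytes, starting at the top byte of 0x55ffffff81b67f10 and running through the next four words, spell "UC  " eight times followed by a NUL. That looks like a 32-character PAT configuration string in which every entry decoded as UC, which is what an all-zero MSR would produce (PAT type 0 is UC). The helper below is only a decoding aid written for this note; its name is hypothetical and it is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: render n 64-bit words (as printed in a Xen
 * register/stack dump) as the bytes they occupy in memory on a
 * little-endian x86 CPU, lowest-address byte first. Non-printable
 * bytes, including NUL, are shown as '.'.
 * out must have room for 8 * n + 1 characters. */
static void words_to_ascii(const uint64_t *words, size_t n, char *out)
{
	assert(words != NULL && out != NULL);
	for (size_t w = 0; w < n; w++) {
		char bytes[8];
		for (int i = 0; i < 8; i++) {
			/* byte i of the word is stored at the i-th lowest address */
			unsigned char c = (unsigned char)(words[w] >> (8 * i));
			bytes[i] = (c >= 0x20 && c <= 0x7e) ? (char)c : '.';
		}
		memcpy(out + 8 * w, bytes, sizeof(bytes));
	}
	out[8 * n] = '\0';
}
```

Fed the three 0x5520204355202043 words and the 0x0020204355202043 word that follows them, it produces "C  UC  U" three times and then "C  UC  .", where the trailing '.' is the NUL terminator.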

Are you hitting the BUG_ON in update_cache_mode_entry()? I don't see how
you could avoid it when the MSR read returns 0.
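
To make the BUG_ON question concrete: IA32_PAT holds eight 8-bit entries, and each entry selects a memory type (0 = UC, 1 = WC, 4 = WT, 5 = WP, 6 = WB, 7 = UC-; 2 and 3 are reserved). Linux expects PAT entry 0 to be write-back, and that is the invariant the BUG_ON in update_cache_mode_entry() checks. With an MSR value of 0 every entry decodes as UC, so the check on entry 0 cannot pass. Because x86 BUG_ON() executes ud2, that would also explain the invalid opcode (#6) trap in the log. The sketch below is a user-space model of the check under those assumptions, not the kernel code; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PAT memory-type encodings (Intel SDM); 2 and 3 are reserved. */
enum pat_type {
	PAT_UC = 0,
	PAT_WC = 1,
	PAT_WT = 4,
	PAT_WP = 5,
	PAT_WB = 6,
	PAT_UC_MINUS = 7,
};

/* Hypothetical helper: memory type stored in PAT entry idx (0..7).
 * Only the low 3 bits of each byte are kept in this model. */
static unsigned pat_entry(uint64_t pat, unsigned idx)
{
	assert(idx < 8);
	return (unsigned)(pat >> (idx * 8)) & 7;
}

/* Models the "entry 0 must be WB" invariant. Returns false in the
 * situation where the kernel's BUG_ON would fire. */
static bool pat_entry0_is_wb(uint64_t pat)
{
	return pat_entry(pat, 0) == PAT_WB;
}
```

For comparison, the architectural power-on default of IA32_PAT, 0x0007040600070406, decodes as WB, WT, UC-, UC, WB, WT, UC-, UC and passes the check, while 0 decodes as eight UC entries and fails it.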


>>
>> Anyway, guessing that it's pointless to call pat_init_cache_modes()
>> when the CPU doesn't support PAT, I added a check for cpu_has_pat.
>> This resolves the problem on ESXi and doesn't seem to break real
>> hardware, though I'm not sure how to verify PAT functionality.  So
>> this is just an RFC.

Can you start an HVM guest in Xen after your patch below?

> Cc-ing maintainers.
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 9a29803..209f680 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1633,8 +1633,12 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>  	 * Modify the cache mode translation tables to match Xen's PAT
>>  	 * configuration.
>>  	 */
>> -	rdmsrl(MSR_IA32_CR_PAT, pat);
>> -	pat_init_cache_modes(pat);
>> +	if (cpu_has_pat) {
>> +		rdmsrl(MSR_IA32_CR_PAT, pat);
>> +		pat_init_cache_modes(pat);
>> +	} else {
>> +		xen_raw_console_write("CPU does not support PAT\n");
>> +	}
>>  
>>  	/* keep using Xen gdt for now; no urgent need to change it */
>>
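
The guard in the patch depends on the PAT feature flag, which x86 reports in CPUID leaf 1, EDX bit 16. The sketch below performs the same test from user space using the GCC/Clang <cpuid.h> header. It is an illustration under assumptions, not the kernel path: in a PV dom0, Xen may filter the CPUID bits a guest sees, so a user-space result need not match what xen_start_kernel() observes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define CPUID1_EDX_PAT (1u << 16) /* CPUID.01H:EDX[16]: PAT supported */

/* Hypothetical helper: pure check on an EDX value from CPUID leaf 1. */
static bool edx_reports_pat(uint32_t edx)
{
	return (edx & CPUID1_EDX_PAT) != 0;
}

/* Hypothetical helper: ask the CPU we are running on.
 * Returns false when CPUID leaf 1 is unavailable or on non-x86 hosts. */
static bool cpu_reports_pat(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false; /* leaf 1 not supported */
	return edx_reports_pat(edx);
#else
	return false; /* not an x86 host */
#endif
}
```

Running a check like this inside both the ESXi and the Fusion VMs would be one way to see whether the two hypervisors advertise PAT differently to the guest.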

This looks OK to me, but I think we should first understand why you don't
crash on Fusion.

Also, the PAT initialization code has been rewritten in Linux (for 4.5?),
so I suspect this problem is only observed on earlier kernels.

-boris


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Thread overview: 8+ messages
2016-05-20 23:58 PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi Ed Swierk
2016-05-23 14:15 ` Konrad Rzeszutek Wilk
2016-05-23 20:13   ` Boris Ostrovsky [this message]
2016-05-23 22:52     ` Ed Swierk
2016-05-24 14:53       ` Kani, Toshimitsu
2016-05-24 15:25         ` Ed Swierk
2016-05-24 15:54         ` Boris Ostrovsky
2016-05-24 16:59           ` Kani, Toshimitsu
