From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Ed Swierk <eswierk@skyportsystems.com>,
david.vrabel@citrix.com, jgross@suse.com
Cc: xen-devel@lists.xensource.com
Subject: Re: PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
Date: Mon, 23 May 2016 16:13:42 -0400
Message-ID: <57436476.9000100@oracle.com>
In-Reply-To: <20160523141523.GB9487@char.us.oracle.com>

On 05/23/2016 10:15 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, May 20, 2016 at 04:58:09PM -0700, Ed Swierk wrote:
>> (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802286c3 create_bounce_frame+0x12b/0x13a
>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> (XEN) ----[ Xen-4.5.4-pre x86_64 debug=n Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e033:[<ffffffff81053cbd>]
>> (XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest (d0v0)
>> (XEN) rax: 0000000000000022 rbx: 00000000ffffffff rcx: 0000000000000000
>> (XEN) rdx: 0000000000000022 rsi: 0000000000000003 rdi: 0000000000000000
>> (XEN) rbp: ffffffff81b67ea8 rsp: ffffffff81b67e68 r8: 0000000000000001
>> (XEN) r9: 0000000000000001 r10: ffffffff81b67f20 r11: 6c61765f74617020
>> (XEN) r12: 0000000000000000 r13: 0000000000000003 r14: 0000000000000000
>> (XEN) r15: ffffffff81b67ebb cr0: 000000008005003b cr4: 00000000001526b0
>> (XEN) cr3: 00000001b16eb000 cr2: 0000000000000000
>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
>> (XEN) Guest stack trace from rsp=ffffffff81b67e68:
>> (XEN) 0000000000000000 6c61765f74617020 ffffffff81053cbd 000000010000e030
>> (XEN) 0000000000010006 ffffffff81b67ea8 000000000000e02b ffffffff81b67f20
>> (XEN) ffffffff81b67f10 ffffffff8105b339 55ffffff81b67f10 5520204355202043
>> (XEN) 5520204355202043 5520204355202043 0020204355202043 0000000000000000
>> (XEN) 0000000000000000 ffffffff81b67f38 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 ffffffff81b67ff0 ffffffff82010d0a 0000000000000000
>> (XEN) 000306f200000000 fed8320300010800 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 ffffffff81b68008 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 00000000fffedb08
>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>
>> The crash occurs in pat_init_cache_modes(), called by
>> xen_start_kernel(). The pat value from MSR_IA32_CR_PAT is 0.
>> Strangely, the same kernel and Xen boot just fine on VMware Fusion
>> 8.1.1, even though the MSR is 0 there as well.
Are you hitting the BUG_ON() in update_cache_mode_entry()? I don't see
how you could avoid it when the MSR read returns 0.
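To make the question concrete, here is a minimal userspace sketch (not the kernel code) of how a raw PAT value splits into its eight entries. The helper names are hypothetical, and it assumes the check in update_cache_mode_entry() is that entry 0 must be write-back. With pat == 0 every entry decodes to UC, so entry 0 would fail that check.

```c
#include <assert.h>
#include <stdint.h>

/* Memory types as encoded in the low 3 bits of each PAT entry.
 * Values 2 and 3 are reserved. */
enum pat_type {
    PAT_UC       = 0,  /* uncacheable */
    PAT_WC       = 1,  /* write-combining */
    PAT_WT       = 4,  /* write-through */
    PAT_WP       = 5,  /* write-protected */
    PAT_WB       = 6,  /* write-back */
    PAT_UC_MINUS = 7   /* uncached, overridable by MTRRs */
};

/* Extract entry i (0..7) from a raw PAT MSR value.
 * Each entry occupies one byte; only the low 3 bits are used. */
static unsigned pat_entry(uint64_t pat, unsigned i)
{
    assert(i < 8);
    return (unsigned)((pat >> (i * 8)) & 7);
}

/* Human-readable name for a decoded entry. */
static const char *pat_type_name(unsigned t)
{
    switch (t) {
    case PAT_UC:       return "UC";
    case PAT_WC:       return "WC";
    case PAT_WT:       return "WT";
    case PAT_WP:       return "WP";
    case PAT_WB:       return "WB";
    case PAT_UC_MINUS: return "UC-";
    default:           return "reserved";
    }
}

/* Hypothetical helper mirroring the ASSUMED kernel condition
 * "entry 0 must be WB". Returns nonzero when the condition holds. */
static int pat_entry0_is_wb(uint64_t pat)
{
    return pat_entry(pat, 0) == PAT_WB;
}
```

For comparison, the architectural power-on default 0x0007040600070406 decodes entry 0 as WB, so pat_entry0_is_wb() holds for it but not for 0.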
>>
>> Anyway, guessing that it's pointless to call pat_init_cache_modes()
>> when the CPU doesn't support PAT, I added a check for cpu_has_pat.
>> This resolves the problem on ESXi and doesn't seem to break real
>> hardware, though I'm not sure how to verify PAT functionality. So
>> this is just an RFC.
Can you start an HVM guest in Xen after your patch below?
> Cc-ing maintainers.
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 9a29803..209f680 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1633,8 +1633,12 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>  	 * Modify the cache mode translation tables to match Xen's PAT
>>  	 * configuration.
>>  	 */
>> -	rdmsrl(MSR_IA32_CR_PAT, pat);
>> -	pat_init_cache_modes(pat);
>> +	if (cpu_has_pat) {
>> +		rdmsrl(MSR_IA32_CR_PAT, pat);
>> +		pat_init_cache_modes(pat);
>> +	} else {
>> +		xen_raw_console_write("CPU does not support PAT\n");
>> +	}
>>
>>  	/* keep using Xen gdt for now; no urgent need to change it */
>>
This looks OK to me but I think we should first understand why you don't
crash on Fusion.
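For reference, the PAT feature flag that a cpu_has_pat style test looks at is CPUID leaf 1, EDX bit 16. Below is a small sketch of that test as a pure function. The function and macro names are hypothetical and this is not the kernel's implementation; comparing the bit as seen by the guest on ESXi and on Fusion may be one way to narrow down the difference.

```c
#include <stdint.h>

/* PAT feature flag: CPUID leaf 1, EDX bit 16. */
#define CPUID_1_EDX_PAT (UINT32_C(1) << 16)

/* Hypothetical helper: given the EDX value returned by CPUID leaf 1,
 * report whether the PAT feature bit is set. */
static int cpuid_edx_has_pat(uint32_t edx)
{
    return (edx & CPUID_1_EDX_PAT) != 0;
}
```

With GCC or Clang on x86, the EDX value could be obtained with __get_cpuid(1, &eax, &ebx, &ecx, &edx) from <cpuid.h> and passed to cpuid_edx_has_pat().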
Also, the PAT initialization code has been rewritten in Linux (for 4.5?), so
I suspect this problem is only observed on earlier kernels.

-boris