* PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
@ 2016-05-20 23:58 Ed Swierk
2016-05-23 14:15 ` Konrad Rzeszutek Wilk
From: Ed Swierk @ 2016-05-20 23:58 UTC
To: xen-devel; +Cc: eswierk
I've encountered two problems booting a Linux 4.4 dom0 on a recent
stable Xen 4.5 under VMware ESXi 5.5.0.
One has the same "ata_piix: probe of 0000:00:07.1 failed with error
-22" symptom discussed some time ago, and prevents the kernel from
seeing any of the virtual IDE drives exposed by VMware. This problem
is fixed by applying Stefano's patch
(https://lkml.org/lkml/2016/4/20/345).
Another problem occurs very early during boot:
(XEN) Xen version 4.5.4-pre ( 4.5.4~pre-1skyport2) (eswierk@skyportsystems.com) (gcc (Debian 5.2.1-19.1skyport1) 5.2.1 20150930) debug=n Thu May 19 12:06:20 PDT 2016
(XEN) Bootloader: SYSLINUX 4.05 20140113
(XEN) Command line: xen console=com1,vga com1=115200 no-bootscrub dom0_mem=2048M,max:2048M ignore_loglevel
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN) Found 1 MBR signatures
(XEN) Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009f800 (usable)
(XEN) 000000000009f800 - 00000000000a0000 (reserved)
(XEN) 00000000000dc000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000bfef0000 (usable)
(XEN) 00000000bfef0000 - 00000000bfeff000 (ACPI data)
(XEN) 00000000bfeff000 - 00000000bff00000 (ACPI NVS)
(XEN) 00000000bff00000 - 00000000c0000000 (usable)
(XEN) 00000000f0000000 - 00000000f8000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec10000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000fffe0000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 00000001c0000000 (usable)
(XEN) ACPI: RSDP 000F6B80, 0024 (r2 PTLTD )
(XEN) ACPI: XSDT BFEF0F70, 0054 (r1 INTEL 440BX 6040000 VMW 1324272)
(XEN) ACPI: FACP BFEFEE73, 00F4 (r4 INTEL 440BX 6040000 PTL F4240)
(XEN) ACPI: DSDT BFEF1252, DC21 (r1 PTLTD Custom 6040000 MSFT 3000001)
(XEN) ACPI: FACS BFEFFFC0, 0040
(XEN) ACPI: BOOT BFEF122A, 0028 (r1 PTLTD $SBFTBL$ 6040000 LTP 1)
(XEN) ACPI: APIC BFEF1194, 0096 (r1 PTLTD APIC 6040000 LTP 0)
(XEN) ACPI: MCFG BFEF1158, 003C (r1 PTLTD $PCITBL$ 6040000 LTP 1)
(XEN) ACPI: SRAT BFEF1028, 0130 (r2 VMWARE MEMPLUG 6040000 VMW 1)
(XEN) ACPI: WAET BFEF1000, 0028 (r1 VMWARE VMW WAET 6040000 VMW 1)
(XEN) System RAM: 6143MB (6291004kB)
(XEN) Domain heap initialised
(XEN) Processor #0 7:15 APIC version 21
(XEN) Processor #2 7:15 APIC version 21
(XEN) Processor #4 7:15 APIC version 21
(XEN) Processor #6 7:15 APIC version 21
(XEN) Processor #8 7:15 APIC version 21
(XEN) Processor #10 7:15 APIC version 21
(XEN) IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2299.474 MHz processor.
(XEN) Initing memory sharing.
(XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7
(XEN) I/O virtualisation disabled
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) Platform timer is 3.579MHz ACPI PM Timer
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN) - APIC TPR shadow
(XEN) - Extended Page Tables (EPT)
(XEN) - Virtual-Processor Identifiers (VPID)
(XEN) - Virtual NMI
(XEN) - MSR direct-access bitmap
(XEN) - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) not detected
(XEN) Brought up 6 CPUs
(XEN) Dom0 has maximum 600 PIRQs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x3000000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 00000001b0000000->00000001b4000000 (504002 pages to be allocated)
(XEN) Init. ramdisk: 00000001bf0c2000->00000001bffffe00
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff81000000->ffffffff83000000
(XEN) Init. ramdisk: 0000000000000000->0000000000000000
(XEN) Phys-Mach map: 0000008000000000->0000008000400000
(XEN) Start info: ffffffff83000000->ffffffff830004b4
(XEN) Page tables: ffffffff83001000->ffffffff8301e000
(XEN) Boot stack: ffffffff8301e000->ffffffff8301f000
(XEN) TOTAL: ffffffff80000000->ffffffff83400000
(XEN) ENTRY ADDRESS: ffffffff8200d1f0
(XEN) Dom0 has maximum 6 VCPUs
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 308kB init memory.
mapping kernel into physical memory
(XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d0802286c3 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.5.4-pre x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff81053cbd>]
(XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest (d0v0)
(XEN) rax: 0000000000000022 rbx: 00000000ffffffff rcx: 0000000000000000
(XEN) rdx: 0000000000000022 rsi: 0000000000000003 rdi: 0000000000000000
(XEN) rbp: ffffffff81b67ea8 rsp: ffffffff81b67e68 r8: 0000000000000001
(XEN) r9: 0000000000000001 r10: ffffffff81b67f20 r11: 6c61765f74617020
(XEN) r12: 0000000000000000 r13: 0000000000000003 r14: 0000000000000000
(XEN) r15: ffffffff81b67ebb cr0: 000000008005003b cr4: 00000000001526b0
(XEN) cr3: 00000001b16eb000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81b67e68:
(XEN) 0000000000000000 6c61765f74617020 ffffffff81053cbd 000000010000e030
(XEN) 0000000000010006 ffffffff81b67ea8 000000000000e02b ffffffff81b67f20
(XEN) ffffffff81b67f10 ffffffff8105b339 55ffffff81b67f10 5520204355202043
(XEN) 5520204355202043 5520204355202043 0020204355202043 0000000000000000
(XEN) 0000000000000000 ffffffff81b67f38 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffffffff81b67ff0 ffffffff82010d0a 0000000000000000
(XEN) 000306f200000000 fed8320300010800 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffffffff81b68008 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 00000000fffedb08
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
The crash occurs in pat_init_cache_modes(), called by
xen_start_kernel(). The pat value from MSR_IA32_CR_PAT is 0.
Strangely, the same kernel and Xen boot just fine on VMware Fusion
8.1.1, even though the MSR is 0 there as well.
Anyway, guessing that it's pointless to call pat_init_cache_modes()
when the CPU doesn't support PAT, I added a check for cpu_has_pat.
This resolves the problem on ESXi and doesn't seem to break real
hardware, though I'm not sure how to verify PAT functionality. So
this is just an RFC.
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 9a29803..209f680 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1633,8 +1633,12 @@ asmlinkage __visible void __init xen_start_kernel(void)
 	 * Modify the cache mode translation tables to match Xen's PAT
 	 * configuration.
 	 */
-	rdmsrl(MSR_IA32_CR_PAT, pat);
-	pat_init_cache_modes(pat);
+	if (cpu_has_pat) {
+		rdmsrl(MSR_IA32_CR_PAT, pat);
+		pat_init_cache_modes(pat);
+	} else {
+		xen_raw_console_write("CPU does not support PAT\n");
+	}
 
 	/* keep using Xen gdt for now; no urgent need to change it */
 
--Ed
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
2016-05-20 23:58 PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi Ed Swierk
@ 2016-05-23 14:15 ` Konrad Rzeszutek Wilk
2016-05-23 20:13 ` Boris Ostrovsky
From: Konrad Rzeszutek Wilk @ 2016-05-23 14:15 UTC
To: Ed Swierk, boris.ostrovsky, david.vrabel, jgross; +Cc: xen-devel
On Fri, May 20, 2016 at 04:58:09PM -0700, Ed Swierk wrote:
> I've encountered two problems booting a Linux 4.4 dom0 on recent
> stable xen 4.5 on VMware ESXi 5.5.0.
>
> One has the same "ata_piix: probe of 0000:00:07.1 failed with error
> -22" symptom discussed some time ago, and prevents the kernel from
> seeing any of the virtual IDE drives exposed by VMware. This problem
> is fixed by applying Stefano's patch
> (https://lkml.org/lkml/2016/4/20/345).
>
> Another problem occurs very early during boot:
>
> [...]
>
> The crash occurs in pat_init_cache_modes(), called by
> xen_start_kernel(). The pat value from MSR_IA32_CR_PAT is 0.
> Strangely, the same kernel and Xen boot just fine on VMware Fusion
> 8.1.1, even though the MSR is 0 there as well.
>
> Anyway, guessing that it's pointless to call pat_init_cache_modes()
> when the CPU doesn't support PAT, I added a check for cpu_has_pat.
> This resolves the problem on ESXi and doesn't seem to break real
> hardware, though I'm not sure how to verify PAT functionality. So
> this is just an RFC.
Cc-ing maintainers.
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 9a29803..209f680 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1633,8 +1633,12 @@ asmlinkage __visible void __init xen_start_kernel(void)
> * Modify the cache mode translation tables to match Xen's PAT
> * configuration.
> */
> - rdmsrl(MSR_IA32_CR_PAT, pat);
> - pat_init_cache_modes(pat);
> + if (cpu_has_pat) {
> + rdmsrl(MSR_IA32_CR_PAT, pat);
> + pat_init_cache_modes(pat);
> + } else {
> + xen_raw_console_write("CPU does not support PAT\n");
> + }
>
> /* keep using Xen gdt for now; no urgent need to change it */
* Re: PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
2016-05-23 14:15 ` Konrad Rzeszutek Wilk
@ 2016-05-23 20:13 ` Boris Ostrovsky
2016-05-23 22:52 ` Ed Swierk
From: Boris Ostrovsky @ 2016-05-23 20:13 UTC
To: Konrad Rzeszutek Wilk, Ed Swierk, david.vrabel, jgross; +Cc: xen-devel
On 05/23/2016 10:15 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, May 20, 2016 at 04:58:09PM -0700, Ed Swierk wrote:
>> (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802286c3 create_bounce_frame+0x12b/0x13a
>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> [...]
>>
>> The crash occurs in pat_init_cache_modes(), called by
>> xen_start_kernel(). The pat value from MSR_IA32_CR_PAT is 0.
>> Strangely, the same kernel and Xen boot just fine on VMware Fusion
>> 8.1.1, even though the MSR is 0 there as well.
Are you hitting BUG_ON in update_cache_mode_entry()? I don't think I can
see how you can avoid it when MSR read returns 0.
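For readers following along, here is a minimal userspace sketch of the check in question. It is not the kernel code: it assumes the check amounts to "PAT entry 0 must decode to write-back", and the names cache_mode, pat_field_to_cache_mode, pat_entry and entry0_is_wb are made up for illustration. The 3-bit encodings are the architectural PAT memory types; mapping the reserved encodings 2 and 3 to a separate marker is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Cache modes; the numbering is assumed to mirror the kernel's
 * enum page_cache_mode, and CM_RESERVED exists only in this sketch. */
enum cache_mode {
    CM_WB = 0, CM_WC = 1, CM_UC_MINUS = 2, CM_UC = 3, CM_WT = 4, CM_WP = 5,
    CM_RESERVED = -1
};

/* Translate one 3-bit PAT memory-type encoding into a cache mode. */
static enum cache_mode pat_field_to_cache_mode(unsigned field)
{
    switch (field & 7) {
    case 0: return CM_UC;
    case 1: return CM_WC;
    case 4: return CM_WT;
    case 5: return CM_WP;
    case 6: return CM_WB;
    case 7: return CM_UC_MINUS;
    default: return CM_RESERVED;   /* encodings 2 and 3 */
    }
}

/* Extract PAT entry i (0..7): one byte per entry, low 3 bits significant. */
static enum cache_mode pat_entry(uint64_t pat, unsigned i)
{
    assert(i < 8);
    return pat_field_to_cache_mode((unsigned)((pat >> (i * 8)) & 7));
}

/* Model of the entry-0 requirement: nonzero only if entry 0 is WB. */
static int entry0_is_wb(uint64_t pat)
{
    return pat_entry(pat, 0) == CM_WB;
}
```

With an MSR value of 0 every entry decodes to UC, so entry0_is_wb(0) is 0, whereas the architectural power-on default 0x0007040600070406 has WB in entry 0 and passes.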
>>
>> Anyway, guessing that it's pointless to call pat_init_cache_modes()
>> when the CPU doesn't support PAT, I added a check for cpu_has_pat.
>> This resolves the problem on ESXi and doesn't seem to break real
>> hardware, though I'm not sure how to verify PAT functionality. So
>> this is just an RFC.
Can you start an HVM guest in Xen after your patch below?
> Cc-ing maintainers.
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 9a29803..209f680 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1633,8 +1633,12 @@ asmlinkage __visible void __init xen_start_kernel(void)
>> * Modify the cache mode translation tables to match Xen's PAT
>> * configuration.
>> */
>> - rdmsrl(MSR_IA32_CR_PAT, pat);
>> - pat_init_cache_modes(pat);
>> + if (cpu_has_pat) {
>> + rdmsrl(MSR_IA32_CR_PAT, pat);
>> + pat_init_cache_modes(pat);
>> + } else {
>> + xen_raw_console_write("CPU does not support PAT\n");
>> + }
>>
>> /* keep using Xen gdt for now; no urgent need to change it */
>>
This looks OK to me but I think we should first understand why you don't
crash on Fusion.
Also, PAT initialization code has been rewritten in Linux (for 4.5?) so
I suspect this problem is only observed on earlier kernels.
-boris
* Re: PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
2016-05-23 20:13 ` Boris Ostrovsky
@ 2016-05-23 22:52 ` Ed Swierk
2016-05-24 14:53 ` Kani, Toshimitsu
From: Ed Swierk @ 2016-05-23 22:52 UTC
To: Boris Ostrovsky
Cc: Juergen Gross, xen-devel, Toshi Kani, david.vrabel,
Borislav Petkov
Good question. I ran my tests again, and found I'd misinterpreted the
Fusion behavior.
On Fusion 8.1.1, MSR_IA32_CR_PAT returns a reasonable value:
(XEN) Freed 308kB init memory.
mapping kernel into physical memory
cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
pat_init_cache_modes pat=50100070406
pat_init_cache_modes i=7 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=6 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=5 pat_val=5 cache=5
pat_init_cache_modes ok
pat_init_cache_modes i=4 pat_val=1 cache=1
pat_init_cache_modes ok
pat_init_cache_modes i=3 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=2 pat_val=7 cache=2
pat_init_cache_modes ok
pat_init_cache_modes i=1 pat_val=4 cache=4
pat_init_cache_modes ok
pat_init_cache_modes i=0 pat_val=6 cache=0
pat_init_cache_modes ok
pat_init_cache_modes pat_msg=WB WT UC- UC WC WP UC UC
about to get started...
[ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC- UC WC WP UC UC
On ESXi 5.5.0, MSR_IA32_CR_PAT returns 0, and we are indeed hitting
the BUG_ON in update_cache_mode_entry():
(XEN) Freed 312kB init memory.
mapping kernel into physical memory
cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
pat_init_cache_modes pat=0
pat_init_cache_modes i=7 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=6 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=5 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=4 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=3 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=2 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=1 pat_val=0 cache=3
pat_init_cache_modes ok
pat_init_cache_modes i=0 pat_val=0 cache=3
(XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d0802276c3 create_bounce_frame+0x12b/0x13a
In both cases, the PAT CPUID feature bit is set, and cpu_has_pat is
always 0 at this early point (so my RFC patch is wrong). The simplest
fix is to call pat_init_cache_modes(pat) only if pat != 0.
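The observations above can be reproduced with a small userspace sketch. It is not kernel code: it assumes the logged pat= values from MSR_IA32_CR_PAT are hexadecimal and that the feature bit in question is CPUID.1:EDX bit 16. The helper names pat_describe, cpuid_edx_has_pat and xen_pat_value_usable are hypothetical, as is printing "??" for the reserved encodings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Names of the 3-bit PAT memory-type encodings; 2 and 3 are reserved. */
static const char *const pat_type_name[8] = {
    "UC", "WC", "??", "??", "WT", "WP", "WB", "UC-"
};

/* Write the eight entries, entry 0 first, separated by single spaces.
 * The caller must supply at least 32 bytes (8 names of up to 3
 * characters, 7 spaces, and the terminating NUL). */
static void pat_describe(uint64_t pat, char *out)
{
    assert(out != NULL);
    out[0] = '\0';
    for (unsigned i = 0; i < 8; i++) {
        if (i > 0)
            strcat(out, " ");
        strcat(out, pat_type_name[(pat >> (i * 8)) & 7]);
    }
}

/* PAT support is advertised in CPUID leaf 1, EDX bit 16. */
static int cpuid_edx_has_pat(uint32_t edx)
{
    return (int)((edx >> 16) & 1);
}

/* Hypothetical guard modelling the suggestion: skip an all-zero MSR. */
static int xen_pat_value_usable(uint64_t pat)
{
    return pat != 0;
}
```

Decoding 0x50100070406 gives "WB WT UC- UC WC WP UC UC", matching the Fusion log, and the logged cpuid_edx(1)=f89cbf5 has bit 16 set. Decoding 0 gives eight UC entries, which is the ESXi case that a pat != 0 guard would skip.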
This is starting to look like the same logic that's in pat_bsp_init(),
which doesn't seem to be called when booting on Xen. Should it be? Was
Xen deliberately excluded from this PAT emulation change?
https://groups.google.com/d/msg/linux.kernel/JoJKbCOxV0U/PM0I9d1v60kJ
--Ed
* Re: PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
2016-05-23 22:52 ` Ed Swierk
@ 2016-05-24 14:53 ` Kani, Toshimitsu
2016-05-24 15:25 ` Ed Swierk
2016-05-24 15:54 ` Boris Ostrovsky
From: Kani, Toshimitsu @ 2016-05-24 14:53 UTC
To: boris.ostrovsky@oracle.com, eswierk@skyportsystems.com
Cc: jgross@suse.com, xen-devel@lists.xensource.com, toshi.kani@hp.com,
david.vrabel@citrix.com, bp@suse.de
On Mon, 2016-05-23 at 15:52 -0700, Ed Swierk wrote:
> [...]
> This is starting to look like the same logic that's in pat_bsp_init(),
> which doesn't seem to be called when booting on Xen. Should it be? Was
> Xen deliberately excluded from this PAT emulation change?
> https://groups.google.com/d/msg/linux.kernel/JoJKbCOxV0U/PM0I9d1v60kJ
Calling pat_init() requires the CPU rendezvous handler in MTRR, which is
disabled in Xen. This PAT initialization has been problematic, and the
following patches addressed it in 4.6. This will fix your problem as
well.
https://lkml.org/lkml/2016/3/23/500
In particular, patch 6/7 removed the Xen code in question.
https://lkml.org/lkml/2016/3/23/503
Do you need to fix this issue in 4.4? If so, we should be able to request
backporting the patches to 4.4 stable.
-Toshi
> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
> > > > 0000000000000000
> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
> > > > 0000000000000000
> > > > (XEN) 0000000000000000 ffffffff81b68008 0000000000000000
> > > > 0000000000000000
> > > > (XEN) 0000000000000000 0000000000000000 00000000fffedb08
> > > > (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
> > > > (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
> > > >
> > > > The crash occurs in pat_init_cache_modes(), called by
> > > > xen_start_kernel(). The pat value from MSR_IA32_CR_PAT is 0.
> > > > Strangely, the same kernel and Xen boot just fine on VMware Fusion
> > > > 8.1.1, even though the MSR is 0 there as well.
> > Are you hitting BUG_ON in update_cache_mode_entry()? I don't think I can
> > see how you can avoid it when MSR read returns 0.
> >
> >
> > >
> > > >
> > > >
> > > > Anyway, guessing that it's pointless to call pat_init_cache_modes()
> > > > when the CPU doesn't support PAT, I added a check for cpu_has_pat.
> > > > This resolves the problem on ESXi and doesn't seem to break real
> > > > hardware, though I'm not sure how to verify PAT functionality. So
> > > > this is just an RFC.
> > Can you start an HVM guest in Xen after your patch below?
> >
> > >
> > > Cc-ing maintainers.
> > > >
> > > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > > > index 9a29803..209f680 100644
> > > > --- a/arch/x86/xen/enlighten.c
> > > > +++ b/arch/x86/xen/enlighten.c
> > > > @@ -1633,8 +1633,12 @@ asmlinkage __visible void __init
> > > > xen_start_kernel(void)
> > > > * Modify the cache mode translation tables to match Xen's PAT
> > > > * configuration.
> > > > */
> > > > - rdmsrl(MSR_IA32_CR_PAT, pat);
> > > > - pat_init_cache_modes(pat);
> > > > + if (cpu_has_pat) {
> > > > + rdmsrl(MSR_IA32_CR_PAT, pat);
> > > > + pat_init_cache_modes(pat);
> > > > + } else {
> > > > + xen_raw_console_write("CPU does not support PAT\n");
> > > > + }
> > > >
> > > > /* keep using Xen gdt for now; no urgent need to change it */
> > > >
> > This looks OK to me but I think we should first understand why you don't
> > crash on Fusion.
> >
> > Also, PAT initialization code has been rewritten in Linux (for 4.5?) so
> > I suspect this problem is only observed on earlier kernels.
> >
> > -boris
> >
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
2016-05-24 14:53 ` Kani, Toshimitsu
@ 2016-05-24 15:25 ` Ed Swierk
2016-05-24 15:54 ` Boris Ostrovsky
1 sibling, 0 replies; 8+ messages in thread
From: Ed Swierk @ 2016-05-24 15:25 UTC (permalink / raw)
To: Kani, Toshimitsu
Cc: jgross@suse.com, xen-devel@lists.xensource.com, toshi.kani@hp.com,
david.vrabel@citrix.com, boris.ostrovsky@oracle.com, bp@suse.de

Yes, we're just now moving to 4.4 stable, and will be there for a
while, so backporting would be very helpful.

--Ed

On Tue, May 24, 2016 at 7:53 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Mon, 2016-05-23 at 15:52 -0700, Ed Swierk wrote:
>> Good question. I ran my tests again, and found I'd misinterpreted the
>> Fusion behavior.
>>
>> On Fusion 8.1.1, MSR_IA32_CR_PAT returns a reasonable value:
>>
>> (XEN) Freed 308kB init memory.
>> mapping kernel into physical memory
>> cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
>> pat_init_cache_modes pat=50100070406
>> pat_init_cache_modes i=7 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=6 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=5 pat_val=5 cache=5
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=4 pat_val=1 cache=1
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=3 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=2 pat_val=7 cache=2
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=1 pat_val=4 cache=4
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=0 pat_val=6 cache=0
>> pat_init_cache_modes ok
>> pat_init_cache_modes pat_msg=WB WT UC- UC WC WP UC UC
>> about to get started...
>> [ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC-
>> UC WC WP UC UC
>>
>> On ESXi 5.5.0, MSR_IA32_CR_PAT returns 0, and we are indeed hitting
>> the BUG_ON in update_cache_mode_entry():
>>
>> (XEN) Freed 312kB init memory.
>> mapping kernel into physical memory
>> cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
>> pat_init_cache_modes pat=0
>> pat_init_cache_modes i=7 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=6 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=5 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=4 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=3 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=2 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=1 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=0 pat_val=0 cache=3
>> (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on
>> VCPU 0 [ec=0000]
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802276c3
>> create_bounce_frame+0x12b/0x13a
>>
>> In both cases, the PAT CPUID feature bit is set, and cpu_has_pat is
>> always 0 at this early point (so my RFC patch is wrong). The simplest
>> fix is to call pat_init_cache_modes(pat) only if pat != 0.
>>
>> This is starting to look like the same logic that's in pat_bsp_init(),
>> which doesn't seem to be called when booting on Xen. Should it be? Was
>> Xen deliberately excluded from this PAT emulation change?
>> https://groups.google.com/d/msg/linux.kernel/JoJKbCOxV0U/PM0I9d1v60kJ
>
> Calling pat_init() requires the CPU rendezvous handler in MTRR, which is
> disabled in Xen. This PAT initialization has been problematic, and the
> following patches addressed it in 4.6. This will fix your problem as
> well.
> https://lkml.org/lkml/2016/3/23/500
>
> In particular, patch 6/7 removed the Xen code in question.
> https://lkml.org/lkml/2016/3/23/503
>
> Do you need to fix this issue in 4.4? If so, we should be able to request
> backporting the patches to 4.4 stable.
>
> -Toshi
>
>
>>
>> --Ed
>>
>>
>> On Mon, May 23, 2016 at 1:13 PM, Boris Ostrovsky
>> <boris.ostrovsky@oracle.com> wrote:
>> >
>> > On 05/23/2016 10:15 AM, Konrad Rzeszutek Wilk wrote:
>> > >
>> > > On Fri, May 20, 2016 at 04:58:09PM -0700, Ed Swierk wrote:
>> > > >
>> > > > (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on
>> > > > VCPU 0 [ec=0000]
>> > > > (XEN) domain_crash_sync called from entry.S: fault at
>> > > > ffff82d0802286c3 create_bounce_frame+0x12b/0x13a
>> > > > (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> > > > (XEN) ----[ Xen-4.5.4-pre x86_64 debug=n Not tainted ]----
>> > > > (XEN) CPU: 0
>> > > > (XEN) RIP: e033:[<ffffffff81053cbd>]
>> > > > (XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest (d0v0)
>> > > > (XEN) rax: 0000000000000022 rbx: 00000000ffffffff rcx:
>> > > > 0000000000000000
>> > > > (XEN) rdx: 0000000000000022 rsi: 0000000000000003 rdi:
>> > > > 0000000000000000
>> > > > (XEN) rbp: ffffffff81b67ea8 rsp:
>> > > > ffffffff81b67e68 r8: 0000000000000001
>> > > > (XEN) r9: 0000000000000001 r10: ffffffff81b67f20 r11:
>> > > > 6c61765f74617020
>> > > > (XEN) r12: 0000000000000000 r13: 0000000000000003 r14:
>> > > > 0000000000000000
>> > > > (XEN) r15: ffffffff81b67ebb cr0: 000000008005003b cr4:
>> > > > 00000000001526b0
>> > > > (XEN) cr3: 00000001b16eb000 cr2: 0000000000000000
>> > > > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs:
>> > > > e033
>> > > > (XEN) Guest stack trace from rsp=ffffffff81b67e68:
>> > > > (XEN) 0000000000000000 6c61765f74617020 ffffffff81053cbd
>> > > > 000000010000e030
>> > > > (XEN) 0000000000010006 ffffffff81b67ea8 000000000000e02b
>> > > > ffffffff81b67f20
>> > > > (XEN) ffffffff81b67f10 ffffffff8105b339 55ffffff81b67f10
>> > > > 5520204355202043
>> > > > (XEN) 5520204355202043 5520204355202043 0020204355202043
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 ffffffff81b67f38 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 ffffffff81b67ff0 ffffffff82010d0a
>> > > > 0000000000000000
>> > > > (XEN) 000306f200000000 fed8320300010800 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 ffffffff81b68008 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 00000000fffedb08
>> > > > (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>> > > > (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>> > > >
>> > > > The crash occurs in pat_init_cache_modes(), called by
>> > > > xen_start_kernel(). The pat value from MSR_IA32_CR_PAT is 0.
>> > > > Strangely, the same kernel and Xen boot just fine on VMware Fusion
>> > > > 8.1.1, even though the MSR is 0 there as well.
>> > Are you hitting BUG_ON in update_cache_mode_entry()? I don't think I
>> > can
>> > see how you can avoid it when MSR read returns 0.
>> >
>> >
>> > >
>> > > >
>> > > >
>> > > > Anyway, guessing that it's pointless to call pat_init_cache_modes()
>> > > > when the CPU doesn't support PAT, I added a check for cpu_has_pat.
>> > > > This resolves the problem on ESXi and doesn't seem to break real
>> > > > hardware, though I'm not sure how to verify PAT functionality. So
>> > > > this is just an RFC.
>> > Can you start an HVM guest in Xen after your patch below?
>> >
>> > >
>> > > Cc-ing maintainers.
>> > > >
>> > > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> > > > index 9a29803..209f680 100644
>> > > > --- a/arch/x86/xen/enlighten.c
>> > > > +++ b/arch/x86/xen/enlighten.c
>> > > > @@ -1633,8 +1633,12 @@ asmlinkage __visible void __init
>> > > > xen_start_kernel(void)
>> > > > * Modify the cache mode translation tables to match Xen's PAT
>> > > > * configuration.
>> > > > */
>> > > > - rdmsrl(MSR_IA32_CR_PAT, pat);
>> > > > - pat_init_cache_modes(pat);
>> > > > + if (cpu_has_pat) {
>> > > > + rdmsrl(MSR_IA32_CR_PAT, pat);
>> > > > + pat_init_cache_modes(pat);
>> > > > + } else {
>> > > > + xen_raw_console_write("CPU does not support PAT\n");
>> > > > + }
>> > > >
>> > > > /* keep using Xen gdt for now; no urgent need to change it */
>> > > >
>> > This looks OK to me but I think we should first understand why you
>> > don't
>> > crash on Fusion.
>> >
>> > Also, PAT initialization code has been rewritten in Linux (for 4.5?) so
>> > I suspect this problem is only observed on earlier kernels.
>> >
>> > -boris
>> >
* Re: PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
2016-05-24 14:53 ` Kani, Toshimitsu
2016-05-24 15:25 ` Ed Swierk
@ 2016-05-24 15:54 ` Boris Ostrovsky
2016-05-24 16:59 ` Kani, Toshimitsu
1 sibling, 1 reply; 8+ messages in thread
From: Boris Ostrovsky @ 2016-05-24 15:54 UTC (permalink / raw)
To: Kani, Toshimitsu, eswierk@skyportsystems.com
Cc: jgross@suse.com, xen-devel@lists.xensource.com, toshi.kani@hp.com,
david.vrabel@citrix.com, bp@suse.de
On 05/24/2016 10:53 AM, Kani, Toshimitsu wrote:
> On Mon, 2016-05-23 at 15:52 -0700, Ed Swierk wrote:
>> Good question. I ran my tests again, and found I'd misinterpreted the
>> Fusion behavior.
>>
>> On Fusion 8.1.1, MSR_IA32_CR_PAT returns a reasonable value:
>>
>> (XEN) Freed 308kB init memory.
>> mapping kernel into physical memory
>> cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
>> pat_init_cache_modes pat=50100070406
>> pat_init_cache_modes i=7 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=6 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=5 pat_val=5 cache=5
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=4 pat_val=1 cache=1
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=3 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=2 pat_val=7 cache=2
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=1 pat_val=4 cache=4
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=0 pat_val=6 cache=0
>> pat_init_cache_modes ok
>> pat_init_cache_modes pat_msg=WB WT UC- UC WC WP UC UC
>> about to get started...
>> [ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC-
>> UC WC WP UC UC
>>
>> On ESXi 5.5.0, MSR_IA32_CR_PAT returns 0, and we are indeed hitting
>> the BUG_ON in update_cache_mode_entry():
>>
>> (XEN) Freed 312kB init memory.
>> mapping kernel into physical memory
>> cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
>> pat_init_cache_modes pat=0
>> pat_init_cache_modes i=7 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=6 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=5 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=4 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=3 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=2 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=1 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=0 pat_val=0 cache=3
>> (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on
>> VCPU 0 [ec=0000]
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802276c3
>> create_bounce_frame+0x12b/0x13a
>>
>> In both cases, the PAT CPUID feature bit is set, and cpu_has_pat is
>> always 0 at this early point (so my RFC patch is wrong). The simplest
>> fix is to call pat_init_cache_modes(pat) only if pat != 0.
>>
>> This is starting to look like the same logic that's in pat_bsp_init(),
>> which doesn't seem to be called when booting on Xen. Should it be? Was
>> Xen deliberately excluded from this PAT emulation change?
>> https://groups.google.com/d/msg/linux.kernel/JoJKbCOxV0U/PM0I9d1v60kJ
> Calling pat_init() requires the CPU rendezvous handler in MTRR, which is
> disabled in Xen. This PAT initialization has been problematic, and the
> following patches addressed it in 4.6. This will fix your problem as
> well.
> https://lkml.org/lkml/2016/3/23/500
>
> In particular, patch 6/7 removed the Xen code in question.
> https://lkml.org/lkml/2016/3/23/503
>
> Do you need to fix this issue in 4.4? If so, we should be able to request
> backporting the patches to 4.4 stable.

Would disabling PAT when the MSR is clearly broken (and not trying to
emulate it) not work?

-boris
* Re: PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
2016-05-24 15:54 ` Boris Ostrovsky
@ 2016-05-24 16:59 ` Kani, Toshimitsu
0 siblings, 0 replies; 8+ messages in thread
From: Kani, Toshimitsu @ 2016-05-24 16:59 UTC (permalink / raw)
To: boris.ostrovsky@oracle.com, eswierk@skyportsystems.com
Cc: jgross@suse.com, xen-devel@lists.xensource.com, Kani, Toshimitsu,
x86@kernel.org, david.vrabel@citrix.com, bp@suse.de
On Tue, 2016-05-24 at 11:54 -0400, Boris Ostrovsky wrote:
> On 05/24/2016 10:53 AM, Kani, Toshimitsu wrote:
> >
> > On Mon, 2016-05-23 at 15:52 -0700, Ed Swierk wrote:
> > >
> > > Good question. I ran my tests again, and found I'd misinterpreted the
> > > Fusion behavior.
> > >
> > > On Fusion 8.1.1, MSR_IA32_CR_PAT returns a reasonable value:
> > >
> > > (XEN) Freed 308kB init memory.
> > > mapping kernel into physical memory
> > > cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
> > > pat_init_cache_modes pat=50100070406
> > > pat_init_cache_modes i=7 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=6 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=5 pat_val=5 cache=5
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=4 pat_val=1 cache=1
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=3 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=2 pat_val=7 cache=2
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=1 pat_val=4 cache=4
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=0 pat_val=6 cache=0
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes pat_msg=WB WT UC- UC WC WP UC UC
> > > about to get started...
> > > [ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC-
> > > UC WC WP UC UC
> > >
> > > On ESXi 5.5.0, MSR_IA32_CR_PAT returns 0, and we are indeed hitting
> > > the BUG_ON in update_cache_mode_entry():
> > >
> > > (XEN) Freed 312kB init memory.
> > > mapping kernel into physical memory
> > > cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
> > > pat_init_cache_modes pat=0
> > > pat_init_cache_modes i=7 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=6 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=5 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=4 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=3 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=2 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=1 pat_val=0 cache=3
> > > pat_init_cache_modes ok
> > > pat_init_cache_modes i=0 pat_val=0 cache=3
> > > (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on
> > > VCPU 0 [ec=0000]
> > > (XEN) domain_crash_sync called from entry.S: fault at
> > > ffff82d0802276c3
> > > create_bounce_frame+0x12b/0x13a
> > >
> > > In both cases, the PAT CPUID feature bit is set, and cpu_has_pat is
> > > always 0 at this early point (so my RFC patch is wrong). The simplest
> > > fix is to call pat_init_cache_modes(pat) only if pat != 0.
> > >
> > > This is starting to look like the same logic that's in
> > > pat_bsp_init(),
> > > which doesn't seem to be called when booting on Xen. Should it be?
> > > Was
> > > Xen deliberately excluded from this PAT emulation change?
> > > https://groups.google.com/d/msg/linux.kernel/JoJKbCOxV0U/PM0I9d1v60kJ
> >
> > Calling pat_init() requires the CPU rendezvous handler in MTRR, which
> > is disabled in Xen. This PAT initialization has been problematic, and
> > the following patches addressed it in 4.6. This will fix your problem
> > as well.
> > https://lkml.org/lkml/2016/3/23/500
> >
> > In particular, patch 6/7 removed the Xen code in question.
> > https://lkml.org/lkml/2016/3/23/503
> >
> > Do you need to fix this issue in 4.4? If so, we should be able to
> > request backporting the patches to 4.4 stable.
>
> Would disabling PAT when the MSR is clearly broken (and not trying to
> emulate it) not work?

That should work, but the above patches fix the qemu32 issue also found in
4.4. So, they need to be backported to 4.4.
https://lkml.org/lkml/2016/3/3/828

-Toshi
Thread overview: 8+ messages
2016-05-20 23:58 PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi Ed Swierk
2016-05-23 14:15 ` Konrad Rzeszutek Wilk
2016-05-23 20:13 ` Boris Ostrovsky
2016-05-23 22:52 ` Ed Swierk
2016-05-24 14:53 ` Kani, Toshimitsu
2016-05-24 15:25 ` Ed Swierk
2016-05-24 15:54 ` Boris Ostrovsky
2016-05-24 16:59 ` Kani, Toshimitsu