From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andre Przywara Subject: Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems Date: Wed, 30 May 2012 16:02:48 +0200 Message-ID: <4FC62888.9010407@amd.com> References: <1338383402-3838-1-git-send-email-andre.przywara@amd.com> <4FC63DAF0200007800086DC5@nat28.tlf.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FC63DAF0200007800086DC5@nat28.tlf.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: jeremy@goop.org, xen-devel@lists.xensource.com, konrad.wilk@oracle.com, "Shin, Jacob" , linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@elte.hu, tglx@linutronix.de List-Id: xen-devel@lists.xenproject.org On 05/30/2012 03:33 PM, Jan Beulich wrote: >>>> On 30.05.12 at 15:10, Andre Przywara wrote: >> Because we are behind a family check before tweaking the topology >> bit, we can use the standard rd/wrmsr variants for the CPUID feature >> register. >> This fixes a crash when using the kernel as a Xen Dom0 on affected >> Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized >> yet (this will be fixed in another patch). > > I'm not following: If the AMD variants (putting a special value into > %edi) can be freely replaced by the non-AMD variants, why did > the AMD special ones get used in the first place? Older CPUs (K8) needed the AMD variants, starting with family 10h we can use the normal versions. > Further, I can't see how checking_wrmsrl() is being paravirtualized > any better than wrmsrl_amd_safe() - both have nothing but an > exception handling fixup attached to the wrmsr invocation. Care > to point out what actual crash it is that was seen? AFAIK, the difference is between the "l" and the regs version for rd/wrmsr. We have a patch already here to fix this. Will send it out soon. Jacob, can you comment on this? The crash dump: [ 1.601021] ------------[ cut here ]------------ [ 1.601025] kernel BUG at /buildrepos/linux-2.6-stable/arch/x86/include/asm/paravirt.h:133! [ 1.601031] invalid opcode: 0000 [#1] SMP [ 1.601038] CPU 0 [ 1.601041] Modules linked in: [ 1.601047] [ 1.601050] Pid: 0, comm: swapper/0 Not tainted 3.4.0 #1 AMD Virgo Platform/Annapurna [ 1.601061] RIP: e030:[] [] init_amd+0x21d/0x603 [ 1.601072] RSP: e02b:ffffffff81c01df8 EFLAGS: 00010246 [ 1.601076] RAX: 0000000000000000 RBX: ffffffff81ca7500 RCX: 0000000000000000 [ 1.601081] RDX: 0000000000000020 RSI: 000000000000c000 RDI: ffffffff81c01e78 [ 1.601086] RBP: ffffffff81c01ea8 R08: 0000000000004003 R09: 00000000ffffffff [ 1.601090] R10: 0000000000000000 R11: 00000000ffffffff R12: ffffffff81ca7574 [ 1.601095] R13: ffffffff81ca7514 R14: ffffffff81ca754c R15: ffffffff81ca756c [ 1.601103] FS: 0000000000000000(0000) GS:ffff8801ce600000(0000) knlGS:0000000000000000 [ 1.601108] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1.601112] CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 0000000000040660 [ 1.601117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.601122] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1.601127] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c14020) [ 1.601131] Stack: [ 1.601134] ffffffff81c01ea8 ffffffff8169a7d8 0000001600000017 0000000000000000 [ 1.601146] ffff8801c1c3ac00 ffff8801c1c002d0 ffffffff81c01e48 ffffffff81140118 [ 1.601157] ffffffff811415cc ffff8801c1c04220 ffff8801c1c02480 00000000000000d0 [ 1.601169] Call Trace: [ 1.601175] [] ? get_cpu_cap+0x1f2/0x201 [ 1.601183] [] ? init_kmem_cache_cpus+0x3e/0x4d [ 1.601189] [] ? sysfs_slab_alias+0x60/0x8d [ 1.601195] [] identify_cpu+0x256/0x34d [ 1.601201] [] ? kmem_cache_alloc+0x50/0xe4 [ 1.601209] [] identify_boot_cpu+0x10/0x3c [ 1.601216] [] check_bugs+0x9/0x2d [ 1.601222] [] start_kernel+0x445/0x461 [ 1.601227] [] ? kernel_init+0x1d8/0x1d8 [ 1.601233] [] x86_64_start_reservations+0xba/0xc1 [ 1.601240] [] ? xen_setup_runstate_info+0x2c/0x36 [ 1.601247] [] xen_start_kernel+0x58b/0x592 [ 1.601251] Code: 0f 85 d3 00 00 00 31 c0 48 83 3d cd 5c 58 00 00 48 8d 7d b0 b9 08 00 00 00 f3 ab c7 45 b4 05 10 01 c0 c7 45 cc 3a 20 5a 9c 75 04 <0f> 0b eb fe 4c 8d 75 b0 4c 89 f7 ff 14 25 b0 11 c2 81 85 c0 8b [ 1.601374] RIP [] init_amd+0x21d/0x603 [ 1.601381] RSP [ 1.601391] ---[ end trace a7919e7f17c0a725 ]--- [ 1.601397] Kernel panic - not syncing: Attempted to kill the idle task! > Finally, I would question whether re-enabling the topology > extensions under Xen shouldn't be skipped altogether, perhaps > even on Dom0 (as the hypervisor is controlling this MSR, but in > any case on DomU - the hypervisor won't allow (read: ignore, > not fault on) the write anyway (and will log a message for each > (v)CPU that attempts this). This is probably right. Let me think about this. Thanks for picking this up. Regards, Andre. > >> Signed-off-by: Andre Przywara >> Cc: stable@vger.kernel.org # 3.4+ >> --- >> arch/x86/kernel/cpu/amd.c | 4 ++-- >> 1 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c >> index 146bb62..80ccd99 100644 >> --- a/arch/x86/kernel/cpu/amd.c >> +++ b/arch/x86/kernel/cpu/amd.c >> @@ -586,9 +586,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) >> !cpu_has(c, X86_FEATURE_TOPOEXT)) { >> u64 val; >> >> - if (!rdmsrl_amd_safe(0xc0011005,&val)) { >> + if (!rdmsrl_safe(0xc0011005,&val)) { >> val |= 1ULL<< 54; >> - wrmsrl_amd_safe(0xc0011005, val); >> + checking_wrmsrl(0xc0011005, val); >> rdmsrl(0xc0011005, val); >> if (val& (1ULL<< 54)) { >> set_cpu_cap(c, X86_FEATURE_TOPOEXT); > > > -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany