From mboxrd@z Thu Jan 1 00:00:00 1970
From: Weidong Han
Subject: Re: RE: kernel panic when enable x2apic
Date: Tue, 30 Nov 2010 16:50:11 +0800
Message-ID: <4CF4BAC3.6060009@intel.com>
References: <749B9D3DBF0F054390025D9EAFF47F22301755C5@shsmsx501.ccr.corp.intel.com>
 <4CE3B1640200007800022BA5@vpn.id2.novell.com>
 <749B9D3DBF0F054390025D9EAFF47F223017565C@shsmsx501.ccr.corp.intel.com>
 <4CE40FAE0200007800022D71@vpn.id2.novell.com>
 <749B9D3DBF0F054390025D9EAFF47F22301CB045@shsmsx501.ccr.corp.intel.com>
 <4CE65CC102000078000234EC@vpn.id2.novell.com>
 <1847095241.20101119114005@eikelenboom.it>
 <4CE668C50200007800023536@vpn.id2.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4CE668C50200007800023536@vpn.id2.novell.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jan Beulich
Cc: "Zhang, Yang Z", Sander Eikelenboom, "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

Jan Beulich wrote:
>>>> On 19.11.10 at 11:40, Sander Eikelenboom wrote:
>>>>
>> Hello Jan,
>>
>> Friday, November 19, 2010, 11:17:21 AM, you wrote:
>>
>>
>>>>>> On 18.11.10 at 05:53, "Zhang, Yang Z" wrote:
>>>>>>
>>>> From this output, it shows the cpupool_id = 7f034000, I don't know why it
>>>> was 7f034000. I think the first cpupool_id should be 0?Am I right?
>>>>
>>> Yes, it ought to be zero.
>>>
>>>> Also the fail with write mtrr MSR, the value also is very strange:
>>>> ffff83007f0f7670, it totally different with the SDM says.
>>>> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
>>>>
>>> Yes, I had indicated so in an earlier reply.
>>>
>>>> So, I am think that maybe the heap is broken?
>>>>
>>> General memory corruption is more likely. The question is when it
>>> starts.
>>>
>> General memory corruption could also be hardware related (bad dimm) ?
>>
>
> In general, yes, but this wouldn't normally lead to patterns that look
> like valid (albeit misplaced) addresses, I would think.
>
> Jan
>

We have root-caused this issue. It is not actually related to x2APIC or
c/s 22375; it is caused by boot_cpu_data.x86_capability being set
incorrectly.

boot_cpu_data.x86_capability is set in identify_cpu, but I found that
boot_cpu_data.x86_capability[4] is also set in start_vmx, which may
overwrite the previous values. The panic is caused by overwriting the
X86_FEATURE_XSAVE bit in boot_cpu_data.x86_capability.

Yang's platform supports xsave, but xsave is not enabled (the default),
so init_intel clears the X86_FEATURE_XSAVE bit in
boot_cpu_data.x86_capability, which means cpu_has_xsave is 0. Later,
start_vmx sets that bit again, so cpu_has_xsave becomes true. As a
result, Xen allocates an xsave area in vcpu_initialise. We observed that
it may allocate an address that is already in use for it, which causes
the panic.

The obvious solution is to remove
boot_cpu_data.x86_capability[4] = cpuid_ecx(1) from start_vmx. It does
work with that change. I will send out the patch after more tests.

Regards,
Weidong
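
To make the sequence above concrete, here is a minimal standalone sketch
of how the second write to word 4 undoes the earlier clear. This is not
the Xen source: struct cpuinfo_sketch, fake_cpuid_ecx1, cpu_has,
clear_cap, the sketch_* functions and the keep_buggy_write flag are all
hypothetical names used only for illustration, and NCAPINTS is an
arbitrary size. The only hardware fact assumed is that CPUID leaf 1
reports XSAVE in ECX bit 26, which is why the feature sits at
4 * 32 + 26 when word 4 mirrors that register.

```c
#include <stdint.h>

#define NCAPINTS 8                        /* illustrative size, not Xen's */
#define X86_FEATURE_XSAVE (4 * 32 + 26)   /* word 4 mirrors CPUID.1:ECX, bit 26 */

/* Hypothetical stand-in for the capability words in boot_cpu_data. */
struct cpuinfo_sketch {
    uint32_t x86_capability[NCAPINTS];
};

struct cpuinfo_sketch boot_cpu_data;

/* Hypothetical stand-in for cpuid_ecx(1) on a CPU that supports XSAVE. */
uint32_t fake_cpuid_ecx1(void)
{
    return UINT32_C(1) << 26;
}

/* Test one feature bit, in the spirit of cpu_has_xsave. */
int cpu_has(const struct cpuinfo_sketch *c, unsigned int feature)
{
    return (c->x86_capability[feature / 32] >> (feature % 32)) & 1u;
}

/* Clear one feature bit. */
void clear_cap(struct cpuinfo_sketch *c, unsigned int feature)
{
    c->x86_capability[feature / 32] &= ~(UINT32_C(1) << (feature % 32));
}

/* Models identify_cpu/init_intel: read CPUID, then hide XSAVE if disabled. */
void sketch_identify_cpu(int xsave_enabled)
{
    boot_cpu_data.x86_capability[4] = fake_cpuid_ecx1();
    if (!xsave_enabled)
        clear_cap(&boot_cpu_data, X86_FEATURE_XSAVE);
}

/* Models start_vmx: the flag selects whether the second write is kept. */
void sketch_start_vmx(int keep_buggy_write)
{
    if (keep_buggy_write)
        boot_cpu_data.x86_capability[4] = fake_cpuid_ecx1();
}

/* Returns the XSAVE bit as later code would see it, with xsave disabled. */
int xsave_visible_after_boot(int keep_buggy_write)
{
    sketch_identify_cpu(0);
    sketch_start_vmx(keep_buggy_write);
    return cpu_has(&boot_cpu_data, X86_FEATURE_XSAVE);
}
```

Calling xsave_visible_after_boot(1) models the current behaviour, where
the bit reappears although xsave is disabled; calling it with 0 models
the proposed change, where the bit stays cleared.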