From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58559)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1YTfRO-0002Qq-Ht for qemu-devel@nongnu.org;
 Thu, 05 Mar 2015 18:44:31 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1YTfRN-0005QF-8N for qemu-devel@nongnu.org;
 Thu, 05 Mar 2015 18:44:30 -0500
Received: from mail-qc0-x229.google.com ([2607:f8b0:400d:c01::229]:35500)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1YTfRN-0005Po-0L for qemu-devel@nongnu.org;
 Thu, 05 Mar 2015 18:44:29 -0500
Received: by qcyl6 with SMTP id l6so47012491qcy.2 for ;
 Thu, 05 Mar 2015 15:44:28 -0800 (PST)
MIME-Version: 1.0
From: Andrey Korolyov
Date: Fri, 6 Mar 2015 02:44:07 +0300
Content-Type: text/plain; charset=UTF-8
Subject: Re: [Qemu-devel] E5-2620v2 - emulation stop error
To: "qemu-devel@nongnu.org"
Cc: "kvm@vger.kernel.org"

On Fri, Mar 6, 2015 at 1:14 AM, Andrey Korolyov wrote:
> Hello,
>
> recently I've got a couple of shiny new Intel 2620v2s as a future
> replacement for the E5-2620v1, but I have experienced relatively many
> events with emulation errors; all traces look similar to the one below.
> I am running qemu-2.1 on x86 on top of the 3.10 branch for testing
> purposes but can switch to other versions if necessary. Most of the
> crashes happened during a reboot cycle or at the end of an ACPI-based
> shutdown, if this helps. I have no idea what could introduce such a
> mess within the same processor family using identical software, as the
> 2620v1 has never had a similar problem. Please let me know if there are
> any additional measures I can take to make the picture clearer.
>
> Thanks!
>
> KVM internal error. Suberror: 2
> extra data[0]: 800000d1
> extra data[1]: 80000b0d
> EAX=00000003 EBX=00000000 ECX=00000000 EDX=00000000
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006cd4
> EIP=0000d3f9 EFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 000f0000 0000ffff 00009b00
> SS =0000 00000000 0000ffff 00009300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT= 000f6e98 00000037
> IDT= 00000000 000003ff
> CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb
> 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66
> b8 00 e0 00 00 8e

It turns out that these errors are caused by APICv, which gets enabled
on the new CPUs because of their different feature set. If anyone is
interested in reproducing or fixing this specifically on 3.10, it takes
about a hundred migrations or power state changes for the issue to
appear; the guest OS can be either Linux or Windows.
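
For anyone reading the dump above, here is a rough sketch of how the two
"extra data" words can be decoded. It assumes that for suberror 2 KVM's
VMX code reports the IDT-vectoring information in extra data[0] and the
VM-exit interruption information in extra data[1], and that both use the
interruption-information layout from the Intel SDM (vector in bits 7:0,
type in bits 10:8, error-code-valid in bit 11, valid in bit 31). The
function and table names are made up for illustration.

```python
# Sketch only: decodes a VMX interruption-information word.
# Assumed layout (Intel SDM): bits 7:0 vector, bits 10:8 type,
# bit 11 error code valid, bit 31 valid.

# Hypothetical lookup table for the 3-bit type field.
INTR_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}


def decode_intr_info(word):
    """Split a 32-bit interruption-information word into its fields."""
    return {
        "valid": bool((word >> 31) & 1),
        "vector": word & 0xFF,
        "type": INTR_TYPES.get((word >> 8) & 0x7, "reserved"),
        "error_code_valid": bool((word >> 11) & 1),
    }


if __name__ == "__main__":
    # Values taken from the quoted dump.
    for label, word in (("extra data[0]", 0x800000D1),
                        ("extra data[1]", 0x80000B0D)):
        info = decode_intr_info(word)
        print("%s: vector=0x%02x type=%s error_code_valid=%s valid=%s" % (
            label, info["vector"], info["type"],
            info["error_code_valid"], info["valid"]))
```

Under those assumptions the dump reads as a hardware exception with
vector 0x0d (#GP) raised while external interrupt vector 0xd1 was being
delivered, which would at least be consistent with an interrupt delivery
problem.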