From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
Date: Mon, 2 Dec 2013 14:01:55 +0000
Message-ID: <529C92D3.8070307@citrix.com>
References: <529737AD.7070708@citrix.com> <5297B2DE.1020806@citrix.com> <1385722276.25763.12.camel@kazak.uk.xensource.com> <529874BB.7070803@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <529874BB.7070803@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell
Cc: George Dunlap, Dario Faggioli, Jan Beulich, Xen-devel List
List-Id: xen-devel@lists.xenproject.org

On 29/11/13 11:04, Andrew Cooper wrote:
> On 29/11/13 10:51, Ian Campbell wrote:
>> On Thu, 2013-11-28 at 21:17 +0000, Andrew Cooper wrote:
>>> the XenServer patch queue on
>> Are you positive that the bug is in the underlying Xen tree and not some
>> interaction with a patch in your queue?
>>
>> A boot time issue ought to be reasonably easy to test with a bare tree.
>>
>> Ian.
>>
> I am not sure of anything at the moment, although I have found one
> instance of a crash with none of the XenServer patch queue whatsoever.
>
> At the moment, I have narrowed the problem down to a handful of
> instructions writing 0s into a well-formed region of the stack.
> Clearly, this is not correct, and every tweak of the debugging causes
> the problem to jump around.
>
> ~Andrew

After some more investigation, this is not a regression at all, although
the patch is directly relevant to identifying the problem.

PXELINUX 4.04 2011-04-18 Copyright (C) 1994-2011 H. Peter Anvin et al
boot:
Loading xenrt/xen-minnow.gz... ok
Loading xenrt/vmlinuz... ok
After multiboot magic check
Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
Before lret into trampoline
Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
After (failed) conditional jmp to start_secondary
Opcode from 0xffff830000105fef: 97 0e 86 00 49 8d be b0
 __  __            _  _    _____  _
 \ \/ /___ _ __   | || |  |___ / / |
  \  // _ \ '_ \  | || |_   |_ \ | |

Something between entering the trampoline and emerging in 64-bit mode is
corrupting a single byte at phys 0x105ff1, changing it from its correct
value of 0x00 to 0x86.

The corruption disappears if the "no-real-mode" option is used.

I am currently trying to update the BIOS, but the intersection of
operating systems which will successfully boot and will successfully run
the IBM update tool is rather small.

~Andrew
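
For reference, the address of the corrupted byte follows from the dump base plus the offset of the first differing byte. Below is a minimal Python sketch of that arithmetic. The helpers `parse_dump` and `diff_dumps` are hypothetical and not part of Xen, and the sketch assumes, as the message above does, that the dump at 0xffff830000105fef covers the same physical bytes as the dump at 0x105fef.

```python
# Hypothetical helpers (not part of Xen): locate bytes that differ between
# two "Opcode from <addr>: ..." dumps taken at different points during boot.

def parse_dump(line):
    """Split 'Opcode from 0x105fef: 97 0e ...' into (base address, bytes)."""
    head, _, tail = line.partition(":")
    base = int(head.split()[-1], 16)
    return base, bytes.fromhex(tail.replace(" ", ""))

def diff_dumps(before, after):
    """Return (address, old, new) per differing byte.

    Addresses are computed from the base of `before`; this assumes both
    dumps describe the same physical bytes.
    """
    base, old = parse_dump(before)
    _, new = parse_dump(after)
    return [(base + i, a, b)
            for i, (a, b) in enumerate(zip(old, new)) if a != b]

before = "Opcode from 0x105fef: 97 0e 00 00 49 8d be b0"
after = "Opcode from 0xffff830000105fef: 97 0e 86 00 49 8d be b0"

for addr, old, new in diff_dumps(before, after):
    print(f"phys {addr:#x}: {old:#04x} -> {new:#04x}")
```

With the two dumps from the log, the only difference is at offset 2 from 0x105fef, which gives the 0x105ff1 figure quoted above.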