From: Andrew Cooper
Subject: Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
Date: Thu, 28 Nov 2013 21:17:18 +0000
Message-ID: <5297B2DE.1020806@citrix.com>
References: <529737AD.7070708@citrix.com>
In-Reply-To: <529737AD.7070708@citrix.com>
To: Xen-devel List, Dario Faggioli, George Dunlap
Cc: Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On 28/11/13 12:31, Andrew Cooper wrote:
> Hello,
>
> I have recently positively identified
> b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.
>
> Serial log is attached. The crash is completely deterministic, and is
> from an IBM xSeries 3530 M4 server.
>
> Given the crash and bad patch, I suspect it is more to do with the
> NUMA/memory layout than the specifics of the server.
>
> Dario: Being your patch, do you have any ideas?
>
> George: Regarding the release, if a fix can't easily be found, it might
> be worth considering reverting the change.
>
> ~Andrew

Following some further debugging, this is rather more complicated than I
initially thought.

There is some form of memory corruption. Depending on which exact
underlying changeset I base the XenServer patch queue on, or which
patches are present in the queue, I get crashes in different locations,
including faults from misaligned instructions and stack traces which
are completely bogus.

The saving grace is that the crashes appear to be completely
deterministic for a given binary (although this server is slower than
treacle to boot).

~Andrew