From: Philipp Hahn
Subject: Re: xenstored crashes with SIGSEGV
Date: Tue, 16 Dec 2014 13:04:13 +0100
Message-ID: <54901FBD.3090706@univention.de>
In-Reply-To: <1418726712.16425.213.camel@citrix.com>
To: Ian Campbell
Cc: Ian Jackson, Frediano Ziglio, Xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

Hello,

On 16.12.2014 11:45, Ian Campbell wrote:
> On Mon, 2014-12-15 at 23:29 +0100, Philipp Hahn wrote:
>>> I notice in your bugzilla (for a different occurrence, I think):
>>>> [2090451.721705] univention-conf[2512]: segfault at ff00000000 ip 000000000045e238 sp 00007ffff68dfa30 error 6 in python2.6[400000+21e000]
>>>
>>> Which appears to have faulted access 0xff000000000 too. It looks like
>>> this process is a python thing, it's nothing to do with xenstored I
>>> assume?
>>
>> Yes, that's one univention-config, which is completely independent of
>> xen(stored).
>>
>>> It seems rather coincidental that it should be accessing the
>>> same sort of address and be faulting.
>>
>> Yes, good catch. I'll have another look at those core dumps.
>
> With this in mind, please can you confirm what model of machines you've
> seen this on, and in particular whether they are all the same class of
> machine or whether they are significantly different.

They are all from the same vendor, but I have to check the individual
models and firmware versions, which might take some time.

> The reason being that randomly placed 0xff values in a field of 0x00
> could possibly indicate hardware (e.g. a GPU) DMAing over the wrong
> memory pages.

Good catch: that would explain why it only happens for us and why no one
else has seen this strange bug before.

Thank you again.
Philipp Hahn
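For illustration only (this helper is hypothetical and not part of the thread): the faulting address in the quoted segfault line, 0xff00000000, equals 0xff << 32, i.e. on little-endian x86_64 it is a 64-bit word that is all 0x00 except for a single 0xff byte at byte offset 4. That is exactly the "0xff values in a field of 0x00" pattern. A minimal sketch for scanning a raw memory region for such words might look like this, assuming the region has already been extracted to a flat file (for example with gdb's `dump memory` command) and that `base_addr` is the virtual address it was dumped from.

```python
import struct


def find_stray_ff_words(data: bytes, base_addr: int = 0):
    """Scan aligned 64-bit little-endian words in a raw memory dump.

    Hypothetical helper: returns (address, value) for every word whose
    bytes are only 0x00 and 0xff, with at least one of each, i.e. the
    "0xff in a field of 0x00" pattern. Trailing bytes that do not fill
    a whole 8-byte word are ignored.
    """
    hits = []
    usable = len(data) - len(data) % 8
    for off in range(0, usable, 8):
        word = data[off:off + 8]
        # set() of a bytes object yields the distinct byte values as ints.
        if set(word) == {0x00, 0xFF}:
            (value,) = struct.unpack_from("<Q", data, off)
            hits.append((base_addr + off, value))
    return hits


# Usage: a tiny synthetic region containing the faulting pointer value.
dump = bytes(16) + struct.pack("<Q", 0xFF00000000) + bytes(8)
print([(hex(a), hex(v)) for a, v in find_stray_ff_words(dump, 0x1000)])
```

Legitimate values such as masks (0xff, 0xffffffff) also match this test, so hits are only candidates to compare across dumps, not proof of stray DMA writes.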