From: Atom2
Subject: Re: HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2
Date: Thu, 19 Nov 2015 20:51:15 +0100
Message-ID: <564E2833.2010709@web2web.at>
In-Reply-To: <564DA69C.9070809@citrix.com>
References: <5644A248.1060505@web2web.at> <5644C1CD.3020202@citrix.com>
 <56451A2B.9090706@web2web.at> <56459E5F02000078000B4944@prv-mh.provo.novell.com>
 <5645B6BC.6030603@citrix.com> <56467D44.5040205@web2web.at>
 <56479A6B.6080102@citrix.com> <5647CE57.50209@web2web.at>
 <5648E727.6080204@cardoe.com> <56492BDF.5030208@web2web.at>
 <20151116153107.GD13720@char.us.oracle.com> <564A2B91.2090501@web2web.at>
 <564A6064.4080800@citrix.com> <564A626E.6010305@web2web.at>
 <564D00E0.8080004@web2web.at> <564D0720.9020506@citrix.com>
 <564DB17C02000078000B6B19@prv-mh.provo.novell.com> <564DA69C.9070809@citrix.com>
To: Andrew Cooper , Jan Beulich
Cc: Doug Goldstein , xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 19.11.15 at 11:38, Andrew Cooper wrote:
> On 19/11/15 10:24, Jan Beulich wrote:
>>>>> On 19.11.15 at 00:17, wrote:
>>> The disassembly of do_IRQ now looks like a plausible function, but the
>>> consistently faulting address has no plausible way of generating a
>>> double fault.  I suspect therefore that something has caused memory
>>> corruption in Xen .text section.
>> Dump of assembler code for function do_IRQ:
>>    0xffff82d080176577 <+0>:     push   %rbp
>>    0xffff82d080176578 <+1>:     mov    %rsp,%rbp
>>    0xffff82d08017657b <+4>:     push   %r15
>>    0xffff82d08017657d <+6>:     push   %r14
>>    0xffff82d08017657f <+8>:     push   %r13
>>    0xffff82d080176581 <+10>:    push   %r12
>>    0xffff82d080176583 <+12>:    push   %rbx
>>    0xffff82d080176584 <+13>:    lea    -0x1058(%rsp),%rsp
>>    0xffff82d08017658c <+21>:    orq    $0x0,(%rsp)
>>    0xffff82d080176591 <+26>:    lea    0x1020(%rsp),%rsp
>>
>> The orq surely has potential for causing a double fault, if %rsp is
>> near the stack limit. The two LEAs look suspect, presumably a
>> result of some non-standard option passed to gcc. Removing that
>> option might already be a step forward.

Andrew, Jan - thanks again.

In terms of non-standard options passed to gcc, I have tried to make
sense of what flags are actually being used during the build process. I
am not absolutely sure, but I think the options passed to gcc are as
follows:

I do have system wide flags which are used for non-debug builds:
CFLAGS="-march=native -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"

For builds with debug symbols (using splitdebug) there are system wide
overrides as follows:
CFLAGS="-march=native -O2 -pipe -ggdb"
CXXFLAGS="${CFLAGS}"
LDFLAGS: I'd assume that this inherits its value from the system wide
setting of LDFLAGS

For xen (the hypervisor) the build system seems to do the following:
CFLAGS="" (i.e. unset CFLAGS)
To me this indicates that the rest stays untouched (i.e. either standard
or debug flags).

For xen-tools (includes e.g. hvmloader) the build system appears to do
the following:
CFLAGS="" (i.e. unset CFLAGS)
CXXFLAGS="${CXXFLAGS} -fno-strict-overflow"
LDFLAGS="" (i.e. unset LDFLAGS)

So I think there's probably nothing really fancy in my options to gcc.

> Actually yes - that is a huge quantity of stack usage.
>
> (The actual behaviour looks very suspect - it appears to be completely
> pointless).
>
> The #DF handler reports that %rsp in the exception frame is within
> range.  Having said that,
>
> (XEN) [    2.788209] rbp: ffff83080ca8ed78   rsp: ffff83080ca8dcf8   r8:  ffff83080ca9d558
> ...
> (XEN) [    2.837474] Valid stack range: ffff83080ca8e000-ffff83080ca90000, sp=ffff83080ca8dcf8, tss.esp0=ffff83080ca8ffc0
> (XEN) [    2.848969] No stack overflow detected. Skipping stack trace.
>
> In this case, the stack pointer *is* out of range, and has hit the guard
> page.
>
> This means:
> 1) There is some bug in the stack overflow detection in the #DF handler.
> 2) Whatever options Gentoo compiles Xen with is sufficient to overflow
> the 8K hypervisor stack.

Thanks
Atom2
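
The numbers quoted in this thread can be checked by hand. Below is a minimal sketch in Python that redoes the arithmetic using only values copied from the disassembly and the (XEN) log lines above. The helper names (in_stack_range, bytes_below_stack) are hypothetical, and treating the valid range as half-open [low, high) is an assumption, not something the log states. It is not how Xen's #DF handler performs its check.

```python
# Values copied from the "(XEN) Valid stack range" log line quoted above.
STACK_LOW = 0xffff83080ca8e000   # lower bound of the reported range
STACK_HIGH = 0xffff83080ca90000  # upper bound of the reported range
SP = 0xffff83080ca8dcf8          # sp reported by the #DF handler

# Offsets copied from the do_IRQ prologue in the disassembly.
PROBE_DROP = 0x1058   # lea -0x1058(%rsp),%rsp
PROBE_RAISE = 0x1020  # lea 0x1020(%rsp),%rsp


def in_stack_range(sp, low=STACK_LOW, high=STACK_HIGH):
    """Return True if sp lies in [low, high).

    Assumption: the range is half-open; the log does not say so.
    """
    return low <= sp < high


def bytes_below_stack(sp, low=STACK_LOW):
    """Return how many bytes sp sits below the lower bound (0 if not below)."""
    return max(0, low - sp)


# Net stack adjustment left behind by the two LEAs around the orq.
net_frame = PROBE_DROP - PROBE_RAISE

if __name__ == "__main__":
    print("stack size:      ", STACK_HIGH - STACK_LOW, "bytes")
    print("sp in range:     ", in_stack_range(SP))
    print("sp below range by", bytes_below_stack(SP), "bytes")
    print("orq touches      ", PROBE_DROP, "bytes below the pushed registers")
    print("net frame size:  ", net_frame, "bytes")
```

With these inputs the range is 8192 bytes (the 8K stack), sp sits 0x308 (776) bytes below the lower bound, and the two LEAs leave a net adjustment of only 0x38 (56) bytes even though the orq touches memory 0x1058 (4184) bytes further down. That pattern resembles a stack probe; whether a compiler option such as -fstack-check is responsible is not confirmed in this message.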