From: Ian Campbell
Subject: Re: xenstored crashes with SIGSEGV
Date: Mon, 15 Dec 2014 14:50:14 +0000
Message-ID: <1418655014.16425.138.camel@citrix.com>
In-Reply-To: <548EEDF5.20808@univention.de>
To: Philipp Hahn
Cc: Ian Jackson, Xen-devel@lists.xen.org

On Mon, 2014-12-15 at 15:19 +0100, Philipp Hahn wrote:
> Hello Ian,
>
> On 15.12.2014 14:17, Ian Campbell wrote:
> > On Fri, 2014-12-12 at 17:58 +0000, Ian Campbell wrote:
> >> On Fri, 2014-12-12 at 18:20 +0100, Philipp Hahn wrote:
> >>> On 12.12.2014 17:56, Ian Campbell wrote:
> >>>> On Fri, 2014-12-12 at 17:45 +0100, Philipp Hahn wrote:
> >>>>> On 12.12.2014 17:32, Ian Campbell wrote:
> >>>>>> On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
> ...
> >>> The 1st and 2nd traces look like this: ptr in frame #2 looks very bogus.
> >>>
> >>> (gdb) bt full
> >>> #0 talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116
> >>> tc = <value optimized out>
> >>> #1 0x0000000000407edf in talloc_free (ptr=0xff00000000) at talloc.c:551
> >>> tc = <value optimized out>
> >>> #2 0x000000000040a348 in tdb_open_ex (name=0x1941fb0
> >>> "/var/lib/xenstored/tdb.0x1935bb0",
>
> I just noticed something strange:
>
> > #3 0x000000000040a684 in tdb_open (name=0xff00000000 <Address
> > 0xff00000000 out of bounds>, hash_size=0,
> > tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
> > #4 0x000000000040a70b in tdb_copy (tdb=0x192e540, outfile=0x1941fb0
> > "/var/lib/xenstored/tdb.0x1935bb0")
>
> Why does gdb-7.0.1 print "name=0xff000000" here for frame 3, but for
> frames 2 and 4 the pointers are correct again?
> Verifying the values with an explicit "print" shows them as correct.

I had just noticed that and was wondering about the same thing.

I'm starting to worry that 0xff00000000 might just be a gdb thing,
similar to <value optimized out>, but infinitely more misleading.

I've also noticed in
https://forge.univention.org/bugzilla/show_bug.cgi?id=35104 that the
constant can be any of 0xff000000, 0xff00000000 or 0xff0000000000 (6, 8
or 10 zeroes).

> >>> hash_size=<value optimized out>, tdb_flags=0, open_flags=<value
> >>> optimized out>, mode=<value optimized out>,
> >>> log_fn=0x4093b0 , hash_fn=<value optimized out>) at
> >>> tdb.c:1958
> >
> > Please can you confirm what is at line 1958 of your copy of tdb.c. I
> > think it will be tdb->locked, but I'd like to be sure.
>
> Yes, that's the line:
> # sed -ne 1958p tdb.c
> SAFE_FREE(tdb->locked);

Good, thanks.

> > You are running a 64-bit dom0, correct?
>
> yes: x86_64

Thanks for confirming. I'm resurrecting the 64-bit root partition on my
test box (which, it turns out, was still running Debian Squeeze!).

> > I've only just noticed that
> > 0xff00000000 is >32bits. My testing so far was 32-bit, I don't think it
> > should matter wrt use of uninitialised data etc.
> >
> > I can't help feeling that 0xff00000000 must be some sort of magic
> > sentinel value to someone. I can't figure out what though.
>
> 0xff is too much for bit-flip errors, and also two crashes on different
> machines in the same location very much rule out any HW error for me.
>
> My 2nd idea was that someone decremented 0 one too many, but then that
> would have to be an 8 bit value - reading the code I didn't see anything
> like that.
I was wondering if it was an overflow or sign-extension thing, but it
doesn't seem likely: for one thing, there aren't enough high bits set.

> One more thing we noticed: /var/lib/xenstored/ contained the tdb file
> and two bit-identical copies after the crash, so I would read that as two
> transactions being in progress at the time of the crash. Might be that
> this is important.

It's certainly worth noting, thanks.

Ian.