From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ian Campbell
Subject: Re: xenstored crashes with SIGSEGV
Date: Mon, 15 Dec 2014 13:17:38 +0000
Message-ID: <1418649458.16425.108.camel@citrix.com>
References: <546461A2.2070908@univention.de> <1415869951.31613.26.camel@citrix.com> <548B1472.5080302@univention.de> <1418401932.16425.34.camel@citrix.com> <548B1BA8.3090504@univention.de> <1418403387.16425.38.camel@citrix.com> <548B23FA.6070108@univention.de> <1418407116.16425.53.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1418407116.16425.53.camel@citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Philipp Hahn
Cc: Ian Jackson , Xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Fri, 2014-12-12 at 17:58 +0000, Ian Campbell wrote:
> (adding Ian J who knows a bit more about C xenstored than me...)
>
> On Fri, 2014-12-12 at 18:20 +0100, Philipp Hahn wrote:
> > Hello Ian,
> >
> > On 12.12.2014 17:56, Ian Campbell wrote:
> > > On Fri, 2014-12-12 at 17:45 +0100, Philipp Hahn wrote:
> > >> On 12.12.2014 17:32, Ian Campbell wrote:
> > >>> On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
> > >>>> We did enable tracing and now have the xenstored-trace.log of one crash:
> > >>>> It contains 1.6 billion lines and is 83 GiB.
> > >>>> It just shows xenstored to crash on TRANSACTION_START.
> > >>>>
> > >>>> Is there some tool to feed that trace back into a newly launched xenstored?
> > >>>
> > >>> Not that I know of I'm afraid.
> > >>
> > >> Okay, then I have to continue with my own tool.
> > >
> > > If you do end up developing a tool to replay a xenstore trace then I
> > > think that'd be something great to have in tree!
> >
> > I just need to figure out how to talk to xenstored on the wire: for some
> > strange reason xenstored is closing the connection to the UNIX socket on
> > the first write inside a transaction.
> > Or switch to /usr/share/pyshared/xen/xend/xenstore/xstransact.py...
> >
> > >>> Do you get a core dump when this happens? You might need to fiddle with
> > >>> ulimits (some distros disable by default). IIRC there is also some /proc
> > >>> nob which controls where core dumps go on the filesystem.
> > >>
> > >> Not for that specific trace: We first enabled generating core files, but
> > >> only then discovered that this is not enough.
> > >
> > > How wasn't it enough? You mean you couldn't use gdb to extract a
> > > backtrace from the core file? Or was something else wrong?
> >
> > The 1st and 2nd trace look like this: ptr in frame #2 looks very bogus.
> >
> > (gdb) bt full
> > #0 talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116
> > tc =
> > #1 0x0000000000407edf in talloc_free (ptr=0xff00000000) at talloc.c:551
> > tc =
> > #2 0x000000000040a348 in tdb_open_ex (name=0x1941fb0
> > "/var/lib/xenstored/tdb.0x1935bb0",
>
> I've timed out for tonight will try and have another look next week.

I've had another dig, and have instrumented all of the error paths from
this function and I can't see any way for an invalid pointer to be
produced, let alone freed. I've been running under valgrind which should
have caught any uninitialised memory type errors.

> > hash_size=, tdb_flags=0, open_flags= > optimized out>, mode=,
> > log_fn=0x4093b0 , hash_fn=) at
> > tdb.c:1958

Please can you confirm what is at line 1958 of your copy of tdb.c. I
think it will be tdb->locked, but I'd like to be sure.

You are running a 64-bit dom0, correct? I've only just noticed that
0xff00000000 is >32bits. My testing so far was 32-bit, I don't think it
should matter wrt use of uninitialised data etc.

I can't help feeling that 0xff00000000 must be some sort of magic
sentinel value to someone.
I can't figure out what though.

Have you observed the xenstored processes growing especially large
before this happens? I'm wondering if there might be a leak somewhere
which after a time is resulting a

I'm about to send out a patch which plumbs tdb's logging into
xenstored's logging, in the hopes that next time you see this it might
say something as it dies.

Ian.