From: Ian Campbell <Ian.Campbell@citrix.com>
To: Philipp Hahn <hahn@univention.de>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>, Xen-devel@lists.xen.org
Subject: Re: xenstored crashes with SIGSEGV
Date: Mon, 15 Dec 2014 13:17:38 +0000
Message-ID: <1418649458.16425.108.camel@citrix.com>
In-Reply-To: <1418407116.16425.53.camel@citrix.com>

On Fri, 2014-12-12 at 17:58 +0000, Ian Campbell wrote:
> (adding Ian J who knows a bit more about C xenstored than me...)
>
> On Fri, 2014-12-12 at 18:20 +0100, Philipp Hahn wrote:
> > Hello Ian,
> >
> > On 12.12.2014 17:56, Ian Campbell wrote:
> > > On Fri, 2014-12-12 at 17:45 +0100, Philipp Hahn wrote:
> > >> On 12.12.2014 17:32, Ian Campbell wrote:
> > >>> On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
> > >>>> We did enable tracing and now have the xenstored-trace.log of one crash:
> > >>>> It contains 1.6 billion lines and is 83 GiB.
> > >>>> It just shows xenstored to crash on TRANSACTION_START.
> > >>>>
> > >>>> Is there some tool to feed that trace back into a newly launched xenstored?
> > >>>
> > >>> Not that I know of, I'm afraid.
> > >>
> > >> Okay, then I have to continue with my own tool.
> > >
> > > If you do end up developing a tool to replay a xenstore trace then I
> > > think that'd be something great to have in tree!
> >
> > I just need to figure out how to talk to xenstored on the wire: for some
> > strange reason xenstored is closing the connection to the UNIX socket on
> > the first write inside a transaction.
> > Or switch to /usr/share/pyshared/xen/xend/xenstore/xstransact.py...
> >
> > >>> Do you get a core dump when this happens? You might need to fiddle with
> > >>> ulimits (some distros disable by default). IIRC there is also some /proc
> > >>> knob which controls where core dumps go on the filesystem.
> > >>
> > >> Not for that specific trace: We first enabled generating core files, but
> > >> only then discovered that this is not enough.
> > >
> > > How wasn't it enough? You mean you couldn't use gdb to extract a
> > > backtrace from the core file? Or was something else wrong?
> >
> > The 1st and 2nd trace look like this: ptr in frame #2 looks very bogus.
> >
> > (gdb) bt full
> > #0 talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116
> > tc = <value optimized out>
> > #1 0x0000000000407edf in talloc_free (ptr=0xff00000000) at talloc.c:551
> > tc = <value optimized out>
> > #2 0x000000000040a348 in tdb_open_ex (name=0x1941fb0
> > "/var/lib/xenstored/tdb.0x1935bb0",
>
> I've timed out for tonight will try and have another look next week.

I've had another dig and have instrumented all of the error paths from
this function, and I can't see any way for an invalid pointer to be
produced, let alone freed. I've been running under valgrind, which should
have caught any uninitialised-memory errors.

> > hash_size=<value optimized out>, tdb_flags=0, open_flags=<value
> > optimized out>, mode=<value optimized out>,
> > log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at
> > tdb.c:1958

Please can you confirm what is at line 1958 of your copy of tdb.c? I
think it will be tdb->locked, but I'd like to be sure.

You are running a 64-bit dom0, correct? I've only just noticed that
0xff00000000 is wider than 32 bits. My testing so far has been 32-bit,
although I don't think that should matter wrt use of uninitialised data
etc.

I can't help feeling that 0xff00000000 must be some sort of magic
sentinel value to someone. I can't figure out what though.
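
To make the width point concrete, here is a small sketch in Python that
breaks the pointer value from the backtrace into its parts. The helper
name describe() is made up for illustration; nothing here comes from
xenstored or tdb, and it says nothing about where the value originated.

```python
SUSPECT = 0xFF00000000  # ptr value reported in frames #0 and #1 of the backtrace


def describe(value: int) -> dict:
    """Summarise how a 64-bit pointer-sized value is laid out (hypothetical helper)."""
    return {
        "bits_needed": value.bit_length(),        # minimum width that can hold the value
        "low_32_bits": value & 0xFFFFFFFF,        # what a 32-bit truncation would keep
        "high_32_bits": value >> 32,              # the part that only exists on 64-bit
        "bytes_le": value.to_bytes(8, "little").hex(" "),  # in-memory order on x86-64
    }


if __name__ == "__main__":
    for key, val in describe(SUSPECT).items():
        print(f"{key}: {val}")
```

The value needs 40 bits, its low 32 bits are all zero, and the only
non-zero byte is a single 0xff at byte offset 4 in little-endian order.
A 32-bit pointer cannot hold this value at all.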

Have you observed the xenstored processes growing especially large
before this happens? I'm wondering if there might be a leak somewhere
which after a time is resulting a
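
One way to check for that kind of growth is to sample the daemon's
resident set size from /proc over time. A rough sketch in Python,
assuming a Linux dom0 with /proc mounted; the function names are
hypothetical and the PID has to be supplied by hand (for example from
pidof xenstored).

```python
import re
import sys
import time
from typing import Optional


def parse_vm_rss(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def vm_rss_kb(pid: int) -> Optional[int]:
    """Return the resident set size of pid in kB, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as status:
            return parse_vm_rss(status.read())
    except OSError:
        return None


if __name__ == "__main__":
    # Usage: python3 rss_watch.py <pid> [interval_seconds]
    pid = int(sys.argv[1])
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 60.0
    while True:
        rss = vm_rss_kb(pid)
        if rss is None:
            print("no VmRSS available (process gone?)")
            break
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} VmRSS={rss} kB", flush=True)
        time.sleep(interval)
```

Redirecting the output to a file gives a timestamped series that can be
lined up against the time of the next crash.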

I'm about to send out a patch which plumbs tdb's logging into
xenstored's logging, in the hopes that next time you see this it might
say something as it dies.

Ian.