All of lore.kernel.org
 help / color / mirror / Atom feed
From: Philipp Hahn <hahn@univention.de>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Xen-devel@lists.xen.org
Subject: Re: xenstored crashes with SIGSEGV
Date: Fri, 12 Dec 2014 18:20:58 +0100	[thread overview]
Message-ID: <548B23FA.6070108@univention.de> (raw)
In-Reply-To: <1418403387.16425.38.camel@citrix.com>

Hello Ian,

On 12.12.2014 17:56, Ian Campbell wrote:
> On Fri, 2014-12-12 at 17:45 +0100, Philipp Hahn wrote:
>> On 12.12.2014 17:32, Ian Campbell wrote:
>>> On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
>>>> We did enable tracing and now have the xenstored-trace.log of one crash:
>>>> It contains 1.6 billion lines and is 83 GiB.
>>>> It just shows xenstored to crash on TRANSACTION_START.
>>>>
>>>> Is there some tool to feed that trace back into a newly launched xenstored?
>>>
>>> Not that I know of I'm afraid.
>>
>> Okay, then I have to continue with my own tool.
> 
> If you do end up developing a tool to replay a xenstore trace then I
> think that'd be something great to have in tree!

I just need to figure out how to talk to xenstored on the wire: for some
strange reason xenstored is closing the connection to the UNIX socket on
the first write inside a transaction.
Or switch to /usr/share/pyshared/xen/xend/xenstore/xstransact.py...

>>> Do you get a core dump when this happens? You might need to fiddle with
>>> ulimits (some distros disable by default). IIRC there is also some /proc
>>> nob which controls where core dumps go on the filesystem.
>>
>> Not for that specific trace: We first enabled generating core files, but
>> only then discovered that this is not enough.
> 
> How wasn't it enough? You mean you couldn't use gdb to extract a
> backtrace from the core file? Or was something else wrong?

The 1st and 2nd trace look like this: ptr in frame #2 looks very bogus.

(gdb) bt full
#0  talloc_chunk_from_ptr (ptr=0xff00000000) at talloc.c:116
        tc = <value optimized out>
#1  0x0000000000407edf in talloc_free (ptr=0xff00000000) at talloc.c:551
        tc = <value optimized out>
#2  0x000000000040a348 in tdb_open_ex (name=0x1941fb0
"/var/lib/xenstored/tdb.0x1935bb0",
    hash_size=<value optimized out>, tdb_flags=0, open_flags=<value
optimized out>, mode=<value optimized out>,
    log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at
tdb.c:1958
        tdb = 0x1921270
        st = {st_dev = 17, st_ino = 816913342, st_nlink = 1, st_mode =
33184, st_uid = 0, st_gid = 0, __pad0 = 0,
          st_rdev = 0, st_size = 303104, st_blksize = 4096, st_blocks =
592, st_atim = {tv_sec = 1415748063,
            tv_nsec = 87562634}, st_mtim = {tv_sec = 1415748063, tv_nsec
= 87562634}, st_ctim = {
            tv_sec = 1415748063, tv_nsec = 87562634}, __unused = {0, 0, 0}}
        rev = <value optimized out>
        locked = 4232112
        vp = <value optimized out>
#3  0x000000000040a684 in tdb_open (name=0xff00000000 <Address
0xff00000000 out of bounds>, hash_size=0,
    tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
No locals.
#4  0x000000000040a70b in tdb_copy (tdb=0x192e540, outfile=0x1941fb0
"/var/lib/xenstored/tdb.0x1935bb0")
    at tdb.c:2124
        fd = <value optimized out>
        saved_errno = <value optimized out>
        copy = 0x0
#5  0x0000000000406c2d in do_transaction_start (conn=0x1939550,
in=<value optimized out>)
    at xenstored_transaction.c:164
        trans = 0x1935bb0
        exists = <value optimized out>
        id_str =
"\300L\222\001\000\000\000\000\330!@\000\000\000\000\000P\225\223\001"
#6  0x00000000004045ca in process_message (conn=0x1939550) at
xenstored_core.c:1214
        trans = <value optimized out>
#7  consider_message (conn=0x1939550) at xenstored_core.c:1261
No locals.
#8  handle_input (conn=0x1939550) at xenstored_core.c:1308
        bytes = <value optimized out>
        in = <value optimized out>
#9  0x0000000000405170 in main (argc=<value optimized out>, argv=<value
optimized out>) at xenstored_core.c:1964

A 3rd trace is somewhere completely different:
(gdb) bt
#0  0x00007fcbf066088d in _IO_vfprintf_internal (s=0x7fff46ac3010,
format=<value optimized out>, ap=0x7fff46ac3170)
    at vfprintf.c:1617
#1  0x00007fcbf0682732 in _IO_vsnprintf (string=0x7fff46ac318f "",
maxlen=<value optimized out>,
    format=0x40d4a4 "%.*s", args=0x7fff46ac3170) at vsnprintf.c:120
#2  0x000000000040855b in talloc_vasprintf (t=0x17aaf20, fmt=0x40d4a4
"%.*s", ap=0x7fff46ac31d0) at talloc.c:1104
#3  0x0000000000408666 in talloc_asprintf (t=0x1f, fmt=0xffffe938
<Address 0xffffe938 out of bounds>)
    at talloc.c:1129
#4  0x0000000000403a38 in ask_parents (conn=0x177a1f0, name=0x17aaf20
"/local/domain/0/backend/vif/1/0/accel",
    perm=XS_PERM_READ) at xenstored_core.c:492
#5  errno_from_parents (conn=0x177a1f0, name=0x17aaf20
"/local/domain/0/backend/vif/1/0/accel", perm=XS_PERM_READ)
    at xenstored_core.c:516
#6  get_node (conn=0x177a1f0, name=0x17aaf20
"/local/domain/0/backend/vif/1/0/accel", perm=XS_PERM_READ)
    at xenstored_core.c:543
#7  0x000000000040481d in do_read (conn=0x177a1f0) at xenstored_core.c:744
#8  process_message (conn=0x177a1f0) at xenstored_core.c:1178
#9  consider_message (conn=0x177a1f0) at xenstored_core.c:1261
#10 handle_input (conn=0x177a1f0) at xenstored_core.c:1308
#11 0x0000000000405170 in main (argc=<value optimized out>, argv=<value
optimized out>) at xenstored_core.c:1964


>> It might be interesting to see what happens if you preserve the db and
>> reboot arranging for the new xenstored to start with the old file. If
>> the corruption is part of the file then maybe it can be induced to crash
>> again more quickly.
> 
> Thanks for the pointer, will try.

Didn't crash immediately.
Now running /usr/share/pyshared/xen/xend/xenstore/tests/stress_xs.py for
the weekend.

Thanks again.
Philipp

  reply	other threads:[~2014-12-12 17:20 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-13  7:45 xenstored crashes with SIGSEGV Philipp Hahn
2014-11-13  9:12 ` Ian Campbell
2014-12-12 16:14   ` Philipp Hahn
2014-12-12 16:32     ` Ian Campbell
2014-12-12 16:45       ` Philipp Hahn
2014-12-12 16:56         ` Ian Campbell
2014-12-12 17:20           ` Philipp Hahn [this message]
2014-12-12 17:58             ` Ian Campbell
2014-12-15 13:17               ` Ian Campbell
2014-12-15 14:19                 ` Philipp Hahn
2014-12-15 14:50                   ` Ian Campbell
2014-12-15 17:45                     ` Ian Campbell
2014-12-15 22:29                       ` Philipp Hahn
2014-12-16  9:51                         ` Ian Campbell
2014-12-16 10:25                         ` Ian Campbell
2014-12-16 10:45                         ` Ian Campbell
2014-12-16 11:06                           ` Ian Campbell
2014-12-16 11:30                             ` Frediano Ziglio
2014-12-16 12:23                               ` Ian Campbell
2014-12-16 16:13                                 ` Frediano Ziglio
2014-12-16 16:23                                   ` Ian Campbell
2014-12-16 16:44                                     ` Frediano Ziglio
2014-12-17  9:14                                       ` Frediano Ziglio
2014-12-17 12:43                                         ` core dump files do not include all CPU registers? Philipp Hahn
2014-12-18 10:20                                         ` xenstored crashes with SIGSEGV Philipp Hahn
2014-12-18 10:17                                   ` Ian Campbell
2014-12-18 10:25                                     ` David Vrabel
2014-12-19 14:30                                       ` Konrad Rzeszutek Wilk
2014-12-18 10:49                                     ` Jan Beulich
2014-12-18 10:51                                       ` Ian Campbell
2014-12-19 12:36                                     ` Philipp Hahn
2015-01-06  7:19                                       ` Philipp Hahn
2015-03-12 12:08                                         ` Philipp Hahn
2015-03-12 18:17                                           ` Oleg Nesterov
2015-03-12 21:57                                             ` Philipp Hahn
2014-12-16 12:04                           ` Philipp Hahn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=548B23FA.6070108@univention.de \
    --to=hahn@univention.de \
    --cc=Ian.Campbell@citrix.com \
    --cc=Xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.