From: Ian Campbell <Ian.Campbell@citrix.com>
To: Frediano Ziglio <freddy77@gmail.com>
Cc: Xen-devel@lists.xen.org, Ian Jackson <Ian.Jackson@eu.citrix.com>,
Philipp Hahn <hahn@univention.de>
Subject: Re: xenstored crashes with SIGSEGV
Date: Tue, 16 Dec 2014 12:23:55 +0000
Message-ID: <1418732635.16425.221.camel@citrix.com>
In-Reply-To: <CAHt6W4cfQ+JzhS1zBL4iCWJ3Mg9R25zhG1Wjr=1ukwW0qykNTQ@mail.gmail.com>
On Tue, 2014-12-16 at 11:30 +0000, Frediano Ziglio wrote:
> 2014-12-16 11:06 GMT+00:00 Ian Campbell <Ian.Campbell@citrix.com>:
> > On Tue, 2014-12-16 at 10:45 +0000, Ian Campbell wrote:
> >> On Mon, 2014-12-15 at 23:29 +0100, Philipp Hahn wrote:
> >> > > I notice in your bugzilla (for a different occurrence, I think):
> >> > >> [2090451.721705] univention-conf[2512]: segfault at ff00000000 ip 000000000045e238 sp 00007ffff68dfa30 error 6 in python2.6[400000+21e000]
> >> > >
> >> > > Which appears to have faulted access 0xff000000000 too. It looks like
> >> > > this process is a python thing, it's nothing to do with xenstored I
> >> > > assume?
> >> >
> >> > Yes, that's one univention-config, which is completely independent of
> >> > xen(stored).
> >> >
> >> > > It seems rather coincidental that it should be accessing the
> >> > > same sort of address and be faulting.
> >> >
> >> > Yes, good catch. I'll have another look at those core dumps.
> >>
> >> With this in mind, please can you confirm what model of machines you've
> >> seen this on, and in particular whether they are all the same class of
> >> machine or whether they are significantly different.
> >>
> >> The reason being that randomly placed 0xff values in a field of 0x00
> >> could possibly indicate hardware (e.g. a GPU) DMAing over the wrong
> >> memory pages.
> >
> > Thanks for giving me access to the core files. This is very suspicious:
> > (gdb) frame 2
> > #2 0x000000000040a348 in tdb_open_ex (name=0x1941fb0 "/var/lib/xenstored/tdb.0x1935bb0", hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>,
> > log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958
> > 1958 SAFE_FREE(tdb->locked);
> >
> > (gdb) x/96x tdb
> > 0x1921270: 0x00000000 0x00000000 0x00000000 0x00000000
> > 0x1921280: 0x0000001f 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921290: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212a0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212b0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212c0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212d0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212e0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212f0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921300: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921310: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921320: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921330: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921340: 0x00000000 0x00000000 0x0000ff00 0x000000ff
> > 0x1921350: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921360: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921370: 0x004093b0 0x00000000 0x004092f0 0x00000000
> > 0x1921380: 0x00000002 0x00000000 0x00000091 0x00000000
> > 0x1921390: 0x0193de70 0x00000000 0x01963600 0x00000000
> > 0x19213a0: 0x00000000 0x00000000 0x0193fbb0 0x00000000
> > 0x19213b0: 0x00000000 0x00000000 0x00000000 0x00000000
> > 0x19213c0: 0x00405870 0x00000000 0x0040e3e0 0x00000000
> > 0x19213d0: 0x00000038 0x00000000 0xe814ec70 0x6f2f6567
> > 0x19213e0: 0x01963650 0x00000000 0x0193dec0 0x00000000
> >
> > Something has clearly done a number on the ram of this process.
> > 0x1921270 through 0x192136f is 256 bytes...
> >
> > Since it appears to be happening to other processes too I would hazard
> > that this is not a xenstored issue.
> >
> > Ian.
> >
>
> Good catch Ian!
>
> Strange corruption. Probably not related to xenstored as you
> suggested. I would be curious to see what's before the tdb pointer and
> where does the corruption starts.
(gdb) print tdb
$2 = (TDB_CONTEXT *) 0x1921270
(gdb) x/64x 0x1921200
0x1921200: 0x01921174 0x00000000 0x00000000 0x00000000
0x1921210: 0x01921174 0x00000000 0x00000171 0x00000000
0x1921220: 0x00000000 0x00000000 0x00000000 0x00000000
0x1921230: 0x01941f60 0x00000000 0x00000000 0x00000000
0x1921240: 0x00000000 0x00000000 0x00000000 0x6f630065
0x1921250: 0x00000000 0x00000000 0x0040e8a7 0x00000000
0x1921260: 0x00000118 0x00000000 0xe814ec70 0x00000000
0x1921270: 0x00000000 0x00000000 0x00000000 0x00000000
0x1921280: 0x0000001f 0x000000ff 0x0000ff00 0x000000ff
0x1921290: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212a0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212b0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212c0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212d0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212e0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212f0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
So it appears to start at 0x1921270 or maybe ...6c.
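A rough way to put a bound on this from the dump, as a minimal sketch only: it assumes the repeating four-word pattern seen above and uses made-up helper names (parse, first_nonzero_match). It reports the lowest address holding a non-zero word that fits the pattern. The zero words before that address cannot be attributed either way from the dump alone, which is why the real start could be 0x1921270 or ...6c.

```python
# Repeating 16-byte pattern seen in the dump, as four 32-bit words
# indexed by (address % 16) // 4. Taken from the gdb output above.
PATTERN = (0x00000000, 0x000000ff, 0x0000ff00, 0x000000ff)

# A few rows copied from the `x/64x 0x1921200` output.
DUMP = """\
0x1921260: 0x00000118 0x00000000 0xe814ec70 0x00000000
0x1921270: 0x00000000 0x00000000 0x00000000 0x00000000
0x1921280: 0x0000001f 0x000000ff 0x0000ff00 0x000000ff
0x1921290: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
"""


def parse(dump):
    """Return {address: 32-bit word} from gdb `x/Nx` style lines."""
    words = {}
    for line in dump.splitlines():
        addr, _, rest = line.partition(":")
        base = int(addr, 16)
        for i, tok in enumerate(rest.split()):
            words[base + 4 * i] = int(tok, 16)
    return words


def first_nonzero_match(words):
    """Lowest address whose word is non-zero and fits its pattern slot."""
    for addr in sorted(words):
        w = words[addr]
        if w != 0 and w == PATTERN[(addr % 16) // 4]:
            return addr
    return None


start = first_nonzero_match(parse(DUMP))
print(hex(start))
```

This only gives the first word that is unambiguously part of the pattern; anything earlier that happens to be zero is left undecided.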
> I also don't understand where the
> "fd = 47" came from a previous mail. 0x1f is 31, not 47 (which is
> 0x2f).
I must have been using a different coredump to the original report
(there are several).
In the one which corresponds to the above:
(gdb) print *tdb
$3 = {name = 0x0, map_ptr = 0x0, fd = 31, map_size = 255,
read_only = 65280, locked = 0xff00000000, ecode = 65280, header = {
magic_food = "\377\000\000\000\000\000\000\000\377\000\000\000\000\377\000\000\377\000\000\000\000\000\000\000\377\000\000\000\000\377\000", version = 255, hash_size = 0, rwlocks = 255, reserved = {65280,
255, 0, 255, 65280, 255, 0, 255, 65280, 255, 0, 255, 65280,
255, 0, 255, 65280, 255, 0, 255, 65280, 255, 0, 255, 65280,
255, 0, 255, 65280, 255, 0}}, flags = 0, travlocks = {
next = 0xff0000ff00, off = 0, hash = 255}, next = 0xff0000ff00,
device = 1095216660480, inode = 1095216725760,
log_fn = 0x4093b0 <null_log_fn>,
hash_fn = 0x4092f0 <default_tdb_hash>, open_flags = 2}
(gdb) print/x *tdb
$4 = {name = 0x0, map_ptr = 0x0, fd = 0x1f, map_size = 0xff,
read_only = 0xff00, locked = 0xff00000000, ecode = 0xff00,
header = {magic_food = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0},
version = 0xff, hash_size = 0x0, rwlocks = 0xff, reserved = {
0xff00, 0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 0xff, 0xff00,
0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 0xff, 0xff00, 0xff, 0x0,
0xff, 0xff00, 0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 0xff,
0xff00, 0xff, 0x0}}, flags = 0x0, travlocks = {
next = 0xff0000ff00, off = 0x0, hash = 0xff},
next = 0xff0000ff00, device = 0xff00000000, inode = 0xff0000ff00,
log_fn = 0x4093b0, hash_fn = 0x4092f0, open_flags = 0x2}
which is consistent.
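To spell out the cross-check between the raw words and the decoded fields, here is a minimal sketch. It assumes a little-endian x86-64 layout, and the field offsets are inferred from the gdb output above rather than taken from tdb.c, so treat them as an assumption.

```python
import struct

# First twelve 32-bit words of the `x/96x tdb` output, from 0x1921270.
words = [
    0x00000000, 0x00000000, 0x00000000, 0x00000000,  # 0x1921270
    0x0000001f, 0x000000ff, 0x0000ff00, 0x000000ff,  # 0x1921280
    0x00000000, 0x000000ff, 0x0000ff00, 0x000000ff,  # 0x1921290
]
raw = struct.pack("<12I", *words)

# Assumed layout (inferred, not read from the tdb headers):
#   name, map_ptr : 8-byte pointers
#   fd            : 4-byte int
#   map_size      : 4-byte unsigned
#   read_only     : 4-byte int, then 4 bytes of padding
#   locked        : 8-byte pointer
#   ecode         : 4-byte int
fields = struct.unpack_from("<QQiIi4xQi", raw)
name, map_ptr, fd, map_size, read_only, locked, ecode = fields

print(f"fd={fd} map_size={map_size:#x} read_only={read_only:#x}")
print(f"locked={locked:#x} ecode={ecode:#x}")
```

The two words at 0x1921290 (0x00000000 then 0x000000ff) combine into the 64-bit value 0xff00000000, which is the bogus locked pointer, and fd comes out as 31 (0x1f), matching the print *tdb output.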
> I would not be surprised about a strange bug in libc or the kernel.
Or even Xen itself, or the h/w.
Ian.