Re: xenstored crashes with SIGSEGV


From: Ian Campbell <Ian.Campbell@citrix.com>
To: Frediano Ziglio <freddy77@gmail.com>
Cc: Xen-devel@lists.xen.org, Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Philipp Hahn <hahn@univention.de>
Subject: Re: xenstored crashes with SIGSEGV
Date: Tue, 16 Dec 2014 12:23:55 +0000
Message-ID: <1418732635.16425.221.camel@citrix.com>
In-Reply-To: <CAHt6W4cfQ+JzhS1zBL4iCWJ3Mg9R25zhG1Wjr=1ukwW0qykNTQ@mail.gmail.com>

On Tue, 2014-12-16 at 11:30 +0000, Frediano Ziglio wrote:
> 2014-12-16 11:06 GMT+00:00 Ian Campbell <Ian.Campbell@citrix.com>:
> > On Tue, 2014-12-16 at 10:45 +0000, Ian Campbell wrote:
> >> On Mon, 2014-12-15 at 23:29 +0100, Philipp Hahn wrote:
> >> > > I notice in your bugzilla (for a different occurrence, I think):
> >> > >> [2090451.721705] univention-conf[2512]: segfault at ff00000000 ip 000000000045e238 sp 00007ffff68dfa30 error 6 in python2.6[400000+21e000]
> >> > >
> >> > > Which appears to have faulted accessing 0xff00000000 too. It looks like
> >> > > this process is a python thing, it's nothing to do with xenstored I
> >> > > assume?
> >> >
> >> > Yes, that's one univention-config, which is completely independent of
> >> > xen(stored).
> >> >
> >> > > It seems rather coincidental that it should be accessing the
> >> > > same sort of address and be faulting.
> >> >
> >> > Yes, good catch. I'll have another look at those core dumps.
> >>
> >> With this in mind, please can you confirm what model of machines you've
> >> seen this on, and in particular whether they are all the same class of
> >> machine or whether they are significantly different.
> >>
> >> The reason being that randomly placed 0xff values in a field of 0x00
> >> could possibly indicate hardware (e.g. a GPU) DMAing over the wrong
> >> memory pages.
> >
> > Thanks for giving me access to the core files. This is very suspicious:
> > (gdb) frame 2
> > #2  0x000000000040a348 in tdb_open_ex (name=0x1941fb0 "/var/lib/xenstored/tdb.0x1935bb0", hash_size=<value optimized out>, tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>,
> >     log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at tdb.c:1958
> > 1958            SAFE_FREE(tdb->locked);
> >
> > (gdb) x/96x tdb
> > 0x1921270:      0x00000000      0x00000000      0x00000000      0x00000000
> > 0x1921280:      0x0000001f      0x000000ff      0x0000ff00      0x000000ff
> > 0x1921290:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x19212a0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x19212b0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x19212c0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x19212d0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x19212e0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x19212f0:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x1921300:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x1921310:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x1921320:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x1921330:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x1921340:      0x00000000      0x00000000      0x0000ff00      0x000000ff
> > 0x1921350:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x1921360:      0x00000000      0x000000ff      0x0000ff00      0x000000ff
> > 0x1921370:      0x004093b0      0x00000000      0x004092f0      0x00000000
> > 0x1921380:      0x00000002      0x00000000      0x00000091      0x00000000
> > 0x1921390:      0x0193de70      0x00000000      0x01963600      0x00000000
> > 0x19213a0:      0x00000000      0x00000000      0x0193fbb0      0x00000000
> > 0x19213b0:      0x00000000      0x00000000      0x00000000      0x00000000
> > 0x19213c0:      0x00405870      0x00000000      0x0040e3e0      0x00000000
> > 0x19213d0:      0x00000038      0x00000000      0xe814ec70      0x6f2f6567
> > 0x19213e0:      0x01963650      0x00000000      0x0193dec0      0x00000000
> >
> > Something has clearly done a number on the ram of this process.
> > 0x1921270 through 0x192136f is 256 bytes...
> >
> > Since it appears to be happening to other processes too I would hazard
> > that this is not a xenstored issue.
> >
> > Ian.
> >
> 
> Good catch Ian!
> 
> Strange corruption. Probably not related to xenstored as you
> suggested. I would be curious to see what's before the tdb pointer and
> where the corruption starts.

(gdb) print tdb
$2 = (TDB_CONTEXT *) 0x1921270
(gdb) x/64x 0x1921200
0x1921200:	0x01921174	0x00000000	0x00000000	0x00000000
0x1921210:	0x01921174	0x00000000	0x00000171	0x00000000
0x1921220:	0x00000000	0x00000000	0x00000000	0x00000000
0x1921230:	0x01941f60	0x00000000	0x00000000	0x00000000
0x1921240:	0x00000000	0x00000000	0x00000000	0x6f630065
0x1921250:	0x00000000	0x00000000	0x0040e8a7	0x00000000
0x1921260:	0x00000118	0x00000000	0xe814ec70	0x00000000
0x1921270:	0x00000000	0x00000000	0x00000000	0x00000000
0x1921280:	0x0000001f	0x000000ff	0x0000ff00	0x000000ff
0x1921290:	0x00000000	0x000000ff	0x0000ff00	0x000000ff
0x19212a0:	0x00000000	0x000000ff	0x0000ff00	0x000000ff
0x19212b0:	0x00000000	0x000000ff	0x0000ff00	0x000000ff
0x19212c0:	0x00000000	0x000000ff	0x0000ff00	0x000000ff
0x19212d0:	0x00000000	0x000000ff	0x0000ff00	0x000000ff
0x19212e0:	0x00000000	0x000000ff	0x0000ff00	0x000000ff
0x19212f0:	0x00000000	0x000000ff	0x0000ff00	0x000000ff

So it appears to start at 0x1921270 or maybe ...6c.
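
As a sanity check on the extent quoted earlier (0x1921270 through 0x192136f), the arithmetic can be sketched as below. The addresses are copied from the dumps in this thread; the constant and function names are made up for illustration, and the sketch only checks sizes, it says nothing about what wrote the pattern.

```python
# Addresses copied from the gdb dumps in this thread.
TDB_BASE = 0x1921270      # value of the tdb pointer
LAST_SUSPECT = 0x192136f  # last byte before the intact log_fn pointer
ROW_BYTES = 16            # gdb's x/Nx prints four 32-bit words per row


def span_bytes(first, last):
    """Size in bytes of the inclusive address range [first, last]."""
    return last - first + 1


size = span_bytes(TDB_BASE, LAST_SUSPECT)
rows = size // ROW_BYTES
first_intact = TDB_BASE + size

print(hex(size), rows, hex(first_intact))
```

The result is 0x100 bytes, i.e. 16 dump rows, and the first address after the range is 0x1921370, which is where the dump shows the plausible-looking pointer 0x004093b0 (null_log_fn).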

>  I also don't understand where the
> "fd = 47" came from a previous mail. 0x1f is 31, not 47 (which is
> 0x2f).

I must have been using a different coredump to the original report
(there are several).

In the one which corresponds to the above:

(gdb) print *tdb
$3 = {name = 0x0, map_ptr = 0x0, fd = 31, map_size = 255, 
  read_only = 65280, locked = 0xff00000000, ecode = 65280, header = {
    magic_food = "\377\000\000\000\000\000\000\000\377\000\000\000\000\377\000\000\377\000\000\000\000\000\000\000\377\000\000\000\000\377\000", version = 255, hash_size = 0, rwlocks = 255, reserved = {65280, 
      255, 0, 255, 65280, 255, 0, 255, 65280, 255, 0, 255, 65280, 
      255, 0, 255, 65280, 255, 0, 255, 65280, 255, 0, 255, 65280, 
      255, 0, 255, 65280, 255, 0}}, flags = 0, travlocks = {
    next = 0xff0000ff00, off = 0, hash = 255}, next = 0xff0000ff00, 
  device = 1095216660480, inode = 1095216725760, 
  log_fn = 0x4093b0 <null_log_fn>, 
  hash_fn = 0x4092f0 <default_tdb_hash>, open_flags = 2}
(gdb) print/x *tdb
$4 = {name = 0x0, map_ptr = 0x0, fd = 0x1f, map_size = 0xff, 
  read_only = 0xff00, locked = 0xff00000000, ecode = 0xff00, 
  header = {magic_food = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
      0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 
      0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0}, 
    version = 0xff, hash_size = 0x0, rwlocks = 0xff, reserved = {
      0xff00, 0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 0xff, 0xff00, 
      0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 
      0xff, 0xff00, 0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 0xff, 
      0xff00, 0xff, 0x0}}, flags = 0x0, travlocks = {
    next = 0xff0000ff00, off = 0x0, hash = 0xff}, 
  next = 0xff0000ff00, device = 0xff00000000, inode = 0xff0000ff00, 
  log_fn = 0x4093b0, hash_fn = 0x4092f0, open_flags = 0x2}

which is consistent.
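
To spell out why the two views agree, here is a minimal sketch that reassembles the first few fields from the 32-bit words of the x/96x dump. The field order and sizes are an assumption inferred from the gdb output above (x86-64, little-endian, 4 bytes of padding before the `locked` pointer), not taken from tdb.h, and the names `LAYOUT` and `NAMES` are invented for the example.

```python
import struct

# First 11 words at 0x1921270, copied from the x/96x dump.
words = [
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x0000001f, 0x000000ff, 0x0000ff00, 0x000000ff,
    0x00000000, 0x000000ff, 0x0000ff00,
]

# Rebuild the raw little-endian bytes that gdb displayed as words.
raw = struct.pack("<11I", *words)

# Assumed layout: two 8-byte pointers, three 4-byte integers,
# 4 bytes of alignment padding, one 8-byte pointer, one 4-byte integer.
LAYOUT = "<QQiIi4xQi"
NAMES = ("name", "map_ptr", "fd", "map_size", "read_only", "locked", "ecode")

fields = dict(zip(NAMES, struct.unpack(LAYOUT, raw)))

for field_name, value in fields.items():
    print(f"{field_name:9s} = {value:#x}")
```

Under that assumption the words 0x00000000 and 0x000000ff at 0x1921290 combine into locked = 0xff00000000, fd comes out as 0x1f (31), and read_only and ecode both come out as 0xff00, matching the print/x output above.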

> I would not be surprised about a strange bug in libc or the kernel.

Or even Xen itself, or the h/w.

Ian.
