From: Ian Campbell <Ian.Campbell@citrix.com>
To: Philipp Hahn <hahn@univention.de>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>, Xen-devel@lists.xen.org
Subject: Re: xenstored crashes with SIGSEGV
Date: Tue, 16 Dec 2014 09:51:14 +0000
Message-ID: <1418723474.16425.193.camel@citrix.com>
In-Reply-To: <548F60BF.4020901@univention.de>

On Mon, 2014-12-15 at 23:29 +0100, Philipp Hahn wrote:
> Hello Ian,
> 
> On 15.12.2014 18:45, Ian Campbell wrote:
> > On Mon, 2014-12-15 at 14:50 +0000, Ian Campbell wrote:
> >> On Mon, 2014-12-15 at 15:19 +0100, Philipp Hahn wrote:
> >>> I just noticed something strange:
> >>>
> >>>> #3  0x000000000040a684 in tdb_open (name=0xff00000000 <Address
> >>>> 0xff00000000 out of bounds>, hash_size=0,
> >>>>     tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
> ...
> > I'm reasonably convinced now that this is just a weird artefact of
> > running gdb on an optimised binary, probably a shortcoming in the debug
> > info leading to gdb getting confused.
> > 
> > Unfortunately this also calls into doubt the parameter to talloc_free,
> > perhaps in that context 0xff0000000 is a similar artefact.
> > 
> > Please can you print the entire contents of tdb in the second frame
> > ("print *tdb" ought to do it). I'm curious whether it is all sane or
> > not.
> 
> (gdb) print *tdb
> $1 = {name = 0x0, map_ptr = 0x0, fd = 47, map_size = 65280, read_only =
> 16711680,
>   locked = 0xff0000000000,

So it really does seem to be 0xff0000000000 in memory.

> flags = 0,
> travlocks = {
>     next = 0xff0000, off = 0, hash = 65280}, next = 0xff0000,
>   device = 280375465082880, inode = 16711680, log_fn = 0x4093b0
> <null_log_fn>,
>   hash_fn = 0x4092f0 <default_tdb_hash>, open_flags = 2}

And here we can see tdb->{flags,open_flags} == 0 and 2, contrary to what
the stack trace says we were called with, which was nonsense. Since 0
and 2 are sensible and correspond to what the caller passes I think the
stack trace is just confused.

> (gdb) info registers
> rax            0x0      0
> rbx            0x16bff70        23854960
> rcx            0xffffffffffffffff       -1
> rdx            0x40ecd0 4254928
> rsi            0x0      0
> rdi            0xff0000000000   280375465082880

And here it is in the registers.

> rbp            0x7fcaed6c96a8   0x7fcaed6c96a8
> rsp            0x7fff9dc86330   0x7fff9dc86330
> r8             0x7fcaece54c08   140509534571528
> r9             0xff00000000000000       -72057594037927936
> r10            0x7fcaed08c14c   140509536895308
> r11            0x246    582
> r12            0xd      13
> r13            0xff0000000000   280375465082880

And again.

> r14            0x4093b0 4232112
> r15            0x167d620        23582240
> rip            0x4075c4 0x4075c4 <talloc_chunk_from_ptr+4>

This must be the address of the faulting instruction.

> eflags         0x10206  [ PF IF RF ]
> cs             0x33     51
> ss             0x2b     43
> ds             0x0      0
> es             0x0      0
> fs             0x0      0
> gs             0x0      0
> fctrl          0x0      0
> fstat          0x0      0
> ftag           0x0      0
> fiseg          0x0      0
> fioff          0x0      0
> foseg          0x0      0
> fooff          0x0      0
> fop            0x0      0
> mxcsr          0x0      [ ]
> 
> (gdb) disassemble
> Dump of assembler code for function talloc_chunk_from_ptr:
> 0x00000000004075c0 <talloc_chunk_from_ptr+0>:   sub    $0x8,%rsp
> 0x00000000004075c4 <talloc_chunk_from_ptr+4>:   mov    -0x8(%rdi),%edx

This is the line corresponding to %rip above, which is doing a read via
%rdi, which is 0xff0000000000.

It's reading tc->flags. The compiler has folded tc = pp - SIZE into the
load, so it is reading *(pp-SIZE+offsetof(flags)), which is pp-8 (flags
is the last field in the struct).

So rdi contains pp which == the ptr given as an argument to the
function, so ptr was bogus.
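
To make the offset arithmetic concrete, here is a minimal sketch. The
struct below is an assumption chosen only to match the disassembly (a
0x50 byte header, per the lea, with a 4 byte flags word 8 bytes before
the user pointer); it is not the real struct talloc_chunk, and the names
hdr_sketch, HDR_SIZE and flags_addr_from_ptr are made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the talloc header. Only the total size (0x50)
 * and the position of flags (0x48) are taken from the disassembly; the
 * rest is filler invented for this sketch. */
struct hdr_sketch {
	unsigned char other_fields[0x48]; /* everything that precedes flags */
	uint32_t flags;                   /* read by "mov -0x8(%rdi),%edx" */
	uint32_t tail_pad;                /* explicit stand-in for tail padding */
};

#define HDR_SIZE sizeof(struct hdr_sketch)

/* Address touched by the faulting load for a given user pointer pp:
 * pp - HDR_SIZE + offsetof(flags), which works out to pp - 8. */
uint64_t flags_addr_from_ptr(uint64_t pp)
{
	return pp - HDR_SIZE + offsetof(struct hdr_sketch, flags);
}
```

With pp = 0xff0000000000 (the value in %rdi) this gives an address just
below 0xff0000000000, which is not mapped, hence the SIGSEGV before any
of the magic checks run.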

So it seems we really do have tdb->locked containing 0xff0000000000.

This is only allocated in one place which is:
	tdb->locked = talloc_zero_array(tdb, struct tdb_lock_type,
					tdb->header.hash_size+1);
midway through tdb_open_ex. It might be worth inserting a check+log for
this returning 0xff, 0xff00, 0xff0000 ... 0xff0000000000 etc.
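
Such a check could look roughly like the sketch below. It is only an
illustration: is_single_ff_byte and check_suspicious_ptr are
hypothetical names, not existing tdb or talloc functions, and the log
goes to stderr purely as a placeholder for whatever logging xenstored
would really use.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* True if v is exactly one 0xff byte with every other byte zero,
 * i.e. 0xff, 0xff00, 0xff0000, ... 0xff00000000000000. */
bool is_single_ff_byte(uint64_t v)
{
	for (int shift = 0; shift < 64; shift += 8) {
		if (v == ((uint64_t)0xff << shift))
			return true;
	}
	return false;
}

/* Hypothetical helper: log the pointer if it matches the pattern above.
 * Returns true when the pointer looks suspicious. */
bool check_suspicious_ptr(const char *what, const void *p)
{
	if (!is_single_ff_byte((uint64_t)(uintptr_t)p))
		return false;
	fprintf(stderr, "suspicious pointer %s = %p\n", what, p);
	return true;
}
```

It would be called right after the allocation, e.g.
check_suspicious_ptr("tdb->locked", tdb->locked). The same predicate
also matches the other odd values in the dump above (65280 == 0xff00,
16711680 == 0xff0000, 280375465082880 == 0xff0000000000).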

> 0x00000000004075c7 <talloc_chunk_from_ptr+7>:   lea    -0x50(%rdi),%rax

This is actually calculating tc, ready for return upon success.

> 0x00000000004075cb <talloc_chunk_from_ptr+11>:  mov    %edx,%ecx
> 0x00000000004075cd <talloc_chunk_from_ptr+13>:  and    $0xfffffffffffffff0,%ecx
> 0x00000000004075d0 <talloc_chunk_from_ptr+16>:  cmp    $0xe814ec70,%ecx
> 0x00000000004075d6 <talloc_chunk_from_ptr+22>:  jne    0x4075e2 <talloc_chunk_from_ptr+34>

(tc->flags & ~0xF) != TALLOC_MAGIC

> 0x00000000004075d8 <talloc_chunk_from_ptr+24>:  and    $0x1,%edx
> 0x00000000004075db <talloc_chunk_from_ptr+27>:  jne    0x4075e2 <talloc_chunk_from_ptr+34>

tc->flags & TALLOC_FLAG_FREE

> 0x00000000004075dd <talloc_chunk_from_ptr+29>:  add    $0x8,%rsp
> 0x00000000004075e1 <talloc_chunk_from_ptr+33>:  retq

Success, return.

> 0x00000000004075e2 <talloc_chunk_from_ptr+34>:  nopw   0x0(%rax,%rax,1)
> 0x00000000004075e8 <talloc_chunk_from_ptr+40>:  callq  0x401b98 <abort@plt>

Both TALLOC_ABORT calls end up here if the checks above fail.
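
Put back into C, the two checks in the disassembly amount to something
like the sketch below. The constants are read straight off the cmp and
and instructions; chunk_flags_ok is a hypothetical name, and it returns
false where the real code would abort.

```c
#include <stdbool.h>
#include <stdint.h>

#define TALLOC_MAGIC     0xe814ec70u /* from "cmp $0xe814ec70,%ecx" */
#define TALLOC_FLAG_FREE 0x01u       /* from "and $0x1,%edx" */

/* Sketch of the validation done on tc->flags: true means the chunk
 * would be returned, false means one of the abort paths is taken. */
bool chunk_flags_ok(uint32_t flags)
{
	if ((flags & ~0xFu) != TALLOC_MAGIC)
		return false; /* magic mismatch: not a live talloc chunk */
	if (flags & TALLOC_FLAG_FREE)
		return false; /* chunk is marked as already freed */
	return true;
}
```

Note that in this crash neither check is reached: the load of
tc->flags itself faults, so we get SIGSEGV rather than an abort.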

> > Can you also "p $_siginfo._sifields._sigfault.si_addr" (in frame 0).
> > This ought to be the actual faulting address, which ought to give a hint
> > on how much we can trust the parameters in the stack trace.
> 
> Hmm, my gdb refused to access $_siginfo:
> (gdb) show convenience
> $_siginfo = Unable to read siginfo

That's ok, I think I've convinced myself above what the crash is.

Ian.
