From: Ian Campbell <Ian.Campbell@citrix.com>
To: Philipp Hahn <hahn@univention.de>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>, Xen-devel@lists.xen.org
Subject: Re: xenstored crashes with SIGSEGV
Date: Tue, 16 Dec 2014 09:51:14 +0000
Message-ID: <1418723474.16425.193.camel@citrix.com>
In-Reply-To: <548F60BF.4020901@univention.de>
On Mon, 2014-12-15 at 23:29 +0100, Philipp Hahn wrote:
> Hello Ian,
>
> On 15.12.2014 18:45, Ian Campbell wrote:
> > On Mon, 2014-12-15 at 14:50 +0000, Ian Campbell wrote:
> >> On Mon, 2014-12-15 at 15:19 +0100, Philipp Hahn wrote:
> >>> I just noticed something strange:
> >>>
> >>>> #3 0x000000000040a684 in tdb_open (name=0xff00000000 <Address
> >>>> 0xff00000000 out of bounds>, hash_size=0,
> >>>> tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
> ...
> > I'm reasonably convinced now that this is just a weird artefact of
> > running gdb on an optimised binary, probably a shortcoming in the debug
> > info leading to gdb getting confused.
> >
> > Unfortunately this also calls into doubt the parameter to talloc_free,
> > perhaps in that context 0xff0000000 is a similar artefact.
> >
> > Please can you print the entire contents of tdb in the second frame
> > ("print *tdb" ought to do it). I'm curious whether it is all sane or
> > not.
>
> (gdb) print *tdb
> $1 = {name = 0x0, map_ptr = 0x0, fd = 47, map_size = 65280, read_only =
> 16711680,
> locked = 0xff0000000000,
So it really does seem to be 0xff0000000000 in memory.
> flags = 0,
> travlocks = {
> next = 0xff0000, off = 0, hash = 65280}, next = 0xff0000,
> device = 280375465082880, inode = 16711680, log_fn = 0x4093b0
> <null_log_fn>,
> hash_fn = 0x4092f0 <default_tdb_hash>, open_flags = 2}
And here we can see tdb->{flags,open_flags} == 0 and 2, contrary to what
the stack trace says we were called with, which was nonsense. Since 0
and 2 are sensible and correspond to what the caller passes I think the
stack trace is just confused.
> (gdb) info registers
> rax 0x0 0
> rbx 0x16bff70 23854960
> rcx 0xffffffffffffffff -1
> rdx 0x40ecd0 4254928
> rsi 0x0 0
> rdi 0xff0000000000 280375465082880
And here it is in the registers.
> rbp 0x7fcaed6c96a8 0x7fcaed6c96a8
> rsp 0x7fff9dc86330 0x7fff9dc86330
> r8 0x7fcaece54c08 140509534571528
> r9 0xff00000000000000 -72057594037927936
> r10 0x7fcaed08c14c 140509536895308
> r11 0x246 582
> r12 0xd 13
> r13 0xff0000000000 280375465082880
And again.
> r14 0x4093b0 4232112
> r15 0x167d620 23582240
> rip 0x4075c4 0x4075c4 <talloc_chunk_from_ptr+4>
This must be the address of the faulting instruction.
> eflags 0x10206 [ PF IF RF ]
> cs 0x33 51
> ss 0x2b 43
> ds 0x0 0
> es 0x0 0
> fs 0x0 0
> gs 0x0 0
> fctrl 0x0 0
> fstat 0x0 0
> ftag 0x0 0
> fiseg 0x0 0
> fioff 0x0 0
> foseg 0x0 0
> fooff 0x0 0
> fop 0x0 0
> mxcsr 0x0 [ ]
>
> (gdb) disassemble
> Dump of assembler code for function talloc_chunk_from_ptr:
> 0x00000000004075c0 <talloc_chunk_from_ptr+0>: sub $0x8,%rsp
> 0x00000000004075c4 <talloc_chunk_from_ptr+4>: mov -0x8(%rdi),%edx
This is the instruction corresponding to %rip above, which is doing a
read via %rdi, which is 0xff0000000000.
It's reading tc->flags. The code has been optimised: tc = pp - SIZE, so
the load is of *(pp - SIZE + offsetof(flags)), which works out to pp-8
(flags is the last field in the struct).
So rdi contains pp, which is the ptr given as the argument to the
function, so ptr was bogus.
So it seems we really do have tdb->locked containing 0xff0000000000.
This is only allocated in one place, which is:

        tdb->locked = talloc_zero_array(tdb, struct tdb_lock_type,
                                        tdb->header.hash_size+1);

midway through tdb_open_ex. It might be worth inserting a check+log for
this returning 0xff, 0xff00, 0xff0000 ... 0xff0000000000 etc.
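Something along these lines is what I have in mind for the check+log. It
is only a rough sketch: ptr_is_single_ff_byte() and check_suspect_ptr()
are names made up for illustration, they are not existing tdb or talloc
functions, and logging to stderr is an assumption (xenstored would more
likely want its own logging).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of tdb or talloc: returns true if the
 * pointer value consists of a single 0xff byte with every other byte
 * zero, i.e. 0xff, 0xff00, 0xff0000, ... */
static bool ptr_is_single_ff_byte(const void *p)
{
    uintptr_t v = (uintptr_t)p;
    unsigned int shift;

    for (shift = 0; shift < sizeof(v) * 8; shift += 8) {
        if (v == ((uintptr_t)0xff << shift))
            return true;
    }
    return false;
}

/* Hypothetical wrapper: log and report a suspicious pointer.
 * Returns true if the pointer matched the pattern. */
static bool check_suspect_ptr(const char *what, const void *p)
{
    if (!ptr_is_single_ff_byte(p))
        return false;
    fprintf(stderr, "suspicious pointer for %s: %p\n", what, p);
    return true;
}
```

It would be called as check_suspect_ptr("tdb->locked", tdb->locked)
immediately after the allocation. If it fires there, the allocator handed
back the bad value; if it never fires, the field was overwritten later.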
> 0x00000000004075c7 <talloc_chunk_from_ptr+7>: lea -0x50(%rdi),%rax
This is actually calculating tc, ready for return upon success.
> 0x00000000004075cb <talloc_chunk_from_ptr+11>: mov %edx,%ecx
> 0x00000000004075cd <talloc_chunk_from_ptr+13>: and $0xfffffffffffffff0,%ecx
> 0x00000000004075d0 <talloc_chunk_from_ptr+16>: cmp $0xe814ec70,%ecx
> 0x00000000004075d6 <talloc_chunk_from_ptr+22>: jne 0x4075e2 <talloc_chunk_from_ptr+34>
(tc->flags & ~0xF) != TALLOC_MAGIC
> 0x00000000004075d8 <talloc_chunk_from_ptr+24>: and $0x1,%edx
> 0x00000000004075db <talloc_chunk_from_ptr+27>: jne 0x4075e2 <talloc_chunk_from_ptr+34>
tc->flags & TALLOC_FLAG_FREE
> 0x00000000004075dd <talloc_chunk_from_ptr+29>: add $0x8,%rsp
> 0x00000000004075e1 <talloc_chunk_from_ptr+33>: retq
Success, return.
> 0x00000000004075e2 <talloc_chunk_from_ptr+34>: nopw 0x0(%rax,%rax,1)
> 0x00000000004075e8 <talloc_chunk_from_ptr+40>: callq 0x401b98 <abort@plt>
The two TALLOC_ABORTS both end up here if the checks above fail.
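Putting the annotations together, the function is roughly equivalent to
the C below. This is a reconstruction from the disassembly, not a copy of
the talloc source: struct chunk_sketch is a stand-in whose only grounded
properties are its size (0x50, from the lea) and the position of flags
(8 bytes before the end, from the mov), and the two constants are the
ones visible in the cmp and the and.

```c
#include <stddef.h>
#include <stdlib.h>

#define TALLOC_MAGIC     0xe814ec70u /* value compared by the cmp */
#define TALLOC_FLAG_FREE 0x1u        /* bit tested by "and $0x1,%edx" */

/* Stand-in for the chunk header. The field names and contents before
 * flags are placeholders; only the total size (0x50) and the offset of
 * flags (0x48) are taken from the disassembly. */
struct chunk_sketch {
    unsigned char other_fields[0x48];
    unsigned int flags;
    unsigned int pad;
};

static struct chunk_sketch *chunk_from_ptr_sketch(void *ptr)
{
    char *pp = ptr;
    /* lea -0x50(%rdi),%rax */
    struct chunk_sketch *tc =
        (struct chunk_sketch *)(pp - sizeof(struct chunk_sketch));

    /* mov -0x8(%rdi),%edx: the first memory access, so a bogus ptr
     * faults here before either check can run. */
    if ((tc->flags & ~0xFu) != TALLOC_MAGIC)
        abort();
    if (tc->flags & TALLOC_FLAG_FREE)
        abort();
    return tc;
}
```

With ptr == 0xff0000000000 the read of tc->flags touches an unmapped
address, which is why we get SIGSEGV rather than the abort() from one of
the two checks.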
> > Can you also "p $_siginfo._sifields._sigfault.si_addr" (in frame 0).
> > This ought to be the actual faulting address, which ought to give a hint
> > on how much we can trust the parameters in the stack trace.
>
> Hmm, my gdb refused to access $_siginfo:
> (gdb) show convenience
> $_siginfo = Unable to read siginfo
That's ok, I think I've convinced myself above what the crash is.
Ian.