From: Philipp Hahn <hahn@univention.de>
To: Ian Campbell <Ian.Campbell@citrix.com>,
Frediano Ziglio <freddy77@gmail.com>,
Oleg Nesterov <oleg@redhat.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
Xen-devel@lists.xen.org
Subject: Re: xenstored crashes with SIGSEGV
Date: Thu, 12 Mar 2015 13:08:46 +0100
Message-ID: <550181CE.8090204@univention.de>
In-Reply-To: <54AB8C64.5030307@univention.de>
Hello,
On 06.01.2015 08:19, Philipp Hahn wrote:
> On 19.12.2014 13:36, Philipp Hahn wrote:
>> On 18.12.2014 11:17, Ian Campbell wrote:
>>> On Tue, 2014-12-16 at 16:13 +0000, Frediano Ziglio wrote:
>>>> Do we have a bug in Xen that affects SSE instructions (possibly already
>>>> fixed after Philipp's version)?
>>>
>>> I've had a niggling feeling of Deja Vu over this which I'd been putting
>>> down to an old Xen on ARM bug in the area of FPU register switching.
>>>
>>> But it seems at some point (possibly even still) there was a similar
>>> issue with pvops kernels on x86, see:
>>> http://bugs.xenproject.org/xen/bug/40
...
>>> Philipp, what kernel are you guys using?
>>
>> The crash "2014-12-06 01:26:21 xenstored[4337]" happened on linux-3.10.46.
>
> I looked through the changes of v3.10.46..v3.10.63 and found the
> following patches:
> | fb5b6e7 x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
> | b888e3d x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
>
> They look promising enough that they may have fixed the bug, which could
> explain the strange bit pattern caused by not restoring the FPU state
> correctly.
...
> We're now working on upgrading the dom0 kernel, which should give us
> usable core dumps again and may also fix the underlying problem. If that
> bug ever happens again, I'll keep you informed.
We're now running 3.10.62 and the situation seems to have improved, but
yesterday and today we got two crashes on different hosts. Both crashes
happened in vsnprintf() again:
> [304534.173707] xenstored[3731]: segfault at 2 ip 00007f6da00805ad sp 00007fff544a2b80 error 4 in libc-2.11.3.so[7f6da003b000+158000]
> (gdb) where
> #0 0x00007f6da00805ad in _IO_vfprintf_internal (s=0x7fff544a3230, format=<value optimized out>, ap=0x7fff544a3790) at vfprintf.c:1617
> #1 0x00007f6da00a2452 in _IO_vsnprintf (string=0x7fff544a3390 "%%p 4249828122762082015 03:11:04 9JT\377\177", maxlen=<value optimized out>, format=0x40da48 "%s %p %04d%02d%02d %02d:%02d:%02d %s (", args=0x7fff544a3790) at vsnprintf.c:120
> #2 0x00000000004029ad in trace (fmt=0x40da48 "%s %p %04d%02d%02d %02d:%02d:%02d %s (") at xenstored_core.c:140
> #3 0x0000000000402c67 in trace_io (conn=0xbb51f0, data=0xbf1fe0, out=0) at xenstored_core.c:174
> #4 0x00000000004041cd in handle_input (conn=0xbb51f0) at xenstored_core.c:1307
> #5 0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964
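The kernel line above can be decoded mechanically. The following is only a minimal Python sketch: the function name decode_segfault is made up for this mail, the error bits follow the usual x86 page-fault encoding (bit 0 = page present, bit 1 = write, bit 2 = user mode), and the computed offset is only an offset into the library file if the mapping starts at file offset 0, which I have not verified.

```python
import re

# Matches the x86 Linux "segfault at ..." log line quoted above.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a segfault log line into its fields (hypothetical helper)."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "fault_addr": int(m.group("addr"), 16),
        "object": m.group("obj"),
        # Offset of the faulting instruction inside the mapping.
        "offset": ip - base,
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


LINE = (
    "xenstored[3731]: segfault at 2 ip 00007f6da00805ad "
    "sp 00007fff544a2b80 error 4 in libc-2.11.3.so[7f6da003b000+158000]"
)
info = decode_segfault(LINE)
print(hex(info["offset"]), info)
```

For this crash, "error 4" decodes to a user-mode read of a page that is not present, at address 2.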
The SSE registers again contain the 00..ff.. pattern, but the access to
%es:(%rdi)=0x0:0x2 looks very broken: address 2 is also the fault address
in the kernel log line above.
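To make the xmm values in the dump below easier to read, here is a small Python sketch that turns the uint128 values printed by gdb into their 16 bytes in memory order. The helper names are made up for this mail. My reading of the result is a guess: xmm1 holding sixteen '%' characters would fit a SIMD search for '%' in the format string, and xmm4 looks like a fragment of an environment string.

```python
def xmm_bytes(uint128_hex):
    """Return the 16 register bytes in memory order (lowest byte first)."""
    return int(uint128_hex, 16).to_bytes(16, "little")


def printable(raw):
    """Render bytes as ASCII, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# uint128 values copied from the gdb register dump.
XMM = {
    "xmm0": "0x0000000000ff00000000ff00000000ff",
    "xmm1": "0x25252525252525252525252525252525",
    "xmm4": "0x736e6f632f7665642f3d454c4f534e4f",
}

for name, value in XMM.items():
    raw = xmm_bytes(value)
    ff_positions = [i for i, b in enumerate(raw) if b == 0xFF]
    print(name, repr(printable(raw)), "0xff at bytes", ff_positions)
```

The 00..ff.. pattern in xmm0 shows up as 0xff at byte positions 0, 5 and 10.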
> (gdb) info all-registers
> rax 0x0 0
> rbx 0x40da48 4250184
> rcx 0xffffffffffffffff -1
> rdx 0x7fff544a3890 140734607538320
> rsi 0x40da69 4250217
> rdi 0x2 2
> rbp 0x7fff544a3790 0x7fff544a3790
> rsp 0x7fff544a3390 0x7fff544a3390
> r8 0x1 1
> r9 0x2 2
> r10 0x2 2
> r11 0x10 16
> r12 0x0 0
> r13 0x7fff544a3950 140734607538512
> r14 0x7fff544a39d0 140734607538640
> r15 0xc 12
> rip 0x4029ad 0x4029ad <trace+221>
> eflags 0x10286 [ PF SF IF RF ]
> cs 0xe033 57395
> ss 0xe02b 57387
> ds 0x0 0
> es 0x0 0
> fs 0x0 0
> gs 0x0 0
> st0 0 (raw 0x00000000000000000000)
> st1 0 (raw 0x00000000000000000000)
> st2 0 (raw 0x00000000000000000000)
> st3 0 (raw 0x00000000000000000000)
> st4 0 (raw 0x00000000000000000000)
> st5 0 (raw 0x00000000000000000000)
> st6 0 (raw 0x00000000000000000000)
> st7 0 (raw 0x00000000000000000000)
> fctrl 0x37f 895
> fstat 0x0 0
> ftag 0xffff 65535
> fiseg 0x0 0
> fioff 0x0 0
> foseg 0x0 0
> fooff 0x0 0
> fop 0x0 0
> xmm0 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0xff, 0x0, 0xff00, 0x0, 0x0, 0xff, 0x0, 0x0}, v4_int32 = {0xff, 0xff00, 0xff0000, 0x0}, v2_int64 = {0xff00000000ff, 0xff0000}, uint128 = 0x0000000000ff00000000ff00000000ff}
> xmm1 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x25 <repeats 16 times>}, v8_int16 = {0x2525, 0x2525, 0x2525, 0x2525, 0x2525, 0x2525, 0x2525, 0x2525}, v4_int32 = {0x25252525, 0x25252525, 0x25252525, 0x25252525}, v2_int64 = {0x2525252525252525, 0x2525252525252525}, uint128 = 0x25252525252525252525252525252525}
> xmm2 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm3 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x8000000000000000}, v16_int8 = {0x0 <repeats 14 times>, 0xff, 0xff}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff}, v4_int32 = {0x0, 0x0, 0x0, 0xffff0000}, v2_int64 = {0x0, 0xffff000000000000}, uint128 = 0xffff0000000000000000000000000000}
> xmm4 {v4_float = {0xd34e4f00, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x8000000000000000}, v16_int8 = {0x4f, 0x4e, 0x53, 0x4f, 0x4c, 0x45, 0x3d, 0x2f, 0x64, 0x65, 0x76, 0x2f, 0x63, 0x6f, 0x6e, 0x73}, v8_int16 = {0x4e4f, 0x4f53, 0x454c, 0x2f3d, 0x6564, 0x2f76, 0x6f63, 0x736e}, v4_int32 = {0x4f534e4f, 0x2f3d454c, 0x2f766564, 0x736e6f63}, v2_int64 = {0x2f3d454c4f534e4f, 0x736e6f632f766564}, uint128 = 0x736e6f632f7665642f3d454c4f534e4f}
> xmm5 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm6 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm7 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm8 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm9 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm10 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm11 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm12 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm13 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm14 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm15 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> mxcsr 0x1f80 [ IM DM ZM OM UM PM ]
> (gdb) x/20i $pc
> 0x7f6da00805ad <_IO_vfprintf_internal+15357>: repnz scas %es:(%rdi),%al
> 0x7f6da00805af <_IO_vfprintf_internal+15359>: xor %r10d,%r10d
> 0x7f6da00805b2 <_IO_vfprintf_internal+15362>: not %rcx
> 0x7f6da00805b5 <_IO_vfprintf_internal+15365>: lea -0x1(%rcx),%r8
> 0x7f6da00805b9 <_IO_vfprintf_internal+15369>: mov %r8d,%ecx
> 0x7f6da00805bc <_IO_vfprintf_internal+15372>: jmpq 0x7f6da007e00c <_IO_vfprintf_internal+5724>
> 0x7f6da00805c1 <_IO_vfprintf_internal+15377>: mov $0x6,%ecx
> 0x7f6da00805c6 <_IO_vfprintf_internal+15382>: xor %r10d,%r10d
> 0x7f6da00805c9 <_IO_vfprintf_internal+15385>: mov $0x6,%r8d
> 0x7f6da00805cf <_IO_vfprintf_internal+15391>: lea 0xdff57(%rip),%r9 # 0x7f6da016052d <null>
> 0x7f6da00805d6 <_IO_vfprintf_internal+15398>: jmpq 0x7f6da007d546 <_IO_vfprintf_internal+2966>
> 0x7f6da00805db <_IO_vfprintf_internal+15403>: mov 0x8(%r13),%rax
> 0x7f6da00805df <_IO_vfprintf_internal+15407>: lea 0x8(%rax),%rdx
> 0x7f6da00805e3 <_IO_vfprintf_internal+15411>: mov %rdx,0x8(%r13)
> 0x7f6da00805e7 <_IO_vfprintf_internal+15415>: jmpq 0x7f6da007eac2 <_IO_vfprintf_internal+8466>
> 0x7f6da00805ec <_IO_vfprintf_internal+15420>: mov 0x8(%r13),%rax
> 0x7f6da00805f0 <_IO_vfprintf_internal+15424>: lea 0x8(%rax),%rdx
> 0x7f6da00805f4 <_IO_vfprintf_internal+15428>: mov %rdx,0x8(%r13)
> 0x7f6da00805f8 <_IO_vfprintf_internal+15432>: jmpq 0x7f6da007f91e <_IO_vfprintf_internal+12142>
> 0x7f6da00805fd <_IO_vfprintf_internal+15437>: mov 0x8(%r13),%rax
> (gdb) x/64x $sp
> 0x7fff544a2b80: 0x544a3260 0x00007fff 0x00000001 0x00000000
> 0x7fff544a2b90: 0x0040da6a 0x00000000 0x0040da6a 0x00000000
> 0x7fff544a2ba0: 0x544a3260 0x00007fff 0xa007cb39 0x00007f6d
> 0x7fff544a2bb0: 0x00000025 0x00000000 0x00000000 0x00000000
> 0x7fff544a2bc0: 0x544a3110 0x00007fff 0x0040d500 0x00000000
> 0x7fff544a2bd0: 0x0040da48 0x00000000 0x00000000 0x00000000
> 0x7fff544a2be0: 0x00000027 0x00000000 0x544a317c 0x00007fff
> 0x7fff544a2bf0: 0x544a31b8 0x00007fff 0x544a3198 0x00007fff
> 0x7fff544a2c00: 0x00000000 0x00000000 0x00000000 0x00000000
> 0x7fff544a2c10: 0x544a2d00 0x00007fff 0x544a31ac 0x00007fff
> 0x7fff544a2c20: 0x544a31e8 0x00007fff 0x544a31c8 0x00000000
> 0x7fff544a2c30: 0x544a3170 0x00007fff 0xffffffff 0xffffffff
> 0x7fff544a2c40: 0x544a2d30 0x00007fff 0x544a30e8 0x00007fff
> 0x7fff544a2c50: 0x0040da70 0x00000000 0x00000000 0x00000000
> 0x7fff544a2c60: 0x00000000 0xffffe938 0xffffff20 0xffffffff
> 0x7fff544a2c70: 0x544a3238 0x00007fff 0x544a3118 0x00007fff
To me it looks like there is still some register/memory corruption
happening in the kernel or Xen hypervisor.
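For reference, the faulting sequence (repnz scas; not %rcx; lea -0x1(%rcx),%r8) is the classic inline strlen idiom. The following Python sketch emulates it on a dictionary that stands in for process memory. The names scas_strlen and memory_image and the address 0x1000 are made up for illustration; only the register values mentioned in the comments come from the dump.

```python
MASK64 = (1 << 64) - 1


def scas_strlen(memory, rdi):
    """Emulate: al = 0, rcx = -1; repnz scasb; not rcx; lea -1(rcx), r8."""
    rcx = MASK64  # rcx = 0xffffffffffffffff, as seen in the dump
    while rcx != 0:
        # Reading an unmapped address raises KeyError here; on real
        # hardware this is the page fault, taken before rcx changes.
        byte = memory[rdi]
        rdi += 1
        rcx = (rcx - 1) & MASK64
        if byte == 0:  # scasb compared the byte with al == 0
            break
    rcx = ~rcx & MASK64  # not %rcx
    return (rcx - 1) & MASK64  # lea -0x1(%rcx),%r8


# Hypothetical memory image: the string "vm" plus NUL at address 0x1000.
memory_image = {0x1000 + i: b for i, b in enumerate(b"vm\0")}

print(scas_strlen(memory_image, 0x1000))
try:
    scas_strlen(memory_image, 0x2)  # rdi = 2, as in the crash
except KeyError:
    print("fault at address 0x2")
```

In the dump rcx is still 0xffffffffffffffff and rax is 0, which would fit a fault on the very first byte read, i.e. the string pointer itself was already 2 when the scan started.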
@Oleg:
Have you seen any other corruption, or is one of the following patches of
yours likely to fix an issue like the one described above?
> $ git l1 --grep fpu v3.10.. -- arch/x86
> c7b228a Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> dc56c0f x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
> 5e23fee x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
> f185350 x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
> 31d9633 x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()
Philipp