From: Philipp Hahn <hahn@univention.de>
To: Ian Campbell <Ian.Campbell@citrix.com>,
	Frediano Ziglio <freddy77@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	David Vrabel <david.vrabel@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	Xen-devel@lists.xen.org
Subject: Re: xenstored crashes with SIGSEGV
Date: Thu, 12 Mar 2015 13:08:46 +0100
Message-ID: <550181CE.8090204@univention.de>
In-Reply-To: <54AB8C64.5030307@univention.de>

Hello,

On 06.01.2015 08:19, Philipp Hahn wrote:
> On 19.12.2014 13:36, Philipp Hahn wrote:
>> On 18.12.2014 11:17, Ian Campbell wrote:
>>> On Tue, 2014-12-16 at 16:13 +0000, Frediano Ziglio wrote:
>>>> Do we have a bug in Xen that affects SSE instructions (possibly already
>>>> fixed after Philipp's version)?
>>>
>>> I've had a niggling feeling of Deja Vu over this which I'd been putting
>>> down to an old Xen on ARM bug in the area of FPU register switching.
>>>
>>> But it seems at some point (possibly even still) there was a similar
>>> issue with pvops kernels on x86, see:
>>>         http://bugs.xenproject.org/xen/bug/40
...
>>> Philipp, what kernel are you guys using?
>>
>> The crash "2014-12-06 01:26:21 xenstored[4337]" happened on linux-3.10.46.
> 
> I looked through the changes of v3.10.46..v3.10.63 and found the
> following patches:
> | fb5b6e7 x86, fpu: shift drop_init_fpu() from save_xstate_sig() to
> handle_signal()
> | b888e3d x86, fpu: __restore_xstate_sig()->math_state_restore() needs
> preempt_disable()
> 
> They look interesting enough that they may have fixed the bug, which could
> explain the strange bit pattern caused by not restoring the FPU state
> correctly.
...
> we're now working on upgrading the dom0 kernel which should give us
> usable core dumps again and may also fix the underlying problem. If that
> bug ever happens again I'll keep you informed.

We're now running 3.10.62 and the situation seems to have improved, but
yesterday and today we got two crashes on different hosts, both of them
again in vsnprintf():

> [304534.173707] xenstored[3731]: segfault at 2 ip 00007f6da00805ad sp 00007fff544a2b80 error 4 in libc-2.11.3.so[7f6da003b000+158000]

> (gdb) where
> #0  0x00007f6da00805ad in _IO_vfprintf_internal (s=0x7fff544a3230, format=<value optimized out>, ap=0x7fff544a3790) at vfprintf.c:1617
> #1  0x00007f6da00a2452 in _IO_vsnprintf (string=0x7fff544a3390 "%%p 4249828122762082015 03:11:04 9JT\377\177", maxlen=<value optimized out>, format=0x40da48 "%s %p %04d%02d%02d %02d:%02d:%02d %s (", args=0x7fff544a3790) at vsnprintf.c:120
> #2  0x00000000004029ad in trace (fmt=0x40da48 "%s %p %04d%02d%02d %02d:%02d:%02d %s (") at xenstored_core.c:140
> #3  0x0000000000402c67 in trace_io (conn=0xbb51f0, data=0xbf1fe0, out=0) at xenstored_core.c:174
> #4  0x00000000004041cd in handle_input (conn=0xbb51f0) at xenstored_core.c:1307
> #5  0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964

The SSE registers again contain the 00..ff.. pattern, but the access to
%es:(%rdi)=0x0:0x2 looks very broken.

> (gdb) info all-registers 
> rax            0x0      0
> rbx            0x40da48 4250184
> rcx            0xffffffffffffffff       -1
> rdx            0x7fff544a3890   140734607538320
> rsi            0x40da69 4250217
> rdi            0x2      2
> rbp            0x7fff544a3790   0x7fff544a3790
> rsp            0x7fff544a3390   0x7fff544a3390
> r8             0x1      1
> r9             0x2      2
> r10            0x2      2
> r11            0x10     16
> r12            0x0      0
> r13            0x7fff544a3950   140734607538512
> r14            0x7fff544a39d0   140734607538640
> r15            0xc      12
> rip            0x4029ad 0x4029ad <trace+221>
> eflags         0x10286  [ PF SF IF RF ]
> cs             0xe033   57395
> ss             0xe02b   57387
> ds             0x0      0
> es             0x0      0
> fs             0x0      0
> gs             0x0      0
> st0            0        (raw 0x00000000000000000000)
> st1            0        (raw 0x00000000000000000000)
> st2            0        (raw 0x00000000000000000000)
> st3            0        (raw 0x00000000000000000000)
> st4            0        (raw 0x00000000000000000000)
> st5            0        (raw 0x00000000000000000000)
> st6            0        (raw 0x00000000000000000000)
> st7            0        (raw 0x00000000000000000000)
> fctrl          0x37f    895
> fstat          0x0      0
> ftag           0xffff   65535
> fiseg          0x0      0
> fioff          0x0      0
> foseg          0x0      0
> fooff          0x0      0
> fop            0x0      0
> xmm0           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0xff, 0x0, 0xff00, 0x0, 0x0, 0xff, 0x0, 0x0}, v4_int32 = {0xff, 0xff00, 0xff0000, 0x0}, v2_int64 = {0xff00000000ff, 0xff0000}, uint128 = 0x0000000000ff00000000ff00000000ff}
> xmm1           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x25 <repeats 16 times>}, v8_int16 = {0x2525, 0x2525, 0x2525, 0x2525, 0x2525, 0x2525, 0x2525, 0x2525}, v4_int32 = {0x25252525, 0x25252525, 0x25252525, 0x25252525}, v2_int64 = {0x2525252525252525, 0x2525252525252525}, uint128 = 0x25252525252525252525252525252525}
> xmm2           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm3           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x8000000000000000}, v16_int8 = {0x0 <repeats 14 times>, 0xff, 0xff}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff}, v4_int32 = {0x0, 0x0, 0x0, 0xffff0000}, v2_int64 = {0x0, 0xffff000000000000}, uint128 = 0xffff0000000000000000000000000000}
> xmm4           {v4_float = {0xd34e4f00, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x8000000000000000}, v16_int8 = {0x4f, 0x4e, 0x53, 0x4f, 0x4c, 0x45, 0x3d, 0x2f, 0x64, 0x65, 0x76, 0x2f, 0x63, 0x6f, 0x6e, 0x73}, v8_int16 = {0x4e4f, 0x4f53, 0x454c, 0x2f3d, 0x6564, 0x2f76, 0x6f63, 0x736e}, v4_int32 = {0x4f534e4f, 0x2f3d454c, 0x2f766564, 0x736e6f63}, v2_int64 = {0x2f3d454c4f534e4f, 0x736e6f632f766564}, uint128 = 0x736e6f632f7665642f3d454c4f534e4f}
> xmm5           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm6           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm7           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm8           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm9           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm10          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm11          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm12          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm13          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm14          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> xmm15          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
> mxcsr          0x1f80   [ IM DM ZM OM UM PM ]

> (gdb) x/20i $pc
> 0x7f6da00805ad <_IO_vfprintf_internal+15357>:   repnz scas %es:(%rdi),%al
> 0x7f6da00805af <_IO_vfprintf_internal+15359>:   xor    %r10d,%r10d
> 0x7f6da00805b2 <_IO_vfprintf_internal+15362>:   not    %rcx
> 0x7f6da00805b5 <_IO_vfprintf_internal+15365>:   lea    -0x1(%rcx),%r8
> 0x7f6da00805b9 <_IO_vfprintf_internal+15369>:   mov    %r8d,%ecx
> 0x7f6da00805bc <_IO_vfprintf_internal+15372>:   jmpq   0x7f6da007e00c <_IO_vfprintf_internal+5724>
> 0x7f6da00805c1 <_IO_vfprintf_internal+15377>:   mov    $0x6,%ecx
> 0x7f6da00805c6 <_IO_vfprintf_internal+15382>:   xor    %r10d,%r10d
> 0x7f6da00805c9 <_IO_vfprintf_internal+15385>:   mov    $0x6,%r8d
> 0x7f6da00805cf <_IO_vfprintf_internal+15391>:   lea    0xdff57(%rip),%r9        # 0x7f6da016052d <null>
> 0x7f6da00805d6 <_IO_vfprintf_internal+15398>:   jmpq   0x7f6da007d546 <_IO_vfprintf_internal+2966>
> 0x7f6da00805db <_IO_vfprintf_internal+15403>:   mov    0x8(%r13),%rax
> 0x7f6da00805df <_IO_vfprintf_internal+15407>:   lea    0x8(%rax),%rdx
> 0x7f6da00805e3 <_IO_vfprintf_internal+15411>:   mov    %rdx,0x8(%r13)
> 0x7f6da00805e7 <_IO_vfprintf_internal+15415>:   jmpq   0x7f6da007eac2 <_IO_vfprintf_internal+8466>
> 0x7f6da00805ec <_IO_vfprintf_internal+15420>:   mov    0x8(%r13),%rax
> 0x7f6da00805f0 <_IO_vfprintf_internal+15424>:   lea    0x8(%rax),%rdx
> 0x7f6da00805f4 <_IO_vfprintf_internal+15428>:   mov    %rdx,0x8(%r13)
> 0x7f6da00805f8 <_IO_vfprintf_internal+15432>:   jmpq   0x7f6da007f91e <_IO_vfprintf_internal+12142>
> 0x7f6da00805fd <_IO_vfprintf_internal+15437>:   mov    0x8(%r13),%rax

> (gdb) x/64x $sp
> 0x7fff544a2b80: 0x544a3260      0x00007fff      0x00000001      0x00000000
> 0x7fff544a2b90: 0x0040da6a      0x00000000      0x0040da6a      0x00000000
> 0x7fff544a2ba0: 0x544a3260      0x00007fff      0xa007cb39      0x00007f6d
> 0x7fff544a2bb0: 0x00000025      0x00000000      0x00000000      0x00000000
> 0x7fff544a2bc0: 0x544a3110      0x00007fff      0x0040d500      0x00000000
> 0x7fff544a2bd0: 0x0040da48      0x00000000      0x00000000      0x00000000
> 0x7fff544a2be0: 0x00000027      0x00000000      0x544a317c      0x00007fff
> 0x7fff544a2bf0: 0x544a31b8      0x00007fff      0x544a3198      0x00007fff
> 0x7fff544a2c00: 0x00000000      0x00000000      0x00000000      0x00000000
> 0x7fff544a2c10: 0x544a2d00      0x00007fff      0x544a31ac      0x00007fff
> 0x7fff544a2c20: 0x544a31e8      0x00007fff      0x544a31c8      0x00000000
> 0x7fff544a2c30: 0x544a3170      0x00007fff      0xffffffff      0xffffffff
> 0x7fff544a2c40: 0x544a2d30      0x00007fff      0x544a30e8      0x00007fff
> 0x7fff544a2c50: 0x0040da70      0x00000000      0x00000000      0x00000000
> 0x7fff544a2c60: 0x00000000      0xffffe938      0xffffff20      0xffffffff
> 0x7fff544a2c70: 0x544a3238      0x00007fff      0x544a3118      0x00007fff

To me it looks like there is still some register/memory corruption
happening in the kernel or Xen hypervisor.

@Oleg:
Have you seen any other corruption, or is one of the following patches of
yours likely to fix an issue like the one described above?
> $ git l1 --grep fpu v3.10.. -- arch/x86
> c7b228a Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> dc56c0f x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
> 5e23fee x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
> f185350 x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
> 31d9633 x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()

Philipp
