* Xen host crash
@ 2013-08-29 16:22 Rushikesh Jadhav
  2013-08-29 16:25 ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Rushikesh Jadhav @ 2013-08-29 16:22 UTC (permalink / raw)
  To: xen-devel@lists.xen.org



Hi People,

I had a crash of a Xen [ 3.4.2 ] host today and the crash log was dumped in
/var/crash/

While analyzing the crash log & call trace on this 24 PCPU host, I found
that some of the PCPUs were in idle state & many had the same call trace:

PCPU7
Call Trace:
  [ffff828c8010e310] dump_domains+0x4d0
   ffff828c80175b7c  crash_nmi_callback+0x2c
   ffff828c8015f2f9  do_nmi+0x39
   ffff828c801d6877  handle_ist_exception+0x52
   ffff828c801787b2  acpi_safe_halt+0x2


Only one PCPU has a different call trace:

PCPU6
Call Trace:
  [ffff828c8010e310] dump_domains+0x4d0
   ffff828c8010eeb7  kexec_crash+0x57
   ffff828c80127b36  panic+0x136
   ffff828c8011b7da  __print_symbol+0x8a
   ffff828c8019b4ab  vmx_asm_vmexit_handler+0x6b
   ffff828c80100000  __per_cpu_shift+0x800ffff4
   ffff828c8015eb75  show_stack+0x155
   ffff828c8015eeba  fatal_trap+0x6a
   ffff828c801567a1  nmi_watchdog_tick+0x131
   ffff828c8015f37f  do_nmi+0xbf
   ffff828c801d6877  handle_ist_exception+0x52
   ffff828c8011ab02  _spin_lock+0x12

Can anyone please help me understand this & find the cause of the crash?
There are no error logs in messages or kernel at the time of crash.

I checked for C-States and it is set to 2.

Thanks.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: Xen host crash
  2013-08-29 16:22 Xen host crash Rushikesh Jadhav
@ 2013-08-29 16:25 ` Andrew Cooper
  2013-08-29 16:38   ` Rushikesh Jadhav
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2013-08-29 16:25 UTC (permalink / raw)
  To: Rushikesh Jadhav; +Cc: xen-devel@lists.xen.org



On 29/08/13 17:22, Rushikesh Jadhav wrote:
> Hi People,
>
> I had a crash of Xen [ 3.4.2 ] host today and the crash log was dumped
> in /var/crash/
>
> While analyzing the crash log & call trace on this 24 PCPU host, I
> found that some of PCPUs were in idle state & many were having same
> call trace as 
>
> PCPU7
> Call Trace:
>  [ffff828c8010e310] dump_domains+0x4d0
>   ffff828c80175b7c  crash_nmi_callback+0x2c
>   ffff828c8015f2f9  do_nmi+0x39
>   ffff828c801d6877  handle_ist_exception+0x52
>   ffff828c801787b2  acpi_safe_halt+0x2
>
>
> Only one PCPU has got call trace as
>
> PCPU6
> Call Trace:
>  [ffff828c8010e310] dump_domains+0x4d0
>   ffff828c8010eeb7  kexec_crash+0x57
>   ffff828c80127b36  panic+0x136
>   ffff828c8011b7da  __print_symbol+0x8a
>   ffff828c8019b4ab  vmx_asm_vmexit_handler+0x6b
>   ffff828c80100000  __per_cpu_shift+0x800ffff4
>   ffff828c8015eb75  show_stack+0x155
>   ffff828c8015eeba  fatal_trap+0x6a
>   ffff828c801567a1  nmi_watchdog_tick+0x131
>   ffff828c8015f37f  do_nmi+0xbf
>   ffff828c801d6877  handle_ist_exception+0x52
>   ffff828c8011ab02  _spin_lock+0x12
>
> Can anyone please help me understand this & try to find out crash cause ?
> There are no error logs in messages or kernel at the time of crash.
>
> I checked for C-States and it is set to 2.
>
> Thanks.

This is a spinlock deadlock, resulting in the NMI watchdog timing out
and killing the host.  Do you have the stack and register dump for PCPU6?
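
As a rough illustration (not part of the original thread) of how the two
traces differ, the sketch below parses the call traces posted above and
reports which PCPU hit the watchdog and what it was doing when interrupted.
It assumes frames are listed innermost first, as they appear to be here, so
the last frame is the interrupted code; parse_trace and classify are
hypothetical helper names.

```python
import re

# One frame looks like "[ffff828c8010e310] dump_domains+0x4d0" or
# "ffff828c80175b7c  crash_nmi_callback+0x2c".
FRAME_RE = re.compile(r"\[?[0-9a-f]{16}\]?\s+([A-Za-z_]\w*)\+0x[0-9a-f]+")

PCPU7_TRACE = """
  [ffff828c8010e310] dump_domains+0x4d0
   ffff828c80175b7c  crash_nmi_callback+0x2c
   ffff828c8015f2f9  do_nmi+0x39
   ffff828c801d6877  handle_ist_exception+0x52
   ffff828c801787b2  acpi_safe_halt+0x2
"""

PCPU6_TRACE = """
  [ffff828c8010e310] dump_domains+0x4d0
   ffff828c8010eeb7  kexec_crash+0x57
   ffff828c80127b36  panic+0x136
   ffff828c8011b7da  __print_symbol+0x8a
   ffff828c8019b4ab  vmx_asm_vmexit_handler+0x6b
   ffff828c80100000  __per_cpu_shift+0x800ffff4
   ffff828c8015eb75  show_stack+0x155
   ffff828c8015eeba  fatal_trap+0x6a
   ffff828c801567a1  nmi_watchdog_tick+0x131
   ffff828c8015f37f  do_nmi+0xbf
   ffff828c801d6877  handle_ist_exception+0x52
   ffff828c8011ab02  _spin_lock+0x12
"""


def parse_trace(text):
    """Return the symbol names of a call trace, in the order printed."""
    return [m.group(1) for m in FRAME_RE.finditer(text)]


def classify(symbols):
    """Describe why this PCPU ended up in the crash path."""
    interrupted = symbols[-1]  # assumption: last frame is the interrupted code
    if "nmi_watchdog_tick" in symbols:
        return f"watchdog fired while in {interrupted}"
    if "crash_nmi_callback" in symbols:
        return f"stopped by crash NMI while in {interrupted}"
    return "unclassified"


for name, trace in (("PCPU6", PCPU6_TRACE), ("PCPU7", PCPU7_TRACE)):
    print(name, "->", classify(parse_trace(trace)))
```

Only the PCPU whose trace contains nmi_watchdog_tick is the one that
triggered the panic; PCPUs with crash_nmi_callback were merely stopped so the
crash path could run.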

~Andrew


* Re: Xen host crash
  2013-08-29 16:25 ` Andrew Cooper
@ 2013-08-29 16:38   ` Rushikesh Jadhav
  2013-08-29 17:02     ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Rushikesh Jadhav @ 2013-08-29 16:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xen.org



On Thu, Aug 29, 2013 at 9:55 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

>  On 29/08/13 17:22, Rushikesh Jadhav wrote:
>
> Hi People,
>
>  I had a crash of Xen [ 3.4.2 ] host today and the crash log was dumped
> in /var/crash/
>
>  While analyzing the crash log & call trace on this 24 PCPU host, I found
> that some of PCPUs were in idle state & many were having same call trace as
>
>  PCPU7
>  Call Trace:
>   [ffff828c8010e310] dump_domains+0x4d0
>    ffff828c80175b7c  crash_nmi_callback+0x2c
>    ffff828c8015f2f9  do_nmi+0x39
>    ffff828c801d6877  handle_ist_exception+0x52
>    ffff828c801787b2  acpi_safe_halt+0x2
>
>
>  Only one PCPU has got call trace as
>
>  PCPU6
>  Call Trace:
>   [ffff828c8010e310] dump_domains+0x4d0
>    ffff828c8010eeb7  kexec_crash+0x57
>    ffff828c80127b36  panic+0x136
>    ffff828c8011b7da  __print_symbol+0x8a
>    ffff828c8019b4ab  vmx_asm_vmexit_handler+0x6b
>    ffff828c80100000  __per_cpu_shift+0x800ffff4
>    ffff828c8015eb75  show_stack+0x155
>    ffff828c8015eeba  fatal_trap+0x6a
>    ffff828c801567a1  nmi_watchdog_tick+0x131
>    ffff828c8015f37f  do_nmi+0xbf
>    ffff828c801d6877  handle_ist_exception+0x52
>    ffff828c8011ab02  _spin_lock+0x12
>
>  Can anyone please help me understand this & try to find out crash cause ?
>  There are no error logs in messages or kernel at the time of crash.
>
>  I checked for C-States and it is set to 2.
>
>  Thanks.
>
>
> This is a spinlock deadlock, resulting in the NMI watchdog timing out and
> killing the host.  Do you have Stack and register dump for PCPU6 ?
>
> ~Andrew
>

Hi Andrew, here is the stack trace and register dump for PCPU6 & PCPU7

PCPU6 host state:
RIP:    e008:[<ffff828c8010e310>]
RFLAGS: 0000000000000002
rax: 0000000000000004   rbx: 0000000000000001   rcx: ffff828c803629cc
rdx: ffff828c8036286c   rsi: ffff828c803628dc   rdi: 00000000ffffffff
rbp: 0000000000000082   rsp: ffff83247fd88e10   r8:  0000000000000001
r9:  0000000000000001   r10: 00000000fffffffc   r11: 0000000000000001
r12: 0000000000000001   r13: ffff832270af39a0   r14: 0000000000000002
r15: 0000000000000009
cr0: 0000000080050033   cr4: 00000000000026f0
cr3: 000000205a12e000   cr2: fffff880005c5000
ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008

current: DOM185 VCPU6 (ffff830047ee8000)
stack context: DOM185 VCPU6 (ffff830047ee8000)
idle VCPU: ffff83007ea5e000

Stack at 0xffff83247fd88e10:
  ffff83247fd88e00:                                     8010eeb7 ffff828c 801fb1d0 ffff828c
  ffff83247fd88e20: 80127b36 ffff828c 00000028 00000030 7fd88f18 ffff8324 7fd88e48 ffff8324
  ffff83247fd88e40: 00000001 00000000 00000002 00000000 00000002 00000000 801f423d ffff828c
  ffff83247fd88e60: 00000000 00000000 801f45b3 ffff828c 00000000 00000000 00000096 00000000
  ffff83247fd88e80: 8011b7da ffff828c 000000e5 00000000 00000000 00000000 8019b4ab ffff828c
  ffff83247fd88ea0: 7fd8ff20 ffff8324 80100000 ffff828c 8015eb75 ffff828c 7fd88f58 ffff8324
  ffff83247fd88ec0: 7fd8ff28 ffff8324 7fd88f58 ffff8324 00000002 00000000 7fd8ff28 ffff8324
  ffff83247fd88ee0: 7fd88f58 ffff8324 00000002 00000000 8015eeba ffff828c 7fd8ff28 ffff8324
  ffff83247fd88f00: 7fd8ff28 ffff8324 7fd88f58 ffff8324 801567a1 ffff828c 00000000 00000000
  ffff83247fd88f20: 00000006 00000000 7fd88f58 ffff8324 8015f37f ffff828c 00000000 00000000
  ffff83247fd88f40: 339e8000 ffff8322 7fd8ff28 ffff8324 801d6877 ffff828c 00000009 00000000
  ffff83247fd88f60: 00000002 00000000 70af39a0 ffff8322 00000001 00000000 7fd8ff28 ffff8324
  ffff83247fd88f80: 339e8000 ffff8322 00ff00ff 00ff00ff 0000ffff 0000ffff 339e8018 ffff8322
  ffff83247fd88fa0: 00000002 00000000 00000000 00000000 0000000f 00000000 47ee8000 ffff8300
  ffff83247fd88fc0: 00366807 00000000 60e72e90 ffff8319 00000000 00000002 8011ab02 ffff828c
  ffff83247fd88fe0: 0000e008 00000000 00000246 00000000 7fd8fd70 ffff8324 00000000 00000000

Code:
  da e8 df 91 01 00 e9 19 fd ff ff 90 90 90 90 90 90 90 90 90 90 <4c> 89 3f 4c 89 77 08 4c 89 6f 10

Call Trace:
  [ffff828c8010e310] dump_domains+0x4d0
   ffff828c8010eeb7  kexec_crash+0x57
   ffff828c80127b36  panic+0x136
   ffff828c8011b7da  __print_symbol+0x8a
   ffff828c8019b4ab  vmx_asm_vmexit_handler+0x6b
   ffff828c80100000  __per_cpu_shift+0x800ffff4
   ffff828c8015eb75  show_stack+0x155
   ffff828c8015eeba  fatal_trap+0x6a
   ffff828c801567a1  nmi_watchdog_tick+0x131
   ffff828c8015f37f  do_nmi+0xbf
   ffff828c801d6877  handle_ist_exception+0x52
   ffff828c8011ab02  _spin_lock+0x12

  PCPU6 guest state:
DOMAIN185 VCPU3
RIP:    0000:[<fffff800016caee0>]
RFLAGS: 0000000000010206
rax: 0000000000000000   rbx: ffffffffffffffff   rcx: fffffa6002998000
rdx: 0000000000000100   rsi: fffff6fd30014d00   rdi: 0000000000000010
rbp: 0000000000000010   rsp: fffffa6001bc6c48   r8:  0000000000000000
r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
r15: 0000000000000000
cr0: 0000000080050033   cr4: 00000000000026f0
cr3: 000000205a12e000   cr2: fffff880005c5000
ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: 0000

VCPU pause flags: 0 arch flags 0x1

current on PCPU6
struct vcpu at ffff830047ee8000

Stack unavailable.


  PCPU7 host state:
RIP:    e008:[<ffff828c8010e310>]
RFLAGS: 0000000000000002
rax: 0000000000000004   rbx: 0000000000000007   rcx: ffff828c80362bc0
rdx: ffff828c80362a60   rsi: ffff828c80362ad0   rdi: ffff83247fd78f58
rbp: ffff83247fd78f58   rsp: ffff83247fd78f20   r8:  000000000000234c
r9:  0000000000000002   r10: 0000000000000000   r11: 0000000000000000
r12: ffff8320f16a6230   r13: ffff83247fdefea8   r14: 0017ca4b87bb89a4
r15: ffff828c8024b100
cr0: 0000000080050033   cr4: 00000000000026f0
cr3: 000000200aab0000   cr2: 000000001588fff0
ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008

current: idle (ffff83007ea5c000)
stack context: DOM183 VCPU7 (ffff830052966000)
idle VCPU: ffff83007ea5c000

Stack at 0xffff83247fd78f20:
  ffff83247fd78f20: 80175b7c ffff828c 7fd78f58 ffff8324 8015f2f9 ffff828c 00000000 00000000
  ffff83247fd78f40: 00ec0bc5 00000000 f16a61d0 ffff8320 801d6877 ffff828c 8024b100 ffff828c
  ffff83247fd78f60: 87bb89a4 0017ca4b 7fdefea8 ffff8324 f16a6230 ffff8320 f16a61d0 ffff8320
  ffff83247fd78f80: 00ec0bc5 00000000 00000000 00000000 00000000 00000000 00000002 00000000
  ffff83247fd78fa0: 0000234c 00000000 00000003 00000000 00000001 00000000 00000808 00000000
  ffff83247fd78fc0: 644f7bdc 00000000 f16a6230 ffff8320 00000000 00000002 801787b2 ffff828c
  ffff83247fd78fe0: 0000e008 00000000 00000246 00000000 7fd7fec0 ffff8324 00000000 00000000

Code:
  da e8 df 91 01 00 e9 19 fd ff ff 90 90 90 90 90 90 90 90 90 90 <4c> 89 3f 4c 89 77 08 4c 89 6f 10

Call Trace:
  [ffff828c8010e310] dump_domains+0x4d0
   ffff828c80175b7c  crash_nmi_callback+0x2c
   ffff828c8015f2f9  do_nmi+0x39
   ffff828c801d6877  handle_ist_exception+0x52
   ffff828c801787b2  acpi_safe_halt+0x2

  PCPU7 guest state:
None (idle)







* Re: Xen host crash
  2013-08-29 16:38   ` Rushikesh Jadhav
@ 2013-08-29 17:02     ` Andrew Cooper
  2013-08-29 19:10       ` Rushikesh Jadhav
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2013-08-29 17:02 UTC (permalink / raw)
  To: Rushikesh Jadhav; +Cc: xen-devel@lists.xen.org



On 29/08/13 17:38, Rushikesh Jadhav wrote:
> This is a spinlock deadlock, resulting in the NMI watchdog timing out
> and killing the host.  Do you have Stack and register dump for PCPU6 ?
>
>
>     ~Andrew
>
>
> Hi Andrew, here is the stack trace and register dump for PCPU6 & PCPU7
>

Ok.

The problematic spinlock is at address 0xffff831960e72e90, which is
sadly a dynamically allocated one so can't be traced back to a symbol
using the symbol table.
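
A minimal sketch (Python, not part of the original thread) of how such an
address can be read out of the stack dump: each dump row lists 32-bit words
with the low half first, so adjacent pairs combine into 64-bit values. The
row is copied from the PCPU6 dump. The ffff828c80... prefix used to guess
"inside the Xen image" is an assumption inferred from the call-trace
addresses in this thread, and the helper names are hypothetical.

```python
# Assumption: every symbol address in this thread starts with ffff828c80,
# so treat that 40-bit prefix as "inside the Xen image".
XEN_IMAGE_MASK = 0xFFFFFFFFFF000000
XEN_IMAGE_BASE = 0xFFFF828C80000000


def row_to_qwords(row):
    """Combine the 32-bit words of one dump row into 64-bit values."""
    _addr, _, rest = row.partition(":")
    words = [int(w, 16) for w in rest.split()]
    # Words are printed low half first (little-endian), so pair them up.
    return [(hi << 32) | lo for lo, hi in zip(words[0::2], words[1::2])]


def looks_like_xen_image(value):
    """True if the value falls in the assumed Xen image address range."""
    return (value & XEN_IMAGE_MASK) == XEN_IMAGE_BASE


row = ("ffff83247fd88fc0: 00366807 00000000 60e72e90 ffff8319 "
       "00000000 00000002 8011ab02 ffff828c")

for q in row_to_qwords(row):
    kind = "Xen image" if looks_like_xen_image(q) else "other"
    print(f"{q:#018x}  {kind}")
```

The second value in that row is the address quoted above. It sits two slots
before ffff828c8011ab02, which the call trace resolves to _spin_lock+0x12,
and it falls outside the assumed image range, which is consistent with a
dynamically allocated lock.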

Having said that, you are using Xen 3.4 which is ages out of date, and
in fact, probably using XenServer 5.6SP2 (so shouldn't be using
xen-devel anyway).

I suggest you upgrade to something less ancient.

~Andrew


* Re: Xen host crash
  2013-08-29 17:02     ` Andrew Cooper
@ 2013-08-29 19:10       ` Rushikesh Jadhav
  2013-08-29 19:15         ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Rushikesh Jadhav @ 2013-08-29 19:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xen.org



On Thu, Aug 29, 2013 at 10:32 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:

>  On 29/08/13 17:38, Rushikesh Jadhav wrote:
>
> This is a spinlock deadlock, resulting in the NMI watchdog timing out and
> killing the host.  Do you have Stack and register dump for PCPU6 ?
>
>>
>> ~Andrew
>>
>
>  Hi Andrew, here is the stack trace and register dump for PCPU6 & PCPU7
>
>
> Ok.
>
> The problematic spinlock is at address 0xffff831960e72e90, which is sadly
> a dynamically allocated one so cant be traced back to a symbol using the
> symbol table.
>

Thanks. How easy or hard is it to trace such a thing? I searched Google for
"xen crash analyze" and came up with
https://github.com/xenserver/xen-crashdump-analyser &
http://xenbits.xen.org/people/andrewcoop/.


>
> Having said that, you are using Xen 3.4 which is ages out of date, and in
> fact, probably using XenServer 5.6SP2 (so shouldn't be using xen-devel
> anyway).
>

It is quite an old yet stable host with trusted guests. I wanted more
information about Xen stack traces and what's the best way to read them,
hence I sent this to the list.

The current crash did not generate a core dump, hence I tried the crashdump
analyzer from xenbits.xen.org on other core dumps, but that fails with

INFO  Elf CORE crash file: /tmp/core.kdump.1405
ERROR Unexpected class 1
ERROR Failed to parse the crash file

in xen-crashdump-analyser.log.

Thanks for your help.


>
> I suggest you upgrade to something less ancient.
>

Host upgrades to XenServer 6.2 are in progress; I hope this one does not
become ancient again by the time XS7 arrives.


>
>
> ~Andrew
>


* Re: Xen host crash
  2013-08-29 19:10       ` Rushikesh Jadhav
@ 2013-08-29 19:15         ` Andrew Cooper
  2013-08-29 19:46           ` Rushikesh Jadhav
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2013-08-29 19:15 UTC (permalink / raw)
  To: Rushikesh Jadhav; +Cc: xen-devel@lists.xen.org



On 29/08/13 20:10, Rushikesh Jadhav wrote:
>
>
>     The problematic spinlock is at address 0xffff831960e72e90, which
>     is sadly a dynamically allocated one so cant be traced back to a
>     symbol using the symbol table.
>
>
> Thanks. How easy or hard it is to trace such thing ? I tried google
> for xen crash analyze and came up
> with https://github.com/xenserver/xen-crashdump-analyser
> & http://xenbits.xen.org/people/andrewcoop/.

The xen crashdump analyser is my new replacement for an old tool we used,
although my xenbits page is a tad out of date.  For now, github should
be considered the canonical source, although there has been interest in
getting it into the main Xen tree (perhaps when I gain enough tuits).

>  
>
>
>     Having said that, you are using Xen 3.4 which is ages out of date,
>     and in fact, probably using XenServer 5.6SP2 (so shouldn't be
>     using xen-devel anyway).
>
>
> It is quite an old yet stable host with trusted guests. I wanted more
> information about Xen stack traces and whats the best way to read them
> hence sent on the list.
>
> Current crash did not generate a core dump, hence I tried crashdump
> analyzer from xenbits.xen.org <http://xenbits.xen.org> on other core
> dumps but that fails with 
>
> INFO  Elf CORE crash file: /tmp/core.kdump.1405
> ERROR Unexpected class 1
> ERROR Failed to parse the crash file
>
> in xen-crashdump-analyser.log.

There is a reason I threw kdump away and wrote the crashdump analyser;
it would crash all over the place.  That crash file is from the kdump
utility, not from Xen.  The crashdump analyser will not be able to parse it.
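
One plausible reading of "Unexpected class 1" (an assumption, not confirmed
in this thread) is that it refers to the EI_CLASS byte of the ELF header,
where 1 means ELFCLASS32 and 2 means ELFCLASS64. A minimal Python sketch for
checking which class a dump file declares; the function names are
hypothetical and the path in the usage note is the one from the log above.

```python
# EI_CLASS values from the ELF specification.
ELF_CLASSES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class_from_ident(ident):
    """Return the ELF class name given the first bytes of a file."""
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 4 of e_ident is EI_CLASS.
    return ELF_CLASSES.get(ident[4], f"unknown ({ident[4]})")


def elf_class(path):
    """Read the ELF identification bytes of a file and report its class."""
    with open(path, "rb") as f:
        return elf_class_from_ident(f.read(16))
```

For example, elf_class("/tmp/core.kdump.1405") would show whether that file
declares itself as a 32-bit or 64-bit ELF image.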

~Andrew



* Re: Xen host crash
  2013-08-29 19:15         ` Andrew Cooper
@ 2013-08-29 19:46           ` Rushikesh Jadhav
  2013-08-30  9:51             ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Rushikesh Jadhav @ 2013-08-29 19:46 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xen.org



On Fri, Aug 30, 2013 at 12:45 AM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:

>  On 29/08/13 20:10, Rushikesh Jadhav wrote:
>
>
>
>> The problematic spinlock is at address 0xffff831960e72e90, which is sadly
>> a dynamically allocated one so cant be traced back to a symbol using the
>> symbol table.
>>
>
>  Thanks. How easy or hard it is to trace such thing ? I tried google for
> xen crash analyze and came up with
> https://github.com/xenserver/xen-crashdump-analyser &
> http://xenbits.xen.org/people/andrewcoop/.
>
>
> The xen crashdump analyser is my new replacement to an old tool we used,
> although my xenbits page is a tad out of date.  For now, github should be
> considered the canonical source, although there has been interest in
> getting it into the main Xen tree (perhaps when I gain enough tuits).
>
>
Yes, I saw that from the code and have switched to GitHub now. Thanks.


>
>
>>
>> Having said that, you are using Xen 3.4 which is ages out of date, and in
>> fact, probably using XenServer 5.6SP2 (so shouldn't be using xen-devel
>> anyway).
>>
>
>  It is quite an old yet stable host with trusted guests. I wanted more
> information about Xen stack traces and whats the best way to read them
> hence sent on the list.
>
>  Current crash did not generate a core dump, hence I tried crashdump
> analyzer from xenbits.xen.org on other core dumps but that fails with
>
>  INFO  Elf CORE crash file: /tmp/core.kdump.1405
> ERROR Unexpected class 1
> ERROR Failed to parse the crash file
>
>  in xen-crashdump-analyser.log.
>
>
> There is a reason I threw kdump away and wrote the crashdump analyser; It
> would crash all over the place.  That crash file is from the kdump utility,
> not from Xen.  The crashdump analyser will not be able to parse it.
>

I have only one binary dump file in each crash log directory under
/var/crash/, such as core.kdump.1405. The rest are all .log files, which do
not seem to be parsed by the GitHub tool.  Is there any other file I should
be looking at?  /proc/vmcore looks to be the default file for the tool, but
it's not present.




>
>
> ~Andrew
>
>


* Re: Xen host crash
  2013-08-29 19:46           ` Rushikesh Jadhav
@ 2013-08-30  9:51             ` Andrew Cooper
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Cooper @ 2013-08-30  9:51 UTC (permalink / raw)
  To: Rushikesh Jadhav; +Cc: xen-devel@lists.xen.org



On 29/08/13 20:46, Rushikesh Jadhav wrote:
>
>  
>
>>      
>>
>>
>>         Having said that, you are using Xen 3.4 which is ages out of
>>         date, and in fact, probably using XenServer 5.6SP2 (so
>>         shouldn't be using xen-devel anyway).
>>
>>
>>     It is quite an old yet stable host with trusted guests. I wanted
>>     more information about Xen stack traces and whats the best way to
>>     read them hence sent on the list.
>>
>>     Current crash did not generate a core dump, hence I tried
>>     crashdump analyzer from xenbits.xen.org <http://xenbits.xen.org>
>>     on other core dumps but that fails with 
>>
>>     INFO  Elf CORE crash file: /tmp/core.kdump.1405
>>     ERROR Unexpected class 1
>>     ERROR Failed to parse the crash file
>>
>>     in xen-crashdump-analyser.log.
>
>     There is a reason I threw kdump away and wrote the crashdump
>     analyser; It would crash all over the place.  That crash file is
>     from the kdump utility, not from Xen.  The crashdump analyser will
>     not be able to parse it.
>
>
> I have only one binary dump file in each crash logs /var/crash/
> as core.kdump.1405. Rest all are .log files which does not seem to be
> parsed by the github tool.  Is there any other file I should be
> looking at ?  /proc/vmcore looks to be default file for tool but its
> not present.

/proc/vmcore only exists at the time of the crash.  It is a pseudo file
which refers to the crashed memory of Xen at the time.  It ceases to
exist as soon as the server reboots.
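
Because the pseudo file disappears on reboot, anything that needs it has to
run while it still exists. The Python sketch below (an illustration only, not
how XenServer or the crashdump analyser actually handles this) copies the
file to persistent storage if it is present; save_vmcore and the destination
directory are hypothetical.

```python
import os
import shutil


def save_vmcore(dest_dir, vmcore="/proc/vmcore"):
    """Copy the crash memory image to dest_dir; return the new path or None."""
    if not os.path.exists(vmcore):
        # Not running in the post-crash environment, nothing to save.
        return None
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, "vmcore")
    with open(vmcore, "rb") as src, open(dest, "wb") as dst:
        # Stream in 1 MiB chunks; the image can be as large as host memory.
        shutil.copyfileobj(src, dst, length=1 << 20)
    return dest
```

On a normally booted host the function returns None, which matches the
observation that /proc/vmcore is not present there.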

~Andrew


* Xen Host Crash
@ 2019-03-02 18:28 Rishi
  0 siblings, 0 replies; 9+ messages in thread
From: Rishi @ 2019-03-02 18:28 UTC (permalink / raw)
  To: xen-devel

Hello

I have a Xen host crash report from XCP-NG and the stack looks like this:

         [ffff82d0801179c8] elf_core_save_regs+0/0xae
          ffff82d08011849c  kexec_crash+0x59/0x5b
          ffff82d0801444e9  panic+0xea/0x115
          ffff82d08019dc50  do_page_fault+0x443/0x4bf
          ffff82d08023ade3  handle_exception+0x9b/0xf9
          ffff82d08023aea9  handle_exception_saved+0x68/0x94
          ffff82d08022cc4b  sh_page_fault__guest_4+0xb62/0x1f97
          ffff82d08022cb53  sh_page_fault__guest_4+0xa6a/0x1f97
          ffff82d0801352cf  _spin_unlock_recursive+0x2f/0x34
          ffff82d08022bf13  sh_update_cr3__guest_4+0x8a8/0xa7e
          ffff82d080137ce7  add_entry+0x54/0xb2
          ffff82d080177604  update_cr3+0x26/0x48
          ffff82d080235b68  toggle_guest_pt+0x2b/0x147
          ffff82d080235cbc  toggle_guest_mode+0x38/0x4a
          ffff82d080235dcd  do_iret+0xff/0x19c
          ffff82d08023ade3  handle_exception+0x9b/0xf9
          ffff82d08023add7  handle_exception+0x8f/0xf9
          ffff82d08023ade3  handle_exception+0x9b/0xf9
          ffff82d08023add7  handle_exception+0x8f/0xf9
          ffff82d08023ade3  handle_exception+0x9b/0xf9
          ffff82d08023add7  handle_exception+0x8f/0xf9
          ffff82d08023ade3  handle_exception+0x9b/0xf9
          ffff82d08023ade3  handle_exception+0x9b/0xf9
          ffff82d08019d978  do_page_fault+0x16b/0x4bf
          ffff82d08023ade3  handle_exception+0x9b/0xf9
          ffff82d08023aea9  handle_exception_saved+0x68/0x94

The Xen version used is "4.7.6-6.3.1.xcp".

I would like to know if it has to do with the patch
https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=9dc1e0cd81ee469d638d1962a92d9b4bd2972bfa
This patch is not applied on stable-4.7 or stable-4.8, but it is
applied on stable-4.9 and above, up to master.

Thank you for your time.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

