* kexec+kdump troubles on xen 4.5-unstable, centos 7, x86_64 (need to get a crash dump)
@ 2014-10-17 18:17 Григорий Пташко
  2014-10-17 19:32 ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Григорий Пташко @ 2014-10-17 18:17 UTC
  To: Xen



Hello.

Here is the long story. I'm running CentOS 7 with a custom-built kernel on
x86_64, and I'm trying to pass different GPUs through to Xen guests.
I have a problem with the AMD FirePro W9100. A Windows HVM guest starts with
the GPU, and a 3D benchmark even runs OK. But after some time both the domU
and dom0 freeze.
I monitor the serial console for kernel panics but never see any.
So I decided to take a crash dump of the dom0 kernel to see what's going on,
and it turns out that I cannot do even that.
I've tried specifying the crashkernel parameter both on the xen.gz command
line and on the command line of my dom0 kernel (bzImage).


1. The first case: crashkernel=256M for dom0 cmdline:

bzImage crashkernel=256M

[root@kvmxen-centos7-test1-nb ~]# systemctl status kdump.service
kdump.service - Crash recovery kernel arming
...
Oct 17 21:19:38 kvmxen-centos7-test1-nb kdumpctl[1506]: kexec: loaded kdump kernel
...

[root@kvmxen-centos7-test1-nb ~]# cat /sys/kernel/kexec_crash_loaded
1

Here we see that kexec, run by kdump.service, succeeded and appears to have
loaded the dump-capture kernel.
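
The check above reads only the "loaded" flag. A minimal sketch of a slightly
broader check follows, assuming the standard sysfs files
/sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size are present.
The function name kdump_state is hypothetical, and the two path arguments
exist only so the function can be pointed at ordinary files. It reports what
the dom0 kernel itself believes and says nothing about the hypervisor's view.

```shell
#!/bin/sh
# Sketch: summarize what the running kernel reports about its crash kernel.
# kdump_state is a hypothetical helper name, not part of kexec-tools.
# $1 and $2 default to the standard sysfs files.
kdump_state() {
    loaded_file=${1:-/sys/kernel/kexec_crash_loaded}
    size_file=${2:-/sys/kernel/kexec_crash_size}

    # Fall back to 0 when a file is missing or empty.
    loaded=$(cat "$loaded_file" 2>/dev/null || echo 0)
    size=$(cat "$size_file" 2>/dev/null || echo 0)
    loaded=${loaded:-0}
    size=${size:-0}

    if [ "$loaded" = "1" ] && [ "$size" -gt 0 ]; then
        echo "armed: crash kernel loaded, $size bytes reserved"
    elif [ "$loaded" = "1" ]; then
        echo "inconsistent: loaded flag set but no memory reserved"
    else
        echo "not armed: loaded=$loaded reserved=$size"
    fi
}

kdump_state "$@"
```

Run without arguments it inspects the live system. As the panic attempt in
this case shows, an "armed" result does not guarantee that the capture kernel
will actually boot.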
Now let's trigger a panic:

[root@kvmxen-centos7-test1-nb ~]# echo c > /proc/sysrq-trigger

On the console we see:

[  421.673471] SysRq : Trigger a crash
[  421.677110] BUG: unable to handle kernel NULL pointer dereference at (null)
[  421.685021] IP: [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
[  421.691172] PGD 2d11e58067 PUD 2c95d3c067 PMD 0
[  421.695900] Oops: 0002 [#1] SMP
[  421.699210] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables sg rpcsec_gss_krb5 nls_utf8 iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul sb_edac glue_helper ablk_helper ipmi_si lpc_ich edac_core cryptd i2c_i801 pcspkr mfd_core ipmi_msghandler mei_me ioatdma wmi mei shpchp dca nfsd binfmt_misc mgag200 drm_kms_helper ttm drm ahci mlx4_core libahci libata
[  421.745725] CPU: 9 PID: 11422 Comm: bash Not tainted 3.17.0 #3
[  421.751562] Hardware name: Supermicro X9DRFF-iG+/-7G+/-iTG+/-7TG+/X9DRFF-iG+/-7G+/-iTG+/-7TG+, BIOS 3.0 07/29/2013
[  421.761910] task: ffff882e94383640 ti: ffff882c71758000 task.ti: ffff882c71758000
[  421.769398] RIP: e030:[<ffffffff81484486>]  [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
[  421.777961] RSP: e02b:ffff882c7175be88  EFLAGS: 00010246
[  421.783276] RAX: 000000000000000f RBX: ffffffff81d2d780 RCX: 0000000000000000
[  421.790416] RDX: 0000000000000000 RSI: ffff882eea52e5b8 RDI: 0000000000000063
[  421.797557] RBP: ffff882c7175be88 R08: 0000000000000002 R09: ffffffff82034afc
[  421.804708] R10: 00000000000004a7 R11: 00000000000004a6 R12: 0000000000000063
[  421.811839] R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
[  421.818992] FS:  00007f1c0205b740(0000) GS:ffff882eea520000(0000) knlGS:0000000000000000
[  421.827075] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  421.832821] CR2: 0000000000000000 CR3: 0000002c2a879000 CR4: 0000000000042660
[  421.839972] Stack:
[  421.841998]  ffff882c7175beb8 ffffffff81484cd7 0000000000000002 00007f1c0207f000
[  421.849494]  0000000000000002 ffff882c7175bf48 ffff882c7175bed0 ffffffff8148517f
[  421.857019]  ffff882e94765380 ffff882c7175bef0 ffffffff81251afd ffff882c7175bf48
[  421.864514] Call Trace:
[  421.866981]  [<ffffffff81484cd7>] __handle_sysrq+0x107/0x170
[  421.872645]  [<ffffffff8148517f>] write_sysrq_trigger+0x2f/0x40
[  421.878575]  [<ffffffff81251afd>] proc_reg_write+0x3d/0x80
[  421.884069]  [<ffffffff811eaef7>] vfs_write+0xb7/0x1f0
[  421.889209]  [<ffffffff811ebb15>] SyS_write+0x55/0xd0
[  421.894294]  [<ffffffff8183fc29>] system_call_fastpath+0x16/0x1b
[  421.900300] Code: 65 34 75 e5 4c 89 ef e8 d9 f7 ff ff eb db 0f 1f 80 00 00 00 00 66 66 66 66 90 55 c7 05 88 43 7f 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 2e
[  421.920596] RIP  [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
[  421.926803]  RSP <ffff882c7175be88>
[  421.930302] CR2: 0000000000000000

And that's it. The dump-capture kernel never starts. After this kernel
panic my server simply reboots.


2. The second case: crashkernel=256M in xen.gz cmdline.

xen.gz crashkernel=256M

[root@kvmxen-centos7-test1-nb ~]# systemctl status kdump.service
kdump.service - Crash recovery kernel arming
...
   Active: failed (Result: exit-code) since Fri 2014-10-17 19:56:57 MSK; 1h 9min ago
...
Oct 17 19:56:57 kvmxen-centos7-test1-nb kdumpctl[1536]: No memory reserved for crash kernel.
Oct 17 19:56:57 kvmxen-centos7-test1-nb kdumpctl[1536]: Starting kdump: [FAILED]
...

As we can see, kdump.service cannot load the dump-capture kernel because
'No memory reserved for crash kernel'.
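
The two cases differ only in which command line carries crashkernel=. A
minimal sketch for confirming where the option actually ended up is shown
below. It assumes /proc/cmdline holds the dom0 kernel command line and that
`xl info` prints a `xen_commandline` field on this Xen build. The helper names
has_crashkernel and report are hypothetical. It only reports where the option
appears and does not explain the kdumpctl failure by itself.

```shell
#!/bin/sh
# Sketch: show which command line (dom0 kernel or Xen) carries crashkernel=.
# has_crashkernel and report are hypothetical helper names.

# Succeeds if the given command-line string contains a crashkernel= option.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints one line per command line: "<label>: crashkernel= present|absent".
report() {
    if has_crashkernel "$2"; then
        echo "$1: crashkernel= present"
    else
        echo "$1: crashkernel= absent"
    fi
}

# dom0 kernel command line as seen by the running kernel.
dom0_cmdline=$(cat /proc/cmdline 2>/dev/null)

# Xen command line; assumes `xl info` exposes a xen_commandline field.
# Left empty when xl is unavailable.
xen_cmdline=$(xl info 2>/dev/null | sed -n 's/^xen_commandline *: *//p')

report "dom0" "$dom0_cmdline"
report "xen" "$xen_cmdline"
```

In the first case above one would expect "dom0: crashkernel= present", and in
the second case "xen: crashkernel= present".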


So the questions are:

1. How can I make crash dumps of the hypervisor and the dom0?

2. How am I supposed to diagnose what causes these dom0 freezes?
I assumed that reporting the freezes on the list without any logs or crash
dumps would be a waste of time. But I cannot even produce them.

I really want to contribute by testing xen and submitting bugs but I'd like
to do it with more material for the developers.


Thank you,
Grigory.


-- 
Best regards,
Grigory Ptashko

+7 (916) 1489766
grigory.ptashko@gmail.com
skype grigory_ptashko
linkedin.com/in/gptashko <http://ru.linkedin.com/in/gptashko/>
facebook.com/GrigoryPtashko <https://www.facebook.com/GrigoryPtashko>


end of thread, other threads:[~2014-10-19 11:30 UTC | newest]

Thread overview: 4+ messages
2014-10-17 18:17 kexec+kdump troubles on xen 4.5-unstable, centos 7, x86_64 (need to get a crash dump) Григорий Пташко
2014-10-17 19:32 ` Andrew Cooper
2014-10-19  9:51   ` Григорий Пташко
2014-10-19 11:30     ` Andrew Cooper
