kexec+kdump troubles on xen 4.5-unstable, centos 7, x86

* kexec+kdump troubles on xen 4.5-unstable, centos 7, x86_64 (need to get a crash dump)
@ 2014-10-17 18:17 Григорий Пташко
  2014-10-17 19:32 ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Григорий Пташко @ 2014-10-17 18:17 UTC (permalink / raw)
  To: Xen

Hello.

The long story is this. I'm running CentOS 7 with a custom-built kernel.
My architecture is x86_64. I'm trying to pass different GPUs through to Xen
guests.
I've got a problem with the AMD FirePro W9100. A Windows HVM guest starts with
the GPU, and even a 3D benchmark runs OK. But after running for some time, both
the domU and dom0 freeze.
I monitor the serial console for kernel panics but I don't see any.
I've decided to take a crash dump of the dom0 kernel to see what's going on,
but it appears that I just cannot do this.
I've tried specifying the crashkernel parameter both for xen.gz and for
my dom0 kernel (bzImage).

1. The first case: crashkernel=256M for dom0 cmdline:

bzImage crashkernel=256M

[root@kvmxen-centos7-test1-nb ~]# systemctl status kdump.service
kdump.service - Crash recovery kernel arming
...
Oct 17 21:19:38 kvmxen-centos7-test1-nb kdumpctl[1506]: kexec: loaded kdump kernel
...

[root@kvmxen-centos7-test1-nb ~]# cat /sys/kernel/kexec_crash_loaded
1

Here we see that kexec from kdump.service worked well. Seems like it has
loaded the dump capture kernel.
And now let's try to panic:

[root@kvmxen-centos7-test1-nb ~]# echo c > /proc/sysrq-trigger

In the console we see:

[  421.673471] SysRq : Trigger a crash
[  421.677110] BUG: unable to handle kernel NULL pointer dereference at (null)
[  421.685021] IP: [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
[  421.691172] PGD 2d11e58067 PUD 2c95d3c067 PMD 0
[  421.695900] Oops: 0002 [#1] SMP
[  421.699210] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables sg rpcsec_gss_krb5 nls_utf8 iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul sb_edac glue_helper ablk_helper ipmi_si lpc_ich edac_core cryptd i2c_i801 pcspkr mfd_core ipmi_msghandler mei_me ioatdma wmi mei shpchp dca nfsd binfmt_misc mgag200 drm_kms_helper ttm drm ahci mlx4_core libahci libata
[  421.745725] CPU: 9 PID: 11422 Comm: bash Not tainted 3.17.0 #3
[  421.751562] Hardware name: Supermicro X9DRFF-iG+/-7G+/-iTG+/-7TG+/X9DRFF-iG+/-7G+/-iTG+/-7TG+, BIOS 3.0 07/29/2013
[  421.761910] task: ffff882e94383640 ti: ffff882c71758000 task.ti: ffff882c71758000
[  421.769398] RIP: e030:[<ffffffff81484486>]  [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
[  421.777961] RSP: e02b:ffff882c7175be88  EFLAGS: 00010246
[  421.783276] RAX: 000000000000000f RBX: ffffffff81d2d780 RCX: 0000000000000000
[  421.790416] RDX: 0000000000000000 RSI: ffff882eea52e5b8 RDI: 0000000000000063
[  421.797557] RBP: ffff882c7175be88 R08: 0000000000000002 R09: ffffffff82034afc
[  421.804708] R10: 00000000000004a7 R11: 00000000000004a6 R12: 0000000000000063
[  421.811839] R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
[  421.818992] FS:  00007f1c0205b740(0000) GS:ffff882eea520000(0000) knlGS:0000000000000000
[  421.827075] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  421.832821] CR2: 0000000000000000 CR3: 0000002c2a879000 CR4: 0000000000042660
[  421.839972] Stack:
[  421.841998]  ffff882c7175beb8 ffffffff81484cd7 0000000000000002 00007f1c0207f000
[  421.849494]  0000000000000002 ffff882c7175bf48 ffff882c7175bed0 ffffffff8148517f
[  421.857019]  ffff882e94765380 ffff882c7175bef0 ffffffff81251afd ffff882c7175bf48
[  421.864514] Call Trace:
[  421.866981]  [<ffffffff81484cd7>] __handle_sysrq+0x107/0x170
[  421.872645]  [<ffffffff8148517f>] write_sysrq_trigger+0x2f/0x40
[  421.878575]  [<ffffffff81251afd>] proc_reg_write+0x3d/0x80
[  421.884069]  [<ffffffff811eaef7>] vfs_write+0xb7/0x1f0
[  421.889209]  [<ffffffff811ebb15>] SyS_write+0x55/0xd0
[  421.894294]  [<ffffffff8183fc29>] system_call_fastpath+0x16/0x1b
[  421.900300] Code: 65 34 75 e5 4c 89 ef e8 d9 f7 ff ff eb db 0f 1f 80 00 00 00 00 66 66 66 66 90 55 c7 05 88 43 7f 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 2e
[  421.920596] RIP  [<ffffffff81484486>] sysrq_handle_crash+0x16/0x20
[  421.926803]  RSP <ffff882c7175be88>
[  421.930302] CR2: 0000000000000000

And that's it. The dump capture kernel never starts. After this kernel panic
my server just reboots.

2. The second case: crashkernel=256M in xen.gz cmdline.

xen.gz crashkernel=256M

[root@kvmxen-centos7-test1-nb ~]# systemctl status kdump.service
kdump.service - Crash recovery kernel arming
...
   Active: failed (Result: exit-code) since Fri 2014-10-17 19:56:57 MSK; 1h 9min ago
...
Oct 17 19:56:57 kvmxen-centos7-test1-nb kdumpctl[1536]: No memory reserved for crash kernel.
Oct 17 19:56:57 kvmxen-centos7-test1-nb kdumpctl[1536]: Starting kdump: [FAILED]
....

As we see, kdump.service cannot load the dump capture kernel because
'No memory reserved for crash kernel'.
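
The message can be cross-checked from inside dom0. The following is a minimal sketch, not what kdumpctl itself runs: it prints the dom0 kernel's own view of a crash reservation, assuming the kernel exposes /sys/kernel/kexec_crash_size. The function name `report_crash_reservation` and its two optional path arguments are made up for illustration. With crashkernel= given only to xen.gz, the reservation presumably lives in Xen rather than in dom0, so a tool that looks only at these files would find nothing.

```shell
#!/bin/sh
# Sketch: show what the running (dom0) kernel knows about a crash reservation.
# The two path arguments are hypothetical hooks so the function can be tried
# against sample files; by default it reads the real kernel interfaces.
report_crash_reservation() {
    cmdline_file=${1:-/proc/cmdline}
    size_file=${2:-/sys/kernel/kexec_crash_size}

    # Was crashkernel= passed to this kernel (as opposed to Xen)?
    if grep -q 'crashkernel=' "$cmdline_file" 2>/dev/null; then
        echo "cmdline: crashkernel= present"
    else
        echo "cmdline: crashkernel= absent"
    fi

    # How many bytes does this kernel think it has reserved?
    if [ -r "$size_file" ]; then
        echo "reserved bytes: $(cat "$size_file")"
    else
        echo "reserved bytes: unknown"
    fi
}

report_crash_reservation
```

A result of "absent" together with zero reserved bytes would be consistent with the kdump.service failure shown above.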

So the questions are:

1. How can I make crash dumps of the hypervisor and the dom0?

2. How am I supposed to diagnose the thing that causes such dom0 freezes?
I thought that if I ask on the list that my dom0 freezes, it will be a waste
of time without any logs or crash dumps.. But I cannot even make them..

I really want to contribute by testing xen and submitting bugs but I'd like
to do it with more material for the developers.

Thank you,
Grigory.

-- 
Best regards,
Grigory Ptashko

+7 (916) 1489766
grigory.ptashko@gmail.com
skype grigory_ptashko
linkedin.com/in/gptashko <http://ru.linkedin.com/in/gptashko/>
facebook.com/GrigoryPtashko <https://www.facebook.com/GrigoryPtashko>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: kexec+kdump troubles on xen 4.5-unstable, centos 7, x86_64 (need to get a crash dump)
  2014-10-17 18:17 kexec+kdump troubles on xen 4.5-unstable, centos 7, x86_64 (need to get a crash dump) Григорий Пташко
@ 2014-10-17 19:32 ` Andrew Cooper
  2014-10-19  9:51   ` Григорий Пташко
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2014-10-17 19:32 UTC (permalink / raw)
  To: Григорий Пташко,
	Xen

On 17/10/2014 19:17, Григорий Пташко wrote:
>
> So the questions are:
>
> 1. How can I make crash dumps of the hypervisor and the dom0?

Kexec of domains inside themselves is not supported.  Effort is being
made to make it work, but there are some architectural challenges.

The correct method is method 2, by providing a crash region in Xen for
dom0 to load into.  I suspect your problem is that systemd doesn't
understand that it is running in dom0, and is attempting to load a
normal crash kernel.

An up-to-date kexec-tools and running `kexec` manually ought to do the
right thing.

>
> 2. How am I supposed to diagnose the thing that causes such dom0 freezes?
> I thought that if I ask on the list that my dom0 freezes, it will be a
> waste
> of time without any logs or crash dumps.. But I cannot even make them..

On the serial console, if dom0 freezes, Xen should still be usable. Press
CTRL-a three times to switch the serial input from dom0 to Xen.

~Andrew


* Re: kexec+kdump troubles on xen 4.5-unstable, centos 7, x86_64 (need to get a crash dump)
  2014-10-17 19:32 ` Andrew Cooper
@ 2014-10-19  9:51   ` Григорий Пташко
  2014-10-19 11:30     ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Григорий Пташко @ 2014-10-19  9:51 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen


2014-10-17 23:32 GMT+04:00 Andrew Cooper <andrew.cooper3@citrix.com>:

>  On 17/10/2014 19:17, Григорий Пташко wrote:
>
>
> So the questions are:
>
>  1. How can I make crash dumps of the hypervisor and the dom0?
>
>
> Kexec of domains inside themselves is not supported.  Effort is being made
> to make it work, but there are some architectural challenges.
>
> The correct method is method 2, by providing a crash region in Xen for
> dom0 to load into.  I suspect your problem is that systemd doesn't
> understand that it is running in dom0, and is attempting to load a normal
> crash kernel.
>
> An up-to-date kexec-tools and running `kexec` manually ought to do the
> right thing.
>
>
OK. I've tried it again. Here's my cmdline:

APPEND xen.gz console=com1 com1=115200,8n1 crashkernel=256M iommu=1 ---
bzImage ignore_loglevel serial console=ttyS1,115200n8 ...

Here's what I see in dom0:

[root@kvmxen-centos7-test1-nb admin]# xl dmesg | grep crash
(XEN) Command line: console=com1 com1=115200,8n1 crashkernel=256M iommu=1

[root@kvmxen-centos7-test1-nb admin]# kexec -p /boot/bzImage
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Then try loading kdump kernel

Here's the kexec version (I built it from the source rpm):

[root@kvmxen-centos7-test1-nb admin]# kexec --version
kexec-tools 2.0.4 released 17 October 2014

kdump.service is disabled in systemd. What am I doing wrong?


>
>  2. How am I supposed to diagnose the thing that causes such dom0 freezes?
> I thought that if I ask on the list that my dom0 freezes, it will be a
> waste
> of time without any logs or crash dumps.. But I cannot even make them..
>
>
> On the serial console, if dom0 freezes, Xen should still be usable.  use
> CTRL-a three times.
>

I monitor serial console via SOL (serial over lan) with this command:

$ ipmitool -I lanplus -U user -P passwd -H host sol activate

With the cmdline mentioned above, I don't see any Xen messages on the console.
I see only the dom0 dmesg and systemd logs while my server is starting up.
After the login prompt appears I press Ctrl-A A A or Ctrl-A Ctrl-A Ctrl-A
but nothing changes. The login prompt does not go away and I don't see any Xen
logs.

Also, when I trigger the panic manually, I can't do anything on this SOL
console.
I just see the dom0 kernel panic and the server reboots after a few seconds.

How am I supposed to get into the *alive* xen from SOL console when a
dom0 kernel panic occurs?
Do I have a wrong cmdline to use xen serial console the way I want
(I want to see xen being alive when dom0 freezes)?


Thank you very much,
Grigory.


>
>
> ~Andrew
>




* Re: kexec+kdump troubles on xen 4.5-unstable, centos 7, x86_64 (need to get a crash dump)
  2014-10-19  9:51   ` Григорий Пташко
@ 2014-10-19 11:30     ` Andrew Cooper
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Cooper @ 2014-10-19 11:30 UTC (permalink / raw)
  To: Григорий Пташко
  Cc: Xen


On 19/10/2014 10:51, Григорий Пташко wrote:
>
>
> 2014-10-17 23:32 GMT+04:00 Andrew Cooper <andrew.cooper3@citrix.com
> <mailto:andrew.cooper3@citrix.com>>:
>
>     On 17/10/2014 19:17, Григорий Пташко wrote:
>>
>>     So the questions are:
>>
>>     1. How can I make crash dumps of the hypervisor and the dom0?
>
>     Kexec of domains inside themselves is not supported.  Effort is
>     being made to make it work, but there are some architectural
>     challenges.
>
>     The correct method is method 2, by providing a crash region in Xen
>     for dom0 to load into.  I suspect your problem is that systemd
>     doesn't understand that it is running in dom0, and is attempting
>     to load a normal crash kernel.
>
>     An up-to-date kexec-tools and running `kexec` manually ought to
>     do the right thing.
>
>
> OK. I've tried it again. Here's my cmdline:
>
> APPEND xen.gz console=com1 com1=115200,8n1 crashkernel=256M iommu=1
> --- bzImage ignore_loglevel serial console=ttyS1,115200n8 ...

ttyS1 is the second serial console, not the first.  Xen should be using
com2 not com1 on the command line.

Linux should be configured to use hvc0 which will then be muxed by Xen
onto the serial.

This should now get you the Xen console ring on the serial as well.
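
Put together, the boot entry would look roughly like the line below. This is only a sketch: it assumes the SOL port really is the second UART (as ttyS1 suggests) and that the baud settings stay the same. Only the two console parameters differ from the quoted line, and the trailing ... is kept as given.

```
APPEND xen.gz console=com2 com2=115200,8n1 crashkernel=256M iommu=1 --- bzImage ignore_loglevel serial console=hvc0 ...
```

With hvc0 in place, dom0 output and Xen's own messages share the same serial line, which is what makes the CTRL-a switch usable.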

>
> Here's what I see in dom0:
>
> [root@kvmxen-centos7-test1-nb admin]# xl dmesg | grep crash
> (XEN) Command line: console=com1 com1=115200,8n1 crashkernel=256M iommu=1
>
> [root@kvmxen-centos7-test1-nb admin]# kexec -p /boot/bzImage 
> Memory for crashkernel is not reserved
> Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
> Then try loading kdump kernel
>
> Here's the kexec version (I built it from the source rpm):
>
> [root@kvmxen-centos7-test1-nb admin]# kexec --version
> kexec-tools 2.0.4 released 17 October 2014

I don't know where that date comes from, but kexec-tools 2.0.4 is much older
than that.  You want 2.0.5 or newer, which contains the Xen support.
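
A quick way to check the installed version against that minimum is sketched below. It assumes GNU `sort -V` is available (it is in coreutils on CentOS 7) and that `kexec --version` prints the version as its second word, as in the output quoted above. The helper names `version_at_least` and `parse_kexec_version` are made up for this example.

```shell
#!/bin/sh
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort): the smaller of the two sorts first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Pulls the version field out of "kexec-tools 2.0.4 released ..." style output.
parse_kexec_version() {
    awk '{ print $2; exit }'
}

# Only query kexec if it is actually installed.
if command -v kexec >/dev/null 2>&1; then
    installed=$(kexec --version | parse_kexec_version)
    if version_at_least "$installed" 2.0.5; then
        echo "kexec-tools $installed: 2.0.5 or newer"
    else
        echo "kexec-tools $installed: older than 2.0.5"
    fi
fi
```

The comparison uses a version sort rather than a string comparison so that a release such as 2.0.10 is not mistaken for something older than 2.0.5.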

~Andrew
