From: Stefan Bader
Subject: Shutdown panic in disable_nonboot_cpus after cpupool-numa-split
Date: Mon, 07 Jul 2014 13:33:14 +0200
Message-ID: <53BA857A.8070608@canonical.com>
To: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

I recently noticed that I get a panic (rebooting the system) on shutdown in
some cases. This happens only on my AMD system, and not every time. I finally
realized that it is related to the use of cpupool-numa-split (libxl with
xen-4.4; maybe 4.3 as well, but I am not 100% sure).

What happens is that on shutdown the hypervisor runs disable_nonboot_cpus,
which calls cpu_down for each online cpu. There is a BUG_ON in the code for the
case of cpu_down returning -EBUSY. In my case this triggers as soon as the
first cpu that has been moved to pool-1 by cpupool-numa-split is attempted. The
error is returned by running the notifier_call_chain, and I suspect that ends
up calling cpupool_cpu_remove, which always returns -EBUSY for cpus not in
pool0.

I am not sure which end needs to be fixed, but looping over all online cpus in
disable_nonboot_cpus sounds plausible. So maybe the check for pool-0 in
cpupool_cpu_remove is wrong...?

-Stefan

[I switched around printk and BUG_ON to actually see the offending cpu]

(XEN) mydbg: after notifier_call_chain in cpu_down
(XEN) Error taking CPU4 down: -16
(XEN) Xen BUG at cpu.c:196 [@190 normally]
(XEN) ----[ Xen-4.4.0  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[] disable_nonboot_cpus+0xff/0x110
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: ffff82d0802f8320   rbx: 00000000fffffff0   rcx: 0000000000000000
(XEN) rdx: ffff82d0802b0000   rsi: 000000000000000a   rdi: ffff82d080267638
(XEN) rbp: 0000000000000004   rsp: ffff82d0802b7e50   r8:  ffff83041ff90000
(XEN) r9:  0000000000010000   r10: 0000000000000001   r11: 0000000000000002
(XEN) r12: 0000000000000005   r13: ffff82d0802e2620   r14: 0000000000000003
(XEN) r15: ffff82d0802e2620   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 00000000dfc65000   cr2: ffff88002acdb798
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82d0802b7e50:
(XEN)    ffff82d0802e2620 0000000000000000 ffff82d0802f8320 ffff82d08019eb82
(XEN)    efff82d0802f8380 ffff8300dfcff000 ffff8300dfcff000 ffff83040dca40a0
(XEN)    ffff8300dfcff000 ffff82d0802f8308 ffff82d0802e2620 0000000000000003
(XEN)    ffff82d0802e2620 ffff82d0801054be ffff8300dfcff180 ffff82d0802f8400
(XEN)    ffff82d0802f8410 ffff82d080129970 0000000000000000 ffff82d080129c69
(XEN)    ffff82d0802b0000 ffff8300dfcff000 00000000ffffffff ffff82d08015bd2b
(XEN)    ffff8300dfafe000 00000000fee1dead 00007fada0b3fc8c 0000000000002001
(XEN)    0000000000000005 ffff880029717d78 0000000000000000 0000000000000246
(XEN)    00000000ffff0000 0000000000000000 0000000000000005 0000000000000000
(XEN)    ffffffff810010ea 0000000000002001 0000000000003401 ffff880029717ce0
(XEN)    0000010000000000 ffffffff810010ea 000000000000e033 0000000000000246
(XEN)    ffff880029717cc8 000000000000e02b 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff8300dfafe000
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN) Xen call trace:
(XEN)    [] disable_nonboot_cpus+0xff/0x110
(XEN)    [] enter_state_helper+0xc2/0x3c0
(XEN)    [] continue_hypercall_tasklet_handler+0xbe/0xd0
(XEN)    [] do_tasklet_work+0x60/0xa0
(XEN)    [] do_tasklet+0x59/0x90
(XEN)    [] idle_loop+0x1b/0x50
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at cpu.c:196
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...