From: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: george.dunlap@eu.citrix.com, keir@xen.org,
Jan Beulich <JBeulich@suse.com>
Subject: [PATCH] Fix scheduler crash after s3 resume
Date: Wed, 23 Jan 2013 16:51:43 +0100
Message-ID: <5100070F.7010808@citrix.com>
Hi all,
This was also discussed earlier, for example here:
http://xen.markmail.org/thread/iqvkylp3mclmsnbw
Changeset 25079:d5ccb2d1dbd1 ("Introduce system_state variable") added a
global variable which, among other things, is used on suspend to prevent
disabling the cpu scheduler, breaking vcpu affinities, and removing the
cpu from its cpupool. However, it missed one place where the cpu is
removed from the cpupool's valid-cpu mask, in smpboot.c,
__cpu_disable(), line 840:
cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
This causes the vcpu in the default pool to be considered inactive, and
the following assertion in sched_credit.c is violated soon after resume
transitions out of xen, causing a platform reboot:
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs ...
(XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)'
failed at sched_credit.c:507
(XEN) ----[ Xen-4.3-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff82c480119e9e>] _csched_cpu_pick+0x155/0x5fd
(XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor
(XEN) rax: 0000000000000001 rbx: 0000000000000008 rcx: 0000000000000008
(XEN) rdx: 00000000000000ff rsi: 0000000000000008 rdi: 0000000000000000
(XEN) rbp: ffff83011415fdd8 rsp: ffff83011415fcf8 r8: 0000000000000000
(XEN) r9: 000000000000003e r10: 00000008f3de731f r11: ffffea0000063800
(XEN) r12: ffff82c480261720 r13: ffff830137b4d950 r14: ffff830137beb010
(XEN) r15: ffff82c480261720 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 000000013c17d000 cr2: ffff8800ac6ef8f0
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff83011415fcf8:
(XEN) 00000000000af257 0000000800000001 ffff8300ba4fd000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000002 ffff8800ac6ef8f0
(XEN) 0000000800000000 00000001318e0025 0000000000000087 ffff83011415fd68
(XEN) ffff82c480124f79 ffff83011415fd98 ffff83011415fda8 00007fda88d1e790
(XEN) ffff8800ac6ef8f0 00000001318e0025 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000146 ffff830137b4d940
(XEN) 0000000000000001 ffff830137b4d950 ffff830137beb010 ffff82c480261720
(XEN) ffff83011415fe48 ffff82c48011a51b 0002000e00000007 ffffffff81009071
(XEN) 000000000000e033 ffff83013a805360 ffff880002bb3c28 000000000000e02b
(XEN) e4d87248e7ca5f52 ffff830102ae2200 0000000000000001 ffff82c48011a356
(XEN) 00000008efa1f543 00007fda88d1e790 ffff83011415fe78 ffff82c48012748f
(XEN) 0000000000000002 ffff830137beb028 ffff830102ae2200 ffff830137beb8d0
(XEN) ffff83011415fec8 ffff82c48012758b ffff830114150000 ffff8800ac6ef8f0
(XEN) 80100000ae86d065 ffff82c4802e0080 ffff82c4802e0000 ffff830114158000
(XEN) ffffffffffffffff 00007fda88d1e790 ffff83011415fef8 ffff82c480124b4e
(XEN) ffff8300ba4fd000 ffffea0000063800 00000001318e0025 ffff8800ac6ef8f0
(XEN) ffff83011415ff08 ffff82c480124bb4 00007cfeebea00c7 ffff82c480226a71
(XEN) 00007fda88d1e790 ffff8800ac6ef8f0 00000001318e0025 ffffea0000063800
(XEN) ffff880002bb3c78 00000001318e0025 ffffea0000063800 0000000000000146
(XEN) 00003ffffffff000 ffffea0002b1bbf0 0000000000000000 00000001318e0025
(XEN) Xen call trace:
(XEN) [<ffff82c480119e9e>] _csched_cpu_pick+0x155/0x5fd
(XEN) [<ffff82c48011a51b>] csched_tick+0x1c5/0x342
(XEN) [<ffff82c48012748f>] execute_timer+0x4e/0x6c
(XEN) [<ffff82c48012758b>] timer_softirq_action+0xde/0x206
(XEN) [<ffff82c480124b4e>] __do_softirq+0x8e/0x99
(XEN) [<ffff82c480124bb4>] do_softirq+0x13/0x15
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)'
failed at sched_credit.c:507
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
The reason for the above is that the "cpus" cpumask is empty: it is the
logical AND of the cpupool's valid cpus (from which the cpu was removed)
and the vcpu's affinity mask.
The attached patch follows the spirit of changeset 25079:d5ccb2d1dbd1
(which blocked removal of the cpu from the cpupool in cpupool.c) by also
blocking its removal from the cpupool's valid cpumask. Cpu affinities
are thus still preserved across suspend/resume, and the scheduler does
not need to be disabled, as per the original intent (I think). I would
welcome comments.
Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
Commit message:
Fix s3 resume regression (crash in scheduler) after c-s
25079:d5ccb2d1dbd1 by also blocking removal of the cpu from the
cpupool's cpu_valid mask - in the spirit of the mentioned c-s.
[-- Attachment #2: fix-suspend-cpu-valid-mask --]
diff -r 4b476378fc35 xen/arch/x86/smpboot.c
--- a/xen/arch/x86/smpboot.c Mon Jan 21 17:03:10 2013 +0000
+++ b/xen/arch/x86/smpboot.c Wed Jan 23 15:25:28 2013 +0000
@@ -837,7 +837,8 @@
     remove_siblinginfo(cpu);
 
     /* It's now safe to remove this processor from the online map */
-    cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
+    if ( system_state != SYS_STATE_suspend )
+        cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
     cpumask_clear_cpu(cpu, &cpu_online_map);
     fixup_irqs();
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel