From: Juergen Gross
Subject: Re: [PATCH] Fix scheduler crash after s3 resume
Date: Thu, 24 Jan 2013 07:18:17 +0100
Message-ID: <5100D229.4030906@ts.fujitsu.com>
References: <5100070F.7010808@citrix.com>
In-Reply-To: <5100070F.7010808@citrix.com>
To: Tomasz Wroblewski
Cc: george.dunlap@eu.citrix.com, keir@xen.org, Jan Beulich, xen-devel@lists.xen.org

On 23.01.2013 16:51, Tomasz Wroblewski wrote:
> Hi all,
>
> This was also discussed earlier, for example here:
> http://xen.markmail.org/thread/iqvkylp3mclmsnbw
>
> Changeset 25079:d5ccb2d1dbd1 (Introduce system_state variable) added a
> global variable which, among other things, is used on suspend to avoid
> disabling the cpu scheduler, breaking vcpu affinities, and removing the
> cpu from its cpupool. However, it missed one place where the cpu is
> removed from the cpupool's valid cpus mask: __cpu_disable() in
> smpboot.c, line 840:
>
>     cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
>
> This causes the vcpu in the default pool to be considered inactive, and
> the following assertion in sched_credit.c is violated soon after resume
> transitions out of xen, causing a platform reboot:
>
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs ...
> (XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)' failed at sched_credit.c:507
> (XEN) ----[ Xen-4.3-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 1
> (XEN) RIP: e008:[] _csched_cpu_pick+0x155/0x5fd
> (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor
> (XEN) rax: 0000000000000001 rbx: 0000000000000008 rcx: 0000000000000008
> (XEN) rdx: 00000000000000ff rsi: 0000000000000008 rdi: 0000000000000000
> (XEN) rbp: ffff83011415fdd8 rsp: ffff83011415fcf8 r8: 0000000000000000
> (XEN) r9: 000000000000003e r10: 00000008f3de731f r11: ffffea0000063800
> (XEN) r12: ffff82c480261720 r13: ffff830137b4d950 r14: ffff830137beb010
> (XEN) r15: ffff82c480261720 cr0: 0000000080050033 cr4: 00000000000026f0
> (XEN) cr3: 000000013c17d000 cr2: ffff8800ac6ef8f0
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Xen stack trace from rsp=ffff83011415fcf8:
> (XEN) 00000000000af257 0000000800000001 ffff8300ba4fd000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000002 ffff8800ac6ef8f0
> (XEN) 0000000800000000 00000001318e0025 0000000000000087 ffff83011415fd68
> (XEN) ffff82c480124f79 ffff83011415fd98 ffff83011415fda8 00007fda88d1e790
> (XEN) ffff8800ac6ef8f0 00000001318e0025 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000146 ffff830137b4d940
> (XEN) 0000000000000001 ffff830137b4d950 ffff830137beb010 ffff82c480261720
> (XEN) ffff83011415fe48 ffff82c48011a51b 0002000e00000007 ffffffff81009071
> (XEN) 000000000000e033 ffff83013a805360 ffff880002bb3c28 000000000000e02b
> (XEN) e4d87248e7ca5f52 ffff830102ae2200 0000000000000001 ffff82c48011a356
> (XEN) 00000008efa1f543 00007fda88d1e790 ffff83011415fe78 ffff82c48012748f
> (XEN) 0000000000000002 ffff830137beb028 ffff830102ae2200 ffff830137beb8d0
> (XEN) ffff83011415fec8 ffff82c48012758b ffff830114150000 ffff8800ac6ef8f0
> (XEN) 80100000ae86d065 ffff82c4802e0080 ffff82c4802e0000 ffff830114158000
> (XEN) ffffffffffffffff 00007fda88d1e790 ffff83011415fef8 ffff82c480124b4e
> (XEN) ffff8300ba4fd000 ffffea0000063800 00000001318e0025 ffff8800ac6ef8f0
> (XEN) ffff83011415ff08 ffff82c480124bb4 00007cfeebea00c7 ffff82c480226a71
> (XEN) 00007fda88d1e790 ffff8800ac6ef8f0 00000001318e0025 ffffea0000063800
> (XEN) ffff880002bb3c78 00000001318e0025 ffffea0000063800 0000000000000146
> (XEN) 00003ffffffff000 ffffea0002b1bbf0 0000000000000000 00000001318e0025
> (XEN) Xen call trace:
> (XEN) [] _csched_cpu_pick+0x155/0x5fd
> (XEN) [] csched_tick+0x1c5/0x342
> (XEN) [] execute_timer+0x4e/0x6c
> (XEN) [] timer_softirq_action+0xde/0x206
> (XEN) [] __do_softirq+0x8e/0x99
> (XEN) [] do_softirq+0x13/0x15
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)' failed at sched_credit.c:507
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
> ^ The reason for the above is that the "cpus" cpumask is empty: it is the
> logical AND of the cpupool's valid cpus (from which the cpu was removed)
> and the vcpu's affinity mask.
>
> The attached patch follows the spirit of changeset 25079:d5ccb2d1dbd1
> (which blocked removal of the cpu from the cpupool in cpupool.c) by also
> blocking its removal from the cpupool's valid cpumask. Thus cpu
> affinities are still preserved across suspend/resume, and the scheduler
> does not need to be disabled, as per the original intent (I think).
> Comments welcome.
>
> Signed-off-by: Tomasz Wroblewski

Acked-by: Juergen Gross

>
> Commit message:
> Fix s3 resume regression (crash in scheduler) after c-s
> 25079:d5ccb2d1dbd1 by also blocking removal of the cpu from the
> cpupool's cpu_valid mask, in the spirit of the mentioned c-s.
--
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6          Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html