From: Keir Fraser <keir@xen.org>
To: Ben Guthro <ben@guthro.net>, Jan Beulich <JBeulich@suse.com>
Cc: John Baboval <john.baboval@citrix.com>,
xen-devel <xen-devel@lists.xen.org>
Subject: Re: Xen4.2 S3 regression?
Date: Tue, 25 Sep 2012 15:53:29 +0100
Message-ID: <CC8783F9.4CD7A%keir@xen.org>
In-Reply-To: <CAOvdn6X6_NwfTNTgwNG7CQOqu2oDb6o6r1Zf+5y0cwZGLKToxA@mail.gmail.com>
This was introduced as part of a patch to avoid losing cpu and cpupool
affinities/memberships across S3. Looks like it breaks some assumptions in
the scheduler though, probably because all CPUs are not taken offline
atomically, nor brought back online atomically. Hence some other running CPU
can execute hypervisor code that observes VCPUs in this bad
can't-run-anywhere state. I guess this is what is happening. I'm not
immediately sure of the best fix. :(
-- Keir
On 25/09/2012 15:22, "Ben Guthro" <ben@guthro.net> wrote:
> I went back to an old patch that had, since it was in this same function that
> you made reference to:
> http://markmail.org/message/qpnmiqzt5bngeejk
>
> I noticed that I was not seeing the "Breaking vcpu affinity" printk - so I
> tried to get that
>
> The change proposed in that thread seems to work around this pinning problem.
> However, I'm not sure that it is the "right thing" to be doing.
>
> Do you have any thoughts on this?
>
>
> On Tue, Sep 25, 2012 at 7:56 AM, Ben Guthro <ben@guthro.net> wrote:
>>
>> On Tue, Sep 25, 2012 at 3:00 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>> >>> On 24.09.12 at 23:12, Ben Guthro <ben@guthro.net> wrote:
>>>> Here's my "Big hammer" debugging patch.
>>>>
>>>> If I force the cpu to be scheduled on CPU0 when the appropriate cpu is not
>>>> online, I can resume properly.
>>>>
>>>> Clearly this is not the proper solution, and I'm sure the fix is subtle.
>>>> I'm not seeing it right now though. Perhaps tomorrow morning.
>>>> If you have any ideas, I'm happy to run tests then.
>>>
>>> I can't see how the printk() you add in the patch would ever get
>>> reached with the other adjustment you do there.
>>
>> Apologies. I failed to separate prior debugging in this patch from the "big
>> hammer" fix
>>
>>> A debug build,
>>> as Keir suggested, would not only get the stack trace right, but
>>> would also result in the ASSERT() right after your first modification
>>> to _csched_cpu_pick() to actually do something (and likely trigger).
>>
>> Indeed. I was using non-debug builds for 2 reasons that, in hindsight may not
>> be the best of reasons.
>> 1. It was the default
>> 2. Mukesh's kdb debugger requires debug to be off, which I was making use of
>> previously, and had not disabled.
>>
>> The stack from a debug build can be found below.
>> It did, indeed trigger the ASSERT, as you predicted.
>>
>>
>> (XEN) Finishing wakeup from ACPI S3 state.
>> (XEN) Enabling non-boot CPUs ...
>> (XEN) Booting processor 1/1 eip 8a000
>> (XEN) Initializing CPU#1
>> (XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
>> (XEN) CPU: L2 cache: 3072K
>> (XEN) CPU: Physical Processor ID: 0
>> (XEN) CPU: Processor Core ID: 1
>> (XEN) CMCI: CPU1 has no CMCI support
>> (XEN) CPU1: Thermal monitoring enabled (TM2)
>> (XEN) CPU1: Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz stepping 06
>> (XEN) microcode: CPU1 updated from revision 0x60c to 0x60f, date = 2010-09-29
>> [ 82.310025] ACPI: Low-level resume complete
>> [ 82.310025] PM: Restoring platform NVS memory
>> [ 82.310025] Enabling non-boot CPUs ...
>> [ 82.310025] installing Xen timer for CPU 1
>> [ 82.310025] cpu 1 spinlock event irq 279
>> (XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)' failed at sched_credit.c:477
>> (XEN) ----[ Xen-4.2.1-pre x86_64 debug=y Tainted: C ]----
>> (XEN) CPU: 1
>> (XEN) RIP: e008:[<ffff82c48011a35a>] _csched_cpu_pick+0x135/0x552
>> (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
>> (XEN) rax: 0000000000000001 rbx: 0000000000000004 rcx: 0000000000000004
>> (XEN) rdx: 000000000000000f rsi: 0000000000000004 rdi: 0000000000000000
>> (XEN) rbp: ffff8301355d7dd8 rsp: ffff8301355d7d08 r8: 0000000000000000
>> (XEN) r9: 000000000000003e r10: ffff82c480231700 r11: 0000000000000246
>> (XEN) r12: ffff82c480261b20 r13: 0000000000000001 r14: ffff82c480301a60
>> (XEN) r15: ffff83013a542068 cr0: 000000008005003b cr4: 00000000000026f0
>> (XEN) cr3: 0000000131a05000 cr2: 0000000000000000
>> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
>> (XEN) Xen stack trace from rsp=ffff8301355d7d08:
>> (XEN) 0100000131a05000 ffff8301355d7d40 0000000000000082 0000000000000002
>> (XEN) ffff8300bd503000 0000000000000001 0000000000000297 ffff8301355d7d58
>> (XEN) ffff82c480125499 ffff830138216000 ffff8301355d7d98 5400000000000002
>> (XEN) 0000000000000286 ffff8301355d7d88 ffff82c480125499 ffff830138216000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) ffff830134ca6a50 ffff83013a542068 ffff83013a542068 0000000000000001
>> (XEN) ffff82c480301a60 ffff83013a542068 ffff8301355d7de8 ffff82c48011a785
>> (XEN) ffff8301355d7e58 ffff82c480123519 ffff82c480301a60 ffff82c480301a60
>> (XEN) ffff82c480301a60 ffff8300bd503000 0000000000503060 0000000000000246
>> (XEN) ffff82c480127c31 ffff8300bd503000 ffff82c480301a60 ffff82c4802ebd40
>> (XEN) ffff83013a542068 ffff88003fc8e820 ffff8301355d7e88 ffff82c4801237d3
>> (XEN) fffffffffffffffe ffff8301355ca000 ffff8300bd503000 0000000000000000
>> (XEN) ffff8301355d7ef8 ffff82c480106335 ffff8301355d7f18 ffffffff810030e1
>> (XEN) ffff8300bd503000 0000000000000000 ffff8301355d7f08 ffff82c480185390
>> (XEN) ffffffff81aafd32 ffff8300bd503000 0000000000000001 0000000000000000
>> (XEN) 0000000000000000 ffff88003fc8e820 00007cfecaa280c7 ffff82c480227348
>> (XEN) ffffffff8100130a 0000000000000018 ffff88003fc8e820 0000000000000000
>> (XEN) 0000000000000000 0000000000000001 ffff88003976fda0 ffff88003fc8bdc0
>> (XEN) 0000000000000246 ffff88003976fe60 00000000ffffffff 0000000000000000
>> (XEN) 0000000000000018 ffffffff8100130a 0000000000000000 0000000000000001
>> (XEN) Xen call trace:
>> (XEN) [<ffff82c48011a35a>] _csched_cpu_pick+0x135/0x552
>> (XEN) [<ffff82c48011a785>] csched_cpu_pick+0xe/0x10
>> (XEN) [<ffff82c480123519>] vcpu_migrate+0x19f/0x346
>> (XEN) [<ffff82c4801237d3>] vcpu_force_reschedule+0xa4/0xb6
>> (XEN) [<ffff82c480106335>] do_vcpu_op+0x2c9/0x452
>> (XEN) [<ffff82c480227348>] syscall_enter+0xc8/0x122
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 1:
>> (XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)' failed at sched_credit.c:477
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>>>
>>> Anyway, this might be connected to cpu_disable_scheduler() not
>>> having a counterpart to restore the affinity it broke for pinned
>>> domains (for non-pinned ones I believe this behavior is intentional,
>>> albeit not ideal).
>>>
>>> Jan
>>>
>>
>
>