From mboxrd@z Thu Jan  1 00:00:00 1970
From: Keir Fraser <keir@xen.org>
Subject: Re: Xen4.2 S3 regression?
Date: Tue, 25 Sep 2012 15:53:29 +0100
Message-ID: <CC8783F9.4CD7A%keir@xen.org>
References: <CAOvdn6X6_NwfTNTgwNG7CQOqu2oDb6o6r1Zf+5y0cwZGLKToxA@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3281119307470274314=="
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <CAOvdn6X6_NwfTNTgwNG7CQOqu2oDb6o6r1Zf+5y0cwZGLKToxA@mail.gmail.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ben Guthro <ben@guthro.net>, Jan Beulich <JBeulich@suse.com>
Cc: John Baboval <john.baboval@citrix.com>, xen-devel <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

> This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

--===============3281119307470274314==
Content-type: multipart/alternative;
	boundary="B_3431433214_4287069"

> This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

--B_3431433214_4287069
Content-type: text/plain;
	charset="ISO-8859-1"
Content-transfer-encoding: quoted-printable

This was introduced as part of a patch to avoid losing cpu and cpupool
affinities/memberships across S3. Looks like it breaks some assumptions in
the scheduler though, probably because all CPUs are not taken offline
atomically, nor brought back online atomically. Hence some other running CP=
U
can execute hypervisor code that observes VCPUs in this bad
can=B9t-run-anywhere state. I guess this is what is happening. I=B9m not
immediately sure of the best fix. :(

 -- Keir

On 25/09/2012 15:22, "Ben Guthro" <ben@guthro.net> wrote:

> I went back to an old patch that had, since it was in this same function =
that
> you made reference to:
> http://markmail.org/message/qpnmiqzt5bngeejk
>=20
> I noticed that I was not seeing the "Breaking vcpu affinity" printk - so =
I
> tried to get that=A0
>=20
> The change proposed in that thread seems to work around this pinning prob=
lem.
> However, I'm not sure that it is the "right thing" to be doing.
>=20
> Do you have any thoughts on this?
>=20
>=20
> On Tue, Sep 25, 2012 at 7:56 AM, Ben Guthro <ben@guthro.net> wrote:
>>=20
>> On Tue, Sep 25, 2012 at 3:00 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> >>> On 24.09.12 at 23:12, Ben Guthro <ben@guthro.net> wrote:
>>>> > Here's my "Big hammer" debugging patch.
>>>> >
>>>> > If I force the cpu to be scheduled on CPU0 when the appropriate cpu =
is
>>>> not
>>>> > online, I can resume properly.
>>>> >
>>>> > Clearly this is not the proper solution, and I'm sure the fix is sub=
tle.
>>>> > I'm not seeing it right now though. Perhaps tomorrow morning.
>>>> > If you have any ideas, I'm happy to run tests then.
>>>=20
>>> I can't see how the printk() you add in the patch would ever get
>>> reached with the other adjustment you do there.
>>=20
>> Apologies. I failed to separate prior debugging in this patch from the "=
big
>> hammer" fix
>> =A0
>>> A debug build,
>>> as Keir suggested, would not only get the stack trace right, but
>>> would also result in the ASSERT() right after your first modification
>>> to _csched_cpu_pick() to actually do something (and likely trigger).
>>=20
>> Indeed. I was using non-debug builds for 2 reasons that, in hindsight ma=
y not
>> be the best of reasons.
>> 1. It was the default
>> 2. Mukesh's kdb debugger requires debug to be off, which I was making us=
e of
>> previously, and had not disabled.
>> =A0
>> The stack from a debug build can be found below.
>> It did, indeed trigger the ASSERT, as you predicted.
>>=20
>>=20
>> (XEN) Finishing wakeup from ACPI S3 state.
>> (XEN) Enabling non-boot CPUs =A0...
>> (XEN) Booting processor 1/1 eip 8a000
>> (XEN) Initializing CPU#1
>> (XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
>> (XEN) CPU: L2 cache: 3072K
>> (XEN) CPU: Physical Processor ID: 0
>> (XEN) CPU: Processor Core ID: 1
>> (XEN) CMCI: CPU1 has no CMCI support
>> (XEN) CPU1: Thermal monitoring enabled (TM2)
>> (XEN) CPU1: Intel(R) Core(TM)2 Duo CPU =A0 =A0 P8400 =A0@ 2.26GHz stepping 06
>> (XEN) microcode: CPU1 updated from revision 0x60c to 0x60f, date =3D
>> 2010-09-29=A0
>> [ =A0 82.310025] ACPI: Low-level resume complete
>> [ =A0 82.310025] PM: Restoring platform NVS memory
>> [ =A0 82.310025] Enabling non-boot CPUs ...
>> [ =A0 82.310025] installing Xen timer for CPU 1
>> [ =A0 82.310025] cpu 1 spinlock event irq 279
>> (XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)'
>> failed at sched_credit.c:477
>> (XEN) ----[ Xen-4.2.1-pre =A0x86_64 =A0debug=3Dy =A0Tainted: =A0 =A0C ]----
>> (XEN) CPU: =A0 =A01
>> (XEN) RIP: =A0 =A0e008:[<ffff82c48011a35a>] _csched_cpu_pick+0x135/0x552
>> (XEN) RFLAGS: 0000000000010002 =A0 CONTEXT: hypervisor
>> (XEN) rax: 0000000000000001 =A0 rbx: 0000000000000004 =A0 rcx: 0000000000000=
004
>> (XEN) rdx: 000000000000000f =A0 rsi: 0000000000000004 =A0 rdi: 0000000000000=
000
>> (XEN) rbp: ffff8301355d7dd8 =A0 rsp: ffff8301355d7d08 =A0 r8: =A00000000000000=
000
>> (XEN) r9: =A0000000000000003e =A0 r10: ffff82c480231700 =A0 r11: 0000000000000=
246
>> (XEN) r12: ffff82c480261b20 =A0 r13: 0000000000000001 =A0 r14: ffff82c480301=
a60
>> (XEN) r15: ffff83013a542068 =A0 cr0: 000000008005003b =A0 cr4: 0000000000002=
6f0
>> (XEN) cr3: 0000000131a05000 =A0 cr2: 0000000000000000
>> (XEN) ds: 002b =A0 es: 002b =A0 fs: 0000 =A0 gs: 0000 =A0 ss: e010 =A0 cs: e008
>> (XEN) Xen stack trace from rsp=3Dffff8301355d7d08:
>> (XEN) =A0 =A00100000131a05000 ffff8301355d7d40 0000000000000082 000000000000=
0002
>> (XEN) =A0 =A0ffff8300bd503000 0000000000000001 0000000000000297 ffff8301355d=
7d58
>> (XEN) =A0 =A0ffff82c480125499 ffff830138216000 ffff8301355d7d98 540000000000=
0002
>> (XEN) =A0 =A00000000000000286 ffff8301355d7d88 ffff82c480125499 ffff83013821=
6000
>> (XEN) =A0 =A00000000000000000 0000000000000000 0000000000000000 000000000000=
0000
>> (XEN) =A0 =A0ffff830134ca6a50 ffff83013a542068 ffff83013a542068 000000000000=
0001
>> (XEN) =A0 =A0ffff82c480301a60 ffff83013a542068 ffff8301355d7de8 ffff82c48011=
a785
>> (XEN) =A0 =A0ffff8301355d7e58 ffff82c480123519 ffff82c480301a60 ffff82c48030=
1a60
>> (XEN) =A0 =A0ffff82c480301a60 ffff8300bd503000 0000000000503060 000000000000=
0246
>> (XEN) =A0 =A0ffff82c480127c31 ffff8300bd503000 ffff82c480301a60 ffff82c4802e=
bd40
>> (XEN) =A0 =A0ffff83013a542068 ffff88003fc8e820 ffff8301355d7e88 ffff82c48012=
37d3
>> (XEN) =A0 =A0fffffffffffffffe ffff8301355ca000 ffff8300bd503000 000000000000=
0000
>> (XEN) =A0 =A0ffff8301355d7ef8 ffff82c480106335 ffff8301355d7f18 ffffffff8100=
30e1
>> (XEN) =A0 =A0ffff8300bd503000 0000000000000000 ffff8301355d7f08 ffff82c48018=
5390
>> (XEN) =A0 =A0ffffffff81aafd32 ffff8300bd503000 0000000000000001 000000000000=
0000
>> (XEN) =A0 =A00000000000000000 ffff88003fc8e820 00007cfecaa280c7 ffff82c48022=
7348
>> (XEN) =A0 =A0ffffffff8100130a 0000000000000018 ffff88003fc8e820 000000000000=
0000
>> (XEN) =A0 =A00000000000000000 0000000000000001 ffff88003976fda0 ffff88003fc8=
bdc0
>> (XEN) =A0 =A00000000000000246 ffff88003976fe60 00000000ffffffff 000000000000=
0000
>> (XEN) =A0 =A00000000000000018 ffffffff8100130a 0000000000000000 000000000000=
0001
>> (XEN) Xen call trace:
>> (XEN) =A0 =A0[<ffff82c48011a35a>] _csched_cpu_pick+0x135/0x552
>> (XEN) =A0 =A0[<ffff82c48011a785>] csched_cpu_pick+0xe/0x10
>> (XEN) =A0 =A0[<ffff82c480123519>] vcpu_migrate+0x19f/0x346
>> (XEN) =A0 =A0[<ffff82c4801237d3>] vcpu_force_reschedule+0xa4/0xb6
>> (XEN) =A0 =A0[<ffff82c480106335>] do_vcpu_op+0x2c9/0x452
>> (XEN) =A0 =A0[<ffff82c480227348>] syscall_enter+0xc8/0x122
>> (XEN) =A0 =A0
>> (XEN)=A0
>> (XEN) ****************************************
>> (XEN) Panic on CPU 1:
>> (XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)'
>> failed at sched_credit.c:477
>> (XEN) ****************************************
>> (XEN)=A0
>> (XEN) Reboot in five seconds...
>>=20
>>>=20
>>> Anyway, this might be connected to cpu_disable_scheduler() not
>>> having a counterpart to restore the affinity it broke for pinned
>>> domains (for non-pinned ones I believe this behavior is intentional,
>>> albeit not ideal).
>>>=20
>>> Jan
>>>=20
>>=20
>=20
>=20


--B_3431433214_4287069
Content-type: text/html;
	charset="ISO-8859-1"
Content-transfer-encoding: quoted-printable

<HTML>
<HEAD>
<TITLE>Re: [Xen-devel] Xen4.2 S3 regression?</TITLE>
</HEAD>
<BODY>
<FONT FACE=3D"Calibri, Verdana, Helvetica, Arial"><SPAN STYLE=3D'font-size:11pt=
'>This was introduced as part of a patch to avoid losing cpu and cpupool aff=
inities/memberships across S3. Looks like it breaks some assumptions in the =
scheduler though, probably because all CPUs are not taken offline atomically=
, nor brought back online atomically. Hence some other running CPU can execu=
te hypervisor code that observes VCPUs in this bad can&#8217;t-run-anywhere =
state. I guess this is what is happening. I&#8217;m not immediately sure of =
the best fix. :(<BR>
<BR>
&nbsp;-- Keir<BR>
<BR>
On 25/09/2012 15:22, &quot;Ben Guthro&quot; &lt;<a href=3D"ben@guthro.net">be=
n@guthro.net</a>&gt; wrote:<BR>
<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE=3D"Calibri, Verdana, Helvetica, Arial"><=
SPAN STYLE=3D'font-size:11pt'>I went back to an old patch that had, since it w=
as in this same function that you made reference to:<BR>
<a href=3D"http://markmail.org/message/qpnmiqzt5bngeejk">http://markmail.org/=
message/qpnmiqzt5bngeejk</a><BR>
<BR>
I noticed that I was not seeing the &quot;Breaking vcpu affinity&quot; prin=
tk - so I tried to get that=A0<BR>
<BR>
The change proposed in that thread seems to work around this pinning proble=
m.<BR>
However, I'm not sure that it is the &quot;right thing&quot; to be doing.<B=
R>
<BR>
Do you have any thoughts on this?<BR>
<BR>
<BR>
On Tue, Sep 25, 2012 at 7:56 AM, Ben Guthro &lt;<a href=3D"ben@guthro.net">be=
n@guthro.net</a>&gt; wrote:<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE=3D"Calibri, Verdana, Helvetica, Arial"><=
SPAN STYLE=3D'font-size:11pt'><BR>
On Tue, Sep 25, 2012 at 3:00 AM, Jan Beulich &lt;<a href=3D"JBeulich@suse.com=
">JBeulich@suse.com</a>&gt; wrote:<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE=3D"Calibri, Verdana, Helvetica, Arial"><=
SPAN STYLE=3D'font-size:11pt'>&gt;&gt;&gt; On 24.09.12 at 23:12, Ben Guthro &l=
t;<a href=3D"ben@guthro.net">ben@guthro.net</a>&gt; wrote:<BR>
&gt; Here's my &quot;Big hammer&quot; debugging patch.<BR>
&gt;<BR>
&gt; If I force the cpu to be scheduled on CPU0 when the appropriate cpu is=
 not<BR>
&gt; online, I can resume properly.<BR>
&gt;<BR>
&gt; Clearly this is not the proper solution, and I'm sure the fix is subtl=
e.<BR>
&gt; I'm not seeing it right now though. Perhaps tomorrow morning.<BR>
&gt; If you have any ideas, I'm happy to run tests then.<BR>
<BR>
I can't see how the printk() you add in the patch would ever get<BR>
reached with the other adjustment you do there. <BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE=3D"Calibri, Verdana, Helvetica, Arial">=
<SPAN STYLE=3D'font-size:11pt'><BR>
Apologies. I failed to separate prior debugging in this patch from the &quo=
t;big hammer&quot; fix<BR>
=A0<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE=3D"Calibri, Verdana, Helvetica, Arial"><=
SPAN STYLE=3D'font-size:11pt'>A debug build,<BR>
as Keir suggested, would not only get the stack trace right, but<BR>
would also result in the ASSERT() right after your first modification<BR>
to _csched_cpu_pick() to actually do something (and likely trigger).<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE=3D"Calibri, Verdana, Helvetica, Arial">=
<SPAN STYLE=3D'font-size:11pt'><BR>
Indeed. I was using non-debug builds for 2 reasons that, in hindsight may n=
ot be the best of reasons.<BR>
1. It was the default<BR>
2. Mukesh's kdb debugger requires debug to be off, which I was making use o=
f previously, and had not disabled.<BR>
=A0<BR>
The stack from a debug build can be found below.<BR>
It did, indeed trigger the ASSERT, as you predicted.<BR>
<BR>
<BR>
(XEN) Finishing wakeup from ACPI S3 state.<BR>
(XEN) Enabling non-boot CPUs =A0...<BR>
(XEN) Booting processor 1/1 eip 8a000<BR>
(XEN) Initializing CPU#1<BR>
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K<BR>
(XEN) CPU: L2 cache: 3072K<BR>
(XEN) CPU: Physical Processor ID: 0<BR>
(XEN) CPU: Processor Core ID: 1<BR>
(XEN) CMCI: CPU1 has no CMCI support<BR>
(XEN) CPU1: Thermal monitoring enabled (TM2)<BR>
(XEN) CPU1: Intel(R) Core(TM)2 Duo CPU =A0 =A0 P8400 =A0@ 2.26GHz stepping 06<BR>
(XEN) microcode: CPU1 updated from revision 0x60c to 0x60f, date =3D 2010-09-=
29=A0<BR>
[ =A0 82.310025] ACPI: Low-level resume complete<BR>
[ =A0 82.310025] PM: Restoring platform NVS memory<BR>
[ =A0 82.310025] Enabling non-boot CPUs ...<BR>
[ =A0 82.310025] installing Xen timer for CPU 1<BR>
[ =A0 82.310025] cpu 1 spinlock event irq 279<BR>
(XEN) Assertion '!cpumask_empty(&amp;cpus) &amp;&amp; cpumask_test_cpu(cpu,=
 &amp;cpus)' failed at sched_credit.c:477<BR>
(XEN) ----[ Xen-4.2.1-pre =A0x86_64 =A0debug=3Dy =A0Tainted: =A0 =A0C ]----<BR>
(XEN) CPU: =A0 =A01<BR>
(XEN) RIP: =A0 =A0e008:[&lt;ffff82c48011a35a&gt;] _csched_cpu_pick+0x135/0x552<=
BR>
(XEN) RFLAGS: 0000000000010002 =A0 CONTEXT: hypervisor<BR>
(XEN) rax: 0000000000000001 =A0 rbx: 0000000000000004 =A0 rcx: 0000000000000004=
<BR>
(XEN) rdx: 000000000000000f =A0 rsi: 0000000000000004 =A0 rdi: 0000000000000000=
<BR>
(XEN) rbp: ffff8301355d7dd8 =A0 rsp: ffff8301355d7d08 =A0 r8: =A00000000000000000=
<BR>
(XEN) r9: =A0000000000000003e =A0 r10: ffff82c480231700 =A0 r11: 0000000000000246=
<BR>
(XEN) r12: ffff82c480261b20 =A0 r13: 0000000000000001 =A0 r14: ffff82c480301a60=
<BR>
(XEN) r15: ffff83013a542068 =A0 cr0: 000000008005003b =A0 cr4: 00000000000026f0=
<BR>
(XEN) cr3: 0000000131a05000 =A0 cr2: 0000000000000000<BR>
(XEN) ds: 002b =A0 es: 002b =A0 fs: 0000 =A0 gs: 0000 =A0 ss: e010 =A0 cs: e008<BR>
(XEN) Xen stack trace from rsp=3Dffff8301355d7d08:<BR>
(XEN) =A0 =A00100000131a05000 ffff8301355d7d40 0000000000000082 000000000000000=
2<BR>
(XEN) =A0 =A0ffff8300bd503000 0000000000000001 0000000000000297 ffff8301355d7d5=
8<BR>
(XEN) =A0 =A0ffff82c480125499 ffff830138216000 ffff8301355d7d98 540000000000000=
2<BR>
(XEN) =A0 =A00000000000000286 ffff8301355d7d88 ffff82c480125499 ffff83013821600=
0<BR>
(XEN) =A0 =A00000000000000000 0000000000000000 0000000000000000 000000000000000=
0<BR>
(XEN) =A0 =A0ffff830134ca6a50 ffff83013a542068 ffff83013a542068 000000000000000=
1<BR>
(XEN) =A0 =A0ffff82c480301a60 ffff83013a542068 ffff8301355d7de8 ffff82c48011a78=
5<BR>
(XEN) =A0 =A0ffff8301355d7e58 ffff82c480123519 ffff82c480301a60 ffff82c480301a6=
0<BR>
(XEN) =A0 =A0ffff82c480301a60 ffff8300bd503000 0000000000503060 000000000000024=
6<BR>
(XEN) =A0 =A0ffff82c480127c31 ffff8300bd503000 ffff82c480301a60 ffff82c4802ebd4=
0<BR>
(XEN) =A0 =A0ffff83013a542068 ffff88003fc8e820 ffff8301355d7e88 ffff82c4801237d=
3<BR>
(XEN) =A0 =A0fffffffffffffffe ffff8301355ca000 ffff8300bd503000 000000000000000=
0<BR>
(XEN) =A0 =A0ffff8301355d7ef8 ffff82c480106335 ffff8301355d7f18 ffffffff810030e=
1<BR>
(XEN) =A0 =A0ffff8300bd503000 0000000000000000 ffff8301355d7f08 ffff82c48018539=
0<BR>
(XEN) =A0 =A0ffffffff81aafd32 ffff8300bd503000 0000000000000001 000000000000000=
0<BR>
(XEN) =A0 =A00000000000000000 ffff88003fc8e820 00007cfecaa280c7 ffff82c48022734=
8<BR>
(XEN) =A0 =A0ffffffff8100130a 0000000000000018 ffff88003fc8e820 000000000000000=
0<BR>
(XEN) =A0 =A00000000000000000 0000000000000001 ffff88003976fda0 ffff88003fc8bdc=
0<BR>
(XEN) =A0 =A00000000000000246 ffff88003976fe60 00000000ffffffff 000000000000000=
0<BR>
(XEN) =A0 =A00000000000000018 ffffffff8100130a 0000000000000000 000000000000000=
1<BR>
(XEN) Xen call trace:<BR>
(XEN) =A0 =A0[&lt;ffff82c48011a35a&gt;] _csched_cpu_pick+0x135/0x552<BR>
(XEN) =A0 =A0[&lt;ffff82c48011a785&gt;] csched_cpu_pick+0xe/0x10<BR>
(XEN) =A0 =A0[&lt;ffff82c480123519&gt;] vcpu_migrate+0x19f/0x346<BR>
(XEN) =A0 =A0[&lt;ffff82c4801237d3&gt;] vcpu_force_reschedule+0xa4/0xb6<BR>
(XEN) =A0 =A0[&lt;ffff82c480106335&gt;] do_vcpu_op+0x2c9/0x452<BR>
(XEN) =A0 =A0[&lt;ffff82c480227348&gt;] syscall_enter+0xc8/0x122<BR>
(XEN) =A0 =A0<BR>
(XEN)=A0<BR>
(XEN) ****************************************<BR>
(XEN) Panic on CPU 1:<BR>
(XEN) Assertion '!cpumask_empty(&amp;cpus) &amp;&amp; cpumask_test_cpu(cpu,=
 &amp;cpus)' failed at sched_credit.c:477<BR>
(XEN) ****************************************<BR>
(XEN)=A0<BR>
(XEN) Reboot in five seconds...<BR>
<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE=3D"Calibri, Verdana, Helvetica, Arial"><=
SPAN STYLE=3D'font-size:11pt'><BR>
Anyway, this might be connected to cpu_disable_scheduler() not<BR>
having a counterpart to restore the affinity it broke for pinned<BR>
domains (for non-pinned ones I believe this behavior is intentional,<BR>
albeit not ideal).<BR>
<FONT COLOR=3D"#888888"><BR>
Jan<BR>
<BR>
</FONT></SPAN></FONT></BLOCKQUOTE><FONT FACE=3D"Calibri, Verdana, Helvetica, =
Arial"><SPAN STYLE=3D'font-size:11pt'><BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE=3D"Calibri, Verdana, Helvetica, Arial">=
<SPAN STYLE=3D'font-size:11pt'><BR>
<BR>
</SPAN></FONT></BLOCKQUOTE>
</BODY>
</HTML>


--B_3431433214_4287069--


--===============3281119307470274314==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============3281119307470274314==--