From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keir Fraser
Subject: Re: Xen4.2 S3 regression?
Date: Thu, 20 Sep 2012 07:13:15 +0100
To: Ben Guthro, Jan Beulich
Cc: xen-devel, john.baboval@citrix.com, Thomas Goetz, Konrad Rzeszutek Wilk
Sender: xen-devel-bounces@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

CPU#1 got stuck in a loop in cpu_init() as it appears to be ‘already initialised’ in the cpu_initialized bitmap.
CPU#0 detects it is stuck and carries on, but the resume code assumes all CPUs are brought back online and crashes later.

I wonder how long this has been broken. I recall reworking the CPU bringup code a lot early during 4.1.0 development... And I didn’t test S3.

 -- Keir
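
As a rough illustration of the failure mode described above (not the actual Xen code), the sketch below models an "initialised" bitmap that is not cleared across suspend. Only cpu_init(), the cpu_initialized bitmap and the -5 error come from this thread; every sim_* name is hypothetical, and the clear_initialized flag is just a knob in the model, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 2
#define SIM_EIO     5   /* the log shows "Error taking CPU1 up: -5" */

/* Hypothetical stand-ins for the cpu_initialized and online bitmaps. */
static unsigned long sim_cpu_initialized;
static unsigned long sim_cpu_online;

/* Models cpu_init(): refuses to proceed if the bit is already set.
 * The real code is reported to loop here; the sketch just returns false. */
static bool sim_cpu_init(unsigned int cpu)
{
    assert(cpu < SIM_NR_CPUS);
    if (sim_cpu_initialized & (1UL << cpu))
        return false;               /* "CPU#1 already initialized!" */
    sim_cpu_initialized |= 1UL << cpu;
    return true;
}

/* Models the boot CPU waiting for a secondary CPU and giving up on it. */
static int sim_cpu_up(unsigned int cpu)
{
    if (!sim_cpu_init(cpu))
        return -SIM_EIO;            /* "Stuck ??" */
    sim_cpu_online |= 1UL << cpu;
    return 0;
}

/* Models suspend: only the boot CPU stays online. Optionally the
 * secondary CPUs' "initialised" bits are forgotten as well. */
static void sim_suspend(bool clear_initialized)
{
    sim_cpu_online = 1UL;
    if (clear_initialized)
        sim_cpu_initialized = 1UL;
}

/* Models resume: returns how many CPUs failed to come back online. */
static int sim_resume(void)
{
    int failed = 0;
    for (unsigned int cpu = 1; cpu < SIM_NR_CPUS; cpu++)
        if (sim_cpu_up(cpu) != 0)
            failed++;
    return failed;
}

/* Boots all CPUs, suspends, resumes; returns the resume failure count. */
static int sim_s3_cycle(bool clear_initialized)
{
    sim_cpu_initialized = 1UL;
    sim_cpu_online = 1UL;
    for (unsigned int cpu = 1; cpu < SIM_NR_CPUS; cpu++)
        sim_cpu_up(cpu);
    sim_suspend(clear_initialized);
    return sim_resume();
}
```

In this model, sim_s3_cycle(false) leaves one CPU offline after resume, which is the state the later resume code apparently does not expect; sim_s3_cycle(true) brings both CPUs back.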

On 19/09/2012 22:07, "Ben Guthro" <ben@guthro.net> wrote:

No hardware debugger just yet - but I've moved to another machine (Lenovo T400 laptop) - and am now seeing the following stack trace when I resume
(this is using the tip of the 4.2-testing tree)

It looks like either the vcpu or the runstate is NULL at this point in the resume process...


(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) CPU#1 already initialized!
(XEN) Stuck ??
(XEN) Error taking CPU1 up: -5
[   38.570054] ACPI: Low-level resume complete
[   38.570054] PM: Restoring platform NVS memory
[   38.570054] Enabling non-boot CPUs ...
(XEN) ----[ Xen-4.2.1-pre  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c480120585>] vcpu_runstate_get+0xe5/0x130
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: 00007d3b7fd17180   rbx: ffff8300bd2fe000   rcx: 0000000000000000
(XEN) rdx: ffff08003fc8bd80   rsi: ffff82c48029fe28   rdi: ffff8300bd2fe000
(XEN) rbp: ffff82c48029fe28   rsp: ffff82c48029fdf8   r8:  0000000000000008
(XEN) r9:  00000000000001c0   r10: ffff82c48021f4a0   r11: 0000000000000282
(XEN) r12: ffff82c4802e8ee0   r13: ffff880039762da0   r14: ffff82c4802d3140
(XEN) r15: fffffffffffffff2   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000139ee4000   cr2: 0000000000000060
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029fdf8:
(XEN)    ffff8300bd2fe000 ffff82c48029ff18 ffff880037481d40 ffff880039762da0
(XEN)    0000000000000001 ffff82c480157df4 0000000000000070 ffff82f6016db300
(XEN)    00000000000b6d98 ffff8301355d8000 0000000000000070 ffff82c4801702ab
(XEN)    ffff88003fc8bd80 0000000000000000 0000000000000020 ffff8300bd2fe000
(XEN)    ffff8301355d8000 ffff880037481d40 ffff880039762da0 0000000000000001
(XEN)    0000000000000003 ffff82c4801058df ffff82c48029ff18 ffff82c48011462e
(XEN)    0000000000000000 0000000000000000 0000000400000004 ffff82c48029ff18
(XEN)    0000000000000010 ffff8300bd6a0000 ffff8800374819a8 ffff8300bd6a0000
(XEN)    ffff880037481d48 0000000000000001 ffff880039762da0 ffff82c480214288
(XEN)    0000000000000003 0000000000000001 ffff880039762da0 0000000000000001
(XEN)    ffff880037481d48 0000000000000001 0000000000000282 ffff880002dc4240
(XEN)    00000000000001c0 00000000000001c0 0000000000000018 ffffffff8100130a
(XEN)    ffff880037481d40 0000000000000001 0000000000000005 0000010000000000
(XEN)    ffffffff8100130a 000000000000e033 0000000000000282 ffff880037481d20
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff8300bd6a0000 0000000000000000
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c480120585>] vcpu_runstate_get+0xe5/0x130
(XEN)    [<ffff82c480157df4>] arch_do_vcpu_op+0x134/0x5d0
(XEN)    [<ffff82c4801702ab>] do_update_descriptor+0x1db/0x220
(XEN)    [<ffff82c4801058df>] do_vcpu_op+0x6f/0x4a0
(XEN)    [<ffff82c48011462e>] do_multicall+0x13e/0x330
(XEN)    [<ffff82c480214288>] syscall_enter+0x88/0x8d
(XEN)
(XEN) Pagetable walk from 0000000000000060:
(XEN)  L4[0x000] = 00000001004a5067 0000000000038c9d
(XEN)  L3[0x000] = 000000013a703067 0000000000003094
(XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000060
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
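
The faulting address in the trace above (cr2 = 0x60, error_code=0000) fits the "something is NULL" reading: a fault in the first page is what a field access through a NULL base pointer typically produces, and an error code of 0 on x86 means a supervisor-mode read of a not-present page. The C sketch below only illustrates that arithmetic. struct sim_vcpu is a made-up layout with padding chosen so that a field lands at offset 0x60; it is not the real struct vcpu, and which field is actually involved is not established in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096u

/* Architectural x86 page-fault error code bits. */
#define SIM_PFEC_PRESENT 0x1u   /* 0 = page not present */
#define SIM_PFEC_WRITE   0x2u   /* 0 = read access */
#define SIM_PFEC_USER    0x4u   /* 0 = supervisor mode */

/* Hypothetical layout: some field that happens to sit at offset 0x60. */
struct sim_vcpu {
    unsigned char pad[0x60];
    uint64_t      sim_field;
};

/* A fault inside the first page usually means "NULL base + field offset". */
static bool sim_looks_like_null_deref(uint64_t fault_addr)
{
    return fault_addr < SIM_PAGE_SIZE;
}

/* True for a supervisor-mode read of a not-present page (error code 0). */
static bool sim_is_supervisor_read_not_present(uint32_t error_code)
{
    return (error_code &
            (SIM_PFEC_PRESENT | SIM_PFEC_WRITE | SIM_PFEC_USER)) == 0;
}

/* Address that reading p->sim_field would touch, computed without
 * dereferencing p. */
static uint64_t sim_field_address(const struct sim_vcpu *p)
{
    return (uint64_t)(uintptr_t)p + offsetof(struct sim_vcpu, sim_field);
}
```

With a NULL pointer, sim_field_address() yields 0x60, the same value as the reported faulting linear address.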


On Fri, Sep 7, 2012 at 12:06 PM, Ben Guthro <ben@guthro.net> wrote:
I'll work on getting a JTAG, ICE, or something else - it is on an
Intel SDP - so it should have the ports for it.

My current suspicion on this is that the hardware registers are not
being programmed the same way as they were in 4.0.x
(Since the "pulsing power button LED" on the laptops, and the behavior
of the Desktop SDP are now similar)

Once again - I don't have a lot of evidence to back this up - however,
if I ifdef out the register writes that actually start the low level
suspend - in
xen/arch/x86/acpi/power.c  acpi_enter_sleep_state() - the rest of the
suspend process completes as though the machine suspended, and then
immediately resumed.

In this case - the system seems to be functioning properly.





Hack to prevent low level S3 attached.
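
The attachment itself is not reproduced here. Purely as an illustration of the idea described above (compiling out the writes that start the low-level suspend, so the rest of the suspend/resume path runs as if the machine had slept and woken immediately), a minimal C sketch might look like the following. All sim_* names and the SIM_SKIP_LOW_LEVEL_S3 macro are hypothetical; this is not the attached patch and not the real body of acpi_enter_sleep_state().

```c
#include <stdint.h>

/* Define to skip the writes that would actually put the machine to sleep. */
#define SIM_SKIP_LOW_LEVEL_S3 1

/* Counts simulated hardware register writes. */
unsigned int sim_register_writes;

/* Hypothetical stand-in for a write to an ACPI sleep control register. */
void sim_write_sleep_control(uint16_t value)
{
    (void)value;
    sim_register_writes++;
}

/* Hypothetical stand-in for acpi_enter_sleep_state(); returns 0 on success. */
int sim_enter_sleep_state(uint8_t state)
{
#ifndef SIM_SKIP_LOW_LEVEL_S3
    /* Normal path: program the hardware, which enters the sleep state. */
    sim_write_sleep_control((uint16_t)state);
#else
    /* Hack path: touch nothing, so callers see an instant "wakeup". */
    (void)state;
#endif
    return 0;
}
```

With the macro defined, sim_enter_sleep_state(3) returns success without any register write, so everything around the low-level step can be exercised on its own.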



On Fri, Sep 7, 2012 at 8:18 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 07.09.12 at 13:51, Ben Guthro <ben@guthro.net> wrote:
>> However, when I run with console=none, the observed behavior is very
>> different.
>> The system seems to go to sleep successfully - but when I press the
>> power button to wake it up - the power comes on - the fans spin up -
>> but the system is unresponsive.
>> No video
>> No network
>> keyboard LEDs (Caps,Numlock) do not light up.
>>
>>
>> Alternate debugging strategies welcome.
>
> I'm afraid other than being lucky to spot something via code
> inspection, the only alternative is an ITP/ICE. Maybe Intel folks
> could help out debugging this if it's reproducible for them.
>
> Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel