From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keir Fraser
Subject: Re: Xen4.2 S3 regression?
Date: Thu, 20 Sep 2012 07:24:58 +0100
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ben Guthro, Jan Beulich
Cc: xen-devel, john.baboval@citrix.com, Thomas Goetz, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On 20/09/2012 07:13, "Keir Fraser" wrote:

> CPU#1 got stuck in loop in cpu_init() as it appears to be 'already
> initialised' in cpu_initialized bitmap. CPU#0 detects it is stuck and carries
> on, but the resume code assumes all CPUs are brought back online and crashes
> later.
>
> I wonder how long this has been broken. I recall reworking the CPU bringup
> code a lot early during 4.1.0 development... And I didn't test S3.
>
>  -- Keir

However, I did test CPU hotplug a lot, and S3 uses the hotplug logic to take
down and bring up CPUs. So I don't think I can have broken this. Are you
able to hotplug physical CPUs from dom0 using the tools/misc/xen-hptool
utility? If not, at least this might be a friendlier test method and
environment than a full S3.

 -- Keir

> On 19/09/2012 22:07, "Ben Guthro" wrote:
>
>> No hardware debugger just yet - but I've moved to another machine (Lenovo
>> T400 laptop) - and am now seeing the following stack trace when I resume
>> (this is using the tip of the 4.2-testing tree)
>>
>> It looks like either the vcpu, or the runstate is NULL, at this point in the
>> resume process...
>>
>>
>> (XEN) Finishing wakeup from ACPI S3 state.
>> (XEN) Enabling non-boot CPUs  ...
>> (XEN) CPU#1 already initialized!
>> (XEN) Stuck ??
>> (XEN) Error taking CPU1 up: -5
>> [   38.570054] ACPI: Low-level resume complete
>> [   38.570054] PM: Restoring platform NVS memory
>> [   38.570054] Enabling non-boot CPUs ...
>> (XEN) ----[ Xen-4.2.1-pre  x86_64  debug=n  Tainted:    C ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[] vcpu_runstate_get+0xe5/0x130
>> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
>> (XEN) rax: 00007d3b7fd17180   rbx: ffff8300bd2fe000   rcx: 0000000000000000
>> (XEN) rdx: ffff08003fc8bd80   rsi: ffff82c48029fe28   rdi: ffff8300bd2fe000
>> (XEN) rbp: ffff82c48029fe28   rsp: ffff82c48029fdf8   r8:  0000000000000008
>> (XEN) r9:  00000000000001c0   r10: ffff82c48021f4a0   r11: 0000000000000282
>> (XEN) r12: ffff82c4802e8ee0   r13: ffff880039762da0   r14: ffff82c4802d3140
>> (XEN) r15: fffffffffffffff2   cr0: 000000008005003b   cr4: 00000000000026f0
>> (XEN) cr3: 0000000139ee4000   cr2: 0000000000000060
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c48029fdf8:
>> (XEN)    ffff8300bd2fe000 ffff82c48029ff18 ffff880037481d40 ffff880039762da0
>> (XEN)    0000000000000001 ffff82c480157df4 0000000000000070 ffff82f6016db300
>> (XEN)    00000000000b6d98 ffff8301355d8000 0000000000000070 ffff82c4801702ab
>> (XEN)    ffff88003fc8bd80 0000000000000000 0000000000000020 ffff8300bd2fe000
>> (XEN)    ffff8301355d8000 ffff880037481d40 ffff880039762da0 0000000000000001
>> (XEN)    0000000000000003 ffff82c4801058df ffff82c48029ff18 ffff82c48011462e
>> (XEN)    0000000000000000 0000000000000000 0000000400000004 ffff82c48029ff18
>> (XEN)    0000000000000010 ffff8300bd6a0000 ffff8800374819a8 ffff8300bd6a0000
>> (XEN)    ffff880037481d48 0000000000000001 ffff880039762da0 ffff82c480214288
>> (XEN)    0000000000000003 0000000000000001 ffff880039762da0 0000000000000001
>> (XEN)    ffff880037481d48 0000000000000001 0000000000000282 ffff880002dc4240
>> (XEN)    00000000000001c0 00000000000001c0 0000000000000018 ffffffff8100130a
>> (XEN)    ffff880037481d40 0000000000000001 0000000000000005 0000010000000000
>> (XEN)    ffffffff8100130a 000000000000e033 0000000000000282 ffff880037481d20
>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 ffff8300bd6a0000 0000000000000000
>> (XEN)    0000000000000000
>> (XEN) Xen call trace:
>> (XEN)    [] vcpu_runstate_get+0xe5/0x130
>> (XEN)    [] arch_do_vcpu_op+0x134/0x5d0
>> (XEN)    [] do_update_descriptor+0x1db/0x220
>> (XEN)    [] do_vcpu_op+0x6f/0x4a0
>> (XEN)    [] do_multicall+0x13e/0x330
>> (XEN)    [] syscall_enter+0x88/0x8d
>> (XEN)
>> (XEN) Pagetable walk from 0000000000000060:
>> (XEN)  L4[0x000] = 00000001004a5067 0000000000038c9d
>> (XEN)  L3[0x000] = 000000013a703067 0000000000003094
>> (XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) FATAL PAGE FAULT
>> (XEN) [error_code=0000]
>> (XEN) Faulting linear address: 0000000000000060
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>>
>> On Fri, Sep 7, 2012 at 12:06 PM, Ben Guthro wrote:
>>> I'll work on getting a JTAG, ICE, or something else - it is on an
>>> Intel SDP - so it should have the ports for it.
>>>
>>> My current suspicion on this is that the hardware registers are not
>>> being programmed the same way as they were in 4.0.x
>>> (Since the "pulsing power button LED" on the laptops, and the behavior
>>> of the Desktop SDP are now similar)
>>>
>>> Once again - I don't have a lot of evidence to back this up - however,
>>> if I ifdef out the register writes that actually start the low level
>>> suspend - in
>>> xen/arch/x86/acpi/power.c  acpi_enter_sleep_state() - the rest of the
>>> suspend process completes as though the machine suspended, and then
>>> immediately resumed.
>>>
>>> In this case - the system seems to be functioning properly.
>>>
>>> Hack to prevent low level S3 attached.
>>>
>>> On Fri, Sep 7, 2012 at 8:18 AM, Jan Beulich wrote:
>>>>>>> On 07.09.12 at 13:51, Ben Guthro wrote:
>>>>> However, when I run with console=none, the observed behavior is very
>>>>> different.
>>>>> The system seems to go to sleep successfully - but when I press the
>>>>> power button to wake it up - the power comes on - the fans spin up -
>>>>> but the system is unresponsive.
>>>>> No video
>>>>> No network
>>>>> keyboard LEDs (Caps,Numlock) do not light up.
>>>>>
>>>>> Alternate debugging strategies welcome.
>>>>
>>>> I'm afraid other than being lucky to spot something via code
>>>> inspection, the only alternative is an ITP/ICE. Maybe Intel folks
>>>> could help out debugging this if it's reproducible for them.
>>>>
>>>> Jan
>>>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
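
For illustration only, the failure mode described at the top of this message (a CPU that still looks initialised in the cpu_initialized bitmap fails to come back up, and later code assumes it is online) can be modelled with a toy Python sketch. This is not Xen code: the class ToyCpuBringup, its methods, and the clear_initialized switch are all hypothetical names, and the idea that the bit is left set on the CPU-down path is just one possible explanation, not something established in this thread. The -5 return value simply mirrors the "Error taking CPU1 up: -5" line in the log.

```python
# Toy model only. All names are hypothetical and do not mirror Xen's source.

class ToyCpuBringup:
    def __init__(self, nr_cpus):
        self.nr_cpus = nr_cpus
        self.cpu_initialized = set()  # stand-in for the cpu_initialized bitmap
        self.online = {0}             # CPU 0 is the boot CPU and stays up

    def cpu_init(self, cpu):
        """Return True if init ran, False if the CPU was already marked."""
        if cpu in self.cpu_initialized:
            # The real hypervisor logs "already initialized!" at this point;
            # the model just reports the failure to the caller.
            return False
        self.cpu_initialized.add(cpu)
        return True

    def cpu_up(self, cpu):
        """Bring a CPU online. Returns 0 on success, -5 on failure."""
        if not self.cpu_init(cpu):
            return -5
        self.online.add(cpu)
        return 0

    def cpu_down(self, cpu, clear_initialized=True):
        """Take a CPU offline. ASSUMPTION: a buggy path may skip the clear."""
        self.online.discard(cpu)
        if clear_initialized:
            self.cpu_initialized.discard(cpu)

    def suspend_resume(self, clear_initialized):
        """Take all non-boot CPUs down, then try to bring them back up."""
        secondary = range(1, self.nr_cpus)
        for cpu in secondary:
            self.cpu_down(cpu, clear_initialized)
        return {cpu: self.cpu_up(cpu) for cpu in secondary}


if __name__ == "__main__":
    model = ToyCpuBringup(nr_cpus=2)
    model.cpu_up(1)
    print(model.suspend_resume(clear_initialized=False))
    print(sorted(model.online))
```

With clear_initialized=False the secondary CPU stays offline after the simulated resume, so any later code that assumes every CPU came back would be working from a wrong premise. The same down/up cycle is what a CPU hotplug test exercises, which is why hotplug is suggested above as a lighter-weight reproduction than a full S3.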