From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keir Fraser
Subject: Re: Xen4.2 S3 regression?
Date: Mon, 24 Sep 2012 21:30:16 +0100
To: Ben Guthro, Jan Beulich
Cc: xen-devel
List-Id: xen-devel@lists.xenproject.org

Do a debug build so the backtrace can be trusted. It's a NULL pointer
dereference so shouldn't be too tricky to make some headway on this one.
Easier than the previous bug. :)

 -- Keir

On 24/09/2012 20:02, "Ben Guthro" <ben@guthro.net> wrote:

> Well...knock one bug down - and another crops up.
>
> It appears that dom0_vcpu_pin is incompatible with S3.
> I'll start digging into why, but if you have any thoughts from the stack
> below, I'd welcome any pointers.
>
> /btg
>
>
> (XEN) Preparing system for ACPI S3 state.
> (XEN) Disabling non-boot CPUs ...
> (XEN) Entering ACPI S3 state.
> (XEN) mce_intel.c:1239: MCA Capability: BCAST 1 SER 0 CMCI 0 firstbank 1 extended MCE MSR 0
> (XEN) CMCI: CPU0 has no CMCI support
> (XEN) CPU0: Thermal monitoring enabled (TM2)
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs  ...
> (XEN) Booting processor 1/1 eip 8a000
> (XEN) Initializing CPU#1
> (XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
> (XEN) CPU: L2 cache: 3072K
> (XEN) CPU: Physical Processor ID: 0
> (XEN) CPU: Processor Core ID: 1
> (XEN) CMCI: CPU1 has no CMCI support
> (XEN) CPU1: Thermal monitoring enabled (TM2)
> (XEN) CPU1: Intel(R) Core(TM)2 Duo CPU     P8400  @ 2.26GHz stepping 06
> (XEN) microcode: CPU1 updated from revision 0x60c to 0x60f, date = 2010-09-29
> [   60.100054] ACPI: Low-level resume complete
> [   60.100054] PM: Restoring platform NVS memory
> [   60.100054] Enabling non-boot CPUs ...
> [   60.100054] installing Xen timer for CPU 1
> [   60.100054] cpu 1 spinlock event irq 279
> (XEN) ----[ Xen-4.2.1-pre  x86_64  debug=n  Tainted:    C ]----
> (XEN) CPU:    1
> (XEN) RIP:    e008:[<ffff82c480121562>] vcpu_migrate+0x172/0x360
> (XEN) RFLAGS: 0000000000010096   CONTEXT: hypervisor
> (XEN) rax: 00007d3b7fd17180   rbx: ffff82c4802e8ee0   rcx: ffff82c4802e8ee0
> (XEN) rdx: ffff83013a3c5068   rsi: 0000000000000004   rdi: ffff8301300b7d68
> (XEN) rbp: 0000000000000001   rsp: ffff8301300b7e28   r8:  0000000000000000
> (XEN) r9:  000000000000003e   r10: 000000000000003e   r11: 0000000000000246
> (XEN) r12: ffff83013a3c5068   r13: ffff83013a3c5068   r14: ffff82c4802d3140
> (XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 0000000131a05000   cr2: 0000000000000060
> (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff8301300b7e28:
> (XEN)    ffff82c4802d3140 ffff83013a3c5068 0000000000000246 0000000000000004
> (XEN)    ffff8300bd2fe000 ffff82c4802e8ee0 00000004012d3140 ffff82c4802e8ee0
> (XEN)    ffff88003fc8e820 ffff8300bd2fe000 ffff8301355d8000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffff88003fc8e820 ffff82c480105a50
> (XEN)    0000000000000000 ffff82c4801805ec 0000060f00000000 ffff82c480184f16
> (XEN)    0000000000000032 78a20f6e65780b0f ffff88003976fdc8 ffff8300bd2fe000
> (XEN)    ffff88003976fe50 ffff8300bd2fe000 ffff88003976fda0 0000000000000001
> (XEN)    0000000000000000 ffff82c480214288 ffff88003fc8e820 0000000000000000
> (XEN)    0000000000000000 0000000000000001 ffff88003976fda0 ffff88003fc8bdc0
> (XEN)    0000000000000246 ffff88003976fe60 00000000ffffffff 0000000000000000
> (XEN)    0000000000000018 ffffffff8100130a 0000000000000000 0000000000000001
> (XEN)    0000000000000007 0000010000000000 ffffffff8100130a 000000000000e033
> (XEN)    0000000000000246 ffff88003976fd88 000000000000e02b d43d5f3fedaef5e7
> (XEN)    d3b2ddaeed5038ff 270adb813ad76c9b ddfd6ff5f85e6775 b5881cbf00000001
> (XEN)    ffff8300bd2fe000 0000003cba0dc180 0a109ac649c118a1
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480121562>] vcpu_migrate+0x172/0x360
> (XEN)    [<ffff82c480105a50>] do_vcpu_op+0x1e0/0x4a0
> (XEN)    [<ffff82c4801805ec>] do_invalid_op+0x19c/0x3f0
> (XEN)    [<ffff82c480184f16>] copy_from_user+0x26/0x90
> (XEN)    [<ffff82c480214288>] syscall_enter+0x88/0x8d
> (XEN)
> (XEN) Pagetable walk from 0000000000000060:
> (XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000060
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
>
> On Mon, Sep 24, 2012 at 10:28 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 24.09.12 at 16:16, Ben Guthro <ben@guthro.net> wrote:
>> > Would you prefer a separate [PATCH] email for this fix, or will you apply
>> > it as-is?
>>
>> I'll put something together - the most important thing here obviously
>> is having a proper description. Plus I'd like to slightly extend this and
>> have acpi_dead_idle() actually use default_dead_idle(), just to have
>> things consolidated in one place. I assume I can put your S-o-b on
>> what you sent...
>>
>> Jan
>>
>> > On Mon, Sep 24, 2012 at 10:10 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> >
>> >> >>> On 24.09.12 at 15:56, Ben Guthro <ben@guthro.net> wrote:
>> >> > On Mon, Sep 24, 2012 at 9:34 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> >> >> ...; the interesting ones are
>> >> >> - at the end of xen/arch/x86/acpu/cpu_idle.c:acpi_dead_idle()
>> >> >> - xen/arch/x86/domain.c:default_dead_idle()
>> >> >
>> >> >
>> >> > Thanks! This fixes the issue on this machine!
>> >>
>> >> Hooray!
>> >>
>> >> > Is this a reasonable long-term solution - or are there reasons not to
>> >> > call wbinvd() here?
>> >>
>> >> That's a perfectly valid adjustment (see my earlier reply where
>> >> I originally suggested it and explained why it may be necessary).
>> >>
>> >> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
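
The NULL pointer diagnosis in the reply can be read off the panic block in
the quoted log: the faulting linear address matches cr2 (0x60), and
error_code=0000 has the "present" and "write" bits clear, i.e. a read of a
page that is not mapped. An address that small is what a field access
through a NULL structure pointer typically produces. Below is a minimal
Python sketch of that reading. The helper name parse_fault and the 4 KiB
threshold are illustrative assumptions, not anything taken from Xen or from
this thread, and the sketch says nothing about which pointer in
vcpu_migrate() was NULL.

```python
import re

# Assumption: any address inside the first 4 KiB page is treated as
# "NULL plus a small field offset". This is a heuristic, not a Xen rule.
NULL_PAGE_LIMIT = 0x1000


def parse_fault(log: str) -> dict:
    """Extract the faulting address and error code from a Xen panic log."""
    addr = re.search(r"Faulting linear address:\s*([0-9a-fA-F]+)", log)
    code = re.search(r"\[error_code=([0-9a-fA-F]+)\]", log)
    if addr is None or code is None:
        raise ValueError("log does not contain a page-fault report")
    address = int(addr.group(1), 16)
    error_code = int(code.group(1), 16)
    return {
        "address": address,
        # x86 page-fault error code, bit 0 clear: page was not present
        "not_present": (error_code & 0x1) == 0,
        # x86 page-fault error code, bit 1 set: access was a write
        "write": (error_code & 0x2) != 0,
        "likely_null_deref": address < NULL_PAGE_LIMIT,
    }


# The three relevant lines from the quoted panic output.
SAMPLE = """\
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000060
"""

info = parse_fault(SAMPLE)
print(hex(info["address"]), info["likely_null_deref"])
```

With a debug=y build, as suggested in the reply, the offset 0x60 can then be
matched against the structure layouts used in vcpu_migrate() to narrow down
which pointer was NULL.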