From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Marek Marczykowski <marmarek@invisiblethingslab.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
Date: Fri, 22 Mar 2013 12:56:51 -0400
Message-ID: <20130322165651.GA4827@phenom.dumpdata.com>
In-Reply-To: <514C79F3.5050504@invisiblethingslab.com>
On Fri, Mar 22, 2013 at 04:34:11PM +0100, Marek Marczykowski wrote:
> On 15.03.2013 14:02, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 13, 2013 at 09:50:39PM +0100, Marek Marczykowski wrote:
> >> Hi,
> >>
> >> I still have problems with ACPI(?) on Xen. After some system startups or
> >> resumes the CPU temperature goes high although all domUs (and dom0) are idle.
> >> On a "good" system startup it is about 50-55C, on a "bad" one above 67C (most
> >> of the time above 70C). I've noticed a difference in the C-states reported by
> >> Xen (attached files). On "bad" startups, in addition, suspend doesn't work: the
> >> system restarts during suspend (I still haven't managed to get console messages,
> >> as I don't have a serial port on this system). Note that sometimes the system
> >> boots fine ("good" state), but the problem occurs after some suspend/resume cycles. Some time ago
> >> I've got other symptoms: only CPU0 was used - for all VCPUs (according to xl
> >> vcpu-list). Maybe it is related?
> >>
> >> Hardware: Dell Latitude E6420
> >> CPU: Intel i5-2520M
> >>
> >> Software:
> >> xen stable-4.1 as of 15.02 (last commit: "xen: sched_creadit: improve picking
> >> up the idle CPU for a VCPU"), with reverted commit "Introduce system_state
> >> variable."
> >> But the same problem on vanilla xen 4.1.2.
> >>
> >> Linux 3.7.6 - happens on almost every boot. On Linux 3.7.4 it happens much
> >> more rarely (but still occurs).
> >> Kernel config:
> >> http://git.qubes-os.org/gitweb/?p=marmarek/kernel.git;a=blob;f=config-pvops;h=a6e953f71cdc84556571b592b8af87a5a4f9a8d0;hb=HEAD
> >> I've tried some bisect from 3.7.4 to 3.7.6, but without success because
> >> problem isn't 100% reproducible.
> >>
> >> Any ideas?
> >
> > That C-states difference is important. The SYSIO part on your box means that the
> > CPU ends up doing an MWAIT. A HALT, on the other hand, is not so power-saving
> > friendly.
> >
> > Looking at this:
> >> (XEN) no cpu_id for acpi_id 5
> >> (XEN) no cpu_id for acpi_id 6
> >> (XEN) no cpu_id for acpi_id 7
> >> (XEN) no cpu_id for acpi_id 8
> >
> > .. means that xen-acpi-processor was trying to probe for the ACPI IDs of the
> > other CPUs that the machine theoretically can support. That means it got
> > the ACPI information for the first four CPUs (which is good).
> >
> > As a first step in trying to figure this out, you can add #define DEBUG 1
> > in xen-acpi-processor.c right before any of the #includes, and also boot
> > Xen with 'cpufreq=verbose'. That should tell you what kind of C-states
> > xen-acpi-processor uploaded (and whether it did so for all of the vCPUs).
> >
> > If both bootups show that we do upload the C-states for all the CPUs but they
> > vary, that means digging a bit deeper into the ACPI code, specifically into
> > acpi_processor_get_power_info_cst, and seeing whether it hits any of the 'continue's.
> >
> > Then I would say take also the DSDT for both bootups and compare them. It might
> > be that the BIOS is using a scratch register at reboot to construct the C-states
> > and somehow it ends up being corrupted, which means that on the next warm reboot
> > the C-states have bogus data. This does show up in the field :-(
>
> I've finally found some time to debug this further, and it looks like
> some deeper ACPI code problem...
>
> I've switched to 3.8.4, on which the problem is much easier to reproduce (almost
> every startup).
>
> On a bad bootup, xen-acpi-processor didn't find any C-states: for each CPU
> _pr.flags.power and _pr->power.count was 0 (but flags.power_setup_done=1). In
> this case suspend (or shutdown) always ends up with a reset.
Is this when booting the machine from a cold state or a warm one?

I know of some BIOSes out there that use the scratchpad registers in the
IOH (depending on the platform, that can be 0:0e.1, Reg 0x84). If Xen or Linux
touches them, then the P-states and C-states that the BIOS generates are buggy.

But that does not seem to be the case here: you are saying that after
disassembling the DSDT (so cat /sys/firmware/acpi/tables/DSDT, or SSDT*, and
then iasl -d on them), the _PSD, _PSS, and _PCT look the same?

You could also look at the FACP table and see whether it differs between boots.
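
To compare all of the tables between a good and a bad boot in one go, something
like the following might help. It is only a minimal sketch, assuming the raw
tables from each boot were copied out of /sys/firmware/acpi/tables into two
separate directories. The directory names in the usage comment and the helper
names table_digests and differing_tables are made up for illustration.

```python
import hashlib
import sys
from pathlib import Path


def table_digests(dump_dir):
    """Map each table file name in dump_dir to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(Path(dump_dir).iterdir()):
        if path.is_file():  # skip subdirectories such as "dynamic"
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_tables(good_dir, bad_dir):
    """Return the sorted names of tables that differ or exist in only one dump."""
    good = table_digests(good_dir)
    bad = table_digests(bad_dir)
    return sorted(name for name in good.keys() | bad.keys()
                  if good.get(name) != bad.get(name))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 cmp_tables.py acpi-good/ acpi-bad/
    for name in differing_tables(sys.argv[1], sys.argv[2]):
        print(name)
```

Any table it prints would then be worth disassembling with iasl -d from both
boots and diffing by hand.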
>
> On a good one xen-acpi-processor got C1-C3 states for each CPU, then suspend
> succeeded, but after resume CPU0 had C1-C3 and the others only C1. Reloading
> xen-acpi-processor (rmmod -f...) fixes this (according to xl debug-key c), but
> the temperature still stays high. Regardless of reloading xen-acpi-processor,
> the next suspend always fails.
If you reload and look at the runqueues, are all of them using the ACPI
idler or the default one?
>
> Not sure how C-states can be related to S3 suspend, but perhaps something more
> general with ACPI is wrong?
This reminds me of something. I recall seeing something like this a long, long
time ago, and had completely forgotten about it until now. The difference was
whether Xen's cpu_idle was running a) the acpi_idle (so using the different
C-states), or b) the default one (so just using HLT).

With b), during resume it would get half-way through
(http://darnok.org/xen/devel.acpi-s3.v1.serial.log) while with a) it would actually
continue on - http://darnok.org/xen/devel.acpi-s3.v0.serial.log

This was on some MSI MS-7680/H61M-P23 (MS-7680) motherboard.

Oh look: http://lists.xen.org/archives/html/xen-devel/2011-06/msg02059.html

And it looks like Kevin's recommendation was to use the a) case with
max_cstate=1 to narrow it down.
>
> Each time, the DSDT (taken from /sys/firmware/acpi/tables) is exactly the same.
>
> --
> Best Regards / Pozdrawiam,
> Marek Marczykowski
> Invisible Things Lab
>