From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Marek Marczykowski <marmarek@invisiblethingslab.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
Date: Fri, 22 Mar 2013 12:56:51 -0400
Message-ID: <20130322165651.GA4827@phenom.dumpdata.com>
In-Reply-To: <514C79F3.5050504@invisiblethingslab.com>
On Fri, Mar 22, 2013 at 04:34:11PM +0100, Marek Marczykowski wrote:
> On 15.03.2013 14:02, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 13, 2013 at 09:50:39PM +0100, Marek Marczykowski wrote:
> >> Hi,
> >>
> >> I still have problems with ACPI(?) on Xen. After some system startups or
> >> resumes the CPU temperature goes high although all domUs (and dom0) are idle. On
> >> a "good" system startup it is about 50-55C, on a "bad" one above 67C (most of the
> >> time above 70C). I've noticed a difference in the C-states reported by Xen (attached
> >> files). On "bad" startups suspend also doesn't work - the system restarts
> >> during suspend (I still haven't managed to get console messages - I don't have a
> >> serial port on this system). Note that sometimes system boots fine ("good"
> >> state), but problem occurs after some suspend/resume cycles. Some time ago
> >> I've got other symptoms: only CPU0 was used - for all VCPUs (according to xl
> >> vcpu-list). Maybe it is related?
> >>
> >> Hardware: Dell Latitude E6420
> >> CPU: Intel i5-2520M
> >>
> >> Software:
> >> xen stable-4.1 as of 15.02 (last commit: "xen: sched_credit: improve picking
> >> up the idle CPU for a VCPU"), with reverted commit "Introduce system_state
> >> variable."
> >> But the same problem on vanilla xen 4.1.2.
> >>
> >> Linux 3.7.6 - happens on almost every boot. On Linux 3.7.4 it happens much more
> >> rarely (but still occurs).
> >> Kernel config:
> >> http://git.qubes-os.org/gitweb/?p=marmarek/kernel.git;a=blob;f=config-pvops;h=a6e953f71cdc84556571b592b8af87a5a4f9a8d0;hb=HEAD
> >> I've tried some bisect from 3.7.4 to 3.7.6, but without success because
> >> problem isn't 100% reproducible.
> >>
> >> Any ideas?
> >
> > That C-states difference is important. The SYSIO part on your box means that the
> > CPU ends up doing an MWAIT. A HALT on the other hand is not so power-saving
> > friendly.
> >
> > Looking at this:
> >> (XEN) no cpu_id for acpi_id 5
> >> (XEN) no cpu_id for acpi_id 6
> >> (XEN) no cpu_id for acpi_id 7
> >> (XEN) no cpu_id for acpi_id 8
> >
> > .. means that xen-acpi-processor was trying to probe for the ACPI IDs of the
> > other CPUs that the machine theoretically can support. That means it got
> > the ACPI information for the first four CPUs (which is good).
> >
> > As a first step in trying to figure this out, you can add #define DEBUG 1
> > in xen-acpi-processor.c right before any of the #includes. And also boot
> > Xen with 'cpufreq=verbose'. That should tell you what kind of C-states
> > xen-acpi-processor uploaded (and whether it did so for all of the vCPUs).
> >
> > If both bootups show that we do upload the C-states for all the CPUs but they
> > vary that means digging a bit deeper in the ACPI code. Specifically in
> > acpi_processor_get_power_info_cst and seeing if it hits any of the 'continue'.
> >
> > Then I would say also take the DSDT for both bootups and compare them. It might
> > be that the BIOS is using a scratch register at reboot to construct the C-states
> > and somehow it ends up being corrupted. Which means that on the next warm reboot
> > the C-states have bogus data. This does show up in the field :-(
>
> Finally I've found some time for further debugging this. And it looks like
> some deeper ACPI code problem...
>
> I've switched to 3.8.4, on which problem is much easier to reproduce (almost
> every startup).
>
> On a bad bootup, xen-acpi-processor didn't find any C-states: for each CPU
> _pr->flags.power and _pr->power.count were 0 (but flags.power_setup_done=1). In
> this case suspend (or shutdown) always ends up with a reset.
Is this when booting the machine from a cold state or a warm one?
There are some BIOSes out there that I know of that use the scratchpad registers in
the IOH (depending on the platform that can be 0:0e.1, Reg 0x84). If Xen or Linux
touches them, the P-states and C-states that the BIOS generates are buggy.
But that is not the case here - you are saying that after disassembling the DSDT
(so cat /sys/firmware/acpi/tables/DSDT, or SSDT*, and then iasl -d on them), the
_PSD, _PSS, and _PCT look the same?
You could also look at the FACP table and see whether it differs between the good and bad boots.
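A rough way to compare the raw tables between boots, as a sketch only: copy the
top-level files from /sys/firmware/acpi/tables (as root) into one directory after
a good boot and another after a bad boot, then hash them. The directory names and
the helper names below are made up for illustration, and this only tells you which
tables differ, not why; you would still run iasl -d on the ones it flags.

```python
import hashlib
from pathlib import Path


def table_digests(dump_dir):
    """Map each regular file in dump_dir to the SHA-256 hex digest of its bytes."""
    digests = {}
    for path in sorted(Path(dump_dir).iterdir()):
        # Subdirectories (for example dynamically loaded tables) are skipped here.
        if path.is_file():
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_tables(good_dir, bad_dir):
    """Return sorted names of tables that differ or exist in only one dump."""
    good = table_digests(good_dir)
    bad = table_digests(bad_dir)
    return sorted(
        name for name in good.keys() | bad.keys()
        if good.get(name) != bad.get(name)
    )


if __name__ == "__main__":
    # Hypothetical directory names; point these at your own copies.
    for name in differing_tables("acpi-good", "acpi-bad"):
        print("differs:", name)
```

If this prints nothing, the static tables are byte-identical across the two boots
and the difference has to come from somewhere else.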
>
> On a good one xen-acpi-processor got C1-C3 states for each CPU, then suspend
> succeeded, but after resume CPU0 had C1-C3 and the others only C1. Reloading
> xen-acpi-processor (rmmod -f...) fixes this (according to xl debug-key c), but
> the temperature still stays high. Regardless of xen-acpi-processor reloading, the next
> suspend always fails.
If you reload and look at the runqueues, are all of them using the ACPI
idler or the default one?
>
> Not sure how C-states can be related to S3 suspend, but perhaps something more
> general with ACPI is wrong?
This reminds me of something. I recall seeing something like this a long, long time ago....
Completely forgot about it until now. The difference was whether Xen's cpu_idle
was running a) the acpi_idle (so using the different C-states), or b) the default one
(so just using HLT).
With the b), during resume it would get half-way through
(http://darnok.org/xen/devel.acpi-s3.v1.serial.log) while with a) it would actually
continue on - http://darnok.org/xen/devel.acpi-s3.v0.serial.log
This was on some MSI MS-7680/H61M-P23 (MS-7680) motherboard.
Oh look: http://lists.xen.org/archives/html/xen-devel/2011-06/msg02059.html
And it looks like Kevin's recommendation was to use case a) with max_cstate=1
to narrow it down.
>
> Each time the DSDT (taken from /sys/firmware/acpi/tables) is exactly the same.
>
> --
> Best Regards / Pozdrawiam,
> Marek Marczykowski
> Invisible Things Lab
>