From: Steve Freitas <sflist@ihonk.com>
To: Jan Beulich <JBeulich@suse.com>, Len Brown <len.brown@intel.com>
Cc: Don Slutz <dslutz@verizon.com>,
Jun Nakajima <jun.nakajima@intel.com>,
Donald D Dugger <donald.d.dugger@intel.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Regression, host crash with 4.5rc1
Date: Sun, 08 Mar 2015 17:45:15 -0700
Message-ID: <54FCED1B.8060605@ihonk.com>
In-Reply-To: <54F48EB902000078000652D0@mail.emea.novell.com>
Hi Len, thanks for chiming in. I am a Xen noob and generally clueless about
the inner workings of this power management stuff, so apologies in
advance if I misunderstand what is being asked. I am, however, happy to try
whatever you'd like in pursuing this issue.
On 03/02/2015 07:24 AM, Jan Beulich wrote:
>>>> On 27.02.15 at 18:50, <len.brown@intel.com> wrote:
>> If this issue were to happen on Linux/bare-metal, this is how I'd debug it.
>> Hopefully some of this will translate to Xen in one way or another.
> Sadly not really - the kernel plays only a minor role (forwarding ACPI
> data to the hypervisor) in C-state handling under Xen.
>
>> dmesg | grep idle
>> will tell us what idle driver is running (on Dom0 kernel)
>> and if it is intel_idle, it will also tell us the supported sub-states
>> (CPUID.MWAIT.EDX value)
root@g2:~# dmesg | grep idle
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 11.391708] intel_idle: MWAIT substates: 0x1120
[ 11.391711] intel_idle: v0.4 model 0x2C
[ 11.391712] intel_idle: lapic_timer_reliable_states 0xffffffff
[ 11.391780] intel_idle: intel_idle yielding to none
(This output is the same whether I've got max_cstate=2 set or not.)
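For anyone decoding the "MWAIT substates: 0x1120" line above: as far as I understand it, that value is EDX from CPUID leaf 5, where each 4-bit field holds the number of MWAIT sub-states for one C-state (bits 3:0 for C0, bits 7:4 for C1, and so on). Here is a minimal Python sketch under that assumption; the function name is my own, not something from intel_idle or Xen.

```python
def decode_mwait_substates(edx):
    """Split a CPUID.05H:EDX value into per-C-state sub-state counts.

    Assumption: each C-state index i occupies the 4-bit field at bits 4*i+3 .. 4*i.
    """
    return {f"C{i}": (edx >> (4 * i)) & 0xF for i in range(8)}


# Value reported by intel_idle in the dmesg output above.
substates = decode_mwait_substates(0x1120)
for name, count in substates.items():
    if count:
        print(f"{name}: {count} sub-state(s)")
```

The indices here are MWAIT hint fields, so they need not line up one-to-one with the ACPI C-state types shown in the Xen dump.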
> Yeah, we call the driver mwait-idle in the hypervisor, and the log
> would be accessible via "xl dmesg", but yes, that information is
> available there too.
>
>>>> (XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH]
>>>> duration[1190961948551]
>>>> (XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH]
>>>> duration[2015393965907]
>>>> (XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH]
>>>> duration[30527997858148]
>> I'm hopeful that this information comes from the hardware's BIOS
>> and not some hypervisor tricking out Dom0 with a fake BIOS, yes?
> In the case of mwait-idle (intel_idle on Linux) it would be built-in
> knowledge of the driver. For acpi-cpuidle it would come from
> actual firmware, not anything fake/virtual.
>
>> Next, hopefully the attached turbostat utility can be invoked on Dom0
>> and it can read the MSRs on at least 1 processor via the /dev/cpu interface.
> Yes, that would be possible, provided it's not important what specific
> CPU it gets executed on.
I ran it with "max_cstate=2" still on Xen's boot line, while the
problematic graphics benchmark was running on my Win 7 VM. The output
is as follows:
root@g2:~/turbostat-test# ./turbostat
./turbostat: APERF or MPERF went backwards *
* Frequency results do not cover entire interval *
* fix this by running Linux-2.6.30 or later *
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 36804******** 2736 2800
0 64323******** 2560 2800
1 8244******** 3398 2800
2 125758******** 2760 2800
3 17811******** 3032 2800
4 735******** 2977 2800
5 3954******** 2656 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 47728******** 2804 2800
0 18007******** 3025 2800
1 69086******** 2634 2800
2 522******** 2713 2800
3 77486******** 2680 2800
4 58487******** 2932 2800
5 62777******** 3006 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 49031******** 2728 2800
0 78178******** 2681 2800
1 62045******** 2561 2800
2 9060******** 3110 2800
3 16619******** 3255 2800
4 720******** 2661 2800
5 127565******** 2763 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 65471******** 2700 2800
0 70582******** 2638 2800
1 2173******** 1954 2800
2 49981******** 2899 2800
3 78668******** 2682 2800
4 128293******** 2762 2800
5 63131******** 2566 2800
Not sure why it warns about the kernel version; this box is running
Debian's Linux 3.16 kernel.
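To make the "APERF or MPERF went backwards" warning easier to reason about, here is a rough Python sketch of how I understand the turbostat columns to be derived from counter deltas over one sampling interval. It is my own reconstruction, not copied from turbostat's source, and the function and parameter names are made up. One plausible reason for the warning under Xen (an assumption on my part, in line with Jan's caveat about which CPU the tool executes on) is that a dom0 vCPU can move between physical CPUs between two reads, so the deltas compare counters from different cores.

```python
def turbostat_columns(d_tsc, d_aperf, d_mperf, interval_s):
    """Derive turbostat-style columns from counter deltas over one interval.

    d_tsc, d_aperf, d_mperf: (end - start) values of the TSC, IA32_APERF and
    IA32_MPERF counters; interval_s: length of the interval in seconds.
    """
    if d_tsc <= 0 or d_aperf <= 0 or d_mperf <= 0:
        # Corresponds to the case the warning describes: no valid delta.
        raise ValueError("APERF or MPERF went backwards (or did not advance)")
    tsc_mhz = d_tsc / interval_s / 1e6
    return {
        "Avg_MHz": d_aperf / interval_s / 1e6,   # cycles actually executed per second
        "%Busy": 100.0 * d_mperf / d_tsc,        # share of the interval spent in C0
        "Bzy_MHz": tsc_mhz * d_aperf / d_mperf,  # average clock while busy
        "TSC_MHz": tsc_mhz,
    }


# Made-up deltas for a 1 s interval on a 2800 MHz TSC, 10% busy.
sample = turbostat_columns(
    d_tsc=2_800_000_000, d_aperf=300_000_000, d_mperf=280_000_000, interval_s=1.0
)
print(sample)
```

If this is right, a bogus delta would explain why the Avg_MHz values above are far larger than TSC_MHz and why %Busy is printed as asterisks.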
With "max_cstate=2" removed from Xen's boot line, this is the result:
root@g2:~/turbostat-test# ./turbostat
./turbostat: APERF or MPERF went backwards *
* Frequency results do not cover entire interval *
* fix this by running Linux-2.6.30 or later *
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 23507******** 2621 2800
0 27631******** 2552 2800
1 35945******** 2978 2800
2 24417******** 2472 2800
3 1001******** 2948 2800
4 24417******** 2472 2800
5 27631******** 2552 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 14114******** 2687 2800
0 529******** 2738 2800
1 60363******** 2750 2800
2 21290******** 2497 2800
3 1028******** 2934 2800
4 629******** 2943 2800
5 842******** 2937 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 15048******** 2714 2800
0 25703******** 2489 2800
1 36024******** 2975 2800
2 5248******** 2454 2800
3 5248******** 2454 2800
4 9138******** 2755 2800
5 8925******** 2751 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 32859******** 2598 2800
0 23089******** 2526 2800
1 61730******** 2751 2800
2 26138******** 2492 2800
3 26138******** 2492 2800
4 30029******** 2574 2800
5 30029******** 2574 2800
>> It may tell us just the same thing I think we learned here:
>>
>>>> (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
>>>> (XEN) CC3[28794734145697] CC6[0] CC7[0]
>> which I'm assuming are a dump of the MSR residency counters.
>> If yes, it appears that this platform is not invoking c6 and pc6 at all,
>> and that the deepest state being used is actually cc3 and pc3.
>> I don't know if that is because you've booted the kernel with max_cstate=N
>> of some kind, or if this is default.
> Sadly I haven't been able to tell which original mail the quotes
> above are from, and since I had Steve experiment with disabling
> the deepest C-state permitted to be used, it may well be that
> this output was from one of those experiments. Remember, we
> already know that with use of C6 alone disabled things work for
> him (Steve - please correct me if I'm misremembering).
AIUI, that is correct. My Xen boot line (which eliminates the dom0/U
hangs) includes "mwait-idle=1 max_cstate=2".
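In case it helps with comparing runs, the residency lines quoted above can be checked mechanically. This is a small Python sketch that parses the NAME[value] pairs from the Xen dump and reports the deepest package and core state with a non-zero counter. The helper names are mine, and it assumes the dump keeps the format shown in the quote.

```python
import re

# The two residency lines quoted earlier in this mail.
XEN_DUMP = """\
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[28794734145697] CC6[0] CC7[0]
"""


def parse_residency(dump):
    """Return {counter_name: value} for every PCn[...] / CCn[...] pair."""
    return {name: int(val) for name, val in re.findall(r"\b([PC]C\d)\[(\d+)\]", dump)}


def deepest_used(counters, prefix):
    """Deepest state with the given prefix whose counter is non-zero, or None."""
    used = [n for n, v in counters.items() if n.startswith(prefix) and v > 0]
    return max(used, key=lambda n: int(n[2:]), default=None)


counters = parse_residency(XEN_DUMP)
print("deepest package state used:", deepest_used(counters, "PC"))
print("deepest core state used:", deepest_used(counters, "CC"))
```

For the quoted dump this agrees with Len's reading that nothing deeper than PC3 and CC3 was entered.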
>> Guessing...
>> If no surprises in the debug stuff requested above, and
>> If the XEN debug stuff above is with c6 explicitly disabled...
>> Note that there are two kinds of c6 -- CC6 (core) and PC6 (package).
>> If this box supports both, the next thing to try will be to keep CC6
>> enabled, but to just disable PC6. This is done via an MSR that turbostat
>> dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) via the wrmsr(8) utility.
> I don't think the wrmsr tool can be used (unmodified) to reliably do
> this on all CPUs in the system - we'd likely have to cook up a patch
> to the hypervisor instead, or I'd have to hand my patch to msr-tools
> to Steve so he could use the tool under Xen (albeit that would also
> require him to use one of our forward ported kernels, as the
> upstream one doesn't have a pCPU sysfs interface yet afaik).
I'm game for whatever.
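So that I know what to look for, here is a rough Python sketch of decoding that MSR. Everything about the layout is an assumption to be checked against the Intel SDM: I am taking the register to be at address 0xE2, bits 2:0 to be the package C-state limit with the Nehalem/Westmere encoding listed in the table below, and bit 15 to be the lock bit. The function names are hypothetical. The read helper uses the /dev/cpu/N/msr interface from the Linux msr driver; per Jan's remark, under Xen it may not reliably hit the intended physical CPU.

```python
import os
import struct

MSR_PKG_CST_CONFIG_CONTROL = 0xE2  # assumed address of the MSR turbostat dumps

# Assumed Nehalem/Westmere encoding of bits 2:0; anything else is shown as reserved.
PKG_CSTATE_LIMITS = {0: "PC0", 1: "PC1", 2: "PC3", 3: "PC6", 4: "PC7", 7: "unlimited"}


def decode_pkg_cst_cfg(value):
    """Extract the package C-state limit and the lock bit from the raw MSR value."""
    return {
        "limit": PKG_CSTATE_LIMITS.get(value & 0x7, "reserved"),
        "locked": bool(value & (1 << 15)),
    }


def read_msr(cpu, reg):
    """Read one 64-bit MSR via the msr driver (needs root and 'modprobe msr')."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


# Made-up example value: limit field = 3, lock bit set.
print(decode_pkg_cst_cfg(0x8003))
```

On a real system one would pass the result of read_msr(0, MSR_PKG_CST_CONFIG_CONTROL) to decode_pkg_cst_cfg; if "locked" comes back true, a write to lower the limit would presumably not stick, which is the case Len describes next.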
>> Though if that MSR is locked by the BIOS, then BIOS SETUP option
>> may be the only way to disable the package C-state limit without
>> also disabling the associated core C-state.
> Steve, could you check whether any such option exists (it's been
> a while, so apologies if we had asked already)?
No problem. I've gone through the BIOS options, and the two attached
images show the ones that may apply.
If you'd like me to change any of those settings, please let me
know. For reference, this is a Lenovo ThinkStation D20 running a Xeon X5660.
Thanks!
Steve
[-- Attachment #1.2.1: Type: text/html, Size: 11470 bytes --]
[-- Attachment #1.2.2: Type: image/jpeg, Size: 55018 bytes --]
[-- Attachment #1.2.3: Type: image/jpeg, Size: 60102 bytes --]