From: Steve Freitas <sflist@ihonk.com>
To: Jan Beulich <JBeulich@suse.com>, Len Brown <len.brown@intel.com>
Cc: Don Slutz <dslutz@verizon.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Donald D Dugger <donald.d.dugger@intel.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Regression, host crash with 4.5rc1
Date: Sun, 08 Mar 2015 17:45:15 -0700
Message-ID: <54FCED1B.8060605@ihonk.com>
In-Reply-To: <54F48EB902000078000652D0@mail.emea.novell.com>


Hi Len, thanks for chiming in. I am a Xen noob and generally clueless about
the inner workings of this power management stuff, so apologies in
advance if I don't understand what is asked. I am, however, happy to try
whatever you'd like me to in pursuing this issue.

On 03/02/2015 07:24 AM, Jan Beulich wrote:
>>>> On 27.02.15 at 18:50, <len.brown@intel.com> wrote:
>> If this issue were to happen on Linux/bare-metal, this is how I'd debug it.
>> Hopefully some of this will translate to Xen in one way or another.
> Sadly not really - the kernel plays only a minor role (forwarding ACPI
> data to the hypervisor) in C-state handling under Xen.
>
>> dmesg | grep idle
>> will tell us what idle driver is running (on Dom0 kernel)
>> and if it is intel_idle, it will also tell us the supported sub-states
>> (CPUID.MWAIT.EDX value)

root@g2:~# dmesg | grep idle
[    0.000000]     RCU dyntick-idle grace-period acceleration is enabled.
[   11.391708] intel_idle: MWAIT substates: 0x1120
[   11.391711] intel_idle: v0.4 model 0x2C
[   11.391712] intel_idle: lapic_timer_reliable_states 0xffffffff
[   11.391780] intel_idle: intel_idle yielding to none

(This output is the same whether I've got max_cstate=2 set or not.)
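
For reference, the "MWAIT substates" value in that log is the EDX output of CPUID leaf 5, which packs the number of MWAIT sub-states per C-state hint into 4-bit fields (bits 3:0 for C0, bits 7:4 for C1, and so on). A minimal Python sketch of that decoding follows; the function name is made up for illustration, and the "C<n>" labels are MWAIT hint numbers, not necessarily the ACPI C-state names Xen prints.

```python
def decode_mwait_substates(edx):
    """Split a CPUID.05H:EDX value into per-C-state MWAIT sub-state counts."""
    # Each C-state hint gets one 4-bit field, lowest field first.
    return {"C%d" % n: (edx >> (4 * n)) & 0xF for n in range(8)}


# Value taken from the intel_idle line in the dmesg output above.
counts = decode_mwait_substates(0x1120)
for name, count in counts.items():
    print(name, count)
```

Read this way, 0x1120 reports two sub-states for the C1 hint and one each for the C2 and C3 hints.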

> Yeah, we call the driver mwait-idle in the hypervisor, and the log
> would be accessible via "xl dmesg", but yes, that information is
> available there too.
>
>>>> (XEN)     C1:   type[C1] latency[003] usage[12219860] method[  FFH]
>>>> duration[1190961948551]
>>>> (XEN)     C2:   type[C1] latency[010] usage[10205554] method[  FFH]
>>>> duration[2015393965907]
>>>> (XEN)     C3:   type[C2] latency[020] usage[50926286] method[  FFH]
>>>> duration[30527997858148]
>> I'm hopeful that this information comes from the hardware's BIOS
>> and not some hypervisor tricking out Dom0 with a fake BIOS, yes?
> In the case of mwait-idle (intel_idle on Linux) it would be built-in
> knowledge of the driver. For acpi-cpuidle it would come from
> actual firmware, not anything fake/virtual.
>
>> Next, hopefully the attached turbostat utility can be invoked on Dom0
>> and it can read the MSRs on at least 1 processor via the /dev/cpu interface.
> Yes, that would be possible, provided it's not important what specific
> CPU it gets executed on.

I've run it (with the "max_cstate=2" intact from Xen's boot line) and 
the output is as follows, while running the problematic graphics 
benchmark on my Win 7 VM:

root@g2:~/turbostat-test# ./turbostat
./turbostat: APERF or MPERF went backwards *
* Frequency results do not cover entire interval *
* fix this by running Linux-2.6.30 or later *
      CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
        -   36804********    2736    2800
        0   64323********    2560    2800
        1    8244********    3398    2800
        2  125758********    2760    2800
        3   17811********    3032    2800
        4     735********    2977    2800
        5    3954********    2656    2800
      CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
        -   47728********    2804    2800
        0   18007********    3025    2800
        1   69086********    2634    2800
        2     522********    2713    2800
        3   77486********    2680    2800
        4   58487********    2932    2800
        5   62777********    3006    2800
      CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
        -   49031********    2728    2800
        0   78178********    2681    2800
        1   62045********    2561    2800
        2    9060********    3110    2800
        3   16619********    3255    2800
        4     720********    2661    2800
        5  127565********    2763    2800
      CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
        -   65471********    2700    2800
        0   70582********    2638    2800
        1    2173********    1954    2800
        2   49981********    2899    2800
        3   78668********    2682    2800
        4  128293********    2762    2800
        5   63131********    2566    2800

Not sure why it warns about the kernel version; this box is running
Debian's Linux 3.16 kernel.
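
As a rough illustration of why the columns above can look nonsensical, here is a Python sketch of how figures like Avg_MHz, %Busy and Bzy_MHz can be derived from TSC, APERF and MPERF deltas. It is an approximation written for this note, not turbostat's actual code, and the function names are hypothetical. It assumes both counter samples come from the same physical CPU; as Jan notes, under Xen a read from Dom0 is not guaranteed to land on a particular CPU, so a later sample can be smaller than an earlier one.

```python
MASK64 = (1 << 64) - 1


def delta64(new, old):
    """Difference of two 64-bit counter reads, wrapping modulo 2**64."""
    return (new - old) & MASK64


def derive(tsc, aperf, mperf, interval_s):
    """Derive turbostat-like columns from counter deltas over one interval."""
    avg_mhz = aperf / 1e6 / interval_s                  # unhalted cycles per second
    busy_pct = 100.0 * mperf / tsc                      # share of time not idle
    bzy_mhz = tsc / 1e6 * aperf / mperf / interval_s    # clock speed while busy
    return avg_mhz, busy_pct, bzy_mhz


# A consistent sample: 2.8 GHz TSC, half the interval spent busy.
good = derive(2_800_000_000, 1_400_000_000, 1_400_000_000, 1.0)

# An inconsistent sample: MPERF appears to go backwards, so the wrapped
# delta is enormous and the busy percentage overflows any sane column width.
bad_mperf = delta64(100, 200)
bad = derive(2_800_000_000, 1_400_000_000, bad_mperf, 1.0)

print(good)
print(bad)
```

If that reading is right, the asterisks in the %Busy column are simply values too wide for the field, and the Avg_MHz numbers from the same intervals should be treated with the same suspicion.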

With "max_cstate=2" removed from Xen's boot line, this is the result:

root@g2:~/turbostat-test# ./turbostat
./turbostat: APERF or MPERF went backwards *
* Frequency results do not cover entire interval *
* fix this by running Linux-2.6.30 or later *
      CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
        -   23507********    2621    2800
        0   27631********    2552    2800
        1   35945********    2978    2800
        2   24417********    2472    2800
        3    1001********    2948    2800
        4   24417********    2472    2800
        5   27631********    2552    2800
      CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
        -   14114********    2687    2800
        0     529********    2738    2800
        1   60363********    2750    2800
        2   21290********    2497    2800
        3    1028********    2934    2800
        4     629********    2943    2800
        5     842********    2937    2800
      CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
        -   15048********    2714    2800
        0   25703********    2489    2800
        1   36024********    2975    2800
        2    5248********    2454    2800
        3    5248********    2454    2800
        4    9138********    2755    2800
        5    8925********    2751    2800
      CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
        -   32859********    2598    2800
        0   23089********    2526    2800
        1   61730********    2751    2800
        2   26138********    2492    2800
        3   26138********    2492    2800
        4   30029********    2574    2800
        5   30029********    2574    2800


>> It may tell us just the same thing I think we learned here:
>>
>>>> (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
>>>> (XEN) CC3[28794734145697] CC6[0] CC7[0]
>> which I'm assuming are a dump of the MSR residency counters.
>> If yes, it appears to be that this platform is not invoking c6 and pc6 at
>> all,
>> and that the deepest state being used is actually cc3 and pc3.
>> I don't know if that is because you've booted the kernel with max_cstate=N
>> of some kind, or if this is default.
> Sadly I haven't been able to tell which original mail the quotes
> above are from, and since I had Steve experiment with disabling
> the deepest C-state permitted to be used, it may well be that
> this output was from one of those experiments. Remember, we
> already know that with use of C6 alone disabled things work for
> him (Steve - please correct me if I'm misremembering).

AIUI, that is correct. My Xen boot line (which eliminates the dom0/U 
hangs) includes "mwait-idle=1 max_cstate=2".

>> Guessing...
>> If no surprises in the debug stuff requested above, and
>> If the XEN debug stuff above is with c6 explicitly disabled...
>> Note that there are two kinds of c6 -- CC6 (core) and PC6 (package).
>> If this box supports both, the next thing to try will be to keep CC6
>> enabled, but to just disable PC6.  This is done via an MSR that turbostat
>> dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) via the wrmsr(8) utility.
> I don't think the wrmsr tool can be used (unmodified) to reliably do
> this on all CPUs in the system - we'd likely have to cook up a patch
> to the hypervisor instead, or I'd have to hand my patch to msr-tools
> to Steve so he could use the tool under Xen (albeit that would also
> require him to use one of our forward ported kernels, as the
> upstream one doesn't have a pCPU sysfs interface yet afaik).

I'm game for whatever.
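
To make the MSR under discussion a little more concrete, here is a small Python sketch that splits a raw value of that register (address 0xE2, which turbostat calls MSR_NHM_SNB_PKG_CST_CFG_CTL) into the two fields relevant here: the package C-state limit in bits 2:0 and the configuration lock in bit 15. The function name is hypothetical, and the meaning of each limit value is model-specific, so the sketch only reports the raw number.

```python
MSR_PKG_CST_CONFIG_CONTROL = 0xE2  # MSR_NHM_SNB_PKG_CST_CFG_CTL in turbostat


def decode_pkg_cst_cfg(value):
    """Extract the package C-state limit and lock bit from a raw MSR value."""
    return {
        # Bits 2:0: deepest package C-state allowed (encoding varies by model).
        "pkg_cstate_limit": value & 0x7,
        # Bit 15: once set (typically by the BIOS), the register stays
        # read-only until the next reset.
        "cfg_locked": bool(value & (1 << 15)),
    }


# Made-up example value with the lock bit set and a limit field of 3.
print(decode_pkg_cst_cfg(0x8403))
```

If the lock bit reads as set, writing the register from software will not change the limit.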

>> Though if that MSR is locked by the BIOS, then BIOS SETUP option
>> may be the only way to disable the package C-state limit without
>> also disabling the associated core C-state.
> Steve, could you check whether any such option exists (it's been
> a while, so apologies if we had asked already)?

No problem. I've cruised through the BIOS options and this is what I see
that may apply:

[two JPEG screenshots attached to the original message]

If you'd like me to make any changes to those settings, please let me 
know. For reference this is a Lenovo ThinkStation D20 running a Xeon X5660.

Thanks!

Steve


[-- Attachment #1.2.2: Type: image/jpeg, Size: 55018 bytes --]

[-- Attachment #1.2.3: Type: image/jpeg, Size: 60102 bytes --]

