xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Boris Ostrovsky <boris.ostrovsky@amd.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Stefan Bader <stefan.bader@canonical.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: Workings/effectiveness of the xen-acpi-processor driver
Date: Tue, 1 May 2012 18:35:45 -0400	[thread overview]
Message-ID: <4FA06541.7050607@amd.com> (raw)
In-Reply-To: <20120501200207.GA15313@phenom.dumpdata.com>

On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote:
>> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote:
>>>> Since there have been requests about that driver to get backported into 3.2, I
>>>> was interested to find out what or how much would be gained by that.
>>>>
>>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which
>>>> was not very successful as the drivers bail out of the init function because the
>>>> first call to acpi_processor_register_performance() returns -ENODEV. There is
>>>> some frequency scaling when running without Xen, so I need to do some more
>>>> debugging there.

I believe this is caused by the somewhat under-enlightened xen_apic_read():

static u32 xen_apic_read(u32 reg)
{
         return 0;
}

This results in some data, most importantly boot_cpu_physical_apicid, 
not being set correctly and, in turn, causes x86_cpu_to_apicid to be 
broken.

On larger AMD systems boot processor is typically APICID=0x20 (I don't 
have Intel system handy to see how it looks there).

As a quick and dirty test you can try:

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index edc2448..1f78998 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long 
address)
         }
         if (boot_cpu_physical_apicid == -1U) {
                 boot_cpu_physical_apicid  = read_apic_id();
+               boot_cpu_physical_apicid = 32;
                 apic_version[boot_cpu_physical_apicid] =
                          GET_APIC_VERSION(apic_read(APIC_LVR));
         }


(Set it to whatever APICID on core0 is, I suspect it won't be zero).

-boris


>>>
>>> Did you back-port the other components - the ones that turn off the native
>>> frequency scalling?
>>>
>>>        provide disable_cpufreq() function to disable the API.
>>> 	xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
>>>        xen/cpufreq: Disable the cpu frequency scaling drivers from loading
>>>>
>>
>> Yes, here is the full set for reference:
>>
>> * xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
>> * xen/acpi: Remove the WARN's as they just create noise.
>> * xen/acpi: Fix Kconfig dependency on CPU_FREQ
>> * xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
>> * xen/acpi-processor: C and P-state driver that uploads said data to hyper
>> * provide disable_cpufreq() function to disable the API.
>
> And (Linus just pulled it), you also need this one:
>   df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded.)
>
>>
>>>> The second system was an Intel one (4 core i7 920@2.67GHz) which was
>>>> successfully loading the driver. Via xenpm I can see the various frequencies and
>>>> also see them being changed. However the cpuidle data out of xenpm looks a bit odd:
>>>>
>>>> #>  xenpm get-cpuidle-states 0
>>>> Max C-state: C7
>>>>
>>>> cpu id               : 0
>>>> total C-states       : 2
>>>> idle time(ms)        : 10819311
>>>> C0                   : transition [00000000000000000001]
>>>>                         residency  [00000000000000005398 ms]
>>>> C1                   : transition [00000000000000000001]
>>>>                         residency  [00000000000010819311 ms]
>>>> pc3                  : [00000000000000000000 ms]
>>>> pc6                  : [00000000000000000000 ms]
>>>> pc7                  : [00000000000000000000 ms]
>>>> cc3                  : [00000000000000000000 ms]
>>>> cc6                  : [00000000000000000000 ms]
>>>>
>>>> Also gathering samples over 30s does look like only C0 and C1 are used. This
>>>
>>> Yes.
>>>> might be because C1E support is enabled in BIOS but when looking at the
>>>> intel_idle data in sysfs when running without a hypervisor will show C3 and C6
>>>> for the cores. That could have been just a wrong output, so I plugged in a power
>>>> meter and compared a kernel running natively and running as dom0 (with and
>>>> without the acpi-processor driver).
>>>>
>>>> Native: 175W
>>>> dom0:   183W (with only marginal difference between with or without the
>>>>                processor driver)
>>>> [yes, the system has a somewhat high base consumption which I attribute to a
>>>> ridiculously dimensioned graphics subsystem to be running a text console]
>>>>
>>>> This I would take as C3 and C6 really not being used and the frequency scaling
>
> So the other thing I forgot to note is that C3->C6 have a detrimental
> effect on some Intel boxes with Xen. We haven't figured out exactly which ones
> and the bug is definitly in the hypervisor. The bug is that when the CPU goes in
> those states the NIC ends up being unresponsive. Its like the interrupts stopped
> being ACKed. If I run 'xenpm set-max-cstate 2' the issue disappears.
>
>>>
>>> To go in deeper modes there is also a need to backport a Xen unstable
>>> hypercall which will allow the kernel to detect the other states besides
>>> C0-C2.
>>>
>>> "XEN_SET_PDC query was implemented in c/s 23783:
>>>      "ACPI: add _PDC input override mechanism".
>>>
>>
>> I see. There is a kernel patch about enabling MWAIT that refers to that...
>
> Were there any special things you ran when checking the output? Just plugging
> and looking at the results?
>>
>>>
>>>> having no impact on the idle system is not that much surprising. But if that was
>>>> true it would also limit the usefulness of the turbo mode which I understand
>>>> would also be limited by the c-state of the other cores.
>>>
>>> Hm, I should double-check that - but somehow I thought that Xen independetly
>>> checks for TurboMode and if the P-states are in, then they are activated.
>
> I did a bit of checking around and it does seem that is the case. From what
> I have gathered the TurboMode kicks in when the CPU is C0 mode (which should
> be obvious), and when the other cores are in anything but C0 mode. And sure
> enough that seems to be the case. But I can't get the concrete details whether
> the "but C0 mode" means that TurboMode will work better if the C mode is legacy
> C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from
> Len Brown more details..
>>>
>> Turbo mode should be enabled. I had been only looking at a generic overview
>> about it on Intel site which sounded like it  would make more of a difference on
>> how much one core could get overclocked related to how many cores are active
>> (and I translated active or not into deeper c-states or not).
>> Looking at the verbose output of turbostat it seems not to make that much
>> difference whether 2-4 cores are running. A single core alone could get one more
>> increment in clock stepping. That does not immediately sound a lot. And of
>> course how much or long the higher clock is used depends on other factors as
>> well and is not under OS control.
>>
>> In the end it is probably quite dynamic and hard to come up with hard facts to
>> prove its value. Though if I can lower the idle power usage by reaching a bit
>> further, that would greatly help to justify the effort and potential risk of
>> backporting...
>
> I understand. I wish I could give you the exact percentage points by which
> the power usage will drop. But I think the more substantial reason benefit of
> these patches is performance gains. The ones that Ian Campbell ran and were
> posted on Phorenix site paint that they are beneficial.
>
>>
>>>>
>>>> Do I misread the data I see? Or maybe its a known limitation? In case it is
>>>> worth doing more research I'll gladly try things and gather more data.
>>>
>>> Just missing some patches.
>>>
>>> Oh, and this one:
>>>        xen/acpi: Fix Kconfig dependency on CPU_FREQ
>>>
>>> Hmm.. I think a patch disappeared somewhere.
>
> That was the one I referenced at the beginning of this email.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

  reply	other threads:[~2012-05-01 22:35 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-25 13:00 Workings/effectiveness of the xen-acpi-processor driver Stefan Bader
2012-04-26 15:50 ` Konrad Rzeszutek Wilk
2012-04-26 16:25   ` Stefan Bader
2012-04-26 17:04     ` Konrad Rzeszutek Wilk
2012-05-06 15:23       ` Pasi Kärkkäinen
2012-05-07 17:33         ` Konrad Rzeszutek Wilk
2012-05-07 17:44           ` Pasi Kärkkäinen
2012-05-01 20:02     ` Konrad Rzeszutek Wilk
2012-05-01 22:35       ` Boris Ostrovsky [this message]
2012-05-01 22:54         ` Konrad Rzeszutek Wilk
2012-05-02  0:47           ` Konrad Rzeszutek Wilk
2012-05-02  1:11             ` Boris Ostrovsky
2012-05-02  9:19               ` Jan Beulich
2012-05-02 14:56           ` Stefan Bader
2012-05-02  8:36         ` Stefan Bader
2012-05-02 15:01         ` Stefan Bader
2012-05-02 16:08           ` Konrad Rzeszutek Wilk
2012-05-02 17:06             ` Boris Ostrovsky
2012-05-02 17:14               ` Konrad Rzeszutek Wilk
2012-05-02 21:31                 ` Boris Ostrovsky
2012-05-02 21:41                   ` Konrad Rzeszutek Wilk
2012-05-02 22:09                     ` Boris Ostrovsky
2012-05-03  6:55                       ` Stefan Bader
2012-05-03 10:00                         ` Stefan Bader
2012-05-03 12:58                       ` Stefan Bader
2012-05-03 14:47                         ` Stefan Bader
2012-05-03 15:46                           ` Konrad Rzeszutek Wilk
2012-05-03 17:02                             ` Boris Ostrovsky
2012-05-03 17:08                             ` Konrad Rzeszutek Wilk
2012-05-04  8:00                               ` Stefan Bader
2012-05-03 16:14                       ` Konrad Rzeszutek Wilk
2012-05-02 21:29             ` Stefan Bader
2012-05-02  8:22       ` Stefan Bader

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FA06541.7050607@amd.com \
    --to=boris.ostrovsky@amd.com \
    --cc=jbeulich@suse.com \
    --cc=konrad.wilk@oracle.com \
    --cc=stefan.bader@canonical.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).