From mboxrd@z Thu Jan 1 00:00:00 1970 From: Boris Ostrovsky Subject: Re: Workings/effectiveness of the xen-acpi-processor driver Date: Tue, 1 May 2012 18:35:45 -0400 Message-ID: <4FA06541.7050607@amd.com> References: <4F97F58A.8090409@canonical.com> <20120426155033.GE26830@phenom.dumpdata.com> <4F9976F8.8040502@canonical.com> <20120501200207.GA15313@phenom.dumpdata.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120501200207.GA15313@phenom.dumpdata.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Konrad Rzeszutek Wilk , Stefan Bader Cc: "xen-devel@lists.xensource.com" , Jan Beulich List-Id: xen-devel@lists.xenproject.org On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote: > On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: >> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: >>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: >>>> Since there have been requests about that driver to get backported into 3.2, I >>>> was interested to find out what or how much would be gained by that. >>>> >>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which >>>> was not very successful as the drivers bail out of the init function because the >>>> first call to acpi_processor_register_performance() returns -ENODEV. There is >>>> some frequency scaling when running without Xen, so I need to do some more >>>> debugging there. I believe this is caused by the somewhat under-enlightened xen_apic_read(): static u32 xen_apic_read(u32 reg) { return 0; } This results in some data, most importantly boot_cpu_physical_apicid, not being set correctly and, in turn, causes x86_cpu_to_apicid to be broken. On larger AMD systems boot processor is typically APICID=0x20 (I don't have Intel system handy to see how it looks there). As a quick and dirty test you can try: diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index edc2448..1f78998 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long address) } if (boot_cpu_physical_apicid == -1U) { boot_cpu_physical_apicid = read_apic_id(); + boot_cpu_physical_apicid = 32; apic_version[boot_cpu_physical_apicid] = GET_APIC_VERSION(apic_read(APIC_LVR)); } (Set it to whatever APICID on core0 is, I suspect it won't be zero). -boris >>> >>> Did you back-port the other components - the ones that turn off the native >>> frequency scalling? >>> >>> provide disable_cpufreq() function to disable the API. >>> xen/acpi-processor: Do not depend on CPU frequency scaling drivers. >>> xen/cpufreq: Disable the cpu frequency scaling drivers from loading >>>> >> >> Yes, here is the full set for reference: >> >> * xen/cpufreq: Disable the cpu frequency scaling drivers from loading. >> * xen/acpi: Remove the WARN's as they just create noise. >> * xen/acpi: Fix Kconfig dependency on CPU_FREQ >> * xen/acpi-processor: Do not depend on CPU frequency scaling drivers. >> * xen/acpi-processor: C and P-state driver that uploads said data to hyper >> * provide disable_cpufreq() function to disable the API. > > And (Linus just pulled it), you also need this one: > df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded.) > >> >>>> The second system was an Intel one (4 core i7 920@2.67GHz) which was >>>> successfully loading the driver. Via xenpm I can see the various frequencies and >>>> also see them being changed. However the cpuidle data out of xenpm looks a bit odd: >>>> >>>> #> xenpm get-cpuidle-states 0 >>>> Max C-state: C7 >>>> >>>> cpu id : 0 >>>> total C-states : 2 >>>> idle time(ms) : 10819311 >>>> C0 : transition [00000000000000000001] >>>> residency [00000000000000005398 ms] >>>> C1 : transition [00000000000000000001] >>>> residency [00000000000010819311 ms] >>>> pc3 : [00000000000000000000 ms] >>>> pc6 : [00000000000000000000 ms] >>>> pc7 : [00000000000000000000 ms] >>>> cc3 : [00000000000000000000 ms] >>>> cc6 : [00000000000000000000 ms] >>>> >>>> Also gathering samples over 30s does look like only C0 and C1 are used. This >>> >>> Yes. >>>> might be because C1E support is enabled in BIOS but when looking at the >>>> intel_idle data in sysfs when running without a hypervisor will show C3 and C6 >>>> for the cores. That could have been just a wrong output, so I plugged in a power >>>> meter and compared a kernel running natively and running as dom0 (with and >>>> without the acpi-processor driver). >>>> >>>> Native: 175W >>>> dom0: 183W (with only marginal difference between with or without the >>>> processor driver) >>>> [yes, the system has a somewhat high base consumption which I attribute to a >>>> ridiculously dimensioned graphics subsystem to be running a text console] >>>> >>>> This I would take as C3 and C6 really not being used and the frequency scaling > > So the other thing I forgot to note is that C3->C6 have a detrimental > effect on some Intel boxes with Xen. We haven't figured out exactly which ones > and the bug is definitly in the hypervisor. The bug is that when the CPU goes in > those states the NIC ends up being unresponsive. Its like the interrupts stopped > being ACKed. If I run 'xenpm set-max-cstate 2' the issue disappears. > >>> >>> To go in deeper modes there is also a need to backport a Xen unstable >>> hypercall which will allow the kernel to detect the other states besides >>> C0-C2. >>> >>> "XEN_SET_PDC query was implemented in c/s 23783: >>> "ACPI: add _PDC input override mechanism". >>> >> >> I see. There is a kernel patch about enabling MWAIT that refers to that... > > Were there any special things you ran when checking the output? Just plugging > and looking at the results? >> >>> >>>> having no impact on the idle system is not that much surprising. But if that was >>>> true it would also limit the usefulness of the turbo mode which I understand >>>> would also be limited by the c-state of the other cores. >>> >>> Hm, I should double-check that - but somehow I thought that Xen independetly >>> checks for TurboMode and if the P-states are in, then they are activated. > > I did a bit of checking around and it does seem that is the case. From what > I have gathered the TurboMode kicks in when the CPU is C0 mode (which should > be obvious), and when the other cores are in anything but C0 mode. And sure > enough that seems to be the case. But I can't get the concrete details whether > the "but C0 mode" means that TurboMode will work better if the C mode is legacy > C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from > Len Brown more details.. >>> >> Turbo mode should be enabled. I had been only looking at a generic overview >> about it on Intel site which sounded like it would make more of a difference on >> how much one core could get overclocked related to how many cores are active >> (and I translated active or not into deeper c-states or not). >> Looking at the verbose output of turbostat it seems not to make that much >> difference whether 2-4 cores are running. A single core alone could get one more >> increment in clock stepping. That does not immediately sound a lot. And of >> course how much or long the higher clock is used depends on other factors as >> well and is not under OS control. >> >> In the end it is probably quite dynamic and hard to come up with hard facts to >> prove its value. Though if I can lower the idle power usage by reaching a bit >> further, that would greatly help to justify the effort and potential risk of >> backporting... > > I understand. I wish I could give you the exact percentage points by which > the power usage will drop. But I think the more substantial reason benefit of > these patches is performance gains. The ones that Ian Campbell ran and were > posted on Phorenix site paint that they are beneficial. > >> >>>> >>>> Do I misread the data I see? Or maybe its a known limitation? In case it is >>>> worth doing more research I'll gladly try things and gather more data. >>> >>> Just missing some patches. >>> >>> Oh, and this one: >>> xen/acpi: Fix Kconfig dependency on CPU_FREQ >>> >>> Hmm.. I think a patch disappeared somewhere. > > That was the one I referenced at the beginning of this email. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >