From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Bader
Subject: Re: Workings/effectiveness of the xen-acpi-processor driver
Date: Wed, 02 May 2012 10:22:45 +0200
Message-ID: <4FA0EED5.8040009@canonical.com>
References: <4F97F58A.8090409@canonical.com>
 <20120426155033.GE26830@phenom.dumpdata.com>
 <4F9976F8.8040502@canonical.com>
 <20120501200207.GA15313@phenom.dumpdata.com>
In-Reply-To: <20120501200207.GA15313@phenom.dumpdata.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: "xen-devel@lists.xensource.com", Jan Beulich
List-Id: xen-devel@lists.xenproject.org

Sorry for not responding sooner, there was a good opportunity for a long weekend
here I could not miss. Also I had been moving the power meter over to the other
machine.

On 01.05.2012 22:02, Konrad Rzeszutek Wilk wrote:
> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote:
>> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote:
>>>> Since there have been requests about that driver to get backported into 3.2, I
>>>> was interested to find out what or how much would be gained by that.
>>>>
>>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which
>>>> was not very successful as the drivers bail out of the init function because the
>>>> first call to acpi_processor_register_performance() returns -ENODEV. There is
>>>> some frequency scaling when running without Xen, so I need to do some more
>>>> debugging there.
>>>
>>> Did you back-port the other components - the ones that turn off the native
>>> frequency scaling?
>>>
>>>     provide disable_cpufreq() function to disable the API.
>>>     xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
>>>     xen/cpufreq: Disable the cpu frequency scaling drivers from loading
>>>>
>>
>> Yes, here is the full set for reference:
>>
>> * xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
>> * xen/acpi: Remove the WARN's as they just create noise.
>> * xen/acpi: Fix Kconfig dependency on CPU_FREQ
>> * xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
>> * xen/acpi-processor: C and P-state driver that uploads said data to hyper
>> * provide disable_cpufreq() function to disable the API.
>
> And (Linus just pulled it), you also need this one:
> df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded.)

I suspect I would need that one only if I had commit

73c154c60be106b47f15d1111fc2d75cc7a436f2
xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.

and the change in the hypervisor to implement XEN_SET_PDC.
>
>>
>>>> The second system was an Intel one (4 core i7 920@2.67GHz) which was
>>>> successfully loading the driver. Via xenpm I can see the various frequencies and
>>>> also see them being changed.
>>>> However the cpuidle data out of xenpm looks a bit odd:
>>>>
>>>> #> xenpm get-cpuidle-states 0
>>>> Max C-state: C7
>>>>
>>>> cpu id               : 0
>>>> total C-states       : 2
>>>> idle time(ms)        : 10819311
>>>> C0                   : transition [00000000000000000001]
>>>>                        residency  [00000000000000005398 ms]
>>>> C1                   : transition [00000000000000000001]
>>>>                        residency  [00000000000010819311 ms]
>>>> pc3                  : [00000000000000000000 ms]
>>>> pc6                  : [00000000000000000000 ms]
>>>> pc7                  : [00000000000000000000 ms]
>>>> cc3                  : [00000000000000000000 ms]
>>>> cc6                  : [00000000000000000000 ms]
>>>>
>>>> Also gathering samples over 30s does look like only C0 and C1 are used. This
>>>
>>> Yes.
>>>> might be because C1E support is enabled in BIOS but when looking at the
>>>> intel_idle data in sysfs when running without a hypervisor will show C3 and C6
>>>> for the cores. That could have been just a wrong output, so I plugged in a power
>>>> meter and compared a kernel running natively and running as dom0 (with and
>>>> without the acpi-processor driver).
>>>>
>>>> Native: 175W
>>>> dom0:   183W (with only marginal difference between with or without the
>>>>         processor driver)
>>>> [yes, the system has a somewhat high base consumption which I attribute to a
>>>> ridiculously dimensioned graphics subsystem to be running a text console]
>>>>
>>>> This I would take as C3 and C6 really not being used and the frequency scaling
>
> So the other thing I forgot to note is that C3->C6 have a detrimental
> effect on some Intel boxes with Xen. We haven't figured out exactly which ones
> and the bug is definitely in the hypervisor. The bug is that when the CPU goes in
> those states the NIC ends up being unresponsive. It's like the interrupts stopped
> being ACKed. If I run 'xenpm set-max-cstate 2' the issue disappears.
>

I think I saw weird behavior not only under Xen with that. One netbook I got has
the nasty tendency to go into undetermined sleeps and you had to hit some keys
to keep it running.
In that case I found out that the Intel gfx card was one which does not cause
interrupts on vsync, and other sources seemed unable to wake the CPU from deeper
c-states when using MSI. So the wonder cure there is to force interrupts not to
use MSI.

That aside, I was also comparing power usage in idle on the AMD box. And that
shows an even bigger difference between running natively and running dom0 under
the hypervisor (103W against 127W). As that is AMD there never are other
c-states than c1(e), which xenpm shows as being used (got the feeling that with
AMD CPUs it is quite hard to find out the c1 residency nowadays, as only
intel_idle keeps statistics). I think I got some data now but still need to
parse it (if I am not wrong one needs to use power trace now).
But regardless it would hint that having the deeper c-states may not be the real
solution.

>>>
>>> To go in deeper modes there is also a need to backport a Xen unstable
>>> hypercall which will allow the kernel to detect the other states besides
>>> C0-C2.
>>>
>>> "XEN_SET_PDC query was implemented in c/s 23783:
>>> "ACPI: add _PDC input override mechanism".
>>>
>>
>> I see. There is a kernel patch about enabling MWAIT that refers to that...
>
> Were there any special things you ran when checking the output? Just plugging
> and looking at the results?

In the first step it only was the idle power usage. But as you say the real
target for the patches was better overall performance under use. So I need to
think of something to quantify the difference.

>>
>>>
>>>> having no impact on the idle system is not that much surprising. But if that was
>>>> true it would also limit the usefulness of the turbo mode which I understand
>>>> would also be limited by the c-state of the other cores.
>>>
>>> Hm, I should double-check that - but somehow I thought that Xen independently
>>> checks for TurboMode and if the P-states are in, then they are activated.
>
> I did a bit of checking around and it does seem that is the case. From what
> I have gathered the TurboMode kicks in when the CPU is C0 mode (which should
> be obvious), and when the other cores are in anything but C0 mode. And sure
> enough that seems to be the case. But I can't get the concrete details whether
> the "but C0 mode" means that TurboMode will work better if the C mode is legacy
> C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from
> Len Brown more details..
>>>

Right, that is how I understood things, too. Depending on how many cores are
"active" (which does not really say what active means) the frequency can go up
in steps of 133MHz (or was that 200) up to a certain upper limit. The upper
limit depends on how many cores are active, but for my CPU there seems to be
just one more step between 2-4 cores active and 1 core active.
The rest is not under OS control, besides that the CPU needs to be in P0 mode to
go up in frequency. From the looks of it other cores can be in P0 as well and
can go up a bit as well.

>> Turbo mode should be enabled. I had been only looking at a generic overview
>> about it on Intel site which sounded like it would make more of a difference on
>> how much one core could get overclocked related to how many cores are active
>> (and I translated active or not into deeper c-states or not).
>> Looking at the verbose output of turbostat it seems not to make that much
>> difference whether 2-4 cores are running. A single core alone could get one more
>> increment in clock stepping. That does not immediately sound a lot. And of
>> course how much or long the higher clock is used depends on other factors as
>> well and is not under OS control.
>>
>> In the end it is probably quite dynamic and hard to come up with hard facts to
>> prove its value.
>> Though if I can lower the idle power usage by reaching a bit
>> further, that would greatly help to justify the effort and potential risk of
>> backporting...
>
> I understand. I wish I could give you the exact percentage points by which
> the power usage will drop. But I think the more substantial benefit of
> these patches is performance gains. The tests that Ian Campbell ran and that
> were posted on the Phoronix site suggest that they are beneficial.

I probably should ask Ian (or check on that post) what tests he was using to
measure the difference.

>
>>
>>>>
>>>> Do I misread the data I see? Or maybe it's a known limitation? In case it is
>>>> worth doing more research I'll gladly try things and gather more data.
>>>
>>> Just missing some patches.
>>>
>>> Oh, and this one:
>>> xen/acpi: Fix Kconfig dependency on CPU_FREQ
>>>
>>> Hmm.. I think a patch disappeared somewhere.
>
> That was the one I referenced at the beginning of this email.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel