From: David C Niemi
Subject: Re: powersave governor runs programs faster and uses more power than performance governor
Date: Fri, 25 Oct 2013 10:31:07 -0400
Message-ID: <526A80AB.9070401@verisign.com>
Sender: cpufreq-owner@vger.kernel.org
To: Melanie Kambadur
Cc: Viresh Kumar, "cpufreq@vger.kernel.org", Linux PM list

On 10/24/13 15:42, Melanie Kambadur wrote:
> Thank you for your quick and very helpful responses.
>
> A couple of updates. I neglected to RTFM for the cpufreq setter I was
> using :) When I started to print out the values for all the CPUs
> rather than just the averages as Viresh suggested, I realized that my
> governor settings were only applying to one of the cores at a time. My
> apologies for the silly mistake, I thought I had verified that the
> changes were applied to all of the cores.
>
> After actually applying the frequency governor updates to all of the
> cores (and triple-checking this time), the new results for my
> mini-experiment are still odd. I don't know a good way to share data
> on this forum, please see a snippet of the data at the end of this
> note and let me know if there is a better way to share the complete
> data set. As a summary, the new average frequencies across all the
> cores were:
> performance w/ no apps running = 1.13 * 10 ^ 6
> performance w/ apps running = 1.33 * 10 ^ 6
> powersave w/ no apps running = 1.38 * 10 ^ 6
> powersave w/ apps running = 1.95 * 10 ^ 6
>
> I compared these numbers (from cpufreq/cpuinfo_cur_freq) to i7z
> reports and they seem to be reasonable. It's hard to compare
> perfectly, because I can't get i7z to print the frequency values in
> plain text as I would like, but they are definitely in the same
> ballpark (look to be within 100 Mhz).
>
> Obviously, these still aren't the frequency values we'd expect. I
> think David may be correct that the Dell firmware is somehow
> overriding the linux governors. Here are some more details about my
> server:
> Dell Power Edge R420 with 2 sockets, both:
> Intel® Xeon® E5-2430 2.20GHz, 15M Cache, 7.2GT/s
> QPI, Turbo, 6C, 95W, Max Mem 1333MHz E52430
> Each socket actually has 6 cores, with dual SMT to make 12 logical
> cores per socket, or 24 total logical cores.

You very definitely need to look at Dell's BIOS power management settings. By default they tend to override what you are trying to do at the operating system level. What you probably want is an "OS Control" setting.

>
> From /sys/devices/system/cpu/cpuN/cpufreq/scaling_driver I get that
> the current p-state driver is called "intel_pstate". David, you
> mention that the firmware governors are not very efficient, do you
> suggest replacing the intel_pstate driver with a different driver? Of
> the drivers listed here:
> https://wiki.archlinux.org/index.php/CPU_Frequency_Scaling#CPU_frequency_driver
> , I apparently only have available speedstep and p4-clockmod in my
> current kernel. Is one of those better than intel_pstate or will I
> need to download a new driver or even update the kernel to get another
> one?

The intel_pstate governor is quite new -- it is both a governor and a driver if you want to compare it. The older more established approach is to use acpi-cpufreq as the driver and Ondemand as the governor, which works quite well for many use cases if properly configured. We have people on this list who know intel_idle a lot better than I do, but if Dell is using its Active Power Controller intel_idle is not calling the shots anyway.

> Also, by C1E do you mean idle state management?
> I should have given
> some context for my adjustments to the power management policies,
> which is that I am a grad student trying to research how system level
> energy management policies compare to some specific application level
> energy management policies. I would actually like to test a range of
> system level policies, including different kinds of frequency and idle
> state managers. The original goal was to compare a power-optimized
> system version with a performance-optimized version (or a few such
> versions), but I am learning that the options are not so simple. I
> initially thought that on-demand would be the most power-efficient
> frequency governor, but when I noticed that the on-demand governor was
> missing in my available governors list, I did some digging and
> discovered people writing that on-demand was deprecated for
> Sandy-Bridge (e.g.,
> http://www.phoronix.com/scan.php?page=news_item&px=MTM3NDQ) Is this
> true? On a more general note, does anyone know what the theoretically
> most power- and performance-optimized frequency governors/drivers
> would be for my system setup?
>
> Thanks again,
>
> Melanie
>
> P.S. I haven't yet tried the latest v3.12-rc kernel, and while it is
> an option, I would prefer to get the frequency tuning working on my
> existing kernel to avoid having to re-run some other relevant
> experiments.
> ...

Ondemand works very well for Sandy Bridge if properly configured for your intended application. The new Intel Pstate governor is specifically targeted to Sandy Bridge and later processors, and provides an interesting alternative to Ondemand within that scope, but that does not mean Ondemand is "deprecated". Ondemand is the most common P-State governor across a huge variety of platforms ranging from phones to large servers and across many brands of processors besides Intel; it is silly to call it "deprecated" just because one of these platforms has an alternative to it.
In fact many phones have alternatives to Ondemand too, as well as many platform-specific variants. Note that there are very big differences across these platforms -- on phones and other battery-powered devices, power savings are paramount, while on a server, performance under peak loads is usually paramount. As for Ted Ts'o's observation, Ondemand was originally designed before tickless kernels and it is obvious it needs to be adapted to not wake up an idle CPU just to assess load in a battery-powered application. You might instead want to wake up when you get interrupts due to network activity. But that is not to say managing clock rates is not a good idea, just that we have to adapt and rethink things.

There are two main sides of power management, P States (i.e. clock speed) and C States (i.e. what type of "halt" instruction is used). intel_pstate is of course a manager of P States. intel_idle and acpi_idle are C State drivers; most people use the "menu" C state governor and it is just a question of which C State driver to use and how to configure it.

Modern Intel processors rely heavily on having a very effective C1 sleep state that the scheduler calls when a core is idle for a short time. C1E is the original standard for an enhanced C1 sleep state but Intel continues to improve on it, so you may see references to C1-NHM (Nehalem) or C1-SNB (Sandy Bridge) to distinguish feature changes between processor versions.

A few applications that require very low latency coming out of sleep may need to avoid sleep states deeper than C1/C1E or C3 (the deeper the sleep, the longer it takes to wake up and be ready for productive work). It is almost never a good idea to turn off C1E -- latency to get out of C1E is very short and it saves a lot of power vs. "polling" (i.e. just leaving the core active and having it run a busy wait loop).

DCN
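
The per-core check discussed at the top of this thread (confirming that a governor change reached every logical CPU, not just one) can be sketched roughly as follows. This is a minimal illustration, not anyone's actual tooling: it assumes the usual cpufreq sysfs layout under /sys/devices/system/cpu/cpuN/cpufreq/, the function names read_cpufreq, set_governor and summarize are made up for this example, and falling back to scaling_cur_freq when cpuinfo_cur_freq is unreadable is an assumption about what is acceptable for a quick check.

```python
from pathlib import Path
from statistics import mean

# Usual cpufreq sysfs location on Linux; pass another root to test on a fake tree.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_cpufreq(root=SYSFS_CPU):
    """Return {cpu_number: (governor, current_khz)} for each CPU with a cpufreq dir."""
    stats = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not the sibling cpufreq/cpuidle dirs.
    for cpufreq in root.glob("cpu[0-9]*/cpufreq"):
        cpu = int(cpufreq.parent.name[3:])  # "cpu12" -> 12
        governor = (cpufreq / "scaling_governor").read_text().strip()
        try:
            # Often readable by root only, and not exposed by every driver.
            khz = int((cpufreq / "cpuinfo_cur_freq").read_text())
        except (PermissionError, FileNotFoundError):
            # Assumed fallback: the frequency the kernel believes it requested.
            khz = int((cpufreq / "scaling_cur_freq").read_text())
        stats[cpu] = (governor, khz)
    return stats


def set_governor(name, root=SYSFS_CPU):
    """Write `name` to scaling_governor on every CPU (needs root on real sysfs)."""
    for gov_file in root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        gov_file.write_text(name)


def summarize(stats):
    """Return (set of governors in use, mean frequency in kHz or None if empty)."""
    governors = {gov for gov, _ in stats.values()}
    avg_khz = mean(khz for _, khz in stats.values()) if stats else None
    return governors, avg_khz
```

If summarize(read_cpufreq()) reports more than one governor, the setting did not reach every core. The frequencies are in kHz, so averages around 10^6 like those quoted above correspond to roughly 1 GHz. Note that none of this can tell you whether the BIOS is overriding the OS-level setting.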