From mboxrd@z Thu Jan 1 00:00:00 1970
From: Arjan van de Ven
Subject: Re: [PATCH 3/3] PM: Introduce Intel PowerClamp Driver
Date: Tue, 13 Nov 2012 14:45:11 -0800
Message-ID: <50A2CD77.7000403@linux.intel.com>
References: <1352757831-5202-1-git-send-email-jacob.jun.pan@linux.intel.com>
 <1352757831-5202-4-git-send-email-jacob.jun.pan@linux.intel.com>
 <20121113211602.GA30150@linux.vnet.ibm.com>
 <20121113133922.47144a50@chromoly>
 <20121113222350.GH2489@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mga01.intel.com ([192.55.52.88]:37900 "EHLO mga01.intel.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1754220Ab2KMWpN (ORCPT ); Tue, 13 Nov 2012 17:45:13 -0500
In-Reply-To: <20121113222350.GH2489@linux.vnet.ibm.com>
Sender: linux-pm-owner@vger.kernel.org
List-Id: linux-pm@vger.kernel.org
To: paulmck@linux.vnet.ibm.com
Cc: Jacob Pan, Linux PM, LKML, Rafael Wysocki, Len Brown, Thomas Gleixner,
 "H. Peter Anvin", Ingo Molnar, Zhang Rui, Rob Landley

On 11/13/2012 2:23 PM, Paul E. McKenney wrote:
> On Tue, Nov 13, 2012 at 01:39:22PM -0800, Jacob Pan wrote:
>> On Tue, 13 Nov 2012 13:16:02 -0800
>> "Paul E. McKenney" wrote:
>>
>>>> Please refer to Documentation/thermal/intel_powerclamp.txt for more
>>>> details.
>>>
>>> If I read this correctly, this forces a group of CPUs into idle for
>>> about 600 milliseconds at a time. This would indeed delay grace
>>> periods, which could easily result in user complaints. Also, given
>>> the default RCU_BOOST_DELAY of 500 milliseconds in kernels enabling
>>> RCU_BOOST, you would see needless RCU priority boosting.
>>>
>> the default idle injection duration is 6ms. we adjust the sleep
>> interval to ensure idle ratio. So the idle duration stays the same once
>> set. So would it be safe to delay grace period for this small amount in
>> exchange for less over head in each injection period?
>
> Ah, 6ms of delay is much better than 600ms. Should be OK (famous last
> words!).

well... power clamping is not "free".
You're going to lose performance as a trade off for dropping
instantaneous power consumption....
in the measurements we've done comparing various methods.. this one is
doing remarkably well.

>
> For most kernel configuration options, it does use softirq. And yes,
> the kthread you are using would yield to softirqs -- but only as long
> as softirq processing hasn't moved over to ksoftirqd. Longer term,
> RCU will be moving from softirq to kthreads, though, and these might be
> preempted by your powerclamp kthread, depending on priorities. It looks
> like you use RT prio 50, which would usually preempt the RCU kthreads
> (unless someone changed the priorities).

we tried to pick a "middle of the road" value, so that usages that
really really want to run, still get to run, but things that are more
loose about it, get put on hold.

>
>>> It looks like you could end up with part of the system powerclamped
>>> in some situations, and with all of it powerclamped in other
>>> situations. Is that the case, or am I confused?
>>>
>> could you explain the part that is partially powerclamped?
>
> Suppose that a given system has two sockets. Are the two sockets
> powerclamped independently, or at the same time? My guess was the
> former, but looking at the code again, it looks like the latter.
> So it is a good thing I asked, I guess. ;-)

they are clamped together, and they have to.
you don't get (on the systems where this driver works) any "package" C
state unless all packages are idle completely.
And it's these package C states where the real deep power savings
happen, that's why they are such a juicy target for power clamping ;-)
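
For reference, the fixed-duration scheme described in the quoted text (a 6ms idle injection, with the interval between injections adjusted to reach the target idle ratio) comes down to simple arithmetic. The sketch below is only an illustration of that arithmetic, not the driver's code: the function name and the example ratios are made up, and it assumes the idle ratio is simply idle time divided by the full injection period.

```python
def injection_period_ms(idle_ms, idle_ratio_pct):
    """Return (period_ms, run_ms) for a fixed idle duration and a target idle ratio.

    Assumption (not taken from the driver): ratio = idle time / full period.
    """
    if not 0 < idle_ratio_pct < 100:
        raise ValueError("idle ratio must be between 0 and 100 (exclusive)")
    # Full period needed so that idle_ms makes up idle_ratio_pct of it.
    period_ms = idle_ms * 100.0 / idle_ratio_pct
    # Whatever is left of the period is time the CPUs are allowed to run.
    return period_ms, period_ms - idle_ms


if __name__ == "__main__":
    # 6 ms is the default idle duration mentioned in the thread;
    # the ratios are hypothetical examples.
    for ratio in (10, 25, 50):
        period, run = injection_period_ms(6.0, ratio)
        print(f"ratio {ratio}%: period {period:.1f} ms, run {run:.1f} ms")
```

Under that assumption, a higher target ratio shortens the run interval while the idle duration stays at 6ms, so the delay imposed on anything waiting for those CPUs (RCU grace periods included) is bounded by the idle duration rather than by the ratio.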