public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
To: Steven Noonan <steven@uplinklabs.net>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	jacob.jun.pan@intel.com,
	"Arjan van de Ven" <arjan@linux.intel.com>,
	"Linux Kernel mailing List" <linux-kernel@vger.kernel.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	frederic@kernel.org, "Daniel Lezcano" <daniel.lezcano@linaro.org>,
	"Amit Kucheria" <amit.kucheria@linaro.org>,
	"Eduardo Valentin" <edubezval@gmail.com>,
	"Viresh Kumar" <viresh.kumar@linaro.org>,
	rui.zhang@intel.com
Subject: Re: [PATCH V2] idle/intel_powerclamp: Redesign idle injection to use bandwidth control mechanism
Date: Wed, 11 Feb 2015 14:30:07 +0530	[thread overview]
Message-ID: <54DB1A17.9000706@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150209181436.GA29721@rincewind.us-west-2.compute.internal>

On 02/09/2015 11:44 PM, Steven Noonan wrote:
> On Mon, Feb 9, 2015 at 9:56 AM, Steven Noonan <steven@uplinklabs.net> wrote:
>> On Mon, Feb 9, 2015 at 3:51 AM, Preeti U Murthy
>> <preeti@linux.vnet.ibm.com> wrote:
>>> Hi Steven,
>>>
>>> On 02/09/2015 01:02 PM, Steven Noonan wrote:
>>>> On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy
>>>> <preeti@linux.vnet.ibm.com> wrote:
>>>>> The powerclamp driver injects idle periods to stay within the thermal constraints.
>>>>> The driver does a fake idle by spawning per-cpu threads that call the mwait
>>>>> instruction. This behavior of fake idle can confuse the other kernel subsystems.
>>>>> For instance it calls into the nohz tick handlers, which are meant to be called
>>>>> only by the idle thread. It sets the state of the tick in each cpu to idle and
>>>>> stops the tick, when there are tasks on the runqueue. As a result the callers of
>>>>> idle_cpu()/ tick_nohz_tick_stopped() see different states of the cpu; while the
>>>>> former thinks that the cpu is busy, the latter thinks that it is idle. The outcome
>>>>> may be  inconsistency in the scheduler/nohz states which can lead to serious
>>>>> consequences. One of them was reported on this thread:
>>>>> https://lkml.org/lkml/2014/12/11/365.
>>>>>
>>>>> Thomas posted out a patch to disable the powerclamp driver from calling into the
>>>>> tick nohz code which has taken care of the above regression for the moment. However
>>>>> powerclamp driver as a result, will not be able to inject idle periods due to the
>>>>> presence of periodic ticks. With the current design of fake idle, we cannot move
>>>>> towards a better solution.
>>>>> https://lkml.org/lkml/2014/12/18/169
>>>>>
>>>>> This patch aims at removing the concept of fake idle and instead makes the cpus
>>>>> truly idle by throttling the runqueues during the idle injection periods. The situation
>>>>> is in fact very similar to throttling of cfs_rqs when they exceed their bandwidths.
>>>>> The idle injection metrics can be mapped to the bandwidth control metrics 'quota' and
>>>>> 'period' to achieve the same result. When the powerclamping is begun or when the
>>>>> clamping controls have been modified, the bandwidth for the root task group is set.
>>>>> The 'quota' will be the amount of time that the system needs to be busy and 'period'
>>>>> will be the sum of this busy duration and the idle duration as calculated by the driver.
>>>>> This gets rid of per-cpu kthreads, control cpu, hotplug notifiers and clamping mask since
>>>>> the thread starting powerclamping will set the bandwidth and throttling of all cpus will
>>>>> automatically fall in place. None of the other cpus need be bothered about this. This
>>>>> simplifies the design of the driver.
>>>>>
>>>>> Of course this is only if the idle injection metrics can be conveniently transformed
>>>>> into bandwidth control metrics. There are a couple of other primary concerns around if
>>>>> doing the below two in this patch is valid.
>>>>> a. This patch exports the functions to set the quota and period of task groups.
>>>>> b. This patch removes the constraint of not being able to set the root task grp's bandwidth.
>>>>>
>>>>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>>>>
>>>> This doesn't compile.
>>>
>>> Thanks for reporting this! I realized that I had not compiled in the powerclamp driver
>>> as a module while compile testing it. I was focusing on the issues with the design and
>>> failed to cross verify this. Apologies for the inconvenience.
>>>
>>> Find the diff compile tested below.
>>>
>>> I also realized that clamp_cpus() that sets the bandwidth cannot be called from
>>> multiple places. Currently I am calling it from end_powerclamp(), when the user changes the
>>> idle clamping duration and from a queued timer. This will require synchronization between
>>> callers which is not really called for. The queued wakeup_timer alone can re-evaluate the
>>> clamping metrics after every throttle-unthrottle period and this should suffice as far
>>> as I can see. Thoughts ?
>>
>> Hmm, I've had two system lockups so far while running a kernel with
>> intel_powerclamp loaded. Both times it slowly ground to a halt and
>> processes piled up...

Hmm I see. I am not surprised because this patch is not complete yet.
The idea was to gain suggestions and review around the approach first
before I went ahead to making it robust. Let me ease the creases that I
found and repost this patch for you to test. Thanks a lot for the
testing efforts! :)

Regards
Preeti U Murthy


  reply	other threads:[~2015-02-11  9:00 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-09  4:49 [PATCH V2] idle/intel_powerclamp: Redesign idle injection to use bandwidth control mechanism Preeti U Murthy
2015-02-09  7:32 ` Steven Noonan
2015-02-09 11:51   ` Preeti U Murthy
2015-02-09 17:56     ` Steven Noonan
2015-02-09 18:14       ` Steven Noonan
2015-02-11  9:00         ` Preeti U Murthy [this message]
2015-02-13 13:31 ` Peter Zijlstra
2015-02-13 14:03   ` Arjan van de Ven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54DB1A17.9000706@linux.vnet.ibm.com \
    --to=preeti@linux.vnet.ibm.com \
    --cc=amit.kucheria@linaro.org \
    --cc=arjan@linux.intel.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=edubezval@gmail.com \
    --cc=frederic@kernel.org \
    --cc=fweisbec@gmail.com \
    --cc=jacob.jun.pan@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rui.zhang@intel.com \
    --cc=steven@uplinklabs.net \
    --cc=tglx@linutronix.de \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox