All of lore.kernel.org
 help / color / mirror / Atom feed
From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
To: Steven Noonan <steven@uplinklabs.net>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	jacob.jun.pan@intel.com,
	"Arjan van de Ven" <arjan@linux.intel.com>,
	"Linux Kernel mailing List" <linux-kernel@vger.kernel.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	frederic@kernel.org, "Daniel Lezcano" <daniel.lezcano@linaro.org>,
	"Amit Kucheria" <amit.kucheria@linaro.org>,
	"Eduardo Valentin" <edubezval@gmail.com>,
	"Viresh Kumar" <viresh.kumar@linaro.org>,
	rui.zhang@intel.com
Subject: Re: [PATCH V2] idle/intel_powerclamp: Redesign idle injection to use bandwidth control mechanism
Date: Wed, 11 Feb 2015 14:30:07 +0530	[thread overview]
Message-ID: <54DB1A17.9000706@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150209181436.GA29721@rincewind.us-west-2.compute.internal>

On 02/09/2015 11:44 PM, Steven Noonan wrote:
> On Mon, Feb 9, 2015 at 9:56 AM, Steven Noonan <steven@uplinklabs.net> wrote:
>> On Mon, Feb 9, 2015 at 3:51 AM, Preeti U Murthy
>> <preeti@linux.vnet.ibm.com> wrote:
>>> Hi Steven,
>>>
>>> On 02/09/2015 01:02 PM, Steven Noonan wrote:
>>>> On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy
>>>> <preeti@linux.vnet.ibm.com> wrote:
>>>>> The powerclamp driver injects idle periods to stay within the thermal constraints.
>>>>> The driver does a fake idle by spawning per-cpu threads that call the mwait
>>>>> instruction. This behavior of fake idle can confuse the other kernel subsystems.
>>>>> For instance it calls into the nohz tick handlers, which are meant to be called
>>>>> only by the idle thread. It sets the state of the tick in each cpu to idle and
>>>>> stops the tick, when there are tasks on the runqueue. As a result the callers of
>>>>> idle_cpu()/ tick_nohz_tick_stopped() see different states of the cpu; while the
>>>>> former thinks that the cpu is busy, the latter thinks that it is idle. The outcome
>>>>> may be  inconsistency in the scheduler/nohz states which can lead to serious
>>>>> consequences. One of them was reported on this thread:
>>>>> https://lkml.org/lkml/2014/12/11/365.
>>>>>
>>>>> Thomas posted out a patch to disable the powerclamp driver from calling into the
>>>>> tick nohz code which has taken care of the above regression for the moment. However
>>>>> powerclamp driver as a result, will not be able to inject idle periods due to the
>>>>> presence of periodic ticks. With the current design of fake idle, we cannot move
>>>>> towards a better solution.
>>>>> https://lkml.org/lkml/2014/12/18/169
>>>>>
>>>>> This patch aims at removing the concept of fake idle and instead makes the cpus
>>>>> truly idle by throttling the runqueues during the idle injection periods. The situation
>>>>> is in fact very similar to throttling of cfs_rqs when they exceed their bandwidths.
>>>>> The idle injection metrics can be mapped to the bandwidth control metrics 'quota' and
>>>>> 'period' to achieve the same result. When the powerclamping is begun or when the
>>>>> clamping controls have been modified, the bandwidth for the root task group is set.
>>>>> The 'quota' will be the amount of time that the system needs to be busy and 'period'
>>>>> will be the sum of this busy duration and the idle duration as calculated by the driver.
>>>>> This gets rid of per-cpu kthreads, control cpu, hotplug notifiers and clamping mask since
>>>>> the thread starting powerclamping will set the bandwidth and throttling of all cpus will
>>>>> automatically fall in place. None of the other cpus need be bothered about this. This
>>>>> simplifies the design of the driver.
>>>>>
>>>>> Of course this is only if the idle injection metrics can be conveniently transformed
>>>>> into bandwidth control metrics. There are a couple of other primary concerns around if
>>>>> doing the below two in this patch is valid.
>>>>> a. This patch exports the functions to set the quota and period of task groups.
>>>>> b. This patch removes the constraint of not being able to set the root task grp's bandwidth.
>>>>>
>>>>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>>>>
>>>> This doesn't compile.
>>>
>>> Thanks for reporting this! I realized that I had not compiled in the powerclamp driver
>>> as a module while compile testing it. I was focusing on the issues with the design and
>>> failed to cross verify this. Apologies for the inconvenience.
>>>
>>> Find the diff compile tested below.
>>>
>>> I also realized that clamp_cpus() that sets the bandwidth cannot be called from
>>> multiple places. Currently I am calling it from end_powerclamp(), when the user changes the
>>> idle clamping duration and from a queued timer. This will require synchronization between
>>> callers which is not really called for. The queued wakeup_timer alone can re-evaluate the
>>> clamping metrics after every throttle-unthrottle period and this should suffice as far
>>> as I can see. Thoughts ?
>>
>> Hmm, I've had two system lockups so far while running a kernel with
>> intel_powerclamp loaded. Both times it slowly ground to a halt and
>> processes piled up...

Hmm I see. I am not surprised because this patch is not complete yet.
The idea was to gain suggestions and review around the approach first
before I went ahead to making it robust. Let me ease the creases that I
found and repost this patch for you to test. Thanks a lot for the
testing efforts! :)

Regards
Preeti U Murthy


  reply	other threads:[~2015-02-11  9:00 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-09  4:49 [PATCH V2] idle/intel_powerclamp: Redesign idle injection to use bandwidth control mechanism Preeti U Murthy
2015-02-09  7:32 ` Steven Noonan
2015-02-09 11:51   ` Preeti U Murthy
2015-02-09 17:56     ` Steven Noonan
2015-02-09 18:14       ` Steven Noonan
2015-02-11  9:00         ` Preeti U Murthy [this message]
2015-02-13 13:31 ` Peter Zijlstra
2015-02-13 14:03   ` Arjan van de Ven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54DB1A17.9000706@linux.vnet.ibm.com \
    --to=preeti@linux.vnet.ibm.com \
    --cc=amit.kucheria@linaro.org \
    --cc=arjan@linux.intel.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=edubezval@gmail.com \
    --cc=frederic@kernel.org \
    --cc=fweisbec@gmail.com \
    --cc=jacob.jun.pan@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rui.zhang@intel.com \
    --cc=steven@uplinklabs.net \
    --cc=tglx@linutronix.de \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.