Re: sched-freq locking

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Juri Lelli <Juri.Lelli@arm.com>
To: Steve Muckle <steve.muckle@linaro.org>
Cc: Michael Turquette <mturquette@baylibre.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Patrick Bellasi <patrick.bellasi@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Javi Merino <Javi.Merino@arm.com>,
	Punit Agrawal <punit.agrawal@arm.com>
Subject: Re: sched-freq locking
Date: Thu, 21 Jan 2016 09:45:13 +0000	[thread overview]
Message-ID: <20160121094513.GW8573@e106622-lin> (raw)
In-Reply-To: <569FF21C.9070502@linaro.org>

On 20/01/16 12:46, Steve Muckle wrote:
> Hi Juri,
> 
> On 01/20/2016 07:58 AM, Juri Lelli wrote:
> > On 20/01/16 06:50, Steve Muckle wrote:
> >> On 01/20/2016 04:18 AM, Juri Lelli wrote:
> >>> I fear that caching could break thermal. If everybody was already using
> >>> sched-freq interface to control frequency this won't probably be a
> >>> problem, but we are not yet there :(. So, IIUC by caching policy->max,
> >>> for example, we might affect what thermal expects from cpufreq.
> >>
> >> I think we could probably just access policy->max without rwsem, as long
> >> as we ensure policy is valid. Cpufreq will take care to cap a frequency
> >> request to an upper limit put in by thermal anyway so I don't think it's
> >> a problematic race. But I haven't spent much time thinking about it.
> >>
> > 
> > Mmm, can't the fact that the scheduler might think it can still request
> > a frequency above max be problematic?
> 
> I don't think so because cpufreq is going to clamp the incoming request
> to be within policy->min and policy->max anyway (see
> __cpufreq_driver_target() near the top). So nothing's going to
> break/catch fire.
> 

Right. I wasn't worried about this, but rather by the fact that the
scheduler might think there is enough space in a CPU (and balance tasks
accordingly) when that space has actually been restricted by thermal.
But, I also think there are ways to prevent this from happening, we just
need to be aware of such a problem.

> The question to me is, if the system gets throttled (or un-throttled)
> just as you are evaluating capacity requests, are you actually better
> off using the new policy->max value to scale? I suspect not, because the
> recent load tracking statistics and hence the capacity requests will
> have been calculated based on the original policy->max rather than the
> one that was just written.
> 
> Perhaps a decayed version of policy->max could be maintained and used to
> scale instead. Not sure if this will be enough of an issue in practice
> though to warrant it...
> 

Yeah, I can't really see a big issue here at the moment.

> > 
> ...
> > 
> >> Currently schedfreq has to go through two stages of aggregation. The
> >> first is aggregating the capacity request votes from the different sched
> >> classes within a CPU. The second is aggregating the capacity request
> >> votes from the CPUs within a frequency domain. I'm looking to solve the
> >> locking issues with both of these stages as well as those with calling
> >> into cpufreq in the fast path.
> >>
> >> For the first stage, currently we assume that the rq lock is held. This
> >> is the case for the existing calls into schedfreq (the load balancer
> >> required a minor change). There are a few more hooks that need to go
> >> into migration paths in core.c which will be slightly more painful, but
> >> they are IMO doable.
> >>
> >> For the second stage I believe an internal schedfreq spinlock is
> >> required, for one thing to protect against competing updates within the
> >> same frequency domain from different CPUs, but also so that we can
> >> guarantee the cpufreq policy is valid, which can be done if the
> >> governor_start/stop callbacks take the spinlock as well.
> >>
> > 
> > Does this need to nest within the rq lock?
> 
> I think it will have to because I suspect releasing the rq lock to run
> the second stage, then re-acquiring it afterwards, will cause breakage
> in the scheduler paths from which this is all being called.
> 

Right, that's what I was thinking too. However, we need to be aware that
we might add some overhead caused by contention on this new spinlock.

> Otherwise I don't see a requirement within schedfreq for it to be nested.
> 

OTOH, doesn't second stage happen on the kthread (when we need to
sleep)?

> >
> >> As for accessing various things in cpufreq->policy and trying to take
> >> rwsem in the fast path, we should be able to either cache some of the
> >> items in the governor_start() path, eliminate the references, or access
> >> them without locking rwsem (as long as we know the policy is valid,
> > 
> > If we only need to guarantee existence of the policy, without any direct
> > updates, RCU might be a good fit.
> 
> It must be guaranteed that policy->cpus and policy->freq_table are
> current/synchronized with cpufreq for the duration of the second stage.
> I think this precludes the use of RCU.
> 

Mmm, it seems so yes.

> We're going to need an internal schedfreq spinlock to arbitrate multiple
> CPUs in a frequency domain trying to go through the hot path
> concurrently anyway, so we're really just extending that critical
> section to the governor start and stop callbacks where we can safely
> cache some of the policy data.
> 
> > 
> >> which we do by taking the spinlock in governor_start()). Here are the
> >> things we currently reference in the schedfreq set capacity hook and my
> >> thoughts on each of them:
> >>
> >> policy->governor: this is currently tested to see if schedfreq is
> >> enabled, but this can be tracked internally to schedfreq and set in the
> >> governor_start/stop callbacks
> >>
> >> policy->governor_data: used to access schedfreq's data for the policy,
> >> could be changed to an internal schedfreq percpu pointer
> >>
> >> policy->cpus, policy->freq_table: these can be cached internally to
> >> schedfreq on governor_start/stop
> >>
> >> policy->max: as mentioned above I *think* we could get away with
> >> accessing this without locking rwsem as discussed above
> >>
> >> policy->transition_ongoing: this isn't strictly required as schedfreq
> >> could track internally whether it has a transition outstanding, but we
> >> should also be okay to look at this without policy->rwsem since
> >> schedfreq is the only one queuing transitions, again assuming we take
> >> care to ensure policy is valid as above
> >>
> >> Finally when actually requesting a frequency change in the fast path, we
> >> can trylock policy->rwsem and fall back to the slow path (kthread) if
> >> that fails.
> >>
> > 
> > OTOH, all the needed aggregation could make the fast path not so fast in
> > the end. So, maybe having a fast and a slow path is always good?
> 
> Sorry I didn't follow... What do you mean by having a fast and a slow path?
> 

Sorry, I wasn't clear enough. What I was trying to say is that having
two different paths, one fast where we aggregate locally and fire a
request and a second one slower where we aggregate per frequency domain,
compute the new request and call cpufreq, might be always desirable;
even on system that could issue frequency changes without sleeping.

next prev parent reply	other threads:[~2016-01-21  9:44 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <56984C30.8040402@linaro.org>
     [not found] ` <20160115104051.GP18603@e106622-lin>
     [not found]   ` <569D568D.5000500@linaro.org>
     [not found]     ` <CAEG3pNDoDHpgO7WdC8n7C8Ri+KNKZqOi9rKHMkTV0c5dkyevOw@mail.gmail.com>
     [not found]       ` <CAKfTPtCYUWBm_=GQ-NLLZ01xJqG+83vy4uapxRLjqMx_61HSew@mail.gmail.com>
     [not found]         ` <569E90CF.9050503@linaro.org>
     [not found]           ` <CAEG3pND5kHqJhrvwVjsnCnfT-nSrV8SrUaNk2p=ZfEx0MmmB=A@mail.gmail.com>
     [not found]             ` <569EB225.4040707@linaro.org>
     [not found]               ` <CAEG3pNBgpsdX8Lk7_3nHd31_bsq2sLB_f+=4_xiFFbiAWiKsxg@mail.gmail.com>
2016-01-20  1:24                 ` sched-freq locking Steve Muckle
2016-01-20 12:18                   ` Juri Lelli
2016-01-20 14:50                     ` Steve Muckle
2016-01-20 15:58                       ` Juri Lelli
2016-01-20 16:39                         ` Punit Agrawal
2016-01-20 16:39                           ` Punit Agrawal
2016-01-20 16:46                         ` Punit Agrawal
2016-01-20 16:46                           ` Punit Agrawal
2016-01-20 20:46                         ` Steve Muckle
2016-01-21  9:45                           ` Juri Lelli [this message]
2016-01-21 19:29                             ` Steve Muckle
2016-01-21  1:22                   ` Rafael J. Wysocki
2016-01-21  1:39                     ` Steve Muckle
2016-01-21  1:46                       ` Rafael J. Wysocki
2016-01-21 10:49                         ` Juri Lelli
2016-01-22  1:21                           ` Rafael J. Wysocki
2016-01-27  1:54                             ` Steve Muckle
2016-01-27  8:03                               ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160121094513.GW8573@e106622-lin \
    --to=juri.lelli@arm.com \
    --cc=Javi.Merino@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=mturquette@baylibre.com \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    --cc=punit.agrawal@arm.com \
    --cc=rjw@rjwysocki.net \
    --cc=steve.muckle@linaro.org \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.