All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Bellasi <patrick.bellasi@matbug.net>
To: Qais Yousef <qais.yousef@arm.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	qperret@google.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] sched/rt: Add a new sysctl to control uclamp_util_min
Date: Wed, 22 Jan 2020 11:19:44 +0100	[thread overview]
Message-ID: <20200122101944.GA12533@darkstar> (raw)
In-Reply-To: <20200114213403.cur6gydan6kmplqb@e107158-lin.cambridge.arm.com>

On 14-Jan 21:34, Qais Yousef wrote:
> On 01/09/20 10:21, Patrick Bellasi wrote:
> > That's not entirely true. In that patch we introduce cgroup support
> > but, if you look at the code before that patch, for CFS tasks there is
> > only:
> >  - CFS task-specific values (min,max)=(0,1024) by default
> >  - CFS system-wide tunables (min,max)=(1024,1024) by default
> > and a change on the system-wide tunable allows for example to enforce
> > a uclamp_max=200 on all tasks.
> > 
> > A similar solution can be implemented for RT tasks, where we have:
> >  - RT task-specific values (min,max)=(1024,1024) by default
> >  - RT system-wide tunables (min,max)=(1024,1024) by default
> >  and a change on the system-wide tunable allows for example to enforce
> >  a uclamp_min=200 on all tasks.
> 
> I feel I'm already getting lost in the complexity of the interaction here. Do
> we really need to go that path?
> 
> So we will end up with a default system wide for all tasks + a CFS system wide
> default + an RT system wide default?
> 
> As I understand it, we have a single system wide default now.

Right now we have one system wide default and that's both for all
CFS/RT tasks, when cgroups are not in use, or for root group and
autogroup CFS/RT tasks, when cgroups are in use.

What I'm proposing is to limit the usage of the current system wide
default to CFS tasks only, while we add a similar new one specifically
for RT tasks.

At the end we will have two system wide defaults, not three.

> > > (Would we need CONFIG_RT_GROUP_SCHED for this? IIRC there's a few pain points
> > > when turning it on, but I think we don't have to if we just want things like
> > > uclamp value propagation?)
> > 
> > No, the current design for CFS tasks works also on !CONFIG_CFS_GROUP_SCHED.
> > That's because in this case:
> >   - uclamp_tg_restrict() returns just the task requested value
> >   - uclamp_eff_get() _always_ restricts the requested value considering
> >     the system defaults
> >  
> > > It's quite more work than the simple thing Qais is introducing (and on both
> > > user and kernel side).
> > 
> > But if in the future we will want to extend CGroups support to RT then
> > we will feel the pains because we do the effective computation in two
> > different places.
> 
> Hmm what you're suggesting here is that we want to have
> cpu.rt.uclamp.{min,max}? I'm not sure I can agree this is a good idea.

That's exactly what we already do for other controllers. For example,
if you look at the bandwidth controller, we have separate knobs for
CFS and RT tasks.

> It makes more sense to create a special group for all rt tasks rather than
> treat rt tasks in a cgroup differently.

Don't see why that should make more sense. This can work of course but
it would enforce a more strict configuration and usage of cgroups to
userspace.

I also have some doubths about this approach matching the delegation
model principles.

> > Do note that a proper CGroup support requires that the system default
> > values defines the values for the root group and are consistently
> > propagated down the hierarchy. Thus we need to add a dedicated pair of
> > cgroup attributes, e.g. cpu.util.rt.{min.max}.
> > 
> > To recap, we don't need CGROUP support right now but just to add a new
> > default tracking similar to what we do for CFS.
> >
> > We already proposed such a support in one of the initial versions of
> > the uclamp series:
> >    Message-ID: <20190115101513.2822-10-patrick.bellasi@arm.com>
> >    https://lore.kernel.org/lkml/20190115101513.2822-10-patrick.bellasi@arm.com/
> 
> IIUC what you're suggesting is:
> 
> 	1. Use the sysctl to specify the default_rt_uclamp_min
> 	2. Enforce this value in uclamp_eff_get() rather than my sync logic
> 	3. Remove the current hack to always set
> 	   rt_task->uclamp_min = uclamp_none(UCLAMP_MAX)

Right, that's the essence...

> If I got it correctly I'd be happy to experiment with it if this is what
> you're suggesting. Otherwise I'm afraid I'm failing to see the crust of the
> problem you're trying to highlight.

... from what your write above I think you got it right.

In my previous posting:

   Message-ID: <20200109092137.GA2811@darkstar>
   https://lore.kernel.org/lkml/20200109092137.GA2811@darkstar/

there is also the code snippet which should be good enough to extend
uclamp_eff_get(). Apart from that, what remains is:
- to add the two new sysfs knobs for sysctl_sched_uclamp_util_{min,max}_rt
- make a call about how rt tasks in cgroups are clamped, a simple
  proposal is in the second snippet of my message above.

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

  reply	other threads:[~2020-01-22 10:20 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-20 16:48 [PATCH] sched/rt: Add a new sysctl to control uclamp_util_min Qais Yousef
2020-01-07 13:42 ` Quentin Perret
2020-01-07 19:30   ` Dietmar Eggemann
2020-01-08  9:51     ` Quentin Perret
2020-01-08 19:16       ` Patrick Bellasi
2020-01-09 11:36       ` Qais Yousef
2020-01-09 11:16     ` Qais Yousef
2020-01-09 11:12   ` Qais Yousef
2020-01-08 13:44 ` Peter Zijlstra
2020-01-08 19:08   ` Patrick Bellasi
2020-01-09 13:00   ` Qais Yousef
2020-01-10 13:39     ` Peter Zijlstra
2020-01-12 23:35       ` Qais Yousef
2020-01-10 13:42     ` Peter Zijlstra
2020-01-12 23:31       ` Qais Yousef
2020-01-08 18:56 ` Patrick Bellasi
2020-01-09  1:35   ` Valentin Schneider
2020-01-09  9:21     ` Patrick Bellasi
2020-01-09 13:38       ` Qais Yousef
2020-01-14 21:34       ` Qais Yousef
2020-01-22 10:19         ` Patrick Bellasi [this message]
2020-01-22 11:45           ` Qais Yousef
2020-01-22 12:44             ` Patrick Bellasi
2020-01-22 14:57               ` Qais Yousef
2020-01-09 13:15     ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200122101944.GA12533@darkstar \
    --to=patrick.bellasi@matbug.net \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=rostedt@goodmis.org \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.