From: Valentin Schneider <valentin.schneider@arm.com>
To: Qais Yousef <qais.yousef@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Patrick Bellasi <patrick.bellasi@matbug.net>,
Chris Redpath <chris.redpath@arm.com>,
Lukasz Luba <lukasz.luba@arm.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key
Date: Thu, 25 Jun 2020 01:16:50 +0100 [thread overview]
Message-ID: <jhj5zbgroct.mognet@arm.com> (raw)
In-Reply-To: <20200624172605.26715-3-qais.yousef@arm.com>
Hi Qais,
On 24/06/20 18:26, Qais Yousef wrote:
[...]
> ---
>
> This takes a different approach to PSI which introduces a config option
>
> ```
> CONFIG_PSI_DEFAULT_DISABLED
>
> Require boot parameter to enable pressure stall information
> tracking (NEW)
>
> boot param psi
> ```
>
> via commit e0c274472d5d "psi: make disabling/enabling easier for vendor kernels"
>
> uclamp has a clearer points of entry when userspace would like to use it so we
> can automatically flip the switch if the kernel is running on a userspace that
> wants to user utilclamp without any extra userspace visible switches.
>
> I wanted to make this dependent on schedutil being the governor too, but beside
> the complexity, uclamp is used for capacity awareness. We could certainly
> construct a more complex condition, but I'm not sure it's worth it. Open to
> hear more opinions and points of views on this :)
>
I think the toggling conditions are good as they are. However, speaking of
schedutil, doesn't this patch break the RT frequency boost? Mind you it
might be too late for me to be thinking about this stuff.
In schedutil_cpu_util(), when uclamp isn't compiled it, we have an explicit
'goto max'. When uclamp *is* compiled in, that's taken care of by the
"natural" RT uclamp aggregation... Which doesn't happen until we flip the
static key.
It's yucky, but if you declare the key in the internal sched header, you
could reuse it in the existing 'goto max' (or sysctl value, when we make
that tweakable) path.
>
> kernel/sched/core.c | 54 +++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 235b2cae00a0..44e03d4fd19d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -794,6 +794,25 @@ unsigned int sysctl_sched_uclamp_util_max = SCHED_CAPACITY_SCALE;
> /* All clamps are required to be less or equal than these values */
> static struct uclamp_se uclamp_default[UCLAMP_CNT];
>
> +/*
> + * This static key is used to reduce the uclamp overhead in the fast path. It
> + * only disables the call to uclamp_rq_{inc, dec}() in enqueue/dequeue_task().
> + *
> + * This allows users to continue to enable uclamp in their kernel config with
> + * minimum uclamp overhead in the fast path.
> + *
> + * As soon as userspace modifies any of the uclamp knobs, the static key is
> + * enabled, since we have an actual users that make use of uclamp
> + * functionality.
> + *
> + * The knobs that would enable this static key are:
> + *
> + * * A task modifying its uclamp value with sched_setattr().
That one makes it not just userspace, right? While the sched_setattr()
stuff is expected to be unexported, it isn't ATM and we may expect some
modules to ask for a uclamp API eventually.
> + * * An admin modifying the sysctl_sched_uclamp_{min, max} via procfs.
> + * * An admin modifying the cgroup cpu.uclamp.{min, max}
> + */
> +static DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
> +
> /* Integer rounded range for each bucket */
> #define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
>
> @@ -994,9 +1013,16 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
> lockdep_assert_held(&rq->lock);
>
> bucket = &uc_rq->bucket[uc_se->bucket_id];
> - SCHED_WARN_ON(!bucket->tasks);
> - if (likely(bucket->tasks))
> - bucket->tasks--;
> +
> + /*
> + * This could happen if sched_uclamp_used was enabled while the
> + * current task was running, hence we could end up with unbalanced call
> + * to uclamp_rq_dec_id().
> + */
> + if (unlikely(!bucket->tasks))
> + return;
> +
> + bucket->tasks--;
> uc_se->active = false;
>
> /*
> @@ -1032,6 +1058,13 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> {
> enum uclamp_id clamp_id;
>
> + /*
> + * Avoid any overhead until uclamp is actually used by the userspace.
> + * Including the branch if we use static_branch_likely()
I think that second point is made clear by the first one, but YMMV.
> + */
> + if (!static_branch_unlikely(&sched_uclamp_used))
> + return;
> +
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;
>
> @@ -1047,6 +1080,13 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
> {
> enum uclamp_id clamp_id;
>
> + /*
> + * Avoid any overhead until uclamp is actually used by the userspace.
> + * Including the branch if we use static_branch_likely()
> + */
> + if (!static_branch_unlikely(&sched_uclamp_used))
> + return;
> +
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;
>
> @@ -1155,8 +1195,10 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> update_root_tg = true;
> }
>
> - if (update_root_tg)
> + if (update_root_tg) {
> uclamp_update_root_tg();
> + static_branch_enable(&sched_uclamp_used);
I don't think it matters ATM, but shouldn't we flip that *before* updating
the TG's to avoid any future surprises?
> + }
>
> /*
> * We update all RUNNABLE tasks only when task groups are in use.
> @@ -1221,6 +1263,8 @@ static void __setscheduler_uclamp(struct task_struct *p,
> if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)))
> return;
>
> + static_branch_enable(&sched_uclamp_used);
> +
> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
> uclamp_se_set(&p->uclamp_req[UCLAMP_MIN],
> attr->sched_util_min, true);
> @@ -7387,6 +7431,8 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
> if (req.ret)
> return req.ret;
>
> + static_branch_enable(&sched_uclamp_used);
> +
> mutex_lock(&uclamp_mutex);
> rcu_read_lock();
next prev parent reply other threads:[~2020-06-25 0:16 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-24 17:26 [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path Qais Yousef
2020-06-24 17:26 ` [PATCH v3 1/2] sched/uclamp: Fix initialization of strut uclamp_rq Qais Yousef
2020-06-25 0:09 ` Valentin Schneider
2020-06-24 17:26 ` [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
2020-06-25 0:16 ` Valentin Schneider [this message]
2020-06-25 11:00 ` Qais Yousef
2020-06-25 11:26 ` Valentin Schneider
2020-06-25 11:34 ` Qais Yousef
2020-06-25 15:21 ` [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path Lukasz Luba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=jhj5zbgroct.mognet@arm.com \
--to=valentin.schneider@arm.com \
--cc=bsegall@google.com \
--cc=chris.redpath@arm.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lukasz.luba@arm.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=patrick.bellasi@matbug.net \
--cc=peterz@infradead.org \
--cc=qais.yousef@arm.com \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox