From: Phil Auld <pauld@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-tip-commits@vger.kernel.org,
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, x86 <x86@kernel.org>
Subject: Re: [tip: sched/core] sched/rt: Optimize checking group RT scheduler constraints
Date: Thu, 30 Jan 2020 14:10:13 -0500 [thread overview]
Message-ID: <20200130191013.GA24632@pauld.bos.csb> (raw)
In-Reply-To: <158029757654.396.14560704856092377008.tip-bot2@tip-bot2>
On Wed, Jan 29, 2020 at 11:32:56AM -0000 tip-bot2 for Konstantin Khlebnikov wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID: b4fb015eeff7f3e5518a7dbe8061169a3e2f2bc7
> Gitweb: https://git.kernel.org/tip/b4fb015eeff7f3e5518a7dbe8061169a3e2f2bc7
> Author: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> AuthorDate: Sat, 25 Jan 2020 17:50:38 +03:00
> Committer: Ingo Molnar <mingo@kernel.org>
> CommitterDate: Tue, 28 Jan 2020 21:37:09 +01:00
>
> sched/rt: Optimize checking group RT scheduler constraints
>
> Group RT scheduler contains protection against setting zero runtime for
> cgroup with RT tasks. Right now function tg_set_rt_bandwidth() iterates
> over all CPU cgroups and calls tg_has_rt_tasks() for any cgroup which
> runtime is zero (not only for changed one). Default RT runtime is zero,
> thus tg_has_rt_tasks() will is called for almost at CPU cgroups.
>
> This protection already is slightly racy: runtime limit could be changed
> between cpu_cgroup_can_attach() and cpu_cgroup_attach() because changing
> cgroup attribute does not lock cgroup_mutex while attach does not lock
> rt_constraints_mutex. Changing task scheduler class also races with
> changing rt runtime: check in __sched_setscheduler() isn't protected.
>
> Function tg_has_rt_tasks() iterates over all threads in the system.
> This gives NR_CGROUPS * NR_TASKS operations under single tasklist_lock
> locked for read tg_set_rt_bandwidth(). Any concurrent attempt of locking
> tasklist_lock for write (for example fork) will stuck with disabled irqs.
>
> This patch makes two optimizations:
> 1) Remove locking tasklist_lock and iterate only tasks in cgroup
> 2) Call tg_has_rt_tasks() iff rt runtime changes from non-zero to zero
>
> All changed code is under CONFIG_RT_GROUP_SCHED.
>
> Testcase:
>
> # mkdir /sys/fs/cgroup/cpu/test{1..10000}
> # echo 0 | tee /sys/fs/cgroup/cpu/test*/cpu.rt_runtime_us
>
> At the same time without patch fork time will be >100ms:
>
> # perf trace -e clone --duration 100 stress-ng --fork 1
>
> Also remote ping will show timings >100ms caused by irq latency.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Link: https://lkml.kernel.org/r/157996383820.4651.11292439232549211693.stgit@buzz
> ---
> kernel/sched/rt.c | 24 +++++++++++-------------
> 1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4043abe..55a4a50 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2449,10 +2449,11 @@ const struct sched_class rt_sched_class = {
> */
> static DEFINE_MUTEX(rt_constraints_mutex);
>
> -/* Must be called with tasklist_lock held */
> static inline int tg_has_rt_tasks(struct task_group *tg)
> {
> - struct task_struct *g, *p;
> + struct task_struct *task;
> + struct css_task_iter it;
> + int ret = 0;
>
> /*
> * Autogroups do not have RT tasks; see autogroup_create().
> @@ -2460,12 +2461,12 @@ static inline int tg_has_rt_tasks(struct task_group *tg)
> if (task_group_is_autogroup(tg))
> return 0;
>
> - for_each_process_thread(g, p) {
> - if (rt_task(p) && task_group(p) == tg)
> - return 1;
> - }
> + css_task_iter_start(&tg->css, 0, &it);
> + while (!ret && (task = css_task_iter_next(&it)))
> + ret |= rt_task(task);
> + css_task_iter_end(&it);
>
Heh, I misread it the first time and didn't see the !ret condition. But I think
it should be just "ret = ".
But thanks for getting this fixed.
Cheers,
Phil
> - return 0;
> + return ret;
> }
>
> struct rt_schedulable_data {
> @@ -2496,9 +2497,10 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
> return -EINVAL;
>
> /*
> - * Ensure we don't starve existing RT tasks.
> + * Ensure we don't starve existing RT tasks if runtime turns zero.
> */
> - if (rt_bandwidth_enabled() && !runtime && tg_has_rt_tasks(tg))
> + if (rt_bandwidth_enabled() && !runtime &&
> + tg->rt_bandwidth.rt_runtime && tg_has_rt_tasks(tg))
> return -EBUSY;
>
> total = to_ratio(period, runtime);
> @@ -2564,7 +2566,6 @@ static int tg_set_rt_bandwidth(struct task_group *tg,
> return -EINVAL;
>
> mutex_lock(&rt_constraints_mutex);
> - read_lock(&tasklist_lock);
> err = __rt_schedulable(tg, rt_period, rt_runtime);
> if (err)
> goto unlock;
> @@ -2582,7 +2583,6 @@ static int tg_set_rt_bandwidth(struct task_group *tg,
> }
> raw_spin_unlock_irq(&tg->rt_bandwidth.rt_runtime_lock);
> unlock:
> - read_unlock(&tasklist_lock);
> mutex_unlock(&rt_constraints_mutex);
>
> return err;
> @@ -2641,9 +2641,7 @@ static int sched_rt_global_constraints(void)
> int ret = 0;
>
> mutex_lock(&rt_constraints_mutex);
> - read_lock(&tasklist_lock);
> ret = __rt_schedulable(NULL, 0, 0);
> - read_unlock(&tasklist_lock);
> mutex_unlock(&rt_constraints_mutex);
>
> return ret;
>
--
prev parent reply other threads:[~2020-01-30 19:10 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-25 14:50 [PATCH] sched/rt: optimize checking group rt scheduler constraints Konstantin Khlebnikov
2020-01-25 15:32 ` Cyrill Gorcunov
2020-01-25 16:15 ` Konstantin Khlebnikov
2020-01-25 16:40 ` Cyrill Gorcunov
2020-01-27 17:47 ` Phil Auld
2020-01-29 11:32 ` [tip: sched/core] sched/rt: Optimize checking group RT " tip-bot2 for Konstantin Khlebnikov
2020-01-30 19:10 ` Phil Auld [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200130191013.GA24632@pauld.bos.csb \
--to=pauld@redhat.com \
--cc=khlebnikov@yandex-team.ru \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.