From: Benjamin Segall <bsegall@google.com>
To: Chuyi Zhou <zhouchuyi@bytedance.com>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, linux-kernel@vger.kernel.org,
htejun@gmail.com, lizefan.x@bytedance.com, vschneid@redhat.com,
Abel Wu <wuyun.abel@bytedance.com>
Subject: Re: [RESEND] sched/fair: Add min_ratio for cfs bandwidth_control
Date: Wed, 19 Oct 2022 14:01:12 -0700 [thread overview]
Message-ID: <xm26mt9rle93.fsf@google.com> (raw)
In-Reply-To: <20221019031551.24312-1-zhouchuyi@bytedance.com> (Chuyi Zhou's message of "Wed, 19 Oct 2022 11:15:51 +0800")
Chuyi Zhou <zhouchuyi@bytedance.com> writes:
> Tasks may be throttled when holding locks for a long time by current
> cfs bandwidth control mechanism once users set a too small quota/period
> ratio, which can result whole system get stuck[1].
>
> In order to prevent the above situation from happening, this patch adds
> sysctl_sched_cfs_bandwidth_min_ratio in /proc/sys/kernel, which indicates
> the minimum percentage of quota/period users can set. The default value is
> zero and users can set quota and period without triggering this
> constraint.
There's so many other sorts of bad inputs that can get you stuck here
that I'm not sure it's ever safe against lockups to provide direct write
access to an untrusted user. I'm not totally opposed but it seems like
an incomplete fix to a broken (non-default) configuration.
>
> Link[1]:https://lore.kernel.org/lkml/5987be34-b527-4ff5-a17d-5f6f0dc94d6d@huawei.com/T/
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> Suggested-by: Abel Wu <wuyun.abel@bytedance.com>
> ---
> include/linux/sched/sysctl.h | 4 ++++
> kernel/sched/core.c | 23 +++++++++++++++++++++++
> kernel/sysctl.c | 10 ++++++++++
> 3 files changed, 37 insertions(+)
>
> diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> index 303ee7dd0c7e..dedb18648f0e 100644
> --- a/include/linux/sched/sysctl.h
> +++ b/include/linux/sched/sysctl.h
> @@ -21,6 +21,10 @@ enum sched_tunable_scaling {
> SCHED_TUNABLESCALING_END,
> };
>
> +#ifdef CONFIG_CFS_BANDWIDTH
> +extern unsigned int sysctl_sched_cfs_bandwidth_min_ratio;
> +#endif
> +
> #define NUMA_BALANCING_DISABLED 0x0
> #define NUMA_BALANCING_NORMAL 0x1
> #define NUMA_BALANCING_MEMORY_TIERING 0x2
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5800b0623ff3..8f6cfd889e37 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -10504,6 +10504,12 @@ static u64 cpu_shares_read_u64(struct cgroup_subsys_state *css,
> }
>
> #ifdef CONFIG_CFS_BANDWIDTH
> +/*
> + * The minimum of quota/period ratio users can set, default is zero and users can set
> + * quota and period without triggering this constraint.
> + */
> +unsigned int sysctl_sched_cfs_bandwidth_min_ratio;
> +
> static DEFINE_MUTEX(cfs_constraints_mutex);
>
> const u64 max_cfs_quota_period = 1 * NSEC_PER_SEC; /* 1s */
> @@ -10513,6 +10519,20 @@ static const u64 max_cfs_runtime = MAX_BW * NSEC_PER_USEC;
>
> static int __cfs_schedulable(struct task_group *tg, u64 period, u64 runtime);
>
> +static int check_cfs_bandwidth_min_ratio(u64 period, u64 quota)
> +{
> + u64 ratio;
> +
> + if (!sysctl_sched_cfs_bandwidth_min_ratio)
> + return 0;
> +
> + ratio = div64_u64(quota * 100, period);
> + if (ratio < sysctl_sched_cfs_bandwidth_min_ratio)
> + return -1;
> +
> + return 0;
> +}
> +
> static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
> u64 burst)
> {
> @@ -10548,6 +10568,9 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
> burst + quota > max_cfs_runtime))
> return -EINVAL;
>
> + if (quota != RUNTIME_INF && check_cfs_bandwidth_min_ratio(period, quota))
> + return -EINVAL;
> +
> /*
> * Prevent race between setting of cfs_rq->runtime_enabled and
> * unthrottle_offline_cfs_rqs().
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 188c305aeb8b..7d9743e8e514 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1652,6 +1652,16 @@ static struct ctl_table kern_table[] = {
> .extra1 = SYSCTL_ZERO,
> },
> #endif /* CONFIG_NUMA_BALANCING */
> +#ifdef CONFIG_CFS_BANDWIDTH
> + {
> + .procname = "sched_cfs_bandwidth_min_ratio",
> + .data = &sysctl_sched_cfs_bandwidth_min_ratio,
> + .maxlen = sizeof(unsigned int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = SYSCTL_ZERO,
> + },
> +#endif /* CONFIG_CFS_BANDWIDTH */
> {
> .procname = "panic",
> .data = &panic_timeout,
next prev parent reply other threads:[~2022-10-19 21:01 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-19 3:15 [RESEND] sched/fair: Add min_ratio for cfs bandwidth_control Chuyi Zhou
2022-10-19 21:01 ` Benjamin Segall [this message]
2022-10-20 9:14 ` [External] " Chuyi Zhou
2022-10-19 21:21 ` Tejun Heo
2022-10-20 6:35 ` Chuyi Zhou
2022-10-20 17:08 ` Peter Zijlstra
2022-10-21 18:04 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xm26mt9rle93.fsf@google.com \
--to=bsegall@google.com \
--cc=htejun@gmail.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan.x@bytedance.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=wuyun.abel@bytedance.com \
--cc=zhouchuyi@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox