From: bsegall@google.com
To: Huaixin Chang <changhuaixin@linux.alibaba.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	mingo@redhat.com, bsegall@google.com, chiluk+linux@indeed.com,
	vincent.guittot@linaro.org, pauld@redhead.com
Subject: Re: [PATCH 1/2] sched: Defend cfs and rt bandwidth quota against overflow
Date: Mon, 20 Apr 2020 10:50:01 -0700
Message-ID: <xm261roim4hi.fsf@google.com>
In-Reply-To: <20200420024421.22442-2-changhuaixin@linux.alibaba.com> (Huaixin Chang's message of "Mon, 20 Apr 2020 10:44:20 +0800")

Huaixin Chang <changhuaixin@linux.alibaba.com> writes:

> Kernel limitation on cpu.cfs_quota_us is insufficient. Some large
> numbers might cause overflow in to_ratio() calculation and produce
> unexpected results.
>
> For example, if we make two cpu cgroups and then write a reasonable
> value and a large value into child's and parent's cpu.cfs_quota_us. This
> will cause a write error.
>
> 	cd /sys/fs/cgroup/cpu
> 	mkdir parent; mkdir parent/child
> 	echo 8000 > parent/child/cpu.cfs_quota_us
> 	# 17592186044416 is (1UL << 44)
> 	echo 17592186044416 > parent/cpu.cfs_quota_us
>
> In this case, quota will overflow and thus fail the __cfs_schedulable
> check. Similar overflow also affects rt bandwidth.

More to the point, I think that doing

	echo 17592186044416 > parent/cpu.cfs_quota_us
	echo 8000 > parent/child/cpu.cfs_quota_us

currently fails only on the second write, while with this patch it will
fail on the first, which should be easier to understand.
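
To illustrate the ordering point, here is a minimal userspace sketch. It
assumes the default 100ms period, assumes to_ratio() sees usec for cfs, and
models the hierarchy check as "the child's ratio must not exceed the
parent's". The names to_ratio_model(), child_fits() and quota_in_range()
are hypothetical stand-ins, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT	20
#define MAX_BW_BITS	(64 - BW_SHIFT)
#define MAX_BW		((UINT64_C(1) << MAX_BW_BITS) - 1)

/* Model of the ratio computation: the shift can wrap in 64 bits. */
static uint64_t to_ratio_model(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

/* Hypothetical stand-in for the parent/child schedulable check. */
static bool child_fits(uint64_t period_us, uint64_t parent_quota_us,
		       uint64_t child_quota_us)
{
	return to_ratio_model(period_us, child_quota_us) <=
	       to_ratio_model(period_us, parent_quota_us);
}

/* The kind of bound the patch adds, expressed here in usec (assumption). */
static bool quota_in_range(uint64_t quota_us)
{
	return quota_us <= MAX_BW;
}
```

With a parent quota of 1 << 44 the shifted value wraps to 0, so
child_fits(100000, 1ULL << 44, 8000) is false and the failure only shows up
once the second value is written. quota_in_range(1ULL << 44) is already
false, which is the earlier rejection the patch provides.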


to_ratio() could be altered to avoid the unnecessary internal overflow, but
min_cfs_quota_period (1ms) is less than 1<<BW_SHIFT, so the resulting ratio
can be larger than the runtime itself and a cutoff would still be needed.
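
A rough sketch of why widening alone is not enough, assuming a compiler
that provides unsigned __int128 (GCC and Clang do). to_ratio_wide() is a
hypothetical helper, not a proposed kernel change: it does the shift in 128
bits and reports whether the quotient still fits in 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT	20

/*
 * Compute (runtime << BW_SHIFT) / period without losing bits in the
 * intermediate. Returns false if the quotient does not fit in 64 bits.
 */
static bool to_ratio_wide(uint64_t period, uint64_t runtime, uint64_t *out)
{
	unsigned __int128 r;

	if (period == 0) {
		*out = 0;
		return true;
	}

	r = ((unsigned __int128)runtime << BW_SHIFT) / period;
	if (r > UINT64_MAX)
		return false;

	*out = (uint64_t)r;
	return true;
}
```

With period = 1000000 (below 1 << 20) and a runtime close to the top of the
64-bit range, the quotient exceeds 64 bits and the helper returns false, so
some upper bound on runtime is needed either way.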

Also, tg_rt_schedulable() sums a bunch of to_ratio() results and doesn't
check that sum for overflow, so if we consider preventing weirdness around
schedulable checks and max quotas relevant, we should probably fix that too.

>
> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
> ---
>  kernel/sched/core.c  | 8 ++++++++
>  kernel/sched/rt.c    | 9 +++++++++
>  kernel/sched/sched.h | 2 ++
>  3 files changed, 19 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3a61a3b8eaa9..f0a74e35c3f0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7390,6 +7390,8 @@ static DEFINE_MUTEX(cfs_constraints_mutex);
>  
>  const u64 max_cfs_quota_period = 1 * NSEC_PER_SEC; /* 1s */
>  static const u64 min_cfs_quota_period = 1 * NSEC_PER_MSEC; /* 1ms */
> +/* More than 203 days if BW_SHIFT equals 20. */
> +static const u64 max_cfs_runtime = MAX_BW_USEC * NSEC_PER_USEC;
>  
>  static int __cfs_schedulable(struct task_group *tg, u64 period, u64 runtime);
>  
> @@ -7417,6 +7419,12 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>  	if (period > max_cfs_quota_period)
>  		return -EINVAL;
>  
> +	/*
> +	 * Bound quota to defend quota against overflow during bandwidth shift.
> +	 */
> +	if (quota != RUNTIME_INF && quota > max_cfs_runtime)
> +		return -EINVAL;
> +
>  	/*
>  	 * Prevent race between setting of cfs_rq->runtime_enabled and
>  	 * unthrottle_offline_cfs_rqs().
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index df11d88c9895..f5eea19d68c4 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2569,6 +2569,9 @@ static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
>  	return ret;
>  }
>  
> +/* More than 203 days if BW_SHIFT equals 20. */
> +static const u64 max_rt_runtime = MAX_BW_USEC * NSEC_PER_USEC;

It looks to me like __rt_schedulable() doesn't divide by NSEC_PER_USEC, so
to_ratio() is operating on nsec there and the overflow limit is in nsec
too. MAX_BW_USEC should then probably not be named USEC either.
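
A small sketch of the unit mismatch, assuming the rt path really does feed
nsec into the shift. MAX_BW is a hypothetical unit-neutral name for the
posted MAX_BW_USEC, and shift_is_safe() is an illustrative helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT	20
#define MAX_BW_BITS	(64 - BW_SHIFT)
#define MAX_BW		((UINT64_C(1) << MAX_BW_BITS) - 1)
#define NSEC_PER_USEC	UINT64_C(1000)

/* True if v << BW_SHIFT loses no bits in a 64-bit value. */
static bool shift_is_safe(uint64_t v)
{
	return ((v << BW_SHIFT) >> BW_SHIFT) == v;
}
```

Here shift_is_safe(MAX_BW) holds, but shift_is_safe(MAX_BW * NSEC_PER_USEC)
does not, so a bound scaled by NSEC_PER_USEC only protects a caller that
divides by NSEC_PER_USEC again before shifting.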

> +
>  static int tg_set_rt_bandwidth(struct task_group *tg,
>  		u64 rt_period, u64 rt_runtime)
>  {
> @@ -2585,6 +2588,12 @@ static int tg_set_rt_bandwidth(struct task_group *tg,
>  	if (rt_period == 0)
>  		return -EINVAL;
>  
> +	/*
> +	 * Bound quota to defend quota against overflow during bandwidth shift.
> +	 */
> +	if (rt_runtime != RUNTIME_INF && rt_runtime > max_rt_runtime)
> +		return -EINVAL;
> +
>  	mutex_lock(&rt_constraints_mutex);
>  	err = __rt_schedulable(tg, rt_period, rt_runtime);
>  	if (err)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index db3a57675ccf..6f6b7f545557 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1918,6 +1918,8 @@ extern void init_dl_inactive_task_timer(struct sched_dl_entity *dl_se);
>  #define BW_SHIFT		20
>  #define BW_UNIT			(1 << BW_SHIFT)
>  #define RATIO_SHIFT		8
> +#define MAX_BW_BITS		(64 - BW_SHIFT)
> +#define MAX_BW_USEC		((1UL << MAX_BW_BITS) - 1)
>  unsigned long to_ratio(u64 period, u64 runtime);
>  
>  extern void init_entity_runnable_average(struct sched_entity *se);

