Re: [PATCH v2 1/2] sched, cgroup: Restore meaning to hierarchical_quota

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Benjamin Segall <bsegall@google.com>
To: Phil Auld <pauld@redhat.com>
Cc: linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mel Gorman <mgorman@suse.de>,
	Frederic Weisbecker <frederic@kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH v2 1/2] sched, cgroup: Restore meaning to hierarchical_quota
Date: Wed, 12 Jul 2023 15:09:31 -0700	[thread overview]
Message-ID: <xm268rbkg4tg.fsf@google.com> (raw)
In-Reply-To: <20230712133357.381137-2-pauld@redhat.com> (Phil Auld's message of "Wed, 12 Jul 2023 09:33:56 -0400")

Phil Auld <pauld@redhat.com> writes:

> In cgroupv2 cfs_b->hierarchical_quota is set to -1 for all task
> groups due to the previous fix simply taking the min.  It should
> reflect a limit imposed at that level or by an ancestor. Even
> though cgroupv2 does not require child quota to be less than or
> equal to that of its ancestors the task group will still be
> constrained by such a quota so this should be shown here. Cgroupv1
> continues to set this correctly.
>
> In both cases, add initialization when a new task group is created
> based on the current parent's value (or RUNTIME_INF in the case of
> root_task_group). Otherwise, the field is wrong until a quota is
> changed after creation and __cfs_schedulable() is called.
>
> Fixes: c53593e5cb69 ("sched, cgroup: Don't reject lower cpu.max on ancestors")
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Reviewed-by: Ben Segall <bsegall@google.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> ---
>
> v2: Improve comment about how setting hierarchical_quota correctly
>
> helps the scheduler. Remove extra parens.
>  kernel/sched/core.c  | 13 +++++++++----
>  kernel/sched/fair.c  |  7 ++++---
>  kernel/sched/sched.h |  2 +-
>  3 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a68d1276bab0..f80697a79baf 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9904,7 +9904,7 @@ void __init sched_init(void)
>  		ptr += nr_cpu_ids * sizeof(void **);
>  
>  		root_task_group.shares = ROOT_TASK_GROUP_LOAD;
> -		init_cfs_bandwidth(&root_task_group.cfs_bandwidth);
> +		init_cfs_bandwidth(&root_task_group.cfs_bandwidth, NULL);
>  #endif /* CONFIG_FAIR_GROUP_SCHED */
>  #ifdef CONFIG_RT_GROUP_SCHED
>  		root_task_group.rt_se = (struct sched_rt_entity **)ptr;
> @@ -11038,11 +11038,16 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
>  
>  		/*
>  		 * Ensure max(child_quota) <= parent_quota.  On cgroup2,
> -		 * always take the min.  On cgroup1, only inherit when no
> -		 * limit is set:
> +		 * always take the non-RUNTIME_INF min.  On cgroup1, only
> +		 * inherit when no limit is set. In cgroup2 this is used
> +		 * by the scheduler to determine if a given CFS task has a
> +		 * bandwidth constraint at some higher level.
>  		 */

It's still used for determining this on cgroup1 (and the cgroup1 code
still works for that), right?

>  		if (cgroup_subsys_on_dfl(cpu_cgrp_subsys)) {
> -			quota = min(quota, parent_quota);
> +			if (quota == RUNTIME_INF)
> +				quota = parent_quota;
> +			else if (parent_quota != RUNTIME_INF)
> +				quota = min(quota, parent_quota);
>  		} else {
>  			if (quota == RUNTIME_INF)
>  				quota = parent_quota;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 373ff5f55884..d9b3d4617e16 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6005,13 +6005,14 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  	return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;
>  }
>  
> -void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
> +void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent)
>  {
>  	raw_spin_lock_init(&cfs_b->lock);
>  	cfs_b->runtime = 0;
>  	cfs_b->quota = RUNTIME_INF;
>  	cfs_b->period = ns_to_ktime(default_cfs_period());
>  	cfs_b->burst = 0;
> +	cfs_b->hierarchical_quota = parent ? parent->hierarchical_quota : RUNTIME_INF;
>  
>  	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
>  	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
> @@ -6168,7 +6169,7 @@ static inline int throttled_lb_pair(struct task_group *tg,
>  	return 0;
>  }
>  
> -void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
> +void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent) {}
>  
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
> @@ -12373,7 +12374,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
>  
>  	tg->shares = NICE_0_LOAD;
>  
> -	init_cfs_bandwidth(tg_cfs_bandwidth(tg));
> +	init_cfs_bandwidth(tg_cfs_bandwidth(tg), tg_cfs_bandwidth(parent));
>  
>  	for_each_possible_cpu(i) {
>  		cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index ec7b3e0a2b20..63822c9238cc 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -460,7 +460,7 @@ extern void unregister_fair_sched_group(struct task_group *tg);
>  extern void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
>  			struct sched_entity *se, int cpu,
>  			struct sched_entity *parent);
> -extern void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b);
> +extern void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent);
>  
>  extern void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b);
>  extern void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b);

next prev parent reply	other threads:[~2023-07-12 22:09 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-12 13:33 [PATCH v6 0/2] Fix nohz_full vs cfs bandwidth Phil Auld
2023-07-12 13:33 ` [PATCH v2 1/2] sched, cgroup: Restore meaning to hierarchical_quota Phil Auld
2023-07-12 22:09   ` Benjamin Segall [this message]
2023-07-13 13:23     ` Phil Auld
2023-07-13 20:12       ` Benjamin Segall
2023-07-13 23:27         ` Phil Auld
2023-07-14 12:57     ` [PATCH v3 " Phil Auld
2023-07-17 18:27       ` Tejun Heo
2023-07-18 12:57         ` Phil Auld
2023-07-18 13:25           ` Phil Auld
2023-08-09 19:34       ` [tip: sched/core] " tip-bot2 for Phil Auld
2023-07-12 13:33 ` [PATCH v6 2/2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use Phil Auld
2023-07-12 22:11   ` Benjamin Segall
2023-07-13 13:25     ` Phil Auld
2023-07-31 22:49   ` Peter Zijlstra
2023-08-01 11:13     ` Phil Auld
2023-08-01 15:37       ` Peter Zijlstra
2023-08-02 14:20         ` Phil Auld
2023-08-09 19:34   ` [tip: sched/core] sched/fair: " tip-bot2 for Phil Auld
2023-07-31 15:17 ` [PATCH v6 0/2] Fix nohz_full vs cfs bandwidth Phil Auld
2023-07-31 16:38 ` Phil Auld
2023-07-31 17:23   ` Phil Auld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xm268rbkg4tg.fsf@google.com \
    --to=bsegall@google.com \
    --cc=bristot@redhat.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=frederic@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.