All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benjamin Segall <bsegall@google.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	 Dietmar Eggemann <dietmar.eggemann@arm.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	 Mel Gorman <mgorman@suse.de>,
	 Valentin Schneider <vschneid@redhat.com>,
	Aaron Lu <ziqianlu@bytedance.com>,  Josh Don <joshdon@google.com>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/5] sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up()
Date: Thu, 28 May 2026 15:14:59 -0700	[thread overview]
Message-ID: <xm26cxyfmba4.fsf@google.com> (raw)
In-Reply-To: <20260528094830.13291-5-kprateek.nayak@amd.com> (K. Prateek Nayak's message of "Thu, 28 May 2026 09:48:29 +0000")

K Prateek Nayak <kprateek.nayak@amd.com> writes:

> An update_curr() during the enqueue of throttled task will start
> throttling the hierarchy from subsequent commit. This can lead to
> tg_throttle_down() seeing non-empty throttled_limbo_list for the cfs_rq
> attaching the task from throttled_limbo_list one by one. For example:
>
>      R
>      |
>      A
>     / \
>   *B   C
>        |
>        rq->curr
>
> *B is throttled with tasks on hte limbo list. When the tasks are
> unthrottled via tg_unthrottle_up() and entity of group B is placed onto
> A, update_curr() is called to catch up the vruntime and it may throttle
> group A causing the subsequent tg_throttle_down() to see the pending
> task's on B's limbo list.
>
>   tg_unthrottle_up()
>     /* --cfs_rq->throttle_count == 0 */
>     list_for_each_entry_safe(p, cfs_rq->throttled_limbo_list)
>       enqueue_task_fair()
>         enqueue_entity(se /* B->se */)
>           update_curr(cfs_rq /* A->gcfs_rq */)
>             account_cfs_rq_runtime(cfs_rq)
>               throttle_cfs_rq(cfs_rq /* A->gcfs_rq */ )
>                 tg_throttle_down()
>                   /* Reaches B->cfs_rq with throttle_count == 0 */
>
>                   !!! !list_empty(&cfs_rq->throttled_limbo_list)) !!!
>
> Move the tasks from throttled_limbo_list onto a local list before
> starting the unthrottle to prevent the splat described above. If the
> hierarchy is throttled again in middle of an unthrottle, put the pending
> tasks back onto the limbo list to prevent running them unnecessarily.

And for extra fun, in order to get here we need to have just finished
throttle_cfs_rq_work and its resched_curr, but not have managed to reach
schedule yet when the unthrottle hits. But yeah, that can happen.

Reviewed-By: Benjamin Segall <bsegall@google.com>


>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>  kernel/sched/fair.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b3b3172702a9..c48eaf2d7919 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6710,6 +6710,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
>  	struct rq *rq = data;
>  	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
>  	struct task_struct *p, *tmp;
> +	LIST_HEAD(throttled_tasks);
>  
>  	/*
>  	 * If cfs_rq->curr is set, the cfs_rq might not have caught up
> @@ -6740,13 +6741,31 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
>  		cfs_rq->throttled_clock_self_time += delta;
>  	}
>  
> +	/*
> +	 * Move the tasks to a local list since an update_curr() during
> +	 * enqueue_task_fair() can throttle a higher cfs_rq, and it can
> +	 * see the "throttled_limbo_list" being non-empty in
> +	 * tg_throttle_down() if throttle_count turned 0 above.
> +	 */
> +	list_splice_init(&cfs_rq->throttled_limbo_list, &throttled_tasks);
> +
>  	/* Re-enqueue the tasks that have been throttled at this level. */
> -	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
> +	list_for_each_entry_safe(p, tmp, &throttled_tasks, throttle_node) {
> +		/*
> +		 * Back to being throttled! Break out and put the remaining
> +		 * tasks back onto the limbo_list to prevent running them
> +		 * unnecessarily.
> +		 */
> +		if (cfs_rq->throttle_count)
> +			break;
> +
>  		list_del_init(&p->throttle_node);
>  		p->throttled = false;
> -		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
> +		enqueue_task_fair(rq, p, ENQUEUE_WAKEUP);
>  	}
>  
> +	list_splice(&throttled_tasks, &cfs_rq->throttled_limbo_list);
> +
>  	/* Add cfs_rq with load or one or more already running entities to the list */
>  	if (!cfs_rq_is_decayed(cfs_rq))
>  		list_add_leaf_cfs_rq(cfs_rq);

  reply	other threads:[~2026-05-28 22:15 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-28  9:48 [PATCH 0/5] sched/fair: Allow account_cfs_rq_runtime() to throttle current hierarchy K Prateek Nayak
2026-05-28  9:48 ` [PATCH 1/5] sched/fair: Convert cfs bandwidth throttling to use guards K Prateek Nayak
2026-05-28 21:46   ` Benjamin Segall
2026-05-28  9:48 ` [PATCH 2/5] sched/fair: Use throttled_csd_list for local unthrottle K Prateek Nayak
2026-05-28 21:53   ` Benjamin Segall
2026-05-28  9:48 ` [PATCH 3/5] sched/fair: Call update_curr() before unthrottling the hierarchy K Prateek Nayak
2026-05-28 22:03   ` Benjamin Segall
2026-06-01  3:52   ` Aaron Lu
2026-06-01  5:50     ` K Prateek Nayak
2026-06-01 11:27       ` Peter Zijlstra
2026-06-02  6:33         ` K Prateek Nayak
2026-05-28  9:48 ` [PATCH 4/5] sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up() K Prateek Nayak
2026-05-28 22:14   ` Benjamin Segall [this message]
2026-05-28  9:48 ` [PATCH 5/5] sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime() K Prateek Nayak
2026-05-28 22:44   ` Benjamin Segall
2026-06-01 13:48   ` Peter Zijlstra
2026-06-02  7:01     ` K Prateek Nayak
2026-06-02  8:32       ` Peter Zijlstra
2026-06-02  8:57         ` K Prateek Nayak
2026-05-28 11:45 ` [PATCH 0/5] sched/fair: Allow account_cfs_rq_runtime() to throttle current hierarchy Peter Zijlstra
2026-06-01  6:18 ` Aaron Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xm26cxyfmba4.fsf@google.com \
    --to=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=ziqianlu@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.