From: Benjamin Segall <bsegall@google.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	 Aaron Lu <ziqianlu@bytedance.com>,
	 Valentin Schneider <vschneid@redhat.com>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	 Josh Don <joshdon@google.com>,  Ingo Molnar <mingo@redhat.com>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	 Xi Wang <xii@google.com>, <linux-kernel@vger.kernel.org>,
	 Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	 Mel Gorman <mgorman@suse.de>,
	 Chuyi Zhou <zhouchuyi@bytedance.com>,
	 Jan Kiszka <jan.kiszka@siemens.com>,
	 Florian Bezdeka <florian.bezdeka@siemens.com>,
	 Songtang Liu <liusongtang@bytedance.com>,
	 Chen Yu <yu.c.chen@intel.com>,
	 Matteo Martelli <matteo.martelli@codethink.co.uk>,
	 Michal Koutný <mkoutny@suse.com>,
	 Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH v4 3/5] sched/fair: Switch to task based throttle model
Date: Wed, 03 Sep 2025 13:46:48 -0700
Message-ID: <xm26o6rrtgav.fsf@google.com>
In-Reply-To: <14be66aa-e088-4267-ac10-d04d600b1294@amd.com> (K. Prateek Nayak's message of "Wed, 3 Sep 2025 22:42:01 +0530")

K Prateek Nayak <kprateek.nayak@amd.com> writes:

> Hello Peter,
>
> On 9/3/2025 8:21 PM, Peter Zijlstra wrote:
>>>  static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>>  {
>>> +	if (task_is_throttled(p)) {
>>> +		dequeue_throttled_task(p, flags);
>>> +		return true;
>>> +	}
>>> +
>>>  	if (!p->se.sched_delayed)
>>>  		util_est_dequeue(&rq->cfs, p);
>>>  
>> 
>> OK, so this makes it so that either a task is fully enqueued (all
>> cfs_rq's) or full not. A group cfs_rq is only marked throttled when all
>> its tasks are gone, and unthrottled when a task gets added. Right?
>
> cfs_rq (and the hierarchy below) is marked throttled when the quota
> has elapsed. Tasks on the throttled hierarchies will dequeue
> themselves completely via task work added during pick. When the last
> task leaves on a cfs_rq of throttled hierarchy, PELT is frozen for
> that cfs_rq.
>
> When a new task is added on the hierarchy, the PELT is unfrozen and
> the task becomes runnable. The cfs_rq and the hierarchy is still
> marked throttled.
>
> Unthrottling of hierarchy is only done at distribution.
>
>> 
>> But propagate_entity_cfs_rq() is still doing the old thing, and has a
>> if (cfs_rq_throttled(cfs_rq)) break; inside the for_each_sched_entity()
>> iteration.
>> 
>> This seems somewhat inconsistent; or am I missing something ? 
>
> Probably an oversight. But before that, what was the reason to have
> stopped this propagation at throttled_cfs_rq() before the changes?
>

Yeah, this was one of the things I was (slowly) looking at - with this
series we currently still abort in:

1) update_cfs_group
2) dequeue_entities's set_next_buddy
3) check_preempt_wakeup_fair
4) yield_to
5) propagate_entity_cfs_rq

In the old design, on throttle we immediately remove the entire cfs_rq,
freeze time for it, and stop adjusting load. In the new design we still
pick from it, so we definitely don't want to stop time (and don't). I'm
guessing we probably also want to now adjust load for it, but that is
arguable: all the cfs_rqs for the tg are likely to throttle at the same
time, so we might not want to mess with the shares distribution, since
when unthrottle comes around the most likely correct distribution is the
one we had at the time of throttle.
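
To make the new lifecycle concrete, here is a toy model of the states
Prateek describes above (throttled at quota expiry, PELT frozen when the
last task leaves, unfrozen on enqueue, unthrottled only at distribution).
It is only a sketch: struct toy_cfs_rq and the toy_*() helpers are
hypothetical names, not kernel code, and freezing an already-empty cfs_rq
at throttle time is my assumption rather than something stated in the
thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a cfs_rq; field names are illustrative, not the kernel's. */
struct toy_cfs_rq {
	bool throttled;    /* quota has elapsed, waiting for distribution */
	bool pelt_frozen;  /* load-tracking clock is stopped */
	int nr_queued;     /* tasks still enqueued here */
};

/* Quota elapsed: mark throttled, but tasks stay queued until they leave. */
static void toy_quota_elapsed(struct toy_cfs_rq *rq)
{
	rq->throttled = true;
	/* Assumption: an already-empty cfs_rq has nothing left to track. */
	if (rq->nr_queued == 0)
		rq->pelt_frozen = true;
}

/* A task dequeues itself (e.g. via its throttle task work). */
static void toy_task_leaves(struct toy_cfs_rq *rq)
{
	if (rq->nr_queued > 0)
		rq->nr_queued--;
	/* PELT is frozen only once the last task of a throttled cfs_rq is gone. */
	if (rq->throttled && rq->nr_queued == 0)
		rq->pelt_frozen = true;
}

/* A new task is enqueued: it is runnable even though we stay throttled. */
static void toy_task_arrives(struct toy_cfs_rq *rq)
{
	rq->nr_queued++;
	rq->pelt_frozen = false;
}

/* Distribution hands out fresh quota: the only place we unthrottle. */
static void toy_distribute(struct toy_cfs_rq *rq)
{
	rq->throttled = false;
	rq->pelt_frozen = false;
}
```

The point of the model is that "throttled" and "time frozen" are now two
separate bits: a throttled toy_cfs_rq with nr_queued > 0 keeps its clock
running, which is exactly the case where stopping load adjustment looks
questionable.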

Assuming we do want to adjust load for a throttled cfs_rq, then we
probably want to remove the aborts from update_cfs_group and
propagate_entity_cfs_rq. I'm guessing that we need the
list_add_leaf_cfs_rq from propagate, but I'm not 100% sure when it is
actually doing something in propagate as opposed to enqueue.
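
As a rough illustration of what the abort costs, here is a toy upward
walk with and without the early break. struct toy_level and
toy_propagate() are hypothetical; the real propagate_entity_cfs_rq()
does more per level (and may order the check and the update
differently), so treat this only as a sketch of the effect on ancestors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One level of a toy hierarchy; not the kernel's cfs_rq or sched_entity. */
struct toy_level {
	struct toy_level *parent;  /* NULL at the root */
	bool throttled;
	long load;
};

/*
 * Walk from `leaf` towards the root, adding `delta` to each level.
 * With abort_on_throttled set, stop at the first throttled level,
 * so that level and everything above it never sees the change.
 * Returns the number of levels that were updated.
 */
static int toy_propagate(struct toy_level *leaf, long delta,
			 bool abort_on_throttled)
{
	int updated = 0;

	for (struct toy_level *l = leaf; l != NULL; l = l->parent) {
		if (abort_on_throttled && l->throttled)
			break;
		l->load += delta;
		updated++;
	}
	return updated;
}
```

With a root <- mid <- leaf chain where only mid is throttled, the
aborting walk updates just the leaf, while the non-aborting walk
updates all three levels.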

The other 3 are the same sort of thing - scheduling pick heuristics
which imo are pretty arbitrary to keep. We can reasonably say that "the
most likely thing a task in a throttled hierarchy will do is just go
throttle itself, so we shouldn't buddy it or let it preempt", but it
would also be reasonable to let them preempt/buddy normally, in case
they hold locks or such.

yield_to is used by kvm and st-dma-fence-chain.c. Yielding to a
throttle-on-exit kvm cpu thread isn't useful (so no need to remove the
abort there). The dma code is just yielding to a just-spawned kthread,
so it should be fine either way.
