public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Josh Don <joshdon@google.com>, Aaron Lu <ziqianlu@bytedance.com>
Cc: Valentin Schneider <vschneid@redhat.com>,
	Ben Segall <bsegall@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	<linux-kernel@vger.kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mel Gorman <mgorman@suse.de>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Chuyi Zhou <zhouchuyi@bytedance.com>, Xi Wang <xii@google.com>
Subject: Re: [RFC PATCH 2/7] sched/fair: Handle throttle path for task based throttle
Date: Thu, 20 Mar 2025 12:23:01 +0530	[thread overview]
Message-ID: <ab91bfe1-4331-4e33-87fa-4f4fe96adb00@amd.com> (raw)
In-Reply-To: <CABk29Nuuq6s1+FBftOPAcMkYU+F1n2nebcP5tDK9dH4_KXA2cw@mail.gmail.com>

Hello Josh,

On 3/16/2025 8:55 AM, Josh Don wrote:
> Hi Aaron,
> 
>>   static int tg_throttle_down(struct task_group *tg, void *data)
>>   {
>>          struct rq *rq = data;
>>          struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
>> +       struct task_struct *p;
>> +       struct rb_node *node;
>> +
>> +       cfs_rq->throttle_count++;
>> +       if (cfs_rq->throttle_count > 1)
>> +               return 0;
>>
>>          /* group is entering throttled state, stop time */
>> -       if (!cfs_rq->throttle_count) {
>> -               cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
>> -               list_del_leaf_cfs_rq(cfs_rq);
>> +       cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
>> +       list_del_leaf_cfs_rq(cfs_rq);
>>
>> -               SCHED_WARN_ON(cfs_rq->throttled_clock_self);
>> -               if (cfs_rq->nr_queued)
>> -                       cfs_rq->throttled_clock_self = rq_clock(rq);
>> +       SCHED_WARN_ON(cfs_rq->throttled_clock_self);
>> +       if (cfs_rq->nr_queued)
>> +               cfs_rq->throttled_clock_self = rq_clock(rq);
>> +
>> +       WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
>> +       /*
>> +        * rq_lock is held, current is (obviously) executing this in kernelspace.
>> +        *
>> +        * All other tasks enqueued on this rq have their saved PC at the
>> +        * context switch, so they will go through the kernel before returning
>> +        * to userspace. Thus, there are no tasks-in-userspace to handle, just
>> +        * install the task_work on all of them.
>> +        */
>> +       node = rb_first(&cfs_rq->tasks_timeline.rb_root);
>> +       while (node) {
>> +               struct sched_entity *se = __node_2_se(node);
>> +
>> +               if (!entity_is_task(se))
>> +                       goto next;
>> +
>> +               p = task_of(se);
>> +               task_throttle_setup_work(p);
>> +next:
>> +               node = rb_next(node);
>> +       }
> 
> I'd like to strongly push back on this approach. This adds quite a lot
> of extra computation to an already expensive path
> (throttle/unthrottle). e.g. this function is part of the cgroup walk
> and so it is already O(cgroups) for the number of cgroups in the
> hierarchy being throttled. This gets even worse when you consider that
> we repeat this separately across all the cpus that the
> bandwidth-constrained group is running on. Keep in mind that
> throttle/unthrottle is done with rq lock held and IRQ disabled.

On this note, do you have any statistics for how many tasks are
throttled per-CPU on your system. The info from:

     sudo ./bpftrace -e "kprobe:throttle_cfs_rq { \
         @nr_queued[((struct cfs_rq *)$1)->h_nr_queued] = count(); \
         @nr_runnable[((struct cfs_rq *)$1)->h_nr_runnable] = count(); \
     }"

could help estimate the worst case times with per-task throttling
we are expecting.

> 
> In K Prateek's last RFC, there was discussion of using context
> tracking; did you consider that approach any further? We could keep
> track of the number of threads within a cgroup hierarchy currently in
> kernel mode (similar to h_nr_runnable), and thus simplify down the
> throttling code here.

Based on Chengming's latest suggestion, we can keep tg_throttle_down()
as is and tag the task at pick using throttled_hierarchy() which will
work too.

Since it'll most likely end up doing:

     if (throttled_hierarchy(cfs_rq_of(&p->se)))
             task_throttle_setup_work(p);

The only overhead for the users not using CFS bandwidth is just the
cfs_rq->throttle_count check. If it was set, you are simply moving
the overhead to set the throttle work from the throttle path to
the pick path for the throttled tasks only and it also avoids
adding unnecessary work to task that may never get picked before
unthrottle.

> 
> Best,
> Josh

-- 
Thanks and Regards,
Prateek


  parent reply	other threads:[~2025-03-20  6:53 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-17 10:56 [RFC PATCH 0/7] Defer throttle when task exits to user Aaron Lu
2025-03-13  7:21 ` [RFC PATCH 1/7] sched/fair: Add related data structure for task based throttle Aaron Lu
2025-03-17 10:28   ` Valentin Schneider
2025-03-17 11:02     ` Aaron Lu
2025-03-13  7:21 ` [RFC PATCH 2/7] sched/fair: Handle throttle path " Aaron Lu
2025-03-13 18:14   ` K Prateek Nayak
2025-03-14  8:48     ` Aaron Lu
2025-03-14  9:00       ` K Prateek Nayak
2025-03-14  3:28   ` K Prateek Nayak
2025-03-14  8:57     ` Aaron Lu
2025-03-14  9:12       ` K Prateek Nayak
2025-03-14 15:10         ` Aaron Lu
2025-03-14  8:39   ` Chengming Zhou
2025-03-14  8:49     ` K Prateek Nayak
2025-03-14  9:42     ` Aaron Lu
2025-03-14 10:26       ` K Prateek Nayak
2025-03-14 11:47         ` Aaron Lu
2025-03-14 15:58           ` Chengming Zhou
2025-03-14 18:04           ` K Prateek Nayak
2025-03-14 11:07       ` Chengming Zhou
2025-03-31  6:42         ` Aaron Lu
2025-03-31  9:14           ` Chengming Zhou
2025-03-16  3:25   ` Josh Don
2025-03-17  2:54     ` Chengming Zhou
2025-03-20  6:59       ` K Prateek Nayak
2025-03-20  8:39         ` Chengming Zhou
2025-03-20 18:40           ` Xi Wang
2025-03-24  8:58             ` Aaron Lu
2025-03-25 10:02               ` Aaron Lu
2025-03-28  0:11                 ` Xi Wang
2025-03-28  3:11                   ` Aaron Lu
2025-03-28 22:47         ` Benjamin Segall
2025-03-19 13:43     ` Aaron Lu
2025-03-20  1:06       ` Josh Don
2025-03-20  6:53     ` K Prateek Nayak [this message]
2025-03-13  7:21 ` [RFC PATCH 3/7] sched/fair: Handle unthrottle " Aaron Lu
2025-03-14  3:53   ` K Prateek Nayak
2025-03-14  4:06     ` K Prateek Nayak
2025-03-14 10:43     ` Aaron Lu
2025-03-14 17:52       ` K Prateek Nayak
2025-03-17  5:48         ` Aaron Lu
2025-04-02  9:25         ` Aaron Lu
2025-04-02 17:24           ` K Prateek Nayak
2025-03-13  7:21 ` [RFC PATCH 4/7] sched/fair: Take care of migrated task " Aaron Lu
2025-03-14  4:03   ` K Prateek Nayak
2025-03-14  9:49     ` [External] " Aaron Lu
2025-03-13  7:21 ` [RFC PATCH 5/7] sched/fair: Take care of group/affinity/sched_class change for throttled task Aaron Lu
2025-03-14  4:51   ` K Prateek Nayak
2025-03-14 11:40     ` [External] " Aaron Lu
2025-03-13  7:22 ` [RFC PATCH 6/7] sched/fair: fix tasks_rcu with task based throttle Aaron Lu
2025-03-14  4:14   ` K Prateek Nayak
2025-03-14 11:37     ` [External] " Aaron Lu
2025-03-31  6:19     ` Aaron Lu
2025-04-01  3:17       ` K Prateek Nayak
2025-04-01  8:48         ` Aaron Lu
2025-03-13  7:22 ` [RFC PATCH 7/7] sched/fair: Make sure cfs_rq has enough runtime_remaining on unthrottle path Aaron Lu
2025-03-14  4:18   ` K Prateek Nayak
2025-03-14 11:39     ` [External] " Aaron Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ab91bfe1-4331-4e33-87fa-4f4fe96adb00@amd.com \
    --to=kprateek.nayak@amd.com \
    --cc=bsegall@google.com \
    --cc=chengming.zhou@linux.dev \
    --cc=dietmar.eggemann@arm.com \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=xii@google.com \
    --cc=zhouchuyi@bytedance.com \
    --cc=ziqianlu@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox