From: Benjamin Segall <bsegall@google.com>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Phil Auld <pauld@redhat.com>,
Clark Williams <williams@redhat.com>,
Tomas Glozar <tglozar@redhat.com>
Subject: Re: [RFC PATCH 1/2] sched/fair: Only throttle CFS tasks on return to userspace
Date: Thu, 30 Nov 2023 13:26:37 -0800 [thread overview]
Message-ID: <xm26sf4n2ar6.fsf@google.com> (raw)
In-Reply-To: <20231130161245.3894682-2-vschneid@redhat.com> (Valentin Schneider's message of "Thu, 30 Nov 2023 17:12:43 +0100")
Valentin Schneider <vschneid@redhat.com> writes:
> As reported in [1], CFS bandwidth throttling is a source of headaches in
> PREEMPT_RT - generally speaking, a throttled CFS task can hold locks that
> prevent ksoftirqd from running, which prevents replenishing & unthrottling
> the cfs_rq of said CFS task.
>
> Peter mentioned that there have been discussions on changing /when/ the
> throttling happens: rather than have it be done immediately upon updating
> the runtime statistics and realizing the cfs_rq has depleted its quota, we wait
> for the task to be about to return to userspace.
>
> I'm not aware of the arguments in favour of this for !PREEMPT_RT, but given [1]
> I had a look into it. Using this approach means no task can be throttled while
> holding a kernel lock, which solves the locking dependency issue.
The alternative we've been experimenting with (though we're still running
into other issues that have made it hard to tell whether it works) is to
leave the tasks on their cfs_rqs, but instead have two task_timelines or
similar per cfs_rq, one of which only has unthrottleable tasks (or
partially-throttled child cgroups) on it. Then when picking into a
partially-unthrottled cfs_rq you only look at that alternate task_timeline.
This means that we get to skip this per-actually-throttled-task loop:
> @@ -5548,7 +5548,61 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
> {
> struct rq *rq = data;
> struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
> + struct sched_entity *se = tg->se[cpu_of(rq)];
> + struct cfs_rq *pcfs_rq = cfs_rq_of(se);
> + long task_delta = 0, idle_task_delta = 0;
> + struct task_struct *p, *tmp;
>
> + /*
> + * Re-enqueue the tasks that have been throttled at this level.
> + *
> + * The task count is up-propagated via ->unthrottled_*h_nr_running,
> + * as we can't purely rely on h_nr_running post-enqueue: the unthrottle
> + * might happen when a cfs_rq still has some tasks enqueued, either still
> + * making their way to userspace, or freshly migrated to it.
> + */
> + list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
> + struct sched_entity *pse = &p->se;
> +
> + list_del_init(&p->throttle_node);
> + enqueue_entity(cfs_rq, pse, ENQUEUE_WAKEUP);
> + task_delta++;
> + idle_task_delta += task_has_idle_policy(p);
> + }
The downsides are that you instead have extra operations per
enqueue/dequeue/pick (but just an extra list/rbtree operation or check),
and that it doesn't do *any* accounting change for a partially dequeued
cfs_rq.
I'm going to try putting together a cleaner variant of our version that
works via task_work instead of bracketing every relevant entry point.
(That design dates from when we were instead trying to do this only for
tasks holding actual locks.)