* [PATCH tip/sched/core] sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running
@ 2024-08-03 1:21 Tejun Heo
2024-08-05 3:21 ` K Prateek Nayak
2024-08-05 22:39 ` [PATCH v2 sched_ext/for-6.12] " Tejun Heo
0 siblings, 2 replies; 7+ messages in thread
From: Tejun Heo @ 2024-08-03 1:21 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, kernel-team, David Vernet
balance_fair() skips newidle balancing if rq->nr_running - there are already
tasks on the rq, so no need to try to pull tasks. However, this doesn't seem
correct when bandwidth throttling is in use. When an entity gets throttled,
rq->nr_running is not decremented, so a CPU could end up in a situation
where rq->nr_running is not zero but there are no runnable tasks.
Theoretically, skipping newidle balance in this condition can lead to
increased latencies although I couldn't come up with a scenario where this
could be demonstrated reliably.
Update balance_fair() to use sched_fair_runnable() which tests
rq->cfs.nr_running which is updated by bandwidth throttling. Note that
pick_next_task_fair() already uses sched_fair_runnable() in its optimized
path for the same purpose.
This also makes put_prev_task_balance() avoid incorrectly skipping lower
priority classes' (such as sched_ext) balance(). When a CPU has only lower
priority class tasks, rq->nr_running would still be positive and
balance_fair() would return 1 even when fair doesn't have any tasks to run.
This makes put_prev_task_balance() skip lower priority classes' balance()
incorrectly which may lead to stalls.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
 static int
 balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
-	if (rq->nr_running)
+	if (sched_fair_runnable(rq))
 		return 1;

 	return sched_balance_newidle(rq, rf) != 0;
* Re: [PATCH tip/sched/core] sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running
From: K Prateek Nayak @ 2024-08-05 3:21 UTC (permalink / raw)
To: Tejun Heo, Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, kernel-team, David Vernet

Hello Tejun,

On 8/3/2024 6:51 AM, Tejun Heo wrote:
> balance_fair() skips newidle balancing if rq->nr_running - there are already
> tasks on the rq, so no need to try to pull tasks. However, this doesn't seem
> correct when bandwidth throttling is in use. When an entity gets throttled,
> rq->nr_running is not decremented, so a CPU could end up in a situation
> where rq->nr_running is not zero but there are no runnable tasks.

When does this happen exactly? throttle_cfs_rq() calls sub_nr_running()
negating the "h_nr_running" of the throttled entity from the
rq->nr_running. Even deadline entities on being throttled dequeue the
entity (or don't enqueue them at all in enqueue_dl_entity()) and would
have the correct "rq->nr_running". Could you please highlight the path
that might hit this scenario?

--
Thanks and Regards,
Prateek

> Theoretically, skipping newidle balance in this condition can lead to
> increased latencies although I couldn't come up with a scenario where this
> could be demonstrated reliably.
>
> Update balance_fair() to use sched_fair_runnable() which tests
> rq->cfs.nr_running which is updated by bandwidth throttling. Note that
> pick_next_task_fair() already uses sched_fair_runnable() in its optimized
> path for the same purpose.
>
> This also makes put_prev_task_balance() avoid incorrectly skipping lower
> priority classes' (such as sched_ext) balance(). When a CPU has only lower
> priority class tasks, rq->nr_running would still be positive and
> balance_fair() would return 1 even when fair doesn't have any tasks to run.
> This makes put_prev_task_balance() skip lower priority classes' balance()
> incorrectly which may lead to stalls.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
>  static int
>  balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  {
> -	if (rq->nr_running)
> +	if (sched_fair_runnable(rq))
>  		return 1;
>
>  	return sched_balance_newidle(rq, rf) != 0;
* Re: [PATCH tip/sched/core] sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running
From: Tejun Heo @ 2024-08-05 18:03 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, kernel-team, David Vernet

Hello,

On Mon, Aug 05, 2024 at 08:51:05AM +0530, K Prateek Nayak wrote:
...
> When does this happen exactly? throttle_cfs_rq() calls sub_nr_running()
> negating the "h_nr_running" of the throttled entity from the
> rq->nr_running. Even deadline entities on being throttled dequeue the
> entity (or don't enqueue them at all in enqueue_dl_entity()) and would
> have the correct "rq->nr_running". Could you please highlight the path
> that might hit this scenario?

I just missed the sub_nr_running() call. No wonder I couldn't see behavior
difference. I'll redo the patch description.

Thanks.

--
tejun
* [PATCH v2 sched_ext/for-6.12] sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running
From: Tejun Heo @ 2024-08-05 22:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, kernel-team, David Vernet, K Prateek Nayak

balance_fair() skips newidle balancing if rq->nr_running - there are already
tasks on the rq, so no need to try to pull tasks. This tests the total
number of queued tasks on the CPU instead of only the fair class, but is
still correct as the rq can currently only have fair class tasks while
balance_fair() is running.

However, with the addition of sched_ext below the fair class, this will not
hold anymore and make put_prev_task_balance() skip sched_ext's balance()
incorrectly as, when a CPU has only lower priority class tasks,
rq->nr_running would still be positive and balance_fair() would return 1
even when fair doesn't have any tasks to run.

Update balance_fair() to use sched_fair_runnable() which tests
rq->cfs.nr_running which is updated by bandwidth throttling. Note that
pick_next_task_fair() already uses sched_fair_runnable() in its optimized
path for the same purpose.

v2: K Prateek Nayak pointed out that the bw control issue described in v1
    was incorrect. Patch description updated. As this makes the patch only
    relevant for sched_ext, I'll carry this through the sched_ext tree
    unless there are objections.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
 static int
 balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
-	if (rq->nr_running)
+	if (sched_fair_runnable(rq))
 		return 1;

 	return sched_balance_newidle(rq, rf) != 0;
* Re: [PATCH v2 sched_ext/for-6.12] sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running
From: Chengming Zhou @ 2024-08-06 1:37 UTC (permalink / raw)
To: Tejun Heo, Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, kernel-team, David Vernet, K Prateek Nayak

On 2024/8/6 06:39, Tejun Heo wrote:
> balance_fair() skips newidle balancing if rq->nr_running - there are already
> tasks on the rq, so no need to try to pull tasks. This tests the total
> number of queued tasks on the CPU instead of only the fair class, but is
> still correct as the rq can currently only have fair class tasks while
> balance_fair() is running.
>
> However, with the addition of sched_ext below the fair class, this will not
> hold anymore and make put_prev_task_balance() skip sched_ext's balance()
> incorrectly as, when a CPU has only lower priority class tasks,
> rq->nr_running would still be positive and balance_fair() would return 1
> even when fair doesn't have any tasks to run.
>
> Update balance_fair() to use sched_fair_runnable() which tests
> rq->cfs.nr_running which is updated by bandwidth throttling. Note that
> pick_next_task_fair() already uses sched_fair_runnable() in its optimized
> path for the same purpose.
>
> v2: K Prateek Nayak pointed out that the bw control issue described in v1
>     was incorrect. Patch description updated. As this makes the patch only
>     relevant for sched_ext, I'll carry this through the sched_ext tree
>     unless there are objections.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Cc: K Prateek Nayak <kprateek.nayak@amd.com>

Looks good to me.

Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

Thanks.
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
>  static int
>  balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  {
> -	if (rq->nr_running)
> +	if (sched_fair_runnable(rq))
>  		return 1;
>
>  	return sched_balance_newidle(rq, rf) != 0;
* Re: [PATCH v2 sched_ext/for-6.12] sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running
From: K Prateek Nayak @ 2024-08-06 5:26 UTC (permalink / raw)
To: Tejun Heo, Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, kernel-team, David Vernet

Hello Tejun,

Thank you for updating the commit message.

On 8/6/2024 4:09 AM, Tejun Heo wrote:
> balance_fair() skips newidle balancing if rq->nr_running - there are already
> tasks on the rq, so no need to try to pull tasks. This tests the total
> number of queued tasks on the CPU instead of only the fair class, but is
> still correct as the rq can currently only have fair class tasks while
> balance_fair() is running.
>
> However, with the addition of sched_ext below the fair class, this will not
> hold anymore and make put_prev_task_balance() skip sched_ext's balance()
> incorrectly as, when a CPU has only lower priority class tasks,
> rq->nr_running would still be positive and balance_fair() would return 1
> even when fair doesn't have any tasks to run.
>
> Update balance_fair() to use sched_fair_runnable() which tests
> rq->cfs.nr_running which is updated by bandwidth throttling. Note that
> pick_next_task_fair() already uses sched_fair_runnable() in its optimized
> path for the same purpose.
>
> v2: K Prateek Nayak pointed out that the bw control issue described in v1
>     was incorrect. Patch description updated. As this makes the patch only
>     relevant for sched_ext, I'll carry this through the sched_ext tree
>     unless there are objections.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Cc: K Prateek Nayak <kprateek.nayak@amd.com>

Looks good to me!
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>

--
Thanks and Regards,
Prateek

> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
>  static int
>  balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  {
> -	if (rq->nr_running)
> +	if (sched_fair_runnable(rq))
>  		return 1;
>
>  	return sched_balance_newidle(rq, rf) != 0;
* [tip: sched/core] sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running
From: tip-bot2 for Tejun Heo @ 2024-08-07 10:51 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra, Tejun Heo, Chengming Zhou, K Prateek Nayak, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     924e2904da9b5edec61611918b98ab1f7fccc461
Gitweb:        https://git.kernel.org/tip/924e2904da9b5edec61611918b98ab1f7fccc461
Author:        Tejun Heo <tj@kernel.org>
AuthorDate:    Mon, 05 Aug 2024 12:39:10 -10:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 07 Aug 2024 12:44:16 +02:00

sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running

balance_fair() skips newidle balancing if rq->nr_running - there are already
tasks on the rq, so no need to try to pull tasks. This tests the total
number of queued tasks on the CPU instead of only the fair class, but is
still correct as the rq can currently only have fair class tasks while
balance_fair() is running.

However, with the addition of sched_ext below the fair class, this will not
hold anymore and make put_prev_task_balance() skip sched_ext's balance()
incorrectly as, when a CPU has only lower priority class tasks,
rq->nr_running would still be positive and balance_fair() would return 1
even when fair doesn't have any tasks to run.

Update balance_fair() to use sched_fair_runnable() which tests
rq->cfs.nr_running which is updated by bandwidth throttling. Note that
pick_next_task_fair() already uses sched_fair_runnable() in its optimized
path for the same purpose.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/ZrFUjlCf7x3TNXB8@slm.duckdns.org
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 795ceef..6d39a82 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8355,7 +8355,7 @@ static void set_cpus_allowed_fair(struct task_struct *p, struct affinity_context
 static int
 balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
-	if (rq->nr_running)
+	if (sched_fair_runnable(rq))
 		return 1;

 	return sched_balance_newidle(rq, rf) != 0;