From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Phil Auld <pauld@redhat.com>, Waiman Long <longman@redhat.com>,
Sasha Levin <sashal@kernel.org>,
mingo@redhat.com, vincent.guittot@linaro.org
Subject: [PATCH AUTOSEL 6.13 3/7] sched/deadline: Check bandwidth overflow earlier for hotplug
Date: Sun, 26 Jan 2025 09:50:06 -0500 [thread overview]
Message-ID: <20250126145011.925720-3-sashal@kernel.org> (raw)
In-Reply-To: <20250126145011.925720-1-sashal@kernel.org>
From: Juri Lelli <juri.lelli@redhat.com>
[ Upstream commit 53916d5fd3c0b658de3463439dd2b7ce765072cb ]
Currently we check for bandwidth overflow potentially due to hotplug
operations at the end of sched_cpu_deactivate(), after the cpu going
offline has already been removed from scheduling, active_mask, etc.
This can create issues for DEADLINE tasks, as there is a substantial
race window between the start of sched_cpu_deactivate() and the moment
we possibly decide to roll-back the operation if dl_bw_deactivate()
returns failure in cpuset_cpu_inactive(). An example is a throttled
task that sees its replenishment timer firing while the cpu it was
previously running on is considered offline, but before
dl_bw_deactivate() had a chance to say no and roll-back happened.
Fix this by directly calling dl_bw_deactivate() first thing in
sched_cpu_deactivate() and do the required calculation in the former
function considering the cpu passed as an argument as offline already.
By doing so we also simplify sched_cpu_deactivate(), as there is no need
anymore for any kind of roll-back if we fail early.
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Tested-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/Zzc1DfPhbvqDDIJR@jlelli-thinkpadt14gen4.remote.csb
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/sched/core.c | 22 +++++++---------------
kernel/sched/deadline.c | 12 ++++++++++--
2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f823942c9e11a..ed95861e9887c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8182,19 +8182,14 @@ static void cpuset_cpu_active(void)
cpuset_update_active_cpus();
}
-static int cpuset_cpu_inactive(unsigned int cpu)
+static void cpuset_cpu_inactive(unsigned int cpu)
{
if (!cpuhp_tasks_frozen) {
- int ret = dl_bw_deactivate(cpu);
-
- if (ret)
- return ret;
cpuset_update_active_cpus();
} else {
num_cpus_frozen++;
partition_sched_domains(1, NULL, NULL);
}
- return 0;
}
static inline void sched_smt_present_inc(int cpu)
@@ -8256,6 +8251,11 @@ int sched_cpu_deactivate(unsigned int cpu)
struct rq *rq = cpu_rq(cpu);
int ret;
+ ret = dl_bw_deactivate(cpu);
+
+ if (ret)
+ return ret;
+
/*
* Remove CPU from nohz.idle_cpus_mask to prevent participating in
* load balancing when not active
@@ -8301,15 +8301,7 @@ int sched_cpu_deactivate(unsigned int cpu)
return 0;
sched_update_numa(cpu, false);
- ret = cpuset_cpu_inactive(cpu);
- if (ret) {
- sched_smt_present_inc(cpu);
- sched_set_rq_online(rq, cpu);
- balance_push_set(cpu, false);
- set_cpu_active(cpu, true);
- sched_update_numa(cpu, true);
- return ret;
- }
+ cpuset_cpu_inactive(cpu);
sched_domains_numa_masks_clear(cpu);
return 0;
}
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b078014273d9e..b6781ddea7650 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3488,6 +3488,13 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
}
break;
case dl_bw_req_deactivate:
+ /*
+ * cpu is not off yet, but we need to do the math by
+ * considering it off already (i.e., what would happen if we
+ * turn cpu off?).
+ */
+ cap -= arch_scale_cpu_capacity(cpu);
+
/*
* cpu is going offline and NORMAL tasks will be moved away
* from it. We can thus discount dl_server bandwidth
@@ -3505,9 +3512,10 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
if (dl_b->total_bw - fair_server_bw > 0) {
/*
* Leaving at least one CPU for DEADLINE tasks seems a
- * wise thing to do.
+ * wise thing to do. As said above, cpu is not offline
+ * yet, so account for that.
*/
- if (dl_bw_cpus(cpu))
+ if (dl_bw_cpus(cpu) - 1)
overflow = __dl_overflow(dl_b, cap, fair_server_bw, 0);
else
overflow = 1;
--
2.39.5
next prev parent reply other threads:[~2025-01-26 14:50 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-26 14:50 [PATCH AUTOSEL 6.13 1/7] sched: Don't try to catch up excess steal time Sasha Levin
2025-01-26 14:50 ` [PATCH AUTOSEL 6.13 2/7] sched/deadline: Correctly account for allocated bandwidth during hotplug Sasha Levin
2025-01-26 14:50 ` Sasha Levin [this message]
2025-01-26 14:50 ` [PATCH AUTOSEL 6.13 4/7] x86: Convert unreachable() to BUG() Sasha Levin
2025-01-26 14:50 ` [PATCH AUTOSEL 6.13 5/7] locking/ww_mutex/test: Use swap() macro Sasha Levin
2025-01-26 14:50 ` [PATCH AUTOSEL 6.13 6/7] lockdep: Fix upper limit for LOCKDEP_*_BITS configs Sasha Levin
2025-01-26 14:50 ` [PATCH AUTOSEL 6.13 7/7] x86/amd_nb: Restrict init function to AMD-based systems Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250126145011.925720-3-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mingo@redhat.com \
--cc=pauld@redhat.com \
--cc=peterz@infradead.org \
--cc=stable@vger.kernel.org \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.