From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: linux-kernel@vger.kernel.org, mingo@kernel.org,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, yury.norov@gmail.com,
kprateek.nayak@amd.com, iii@linux.ibm.com, corbet@lwn.net
Cc: sshegde@linux.ibm.com, tglx@kernel.org,
gregkh@linuxfoundation.org, pbonzini@redhat.com,
seanjc@google.com, vschneid@redhat.com, huschle@linux.ibm.com,
rostedt@goodmis.org, dietmar.eggemann@arm.com,
maddy@linux.ibm.com, srikar@linux.ibm.com, hdanton@sina.com,
chleroy@kernel.org, vineeth@bitbyteword.org, frederic@kernel.org,
arighi@nvidia.com, pauld@redhat.com, christian.loehle@arm.com,
tj@kernel.org, tommaso.cucinotta@gmail.com, maz@kernel.org,
rafael@kernel.org, rdunlap@infradead.org, kernellwp@gmail.com,
linux-doc@vger.kernel.org
Subject: [PATCH v5 11/24] sched/core: Push current task from non preferred CPU
Date: Thu, 25 Jun 2026 18:16:35 +0530
Message-ID: <20260625124648.802832-12-sshegde@linux.ibm.com>
In-Reply-To: <20260625124648.802832-1-sshegde@linux.ibm.com>

Actively push out a task running on a non-preferred CPU. Since the task is
currently running on that CPU, a stopper has to run on the CPU to push the
task out.

However, if the task is pinned only to non-preferred CPUs, it will continue
running there. This preserves userspace affinities, unlike CPU hotplug or
isolated cpusets.

Though the code is almost the same as __balance_push_cpu_stop and quite
close to push_cpu_stop, it is kept separate as that provides a cleaner
implementation w.r.t. CONFIG_HOTPLUG_CPU.

Add a push_task_work_done flag to protect the per-CPU work buffer.

Works only with the FAIR class.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
v4->v5:
- Move select_fallback_rq outside of rq_lock (Sashiko)
- Add context_unsafe_alias (K Prateek Nayak)
- Cleanup properly on early exit.
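
For reviewers, the eligibility filter applied at tick time can be modelled
in user space. This is only a sketch under stated assumptions: the model_*
names, the 64-bit mask type and the enum are hypothetical stand-ins for
p->sched_class, p->cpus_ptr, kthread_is_per_cpu() and
is_migration_disabled(); none of them are kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t model_cpumask_t;

enum model_class { MODEL_CLASS_FAIR, MODEL_CLASS_OTHER };

struct model_task {
	enum model_class sched_class;	/* stands in for p->sched_class */
	model_cpumask_t cpus_allowed;	/* stands in for p->cpus_ptr */
	bool per_cpu_kthread;		/* stands in for kthread_is_per_cpu(p) */
	bool migration_disabled;	/* stands in for is_migration_disabled(p) */
};

/* True if @cpu is set in the @preferred mask. */
static bool model_cpu_preferred(model_cpumask_t preferred, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (preferred >> cpu) & 1;
}

/* True if the task currently running on @cpu should be pushed away. */
static bool model_should_push(const struct model_task *p, int cpu,
			      model_cpumask_t preferred)
{
	/* Nothing to do when the CPU itself is preferred. */
	if (model_cpu_preferred(preferred, cpu))
		return false;

	/* Only the FAIR class is handled. */
	if (p->sched_class != MODEL_CLASS_FAIR)
		return false;

	/* Per-CPU kthreads and migration-disabled tasks must stay put. */
	if (p->per_cpu_kthread || p->migration_disabled)
		return false;

	/* Push only if the affinity mask contains some preferred CPU. */
	return (p->cpus_allowed & preferred) != 0;
}
```

In this model a task whose affinity mask shares no bit with the preferred
mask is left where it is, which is how userspace pinning is preserved.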
kernel/sched/core.c | 87 ++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/sched.h | 8 ++++
2 files changed, 95 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0391e7897f5..1e42078251d5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5794,6 +5794,9 @@ void sched_tick(void)
 	unsigned long hw_pressure;
 	u64 resched_latency;
 
+	if (!cpu_preferred(cpu))
+		sched_push_current_non_preferred_cpu(rq);
+
 	if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE))
 		arch_scale_freq_tick();
 
@@ -11303,3 +11306,87 @@ void sched_change_end(struct sched_change_ctx *ctx)
p->sched_class->prio_changed(rq, p, ctx->prio);
}
}
+
+#ifdef CONFIG_PREFERRED_CPU
+/* npc - non preferred CPU */
+static DEFINE_PER_CPU(struct cpu_stop_work, npc_push_task_work);
+
+static int sched_non_preferred_cpu_push_stop(void *arg)
+{
+	struct task_struct *p = arg;
+	struct rq *rq = this_rq();
+	struct rq_flags rf;
+	int cpu;
+
+	/* sanity check and clear */
+	if (cpu_preferred(rq->cpu)) {
+		scoped_guard (rq_lock, rq)
+			rq->push_task_work_done = 0;
+		put_task_struct(p);
+		return 0;
+	}
+
+	raw_spin_lock_irq(&p->pi_lock);
+
+	/* This could take rq lock. So call it before rq lock is taken */
+	cpu = select_fallback_rq(rq->cpu, p);
+	rq_lock(rq, &rf);
+	rq->push_task_work_done = 0;
+	update_rq_clock(rq);
+
+	context_unsafe_alias(rq);
+
+	if (task_rq(p) == rq && task_on_rq_queued(p))
+		rq = __migrate_task(rq, &rf, p, cpu);
+
+	rq_unlock(rq, &rf);
+	raw_spin_unlock_irq(&p->pi_lock);
+	put_task_struct(p);
+
+	return 0;
+}
+
+/*
+ * Push the current task running on non-preferred CPU.
+ * Using this non preferred CPU will lead to more vCPU preemptions
+ * in the host. So it is better not to use this CPU.
+ *
+ * Since task is running, call a stopper to push the task out. This is
+ * similar to how task moves during hotplug. In select_fallback_rq a
+ * preferred CPU will be chosen and henceforth task shouldn't come back to
+ * this CPU again.
+ *
+ * Works for FAIR class only
+ *
+ * If task is affined only to non-preferred CPUs, it can't be moved out
+ */
+void sched_push_current_non_preferred_cpu(struct rq *rq)
+{
+	struct task_struct *push_task = rq->curr;
+
+	/* Push only if it is FAIR class */
+	if (push_task->sched_class != &fair_sched_class)
+		return;
+
+	if (kthread_is_per_cpu(push_task) ||
+	    is_migration_disabled(push_task))
+		return;
+
+	/* Is there any preferred CPU in the affinity list */
+	if (!task_has_preferred_cpus(push_task))
+		return;
+
+	/* There is already a stopper thread for this. Don't race with it */
+	if (rq->push_task_work_done == 1)
+		return;
+
+	/* sched_tick runs with interrupts disabled. Don't disable again */
+	get_task_struct(push_task);
+
+	scoped_guard (rq_lock, rq)
+		rq->push_task_work_done = 1;
+
+	stop_one_cpu_nowait(rq->cpu, sched_non_preferred_cpu_push_stop,
+			    push_task, this_cpu_ptr(&npc_push_task_work));
+}
+#endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 148fe6145f1a..316d3ccefc48 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1274,6 +1274,8 @@ struct rq {
struct list_head cfs_tasks;
+ bool push_task_work_done;
+
struct sched_avg avg_rt;
struct sched_avg avg_dl;
#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
@@ -4241,4 +4243,10 @@ static inline bool task_has_preferred_cpus(struct task_struct *p)
else
return cpumask_intersects(p->cpus_ptr, cpu_preferred_mask);
}
+
+#ifdef CONFIG_PREFERRED_CPU
+void sched_push_current_non_preferred_cpu(struct rq *rq);
+#else /* !CONFIG_PREFERRED_CPU */
+static inline void sched_push_current_non_preferred_cpu(struct rq *rq) { }
+#endif
#endif /* _KERNEL_SCHED_SCHED_H */
--
2.47.3
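
The push_task_work_done handshake in the patch above (at most one stopper
in flight per CPU, so the single per-CPU work buffer is never reused while
queued) can be modelled in user space. This is a rough sketch, not the
kernel implementation: it swaps the rq lock for a C11 atomic exchange, and
all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU state touched by the patch. */
struct model_rq {
	atomic_bool push_pending;	/* stands in for rq->push_task_work_done */
	void *work_arg;			/* stands in for the single work buffer */
};

/*
 * Tick side: queue a push unless one is already in flight.
 * Returns true if the work buffer was claimed for @task.
 */
static bool model_tick_queue_push(struct model_rq *rq, void *task)
{
	/* atomic_exchange() returns the old value: true means "in flight". */
	if (atomic_exchange(&rq->push_pending, true))
		return false;

	rq->work_arg = task;
	return true;
}

/*
 * Stopper side: consume the work buffer and release it so that the
 * next tick may queue again. Returns the task that was queued.
 */
static void *model_stopper_run(struct model_rq *rq)
{
	void *task;

	/* The stopper only runs after the tick side claimed the buffer. */
	assert(atomic_load(&rq->push_pending));

	task = rq->work_arg;
	rq->work_arg = NULL;
	atomic_store(&rq->push_pending, false);
	return task;
}
```

A second tick that arrives before the stopper has run is refused, and the
buffer becomes available again only after the stopper clears the flag.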