From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: linux-kernel@vger.kernel.org, mingo@kernel.org,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, yury.norov@gmail.com,
kprateek.nayak@amd.com, iii@linux.ibm.com, corbet@lwn.net
Cc: sshegde@linux.ibm.com, tglx@kernel.org,
gregkh@linuxfoundation.org, pbonzini@redhat.com,
seanjc@google.com, vschneid@redhat.com, huschle@linux.ibm.com,
rostedt@goodmis.org, dietmar.eggemann@arm.com,
maddy@linux.ibm.com, srikar@linux.ibm.com, hdanton@sina.com,
chleroy@kernel.org, vineeth@bitbyteword.org, frederic@kernel.org,
arighi@nvidia.com, pauld@redhat.com, christian.loehle@arm.com,
tj@kernel.org, tommaso.cucinotta@gmail.com, maz@kernel.org,
rafael@kernel.org, rdunlap@infradead.org, kernellwp@gmail.com,
linux-doc@vger.kernel.org
Subject: [PATCH v6 09/23] sched/core: Push current task from non preferred CPU
Date: Wed, 1 Jul 2026 19:46:40 +0530 [thread overview]
Message-ID: <20260701141654.500125-10-sshegde@linux.ibm.com> (raw)
In-Reply-To: <20260701141654.500125-1-sshegde@linux.ibm.com>
Actively push out task running on a non-preferred CPU. Since the task is
running on the CPU, need to stop the cpu and push the task out.
However, if the task is pinned only to non-preferred CPUs, it will continue
running there. This will help in maintaining the userspace affinities
unlike CPU hotplug or isolated cpusets.
Though code is similar to __balance_push_cpu_stop and quite close to
push_cpu_stop, it is being kept separate as it provides a cleaner
implementation with CONFIG_PREFERRED_CPU.
Add push_task_work_done flag to protect work buffer.
Works only with FAIR class.
For now, only current running task is pushed out. This keeps the code
simpler. In future optimization maybe done to move all the queued
task on the rq.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
kernel/sched/core.c | 87 ++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/sched.h | 8 ++++
2 files changed, 95 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aa4201bb8082..56905bac9525 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5797,6 +5797,9 @@ void sched_tick(void)
unsigned long hw_pressure;
u64 resched_latency;
+ if (!cpu_preferred(cpu))
+ sched_push_current_non_preferred_cpu(rq);
+
if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE))
arch_scale_freq_tick();
@@ -11315,3 +11318,87 @@ void sched_change_end(struct sched_change_ctx *ctx)
p->sched_class->prio_changed(rq, p, ctx->prio);
}
}
+
+#ifdef CONFIG_PREFERRED_CPU
+/* npc - non preferred CPU */
+static DEFINE_PER_CPU(struct cpu_stop_work, npc_push_task_work);
+
+static int sched_non_preferred_cpu_push_stop(void *arg)
+{
+ struct task_struct *p = arg;
+ struct rq *rq = this_rq();
+ struct rq_flags rf;
+ int cpu;
+
+ /* sanity check and clear */
+ if (cpu_preferred(rq->cpu)) {
+ scoped_guard (rq_lock, rq)
+ rq->push_task_work_done = 0;
+ put_task_struct(p);
+ return 0;
+ }
+
+ raw_spin_lock_irq(&p->pi_lock);
+
+ /* This could take rq lock. So call it before rq lock is taken */
+ cpu = select_fallback_rq(rq->cpu, p);
+ rq_lock(rq, &rf);
+ rq->push_task_work_done = 0;
+ update_rq_clock(rq);
+
+ context_unsafe_alias(rq);
+
+ if (task_rq(p) == rq && task_on_rq_queued(p))
+ rq = __migrate_task(rq, &rf, p, cpu);
+
+ rq_unlock(rq, &rf);
+ raw_spin_unlock_irq(&p->pi_lock);
+ put_task_struct(p);
+
+ return 0;
+}
+
+/*
+ * Push the current task running on non-preferred CPU.
+ * Using this non preferred CPU will lead to more vCPU preemptions
+ * in the host. So it is better not to use this CPU.
+ *
+ * Since task is running, call a stopper to push the task out. This is
+ * similar to how task moves during hotplug. In select_fallback_rq a
+ * preferred CPU will be chosen and henceforth task shouldn't come back to
+ * this CPU again.
+ *
+ * Works for FAIR class only
+ *
+ * If task is affined only non-preferred CPUs, it can't be moved out
+ */
+void sched_push_current_non_preferred_cpu(struct rq *rq)
+{
+ struct task_struct *push_task = rq->curr;
+
+ /* Preferred feature works only for FAIR class */
+ if (push_task->sched_class != &fair_sched_class)
+ return;
+
+ if (kthread_is_per_cpu(push_task) ||
+ is_migration_disabled(push_task))
+ return;
+
+ /* Don't push the task if it is affined only on non preferred CPUs */
+ if (!task_has_preferred_cpus(push_task))
+ return;
+
+ /* There is already a stopper thread for this. Dont race with it. */
+ if (rq->push_task_work_done == 1)
+ return;
+
+ /* sched_tick runs with interrupts disabled. */
+ get_task_struct(push_task);
+
+ scoped_guard (rq_lock, rq)
+ rq->push_task_work_done = 1;
+
+ stop_one_cpu_nowait(rq->cpu, sched_non_preferred_cpu_push_stop,
+ push_task, this_cpu_ptr(&npc_push_task_work));
+}
+#endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 36ae20310891..711fc8bd7ebc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1277,6 +1277,8 @@ struct rq {
struct list_head cfs_tasks;
+ bool push_task_work_done;
+
struct sched_avg avg_rt;
struct sched_avg avg_dl;
#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
@@ -4239,4 +4241,10 @@ static inline bool task_has_preferred_cpus(struct task_struct *p)
return cpumask_intersects(p->cpus_ptr, cpu_preferred_mask);
}
+#ifdef CONFIG_PREFERRED_CPU
+void sched_push_current_non_preferred_cpu(struct rq *rq);
+#else /* !CONFIG_PREFERRED_CPU */
+static inline void sched_push_current_non_preferred_cpu(struct rq *rq) { }
+#endif
+
#endif /* _KERNEL_SCHED_SCHED_H */
--
2.47.3
next prev parent reply other threads:[~2026-07-01 14:19 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-07-01 14:16 [PATCH v6 00/23] sched: Introduce cpu_preferred_mask and steal-driven vCPU backoff Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 01/23] sched/docs: Document cpu_preferred_mask and Preferred CPU concept Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 02/23] kconfig: Provide PREFERRED_CPU option Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 03/23] cpumask: Introduce cpu_preferred_mask Shrikanth Hegde
2026-07-01 15:35 ` Yury Norov
2026-07-01 16:40 ` Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 04/23] sysfs: Add preferred CPU file Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 05/23] sched/core: Try to use a preferred CPU in is_cpu_allowed Shrikanth Hegde
2026-07-01 16:09 ` Yury Norov
2026-07-01 16:49 ` Shrikanth Hegde
2026-07-02 6:30 ` Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 06/23] sched/fair: Load balance only among preferred CPUs Shrikanth Hegde
2026-07-01 16:19 ` Yury Norov
2026-07-01 16:41 ` Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 07/23] sched/fair: Pull the load on preferred CPU Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 08/23] sched/core: Keep tick on non-preferred CPUs until tasks are out Shrikanth Hegde
2026-07-01 14:16 ` Shrikanth Hegde [this message]
2026-07-01 16:50 ` [PATCH v6 09/23] sched/core: Push current task from non preferred CPU Yury Norov
2026-07-01 17:03 ` Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 10/23] sched/debug: Add migration stats due to non preferred CPUs Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 11/23] virt/steal_monitor: Add documentation Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 12/23] virt: Introduce steal monitor driver Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 13/23] virt/steal_monitor: Restore to active on module disable Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 14/23] virt/steal_monitor: Define steal_monitor structure Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 15/23] virt/steal_monitor: Add control knobs for handling steal values Shrikanth Hegde
2026-07-02 7:16 ` Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 16/23] virt/steal_monitor: Compute work at regular intervals Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 17/23] virt/steal_monitor: Provide default method to get systemwide steal time Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 18/23] virt/steal_monitor: Provide default method to inc/dec preferred CPUs Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 19/23] virt/steal_monitor: Provide default method to get num of CPUs for steal ratio Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 20/23] virt/steal_monitor: Act on steal values at regular intervals Shrikanth Hegde
2026-07-02 7:18 ` Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 21/23] virt/steal_monitor: Add direction control Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 22/23] virt/steal_monitor: Add design check of preferred subset of active Shrikanth Hegde
2026-07-01 14:16 ` [PATCH v6 23/23] virt/steal_monitor: Optimise decrease_preferred_cpus when all CPUs are housekeeping Shrikanth Hegde
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260701141654.500125-10-sshegde@linux.ibm.com \
--to=sshegde@linux.ibm.com \
--cc=arighi@nvidia.com \
--cc=chleroy@kernel.org \
--cc=christian.loehle@arm.com \
--cc=corbet@lwn.net \
--cc=dietmar.eggemann@arm.com \
--cc=frederic@kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=hdanton@sina.com \
--cc=huschle@linux.ibm.com \
--cc=iii@linux.ibm.com \
--cc=juri.lelli@redhat.com \
--cc=kernellwp@gmail.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maddy@linux.ibm.com \
--cc=maz@kernel.org \
--cc=mingo@kernel.org \
--cc=pauld@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rdunlap@infradead.org \
--cc=rostedt@goodmis.org \
--cc=seanjc@google.com \
--cc=srikar@linux.ibm.com \
--cc=tglx@kernel.org \
--cc=tj@kernel.org \
--cc=tommaso.cucinotta@gmail.com \
--cc=vincent.guittot@linaro.org \
--cc=vineeth@bitbyteword.org \
--cc=vschneid@redhat.com \
--cc=yury.norov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox