From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: linux-kernel@vger.kernel.org, mingo@kernel.org,
	peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
	yury.norov@gmail.com, kprateek.nayak@amd.com, iii@linux.ibm.com
Cc: sshegde@linux.ibm.com, tglx@kernel.org, gregkh@linuxfoundation.org,
	pbonzini@redhat.com, seanjc@google.com, vschneid@redhat.com,
	huschle@linux.ibm.com, rostedt@goodmis.org, dietmar.eggemann@arm.com,
	mgorman@suse.de, bsegall@google.com, maddy@linux.ibm.com,
	srikar@linux.ibm.com, hdanton@sina.com, chleroy@kernel.org,
	vineeth@bitbyteword.org, frederic@kernel.org, arighi@nvidia.com,
	pauld@redhat.com, christian.loehle@arm.com, tj@kernel.org,
	tommaso.cucinotta@gmail.com, maz@kernel.org, rafael@kernel.org
Subject: [PATCH v3 11/20] sched/core: Push current task from non preferred CPU
Date: Thu, 14 May 2026 20:51:55 +0530
Message-ID: <20260514152204.481115-12-sshegde@linux.ibm.com>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260514152204.481115-1-sshegde@linux.ibm.com>
References: <20260514152204.481115-1-sshegde@linux.ibm.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Actively push out a task running on a non-preferred CPU. Since the task
is currently running on that CPU, the CPU must be stopped to push the
task out. However, if the task is pinned only to non-preferred CPUs, it
will continue running there. This helps maintain userspace affinities,
unlike CPU hotplug or isolated cpusets.

Though the code is almost the same as __balance_push_cpu_stop and quite
close to push_cpu_stop, it is kept separate as it provides a cleaner
implementation w.r.t CONFIG_HOTPLUG_CPU.

Add a push_task_work_done flag to protect the stop-work buffer.

Works for all classes. Best results today are with FAIR/RT.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 kernel/sched/core.c  | 87 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |  7 ++++
 2 files changed, 94 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 86fa4bfaead0..508773e71929 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5678,6 +5678,9 @@ void sched_tick(void)
 	unsigned long hw_pressure;
 	u64 resched_latency;
 
+	if (!cpu_preferred(cpu))
+		sched_push_current_non_preferred_cpu(rq);
+
 	if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE))
 		arch_scale_freq_tick();
 
@@ -11263,3 +11266,87 @@ void sched_change_end(struct sched_change_ctx *ctx)
 			p->sched_class->prio_changed(rq, p, ctx->prio);
 	}
 }
+
+#ifdef CONFIG_PREFERRED_CPU
+/* npc - non preferred CPU */
+static DEFINE_PER_CPU(struct cpu_stop_work, npc_push_task_work);
+
+static int sched_non_preferred_cpu_push_stop(void *arg)
+{
+	struct task_struct *p = arg;
+	struct rq *rq = this_rq();
+	struct rq_flags rf;
+	int cpu;
+
+	raw_spin_lock_irq(&p->pi_lock);
+	rq_lock(rq, &rf);
+	rq->push_task_work_done = 0;
+
+	update_rq_clock(rq);
+
+	if (task_rq(p) == rq && task_on_rq_queued(p)) {
+		cpu = select_fallback_rq(rq->cpu, p);
+		rq = __migrate_task(rq, &rf, p, cpu);
+	}
+
+	rq_unlock(rq, &rf);
+	raw_spin_unlock_irq(&p->pi_lock);
+	put_task_struct(p);
+
+	return 0;
+}
+
+/*
+ * Push the current task running on a non-preferred CPU.
+ * Using this non-preferred CPU will lead to more vCPU preemptions
+ * in the host. So it is better not to use this CPU.
+ *
+ * Since the task is running, call a stopper to push the task out. This is
+ * similar to how tasks move during hotplug. In select_fallback_rq a
+ * preferred CPU will be chosen and henceforth the task shouldn't come back
+ * to this CPU again.
+ *
+ * Works for FAIR/RT class only.
+ *
+ * If a task is affined only to non-preferred CPUs, it can't be moved out.
+ */
+void sched_push_current_non_preferred_cpu(struct rq *rq)
+{
+	struct task_struct *push_task = rq->curr;
+	unsigned long flags;
+	struct rq_flags rf;
+
+	/* sanity check */
+	if (cpu_preferred(rq->cpu))
+		return;
+
+	/* Push only if it is FAIR/RT class */
+	if (push_task->sched_class != &fair_sched_class &&
+	    push_task->sched_class != &rt_sched_class)
+		return;
+
+	if (kthread_is_per_cpu(push_task) ||
+	    is_migration_disabled(push_task))
+		return;
+
+	/* Is there any preferred CPU in the affinity list? */
+	if (!task_has_preferred_cpus(push_task))
+		return;
+
+	/* There is already a stopper thread for this. Don't race with it */
+	if (rq->push_task_work_done == 1)
+		return;
+
+	local_irq_save(flags);
+
+	get_task_struct(push_task);
+
+	rq_lock(rq, &rf);
+	rq->push_task_work_done = 1;
+	rq_unlock(rq, &rf);
+
+	stop_one_cpu_nowait(rq->cpu, sched_non_preferred_cpu_push_stop,
+			    push_task, this_cpu_ptr(&npc_push_task_work));
+	local_irq_restore(flags);
+}
+#endif

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 90743b9e5add..96870021a842 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1244,6 +1244,7 @@ struct rq {
 	unsigned char		nohz_idle_balance;
 	unsigned char		idle_balance;
+	bool			push_task_work_done;
 
 	unsigned long		misfit_task_load;
 
@@ -4138,4 +4139,10 @@ static inline bool task_has_preferred_cpus(struct task_struct *p)
 	return cpumask_intersects(p->cpus_ptr, cpu_preferred_mask);
 }
 
+#ifdef CONFIG_PREFERRED_CPU
+void sched_push_current_non_preferred_cpu(struct rq *rq);
+#else /* !CONFIG_PREFERRED_CPU */
+static inline void sched_push_current_non_preferred_cpu(struct rq *rq) { }
+#endif
+
 #endif /* _KERNEL_SCHED_SCHED_H */
-- 
2.47.3