From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 10 Apr 2026 15:17:40 +0530
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v2 00/17] sched/paravirt: Introduce cpu_preferred_mask and steal-driven vCPU backoff
To: linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, kprateek.nayak@amd.com,
 vschneid@redhat.com, iii@linux.ibm.com, huschle@linux.ibm.com,
 rostedt@goodmis.org, dietmar.eggemann@arm.com, mgorman@suse.de,
 bsegall@google.com, maddy@linux.ibm.com, srikar@linux.ibm.com,
 hdanton@sina.com, chleroy@kernel.org, vineeth@bitbyteword.org,
 joelagnelf@nvidia.com, mingo@kernel.org, peterz@infradead.org,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, tglx@linutronix.de,
 yury.norov@gmail.com, gregkh@linuxfoundation.org
References: <20260407191950.643549-1-sshegde@linux.ibm.com>
From: Shrikanth Hegde
Content-Language: en-US
In-Reply-To: <20260407191950.643549-1-sshegde@linux.ibm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 4/8/26 12:49 AM, Shrikanth Hegde wrote:
> In the virtualized environment, often there is vCPU overcommit. i.e. sum
> of CPUs in all guests(virtual CPU aka vCPU) exceed the underlying physical CPU
> (managed by host aka pCPU).

Here is a patch that allows writing a custom CPU list into the preferred
CPUs mask. It lets one echo specific CPUs based on the hardware topology.
This could be used to find the different patterns across hardware and the
kind of arch-specific hooks one might need if the generic STEAL_MONITOR
can't cater to all needs.

Note: This disables the generic steal monitor when a custom mask is
provided and re-enables it once an empty mask is echoed.

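To make the intended semantics easier to review, here is a rough user-space
model of the decision logic in preferred_store(). It is only a sketch: the
struct and function names (preferred_state, model_preferred_store) are
hypothetical, CPU masks are reduced to a 64-bit word, and the nohz_full tick
handling is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model state, not a kernel structure. */
struct preferred_state {
	uint64_t preferred;          /* stands in for __cpu_preferred_mask */
	int generic_steal_disabled;  /* stands in for disable_generic_steal_mon */
};

/* Count set bits; plays the role of cpumask_weight() in this model. */
static int mask_weight(uint64_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Mirror the order of checks in the patch:
 *  - a mask equal to the online mask is rejected,
 *  - a mask with more bits than online CPUs is rejected,
 *  - an empty mask restores "all online CPUs preferred" and
 *    re-enables the generic steal monitor,
 *  - any other mask is taken as-is and disables the generic monitor.
 * Returns 0 on success or -EINVAL; state is untouched on error.
 */
static int model_preferred_store(struct preferred_state *st,
				 uint64_t online, uint64_t requested)
{
	if (requested == online)
		return -EINVAL;
	if (mask_weight(requested) > mask_weight(online))
		return -EINVAL;

	if (requested == 0) {
		st->generic_steal_disabled = 0;
		st->preferred = online;
	} else {
		st->generic_steal_disabled = 1;
		st->preferred = requested;
	}
	return 0;
}
```

With 8 online CPUs (mask 0xFF), requesting 0x0F in this model marks CPUs 0-3
as preferred and turns the generic monitor off, requesting 0xFF is rejected,
and requesting an empty mask goes back to 0xFF with the generic monitor on.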
---
 drivers/base/cpu.c    | 54 ++++++++++++++++++++++++++++++++++++++++++-
 include/linux/sched.h |  3 +++
 kernel/sched/core.c   |  4 ++++
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 0a6cf37f2001..133f28b15906 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -392,12 +392,64 @@ static int cpu_uevent(const struct device *dev, struct kobj_uevent_env *env)
 #endif
 
 #ifdef CONFIG_PARAVIRT
+static ssize_t preferred_store(struct device *dev,
+			       struct device_attribute *attr,
+			       const char *buf, size_t count)
+{
+	cpumask_var_t temp_mask;
+	int retval = 0;
+	int cpu;
+
+	if (!alloc_cpumask_var(&temp_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	retval = cpulist_parse(buf, temp_mask);
+	if (retval)
+		goto free_mask;
+
+	/* ALL cpus can't be marked as paravirt */
+	if (cpumask_equal(temp_mask, cpu_online_mask)) {
+		retval = -EINVAL;
+		goto free_mask;
+	}
+	if (cpumask_weight(temp_mask) > num_online_cpus()) {
+		retval = -EINVAL;
+		goto free_mask;
+	}
+
+	/* Echoing > means all CPUs are preferred and Enables generic steal monitor */
+	if (cpumask_empty(temp_mask)) {
+		static_branch_disable(&disable_generic_steal_mon);
+		cpumask_copy((struct cpumask *)&__cpu_preferred_mask, cpu_online_mask);
+
+	} else {
+		/*
+		 * Explicit Specification of Usable CPUs and Disables generic steal
+		 * monitor
+		 */
+		static_branch_enable(&disable_generic_steal_mon);
+		cpumask_copy((struct cpumask *)&__cpu_preferred_mask, temp_mask);
+
+		/* Enable tick on nohz_full cpu */
+		for_each_cpu_andnot(cpu, cpu_online_mask, temp_mask) {
+			if (tick_nohz_full_cpu(cpu))
+				tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
+		}
+	}
+
+	retval = count;
+
+free_mask:
+	free_cpumask_var(temp_mask);
+	return retval;
+}
+
 static ssize_t preferred_show(struct device *dev,
 			      struct device_attribute *attr, char *buf)
 {
 	return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(cpu_preferred_mask));
 }
-static DEVICE_ATTR_RO(preferred);
+static DEVICE_ATTR_RW(preferred);
 #endif
 
 const struct bus_type cpu_subsys = {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6c0d5d36f21c..3760c8047ffe 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2515,4 +2515,7 @@ extern void migrate_enable(void);
 
 DEFINE_LOCK_GUARD_0(migrate, migrate_disable(), migrate_enable())
 
+#ifdef CONFIG_PARAVIRT
+DECLARE_STATIC_KEY_FALSE(disable_generic_steal_mon);
+#endif
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cb9110f95ebf..680da55070f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11339,6 +11339,7 @@ void sched_push_current_non_preferred_cpu(struct rq *rq)
 }
 
 struct steal_monitor_t steal_mon;
+DEFINE_STATIC_KEY_FALSE(disable_generic_steal_mon);
 
 void sched_init_steal_monitor(void)
 {
@@ -11428,6 +11429,9 @@ void sched_trigger_steal_computation(int cpu)
 	if (likely(cpu != first_hk_cpu))
 		return;
 
+	if (static_branch_unlikely(&disable_generic_steal_mon))
+		return;
+
 	/*
 	 * Since everything is updated by first housekeeping CPU,
 	 * There is no need for complex syncronization.
-- 
2.47.3