From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 19 Mar 2026 10:02:40 +0100
From: Peter Zijlstra
To: Tejun Heo
Cc: Xuewen Yan, mingo@redhat.com, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 lukasz.luba@arm.com, linux-kernel@vger.kernel.org, rui.zhang@intel.com,
 di.shen@unisoc.com, ke.wang@unisoc.com, xuewen.yan94@gmail.com,
 ubizjak@gmail.com, Marco Elver
Subject: Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Message-ID: <20260319090240.GS3738010@noisy.programming.kicks-ass.net>
References: <20260318121755.16354-1-xuewen.yan@unisoc.com>
 <20260318124718.GC3738786@noisy.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Mar 18, 2026 at 03:08:43PM -1000, Tejun Heo wrote:
> On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >  
> > >  unsigned long sched_cpu_util(int cpu)
> > >  {
> > > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > +	unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > +	if (!scx_switched_all())
> > > +		util += cpu_util_cfs(cpu);
> > > +
> > > +	return effective_cpu_util(cpu, util, NULL, NULL);
> > >  }
> > 
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
> > 
> > This wants to be something like:
> > 
> > unsigned long sched_cpu_util(int cpu)
> > {
> > 	unsigned long util = cpu_util_cfs(cpu);
> > 
> > 	if (scx_enabled()) {
> > 		unsigned long scx_util = scx_cpuperf_target(cpu);
> > 
> > 		if (!scx_switched_all())
> > 			scx_util += util;
> > 
> > 		util = scx_util;
> > 	}
> > 
> > 	return effective_cpu_util(cpu, util, NULL, NULL);
> > }
> 
> scx_switched_all() is an unlikely static branch just like scx_enabled() and
> scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> fair path between the two versions is two noop run-throughs vs. one. Either
> way is fine but it is more code for likely no discernible gain.
(added noinline to effective_cpu_util() for clarity)

So the original patch generates this:

sched_cpu_util:
 1c5240: sched_cpu_util+0x0	endbr64
 1c5244: sched_cpu_util+0x4	call 0x1c5249 <__fentry__>
 1c5249: sched_cpu_util+0x9	push %rbp
 1c524a: sched_cpu_util+0xa	push %rbx
 1c524b: sched_cpu_util+0xb	mov %edi,%ebx
 1c524d: sched_cpu_util+0xd	= nop2 (if DEFAULT) = jmp 1c5271 (if JUMP)
 1c524f: sched_cpu_util+0xf	xor %ebp,%ebp
 1c5251: sched_cpu_util+0x11	= nop2 (if DEFAULT) = jmp 1c5261 (if JUMP)
 1c5253: sched_cpu_util+0x13	xor %edx,%edx
 1c5255: sched_cpu_util+0x15	xor %esi,%esi
 1c5257: sched_cpu_util+0x17	mov %ebx,%edi
 1c5259: sched_cpu_util+0x19	call 0x1bc5b0
 1c525e: sched_cpu_util+0x1e	add %rax,%rbp
 1c5261: sched_cpu_util+0x21	mov %rbp,%rsi
 1c5264: sched_cpu_util+0x24	mov %ebx,%edi
 1c5266: sched_cpu_util+0x26	xor %ecx,%ecx
 1c5268: sched_cpu_util+0x28	pop %rbx
 1c5269: sched_cpu_util+0x29	xor %edx,%edx
 1c526b: sched_cpu_util+0x2b	pop %rbp
 1c526c: sched_cpu_util+0x2c	jmp 0x1c5160 (slowpath)
 1c5271: sched_cpu_util+0x31	movslq %edi,%rdx
 1c5274: sched_cpu_util+0x34	mov $0x0,%rax
 1c527b: sched_cpu_util+0x3b	mov 0x0(,%rdx,8),%rdx
 1c5283: sched_cpu_util+0x43	mov 0xa34(%rdx,%rax,1),%ebp
 1c528a: sched_cpu_util+0x4a	jmp 0x1c5251

While my proposal generates this:

sched_cpu_util:
 1c5240: sched_cpu_util+0x0	endbr64
 1c5244: sched_cpu_util+0x4	call 0x1c5249 <__fentry__>
 1c5249: sched_cpu_util+0x9	push %rbx
 1c524a: sched_cpu_util+0xa	xor %esi,%esi
 1c524c: sched_cpu_util+0xc	xor %edx,%edx
 1c524e: sched_cpu_util+0xe	mov %edi,%ebx
 1c5250: sched_cpu_util+0x10	call 0x1bc5b0
 1c5255: sched_cpu_util+0x15	mov %rax,%rsi
 1c5258: sched_cpu_util+0x18	= nop2 (if DEFAULT) = jmp 1c5266 (if JUMP)
 1c525a: sched_cpu_util+0x1a	mov %ebx,%edi
 1c525c: sched_cpu_util+0x1c	xor %ecx,%ecx
 1c525e: sched_cpu_util+0x1e	xor %edx,%edx
 1c5260: sched_cpu_util+0x20	pop %rbx
 1c5261: sched_cpu_util+0x21	jmp 0x1c5160 (slowpath)
 1c5266: sched_cpu_util+0x26	= nop2 (if DEFAULT) = jmp 1c527b (if JUMP)
 1c5268: sched_cpu_util+0x28	xor %eax,%eax
 1c526a: sched_cpu_util+0x2a	= nop2 (if DEFAULT) = jmp 1c5296 (if JUMP)
 1c526c: sched_cpu_util+0x2c	mov %ebx,%edi
 1c526e: sched_cpu_util+0x2e	add %rax,%rsi
 1c5271: sched_cpu_util+0x31	xor %ecx,%ecx
 1c5273: sched_cpu_util+0x33	xor %edx,%edx
 1c5275: sched_cpu_util+0x35	pop %rbx
 1c5276: sched_cpu_util+0x36	jmp 0x1c5160
 1c527b: sched_cpu_util+0x3b	movslq %ebx,%rdx
 1c527e: sched_cpu_util+0x3e	mov $0x0,%rax
 1c5285: sched_cpu_util+0x45	mov 0x0(,%rdx,8),%rdx
 1c528d: sched_cpu_util+0x4d	mov 0xa34(%rdx,%rax,1),%eax
 1c5294: sched_cpu_util+0x54	jmp 0x1c526a
 1c5296: sched_cpu_util+0x56	mov %ebx,%edi
 1c5298: sched_cpu_util+0x58	mov %rax,%rsi
 1c529b: sched_cpu_util+0x5b	xor %ecx,%ecx
 1c529d: sched_cpu_util+0x5d	xor %edx,%edx
 1c529f: sched_cpu_util+0x5f	pop %rbx
 1c52a0: sched_cpu_util+0x60	jmp 0x1c5160

That fastpath is definitely better; the slowpath is worse, but that is in
part because the compilers are stupid and cannot eliminate static_branch().

/me goes try again ..

Yeah, the below patch does nothing :-( It will happily emit scx_enabled()
twice.
---
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 05b16299588d..47cd1a1f9784 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -32,7 +32,7 @@ JUMP_TABLE_ENTRY(key, label)
 
 #endif /* CONFIG_HAVE_JUMP_LABEL_HACK */
 
-static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch(struct static_key * const key, const bool branch)
 {
 	asm goto(ARCH_STATIC_BRANCH_ASM("%c0 + %c1", "%l[l_yes]")
 		: : "i" (key), "i" (branch) : : l_yes);
@@ -42,7 +42,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, co
 	return true;
 }
 
-static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch_jump(struct static_key * const key, const bool branch)
 {
 	asm goto("1:"
 		"jmp %l[l_yes]\n\t"
diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index c16d4199bf92..553fc9f3f7eb 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -312,6 +312,7 @@
  * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-pure-function-attribute
  */
 #define __pure __attribute__((__pure__))
+#define __const __attribute__((__const__))
 
 /*
  * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-section-function-attribute