From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: Xuewen Yan <xuewen.yan@unisoc.com>,
	mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, lukasz.luba@arm.com,
	linux-kernel@vger.kernel.org, rui.zhang@intel.com,
	di.shen@unisoc.com, ke.wang@unisoc.com, xuewen.yan94@gmail.com,
	ubizjak@gmail.com, Marco Elver <elver@google.com>
Subject: Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
Date: Thu, 19 Mar 2026 10:02:40 +0100
Message-ID: <20260319090240.GS3738010@noisy.programming.kicks-ass.net>
In-Reply-To: <abtMmzntD4XCrG2M@slm.duckdns.org>

On Wed, Mar 18, 2026 at 03:08:43PM -1000, Tejun Heo wrote:
> On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >  
> > >  unsigned long sched_cpu_util(int cpu)
> > >  {
> > > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > +	unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > +	if (!scx_switched_all())
> > > +		util += cpu_util_cfs(cpu);
> > > +
> > > +	return effective_cpu_util(cpu, util, NULL, NULL);
> > >  }
> > 
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
> > 
> > This wants to be something like:
> > 
> > unsigned long sched_cpu_util(int cpu)
> > {
> > 	unsigned long util = cpu_util_cfs(cpu);
> > 
> > 	if (scx_enabled()) {
> > 		unsigned long scx_util = scx_cpuperf_target(cpu);
> > 
> > 		if (!scx_switched_all())
> > 			scx_util += util;
> > 
> > 		util = scx_util;
> > 	}
> > 
> > 	return effective_cpu_util(cpu, util, NULL, NULL);
> > }
> 
> scx_switched_all() is an unlikely static branch just like scx_enabled() and
> scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> fair path between the two versions is two noop run-throughs vs. one. Either
> way is fine but it is more code for likely no discernible gain.

(added noinline to effective_cpu_util() for clarity)

So the original patch generates this:

sched_cpu_util:
1c5240:  sched_cpu_util+0x0       endbr64
1c5244:  sched_cpu_util+0x4       call   0x1c5249 <__fentry__>
1c5249:  sched_cpu_util+0x9       push   %rbp
1c524a:  sched_cpu_util+0xa       push   %rbx
1c524b:  sched_cpu_util+0xb       mov    %edi,%ebx
1c524d:  sched_cpu_util+0xd       <jump_table.1c524d>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c5271 <sched_cpu_util+0x31>    (if JUMP)
1c524f:  sched_cpu_util+0xf       xor    %ebp,%ebp
1c5251:  sched_cpu_util+0x11      <jump_table.1c5251>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c5261 <sched_cpu_util+0x21>    (if JUMP)
1c5253:  sched_cpu_util+0x13      xor    %edx,%edx
1c5255:  sched_cpu_util+0x15      xor    %esi,%esi
1c5257:  sched_cpu_util+0x17      mov    %ebx,%edi
1c5259:  sched_cpu_util+0x19      call   0x1bc5b0 <cpu_util.constprop.0>
1c525e:  sched_cpu_util+0x1e      add    %rax,%rbp
1c5261:  sched_cpu_util+0x21      mov    %rbp,%rsi
1c5264:  sched_cpu_util+0x24      mov    %ebx,%edi
1c5266:  sched_cpu_util+0x26      xor    %ecx,%ecx
1c5268:  sched_cpu_util+0x28      pop    %rbx
1c5269:  sched_cpu_util+0x29      xor    %edx,%edx
1c526b:  sched_cpu_util+0x2b      pop    %rbp
1c526c:  sched_cpu_util+0x2c      jmp    0x1c5160 <effective_cpu_util>

(slowpath)

1c5271:  sched_cpu_util+0x31      movslq %edi,%rdx
1c5274:  sched_cpu_util+0x34      mov    $0x0,%rax
1c527b:  sched_cpu_util+0x3b      mov    0x0(,%rdx,8),%rdx
1c5283:  sched_cpu_util+0x43      mov    0xa34(%rdx,%rax,1),%ebp
1c528a:  sched_cpu_util+0x4a      jmp    0x1c5251 <sched_cpu_util+0x11>


While my proposal generates this:

sched_cpu_util:
1c5240:  sched_cpu_util+0x0       endbr64
1c5244:  sched_cpu_util+0x4       call   0x1c5249 <__fentry__>
1c5249:  sched_cpu_util+0x9       push   %rbx
1c524a:  sched_cpu_util+0xa       xor    %esi,%esi
1c524c:  sched_cpu_util+0xc       xor    %edx,%edx
1c524e:  sched_cpu_util+0xe       mov    %edi,%ebx
1c5250:  sched_cpu_util+0x10      call   0x1bc5b0 <cpu_util.constprop.0>
1c5255:  sched_cpu_util+0x15      mov    %rax,%rsi
1c5258:  sched_cpu_util+0x18      <jump_table.1c5258>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c5266 <sched_cpu_util+0x26>    (if JUMP)
1c525a:  sched_cpu_util+0x1a      mov    %ebx,%edi
1c525c:  sched_cpu_util+0x1c      xor    %ecx,%ecx
1c525e:  sched_cpu_util+0x1e      xor    %edx,%edx
1c5260:  sched_cpu_util+0x20      pop    %rbx
1c5261:  sched_cpu_util+0x21      jmp    0x1c5160 <effective_cpu_util>

(slowpath)

1c5266:  sched_cpu_util+0x26      <jump_table.1c5266>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c527b <sched_cpu_util+0x3b>    (if JUMP)
1c5268:  sched_cpu_util+0x28      xor    %eax,%eax
1c526a:  sched_cpu_util+0x2a      <jump_table.1c526a>
                                  = nop2                                   (if DEFAULT)
                                  = jmp    1c5296 <sched_cpu_util+0x56>    (if JUMP)
1c526c:  sched_cpu_util+0x2c      mov    %ebx,%edi
1c526e:  sched_cpu_util+0x2e      add    %rax,%rsi
1c5271:  sched_cpu_util+0x31      xor    %ecx,%ecx
1c5273:  sched_cpu_util+0x33      xor    %edx,%edx
1c5275:  sched_cpu_util+0x35      pop    %rbx
1c5276:  sched_cpu_util+0x36      jmp    0x1c5160 <effective_cpu_util>
1c527b:  sched_cpu_util+0x3b      movslq %ebx,%rdx
1c527e:  sched_cpu_util+0x3e      mov    $0x0,%rax
1c5285:  sched_cpu_util+0x45      mov    0x0(,%rdx,8),%rdx
1c528d:  sched_cpu_util+0x4d      mov    0xa34(%rdx,%rax,1),%eax
1c5294:  sched_cpu_util+0x54      jmp    0x1c526a <sched_cpu_util+0x2a>
1c5296:  sched_cpu_util+0x56      mov    %ebx,%edi
1c5298:  sched_cpu_util+0x58      mov    %rax,%rsi
1c529b:  sched_cpu_util+0x5b      xor    %ecx,%ecx
1c529d:  sched_cpu_util+0x5d      xor    %edx,%edx
1c529f:  sched_cpu_util+0x5f      pop    %rbx
1c52a0:  sched_cpu_util+0x60      jmp    0x1c5160 <effective_cpu_util>


That fastpath is definitely better; the slowpath is worse, but that is
in part because the compilers are stupid and cannot eliminate the
redundant static_branch(): scx_enabled() is tested again inside
scx_cpuperf_target().
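
For reference, a minimal user-space sketch of the two C variants quoted
above. It assumes plain bools in place of the static keys and a made-up
struct model for the per-CPU values; effective_cpu_util() is left out,
so each function returns the util value that would be handed to it. All
model_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: in the kernel these are static keys and per-CPU data. */
struct model {
	bool scx_enabled;        /* models scx_enabled() */
	bool scx_switched_all;   /* models scx_switched_all() */
	unsigned long cfs_util;  /* models cpu_util_cfs(cpu) */
	unsigned long scx_perf;  /* models the value read by scx_cpuperf_target(cpu) */
};

/* Per the thread, scx_cpuperf_target() has scx_enabled() in it; modelled as 0 when disabled. */
unsigned long model_scx_cpuperf_target(const struct model *m)
{
	return m->scx_enabled ? m->scx_perf : 0;
}

/* Shape of the original patch: both checks sit on the common path. */
unsigned long model_util_original(const struct model *m)
{
	unsigned long util = model_scx_cpuperf_target(m);

	if (!m->scx_switched_all)
		util += m->cfs_util;

	return util;
}

/* Shape of the proposal: a single scx_enabled() check guards all ext work. */
unsigned long model_util_proposed(const struct model *m)
{
	unsigned long util = m->cfs_util;

	if (m->scx_enabled) {
		unsigned long scx_util = model_scx_cpuperf_target(m);

		if (!m->scx_switched_all)
			scx_util += util;

		util = scx_util;
	}

	return util;
}

/* Count states where the variants disagree, skipping switched_all without enabled. */
int model_count_mismatches(void)
{
	static const unsigned long vals[] = { 0, 1, 300, 1024 };
	int bad = 0;

	for (int en = 0; en < 2; en++)
		for (int all = 0; all < 2; all++)
			for (int i = 0; i < 4; i++)
				for (int j = 0; j < 4; j++) {
					struct model m = { en, all, vals[i], vals[j] };

					if (all && !en)
						continue; /* assumed unreachable state */
					if (model_util_original(&m) != model_util_proposed(&m))
						bad++;
				}
	return bad;
}

void model_self_check(void)
{
	assert(model_count_mismatches() == 0);
}
```

Under the assumption that scx_switched_all() implies scx_enabled(), the
two variants compute the same value in every state, so the difference
discussed here is only in the generated code layout.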

/me goes and tries again ... Yeah, the patch below does nothing :-( The
compiler still happily emits scx_enabled() twice.
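
A small user-space sketch of the distinction at play, with hypothetical
demo_* names. The GCC documentation says that a const function's result
depends only on its arguments, so repeated calls may be merged, and that
an asm goto statement is always implicitly volatile. An attribute on an
always-inline function also describes calls that no longer exist once
the body is inlined. Whether that is the actual reason the patch has no
effect is an assumption here, not something established in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Local spelling of the attribute for this sketch only. */
#define demo_const __attribute__((__const__))

/*
 * An ordinary out-of-line const function: the compiler is allowed
 * (not required) to fold repeated calls with the same argument.
 */
demo_const int demo_key_value(int key)
{
	return key & 1;
}

/*
 * Rough shape of a static branch: an asm goto that may jump to l_yes.
 * The template is empty here, so at run time it always falls through.
 */
static inline __attribute__((__always_inline__)) bool demo_static_branch(void)
{
	__asm__ goto("" : : : : l_yes);
	return false;
l_yes:
	return true;
}

/* Two back-to-back tests; each inlined copy carries its own asm goto. */
int demo_count_taken(void)
{
	int taken = 0;

	if (demo_static_branch())
		taken++;
	if (demo_static_branch())
		taken++;

	return taken;
}

void demo_self_check(void)
{
	assert(demo_key_value(3) == 1);
	assert(demo_count_taken() == 0);
}
```

Comparing the output of "gcc -O2 -S" for demo_count_taken() against a
variant that calls demo_key_value() twice is one way to see whether a
given compiler merges either form.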


---
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 05b16299588d..47cd1a1f9784 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -32,7 +32,7 @@
 	JUMP_TABLE_ENTRY(key, label)
 #endif /* CONFIG_HAVE_JUMP_LABEL_HACK */
 
-static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch(struct static_key * const key, const bool branch)
 {
 	asm goto(ARCH_STATIC_BRANCH_ASM("%c0 + %c1", "%l[l_yes]")
 		: :  "i" (key), "i" (branch) : : l_yes);
@@ -42,7 +42,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, co
 	return true;
 }
 
-static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch_jump(struct static_key * const key, const bool branch)
 {
 	asm goto("1:"
 		"jmp %l[l_yes]\n\t"
diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index c16d4199bf92..553fc9f3f7eb 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -312,6 +312,7 @@
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-pure-function-attribute
  */
 #define __pure                          __attribute__((__pure__))
+#define __const                         __attribute__((__const__))
 
 /*
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-section-function-attribute

