public inbox for linux-kernel@vger.kernel.org
From: Yury Norov <yury.norov@gmail.com>
To: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: I Hsin Cheng <richard120310@gmail.com>,
	linux@rasmusvillemoes.dk, jserv@ccns.ncku.edu.tw,
	mark.rutland@arm.com, linux-kernel@vger.kernel.org,
	eleanor15x@gmail.com
Subject: Re: [PATCH] cpumask: Optimize cpumask_any_but()
Date: Thu, 23 Jan 2025 17:39:45 -0500
Message-ID: <Z5LFMeqZA4K4X7Qz@thinkpad>
In-Reply-To: <Z4tZDcCDt+U69kUF@visitorckw-System-Product-Name>

On Sat, Jan 18, 2025 at 03:32:29PM +0800, Kuan-Wei Chiu wrote:
> Hi Yury,
> 
> On Fri, Jan 17, 2025 at 11:32:54AM -0500, Yury Norov wrote:
> > On Fri, Jan 17, 2025 at 10:59:31PM +0800, I Hsin Cheng wrote:
> > > On Fri, Jan 17, 2025 at 10:26:58PM +0800, Kuan-Wei Chiu wrote:
> > > > The cpumask_any_but() function can avoid using a loop to determine the
> > > > CPU index to return. If the first set bit in the cpumask is not equal
> > > > to the specified CPU, we can directly return the index of the first set
> > > > bit. Otherwise, we return the next set bit's index.
> > > > 
> > > > This optimization replaces the loop with a single if statement,
> > > > allowing the compiler to generate more concise and efficient code.
> > 
> > I thought compilers were smart enough to unroll the loop in this case.
> > Can you show the disassembled code before and after?
> > 
> Since cpumask_any_but() is an inline function, I added the following to
> lib/cpumask.c for convenience:
> 
> unsigned int non_inline_cpumask_any_but(const struct cpumask *mask, unsigned int cpu);
> unsigned int non_inline_cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> {
> 	return cpumask_any_but(mask, cpu);
> }
> 
> I used objdump -d ./lib/cpumask.o to compare the differences.
> 
> * Before the patch:
> 
> 00000000000001f0 <non_inline_cpumask_any_but>:
>  1f0:	f3 0f 1e fa          	endbr64 
>  1f4:	48 8b 3f             	mov    (%rdi),%rdi
>  1f7:	b8 40 00 00 00       	mov    $0x40,%eax
>  1fc:	48 85 ff             	test   %rdi,%rdi
>  1ff:	74 4b                	je     24c <non_inline_cpumask_any_but+0x5c>
>  201:	f3 48 0f bc d7       	tzcnt  %rdi,%rdx
>  206:	89 d0                	mov    %edx,%eax
>  208:	39 d6                	cmp    %edx,%esi
>  20a:	75 40                	jne    24c <non_inline_cpumask_any_but+0x5c>
>  20c:	83 fa 3f             	cmp    $0x3f,%edx
>  20f:	77 3b                	ja     24c <non_inline_cpumask_any_but+0x5c>
>  211:	41 b8 01 00 00 00    	mov    $0x1,%r8d
>  217:	83 c0 01             	add    $0x1,%eax
>  21a:	83 f8 40             	cmp    $0x40,%eax
>  21d:	74 2d                	je     24c <non_inline_cpumask_any_but+0x5c>
>  21f:	89 c1                	mov    %eax,%ecx
>  221:	4c 89 c2             	mov    %r8,%rdx
>  224:	48 d3 e2             	shl    %cl,%rdx
>  227:	48 89 d0             	mov    %rdx,%rax
>  22a:	48 f7 d8             	neg    %rax
>  22d:	48 21 f8             	and    %rdi,%rax
>  230:	74 15                	je     247 <non_inline_cpumask_any_but+0x57>
>  232:	f3 48 0f bc d0       	tzcnt  %rax,%rdx
>  237:	89 d0                	mov    %edx,%eax
>  239:	39 d6                	cmp    %edx,%esi
>  23b:	75 0f                	jne    24c <non_inline_cpumask_any_but+0x5c>
>  23d:	83 fa 3f             	cmp    $0x3f,%edx
>  240:	76 d5                	jbe    217 <non_inline_cpumask_any_but+0x27>
>  242:	e9 00 00 00 00       	jmp    247 <non_inline_cpumask_any_but+0x57>
>  247:	b8 40 00 00 00       	mov    $0x40,%eax
>  24c:	e9 00 00 00 00       	jmp    251 <non_inline_cpumask_any_but+0x61>
> 
> * After the patch:
> 
> 00000000000001f0 <non_inline_cpumask_any_but>:
>  1f0:	f3 0f 1e fa          	endbr64 
>  1f4:	48 8b 17             	mov    (%rdi),%rdx
>  1f7:	48 85 d2             	test   %rdx,%rdx
>  1fa:	74 34                	je     230 <non_inline_cpumask_any_but+0x40>
>  1fc:	f3 48 0f bc ca       	tzcnt  %rdx,%rcx
>  201:	89 c8                	mov    %ecx,%eax
>  203:	39 ce                	cmp    %ecx,%esi
>  205:	75 2e                	jne    235 <non_inline_cpumask_any_but+0x45>
>  207:	83 c1 01             	add    $0x1,%ecx
>  20a:	83 f9 3f             	cmp    $0x3f,%ecx
>  20d:	77 21                	ja     230 <non_inline_cpumask_any_but+0x40>
>  20f:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
>  216:	48 d3 e0             	shl    %cl,%rax
>  219:	48 89 c1             	mov    %rax,%rcx
>  21c:	b8 40 00 00 00       	mov    $0x40,%eax
>  221:	48 21 d1             	and    %rdx,%rcx
>  224:	74 0f                	je     235 <non_inline_cpumask_any_but+0x45>
>  226:	f3 48 0f bc c1       	tzcnt  %rcx,%rax
>  22b:	e9 00 00 00 00       	jmp    230 <non_inline_cpumask_any_but+0x40>
>  230:	b8 40 00 00 00       	mov    $0x40,%eax
>  235:	e9 00 00 00 00       	jmp    23a <non_inline_cpumask_any_but+0x4a>
> 
> > > > 
> > > > As a result, the size of the bzImage built with x86 defconfig is
> > > > reduced by 4096 bytes:
> > > > 
> > > > * Before:
> > > > $ size arch/x86/boot/bzImage
> > > >     text    data     bss      dec     hex filename
> > > > 13537280    1024       0 13538304  ce9400 arch/x86/boot/bzImage
> > > > 
> > > > * After:
> > > > $ size arch/x86/boot/bzImage
> > > >     text    data     bss      dec     hex filename
> > > > 13533184    1024       0 13534208  ce8400 arch/x86/boot/bzImage
> > 
> > Comparing zipped images tells little about code generation. Please use
> > scripts/bloat-o-meter.
> > 
> $ ./scripts/bloat-o-meter ./old_cpumask.o ./new_cpumask.o
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-23 (-23)
> Function                                     old     new   delta
> non_inline_cpumask_any_but                    97      74     -23
> Total: Before=522, After=499, chg -4.41%

No need to introduce a wrapper. You need to build allyesconfig (or
defconfig) before and after your patch, and then run bloat-o-meter
against the old and new vmlinux.

And specifically for cpumasks, can you please run this experiment
with, for example, NR_CPUS == 32 and NR_CPUS == 4096? That way you
will also test the change against the small_cpumask_bits optimization.

Thanks,
Yury
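
For readers following the thread, the rewrite described in the quoted
patch can be modeled in userspace roughly as below. This is only a
sketch under stated assumptions, not the kernel code: the mask is a
single 64-bit word, NBITS, next_bit(), any_but_loop() and
any_but_branch() are hypothetical names made up for illustration, and
__builtin_ctzll() is a GCC/Clang builtin standing in for the kernel's
bit-search helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the whole mask fits in one 64-bit word. */
#define NBITS 64u

/* Index of the first set bit at or after 'start', or NBITS if none. */
static unsigned int next_bit(uint64_t mask, unsigned int start)
{
	if (start >= NBITS)	/* also avoids an out-of-range shift */
		return NBITS;
	mask &= ~UINT64_C(0) << start;
	return mask ? (unsigned int)__builtin_ctzll(mask) : NBITS;
}

/* Loop form: walk the set bits until one differs from 'cpu'. */
static unsigned int any_but_loop(uint64_t mask, unsigned int cpu)
{
	unsigned int i;

	for (i = next_bit(mask, 0); i < NBITS; i = next_bit(mask, i + 1))
		if (i != cpu)
			break;
	return i;
}

/*
 * Branch form: only the first set bit can be equal to 'cpu' and still
 * need skipping; any later set bit has a larger index, so it differs.
 */
static unsigned int any_but_branch(uint64_t mask, unsigned int cpu)
{
	unsigned int i = next_bit(mask, 0);

	if (i != cpu)
		return i;
	return next_bit(mask, i + 1);
}
```

With this model, any_but_branch(0x5, 0) skips bit 0 and returns 2,
while a mask with only bit 0 set returns NBITS (64), i.e. "no other
CPU". That value appears to correspond to the $0x40 loaded in both
listings above. Both forms return the same result for every input;
the open question in this thread is only how the generated code
differs across configurations.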

Thread overview: 6+ messages
2025-01-17 14:26 [PATCH] cpumask: Optimize cpumask_any_but() Kuan-Wei Chiu
2025-01-17 14:59 ` I Hsin Cheng
2025-01-17 16:32   ` Kuan-Wei Chiu
2025-01-17 16:32   ` Yury Norov
2025-01-18  7:32     ` Kuan-Wei Chiu
2025-01-23 22:39       ` Yury Norov [this message]
