From: Yury Norov <yury.norov@gmail.com>
To: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: I Hsin Cheng <richard120310@gmail.com>,
	linux@rasmusvillemoes.dk, jserv@ccns.ncku.edu.tw,
	mark.rutland@arm.com, linux-kernel@vger.kernel.org,
	eleanor15x@gmail.com
Subject: Re: [PATCH] cpumask: Optimize cpumask_any_but()
Date: Thu, 23 Jan 2025 17:39:45 -0500
Message-ID: <Z5LFMeqZA4K4X7Qz@thinkpad>
In-Reply-To: <Z4tZDcCDt+U69kUF@visitorckw-System-Product-Name>

On Sat, Jan 18, 2025 at 03:32:29PM +0800, Kuan-Wei Chiu wrote:
> Hi Yury,
>
> On Fri, Jan 17, 2025 at 11:32:54AM -0500, Yury Norov wrote:
> > On Fri, Jan 17, 2025 at 10:59:31PM +0800, I Hsin Cheng wrote:
> > > On Fri, Jan 17, 2025 at 10:26:58PM +0800, Kuan-Wei Chiu wrote:
> > > > The cpumask_any_but() function can avoid using a loop to determine the
> > > > CPU index to return. If the first set bit in the cpumask is not equal
> > > > to the specified CPU, we can directly return the index of the first set
> > > > bit. Otherwise, we return the next set bit's index.
> > > >
> > > > This optimization replaces the loop with a single if statement,
> > > > allowing the compiler to generate more concise and efficient code.
> >
> > I thought compilers were smart enough to unroll the loop in this case.
> > Can you show the disassembled code before and after?
> >
> Since cpumask_any_but() is an inline function, I added the following to
> lib/cpumask.c for convenience:
>
> unsigned int non_inline_cpumask_any_but(const struct cpumask *mask, unsigned int cpu);
> unsigned int non_inline_cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> {
> 	return cpumask_any_but(mask, cpu);
> }
>
> I used objdump -d ./lib/cpumask.o to compare the differences.
>
> * Before the patch:
>
> 00000000000001f0 <non_inline_cpumask_any_but>:
>  1f0:   f3 0f 1e fa           endbr64
>  1f4:   48 8b 3f              mov    (%rdi),%rdi
>  1f7:   b8 40 00 00 00        mov    $0x40,%eax
>  1fc:   48 85 ff              test   %rdi,%rdi
>  1ff:   74 4b                 je     24c <non_inline_cpumask_any_but+0x5c>
>  201:   f3 48 0f bc d7        tzcnt  %rdi,%rdx
>  206:   89 d0                 mov    %edx,%eax
>  208:   39 d6                 cmp    %edx,%esi
>  20a:   75 40                 jne    24c <non_inline_cpumask_any_but+0x5c>
>  20c:   83 fa 3f              cmp    $0x3f,%edx
>  20f:   77 3b                 ja     24c <non_inline_cpumask_any_but+0x5c>
>  211:   41 b8 01 00 00 00     mov    $0x1,%r8d
>  217:   83 c0 01              add    $0x1,%eax
>  21a:   83 f8 40              cmp    $0x40,%eax
>  21d:   74 2d                 je     24c <non_inline_cpumask_any_but+0x5c>
>  21f:   89 c1                 mov    %eax,%ecx
>  221:   4c 89 c2              mov    %r8,%rdx
>  224:   48 d3 e2              shl    %cl,%rdx
>  227:   48 89 d0              mov    %rdx,%rax
>  22a:   48 f7 d8              neg    %rax
>  22d:   48 21 f8              and    %rdi,%rax
>  230:   74 15                 je     247 <non_inline_cpumask_any_but+0x57>
>  232:   f3 48 0f bc d0        tzcnt  %rax,%rdx
>  237:   89 d0                 mov    %edx,%eax
>  239:   39 d6                 cmp    %edx,%esi
>  23b:   75 0f                 jne    24c <non_inline_cpumask_any_but+0x5c>
>  23d:   83 fa 3f              cmp    $0x3f,%edx
>  240:   76 d5                 jbe    217 <non_inline_cpumask_any_but+0x27>
>  242:   e9 00 00 00 00        jmp    247 <non_inline_cpumask_any_but+0x57>
>  247:   b8 40 00 00 00        mov    $0x40,%eax
>  24c:   e9 00 00 00 00        jmp    251 <non_inline_cpumask_any_but+0x61>
>
> * After the patch:
>
> 00000000000001f0 <non_inline_cpumask_any_but>:
>  1f0:   f3 0f 1e fa           endbr64
>  1f4:   48 8b 17              mov    (%rdi),%rdx
>  1f7:   48 85 d2              test   %rdx,%rdx
>  1fa:   74 34                 je     230 <non_inline_cpumask_any_but+0x40>
>  1fc:   f3 48 0f bc ca        tzcnt  %rdx,%rcx
>  201:   89 c8                 mov    %ecx,%eax
>  203:   39 ce                 cmp    %ecx,%esi
>  205:   75 2e                 jne    235 <non_inline_cpumask_any_but+0x45>
>  207:   83 c1 01              add    $0x1,%ecx
>  20a:   83 f9 3f              cmp    $0x3f,%ecx
>  20d:   77 21                 ja     230 <non_inline_cpumask_any_but+0x40>
>  20f:   48 c7 c0 ff ff ff ff  mov    $0xffffffffffffffff,%rax
>  216:   48 d3 e0              shl    %cl,%rax
>  219:   48 89 c1              mov    %rax,%rcx
>  21c:   b8 40 00 00 00        mov    $0x40,%eax
>  221:   48 21 d1              and    %rdx,%rcx
>  224:   74 0f                 je     235 <non_inline_cpumask_any_but+0x45>
>  226:   f3 48 0f bc c1        tzcnt  %rcx,%rax
>  22b:   e9 00 00 00 00        jmp    230 <non_inline_cpumask_any_but+0x40>
>  230:   b8 40 00 00 00        mov    $0x40,%eax
>  235:   e9 00 00 00 00        jmp    23a <non_inline_cpumask_any_but+0x4a>
>
> > > >
> > > > As a result, the size of the bzImage built with x86 defconfig is
> > > > reduced by 4096 bytes:
> > > >
> > > > * Before:
> > > > $ size arch/x86/boot/bzImage
> > > >     text    data     bss      dec     hex filename
> > > > 13537280    1024       0 13538304  ce9400 arch/x86/boot/bzImage
> > > >
> > > > * After:
> > > > $ size arch/x86/boot/bzImage
> > > >     text    data     bss      dec     hex filename
> > > > 13533184    1024       0 13534208  ce8400 arch/x86/boot/bzImage
> >
> > Comparing zipped images tells little about code generation. Please use
> > scripts/bloat-o-meter.
> >
> $ ./scripts/bloat-o-meter ./old_cpumask.o ./new_cpumask.o
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-23 (-23)
> Function                                     old     new   delta
> non_inline_cpumask_any_but                    97      74     -23
> Total: Before=522, After=499, chg -4.41%

No need to introduce a wrapper. You need to build allyesconfig (or
defconfig) before and after your patch, and then run bloat-o-meter
against the old and new vmlinux.
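
For illustration only, the before/after comparison could be scripted
roughly as below. This is a sketch, not a prescribed workflow: the
function name bloat_compare, the vmlinux.old file name, and the idea
that the patch is an mbox applied with git am are all assumptions made
for the example. Only make, git am and scripts/bloat-o-meter are real
tools.

```shell
# Hypothetical helper: run it from the top of a kernel source tree.
# $1 is the patch under test, assumed to be an mbox that git am accepts.
# The baseline is built first and saved as vmlinux.old (an arbitrary name),
# then the patch is applied, the image rebuilt, and the two are compared.
bloat_compare() {
	patch_file=$1

	make defconfig &&
	make -j"$(nproc)" vmlinux &&
	cp vmlinux vmlinux.old &&
	git am "$patch_file" &&
	make -j"$(nproc)" vmlinux &&
	./scripts/bloat-o-meter vmlinux.old vmlinux
}
```

Because the function is inlined into its callers, comparing whole
vmlinux images reports the size change at every call site rather than
in a single wrapper.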

And specifically for cpumasks, can you please run this experiment
with NR_CPUS == 32 and NR_CPUS == 4096, for example? That way you
will test the change against the small_cpumask_bits optimization.
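
To make the two code shapes under discussion concrete, here is a
minimal user-space sketch. It is not the kernel implementation: the
names sketch_find_next, any_but_loop, any_but_branch and
sketch_check_equivalence, as well as MAX_BITS, are hypothetical, and
the bit search is a naive linear scan rather than the kernel's
find_first_bit()/find_next_bit(). It only shows that the loop form and
the single-branch form return the same index for a one-word mask
(32 bits) and for a multi-word mask (4096 bits).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAX_BITS 4096 /* largest mask size exercised by this sketch */

/* Index of the first set bit at or after 'start', or nbits if there is none. */
static unsigned int sketch_find_next(const unsigned long *map,
				     unsigned int nbits, unsigned int start)
{
	for (unsigned int i = start; i < nbits; i++)
		if (map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return i;
	return nbits;
}

/* Loop form: walk the set bits and stop at the first one that is not 'cpu'. */
static unsigned int any_but_loop(const unsigned long *map,
				 unsigned int nbits, unsigned int cpu)
{
	unsigned int i;

	for (i = sketch_find_next(map, nbits, 0); i < nbits;
	     i = sketch_find_next(map, nbits, i + 1))
		if (i != cpu)
			break;
	return i;
}

/* Single-branch form: take the first set bit, else the next one after 'cpu'. */
static unsigned int any_but_branch(const unsigned long *map,
				   unsigned int nbits, unsigned int cpu)
{
	unsigned int i = sketch_find_next(map, nbits, 0);

	if (i != cpu)
		return i;
	return sketch_find_next(map, nbits, cpu + 1);
}

/* Compare both forms on empty, one-bit and two-bit masks; returns cases run. */
static unsigned int sketch_check_equivalence(unsigned int nbits)
{
	const unsigned int pos[] = { 0, 1, nbits / 2, nbits - 1 };
	const size_t n = sizeof(pos) / sizeof(pos[0]);
	unsigned long map[MAX_BITS / BITS_PER_WORD];
	unsigned int checked = 0;

	assert(nbits >= 4 && nbits <= MAX_BITS);

	/* Empty mask: both forms report "no CPU" by returning nbits. */
	memset(map, 0, sizeof(map));
	assert(any_but_loop(map, nbits, 0) == nbits);
	assert(any_but_branch(map, nbits, 0) == nbits);

	for (size_t a = 0; a < n; a++) {
		for (size_t b = 0; b < n; b++) {
			memset(map, 0, sizeof(map));
			map[pos[a] / BITS_PER_WORD] |= 1UL << (pos[a] % BITS_PER_WORD);
			map[pos[b] / BITS_PER_WORD] |= 1UL << (pos[b] % BITS_PER_WORD);
			for (size_t c = 0; c < n; c++) {
				assert(any_but_loop(map, nbits, pos[c]) ==
				       any_but_branch(map, nbits, pos[c]));
				checked++;
			}
		}
	}
	return checked;
}
```

Calling sketch_check_equivalence(32) and sketch_check_equivalence(4096)
covers both cases. In a real tree the value would be changed in the
config instead, for example with ./scripts/config --set-val NR_CPUS 32
followed by make olddefconfig, assuming the architecture's Kconfig
range accepts the requested value.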

Thanks,
Yury