From: Yury Norov <yury.norov@gmail.com>
To: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: I Hsin Cheng <richard120310@gmail.com>,
	linux@rasmusvillemoes.dk, jserv@ccns.ncku.edu.tw,
	mark.rutland@arm.com, linux-kernel@vger.kernel.org,
	eleanor15x@gmail.com
Subject: Re: [PATCH] cpumask: Optimize cpumask_any_but()
Date: Thu, 23 Jan 2025 17:39:45 -0500
Message-ID: <Z5LFMeqZA4K4X7Qz@thinkpad>
In-Reply-To: <Z4tZDcCDt+U69kUF@visitorckw-System-Product-Name>

On Sat, Jan 18, 2025 at 03:32:29PM +0800, Kuan-Wei Chiu wrote:
> Hi Yury,
>
> On Fri, Jan 17, 2025 at 11:32:54AM -0500, Yury Norov wrote:
> > On Fri, Jan 17, 2025 at 10:59:31PM +0800, I Hsin Cheng wrote:
> > > On Fri, Jan 17, 2025 at 10:26:58PM +0800, Kuan-Wei Chiu wrote:
> > > > The cpumask_any_but() function can avoid using a loop to determine the
> > > > CPU index to return. If the first set bit in the cpumask is not equal
> > > > to the specified CPU, we can directly return the index of the first set
> > > > bit. Otherwise, we return the next set bit's index.
> > > >
> > > > This optimization replaces the loop with a single if statement,
> > > > allowing the compiler to generate more concise and efficient code.
> >
> > I thought compilers were smart enough to unroll the loop in this case.
> > Can you show the disassembled code before and after?
> >
> Since cpumask_any_but() is an inline function, I added the following to
> lib/cpumask.c for convenience:
>
> unsigned int non_inline_cpumask_any_but(const struct cpumask *mask, unsigned int cpu);
> unsigned int non_inline_cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> {
> 	return cpumask_any_but(mask, cpu);
> }
>
> I used objdump -d ./lib/cpumask.o to compare the differences.
>
> * Before the patch:
>
> 00000000000001f0 <non_inline_cpumask_any_but>:
> 1f0: f3 0f 1e fa endbr64
> 1f4: 48 8b 3f mov (%rdi),%rdi
> 1f7: b8 40 00 00 00 mov $0x40,%eax
> 1fc: 48 85 ff test %rdi,%rdi
> 1ff: 74 4b je 24c <non_inline_cpumask_any_but+0x5c>
> 201: f3 48 0f bc d7 tzcnt %rdi,%rdx
> 206: 89 d0 mov %edx,%eax
> 208: 39 d6 cmp %edx,%esi
> 20a: 75 40 jne 24c <non_inline_cpumask_any_but+0x5c>
> 20c: 83 fa 3f cmp $0x3f,%edx
> 20f: 77 3b ja 24c <non_inline_cpumask_any_but+0x5c>
> 211: 41 b8 01 00 00 00 mov $0x1,%r8d
> 217: 83 c0 01 add $0x1,%eax
> 21a: 83 f8 40 cmp $0x40,%eax
> 21d: 74 2d je 24c <non_inline_cpumask_any_but+0x5c>
> 21f: 89 c1 mov %eax,%ecx
> 221: 4c 89 c2 mov %r8,%rdx
> 224: 48 d3 e2 shl %cl,%rdx
> 227: 48 89 d0 mov %rdx,%rax
> 22a: 48 f7 d8 neg %rax
> 22d: 48 21 f8 and %rdi,%rax
> 230: 74 15 je 247 <non_inline_cpumask_any_but+0x57>
> 232: f3 48 0f bc d0 tzcnt %rax,%rdx
> 237: 89 d0 mov %edx,%eax
> 239: 39 d6 cmp %edx,%esi
> 23b: 75 0f jne 24c <non_inline_cpumask_any_but+0x5c>
> 23d: 83 fa 3f cmp $0x3f,%edx
> 240: 76 d5 jbe 217 <non_inline_cpumask_any_but+0x27>
> 242: e9 00 00 00 00 jmp 247 <non_inline_cpumask_any_but+0x57>
> 247: b8 40 00 00 00 mov $0x40,%eax
> 24c: e9 00 00 00 00 jmp 251 <non_inline_cpumask_any_but+0x61>
>
> * After the patch:
>
> 00000000000001f0 <non_inline_cpumask_any_but>:
> 1f0: f3 0f 1e fa endbr64
> 1f4: 48 8b 17 mov (%rdi),%rdx
> 1f7: 48 85 d2 test %rdx,%rdx
> 1fa: 74 34 je 230 <non_inline_cpumask_any_but+0x40>
> 1fc: f3 48 0f bc ca tzcnt %rdx,%rcx
> 201: 89 c8 mov %ecx,%eax
> 203: 39 ce cmp %ecx,%esi
> 205: 75 2e jne 235 <non_inline_cpumask_any_but+0x45>
> 207: 83 c1 01 add $0x1,%ecx
> 20a: 83 f9 3f cmp $0x3f,%ecx
> 20d: 77 21 ja 230 <non_inline_cpumask_any_but+0x40>
> 20f: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
> 216: 48 d3 e0 shl %cl,%rax
> 219: 48 89 c1 mov %rax,%rcx
> 21c: b8 40 00 00 00 mov $0x40,%eax
> 221: 48 21 d1 and %rdx,%rcx
> 224: 74 0f je 235 <non_inline_cpumask_any_but+0x45>
> 226: f3 48 0f bc c1 tzcnt %rcx,%rax
> 22b: e9 00 00 00 00 jmp 230 <non_inline_cpumask_any_but+0x40>
> 230: b8 40 00 00 00 mov $0x40,%eax
> 235: e9 00 00 00 00 jmp 23a <non_inline_cpumask_any_but+0x4a>
>
> > > >
> > > > As a result, the size of the bzImage built with x86 defconfig is
> > > > reduced by 4096 bytes:
> > > >
> > > > * Before:
> > > > $ size arch/x86/boot/bzImage
> > > >     text    data     bss       dec     hex filename
> > > > 13537280    1024       0  13538304  ce9400 arch/x86/boot/bzImage
> > > >
> > > > * After:
> > > > $ size arch/x86/boot/bzImage
> > > >     text    data     bss       dec     hex filename
> > > > 13533184    1024       0  13534208  ce8400 arch/x86/boot/bzImage
> >
> > Comparing zipped images tells little about code generation. Please use
> > scripts/bloat-o-meter.
> >
> $ ./scripts/bloat-o-meter ./old_cpumask.o ./new_cpumask.o
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-23 (-23)
> Function                                     old     new   delta
> non_inline_cpumask_any_but                    97      74     -23
> Total: Before=522, After=499, chg -4.41%

No need to introduce a wrapper. Build allyesconfig (or defconfig)
before and after your patch, and then run bloat-o-meter against the
old and new vmlinux.

And specifically for cpumasks, can you please run this experiment
with NR_CPUS == 32 and NR_CPUS == 4096, for example? That way you
will test the change against the small_cpumask_bits optimization.
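
For reference, the two variants under discussion can be modelled in
userspace roughly as below. This is only a sketch, not the kernel
code: it assumes the whole mask fits in one 64-bit word, it assumes a
GCC/Clang compiler for __builtin_ctzll(), and find_next_bit64(),
any_but_loop(), any_but_branch() and check_variants_agree() are
hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NBITS 64u	/* assumption: the whole mask is one 64-bit word */

/* Index of the first set bit at or after 'start', or NBITS if none. */
static unsigned int find_next_bit64(uint64_t word, unsigned int start)
{
	if (start >= NBITS)
		return NBITS;
	word &= ~UINT64_C(0) << start;	/* drop bits below 'start' */
	/* __builtin_ctzll() is a GCC/Clang builtin; undefined for 0 */
	return word ? (unsigned int)__builtin_ctzll(word) : NBITS;
}

/* Loop form: walk the set bits until one differs from 'cpu'. */
static unsigned int any_but_loop(uint64_t mask, unsigned int cpu)
{
	unsigned int i;

	for (i = find_next_bit64(mask, 0); i < NBITS;
	     i = find_next_bit64(mask, i + 1))
		if (i != cpu)
			break;
	return i;
}

/* Single-branch form: only the first set bit can be equal to 'cpu'. */
static unsigned int any_but_branch(uint64_t mask, unsigned int cpu)
{
	unsigned int i = find_next_bit64(mask, 0);

	if (i != cpu)
		return i;
	return find_next_bit64(mask, i + 1);
}

/* Compare both forms over a few masks and every CPU index. */
static void check_variants_agree(void)
{
	static const uint64_t masks[] = {
		0, 1, 2, 5, UINT64_C(1) << 63,
		(UINT64_C(1) << 63) | 1, ~UINT64_C(0),
	};
	unsigned int m, cpu;

	for (m = 0; m < sizeof(masks) / sizeof(masks[0]); m++)
		for (cpu = 0; cpu < NBITS; cpu++)
			assert(any_but_loop(masks[m], cpu) ==
			       any_but_branch(masks[m], cpu));
}
```

Calling check_variants_agree() from a small main() checks that both
forms return the same index for these masks. It says nothing about
the code the compiler generates for the real cpumask helpers.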

Thanks,
Yury