From: David Laight <david.laight.linux@gmail.com>
To: Nicolas Pitre <npitre@baylibre.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, u.kleine-koenig@baylibre.com,
Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Biju Das <biju.das.jz@bp.renesas.com>
Subject: Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()
Date: Sun, 25 May 2025 12:38:45 +0100 [thread overview]
Message-ID: <20250525123845.5b023297@pumpkin> (raw)
In-Reply-To: <403s8q39-33sp-pp3s-95o8-14s190or25o5@onlyvoer.pbz>
On Wed, 21 May 2025 09:50:28 -0400 (EDT)
Nicolas Pitre <npitre@baylibre.com> wrote:
> On Wed, 21 May 2025, David Laight wrote:
>...
>
> Depends how costly the ilog2 is. On ARM the clz instruction is about 1
> cycle. If you need to figure out the MSB manually then it might be best
> to skip those ilog2's.
I was going to measure it.
But I've pulled chunks from the kernel headers into a userspace build.
This is the x86-32 code for the 'if (ilog2(a) + ilog2(b) < 62)' test:
1b2b: 0f bd c6 bsr %esi,%eax
1b2e: 75 05 jne 1b35 <mul_u64_add_u64_div_u64_new+0x75>
1b30: b8 ff ff ff ff mov $0xffffffff,%eax
1b35: 85 c9 test %ecx,%ecx
1b37: 74 0d je 1b46 <mul_u64_add_u64_div_u64_new+0x86>
1b39: 0f bd c1 bsr %ecx,%eax
1b3c: 75 05 jne 1b43 <mul_u64_add_u64_div_u64_new+0x83>
1b3e: b8 ff ff ff ff mov $0xffffffff,%eax
1b43: 83 c0 20 add $0x20,%eax
1b46: 0f bd d5 bsr %ebp,%edx
1b49: 75 05 jne 1b50 <mul_u64_add_u64_div_u64_new+0x90>
1b4b: ba ff ff ff ff mov $0xffffffff,%edx
1b50: 85 ff test %edi,%edi
1b52: 74 0d je 1b61 <mul_u64_add_u64_div_u64_new+0xa1>
1b54: 0f bd d7 bsr %edi,%edx
1b57: 75 05 jne 1b5e <mul_u64_add_u64_div_u64_new+0x9e>
1b59: ba ff ff ff ff mov $0xffffffff,%edx
1b5e: 83 c2 20 add $0x20,%edx
1b61: 8d 1c 02 lea (%edx,%eax,1),%ebx
1b64: 83 fb 3e cmp $0x3e,%ebx
1b67: 0f 8e 0b 03 00 00 jle 1e78 <mul_u64_add_u64_div_u64_new+0x3b8>
If 'cmov' is enabled (not by default even after the current plan to remove 486 support) it is:
1b2b: ba ff ff ff ff mov $0xffffffff,%edx
1b30: 85 c9 test %ecx,%ecx
1b32: 0f 85 98 03 00 00 jne 1ed0 <mul_u64_add_u64_div_u64_new+0x410>
1b38: 0f bd c6 bsr %esi,%eax
1b3b: 0f 44 c2 cmove %edx,%eax
1b3e: bb ff ff ff ff mov $0xffffffff,%ebx
1b43: 85 ff test %edi,%edi
1b45: 0f 85 75 03 00 00 jne 1ec0 <mul_u64_add_u64_div_u64_new+0x400>
1b4b: 0f bd d5 bsr %ebp,%edx
1b4e: 0f 44 d3 cmove %ebx,%edx
1b51: 8d 1c 02 lea (%edx,%eax,1),%ebx
1b54: 83 fb 3e cmp $0x3e,%ebx
1b57: 0f 8e 0b 03 00 00 jle 1e68 <mul_u64_add_u64_div_u64_new+0x3a8>
with:
1ec0: 0f bd d7 bsr %edi,%edx
1ec3: 0f 44 d3 cmove %ebx,%edx
1ec6: 83 c2 20 add $0x20,%edx
1ec9: e9 83 fc ff ff jmp 1b51 <mul_u64_add_u64_div_u64_new+0x91>
1ed0: 0f bd c1 bsr %ecx,%eax
1ed3: 0f 44 c2 cmove %edx,%eax
1ed6: 83 c0 20 add $0x20,%eax
1ed9: e9 60 fc ff ff jmp 1b3e <mul_u64_add_u64_div_u64_new+0x7e>
Neither is pretty...
Some of the 'damage' is because the x86 'bsr' (bit scan reverse) sets 'z' for zero
and leaves the output undefined (unchanged on later cpu).
For reference I can get the multiply down to:
1b5d: 89 f0 mov %esi,%eax
1b5f: f7 e5 mul %ebp
1b61: 03 44 24 38 add 0x38(%esp),%eax
1b65: 83 d2 00 adc $0x0,%edx
1b68: 89 d3 mov %edx,%ebx
1b6a: 89 44 24 08 mov %eax,0x8(%esp)
1b6e: 89 f0 mov %esi,%eax
1b70: f7 e7 mul %edi
1b72: 03 44 24 3c add 0x3c(%esp),%eax
1b76: 83 d2 00 adc $0x0,%edx
1b79: 01 d8 add %ebx,%eax
1b7b: 83 d2 00 adc $0x0,%edx
1b7e: 89 d6 mov %edx,%esi
1b80: 89 c3 mov %eax,%ebx
1b82: 89 c8 mov %ecx,%eax
1b84: f7 e7 mul %edi
1b86: 89 c7 mov %eax,%edi
1b88: 89 c8 mov %ecx,%eax
1b8a: 01 f7 add %esi,%edi
1b8c: 83 d2 00 adc $0x0,%edx
1b8f: 89 d6 mov %edx,%esi
1b91: f7 e5 mul %ebp
1b93: 89 c1 mov %eax,%ecx
1b95: 8b 44 24 08 mov 0x8(%esp),%eax
1b99: 89 d5 mov %edx,%ebp
1b9b: 01 d9 add %ebx,%ecx
1b9d: 83 d5 00 adc $0x0,%ebp
1ba0: 89 44 24 28 mov %eax,0x28(%esp)
1ba4: 01 ef add %ebp,%edi
1ba6: 83 d6 00 adc $0x0,%esi
1ba9: 89 74 24 1c mov %esi,0x1c(%esp)
1bad: 8b 5c 24 1c mov 0x1c(%esp),%ebx
1bb1: 89 7c 24 18 mov %edi,0x18(%esp)
1bb5: 8b 44 24 18 mov 0x18(%esp),%eax
1bb9: 89 4c 24 2c mov %ecx,0x2c(%esp)
1bbd: 09 c3 or %eax,%ebx
1bbf: 0f 84 1b 03 00 00 je 1ee0 <mul_u64_add_u64_div_u64_new+0x420>
If you follow the register dependency chain it won't be as long as it looks.
(Although the last few instructions are terrible! - I've tried a few things
and they won't go away.)
David
next prev parent reply other threads:[~2025-05-25 11:38 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight
2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight
2025-05-20 2:11 ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight
2025-05-18 15:42 ` kernel test robot
2025-05-18 21:50 ` David Laight
2025-05-19 6:10 ` Uwe Kleine-König
2025-05-19 11:59 ` David Laight
2025-05-20 1:54 ` Nicolas Pitre
2025-05-20 2:21 ` Nicolas Pitre
2025-05-20 21:43 ` David Laight
2025-05-20 22:28 ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight
2025-05-20 3:03 ` Nicolas Pitre
2025-05-20 21:37 ` David Laight
2025-05-20 22:24 ` Nicolas Pitre
2025-05-21 12:52 ` David Laight
2025-05-21 13:50 ` Nicolas Pitre
2025-05-25 11:38 ` David Laight [this message]
2025-05-18 13:38 ` [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight
2025-05-20 3:07 ` Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250525123845.5b023297@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=biju.das.jz@bp.renesas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=npitre@baylibre.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=u.kleine-koenig@baylibre.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox