From: David Laight <david.laight.linux@gmail.com>
To: Nicolas Pitre <npitre@baylibre.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, u.kleine-koenig@baylibre.com,
Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Biju Das <biju.das.jz@bp.renesas.com>
Subject: Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()
Date: Sun, 25 May 2025 12:38:45 +0100 [thread overview]
Message-ID: <20250525123845.5b023297@pumpkin> (raw)
In-Reply-To: <403s8q39-33sp-pp3s-95o8-14s190or25o5@onlyvoer.pbz>
On Wed, 21 May 2025 09:50:28 -0400 (EDT)
Nicolas Pitre <npitre@baylibre.com> wrote:
> On Wed, 21 May 2025, David Laight wrote:
>...
>
> Depends how costly the ilog2 is. On ARM the clz instruction is about 1
> cycle. If you need to figure out the MSB manually then it might be best
> to skip those ilog2's.
I was going to measure it.
But I've pulled chunks from the kernel headers into a userspace build.
This is the x86-32 code for the 'if (ilog2(a) + ilog2(b) < 62)' test:
1b2b: 0f bd c6 bsr %esi,%eax
1b2e: 75 05 jne 1b35 <mul_u64_add_u64_div_u64_new+0x75>
1b30: b8 ff ff ff ff mov $0xffffffff,%eax
1b35: 85 c9 test %ecx,%ecx
1b37: 74 0d je 1b46 <mul_u64_add_u64_div_u64_new+0x86>
1b39: 0f bd c1 bsr %ecx,%eax
1b3c: 75 05 jne 1b43 <mul_u64_add_u64_div_u64_new+0x83>
1b3e: b8 ff ff ff ff mov $0xffffffff,%eax
1b43: 83 c0 20 add $0x20,%eax
1b46: 0f bd d5 bsr %ebp,%edx
1b49: 75 05 jne 1b50 <mul_u64_add_u64_div_u64_new+0x90>
1b4b: ba ff ff ff ff mov $0xffffffff,%edx
1b50: 85 ff test %edi,%edi
1b52: 74 0d je 1b61 <mul_u64_add_u64_div_u64_new+0xa1>
1b54: 0f bd d7 bsr %edi,%edx
1b57: 75 05 jne 1b5e <mul_u64_add_u64_div_u64_new+0x9e>
1b59: ba ff ff ff ff mov $0xffffffff,%edx
1b5e: 83 c2 20 add $0x20,%edx
1b61: 8d 1c 02 lea (%edx,%eax,1),%ebx
1b64: 83 fb 3e cmp $0x3e,%ebx
1b67: 0f 8e 0b 03 00 00 jle 1e78 <mul_u64_add_u64_div_u64_new+0x3b8>
If 'cmov' is enabled (not by default even after the current plan to remove 486 support) it is:
1b2b: ba ff ff ff ff mov $0xffffffff,%edx
1b30: 85 c9 test %ecx,%ecx
1b32: 0f 85 98 03 00 00 jne 1ed0 <mul_u64_add_u64_div_u64_new+0x410>
1b38: 0f bd c6 bsr %esi,%eax
1b3b: 0f 44 c2 cmove %edx,%eax
1b3e: bb ff ff ff ff mov $0xffffffff,%ebx
1b43: 85 ff test %edi,%edi
1b45: 0f 85 75 03 00 00 jne 1ec0 <mul_u64_add_u64_div_u64_new+0x400>
1b4b: 0f bd d5 bsr %ebp,%edx
1b4e: 0f 44 d3 cmove %ebx,%edx
1b51: 8d 1c 02 lea (%edx,%eax,1),%ebx
1b54: 83 fb 3e cmp $0x3e,%ebx
1b57: 0f 8e 0b 03 00 00 jle 1e68 <mul_u64_add_u64_div_u64_new+0x3a8>
with:
1ec0: 0f bd d7 bsr %edi,%edx
1ec3: 0f 44 d3 cmove %ebx,%edx
1ec6: 83 c2 20 add $0x20,%edx
1ec9: e9 83 fc ff ff jmp 1b51 <mul_u64_add_u64_div_u64_new+0x91>
1ed0: 0f bd c1 bsr %ecx,%eax
1ed3: 0f 44 c2 cmove %edx,%eax
1ed6: 83 c0 20 add $0x20,%eax
1ed9: e9 60 fc ff ff jmp 1b3e <mul_u64_add_u64_div_u64_new+0x7e>
Neither is pretty...
Some of the 'damage' is because the x86 'bsr' (bit scan reverse) sets 'z' for zero
and leaves the output undefined (unchanged on later cpu).
For reference I can get the multiply down to:
1b5d: 89 f0 mov %esi,%eax
1b5f: f7 e5 mul %ebp
1b61: 03 44 24 38 add 0x38(%esp),%eax
1b65: 83 d2 00 adc $0x0,%edx
1b68: 89 d3 mov %edx,%ebx
1b6a: 89 44 24 08 mov %eax,0x8(%esp)
1b6e: 89 f0 mov %esi,%eax
1b70: f7 e7 mul %edi
1b72: 03 44 24 3c add 0x3c(%esp),%eax
1b76: 83 d2 00 adc $0x0,%edx
1b79: 01 d8 add %ebx,%eax
1b7b: 83 d2 00 adc $0x0,%edx
1b7e: 89 d6 mov %edx,%esi
1b80: 89 c3 mov %eax,%ebx
1b82: 89 c8 mov %ecx,%eax
1b84: f7 e7 mul %edi
1b86: 89 c7 mov %eax,%edi
1b88: 89 c8 mov %ecx,%eax
1b8a: 01 f7 add %esi,%edi
1b8c: 83 d2 00 adc $0x0,%edx
1b8f: 89 d6 mov %edx,%esi
1b91: f7 e5 mul %ebp
1b93: 89 c1 mov %eax,%ecx
1b95: 8b 44 24 08 mov 0x8(%esp),%eax
1b99: 89 d5 mov %edx,%ebp
1b9b: 01 d9 add %ebx,%ecx
1b9d: 83 d5 00 adc $0x0,%ebp
1ba0: 89 44 24 28 mov %eax,0x28(%esp)
1ba4: 01 ef add %ebp,%edi
1ba6: 83 d6 00 adc $0x0,%esi
1ba9: 89 74 24 1c mov %esi,0x1c(%esp)
1bad: 8b 5c 24 1c mov 0x1c(%esp),%ebx
1bb1: 89 7c 24 18 mov %edi,0x18(%esp)
1bb5: 8b 44 24 18 mov 0x18(%esp),%eax
1bb9: 89 4c 24 2c mov %ecx,0x2c(%esp)
1bbd: 09 c3 or %eax,%ebx
1bbf: 0f 84 1b 03 00 00 je 1ee0 <mul_u64_add_u64_div_u64_new+0x420>
If you follow the register dependency chain it won't be as long as it looks.
(Although the last few instructions are terrible! - I've tried a few things
and they won't go away.)
David
next prev parent reply other threads:[~2025-05-25 11:38 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight
2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight
2025-05-20 2:11 ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight
2025-05-18 15:42 ` kernel test robot
2025-05-18 21:50 ` David Laight
2025-05-19 6:10 ` Uwe Kleine-König
2025-05-19 11:59 ` David Laight
2025-05-20 1:54 ` Nicolas Pitre
2025-05-20 2:21 ` Nicolas Pitre
2025-05-20 21:43 ` David Laight
2025-05-20 22:28 ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight
2025-05-20 3:03 ` Nicolas Pitre
2025-05-20 21:37 ` David Laight
2025-05-20 22:24 ` Nicolas Pitre
2025-05-21 12:52 ` David Laight
2025-05-21 13:50 ` Nicolas Pitre
2025-05-25 11:38 ` David Laight [this message]
2025-05-18 13:38 ` [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight
2025-05-20 3:07 ` Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250525123845.5b023297@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=biju.das.jz@bp.renesas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=npitre@baylibre.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=u.kleine-koenig@baylibre.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.