All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Laight <david.laight.linux@gmail.com>
To: Nicolas Pitre <npitre@baylibre.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, u.kleine-koenig@baylibre.com,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Biju Das <biju.das.jz@bp.renesas.com>
Subject: Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()
Date: Sun, 25 May 2025 12:38:45 +0100	[thread overview]
Message-ID: <20250525123845.5b023297@pumpkin> (raw)
In-Reply-To: <403s8q39-33sp-pp3s-95o8-14s190or25o5@onlyvoer.pbz>

On Wed, 21 May 2025 09:50:28 -0400 (EDT)
Nicolas Pitre <npitre@baylibre.com> wrote:

> On Wed, 21 May 2025, David Laight wrote:
>...
> 
> Depends how costly the ilog2 is. On ARM the clz instruction is about 1 
> cycle. If you need to figure out the MSB manually then it might be best 
> to skip those ilog2's.

I was going to measure it.
But I've pulled chunks from the kernel headers into a userspace build.
This is the x86-32 code for the 'if (ilog2(a) + ilog2(b) < 62)' test:

    1b2b:       0f bd c6                bsr    %esi,%eax
    1b2e:       75 05                   jne    1b35 <mul_u64_add_u64_div_u64_new+0x75>
    1b30:       b8 ff ff ff ff          mov    $0xffffffff,%eax
    1b35:       85 c9                   test   %ecx,%ecx
    1b37:       74 0d                   je     1b46 <mul_u64_add_u64_div_u64_new+0x86>
    1b39:       0f bd c1                bsr    %ecx,%eax
    1b3c:       75 05                   jne    1b43 <mul_u64_add_u64_div_u64_new+0x83>
    1b3e:       b8 ff ff ff ff          mov    $0xffffffff,%eax
    1b43:       83 c0 20                add    $0x20,%eax
    1b46:       0f bd d5                bsr    %ebp,%edx
    1b49:       75 05                   jne    1b50 <mul_u64_add_u64_div_u64_new+0x90>
    1b4b:       ba ff ff ff ff          mov    $0xffffffff,%edx
    1b50:       85 ff                   test   %edi,%edi
    1b52:       74 0d                   je     1b61 <mul_u64_add_u64_div_u64_new+0xa1>
    1b54:       0f bd d7                bsr    %edi,%edx
    1b57:       75 05                   jne    1b5e <mul_u64_add_u64_div_u64_new+0x9e>
    1b59:       ba ff ff ff ff          mov    $0xffffffff,%edx
    1b5e:       83 c2 20                add    $0x20,%edx
    1b61:       8d 1c 02                lea    (%edx,%eax,1),%ebx
    1b64:       83 fb 3e                cmp    $0x3e,%ebx
    1b67:       0f 8e 0b 03 00 00       jle    1e78 <mul_u64_add_u64_div_u64_new+0x3b8>

If 'cmov' is enabled (not by default even after the current plan to remove 486 support) it is:

    1b2b:       ba ff ff ff ff          mov    $0xffffffff,%edx
    1b30:       85 c9                   test   %ecx,%ecx
    1b32:       0f 85 98 03 00 00       jne    1ed0 <mul_u64_add_u64_div_u64_new+0x410>
    1b38:       0f bd c6                bsr    %esi,%eax
    1b3b:       0f 44 c2                cmove  %edx,%eax
    1b3e:       bb ff ff ff ff          mov    $0xffffffff,%ebx
    1b43:       85 ff                   test   %edi,%edi
    1b45:       0f 85 75 03 00 00       jne    1ec0 <mul_u64_add_u64_div_u64_new+0x400>
    1b4b:       0f bd d5                bsr    %ebp,%edx
    1b4e:       0f 44 d3                cmove  %ebx,%edx
    1b51:       8d 1c 02                lea    (%edx,%eax,1),%ebx
    1b54:       83 fb 3e                cmp    $0x3e,%ebx
    1b57:       0f 8e 0b 03 00 00       jle    1e68 <mul_u64_add_u64_div_u64_new+0x3a8>

with:
    1ec0:       0f bd d7                bsr    %edi,%edx
    1ec3:       0f 44 d3                cmove  %ebx,%edx
    1ec6:       83 c2 20                add    $0x20,%edx
    1ec9:       e9 83 fc ff ff          jmp    1b51 <mul_u64_add_u64_div_u64_new+0x91>

    1ed0:       0f bd c1                bsr    %ecx,%eax
    1ed3:       0f 44 c2                cmove  %edx,%eax
    1ed6:       83 c0 20                add    $0x20,%eax
    1ed9:       e9 60 fc ff ff          jmp    1b3e <mul_u64_add_u64_div_u64_new+0x7e>

Neither is pretty...
Some of the 'damage' is because the x86 'bsr' (bit scan reverse) sets 'z' for zero
and leaves the output undefined (unchanged on later cpu).

For reference I can get the multiply down to:
    1b5d:       89 f0                   mov    %esi,%eax
    1b5f:       f7 e5                   mul    %ebp
    1b61:       03 44 24 38             add    0x38(%esp),%eax
    1b65:       83 d2 00                adc    $0x0,%edx
    1b68:       89 d3                   mov    %edx,%ebx
    1b6a:       89 44 24 08             mov    %eax,0x8(%esp)
    1b6e:       89 f0                   mov    %esi,%eax
    1b70:       f7 e7                   mul    %edi
    1b72:       03 44 24 3c             add    0x3c(%esp),%eax
    1b76:       83 d2 00                adc    $0x0,%edx
    1b79:       01 d8                   add    %ebx,%eax
    1b7b:       83 d2 00                adc    $0x0,%edx
    1b7e:       89 d6                   mov    %edx,%esi
    1b80:       89 c3                   mov    %eax,%ebx
    1b82:       89 c8                   mov    %ecx,%eax
    1b84:       f7 e7                   mul    %edi
    1b86:       89 c7                   mov    %eax,%edi
    1b88:       89 c8                   mov    %ecx,%eax
    1b8a:       01 f7                   add    %esi,%edi
    1b8c:       83 d2 00                adc    $0x0,%edx
    1b8f:       89 d6                   mov    %edx,%esi
    1b91:       f7 e5                   mul    %ebp
    1b93:       89 c1                   mov    %eax,%ecx
    1b95:       8b 44 24 08             mov    0x8(%esp),%eax
    1b99:       89 d5                   mov    %edx,%ebp
    1b9b:       01 d9                   add    %ebx,%ecx
    1b9d:       83 d5 00                adc    $0x0,%ebp
    1ba0:       89 44 24 28             mov    %eax,0x28(%esp)
    1ba4:       01 ef                   add    %ebp,%edi
    1ba6:       83 d6 00                adc    $0x0,%esi
    1ba9:       89 74 24 1c             mov    %esi,0x1c(%esp)
    1bad:       8b 5c 24 1c             mov    0x1c(%esp),%ebx
    1bb1:       89 7c 24 18             mov    %edi,0x18(%esp)
    1bb5:       8b 44 24 18             mov    0x18(%esp),%eax
    1bb9:       89 4c 24 2c             mov    %ecx,0x2c(%esp)
    1bbd:       09 c3                   or     %eax,%ebx
    1bbf:       0f 84 1b 03 00 00       je     1ee0 <mul_u64_add_u64_div_u64_new+0x420>

If you follow the register dependency chain it won't be as long as it looks.
(Although the last few instructions are terrible! - I've tried a few things
and they won't go away.)

	David


  reply	other threads:[~2025-05-25 11:38 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight
2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight
2025-05-20  2:11   ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight
2025-05-18 15:42   ` kernel test robot
2025-05-18 21:50     ` David Laight
2025-05-19  6:10   ` Uwe Kleine-König
2025-05-19 11:59     ` David Laight
2025-05-20  1:54       ` Nicolas Pitre
2025-05-20  2:21   ` Nicolas Pitre
2025-05-20 21:43     ` David Laight
2025-05-20 22:28       ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight
2025-05-20  3:03   ` Nicolas Pitre
2025-05-20 21:37     ` David Laight
2025-05-20 22:24       ` Nicolas Pitre
2025-05-21 12:52         ` David Laight
2025-05-21 13:50           ` Nicolas Pitre
2025-05-25 11:38             ` David Laight [this message]
2025-05-18 13:38 ` [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight
2025-05-20  3:07   ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250525123845.5b023297@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=biju.das.jz@bp.renesas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npitre@baylibre.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=u.kleine-koenig@baylibre.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.