From: David Laight <david.laight.linux@gmail.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, u.kleine-koenig@baylibre.com,
Nicolas Pitre <npitre@baylibre.com>,
Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Biju Das <biju.das.jz@bp.renesas.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Li RongQing <lirongqing@baidu.com>,
Khazhismel Kumykov <khazhy@chromium.org>,
Jens Axboe <axboe@kernel.dk>,
x86@kernel.org
Subject: Re: [PATCH v5 next 7/9] lib: mul_u64_u64_div_u64() optimise multiply on 32bit x86
Date: Thu, 6 Nov 2025 09:26:33 +0000 [thread overview]
Message-ID: <20251106092633.6058001e@pumpkin> (raw)
In-Reply-To: <DE009601-0605-4A6E-99B3-E8A789F85BF6@zytor.com>
On Wed, 05 Nov 2025 15:45:29 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:
> On November 5, 2025 12:10:33 PM PST, David Laight <david.laight.linux@gmail.com> wrote:
> >gcc generates horrid code for both ((u64)u32_a * u32_b) and (u64_a + u32_b).
> >As well as the extra instructions it can generate a lot of spills to stack
> >(including spills of constant zeros and even multiplies by constant zero).
> >
> >mul_u32_u32() already exists to optimise the multiply.
> >Add a similar add_u64_32() for the addition.
> >Disable both for clang - it generates better code without them.
> >
> >Move the 64x64 => 128 multiply into a static inline helper function
> >for code clarity.
> >No need for the a/b_hi/lo variables, the implicit casts on the function
> >calls do the work for us.
> >Should have minimal effect on the generated code.
> >
> >Use mul_u32_u32() and add_u64_u32() in the 64x64 => 128 multiply
> >in mul_u64_add_u64_div_u64().
> >
> >Signed-off-by: David Laight <david.laight.linux@gmail.com>
> >Reviewed-by: Nicolas Pitre <npitre@baylibre.com>
> >---
> >
> >Changes for v4:
> >- merge in patch 8.
> >- Add comments about gcc being 'broken' for mixed 32/64 bit maths.
> > clang doesn't have the same issues.
> >- Use a #define for define mul_add() to avoid 'defined but not used'
> > errors.
> >
> > arch/x86/include/asm/div64.h | 19 +++++++++++++++++
> > include/linux/math64.h | 11 ++++++++++
> > lib/math/div64.c | 40 +++++++++++++++++++++++-------------
> > 3 files changed, 56 insertions(+), 14 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/div64.h b/arch/x86/include/asm/div64.h
> >index 6d8a3de3f43a..30fd06ede751 100644
> >--- a/arch/x86/include/asm/div64.h
> >+++ b/arch/x86/include/asm/div64.h
> >@@ -60,6 +60,12 @@ static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
> > }
> > #define div_u64_rem div_u64_rem
> >
> >+/*
> >+ * gcc tends to zero extend 32bit values and do full 64bit maths.
> >+ * Define asm functions that avoid this.
> >+ * (clang generates better code for the C versions.)
> >+ */
> >+#ifndef __clang__
> > static inline u64 mul_u32_u32(u32 a, u32 b)
> > {
> > u32 high, low;
> >@@ -71,6 +77,19 @@ static inline u64 mul_u32_u32(u32 a, u32 b)
> > }
> > #define mul_u32_u32 mul_u32_u32
> >
> >+static inline u64 add_u64_u32(u64 a, u32 b)
> >+{
> >+ u32 high = a >> 32, low = a;
> >+
> >+ asm ("addl %[b], %[low]; adcl $0, %[high]"
> >+ : [low] "+r" (low), [high] "+r" (high)
> >+ : [b] "rm" (b) );
> >+
> >+ return low | (u64)high << 32;
> >+}
> >+#define add_u64_u32 add_u64_u32
> >+#endif
...
>
> By the way have you filed gcc bug reports for this?
As in the need for the asm() above?
No...
I doubt one was filed when the mul version was added either.
ISTR that some very recent gcc versions were a bit better, but it depends
on minor code changes and compiler options.
I suspect that internally gcc sometimes keeps a 64bit value as two 32bit
ones, but at other times it is assigned to a 64bit internal register.
If the latter happens it always promotes a 32bit value to 64 bits and
assigns to another 64bit register.
At that point it won't split the 64bit registers - so a lot of spills to
stack happen when it tries to assign real registers.
So breath on an 'A' (dx:ax) constraint and the generated code is horrid.
Even the lo | (u64)hi << 32 can generate 'or' instructions.
The same happens for int128 on 64bit.
David
next prev parent reply other threads:[~2025-11-06 9:26 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-05 20:10 [PATCH v5 next 0/9] Implement mul_u64_u64_div_u64_roundup() David Laight
2025-11-05 20:10 ` [PATCH v5 next 1/9] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight
2025-11-05 20:10 ` [PATCH v5 next 2/9] lib: mul_u64_u64_div_u64() Combine overflow and divide by zero checks David Laight
2025-11-05 20:10 ` [PATCH v5 next 3/9] lib: mul_u64_u64_div_u64() simplify check for a 64bit product David Laight
2025-11-05 20:59 ` Nicolas Pitre
2025-11-05 20:10 ` [PATCH v5 next 4/9] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight
2025-11-06 0:26 ` H. Peter Anvin
2025-11-06 9:52 ` David Laight
2025-11-05 20:10 ` [PATCH v5 next 5/9] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight
2025-11-05 20:10 ` [PATCH v5 next 6/9] lib: test_mul_u64_u64_div_u64: Test both generic and arch versions David Laight
2025-11-05 20:10 ` [PATCH v5 next 7/9] lib: mul_u64_u64_div_u64() optimise multiply on 32bit x86 David Laight
2025-11-05 23:45 ` H. Peter Anvin
2025-11-06 9:26 ` David Laight [this message]
2025-11-05 20:10 ` [PATCH v5 next 8/9] lib: mul_u64_u64_div_u64() Optimise the divide code David Laight
2025-11-05 20:10 ` [PATCH v5 next 9/9] lib: test_mul_u64_u64_div_u64: Test the 32bit code on 64bit David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251106092633.6058001e@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=biju.das.jz@bp.renesas.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=khazhy@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lirongqing@baidu.com \
--cc=mingo@redhat.com \
--cc=npitre@baylibre.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=u.kleine-koenig@baylibre.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.