From: David Laight <david.laight.linux@gmail.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, u.kleine-koenig@baylibre.com,
Nicolas Pitre <npitre@baylibre.com>,
Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Biju Das <biju.das.jz@bp.renesas.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Li RongQing <lirongqing@baidu.com>,
Khazhismel Kumykov <khazhy@chromium.org>,
Jens Axboe <axboe@kernel.dk>,
x86@kernel.org
Subject: Re: [PATCH v5 next 7/9] lib: mul_u64_u64_div_u64() optimise multiply on 32bit x86
Date: Thu, 6 Nov 2025 09:26:33 +0000
Message-ID: <20251106092633.6058001e@pumpkin>
In-Reply-To: <DE009601-0605-4A6E-99B3-E8A789F85BF6@zytor.com>
On Wed, 05 Nov 2025 15:45:29 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:
> On November 5, 2025 12:10:33 PM PST, David Laight <david.laight.linux@gmail.com> wrote:
> >gcc generates horrid code for both ((u64)u32_a * u32_b) and (u64_a + u32_b).
> >As well as the extra instructions it can generate a lot of spills to stack
> >(including spills of constant zeros and even multiplies by constant zero).
> >
> >mul_u32_u32() already exists to optimise the multiply.
> >Add a similar add_u64_u32() for the addition.
> >Disable both for clang - it generates better code without them.
> >
> >Move the 64x64 => 128 multiply into a static inline helper function
> >for code clarity.
> >No need for the a/b_hi/lo variables, the implicit casts on the function
> >calls do the work for us.
> >Should have minimal effect on the generated code.
> >
> >Use mul_u32_u32() and add_u64_u32() in the 64x64 => 128 multiply
> >in mul_u64_add_u64_div_u64().
> >
> >Signed-off-by: David Laight <david.laight.linux@gmail.com>
> >Reviewed-by: Nicolas Pitre <npitre@baylibre.com>
> >---
> >
> >Changes for v4:
> >- merge in patch 8.
> >- Add comments about gcc being 'broken' for mixed 32/64 bit maths.
> > clang doesn't have the same issues.
> >- Use a #define to define mul_add() to avoid 'defined but not used'
> > errors.
> >
> > arch/x86/include/asm/div64.h | 19 +++++++++++++++++
> > include/linux/math64.h | 11 ++++++++++
> > lib/math/div64.c | 40 +++++++++++++++++++++++-------------
> > 3 files changed, 56 insertions(+), 14 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/div64.h b/arch/x86/include/asm/div64.h
> >index 6d8a3de3f43a..30fd06ede751 100644
> >--- a/arch/x86/include/asm/div64.h
> >+++ b/arch/x86/include/asm/div64.h
> >@@ -60,6 +60,12 @@ static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
> > }
> > #define div_u64_rem div_u64_rem
> >
> >+/*
> >+ * gcc tends to zero extend 32bit values and do full 64bit maths.
> >+ * Define asm functions that avoid this.
> >+ * (clang generates better code for the C versions.)
> >+ */
> >+#ifndef __clang__
> > static inline u64 mul_u32_u32(u32 a, u32 b)
> > {
> > u32 high, low;
> >@@ -71,6 +77,19 @@ static inline u64 mul_u32_u32(u32 a, u32 b)
> > }
> > #define mul_u32_u32 mul_u32_u32
> >
> >+static inline u64 add_u64_u32(u64 a, u32 b)
> >+{
> >+ u32 high = a >> 32, low = a;
> >+
> >+ asm ("addl %[b], %[low]; adcl $0, %[high]"
> >+ : [low] "+r" (low), [high] "+r" (high)
> >+ : [b] "rm" (b) );
> >+
> >+ return low | (u64)high << 32;
> >+}
> >+#define add_u64_u32 add_u64_u32
> >+#endif
...
>
> By the way have you filed gcc bug reports for this?

As in the need for the asm() above?
No...
I doubt one was filed when the mul version was added either.

ISTR that some very recent gcc versions were a bit better, but it depends
on minor code changes and compiler options.

I suspect that internally gcc sometimes keeps a 64bit value as two 32bit
ones, but at other times it is assigned to a 64bit internal register.
If the latter happens it always promotes a 32bit value to 64 bits and
assigns to another 64bit register.
At that point it won't split the 64bit registers - so a lot of spills to
stack happen when it tries to assign real registers.

So breathe on an 'A' (dx:ax) constraint and the generated code is horrid.
Even the lo | (u64)hi << 32 can generate 'or' instructions.
The same happens for int128 on 64bit.
David
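
For reference, a minimal user-space sketch of what the two helpers compute
and how a 64x64 => 128 multiply can be built from them. This is plain C, not
the code from the patch: the _c suffixed names and mul_u64_u64_128() are
hypothetical, and stdint types stand in for the kernel's u32/u64. The
add helper models the addl/adcl pair by carrying out of the low half by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C equivalent of a 32x32 => 64 multiply (hypothetical name). */
static inline uint64_t mul_u32_u32_c(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

/*
 * Plain C model of "addl b, low; adcl $0, high" (hypothetical name):
 * add b to the low half and propagate the carry into the high half.
 */
static inline uint64_t add_u64_u32_c(uint64_t a, uint32_t b)
{
	uint32_t high = (uint32_t)(a >> 32), low = (uint32_t)a;

	low += b;
	high += low < b;	/* carry out of the low 32 bits */

	return low | (uint64_t)high << 32;
}

/*
 * Hypothetical helper: schoolbook 64x64 => 128 multiply from four
 * 32x32 => 64 partial products.
 * Since (2^32 - 1)^2 + 2 * (2^32 - 1) == 2^64 - 1, two 32bit values
 * can be added to any partial product without overflowing 64 bits.
 */
static inline void mul_u64_u64_128(uint64_t a, uint64_t b,
				   uint64_t *p_hi, uint64_t *p_lo)
{
	uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
	uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
	uint64_t lo = mul_u32_u32_c(a_lo, b_lo);
	uint64_t x = mul_u32_u32_c(a_lo, b_hi);
	uint64_t y = mul_u32_u32_c(a_hi, b_lo);
	uint64_t hi = mul_u32_u32_c(a_hi, b_hi);

	x = add_u64_u32_c(x, (uint32_t)(lo >> 32));	/* middle + high half of lo */
	y = add_u64_u32_c(y, (uint32_t)x);		/* other middle + low half of x */

	*p_lo = (y << 32) | (uint32_t)lo;
	hi = add_u64_u32_c(hi, (uint32_t)(x >> 32));
	*p_hi = add_u64_u32_c(hi, (uint32_t)(y >> 32));
}
```

Every addition in the multiply is a 64bit value plus a 32bit value, which is
the shape the asm helper is meant to keep down to an add/adc pair on 32bit x86.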