From: David Laight <david.laight.linux@gmail.com>
To: Nicolas Pitre <npitre@baylibre.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, u.kleine-koenig@baylibre.com,
Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Biju Das <biju.das.jz@bp.renesas.com>
Subject: Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()
Date: Wed, 21 May 2025 13:52:46 +0100 [thread overview]
Message-ID: <20250521135246.7dab6bda@pumpkin> (raw)
In-Reply-To: <148nop5q-s958-n0q4-66r8-o91ns4pnr4on@onlyvoer.pbz>
On Tue, 20 May 2025 18:24:58 -0400 (EDT)
Nicolas Pitre <npitre@baylibre.com> wrote:
> On Tue, 20 May 2025, David Laight wrote:
>
> > On Mon, 19 May 2025 23:03:21 -0400 (EDT)
> > Nicolas Pitre <npitre@baylibre.com> wrote:
> >
...
> > > Here you should do:
> > >
> > > if (ilog2(a) + ilog2(b) <= 62) {
> > > u64 ab = a * b;
> > > u64 abc = ab + c;
> > > if (ab <= abc)
> > > return div64_u64(abc, d);
> > > }
> > >
> > > This is cheap and won't unconditionally discard the faster path when c != 0;
> >
> > That isn't really cheap.
> > ilog2() is likely to be a similar cost to a multiply
> > (my brain remembers them both as 'latency 3' on x86).
>
> I'm not discussing the ilog2() usage though. I'm just against limiting
> the test to !c. My suggestion is about supporting all values of c.
I've had further thoughts on that test.
Most (but not all - and I've forgotten which) 64bit cpu have a 64x64=>128
multiple and support u128.
So the 'multiply in parts' code is mostly for 32bit.
That means that the 'a * b' for the call to div64_u64() has to be three
32x32=>64 multiplies with all the extra 'add' and 'adc $0' to generate
a correct 64bit result.
This is (well should be) much the same as the multiply coded in the
function - except it is generated by the compiler itself.
The only parts it can ignore are the those that set 'z' and 'y_hi'.
If my clock sequence (in the other email) is correct it saves 3 of 10
clocks - so test to avoid the multiply has to be better than that.
That probably means the only worthwhile check is for a and b being 32bit
so a single multiply can be used.
The generated code for 32bit x86 isn't as good as one might hope.
partially due to only having 7 (6 if %bp is a stack frame) registers.
clang makes a reasonable job of it, gcc doesn't.
There is already a mul_u32_u32() wrapper in arch/x86/include/asm/div64.h.
There needs to be a similar add_u64_u32() (contains add %s,%d_lo, adc $0,%d_hi).
Without them gcc spills a lot of values to stack - including constant zeros.
I might add those and use them in v3 (which I need to send anyway).
They'll match what my 'pending' faster code does.
David
next prev parent reply other threads:[~2025-05-21 12:52 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight
2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight
2025-05-20 2:11 ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight
2025-05-18 15:42 ` kernel test robot
2025-05-18 21:50 ` David Laight
2025-05-19 6:10 ` Uwe Kleine-König
2025-05-19 11:59 ` David Laight
2025-05-20 1:54 ` Nicolas Pitre
2025-05-20 2:21 ` Nicolas Pitre
2025-05-20 21:43 ` David Laight
2025-05-20 22:28 ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight
2025-05-20 3:03 ` Nicolas Pitre
2025-05-20 21:37 ` David Laight
2025-05-20 22:24 ` Nicolas Pitre
2025-05-21 12:52 ` David Laight [this message]
2025-05-21 13:50 ` Nicolas Pitre
2025-05-25 11:38 ` David Laight
2025-05-18 13:38 ` [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight
2025-05-20 3:07 ` Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250521135246.7dab6bda@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=biju.das.jz@bp.renesas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=npitre@baylibre.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=u.kleine-koenig@baylibre.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox