Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Laight <david.laight.linux@gmail.com>
To: Nicolas Pitre <npitre@baylibre.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, u.kleine-koenig@baylibre.com,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Biju Das <biju.das.jz@bp.renesas.com>
Subject: Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()
Date: Wed, 21 May 2025 13:52:46 +0100	[thread overview]
Message-ID: <20250521135246.7dab6bda@pumpkin> (raw)
In-Reply-To: <148nop5q-s958-n0q4-66r8-o91ns4pnr4on@onlyvoer.pbz>

On Tue, 20 May 2025 18:24:58 -0400 (EDT)
Nicolas Pitre <npitre@baylibre.com> wrote:

> On Tue, 20 May 2025, David Laight wrote:
> 
> > On Mon, 19 May 2025 23:03:21 -0400 (EDT)
> > Nicolas Pitre <npitre@baylibre.com> wrote:
> >   
...
> > > Here you should do:
> > > 
> > > 	if (ilog2(a) + ilog2(b) <= 62) {
> > > 		u64 ab = a * b;
> > > 		u64 abc = ab + c;
> > > 		if (ab <= abc)
> > > 			return div64_u64(abc, d);
> > > 	}
> > > 
> > > This is cheap and won't unconditionally discard the faster path when c != 0;  
> > 
> > That isn't really cheap.
> > ilog2() is likely to be a similar cost to a multiply
> > (my brain remembers them both as 'latency 3' on x86).  
> 
> I'm not discussing the ilog2() usage though. I'm just against limiting 
> the test to !c. My suggestion is about supporting all values of c.

I've had further thoughts on that test.
Most (but not all - and I've forgotten which) 64bit cpu have a 64x64=>128
multiple and support u128.
So the 'multiply in parts' code is mostly for 32bit.
That means that the 'a * b' for the call to div64_u64() has to be three
32x32=>64 multiplies with all the extra 'add' and 'adc $0' to generate
a correct 64bit result.
This is (well should be) much the same as the multiply coded in the
function - except it is generated by the compiler itself.
The only parts it can ignore are the those that set 'z' and 'y_hi'.
If my clock sequence (in the other email) is correct it saves 3 of 10
clocks - so test to avoid the multiply has to be better than that.
That probably means the only worthwhile check is for a and b being 32bit
so a single multiply can be used.

The generated code for 32bit x86 isn't as good as one might hope.
partially due to only having 7 (6 if %bp is a stack frame) registers.
clang makes a reasonable job of it, gcc doesn't.
There is already a mul_u32_u32() wrapper in arch/x86/include/asm/div64.h.
There needs to be a similar add_u64_u32() (contains add %s,%d_lo, adc $0,%d_hi).
Without them gcc spills a lot of values to stack - including constant zeros.

I might add those and use them in v3 (which I need to send anyway).
They'll match what my 'pending' faster code does.

	David

next prev parent reply	other threads:[~2025-05-21 12:52 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight
2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight
2025-05-20  2:11   ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight
2025-05-18 15:42   ` kernel test robot
2025-05-18 21:50     ` David Laight
2025-05-19  6:10   ` Uwe Kleine-König
2025-05-19 11:59     ` David Laight
2025-05-20  1:54       ` Nicolas Pitre
2025-05-20  2:21   ` Nicolas Pitre
2025-05-20 21:43     ` David Laight
2025-05-20 22:28       ` Nicolas Pitre
2025-05-18 13:38 ` [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight
2025-05-20  3:03   ` Nicolas Pitre
2025-05-20 21:37     ` David Laight
2025-05-20 22:24       ` Nicolas Pitre
2025-05-21 12:52         ` David Laight [this message]
2025-05-21 13:50           ` Nicolas Pitre
2025-05-25 11:38             ` David Laight
2025-05-18 13:38 ` [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight
2025-05-20  3:07   ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250521135246.7dab6bda@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=biju.das.jz@bp.renesas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npitre@baylibre.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=u.kleine-koenig@baylibre.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.