Re: [PATCH v4 next 3/9] lib: mul_u64_u64_div_u64() simplify check for a 64bit product

From: David Laight <david.laight.linux@gmail.com>
To: Nicolas Pitre <npitre@baylibre.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, u.kleine-koenig@baylibre.com,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Biju Das <biju.das.jz@bp.renesas.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Li RongQing <lirongqing@baidu.com>, Yu Kuai <yukuai3@huawei.com>,
	Khazhismel Kumykov <khazhy@chromium.org>,
	Jens Axboe <axboe@kernel.dk>,
	x86@kernel.org
Subject: Re: [PATCH v4 next 3/9] lib: mul_u64_u64_div_u64() simplify check for a 64bit product
Date: Fri, 31 Oct 2025 09:19:18 +0000
Message-ID: <20251031091918.643b0868@pumpkin>
In-Reply-To: <26p1nq66-8pq5-3655-r7n5-102o989391s2@onlyvoer.pbz>

On Wed, 29 Oct 2025 14:11:08 -0400 (EDT)
Nicolas Pitre <npitre@baylibre.com> wrote:

> On Wed, 29 Oct 2025, David Laight wrote:
> 
> > If the product is only 64bits div64_u64() can be used for the divide.
> > Replace the pre-multiply check (ilog2(a) + ilog2(b) <= 62) with a
> > simple post-multiply check that the high 64bits are zero.
> > 
> > This has the advantage of being simpler, more accurate and less code.
> > It will always be faster when the product is larger than 64bits.
> > 
> > Most 64bit cpu have a native 64x64=128 bit multiply, this is needed
> > (for the low 64bits) even when div64_u64() is called - so the early
> > check gains nothing and is just extra code.
> > 
> > 32bit cpu will need a compare (etc) to generate the 64bit ilog2()
> > from two 32bit bit scans - so that is non-trivial.
> > (Never mind the mess of x86's 'bsr' and any oddball cpu without
> > fast bit-scan instructions.)
> > Whereas the additional instructions for the 128bit multiply result
> > are pretty much one multiply and two adds (typically the 'adc $0,%reg'
> > can be run in parallel with the instruction that follows).
> > 
> > The only outliers are 64bit systems without 128bit multiply and
> > simple in order 32bit ones with fast bit scan but needing extra
> > instructions to get the high bits of the multiply result.
> > I doubt it makes much difference to either, the latter is definitely
> > not mainstream.
> > 
> > If anyone is worried about the analysis they can look at the
> > generated code for x86 (especially when cmov isn't used).
> > 
> > Signed-off-by: David Laight <david.laight.linux@gmail.com>  
> 
> Comment below.
> 
> 
> > ---
> > 
> > Split from patch 3 for v2, unchanged since.
> > 
> >  lib/math/div64.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/math/div64.c b/lib/math/div64.c
> > index 1092f41e878e..7158d141b6e9 100644
> > --- a/lib/math/div64.c
> > +++ b/lib/math/div64.c
> > @@ -186,9 +186,6 @@ EXPORT_SYMBOL(iter_div_u64_rem);
> >  #ifndef mul_u64_u64_div_u64
> >  u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
> >  {
> > -	if (ilog2(a) + ilog2(b) <= 62)
> > -		return div64_u64(a * b, d);
> > -
> >  #if defined(__SIZEOF_INT128__)
> >  
> >  	/* native 64x64=128 bits multiplication */
> > @@ -224,6 +221,9 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
> >  		return ~0ULL;
> >  	}
> >  
> > +	if (!n_hi)
> > +		return div64_u64(n_lo, d);  
> 
> I'd move this before the overflow test. If this is to be taken then 
> you'll save one test. Same cost otherwise.
> 
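
For reference, the difference between the two checks in the patch can be
shown in user space. This is only a rough sketch: model_ilog2(),
pre_check_fits() and post_check_fits() are made-up names, model_ilog2()
is a stand-in for the kernel's ilog2(), and it assumes a compiler that
provides unsigned __int128 and __builtin_clzll().

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for ilog2(): floor(log2(v)), with -1 assumed for v == 0. */
static int model_ilog2(uint64_t v)
{
	return v ? 63 - __builtin_clzll(v) : -1;
}

/* Old test: decided before multiplying, so it has to be conservative. */
static bool pre_check_fits(uint64_t a, uint64_t b)
{
	return model_ilog2(a) + model_ilog2(b) <= 62;
}

/* New test: the product fits in 64 bits iff the high 64 bits are zero. */
static bool post_check_fits(uint64_t a, uint64_t b)
{
	unsigned __int128 n = (unsigned __int128)a * b;

	return (uint64_t)(n >> 64) == 0;
}
```

With a = 1ULL << 32 and b = 1ULL << 31 the product is 2^63, which fits
in 64 bits, but the ilog2() sum is 63 so the old test rejects it.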

I wanted the 'divide by zero' result to be consistent.

Additionally, the change that stops the x86-64 version panicking on
overflow also makes it return ~0 for divide by zero.
Once that is done, this version needs to be consistent and
return ~0 for divide by zero - which div64_u64() won't do.
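
A rough user-space sketch of that ordering, to show why the 64bit
shortcut sits after the overflow test. This is not the code in
lib/math/div64.c: model_mul_div() is a made-up name, plain '/' stands
in for div64_u64(), and it assumes a compiler with unsigned __int128.

```c
#include <stdint.h>

/* Hypothetical model of the check ordering, not the kernel function. */
static uint64_t model_mul_div(uint64_t a, uint64_t b, uint64_t d)
{
	unsigned __int128 n = (unsigned __int128)a * b;
	uint64_t n_hi = (uint64_t)(n >> 64);
	uint64_t n_lo = (uint64_t)n;

	/*
	 * n_hi >= d is true when d == 0 and when the quotient would
	 * need more than 64 bits, so both cases return ~0 here.
	 */
	if (n_hi >= d)
		return ~(uint64_t)0;

	/* 64bit product: d is known to be non-zero at this point. */
	if (!n_hi)
		return n_lo / d;

	/* General case: the quotient is known to fit in 64 bits. */
	return (uint64_t)(n / d);
}
```

If the '!n_hi' test came first, a zero divisor with a small product
would reach the plain divide instead of returning ~0.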

It is worth remembering that the chance of (a * b + c)/d being ~0
is pretty small (for non-test inputs), and any code that might expect
such a value is likely to have to handle overflow as well.
(Not to mention avoiding overflow of 'a' and 'b'.)
So using ~0 for overflow isn't really a problem.

	David
