All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Laight <david.laight.linux@gmail.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Li,Rongqing" <lirongqing@baidu.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH] x86/math64: handle #DE in mul_u64_u64_div_u64()
Date: Tue, 22 Jul 2025 22:56:53 +0100	[thread overview]
Message-ID: <20250722225653.65520fa3@pumpkin> (raw)
In-Reply-To: <20250722183853.GD2845@redhat.com>

On Tue, 22 Jul 2025 20:38:54 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 07/22, H. Peter Anvin wrote:
> >
> > On July 22, 2025 10:58:08 AM PDT, Oleg Nesterov <oleg@redhat.com> wrote:  
> > >On 07/22, H. Peter Anvin wrote:  
> > >>
> > >> On July 22, 2025 3:50:35 AM PDT, Oleg Nesterov <oleg@redhat.com> wrote:  
> > >> >
> > >> >The generic implementation doesn't WARN... OK, I won't argue.
> > >> >How about
> > >> >
> > >> >	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> > >> >	{
> > >> >		char ok = 0;
> > >> >		u64 q;
> > >> >
> > >> >		asm ("mulq %3; 1: divq %4; movb $1,%1; 2:\n"
> > >> >			_ASM_EXTABLE(1b, 2b)
> > >> >			: "=a" (q), "+r" (ok)
> > >> >			: "a" (a), "rm" (mul), "rm" (div)
> > >> >			: "rdx");
> > >> >
> > >> >		if (ok)
> > >> >			return q;
> > >> >		BUG_ON(!div);
> > >> >		WARN_ON_ONCE(1);
> > >> >		return ~(u64)0;
> > >> >	}
> > >> >
> > >> >?
> > >> >
> > >> >Oleg.  
> > >>
> > >> Maybe the generic version *should* warn?  
> > >
> > >David is going to change the generic version to WARN.
> > >  
> > >> As far as the ok output, the Right Way™ to do it is with an asm goto instead
> > >> of a status variable; the second best tends to be to use the flags output.  
> > >
> > >This is what I was going to do initially. But this needs
> > >CONFIG_CC_HAS_ASM_GOTO_OUTPUT
> > >
> > >Oleg.
> > >  
> >
> > But that's what you want to optimize for, since that is all the modern compilers, even if you have to have two versions as a result.  
> 
> Well, this 'divq' is slow anyway, I don't won't to add 2 versions.
> Can we add the optimized version later if it really makes sense?

Yes, what matters more is code size and simplicity of use (by the caller).
Zen3 has a reasonably fast divq, but you have to get to 'cannon lake' to
get an intel one that isn't near to 100 clocks.

The generic code is horrid - nearly 1000 clocks for random data running
the 32bit code on sandy bridge! (I'm not sure newer will be much better).
(And non-x86 without 'sh[rl]d' will be worse.)

I've re-written it (patches posted a few weeks back).
Sandy bridge is now (from memory) ~250 clcoks in 32bit and ~150 in 64bit,
zen5 ~80 (helped by the much faster divq used for 64/64 divide).

The killer is mispredicted branches, change the arguments so a
slightly different path is taken and it costs at least 20 clocks.

I'm timing single calls, any kind of loop trains the branch predictor
(I'm not running cold cache though).

Given those clock counts I rally wouldn't worry about a few integer
instructions in the x86_64 path.

	David

> 
> Oleg.
> 


  parent reply	other threads:[~2025-07-22 21:56 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-21 13:04 [PATCH] x86/math64: handle #DE in mul_u64_u64_div_u64() Oleg Nesterov
2025-07-21 18:20 ` David Laight
2025-07-22 10:50   ` Oleg Nesterov
2025-07-22 12:09     ` David Laight
2025-07-22 13:21       ` Oleg Nesterov
2025-07-22 22:03         ` David Laight
2025-07-23  9:38           ` Oleg Nesterov
2025-07-23 21:48             ` David Laight
2025-07-24  8:11               ` Oleg Nesterov
2025-07-24  8:25                 ` Oleg Nesterov
2025-07-24 11:14                   ` Oleg Nesterov
2025-07-25  1:00                     ` H. Peter Anvin
2025-07-25 10:12                       ` Oleg Nesterov
2025-07-25 21:46                         ` David Laight
2025-07-26  9:55                           ` Oleg Nesterov
2025-07-24 12:00                   ` David Laight
2025-07-24 13:58                     ` Oleg Nesterov
2025-07-22 16:53       ` H. Peter Anvin
2025-07-22 21:53         ` David Laight
2025-07-22 16:50     ` H. Peter Anvin
2025-07-22 17:58       ` Oleg Nesterov
2025-07-22 18:12         ` H. Peter Anvin
2025-07-22 18:38           ` Oleg Nesterov
2025-07-22 19:26             ` H. Peter Anvin
2025-07-22 21:56             ` David Laight [this message]
2025-07-27 12:34 ` [PATCH v2] " Oleg Nesterov
2025-07-28 18:53   ` David Laight
2025-07-30  2:30   ` [????] " Li,Rongqing

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250722225653.65520fa3@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lirongqing@baidu.com \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.