* Re: [PATCH next] i386: Remove string functions that use 'rep scasb'
From: Dave Hansen @ 2026-03-30 16:58 UTC
To: david.laight.linux, Andrew Morton, Andy Shevchenko, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
    Uros Bizjak, linux-kernel, Kees Cook, Linus Torvalds

On 3/27/26 12:57, david.laight.linux@gmail.com wrote:
> The fixed overhead of all the 'rep xxx' instructions is rather more
> than one might expect. While 'rep movs' is getting better on more recent
> CPUs, the same is not true for 'rep scasb'. On my Zen-5 it has a
> fixed overhead of 150 clocks and then takes 3 clocks for each byte.
> I've not measured any Intel CPU, but the cost might be 'only' 40 +
> 2n.

One measurement on a modern 64-bit CPU isn't super convincing to me.

> Remove the asm versions of strcat() strncat() strlen() memchr()
> and memscan(), the generic C versions will be faster.
>
> It is quite likely that all these functions are slower than the generic
> code on pretty much all CPUs since the 486.

This is rather handwavy for my taste.

There seem to be two valid paths here:

 1. We continue the "nobody cares about 32-bit" refrain. This removes a
    bunch of 32-bit-only code and complexity. If it causes a performance
    regression, we do not care much.
 2. Someone makes _some_ kind of effort to test this on at least *one*
    32-bit-only CPU to see if it does any harm.

In other words, I'm not opposed to the patch, but the justification
doesn't really work for me as written.
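For reference, the figures quoted from the patch description follow a simple linear model, clocks(n) = fixed + per_byte * n. The sketch below only evaluates that model for the two parameter sets mentioned (150 + 3n measured on Zen-5, 40 + 2n guessed for Intel). The struct and function names are made up for illustration, and the model ignores cache, alignment, and branch effects.

```c
#include <stddef.h>

/* Linear cost model: clocks(n) = fixed + per_byte * n. */
struct scan_cost {
	unsigned long fixed;     /* startup overhead, in clocks */
	unsigned long per_byte;  /* incremental cost per byte scanned */
};

/*
 * Figures quoted in the patch description. The Zen-5 pair is the
 * author's measurement; the Intel pair is only a guess in the
 * original text as well.
 */
static const struct scan_cost zen5_rep_scasb  = { 150, 3 };
static const struct scan_cost intel_rep_scasb = {  40, 2 };

/* Estimated clocks to scan n bytes under the model (hypothetical helper). */
static unsigned long scan_clocks(const struct scan_cost *c, size_t n)
{
	return c->fixed + c->per_byte * (unsigned long)n;
}
```

Under this model a 16-byte scan costs 150 + 3 * 16 = 198 clocks with the Zen-5 figures and 40 + 2 * 16 = 72 with the guessed Intel ones, so for short inputs the fixed term dominates.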
* Re: [PATCH next] i386: Remove string functions that use 'rep scasb'
From: Andy Shevchenko @ 2026-03-30 17:21 UTC
To: Dave Hansen
Cc: david.laight.linux, Andrew Morton, Andy Shevchenko, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
    Uros Bizjak, linux-kernel, Kees Cook, Linus Torvalds

On Mon, Mar 30, 2026 at 7:58 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 3/27/26 12:57, david.laight.linux@gmail.com wrote:
> > The fixed overhead of all the 'rep xxx' instructions is rather more
> > than one might expect. While 'rep movs' is getting better on more recent
> > CPUs, the same is not true for 'rep scasb'. On my Zen-5 it has a
> > fixed overhead of 150 clocks and then takes 3 clocks for each byte.
> > I've not measured any Intel CPU, but the cost might be 'only' 40 +
> > 2n.
>
> One measurement on a modern 64-bit CPU isn't super convincing to me.
>
> > Remove the asm versions of strcat() strncat() strlen() memchr()
> > and memscan(), the generic C versions will be faster.
> >
> > It is quite likely that all these functions are slower than the generic
> > code on pretty much all CPUs since the 486.
>
> This is rather handwavy for my taste.
>
> There seem to be two valid paths here:
>
>  1. We continue the "nobody cares about 32-bit" refrain. This removes a
>     bunch of 32-bit-only code and complexity. If it causes a performance
>     regression, we do not care much.
>  2. Someone makes _some_ kind of effort to test this on at least *one*
>     32-bit-only CPU to see if it does any harm.
>
> In other words, I'm not opposed to the patch, but the justification
> doesn't really work for me as written.

I have Intel Quark at hand to test. But I need to know the
step-by-step instructions on what to do.

-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH next] i386: Remove string functions that use 'rep scasb'
From: David Laight @ 2026-03-30 19:20 UTC
To: Andy Shevchenko
Cc: Dave Hansen, Andrew Morton, Andy Shevchenko, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
    Uros Bizjak, linux-kernel, Kees Cook, Linus Torvalds

On Mon, 30 Mar 2026 20:21:41 +0300
Andy Shevchenko <andy.shevchenko@gmail.com> wrote:

> On Mon, Mar 30, 2026 at 7:58 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 3/27/26 12:57, david.laight.linux@gmail.com wrote:
> > > The fixed overhead of all the 'rep xxx' instructions is rather more
> > > than one might expect. While 'rep movs' is getting better on more recent
> > > CPUs, the same is not true for 'rep scasb'. On my Zen-5 it has a
> > > fixed overhead of 150 clocks and then takes 3 clocks for each byte.
> > > I've not measured any Intel CPU, but the cost might be 'only' 40 +
> > > 2n.
> >
> > One measurement on a modern 64-bit CPU isn't super convincing to me.
> >
> > > Remove the asm versions of strcat() strncat() strlen() memchr()
> > > and memscan(), the generic C versions will be faster.
> > >
> > > It is quite likely that all these functions are slower than the generic
> > > code on pretty much all CPUs since the 486.
> >
> > This is rather handwavy for my taste.
> >
> > There seem to be two valid paths here:
> >
> >  1. We continue the "nobody cares about 32-bit" refrain. This removes a
> >     bunch of 32-bit-only code and complexity. If it causes a performance
> >     regression, we do not care much.
> >  2. Someone makes _some_ kind of effort to test this on at least *one*
> >     32-bit-only CPU to see if it does any harm.
> >
> > In other words, I'm not opposed to the patch, but the justification
> > doesn't really work for me as written.
>
> I have Intel Quark at hand to test. But I need to know the
> step-by-step instructions on what to do.
>

I can run my test on a few 'older' systems, but I don't have anything
Intel before Sandy Bridge and only an AMD 'Excavator' (or similar).

I do remember (a long time ago) getting my Athlon 700 to run a copy loop
as fast as 'rep movl' - but the setup time was a lot worse.
So I suspect that generation of CPU didn't have a large overhead.
If I've read Agner's tables correctly, he gives a 40 clock setup from
the P-II onwards.

I can give you the source of the test I've been using.

	David
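The test source itself is not included in this thread. As a rough illustration of how a "fixed overhead plus clocks per byte" figure can be derived from timings, the sketch below fits the linear model clocks(n) = fixed + per_byte * n through measurements taken at two different lengths. The names and the sample numbers in the note are hypothetical; this is not the test program mentioned above.

```c
#include <stddef.h>

/* Result of fitting clocks(n) = fixed + per_byte * n. */
struct linear_fit {
	double fixed;     /* estimated startup overhead, in clocks */
	double per_byte;  /* estimated incremental cost per byte */
};

/*
 * Fit the model through two measurements (n1, t1) and (n2, t2).
 * The caller must pass n1 != n2. In practice each timing would
 * normally be the minimum or median of many runs, to filter noise.
 */
static struct linear_fit fit_two_points(size_t n1, double t1,
					size_t n2, double t2)
{
	struct linear_fit f;

	/* Slope: extra clocks per extra byte scanned. */
	f.per_byte = (t2 - t1) / ((double)n2 - (double)n1);
	/* Intercept: what remains once the per-byte part is subtracted. */
	f.fixed = t1 - f.per_byte * (double)n1;
	return f;
}
```

With made-up timings of 180 clocks at 10 bytes and 480 clocks at 110 bytes, the fit gives 3 clocks per byte and a fixed overhead of 150 clocks. Two points are the bare minimum; a fit over more lengths would show whether the cost really is linear on a given CPU.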
* Re: [PATCH next] i386: Remove string functions that use 'rep scasb'
From: Dave Hansen @ 2026-03-30 19:47 UTC
To: David Laight, Andy Shevchenko
Cc: Andrew Morton, Andy Shevchenko, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Uros Bizjak,
    linux-kernel, Kees Cook, Linus Torvalds

On 3/30/26 12:20, David Laight wrote:
> I have Intel Quark at hand to test. But I need to know the
> step-by-step instructions on what to do.

I'll take it if it's all that we have, but Quark is really weird. It's
probably Intel's last sold 32-bit-only CPU, but it wasn't used for
anything remotely performance sensitive, it's more like a 1995 CPU than
a 2010 CPU, and Intel probably sold like twenty of them. ;)

But, seriously, we don't need to go digging in the junk heap for
performance numbers. If nobody has one handy, it's just extra
justification for "we don't care".

But let's just say *THAT* instead of doing some kind of performance
theater where we pretend that like every cycle on CPUs from 2003 matters
on a 2026 kernel, and that we even cared enough to measure it.
* Re: [PATCH next] i386: Remove string functions that use 'rep scasb'
From: Maciej W. Rozycki @ 2026-03-31 0:27 UTC
To: Dave Hansen
Cc: David Laight, Andy Shevchenko, Andrew Morton, Andy Shevchenko,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
    H. Peter Anvin, Uros Bizjak, linux-kernel, Kees Cook, Linus Torvalds

On Mon, 30 Mar 2026, Dave Hansen wrote:

> > I have Intel Quark at hand to test. But I need to know the
> > step-by-step instructions on what to do.
>
> I'll take it if it's all that we have, but Quark is really weird. It's
> probably Intel's last sold 32-bit-only CPU, but it wasn't used for
> anything remotely performance sensitive, it's more like a 1995 CPU than
> a 2010 CPU, and Intel probably sold like twenty of them. ;)
>
> But, seriously, we don't need to go digging in the junk heap for
> performance numbers. If nobody has one handy, it's just extra
> justification for "we don't care".
>
> But let's just say *THAT* instead of doing some kind of performance
> theater where we pretend that like every cycle on CPUs from 2003 matters
> on a 2026 kernel, and that we even cared enough to measure it.

 FWIW I can benchmark on a genuine i486 or Pentium MMX system right away,
but I'm more concerned about support being dropped altogether rather than
squeezing out any extra cycles from these boxes at this point. If anyone
runs such equipment for performance nowadays, they must clearly be mad or
have missed something.

  Maciej
* Re: [PATCH next] i386: Remove string functions that use 'rep scasb'
From: Andy Shevchenko @ 2026-03-31 6:59 UTC
To: Maciej W. Rozycki
Cc: Dave Hansen, David Laight, Andrew Morton, Andy Shevchenko,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
    H. Peter Anvin, Uros Bizjak, linux-kernel, Kees Cook, Linus Torvalds

On Tue, Mar 31, 2026 at 3:27 AM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
> On Mon, 30 Mar 2026, Dave Hansen wrote:
>
> > > I have Intel Quark at hand to test. But I need to know the
> > > step-by-step instructions on what to do.
> >
> > I'll take it if it's all that we have, but Quark is really weird. It's
> > probably Intel's last sold 32-bit-only CPU, but it wasn't used for
> > anything remotely performance sensitive, it's more like a 1995 CPU than
> > a 2010 CPU, and Intel probably sold like twenty of them. ;)
> >
> > But, seriously, we don't need to go digging in the junk heap for
> > performance numbers. If nobody has one handy, it's just extra
> > justification for "we don't care".
> >
> > But let's just say *THAT* instead of doing some kind of performance
> > theater where we pretend that like every cycle on CPUs from 2003 matters
> > on a 2026 kernel, and that we even cared enough to measure it.
>
> FWIW I can benchmark on a genuine i486 or Pentium MMX system right away,
> but I'm more concerned about support being dropped altogether rather than
> squeezing out any extra cycles from these boxes at this point. If anyone
> runs such equipment for performance nowadays, they must clearly be mad or
> have missed something.

It makes sense for people who want a tiny x86 core running something
as fast as they can with all the benefits from that small core. Intel
Quark was designed for power and efficiency for the embedded world;
having slightly better performance is not a bad idea.

-- 
With Best Regards,
Andy Shevchenko
Thread overview: 6+ messages
     [not found] <20260327195747.89556-1-david.laight.linux@gmail.com>
2026-03-30 16:58 ` [PATCH next] i386: Remove string functions that use 'rep scasb' Dave Hansen
2026-03-30 17:21   ` Andy Shevchenko
2026-03-30 19:20     ` David Laight
2026-03-30 19:47       ` Dave Hansen
2026-03-31  0:27         ` Maciej W. Rozycki
2026-03-31  6:59           ` Andy Shevchenko