Re: performance anomaly in rep movsq/movsb as seen on Sapphire Rapids executing sync_regs()

public inbox for linux-kernel@vger.kernel.org

From: david laight <david.laight@runbox.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: x86@kernel.org, glx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
	olichtne@redhat.com, atomasov@redhat.com, aokuliar@redhat.com
Subject: Re: performance anomaly in rep movsq/movsb as seen on Sapphire Rapids executing sync_regs()
Date: Thu, 27 Nov 2025 09:58:01 +0000
Message-ID: <20251127095801.0473d641@pumpkin>
In-Reply-To: <mwwusvl7jllmck64xczeka42lglmsh7mlthuvmmqlmi5stp3na@raiwozh466wz>

On Thu, 27 Nov 2025 07:55:27 +0100
Mateusz Guzik <mjguzik@gmail.com> wrote:

> Sapphire Rapids has both ERMS (of course) and FSRM.
> 
> sync_regs() runs into a corner case where both rep movsq and rep movsb
> suffer a massive penalty when used to copy 168 bytes, which clears
> itself when the data is copied by a bunch of movq instead.
> 
> I verified the issue is not present on AMD EPYC 9454, I don't know about
> other Intel CPUs.

On pretty much all Intel CPUs, 'rep movsb' and 'rep movsq' seem to be
implemented by the same hardware, so in the 'q' case the length is
just multiplied by 8.
(That goes all the way back to Sandy Bridge.)
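For reference, a minimal sketch of the two forms (GCC-style inline asm, x86-64 only, with a memcpy() fallback elsewhere; the helper names are made up for illustration). Copying the 168 bytes in sync_regs() means RCX = 168 for rep movsb and RCX = 21 for rep movsq. The code only shows how the count is expressed; it says nothing about what the microcode does underneath.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes with "rep movsb" (RCX = byte count).
 * Relies on the ABI guarantee that the direction flag is clear.
 * Falls back to memcpy() when not building for x86-64 with a GCC-compatible compiler.
 */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ __volatile__("rep movsb"
			     : "+D" (dst), "+S" (src), "+c" (n)
			     : : "memory");
#else
	memcpy(dst, src, n);
#endif
}

/* Hypothetical helper: copy nq 8-byte words with "rep movsq" (RCX = qword count). */
static void copy_rep_movsq(void *dst, const void *src, size_t nq)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ __volatile__("rep movsq"
			     : "+D" (dst), "+S" (src), "+c" (nq)
			     : : "memory");
#else
	memcpy(dst, src, nq * 8);
#endif
}

/* The same byte copy expressed in qwords: the count is simply n / 8. */
static void copy_bytes_as_qwords(void *dst, const void *src, size_t n)
{
	assert(n % 8 == 0);	/* only whole qwords in this sketch */
	copy_rep_movsq(dst, src, n / 8);
}
```

So copy_rep_movsb(dst, src, 168) and copy_bytes_as_qwords(dst, src, 168) move the same bytes; if both forms feed the same hardware, they would be expected to hit the same slow path.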

I'm guessing all the copies have the same page alignment?

I found some strange alignment-related issues on a Zen 5 CPU.
Mostly, neither the source nor the destination alignment made much
difference (apart from, IIRC, 64-byte aligning the destination, which
doubled throughput). But some copies were horribly slow.
It was something like copies where the page offset of the destination
was less than 64 bytes from the page offset of the source, and the
source wasn't on a page boundary (the byte alignment wasn't relevant).
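To make that condition concrete, a sketch of one literal reading of it as a predicate. Assumptions, all hypothetical: 4 KiB pages, "less than 64 bytes from" taken as an absolute difference in either direction, and no wraparound across the page boundary (src offset 4090 vs dst offset 10 counts as far apart). The function names are made up; this is not a verified description of the Zen 5 behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages and the 64-byte window described above. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_WINDOW    64u

/* The mask trick in page_offset() only works for a power-of-two page size. */
static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

static unsigned int page_offset(uintptr_t addr)
{
	return (unsigned int)(addr & (SKETCH_PAGE_SIZE - 1));
}

/*
 * Hypothetical predicate: true if (src, dst) matches the slow pattern as
 * read above: the source is not page aligned and the destination's page
 * offset is within 64 bytes (either direction) of the source's page offset.
 */
static bool maybe_slow_copy(uintptr_t src, uintptr_t dst)
{
	unsigned int s = page_offset(src);
	unsigned int d = page_offset(dst);
	unsigned int dist = d > s ? d - s : s - d;

	if (s == 0)
		return false;	/* source on a page boundary: not affected */
	return dist < SKETCH_WINDOW;
}
```

For example, src = 0x1010 and dst = 0x5020 (offsets 16 and 32) would be flagged, while src = 0x1000 (page aligned) would not.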

I wonder if Sapphire Rapids has some similar perversion?
Or is it one of the big/little CPUs where most of the cores are
actually Atom ones, which may not have either ERMS or FSRM?
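One way to check what a given core claims: a minimal sketch using GCC/clang's <cpuid.h>. ERMS is CPUID leaf 7 subleaf 0, EBX bit 9, and FSRM is EDX bit 4 of the same leaf (Linux shows them as "erms" and "fsrm" in /proc/cpuinfo). The helper and struct names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Extract one feature bit from a 32-bit CPUID register value. */
static bool cpuid_bit(unsigned int reg, unsigned int bit)
{
	assert(bit < 32);	/* shifting a 32-bit value by >= 32 is undefined */
	return (reg >> bit) & 1u;
}

struct rep_movs_features {
	bool erms;	/* Enhanced REP MOVSB/STOSB: CPUID.(EAX=7,ECX=0):EBX bit 9 */
	bool fsrm;	/* Fast Short REP MOV:       CPUID.(EAX=7,ECX=0):EDX bit 4 */
};

/*
 * CPUID describes the logical CPU the calling thread happens to be running
 * on, so on a hybrid part pin the thread (e.g. with taskset) before trusting
 * the answer for a particular core type. Reports nothing on non-x86 builds.
 */
static struct rep_movs_features query_rep_movs_features(void)
{
	struct rep_movs_features f = { false, false };
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 if leaf 7 is not implemented. */
	if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
		f.erms = cpuid_bit(ebx, 9);
		f.fsrm = cpuid_bit(edx, 4);
	}
#endif
	return f;
}
```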

I need to rerun those tests using data dependencies instead of lfence
and get a much better estimate of the instruction setup time.
But I lack old AMD and new Intel hardware.

	David
