From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailtransmit04.runbox.com (mailtransmit04.runbox.com [185.226.149.37])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id D33F72DA765
	for <linux-kernel@vger.kernel.org>; Thu, 27 Nov 2025 09:58:25 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.226.149.37
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1764237509; cv=none; b=k/vCUU0LrDKol/Emy3j99vUJTA41qkD8DenA0OdO4pvtcSwrbRVoPX11l5UGOcy2cZiq/RdSBKpPk9SbHr01Vh2t4aYUMe5aCs1XtcS+FSpOF3+oFpHQdgIRkuy/OSS/cDDTrc2Z4ZLDRBPCetY5EolGcpnWme67TEg3iPfmMfw=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1764237509; c=relaxed/simple;
	bh=Uuva1rPzLArm+7N8U5R9+cOnFd9xyBtdmho2c5n3iuQ=;
	h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type; b=OSMiPLt/0xUz+KyO0n5JtmxwqT9+rcxN266Ad5gAVV+hBbr5+0Fb+iqYJgNyHaUyZ1zM623LkIhXQxa8JpGtwW+v0Yadn5L5GSCGClIgWvJ8zLF3ZDzEexdweXoQjA+ylhl/ubb7i7Jp1Qrp9cS76Lr5aOetDu1qBA9wVSUePKs=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=runbox.com; spf=pass smtp.mailfrom=runbox.com; dkim=pass (2048-bit key) header.d=runbox.com header.i=@runbox.com header.b=rhyLEnUK; arc=none smtp.client-ip=185.226.149.37
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=runbox.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=runbox.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=runbox.com header.i=@runbox.com header.b="rhyLEnUK"
Received: from mailtransmit02.runbox ([10.9.9.162] helo=aibo.runbox.com)
	by mailtransmit04.runbox.com with esmtps  (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	(Exim 4.93)
	(envelope-from <david.laight@runbox.com>)
	id 1vOYlQ-00CJts-Aj; Thu, 27 Nov 2025 10:58:12 +0100
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=runbox.com;
	 s=selector1; h=Content-Transfer-Encoding:Content-Type:MIME-Version:
	References:In-Reply-To:Message-ID:Subject:Cc:To:From:Date;
	bh=9hxwA7xIEnlzzV7siRRc7j91/JVnJknVoZgNEPdUlXc=; b=rhyLEnUK2fEHo28akBW5bAHe8b
	qd/2cDQ1v3yFjmpbPYJTg/PnoL9mc3ALqwJwCIHTTdP9djNqwxxZYNwhh6QOvEeI92QjBRJJ9A5aH
	mgn/yRhgAU91qy1lvegL61WoMAgorKil4hBuUjFYOFzPDGI/vm2LItZJ6DGm+DMBnZIKxOunCUXmb
	QFiGbD9opuucezDfv/6kY9cB3t/xpaZK6LUMebzLFc6OD6Bp6rmyNB9WhLUlZhCOlOC+gFCNItiXD
	FbA1KCR0vAAUJL/LfBp4cLgiCdVPRHeNXyuwW1fYWO/hj5s9pWTdJGN9gPE3KDshcHA0d2en6h/zb
	Du5UDh4A==;
Received: from [10.9.9.73] (helo=submission02.runbox)
	by mailtransmit02.runbox with esmtp (Exim 4.86_2)
	(envelope-from <david.laight@runbox.com>)
	id 1vOYlP-0004Ef-86; Thu, 27 Nov 2025 10:58:11 +0100
Received: by submission02.runbox with esmtpsa  [Authenticated ID (1493616)]  (TLS1.2:ECDHE_SECP256R1__RSA_SHA256__AES_256_GCM:256)
	(Exim 4.93)
	id 1vOYlH-00H0XS-TL; Thu, 27 Nov 2025 10:58:04 +0100
Date: Thu, 27 Nov 2025 09:58:01 +0000
From: david laight <david.laight@runbox.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: x86@kernel.org, glx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, hpa@zytor.com, linux-kernel@vger.kernel.org,
 torvalds@linux-foundation.org, olichtne@redhat.com, atomasov@redhat.com,
 aokuliar@redhat.com
Subject: Re: performance anomaly in rep movsq/movsb as seen on Sapphire
 Rapids executing sync_regs()
Message-ID: <20251127095801.0473d641@pumpkin>
In-Reply-To: <mwwusvl7jllmck64xczeka42lglmsh7mlthuvmmqlmi5stp3na@raiwozh466wz>
References: <mwwusvl7jllmck64xczeka42lglmsh7mlthuvmmqlmi5stp3na@raiwozh466wz>
X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; arm-unknown-linux-gnueabihf)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 27 Nov 2025 07:55:27 +0100
Mateusz Guzik <mjguzik@gmail.com> wrote:

> Sapphire Rapids has both ERMS (of course) and FSRM.
> 
> sync_regs() runs into a corner case where both rep movsq and rep movsb
> suffer massive penalty for being used to copy 168 bytes, which clear
> itself when data is copied by a bunch of movq instead.
> 
> I verified the issue is not present on AMD EPYC 9454, I don't know about
> other Intel CPUs.

On pretty much all Intel CPUs, 'rep movsb' and 'rep movsq' seem to be
implemented in the same hardware - so the length in the 'q' case is
just multiplied by 8.
(That goes all the way back to Sandy Bridge.)
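
As a rough illustration of the two forms (a sketch only, not the
kernel's code; the helper names are made up here), the only difference
visible to software is the unit of the count register. It assumes
GCC/clang inline asm, the direction flag clear as the ABI requires, and
it falls back to memcpy() on non-x86-64 so that it still builds:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes with 'rep movsb' (count in bytes). */
static void copy_rep_movsb(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__)
	__asm__ volatile("rep movsb"
			 : "+D" (dst), "+S" (src), "+c" (len)
			 : : "memory");
#else
	memcpy(dst, src, len);	/* fallback so the sketch builds elsewhere */
#endif
}

/* Hypothetical helper: copy len bytes with 'rep movsq' (count in qwords). */
static void copy_rep_movsq(void *dst, const void *src, size_t len)
{
	size_t qwords = len / 8;

	assert(len % 8 == 0);	/* e.g. 168 bytes is 21 qwords */
#if defined(__x86_64__)
	__asm__ volatile("rep movsq"
			 : "+D" (dst), "+S" (src), "+c" (qwords)
			 : : "memory");
#else
	memcpy(dst, src, qwords * 8);
#endif
}
```

For the 168 byte copy in sync_regs() that would be a count of 168 for
the 'b' form and 21 for the 'q' form.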

I'm guessing all the copies are at the same page alignment?

I found some strange alignment related issues on a Zen 5 CPU.
Mostly neither the source nor destination alignment made much difference.
(Apart from (IIRC) 64 byte aligning the destination doubling throughput.)
But some copies were horribly slow.
It was something like copies where the page offset of the destination
was less than 64 bytes from the page offset of the src and the src wasn't
on a page boundary (the byte alignment wasn't relevant).
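
Written as a predicate, one reading of that condition would be the
following. This is a sketch from memory, not a verified rule: the 4096
byte page size, the use of the absolute distance between the two page
offsets, and the function name are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/*
 * Hypothetical check for the slow case described above: the source is
 * not page aligned and the destination's page offset is within 64 bytes
 * of the source's page offset (taken here as an absolute distance).
 */
static bool copy_may_be_slow(const void *dst, const void *src)
{
	size_t dst_off = (uintptr_t)dst % SKETCH_PAGE_SIZE;
	size_t src_off = (uintptr_t)src % SKETCH_PAGE_SIZE;
	size_t dist = dst_off > src_off ? dst_off - src_off
					: src_off - dst_off;

	return src_off != 0 && dist < 64;
}
```

A benchmark could use something like this to bucket its results by
whether the buffers fall into the suspect pattern.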

I wonder if Sapphire Rapids has some similar perversion?
Or is that one of the big/little CPUs where most of the cores are
actually Atom ones, which may not have either ERMS or FSRM?
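
Checking what a given core reports is cheap from userspace. A minimal
sketch, assuming GCC/clang's <cpuid.h>; the struct and function names
are made up here. ERMS is CPUID.(EAX=7,ECX=0):EBX bit 9 and FSRM is
CPUID.(EAX=7,ECX=0):EDX bit 4. CPUID answers for whichever core the
thread is running on, so on a hybrid part the thread would need pinning
to each core type in turn:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

static_assert(sizeof(unsigned int) >= 4, "CPUID registers are 32 bits wide");

struct rep_movs_features {
	bool erms;	/* Enhanced REP MOVSB/STOSB */
	bool fsrm;	/* Fast Short REP MOVSB */
};

/* Decode the two feature bits from CPUID leaf 7, subleaf 0 values. */
static struct rep_movs_features decode_leaf7(uint32_t ebx, uint32_t edx)
{
	struct rep_movs_features f = {
		.erms = (ebx >> 9) & 1,	/* CPUID.(EAX=7,ECX=0):EBX[9] */
		.fsrm = (edx >> 4) & 1,	/* CPUID.(EAX=7,ECX=0):EDX[4] */
	};

	return f;
}

/* Query the core this thread is currently running on. */
static struct rep_movs_features query_rep_movs_features(void)
{
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

#if defined(__x86_64__) || defined(__i386__)
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		ebx = edx = 0;	/* leaf 7 not supported: report neither */
#endif
	(void)eax;
	(void)ecx;
	return decode_leaf7(ebx, edx);
}
```

On Linux the same two bits show up as 'erms' and 'fsrm' in the flags
line of /proc/cpuinfo.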

I need to rerun those tests using data dependencies instead of lfence
and get a much better estimation of the instruction setup time.
But I lack old AMD and new Intel hardware.
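
The general shape of a data-dependent test would be something like the
sketch below (not the test I actually ran; the names are made up, and
it assumes GCC/clang and POSIX clock_gettime()). Each copy's addresses
are derived from a byte loaded back from the previous copy's
destination, so successive copies cannot overlap and no lfence is
needed between them:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hide a value from the optimizer without emitting an instruction. */
static inline size_t opaque(size_t v)
{
	__asm__ volatile("" : "+r" (v));
	return v;
}

/*
 * Hypothetical harness: run 'iters' dependent copies of 'len' bytes and
 * return the elapsed time in nanoseconds.
 */
static uint64_t time_chained_copies(unsigned char *dst,
				    const unsigned char *src,
				    size_t len, unsigned int iters)
{
	struct timespec t0, t1;
	size_t dep = 0;

	assert(len > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (unsigned int i = 0; i < iters; i++) {
		memcpy(dst + dep, src + dep, len);
		/* Force a real load from dst rather than a forwarded value. */
		__asm__ volatile("" : : : "memory");
		/*
		 * A byte shifted right by 8 is always 0, but the compiler
		 * cannot see that through opaque(), and the CPU has to wait
		 * for the load before it can form the next addresses.
		 */
		dep = opaque(dst[dep + len - 1]) >> 8;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (uint64_t)(t1.tv_sec - t0.tv_sec) * 1000000000u
		+ (uint64_t)t1.tv_nsec - (uint64_t)t0.tv_nsec;
}
```

To measure a specific instruction the memcpy() would be replaced with
the 'rep movsb' or 'rep movsq' sequence under test, and the loop
overhead subtracted using a run with a zero length copy.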

	David