From: David Laight <david.laight.linux@gmail.com>
To: Herton Krzesinski <hkrzesin@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
x86@kernel.org, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
linux-kernel@vger.kernel.org, olichtne@redhat.com,
atomasov@redhat.com, aokuliar@redhat.com
Subject: Re: [PATCH] x86: add back the alignment of the destination to 8 bytes in copy_user_generic()
Date: Wed, 19 Mar 2025 13:07:39 +0000
Message-ID: <20250319130739.4c93069b@pumpkin>
In-Reply-To: <CAJmZWFGOc_HR2xrD_L9PADmfWP=Sg9v3-yeDRdiw=mhD_BSwww@mail.gmail.com>
On Tue, 18 Mar 2025 19:50:41 -0300
Herton Krzesinski <hkrzesin@redhat.com> wrote:
> On Tue, Mar 18, 2025 at 6:59 PM David Laight
> <david.laight.linux@gmail.com> wrote:
...
> For Intel, I was looking and looks like after Sandy Bridge based CPUs
> most/almost all have ERMS, and FSRM is something only newer ones have.
> So the way back to Ivy Bridge is ERMS and not FSRM.
ERMS behaves much the same as FSRM.
The cost of the first transfer is a few clocks higher (maybe 30 not 24),
and (IIRC) the overhead for the next couple of blocks is a bit bigger.
Reading Agner's tables (again) Haswell will do 32 bytes/clock
(for an aligned destination) whereas Sandy/Ivy bridge 'only' do 16.
I doubt it is enough to treat them differently.
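
A back-of-envelope sketch of that comparison, assuming a first-order model
of a fixed setup cost plus a steady-state rate. The function name
rep_movs_clocks() is hypothetical, the inputs are only the rough figures
quoted above (not measurements), and the model ignores the extra overhead
on the first couple of blocks:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical first-order estimate of the clocks taken by a 'rep movs' copy:
 * a fixed setup cost plus len / bytes_per_clock, rounded up.
 * This is an illustration only, not a model taken from any vendor manual.
 */
static inline unsigned long rep_movs_clocks(size_t len,
                                            unsigned int setup_clocks,
                                            unsigned int bytes_per_clock)
{
    assert(bytes_per_clock != 0);   /* avoid a division by zero */

    /* A partial final block still costs a whole clock. */
    return setup_clocks + (len + bytes_per_clock - 1) / bytes_per_clock;
}
```

With a 30 clock setup, a 64 byte copy comes out at 32 clocks at
32 bytes/clock and 34 clocks at 16 bytes/clock; the rate only starts to
matter at sizes like 4096 bytes (158 against 286 clocks in this model).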
The real issue with using (aligned) 'rep movsq' was the 140 clock
setup cost on P4 netburst (and no one cares about that any more).
I don't think anything else really needs an open coded loop.
There is no hint in the tables of the AMD cpu (going way back)
having long setup times.
The differing cost of different ways of aligning the copy will show
up most on short copies.
You also need to benchmark differing sizes/alignments - otherwise the branch
predictor will get it right every time - which it doesn't in 'real code'.
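
One way such a benchmark could be set up, as a userspace sketch only:
memcpy() stands in for the kernel copy routine, and the names make_cases(),
run_cases() and the constants are all hypothetical. The lengths and
source/destination offsets are pre-generated from a PRNG so that
consecutive copies differ and the generator stays out of the timed loop:

```c
#define _POSIX_C_SOURCE 199309L     /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define MAX_LEN 256     /* short copies, where the alignment code dominates */
#define MAX_OFF 8       /* offsets 0..7: every misalignment of an 8 byte word */
#define NCASES  4096

struct copy_case {
    uint16_t len;
    uint8_t src_off;
    uint8_t dst_off;
};

/* Small xorshift PRNG: cheap and reproducible for a given seed. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fill c[0..n-1] with pseudo-random lengths (1..MAX_LEN) and offsets. */
static void make_cases(struct copy_case *c, size_t n, uint32_t seed)
{
    uint32_t s = seed ? seed : 1;   /* xorshift state must be non-zero */

    for (size_t i = 0; i < n; i++) {
        c[i].len = (uint16_t)(1 + xorshift32(&s) % MAX_LEN);
        c[i].src_off = (uint8_t)(xorshift32(&s) % MAX_OFF);
        c[i].dst_off = (uint8_t)(xorshift32(&s) % MAX_OFF);
    }
}

/* Time one pass over the cases. Returns the total bytes copied. */
static size_t run_cases(const struct copy_case *c, size_t n, int64_t *elapsed_ns)
{
    static unsigned char src[MAX_LEN + MAX_OFF], dst[MAX_LEN + MAX_OFF];
    struct timespec t0, t1;
    size_t total = 0;

    memset(src, 0x5a, sizeof(src));
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++) {
        assert(c[i].len <= MAX_LEN && c[i].src_off < MAX_OFF &&
               c[i].dst_off < MAX_OFF);
        memcpy(dst + c[i].dst_off, src + c[i].src_off, c[i].len);
        /* Compiler barrier (GCC/Clang syntax) so the copy is not elided. */
        __asm__ volatile("" : : "r"(dst) : "memory");
        total += c[i].len;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *elapsed_ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000 +
                  (t1.tv_nsec - t0.tv_nsec);
    return total;
}
```

Comparing a run of this against a run where every case has the same length
and offsets should show how much of a fixed-size result comes from
the branch predictor having learnt the pattern.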
David
Thread overview: 13+ messages
2025-03-14 17:53 Performance issues in copy_user_generic() in x86_64 Herton R. Krzesinski
2025-03-14 17:53 ` [PATCH] x86: add back the alignment of the destination to 8 bytes in copy_user_generic() Herton R. Krzesinski
2025-03-14 19:06 ` Linus Torvalds
2025-03-14 20:33 ` Herton Krzesinski
2025-03-16 10:58 ` Ingo Molnar
2025-03-16 11:09 ` Ingo Molnar
2025-03-17 13:18 ` Herton Krzesinski
2025-03-18 21:59 ` David Laight
2025-03-18 22:50 ` Herton Krzesinski
2025-03-19 13:07 ` David Laight [this message]
2025-03-17 13:16 ` David Laight
2025-03-17 21:29 ` Linus Torvalds
2025-03-17 22:32 ` David Laight