Re: [PATCH] x86: write aligned to 8 bytes in copy_user_generic (when without FSRM/ERMS)

From: David Laight <david.laight.linux@gmail.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Herton Krzesinski <hkrzesin@redhat.com>,
	x86@kernel.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
	olichtne@redhat.com, atomasov@redhat.com, aokuliar@redhat.com
Subject: Re: [PATCH] x86: write aligned to 8 bytes in copy_user_generic (when without FSRM/ERMS)
Date: Thu, 20 Mar 2025 21:31:45 +0000
Message-ID: <20250320213145.6d016e21@pumpkin>
In-Reply-To: <CAGudoHEDAzOrndyJeb7L95cMPmHHk0O5wk=tRCe35FE6GVYs1w@mail.gmail.com>

On Thu, 20 Mar 2025 19:02:21 +0100
Mateusz Guzik <mjguzik@gmail.com> wrote:

> On Thu, Mar 20, 2025 at 6:51 PM Herton Krzesinski <hkrzesin@redhat.com> wrote:
> >
> > On Thu, Mar 20, 2025 at 11:36 AM Mateusz Guzik <mjguzik@gmail.com> wrote:  
...
> > > That said, have you experimented with aligning the target to 16 bytes
> > > or more bytes?  
> >
> > Yes I tried to do 32-byte write aligned on an old Xeon (Sandy Bridge based)
> > and got no improvement at least in the specific benchmark I'm doing here.
> > Also after your question here I tried 16-byte/32-byte on the AMD cpu as
> > well and got no difference from the 8-byte alignment, same bench as well.
> > I tried to do 8-byte alignment for the ERMS case on Intel and got no
> > difference on the systems I tested. I'm not saying it may not improve in
> > some other case, just that in my specific testing I couldn't tell/measure
> > any improvement.
> >  
> 
> oof, I would not go as far back as Sandy Bridge. ;)

It is a boundary point.
Agner's tables (fairly reliable) have:

Sandy Bridge
Page 222
MOVS       5      4
REP MOVS   2n     1.5n   worst case
REP MOVS   3/16B  1/16B  best case

Agner's tables have the same values for Ivy Bridge, which you'd sort of
expect since it is a minor update.
Haswell jumps to 1/32B.

I didn't test Sandy Bridge (I've got one, powered off), but did test Ivy Bridge.
Neither the source nor the destination alignment made any difference at all.

As I said earlier, the only alignment that made any difference was 32-byte
aligning the destination on Haswell (and later).
That is needed to get 32 bytes/clock rather than 16 bytes/clock.
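
For illustration only, aligning the destination before the bulk copy can be
sketched in plain C as below. This is not the patch under discussion:
head_to_align() and copy_dst_aligned() are hypothetical names, and the
second memcpy() merely stands in for 'rep movsq' or the copy loop.
Pass 8 for the alignment the patch uses, or 32 for the Haswell case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to advance 'addr' to the next 'align' boundary.
 * 'align' must be a power of two (8, 16, 32, ...). */
static size_t head_to_align(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size_t)((0 - addr) & (uintptr_t)(align - 1));
}

/* Hypothetical helper: copy a short unaligned head first so that the
 * bulk copy starts on an 'align'-byte aligned destination address. */
static void *copy_dst_aligned(void *dst, const void *src, size_t len,
			      size_t align)
{
	size_t head = head_to_align((uintptr_t)dst, align);

	if (head > len)
		head = len;

	/* Head: at most align - 1 bytes, destination still misaligned. */
	memcpy(dst, src, head);

	/* Bulk: destination is now aligned (the source may not be). */
	memcpy((char *)dst + head, (const char *)src + head, len - head);
	return dst;
}
```

Only the destination is aligned here; the source is left wherever it
ends up, which matches the observation that it was the destination
alignment that mattered.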

> 
> I think Skylake is the oldest yeller to worry about, if one insists on it.
> 
> That said, if memory serves right these bufs like to be misaligned to
> weird extents, it very well may be in your tests aligning to 8 had a
> side effect of aligning it to 16 even.
> 
> > >
> > > Moreover, I have some recollection that there were uarchs with ERMS
> > > which also liked the target to be aligned -- as in perhaps this should
> > > be done regardless of FSRM?  

Dunno, the only report is some AMD cpu being slow with misaligned writes.
But that is the copy loop, not 'rep movsq'.
I don't have one to test.

> >
> > Where I tested I didn't see improvements. Maybe there is some case
> > where it helps, but I didn't have any.
> >  
> > >
> > > And most importantly memset, memcpy and clear_user would all use a
> > > revamp and they are missing rep handling for bigger sizes (I verified
> > > they *do* show up). Not only that, but memcpy uses overlapping stores
> > > while memset just loops over stuff.
> > >
> > > I intended to sort it out a long time ago and maybe will find some time
> > > now that I got reminded of it, but I would be delighted if it got
> > > picked up.
> > >
> > > Hacking this up is just some screwing around, the real time consuming
> > > part is the benchmarking so I completely understand if you are not
> > > interested.  
> >
> > Yes, most of the time is spent on benchmarking. Maybe later I could
> > try to take a look, but I can't make any promises.

I found I needed to use the performance counter to get a proper cycle count,
reading the register directly to avoid all the 'library' overhead,
with lfence/mfence on both sides of the cycle count read.
After subtracting the overhead of a 'null function' I could measure the
number of clocks each operation took,
so I could tell when I was actually getting 32 bytes copied per clock.

(Or, when testing the ip checksum code, the number of bytes/clock - it can
get to 12.)
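
A rough userspace sketch of that measurement scheme follows. It is not
the harness I used: it swaps in the TSC via __rdtsc() (from
<x86intrin.h>, GCC/Clang on x86) because reading a core-cycle
performance counter directly needs extra setup. The TSC ticks at a
fixed reference rate, so the result is only comparable to core clocks
when the CPU runs at that frequency. read_ticks(), min_ticks(),
net_ticks(), null_op() and memcpy_op() are names made up for this
example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>	/* __rdtsc(), _mm_lfence(); x86 only */

/* Fenced counter read: lfence on both sides keeps earlier and later
 * instructions from being reordered across the read. */
static inline uint64_t read_ticks(void)
{
	uint64_t t;

	_mm_lfence();
	t = __rdtsc();
	_mm_lfence();
	return t;
}

typedef void (*op_fn)(void *dst, const void *src, size_t len);

/* The 'null function': same call shape, no work. */
static void null_op(void *dst, const void *src, size_t len)
{
	(void)dst; (void)src; (void)len;
}

/* Operation under test; memcpy() is only a stand-in. */
static void memcpy_op(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Minimum over 'reps' runs, to discard interrupts and cold caches. */
static uint64_t min_ticks(op_fn fn, void *dst, const void *src,
			  size_t len, int reps)
{
	uint64_t best = UINT64_MAX;

	assert(reps > 0);
	for (int i = 0; i < reps; i++) {
		uint64_t start = read_ticks();

		fn(dst, src, len);

		uint64_t elapsed = read_ticks() - start;

		if (elapsed < best)
			best = elapsed;
	}
	return best;
}

/* Ticks for 'fn' with the null-function overhead subtracted. */
static uint64_t net_ticks(op_fn fn, void *dst, const void *src,
			  size_t len, int reps)
{
	uint64_t overhead = min_ticks(null_op, dst, src, len, reps);
	uint64_t total = min_ticks(fn, dst, src, len, reps);

	return total > overhead ? total - overhead : 0;
}
```

Dividing the copy length by net_ticks(memcpy_op, dst, src, len, reps)
gives an approximate bytes-per-tick figure to compare against the
16 and 32 bytes/clock numbers.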

	David

> >  
> 
> Now I'm curious enough what's up here. If I don't run out of steam,
> I'm gonna cover memset and memcpy myself.
>

Thread overview: 12+ messages
2025-03-20 14:22 [PATCH] x86: write aligned to 8 bytes in copy_user_generic (when without FSRM/ERMS) Herton R. Krzesinski
2025-03-20 14:35 ` Mateusz Guzik
2025-03-20 17:51   ` Herton Krzesinski
2025-03-20 18:02     ` Mateusz Guzik
2025-03-20 18:37       ` Herton Krzesinski
2025-03-20 21:31       ` David Laight [this message]
2025-03-28 21:17 ` [tip: x86/mm] x86/uaccess: Improve performance by aligning writes to 8 bytes in copy_user_generic(), on non-FSRM/ERMS CPUs tip-bot2 for Herton R. Krzesinski
2025-03-28 21:47   ` Ingo Molnar
2025-03-28 21:53 ` [tip: x86/urgent] " tip-bot2 for Herton R. Krzesinski
2025-03-28 22:04 ` tip-bot2 for Herton R. Krzesinski
2025-03-28 22:11   ` Ingo Molnar
2025-05-02  6:19 ` [PATCH] x86: write aligned to 8 bytes in copy_user_generic (when without FSRM/ERMS) Nicholas Sielicki
