From: David Laight <david.laight.linux@gmail.com>
To: Herton Krzesinski <hkrzesin@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Mateusz Guzik <mjguzik@gmail.com>,
	x86@kernel.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	olichtne@redhat.com, atomasov@redhat.com, aokuliar@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86: handle the tail in rep_movs_alternative() with an overlapping store
Date: Wed, 26 Mar 2025 22:45:27 +0000
Message-ID: <20250326224527.10105902@pumpkin>
In-Reply-To: <CAJmZWFHTG8cR77zrUpKF81tphcTQ1fyDO6vqnY57ptcs2yM=-A@mail.gmail.com>

On Tue, 25 Mar 2025 19:42:09 -0300
Herton Krzesinski <hkrzesin@redhat.com> wrote:

...
> I have also been trying to measure the impact of changes like the above.
> However, it seems I either get no improvement or it is limited due to the
> impact of profiling. I tried to uninline/move copy_user_generic() like this:

If you use the PERF_COUNT_HW_CPU_CYCLES counter bracketed by 'mfence'
you can get reasonably consistent cycle counts for short sequences.
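
A minimal sketch of that kind of measurement, assuming Linux on x86-64 with
perf events usable by the calling process. The helper names
(cycle_counter_open, measure_cycles, empty_body) are made up for illustration,
and reading the counter with read() is an assumption chosen for brevity; it is
not necessarily how the figures in this mail were obtained.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open a user-space-only CPU cycle counter for the calling thread.
 * Returns a file descriptor, or -1 if perf events are unavailable.
 */
int cycle_counter_open(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.exclude_kernel = 1;	/* do not count the read() syscalls */
	attr.exclude_hv = 1;

	/* pid = 0, cpu = -1: this thread, on whichever cpu it runs */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Read the current counter value; returns 0 on success, -1 on error. */
static int read_cycles(int fd, uint64_t *count)
{
	return read(fd, count, sizeof(*count)) == (ssize_t)sizeof(*count) ? 0 : -1;
}

/* Empty body, used to measure the overhead of the harness itself. */
void empty_body(void *arg)
{
	(void)arg;
}

/*
 * Run fn(arg) bracketed by 'mfence' (x86 only) and store the elapsed
 * user-space cycles in *cycles. Returns 0 on success, -1 on error.
 */
int measure_cycles(int fd, void (*fn)(void *), void *arg, uint64_t *cycles)
{
	uint64_t start, end;

	if (read_cycles(fd, &start))
		return -1;
	__asm__ volatile("mfence" ::: "memory");
	fn(arg);
	__asm__ volatile("mfence" ::: "memory");
	if (read_cycles(fd, &end))
		return -1;

	*cycles = end - start;
	return 0;
}
```

Measuring empty_body first gives the overhead of the harness, which can then
be subtracted from the other results. Repeating each measurement and taking
the minimum is probably needed as well, since the read() path adds noise.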

The problem here is that you need the specific cpu that is causing issues.
Probably zen2 or zen3.

Benchmarking 'rep movsb' on a zen5 can be summarised as follows.
Test overhead: 195 clocks ('rep movsb' asm with a 'nop'), subtracted from the
other values.
  length (hex)  clocks
        0          7
    1..3f          5
       40          4
   41..7f          5
   80..1ff        39 (except 16c, which is 4 clocks faster!)
      200         38
  201..23f        40
      240         38
  241..27f        41
      280         39
The pattern then continues much the same, increasing by 1 clock every 64 bytes,
with exact multiples of 64 being a bit cheaper.

With a 'sailing wind' a copy loop should do 8 bytes/clock.
(Faster if the cpu supports more than one write/clock.)
So a copy loop might be faster than 'rep movsb' for lengths between 128 and ~256.
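
As a rough illustration of that estimate (a sketch, not the kernel's code): an
8-bytes-at-a-time copy loop that finishes with an overlapping store, as in the
patch subject, plus the idealised clock count at 8 bytes/clock. The function
names are hypothetical, the buffers are assumed not to overlap, and the
estimate ignores loop setup and branch costs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes between non-overlapping buffers, 8 bytes per iteration.
 * A tail of 1..7 bytes is handled by one extra 8-byte copy that ends
 * exactly at the end of the buffer and so overlaps bytes already written.
 */
void copy_with_overlapping_tail(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	uint64_t word;
	size_t i;

	if (len < 8) {
		/* Too short for an 8-byte overlapping store: copy bytewise. */
		for (i = 0; i < len; i++)
			d[i] = s[i];
		return;
	}

	for (i = 0; i + 8 <= len; i += 8) {
		memcpy(&word, s + i, 8);	/* unaligned-safe 8-byte load */
		memcpy(d + i, &word, 8);	/* unaligned-safe 8-byte store */
	}

	if (i < len) {
		/* Overlapping tail: redo the last 8 bytes of the buffer. */
		memcpy(&word, s + len - 8, 8);
		memcpy(d + len - 8, &word, 8);
	}
}

/* Idealised cost of the loop at one 8-byte store per clock. */
unsigned int ideal_loop_clocks(size_t len)
{
	return (unsigned int)((len + 7) / 8);
}
```

Under that idealised model, 128 bytes cost 16 clocks and 256 bytes cost 32,
against the 39 measured for 'rep movsb' in the 80..1ff range. The two only
meet at 39 * 8 = 312 bytes, so real loop overhead presumably accounts for the
lower ~256 limit.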

Misaligning the addresses doesn't usually make any difference.
(There is a small penalty for destinations in the last cache line of a page.)

But there is one strange oddity.
If (dest - src) % 4096 is between 1 and 63 then short copies take 55 clocks,
jumping to 75 at 128 bytes and then increasing slowly.
(I think that matches what I've seen.)
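
A small helper can flag buffer pairs that fall into that window when setting up
a benchmark. This is only a sketch of the condition as stated above; the
function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if (dest - src) % 4096 is between 1 and 63 inclusive.
 * The subtraction is done on uintptr_t, so it wraps modulo a power of
 * two that 4096 divides, and the result is also right when dest < src.
 */
bool in_slow_offset_window(const void *dest, const void *src)
{
	uintptr_t delta = ((uintptr_t)dest - (uintptr_t)src) % 4096;

	return delta >= 1 && delta <= 63;
}
```

For example, a destination 4097 bytes above the source is in the window, while
one exactly 4096 or 64 bytes above it is not.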

	David



