From: Peter Zijlstra <peterz@infradead.org>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: mjguzik@gmail.com, linux-kernel@vger.kernel.org,
	mingo@redhat.com, torvalds@linux-foundation.org, x86@kernel.org
Subject: Re: [RFC PATCH] x86: prevent gcc from emitting rep movsq/stosq for inlined ops
Date: Wed, 2 Apr 2025 20:22:41 +0200
Message-ID: <20250402182241.GY5880@noisy.programming.kicks-ass.net>
In-Reply-To: <948fffdc-d0d8-49c4-90b6-b91f282f76c9@citrix.com>

On Wed, Apr 02, 2025 at 07:17:03PM +0100, Andrew Cooper wrote:
> > Please make this a gcc bug-report instead - I really don't want to
> > have random compiler-specific tuning options in the kernel. Because
> > that whole memcpy-strategy thing is something that gets tuned by a lot
> > of other compiler options (ie -march and different versions).
> 
> I've discussed this with PeterZ in the past, although I can't for the
> life of me find the bugzilla ticket I thought I opened on the matter. 
> (Maybe I never got that far).
> 
> The behaviour wanted is:
> 
> 1) Convert to plain accesses (so they can be merged/combined/etc), or
> 2) Emit a library call
> 
> because we do provide forms that are better than the GCC-chosen "REP
> MOVSQ with manual alignment" in the general case.
> 
> Taking a leaf out of the retpoline book, the ideal library call(s) would be:
> 
>     CALL __x86_thunk_rep_{mov,stos}sb
> 
> using the REP ABI (parameters in %rcx/%rdi/etc), rather than the SYSV ABI.
> 
> For current/future processors, which have fast reps of all short/zero
> flavours, we can even inline the REP {MOV,STO}S instruction to avoid the
> call.
> 
> For older microarchitectures, they can reuse the existing memcpy/memset
> implementations, just with marginally less parameter shuffling.
> 
> How does this sound?

Right, vague memories indeed. We do something like this manually for
copy_user_generic().

But it would indeed be very nice if the compiler were to emit such thunk
calls instead of doing rep whatever and then we can objtool collect the
locations and patch at runtime to be 'rep movs' or not, depending on
CPU flags etc.
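
As a rough userspace illustration of that dispatch (not kernel code, and
not how objtool or the alternatives machinery patches call sites), the
sketch below picks between 'rep movsb' and a plain memcpy() once at
startup. Every name in it (thunk_copy, thunk_init, copy_rep_movsb,
copy_fallback, cpu_has_fsrm) is hypothetical. A function pointer stands in
for runtime patching, and the arguments use the normal SysV calling
convention rather than the REP ABI proposed above. It assumes GCC or
clang; on non-x86-64 targets it simply falls back to memcpy().

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define HAVE_X86_REP 1
#endif

typedef void *(*copy_fn)(void *dst, const void *src, size_t len);

/* Copy with a bare 'rep movsb'; assumes the direction flag is clear,
 * as the SysV ABI guarantees at function entry. */
static void *copy_rep_movsb(void *dst, const void *src, size_t len)
{
#ifdef HAVE_X86_REP
	void *ret = dst;

	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(len)
			 :
			 : "memory");
	return ret;
#else
	return memcpy(dst, src, len);
#endif
}

/* Stand-in for the existing out-of-line memcpy implementation. */
static void *copy_fallback(void *dst, const void *src, size_t len)
{
	return memcpy(dst, src, len);
}

/* FSRM (fast short REP MOVSB) is CPUID leaf 7, subleaf 0, EDX bit 4. */
static int cpu_has_fsrm(void)
{
#ifdef HAVE_X86_REP
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (edx >> 4) & 1;
#else
	return 0;
#endif
}

/* The "thunk": defaults to the fallback until thunk_init() has run. */
static copy_fn thunk_copy = copy_fallback;

/* Hypothetical one-time selection, standing in for boot-time patching. */
static void thunk_init(void)
{
	thunk_copy = cpu_has_fsrm() ? copy_rep_movsb : copy_fallback;
}
```

A caller would run thunk_init() once and then use thunk_copy(dst, src, len)
wherever the compiler would otherwise have inlined a rep sequence. A real
implementation would patch the call sites themselves, so the indirect
call disappears as well.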
