From: David Laight <david.laight.linux@gmail.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <peterz@infradead.org>,
Ard Biesheuvel <ardb@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Xiongwei Song <xiongwei.song@windriver.com>,
Xin Li <xin3.li@intel.com>,
"Mike Rapoport (IBM)" <rppt@kernel.org>,
Brijesh Singh <brijesh.singh@amd.com>,
Michael Roth <michael.roth@amd.com>,
Tony Luck <tony.luck@intel.com>,
Alexey Kardashevskiy <aik@amd.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jonathan Corbet <corbet@lwn.net>,
Sohil Mehta <sohil.mehta@intel.com>,
Ingo Molnar <mingo@kernel.org>,
Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
Daniel Sneddon <daniel.sneddon@linux.intel.com>,
Kai Huang <kai.huang@intel.com>,
Sandipan Das <sandipan.das@amd.com>,
Breno Leitao <leitao@debian.org>,
Rick Edgecombe <rick.p.edgecombe@intel.com>,
Alexei Starovoitov <ast@kernel.org>, Hou Tao <houtao1@huawei.com>,
Juergen Gross <jgross@suse.com>,
Vegard Nossum <vegard.nossum@oracle.com>,
Kees Cook <kees@kernel.org>, Eric Biggers <ebiggers@google.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
"Masami Hiramatsu (Google)" <mhiramat@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Luis Chamberlain <mcgrof@kernel.org>,
Yuntao Wang <ytcoode@gmail.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Tejun Heo <tj@kernel.org>, Changbin Du <changbin.du@huawei.com>,
Huang Shijie <shijie@os.amperecomputing.com>,
Geert Uytterhoeven <geert+renesas@glider.be>,
Namhyung Kim <namhyung@kernel.org>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-efi@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
Date: Thu, 3 Jul 2025 18:02:13 +0100 [thread overview]
Message-ID: <20250703180213.28c0e92e@pumpkin> (raw)
In-Reply-To: <uoysignw2pmdls5v57z4cty76hhz7fv7ikcih2qgeltbgnem4f@jt2r24bqvzau>
On Thu, 3 Jul 2025 17:10:34 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> On Thu, Jul 03, 2025 at 01:15:52PM +0100, David Laight wrote:
> > On Thu, 3 Jul 2025 13:39:57 +0300
> > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> >
> > > On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
> > > > On Tue, 1 Jul 2025 12:58:31 +0300
> > > > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > > >
> > > > > Extract memcpy and memset functions from copy_user_generic() and
> > > > > __clear_user().
> > > > >
> > > > > They can be used as inline memcpy and memset instead of the GCC builtins
> > > > > whenever necessary. LASS requires them to handle text_poke.
> > > >
> > > > Except they contain the fault handlers so aren't generic calls.
> > >
> > > That's true. I will add a comment to clarify it.
> >
> > They need renaming.
>
> __inline_memcpy/memset_safe()?
'safe' against what :-)
They can't be used for user accesses without access_ok() and clac.
The get/put_user variants without access_ok() have _unsafe() suffix.
>
> > ...
> > > > > diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> > > > > index a508e4a8c66a..47b613690f84 100644
> > > > > --- a/arch/x86/lib/clear_page_64.S
> > > > > +++ b/arch/x86/lib/clear_page_64.S
> > > > > @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
> > > > > EXPORT_SYMBOL_GPL(clear_page_erms)
> > > > >
> > > > > /*
> > > > > - * Default clear user-space.
> > > > > + * Default memset.
> > > > > * Input:
> > > > > * rdi destination
> > > > > + * rsi scratch
> > > > > * rcx count
> > > > > - * rax is zero
> > > > > + * al is value
> > > > > *
> > > > > * Output:
> > > > > * rcx: uncleared bytes or 0 if successful.
> > > > > + * rdx: clobbered
> > > > > */
> > > > > SYM_FUNC_START(rep_stos_alternative)
> > > > > ANNOTATE_NOENDBR
> > > > > +
> > > > > + movzbq %al, %rsi
> > > > > + movabs $0x0101010101010101, %rax
> > > > > +
> > > > > + /* RDX:RAX = RAX * RSI */
> > > > > + mulq %rsi
> > > >
> > > > NAK - you can't do that here.
> > > > Neither %rsi nor %rdx can be trashed.
> > > > The function has a very explicit calling convention.
> > >
> > > What calling convention? We change the only caller to confirm to this.
> >
> > The one that is implicit in:
> >
> > > > > + asm volatile("1:\n\t"
> > > > > + ALT_64("rep stosb",
> > > > > + "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> > > > > + "2:\n\t"
> > > > > + _ASM_EXTABLE_UA(1b, 2b)
> > > > > + : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> > > > > + : "a" ((uint8_t)v)
> >
> > The called function is only allowed to change the registers that
> > 'rep stosb' uses - except it can access (but not change)
> > all of %rax - not just %al.
> >
> > See: https://godbolt.org/z/3fnrT3x9r
> > In particular note that 'do_mset' must not change %rax.
> >
> > This is very specific and is done so that the compiler can use
> > all the registers.
>
> Okay, I see what you are saying.
>
> > > > It is also almost certainly a waste of time.
> > > > Pretty much all the calls will be for a constant 0x00.
> > > > Rename it all memzero() ...
> > >
> > > text_poke_memset() is not limited to zeroing.
> >
> > But you don't want the overhead of extending the constant
> > on all the calls - never mind reserving %rdx to do it.
> > Maybe define a function that requires the caller to have
> > done the 'dirty work' - so any code that wants memzero()
> > just passes zero.
> > Or do the multiply in the C code where it will get optimised
> > away for constant zero.
> > You do get the multiply for the 'rep stosb' case - but that
> > is always going to be true unless you complicate things further.
>
> The patch below seems to do the trick: compiler optimizes out the
> multiplication for v == 0.
>
> It would be nice to avoid it for X86_FEATURE_FSRM, but we cannot use
> cpu_feature_enabled() here as <asm/cpufeature.h> depends on
> <asm/string.h>.
>
> I cannot say I like the result.
>
> Any suggestions?
>
> diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
> index becb9ee3bc8a..c7644a6f426b 100644
> --- a/arch/x86/include/asm/string.h
> +++ b/arch/x86/include/asm/string.h
> @@ -35,16 +35,27 @@ static __always_inline void *__inline_memcpy(void *to, const void *from, size_t
>
> static __always_inline void *__inline_memset(void *addr, int v, size_t len)
> {
> + unsigned long val = v;
> void *ret = addr;
>
> + if (IS_ENABLED(CONFIG_X86_64)) {
> + /*
> + * Fill all bytes by value in byte 0.
> + *
> + * To be used in rep_stos_alternative()i
> + */
> + val &= 0xff;
> + val *= 0x0101010101010101;
> + }
That won't compile for 32bit, and it needs the same thing done.
val *= (unsigned long)0x0101010101010101ull;
should work.
I don't think you need the 'val &= 0xff', just rely on the caller
passing a valid value - nothing will break badly if it doesn't.
David
> +
> asm volatile("1:\n\t"
> ALT_64("rep stosb",
> "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> "2:\n\t"
> _ASM_EXTABLE_UA(1b, 2b)
> : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> - : "a" (v)
> - : "memory", _ASM_SI, _ASM_DX);
> + : "a" (val)
> + : "memory");
>
> return ret + len;
> }
> diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> index 47b613690f84..3ef7d796deb3 100644
> --- a/arch/x86/lib/clear_page_64.S
> +++ b/arch/x86/lib/clear_page_64.S
> @@ -58,23 +58,15 @@ EXPORT_SYMBOL_GPL(clear_page_erms)
> * Default memset.
> * Input:
> * rdi destination
> - * rsi scratch
> * rcx count
> * al is value
> *
> * Output:
> * rcx: uncleared bytes or 0 if successful.
> - * rdx: clobbered
> */
> SYM_FUNC_START(rep_stos_alternative)
> ANNOTATE_NOENDBR
>
> - movzbq %al, %rsi
> - movabs $0x0101010101010101, %rax
> -
> - /* RDX:RAX = RAX * RSI */
> - mulq %rsi
> -
> cmpq $64,%rcx
> jae .Lunrolled
>
next prev parent reply other threads:[~2025-07-03 17:02 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-01 9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 01/17] x86/cpu: Enumerate the LASS feature bits Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset Kirill A. Shutemov
2025-07-03 8:44 ` David Laight
2025-07-03 10:39 ` Kirill A. Shutemov
2025-07-03 12:15 ` David Laight
2025-07-03 13:33 ` Vegard Nossum
2025-07-03 16:52 ` David Laight
2025-07-03 14:10 ` Kirill A. Shutemov
2025-07-03 17:02 ` David Laight [this message]
2025-07-03 17:13 ` Dave Hansen
2025-07-04 9:04 ` Kirill A. Shutemov
2025-07-06 9:13 ` David Laight
2025-07-07 8:02 ` Kirill A. Shutemov
2025-07-07 9:33 ` David Laight
2025-07-01 9:58 ` [PATCHv8 03/17] x86/alternatives: Disable LASS when patching kernel alternatives Kirill A. Shutemov
2025-07-01 18:44 ` Sohil Mehta
2025-07-01 9:58 ` [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization Kirill A. Shutemov
2025-07-01 19:03 ` Sohil Mehta
2025-07-02 9:47 ` Kirill A. Shutemov
2025-07-01 23:10 ` Dave Hansen
2025-07-02 10:05 ` Kirill A. Shutemov
2025-07-04 12:23 ` Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 05/17] efi: Disable LASS around set_virtual_address_map() EFI call Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 06/17] x86/vsyscall: Do not require X86_PF_INSTR to emulate vsyscall Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 07/17] x86/vsyscall: Reorganize the #PF emulation code Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 08/17] x86/traps: Consolidate user fixups in exc_general_protection() Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 09/17] x86/vsyscall: Add vsyscall emulation for #GP Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 10/17] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 11/17] x86/cpu: Set LASS CR4 bit as pinning sensitive Kirill A. Shutemov
2025-07-01 22:51 ` Sohil Mehta
2025-07-01 9:58 ` [PATCHv8 12/17] x86/traps: Communicate a LASS violation in #GP message Kirill A. Shutemov
2025-07-02 0:36 ` Sohil Mehta
2025-07-02 10:10 ` Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 13/17] x86/traps: Generalize #GP address decode and hint code Kirill A. Shutemov
2025-07-02 0:54 ` Sohil Mehta
2025-07-01 9:58 ` [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS Kirill A. Shutemov
2025-07-02 1:35 ` Sohil Mehta
2025-07-02 2:00 ` H. Peter Anvin
2025-07-02 2:06 ` H. Peter Anvin
2025-07-02 10:17 ` Kirill A. Shutemov
2025-07-02 14:37 ` H. Peter Anvin
2025-07-02 14:47 ` Kirill A. Shutemov
2025-07-02 17:10 ` H. Peter Anvin
2025-07-02 23:42 ` Andrew Cooper
2025-07-03 0:44 ` H. Peter Anvin
2025-07-06 9:22 ` David Laight
2025-07-06 15:07 ` H. Peter Anvin
2025-07-02 13:27 ` Kirill A. Shutemov
2025-07-02 17:56 ` Sohil Mehta
2025-07-03 10:40 ` Kirill A. Shutemov
2025-07-02 20:05 ` Sohil Mehta
2025-07-03 11:31 ` Kirill A. Shutemov
2025-07-03 20:12 ` Sohil Mehta
2025-07-04 9:23 ` Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 15/17] x86/cpu: Make LAM depend on LASS Kirill A. Shutemov
2025-07-01 23:03 ` Sohil Mehta
2025-07-01 9:58 ` [PATCHv8 16/17] x86/cpu: Enable LASS during CPU initialization Kirill A. Shutemov
2025-07-01 9:58 ` [PATCHv8 17/17] x86: Re-enable Linear Address Masking Kirill A. Shutemov
2025-07-01 23:13 ` Sohil Mehta
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250703180213.28c0e92e@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=acme@redhat.com \
--cc=aik@amd.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=ardb@kernel.org \
--cc=ast@kernel.org \
--cc=bp@alien8.de \
--cc=brijesh.singh@amd.com \
--cc=changbin.du@huawei.com \
--cc=christophe.leroy@csgroup.eu \
--cc=corbet@lwn.net \
--cc=daniel.sneddon@linux.intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=ebiggers@google.com \
--cc=geert+renesas@glider.be \
--cc=houtao1@huawei.com \
--cc=hpa@zytor.com \
--cc=jgg@ziepe.ca \
--cc=jgross@suse.com \
--cc=jpoimboe@kernel.org \
--cc=kai.huang@intel.com \
--cc=kees@kernel.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=leitao@debian.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-efi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@rasmusvillemoes.dk \
--cc=luto@kernel.org \
--cc=mcgrof@kernel.org \
--cc=mhiramat@kernel.org \
--cc=michael.roth@amd.com \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=paulmck@kernel.org \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=peterz@infradead.org \
--cc=rick.p.edgecombe@intel.com \
--cc=rppt@kernel.org \
--cc=sandipan.das@amd.com \
--cc=shijie@os.amperecomputing.com \
--cc=sohil.mehta@intel.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=tony.luck@intel.com \
--cc=vegard.nossum@oracle.com \
--cc=x86@kernel.org \
--cc=xin3.li@intel.com \
--cc=xiongwei.song@windriver.com \
--cc=ytcoode@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.