All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: combine memmove FSRM and ERMS alternatives
@ 2023-01-13 20:32 Daniel Verkamp
  2023-01-14  6:55 ` Greg KH
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Verkamp @ 2023-01-13 20:32 UTC (permalink / raw)
  To: Tony Luck, Borislav Petkov; +Cc: Jiri Slaby, stable, Daniel Verkamp

The x86-64 memmove code has two ALTERNATIVE statements in a row, one to
handle FSRM ("Fast Short REP MOVSB"), and one to handle ERMS ("Enhanced
REP MOVSB"). If either of these features is present, the goal is to jump
directly to a REP MOVSB; otherwise, some setup code that handles short
lengths is executed. The first comparison of a sequence of specific
small sizes is included in the first ALTERNATIVE, so it will be replaced
by NOPs if FSRM is set, and then (assuming ERMS is also set) execution
will fall through to the JMP to a REP MOVSB in the next ALTERNATIVE.

The two ALTERNATIVE invocations can be combined into a single instance
of ALTERNATIVE_2 to simplify and slightly shorten the code. If either
FSRM or ERMS is set, the first instruction in the memmove_begin_forward
path will be replaced with a jump to the REP MOVSB.

This also prevents a problem when FSRM is set but ERMS is not; in this
case, the previous code would have replaced both ALTERNATIVEs with NOPs
and skipped the first check for sizes less than 0x20 bytes. This
combination of CPU features is arguably a firmware bug, but this patch
makes the function robust against this badness.

Fixes: f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB")
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
---
 arch/x86/lib/memmove_64.S | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 724bbf83eb5b..1fc36dbd3bdc 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -38,8 +38,10 @@ SYM_FUNC_START(__memmove)
 
 	/* FSRM implies ERMS => no length checks, do the copy directly */
 .Lmemmove_begin_forward:
-	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
-	ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
+	ALTERNATIVE_2 \
+		"cmp $0x20, %rdx; jb 1f", \
+		"jmp .Lmemmove_erms", X86_FEATURE_FSRM, \
+		"jmp .Lmemmove_erms", X86_FEATURE_ERMS
 
 	/*
 	 * movsq instruction have many startup latency
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] x86: combine memmove FSRM and ERMS alternatives
@ 2023-01-13 20:34 Daniel Verkamp
  2023-01-14  6:55 ` Greg KH
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Daniel Verkamp @ 2023-01-13 20:34 UTC (permalink / raw)
  To: x86, Tony Luck, Borislav Petkov; +Cc: Jiri Slaby, stable, Daniel Verkamp

The x86-64 memmove code has two ALTERNATIVE statements in a row, one to
handle FSRM ("Fast Short REP MOVSB"), and one to handle ERMS ("Enhanced
REP MOVSB"). If either of these features is present, the goal is to jump
directly to a REP MOVSB; otherwise, some setup code that handles short
lengths is executed. The first comparison of a sequence of specific
small sizes is included in the first ALTERNATIVE, so it will be replaced
by NOPs if FSRM is set, and then (assuming ERMS is also set) execution
will fall through to the JMP to a REP MOVSB in the next ALTERNATIVE.

The two ALTERNATIVE invocations can be combined into a single instance
of ALTERNATIVE_2 to simplify and slightly shorten the code. If either
FSRM or ERMS is set, the first instruction in the memmove_begin_forward
path will be replaced with a jump to the REP MOVSB.

This also prevents a problem when FSRM is set but ERMS is not; in this
case, the previous code would have replaced both ALTERNATIVEs with NOPs
and skipped the first check for sizes less than 0x20 bytes. This
combination of CPU features is arguably a firmware bug, but this patch
makes the function robust against this badness.

Fixes: f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB")
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
---
 arch/x86/lib/memmove_64.S | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 724bbf83eb5b..1fc36dbd3bdc 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -38,8 +38,10 @@ SYM_FUNC_START(__memmove)
 
 	/* FSRM implies ERMS => no length checks, do the copy directly */
 .Lmemmove_begin_forward:
-	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
-	ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
+	ALTERNATIVE_2 \
+		"cmp $0x20, %rdx; jb 1f", \
+		"jmp .Lmemmove_erms", X86_FEATURE_FSRM, \
+		"jmp .Lmemmove_erms", X86_FEATURE_ERMS
 
 	/*
 	 * movsq instruction have many startup latency
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: combine memmove FSRM and ERMS alternatives
  2023-01-13 20:32 Daniel Verkamp
@ 2023-01-14  6:55 ` Greg KH
  0 siblings, 0 replies; 9+ messages in thread
From: Greg KH @ 2023-01-14  6:55 UTC (permalink / raw)
  To: Daniel Verkamp; +Cc: Tony Luck, Borislav Petkov, Jiri Slaby, stable

On Fri, Jan 13, 2023 at 12:32:17PM -0800, Daniel Verkamp wrote:
> The x86-64 memmove code has two ALTERNATIVE statements in a row, one to
> handle FSRM ("Fast Short REP MOVSB"), and one to handle ERMS ("Enhanced
> REP MOVSB"). If either of these features is present, the goal is to jump
> directly to a REP MOVSB; otherwise, some setup code that handles short
> lengths is executed. The first comparison of a sequence of specific
> small sizes is included in the first ALTERNATIVE, so it will be replaced
> by NOPs if FSRM is set, and then (assuming ERMS is also set) execution
> will fall through to the JMP to a REP MOVSB in the next ALTERNATIVE.
> 
> The two ALTERNATIVE invocations can be combined into a single instance
> of ALTERNATIVE_2 to simplify and slightly shorten the code. If either
> FSRM or ERMS is set, the first instruction in the memmove_begin_forward
> path will be replaced with a jump to the REP MOVSB.
> 
> This also prevents a problem when FSRM is set but ERMS is not; in this
> case, the previous code would have replaced both ALTERNATIVEs with NOPs
> and skipped the first check for sizes less than 0x20 bytes. This
> combination of CPU features is arguably a firmware bug, but this patch
> makes the function robust against this badness.
> 
> Fixes: f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB")
> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
> ---
>  arch/x86/lib/memmove_64.S | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 724bbf83eb5b..1fc36dbd3bdc 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -38,8 +38,10 @@ SYM_FUNC_START(__memmove)
>  
>  	/* FSRM implies ERMS => no length checks, do the copy directly */
>  .Lmemmove_begin_forward:
> -	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> -	ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
> +	ALTERNATIVE_2 \
> +		"cmp $0x20, %rdx; jb 1f", \
> +		"jmp .Lmemmove_erms", X86_FEATURE_FSRM, \
> +		"jmp .Lmemmove_erms", X86_FEATURE_ERMS
>  
>  	/*
>  	 * movsq instruction have many startup latency
> -- 
> 2.39.0.314.g84b9a713c41-goog
> 

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: combine memmove FSRM and ERMS alternatives
  2023-01-13 20:34 [PATCH] x86: combine memmove FSRM and ERMS alternatives Daniel Verkamp
@ 2023-01-14  6:55 ` Greg KH
  2023-01-14  9:24 ` Ingo Molnar
  2023-01-14 10:42 ` Borislav Petkov
  2 siblings, 0 replies; 9+ messages in thread
From: Greg KH @ 2023-01-14  6:55 UTC (permalink / raw)
  To: Daniel Verkamp; +Cc: x86, Tony Luck, Borislav Petkov, Jiri Slaby, stable

On Fri, Jan 13, 2023 at 12:34:27PM -0800, Daniel Verkamp wrote:
> The x86-64 memmove code has two ALTERNATIVE statements in a row, one to
> handle FSRM ("Fast Short REP MOVSB"), and one to handle ERMS ("Enhanced
> REP MOVSB"). If either of these features is present, the goal is to jump
> directly to a REP MOVSB; otherwise, some setup code that handles short
> lengths is executed. The first comparison of a sequence of specific
> small sizes is included in the first ALTERNATIVE, so it will be replaced
> by NOPs if FSRM is set, and then (assuming ERMS is also set) execution
> will fall through to the JMP to a REP MOVSB in the next ALTERNATIVE.
> 
> The two ALTERNATIVE invocations can be combined into a single instance
> of ALTERNATIVE_2 to simplify and slightly shorten the code. If either
> FSRM or ERMS is set, the first instruction in the memmove_begin_forward
> path will be replaced with a jump to the REP MOVSB.
> 
> This also prevents a problem when FSRM is set but ERMS is not; in this
> case, the previous code would have replaced both ALTERNATIVEs with NOPs
> and skipped the first check for sizes less than 0x20 bytes. This
> combination of CPU features is arguably a firmware bug, but this patch
> makes the function robust against this badness.
> 
> Fixes: f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB")
> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
> ---
>  arch/x86/lib/memmove_64.S | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 724bbf83eb5b..1fc36dbd3bdc 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -38,8 +38,10 @@ SYM_FUNC_START(__memmove)
>  
>  	/* FSRM implies ERMS => no length checks, do the copy directly */
>  .Lmemmove_begin_forward:
> -	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> -	ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
> +	ALTERNATIVE_2 \
> +		"cmp $0x20, %rdx; jb 1f", \
> +		"jmp .Lmemmove_erms", X86_FEATURE_FSRM, \
> +		"jmp .Lmemmove_erms", X86_FEATURE_ERMS
>  
>  	/*
>  	 * movsq instruction have many startup latency
> -- 
> 2.39.0.314.g84b9a713c41-goog
> 

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: combine memmove FSRM and ERMS alternatives
  2023-01-13 20:34 [PATCH] x86: combine memmove FSRM and ERMS alternatives Daniel Verkamp
  2023-01-14  6:55 ` Greg KH
@ 2023-01-14  9:24 ` Ingo Molnar
  2023-01-14 10:42 ` Borislav Petkov
  2 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2023-01-14  9:24 UTC (permalink / raw)
  To: Daniel Verkamp; +Cc: x86, Tony Luck, Borislav Petkov, Jiri Slaby, stable


* Daniel Verkamp <dverkamp@chromium.org> wrote:

> The x86-64 memmove code has two ALTERNATIVE statements in a row, one to
> handle FSRM ("Fast Short REP MOVSB"), and one to handle ERMS ("Enhanced
> REP MOVSB"). If either of these features is present, the goal is to jump
> directly to a REP MOVSB; otherwise, some setup code that handles short
> lengths is executed. The first comparison of a sequence of specific
> small sizes is included in the first ALTERNATIVE, so it will be replaced
> by NOPs if FSRM is set, and then (assuming ERMS is also set) execution
> will fall through to the JMP to a REP MOVSB in the next ALTERNATIVE.
> 
> The two ALTERNATIVE invocations can be combined into a single instance
> of ALTERNATIVE_2 to simplify and slightly shorten the code. If either
> FSRM or ERMS is set, the first instruction in the memmove_begin_forward
> path will be replaced with a jump to the REP MOVSB.
> 
> This also prevents a problem when FSRM is set but ERMS is not; in this
> case, the previous code would have replaced both ALTERNATIVEs with NOPs
> and skipped the first check for sizes less than 0x20 bytes. This
> combination of CPU features is arguably a firmware bug, but this patch
> makes the function robust against this badness.
> 
> Fixes: f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB")
> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>

Could you please extend the changelog with the information about the guest 
and baremetal firmware tthat Jiri Slaby pointed out?

Ie. acknowledging Boris's point that these are technically invalid CPUID 
combinations, but emphasizing that the x86 memcpy routines should not 
behave in an undefined fashion while other OSs boot fine under the same 
firmware environment ...

[ Also, please Cc: lkml and don't Cc: -stable. ]

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: combine memmove FSRM and ERMS alternatives
  2023-01-13 20:34 [PATCH] x86: combine memmove FSRM and ERMS alternatives Daniel Verkamp
  2023-01-14  6:55 ` Greg KH
  2023-01-14  9:24 ` Ingo Molnar
@ 2023-01-14 10:42 ` Borislav Petkov
  2023-01-14 16:17   ` Borislav Petkov
  2 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2023-01-14 10:42 UTC (permalink / raw)
  To: Daniel Verkamp; +Cc: x86, Tony Luck, Jiri Slaby, lkml

+ lkml.

Always CC lkml on patches pls.

Ok, let's see. I hope the coffee's working already and I'm not missing an
aspect...

On Fri, Jan 13, 2023 at 12:34:27PM -0800, Daniel Verkamp wrote:
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 724bbf83eb5b..1fc36dbd3bdc 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -38,8 +38,10 @@ SYM_FUNC_START(__memmove)
>  
>  	/* FSRM implies ERMS => no length checks, do the copy directly */
>  .Lmemmove_begin_forward:
> -	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> -	ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
> +	ALTERNATIVE_2 \
> +		"cmp $0x20, %rdx; jb 1f", \
> +		"jmp .Lmemmove_erms", X86_FEATURE_FSRM, \
> +		"jmp .Lmemmove_erms", X86_FEATURE_ERMS

This is wrong in the ERMS case:

* If you have FSRM, you can simply do

	REP; MOVSB

as any size is handled fine. So that's ok. BUT:

* If you have ERMS, you need to jump to 1f for smaller sizes. ERMS makes sense
only for bigger than, well, we have 0x20 there.

So if you have ERMS, you can't simply replace:

	"cmp $0x20, %rdx; jb 1f"

with

	"jmp .Lmemmove_erms"

You still need that size check.

IOW, it should be something like this:

	ALTERNATIVE_2
		"cmp $0x20, %rdx; jb 1f; jmp .Lmemmove_erms", X86_FEATURE_ERMS,
		"jmp .Lmemmove_erms", X86_FEATURE_FSRM

But you can't have JMPs as NOT the first insn in alternatives because we fixup
the JMP offsets only for the first insn, see where recompute_jump() is called.

So, in order for the above to work, you'd need to use the insn decoder and look
at every insn in replacement and recompute_jump() it if it is a JMP.

And there are nice examples how to do that - see the loops in alternative.c
doing insn_decode_kernel().

Feel like getting your hands dirty with that?

:-)

Or, altenatively (pun intended), you can do what copy_user_generic() does and
move all that logic into C and inline asm. Which I'd prefer, actually, instead of
doing ugly asm hacks. Depends on how ugly it gets...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: combine memmove FSRM and ERMS alternatives
  2023-01-14 10:42 ` Borislav Petkov
@ 2023-01-14 16:17   ` Borislav Petkov
  2023-01-14 20:49     ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2023-01-14 16:17 UTC (permalink / raw)
  To: Daniel Verkamp; +Cc: x86, Tony Luck, Jiri Slaby, lkml

On Sat, Jan 14, 2023 at 11:42:13AM +0100, Borislav Petkov wrote:
> Or, altenatively (pun intended), you can do what copy_user_generic() does and
> move all that logic into C and inline asm. Which I'd prefer, actually, instead of
> doing ugly asm hacks. Depends on how ugly it gets...

Alternatively #2, you can do the below as a minimal fix for stable along with
explaining what we're doing there and why and then do the other things I
suggested - if you'd like, that is - later and with no pressure.

Thx.

---
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 02661861e5dd..d6ffb4164cdb 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -38,10 +38,9 @@ SYM_FUNC_START(__memmove)
 	cmp %rdi, %r8
 	jg 2f
 
-	/* FSRM implies ERMS => no length checks, do the copy directly */
 .Lmemmove_begin_forward:
 	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
-	ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
+	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
 
 	/*
 	 * movsq instruction have many startup latency


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: combine memmove FSRM and ERMS alternatives
  2023-01-14 16:17   ` Borislav Petkov
@ 2023-01-14 20:49     ` Borislav Petkov
  2023-01-15 23:49       ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2023-01-14 20:49 UTC (permalink / raw)
  To: Daniel Verkamp; +Cc: x86, Tony Luck, Jiri Slaby, lkml

On Sat, Jan 14, 2023 at 05:17:28PM +0100, Borislav Petkov wrote:
> On Sat, Jan 14, 2023 at 11:42:13AM +0100, Borislav Petkov wrote:
> > Or, altenatively (pun intended), you can do what copy_user_generic() does and
> > move all that logic into C and inline asm. Which I'd prefer, actually, instead of
> > doing ugly asm hacks. Depends on how ugly it gets...
> 
> Alternatively #2, you can do the below as a minimal fix for stable along with
> explaining what we're doing there and why and then do the other things I
> suggested - if you'd like, that is - later and with no pressure.
> 
> Thx.
> 
> ---
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 02661861e5dd..d6ffb4164cdb 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -38,10 +38,9 @@ SYM_FUNC_START(__memmove)
>  	cmp %rdi, %r8
>  	jg 2f
>  
> -	/* FSRM implies ERMS => no length checks, do the copy directly */
>  .Lmemmove_begin_forward:
>  	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> -	ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
> +	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "jmp .Lmemmove_erms", X86_FEATURE_ERMS

Forget what I said - now that I think of it this is bull.

The more and more I think about it, the more I like the copy_user_generic() idea
but lemme see how ugly it gets...


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: combine memmove FSRM and ERMS alternatives
  2023-01-14 20:49     ` Borislav Petkov
@ 2023-01-15 23:49       ` Borislav Petkov
  0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2023-01-15 23:49 UTC (permalink / raw)
  To: Daniel Verkamp; +Cc: x86, Tony Luck, Jiri Slaby, lkml

On Sat, Jan 14, 2023 at 09:49:01PM +0100, Borislav Petkov wrote:
> The more and more I think about it, the more I like the copy_user_generic() idea
> but lemme see how ugly it gets...

Something like this. It doesn't build yet but it is supposed to show what I mean
more concretely.

I definitely like the removal of the grafted ALTERNATIVEs in the asm code...

---

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 888731ccf1f6..0b48ebd74e2a 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -76,8 +76,17 @@ void *__msan_memmove(void *dest, const void *src, size_t len);
 #define memmove __msan_memmove
 #else
 void *memmove(void *dest, const void *src, size_t count);
+void *__memmove_fsrm(void *dest, const void *src, size_t count);
+void *__memmove_erms(void *dest, void *src, size_t count);
 #endif
-void *__memmove(void *dest, const void *src, size_t count);
+static __always_inline void *__memmove(void *dest, const void *src, size_t count)
+{
+	alternative_call_2(__memmove_fsrm,
+			   __memmove_erms, ALT_NOT(X86_FEATURE_FSRM),
+			   ___memmove, ALT_NOT(X86_FEATURE_ERMS));
+
+	return dest;
+}
 
 int memcmp(const void *cs, const void *ct, size_t count);
 size_t strlen(const char *s);
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 4f1a40a86534..039139fa5228 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -70,7 +70,7 @@ ifneq ($(CONFIG_GENERIC_CSUM),y)
         lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
 endif
         lib-y += clear_page_64.o copy_page_64.o
-        lib-y += memmove_64.o memset_64.o
+        lib-y += memmove_64.o memmove.o memset_64.o
         lib-y += copy_user_64.o
 	lib-y += cmpxchg16b_emu.o
 endif
diff --git a/arch/x86/lib/memmove.c b/arch/x86/lib/memmove.c
new file mode 100644
index 000000000000..70dbf85dbd35
--- /dev/null
+++ b/arch/x86/lib/memmove.c
@@ -0,0 +1,18 @@
+void *__memmove_fsrm(void *dest, const void *src, size_t count)
+{
+	if (dest < src)
+		return __memmove(dest, src, count);
+
+	asm volatile("rep; movsb\n\t"
+		    : "+D" (dest), "+S" (src), "c" (count));
+
+	return dest;
+}
+
+void *__memmove_erms(void *dest, void *src, size_t count)
+{
+	if (size < 32)
+		return __memmove(dest, src, count);
+
+	return __memmove_fsrm(dest, src, count);
+}
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 02661861e5dd..81a89bd146a1 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -26,9 +26,12 @@
  * Output:
  * rax: dest
  */
-SYM_FUNC_START(__memmove)
+SYM_FUNC_START(___memmove)
 
+	/* Handle more 32 bytes in loop */
 	mov %rdi, %rax
+	cmp $0x20, %rdx
+	jb	1f
 
 	/* Decide forward/backward copy mode */
 	cmp %rdi, %rsi
@@ -38,10 +41,7 @@ SYM_FUNC_START(__memmove)
 	cmp %rdi, %r8
 	jg 2f
 
-	/* FSRM implies ERMS => no length checks, do the copy directly */
 .Lmemmove_begin_forward:
-	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
-	ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
 
 	/*
 	 * movsq instruction have many startup latency
@@ -207,13 +207,5 @@ SYM_FUNC_START(__memmove)
 	movb %r11b, (%rdi)
 13:
 	RET
-
-.Lmemmove_erms:
-	movq %rdx, %rcx
-	rep movsb
-	RET
-SYM_FUNC_END(__memmove)
-EXPORT_SYMBOL(__memmove)
-
-SYM_FUNC_ALIAS(memmove, __memmove)
-EXPORT_SYMBOL(memmove)
+SYM_FUNC_END(___memmove)
+EXPORT_SYMBOL(___memmove)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-01-15 23:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-13 20:34 [PATCH] x86: combine memmove FSRM and ERMS alternatives Daniel Verkamp
2023-01-14  6:55 ` Greg KH
2023-01-14  9:24 ` Ingo Molnar
2023-01-14 10:42 ` Borislav Petkov
2023-01-14 16:17   ` Borislav Petkov
2023-01-14 20:49     ` Borislav Petkov
2023-01-15 23:49       ` Borislav Petkov
  -- strict thread matches above, loose matches on Subject: below --
2023-01-13 20:32 Daniel Verkamp
2023-01-14  6:55 ` Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.