From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Mateusz Guzik <mjguzik@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, ira.weiny@intel.com
Subject: [PATCH AUTOSEL 6.5 36/36] x86: bring back rep movsq for user access on CPUs without ERMS
Date: Fri, 8 Sep 2023 15:28:47 -0400
Message-ID: <20230908192848.3462476-36-sashal@kernel.org>
In-Reply-To: <20230908192848.3462476-1-sashal@kernel.org>
From: Mateusz Guzik <mjguzik@gmail.com>
[ Upstream commit ca96b162bfd21a5d55e3cd6099e4ee357a0eeb68 ]
Intel CPUs have shipped with ERMS for over a decade, but the same is not
true for AMD. In particular, one reasonably recent uarch (EPYC 7R13) does
not have it (or at least the bit is not set when running on the Amazon
EC2 cloud; I found rather conflicting information about AMD CPUs and the
extension).
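ERMS (Enhanced REP MOVSB/STOSB) is reported in CPUID leaf 7, subleaf 0, EBX bit 9, which is the bit behind the kernel's X86_FEATURE_ERMS; FSRM (Fast Short REP MOV), mentioned in the copy_user_64.S comment, is EDX bit 4 of the same leaf. A minimal user-space sketch for checking what a machine (or an EC2 guest) advertises, assuming a GCC or Clang toolchain that ships <cpuid.h> with __get_cpuid_count(); the helper names are hypothetical. Note that a guest only sees what the hypervisor exposes.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.(EAX=7,ECX=0):EBX bit 9 = ERMS (Enhanced REP MOVSB/STOSB). */
#define CPUID7_EBX_ERMS_BIT 9
/* CPUID.(EAX=7,ECX=0):EDX bit 4 = FSRM (Fast Short REP MOV). */
#define CPUID7_EDX_FSRM_BIT 4

/* Pure decoders on raw register values, so they can be checked on any host. */
static bool ebx_has_erms(uint32_t ebx) { return (ebx >> CPUID7_EBX_ERMS_BIT) & 1u; }
static bool edx_has_fsrm(uint32_t edx) { return (edx >> CPUID7_EDX_FSRM_BIT) & 1u; }

/* Query the running CPU; false on non-x86 builds or when leaf 7 is absent. */
static bool cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ebx_has_erms(ebx);
#else
	return false;
#endif
}
```

On Linux the same information is also visible as the "erms" and "fsrm" flags in /proc/cpuinfo.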
In this case the hand-rolled mov loops are quite pessimal compared to
rep movsq for bigger sizes. While the size at which rep movsq starts to
win depends on the uarch, AFAICS it is well south of 1KB everywhere, and
copies bigger than that are common.
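For lengths at or above the cutoff on a CPU without ERMS, the new `.Llarge_movsq` path in the patch splits the length into `len >> 3` quadwords for `rep movsq` and a `len & 7` byte tail handled by `.Lcopy_user_tail`. If `rep movsq` faults, %rcx still counts the quadwords not yet moved, so the fixup `leaq (%rax,%rcx,8),%rcx` hands `tail + 8 * rcx` bytes to the tail routine. The portable C sketch below models only that arithmetic; `struct movsq_split` and the function names are hypothetical, and plain memcpy() calls stand in for the string instruction and the exception table.

```c
#include <stddef.h>
#include <string.h>

/* Register state set up at .Llarge_movsq (illustrative model only). */
struct movsq_split {
	size_t qwords; /* %rcx after "shrq $3,%rcx": iterations of rep movsq */
	size_t tail;   /* %eax after "andl $7,%eax": leftover bytes, 0..7 */
};

static struct movsq_split split_len(size_t len)
{
	struct movsq_split s = { len >> 3, len & 7 };
	return s;
}

/*
 * Fixup "1: leaq (%rax,%rcx,8),%rcx": if rep movsq faults, %rcx still
 * counts the pending quadwords, so the byte count handed to
 * .Lcopy_user_tail is the tail plus eight bytes per pending quadword.
 */
static size_t bytes_pending_after_fault(struct movsq_split s, size_t qwords_left)
{
	return s.tail + qwords_left * 8;
}

/* Fault-free path: a quadword loop standing in for rep movsq, then the tail. */
static void copy_like_large_movsq(void *dst, const void *src, size_t len)
{
	struct movsq_split s = split_len(len);
	unsigned char *d = dst;
	const unsigned char *p = src;

	for (size_t i = 0; i < s.qwords; i++, d += 8, p += 8)
		memcpy(d, p, 8);          /* one rep movsq iteration */
	if (s.tail)
		memcpy(d, p, s.tail);     /* the .Lcopy_user_tail part, 1..7 bytes */
}
```

With len = 100, for example, `rep movsq` moves 12 quadwords (96 bytes) and the remaining 4 bytes go through the tail; a fault with 3 quadwords still pending hands 28 bytes to `.Lcopy_user_tail`.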
While ancient CPUs may technically suffer from rep usage, gcc has been
emitting it all over kernel code for years, so I don't think this is a
legitimate concern.
Sample result from the read1_processes test in will-it-scale (4KB reads/s):
before: 1507021
after: 1721828 (+14%)
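The +14% is the relative change in throughput, (1721828 - 1507021) / 1507021 = 214807 / 1507021, or roughly 0.1425. A trivial helper (hypothetical name) that reproduces the figure:

```c
/* Relative change between two throughput samples, in percent. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

pct_change(1507021.0, 1721828.0) evaluates to about 14.25, which the message rounds to +14%.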
Note that the cutoff point for rep usage is set to 64 bytes, which is
way too conservative, but I'm sticking to what was done in 47ee3f1dd93b
("x86: re-introduce support for ERMS copies for user space accesses").
This means *some* copies will now go slower, which is fixable but beyond
the scope of this patch.
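Putting the pieces together, a rough C model of how a user copy picks its strategy after this patch. The 64-byte cutoff comes from `cmpq $64,%rcx` and the FSRM case from the comment in copy_user_64.S (the call site is patched to a plain `rep movsb`); the 8-byte threshold for the word loop below the cutoff is an assumption about the surrounding code, and the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the strategies a user copy can take. */
enum copy_path {
	PATH_INLINE_REP_MOVSB, /* FSRM: call site rewritten to a plain rep movsb */
	PATH_NOTHING,          /* len == 0 */
	PATH_BYTE_LOOP,        /* 1..7 bytes: .Lcopy_user_tail */
	PATH_WORD_LOOP,        /* 8..63 bytes: movq loop, then byte tail (assumed) */
	PATH_REP_MOVSB,        /* >= 64 bytes with ERMS */
	PATH_REP_MOVSQ,        /* >= 64 bytes without ERMS: .Llarge_movsq */
};

#define REP_CUTOFF 64 /* "cmpq $64,%rcx" at the top of rep_movs_alternative */

static enum copy_path pick_copy_path(size_t len, bool fsrm, bool erms)
{
	if (fsrm)
		return PATH_INLINE_REP_MOVSB; /* rep_movs_alternative is never called */
	if (len >= REP_CUTOFF)
		return erms ? PATH_REP_MOVSB : PATH_REP_MOVSQ;
	if (len >= 8)
		return PATH_WORD_LOOP;
	return len ? PATH_BYTE_LOOP : PATH_NOTHING;
}
```

Before this patch, the PATH_REP_MOVSQ case was the 64-byte unrolled movq loop (`.Lunrolled`). Copies just above the 64-byte cutoff on non-ERMS CPUs are where, per the note above, some copies may now be slower.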
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/include/asm/uaccess_64.h | 2 +-
arch/x86/lib/copy_user_64.S | 57 +++++++------------------------
2 files changed, 14 insertions(+), 45 deletions(-)
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 81b826d3b7530..f2c02e4469ccc 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -116,7 +116,7 @@ copy_user_generic(void *to, const void *from, unsigned long len)
"2:\n"
_ASM_EXTABLE_UA(1b, 2b)
:"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
- : : "memory", "rax", "r8", "r9", "r10", "r11");
+ : : "memory", "rax");
clac();
return len;
}
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 01c5de4c279b8..0a81aafed7f88 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -27,7 +27,7 @@
* NOTE! The calling convention is very intentionally the same as
* for 'rep movs', so that we can rewrite the function call with
* just a plain 'rep movs' on machines that have FSRM. But to make
- * it simpler for us, we can clobber rsi/rdi and rax/r8-r11 freely.
+ * it simpler for us, we can clobber rsi/rdi and rax freely.
*/
SYM_FUNC_START(rep_movs_alternative)
cmpq $64,%rcx
@@ -68,55 +68,24 @@ SYM_FUNC_START(rep_movs_alternative)
_ASM_EXTABLE_UA( 3b, .Lcopy_user_tail)
.Llarge:
-0: ALTERNATIVE "jmp .Lunrolled", "rep movsb", X86_FEATURE_ERMS
+0: ALTERNATIVE "jmp .Llarge_movsq", "rep movsb", X86_FEATURE_ERMS
1: RET
- _ASM_EXTABLE_UA( 0b, 1b)
+ _ASM_EXTABLE_UA( 0b, 1b)
- .p2align 4
-.Lunrolled:
-10: movq (%rsi),%r8
-11: movq 8(%rsi),%r9
-12: movq 16(%rsi),%r10
-13: movq 24(%rsi),%r11
-14: movq %r8,(%rdi)
-15: movq %r9,8(%rdi)
-16: movq %r10,16(%rdi)
-17: movq %r11,24(%rdi)
-20: movq 32(%rsi),%r8
-21: movq 40(%rsi),%r9
-22: movq 48(%rsi),%r10
-23: movq 56(%rsi),%r11
-24: movq %r8,32(%rdi)
-25: movq %r9,40(%rdi)
-26: movq %r10,48(%rdi)
-27: movq %r11,56(%rdi)
- addq $64,%rsi
- addq $64,%rdi
- subq $64,%rcx
- cmpq $64,%rcx
- jae .Lunrolled
- cmpl $8,%ecx
- jae .Lword
+.Llarge_movsq:
+ movq %rcx,%rax
+ shrq $3,%rcx
+ andl $7,%eax
+0: rep movsq
+ movl %eax,%ecx
testl %ecx,%ecx
jne .Lcopy_user_tail
RET
- _ASM_EXTABLE_UA(10b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(11b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(12b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(13b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(14b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(15b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(16b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(17b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(20b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(21b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(22b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(23b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(24b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(25b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(26b, .Lcopy_user_tail)
- _ASM_EXTABLE_UA(27b, .Lcopy_user_tail)
+1: leaq (%rax,%rcx,8),%rcx
+ jmp .Lcopy_user_tail
+
+ _ASM_EXTABLE_UA( 0b, 1b)
SYM_FUNC_END(rep_movs_alternative)
EXPORT_SYMBOL(rep_movs_alternative)
--
2.40.1