From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Alan Cox <gnomes@lxorguk.ukuu.org.uk>,
Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>,
Brian Gerst <brgerst@gmail.com>,
Denys Vlasenko <dvlasenk@redhat.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Hannes Frederic Sowa <hannes@stressinduktion.org>,
Josh Poimboeuf <jpoimboe@redhat.com>,
Kees Cook <keescook@chromium.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>,
Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 4.12 17/27] x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
Date: Mon, 10 Jul 2017 19:10:14 +0200 [thread overview]
Message-ID: <20170710170104.619950083@linuxfoundation.org> (raw)
In-Reply-To: <20170710170103.831324799@linuxfoundation.org>
4.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni <pabeni@redhat.com>
commit 236222d39347e0e486010f10c1493e83dbbdfba8 upstream.
According to the Intel datasheet, the REP MOVSB instruction
exposes a pretty heavy setup cost (50 ticks), which hurts
short string copy operations.
This change tries to avoid this cost by calling the explicit
loop available in the unrolled code for strings shorter
than 64 bytes.
The 64 bytes cutoff value is arbitrary from the code logic
point of view - it has been selected based on measurements,
as the largest value that still ensures a measurable gain.
Micro benchmarks of the __copy_from_user() function with
lengths in the [0-63] range show this performance gain
(shorter the string, larger the gain):
- in the [55%-4%] range on Intel Xeon(R) CPU E5-2690 v4
- in the [72%-9%] range on Intel Core i7-4810MQ
Other tested CPUs - namely Intel Atom S1260 and AMD Opteron
8216 - show no difference, because they do not expose the
ERMS feature bit.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com
[ Clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
arch/x86/lib/copy_user_64.S | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -37,7 +37,7 @@ ENTRY(copy_user_generic_unrolled)
movl %edx,%ecx
andl $63,%edx
shrl $6,%ecx
- jz 17f
+ jz .L_copy_short_string
1: movq (%rsi),%r8
2: movq 1*8(%rsi),%r9
3: movq 2*8(%rsi),%r10
@@ -58,7 +58,8 @@ ENTRY(copy_user_generic_unrolled)
leaq 64(%rdi),%rdi
decl %ecx
jnz 1b
-17: movl %edx,%ecx
+.L_copy_short_string:
+ movl %edx,%ecx
andl $7,%edx
shrl $3,%ecx
jz 20f
@@ -174,6 +175,8 @@ EXPORT_SYMBOL(copy_user_generic_string)
*/
ENTRY(copy_user_enhanced_fast_string)
ASM_STAC
+ cmpl $64,%edx
+ jb .L_copy_short_string /* less then 64 bytes, avoid the costly 'rep' */
movl %edx,%ecx
1: rep
movsb
next prev parent reply other threads:[~2017-07-10 17:29 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-10 17:09 [PATCH 4.12 00/27] 4.12.1-stable review Greg Kroah-Hartman
2017-07-10 17:09 ` [PATCH 4.12 01/27] driver core: platform: fix race condition with driver_override Greg Kroah-Hartman
2017-07-10 17:09 ` [PATCH 4.12 02/27] RDMA/uverbs: Check port number supplied by user verbs cmds Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 03/27] usb: dwc3: replace %p with %pK Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 04/27] USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 05/27] usb: usbip: set buffer pointers to NULL after free Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 06/27] Add USB quirk for HVR-950q to avoid intermittent device resets Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 07/27] usb: Fix typo in the definition of Endpoint[out]Request Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 08/27] USB: core: fix device node leak Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 11/27] xhci: Limit USB2 port wake support for AMD Promontory hosts Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 12/27] gfs2: Fix glock rhashtable rcu bug Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 13/27] Add "shutdown" to "struct class" Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 14/27] tpm: Issue a TPM2_Shutdown for TPM2 devices Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 15/27] tpm: fix a kernel memory leak in tpm-sysfs.c Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 16/27] powerpc/powernv: Fix CPU_HOTPLUG=n idle.c compile error Greg Kroah-Hartman
2017-07-10 17:10 ` Greg Kroah-Hartman [this message]
2017-07-10 17:10 ` [PATCH 4.12 18/27] sched/fair, cpumask: Export for_each_cpu_wrap() Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 19/27] sched/core: Implement new approach to scale select_idle_cpu() Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 20/27] sched/numa: Use down_read_trylock() for the mmap_sem Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 21/27] sched/numa: Override part of migrate_degrades_locality() when idle balancing Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 22/27] sched/fair: Simplify wake_affine() for the single socket case Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 23/27] sched/numa: Implement NUMA node level wake_affine() Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 24/27] sched/fair: Remove effective_load() Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 25/27] sched/numa: Hide numa_wake_affine() from UP build Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 26/27] xen: avoid deadlock in xenbus driver Greg Kroah-Hartman
2017-07-10 17:10 ` [PATCH 4.12 27/27] crypto: drbg - Fixes panic in wait_for_completion call Greg Kroah-Hartman
2017-07-11 1:21 ` [PATCH 4.12 00/27] 4.12.1-stable review Guenter Roeck
2017-07-11 9:47 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170710170104.619950083@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=bp@alien8.de \
--cc=brgerst@gmail.com \
--cc=dvlasenk@redhat.com \
--cc=gnomes@lxorguk.ukuu.org.uk \
--cc=hannes@stressinduktion.org \
--cc=hpa@zytor.com \
--cc=jpoimboe@redhat.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mgorman@techsingularity.net \
--cc=mingo@kernel.org \
--cc=pabeni@redhat.com \
--cc=peterz@infradead.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox