* [PATCH] x86/uaccess: use unrolled string copy for short strings
@ 2017-06-21 11:09 Paolo Abeni
  2017-06-21 17:38 ` Kees Cook
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Paolo Abeni @ 2017-06-21 11:09 UTC
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Al Viro, Kees Cook,
	Hannes Frederic Sowa, linux-kernel

The 'rep' prefix carries a significant "setup cost"; as a result, for
short strings, copies using unrolled loops are faster than even the
optimized string copy variants that use 'rep'.

This change updates copy_user_generic() to use the unrolled
version for short strings. The threshold length for a short
string, 64 bytes, has been selected empirically as the
largest value that still ensures a measurable gain.
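
The length check described above can be illustrated outside the kernel.
The following is a minimal user-space sketch, not the kernel code:
copy_short_unrolled(), copy_dispatch() and SHORT_COPY_THRESHOLD are
hypothetical names, memcpy() stands in for the 'rep'-based variants, and
no fault handling is done. Like the kernel helpers, the functions return
the number of bytes left uncopied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold mirroring the 64-byte cutoff chosen in this patch. */
#define SHORT_COPY_THRESHOLD 64

/*
 * Illustrative unrolled copy: moves 8 bytes per loop iteration, then
 * finishes the tail one byte at a time. This is a user-space stand-in,
 * not copy_user_generic_unrolled(), and it cannot fault.
 */
static size_t copy_short_unrolled(void *to, const void *from, size_t len)
{
	unsigned char *d = to;
	const unsigned char *s = from;

	while (len >= 8) {
		d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
		d[4] = s[4]; d[5] = s[5]; d[6] = s[6]; d[7] = s[7];
		d += 8;
		s += 8;
		len -= 8;
	}
	while (len--)
		*d++ = *s++;

	return 0;	/* bytes left uncopied; always 0 in this sketch */
}

/* Dispatch on length: short copies never reach the bulk path. */
static size_t copy_dispatch(void *to, const void *from, size_t len)
{
	if (len <= SHORT_COPY_THRESHOLD)
		return copy_short_unrolled(to, from, len);

	memcpy(to, from, len);	/* stand-in for the 'rep'-based variants */
	return 0;
}
```

The early return keeps the short-string case to one compare and one
call, which is the same shape as the three-line hunk in this patch.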

A micro-benchmark of __copy_from_user() with different lengths shows
the following:

string len	vanilla		patched		delta
bytes		ticks		ticks		ticks(%)

0		58		26		32(55%)
1		49		29		20(40%)
2		49		31		18(36%)
3		49		32		17(34%)
4		50		34		16(32%)
5		49		35		14(28%)
6		49		36		13(26%)
7		49		38		11(22%)
8		50		31		19(38%)
9		51		33		18(35%)
10		52		36		16(30%)
11		52		37		15(28%)
12		52		38		14(26%)
13		52		40		12(23%)
14		52		41		11(21%)
15		52		42		10(19%)
16		51		34		17(33%)
17		51		35		16(31%)
18		52		37		15(28%)
19		51		38		13(25%)
20		52		39		13(25%)
21		52		40		12(23%)
22		51		42		9(17%)
23		51		46		5(9%)
24		52		35		17(32%)
25		52		37		15(28%)
26		52		38		14(26%)
27		52		39		13(25%)
28		52		40		12(23%)
29		53		42		11(20%)
30		52		43		9(17%)
31		52		44		8(15%)
32		51		36		15(29%)
33		51		38		13(25%)
34		51		39		12(23%)
35		51		41		10(19%)
36		52		41		11(21%)
37		52		43		9(17%)
38		51		44		7(13%)
39		52		46		6(11%)
40		51		37		14(27%)
41		50		38		12(24%)
42		50		39		11(22%)
43		50		40		10(20%)
44		50		42		8(16%)
45		50		43		7(14%)
46		50		43		7(14%)
47		50		45		5(10%)
48		50		37		13(26%)
49		49		38		11(22%)
50		50		40		10(20%)
51		50		42		8(16%)
52		50		42		8(16%)
53		49		46		3(6%)
54		50		46		4(8%)
55		49		48		1(2%)
56		50		39		11(22%)
57		50		40		10(20%)
58		49		42		7(14%)
59		50		42		8(16%)
60		50		46		4(8%)
61		50		47		3(6%)
62		50		48		2(4%)
63		50		48		2(4%)
64		51		38		13(25%)

Above 64 bytes the gain fades away.
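
For reference, the delta column in the table can be reproduced from the
two tick counts. The sketch below assumes the percentage is taken
relative to the vanilla value and truncated by integer division, which
is consistent with the rows shown; struct bench_row and the helper names
are hypothetical and not part of the benchmark itself.

```c
#include <stdio.h>

/* One row of the table: tick counts before and after the patch. */
struct bench_row {
	unsigned len;
	unsigned vanilla;
	unsigned patched;
};

/* Absolute saving, in ticks. */
static unsigned delta_ticks(const struct bench_row *r)
{
	return r->vanilla - r->patched;
}

/* Saving relative to vanilla, in percent; truncation is an assumption. */
static unsigned delta_percent(const struct bench_row *r)
{
	return delta_ticks(r) * 100 / r->vanilla;
}

/* Print one row in the same layout as the table. */
static void print_row(const struct bench_row *r)
{
	printf("%u\t\t%u\t\t%u\t\t%u(%u%%)\n", r->len, r->vanilla,
	       r->patched, delta_ticks(r), delta_percent(r));
}
```

For the 23-byte row (51 vs. 46 ticks) this yields 5 ticks and 9%, as
listed; rounding to nearest would give 10% instead.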

Very similar values are collected for __copy_to_user().
UDP receive performance under flood with small packets, using
recvfrom(), increases by ~5%.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 arch/x86/include/asm/uaccess_64.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c5504b9..16a8871 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
 {
 	unsigned ret;
 
+	if (len <= 64)
+		return copy_user_generic_unrolled(to, from, len);
+
 	/*
 	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
 	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
-- 
2.9.4


Thread overview: 10+ messages
2017-06-21 11:09 [PATCH] x86/uaccess: use unrolled string copy for short strings Paolo Abeni
2017-06-21 17:38 ` Kees Cook
2017-06-22 14:55   ` Alan Cox
2017-06-22  8:47 ` Ingo Molnar
2017-06-22 17:02   ` Paolo Abeni
2017-06-22 17:30 ` Linus Torvalds
2017-06-22 17:54   ` Paolo Abeni
2017-06-29 13:55   ` [PATCH] x86/uaccess: optimize copy_user_enhanced_fast_string for short string Paolo Abeni
2017-06-29 21:40     ` Linus Torvalds
2017-06-30 13:10     ` [tip:x86/asm] x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings tip-bot for Paolo Abeni
