public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH -tip 0/4] x86: signal handler improvement
@ 2008-09-23  1:45 Hiroshi Shimamoto
  2008-09-23  1:50 ` [RFC PATCH -tip 1/4] x86: uaccess: rename __put_user_u64 to __put_user_asm_u64 Hiroshi Shimamoto
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Hiroshi Shimamoto @ 2008-09-23  1:45 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin; +Cc: linux-kernel

This patch series is experimental.
I made it against tip/master.

I noticed there are inefficient codes in x86 signals.
For example, disassembled 32-bit setup_sigcontext();

0000007c <setup_sigcontext>:
  7c:	55                   	push   %ebp
  7d:	89 e5                	mov    %esp,%ebp
  7f:	57                   	push   %edi
  80:	31 ff                	xor    %edi,%edi
  82:	56                   	push   %esi
  83:	89 c6                	mov    %eax,%esi
  85:	53                   	push   %ebx
  86:	83 ec 58             	sub    $0x58,%esp
  89:	89 55 a4             	mov    %edx,-0x5c(%ebp)
  8c:	89 fa                	mov    %edi,%edx
  8e:	8b 41 24             	mov    0x24(%ecx),%eax
  91:	89 46 04             	mov    %eax,0x4(%esi)
  94:	89 55 a8             	mov    %edx,-0x58(%ebp)
...
 184:	8b 5d ac             	mov    -0x54(%ebp),%ebx
 187:	0b 5d a8             	or     -0x58(%ebp),%ebx
 18a:	0b 5d b0             	or     -0x50(%ebp),%ebx
 18d:	0b 5d b4             	or     -0x4c(%ebp),%ebx
 190:	0b 5d b8             	or     -0x48(%ebp),%ebx
 193:	0b 5d bc             	or     -0x44(%ebp),%ebx
 196:	0b 5d c0             	or     -0x40(%ebp),%ebx
 199:	0b 5d c4             	or     -0x3c(%ebp),%ebx
 19c:	0b 5d c8             	or     -0x38(%ebp),%ebx
 19f:	0b 5d cc             	or     -0x34(%ebp),%ebx
 1a2:	0b 5d d0             	or     -0x30(%ebp),%ebx
 1a5:	0b 5d d4             	or     -0x2c(%ebp),%ebx
 1a8:	0b 5d d8             	or     -0x28(%ebp),%ebx
 1ab:	0b 5d dc             	or     -0x24(%ebp),%ebx
 1ae:	0b 5d e0             	or     -0x20(%ebp),%ebx
 1b1:	0b 5d e4             	or     -0x1c(%ebp),%ebx
 1b4:	0b 5d e8             	or     -0x18(%ebp),%ebx
 1b7:	0b 5d ec             	or     -0x14(%ebp),%ebx
 1ba:	0b 5d f0             	or     -0x10(%ebp),%ebx
 1bd:	09 fb                	or     %edi,%ebx
...
 1dc:	09 d8                	or     %ebx,%eax
 1de:	5b                   	pop    %ebx
 1df:	09 c8                	or     %ecx,%eax
 1e1:	5e                   	pop    %esi
 1e2:	5f                   	pop    %edi
 1e3:	5d                   	pop    %ebp
 1e4:	c3                   	ret    

there is a lot of "or" operation with stack, and it came from a set of
following lines;

err |= __put_user(x, ptr);

The above line compiled to like this;
  a0:	89 fa                	mov    %edi,%edx
  a2:	8b 41 20             	mov    0x20(%ecx),%eax
  a5:	89 46 08             	mov    %eax,0x8(%esi)
  a8:	89 55 b0             	mov    %edx,-0x50(%ebp)

and errors in __put_user is stored to stack.
At the end of function, all errors in stack are calculated. If access to user
space is failed, %edx is set to -EFAULT in exception handler and stored to
stack for later operation.
It seems inefficient.

This patch series introduce __{put|get}_user_cerr for cumulative error
handling. So, the line;
err |= __put_user(x, ptr);
changes to
__put_user_cerr(x, ptr, err);

and it compiled to like this;
  92:	8b 41 20             	mov    0x20(%ecx),%eax
  95:	89 47 08             	mov    %eax,0x8(%edi)

The cumulative error is kept in the other register, in this example,
%esi is used for this, and returns it. If access to user space causes fault,
%esi is set to the value (%esi | -EFAULT) in exception handler.

 137:	89 f0                	mov    %esi,%eax
 139:	5b                   	pop    %ebx
 13a:	5e                   	pop    %esi
 13b:	5f                   	pop    %edi
 13c:	5d                   	pop    %ebp
 13d:	c3                   	ret    

The results of this, I got a little performance improvement in signal
handling. Here is a result of lmbench.
I've tried 64-bit only at this time. Will measure on 32-bit.

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.67 2.65 3.20 0.32 2.97 180. 524. 1806
tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.68 2.65 3.20 0.32 2.95 180. 520. 1796
tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.68 2.72 3.21 0.32 2.96 177. 528. 1812
tip64     Linux 2.6.27- 3788 0.17 0.27 1.68 2.74 3.22 0.32 3.01 174. 523. 1853
tip64     Linux 2.6.27- 3788 0.17 0.27 1.68 2.73 3.22 0.32 3.01 175. 523. 1806
tip64     Linux 2.6.27- 3788 0.17 0.27 1.68 2.74 3.20 0.32 3.01 181. 523. 1791

This patch series also reduces stack usages and code size.

$ size signal_*
   text	   data	    bss	    dec	    hex	filename
   4507	      0	      0	   4507	   119b	signal_32.o.new
   5031	      0	      0	   5031	   13a7	signal_32.o.old
   3827	      0	      0	   3827	    ef3	signal_64.o.new
   4652	      0	      0	   4652	   122c	signal_64.o.old

Comments are welcome.
I'll handle this patch series if it's good.

thanks,
Hiroshi Shimamoto


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH -tip 1/4] x86: uaccess: rename __put_user_u64 to __put_user_asm_u64
  2008-09-23  1:45 [RFC PATCH -tip 0/4] x86: signal handler improvement Hiroshi Shimamoto
@ 2008-09-23  1:50 ` Hiroshi Shimamoto
  2008-09-23  1:51 ` [RFC PATCH -tip 2/4] x86: uaccess: introduce __{put|get}_user_asm_eop Hiroshi Shimamoto
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Hiroshi Shimamoto @ 2008-09-23  1:50 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin; +Cc: linux-kernel

From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

Use same rule to name of __put_user and __get_user.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
---
 include/asm-x86/uaccess.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/asm-x86/uaccess.h b/include/asm-x86/uaccess.h
index 414b116..c098dfe 100644
--- a/include/asm-x86/uaccess.h
+++ b/include/asm-x86/uaccess.h
@@ -186,7 +186,7 @@ extern int __get_user_bad(void);
 
 
 #ifdef CONFIG_X86_32
-#define __put_user_u64(x, addr, err)					\
+#define __put_user_asm_u64(x, addr, err)					\
 	asm volatile("1:	movl %%eax,0(%2)\n"			\
 		     "2:	movl %%edx,4(%2)\n"			\
 		     "3:\n"						\
@@ -203,7 +203,7 @@ extern int __get_user_bad(void);
 	asm volatile("call __put_user_8" : "=a" (__ret_pu)	\
 		     : "A" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
 #else
-#define __put_user_u64(x, ptr, retval) \
+#define __put_user_asm_u64(x, ptr, retval) \
 	__put_user_asm(x, ptr, retval, "q", "", "Zr", -EFAULT)
 #define __put_user_x8(x, ptr, __ret_pu) __put_user_x(8, x, ptr, __ret_pu)
 #endif
@@ -279,7 +279,7 @@ do {									\
 		__put_user_asm(x, ptr, retval, "l", "k",  "ir", errret);\
 		break;							\
 	case 8:								\
-		__put_user_u64((__typeof__(*ptr))(x), ptr, retval);	\
+		__put_user_asm_u64((__typeof__(*ptr))(x), ptr, retval);	\
 		break;							\
 	default:							\
 		__put_user_bad();					\
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH -tip 2/4] x86: uaccess: introduce __{put|get}_user_asm_eop
  2008-09-23  1:45 [RFC PATCH -tip 0/4] x86: signal handler improvement Hiroshi Shimamoto
  2008-09-23  1:50 ` [RFC PATCH -tip 1/4] x86: uaccess: rename __put_user_u64 to __put_user_asm_u64 Hiroshi Shimamoto
@ 2008-09-23  1:51 ` Hiroshi Shimamoto
  2008-09-23  1:51 ` [RFC PATCH -tip 3/4] x86: uaccess: introduce __{put|get}_user_cerr Hiroshi Shimamoto
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Hiroshi Shimamoto @ 2008-09-23  1:51 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin; +Cc: linux-kernel

From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

Introduce __{put|get}_user_asm_eop which receives eop for error opecode.
Define __{put|get}_user_asm as __{put|get}_user_asm_eop with "mov".

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
---
 include/asm-x86/uaccess.h |   27 +++++++++++++++++++++------
 1 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/asm-x86/uaccess.h b/include/asm-x86/uaccess.h
index c098dfe..84b0600 100644
--- a/include/asm-x86/uaccess.h
+++ b/include/asm-x86/uaccess.h
@@ -186,12 +186,12 @@ extern int __get_user_bad(void);
 
 
 #ifdef CONFIG_X86_32
-#define __put_user_asm_u64(x, addr, err)					\
+#define __put_user_asm_eop_u64(x, addr, eop, err)			\
 	asm volatile("1:	movl %%eax,0(%2)\n"			\
 		     "2:	movl %%edx,4(%2)\n"			\
 		     "3:\n"						\
 		     ".section .fixup,\"ax\"\n"				\
-		     "4:	movl %3,%0\n"				\
+		     "4:	" eop " %3,%0\n"			\
 		     "	jmp 3b\n"					\
 		     ".previous\n"					\
 		     _ASM_EXTABLE(1b, 4b)				\
@@ -199,12 +199,17 @@ extern int __get_user_bad(void);
 		     : "=r" (err)					\
 		     : "A" (x), "r" (addr), "i" (-EFAULT), "0" (err))
 
+#define __put_user_asm_u64(x, addr, err)			\
+	__put_user_asm_eop_u64(x, addr, "movl", err)
+
 #define __put_user_x8(x, ptr, __ret_pu)				\
 	asm volatile("call __put_user_8" : "=a" (__ret_pu)	\
 		     : "A" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
 #else
 #define __put_user_asm_u64(x, ptr, retval) \
 	__put_user_asm(x, ptr, retval, "q", "", "Zr", -EFAULT)
+#define __put_user_asm_eop_u64(x, ptr, eop, retval) \
+	__put_user_asm_eop(x, ptr, retval, "q", "", "Zr", eop, -EFAULT)
 #define __put_user_x8(x, ptr, __ret_pu) __put_user_x(8, x, ptr, __ret_pu)
 #endif
 
@@ -311,9 +316,12 @@ do {									\
 
 #ifdef CONFIG_X86_32
 #define __get_user_asm_u64(x, ptr, retval, errret)	(x) = __get_user_bad()
+#define __get_user_asm_eop_u64(x, ptr, retval, eop, errret)	(x) = __get_user_bad()
 #else
 #define __get_user_asm_u64(x, ptr, retval, errret) \
 	 __get_user_asm(x, ptr, retval, "q", "", "=r", errret)
+#define __get_user_asm_eop_u64(x, ptr, retval, eop, errret) \
+	 __get_user_asm_eop(x, ptr, retval, "q", "", "=r", eop, errret)
 #endif
 
 #define __get_user_size(x, ptr, size, retval, errret)			\
@@ -338,11 +346,11 @@ do {									\
 	}								\
 } while (0)
 
-#define __get_user_asm(x, addr, err, itype, rtype, ltype, errret)	\
+#define __get_user_asm_eop(x, addr, err, itype, rtype, ltype, eop, errret)	\
 	asm volatile("1:	mov"itype" %2,%"rtype"1\n"		\
 		     "2:\n"						\
 		     ".section .fixup,\"ax\"\n"				\
-		     "3:	mov %3,%0\n"				\
+		     "3:	" eop " %3,%0\n"			\
 		     "	xor"itype" %"rtype"1,%"rtype"1\n"		\
 		     "	jmp 2b\n"					\
 		     ".previous\n"					\
@@ -350,6 +358,9 @@ do {									\
 		     : "=r" (err), ltype(x)				\
 		     : "m" (__m(addr)), "i" (errret), "0" (err))
 
+#define __get_user_asm(x, addr, err, itype, rtype, ltype, errret)	\
+	__get_user_asm_eop(x, addr, err, itype, rtype, ltype, "mov", errret)
+
 #define __put_user_nocheck(x, ptr, size)			\
 ({								\
 	long __pu_err;						\
@@ -375,16 +386,20 @@ struct __large_struct { unsigned long buf[100]; };
  * we do not write to any memory gcc knows about, so there are no
  * aliasing issues.
  */
-#define __put_user_asm(x, addr, err, itype, rtype, ltype, errret)	\
+#define __put_user_asm_eop(x, addr, err, itype, rtype, ltype, eop, errret)	\
 	asm volatile("1:	mov"itype" %"rtype"1,%2\n"		\
 		     "2:\n"						\
 		     ".section .fixup,\"ax\"\n"				\
-		     "3:	mov %3,%0\n"				\
+		     "3:	" eop " %3,%0\n"			\
 		     "	jmp 2b\n"					\
 		     ".previous\n"					\
 		     _ASM_EXTABLE(1b, 3b)				\
 		     : "=r"(err)					\
 		     : ltype(x), "m" (__m(addr)), "i" (errret), "0" (err))
+
+#define __put_user_asm(x, addr, err, itype, rtype, ltype, errret)	\
+	__put_user_asm_eop(x, addr, err, itype, rtype, ltype, "mov", errret)
+
 /**
  * __get_user: - Get a simple variable from user space, with less checking.
  * @x:   Variable to store result.
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH -tip 3/4] x86: uaccess: introduce __{put|get}_user_cerr
  2008-09-23  1:45 [RFC PATCH -tip 0/4] x86: signal handler improvement Hiroshi Shimamoto
  2008-09-23  1:50 ` [RFC PATCH -tip 1/4] x86: uaccess: rename __put_user_u64 to __put_user_asm_u64 Hiroshi Shimamoto
  2008-09-23  1:51 ` [RFC PATCH -tip 2/4] x86: uaccess: introduce __{put|get}_user_asm_eop Hiroshi Shimamoto
@ 2008-09-23  1:51 ` Hiroshi Shimamoto
  2008-09-23  1:51 ` [RFC PATCH -tip 4/4] x86: signal: use __{put|get}_user_cerr Hiroshi Shimamoto
  2008-09-23  8:50 ` [RFC PATCH -tip 0/4] x86: signal handler improvement Ingo Molnar
  4 siblings, 0 replies; 9+ messages in thread
From: Hiroshi Shimamoto @ 2008-09-23  1:51 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin; +Cc: linux-kernel

From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

Introduce __{put|get}_user_cerr for cumulative error handling.
The following 2 lines are same.
  __{put|get}_user_cerr(x, ptr, err);
  err |= __{put|get}_user(x, ptr);

Introduce __{put|get}_user_size_cerr for internal use from __{put|get}_user_cerr.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
---
 include/asm-x86/uaccess.h |   74 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/uaccess.h b/include/asm-x86/uaccess.h
index 84b0600..b83448a 100644
--- a/include/asm-x86/uaccess.h
+++ b/include/asm-x86/uaccess.h
@@ -291,6 +291,31 @@ do {									\
 	}								\
 } while (0)
 
+#define __put_user_size_cerr(x, ptr, size, retval, errret)		\
+do {									\
+	__chk_user_ptr(ptr);						\
+	switch (size) {							\
+	case 1:								\
+		__put_user_asm_eop(x, ptr, retval, "b", "b", "iq",	\
+							"or", errret);	\
+		break;							\
+	case 2:								\
+		__put_user_asm_eop(x, ptr, retval, "w", "w", "ir",	\
+							"or", errret);	\
+		break;							\
+	case 4:								\
+		__put_user_asm_eop(x, ptr, retval, "l", "k",  "ir",	\
+							"or", errret);	\
+		break;							\
+	case 8:								\
+		__put_user_asm_eop_u64((__typeof__(*ptr))(x), ptr,	\
+							"or", retval);	\
+		break;							\
+	default:							\
+		__put_user_bad();					\
+	}								\
+} while (0)
+
 #else
 
 #define __put_user_size(x, ptr, size, retval, errret)			\
@@ -302,6 +327,14 @@ do {									\
 		retval = errret;					\
 } while (0)
 
+#define __put_user_size_cerr(x, ptr, size, retval, errret)		\
+do {									\
+	__typeof__(*(ptr))__pus_tmp = x;				\
+									\
+	if (unlikely(__copy_to_user_ll(ptr, &__pus_tmp, size) != 0))	\
+		retval |= errret;					\
+} while (0)
+
 #define put_user(x, ptr)					\
 ({								\
 	int __ret_pu;						\
@@ -346,6 +379,30 @@ do {									\
 	}								\
 } while (0)
 
+#define __get_user_size_cerr(x, ptr, size, retval, errret)		\
+do {									\
+	__chk_user_ptr(ptr);						\
+	switch (size) {							\
+	case 1:								\
+		__get_user_asm_eop(x, ptr, retval, "b", "b", "=q",	\
+							"or", errret);	\
+		break;							\
+	case 2:								\
+		__get_user_asm_eop(x, ptr, retval, "w", "w", "=r",	\
+							"or", errret);	\
+		break;							\
+	case 4:								\
+		__get_user_asm_eop(x, ptr, retval, "l", "k", "=r",	\
+							"or", errret);	\
+		break;							\
+	case 8:								\
+		__get_user_asm_eop_u64(x, ptr, retval, "or", errret);	\
+		break;							\
+	default:							\
+		(x) = __get_user_bad();					\
+	}								\
+} while (0)
+
 #define __get_user_asm_eop(x, addr, err, itype, rtype, ltype, eop, errret)	\
 	asm volatile("1:	mov"itype" %2,%"rtype"1\n"		\
 		     "2:\n"						\
@@ -368,6 +425,11 @@ do {									\
 	__pu_err;						\
 })
 
+#define __put_user_nocheck_cerr(x, ptr, size, err)			\
+do {									\
+	__put_user_size_cerr((x), (ptr), (size), (err), -EFAULT);	\
+} while (0)
+
 #define __get_user_nocheck(x, ptr, size)				\
 ({									\
 	long __gu_err;							\
@@ -377,6 +439,13 @@ do {									\
 	__gu_err;							\
 })
 
+#define __get_user_nocheck_cerr(x, ptr, size, err)			\
+do {									\
+	unsigned long __gu_val;						\
+	__get_user_size_cerr(__gu_val, (ptr), (size), (err), -EFAULT);	\
+	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
+} while (0)
+
 /* FIXME: this hack is definitely wrong -AK */
 struct __large_struct { unsigned long buf[100]; };
 #define __m(x) (*(struct __large_struct __user *)(x))
@@ -423,6 +492,8 @@ struct __large_struct { unsigned long buf[100]; };
 
 #define __get_user(x, ptr)						\
 	__get_user_nocheck((x), (ptr), sizeof(*(ptr)))
+#define __get_user_cerr(x, ptr, err)					\
+	__get_user_nocheck_cerr((x), (ptr), sizeof(*(ptr)), (err))
 /**
  * __put_user: - Write a simple value into user space, with less checking.
  * @x:   Value to copy to user space.
@@ -445,6 +516,9 @@ struct __large_struct { unsigned long buf[100]; };
 
 #define __put_user(x, ptr)						\
 	__put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
+#define __put_user_cerr(x, ptr, err)					\
+	__put_user_nocheck_cerr((__typeof__(*(ptr)))(x), (ptr),		\
+				sizeof(*(ptr)), (err))
 
 #define __get_user_unaligned __get_user
 #define __put_user_unaligned __put_user
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH -tip 4/4] x86: signal: use __{put|get}_user_cerr
  2008-09-23  1:45 [RFC PATCH -tip 0/4] x86: signal handler improvement Hiroshi Shimamoto
                   ` (2 preceding siblings ...)
  2008-09-23  1:51 ` [RFC PATCH -tip 3/4] x86: uaccess: introduce __{put|get}_user_cerr Hiroshi Shimamoto
@ 2008-09-23  1:51 ` Hiroshi Shimamoto
  2008-09-23  8:50 ` [RFC PATCH -tip 0/4] x86: signal handler improvement Ingo Molnar
  4 siblings, 0 replies; 9+ messages in thread
From: Hiroshi Shimamoto @ 2008-09-23  1:51 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin; +Cc: linux-kernel

From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

Use __{put|get}_user_cerr for cumulative error handling in x86 signal code.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
---
 arch/x86/kernel/signal_32.c |   96 +++++++++++++++++++++---------------------
 arch/x86/kernel/signal_64.c |   86 +++++++++++++++++++-------------------
 2 files changed, 91 insertions(+), 91 deletions(-)

diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
index da3cf32..cd6233e 100644
--- a/arch/x86/kernel/signal_32.c
+++ b/arch/x86/kernel/signal_32.c
@@ -126,21 +126,21 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
 	/* Always make any pending restarted system calls return -EINTR */
 	current_thread_info()->restart_block.fn = do_no_restart_syscall;
 
-#define COPY(x)		err |= __get_user(regs->x, &sc->x)
+#define COPY(x)		__get_user_cerr(regs->x, &sc->x, err)
 
 #define COPY_SEG(seg)							\
 	{ unsigned short tmp;						\
-	  err |= __get_user(tmp, &sc->seg);				\
+	  __get_user_cerr(tmp, &sc->seg, err);				\
 	  regs->seg = tmp; }
 
 #define COPY_SEG_STRICT(seg)						\
 	{ unsigned short tmp;						\
-	  err |= __get_user(tmp, &sc->seg);				\
+	  __get_user_cerr(tmp, &sc->seg, err);				\
 	  regs->seg = tmp|3; }
 
 #define GET_SEG(seg)							\
 	{ unsigned short tmp;						\
-	  err |= __get_user(tmp, &sc->seg);				\
+	  __get_user_cerr(tmp, &sc->seg, err);				\
 	  loadsegment(seg, tmp); }
 
 	GET_SEG(gs);
@@ -155,7 +155,7 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
 	{
 		unsigned int tmpflags;
 
-		err |= __get_user(tmpflags, &sc->flags);
+		__get_user_cerr(tmpflags, &sc->flags, err);
 		regs->flags = (regs->flags & ~FIX_EFLAGS) |
 						(tmpflags & FIX_EFLAGS);
 		regs->orig_ax = -1;		/* disable syscall checks */
@@ -164,11 +164,11 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
 	{
 		void __user *buf;
 
-		err |= __get_user(buf, &sc->fpstate);
+		__get_user_cerr(buf, &sc->fpstate, err);
 		err |= restore_i387_xstate(buf);
 	}
 
-	err |= __get_user(*pax, &sc->ax);
+	__get_user_cerr(*pax, &sc->ax, err);
 	return err;
 }
 
@@ -262,37 +262,37 @@ setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
 {
 	int tmp, err = 0;
 
-	err |= __put_user(regs->fs, (unsigned int __user *)&sc->fs);
+	__put_user_cerr(regs->fs, (unsigned int __user *)&sc->fs, err);
 	savesegment(gs, tmp);
-	err |= __put_user(tmp, (unsigned int __user *)&sc->gs);
-
-	err |= __put_user(regs->es, (unsigned int __user *)&sc->es);
-	err |= __put_user(regs->ds, (unsigned int __user *)&sc->ds);
-	err |= __put_user(regs->di, &sc->di);
-	err |= __put_user(regs->si, &sc->si);
-	err |= __put_user(regs->bp, &sc->bp);
-	err |= __put_user(regs->sp, &sc->sp);
-	err |= __put_user(regs->bx, &sc->bx);
-	err |= __put_user(regs->dx, &sc->dx);
-	err |= __put_user(regs->cx, &sc->cx);
-	err |= __put_user(regs->ax, &sc->ax);
-	err |= __put_user(current->thread.trap_no, &sc->trapno);
-	err |= __put_user(current->thread.error_code, &sc->err);
-	err |= __put_user(regs->ip, &sc->ip);
-	err |= __put_user(regs->cs, (unsigned int __user *)&sc->cs);
-	err |= __put_user(regs->flags, &sc->flags);
-	err |= __put_user(regs->sp, &sc->sp_at_signal);
-	err |= __put_user(regs->ss, (unsigned int __user *)&sc->ss);
+	__put_user_cerr(tmp, (unsigned int __user *)&sc->gs, err);
+
+	__put_user_cerr(regs->es, (unsigned int __user *)&sc->es, err);
+	__put_user_cerr(regs->ds, (unsigned int __user *)&sc->ds, err);
+	__put_user_cerr(regs->di, &sc->di, err);
+	__put_user_cerr(regs->si, &sc->si, err);
+	__put_user_cerr(regs->bp, &sc->bp, err);
+	__put_user_cerr(regs->sp, &sc->sp, err);
+	__put_user_cerr(regs->bx, &sc->bx, err);
+	__put_user_cerr(regs->dx, &sc->dx, err);
+	__put_user_cerr(regs->cx, &sc->cx, err);
+	__put_user_cerr(regs->ax, &sc->ax, err);
+	__put_user_cerr(current->thread.trap_no, &sc->trapno, err);
+	__put_user_cerr(current->thread.error_code, &sc->err, err);
+	__put_user_cerr(regs->ip, &sc->ip, err);
+	__put_user_cerr(regs->cs, (unsigned int __user *)&sc->cs, err);
+	__put_user_cerr(regs->flags, &sc->flags, err);
+	__put_user_cerr(regs->sp, &sc->sp_at_signal, err);
+	__put_user_cerr(regs->ss, (unsigned int __user *)&sc->ss, err);
 
 	tmp = save_i387_xstate(fpstate);
 	if (tmp < 0)
 		err = 1;
 	else
-		err |= __put_user(tmp ? fpstate : NULL, &sc->fpstate);
+		__put_user_cerr(tmp ? fpstate : NULL, &sc->fpstate, err);
 
 	/* non-iBCS2 extensions.. */
-	err |= __put_user(mask, &sc->oldmask);
-	err |= __put_user(current->thread.cr2, &sc->cr2);
+	__put_user_cerr(mask, &sc->oldmask, err);
+	__put_user_cerr(current->thread.cr2, &sc->cr2, err);
 
 	return err;
 }
@@ -377,7 +377,7 @@ __setup_frame(int sig, struct k_sigaction *ka, sigset_t *set,
 		restorer = ka->sa.sa_restorer;
 
 	/* Set up to return from userspace.  */
-	err |= __put_user(restorer, &frame->pretcode);
+	__put_user_cerr(restorer, &frame->pretcode, err);
 
 	/*
 	 * This is popl %eax ; movl $__NR_sigreturn, %eax ; int $0x80
@@ -386,9 +386,9 @@ __setup_frame(int sig, struct k_sigaction *ka, sigset_t *set,
 	 * reasons and because gdb uses it as a signature to notice
 	 * signal handler stack frames.
 	 */
-	err |= __put_user(0xb858, (short __user *)(frame->retcode+0));
-	err |= __put_user(__NR_sigreturn, (int __user *)(frame->retcode+2));
-	err |= __put_user(0x80cd, (short __user *)(frame->retcode+6));
+	__put_user_cerr(0xb858, (short __user *)(frame->retcode+0), err);
+	__put_user_cerr(__NR_sigreturn, (int __user *)(frame->retcode+2), err);
+	__put_user_cerr(0x80cd, (short __user *)(frame->retcode+6), err);
 
 	if (err)
 		return -EFAULT;
@@ -421,23 +421,23 @@ static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
 		return -EFAULT;
 
-	err |= __put_user(sig, &frame->sig);
-	err |= __put_user(&frame->info, &frame->pinfo);
-	err |= __put_user(&frame->uc, &frame->puc);
+	__put_user_cerr(sig, &frame->sig, err);
+	__put_user_cerr(&frame->info, &frame->pinfo, err);
+	__put_user_cerr(&frame->uc, &frame->puc, err);
 	err |= copy_siginfo_to_user(&frame->info, info);
 	if (err)
 		return -EFAULT;
 
 	/* Create the ucontext.  */
 	if (cpu_has_xsave)
-		err |= __put_user(UC_FP_XSTATE, &frame->uc.uc_flags);
+		__put_user_cerr(UC_FP_XSTATE, &frame->uc.uc_flags, err);
 	else
-		err |= __put_user(0, &frame->uc.uc_flags);
-	err |= __put_user(0, &frame->uc.uc_link);
-	err |= __put_user(current->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
-	err |= __put_user(sas_ss_flags(regs->sp),
-			  &frame->uc.uc_stack.ss_flags);
-	err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
+		__put_user_cerr(0, &frame->uc.uc_flags, err);
+	__put_user_cerr(0, &frame->uc.uc_link, err);
+	__put_user_cerr(current->sas_ss_sp, &frame->uc.uc_stack.ss_sp, err);
+	__put_user_cerr(sas_ss_flags(regs->sp),
+			  &frame->uc.uc_stack.ss_flags, err);
+	__put_user_cerr(current->sas_ss_size, &frame->uc.uc_stack.ss_size, err);
 	err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
 				regs, set->sig[0]);
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
@@ -448,7 +448,7 @@ static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	restorer = VDSO32_SYMBOL(current->mm->context.vdso, rt_sigreturn);
 	if (ka->sa.sa_flags & SA_RESTORER)
 		restorer = ka->sa.sa_restorer;
-	err |= __put_user(restorer, &frame->pretcode);
+	__put_user_cerr(restorer, &frame->pretcode, err);
 
 	/*
 	 * This is movl $__NR_rt_sigreturn, %ax ; int $0x80
@@ -457,9 +457,9 @@ static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	 * reasons and because gdb uses it as a signature to notice
 	 * signal handler stack frames.
 	 */
-	err |= __put_user(0xb8, (char __user *)(frame->retcode+0));
-	err |= __put_user(__NR_rt_sigreturn, (int __user *)(frame->retcode+1));
-	err |= __put_user(0x80cd, (short __user *)(frame->retcode+5));
+	__put_user_cerr(0xb8, (char __user *)(frame->retcode+0), err);
+	__put_user_cerr(__NR_rt_sigreturn, (int __user *)(frame->retcode+1), err);
+	__put_user_cerr(0x80cd, (short __user *)(frame->retcode+5), err);
 
 	if (err)
 		return -EFAULT;
diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c
index bf77d47..10375f4 100644
--- a/arch/x86/kernel/signal_64.c
+++ b/arch/x86/kernel/signal_64.c
@@ -64,7 +64,7 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
 	/* Always make any pending restarted system calls return -EINTR */
 	current_thread_info()->restart_block.fn = do_no_restart_syscall;
 
-#define COPY(x)		(err |= __get_user(regs->x, &sc->x))
+#define COPY(x)		__get_user_cerr(regs->x, &sc->x, err)
 
 	COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
 	COPY(dx); COPY(cx); COPY(ip);
@@ -82,13 +82,13 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
 	 * App's signal handler can save/restore other segments if needed. */
 	{
 		unsigned cs;
-		err |= __get_user(cs, &sc->cs);
+		__get_user_cerr(cs, &sc->cs, err);
 		regs->cs = cs | 3;	/* Force into user mode */
 	}
 
 	{
 		unsigned int tmpflags;
-		err |= __get_user(tmpflags, &sc->flags);
+		__get_user_cerr(tmpflags, &sc->flags, err);
 		regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
 		regs->orig_ax = -1;		/* disable syscall checks */
 	}
@@ -96,11 +96,11 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
 	{
 		void __user *buf;
 
-		err |= __get_user(buf, &sc->fpstate);
+		__get_user_cerr(buf, &sc->fpstate, err);
 		err |= restore_i387_xstate(buf);
 	}
 
-	err |= __get_user(*pax, &sc->ax);
+	__get_user_cerr(*pax, &sc->ax, err);
 	return err;
 }
 
@@ -150,32 +150,32 @@ setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs,
 {
 	int err = 0;
 
-	err |= __put_user(regs->cs, &sc->cs);
-	err |= __put_user(0, &sc->gs);
-	err |= __put_user(0, &sc->fs);
-
-	err |= __put_user(regs->di, &sc->di);
-	err |= __put_user(regs->si, &sc->si);
-	err |= __put_user(regs->bp, &sc->bp);
-	err |= __put_user(regs->sp, &sc->sp);
-	err |= __put_user(regs->bx, &sc->bx);
-	err |= __put_user(regs->dx, &sc->dx);
-	err |= __put_user(regs->cx, &sc->cx);
-	err |= __put_user(regs->ax, &sc->ax);
-	err |= __put_user(regs->r8, &sc->r8);
-	err |= __put_user(regs->r9, &sc->r9);
-	err |= __put_user(regs->r10, &sc->r10);
-	err |= __put_user(regs->r11, &sc->r11);
-	err |= __put_user(regs->r12, &sc->r12);
-	err |= __put_user(regs->r13, &sc->r13);
-	err |= __put_user(regs->r14, &sc->r14);
-	err |= __put_user(regs->r15, &sc->r15);
-	err |= __put_user(me->thread.trap_no, &sc->trapno);
-	err |= __put_user(me->thread.error_code, &sc->err);
-	err |= __put_user(regs->ip, &sc->ip);
-	err |= __put_user(regs->flags, &sc->flags);
-	err |= __put_user(mask, &sc->oldmask);
-	err |= __put_user(me->thread.cr2, &sc->cr2);
+	__put_user_cerr(regs->cs, &sc->cs, err);
+	__put_user_cerr(0, &sc->gs, err);
+	__put_user_cerr(0, &sc->fs, err);
+
+	__put_user_cerr(regs->di, &sc->di, err);
+	__put_user_cerr(regs->si, &sc->si, err);
+	__put_user_cerr(regs->bp, &sc->bp, err);
+	__put_user_cerr(regs->sp, &sc->sp, err);
+	__put_user_cerr(regs->bx, &sc->bx, err);
+	__put_user_cerr(regs->dx, &sc->dx, err);
+	__put_user_cerr(regs->cx, &sc->cx, err);
+	__put_user_cerr(regs->ax, &sc->ax, err);
+	__put_user_cerr(regs->r8, &sc->r8, err);
+	__put_user_cerr(regs->r9, &sc->r9, err);
+	__put_user_cerr(regs->r10, &sc->r10, err);
+	__put_user_cerr(regs->r11, &sc->r11, err);
+	__put_user_cerr(regs->r12, &sc->r12, err);
+	__put_user_cerr(regs->r13, &sc->r13, err);
+	__put_user_cerr(regs->r14, &sc->r14, err);
+	__put_user_cerr(regs->r15, &sc->r15, err);
+	__put_user_cerr(me->thread.trap_no, &sc->trapno, err);
+	__put_user_cerr(me->thread.error_code, &sc->err, err);
+	__put_user_cerr(regs->ip, &sc->ip, err);
+	__put_user_cerr(regs->flags, &sc->flags, err);
+	__put_user_cerr(mask, &sc->oldmask, err);
+	__put_user_cerr(me->thread.cr2, &sc->cr2, err);
 
 	return err;
 }
@@ -229,19 +229,19 @@ static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 
 	/* Create the ucontext.  */
 	if (cpu_has_xsave)
-		err |= __put_user(UC_FP_XSTATE, &frame->uc.uc_flags);
+		__put_user_cerr(UC_FP_XSTATE, &frame->uc.uc_flags, err);
 	else
-		err |= __put_user(0, &frame->uc.uc_flags);
-	err |= __put_user(0, &frame->uc.uc_link);
-	err |= __put_user(me->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
-	err |= __put_user(sas_ss_flags(regs->sp),
-			  &frame->uc.uc_stack.ss_flags);
-	err |= __put_user(me->sas_ss_size, &frame->uc.uc_stack.ss_size);
-	err |= setup_sigcontext(&frame->uc.uc_mcontext, regs, set->sig[0], me);
-	err |= __put_user(fp, &frame->uc.uc_mcontext.fpstate);
+		__put_user_cerr(0, &frame->uc.uc_flags, err);
+	__put_user_cerr(0, &frame->uc.uc_link, err);
+	__put_user_cerr(me->sas_ss_sp, &frame->uc.uc_stack.ss_sp, err);
+	__put_user_cerr(sas_ss_flags(regs->sp),
+			  &frame->uc.uc_stack.ss_flags, err);
+	__put_user_cerr(me->sas_ss_size, &frame->uc.uc_stack.ss_size, err);
+	setup_sigcontext(&frame->uc.uc_mcontext, regs, set->sig[0], me);
+	__put_user_cerr(fp, &frame->uc.uc_mcontext.fpstate, err);
 	if (sizeof(*set) == 16) {
-		__put_user(set->sig[0], &frame->uc.uc_sigmask.sig[0]);
-		__put_user(set->sig[1], &frame->uc.uc_sigmask.sig[1]);
+		__put_user_cerr(set->sig[0], &frame->uc.uc_sigmask.sig[0], err);
+		__put_user_cerr(set->sig[1], &frame->uc.uc_sigmask.sig[1], err);
 	} else
 		err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
 
@@ -249,7 +249,7 @@ static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	   already in userspace.  */
 	/* x86-64 should always use SA_RESTORER. */
 	if (ka->sa.sa_flags & SA_RESTORER) {
-		err |= __put_user(ka->sa.sa_restorer, &frame->pretcode);
+		__put_user_cerr(ka->sa.sa_restorer, &frame->pretcode, err);
 	} else {
 		/* could use a vstub here */
 		return -EFAULT;
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH -tip 0/4] x86: signal handler improvement
  2008-09-23  1:45 [RFC PATCH -tip 0/4] x86: signal handler improvement Hiroshi Shimamoto
                   ` (3 preceding siblings ...)
  2008-09-23  1:51 ` [RFC PATCH -tip 4/4] x86: signal: use __{put|get}_user_cerr Hiroshi Shimamoto
@ 2008-09-23  8:50 ` Ingo Molnar
  2008-09-23  8:55   ` Ingo Molnar
  2008-09-23 16:58   ` Hiroshi Shimamoto
  4 siblings, 2 replies; 9+ messages in thread
From: Ingo Molnar @ 2008-09-23  8:50 UTC (permalink / raw)
  To: Hiroshi Shimamoto; +Cc: Thomas Gleixner, H. Peter Anvin, linux-kernel


* Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> wrote:

> This patch series is experimental.
> I made it against tip/master.
> 
> I noticed there are inefficient codes in x86 signals.
> For example, disassembled 32-bit setup_sigcontext();
> 
> 0000007c <setup_sigcontext>:
>   7c:	55                   	push   %ebp
>   7d:	89 e5                	mov    %esp,%ebp
>   7f:	57                   	push   %edi
>   80:	31 ff                	xor    %edi,%edi
>   82:	56                   	push   %esi
>   83:	89 c6                	mov    %eax,%esi
>   85:	53                   	push   %ebx
>   86:	83 ec 58             	sub    $0x58,%esp
>   89:	89 55 a4             	mov    %edx,-0x5c(%ebp)
>   8c:	89 fa                	mov    %edi,%edx
>   8e:	8b 41 24             	mov    0x24(%ecx),%eax
>   91:	89 46 04             	mov    %eax,0x4(%esi)
>   94:	89 55 a8             	mov    %edx,-0x58(%ebp)
> ...
>  184:	8b 5d ac             	mov    -0x54(%ebp),%ebx
>  187:	0b 5d a8             	or     -0x58(%ebp),%ebx
>  18a:	0b 5d b0             	or     -0x50(%ebp),%ebx
>  18d:	0b 5d b4             	or     -0x4c(%ebp),%ebx
>  190:	0b 5d b8             	or     -0x48(%ebp),%ebx
>  193:	0b 5d bc             	or     -0x44(%ebp),%ebx
>  196:	0b 5d c0             	or     -0x40(%ebp),%ebx
>  199:	0b 5d c4             	or     -0x3c(%ebp),%ebx
>  19c:	0b 5d c8             	or     -0x38(%ebp),%ebx
>  19f:	0b 5d cc             	or     -0x34(%ebp),%ebx
>  1a2:	0b 5d d0             	or     -0x30(%ebp),%ebx
>  1a5:	0b 5d d4             	or     -0x2c(%ebp),%ebx
>  1a8:	0b 5d d8             	or     -0x28(%ebp),%ebx
>  1ab:	0b 5d dc             	or     -0x24(%ebp),%ebx
>  1ae:	0b 5d e0             	or     -0x20(%ebp),%ebx
>  1b1:	0b 5d e4             	or     -0x1c(%ebp),%ebx
>  1b4:	0b 5d e8             	or     -0x18(%ebp),%ebx
>  1b7:	0b 5d ec             	or     -0x14(%ebp),%ebx
>  1ba:	0b 5d f0             	or     -0x10(%ebp),%ebx
>  1bd:	09 fb                	or     %edi,%ebx
> ...
>  1dc:	09 d8                	or     %ebx,%eax
>  1de:	5b                   	pop    %ebx
>  1df:	09 c8                	or     %ecx,%eax
>  1e1:	5e                   	pop    %esi
>  1e2:	5f                   	pop    %edi
>  1e3:	5d                   	pop    %ebp
>  1e4:	c3                   	ret    
> 
> there is a lot of "or" operation with stack, and it came from a set of
> following lines;
> 
> err |= __put_user(x, ptr);
> 
> The above line compiled to like this;
>   a0:	89 fa                	mov    %edi,%edx
>   a2:	8b 41 20             	mov    0x20(%ecx),%eax
>   a5:	89 46 08             	mov    %eax,0x8(%esi)
>   a8:	89 55 b0             	mov    %edx,-0x50(%ebp)
> 
> and errors in __put_user is stored to stack. At the end of function, 
> all errors in stack are calculated. If access to user space is failed, 
> %edx is set to -EFAULT in exception handler and stored to stack for 
> later operation. It seems inefficient.

yes, very much! Years ago i tried years to improve it but didnt think of 
your solution back then.

> This patch series introduce __{put|get}_user_cerr for cumulative error
> handling. So, the line;
> err |= __put_user(x, ptr);
> changes to
> __put_user_cerr(x, ptr, err);
> 
> and it compiled to like this;
>   92:	8b 41 20             	mov    0x20(%ecx),%eax
>   95:	89 47 08             	mov    %eax,0x8(%edi)
> 
> The cumulative error is kept in the other register, in this example,
> %esi is used for this, and returns it. If access to user space causes fault,
> %esi is set to the value (%esi | -EFAULT) in exception handler.
> 
>  137:	89 f0                	mov    %esi,%eax
>  139:	5b                   	pop    %ebx
>  13a:	5e                   	pop    %esi
>  13b:	5f                   	pop    %edi
>  13c:	5d                   	pop    %ebp
>  13d:	c3                   	ret    
> 
> The results of this, I got a little performance improvement in signal
> handling. Here is a result of lmbench.
> I've tried 64-bit only at this time. Will measure on 32-bit.
> 
> Processor, Processes - times in microseconds - smaller is better
> ------------------------------------------------------------------------------
> Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
>                              call  I/O stat clos TCP  inst hndl proc proc proc
> --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
> tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.67 2.65 3.20 0.32 2.97 180. 524. 1806
> tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.68 2.65 3.20 0.32 2.95 180. 520. 1796
> tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.68 2.72 3.21 0.32 2.96 177. 528. 1812
> tip64     Linux 2.6.27- 3788 0.17 0.27 1.68 2.74 3.22 0.32 3.01 174. 523. 1853
> tip64     Linux 2.6.27- 3788 0.17 0.27 1.68 2.73 3.22 0.32 3.01 175. 523. 1806
> tip64     Linux 2.6.27- 3788 0.17 0.27 1.68 2.74 3.20 0.32 3.01 181. 523. 1791
> 
> This patch series also reduces stack usages and code size.
> 
> $ size signal_*
>    text	   data	    bss	    dec	    hex	filename
>    4507	      0	      0	   4507	   119b	signal_32.o.new
>    5031	      0	      0	   5031	   13a7	signal_32.o.old
>    3827	      0	      0	   3827	    ef3	signal_64.o.new
>    4652	      0	      0	   4652	   122c	signal_64.o.old
> 
> Comments are welcome.
> I'll handle this patch series if it's good.

very nice!!

could we perhaps first finish unifying them into signal.c, and then 
introduce __put_user_cerr() in signal_32/64.c?

Or we could put your patches into tip/x86/signal as-is if you expect to 
finish the unification in the near future.

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH -tip 0/4] x86: signal handler improvement
  2008-09-23  8:50 ` [RFC PATCH -tip 0/4] x86: signal handler improvement Ingo Molnar
@ 2008-09-23  8:55   ` Ingo Molnar
  2008-09-23 17:00     ` Hiroshi Shimamoto
  2008-09-23 16:58   ` Hiroshi Shimamoto
  1 sibling, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2008-09-23  8:55 UTC (permalink / raw)
  To: Hiroshi Shimamoto; +Cc: Thomas Gleixner, H. Peter Anvin, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> could we perhaps first finish unifying them into signal.c, and then 
> introduce __put_user_cerr() in signal_32/64.c?

i've got an API suggestion as well. Instead of:

-               err |= __put_user(UC_FP_XSTATE, &frame->uc.uc_flags);
+               __put_user_cerr(UC_FP_XSTATE, &frame->uc.uc_flags, err);

could you instead please make it:

+               __put_user_cerr(UC_FP_XSTATE, &frame->uc.uc_flags, &err);

i.e. pass in 'err' as a reference. This makes it clear to the casual 
reader, in a C calling convention sense, that there's a side-effect to 
'err'. [ There should be no change to the resulting code as 
__put_user_cerr() is a macro. ]

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH -tip 0/4] x86: signal handler improvement
  2008-09-23  8:50 ` [RFC PATCH -tip 0/4] x86: signal handler improvement Ingo Molnar
  2008-09-23  8:55   ` Ingo Molnar
@ 2008-09-23 16:58   ` Hiroshi Shimamoto
  1 sibling, 0 replies; 9+ messages in thread
From: Hiroshi Shimamoto @ 2008-09-23 16:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, linux-kernel

Ingo Molnar wrote:
> * Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> wrote:
> 
>> This patch series is experimental.
>> I made it against tip/master.
>>
>> I noticed there are inefficient codes in x86 signals.
>> For example, disassembled 32-bit setup_sigcontext();
>>
>> 0000007c <setup_sigcontext>:
>>   7c:	55                   	push   %ebp
>>   7d:	89 e5                	mov    %esp,%ebp
>>   7f:	57                   	push   %edi
>>   80:	31 ff                	xor    %edi,%edi
>>   82:	56                   	push   %esi
>>   83:	89 c6                	mov    %eax,%esi
>>   85:	53                   	push   %ebx
>>   86:	83 ec 58             	sub    $0x58,%esp
>>   89:	89 55 a4             	mov    %edx,-0x5c(%ebp)
>>   8c:	89 fa                	mov    %edi,%edx
>>   8e:	8b 41 24             	mov    0x24(%ecx),%eax
>>   91:	89 46 04             	mov    %eax,0x4(%esi)
>>   94:	89 55 a8             	mov    %edx,-0x58(%ebp)
>> ...
>>  184:	8b 5d ac             	mov    -0x54(%ebp),%ebx
>>  187:	0b 5d a8             	or     -0x58(%ebp),%ebx
>>  18a:	0b 5d b0             	or     -0x50(%ebp),%ebx
>>  18d:	0b 5d b4             	or     -0x4c(%ebp),%ebx
>>  190:	0b 5d b8             	or     -0x48(%ebp),%ebx
>>  193:	0b 5d bc             	or     -0x44(%ebp),%ebx
>>  196:	0b 5d c0             	or     -0x40(%ebp),%ebx
>>  199:	0b 5d c4             	or     -0x3c(%ebp),%ebx
>>  19c:	0b 5d c8             	or     -0x38(%ebp),%ebx
>>  19f:	0b 5d cc             	or     -0x34(%ebp),%ebx
>>  1a2:	0b 5d d0             	or     -0x30(%ebp),%ebx
>>  1a5:	0b 5d d4             	or     -0x2c(%ebp),%ebx
>>  1a8:	0b 5d d8             	or     -0x28(%ebp),%ebx
>>  1ab:	0b 5d dc             	or     -0x24(%ebp),%ebx
>>  1ae:	0b 5d e0             	or     -0x20(%ebp),%ebx
>>  1b1:	0b 5d e4             	or     -0x1c(%ebp),%ebx
>>  1b4:	0b 5d e8             	or     -0x18(%ebp),%ebx
>>  1b7:	0b 5d ec             	or     -0x14(%ebp),%ebx
>>  1ba:	0b 5d f0             	or     -0x10(%ebp),%ebx
>>  1bd:	09 fb                	or     %edi,%ebx
>> ...
>>  1dc:	09 d8                	or     %ebx,%eax
>>  1de:	5b                   	pop    %ebx
>>  1df:	09 c8                	or     %ecx,%eax
>>  1e1:	5e                   	pop    %esi
>>  1e2:	5f                   	pop    %edi
>>  1e3:	5d                   	pop    %ebp
>>  1e4:	c3                   	ret    
>>
>> there is a lot of "or" operation with stack, and it came from a set of
>> following lines;
>>
>> err |= __put_user(x, ptr);
>>
>> The above line compiled to like this;
>>   a0:	89 fa                	mov    %edi,%edx
>>   a2:	8b 41 20             	mov    0x20(%ecx),%eax
>>   a5:	89 46 08             	mov    %eax,0x8(%esi)
>>   a8:	89 55 b0             	mov    %edx,-0x50(%ebp)
>>
>> and errors in __put_user is stored to stack. At the end of function, 
>> all errors in stack are calculated. If access to user space is failed, 
>> %edx is set to -EFAULT in exception handler and stored to stack for 
>> later operation. It seems inefficient.
> 
> yes, very much! Years ago i tried years to improve it but didnt think of 
> your solution back then.
> 
>> This patch series introduce __{put|get}_user_cerr for cumulative error
>> handling. So, the line;
>> err |= __put_user(x, ptr);
>> changes to
>> __put_user_cerr(x, ptr, err);
>>
>> and it compiled to like this;
>>   92:	8b 41 20             	mov    0x20(%ecx),%eax
>>   95:	89 47 08             	mov    %eax,0x8(%edi)
>>
>> The cumulative error is kept in the other register, in this example,
>> %esi is used for this, and returns it. If access to user space causes fault,
>> %esi is set to the value (%esi | -EFAULT) in exception handler.
>>
>>  137:	89 f0                	mov    %esi,%eax
>>  139:	5b                   	pop    %ebx
>>  13a:	5e                   	pop    %esi
>>  13b:	5f                   	pop    %edi
>>  13c:	5d                   	pop    %ebp
>>  13d:	c3                   	ret    
>>
>> The results of this, I got a little performance improvement in signal
>> handling. Here is a result of lmbench.
>> I've tried 64-bit only at this time. Will measure on 32-bit.
>>
>> Processor, Processes - times in microseconds - smaller is better
>> ------------------------------------------------------------------------------
>> Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
>>                              call  I/O stat clos TCP  inst hndl proc proc proc
>> --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
>> tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.67 2.65 3.20 0.32 2.97 180. 524. 1806
>> tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.68 2.65 3.20 0.32 2.95 180. 520. 1796
>> tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.68 2.72 3.21 0.32 2.96 177. 528. 1812
>> tip64     Linux 2.6.27- 3788 0.17 0.27 1.68 2.74 3.22 0.32 3.01 174. 523. 1853
>> tip64     Linux 2.6.27- 3788 0.17 0.27 1.68 2.73 3.22 0.32 3.01 175. 523. 1806
>> tip64     Linux 2.6.27- 3788 0.17 0.27 1.68 2.74 3.20 0.32 3.01 181. 523. 1791
>>
>> This patch series also reduces stack usages and code size.
>>
>> $ size signal_*
>>    text	   data	    bss	    dec	    hex	filename
>>    4507	      0	      0	   4507	   119b	signal_32.o.new
>>    5031	      0	      0	   5031	   13a7	signal_32.o.old
>>    3827	      0	      0	   3827	    ef3	signal_64.o.new
>>    4652	      0	      0	   4652	   122c	signal_64.o.old
>>
>> Comments are welcome.
>> I'll handle this patch series if it's good.
> 
> very nice!!
> 
> could we perhaps first finish unifying them into signal.c, and then 
> introduce __put_user_cerr() in signal_32/64.c?
> 
> Or we could put your patches into tip/x86/signal as-is if you expect to 
> finish the unification in the near future.

Thanks for looking this series!

I prefer to finish unification first.
Will push this series after unification.

thanks,
Hiroshi Shimamoto


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH -tip 0/4] x86: signal handler improvement
  2008-09-23  8:55   ` Ingo Molnar
@ 2008-09-23 17:00     ` Hiroshi Shimamoto
  0 siblings, 0 replies; 9+ messages in thread
From: Hiroshi Shimamoto @ 2008-09-23 17:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, linux-kernel

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
>> could we perhaps first finish unifying them into signal.c, and then 
>> introduce __put_user_cerr() in signal_32/64.c?
> 
> i've got an API suggestion as well. Instead of:
> 
> -               err |= __put_user(UC_FP_XSTATE, &frame->uc.uc_flags);
> +               __put_user_cerr(UC_FP_XSTATE, &frame->uc.uc_flags, err);
> 
> could you instead please make it:
> 
> +               __put_user_cerr(UC_FP_XSTATE, &frame->uc.uc_flags, &err);
> 
> i.e. pass in 'err' as a reference. This makes it clear to the casual 
> reader, in a C calling convention sense, that there's a side-effect to 
> 'err'. [ There should be no change to the resulting code as 
> __put_user_cerr() is a macro. ]

make sense!
Will do in next post.

thanks,
Hiroshi Shimamoto



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-09-23 17:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-23  1:45 [RFC PATCH -tip 0/4] x86: signal handler improvement Hiroshi Shimamoto
2008-09-23  1:50 ` [RFC PATCH -tip 1/4] x86: uaccess: rename __put_user_u64 to __put_user_asm_u64 Hiroshi Shimamoto
2008-09-23  1:51 ` [RFC PATCH -tip 2/4] x86: uaccess: introduce __{put|get}_user_asm_eop Hiroshi Shimamoto
2008-09-23  1:51 ` [RFC PATCH -tip 3/4] x86: uaccess: introduce __{put|get}_user_cerr Hiroshi Shimamoto
2008-09-23  1:51 ` [RFC PATCH -tip 4/4] x86: signal: use __{put|get}_user_cerr Hiroshi Shimamoto
2008-09-23  8:50 ` [RFC PATCH -tip 0/4] x86: signal handler improvement Ingo Molnar
2008-09-23  8:55   ` Ingo Molnar
2008-09-23 17:00     ` Hiroshi Shimamoto
2008-09-23 16:58   ` Hiroshi Shimamoto

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox