From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754058AbYIWQ7N (ORCPT ); Tue, 23 Sep 2008 12:59:13 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752444AbYIWQ66 (ORCPT ); Tue, 23 Sep 2008 12:58:58 -0400 Received: from gateway-1237.mvista.com ([63.81.120.158]:2347 "EHLO gateway-1237.mvista.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752440AbYIWQ65 (ORCPT ); Tue, 23 Sep 2008 12:58:57 -0400 Message-ID: <48D9204E.2020706@ct.jp.nec.com> Date: Tue, 23 Sep 2008 09:58:54 -0700 From: Hiroshi Shimamoto User-Agent: Thunderbird 2.0.0.16 (Windows/20080708) MIME-Version: 1.0 To: Ingo Molnar Cc: Thomas Gleixner , "H. Peter Anvin" , linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH -tip 0/4] x86: signal handler improvement References: <48D84A2A.2030009@ct.jp.nec.com> <20080923085027.GA25698@elte.hu> In-Reply-To: <20080923085027.GA25698@elte.hu> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Ingo Molnar wrote: > * Hiroshi Shimamoto wrote: > >> This patch series is experimental. >> I made it against tip/master. >> >> I noticed there are inefficient codes in x86 signals. >> For example, disassembled 32-bit setup_sigcontext(); >> >> 0000007c : >> 7c: 55 push %ebp >> 7d: 89 e5 mov %esp,%ebp >> 7f: 57 push %edi >> 80: 31 ff xor %edi,%edi >> 82: 56 push %esi >> 83: 89 c6 mov %eax,%esi >> 85: 53 push %ebx >> 86: 83 ec 58 sub $0x58,%esp >> 89: 89 55 a4 mov %edx,-0x5c(%ebp) >> 8c: 89 fa mov %edi,%edx >> 8e: 8b 41 24 mov 0x24(%ecx),%eax >> 91: 89 46 04 mov %eax,0x4(%esi) >> 94: 89 55 a8 mov %edx,-0x58(%ebp) >> ... >> 184: 8b 5d ac mov -0x54(%ebp),%ebx >> 187: 0b 5d a8 or -0x58(%ebp),%ebx >> 18a: 0b 5d b0 or -0x50(%ebp),%ebx >> 18d: 0b 5d b4 or -0x4c(%ebp),%ebx >> 190: 0b 5d b8 or -0x48(%ebp),%ebx >> 193: 0b 5d bc or -0x44(%ebp),%ebx >> 196: 0b 5d c0 or -0x40(%ebp),%ebx >> 199: 0b 5d c4 or -0x3c(%ebp),%ebx >> 19c: 0b 5d c8 or -0x38(%ebp),%ebx >> 19f: 0b 5d cc or -0x34(%ebp),%ebx >> 1a2: 0b 5d d0 or -0x30(%ebp),%ebx >> 1a5: 0b 5d d4 or -0x2c(%ebp),%ebx >> 1a8: 0b 5d d8 or -0x28(%ebp),%ebx >> 1ab: 0b 5d dc or -0x24(%ebp),%ebx >> 1ae: 0b 5d e0 or -0x20(%ebp),%ebx >> 1b1: 0b 5d e4 or -0x1c(%ebp),%ebx >> 1b4: 0b 5d e8 or -0x18(%ebp),%ebx >> 1b7: 0b 5d ec or -0x14(%ebp),%ebx >> 1ba: 0b 5d f0 or -0x10(%ebp),%ebx >> 1bd: 09 fb or %edi,%ebx >> ... >> 1dc: 09 d8 or %ebx,%eax >> 1de: 5b pop %ebx >> 1df: 09 c8 or %ecx,%eax >> 1e1: 5e pop %esi >> 1e2: 5f pop %edi >> 1e3: 5d pop %ebp >> 1e4: c3 ret >> >> there is a lot of "or" operation with stack, and it came from a set of >> following lines; >> >> err |= __put_user(x, ptr); >> >> The above line compiled to like this; >> a0: 89 fa mov %edi,%edx >> a2: 8b 41 20 mov 0x20(%ecx),%eax >> a5: 89 46 08 mov %eax,0x8(%esi) >> a8: 89 55 b0 mov %edx,-0x50(%ebp) >> >> and errors in __put_user is stored to stack. At the end of function, >> all errors in stack are calculated. If access to user space is failed, >> %edx is set to -EFAULT in exception handler and stored to stack for >> later operation. It seems inefficient. > > yes, very much! Years ago i tried years to improve it but didnt think of > your solution back then. > >> This patch series introduce __{put|get}_user_cerr for cumulative error >> handling. So, the line; >> err |= __put_user(x, ptr); >> changes to >> __put_user_cerr(x, ptr, err); >> >> and it compiled to like this; >> 92: 8b 41 20 mov 0x20(%ecx),%eax >> 95: 89 47 08 mov %eax,0x8(%edi) >> >> The cumulative error is kept in the other register, in this example, >> %esi is used for this, and returns it. If access to user space causes fault, >> %esi is set to the value (%esi | -EFAULT) in exception handler. >> >> 137: 89 f0 mov %esi,%eax >> 139: 5b pop %ebx >> 13a: 5e pop %esi >> 13b: 5f pop %edi >> 13c: 5d pop %ebp >> 13d: c3 ret >> >> The results of this, I got a little performance improvement in signal >> handling. Here is a result of lmbench. >> I've tried 64-bit only at this time. Will measure on 32-bit. >> >> Processor, Processes - times in microseconds - smaller is better >> ------------------------------------------------------------------------------ >> Host OS Mhz null null open slct sig sig fork exec sh >> call I/O stat clos TCP inst hndl proc proc proc >> --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- >> tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.67 2.65 3.20 0.32 2.97 180. 524. 1806 >> tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.68 2.65 3.20 0.32 2.95 180. 520. 1796 >> tip64-sig Linux 2.6.27- 3788 0.17 0.27 1.68 2.72 3.21 0.32 2.96 177. 528. 1812 >> tip64 Linux 2.6.27- 3788 0.17 0.27 1.68 2.74 3.22 0.32 3.01 174. 523. 1853 >> tip64 Linux 2.6.27- 3788 0.17 0.27 1.68 2.73 3.22 0.32 3.01 175. 523. 1806 >> tip64 Linux 2.6.27- 3788 0.17 0.27 1.68 2.74 3.20 0.32 3.01 181. 523. 1791 >> >> This patch series also reduces stack usages and code size. >> >> $ size signal_* >> text data bss dec hex filename >> 4507 0 0 4507 119b signal_32.o.new >> 5031 0 0 5031 13a7 signal_32.o.old >> 3827 0 0 3827 ef3 signal_64.o.new >> 4652 0 0 4652 122c signal_64.o.old >> >> Comments are welcome. >> I'll handle this patch series if it's good. > > very nice!! > > could we perhaps first finish unifying them into signal.c, and then > introduce __put_user_cerr() in signal_32/64.c? > > Or we could put your patches into tip/x86/signal as-is if you expect to > finish the unification in the near future. Thanks for looking this series! I prefer to finish unification first. Will push this series after unification. thanks, Hiroshi Shimamoto