From: Eric Dumazet
Subject: Re: [PATCH net-next-2.6] filter: cleanup codes[] init
Date: Fri, 19 Nov 2010 10:54:27 +0100
Message-ID: <1290160467.3034.33.camel@edumazet-laptop>
References: <1290132284-12328-1-git-send-email-xiaosuo@gmail.com> <1290153111.29509.2.camel@edumazet-laptop> <1290153398.29509.7.camel@edumazet-laptop>
Cc: "David S. Miller", Hagen Paul Pfeifer, netdev@vger.kernel.org
To: Changli Gao

On Friday, 19 November 2010 at 16:38 +0800, Changli Gao wrote:
> I compared the asm code of sk_run_filter.
> As you see, an additional 'dec %edx' instruction is inserted.
> sk_chk_filter() only runs 1 times, I think we can afford the 'dec
> instruction' and 'dirty' code, but sk_run_filter() runs much often,
> this additional dec instruction isn't affordable.
>

Maybe on your setup.

By the way, the

	u32 f_k = fentry->k;

that David added in commit 57fe93b374a6b871 was much more of a problem on
arches with not enough registers.

x86_32 for example: the compiler uses a register (%esi with my gcc-4.5.1)
to store f_k and, more importantly, the A register is now stored on the
stack instead of in a CPU register.

With my compilers

gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5) 64bit
gcc-4.5.1 (self compiled) 32bit

the resulting code was the same before and after the patch.

Most probably you have "CONFIG_CC_OPTIMIZE_FOR_SIZE=y", which
unfortunately is known to generate poor-looking code.

 39b:	49 8d 14 c6		lea    (%r14,%rax,8),%rdx
 39f:	66 83 3a 2d		cmpw   $0x2d,(%rdx)
 3a3:	8b 42 04		mov    0x4(%rdx),%eax	// f_k = fentry->k;
 3a6:	76 28			jbe    3d0

 3d0:	0f b7 0a		movzwl (%rdx),%ecx
 3d3:	ff 24 cd 00 00 00 00	jmpq   *0x0(,%rcx,8)

32bit code:

 2e0:	8d 04 df		lea    (%edi,%ebx,8),%eax
 2e3:	66 83 38 2d		cmpw   $0x2d,(%eax)
 2e7:	8b 70 04		mov    0x4(%eax),%esi	// f_k = fentry->k;
 2ea:	76 1c			jbe    308

 308:	0f b7 10		movzwl (%eax),%edx
 30b:	ff 24 95 38 00 00 00	jmp    *0x38(,%edx,4)

DIV_X instruction:

 480:	8b 45 a4		mov    -0x5c(%ebp),%eax
 483:	85 c0			test   %eax,%eax
 485:	0f 84 9d fe ff ff	je     328
 48b:	8b 45 ac		mov    -0x54(%ebp),%eax	// A
 48e:	31 d2			xor    %edx,%edx
 490:	f7 75 a4		divl   -0x5c(%ebp)
 493:	89 45 ac		mov    %eax,-0x54(%ebp)	// A
 496:	e9 85 fe ff ff		jmp    320

I believe we should revert the

	u32 f_k = fentry->k;

part.

fentry->k is as fast as f_k when f_k is stored on the stack, and reading
it directly avoids one instruction when fentry->k is not needed.
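
To make the two alternatives concrete, here is a minimal sketch. It is not
the kernel code: struct toy_insn, the OP_* opcodes, toy_prog and both
run_*() functions are hypothetical names made up for illustration, and
only the 8-byte instruction layout (k at offset 4) is chosen to match the
listings above. It shows the source-level difference only; what a given
compiler does with registers still has to be checked in the generated asm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte instruction: 16-bit opcode, two jump bytes, k at offset 4. */
struct toy_insn {
	uint16_t code;
	uint8_t  jt;
	uint8_t  jf;
	uint32_t k;
};

/* Hypothetical opcodes, only enough to exercise both code paths. */
enum { OP_LD_IMM, OP_ADD_K, OP_TAX, OP_DIV_X, OP_RET_A };

/* Variant 1: k is copied into a local for every instruction, used or not. */
uint32_t run_cached_k(const struct toy_insn *prog, size_t len)
{
	uint32_t A = 0, X = 0;
	size_t pc;

	for (pc = 0; pc < len; pc++) {
		const struct toy_insn *fentry = &prog[pc];
		uint32_t f_k = fentry->k;	/* extra live value in the loop */

		switch (fentry->code) {
		case OP_LD_IMM:	A = f_k;	break;
		case OP_ADD_K:	A += f_k;	break;
		case OP_TAX:	X = A;		break;	/* k not needed */
		case OP_DIV_X:				/* k not needed */
			if (X == 0)
				return 0;
			A /= X;
			break;
		case OP_RET_A:	return A;
		default:	return 0;
		}
	}
	return 0;
}

/* Variant 2: k is read from the instruction only where it is used. */
uint32_t run_direct_k(const struct toy_insn *prog, size_t len)
{
	uint32_t A = 0, X = 0;
	size_t pc;

	for (pc = 0; pc < len; pc++) {
		const struct toy_insn *fentry = &prog[pc];

		switch (fentry->code) {
		case OP_LD_IMM:	A = fentry->k;	break;
		case OP_ADD_K:	A += fentry->k;	break;
		case OP_TAX:	X = A;		break;
		case OP_DIV_X:
			if (X == 0)
				return 0;
			A /= X;
			break;
		case OP_RET_A:	return A;
		default:	return 0;
		}
	}
	return 0;
}

/* Sample program: X = 4; A = 40 + 2; A /= X; return A. */
const struct toy_insn toy_prog[] = {
	{ OP_LD_IMM, 0, 0, 4 },
	{ OP_TAX,    0, 0, 0 },
	{ OP_LD_IMM, 0, 0, 40 },
	{ OP_ADD_K,  0, 0, 2 },
	{ OP_DIV_X,  0, 0, 0 },
	{ OP_RET_A,  0, 0, 0 },
};
```

Both variants return the same value for toy_prog. Building each one with
and without -Os and comparing the objdump -d output shows whether f_k
costs a register, or pushes A to the stack, on a given compiler and arch.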