From: Eric Dumazet
Subject: Re: [PATCH v1] net: filter: Just In Time compiler
Date: Sun, 03 Apr 2011 07:41:58 +0200
Message-ID: <1301809318.2837.125.camel@edumazet-laptop>
References: <1301783301.2837.77.camel@edumazet-laptop> <20110402225002.GJ3108@nuttenaction>
In-Reply-To: <20110402225002.GJ3108@nuttenaction>
To: Hagen Paul Pfeifer
Cc: David Miller, netdev, Arnaldo Carvalho de Melo

On Sunday, 3 April 2011 at 00:50 +0200, Hagen Paul Pfeifer wrote:
> * Eric Dumazet | 2011-04-03 00:28:21 [+0200]:
>
> >In order to speedup packet filtering, here is an implementation of a JIT
> >compiler for x86_64
>
> Great work! Eric, do you have some numbers? For a trivial "ret" filter the
> performance gain should be marginal - if any. But what with complex filter
> rules? And as said last time: libpcap optimizer seems to be target number one
> for optimization. Again: great work!

Preliminary performance results are good, even for a basic filter.
(I changed the AND operator to be able to use "and $imm8,%al" for
typical net addr/[24-31] operations, and "and $imm16,%ax" for
addr/[16-23] ones.)

	case BPF_S_ALU_AND_K:
		if (K >= 0xFFFFFF00) {
			EMIT2(0x24, K & 0xFF);	/* and imm8,%al */
		} else if (K >= 0xFFFF0000) {
			EMIT2(0x66, 0x25);	/* and imm16,%ax */
			EMIT2(K, 2);
		} else {
			EMIT1_off32(0x25, K);	/* and imm32,%eax */
		}
		break;

Test setup: udpflood on dummy0, with the following basic tcpdump active,
catching no frames (the filter condition is never met):

# tcpdump -p -n -s 0 -i dummy0 net 192.168.2.0/24 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xc0a80200      jt 16   jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xc0a80200      jt 16   jf 17
(008) jeq      #0x806           jt 10   jf 9
(009) jeq      #0x8035          jt 10   jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xc0a80200      jt 16   jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xc0a80200      jt 16   jf 17
(016) ret      #65535
(017) ret      #0

flen=18 proglen=147 pass=3 image=ffffffffa00b5000
JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00
JIT code: ffffffffa00b5020: e8 24 dc 2e e1 3d 00 08 00 00 75 28 be 1a 00 00
JIT code: ffffffffa00b5030: 00 e8 fe db 2e e1 24 00 3d 00 02 a8 c0 74 49 be
JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb db 2e e1 24 00 3d 00 02 a8 c0
JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00
JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 db 2e e1 24 00 3d 00
JIT code: ffffffffa00b5070: 02 a8 c0 74 13 be 26 00 00 00 e8 b5 db 2e e1 24
JIT code: ffffffffa00b5080: 00 3d 00 02 a8 c0 75 07 b8 ff ff 00 00 eb 02 31
JIT code: ffffffffa00b5090: c0 c9 c3

Benchmark:

ifconfig dummy0 10.2.2.2 netmask 255.255.255.0 up

1) Baseline (no active tcpdump)

# time /root/udpflood -f -l 10000000 10.2.2.1

real	0m7.941s
user	0m0.823s
sys	0m7.103s

2) Time with normal filtering (JIT disabled)

# time /root/udpflood -f -l 10000000 10.2.2.1

real	0m10.165s
user	0m1.000s
sys	0m9.149s

3) JIT enabled

# time /root/udpflood -f -l 10000000 10.2.2.1

real	0m9.615s
user	0m1.022s
sys	0m8.578s

That's about 50 ns saved per invocation (0.55 s over 10,000,000
packets), on an E5540 @ 2.53GHz.

We could get better results if we inlined the fast path of ld/ldh/ldb
instead of calling helpers (in this case, that would avoid three
call/ret instruction pairs).