From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 21 Oct 2020 12:49:52 +0200
From: Pablo Neira Ayuso
To: Phil Sutter, Florian Westphal, netfilter-devel@vger.kernel.org
Subject: Re: [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches
Message-ID: <20201021104952.GA31026@salvia>
References: <20201001165744.25466-1-phil@nwl.cc> <20201001222536.GB12773@breakpoint.cc> <20201002090033.GB1845@orbyte.nwl.cc> <20201021104321.GA30742@salvia>
In-Reply-To: <20201021104321.GA30742@salvia>
X-Mailing-List: netfilter-devel@vger.kernel.org

On Wed, Oct 21, 2020 at 12:43:21PM +0200, Pablo Neira Ayuso wrote:
> Hi Phil,
>
> On Fri, Oct 02, 2020 at 11:00:33AM +0200, Phil Sutter wrote:
> > Hi Florian,
> >
> > On Fri, Oct 02, 2020 at 12:25:36AM +0200, Florian Westphal wrote:
> > > Phil Sutter wrote:
> > > > The following two patches improve packet throughput in a test setup
> > > > sending UDP packets (using iperf3) between two netns. The ruleset used
> > > > on receiver side is like this:
> > > >
> > > > | *filter
> > > > | :test - [0:0]
> > > > | -A INPUT -j test
> > > > | -A INPUT -j ACCEPT
> > > > | -A test ! -s 10.0.0.0/10 -j DROP # this line repeats 10000 times
> > > > | COMMIT
> > > >
> > > > These are the generated VM instructions for each rule:
> > > >
> > > > | [ payload load 4b @ network header + 12 => reg 1 ]
> > > > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> > >
> > > Not related to this patch, but we should avoid the bitop if the
> > > netmask is divisible by 8 (can adjust the cmp -- adjusting the
> > > payload expr is probably not worth it).
> >
> > See the patch I just sent to this list.
> > I adjusted both - it simply
> > didn't appear to me that I could get by with reducing the cmp expression
> > size only. The upside though is that detecting the prefix match based on
> > payload expression length is quick and easy.
> >
> > Someone will have to adjust nft tool, though. ;)
> >
> > > > | [ cmp eq reg 1 0x0000000a ]
> > > > | [ counter pkts 0 bytes 0 ]
> > >
> > > Out of curiosity, does omitting 'counter' help?
> > >
> > > nft counter is rather expensive due to bh disable,
> > > iptables does it once at the evaluation loop only.
> >
> > I changed the test to create the base ruleset using iptables-nft-restore
> > just as before, but create the rules in 'test' chain like so:
> >
> > | nft add rule filter test ip saddr != 10.0.0.0/10 drop
> >
> > The VM code is as expected:
> >
> > | [ payload load 4b @ network header + 12 => reg 1 ]
> > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> > | [ cmp eq reg 1 0x0000000a ]
> > | [ immediate reg 0 drop ]
> >
> > Performance is ~7000pkt/s. So while it's faster than iptables-nft, it's
> > still quite a bit slower than legacy iptables despite the skipped
> > counters.
>
> iptables is optimized for matching on input/output device name and
> IPv4 address + mask (see ip_packet_match()) for historical reasons,
> iptables does not use a match for this since the beginning.

To clarify what I mean here: iptables does not use the generic match
infrastructure for these fields. Instead, it uses ip_packet_match(),
which is called from ipt_do_table(), the core function that evaluates
the packet.

> One possibility (in the short-term) is to add an internal kernel
> expression to achieve the same behaviour. The kernel needs to detect:
>
> payload (nh, offset to ip saddr or ip daddr or ip protocol) + cmp
> payload (nh, offset to ip saddr or ip daddr) + bitwise + cmp
> meta (iifname or oifname) + bitwise + cmp
> meta (iifname or oifname) + cmp
>
> at the very beginning of the rule.
>
> and squash these expressions into the "built-in" iptables match
> expression which emulates ip_packet_match().
>
> Not nice, but if microbenchmarks using thousands of rules really matter
> (this is worst case O(n) linear list evaluation...) then it might make
> sense to explore this.
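
To make the squash idea a bit more concrete, here is a rough userspace
sketch (not kernel code) of what a fused source-address prefix match
could look like: one masked compare plus an inversion flag, instead of
separate payload + bitwise + cmp steps. The names struct prefix_match
and prefix_match_saddr() are made up for illustration. Only the
10.0.0.0/10 prefix and the inverted match come from the ruleset quoted
above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fused match description; an assumption, not an existing kernel type. */
struct prefix_match {
	uint32_t addr;   /* prefix, network byte order, already masked */
	uint32_t mask;   /* netmask, network byte order */
	bool     invert; /* true for "! -s" / "saddr !=" */
};

/*
 * Return true if the rule matches the given source address
 * (network byte order). A single AND plus compare does the prefix
 * test; the inversion flag flips the result.
 */
static bool prefix_match_saddr(const struct prefix_match *m, uint32_t saddr)
{
	bool hit;

	/* The stored prefix must not have bits set outside the mask. */
	assert((m->addr & ~m->mask) == 0);

	hit = (saddr & m->mask) == m->addr;
	return hit != m->invert;
}

/* Build the match for "! -s 10.0.0.0/10" from the test ruleset. */
static struct prefix_match example_rule(void)
{
	struct prefix_match m = {
		.addr   = htonl(0x0a000000), /* 10.0.0.0 */
		.mask   = htonl(0xffc00000), /* 255.192.0.0, i.e. /10 */
		.invert = true,
	};

	return m;
}
```

With example_rule(), a packet from 10.1.2.3 falls inside the prefix, so
the inverted rule does not match, while 10.64.0.1 falls outside and
does match. On a little-endian host the two constants come out as
0x0000000a and 0x0000c0ff, the same values seen in the VM dump.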