From: Stefano Brivio <sbrivio@redhat.com>
To: Florian Westphal <fw@strlen.de>
Cc: <netfilter-devel@vger.kernel.org>
Subject: Re: [PATCH nf-next 0/2] nf_tables: avoid retpoline overhead on set lookups
Date: Sat, 15 May 2021 02:57:08 +0200
Message-ID: <20210515025708.1cacf2ac@elisabeth>
In-Reply-To: <20210513202956.22709-1-fw@strlen.de>
Hi Florian,
On Thu, 13 May 2021 22:29:54 +0200
Florian Westphal <fw@strlen.de> wrote:
> This adds a nft_set_do_lookup() helper, then extends it to use
> direct calls when RETPOLINE feature is enabled.
>
> For non-retpoline builds, the nft_set_do_lookup() inline helper
> does an indirect call. The INDIRECT_CALLABLE_SCOPE macro allows
> keeping the lookup functions static in this case.
Thanks for doing this! And sorry I looked into it more than one year
ago without ever finishing it ;)
I ran some quick tests, as I was curious to see the impact of dropping
indirect calls on that path. With the 'performance' test cases of
nft_concat_range.sh, roughly estimating clock cycles as clock frequency
divided by packet rate, it looks like this entirely offsets the cost of
retpolines on set lookups!
With a 'return true;' in the lookup function (I patched nft_set_pipapo),
on my usual single AMD Epyc 7351 thread, 2.9GHz, average of three runs,
I get:
                                               | packet |  est.  |
                                               |  rate  | cycles |
                                               | (Mpps) |        |
-----------------------------------------------|--------|--------|
Without retpolines, netdev drop                | 15.443 |   188  |
Without retpolines, dummy lookup function      |  9.995 |   292  |
-> Without retpolines, set lookup              |        |   104 -|-.
- - - - - - - - - - - - - - - - - - - - - - - -|- - - - |- - - - | |
With retpolines, netdev drop                   | 10.420 |   278  | |
With retpolines, dummy lookup function         |  7.038 |   412  | |
-> With retpolines, set lookup                 |        |   134  | |
- - - - - - - - - - - - - - - - - - - - - - - -|- - - - |- - - - | |
This series, retpolines, netdev drop           | 10.431 |   278  | |
This series, retpolines, dummy lookup function |  7.549 |   384  | |
-> This series, retpolines, set lookup         |  ^ +7% |   106 -|-'
The estimated clock cycles for the set lookup alone are the difference
between the cycles to hit the dummy lookup function and the cycles to
drop packets from the netdev hook. With this series, they're
approximately the same with and without retpolines.
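As a rough sketch of that estimate, this recomputes the two retpoline
rows from the packet rates above. The function names are made up for
illustration, and since the table reports averages of three runs,
recomputing from the rounded rates may be off by a cycle or two on
other rows:

```python
CLOCK_HZ = 2.9e9  # 2.9GHz, the test machine described above


def est_cycles(mpps, clock_hz=CLOCK_HZ):
    """Estimated cycles per packet: clock frequency divided by packet rate."""
    return clock_hz / (mpps * 1e6)


def lookup_cycles(drop_mpps, dummy_mpps):
    """Cycles attributed to the set lookup path alone (hypothetical helper):
    cycles to reach the dummy lookup function minus cycles to drop packets
    from the netdev hook."""
    return est_cycles(dummy_mpps) - est_cycles(drop_mpps)


# Packet rates (Mpps) taken from the table above
with_retpolines = lookup_cycles(10.420, 7.038)
this_series = lookup_cycles(10.431, 7.549)

print(round(with_retpolines), round(this_series))  # prints "134 106"
```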
For context, I also ran the whole set of tests with actual matching.
These figures are only indicative, as they come from a single run:
---------------.-----------------------------------.--------------------------.
AMD Epyc 7351  |          baselines, Mpps          |       this series        |
1 thread       |___________________________________|__________________________|
2.9GHz         |        |        |        |        |        |        |        |
512KiB L1D$    | netdev |  hash  | rbtree |        |  hash  | rbtree |        |
---------------|  hook  |   no   | single |        |   no   | single |        |
type   entries |  drop  | ranges | field  | pipapo | ranges | field  | pipapo |
---------------|--------|--------|--------|--------|--------|--------|--------|
net,port       |        |        |        |        |  +15%  |   +4%  |   +4%  |
          1000 |  10.1  |   5.2  |   2.7  |   4.6  |   6.0  |   2.8  |   4.8  |
---------------|--------|--------|--------|--------|--------|--------|--------|
port,net       |        |        |        |        |  +11%  |   +5%  |   +4%  |
           100 |  10.4  |   5.4  |   4.1  |   5.0  |   6.0  |   4.3  |   5.2  |
---------------|--------|--------|--------|--------|--------|--------|--------|
net6,port      |        |        |        |        |  +15%  |   +9%  |   +6%  |
          1000 |  10.0  |   4.6  |   1.1  |   3.1  |   9.9  |   1.2  |   3.3  |
---------------|--------|--------|--------|--------|--------|--------|--------|
port,proto     |        |        |        |        |   +7%  |   +3%  |   +3%  |
         10000 |  10.7  |   6.0  |   3.0  |   3.0  |   6.4  |   3.1  |   3.1  |
---------------|--------|--------|--------|--------|--------|--------|--------|
net6,port,mac  |        |        |        |        |   +3%  |   +4%  |   +3%  |
            10 |   9.9  |   3.8  |   2.7  |   3.3  |   3.9  |   2.8  |   3.4  |
---------------|--------|--------|--------|--------|--------|--------|--------|
net6,port,mac, |        |        |        |        |   +3%  |   +9%  |   +4%  |
proto     1000 |  10.0  |   3.6  |   1.1  |   2.4  |   3.7  |   1.2  |   2.5  |
---------------|--------|--------|--------|--------|--------|--------|--------|
net,mac        |        |        |        |        |   +6%  |   +4%  |   +3%  |
          1000 |  10.5  |   4.8  |   2.7  |   4.0  |   5.1  |   2.8  |   4.1  |
---------------'--------'--------'--------'--------'--------'--------'--------'
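For reference, a minimal sketch of how the percentages relate to the
rates, assuming each one is the gain of "this series" over the baseline
column for the same set backend. gain_percent() is a made-up name, and
the rates in the table are rounded to one decimal, so a recomputed value
can differ by a point:

```python
def gain_percent(baseline_mpps, series_mpps):
    """Relative packet rate gain over the baseline, in percent."""
    return (series_mpps / baseline_mpps - 1.0) * 100.0


# (baseline, this series) rates in Mpps for the net,port row, 1000 entries
net_port = {"hash": (5.2, 6.0), "rbtree": (2.7, 2.8), "pipapo": (4.6, 4.8)}

for backend, (base, new) in net_port.items():
    print(f"{backend}: {gain_percent(base, new):+.0f}%")
```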
--
Stefano
Thread overview: 5+ messages
2021-05-13 20:29 [PATCH nf-next 0/2] nf_tables: avoid retpoline overhead on set lookups Florian Westphal
2021-05-13 20:29 ` [PATCH nf-next 1/2] netfilter: add and use nft_set_do_lookup helper Florian Westphal
2021-05-13 20:29 ` [PATCH nf-next 2/2] netfilter: nf_tables: prefer direct calls for set lookups Florian Westphal
2021-05-15 0:57 ` Stefano Brivio [this message]
2021-05-18 16:02 ` [PATCH nf-next 0/2] nf_tables: avoid retpoline overhead on " Pablo Neira Ayuso