* batch netlink messages - performance improvement
@ 2016-02-25 19:43 Yigal Reiss (yreiss)
2016-02-26 10:04 ` Florian Westphal
From: Yigal Reiss (yreiss) @ 2016-02-25 19:43 UTC
To: netfilter-devel@vger.kernel.org
Hi,
I would like to check an idea.
I am using nfqueue for DPI in user space, together with the existing batch verdict from user space. The problem is that this can reduce the number of user <--> kernel context switches by at most a factor of 2, since kernel --> user space still reports every single packet. So even if I issue one batch verdict for every 25 or 50 packets, I have still only reduced the number of switches by a factor of about 2.
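To make the bound concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model in which each delivered packet costs one crossing and each verdict message costs one crossing shared by the packets it covers; the function names are made up for illustration.

```c
#include <assert.h>

/* Crossings per packet when every packet is delivered individually
 * (1 each) and one verdict message covers `batch` packets (1/batch each).
 * Simplified model, not a measurement. */
static double crossings_per_packet(unsigned int batch)
{
    assert(batch > 0);
    return 1.0 + 1.0 / (double)batch;
}

/* Improvement relative to the unbatched case of two crossings per packet. */
static double speedup_vs_unbatched(unsigned int batch)
{
    return 2.0 / crossings_per_packet(batch);
}
```

Under this model a batch of 25 gives a ratio of about 1.92 and a batch of 50 about 1.96, so the gain approaches 2 but never reaches it.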
So I tried batching the unicast netlink messages (carrying the packets) from kernel to user space. I do that by calling sk->sk_data_ready(sk); (in __netlink_sendskb() in af_netlink.c) only every [N] packets. This seems to contribute similar performance improvements as the batch verdict.
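The idea of signalling the reader only every N packets can be modelled in user space as follows. This is only a sketch of the counting logic, not the kernel change itself: `struct batch_notifier`, `batch_notifier_enqueue()` and `count_wakeup()` are hypothetical names, and the `data_ready` callback merely stands in for `sk->sk_data_ready`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct batch_notifier {
    unsigned int batch;             /* N: packets per wakeup */
    unsigned int pending;           /* packets queued since the last wakeup */
    void (*data_ready)(void *ctx);  /* stand-in for sk->sk_data_ready */
    void *ctx;
};

/* Call once per queued packet. Returns 1 if the reader was notified. */
static int batch_notifier_enqueue(struct batch_notifier *bn)
{
    assert(bn != NULL && bn->batch > 0);
    if (++bn->pending < bn->batch)
        return 0;   /* keep accumulating; with no timeout a short tail waits */
    bn->pending = 0;
    if (bn->data_ready)
        bn->data_ready(bn->ctx);
    return 1;
}

/* Example callback: count how many wakeups were issued. */
static void count_wakeup(void *ctx)
{
    ++*(unsigned int *)ctx;
}
```

With a batch of 4, ten enqueued packets produce two wakeups and leave two packets pending, which is exactly the case a timeout would have to cover.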
However, I have no experience in kernel programming, and so far I have only implemented a quick and dirty hack (no timeout, assuming a single socket...) just to demonstrate the improvement. My question is therefore whether such an improvement could be of interest for the mainline kernel, and whether it poses any problems.
If this suggestion makes sense, how would you suggest proceed with this idea? I could continue and start working on a patch, but since, as I wrote, I have no experience in kernel programming, I would like to have a thumbs up for the direction I'm taking, and to hear what makes sense and what does not, so that I don't waste my time or other people's.
By the way, I saw that there is another potential improvement, which is mmapping the packets to user space. I couldn't figure out whether this feature is complete in any kernel version and whether it is ready to use.
Thanks,
Yigal
* Re: batch netlink messages - performance improvement
2016-02-25 19:43 batch netlink messages - performance improvement Yigal Reiss (yreiss)
@ 2016-02-26 10:04 ` Florian Westphal
2016-02-29 14:37 ` Yigal Reiss (yreiss)
From: Florian Westphal @ 2016-02-26 10:04 UTC
To: Yigal Reiss (yreiss); +Cc: netfilter-devel@vger.kernel.org
Yigal Reiss (yreiss) <yreiss@cisco.com> wrote:
> So I tried batching the unicast netlink messages (carrying the packets) from kernel to user space. I do that by calling sk->sk_data_ready(sk); (in __netlink_sendskb() in af_netlink.c) only every [N] packets. This seems to contribute similar performance improvements as the batch verdict.
Uh?
That makes no sense to me. Why and how does that help?
Can you share numbers or an example program that exhibits
this behaviour?
I'd expect that in most cases (in the non-idle case)
sock_def_readable doesn't do anything (skwq_has_sleeper should
be false).
For nfqueue, the best recipe seems to be
recvmmsg + batch number of vectors read + NFQA_CFG_F_GSO.
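As an illustration of the recvmmsg part of that recipe, here is a minimal sketch that drains several datagrams with one system call. It works on any datagram-style socket; the nfqueue netlink setup, the NFQA_CFG_F_GSO flag and the batch verdict are not shown. `drain_batch()`, `MAX_BATCH` and `BUF_SIZE` are illustrative names, and the buffer size is an assumption that would be far too small for real nfqueue traffic, especially with GSO enabled.

```c
#define _GNU_SOURCE          /* needed for recvmmsg() and struct mmsghdr */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_BATCH 8          /* illustrative: messages per recvmmsg() call */
#define BUF_SIZE  256        /* illustrative: real nfqueue buffers must be larger */

/* Read up to MAX_BATCH datagrams from fd in a single recvmmsg() call.
 * Stores each message length in lens[] and returns the number of
 * messages read, or -1 on error (for example EAGAIN when nothing is queued). */
static int drain_batch(int fd, char bufs[MAX_BATCH][BUF_SIZE],
                       unsigned int lens[MAX_BATCH])
{
    struct mmsghdr msgs[MAX_BATCH];
    struct iovec iov[MAX_BATCH];
    int n, i;

    assert(fd >= 0);
    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < MAX_BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUF_SIZE;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take whatever is queued right now and return. */
    n = recvmmsg(fd, msgs, MAX_BATCH, MSG_DONTWAIT, NULL);
    for (i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```

The number of messages returned by one call is then a natural batch size for the verdict: the application inspects all of them and issues a single batch verdict for the group.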
> If this suggestion makes sense, how would you suggest proceed with this idea?
I'd first like to understand what is so expensive in sock_def_readable
that this helps in the first place.