From: Richard Gobert <richardbgobert@gmail.com>
To: Paolo Abeni <pabeni@redhat.com>,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
dsahern@kernel.org, willemdebruijn.kernel@gmail.com,
shuah@kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH net-next v7 2/3] net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment
Date: Thu, 18 Apr 2024 17:05:42 +0200
Message-ID: <f4b10cc8-fdd9-4dd6-92c7-60acf66702e0@gmail.com>
In-Reply-To: <469c26d112600bce3a7fe77131c62eae4ecae5d1.camel@redhat.com>
Paolo Abeni wrote:
> On Tue, 2024-04-16 at 11:21 +0200, Paolo Abeni wrote:
>> On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote:
>>> The {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
>>> iph->id, ...) against all packets in a loop. These flush checks are
>>> currently used in all TCP flows and in some UDP flows in GRO.
>>>
>>> These checks need to be done only once and only against the found p skb,
>>> since they only affect flush and not same_flow.
>>>
>>> This patch leverages the previous commit in the series, in which correct
>>> network header offsets are saved for both outer and inner network headers,
>>> allowing these checks to be done only once, in tcp_gro_receive and
>>> udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at
>>> all. In addition, the flush_id checks are more declarative and contained
>>> in inet_gro_flush, removing the need for flush_id in napi_gro_cb.
>>>
>>> This results in less parsing code for UDP flows and in flush checks that
>>> no longer run inside a loop for TCP flows.
>>>
>>> To make sure the results are not within the noise range, I made netfilter
>>> drop all TCP packets and measured CPU performance in GRO (in this case GRO
>>> is responsible for about 50% of the CPU utilization).
>>>
>>> L3 flush/flush_id checks are not relevant to UDP connections where
>>> skb_gro_receive_list is called. The only code change relevant to this flow
>>> is inet_gro_receive. The rest of the code parsing this flow stays the
>>> same.
>>>
>>> All concurrent connections tested are with the same ip srcaddr and
>>> dstaddr.
>>>
>>> perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow):
>>> net-next:
>>> 3.03% [kernel] [k] inet_gro_receive
>>>
>>> patch applied:
>>> 2.78% [kernel] [k] inet_gro_receive
>>
>> Why are there no figures for
>> udp_gro_receive_segment()/gro_network_flush() here?
>>
>> Also, you should be able to observe a very high amount of CPU usage by
>> GRO even with TCP on very high speed links, keeping the BH/GRO on one
>> CPU and the user-space/data copy on a different one (or using rx zero
>> copy).
>
> To be more explicit: I think at least the above figures are required,
> and I still fear the real gain in that case would range from zero to
> negative.
>
I wrote about this briefly in the commit message; sorry if I wasn't clear
enough.
gro_network_flush is inlined into both udp_gro_receive_segment and
tcp_gro_receive, and udp_gro_receive_segment is itself inlined into
udp_gro_receive.
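
To illustrate what I mean (a simplified sketch, not the patch code; the
helper name and the segs_in_p parameter are only for illustration), the L3
checks that {inet,ipv6}_gro_receive used to run against every packet in
the bucket loop reduce to a single comparison between the new skb's IPv4
header and that of the matched p:

#include <net/ip.h>     /* struct iphdr, IP_DF, htons/ntohs */

/* Simplified sketch: run the L3 flush checks once, against the matched
 * p skb only, instead of inside the per-bucket loop.
 */
static inline int inet_gro_flush_sketch(const struct iphdr *iph,  /* new skb */
                                        const struct iphdr *iph2, /* matched p */
                                        u16 segs_in_p)
{
        int flush;

        /* ttl, tos and the DF bit must match for the segments to merge. */
        flush = (iph->ttl ^ iph2->ttl) |
                (iph->tos ^ iph2->tos) |
                ((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));

        /* For non-atomic datagrams (DF clear) the IP ID is expected to
         * advance by the number of segments already merged into p.
         */
        if (!(iph->frag_off & htons(IP_DF)))
                flush |= (u16)(ntohs(iph->id) - ntohs(iph2->id)) ^ segs_in_p;

        return flush;
}
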
The UDP numbers I posted are no longer relevant after Willem and
Alexander's thread, from which we concluded that flush and flush_id should
be calculated for all UDP flows.
I can post new numbers for the UDP forwarding path once the correct change
is implemented. As for TCP, the numbers I posted stay the same.
Note that there is an increase in CPU utilization in tcp_gro_receive
because gro_network_flush is inlined into it. The numbers add up and show
a performance improvement in the case I presented once both
inet_gro_receive and tcp_gro_receive are accounted for.
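
To make the perf attribution concrete (again only a rough sketch with my
own naming; the real helper also handles IPv6 and the outer/inner header
split): the matched p's header is fetched at the network offset saved by
the previous commit and compared on the spot from TCP GRO context, so
those cycles are charged to tcp_gro_receive rather than to
inet_gro_receive's loop.

#include <linux/skbuff.h>
#include <net/ip.h>

/* Rough sketch: single-shot IPv4 check running in TCP GRO context,
 * using the network header offset ("net_off") stored in napi_gro_cb by
 * the previous commit.
 */
static inline int l3_flush_once_sketch(const struct sk_buff *p,
                                       const struct sk_buff *skb,
                                       unsigned int net_off)
{
        const struct iphdr *iph  = (const struct iphdr *)(skb->data + net_off);
        const struct iphdr *iph2 = (const struct iphdr *)(p->data + net_off);

        /* Only the invariant IPv4 fields are compared, once per match. */
        return (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) |
               ((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));
}
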
> If you can't do the TCP part of the testing, please provide at least
> the figures for a single UDP flow; that should give a better indication
> of the result we can expect with TCP.
>
> Note that GRO is used mainly by TCP, and TCP packets with different
> src/dst ports will land in different GRO hash buckets, since they have
> different RX hashes.
>
> That will happen even for UDP, since at least some (most?) NICs include
> the UDP ports in the RX hash.
>
> Thanks,
>
> Paolo
>