public inbox for netdev@vger.kernel.org
From: Felix Fietkau <nbd@nbd.name>
To: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	David Ahern <dsahern@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] net: add TCP fraglist GRO support
Date: Tue, 23 Apr 2024 13:55:14 +0200
Message-ID: <7476374f-cf0c-45d0-8100-1b2cd2f290d5@nbd.name>
In-Reply-To: <CANn89iJZvoKVB+AK1_44gki2pHyigyMLXFkyevSQpH3iDbnCvw@mail.gmail.com>

On 23.04.24 13:17, Eric Dumazet wrote:
> On Tue, Apr 23, 2024 at 12:25 PM Felix Fietkau <nbd@nbd.name> wrote:
>>
>> On 23.04.24 12:15, Eric Dumazet wrote:
>> > On Tue, Apr 23, 2024 at 11:41 AM Felix Fietkau <nbd@nbd.name> wrote:
>> >>
>> >> When forwarding TCP after GRO, software segmentation is very expensive,
>> >> especially when the checksum needs to be recalculated.
>> >> One case where that's currently unavoidable is when routing packets over
>> >> PPPoE. Performance improves significantly when using fraglist GRO
>> >> implemented in the same way as for UDP.
>> >>
>> >> Here's a measurement of running 2 TCP streams through a MediaTek MT7622
>> >> device (2-core Cortex-A53), which runs NAT with flow offload enabled from
>> >> one ethernet port to PPPoE on another ethernet port + cake qdisc set to
>> >> 1Gbps.
>> >>
>> >> rx-gro-list off: 630 Mbit/s, CPU 35% idle
>> >> rx-gro-list on:  770 Mbit/s, CPU 40% idle
>> >
>> > Hi Felix
>> >
>> > changelog is a bit terse, and patch complex.
>> >
>> > Could you elaborate why this issue
>> > seems to be related to a specific driver ?
>> >
>> > I think we should push hard to not use frag_list in drivers :/
>> >
>> > And GRO itself could avoid building frag_list skbs
>> > in hosts where forwarding is enabled.
>> >
>> > (Note that we also can increase MAX_SKB_FRAGS to 45 these days)
>>
>> The issue is not related to a specific driver at all. Here's how traffic
>> flows: TCP packets are received on the SoC ethernet driver, the network
>> stack performs regular GRO. The packet gets forwarded by flow offloading
>> until it reaches the PPPoE device. PPPoE does not support GSO packets,
>> so the packets need to be segmented again.
>> This is *very* expensive, since data needs to be copied and checksummed.
> 
> gso segmentation does not copy the payload, unless the device has no
> SG capability.
> 
> I guess something should be done about that, regardless of your GRO work,
> since most ethernet devices support SG these days.
> 
> Some drivers use header split for RX, so forwarding to PPPoE
> would require a linearization anyway, if SG is not properly handled.

In the world of consumer-grade WiFi devices, there are a lot of chipsets
with limited or nonexistent SG support and very limited checksum
offload capabilities on the Ethernet side. The WiFi side of these devices
is often even worse. I think fraglist GRO is a decent fallback for the
inevitable corner cases.

>> So in my patch, I changed the code to build fraglist GRO instead of
>> regular GRO packets, whenever there is no local socket to receive the
>> packets. This makes segmenting very cheap, since the original skbs are
>> preserved on the trip through the stack. The only cost is an extra
>> socket lookup whenever NETIF_F_FRAGLIST_GRO is enabled.
> 
> A socket lookup in multi-net-namespace world is not going to work generically,
> but I get the idea now.

Right, I can't think of a proper solution to this at the moment. 
Considering that NETIF_F_FRAGLIST_GRO is opt-in and only meant for 
rather specific configurations anyway, this should not be too much of a 
problem, right?
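
For illustration only, here is a userspace toy model of the cost
difference discussed above. It is not the patch and not kernel code:
struct toy_pkt, struct toy_fraglist and all toy_* functions are
hypothetical names made up for this sketch, and TOY_MTU is an arbitrary
buffer size. It only shows the idea that a fraglist-style aggregate keeps
the original packets chained, so splitting it again is a list walk,
whereas segmenting a merged buffer without SG has to copy every payload
byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MTU 1500

/* Toy stand-in for a packet buffer; NOT the kernel's struct sk_buff. */
struct toy_pkt {
	struct toy_pkt *next;         /* next packet in the chain */
	size_t len;                   /* payload length in bytes */
	unsigned char data[TOY_MTU];  /* payload */
};

/* Fraglist-style aggregate: the original packets stay intact, chained. */
struct toy_fraglist {
	struct toy_pkt *head;
	struct toy_pkt *tail;
	size_t count;
};

/* "GRO" step: append a received packet to the chain without copying. */
static void toy_fraglist_add(struct toy_fraglist *fl, struct toy_pkt *p)
{
	p->next = NULL;
	if (fl->tail)
		fl->tail->next = p;
	else
		fl->head = p;
	fl->tail = p;
	fl->count++;
}

/*
 * "Segmentation" of a fraglist: unlink up to max packets into out[].
 * Only pointers are touched; no payload byte is copied.
 * Returns the number of packets handed back.
 */
static size_t toy_fraglist_segment(struct toy_fraglist *fl,
				   struct toy_pkt **out, size_t max)
{
	size_t n = 0;
	struct toy_pkt *p = fl->head;

	while (p && n < max) {
		struct toy_pkt *next = p->next;

		p->next = NULL;
		out[n++] = p;
		p = next;
	}
	fl->head = p;
	if (!p)
		fl->tail = NULL;
	fl->count -= n;
	return n;
}

/*
 * Copy-based segmentation of one merged buffer (the no-SG case):
 * every segment is a fresh allocation plus a memcpy of its payload.
 * Returns the number of segments; *copied receives the bytes copied.
 * The caller frees the returned packets with free().
 */
static size_t toy_copy_segment(const unsigned char *buf, size_t total,
			       size_t mss, struct toy_pkt **out,
			       size_t max, size_t *copied)
{
	size_t n = 0, off = 0;

	assert(mss > 0 && mss <= TOY_MTU);
	*copied = 0;
	while (off < total && n < max) {
		size_t chunk = total - off < mss ? total - off : mss;
		struct toy_pkt *p = malloc(sizeof(*p));

		if (!p)
			break;
		p->next = NULL;
		p->len = chunk;
		memcpy(p->data, buf + off, chunk);
		out[n++] = p;
		off += chunk;
		*copied += chunk;
	}
	return n;
}
```

In this model, chaining three packets and calling toy_fraglist_segment()
returns the same three pointers that went in, while toy_copy_segment()
on the equivalent merged buffer allocates three new packets and copies
the full payload. The real code paths also deal with headers and
checksums, which this sketch leaves out entirely.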

- Felix

