From: Aditya Garg <gargaditya@linux.microsoft.com>
To: Eric Dumazet <edumazet@google.com>
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, andrew+netdev@lunn.ch, davem@davemloft.net,
kuba@kernel.org, pabeni@redhat.com, longli@microsoft.com,
kotaranov@microsoft.com, horms@kernel.org,
shradhagupta@linux.microsoft.com, ernis@linux.microsoft.com,
dipayanroy@linux.microsoft.com, shirazsaleem@microsoft.com,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
gargaditya@microsoft.com, ssengar@linux.microsoft.com
Subject: Re: [PATCH net-next] net: mana: Linearize SKB if TX SGEs exceeds hardware limit
Date: Wed, 22 Oct 2025 22:14:03 +0530
Message-ID: <cd716f3f-cb08-4ff9-8de4-25363180d7a4@linux.microsoft.com>
In-Reply-To: <CANn89iKFsuUnwMb-upqwswrCYaTL-MXVwsQdxFhduZeZRAJZ2A@mail.gmail.com>
On 17-10-2025 23:36, Eric Dumazet wrote:
> On Fri, Oct 17, 2025 at 10:41 AM Aditya Garg
> <gargaditya@linux.microsoft.com> wrote:
>>
>> On 08-10-2025 20:58, Aditya Garg wrote:
>>> On 08-10-2025 20:51, Eric Dumazet wrote:
>>>> On Wed, Oct 8, 2025 at 8:16 AM Aditya Garg
>>>> <gargaditya@linux.microsoft.com> wrote:
>>>>>
>>>>> On 03-10-2025 21:45, Eric Dumazet wrote:
>>>>>> On Fri, Oct 3, 2025 at 8:47 AM Aditya Garg
>>>>>> <gargaditya@linux.microsoft.com> wrote:
>>>>>>>
>>>>>>> The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs)
>>>>>>> per TX WQE. In rare configurations where MAX_SKB_FRAGS + 2 exceeds this
>>>>>>> limit, the driver drops the skb. Add a check in mana_start_xmit() to
>>>>>>> detect such cases and linearize the SKB before transmission.
>>>>>>>
>>>>>>> Return NETDEV_TX_BUSY only for -ENOSPC from mana_gd_post_work_request(),
>>>>>>> send other errors to free_sgl_ptr to free resources and record the tx
>>>>>>> drop.
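The error-handling split described in the commit message can be modelled in userspace as below. This is a hypothetical sketch: model_map_post_error() and enum model_tx_result are stand-ins invented for illustration, not part of the MANA driver; only the rule "retry on -ENOSPC, drop on anything else" comes from the patch description.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes; the real driver returns netdev_tx_t values. */
enum model_tx_result {
	MODEL_TX_OK,   /* WQE posted */
	MODEL_TX_BUSY, /* like NETDEV_TX_BUSY: the stack requeues the skb */
	MODEL_TX_DROP, /* free resources and record a tx drop */
};

/*
 * Map the return code of a modelled post-work-request call:
 * only -ENOSPC (no room in the queue) is retried, everything else drops.
 */
static enum model_tx_result model_map_post_error(int err)
{
	assert(err <= 0); /* kernel convention: 0 or a negative errno */
	if (err == 0)
		return MODEL_TX_OK;
	if (err == -ENOSPC)
		return MODEL_TX_BUSY;
	return MODEL_TX_DROP;
}
```

Returning "busy" for a permanent error would make the stack retry the same skb forever, which is why the patch narrows it to the queue-full case.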
>>>>>>>
>>>>>>> Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
>>>>>>> Reviewed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
>>>>>>> ---
>>>>>>> drivers/net/ethernet/microsoft/mana/mana_en.c | 26 +++++++++++++++----
>>>>>>> include/net/mana/gdma.h | 8 +++++-
>>>>>>> include/net/mana/mana.h | 1 +
>>>>>>> 3 files changed, 29 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
>>>>>>> index f4fc86f20213..22605753ca84 100644
>>>>>>> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
>>>>>>> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
>>>>>>> @@ -20,6 +20,7 @@
>>>>>>>
>>>>>>> #include <net/mana/mana.h>
>>>>>>> #include <net/mana/mana_auxiliary.h>
>>>>>>> +#include <linux/skbuff.h>
>>>>>>>
>>>>>>> static DEFINE_IDA(mana_adev_ida);
>>>>>>>
>>>>>>> @@ -289,6 +290,19 @@ netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>>>>>>> cq = &apc->tx_qp[txq_idx].tx_cq;
>>>>>>> tx_stats = &txq->stats;
>>>>>>>
>>>>>>> + BUILD_BUG_ON(MAX_TX_WQE_SGL_ENTRIES != MANA_MAX_TX_WQE_SGL_ENTRIES);
>>>>>>> + #if (MAX_SKB_FRAGS + 2 > MANA_MAX_TX_WQE_SGL_ENTRIES)
>>>>>>> + if (skb_shinfo(skb)->nr_frags + 2 > MANA_MAX_TX_WQE_SGL_ENTRIES) {
>>>>>>> + netdev_info_once(ndev,
>>>>>>> + "nr_frags %d exceeds max supported sge limit. Attempting skb_linearize\n",
>>>>>>> + skb_shinfo(skb)->nr_frags);
>>>>>>> + if (skb_linearize(skb)) {
>>>>>>
>>>>>> This will fail in many cases.
>>>>>>
>>>>>> This sort of check is better done in ndo_features_check()
>>>>>>
>>>>>> Most probably this would occur for GSO packets, so can ask a software
>>>>>> segmentation to avoid this big and risky kmalloc() by all means.
>>>>>>
>>>>>> Look at idpf_features_check() which has something similar.
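The suggestion above can be sketched as a userspace model of the decision an ndo_features_check() callback would make. Everything prefixed model_ is hypothetical: the real hook receives a struct sk_buff *, the struct net_device * and a netdev_features_t, and would clear NETIF_F_GSO_MASK; the bit layout here is invented for illustration. The limit of 30 SGEs and the "+ 2" are taken from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the patch: MANA accepts at most 30 SGEs per TX WQE. */
#define MODEL_MAX_TX_SGES 30

/* Hypothetical feature bits standing in for netdev_features_t. */
#define MODEL_F_SG       (UINT64_C(1) << 0)
#define MODEL_F_GSO_MASK (UINT64_C(0xff) << 8)

/* Hypothetical minimal view of an skb: only what the check needs. */
struct model_skb {
	int nr_frags; /* as in skb_shinfo(skb)->nr_frags */
	bool is_gso;  /* as reported by skb_is_gso() */
};

/*
 * If a GSO skb would need more SGEs than the hardware supports, clear
 * the GSO bits so the stack segments it in software before xmit.
 * Non-GSO skbs pass through unchanged; the xmit path handles them.
 */
static uint64_t model_features_check(const struct model_skb *skb,
				     uint64_t features)
{
	assert(skb->nr_frags >= 0);
	/* "+ 2" mirrors the MAX_SKB_FRAGS + 2 bound used in the patch */
	if (skb->is_gso && skb->nr_frags + 2 > MODEL_MAX_TX_SGES)
		features &= ~MODEL_F_GSO_MASK;
	return features;
}
```

Doing the check here rather than in the xmit routine lets the stack split the packet into MTU-sized segments, so no large contiguous copy is ever needed for GSO traffic.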
>>>>>
>>>>> Hi Eric,
>>>>> Thank you for your review. I understand your concerns regarding the use
>>>>> of skb_linearize() in the xmit path, as it can fail under memory
>>>>> pressure and introduces additional overhead in the transmit path. Based
>>>>> on your input, I will work on a v2 that moves the SGE limit check to
>>>>> the ndo_features_check() path and, for GSO skbs exceeding the hw limit,
>>>>> clears NETIF_F_GSO_MASK to enforce software segmentation in the kernel
>>>>> before the call to xmit.
>>>>> Also, for non-GSO skbs exceeding the SGE hw limit, should we use
>>>>> skb_linearize() only in that case, or would you suggest another approach?
>>>>
>>>> I think that for non GSO, the linearization attempt is fine.
>>>>
>>>> Note that this is extremely unlikely for non malicious users,
>>>> and MTU being usually small (9K or less),
>>>> the allocation will be much smaller than a GSO packet.
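The size argument can be made concrete with a rough calculation. Linearizing copies the whole skb into one contiguous buffer, so that buffer must hold every byte of the packet. The sketch assumes 4 KiB pages, a 9000-byte MTU plus a 14-byte Ethernet header for the non-GSO case, and the legacy 64 KiB GSO size for the GSO case; it ignores headroom and skb_shared_info, which make the real buffer somewhat larger. model_page_order() is a hypothetical helper that computes the same value as the kernel's get_order().

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86-64. */
#define MODEL_PAGE_SIZE 4096u

/* Smallest order such that (PAGE_SIZE << order) >= bytes. */
static unsigned int model_page_order(size_t bytes)
{
	unsigned int order = 0;

	assert(bytes > 0);
	while (((size_t)MODEL_PAGE_SIZE << order) < bytes)
		order++;
	return order;
}
```

Under these assumptions a 9014-byte frame fits in an order-2 (16 KiB) block, while a 64 KiB GSO packet needs order 4 (64 KiB). If such a buffer comes straight from the page allocator, orders above PAGE_ALLOC_COSTLY_ORDER (3) are treated as costly, so the GSO case is the one most exposed to failure under memory fragmentation.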
>>>
>>> Okay. Will send a v2
>> Hi Eric,
>> I tested the code by disabling GSO in ndo_features_check when the number
>> of SGEs exceeds the hardware limit, using iperf for a single TCP
>> connection with zerocopy enabled. I noticed a significant difference in
>> throughput compared to when we linearize the skbs.
>> For reference, the throughput is 35.6 Gbits/sec when using
>> skb_linearize, but drops to 6.75 Gbits/sec when disabling GSO per skb.
>
> You must be doing something very wrong.
>
> Difference between TSO and non TSO should not be that high.
>
> ethtool -K eth0 tso on
> netperf -H tjbp27
> MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to
> tjbp27.prod.google.com () port 0 AF_INET6
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
> 540000 262144  262144   10.00    92766.69
>
>
> ethtool -K eth0 tso off
> netperf -H tjbp27
> MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to
> tjbp27.prod.google.com () port 0 AF_INET6
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
> 540000 262144  262144   10.00    52218.97
>
> Now if I force linearization, you can definitely see the very high
> cost of the copies !
>
> ethtool -K eth1 sg off
> tjbp26:/home/edumazet# ./netperf -H tjbp27
> MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to
> tjbp27.prod.google.com () port 0 AF_INET6
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
> 540000 262144  262144   10.00    16951.32
>
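Putting the figures quoted in this thread side by side makes the disagreement concrete. The sketch only restates numbers already given above, converted to Gbit/s; slowdown() is a hypothetical helper and no new measurements are involved.

```c
#include <assert.h>

/* Figures quoted in this thread, in Gbit/s. */
static const double aditya_linearize = 35.6;
static const double aditya_no_gso = 6.75;
static const double eric_tso_on = 92.76669;  /* 92766.69 Mbit/s */
static const double eric_tso_off = 52.21897; /* 52218.97 Mbit/s */
static const double eric_sg_off = 16.95132;  /* 16951.32 Mbit/s, forced linearization */

/* How many times slower the second run is than the first. */
static double slowdown(double fast, double slow)
{
	assert(slow > 0.0);
	return fast / slow;
}
```

With these values the GSO-disabled run is about 5.3 times slower than the linearized run in Aditya's test, while in Eric's runs turning TSO off costs about 1.8 times and forcing linearization with "sg off" costs about 5.5 times. The two setups rank linearization in opposite order relative to the non-offloaded path, which is why Eric suspects something is wrong with the test or the patch.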
>>
>> Hence, we propose linearizing skbs until the first failure occurs.
>
> Hmm... basically hiding a bug then ?
>
>> After that, we switch to a fail-safe mode: disable GSO for SKBs whose SGE
>> count exceeds the hw limit via the ndo_features_check() implementation,
>> while continuing to apply skb_linearize() to non-GSO packets that exceed
>> the hardware limit. This keeps us on the optimal performance path
>> initially and moves to the fail-safe path only after a failure.
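The proposed "linearize until the first failure, then fall back" policy can be modelled as a sticky per-device flag. This is a hypothetical sketch: struct model_dev, its linearize_failed field and the model_ functions are invented for illustration and are not MANA driver code; the 30-SGE limit and "+ 2" come from the original patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_TX_SGES 30

/* Hypothetical per-device state for the proposed fallback. */
struct model_dev {
	bool linearize_failed; /* set on first failure, never cleared here */
};

enum model_action {
	MODEL_SEND,       /* fits within the hardware SGE limit */
	MODEL_LINEARIZE,  /* copy into one buffer before posting */
	MODEL_SW_SEGMENT, /* GSO disabled: stack segments in software */
};

static bool model_exceeds_limit(int nr_frags)
{
	return nr_frags + 2 > MODEL_MAX_TX_SGES;
}

/* Decide what happens to an skb under the proposed policy. */
static enum model_action model_choose(const struct model_dev *dev,
				      int nr_frags, bool is_gso)
{
	assert(nr_frags >= 0);
	if (!model_exceeds_limit(nr_frags))
		return MODEL_SEND;
	if (is_gso && dev->linearize_failed)
		return MODEL_SW_SEGMENT;
	return MODEL_LINEARIZE; /* non-GSO skbs are always linearized */
}

/* Record the outcome of a linearize attempt; a failure flips the mode. */
static void model_linearize_result(struct model_dev *dev, bool ok)
{
	if (!ok)
		dev->linearize_failed = true;
}
```

In this model the flag is never cleared, so a single transient allocation failure moves the device to the slower path for the rest of its lifetime.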
>
> Please post your patch (adding the check in ndo_features_check()),
> perhaps one of us is able to help.
Okay Eric, I'll post a v2 as an RFC. Please let me know your thoughts.
Regards,
Aditya
Thread overview: 10+ messages
2025-10-03 15:47 [PATCH net-next] net: mana: Linearize SKB if TX SGEs exceeds hardware limit Aditya Garg
2025-10-03 16:15 ` Eric Dumazet
2025-10-08 15:16 ` Aditya Garg
2025-10-08 15:21 ` Eric Dumazet
2025-10-08 15:28 ` Aditya Garg
2025-10-17 17:41 ` Aditya Garg
2025-10-17 18:06 ` Eric Dumazet
2025-10-22 16:44 ` Aditya Garg [this message]
2025-10-04 9:38 ` Simon Horman
2025-10-08 15:18 ` Aditya Garg