public inbox for netdev@vger.kernel.org
From: Paolo Abeni <pabeni@redhat.com>
To: Joe Damato <joe@dama.to>,
	netdev@vger.kernel.org, Michael Chan <michael.chan@broadcom.com>,
	Pavan Chebbi <pavan.chebbi@broadcom.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	horms@kernel.org, linux-kernel@vger.kernel.org, leon@kernel.org
Subject: Re: [net-next v5 09/12] net: bnxt: Add SW GSO completion and teardown support
Date: Thu, 26 Mar 2026 18:20:11 +0100	[thread overview]
Message-ID: <b33bb10d-630a-4591-a91e-136ade1e970d@redhat.com> (raw)
In-Reply-To: <acVmr9gxjm6mJ0IJ@devvm20253.cco0.facebook.com>

On 3/26/26 6:02 PM, Joe Damato wrote:
> On Thu, Mar 26, 2026 at 01:39:17PM +0100, Paolo Abeni wrote:
>> On 3/23/26 7:38 PM, Joe Damato wrote:
>>> Update __bnxt_tx_int and bnxt_free_one_tx_ring_skbs to handle SW GSO
>>> segments:
>>>
>>> - MID segments: adjust tx_pkts/tx_bytes accounting and skip skb free
>>>   (the skb is shared across all segments and freed only once)
>>>
>>> - LAST segments: if the DMA IOVA path was used, use dma_iova_destroy to
>>>   tear down the contiguous mapping. On the fallback path, payload DMA
>>>   unmapping is handled by the existing per-BD dma_unmap_len walk.
>>>
>>> Both MID and LAST completions advance tx_inline_cons to release the
>>> segment's inline header slot back to the ring.
>>>
>>> is_sw_gso is initialized to zero, so the new code paths are not run.
>>>
>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
>>> Signed-off-by: Joe Damato <joe@dama.to>
>>> ---
>>>  v5:
>>>    - Added Pavan's Reviewed-by. No functional changes.
>>>
>>>  v3:
>>>    - completion paths updated to use DMA IOVA APIs to tear down mappings.
>>>
>>>  rfcv2:
>>>    - Update the shared header buffer consumer on TX completion.
>>>
>>>  drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 82 +++++++++++++++++--
>>>  .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 19 ++++-
>>>  2 files changed, 91 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> index 2759a4e2b148..40a16f96feba 100644
>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> @@ -74,6 +74,8 @@
>>>  #include "bnxt_debugfs.h"
>>>  #include "bnxt_coredump.h"
>>>  #include "bnxt_hwmon.h"
>>> +#include "bnxt_gso.h"
>>> +#include <net/tso.h>
>>>  
>>>  #define BNXT_TX_TIMEOUT		(5 * HZ)
>>>  #define BNXT_DEF_MSG_ENABLE	(NETIF_MSG_DRV | NETIF_MSG_HW | \
>>> @@ -817,12 +819,13 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
>>>  	bool rc = false;
>>>  
>>>  	while (RING_TX(bp, cons) != hw_cons) {
>>> -		struct bnxt_sw_tx_bd *tx_buf;
>>> +		struct bnxt_sw_tx_bd *tx_buf, *head_buf;
>>>  		struct sk_buff *skb;
>>>  		bool is_ts_pkt;
>>>  		int j, last;
>>>  
>>>  		tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)];
>>> +		head_buf = tx_buf;
>>>  		skb = tx_buf->skb;
>>>  
>>>  		if (unlikely(!skb)) {
>>> @@ -869,6 +872,23 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
>>>  							    DMA_TO_DEVICE, 0);
>>>  			}
>>>  		}
>>> +
>>> +		if (unlikely(head_buf->is_sw_gso)) {
>>> +			txr->tx_inline_cons++;
>>> +			if (head_buf->is_sw_gso == BNXT_SW_GSO_LAST) {
>>> +				if (dma_use_iova(&head_buf->iova_state))
>>
>> I'm likely lost, but AFAICS the previous patch/bnxt_sw_udp_gso_xmit()
>> initializes head_buf->iova_state only when
>> `dma_use_iova(&head_buf->iova_state) == true`. I.e. in the fallback
>> scenario the previous iova_state is retained.
> 
> Note that calling dma_iova_try_alloc zeroes the state before returning whether
> the IOVA DMA API can be used or not, and I call it unconditionally (see
> below).
> 
>> Additionally AFAICS dma_iova_destroy does not clear `head_buf->iova_state`.
> 
> That's my understanding, too, that dma_iova_destroy doesn't clear the state.
>  
>> It looks like, if 2 consecutive skbs hitting the same slot use
>> different DMA mapping strategies (fallback vs iova), bad things will
>> happen?!? Should the previous patch always initialize
>> head_buf->iova_state?
> 
> AFAICT, switching the IOMMU domain would require unbinding the device,
> changing the IOMMU type, and re-binding the device... which would destroy all
> the rings in the process, and thus this wouldn't happen.
> 
> The only way I could potentially imagine this happening would be under extreme
> IOVA pressure (maybe?):
>   - packet A in slot N, dma_iova_try_alloc succeeds -> head_buf->iova_state
>     copied
>   - completion of the packet occurs, dma_iova_destroy is called,
>     head_buf->iova_state is not cleared
>   - packet B in slot N, dma_iova_try_alloc fails due to IOVA pressure...
>     head_buf->iova_state is stale
> 
> I'm pretty skeptical that this is a realistic case, TBH.
> 
> That said, and since it seems my v5 got CR, I can send a v6 with this slight
> change to address the case you've mentioned above.
> 
> I'll send in a couple hours unless I hear otherwise:
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
> index 9c30ee063ef5..7c198847a771 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
> @@ -142,8 +142,12 @@ netdev_tx_t bnxt_sw_udp_gso_xmit(struct bnxt *bp,
> 
>                 tx_buf->is_sw_gso = last ? BNXT_SW_GSO_LAST : BNXT_SW_GSO_MID;
> 
> -               /* Store IOVA state on the last segment for completion */
> -               if (last && tso_dma_map_use_iova(&map)) {
> +               /* Store IOVA state on the last segment for completion.
> +                * Always copy so that a stale iova_state from a prior
> +                * occupant of this ring slot cannot be misread by
> +                * dma_use_iova() in the completion path.
> +                */
> +               if (last) {
>                         tx_buf->iova_state = map.iova_state;
>                         tx_buf->iova_total_len = map.total_len;
>                 }
> 

Since tso_dma_map_use_iova(&map) is the likely option, I tend to think
that the above change is worthwhile even if the problem I feared is
extremely unlikely, if possible at all: the code is IMHO easier to
follow, and FWIW it does not over-optimize an unlikely scenario.

/P



Thread overview: 16+ messages
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
2026-03-23 18:38 ` [net-next v5 01/12] net: tso: Introduce tso_dma_map Joe Damato
2026-03-23 18:38 ` [net-next v5 02/12] net: tso: Add tso_dma_map helpers Joe Damato
2026-03-23 18:38 ` [net-next v5 03/12] net: bnxt: Export bnxt_xmit_get_cfa_action Joe Damato
2026-03-23 18:38 ` [net-next v5 04/12] net: bnxt: Add a helper for tx_bd_ext Joe Damato
2026-03-23 18:38 ` [net-next v5 05/12] net: bnxt: Use dma_unmap_len for TX completion unmapping Joe Damato
2026-03-23 18:38 ` [net-next v5 06/12] net: bnxt: Add TX inline buffer infrastructure Joe Damato
2026-03-23 18:38 ` [net-next v5 07/12] net: bnxt: Add boilerplate GSO code Joe Damato
2026-03-23 18:38 ` [net-next v5 08/12] net: bnxt: Implement software USO Joe Damato
2026-03-23 18:38 ` [net-next v5 09/12] net: bnxt: Add SW GSO completion and teardown support Joe Damato
2026-03-26 12:39   ` Paolo Abeni
2026-03-26 17:02     ` Joe Damato
2026-03-26 17:20       ` Paolo Abeni [this message]
2026-03-23 18:38 ` [net-next v5 10/12] net: bnxt: Dispatch to SW USO Joe Damato
2026-03-23 18:38 ` [net-next v5 11/12] net: netdevsim: Add support for " Joe Damato
2026-03-23 18:38 ` [net-next v5 12/12] selftests: drv-net: Add USO test Joe Damato
