Netdev List
 help / color / mirror / Atom feed
  • * Re: [PATCH net-next v3 0/7] bnxt_en: implement device memory TCP for bnxt
           [not found] <20241003160620.1521626-1-ap420073@gmail.com>
           [not found] ` <20241003160620.1521626-8-ap420073@gmail.com>
    @ 2024-10-16 20:17 ` Stanislav Fomichev
      2024-10-17  8:58   ` Taehee Yoo
           [not found] ` <20241003160620.1521626-3-ap420073@gmail.com>
      2 siblings, 1 reply; 17+ messages in thread
    From: Stanislav Fomichev @ 2024-10-16 20:17 UTC (permalink / raw)
      To: Taehee Yoo
      Cc: davem, kuba, pabeni, edumazet, almasrymina, netdev, linux-doc,
    	donald.hunter, corbet, michael.chan, kory.maincent, andrew,
    	maxime.chevallier, danieller, hengqi, ecree.xilinx,
    	przemyslaw.kitszel, hkallweit1, ahmed.zaki, paul.greenwalt,
    	rrameshbabu, idosch, asml.silence, kaiyuanz, willemb,
    	aleksander.lobakin, dw, sridhar.samudrala, bcreeley
    
    On 10/03, Taehee Yoo wrote:
    > This series implements device memory TCP for bnxt_en driver and
    > necessary ethtool command implementations.
    > 
    > NICs that use the bnxt_en driver support tcp-data-split feature named
    > HDS(header-data-split).
    > But there is no implementation for the HDS to enable/disable by ethtool.
    > Only getting the current HDS status is implemented and the HDS is just
    > automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled.
    > The hds_threshold follows the rx-copybreak value but it wasn't
    > changeable.
    > 
    > Currently, bnxt_en driver enables tcp-data-split by default but not
    > always work.
    > There is hds_threshold value, which indicates that a packet size is
    > larger than this value, a packet will be split into header and data.
    > hds_threshold value has been 256, which is a default value of
    > rx-copybreak value too.
    > The rx-copybreak value hasn't been allowed to change so the
    > hds_threshold too.
    > 
    > This patchset decouples hds_threshold and rx-copybreak first.
    > and make tcp-data-split, rx-copybreak, and
    > tcp-data-split-thresh(hds_threshold) configurable independently.
    > 
    > But the default configuration is the same.
    > The default value of rx-copybreak is 256 and default
    > tcp-data-split-thresh is also 256.
    > 
    > There are several related options.
    > TPA(HW-GRO, LRO), JUMBO, jumbo_thresh(firmware command), and Aggregation
    > Ring.
    > 
    > The aggregation ring is fundamental to these all features.
    > When gro/lro/jumbo packets are received, NIC receives the first packet
    > from the normal ring.
    > follow packets come from the aggregation ring.
    > 
    > These features are working regardless of HDS.
    > When TPA is enabled and HDS is disabled, the first packet contains
    > header and payload too.
    > and the following packets contain payload only.
    > If HDS is enabled, the first packet contains the header only, and the
    > following packets contain only payload.
    > So, HW-GRO/LRO is working regardless of HDS.
    > 
    > There is another threshold value, which is jumbo_thresh.
    > This is very similar to hds_thresh, but jumbo thresh doesn't split
    > header and data.
    > It just split the first and following data based on length.
    > When NIC receives 1500 sized packet, and jumbo_thresh is 256(default, but
    > follows rx-copybreak),
    > the first data is 256 and the following packet size is 1500-256.
    > 
    > Before this patch, at least if one of GRO, LRO, and JUMBO flags is
    > enabled, the Aggregation ring will be enabled.
    > If the Aggregation ring is enabled, both hds_threshold and
    > jumbo_thresh are set to the default value of rx-copybreak.
    > 
    > So, GRO, LRO, JUMBO frames, they larger than 256 bytes, they will
    > be split into header and data if the protocol is TCP or UDP.
    > for the other protocol, jumbo_thresh works instead of hds_thresh.
    > 
    > This means that tcp-data-split relies on the GRO, LRO, and JUMBO flags.
    > But by this patch, tcp-data-split no longer relies on these flags.
    > If the tcp-data-split is enabled, the Aggregation ring will be
    > enabled.
    > Also, hds_threshold no longer follows rx-copybreak value, it will
    > be set to the tcp-data-split-thresh value by user-space, but the
    > default value is still 256.
    > 
    > If the protocol is TCP or UDP and the HDS is disabled and Aggregation
    > ring is enabled, a packet will be split into several pieces due to
    > jumbo_thresh.
    > 
    > When XDP is attached, tcp-data-split is automatically disabled.
    > 
    > LRO, GRO, and JUMBO are tested with BCM57414, BCM57504 and the firmware
    > version is 230.0.157.0.
    > I couldn't find any specification about minimum and maximum value
    > of hds_threshold, but from my test result, it was about 0 ~ 1023.
    > It means, over 1023 sized packets will be split into header and data if
    > tcp-data-split is enabled regardless of hds_treshold value.
    > When hds_threshold is 1500 and received packet size is 1400, HDS should
    > not be activated, but it is activated.
    > The maximum value of hds_threshold(tcp-data-split-thresh)
    > value is 256 because it has been working.
    > It was decided very conservatively.
    > 
    > I checked out the tcp-data-split(HDS) works independently of GRO, LRO,
    > JUMBO. Tested GRO/LRO, JUMBO with enabled HDS and disabled HDS.
    > Also, I checked out tcp-data-split should be disabled automatically
    > when XDP is attached and disallowed to enable it again while XDP is
    > attached. I tested ranged values from min to max for
    > tcp-data-split-thresh and rx-copybreak, and it works.
    > tcp-data-split-thresh from 0 to 256, and rx-copybreak 65 to 256.
    > When testing this patchset, I checked skb->data, skb->data_len, and
    > nr_frags values.
    > 
    > The first patch implements .{set, get}_tunable() in the bnxt_en.
    > The bnxt_en driver has been supporting the rx-copybreak feature but is
    > not configurable, Only the default rx-copybreak value has been working.
    > So, it changes the bnxt_en driver to be able to configure
    > the rx-copybreak value.
    > 
    > The second patch adds an implementation of tcp-data-split ethtool
    > command.
    > The HDS relies on the Aggregation ring, which is automatically enabled
    > when either LRO, GRO, or large mtu is configured.
    > So, if the Aggregation ring is enabled, HDS is automatically enabled by
    > it.
    > 
    > The third patch adds tcp-data-split-thresh command in the ethtool.
    > This threshold value indicates if a received packet size is larger
    > than this threshold, the packet's header and payload will be split.
    > Example:
    >    # ethtool -G <interface name> tcp-data-split-thresh <value>
    > This option can not be used when tcp-data-split is disabled or not
    > supported.
    >    # ethtool -G enp14s0f0np0 tcp-data-split on tcp-data-split-thresh 256
    >    # ethtool -g enp14s0f0np0
    >    Ring parameters for enp14s0f0np0:
    >    Pre-set maximums:
    >    ...
    >    Current hardware settings:
    >    ...
    >    TCP data split:         on
    >    TCP data split thresh:  256
    > 
    >    # ethtool -G enp14s0f0np0 tcp-data-split off
    >    # ethtool -g enp14s0f0np0
    >    Ring parameters for enp14s0f0np0:
    >    Pre-set maximums:
    >    ...
    >    Current hardware settings:
    >    ...
    >    TCP data split:         off
    >    TCP data split thresh:  n/a
    > 
    > The fourth patch adds the implementation of tcp-data-split-thresh logic
    > in the bnxt_en driver.
    > The default value is 256, which used to be the default rx-copybreak
    > value.
    > 
    > The fifth and sixth adds condition check for devmem and ethtool.
    > If tcp-data-split is disabled or threshold value is not zero, setup of
    > devmem will be failed.
    > Also, tcp-data-split and tcp-data-split-thresh will not be changed
    > while devmem is running.
    > 
    > The last patch implements device memory TCP for bnxt_en driver.
    > It usually converts generic page_pool api to netmem page_pool api.
    > 
    > No dependencies exist between device memory TCP and GRO/LRO/MTU.
    > Only tcp-data-split and tcp-data-split-thresh should be enabled when the
    > device memory TCP.
    > While devmem TCP is set, tcp-data-split and tcp-data-split-thresh can't
    > be updated because core API disallows change.
    > 
    > I tested the interface up/down while devmem TCP running. It works well.
    > Also, channel count change, and rx/tx ringsize change tests work well too.
    > 
    > The devmem TCP test NIC is BCM57504
    
    [..]
    
    > All necessary configuration validations exist at the core API level.
    > 
    > Note that by this patch, the setup of device memory TCP would fail.
    > Because tcp-data-split-thresh command is not supported by ethtool yet.
    > The tcp-data-split-thresh should be 0 for setup device memory TCP and
    > the default of bnxt is 256.
    > So, for the bnxt, it always fails until ethtool supports
    > tcp-data-split-thresh command.
    > 
    > The ncdevmem.c will be updated after ethtool supports
    > tcp-data-split-thresh option.
    
    FYI, I've tested your series with BCM57504 on top of [1] and [2] with
    a couple of patches to make ncdevmem.c and TX work (see below). [1]
    decouples ncdevmem from ethtool so we can flip header split settings
    without requiring recent ethtool. Both RX and TX work perfectly.
    Feel free to carry:
    
    Tested-by: Stanislav Fomichev <sdf@fomichev.me>
    
    Also feel free to take over the ncdevmem patch if my ncdevmem changes
    get pulled before your series.
    
    1: https://lore.kernel.org/netdev/20241009171252.2328284-1-sdf@fomichev.me/
    2: https://lore.kernel.org/netdev/20240913150913.1280238-1-sdf@fomichev.me/
    
    commit 69bc0e247eb4132ef5fd0b118719427d35d462fc
    Author:     Stanislav Fomichev <sdf@fomichev.me>
    AuthorDate: Tue Oct 15 15:56:43 2024 -0700
    Commit:     Stanislav Fomichev <sdf@fomichev.me>
    CommitDate: Wed Oct 16 13:13:42 2024 -0700
    
        selftests: ncdevmem: Set header split threshold to 0
        
        Needs to happen on BRCM to allow devmem to be attached.
        
        Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
    
    diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
    index 903dac3e61d5..6a94d52a6c43 100644
    --- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
    +++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
    @@ -322,6 +322,8 @@ static int configure_headersplit(bool on)
     	ethtool_rings_set_req_set_header_dev_index(req, ifindex);
     	/* 0 - off, 1 - auto, 2 - on */
     	ethtool_rings_set_req_set_tcp_data_split(req, on ? 2 : 0);
    +	if (enable)
    +		ethtool_rings_set_req_set_tcp_data_split_thresh(req, 0);
     	ret = ethtool_rings_set(ys, req);
     	if (ret < 0)
     		fprintf(stderr, "YNL failed: %s\n", ys->err.msg);
    
    
    commit ef5ba647bc94a19153c2c5cfc64ebe4cb86ac58d
    Author:     Stanislav Fomichev <sdf@fomichev.me>
    AuthorDate: Fri Oct 11 13:52:03 2024 -0700
    Commit:     Stanislav Fomichev <sdf@fomichev.me>
    CommitDate: Wed Oct 16 13:13:42 2024 -0700
    
        bnxt_en: support tx device memory
        
        The only change is to not unmap the frags on completions.
        
        Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
    
    diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
    index 6e422e24750a..cb22707a35aa 100644
    --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
    +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
    @@ -692,7 +692,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
     			goto tx_dma_error;
     
     		tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
    -		dma_unmap_addr_set(tx_buf, mapping, mapping);
    +		if (netmem_is_net_iov(frag->netmem))
    +			dma_unmap_addr_set(tx_buf, mapping, 0);
    +		else
    +			dma_unmap_addr_set(tx_buf, mapping, mapping);
     
     		txbd->tx_bd_haddr = cpu_to_le64(mapping);
     
    @@ -749,9 +752,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
     	for (i = 0; i < last_frag; i++) {
     		prod = NEXT_TX(prod);
     		tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
    -		dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping),
    -			       skb_frag_size(&skb_shinfo(skb)->frags[i]),
    -			       DMA_TO_DEVICE);
    +		if (dma_unmap_addr(tx_buf, mapping))
    +			dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping),
    +				       skb_frag_size(&skb_shinfo(skb)->frags[i]),
    +				       DMA_TO_DEVICE);
     	}
     
     tx_free:
    @@ -821,11 +825,12 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
     		for (j = 0; j < last; j++) {
     			cons = NEXT_TX(cons);
     			tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)];
    -			dma_unmap_page(
    -				&pdev->dev,
    -				dma_unmap_addr(tx_buf, mapping),
    -				skb_frag_size(&skb_shinfo(skb)->frags[j]),
    -				DMA_TO_DEVICE);
    +			if (dma_unmap_addr(tx_buf, mapping))
    +				dma_unmap_page(
    +					&pdev->dev,
    +					dma_unmap_addr(tx_buf, mapping),
    +					skb_frag_size(&skb_shinfo(skb)->frags[j]),
    +					DMA_TO_DEVICE);
     		}
     		if (unlikely(is_ts_pkt)) {
     			if (BNXT_CHIP_P5(bp)) {
    @@ -3296,10 +3301,11 @@ static void bnxt_free_tx_skbs(struct bnxt *bp)
     				skb_frag_t *frag = &skb_shinfo(skb)->frags[k];
     
     				tx_buf = &txr->tx_buf_ring[ring_idx];
    -				dma_unmap_page(
    -					&pdev->dev,
    -					dma_unmap_addr(tx_buf, mapping),
    -					skb_frag_size(frag), DMA_TO_DEVICE);
    +				if (dma_unmap_addr(tx_buf, mapping))
    +					dma_unmap_page(
    +						&pdev->dev,
    +						dma_unmap_addr(tx_buf, mapping),
    +						skb_frag_size(frag), DMA_TO_DEVICE);
     			}
     			dev_kfree_skb(skb);
     		}
    
    ^ permalink raw reply related	[flat|nested] 17+ messages in thread
  • [parent not found: <20241003160620.1521626-3-ap420073@gmail.com>]

  • end of thread, other threads:[~2024-11-01 17:11 UTC | newest]
    
    Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
    -- links below jump to the message on this page --
         [not found] <20241003160620.1521626-1-ap420073@gmail.com>
         [not found] ` <20241003160620.1521626-8-ap420073@gmail.com>
         [not found]   ` <CAHS8izO-7pPk7xyY4JdyaY4hZpd7zerbjhGanRvaTk+OOsvY0A@mail.gmail.com>
         [not found]     ` <CAMArcTU61G=fexf-RJDSW_sGp9dZCkJsJKC=yjg79RS9Ugjuxw@mail.gmail.com>
         [not found]       ` <20241008125023.7fbc1f64@kernel.org>
         [not found]         ` <CAMArcTWVrQ7KWPt+c0u7X=jvBd2VZGVLwjWYCjMYhWZTymMRTg@mail.gmail.com>
         [not found]           ` <20241009170102.1980ed1d@kernel.org>
    2024-10-10 17:44             ` [PATCH net-next v3 7/7] bnxt_en: add support for device memory tcp Mina Almasry
    2024-10-11  1:34               ` Jakub Kicinski
    2024-10-11 17:33                 ` Mina Almasry
    2024-10-11 23:42                   ` Jason Gunthorpe
    2024-10-14 22:38                     ` Mina Almasry
    2024-10-15  0:16                       ` Jakub Kicinski
    2024-10-15  1:10                         ` Mina Almasry
    2024-10-15 12:44                           ` Jason Gunthorpe
    2024-10-18  8:25                             ` Mina Almasry
    2024-10-19 13:55                               ` Taehee Yoo
    2024-10-15 14:29                       ` Pavel Begunkov
    2024-10-15 17:38                         ` David Wei
    2024-10-16 20:17 ` [PATCH net-next v3 0/7] bnxt_en: implement device memory TCP for bnxt Stanislav Fomichev
    2024-10-17  8:58   ` Taehee Yoo
         [not found] ` <20241003160620.1521626-3-ap420073@gmail.com>
         [not found]   ` <20241008111926.7056cc93@kernel.org>
         [not found]     ` <CAMArcTU+r+Pj_y7rUvRwTrDWqg57xy4e-OacjWCfKRCUa8A-aw@mail.gmail.com>
         [not found]       ` <20241009082837.2735cd97@kernel.org>
    2024-10-31 17:34         ` [PATCH net-next v3 2/7] bnxt_en: add support for tcp-data-split ethtool command Taehee Yoo
    2024-10-31 23:56           ` Jakub Kicinski
    2024-11-01 17:11             ` Taehee Yoo
    

    This is a public inbox, see mirroring instructions
    for how to clone and mirror all data and code used for this inbox