netdev.vger.kernel.org archive mirror
From: Stanislav Fomichev <stfomichev@gmail.com>
To: Taehee Yoo <ap420073@gmail.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	edumazet@google.com, almasrymina@google.com,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	donald.hunter@gmail.com, corbet@lwn.net,
	michael.chan@broadcom.com, kory.maincent@bootlin.com,
	andrew@lunn.ch, maxime.chevallier@bootlin.com,
	danieller@nvidia.com, hengqi@linux.alibaba.com,
	ecree.xilinx@gmail.com, przemyslaw.kitszel@intel.com,
	hkallweit1@gmail.com, ahmed.zaki@intel.com,
	paul.greenwalt@intel.com, rrameshbabu@nvidia.com,
	idosch@nvidia.com, asml.silence@gmail.com, kaiyuanz@google.com,
	willemb@google.com, aleksander.lobakin@intel.com, dw@davidwei.uk,
	sridhar.samudrala@intel.com, bcreeley@amd.com
Subject: Re: [PATCH net-next v3 0/7] bnxt_en: implement device memory TCP for bnxt
Date: Wed, 16 Oct 2024 13:17:28 -0700
Message-ID: <ZxAfWHk3aRWl-F31@mini-arch>
In-Reply-To: <20241003160620.1521626-1-ap420073@gmail.com>

On 10/03, Taehee Yoo wrote:
> This series implements device memory TCP for bnxt_en driver and
> necessary ethtool command implementations.
> 
> NICs that use the bnxt_en driver support the tcp-data-split feature,
> named HDS (header-data-split).
> But there is no way to enable or disable HDS through ethtool.
> Only reading the current HDS status is implemented, and HDS is
> enabled automatically only when LRO, HW-GRO, or JUMBO is enabled.
> The hds_threshold follows the rx-copybreak value, which has not been
> changeable.
> 
> Currently, the bnxt_en driver enables tcp-data-split by default, but it
> does not always work.
> The hds_threshold value means that a packet larger than this value
> is split into header and data.
> hds_threshold has been 256, which is also the default
> rx-copybreak value.
> The rx-copybreak value has not been changeable, so neither has
> hds_threshold.
> 
> This patchset first decouples hds_threshold from rx-copybreak
> and makes tcp-data-split, rx-copybreak, and
> tcp-data-split-thresh (hds_threshold) configurable independently.
> 
> But the default configuration is the same.
> The default value of rx-copybreak is 256 and default
> tcp-data-split-thresh is also 256.
> 
> There are several related options:
> TPA (HW-GRO, LRO), JUMBO, jumbo_thresh (firmware command), and the
> Aggregation ring.
> 
> The aggregation ring is fundamental to all of these features.
> When GRO/LRO/jumbo packets are received, the NIC receives the first
> packet from the normal ring.
> The following packets come from the aggregation ring.
> 
> These features work regardless of HDS.
> When TPA is enabled and HDS is disabled, the first packet contains
> both header and payload,
> and the following packets contain payload only.
> If HDS is enabled, the first packet contains the header only, and the
> following packets contain only payload.
> So, HW-GRO/LRO is working regardless of HDS.
> 
> There is another threshold value, jumbo_thresh.
> It is very similar to hds_thresh, but jumbo_thresh doesn't split
> header and data.
> It just splits the first and following data based on length.
> When the NIC receives a 1500-byte packet and jumbo_thresh is 256
> (the default, which follows rx-copybreak),
> the first data is 256 bytes and the following data is 1500-256 bytes.
> 
> Before this patch, if at least one of the GRO, LRO, and JUMBO flags is
> enabled, the Aggregation ring is enabled.
> If the Aggregation ring is enabled, both hds_threshold and
> jumbo_thresh are set to the default value of rx-copybreak.
> 
> So GRO, LRO, and JUMBO frames larger than 256 bytes are
> split into header and data if the protocol is TCP or UDP.
> For other protocols, jumbo_thresh applies instead of hds_thresh.
> 
> This means that tcp-data-split relies on the GRO, LRO, and JUMBO flags.
> With this patch, tcp-data-split no longer relies on these flags.
> If tcp-data-split is enabled, the Aggregation ring will be
> enabled.
> Also, hds_threshold no longer follows the rx-copybreak value; it is
> set to the tcp-data-split-thresh value from user space, but the
> default value is still 256.
> 
> If the protocol is TCP or UDP and the HDS is disabled and Aggregation
> ring is enabled, a packet will be split into several pieces due to
> jumbo_thresh.
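
A toy model of the split rules described above may help when reading the
rest of the series. It is only a sketch of the behaviour as the cover letter
states it, not driver or firmware code: struct rx_split_cfg, struct rx_split,
rx_split_model() and the 54-byte header length used in the note below are
all made-up names and values for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring configuration; field names are illustrative only. */
struct rx_split_cfg {
	bool agg_ring;        /* aggregation ring enabled */
	bool hds;             /* tcp-data-split enabled */
	size_t hds_thresh;    /* tcp-data-split-thresh */
	size_t jumbo_thresh;  /* length-based split point */
};

struct rx_split {
	size_t first;  /* bytes delivered via the normal ring */
	size_t rest;   /* bytes delivered via the aggregation ring */
};

/*
 * Model of the rules in the cover letter:
 *  - without the aggregation ring, nothing is split;
 *  - with HDS on, TCP/UDP packets larger than hds_thresh are split at
 *    the header boundary;
 *  - otherwise packets larger than jumbo_thresh are split by length.
 */
static struct rx_split rx_split_model(const struct rx_split_cfg *cfg,
				      size_t len, size_t hdr_len,
				      bool tcp_or_udp)
{
	struct rx_split s = { .first = len, .rest = 0 };

	assert(hdr_len <= len);

	if (!cfg->agg_ring)
		return s;

	if (cfg->hds && tcp_or_udp && len > cfg->hds_thresh) {
		s.first = hdr_len;
		s.rest = len - hdr_len;
	} else if (len > cfg->jumbo_thresh) {
		s.first = cfg->jumbo_thresh;
		s.rest = len - cfg->jumbo_thresh;
	}
	return s;
}
```

With HDS off and jumbo_thresh 256, a 1500-byte TCP packet comes out as
256 + 1244 bytes, matching the 1500-256 example above. With HDS on and an
assumed 54-byte header, the same packet comes out as 54 + 1446 bytes.
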
> 
> When XDP is attached, tcp-data-split is automatically disabled.
> 
> LRO, GRO, and JUMBO are tested with BCM57414 and BCM57504; the firmware
> version is 230.0.157.0.
> I couldn't find any specification of the minimum and maximum value
> of hds_threshold, but from my test results the range was about 0 ~ 1023.
> That is, packets larger than 1023 bytes are split into header and data
> if tcp-data-split is enabled, regardless of the hds_threshold value.
> When hds_threshold is 1500 and the received packet size is 1400, HDS
> should not be activated, but it is activated.
> The maximum value of hds_threshold (tcp-data-split-thresh)
> is set to 256 because that value has been working.
> It was decided very conservatively.
> 
> I checked that tcp-data-split (HDS) works independently of GRO, LRO,
> and JUMBO. I tested GRO/LRO and JUMBO with HDS enabled and disabled.
> Also, I checked that tcp-data-split is disabled automatically
> when XDP is attached and cannot be enabled again while XDP is
> attached. I tested the range from min to max for
> tcp-data-split-thresh and rx-copybreak, and it works:
> tcp-data-split-thresh from 0 to 256, and rx-copybreak from 65 to 256.
> When testing this patchset, I checked skb->data, skb->data_len, and
> nr_frags values.
> 
> The first patch implements .{set, get}_tunable() in bnxt_en.
> The bnxt_en driver has supported the rx-copybreak feature, but it was
> not configurable; only the default rx-copybreak value worked.
> So, it changes the bnxt_en driver to be able to configure
> the rx-copybreak value.
> 
> The second patch adds an implementation of tcp-data-split ethtool
> command.
> The HDS relies on the Aggregation ring, which is automatically enabled
> when either LRO, GRO, or large mtu is configured.
> So, if the Aggregation ring is enabled, HDS is automatically enabled by
> it.
> 
> The third patch adds the tcp-data-split-thresh command to ethtool.
> This threshold means that if a received packet is larger
> than the threshold, the packet's header and payload are split.
> Example:
>    # ethtool -G <interface name> tcp-data-split-thresh <value>
> This option cannot be used when tcp-data-split is disabled or not
> supported.
>    # ethtool -G enp14s0f0np0 tcp-data-split on tcp-data-split-thresh 256
>    # ethtool -g enp14s0f0np0
>    Ring parameters for enp14s0f0np0:
>    Pre-set maximums:
>    ...
>    Current hardware settings:
>    ...
>    TCP data split:         on
>    TCP data split thresh:  256
> 
>    # ethtool -G enp14s0f0np0 tcp-data-split off
>    # ethtool -g enp14s0f0np0
>    Ring parameters for enp14s0f0np0:
>    Pre-set maximums:
>    ...
>    Current hardware settings:
>    ...
>    TCP data split:         off
>    TCP data split thresh:  n/a
> 
> The fourth patch adds the implementation of tcp-data-split-thresh logic
> in the bnxt_en driver.
> The default value is 256, which used to be the default rx-copybreak
> value.
> 
> The fifth and sixth patches add condition checks for devmem and ethtool.
> If tcp-data-split is disabled or the threshold value is not zero, setup
> of devmem will fail.
> Also, tcp-data-split and tcp-data-split-thresh cannot be changed
> while devmem is running.
> 
> The last patch implements device memory TCP for the bnxt_en driver.
> It mostly converts the generic page_pool API to the netmem page_pool API.
> 
> No dependencies exist between device memory TCP and GRO/LRO/MTU.
> Only tcp-data-split and tcp-data-split-thresh need to be configured for
> device memory TCP.
> While devmem TCP is set, tcp-data-split and tcp-data-split-thresh can't
> be updated because the core API disallows the change.
> 
> I tested interface up/down while devmem TCP was running. It works well.
> Channel count changes and rx/tx ring size changes work well too.
> 
> The devmem TCP test NIC is BCM57504.

[..]

> All necessary configuration validations exist at the core API level.
> 
> Note that with this patch alone, the setup of device memory TCP would
> fail, because the tcp-data-split-thresh command is not supported by
> ethtool yet.
> The tcp-data-split-thresh must be 0 to set up device memory TCP, and
> the default of bnxt is 256.
> So, for bnxt, it always fails until ethtool supports the
> tcp-data-split-thresh command.
> 
> The ncdevmem.c will be updated after ethtool supports the
> tcp-data-split-thresh option.
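
For readers following along, the precondition described in the cover letter
(tcp-data-split on, tcp-data-split-thresh equal to 0) can be sketched as a
small check. This is an illustration only and not the core code from the
series: struct ring_params_model, devmem_ring_params_ok() and the -EINVAL
return value are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the two ring parameters that matter for devmem. */
struct ring_params_model {
	bool tcp_data_split;             /* header/data split enabled */
	uint32_t tcp_data_split_thresh;  /* split threshold in bytes */
};

/*
 * Returns 0 when devmem could be attached under the rules quoted above,
 * or a negative errno otherwise. The choice of -EINVAL is an assumption.
 */
static int devmem_ring_params_ok(const struct ring_params_model *rp)
{
	assert(rp != NULL);

	if (!rp->tcp_data_split)
		return -EINVAL;  /* payload would land in host pages */
	if (rp->tcp_data_split_thresh != 0)
		return -EINVAL;  /* small packets would not be split */
	return 0;
}
```

With the bnxt default threshold of 256 this check fails even when
tcp-data-split is on, which is why the threshold has to be set to 0 first.
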

FYI, I've tested your series with BCM57504 on top of [1] and [2] with
a couple of patches to make ncdevmem.c and TX work (see below). [1]
decouples ncdevmem from ethtool so we can flip header split settings
without requiring recent ethtool. Both RX and TX work perfectly.
Feel free to carry:

Tested-by: Stanislav Fomichev <sdf@fomichev.me>

Also feel free to take over the ncdevmem patch if my ncdevmem changes
get pulled before your series.

1: https://lore.kernel.org/netdev/20241009171252.2328284-1-sdf@fomichev.me/
2: https://lore.kernel.org/netdev/20240913150913.1280238-1-sdf@fomichev.me/

commit 69bc0e247eb4132ef5fd0b118719427d35d462fc
Author:     Stanislav Fomichev <sdf@fomichev.me>
AuthorDate: Tue Oct 15 15:56:43 2024 -0700
Commit:     Stanislav Fomichev <sdf@fomichev.me>
CommitDate: Wed Oct 16 13:13:42 2024 -0700

    selftests: ncdevmem: Set header split threshold to 0
    
    Needs to happen on BRCM to allow devmem to be attached.
    
    Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>

diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 903dac3e61d5..6a94d52a6c43 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -322,6 +322,8 @@ static int configure_headersplit(bool on)
 	ethtool_rings_set_req_set_header_dev_index(req, ifindex);
 	/* 0 - off, 1 - auto, 2 - on */
 	ethtool_rings_set_req_set_tcp_data_split(req, on ? 2 : 0);
+	if (on)
+		ethtool_rings_set_req_set_tcp_data_split_thresh(req, 0);
 	ret = ethtool_rings_set(ys, req);
 	if (ret < 0)
 		fprintf(stderr, "YNL failed: %s\n", ys->err.msg);


commit ef5ba647bc94a19153c2c5cfc64ebe4cb86ac58d
Author:     Stanislav Fomichev <sdf@fomichev.me>
AuthorDate: Fri Oct 11 13:52:03 2024 -0700
Commit:     Stanislav Fomichev <sdf@fomichev.me>
CommitDate: Wed Oct 16 13:13:42 2024 -0700

    bnxt_en: support tx device memory
    
    The only change is to not unmap the frags on completions.
    
    Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6e422e24750a..cb22707a35aa 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -692,7 +692,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			goto tx_dma_error;
 
 		tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
-		dma_unmap_addr_set(tx_buf, mapping, mapping);
+		if (netmem_is_net_iov(frag->netmem))
+			dma_unmap_addr_set(tx_buf, mapping, 0);
+		else
+			dma_unmap_addr_set(tx_buf, mapping, mapping);
 
 		txbd->tx_bd_haddr = cpu_to_le64(mapping);
 
@@ -749,9 +752,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	for (i = 0; i < last_frag; i++) {
 		prod = NEXT_TX(prod);
 		tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
-		dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping),
-			       skb_frag_size(&skb_shinfo(skb)->frags[i]),
-			       DMA_TO_DEVICE);
+		if (dma_unmap_addr(tx_buf, mapping))
+			dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping),
+				       skb_frag_size(&skb_shinfo(skb)->frags[i]),
+				       DMA_TO_DEVICE);
 	}
 
 tx_free:
@@ -821,11 +825,12 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
 		for (j = 0; j < last; j++) {
 			cons = NEXT_TX(cons);
 			tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)];
-			dma_unmap_page(
-				&pdev->dev,
-				dma_unmap_addr(tx_buf, mapping),
-				skb_frag_size(&skb_shinfo(skb)->frags[j]),
-				DMA_TO_DEVICE);
+			if (dma_unmap_addr(tx_buf, mapping))
+				dma_unmap_page(
+					&pdev->dev,
+					dma_unmap_addr(tx_buf, mapping),
+					skb_frag_size(&skb_shinfo(skb)->frags[j]),
+					DMA_TO_DEVICE);
 		}
 		if (unlikely(is_ts_pkt)) {
 			if (BNXT_CHIP_P5(bp)) {
@@ -3296,10 +3301,11 @@ static void bnxt_free_tx_skbs(struct bnxt *bp)
 				skb_frag_t *frag = &skb_shinfo(skb)->frags[k];
 
 				tx_buf = &txr->tx_buf_ring[ring_idx];
-				dma_unmap_page(
-					&pdev->dev,
-					dma_unmap_addr(tx_buf, mapping),
-					skb_frag_size(frag), DMA_TO_DEVICE);
+				if (dma_unmap_addr(tx_buf, mapping))
+					dma_unmap_page(
+						&pdev->dev,
+						dma_unmap_addr(tx_buf, mapping),
+						skb_frag_size(frag), DMA_TO_DEVICE);
 			}
 			dev_kfree_skb(skb);
 		}
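
The idea in the TX patch above, reduced to a standalone sketch: store 0 as
the unmap address for device-memory frags and skip the unmap whenever the
stored address is 0. Everything here (struct tx_buf_model,
tx_record_frag(), tx_complete_frags(), the unmap counter) is a mock for
illustration and not the bnxt or DMA API. It also assumes that 0 is never
a valid DMA address for a regular page frag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of a per-descriptor TX buffer slot. */
struct tx_buf_model {
	uint64_t mapping;  /* address to unmap on completion, 0 = skip */
};

/* Counts how often the mock unmap was called. */
static unsigned int unmap_calls;

static void unmap_page_model(uint64_t addr)
{
	(void)addr;
	unmap_calls++;
}

/* Xmit path: remember the mapping only for frags the driver mapped itself. */
static void tx_record_frag(struct tx_buf_model *buf, uint64_t mapping,
			   bool is_net_iov)
{
	assert(mapping != 0);  /* 0 is reserved as the "do not unmap" marker */
	buf->mapping = is_net_iov ? 0 : mapping;
}

/* Completion path: unmap only the slots that carry a real mapping. */
static void tx_complete_frags(const struct tx_buf_model *ring, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ring[i].mapping)
			unmap_page_model(ring[i].mapping);
	}
}
```

Recording two page frags and one net_iov frag and then completing all
three calls the mock unmap twice. The same check covers the error path and
the ring teardown path, as in the three hunks above.
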
