netdev.vger.kernel.org archive mirror
* [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more
@ 2024-05-28 14:27 Tariq Toukan
  2024-05-28 14:27 ` [PATCH net-next 01/15] net/mlx5e: SHAMPO, Use net_prefetch API Tariq Toukan
                   ` (14 more replies)
  0 siblings, 15 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:27 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Tariq Toukan

This series enables hardware GRO for ConnectX-7 and newer NICs.
SHAMPO stands for Split Header And Merge Payload Offload.

The first part of the series contains important fixes and improvements.

The second part reworks the HW GRO counters.

Lastly, HW GRO is performance-optimized and re-enabled.

Here are the bandwidth numbers for a simple iperf3 test over a single RQ,
where the application and the IRQ are pinned to the same CPU:

+---------+--------+--------+-----------+-------------+
| streams | SW GRO | HW GRO | Unit      | Improvement |
+---------+--------+--------+-----------+-------------+
| 1       | 36     | 57     | Gbits/sec |    1.6 x    |
| 4       | 34     | 50     | Gbits/sec |    1.5 x    |
| 8       | 31     | 43     | Gbits/sec |    1.4 x    |
+---------+--------+--------+-----------+-------------+

Benchmark details:
VM-based setup
CPU: Intel(R) Xeon(R) Platinum 8380, 24 cores
NIC: ConnectX-7 100GbE
iperf3 and the IRQ running on the same CPU over a single receive queue

Series generated against:
commit de31e96cf423 ("net/core: move the lockdep-init of sk_callback_lock to sk_init_common()")

Thanks,
Tariq.


Dragos Tatulea (11):
  net/mlx5e: SHAMPO, Fix incorrect page release
  net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink
  net/mlx5e: SHAMPO, Fix FCS config when HW GRO on
  net/mlx5e: SHAMPO, Disable gso_size for non GRO packets
  net/mlx5e: SHAMPO, Simplify header page release in teardown
  net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data()
  net/mlx5e: SHAMPO, Make GRO counters more precise
  net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter
  net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data split
  net/mlx5e: SHAMPO, Add header-only ethtool counters for header data
    split
  net/mlx5e: SHAMPO, Coalesce skb fragments to page size

Tariq Toukan (1):
  net/mlx5e: SHAMPO, Use net_prefetch API

Yoray Zack (3):
  net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB
  net/mlx5e: SHAMPO, Use KSMs instead of KLMs
  net/mlx5e: SHAMPO, Re-enable HW-GRO

 .../ethernet/mellanox/mlx5/counters.rst       |  34 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  22 +-
 .../ethernet/mellanox/mlx5/core/en/params.c   |  12 +-
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |  19 ++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  71 ++++--
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 205 ++++++++----------
 .../ethernet/mellanox/mlx5/core/en_stats.c    |  11 +-
 .../ethernet/mellanox/mlx5/core/en_stats.h    |  10 +-
 include/linux/mlx5/device.h                   |   1 +
 include/linux/mlx5/mlx5_ifc.h                 |  16 +-
 10 files changed, 223 insertions(+), 178 deletions(-)

-- 
2.31.1



* [PATCH net-next 01/15] net/mlx5e: SHAMPO, Use net_prefetch API
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
@ 2024-05-28 14:27 ` Tariq Toukan
  2024-05-28 14:27 ` [PATCH net-next 02/15] net/mlx5e: SHAMPO, Fix incorrect page release Tariq Toukan
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:27 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Tariq Toukan

Let the SHAMPO functions use the net-specific prefetch API,
similar to all other usages.
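
For reference, the net variants prefetch a second cache line when the
L1 cache line is smaller than 128 bytes, since networking headers often
span up to 128 bytes. Roughly, from include/linux/netdevice.h:

  static inline void net_prefetch(void *p)
  {
          prefetch(p);
  #if L1_CACHE_BYTES < 128
          prefetch((u8 *)p + L1_CACHE_BYTES);
  #endif
  }

net_prefetchw() does the same using prefetchw().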

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b5333da20e8a..369d101bf03c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2212,8 +2212,8 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 	if (likely(frag_size <= BIT(MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE))) {
 		/* build SKB around header */
 		dma_sync_single_range_for_cpu(rq->pdev, head->addr, 0, frag_size, rq->buff.map_dir);
-		prefetchw(hdr);
-		prefetch(data);
+		net_prefetchw(hdr);
+		net_prefetch(data);
 		skb = mlx5e_build_linear_skb(rq, hdr, frag_size, rx_headroom, head_size, 0);
 
 		if (unlikely(!skb))
@@ -2230,7 +2230,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 			return NULL;
 		}
 
-		prefetchw(skb->data);
+		net_prefetchw(skb->data);
 		mlx5e_copy_skb_header(rq, skb, head->frag_page->page, head->addr,
 				      head_offset + rx_headroom,
 				      rx_headroom, head_size);
-- 
2.31.1



* [PATCH net-next 02/15] net/mlx5e: SHAMPO, Fix incorrect page release
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
  2024-05-28 14:27 ` [PATCH net-next 01/15] net/mlx5e: SHAMPO, Use net_prefetch API Tariq Toukan
@ 2024-05-28 14:27 ` Tariq Toukan
  2024-05-30  1:12   ` Jakub Kicinski
  2024-05-28 14:27 ` [PATCH net-next 03/15] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink Tariq Toukan
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:27 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

Under the following conditions:
1) No skb created yet
2) header_size == 0 (no SHAMPO header)
3) (header_index + 1) % MLX5E_SHAMPO_WQ_HEADER_PER_PAGE == 0 (this is the
   last page fragment of a SHAMPO header page)

a new skb is formed with a page that is NOT a SHAMPO header page (it
is a regular data page). Further down in the same function
(mlx5e_handle_rx_cqe_mpwrq_shampo()), a SHAMPO header page from
header_index is released. This is wrong and it leads to SHAMPO header
pages being released more than once.
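
For reference, condition 3) is the last-fragment check used by the
release path. A minimal userspace sketch of it (hypothetical constant;
the driver derives it as PAGE_SIZE >> MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE):

  #include <assert.h>
  #include <stdbool.h>

  #define HEADERS_PER_PAGE 8 /* hypothetical; must be a power of two */

  /* True when header_index is the last fragment of its header page. */
  static bool is_last_header_in_page(unsigned int header_index)
  {
          return ((header_index + 1) & (HEADERS_PER_PAGE - 1)) == 0;
  }

  int main(void)
  {
          assert(!is_last_header_in_page(0));
          assert(is_last_header_in_page(7));  /* triggers the page release */
          assert(is_last_header_in_page(15));
          return 0;
  }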

Fixes: 6f5742846053 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 369d101bf03c..1ddfa00f923f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2369,7 +2369,8 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	if (flush)
 		mlx5e_shampo_flush_skb(rq, cqe, match);
 free_hd_entry:
-	mlx5e_free_rx_shampo_hd_entry(rq, header_index);
+	if (likely(head_size))
+		mlx5e_free_rx_shampo_hd_entry(rq, header_index);
 mpwrq_cqe_out:
 	if (likely(wi->consumed_strides < rq->mpwqe.num_strides))
 		return;
-- 
2.31.1



* [PATCH net-next 03/15] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
  2024-05-28 14:27 ` [PATCH net-next 01/15] net/mlx5e: SHAMPO, Use net_prefetch API Tariq Toukan
  2024-05-28 14:27 ` [PATCH net-next 02/15] net/mlx5e: SHAMPO, Fix incorrect page release Tariq Toukan
@ 2024-05-28 14:27 ` Tariq Toukan
  2024-05-28 14:27 ` [PATCH net-next 04/15] net/mlx5e: SHAMPO, Fix FCS config when HW GRO on Tariq Toukan
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:27 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

When all the strides in a WQE have been consumed, the WQE is unlinked
from the WQ linked list (mlx5_wq_ll_pop()). For SHAMPO, it is possible
to receive CQEs with 0 consumed strides for the same WQE even after the
WQE is fully consumed and unlinked. This triggers an additional unlink
for the same WQE, which corrupts the linked list.

Fix this scenario by accepting CQEs with 0 consumed strides without
unlinking the WQE again.
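
A minimal userspace sketch of the fixed completion flow (toy model with
hypothetical stride counts, not the driver code):

  #include <stdio.h>

  struct wqe { int consumed, total, linked; };

  static void handle_cqe(struct wqe *w, int cstrides)
  {
          w->consumed += cstrides;
          if (w->consumed < w->total)
                  return;        /* WQE not fully consumed yet */
          if (cstrides == 0)
                  return;        /* already unlinked by an earlier CQE */
          w->linked = 0;         /* unlink exactly once */
          printf("unlinked\n");
  }

  int main(void)
  {
          struct wqe w = { .consumed = 0, .total = 4, .linked = 1 };

          handle_cqe(&w, 4);     /* consumes all strides, unlinks */
          handle_cqe(&w, 0);     /* would unlink again without the guard */
          return 0;
  }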

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1ddfa00f923f..b3ef0dd23729 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2375,6 +2375,9 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	if (likely(wi->consumed_strides < rq->mpwqe.num_strides))
 		return;
 
+	if (unlikely(!cstrides))
+		return;
+
 	wq  = &rq->mpwqe.wq;
 	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
 	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
-- 
2.31.1



* [PATCH net-next 04/15] net/mlx5e: SHAMPO, Fix FCS config when HW GRO on
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (2 preceding siblings ...)
  2024-05-28 14:27 ` [PATCH net-next 03/15] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink Tariq Toukan
@ 2024-05-28 14:27 ` Tariq Toukan
  2024-05-28 14:27 ` [PATCH net-next 05/15] net/mlx5e: SHAMPO, Disable gso_size for non GRO packets Tariq Toukan
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:27 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

For the following scenario:

ethtool --features eth3 rx-gro-hw on
ethtool --features eth3 rx-fcs on
ethtool --features eth3 rx-fcs off

... there is a firmware error because the driver enables HW GRO first
while FCS is still enabled.

This patch fixes the issue by handling the FCS change before HW GRO for
this specific case. LRO is taken into consideration as well, for
consistency.
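
A minimal userspace sketch of the ordering concern (toy model; the
hypothetical device rejects enabling HW GRO while FCS is on):

  #include <stdbool.h>
  #include <stdio.h>

  static bool fcs_on = true;

  static int set_gro(bool on)
  {
          if (on && fcs_on) {
                  fprintf(stderr, "fw error: HW GRO with FCS on\n");
                  return -1;
          }
          return 0;
  }

  static void set_fcs(bool on) { fcs_on = on; }

  int main(void)
  {
          /* Old order: the GRO handler runs first and hits the error. */
          if (set_gro(true))
                  fprintf(stderr, "old order fails\n");

          /* New order for this case: FCS first, then GRO succeeds. */
          set_fcs(false);
          return set_gro(true);
  }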

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b758bc72ac36..1b999bf8d3a0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4259,13 +4259,19 @@ int mlx5e_set_features(struct net_device *netdev, netdev_features_t features)
 #define MLX5E_HANDLE_FEATURE(feature, handler) \
 	mlx5e_handle_feature(netdev, &oper_features, feature, handler)
 
-	err |= MLX5E_HANDLE_FEATURE(NETIF_F_LRO, set_feature_lro);
-	err |= MLX5E_HANDLE_FEATURE(NETIF_F_GRO_HW, set_feature_hw_gro);
+	if (features & (NETIF_F_GRO_HW | NETIF_F_LRO)) {
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_RXFCS, set_feature_rx_fcs);
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_LRO, set_feature_lro);
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_GRO_HW, set_feature_hw_gro);
+	} else {
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_LRO, set_feature_lro);
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_GRO_HW, set_feature_hw_gro);
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_RXFCS, set_feature_rx_fcs);
+	}
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_HW_VLAN_CTAG_FILTER,
 				    set_feature_cvlan_filter);
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_HW_TC, set_feature_hw_tc);
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_RXALL, set_feature_rx_all);
-	err |= MLX5E_HANDLE_FEATURE(NETIF_F_RXFCS, set_feature_rx_fcs);
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_HW_VLAN_CTAG_RX, set_feature_rx_vlan);
 #ifdef CONFIG_MLX5_EN_ARFS
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_NTUPLE, set_feature_arfs);
-- 
2.31.1



* [PATCH net-next 05/15] net/mlx5e: SHAMPO, Disable gso_size for non GRO packets
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (3 preceding siblings ...)
  2024-05-28 14:27 ` [PATCH net-next 04/15] net/mlx5e: SHAMPO, Fix FCS config when HW GRO on Tariq Toukan
@ 2024-05-28 14:27 ` Tariq Toukan
  2024-05-28 14:27 ` [PATCH net-next 06/15] net/mlx5e: SHAMPO, Simplify header page release in teardown Tariq Toukan
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:27 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

When HW GRO is enabled, forwarding of packets is broken due to gso_size
being set incorrectly on non-GRO packets.

Non-GRO packets have an skb GRO count of 1. mlx5 always sets gso_size on
the skb, even for non-GRO packets. It leans on the fact that gso_size is
normally reset in napi_gro_complete(). But this happens only for packets
from GRO'able protocols (TCP/UDP) that have a gro_receive() handler.

The problematic scenarios are:

1) Non-GRO protocol packets are received: validate_xmit_skb() will drop
   them (see EPROTONOSUPPORT in skb_mac_gso_segment()). The fix for
   this case would be to not set gso_size at all for SHAMPO packets with
   header size 0.

2) Packets from a GRO-capable protocol (TCP) are received but immediately
   flushed because they are not GRO'able (a TCP SYN, for example).
   mlx5e_shampo_update_hdr(), which updates the remaining GRO state on
   the skb, is not called because the skb GRO count is 1. The fix here
   would be to always call mlx5e_shampo_update_hdr(), regardless of the
   skb GRO count. But this call is expensive.

The unified fix for both cases is to reset gso_size before calling
napi_gro_receive(). This change is both more effective (no call to
mlx5e_shampo_update_hdr() is necessary) and simpler (smallest code
footprint).
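
A minimal userspace sketch of why a stale gso_size breaks forwarding
(toy model; -93 stands in for -EPROTONOSUPPORT):

  #include <stdio.h>

  struct toy_skb { unsigned short gso_size; int gro_count; };

  /* Models the xmit validation: an skb that claims to be GSO but has
   * no matching segmentation handler gets dropped. */
  static int toy_validate_xmit(const struct toy_skb *skb)
  {
          return skb->gso_size ? -93 : 0;
  }

  int main(void)
  {
          struct toy_skb skb = { .gso_size = 1448, .gro_count = 1 };

          /* The fix: clear gso_size for non-GRO packets (count == 1)
           * before handing the skb to napi_gro_receive(). */
          if (skb.gro_count == 1)
                  skb.gso_size = 0;
          printf("xmit: %d\n", toy_validate_xmit(&skb));
          return 0;
  }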

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b3ef0dd23729..a13fa760f948 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2267,6 +2267,8 @@ mlx5e_shampo_flush_skb(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe, bool match)
 		mlx5e_shampo_align_fragment(skb, rq->mpwqe.log_stride_sz);
 	if (NAPI_GRO_CB(skb)->count > 1)
 		mlx5e_shampo_update_hdr(rq, cqe, match);
+	else
+		skb_shinfo(skb)->gso_size = 0;
 	napi_gro_receive(rq->cq.napi, skb);
 	rq->hw_gro_data->skb = NULL;
 }
-- 
2.31.1



* [PATCH net-next 06/15] net/mlx5e: SHAMPO, Simplify header page release in teardown
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (4 preceding siblings ...)
  2024-05-28 14:27 ` [PATCH net-next 05/15] net/mlx5e: SHAMPO, Disable gso_size for non GRO packets Tariq Toukan
@ 2024-05-28 14:27 ` Tariq Toukan
  2024-05-28 14:27 ` [PATCH net-next 07/15] net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data() Tariq Toukan
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:27 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

The function that releases SHAMPO header pages (mlx5e_shampo_dealloc_hd)
has some complicated logic that comes from the fact that it is called
twice during teardown:
1) To release the posted header pages that didn't get any completions.
2) To release all remaining header pages.

This flow is not necessary: all header pages can be released from the
driver side in one go. Furthermore, the above flow is buggy. Take the
8-headers-per-page case as an example:
1) Release fragments 5-7. The page is released.
2) Release the remaining fragments 0-4. The bitmap bits indicate that
   the page still needs releasing. But this is incorrect: the page was
   already released in step 1.

This patch releases all header pages in one go. This simplifies the
header page cleanup function. For consistency, the datapath header
page release API (mlx5e_free_rx_shampo_hd_entry()) is used.
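
A minimal userspace sketch of the fixed one-pass release (hypothetical
sizes and bitmap; assumes, as in the driver's posting scheme, that the
fragments of a page are marked together):

  #include <stdio.h>

  #define HD_PER_WQ        16
  #define HEADERS_PER_PAGE 8

  int main(void)
  {
          unsigned long bitmap = 0xfeed; /* in-use header fragments */

          for (int i = 0; i < HD_PER_WQ; i++) {
                  if (!(bitmap & (1UL << i)))
                          continue;
                  /* Release the page only on its last fragment, as
                   * mlx5e_free_rx_shampo_hd_entry() does. */
                  if (((i + 1) & (HEADERS_PER_PAGE - 1)) == 0)
                          printf("release page of fragment %d\n", i);
                  bitmap &= ~(1UL << i);
          }
          return 0;
  }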

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 12 +---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 61 +++++--------------
 3 files changed, 17 insertions(+), 58 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e85fb71bf0b4..ff326601d4a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1014,7 +1014,7 @@ void mlx5e_build_ptys2ethtool_map(void);
 bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev, u8 page_shift,
 					    enum mlx5e_mpwrq_umr_mode umr_mode);
 
-void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close);
+void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq);
 void mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats);
 void mlx5e_fold_sw_stats64(struct mlx5e_priv *priv, struct rtnl_link_stats64 *s);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1b999bf8d3a0..1b08995b8022 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1208,15 +1208,6 @@ void mlx5e_free_rx_missing_descs(struct mlx5e_rq *rq)
 		head = mlx5_wq_ll_get_wqe_next_ix(wq, head);
 	}
 
-	if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state)) {
-		u16 len;
-
-		len = (rq->mpwqe.shampo->pi - rq->mpwqe.shampo->ci) &
-		      (rq->mpwqe.shampo->hd_per_wq - 1);
-		mlx5e_shampo_dealloc_hd(rq, len, rq->mpwqe.shampo->ci, false);
-		rq->mpwqe.shampo->pi = rq->mpwqe.shampo->ci;
-	}
-
 	rq->mpwqe.actual_wq_head = wq->head;
 	rq->mpwqe.umr_in_progress = 0;
 	rq->mpwqe.umr_completed = 0;
@@ -1244,8 +1235,7 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
 		}
 
 		if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
-			mlx5e_shampo_dealloc_hd(rq, rq->mpwqe.shampo->hd_per_wq,
-						0, true);
+			mlx5e_shampo_dealloc_hd(rq);
 	} else {
 		struct mlx5_wq_cyc *wq = &rq->wqe.wq;
 		u16 missing = mlx5_wq_cyc_missing(wq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index a13fa760f948..bb59ee0b1567 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -839,44 +839,28 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	return err;
 }
 
-/* This function is responsible to dealloc SHAMPO header buffer.
- * close == true specifies that we are in the middle of closing RQ operation so
- * we go over all the entries and if they are not in use we free them,
- * otherwise we only go over a specific range inside the header buffer that are
- * not in use.
- */
-void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close)
+static void
+mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index)
 {
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
-	struct mlx5e_frag_page *deleted_page = NULL;
-	int hd_per_wq = shampo->hd_per_wq;
-	struct mlx5e_dma_info *hd_info;
-	int i, index = start;
-
-	for (i = 0; i < len; i++, index++) {
-		if (index == hd_per_wq)
-			index = 0;
-
-		if (close && !test_bit(index, shampo->bitmap))
-			continue;
+	u64 addr = shampo->info[header_index].addr;
 
-		hd_info = &shampo->info[index];
-		hd_info->addr = ALIGN_DOWN(hd_info->addr, PAGE_SIZE);
-		if (hd_info->frag_page && hd_info->frag_page != deleted_page) {
-			deleted_page = hd_info->frag_page;
-			mlx5e_page_release_fragmented(rq, hd_info->frag_page);
-		}
+	if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) {
+		struct mlx5e_dma_info *dma_info = &shampo->info[header_index];
 
-		hd_info->frag_page = NULL;
+		dma_info->addr = ALIGN_DOWN(addr, PAGE_SIZE);
+		mlx5e_page_release_fragmented(rq, dma_info->frag_page);
 	}
+	clear_bit(header_index, shampo->bitmap);
+}
 
-	if (start + len > hd_per_wq) {
-		len -= hd_per_wq - start;
-		bitmap_clear(shampo->bitmap, start, hd_per_wq - start);
-		start = 0;
-	}
+void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq)
+{
+	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
+	int i;
 
-	bitmap_clear(shampo->bitmap, start, len);
+	for_each_set_bit(i, shampo->bitmap, rq->mpwqe.shampo->hd_per_wq)
+		mlx5e_free_rx_shampo_hd_entry(rq, i);
 }
 
 static void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
@@ -2281,21 +2265,6 @@ mlx5e_hw_gro_skb_has_enough_space(struct sk_buff *skb, u16 data_bcnt)
 	return PAGE_SIZE * nr_frags + data_bcnt <= GRO_LEGACY_MAX_SIZE;
 }
 
-static void
-mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index)
-{
-	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
-	u64 addr = shampo->info[header_index].addr;
-
-	if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) {
-		struct mlx5e_dma_info *dma_info = &shampo->info[header_index];
-
-		dma_info->addr = ALIGN_DOWN(addr, PAGE_SIZE);
-		mlx5e_page_release_fragmented(rq, dma_info->frag_page);
-	}
-	bitmap_clear(shampo->bitmap, header_index, 1);
-}
-
 static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 {
 	u16 data_bcnt		= mpwrq_get_cqe_byte_cnt(cqe) - cqe->shampo.header_size;
-- 
2.31.1



* [PATCH net-next 07/15] net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data()
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (5 preceding siblings ...)
  2024-05-28 14:27 ` [PATCH net-next 06/15] net/mlx5e: SHAMPO, Simplify header page release in teardown Tariq Toukan
@ 2024-05-28 14:27 ` Tariq Toukan
  2024-05-28 14:28 ` [PATCH net-next 08/15] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB Tariq Toukan
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:27 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

mlx5e_fill_skb_data() used to have multiple callers. But after the XDP
multibuf refactoring from commit 2cb0e27d43b4 ("net/mlx5e: RX, Prepare
non-linear striding RQ for XDP multi-buffer support") the SHAMPO code
path is the only caller.

Take advantage of this and specialize the function:
- Drop the redundant check.
- Assume that data_bcnt is > 0. This is needed in a downstream patch.

Rename the function as well to make things clear.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 25 ++++++++-----------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index bb59ee0b1567..1e3a5b2afeae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1948,21 +1948,16 @@ const struct mlx5e_rx_handlers mlx5e_rx_handlers_rep = {
 #endif
 
 static void
-mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
-		    struct mlx5e_frag_page *frag_page,
-		    u32 data_bcnt, u32 data_offset)
+mlx5e_shampo_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
+			   struct mlx5e_frag_page *frag_page,
+			   u32 data_bcnt, u32 data_offset)
 {
 	net_prefetchw(skb->data);
 
-	while (data_bcnt) {
+	do {
 		/* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
 		u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - data_offset, data_bcnt);
-		unsigned int truesize;
-
-		if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
-			truesize = pg_consumed_bytes;
-		else
-			truesize = ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz));
+		unsigned int truesize = pg_consumed_bytes;
 
 		frag_page->frags++;
 		mlx5e_add_skb_frag(rq, skb, frag_page->page, data_offset,
@@ -1971,7 +1966,7 @@ mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
 		data_bcnt -= pg_consumed_bytes;
 		data_offset = 0;
 		frag_page++;
-	}
+	} while (data_bcnt);
 }
 
 static struct sk_buff *
@@ -2330,10 +2325,12 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	}
 
 	if (likely(head_size)) {
-		struct mlx5e_frag_page *frag_page;
+		if (data_bcnt) {
+			struct mlx5e_frag_page *frag_page;
 
-		frag_page = &wi->alloc_units.frag_pages[page_idx];
-		mlx5e_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
+			frag_page = &wi->alloc_units.frag_pages[page_idx];
+			mlx5e_shampo_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
+		}
 	}
 
 	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
-- 
2.31.1



* [PATCH net-next 08/15] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (6 preceding siblings ...)
  2024-05-28 14:27 ` [PATCH net-next 07/15] net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data() Tariq Toukan
@ 2024-05-28 14:28 ` Tariq Toukan
  2024-06-05 13:48   ` Simon Horman
  2024-05-28 14:28 ` [PATCH net-next 09/15] net/mlx5e: SHAMPO, Make GRO counters more precise Tariq Toukan
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:28 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Yoray Zack,
	Tariq Toukan

From: Yoray Zack <yorayz@nvidia.com>

A SHAMPO SKB can be flushed in mlx5e_shampo_complete_rx_cqe().
If the SKB was flushed there, rq->hw_gro_data->skb was also set to NULL.

We can therefore skip flushing the SKB in mlx5e_shampo_flush_skb()
when rq->hw_gro_data->skb == NULL.

Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1e3a5b2afeae..3f76c33aada0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2334,7 +2334,7 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	}
 
 	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
-	if (flush)
+	if (flush && rq->hw_gro_data->skb)
 		mlx5e_shampo_flush_skb(rq, cqe, match);
 free_hd_entry:
 	if (likely(head_size))
-- 
2.31.1



* [PATCH net-next 09/15] net/mlx5e: SHAMPO, Make GRO counters more precise
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (7 preceding siblings ...)
  2024-05-28 14:28 ` [PATCH net-next 08/15] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB Tariq Toukan
@ 2024-05-28 14:28 ` Tariq Toukan
  2024-05-28 14:28 ` [PATCH net-next 10/15] net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter Tariq Toukan
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:28 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

Don't count non-GRO packets. A non-GRO packet is a packet with
a GRO cb count of 1.
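
As a worked example of the byte accounting below (hypothetical numbers):

  #include <stdio.h>

  int main(void)
  {
          /* Hypothetical GRO SKB: 5 coalesced segments with 54-byte
           * headers and 1400 payload bytes each. The payload sits in
           * the frags (data_len); the linear part holds a single
           * header copy (skb_headlen). */
          unsigned int gro_count = 5, headlen = 54, payload = 1400;
          unsigned int data_len = gro_count * payload;

          /* Mirrors the patch: each merged segment re-counts the
           * shared header, so the bytes match what was on the wire. */
          printf("gro_bytes += %u\n", data_len + headlen * gro_count);
          return 0;
  }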

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../ethernet/mellanox/mlx5/counters.rst             | 10 ++++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c     | 13 ++++++++-----
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index fed821ef9b09..7ed010dbe469 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -189,17 +189,19 @@ the software port.
 
    * - `rx[i]_gro_packets`
      - Number of received packets processed using hardware-accelerated GRO. The
-       number of hardware GRO offloaded packets received on ring i.
+       number of hardware GRO offloaded packets received on ring i. Only true GRO
+       packets are counted: only packets that are in an SKB with a GRO count > 1.
      - Acceleration
 
    * - `rx[i]_gro_bytes`
      - Number of received bytes processed using hardware-accelerated GRO. The
-       number of hardware GRO offloaded bytes received on ring i.
+       number of hardware GRO offloaded bytes received on ring i. Only true GRO
+       packets are counted: only packets that are in an SKB with a GRO count > 1.
      - Acceleration
 
    * - `rx[i]_gro_skbs`
-     - The number of receive SKBs constructed while performing
-       hardware-accelerated GRO.
+     - The number of GRO SKBs constructed from hardware-accelerated GRO. Only SKBs
+       with a GRO count > 1 are counted.
      - Informative
 
    * - `rx[i]_gro_match_packets`
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3f76c33aada0..79b486d5475d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1596,9 +1596,7 @@ static void mlx5e_shampo_complete_rx_cqe(struct mlx5e_rq *rq,
 	struct mlx5e_rq_stats *stats = rq->stats;
 
 	stats->packets++;
-	stats->gro_packets++;
 	stats->bytes += cqe_bcnt;
-	stats->gro_bytes += cqe_bcnt;
 	if (NAPI_GRO_CB(skb)->count != 1)
 		return;
 	mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
@@ -2240,14 +2238,19 @@ mlx5e_shampo_flush_skb(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe, bool match)
 {
 	struct sk_buff *skb = rq->hw_gro_data->skb;
 	struct mlx5e_rq_stats *stats = rq->stats;
+	u16 gro_count = NAPI_GRO_CB(skb)->count;
 
-	stats->gro_skbs++;
 	if (likely(skb_shinfo(skb)->nr_frags))
 		mlx5e_shampo_align_fragment(skb, rq->mpwqe.log_stride_sz);
-	if (NAPI_GRO_CB(skb)->count > 1)
+	if (gro_count > 1) {
+		stats->gro_skbs++;
+		stats->gro_packets += gro_count;
+		stats->gro_bytes += skb->data_len + skb_headlen(skb) * gro_count;
+
 		mlx5e_shampo_update_hdr(rq, cqe, match);
-	else
+	} else {
 		skb_shinfo(skb)->gso_size = 0;
+	}
 	napi_gro_receive(rq->cq.napi, skb);
 	rq->hw_gro_data->skb = NULL;
 }
-- 
2.31.1



* [PATCH net-next 10/15] net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (8 preceding siblings ...)
  2024-05-28 14:28 ` [PATCH net-next 09/15] net/mlx5e: SHAMPO, Make GRO counters more precise Tariq Toukan
@ 2024-05-28 14:28 ` Tariq Toukan
  2024-05-28 14:28 ` [PATCH net-next 11/15] net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data split Tariq Toukan
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:28 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

After modifying rx_gro_packets to be more accurate, the
rx_gro_match_packets counter is redundant.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../device_drivers/ethernet/mellanox/mlx5/counters.rst       | 5 -----
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c              | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c           | 3 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h           | 2 --
 4 files changed, 12 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 7ed010dbe469..18638a8e7c73 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -204,11 +204,6 @@ the software port.
        with a GRO count > 1 are counted.
      - Informative
 
-   * - `rx[i]_gro_match_packets`
-     - Number of received packets processed using hardware-accelerated GRO that
-       met the flow table match criteria.
-     - Informative
-
    * - `rx[i]_gro_large_hds`
      - Number of receive packets using hardware-accelerated GRO that have large
        headers that require additional memory to be allocated.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 79b486d5475d..7ab7215843b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2296,8 +2296,6 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 		goto mpwrq_cqe_out;
 	}
 
-	stats->gro_match_packets += match;
-
 	if (*skb && (!match || !(mlx5e_hw_gro_skb_has_enough_space(*skb, data_bcnt)))) {
 		match = false;
 		mlx5e_shampo_flush_skb(rq, cqe, match);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index e211c41cec06..a1657fad9a0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -141,7 +141,6 @@ static const struct counter_desc sw_stats_desc[] = {
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_packets) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_bytes) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_skbs) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_match_packets) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_large_hds) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_ecn_mark) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_removed_vlan_packets) },
@@ -343,7 +342,6 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
 	s->rx_gro_packets             += rq_stats->gro_packets;
 	s->rx_gro_bytes               += rq_stats->gro_bytes;
 	s->rx_gro_skbs                += rq_stats->gro_skbs;
-	s->rx_gro_match_packets       += rq_stats->gro_match_packets;
 	s->rx_gro_large_hds           += rq_stats->gro_large_hds;
 	s->rx_ecn_mark                += rq_stats->ecn_mark;
 	s->rx_removed_vlan_packets    += rq_stats->removed_vlan_packets;
@@ -2053,7 +2051,6 @@ static const struct counter_desc rq_stats_desc[] = {
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_bytes) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_skbs) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_match_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_large_hds) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, ecn_mark) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, removed_vlan_packets) },
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 650732288616..25daae526caa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -153,7 +153,6 @@ struct mlx5e_sw_stats {
 	u64 rx_gro_packets;
 	u64 rx_gro_bytes;
 	u64 rx_gro_skbs;
-	u64 rx_gro_match_packets;
 	u64 rx_gro_large_hds;
 	u64 rx_mcast_packets;
 	u64 rx_ecn_mark;
@@ -352,7 +351,6 @@ struct mlx5e_rq_stats {
 	u64 gro_packets;
 	u64 gro_bytes;
 	u64 gro_skbs;
-	u64 gro_match_packets;
 	u64 gro_large_hds;
 	u64 mcast_packets;
 	u64 ecn_mark;
-- 
2.31.1



* [PATCH net-next 11/15] net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data split
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (9 preceding siblings ...)
  2024-05-28 14:28 ` [PATCH net-next 10/15] net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter Tariq Toukan
@ 2024-05-28 14:28 ` Tariq Toukan
  2024-05-30  1:22   ` Jakub Kicinski
  2024-05-28 14:28 ` [PATCH net-next 12/15] net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split Tariq Toukan
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:28 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

When SHAMPO can't identify the protocol/header of a packet, it yields
a packet that is not split: the whole packet is in the data part.
Count these packets and their bytes.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../device_drivers/ethernet/mellanox/mlx5/counters.rst | 10 ++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c        |  3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c     |  4 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h     |  4 ++++
 4 files changed, 21 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 18638a8e7c73..deb0e07432c4 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -209,6 +209,16 @@ the software port.
        headers that require additional memory to be allocated.
      - Informative
 
+   * - `rx[i]_hds_nosplit_packets`
+     - Number of packets that were not split in modes that do header/data split
+       [#accel]_.
+     - Informative
+
+   * - `rx[i]_hds_nosplit_bytes`
+     - Number of bytes that were not split in modes that do header/data split
+       [#accel]_.
+     - Informative
+
    * - `rx[i]_lro_packets`
      - The number of LRO packets received on ring i [#accel]_.
      - Acceleration
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 7ab7215843b6..f40f34877904 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2332,6 +2332,9 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 			frag_page = &wi->alloc_units.frag_pages[page_idx];
 			mlx5e_shampo_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
 		}
+	} else {
+		stats->hds_nosplit_packets++;
+		stats->hds_nosplit_bytes += data_bcnt;
 	}
 
 	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index a1657fad9a0d..96ecf675f90d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -343,6 +343,8 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
 	s->rx_gro_bytes               += rq_stats->gro_bytes;
 	s->rx_gro_skbs                += rq_stats->gro_skbs;
 	s->rx_gro_large_hds           += rq_stats->gro_large_hds;
+	s->rx_hds_nosplit_packets     += rq_stats->hds_nosplit_packets;
+	s->rx_hds_nosplit_bytes       += rq_stats->hds_nosplit_bytes;
 	s->rx_ecn_mark                += rq_stats->ecn_mark;
 	s->rx_removed_vlan_packets    += rq_stats->removed_vlan_packets;
 	s->rx_csum_none               += rq_stats->csum_none;
@@ -2052,6 +2054,8 @@ static const struct counter_desc rq_stats_desc[] = {
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_bytes) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_skbs) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_large_hds) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, hds_nosplit_packets) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, hds_nosplit_bytes) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, ecn_mark) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, removed_vlan_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, wqe_err) },
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 25daae526caa..6967c8c91f9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -154,6 +154,8 @@ struct mlx5e_sw_stats {
 	u64 rx_gro_bytes;
 	u64 rx_gro_skbs;
 	u64 rx_gro_large_hds;
+	u64 rx_hds_nosplit_packets;
+	u64 rx_hds_nosplit_bytes;
 	u64 rx_mcast_packets;
 	u64 rx_ecn_mark;
 	u64 rx_removed_vlan_packets;
@@ -352,6 +354,8 @@ struct mlx5e_rq_stats {
 	u64 gro_bytes;
 	u64 gro_skbs;
 	u64 gro_large_hds;
+	u64 hds_nosplit_packets;
+	u64 hds_nosplit_bytes;
 	u64 mcast_packets;
 	u64 ecn_mark;
 	u64 removed_vlan_packets;
-- 
2.31.1



* [PATCH net-next 12/15] net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (10 preceding siblings ...)
  2024-05-28 14:28 ` [PATCH net-next 11/15] net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data split Tariq Toukan
@ 2024-05-28 14:28 ` Tariq Toukan
  2024-05-28 14:28 ` [PATCH net-next 13/15] net/mlx5e: SHAMPO, Use KSMs instead of KLMs Tariq Toukan
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:28 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

Count the number of header-only packets and bytes from SHAMPO.
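
Together with the previous patch, the per-packet classification can be
sketched as follows (minimal userspace model with hypothetical sizes):

  #include <stdio.h>

  /* head_size == 0           -> no split (whole packet in data part)
   * head_size > 0, no data   -> header-only packet
   * head_size > 0, with data -> regular header/data split */
  static const char *classify(unsigned int head_size, unsigned int data_bcnt)
  {
          if (!head_size)
                  return "hds_nosplit";
          return data_bcnt ? "split" : "hds_nodata";
  }

  int main(void)
  {
          printf("%s\n", classify(0, 1500));  /* hds_nosplit */
          printf("%s\n", classify(54, 1446)); /* split */
          printf("%s\n", classify(54, 0));    /* hds_nodata */
          return 0;
  }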

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../device_drivers/ethernet/mellanox/mlx5/counters.rst   | 9 +++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c          | 3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c       | 4 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h       | 4 ++++
 4 files changed, 20 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index deb0e07432c4..9d12dd154d2e 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -219,6 +219,15 @@ the software port.
        [#accel]_.
      - Informative
 
+   * - `rx[i]_hds_nodata_packets`
+     - Number of header only packets in header/data split mode [#accel]_.
+     - Informative
+
+   * - `rx[i]_hds_nodata_bytes`
+     - Number of bytes for header only packets in header/data split mode
+       [#accel]_.
+     - Informative
+
    * - `rx[i]_lro_packets`
      - The number of LRO packets received on ring i [#accel]_.
      - Acceleration
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index f40f34877904..834428ed45ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2331,6 +2331,9 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 
 			frag_page = &wi->alloc_units.frag_pages[page_idx];
 			mlx5e_shampo_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
+		} else {
+			stats->hds_nodata_packets++;
+			stats->hds_nodata_bytes += head_size;
 		}
 	} else {
 		stats->hds_nosplit_packets++;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 96ecf675f90d..a4c2691e3bd9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -345,6 +345,8 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
 	s->rx_gro_large_hds           += rq_stats->gro_large_hds;
 	s->rx_hds_nosplit_packets     += rq_stats->hds_nosplit_packets;
 	s->rx_hds_nosplit_bytes       += rq_stats->hds_nosplit_bytes;
+	s->rx_hds_nodata_packets      += rq_stats->hds_nodata_packets;
+	s->rx_hds_nodata_bytes        += rq_stats->hds_nodata_bytes;
 	s->rx_ecn_mark                += rq_stats->ecn_mark;
 	s->rx_removed_vlan_packets    += rq_stats->removed_vlan_packets;
 	s->rx_csum_none               += rq_stats->csum_none;
@@ -2056,6 +2058,8 @@ static const struct counter_desc rq_stats_desc[] = {
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_large_hds) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, hds_nosplit_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, hds_nosplit_bytes) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, hds_nodata_packets) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, hds_nodata_bytes) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, ecn_mark) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, removed_vlan_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, wqe_err) },
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 6967c8c91f9a..b811cf6ecf9d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -156,6 +156,8 @@ struct mlx5e_sw_stats {
 	u64 rx_gro_large_hds;
 	u64 rx_hds_nosplit_packets;
 	u64 rx_hds_nosplit_bytes;
+	u64 rx_hds_nodata_packets;
+	u64 rx_hds_nodata_bytes;
 	u64 rx_mcast_packets;
 	u64 rx_ecn_mark;
 	u64 rx_removed_vlan_packets;
@@ -356,6 +358,8 @@ struct mlx5e_rq_stats {
 	u64 gro_large_hds;
 	u64 hds_nosplit_packets;
 	u64 hds_nosplit_bytes;
+	u64 hds_nodata_packets;
+	u64 hds_nodata_bytes;
 	u64 mcast_packets;
 	u64 ecn_mark;
 	u64 removed_vlan_packets;
-- 
2.31.1



* [PATCH net-next 13/15] net/mlx5e: SHAMPO, Use KSMs instead of KLMs
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (11 preceding siblings ...)
  2024-05-28 14:28 ` [PATCH net-next 12/15] net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split Tariq Toukan
@ 2024-05-28 14:28 ` Tariq Toukan
  2024-05-30  1:23   ` Jakub Kicinski
  2024-05-28 14:28 ` [PATCH net-next 14/15] net/mlx5e: SHAMPO, Re-enable HW-GRO Tariq Toukan
  2024-05-28 14:28 ` [PATCH net-next 15/15] net/mlx5e: SHAMPO, Coalesce skb fragments to page size Tariq Toukan
  14 siblings, 1 reply; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:28 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Yoray Zack,
	Tariq Toukan

From: Yoray Zack <yorayz@nvidia.com>

A KSM Mkey is a KLM Mkey with a fixed buffer size, which makes it a
faster mechanism than KLM.

The SHAMPO feature used KLM Mkeys for the memory mappings of its header
buffer. As it used KLMs with the same buffer size for each entry,
KSMs can be used instead.

This commit changes the Mkeys that map the SHAMPO header buffer
from KLMs to KSMs.
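
For reference, the two entry formats share the same 16-byte layout and
differ only in the first dword. Roughly, as in the mlx5 headers (the
typedefs are stand-ins to keep the sketch self-contained):

  #include <stdint.h>

  typedef uint32_t __be32; /* big-endian on the wire */
  typedef uint64_t __be64;

  struct mlx5_klm {        /* variable-size entry */
          __be32 bcount;   /* explicit per-entry byte count */
          __be32 key;
          __be64 va;
  };

  struct mlx5_ksm {        /* fixed-size entry */
          __be32 reserved; /* size fixed by the mkey's log_page_size */
          __be32 key;
          __be64 va;
  };

  _Static_assert(sizeof(struct mlx5_klm) == sizeof(struct mlx5_ksm),
                 "same entry size; only the first dword differs");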

Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 20 +-----
 .../ethernet/mellanox/mlx5/core/en/params.c   | 12 ++--
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h | 19 ++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 21 +++---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 65 +++++++++----------
 include/linux/mlx5/device.h                   |  1 +
 6 files changed, 71 insertions(+), 67 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ff326601d4a4..bec784d25d7b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -80,6 +80,7 @@ struct page_pool;
 				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
 #define MLX5E_RX_MAX_HEAD (256)
+#define MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE (8)
 #define MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE (9)
 #define MLX5E_SHAMPO_WQ_HEADER_PER_PAGE (PAGE_SIZE >> MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE)
 #define MLX5E_SHAMPO_WQ_BASE_HEAD_ENTRY_SIZE (64)
@@ -146,25 +147,6 @@ struct page_pool;
 #define MLX5E_TX_XSK_POLL_BUDGET       64
 #define MLX5E_SQ_RECOVER_MIN_INTERVAL  500 /* msecs */
 
-#define MLX5E_KLM_UMR_WQE_SZ(sgl_len)\
-	(sizeof(struct mlx5e_umr_wqe) +\
-	(sizeof(struct mlx5_klm) * (sgl_len)))
-
-#define MLX5E_KLM_UMR_WQEBBS(klm_entries) \
-	(DIV_ROUND_UP(MLX5E_KLM_UMR_WQE_SZ(klm_entries), MLX5_SEND_WQE_BB))
-
-#define MLX5E_KLM_UMR_DS_CNT(klm_entries)\
-	(DIV_ROUND_UP(MLX5E_KLM_UMR_WQE_SZ(klm_entries), MLX5_SEND_WQE_DS))
-
-#define MLX5E_KLM_MAX_ENTRIES_PER_WQE(wqe_size)\
-	(((wqe_size) - sizeof(struct mlx5e_umr_wqe)) / sizeof(struct mlx5_klm))
-
-#define MLX5E_KLM_ENTRIES_PER_WQE(wqe_size)\
-	ALIGN_DOWN(MLX5E_KLM_MAX_ENTRIES_PER_WQE(wqe_size), MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT)
-
-#define MLX5E_MAX_KLM_PER_WQE(mdev) \
-	MLX5E_KLM_ENTRIES_PER_WQE(MLX5_SEND_WQE_BB * mlx5e_get_max_sq_aligned_wqebbs(mdev))
-
 #define mlx5e_state_dereference(priv, p) \
 	rcu_dereference_protected((p), lockdep_is_held(&(priv)->state_lock))
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index ec819dfc98be..6c9ccccca81e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -1071,18 +1071,18 @@ static u32 mlx5e_shampo_icosq_sz(struct mlx5_core_dev *mdev,
 				 struct mlx5e_params *params,
 				 struct mlx5e_rq_param *rq_param)
 {
-	int max_num_of_umr_per_wqe, max_hd_per_wqe, max_klm_per_umr, rest;
+	int max_num_of_umr_per_wqe, max_hd_per_wqe, max_ksm_per_umr, rest;
 	void *wqc = MLX5_ADDR_OF(rqc, rq_param->rqc, wq);
 	int wq_size = BIT(MLX5_GET(wq, wqc, log_wq_sz));
 	u32 wqebbs;
 
-	max_klm_per_umr = MLX5E_MAX_KLM_PER_WQE(mdev);
+	max_ksm_per_umr = MLX5E_MAX_KSM_PER_WQE(mdev);
 	max_hd_per_wqe = mlx5e_shampo_hd_per_wqe(mdev, params, rq_param);
-	max_num_of_umr_per_wqe = max_hd_per_wqe / max_klm_per_umr;
-	rest = max_hd_per_wqe % max_klm_per_umr;
-	wqebbs = MLX5E_KLM_UMR_WQEBBS(max_klm_per_umr) * max_num_of_umr_per_wqe;
+	max_num_of_umr_per_wqe = max_hd_per_wqe / max_ksm_per_umr;
+	rest = max_hd_per_wqe % max_ksm_per_umr;
+	wqebbs = MLX5E_KSM_UMR_WQEBBS(max_ksm_per_umr) * max_num_of_umr_per_wqe;
 	if (rest)
-		wqebbs += MLX5E_KLM_UMR_WQEBBS(rest);
+		wqebbs += MLX5E_KSM_UMR_WQEBBS(rest);
 	wqebbs *= wq_size;
 	return wqebbs;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index 879d698b6119..d1f0f868d494 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -34,6 +34,25 @@
 
 #define MLX5E_RX_ERR_CQE(cqe) (get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)
 
+#define MLX5E_KSM_UMR_WQE_SZ(sgl_len)\
+	(sizeof(struct mlx5e_umr_wqe) +\
+	(sizeof(struct mlx5_ksm) * (sgl_len)))
+
+#define MLX5E_KSM_UMR_WQEBBS(ksm_entries) \
+	(DIV_ROUND_UP(MLX5E_KSM_UMR_WQE_SZ(ksm_entries), MLX5_SEND_WQE_BB))
+
+#define MLX5E_KSM_UMR_DS_CNT(ksm_entries)\
+	(DIV_ROUND_UP(MLX5E_KSM_UMR_WQE_SZ(ksm_entries), MLX5_SEND_WQE_DS))
+
+#define MLX5E_KSM_MAX_ENTRIES_PER_WQE(wqe_size)\
+	(((wqe_size) - sizeof(struct mlx5e_umr_wqe)) / sizeof(struct mlx5_ksm))
+
+#define MLX5E_KSM_ENTRIES_PER_WQE(wqe_size)\
+	ALIGN_DOWN(MLX5E_KSM_MAX_ENTRIES_PER_WQE(wqe_size), MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT)
+
+#define MLX5E_MAX_KSM_PER_WQE(mdev) \
+	MLX5E_KSM_ENTRIES_PER_WQE(MLX5_SEND_WQE_BB * mlx5e_get_max_sq_aligned_wqebbs(mdev))
+
 static inline
 ktime_t mlx5e_cqe_ts_to_ns(cqe_ts_to_ns func, struct mlx5_clock *clock, u64 cqe_ts)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1b08995b8022..913cc0275871 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -504,8 +504,8 @@ static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev,
 	return err;
 }
 
-static int mlx5e_create_umr_klm_mkey(struct mlx5_core_dev *mdev,
-				     u64 nentries,
+static int mlx5e_create_umr_ksm_mkey(struct mlx5_core_dev *mdev,
+				     u64 nentries, u8 log_entry_size,
 				     u32 *umr_mkey)
 {
 	int inlen;
@@ -525,12 +525,13 @@ static int mlx5e_create_umr_klm_mkey(struct mlx5_core_dev *mdev,
 	MLX5_SET(mkc, mkc, umr_en, 1);
 	MLX5_SET(mkc, mkc, lw, 1);
 	MLX5_SET(mkc, mkc, lr, 1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_KLMS);
+	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_KSM);
 	mlx5e_mkey_set_relaxed_ordering(mdev, mkc);
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, pd, mdev->mlx5e_res.hw_objs.pdn);
 	MLX5_SET(mkc, mkc, translations_octword_size, nentries);
-	MLX5_SET(mkc, mkc, length64, 1);
+	MLX5_SET(mkc, mkc, log_page_size, log_entry_size);
+	MLX5_SET64(mkc, mkc, len, nentries << log_entry_size);
 	err = mlx5_core_create_mkey(mdev, umr_mkey, in, inlen);
 
 	kvfree(in);
@@ -565,14 +566,16 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct mlx5e_rq
 static int mlx5e_create_rq_hd_umr_mkey(struct mlx5_core_dev *mdev,
 				       struct mlx5e_rq *rq)
 {
-	u32 max_klm_size = BIT(MLX5_CAP_GEN(mdev, log_max_klm_list_size));
+	u32 max_ksm_size = BIT(MLX5_CAP_GEN(mdev, log_max_klm_list_size));
 
-	if (max_klm_size < rq->mpwqe.shampo->hd_per_wq) {
-		mlx5_core_err(mdev, "max klm list size 0x%x is smaller than shampo header buffer list size 0x%x\n",
-			      max_klm_size, rq->mpwqe.shampo->hd_per_wq);
+	if (max_ksm_size < rq->mpwqe.shampo->hd_per_wq) {
+		mlx5_core_err(mdev, "max ksm list size 0x%x is smaller than shampo header buffer list size 0x%x\n",
+			      max_ksm_size, rq->mpwqe.shampo->hd_per_wq);
 		return -EINVAL;
 	}
-	return mlx5e_create_umr_klm_mkey(mdev, rq->mpwqe.shampo->hd_per_wq,
+
+	return mlx5e_create_umr_ksm_mkey(mdev, rq->mpwqe.shampo->hd_per_wq,
+					 MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE,
 					 &rq->mpwqe.shampo->mkey);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 834428ed45ee..e6987bd467d7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -619,25 +619,25 @@ static int bitmap_find_window(unsigned long *bitmap, int len,
 	return min(len, count);
 }
 
-static void build_klm_umr(struct mlx5e_icosq *sq, struct mlx5e_umr_wqe *umr_wqe,
-			  __be32 key, u16 offset, u16 klm_len, u16 wqe_bbs)
+static void build_ksm_umr(struct mlx5e_icosq *sq, struct mlx5e_umr_wqe *umr_wqe,
+			  __be32 key, u16 offset, u16 ksm_len)
 {
-	memset(umr_wqe, 0, offsetof(struct mlx5e_umr_wqe, inline_klms));
+	memset(umr_wqe, 0, offsetof(struct mlx5e_umr_wqe, inline_ksms));
 	umr_wqe->ctrl.opmod_idx_opcode =
 		cpu_to_be32((sq->pc << MLX5_WQE_CTRL_WQE_INDEX_SHIFT) |
 			     MLX5_OPCODE_UMR);
 	umr_wqe->ctrl.umr_mkey = key;
 	umr_wqe->ctrl.qpn_ds = cpu_to_be32((sq->sqn << MLX5_WQE_CTRL_QPN_SHIFT)
-					    | MLX5E_KLM_UMR_DS_CNT(klm_len));
+					    | MLX5E_KSM_UMR_DS_CNT(ksm_len));
 	umr_wqe->uctrl.flags = MLX5_UMR_TRANSLATION_OFFSET_EN | MLX5_UMR_INLINE;
 	umr_wqe->uctrl.xlt_offset = cpu_to_be16(offset);
-	umr_wqe->uctrl.xlt_octowords = cpu_to_be16(klm_len);
+	umr_wqe->uctrl.xlt_octowords = cpu_to_be16(ksm_len);
 	umr_wqe->uctrl.mkey_mask     = cpu_to_be64(MLX5_MKEY_MASK_FREE);
 }
 
 static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 				     struct mlx5e_icosq *sq,
-				     u16 klm_entries, u16 index)
+				     u16 ksm_entries, u16 index)
 {
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
 	u16 entries, pi, header_offset, err, wqe_bbs, new_entries;
@@ -650,20 +650,20 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 	int headroom, i;
 
 	headroom = rq->buff.headroom;
-	new_entries = klm_entries - (shampo->pi & (MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT - 1));
-	entries = ALIGN(klm_entries, MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT);
-	wqe_bbs = MLX5E_KLM_UMR_WQEBBS(entries);
+	new_entries = ksm_entries - (shampo->pi & (MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT - 1));
+	entries = ALIGN(ksm_entries, MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT);
+	wqe_bbs = MLX5E_KSM_UMR_WQEBBS(entries);
 	pi = mlx5e_icosq_get_next_pi(sq, wqe_bbs);
 	umr_wqe = mlx5_wq_cyc_get_wqe(&sq->wq, pi);
-	build_klm_umr(sq, umr_wqe, shampo->key, index, entries, wqe_bbs);
+	build_ksm_umr(sq, umr_wqe, shampo->key, index, entries);
 
 	frag_page = &shampo->pages[page_index];
 
 	for (i = 0; i < entries; i++, index++) {
 		dma_info = &shampo->info[index];
-		if (i >= klm_entries || (index < shampo->pi && shampo->pi - index <
-					 MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT))
-			goto update_klm;
+		if (i >= ksm_entries || (index < shampo->pi && shampo->pi - index <
+					 MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT))
+			goto update_ksm;
 		header_offset = (index & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) <<
 			MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE;
 		if (!(header_offset & (PAGE_SIZE - 1))) {
@@ -683,12 +683,11 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 			dma_info->frag_page = frag_page;
 		}
 
-update_klm:
-		umr_wqe->inline_klms[i].bcount =
-			cpu_to_be32(MLX5E_RX_MAX_HEAD);
-		umr_wqe->inline_klms[i].key    = cpu_to_be32(lkey);
-		umr_wqe->inline_klms[i].va     =
-			cpu_to_be64(dma_info->addr + headroom);
+update_ksm:
+		umr_wqe->inline_ksms[i] = (struct mlx5_ksm) {
+			.key = cpu_to_be32(lkey),
+			.va  = cpu_to_be64(dma_info->addr + headroom),
+		};
 	}
 
 	sq->db.wqe_info[pi] = (struct mlx5e_icosq_wqe_info) {
@@ -720,37 +719,37 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 static int mlx5e_alloc_rx_hd_mpwqe(struct mlx5e_rq *rq)
 {
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
-	u16 klm_entries, num_wqe, index, entries_before;
+	u16 ksm_entries, num_wqe, index, entries_before;
 	struct mlx5e_icosq *sq = rq->icosq;
-	int i, err, max_klm_entries, len;
+	int i, err, max_ksm_entries, len;
 
-	max_klm_entries = MLX5E_MAX_KLM_PER_WQE(rq->mdev);
-	klm_entries = bitmap_find_window(shampo->bitmap,
+	max_ksm_entries = MLX5E_MAX_KSM_PER_WQE(rq->mdev);
+	ksm_entries = bitmap_find_window(shampo->bitmap,
 					 shampo->hd_per_wqe,
 					 shampo->hd_per_wq, shampo->pi);
-	if (!klm_entries)
+	if (!ksm_entries)
 		return 0;
 
-	klm_entries += (shampo->pi & (MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT - 1));
-	index = ALIGN_DOWN(shampo->pi, MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT);
+	ksm_entries += (shampo->pi & (MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT - 1));
+	index = ALIGN_DOWN(shampo->pi, MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT);
 	entries_before = shampo->hd_per_wq - index;
 
-	if (unlikely(entries_before < klm_entries))
-		num_wqe = DIV_ROUND_UP(entries_before, max_klm_entries) +
-			  DIV_ROUND_UP(klm_entries - entries_before, max_klm_entries);
+	if (unlikely(entries_before < ksm_entries))
+		num_wqe = DIV_ROUND_UP(entries_before, max_ksm_entries) +
+			  DIV_ROUND_UP(ksm_entries - entries_before, max_ksm_entries);
 	else
-		num_wqe = DIV_ROUND_UP(klm_entries, max_klm_entries);
+		num_wqe = DIV_ROUND_UP(ksm_entries, max_ksm_entries);
 
 	for (i = 0; i < num_wqe; i++) {
-		len = (klm_entries > max_klm_entries) ? max_klm_entries :
-							klm_entries;
+		len = (ksm_entries > max_ksm_entries) ? max_ksm_entries :
+							ksm_entries;
 		if (unlikely(index + len > shampo->hd_per_wq))
 			len = shampo->hd_per_wq - index;
 		err = mlx5e_build_shampo_hd_umr(rq, sq, len, index);
 		if (unlikely(err))
 			return err;
 		index = (index + len) & (rq->mpwqe.shampo->hd_per_wq - 1);
-		klm_entries -= len;
+		ksm_entries -= len;
 	}
 
 	return 0;
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index d7bb31d9a446..da09bfaa7b81 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -294,6 +294,7 @@ enum {
 #define MLX5_UMR_FLEX_ALIGNMENT 0x40
 #define MLX5_UMR_MTT_NUM_ENTRIES_ALIGNMENT (MLX5_UMR_FLEX_ALIGNMENT / sizeof(struct mlx5_mtt))
 #define MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT (MLX5_UMR_FLEX_ALIGNMENT / sizeof(struct mlx5_klm))
+#define MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT (MLX5_UMR_FLEX_ALIGNMENT / sizeof(struct mlx5_ksm))
 
 #define MLX5_USER_INDEX_LEN (MLX5_FLD_SZ_BYTES(qpc, user_index) * 8)
 
-- 
2.31.1



* [PATCH net-next 14/15] net/mlx5e: SHAMPO, Re-enable HW-GRO
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (12 preceding siblings ...)
  2024-05-28 14:28 ` [PATCH net-next 13/15] net/mlx5e: SHAMPO, Use KSMs instead of KLMs Tariq Toukan
@ 2024-05-28 14:28 ` Tariq Toukan
  2024-05-28 14:28 ` [PATCH net-next 15/15] net/mlx5e: SHAMPO, Coalesce skb fragments to page size Tariq Toukan
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Yoray Zack,
	Tariq Toukan

From: Yoray Zack <yorayz@nvidia.com>

Add back HW-GRO to the reported features.

As the current implementation of HW-GRO uses KSMs with a
specific fixed buffer size (256B) to map its headers buffer,
report the feature only if the NIC supports KSM and its
minimum fixed buffer size does not exceed the requested one.
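
Once applied, the feature is exposed to userspace as the standard
rx-gro-hw flag, so it can be verified and toggled with plain ethtool
(the interface name below is just an example):

    $ ethtool -k eth0 | grep rx-gro-hw
    rx-gro-hw: on
    $ ethtool -K eth0 rx-gro-hw off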

iperf3 bandwidth comparison:
+---------+--------+--------+-----------+
| streams | SW GRO | HW GRO | Unit      |
+---------+--------+--------+-----------+
| 1       | 36     | 42     | Gbits/sec |
| 4       | 34     | 39     | Gbits/sec |
| 8       | 31     | 35     | Gbits/sec |
+---------+--------+--------+-----------+

A downstream patch will add skb fragment coalescing, which will improve
performance considerably.

Benchmark details:
VM based setup
CPU: Intel(R) Xeon(R) Platinum 8380 CPU, 24 cores
NIC: ConnectX-7 100GbE
iperf3 and irq running on same CPU over a single receive queue

Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 26 +++++++++++++++++++
 include/linux/mlx5/mlx5_ifc.h                 | 16 ++++++++----
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 913cc0275871..0f3d107961a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -74,6 +74,27 @@
 #include "lib/devcom.h"
 #include "lib/sd.h"
 
+static bool mlx5e_hw_gro_supported(struct mlx5_core_dev *mdev)
+{
+	if (!MLX5_CAP_GEN(mdev, shampo))
+		return false;
+
+	/* Our HW-GRO implementation relies on "KSM Mkey" for
+	 * SHAMPO headers buffer mapping
+	 */
+	if (!MLX5_CAP_GEN(mdev, fixed_buffer_size))
+		return false;
+
+	if (!MLX5_CAP_GEN_2(mdev, min_mkey_log_entity_size_fixed_buffer_valid))
+		return false;
+
+	if (MLX5_CAP_GEN_2(mdev, min_mkey_log_entity_size_fixed_buffer) >
+	    MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE)
+		return false;
+
+	return true;
+}
+
 bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev, u8 page_shift,
 					    enum mlx5e_mpwrq_umr_mode umr_mode)
 {
@@ -5331,6 +5352,11 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_FILTER;
 	netdev->hw_features      |= NETIF_F_HW_VLAN_STAG_TX;
 
+	if (mlx5e_hw_gro_supported(mdev) &&
+	    mlx5e_check_fragmented_striding_rq_cap(mdev, PAGE_SHIFT,
+						   MLX5E_MPWRQ_UMR_MODE_ALIGNED))
+		netdev->hw_features    |= NETIF_F_GRO_HW;
+
 	if (mlx5e_tunnel_any_tx_proto_supported(mdev)) {
 		netdev->hw_enc_features |= NETIF_F_HW_CSUM;
 		netdev->hw_enc_features |= NETIF_F_TSO;
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index f468763478ae..488509f84982 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1526,8 +1526,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         ts_cqe_to_dest_cqn[0x1];
 	u8         reserved_at_b3[0x6];
 	u8         go_back_n[0x1];
-	u8         shampo[0x1];
-	u8         reserved_at_bb[0x5];
+	u8         reserved_at_ba[0x6];
 
 	u8         max_sgl_for_optimized_performance[0x8];
 	u8         log_max_cq_sz[0x8];
@@ -1744,7 +1743,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_280[0x10];
 	u8         max_wqe_sz_sq[0x10];
 
-	u8         reserved_at_2a0[0x10];
+	u8         reserved_at_2a0[0xb];
+	u8         shampo[0x1];
+	u8         reserved_at_2ac[0x4];
 	u8         max_wqe_sz_rq[0x10];
 
 	u8         max_flow_counter_31_16[0x10];
@@ -2017,7 +2018,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   reserved_at_250[0x10];
 
 	u8	   reserved_at_260[0x120];
-	u8	   reserved_at_380[0x10];
+	u8	   reserved_at_380[0xb];
+	u8	   min_mkey_log_entity_size_fixed_buffer[0x5];
 	u8	   ec_vf_vport_base[0x10];
 
 	u8	   reserved_at_3a0[0x10];
@@ -2029,7 +2031,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   pcc_ifa2[0x1];
 	u8	   reserved_at_3f1[0xf];
 
-	u8	   reserved_at_400[0x400];
+	u8	   reserved_at_400[0x1];
+	u8	   min_mkey_log_entity_size_fixed_buffer_valid[0x1];
+	u8	   reserved_at_402[0x1e];
+
+	u8	   reserved_at_420[0x3e0];
 };
 
 enum mlx5_ifc_flow_destination_type {
-- 
2.31.1



* [PATCH net-next 15/15] net/mlx5e: SHAMPO, Coalesce skb fragments to page size
  2024-05-28 14:27 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (13 preceding siblings ...)
  2024-05-28 14:28 ` [PATCH net-next 14/15] net/mlx5e: SHAMPO, Re-enable HW-GRO Tariq Toukan
@ 2024-05-28 14:28 ` Tariq Toukan
  14 siblings, 0 replies; 27+ messages in thread
From: Tariq Toukan @ 2024-05-28 14:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

When doing hardware GRO (SHAMPO), the driver puts each data payload of a
packet from the wire into one skb fragment. TCP Zero-Copy expects
page-sized skb fragments to be able to do its page-flipping magic. With
the current way of arranging fragments by the driver, only specific MTUs
(a page-size multiple plus the header size) yield such page-sized
fragments at a high rate.

This change improves payload arrangement in the skb for hardware GRO by
coalescing payloads into a single skb fragment when possible.

To demonstrate the fix, running tcp_mmap with an MTU of 1500 yields:
- Before:  0 % bytes mmap'ed
- After : 81 % bytes mmap'ed
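
For reference, tcp_mmap is the selftest under
tools/testing/selftests/net/. A typical run (addresses below are just
examples) looks like:

    # receiver, uses TCP zerocopy via mmap() and reports % bytes mmap'ed
    ./tcp_mmap -s &
    # sender
    ./tcp_mmap -H 2001:db8::1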

More importantly, coalescing considerably improves the HW GRO performance.
Here are the results of an iperf3 bandwidth benchmark:
+---------+--------+--------+------------------------+-----------+
| streams | SW GRO | HW GRO | HW GRO with coalescing | Unit      |
+---------+--------+--------+------------------------+-----------+
| 1       | 36     | 42     | 57                     | Gbits/sec |
| 4       | 34     | 39     | 50                     | Gbits/sec |
| 8       | 31     | 35     | 43                     | Gbits/sec |
+---------+--------+--------+------------------------+-----------+

Benchmark details:
VM based setup
CPU: Intel(R) Xeon(R) Platinum 8380 CPU, 24 cores
NIC: ConnectX-7 100GbE
iperf3 and irq running on same CPU over a single receive queue

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index e6987bd467d7..54edeb8c652e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -523,15 +523,23 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
 
 static inline void
 mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
-		   struct page *page, u32 frag_offset, u32 len,
+		   struct mlx5e_frag_page *frag_page,
+		   u32 frag_offset, u32 len,
 		   unsigned int truesize)
 {
-	dma_addr_t addr = page_pool_get_dma_addr(page);
+	dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
+	u8 next_frag = skb_shinfo(skb)->nr_frags;
 
 	dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
 				rq->buff.map_dir);
-	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-			page, frag_offset, len, truesize);
+
+	if (skb_can_coalesce(skb, next_frag, frag_page->page, frag_offset)) {
+		skb_coalesce_rx_frag(skb, next_frag - 1, len, truesize);
+	} else {
+		frag_page->frags++;
+		skb_add_rx_frag(skb, next_frag, frag_page->page,
+				frag_offset, len, truesize);
+	}
 }
 
 static inline void
@@ -1956,8 +1964,7 @@ mlx5e_shampo_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
 		u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - data_offset, data_bcnt);
 		unsigned int truesize = pg_consumed_bytes;
 
-		frag_page->frags++;
-		mlx5e_add_skb_frag(rq, skb, frag_page->page, data_offset,
+		mlx5e_add_skb_frag(rq, skb, frag_page, data_offset,
 				   pg_consumed_bytes, truesize);
 
 		data_bcnt -= pg_consumed_bytes;
-- 
2.31.1



* Re: [PATCH net-next 02/15] net/mlx5e: SHAMPO, Fix incorrect page release
  2024-05-28 14:27 ` [PATCH net-next 02/15] net/mlx5e: SHAMPO, Fix incorrect page release Tariq Toukan
@ 2024-05-30  1:12   ` Jakub Kicinski
  2024-05-30  3:24     ` Saeed Mahameed
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2024-05-30  1:12 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, netdev,
	Saeed Mahameed, Gal Pressman, Leon Romanovsky, Dragos Tatulea

On Tue, 28 May 2024 17:27:54 +0300 Tariq Toukan wrote:
> Fixes: 6f5742846053 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")

Sounds like a bug fix, why net-next?


* Re: [PATCH net-next 11/15] net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data split
  2024-05-28 14:28 ` [PATCH net-next 11/15] net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data split Tariq Toukan
@ 2024-05-30  1:22   ` Jakub Kicinski
  2024-05-30  3:32     ` Saeed Mahameed
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2024-05-30  1:22 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, netdev,
	Saeed Mahameed, Gal Pressman, Leon Romanovsky, Dragos Tatulea

On Tue, 28 May 2024 17:28:03 +0300 Tariq Toukan wrote:
> +   * - `rx[i]_hds_nosplit_packets`
> +     - Number of packets that were not split in modes that do header/data split
> +       [#accel]_.
> +     - Informative
> +
> +   * - `rx[i]_hds_nosplit_bytes`
> +     - Number of bytes that were not split in modes that do header/data split
> +       [#accel]_.
> +     - Informative

This is too vague. The ethtool HDS feature is for TCP only.
What does this count? Non-TCP packets basically?

Given this is a HW-GRO series, are HDS packets == HW-GRO eligible
packets?


* Re: [PATCH net-next 13/15] net/mlx5e: SHAMPO, Use KSMs instead of KLMs
  2024-05-28 14:28 ` [PATCH net-next 13/15] net/mlx5e: SHAMPO, Use KSMs instead of KLMs Tariq Toukan
@ 2024-05-30  1:23   ` Jakub Kicinski
  2024-05-30  3:26     ` Saeed Mahameed
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2024-05-30  1:23 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, netdev,
	Saeed Mahameed, Gal Pressman, Leon Romanovsky, Yoray Zack

On Tue, 28 May 2024 17:28:05 +0300 Tariq Toukan wrote:
> A KSM Mkey is a KLM Mkey with a fixed buffer size, which makes it
> a faster mechanism than KLM.
> 
> The SHAMPO feature used KLM Mkeys for the memory mappings of its
> headers buffer. As it used KLMs with the same buffer size for each
> entry, KSMs can be used instead.
> 
> This commit changes the Mkeys that map the SHAMPO headers buffer
> from KLMs to KSMs.

Any references for understanding what KSM and KLM stand for?


* Re: [PATCH net-next 02/15] net/mlx5e: SHAMPO, Fix incorrect page release
  2024-05-30  1:12   ` Jakub Kicinski
@ 2024-05-30  3:24     ` Saeed Mahameed
  0 siblings, 0 replies; 27+ messages in thread
From: Saeed Mahameed @ 2024-05-30  3:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet, netdev,
	Saeed Mahameed, Gal Pressman, Leon Romanovsky, Dragos Tatulea

On 29 May 18:12, Jakub Kicinski wrote:
>On Tue, 28 May 2024 17:27:54 +0300 Tariq Toukan wrote:
>> Fixes: 6f5742846053 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
>
>Sounds like a bug fix, why net-next?
>

This only affects HW GRO, which you couldn't enable before this series.


* Re: [PATCH net-next 13/15] net/mlx5e: SHAMPO, Use KSMs instead of KLMs
  2024-05-30  1:23   ` Jakub Kicinski
@ 2024-05-30  3:26     ` Saeed Mahameed
  0 siblings, 0 replies; 27+ messages in thread
From: Saeed Mahameed @ 2024-05-30  3:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet, netdev,
	Saeed Mahameed, Gal Pressman, Leon Romanovsky, Yoray Zack

On 29 May 18:23, Jakub Kicinski wrote:
>On Tue, 28 May 2024 17:28:05 +0300 Tariq Toukan wrote:
>> A KSM Mkey is a KLM Mkey with a fixed buffer size, which makes it
>> a faster mechanism than KLM.
>>
>> The SHAMPO feature used KLM Mkeys for the memory mappings of its
>> headers buffer. As it used KLMs with the same buffer size for each
>> entry, KSMs can be used instead.
>>
>> This commit changes the Mkeys that map the SHAMPO headers buffer
>> from KLMs to KSMs.
>
>Any references for understanding what KSM and KLM stand for?
>

Not available publicly. Simply put, these are two different HW mechanisms
for translating HW virtual addresses to physical addresses. KSM assumes a
fixed buffer length, hence performs faster.
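
FWIW the difference is visible in the two descriptor layouts in
include/linux/mlx5/device.h -- a KLM entry carries a per-entry byte
count, while a KSM entry doesn't, since the length is fixed for the
whole Mkey:

	struct mlx5_klm {
		__be32 bcount;	/* per-entry buffer length */
		__be32 key;
		__be64 va;
	};

	struct mlx5_ksm {
		__be32 reserved;
		__be32 key;
		__be64 va;
	};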


* Re: [PATCH net-next 11/15] net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data split
  2024-05-30  1:22   ` Jakub Kicinski
@ 2024-05-30  3:32     ` Saeed Mahameed
  2024-05-30 15:31       ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: Saeed Mahameed @ 2024-05-30  3:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet, netdev,
	Saeed Mahameed, Gal Pressman, Leon Romanovsky, Dragos Tatulea

On 29 May 18:22, Jakub Kicinski wrote:
>On Tue, 28 May 2024 17:28:03 +0300 Tariq Toukan wrote:
>> +   * - `rx[i]_hds_nosplit_packets`
>> +     - Number of packets that were not split in modes that do header/data split
>> +       [#accel]_.
>> +     - Informative
>> +
>> +   * - `rx[i]_hds_nosplit_bytes`
>> +     - Number of bytes that were not split in modes that do header/data split
>> +       [#accel]_.
>> +     - Informative
>
>This is too vague. The ethtool HDS feature is for TCP only.
>What does this count? Non-TCP packets basically?
>

But this is not the ethtool HDS; this is the mlx5 HW GRO HDS.
On the same note, are we planning to have different control knobs/stats for
tcp/udp/ip HDS? ConnectX supports both TCP and UDP on the same queue;
the driver has no control over which protocol gets HDS and which doesn't.

>Given this is a HW-GRO series, are HDS packets == HW-GRO eligible
>packets?
>

No, UDP packets will also get header/data split, as will TCP packets that
don't belong to any aggregation context in the HW.



* Re: [PATCH net-next 11/15] net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data split
  2024-05-30  3:32     ` Saeed Mahameed
@ 2024-05-30 15:31       ` Jakub Kicinski
  2024-06-03 12:46         ` Dragos Tatulea
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2024-05-30 15:31 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet, netdev,
	Saeed Mahameed, Gal Pressman, Leon Romanovsky, Dragos Tatulea

On Wed, 29 May 2024 20:32:23 -0700 Saeed Mahameed wrote:
> On 29 May 18:22, Jakub Kicinski wrote:
> >On Tue, 28 May 2024 17:28:03 +0300 Tariq Toukan wrote:  
> >> +   * - `rx[i]_hds_nosplit_packets`
> >> +     - Number of packets that were not split in modes that do header/data split
> >> +       [#accel]_.
> >> +     - Informative
> >> +
> >> +   * - `rx[i]_hds_nosplit_bytes`
> >> +     - Number of bytes that were not split in modes that do header/data split
> >> +       [#accel]_.
> >> +     - Informative  
> >
> >This is too vague. The ethtool HDS feature is for TCP only.
> >What does this count? Non-TCP packets basically?
> 
> But this is not the ethtool HDS; this is the mlx5 HW GRO HDS.

Okay, but you need to put more detail into the description.
"not split in modes which do split" is going to immediately 
make the reader ask themselves "but why?".

> On the same note, are we planning to have different control knobs/stats for
> tcp/udp/ip HDS? ConnectX supports both TCP and UDP on the same queue;
> the driver has no control over which protocol gets HDS and which doesn't.

No plans at this stage. The ethtool HDS is specifically there
to tell user space whether it should bother trying to use TCP mmap.
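
For context, that state is queryable as a ring parameter with a new
enough ethtool (the device name below is just an example):

    $ ethtool -g eth0 | grep -i 'tcp data split'
    TCP data split: on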

> >Given this is a HW-GRO series, are HDS packets == HW-GRO eligible
> >packets?
> 
> No, UDP packets will also get header/data split, as will TCP packets
> that don't belong to any aggregation context in the HW.

I see.


* Re: [PATCH net-next 11/15] net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data split
  2024-05-30 15:31       ` Jakub Kicinski
@ 2024-06-03 12:46         ` Dragos Tatulea
  0 siblings, 0 replies; 27+ messages in thread
From: Dragos Tatulea @ 2024-06-03 12:46 UTC (permalink / raw)
  To: kuba@kernel.org, saeed@kernel.org
  Cc: davem@davemloft.net, Tariq Toukan, Gal Pressman,
	netdev@vger.kernel.org, pabeni@redhat.com, edumazet@google.com,
	Saeed Mahameed, Leon Romanovsky

On Thu, 2024-05-30 at 08:31 -0700, Jakub Kicinski wrote:
> On Wed, 29 May 2024 20:32:23 -0700 Saeed Mahameed wrote:
> > On 29 May 18:22, Jakub Kicinski wrote:
> > > On Tue, 28 May 2024 17:28:03 +0300 Tariq Toukan wrote:  
> > > > +   * - `rx[i]_hds_nosplit_packets`
> > > > +     - Number of packets that were not split in modes that do header/data split
> > > > +       [#accel]_.
> > > > +     - Informative
> > > > +
> > > > +   * - `rx[i]_hds_nosplit_bytes`
> > > > +     - Number of bytes that were not split in modes that do header/data split
> > > > +       [#accel]_.
> > > > +     - Informative  
> > > 
> > > This is too vague. The ethtool HDS feature is for TCP only.
> > > What does this count? Non-TCP packets basically?
> > 
> > But this is not the ethtool HDS; this is the mlx5 HW GRO HDS.
> 
> Okay, but you need to put more detail into the description.
> "not split in modes which do split" is going to immediately 
> make the reader ask themselves "but why?".
> 
We discussed internally and decided to drop this counter and patch for now.
They will be added back in the HDS series so that we have more time to
converge on the documentation part.

> > On the same note, are we planning to have different control knobs/stats for
> > tcp/udp/ip HDS? ConnectX supports both TCP and UDP on the same queue;
> > the driver has no control over which protocol gets HDS and which doesn't.
> 
> No plans at this stage. The ethtool HDS is specifically there
> to tell user space whether it should bother trying to use TCP mmap.
> 
> > > Given this is a HW-GRO series, are HDS packets == HW-GRO eligible
> > > packets?
> > 
> > No, UDP packets will also get header/data split, as will TCP packets
> > that don't belong to any aggregation context in the HW.
> 
> I see.

Thanks,
Dragos


* Re: [PATCH net-next 08/15] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB
  2024-05-28 14:28 ` [PATCH net-next 08/15] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB Tariq Toukan
@ 2024-06-05 13:48   ` Simon Horman
  2024-06-05 17:55     ` Dragos Tatulea
  0 siblings, 1 reply; 27+ messages in thread
From: Simon Horman @ 2024-06-05 13:48 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Yoray Zack

On Tue, May 28, 2024 at 05:28:00PM +0300, Tariq Toukan wrote:
> From: Yoray Zack <yorayz@nvidia.com>
> 
> SHAMPO SKB can be flushed in mlx5e_shampo_complete_rx_cqe().
> If the SKB was flushed, rq->hw_gro_data->skb was also set to NULL.
> 
> We can skip flushing the SKB in mlx5e_shampo_flush_skb
> if rq->hw_gro_data->skb == NULL.
> 
> Signed-off-by: Yoray Zack <yorayz@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 1e3a5b2afeae..3f76c33aada0 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -2334,7 +2334,7 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
>  	}
>  
>  	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
> -	if (flush)
> +	if (flush && rq->hw_gro_data->skb)
>  		mlx5e_shampo_flush_skb(rq, cqe, match);

nit: It seems awkward to reach inside rq like this
     when mlx5e_shampo_flush_skb already deals with the skb in question.

     Would it make sense for the NULL skb check to
     be moved inside mlx5e_shampo_flush_skb() ?
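
     Roughly something like this (untested sketch, keeping the current
     signature):

	static void mlx5e_shampo_flush_skb(struct mlx5e_rq *rq,
					   struct mlx5_cqe64 *cqe, bool match)
	{
		struct sk_buff *skb = rq->hw_gro_data->skb;

		if (!skb) /* already flushed by the CQE handler */
			return;
		...
	}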

>  free_hd_entry:
>  	if (likely(head_size))
> -- 
> 2.31.1
> 
> 


* Re: [PATCH net-next 08/15] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB
  2024-06-05 13:48   ` Simon Horman
@ 2024-06-05 17:55     ` Dragos Tatulea
  2024-06-06 13:59       ` Simon Horman
  0 siblings, 1 reply; 27+ messages in thread
From: Dragos Tatulea @ 2024-06-05 17:55 UTC (permalink / raw)
  To: Tariq Toukan, horms@kernel.org
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Gal Pressman,
	Yoray Zack, Leon Romanovsky, kuba@kernel.org, edumazet@google.com,
	Saeed Mahameed, pabeni@redhat.com

On Wed, 2024-06-05 at 14:48 +0100, Simon Horman wrote:
> On Tue, May 28, 2024 at 05:28:00PM +0300, Tariq Toukan wrote:
> > From: Yoray Zack <yorayz@nvidia.com>
> > 
> > SHAMPO SKB can be flushed in mlx5e_shampo_complete_rx_cqe().
> > If the SKB was flushed, rq->hw_gro_data->skb was also set to NULL.
> > 
> > We can skip flushing the SKB in mlx5e_shampo_flush_skb
> > if rq->hw_gro_data->skb == NULL.
> > 
> > Signed-off-by: Yoray Zack <yorayz@nvidia.com>
> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index 1e3a5b2afeae..3f76c33aada0 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -2334,7 +2334,7 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
> >  	}
> >  
> >  	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
> > -	if (flush)
> > +	if (flush && rq->hw_gro_data->skb)
> >  		mlx5e_shampo_flush_skb(rq, cqe, match);
> 
> nit: It seems awkward to reach inside rq like this
>      when mlx5e_shampo_flush_skb already deals with the skb in question.
> 
We don't need to reach inside the rq; we could use *skb instead (skb is
&rq->hw_gro_data->skb). *skb is used often in this function.
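
i.e. something like (untested):

	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
-	if (flush && rq->hw_gro_data->skb)
+	if (flush && *skb)
		mlx5e_shampo_flush_skb(rq, cqe, match);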

>      Would it make sense for the NULL skb check to
>      be moved inside mlx5e_shampo_flush_skb() ?
> 

Thanks,
Dragos


* Re: [PATCH net-next 08/15] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB
  2024-06-05 17:55     ` Dragos Tatulea
@ 2024-06-06 13:59       ` Simon Horman
  0 siblings, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-06-06 13:59 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Tariq Toukan, davem@davemloft.net, netdev@vger.kernel.org,
	Gal Pressman, Yoray Zack, Leon Romanovsky, kuba@kernel.org,
	edumazet@google.com, Saeed Mahameed, pabeni@redhat.com

On Wed, Jun 05, 2024 at 05:55:24PM +0000, Dragos Tatulea wrote:
> On Wed, 2024-06-05 at 14:48 +0100, Simon Horman wrote:
> > On Tue, May 28, 2024 at 05:28:00PM +0300, Tariq Toukan wrote:
> > > From: Yoray Zack <yorayz@nvidia.com>
> > > 
> > > SHAMPO SKB can be flushed in mlx5e_shampo_complete_rx_cqe().
> > > If the SKB was flushed, rq->hw_gro_data->skb was also set to NULL.
> > > 
> > > We can skip flushing the SKB in mlx5e_shampo_flush_skb
> > > if rq->hw_gro_data->skb == NULL.
> > > 
> > > Signed-off-by: Yoray Zack <yorayz@nvidia.com>
> > > Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> > > ---
> > >  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > > index 1e3a5b2afeae..3f76c33aada0 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > > @@ -2334,7 +2334,7 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
> > >  	}
> > >  
> > >  	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
> > > -	if (flush)
> > > +	if (flush && rq->hw_gro_data->skb)
> > >  		mlx5e_shampo_flush_skb(rq, cqe, match);
> > 
> > nit: It seems awkward to reach inside rq like this
> >      when mlx5e_shampo_flush_skb already deals with the skb in question.
> > 
> We don't need to reach inside the rq; we could use *skb instead (skb is
> &rq->hw_gro_data->skb). *skb is used often in this function.

So it is, thanks for pointing that out.

Clearly this is a pretty minor thing,
so no need to respin just because of it.

> 
> >      Would it make sense for the NULL skb check to
> >      be moved inside mlx5e_shampo_flush_skb() ?
> > 
> 
> Thanks,
> Dragos
