netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more
@ 2024-06-03 21:22 Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 01/14] net/mlx5e: SHAMPO, Use net_prefetch API Tariq Toukan
                   ` (14 more replies)
  0 siblings, 15 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Tariq Toukan

This series enables hardware GRO for ConnectX-7 and newer NICs.
SHAMPO stands for Split Header And Merge Payload Offload.

The first part of the series contains important fixes and improvements.

The second part reworks the HW GRO counters.

Lastly, HW GRO is perf optimized and enabled.

Here are the bandwidth numbers for a simple iperf3 test over a single rq
where the application and irq are pinned to the same CPU:

+---------+--------+--------+-----------+-------------+
| streams | SW GRO | HW GRO | Unit      | Improvement |
+---------+--------+--------+-----------+-------------+
| 1       | 36     | 57     | Gbits/sec |    1.6 x    |
| 4       | 34     | 50     | Gbits/sec |    1.5 x    |
| 8       | 31     | 43     | Gbits/sec |    1.4 x    |
+---------+--------+--------+-----------+-------------+

Benchmark details:
VM based setup
CPU: Intel(R) Xeon(R) Platinum 8380 CPU, 24 cores
NIC: ConnectX-7 100GbE
iperf3 and irq running on same CPU over a single receive queue

Series generated against:
commit 83042ce9b7c3 ("Merge branch 'Felix-DSA-probing-cleanup'")

Thanks,
Tariq.

V2:
- Dropped the patch that adds no-split counters, we plan to add in the future
  with detailed documentation.

Dragos Tatulea (9):
  net/mlx5e: SHAMPO, Fix incorrect page release
  net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink
  net/mlx5e: SHAMPO, Fix FCS config when HW GRO on
  net/mlx5e: SHAMPO, Disable gso_size for non GRO packets
  net/mlx5e: SHAMPO, Simplify header page release in teardown
  net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data()
  net/mlx5e: SHAMPO, Make GRO counters more precise
  net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter
  net/mlx5e: SHAMPO, Coalesce skb fragments to page size

Tariq Toukan (2):
  net/mlx5e: SHAMPO, Use net_prefetch API
  net/mlx5e: SHAMPO, Add header-only ethtool counters for header data
    split

Yoray Zack (3):
  net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB
  net/mlx5e: SHAMPO, Use KSMs instead of KLMs
  net/mlx5e: SHAMPO, Re-enable HW-GRO

 .../ethernet/mellanox/mlx5/counters.rst       |  24 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  22 +-
 .../ethernet/mellanox/mlx5/core/en/params.c   |  12 +-
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |  19 ++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  71 ++++--
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 202 ++++++++----------
 .../ethernet/mellanox/mlx5/core/en_stats.c    |   7 +-
 .../ethernet/mellanox/mlx5/core/en_stats.h    |   6 +-
 include/linux/mlx5/device.h                   |   1 +
 include/linux/mlx5/mlx5_ifc.h                 |  16 +-
 10 files changed, 202 insertions(+), 178 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 01/14] net/mlx5e: SHAMPO, Use net_prefetch API
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 02/14] net/mlx5e: SHAMPO, Fix incorrect page release Tariq Toukan
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Tariq Toukan

Let the SHAMPO functions use the net-specific prefetch API,
similar to all other usages.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b5333da20e8a..369d101bf03c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2212,8 +2212,8 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 	if (likely(frag_size <= BIT(MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE))) {
 		/* build SKB around header */
 		dma_sync_single_range_for_cpu(rq->pdev, head->addr, 0, frag_size, rq->buff.map_dir);
-		prefetchw(hdr);
-		prefetch(data);
+		net_prefetchw(hdr);
+		net_prefetch(data);
 		skb = mlx5e_build_linear_skb(rq, hdr, frag_size, rx_headroom, head_size, 0);
 
 		if (unlikely(!skb))
@@ -2230,7 +2230,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 			return NULL;
 		}
 
-		prefetchw(skb->data);
+		net_prefetchw(skb->data);
 		mlx5e_copy_skb_header(rq, skb, head->frag_page->page, head->addr,
 				      head_offset + rx_headroom,
 				      rx_headroom, head_size);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 02/14] net/mlx5e: SHAMPO, Fix incorrect page release
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 01/14] net/mlx5e: SHAMPO, Use net_prefetch API Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-06  3:20   ` Jakub Kicinski
  2024-06-03 21:22 ` [PATCH net-next V2 03/14] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink Tariq Toukan
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

Under the following conditions:
1) No skb created yet
2) header_size == 0 (no SHAMPO header)
3) header_index + 1 % MLX5E_SHAMPO_WQ_HEADER_PER_PAGE == 0 (this is the
   last page fragment of a SHAMPO header page)

a new skb is formed with a page that is NOT a SHAMPO header page (it
is a regular data page). Further down in the same function
(mlx5e_handle_rx_cqe_mpwrq_shampo()), a SHAMPO header page from
header_index is released. This is wrong and it leads to SHAMPO header
pages being released more than once.

Fixes: 6f5742846053 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 369d101bf03c..1ddfa00f923f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2369,7 +2369,8 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	if (flush)
 		mlx5e_shampo_flush_skb(rq, cqe, match);
 free_hd_entry:
-	mlx5e_free_rx_shampo_hd_entry(rq, header_index);
+	if (likely(head_size))
+		mlx5e_free_rx_shampo_hd_entry(rq, header_index);
 mpwrq_cqe_out:
 	if (likely(wi->consumed_strides < rq->mpwqe.num_strides))
 		return;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 03/14] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 01/14] net/mlx5e: SHAMPO, Use net_prefetch API Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 02/14] net/mlx5e: SHAMPO, Fix incorrect page release Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 04/14] net/mlx5e: SHAMPO, Fix FCS config when HW GRO on Tariq Toukan
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

When all the strides in a WQE have been consumed, the WQE is unlinked
from the WQ linked list (mlx5_wq_ll_pop()). For SHAMPO, it is possible
to receive CQEs with 0 consumed strides for the same WQE even after the
WQE is fully consumed and unlinked. This triggers an additional unlink
for the same wqe which corrupts the linked list.

Fix this scenario by accepting 0 sized consumed strides without
unlinking the WQE again.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1ddfa00f923f..b3ef0dd23729 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2375,6 +2375,9 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	if (likely(wi->consumed_strides < rq->mpwqe.num_strides))
 		return;
 
+	if (unlikely(!cstrides))
+		return;
+
 	wq  = &rq->mpwqe.wq;
 	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
 	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 04/14] net/mlx5e: SHAMPO, Fix FCS config when HW GRO on
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (2 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 03/14] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 05/14] net/mlx5e: SHAMPO, Disable gso_size for non GRO packets Tariq Toukan
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

For the following scenario:

ethtool --features eth3 rx-gro-hw on
ethtool --features eth3 rx-fcs on
ethtool --features eth3 rx-fcs off

... there is a firmware error because the driver enables HW GRO first
while FCS is still enabled.

This patch fixes this by swapping the order of HW GRO and FCS for this
specific case. Take LRO into consideration as well for consistency.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index c53c99dde558..d0808dbe69d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4259,13 +4259,19 @@ int mlx5e_set_features(struct net_device *netdev, netdev_features_t features)
 #define MLX5E_HANDLE_FEATURE(feature, handler) \
 	mlx5e_handle_feature(netdev, &oper_features, feature, handler)
 
-	err |= MLX5E_HANDLE_FEATURE(NETIF_F_LRO, set_feature_lro);
-	err |= MLX5E_HANDLE_FEATURE(NETIF_F_GRO_HW, set_feature_hw_gro);
+	if (features & (NETIF_F_GRO_HW | NETIF_F_LRO)) {
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_RXFCS, set_feature_rx_fcs);
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_LRO, set_feature_lro);
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_GRO_HW, set_feature_hw_gro);
+	} else {
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_LRO, set_feature_lro);
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_GRO_HW, set_feature_hw_gro);
+		err |= MLX5E_HANDLE_FEATURE(NETIF_F_RXFCS, set_feature_rx_fcs);
+	}
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_HW_VLAN_CTAG_FILTER,
 				    set_feature_cvlan_filter);
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_HW_TC, set_feature_hw_tc);
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_RXALL, set_feature_rx_all);
-	err |= MLX5E_HANDLE_FEATURE(NETIF_F_RXFCS, set_feature_rx_fcs);
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_HW_VLAN_CTAG_RX, set_feature_rx_vlan);
 #ifdef CONFIG_MLX5_EN_ARFS
 	err |= MLX5E_HANDLE_FEATURE(NETIF_F_NTUPLE, set_feature_arfs);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 05/14] net/mlx5e: SHAMPO, Disable gso_size for non GRO packets
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (3 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 04/14] net/mlx5e: SHAMPO, Fix FCS config when HW GRO on Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 06/14] net/mlx5e: SHAMPO, Simplify header page release in teardown Tariq Toukan
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

When HW GRO is enabled, forwarding of packets is broken due to gso_size
being set incorrectly on non GRO packets.

Non GRO packets have a skb GRO count of 1. mlx5 always sets gso_size on
the skb, even for non GRO packets. It leans on the fact that gso_size is
normally reset in napi_gro_complete(). But this happens only for packets
from GRO'able protocols (TCP/UDP) that have a gro_receive() handler.

The problematic scenarios are:

1) Non GRO protocol packets are received, validate_xmit_skb() will drop
   them (see EPROTONOSUPPORT in skb_mac_gso_segment()). The fix for
   this case would be to not set gso_size at all for SHAMPO packets with
   header size 0.

2) Packets from a GRO'ed protocol (TCP) are received but immediately
   flushed because they are not GRO'able (TCP SYN for example).
   mlx5e_shampo_update_hdr(), which updates the remaining GRO state on
   the skb, is not called because skb GRO count is 1. The fix here would
   be to always call mlx5e_shampo_update_hdr(), regardless of skb GRO
   count. But this call is expensive

The unified fix for both cases is to reset gso_size before calling
napi_gro_receive(). It is a change that is more effective (no call to
mlx5e_shampo_update_hdr() necessary) and simple (smallest code
footprint).

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b3ef0dd23729..a13fa760f948 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2267,6 +2267,8 @@ mlx5e_shampo_flush_skb(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe, bool match)
 		mlx5e_shampo_align_fragment(skb, rq->mpwqe.log_stride_sz);
 	if (NAPI_GRO_CB(skb)->count > 1)
 		mlx5e_shampo_update_hdr(rq, cqe, match);
+	else
+		skb_shinfo(skb)->gso_size = 0;
 	napi_gro_receive(rq->cq.napi, skb);
 	rq->hw_gro_data->skb = NULL;
 }
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 06/14] net/mlx5e: SHAMPO, Simplify header page release in teardown
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (4 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 05/14] net/mlx5e: SHAMPO, Disable gso_size for non GRO packets Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 07/14] net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data() Tariq Toukan
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

The function that releases SHAMPO header pages (mlx5e_shampo_dealloc_hd)
has some complicated logic that comes from the fact that it is called
twice during teardown:
1) To release the posted header pages that didn't get any completions.
2) To release all remaining header pages.

This flow is not necessary: all header pages can be released from the
driver side in one go. Furthermore, the above flow is buggy. Taking the
8 headers per page example:
1) Release fragments 5-7. Page will be released.
2) Release remaining fragments 0-4. The bits in the header will indicate
   that the page needs releasing. But this is incorrect: page was
   released in step 1.

This patch releases all header pages in one go. This simplifies the
header page cleanup function. For consistency, the datapath header
page release API (mlx5e_free_rx_shampo_hd_entry()) is used.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 12 +---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 61 +++++--------------
 3 files changed, 17 insertions(+), 58 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e85fb71bf0b4..ff326601d4a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1014,7 +1014,7 @@ void mlx5e_build_ptys2ethtool_map(void);
 bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev, u8 page_shift,
 					    enum mlx5e_mpwrq_umr_mode umr_mode);
 
-void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close);
+void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq);
 void mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats);
 void mlx5e_fold_sw_stats64(struct mlx5e_priv *priv, struct rtnl_link_stats64 *s);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d0808dbe69d3..d21a87ddc934 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1208,15 +1208,6 @@ void mlx5e_free_rx_missing_descs(struct mlx5e_rq *rq)
 		head = mlx5_wq_ll_get_wqe_next_ix(wq, head);
 	}
 
-	if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state)) {
-		u16 len;
-
-		len = (rq->mpwqe.shampo->pi - rq->mpwqe.shampo->ci) &
-		      (rq->mpwqe.shampo->hd_per_wq - 1);
-		mlx5e_shampo_dealloc_hd(rq, len, rq->mpwqe.shampo->ci, false);
-		rq->mpwqe.shampo->pi = rq->mpwqe.shampo->ci;
-	}
-
 	rq->mpwqe.actual_wq_head = wq->head;
 	rq->mpwqe.umr_in_progress = 0;
 	rq->mpwqe.umr_completed = 0;
@@ -1244,8 +1235,7 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
 		}
 
 		if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
-			mlx5e_shampo_dealloc_hd(rq, rq->mpwqe.shampo->hd_per_wq,
-						0, true);
+			mlx5e_shampo_dealloc_hd(rq);
 	} else {
 		struct mlx5_wq_cyc *wq = &rq->wqe.wq;
 		u16 missing = mlx5_wq_cyc_missing(wq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index a13fa760f948..bb59ee0b1567 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -839,44 +839,28 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	return err;
 }
 
-/* This function is responsible to dealloc SHAMPO header buffer.
- * close == true specifies that we are in the middle of closing RQ operation so
- * we go over all the entries and if they are not in use we free them,
- * otherwise we only go over a specific range inside the header buffer that are
- * not in use.
- */
-void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close)
+static void
+mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index)
 {
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
-	struct mlx5e_frag_page *deleted_page = NULL;
-	int hd_per_wq = shampo->hd_per_wq;
-	struct mlx5e_dma_info *hd_info;
-	int i, index = start;
-
-	for (i = 0; i < len; i++, index++) {
-		if (index == hd_per_wq)
-			index = 0;
-
-		if (close && !test_bit(index, shampo->bitmap))
-			continue;
+	u64 addr = shampo->info[header_index].addr;
 
-		hd_info = &shampo->info[index];
-		hd_info->addr = ALIGN_DOWN(hd_info->addr, PAGE_SIZE);
-		if (hd_info->frag_page && hd_info->frag_page != deleted_page) {
-			deleted_page = hd_info->frag_page;
-			mlx5e_page_release_fragmented(rq, hd_info->frag_page);
-		}
+	if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) {
+		struct mlx5e_dma_info *dma_info = &shampo->info[header_index];
 
-		hd_info->frag_page = NULL;
+		dma_info->addr = ALIGN_DOWN(addr, PAGE_SIZE);
+		mlx5e_page_release_fragmented(rq, dma_info->frag_page);
 	}
+	clear_bit(header_index, shampo->bitmap);
+}
 
-	if (start + len > hd_per_wq) {
-		len -= hd_per_wq - start;
-		bitmap_clear(shampo->bitmap, start, hd_per_wq - start);
-		start = 0;
-	}
+void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq)
+{
+	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
+	int i;
 
-	bitmap_clear(shampo->bitmap, start, len);
+	for_each_set_bit(i, shampo->bitmap, rq->mpwqe.shampo->hd_per_wq)
+		mlx5e_free_rx_shampo_hd_entry(rq, i);
 }
 
 static void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
@@ -2281,21 +2265,6 @@ mlx5e_hw_gro_skb_has_enough_space(struct sk_buff *skb, u16 data_bcnt)
 	return PAGE_SIZE * nr_frags + data_bcnt <= GRO_LEGACY_MAX_SIZE;
 }
 
-static void
-mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index)
-{
-	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
-	u64 addr = shampo->info[header_index].addr;
-
-	if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) {
-		struct mlx5e_dma_info *dma_info = &shampo->info[header_index];
-
-		dma_info->addr = ALIGN_DOWN(addr, PAGE_SIZE);
-		mlx5e_page_release_fragmented(rq, dma_info->frag_page);
-	}
-	bitmap_clear(shampo->bitmap, header_index, 1);
-}
-
 static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 {
 	u16 data_bcnt		= mpwrq_get_cqe_byte_cnt(cqe) - cqe->shampo.header_size;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 07/14] net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data()
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (5 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 06/14] net/mlx5e: SHAMPO, Simplify header page release in teardown Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 08/14] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB Tariq Toukan
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

mlx5e_fill_skb_data() used to have multiple callers. But after the XDP
multibuf refactoring from commit 2cb0e27d43b4 ("net/mlx5e: RX, Prepare
non-linear striding RQ for XDP multi-buffer support") the SHAMPO code
path is the only caller.

Take advantage of this and specialize the function:
- Drop the redundant check.
- Assume that data_bcnt is > 0. This is needed in a downstream patch.

Rename the function as well to make things clear.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Suggested-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 25 ++++++++-----------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index bb59ee0b1567..1e3a5b2afeae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1948,21 +1948,16 @@ const struct mlx5e_rx_handlers mlx5e_rx_handlers_rep = {
 #endif
 
 static void
-mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
-		    struct mlx5e_frag_page *frag_page,
-		    u32 data_bcnt, u32 data_offset)
+mlx5e_shampo_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
+			   struct mlx5e_frag_page *frag_page,
+			   u32 data_bcnt, u32 data_offset)
 {
 	net_prefetchw(skb->data);
 
-	while (data_bcnt) {
+	do {
 		/* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
 		u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - data_offset, data_bcnt);
-		unsigned int truesize;
-
-		if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
-			truesize = pg_consumed_bytes;
-		else
-			truesize = ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz));
+		unsigned int truesize = pg_consumed_bytes;
 
 		frag_page->frags++;
 		mlx5e_add_skb_frag(rq, skb, frag_page->page, data_offset,
@@ -1971,7 +1966,7 @@ mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
 		data_bcnt -= pg_consumed_bytes;
 		data_offset = 0;
 		frag_page++;
-	}
+	} while (data_bcnt);
 }
 
 static struct sk_buff *
@@ -2330,10 +2325,12 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	}
 
 	if (likely(head_size)) {
-		struct mlx5e_frag_page *frag_page;
+		if (data_bcnt) {
+			struct mlx5e_frag_page *frag_page;
 
-		frag_page = &wi->alloc_units.frag_pages[page_idx];
-		mlx5e_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
+			frag_page = &wi->alloc_units.frag_pages[page_idx];
+			mlx5e_shampo_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
+		}
 	}
 
 	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 08/14] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (6 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 07/14] net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data() Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 09/14] net/mlx5e: SHAMPO, Make GRO counters more precise Tariq Toukan
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Yoray Zack,
	Tariq Toukan

From: Yoray Zack <yorayz@nvidia.com>

SHAMPO SKB can be flushed in mlx5e_shampo_complete_rx_cqe().
If the SKB was flushed, rq->hw_gro_data->skb was also set to NULL.

We can skip on flushing the SKB in mlx5e_shampo_flush_skb
if rq->hw_gro_data->skb == NULL.

Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1e3a5b2afeae..3f76c33aada0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2334,7 +2334,7 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	}
 
 	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
-	if (flush)
+	if (flush && rq->hw_gro_data->skb)
 		mlx5e_shampo_flush_skb(rq, cqe, match);
 free_hd_entry:
 	if (likely(head_size))
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 09/14] net/mlx5e: SHAMPO, Make GRO counters more precise
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (7 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 08/14] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 10/14] net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter Tariq Toukan
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

Don't count non GRO packets. A non GRO packet is a packet with
a GRO cb count of 1.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../ethernet/mellanox/mlx5/counters.rst             | 10 ++++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c     | 13 ++++++++-----
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index fed821ef9b09..7ed010dbe469 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -189,17 +189,19 @@ the software port.
 
    * - `rx[i]_gro_packets`
      - Number of received packets processed using hardware-accelerated GRO. The
-       number of hardware GRO offloaded packets received on ring i.
+       number of hardware GRO offloaded packets received on ring i. Only true GRO
+       packets are counted: only packets that are in an SKB with a GRO count > 1.
      - Acceleration
 
    * - `rx[i]_gro_bytes`
      - Number of received bytes processed using hardware-accelerated GRO. The
-       number of hardware GRO offloaded bytes received on ring i.
+       number of hardware GRO offloaded bytes received on ring i. Only true GRO
+       packets are counted: only packets that are in an SKB with a GRO count > 1.
      - Acceleration
 
    * - `rx[i]_gro_skbs`
-     - The number of receive SKBs constructed while performing
-       hardware-accelerated GRO.
+     - The number of GRO SKBs constructed from hardware-accelerated GRO. Only SKBs
+       with a GRO count > 1 are counted.
      - Informative
 
    * - `rx[i]_gro_match_packets`
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3f76c33aada0..79b486d5475d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1596,9 +1596,7 @@ static void mlx5e_shampo_complete_rx_cqe(struct mlx5e_rq *rq,
 	struct mlx5e_rq_stats *stats = rq->stats;
 
 	stats->packets++;
-	stats->gro_packets++;
 	stats->bytes += cqe_bcnt;
-	stats->gro_bytes += cqe_bcnt;
 	if (NAPI_GRO_CB(skb)->count != 1)
 		return;
 	mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
@@ -2240,14 +2238,19 @@ mlx5e_shampo_flush_skb(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe, bool match)
 {
 	struct sk_buff *skb = rq->hw_gro_data->skb;
 	struct mlx5e_rq_stats *stats = rq->stats;
+	u16 gro_count = NAPI_GRO_CB(skb)->count;
 
-	stats->gro_skbs++;
 	if (likely(skb_shinfo(skb)->nr_frags))
 		mlx5e_shampo_align_fragment(skb, rq->mpwqe.log_stride_sz);
-	if (NAPI_GRO_CB(skb)->count > 1)
+	if (gro_count > 1) {
+		stats->gro_skbs++;
+		stats->gro_packets += gro_count;
+		stats->gro_bytes += skb->data_len + skb_headlen(skb) * gro_count;
+
 		mlx5e_shampo_update_hdr(rq, cqe, match);
-	else
+	} else {
 		skb_shinfo(skb)->gso_size = 0;
+	}
 	napi_gro_receive(rq->cq.napi, skb);
 	rq->hw_gro_data->skb = NULL;
 }
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 10/14] net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (8 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 09/14] net/mlx5e: SHAMPO, Make GRO counters more precise Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 11/14] net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split Tariq Toukan
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

After modifying rx_gro_packets to be more accurate, the
rx_gro_match_packets counter is redundant.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../device_drivers/ethernet/mellanox/mlx5/counters.rst       | 5 -----
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c              | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c           | 3 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h           | 2 --
 4 files changed, 12 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 7ed010dbe469..18638a8e7c73 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -204,11 +204,6 @@ the software port.
        with a GRO count > 1 are counted.
      - Informative
 
-   * - `rx[i]_gro_match_packets`
-     - Number of received packets processed using hardware-accelerated GRO that
-       met the flow table match criteria.
-     - Informative
-
    * - `rx[i]_gro_large_hds`
      - Number of receive packets using hardware-accelerated GRO that have large
        headers that require additional memory to be allocated.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 79b486d5475d..7ab7215843b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2296,8 +2296,6 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 		goto mpwrq_cqe_out;
 	}
 
-	stats->gro_match_packets += match;
-
 	if (*skb && (!match || !(mlx5e_hw_gro_skb_has_enough_space(*skb, data_bcnt)))) {
 		match = false;
 		mlx5e_shampo_flush_skb(rq, cqe, match);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index e1ed214e8651..a3c79da1525b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -141,7 +141,6 @@ static const struct counter_desc sw_stats_desc[] = {
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_packets) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_bytes) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_skbs) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_match_packets) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_gro_large_hds) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_ecn_mark) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_removed_vlan_packets) },
@@ -343,7 +342,6 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
 	s->rx_gro_packets             += rq_stats->gro_packets;
 	s->rx_gro_bytes               += rq_stats->gro_bytes;
 	s->rx_gro_skbs                += rq_stats->gro_skbs;
-	s->rx_gro_match_packets       += rq_stats->gro_match_packets;
 	s->rx_gro_large_hds           += rq_stats->gro_large_hds;
 	s->rx_ecn_mark                += rq_stats->ecn_mark;
 	s->rx_removed_vlan_packets    += rq_stats->removed_vlan_packets;
@@ -2057,7 +2055,6 @@ static const struct counter_desc rq_stats_desc[] = {
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_bytes) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_skbs) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_match_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_large_hds) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, ecn_mark) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, removed_vlan_packets) },
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 650732288616..25daae526caa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -153,7 +153,6 @@ struct mlx5e_sw_stats {
 	u64 rx_gro_packets;
 	u64 rx_gro_bytes;
 	u64 rx_gro_skbs;
-	u64 rx_gro_match_packets;
 	u64 rx_gro_large_hds;
 	u64 rx_mcast_packets;
 	u64 rx_ecn_mark;
@@ -352,7 +351,6 @@ struct mlx5e_rq_stats {
 	u64 gro_packets;
 	u64 gro_bytes;
 	u64 gro_skbs;
-	u64 gro_match_packets;
 	u64 gro_large_hds;
 	u64 mcast_packets;
 	u64 ecn_mark;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 11/14] net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (9 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 10/14] net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 12/14] net/mlx5e: SHAMPO, Use KSMs instead of KLMs Tariq Toukan
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Tariq Toukan, Dragos Tatulea

Count the number of header-only packets and bytes from SHAMPO.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../device_drivers/ethernet/mellanox/mlx5/counters.rst   | 9 +++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c          | 3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c       | 4 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h       | 4 ++++
 4 files changed, 20 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 18638a8e7c73..3bd72577af9a 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -209,6 +209,15 @@ the software port.
        headers that require additional memory to be allocated.
      - Informative
 
+   * - `rx[i]_hds_nodata_packets`
+     - Number of header only packets in header/data split mode [#accel]_.
+     - Informative
+
+   * - `rx[i]_hds_nodata_bytes`
+     - Number of bytes for header only packets in header/data split mode
+       [#accel]_.
+     - Informative
+
    * - `rx[i]_lro_packets`
      - The number of LRO packets received on ring i [#accel]_.
      - Acceleration
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 7ab7215843b6..3af4f70de334 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2331,6 +2331,9 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 
 			frag_page = &wi->alloc_units.frag_pages[page_idx];
 			mlx5e_shampo_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
+		} else {
+			stats->hds_nodata_packets++;
+			stats->hds_nodata_bytes += head_size;
 		}
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index a3c79da1525b..db1cac68292f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -343,6 +343,8 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
 	s->rx_gro_bytes               += rq_stats->gro_bytes;
 	s->rx_gro_skbs                += rq_stats->gro_skbs;
 	s->rx_gro_large_hds           += rq_stats->gro_large_hds;
+	s->rx_hds_nodata_packets      += rq_stats->hds_nodata_packets;
+	s->rx_hds_nodata_bytes        += rq_stats->hds_nodata_bytes;
 	s->rx_ecn_mark                += rq_stats->ecn_mark;
 	s->rx_removed_vlan_packets    += rq_stats->removed_vlan_packets;
 	s->rx_csum_none               += rq_stats->csum_none;
@@ -2056,6 +2058,8 @@ static const struct counter_desc rq_stats_desc[] = {
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_bytes) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_skbs) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, gro_large_hds) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, hds_nodata_packets) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, hds_nodata_bytes) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, ecn_mark) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, removed_vlan_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, wqe_err) },
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 25daae526caa..4c5858c1dd82 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -154,6 +154,8 @@ struct mlx5e_sw_stats {
 	u64 rx_gro_bytes;
 	u64 rx_gro_skbs;
 	u64 rx_gro_large_hds;
+	u64 rx_hds_nodata_packets;
+	u64 rx_hds_nodata_bytes;
 	u64 rx_mcast_packets;
 	u64 rx_ecn_mark;
 	u64 rx_removed_vlan_packets;
@@ -352,6 +354,8 @@ struct mlx5e_rq_stats {
 	u64 gro_bytes;
 	u64 gro_skbs;
 	u64 gro_large_hds;
+	u64 hds_nodata_packets;
+	u64 hds_nodata_bytes;
 	u64 mcast_packets;
 	u64 ecn_mark;
 	u64 removed_vlan_packets;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 12/14] net/mlx5e: SHAMPO, Use KSMs instead of KLMs
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (10 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 11/14] net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 13/14] net/mlx5e: SHAMPO, Re-enable HW-GRO Tariq Toukan
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Yoray Zack,
	Tariq Toukan

From: Yoray Zack <yorayz@nvidia.com>

KSM Mkey is KLM Mkey with a fixed buffer size. Due to this fact,
it is a faster mechanism than KLM.

SHAMPO feature used KLMs Mkeys for memory mappings of its headers buffer.
As it used KLMs with the same buffer size for each entry,
we can use KSMs instead.

This commit changes the Mkeys that map the SHAMPO headers buffer
from KLMs to KSMs.

Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 20 +-----
 .../ethernet/mellanox/mlx5/core/en/params.c   | 12 ++--
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h | 19 ++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 21 +++---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 65 +++++++++----------
 include/linux/mlx5/device.h                   |  1 +
 6 files changed, 71 insertions(+), 67 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ff326601d4a4..bec784d25d7b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -80,6 +80,7 @@ struct page_pool;
 				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
 #define MLX5E_RX_MAX_HEAD (256)
+#define MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE (8)
 #define MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE (9)
 #define MLX5E_SHAMPO_WQ_HEADER_PER_PAGE (PAGE_SIZE >> MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE)
 #define MLX5E_SHAMPO_WQ_BASE_HEAD_ENTRY_SIZE (64)
@@ -146,25 +147,6 @@ struct page_pool;
 #define MLX5E_TX_XSK_POLL_BUDGET       64
 #define MLX5E_SQ_RECOVER_MIN_INTERVAL  500 /* msecs */
 
-#define MLX5E_KLM_UMR_WQE_SZ(sgl_len)\
-	(sizeof(struct mlx5e_umr_wqe) +\
-	(sizeof(struct mlx5_klm) * (sgl_len)))
-
-#define MLX5E_KLM_UMR_WQEBBS(klm_entries) \
-	(DIV_ROUND_UP(MLX5E_KLM_UMR_WQE_SZ(klm_entries), MLX5_SEND_WQE_BB))
-
-#define MLX5E_KLM_UMR_DS_CNT(klm_entries)\
-	(DIV_ROUND_UP(MLX5E_KLM_UMR_WQE_SZ(klm_entries), MLX5_SEND_WQE_DS))
-
-#define MLX5E_KLM_MAX_ENTRIES_PER_WQE(wqe_size)\
-	(((wqe_size) - sizeof(struct mlx5e_umr_wqe)) / sizeof(struct mlx5_klm))
-
-#define MLX5E_KLM_ENTRIES_PER_WQE(wqe_size)\
-	ALIGN_DOWN(MLX5E_KLM_MAX_ENTRIES_PER_WQE(wqe_size), MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT)
-
-#define MLX5E_MAX_KLM_PER_WQE(mdev) \
-	MLX5E_KLM_ENTRIES_PER_WQE(MLX5_SEND_WQE_BB * mlx5e_get_max_sq_aligned_wqebbs(mdev))
-
 #define mlx5e_state_dereference(priv, p) \
 	rcu_dereference_protected((p), lockdep_is_held(&(priv)->state_lock))
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index ec819dfc98be..6c9ccccca81e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -1071,18 +1071,18 @@ static u32 mlx5e_shampo_icosq_sz(struct mlx5_core_dev *mdev,
 				 struct mlx5e_params *params,
 				 struct mlx5e_rq_param *rq_param)
 {
-	int max_num_of_umr_per_wqe, max_hd_per_wqe, max_klm_per_umr, rest;
+	int max_num_of_umr_per_wqe, max_hd_per_wqe, max_ksm_per_umr, rest;
 	void *wqc = MLX5_ADDR_OF(rqc, rq_param->rqc, wq);
 	int wq_size = BIT(MLX5_GET(wq, wqc, log_wq_sz));
 	u32 wqebbs;
 
-	max_klm_per_umr = MLX5E_MAX_KLM_PER_WQE(mdev);
+	max_ksm_per_umr = MLX5E_MAX_KSM_PER_WQE(mdev);
 	max_hd_per_wqe = mlx5e_shampo_hd_per_wqe(mdev, params, rq_param);
-	max_num_of_umr_per_wqe = max_hd_per_wqe / max_klm_per_umr;
-	rest = max_hd_per_wqe % max_klm_per_umr;
-	wqebbs = MLX5E_KLM_UMR_WQEBBS(max_klm_per_umr) * max_num_of_umr_per_wqe;
+	max_num_of_umr_per_wqe = max_hd_per_wqe / max_ksm_per_umr;
+	rest = max_hd_per_wqe % max_ksm_per_umr;
+	wqebbs = MLX5E_KSM_UMR_WQEBBS(max_ksm_per_umr) * max_num_of_umr_per_wqe;
 	if (rest)
-		wqebbs += MLX5E_KLM_UMR_WQEBBS(rest);
+		wqebbs += MLX5E_KSM_UMR_WQEBBS(rest);
 	wqebbs *= wq_size;
 	return wqebbs;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index 879d698b6119..d1f0f868d494 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -34,6 +34,25 @@
 
 #define MLX5E_RX_ERR_CQE(cqe) (get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)
 
+#define MLX5E_KSM_UMR_WQE_SZ(sgl_len)\
+	(sizeof(struct mlx5e_umr_wqe) +\
+	(sizeof(struct mlx5_ksm) * (sgl_len)))
+
+#define MLX5E_KSM_UMR_WQEBBS(ksm_entries) \
+	(DIV_ROUND_UP(MLX5E_KSM_UMR_WQE_SZ(ksm_entries), MLX5_SEND_WQE_BB))
+
+#define MLX5E_KSM_UMR_DS_CNT(ksm_entries)\
+	(DIV_ROUND_UP(MLX5E_KSM_UMR_WQE_SZ(ksm_entries), MLX5_SEND_WQE_DS))
+
+#define MLX5E_KSM_MAX_ENTRIES_PER_WQE(wqe_size)\
+	(((wqe_size) - sizeof(struct mlx5e_umr_wqe)) / sizeof(struct mlx5_ksm))
+
+#define MLX5E_KSM_ENTRIES_PER_WQE(wqe_size)\
+	ALIGN_DOWN(MLX5E_KSM_MAX_ENTRIES_PER_WQE(wqe_size), MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT)
+
+#define MLX5E_MAX_KSM_PER_WQE(mdev) \
+	MLX5E_KSM_ENTRIES_PER_WQE(MLX5_SEND_WQE_BB * mlx5e_get_max_sq_aligned_wqebbs(mdev))
+
 static inline
 ktime_t mlx5e_cqe_ts_to_ns(cqe_ts_to_ns func, struct mlx5_clock *clock, u64 cqe_ts)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d21a87ddc934..2a3e0de51f0e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -504,8 +504,8 @@ static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev,
 	return err;
 }
 
-static int mlx5e_create_umr_klm_mkey(struct mlx5_core_dev *mdev,
-				     u64 nentries,
+static int mlx5e_create_umr_ksm_mkey(struct mlx5_core_dev *mdev,
+				     u64 nentries, u8 log_entry_size,
 				     u32 *umr_mkey)
 {
 	int inlen;
@@ -525,12 +525,13 @@ static int mlx5e_create_umr_klm_mkey(struct mlx5_core_dev *mdev,
 	MLX5_SET(mkc, mkc, umr_en, 1);
 	MLX5_SET(mkc, mkc, lw, 1);
 	MLX5_SET(mkc, mkc, lr, 1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_KLMS);
+	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_KSM);
 	mlx5e_mkey_set_relaxed_ordering(mdev, mkc);
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, pd, mdev->mlx5e_res.hw_objs.pdn);
 	MLX5_SET(mkc, mkc, translations_octword_size, nentries);
-	MLX5_SET(mkc, mkc, length64, 1);
+	MLX5_SET(mkc, mkc, log_page_size, log_entry_size);
+	MLX5_SET64(mkc, mkc, len, nentries << log_entry_size);
 	err = mlx5_core_create_mkey(mdev, umr_mkey, in, inlen);
 
 	kvfree(in);
@@ -565,14 +566,16 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct mlx5e_rq
 static int mlx5e_create_rq_hd_umr_mkey(struct mlx5_core_dev *mdev,
 				       struct mlx5e_rq *rq)
 {
-	u32 max_klm_size = BIT(MLX5_CAP_GEN(mdev, log_max_klm_list_size));
+	u32 max_ksm_size = BIT(MLX5_CAP_GEN(mdev, log_max_klm_list_size));
 
-	if (max_klm_size < rq->mpwqe.shampo->hd_per_wq) {
-		mlx5_core_err(mdev, "max klm list size 0x%x is smaller than shampo header buffer list size 0x%x\n",
-			      max_klm_size, rq->mpwqe.shampo->hd_per_wq);
+	if (max_ksm_size < rq->mpwqe.shampo->hd_per_wq) {
+		mlx5_core_err(mdev, "max ksm list size 0x%x is smaller than shampo header buffer list size 0x%x\n",
+			      max_ksm_size, rq->mpwqe.shampo->hd_per_wq);
 		return -EINVAL;
 	}
-	return mlx5e_create_umr_klm_mkey(mdev, rq->mpwqe.shampo->hd_per_wq,
+
+	return mlx5e_create_umr_ksm_mkey(mdev, rq->mpwqe.shampo->hd_per_wq,
+					 MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE,
 					 &rq->mpwqe.shampo->mkey);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3af4f70de334..f1fbf60d0356 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -619,25 +619,25 @@ static int bitmap_find_window(unsigned long *bitmap, int len,
 	return min(len, count);
 }
 
-static void build_klm_umr(struct mlx5e_icosq *sq, struct mlx5e_umr_wqe *umr_wqe,
-			  __be32 key, u16 offset, u16 klm_len, u16 wqe_bbs)
+static void build_ksm_umr(struct mlx5e_icosq *sq, struct mlx5e_umr_wqe *umr_wqe,
+			  __be32 key, u16 offset, u16 ksm_len)
 {
-	memset(umr_wqe, 0, offsetof(struct mlx5e_umr_wqe, inline_klms));
+	memset(umr_wqe, 0, offsetof(struct mlx5e_umr_wqe, inline_ksms));
 	umr_wqe->ctrl.opmod_idx_opcode =
 		cpu_to_be32((sq->pc << MLX5_WQE_CTRL_WQE_INDEX_SHIFT) |
 			     MLX5_OPCODE_UMR);
 	umr_wqe->ctrl.umr_mkey = key;
 	umr_wqe->ctrl.qpn_ds = cpu_to_be32((sq->sqn << MLX5_WQE_CTRL_QPN_SHIFT)
-					    | MLX5E_KLM_UMR_DS_CNT(klm_len));
+					    | MLX5E_KSM_UMR_DS_CNT(ksm_len));
 	umr_wqe->uctrl.flags = MLX5_UMR_TRANSLATION_OFFSET_EN | MLX5_UMR_INLINE;
 	umr_wqe->uctrl.xlt_offset = cpu_to_be16(offset);
-	umr_wqe->uctrl.xlt_octowords = cpu_to_be16(klm_len);
+	umr_wqe->uctrl.xlt_octowords = cpu_to_be16(ksm_len);
 	umr_wqe->uctrl.mkey_mask     = cpu_to_be64(MLX5_MKEY_MASK_FREE);
 }
 
 static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 				     struct mlx5e_icosq *sq,
-				     u16 klm_entries, u16 index)
+				     u16 ksm_entries, u16 index)
 {
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
 	u16 entries, pi, header_offset, err, wqe_bbs, new_entries;
@@ -650,20 +650,20 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 	int headroom, i;
 
 	headroom = rq->buff.headroom;
-	new_entries = klm_entries - (shampo->pi & (MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT - 1));
-	entries = ALIGN(klm_entries, MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT);
-	wqe_bbs = MLX5E_KLM_UMR_WQEBBS(entries);
+	new_entries = ksm_entries - (shampo->pi & (MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT - 1));
+	entries = ALIGN(ksm_entries, MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT);
+	wqe_bbs = MLX5E_KSM_UMR_WQEBBS(entries);
 	pi = mlx5e_icosq_get_next_pi(sq, wqe_bbs);
 	umr_wqe = mlx5_wq_cyc_get_wqe(&sq->wq, pi);
-	build_klm_umr(sq, umr_wqe, shampo->key, index, entries, wqe_bbs);
+	build_ksm_umr(sq, umr_wqe, shampo->key, index, entries);
 
 	frag_page = &shampo->pages[page_index];
 
 	for (i = 0; i < entries; i++, index++) {
 		dma_info = &shampo->info[index];
-		if (i >= klm_entries || (index < shampo->pi && shampo->pi - index <
-					 MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT))
-			goto update_klm;
+		if (i >= ksm_entries || (index < shampo->pi && shampo->pi - index <
+					 MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT))
+			goto update_ksm;
 		header_offset = (index & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) <<
 			MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE;
 		if (!(header_offset & (PAGE_SIZE - 1))) {
@@ -683,12 +683,11 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 			dma_info->frag_page = frag_page;
 		}
 
-update_klm:
-		umr_wqe->inline_klms[i].bcount =
-			cpu_to_be32(MLX5E_RX_MAX_HEAD);
-		umr_wqe->inline_klms[i].key    = cpu_to_be32(lkey);
-		umr_wqe->inline_klms[i].va     =
-			cpu_to_be64(dma_info->addr + headroom);
+update_ksm:
+		umr_wqe->inline_ksms[i] = (struct mlx5_ksm) {
+			.key = cpu_to_be32(lkey),
+			.va  = cpu_to_be64(dma_info->addr + headroom),
+		};
 	}
 
 	sq->db.wqe_info[pi] = (struct mlx5e_icosq_wqe_info) {
@@ -720,37 +719,37 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 static int mlx5e_alloc_rx_hd_mpwqe(struct mlx5e_rq *rq)
 {
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
-	u16 klm_entries, num_wqe, index, entries_before;
+	u16 ksm_entries, num_wqe, index, entries_before;
 	struct mlx5e_icosq *sq = rq->icosq;
-	int i, err, max_klm_entries, len;
+	int i, err, max_ksm_entries, len;
 
-	max_klm_entries = MLX5E_MAX_KLM_PER_WQE(rq->mdev);
-	klm_entries = bitmap_find_window(shampo->bitmap,
+	max_ksm_entries = MLX5E_MAX_KSM_PER_WQE(rq->mdev);
+	ksm_entries = bitmap_find_window(shampo->bitmap,
 					 shampo->hd_per_wqe,
 					 shampo->hd_per_wq, shampo->pi);
-	if (!klm_entries)
+	if (!ksm_entries)
 		return 0;
 
-	klm_entries += (shampo->pi & (MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT - 1));
-	index = ALIGN_DOWN(shampo->pi, MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT);
+	ksm_entries += (shampo->pi & (MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT - 1));
+	index = ALIGN_DOWN(shampo->pi, MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT);
 	entries_before = shampo->hd_per_wq - index;
 
-	if (unlikely(entries_before < klm_entries))
-		num_wqe = DIV_ROUND_UP(entries_before, max_klm_entries) +
-			  DIV_ROUND_UP(klm_entries - entries_before, max_klm_entries);
+	if (unlikely(entries_before < ksm_entries))
+		num_wqe = DIV_ROUND_UP(entries_before, max_ksm_entries) +
+			  DIV_ROUND_UP(ksm_entries - entries_before, max_ksm_entries);
 	else
-		num_wqe = DIV_ROUND_UP(klm_entries, max_klm_entries);
+		num_wqe = DIV_ROUND_UP(ksm_entries, max_ksm_entries);
 
 	for (i = 0; i < num_wqe; i++) {
-		len = (klm_entries > max_klm_entries) ? max_klm_entries :
-							klm_entries;
+		len = (ksm_entries > max_ksm_entries) ? max_ksm_entries :
+							ksm_entries;
 		if (unlikely(index + len > shampo->hd_per_wq))
 			len = shampo->hd_per_wq - index;
 		err = mlx5e_build_shampo_hd_umr(rq, sq, len, index);
 		if (unlikely(err))
 			return err;
 		index = (index + len) & (rq->mpwqe.shampo->hd_per_wq - 1);
-		klm_entries -= len;
+		ksm_entries -= len;
 	}
 
 	return 0;
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index d7bb31d9a446..da09bfaa7b81 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -294,6 +294,7 @@ enum {
 #define MLX5_UMR_FLEX_ALIGNMENT 0x40
 #define MLX5_UMR_MTT_NUM_ENTRIES_ALIGNMENT (MLX5_UMR_FLEX_ALIGNMENT / sizeof(struct mlx5_mtt))
 #define MLX5_UMR_KLM_NUM_ENTRIES_ALIGNMENT (MLX5_UMR_FLEX_ALIGNMENT / sizeof(struct mlx5_klm))
+#define MLX5_UMR_KSM_NUM_ENTRIES_ALIGNMENT (MLX5_UMR_FLEX_ALIGNMENT / sizeof(struct mlx5_ksm))
 
 #define MLX5_USER_INDEX_LEN (MLX5_FLD_SZ_BYTES(qpc, user_index) * 8)
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 13/14] net/mlx5e: SHAMPO, Re-enable HW-GRO
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (11 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 12/14] net/mlx5e: SHAMPO, Use KSMs instead of KLMs Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-03 21:22 ` [PATCH net-next V2 14/14] net/mlx5e: SHAMPO, Coalesce skb fragments to page size Tariq Toukan
  2024-06-06  3:30 ` [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more patchwork-bot+netdevbpf
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Yoray Zack,
	Tariq Toukan

From: Yoray Zack <yorayz@nvidia.com>

Add back HW-GRO to the reported features.

As the current implementation of HW-GRO uses KSMs with a
specific fixed buffer size (256B) to map its headers buffer,
we reported the feature only if the NIC is supporting KSM and
the minimum value for buffer size is below the requested one.

iperf3 bandwidth comparison:
+---------+--------+--------+-----------+
| streams | SW GRO | HW GRO | Unit      |
|---------+--------+--------+-----------|
| 1       | 36     | 42     | Gbits/sec |
| 4       | 34     | 39     | Gbits/sec |
| 8       | 31     | 35     | Gbits/sec |
+---------+--------+--------+-----------+

A downstream patch will add skb fragment coalescing which will improve
performance considerably.

Benchmark details:
VM based setup
CPU: Intel(R) Xeon(R) Platinum 8380 CPU, 24 cores
NIC: ConnectX-7 100GbE
iperf3 and irq running on same CPU over a single receive queue

Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 26 +++++++++++++++++++
 include/linux/mlx5/mlx5_ifc.h                 | 16 ++++++++----
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2a3e0de51f0e..44a64d062e42 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -74,6 +74,27 @@
 #include "lib/devcom.h"
 #include "lib/sd.h"
 
+static bool mlx5e_hw_gro_supported(struct mlx5_core_dev *mdev)
+{
+	if (!MLX5_CAP_GEN(mdev, shampo))
+		return false;
+
+	/* Our HW-GRO implementation relies on "KSM Mkey" for
+	 * SHAMPO headers buffer mapping
+	 */
+	if (!MLX5_CAP_GEN(mdev, fixed_buffer_size))
+		return false;
+
+	if (!MLX5_CAP_GEN_2(mdev, min_mkey_log_entity_size_fixed_buffer_valid))
+		return false;
+
+	if (MLX5_CAP_GEN_2(mdev, min_mkey_log_entity_size_fixed_buffer) >
+	    MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE)
+		return false;
+
+	return true;
+}
+
 bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev, u8 page_shift,
 					    enum mlx5e_mpwrq_umr_mode umr_mode)
 {
@@ -5331,6 +5352,11 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_FILTER;
 	netdev->hw_features      |= NETIF_F_HW_VLAN_STAG_TX;
 
+	if (mlx5e_hw_gro_supported(mdev) &&
+	    mlx5e_check_fragmented_striding_rq_cap(mdev, PAGE_SHIFT,
+						   MLX5E_MPWRQ_UMR_MODE_ALIGNED))
+		netdev->hw_features    |= NETIF_F_GRO_HW;
+
 	if (mlx5e_tunnel_any_tx_proto_supported(mdev)) {
 		netdev->hw_enc_features |= NETIF_F_HW_CSUM;
 		netdev->hw_enc_features |= NETIF_F_TSO;
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 5df52e15f7d6..17acd0f3ca8e 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1526,8 +1526,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         ts_cqe_to_dest_cqn[0x1];
 	u8         reserved_at_b3[0x6];
 	u8         go_back_n[0x1];
-	u8         shampo[0x1];
-	u8         reserved_at_bb[0x5];
+	u8         reserved_at_ba[0x6];
 
 	u8         max_sgl_for_optimized_performance[0x8];
 	u8         log_max_cq_sz[0x8];
@@ -1744,7 +1743,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_280[0x10];
 	u8         max_wqe_sz_sq[0x10];
 
-	u8         reserved_at_2a0[0x10];
+	u8         reserved_at_2a0[0xb];
+	u8         shampo[0x1];
+	u8         reserved_at_2ac[0x4];
 	u8         max_wqe_sz_rq[0x10];
 
 	u8         max_flow_counter_31_16[0x10];
@@ -2017,7 +2018,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   reserved_at_250[0x10];
 
 	u8	   reserved_at_260[0x120];
-	u8	   reserved_at_380[0x10];
+	u8	   reserved_at_380[0xb];
+	u8	   min_mkey_log_entity_size_fixed_buffer[0x5];
 	u8	   ec_vf_vport_base[0x10];
 
 	u8	   reserved_at_3a0[0x10];
@@ -2029,7 +2031,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   pcc_ifa2[0x1];
 	u8	   reserved_at_3f1[0xf];
 
-	u8	   reserved_at_400[0x400];
+	u8	   reserved_at_400[0x1];
+	u8	   min_mkey_log_entity_size_fixed_buffer_valid[0x1];
+	u8	   reserved_at_402[0x1e];
+
+	u8	   reserved_at_420[0x3e0];
 };
 
 enum mlx5_ifc_flow_destination_type {
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 14/14] net/mlx5e: SHAMPO, Coalesce skb fragments to page size
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (12 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 13/14] net/mlx5e: SHAMPO, Re-enable HW-GRO Tariq Toukan
@ 2024-06-03 21:22 ` Tariq Toukan
  2024-06-06  3:30 ` [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more patchwork-bot+netdevbpf
  14 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-03 21:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea, Tariq Toukan

From: Dragos Tatulea <dtatulea@nvidia.com>

When doing hardware GRO (SHAMPO), the driver puts each data payload of a
packet from the wire into one skb fragment. TCP Zero-Copy expects page
sized skb fragments to be able to do it's page-flipping magic. With the
current way of arranging fragments by the driver, only specific MTUs
(page sized multiple + header size) will yield such page sized fragments
in a high percentage.

This change improves payload arrangement in the skb for hardware GRO by
coalescing payloads into a single skb fragment when possible.

To demonstrate the fix, running tcp_mmap with a MTU of 1500 yields:
- Before:  0 % bytes mmap'ed
- After : 81 % bytes mmap'ed

More importantly, coalescing considerably improves the HW GRO performance.
Here are the results for a iperf3 bandwidth benchmark:
+---------+--------+--------+------------------------+-----------+
| streams | SW GRO | HW GRO | HW GRO with coalescing | Unit      |
|---------+--------+--------+------------------------+-----------|
| 1       | 36     | 42     | 57                     | Gbits/sec |
| 4       | 34     | 39     | 50                     | Gbits/sec |
| 8       | 31     | 35     | 43                     | Gbits/sec |
+---------+--------+--------+------------------------+-----------+

Benchmark details:
VM based setup
CPU: Intel(R) Xeon(R) Platinum 8380 CPU, 24 cores
NIC: ConnectX-7 100GbE
iperf3 and irq running on same CPU over a single receive queue

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index f1fbf60d0356..43f018567faf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -523,15 +523,23 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
 
 static inline void
 mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
-		   struct page *page, u32 frag_offset, u32 len,
+		   struct mlx5e_frag_page *frag_page,
+		   u32 frag_offset, u32 len,
 		   unsigned int truesize)
 {
-	dma_addr_t addr = page_pool_get_dma_addr(page);
+	dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
+	u8 next_frag = skb_shinfo(skb)->nr_frags;
 
 	dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
 				rq->buff.map_dir);
-	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-			page, frag_offset, len, truesize);
+
+	if (skb_can_coalesce(skb, next_frag, frag_page->page, frag_offset)) {
+		skb_coalesce_rx_frag(skb, next_frag - 1, len, truesize);
+	} else {
+		frag_page->frags++;
+		skb_add_rx_frag(skb, next_frag, frag_page->page,
+				frag_offset, len, truesize);
+	}
 }
 
 static inline void
@@ -1956,8 +1964,7 @@ mlx5e_shampo_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
 		u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - data_offset, data_bcnt);
 		unsigned int truesize = pg_consumed_bytes;
 
-		frag_page->frags++;
-		mlx5e_add_skb_frag(rq, skb, frag_page->page, data_offset,
+		mlx5e_add_skb_frag(rq, skb, frag_page, data_offset,
 				   pg_consumed_bytes, truesize);
 
 		data_bcnt -= pg_consumed_bytes;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 02/14] net/mlx5e: SHAMPO, Fix incorrect page release
  2024-06-03 21:22 ` [PATCH net-next V2 02/14] net/mlx5e: SHAMPO, Fix incorrect page release Tariq Toukan
@ 2024-06-06  3:20   ` Jakub Kicinski
  2024-06-06  5:32     ` Tariq Toukan
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2024-06-06  3:20 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, netdev,
	Saeed Mahameed, Gal Pressman, Leon Romanovsky, Dragos Tatulea

On Tue, 4 Jun 2024 00:22:07 +0300 Tariq Toukan wrote:
> Fixes: 6f5742846053 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")

Why is this still here?! If the bad code path cannot be hit without
applying patches later in the series, it's not a fix. Let me remove
this for you when applying so y'all don't descend on me for "hating
vendors" :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more
  2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
                   ` (13 preceding siblings ...)
  2024-06-03 21:22 ` [PATCH net-next V2 14/14] net/mlx5e: SHAMPO, Coalesce skb fragments to page size Tariq Toukan
@ 2024-06-06  3:30 ` patchwork-bot+netdevbpf
  14 siblings, 0 replies; 18+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-06-06  3:30 UTC (permalink / raw)
  To: Tariq Toukan; +Cc: davem, kuba, pabeni, edumazet, netdev, saeedm, gal, leonro

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 4 Jun 2024 00:22:05 +0300 you wrote:
> This series enables hardware GRO for ConnectX-7 and newer NICs.
> SHAMPO stands for Split Header And Merge Payload Offload.
> 
> The first part of the series contains important fixes and improvements.
> 
> The second part reworks the HW GRO counters.
> 
> [...]

Here is the summary with links:
  - [net-next,V2,01/14] net/mlx5e: SHAMPO, Use net_prefetch API
    https://git.kernel.org/netdev/net-next/c/4e92d247418c
  - [net-next,V2,02/14] net/mlx5e: SHAMPO, Fix incorrect page release
    https://git.kernel.org/netdev/net-next/c/70bd03b89f20
  - [net-next,V2,03/14] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink
    https://git.kernel.org/netdev/net-next/c/fba8334721e2
  - [net-next,V2,04/14] net/mlx5e: SHAMPO, Fix FCS config when HW GRO on
    https://git.kernel.org/netdev/net-next/c/a64bbd8c286f
  - [net-next,V2,05/14] net/mlx5e: SHAMPO, Disable gso_size for non GRO packets
    https://git.kernel.org/netdev/net-next/c/083dbb54c480
  - [net-next,V2,06/14] net/mlx5e: SHAMPO, Simplify header page release in teardown
    https://git.kernel.org/netdev/net-next/c/e839ac9a89cb
  - [net-next,V2,07/14] net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data()
    https://git.kernel.org/netdev/net-next/c/d34d7d1973c4
  - [net-next,V2,08/14] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB
    https://git.kernel.org/netdev/net-next/c/f5a699e00f04
  - [net-next,V2,09/14] net/mlx5e: SHAMPO, Make GRO counters more precise
    https://git.kernel.org/netdev/net-next/c/8f9eb8bb5c5a
  - [net-next,V2,10/14] net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter
    https://git.kernel.org/netdev/net-next/c/16f448d47a86
  - [net-next,V2,11/14] net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split
    https://git.kernel.org/netdev/net-next/c/e95c5b9e8912
  - [net-next,V2,12/14] net/mlx5e: SHAMPO, Use KSMs instead of KLMs
    https://git.kernel.org/netdev/net-next/c/758191c9ea7b
  - [net-next,V2,13/14] net/mlx5e: SHAMPO, Re-enable HW-GRO
    https://git.kernel.org/netdev/net-next/c/99be56171fa9
  - [net-next,V2,14/14] net/mlx5e: SHAMPO, Coalesce skb fragments to page size
    https://git.kernel.org/netdev/net-next/c/14ae2fd12be8

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 02/14] net/mlx5e: SHAMPO, Fix incorrect page release
  2024-06-06  3:20   ` Jakub Kicinski
@ 2024-06-06  5:32     ` Tariq Toukan
  0 siblings, 0 replies; 18+ messages in thread
From: Tariq Toukan @ 2024-06-06  5:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, netdev,
	Saeed Mahameed, Gal Pressman, Leon Romanovsky, Dragos Tatulea,
	Tariq Toukan



On 06/06/2024 6:20, Jakub Kicinski wrote:
> On Tue, 4 Jun 2024 00:22:07 +0300 Tariq Toukan wrote:
>> Fixes: 6f5742846053 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
> 
> Why is this still here?!

It shouldn't, I planned to remove it, but missed it eventually.

> If the bad code path cannot be hit without
> applying patches later in the series, it's not a fix. Let me remove
> this for you when applying 

Yes please. Thanks.

> so y'all don't descend on me for "hating
> vendors" :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-06-06  5:32 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-03 21:22 [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 01/14] net/mlx5e: SHAMPO, Use net_prefetch API Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 02/14] net/mlx5e: SHAMPO, Fix incorrect page release Tariq Toukan
2024-06-06  3:20   ` Jakub Kicinski
2024-06-06  5:32     ` Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 03/14] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 04/14] net/mlx5e: SHAMPO, Fix FCS config when HW GRO on Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 05/14] net/mlx5e: SHAMPO, Disable gso_size for non GRO packets Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 06/14] net/mlx5e: SHAMPO, Simplify header page release in teardown Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 07/14] net/mlx5e: SHAMPO, Specialize mlx5e_fill_skb_data() Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 08/14] net/mlx5e: SHAMPO, Skipping on duplicate flush of the same SHAMPO SKB Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 09/14] net/mlx5e: SHAMPO, Make GRO counters more precise Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 10/14] net/mlx5e: SHAMPO, Drop rx_gro_match_packets counter Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 11/14] net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 12/14] net/mlx5e: SHAMPO, Use KSMs instead of KLMs Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 13/14] net/mlx5e: SHAMPO, Re-enable HW-GRO Tariq Toukan
2024-06-03 21:22 ` [PATCH net-next V2 14/14] net/mlx5e: SHAMPO, Coalesce skb fragments to page size Tariq Toukan
2024-06-06  3:30 ` [PATCH net-next V2 00/14] net/mlx5e: SHAMPO, Enable HW GRO once more patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).