public inbox for netdev@vger.kernel.org
From: Tariq Toukan <tariqt@nvidia.com>
To: Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Richard Cochran <richardcochran@gmail.com>,
	<netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <bpf@vger.kernel.org>,
	Gal Pressman <gal@nvidia.com>,
	Dragos Tatulea <dtatulea@nvidia.com>,
	Cosmin Ratiu <cratiu@nvidia.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	David Wei <dw@davidwei.uk>
Subject: [PATCH net-next 15/15] net/mlx5e: SHAMPO, Allow high order pages in zerocopy mode
Date: Mon, 23 Feb 2026 22:41:55 +0200
Message-ID: <20260223204155.1783580-16-tariqt@nvidia.com>
In-Reply-To: <20260223204155.1783580-1-tariqt@nvidia.com>

From: Dragos Tatulea <dtatulea@nvidia.com>

Allow high order pages only when SHAMPO mode (hw-gro) is enabled and the
queue is used for zerocopy (has memory provider ops set). The limit is
128K, chosen for the following reasons:
- A 256K page size requires a special case during MTT calculation to
  split the page in two, because two MTTs are needed to form an octword.
- Larger sizes require increasing the WQE size and/or reducing the
  number of WQEs.
- Lining the RQ with too few large pages can lead to refill issues.

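Results show higher bandwidth and lower CPU usage: throughput rises by
about 12.6% on-CPU (13621 -> 15335 MB/s) and about 19.1% off-CPU
(14033 -> 16707 MB/s), while %sys on the application CPU drops in both
cases. The benchmark was done with the zcrx samples from liburing [0].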

rx_buf_len=4K, oncpu [1]:
packets=3358832 (MB=820027), rps=55794 (MB/s=13621)
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:       9    1.56    0.00   18.09   13.42    0.00   66.80    0.00    0.00    0.00    0.12

rx_buf_len=128K, oncpu [2]:
packets=3781376 (MB=923187), rps=62813 (MB/s=15335)
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:       9    0.33    0.00    7.61   18.86    0.00   73.08    0.00    0.00    0.00    0.12

rx_buf_len=4K, offcpu [3]:
packets=3460368 (MB=844816), rps=57481 (MB/s=14033)
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:       9    0.00    0.00    0.26    0.00    0.00   92.63    0.00    0.00    0.00    7.11
Average:      11    3.04    0.00   68.09   28.87    0.00    0.00    0.00    0.00    0.00    0.00

rx_buf_len=128K, offcpu [4]:
packets=4119840 (MB=1005820), rps=68435 (MB/s=16707)
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:       9    0.00    0.00    0.87    0.00    0.00   63.77    0.00    0.00    0.00   35.36
Average:      11    1.96    0.00   43.68   54.37    0.00    0.00    0.00    0.00    0.00    0.00

[0] https://github.com/isilence/liburing/tree/zcrx/rx-buf-len

[1] commands:
  $> taskset -c 9 ./zcrx 6 -i eth2 -q 9 -A 1 -B 4096 -S 33554432
  $> ./send-zerocopy tcp -6 -D 2001:db8::1 -t 60 -C 0 -l 1 -b 1 -n 1 -z 1 -d -s 256000

[2] commands:
  $> taskset -c 9 ./zcrx 6 -i eth2 -q 9 -A 1 -B 131072 -S 33554432
  $> ./send-zerocopy tcp -6 -D 2001:db8::1 -t 60 -C 0 -l 1 -b 1 -n 1 -z 1 -d -s 256000

[3] commands:
  $> taskset -c 11 ./zcrx 6 -i eth2 -q 9 -A 1 -B 4096 -S 33554432
  $> ./send-zerocopy tcp -6 -D 2001:db8::1 -t 60 -C 0 -l 1 -b 1 -n 1 -z 1 -d -s 256000

[4] commands:
  $> taskset -c 11 ./zcrx 6 -i eth2 -q 9 -A 1 -B 131072 -S 33554432
  $> ./send-zerocopy tcp -6 -D 2001:db8::1 -t 60 -C 0 -l 1 -b 1 -n 1 -z 1 -d -s 256000

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 36 ++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)
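
For readers who want to try the acceptance rules outside the driver,
below is a minimal userspace sketch of the checks this patch adds. It is
not the driver code: it folds the checks from
mlx5e_queue_validate_qcfg() and mlx5e_queue_validate_page_size() into
one hypothetical function, sketch_validate(). It assumes a 4K PAGE_SIZE
and a fixed 128K maximum, whereas the driver queries
mlx5e_mpwrq_max_page_size(). The zerocopy_queue flag stands in for
netif_rxq_has_unreadable_mp().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for illustration only, not taken from the driver. */
#define SKETCH_PAGE_SIZE     4096u           /* 4K base page */
#define SKETCH_MAX_PAGE_SIZE (128u * 1024u)  /* 128K limit from the commit message */

/* True when n has exactly one bit set. */
static bool sketch_is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper mirroring the order of the checks in the patch:
 * power of 2, then range, then "high order only for zerocopy queues".
 * Returns 0 when the page size would be accepted.
 */
static int sketch_validate(uint32_t rx_page_size, bool zerocopy_queue)
{
	if (!sketch_is_power_of_2(rx_page_size))
		return -EINVAL;

	if (rx_page_size < SKETCH_PAGE_SIZE ||
	    rx_page_size > SKETCH_MAX_PAGE_SIZE)
		return -ERANGE;

	/* The base page size is always fine; anything larger needs zerocopy. */
	if (rx_page_size != SKETCH_PAGE_SIZE && !zerocopy_queue)
		return -EINVAL;

	return 0;
}
```

Under these assumptions, sketch_validate(131072, true) returns 0,
sketch_validate(131072, false) and sketch_validate(12288, true) return
-EINVAL, and sketch_validate(262144, true) returns -ERANGE.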

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 59e38e7e067e..67dc38981101 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5586,12 +5586,40 @@ static int mlx5e_queue_validate_qcfg(struct net_device *dev,
 				     struct netdev_queue_config *qcfg,
 				     struct netlink_ext_ack *extack)
 {
-	if (qcfg->rx_page_size != PAGE_SIZE)
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5_core_dev *mdev = priv->mdev;
+	u32 max;
+
+	if (!is_power_of_2(qcfg->rx_page_size)) {
+		netdev_err(priv->netdev, "rx_page_size not power of 2: %u\n",
+			   qcfg->rx_page_size);
 		return -EINVAL;
+	}
+
+	max = mlx5e_mpwrq_max_page_size(mdev);
+	if (qcfg->rx_page_size < PAGE_SIZE || qcfg->rx_page_size > max) {
+		netdev_err(priv->netdev,
+			   "Selected rx_page_size %u not in supported range [%lu, %u]\n",
+			   qcfg->rx_page_size, PAGE_SIZE, max);
+		return -ERANGE;
+	}
 
 	return 0;
 }
 
+static bool mlx5e_queue_validate_page_size(struct net_device *dev,
+					   struct netdev_queue_config *qcfg,
+					   int queue_index)
+{
+	if (qcfg->rx_page_size == PAGE_SIZE)
+		return true;
+
+	if (!netif_rxq_has_unreadable_mp(dev, queue_index))
+		return false;
+
+	return true;
+}
+
 static int mlx5e_queue_mem_alloc(struct net_device *dev,
 				 struct netdev_queue_config *qcfg,
 				 void *newq, int queue_index)
@@ -5623,6 +5651,12 @@ static int mlx5e_queue_mem_alloc(struct net_device *dev,
 		goto unlock;
 	}
 
+	if (!mlx5e_queue_validate_page_size(dev, qcfg, queue_index)) {
+		netdev_err(priv->netdev, "High order pages are supported only in Zero-Copy mode\n");
+		err = -EINVAL;
+		goto unlock;
+	}
+
 	err = mlx5e_open_channel(priv, queue_index, &params, qcfg, NULL,
 				 &new->c);
 unlock:
-- 
2.44.0


Thread overview: 17+ messages
2026-02-23 20:41 [PATCH net-next 00/15] net/mlx5e: SHAMPO, Allow high order pages in zerocopy mode Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 01/15] net/mlx5e: Make mlx5e_rq_param naming consistent Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 02/15] net/mlx5e: Extract striding rq param calculation in function Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 03/15] net/mlx5e: Extract max_xsk_wqebbs into its own function Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 04/15] net/mlx5e: Expose and rename xsk channel parameter function Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 05/15] net/mlx5e: Alloc xsk channel param out of mlx5e_open_xsk() Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 06/15] net/mlx5e: Move xsk param into new option container struct Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 07/15] net/mlx5e: Drop unused channel parameters Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 08/15] net/mlx5e: SHAMPO, Always calculate page size Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 09/15] net/mlx5e: Set page_pool order based on calculated page_shift Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 10/15] net/mlx5e: Alloc rq drop page " Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 11/15] net/mlx5e: RX, Make page frag bias more robust Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 12/15] net/mlx5e: Add queue config ops for page size Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 13/15] net/mlx5e: Pass netdev queue config to param calculations Tariq Toukan
2026-02-23 20:41 ` [PATCH net-next 14/15] net/mlx5e: Add param helper to calculate max page size Tariq Toukan
2026-02-23 20:41 ` Tariq Toukan [this message]
2026-02-26 10:10 ` [PATCH net-next 00/15] net/mlx5e: SHAMPO, Allow high order pages in zerocopy mode patchwork-bot+netdevbpf
