From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
To: Tariq Toukan <tariqt@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
Gal Pressman <gal@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>,
Simon Horman <horms@kernel.org>,
Donald Hunter <donald.hunter@gmail.com>,
Jiri Pirko <jiri@resnulli.us>, Jonathan Corbet <corbet@lwn.net>,
Leon Romanovsky <leon@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Richard Cochran <richardcochran@gmail.com>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-rdma@vger.kernel.org, bpf@vger.kernel.org,
William Tu <witu@nvidia.com>
Subject: Re: [PATCH net-next 06/15] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps
Date: Mon, 10 Feb 2025 10:47:26 +0100
Message-ID: <Z6nLLsMa4njyLrIV@mev-dev.igk.intel.com>
In-Reply-To: <20250209101716.112774-7-tariqt@nvidia.com>
On Sun, Feb 09, 2025 at 12:17:07PM +0200, Tariq Toukan wrote:
> From: William Tu <witu@nvidia.com>
>
> For the ECPF and representors, reduce the max MPWRQ size from 256KB (18)
> to 128KB (17). This prepares the later patch for saving representor
> memory.
>
> With Striding RQ, there is a minimum of 4 MPWQEs. So with 128KB of max
> MPWRQ size, the minimal memory is 4 * 128KB = 512KB. When creating page
> pool, consider 1500 mtu, the minimal page pool size will be 512KB/4KB =
> 128 pages = 256 rx ring entries (2 entries per page).
>
> Before this patch, setting RX ringsize (ethtool -G rx) to 256 causes
> driver to allocate page pool size more than it needs due to max MPWRQ
> is 256KB (18). Ex: 4 * 256KB = 1MB, 1MB/4KB = 256 pages, but actually
> 128 pages is good enough. Reducing the max MPWRQ to 128KB fixes the
> limitation.
>
> Signed-off-by: William Tu <witu@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 --
> .../net/ethernet/mellanox/mlx5/core/en/params.c | 15 +++++++++++----
> 2 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index 979fc56205e1..534fdd27c8de 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -95,8 +95,6 @@ struct page_pool;
> #define MLX5_MPWRQ_DEF_LOG_STRIDE_SZ(mdev) \
> MLX5_MPWRQ_LOG_STRIDE_SZ(mdev, order_base_2(MLX5E_RX_MAX_HEAD))
>
> -#define MLX5_MPWRQ_MAX_LOG_WQE_SZ 18
> -
> /* Keep in sync with mlx5e_mpwrq_log_wqe_sz.
> * These are theoretical maximums, which can be further restricted by
> * capabilities. These values are used for static resource allocations and
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> index 64b62ed17b07..e37d4c202bba 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> @@ -10,6 +10,9 @@
> #include <net/page_pool/types.h>
> #include <net/xdp_sock_drv.h>
>
> +#define MLX5_MPWRQ_MAX_LOG_WQE_SZ 18
> +#define MLX5_REP_MPWRQ_MAX_LOG_WQE_SZ 17
> +
> static u8 mlx5e_mpwrq_min_page_shift(struct mlx5_core_dev *mdev)
> {
> u8 min_page_shift = MLX5_CAP_GEN_2(mdev, log_min_mkey_entity_size);
> @@ -103,18 +106,22 @@ u8 mlx5e_mpwrq_log_wqe_sz(struct mlx5_core_dev *mdev, u8 page_shift,
> enum mlx5e_mpwrq_umr_mode umr_mode)
> {
> u8 umr_entry_size = mlx5e_mpwrq_umr_entry_size(umr_mode);
> - u8 max_pages_per_wqe, max_log_mpwqe_size;
> + u8 max_pages_per_wqe, max_log_wqe_size_calc;
> + u8 max_log_wqe_size_cap;
> u16 max_wqe_size;
>
> /* Keep in sync with MLX5_MPWRQ_MAX_PAGES_PER_WQE. */
> max_wqe_size = mlx5e_get_max_sq_aligned_wqebbs(mdev) * MLX5_SEND_WQE_BB;
> max_pages_per_wqe = ALIGN_DOWN(max_wqe_size - sizeof(struct mlx5e_umr_wqe),
> MLX5_UMR_FLEX_ALIGNMENT) / umr_entry_size;
> - max_log_mpwqe_size = ilog2(max_pages_per_wqe) + page_shift;
> + max_log_wqe_size_calc = ilog2(max_pages_per_wqe) + page_shift;
> +
> + WARN_ON_ONCE(max_log_wqe_size_calc < MLX5E_ORDER2_MAX_PACKET_MTU);
>
> - WARN_ON_ONCE(max_log_mpwqe_size < MLX5E_ORDER2_MAX_PACKET_MTU);
> + max_log_wqe_size_cap = mlx5_core_is_ecpf(mdev) ?
> + MLX5_REP_MPWRQ_MAX_LOG_WQE_SZ : MLX5_MPWRQ_MAX_LOG_WQE_SZ;
>
> - return min_t(u8, max_log_mpwqe_size, MLX5_MPWRQ_MAX_LOG_WQE_SZ);
> + return min_t(u8, max_log_wqe_size_calc, max_log_wqe_size_cap);
Changing the variable name looks like an unnecessary complication, as it is
still used for the same purpose.
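To illustrate, here is a rough standalone model of the function tail that
keeps the original name and only adds the cap. It is a sketch, not driver
code: cap_log_wqe_sz() is a hypothetical helper, and the is_ecpf argument
stands in for mlx5_core_is_ecpf(mdev).

```c
#include <stdbool.h>

#define MLX5_MPWRQ_MAX_LOG_WQE_SZ     18
#define MLX5_REP_MPWRQ_MAX_LOG_WQE_SZ 17

/*
 * Hypothetical standalone model: max_log_mpwqe_size keeps its original
 * name, and only the upper bound depends on the device type.
 */
static unsigned char cap_log_wqe_sz(unsigned char max_log_mpwqe_size,
				    bool is_ecpf)
{
	unsigned char cap = is_ecpf ? MLX5_REP_MPWRQ_MAX_LOG_WQE_SZ :
				      MLX5_MPWRQ_MAX_LOG_WQE_SZ;

	/* Same effect as min_t(u8, max_log_mpwqe_size, cap) in the kernel. */
	return max_log_mpwqe_size < cap ? max_log_mpwqe_size : cap;
}
```

The behavior is the same as in the patch; only the naming differs.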
I remember there were some patches to devlink for supporting changing
the representor parameters. Is this something different, or did you decide
to only change the default value to fix the memory problem? (Sorry, maybe
I missed the devlink series.)
Anyway, looks fine, thanks:
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> }
>
> u8 mlx5e_mpwrq_pages_per_wqe(struct mlx5_core_dev *mdev, u8 page_shift,
> --
> 2.45.0
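
For reference, the page pool arithmetic from the commit message can be
checked with a small standalone C sketch. It is not driver code:
min_pool_pages(), min_ring_entries() and the three constants are
hypothetical names, and the 4 KB page size, the minimum of 4 MPWQEs and
the 2 ring entries per page at 1500 MTU are simply the figures quoted in
the commit message.

```c
/* Hypothetical constants; values taken from the commit message. */
#define MIN_MPWQES       4u  /* Striding RQ minimum number of MPWQEs */
#define PAGE_SHIFT_4K    12u /* assumes 4 KB pages */
#define ENTRIES_PER_PAGE 2u  /* 1500 MTU: 2 rx ring entries per page */

/* Minimal page pool size, in pages, for a given log2 max MPWRQ size. */
static unsigned int min_pool_pages(unsigned int log_wqe_sz)
{
	unsigned long min_bytes = (unsigned long)MIN_MPWQES << log_wqe_sz;

	return (unsigned int)(min_bytes >> PAGE_SHIFT_4K);
}

/* Number of rx ring entries that the minimal pool corresponds to. */
static unsigned int min_ring_entries(unsigned int log_wqe_sz)
{
	return min_pool_pages(log_wqe_sz) * ENTRIES_PER_PAGE;
}
```

Under these assumptions a log size of 17 gives 128 pages (256 ring
entries) and a log size of 18 gives 256 pages, which matches the numbers
in the commit message.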