public inbox for linux-kernel@vger.kernel.org
From: Jakub Kicinski <kuba@kernel.org>
To: tariqt@nvidia.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	edumazet@google.com, pabeni@redhat.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, saeedm@nvidia.com, leon@kernel.org,
	mbloch@nvidia.com, moshe@nvidia.com, agoldberger@nvidia.com,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, gal@nvidia.com,
	dtatulea@nvidia.com
Subject: Re: [PATCH net-next 3/3] net/mlx5: Add VHCA_ID page management mode support
Date: Sat,  2 May 2026 18:45:20 -0700	[thread overview]
Message-ID: <20260503014520.4098577-1-kuba@kernel.org> (raw)
In-Reply-To: <20260501044156.260875-4-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: net/mlx5: Add VHCA_ID page management mode support

This commit adds support for VHCA_ID-based page management mode. It
updates page management commands to conditionally use vhca_id and adds
a function to resolve the function type via vport lookup.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -852,6 +852,48 @@ bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
>  	return true;
>  }
>  
> +u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
> +{
> +	struct mlx5_eswitch *esw = dev->priv.eswitch;
> +	struct mlx5_vport *vport;
> +	unsigned long i;
> +	u16 type;
> +
> +	if (vhca_id == MLX5_CAP_GEN(dev, vhca_id))
> +		return MLX5_SELF;
> +
> +	if (!esw)
> +		return MLX5_FUNC_TYPE_NONE;
> +
> +	mutex_lock(&esw->state_lock);

Can this race with driver unload?

Since the page allocator workqueue (pg_wq) runs asynchronously and is
destroyed only after the eswitch structure is freed, could a pending page
request observe a non-NULL esw just before it is freed, and then attempt
to take state_lock inside freed memory (a use-after-free)?

Also, is there a potential deadlock when acquiring esw->state_lock here?

If a thread holds the lock while sending a firmware command, and the
firmware issues a page request to complete it, the pg_wq worker will block
here waiting for the lock. Would this prevent the firmware command from
completing?

[ ... ]

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
[ ... ]
> -static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
> +static int give_pages(struct mlx5_core_dev *dev, u16 func_vhca_id, int npages,
>  		      int event, bool ec_function)
>  {
> -	u32 function = get_function(func_id, ec_function);
> +	u32 function = get_function_key(dev, func_vhca_id, ec_function);
>  	u32 out[MLX5_ST_SZ_DW(manage_pages_out)] = {0};
>  	int inlen = MLX5_ST_SZ_BYTES(manage_pages_in);

This isn't a new issue introduced by this patch, but does the npages
calculation risk an integer overflow?

The npages parameter is derived from the firmware payload, and the
allocation size is calculated as:

inlen += npages * MLX5_FLD_SZ_BYTES(manage_pages_in, pas[0]);

If a compromised firmware provides a massive npages value, could this
overflow the 32-bit inlen and result in an undersized kvzalloc allocation,
followed by an out-of-bounds write in the subsequent loop?

[ ... ]

> @@ -577,8 +626,9 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>  	if (nclaimed)
>  		*nclaimed = num_claimed;
>  
> -	func_type = func_id_to_type(dev, func_id, ec_function);
> -	dev->priv.page_counters[func_type] -= num_claimed;
> +	func_type = func_vhca_id_to_type(dev, func_vhca_id, ec_function);
> +	if (func_type != MLX5_FUNC_TYPE_NONE)
> +		dev->priv.page_counters[func_type] -= num_claimed;

Does this dynamically look up the vport during page reclaim?

If an SF or VF has been destroyed and its vport removed, the lookup will
return MLX5_FUNC_TYPE_NONE. Will this cause the decrement to be skipped,
leading to a counter leak and triggering the WARN() statements in
mlx5_reclaim_startup_pages() during driver unload?

[ ... ]

> @@ -658,30 +708,101 @@ static int req_pages_handler(struct notifier_block *nb,
[ ... ]
> +	} else {
> +		if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode) ==
> +		    MLX5_ID_MODE_FUNCTION_VHCA_ID) {
> +			err = mlx5_pagealloc_migrate_pf_to_vhca_id(dev);
> +			if (err)
> +				return err;
> +			mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_VHCA_ID);
> +		}
> +	}

What happens to the device state if mlx5_pagealloc_migrate_pf_to_vhca_id()
fails?

The firmware has already been instructed to use VHCA_ID mode during driver
load, but if the migration fails, the function returns early and leaves
dev->priv.page_mgt_mode as FUNC_ID.

When the initialization aborts and mlx5_reclaim_startup_pages() runs, will
the driver send reclaim commands using the old ID, causing the firmware
to reject them and leak the allocated DMA memory?


Thread overview: 8+ messages
2026-05-01  4:41 [PATCH net-next 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
2026-05-01  4:41 ` [PATCH net-next 1/3] net/mlx5: Relax capability check for eswitch query paths Tariq Toukan
2026-05-01  4:41 ` [PATCH net-next 2/3] net/mlx5: Make debugfs page counters by function type dynamic Tariq Toukan
2026-05-01  4:41 ` [PATCH net-next 3/3] net/mlx5: Add VHCA_ID page management mode support Tariq Toukan
2026-05-03  1:45   ` Jakub Kicinski
2026-05-04 11:33     ` Moshe Shemesh
2026-05-03  1:45   ` Jakub Kicinski [this message]
2026-05-04 11:41     ` Moshe Shemesh
