From: Moshe Shemesh <moshe@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>, <tariqt@nvidia.com>
Cc: <edumazet@google.com>, <pabeni@redhat.com>,
<andrew+netdev@lunn.ch>, <davem@davemloft.net>,
<saeedm@nvidia.com>, <leon@kernel.org>, <mbloch@nvidia.com>,
<agoldberger@nvidia.com>, <netdev@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<gal@nvidia.com>, <dtatulea@nvidia.com>
Subject: Re: [PATCH net-next 3/3] net/mlx5: Add VHCA_ID page management mode support
Date: Mon, 4 May 2026 14:41:33 +0300 [thread overview]
Message-ID: <8a7035c2-72a8-4b60-b7a3-8e36787774e8@nvidia.com> (raw)
In-Reply-To: <20260503014520.4098577-1-kuba@kernel.org>
On 5/3/2026 4:45 AM, Jakub Kicinski wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> Subject: net/mlx5: Add VHCA_ID page management mode support
>
> This commit adds support for VHCA_ID-based page management mode. It
> updates page management commands to conditionally use vhca_id and adds
> a function to resolve the function type via vport lookup.
>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> @@ -852,6 +852,48 @@ bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
>> return true;
>> }
>>
>> +u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
>> +{
>> + struct mlx5_eswitch *esw = dev->priv.eswitch;
>> + struct mlx5_vport *vport;
>> + unsigned long i;
>> + u16 type;
>> +
>> + if (vhca_id == MLX5_CAP_GEN(dev, vhca_id))
>> + return MLX5_SELF;
>> +
>> + if (!esw)
>> + return MLX5_FUNC_TYPE_NONE;
>> +
>> + mutex_lock(&esw->state_lock);
>
> Can this race with driver unload?
>
> Since the page allocator workqueue (pg_wq) is asynchronous and destroyed
> after the eswitch structure is freed, could a pending page request evaluate
> esw as valid just before it's freed, and then attempt to lock it here?
>
> Also, is there a potential deadlock when acquiring esw->state_lock here?
>
> If a thread holds the lock while sending a firmware command, and the
> firmware issues a page request to complete it, the pg_wq worker will block
> here waiting for the lock. Would this prevent the firmware command from
> completing?
In give_pages(), mlx5_esw_vhca_id_to_func_type() is called only after
the firmware has received the pages. Following the comments on the other
mail, I will use caching instead of taking the lock.
>
> [ ... ]
>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> [ ... ]
>> -static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>> +static int give_pages(struct mlx5_core_dev *dev, u16 func_vhca_id, int npages,
>> int event, bool ec_function)
>> {
>> - u32 function = get_function(func_id, ec_function);
>> + u32 function = get_function_key(dev, func_vhca_id, ec_function);
>> u32 out[MLX5_ST_SZ_DW(manage_pages_out)] = {0};
>> int inlen = MLX5_ST_SZ_BYTES(manage_pages_in);
>
> This isn't a new issue introduced by this patch, but does the npages
> calculation risk an integer overflow?
>
> The npages parameter is derived from the firmware payload, and the
> allocation size is calculated as:
>
> inlen += npages * MLX5_FLD_SZ_BYTES(manage_pages_in, pas[0]);
>
> If a compromised firmware provides a massive npages value, could this
> overflow the 32-bit inlen and result in an undersized kvzalloc allocation,
> followed by an out-of-bounds write in the subsequent loop?
Technically yes, but as Sashiko writes here, it is not relevant to this
patch or this series. It could be addressed in a separate hardening patch.
>
> [ ... ]
>
>> @@ -577,8 +626,9 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>> if (nclaimed)
>> *nclaimed = num_claimed;
>>
>> - func_type = func_id_to_type(dev, func_id, ec_function);
>> - dev->priv.page_counters[func_type] -= num_claimed;
>> + func_type = func_vhca_id_to_type(dev, func_vhca_id, ec_function);
>> + if (func_type != MLX5_FUNC_TYPE_NONE)
>> + dev->priv.page_counters[func_type] -= num_claimed;
>
> Does this dynamically look up the vport during page reclaim?
>
> If an SF or VF has been destroyed and its vport removed, the lookup will
> return MLX5_FUNC_TYPE_NONE. Will this cause the decrement to be skipped,
> leading to a counter leak and triggering the WARN() statements in
> mlx5_reclaim_startup_pages() during driver unload?
VF and SF vports are removed only after their pages are reclaimed, but as
I already wrote on the other mail, I will try caching.
>
> [ ... ]
>
>> @@ -658,30 +708,101 @@ static int req_pages_handler(struct notifier_block *nb,
> [ ... ]
>> + } else {
>> + if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode) ==
>> + MLX5_ID_MODE_FUNCTION_VHCA_ID) {
>> + err = mlx5_pagealloc_migrate_pf_to_vhca_id(dev);
>> + if (err)
>> + return err;
>> + mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_VHCA_ID);
>> + }
>> + }
>
> What happens to the device state if mlx5_pagealloc_migrate_pf_to_vhca_id()
> fails?
>
> The firmware has already been instructed to use VHCA_ID mode during driver
> load, but if the migration fails, the function returns early and leaves
> dev->priv.page_mgt_mode as FUNC_ID.
>
> When the initialization aborts and mlx5_reclaim_startup_pages() runs, will
> the driver send reclaim commands using the old ID, causing the firmware
> to reject them and leak the allocated DMA memory?
mlx5_pagealloc_migrate_pf_to_vhca_id() can fail only on xa_insert() of
the new key, i.e. an OOM condition where we couldn't set the new key. If
it does fail, the device initialization fails entirely anyway.