From: Moshe Shemesh <moshe@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>, <tariqt@nvidia.com>
Cc: <edumazet@google.com>, <pabeni@redhat.com>,
<andrew+netdev@lunn.ch>, <davem@davemloft.net>,
<saeedm@nvidia.com>, <leon@kernel.org>, <mbloch@nvidia.com>,
<agoldberger@nvidia.com>, <netdev@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<gal@nvidia.com>, <dtatulea@nvidia.com>
Subject: Re: [PATCH net-next 3/3] net/mlx5: Add VHCA_ID page management mode support
Date: Mon, 4 May 2026 14:41:33 +0300 [thread overview]
Message-ID: <8a7035c2-72a8-4b60-b7a3-8e36787774e8@nvidia.com> (raw)
In-Reply-To: <20260503014520.4098577-1-kuba@kernel.org>
On 5/3/2026 4:45 AM, Jakub Kicinski wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> Subject: net/mlx5: Add VHCA_ID page management mode support
>
> This commit adds support for VHCA_ID-based page management mode. It
> updates page management commands to conditionally use vhca_id and adds
> a function to resolve the function type via vport lookup.
>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> @@ -852,6 +852,48 @@ bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
>> return true;
>> }
>>
>> +u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
>> +{
>> + struct mlx5_eswitch *esw = dev->priv.eswitch;
>> + struct mlx5_vport *vport;
>> + unsigned long i;
>> + u16 type;
>> +
>> + if (vhca_id == MLX5_CAP_GEN(dev, vhca_id))
>> + return MLX5_SELF;
>> +
>> + if (!esw)
>> + return MLX5_FUNC_TYPE_NONE;
>> +
>> + mutex_lock(&esw->state_lock);
>
> Can this race with driver unload?
>
> Since the page allocator workqueue (pg_wq) is asynchronous and destroyed
> after the eswitch structure is freed, could a pending page request evaluate
> esw as valid just before it's freed, and then attempt to lock it here?
>
> Also, is there a potential deadlock when acquiring esw->state_lock here?
>
> If a thread holds the lock while sending a firmware command, and the
> firmware issues a page request to complete it, the pg_wq worker will block
> here waiting for the lock. Would this prevent the firmware command from
> completing?
In give_pages(), mlx5_esw_vhca_id_to_func_type() is called only after
the firmware has received the pages. Following the comments on the other
mail, I will use caching instead of taking the lock.
>
> [ ... ]
>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> [ ... ]
>> -static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>> +static int give_pages(struct mlx5_core_dev *dev, u16 func_vhca_id, int npages,
>> int event, bool ec_function)
>> {
>> - u32 function = get_function(func_id, ec_function);
>> + u32 function = get_function_key(dev, func_vhca_id, ec_function);
>> u32 out[MLX5_ST_SZ_DW(manage_pages_out)] = {0};
>> int inlen = MLX5_ST_SZ_BYTES(manage_pages_in);
>
> This isn't a new issue introduced by this patch, but does the npages
> calculation risk an integer overflow?
>
> The npages parameter is derived from the firmware payload, and the
> allocation size is calculated as:
>
> inlen += npages * MLX5_FLD_SZ_BYTES(manage_pages_in, pas[0]);
>
> If a compromised firmware provides a massive npages value, could this
> overflow the 32-bit inlen and result in an undersized kvzalloc allocation,
> followed by an out-of-bounds write in the subsequent loop?
Technically yes, but as Sashiko writes here, it is not relevant to this
patch or this series. It could be addressed in a separate hardening patch.
>
> [ ... ]
>
>> @@ -577,8 +626,9 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>> if (nclaimed)
>> *nclaimed = num_claimed;
>>
>> - func_type = func_id_to_type(dev, func_id, ec_function);
>> - dev->priv.page_counters[func_type] -= num_claimed;
>> + func_type = func_vhca_id_to_type(dev, func_vhca_id, ec_function);
>> + if (func_type != MLX5_FUNC_TYPE_NONE)
>> + dev->priv.page_counters[func_type] -= num_claimed;
>
> Does this dynamically look up the vport during page reclaim?
>
> If an SF or VF has been destroyed and its vport removed, the lookup will
> return MLX5_FUNC_TYPE_NONE. Will this cause the decrement to be skipped,
> leading to a counter leak and triggering the WARN() statements in
> mlx5_reclaim_startup_pages() during driver unload?
VF and SF vports are removed only after their pages are reclaimed, but as
I already wrote on the other mail, I will try caching.
>
> [ ... ]
>
>> @@ -658,30 +708,101 @@ static int req_pages_handler(struct notifier_block *nb,
> [ ... ]
>> + } else {
>> + if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode) ==
>> + MLX5_ID_MODE_FUNCTION_VHCA_ID) {
>> + err = mlx5_pagealloc_migrate_pf_to_vhca_id(dev);
>> + if (err)
>> + return err;
>> + mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_VHCA_ID);
>> + }
>> + }
>
> What happens to the device state if mlx5_pagealloc_migrate_pf_to_vhca_id()
> fails?
>
> The firmware has already been instructed to use VHCA_ID mode during driver
> load, but if the migration fails, the function returns early and leaves
> dev->priv.page_mgt_mode as FUNC_ID.
>
> When the initialization aborts and mlx5_reclaim_startup_pages() runs, will
> the driver send reclaim commands using the old ID, causing the firmware
> to reject them and leak the allocated DMA memory?
mlx5_pagealloc_migrate_pf_to_vhca_id() can fail only on xa_insert() of
the new key, i.e. an OOM condition where we couldn't set the new key. If
it does fail, the device initialization fails entirely anyway.