From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jakub Kicinski <kuba@kernel.org>
To: tariqt@nvidia.com
Cc: edumazet@google.com, pabeni@redhat.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, saeedm@nvidia.com, leon@kernel.org,
	mbloch@nvidia.com, moshe@nvidia.com, agoldberger@nvidia.com,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, gal@nvidia.com, dtatulea@nvidia.com
Subject: Re: [PATCH net-next 3/3] net/mlx5: Add VHCA_ID page management mode support
Date: Sat, 2 May 2026 18:45:20 -0700
Message-ID: <20260503014520.4098577-1-kuba@kernel.org>
In-Reply-To: <20260501044156.260875-4-tariqt@nvidia.com>
References: <20260501044156.260875-4-tariqt@nvidia.com>
X-Mailing-List: netdev@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

---
Subject: net/mlx5: Add VHCA_ID page management mode support

This commit adds support for VHCA_ID-based page management mode. It
updates page management commands to conditionally use vhca_id and adds
a function to resolve the function type via vport lookup.
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -852,6 +852,48 @@ bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
>  	return true;
>  }
>
> +u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
> +{
> +	struct mlx5_eswitch *esw = dev->priv.eswitch;
> +	struct mlx5_vport *vport;
> +	unsigned long i;
> +	u16 type;
> +
> +	if (vhca_id == MLX5_CAP_GEN(dev, vhca_id))
> +		return MLX5_SELF;
> +
> +	if (!esw)
> +		return MLX5_FUNC_TYPE_NONE;
> +
> +	mutex_lock(&esw->state_lock);

Can this race with driver unload? Since the page allocator workqueue
(pg_wq) is asynchronous and destroyed after the eswitch structure is
freed, could a pending page request evaluate esw as valid just before
it's freed, and then attempt to lock it here?

Also, is there a potential deadlock when acquiring esw->state_lock
here? If a thread holds the lock while sending a firmware command, and
the firmware issues a page request to complete it, the pg_wq worker
will block here waiting for the lock. Would this prevent the firmware
command from completing?

[ ... ]

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c

[ ... ]

> -static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
> +static int give_pages(struct mlx5_core_dev *dev, u16 func_vhca_id, int npages,
>  		      int event, bool ec_function)
>  {
> -	u32 function = get_function(func_id, ec_function);
> +	u32 function = get_function_key(dev, func_vhca_id, ec_function);
>  	u32 out[MLX5_ST_SZ_DW(manage_pages_out)] = {0};
>  	int inlen = MLX5_ST_SZ_BYTES(manage_pages_in);

This isn't a new issue introduced by this patch, but does the npages
calculation risk an integer overflow? The npages parameter is derived
from the firmware payload, and the allocation size is calculated as:

	inlen += npages * MLX5_FLD_SZ_BYTES(manage_pages_in, pas[0]);

If a compromised firmware provides a massive npages value, could this
overflow the 32-bit inlen and result in an undersized kvzalloc
allocation, followed by an out-of-bounds write in the subsequent loop?

[ ... ]

> @@ -577,8 +626,9 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>  	if (nclaimed)
>  		*nclaimed = num_claimed;
>
> -	func_type = func_id_to_type(dev, func_id, ec_function);
> -	dev->priv.page_counters[func_type] -= num_claimed;
> +	func_type = func_vhca_id_to_type(dev, func_vhca_id, ec_function);
> +	if (func_type != MLX5_FUNC_TYPE_NONE)
> +		dev->priv.page_counters[func_type] -= num_claimed;

Does this dynamically look up the vport during page reclaim? If an SF
or VF has been destroyed and its vport removed, the lookup will return
MLX5_FUNC_TYPE_NONE. Will this cause the decrement to be skipped,
leading to a counter leak and triggering the WARN() statements in
mlx5_reclaim_startup_pages() during driver unload?

[ ... ]

> @@ -658,30 +708,101 @@ static int req_pages_handler(struct notifier_block *nb,

[ ... ]

> +	} else {
> +		if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode) ==
> +		    MLX5_ID_MODE_FUNCTION_VHCA_ID) {
> +			err = mlx5_pagealloc_migrate_pf_to_vhca_id(dev);
> +			if (err)
> +				return err;
> +			mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_VHCA_ID);
> +		}
> +	}

What happens to the device state if
mlx5_pagealloc_migrate_pf_to_vhca_id() fails? The firmware has already
been instructed to use VHCA_ID mode during driver load, but if the
migration fails, the function returns early and leaves
dev->priv.page_mgt_mode as FUNC_ID. When the initialization aborts and
mlx5_reclaim_startup_pages() runs, will the driver send reclaim
commands using the old ID, causing the firmware to reject them and
leak the allocated DMA memory?