public inbox for stable@vger.kernel.org
* [PATCH rdma v2] RDMA/mana_ib: Disable RX steering on RSS QP destroy
@ 2026-03-23 20:10 Long Li
  2026-03-25  9:13 ` Leon Romanovsky
  0 siblings, 1 reply; 3+ messages in thread
From: Long Li @ 2026-03-23 20:10 UTC
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel,
	stable

When an RSS QP is destroyed (e.g. DPDK exit), mana_ib_destroy_qp_rss()
destroys the RX WQ objects but does not disable vPort RX steering in
firmware. This leaves stale steering configuration that still points to
the destroyed RX objects.

If traffic continues to arrive (e.g. peer VM is still transmitting) and
the VF interface is subsequently brought up (mana_open), the firmware
may deliver completions using stale CQ IDs from the old RX objects.
These CQ IDs can be reused by the ethernet driver for new TX CQs,
causing RX completions to land on TX CQs:

  WARNING: mana_poll_tx_cq+0x1b8/0x220 [mana]  (is_sq == false)
  WARNING: mana_gd_process_eq_events+0x209/0x290 (cq_table lookup fails)

Fix this by disabling vPort RX steering before destroying RX WQ objects.
Note that mana_fence_rqs() cannot be used here because the fence
completion is delivered on the CQ, which is polled by user-mode (e.g.
DPDK) and not visible to the kernel driver.

Refactor the disable logic into a shared mana_disable_vport_rx() in
mana_en, exported for use by mana_ib, replacing the duplicate code.
The ethernet driver's mana_dealloc_queues() is also updated to call
this common function.

Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
Cc: stable@vger.kernel.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
  - Removed redundant ibdev_err on mana_disable_vport_rx() failure as
    mana_cfg_vport_steering() already logs all failure scenarios.
  - Added comment clarifying this is best effort.
 drivers/infiniband/hw/mana/qp.c               | 15 +++++++++++++++
 drivers/net/ethernet/microsoft/mana/mana_en.c | 11 ++++++++++-
 include/net/mana/mana.h                       |  1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 80cf4ade4b75..685e61e8436c 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -834,6 +834,21 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
 	ndev = mana_ib_get_netdev(qp->ibqp.device, qp->port);
 	mpc = netdev_priv(ndev);
 
+	/* Disable vPort RX steering before destroying RX WQ objects.
+	 * Otherwise firmware still routes traffic to the destroyed queues,
+	 * which can cause bogus completions on reused CQ IDs when the
+	 * ethernet driver later creates new queues on mana_open().
+	 *
+	 * Unlike the ethernet teardown path, mana_fence_rqs() cannot be
+	 * used here because the fence completion CQE is delivered on the
+	 * CQ which is polled by userspace (e.g. DPDK), so there is no way
+	 * for the kernel to wait for fence completion.
+	 *
+	 * This is best effort — if it fails there is not much we can do,
+	 * and mana_cfg_vport_steering() already logs the error.
+	 */
+	mana_disable_vport_rx(mpc);
+
 	for (i = 0; i < (1 << ind_tbl->log_ind_tbl_size); i++) {
 		ibwq = ind_tbl->ind_tbl[i];
 		wq = container_of(ibwq, struct mana_ib_wq, ibwq);
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index b3c3a70f733f..0816279f525e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2934,6 +2934,13 @@ static void mana_rss_table_init(struct mana_port_context *apc)
 			ethtool_rxfh_indir_default(i, apc->num_queues);
 }
 
+int mana_disable_vport_rx(struct mana_port_context *apc)
+{
+	return mana_cfg_vport_steering(apc, TRI_STATE_FALSE, false, false,
+				       false);
+}
+EXPORT_SYMBOL_NS(mana_disable_vport_rx, "NET_MANA");
+
 int mana_config_rss(struct mana_port_context *apc, enum TRI_STATE rx,
 		    bool update_hash, bool update_tab)
 {
@@ -3339,10 +3346,12 @@ static int mana_dealloc_queues(struct net_device *ndev)
 	 */
 
 	apc->rss_state = TRI_STATE_FALSE;
-	err = mana_config_rss(apc, TRI_STATE_FALSE, false, false);
+	err = mana_disable_vport_rx(apc);
 	if (err && mana_en_need_log(apc, err))
 		netdev_err(ndev, "Failed to disable vPort: %d\n", err);
 
+	mana_fence_rqs(apc);
+
 	/* Even in err case, still need to cleanup the vPort */
 	mana_destroy_rxqs(apc);
 	mana_destroy_txq(apc);
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 204c2b612a62..2634e9135eed 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -574,6 +574,7 @@ struct mana_port_context {
 netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev);
 int mana_config_rss(struct mana_port_context *ac, enum TRI_STATE rx,
 		    bool update_hash, bool update_tab);
+int mana_disable_vport_rx(struct mana_port_context *apc);
 
 int mana_alloc_queues(struct net_device *ndev);
 int mana_attach(struct net_device *ndev);
-- 
2.43.0
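
The ordering problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name below (toy_vport, toy_alloc_cq, toy_deliver_rx, and so on) is hypothetical and does not exist in the mana driver, and the "lowest free ID first" allocation policy is assumed purely to make the CQ ID reuse deterministic. It shows why leaving steering enabled lets a late RX completion land on a CQ ID that has since been handed out as a TX CQ.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the mana driver. */
enum cq_kind { CQ_FREE, CQ_RX, CQ_TX };

#define NUM_CQ 4
#define NO_CQ  (-1)

struct toy_vport {
	enum cq_kind cq_table[NUM_CQ]; /* current owner of each CQ ID */
	bool steering_enabled;         /* models vPort RX steering state */
	int steer_cq_id;               /* CQ ID that RX completions target */
};

static void toy_init(struct toy_vport *vp)
{
	for (int id = 0; id < NUM_CQ; id++)
		vp->cq_table[id] = CQ_FREE;
	vp->steering_enabled = false;
	vp->steer_cq_id = NO_CQ;
}

/* Assumed policy: hand out the lowest free CQ ID, so freed IDs get reused. */
static int toy_alloc_cq(struct toy_vport *vp, enum cq_kind kind)
{
	for (int id = 0; id < NUM_CQ; id++) {
		if (vp->cq_table[id] == CQ_FREE) {
			vp->cq_table[id] = kind;
			return id;
		}
	}
	return NO_CQ;
}

static void toy_free_cq(struct toy_vport *vp, int id)
{
	assert(id >= 0 && id < NUM_CQ);
	vp->cq_table[id] = CQ_FREE;
}

static void toy_enable_rx(struct toy_vport *vp, int cq_id)
{
	vp->steering_enabled = true;
	vp->steer_cq_id = cq_id;
}

static void toy_disable_rx(struct toy_vport *vp)
{
	vp->steering_enabled = false;
	vp->steer_cq_id = NO_CQ;
}

/*
 * Models one late RX completion. Returns the kind of CQ it lands on,
 * or CQ_FREE when steering is off and nothing is delivered.
 */
static enum cq_kind toy_deliver_rx(const struct toy_vport *vp)
{
	if (!vp->steering_enabled)
		return CQ_FREE;
	return vp->cq_table[vp->steer_cq_id];
}

/* RSS QP teardown followed by the ethernet driver creating a TX CQ. */
static enum cq_kind toy_teardown_then_open(bool disable_steering_first)
{
	struct toy_vport vp;
	int rx_cq;

	toy_init(&vp);
	rx_cq = toy_alloc_cq(&vp, CQ_RX);
	toy_enable_rx(&vp, rx_cq);

	if (disable_steering_first)
		toy_disable_rx(&vp);    /* the ordering this patch adds */
	toy_free_cq(&vp, rx_cq);        /* destroy the RX object */
	toy_alloc_cq(&vp, CQ_TX);       /* new TX CQ reuses the freed ID */

	return toy_deliver_rx(&vp);
}
```

In this model, toy_teardown_then_open(false) reports the completion landing on a TX CQ, which corresponds to the is_sq == false warning above, while toy_teardown_then_open(true) delivers nothing. The real firmware and ID allocator may behave differently; the model only captures the ordering argument.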



* Re: [PATCH rdma v2] RDMA/mana_ib: Disable RX steering on RSS QP destroy
  2026-03-23 20:10 [PATCH rdma v2] RDMA/mana_ib: Disable RX steering on RSS QP destroy Long Li
@ 2026-03-25  9:13 ` Leon Romanovsky
  2026-03-25 19:42   ` [EXTERNAL] " Long Li
  0 siblings, 1 reply; 3+ messages in thread
From: Leon Romanovsky @ 2026-03-25  9:13 UTC
  To: Long Li
  Cc: Konstantin Taranov, Jakub Kicinski, David S . Miller, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Jason Gunthorpe, Haiyang Zhang,
	K . Y . Srinivasan, Wei Liu, Dexuan Cui, Simon Horman, netdev,
	linux-rdma, linux-hyperv, linux-kernel, stable

On Mon, Mar 23, 2026 at 01:10:56PM -0700, Long Li wrote:
> When an RSS QP is destroyed (e.g. DPDK exit), mana_ib_destroy_qp_rss()
> destroys the RX WQ objects but does not disable vPort RX steering in
> firmware. This leaves stale steering configuration that still points to
> the destroyed RX objects.
> 
> If traffic continues to arrive (e.g. peer VM is still transmitting) and
> the VF interface is subsequently brought up (mana_open), the firmware
> may deliver completions using stale CQ IDs from the old RX objects.
> These CQ IDs can be reused by the ethernet driver for new TX CQs,
> causing RX completions to land on TX CQs:
> 
>   WARNING: mana_poll_tx_cq+0x1b8/0x220 [mana]  (is_sq == false)
>   WARNING: mana_gd_process_eq_events+0x209/0x290 (cq_table lookup fails)
> 
> Fix this by disabling vPort RX steering before destroying RX WQ objects.
> Note that mana_fence_rqs() cannot be used here because the fence
> completion is delivered on the CQ, which is polled by user-mode (e.g.
> DPDK) and not visible to the kernel driver.
> 
> Refactor the disable logic into a shared mana_disable_vport_rx() in
> mana_en, exported for use by mana_ib, replacing the duplicate code.
> The ethernet driver's mana_dealloc_queues() is also updated to call
> this common function.
> 
> Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
> Cc: stable@vger.kernel.org
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
> v2:
>   - Removed redundant ibdev_err on mana_disable_vport_rx() failure as
>     mana_cfg_vport_steering() already logs all failure scenarios.
>   - Added comment clarifying this is best effort.
>  drivers/infiniband/hw/mana/qp.c               | 15 +++++++++++++++
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 11 ++++++++++-
>  include/net/mana/mana.h                       |  1 +
>  3 files changed, 26 insertions(+), 1 deletion(-)


It doesn't apply to rdma-rc.

Looking up https://lore.kernel.org/all/20260323201106.1768705-1-longli@microsoft.com/
Grabbing thread from lore.kernel.org/all/20260323201106.1768705-1-longli@microsoft.com/t.mbox.gz
Checking for newer revisions
Grabbing search results from lore.kernel.org
Analyzing 3 messages in the thread
Looking for additional code-review trailers on lore.kernel.org
Analyzing 0 code-review messages
Checking attestation on all messages, may take a moment...
---
  [PATCH v2] RDMA/mana_ib: Disable RX steering on RSS QP destroy
    + Link: https://patch.msgid.link/20260323201106.1768705-1-longli@microsoft.com
    + Signed-off-by: Leon Romanovsky <leon@kernel.org>
  ---
  NOTE: install dkimpy for DKIM signature verification
---
Total patches: 1
---
Applying: RDMA/mana_ib: Disable RX steering on RSS QP destroy
Patch failed at 0001 RDMA/mana_ib: Disable RX steering on RSS QP destroy
error: patch failed: drivers/net/ethernet/microsoft/mana/mana_en.c:3339
error: drivers/net/ethernet/microsoft/mana/mana_en.c: patch does not apply
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Press any key to continue...

Thanks


* RE: [EXTERNAL] Re: [PATCH rdma v2] RDMA/mana_ib: Disable RX steering on RSS QP destroy
  2026-03-25  9:13 ` Leon Romanovsky
@ 2026-03-25 19:42   ` Long Li
  0 siblings, 0 replies; 3+ messages in thread
From: Long Li @ 2026-03-25 19:42 UTC
  To: Leon Romanovsky
  Cc: Konstantin Taranov, Jakub Kicinski, David S . Miller, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Jason Gunthorpe, Haiyang Zhang,
	KY Srinivasan, Wei Liu, Dexuan Cui, Simon Horman,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org

> On Mon, Mar 23, 2026 at 01:10:56PM -0700, Long Li wrote:
> > When an RSS QP is destroyed (e.g. DPDK exit), mana_ib_destroy_qp_rss()
> > destroys the RX WQ objects but does not disable vPort RX steering in
> > firmware. This leaves stale steering configuration that still points
> > to the destroyed RX objects.
> >
> > If traffic continues to arrive (e.g. peer VM is still transmitting)
> > and the VF interface is subsequently brought up (mana_open), the
> > firmware may deliver completions using stale CQ IDs from the old RX
> > objects.
> > These CQ IDs can be reused by the ethernet driver for new TX CQs,
> > causing RX completions to land on TX CQs:
> >
> >   WARNING: mana_poll_tx_cq+0x1b8/0x220 [mana]  (is_sq == false)
> >   WARNING: mana_gd_process_eq_events+0x209/0x290 (cq_table lookup fails)
> >
> > Fix this by disabling vPort RX steering before destroying RX WQ objects.
> > Note that mana_fence_rqs() cannot be used here because the fence
> > completion is delivered on the CQ, which is polled by user-mode (e.g.
> > DPDK) and not visible to the kernel driver.
> >
> > Refactor the disable logic into a shared mana_disable_vport_rx() in
> > mana_en, exported for use by mana_ib, replacing the duplicate code.
> > The ethernet driver's mana_dealloc_queues() is also updated to call
> > this common function.
> >
> > Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> > v2:
> >   - Removed redundant ibdev_err on mana_disable_vport_rx() failure as
> >     mana_cfg_vport_steering() already logs all failure scenarios.
> >   - Added comment clarifying this is best effort.
> >  drivers/infiniband/hw/mana/qp.c               | 15 +++++++++++++++
> >  drivers/net/ethernet/microsoft/mana/mana_en.c | 11 ++++++++++-
> >  include/net/mana/mana.h                       |  1 +
> >  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> 
> It doesn't apply to rdma-rc.

Sorry for the mistake. I have rebased it onto rdma for-rc.

Thanks,
Long

> 
> Looking up https://lore.kernel.org/all/20260323201106.1768705-1-longli@microsoft.com/
> Grabbing thread from lore.kernel.org/all/20260323201106.1768705-1-longli@microsoft.com/t.mbox.gz
> Checking for newer revisions
> Grabbing search results from lore.kernel.org
> Analyzing 3 messages in the thread
> Looking for additional code-review trailers on lore.kernel.org
> Analyzing 0 code-review messages
> Checking attestation on all messages, may take a moment...
> ---
>   [PATCH v2] RDMA/mana_ib: Disable RX steering on RSS QP destroy
>     + Link: https://patch.msgid.link/20260323201106.1768705-1-longli@microsoft.com
>     + Signed-off-by: Leon Romanovsky <leon@kernel.org>
>   ---
>   NOTE: install dkimpy for DKIM signature verification
> ---
> Total patches: 1
> ---
> Applying: RDMA/mana_ib: Disable RX steering on RSS QP destroy
> Patch failed at 0001 RDMA/mana_ib: Disable RX steering on RSS QP destroy
> error: patch failed: drivers/net/ethernet/microsoft/mana/mana_en.c:3339
> error: drivers/net/ethernet/microsoft/mana/mana_en.c: patch does not apply
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> hint: When you have resolved this problem, run "git am --continue".
> hint: If you prefer to skip this patch, run "git am --skip" instead.
> hint: To restore the original branch and stop patching, run "git am --abort".
> hint: Disable this message with "git config set advice.mergeConflict false"
> Press any key to continue...
> 
> Thanks

