netdev.vger.kernel.org archive mirror
* [PATCH net-next] net: mana: Fix use-after-free in reset service rescan path
@ 2025-12-16 10:55 Dipayaan Roy
  2025-12-16 12:20 ` Simon Horman
  0 siblings, 1 reply; 3+ messages in thread
From: Dipayaan Roy @ 2025-12-16 10:55 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, longli, kotaranov, horms, shradhagupta, ssengar,
	ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, dipayanroy

When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from
mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan().

mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can
invoke the driver's remove path and free the gdma_context associated
with the device. After returning, mana_serv_reset() currently jumps to
the out label and attempts to clear gc->in_service, dereferencing a
freed gdma_context.

The issue was observed with the following call logs:
[  698.942636] BUG: unable to handle page fault for address: ff6c2b638088508d
[  698.943121] #PF: supervisor write access in kernel mode
[  698.943423] #PF: error_code(0x0002) - not-present page
[Sat Dec  6 07:50:56 2025] hv_netvsc f8615163-00
[  698.943793] PGD 100000067 P4D 1002f7067 PUD 1002f8067 PMD 101bef067 PTE 0
[  698.944283] Oops: Oops: 0002 [#1] SMP NOPTI
[  698.944611] CPU: 28 UID: 0 PID: 249 Comm: kworker/28:1
...
[Sat Dec  6 07:50:56 2025] R10: 000000000000001b R11: 0000000000000000 R12: ff44cf3f40270000
[  699.121594] mana 7870:00:00.0 enP30832s1: Configured vPort 0 PD 18 DB 16
[Sat Dec  6 07:50:56 2025] R13: 0000000000000001 R14: ff44cf3f402700c8 R15: ff44cf3f4021b405
[Sat Dec  6 07:50:56 2025] FS:  0000000000000000(0000) GS:ff44cf7e9fcf9000(0000) knlGS:0000000000000000
[Sat Dec  6 07:50:56 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Dec  6 07:50:56 2025] CR2: ff6c2b638088508d CR3: 000000011fe43001 CR4: 0000000000b73ef0
[Sat Dec  6 07:50:56 2025] Call Trace:
[Sat Dec  6 07:50:56 2025]  <TASK>
[Sat Dec  6 07:50:56 2025]  mana_serv_func+0x24/0x50 [mana]
[Sat Dec  6 07:50:56 2025]  process_one_work+0x190/0x350
[Sat Dec  6 07:50:56 2025]  worker_thread+0x2b7/0x3d0
[Sat Dec  6 07:50:56 2025]  kthread+0xf3/0x200
[Sat Dec  6 07:50:56 2025]  ? __pfx_worker_thread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork+0x21a/0x250
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork_asm+0x1a/0x30
[Sat Dec  6 07:50:56 2025]  </TASK>

Fix this by returning immediately after mana_serv_rescan() to avoid
accessing GC state that may no longer be valid.

Fixes: 9bf66036d686 ("net: mana: Handle hardware recovery events when probing the device")

Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/gdma_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index efb4e412ec7e..0055c231acf6 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -481,7 +481,7 @@ static void mana_serv_reset(struct pci_dev *pdev)
 		/* Perform PCI rescan on device if we failed on HWC */
 		dev_err(&pdev->dev, "MANA service: resume failed, rescanning\n");
 		mana_serv_rescan(pdev);
-		goto out;
+		return;
 	}
 
 	if (ret)
-- 
2.34.1
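
The control flow described in the patch can be sketched in user space as follows. This is only an illustration under stated assumptions: demo_ctx, demo_dev, demo_rescan() and demo_reset() are hypothetical stand-ins for gdma_context, the PCI device, mana_serv_rescan() and mana_serv_reset(), and the sketch does not reproduce the driver's real locking, workqueue, or PCI handling. It shows only why the rescan branch has to return without touching the context again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct gdma_context. */
struct demo_ctx {
	bool in_service;
};

/* Hypothetical stand-in for the PCI device that owns the context. */
struct demo_dev {
	struct demo_ctx *ctx;
};

/*
 * Models mana_serv_rescan(): removing the device runs the remove
 * path, which frees the context that belongs to the device.
 */
static void demo_rescan(struct demo_dev *dev)
{
	free(dev->ctx);
	dev->ctx = NULL;
}

/*
 * Models the fixed flow of mana_serv_reset(). resume_ret simulates
 * the result of the resume step. Returns true when the rescan path
 * was taken.
 */
static bool demo_reset(struct demo_dev *dev, int resume_ret)
{
	struct demo_ctx *ctx = dev->ctx;

	ctx->in_service = true;

	if (resume_ret == -ETIMEDOUT || resume_ret == -EPROTO) {
		demo_rescan(dev);
		/*
		 * ctx now points at freed memory. Returning here,
		 * instead of falling through to the cleanup below,
		 * is what avoids the use-after-free.
		 */
		return true;
	}

	/* Equivalent of the "out" label: safe only while ctx is alive. */
	ctx->in_service = false;
	return false;
}
```

With the original "goto out", the rescan branch would have reached the final store to ctx->in_service after the context had already been freed, which matches the faulting write reported in the log.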



* Re: [PATCH net-next] net: mana: Fix use-after-free in reset service rescan path
  2025-12-16 10:55 [PATCH net-next] net: mana: Fix use-after-free in reset service rescan path Dipayaan Roy
@ 2025-12-16 12:20 ` Simon Horman
  2025-12-16 17:48   ` [EXTERNAL] " Long Li
  0 siblings, 1 reply; 3+ messages in thread
From: Simon Horman @ 2025-12-16 12:20 UTC (permalink / raw)
  To: Dipayaan Roy
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, longli, kotaranov, shradhagupta, ssengar, ernis,
	shirazsaleem, linux-hyperv, netdev, linux-kernel, linux-rdma,
	dipayanroy

On Tue, Dec 16, 2025 at 02:55:08AM -0800, Dipayaan Roy wrote:
> When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from
> mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan().
> 
> mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can
> invoke the driver's remove path and free the gdma_context associated
> with the device. After returning, mana_serv_reset() currently jumps to
> the out label and attempts to clear gc->in_service, dereferencing a
> freed gdma_context.
> 
> The issue was observed with the following call logs:
> [  698.942636] BUG: unable to handle page fault for address: ff6c2b638088508d
> [  698.943121] #PF: supervisor write access in kernel mode
> [  698.943423] #PF: error_code(0x0002) - not-present page
> [Sat Dec  6 07:50:56 2025] hv_netvsc f8615163-00
> [  698.943793] PGD 100000067 P4D 1002f7067 PUD 1002f8067 PMD 101bef067 PTE 0
> [  698.944283] Oops: Oops: 0002 [#1] SMP NOPTI
> [  698.944611] CPU: 28 UID: 0 PID: 249 Comm: kworker/28:1
> ...
> [Sat Dec  6 07:50:56 2025] R10: 000000000000001b R11: 0000000000000000 R12: ff44cf3f40270000
> [  699.121594] mana 7870:00:00.0 enP30832s1: Configured vPort 0 PD 18 DB 16
> [Sat Dec  6 07:50:56 2025] R13: 0000000000000001 R14: ff44cf3f402700c8 R15: ff44cf3f4021b405
> [Sat Dec  6 07:50:56 2025] FS:  0000000000000000(0000) GS:ff44cf7e9fcf9000(0000) knlGS:0000000000000000
> [Sat Dec  6 07:50:56 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Sat Dec  6 07:50:56 2025] CR2: ff6c2b638088508d CR3: 000000011fe43001 CR4: 0000000000b73ef0
> [Sat Dec  6 07:50:56 2025] Call Trace:
> [Sat Dec  6 07:50:56 2025]  <TASK>
> [Sat Dec  6 07:50:56 2025]  mana_serv_func+0x24/0x50 [mana]
> [Sat Dec  6 07:50:56 2025]  process_one_work+0x190/0x350
> [Sat Dec  6 07:50:56 2025]  worker_thread+0x2b7/0x3d0
> [Sat Dec  6 07:50:56 2025]  kthread+0xf3/0x200
> [Sat Dec  6 07:50:56 2025]  ? __pfx_worker_thread+0x10/0x10
> [Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
> [Sat Dec  6 07:50:56 2025]  ret_from_fork+0x21a/0x250
> [Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
> [Sat Dec  6 07:50:56 2025]  ret_from_fork_asm+0x1a/0x30
> [Sat Dec  6 07:50:56 2025]  </TASK>
> 
> Fix this by returning immediately after mana_serv_rescan() to avoid
> accessing GC state that may no longer be valid.
> 
> Fixes: 9bf66036d686 ("net: mana: Handle hardware recovery events when probing the device")
> 

nit: no blank line here please - tags should all appear in one block

> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>

I see that this patch is targeted at net-next.
But this is a fix for a patch present in net.
So it should be targeted at net instead.

Subject: [PATCH net] ...

Probably it is not necessary to repost in order to address the minor
feedback I've provided above. But if you do, please be sure to observe
the 24h rule and wait that long between posting revisions of that patch.

https://docs.kernel.org/process/maintainer-netdev.html

The above notwithstanding, this patch looks good to me.

Reviewed-by: Simon Horman <horms@kernel.org>


* RE: [EXTERNAL] Re: [PATCH net-next] net: mana: Fix use-after-free in reset service rescan path
  2025-12-16 12:20 ` Simon Horman
@ 2025-12-16 17:48   ` Long Li
  0 siblings, 0 replies; 3+ messages in thread
From: Long Li @ 2025-12-16 17:48 UTC (permalink / raw)
  To: Simon Horman, Dipayaan Roy
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Konstantin Taranov,
	shradhagupta@linux.microsoft.com, ssengar@linux.microsoft.com,
	ernis@linux.microsoft.com, Shiraz Saleem,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	Dipayaan Roy

> On Tue, Dec 16, 2025 at 02:55:08AM -0800, Dipayaan Roy wrote:
> > When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from
> > mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan().
> >
> > mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can
> > invoke the driver's remove path and free the gdma_context associated
> > with the device. After returning, mana_serv_reset() currently jumps to
> > the out label and attempts to clear gc->in_service, dereferencing a
> > freed gdma_context.
> >
> > The issue was observed with the following call logs:
> > [  698.942636] BUG: unable to handle page fault for address: ff6c2b638088508d
> > [  698.943121] #PF: supervisor write access in kernel mode
> > [  698.943423] #PF: error_code(0x0002) - not-present page
> > [Sat Dec  6 07:50:56 2025] hv_netvsc f8615163-00
> > [  698.943793] PGD 100000067 P4D 1002f7067 PUD 1002f8067 PMD 101bef067 PTE 0
> > [  698.944283] Oops: Oops: 0002 [#1] SMP NOPTI
> > [  698.944611] CPU: 28 UID: 0 PID: 249 Comm: kworker/28:1
> > ...
> > [Sat Dec  6 07:50:56 2025] R10: 000000000000001b R11: 0000000000000000 R12: ff44cf3f40270000
> > [  699.121594] mana 7870:00:00.0 enP30832s1: Configured vPort 0 PD 18 DB 16
> > [Sat Dec  6 07:50:56 2025] R13: 0000000000000001 R14: ff44cf3f402700c8 R15: ff44cf3f4021b405
> > [Sat Dec  6 07:50:56 2025] FS:  0000000000000000(0000) GS:ff44cf7e9fcf9000(0000) knlGS:0000000000000000
> > [Sat Dec  6 07:50:56 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [Sat Dec  6 07:50:56 2025] CR2: ff6c2b638088508d CR3: 000000011fe43001 CR4: 0000000000b73ef0
> > [Sat Dec  6 07:50:56 2025] Call Trace:
> > [Sat Dec  6 07:50:56 2025]  <TASK>
> > [Sat Dec  6 07:50:56 2025]  mana_serv_func+0x24/0x50 [mana]
> > [Sat Dec  6 07:50:56 2025]  process_one_work+0x190/0x350
> > [Sat Dec  6 07:50:56 2025]  worker_thread+0x2b7/0x3d0
> > [Sat Dec  6 07:50:56 2025]  kthread+0xf3/0x200
> > [Sat Dec  6 07:50:56 2025]  ? __pfx_worker_thread+0x10/0x10
> > [Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
> > [Sat Dec  6 07:50:56 2025]  ret_from_fork+0x21a/0x250
> > [Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
> > [Sat Dec  6 07:50:56 2025]  ret_from_fork_asm+0x1a/0x30
> > [Sat Dec  6 07:50:56 2025]  </TASK>
> >
> > Fix this by returning immediately after mana_serv_rescan() to avoid
> > accessing GC state that may no longer be valid.
> >
> > Fixes: 9bf66036d686 ("net: mana: Handle hardware recovery events when
> > probing the device")
> >
> 
> nit: no blank line here please - tags should all appear in one block
> 
> > Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> 
> I see that this patch is targeted at net-next.
> But this is a fix for a patch present in net.
> So it should be targeted at net instead
> 
> Subject: [PATCH net] ...
> 
> Probably it is not necessary to repost in order to address the minor feedback
> I've provided above. But if you do, please be sure to observe the 24h rule and
> wait that long between posting revisions of that patch.
> 
> https://docs.kernel.org/process/maintainer-netdev.html
> 
> The above notwithstanding, this patch looks good to me.
> 
> Reviewed-by: Simon Horman <horms@kernel.org>

Reviewed-by: Long Li <longli@microsoft.com>

