* [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove()
@ 2026-04-01 11:18 Prasanna Kumar T S M
2026-04-01 11:18 ` [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak Prasanna Kumar T S M
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Prasanna Kumar T S M @ 2026-04-01 11:18 UTC (permalink / raw)
To: ptsm, ssengar, shubhrajyoti.datta, bp, tony.luck, linux-edac,
linux-kernel
The teardown sequence in mc_remove() does not mirror the reverse of the
initialization order in mc_probe(). In particular,
unregister_rpmsg_driver() is called before remove_versalnet(), and
cdx_mcdi_finish() is called after rproc_shutdown().
Reorder mc_remove() to reverse the probe initialization sequence,
consistent with the probe error-unwind paths.
The rproc reference acquired via rproc_get_by_phandle() during probe
is not released in mc_remove(), causing a reference count leak. Add
the missing rproc_put() call.
Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller")
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
---
drivers/edac/versalnet_edac.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/edac/versalnet_edac.c b/drivers/edac/versalnet_edac.c
index b87fe57aa842..acd51b492772 100644
--- a/drivers/edac/versalnet_edac.c
+++ b/drivers/edac/versalnet_edac.c
@@ -955,10 +955,11 @@ static void mc_remove(struct platform_device *pdev)
{
struct mc_priv *priv = platform_get_drvdata(pdev);
- unregister_rpmsg_driver(&amd_rpmsg_driver);
remove_versalnet(priv);
- rproc_shutdown(priv->mcdi->r5_rproc);
cdx_mcdi_finish(priv->mcdi);
+ unregister_rpmsg_driver(&amd_rpmsg_driver);
+ rproc_shutdown(priv->mcdi->r5_rproc);
+ rproc_put(priv->mcdi->r5_rproc);
kfree(priv->mcdi);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak 2026-04-01 11:18 [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Prasanna Kumar T S M @ 2026-04-01 11:18 ` Prasanna Kumar T S M 2026-04-01 11:19 ` [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc() Prasanna Kumar T S M 2026-04-03 10:34 ` [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Borislav Petkov 2 siblings, 0 replies; 8+ messages in thread From: Prasanna Kumar T S M @ 2026-04-01 11:18 UTC (permalink / raw) To: ptsm, ssengar, shubhrajyoti.datta, bp, tony.luck, linux-edac, linux-kernel The device name allocated via kzalloc() in init_one_mc() is assigned to dev->init_name but never freed on the normal removal path. device_register() copies init_name and then sets dev->init_name to NULL, so the name pointer becomes unreachable from the device. Thus leaking memory. Use a stack-local char array instead of using kzalloc() for name. Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller") Cc: stable@vger.kernel.org Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com> --- drivers/edac/versalnet_edac.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/edac/versalnet_edac.c b/drivers/edac/versalnet_edac.c index acd51b492772..012c4f40994d 100644 --- a/drivers/edac/versalnet_edac.c +++ b/drivers/edac/versalnet_edac.c @@ -779,7 +779,7 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i struct mem_ctl_info *mci; struct device *dev; enum dev_type dt; - char *name; + char name[MC_NAME_LEN]; int rc; config = priv->adec[CONF + i * ADEC_NUM]; @@ -813,13 +813,9 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i layers[1].is_virt_csrow = false; rc = -ENOMEM; - name = kzalloc(MC_NAME_LEN, GFP_KERNEL); - if (!name) - return rc; - dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) - goto err_name_free; + return rc; mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers, sizeof(struct mc_priv)); if (!mci) { @@ -858,8 +854,6 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i edac_mc_free(mci); err_dev_free: kfree(dev); -err_name_free: - kfree(name); return rc; } -- 2.49.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc() 2026-04-01 11:18 [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Prasanna Kumar T S M 2026-04-01 11:18 ` [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak Prasanna Kumar T S M @ 2026-04-01 11:19 ` Prasanna Kumar T S M 2026-04-09 9:46 ` Prasanna Kumar T S M 2026-04-03 10:34 ` [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Borislav Petkov 2 siblings, 1 reply; 8+ messages in thread From: Prasanna Kumar T S M @ 2026-04-01 11:19 UTC (permalink / raw) To: ptsm, ssengar, shubhrajyoti.datta, bp, tony.luck, linux-edac, linux-kernel When device_register() fails, it must be followed by put_device() rather than kfree(), because device_register() calls device_initialize() which sets up the device refcount. The matching release function versal_edac_release() handles the actual kfree(). To simplify error handling and avoid complex unwinding, split device_register() into device_initialize() and device_add(). Initialize the device early so put_device() can be used in all error paths. Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller") Cc: stable@vger.kernel.org Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- drivers/edac/versalnet_edac.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/edac/versalnet_edac.c b/drivers/edac/versalnet_edac.c index 012c4f40994d..94580d3c3170 100644 --- a/drivers/edac/versalnet_edac.c +++ b/drivers/edac/versalnet_edac.c @@ -780,7 +780,7 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i struct device *dev; enum dev_type dt; char name[MC_NAME_LEN]; - int rc; + int rc = -ENOMEM; config = priv->adec[CONF + i * ADEC_NUM]; num_chans = FIELD_GET(MC5_NUM_CHANS_MASK, config); @@ -812,23 +812,23 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i layers[1].size = num_chans; layers[1].is_virt_csrow = false; - rc = -ENOMEM; dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) return rc; + sprintf(name, "versal-net-ddrmc5-edac-%d", i); + + dev->init_name = name; + dev->release = versal_edac_release; + device_initialize(dev); + mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers, sizeof(struct mc_priv)); if (!mci) { edac_printk(KERN_ERR, EDAC_MC, "Failed memory allocation for MC%d\n", i); goto err_dev_free; } - sprintf(name, "versal-net-ddrmc5-edac-%d", i); - - dev->init_name = name; - dev->release = versal_edac_release; - - rc = device_register(dev); + rc = device_add(dev); if (rc) goto err_mc_free; @@ -849,11 +849,11 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i return 0; err_unreg: - device_unregister(mci->pdev); + device_del(dev); err_mc_free: edac_mc_free(mci); err_dev_free: - kfree(dev); + put_device(dev); return rc; } -- 2.49.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc() 2026-04-01 11:19 ` [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc() Prasanna Kumar T S M @ 2026-04-09 9:46 ` Prasanna Kumar T S M 0 siblings, 0 replies; 8+ messages in thread From: Prasanna Kumar T S M @ 2026-04-09 9:46 UTC (permalink / raw) To: ssengar, shubhrajyoti.datta, bp, tony.luck, linux-edac, linux-kernel On 01-04-2026 16:49, Prasanna Kumar T S M wrote: > When device_register() fails, it must be followed by put_device() > rather than kfree(), because device_register() calls > device_initialize() which sets up the device refcount. The matching > release function versal_edac_release() handles the actual kfree(). > > To simplify error handling and avoid complex unwinding, split > device_register() into device_initialize() and device_add(). > Initialize the device early so put_device() can be used in all > error paths. > > Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller") > Cc: stable@vger.kernel.org > Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com> > Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> > --- > drivers/edac/versalnet_edac.c | 20 ++++++++++---------- > 1 file changed, 10 insertions(+), 10 deletions(-) > > diff --git a/drivers/edac/versalnet_edac.c b/drivers/edac/versalnet_edac.c > index 012c4f40994d..94580d3c3170 100644 > --- a/drivers/edac/versalnet_edac.c > +++ b/drivers/edac/versalnet_edac.c > @@ -780,7 +780,7 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i > struct device *dev; > enum dev_type dt; > char name[MC_NAME_LEN]; > - int rc; > + int rc = -ENOMEM; > > config = priv->adec[CONF + i * ADEC_NUM]; > num_chans = FIELD_GET(MC5_NUM_CHANS_MASK, config); > @@ -812,23 +812,23 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i > layers[1].size = num_chans; > layers[1].is_virt_csrow = false; > > - rc = -ENOMEM; > dev = kzalloc(sizeof(*dev), GFP_KERNEL); > if (!dev) > return rc; > > + sprintf(name, "versal-net-ddrmc5-edac-%d", i); > + > + dev->init_name = name; Sashiko's review is valid ------------------------- If edac_mc_alloc() fails below, we jump to err_dev_free, call put_device(dev), and return from init_one_mc(). Since kobject cleanup can be deferred asynchronously (such as when CONFIG_DEBUG_KOBJECT_RELEASE is enabled), and name is a local stack array, could this result in a use-after-free if dev_name(dev) is accessed during cleanup? Would it be safer to use dev_set_name(dev, "versal-net-ddrmc5-edac-%d", i) instead of assigning dev->init_name directly? ------------------------- This is a valid issue, I will fix it in the next iteration. > + dev->release = versal_edac_release; > + device_initialize(dev); > + > mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers, sizeof(struct mc_priv)); > if (!mci) { > edac_printk(KERN_ERR, EDAC_MC, "Failed memory allocation for MC%d\n", i); > goto err_dev_free; > } > > - sprintf(name, "versal-net-ddrmc5-edac-%d", i); > - > - dev->init_name = name; > - dev->release = versal_edac_release; > - > - rc = device_register(dev); > + rc = device_add(dev); > if (rc) > goto err_mc_free; > > @@ -849,11 +849,11 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i > return 0; > > err_unreg: > - device_unregister(mci->pdev); > + device_del(dev); > err_mc_free: > edac_mc_free(mci); > err_dev_free: > - kfree(dev); > + put_device(dev); > > return rc; > } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() 2026-04-01 11:18 [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Prasanna Kumar T S M 2026-04-01 11:18 ` [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak Prasanna Kumar T S M 2026-04-01 11:19 ` [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc() Prasanna Kumar T S M @ 2026-04-03 10:34 ` Borislav Petkov 2026-04-06 5:26 ` Prasanna Kumar T S M 2 siblings, 1 reply; 8+ messages in thread From: Borislav Petkov @ 2026-04-03 10:34 UTC (permalink / raw) To: Prasanna Kumar T S M Cc: ssengar, shubhrajyoti.datta, tony.luck, linux-edac, linux-kernel On Wed, Apr 01, 2026 at 04:18:36AM -0700, Prasanna Kumar T S M wrote: > The teardown sequence in mc_remove() does not mirror the reverse of the > initialization order in mc_probe(). In particular, > unregister_rpmsg_driver() is called before remove_versalnet(), and > cdx_mcdi_finish() is called after rproc_shutdown(). > > Reorder mc_remove() to reverse the probe initialization sequence, > consistent with the probe error-unwind paths. > > The rproc reference acquired via rproc_get_by_phandle() during probe > is not released in mc_remove(), causing a reference count leak. Add > the missing rproc_put() call. > > Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller") > Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com> > --- > drivers/edac/versalnet_edac.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) Sashiko has found things, pls addres them: https://sashiko.dev/#/patchset/20260401111836.2342918-1-ptsm%40linux.microsoft.com -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() 2026-04-03 10:34 ` [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Borislav Petkov @ 2026-04-06 5:26 ` Prasanna Kumar T S M 2026-04-06 8:23 ` Borislav Petkov 0 siblings, 1 reply; 8+ messages in thread From: Prasanna Kumar T S M @ 2026-04-06 5:26 UTC (permalink / raw) To: Borislav Petkov Cc: ssengar, shubhrajyoti.datta, tony.luck, linux-edac, linux-kernel On 03-04-2026 16:04, Borislav Petkov wrote: > On Wed, Apr 01, 2026 at 04:18:36AM -0700, Prasanna Kumar T S M wrote: >> The teardown sequence in mc_remove() does not mirror the reverse of the >> initialization order in mc_probe(). In particular, >> unregister_rpmsg_driver() is called before remove_versalnet(), and >> cdx_mcdi_finish() is called after rproc_shutdown(). >> >> Reorder mc_remove() to reverse the probe initialization sequence, >> consistent with the probe error-unwind paths. >> >> The rproc reference acquired via rproc_get_by_phandle() during probe >> is not released in mc_remove(), causing a reference count leak. Add >> the missing rproc_put() call. >> >> Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller") >> Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com> >> --- >> drivers/edac/versalnet_edac.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) > > Sashiko has found things, pls addres them: > > https://sashiko.dev/#/patchset/20260401111836.2342918-1-ptsm%40linux.microsoft.com > I asked AI to validate Sashiko's comment. This is its output ------------------------------------------------------------ Analysis The review comment is not valid (false positive). Here's the detailed reasoning: Data structure hierarchy mc_priv └── mcdi: struct cdx_mcdi * ← kfree'd at step 6 (LAST) ├── mcdi: struct cdx_mcdi_data * ← kfree'd by cdx_mcdi_finish() at step 2 │ └── iface: struct cdx_mcdi_iface (mutex, cmd list, etc.) ├── r5_rproc └── ... New mc_remove() order (after commit) (1) remove_versalnet(priv); // tear down EDAC (2) cdx_mcdi_finish(priv->mcdi); // wait_for_cleanup → kfree(cdx->mcdi) → cdx->mcdi = NULL (3) unregister_rpmsg_driver(&amd_rpmsg_driver); // destroys rpmsg endpoint (4) rproc_shutdown(priv->mcdi->r5_rproc); (5) rproc_put(priv->mcdi->r5_rproc); (6) kfree(priv->mcdi); // frees outer struct Why the concern doesn't hold The reviewer's premise is wrong. The comment says "after priv->mcdi is freed" — but priv->mcdi (the struct cdx_mcdi * pointer passed to cdx_mcdi_process_cmd) is not freed until step 6, well after unregister_rpmsg_driver() at step 3. What IS freed at step 2 is the inner cdx->mcdi (struct cdx_mcdi_data *). But cdx_mcdi_finish() also sets cdx->mcdi = NULL. If rpmsg_cb() fires between steps 2 and 3: 1. mc_priv->mcdi → still points to valid struct cdx_mcdi (not freed until step 6) 2. cdx_mcdi_process_cmd(mc_priv->mcdi, ...) is called with a valid pointer 3. Inside, cdx_mcdi_if(cdx) checks cdx->mcdi → NULL → returns NULL 4. if (!mcdi) return; → exits early, safe Additionally, cdx_mcdi_wait_for_cleanup() inside cdx_mcdi_finish() ensures all pending MCDI commands have completed before freeing, so no legitimate MCDI responses should arrive afterward. This ordering exactly matches the probe error unwind path (err_init: does cdx_mcdi_finish → kfree → unregister_rpmsg_driver → rproc_shutdown → rproc_put), which the commit explicitly states it's aligning with. Verdict: False positive. The NULL guard in cdx_mcdi_if() / cdx_mcdi_process_cmd() protects the window, and the outer struct priv->mcdi remains valid throughout. ------------------------------------------------------------ So the comment looks like a false positive. It will be great if someone from AMD verifies this. Can you please look at the other 2 patches in this series? They are independent of this change. Thanks, Prasanna Kumar ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() 2026-04-06 5:26 ` Prasanna Kumar T S M @ 2026-04-06 8:23 ` Borislav Petkov 2026-04-09 9:33 ` Prasanna Kumar T S M 0 siblings, 1 reply; 8+ messages in thread From: Borislav Petkov @ 2026-04-06 8:23 UTC (permalink / raw) To: Prasanna Kumar T S M Cc: ssengar, shubhrajyoti.datta, tony.luck, linux-edac, linux-kernel On Mon, Apr 06, 2026 at 10:56:17AM +0530, Prasanna Kumar T S M wrote: > > Sashiko has found things, pls addres them: > > > > https://sashiko.dev/#/patchset/20260401111836.2342918-1-ptsm%40linux.microsoft.com > > > > I asked AI to validate Sashiko's comment. So I asked *you* to address the Sashiko review. You go and ask another AI to validate the review. Now, if I go and paste the whole conversation to a third AI, the f*ckup is complete. Tell me: is that the goal of this exercise? Let AIs do the thinking for us and we can all go shopping? Or maybe *you* should go review Sashiko's review and say, I agree because <proper, comprehensible explanation> or I don't agree because <also proper, comprehensible explanation>. Then *I* go and verify that. Hmm... -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() 2026-04-06 8:23 ` Borislav Petkov @ 2026-04-09 9:33 ` Prasanna Kumar T S M 0 siblings, 0 replies; 8+ messages in thread From: Prasanna Kumar T S M @ 2026-04-09 9:33 UTC (permalink / raw) To: Borislav Petkov Cc: ssengar, shubhrajyoti.datta, tony.luck, linux-edac, linux-kernel On 06-04-2026 13:53, Borislav Petkov wrote: > On Mon, Apr 06, 2026 at 10:56:17AM +0530, Prasanna Kumar T S M wrote: >>> Sashiko has found things, pls addres them: >>> >>> https://sashiko.dev/#/patchset/20260401111836.2342918-1-ptsm%40linux.microsoft.com >>> >> >> I asked AI to validate Sashiko's comment. > > So I asked *you* to address the Sashiko review. You go and ask another AI to > validate the review. > > Now, if I go and paste the whole conversation to a third AI, the f*ckup is > complete. > > Tell me: is that the goal of this exercise? Let AIs do the thinking > for us and we can all go shopping? > > Or maybe *you* should go review Sashiko's review and say, I agree because > <proper, comprehensible explanation> or I don't agree because <also proper, > comprehensible explanation>. > > Then *I* go and verify that. > > Hmm... > Hi, Sashiko review says ------------------- If an MCDI response message arrives after priv->mcdi is freed but before the RPMSG endpoint is destroyed, could the still-active rpmsg_cb() pass the dangling priv->mcdi pointer to cdx_mcdi_process_cmd(), leading to a use-after-free? ------------------- The review is based on the assumption that priv->mcdi is freed before unregister_rpmsg_driver(). But priv->mcdi is valid till the end of the function. The cdx_mcdi_finish() waits for the mcdi->workqueue to finish processing and the mcdi->workqueue is destroyed. Any subsequent rpmsg_cb() calls cdx_mcdi_process_cmd() which safely without doing anything if mcdi (priv->mcdi->mcdi) is NULL. The review can be safely ignored. Other path in rpmsg_cb() accesses priv->adec and priv->regs, both of them are valid and doesn't cause any use-after-free issue. The review is a false positive and can be ignored. Thanks, Prasanna Kumar ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-04-09 9:46 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-01 11:18 [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Prasanna Kumar T S M 2026-04-01 11:18 ` [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak Prasanna Kumar T S M 2026-04-01 11:19 ` [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc() Prasanna Kumar T S M 2026-04-09 9:46 ` Prasanna Kumar T S M 2026-04-03 10:34 ` [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Borislav Petkov 2026-04-06 5:26 ` Prasanna Kumar T S M 2026-04-06 8:23 ` Borislav Petkov 2026-04-09 9:33 ` Prasanna Kumar T S M
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox