public inbox for linux-edac@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove()
@ 2026-04-01 11:18 Prasanna Kumar T S M
  2026-04-01 11:18 ` [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak Prasanna Kumar T S M
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Prasanna Kumar T S M @ 2026-04-01 11:18 UTC (permalink / raw)
  To: ptsm, ssengar, shubhrajyoti.datta, bp, tony.luck, linux-edac,
	linux-kernel

The teardown sequence in mc_remove() does not mirror the reverse of the
initialization order in mc_probe(). In particular,
unregister_rpmsg_driver() is called before remove_versalnet(), and
cdx_mcdi_finish() is called after rproc_shutdown().

Reorder mc_remove() to reverse the probe initialization sequence,
consistent with the probe error-unwind paths.

The rproc reference acquired via rproc_get_by_phandle() during probe
is not released in mc_remove(), causing a reference count leak. Add
the missing rproc_put() call.

Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller")
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
---
 drivers/edac/versalnet_edac.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/versalnet_edac.c b/drivers/edac/versalnet_edac.c
index b87fe57aa842..acd51b492772 100644
--- a/drivers/edac/versalnet_edac.c
+++ b/drivers/edac/versalnet_edac.c
@@ -955,10 +955,11 @@ static void mc_remove(struct platform_device *pdev)
 {
 	struct mc_priv *priv = platform_get_drvdata(pdev);
 
-	unregister_rpmsg_driver(&amd_rpmsg_driver);
 	remove_versalnet(priv);
-	rproc_shutdown(priv->mcdi->r5_rproc);
 	cdx_mcdi_finish(priv->mcdi);
+	unregister_rpmsg_driver(&amd_rpmsg_driver);
+	rproc_shutdown(priv->mcdi->r5_rproc);
+	rproc_put(priv->mcdi->r5_rproc);
 	kfree(priv->mcdi);
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak
  2026-04-01 11:18 [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Prasanna Kumar T S M
@ 2026-04-01 11:18 ` Prasanna Kumar T S M
  2026-04-01 11:19 ` [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc() Prasanna Kumar T S M
  2026-04-03 10:34 ` [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Borislav Petkov
  2 siblings, 0 replies; 6+ messages in thread
From: Prasanna Kumar T S M @ 2026-04-01 11:18 UTC (permalink / raw)
  To: ptsm, ssengar, shubhrajyoti.datta, bp, tony.luck, linux-edac,
	linux-kernel

The device name allocated via kzalloc() in init_one_mc() is assigned to
dev->init_name but never freed on the normal removal path.
device_register() copies init_name and then sets dev->init_name to NULL,
so the name pointer becomes unreachable from the device. Thus leaking
memory.

Use a stack-local char array instead of using kzalloc() for name.

Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller")
Cc: stable@vger.kernel.org
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
---
 drivers/edac/versalnet_edac.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/versalnet_edac.c b/drivers/edac/versalnet_edac.c
index acd51b492772..012c4f40994d 100644
--- a/drivers/edac/versalnet_edac.c
+++ b/drivers/edac/versalnet_edac.c
@@ -779,7 +779,7 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i
 	struct mem_ctl_info *mci;
 	struct device *dev;
 	enum dev_type dt;
-	char *name;
+	char name[MC_NAME_LEN];
 	int rc;
 
 	config = priv->adec[CONF + i * ADEC_NUM];
@@ -813,13 +813,9 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i
 	layers[1].is_virt_csrow = false;
 
 	rc = -ENOMEM;
-	name = kzalloc(MC_NAME_LEN, GFP_KERNEL);
-	if (!name)
-		return rc;
-
 	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
 	if (!dev)
-		goto err_name_free;
+		return rc;
 
 	mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers, sizeof(struct mc_priv));
 	if (!mci) {
@@ -858,8 +854,6 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i
 	edac_mc_free(mci);
 err_dev_free:
 	kfree(dev);
-err_name_free:
-	kfree(name);
 
 	return rc;
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc()
  2026-04-01 11:18 [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Prasanna Kumar T S M
  2026-04-01 11:18 ` [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak Prasanna Kumar T S M
@ 2026-04-01 11:19 ` Prasanna Kumar T S M
  2026-04-03 10:34 ` [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Borislav Petkov
  2 siblings, 0 replies; 6+ messages in thread
From: Prasanna Kumar T S M @ 2026-04-01 11:19 UTC (permalink / raw)
  To: ptsm, ssengar, shubhrajyoti.datta, bp, tony.luck, linux-edac,
	linux-kernel

When device_register() fails, it must be followed by put_device()
rather than kfree(), because device_register() calls
device_initialize() which sets up the device refcount. The matching
release function versal_edac_release() handles the actual kfree().

To simplify error handling and avoid complex unwinding, split
device_register() into device_initialize() and device_add().
Initialize the device early so put_device() can be used in all
error paths.

Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller")
Cc: stable@vger.kernel.org
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 drivers/edac/versalnet_edac.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/edac/versalnet_edac.c b/drivers/edac/versalnet_edac.c
index 012c4f40994d..94580d3c3170 100644
--- a/drivers/edac/versalnet_edac.c
+++ b/drivers/edac/versalnet_edac.c
@@ -780,7 +780,7 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i
 	struct device *dev;
 	enum dev_type dt;
 	char name[MC_NAME_LEN];
-	int rc;
+	int rc = -ENOMEM;
 
 	config = priv->adec[CONF + i * ADEC_NUM];
 	num_chans = FIELD_GET(MC5_NUM_CHANS_MASK, config);
@@ -812,23 +812,23 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i
 	layers[1].size = num_chans;
 	layers[1].is_virt_csrow = false;
 
-	rc = -ENOMEM;
 	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
 	if (!dev)
 		return rc;
 
+	sprintf(name, "versal-net-ddrmc5-edac-%d", i);
+
+	dev->init_name = name;
+	dev->release = versal_edac_release;
+	device_initialize(dev);
+
 	mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers, sizeof(struct mc_priv));
 	if (!mci) {
 		edac_printk(KERN_ERR, EDAC_MC, "Failed memory allocation for MC%d\n", i);
 		goto err_dev_free;
 	}
 
-	sprintf(name, "versal-net-ddrmc5-edac-%d", i);
-
-	dev->init_name = name;
-	dev->release = versal_edac_release;
-
-	rc = device_register(dev);
+	rc = device_add(dev);
 	if (rc)
 		goto err_mc_free;
 
@@ -849,11 +849,11 @@ static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i
 	return 0;
 
 err_unreg:
-	device_unregister(mci->pdev);
+	device_del(dev);
 err_mc_free:
 	edac_mc_free(mci);
 err_dev_free:
-	kfree(dev);
+	put_device(dev);
 
 	return rc;
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove()
  2026-04-01 11:18 [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Prasanna Kumar T S M
  2026-04-01 11:18 ` [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak Prasanna Kumar T S M
  2026-04-01 11:19 ` [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc() Prasanna Kumar T S M
@ 2026-04-03 10:34 ` Borislav Petkov
  2026-04-06  5:26   ` Prasanna Kumar T S M
  2 siblings, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2026-04-03 10:34 UTC (permalink / raw)
  To: Prasanna Kumar T S M
  Cc: ssengar, shubhrajyoti.datta, tony.luck, linux-edac, linux-kernel

On Wed, Apr 01, 2026 at 04:18:36AM -0700, Prasanna Kumar T S M wrote:
> The teardown sequence in mc_remove() does not mirror the reverse of the
> initialization order in mc_probe(). In particular,
> unregister_rpmsg_driver() is called before remove_versalnet(), and
> cdx_mcdi_finish() is called after rproc_shutdown().
> 
> Reorder mc_remove() to reverse the probe initialization sequence,
> consistent with the probe error-unwind paths.
> 
> The rproc reference acquired via rproc_get_by_phandle() during probe
> is not released in mc_remove(), causing a reference count leak. Add
> the missing rproc_put() call.
> 
> Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller")
> Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
> ---
>  drivers/edac/versalnet_edac.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Sashiko has found things, pls addres them:

https://sashiko.dev/#/patchset/20260401111836.2342918-1-ptsm%40linux.microsoft.com

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove()
  2026-04-03 10:34 ` [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Borislav Petkov
@ 2026-04-06  5:26   ` Prasanna Kumar T S M
  2026-04-06  8:23     ` Borislav Petkov
  0 siblings, 1 reply; 6+ messages in thread
From: Prasanna Kumar T S M @ 2026-04-06  5:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: ssengar, shubhrajyoti.datta, tony.luck, linux-edac, linux-kernel



On 03-04-2026 16:04, Borislav Petkov wrote:
> On Wed, Apr 01, 2026 at 04:18:36AM -0700, Prasanna Kumar T S M wrote:
>> The teardown sequence in mc_remove() does not mirror the reverse of the
>> initialization order in mc_probe(). In particular,
>> unregister_rpmsg_driver() is called before remove_versalnet(), and
>> cdx_mcdi_finish() is called after rproc_shutdown().
>>
>> Reorder mc_remove() to reverse the probe initialization sequence,
>> consistent with the probe error-unwind paths.
>>
>> The rproc reference acquired via rproc_get_by_phandle() during probe
>> is not released in mc_remove(), causing a reference count leak. Add
>> the missing rproc_put() call.
>>
>> Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller")
>> Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
>> ---
>>   drivers/edac/versalnet_edac.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Sashiko has found things, pls addres them:
> 
> https://sashiko.dev/#/patchset/20260401111836.2342918-1-ptsm%40linux.microsoft.com
> 

I asked AI to validate Sashiko's comment. This is its output
------------------------------------------------------------

   Analysis

   The review comment is not valid (false positive). Here's the detailed 
reasoning:

   Data structure hierarchy

    mc_priv
      └── mcdi: struct cdx_mcdi *            ← kfree'd at step 6 (LAST)
            ├── mcdi: struct cdx_mcdi_data *  ← kfree'd by 
cdx_mcdi_finish() at step 2
            │     └── iface: struct cdx_mcdi_iface (mutex, cmd list, etc.)
            ├── r5_rproc
            └── ...

   New mc_remove() order (after commit)

    (1) remove_versalnet(priv);                  // tear down EDAC
    (2) cdx_mcdi_finish(priv->mcdi);             // wait_for_cleanup → 
kfree(cdx->mcdi) → cdx->mcdi = NULL
    (3) unregister_rpmsg_driver(&amd_rpmsg_driver); // destroys rpmsg 
endpoint
    (4) rproc_shutdown(priv->mcdi->r5_rproc);
    (5) rproc_put(priv->mcdi->r5_rproc);
    (6) kfree(priv->mcdi);                       // frees outer struct

   Why the concern doesn't hold

   The reviewer's premise is wrong. The comment says "after priv->mcdi 
is freed" — but priv->mcdi (the struct cdx_mcdi * pointer passed to 
cdx_mcdi_process_cmd) is not freed
   until step 6, well after unregister_rpmsg_driver() at step 3.

   What IS freed at step 2 is the inner cdx->mcdi (struct cdx_mcdi_data 
*). But cdx_mcdi_finish() also sets cdx->mcdi = NULL. If rpmsg_cb() 
fires between steps 2 and 3:

    1. mc_priv->mcdi → still points to valid struct cdx_mcdi (not freed 
until step 6)
    2. cdx_mcdi_process_cmd(mc_priv->mcdi, ...) is called with a valid 
pointer
    3. Inside, cdx_mcdi_if(cdx) checks cdx->mcdi → NULL → returns NULL
    4. if (!mcdi) return; → exits early, safe

   Additionally, cdx_mcdi_wait_for_cleanup() inside cdx_mcdi_finish() 
ensures all pending MCDI commands have completed before freeing, so no 
legitimate MCDI responses should
   arrive afterward.

   This ordering exactly matches the probe error unwind path (err_init: 
does cdx_mcdi_finish → kfree → unregister_rpmsg_driver → rproc_shutdown 
→ rproc_put), which the
   commit explicitly states it's aligning with.

   Verdict: False positive. The NULL guard in cdx_mcdi_if() / 
cdx_mcdi_process_cmd() protects the window, and the outer struct 
priv->mcdi remains valid throughout.

------------------------------------------------------------

So the comment looks like a false positive. It will be great if someone 
from AMD verifies this.

Can you please look at the other 2 patches in this series? They are 
independent of this change.

Thanks,
Prasanna Kumar

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove()
  2026-04-06  5:26   ` Prasanna Kumar T S M
@ 2026-04-06  8:23     ` Borislav Petkov
  0 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2026-04-06  8:23 UTC (permalink / raw)
  To: Prasanna Kumar T S M
  Cc: ssengar, shubhrajyoti.datta, tony.luck, linux-edac, linux-kernel

On Mon, Apr 06, 2026 at 10:56:17AM +0530, Prasanna Kumar T S M wrote:
> > Sashiko has found things, pls addres them:
> > 
> > https://sashiko.dev/#/patchset/20260401111836.2342918-1-ptsm%40linux.microsoft.com
> > 
> 
> I asked AI to validate Sashiko's comment.

So I asked *you* to address the Sashiko review. You go and ask another AI to
validate the review.

Now, if I go and paste the whole conversation to a third AI, the f*ckup is
complete.

Tell me: is that the goal of this exercise? Let AIs do the thinking
for us and we can all go shopping?

Or maybe *you* should go review Sashiko's review and say, I agree because
<proper, comprehensible explanation> or I don't agree because <also proper,
comprehensible explanation>.

Then *I* go and verify that.

Hmm...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-04-06  8:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-01 11:18 [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Prasanna Kumar T S M
2026-04-01 11:18 ` [PATCH v2 2/3] EDAC/versalnet: Fix device name memory leak Prasanna Kumar T S M
2026-04-01 11:19 ` [PATCH v2 3/3] EDAC/versalnet: Fix device_register() error handling in init_one_mc() Prasanna Kumar T S M
2026-04-03 10:34 ` [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove() Borislav Petkov
2026-04-06  5:26   ` Prasanna Kumar T S M
2026-04-06  8:23     ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox