From: Cristian Marussi <cristian.marussi@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Cristian Marussi <cristian.marussi@arm.com>,
Alice Ryhl <aliceryhl@google.com>,
Sudeep Holla <sudeep.holla@arm.com>,
linux-arm-kernel@lists.infradead.org, arm-scmi@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [Bug report] Memory leak in scmi_device_create
Date: Thu, 6 Mar 2025 18:43:45 +0000 [thread overview]
Message-ID: <Z8ns4T6pas9DEy8B@pluto> (raw)
In-Reply-To: <Z8nK3uFkspy61yjP@arm.com>
On Thu, Mar 06, 2025 at 04:18:38PM +0000, Catalin Marinas wrote:
> On Thu, Mar 06, 2025 at 03:47:27PM +0000, Cristian Marussi wrote:
> > On Thu, Mar 06, 2025 at 02:36:16PM +0000, Catalin Marinas wrote:
> > > This loop in scmi_device_create() looks strange:
> > >
> > > list_for_each_entry(rdev, phead, node) {
> > > struct scmi_device *sdev;
> > >
> > > sdev = __scmi_device_create(np, parent,
> > > rdev->id_table->protocol_id,
> > > rdev->id_table->name);
> > > /* Report errors and carry on... */
> > > if (sdev)
> > > scmi_dev = sdev;
> > > else
> > > pr_err("(%s) Failed to create device for protocol 0x%x (%s)\n",
> > > of_node_full_name(parent->of_node),
> > > rdev->id_table->protocol_id,
> > > rdev->id_table->name);
> > > }
> > >
> > > We can override scmi_dev a few times in the loop and lose the previous
> > > sdev allocations. Is this intended?
> >
> > Yes...it is weird..but by design I would say :P ...
> >
> > ...because this is called to instantiate one single device OR instantiate at
> > once all the multiple devices needed for a protocol: in this latter case it
> > returns just one of the created devices to signal success or NULL if all the
> > devices' creation failed....we dont need to keep the allocated devices references
> > anyway here since on success those devices are now referenced and kept on the
> > SCMI bus, so they can be searched/scanned/destroyed from there.
>
> Not sure why the pointer isn't found, device_add() should link it with
> the parent. Unless something else fails, the parent is freed and the
> linked devices unreachable. I'm not familiar at all with this code, I
> just saw kmemleak and thought of replying.
>
> The loop is still weird, scmi_chan_setup() seems to use the pointer to
> scmi_device for something more meaningful than a pass/fail check. Also
> the overall result is based only on what the last __scmi_device_create()
> return value was, irrespective of the previous iterations of the loop.
> You do have a pr_err() but no early bailing out of the loop on failure.
> I'm curious if there are any SCMI errors in the Alice's kernel log.
>
Yes, the weirdness comes from the fact such function is used alternatively
to create a single named device (and make some use of it, like in
scmi_chan_setup) OR to create a bunch of devices for the same protocol
when no specific device is asked for (name==NULL)...anyway the case at
hand that kmemleak complains about does NOT pass through that weird loop...
...good news is, I was able to reproduce a similar report consistently
with a load/unload/load sequence....the culprit is that when looking for
a device to destroy on unload, the SCMI bus uses device_find_child()
and that bumps the device refcnt implicitly...as a result when the device
is destroyed the refcnt is NEVER found as zero and so NO device_release
is ever called...this results in dev->p private_data to be never released
and that is what kmemleak spotted (at the start of teh chain):
unreferenced object 0xffff00000f583800 (size 512):
comm "insmod", pid 227, jiffies 4294912190
hex dump (first 32 bytes):
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
ff ff ff ff ff ff ff ff 60 36 1d 8a 00 80 ff ff ........`6......
backtrace (crc 114e2eed):
kmemleak_alloc+0xbc/0xd8
__kmalloc_cache_noprof+0x2dc/0x398
device_add+0x954/0x12d0
device_register+0x28/0x40
__scmi_device_create.part.0+0x1bc/0x380
scmi_device_create+0x2d0/0x390
scmi_create_protocol_devices+0x74/0xf8
scmi_device_request_notifier+0x1f8/0x2a8
notifier_call_chain+0x110/0x3b0
blocking_notifier_call_chain+0x70/0xb0
scmi_driver_register+0x350/0x7f0
0xffff80000a3b3038
do_one_initcall+0x12c/0x730
do_init_module+0x1dc/0x640
load_module+0x4b20/0x5b70
init_module_from_file+0xec/0x158
$ ./scripts/faddr2line ./vmlinux device_add+0x954/0x12d0
device_add+0x954/0x12d0:
kmalloc_noprof at include/linux/slab.h:901
(inlined by) kzalloc_noprof at include/linux/slab.h:1037
(inlined by) device_private_init at drivers/base/core.c:3510
(inlined by) device_add at drivers/base/core.c:3561
I am posting a fix.
Thanks for the report and the help.
Any feedback and testing is much welcomed :D
Cristian
prev parent reply other threads:[~2025-03-06 18:43 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-05 11:59 [Bug report] Memory leak in scmi_device_create Alice Ryhl
2025-03-05 17:10 ` Cristian Marussi
2025-03-06 11:09 ` Alice Ryhl
2025-03-06 12:50 ` Cristian Marussi
2025-03-06 13:25 ` Alice Ryhl
2025-03-06 14:36 ` Catalin Marinas
2025-03-06 15:47 ` Cristian Marussi
2025-03-06 16:18 ` Catalin Marinas
2025-03-06 18:43 ` Cristian Marussi [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z8ns4T6pas9DEy8B@pluto \
--to=cristian.marussi@arm.com \
--cc=aliceryhl@google.com \
--cc=arm-scmi@vger.kernel.org \
--cc=catalin.marinas@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=sudeep.holla@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.