linux-arm-kernel.lists.infradead.org archive mirror
* [Bug report] Memory leak in scmi_device_create
@ 2025-03-05 11:59 Alice Ryhl
  2025-03-05 17:10 ` Cristian Marussi
  0 siblings, 1 reply; 9+ messages in thread
From: Alice Ryhl @ 2025-03-05 11:59 UTC
  To: Sudeep Holla, Cristian Marussi; +Cc: linux-arm-kernel, arm-scmi, linux-kernel

Dear SYSTEM CONTROL & POWER/MANAGEMENT INTERFACE (SCPI/SCMI) Message
Protocol drivers maintainers,

I flashed a v6.13-rc3 kernel onto a Rock5B board and noticed the
following output in my terminal:

[  687.694465] kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

It seems that there is a memory leak for devices created with
scmi_device_create.

This was with a kernel running v6.13-rc3, but as far as I can tell, no
relevant changes have landed since v6.13-rc3. My tree *does* include
commit 295416091e44 ("firmware: arm_scmi: Fix slab-use-after-free in
scmi_bus_notifier()"). I've only seen this kmemleak report once, so it's
not happening consistently.

See below for the full kmemleak report.

Alice

$ sudo cat /sys/kernel/debug/kmemleak
unreferenced object 0xffffff8106c86000 (size 2048):
  comm "swapper/0", pid 1, jiffies 4294893094
  hex dump (first 32 bytes):
    02 00 00 00 10 00 00 00 c0 01 bc 03 81 ff ff ff  ................
    60 67 ba 03 81 ff ff ff 18 60 c8 06 81 ff ff ff  `g.......`......
  backtrace (crc feae9680):
    [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
    [<0000000056fe02c9>] __kmalloc_cache_noprof+0x1e0/0x450
    [<00000000a8b3dfe1>] __scmi_device_create+0xb4/0x2b4
    [<000000008714917b>] scmi_device_create+0x40/0x194
    [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
    [<00000000970bad38>] scmi_probe+0x584/0xa78
    [<000000002600d2fd>] platform_probe+0xbc/0xf0
    [<00000000f6f556b4>] really_probe+0x1b8/0x520
    [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
    [<00000000d613b754>] driver_probe_device+0x6c/0x208
    [<00000000187a9170>] __driver_attach+0x168/0x328
    [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
    [<00000000984a3176>] driver_attach+0x34/0x44
    [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358
    [<00000000747fce19>] driver_register+0xc0/0x1a0
    [<0000000081cb8754>] __platform_driver_register+0x40/0x50
unreferenced object 0xffffff8103bc01c0 (size 32):
  comm "swapper/0", pid 1, jiffies 4294893094
  hex dump (first 32 bytes):
    5f 5f 73 63 6d 69 5f 74 72 61 6e 73 70 6f 72 74  __scmi_transport
    5f 64 65 76 69 63 65 5f 72 78 5f 31 30 00 ff ff  _device_rx_10...
  backtrace (crc 8dab7ca7):
    [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
    [<00000000500dbc08>] __kmalloc_node_track_caller_noprof+0x234/0x528
    [<000000004990eea4>] kstrdup+0x48/0x80
    [<00000000ad4d2923>] kstrdup_const+0x30/0x3c
    [<00000000e9d3bdc3>] __scmi_device_create+0xd4/0x2b4
    [<000000008714917b>] scmi_device_create+0x40/0x194
    [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
    [<00000000970bad38>] scmi_probe+0x584/0xa78
    [<000000002600d2fd>] platform_probe+0xbc/0xf0
    [<00000000f6f556b4>] really_probe+0x1b8/0x520
    [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
    [<00000000d613b754>] driver_probe_device+0x6c/0x208
    [<00000000187a9170>] __driver_attach+0x168/0x328
    [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
    [<00000000984a3176>] driver_attach+0x34/0x44
    [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358
unreferenced object 0xffffff8103ba6760 (size 16):
  comm "swapper/0", pid 1, jiffies 4294893094
  hex dump (first 16 bytes):
    73 63 6d 69 5f 64 65 76 2e 32 00 03 81 ff ff ff  scmi_dev.2......
  backtrace (crc ccc21b9a):
    [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
    [<00000000500dbc08>] __kmalloc_node_track_caller_noprof+0x234/0x528
    [<00000000cdc440a0>] kvasprintf+0x90/0x11c
    [<00000000500fc732>] kvasprintf_const+0x98/0x138
    [<0000000030e28143>] kobject_set_name_vargs+0x68/0x104
    [<00000000f15f6ece>] dev_set_name+0x6c/0x98
    [<00000000c1f76eb4>] __scmi_device_create+0x17c/0x2b4
    [<000000008714917b>] scmi_device_create+0x40/0x194
    [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
    [<00000000970bad38>] scmi_probe+0x584/0xa78
    [<000000002600d2fd>] platform_probe+0xbc/0xf0
    [<00000000f6f556b4>] really_probe+0x1b8/0x520
    [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
    [<00000000d613b754>] driver_probe_device+0x6c/0x208
    [<00000000187a9170>] __driver_attach+0x168/0x328
    [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
unreferenced object 0xffffff810637c800 (size 512):
  comm "swapper/0", pid 1, jiffies 4294893094
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff 00 1e ee 83 c0 ff ff ff  ................
  backtrace (crc 732b3ae6):
    [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
    [<0000000056fe02c9>] __kmalloc_cache_noprof+0x1e0/0x450
    [<0000000026a3cb30>] device_add+0x54/0x570
    [<00000000e515c343>] device_register+0x20/0x30
    [<0000000042008204>] __scmi_device_create+0x184/0x2b4
    [<000000008714917b>] scmi_device_create+0x40/0x194
    [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
    [<00000000970bad38>] scmi_probe+0x584/0xa78
    [<000000002600d2fd>] platform_probe+0xbc/0xf0
    [<00000000f6f556b4>] really_probe+0x1b8/0x520
    [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
    [<00000000d613b754>] driver_probe_device+0x6c/0x208
    [<00000000187a9170>] __driver_attach+0x168/0x328
    [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
    [<00000000984a3176>] driver_attach+0x34/0x44
    [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358



* Re: [Bug report] Memory leak in scmi_device_create
  2025-03-05 11:59 [Bug report] Memory leak in scmi_device_create Alice Ryhl
@ 2025-03-05 17:10 ` Cristian Marussi
  2025-03-06 11:09   ` Alice Ryhl
  0 siblings, 1 reply; 9+ messages in thread
From: Cristian Marussi @ 2025-03-05 17:10 UTC
  To: Alice Ryhl
  Cc: Sudeep Holla, Cristian Marussi, linux-arm-kernel, arm-scmi,
	linux-kernel

On Wed, Mar 05, 2025 at 11:59:58AM +0000, Alice Ryhl wrote:
> Dear SYSTEM CONTROL & POWER/MANAGEMENT INTERFACE (SCPI/SCMI) Message
> Protocol drivers maintainers,
> 
> I flashed a v6.13-rc3 kernel onto a Rock5B board and noticed the
> following output in my terminal:
> 
> [  687.694465] kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> 
> It seems that there is a memory leak for devices created with
> scmi_device_create.
> 
Hi Alice,

thanks for this report.

> This was with a kernel running v6.13-rc3, but as far as I can tell, no
> relevant changes have landed since v6.13-rc3. My tree *does* include
> commit 295416091e44 ("firmware: arm_scmi: Fix slab-use-after-free in
> scmi_bus_notifier()"). I've only seen this kmemleak report once, so it's
> not happening consistently.
> 
> See below for the full kmemleak report.
> 
> Alice
> 
> $ sudo cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffffff8106c86000 (size 2048):
>   comm "swapper/0", pid 1, jiffies 4294893094
>   hex dump (first 32 bytes):
>     02 00 00 00 10 00 00 00 c0 01 bc 03 81 ff ff ff  ................
>     60 67 ba 03 81 ff ff ff 18 60 c8 06 81 ff ff ff  `g.......`......
>   backtrace (crc feae9680):
>     [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
>     [<0000000056fe02c9>] __kmalloc_cache_noprof+0x1e0/0x450
>     [<00000000a8b3dfe1>] __scmi_device_create+0xb4/0x2b4
>     [<000000008714917b>] scmi_device_create+0x40/0x194
>     [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
>     [<00000000970bad38>] scmi_probe+0x584/0xa78
>     [<000000002600d2fd>] platform_probe+0xbc/0xf0
>     [<00000000f6f556b4>] really_probe+0x1b8/0x520
>     [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
>     [<00000000d613b754>] driver_probe_device+0x6c/0x208
>     [<00000000187a9170>] __driver_attach+0x168/0x328
>     [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
>     [<00000000984a3176>] driver_attach+0x34/0x44
>     [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358
>     [<00000000747fce19>] driver_register+0xc0/0x1a0
>     [<0000000081cb8754>] __platform_driver_register+0x40/0x50
> unreferenced object 0xffffff8103bc01c0 (size 32):

I could not reproduce this on my setup, even though I run a system with
all the existing SCMI protocols (and related drivers) enabled (and
so a lot of device creations), plus a downstream test driver that causes
even more SCMI devices to be created/destroyed at load/unload.

Coming down the path from scmi_chan_setup(), it seems to be something
around transport device creation, but it is not obvious to me where the
leak could hide...

Is there any particular setup on your side? Are you using LKMs,
loading/unloading, or any usage pattern that could help me reproduce it?

Thanks,
Cristian



* Re: [Bug report] Memory leak in scmi_device_create
  2025-03-05 17:10 ` Cristian Marussi
@ 2025-03-06 11:09   ` Alice Ryhl
  2025-03-06 12:50     ` Cristian Marussi
  2025-03-06 14:36     ` Catalin Marinas
  0 siblings, 2 replies; 9+ messages in thread
From: Alice Ryhl @ 2025-03-06 11:09 UTC
  To: Cristian Marussi; +Cc: Sudeep Holla, linux-arm-kernel, arm-scmi, linux-kernel

On Wed, Mar 05, 2025 at 05:10:16PM +0000, Cristian Marussi wrote:
> On Wed, Mar 05, 2025 at 11:59:58AM +0000, Alice Ryhl wrote:
> > Dear SYSTEM CONTROL & POWER/MANAGEMENT INTERFACE (SCPI/SCMI) Message
> > Protocol drivers maintainers,
> > 
> > I flashed a v6.13-rc3 kernel onto a Rock5B board and noticed the
> > following output in my terminal:
> > 
> > [  687.694465] kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > 
> > It seems that there is a memory leak for devices created with
> > scmi_device_create.
> > 
> `
> Hi Alice,
> 
> thanks for this report.
> 
> > This was with a kernel running v6.13-rc3, but as far as I can tell, no
> > relevant changes have landed since v6.13-rc3. My tree *does* include
> > commit 295416091e44 ("firmware: arm_scmi: Fix slab-use-after-free in
> > scmi_bus_notifier()"). I've only seen this kmemleak report once, so it's
> > not happening consistently.
> > 
> > See below for the full kmemleak report.
> > 
> > Alice
> > 
> > $ sudo cat /sys/kernel/debug/kmemleak
> > unreferenced object 0xffffff8106c86000 (size 2048):
> >   comm "swapper/0", pid 1, jiffies 4294893094
> >   hex dump (first 32 bytes):
> >     02 00 00 00 10 00 00 00 c0 01 bc 03 81 ff ff ff  ................
> >     60 67 ba 03 81 ff ff ff 18 60 c8 06 81 ff ff ff  `g.......`......
> >   backtrace (crc feae9680):
> >     [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
> >     [<0000000056fe02c9>] __kmalloc_cache_noprof+0x1e0/0x450
> >     [<00000000a8b3dfe1>] __scmi_device_create+0xb4/0x2b4
> >     [<000000008714917b>] scmi_device_create+0x40/0x194
> >     [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
> >     [<00000000970bad38>] scmi_probe+0x584/0xa78
> >     [<000000002600d2fd>] platform_probe+0xbc/0xf0
> >     [<00000000f6f556b4>] really_probe+0x1b8/0x520
> >     [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
> >     [<00000000d613b754>] driver_probe_device+0x6c/0x208
> >     [<00000000187a9170>] __driver_attach+0x168/0x328
> >     [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
> >     [<00000000984a3176>] driver_attach+0x34/0x44
> >     [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358
> >     [<00000000747fce19>] driver_register+0xc0/0x1a0
> >     [<0000000081cb8754>] __platform_driver_register+0x40/0x50
> > unreferenced object 0xffffff8103bc01c0 (size 32):
> 
> I could not reproduce on my setup, even though I run a system with
> all the existent SCMI protocols (and related drivers) enabled (and
> so a lot of device creations) and a downstream test driver that causes
> even more SCMI devices to be created/destroyed at load/unload.
> 
> Coming down the path from scmi_chan_setup(), it seems something around
> transport devices creation, but it is not obvious to me where the leak
> could hide....
> 
> ...any particular setup on your side ? ...using LKMs, loading/unloading,
> any usage pattern that could help me reproduce ?

I looked into this a bit more, and actually it does happen consistently.
It's just that kmemleak doesn't report it until 10 minutes after
booting, so I did not notice it.

As for my setup, well, I boot the kernel over PXE and the rootfs is
mounted over NFSv4. The memory leak happens even if I don't do anything
at all - I just boot and wait. The device is a Radxa Rock5B.

Not sure what other information there is to give.

I tried again with v6.14-rc5, and I still got the leak:

user@rk3588-ci:~$ sudo cat /sys/kernel/debug/kmemleak
unreferenced object 0xffffff81068c0000 (size 2048):
  comm "swapper/0", pid 1, jiffies 4294893128
  hex dump (first 32 bytes):
    02 00 00 00 10 00 00 00 40 a3 7a 03 81 ff ff ff  ........@.z.....
    60 c8 79 03 81 ff ff ff 18 00 8c 06 81 ff ff ff  `.y.............
  backtrace (crc 60df30fb):
    kmemleak_alloc+0x34/0xa0
    __kmalloc_cache_noprof+0x1e0/0x450
    __scmi_device_create+0xb4/0x2b4
    scmi_device_create+0x40/0x194
    scmi_chan_setup+0x144/0x3b8
    scmi_probe+0x51c/0x9fc
    platform_probe+0xbc/0xf0
    really_probe+0x1b8/0x520
    __driver_probe_device+0xe0/0x1d8
    driver_probe_device+0x6c/0x208
    __driver_attach+0x168/0x328
    bus_for_each_dev+0x14c/0x178
    driver_attach+0x34/0x44
    bus_add_driver+0x1bc/0x358
    driver_register+0xc0/0x1a0
    __platform_driver_register+0x40/0x50
unreferenced object 0xffffff81037aa340 (size 32):
  comm "swapper/0", pid 1, jiffies 4294893128
  hex dump (first 32 bytes):
    5f 5f 73 63 6d 69 5f 74 72 61 6e 73 70 6f 72 74  __scmi_transport
    5f 64 65 76 69 63 65 5f 72 78 5f 31 30 00 ff ff  _device_rx_10...
  backtrace (crc 8dab7ca7):
    kmemleak_alloc+0x34/0xa0
    __kmalloc_node_track_caller_noprof+0x234/0x528
    kstrdup+0x48/0x80
    kstrdup_const+0x30/0x3c
    __scmi_device_create+0xd4/0x2b4
    scmi_device_create+0x40/0x194
    scmi_chan_setup+0x144/0x3b8
    scmi_probe+0x51c/0x9fc
    platform_probe+0xbc/0xf0
    really_probe+0x1b8/0x520
    __driver_probe_device+0xe0/0x1d8
    driver_probe_device+0x6c/0x208
    __driver_attach+0x168/0x328
    bus_for_each_dev+0x14c/0x178
    driver_attach+0x34/0x44
    bus_add_driver+0x1bc/0x358
unreferenced object 0xffffff810379c860 (size 16):
  comm "swapper/0", pid 1, jiffies 4294893128
  hex dump (first 16 bytes):
    73 63 6d 69 5f 64 65 76 2e 32 00 03 81 ff ff ff  scmi_dev.2......
  backtrace (crc ccc21b9a):
    kmemleak_alloc+0x34/0xa0
    __kmalloc_node_track_caller_noprof+0x234/0x528
    kvasprintf+0x90/0x11c
    kvasprintf_const+0x98/0x138
    kobject_set_name_vargs+0x68/0x104
    dev_set_name+0x6c/0x98
    __scmi_device_create+0x17c/0x2b4
    scmi_device_create+0x40/0x194
    scmi_chan_setup+0x144/0x3b8
    scmi_probe+0x51c/0x9fc
    platform_probe+0xbc/0xf0
    really_probe+0x1b8/0x520
    __driver_probe_device+0xe0/0x1d8
    driver_probe_device+0x6c/0x208
    __driver_attach+0x168/0x328
    bus_for_each_dev+0x14c/0x178
unreferenced object 0xffffff8105be7400 (size 512):
  comm "swapper/0", pid 1, jiffies 4294893128
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff c0 b4 f0 83 c0 ff ff ff  ................
  backtrace (crc 7b92a969):
    kmemleak_alloc+0x34/0xa0
    __kmalloc_cache_noprof+0x1e0/0x450
    device_add+0x54/0x570
    device_register+0x20/0x30
    __scmi_device_create+0x184/0x2b4
    scmi_device_create+0x40/0x194
    scmi_chan_setup+0x144/0x3b8
    scmi_probe+0x51c/0x9fc
    platform_probe+0xbc/0xf0
    really_probe+0x1b8/0x520
    __driver_probe_device+0xe0/0x1d8
    driver_probe_device+0x6c/0x208
    __driver_attach+0x168/0x328
    bus_for_each_dev+0x14c/0x178
    driver_attach+0x34/0x44
    bus_add_driver+0x1bc/0x358



* Re: [Bug report] Memory leak in scmi_device_create
  2025-03-06 11:09   ` Alice Ryhl
@ 2025-03-06 12:50     ` Cristian Marussi
  2025-03-06 13:25       ` Alice Ryhl
  2025-03-06 14:36     ` Catalin Marinas
  1 sibling, 1 reply; 9+ messages in thread
From: Cristian Marussi @ 2025-03-06 12:50 UTC
  To: Alice Ryhl
  Cc: Cristian Marussi, Sudeep Holla, linux-arm-kernel, arm-scmi,
	linux-kernel

On Thu, Mar 06, 2025 at 11:09:33AM +0000, Alice Ryhl wrote:
> On Wed, Mar 05, 2025 at 05:10:16PM +0000, Cristian Marussi wrote:
> > On Wed, Mar 05, 2025 at 11:59:58AM +0000, Alice Ryhl wrote:
> > > Dear SYSTEM CONTROL & POWER/MANAGEMENT INTERFACE (SCPI/SCMI) Message
> > > Protocol drivers maintainers,
> > > 
> > > I flashed a v6.13-rc3 kernel onto a Rock5B board and noticed the
> > > following output in my terminal:
> > > 
> > > [  687.694465] kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > > 
> > > It seems that there is a memory leak for devices created with
> > > scmi_device_create.
> > > 
> > `
> > Hi Alice,
> > 
> > thanks for this report.
> > 
> > > This was with a kernel running v6.13-rc3, but as far as I can tell, no
> > > relevant changes have landed since v6.13-rc3. My tree *does* include
> > > commit 295416091e44 ("firmware: arm_scmi: Fix slab-use-after-free in
> > > scmi_bus_notifier()"). I've only seen this kmemleak report once, so it's
> > > not happening consistently.
> > > 
> > > See below for the full kmemleak report.
> > > 
> > > Alice
> > > 
> > > $ sudo cat /sys/kernel/debug/kmemleak
> > > unreferenced object 0xffffff8106c86000 (size 2048):
> > >   comm "swapper/0", pid 1, jiffies 4294893094
> > >   hex dump (first 32 bytes):
> > >     02 00 00 00 10 00 00 00 c0 01 bc 03 81 ff ff ff  ................
> > >     60 67 ba 03 81 ff ff ff 18 60 c8 06 81 ff ff ff  `g.......`......
> > >   backtrace (crc feae9680):
> > >     [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
> > >     [<0000000056fe02c9>] __kmalloc_cache_noprof+0x1e0/0x450
> > >     [<00000000a8b3dfe1>] __scmi_device_create+0xb4/0x2b4
> > >     [<000000008714917b>] scmi_device_create+0x40/0x194
> > >     [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
> > >     [<00000000970bad38>] scmi_probe+0x584/0xa78
> > >     [<000000002600d2fd>] platform_probe+0xbc/0xf0
> > >     [<00000000f6f556b4>] really_probe+0x1b8/0x520
> > >     [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
> > >     [<00000000d613b754>] driver_probe_device+0x6c/0x208
> > >     [<00000000187a9170>] __driver_attach+0x168/0x328
> > >     [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
> > >     [<00000000984a3176>] driver_attach+0x34/0x44
> > >     [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358
> > >     [<00000000747fce19>] driver_register+0xc0/0x1a0
> > >     [<0000000081cb8754>] __platform_driver_register+0x40/0x50
> > > unreferenced object 0xffffff8103bc01c0 (size 32):
> > 
> > I could not reproduce on my setup, even though I run a system with
> > all the existent SCMI protocols (and related drivers) enabled (and
> > so a lot of device creations) and a downstream test driver that causes
> > even more SCMI devices to be created/destroyed at load/unload.
> > 
> > Coming down the path from scmi_chan_setup(), it seems something around
> > transport devices creation, but it is not obvious to me where the leak
> > could hide....
> > 
> > ...any particular setup on your side ? ...using LKMs, loading/unloading,
> > any usage pattern that could help me reproduce ?
> 
> I looked into this a bit more, and actually it does happen consistently.
> It's just that kmemleak doesn't report it until 10 minutes after
> booting, so I did not notice it.
> 
> As for my setup, well, I boot the kernel over pxe and the rootfs is
> mounted over NFSv4. The memory leak happens even if I don't do anything
> at all - I just boot and wait. The device is a Radxa Rock5B.
> 
> Not sure what other information there is to give.
> 

My question above was mainly meant to understand whether the SCMI stack
is built-in or compiled as loadable modules (lsmod | grep -i scmi).
I am just trying to pin down a possibly 'more vulnerable' configuration.
I could not see any report, even after triggering a kmemleak scan on
v6.14-rc5, but so far I have only tested with a fully built-in SCMI
stack, hence the question.

> I tried again with v6.14-rc5, and I still got the leak:

Ok...thanks I will investigate with different configs.

Thanks,
Cristian



* Re: [Bug report] Memory leak in scmi_device_create
  2025-03-06 12:50     ` Cristian Marussi
@ 2025-03-06 13:25       ` Alice Ryhl
  0 siblings, 0 replies; 9+ messages in thread
From: Alice Ryhl @ 2025-03-06 13:25 UTC
  To: Cristian Marussi; +Cc: Sudeep Holla, linux-arm-kernel, arm-scmi, linux-kernel

On Thu, Mar 06, 2025 at 12:50:17PM +0000, Cristian Marussi wrote:
> On Thu, Mar 06, 2025 at 11:09:33AM +0000, Alice Ryhl wrote:
> > On Wed, Mar 05, 2025 at 05:10:16PM +0000, Cristian Marussi wrote:
> > > On Wed, Mar 05, 2025 at 11:59:58AM +0000, Alice Ryhl wrote:
> > > > Dear SYSTEM CONTROL & POWER/MANAGEMENT INTERFACE (SCPI/SCMI) Message
> > > > Protocol drivers maintainers,
> > > > 
> > > > I flashed a v6.13-rc3 kernel onto a Rock5B board and noticed the
> > > > following output in my terminal:
> > > > 
> > > > [  687.694465] kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > > > 
> > > > It seems that there is a memory leak for devices created with
> > > > scmi_device_create.
> > > > 
> > > `
> > > Hi Alice,
> > > 
> > > thanks for this report.
> > > 
> > > > This was with a kernel running v6.13-rc3, but as far as I can tell, no
> > > > relevant changes have landed since v6.13-rc3. My tree *does* include
> > > > commit 295416091e44 ("firmware: arm_scmi: Fix slab-use-after-free in
> > > > scmi_bus_notifier()"). I've only seen this kmemleak report once, so it's
> > > > not happening consistently.
> > > > 
> > > > See below for the full kmemleak report.
> > > > 
> > > > Alice
> > > > 
> > > > $ sudo cat /sys/kernel/debug/kmemleak
> > > > unreferenced object 0xffffff8106c86000 (size 2048):
> > > >   comm "swapper/0", pid 1, jiffies 4294893094
> > > >   hex dump (first 32 bytes):
> > > >     02 00 00 00 10 00 00 00 c0 01 bc 03 81 ff ff ff  ................
> > > >     60 67 ba 03 81 ff ff ff 18 60 c8 06 81 ff ff ff  `g.......`......
> > > >   backtrace (crc feae9680):
> > > >     [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
> > > >     [<0000000056fe02c9>] __kmalloc_cache_noprof+0x1e0/0x450
> > > >     [<00000000a8b3dfe1>] __scmi_device_create+0xb4/0x2b4
> > > >     [<000000008714917b>] scmi_device_create+0x40/0x194
> > > >     [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
> > > >     [<00000000970bad38>] scmi_probe+0x584/0xa78
> > > >     [<000000002600d2fd>] platform_probe+0xbc/0xf0
> > > >     [<00000000f6f556b4>] really_probe+0x1b8/0x520
> > > >     [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
> > > >     [<00000000d613b754>] driver_probe_device+0x6c/0x208
> > > >     [<00000000187a9170>] __driver_attach+0x168/0x328
> > > >     [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
> > > >     [<00000000984a3176>] driver_attach+0x34/0x44
> > > >     [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358
> > > >     [<00000000747fce19>] driver_register+0xc0/0x1a0
> > > >     [<0000000081cb8754>] __platform_driver_register+0x40/0x50
> > > > unreferenced object 0xffffff8103bc01c0 (size 32):
> > > 
> > > I could not reproduce on my setup, even though I run a system with
> > > all the existent SCMI protocols (and related drivers) enabled (and
> > > so a lot of device creations) and a downstream test driver that causes
> > > even more SCMI devices to be created/destroyed at load/unload.
> > > 
> > > Coming down the path from scmi_chan_setup(), it seems something around
> > > transport devices creation, but it is not obvious to me where the leak
> > > could hide....
> > > 
> > > ...any particular setup on your side ? ...using LKMs, loading/unloading,
> > > any usage pattern that could help me reproduce ?
> > 
> > I looked into this a bit more, and actually it does happen consistently.
> > It's just that kmemleak doesn't report it until 10 minutes after
> > booting, so I did not notice it.
> > 
> > As for my setup, well, I boot the kernel over pxe and the rootfs is
> > mounted over NFSv4. The memory leak happens even if I don't do anything
> > at all - I just boot and wait. The device is a Radxa Rock5B.
> > 
> > Not sure what other information there is to give.
> > 
> 
> My question as stated above was mainly to understand if the SCMI stack
> was built-in or compiled as loadable modules (lsmod|grep -i scmi)...
> ...I am just to try to pin down a possible 'more-vulnerable' configuration..
> ..I could not see any report even triggering a kmemleak scan on v6.14-rc5
> BUT I only tested with a fully built-in SCMI stack indeed as of now...so
> the question.
> 
> > I tried again with v6.14-rc5, and I still got the leak:
> 
> Ok...thanks I will investigate with different configs.

Here is the config I used:
https://gist.github.com/Darksonn/ecd31a0512f43f7e74a30ab83ff2196f

Alice



* Re: [Bug report] Memory leak in scmi_device_create
  2025-03-06 11:09   ` Alice Ryhl
  2025-03-06 12:50     ` Cristian Marussi
@ 2025-03-06 14:36     ` Catalin Marinas
  2025-03-06 15:47       ` Cristian Marussi
  1 sibling, 1 reply; 9+ messages in thread
From: Catalin Marinas @ 2025-03-06 14:36 UTC
  To: Alice Ryhl
  Cc: Cristian Marussi, Sudeep Holla, linux-arm-kernel, arm-scmi,
	linux-kernel

On Thu, Mar 06, 2025 at 11:09:33AM +0000, Alice Ryhl wrote:
> On Wed, Mar 05, 2025 at 05:10:16PM +0000, Cristian Marussi wrote:
> > On Wed, Mar 05, 2025 at 11:59:58AM +0000, Alice Ryhl wrote:
> > > This was with a kernel running v6.13-rc3, but as far as I can tell, no
> > > relevant changes have landed since v6.13-rc3. My tree *does* include
> > > commit 295416091e44 ("firmware: arm_scmi: Fix slab-use-after-free in
> > > scmi_bus_notifier()"). I've only seen this kmemleak report once, so it's
> > > not happening consistently.
> > > 
> > > See below for the full kmemleak report.
> > > 
> > > Alice
> > > 
> > > $ sudo cat /sys/kernel/debug/kmemleak
> > > unreferenced object 0xffffff8106c86000 (size 2048):
> > >   comm "swapper/0", pid 1, jiffies 4294893094
> > >   hex dump (first 32 bytes):
> > >     02 00 00 00 10 00 00 00 c0 01 bc 03 81 ff ff ff  ................
> > >     60 67 ba 03 81 ff ff ff 18 60 c8 06 81 ff ff ff  `g.......`......
> > >   backtrace (crc feae9680):
> > >     [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
> > >     [<0000000056fe02c9>] __kmalloc_cache_noprof+0x1e0/0x450
> > >     [<00000000a8b3dfe1>] __scmi_device_create+0xb4/0x2b4
> > >     [<000000008714917b>] scmi_device_create+0x40/0x194
> > >     [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
> > >     [<00000000970bad38>] scmi_probe+0x584/0xa78
> > >     [<000000002600d2fd>] platform_probe+0xbc/0xf0
> > >     [<00000000f6f556b4>] really_probe+0x1b8/0x520
> > >     [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
> > >     [<00000000d613b754>] driver_probe_device+0x6c/0x208
> > >     [<00000000187a9170>] __driver_attach+0x168/0x328
> > >     [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
> > >     [<00000000984a3176>] driver_attach+0x34/0x44
> > >     [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358
> > >     [<00000000747fce19>] driver_register+0xc0/0x1a0
> > >     [<0000000081cb8754>] __platform_driver_register+0x40/0x50
> > > unreferenced object 0xffffff8103bc01c0 (size 32):
> > 
> > I could not reproduce on my setup, even though I run a system with
> > all the existent SCMI protocols (and related drivers) enabled (and
> > so a lot of device creations) and a downstream test driver that causes
> > even more SCMI devices to be created/destroyed at load/unload.
> > 
> > Coming down the path from scmi_chan_setup(), it seems something around
> > transport devices creation, but it is not obvious to me where the leak
> > could hide....
> > 
> > ...any particular setup on your side ? ...using LKMs, loading/unloading,
> > any usage pattern that could help me reproduce ?
> 
> I looked into this a bit more, and actually it does happen consistently.
> It's just that kmemleak doesn't report it until 10 minutes after
> booting, so I did not notice it.

You can force the scanning with:

  echo scan > /sys/kernel/debug/kmemleak

Just do it a couple of times after boot, no need to wait 10 min for the
default background scanning.
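
For a test harness that wants to trigger the scan without a shell, the
same write can be done from C. This is only a minimal sketch: the helper
name kmemleak_write_cmd() and the KMEMLEAK_PATH macro are made up for
illustration, and it assumes debugfs is mounted at /sys/kernel/debug and
that the caller runs as root.

```c
#include <stdio.h>
#include <string.h>

/* Assumed default location; debugfs may be mounted elsewhere. */
#define KMEMLEAK_PATH "/sys/kernel/debug/kmemleak"

/*
 * Hypothetical helper (not a kernel or libc API): write a single command
 * such as "scan", followed by a newline, to the file at 'path'.
 * Returns 0 on success and -1 on any failure.
 */
static int kmemleak_write_cmd(const char *path, const char *cmd)
{
	FILE *f;
	int ok;

	if (!path || !cmd || strlen(cmd) == 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%s\n", cmd) > 0;

	/* fclose() flushes the buffer, so a failed write shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling kmemleak_write_cmd(KMEMLEAK_PATH, "scan") a couple of times and
then reading the same file back should be equivalent to the echo above.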

> user@rk3588-ci:~$ sudo cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffffff81068c0000 (size 2048):
>   comm "swapper/0", pid 1, jiffies 4294893128
>   hex dump (first 32 bytes):
>     02 00 00 00 10 00 00 00 40 a3 7a 03 81 ff ff ff  ........@.z.....
>     60 c8 79 03 81 ff ff ff 18 00 8c 06 81 ff ff ff  `.y.............
>   backtrace (crc 60df30fb):
>     kmemleak_alloc+0x34/0xa0
>     __kmalloc_cache_noprof+0x1e0/0x450
>     __scmi_device_create+0xb4/0x2b4

Is this the kzalloc() for sizeof(*scmi_dev)? It's surprisingly large; I
thought it would go to the kmalloc-1k slab, as struct device is below
this size, at least for my builds. Anyway...

>     scmi_device_create+0x40/0x194
>     scmi_chan_setup+0x144/0x3b8
>     scmi_probe+0x51c/0x9fc
>     platform_probe+0xbc/0xf0
>     really_probe+0x1b8/0x520
>     __driver_probe_device+0xe0/0x1d8
>     driver_probe_device+0x6c/0x208
>     __driver_attach+0x168/0x328
>     bus_for_each_dev+0x14c/0x178
>     driver_attach+0x34/0x44
>     bus_add_driver+0x1bc/0x358
>     driver_register+0xc0/0x1a0
>     __platform_driver_register+0x40/0x50
> unreferenced object 0xffffff81037aa340 (size 32):
>   comm "swapper/0", pid 1, jiffies 4294893128
>   hex dump (first 32 bytes):
>     5f 5f 73 63 6d 69 5f 74 72 61 6e 73 70 6f 72 74  __scmi_transport
>     5f 64 65 76 69 63 65 5f 72 78 5f 31 30 00 ff ff  _device_rx_10...
>   backtrace (crc 8dab7ca7):
>     kmemleak_alloc+0x34/0xa0
>     __kmalloc_node_track_caller_noprof+0x234/0x528
>     kstrdup+0x48/0x80
>     kstrdup_const+0x30/0x3c

These are referenced from the main structure above, so they'd be
reported as leaks as well.

This loop in scmi_device_create() looks strange:

	list_for_each_entry(rdev, phead, node) {
		struct scmi_device *sdev;

		sdev = __scmi_device_create(np, parent,
					    rdev->id_table->protocol_id,
					    rdev->id_table->name);
		/* Report errors and carry on... */
		if (sdev)
			scmi_dev = sdev;
		else
			pr_err("(%s) Failed to create device for protocol 0x%x (%s)\n",
			       of_node_full_name(parent->of_node),
			       rdev->id_table->protocol_id,
			       rdev->id_table->name);
	}

We can overwrite scmi_dev several times in the loop and lose the previous
sdev allocations. Is this intended?
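
Whether keeping only the last pointer actually loses the earlier objects
depends on whether something else still references them. Here is a
userspace sketch of that question, not kernel code: fake_dev, fake_bus,
fake_dev_create(), create_all() and fake_bus_destroy_all() are all
hypothetical names, and the assumption that a bus-like list keeps every
created device reachable is mine, not something established for the real
scmi bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct scmi_device. */
struct fake_dev {
	int protocol_id;
	struct fake_dev *next_registered;	/* link in the bus list */
};

/* Hypothetical stand-in for a bus that tracks every registered device. */
struct fake_bus {
	struct fake_dev *head;
	size_t count;
};

/* Allocate a device and, if a bus is given, register it there. */
static struct fake_dev *fake_dev_create(struct fake_bus *bus, int protocol_id)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->protocol_id = protocol_id;
	if (bus) {
		dev->next_registered = bus->head;
		bus->head = dev;
		bus->count++;
	}
	return dev;
}

/* Same shape as the loop above: create n devices, return only the last. */
static struct fake_dev *create_all(struct fake_bus *bus, const int *ids,
				   size_t n)
{
	struct fake_dev *last = NULL;

	for (size_t i = 0; i < n; i++) {
		struct fake_dev *dev = fake_dev_create(bus, ids[i]);

		if (dev)
			last = dev;	/* earlier pointers are overwritten */
	}
	return last;
}

/* Free everything the bus still tracks; returns how many were freed. */
static size_t fake_bus_destroy_all(struct fake_bus *bus)
{
	size_t freed = 0;

	while (bus->head) {
		struct fake_dev *dev = bus->head;

		bus->head = dev->next_registered;
		free(dev);
		freed++;
	}
	assert(freed == bus->count);
	bus->count = 0;
	return freed;
}
```

In this model the overwritten pointers are harmless as long as the bus
list exists and the teardown walks it. Pass bus == NULL to create_all()
and every device but the last becomes unreachable, which is the case I am
asking about.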

-- 
Catalin



* Re: [Bug report] Memory leak in scmi_device_create
  2025-03-06 14:36     ` Catalin Marinas
@ 2025-03-06 15:47       ` Cristian Marussi
  2025-03-06 16:18         ` Catalin Marinas
  0 siblings, 1 reply; 9+ messages in thread
From: Cristian Marussi @ 2025-03-06 15:47 UTC
  To: Catalin Marinas
  Cc: Alice Ryhl, Cristian Marussi, Sudeep Holla, linux-arm-kernel,
	arm-scmi, linux-kernel

On Thu, Mar 06, 2025 at 02:36:16PM +0000, Catalin Marinas wrote:
> On Thu, Mar 06, 2025 at 11:09:33AM +0000, Alice Ryhl wrote:
> > On Wed, Mar 05, 2025 at 05:10:16PM +0000, Cristian Marussi wrote:
> > > On Wed, Mar 05, 2025 at 11:59:58AM +0000, Alice Ryhl wrote:
> > > > This was with a kernel running v6.13-rc3, but as far as I can tell, no
> > > > relevant changes have landed since v6.13-rc3. My tree *does* include
> > > > commit 295416091e44 ("firmware: arm_scmi: Fix slab-use-after-free in
> > > > scmi_bus_notifier()"). I've only seen this kmemleak report once, so it's
> > > > not happening consistently.
> > > > 
> > > > See below for the full kmemleak report.
> > > > 
> > > > Alice
> > > > 
> > > > $ sudo cat /sys/kernel/debug/kmemleak
> > > > unreferenced object 0xffffff8106c86000 (size 2048):
> > > >   comm "swapper/0", pid 1, jiffies 4294893094
> > > >   hex dump (first 32 bytes):
> > > >     02 00 00 00 10 00 00 00 c0 01 bc 03 81 ff ff ff  ................
> > > >     60 67 ba 03 81 ff ff ff 18 60 c8 06 81 ff ff ff  `g.......`......
> > > >   backtrace (crc feae9680):
> > > >     [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
> > > >     [<0000000056fe02c9>] __kmalloc_cache_noprof+0x1e0/0x450
> > > >     [<00000000a8b3dfe1>] __scmi_device_create+0xb4/0x2b4
> > > >     [<000000008714917b>] scmi_device_create+0x40/0x194
> > > >     [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
> > > >     [<00000000970bad38>] scmi_probe+0x584/0xa78
> > > >     [<000000002600d2fd>] platform_probe+0xbc/0xf0
> > > >     [<00000000f6f556b4>] really_probe+0x1b8/0x520
> > > >     [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
> > > >     [<00000000d613b754>] driver_probe_device+0x6c/0x208
> > > >     [<00000000187a9170>] __driver_attach+0x168/0x328
> > > >     [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
> > > >     [<00000000984a3176>] driver_attach+0x34/0x44
> > > >     [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358
> > > >     [<00000000747fce19>] driver_register+0xc0/0x1a0
> > > >     [<0000000081cb8754>] __platform_driver_register+0x40/0x50
> > > > unreferenced object 0xffffff8103bc01c0 (size 32):
> > > 
> > > I could not reproduce on my setup, even though I run a system with
> > > all the existent SCMI protocols (and related drivers) enabled (and
> > > so a lot of device creations) and a downstream test driver that causes
> > > even more SCMI devices to be created/destroyed at load/unload.
> > > 
> > > Coming down the path from scmi_chan_setup(), it seems something around
> > > transport devices creation, but it is not obvious to me where the leak
> > > could hide....
> > > 
> > > ...any particular setup on your side ? ...using LKMs, loading/unloading,
> > > any usage pattern that could help me reproduce ?
> > 
> > I looked into this a bit more, and actually it does happen consistently.
> > It's just that kmemleak doesn't report it until 10 minutes after
> > booting, so I did not notice it.
> 
> You can force the scanning with:
> 
>   echo scan > /sys/kernel/debug/kmemleak
> 
> Just do it a couple of times after boot, no need to wait 10 min for the
> default background scanning.
> 
> > user@rk3588-ci:~$ sudo cat /sys/kernel/debug/kmemleak
> > unreferenced object 0xffffff81068c0000 (size 2048):
> >   comm "swapper/0", pid 1, jiffies 4294893128
> >   hex dump (first 32 bytes):
> >     02 00 00 00 10 00 00 00 40 a3 7a 03 81 ff ff ff  ........@.z.....
> >     60 c8 79 03 81 ff ff ff 18 00 8c 06 81 ff ff ff  `.y.............
> >   backtrace (crc 60df30fb):
> >     kmemleak_alloc+0x34/0xa0
> >     __kmalloc_cache_noprof+0x1e0/0x450
> >     __scmi_device_create+0xb4/0x2b4
> 
> Is this the kzalloc() for sizeof(*scmi_dev)? It's surprisingly large, I
> thought it would go for the kmalloc-1k slab as struct device is below
> this size, at least for my builds. Anyway...
> 
> >     scmi_device_create+0x40/0x194
> >     scmi_chan_setup+0x144/0x3b8
> >     scmi_probe+0x51c/0x9fc
> >     platform_probe+0xbc/0xf0
> >     really_probe+0x1b8/0x520
> >     __driver_probe_device+0xe0/0x1d8
> >     driver_probe_device+0x6c/0x208
> >     __driver_attach+0x168/0x328
> >     bus_for_each_dev+0x14c/0x178
> >     driver_attach+0x34/0x44
> >     bus_add_driver+0x1bc/0x358
> >     driver_register+0xc0/0x1a0
> >     __platform_driver_register+0x40/0x50
> > unreferenced object 0xffffff81037aa340 (size 32):
> >   comm "swapper/0", pid 1, jiffies 4294893128
> >   hex dump (first 32 bytes):
> >     5f 5f 73 63 6d 69 5f 74 72 61 6e 73 70 6f 72 74  __scmi_transport
> >     5f 64 65 76 69 63 65 5f 72 78 5f 31 30 00 ff ff  _device_rx_10...
> >   backtrace (crc 8dab7ca7):
> >     kmemleak_alloc+0x34/0xa0
> >     __kmalloc_node_track_caller_noprof+0x234/0x528
> >     kstrdup+0x48/0x80
> >     kstrdup_const+0x30/0x3c
> 
> These are referenced from the main structure above, so they'd be
> reported as leaks as well.
> 
> This loop in scmi_device_create() looks strange:
> 
> 	list_for_each_entry(rdev, phead, node) {
> 		struct scmi_device *sdev;
> 
> 		sdev = __scmi_device_create(np, parent,
> 					    rdev->id_table->protocol_id,
> 					    rdev->id_table->name);
> 		/* Report errors and carry on... */
> 		if (sdev)
> 			scmi_dev = sdev;
> 		else
> 			pr_err("(%s) Failed to create device for protocol 0x%x (%s)\n",
> 			       of_node_full_name(parent->of_node),
> 			       rdev->id_table->protocol_id,
> 			       rdev->id_table->name);
> 	}
> 
> We can override scmi_dev a few times in the loop and lose the previous
> sdev allocations. Is this intended?

Yes...it is weird..but by design I would say :P ...

...because this is called either to instantiate one single device OR to
instantiate at once all the devices needed for a protocol: in the latter case it
returns just one of the created devices to signal success, or NULL if every
device creation failed....we don't need to keep references to the allocated
devices here anyway, since on success those devices are referenced and kept on
the SCMI bus, so they can be searched/scanned/destroyed from there.
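
The return convention described above can be sketched in plain userspace C. This is only an illustrative model, not the kernel code: struct model_dev, model_bus, model_create_one() and model_create_all() are hypothetical names, and the fixed-size array merely stands in for the SCMI bus holding the devices.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a device; not the real SCMI structure. */
struct model_dev {
	const char *name;
	int registered;		/* set once the "bus" holds the device */
};

#define MODEL_MAX_DEVS 8

/* The "bus" owns every created device, so callers need not keep pointers. */
static struct model_dev model_bus[MODEL_MAX_DEVS];
static size_t model_bus_count;

/* Create one device; returns NULL for a NULL name or when the bus is full. */
static struct model_dev *model_create_one(const char *name)
{
	struct model_dev *d;

	if (!name || model_bus_count == MODEL_MAX_DEVS)
		return NULL;

	d = &model_bus[model_bus_count++];
	d->name = name;
	d->registered = 1;
	return d;
}

/*
 * Create a device for each requested name. Failures are reported and
 * skipped; the return value is the last device that was created, or NULL
 * if every creation failed.
 */
static struct model_dev *model_create_all(const char *const *names, size_t n)
{
	struct model_dev *ret = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_dev *d = model_create_one(names[i]);

		if (d)
			ret = d;
		else
			fprintf(stderr, "failed to create device %zu\n", i);
	}
	return ret;
}
```

In this model the overwritten pointers are not lost, because every created device stays reachable through model_bus; the returned pointer only signals that at least one creation succeeded.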

But maybe this is the crux of the matter, or what fools kmemleak...I
will try to reproduce again.

Thanks,
Cristian


* Re: [Bug report] Memory leak in scmi_device_create
  2025-03-06 15:47       ` Cristian Marussi
@ 2025-03-06 16:18         ` Catalin Marinas
  2025-03-06 18:43           ` Cristian Marussi
  0 siblings, 1 reply; 9+ messages in thread
From: Catalin Marinas @ 2025-03-06 16:18 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Alice Ryhl, Sudeep Holla, linux-arm-kernel, arm-scmi,
	linux-kernel

On Thu, Mar 06, 2025 at 03:47:27PM +0000, Cristian Marussi wrote:
> On Thu, Mar 06, 2025 at 02:36:16PM +0000, Catalin Marinas wrote:
> > This loop in scmi_device_create() looks strange:
> > 
> > 	list_for_each_entry(rdev, phead, node) {
> > 		struct scmi_device *sdev;
> > 
> > 		sdev = __scmi_device_create(np, parent,
> > 					    rdev->id_table->protocol_id,
> > 					    rdev->id_table->name);
> > 		/* Report errors and carry on... */
> > 		if (sdev)
> > 			scmi_dev = sdev;
> > 		else
> > 			pr_err("(%s) Failed to create device for protocol 0x%x (%s)\n",
> > 			       of_node_full_name(parent->of_node),
> > 			       rdev->id_table->protocol_id,
> > 			       rdev->id_table->name);
> > 	}
> > 
> > We can override scmi_dev a few times in the loop and lose the previous
> > sdev allocations. Is this intended?
> 
> Yes...it is weird..but by design I would say :P ...
> 
> ...because this is called to instantiate one single device OR instantiate at
> once all the multiple devices needed for a protocol: in this latter case it
> returns just one of the created devices to signal success or NULL if all the
> devices' creation failed....we don't need to keep the allocated devices references
> anyway here since on success those devices are now referenced and kept on the
> SCMI bus, so they can be searched/scanned/destroyed from there.

Not sure why the pointer isn't found, device_add() should link it with
the parent. Unless something else fails, the parent is freed and the
linked devices unreachable. I'm not familiar at all with this code, I
just saw kmemleak and thought of replying.

The loop is still weird, scmi_chan_setup() seems to use the pointer to
scmi_device for something more meaningful than a pass/fail check. Also
the overall result is based only on what the last __scmi_device_create()
return value was, irrespective of the previous iterations of the loop.
You do have a pr_err() but no early bailing out of the loop on failure.
I'm curious if there are any SCMI errors in Alice's kernel log.

-- 
Catalin


* Re: [Bug report] Memory leak in scmi_device_create
  2025-03-06 16:18         ` Catalin Marinas
@ 2025-03-06 18:43           ` Cristian Marussi
  0 siblings, 0 replies; 9+ messages in thread
From: Cristian Marussi @ 2025-03-06 18:43 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Cristian Marussi, Alice Ryhl, Sudeep Holla, linux-arm-kernel,
	arm-scmi, linux-kernel

On Thu, Mar 06, 2025 at 04:18:38PM +0000, Catalin Marinas wrote:
> On Thu, Mar 06, 2025 at 03:47:27PM +0000, Cristian Marussi wrote:
> > On Thu, Mar 06, 2025 at 02:36:16PM +0000, Catalin Marinas wrote:
> > > This loop in scmi_device_create() looks strange:
> > > 
> > > 	list_for_each_entry(rdev, phead, node) {
> > > 		struct scmi_device *sdev;
> > > 
> > > 		sdev = __scmi_device_create(np, parent,
> > > 					    rdev->id_table->protocol_id,
> > > 					    rdev->id_table->name);
> > > 		/* Report errors and carry on... */
> > > 		if (sdev)
> > > 			scmi_dev = sdev;
> > > 		else
> > > 			pr_err("(%s) Failed to create device for protocol 0x%x (%s)\n",
> > > 			       of_node_full_name(parent->of_node),
> > > 			       rdev->id_table->protocol_id,
> > > 			       rdev->id_table->name);
> > > 	}
> > > 
> > > We can override scmi_dev a few times in the loop and lose the previous
> > > sdev allocations. Is this intended?
> > 
> > Yes...it is weird..but by design I would say :P ...
> > 
> > ...because this is called to instantiate one single device OR instantiate at
> > once all the multiple devices needed for a protocol: in this latter case it
> > returns just one of the created devices to signal success or NULL if all the
> > devices' creation failed....we don't need to keep the allocated devices references
> > anyway here since on success those devices are now referenced and kept on the
> > SCMI bus, so they can be searched/scanned/destroyed from there.
> 
> Not sure why the pointer isn't found, device_add() should link it with
> the parent. Unless something else fails, the parent is freed and the
> linked devices unreachable. I'm not familiar at all with this code, I
> just saw kmemleak and thought of replying.
> 
> The loop is still weird, scmi_chan_setup() seems to use the pointer to
> scmi_device for something more meaningful than a pass/fail check. Also
> the overall result is based only on what the last __scmi_device_create()
> return value was, irrespective of the previous iterations of the loop.
> You do have a pr_err() but no early bailing out of the loop on failure.
> I'm curious if there are any SCMI errors in Alice's kernel log.
> 

Yes, the weirdness comes from the fact that this function is used either
to create a single named device (and make some use of it, like in
scmi_chan_setup) OR to create a bunch of devices for the same protocol
when no specific device is asked for (name==NULL)...anyway the case at
hand that kmemleak complains about does NOT pass through that weird loop...

...good news is, I was able to reproduce a similar report consistently
with a load/unload/load sequence....the culprit is that when looking for
a device to destroy on unload, the SCMI bus uses device_find_child()
and that bumps the device refcnt implicitly...as a result, when the device
is destroyed the refcnt NEVER drops to zero and so NO device_release
is ever called...this results in the dev->p private data never being released
and that is what kmemleak spotted (at the start of the chain):


unreferenced object 0xffff00000f583800 (size 512):
      comm "insmod", pid 227, jiffies 4294912190
      hex dump (first 32 bytes):
        00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
        ff ff ff ff ff ff ff ff 60 36 1d 8a 00 80 ff ff  ........`6......
      backtrace (crc 114e2eed):
        kmemleak_alloc+0xbc/0xd8
        __kmalloc_cache_noprof+0x2dc/0x398
        device_add+0x954/0x12d0
        device_register+0x28/0x40
        __scmi_device_create.part.0+0x1bc/0x380
        scmi_device_create+0x2d0/0x390
        scmi_create_protocol_devices+0x74/0xf8
        scmi_device_request_notifier+0x1f8/0x2a8
        notifier_call_chain+0x110/0x3b0
        blocking_notifier_call_chain+0x70/0xb0
        scmi_driver_register+0x350/0x7f0
        0xffff80000a3b3038
        do_one_initcall+0x12c/0x730
        do_init_module+0x1dc/0x640
        load_module+0x4b20/0x5b70
        init_module_from_file+0xec/0x158
    
    $ ./scripts/faddr2line ./vmlinux device_add+0x954/0x12d0
    device_add+0x954/0x12d0:
    kmalloc_noprof at include/linux/slab.h:901
    (inlined by) kzalloc_noprof at include/linux/slab.h:1037
    (inlined by) device_private_init at drivers/base/core.c:3510
    (inlined by) device_add at drivers/base/core.c:3561

I am posting a fix.
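
To illustrate the refcount imbalance described above, here is a minimal userspace sketch. It is a model under stated assumptions, not kernel code and not necessarily the shape of the posted fix: struct model_device and all model_*() helpers are hypothetical, with model_find_child() mimicking the way device_find_child() returns its match with an extra reference held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a refcounted device; not kernel code. */
struct model_device {
	int refcount;
	int released;	/* set when the release step has run */
};

static void model_init(struct model_device *dev)
{
	dev->refcount = 1;	/* reference held after registration */
	dev->released = 0;
}

static struct model_device *model_get(struct model_device *dev)
{
	dev->refcount++;
	return dev;
}

/* Drop one reference; the release step runs only when the count hits zero. */
static void model_put(struct model_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->released = 1;
}

/* Like device_find_child(): the match is returned with an extra reference. */
static struct model_device *model_find_child(struct model_device *dev)
{
	return model_get(dev);
}

/* Leaky teardown: the reference taken by the lookup is never dropped. */
static void model_destroy_leaky(struct model_device *dev)
{
	struct model_device *found = model_find_child(dev);

	model_put(found);	/* stands in for device_unregister() */
}

/* Balanced teardown: the lookup reference is dropped as well. */
static void model_destroy_balanced(struct model_device *dev)
{
	struct model_device *found = model_find_child(dev);

	model_put(found);	/* drop the reference taken by the lookup */
	model_put(found);	/* stands in for device_unregister() */
}
```

After model_destroy_leaky() the count is stuck at one and the release step never runs, which is the situation kmemleak flags; model_destroy_balanced() reaches zero and releases the object.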

Thanks for the report and the help.
Any feedback and testing is much welcomed :D

Cristian


Thread overview (9+ messages):
2025-03-05 11:59 [Bug report] Memory leak in scmi_device_create Alice Ryhl
2025-03-05 17:10 ` Cristian Marussi
2025-03-06 11:09   ` Alice Ryhl
2025-03-06 12:50     ` Cristian Marussi
2025-03-06 13:25       ` Alice Ryhl
2025-03-06 14:36     ` Catalin Marinas
2025-03-06 15:47       ` Cristian Marussi
2025-03-06 16:18         ` Catalin Marinas
2025-03-06 18:43           ` Cristian Marussi
