* [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload
@ 2025-05-06 13:11 Mario Limonciello
2025-05-06 15:53 ` Dan Carpenter
0 siblings, 1 reply; 6+ messages in thread
From: Mario Limonciello @ 2025-05-06 13:11 UTC (permalink / raw)
To: mario.limonciello, Shyam-sundar.S-k, hdegoede, ilpo.jarvinen,
dan.carpenter
Cc: platform-driver-x86
From: Mario Limonciello <mario.limonciello@amd.com>
If setting up smart PC fails for any reason then this can lead to
a double free when unloading amd-pmf. This is because dev->buf was
freed but never set to NULL and is again freed in
amd_pmf_deinit_smart_pc().
Explicitly set pointers to NULL after freeing them to avoid the
double free.
Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/platform/x86/amd/pmf/tee-if.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c
index a1e43873a07b0..48902f1c767c6 100644
--- a/drivers/platform/x86/amd/pmf/tee-if.c
+++ b/drivers/platform/x86/amd/pmf/tee-if.c
@@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
amd_pmf_tee_deinit(dev);
err_free_prev_data:
kfree(dev->prev_data);
+ dev->prev_data = NULL;
err_free_policy:
kfree(dev->policy_buf);
+ dev->policy_buf = NULL;
err_free_dram_buf:
kfree(dev->buf);
+ dev->buf = NULL;
err_cancel_work:
cancel_delayed_work_sync(&dev->pb_work);
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload 2025-05-06 13:11 [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload Mario Limonciello @ 2025-05-06 15:53 ` Dan Carpenter 2025-05-06 16:11 ` Mario Limonciello 0 siblings, 1 reply; 6+ messages in thread From: Dan Carpenter @ 2025-05-06 15:53 UTC (permalink / raw) To: Mario Limonciello Cc: mario.limonciello, Shyam-sundar.S-k, hdegoede, ilpo.jarvinen, platform-driver-x86 On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote: > From: Mario Limonciello <mario.limonciello@amd.com> > > If setting up smart PC fails for any reason then this can lead to > a double free when unloading amd-pmf. This is because dev->buf was > freed but never set to NULL and is again freed in > amd_pmf_deinit_smart_pc(). > > Explicitly set pointers to NULL after freeing them to avoid the > double free. > > Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()") > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > --- > drivers/platform/x86/amd/pmf/tee-if.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c > index a1e43873a07b0..48902f1c767c6 100644 > --- a/drivers/platform/x86/amd/pmf/tee-if.c > +++ b/drivers/platform/x86/amd/pmf/tee-if.c > @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) > amd_pmf_tee_deinit(dev); ^^^^^^^^^^^^^^^^^^^^^^^ > err_free_prev_data: > kfree(dev->prev_data); > + dev->prev_data = NULL; > err_free_policy: > kfree(dev->policy_buf); > + dev->policy_buf = NULL; > err_free_dram_buf: > kfree(dev->buf); > + dev->buf = NULL; > err_cancel_work: > cancel_delayed_work_sync(&dev->pb_work); > This is a real bug. Did you find it from testing or reading the code? My reading of the code says that this bug can only occur if amd_pmf_register_input_device() fails, right? We can only call amd_pmf_deinit_smart_pc() if amd_pmf_start_policy_engine() succeeds because that's where we set: dev->smart_pc_enabled = true; This patch doesn't totally fix the problem because we would still call amd_pmf_tee_deinit(). That's why I suspect you found this by auditing the code because I think that remaining bug would trigger a stack trace. I also worry that there is a small race window where we could trigger amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished running. Another bug is that we should cancel the work before freeing all the pointers. This looks like the more serious bug. What about if we only set dev->smart_pc_enabled = true if the whole amd_pmf_init_smart_pc() has succeeded? regards, dan carpenter ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload 2025-05-06 15:53 ` Dan Carpenter @ 2025-05-06 16:11 ` Mario Limonciello 2025-05-07 1:41 ` Mario Limonciello 2025-05-07 1:49 ` Mario Limonciello 0 siblings, 2 replies; 6+ messages in thread From: Mario Limonciello @ 2025-05-06 16:11 UTC (permalink / raw) To: Dan Carpenter Cc: mario.limonciello, Shyam-sundar.S-k, hdegoede, ilpo.jarvinen, platform-driver-x86 On 5/6/2025 10:53 AM, Dan Carpenter wrote: > On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote: >> From: Mario Limonciello <mario.limonciello@amd.com> >> >> If setting up smart PC fails for any reason then this can lead to >> a double free when unloading amd-pmf. This is because dev->buf was >> freed but never set to NULL and is again freed in >> amd_pmf_deinit_smart_pc(). >> >> Explicitly set pointers to NULL after freeing them to avoid the >> double free. >> >> Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()") >> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> >> --- >> drivers/platform/x86/amd/pmf/tee-if.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c >> index a1e43873a07b0..48902f1c767c6 100644 >> --- a/drivers/platform/x86/amd/pmf/tee-if.c >> +++ b/drivers/platform/x86/amd/pmf/tee-if.c >> @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) >> amd_pmf_tee_deinit(dev); > ^^^^^^^^^^^^^^^^^^^^^^^ > >> err_free_prev_data: >> kfree(dev->prev_data); >> + dev->prev_data = NULL; >> err_free_policy: >> kfree(dev->policy_buf); >> + dev->policy_buf = NULL; >> err_free_dram_buf: >> kfree(dev->buf); >> + dev->buf = NULL; >> err_cancel_work: >> cancel_delayed_work_sync(&dev->pb_work); >> > > This is a real bug. Did you find it from testing or reading the code? I found it from testing. I was testing some other unrelated changes and found that unloading/reloading the module eventually lead to problems. I tracked it down to your change. > My reading of the code says that this bug can only occur if > amd_pmf_register_input_device() fails, right? No; it was happening from a failure where the system didn't have a policy or had a "bad" policy. > > We can only call amd_pmf_deinit_smart_pc() if amd_pmf_start_policy_engine() > succeeds because that's where we set: > > dev->smart_pc_enabled = true; > > This patch doesn't totally fix the problem because we would still call > amd_pmf_tee_deinit(). That's why I suspect you found this by auditing > the code because I think that remaining bug would trigger a stack trace. > I also worry that there is a small race window where we could trigger > amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished > running. > > Another bug is that we should cancel the work before freeing all the > pointers. This looks like the more serious bug. > > What about if we only set dev->smart_pc_enabled = true if the whole > amd_pmf_init_smart_pc() has succeeded? > > regards, > dan carpenter > Right; it's only set when amd_pmf_start_policy_engine() succeeds which was not the case for me. This makes me wonder how exactly this was happening [amd_pmf_deinit_smart_pc() would only be called from amd_pmf_deinit_features()]. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload 2025-05-06 16:11 ` Mario Limonciello @ 2025-05-07 1:41 ` Mario Limonciello 2025-05-07 1:49 ` Mario Limonciello 1 sibling, 0 replies; 6+ messages in thread From: Mario Limonciello @ 2025-05-07 1:41 UTC (permalink / raw) To: Dan Carpenter Cc: Shyam-sundar.S-k, hdegoede, ilpo.jarvinen, platform-driver-x86 On 5/6/2025 11:11 AM, Mario Limonciello wrote: > On 5/6/2025 10:53 AM, Dan Carpenter wrote: >> On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote: >>> From: Mario Limonciello <mario.limonciello@amd.com> >>> >>> If setting up smart PC fails for any reason then this can lead to >>> a double free when unloading amd-pmf. This is because dev->buf was >>> freed but never set to NULL and is again freed in >>> amd_pmf_deinit_smart_pc(). >>> >>> Explicitly set pointers to NULL after freeing them to avoid the >>> double free. >>> >>> Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in >>> amd_pmf_init_smart_pc()") >>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> >>> --- >>> drivers/platform/x86/amd/pmf/tee-if.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/ >>> platform/x86/amd/pmf/tee-if.c >>> index a1e43873a07b0..48902f1c767c6 100644 >>> --- a/drivers/platform/x86/amd/pmf/tee-if.c >>> +++ b/drivers/platform/x86/amd/pmf/tee-if.c >>> @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) >>> amd_pmf_tee_deinit(dev); >> ^^^^^^^^^^^^^^^^^^^^^^^ >> >>> err_free_prev_data: >>> kfree(dev->prev_data); >>> + dev->prev_data = NULL; >>> err_free_policy: >>> kfree(dev->policy_buf); >>> + dev->policy_buf = NULL; >>> err_free_dram_buf: >>> kfree(dev->buf); >>> + dev->buf = NULL; >>> err_cancel_work: >>> cancel_delayed_work_sync(&dev->pb_work); >> >> This is a real bug. Did you find it from testing or reading the code? > > I found it from testing. I was testing some other unrelated changes and > found that unloading/reloading the module eventually lead to problems. I > tracked it down to your change. > >> My reading of the code says that this bug can only occur if >> amd_pmf_register_input_device() fails, right? > > No; it was happening from a failure where the system didn't have a > policy or had a "bad" policy. > >> >> We can only call amd_pmf_deinit_smart_pc() if >> amd_pmf_start_policy_engine() >> succeeds because that's where we set: >> >> dev->smart_pc_enabled = true; >> >> This patch doesn't totally fix the problem because we would still call >> amd_pmf_tee_deinit(). That's why I suspect you found this by auditing >> the code because I think that remaining bug would trigger a stack trace. >> I also worry that there is a small race window where we could trigger >> amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished >> running. >> >> Another bug is that we should cancel the work before freeing all the >> pointers. This looks like the more serious bug. >> >> What about if we only set dev->smart_pc_enabled = true if the whole >> amd_pmf_init_smart_pc() has succeeded? >> >> regards, >> dan carpenter >> > > Right; it's only set when amd_pmf_start_policy_engine() succeeds which > was not the case for me. This makes me wonder how exactly this was > happening [amd_pmf_deinit_smart_pc() would only be called from > amd_pmf_deinit_features()]. For completeness; here is the trace back after load/unload which was fixed by this patch I shared (passed through scripts/decode_stacktrace.sh) [ 33.636532] ------------[ cut here ]------------ [ 33.636540] kernel BUG at mm/slub.c:546! [ 33.636554] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 33.636560] CPU: 4 UID: 0 PID: 3245 Comm: rmmod Not tainted 6.15.0-rc1-00020-g0581d384f344 #79 PREEMPT(full) 4b12a0d175d8a11a0952957398aaf7c2efd0bbd7 [ 33.636565] Hardware name: Framework Laptop 13 (AMD Ryzen AI 300 Series)/FRANMGCP09, BIOS 03.03 03/10/2025 [ 33.636567] RIP: 0010:__slab_free (mm/slub.c:546 (discriminator 1) mm/slub.c:4475 (discriminator 1)) [ 33.636578] Code: 0c 44 0f b6 4c 24 13 44 8b 44 24 14 48 89 44 24 20 49 8b 04 24 4c 8b 54 24 18 48 c1 e8 09 83 e0 01 88 44 24 11 e9 ce fe ff ff <0f> 0b 41 f7 46 08 87 04 00 00 75 91 eb 83 41 f7 46 08 87 04 00 00 All code ======== 0: 0c 44 or $0x44,%al 2: 0f b6 4c 24 13 movzbl 0x13(%rsp),%ecx 7: 44 8b 44 24 14 mov 0x14(%rsp),%r8d c: 48 89 44 24 20 mov %rax,0x20(%rsp) 11: 49 8b 04 24 mov (%r12),%rax 15: 4c 8b 54 24 18 mov 0x18(%rsp),%r10 1a: 48 c1 e8 09 shr $0x9,%rax 1e: 83 e0 01 and $0x1,%eax 21: 88 44 24 11 mov %al,0x11(%rsp) 25: e9 ce fe ff ff jmp 0xfffffffffffffef8 2a:* 0f 0b ud2 <-- trapping instruction 2c: 41 f7 46 08 87 04 00 testl $0x487,0x8(%r14) 33: 00 34: 75 91 jne 0xffffffffffffffc7 36: eb 83 jmp 0xffffffffffffffbb 38: 41 f7 46 08 87 04 00 testl $0x487,0x8(%r14) 3f: 00 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: 41 f7 46 08 87 04 00 testl $0x487,0x8(%r14) 9: 00 a: 75 91 jne 0xffffffffffffff9d c: eb 83 jmp 0xffffffffffffff91 e: 41 f7 46 08 87 04 00 testl $0x487,0x8(%r14) 15: 00 [ 33.636580] RSP: 0018:ffffd3ae43327a10 EFLAGS: 00010246 [ 33.636584] RAX: ffff8a6164a50f00 RBX: 000000008020001e RCX: 0000000000000000 [ 33.636585] RDX: ffffd3ae43327a40 RSI: fffff51cc4929400 RDI: ffffd3ae43327a80 [ 33.636587] RBP: ffffd3ae43327ab0 R08: 0000000000000001 R09: ffffffffaa799600 [ 33.636588] R10: ffff8a6164a50e00 R11: ffffd3ae43327ab0 R12: fffff51cc4929400 [ 33.636589] R13: ffff8a6164a50e00 R14: ffff8a6140042a00 R15: 0000000000000000 [ 33.636590] FS: 00007f6c8ee08680(0000) GS:ffff8a68d20c3000(0000) knlGS:0000000000000000 [ 33.636591] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 33.636592] CR2: 00007f6c8e4578e0 CR3: 000000012fdb0000 CR4: 0000000000b50ef0 [ 33.636594] Call Trace: [ 33.636596] <TASK> [ 33.636598] ? __call_rcu_common (./arch/x86/include/asm/atomic64_64.h:15 ./include/linux/atomic/atomic-arch-fallback.h:2583 ./include/linux/atomic/atomic-long.h:38 ./include/linux/atomic/atomic-instrumented.h:3189 kernel/rcu/rcu_segcblist.h:52 kernel/rcu/tree.c:2956 kernel/rcu/tree.c:2967 kernel/rcu/tree.c:3106) [ 33.636605] ? platform_remove (drivers/base/platform.c:1424) [ 33.636610] kfree (mm/slub.c:4599 mm/slub.c:4647 mm/slub.c:4845) [ 33.636613] ? mntput_no_expire (fs/namespace.c:259 fs/namespace.c:1492) [ 33.636626] platform_remove (drivers/base/platform.c:1424) [ 33.636628] device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296) [ 33.636634] driver_detach (drivers/base/dd.c:1360) [ 33.636637] bus_remove_driver (drivers/base/bus.c:748) [ 33.636640] __do_sys_delete_module (kernel/module/main.c:781) [ 33.636646] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) [ 33.636654] ? vfs_read (fs/read_write.c:489 fs/read_write.c:570) [ 33.636658] ? ___pte_offset_map (./include/linux/pgtable.h:347 (discriminator 2) ./include/linux/pgtable.h:624 (discriminator 2) mm/pgtable-generic.c:289 (discriminator 2)) [ 33.636663] ? next_uptodate_folio (./include/linux/xarray.h:1722 mm/filemap.c:3585) [ 33.636668] ? __mod_memcg_lruvec_state (mm/memcontrol.c:586 mm/memcontrol.c:771) [ 33.636673] ? mod_objcg_state (mm/memcontrol.c:2475 mm/memcontrol.c:2837) [ 33.636675] ? xas_load (./include/linux/xarray.h:175 ./include/linux/xarray.h:1264 lib/xarray.c:241) [ 33.636680] ? xa_load (lib/xarray.c:1624) [ 33.636683] ? __memcg_slab_free_hook (mm/memcontrol.c:3093 (discriminator 1)) [ 33.636686] ? __x64_sys_close (fs/open.c:1584 fs/open.c:1566 fs/open.c:1566) [ 33.636689] ? kmem_cache_free (mm/slub.c:4646 (discriminator 1) mm/slub.c:4748 (discriminator 1)) [ 33.636692] ? syscall_exit_to_user_mode (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:114 ./include/linux/entry-common.h:232 kernel/entry/common.c:206 kernel/entry/common.c:218) [ 33.636695] ? do_syscall_64 (arch/x86/entry/syscall_64.c:113) [ 33.636698] ? __count_memcg_events (mm/memcontrol.c:586 mm/memcontrol.c:862) [ 33.636699] ? handle_mm_fault (mm/memory.c:6182 mm/memory.c:6335) [ 33.636704] ? do_user_addr_fault (arch/x86/mm/fault.c:1338) [ 33.636710] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:114 arch/x86/mm/fault.c:1488 arch/x86/mm/fault.c:1538) [ 33.636713] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [ 33.636716] RIP: 0033:0x7f6c8e53519b [ 33.636719] Code: 73 01 c3 48 8b 0d 5d bc 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d bc 0d 00 f7 d8 64 89 01 48 And this was built and reproduced from platform-x86/review-ilpo-fixes: 0581d384f344e (HEAD, platform-x86/review-ilpo-fixes) platform/x86/amd/hsmp: Make amd_hsmp and hsmp_acpi as mutually exclusive drivers 8e81b9cd6e951 drivers/platform/x86/amd: pmf: Check for invalid Smart PC Policies 690d722e02819 drivers/platform/x86/amd: pmf: Check for invalid sideloaded Smart PC Policies 02c6e43397c39 (tag: platform-drivers-x86-v6.15-4, platform-x86/fixes) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload 2025-05-06 16:11 ` Mario Limonciello 2025-05-07 1:41 ` Mario Limonciello @ 2025-05-07 1:49 ` Mario Limonciello 2025-05-07 7:22 ` Dan Carpenter 1 sibling, 1 reply; 6+ messages in thread From: Mario Limonciello @ 2025-05-07 1:49 UTC (permalink / raw) To: Mario Limonciello, Dan Carpenter Cc: Shyam-sundar.S-k, hdegoede, ilpo.jarvinen, platform-driver-x86 On 5/6/2025 11:11 AM, Mario Limonciello wrote: > On 5/6/2025 10:53 AM, Dan Carpenter wrote: >> On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote: >>> From: Mario Limonciello <mario.limonciello@amd.com> >>> >>> If setting up smart PC fails for any reason then this can lead to >>> a double free when unloading amd-pmf. This is because dev->buf was >>> freed but never set to NULL and is again freed in >>> amd_pmf_deinit_smart_pc(). >>> >>> Explicitly set pointers to NULL after freeing them to avoid the >>> double free. >>> >>> Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in >>> amd_pmf_init_smart_pc()") >>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> >>> --- >>> drivers/platform/x86/amd/pmf/tee-if.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/ >>> platform/x86/amd/pmf/tee-if.c >>> index a1e43873a07b0..48902f1c767c6 100644 >>> --- a/drivers/platform/x86/amd/pmf/tee-if.c >>> +++ b/drivers/platform/x86/amd/pmf/tee-if.c >>> @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) >>> amd_pmf_tee_deinit(dev); >> ^^^^^^^^^^^^^^^^^^^^^^^ >> >>> err_free_prev_data: >>> kfree(dev->prev_data); >>> + dev->prev_data = NULL; >>> err_free_policy: >>> kfree(dev->policy_buf); >>> + dev->policy_buf = NULL; >>> err_free_dram_buf: >>> kfree(dev->buf); >>> + dev->buf = NULL; >>> err_cancel_work: >>> cancel_delayed_work_sync(&dev->pb_work); >> >> This is a real bug. Did you find it from testing or reading the code? > > I found it from testing. I was testing some other unrelated changes and > found that unloading/reloading the module eventually lead to problems. I > tracked it down to your change. > >> My reading of the code says that this bug can only occur if >> amd_pmf_register_input_device() fails, right? > > No; it was happening from a failure where the system didn't have a > policy or had a "bad" policy. > >> >> We can only call amd_pmf_deinit_smart_pc() if >> amd_pmf_start_policy_engine() >> succeeds because that's where we set: >> >> dev->smart_pc_enabled = true; >> >> This patch doesn't totally fix the problem because we would still call >> amd_pmf_tee_deinit(). That's why I suspect you found this by auditing >> the code because I think that remaining bug would trigger a stack trace. >> I also worry that there is a small race window where we could trigger >> amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished >> running. >> >> Another bug is that we should cancel the work before freeing all the >> pointers. This looks like the more serious bug. >> >> What about if we only set dev->smart_pc_enabled = true if the whole >> amd_pmf_init_smart_pc() has succeeded? >> >> regards, >> dan carpenter >> > > Right; it's only set when amd_pmf_start_policy_engine() succeeds which > was not the case for me. This makes me wonder how exactly this was > happening [amd_pmf_deinit_smart_pc() would only be called from > amd_pmf_deinit_features()]. Ah I think I found the actual callpath. It's amd_pmf_remove() that has kfree(dev->buf) - that's probably what's actually tripping it. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload 2025-05-07 1:49 ` Mario Limonciello @ 2025-05-07 7:22 ` Dan Carpenter 0 siblings, 0 replies; 6+ messages in thread From: Dan Carpenter @ 2025-05-07 7:22 UTC (permalink / raw) To: Mario Limonciello Cc: Mario Limonciello, Shyam-sundar.S-k, hdegoede, ilpo.jarvinen, platform-driver-x86 On Tue, May 06, 2025 at 08:49:12PM -0500, Mario Limonciello wrote: > On 5/6/2025 11:11 AM, Mario Limonciello wrote: > > On 5/6/2025 10:53 AM, Dan Carpenter wrote: > > > On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote: > > > > From: Mario Limonciello <mario.limonciello@amd.com> > > > > > > > > If setting up smart PC fails for any reason then this can lead to > > > > a double free when unloading amd-pmf. This is because dev->buf was > > > > freed but never set to NULL and is again freed in > > > > amd_pmf_deinit_smart_pc(). > > > > > > > > Explicitly set pointers to NULL after freeing them to avoid the > > > > double free. > > > > > > > > Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in > > > > amd_pmf_init_smart_pc()") > > > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > > > > --- > > > > drivers/platform/x86/amd/pmf/tee-if.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/ > > > > platform/x86/amd/pmf/tee-if.c > > > > index a1e43873a07b0..48902f1c767c6 100644 > > > > --- a/drivers/platform/x86/amd/pmf/tee-if.c > > > > +++ b/drivers/platform/x86/amd/pmf/tee-if.c > > > > @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) > > > > amd_pmf_tee_deinit(dev); > > > ^^^^^^^^^^^^^^^^^^^^^^^ > > > > > > > err_free_prev_data: > > > > kfree(dev->prev_data); > > > > + dev->prev_data = NULL; > > > > err_free_policy: > > > > kfree(dev->policy_buf); > > > > + dev->policy_buf = NULL; > > > > err_free_dram_buf: > > > > kfree(dev->buf); > > > > + dev->buf = NULL; > > > > err_cancel_work: > > > > cancel_delayed_work_sync(&dev->pb_work); > > > > > > This is a real bug. Did you find it from testing or reading the code? > > > > I found it from testing. I was testing some other unrelated changes and > > found that unloading/reloading the module eventually lead to problems. I > > tracked it down to your change. > > > > > My reading of the code says that this bug can only occur if > > > amd_pmf_register_input_device() fails, right? > > > > No; it was happening from a failure where the system didn't have a > > policy or had a "bad" policy. > > > > > > > > We can only call amd_pmf_deinit_smart_pc() if > > > amd_pmf_start_policy_engine() > > > succeeds because that's where we set: > > > > > > dev->smart_pc_enabled = true; > > > > > > This patch doesn't totally fix the problem because we would still call > > > amd_pmf_tee_deinit(). That's why I suspect you found this by auditing > > > the code because I think that remaining bug would trigger a stack trace. > > > I also worry that there is a small race window where we could trigger > > > amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished > > > running. > > > > > > Another bug is that we should cancel the work before freeing all the > > > pointers. This looks like the more serious bug. > > > > > > What about if we only set dev->smart_pc_enabled = true if the whole > > > amd_pmf_init_smart_pc() has succeeded? > > > > > > regards, > > > dan carpenter > > > > > > > Right; it's only set when amd_pmf_start_policy_engine() succeeds which > > was not the case for me. This makes me wonder how exactly this was > > happening [amd_pmf_deinit_smart_pc() would only be called from > > amd_pmf_deinit_features()]. > > Ah I think I found the actual callpath. > > It's amd_pmf_remove() that has kfree(dev->buf) - that's probably what's > actually tripping it. Great! There should be no need to call kfree(dev->buf) there because amd_pmf_deinit_features() will do it. Could you add something like the following to your diff? The changes to amd_pmf_ta_open_session() are unrelated actually? Do that in another patch? Setting dev->tee_ctx to NULL will prevent amd_pmf_tee_deinit() from being run twice. We only need to cancel the work once it has been scheduled. We scheduled it for 3 seconds into the future so it will be canceled and that bug won't trigger in real life. But freeing the pointers before canceling is ugly. regards, dan carpenter diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c index 96821101ec77..0a494d6e8fcf 100644 --- a/drivers/platform/x86/amd/pmf/core.c +++ b/drivers/platform/x86/amd/pmf/core.c @@ -493,7 +493,6 @@ static void amd_pmf_remove(struct platform_device *pdev) mutex_destroy(&dev->lock); mutex_destroy(&dev->update_mutex); mutex_destroy(&dev->cb_mutex); - kfree(dev->buf); } static const struct attribute_group *amd_pmf_driver_groups[] = { diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c index 14b99d8b63d2..b332b5470398 100644 --- a/drivers/platform/x86/amd/pmf/tee-if.c +++ b/drivers/platform/x86/amd/pmf/tee-if.c @@ -407,12 +407,12 @@ static int amd_pmf_ta_open_session(struct tee_context *ctx, u32 *id, const uuid_ rc = tee_client_open_session(ctx, &sess_arg, NULL); if (rc < 0 || sess_arg.ret != 0) { pr_err("Failed to open TEE session err:%#x, rc:%d\n", sess_arg.ret, rc); - return rc; + return rc ?: -EINVAL; } *id = sess_arg.session; - return rc; + return 0; } static int amd_pmf_register_input_device(struct amd_pmf_dev *dev) @@ -447,7 +447,9 @@ static int amd_pmf_tee_init(struct amd_pmf_dev *dev, const uuid_t *uuid) dev->tee_ctx = tee_client_open_context(NULL, amd_pmf_amdtee_ta_match, NULL, NULL); if (IS_ERR(dev->tee_ctx)) { dev_err(dev->dev, "Failed to open TEE context\n"); - return PTR_ERR(dev->tee_ctx); + ret = PTR_ERR(dev->tee_ctx); + dev->tee_ctx = NULL; + return ret; } ret = amd_pmf_ta_open_session(dev->tee_ctx, &dev->session_id, uuid); @@ -487,9 +489,12 @@ static int amd_pmf_tee_init(struct amd_pmf_dev *dev, const uuid_t *uuid) static void amd_pmf_tee_deinit(struct amd_pmf_dev *dev) { + if (!dev->tee_ctx) + return; tee_shm_free(dev->fw_shm_pool); tee_client_close_session(dev->tee_ctx, dev->session_id); tee_client_close_context(dev->tee_ctx); + dev->tee_ctx = NULL; } int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) @@ -512,7 +517,7 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) ret = amd_pmf_set_dram_addr(dev, true); if (ret) - goto err_cancel_work; + return ret; dev->policy_base = devm_ioremap_resource(dev->dev, dev->res); if (IS_ERR(dev->policy_base)) { @@ -576,6 +581,7 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) return 0; err_pmf_remove_pb: + cancel_delayed_work_sync(&dev->pb_work); if (pb_side_load && dev->esbin) amd_pmf_remove_pb(dev); amd_pmf_tee_deinit(dev); @@ -585,8 +591,6 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) kfree(dev->policy_buf); err_free_dram_buf: kfree(dev->buf); -err_cancel_work: - cancel_delayed_work_sync(&dev->pb_work); return ret; } ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-05-07 7:22 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-06 13:11 [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload Mario Limonciello 2025-05-06 15:53 ` Dan Carpenter 2025-05-06 16:11 ` Mario Limonciello 2025-05-07 1:41 ` Mario Limonciello 2025-05-07 1:49 ` Mario Limonciello 2025-05-07 7:22 ` Dan Carpenter
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.