All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload
@ 2025-05-06 13:11 Mario Limonciello
  2025-05-06 15:53 ` Dan Carpenter
  0 siblings, 1 reply; 6+ messages in thread
From: Mario Limonciello @ 2025-05-06 13:11 UTC (permalink / raw)
  To: mario.limonciello, Shyam-sundar.S-k, hdegoede, ilpo.jarvinen,
	dan.carpenter
  Cc: platform-driver-x86

From: Mario Limonciello <mario.limonciello@amd.com>

If setting up smart PC fails for any reason then this can lead to
a double free when unloading amd-pmf.  This is because dev->buf was
freed but never set to NULL and is again freed in
amd_pmf_deinit_smart_pc().

Explicitly set pointers to NULL after freeing them to avoid the
double free.

Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/platform/x86/amd/pmf/tee-if.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c
index a1e43873a07b0..48902f1c767c6 100644
--- a/drivers/platform/x86/amd/pmf/tee-if.c
+++ b/drivers/platform/x86/amd/pmf/tee-if.c
@@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
 	amd_pmf_tee_deinit(dev);
 err_free_prev_data:
 	kfree(dev->prev_data);
+	dev->prev_data = NULL;
 err_free_policy:
 	kfree(dev->policy_buf);
+	dev->policy_buf = NULL;
 err_free_dram_buf:
 	kfree(dev->buf);
+	dev->buf = NULL;
 err_cancel_work:
 	cancel_delayed_work_sync(&dev->pb_work);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload
  2025-05-06 13:11 [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload Mario Limonciello
@ 2025-05-06 15:53 ` Dan Carpenter
  2025-05-06 16:11   ` Mario Limonciello
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Carpenter @ 2025-05-06 15:53 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: mario.limonciello, Shyam-sundar.S-k, hdegoede, ilpo.jarvinen,
	platform-driver-x86

On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limonciello@amd.com>
> 
> If setting up smart PC fails for any reason then this can lead to
> a double free when unloading amd-pmf.  This is because dev->buf was
> freed but never set to NULL and is again freed in
> amd_pmf_deinit_smart_pc().
> 
> Explicitly set pointers to NULL after freeing them to avoid the
> double free.
> 
> Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()")
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/platform/x86/amd/pmf/tee-if.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c
> index a1e43873a07b0..48902f1c767c6 100644
> --- a/drivers/platform/x86/amd/pmf/tee-if.c
> +++ b/drivers/platform/x86/amd/pmf/tee-if.c
> @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
>  	amd_pmf_tee_deinit(dev);
        ^^^^^^^^^^^^^^^^^^^^^^^

>  err_free_prev_data:
>  	kfree(dev->prev_data);
> +	dev->prev_data = NULL;
>  err_free_policy:
>  	kfree(dev->policy_buf);
> +	dev->policy_buf = NULL;
>  err_free_dram_buf:
>  	kfree(dev->buf);
> +	dev->buf = NULL;
>  err_cancel_work:
>  	cancel_delayed_work_sync(&dev->pb_work);
>  

This is a real bug.  Did you find it from testing or reading the code?
My reading of the code says that this bug can only occur if
amd_pmf_register_input_device() fails, right?

We can only call amd_pmf_deinit_smart_pc() if amd_pmf_start_policy_engine()
succeeds because that's where we set:

	dev->smart_pc_enabled = true;

This patch doesn't totally fix the problem because we would still call
amd_pmf_tee_deinit().  That's why I suspect you found this by auditing
the code because I think that remaining bug would trigger a stack trace.
I also worry that there is a small race window where we could trigger
amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished
running.

Another bug is that we should cancel the work before freeing all the
pointers.  This looks like the more serious bug.

What about if we only set dev->smart_pc_enabled = true if the whole
amd_pmf_init_smart_pc() has succeeded?

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload
  2025-05-06 15:53 ` Dan Carpenter
@ 2025-05-06 16:11   ` Mario Limonciello
  2025-05-07  1:41     ` Mario Limonciello
  2025-05-07  1:49     ` Mario Limonciello
  0 siblings, 2 replies; 6+ messages in thread
From: Mario Limonciello @ 2025-05-06 16:11 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: mario.limonciello, Shyam-sundar.S-k, hdegoede, ilpo.jarvinen,
	platform-driver-x86

On 5/6/2025 10:53 AM, Dan Carpenter wrote:
> On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote:
>> From: Mario Limonciello <mario.limonciello@amd.com>
>>
>> If setting up smart PC fails for any reason then this can lead to
>> a double free when unloading amd-pmf.  This is because dev->buf was
>> freed but never set to NULL and is again freed in
>> amd_pmf_deinit_smart_pc().
>>
>> Explicitly set pointers to NULL after freeing them to avoid the
>> double free.
>>
>> Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()")
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>>   drivers/platform/x86/amd/pmf/tee-if.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c
>> index a1e43873a07b0..48902f1c767c6 100644
>> --- a/drivers/platform/x86/amd/pmf/tee-if.c
>> +++ b/drivers/platform/x86/amd/pmf/tee-if.c
>> @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
>>   	amd_pmf_tee_deinit(dev);
>          ^^^^^^^^^^^^^^^^^^^^^^^
> 
>>   err_free_prev_data:
>>   	kfree(dev->prev_data);
>> +	dev->prev_data = NULL;
>>   err_free_policy:
>>   	kfree(dev->policy_buf);
>> +	dev->policy_buf = NULL;
>>   err_free_dram_buf:
>>   	kfree(dev->buf);
>> +	dev->buf = NULL;
>>   err_cancel_work:
>>   	cancel_delayed_work_sync(&dev->pb_work);
>>   
> 
> This is a real bug.  Did you find it from testing or reading the code?

I found it from testing.  I was testing some other unrelated changes and 
found that unloading/reloading the module eventually lead to problems. 
I tracked it down to your change.

> My reading of the code says that this bug can only occur if
> amd_pmf_register_input_device() fails, right?

No; it was happening from a failure where the system didn't have a 
policy or had a "bad" policy.

> 
> We can only call amd_pmf_deinit_smart_pc() if amd_pmf_start_policy_engine()
> succeeds because that's where we set:
> 
> 	dev->smart_pc_enabled = true;
> 
> This patch doesn't totally fix the problem because we would still call
> amd_pmf_tee_deinit().  That's why I suspect you found this by auditing
> the code because I think that remaining bug would trigger a stack trace.
> I also worry that there is a small race window where we could trigger
> amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished
> running.
> 
> Another bug is that we should cancel the work before freeing all the
> pointers.  This looks like the more serious bug.
> 
> What about if we only set dev->smart_pc_enabled = true if the whole
> amd_pmf_init_smart_pc() has succeeded?
> 
> regards,
> dan carpenter
> 

Right; it's only set when amd_pmf_start_policy_engine() succeeds which 
was not the case for me.  This makes me wonder how exactly this was 
happening [amd_pmf_deinit_smart_pc() would only be called from 
amd_pmf_deinit_features()].

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload
  2025-05-06 16:11   ` Mario Limonciello
@ 2025-05-07  1:41     ` Mario Limonciello
  2025-05-07  1:49     ` Mario Limonciello
  1 sibling, 0 replies; 6+ messages in thread
From: Mario Limonciello @ 2025-05-07  1:41 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Shyam-sundar.S-k, hdegoede, ilpo.jarvinen, platform-driver-x86

On 5/6/2025 11:11 AM, Mario Limonciello wrote:
> On 5/6/2025 10:53 AM, Dan Carpenter wrote:
>> On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote:
>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>
>>> If setting up smart PC fails for any reason then this can lead to
>>> a double free when unloading amd-pmf.  This is because dev->buf was
>>> freed but never set to NULL and is again freed in
>>> amd_pmf_deinit_smart_pc().
>>>
>>> Explicitly set pointers to NULL after freeing them to avoid the
>>> double free.
>>>
>>> Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in 
>>> amd_pmf_init_smart_pc()")
>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>>   drivers/platform/x86/amd/pmf/tee-if.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/ 
>>> platform/x86/amd/pmf/tee-if.c
>>> index a1e43873a07b0..48902f1c767c6 100644
>>> --- a/drivers/platform/x86/amd/pmf/tee-if.c
>>> +++ b/drivers/platform/x86/amd/pmf/tee-if.c
>>> @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
>>>       amd_pmf_tee_deinit(dev);
>>          ^^^^^^^^^^^^^^^^^^^^^^^
>>
>>>   err_free_prev_data:
>>>       kfree(dev->prev_data);
>>> +    dev->prev_data = NULL;
>>>   err_free_policy:
>>>       kfree(dev->policy_buf);
>>> +    dev->policy_buf = NULL;
>>>   err_free_dram_buf:
>>>       kfree(dev->buf);
>>> +    dev->buf = NULL;
>>>   err_cancel_work:
>>>       cancel_delayed_work_sync(&dev->pb_work);
>>
>> This is a real bug.  Did you find it from testing or reading the code?
> 
> I found it from testing.  I was testing some other unrelated changes and 
> found that unloading/reloading the module eventually lead to problems. I 
> tracked it down to your change.
> 
>> My reading of the code says that this bug can only occur if
>> amd_pmf_register_input_device() fails, right?
> 
> No; it was happening from a failure where the system didn't have a 
> policy or had a "bad" policy.
> 
>>
>> We can only call amd_pmf_deinit_smart_pc() if 
>> amd_pmf_start_policy_engine()
>> succeeds because that's where we set:
>>
>>     dev->smart_pc_enabled = true;
>>
>> This patch doesn't totally fix the problem because we would still call
>> amd_pmf_tee_deinit().  That's why I suspect you found this by auditing
>> the code because I think that remaining bug would trigger a stack trace.
>> I also worry that there is a small race window where we could trigger
>> amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished
>> running.
>>
>> Another bug is that we should cancel the work before freeing all the
>> pointers.  This looks like the more serious bug.
>>
>> What about if we only set dev->smart_pc_enabled = true if the whole
>> amd_pmf_init_smart_pc() has succeeded?
>>
>> regards,
>> dan carpenter
>>
> 
> Right; it's only set when amd_pmf_start_policy_engine() succeeds which 
> was not the case for me.  This makes me wonder how exactly this was 
> happening [amd_pmf_deinit_smart_pc() would only be called from 
> amd_pmf_deinit_features()].

For completeness; here is the trace back after load/unload which was 
fixed by this patch I shared (passed through scripts/decode_stacktrace.sh)

[   33.636532] ------------[ cut here ]------------ 
  
  
                                                                [ 
33.636540] kernel BUG at mm/slub.c:546! 
  
  
                                                            [ 
33.636554] Oops: invalid opcode: 0000 [#1] SMP NOPTI 
  
  
                                                            [ 
33.636560] CPU: 4 UID: 0 PID: 3245 Comm: rmmod Not tainted 
6.15.0-rc1-00020-g0581d384f344 #79 PREEMPT(full) 
4b12a0d175d8a11a0952957398aaf7c2efd0bbd7 
  
                       [   33.636565] Hardware name: Framework Laptop 13 
(AMD Ryzen AI 300 Series)/FRANMGCP09, BIOS 03.03 03/10/2025 
  
  
              [   33.636567] RIP: 0010:__slab_free (mm/slub.c:546 
(discriminator 1) mm/slub.c:4475 (discriminator 1)) 
  
  
            [ 33.636578] Code: 0c 44 0f b6 4c 24 13 44 8b 44 24 14 48 89 
44 24 20 49 8b 04 24 4c 8b 54 24 18 48 c1 e8 09 83 e0 01 88 44 24 11 e9 
ce fe ff ff <0f> 0b 41 f7 46 08 87 04 00 00 75 91 eb 83 41 f7 46 08 87 
04 00 00 
    All code 
  
  
  
======== 
  
  
                                                                   0: 
0c 44                   or     $0x44,%al 
  
  
                                                           2:   0f b6 4c 
24 13          movzbl 0x13(%rsp),%ecx 
  
  
                                                  7:   44 8b 44 24 14 
       mov    0x14(%rsp),%r8d 
  
  
                                          c:   48 89 44 24 20 
mov    %rax,0x20(%rsp) 
  
  
                                  11:   49 8b 04 24             mov 
(%r12),%rax 
  
  
                           15:   4c 8b 54 24 18          mov 
0x18(%rsp),%r10 
  
  
                           1a:   48 c1 e8 09             shr 
$0x9,%rax 
  
  
                           1e:   83 e0 01                and 
$0x1,%eax 
  
  
                           21:   88 44 24 11             mov 
%al,0x11(%rsp) 
  
  
                           25:   e9 ce fe ff ff          jmp 
0xfffffffffffffef8 
  
  
                           2a:*  0f 0b                   ud2 
<-- trapping instruction 
  
  
                  2c:   41 f7 46 08 87 04 00    testl  $0x487,0x8(%r14) 
  
  
  
          33:   00 
  
  
  
  34:   75 91                   jne    0xffffffffffffffc7 
  
  
                                                                  36: 
eb 83                   jmp    0xffffffffffffffbb 
  
  
                                                          38:   41 f7 46 
08 87 04 00    testl  $0x487,0x8(%r14) 
  
  
                                                 3f:   00 
  
  
  
  
  
  
  
                               Code starting with the faulting 
instruction 
  
  
  
=========================================== 
  
  
                                                                   0: 
0f 0b                   ud2 
  
  
                                                           2:   41 f7 46 
08 87 04 00    testl  $0x487,0x8(%r14) 
  
  
                                                  9:   00 
  
  
  
                                          a:   75 91 
jne    0xffffffffffffff9d 
  
  
                                   c:   eb 83                   jmp 
0xffffffffffffff91
    e:   41 f7 46 08 87 04 00    testl  $0x487,0x8(%r14) 
  
  
                                                                   15:   00
[   33.636580] RSP: 0018:ffffd3ae43327a10 EFLAGS: 00010246
[   33.636584] RAX: ffff8a6164a50f00 RBX: 000000008020001e RCX: 
0000000000000000
[   33.636585] RDX: ffffd3ae43327a40 RSI: fffff51cc4929400 RDI: 
ffffd3ae43327a80
[   33.636587] RBP: ffffd3ae43327ab0 R08: 0000000000000001 R09: 
ffffffffaa799600
[   33.636588] R10: ffff8a6164a50e00 R11: ffffd3ae43327ab0 R12: 
fffff51cc4929400
[   33.636589] R13: ffff8a6164a50e00 R14: ffff8a6140042a00 R15: 
0000000000000000
[   33.636590] FS:  00007f6c8ee08680(0000) GS:ffff8a68d20c3000(0000) 
knlGS:0000000000000000 
  
                                                                   [ 
33.636591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
  
  
                                                            [ 
33.636592] CR2: 00007f6c8e4578e0 CR3: 000000012fdb0000 CR4: 
0000000000b50ef0 
  
  
[   33.636594] Call Trace: 
  
  
                                                                [ 
33.636596]  <TASK> 
  
  
                                                            [ 
33.636598] ? __call_rcu_common (./arch/x86/include/asm/atomic64_64.h:15 
./include/linux/atomic/atomic-arch-fallback.h:2583 
./include/linux/atomic/atomic-long.h:38 
./include/linux/atomic/atomic-instrumented.h:3189 
kernel/rcu/rcu_segcblist.h:52 kernel/rcu/tree.c:2956 
kernel/rcu/tree.c:2967 kernel/rcu/tree.c:3106) 
  
  
  
  [   33.636605] ? platform_remove (drivers/base/platform.c:1424) 
  
  
                                                                  [ 
33.636610] kfree (mm/slub.c:4599 mm/slub.c:4647 mm/slub.c:4845) 
  
  
                                                            [ 
33.636613] ? mntput_no_expire (fs/namespace.c:259 fs/namespace.c:1492) 
  
  
                                                            [ 
33.636626] platform_remove (drivers/base/platform.c:1424) 
  
  
                                                            [ 
33.636628] device_release_driver_internal (drivers/base/dd.c:1275 
drivers/base/dd.c:1296) 
  
                                                                  [ 
33.636634] driver_detach (drivers/base/dd.c:1360) 
  
  
                                                            [ 
33.636637] bus_remove_driver (drivers/base/bus.c:748) 
  
  
                                                            [ 
33.636640] __do_sys_delete_module (kernel/module/main.c:781) 
  
  
                                                            [ 
33.636646] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 
1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[   33.636654] ? vfs_read (fs/read_write.c:489 fs/read_write.c:570)
[   33.636658] ? ___pte_offset_map (./include/linux/pgtable.h:347 
(discriminator 2) ./include/linux/pgtable.h:624 (discriminator 2) 
mm/pgtable-generic.c:289 (discriminator 2)) 
  
    [   33.636663] ? next_uptodate_folio (./include/linux/xarray.h:1722 
mm/filemap.c:3585) 
  
                                                                    [ 
33.636668] ? __mod_memcg_lruvec_state (mm/memcontrol.c:586 
mm/memcontrol.c:771)
[   33.636673] ? mod_objcg_state (mm/memcontrol.c:2475 mm/memcontrol.c:2837)
[   33.636675] ? xas_load (./include/linux/xarray.h:175 
./include/linux/xarray.h:1264 lib/xarray.c:241)
[   33.636680] ? xa_load (lib/xarray.c:1624)
[   33.636683] ? __memcg_slab_free_hook (mm/memcontrol.c:3093 
(discriminator 1))
[   33.636686] ? __x64_sys_close (fs/open.c:1584 fs/open.c:1566 
fs/open.c:1566)
[   33.636689] ? kmem_cache_free (mm/slub.c:4646 (discriminator 1) 
mm/slub.c:4748 (discriminator 1))
[   33.636692] ? syscall_exit_to_user_mode 
(./arch/x86/include/asm/irqflags.h:37 
./arch/x86/include/asm/irqflags.h:114 ./include/linux/entry-common.h:232 
kernel/entry/common.c:206 kernel/entry/common.c:218)
[   33.636695] ? do_syscall_64 (arch/x86/entry/syscall_64.c:113)
[   33.636698] ? __count_memcg_events (mm/memcontrol.c:586 
mm/memcontrol.c:862)
[   33.636699] ? handle_mm_fault (mm/memory.c:6182 mm/memory.c:6335)
[   33.636704] ? do_user_addr_fault (arch/x86/mm/fault.c:1338)
[   33.636710] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:37 
./arch/x86/include/asm/irqflags.h:114 arch/x86/mm/fault.c:1488 
arch/x86/mm/fault.c:1538)
[   33.636713] entry_SYSCALL_64_after_hwframe 
(arch/x86/entry/entry_64.S:130)
[   33.636716] RIP: 0033:0x7f6c8e53519b
[ 33.636719] Code: 73 01 c3 48 8b 0d 5d bc 0d 00 f7 d8 64 89 01 48 83 c8 
ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 
<48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d bc 0d 00 f7 d8 64 89 01 48

And this was built and reproduced from platform-x86/review-ilpo-fixes:

0581d384f344e (HEAD, platform-x86/review-ilpo-fixes) 
platform/x86/amd/hsmp: Make amd_hsmp and hsmp_acpi as mutually exclusive 
drivers
8e81b9cd6e951 drivers/platform/x86/amd: pmf: Check for invalid Smart PC 
Policies
690d722e02819 drivers/platform/x86/amd: pmf: Check for invalid 
sideloaded Smart PC Policies
02c6e43397c39 (tag: platform-drivers-x86-v6.15-4, platform-x86/fixes)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload
  2025-05-06 16:11   ` Mario Limonciello
  2025-05-07  1:41     ` Mario Limonciello
@ 2025-05-07  1:49     ` Mario Limonciello
  2025-05-07  7:22       ` Dan Carpenter
  1 sibling, 1 reply; 6+ messages in thread
From: Mario Limonciello @ 2025-05-07  1:49 UTC (permalink / raw)
  To: Mario Limonciello, Dan Carpenter
  Cc: Shyam-sundar.S-k, hdegoede, ilpo.jarvinen, platform-driver-x86

On 5/6/2025 11:11 AM, Mario Limonciello wrote:
> On 5/6/2025 10:53 AM, Dan Carpenter wrote:
>> On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote:
>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>
>>> If setting up smart PC fails for any reason then this can lead to
>>> a double free when unloading amd-pmf.  This is because dev->buf was
>>> freed but never set to NULL and is again freed in
>>> amd_pmf_deinit_smart_pc().
>>>
>>> Explicitly set pointers to NULL after freeing them to avoid the
>>> double free.
>>>
>>> Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in 
>>> amd_pmf_init_smart_pc()")
>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>>   drivers/platform/x86/amd/pmf/tee-if.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/ 
>>> platform/x86/amd/pmf/tee-if.c
>>> index a1e43873a07b0..48902f1c767c6 100644
>>> --- a/drivers/platform/x86/amd/pmf/tee-if.c
>>> +++ b/drivers/platform/x86/amd/pmf/tee-if.c
>>> @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
>>>       amd_pmf_tee_deinit(dev);
>>          ^^^^^^^^^^^^^^^^^^^^^^^
>>
>>>   err_free_prev_data:
>>>       kfree(dev->prev_data);
>>> +    dev->prev_data = NULL;
>>>   err_free_policy:
>>>       kfree(dev->policy_buf);
>>> +    dev->policy_buf = NULL;
>>>   err_free_dram_buf:
>>>       kfree(dev->buf);
>>> +    dev->buf = NULL;
>>>   err_cancel_work:
>>>       cancel_delayed_work_sync(&dev->pb_work);
>>
>> This is a real bug.  Did you find it from testing or reading the code?
> 
> I found it from testing.  I was testing some other unrelated changes and 
> found that unloading/reloading the module eventually lead to problems. I 
> tracked it down to your change.
> 
>> My reading of the code says that this bug can only occur if
>> amd_pmf_register_input_device() fails, right?
> 
> No; it was happening from a failure where the system didn't have a 
> policy or had a "bad" policy.
> 
>>
>> We can only call amd_pmf_deinit_smart_pc() if 
>> amd_pmf_start_policy_engine()
>> succeeds because that's where we set:
>>
>>     dev->smart_pc_enabled = true;
>>
>> This patch doesn't totally fix the problem because we would still call
>> amd_pmf_tee_deinit().  That's why I suspect you found this by auditing
>> the code because I think that remaining bug would trigger a stack trace.
>> I also worry that there is a small race window where we could trigger
>> amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished
>> running.
>>
>> Another bug is that we should cancel the work before freeing all the
>> pointers.  This looks like the more serious bug.
>>
>> What about if we only set dev->smart_pc_enabled = true if the whole
>> amd_pmf_init_smart_pc() has succeeded?
>>
>> regards,
>> dan carpenter
>>
> 
> Right; it's only set when amd_pmf_start_policy_engine() succeeds which 
> was not the case for me.  This makes me wonder how exactly this was 
> happening [amd_pmf_deinit_smart_pc() would only be called from 
> amd_pmf_deinit_features()].

Ah I think I found the actual callpath.

It's amd_pmf_remove() that has kfree(dev->buf) - that's probably what's 
actually tripping it.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload
  2025-05-07  1:49     ` Mario Limonciello
@ 2025-05-07  7:22       ` Dan Carpenter
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Carpenter @ 2025-05-07  7:22 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Mario Limonciello, Shyam-sundar.S-k, hdegoede, ilpo.jarvinen,
	platform-driver-x86

On Tue, May 06, 2025 at 08:49:12PM -0500, Mario Limonciello wrote:
> On 5/6/2025 11:11 AM, Mario Limonciello wrote:
> > On 5/6/2025 10:53 AM, Dan Carpenter wrote:
> > > On Tue, May 06, 2025 at 08:11:29AM -0500, Mario Limonciello wrote:
> > > > From: Mario Limonciello <mario.limonciello@amd.com>
> > > > 
> > > > If setting up smart PC fails for any reason then this can lead to
> > > > a double free when unloading amd-pmf.  This is because dev->buf was
> > > > freed but never set to NULL and is again freed in
> > > > amd_pmf_deinit_smart_pc().
> > > > 
> > > > Explicitly set pointers to NULL after freeing them to avoid the
> > > > double free.
> > > > 
> > > > Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in
> > > > amd_pmf_init_smart_pc()")
> > > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> > > > ---
> > > >   drivers/platform/x86/amd/pmf/tee-if.c | 3 +++
> > > >   1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/
> > > > platform/x86/amd/pmf/tee-if.c
> > > > index a1e43873a07b0..48902f1c767c6 100644
> > > > --- a/drivers/platform/x86/amd/pmf/tee-if.c
> > > > +++ b/drivers/platform/x86/amd/pmf/tee-if.c
> > > > @@ -579,10 +579,13 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
> > > >       amd_pmf_tee_deinit(dev);
> > >          ^^^^^^^^^^^^^^^^^^^^^^^
> > > 
> > > >   err_free_prev_data:
> > > >       kfree(dev->prev_data);
> > > > +    dev->prev_data = NULL;
> > > >   err_free_policy:
> > > >       kfree(dev->policy_buf);
> > > > +    dev->policy_buf = NULL;
> > > >   err_free_dram_buf:
> > > >       kfree(dev->buf);
> > > > +    dev->buf = NULL;
> > > >   err_cancel_work:
> > > >       cancel_delayed_work_sync(&dev->pb_work);
> > > 
> > > This is a real bug.  Did you find it from testing or reading the code?
> > 
> > I found it from testing.  I was testing some other unrelated changes and
> > found that unloading/reloading the module eventually lead to problems. I
> > tracked it down to your change.
> > 
> > > My reading of the code says that this bug can only occur if
> > > amd_pmf_register_input_device() fails, right?
> > 
> > No; it was happening from a failure where the system didn't have a
> > policy or had a "bad" policy.
> > 
> > > 
> > > We can only call amd_pmf_deinit_smart_pc() if
> > > amd_pmf_start_policy_engine()
> > > succeeds because that's where we set:
> > > 
> > >     dev->smart_pc_enabled = true;
> > > 
> > > This patch doesn't totally fix the problem because we would still call
> > > amd_pmf_tee_deinit().  That's why I suspect you found this by auditing
> > > the code because I think that remaining bug would trigger a stack trace.
> > > I also worry that there is a small race window where we could trigger
> > > amd_pmf_tee_deinit() before amd_pmf_init_smart_pc() has finished
> > > running.
> > > 
> > > Another bug is that we should cancel the work before freeing all the
> > > pointers.  This looks like the more serious bug.
> > > 
> > > What about if we only set dev->smart_pc_enabled = true if the whole
> > > amd_pmf_init_smart_pc() has succeeded?
> > > 
> > > regards,
> > > dan carpenter
> > > 
> > 
> > Right; it's only set when amd_pmf_start_policy_engine() succeeds which
> > was not the case for me.  This makes me wonder how exactly this was
> > happening [amd_pmf_deinit_smart_pc() would only be called from
> > amd_pmf_deinit_features()].
> 
> Ah I think I found the actual callpath.
> 
> It's amd_pmf_remove() that has kfree(dev->buf) - that's probably what's
> actually tripping it.

Great!  There should be no need to call kfree(dev->buf) there because
amd_pmf_deinit_features() will do it.  Could you add something like
the following to your diff?

The changes to amd_pmf_ta_open_session() are unrelated actually?  Do
that in another patch?

Setting dev->tee_ctx to NULL will prevent amd_pmf_tee_deinit() from
being run twice.

We only need to cancel the work once it has been scheduled.  We
scheduled it for 3 seconds into the future so it will be canceled and
that bug won't trigger in real life.  But freeing the pointers before
canceling is ugly.

regards,
dan carpenter

diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c
index 96821101ec77..0a494d6e8fcf 100644
--- a/drivers/platform/x86/amd/pmf/core.c
+++ b/drivers/platform/x86/amd/pmf/core.c
@@ -493,7 +493,6 @@ static void amd_pmf_remove(struct platform_device *pdev)
 	mutex_destroy(&dev->lock);
 	mutex_destroy(&dev->update_mutex);
 	mutex_destroy(&dev->cb_mutex);
-	kfree(dev->buf);
 }
 
 static const struct attribute_group *amd_pmf_driver_groups[] = {
diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c
index 14b99d8b63d2..b332b5470398 100644
--- a/drivers/platform/x86/amd/pmf/tee-if.c
+++ b/drivers/platform/x86/amd/pmf/tee-if.c
@@ -407,12 +407,12 @@ static int amd_pmf_ta_open_session(struct tee_context *ctx, u32 *id, const uuid_
 	rc = tee_client_open_session(ctx, &sess_arg, NULL);
 	if (rc < 0 || sess_arg.ret != 0) {
 		pr_err("Failed to open TEE session err:%#x, rc:%d\n", sess_arg.ret, rc);
-		return rc;
+		return rc ?: -EINVAL;
 	}
 
 	*id = sess_arg.session;
 
-	return rc;
+	return 0;
 }
 
 static int amd_pmf_register_input_device(struct amd_pmf_dev *dev)
@@ -447,7 +447,9 @@ static int amd_pmf_tee_init(struct amd_pmf_dev *dev, const uuid_t *uuid)
 	dev->tee_ctx = tee_client_open_context(NULL, amd_pmf_amdtee_ta_match, NULL, NULL);
 	if (IS_ERR(dev->tee_ctx)) {
 		dev_err(dev->dev, "Failed to open TEE context\n");
-		return PTR_ERR(dev->tee_ctx);
+		ret = PTR_ERR(dev->tee_ctx);
+		dev->tee_ctx = NULL;
+		return ret;
 	}
 
 	ret = amd_pmf_ta_open_session(dev->tee_ctx, &dev->session_id, uuid);
@@ -487,9 +489,12 @@ static int amd_pmf_tee_init(struct amd_pmf_dev *dev, const uuid_t *uuid)
 
 static void amd_pmf_tee_deinit(struct amd_pmf_dev *dev)
 {
+	if (!dev->tee_ctx)
+		return;
 	tee_shm_free(dev->fw_shm_pool);
 	tee_client_close_session(dev->tee_ctx, dev->session_id);
 	tee_client_close_context(dev->tee_ctx);
+	dev->tee_ctx = NULL;
 }
 
 int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
@@ -512,7 +517,7 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
 
 	ret = amd_pmf_set_dram_addr(dev, true);
 	if (ret)
-		goto err_cancel_work;
+		return ret;
 
 	dev->policy_base = devm_ioremap_resource(dev->dev, dev->res);
 	if (IS_ERR(dev->policy_base)) {
@@ -576,6 +581,7 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
 	return 0;
 
 err_pmf_remove_pb:
+	cancel_delayed_work_sync(&dev->pb_work);
 	if (pb_side_load && dev->esbin)
 		amd_pmf_remove_pb(dev);
 	amd_pmf_tee_deinit(dev);
@@ -585,8 +591,6 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
 	kfree(dev->policy_buf);
 err_free_dram_buf:
 	kfree(dev->buf);
-err_cancel_work:
-	cancel_delayed_work_sync(&dev->pb_work);
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-05-07  7:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-06 13:11 [PATCH] drivers/platform/x86/amd: pmf: Fix a double free on module unload Mario Limonciello
2025-05-06 15:53 ` Dan Carpenter
2025-05-06 16:11   ` Mario Limonciello
2025-05-07  1:41     ` Mario Limonciello
2025-05-07  1:49     ` Mario Limonciello
2025-05-07  7:22       ` Dan Carpenter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.