* [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
@ 2025-05-13 8:31 Dong Chenchen
2025-05-13 20:06 ` Mina Almasry
0 siblings, 1 reply; 12+ messages in thread
From: Dong Chenchen @ 2025-05-13 8:31 UTC (permalink / raw)
To: hawk, ilias.apalodimas, davem, edumazet, kuba, pabeni, horms
Cc: netdev, linux-kernel, zhangchangzhong, Dong Chenchen
Hello,
syzkaller found a use-after-free (UAF) in page_pool_recycle_in_ring [1], which is
similar to syzbot+204a4382fcb3311f3858@syzkaller.appspotmail.com.
The root cause is as follows:
page_pool_recycle_in_ring
  ptr_ring_produce
    spin_lock(&r->producer_lock);
    WRITE_ONCE(r->queue[r->producer++], ptr)
    //recycle last page to pool
                                        page_pool_release
                                          page_pool_scrub
                                            page_pool_empty_ring
                                              ptr_ring_consume
                                              page_pool_return_page //release all page
                                          __page_pool_destroy
                                            free_percpu(pool->recycle_stats);
                                            kfree(pool) //free
    spin_unlock(&r->producer_lock); //pool->ring uaf read
  recycle_stat_inc(pool, ring);
The page_pool can be freed while it is still recycling the last page into the
ring. After adding a delay to page_pool_recycle_in_ring(), the syzlog
reproducer [2] can trigger this issue with high probability. Maybe we can fix
it by holding the user_cnt of the page pool during the page recycle process.
Does anyone have a good idea for solving this problem?
-----
Best Regards,
Dong Chenchen
[1]
BUG: KASAN: slab-use-after-free in page_pool_recycle_in_ring (net/core/page_pool.c:718)
Read of size 8 at addr ffff88811dfe0710 by task syz-executor.14/11451
CPU: 1 UID: 0 PID: 11451 Comm: syz-executor.14 Tainted: G W 6.15.0-rc5-00207-g1a33418a69cc-dirty #30 PREEMPT(full)
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:123)
print_report (mm/kasan/report.c:409 mm/kasan/report.c:521)
kasan_report (mm/kasan/report.c:636)
page_pool_recycle_in_ring (net/core/page_pool.c:718)
page_pool_put_unrefed_netmem (net/core/page_pool.c:834 (discriminator 1))
napi_pp_put_page (./include/net/page_pool/helpers.h:336 ./include/net/page_pool/helpers.h:366 net/core/skbuff.c:1008)
skb_free_head (net/core/skbuff.c:1066)
skb_release_data (net/core/skbuff.c:1108)
sk_skb_reason_drop (net/core/skbuff.c:1177 net/core/skbuff.c:1214)
skb_queue_purge_reason (./include/linux/skbuff.h:2147 ./include/linux/skbuff.h:2453 ./include/linux/skbuff.h:3353 net/core/skbuff.c:3917 net/core/skbuff.c:3902)
packet_release (net/packet/af_packet.c:1288 net/packet/af_packet.c:3233)
__sock_release (net/socket.c:648)
sock_close (net/socket.c:1393)
__fput (fs/file_table.c:466)
fput_close_sync (fs/file_table.c:571)
__x64_sys_close (fs/open.c:1583 fs/open.c:1566 fs/open.c:1566)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x417bd1
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 a4 1a 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
All code
========
0: 75 14 jne 0x16
2: b8 03 00 00 00 mov $0x3,%eax
7: 0f 05 syscall
9: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
f: 0f 83 a4 1a 00 00 jae 0x1ab9
15: c3 ret
16: 48 83 ec 08 sub $0x8,%rsp
1a: e8 0a fc ff ff call 0xfffffffffffffc29
1f: 48 89 04 24 mov %rax,(%rsp)
23: b8 03 00 00 00 mov $0x3,%eax
28: 0f 05 syscall
2a:* 48 8b 3c 24 mov (%rsp),%rdi <-- trapping instruction
2e: 48 89 c2 mov %rax,%rdx
31: e8 53 fc ff ff call 0xfffffffffffffc89
36: 48 89 d0 mov %rdx,%rax
39: 48 83 c4 08 add $0x8,%rsp
3d: 48 rex.W
3e: 3d .byte 0x3d
3f: 01 .byte 0x1
Code starting with the faulting instruction
===========================================
0: 48 8b 3c 24 mov (%rsp),%rdi
4: 48 89 c2 mov %rax,%rdx
7: e8 53 fc ff ff call 0xfffffffffffffc5f
c: 48 89 d0 mov %rdx,%rax
f: 48 83 c4 08 add $0x8,%rsp
13: 48 rex.W
14: 3d .byte 0x3d
15: 01 .byte 0x1
RSP: 002b:00007fffe74be2a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000417bd1
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000003
RBP: 000000000077d960 R08: 0000001b2e160000 R09: 0000000000780d78
R10: 00007fffe74be390 R11: 0000000000000293 R12: 000000000006a10e
R13: 000000000077c03c R14: 000000000077c030 R15: 0000000000000001
</TASK>
Allocated by task 11457:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:25 mm/kasan/common.c:60 mm/kasan/common.c:69)
__kasan_kmalloc (mm/kasan/common.c:377 mm/kasan/common.c:394)
page_pool_create_percpu (./include/linux/slab.h:928 net/core/page_pool.c:344)
bpf_test_run_xdp_live (net/bpf/test_run.c:183 net/bpf/test_run.c:383)
bpf_prog_test_run_xdp (net/bpf/test_run.c:1316)
__sys_bpf (kernel/bpf/syscall.c:4427 kernel/bpf/syscall.c:5852)
__x64_sys_bpf (kernel/bpf/syscall.c:5939)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Freed by task 11137:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:25 mm/kasan/common.c:60 mm/kasan/common.c:69)
kasan_save_free_info (mm/kasan/generic.c:579)
__kasan_slab_free (mm/kasan/common.c:271)
kfree (mm/slub.c:4642 mm/slub.c:4841)
page_pool_release (net/core/page_pool.c:1062 net/core/page_pool.c:1099)
page_pool_release_retry (net/core/page_pool.c:1118)
process_one_work (kernel/workqueue.c:3243)
worker_thread (kernel/workqueue.c:3313 kernel/workqueue.c:3400)
kthread (kernel/kthread.c:464)
ret_from_fork (arch/x86/kernel/process.c:159)
ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
Last potentially related work creation:
kasan_save_stack (mm/kasan/common.c:48)
kasan_record_aux_stack (mm/kasan/generic.c:548)
insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
__queue_work (kernel/workqueue.c:2342)
call_timer_fn (./arch/x86/include/asm/jump_label.h:36 ./include/trace/events/timer.h:127 kernel/time/timer.c:1790)
__run_timers (kernel/time/timer.c:1836 kernel/time/timer.c:2414)
run_timer_base (kernel/time/timer.c:2427 kernel/time/timer.c:2418 kernel/time/timer.c:2435)
run_timer_softirq (kernel/time/timer.c:2446)
handle_softirqs (./arch/x86/include/asm/jump_label.h:36 ./include/trace/events/irq.h:142 kernel/softirq.c:580)
__irq_exit_rcu (kernel/softirq.c:614 kernel/softirq.c:453 kernel/softirq.c:680)
irq_exit_rcu (kernel/softirq.c:698)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
The buggy address belongs to the object at ffff88811dfe0000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1808 bytes inside of
freed 2048-byte region [ffff88811dfe0000, ffff88811dfe0800)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11dfe0
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x17ff00000000040(head|node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000040 ffff888100042f00 ffffea000495e200 0000000000000002
raw: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 017ff00000000040 ffff888100042f00 ffffea000495e200 0000000000000002
head: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 017ff00000000003 ffffea000477f801 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable,
gfp_mask 0xd28c0(GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC),
pid 11004, tgid 11004 (syz-executor.4), ts 427330276241, free_ts 427308706507
post_alloc_hook (./include/linux/page_owner.h:32 mm/page_alloc.c:1718)
get_page_from_freelist (mm/page_alloc.c:1728 mm/page_alloc.c:3688)
__alloc_frozen_pages_noprof (mm/page_alloc.c:4970)
alloc_pages_mpol (mm/mempolicy.c:2303)
new_slab (mm/slub.c:2450 mm/slub.c:2618 mm/slub.c:2672)
___slab_alloc (mm/slub.c:3859 (discriminator 3))
__slab_alloc.constprop.0 (mm/slub.c:3948)
__kmalloc_node_track_caller_noprof (mm/slub.c:4023 mm/slub.c:4184 mm/slub.c:4326 mm/slub.c:4346)
kmalloc_reserve (net/core/skbuff.c:599)
pskb_expand_head (net/core/skbuff.c:2247)
netlink_trim (net/netlink/af_netlink.c:1298)
netlink_broadcast_filtered (net/netlink/af_netlink.c:453 net/netlink/af_netlink.c:1519)
nlmsg_notify (net/netlink/af_netlink.c:2578)
notifier_call_chain (kernel/notifier.c:85)
call_netdevice_notifiers_info (net/core/dev.c:2176)
page last free pid 11004 tgid 11004 stack trace:
__free_frozen_pages (./include/linux/page_owner.h:25 mm/page_alloc.c:1262 mm/page_alloc.c:2725)
__put_partials (mm/slub.c:3180)
qlist_free_all (mm/kasan/quarantine.c:174)
kasan_quarantine_reduce (./include/linux/srcu.h:400 mm/kasan/quarantine.c:287)
__kasan_slab_alloc (mm/kasan/common.c:331)
__kmalloc_cache_noprof (mm/slub.c:4147 mm/slub.c:4196 mm/slub.c:4353)
ref_tracker_alloc (lib/ref_tracker.c:203)
net_rx_queue_update_kobjects (net/core/net-sysfs.c:1238 net/core/net-sysfs.c:1301)
netdev_register_kobject (net/core/net-sysfs.c:2094 net/core/net-sysfs.c:2340)
register_netdevice (./include/linux/netdevice.h:2751 net/core/dev.c:10999)
veth_newlink (drivers/net/veth.c:1819)
rtnl_newlink (net/core/rtnetlink.c:3833 net/core/rtnetlink.c:3950 net/core/rtnetlink.c:4065)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6955)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
Memory state around the buggy address:
ffff88811dfe0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88811dfe0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88811dfe0700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88811dfe0780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88811dfe0800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[2]
https://lore.kernel.org/all/670c204d.050a0220.3e960.0045.GAE@google.com/T/
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-13 8:31 [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring Dong Chenchen
@ 2025-05-13 20:06 ` Mina Almasry
2025-05-13 21:21 ` Jakub Kicinski
2025-05-14 3:10 ` dongchenchen (A)
From: Mina Almasry @ 2025-05-13 20:06 UTC (permalink / raw)
To: Dong Chenchen
Cc: hawk, ilias.apalodimas, davem, edumazet, kuba, pabeni, horms,
netdev, linux-kernel, zhangchangzhong
On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@huawei.com> wrote:
>
> Hello,
>
> syzkaller found the UAF issue in page_pool_recycle_in_ring[1], which is
> similar to syzbot+204a4382fcb3311f3858@syzkaller.appspotmail.com.
>
> root cause is as follow:
>
> page_pool_recycle_in_ring
> ptr_ring_produce
> spin_lock(&r->producer_lock);
> WRITE_ONCE(r->queue[r->producer++], ptr)
> //recycle last page to pool
> page_pool_release
> page_pool_scrub
> page_pool_empty_ring
> ptr_ring_consume
> page_pool_return_page //release all page
> __page_pool_destroy
> free_percpu(pool->recycle_stats);
> kfree(pool) //free
>
> spin_unlock(&r->producer_lock); //pool->ring uaf read
> recycle_stat_inc(pool, ring);
>
> page_pool can be free while page pool recycle the last page in ring.
> After adding a delay to the page_pool_recycle_in_ring(), syzlog[2] can
> reproduce this issue with a high probability. Maybe we can fix it by
> holding the user_cnt of the page pool during the page recycle process.
>
> Does anyone have a good idea to solve this problem?
>
Ugh. page_pool_release is not supposed to free the page_pool until all
inflight pages have been returned. It detects pending inflight pages by
checking the atomic stats, but in this case it looks like we've raced
checking the atomic stats against another cpu returning a netmem to the
ptr_ring (it updates the stats _after_ it has already returned the netmem
to the ptr_ring).
My guess here is that page_pool_scrub needs to acquire the
r->producer_lock to make sure there are no other producers returning
netmems to the ptr_ring while it's scrubbing them (and checking after
to make sure there are no inflight netmems).
Can you test this fix? It may need some massaging. I only checked that it
builds; I haven't thought through all the possible races yet:
```
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2b76848659418..8654608734773 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)
static int page_pool_release(struct page_pool *pool)
{
+ bool in_softirq;
int inflight;
+
+ /* Acquire producer lock to make sure we don't race with another thread
+ * returning a netmem to the ptr_ring.
+ */
+ in_softirq = page_pool_producer_lock(pool);
page_pool_scrub(pool);
inflight = page_pool_inflight(pool, true);
+ page_pool_producer_unlock(pool, in_softirq);
if (!inflight)
__page_pool_destroy(pool);
```
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-13 20:06 ` Mina Almasry
@ 2025-05-13 21:21 ` Jakub Kicinski
2025-05-14 3:10 ` dongchenchen (A)
From: Jakub Kicinski @ 2025-05-13 21:21 UTC (permalink / raw)
To: Mina Almasry
Cc: Dong Chenchen, hawk, ilias.apalodimas, davem, edumazet, pabeni,
horms, netdev, linux-kernel, zhangchangzhong
On Tue, 13 May 2025 13:06:38 -0700 Mina Almasry wrote:
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 2b76848659418..8654608734773 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)
>
> static int page_pool_release(struct page_pool *pool)
> {
> + bool in_softirq;
> int inflight;
>
> +
> + /* Acquire producer lock to make sure we don't race with another thread
> + * returning a netmem to the ptr_ring.
> + */
> + in_softirq = page_pool_producer_lock(pool);
> page_pool_scrub(pool);
> inflight = page_pool_inflight(pool, true);
> + page_pool_producer_unlock(pool, in_softirq);
Makes sense! A couple of minor notes.
The consumer lock should be outside, but really we only need to make
sure the producer has "exited", right? So just lock/unlock, no need to
wrap any code in it.
I'd add a helper to ptr_ring.h, a "producer barrier" which just
takes/releases the producer lock. We can't be in softirq context
here, but it doesn't matter; let's take the lock in "any" mode, IOW
irqsave()?
The barrier is only needed if we're proceeding to destruction.
If inflight returns != 0 we won't destroy the pool so no need
to touch producer lock.
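A minimal sketch of the kind of helper described above (the name is hypothetical, and the real patch may differ): taking and immediately releasing the producer lock with irqsave is enough to guarantee that any producer that was mid-critical-section has finished.

```c
/* Hypothetical "producer barrier" for include/linux/ptr_ring.h.
 * An empty critical section suffices: once we can take the lock,
 * every earlier producer has released it. irqsave makes it usable
 * from "any" context, as suggested above.
 */
static inline void ptr_ring_producer_barrier(struct ptr_ring *r)
{
	unsigned long flags;

	spin_lock_irqsave(&r->producer_lock, flags);
	spin_unlock_irqrestore(&r->producer_lock, flags);
}
```

As noted above, the caller would only need to invoke this on the destruction path, right before tearing the pool down.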
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-13 20:06 ` Mina Almasry
2025-05-13 21:21 ` Jakub Kicinski
@ 2025-05-14 3:10 ` dongchenchen (A)
2025-05-19 19:20 ` Mina Almasry
From: dongchenchen (A) @ 2025-05-14 3:10 UTC (permalink / raw)
To: Mina Almasry
Cc: hawk, ilias.apalodimas, davem, edumazet, kuba, pabeni, horms,
netdev, linux-kernel, zhangchangzhong
> On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@huawei.com> wrote:
>> Hello,
>>
>> syzkaller found the UAF issue in page_pool_recycle_in_ring[1], which is
>> similar to syzbot+204a4382fcb3311f3858@syzkaller.appspotmail.com.
>>
>> root cause is as follow:
>>
>> page_pool_recycle_in_ring
>> ptr_ring_produce
>> spin_lock(&r->producer_lock);
>> WRITE_ONCE(r->queue[r->producer++], ptr)
>> //recycle last page to pool
>> page_pool_release
>> page_pool_scrub
>> page_pool_empty_ring
>> ptr_ring_consume
>> page_pool_return_page //release all page
>> __page_pool_destroy
>> free_percpu(pool->recycle_stats);
>> kfree(pool) //free
>>
>> spin_unlock(&r->producer_lock); //pool->ring uaf read
>> recycle_stat_inc(pool, ring);
>>
>> page_pool can be free while page pool recycle the last page in ring.
>> After adding a delay to the page_pool_recycle_in_ring(), syzlog[2] can
>> reproduce this issue with a high probability. Maybe we can fix it by
>> holding the user_cnt of the page pool during the page recycle process.
>>
>> Does anyone have a good idea to solve this problem?
>>
> Ugh. page_pool_release is not supposed to free the page_pool until all
> inflight pages have been returned. It detects that there are pending
> inflight pages by checking the atomic stats, but in this case it looks
> like we've raced checking the atomic stats with another cpu returning
> a netmem to the ptr ring (and it updates the stats _after_ it already
> returned to the ptr_ring).
>
> My guess here is that page_pool_scrub needs to acquire the
> r->producer_lock to make sure there are no other producers returning
> netmems to the ptr_ring while it's scrubbing them (and checking after
> to make sure there are no inflight netmems).
>
> Can you test this fix? It may need some massaging. I only checked it
> builds. I haven't thought through all the possible races yet:
>
> ```
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 2b76848659418..8654608734773 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)
>
> static int page_pool_release(struct page_pool *pool)
> {
> + bool in_softirq;
> int inflight;
>
> +
> + /* Acquire producer lock to make sure we don't race with another thread
> + * returning a netmem to the ptr_ring.
> + */
> + in_softirq = page_pool_producer_lock(pool);
> page_pool_scrub(pool);
> inflight = page_pool_inflight(pool, true);
> + page_pool_producer_unlock(pool, in_softirq);
> if (!inflight)
> __page_pool_destroy(pool);
> ```
Hi, Mina!
I tested this patch and the problem still exists. Although the patch makes
the lock access itself safe, the page recycle process can still access the
page pool after the unlock:
page_pool_recycle_in_ring
  ptr_ring_produce
    spin_lock(&r->producer_lock);
    WRITE_ONCE(r->queue[r->producer++], ptr)
    spin_unlock(&r->producer_lock);
    //recycle last page to pool
                                        page_pool_release
                                          page_pool_scrub
                                            page_pool_empty_ring
                                              ptr_ring_consume
                                              page_pool_return_page //release all page
                                          __page_pool_destroy
                                            free_percpu(pool->recycle_stats);
                                            kfree(pool) //free
  recycle_stat_inc(pool, ring); //uaf read
Regards,
Dong Chenchen
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-14 3:10 ` dongchenchen (A)
@ 2025-05-19 19:20 ` Mina Almasry
2025-05-19 22:47 ` Jakub Kicinski
2025-05-22 15:04 ` dongchenchen (A)
From: Mina Almasry @ 2025-05-19 19:20 UTC (permalink / raw)
To: dongchenchen (A)
Cc: hawk, ilias.apalodimas, davem, edumazet, kuba, pabeni, horms,
netdev, linux-kernel, zhangchangzhong
On Tue, May 13, 2025 at 8:11 PM dongchenchen (A)
<dongchenchen2@huawei.com> wrote:
>
>
> > On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@huawei.com> wrote:
> >> Hello,
> >>
> >> syzkaller found the UAF issue in page_pool_recycle_in_ring[1], which is
> >> similar to syzbot+204a4382fcb3311f3858@syzkaller.appspotmail.com.
> >>
> >> root cause is as follow:
> >>
> >> page_pool_recycle_in_ring
> >> ptr_ring_produce
> >> spin_lock(&r->producer_lock);
> >> WRITE_ONCE(r->queue[r->producer++], ptr)
> >> //recycle last page to pool
> >> page_pool_release
> >> page_pool_scrub
> >> page_pool_empty_ring
> >> ptr_ring_consume
> >> page_pool_return_page //release all page
> >> __page_pool_destroy
> >> free_percpu(pool->recycle_stats);
> >> kfree(pool) //free
> >>
> >> spin_unlock(&r->producer_lock); //pool->ring uaf read
> >> recycle_stat_inc(pool, ring);
> >>
> >> page_pool can be free while page pool recycle the last page in ring.
> >> After adding a delay to the page_pool_recycle_in_ring(), syzlog[2] can
> >> reproduce this issue with a high probability. Maybe we can fix it by
> >> holding the user_cnt of the page pool during the page recycle process.
> >>
> >> Does anyone have a good idea to solve this problem?
> >>
> > Ugh. page_pool_release is not supposed to free the page_pool until all
> > inflight pages have been returned. It detects that there are pending
> > inflight pages by checking the atomic stats, but in this case it looks
> > like we've raced checking the atomic stats with another cpu returning
> > a netmem to the ptr ring (and it updates the stats _after_ it already
> > returned to the ptr_ring).
> >
> > My guess here is that page_pool_scrub needs to acquire the
> > r->producer_lock to make sure there are no other producers returning
> > netmems to the ptr_ring while it's scrubbing them (and checking after
> > to make sure there are no inflight netmems).
> >
> > Can you test this fix? It may need some massaging. I only checked it
> > builds. I haven't thought through all the possible races yet:
> >
> > ```
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 2b76848659418..8654608734773 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)
> >
> > static int page_pool_release(struct page_pool *pool)
> > {
> > + bool in_softirq;
> > int inflight;
> >
> > +
> > + /* Acquire producer lock to make sure we don't race with another thread
> > + * returning a netmem to the ptr_ring.
> > + */
> > + in_softirq = page_pool_producer_lock(pool);
> > page_pool_scrub(pool);
> > inflight = page_pool_inflight(pool, true);
> > + page_pool_producer_unlock(pool, in_softirq);
> > if (!inflight)
> > __page_pool_destroy(pool);
> > ```
> Hi, Mina!
>
> I tested this patch and the problem still exists.
> Although this patch ensures that lock access is safe, the page recycle
> process
> can access the page pool after unlock.
>
Sorry for the very late reply; I got a bit busy with some work.
My initial analysis was wrong, as your test of the candidate fix shows. I
took another look, and here is what I can tell so far. The full syzbot
report is here for reference:
https://syzkaller.appspot.com/bug?extid=204a4382fcb3311f3858
page_pool_release_retry is supposed to block freeing of the page_pool
until all netmems have been freed via page_pool_put_unrefed_netmem,
using the inflight logic. What is clear from the syzbot report is that
the inflight logic didn't work properly, because the
page_pool_put_unrefed_netmem call happened after
page_pool_release_retry had already allowed the page_pool to be freed
(__page_pool_destroy had already been called).
The inflight logic works by taking the diff between
`pool->pages_state_hold_cnt` and `pool->pages_state_release_cnt`.
pages_state_hold_cnt is incremented when the page_pool allocates a new
page from the buddy allocator. pages_state_release_cnt is incremented at
the end of page_pool_put_unrefed_netmem.
We don't expect new pages to be allocated by the page_pool owner after
page_pool_destroy has been called, so pages_state_hold_cnt is not
supposed to move after page_pool_destroy is called, I think.
pages_state_release_cnt should be <= pages_state_hold_cnt at the time
page_pool_destroy is called. Then, when all the inflight netmems have
been freed via page_pool_put_unrefed_netmem,
pool->pages_state_release_cnt should be equal to
pool->pages_state_hold_cnt, and the page_pool should be allowed to be
freed.
Clearly this is not working, but I can't tell why. I also notice the
syzbot report comes from bpf/test_run.c, and I don't think we have
reports from prod, so it may be a test issue. Some possibilities:
1. Maybe the test is calling a page_pool allocation like
page_pool_dev_alloc_pages in parallel with page_pool_destroy. That may
increment pages_state_hold_cnt unexpectedly?
2. Maybe one of the pages_state_*_cnt overflowed or something?
3. Memory corruption?
I'm afraid I'm not sure. Possibly littering the code with warnings for
unexpected cases would give some insight. For example, I think this
would catch case #1:
```
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 4011eb305cee..9fa70c60f9b5 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -536,6 +536,9 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
alloc_stat_inc(pool, slow_high_order);
page_pool_set_pp_info(pool, page_to_netmem(page));
+ /* Warn if we're allocating a page on a destroyed page_pool */
+ DEBUG_NET_WARN_ON(pool->destroy_cnt);
+
/* Track how many pages are held 'in-flight' */
pool->pages_state_hold_cnt++;
trace_page_pool_state_hold(pool, page_to_netmem(page),
@@ -582,6 +585,8 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
page_pool_set_pp_info(pool, netmem);
pool->alloc.cache[pool->alloc.count++] = netmem;
/* Track how many pages are held 'in-flight' */
+ /* Warn if we're allocating a page on a destroyed page_pool */
+ DEBUG_NET_WARN_ON(pool->destroy_cnt);
pool->pages_state_hold_cnt++;
trace_page_pool_state_hold(pool, netmem,
pool->pages_state_hold_cnt);
@@ -1271,6 +1276,8 @@ void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov)
page_pool_set_pp_info(pool, netmem);
+ /* Warn if we're allocating a page on a destroyed page_pool */
+ DEBUG_NET_WARN_ON(pool->destroy_cnt);
pool->pages_state_hold_cnt++;
trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt);
```
If you have steps to repro this, maybe post them and I'll try to take
a look when I get a chance if you can't root cause this further.
--
Thanks,
Mina
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-19 19:20 ` Mina Almasry
@ 2025-05-19 22:47 ` Jakub Kicinski
2025-05-20 0:53 ` Mina Almasry
2025-05-22 15:04 ` dongchenchen (A)
From: Jakub Kicinski @ 2025-05-19 22:47 UTC (permalink / raw)
To: Mina Almasry
Cc: dongchenchen (A), hawk, ilias.apalodimas, davem, edumazet, pabeni,
horms, netdev, linux-kernel, zhangchangzhong
On Mon, 19 May 2025 12:20:59 -0700 Mina Almasry wrote:
> Clearly this is not working, but I can't tell why.
I think your fix works but for the one line that collects recycling
stats. If we put recycling stats under the producer lock we should
be safe.
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-19 22:47 ` Jakub Kicinski
@ 2025-05-20 0:53 ` Mina Almasry
2025-05-20 18:06 ` Jakub Kicinski
From: Mina Almasry @ 2025-05-20 0:53 UTC (permalink / raw)
To: Jakub Kicinski
Cc: dongchenchen (A), hawk, ilias.apalodimas, davem, edumazet, pabeni,
horms, netdev, linux-kernel, zhangchangzhong
On Mon, May 19, 2025 at 3:47 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 19 May 2025 12:20:59 -0700 Mina Almasry wrote:
> > Clearly this is not working, but I can't tell why.
>
> I think your fix works but for the one line that collects recycling
> stats. If we put recycling stats under the producer lock we should
> be safe.
What are you referring to as recycle stats? Because I don't think
pool->recycle_stats have anything to do with freeing the page_pool.
Or do you mean that we should put all the call sites that increment
and decrement pool->pages_state_release_cnt and
pool->pages_state_hold_cnt under the producer lock?
--
Thanks,
Mina
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-20 0:53 ` Mina Almasry
@ 2025-05-20 18:06 ` Jakub Kicinski
2025-05-22 15:17 ` dongchenchen (A)
From: Jakub Kicinski @ 2025-05-20 18:06 UTC (permalink / raw)
To: Mina Almasry
Cc: dongchenchen (A), hawk, ilias.apalodimas, davem, edumazet, pabeni,
horms, netdev, linux-kernel, zhangchangzhong
On Mon, 19 May 2025 17:53:08 -0700 Mina Almasry wrote:
> On Mon, May 19, 2025 at 3:47 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Mon, 19 May 2025 12:20:59 -0700 Mina Almasry wrote:
> > > Clearly this is not working, but I can't tell why.
> >
> > I think your fix works but for the one line that collects recycling
> > stats. If we put recycling stats under the producer lock we should
> > be safe.
>
> What are you referring to as recycle stats? Because I don't think
> pool->recycle_stats have anything to do with freeing the page_pool.
>
> Or do you mean that we should put all the call sites that increment
> and decrement pool->pages_state_release_cnt and
> pool->pages_state_hold_cnt under the producer lock?
No, just the "informational" recycling stats. Looking at what Dong
Chenchen has shared:
page_pool_recycle_in_ring
  ptr_ring_produce
    spin_lock(&r->producer_lock);
    WRITE_ONCE(r->queue[r->producer++], ptr)
    spin_unlock(&r->producer_lock);
    //recycle last page to pool
                                        page_pool_release
                                          page_pool_scrub
                                            page_pool_empty_ring
                                              ptr_ring_consume
                                              page_pool_return_page //release all page
                                          __page_pool_destroy
                                            free_percpu(pool->recycle_stats);
                                            kfree(pool) //free
  recycle_stat_inc(pool, ring); //uaf read
The thread which put the last page into the ring has released the
producer lock and is now trying to increment the recycling stats. That
is invalid: the page it put into the ring was de facto the reference it
held, so once it has put that page in the ring it should no longer
touch the page pool.
It's not really related to the refcounting itself; the recycling stats
don't control the lifetime of the object. With your patch to turn the
producer lock into a proper barrier, the remaining issue feels to me
like a basic UAF?
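The shape of fix this points at is to do the stats bump while still holding the producer lock, so the producer never touches the pool after publishing what was de facto its last reference. A hedged sketch (built on the page_pool_producer_lock/unlock helpers already in net/core/page_pool.c, and open-coding the ring produce; not a tested patch):

```c
static bool page_pool_recycle_in_ring(struct page_pool *pool,
				      netmem_ref netmem)
{
	bool in_softirq, ret;

	in_softirq = page_pool_producer_lock(pool);
	ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
	if (ret)
		recycle_stat_inc(pool, ring); /* still under producer lock */
	page_pool_producer_unlock(pool, in_softirq);

	return ret;
}
```

Combined with taking the producer lock on the release path, this closes the window: by the time the destroyer holds the lock, no producer can still be dereferencing the pool.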
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-19 19:20 ` Mina Almasry
2025-05-19 22:47 ` Jakub Kicinski
@ 2025-05-22 15:04 ` dongchenchen (A)
From: dongchenchen (A) @ 2025-05-22 15:04 UTC (permalink / raw)
To: Mina Almasry
Cc: hawk, ilias.apalodimas, davem, edumazet, kuba, pabeni, horms,
netdev, linux-kernel, zhangchangzhong
> On Tue, May 13, 2025 at 8:11 PM dongchenchen (A)
> <dongchenchen2@huawei.com> wrote:
>>
>>> On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@huawei.com> wrote:
>>>> Hello,
>>>>
>>>> syzkaller found the UAF issue in page_pool_recycle_in_ring[1], which is
>>>> similar to syzbot+204a4382fcb3311f3858@syzkaller.appspotmail.com.
>>>>
>>>> root cause is as follow:
>>>>
>>>> page_pool_recycle_in_ring
>>>> ptr_ring_produce
>>>> spin_lock(&r->producer_lock);
>>>> WRITE_ONCE(r->queue[r->producer++], ptr)
>>>> //recycle last page to pool
>>>> page_pool_release
>>>> page_pool_scrub
>>>> page_pool_empty_ring
>>>> ptr_ring_consume
>>>> page_pool_return_page //release all page
>>>> __page_pool_destroy
>>>> free_percpu(pool->recycle_stats);
>>>> kfree(pool) //free
>>>>
>>>> spin_unlock(&r->producer_lock); //pool->ring uaf read
>>>> recycle_stat_inc(pool, ring);
>>>>
>>>> page_pool can be free while page pool recycle the last page in ring.
>>>> After adding a delay to the page_pool_recycle_in_ring(), syzlog[2] can
>>>> reproduce this issue with a high probability. Maybe we can fix it by
>>>> holding the user_cnt of the page pool during the page recycle process.
>>>>
>>>> Does anyone have a good idea to solve this problem?
>>>>
>>> Ugh. page_pool_release is not supposed to free the page_pool until all
>>> inflight pages have been returned. It detects that there are pending
>>> inflight pages by checking the atomic stats, but in this case it looks
>>> like we've raced checking the atomic stats with another cpu returning
>>> a netmem to the ptr ring (and it updates the stats _after_ it already
>>> returned to the ptr_ring).
>>>
>>> My guess here is that page_pool_scrub needs to acquire the
>>> r->producer_lock to make sure there are no other producers returning
>>> netmems to the ptr_ring while it's scrubbing them (and checking after
>>> to make sure there are no inflight netmems).
>>>
>>> Can you test this fix? It may need some massaging. I only checked it
>>> builds. I haven't thought through all the possible races yet:
>>>
>>> ```
>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>> index 2b76848659418..8654608734773 100644
>>> --- a/net/core/page_pool.c
>>> +++ b/net/core/page_pool.c
>>> @@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)
>>>
>>> static int page_pool_release(struct page_pool *pool)
>>> {
>>> + bool in_softirq;
>>> int inflight;
>>>
>>> +
>>> + /* Acquire producer lock to make sure we don't race with another thread
>>> + * returning a netmem to the ptr_ring.
>>> + */
>>> + in_softirq = page_pool_producer_lock(pool);
>>> page_pool_scrub(pool);
>>> inflight = page_pool_inflight(pool, true);
>>> + page_pool_producer_unlock(pool, in_softirq);
>>> if (!inflight)
>>> __page_pool_destroy(pool);
>>> ```
>> Hi, Mina!
>>
>> I tested this patch and the problem still exists.
>> Although this patch ensures that the lock access is safe, the page
>> recycle process can still access the page pool after the unlock.
>>
> Sorry for the very late reply; got a bit busy with work.
>
> My initial analysis was wrong as the test shows with the candidate
> fix. I took another look, and here is what I can tell so far. The full
> syzbot report is here for reference:
>
> https://syzkaller.appspot.com/bug?extid=204a4382fcb3311f3858
>
> page_pool_release_retry is supposed to block freeing the page_pool
> until all netmems have been freed via page_pool_put_unrefed_netmem
> using the inflight logic. What is clear from the syzbot report is that
> this inflight logic didn't work properly, because the
> page_pool_put_unrefed_netmem call happened after
> page_pool_release_retry has allowed the page_pool to be freed
> (__page_pool_destroy has already been called).
>
> The inflight logic works by taking the diff between
> `pool->pages_state_hold_cnt` and `pool->pages_state_release_cnt`.
> pages_state_hold_cnt is incremented when the page_pool allocates a new
> page from the buddy allocator, and pages_state_release_cnt is
> incremented at the end of page_pool_put_unrefed_netmem.
>
> We don't expect new pages to be allocated by the page_pool owner after
> page_pool_destroy has been called, so pages_state_hold_cnt should stop
> moving once page_pool_destroy is called, I think.
> pages_state_release_cnt should be <= pages_state_hold_cnt at the time
> page_pool_destroy is called. Then, once all the inflight netmems have
> been freed via page_pool_put_unrefed_netmem,
> pool->pages_state_release_cnt should be equal to
> pool->pages_state_hold_cnt, and the page_pool should be allowed to be
> freed.
>
> Clearly this is not working, but I can't tell why. I also notice the
> syzbot report comes from bpf/test_run.c, and I don't think we have
> reports from prod, so it may be a test issue. Some possibilities:
>
> 1. Maybe the test is calling a page_pool allocation like
> page_pool_dev_alloc_pages in parallel with page_pool_destroy. That may
> increment pages_state_hold_cnt unexpectedly?
>
> 2. Maybe one of the pages_state_*_cnt overflowed or something?
>
> 3. Memory corruption?
>
> I'm afraid I'm not sure. Possibly littering the code with warnings for
> unexpected cases would give some insight. For example, I think this
> would catch case #1:
>
> ```
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 4011eb305cee..9fa70c60f9b5 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -536,6 +536,9 @@ static struct page
> *__page_pool_alloc_page_order(struct page_pool *pool,
> alloc_stat_inc(pool, slow_high_order);
> page_pool_set_pp_info(pool, page_to_netmem(page));
>
> + /* Warn if we're allocating a page on a destroyed page_pool */
> + DEBUG_NET_WARN_ON(pool->destroy_cnt);
> +
> /* Track how many pages are held 'in-flight' */
> pool->pages_state_hold_cnt++;
> trace_page_pool_state_hold(pool, page_to_netmem(page),
> @@ -582,6 +585,8 @@ static noinline netmem_ref
> __page_pool_alloc_pages_slow(struct page_pool *pool,
> page_pool_set_pp_info(pool, netmem);
> pool->alloc.cache[pool->alloc.count++] = netmem;
> /* Track how many pages are held 'in-flight' */
> + /* Warn if we're allocating a page on a destroyed page_pool */
> + DEBUG_NET_WARN_ON(pool->destroy_cnt);
> pool->pages_state_hold_cnt++;
> trace_page_pool_state_hold(pool, netmem,
> pool->pages_state_hold_cnt);
> @@ -1271,6 +1276,8 @@ void net_mp_niov_set_page_pool(struct page_pool
> *pool, struct net_iov *niov)
>
> page_pool_set_pp_info(pool, netmem);
>
> + /* Warn if we're allocating a page on a destroyed page_pool */
> + DEBUG_NET_WARN_ON(pool->destroy_cnt);
> pool->pages_state_hold_cnt++;
> trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt);
> ```
>
> If you have steps to repro this, maybe post them and I'll try to take
> a look when I get a chance if you can't root cause this further.
Hi, Mina
Thank you for your rigorous analysis of the problem.
We can reproduce the problem using the syzkaller config and log.
Enabling CONFIG_PAGE_POOL_STATS and adding a delay to
page_pool_recycle_in_ring() make it easier to reproduce:
static noinline bool page_pool_recycle_in_ring(struct page_pool *pool,
					       netmem_ref netmem)
{
	int ret;

	/* BH protection not needed if current is softirq */
@@ -715,6 +719,8 @@ static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
 	ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
 	if (!ret) {
+		mdelay(2500);
 		recycle_stat_inc(pool, ring);
 		return true;
 	}
> --
> Thanks,
> Mina
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-20 18:06 ` Jakub Kicinski
@ 2025-05-22 15:17 ` dongchenchen (A)
2025-05-22 15:47 ` Jakub Kicinski
0 siblings, 1 reply; 12+ messages in thread
From: dongchenchen (A) @ 2025-05-22 15:17 UTC (permalink / raw)
To: Jakub Kicinski, Mina Almasry
Cc: hawk, ilias.apalodimas, davem, edumazet, pabeni, horms, netdev,
linux-kernel, zhangchangzhong
> On Mon, 19 May 2025 17:53:08 -0700 Mina Almasry wrote:
>> On Mon, May 19, 2025 at 3:47 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>> On Mon, 19 May 2025 12:20:59 -0700 Mina Almasry wrote:
>>>> Clearly this is not working, but I can't tell why.
>>> I think your fix works but for the one line that collects recycling
>>> stats. If we put recycling stats under the producer lock we should
>>> be safe.
>> What are you referring to as recycle stats? Because I don't think
>> pool->recycle_stats have anything to do with freeing the page_pool.
>>
>> Or do you mean that we should put all the call sites that increment
>> and decrement pool->pages_state_release_cnt and
>> pool->pages_state_hold_cnt under the producer lock?
> No, just the "informational" recycling stats. Looking at what Dong
> Chenchen has shared:
>
> page_pool_recycle_in_ring
> ptr_ring_produce
> spin_lock(&r->producer_lock);
> WRITE_ONCE(r->queue[r->producer++], ptr)
> spin_unlock(&r->producer_lock);
> //recycle last page to pool
> page_pool_release
> page_pool_scrub
> page_pool_empty_ring
> ptr_ring_consume
> page_pool_return_page //release
> all page
> __page_pool_destroy
> free_percpu(pool->recycle_stats);
> kfree(pool) //free
> recycle_stat_inc(pool, ring); //uaf read
>
>
> The thread which put the last page into the ring has released the
> producer lock and now it's trying to increment the recycling stats.
> Which is invalid: the page it put into the ring was de facto the
> reference it held. So once it put that page in the ring it should
> no longer touch the page pool.
>
> It's not really related to the refcounting itself, the recycling
> stats don't control the lifetime of the object. With your patch
> to turn the producer lock into a proper barrier, the remaining
> issue feels to me like a basic UAF?
Hi, Jakub
Maybe we can fix the problem as follow:
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7745ad924ae2..de3fa33d6775 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -707,19 +707,18 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
{
+ bool in_softirq;
int ret;
- /* BH protection not needed if current is softirq */
- if (in_softirq())
- ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
- else
- ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
- if (!ret) {
+ /* BH protection not needed if current is softirq */
+ in_softirq = page_pool_producer_lock(pool);
+ ret = __ptr_ring_produce(&pool->ring, (__force void *)netmem);
+ if (!ret)
recycle_stat_inc(pool, ring);
- return true;
- }
- return false;
+ page_pool_producer_unlock(pool, in_softirq);
+
+ return ret ? false : true;
}
/* Only allow direct recycling in special circumstances, into the
@@ -1091,10 +1090,16 @@ static void page_pool_scrub(struct page_pool *pool)
static int page_pool_release(struct page_pool *pool)
{
+ bool in_softirq;
int inflight;
+ /* Acquire producer lock to make sure we don't race with another thread
+ * returning a netmem to the ptr_ring.
+ */
+ in_softirq = page_pool_producer_lock(pool);
page_pool_scrub(pool);
inflight = page_pool_inflight(pool, true);
+ page_pool_producer_unlock(pool, in_softirq);
if (!inflight)
__page_pool_destroy(pool);
I have tested this patch.
-----
Best Regards,
Dong Chenchen
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-22 15:17 ` dongchenchen (A)
@ 2025-05-22 15:47 ` Jakub Kicinski
2025-05-23 1:52 ` dongchenchen (A)
0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2025-05-22 15:47 UTC (permalink / raw)
To: dongchenchen (A)
Cc: Mina Almasry, hawk, ilias.apalodimas, davem, edumazet, pabeni,
horms, netdev, linux-kernel, zhangchangzhong
On Thu, 22 May 2025 23:17:32 +0800 dongchenchen (A) wrote:
> Hi, Jakub
> Maybe we can fix the problem as follow:
Yes! A couple of minor nitpicks below:
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 7745ad924ae2..de3fa33d6775 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -707,19 +707,18 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
>
> static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
> {
> + bool in_softirq;
> int ret;
> - /* BH protection not needed if current is softirq */
> - if (in_softirq())
> - ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
> - else
> - ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
>
> - if (!ret) {
> + /* BH protection not needed if current is softirq */
> + in_softirq = page_pool_producer_lock(pool);
> + ret = __ptr_ring_produce(&pool->ring, (__force void *)netmem);
Maybe we can flip the return value here so we won't have to negate it
below and at the return? Like this:
ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
and adjust subsequent code
> + if (!ret)
> recycle_stat_inc(pool, ring);
> - return true;
> - }
>
> - return false;
> + page_pool_producer_unlock(pool, in_softirq);
> +
> + return ret ? false : true;
> }
>
> /* Only allow direct recycling in special circumstances, into the
>
> @@ -1091,10 +1090,16 @@ static void page_pool_scrub(struct page_pool *pool)
>
> static int page_pool_release(struct page_pool *pool)
> {
> + bool in_softirq;
> int inflight;
>
> + /* Acquire producer lock to make sure we don't race with another thread
> + * returning a netmem to the ptr_ring.
> + */
> + in_softirq = page_pool_producer_lock(pool);
> page_pool_scrub(pool);
> inflight = page_pool_inflight(pool, true);
> + page_pool_producer_unlock(pool, in_softirq);
As I suggested earlier, we don't have to lock the consumer; taking both
locks has lock-ordering implications. My preference would be:
page_pool_scrub(pool);
inflight = page_pool_inflight(pool, true);
+ /* Acquire producer lock to make sure producers have exited. */
+ in_softirq = page_pool_producer_lock(pool);
+ page_pool_producer_unlock(pool, in_softirq);
* Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
2025-05-22 15:47 ` Jakub Kicinski
@ 2025-05-23 1:52 ` dongchenchen (A)
0 siblings, 0 replies; 12+ messages in thread
From: dongchenchen (A) @ 2025-05-23 1:52 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Mina Almasry, hawk, ilias.apalodimas, davem, edumazet, pabeni,
horms, netdev, linux-kernel, zhangchangzhong
> On Thu, 22 May 2025 23:17:32 +0800 dongchenchen (A) wrote:
>> Hi, Jakub
>> Maybe we can fix the problem as follow:
> Yes! A couple of minor nitpicks below:
>
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index 7745ad924ae2..de3fa33d6775 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -707,19 +707,18 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
>>
>> static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
>> {
>> + bool in_softirq;
>> int ret;
>> - /* BH protection not needed if current is softirq */
>> - if (in_softirq())
>> - ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
>> - else
>> - ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
>>
>> - if (!ret) {
>> + /* BH protection not needed if current is softirq */
>> + in_softirq = page_pool_producer_lock(pool);
>> + ret = __ptr_ring_produce(&pool->ring, (__force void *)netmem);
> Maybe we can flip the return value here so we won't have to negate it
> below and at the return? Like this:
>
> ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
>
> and adjust subsequent code
Hi, Jakub
Thanks for your suggestions!
>> + if (!ret)
>> recycle_stat_inc(pool, ring);
>> - return true;
>> - }
>>
>> - return false;
>> + page_pool_producer_unlock(pool, in_softirq);
>> +
>> + return ret ? false : true;
>> }
>>
>> /* Only allow direct recycling in special circumstances, into the
>>
>> @@ -1091,10 +1090,16 @@ static void page_pool_scrub(struct page_pool *pool)
>>
>> static int page_pool_release(struct page_pool *pool)
>> {
>> + bool in_softirq;
>> int inflight;
>>
>> + /* Acquire producer lock to make sure we don't race with another thread
>> + * returning a netmem to the ptr_ring.
>> + */
>> + in_softirq = page_pool_producer_lock(pool);
>> page_pool_scrub(pool);
>> inflight = page_pool_inflight(pool, true);
>> + page_pool_producer_unlock(pool, in_softirq);
> As I suggested earlier we don't have to lock the consumer, taking both
> locks has lock ordering implications. My preference would be:
>
> page_pool_scrub(pool);
> inflight = page_pool_inflight(pool, true);
> + /* Acquire producer lock to make sure producers have exited. */
> + in_softirq = page_pool_producer_lock(pool);
> + page_pool_producer_unlock(pool, in_softirq);
Yes! There is no need to hold the lock for page_pool_inflight(). The lock
can be used as a barrier to wait for the recycle process to complete.
I have tested this patch and will send it out later.
-----
Best Regards,
Dong Chenchen
Thread overview: 12+ messages
2025-05-13 8:31 [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring Dong Chenchen
2025-05-13 20:06 ` Mina Almasry
2025-05-13 21:21 ` Jakub Kicinski
2025-05-14 3:10 ` dongchenchen (A)
2025-05-19 19:20 ` Mina Almasry
2025-05-19 22:47 ` Jakub Kicinski
2025-05-20 0:53 ` Mina Almasry
2025-05-20 18:06 ` Jakub Kicinski
2025-05-22 15:17 ` dongchenchen (A)
2025-05-22 15:47 ` Jakub Kicinski
2025-05-23 1:52 ` dongchenchen (A)
2025-05-22 15:04 ` dongchenchen (A)