* KASAN veth use after free in XDP_REDIRECT
@ 2023-01-24 22:45 Shawn Bohrer
2023-01-25 1:54 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 13+ messages in thread
From: Shawn Bohrer @ 2023-01-24 22:45 UTC (permalink / raw)
To: netdev, bpf; +Cc: makita.toshiaki
Hello,
We've seen the following KASAN report on our systems when using
AF_XDP on a veth.
KASAN report:
BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1
Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
print_report+0x170/0x473
? __xsk_rcv+0x18d/0x2c0
kasan_report+0xad/0x130
? __xsk_rcv+0x18d/0x2c0
kasan_check_range+0x149/0x1a0
memcpy+0x20/0x60
__xsk_rcv+0x18d/0x2c0
__xsk_map_redirect+0x1f3/0x490
? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
xdp_do_redirect+0x5ca/0xd60
veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
? __netif_receive_skb_list_core+0x671/0x920
? veth_xdp+0x670/0x670 [veth]
veth_xdp_rcv+0x304/0xa20 [veth]
? do_xdp_generic+0x150/0x150
? veth_xdp_rcv_one+0xde0/0xde0 [veth]
? _raw_spin_lock_bh+0xe0/0xe0
? newidle_balance+0x887/0xe30
? __perf_event_task_sched_in+0xdb/0x800
veth_poll+0x139/0x571 [veth]
? veth_xdp_rcv+0xa20/0xa20 [veth]
? _raw_spin_unlock+0x39/0x70
? finish_task_switch.isra.0+0x17e/0x7d0
? __switch_to+0x5cf/0x1070
? __schedule+0x95b/0x2640
? io_schedule_timeout+0x160/0x160
__napi_poll+0xa1/0x440
napi_threaded_poll+0x3d1/0x460
? __napi_poll+0x440/0x440
? __kthread_parkme+0xc6/0x1f0
? __napi_poll+0x440/0x440
kthread+0x2a2/0x340
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Freed by task 148640:
kasan_save_stack+0x23/0x50
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x40
____kasan_slab_free+0x169/0x1d0
slab_free_freelist_hook+0xd2/0x190
__kmem_cache_free+0x1a1/0x2f0
skb_release_data+0x449/0x600
consume_skb+0x9f/0x1c0
veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
veth_xdp_rcv+0x304/0xa20 [veth]
veth_poll+0x139/0x571 [veth]
__napi_poll+0xa1/0x440
napi_threaded_poll+0x3d1/0x460
kthread+0x2a2/0x340
ret_from_fork+0x22/0x30
The buggy address belongs to the object at ffff888976250000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 340 bytes inside of
2048-byte region [ffff888976250000, ffff888976250800)
The buggy address belongs to the physical page:
page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
If I understand the code correctly, it looks like an xdp_buff is
constructed pointing at memory backed by an skb, but consume_skb()
is called while the xdp_buff is still in use.
```
case XDP_REDIRECT:
        veth_xdp_get(&xdp);
        consume_skb(skb);
        xdp.rxq->mem = rq->xdp_mem;
        if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
                stats->rx_drops++;
                goto err_xdp;
        }
        stats->xdp_redirect++;
        rcu_read_unlock();
        goto xdp_xmit;
```
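For reference, veth_xdp_get() only takes page references on the buffer,
it does not copy anything. This is my paraphrase of the 6.1 code, so the
details may be slightly off:
```
static void veth_xdp_get(struct xdp_buff *xdp)
{
        struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
        int i;

        /* Pin the page backing xdp->data, which here is skb->head */
        get_page(virt_to_page(xdp->data));
        if (likely(!xdp_buff_has_frags(xdp)))
                return;

        for (i = 0; i < sinfo->nr_frags; i++)
                __skb_frag_ref(&sinfo->frags[i]);
}
```
So after consume_skb() the only thing keeping the data alive is that
page reference, not the skb itself.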
It is worth noting that I think XDP_TX has the exact same problem.
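For comparison, the XDP_TX arm follows the same consume-before-use
pattern (again paraphrased from memory of the same function, so details
may be off):
```
case XDP_TX:
        veth_xdp_get(&xdp);
        consume_skb(skb);               /* same pattern: skb freed before use */
        xdp.rxq->mem = rq->xdp_mem;
        if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
                trace_xdp_exception(rq->dev, xdp_prog, act);
                stats->rx_drops++;
                goto err_xdp;
        }
        stats->xdp_tx++;
        rcu_read_unlock();
        goto xdp_xmit;
```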
Again, assuming I understand the problem, one naive solution might be
to move the consume_skb() call after xdp_do_redirect(). I think this
might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
BPF_MAP_TYPE_DEVMAP_HASH, since those all seem to copy the xdp_buff to
new memory. The copy happens for XSKMAP in __xsk_rcv() and for the
DEVMAP cases in dev_map_enqueue_clone().
However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
live much longer, possibly even after xdp_do_flush(). If I'm correct,
I'm not really sure where it would be safe to call consume_skb().
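To make the naive idea concrete, here is an untested sketch of the
reordering (illustrative only; the err_xdp path would also need to
learn to free the skb, and as noted above it still would not cover the
CPUMAP case):
```
case XDP_REDIRECT:
        veth_xdp_get(&xdp);
        xdp.rxq->mem = rq->xdp_mem;
        if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
                stats->rx_drops++;
                goto err_xdp;           /* would need to drop the skb here too */
        }
        /* Safe only if the redirect target copied the data, e.g. XSKMAP
         * via __xsk_rcv() or DEVMAP via dev_map_enqueue_clone().
         */
        consume_skb(skb);
        stats->xdp_redirect++;
        rcu_read_unlock();
        goto xdp_xmit;
```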
The XDP_TX case looks similarly complex but I have not spent as much
time looking at that one.
Thanks,
Shawn Bohrer
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: KASAN veth use after free in XDP_REDIRECT 2023-01-24 22:45 KASAN veth use after free in XDP_REDIRECT Shawn Bohrer @ 2023-01-25 1:54 ` Toke Høiland-Jørgensen 2023-01-25 2:02 ` Jakub Kicinski ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Toke Høiland-Jørgensen @ 2023-01-25 1:54 UTC (permalink / raw) To: Shawn Bohrer, netdev, bpf; +Cc: makita.toshiaki Shawn Bohrer <sbohrer@cloudflare.com> writes: > Hello, > > We've seen the following KASAN report on our systems. When using > AF_XDP on a veth. > > KASAN report: > > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 > > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 > Call Trace: > <TASK> > dump_stack_lvl+0x34/0x48 > print_report+0x170/0x473 > ? __xsk_rcv+0x18d/0x2c0 > kasan_report+0xad/0x130 > ? __xsk_rcv+0x18d/0x2c0 > kasan_check_range+0x149/0x1a0 > memcpy+0x20/0x60 > __xsk_rcv+0x18d/0x2c0 > __xsk_map_redirect+0x1f3/0x490 > ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > xdp_do_redirect+0x5ca/0xd60 > veth_xdp_rcv_skb+0x935/0x1ba0 [veth] > ? __netif_receive_skb_list_core+0x671/0x920 > ? veth_xdp+0x670/0x670 [veth] > veth_xdp_rcv+0x304/0xa20 [veth] > ? do_xdp_generic+0x150/0x150 > ? veth_xdp_rcv_one+0xde0/0xde0 [veth] > ? _raw_spin_lock_bh+0xe0/0xe0 > ? newidle_balance+0x887/0xe30 > ? __perf_event_task_sched_in+0xdb/0x800 > veth_poll+0x139/0x571 [veth] > ? veth_xdp_rcv+0xa20/0xa20 [veth] > ? _raw_spin_unlock+0x39/0x70 > ? finish_task_switch.isra.0+0x17e/0x7d0 > ? __switch_to+0x5cf/0x1070 > ? __schedule+0x95b/0x2640 > ? io_schedule_timeout+0x160/0x160 > __napi_poll+0xa1/0x440 > napi_threaded_poll+0x3d1/0x460 > ? __napi_poll+0x440/0x440 > ? __kthread_parkme+0xc6/0x1f0 > ? __napi_poll+0x440/0x440 > kthread+0x2a2/0x340 > ? 
kthread_complete_and_exit+0x20/0x20 > ret_from_fork+0x22/0x30 > </TASK> > > Freed by task 148640: > kasan_save_stack+0x23/0x50 > kasan_set_track+0x21/0x30 > kasan_save_free_info+0x2a/0x40 > ____kasan_slab_free+0x169/0x1d0 > slab_free_freelist_hook+0xd2/0x190 > __kmem_cache_free+0x1a1/0x2f0 > skb_release_data+0x449/0x600 > consume_skb+0x9f/0x1c0 > veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > veth_xdp_rcv+0x304/0xa20 [veth] > veth_poll+0x139/0x571 [veth] > __napi_poll+0xa1/0x440 > napi_threaded_poll+0x3d1/0x460 > kthread+0x2a2/0x340 > ret_from_fork+0x22/0x30 > > > The buggy address belongs to the object at ffff888976250000 > which belongs to the cache kmalloc-2k of size 2048 > The buggy address is located 340 bytes inside of > 2048-byte region [ffff888976250000, ffff888976250800) > > The buggy address belongs to the physical page: > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > If I understand the code correctly it looks like a xdp_buf is > constructed pointing to the memory backed by a skb but consume_skb() > is called while the xdp_buf() is still in use. > > ``` > case XDP_REDIRECT: > veth_xdp_get(&xdp); > consume_skb(skb); > xdp.rxq->mem = rq->xdp_mem; > if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) { > stats->rx_drops++; > goto err_xdp; > } > stats->xdp_redirect++; > rcu_read_unlock(); > goto xdp_xmit; > ``` > > It is worth noting that I think XDP_TX has the exact same problem. > > Again assuming I understand the problem one naive solution might be to > move the consum_skb() call after xdp_do_redirect(). I think this > might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and > BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to > new memory. The copy happens for XSKMAP in __xsk_rcv() and for the > DEVMAP cases happens in dev_map_enqueue_clone(). > > However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can > live much longer, possibly even after xdp_do_flush(). If I'm correct, > I'm not really sure where it would be safe to call consume_skb(). So the idea is that veth_xdp_get() does a get_page(virt_to_page(xdp->data)), where xdp->data in this case points to skb->head. This should keep the data page alive even if the skb surrounding it is freed by the call to consume_skb(). However, because the skb->head in this case was allocated from a slab allocator, taking a page refcount is not enough to prevent it from being freed. I'm not sure how best to fix this. I guess we could try to detect this case in the veth driver and copy the data, like we do for skb_share() etc. However, I'm not sure how to actually detect this case... Where are the skbs being processed coming from? I.e., what path allocated them? -Toke ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: KASAN veth use after free in XDP_REDIRECT 2023-01-25 1:54 ` Toke Høiland-Jørgensen @ 2023-01-25 2:02 ` Jakub Kicinski 2023-01-25 2:07 ` Toshiaki Makita 2023-01-25 16:40 ` KASAN veth " Shawn Bohrer 2 siblings, 0 replies; 13+ messages in thread From: Jakub Kicinski @ 2023-01-25 2:02 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Shawn Bohrer, netdev, bpf, makita.toshiaki On Wed, 25 Jan 2023 02:54:52 +0100 Toke Høiland-Jørgensen wrote: > However, because the skb->head in this case was allocated from a slab > allocator, taking a page refcount is not enough to prevent it from being > freed. > > I'm not sure how best to fix this. I guess we could try to detect this > case in the veth driver and copy the data, like we do for skb_share() > etc. However, I'm not sure how to actually detect this case... sbk->head_frag ? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: KASAN veth use after free in XDP_REDIRECT 2023-01-25 1:54 ` Toke Høiland-Jørgensen 2023-01-25 2:02 ` Jakub Kicinski @ 2023-01-25 2:07 ` Toshiaki Makita 2023-03-08 22:33 ` Shawn Bohrer 2023-01-25 16:40 ` KASAN veth " Shawn Bohrer 2 siblings, 1 reply; 13+ messages in thread From: Toshiaki Makita @ 2023-01-25 2:07 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Shawn Bohrer Cc: makita.toshiaki, netdev, bpf On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote: > Shawn Bohrer <sbohrer@cloudflare.com> writes: > >> Hello, >> >> We've seen the following KASAN report on our systems. When using >> AF_XDP on a veth. >> >> KASAN report: >> >> BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 >> Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 >> >> CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 >> Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 >> Call Trace: >> <TASK> >> dump_stack_lvl+0x34/0x48 >> print_report+0x170/0x473 >> ? __xsk_rcv+0x18d/0x2c0 >> kasan_report+0xad/0x130 >> ? __xsk_rcv+0x18d/0x2c0 >> kasan_check_range+0x149/0x1a0 >> memcpy+0x20/0x60 >> __xsk_rcv+0x18d/0x2c0 >> __xsk_map_redirect+0x1f3/0x490 >> ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] >> xdp_do_redirect+0x5ca/0xd60 >> veth_xdp_rcv_skb+0x935/0x1ba0 [veth] >> ? __netif_receive_skb_list_core+0x671/0x920 >> ? veth_xdp+0x670/0x670 [veth] >> veth_xdp_rcv+0x304/0xa20 [veth] >> ? do_xdp_generic+0x150/0x150 >> ? veth_xdp_rcv_one+0xde0/0xde0 [veth] >> ? _raw_spin_lock_bh+0xe0/0xe0 >> ? newidle_balance+0x887/0xe30 >> ? __perf_event_task_sched_in+0xdb/0x800 >> veth_poll+0x139/0x571 [veth] >> ? veth_xdp_rcv+0xa20/0xa20 [veth] >> ? _raw_spin_unlock+0x39/0x70 >> ? finish_task_switch.isra.0+0x17e/0x7d0 >> ? __switch_to+0x5cf/0x1070 >> ? __schedule+0x95b/0x2640 >> ? io_schedule_timeout+0x160/0x160 >> __napi_poll+0xa1/0x440 >> napi_threaded_poll+0x3d1/0x460 >> ? __napi_poll+0x440/0x440 >> ? __kthread_parkme+0xc6/0x1f0 >> ? __napi_poll+0x440/0x440 >> kthread+0x2a2/0x340 >> ? 
kthread_complete_and_exit+0x20/0x20 >> ret_from_fork+0x22/0x30 >> </TASK> >> >> Freed by task 148640: >> kasan_save_stack+0x23/0x50 >> kasan_set_track+0x21/0x30 >> kasan_save_free_info+0x2a/0x40 >> ____kasan_slab_free+0x169/0x1d0 >> slab_free_freelist_hook+0xd2/0x190 >> __kmem_cache_free+0x1a1/0x2f0 >> skb_release_data+0x449/0x600 >> consume_skb+0x9f/0x1c0 >> veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] >> veth_xdp_rcv+0x304/0xa20 [veth] >> veth_poll+0x139/0x571 [veth] >> __napi_poll+0xa1/0x440 >> napi_threaded_poll+0x3d1/0x460 >> kthread+0x2a2/0x340 >> ret_from_fork+0x22/0x30 >> >> >> The buggy address belongs to the object at ffff888976250000 >> which belongs to the cache kmalloc-2k of size 2048 >> The buggy address is located 340 bytes inside of >> 2048-byte region [ffff888976250000, ffff888976250800) >> >> The buggy address belongs to the physical page: >> page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 >> head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 >> flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) >> raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 >> raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 >> page dumped because: kasan: bad access detected >> >> Memory state around the buggy address: >> ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ^ >> ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> >> >> If I understand the code correctly it looks like a xdp_buf is >> constructed pointing to the memory backed by a skb but consume_skb() >> is called while the xdp_buf() is still in use. >> >> ``` >> case XDP_REDIRECT: >> veth_xdp_get(&xdp); >> consume_skb(skb); >> xdp.rxq->mem = rq->xdp_mem; >> if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) { >> stats->rx_drops++; >> goto err_xdp; >> } >> stats->xdp_redirect++; >> rcu_read_unlock(); >> goto xdp_xmit; >> ``` >> >> It is worth noting that I think XDP_TX has the exact same problem. >> >> Again assuming I understand the problem one naive solution might be to >> move the consum_skb() call after xdp_do_redirect(). I think this >> might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and >> BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to >> new memory. The copy happens for XSKMAP in __xsk_rcv() and for the >> DEVMAP cases happens in dev_map_enqueue_clone(). >> >> However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can >> live much longer, possibly even after xdp_do_flush(). If I'm correct, >> I'm not really sure where it would be safe to call consume_skb(). > > So the idea is that veth_xdp_get() does a > get_page(virt_to_page(xdp->data)), where xdp->data in this case points > to skb->head. This should keep the data page alive even if the skb > surrounding it is freed by the call to consume_skb(). > > However, because the skb->head in this case was allocated from a slab > allocator, taking a page refcount is not enough to prevent it from being > freed. Not sure why skb->head is kmallocked here. skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should ensure that skb head is a page fragment. Toshiaki Makita ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: KASAN veth use after free in XDP_REDIRECT 2023-01-25 2:07 ` Toshiaki Makita @ 2023-03-08 22:33 ` Shawn Bohrer 2023-03-14 8:02 ` Toshiaki Makita 0 siblings, 1 reply; 13+ messages in thread From: Shawn Bohrer @ 2023-03-08 22:33 UTC (permalink / raw) To: Toshiaki Makita Cc: Toke Høiland-Jørgensen, makita.toshiaki, netdev, bpf, lorenzo, kernel-team On Wed, Jan 25, 2023 at 11:07:32AM +0900, Toshiaki Makita wrote: > On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote: > > Shawn Bohrer <sbohrer@cloudflare.com> writes: > > > > > Hello, > > > > > > We've seen the following KASAN report on our systems. When using > > > AF_XDP on a veth. > > > > > > KASAN report: > > > > > > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 > > > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 > > > > > > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 > > > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 > > > Call Trace: > > > <TASK> > > > dump_stack_lvl+0x34/0x48 > > > print_report+0x170/0x473 > > > ? __xsk_rcv+0x18d/0x2c0 > > > kasan_report+0xad/0x130 > > > ? __xsk_rcv+0x18d/0x2c0 > > > kasan_check_range+0x149/0x1a0 > > > memcpy+0x20/0x60 > > > __xsk_rcv+0x18d/0x2c0 > > > __xsk_map_redirect+0x1f3/0x490 > > > ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > > > xdp_do_redirect+0x5ca/0xd60 > > > veth_xdp_rcv_skb+0x935/0x1ba0 [veth] > > > ? __netif_receive_skb_list_core+0x671/0x920 > > > ? veth_xdp+0x670/0x670 [veth] > > > veth_xdp_rcv+0x304/0xa20 [veth] > > > ? do_xdp_generic+0x150/0x150 > > > ? veth_xdp_rcv_one+0xde0/0xde0 [veth] > > > ? _raw_spin_lock_bh+0xe0/0xe0 > > > ? newidle_balance+0x887/0xe30 > > > ? __perf_event_task_sched_in+0xdb/0x800 > > > veth_poll+0x139/0x571 [veth] > > > ? veth_xdp_rcv+0xa20/0xa20 [veth] > > > ? _raw_spin_unlock+0x39/0x70 > > > ? finish_task_switch.isra.0+0x17e/0x7d0 > > > ? __switch_to+0x5cf/0x1070 > > > ? __schedule+0x95b/0x2640 > > > ? io_schedule_timeout+0x160/0x160 > > > __napi_poll+0xa1/0x440 > > > napi_threaded_poll+0x3d1/0x460 > > > ? __napi_poll+0x440/0x440 > > > ? __kthread_parkme+0xc6/0x1f0 > > > ? __napi_poll+0x440/0x440 > > > kthread+0x2a2/0x340 > > > ? 
kthread_complete_and_exit+0x20/0x20 > > > ret_from_fork+0x22/0x30 > > > </TASK> > > > > > > Freed by task 148640: > > > kasan_save_stack+0x23/0x50 > > > kasan_set_track+0x21/0x30 > > > kasan_save_free_info+0x2a/0x40 > > > ____kasan_slab_free+0x169/0x1d0 > > > slab_free_freelist_hook+0xd2/0x190 > > > __kmem_cache_free+0x1a1/0x2f0 > > > skb_release_data+0x449/0x600 > > > consume_skb+0x9f/0x1c0 > > > veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > > > veth_xdp_rcv+0x304/0xa20 [veth] > > > veth_poll+0x139/0x571 [veth] > > > __napi_poll+0xa1/0x440 > > > napi_threaded_poll+0x3d1/0x460 > > > kthread+0x2a2/0x340 > > > ret_from_fork+0x22/0x30 > > > > > > > > > The buggy address belongs to the object at ffff888976250000 > > > which belongs to the cache kmalloc-2k of size 2048 > > > The buggy address is located 340 bytes inside of > > > 2048-byte region [ffff888976250000, ffff888976250800) > > > > > > The buggy address belongs to the physical page: > > > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 > > > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 > > > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) > > > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 > > > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 > > > page dumped because: kasan: bad access detected > > > > > > Memory state around the buggy address: > > > ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > ^ > > > ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > > > > > > > If I understand the code correctly it looks like a xdp_buf is > > > constructed pointing to the memory backed by a skb but consume_skb() > > > is called while the xdp_buf() is still in use. > > > > > > ``` > > > case XDP_REDIRECT: > > > veth_xdp_get(&xdp); > > > consume_skb(skb); > > > xdp.rxq->mem = rq->xdp_mem; > > > if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) { > > > stats->rx_drops++; > > > goto err_xdp; > > > } > > > stats->xdp_redirect++; > > > rcu_read_unlock(); > > > goto xdp_xmit; > > > ``` > > > > > > It is worth noting that I think XDP_TX has the exact same problem. > > > > > > Again assuming I understand the problem one naive solution might be to > > > move the consum_skb() call after xdp_do_redirect(). I think this > > > might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and > > > BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to > > > new memory. The copy happens for XSKMAP in __xsk_rcv() and for the > > > DEVMAP cases happens in dev_map_enqueue_clone(). > > > > > > However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can > > > live much longer, possibly even after xdp_do_flush(). If I'm correct, > > > I'm not really sure where it would be safe to call consume_skb(). > > > > So the idea is that veth_xdp_get() does a > > get_page(virt_to_page(xdp->data)), where xdp->data in this case points > > to skb->head. This should keep the data page alive even if the skb > > surrounding it is freed by the call to consume_skb(). > > > > However, because the skb->head in this case was allocated from a slab > > allocator, taking a page refcount is not enough to prevent it from being > > freed. > > Not sure why skb->head is kmallocked here. 
> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should ensure that > skb head is a page fragment. I have a few more details here. We have some machines running 5.15 kernels and some are running 6.1 kernels. So far it appears this only happens on 6.1. We also have a couple of different network cards but it appears that only the machines with Solarflare cards using the sfc driver hit the KASAN BUG. 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb" reworked and added the veth_convert_skb_to_xdp_buff() call you mentioned. Importantly it also added a call to pskb_expand_head() which will kmalloc() the skb->head. This looks like a path that could be causing the KASAN BUG, but I have not yet confirmed that is the path we are hitting. This change was also added in 5.18 so might explain why we don't see it on 5.15. -- Shawn ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: KASAN veth use after free in XDP_REDIRECT 2023-03-08 22:33 ` Shawn Bohrer @ 2023-03-14 8:02 ` Toshiaki Makita 2023-03-14 15:25 ` Shawn Bohrer 0 siblings, 1 reply; 13+ messages in thread From: Toshiaki Makita @ 2023-03-14 8:02 UTC (permalink / raw) To: Shawn Bohrer, lorenzo Cc: Toke Høiland-Jørgensen, makita.toshiaki, netdev, bpf, kernel-team On 2023/03/09 7:33, Shawn Bohrer wrote: > On Wed, Jan 25, 2023 at 11:07:32AM +0900, Toshiaki Makita wrote: >> On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote: >>> Shawn Bohrer <sbohrer@cloudflare.com> writes: >>> >>>> Hello, >>>> >>>> We've seen the following KASAN report on our systems. When using >>>> AF_XDP on a veth. >>>> >>>> KASAN report: >>>> >>>> BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 >>>> Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 >>>> >>>> CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 >>>> Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 >>>> Call Trace: >>>> <TASK> >>>> dump_stack_lvl+0x34/0x48 >>>> print_report+0x170/0x473 >>>> ? __xsk_rcv+0x18d/0x2c0 >>>> kasan_report+0xad/0x130 >>>> ? __xsk_rcv+0x18d/0x2c0 >>>> kasan_check_range+0x149/0x1a0 >>>> memcpy+0x20/0x60 >>>> __xsk_rcv+0x18d/0x2c0 >>>> __xsk_map_redirect+0x1f3/0x490 >>>> ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] >>>> xdp_do_redirect+0x5ca/0xd60 >>>> veth_xdp_rcv_skb+0x935/0x1ba0 [veth] >>>> ? __netif_receive_skb_list_core+0x671/0x920 >>>> ? veth_xdp+0x670/0x670 [veth] >>>> veth_xdp_rcv+0x304/0xa20 [veth] >>>> ? do_xdp_generic+0x150/0x150 >>>> ? veth_xdp_rcv_one+0xde0/0xde0 [veth] >>>> ? _raw_spin_lock_bh+0xe0/0xe0 >>>> ? newidle_balance+0x887/0xe30 >>>> ? __perf_event_task_sched_in+0xdb/0x800 >>>> veth_poll+0x139/0x571 [veth] >>>> ? veth_xdp_rcv+0xa20/0xa20 [veth] >>>> ? _raw_spin_unlock+0x39/0x70 >>>> ? finish_task_switch.isra.0+0x17e/0x7d0 >>>> ? __switch_to+0x5cf/0x1070 >>>> ? __schedule+0x95b/0x2640 >>>> ? io_schedule_timeout+0x160/0x160 >>>> __napi_poll+0xa1/0x440 >>>> napi_threaded_poll+0x3d1/0x460 >>>> ? __napi_poll+0x440/0x440 >>>> ? __kthread_parkme+0xc6/0x1f0 >>>> ? __napi_poll+0x440/0x440 >>>> kthread+0x2a2/0x340 >>>> ? 
kthread_complete_and_exit+0x20/0x20 >>>> ret_from_fork+0x22/0x30 >>>> </TASK> >>>> >>>> Freed by task 148640: >>>> kasan_save_stack+0x23/0x50 >>>> kasan_set_track+0x21/0x30 >>>> kasan_save_free_info+0x2a/0x40 >>>> ____kasan_slab_free+0x169/0x1d0 >>>> slab_free_freelist_hook+0xd2/0x190 >>>> __kmem_cache_free+0x1a1/0x2f0 >>>> skb_release_data+0x449/0x600 >>>> consume_skb+0x9f/0x1c0 >>>> veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] >>>> veth_xdp_rcv+0x304/0xa20 [veth] >>>> veth_poll+0x139/0x571 [veth] >>>> __napi_poll+0xa1/0x440 >>>> napi_threaded_poll+0x3d1/0x460 >>>> kthread+0x2a2/0x340 >>>> ret_from_fork+0x22/0x30 >>>> >>>> >>>> The buggy address belongs to the object at ffff888976250000 >>>> which belongs to the cache kmalloc-2k of size 2048 >>>> The buggy address is located 340 bytes inside of >>>> 2048-byte region [ffff888976250000, ffff888976250800) >>>> >>>> The buggy address belongs to the physical page: >>>> page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 >>>> head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 >>>> flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) >>>> raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 >>>> raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 >>>> page dumped because: kasan: bad access detected >>>> >>>> Memory state around the buggy address: >>>> ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>> ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>>> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>> ^ >>>> ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>> ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>> >>>> >>>> If I understand the code correctly it looks like a xdp_buf is >>>> constructed pointing to the memory backed by a skb but consume_skb() >>>> is called while the xdp_buf() is still in use. >>>> >>>> ``` >>>> case XDP_REDIRECT: >>>> veth_xdp_get(&xdp); >>>> consume_skb(skb); >>>> xdp.rxq->mem = rq->xdp_mem; >>>> if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) { >>>> stats->rx_drops++; >>>> goto err_xdp; >>>> } >>>> stats->xdp_redirect++; >>>> rcu_read_unlock(); >>>> goto xdp_xmit; >>>> ``` >>>> >>>> It is worth noting that I think XDP_TX has the exact same problem. >>>> >>>> Again assuming I understand the problem one naive solution might be to >>>> move the consum_skb() call after xdp_do_redirect(). I think this >>>> might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and >>>> BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to >>>> new memory. The copy happens for XSKMAP in __xsk_rcv() and for the >>>> DEVMAP cases happens in dev_map_enqueue_clone(). >>>> >>>> However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can >>>> live much longer, possibly even after xdp_do_flush(). If I'm correct, >>>> I'm not really sure where it would be safe to call consume_skb(). >>> >>> So the idea is that veth_xdp_get() does a >>> get_page(virt_to_page(xdp->data)), where xdp->data in this case points >>> to skb->head. This should keep the data page alive even if the skb >>> surrounding it is freed by the call to consume_skb(). >>> >>> However, because the skb->head in this case was allocated from a slab >>> allocator, taking a page refcount is not enough to prevent it from being >>> freed. >> >> Not sure why skb->head is kmallocked here. 
>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should ensure that >> skb head is a page fragment. > > I have a few more details here. We have some machines running 5.15 > kernels and some are running 6.1 kernels. So far it appears this only > happens on 6.1. We also have a couple of different network cards but > it appears that only the machines with Solarflare cards using the sfc > driver hit the KASAN BUG. > > 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework > veth_xdp_rcv_skb in order to accept non-linear skb" reworked and added > the veth_convert_skb_to_xdp_buff() call you mentioned. Importantly it > also added a call to pskb_expand_head() which will kmalloc() the > skb->head. This looks like a path that could be causing the KASAN > BUG, but I have not yet confirmed that is the path we are hitting. > This change was also added in 5.18 so might explain why we don't see > it on 5.15. I think you made a valid point. It looks like pskb_expand_head() is incorrect here. At least I did not expect kmallocked skb->data at this point when I introduced veth XDP. Also I looked into the discussion on the suspected patch, https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=&state=*&q=rework+veth_xdp_rcv_skb&archive=both&delegate= But there was no discussion on pskb_expand_head() nor kmallocked data. Lorenzo, what do you think? Toshiaki Makita ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: KASAN veth use after free in XDP_REDIRECT 2023-03-14 8:02 ` Toshiaki Makita @ 2023-03-14 15:25 ` Shawn Bohrer 2023-03-14 15:33 ` [PATCH] veth: Fix " Shawn Bohrer 0 siblings, 1 reply; 13+ messages in thread From: Shawn Bohrer @ 2023-03-14 15:25 UTC (permalink / raw) To: Toshiaki Makita Cc: lorenzo, Toke Høiland-Jørgensen, makita.toshiaki, netdev, bpf, kernel-team On Tue, Mar 14, 2023 at 05:02:40PM +0900, Toshiaki Makita wrote: > On 2023/03/09 7:33, Shawn Bohrer wrote: > > On Wed, Jan 25, 2023 at 11:07:32AM +0900, Toshiaki Makita wrote: > > > On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote: > > > > Shawn Bohrer <sbohrer@cloudflare.com> writes: > > > > > > > > > Hello, > > > > > > > > > > We've seen the following KASAN report on our systems. When using > > > > > AF_XDP on a veth. > > > > > > > > > > KASAN report: > > > > > > > > > > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 > > > > > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 > > > > > > > > > > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 > > > > > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 > > > > > Call Trace: > > > > > <TASK> > > > > > dump_stack_lvl+0x34/0x48 > > > > > print_report+0x170/0x473 > > > > > ? __xsk_rcv+0x18d/0x2c0 > > > > > kasan_report+0xad/0x130 > > > > > ? __xsk_rcv+0x18d/0x2c0 > > > > > kasan_check_range+0x149/0x1a0 > > > > > memcpy+0x20/0x60 > > > > > __xsk_rcv+0x18d/0x2c0 > > > > > __xsk_map_redirect+0x1f3/0x490 > > > > > ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > > > > > xdp_do_redirect+0x5ca/0xd60 > > > > > veth_xdp_rcv_skb+0x935/0x1ba0 [veth] > > > > > ? __netif_receive_skb_list_core+0x671/0x920 > > > > > ? veth_xdp+0x670/0x670 [veth] > > > > > veth_xdp_rcv+0x304/0xa20 [veth] > > > > > ? do_xdp_generic+0x150/0x150 > > > > > ? veth_xdp_rcv_one+0xde0/0xde0 [veth] > > > > > ? _raw_spin_lock_bh+0xe0/0xe0 > > > > > ? newidle_balance+0x887/0xe30 > > > > > ? __perf_event_task_sched_in+0xdb/0x800 > > > > > veth_poll+0x139/0x571 [veth] > > > > > ? veth_xdp_rcv+0xa20/0xa20 [veth] > > > > > ? _raw_spin_unlock+0x39/0x70 > > > > > ? finish_task_switch.isra.0+0x17e/0x7d0 > > > > > ? __switch_to+0x5cf/0x1070 > > > > > ? __schedule+0x95b/0x2640 > > > > > ? io_schedule_timeout+0x160/0x160 > > > > > __napi_poll+0xa1/0x440 > > > > > napi_threaded_poll+0x3d1/0x460 > > > > > ? __napi_poll+0x440/0x440 > > > > > ? __kthread_parkme+0xc6/0x1f0 > > > > > ? __napi_poll+0x440/0x440 > > > > > kthread+0x2a2/0x340 > > > > > ? 
kthread_complete_and_exit+0x20/0x20 > > > > > ret_from_fork+0x22/0x30 > > > > > </TASK> > > > > > > > > > > Freed by task 148640: > > > > > kasan_save_stack+0x23/0x50 > > > > > kasan_set_track+0x21/0x30 > > > > > kasan_save_free_info+0x2a/0x40 > > > > > ____kasan_slab_free+0x169/0x1d0 > > > > > slab_free_freelist_hook+0xd2/0x190 > > > > > __kmem_cache_free+0x1a1/0x2f0 > > > > > skb_release_data+0x449/0x600 > > > > > consume_skb+0x9f/0x1c0 > > > > > veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > > > > > veth_xdp_rcv+0x304/0xa20 [veth] > > > > > veth_poll+0x139/0x571 [veth] > > > > > __napi_poll+0xa1/0x440 > > > > > napi_threaded_poll+0x3d1/0x460 > > > > > kthread+0x2a2/0x340 > > > > > ret_from_fork+0x22/0x30 > > > > > > > > > > > > > > > The buggy address belongs to the object at ffff888976250000 > > > > > which belongs to the cache kmalloc-2k of size 2048 > > > > > The buggy address is located 340 bytes inside of > > > > > 2048-byte region [ffff888976250000, ffff888976250800) > > > > > > > > > > The buggy address belongs to the physical page: > > > > > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 > > > > > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 > > > > > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) > > > > > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 > > > > > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 > > > > > page dumped because: kasan: bad access detected > > > > > > > > > > Memory state around the buggy address: > > > > > ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > > > ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > > > > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > > > ^ > > > > > ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > > > ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > > > > > > > > > > > > > > If I understand the code correctly it looks like a xdp_buf is > > > > > constructed pointing to the memory backed by a skb but consume_skb() > > > > > is called while the xdp_buf() is still in use. > > > > > > > > > > ``` > > > > > case XDP_REDIRECT: > > > > > veth_xdp_get(&xdp); > > > > > consume_skb(skb); > > > > > xdp.rxq->mem = rq->xdp_mem; > > > > > if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) { > > > > > stats->rx_drops++; > > > > > goto err_xdp; > > > > > } > > > > > stats->xdp_redirect++; > > > > > rcu_read_unlock(); > > > > > goto xdp_xmit; > > > > > ``` > > > > > > > > > > It is worth noting that I think XDP_TX has the exact same problem. > > > > > > > > > > Again assuming I understand the problem one naive solution might be to > > > > > move the consum_skb() call after xdp_do_redirect(). I think this > > > > > might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and > > > > > BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to > > > > > new memory. The copy happens for XSKMAP in __xsk_rcv() and for the > > > > > DEVMAP cases happens in dev_map_enqueue_clone(). > > > > > > > > > > However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can > > > > > live much longer, possibly even after xdp_do_flush(). If I'm correct, > > > > > I'm not really sure where it would be safe to call consume_skb(). > > > > > > > > So the idea is that veth_xdp_get() does a > > > > get_page(virt_to_page(xdp->data)), where xdp->data in this case points > > > > to skb->head. 
This should keep the data page alive even if the skb > > > > surrounding it is freed by the call to consume_skb(). > > > > > > > > However, because the skb->head in this case was allocated from a slab > > > > allocator, taking a page refcount is not enough to prevent it from being > > > > freed. > > > > > > Not sure why skb->head is kmallocked here. > > > skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should ensure that > > > skb head is a page fragment. > > > > I have a few more details here. We have some machines running 5.15 > > kernels and some are running 6.1 kernels. So far it appears this only > > happens on 6.1. We also have a couple of different network cards but > > it appears that only the machines with Solarflare cards using the sfc > > driver hit the KASAN BUG. > > > > 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework > > veth_xdp_rcv_skb in order to accept non-linear skb" reworked and added > > the veth_convert_skb_to_xdp_buff() call you mentioned. Importantly it > > also added a call to pskb_expand_head() which will kmalloc() the > > skb->head. This looks like a path that could be causing the KASAN > > BUG, but I have not yet confirmed that is the path we are hitting. > > This change was also added in 5.18 so might explain why we don't see > > it on 5.15. > > I think you made a valid point. It looks like pskb_expand_head() is incorrect here. > At least I did not expect kmallocked skb->data at this point when I introduced > veth XDP. > > Also I looked into the discussion on the suspected patch, > https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=&state=*&q=rework+veth_xdp_rcv_skb&archive=both&delegate= > But there was no discussion on pskb_expand_head() nor kmallocked data. > > Lorenzo, what do you think? I have confirmed this is the cause of the bug we are hitting and I've also tested a change that simply does what the old code did and follows the same code to allocate a new skb if the headroom is less than XDP_PACKET_HEADROOM. That fixes the issue but maybe there is a better solution. I'll send the patch I've tested in a reply. -- Shawn ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] veth: Fix use after free in XDP_REDIRECT 2023-03-14 15:25 ` Shawn Bohrer @ 2023-03-14 15:33 ` Shawn Bohrer 2023-03-14 17:13 ` Lorenzo Bianconi ` (3 more replies) 0 siblings, 4 replies; 13+ messages in thread From: Shawn Bohrer @ 2023-03-14 15:33 UTC (permalink / raw) To: toshiaki.makita1 Cc: toke, makita.toshiaki, netdev, bpf, kernel-team, lorenzo, Shawn Bohrer 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb" introduced a bug where it tried to use pskb_expand_head() if the headroom was less than XDP_PACKET_HEADROOM. This however uses kmalloc to expand the head, which will later allow consume_skb() to free the skb while is it still in use by AF_XDP. Previously if the headroom was less than XDP_PACKET_HEADROOM we continued on to allocate a new skb from pages so this restores that behavior. BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_report+0x170/0x473 ? __xsk_rcv+0x18d/0x2c0 kasan_report+0xad/0x130 ? __xsk_rcv+0x18d/0x2c0 kasan_check_range+0x149/0x1a0 memcpy+0x20/0x60 __xsk_rcv+0x18d/0x2c0 __xsk_map_redirect+0x1f3/0x490 ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] xdp_do_redirect+0x5ca/0xd60 veth_xdp_rcv_skb+0x935/0x1ba0 [veth] ? __netif_receive_skb_list_core+0x671/0x920 ? veth_xdp+0x670/0x670 [veth] veth_xdp_rcv+0x304/0xa20 [veth] ? do_xdp_generic+0x150/0x150 ? veth_xdp_rcv_one+0xde0/0xde0 [veth] ? _raw_spin_lock_bh+0xe0/0xe0 ? newidle_balance+0x887/0xe30 ? __perf_event_task_sched_in+0xdb/0x800 veth_poll+0x139/0x571 [veth] ? veth_xdp_rcv+0xa20/0xa20 [veth] ? _raw_spin_unlock+0x39/0x70 ? finish_task_switch.isra.0+0x17e/0x7d0 ? __switch_to+0x5cf/0x1070 ? __schedule+0x95b/0x2640 ? io_schedule_timeout+0x160/0x160 __napi_poll+0xa1/0x440 napi_threaded_poll+0x3d1/0x460 ? __napi_poll+0x440/0x440 ? __kthread_parkme+0xc6/0x1f0 ? __napi_poll+0x440/0x440 kthread+0x2a2/0x340 ? 
kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Freed by task 148640: kasan_save_stack+0x23/0x50 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x169/0x1d0 slab_free_freelist_hook+0xd2/0x190 __kmem_cache_free+0x1a1/0x2f0 skb_release_data+0x449/0x600 consume_skb+0x9f/0x1c0 veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] veth_xdp_rcv+0x304/0xa20 [veth] veth_poll+0x139/0x571 [veth] __napi_poll+0xa1/0x440 napi_threaded_poll+0x3d1/0x460 kthread+0x2a2/0x340 ret_from_fork+0x22/0x30 The buggy address belongs to the object at ffff888976250000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 340 bytes inside of 2048-byte region [ffff888976250000, ffff888976250800) The buggy address belongs to the physical page: page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com> --- drivers/net/veth.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 1bb54de7124d..6b10aa3883c5 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -708,7 +708,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, u32 frame_sz; if (skb_shared(skb) || skb_head_is_locked(skb) || - skb_shinfo(skb)->nr_frags) { + skb_shinfo(skb)->nr_frags || skb_headroom(skb) < XDP_PACKET_HEADROOM) { u32 size, len, max_head_size, off; struct sk_buff *nskb; struct page *page; @@ -773,9 +773,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, consume_skb(skb); skb = nskb; - } else if (skb_headroom(skb) < XDP_PACKET_HEADROOM && - pskb_expand_head(skb, VETH_XDP_HEADROOM, 0, GFP_ATOMIC)) { - goto drop; } /* SKB "head" area always have tailroom for skb_shared_info */ -- 2.34.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] veth: Fix use after free in XDP_REDIRECT 2023-03-14 15:33 ` [PATCH] veth: Fix " Shawn Bohrer @ 2023-03-14 17:13 ` Lorenzo Bianconi 2023-03-15 6:19 ` Toshiaki Makita ` (2 subsequent siblings) 3 siblings, 0 replies; 13+ messages in thread From: Lorenzo Bianconi @ 2023-03-14 17:13 UTC (permalink / raw) To: Shawn Bohrer Cc: toshiaki.makita1, toke, makita.toshiaki, netdev, bpf, kernel-team [-- Attachment #1: Type: text/plain, Size: 5076 bytes --] > 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb > in order to accept non-linear skb" introduced a bug where it tried to > use pskb_expand_head() if the headroom was less than > XDP_PACKET_HEADROOM. This however uses kmalloc to expand the head, > which will later allow consume_skb() to free the skb while is it still > in use by AF_XDP. > > Previously if the headroom was less than XDP_PACKET_HEADROOM we > continued on to allocate a new skb from pages so this restores that > behavior. > > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 > > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 > Call Trace: > <TASK> > dump_stack_lvl+0x34/0x48 > print_report+0x170/0x473 > ? __xsk_rcv+0x18d/0x2c0 > kasan_report+0xad/0x130 > ? __xsk_rcv+0x18d/0x2c0 > kasan_check_range+0x149/0x1a0 > memcpy+0x20/0x60 > __xsk_rcv+0x18d/0x2c0 > __xsk_map_redirect+0x1f3/0x490 > ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > xdp_do_redirect+0x5ca/0xd60 > veth_xdp_rcv_skb+0x935/0x1ba0 [veth] > ? __netif_receive_skb_list_core+0x671/0x920 > ? veth_xdp+0x670/0x670 [veth] > veth_xdp_rcv+0x304/0xa20 [veth] > ? do_xdp_generic+0x150/0x150 > ? veth_xdp_rcv_one+0xde0/0xde0 [veth] > ? _raw_spin_lock_bh+0xe0/0xe0 > ? newidle_balance+0x887/0xe30 > ? __perf_event_task_sched_in+0xdb/0x800 > veth_poll+0x139/0x571 [veth] > ? veth_xdp_rcv+0xa20/0xa20 [veth] > ? _raw_spin_unlock+0x39/0x70 > ? finish_task_switch.isra.0+0x17e/0x7d0 > ? __switch_to+0x5cf/0x1070 > ? __schedule+0x95b/0x2640 > ? io_schedule_timeout+0x160/0x160 > __napi_poll+0xa1/0x440 > napi_threaded_poll+0x3d1/0x460 > ? __napi_poll+0x440/0x440 > ? __kthread_parkme+0xc6/0x1f0 > ? __napi_poll+0x440/0x440 > kthread+0x2a2/0x340 > ? 
kthread_complete_and_exit+0x20/0x20 > ret_from_fork+0x22/0x30 > </TASK> > > Freed by task 148640: > kasan_save_stack+0x23/0x50 > kasan_set_track+0x21/0x30 > kasan_save_free_info+0x2a/0x40 > ____kasan_slab_free+0x169/0x1d0 > slab_free_freelist_hook+0xd2/0x190 > __kmem_cache_free+0x1a1/0x2f0 > skb_release_data+0x449/0x600 > consume_skb+0x9f/0x1c0 > veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > veth_xdp_rcv+0x304/0xa20 [veth] > veth_poll+0x139/0x571 [veth] > __napi_poll+0xa1/0x440 > napi_threaded_poll+0x3d1/0x460 > kthread+0x2a2/0x340 > ret_from_fork+0x22/0x30 > > The buggy address belongs to the object at ffff888976250000 > which belongs to the cache kmalloc-2k of size 2048 > The buggy address is located 340 bytes inside of > 2048-byte region [ffff888976250000, ffff888976250800) > > The buggy address belongs to the physical page: > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") > Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com> Thx for fixing it. Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> > --- > drivers/net/veth.c | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index 1bb54de7124d..6b10aa3883c5 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -708,7 +708,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, > u32 frame_sz; > > if (skb_shared(skb) || skb_head_is_locked(skb) || > - skb_shinfo(skb)->nr_frags) { > + skb_shinfo(skb)->nr_frags || skb_headroom(skb) < XDP_PACKET_HEADROOM) { > u32 size, len, max_head_size, off; > struct sk_buff *nskb; > struct page *page; > @@ -773,9 +773,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, > > consume_skb(skb); > skb = nskb; > - } else if (skb_headroom(skb) < XDP_PACKET_HEADROOM && > - pskb_expand_head(skb, VETH_XDP_HEADROOM, 0, GFP_ATOMIC)) { > - goto drop; > } > > /* SKB "head" area always have tailroom for skb_shared_info */ > -- > 2.34.1 > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] veth: Fix use after free in XDP_REDIRECT 2023-03-14 15:33 ` [PATCH] veth: Fix " Shawn Bohrer 2023-03-14 17:13 ` Lorenzo Bianconi @ 2023-03-15 6:19 ` Toshiaki Makita 2023-03-15 18:08 ` Toke Høiland-Jørgensen 2023-03-16 4:20 ` patchwork-bot+netdevbpf 3 siblings, 0 replies; 13+ messages in thread From: Toshiaki Makita @ 2023-03-15 6:19 UTC (permalink / raw) To: Shawn Bohrer; +Cc: toke, makita.toshiaki, netdev, bpf, kernel-team, lorenzo On 2023/03/15 0:33, Shawn Bohrer wrote: > 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb > in order to accept non-linear skb" introduced a bug where it tried to > use pskb_expand_head() if the headroom was less than > XDP_PACKET_HEADROOM. This however uses kmalloc to expand the head, > which will later allow consume_skb() to free the skb while is it still > in use by AF_XDP. > > Previously if the headroom was less than XDP_PACKET_HEADROOM we > continued on to allocate a new skb from pages so this restores that > behavior. > > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 > > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 > Call Trace: > <TASK> > dump_stack_lvl+0x34/0x48 > print_report+0x170/0x473 > ? __xsk_rcv+0x18d/0x2c0 > kasan_report+0xad/0x130 > ? __xsk_rcv+0x18d/0x2c0 > kasan_check_range+0x149/0x1a0 > memcpy+0x20/0x60 > __xsk_rcv+0x18d/0x2c0 > __xsk_map_redirect+0x1f3/0x490 > ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > xdp_do_redirect+0x5ca/0xd60 > veth_xdp_rcv_skb+0x935/0x1ba0 [veth] > ? __netif_receive_skb_list_core+0x671/0x920 > ? veth_xdp+0x670/0x670 [veth] > veth_xdp_rcv+0x304/0xa20 [veth] > ? do_xdp_generic+0x150/0x150 > ? veth_xdp_rcv_one+0xde0/0xde0 [veth] > ? _raw_spin_lock_bh+0xe0/0xe0 > ? newidle_balance+0x887/0xe30 > ? __perf_event_task_sched_in+0xdb/0x800 > veth_poll+0x139/0x571 [veth] > ? veth_xdp_rcv+0xa20/0xa20 [veth] > ? _raw_spin_unlock+0x39/0x70 > ? finish_task_switch.isra.0+0x17e/0x7d0 > ? __switch_to+0x5cf/0x1070 > ? __schedule+0x95b/0x2640 > ? io_schedule_timeout+0x160/0x160 > __napi_poll+0xa1/0x440 > napi_threaded_poll+0x3d1/0x460 > ? __napi_poll+0x440/0x440 > ? __kthread_parkme+0xc6/0x1f0 > ? __napi_poll+0x440/0x440 > kthread+0x2a2/0x340 > ? 
kthread_complete_and_exit+0x20/0x20 > ret_from_fork+0x22/0x30 > </TASK> > > Freed by task 148640: > kasan_save_stack+0x23/0x50 > kasan_set_track+0x21/0x30 > kasan_save_free_info+0x2a/0x40 > ____kasan_slab_free+0x169/0x1d0 > slab_free_freelist_hook+0xd2/0x190 > __kmem_cache_free+0x1a1/0x2f0 > skb_release_data+0x449/0x600 > consume_skb+0x9f/0x1c0 > veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > veth_xdp_rcv+0x304/0xa20 [veth] > veth_poll+0x139/0x571 [veth] > __napi_poll+0xa1/0x440 > napi_threaded_poll+0x3d1/0x460 > kthread+0x2a2/0x340 > ret_from_fork+0x22/0x30 > > The buggy address belongs to the object at ffff888976250000 > which belongs to the cache kmalloc-2k of size 2048 > The buggy address is located 340 bytes inside of > 2048-byte region [ffff888976250000, ffff888976250800) > > The buggy address belongs to the physical page: > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") > Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] veth: Fix use after free in XDP_REDIRECT 2023-03-14 15:33 ` [PATCH] veth: Fix " Shawn Bohrer 2023-03-14 17:13 ` Lorenzo Bianconi 2023-03-15 6:19 ` Toshiaki Makita @ 2023-03-15 18:08 ` Toke Høiland-Jørgensen 2023-03-16 4:20 ` patchwork-bot+netdevbpf 3 siblings, 0 replies; 13+ messages in thread From: Toke Høiland-Jørgensen @ 2023-03-15 18:08 UTC (permalink / raw) To: Shawn Bohrer, toshiaki.makita1 Cc: makita.toshiaki, netdev, bpf, kernel-team, lorenzo, Shawn Bohrer Shawn Bohrer <sbohrer@cloudflare.com> writes: > 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb > in order to accept non-linear skb" introduced a bug where it tried to > use pskb_expand_head() if the headroom was less than > XDP_PACKET_HEADROOM. This however uses kmalloc to expand the head, > which will later allow consume_skb() to free the skb while is it still > in use by AF_XDP. > > Previously if the headroom was less than XDP_PACKET_HEADROOM we > continued on to allocate a new skb from pages so this restores that > behavior. > > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 > > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 > Call Trace: > <TASK> > dump_stack_lvl+0x34/0x48 > print_report+0x170/0x473 > ? __xsk_rcv+0x18d/0x2c0 > kasan_report+0xad/0x130 > ? __xsk_rcv+0x18d/0x2c0 > kasan_check_range+0x149/0x1a0 > memcpy+0x20/0x60 > __xsk_rcv+0x18d/0x2c0 > __xsk_map_redirect+0x1f3/0x490 > ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > xdp_do_redirect+0x5ca/0xd60 > veth_xdp_rcv_skb+0x935/0x1ba0 [veth] > ? __netif_receive_skb_list_core+0x671/0x920 > ? veth_xdp+0x670/0x670 [veth] > veth_xdp_rcv+0x304/0xa20 [veth] > ? do_xdp_generic+0x150/0x150 > ? veth_xdp_rcv_one+0xde0/0xde0 [veth] > ? _raw_spin_lock_bh+0xe0/0xe0 > ? newidle_balance+0x887/0xe30 > ? __perf_event_task_sched_in+0xdb/0x800 > veth_poll+0x139/0x571 [veth] > ? veth_xdp_rcv+0xa20/0xa20 [veth] > ? _raw_spin_unlock+0x39/0x70 > ? finish_task_switch.isra.0+0x17e/0x7d0 > ? __switch_to+0x5cf/0x1070 > ? __schedule+0x95b/0x2640 > ? io_schedule_timeout+0x160/0x160 > __napi_poll+0xa1/0x440 > napi_threaded_poll+0x3d1/0x460 > ? __napi_poll+0x440/0x440 > ? __kthread_parkme+0xc6/0x1f0 > ? __napi_poll+0x440/0x440 > kthread+0x2a2/0x340 > ? 
kthread_complete_and_exit+0x20/0x20 > ret_from_fork+0x22/0x30 > </TASK> > > Freed by task 148640: > kasan_save_stack+0x23/0x50 > kasan_set_track+0x21/0x30 > kasan_save_free_info+0x2a/0x40 > ____kasan_slab_free+0x169/0x1d0 > slab_free_freelist_hook+0xd2/0x190 > __kmem_cache_free+0x1a1/0x2f0 > skb_release_data+0x449/0x600 > consume_skb+0x9f/0x1c0 > veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] > veth_xdp_rcv+0x304/0xa20 [veth] > veth_poll+0x139/0x571 [veth] > __napi_poll+0xa1/0x440 > napi_threaded_poll+0x3d1/0x460 > kthread+0x2a2/0x340 > ret_from_fork+0x22/0x30 > > The buggy address belongs to the object at ffff888976250000 > which belongs to the cache kmalloc-2k of size 2048 > The buggy address is located 340 bytes inside of > 2048-byte region [ffff888976250000, ffff888976250800) > > The buggy address belongs to the physical page: > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") > Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] veth: Fix use after free in XDP_REDIRECT
2023-03-14 15:33 ` [PATCH] veth: Fix " Shawn Bohrer
` (2 preceding siblings ...)
2023-03-15 18:08 ` Toke Høiland-Jørgensen
@ 2023-03-16 4:20 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-03-16 4:20 UTC (permalink / raw)
To: Shawn Bohrer
Cc: toshiaki.makita1, toke, makita.toshiaki, netdev, bpf, kernel-team, lorenzo

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 14 Mar 2023 10:33:51 -0500 you wrote:
> 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb
> in order to accept non-linear skb" introduced a bug where it tried to
> use pskb_expand_head() if the headroom was less than
> XDP_PACKET_HEADROOM. This however uses kmalloc to expand the head,
> which will later allow consume_skb() to free the skb while it is still
> in use by AF_XDP.
>
> [...]

Here is the summary with links:
  - veth: Fix use after free in XDP_REDIRECT
    https://git.kernel.org/netdev/net/c/7c10131803e4

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: KASAN veth use after free in XDP_REDIRECT
2023-01-25 1:54 ` Toke Høiland-Jørgensen
2023-01-25 2:02 ` Jakub Kicinski
2023-01-25 2:07 ` Toshiaki Makita
@ 2023-01-25 16:40 ` Shawn Bohrer
2 siblings, 0 replies; 13+ messages in thread
From: Shawn Bohrer @ 2023-01-25 16:40 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: netdev, bpf, makita.toshiaki, kernel-team

On Wed, Jan 25, 2023 at 02:54:52AM +0100, Toke Høiland-Jørgensen wrote:
> Shawn Bohrer <sbohrer@cloudflare.com> writes:
>
> > Hello,
> >
> > We've seen the following KASAN report on our systems. When using
> > AF_XDP on a veth.
> >
> > KASAN report:
> >
> > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
> > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
> >
> > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1
> > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x34/0x48
> > print_report+0x170/0x473
> > ? __xsk_rcv+0x18d/0x2c0
> > kasan_report+0xad/0x130
> > ? __xsk_rcv+0x18d/0x2c0
> > kasan_check_range+0x149/0x1a0
> > memcpy+0x20/0x60
> > __xsk_rcv+0x18d/0x2c0
> > __xsk_map_redirect+0x1f3/0x490
> > ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> > xdp_do_redirect+0x5ca/0xd60
> > veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
> > ? __netif_receive_skb_list_core+0x671/0x920
> > ? veth_xdp+0x670/0x670 [veth]
> > veth_xdp_rcv+0x304/0xa20 [veth]
> > ? do_xdp_generic+0x150/0x150
> > ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
> > ? _raw_spin_lock_bh+0xe0/0xe0
> > ? newidle_balance+0x887/0xe30
> > ? __perf_event_task_sched_in+0xdb/0x800
> > veth_poll+0x139/0x571 [veth]
> > ? veth_xdp_rcv+0xa20/0xa20 [veth]
> > ? _raw_spin_unlock+0x39/0x70
> > ? finish_task_switch.isra.0+0x17e/0x7d0
> > ? __switch_to+0x5cf/0x1070
> > ? __schedule+0x95b/0x2640
> > ? io_schedule_timeout+0x160/0x160
> > __napi_poll+0xa1/0x440
> > napi_threaded_poll+0x3d1/0x460
> > ? __napi_poll+0x440/0x440
> > ? __kthread_parkme+0xc6/0x1f0
> > ? __napi_poll+0x440/0x440
> > kthread+0x2a2/0x340
> > ? kthread_complete_and_exit+0x20/0x20
> > ret_from_fork+0x22/0x30
> > </TASK>
> >
> > Freed by task 148640:
> > kasan_save_stack+0x23/0x50
> > kasan_set_track+0x21/0x30
> > kasan_save_free_info+0x2a/0x40
> > ____kasan_slab_free+0x169/0x1d0
> > slab_free_freelist_hook+0xd2/0x190
> > __kmem_cache_free+0x1a1/0x2f0
> > skb_release_data+0x449/0x600
> > consume_skb+0x9f/0x1c0
> > veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> > veth_xdp_rcv+0x304/0xa20 [veth]
> > veth_poll+0x139/0x571 [veth]
> > __napi_poll+0xa1/0x440
> > napi_threaded_poll+0x3d1/0x460
> > kthread+0x2a2/0x340
> > ret_from_fork+0x22/0x30
> >
> > The buggy address belongs to the object at ffff888976250000
> > which belongs to the cache kmalloc-2k of size 2048
> > The buggy address is located 340 bytes inside of
> > 2048-byte region [ffff888976250000, ffff888976250800)
> >
> > The buggy address belongs to the physical page:
> > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
> > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
> > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
> > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
> > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
> > page dumped because: kasan: bad access detected
> >
> > Memory state around the buggy address:
> > ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ^
> > ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >
> > If I understand the code correctly it looks like a xdp_buf is
> > constructed pointing to the memory backed by a skb but consume_skb()
> > is called while the xdp_buf() is still in use.
> >
> > ```
> > case XDP_REDIRECT:
> > veth_xdp_get(&xdp);
> > consume_skb(skb);
> > xdp.rxq->mem = rq->xdp_mem;
> > if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
> > stats->rx_drops++;
> > goto err_xdp;
> > }
> > stats->xdp_redirect++;
> > rcu_read_unlock();
> > goto xdp_xmit;
> > ```
> >
> > It is worth noting that I think XDP_TX has the exact same problem.
> >
> > Again assuming I understand the problem one naive solution might be to
> > move the consume_skb() call after xdp_do_redirect(). I think this
> > might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
> > BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdp_buf to
> > new memory. The copy happens for XSKMAP in __xsk_rcv() and for the
> > DEVMAP cases happens in dev_map_enqueue_clone().
> >
> > However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
> > live much longer, possibly even after xdp_do_flush(). If I'm correct,
> > I'm not really sure where it would be safe to call consume_skb().
>
> So the idea is that veth_xdp_get() does a
> get_page(virt_to_page(xdp->data)), where xdp->data in this case points
> to skb->head. This should keep the data page alive even if the skb
> surrounding it is freed by the call to consume_skb().
>
> However, because the skb->head in this case was allocated from a slab
> allocator, taking a page refcount is not enough to prevent it from being
> freed.
>
> I'm not sure how best to fix this. I guess we could try to detect this
> case in the veth driver and copy the data, like we do for skb_share()
> etc. However, I'm not sure how to actually detect this case...
>
> Where are the skbs being processed coming from? I.e., what path
> allocated them?

I can't say for sure where this skb came from. Normally the vast
majority of our packets come from an external physical NIC and are then
routed into a network namespace through a veth pair. In this case this
machine had a Solarflare card with the sfc driver. However, we only run
the KASAN kernel in our testing colo, and it normally sees very little
of this traffic unless someone is explicitly running tests, and I have
no idea what was happening when this bug triggered.

My only other guesses are that maybe this skb came from a tun device or
from the loopback device. Some of our traffic arrives encapsulated, and
we decapsulate and forward. This could then enter the namespace through
the veth.

If it is actually important to know _where_ the skb was allocated we
might be able to come up with some more ideas.

Thanks,
Shawn

^ permalink raw reply	[flat|nested] 13+ messages in thread
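Toke's explanation quoted above turns on the difference between a page-backed
and a slab-backed skb head: get_page(virt_to_page(xdp->data)) only pins memory
owned by the page allocator, so a kmalloc'ed head can still be returned to the
slab by skb_release_data() regardless of the page refcount. The fragment below
is only an illustration of that distinction, not the code veth uses; it leans
on the skb->head_frag bit, which the kernel sets when the head was carved out
of a page fragment rather than kmalloc'ed.

```c
#include <linux/skbuff.h>
#include <linux/mm.h>

/* Illustration only: a page refcount protects a page-backed head but not a
 * slab-backed one. skb->head_frag is set when the head came from a page
 * fragment; when it is clear, the head was kmalloc'ed and get_page() on
 * virt_to_page(skb->head) does not stop skb_release_data() from handing
 * that memory back to the slab allocator.
 */
static bool head_safe_to_pin_sketch(const struct sk_buff *skb)
{
	if (!skb->head_frag)
		return false;	/* slab-backed head: must copy before XDP */

	/* Page-backed head: the get_page() taken by veth_xdp_get() keeps
	 * the data alive even after consume_skb() frees the skb itself.
	 */
	return true;
}
```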
end of thread, other threads: [~2023-03-16 4:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-24 22:45 KASAN veth use after free in XDP_REDIRECT Shawn Bohrer
2023-01-25 1:54 ` Toke Høiland-Jørgensen
2023-01-25 2:02 ` Jakub Kicinski
2023-01-25 2:07 ` Toshiaki Makita
2023-03-08 22:33 ` Shawn Bohrer
2023-03-14 8:02 ` Toshiaki Makita
2023-03-14 15:25 ` Shawn Bohrer
2023-03-14 15:33 ` [PATCH] veth: Fix " Shawn Bohrer
2023-03-14 17:13 ` Lorenzo Bianconi
2023-03-15 6:19 ` Toshiaki Makita
2023-03-15 18:08 ` Toke Høiland-Jørgensen
2023-03-16 4:20 ` patchwork-bot+netdevbpf
2023-01-25 16:40 ` KASAN veth " Shawn Bohrer