KASAN veth use after free in XDP

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* KASAN veth use after free in XDP_REDIRECT
@ 2023-01-24 22:45 Shawn Bohrer
  2023-01-25  1:54 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Shawn Bohrer @ 2023-01-24 22:45 UTC (permalink / raw)
  To: netdev, bpf; +Cc: makita.toshiaki

Hello,

We've seen the following KASAN report on our systems. When using
AF_XDP on a veth.

KASAN report:

BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640

CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x48
 print_report+0x170/0x473
 ? __xsk_rcv+0x18d/0x2c0
 kasan_report+0xad/0x130
 ? __xsk_rcv+0x18d/0x2c0
 kasan_check_range+0x149/0x1a0
 memcpy+0x20/0x60
 __xsk_rcv+0x18d/0x2c0
 __xsk_map_redirect+0x1f3/0x490
 ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
 xdp_do_redirect+0x5ca/0xd60
 veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
 ? __netif_receive_skb_list_core+0x671/0x920
 ? veth_xdp+0x670/0x670 [veth]
 veth_xdp_rcv+0x304/0xa20 [veth]
 ? do_xdp_generic+0x150/0x150
 ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
 ? _raw_spin_lock_bh+0xe0/0xe0
 ? newidle_balance+0x887/0xe30
 ? __perf_event_task_sched_in+0xdb/0x800
 veth_poll+0x139/0x571 [veth]
 ? veth_xdp_rcv+0xa20/0xa20 [veth]
 ? _raw_spin_unlock+0x39/0x70
 ? finish_task_switch.isra.0+0x17e/0x7d0
 ? __switch_to+0x5cf/0x1070
 ? __schedule+0x95b/0x2640
 ? io_schedule_timeout+0x160/0x160
 __napi_poll+0xa1/0x440
 napi_threaded_poll+0x3d1/0x460
 ? __napi_poll+0x440/0x440
 ? __kthread_parkme+0xc6/0x1f0
 ? __napi_poll+0x440/0x440
 kthread+0x2a2/0x340
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
 </TASK>

Freed by task 148640:
 kasan_save_stack+0x23/0x50
 kasan_set_track+0x21/0x30
 kasan_save_free_info+0x2a/0x40
 ____kasan_slab_free+0x169/0x1d0
 slab_free_freelist_hook+0xd2/0x190
 __kmem_cache_free+0x1a1/0x2f0
 skb_release_data+0x449/0x600
 consume_skb+0x9f/0x1c0
 veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
 veth_xdp_rcv+0x304/0xa20 [veth]
 veth_poll+0x139/0x571 [veth]
 __napi_poll+0xa1/0x440
 napi_threaded_poll+0x3d1/0x460
 kthread+0x2a2/0x340
 ret_from_fork+0x22/0x30


The buggy address belongs to the object at ffff888976250000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 340 bytes inside of
 2048-byte region [ffff888976250000, ffff888976250800)

The buggy address belongs to the physical page:
page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                 ^
 ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb


If I understand the code correctly it looks like a xdp_buf is
constructed pointing to the memory backed by a skb but consume_skb()
is called while the xdp_buf() is still in use.

```
	case XDP_REDIRECT:
		veth_xdp_get(&xdp);
		consume_skb(skb);
		xdp.rxq->mem = rq->xdp_mem;
		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
			stats->rx_drops++;
			goto err_xdp;
		}
		stats->xdp_redirect++;
		rcu_read_unlock();
		goto xdp_xmit;
```

It is worth noting that I think XDP_TX has the exact same problem.

Again assuming I understand the problem one naive solution might be to
move the consum_skb() call after xdp_do_redirect().  I think this
might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to
new memory.  The copy happens for XSKMAP in __xsk_rcv() and for the
DEVMAP cases happens in dev_map_enqueue_clone().

However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
live much longer, possibly even after xdp_do_flush().  If I'm correct,
I'm not really sure where it would be safe to call consume_skb().

The XDP_TX case looks similarly complex but I have not spent as much
time looking at that one.

Thanks,
Shawn Bohrer

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: KASAN veth use after free in XDP_REDIRECT
  2023-01-24 22:45 KASAN veth use after free in XDP_REDIRECT Shawn Bohrer
@ 2023-01-25  1:54 ` Toke Høiland-Jørgensen
  2023-01-25  2:02   ` Jakub Kicinski
                     ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-01-25  1:54 UTC (permalink / raw)
  To: Shawn Bohrer, netdev, bpf; +Cc: makita.toshiaki

Shawn Bohrer <sbohrer@cloudflare.com> writes:

> Hello,
>
> We've seen the following KASAN report on our systems. When using
> AF_XDP on a veth.
>
> KASAN report:
>
> BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
> Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
>
> CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
> Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x34/0x48
>  print_report+0x170/0x473
>  ? __xsk_rcv+0x18d/0x2c0
>  kasan_report+0xad/0x130
>  ? __xsk_rcv+0x18d/0x2c0
>  kasan_check_range+0x149/0x1a0
>  memcpy+0x20/0x60
>  __xsk_rcv+0x18d/0x2c0
>  __xsk_map_redirect+0x1f3/0x490
>  ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>  xdp_do_redirect+0x5ca/0xd60
>  veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
>  ? __netif_receive_skb_list_core+0x671/0x920
>  ? veth_xdp+0x670/0x670 [veth]
>  veth_xdp_rcv+0x304/0xa20 [veth]
>  ? do_xdp_generic+0x150/0x150
>  ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
>  ? _raw_spin_lock_bh+0xe0/0xe0
>  ? newidle_balance+0x887/0xe30
>  ? __perf_event_task_sched_in+0xdb/0x800
>  veth_poll+0x139/0x571 [veth]
>  ? veth_xdp_rcv+0xa20/0xa20 [veth]
>  ? _raw_spin_unlock+0x39/0x70
>  ? finish_task_switch.isra.0+0x17e/0x7d0
>  ? __switch_to+0x5cf/0x1070
>  ? __schedule+0x95b/0x2640
>  ? io_schedule_timeout+0x160/0x160
>  __napi_poll+0xa1/0x440
>  napi_threaded_poll+0x3d1/0x460
>  ? __napi_poll+0x440/0x440
>  ? __kthread_parkme+0xc6/0x1f0
>  ? __napi_poll+0x440/0x440
>  kthread+0x2a2/0x340
>  ? kthread_complete_and_exit+0x20/0x20
>  ret_from_fork+0x22/0x30
>  </TASK>
>
> Freed by task 148640:
>  kasan_save_stack+0x23/0x50
>  kasan_set_track+0x21/0x30
>  kasan_save_free_info+0x2a/0x40
>  ____kasan_slab_free+0x169/0x1d0
>  slab_free_freelist_hook+0xd2/0x190
>  __kmem_cache_free+0x1a1/0x2f0
>  skb_release_data+0x449/0x600
>  consume_skb+0x9f/0x1c0
>  veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>  veth_xdp_rcv+0x304/0xa20 [veth]
>  veth_poll+0x139/0x571 [veth]
>  __napi_poll+0xa1/0x440
>  napi_threaded_poll+0x3d1/0x460
>  kthread+0x2a2/0x340
>  ret_from_fork+0x22/0x30
>
>
> The buggy address belongs to the object at ffff888976250000
>  which belongs to the cache kmalloc-2k of size 2048
> The buggy address is located 340 bytes inside of
>  2048-byte region [ffff888976250000, ffff888976250800)
>
> The buggy address belongs to the physical page:
> page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
> head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
> flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
> raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
> raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                  ^
>  ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
>
> If I understand the code correctly it looks like a xdp_buf is
> constructed pointing to the memory backed by a skb but consume_skb()
> is called while the xdp_buf() is still in use.
>
> ```
> 	case XDP_REDIRECT:
> 		veth_xdp_get(&xdp);
> 		consume_skb(skb);
> 		xdp.rxq->mem = rq->xdp_mem;
> 		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
> 			stats->rx_drops++;
> 			goto err_xdp;
> 		}
> 		stats->xdp_redirect++;
> 		rcu_read_unlock();
> 		goto xdp_xmit;
> ```
>
> It is worth noting that I think XDP_TX has the exact same problem.
>
> Again assuming I understand the problem one naive solution might be to
> move the consum_skb() call after xdp_do_redirect().  I think this
> might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
> BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to
> new memory.  The copy happens for XSKMAP in __xsk_rcv() and for the
> DEVMAP cases happens in dev_map_enqueue_clone().
>
> However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
> live much longer, possibly even after xdp_do_flush().  If I'm correct,
> I'm not really sure where it would be safe to call consume_skb().

So the idea is that veth_xdp_get() does a
get_page(virt_to_page(xdp->data)), where xdp->data in this case points
to skb->head. This should keep the data page alive even if the skb
surrounding it is freed by the call to consume_skb().

However, because the skb->head in this case was allocated from a slab
allocator, taking a page refcount is not enough to prevent it from being
freed.

I'm not sure how best to fix this. I guess we could try to detect this
case in the veth driver and copy the data, like we do for skb_share()
etc. However, I'm not sure how to actually detect this case...

Where are the skbs being processed coming from? I.e., what path
allocated them?

-Toke

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: KASAN veth use after free in XDP_REDIRECT
  2023-01-25  1:54 ` Toke Høiland-Jørgensen
@ 2023-01-25  2:02   ` Jakub Kicinski
  2023-01-25  2:07   ` Toshiaki Makita
  2023-01-25 16:40   ` KASAN veth " Shawn Bohrer
  2 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2023-01-25  2:02 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Shawn Bohrer, netdev, bpf, makita.toshiaki

On Wed, 25 Jan 2023 02:54:52 +0100 Toke Høiland-Jørgensen wrote:
> However, because the skb->head in this case was allocated from a slab
> allocator, taking a page refcount is not enough to prevent it from being
> freed.
> 
> I'm not sure how best to fix this. I guess we could try to detect this
> case in the veth driver and copy the data, like we do for skb_share()
> etc. However, I'm not sure how to actually detect this case...

sbk->head_frag ?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: KASAN veth use after free in XDP_REDIRECT
  2023-01-25  1:54 ` Toke Høiland-Jørgensen
  2023-01-25  2:02   ` Jakub Kicinski
@ 2023-01-25  2:07   ` Toshiaki Makita
  2023-03-08 22:33     ` Shawn Bohrer
  2023-01-25 16:40   ` KASAN veth " Shawn Bohrer
  2 siblings, 1 reply; 13+ messages in thread
From: Toshiaki Makita @ 2023-01-25  2:07 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Shawn Bohrer
  Cc: makita.toshiaki, netdev, bpf

On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote:
> Shawn Bohrer <sbohrer@cloudflare.com> writes:
> 
>> Hello,
>>
>> We've seen the following KASAN report on our systems. When using
>> AF_XDP on a veth.
>>
>> KASAN report:
>>
>> BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
>> Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
>>
>> CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
>> Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
>> Call Trace:
>>   <TASK>
>>   dump_stack_lvl+0x34/0x48
>>   print_report+0x170/0x473
>>   ? __xsk_rcv+0x18d/0x2c0
>>   kasan_report+0xad/0x130
>>   ? __xsk_rcv+0x18d/0x2c0
>>   kasan_check_range+0x149/0x1a0
>>   memcpy+0x20/0x60
>>   __xsk_rcv+0x18d/0x2c0
>>   __xsk_map_redirect+0x1f3/0x490
>>   ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>>   xdp_do_redirect+0x5ca/0xd60
>>   veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
>>   ? __netif_receive_skb_list_core+0x671/0x920
>>   ? veth_xdp+0x670/0x670 [veth]
>>   veth_xdp_rcv+0x304/0xa20 [veth]
>>   ? do_xdp_generic+0x150/0x150
>>   ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
>>   ? _raw_spin_lock_bh+0xe0/0xe0
>>   ? newidle_balance+0x887/0xe30
>>   ? __perf_event_task_sched_in+0xdb/0x800
>>   veth_poll+0x139/0x571 [veth]
>>   ? veth_xdp_rcv+0xa20/0xa20 [veth]
>>   ? _raw_spin_unlock+0x39/0x70
>>   ? finish_task_switch.isra.0+0x17e/0x7d0
>>   ? __switch_to+0x5cf/0x1070
>>   ? __schedule+0x95b/0x2640
>>   ? io_schedule_timeout+0x160/0x160
>>   __napi_poll+0xa1/0x440
>>   napi_threaded_poll+0x3d1/0x460
>>   ? __napi_poll+0x440/0x440
>>   ? __kthread_parkme+0xc6/0x1f0
>>   ? __napi_poll+0x440/0x440
>>   kthread+0x2a2/0x340
>>   ? kthread_complete_and_exit+0x20/0x20
>>   ret_from_fork+0x22/0x30
>>   </TASK>
>>
>> Freed by task 148640:
>>   kasan_save_stack+0x23/0x50
>>   kasan_set_track+0x21/0x30
>>   kasan_save_free_info+0x2a/0x40
>>   ____kasan_slab_free+0x169/0x1d0
>>   slab_free_freelist_hook+0xd2/0x190
>>   __kmem_cache_free+0x1a1/0x2f0
>>   skb_release_data+0x449/0x600
>>   consume_skb+0x9f/0x1c0
>>   veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>>   veth_xdp_rcv+0x304/0xa20 [veth]
>>   veth_poll+0x139/0x571 [veth]
>>   __napi_poll+0xa1/0x440
>>   napi_threaded_poll+0x3d1/0x460
>>   kthread+0x2a2/0x340
>>   ret_from_fork+0x22/0x30
>>
>>
>> The buggy address belongs to the object at ffff888976250000
>>   which belongs to the cache kmalloc-2k of size 2048
>> The buggy address is located 340 bytes inside of
>>   2048-byte region [ffff888976250000, ffff888976250800)
>>
>> The buggy address belongs to the physical page:
>> page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
>> head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
>> flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
>> raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
>> raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>>   ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                                                   ^
>>   ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>
>>
>> If I understand the code correctly it looks like a xdp_buf is
>> constructed pointing to the memory backed by a skb but consume_skb()
>> is called while the xdp_buf() is still in use.
>>
>> ```
>> 	case XDP_REDIRECT:
>> 		veth_xdp_get(&xdp);
>> 		consume_skb(skb);
>> 		xdp.rxq->mem = rq->xdp_mem;
>> 		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
>> 			stats->rx_drops++;
>> 			goto err_xdp;
>> 		}
>> 		stats->xdp_redirect++;
>> 		rcu_read_unlock();
>> 		goto xdp_xmit;
>> ```
>>
>> It is worth noting that I think XDP_TX has the exact same problem.
>>
>> Again assuming I understand the problem one naive solution might be to
>> move the consum_skb() call after xdp_do_redirect().  I think this
>> might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
>> BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to
>> new memory.  The copy happens for XSKMAP in __xsk_rcv() and for the
>> DEVMAP cases happens in dev_map_enqueue_clone().
>>
>> However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
>> live much longer, possibly even after xdp_do_flush().  If I'm correct,
>> I'm not really sure where it would be safe to call consume_skb().
> 
> So the idea is that veth_xdp_get() does a
> get_page(virt_to_page(xdp->data)), where xdp->data in this case points
> to skb->head. This should keep the data page alive even if the skb
> surrounding it is freed by the call to consume_skb().
> 
> However, because the skb->head in this case was allocated from a slab
> allocator, taking a page refcount is not enough to prevent it from being
> freed.

Not sure why skb->head is kmallocked here.
skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should ensure that
skb head is a page fragment.

Toshiaki Makita

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: KASAN veth use after free in XDP_REDIRECT
  2023-01-25  2:07   ` Toshiaki Makita
@ 2023-03-08 22:33     ` Shawn Bohrer
  2023-03-14  8:02       ` Toshiaki Makita
  0 siblings, 1 reply; 13+ messages in thread
From: Shawn Bohrer @ 2023-03-08 22:33 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Toke Høiland-Jørgensen, makita.toshiaki, netdev, bpf,
	lorenzo, kernel-team

On Wed, Jan 25, 2023 at 11:07:32AM +0900, Toshiaki Makita wrote:
> On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote:
> > Shawn Bohrer <sbohrer@cloudflare.com> writes:
> > 
> > > Hello,
> > > 
> > > We've seen the following KASAN report on our systems. When using
> > > AF_XDP on a veth.
> > > 
> > > KASAN report:
> > > 
> > > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
> > > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
> > > 
> > > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
> > > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> > > Call Trace:
> > >   <TASK>
> > >   dump_stack_lvl+0x34/0x48
> > >   print_report+0x170/0x473
> > >   ? __xsk_rcv+0x18d/0x2c0
> > >   kasan_report+0xad/0x130
> > >   ? __xsk_rcv+0x18d/0x2c0
> > >   kasan_check_range+0x149/0x1a0
> > >   memcpy+0x20/0x60
> > >   __xsk_rcv+0x18d/0x2c0
> > >   __xsk_map_redirect+0x1f3/0x490
> > >   ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> > >   xdp_do_redirect+0x5ca/0xd60
> > >   veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
> > >   ? __netif_receive_skb_list_core+0x671/0x920
> > >   ? veth_xdp+0x670/0x670 [veth]
> > >   veth_xdp_rcv+0x304/0xa20 [veth]
> > >   ? do_xdp_generic+0x150/0x150
> > >   ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
> > >   ? _raw_spin_lock_bh+0xe0/0xe0
> > >   ? newidle_balance+0x887/0xe30
> > >   ? __perf_event_task_sched_in+0xdb/0x800
> > >   veth_poll+0x139/0x571 [veth]
> > >   ? veth_xdp_rcv+0xa20/0xa20 [veth]
> > >   ? _raw_spin_unlock+0x39/0x70
> > >   ? finish_task_switch.isra.0+0x17e/0x7d0
> > >   ? __switch_to+0x5cf/0x1070
> > >   ? __schedule+0x95b/0x2640
> > >   ? io_schedule_timeout+0x160/0x160
> > >   __napi_poll+0xa1/0x440
> > >   napi_threaded_poll+0x3d1/0x460
> > >   ? __napi_poll+0x440/0x440
> > >   ? __kthread_parkme+0xc6/0x1f0
> > >   ? __napi_poll+0x440/0x440
> > >   kthread+0x2a2/0x340
> > >   ? kthread_complete_and_exit+0x20/0x20
> > >   ret_from_fork+0x22/0x30
> > >   </TASK>
> > > 
> > > Freed by task 148640:
> > >   kasan_save_stack+0x23/0x50
> > >   kasan_set_track+0x21/0x30
> > >   kasan_save_free_info+0x2a/0x40
> > >   ____kasan_slab_free+0x169/0x1d0
> > >   slab_free_freelist_hook+0xd2/0x190
> > >   __kmem_cache_free+0x1a1/0x2f0
> > >   skb_release_data+0x449/0x600
> > >   consume_skb+0x9f/0x1c0
> > >   veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> > >   veth_xdp_rcv+0x304/0xa20 [veth]
> > >   veth_poll+0x139/0x571 [veth]
> > >   __napi_poll+0xa1/0x440
> > >   napi_threaded_poll+0x3d1/0x460
> > >   kthread+0x2a2/0x340
> > >   ret_from_fork+0x22/0x30
> > > 
> > > 
> > > The buggy address belongs to the object at ffff888976250000
> > >   which belongs to the cache kmalloc-2k of size 2048
> > > The buggy address is located 340 bytes inside of
> > >   2048-byte region [ffff888976250000, ffff888976250800)
> > > 
> > > The buggy address belongs to the physical page:
> > > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
> > > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
> > > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
> > > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
> > > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
> > > page dumped because: kasan: bad access detected
> > > 
> > > Memory state around the buggy address:
> > >   ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >   ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >                                                   ^
> > >   ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >   ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > 
> > > 
> > > If I understand the code correctly it looks like a xdp_buf is
> > > constructed pointing to the memory backed by a skb but consume_skb()
> > > is called while the xdp_buf() is still in use.
> > > 
> > > ```
> > > 	case XDP_REDIRECT:
> > > 		veth_xdp_get(&xdp);
> > > 		consume_skb(skb);
> > > 		xdp.rxq->mem = rq->xdp_mem;
> > > 		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
> > > 			stats->rx_drops++;
> > > 			goto err_xdp;
> > > 		}
> > > 		stats->xdp_redirect++;
> > > 		rcu_read_unlock();
> > > 		goto xdp_xmit;
> > > ```
> > > 
> > > It is worth noting that I think XDP_TX has the exact same problem.
> > > 
> > > Again assuming I understand the problem one naive solution might be to
> > > move the consum_skb() call after xdp_do_redirect().  I think this
> > > might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
> > > BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to
> > > new memory.  The copy happens for XSKMAP in __xsk_rcv() and for the
> > > DEVMAP cases happens in dev_map_enqueue_clone().
> > > 
> > > However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
> > > live much longer, possibly even after xdp_do_flush().  If I'm correct,
> > > I'm not really sure where it would be safe to call consume_skb().
> > 
> > So the idea is that veth_xdp_get() does a
> > get_page(virt_to_page(xdp->data)), where xdp->data in this case points
> > to skb->head. This should keep the data page alive even if the skb
> > surrounding it is freed by the call to consume_skb().
> > 
> > However, because the skb->head in this case was allocated from a slab
> > allocator, taking a page refcount is not enough to prevent it from being
> > freed.
> 
> Not sure why skb->head is kmallocked here.
> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should ensure that
> skb head is a page fragment.

I have a few more details here.  We have some machines running 5.15
kernels and some are running 6.1 kernels.  So far it appears this only
happens on 6.1.  We also have a couple of different network cards but
it appears that only the machines with Solarflare cards using the sfc
driver hit the KASAN BUG.

718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework
veth_xdp_rcv_skb in order to accept non-linear skb" reworked and added
the veth_convert_skb_to_xdp_buff() call you mentioned.  Importantly it
also added a call to pskb_expand_head() which will kmalloc() the
skb->head.  This looks like a path that could be causing the KASAN
BUG, but I have not yet confirmed that is the path we are hitting.
This change was also added in 5.18 so might explain why we don't see
it on 5.15.

--
Shawn


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: KASAN veth use after free in XDP_REDIRECT
  2023-03-08 22:33     ` Shawn Bohrer
@ 2023-03-14  8:02       ` Toshiaki Makita
  2023-03-14 15:25         ` Shawn Bohrer
  0 siblings, 1 reply; 13+ messages in thread
From: Toshiaki Makita @ 2023-03-14  8:02 UTC (permalink / raw)
  To: Shawn Bohrer, lorenzo
  Cc: Toke Høiland-Jørgensen, makita.toshiaki, netdev, bpf,
	kernel-team

On 2023/03/09 7:33, Shawn Bohrer wrote:
> On Wed, Jan 25, 2023 at 11:07:32AM +0900, Toshiaki Makita wrote:
>> On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote:
>>> Shawn Bohrer <sbohrer@cloudflare.com> writes:
>>>
>>>> Hello,
>>>>
>>>> We've seen the following KASAN report on our systems. When using
>>>> AF_XDP on a veth.
>>>>
>>>> KASAN report:
>>>>
>>>> BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
>>>> Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
>>>>
>>>> CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
>>>> Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
>>>> Call Trace:
>>>>    <TASK>
>>>>    dump_stack_lvl+0x34/0x48
>>>>    print_report+0x170/0x473
>>>>    ? __xsk_rcv+0x18d/0x2c0
>>>>    kasan_report+0xad/0x130
>>>>    ? __xsk_rcv+0x18d/0x2c0
>>>>    kasan_check_range+0x149/0x1a0
>>>>    memcpy+0x20/0x60
>>>>    __xsk_rcv+0x18d/0x2c0
>>>>    __xsk_map_redirect+0x1f3/0x490
>>>>    ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>>>>    xdp_do_redirect+0x5ca/0xd60
>>>>    veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
>>>>    ? __netif_receive_skb_list_core+0x671/0x920
>>>>    ? veth_xdp+0x670/0x670 [veth]
>>>>    veth_xdp_rcv+0x304/0xa20 [veth]
>>>>    ? do_xdp_generic+0x150/0x150
>>>>    ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
>>>>    ? _raw_spin_lock_bh+0xe0/0xe0
>>>>    ? newidle_balance+0x887/0xe30
>>>>    ? __perf_event_task_sched_in+0xdb/0x800
>>>>    veth_poll+0x139/0x571 [veth]
>>>>    ? veth_xdp_rcv+0xa20/0xa20 [veth]
>>>>    ? _raw_spin_unlock+0x39/0x70
>>>>    ? finish_task_switch.isra.0+0x17e/0x7d0
>>>>    ? __switch_to+0x5cf/0x1070
>>>>    ? __schedule+0x95b/0x2640
>>>>    ? io_schedule_timeout+0x160/0x160
>>>>    __napi_poll+0xa1/0x440
>>>>    napi_threaded_poll+0x3d1/0x460
>>>>    ? __napi_poll+0x440/0x440
>>>>    ? __kthread_parkme+0xc6/0x1f0
>>>>    ? __napi_poll+0x440/0x440
>>>>    kthread+0x2a2/0x340
>>>>    ? kthread_complete_and_exit+0x20/0x20
>>>>    ret_from_fork+0x22/0x30
>>>>    </TASK>
>>>>
>>>> Freed by task 148640:
>>>>    kasan_save_stack+0x23/0x50
>>>>    kasan_set_track+0x21/0x30
>>>>    kasan_save_free_info+0x2a/0x40
>>>>    ____kasan_slab_free+0x169/0x1d0
>>>>    slab_free_freelist_hook+0xd2/0x190
>>>>    __kmem_cache_free+0x1a1/0x2f0
>>>>    skb_release_data+0x449/0x600
>>>>    consume_skb+0x9f/0x1c0
>>>>    veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>>>>    veth_xdp_rcv+0x304/0xa20 [veth]
>>>>    veth_poll+0x139/0x571 [veth]
>>>>    __napi_poll+0xa1/0x440
>>>>    napi_threaded_poll+0x3d1/0x460
>>>>    kthread+0x2a2/0x340
>>>>    ret_from_fork+0x22/0x30
>>>>
>>>>
>>>> The buggy address belongs to the object at ffff888976250000
>>>>    which belongs to the cache kmalloc-2k of size 2048
>>>> The buggy address is located 340 bytes inside of
>>>>    2048-byte region [ffff888976250000, ffff888976250800)
>>>>
>>>> The buggy address belongs to the physical page:
>>>> page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
>>>> head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
>>>> flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
>>>> raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
>>>> raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
>>>> page dumped because: kasan: bad access detected
>>>>
>>>> Memory state around the buggy address:
>>>>    ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>    ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>                                                    ^
>>>>    ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>    ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>
>>>>
>>>> If I understand the code correctly it looks like a xdp_buf is
>>>> constructed pointing to the memory backed by a skb but consume_skb()
>>>> is called while the xdp_buf() is still in use.
>>>>
>>>> ```
>>>> 	case XDP_REDIRECT:
>>>> 		veth_xdp_get(&xdp);
>>>> 		consume_skb(skb);
>>>> 		xdp.rxq->mem = rq->xdp_mem;
>>>> 		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
>>>> 			stats->rx_drops++;
>>>> 			goto err_xdp;
>>>> 		}
>>>> 		stats->xdp_redirect++;
>>>> 		rcu_read_unlock();
>>>> 		goto xdp_xmit;
>>>> ```
>>>>
>>>> It is worth noting that I think XDP_TX has the exact same problem.
>>>>
>>>> Again assuming I understand the problem one naive solution might be to
>>>> move the consum_skb() call after xdp_do_redirect().  I think this
>>>> might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
>>>> BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to
>>>> new memory.  The copy happens for XSKMAP in __xsk_rcv() and for the
>>>> DEVMAP cases happens in dev_map_enqueue_clone().
>>>>
>>>> However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
>>>> live much longer, possibly even after xdp_do_flush().  If I'm correct,
>>>> I'm not really sure where it would be safe to call consume_skb().
>>>
>>> So the idea is that veth_xdp_get() does a
>>> get_page(virt_to_page(xdp->data)), where xdp->data in this case points
>>> to skb->head. This should keep the data page alive even if the skb
>>> surrounding it is freed by the call to consume_skb().
>>>
>>> However, because the skb->head in this case was allocated from a slab
>>> allocator, taking a page refcount is not enough to prevent it from being
>>> freed.
>>
>> Not sure why skb->head is kmallocked here.
>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should ensure that
>> skb head is a page fragment.
> 
> I have a few more details here.  We have some machines running 5.15
> kernels and some are running 6.1 kernels.  So far it appears this only
> happens on 6.1.  We also have a couple of different network cards but
> it appears that only the machines with Solarflare cards using the sfc
> driver hit the KASAN BUG.
> 
> 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework
> veth_xdp_rcv_skb in order to accept non-linear skb" reworked and added
> the veth_convert_skb_to_xdp_buff() call you mentioned.  Importantly it
> also added a call to pskb_expand_head() which will kmalloc() the
> skb->head.  This looks like a path that could be causing the KASAN
> BUG, but I have not yet confirmed that is the path we are hitting.
> This change was also added in 5.18 so might explain why we don't see
> it on 5.15.

I think you made a valid point. It looks like pskb_expand_head() is incorrect here.
At least I did not expect kmallocked skb->data at this point when I introduced
veth XDP.

Also I looked into the discussion on the suspected patch,
https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=&state=*&q=rework+veth_xdp_rcv_skb&archive=both&delegate=
But there was no discussion on pskb_expand_head() nor kmallocked data.

Lorenzo, what do you think?

Toshiaki Makita

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: KASAN veth use after free in XDP_REDIRECT
  2023-03-14  8:02       ` Toshiaki Makita
@ 2023-03-14 15:25         ` Shawn Bohrer
  2023-03-14 15:33           ` [PATCH] veth: Fix " Shawn Bohrer
  0 siblings, 1 reply; 13+ messages in thread
From: Shawn Bohrer @ 2023-03-14 15:25 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: lorenzo, Toke Høiland-Jørgensen, makita.toshiaki,
	netdev, bpf, kernel-team

On Tue, Mar 14, 2023 at 05:02:40PM +0900, Toshiaki Makita wrote:
> On 2023/03/09 7:33, Shawn Bohrer wrote:
> > On Wed, Jan 25, 2023 at 11:07:32AM +0900, Toshiaki Makita wrote:
> > > On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote:
> > > > Shawn Bohrer <sbohrer@cloudflare.com> writes:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > We've seen the following KASAN report on our systems. When using
> > > > > AF_XDP on a veth.
> > > > > 
> > > > > KASAN report:
> > > > > 
> > > > > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
> > > > > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
> > > > > 
> > > > > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
> > > > > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> > > > > Call Trace:
> > > > >    <TASK>
> > > > >    dump_stack_lvl+0x34/0x48
> > > > >    print_report+0x170/0x473
> > > > >    ? __xsk_rcv+0x18d/0x2c0
> > > > >    kasan_report+0xad/0x130
> > > > >    ? __xsk_rcv+0x18d/0x2c0
> > > > >    kasan_check_range+0x149/0x1a0
> > > > >    memcpy+0x20/0x60
> > > > >    __xsk_rcv+0x18d/0x2c0
> > > > >    __xsk_map_redirect+0x1f3/0x490
> > > > >    ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> > > > >    xdp_do_redirect+0x5ca/0xd60
> > > > >    veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
> > > > >    ? __netif_receive_skb_list_core+0x671/0x920
> > > > >    ? veth_xdp+0x670/0x670 [veth]
> > > > >    veth_xdp_rcv+0x304/0xa20 [veth]
> > > > >    ? do_xdp_generic+0x150/0x150
> > > > >    ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
> > > > >    ? _raw_spin_lock_bh+0xe0/0xe0
> > > > >    ? newidle_balance+0x887/0xe30
> > > > >    ? __perf_event_task_sched_in+0xdb/0x800
> > > > >    veth_poll+0x139/0x571 [veth]
> > > > >    ? veth_xdp_rcv+0xa20/0xa20 [veth]
> > > > >    ? _raw_spin_unlock+0x39/0x70
> > > > >    ? finish_task_switch.isra.0+0x17e/0x7d0
> > > > >    ? __switch_to+0x5cf/0x1070
> > > > >    ? __schedule+0x95b/0x2640
> > > > >    ? io_schedule_timeout+0x160/0x160
> > > > >    __napi_poll+0xa1/0x440
> > > > >    napi_threaded_poll+0x3d1/0x460
> > > > >    ? __napi_poll+0x440/0x440
> > > > >    ? __kthread_parkme+0xc6/0x1f0
> > > > >    ? __napi_poll+0x440/0x440
> > > > >    kthread+0x2a2/0x340
> > > > >    ? kthread_complete_and_exit+0x20/0x20
> > > > >    ret_from_fork+0x22/0x30
> > > > >    </TASK>
> > > > > 
> > > > > Freed by task 148640:
> > > > >    kasan_save_stack+0x23/0x50
> > > > >    kasan_set_track+0x21/0x30
> > > > >    kasan_save_free_info+0x2a/0x40
> > > > >    ____kasan_slab_free+0x169/0x1d0
> > > > >    slab_free_freelist_hook+0xd2/0x190
> > > > >    __kmem_cache_free+0x1a1/0x2f0
> > > > >    skb_release_data+0x449/0x600
> > > > >    consume_skb+0x9f/0x1c0
> > > > >    veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> > > > >    veth_xdp_rcv+0x304/0xa20 [veth]
> > > > >    veth_poll+0x139/0x571 [veth]
> > > > >    __napi_poll+0xa1/0x440
> > > > >    napi_threaded_poll+0x3d1/0x460
> > > > >    kthread+0x2a2/0x340
> > > > >    ret_from_fork+0x22/0x30
> > > > > 
> > > > > 
> > > > > The buggy address belongs to the object at ffff888976250000
> > > > >    which belongs to the cache kmalloc-2k of size 2048
> > > > > The buggy address is located 340 bytes inside of
> > > > >    2048-byte region [ffff888976250000, ffff888976250800)
> > > > > 
> > > > > The buggy address belongs to the physical page:
> > > > > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
> > > > > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
> > > > > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
> > > > > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
> > > > > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
> > > > > page dumped because: kasan: bad access detected
> > > > > 
> > > > > Memory state around the buggy address:
> > > > >    ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > >    ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > >                                                    ^
> > > > >    ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > >    ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > 
> > > > > 
> > > > > If I understand the code correctly it looks like a xdp_buf is
> > > > > constructed pointing to the memory backed by a skb but consume_skb()
> > > > > is called while the xdp_buf() is still in use.
> > > > > 
> > > > > ```
> > > > > 	case XDP_REDIRECT:
> > > > > 		veth_xdp_get(&xdp);
> > > > > 		consume_skb(skb);
> > > > > 		xdp.rxq->mem = rq->xdp_mem;
> > > > > 		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
> > > > > 			stats->rx_drops++;
> > > > > 			goto err_xdp;
> > > > > 		}
> > > > > 		stats->xdp_redirect++;
> > > > > 		rcu_read_unlock();
> > > > > 		goto xdp_xmit;
> > > > > ```
> > > > > 
> > > > > It is worth noting that I think XDP_TX has the exact same problem.
> > > > > 
> > > > > Again assuming I understand the problem one naive solution might be to
> > > > > move the consum_skb() call after xdp_do_redirect().  I think this
> > > > > might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
> > > > > BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to
> > > > > new memory.  The copy happens for XSKMAP in __xsk_rcv() and for the
> > > > > DEVMAP cases happens in dev_map_enqueue_clone().
> > > > > 
> > > > > However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
> > > > > live much longer, possibly even after xdp_do_flush().  If I'm correct,
> > > > > I'm not really sure where it would be safe to call consume_skb().
> > > > 
> > > > So the idea is that veth_xdp_get() does a
> > > > get_page(virt_to_page(xdp->data)), where xdp->data in this case points
> > > > to skb->head. This should keep the data page alive even if the skb
> > > > surrounding it is freed by the call to consume_skb().
> > > > 
> > > > However, because the skb->head in this case was allocated from a slab
> > > > allocator, taking a page refcount is not enough to prevent it from being
> > > > freed.
> > > 
> > > Not sure why skb->head is kmallocked here.
> > > skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should ensure that
> > > skb head is a page fragment.
> > 
> > I have a few more details here.  We have some machines running 5.15
> > kernels and some are running 6.1 kernels.  So far it appears this only
> > happens on 6.1.  We also have a couple of different network cards but
> > it appears that only the machines with Solarflare cards using the sfc
> > driver hit the KASAN BUG.
> > 
> > 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework
> > veth_xdp_rcv_skb in order to accept non-linear skb" reworked and added
> > the veth_convert_skb_to_xdp_buff() call you mentioned.  Importantly it
> > also added a call to pskb_expand_head() which will kmalloc() the
> > skb->head.  This looks like a path that could be causing the KASAN
> > BUG, but I have not yet confirmed that is the path we are hitting.
> > This change was also added in 5.18 so might explain why we don't see
> > it on 5.15.
> 
> I think you made a valid point. It looks like pskb_expand_head() is incorrect here.
> At least I did not expect kmallocked skb->data at this point when I introduced
> veth XDP.
> 
> Also I looked into the discussion on the suspected patch,
> https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=&state=*&q=rework+veth_xdp_rcv_skb&archive=both&delegate=
> But there was no discussion on pskb_expand_head() nor kmallocked data.
> 
> Lorenzo, what do you think?

I have confirmed this is the cause of the bug we are hitting and I've
also tested a change that simply does what the old code did and
follows the same code to allocate a new skb if the headroom is less
than XDP_PACKET_HEADROOM.  That fixes the issue but maybe there is a
better solution.  I'll send the patch I've tested in a reply.

--
Shawn

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] veth: Fix use after free in XDP_REDIRECT
  2023-03-14 15:25         ` Shawn Bohrer
@ 2023-03-14 15:33           ` Shawn Bohrer
  2023-03-14 17:13             ` Lorenzo Bianconi
                               ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Shawn Bohrer @ 2023-03-14 15:33 UTC (permalink / raw)
  To: toshiaki.makita1
  Cc: toke, makita.toshiaki, netdev, bpf, kernel-team, lorenzo,
	Shawn Bohrer

718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb
in order to accept non-linear skb" introduced a bug where it tried to
use pskb_expand_head() if the headroom was less than
XDP_PACKET_HEADROOM.  This however uses kmalloc to expand the head,
which will later allow consume_skb() to free the skb while is it still
in use by AF_XDP.

Previously if the headroom was less than XDP_PACKET_HEADROOM we
continued on to allocate a new skb from pages so this restores that
behavior.

BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640

CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
Call Trace:
  <TASK>
  dump_stack_lvl+0x34/0x48
  print_report+0x170/0x473
  ? __xsk_rcv+0x18d/0x2c0
  kasan_report+0xad/0x130
  ? __xsk_rcv+0x18d/0x2c0
  kasan_check_range+0x149/0x1a0
  memcpy+0x20/0x60
  __xsk_rcv+0x18d/0x2c0
  __xsk_map_redirect+0x1f3/0x490
  ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
  xdp_do_redirect+0x5ca/0xd60
  veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
  ? __netif_receive_skb_list_core+0x671/0x920
  ? veth_xdp+0x670/0x670 [veth]
  veth_xdp_rcv+0x304/0xa20 [veth]
  ? do_xdp_generic+0x150/0x150
  ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
  ? _raw_spin_lock_bh+0xe0/0xe0
  ? newidle_balance+0x887/0xe30
  ? __perf_event_task_sched_in+0xdb/0x800
  veth_poll+0x139/0x571 [veth]
  ? veth_xdp_rcv+0xa20/0xa20 [veth]
  ? _raw_spin_unlock+0x39/0x70
  ? finish_task_switch.isra.0+0x17e/0x7d0
  ? __switch_to+0x5cf/0x1070
  ? __schedule+0x95b/0x2640
  ? io_schedule_timeout+0x160/0x160
  __napi_poll+0xa1/0x440
  napi_threaded_poll+0x3d1/0x460
  ? __napi_poll+0x440/0x440
  ? __kthread_parkme+0xc6/0x1f0
  ? __napi_poll+0x440/0x440
  kthread+0x2a2/0x340
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>

Freed by task 148640:
  kasan_save_stack+0x23/0x50
  kasan_set_track+0x21/0x30
  kasan_save_free_info+0x2a/0x40
  ____kasan_slab_free+0x169/0x1d0
  slab_free_freelist_hook+0xd2/0x190
  __kmem_cache_free+0x1a1/0x2f0
  skb_release_data+0x449/0x600
  consume_skb+0x9f/0x1c0
  veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
  veth_xdp_rcv+0x304/0xa20 [veth]
  veth_poll+0x139/0x571 [veth]
  __napi_poll+0xa1/0x440
  napi_threaded_poll+0x3d1/0x460
  kthread+0x2a2/0x340
  ret_from_fork+0x22/0x30

The buggy address belongs to the object at ffff888976250000
  which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 340 bytes inside of
  2048-byte region [ffff888976250000, ffff888976250800)

The buggy address belongs to the physical page:
page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                  ^
  ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb")
Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>
---
 drivers/net/veth.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 1bb54de7124d..6b10aa3883c5 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -708,7 +708,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 	u32 frame_sz;
 
 	if (skb_shared(skb) || skb_head_is_locked(skb) ||
-	    skb_shinfo(skb)->nr_frags) {
+	    skb_shinfo(skb)->nr_frags || skb_headroom(skb) < XDP_PACKET_HEADROOM) {
 		u32 size, len, max_head_size, off;
 		struct sk_buff *nskb;
 		struct page *page;
@@ -773,9 +773,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 
 		consume_skb(skb);
 		skb = nskb;
-	} else if (skb_headroom(skb) < XDP_PACKET_HEADROOM &&
-		   pskb_expand_head(skb, VETH_XDP_HEADROOM, 0, GFP_ATOMIC)) {
-		goto drop;
 	}
 
 	/* SKB "head" area always have tailroom for skb_shared_info */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] veth: Fix use after free in XDP_REDIRECT
  2023-03-14 15:33           ` [PATCH] veth: Fix " Shawn Bohrer
@ 2023-03-14 17:13             ` Lorenzo Bianconi
  2023-03-15  6:19             ` Toshiaki Makita
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Bianconi @ 2023-03-14 17:13 UTC (permalink / raw)
  To: Shawn Bohrer
  Cc: toshiaki.makita1, toke, makita.toshiaki, netdev, bpf, kernel-team

[-- Attachment #1: Type: text/plain, Size: 5076 bytes --]

> 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb
> in order to accept non-linear skb" introduced a bug where it tried to
> use pskb_expand_head() if the headroom was less than
> XDP_PACKET_HEADROOM.  This however uses kmalloc to expand the head,
> which will later allow consume_skb() to free the skb while is it still
> in use by AF_XDP.
> 
> Previously if the headroom was less than XDP_PACKET_HEADROOM we
> continued on to allocate a new skb from pages so this restores that
> behavior.
> 
> BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
> Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
> 
> CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
> Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> Call Trace:
>   <TASK>
>   dump_stack_lvl+0x34/0x48
>   print_report+0x170/0x473
>   ? __xsk_rcv+0x18d/0x2c0
>   kasan_report+0xad/0x130
>   ? __xsk_rcv+0x18d/0x2c0
>   kasan_check_range+0x149/0x1a0
>   memcpy+0x20/0x60
>   __xsk_rcv+0x18d/0x2c0
>   __xsk_map_redirect+0x1f3/0x490
>   ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>   xdp_do_redirect+0x5ca/0xd60
>   veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
>   ? __netif_receive_skb_list_core+0x671/0x920
>   ? veth_xdp+0x670/0x670 [veth]
>   veth_xdp_rcv+0x304/0xa20 [veth]
>   ? do_xdp_generic+0x150/0x150
>   ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
>   ? _raw_spin_lock_bh+0xe0/0xe0
>   ? newidle_balance+0x887/0xe30
>   ? __perf_event_task_sched_in+0xdb/0x800
>   veth_poll+0x139/0x571 [veth]
>   ? veth_xdp_rcv+0xa20/0xa20 [veth]
>   ? _raw_spin_unlock+0x39/0x70
>   ? finish_task_switch.isra.0+0x17e/0x7d0
>   ? __switch_to+0x5cf/0x1070
>   ? __schedule+0x95b/0x2640
>   ? io_schedule_timeout+0x160/0x160
>   __napi_poll+0xa1/0x440
>   napi_threaded_poll+0x3d1/0x460
>   ? __napi_poll+0x440/0x440
>   ? __kthread_parkme+0xc6/0x1f0
>   ? __napi_poll+0x440/0x440
>   kthread+0x2a2/0x340
>   ? kthread_complete_and_exit+0x20/0x20
>   ret_from_fork+0x22/0x30
>   </TASK>
> 
> Freed by task 148640:
>   kasan_save_stack+0x23/0x50
>   kasan_set_track+0x21/0x30
>   kasan_save_free_info+0x2a/0x40
>   ____kasan_slab_free+0x169/0x1d0
>   slab_free_freelist_hook+0xd2/0x190
>   __kmem_cache_free+0x1a1/0x2f0
>   skb_release_data+0x449/0x600
>   consume_skb+0x9f/0x1c0
>   veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>   veth_xdp_rcv+0x304/0xa20 [veth]
>   veth_poll+0x139/0x571 [veth]
>   __napi_poll+0xa1/0x440
>   napi_threaded_poll+0x3d1/0x460
>   kthread+0x2a2/0x340
>   ret_from_fork+0x22/0x30
> 
> The buggy address belongs to the object at ffff888976250000
>   which belongs to the cache kmalloc-2k of size 2048
> The buggy address is located 340 bytes inside of
>   2048-byte region [ffff888976250000, ffff888976250800)
> 
> The buggy address belongs to the physical page:
> page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
> head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
> flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
> raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
> raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>   ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                   ^
>   ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 
> Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb")
> Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>

Thx for fixing it.

Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>

> ---
>  drivers/net/veth.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 1bb54de7124d..6b10aa3883c5 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -708,7 +708,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>  	u32 frame_sz;
>  
>  	if (skb_shared(skb) || skb_head_is_locked(skb) ||
> -	    skb_shinfo(skb)->nr_frags) {
> +	    skb_shinfo(skb)->nr_frags || skb_headroom(skb) < XDP_PACKET_HEADROOM) {
>  		u32 size, len, max_head_size, off;
>  		struct sk_buff *nskb;
>  		struct page *page;
> @@ -773,9 +773,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>  
>  		consume_skb(skb);
>  		skb = nskb;
> -	} else if (skb_headroom(skb) < XDP_PACKET_HEADROOM &&
> -		   pskb_expand_head(skb, VETH_XDP_HEADROOM, 0, GFP_ATOMIC)) {
> -		goto drop;
>  	}
>  
>  	/* SKB "head" area always have tailroom for skb_shared_info */
> -- 
> 2.34.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] veth: Fix use after free in XDP_REDIRECT
  2023-03-14 15:33           ` [PATCH] veth: Fix " Shawn Bohrer
  2023-03-14 17:13             ` Lorenzo Bianconi
@ 2023-03-15  6:19             ` Toshiaki Makita
  2023-03-15 18:08             ` Toke Høiland-Jørgensen
  2023-03-16  4:20             ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 13+ messages in thread
From: Toshiaki Makita @ 2023-03-15  6:19 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: toke, makita.toshiaki, netdev, bpf, kernel-team, lorenzo

On 2023/03/15 0:33, Shawn Bohrer wrote:
> 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb
> in order to accept non-linear skb" introduced a bug where it tried to
> use pskb_expand_head() if the headroom was less than
> XDP_PACKET_HEADROOM.  This however uses kmalloc to expand the head,
> which will later allow consume_skb() to free the skb while is it still
> in use by AF_XDP.
> 
> Previously if the headroom was less than XDP_PACKET_HEADROOM we
> continued on to allocate a new skb from pages so this restores that
> behavior.
> 
> BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
> Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
> 
> CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
> Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> Call Trace:
>    <TASK>
>    dump_stack_lvl+0x34/0x48
>    print_report+0x170/0x473
>    ? __xsk_rcv+0x18d/0x2c0
>    kasan_report+0xad/0x130
>    ? __xsk_rcv+0x18d/0x2c0
>    kasan_check_range+0x149/0x1a0
>    memcpy+0x20/0x60
>    __xsk_rcv+0x18d/0x2c0
>    __xsk_map_redirect+0x1f3/0x490
>    ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>    xdp_do_redirect+0x5ca/0xd60
>    veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
>    ? __netif_receive_skb_list_core+0x671/0x920
>    ? veth_xdp+0x670/0x670 [veth]
>    veth_xdp_rcv+0x304/0xa20 [veth]
>    ? do_xdp_generic+0x150/0x150
>    ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
>    ? _raw_spin_lock_bh+0xe0/0xe0
>    ? newidle_balance+0x887/0xe30
>    ? __perf_event_task_sched_in+0xdb/0x800
>    veth_poll+0x139/0x571 [veth]
>    ? veth_xdp_rcv+0xa20/0xa20 [veth]
>    ? _raw_spin_unlock+0x39/0x70
>    ? finish_task_switch.isra.0+0x17e/0x7d0
>    ? __switch_to+0x5cf/0x1070
>    ? __schedule+0x95b/0x2640
>    ? io_schedule_timeout+0x160/0x160
>    __napi_poll+0xa1/0x440
>    napi_threaded_poll+0x3d1/0x460
>    ? __napi_poll+0x440/0x440
>    ? __kthread_parkme+0xc6/0x1f0
>    ? __napi_poll+0x440/0x440
>    kthread+0x2a2/0x340
>    ? kthread_complete_and_exit+0x20/0x20
>    ret_from_fork+0x22/0x30
>    </TASK>
> 
> Freed by task 148640:
>    kasan_save_stack+0x23/0x50
>    kasan_set_track+0x21/0x30
>    kasan_save_free_info+0x2a/0x40
>    ____kasan_slab_free+0x169/0x1d0
>    slab_free_freelist_hook+0xd2/0x190
>    __kmem_cache_free+0x1a1/0x2f0
>    skb_release_data+0x449/0x600
>    consume_skb+0x9f/0x1c0
>    veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>    veth_xdp_rcv+0x304/0xa20 [veth]
>    veth_poll+0x139/0x571 [veth]
>    __napi_poll+0xa1/0x440
>    napi_threaded_poll+0x3d1/0x460
>    kthread+0x2a2/0x340
>    ret_from_fork+0x22/0x30
> 
> The buggy address belongs to the object at ffff888976250000
>    which belongs to the cache kmalloc-2k of size 2048
> The buggy address is located 340 bytes inside of
>    2048-byte region [ffff888976250000, ffff888976250800)
> 
> The buggy address belongs to the physical page:
> page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
> head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
> flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
> raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
> raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>    ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>    ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                    ^
>    ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>    ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 
> Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb")
> Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>

Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] veth: Fix use after free in XDP_REDIRECT
  2023-03-14 15:33           ` [PATCH] veth: Fix " Shawn Bohrer
  2023-03-14 17:13             ` Lorenzo Bianconi
  2023-03-15  6:19             ` Toshiaki Makita
@ 2023-03-15 18:08             ` Toke Høiland-Jørgensen
  2023-03-16  4:20             ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-03-15 18:08 UTC (permalink / raw)
  To: Shawn Bohrer, toshiaki.makita1
  Cc: makita.toshiaki, netdev, bpf, kernel-team, lorenzo, Shawn Bohrer

Shawn Bohrer <sbohrer@cloudflare.com> writes:

> 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb
> in order to accept non-linear skb" introduced a bug where it tried to
> use pskb_expand_head() if the headroom was less than
> XDP_PACKET_HEADROOM.  This however uses kmalloc to expand the head,
> which will later allow consume_skb() to free the skb while is it still
> in use by AF_XDP.
>
> Previously if the headroom was less than XDP_PACKET_HEADROOM we
> continued on to allocate a new skb from pages so this restores that
> behavior.
>
> BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
> Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
>
> CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
> Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> Call Trace:
>   <TASK>
>   dump_stack_lvl+0x34/0x48
>   print_report+0x170/0x473
>   ? __xsk_rcv+0x18d/0x2c0
>   kasan_report+0xad/0x130
>   ? __xsk_rcv+0x18d/0x2c0
>   kasan_check_range+0x149/0x1a0
>   memcpy+0x20/0x60
>   __xsk_rcv+0x18d/0x2c0
>   __xsk_map_redirect+0x1f3/0x490
>   ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>   xdp_do_redirect+0x5ca/0xd60
>   veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
>   ? __netif_receive_skb_list_core+0x671/0x920
>   ? veth_xdp+0x670/0x670 [veth]
>   veth_xdp_rcv+0x304/0xa20 [veth]
>   ? do_xdp_generic+0x150/0x150
>   ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
>   ? _raw_spin_lock_bh+0xe0/0xe0
>   ? newidle_balance+0x887/0xe30
>   ? __perf_event_task_sched_in+0xdb/0x800
>   veth_poll+0x139/0x571 [veth]
>   ? veth_xdp_rcv+0xa20/0xa20 [veth]
>   ? _raw_spin_unlock+0x39/0x70
>   ? finish_task_switch.isra.0+0x17e/0x7d0
>   ? __switch_to+0x5cf/0x1070
>   ? __schedule+0x95b/0x2640
>   ? io_schedule_timeout+0x160/0x160
>   __napi_poll+0xa1/0x440
>   napi_threaded_poll+0x3d1/0x460
>   ? __napi_poll+0x440/0x440
>   ? __kthread_parkme+0xc6/0x1f0
>   ? __napi_poll+0x440/0x440
>   kthread+0x2a2/0x340
>   ? kthread_complete_and_exit+0x20/0x20
>   ret_from_fork+0x22/0x30
>   </TASK>
>
> Freed by task 148640:
>   kasan_save_stack+0x23/0x50
>   kasan_set_track+0x21/0x30
>   kasan_save_free_info+0x2a/0x40
>   ____kasan_slab_free+0x169/0x1d0
>   slab_free_freelist_hook+0xd2/0x190
>   __kmem_cache_free+0x1a1/0x2f0
>   skb_release_data+0x449/0x600
>   consume_skb+0x9f/0x1c0
>   veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>   veth_xdp_rcv+0x304/0xa20 [veth]
>   veth_poll+0x139/0x571 [veth]
>   __napi_poll+0xa1/0x440
>   napi_threaded_poll+0x3d1/0x460
>   kthread+0x2a2/0x340
>   ret_from_fork+0x22/0x30
>
> The buggy address belongs to the object at ffff888976250000
>   which belongs to the cache kmalloc-2k of size 2048
> The buggy address is located 340 bytes inside of
>   2048-byte region [ffff888976250000, ffff888976250800)
>
> The buggy address belongs to the physical page:
> page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
> head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
> flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
> raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
> raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>   ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                   ^
>   ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
> Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb")
> Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>

Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] veth: Fix use after free in XDP_REDIRECT
  2023-03-14 15:33           ` [PATCH] veth: Fix " Shawn Bohrer
                               ` (2 preceding siblings ...)
  2023-03-15 18:08             ` Toke Høiland-Jørgensen
@ 2023-03-16  4:20             ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-03-16  4:20 UTC (permalink / raw)
  To: Shawn Bohrer
  Cc: toshiaki.makita1, toke, makita.toshiaki, netdev, bpf, kernel-team,
	lorenzo

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 14 Mar 2023 10:33:51 -0500 you wrote:
> 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework veth_xdp_rcv_skb
> in order to accept non-linear skb" introduced a bug where it tried to
> use pskb_expand_head() if the headroom was less than
> XDP_PACKET_HEADROOM.  This however uses kmalloc to expand the head,
> which will later allow consume_skb() to free the skb while is it still
> in use by AF_XDP.
> 
> [...]

Here is the summary with links:
  - veth: Fix use after free in XDP_REDIRECT
    https://git.kernel.org/netdev/net/c/7c10131803e4

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: KASAN veth use after free in XDP_REDIRECT
  2023-01-25  1:54 ` Toke Høiland-Jørgensen
  2023-01-25  2:02   ` Jakub Kicinski
  2023-01-25  2:07   ` Toshiaki Makita
@ 2023-01-25 16:40   ` Shawn Bohrer
  2 siblings, 0 replies; 13+ messages in thread
From: Shawn Bohrer @ 2023-01-25 16:40 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: netdev, bpf, makita.toshiaki, kernel-team

On Wed, Jan 25, 2023 at 02:54:52AM +0100, Toke Høiland-Jørgensen wrote:
> Shawn Bohrer <sbohrer@cloudflare.com> writes:
> 
> > Hello,
> >
> > We've seen the following KASAN report on our systems. When using
> > AF_XDP on a veth.
> >
> > KASAN report:
> >
> > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
> > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
> >
> > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
> > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x34/0x48
> >  print_report+0x170/0x473
> >  ? __xsk_rcv+0x18d/0x2c0
> >  kasan_report+0xad/0x130
> >  ? __xsk_rcv+0x18d/0x2c0
> >  kasan_check_range+0x149/0x1a0
> >  memcpy+0x20/0x60
> >  __xsk_rcv+0x18d/0x2c0
> >  __xsk_map_redirect+0x1f3/0x490
> >  ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> >  xdp_do_redirect+0x5ca/0xd60
> >  veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
> >  ? __netif_receive_skb_list_core+0x671/0x920
> >  ? veth_xdp+0x670/0x670 [veth]
> >  veth_xdp_rcv+0x304/0xa20 [veth]
> >  ? do_xdp_generic+0x150/0x150
> >  ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
> >  ? _raw_spin_lock_bh+0xe0/0xe0
> >  ? newidle_balance+0x887/0xe30
> >  ? __perf_event_task_sched_in+0xdb/0x800
> >  veth_poll+0x139/0x571 [veth]
> >  ? veth_xdp_rcv+0xa20/0xa20 [veth]
> >  ? _raw_spin_unlock+0x39/0x70
> >  ? finish_task_switch.isra.0+0x17e/0x7d0
> >  ? __switch_to+0x5cf/0x1070
> >  ? __schedule+0x95b/0x2640
> >  ? io_schedule_timeout+0x160/0x160
> >  __napi_poll+0xa1/0x440
> >  napi_threaded_poll+0x3d1/0x460
> >  ? __napi_poll+0x440/0x440
> >  ? __kthread_parkme+0xc6/0x1f0
> >  ? __napi_poll+0x440/0x440
> >  kthread+0x2a2/0x340
> >  ? kthread_complete_and_exit+0x20/0x20
> >  ret_from_fork+0x22/0x30
> >  </TASK>
> >
> > Freed by task 148640:
> >  kasan_save_stack+0x23/0x50
> >  kasan_set_track+0x21/0x30
> >  kasan_save_free_info+0x2a/0x40
> >  ____kasan_slab_free+0x169/0x1d0
> >  slab_free_freelist_hook+0xd2/0x190
> >  __kmem_cache_free+0x1a1/0x2f0
> >  skb_release_data+0x449/0x600
> >  consume_skb+0x9f/0x1c0
> >  veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> >  veth_xdp_rcv+0x304/0xa20 [veth]
> >  veth_poll+0x139/0x571 [veth]
> >  __napi_poll+0xa1/0x440
> >  napi_threaded_poll+0x3d1/0x460
> >  kthread+0x2a2/0x340
> >  ret_from_fork+0x22/0x30
> >
> >
> > The buggy address belongs to the object at ffff888976250000
> >  which belongs to the cache kmalloc-2k of size 2048
> > The buggy address is located 340 bytes inside of
> >  2048-byte region [ffff888976250000, ffff888976250800)
> >
> > The buggy address belongs to the physical page:
> > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
> > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
> > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
> > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
> > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
> > page dumped because: kasan: bad access detected
> >
> > Memory state around the buggy address:
> >  ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >  ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >                                                  ^
> >  ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >  ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >
> >
> > If I understand the code correctly it looks like a xdp_buf is
> > constructed pointing to the memory backed by a skb but consume_skb()
> > is called while the xdp_buf() is still in use.
> >
> > ```
> > 	case XDP_REDIRECT:
> > 		veth_xdp_get(&xdp);
> > 		consume_skb(skb);
> > 		xdp.rxq->mem = rq->xdp_mem;
> > 		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
> > 			stats->rx_drops++;
> > 			goto err_xdp;
> > 		}
> > 		stats->xdp_redirect++;
> > 		rcu_read_unlock();
> > 		goto xdp_xmit;
> > ```
> >
> > It is worth noting that I think XDP_TX has the exact same problem.
> >
> > Again assuming I understand the problem one naive solution might be to
> > move the consum_skb() call after xdp_do_redirect().  I think this
> > might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
> > BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to
> > new memory.  The copy happens for XSKMAP in __xsk_rcv() and for the
> > DEVMAP cases happens in dev_map_enqueue_clone().
> >
> > However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
> > live much longer, possibly even after xdp_do_flush().  If I'm correct,
> > I'm not really sure where it would be safe to call consume_skb().
> 
> So the idea is that veth_xdp_get() does a
> get_page(virt_to_page(xdp->data)), where xdp->data in this case points
> to skb->head. This should keep the data page alive even if the skb
> surrounding it is freed by the call to consume_skb().
> 
> However, because the skb->head in this case was allocated from a slab
> allocator, taking a page refcount is not enough to prevent it from being
> freed.
> 
> I'm not sure how best to fix this. I guess we could try to detect this
> case in the veth driver and copy the data, like we do for skb_share()
> etc. However, I'm not sure how to actually detect this case...
> 
> Where are the skbs being processed coming from? I.e., what path
> allocated them?

I can't say for sure where this skb came from.  Normally the vast
majority of our packets come from an external physical NIC and are
then routed into a network namespace through a veth pair.  In this
case this machine had a Solarflare card with the sfc driver.  However,
we only run the KASAN kernel on testing colo, and it normally sees
very little of this traffic unless someone is explicitly running
tests, and I have no idea what was happening when this bug triggered.

My only other guesses are that maybe this skb came from a tun device
or from the loopback device.  Some of our traffic arrives
encapsulated, and we decapsulate and forward.  This could then enter
the veth into the namespace.  If it is actually important to know
_where_ the skb was allocated we might be able to come up some more
ideas.

Thanks,
Shawn

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-03-16  4:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-24 22:45 KASAN veth use after free in XDP_REDIRECT Shawn Bohrer
2023-01-25  1:54 ` Toke Høiland-Jørgensen
2023-01-25  2:02   ` Jakub Kicinski
2023-01-25  2:07   ` Toshiaki Makita
2023-03-08 22:33     ` Shawn Bohrer
2023-03-14  8:02       ` Toshiaki Makita
2023-03-14 15:25         ` Shawn Bohrer
2023-03-14 15:33           ` [PATCH] veth: Fix " Shawn Bohrer
2023-03-14 17:13             ` Lorenzo Bianconi
2023-03-15  6:19             ` Toshiaki Makita
2023-03-15 18:08             ` Toke Høiland-Jørgensen
2023-03-16  4:20             ` patchwork-bot+netdevbpf
2023-01-25 16:40   ` KASAN veth " Shawn Bohrer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).