All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shawn Bohrer <sbohrer@cloudflare.com>
To: Toshiaki Makita <toshiaki.makita1@gmail.com>
Cc: lorenzo@kernel.org, "Toke Høiland-Jørgensen" <toke@kernel.org>,
	makita.toshiaki@lab.ntt.co.jp, netdev@vger.kernel.org,
	bpf@vger.kernel.org, kernel-team@cloudflare.com
Subject: Re: KASAN veth use after free in XDP_REDIRECT
Date: Tue, 14 Mar 2023 10:25:54 -0500	[thread overview]
Message-ID: <ZBCSAsUBeNvTPj/s@JNXK7M3> (raw)
In-Reply-To: <ea1ec50f-06ca-7771-f9f8-e45c13179733@gmail.com>

On Tue, Mar 14, 2023 at 05:02:40PM +0900, Toshiaki Makita wrote:
> On 2023/03/09 7:33, Shawn Bohrer wrote:
> > On Wed, Jan 25, 2023 at 11:07:32AM +0900, Toshiaki Makita wrote:
> > > On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote:
> > > > Shawn Bohrer <sbohrer@cloudflare.com> writes:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > We've seen the following KASAN report on our systems. When using
> > > > > AF_XDP on a veth.
> > > > > 
> > > > > KASAN report:
> > > > > 
> > > > > BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
> > > > > Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
> > > > > 
> > > > > CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
> > > > > Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> > > > > Call Trace:
> > > > >    <TASK>
> > > > >    dump_stack_lvl+0x34/0x48
> > > > >    print_report+0x170/0x473
> > > > >    ? __xsk_rcv+0x18d/0x2c0
> > > > >    kasan_report+0xad/0x130
> > > > >    ? __xsk_rcv+0x18d/0x2c0
> > > > >    kasan_check_range+0x149/0x1a0
> > > > >    memcpy+0x20/0x60
> > > > >    __xsk_rcv+0x18d/0x2c0
> > > > >    __xsk_map_redirect+0x1f3/0x490
> > > > >    ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> > > > >    xdp_do_redirect+0x5ca/0xd60
> > > > >    veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
> > > > >    ? __netif_receive_skb_list_core+0x671/0x920
> > > > >    ? veth_xdp+0x670/0x670 [veth]
> > > > >    veth_xdp_rcv+0x304/0xa20 [veth]
> > > > >    ? do_xdp_generic+0x150/0x150
> > > > >    ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
> > > > >    ? _raw_spin_lock_bh+0xe0/0xe0
> > > > >    ? newidle_balance+0x887/0xe30
> > > > >    ? __perf_event_task_sched_in+0xdb/0x800
> > > > >    veth_poll+0x139/0x571 [veth]
> > > > >    ? veth_xdp_rcv+0xa20/0xa20 [veth]
> > > > >    ? _raw_spin_unlock+0x39/0x70
> > > > >    ? finish_task_switch.isra.0+0x17e/0x7d0
> > > > >    ? __switch_to+0x5cf/0x1070
> > > > >    ? __schedule+0x95b/0x2640
> > > > >    ? io_schedule_timeout+0x160/0x160
> > > > >    __napi_poll+0xa1/0x440
> > > > >    napi_threaded_poll+0x3d1/0x460
> > > > >    ? __napi_poll+0x440/0x440
> > > > >    ? __kthread_parkme+0xc6/0x1f0
> > > > >    ? __napi_poll+0x440/0x440
> > > > >    kthread+0x2a2/0x340
> > > > >    ? kthread_complete_and_exit+0x20/0x20
> > > > >    ret_from_fork+0x22/0x30
> > > > >    </TASK>
> > > > > 
> > > > > Freed by task 148640:
> > > > >    kasan_save_stack+0x23/0x50
> > > > >    kasan_set_track+0x21/0x30
> > > > >    kasan_save_free_info+0x2a/0x40
> > > > >    ____kasan_slab_free+0x169/0x1d0
> > > > >    slab_free_freelist_hook+0xd2/0x190
> > > > >    __kmem_cache_free+0x1a1/0x2f0
> > > > >    skb_release_data+0x449/0x600
> > > > >    consume_skb+0x9f/0x1c0
> > > > >    veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
> > > > >    veth_xdp_rcv+0x304/0xa20 [veth]
> > > > >    veth_poll+0x139/0x571 [veth]
> > > > >    __napi_poll+0xa1/0x440
> > > > >    napi_threaded_poll+0x3d1/0x460
> > > > >    kthread+0x2a2/0x340
> > > > >    ret_from_fork+0x22/0x30
> > > > > 
> > > > > 
> > > > > The buggy address belongs to the object at ffff888976250000
> > > > >    which belongs to the cache kmalloc-2k of size 2048
> > > > > The buggy address is located 340 bytes inside of
> > > > >    2048-byte region [ffff888976250000, ffff888976250800)
> > > > > 
> > > > > The buggy address belongs to the physical page:
> > > > > page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
> > > > > head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
> > > > > flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
> > > > > raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
> > > > > raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
> > > > > page dumped because: kasan: bad access detected
> > > > > 
> > > > > Memory state around the buggy address:
> > > > >    ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > >    ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > >                                                    ^
> > > > >    ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > >    ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > 
> > > > > 
> > > > > If I understand the code correctly it looks like a xdp_buf is
> > > > > constructed pointing to the memory backed by a skb but consume_skb()
> > > > > is called while the xdp_buf() is still in use.
> > > > > 
> > > > > ```
> > > > > 	case XDP_REDIRECT:
> > > > > 		veth_xdp_get(&xdp);
> > > > > 		consume_skb(skb);
> > > > > 		xdp.rxq->mem = rq->xdp_mem;
> > > > > 		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
> > > > > 			stats->rx_drops++;
> > > > > 			goto err_xdp;
> > > > > 		}
> > > > > 		stats->xdp_redirect++;
> > > > > 		rcu_read_unlock();
> > > > > 		goto xdp_xmit;
> > > > > ```
> > > > > 
> > > > > It is worth noting that I think XDP_TX has the exact same problem.
> > > > > 
> > > > > Again assuming I understand the problem one naive solution might be to
> > > > > move the consum_skb() call after xdp_do_redirect().  I think this
> > > > > might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
> > > > > BPF_MAP_TYPE_DEVMAP_HASH since those all seem to copy the xdb_buf to
> > > > > new memory.  The copy happens for XSKMAP in __xsk_rcv() and for the
> > > > > DEVMAP cases happens in dev_map_enqueue_clone().
> > > > > 
> > > > > However, it would appear that for BPF_MAP_TYPE_CPUMAP that memory can
> > > > > live much longer, possibly even after xdp_do_flush().  If I'm correct,
> > > > > I'm not really sure where it would be safe to call consume_skb().
> > > > 
> > > > So the idea is that veth_xdp_get() does a
> > > > get_page(virt_to_page(xdp->data)), where xdp->data in this case points
> > > > to skb->head. This should keep the data page alive even if the skb
> > > > surrounding it is freed by the call to consume_skb().
> > > > 
> > > > However, because the skb->head in this case was allocated from a slab
> > > > allocator, taking a page refcount is not enough to prevent it from being
> > > > freed.
> > > 
> > > Not sure why skb->head is kmallocked here.
> > > skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should ensure that
> > > skb head is a page fragment.
> > 
> > I have a few more details here.  We have some machines running 5.15
> > kernels and some are running 6.1 kernels.  So far it appears this only
> > happens on 6.1.  We also have a couple of different network cards but
> > it appears that only the machines with Solarflare cards using the sfc
> > driver hit the KASAN BUG.
> > 
> > 718a18a0c8a67f97781e40bdef7cdd055c430996 "veth: Rework
> > veth_xdp_rcv_skb in order to accept non-linear skb" reworked and added
> > the veth_convert_skb_to_xdp_buff() call you mentioned.  Importantly it
> > also added a call to pskb_expand_head() which will kmalloc() the
> > skb->head.  This looks like a path that could be causing the KASAN
> > BUG, but I have not yet confirmed that is the path we are hitting.
> > This change was also added in 5.18 so might explain why we don't see
> > it on 5.15.
> 
> I think you made a valid point. It looks like pskb_expand_head() is incorrect here.
> At least I did not expect kmallocked skb->data at this point when I introduced
> veth XDP.
> 
> Also I looked into the discussion on the suspected patch,
> https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=&state=*&q=rework+veth_xdp_rcv_skb&archive=both&delegate=
> But there was no discussion on pskb_expand_head() nor kmallocked data.
> 
> Lorenzo, what do you think?

I have confirmed this is the cause of the bug we are hitting and I've
also tested a change that simply does what the old code did and
follows the same code to allocate a new skb if the headroom is less
than XDP_PACKET_HEADROOM.  That fixes the issue but maybe there is a
better solution.  I'll send the patch I've tested in a reply.

--
Shawn

  reply	other threads:[~2023-03-14 15:26 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-24 22:45 KASAN veth use after free in XDP_REDIRECT Shawn Bohrer
2023-01-25  1:54 ` Toke Høiland-Jørgensen
2023-01-25  2:02   ` Jakub Kicinski
2023-01-25  2:07   ` Toshiaki Makita
2023-03-08 22:33     ` Shawn Bohrer
2023-03-14  8:02       ` Toshiaki Makita
2023-03-14 15:25         ` Shawn Bohrer [this message]
2023-03-14 15:33           ` [PATCH] veth: Fix " Shawn Bohrer
2023-03-14 17:13             ` Lorenzo Bianconi
2023-03-15  6:19             ` Toshiaki Makita
2023-03-15 18:08             ` Toke Høiland-Jørgensen
2023-03-16  4:20             ` patchwork-bot+netdevbpf
2023-01-25 16:40   ` KASAN veth " Shawn Bohrer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZBCSAsUBeNvTPj/s@JNXK7M3 \
    --to=sbohrer@cloudflare.com \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@cloudflare.com \
    --cc=lorenzo@kernel.org \
    --cc=makita.toshiaki@lab.ntt.co.jp \
    --cc=netdev@vger.kernel.org \
    --cc=toke@kernel.org \
    --cc=toshiaki.makita1@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.