All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlad Buslov <vladbu@nvidia.com>
To: Peilin Ye <yepeilin.cs@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	Pedro Tammela <pctammela@mojatatu.com>,
	Seth Forshee <sforshee@digitalocean.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	syzbot <syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com>,
	<davem@davemloft.net>, <edumazet@google.com>, <jiri@resnulli.us>,
	<linux-kernel@vger.kernel.org>, <netdev@vger.kernel.org>,
	<pabeni@redhat.com>, <syzkaller-bugs@googlegroups.com>,
	<xiyou.wangcong@gmail.com>, <peilin.ye@bytedance.com>,
	<hdanton@sina.com>
Subject: Re: [syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap
Date: Thu, 27 Apr 2023 15:26:03 +0300	[thread overview]
Message-ID: <877ctxsdnb.fsf@nvidia.com> (raw)
In-Reply-To: <20230426233657.GA11249@bytedance>

Hi Peilin,

On Wed 26 Apr 2023 at 16:42, Peilin Ye <yepeilin.cs@gmail.com> wrote:
> +Cc: Vlad Buslov, Hillf Danton
>
> Hi all,
>
> On Mon, Apr 17, 2023 at 04:00:11PM -0700, Peilin Ye wrote:
>> I also reproduced this UAF using the syzkaller reproducer in the report
>> (the C reproducer did not work for me for unknown reasons).  I will look
>> into this.
>
> Currently, multiple ingress (clsact) Qdiscs can access the per-netdev
> *miniq_ingress (*miniq_egress) pointer concurrently.  This is
> unfortunately true in two senses:
>
> 1. We allow adding ingress (clsact) Qdiscs under parents other than
> TC_H_INGRESS (TC_H_CLSACT):
>
>   $ ip link add ifb0 numtxqueues 8 type ifb
>   $ echo clsact > /proc/sys/net/core/default_qdisc
>   $ tc qdisc add dev ifb0 handle 1: root mq
>   $ tc qdisc show dev ifb0
>   qdisc mq 1: root
>   qdisc clsact 0: parent 1:8
>   qdisc clsact 0: parent 1:7
>   qdisc clsact 0: parent 1:6
>   qdisc clsact 0: parent 1:5
>   qdisc clsact 0: parent 1:4
>   qdisc clsact 0: parent 1:3
>   qdisc clsact 0: parent 1:2
>   qdisc clsact 0: parent 1:1
>
> This is obviously racy and should be prohibited.  I've started working
> on patches to fix this.  The syz repro for this UAF adds ingress Qdiscs
> under TC_H_ROOT, by the way.


Hmm, didn't realize it was the case.

>
> 2. After introducing RTNL-lockless RTM_{NEW,DEL,GET}TFILTER requests
> [1], it is possible that, when replacing ingress (clsact) Qdiscs, the
> old one can access *miniq_{in,e}gress concurrently with the new one.  For
> example, the syz repro does something like the following:
>
>   Thread 1 creates sch_ingress Qdisc A (containing mini Qdisc a1 and a2),
>   then adds a cls_flower filter X to Qdisc A.
>
>   Thread 2 creates sch_ingress Qdisc B (containing mini Qdisc b1 and b2)
>   to replace Qdisc A, then adds a cls_flower filter Y to Qdisc B.
>
>   Device has 8 TXQs.
>
>  Thread 1               A's refcnt   Thread 2
>   RTM_NEWQDISC (A, locked)    
>    qdisc_create(A)               1
>    qdisc_graft(A)                9
>
>   RTM_NEWTFILTER (X, lockless)
>    __tcf_qdisc_find(A)          10
>    tcf_chain0_head_change(A)
>  ! mini_qdisc_pair_swap(A)           
>             |                        RTM_NEWQDISC (B, locked)
>             |                    2    qdisc_graft(B)
>             |                    1    notify_and_destroy(A)
>             |                                  
>             |                        RTM_NEWTFILTER (Y, lockless)
>             |                         tcf_chain0_head_change(B)
>             |                       ! mini_qdisc_pair_swap(B)
>    tcf_block_release(A)          0             |
>    qdisc_destroy(A)                            |
>    tcf_chain0_head_change_cb_del(A)            |
>  ! mini_qdisc_pair_swap(A)                     |
>             |                                  |
>            ...                                ...
>
> As we can see there're interleaving mini_qdisc_pair_swap() calls between
> Qdisc A and B, causing all kinds of troubles, including the UAF (thread
> 2 writing to mini Qdisc a1's rcu_state after Qdisc A has already been
> freed) reported by syzbot.

Great analysis! However, it is still not quite clear to me how threads 1
and 2 access each other RCU state when q->miniqp is a private memory of
the Qdisc, so 1 should only see A->miniqp and 2 only B->miniqp. And both
miniqps should be protected from deallocation by reference that lockless
RTM_NEWTFILTER obtains.

>
> To fix this, I'm cooking a patch that, when replacing ingress (clsact)
> Qdiscs, in qdisc_graft():
>
>   I.  We should make sure there's no on-the-fly lockless filter requests
>       for the old Qdisc, and return -EBUSY if there's any (or can/should
>       we wait in RTM_NEWQDISC handler?)
>
>   II. We should destory the old Qdisc before publishing the new one
>       (i.e. setting it to dev_ingress_queue(dev)->qdisc_sleeping, so
>       that subsequent filter requests can see it), because
>       {ingress,clsact}_destroy() also call mini_qdisc_pair_swap(), which
>       sets *miniq_{in,e}gress to NULL

Another approach would be to somehow detect concurrent Qdisc replace and
return -EAGAIN from tcf_chain_tp_insert() before calling
tcf_chain0_head_change(). This would leverage existing cls_api
functionality that automatically retries after releasing all references
to chain/tp and obtaining them again instead of messing with qdisc api.
However, since I still didn't fully grasp the issue it is hard for me to
reason whether such approach would be possible to implement in this
case.

>
> Future Qdiscs that support RTNL-lockless cls_ops, if any, won't need
> this fix, as long as their ->chain_head_change() don't access
> out-of-Qdisc-scope data, like pointers in struct net_device.
>
> Do you think this is the right way to go?  Thanks!
>
> [1] Thanks Hillf Danton for the hint:
>     https://syzkaller.appspot.com/text?tag=Patch&x=10d7cd5bc80000
>
> Thanks,
> Peilin Ye


  parent reply	other threads:[~2023-04-27 12:39 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-24  0:52 [syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap syzbot
2023-03-29  1:47 ` Jakub Kicinski
2023-03-29  3:37   ` Seth Forshee
2023-03-29 19:07     ` Pedro Tammela
2023-04-03 15:58       ` Jamal Hadi Salim
2023-04-17 23:00         ` Peilin Ye
2023-04-26 23:42           ` Peilin Ye
2023-04-27  2:31             ` Pedro Tammela
2023-04-27 12:26             ` Vlad Buslov [this message]
2023-04-27 17:35               ` Peilin Ye
2023-04-28 12:43                 ` Vlad Buslov
2023-05-06  0:09             ` [PATCH net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
2023-05-06  0:12               ` [PATCH net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
2023-05-08 11:22                 ` Jamal Hadi Salim
2023-05-06  0:13               ` [PATCH net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT Peilin Ye
2023-05-08 11:23                 ` Jamal Hadi Salim
2023-05-06  0:14               ` [PATCH net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs Peilin Ye
2023-05-08 11:23                 ` Jamal Hadi Salim
2023-05-06  0:14               ` [PATCH net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs Peilin Ye
2023-05-08 11:24                 ` Jamal Hadi Salim
2023-05-06  0:15               ` [PATCH net 5/6] net/sched: Refactor qdisc_graft() for ingress and " Peilin Ye
2023-05-08 11:29                 ` Jamal Hadi Salim
2023-05-08 22:24                   ` Peilin Ye
2023-05-08 14:11                 ` Pedro Tammela
2023-05-06  0:16               ` [PATCH net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting Peilin Ye
2023-05-08 11:32                 ` Jamal Hadi Salim
2023-05-08 21:58                   ` Peilin Ye
2023-05-08 14:12                 ` Pedro Tammela
2023-05-08 22:01                   ` Peilin Ye
2023-05-09  1:33                 ` Jakub Kicinski
2023-05-10 20:11                   ` Peilin Ye
2023-05-10 23:15                     ` Jakub Kicinski
2023-05-11 20:40                       ` Peilin Ye
2023-05-11 23:20                         ` Jakub Kicinski
2023-05-11 23:46                           ` Peilin Ye
2023-05-12  0:11                             ` Peilin Ye
2023-05-15 22:45                               ` Peilin Ye
2023-05-16 19:22                                 ` Jakub Kicinski
2023-05-16 19:35                                   ` Vlad Buslov
2023-05-16 21:50                                     ` Jakub Kicinski
2023-05-16 22:58                                       ` Peilin Ye
2023-05-17  0:39                                         ` Jakub Kicinski
2023-05-17  8:49                                           ` Vlad Buslov
2023-05-17 18:48                                             ` Jakub Kicinski
2023-05-17 22:20                                               ` Peilin Ye
2023-05-17 18:48                 ` Jakub Kicinski
2023-05-17 21:18                   ` Peilin Ye
2023-05-24 14:31 ` [syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap Pedro Tammela
2023-05-24 15:02   ` syzbot
2023-05-24 15:05   ` Pedro Tammela
2023-05-24 15:34     ` syzbot
     [not found] <20230418022559.1197-1-hdanton@sina.com>
2023-04-18  3:07 ` syzbot
     [not found] <20230418092148.1294-1-hdanton@sina.com>
2023-04-18  9:44 ` syzbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877ctxsdnb.fsf@nvidia.com \
    --to=vladbu@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hdanton@sina.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pctammela@mojatatu.com \
    --cc=peilin.ye@bytedance.com \
    --cc=sforshee@digitalocean.com \
    --cc=syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yepeilin.cs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.