All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peilin Ye <yepeilin.cs@gmail.com>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Pedro Tammela <pctammela@mojatatu.com>,
	Seth Forshee <sforshee@digitalocean.com>,
	Jakub Kicinski <kuba@kernel.org>,
	syzbot <syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com>,
	davem@davemloft.net, edumazet@google.com, jiri@resnulli.us,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	pabeni@redhat.com, syzkaller-bugs@googlegroups.com,
	xiyou.wangcong@gmail.com, peilin.ye@bytedance.com,
	yepeilin.cs@gmail.com, vladbu@nvidia.com, hdanton@sina.com
Subject: Re: [syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap
Date: Wed, 26 Apr 2023 16:42:01 -0700	[thread overview]
Message-ID: <20230426233657.GA11249@bytedance> (raw)
In-Reply-To: <20230417230011.GA41709@bytedance>

+Cc: Vlad Buslov, Hillf Danton

Hi all,

On Mon, Apr 17, 2023 at 04:00:11PM -0700, Peilin Ye wrote:
> I also reproduced this UAF using the syzkaller reproducer in the report
> (the C reproducer did not work for me for unknown reasons).  I will look
> into this.

Currently, multiple ingress (clsact) Qdiscs can access the per-netdev
*miniq_ingress (*miniq_egress) pointer concurrently.  This is
unfortunately true in two senses:

1. We allow adding ingress (clsact) Qdiscs under parents other than
TC_H_INGRESS (TC_H_CLSACT):

  $ ip link add ifb0 numtxqueues 8 type ifb
  $ echo clsact > /proc/sys/net/core/default_qdisc
  $ tc qdisc add dev ifb0 handle 1: root mq
  $ tc qdisc show dev ifb0
  qdisc mq 1: root
  qdisc clsact 0: parent 1:8
  qdisc clsact 0: parent 1:7
  qdisc clsact 0: parent 1:6
  qdisc clsact 0: parent 1:5
  qdisc clsact 0: parent 1:4
  qdisc clsact 0: parent 1:3
  qdisc clsact 0: parent 1:2
  qdisc clsact 0: parent 1:1

This is obviously racy and should be prohibited.  I've started working
on patches to fix this.  The syz repro for this UAF adds ingress Qdiscs
under TC_H_ROOT, by the way.

2. After introducing RTNL-lockless RTM_{NEW,DEL,GET}TFILTER requests
[1], it is possible that, when replacing ingress (clsact) Qdiscs, the
old one can access *miniq_{in,e}gress concurrently with the new one.  For
example, the syz repro does something like the following:

  Thread 1 creates sch_ingress Qdisc A (containing mini Qdisc a1 and a2),
  then adds a cls_flower filter X to Qdisc A.

  Thread 2 creates sch_ingress Qdisc B (containing mini Qdisc b1 and b2)
  to replace Qdisc A, then adds a cls_flower filter Y to Qdisc B.

  Device has 8 TXQs.

 Thread 1               A's refcnt   Thread 2
  RTM_NEWQDISC (A, locked)    
   qdisc_create(A)               1
   qdisc_graft(A)                9

  RTM_NEWTFILTER (X, lockless)
   __tcf_qdisc_find(A)          10
   tcf_chain0_head_change(A)
 ! mini_qdisc_pair_swap(A)           
            |                        RTM_NEWQDISC (B, locked)
            |                    2    qdisc_graft(B)
            |                    1    notify_and_destroy(A)
            |                                  
            |                        RTM_NEWTFILTER (Y, lockless)
            |                         tcf_chain0_head_change(B)
            |                       ! mini_qdisc_pair_swap(B)
   tcf_block_release(A)          0             |
   qdisc_destroy(A)                            |
   tcf_chain0_head_change_cb_del(A)            |
 ! mini_qdisc_pair_swap(A)                     |
            |                                  |
           ...                                ...

As we can see there're interleaving mini_qdisc_pair_swap() calls between
Qdisc A and B, causing all kinds of troubles, including the UAF (thread
2 writing to mini Qdisc a1's rcu_state after Qdisc A has already been
freed) reported by syzbot.

To fix this, I'm cooking a patch that, when replacing ingress (clsact)
Qdiscs, in qdisc_graft():

  I.  We should make sure there's no on-the-fly lockless filter requests
      for the old Qdisc, and return -EBUSY if there's any (or can/should
      we wait in RTM_NEWQDISC handler?)

  II. We should destory the old Qdisc before publishing the new one
      (i.e. setting it to dev_ingress_queue(dev)->qdisc_sleeping, so
      that subsequent filter requests can see it), because
      {ingress,clsact}_destroy() also call mini_qdisc_pair_swap(), which
      sets *miniq_{in,e}gress to NULL

Future Qdiscs that support RTNL-lockless cls_ops, if any, won't need
this fix, as long as their ->chain_head_change() don't access
out-of-Qdisc-scope data, like pointers in struct net_device.

Do you think this is the right way to go?  Thanks!

[1] Thanks Hillf Danton for the hint:
    https://syzkaller.appspot.com/text?tag=Patch&x=10d7cd5bc80000

Thanks,
Peilin Ye


  reply	other threads:[~2023-04-26 23:42 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-24  0:52 [syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap syzbot
2023-03-29  1:47 ` Jakub Kicinski
2023-03-29  3:37   ` Seth Forshee
2023-03-29 19:07     ` Pedro Tammela
2023-04-03 15:58       ` Jamal Hadi Salim
2023-04-17 23:00         ` Peilin Ye
2023-04-26 23:42           ` Peilin Ye [this message]
2023-04-27  2:31             ` Pedro Tammela
2023-04-27 12:26             ` Vlad Buslov
2023-04-27 17:35               ` Peilin Ye
2023-04-28 12:43                 ` Vlad Buslov
2023-05-06  0:09             ` [PATCH net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
2023-05-06  0:12               ` [PATCH net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
2023-05-08 11:22                 ` Jamal Hadi Salim
2023-05-06  0:13               ` [PATCH net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT Peilin Ye
2023-05-08 11:23                 ` Jamal Hadi Salim
2023-05-06  0:14               ` [PATCH net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs Peilin Ye
2023-05-08 11:23                 ` Jamal Hadi Salim
2023-05-06  0:14               ` [PATCH net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs Peilin Ye
2023-05-08 11:24                 ` Jamal Hadi Salim
2023-05-06  0:15               ` [PATCH net 5/6] net/sched: Refactor qdisc_graft() for ingress and " Peilin Ye
2023-05-08 11:29                 ` Jamal Hadi Salim
2023-05-08 22:24                   ` Peilin Ye
2023-05-08 14:11                 ` Pedro Tammela
2023-05-06  0:16               ` [PATCH net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting Peilin Ye
2023-05-08 11:32                 ` Jamal Hadi Salim
2023-05-08 21:58                   ` Peilin Ye
2023-05-08 14:12                 ` Pedro Tammela
2023-05-08 22:01                   ` Peilin Ye
2023-05-09  1:33                 ` Jakub Kicinski
2023-05-10 20:11                   ` Peilin Ye
2023-05-10 23:15                     ` Jakub Kicinski
2023-05-11 20:40                       ` Peilin Ye
2023-05-11 23:20                         ` Jakub Kicinski
2023-05-11 23:46                           ` Peilin Ye
2023-05-12  0:11                             ` Peilin Ye
2023-05-15 22:45                               ` Peilin Ye
2023-05-16 19:22                                 ` Jakub Kicinski
2023-05-16 19:35                                   ` Vlad Buslov
2023-05-16 21:50                                     ` Jakub Kicinski
2023-05-16 22:58                                       ` Peilin Ye
2023-05-17  0:39                                         ` Jakub Kicinski
2023-05-17  8:49                                           ` Vlad Buslov
2023-05-17 18:48                                             ` Jakub Kicinski
2023-05-17 22:20                                               ` Peilin Ye
2023-05-17 18:48                 ` Jakub Kicinski
2023-05-17 21:18                   ` Peilin Ye
2023-05-24 14:31 ` [syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap Pedro Tammela
2023-05-24 15:02   ` syzbot
2023-05-24 15:05   ` Pedro Tammela
2023-05-24 15:34     ` syzbot
     [not found] <20230418022559.1197-1-hdanton@sina.com>
2023-04-18  3:07 ` syzbot
     [not found] <20230418092148.1294-1-hdanton@sina.com>
2023-04-18  9:44 ` syzbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230426233657.GA11249@bytedance \
    --to=yepeilin.cs@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hdanton@sina.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pctammela@mojatatu.com \
    --cc=peilin.ye@bytedance.com \
    --cc=sforshee@digitalocean.com \
    --cc=syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=vladbu@nvidia.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.