All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Peilin Ye <yepeilin.cs@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Peilin Ye <peilin.ye@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Vlad Buslov <vladbu@mellanox.com>,
	Pedro Tammela <pctammela@mojatatu.com>,
	Hillf Danton <hdanton@sina.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Cong Wang <cong.wang@bytedance.com>
Subject: Re: [PATCH net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
Date: Wed, 10 May 2023 16:15:59 -0700	[thread overview]
Message-ID: <20230510161559.2767b27a@kernel.org> (raw)
In-Reply-To: <ZFv6Z7hssZ9snNAw@C02FL77VMD6R.googleapis.com>

On Wed, 10 May 2023 13:11:19 -0700 Peilin Ye wrote:
> On Fri,  5 May 2023 17:16:10 -0700 Peilin Ye wrote:
> >   Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then
> >   adds a flower filter X to A.
> > 
> >   Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
> >   b2) to replace A, then adds a flower filter Y to B.
> > 
> >  Thread 1               A's refcnt   Thread 2
> >   RTM_NEWQDISC (A, RTNL-locked)
> >    qdisc_create(A)               1
> >    qdisc_graft(A)                9
> > 
> >   RTM_NEWTFILTER (X, RTNL-lockless)
> >    __tcf_qdisc_find(A)          10
> >    tcf_chain0_head_change(A)
> >    mini_qdisc_pair_swap(A) (1st)
> >             |
> >             |                         RTM_NEWQDISC (B, RTNL-locked)
> >            RCU                   2     qdisc_graft(B)
> >             |                    1     notify_and_destroy(A)
> >             |
> >    tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-lockless)
> >    qdisc_destroy(A)                    tcf_chain0_head_change(B)
> >    tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
> >    mini_qdisc_pair_swap(A) (3rd)                |
> >            ...                                 ...  
> 
> Looking at the code, I think there is no guarantee that (1st) cannot
> happen after (2nd), although unlikely?  Can RTNL-lockless RTM_NEWTFILTER
> handlers get preempted?

Right, we need qdisc_graft(B) to update the appropriate dev pointer 
to point to b1. With that the ordering should not matter. Probably
using the ->attach() callback?

> If (1st) happens later than (2nd), we will need to make (1st) no-op, by
> detecting that we are the "old" Qdisc.  I am not sure there is any
> (clean) way to do it.  I even thought about:
> 
>   (1) Get the containing Qdisc of "miniqp" we are working on, "qdisc";
>   (2) Test if "qdisc == qdisc->dev_queue->qdisc_sleeping".  If false, it
>       means we are the "old" Qdisc (have been replaced), and should do
>       nothing.
> 
> However, for clsact Qdiscs I don't know if "miniqp" is the ingress or
> egress one, so I can't container_of() during step (1) ...

And we can't be using multiple pieces of information to make 
the decision since AFAIU mini_qdisc_pair_swap() can race with
qdisc_graft().

My thinking was to make sure that dev->miniq_* pointers always point
to one of the miniqs of the currently attached qdisc. Right now, on 
a quick look, those pointers are not initialized during initial graft,
only when first filter is added, and may be cleared when filters are
removed. But I don't think that's strictly required, miniq with no
filters should be fine.

> Eventually I created [5,6/6].  It is a workaround indeed, in the sense
> that it changes sch_api.c to avoid a mini Qdisc issue.  However I think it
> makes the code correct in a relatively understandable way,

What's your benchmark for being understandable?

> without slowing down mini_qdisc_pair_swap() or sch_handle_*gress().


  reply	other threads:[~2023-05-10 23:16 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-24  0:52 [syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap syzbot
2023-03-29  1:47 ` Jakub Kicinski
2023-03-29  3:37   ` Seth Forshee
2023-03-29 19:07     ` Pedro Tammela
2023-04-03 15:58       ` Jamal Hadi Salim
2023-04-17 23:00         ` Peilin Ye
2023-04-26 23:42           ` Peilin Ye
2023-04-27  2:31             ` Pedro Tammela
2023-04-27 12:26             ` Vlad Buslov
2023-04-27 17:35               ` Peilin Ye
2023-04-28 12:43                 ` Vlad Buslov
2023-05-06  0:09             ` [PATCH net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
2023-05-06  0:12               ` [PATCH net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
2023-05-08 11:22                 ` Jamal Hadi Salim
2023-05-06  0:13               ` [PATCH net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT Peilin Ye
2023-05-08 11:23                 ` Jamal Hadi Salim
2023-05-06  0:14               ` [PATCH net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs Peilin Ye
2023-05-08 11:23                 ` Jamal Hadi Salim
2023-05-06  0:14               ` [PATCH net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs Peilin Ye
2023-05-08 11:24                 ` Jamal Hadi Salim
2023-05-06  0:15               ` [PATCH net 5/6] net/sched: Refactor qdisc_graft() for ingress and " Peilin Ye
2023-05-08 11:29                 ` Jamal Hadi Salim
2023-05-08 22:24                   ` Peilin Ye
2023-05-08 14:11                 ` Pedro Tammela
2023-05-06  0:16               ` [PATCH net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting Peilin Ye
2023-05-08 11:32                 ` Jamal Hadi Salim
2023-05-08 21:58                   ` Peilin Ye
2023-05-08 14:12                 ` Pedro Tammela
2023-05-08 22:01                   ` Peilin Ye
2023-05-09  1:33                 ` Jakub Kicinski
2023-05-10 20:11                   ` Peilin Ye
2023-05-10 23:15                     ` Jakub Kicinski [this message]
2023-05-11 20:40                       ` Peilin Ye
2023-05-11 23:20                         ` Jakub Kicinski
2023-05-11 23:46                           ` Peilin Ye
2023-05-12  0:11                             ` Peilin Ye
2023-05-15 22:45                               ` Peilin Ye
2023-05-16 19:22                                 ` Jakub Kicinski
2023-05-16 19:35                                   ` Vlad Buslov
2023-05-16 21:50                                     ` Jakub Kicinski
2023-05-16 22:58                                       ` Peilin Ye
2023-05-17  0:39                                         ` Jakub Kicinski
2023-05-17  8:49                                           ` Vlad Buslov
2023-05-17 18:48                                             ` Jakub Kicinski
2023-05-17 22:20                                               ` Peilin Ye
2023-05-17 18:48                 ` Jakub Kicinski
2023-05-17 21:18                   ` Peilin Ye
2023-05-24 14:31 ` [syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap Pedro Tammela
2023-05-24 15:02   ` syzbot
2023-05-24 15:05   ` Pedro Tammela
2023-05-24 15:34     ` syzbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230510161559.2767b27a@kernel.org \
    --to=kuba@kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hdanton@sina.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pctammela@mojatatu.com \
    --cc=peilin.ye@bytedance.com \
    --cc=vladbu@mellanox.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yepeilin.cs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.