From: John Fastabend <john.fastabend@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jiri Pirko <jiri@resnulli.us>
Cc: Roi Dayan <roid@mellanox.com>,
"David S. Miller" <davem@davemloft.net>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Jiri Pirko <jiri@mellanox.com>,
Or Gerlitz <ogerlitz@mellanox.com>,
Cong Wang <cwang@twopensource.com>
Subject: Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferencing it
Date: Tue, 22 Nov 2016 15:36:16 -0800 [thread overview]
Message-ID: <5834D670.2050301@gmail.com> (raw)
In-Reply-To: <5834AD97.1020600@iogearbox.net>
On 16-11-22 12:41 PM, Daniel Borkmann wrote:
> On 11/22/2016 08:28 PM, Cong Wang wrote:
>> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@iogearbox.net wrote:
>>>> Hmm, I don't think we want to have such an additional test in fast
>>>> path for each and every classifier. Can we think of ways to avoid that?
>>>>
>>>> My question is, since we unlink individual instances from such
>>>> tp-internal lists through RCU and release the instance through
>>>> call_rcu() as well as the head (tp->root) via kfree_rcu() eventually,
>>>> against what are we protecting setting RCU_INIT_POINTER(tp->root,
>>>> NULL) in ->destroy() callback? Something not respecting grace period?
>>>
>>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>>> to null.
>
> But that's not really an answer to my question. ;)
>
>> We do need to respect the grace period if we touch the globally visible
>> data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing
>> the right place.
>
> I think there may be multiple issues actually.
>
> At the time we go into tc_classify(), from ingress as well as egress side,
> we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
> everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
> we use kfree_rcu() on tp, although we iterate tps (and implicitly inner
> filters) via rcu_dereference_bh() from reader side. Is there a reason why
> we don't use call_rcu_bh() variant on destruction for all this instead?
I can't think of any. If it's all under _bh we can convert the call_rcu()
calls to call_rcu_bh(); it just needs an audit.
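For what it's worth, a minimal sketch of what that conversion would look
like for a single filter object (the struct and function names here are
invented for illustration, this is not the actual cls_bpf code):

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Illustrative filter instance, standing in for something like cls_bpf's
 * per-filter struct; the layout is made up for the example. */
struct example_filter {
	struct rcu_head rcu;
	/* ... classifier-specific state ... */
};

static void example_filter_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct example_filter, rcu));
}

static void example_filter_delete(struct example_filter *f)
{
	/* Readers walk the filter lists under rcu_read_lock_bh() from
	 * tc_classify(), so the matching grace period is the _bh one.
	 * Today the classifiers use call_rcu(&f->rcu, ...) here instead. */
	call_rcu_bh(&f->rcu, example_filter_free_rcu);
}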
>
> Just looking at cls_bpf and others, what protects
> RCU_INIT_POINTER(tp->root, NULL) against? The tp is unlinked in
> tc_ctl_tfilter() from the tp chain in tcf_destroy() cases. Still active
> readers under RCU BH can race against this (tp->root being NULL), as the
> commit identified. Only the get() callback checks for head against NULL,
> but both are serialized under rtnl, and the only place we call this is
> tc_ctl_tfilter(). Even if we create a new tp, head should not be NULL
> there, if it was assigned during the init() cb, but contains an empty
> list. (It's different for things like cls_cgroup, though.) So, I'm
> wondering if the RCU_INIT_POINTER(tp->root, NULL) can just be removed
> instead (unless I'm missing something obvious)?
I just took a look at this and I think there are a couple of possible
solutions.

The easiest is likely to fix all the call sites so that 'tp' is unlinked
before calling the destroy() handlers AND to drop the NULL set. At the
moment I only see one call site where destroy is called before unlinking.
This would guarantee that after a grace period there is no path left to
reach the classifiers, because 'tp' is unlinked. Calling destroy before
unlinking 'tp', however, could open a small race between the grace period
of 'tp' and the grace period of the filter.
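A rough sketch of that ordering, with an invented helper name and
simplified error handling (this is not the actual
tc_ctl_tfilter()/tcf_destroy() code):

#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>
#include <net/sch_generic.h>

/* Illustrative helper: unlink 'tp' from the chain first, then destroy it,
 * and do NOT set tp->root to NULL in the classifier's ->destroy(). */
static void example_unlink_and_destroy(struct tcf_proto __rcu **back,
				       struct tcf_proto *tp)
{
	struct tcf_proto *next = rtnl_dereference(tp->next);

	/* 1) Unlink: after this, BH readers walking the chain via
	 *    rcu_dereference_bh() can no longer reach tp. */
	RCU_INIT_POINTER(*back, next);

	/* 2) Tear down: filters and tp itself are released through
	 *    call_rcu()/kfree_rcu(), so readers that already found tp
	 *    finish their lookup before anything is freed. */
	tp->ops->destroy(tp, true);
	kfree_rcu(tp, rcu);
}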
Another option would be to only call the destroy path from the call_rcu
callback of the 'tp' object, so that destroy is only ever called after the
object is guaranteed to be unlinked from the tc filter lookup path.
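Something along these lines, again with invented names and glossing over
whether every ->destroy() implementation is safe to run from softirq
context:

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <net/sch_generic.h>

/* Illustrative RCU callback: by the time this runs, a grace period has
 * elapsed since 'tp' was unlinked, so no BH reader can still be walking
 * it and ->destroy() cannot race with tc_classify(). */
static void example_tp_destroy_rcu(struct rcu_head *head)
{
	struct tcf_proto *tp = container_of(head, struct tcf_proto, rcu);

	tp->ops->destroy(tp, true);
	kfree(tp);
}

/* Caller, under RTNL, right after unlinking tp from the filter chain: */
static void example_schedule_tp_destroy(struct tcf_proto *tp)
{
	call_rcu(&tp->rcu, example_tp_destroy_rcu);
}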
I think both solutions would be fine.
Cong, were you working on one of these? Or do you have another idea?
>
>> Also, I don't know why you blame my commit; this problem should already
>> exist prior to my commit, probably dating back to John's RCU patches.
>
> It seems so.