From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
Chris Mi <chrism@mellanox.com>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Eric Dumazet <edumazet@google.com>,
David Miller <davem@davemloft.net>, Jiri Pirko <jiri@resnulli.us>
Subject: Re: Get rid of RCU callbacks in TC filters?
Date: Wed, 18 Oct 2017 12:35:48 -0700 [thread overview]
Message-ID: <20171018193548.GM3521@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAM_iQpU1hF9v1+bek9CQkdbjS_2nctwpaSXyiNLiBHPC6WEYKQ@mail.gmail.com>
On Wed, Oct 18, 2017 at 10:36:28AM -0700, Cong Wang wrote:
> Hi, all
>
> Recently, the RCU callbacks used in TC filters and TC actions keep
> drawing my attention; they have introduced at least 4 race condition bugs:
>
> 1. A simple one fixed by Daniel:
>
> commit c78e1746d3ad7d548bdf3fe491898cc453911a49
> Author: Daniel Borkmann <daniel@iogearbox.net>
> Date: Wed May 20 17:13:33 2015 +0200
>
> net: sched: fix call_rcu() race on classifier module unloads
>
> 2. A very nasty one fixed by me:
>
> commit 1697c4bb5245649a23f06a144cc38c06715e1b65
> Author: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Mon Sep 11 16:33:32 2017 -0700
>
> net_sched: carefully handle tcf_block_put()
>
> 3. Two more bugs found by Chris:
> https://patchwork.ozlabs.org/patch/826696/
> https://patchwork.ozlabs.org/patch/826695/
>
>
> Usually RCU callbacks are simple; however, for TC filters and actions
> they are complex, because at least TC actions can be destroyed together
> with the TC filter in one callback. RCU callbacks are also invoked in
> BH context and, without locking, they run in parallel. All of this
> contributes to these nasty bugs. It looks like the callbacks bring us
> more problems than benefits.
>
> Therefore, I have been thinking about getting rid of these callbacks,
> because they are not strictly necessary: the callers of these
> call_rcu()'s are all on the slow path and hold the RTNL lock, so
> blocking is permitted in their contexts, and _I think_ it does no harm
> to use synchronize_rcu() on slow paths. At the very least I can argue
> that the RTNL lock is already there and is already the bottleneck if
> we really care. :)
>
> There are 3 solutions here:
>
> 1) Get rid of these RCU callbacks and use synchronize_rcu(). The
> downside is that this could hurt the performance of deleting TC
> filters, but again that is the slow path compared with the skb
> classification path. Note, it is _not_ merely replacing call_rcu()
> with synchronize_rcu(), because many call_rcu()'s are actually inside
> list iterations; we would have to move the entries onto a local list
> with list_del_rcu()+list_add() before the synchronize_rcu() (or is
> there any other API I am not aware of?). If people really hate
> synchronize_rcu() because of its performance, we could also defer the
> work to a workqueue so that callers keep their current performance.
>
> 2) Introduce a spinlock to serialize these RCU callbacks. But as I
> said in commit 1697c4bb5245 ("net_sched: carefully handle
> tcf_block_put()"), this is very hard to do because of tcf_chain_dump().
> It would potentially take a lot of work to make it possible, if it is
> possible at all.
>
> 3) Keep these RCU callbacks and fix all the race conditions, which is
> what Chris tries to do in his patchset. My argument is that we cannot
> prove we are really race-free even with Chris's patches, and his
> patches are already large enough.
>
>
> What do you think? Any other ideas?

4) Move from call_rcu() to synchronize_rcu(), but if feasible use one
synchronize_rcu() for multiple deletions/iterations. (A rough sketch of
this batching pattern appears below.)

5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
The workqueue could then use blocking primitives, for example, acquiring
RTNL. (Also sketched below.)

6) As with #5, have the RCU callback schedule a workqueue, but aggregate
workqueue scheduling using a timer. This would reduce the number of
RTNL acquisitions.

7) As with #5, have the RCU callback schedule a workqueue, but have each
iterator accumulate a list of things removed and do call_rcu() on the
list. This is an alternative way of aggregating to reduce the number
of RTNL acquisitions.
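
For concreteness, a rough sketch of the local-list batching that option 1
above (and #4 here) describes. The names foo_filter and foo_destroy_all
are made up for illustration rather than taken from the actual net/sched
code, and the sketch assumes the filters hang on an RTNL-protected,
RCU-traversed list; a second, private list_head collects the unlinked
entries so the RCU-visible linkage is never reused before the grace
period:

struct foo_filter {
	struct list_head	list;		/* RCU-visible chain linkage */
	struct list_head	free_list;	/* private, destroy-time only */
	/* ... classifier-specific state ... */
};

static void foo_destroy_all(struct list_head *filters)
{
	struct foo_filter *f, *tmp;
	LIST_HEAD(to_free);

	ASSERT_RTNL();

	/*
	 * Unlink from the RCU-protected list.  list_del_rcu() leaves
	 * ->list.next intact, so concurrent readers still walk through
	 * to the old list head; the private ->free_list linkage is what
	 * keeps the unlinked entries findable here.
	 */
	list_for_each_entry_safe(f, tmp, filters, list) {
		list_del_rcu(&f->list);
		list_add(&f->free_list, &to_free);
	}

	/* One grace period covers every filter unlinked above. */
	synchronize_rcu();

	/* No reader can still hold a reference; free synchronously. */
	list_for_each_entry_safe(f, tmp, &to_free, free_list)
		kfree(f);
}

If the synchronize_rcu() latency is still a concern, the same function
could be handed to a workqueue, as option 1 already suggests.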
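
Similarly, a hypothetical sketch of #5, where the RCU callback stays
trivial and only queues a work item, so the real teardown can block and
take RTNL in process context (#6 and #7 are aggregation refinements of
the same idea):

struct foo_filter {
	struct list_head	list;	/* RCU-visible chain linkage */
	struct rcu_head		rcu;
	struct work_struct	work;
	/* ... classifier-specific state ... */
};

static void foo_free_work(struct work_struct *work)
{
	struct foo_filter *f = container_of(work, struct foo_filter, work);

	/* Process context: blocking is fine here. */
	rtnl_lock();
	/* ... release actions and other state that needs RTNL ... */
	rtnl_unlock();
	kfree(f);
}

static void foo_free_rcu(struct rcu_head *head)
{
	struct foo_filter *f = container_of(head, struct foo_filter, rcu);

	/* BH context: nothing blocking, just defer to a workqueue. */
	INIT_WORK(&f->work, foo_free_work);
	schedule_work(&f->work);
}

/* Deletion path, called under RTNL. */
static void foo_delete(struct foo_filter *f)
{
	list_del_rcu(&f->list);
	call_rcu(&f->rcu, foo_free_rcu);
}

The rcu_head and the work_struct could share storage if the extra size
matters, and a dedicated (possibly ordered) workqueue could replace
schedule_work() if the teardowns need to be serialized against each
other.
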
There are many other ways to skin this cat.
Thanx, Paul

Thread overview: 9+ messages
2017-10-18 17:36 Get rid of RCU callbacks in TC filters? Cong Wang
2017-10-18 19:35 ` Paul E. McKenney [this message]
2017-10-19 15:34 ` John Fastabend
2017-10-20 3:15 ` Cong Wang
2017-10-20 3:26 ` Cong Wang
2017-10-20 16:56 ` Paul E. McKenney
2017-10-20 20:31 ` Cong Wang
2017-10-20 20:52 ` Cong Wang
2017-10-23 11:10 ` David Laight