From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Miller <davem@davemloft.net>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Jamal Hadi Salim <jhs@mojatatu.com>,
John Fastabend <john.fastabend@gmail.com>,
Chris Mi <chrism@mellanox.com>
Subject: Re: [Patch net 00/15] net_sched: remove RCU callbacks from TC
Date: Wed, 25 Oct 2017 17:19:16 -0700 [thread overview]
Message-ID: <20171026001916.GU3659@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAM_iQpVLHLx9eVgO0S--iq-XkhZE82Y30Z0REwU6qx8OGgm+mA@mail.gmail.com>
On Wed, Oct 25, 2017 at 03:37:40PM -0700, Cong Wang wrote:
> On Tue, Oct 24, 2017 at 6:43 PM, David Miller <davem@davemloft.net> wrote:
> > From: Cong Wang <xiyou.wangcong@gmail.com>
> > Date: Mon, 23 Oct 2017 15:02:49 -0700
> >
> >> Recently, the RCU callbacks used in TC filters and TC actions keep
> >> drawing my attention, they introduce at least 4 race condition bugs:
> >
> > Like Eric, I think doing a full RCU sync on every delete is too big
> > a pill to swallow. This is a major control plane performance
> > regression.
> >
> > Please find another reasonable way to fix this.
> >
>
> Alright... I finally find a way to make everyone happy.
>
> My solution is introducing a workqueue for tc filters
> and let each RCU callback defer the work to this
> workqueue. I solve the flush_workqueue() deadlock
> by queuing another work in the same workqueue
> at the end, so the execution order should be as same
> as it is now. The ugly part is now tcf_block_put() which
> looks like below:
>
>
> static void tcf_block_put_final(struct work_struct *work)
> {
> struct tcf_block *block = container_of(work, struct tcf_block, work);
> struct tcf_chain *chain, *tmp;
>
> /* At this point, all the chains should have refcnt == 1. */
> rtnl_lock();
> list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
> tcf_chain_put(chain);
> rtnl_unlock();
> kfree(block);
> }
I am guessing that tcf_chain_put() sometimes does a call_rcu(),
and the callback function in turn calls schedule_work(), and that
tcf_block_put_deferred() is the workqueue handler function.
> static void tcf_block_put_deferred(struct work_struct *work)
> {
> struct tcf_block *block = container_of(work, struct tcf_block, work);
> struct tcf_chain *chain;
>
> rtnl_lock();
> /* Hold a refcnt for all chains, except 0, in case they are gone. */
> list_for_each_entry(chain, &block->chain_list, list)
> if (chain->index)
> tcf_chain_hold(chain);
>
> /* No race on the list, because no chain could be destroyed. */
> list_for_each_entry(chain, &block->chain_list, list)
> tcf_chain_flush(chain);
>
> INIT_WORK(&block->work, tcf_block_put_final);
> /* Wait for RCU callbacks to release the reference count and make
> * sure their works have been queued before this.
> */
> rcu_barrier();
This one can take awhile... Though in recent kernels it will often
be a bit faster than synchronize_rcu().
Note that rcu_barrier() only waits for callbacks posted via call_rcu()
before the rcu_barrier() starts, if that matters.
> tcf_queue_work(&block->work);
> rtnl_unlock();
> }
And it looks like tcf_block_put_deferred() queues itself as work as well.
Or maybe instead?
> void tcf_block_put(struct tcf_block *block)
> {
> if (!block)
> return;
>
> INIT_WORK(&block->work, tcf_block_put_deferred);
> /* Wait for existing RCU callbacks to cool down, make sure their works
> * have been queued before this. We can not flush pending works here
> * because we are holding the RTNL lock.
> */
> rcu_barrier();
> tcf_queue_work(&block->work);
> }
>
>
> Paul, does this make any sense to you? ;)
would be surprised if I fully understand the problem to be solved,
but my current guess is that the constraints are as follows:
1. Things removed must be processed in order.
2. Things removes must not be processed until a grace period
has elapsed.
3. Things being processed after a grace period should not be
processed concurrently with each other or with subsequent
removals.
4. A given removal is not finalized until its reference count
reaches zero.
5. RTNL might not be held when the reference count reaches zero.
Or did I lose the thread somewhere?
Thanx, Paul
next prev parent reply other threads:[~2017-10-26 0:19 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-23 22:02 [Patch net 00/15] net_sched: remove RCU callbacks from TC Cong Wang
2017-10-23 22:02 ` [Patch net 01/15] net_sched: remove RCU callbacks in basic filter Cong Wang
2017-10-23 22:02 ` [Patch net 02/15] net_sched: remove RCU callbacks in bpf filter Cong Wang
2017-10-23 22:02 ` [Patch net 03/15] net_sched: remove RCU callbacks in flower filter Cong Wang
2017-10-23 22:02 ` [Patch net 04/15] net_sched: remove RCU callbacks in matchall filter Cong Wang
2017-10-23 22:02 ` [Patch net 05/15] net_sched: remove RCU callbacks in cgroup filter Cong Wang
2017-10-23 22:02 ` [Patch net 06/15] net_sched: remove RCU callbacks in rsvp filter Cong Wang
2017-10-23 22:02 ` [Patch net 07/15] net_sched: remove RCU callbacks in flow filter Cong Wang
2017-10-23 22:02 ` [Patch net 08/15] net_sched: remove RCU callbacks in tcindex filter Cong Wang
2017-10-23 22:02 ` [Patch net 09/15] net_sched: remove RCU callbacks in u32 filter Cong Wang
2017-10-23 22:02 ` [Patch net 10/15] net_sched: remove RCU callbacks in fw filter Cong Wang
2017-10-23 22:03 ` [Patch net 11/15] net_sched: remove RCU callbacks in route filter Cong Wang
2017-10-23 22:03 ` [Patch net 12/15] net_sched: remove RCU callbacks in sample action Cong Wang
2017-10-23 22:03 ` [Patch net 13/15] net_sched: add rtnl assertion to tcf_exts_destroy() Cong Wang
2017-10-23 22:03 ` [Patch net 14/15] selftests: Introduce a new script to generate tc batch file Cong Wang
2017-10-24 4:56 ` Chris Mi
2017-10-23 22:03 ` [Patch net 15/15] selftests: Introduce a new test case to tc testsuite Cong Wang
2017-10-23 23:16 ` [Patch net 00/15] net_sched: remove RCU callbacks from TC Eric Dumazet
2017-10-23 23:23 ` Cong Wang
2017-10-23 23:31 ` Eric Dumazet
2017-10-25 20:46 ` Cong Wang
2017-10-25 1:43 ` David Miller
2017-10-25 22:37 ` Cong Wang
2017-10-26 0:19 ` Paul E. McKenney [this message]
2017-10-26 4:49 ` Cong Wang
2017-10-26 13:58 ` Paul E. McKenney
2017-10-26 19:10 ` Cong Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171026001916.GU3659@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=chrism@mellanox.com \
--cc=davem@davemloft.net \
--cc=jhs@mojatatu.com \
--cc=john.fastabend@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).