Get rid of RCU callbacks in TC filters?

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Get rid of RCU callbacks in TC filters?
@ 2017-10-18 17:36 Cong Wang
  2017-10-18 19:35 ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2017-10-18 17:36 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Chris Mi, Linux Kernel Network Developers, Daniel Borkmann,
	Eric Dumazet, David Miller, Paul E. McKenney, Jiri Pirko

Hi, all

Recently, the RCU callbacks used in TC filters and TC actions keep
drawing my attention, they introduce at least 4 race condition bugs:

1. A simple one fixed by Daniel:

commit c78e1746d3ad7d548bdf3fe491898cc453911a49
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed May 20 17:13:33 2015 +0200

    net: sched: fix call_rcu() race on classifier module unloads

2. A very nasty one fixed by me:

commit 1697c4bb5245649a23f06a144cc38c06715e1b65
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Mon Sep 11 16:33:32 2017 -0700

    net_sched: carefully handle tcf_block_put()

3. Two more bugs found by Chris:
https://patchwork.ozlabs.org/patch/826696/
https://patchwork.ozlabs.org/patch/826695/

Usually RCU callbacks are simple, however for TC filters and actions,
they are complex because at least TC actions could be destroyed
together with the TC filter in one callback. And RCU callbacks are
invoked in BH context, without locking they are parallel too. All of
these contribute to the cause of these nasty bugs. It looks like they
bring us more problems than benefits.

Therefore, I have been thinking about getting rid of these callbacks,
because they are not strictly necessary, callers of these call_rcu()
are all on slow path and have RTNL lock, so blocking is permitted in
their contexts, and _I think_ it does not harm to use
synchronize_rcu() on slow paths, at least I can argue RTNL lock is
already there and is a bottleneck if we really care. :)

There are 3 solutions here:

1) Get rid of these RCU callbacks and use synchronize_rcu(). The
downside is this could hurt the performance of deleting TC filters,
but again it is slow path comparing to skb classification path. Note,
it is _not_ merely replacing call_rcu() with synchronize_rcu(),
because many call_rcu()'s are actually in list iterations, we have to
use a local list and call list_del_rcu()+list_add() before
synchronize_rcu() (Or is there any other API I am not aware of?). If
people really hate synchronize_rcu() because of performance, we could
also defer the work to a workqueue and callers could keep their
performance as they are.

2) Introduce a spinlock to serialize these RCU callbacks. But as I
said in commit 1697c4bb5245 ("net_sched: carefully handle
tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
Potentially we need to do a lot of work to make it possible, if not
impossible.

3) Keep these RCU callbacks and fix all race conditions. Like what
Chris tries to do in his patchset, but my argument is that we can not
prove we are really race-free even with Chris' patches and his patches
are already large enough.

What do you think? Any other ideas?

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Get rid of RCU callbacks in TC filters?
  2017-10-18 17:36 Get rid of RCU callbacks in TC filters? Cong Wang
@ 2017-10-18 19:35 ` Paul E. McKenney
  2017-10-19 15:34   ` John Fastabend
  2017-10-20  3:26   ` Cong Wang
  0 siblings, 2 replies; 9+ messages in thread
From: Paul E. McKenney @ 2017-10-18 19:35 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jamal Hadi Salim, Chris Mi, Linux Kernel Network Developers,
	Daniel Borkmann, Eric Dumazet, David Miller, Jiri Pirko

On Wed, Oct 18, 2017 at 10:36:28AM -0700, Cong Wang wrote:
> Hi, all
> 
> Recently, the RCU callbacks used in TC filters and TC actions keep
> drawing my attention, they introduce at least 4 race condition bugs:
> 
> 1. A simple one fixed by Daniel:
> 
> commit c78e1746d3ad7d548bdf3fe491898cc453911a49
> Author: Daniel Borkmann <daniel@iogearbox.net>
> Date:   Wed May 20 17:13:33 2015 +0200
> 
>     net: sched: fix call_rcu() race on classifier module unloads
> 
> 2. A very nasty one fixed by me:
> 
> commit 1697c4bb5245649a23f06a144cc38c06715e1b65
> Author: Cong Wang <xiyou.wangcong@gmail.com>
> Date:   Mon Sep 11 16:33:32 2017 -0700
> 
>     net_sched: carefully handle tcf_block_put()
> 
> 3. Two more bugs found by Chris:
> https://patchwork.ozlabs.org/patch/826696/
> https://patchwork.ozlabs.org/patch/826695/
> 
> 
> Usually RCU callbacks are simple, however for TC filters and actions,
> they are complex because at least TC actions could be destroyed
> together with the TC filter in one callback. And RCU callbacks are
> invoked in BH context, without locking they are parallel too. All of
> these contribute to the cause of these nasty bugs. It looks like they
> bring us more problems than benefits.
> 
> Therefore, I have been thinking about getting rid of these callbacks,
> because they are not strictly necessary, callers of these call_rcu()
> are all on slow path and have RTNL lock, so blocking is permitted in
> their contexts, and _I think_ it does not harm to use
> synchronize_rcu() on slow paths, at least I can argue RTNL lock is
> already there and is a bottleneck if we really care. :)
> 
> There are 3 solutions here:
> 
> 1) Get rid of these RCU callbacks and use synchronize_rcu(). The
> downside is this could hurt the performance of deleting TC filters,
> but again it is slow path comparing to skb classification path. Note,
> it is _not_ merely replacing call_rcu() with synchronize_rcu(),
> because many call_rcu()'s are actually in list iterations, we have to
> use a local list and call list_del_rcu()+list_add() before
> synchronize_rcu() (Or is there any other API I am not aware of?). If
> people really hate synchronize_rcu() because of performance, we could
> also defer the work to a workqueue and callers could keep their
> performance as they are.
> 
> 2) Introduce a spinlock to serialize these RCU callbacks. But as I
> said in commit 1697c4bb5245 ("net_sched: carefully handle
> tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
> Potentially we need to do a lot of work to make it possible, if not
> impossible.
> 
> 3) Keep these RCU callbacks and fix all race conditions. Like what
> Chris tries to do in his patchset, but my argument is that we can not
> prove we are really race-free even with Chris' patches and his patches
> are already large enough.
> 
> 
> What do you think? Any other ideas?

4) Move from call_rcu() to synchronize_rcu(), but if feasible use one
synchronize_rcu() for multiple deletions/iterations.

5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
The workqueue could then use blocking primitives, for example, acquiring
RTNL.

6) As with #5, have the RCU callback schedule a workqueue, but aggregate
workqueue scheduling using a timer.  This would reduce the number of
RTNL acquisitions.

7) As with #5, have the RCU callback schedule a workqueue, but have each
iterator accumulate a list of things removed and do call_rcu() on the
list.  This is an alternative way of aggregating to reduce the number
of RTNL acquisitions.

There are many other ways to skin this cat.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Get rid of RCU callbacks in TC filters?
  2017-10-18 19:35 ` Paul E. McKenney
@ 2017-10-19 15:34   ` John Fastabend
  2017-10-20  3:15     ` Cong Wang
  2017-10-20  3:26   ` Cong Wang
  1 sibling, 1 reply; 9+ messages in thread
From: John Fastabend @ 2017-10-19 15:34 UTC (permalink / raw)
  To: paulmck, Cong Wang
  Cc: Jamal Hadi Salim, Chris Mi, Linux Kernel Network Developers,
	Daniel Borkmann, Eric Dumazet, David Miller, Jiri Pirko

On 10/18/2017 12:35 PM, Paul E. McKenney wrote:
> On Wed, Oct 18, 2017 at 10:36:28AM -0700, Cong Wang wrote:
>> Hi, all
>>
>> Recently, the RCU callbacks used in TC filters and TC actions keep
>> drawing my attention, they introduce at least 4 race condition bugs:
>>
>> 1. A simple one fixed by Daniel:
>>
>> commit c78e1746d3ad7d548bdf3fe491898cc453911a49
>> Author: Daniel Borkmann <daniel@iogearbox.net>
>> Date:   Wed May 20 17:13:33 2015 +0200
>>
>>     net: sched: fix call_rcu() race on classifier module unloads
>>
>> 2. A very nasty one fixed by me:
>>
>> commit 1697c4bb5245649a23f06a144cc38c06715e1b65
>> Author: Cong Wang <xiyou.wangcong@gmail.com>
>> Date:   Mon Sep 11 16:33:32 2017 -0700
>>
>>     net_sched: carefully handle tcf_block_put()
>>
>> 3. Two more bugs found by Chris:
>> https://patchwork.ozlabs.org/patch/826696/
>> https://patchwork.ozlabs.org/patch/826695/
>>
>>
>> Usually RCU callbacks are simple, however for TC filters and actions,
>> they are complex because at least TC actions could be destroyed
>> together with the TC filter in one callback. And RCU callbacks are
>> invoked in BH context, without locking they are parallel too. All of
>> these contribute to the cause of these nasty bugs. It looks like they
>> bring us more problems than benefits.
>>
>> Therefore, I have been thinking about getting rid of these callbacks,
>> because they are not strictly necessary, callers of these call_rcu()
>> are all on slow path and have RTNL lock, so blocking is permitted in
>> their contexts, and _I think_ it does not harm to use
>> synchronize_rcu() on slow paths, at least I can argue RTNL lock is
>> already there and is a bottleneck if we really care. :)
>>
>> There are 3 solutions here:
>>
>> 1) Get rid of these RCU callbacks and use synchronize_rcu(). The
>> downside is this could hurt the performance of deleting TC filters,
>> but again it is slow path comparing to skb classification path. Note,
>> it is _not_ merely replacing call_rcu() with synchronize_rcu(),
>> because many call_rcu()'s are actually in list iterations, we have to
>> use a local list and call list_del_rcu()+list_add() before
>> synchronize_rcu() (Or is there any other API I am not aware of?). If
>> people really hate synchronize_rcu() because of performance, we could
>> also defer the work to a workqueue and callers could keep their
>> performance as they are.
>>
>> 2) Introduce a spinlock to serialize these RCU callbacks. But as I
>> said in commit 1697c4bb5245 ("net_sched: carefully handle
>> tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
>> Potentially we need to do a lot of work to make it possible, if not
>> impossible.
>>
>> 3) Keep these RCU callbacks and fix all race conditions. Like what
>> Chris tries to do in his patchset, but my argument is that we can not
>> prove we are really race-free even with Chris' patches and his patches
>> are already large enough.
>>

My take on this would be to stay with the current RCU callbacks and try
to simplify the implementation. Falling back to sync operations seems
like a step backwards to me. I know update/delete of filters is currently
a pain point for some use cases so getting the RTNL out of the way may
become a requirement to support those (alternatively maybe batching is
good enough).

I guess at a high level with Cris' patches actions are now doing reference
counting correctly. If shared filters also do reference counting similarly
we should be OK right? (yes I know simplifying maybe too much to be
meaningful)

Are we aware of any outstanding problem areas?

>>
>> What do you think? Any other ideas?
> 
> 4) Move from call_rcu() to synchronize_rcu(), but if feasible use one
> synchronize_rcu() for multiple deletions/iterations.
> 
> 5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
> The workqueue could then use blocking primitives, for example, acquiring
> RTNL.
> 
> 6) As with #5, have the RCU callback schedule a workqueue, but aggregate
> workqueue scheduling using a timer.  This would reduce the number of
> RTNL acquisitions.
> 
> 7) As with #5, have the RCU callback schedule a workqueue, but have each
> iterator accumulate a list of things removed and do call_rcu() on the
> list.  This is an alternative way of aggregating to reduce the number
> of RTNL acquisitions.
> 
> There are many other ways to skin this cat.
> 
> 							Thanx, Paul
> 
Batching will probably be needed soon regardless for the hardware offload
folks. 

Thanks,
John

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Get rid of RCU callbacks in TC filters?
  2017-10-19 15:34   ` John Fastabend
@ 2017-10-20  3:15     ` Cong Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2017-10-20  3:15 UTC (permalink / raw)
  To: John Fastabend
  Cc: Paul E. McKenney, Jamal Hadi Salim, Chris Mi,
	Linux Kernel Network Developers, Daniel Borkmann, Eric Dumazet,
	David Miller, Jiri Pirko

On Thu, Oct 19, 2017 at 8:34 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
>
> My take on this would be to stay with the current RCU callbacks and try
> to simplify the implementation. Falling back to sync operations seems
> like a step backwards to me. I know update/delete of filters is currently
> a pain point for some use cases so getting the RTNL out of the way may
> become a requirement to support those (alternatively maybe batching is
> good enough).

For me it looks like very hard to make tc action destroy code completely
race-free in RCU callbacks, at least looks harder than getting rid of
RCU callbacks.

>
> I guess at a high level with Cris' patches actions are now doing reference
> counting correctly. If shared filters also do reference counting similarly
> we should be OK right? (yes I know simplifying maybe too much to be
> meaningful)

I don't know what you mean by "doing reference counting correctly",
if you mean making them atomic, as I already explained to Chris, it
is not necessary at all if we remove RCU callbacks. Refcnt doesn't
have to be atomic if it is always serialized with a lock.

>
> Are we aware of any outstanding problem areas?
>

Potentially many problems, since tc action destroy code could be
called either with a RTNL lock (fine) or in a RCU callback without
RTNL lock (buggy), these two paths race with each other and RCU
callbacks race among themselves too.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Get rid of RCU callbacks in TC filters?
  2017-10-18 19:35 ` Paul E. McKenney
  2017-10-19 15:34   ` John Fastabend
@ 2017-10-20  3:26   ` Cong Wang
  2017-10-20 16:56     ` Paul E. McKenney
  1 sibling, 1 reply; 9+ messages in thread
From: Cong Wang @ 2017-10-20  3:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jamal Hadi Salim, Chris Mi, Linux Kernel Network Developers,
	Daniel Borkmann, Eric Dumazet, David Miller, Jiri Pirko

On Wed, Oct 18, 2017 at 12:35 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Oct 18, 2017 at 10:36:28AM -0700, Cong Wang wrote:
>> Hi, all
>>
>> Recently, the RCU callbacks used in TC filters and TC actions keep
>> drawing my attention, they introduce at least 4 race condition bugs:
>>
>> 1. A simple one fixed by Daniel:
>>
>> commit c78e1746d3ad7d548bdf3fe491898cc453911a49
>> Author: Daniel Borkmann <daniel@iogearbox.net>
>> Date:   Wed May 20 17:13:33 2015 +0200
>>
>>     net: sched: fix call_rcu() race on classifier module unloads
>>
>> 2. A very nasty one fixed by me:
>>
>> commit 1697c4bb5245649a23f06a144cc38c06715e1b65
>> Author: Cong Wang <xiyou.wangcong@gmail.com>
>> Date:   Mon Sep 11 16:33:32 2017 -0700
>>
>>     net_sched: carefully handle tcf_block_put()
>>
>> 3. Two more bugs found by Chris:
>> https://patchwork.ozlabs.org/patch/826696/
>> https://patchwork.ozlabs.org/patch/826695/
>>
>>
>> Usually RCU callbacks are simple, however for TC filters and actions,
>> they are complex because at least TC actions could be destroyed
>> together with the TC filter in one callback. And RCU callbacks are
>> invoked in BH context, without locking they are parallel too. All of
>> these contribute to the cause of these nasty bugs. It looks like they
>> bring us more problems than benefits.
>>
>> Therefore, I have been thinking about getting rid of these callbacks,
>> because they are not strictly necessary, callers of these call_rcu()
>> are all on slow path and have RTNL lock, so blocking is permitted in
>> their contexts, and _I think_ it does not harm to use
>> synchronize_rcu() on slow paths, at least I can argue RTNL lock is
>> already there and is a bottleneck if we really care. :)
>>
>> There are 3 solutions here:
>>
>> 1) Get rid of these RCU callbacks and use synchronize_rcu(). The
>> downside is this could hurt the performance of deleting TC filters,
>> but again it is slow path comparing to skb classification path. Note,
>> it is _not_ merely replacing call_rcu() with synchronize_rcu(),
>> because many call_rcu()'s are actually in list iterations, we have to
>> use a local list and call list_del_rcu()+list_add() before
>> synchronize_rcu() (Or is there any other API I am not aware of?). If
>> people really hate synchronize_rcu() because of performance, we could
>> also defer the work to a workqueue and callers could keep their
>> performance as they are.
>>
>> 2) Introduce a spinlock to serialize these RCU callbacks. But as I
>> said in commit 1697c4bb5245 ("net_sched: carefully handle
>> tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
>> Potentially we need to do a lot of work to make it possible, if not
>> impossible.
>>
>> 3) Keep these RCU callbacks and fix all race conditions. Like what
>> Chris tries to do in his patchset, but my argument is that we can not
>> prove we are really race-free even with Chris' patches and his patches
>> are already large enough.
>>
>>
>> What do you think? Any other ideas?
>
> 4) Move from call_rcu() to synchronize_rcu(), but if feasible use one
> synchronize_rcu() for multiple deletions/iterations.

This is what I meant by using a local list, perhaps I didn't make it clear.


>
> 5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
> The workqueue could then use blocking primitives, for example, acquiring
> RTNL.

Yeah, this could work too but we would get one more async...

filter delete -> call_rcu() -> schedule_work() -> action destroy



>
> 6) As with #5, have the RCU callback schedule a workqueue, but aggregate
> workqueue scheduling using a timer.  This would reduce the number of
> RTNL acquisitions.

Ouch, sounds like even one more async:

filter delete -> call_rcu() -> schedule_work() -> timer -> flush_work()
-> action destroy

:-(


>
> 7) As with #5, have the RCU callback schedule a workqueue, but have each
> iterator accumulate a list of things removed and do call_rcu() on the
> list.  This is an alternative way of aggregating to reduce the number
> of RTNL acquisitions.


Yeah, this seems working too.


>
> There are many other ways to skin this cat.
>

We still have to pick one. :) Any preference? I want to keep it as simple
as possible, otherwise some day I would not understand it either.

Thanks for all the ideas!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Get rid of RCU callbacks in TC filters?
  2017-10-20  3:26   ` Cong Wang
@ 2017-10-20 16:56     ` Paul E. McKenney
  2017-10-20 20:31       ` Cong Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2017-10-20 16:56 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jamal Hadi Salim, Chris Mi, Linux Kernel Network Developers,
	Daniel Borkmann, Eric Dumazet, David Miller, Jiri Pirko

On Thu, Oct 19, 2017 at 08:26:01PM -0700, Cong Wang wrote:
> On Wed, Oct 18, 2017 at 12:35 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Wed, Oct 18, 2017 at 10:36:28AM -0700, Cong Wang wrote:
> >> Hi, all
> >>
> >> Recently, the RCU callbacks used in TC filters and TC actions keep
> >> drawing my attention, they introduce at least 4 race condition bugs:
> >>
> >> 1. A simple one fixed by Daniel:
> >>
> >> commit c78e1746d3ad7d548bdf3fe491898cc453911a49
> >> Author: Daniel Borkmann <daniel@iogearbox.net>
> >> Date:   Wed May 20 17:13:33 2015 +0200
> >>
> >>     net: sched: fix call_rcu() race on classifier module unloads
> >>
> >> 2. A very nasty one fixed by me:
> >>
> >> commit 1697c4bb5245649a23f06a144cc38c06715e1b65
> >> Author: Cong Wang <xiyou.wangcong@gmail.com>
> >> Date:   Mon Sep 11 16:33:32 2017 -0700
> >>
> >>     net_sched: carefully handle tcf_block_put()
> >>
> >> 3. Two more bugs found by Chris:
> >> https://patchwork.ozlabs.org/patch/826696/
> >> https://patchwork.ozlabs.org/patch/826695/
> >>
> >>
> >> Usually RCU callbacks are simple, however for TC filters and actions,
> >> they are complex because at least TC actions could be destroyed
> >> together with the TC filter in one callback. And RCU callbacks are
> >> invoked in BH context, without locking they are parallel too. All of
> >> these contribute to the cause of these nasty bugs. It looks like they
> >> bring us more problems than benefits.
> >>
> >> Therefore, I have been thinking about getting rid of these callbacks,
> >> because they are not strictly necessary, callers of these call_rcu()
> >> are all on slow path and have RTNL lock, so blocking is permitted in
> >> their contexts, and _I think_ it does not harm to use
> >> synchronize_rcu() on slow paths, at least I can argue RTNL lock is
> >> already there and is a bottleneck if we really care. :)
> >>
> >> There are 3 solutions here:
> >>
> >> 1) Get rid of these RCU callbacks and use synchronize_rcu(). The
> >> downside is this could hurt the performance of deleting TC filters,
> >> but again it is slow path comparing to skb classification path. Note,
> >> it is _not_ merely replacing call_rcu() with synchronize_rcu(),
> >> because many call_rcu()'s are actually in list iterations, we have to
> >> use a local list and call list_del_rcu()+list_add() before
> >> synchronize_rcu() (Or is there any other API I am not aware of?). If
> >> people really hate synchronize_rcu() because of performance, we could
> >> also defer the work to a workqueue and callers could keep their
> >> performance as they are.
> >>
> >> 2) Introduce a spinlock to serialize these RCU callbacks. But as I
> >> said in commit 1697c4bb5245 ("net_sched: carefully handle
> >> tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
> >> Potentially we need to do a lot of work to make it possible, if not
> >> impossible.
> >>
> >> 3) Keep these RCU callbacks and fix all race conditions. Like what
> >> Chris tries to do in his patchset, but my argument is that we can not
> >> prove we are really race-free even with Chris' patches and his patches
> >> are already large enough.
> >>
> >>
> >> What do you think? Any other ideas?
> >
> > 4) Move from call_rcu() to synchronize_rcu(), but if feasible use one
> > synchronize_rcu() for multiple deletions/iterations.
> 
> This is what I meant by using a local list, perhaps I didn't make it clear.

Ah, got it.

> > 5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
> > The workqueue could then use blocking primitives, for example, acquiring
> > RTNL.
> 
> Yeah, this could work too but we would get one more async...
> 
> filter delete -> call_rcu() -> schedule_work() -> action destroy

True, but on the other hand you get to hold RTNL.

> > 6) As with #5, have the RCU callback schedule a workqueue, but aggregate
> > workqueue scheduling using a timer.  This would reduce the number of
> > RTNL acquisitions.
> 
> Ouch, sounds like even one more async:
> 
> filter delete -> call_rcu() -> schedule_work() -> timer -> flush_work()
> -> action destroy
> 
> :-(

Indeed, the price of scalability and performance is often added
asynchronous action at a distance.  But sometimes you can have
scalability, performance, -and- synchronous action.  Not sure that this
is one of those cases, but perhaps someone will come up with some trick
that we are not yet seeing.

And again, one benefit you get from the added asynchrony is the ability
to acquire RTNL.  Another is increased batching, allowing the overhead
of acquiring RTNL to be amortized over a larger number of updates.

> > 7) As with #5, have the RCU callback schedule a workqueue, but have each
> > iterator accumulate a list of things removed and do call_rcu() on the
> > list.  This is an alternative way of aggregating to reduce the number
> > of RTNL acquisitions.
> 
> Yeah, this seems working too.
> 
> > There are many other ways to skin this cat.
> 
> We still have to pick one. :) Any preference? I want to keep it as simple
> as possible, otherwise some day I would not understand it either.

I must defer to the people who actually fully understand this code.

> Thanks for all the ideas!

Please let me know how it goes!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Get rid of RCU callbacks in TC filters?
  2017-10-20 16:56     ` Paul E. McKenney
@ 2017-10-20 20:31       ` Cong Wang
  2017-10-20 20:52         ` Cong Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2017-10-20 20:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jamal Hadi Salim, Chris Mi, Linux Kernel Network Developers,
	Daniel Borkmann, Eric Dumazet, David Miller, Jiri Pirko

On Fri, Oct 20, 2017 at 9:56 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, Oct 19, 2017 at 08:26:01PM -0700, Cong Wang wrote:
>> On Wed, Oct 18, 2017 at 12:35 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > 5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
>> > The workqueue could then use blocking primitives, for example, acquiring
>> > RTNL.
>>
>> Yeah, this could work too but we would get one more async...
>>
>> filter delete -> call_rcu() -> schedule_work() -> action destroy
>
> True, but on the other hand you get to hold RTNL.

I can get RTNL too with converting call_rcu() to synchronize_rcu().
;)

So this turns into the question again: if we mind synchronize_rcu()
on slow paths or not?

Actually, I just tried this approach, this way makes the core tc filter
code harder to wait for flying callbacks, currently rcu_barrier() is
enough, with one more schedule_work() added we probably
need flush_workqueue()... Huh, this also means I can't use the
global workqueue so should add a new workqueue for tc filters.

Good news is I seem to make it work without adding much code.
Stay tuned. ;)

>
>> > 6) As with #5, have the RCU callback schedule a workqueue, but aggregate
>> > workqueue scheduling using a timer.  This would reduce the number of
>> > RTNL acquisitions.
>>
>> Ouch, sounds like even one more async:
>>
>> filter delete -> call_rcu() -> schedule_work() -> timer -> flush_work()
>> -> action destroy
>>
>> :-(
>
> Indeed, the price of scalability and performance is often added
> asynchronous action at a distance.  But sometimes you can have
> scalability, performance, -and- synchronous action.  Not sure that this
> is one of those cases, but perhaps someone will come up with some trick
> that we are not yet seeing.
>
> And again, one benefit you get from the added asynchrony is the ability
> to acquire RTNL.  Another is increased batching, allowing the overhead
> of acquiring RTNL to be amortized over a larger number of updates.

Understood, my point is it might not be worthy to optimize a slow path
which already has RTNL lock...


>
>> > 7) As with #5, have the RCU callback schedule a workqueue, but have each
>> > iterator accumulate a list of things removed and do call_rcu() on the
>> > list.  This is an alternative way of aggregating to reduce the number
>> > of RTNL acquisitions.
>>
>> Yeah, this seems working too.
>>
>> > There are many other ways to skin this cat.
>>
>> We still have to pick one. :) Any preference? I want to keep it as simple
>> as possible, otherwise some day I would not understand it either.
>
> I must defer to the people who actually fully understand this code.

I understand that code, just not sure about which approach I should
pick.

I will keep you Cc'ed for any further change I make.

Thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Get rid of RCU callbacks in TC filters?
  2017-10-20 20:31       ` Cong Wang
@ 2017-10-20 20:52         ` Cong Wang
  2017-10-23 11:10           ` David Laight
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2017-10-20 20:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jamal Hadi Salim, Chris Mi, Linux Kernel Network Developers,
	Daniel Borkmann, Eric Dumazet, David Miller, Jiri Pirko

On Fri, Oct 20, 2017 at 1:31 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> Actually, I just tried this approach, this way makes the core tc filter
> code harder to wait for flying callbacks, currently rcu_barrier() is
> enough, with one more schedule_work() added we probably
> need flush_workqueue()... Huh, this also means I can't use the
> global workqueue so should add a new workqueue for tc filters.
>
> Good news is I seem to make it work without adding much code.
> Stay tuned. ;)

Hmm, lockdep already complains I have a deadlock when flushing
workqueue while hodling RTNL lock, because callbacks need RTNL
too... See the rcu_barrier() in tcf_block_put().

So this approach seems to be a dead end. Is there any way out?

Probably we have to convert call_rcu() to synchronize_rcu(),
but with batching of course.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Get rid of RCU callbacks in TC filters?
  2017-10-20 20:52         ` Cong Wang
@ 2017-10-23 11:10           ` David Laight
  0 siblings, 0 replies; 9+ messages in thread
From: David Laight @ 2017-10-23 11:10 UTC (permalink / raw)
  To: 'Cong Wang', Paul E. McKenney
  Cc: Jamal Hadi Salim, Chris Mi, Linux Kernel Network Developers,
	Daniel Borkmann, Eric Dumazet, David Miller, Jiri Pirko

From: Cong Wang
> Sent: 20 October 2017 21:52
> To: Paul E. McKenney
> Cc: Jamal Hadi Salim; Chris Mi; Linux Kernel Network Developers; Daniel Borkmann; Eric Dumazet; David
> Miller; Jiri Pirko
> Subject: Re: Get rid of RCU callbacks in TC filters?
> 
> On Fri, Oct 20, 2017 at 1:31 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > Actually, I just tried this approach, this way makes the core tc filter
> > code harder to wait for flying callbacks, currently rcu_barrier() is
> > enough, with one more schedule_work() added we probably
> > need flush_workqueue()... Huh, this also means I can't use the
> > global workqueue so should add a new workqueue for tc filters.
> >
> > Good news is I seem to make it work without adding much code.
> > Stay tuned. ;)
> 
> Hmm, lockdep already complains I have a deadlock when flushing
> workqueue while hodling RTNL lock, because callbacks need RTNL
> too... See the rcu_barrier() in tcf_block_put().
> 
> So this approach seems to be a dead end. Is there any way out?
> 
> Probably we have to convert call_rcu() to synchronize_rcu(),
> but with batching of course.

Locking for callbacks is always a PITA.
You also need to sort out how to 'unhook' the callbacks.

If you require the caller hold the rntl_lock prior to adding
or removing callbacks, and guarantee that the functions will
be called with the rntl_lock held you might be ok.

Otherwise the one side can't do its 'expected' locking without
a potential deadlock or use-after-free.

The only way I know to avoid the issues removing the callbacks
is to get the callback function called from its normal scheduling
point with an argument (probably NULL) that indicates it will
never be called again.

	David


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-10-23 11:10 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-18 17:36 Get rid of RCU callbacks in TC filters? Cong Wang
2017-10-18 19:35 ` Paul E. McKenney
2017-10-19 15:34   ` John Fastabend
2017-10-20  3:15     ` Cong Wang
2017-10-20  3:26   ` Cong Wang
2017-10-20 16:56     ` Paul E. McKenney
2017-10-20 20:31       ` Cong Wang
2017-10-20 20:52         ` Cong Wang
2017-10-23 11:10           ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).