netdev.vger.kernel.org archive mirror
* [RFC net 0/1] net: sched: act: fix rcu race
@ 2017-10-10 12:32 Alexander Aring
  2017-10-10 12:32 ` [RFC net 1/1] net: sched: act: fix rcu race in dump Alexander Aring
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Aring @ 2017-10-10 12:32 UTC (permalink / raw)
  To: jhs; +Cc: xiyou.wangcong, jiri, netdev, kurup.manish, bjb, Alexander Aring

Hi,

while reading the tc action code to debug an "it does not work" report, I
think I found issues with the current rcu handling of tc actions.

skbmod is not the only action that gets this wrong. If somebody agrees
with me here, I will send more patches that fix this behaviour in the other
tc actions where the code was just copy&pasted.

I think the reason nobody hits this issue is that dump does a lot of other
work first, which takes more time than a rcu_synchronize. Anyway, this
change should avoid any use-after-free issues etc.

- Alex

Alexander Aring (1):
  net: sched: act: fix rcu race in dump

 net/sched/act_skbmod.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.11.0

* [RFC net 1/1] net: sched: act: fix rcu race in dump
  2017-10-10 12:32 [RFC net 0/1] net: sched: act: fix rcu race Alexander Aring
@ 2017-10-10 12:32 ` Alexander Aring
  2017-10-10 12:39   ` Alexander Aring
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Alexander Aring @ 2017-10-10 12:32 UTC (permalink / raw)
  To: jhs; +Cc: xiyou.wangcong, jiri, netdev, kurup.manish, bjb, Alexander Aring

This patch fixes an issue with kfree_rcu, which is not protected by the RTNL
lock. The currently assigned rcu pointer could be freed by kfree_rcu while
the dump callback is running.

To prevent this, we call rcu_synchronize first. Then we are sure that all
recent rcu operations, e.g. rcu_assign_pointer and kfree_rcu in init, are
done. After rcu_synchronize we dereference under the RTNL lock, which is
also held in the init function, which means no other rcu_assign_pointer or
kfree_rcu will occur.

Calling rcu_synchronize will also prevent weird behaviour when doing the
following over netlink:

 - set params A
 - set params B
 - dump params
  \--> will dump params A

This could happen in the unlikely case that the last rcu_assign_pointer
had not taken effect before the dump callback.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
---
 net/sched/act_skbmod.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index b642ad3d39dd..231e07bca384 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -198,7 +198,7 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
 {
 	struct tcf_skbmod *d = to_skbmod(a);
 	unsigned char *b = skb_tail_pointer(skb);
-	struct tcf_skbmod_params  *p = rtnl_dereference(d->skbmod_p);
+	struct tcf_skbmod_params  *p;
 	struct tc_skbmod opt = {
 		.index   = d->tcf_index,
 		.refcnt  = d->tcf_refcnt - ref,
@@ -207,6 +207,11 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
 	};
 	struct tcf_t t;
 
+	/* wait until last rcu_assign_pointer/kfree_rcu is done */
+	rcu_synchronize();
+	/* RTNL lock prevents another rcu_assign_pointer/kfree_rcu call */
+	p = rtnl_dereference(d->skbmod_p);
+
 	opt.flags  = p->flags;
 	if (nla_put(skb, TCA_SKBMOD_PARMS, sizeof(opt), &opt))
 		goto nla_put_failure;
-- 
2.11.0

* Re: [RFC net 1/1] net: sched: act: fix rcu race in dump
  2017-10-10 12:32 ` [RFC net 1/1] net: sched: act: fix rcu race in dump Alexander Aring
@ 2017-10-10 12:39   ` Alexander Aring
  2017-10-10 14:12   ` Eric Dumazet
  2017-10-10 16:40   ` Cong Wang
  2 siblings, 0 replies; 6+ messages in thread
From: Alexander Aring @ 2017-10-10 12:39 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Cong Wang, Jiří Pírko, netdev, Manish Kurup,
	Brenda Butler, Alexander Aring

Hi,

On Tue, Oct 10, 2017 at 8:32 AM, Alexander Aring <aring@mojatatu.com> wrote:
> This patch fixes an issue with kfree_rcu which is not protected by RTNL
> lock. It could be that the current assigned rcu pointer will be freed by
> kfree_rcu while dump callback is running.
>
> To prevent this, we call rcu_synchronize at first. Then we are sure all
> latest rcu functions e.g. rcu_assign_pointer and kfree_rcu in init are
> done. After rcu_synchronize we dereference under RTNL lock which is also
> held in init function, which means no other rcu_assign_pointer or
> kfree_rcu will occur.
>
> To call rcu_synchronize will also prevent weird behaviours by doing over
> netlink:
>
>  - set params A
>  - set params B
>  - dump params
>   \--> will dump params A
>
> This could be a unlikely case that the last rcu_assign_pointer was not
> happened before dump callback.
>
> Signed-off-by: Alexander Aring <aring@mojatatu.com>
> ---
>  net/sched/act_skbmod.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index b642ad3d39dd..231e07bca384 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -198,7 +198,7 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
>  {
>         struct tcf_skbmod *d = to_skbmod(a);
>         unsigned char *b = skb_tail_pointer(skb);
> -       struct tcf_skbmod_params  *p = rtnl_dereference(d->skbmod_p);
> +       struct tcf_skbmod_params  *p;
>         struct tc_skbmod opt = {
>                 .index   = d->tcf_index,
>                 .refcnt  = d->tcf_refcnt - ref,
> @@ -207,6 +207,11 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
>         };
>         struct tcf_t t;
>
> +       /* wait until last rcu_assign_pointer/kfree_rcu is done */
> +       rcu_synchronize();

... and next time I should use the right function:

s/rcu_synchronize/synchronize_rcu/

Anyway, there is a reason why I sent it as an RFC. :-)

Thanks.

- Alex

* Re: [RFC net 1/1] net: sched: act: fix rcu race in dump
  2017-10-10 12:32 ` [RFC net 1/1] net: sched: act: fix rcu race in dump Alexander Aring
  2017-10-10 12:39   ` Alexander Aring
@ 2017-10-10 14:12   ` Eric Dumazet
  2017-10-10 18:09     ` Alexander Aring
  2017-10-10 16:40   ` Cong Wang
  2 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2017-10-10 14:12 UTC (permalink / raw)
  To: Alexander Aring; +Cc: jhs, xiyou.wangcong, jiri, netdev, kurup.manish, bjb

On Tue, 2017-10-10 at 08:32 -0400, Alexander Aring wrote:
> This patch fixes an issue with kfree_rcu which is not protected by RTNL
> lock. It could be that the current assigned rcu pointer will be freed by
> kfree_rcu while dump callback is running.
> 
> To prevent this, we call rcu_synchronize at first. Then we are sure all
> latest rcu functions e.g. rcu_assign_pointer and kfree_rcu in init are
> done. After rcu_synchronize we dereference under RTNL lock which is also
> held in init function, which means no other rcu_assign_pointer or
> kfree_rcu will occur.
> 
> To call rcu_synchronize will also prevent weird behaviours by doing over
> netlink:
> 
>  - set params A
>  - set params B
>  - dump params
>   \--> will dump params A
> 
> This could be a unlikely case that the last rcu_assign_pointer was not
> happened before dump callback.
> 
> Signed-off-by: Alexander Aring <aring@mojatatu.com>
> ---
>  net/sched/act_skbmod.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index b642ad3d39dd..231e07bca384 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -198,7 +198,7 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
>  {
>  	struct tcf_skbmod *d = to_skbmod(a);
>  	unsigned char *b = skb_tail_pointer(skb);
> -	struct tcf_skbmod_params  *p = rtnl_dereference(d->skbmod_p);
> +	struct tcf_skbmod_params  *p;
>  	struct tc_skbmod opt = {
>  		.index   = d->tcf_index,
>  		.refcnt  = d->tcf_refcnt - ref,
> @@ -207,6 +207,11 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
>  	};
>  	struct tcf_t t;
>  
> +	/* wait until last rcu_assign_pointer/kfree_rcu is done */
> +	rcu_synchronize();
> +	/* RTNL lock prevents another rcu_assign_pointer/kfree_rcu call */
> +	p = rtnl_dereference(d->skbmod_p);
> +
>  	opt.flags  = p->flags;
>  	if (nla_put(skb, TCA_SKBMOD_PARMS, sizeof(opt), &opt))
>  		goto nla_put_failure;

Sorry but no. This is plainly wrong.

We need to fix this without adding a _very_ expensive rcu_synchronize()
on a path which does not need such a thing.

I am confused by this patch, please tell us more what the problem is.

I suspect rcu_read_lock() is what you need, but isn't a writer supposed
to hold RTNL in net/sched/* ???
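
A minimal sketch of the reader-side pattern hinted at here, assuming a dump path that could not rely on RTNL. The struct names and skbmod_read_flags() are hypothetical, trimmed-down illustrations rather than the code in act_skbmod.c, and the three macros are no-op userspace stand-ins for the kernel primitives so that the fragment compiles outside the kernel; they provide no real protection.

```c
#include <stddef.h>

/* Userspace stand-ins (assumption): in the kernel these come from
 * <linux/rcupdate.h>. The no-op bodies here protect nothing. */
#define rcu_read_lock()		do { } while (0)
#define rcu_read_unlock()	do { } while (0)
#define rcu_dereference(p)	(p)

/* Hypothetical, trimmed-down versions of the structures in the patch. */
struct skbmod_params {
	unsigned long flags;
};

struct skbmod {
	struct skbmod_params *skbmod_p;
};

/* Reader side: copy what is needed inside the read-side critical
 * section and do not touch p after rcu_read_unlock(). */
static unsigned long skbmod_read_flags(struct skbmod *d)
{
	struct skbmod_params *p;
	unsigned long flags;

	rcu_read_lock();
	p = rcu_dereference(d->skbmod_p);
	flags = p->flags;
	rcu_read_unlock();

	return flags;
}
```

The point of the pattern is that the reader never blocks and never waits for a grace period; only the writer's deferred free has to wait for readers like this one to leave their critical section.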

* Re: [RFC net 1/1] net: sched: act: fix rcu race in dump
  2017-10-10 12:32 ` [RFC net 1/1] net: sched: act: fix rcu race in dump Alexander Aring
  2017-10-10 12:39   ` Alexander Aring
  2017-10-10 14:12   ` Eric Dumazet
@ 2017-10-10 16:40   ` Cong Wang
  2 siblings, 0 replies; 6+ messages in thread
From: Cong Wang @ 2017-10-10 16:40 UTC (permalink / raw)
  To: Alexander Aring
  Cc: Jamal Hadi Salim, Jiri Pirko, Linux Kernel Network Developers,
	kurup.manish, Brenda Butler

On Tue, Oct 10, 2017 at 5:32 AM, Alexander Aring <aring@mojatatu.com> wrote:
> This patch fixes an issue with kfree_rcu which is not protected by RTNL
> lock. It could be that the current assigned rcu pointer will be freed by
> kfree_rcu while dump callback is running.

Why? kfree_rcu() respects existing readers, so how could this happen?


>
> To prevent this, we call rcu_synchronize at first. Then we are sure all
> latest rcu functions e.g. rcu_assign_pointer and kfree_rcu in init are
> done. After rcu_synchronize we dereference under RTNL lock which is also
> held in init function, which means no other rcu_assign_pointer or
> kfree_rcu will occur.

If you really want to wait for kfree_rcu(), rcu_barrier() is the one to
use instead of rcu_synchronize(). Just FYI.


>
> To call rcu_synchronize will also prevent weird behaviours by doing over
> netlink:
>
>  - set params A
>  - set params B
>  - dump params
>   \--> will dump params A


What's wrong with this? Existing readers could still read old data,
which is _perfectly_ fine as long as we don't free the old data before
they are gone.
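
The argument above can be illustrated with a toy, single-threaded model of deferred freeing. Every name in it (set_params, reader_enter, try_grace_period, run_demo) is hypothetical and none of it is kernel API; it is also cruder than real RCU, because it waits for all active readers, whereas a real grace period only waits for readers that started before the update.

```c
#include <stdlib.h>

/* Toy model (assumption): one published pointer, a list of replaced
 * versions awaiting a grace period, and a count of active readers. */
struct params {
	int value;
	struct params *next_free;
};

static struct params *current_params;
static struct params *pending_free;
static int active_readers;

static struct params *reader_enter(void)
{
	active_readers++;
	return current_params;
}

static void reader_exit(void)
{
	active_readers--;
}

/* Writer: publish the new version at once, queue the old one
 * for later freeing (the role kfree_rcu() plays in the thread). */
static void set_params(int value)
{
	struct params *new_p = malloc(sizeof(*new_p));
	struct params *old = current_params;

	if (!new_p)
		abort();
	new_p->value = value;
	new_p->next_free = NULL;
	current_params = new_p;
	if (old) {
		old->next_free = pending_free;
		pending_free = old;
	}
}

/* Frees queued versions only when no reader is active.
 * Returns the number of versions freed. */
static int try_grace_period(void)
{
	int freed = 0;

	if (active_readers > 0)
		return 0;
	while (pending_free) {
		struct params *p = pending_free;

		pending_free = p->next_free;
		free(p);
		freed++;
	}
	return freed;
}

/* Returns 1 if the old version stayed readable until the reader left. */
static int run_demo(void)
{
	struct params *seen;
	int freed_while_reading, old_value, freed_after;

	set_params(1);				/* params A */
	seen = reader_enter();			/* reader picks up A */
	set_params(2);				/* params B replaces A */
	freed_while_reading = try_grace_period();
	old_value = seen->value;		/* A is still valid here */
	reader_exit();
	freed_after = try_grace_period();	/* now A can go */

	return freed_while_reading == 0 && old_value == 1 &&
	       freed_after == 1 && current_params->value == 2;
}
```

In this model the reader that entered before the second update sees the old value, which is harmless, and the old version is released only after that reader has left.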

* Re: [RFC net 1/1] net: sched: act: fix rcu race in dump
  2017-10-10 14:12   ` Eric Dumazet
@ 2017-10-10 18:09     ` Alexander Aring
  0 siblings, 0 replies; 6+ messages in thread
From: Alexander Aring @ 2017-10-10 18:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jamal Hadi Salim, Cong Wang, Jiří Pírko, netdev,
	Manish Kurup, Brenda Butler

Hi,

On Tue, Oct 10, 2017 at 10:12 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2017-10-10 at 08:32 -0400, Alexander Aring wrote:
>> This patch fixes an issue with kfree_rcu which is not protected by RTNL
>> lock. It could be that the current assigned rcu pointer will be freed by
>> kfree_rcu while dump callback is running.
>>
>> To prevent this, we call rcu_synchronize at first. Then we are sure all
>> latest rcu functions e.g. rcu_assign_pointer and kfree_rcu in init are
>> done. After rcu_synchronize we dereference under RTNL lock which is also
>> held in init function, which means no other rcu_assign_pointer or
>> kfree_rcu will occur.
>>
>> To call rcu_synchronize will also prevent weird behaviours by doing over
>> netlink:
>>
>>  - set params A
>>  - set params B
>>  - dump params
>>   \--> will dump params A
>>
>> This could be a unlikely case that the last rcu_assign_pointer was not
>> happened before dump callback.
>>
>> Signed-off-by: Alexander Aring <aring@mojatatu.com>
>> ---
>>  net/sched/act_skbmod.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
>> index b642ad3d39dd..231e07bca384 100644
>> --- a/net/sched/act_skbmod.c
>> +++ b/net/sched/act_skbmod.c
>> @@ -198,7 +198,7 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
>>  {
>>       struct tcf_skbmod *d = to_skbmod(a);
>>       unsigned char *b = skb_tail_pointer(skb);
>> -     struct tcf_skbmod_params  *p = rtnl_dereference(d->skbmod_p);
>> +     struct tcf_skbmod_params  *p;
>>       struct tc_skbmod opt = {
>>               .index   = d->tcf_index,
>>               .refcnt  = d->tcf_refcnt - ref,
>> @@ -207,6 +207,11 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
>>       };
>>       struct tcf_t t;
>>
>> +     /* wait until last rcu_assign_pointer/kfree_rcu is done */
>> +     rcu_synchronize();
>> +     /* RTNL lock prevents another rcu_assign_pointer/kfree_rcu call */
>> +     p = rtnl_dereference(d->skbmod_p);
>> +
>>       opt.flags  = p->flags;
>>       if (nla_put(skb, TCA_SKBMOD_PARMS, sizeof(opt), &opt))
>>               goto nla_put_failure;
>
> Sorry but no. This is plainly wrong.
>
> We need to fix this without adding a _very_ expensive rcu_synchronize()
> on a path which does not need such thing.
>

I agree that a rcu synchronize is very expensive while holding RTNL.
It should be handled with rcu_read_lock as you suggested below, but this
will not prevent behaviour visible to user space like:

 - set_params(A)
 - set_params(B)
  \---> dump - will dump values A

Because rcu_read_lock will keep rcu_assign_pointer from updating the
pointer, and it does not wait until the rcu_assign_pointer of
set_params(B) is done before dump is called.
Okay, maybe this issue is something we should not care about, as long as
it is not a use-after-free issue.


> I am confused by this patch, please tell us more what the problem is.
>

The "init" callback is also called when updating the parameters of an action.

It uses rcu_assign_pointer [0] as well as kfree_rcu [1] to swap the
pointers of the parameter structures and free the old resource.
This is well protected by rcu_read_lock inside the "run" callback of the
tc action, which runs in softirq context. But dump is only protected
by RTNL, as far as I can see.

Sorry if I have misunderstood RCU, but as far as I understand RCU
handling, it _could_ be that when "init" returns the pointers are not
updated yet. After a "grace" period, which rcu synchronize waits for,
we can be sure that the pointer is assigned and kfree_rcu has completed.
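
Whether a reader that runs after "init" returns can still see the old pointer can be checked against a small userspace analogue. This is a sketch under stated assumptions: publish() and read_published() are hypothetical names, and C11 release/acquire atomics stand in for rcu_assign_pointer and the dereference helpers. It relies on rcu_assign_pointer being an ordered store that takes effect when it executes; only the freeing of the old object is deferred to the end of a grace period.

```c
#include <stdatomic.h>
#include <stddef.h>

struct params {
	int value;
};

/* The published pointer that readers dereference. */
static _Atomic(struct params *) published;

/* Stand-in for rcu_assign_pointer(): a release store, visible as
 * soon as it executes. No grace period is involved in publishing. */
static void publish(struct params *p)
{
	atomic_store_explicit(&published, p, memory_order_release);
}

/* Stand-in for rcu_dereference()/rtnl_dereference(). */
static struct params *read_published(void)
{
	return atomic_load_explicit(&published, memory_order_acquire);
}

/* The sequence from the thread: set A, set B, then dump. */
static int dump_after_two_updates(void)
{
	static struct params a = { .value = 1 };
	static struct params b = { .value = 2 };

	publish(&a);				/* set params A */
	publish(&b);				/* set params B */
	return read_published()->value;		/* dump */
}
```

When the two updates and the dump are serialized, as they are when all three run under the same lock, the dump in this model returns the value of B.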

The problem is:
If the dereference of the parameters inside the dump callback still uses
the old structure (as I understand it, this can happen because the
callback does nothing to protect against it), kfree_rcu can free the
resource while this structure is being accessed. An RCU read lock will of
course prevent RCU from updating the pointers during this time (but
RTNL will not, as far as I understand).

> I suspect rcu_read_lock() is what you need, but isn't a writer supposed
> to hold RTNL in net/sched/* ???
>

Yes, a writer holds RTNL, but these writers use RCU to write (as
shown in [0] and [1]). As far as I know kfree_rcu, it can happen that
"init" returns and dump is called afterwards; during the dump RCU can
run and free/assign pointers (while dump still holds references). As far
as I understand, the RTNL lock will not prevent RCU from doing that.

I also wrote this mail to get an answer on whether there is a problem or
not. If you tell me that the resource cannot be freed by kfree_rcu while
the RTNL lock is held, then I know more about how RCU works.

- Alex

[0] http://elixir.free-electrons.com/linux/latest/source/net/sched/act_skbmod.c#L177
[1] http://elixir.free-electrons.com/linux/latest/source/net/sched/act_skbmod.c#L182
