* [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed
@ 2018-03-06 16:24 Kirill Tkhai
2018-03-06 16:50 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Kirill Tkhai @ 2018-03-06 16:24 UTC (permalink / raw)
To: davem, yoshfuji, netdev, ktkhai
After unshare test kworker hangs for ages:
$ while :; do unshare -n true; done &
$ perf report <kworker>
- 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k] cleanup_net
cleanup_net
- ops_exit_list.isra.9
- 85,68% igmp6_net_exit
- 53,31% sock_release
- inet_release
- 25,33% rawv6_close
- ip6mr_sk_done
+ 23,38% synchronize_rcu
Keep in mind, this perf report shows the time a function was executing, and
it does not show the time, it was sleeping. But it's easy to imagine, how
much it is sleeping, if synchronize_rcu() execution takes the most time.
Top shows the kworker R time is less then 1%.
This happen, because of in ip6mr_sk_done() we do too many synchronize_rcu(),
even for the sockets, that are not referenced in mr_table, and which are not
need it. So, the progress of kworker becomes very slow.
The patch introduces apparent solution, and it makes ip6mr_sk_done() to skip
synchronize_rcu() for sockets, that are not need that. After the patch,
kworker becomes able to warm the cpu up as expected.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
net/ipv6/ip6mr.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 2a38f9de45d3..290a8d0d5eac 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
}
}
rtnl_unlock();
- synchronize_rcu();
+
+ if (!err)
+ synchronize_rcu();
return err;
}
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed
2018-03-06 16:24 [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed Kirill Tkhai
@ 2018-03-06 16:50 ` Eric Dumazet
2018-03-06 16:58 ` Eric Dumazet
2018-03-07 9:22 ` Kirill Tkhai
0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2018-03-06 16:50 UTC (permalink / raw)
To: Kirill Tkhai, davem, yoshfuji, netdev; +Cc: Yuval Mintz
On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote:
> After unshare test kworker hangs for ages:
>
> $ while :; do unshare -n true; done &
>
> $ perf report <kworker>
> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k]
> cleanup_net
> cleanup_net
> - ops_exit_list.isra.9
> - 85,68% igmp6_net_exit
> - 53,31% sock_release
> - inet_release
> - 25,33% rawv6_close
> - ip6mr_sk_done
> + 23,38% synchronize_rcu
>
> Keep in mind, this perf report shows the time a function was
> executing, and
> it does not show the time, it was sleeping. But it's easy to imagine,
> how
> much it is sleeping, if synchronize_rcu() execution takes the most
> time.
> Top shows the kworker R time is less then 1%.
>
> This happen, because of in ip6mr_sk_done() we do too many
> synchronize_rcu(),
> even for the sockets, that are not referenced in mr_table, and which
> are not
> need it. So, the progress of kworker becomes very slow.
>
> The patch introduces apparent solution, and it makes ip6mr_sk_done()
> to skip
> synchronize_rcu() for sockets, that are not need that. After the
> patch,
> kworker becomes able to warm the cpu up as expected.
>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
> net/ipv6/ip6mr.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index 2a38f9de45d3..290a8d0d5eac 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
> }
> }
> rtnl_unlock();
> - synchronize_rcu();
> +
> + if (!err)
> + synchronize_rcu();
>
But... what is this synchronize_rcu() doing exactly ?
This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
("ip6mr: Make mroute_sk rcu-based")
Typically on a delete, the synchronize_rcu() would be needed before
freeing the deleted object.
But nowadays we have better way : SOCK_RCU_FREE
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed
2018-03-06 16:50 ` Eric Dumazet
@ 2018-03-06 16:58 ` Eric Dumazet
2018-03-06 17:20 ` Eric Dumazet
2018-03-07 9:22 ` Kirill Tkhai
1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2018-03-06 16:58 UTC (permalink / raw)
To: Kirill Tkhai, davem, yoshfuji, netdev; +Cc: Yuval Mintz
On Tue, 2018-03-06 at 08:50 -0800, Eric Dumazet wrote:
>
> But... what is this synchronize_rcu() doing exactly ?
>
> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
>
> ("ip6mr: Make mroute_sk rcu-based")
>
> Typically on a delete, the synchronize_rcu() would be needed before
> freeing the deleted object.
>
> But nowadays we have better way : SOCK_RCU_FREE
To be clear, your patch is fine Kirill,
I am only sad seeing one can add a synchronize_rcu() in hot path
without anyone complaining during code review.
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed
2018-03-06 16:58 ` Eric Dumazet
@ 2018-03-06 17:20 ` Eric Dumazet
0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2018-03-06 17:20 UTC (permalink / raw)
To: Kirill Tkhai, davem, yoshfuji, netdev; +Cc: Yuval Mintz
On Tue, 2018-03-06 at 08:58 -0800, Eric Dumazet wrote:
>
> To be clear, your patch is fine Kirill,
>
> I am only sad seeing one can add a synchronize_rcu() in hot path
> without anyone complaining during code review.
lpaa2:~# time for i in {1..1000}; do unshare -n /bin/false;done
real 7m18.911s
user 0m0.185s
sys 0m2.314s
Instead of less than 10 seconds not a long time ago :/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed
2018-03-06 16:50 ` Eric Dumazet
2018-03-06 16:58 ` Eric Dumazet
@ 2018-03-07 9:22 ` Kirill Tkhai
2018-03-07 9:32 ` Kirill Tkhai
1 sibling, 1 reply; 11+ messages in thread
From: Kirill Tkhai @ 2018-03-07 9:22 UTC (permalink / raw)
To: Eric Dumazet, davem, yoshfuji, netdev, Yuval Mintz
On 06.03.2018 19:50, Eric Dumazet wrote:
> On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote:
>> After unshare test kworker hangs for ages:
>>
>> $ while :; do unshare -n true; done &
>>
>> $ perf report <kworker>
>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k]
>> cleanup_net
>> cleanup_net
>> - ops_exit_list.isra.9
>> - 85,68% igmp6_net_exit
>> - 53,31% sock_release
>> - inet_release
>> - 25,33% rawv6_close
>> - ip6mr_sk_done
>> + 23,38% synchronize_rcu
>>
>> Keep in mind, this perf report shows the time a function was
>> executing, and
>> it does not show the time, it was sleeping. But it's easy to imagine,
>> how
>> much it is sleeping, if synchronize_rcu() execution takes the most
>> time.
>> Top shows the kworker R time is less then 1%.
>>
>> This happen, because of in ip6mr_sk_done() we do too many
>> synchronize_rcu(),
>> even for the sockets, that are not referenced in mr_table, and which
>> are not
>> need it. So, the progress of kworker becomes very slow.
>>
>> The patch introduces apparent solution, and it makes ip6mr_sk_done()
>> to skip
>> synchronize_rcu() for sockets, that are not need that. After the
>> patch,
>> kworker becomes able to warm the cpu up as expected.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>> net/ipv6/ip6mr.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
>> index 2a38f9de45d3..290a8d0d5eac 100644
>> --- a/net/ipv6/ip6mr.c
>> +++ b/net/ipv6/ip6mr.c
>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
>> }
>> }
>> rtnl_unlock();
>> - synchronize_rcu();
>> +
>> + if (!err)
>> + synchronize_rcu();
>>
>
>
> But... what is this synchronize_rcu() doing exactly ?
>
> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
>
> ("ip6mr: Make mroute_sk rcu-based")
>
> Typically on a delete, the synchronize_rcu() would be needed before
> freeing the deleted object.
>
> But nowadays we have better way : SOCK_RCU_FREE
Hm. I'm agree with you. This is hot path, and mroute sockets created from userspace
will delay userspace tasks close() and exit(). Since there may be many such sockets,
we may get a zombie task, which can't be reaped for ages. This slows down the system
directly.
Fix for pernet_operations works, but we need generic solution instead.
The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:
ip6mr: Make mroute_sk rcu-based
In ipmr the mr_table socket is handled under RCU. Introduce the same
for ip6mr.
There is no pointing to improvements it invents, or to the problem it solves. The description
looks like a cleanup. It's too expensive cleanup, if it worsens the performance a hundred
times.
Can't we simply revert it?!
Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested by Eric)?
We actually use rcu_dereference() in ip6mr_cache_report() only. The only user of dereference
is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in inet_sock_destruct().
Thanks,
Kirill
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed
2018-03-07 9:22 ` Kirill Tkhai
@ 2018-03-07 9:32 ` Kirill Tkhai
2018-03-07 13:51 ` Yuval Mintz
0 siblings, 1 reply; 11+ messages in thread
From: Kirill Tkhai @ 2018-03-07 9:32 UTC (permalink / raw)
To: Eric Dumazet, davem, yoshfuji, netdev, Yuval Mintz
On 07.03.2018 12:22, Kirill Tkhai wrote:
> On 06.03.2018 19:50, Eric Dumazet wrote:
>> On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote:
>>> After unshare test kworker hangs for ages:
>>>
>>> $ while :; do unshare -n true; done &
>>>
>>> $ perf report <kworker>
>>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k]
>>> cleanup_net
>>> cleanup_net
>>> - ops_exit_list.isra.9
>>> - 85,68% igmp6_net_exit
>>> - 53,31% sock_release
>>> - inet_release
>>> - 25,33% rawv6_close
>>> - ip6mr_sk_done
>>> + 23,38% synchronize_rcu
>>>
>>> Keep in mind, this perf report shows the time a function was
>>> executing, and
>>> it does not show the time, it was sleeping. But it's easy to imagine,
>>> how
>>> much it is sleeping, if synchronize_rcu() execution takes the most
>>> time.
>>> Top shows the kworker R time is less then 1%.
>>>
>>> This happen, because of in ip6mr_sk_done() we do too many
>>> synchronize_rcu(),
>>> even for the sockets, that are not referenced in mr_table, and which
>>> are not
>>> need it. So, the progress of kworker becomes very slow.
>>>
>>> The patch introduces apparent solution, and it makes ip6mr_sk_done()
>>> to skip
>>> synchronize_rcu() for sockets, that are not need that. After the
>>> patch,
>>> kworker becomes able to warm the cpu up as expected.
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>> ---
>>> net/ipv6/ip6mr.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
>>> index 2a38f9de45d3..290a8d0d5eac 100644
>>> --- a/net/ipv6/ip6mr.c
>>> +++ b/net/ipv6/ip6mr.c
>>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
>>> }
>>> }
>>> rtnl_unlock();
>>> - synchronize_rcu();
>>> +
>>> + if (!err)
>>> + synchronize_rcu();
>>>
>>
>>
>> But... what is this synchronize_rcu() doing exactly ?
>>
>> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
>>
>> ("ip6mr: Make mroute_sk rcu-based")
>>
>> Typically on a delete, the synchronize_rcu() would be needed before
>> freeing the deleted object.
>>
>> But nowadays we have better way : SOCK_RCU_FREE
>
> Hm. I'm agree with you. This is hot path, and mroute sockets created from userspace
> will delay userspace tasks close() and exit(). Since there may be many such sockets,
> we may get a zombie task, which can't be reaped for ages. This slows down the system
> directly.
>
> Fix for pernet_operations works, but we need generic solution instead.
>
> The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:
>
> ip6mr: Make mroute_sk rcu-based
>
> In ipmr the mr_table socket is handled under RCU. Introduce the same
> for ip6mr.
>
> There is no pointing to improvements it invents, or to the problem it solves. The description
> looks like a cleanup. It's too expensive cleanup, if it worsens the performance a hundred
> times.
>
> Can't we simply revert it?!
>
> Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested by Eric)?
>
> We actually use rcu_dereference() in ip6mr_cache_report() only. The only user of dereference
> is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in inet_sock_destruct().
+ So this change should be safe.
> Thanks,
> Kirill
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed
2018-03-07 9:32 ` Kirill Tkhai
@ 2018-03-07 13:51 ` Yuval Mintz
2018-03-07 14:22 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Yuval Mintz @ 2018-03-07 13:51 UTC (permalink / raw)
To: Kirill Tkhai, Eric Dumazet
Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org,
netdev@vger.kernel.org
> >>> After unshare test kworker hangs for ages:
> >>>
> >>> $ while :; do unshare -n true; done &
> >>>
> >>> $ perf report <kworker>
> >>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k]
> >>> cleanup_net
> >>> cleanup_net
> >>> - ops_exit_list.isra.9
> >>> - 85,68% igmp6_net_exit
> >>> - 53,31% sock_release
> >>> - inet_release
> >>> - 25,33% rawv6_close
> >>> - ip6mr_sk_done
> >>> + 23,38% synchronize_rcu
> >>>
> >>> Keep in mind, this perf report shows the time a function was
> >>> executing, and
> >>> it does not show the time, it was sleeping. But it's easy to imagine,
> >>> how
> >>> much it is sleeping, if synchronize_rcu() execution takes the most
> >>> time.
> >>> Top shows the kworker R time is less then 1%.
> >>>
> >>> This happen, because of in ip6mr_sk_done() we do too many
> >>> synchronize_rcu(),
> >>> even for the sockets, that are not referenced in mr_table, and which
> >>> are not
> >>> need it. So, the progress of kworker becomes very slow.
> >>>
> >>> The patch introduces apparent solution, and it makes ip6mr_sk_done()
> >>> to skip
> >>> synchronize_rcu() for sockets, that are not need that. After the
> >>> patch,
> >>> kworker becomes able to warm the cpu up as expected.
> >>>
> >>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> >>> ---
> >>> net/ipv6/ip6mr.c | 4 +++-
> >>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> >>> index 2a38f9de45d3..290a8d0d5eac 100644
> >>> --- a/net/ipv6/ip6mr.c
> >>> +++ b/net/ipv6/ip6mr.c
> >>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
> >>> }
> >>> }
> >>> rtnl_unlock();
> >>> - synchronize_rcu();
> >>> +
> >>> + if (!err)
> >>> + synchronize_rcu();
> >>>
> >>
> >>
> >> But... what is this synchronize_rcu() doing exactly ?
> >>
> >> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
> >>
> >> ("ip6mr: Make mroute_sk rcu-based")
> >>
> >> Typically on a delete, the synchronize_rcu() would be needed before
> >> freeing the deleted object.
> >>
> >> But nowadays we have better way : SOCK_RCU_FREE
> >
> > Hm. I'm agree with you. This is hot path, and mroute sockets created from
> userspace
> > will delay userspace tasks close() and exit(). Since there may be many such
> sockets,
> > we may get a zombie task, which can't be reaped for ages. This slows down
> the system
> > directly.
> >
> > Fix for pernet_operations works, but we need generic solution instead.
> >
> > The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:
> >
> > ip6mr: Make mroute_sk rcu-based
> >
> > In ipmr the mr_table socket is handled under RCU. Introduce the same
> > for ip6mr.
> >
> > There is no pointing to improvements it invents, or to the problem it solves.
> The description
> > looks like a cleanup. It's too expensive cleanup, if it worsens the
> performance a hundred
> > times.
> >
> > Can't we simply revert it?!
> >
> > Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested
> by Eric)?
Sorry, failed to notice ip6mr_sk_done() is called unconditionally from
rawv6_close(). But even with your suggested fix it should be ~resolved
[as only sockets used for ip6mr would reach the sync].
Or are you claiming there's still some performance hit even with your
suggested change?
> >
> > We actually use rcu_dereference() in ip6mr_cache_report() only. The only
> user of dereference
> > is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in
> inet_sock_destruct().
>
> + So this change should be safe.
I might have misunderstood the comment from
commit 4c9687098f24 ("ipmr: RCU conversion of mroute_sk") when writing
this; Thought comment regarding ip_ra_destroy() meant that for the IPv6 case
we DO have to make sure there's a grace-period before destroying the socket.
>
> > Thanks,
> > Kirill
> >
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed
2018-03-07 13:51 ` Yuval Mintz
@ 2018-03-07 14:22 ` Eric Dumazet
2018-03-07 16:43 ` [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2018-03-07 14:22 UTC (permalink / raw)
To: Yuval Mintz, Kirill Tkhai
Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org,
netdev@vger.kernel.org
On Wed, 2018-03-07 at 13:51 +0000, Yuval Mintz wrote:
> > > > > After unshare test kworker hangs for ages:
> > > > >
> > > > > $ while :; do unshare -n true; done &
> > > > >
> > > > > $ perf report <kworker>
> > > > > - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k
> > > > > ]
> > > > > cleanup_net
> > > > > cleanup_net
> > > > > - ops_exit_list.isra.9
> > > > > - 85,68% igmp6_net_exit
> > > > > - 53,31% sock_release
> > > > > - inet_release
> > > > > - 25,33% rawv6_close
> > > > > - ip6mr_sk_done
> > > > > + 23,38% synchronize_rcu
> > > > >
> > > > > Keep in mind, this perf report shows the time a function was
> > > > > executing, and
> > > > > it does not show the time, it was sleeping. But it's easy to
> > > > > imagine,
> > > > > how
> > > > > much it is sleeping, if synchronize_rcu() execution takes the
> > > > > most
> > > > > time.
> > > > > Top shows the kworker R time is less then 1%.
> > > > >
> > > > > This happen, because of in ip6mr_sk_done() we do too many
> > > > > synchronize_rcu(),
> > > > > even for the sockets, that are not referenced in mr_table,
> > > > > and which
> > > > > are not
> > > > > need it. So, the progress of kworker becomes very slow.
> > > > >
> > > > > The patch introduces apparent solution, and it makes
> > > > > ip6mr_sk_done()
> > > > > to skip
> > > > > synchronize_rcu() for sockets, that are not need that. After
> > > > > the
> > > > > patch,
> > > > > kworker becomes able to warm the cpu up as expected.
> > > > >
> > > > > Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> > > > > ---
> > > > > net/ipv6/ip6mr.c | 4 +++-
> > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> > > > > index 2a38f9de45d3..290a8d0d5eac 100644
> > > > > --- a/net/ipv6/ip6mr.c
> > > > > +++ b/net/ipv6/ip6mr.c
> > > > > @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
> > > > > }
> > > > > }
> > > > > rtnl_unlock();
> > > > > - synchronize_rcu();
> > > > > +
> > > > > + if (!err)
> > > > > + synchronize_rcu();
> > > > >
> > > >
> > > >
> > > > But... what is this synchronize_rcu() doing exactly ?
> > > >
> > > > This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
> > > >
> > > > ("ip6mr: Make mroute_sk rcu-based")
> > > >
> > > > Typically on a delete, the synchronize_rcu() would be needed
> > > > before
> > > > freeing the deleted object.
> > > >
> > > > But nowadays we have better way : SOCK_RCU_FREE
> > >
> > > Hm. I'm agree with you. This is hot path, and mroute sockets
> > > created from
> >
> > userspace
> > > will delay userspace tasks close() and exit(). Since there may be
> > > many such
> >
> > sockets,
> > > we may get a zombie task, which can't be reaped for ages. This
> > > slows down
> >
> > the system
> > > directly.
> > >
> > > Fix for pernet_operations works, but we need generic solution
> > > instead.
> > >
> > > The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:
> > >
> > > ip6mr: Make mroute_sk rcu-based
> > >
> > > In ipmr the mr_table socket is handled under RCU. Introduce
> > > the same
> > > for ip6mr.
> > >
> > > There is no pointing to improvements it invents, or to the
> > > problem it solves.
> >
> > The description
> > > looks like a cleanup. It's too expensive cleanup, if it worsens
> > > the
> >
> > performance a hundred
> > > times.
> > >
> > > Can't we simply revert it?!
> > >
> > > Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE
> > > suggested
> >
> > by Eric)?
>
> Sorry, failed to notice ip6mr_sk_done() is called unconditionally
> from
> rawv6_close(). But even with your suggested fix it should be
> ~resolved
> [as only sockets used for ip6mr would reach the sync]
Yes but some user setups could still hit this synchronize_rcu() quite
hard.
New synchronize_rcu() should not be added in possibly critical paths
unless absolutely needed (if really there is no other ways)
We have to be very careful here, since this kind of mistake can have a
100x impact.
> .
> Or are you claiming there's still some performance hit even with your
> suggested change?
>
> > >
> > > We actually use rcu_dereference() in ip6mr_cache_report() only.
> > > The only
> >
> > user of dereference
> > > is sock_queue_rcv_skb(). Superficial analysis shows we purge the
> > > queue in
> >
> > inet_sock_destruct().
> >
> > + So this change should be safe.
>
> I might have misunderstood the comment from
> commit 4c9687098f24 ("ipmr: RCU conversion of mroute_sk") when
> writing
> this; Thought comment regarding ip_ra_destroy() meant that for the
> IPv6 case
> we DO have to make sure there's a grace-period before destroying the
> socket.
Yeah but since 2010 things have changed a lot in the kernel ;)
And we now have SOCK_RCU_FREE for the few sockets that need RCU grace
period at dismantle.
Note that TCP sockets are RCU protected but do not request
SOCK_RCU_FREE in general (only listeners use this)
SOCK_RCU_FREE will not require a synchronize_rcu() stall, and instead
use the asynchronous RCU infrastructure ( call_rcu() )
So all we need now is to make sure these (RAW / ICMPV6) IPv6 sockets
have this flag set, then we can remove this problematic
synchronize_rcu() from ip6mr_sk_done()
I will test the following fix.
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 2a38f9de45d399fafc9f4bcc662b44be17279e51..7345bd6c4b7dda39c0d73d542e9ca9a5366542ff 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1443,6 +1443,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk)
err = -EADDRINUSE;
} else {
rcu_assign_pointer(mrt->mroute_sk, sk);
+ sock_set_flag(sk, SOCK_RCU_FREE);
net->ipv6.devconf_all->mc_forwarding++;
}
write_unlock_bh(&mrt_lock);
@@ -1472,6 +1473,10 @@ int ip6mr_sk_done(struct sock *sk)
if (sk == rtnl_dereference(mrt->mroute_sk)) {
write_lock_bh(&mrt_lock);
RCU_INIT_POINTER(mrt->mroute_sk, NULL);
+ /* Note that mroute_sk had SOCK_RCU_FREE set,
+ * so the RCU grace period before sk freeing
+ * is guaranteed by sk_destruct()
+ */
net->ipv6.devconf_all->mc_forwarding--;
write_unlock_bh(&mrt_lock);
inet6_netconf_notify_devconf(net, RTM_NEWNETCONF,
@@ -1485,7 +1490,6 @@ int ip6mr_sk_done(struct sock *sk)
}
}
rtnl_unlock();
- synchronize_rcu();
return err;
}
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE
2018-03-07 14:22 ` Eric Dumazet
@ 2018-03-07 16:43 ` Eric Dumazet
2018-03-07 22:11 ` Kirill Tkhai
2018-03-07 23:14 ` David Miller
0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2018-03-07 16:43 UTC (permalink / raw)
To: Yuval Mintz, Kirill Tkhai
Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org,
netdev@vger.kernel.org
From: Eric Dumazet <edumazet@google.com>
Kirill found that recently added synchronize_rcu() call in
ip6mr_sk_done()
was slowing down netns dismantle and posted a patch to use it only if
the socket
was found.
I instead suggested to get rid of this call, and use instead
SOCK_RCU_FREE
We might later change IPv4 side to use the same technique and unify
both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu()
that could be replaced by SOCK_RCU_FREE.
Tested:
time for i in {1..1000}; do unshare -n /bin/false;done
Before : real 7m18.911s
After : real 10.187s
Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Yuval Mintz <yuvalm@mellanox.com>
---
net/ipv6/ip6mr.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 2a38f9de45d399fafc9f4bcc662b44be17279e51..7345bd6c4b7dda39c0d73d542e9ca9a5366542ff 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1443,6 +1443,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk)
err = -EADDRINUSE;
} else {
rcu_assign_pointer(mrt->mroute_sk, sk);
+ sock_set_flag(sk, SOCK_RCU_FREE);
net->ipv6.devconf_all->mc_forwarding++;
}
write_unlock_bh(&mrt_lock);
@@ -1472,6 +1473,10 @@ int ip6mr_sk_done(struct sock *sk)
if (sk == rtnl_dereference(mrt->mroute_sk)) {
write_lock_bh(&mrt_lock);
RCU_INIT_POINTER(mrt->mroute_sk, NULL);
+ /* Note that mroute_sk had SOCK_RCU_FREE set,
+ * so the RCU grace period before sk freeing
+ * is guaranteed by sk_destruct()
+ */
net->ipv6.devconf_all->mc_forwarding--;
write_unlock_bh(&mrt_lock);
inet6_netconf_notify_devconf(net, RTM_NEWNETCONF,
@@ -1485,7 +1490,6 @@ int ip6mr_sk_done(struct sock *sk)
}
}
rtnl_unlock();
- synchronize_rcu();
return err;
}
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE
2018-03-07 16:43 ` [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE Eric Dumazet
@ 2018-03-07 22:11 ` Kirill Tkhai
2018-03-07 23:14 ` David Miller
1 sibling, 0 replies; 11+ messages in thread
From: Kirill Tkhai @ 2018-03-07 22:11 UTC (permalink / raw)
To: Eric Dumazet, Yuval Mintz
Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org,
netdev@vger.kernel.org
On 07.03.2018 19:43, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Kirill found that recently added synchronize_rcu() call in
> ip6mr_sk_done()
> was slowing down netns dismantle and posted a patch to use it only if
> the socket
> was found.
>
> I instead suggested to get rid of this call, and use instead
> SOCK_RCU_FREE
>
> We might later change IPv4 side to use the same technique and unify
> both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu()
> that could be replaced by SOCK_RCU_FREE.
>
> Tested:
> time for i in {1..1000}; do unshare -n /bin/false;done
>
> Before : real 7m18.911s
> After : real 10.187s
>
> Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
> net/ipv6/ip6mr.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index 2a38f9de45d399fafc9f4bcc662b44be17279e51..7345bd6c4b7dda39c0d73d542e9ca9a5366542ff 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -1443,6 +1443,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk)
> err = -EADDRINUSE;
> } else {
> rcu_assign_pointer(mrt->mroute_sk, sk);
> + sock_set_flag(sk, SOCK_RCU_FREE);
> net->ipv6.devconf_all->mc_forwarding++;
> }
> write_unlock_bh(&mrt_lock);
> @@ -1472,6 +1473,10 @@ int ip6mr_sk_done(struct sock *sk)
> if (sk == rtnl_dereference(mrt->mroute_sk)) {
> write_lock_bh(&mrt_lock);
> RCU_INIT_POINTER(mrt->mroute_sk, NULL);
> + /* Note that mroute_sk had SOCK_RCU_FREE set,
> + * so the RCU grace period before sk freeing
> + * is guaranteed by sk_destruct()
> + */
> net->ipv6.devconf_all->mc_forwarding--;
> write_unlock_bh(&mrt_lock);
> inet6_netconf_notify_devconf(net, RTM_NEWNETCONF,
> @@ -1485,7 +1490,6 @@ int ip6mr_sk_done(struct sock *sk)
> }
> }
> rtnl_unlock();
> - synchronize_rcu();
>
> return err;
> }
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE
2018-03-07 16:43 ` [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE Eric Dumazet
2018-03-07 22:11 ` Kirill Tkhai
@ 2018-03-07 23:14 ` David Miller
1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2018-03-07 23:14 UTC (permalink / raw)
To: eric.dumazet; +Cc: yuvalm, ktkhai, yoshfuji, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Mar 2018 08:43:19 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> Kirill found that recently added synchronize_rcu() call in
> ip6mr_sk_done()
> was slowing down netns dismantle and posted a patch to use it only if
> the socket
> was found.
>
> I instead suggested to get rid of this call, and use instead
> SOCK_RCU_FREE
>
> We might later change IPv4 side to use the same technique and unify
> both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu()
> that could be replaced by SOCK_RCU_FREE.
>
> Tested:
> time for i in {1..1000}; do unshare -n /bin/false;done
>
> Before : real 7m18.911s
> After : real 10.187s
>
> Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Yuval Mintz <yuvalm@mellanox.com>
Looks great, applied, thanks everyone.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-03-07 23:14 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-06 16:24 [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed Kirill Tkhai
2018-03-06 16:50 ` Eric Dumazet
2018-03-06 16:58 ` Eric Dumazet
2018-03-06 17:20 ` Eric Dumazet
2018-03-07 9:22 ` Kirill Tkhai
2018-03-07 9:32 ` Kirill Tkhai
2018-03-07 13:51 ` Yuval Mintz
2018-03-07 14:22 ` Eric Dumazet
2018-03-07 16:43 ` [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE Eric Dumazet
2018-03-07 22:11 ` Kirill Tkhai
2018-03-07 23:14 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).