* [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed
@ 2018-03-06 16:24 Kirill Tkhai
2018-03-06 16:50 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Kirill Tkhai @ 2018-03-06 16:24 UTC (permalink / raw)
To: davem, yoshfuji, netdev, ktkhai
After unshare test kworker hangs for ages:
$ while :; do unshare -n true; done &
$ perf report <kworker>
- 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k] cleanup_net
cleanup_net
- ops_exit_list.isra.9
- 85,68% igmp6_net_exit
- 53,31% sock_release
- inet_release
- 25,33% rawv6_close
- ip6mr_sk_done
+ 23,38% synchronize_rcu
Keep in mind, this perf report shows the time a function was executing, and
it does not show the time, it was sleeping. But it's easy to imagine, how
much it is sleeping, if synchronize_rcu() execution takes the most time.
Top shows the kworker R time is less then 1%.
This happen, because of in ip6mr_sk_done() we do too many synchronize_rcu(),
even for the sockets, that are not referenced in mr_table, and which are not
need it. So, the progress of kworker becomes very slow.
The patch introduces apparent solution, and it makes ip6mr_sk_done() to skip
synchronize_rcu() for sockets, that are not need that. After the patch,
kworker becomes able to warm the cpu up as expected.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
net/ipv6/ip6mr.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 2a38f9de45d3..290a8d0d5eac 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
}
}
rtnl_unlock();
- synchronize_rcu();
+
+ if (!err)
+ synchronize_rcu();
return err;
}
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed 2018-03-06 16:24 [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed Kirill Tkhai @ 2018-03-06 16:50 ` Eric Dumazet 2018-03-06 16:58 ` Eric Dumazet 2018-03-07 9:22 ` Kirill Tkhai 0 siblings, 2 replies; 11+ messages in thread From: Eric Dumazet @ 2018-03-06 16:50 UTC (permalink / raw) To: Kirill Tkhai, davem, yoshfuji, netdev; +Cc: Yuval Mintz On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote: > After unshare test kworker hangs for ages: > > $ while :; do unshare -n true; done & > > $ perf report <kworker> > - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k] > cleanup_net > cleanup_net > - ops_exit_list.isra.9 > - 85,68% igmp6_net_exit > - 53,31% sock_release > - inet_release > - 25,33% rawv6_close > - ip6mr_sk_done > + 23,38% synchronize_rcu > > Keep in mind, this perf report shows the time a function was > executing, and > it does not show the time, it was sleeping. But it's easy to imagine, > how > much it is sleeping, if synchronize_rcu() execution takes the most > time. > Top shows the kworker R time is less then 1%. > > This happen, because of in ip6mr_sk_done() we do too many > synchronize_rcu(), > even for the sockets, that are not referenced in mr_table, and which > are not > need it. So, the progress of kworker becomes very slow. > > The patch introduces apparent solution, and it makes ip6mr_sk_done() > to skip > synchronize_rcu() for sockets, that are not need that. After the > patch, > kworker becomes able to warm the cpu up as expected. > > Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> > --- > net/ipv6/ip6mr.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c > index 2a38f9de45d3..290a8d0d5eac 100644 > --- a/net/ipv6/ip6mr.c > +++ b/net/ipv6/ip6mr.c > @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk) > } > } > rtnl_unlock(); > - synchronize_rcu(); > + > + if (!err) > + synchronize_rcu(); > But... what is this synchronize_rcu() doing exactly ? This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c ("ip6mr: Make mroute_sk rcu-based") Typically on a delete, the synchronize_rcu() would be needed before freeing the deleted object. But nowadays we have better way : SOCK_RCU_FREE ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed 2018-03-06 16:50 ` Eric Dumazet @ 2018-03-06 16:58 ` Eric Dumazet 2018-03-06 17:20 ` Eric Dumazet 2018-03-07 9:22 ` Kirill Tkhai 1 sibling, 1 reply; 11+ messages in thread From: Eric Dumazet @ 2018-03-06 16:58 UTC (permalink / raw) To: Kirill Tkhai, davem, yoshfuji, netdev; +Cc: Yuval Mintz On Tue, 2018-03-06 at 08:50 -0800, Eric Dumazet wrote: > > But... what is this synchronize_rcu() doing exactly ? > > This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c > > ("ip6mr: Make mroute_sk rcu-based") > > Typically on a delete, the synchronize_rcu() would be needed before > freeing the deleted object. > > But nowadays we have better way : SOCK_RCU_FREE To be clear, your patch is fine Kirill, I am only sad seeing one can add a synchronize_rcu() in hot path without anyone complaining during code review. Reviewed-by: Eric Dumazet <edumazet@google.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed 2018-03-06 16:58 ` Eric Dumazet @ 2018-03-06 17:20 ` Eric Dumazet 0 siblings, 0 replies; 11+ messages in thread From: Eric Dumazet @ 2018-03-06 17:20 UTC (permalink / raw) To: Kirill Tkhai, davem, yoshfuji, netdev; +Cc: Yuval Mintz On Tue, 2018-03-06 at 08:58 -0800, Eric Dumazet wrote: > > To be clear, your patch is fine Kirill, > > I am only sad seeing one can add a synchronize_rcu() in hot path > without anyone complaining during code review. lpaa2:~# time for i in {1..1000}; do unshare -n /bin/false;done real 7m18.911s user 0m0.185s sys 0m2.314s Instead of less than 10 seconds not a long time ago :/ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed 2018-03-06 16:50 ` Eric Dumazet 2018-03-06 16:58 ` Eric Dumazet @ 2018-03-07 9:22 ` Kirill Tkhai 2018-03-07 9:32 ` Kirill Tkhai 1 sibling, 1 reply; 11+ messages in thread From: Kirill Tkhai @ 2018-03-07 9:22 UTC (permalink / raw) To: Eric Dumazet, davem, yoshfuji, netdev, Yuval Mintz On 06.03.2018 19:50, Eric Dumazet wrote: > On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote: >> After unshare test kworker hangs for ages: >> >> $ while :; do unshare -n true; done & >> >> $ perf report <kworker> >> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k] >> cleanup_net >> cleanup_net >> - ops_exit_list.isra.9 >> - 85,68% igmp6_net_exit >> - 53,31% sock_release >> - inet_release >> - 25,33% rawv6_close >> - ip6mr_sk_done >> + 23,38% synchronize_rcu >> >> Keep in mind, this perf report shows the time a function was >> executing, and >> it does not show the time, it was sleeping. But it's easy to imagine, >> how >> much it is sleeping, if synchronize_rcu() execution takes the most >> time. >> Top shows the kworker R time is less then 1%. >> >> This happen, because of in ip6mr_sk_done() we do too many >> synchronize_rcu(), >> even for the sockets, that are not referenced in mr_table, and which >> are not >> need it. So, the progress of kworker becomes very slow. >> >> The patch introduces apparent solution, and it makes ip6mr_sk_done() >> to skip >> synchronize_rcu() for sockets, that are not need that. After the >> patch, >> kworker becomes able to warm the cpu up as expected. >> >> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> >> --- >> net/ipv6/ip6mr.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c >> index 2a38f9de45d3..290a8d0d5eac 100644 >> --- a/net/ipv6/ip6mr.c >> +++ b/net/ipv6/ip6mr.c >> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk) >> } >> } >> rtnl_unlock(); >> - synchronize_rcu(); >> + >> + if (!err) >> + synchronize_rcu(); >> > > > But... what is this synchronize_rcu() doing exactly ? > > This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c > > ("ip6mr: Make mroute_sk rcu-based") > > Typically on a delete, the synchronize_rcu() would be needed before > freeing the deleted object. > > But nowadays we have better way : SOCK_RCU_FREE Hm. I'm agree with you. This is hot path, and mroute sockets created from userspace will delay userspace tasks close() and exit(). Since there may be many such sockets, we may get a zombie task, which can't be reaped for ages. This slows down the system directly. Fix for pernet_operations works, but we need generic solution instead. The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says: ip6mr: Make mroute_sk rcu-based In ipmr the mr_table socket is handled under RCU. Introduce the same for ip6mr. There is no pointing to improvements it invents, or to the problem it solves. The description looks like a cleanup. It's too expensive cleanup, if it worsens the performance a hundred times. Can't we simply revert it?! Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested by Eric)? We actually use rcu_dereference() in ip6mr_cache_report() only. The only user of dereference is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in inet_sock_destruct(). Thanks, Kirill ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed 2018-03-07 9:22 ` Kirill Tkhai @ 2018-03-07 9:32 ` Kirill Tkhai 2018-03-07 13:51 ` Yuval Mintz 0 siblings, 1 reply; 11+ messages in thread From: Kirill Tkhai @ 2018-03-07 9:32 UTC (permalink / raw) To: Eric Dumazet, davem, yoshfuji, netdev, Yuval Mintz On 07.03.2018 12:22, Kirill Tkhai wrote: > On 06.03.2018 19:50, Eric Dumazet wrote: >> On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote: >>> After unshare test kworker hangs for ages: >>> >>> $ while :; do unshare -n true; done & >>> >>> $ perf report <kworker> >>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k] >>> cleanup_net >>> cleanup_net >>> - ops_exit_list.isra.9 >>> - 85,68% igmp6_net_exit >>> - 53,31% sock_release >>> - inet_release >>> - 25,33% rawv6_close >>> - ip6mr_sk_done >>> + 23,38% synchronize_rcu >>> >>> Keep in mind, this perf report shows the time a function was >>> executing, and >>> it does not show the time, it was sleeping. But it's easy to imagine, >>> how >>> much it is sleeping, if synchronize_rcu() execution takes the most >>> time. >>> Top shows the kworker R time is less then 1%. >>> >>> This happen, because of in ip6mr_sk_done() we do too many >>> synchronize_rcu(), >>> even for the sockets, that are not referenced in mr_table, and which >>> are not >>> need it. So, the progress of kworker becomes very slow. >>> >>> The patch introduces apparent solution, and it makes ip6mr_sk_done() >>> to skip >>> synchronize_rcu() for sockets, that are not need that. After the >>> patch, >>> kworker becomes able to warm the cpu up as expected. >>> >>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> >>> --- >>> net/ipv6/ip6mr.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c >>> index 2a38f9de45d3..290a8d0d5eac 100644 >>> --- a/net/ipv6/ip6mr.c >>> +++ b/net/ipv6/ip6mr.c >>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk) >>> } >>> } >>> rtnl_unlock(); >>> - synchronize_rcu(); >>> + >>> + if (!err) >>> + synchronize_rcu(); >>> >> >> >> But... what is this synchronize_rcu() doing exactly ? >> >> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c >> >> ("ip6mr: Make mroute_sk rcu-based") >> >> Typically on a delete, the synchronize_rcu() would be needed before >> freeing the deleted object. >> >> But nowadays we have better way : SOCK_RCU_FREE > > Hm. I'm agree with you. This is hot path, and mroute sockets created from userspace > will delay userspace tasks close() and exit(). Since there may be many such sockets, > we may get a zombie task, which can't be reaped for ages. This slows down the system > directly. > > Fix for pernet_operations works, but we need generic solution instead. > > The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says: > > ip6mr: Make mroute_sk rcu-based > > In ipmr the mr_table socket is handled under RCU. Introduce the same > for ip6mr. > > There is no pointing to improvements it invents, or to the problem it solves. The description > looks like a cleanup. It's too expensive cleanup, if it worsens the performance a hundred > times. > > Can't we simply revert it?! > > Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested by Eric)? > > We actually use rcu_dereference() in ip6mr_cache_report() only. The only user of dereference > is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in inet_sock_destruct(). + So this change should be safe. > Thanks, > Kirill > ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed 2018-03-07 9:32 ` Kirill Tkhai @ 2018-03-07 13:51 ` Yuval Mintz 2018-03-07 14:22 ` Eric Dumazet 0 siblings, 1 reply; 11+ messages in thread From: Yuval Mintz @ 2018-03-07 13:51 UTC (permalink / raw) To: Kirill Tkhai, Eric Dumazet Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org, netdev@vger.kernel.org > >>> After unshare test kworker hangs for ages: > >>> > >>> $ while :; do unshare -n true; done & > >>> > >>> $ perf report <kworker> > >>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k] > >>> cleanup_net > >>> cleanup_net > >>> - ops_exit_list.isra.9 > >>> - 85,68% igmp6_net_exit > >>> - 53,31% sock_release > >>> - inet_release > >>> - 25,33% rawv6_close > >>> - ip6mr_sk_done > >>> + 23,38% synchronize_rcu > >>> > >>> Keep in mind, this perf report shows the time a function was > >>> executing, and > >>> it does not show the time, it was sleeping. But it's easy to imagine, > >>> how > >>> much it is sleeping, if synchronize_rcu() execution takes the most > >>> time. > >>> Top shows the kworker R time is less then 1%. > >>> > >>> This happen, because of in ip6mr_sk_done() we do too many > >>> synchronize_rcu(), > >>> even for the sockets, that are not referenced in mr_table, and which > >>> are not > >>> need it. So, the progress of kworker becomes very slow. > >>> > >>> The patch introduces apparent solution, and it makes ip6mr_sk_done() > >>> to skip > >>> synchronize_rcu() for sockets, that are not need that. After the > >>> patch, > >>> kworker becomes able to warm the cpu up as expected. > >>> > >>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> > >>> --- > >>> net/ipv6/ip6mr.c | 4 +++- > >>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c > >>> index 2a38f9de45d3..290a8d0d5eac 100644 > >>> --- a/net/ipv6/ip6mr.c > >>> +++ b/net/ipv6/ip6mr.c > >>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk) > >>> } > >>> } > >>> rtnl_unlock(); > >>> - synchronize_rcu(); > >>> + > >>> + if (!err) > >>> + synchronize_rcu(); > >>> > >> > >> > >> But... what is this synchronize_rcu() doing exactly ? > >> > >> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c > >> > >> ("ip6mr: Make mroute_sk rcu-based") > >> > >> Typically on a delete, the synchronize_rcu() would be needed before > >> freeing the deleted object. > >> > >> But nowadays we have better way : SOCK_RCU_FREE > > > > Hm. I'm agree with you. This is hot path, and mroute sockets created from > userspace > > will delay userspace tasks close() and exit(). Since there may be many such > sockets, > > we may get a zombie task, which can't be reaped for ages. This slows down > the system > > directly. > > > > Fix for pernet_operations works, but we need generic solution instead. > > > > The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says: > > > > ip6mr: Make mroute_sk rcu-based > > > > In ipmr the mr_table socket is handled under RCU. Introduce the same > > for ip6mr. > > > > There is no pointing to improvements it invents, or to the problem it solves. > The description > > looks like a cleanup. It's too expensive cleanup, if it worsens the > performance a hundred > > times. > > > > Can't we simply revert it?! > > > > Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested > by Eric)? Sorry, failed to notice ip6mr_sk_done() is called unconditionally from rawv6_close(). But even with your suggested fix it should be ~resolved [as only sockets used for ip6mr would reach the sync]. Or are you claiming there's still some performance hit even with your suggested change? > > > > We actually use rcu_dereference() in ip6mr_cache_report() only. The only > user of dereference > > is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in > inet_sock_destruct(). > > + So this change should be safe. I might have misunderstood the comment from commit 4c9687098f24 ("ipmr: RCU conversion of mroute_sk") when writing this; Thought comment regarding ip_ra_destroy() meant that for the IPv6 case we DO have to make sure there's a grace-period before destroying the socket. > > > Thanks, > > Kirill > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed 2018-03-07 13:51 ` Yuval Mintz @ 2018-03-07 14:22 ` Eric Dumazet 2018-03-07 16:43 ` [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE Eric Dumazet 0 siblings, 1 reply; 11+ messages in thread From: Eric Dumazet @ 2018-03-07 14:22 UTC (permalink / raw) To: Yuval Mintz, Kirill Tkhai Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org, netdev@vger.kernel.org On Wed, 2018-03-07 at 13:51 +0000, Yuval Mintz wrote: > > > > > After unshare test kworker hangs for ages: > > > > > > > > > > $ while :; do unshare -n true; done & > > > > > > > > > > $ perf report <kworker> > > > > > - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k > > > > > ] > > > > > cleanup_net > > > > > cleanup_net > > > > > - ops_exit_list.isra.9 > > > > > - 85,68% igmp6_net_exit > > > > > - 53,31% sock_release > > > > > - inet_release > > > > > - 25,33% rawv6_close > > > > > - ip6mr_sk_done > > > > > + 23,38% synchronize_rcu > > > > > > > > > > Keep in mind, this perf report shows the time a function was > > > > > executing, and > > > > > it does not show the time, it was sleeping. But it's easy to > > > > > imagine, > > > > > how > > > > > much it is sleeping, if synchronize_rcu() execution takes the > > > > > most > > > > > time. > > > > > Top shows the kworker R time is less then 1%. > > > > > > > > > > This happen, because of in ip6mr_sk_done() we do too many > > > > > synchronize_rcu(), > > > > > even for the sockets, that are not referenced in mr_table, > > > > > and which > > > > > are not > > > > > need it. So, the progress of kworker becomes very slow. > > > > > > > > > > The patch introduces apparent solution, and it makes > > > > > ip6mr_sk_done() > > > > > to skip > > > > > synchronize_rcu() for sockets, that are not need that. After > > > > > the > > > > > patch, > > > > > kworker becomes able to warm the cpu up as expected. > > > > > > > > > > Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> > > > > > --- > > > > > net/ipv6/ip6mr.c | 4 +++- > > > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c > > > > > index 2a38f9de45d3..290a8d0d5eac 100644 > > > > > --- a/net/ipv6/ip6mr.c > > > > > +++ b/net/ipv6/ip6mr.c > > > > > @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk) > > > > > } > > > > > } > > > > > rtnl_unlock(); > > > > > - synchronize_rcu(); > > > > > + > > > > > + if (!err) > > > > > + synchronize_rcu(); > > > > > > > > > > > > > > > > > But... what is this synchronize_rcu() doing exactly ? > > > > > > > > This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c > > > > > > > > ("ip6mr: Make mroute_sk rcu-based") > > > > > > > > Typically on a delete, the synchronize_rcu() would be needed > > > > before > > > > freeing the deleted object. > > > > > > > > But nowadays we have better way : SOCK_RCU_FREE > > > > > > Hm. I'm agree with you. This is hot path, and mroute sockets > > > created from > > > > userspace > > > will delay userspace tasks close() and exit(). Since there may be > > > many such > > > > sockets, > > > we may get a zombie task, which can't be reaped for ages. This > > > slows down > > > > the system > > > directly. > > > > > > Fix for pernet_operations works, but we need generic solution > > > instead. > > > > > > The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says: > > > > > > ip6mr: Make mroute_sk rcu-based > > > > > > In ipmr the mr_table socket is handled under RCU. Introduce > > > the same > > > for ip6mr. > > > > > > There is no pointing to improvements it invents, or to the > > > problem it solves. > > > > The description > > > looks like a cleanup. It's too expensive cleanup, if it worsens > > > the > > > > performance a hundred > > > times. > > > > > > Can't we simply revert it?! > > > > > > Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE > > > suggested > > > > by Eric)? > > Sorry, failed to notice ip6mr_sk_done() is called unconditionally > from > rawv6_close(). But even with your suggested fix it should be > ~resolved > [as only sockets used for ip6mr would reach the sync] Yes but some user setups could still hit this synchronize_rcu() quite hard. New synchronize_rcu() should not be added in possibly critical paths unless absolutely needed (if really there is no other ways) We have to be very careful here, since this kind of mistake can have a 100x impact. > . > Or are you claiming there's still some performance hit even with your > suggested change? > > > > > > > We actually use rcu_dereference() in ip6mr_cache_report() only. > > > The only > > > > user of dereference > > > is sock_queue_rcv_skb(). Superficial analysis shows we purge the > > > queue in > > > > inet_sock_destruct(). > > > > + So this change should be safe. > > I might have misunderstood the comment from > commit 4c9687098f24 ("ipmr: RCU conversion of mroute_sk") when > writing > this; Thought comment regarding ip_ra_destroy() meant that for the > IPv6 case > we DO have to make sure there's a grace-period before destroying the > socket. Yeah but since 2010 things have changed a lot in the kernel ;) And we now have SOCK_RCU_FREE for the few sockets that need RCU grace period at dismantle. Note that TCP sockets are RCU protected but do not request SOCK_RCU_FREE in general (only listeners use this) SOCK_RCU_FREE will not require a synchronize_rcu() stall, and instead use the asynchronous RCU infrastructure ( call_rcu() ) So all we need now is to make sure these (RAW / ICMPV6) IPv6 sockets have this flag set, then we can remove this problematic synchronize_rcu() from ip6mr_sk_done() I will test the following fix. diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 2a38f9de45d399fafc9f4bcc662b44be17279e51..7345bd6c4b7dda39c0d73d542e9ca9a5366542ff 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -1443,6 +1443,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk) err = -EADDRINUSE; } else { rcu_assign_pointer(mrt->mroute_sk, sk); + sock_set_flag(sk, SOCK_RCU_FREE); net->ipv6.devconf_all->mc_forwarding++; } write_unlock_bh(&mrt_lock); @@ -1472,6 +1473,10 @@ int ip6mr_sk_done(struct sock *sk) if (sk == rtnl_dereference(mrt->mroute_sk)) { write_lock_bh(&mrt_lock); RCU_INIT_POINTER(mrt->mroute_sk, NULL); + /* Note that mroute_sk had SOCK_RCU_FREE set, + * so the RCU grace period before sk freeing + * is guaranteed by sk_destruct() + */ net->ipv6.devconf_all->mc_forwarding--; write_unlock_bh(&mrt_lock); inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, @@ -1485,7 +1490,6 @@ int ip6mr_sk_done(struct sock *sk) } } rtnl_unlock(); - synchronize_rcu(); return err; } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE 2018-03-07 14:22 ` Eric Dumazet @ 2018-03-07 16:43 ` Eric Dumazet 2018-03-07 22:11 ` Kirill Tkhai 2018-03-07 23:14 ` David Miller 0 siblings, 2 replies; 11+ messages in thread From: Eric Dumazet @ 2018-03-07 16:43 UTC (permalink / raw) To: Yuval Mintz, Kirill Tkhai Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org, netdev@vger.kernel.org From: Eric Dumazet <edumazet@google.com> Kirill found that recently added synchronize_rcu() call in ip6mr_sk_done() was slowing down netns dismantle and posted a patch to use it only if the socket was found. I instead suggested to get rid of this call, and use instead SOCK_RCU_FREE We might later change IPv4 side to use the same technique and unify both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu() that could be replaced by SOCK_RCU_FREE. Tested: time for i in {1..1000}; do unshare -n /bin/false;done Before : real 7m18.911s After : real 10.187s Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Yuval Mintz <yuvalm@mellanox.com> --- net/ipv6/ip6mr.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 2a38f9de45d399fafc9f4bcc662b44be17279e51..7345bd6c4b7dda39c0d73d542e9ca9a5366542ff 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -1443,6 +1443,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk) err = -EADDRINUSE; } else { rcu_assign_pointer(mrt->mroute_sk, sk); + sock_set_flag(sk, SOCK_RCU_FREE); net->ipv6.devconf_all->mc_forwarding++; } write_unlock_bh(&mrt_lock); @@ -1472,6 +1473,10 @@ int ip6mr_sk_done(struct sock *sk) if (sk == rtnl_dereference(mrt->mroute_sk)) { write_lock_bh(&mrt_lock); RCU_INIT_POINTER(mrt->mroute_sk, NULL); + /* Note that mroute_sk had SOCK_RCU_FREE set, + * so the RCU grace period before sk freeing + * is guaranteed by sk_destruct() + */ net->ipv6.devconf_all->mc_forwarding--; write_unlock_bh(&mrt_lock); inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, @@ -1485,7 +1490,6 @@ int ip6mr_sk_done(struct sock *sk) } } rtnl_unlock(); - synchronize_rcu(); return err; } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE 2018-03-07 16:43 ` [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE Eric Dumazet @ 2018-03-07 22:11 ` Kirill Tkhai 2018-03-07 23:14 ` David Miller 1 sibling, 0 replies; 11+ messages in thread From: Kirill Tkhai @ 2018-03-07 22:11 UTC (permalink / raw) To: Eric Dumazet, Yuval Mintz Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org, netdev@vger.kernel.org On 07.03.2018 19:43, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > Kirill found that recently added synchronize_rcu() call in > ip6mr_sk_done() > was slowing down netns dismantle and posted a patch to use it only if > the socket > was found. > > I instead suggested to get rid of this call, and use instead > SOCK_RCU_FREE > > We might later change IPv4 side to use the same technique and unify > both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu() > that could be replaced by SOCK_RCU_FREE. > > Tested: > time for i in {1..1000}; do unshare -n /bin/false;done > > Before : real 7m18.911s > After : real 10.187s > > Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> > Cc: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> > --- > net/ipv6/ip6mr.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c > index 2a38f9de45d399fafc9f4bcc662b44be17279e51..7345bd6c4b7dda39c0d73d542e9ca9a5366542ff 100644 > --- a/net/ipv6/ip6mr.c > +++ b/net/ipv6/ip6mr.c > @@ -1443,6 +1443,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk) > err = -EADDRINUSE; > } else { > rcu_assign_pointer(mrt->mroute_sk, sk); > + sock_set_flag(sk, SOCK_RCU_FREE); > net->ipv6.devconf_all->mc_forwarding++; > } > write_unlock_bh(&mrt_lock); > @@ -1472,6 +1473,10 @@ int ip6mr_sk_done(struct sock *sk) > if (sk == rtnl_dereference(mrt->mroute_sk)) { > write_lock_bh(&mrt_lock); > RCU_INIT_POINTER(mrt->mroute_sk, NULL); > + /* Note that mroute_sk had SOCK_RCU_FREE set, > + * so the RCU grace period before sk freeing > + * is guaranteed by sk_destruct() > + */ > net->ipv6.devconf_all->mc_forwarding--; > write_unlock_bh(&mrt_lock); > inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, > @@ -1485,7 +1490,6 @@ int ip6mr_sk_done(struct sock *sk) > } > } > rtnl_unlock(); > - synchronize_rcu(); > > return err; > } > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE 2018-03-07 16:43 ` [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE Eric Dumazet 2018-03-07 22:11 ` Kirill Tkhai @ 2018-03-07 23:14 ` David Miller 1 sibling, 0 replies; 11+ messages in thread From: David Miller @ 2018-03-07 23:14 UTC (permalink / raw) To: eric.dumazet; +Cc: yuvalm, ktkhai, yoshfuji, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 07 Mar 2018 08:43:19 -0800 > From: Eric Dumazet <edumazet@google.com> > > Kirill found that recently added synchronize_rcu() call in > ip6mr_sk_done() > was slowing down netns dismantle and posted a patch to use it only if > the socket > was found. > > I instead suggested to get rid of this call, and use instead > SOCK_RCU_FREE > > We might later change IPv4 side to use the same technique and unify > both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu() > that could be replaced by SOCK_RCU_FREE. > > Tested: > time for i in {1..1000}; do unshare -n /bin/false;done > > Before : real 7m18.911s > After : real 10.187s > > Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> > Cc: Yuval Mintz <yuvalm@mellanox.com> Looks great, applied, thanks everyone. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-03-07 23:14 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-03-06 16:24 [PATCH RESEND net-next] net: Do synchronize_rcu() in ip6mr_sk_done() only if this is needed Kirill Tkhai 2018-03-06 16:50 ` Eric Dumazet 2018-03-06 16:58 ` Eric Dumazet 2018-03-06 17:20 ` Eric Dumazet 2018-03-07 9:22 ` Kirill Tkhai 2018-03-07 9:32 ` Kirill Tkhai 2018-03-07 13:51 ` Yuval Mintz 2018-03-07 14:22 ` Eric Dumazet 2018-03-07 16:43 ` [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE Eric Dumazet 2018-03-07 22:11 ` Kirill Tkhai 2018-03-07 23:14 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).