* Question about synchronize_net() in AF_PACKET close()
@ 2014-09-09 21:44 Martin Kelly
2014-09-09 22:00 ` David Miller
0 siblings, 1 reply; 10+ messages in thread
From: Martin Kelly @ 2014-09-09 21:44 UTC (permalink / raw)
To: netdev; +Cc: Paul E. McKenney, Stephen Hemminger
Hi,
I have been reading the code in net/packet/af_packet.c in order to
optimize the runtime for raw socket close(). In certain cases, close()
can take a long time to return because of the synchronize_net()
overhead. In pursuit of speeding up close(), I have been testing a patch
that defers the socket buffer memory release via call_rcu() in order to
make close() return faster. However, I hit a tricky RCU question for
which I don't have the answer and wanted to know if any RCU/networking
experts could provide some guidance.
In net/packet/af_packet.c, I noticed the following few lines within
packet_release:
synchronize_net();
/*
* Now the socket is dead. No more input will appear.
*/
sock_orphan(sk);
sock->sk = NULL;
>From testing and code analysis, I have found that it appears to be safe
to move sock_orphan and sock->sk = NULL before the synchronize_net()
call like so:
/*
* Now the socket is dead. No more input will appear.
*/
sock_orphan(sk);
sock->sk = NULL;
synchronize_net();
Could some RCU and/or networking experts chime in about whether this a
safe operation? For all I know, there is some deep, fundamental reason
why those lines are in the order they are. On the other hand, perhaps
there is not.
Thanks,
Martin
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Question about synchronize_net() in AF_PACKET close() 2014-09-09 21:44 Question about synchronize_net() in AF_PACKET close() Martin Kelly @ 2014-09-09 22:00 ` David Miller 2014-09-10 13:19 ` Marcelo Ricardo Leitner 2014-09-10 21:37 ` Martin Kelly 0 siblings, 2 replies; 10+ messages in thread From: David Miller @ 2014-09-09 22:00 UTC (permalink / raw) To: martin; +Cc: netdev, paulmck, stephen From: Martin Kelly <martin@martingkelly.com> Date: Tue, 9 Sep 2014 14:44:56 -0700 > In net/packet/af_packet.c, I noticed the following few lines within > packet_release: > > synchronize_net(); > /* > * Now the socket is dead. No more input will appear. > */ > sock_orphan(sk); > sock->sk = NULL; > > From testing and code analysis, I have found that it appears to be safe > to move sock_orphan and sock->sk = NULL before the synchronize_net() > call like so: > > /* > * Now the socket is dead. No more input will appear. > */ > sock_orphan(sk); > sock->sk = NULL; > synchronize_net(); > > Could some RCU and/or networking experts chime in about whether this a > safe operation? For all I know, there is some deep, fundamental reason > why those lines are in the order they are. On the other hand, perhaps > there is not. The synchronize_net() is also there to protect against the prot hook which can run asynchronously from the core packet input path on any cpu. You probably want to reference commit: commit 808f5114a9206fee855117d416440e1071ab375c Author: stephen hemminger <shemminger@vyatta.com> Date: Mon Feb 22 07:57:18 2010 +0000 packet: convert socket list to RCU (v3) which put the synchronize_net() there in the first place. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close() 2014-09-09 22:00 ` David Miller @ 2014-09-10 13:19 ` Marcelo Ricardo Leitner 2014-09-10 21:39 ` Martin Kelly 2014-09-10 21:37 ` Martin Kelly 1 sibling, 1 reply; 10+ messages in thread From: Marcelo Ricardo Leitner @ 2014-09-10 13:19 UTC (permalink / raw) To: martin; +Cc: David Miller, netdev, paulmck, stephen Em 09-09-2014 19:00, David Miller escreveu: > From: Martin Kelly <martin@martingkelly.com> > Date: Tue, 9 Sep 2014 14:44:56 -0700 > >> In net/packet/af_packet.c, I noticed the following few lines within >> packet_release: >> >> synchronize_net(); >> /* >> * Now the socket is dead. No more input will appear. >> */ >> sock_orphan(sk); >> sock->sk = NULL; >> >> From testing and code analysis, I have found that it appears to be safe >> to move sock_orphan and sock->sk = NULL before the synchronize_net() >> call like so: >> >> /* >> * Now the socket is dead. No more input will appear. >> */ >> sock_orphan(sk); >> sock->sk = NULL; >> synchronize_net(); >> >> Could some RCU and/or networking experts chime in about whether this a >> safe operation? For all I know, there is some deep, fundamental reason >> why those lines are in the order they are. On the other hand, perhaps >> there is not. > > The synchronize_net() is also there to protect against the prot hook > which can run asynchronously from the core packet input path on any > cpu. > > You probably want to reference commit: > > commit 808f5114a9206fee855117d416440e1071ab375c > Author: stephen hemminger <shemminger@vyatta.com> > Date: Mon Feb 22 07:57:18 2010 +0000 > > packet: convert socket list to RCU (v3) > > which put the synchronize_net() there in the first place. And also http://patchwork.ozlabs.org/patch/181605/ Which was an attempt on improving it. Marcelo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close() 2014-09-10 13:19 ` Marcelo Ricardo Leitner @ 2014-09-10 21:39 ` Martin Kelly 0 siblings, 0 replies; 10+ messages in thread From: Martin Kelly @ 2014-09-10 21:39 UTC (permalink / raw) To: mleitner; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger > > > And also http://patchwork.ozlabs.org/patch/181605/ > Which was an attempt on improving it. > > Marcelo > Yes, the patch I'm testing right now is very similar to the one mentioned in that link. Whether or not the optimization is worth the added complexity is a separate discussion. For now, I'm interested in whether the optimization can be safely performed, both for improving close() performance and for my own understanding of the code. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close() 2014-09-09 22:00 ` David Miller 2014-09-10 13:19 ` Marcelo Ricardo Leitner @ 2014-09-10 21:37 ` Martin Kelly 2014-09-17 14:29 ` Martin Kelly 1 sibling, 1 reply; 10+ messages in thread From: Martin Kelly @ 2014-09-10 21:37 UTC (permalink / raw) To: David Miller; +Cc: netdev, Paul McKenney, Stephen Hemminger > The synchronize_net() is also there to protect against the prot hook > which can run asynchronously from the core packet input path on any > cpu. > Yes, understood. What I'm not clear about is whether it is safe to do the following: unregister_prot_hook(sk, false); sock_orphan(sk); sock->sk = NULL; call_rcu(...); close socket, return to userspace instead of unregister_prot_hook(sk, false); synchronize_net(); sock_orphan(sk); sock->sk = NULL; close socket, return to userspace If you don't call synchronize_net() immediately, then other readers could see the protocol hook in the protocol list and try to use it. They could call into prot_hook.func. However, it appears that such functions ( e.g. packet_rcv() ) touch the socket buffer but not the socket itself, so orphaning the socket before all RCUs have been processed is safe. In addition, no new packets will come in after packet_release() and touch the socket because the socket fd will be removed from the process fd list. >From my testing, I'm not seeing any obvious issues, but I could be missing something. Is orphaning the socket before all RCUs have finished unsafe? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close() 2014-09-10 21:37 ` Martin Kelly @ 2014-09-17 14:29 ` Martin Kelly 2014-09-17 14:54 ` Eric Dumazet 0 siblings, 1 reply; 10+ messages in thread From: Martin Kelly @ 2014-09-17 14:29 UTC (permalink / raw) To: David Miller; +Cc: netdev, Paul McKenney, Stephen Hemminger On 09/10/2014 02:37 PM, Martin Kelly wrote: >> The synchronize_net() is also there to protect against the prot hook >> which can run asynchronously from the core packet input path on any >> cpu. >> > > Yes, understood. What I'm not clear about is whether it is safe to do > the following: > > unregister_prot_hook(sk, false); > sock_orphan(sk); > sock->sk = NULL; > call_rcu(...); > close socket, return to userspace > > instead of > > unregister_prot_hook(sk, false); > synchronize_net(); > sock_orphan(sk); > sock->sk = NULL; > close socket, return to userspace > > If you don't call synchronize_net() immediately, then other readers > could see the protocol hook in the protocol list and try to use it. > They could call into prot_hook.func. However, it appears that such > functions ( e.g. packet_rcv() ) touch the socket buffer but not the > socket itself, so orphaning the socket before all RCUs have been > processed is safe. In addition, no new packets will come in after > packet_release() and touch the socket because the socket fd will be > removed from the process fd list. > > From my testing, I'm not seeing any obvious issues, but I could be > missing something. Is orphaning the socket before all RCUs have > finished unsafe? > (friendly ping) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close() 2014-09-17 14:29 ` Martin Kelly @ 2014-09-17 14:54 ` Eric Dumazet 2014-09-17 17:04 ` Martin Kelly 0 siblings, 1 reply; 10+ messages in thread From: Eric Dumazet @ 2014-09-17 14:54 UTC (permalink / raw) To: Martin Kelly; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger On Wed, 2014-09-17 at 07:29 -0700, Martin Kelly wrote: > On 09/10/2014 02:37 PM, Martin Kelly wrote: > >> The synchronize_net() is also there to protect against the prot hook > >> which can run asynchronously from the core packet input path on any > >> cpu. > >> > > > > Yes, understood. What I'm not clear about is whether it is safe to do > > the following: > > > > unregister_prot_hook(sk, false); > > sock_orphan(sk); > > sock->sk = NULL; > > call_rcu(...); Can you describe the ... ? > > close socket, return to userspace > > > > instead of > > > > unregister_prot_hook(sk, false); > > synchronize_net(); > > sock_orphan(sk); > > sock->sk = NULL; > > close socket, return to userspace > > > > If you don't call synchronize_net() immediately, then other readers > > could see the protocol hook in the protocol list and try to use it. > > They could call into prot_hook.func. However, it appears that such > > functions ( e.g. packet_rcv() ) touch the socket buffer but not the > > socket itself, so orphaning the socket before all RCUs have been > > processed is safe. In addition, no new packets will come in after > > packet_release() and touch the socket because the socket fd will be > > removed from the process fd list. > > > > From my testing, I'm not seeing any obvious issues, but I could be > > missing something. Is orphaning the socket before all RCUs have > > finished unsafe? > > > > (friendly ping) What problem do you want to solve exactly ? I believe its not safe, you missed sk_data_ready() call (sock_def_readable()) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close() 2014-09-17 14:54 ` Eric Dumazet @ 2014-09-17 17:04 ` Martin Kelly 2014-09-17 17:52 ` Eric Dumazet 0 siblings, 1 reply; 10+ messages in thread From: Martin Kelly @ 2014-09-17 17:04 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger On Wed, Sep 17, 2014 at 7:54 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Can you describe the ... ? > Sure. The patch I've been looking at is functionally identical to the one referenced in http://patchwork.ozlabs.org/patch/181605/ with the exception that sock_orphan() and sock->sk = NULL is moved before the synchronize_net(). This further improves the performance relative to http://patchwork.ozlabs.org/patch/181605/ because it never calls synchronize_net(), whereas http://patchwork.ozlabs.org/patch/181605/ calls synchronize_net() in the po->running case. However, it looks like making that optimization is not safe, for the reasons you mentioned below. > > What problem do you want to solve exactly ? > > I believe its not safe, you missed sk_data_ready() call > (sock_def_readable()) > The problem I I'm trying to solve is the same one mentioned in http://patchwork.ozlabs.org/patch/181605/: On systems with a lot of RCU contention, raw socket close() can take up to 300 ms or so, which really adds up for processes that open lots of raw sockets. What you see is a many-second hang when the process exits as it waits for many synchronize_rcu calls (all in serial) to finish. Although it would be better for close() not to be in a critical path, it's sometimes hard to get around for some userspace processes, so this kind of change could yield a significant performance improvement for such processes. You're right, I missed sk_def_readable(), which looks like it would cause a crash if it occurred before the call_rcu() was called. I see that sk_def_readable() acceses sk->sk_wq through rcu_read_lock(), rcu_dereference(), rcu_read_unlock(), while sock_orphan() sets sk->sk_wq = NULL without an RCU function. You could modify sock_orphan() and sock->sk to use RCU too, but you would need to patch all access sites, and the patch would quickly become complex and error-prone, which is not worth it. Does my analysis sound correct to you? If so, can you think of a more natural way to improve raw socket close() performance? I agree that optimally, userspace should not put close() in a critical path, but it still seems like a bug for close() to take 300 ms to complete. Thanks, Martin ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close() 2014-09-17 17:04 ` Martin Kelly @ 2014-09-17 17:52 ` Eric Dumazet 2014-09-17 18:58 ` Martin Kelly 0 siblings, 1 reply; 10+ messages in thread From: Eric Dumazet @ 2014-09-17 17:52 UTC (permalink / raw) To: Martin Kelly; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger On Wed, 2014-09-17 at 10:04 -0700, Martin Kelly wrote: > Does my analysis sound correct to you? If so, can you think of a more > natural way to improve raw socket close() performance? I agree that > optimally, userspace should not put close() in a critical path, but it > still seems like a bug for close() to take 300 ms to complete. I have a plan to remove the nulls support for UDP stack, because we would like to be able to use million of connected UDP sockets, and we would like to get rid of the atomic_inc()/atomic_dec() on socket refcount for every incoming message ;) My plan was to add a way for a socket to be freed after one rcu grace period. It would be an opt-in for protocols needing this. TCP stack would continue to use SLAB_DESTROY_BY_RCU. af_packet could immediately use this, and not have to use synchronize_net() at all. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close() 2014-09-17 17:52 ` Eric Dumazet @ 2014-09-17 18:58 ` Martin Kelly 0 siblings, 0 replies; 10+ messages in thread From: Martin Kelly @ 2014-09-17 18:58 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger On Wed, Sep 17, 2014 at 10:52 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > I have a plan to remove the nulls support for UDP stack, because we > would like to be able to use million of connected UDP sockets, and we > would like to get rid of the atomic_inc()/atomic_dec() on socket > refcount for every incoming message ;) > > My plan was to add a way for a socket to be freed after one rcu grace > period. > > It would be an opt-in for protocols needing this. TCP stack would > continue to use SLAB_DESTROY_BY_RCU. > > af_packet could immediately use this, and not have to use > synchronize_net() at all. > That sounds like a great idea. Do you have a rough estimate of how long that work will take or what will be required? ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-09-17 18:58 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-09-09 21:44 Question about synchronize_net() in AF_PACKET close() Martin Kelly 2014-09-09 22:00 ` David Miller 2014-09-10 13:19 ` Marcelo Ricardo Leitner 2014-09-10 21:39 ` Martin Kelly 2014-09-10 21:37 ` Martin Kelly 2014-09-17 14:29 ` Martin Kelly 2014-09-17 14:54 ` Eric Dumazet 2014-09-17 17:04 ` Martin Kelly 2014-09-17 17:52 ` Eric Dumazet 2014-09-17 18:58 ` Martin Kelly
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).