* Question about synchronize_net() in AF_PACKET close()
@ 2014-09-09 21:44 Martin Kelly
2014-09-09 22:00 ` David Miller
0 siblings, 1 reply; 10+ messages in thread
From: Martin Kelly @ 2014-09-09 21:44 UTC (permalink / raw)
To: netdev; +Cc: Paul E. McKenney, Stephen Hemminger
Hi,
I have been reading the code in net/packet/af_packet.c in order to
optimize the runtime for raw socket close(). In certain cases, close()
can take a long time to return because of the synchronize_net()
overhead. In pursuit of speeding up close(), I have been testing a patch
that defers the socket buffer memory release via call_rcu() in order to
make close() return faster. However, I hit a tricky RCU question for
which I don't have the answer and wanted to know if any RCU/networking
experts could provide some guidance.
In net/packet/af_packet.c, I noticed the following few lines within
packet_release:
synchronize_net();
/*
* Now the socket is dead. No more input will appear.
*/
sock_orphan(sk);
sock->sk = NULL;
>From testing and code analysis, I have found that it appears to be safe
to move sock_orphan and sock->sk = NULL before the synchronize_net()
call like so:
/*
* Now the socket is dead. No more input will appear.
*/
sock_orphan(sk);
sock->sk = NULL;
synchronize_net();
Could some RCU and/or networking experts chime in about whether this a
safe operation? For all I know, there is some deep, fundamental reason
why those lines are in the order they are. On the other hand, perhaps
there is not.
Thanks,
Martin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close()
2014-09-09 21:44 Question about synchronize_net() in AF_PACKET close() Martin Kelly
@ 2014-09-09 22:00 ` David Miller
2014-09-10 13:19 ` Marcelo Ricardo Leitner
2014-09-10 21:37 ` Martin Kelly
0 siblings, 2 replies; 10+ messages in thread
From: David Miller @ 2014-09-09 22:00 UTC (permalink / raw)
To: martin; +Cc: netdev, paulmck, stephen
From: Martin Kelly <martin@martingkelly.com>
Date: Tue, 9 Sep 2014 14:44:56 -0700
> In net/packet/af_packet.c, I noticed the following few lines within
> packet_release:
>
> synchronize_net();
> /*
> * Now the socket is dead. No more input will appear.
> */
> sock_orphan(sk);
> sock->sk = NULL;
>
> From testing and code analysis, I have found that it appears to be safe
> to move sock_orphan and sock->sk = NULL before the synchronize_net()
> call like so:
>
> /*
> * Now the socket is dead. No more input will appear.
> */
> sock_orphan(sk);
> sock->sk = NULL;
> synchronize_net();
>
> Could some RCU and/or networking experts chime in about whether this a
> safe operation? For all I know, there is some deep, fundamental reason
> why those lines are in the order they are. On the other hand, perhaps
> there is not.
The synchronize_net() is also there to protect against the prot hook
which can run asynchronously from the core packet input path on any
cpu.
You probably want to reference commit:
commit 808f5114a9206fee855117d416440e1071ab375c
Author: stephen hemminger <shemminger@vyatta.com>
Date: Mon Feb 22 07:57:18 2010 +0000
packet: convert socket list to RCU (v3)
which put the synchronize_net() there in the first place.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close()
2014-09-09 22:00 ` David Miller
@ 2014-09-10 13:19 ` Marcelo Ricardo Leitner
2014-09-10 21:39 ` Martin Kelly
2014-09-10 21:37 ` Martin Kelly
1 sibling, 1 reply; 10+ messages in thread
From: Marcelo Ricardo Leitner @ 2014-09-10 13:19 UTC (permalink / raw)
To: martin; +Cc: David Miller, netdev, paulmck, stephen
Em 09-09-2014 19:00, David Miller escreveu:
> From: Martin Kelly <martin@martingkelly.com>
> Date: Tue, 9 Sep 2014 14:44:56 -0700
>
>> In net/packet/af_packet.c, I noticed the following few lines within
>> packet_release:
>>
>> synchronize_net();
>> /*
>> * Now the socket is dead. No more input will appear.
>> */
>> sock_orphan(sk);
>> sock->sk = NULL;
>>
>> From testing and code analysis, I have found that it appears to be safe
>> to move sock_orphan and sock->sk = NULL before the synchronize_net()
>> call like so:
>>
>> /*
>> * Now the socket is dead. No more input will appear.
>> */
>> sock_orphan(sk);
>> sock->sk = NULL;
>> synchronize_net();
>>
>> Could some RCU and/or networking experts chime in about whether this a
>> safe operation? For all I know, there is some deep, fundamental reason
>> why those lines are in the order they are. On the other hand, perhaps
>> there is not.
>
> The synchronize_net() is also there to protect against the prot hook
> which can run asynchronously from the core packet input path on any
> cpu.
>
> You probably want to reference commit:
>
> commit 808f5114a9206fee855117d416440e1071ab375c
> Author: stephen hemminger <shemminger@vyatta.com>
> Date: Mon Feb 22 07:57:18 2010 +0000
>
> packet: convert socket list to RCU (v3)
>
> which put the synchronize_net() there in the first place.
And also http://patchwork.ozlabs.org/patch/181605/
Which was an attempt on improving it.
Marcelo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close()
2014-09-09 22:00 ` David Miller
2014-09-10 13:19 ` Marcelo Ricardo Leitner
@ 2014-09-10 21:37 ` Martin Kelly
2014-09-17 14:29 ` Martin Kelly
1 sibling, 1 reply; 10+ messages in thread
From: Martin Kelly @ 2014-09-10 21:37 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Paul McKenney, Stephen Hemminger
> The synchronize_net() is also there to protect against the prot hook
> which can run asynchronously from the core packet input path on any
> cpu.
>
Yes, understood. What I'm not clear about is whether it is safe to do
the following:
unregister_prot_hook(sk, false);
sock_orphan(sk);
sock->sk = NULL;
call_rcu(...);
close socket, return to userspace
instead of
unregister_prot_hook(sk, false);
synchronize_net();
sock_orphan(sk);
sock->sk = NULL;
close socket, return to userspace
If you don't call synchronize_net() immediately, then other readers
could see the protocol hook in the protocol list and try to use it.
They could call into prot_hook.func. However, it appears that such
functions ( e.g. packet_rcv() ) touch the socket buffer but not the
socket itself, so orphaning the socket before all RCUs have been
processed is safe. In addition, no new packets will come in after
packet_release() and touch the socket because the socket fd will be
removed from the process fd list.
>From my testing, I'm not seeing any obvious issues, but I could be
missing something. Is orphaning the socket before all RCUs have
finished unsafe?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close()
2014-09-10 13:19 ` Marcelo Ricardo Leitner
@ 2014-09-10 21:39 ` Martin Kelly
0 siblings, 0 replies; 10+ messages in thread
From: Martin Kelly @ 2014-09-10 21:39 UTC (permalink / raw)
To: mleitner; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger
>
>
> And also http://patchwork.ozlabs.org/patch/181605/
> Which was an attempt on improving it.
>
> Marcelo
>
Yes, the patch I'm testing right now is very similar to the one
mentioned in that link. Whether or not the optimization is worth the
added complexity is a separate discussion. For now, I'm interested in
whether the optimization can be safely performed, both for improving
close() performance and for my own understanding of the code.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close()
2014-09-10 21:37 ` Martin Kelly
@ 2014-09-17 14:29 ` Martin Kelly
2014-09-17 14:54 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Martin Kelly @ 2014-09-17 14:29 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Paul McKenney, Stephen Hemminger
On 09/10/2014 02:37 PM, Martin Kelly wrote:
>> The synchronize_net() is also there to protect against the prot hook
>> which can run asynchronously from the core packet input path on any
>> cpu.
>>
>
> Yes, understood. What I'm not clear about is whether it is safe to do
> the following:
>
> unregister_prot_hook(sk, false);
> sock_orphan(sk);
> sock->sk = NULL;
> call_rcu(...);
> close socket, return to userspace
>
> instead of
>
> unregister_prot_hook(sk, false);
> synchronize_net();
> sock_orphan(sk);
> sock->sk = NULL;
> close socket, return to userspace
>
> If you don't call synchronize_net() immediately, then other readers
> could see the protocol hook in the protocol list and try to use it.
> They could call into prot_hook.func. However, it appears that such
> functions ( e.g. packet_rcv() ) touch the socket buffer but not the
> socket itself, so orphaning the socket before all RCUs have been
> processed is safe. In addition, no new packets will come in after
> packet_release() and touch the socket because the socket fd will be
> removed from the process fd list.
>
> From my testing, I'm not seeing any obvious issues, but I could be
> missing something. Is orphaning the socket before all RCUs have
> finished unsafe?
>
(friendly ping)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close()
2014-09-17 14:29 ` Martin Kelly
@ 2014-09-17 14:54 ` Eric Dumazet
2014-09-17 17:04 ` Martin Kelly
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2014-09-17 14:54 UTC (permalink / raw)
To: Martin Kelly; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger
On Wed, 2014-09-17 at 07:29 -0700, Martin Kelly wrote:
> On 09/10/2014 02:37 PM, Martin Kelly wrote:
> >> The synchronize_net() is also there to protect against the prot hook
> >> which can run asynchronously from the core packet input path on any
> >> cpu.
> >>
> >
> > Yes, understood. What I'm not clear about is whether it is safe to do
> > the following:
> >
> > unregister_prot_hook(sk, false);
> > sock_orphan(sk);
> > sock->sk = NULL;
> > call_rcu(...);
Can you describe the ... ?
> > close socket, return to userspace
> >
> > instead of
> >
> > unregister_prot_hook(sk, false);
> > synchronize_net();
> > sock_orphan(sk);
> > sock->sk = NULL;
> > close socket, return to userspace
> >
> > If you don't call synchronize_net() immediately, then other readers
> > could see the protocol hook in the protocol list and try to use it.
> > They could call into prot_hook.func. However, it appears that such
> > functions ( e.g. packet_rcv() ) touch the socket buffer but not the
> > socket itself, so orphaning the socket before all RCUs have been
> > processed is safe. In addition, no new packets will come in after
> > packet_release() and touch the socket because the socket fd will be
> > removed from the process fd list.
> >
> > From my testing, I'm not seeing any obvious issues, but I could be
> > missing something. Is orphaning the socket before all RCUs have
> > finished unsafe?
> >
>
> (friendly ping)
What problem do you want to solve exactly ?
I believe its not safe, you missed sk_data_ready() call
(sock_def_readable())
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close()
2014-09-17 14:54 ` Eric Dumazet
@ 2014-09-17 17:04 ` Martin Kelly
2014-09-17 17:52 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Martin Kelly @ 2014-09-17 17:04 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger
On Wed, Sep 17, 2014 at 7:54 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Can you describe the ... ?
>
Sure. The patch I've been looking at is functionally identical to the
one referenced in http://patchwork.ozlabs.org/patch/181605/ with the
exception that sock_orphan() and sock->sk = NULL is moved before the
synchronize_net(). This further improves the performance relative to
http://patchwork.ozlabs.org/patch/181605/ because it never calls
synchronize_net(), whereas http://patchwork.ozlabs.org/patch/181605/
calls synchronize_net() in the po->running case. However, it looks
like making that optimization is not safe, for the reasons you
mentioned below.
>
> What problem do you want to solve exactly ?
>
> I believe its not safe, you missed sk_data_ready() call
> (sock_def_readable())
>
The problem I I'm trying to solve is the same one mentioned in
http://patchwork.ozlabs.org/patch/181605/: On systems with a lot of
RCU contention, raw socket close() can take up to 300 ms or so, which
really adds up for processes that open lots of raw sockets. What you
see is a many-second hang when the process exits as it waits for many
synchronize_rcu calls (all in serial) to finish. Although it would be
better for close() not to be in a critical path, it's sometimes hard
to get around for some userspace processes, so this kind of change
could yield a significant performance improvement for such processes.
You're right, I missed sk_def_readable(), which looks like it would
cause a crash if it occurred before the call_rcu() was called. I see
that sk_def_readable() acceses sk->sk_wq through rcu_read_lock(),
rcu_dereference(), rcu_read_unlock(), while sock_orphan() sets
sk->sk_wq = NULL without an RCU function. You could modify
sock_orphan() and sock->sk to use RCU too, but you would need to patch
all access sites, and the patch would quickly become complex and
error-prone, which is not worth it.
Does my analysis sound correct to you? If so, can you think of a more
natural way to improve raw socket close() performance? I agree that
optimally, userspace should not put close() in a critical path, but it
still seems like a bug for close() to take 300 ms to complete.
Thanks,
Martin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close()
2014-09-17 17:04 ` Martin Kelly
@ 2014-09-17 17:52 ` Eric Dumazet
2014-09-17 18:58 ` Martin Kelly
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2014-09-17 17:52 UTC (permalink / raw)
To: Martin Kelly; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger
On Wed, 2014-09-17 at 10:04 -0700, Martin Kelly wrote:
> Does my analysis sound correct to you? If so, can you think of a more
> natural way to improve raw socket close() performance? I agree that
> optimally, userspace should not put close() in a critical path, but it
> still seems like a bug for close() to take 300 ms to complete.
I have a plan to remove the nulls support for UDP stack, because we
would like to be able to use million of connected UDP sockets, and we
would like to get rid of the atomic_inc()/atomic_dec() on socket
refcount for every incoming message ;)
My plan was to add a way for a socket to be freed after one rcu grace
period.
It would be an opt-in for protocols needing this. TCP stack would
continue to use SLAB_DESTROY_BY_RCU.
af_packet could immediately use this, and not have to use
synchronize_net() at all.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Question about synchronize_net() in AF_PACKET close()
2014-09-17 17:52 ` Eric Dumazet
@ 2014-09-17 18:58 ` Martin Kelly
0 siblings, 0 replies; 10+ messages in thread
From: Martin Kelly @ 2014-09-17 18:58 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Paul McKenney, Stephen Hemminger
On Wed, Sep 17, 2014 at 10:52 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> I have a plan to remove the nulls support for UDP stack, because we
> would like to be able to use million of connected UDP sockets, and we
> would like to get rid of the atomic_inc()/atomic_dec() on socket
> refcount for every incoming message ;)
>
> My plan was to add a way for a socket to be freed after one rcu grace
> period.
>
> It would be an opt-in for protocols needing this. TCP stack would
> continue to use SLAB_DESTROY_BY_RCU.
>
> af_packet could immediately use this, and not have to use
> synchronize_net() at all.
>
That sounds like a great idea. Do you have a rough estimate of how
long that work will take or what will be required?
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-09-17 18:58 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-09 21:44 Question about synchronize_net() in AF_PACKET close() Martin Kelly
2014-09-09 22:00 ` David Miller
2014-09-10 13:19 ` Marcelo Ricardo Leitner
2014-09-10 21:39 ` Martin Kelly
2014-09-10 21:37 ` Martin Kelly
2014-09-17 14:29 ` Martin Kelly
2014-09-17 14:54 ` Eric Dumazet
2014-09-17 17:04 ` Martin Kelly
2014-09-17 17:52 ` Eric Dumazet
2014-09-17 18:58 ` Martin Kelly
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).