netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
@ 2016-04-08 20:55 Hannes Frederic Sowa
  2016-04-08 23:24 ` Cong Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-08 20:55 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Jiri Benc, Marcelo Ricardo Leitner

Due to the fact that the udp socket is destructed asynchronously in a
work queue, we have some nondeterministic behavior during shutdown of
vxlan tunnels and creating new ones. Fix this by keeping the destruction
process synchronous in regards to the user space process so IFF_UP can
be reliably set.

udp_tunnel_sock_release destroys vs->sock->sk if reference counter
indicates so. We expect to have the same lifetime of vxlan_sock and
vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
only destruct the whole socket after we can be sure it cannot be found
by searching vxlan_net->sock_list.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
v2) synchronize_rcu -> synchronize_net (proposed by Eric, thanks!)
    also rebased on net-next to apply without conflicts
 drivers/net/vxlan.c | 20 +++-----------------
 include/net/vxlan.h |  2 --
 2 files changed, 3 insertions(+), 19 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9f3634064c921f..77ba31a0e44f97 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -98,7 +98,6 @@ struct vxlan_fdb {
 
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
-static struct workqueue_struct *vxlan_wq;
 
 static inline bool vxlan_collect_metadata(struct vxlan_sock *vs)
 {
@@ -1053,7 +1052,9 @@ static void __vxlan_sock_release(struct vxlan_sock *vs)
 	vxlan_notify_del_rx_port(vs);
 	spin_unlock(&vn->sock_lock);
 
-	queue_work(vxlan_wq, &vs->del_work);
+	synchronize_net();
+	udp_tunnel_sock_release(vs->sock);
+	kfree(vs);
 }
 
 static void vxlan_sock_release(struct vxlan_dev *vxlan)
@@ -2674,13 +2675,6 @@ static const struct ethtool_ops vxlan_ethtool_ops = {
 	.get_link	= ethtool_op_get_link,
 };
 
-static void vxlan_del_work(struct work_struct *work)
-{
-	struct vxlan_sock *vs = container_of(work, struct vxlan_sock, del_work);
-	udp_tunnel_sock_release(vs->sock);
-	kfree_rcu(vs, rcu);
-}
-
 static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
 					__be16 port, u32 flags)
 {
@@ -2726,8 +2720,6 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
 	for (h = 0; h < VNI_HASH_SIZE; ++h)
 		INIT_HLIST_HEAD(&vs->vni_list[h]);
 
-	INIT_WORK(&vs->del_work, vxlan_del_work);
-
 	sock = vxlan_create_sock(net, ipv6, port, flags);
 	if (IS_ERR(sock)) {
 		pr_info("Cannot bind port %d, err=%ld\n", ntohs(port),
@@ -3346,10 +3338,6 @@ static int __init vxlan_init_module(void)
 {
 	int rc;
 
-	vxlan_wq = alloc_workqueue("vxlan", 0, 0);
-	if (!vxlan_wq)
-		return -ENOMEM;
-
 	get_random_bytes(&vxlan_salt, sizeof(vxlan_salt));
 
 	rc = register_pernet_subsys(&vxlan_net_ops);
@@ -3370,7 +3358,6 @@ out3:
 out2:
 	unregister_pernet_subsys(&vxlan_net_ops);
 out1:
-	destroy_workqueue(vxlan_wq);
 	return rc;
 }
 late_initcall(vxlan_init_module);
@@ -3379,7 +3366,6 @@ static void __exit vxlan_cleanup_module(void)
 {
 	rtnl_link_unregister(&vxlan_link_ops);
 	unregister_netdevice_notifier(&vxlan_notifier_block);
-	destroy_workqueue(vxlan_wq);
 	unregister_pernet_subsys(&vxlan_net_ops);
 	/* rcu_barrier() is called by netns */
 }
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 2f168f0ea32c39..d442eb3129cde4 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -184,9 +184,7 @@ struct vxlan_metadata {
 /* per UDP socket information */
 struct vxlan_sock {
 	struct hlist_node hlist;
-	struct work_struct del_work;
 	struct socket	 *sock;
-	struct rcu_head	  rcu;
 	struct hlist_head vni_list[VNI_HASH_SIZE];
 	atomic_t	  refcnt;
 	u32		  flags;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
  2016-04-08 20:55 [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets Hannes Frederic Sowa
@ 2016-04-08 23:24 ` Cong Wang
  2016-04-08 23:55   ` Hannes Frederic Sowa
  2016-04-15 20:36 ` David Miller
  2016-04-16 22:23 ` David Miller
  2 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2016-04-08 23:24 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet, Jiri Benc,
	Marcelo Ricardo Leitner

On Fri, Apr 8, 2016 at 1:55 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Due to the fact that the udp socket is destructed asynchronously in a
> work queue, we have some nondeterministic behavior during shutdown of
> vxlan tunnels and creating new ones. Fix this by keeping the destruction
> process synchronous in regards to the user space process so IFF_UP can
> be reliably set.
>
> udp_tunnel_sock_release destroys vs->sock->sk if reference counter
> indicates so. We expect to have the same lifetime of vxlan_sock and
> vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
> only destruct the whole socket after we can be sure it cannot be found
> by searching vxlan_net->sock_list.
>

I am wondering what is the reason why we used work queue from
the beginning?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
  2016-04-08 23:24 ` Cong Wang
@ 2016-04-08 23:55   ` Hannes Frederic Sowa
  2016-04-15 20:58     ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-08 23:55 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Eric Dumazet, Jiri Benc,
	Marcelo Ricardo Leitner, Stephen Hemminger



On Sat, Apr 9, 2016, at 01:24, Cong Wang wrote:
> On Fri, Apr 8, 2016 at 1:55 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Due to the fact that the udp socket is destructed asynchronously in a
> > work queue, we have some nondeterministic behavior during shutdown of
> > vxlan tunnels and creating new ones. Fix this by keeping the destruction
> > process synchronous in regards to the user space process so IFF_UP can
> > be reliably set.
> >
> > udp_tunnel_sock_release destroys vs->sock->sk if reference counter
> > indicates so. We expect to have the same lifetime of vxlan_sock and
> > vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
> > only destruct the whole socket after we can be sure it cannot be found
> > by searching vxlan_net->sock_list.
> >
> 
> I am wondering what is the reason why we used work queue from
> the beginning?

I actually don't know. It was like that from the beginning. I cc'ed
Stephen, maybe he remembers?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
  2016-04-08 20:55 [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets Hannes Frederic Sowa
  2016-04-08 23:24 ` Cong Wang
@ 2016-04-15 20:36 ` David Miller
  2016-04-15 21:50   ` Hannes Frederic Sowa
  2016-04-16 22:23 ` David Miller
  2 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2016-04-15 20:36 UTC (permalink / raw)
  To: hannes; +Cc: netdev, eric.dumazet, jbenc, marcelo.leitner

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Fri,  8 Apr 2016 22:55:01 +0200

> @@ -1053,7 +1052,9 @@ static void __vxlan_sock_release(struct vxlan_sock *vs)
>  	vxlan_notify_del_rx_port(vs);
>  	spin_unlock(&vn->sock_lock);
>  
> -	queue_work(vxlan_wq, &vs->del_work);
> +	synchronize_net();
> +	udp_tunnel_sock_release(vs->sock);
> +	kfree(vs);
>  }
>  
>  static void vxlan_sock_release(struct vxlan_dev *vxlan)

I just want to make sure you saw this change in net-next:

====================
commit ca065d0cf80fa547724440a8bf37f1e674d917c0
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Apr 1 08:52:13 2016 -0700

    udp: no longer use SLAB_DESTROY_BY_RCU
====================

Does that effect your change?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
  2016-04-08 23:55   ` Hannes Frederic Sowa
@ 2016-04-15 20:58     ` Stephen Hemminger
  2016-04-15 21:47       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2016-04-15 20:58 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Cong Wang, Linux Kernel Network Developers, Eric Dumazet,
	Jiri Benc, Marcelo Ricardo Leitner

On Sat, 09 Apr 2016 01:55:06 +0200
Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:

> 
> 
> On Sat, Apr 9, 2016, at 01:24, Cong Wang wrote:
> > On Fri, Apr 8, 2016 at 1:55 PM, Hannes Frederic Sowa
> > <hannes@stressinduktion.org> wrote:
> > > Due to the fact that the udp socket is destructed asynchronously in a
> > > work queue, we have some nondeterministic behavior during shutdown of
> > > vxlan tunnels and creating new ones. Fix this by keeping the destruction
> > > process synchronous in regards to the user space process so IFF_UP can
> > > be reliably set.
> > >
> > > udp_tunnel_sock_release destroys vs->sock->sk if reference counter
> > > indicates so. We expect to have the same lifetime of vxlan_sock and
> > > vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
> > > only destruct the whole socket after we can be sure it cannot be found
> > > by searching vxlan_net->sock_list.
> > >
> > 
> > I am wondering what is the reason why we used work queue from
> > the beginning?
> 
> I actually don't know. It was like that from the beginning. I cc'ed
> Stephen, maybe he remembers?
> 
> Bye,
> Hannes

The problem was that VXLAN needs to update multicast settings and that
can't be done under RTNL.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
  2016-04-15 20:58     ` Stephen Hemminger
@ 2016-04-15 21:47       ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 8+ messages in thread
From: Marcelo Ricardo Leitner @ 2016-04-15 21:47 UTC (permalink / raw)
  To: Stephen Hemminger, Hannes Frederic Sowa
  Cc: Cong Wang, Linux Kernel Network Developers, Eric Dumazet,
	Jiri Benc

Em 15-04-2016 17:58, Stephen Hemminger escreveu:
> On Sat, 09 Apr 2016 01:55:06 +0200
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
>
>>
>>
>> On Sat, Apr 9, 2016, at 01:24, Cong Wang wrote:
>>> On Fri, Apr 8, 2016 at 1:55 PM, Hannes Frederic Sowa
>>> <hannes@stressinduktion.org> wrote:
>>>> Due to the fact that the udp socket is destructed asynchronously in a
>>>> work queue, we have some nondeterministic behavior during shutdown of
>>>> vxlan tunnels and creating new ones. Fix this by keeping the destruction
>>>> process synchronous in regards to the user space process so IFF_UP can
>>>> be reliably set.
>>>>
>>>> udp_tunnel_sock_release destroys vs->sock->sk if reference counter
>>>> indicates so. We expect to have the same lifetime of vxlan_sock and
>>>> vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
>>>> only destruct the whole socket after we can be sure it cannot be found
>>>> by searching vxlan_net->sock_list.
>>>>
>>>
>>> I am wondering what is the reason why we used work queue from
>>> the beginning?
>>
>> I actually don't know. It was like that from the beginning. I cc'ed
>> Stephen, maybe he remembers?
>>
>> Bye,
>> Hannes
>
> The problem was that VXLAN needs to update multicast settings and that
> can't be done under RTNL.

If the socket destroy was delayed just due to this, it should be all 
good then, because I took care of this multicast issue on commits

56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
54ff9ef36bdf ("ipv4, ipv6: kill ip_mc_{join, leave}_group and 
ipv6_sock_mc_{join, drop}")

   Marcelo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
  2016-04-15 20:36 ` David Miller
@ 2016-04-15 21:50   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-15 21:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet, jbenc, marcelo.leitner

On 15.04.2016 22:36, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Fri,  8 Apr 2016 22:55:01 +0200
>
>> @@ -1053,7 +1052,9 @@ static void __vxlan_sock_release(struct vxlan_sock *vs)
>>   	vxlan_notify_del_rx_port(vs);
>>   	spin_unlock(&vn->sock_lock);
>>
>> -	queue_work(vxlan_wq, &vs->del_work);
>> +	synchronize_net();
>> +	udp_tunnel_sock_release(vs->sock);
>> +	kfree(vs);
>>   }
>>
>>   static void vxlan_sock_release(struct vxlan_dev *vxlan)
>
> I just want to make sure you saw this change in net-next:
>
> ====================
> commit ca065d0cf80fa547724440a8bf37f1e674d917c0
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri Apr 1 08:52:13 2016 -0700
>
>      udp: no longer use SLAB_DESTROY_BY_RCU
> ====================
>
> Does that effect your change?

I have seen this patch and it does not affect this patch:

The socket is matched from the net->vxlan_net->hlist in fast path. I 
don't want to destruct (kern_sock_shutdown) the socket while packets 
could be in flight through the vxlan stack. We clean up socket memory 
and destruct after rcu again, but given that we only do this during 
ifdown of a vxlan interface I don't see that we need to optimize this again.

All other tunneling protocols don't look up sockets in fast path, so 
they don't need to protect against this.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
  2016-04-08 20:55 [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets Hannes Frederic Sowa
  2016-04-08 23:24 ` Cong Wang
  2016-04-15 20:36 ` David Miller
@ 2016-04-16 22:23 ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2016-04-16 22:23 UTC (permalink / raw)
  To: hannes; +Cc: netdev, eric.dumazet, jbenc, marcelo.leitner

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Fri,  8 Apr 2016 22:55:01 +0200

> Due to the fact that the udp socket is destructed asynchronously in a
> work queue, we have some nondeterministic behavior during shutdown of
> vxlan tunnels and creating new ones. Fix this by keeping the destruction
> process synchronous in regards to the user space process so IFF_UP can
> be reliably set.
> 
> udp_tunnel_sock_release destroys vs->sock->sk if reference counter
> indicates so. We expect to have the same lifetime of vxlan_sock and
> vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
> only destruct the whole socket after we can be sure it cannot be found
> by searching vxlan_net->sock_list.
> 
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Jiri Benc <jbenc@redhat.com>
> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-04-16 22:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-08 20:55 [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets Hannes Frederic Sowa
2016-04-08 23:24 ` Cong Wang
2016-04-08 23:55   ` Hannes Frederic Sowa
2016-04-15 20:58     ` Stephen Hemminger
2016-04-15 21:47       ` Marcelo Ricardo Leitner
2016-04-15 20:36 ` David Miller
2016-04-15 21:50   ` Hannes Frederic Sowa
2016-04-16 22:23 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).