* [PATCH net-next] vxlan: Fix vs->vni_list locking.
@ 2013-07-10 22:04 Pravin B Shelar
  2013-07-10 22:58 ` Stephen Hemminger
  0 siblings, 1 reply; 13+ messages in thread
From: Pravin B Shelar @ 2013-07-10 22:04 UTC (permalink / raw)
  To: netdev; +Cc: stephen, Pravin B Shelar
Use rtnl lock to protect vs->vni_list updates.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/vxlan.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 227b54a..deca481 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -89,7 +89,7 @@ struct vxlan_sock {
 	struct work_struct del_work;
 	atomic_t	  refcnt;
 	struct socket	  *sock;
-	struct hlist_head vni_list[VNI_HASH_SIZE];
+	struct hlist_head vni_list[VNI_HASH_SIZE]; /* Protected by RTNL. */
 };
 
 /* per-network namespace private data for this module */
@@ -122,7 +122,7 @@ struct vxlan_fdb {
 
 /* Pseudo network device */
 struct vxlan_dev {
-	struct hlist_node hlist;	/* vni hash table */
+	struct hlist_node hlist;	/* vni hash table (vni_list) */
 	struct list_head  next;		/* vxlan's per namespace list */
 	struct vxlan_sock *vn_sock;	/* listening socket */
 	struct net_device *dev;
@@ -1644,16 +1644,21 @@ static void vxlan_sock_work(struct work_struct *work)
 	if (ovs) {
 		atomic_inc(&ovs->refcnt);
 		vxlan->vn_sock = ovs;
-		hlist_add_head_rcu(&vxlan->hlist, vni_head(ovs, vni));
 		spin_unlock(&vn->sock_lock);
 
+		rtnl_lock();
+		hlist_add_head_rcu(&vxlan->hlist, vni_head(ovs, vni));
+		rtnl_unlock();
+
 		sk_release_kernel(nvs->sock->sk);
 		kfree(nvs);
 	} else {
 		vxlan->vn_sock = nvs;
 		hlist_add_head_rcu(&nvs->hlist, vs_head(net, port));
-		hlist_add_head_rcu(&vxlan->hlist, vni_head(nvs, vni));
 		spin_unlock(&vn->sock_lock);
+		rtnl_lock();
+		hlist_add_head_rcu(&vxlan->hlist, vni_head(nvs, vni));
+		rtnl_unlock();
 	}
 out:
 	dev_put(dev);
-- 
1.7.1
^ permalink raw reply related	[flat|nested] 13+ messages in thread- * Re: [PATCH net-next] vxlan: Fix vs->vni_list locking.
  2013-07-10 22:04 [PATCH net-next] vxlan: Fix vs->vni_list locking Pravin B Shelar
@ 2013-07-10 22:58 ` Stephen Hemminger
  2013-07-10 23:08   ` Pravin Shelar
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2013-07-10 22:58 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev
On Wed, 10 Jul 2013 15:04:44 -0700
Pravin B Shelar <pshelar@nicira.com> wrote:
> Use rtnl lock to protect vs->vni_list updates.
> 
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
> ---
I don't think this is necessary. I intentionally changed the
locking when socket management was moved to a work queue.
The vxlan_net socket lock is held there already, and finer grain.
^ permalink raw reply	[flat|nested] 13+ messages in thread 
- * Re: [PATCH net-next] vxlan: Fix vs->vni_list locking.
  2013-07-10 22:58 ` Stephen Hemminger
@ 2013-07-10 23:08   ` Pravin Shelar
  2013-07-10 23:25     ` Stephen Hemminger
  2013-07-13 17:18     ` [PATCH net] vxlan: add necessary locking on device removal Stephen Hemminger
  0 siblings, 2 replies; 13+ messages in thread
From: Pravin Shelar @ 2013-07-10 23:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
On Wed, Jul 10, 2013 at 3:58 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Wed, 10 Jul 2013 15:04:44 -0700
> Pravin B Shelar <pshelar@nicira.com> wrote:
>
>> Use rtnl lock to protect vs->vni_list updates.
>>
>> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
>> ---
>
> I don't think this is necessary. I intentionally changed the
> locking when socket management was moved to a work queue.
> The vxlan_net socket lock is held there already, and finer grain.
what abt vxlan_dellink()?
it is deleting vxlan-dev from hash table without lock.
^ permalink raw reply	[flat|nested] 13+ messages in thread 
- * Re: [PATCH net-next] vxlan: Fix vs->vni_list locking.
  2013-07-10 23:08   ` Pravin Shelar
@ 2013-07-10 23:25     ` Stephen Hemminger
  2013-07-11  2:55       ` David Miller
  2013-07-13 17:18     ` [PATCH net] vxlan: add necessary locking on device removal Stephen Hemminger
  1 sibling, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2013-07-10 23:25 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: netdev
On Wed, 10 Jul 2013 16:08:38 -0700
Pravin Shelar <pshelar@nicira.com> wrote:
> On Wed, Jul 10, 2013 at 3:58 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Wed, 10 Jul 2013 15:04:44 -0700
> > Pravin B Shelar <pshelar@nicira.com> wrote:
> >
> >> Use rtnl lock to protect vs->vni_list updates.
> >>
> >> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
> >> ---
> >
> > I don't think this is necessary. I intentionally changed the
> > locking when socket management was moved to a work queue.
> > The vxlan_net socket lock is held there already, and finer grain.
> 
> what abt vxlan_dellink()?
> it is deleting vxlan-dev from hash table without lock.
Ok, then this should fix it.
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1768,8 +1768,12 @@ static int vxlan_newlink(struct net *net
 static void vxlan_dellink(struct net_device *dev, struct list_head *head)
 {
        struct vxlan_dev *vxlan = netdev_priv(dev);
+       struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 
+       spin_lock(&vn->sock_lock);
        hlist_del_rcu(&vxlan->hlist);
+       spin_unlock(&vn->sock_lock);
+
        list_del(&vxlan->next);
        unregister_netdevice_queue(dev, head);
 }
^ permalink raw reply	[flat|nested] 13+ messages in thread
- * Re: [PATCH net-next] vxlan: Fix vs->vni_list locking.
  2013-07-10 23:25     ` Stephen Hemminger
@ 2013-07-11  2:55       ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2013-07-11  2:55 UTC (permalink / raw)
  To: stephen; +Cc: pshelar, netdev
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed, 10 Jul 2013 16:25:09 -0700
> On Wed, 10 Jul 2013 16:08:38 -0700
> Pravin Shelar <pshelar@nicira.com> wrote:
> 
>> On Wed, Jul 10, 2013 at 3:58 PM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>> > On Wed, 10 Jul 2013 15:04:44 -0700
>> > Pravin B Shelar <pshelar@nicira.com> wrote:
>> >
>> >> Use rtnl lock to protect vs->vni_list updates.
>> >>
>> >> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
>> >> ---
>> >
>> > I don't think this is necessary. I intentionally changed the
>> > locking when socket management was moved to a work queue.
>> > The vxlan_net socket lock is held there already, and finer grain.
>> 
>> what abt vxlan_dellink()?
>> it is deleting vxlan-dev from hash table without lock.
> 
> Ok, then this should fix it.
Looks good, one minor thing:
>  {
>         struct vxlan_dev *vxlan = netdev_priv(dev);
> +       struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
Please declare 'vn' on the first line rather than the second.
A formal submission with that fixup and a signoff, and I'll apply
this.
Thanks.
^ permalink raw reply	[flat|nested] 13+ messages in thread
 
- * [PATCH net] vxlan: add necessary locking on device removal
  2013-07-10 23:08   ` Pravin Shelar
  2013-07-10 23:25     ` Stephen Hemminger
@ 2013-07-13 17:18     ` Stephen Hemminger
  2013-07-13 19:21       ` Pravin Shelar
  2013-07-17 19:51       ` David Miller
  1 sibling, 2 replies; 13+ messages in thread
From: Stephen Hemminger @ 2013-07-13 17:18 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: netdev
The socket management is now done in workqueue (outside of RTNL)
and protected by vn->sock_lock. There were two possible bugs, first
the vxlan device was removed from the VNI hash table per socket without
holding lock. And there was a race when device is created and the workqueue
could run after deletion.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
--- a/drivers/net/vxlan.c	2013-07-08 16:31:50.080744429 -0700
+++ b/drivers/net/vxlan.c	2013-07-10 20:15:47.337653899 -0700
@@ -1767,9 +1767,15 @@ static int vxlan_newlink(struct net *net
 
 static void vxlan_dellink(struct net_device *dev, struct list_head *head)
 {
+	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 
+	flush_workqueue(vxlan_wq);
+
+	spin_lock(&vn->sock_lock);
 	hlist_del_rcu(&vxlan->hlist);
+	spin_unlock(&vn->sock_lock);
+
 	list_del(&vxlan->next);
 	unregister_netdevice_queue(dev, head);
 }
^ permalink raw reply	[flat|nested] 13+ messages in thread
- * Re: [PATCH net] vxlan: add necessary locking on device removal
  2013-07-13 17:18     ` [PATCH net] vxlan: add necessary locking on device removal Stephen Hemminger
@ 2013-07-13 19:21       ` Pravin Shelar
  2013-07-16 18:28         ` David Miller
  2013-07-17 19:51       ` David Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Pravin Shelar @ 2013-07-13 19:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
On Sat, Jul 13, 2013 at 10:18 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> The socket management is now done in workqueue (outside of RTNL)
> and protected by vn->sock_lock. There were two possible bugs, first
> the vxlan device was removed from the VNI hash table per socket without
> holding lock. And there was a race when device is created and the workqueue
> could run after deletion.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> --- a/drivers/net/vxlan.c       2013-07-08 16:31:50.080744429 -0700
> +++ b/drivers/net/vxlan.c       2013-07-10 20:15:47.337653899 -0700
> @@ -1767,9 +1767,15 @@ static int vxlan_newlink(struct net *net
>
>  static void vxlan_dellink(struct net_device *dev, struct list_head *head)
>  {
> +       struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
>         struct vxlan_dev *vxlan = netdev_priv(dev);
>
> +       flush_workqueue(vxlan_wq);
> +
Doesn't this create dependency on sock_work thread while holding RTNL.
If so it can result in deadlock.
> +       spin_lock(&vn->sock_lock);
>         hlist_del_rcu(&vxlan->hlist);
> +       spin_unlock(&vn->sock_lock);
> +
>         list_del(&vxlan->next);
>         unregister_netdevice_queue(dev, head);
>  }
^ permalink raw reply	[flat|nested] 13+ messages in thread
- * Re: [PATCH net] vxlan: add necessary locking on device removal
  2013-07-13 19:21       ` Pravin Shelar
@ 2013-07-16 18:28         ` David Miller
  2013-07-16 20:29           ` Pravin Shelar
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2013-07-16 18:28 UTC (permalink / raw)
  To: pshelar; +Cc: stephen, netdev
From: Pravin Shelar <pshelar@nicira.com>
Date: Sat, 13 Jul 2013 12:21:52 -0700
> On Sat, Jul 13, 2013 at 10:18 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> The socket management is now done in workqueue (outside of RTNL)
>> and protected by vn->sock_lock. There were two possible bugs, first
>> the vxlan device was removed from the VNI hash table per socket without
>> holding lock. And there was a race when device is created and the workqueue
>> could run after deletion.
>>
>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>
>> --- a/drivers/net/vxlan.c       2013-07-08 16:31:50.080744429 -0700
>> +++ b/drivers/net/vxlan.c       2013-07-10 20:15:47.337653899 -0700
>> @@ -1767,9 +1767,15 @@ static int vxlan_newlink(struct net *net
>>
>>  static void vxlan_dellink(struct net_device *dev, struct list_head *head)
>>  {
>> +       struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
>>         struct vxlan_dev *vxlan = netdev_priv(dev);
>>
>> +       flush_workqueue(vxlan_wq);
>> +
> Doesn't this create dependency on sock_work thread while holding RTNL.
> If so it can result in deadlock.
What exact deadlock do you perceive?  I don't see any code path in the
sock_work handler (vxlan_sock_work) which takes the RTNL mutex.
So we should be able to safely flush any pending sock_work jobs from
vxlan_dellink().  The fact that vxlan_dellink() runs with the RTNL
mutex shouldn't cause any issues.
^ permalink raw reply	[flat|nested] 13+ messages in thread
- * Re: [PATCH net] vxlan: add necessary locking on device removal
  2013-07-16 18:28         ` David Miller
@ 2013-07-16 20:29           ` Pravin Shelar
  2013-07-17  6:06             ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Pravin Shelar @ 2013-07-16 20:29 UTC (permalink / raw)
  To: David Miller; +Cc: stephen, netdev
On Tue, Jul 16, 2013 at 11:28 AM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@nicira.com>
> Date: Sat, 13 Jul 2013 12:21:52 -0700
>
>> On Sat, Jul 13, 2013 at 10:18 AM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> The socket management is now done in workqueue (outside of RTNL)
>>> and protected by vn->sock_lock. There were two possible bugs, first
>>> the vxlan device was removed from the VNI hash table per socket without
>>> holding lock. And there was a race when device is created and the workqueue
>>> could run after deletion.
>>>
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>
>>> --- a/drivers/net/vxlan.c       2013-07-08 16:31:50.080744429 -0700
>>> +++ b/drivers/net/vxlan.c       2013-07-10 20:15:47.337653899 -0700
>>> @@ -1767,9 +1767,15 @@ static int vxlan_newlink(struct net *net
>>>
>>>  static void vxlan_dellink(struct net_device *dev, struct list_head *head)
>>>  {
>>> +       struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
>>>         struct vxlan_dev *vxlan = netdev_priv(dev);
>>>
>>> +       flush_workqueue(vxlan_wq);
>>> +
>> Doesn't this create dependency on sock_work thread while holding RTNL.
>> If so it can result in deadlock.
>
> What exact deadlock do you perceive?  I don't see any code path in the
> sock_work handler (vxlan_sock_work) which takes the RTNL mutex.
>
> So we should be able to safely flush any pending sock_work jobs from
> vxlan_dellink().  The fact that vxlan_dellink() runs with the RTNL
> mutex shouldn't cause any issues.
commit 1c51a9159ddefa51 (vxlan: fix race caused by dropping
rtnl_unlock) moved sock-create to sock_work workq. Thats why I thought
sock_work takes RTNL. but if it is not taking RTNL in any case, they
why vxlan sock create is deferred to sock_work?
^ permalink raw reply	[flat|nested] 13+ messages in thread
- * Re: [PATCH net] vxlan: add necessary locking on device removal
  2013-07-16 20:29           ` Pravin Shelar
@ 2013-07-17  6:06             ` David Miller
  2013-07-17 15:41               ` Pravin Shelar
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2013-07-17  6:06 UTC (permalink / raw)
  To: pshelar; +Cc: stephen, netdev
From: Pravin Shelar <pshelar@nicira.com>
Date: Tue, 16 Jul 2013 13:29:08 -0700
> commit 1c51a9159ddefa51 (vxlan: fix race caused by dropping
> rtnl_unlock) moved sock-create to sock_work workq. Thats why I thought
> sock_work takes RTNL. but if it is not taking RTNL in any case, they
> why vxlan sock create is deferred to sock_work?
That commit is handling the fact that the RTNL mutex is NOT held
during the create operation in question.
It defers to a workqueue so that the new ->sock_lock can be taken
in the appropriate context.
^ permalink raw reply	[flat|nested] 13+ messages in thread 
- * Re: [PATCH net] vxlan: add necessary locking on device removal
  2013-07-17  6:06             ` David Miller
@ 2013-07-17 15:41               ` Pravin Shelar
  2013-07-17 19:09                 ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Pravin Shelar @ 2013-07-17 15:41 UTC (permalink / raw)
  To: David Miller; +Cc: stephen, netdev
On Tue, Jul 16, 2013 at 11:06 PM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@nicira.com>
> Date: Tue, 16 Jul 2013 13:29:08 -0700
>
>> commit 1c51a9159ddefa51 (vxlan: fix race caused by dropping
>> rtnl_unlock) moved sock-create to sock_work workq. Thats why I thought
>> sock_work takes RTNL. but if it is not taking RTNL in any case, they
>> why vxlan sock create is deferred to sock_work?
>
> That commit is handling the fact that the RTNL mutex is NOT held
> during the create operation in question.
>
> It defers to a workqueue so that the new ->sock_lock can be taken
> in the appropriate context.
ok, Thanks for explanation. I do not see any problem with the patch.
^ permalink raw reply	[flat|nested] 13+ messages in thread 
- * Re: [PATCH net] vxlan: add necessary locking on device removal
  2013-07-17 15:41               ` Pravin Shelar
@ 2013-07-17 19:09                 ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2013-07-17 19:09 UTC (permalink / raw)
  To: pshelar; +Cc: stephen, netdev
From: Pravin Shelar <pshelar@nicira.com>
Date: Wed, 17 Jul 2013 08:41:33 -0700
> On Tue, Jul 16, 2013 at 11:06 PM, David Miller <davem@davemloft.net> wrote:
>> From: Pravin Shelar <pshelar@nicira.com>
>> Date: Tue, 16 Jul 2013 13:29:08 -0700
>>
>>> commit 1c51a9159ddefa51 (vxlan: fix race caused by dropping
>>> rtnl_unlock) moved sock-create to sock_work workq. Thats why I thought
>>> sock_work takes RTNL. but if it is not taking RTNL in any case, they
>>> why vxlan sock create is deferred to sock_work?
>>
>> That commit is handling the fact that the RTNL mutex is NOT held
>> during the create operation in question.
>>
>> It defers to a workqueue so that the new ->sock_lock can be taken
>> in the appropriate context.
> 
> ok, Thanks for explanation. I do not see any problem with the patch.
Great, I'm going to apply Stephen's patch then.
Thanks.
^ permalink raw reply	[flat|nested] 13+ messages in thread 
 
 
 
 
 
- * Re: [PATCH net] vxlan: add necessary locking on device removal
  2013-07-13 17:18     ` [PATCH net] vxlan: add necessary locking on device removal Stephen Hemminger
  2013-07-13 19:21       ` Pravin Shelar
@ 2013-07-17 19:51       ` David Miller
  1 sibling, 0 replies; 13+ messages in thread
From: David Miller @ 2013-07-17 19:51 UTC (permalink / raw)
  To: stephen; +Cc: pshelar, netdev
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Sat, 13 Jul 2013 10:18:18 -0700
> The socket management is now done in workqueue (outside of RTNL)
> and protected by vn->sock_lock. There were two possible bugs, first
> the vxlan device was removed from the VNI hash table per socket without
> holding lock. And there was a race when device is created and the workqueue
> could run after deletion.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Applied.
^ permalink raw reply	[flat|nested] 13+ messages in thread 
 
 
 
end of thread, other threads:[~2013-07-17 19:51 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-10 22:04 [PATCH net-next] vxlan: Fix vs->vni_list locking Pravin B Shelar
2013-07-10 22:58 ` Stephen Hemminger
2013-07-10 23:08   ` Pravin Shelar
2013-07-10 23:25     ` Stephen Hemminger
2013-07-11  2:55       ` David Miller
2013-07-13 17:18     ` [PATCH net] vxlan: add necessary locking on device removal Stephen Hemminger
2013-07-13 19:21       ` Pravin Shelar
2013-07-16 18:28         ` David Miller
2013-07-16 20:29           ` Pravin Shelar
2013-07-17  6:06             ` David Miller
2013-07-17 15:41               ` Pravin Shelar
2013-07-17 19:09                 ` David Miller
2013-07-17 19:51       ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).