netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] rtnl: Simplify ASSERT_RTNL
@ 2007-09-29  0:59 Eric W. Biederman
  2007-09-29  4:31 ` Herbert Xu
  2007-10-11  4:16 ` David Miller
  0 siblings, 2 replies; 20+ messages in thread
From: Eric W. Biederman @ 2007-09-29  0:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev


Currently we have the call path:
macvlan_open -> dev_unicast_add -> __dev_set_rx_mode ->
	__dev_set_promiscuity -> ASSERT_RTNL -> mutex_trylock

When mutex debugging is on taking a mutex complains if we are not
allowed to sleep.  At that point we have called netif_tx_lock_bh
so we are clearly not allowed to sleep.  Arguably this is not a
problem for mutex_trylock.

However we can avoid the complaint and make the ASSERT_RTNL code
cheaper, faster and more obvious by simply calling mutex_is_locked.

So this patch adds rtnl_is_locked (which does mutex_is_locked on
the rtnl_mutex) and changes ASSERT_RTNL to use that.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/rtnetlink.h |    4 ++--
 net/core/rtnetlink.c      |    5 +++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dff3192..9c21e45 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -714,13 +714,13 @@ extern void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change);
 extern void rtnl_lock(void);
 extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
+extern int rtnl_is_locked(void);
 
 extern void rtnetlink_init(void);
 extern void __rtnl_unlock(void);
 
 #define ASSERT_RTNL() do { \
-	if (unlikely(rtnl_trylock())) { \
-		rtnl_unlock(); \
+	if (unlikely(!rtnl_is_locked())) { \
 		printk(KERN_ERR "RTNL: assertion failed at %s (%d)\n", \
 		       __FILE__,  __LINE__); \
 		dump_stack(); \
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 739fbad..8bc68e6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -85,6 +85,11 @@ int rtnl_trylock(void)
 	return mutex_trylock(&rtnl_mutex);
 }
 
+int rtnl_is_locked(void)
+{
+	return mutex_is_locked(&rtnl_mutex);
+}
+
 int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr *rta, int len)
 {
 	memset(tb, 0, sizeof(struct rtattr*)*maxattr);
-- 
1.5.3.rc6.17.g1911


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-09-29  0:59 [PATCH] rtnl: Simplify ASSERT_RTNL Eric W. Biederman
@ 2007-09-29  4:31 ` Herbert Xu
  2007-09-29 15:32   ` Patrick McHardy
  2007-09-29 17:18   ` Eric W. Biederman
  2007-10-11  4:16 ` David Miller
  1 sibling, 2 replies; 20+ messages in thread
From: Herbert Xu @ 2007-09-29  4:31 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
> Currently we have the call path:
> macvlan_open -> dev_unicast_add -> __dev_set_rx_mode ->
>        __dev_set_promiscuity -> ASSERT_RTNL -> mutex_trylock
> 
> When mutex debugging is on taking a mutex complains if we are not
> allowed to sleep.  At that point we have called netif_tx_lock_bh
> so we are clearly not allowed to sleep.  Arguably this is not a
> problem for mutex_trylock.

Actually holding the TX lock here is a bug.  We're going to
call down into the hardware with __dev_set_promiscuity, which
may sleep (think USB NICs), so we definitely shouldn't be holding
any spin locks.

Patrick, could we avoid taking the TX lock here somehow?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-09-29  4:31 ` Herbert Xu
@ 2007-09-29 15:32   ` Patrick McHardy
  2007-09-30  0:24     ` Herbert Xu
  2007-09-29 17:18   ` Eric W. Biederman
  1 sibling, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2007-09-29 15:32 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric W. Biederman, davem, netdev

Herbert Xu wrote:
> Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
>>Currently we have the call path:
>>macvlan_open -> dev_unicast_add -> __dev_set_rx_mode ->
>>       __dev_set_promiscuity -> ASSERT_RTNL -> mutex_trylock
>>
>>When mutex debugging is on taking a mutex complains if we are not
>>allowed to sleep.  At that point we have called netif_tx_lock_bh
>>so we are clearly not allowed to sleep.  Arguably this is not a
>>problem for mutex_trylock.
> 
> 
> Actually holding the TX lock here is a bug.  We're going to
> call down into the hardware with __dev_set_promiscuity, which
> may sleep (think USB NICs), so we definitely shouldn't be holding
> any spin locks.
> 
> Patrick, could we avoid taking the TX lock here somehow?


For unicast addresses its not strictly necessary since they may
only be changed under the RTNL anyway. The reason why it takes
the tx_lock is for consistency with multicast address handling,
which can't rely on the RTNL since IPv6 changes them from
BH context. The idea was that the ->set_rx_mode function should
handle both secondary unicast and multicast addresses for
simplicity.

But I don't understand the problem, if this doesn't work for
unicast addresses, why does it work for multicast addresses
and ->set_multicast_list? I had a quick look at some USB
network drivers but couldn't spot anything special ..

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-09-29  4:31 ` Herbert Xu
  2007-09-29 15:32   ` Patrick McHardy
@ 2007-09-29 17:18   ` Eric W. Biederman
  2007-09-29 17:51     ` Patrick McHardy
  2007-09-30  0:28     ` Herbert Xu
  1 sibling, 2 replies; 20+ messages in thread
From: Eric W. Biederman @ 2007-09-29 17:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, netdev

Herbert Xu <herbert@gondor.apana.org.au> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>> 
>> Currently we have the call path:
>> macvlan_open -> dev_unicast_add -> __dev_set_rx_mode ->
>>        __dev_set_promiscuity -> ASSERT_RTNL -> mutex_trylock
>> 
>> When mutex debugging is on taking a mutex complains if we are not
>> allowed to sleep.  At that point we have called netif_tx_lock_bh
>> so we are clearly not allowed to sleep.  Arguably this is not a
>> problem for mutex_trylock.
>
> Actually holding the TX lock here is a bug.  We're going to
> call down into the hardware with __dev_set_promiscuity, which
> may sleep (think USB NICs), so we definitely shouldn't be holding
> any spin locks.

Regardless of the correctness of where we have ASSERT_RTNL.
I think not actually taking the mutex on the assertion failure path
(just so we can release it), is still a good deal regardless.

For this particular call site clearly we need to look at what
is happening a little more.  The obvious thing would be to add
an explicit might_sleep if we are calling code that can sleep.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-09-29 17:18   ` Eric W. Biederman
@ 2007-09-29 17:51     ` Patrick McHardy
  2007-09-30  0:28     ` Herbert Xu
  1 sibling, 0 replies; 20+ messages in thread
From: Patrick McHardy @ 2007-09-29 17:51 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Herbert Xu, davem, netdev

Eric W. Biederman wrote:
> Regardless of the correctness of where we have ASSERT_RTNL.
> I think not actually taking the mutex on the assertion failure path
> (just so we can release it), is still a good deal regardless.


Agreed. I actually have an identical patch somewhere that
I wanted to submit soon.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-09-29 15:32   ` Patrick McHardy
@ 2007-09-30  0:24     ` Herbert Xu
  2007-09-30 15:47       ` Patrick McHardy
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2007-09-30  0:24 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, oliver, Eric W. Biederman, linux-usb-devel, davem

On Sat, Sep 29, 2007 at 05:32:41PM +0200, Patrick McHardy wrote:
>
> For unicast addresses its not strictly necessary since they may
> only be changed under the RTNL anyway. The reason why it takes
> the tx_lock is for consistency with multicast address handling,
> which can't rely on the RTNL since IPv6 changes them from
> BH context. The idea was that the ->set_rx_mode function should
> handle both secondary unicast and multicast addresses for
> simplicity.

OK, the documentation (which predates USB NIC drivers) disagrees
with me and dev->set_multicast will always operate with the xmit
lock.

This isn't very nice for USB drivers since they have to leave
the change request hanging after dev->set_multicast returns.
I suppose the one thing that guarantees this will work is the
S in USB :)

However, at least one USB driver is buggy (kaweth) in not
allowing another set_multicast until the previous one is
done, which can lead to silent losses of set_multicast calls
since it's a void.

In any case, coming back to the original question, the RTNL
assertion is simply wrong in this case because if we're being
called from IPv6 then the RTNL won't even be held.

So I think we need to

1) Move the assert into dev_set_promiscuity.
2) Take the TX lock in dev_set_promiscuity.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-09-29 17:18   ` Eric W. Biederman
  2007-09-29 17:51     ` Patrick McHardy
@ 2007-09-30  0:28     ` Herbert Xu
  1 sibling, 0 replies; 20+ messages in thread
From: Herbert Xu @ 2007-09-30  0:28 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On Sat, Sep 29, 2007 at 11:18:18AM -0600, Eric W. Biederman wrote:
>
> Regardless of the correctness of where we have ASSERT_RTNL.
> I think not actually taking the mutex on the assertion failure path
> (just so we can release it), is still a good deal regardless.

Provided that you add a might_sleep call in there so that if
somebody does this under locks it'll complain then I agree.

Checking RTNL under spin locks is almost certainly the sign
of a bug.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-09-30  0:24     ` Herbert Xu
@ 2007-09-30 15:47       ` Patrick McHardy
  2007-10-02  9:28         ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2007-09-30 15:47 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric W. Biederman, davem, netdev, oliver, linux-usb-devel

Herbert Xu wrote:
> On Sat, Sep 29, 2007 at 05:32:41PM +0200, Patrick McHardy wrote:
> 
>>For unicast addresses its not strictly necessary since they may
>>only be changed under the RTNL anyway. The reason why it takes
>>the tx_lock is for consistency with multicast address handling,
>>which can't rely on the RTNL since IPv6 changes them from
>>BH context. The idea was that the ->set_rx_mode function should
>>handle both secondary unicast and multicast addresses for
>>simplicity.
> 
> 
> In any case, coming back to the original question, the RTNL
> assertion is simply wrong in this case because if we're being
> called from IPv6 then the RTNL won't even be held.
> 
> So I think we need to
> 
> 1) Move the assert into dev_set_promiscuity.
> 2) Take the TX lock in dev_set_promiscuity.


In the IPv6 case we're only changing the multicast list,
so we're never calling into __dev_set_promiscuity.

I actually even added a comment about this :)

        /* Unicast addresses changes may only happen under the rtnl,
         * therefore calling __dev_set_promiscuity here is safe.
         */

I would prefer to keep the ASSERT_RTNL in __dev_set_promiscuity
since it also covers the __dev_set_rx_mode path. How about
adding an ASSERT_RTNL_ATOMIC without the might_sleep or simply
open coding it?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-09-30 15:47       ` Patrick McHardy
@ 2007-10-02  9:28         ` Herbert Xu
  2007-10-02 15:29           ` Patrick McHardy
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2007-10-02  9:28 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, oliver, Eric W. Biederman, linux-usb-devel, davem

On Sun, Sep 30, 2007 at 05:47:30PM +0200, Patrick McHardy wrote:
>
> In the IPv6 case we're only changing the multicast list,
> so we're never calling into __dev_set_promiscuity.
> 
> I actually even added a comment about this :)
> 
>         /* Unicast addresses changes may only happen under the rtnl,
>          * therefore calling __dev_set_promiscuity here is safe.
>          */

Good point.  I should stop mentally blocking out comments when
I read code :)

I'm a bit uncomfortable with having a function (change_flags)
that's sometimes in a sleepable context and sometimes running
with BH disabled.

So how about splitting up the unicast/multicast entry points
as follows?

[NET]: Restore multicast-only __dev_mc_upload

As it is dev_set_rx_mode needs to cope with being called with
or without the RTNL.  This is rather confusing when it comes
to understanding what locks are being held at a particular
point.

Since the only path that calls it without the RTNL is in the
multicast code, we could avoid the confusion by splitting that
out.

This patch restores the old __dev_mc_upload as a multicast-only
variant of dev_set_rx_mode.  It also gets rid of the now-unused
__dev_set_rx_mod and makes dev_set_rx_mode static since it's
only used in net/core/dev.c.

As a side-effect this fixes the warning triggered by the
RTNL_ASSERT in __dev_set_promiscuity.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 91cd3f3..e9a579e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1356,8 +1356,6 @@ extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 extern int		register_netdev(struct net_device *dev);
 extern void		unregister_netdev(struct net_device *dev);
 /* Functions used for secondary unicast and multicast support */
-extern void		dev_set_rx_mode(struct net_device *dev);
-extern void		__dev_set_rx_mode(struct net_device *dev);
 extern int		dev_unicast_delete(struct net_device *dev, void *addr, int alen);
 extern int		dev_unicast_add(struct net_device *dev, void *addr, int alen);
 extern int 		dev_mc_delete(struct net_device *dev, void *addr, int alen, int all);
diff --git a/net/core/dev.c b/net/core/dev.c
index 833f060..4cf985f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -967,6 +967,8 @@ void dev_load(struct net *net, const char *name)
 		request_module("%s", name);
 }
 
+static void dev_set_rx_mode(struct net_device *dev);
+
 /**
  *	dev_open	- prepare an interface for use.
  *	@dev:	device to open
@@ -2775,7 +2777,7 @@ void dev_set_allmulti(struct net_device *dev, int inc)
  *	filtering it is put in promiscous mode while unicast addresses
  *	are present.
  */
-void __dev_set_rx_mode(struct net_device *dev)
+static void dev_set_rx_mode(struct net_device *dev)
 {
 	/* dev_open will call this function so the list will stay sane. */
 	if (!(dev->flags&IFF_UP))
@@ -2784,9 +2786,11 @@ void __dev_set_rx_mode(struct net_device *dev)
 	if (!netif_device_present(dev))
 		return;
 
-	if (dev->set_rx_mode)
+	if (dev->set_rx_mode) {
+		netif_tx_lock_bh(dev);
 		dev->set_rx_mode(dev);
-	else {
+		netif_tx_unlock_bh(dev);
+	} else {
 		/* Unicast addresses changes may only happen under the rtnl,
 		 * therefore calling __dev_set_promiscuity here is safe.
 		 */
@@ -2798,18 +2802,14 @@ void __dev_set_rx_mode(struct net_device *dev)
 			dev->uc_promisc = 0;
 		}
 
-		if (dev->set_multicast_list)
+		if (dev->set_multicast_list) {
+			netif_tx_lock_bh(dev);
 			dev->set_multicast_list(dev);
+			netif_tx_unlock_bh(dev);
+		}
 	}
 }
 
-void dev_set_rx_mode(struct net_device *dev)
-{
-	netif_tx_lock_bh(dev);
-	__dev_set_rx_mode(dev);
-	netif_tx_unlock_bh(dev);
-}
-
 int __dev_addr_delete(struct dev_addr_list **list, int *count,
 		      void *addr, int alen, int glbl)
 {
@@ -2885,11 +2885,9 @@ int dev_unicast_delete(struct net_device *dev, void *addr, int alen)
 
 	ASSERT_RTNL();
 
-	netif_tx_lock_bh(dev);
 	err = __dev_addr_delete(&dev->uc_list, &dev->uc_count, addr, alen, 0);
 	if (!err)
-		__dev_set_rx_mode(dev);
-	netif_tx_unlock_bh(dev);
+		dev_set_rx_mode(dev);
 	return err;
 }
 EXPORT_SYMBOL(dev_unicast_delete);
@@ -2911,11 +2909,9 @@ int dev_unicast_add(struct net_device *dev, void *addr, int alen)
 
 	ASSERT_RTNL();
 
-	netif_tx_lock_bh(dev);
 	err = __dev_addr_add(&dev->uc_list, &dev->uc_count, addr, alen, 0);
 	if (!err)
-		__dev_set_rx_mode(dev);
-	netif_tx_unlock_bh(dev);
+		dev_set_rx_mode(dev);
 	return err;
 }
 EXPORT_SYMBOL(dev_unicast_add);
diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c
index 896b0ca..be987d5 100644
--- a/net/core/dev_mcast.c
+++ b/net/core/dev_mcast.c
@@ -65,6 +65,30 @@
  */
 
 /*
+ *     Update the multicast list into the physical NIC controller.
+ */
+
+static void __dev_mc_upload(struct net_device *dev)
+{
+	/* Don't do anything till we up the interface
+	 * [dev_open will do this too so the list will
+	 * stay sane]
+	 */
+
+	if (!(dev->flags & IFF_UP))
+		return;
+
+	/* Devices with which have been detached don't get set. */
+	if (!netif_device_present(dev))
+		return;
+
+	if (dev->set_rx_mode)
+		dev->set_rx_mode(dev);
+	else if (dev->set_multicast_list)
+		dev->set_multicast_list(dev);
+}
+
+/*
  *	Delete a device level multicast
  */
 
@@ -81,7 +105,7 @@ int dev_mc_delete(struct net_device *dev, void *addr, int alen, int glbl)
 		 *	loaded filter is now wrong. Fix it
 		 */
 
-		__dev_set_rx_mode(dev);
+		__dev_mc_upload(dev);
 	}
 	netif_tx_unlock_bh(dev);
 	return err;
@@ -98,7 +122,7 @@ int dev_mc_add(struct net_device *dev, void *addr, int alen, int glbl)
 	netif_tx_lock_bh(dev);
 	err = __dev_addr_add(&dev->mc_list, &dev->mc_count, addr, alen, glbl);
 	if (!err)
-		__dev_set_rx_mode(dev);
+		__dev_mc_upload(dev);
 	netif_tx_unlock_bh(dev);
 	return err;
 }
@@ -140,7 +164,7 @@ int dev_mc_sync(struct net_device *to, struct net_device *from)
 		da = next;
 	}
 	if (!err)
-		__dev_set_rx_mode(to);
+		__dev_mc_upload(to);
 	netif_tx_unlock_bh(to);
 
 	return err;
@@ -177,7 +201,7 @@ void dev_mc_unsync(struct net_device *to, struct net_device *from)
 				  da->da_addr, da->da_addrlen, 0);
 		da = next;
 	}
-	__dev_set_rx_mode(to);
+	__dev_mc_upload(to);
 
 	netif_tx_unlock_bh(to);
 	netif_tx_unlock_bh(from);

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-02  9:28         ` Herbert Xu
@ 2007-10-02 15:29           ` Patrick McHardy
  2007-10-03  6:06             ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2007-10-02 15:29 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, oliver, Eric W. Biederman, linux-usb-devel, davem

On Tue, 2 Oct 2007, Herbert Xu wrote:

> On Sun, Sep 30, 2007 at 05:47:30PM +0200, Patrick McHardy wrote:
>>
> I'm a bit uncomfortable with having a function (change_flags)
> that's sometimes in a sleepable context and sometimes running
> with BH disabled.
>
> So how about splitting up the unicast/multicast entry points
> as follows?

>
> [NET]: Restore multicast-only __dev_mc_upload
>
> As it is dev_set_rx_mode needs to cope with being called with
> or without the RTNL.  This is rather confusing when it comes
> to understanding what locks are being held at a particular
> point.
>
> Since the only path that calls it without the RTNL is in the
> multicast code, we could avoid the confusion by splitting that
> out.
>
> This patch restores the old __dev_mc_upload as a multicast-only
> variant of dev_set_rx_mode.  It also gets rid of the now-unused
> __dev_set_rx_mod and makes dev_set_rx_mode static since it's
> only used in net/core/dev.c.
>
> As a side-effect this fixes the warning triggered by the
> RTNL_ASSERT in __dev_set_promiscuity.


I think this doesn't completely fix it, when dev_unicast_add is
interrupted by dev_mc_add before the unicast changes are performed,
they will get committed in the dev_mc_add context, so we might still
call change_flags with BH disabled. Taking the TX lock around the
dev->uc_count and dev->uc_promisc checks and changes in __dev_set_rx_mode
should fix this.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-02 15:29           ` Patrick McHardy
@ 2007-10-03  6:06             ` Herbert Xu
  2007-10-08  4:40               ` Patrick McHardy
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2007-10-03  6:06 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, oliver, Eric W. Biederman, linux-usb-devel, davem

On Tue, Oct 02, 2007 at 05:29:11PM +0200, Patrick McHardy wrote:
>
> I think this doesn't completely fix it, when dev_unicast_add is
> interrupted by dev_mc_add before the unicast changes are performed,
> they will get committed in the dev_mc_add context, so we might still
> call change_flags with BH disabled. Taking the TX lock around the
> dev->uc_count and dev->uc_promisc checks and changes in __dev_set_rx_mode
> should fix this.

Good catch.  Digging back in history it seems that you added
the change_rx_flags function so that the driver didn't have to
do it under TX lock, right?

The problem with this is that the stack can now call
change_rx_flags and set_multicast_list simultaneously
which presents a potential headache for the driver
author (if they were to use change_rx_flags).

It seems to me what we could do is in fact separate out the
part that adds the address and the part that syncs it with
hardware.

That way we can call the hardware from a process context later
and use the RTNL to guarantee that we only enter the driver
once.

So dev_mc_add would look like:

1) Hold some form of lock L.
2) Modify mc list A (a copy of the current mc list).
3) Drop lock.
4) Schedule an update to the hardware.

The update to the hardware would look lie:

1) Hold RTNL.
2) Hold lock L.
3) Copy list A to list B (B would be our current list).
4) Drop lock L.
5) Call the hardware.
6) Drop RTNL.

For compatibility, set_multicast_list would still be invoked
under the TX lock while set_rx_mode would do exactly the same
thing but would only hold the RTNL.

What do you think about this approach?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-03  6:06             ` Herbert Xu
@ 2007-10-08  4:40               ` Patrick McHardy
  0 siblings, 0 replies; 20+ messages in thread
From: Patrick McHardy @ 2007-10-08  4:40 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric W. Biederman, davem, netdev, oliver, linux-usb-devel

Herbert Xu wrote:
> On Tue, Oct 02, 2007 at 05:29:11PM +0200, Patrick McHardy wrote:
> 
>>I think this doesn't completely fix it, when dev_unicast_add is
>>interrupted by dev_mc_add before the unicast changes are performed,
>>they will get committed in the dev_mc_add context, so we might still
>>call change_flags with BH disabled. Taking the TX lock around the
>>dev->uc_count and dev->uc_promisc checks and changes in __dev_set_rx_mode
>>should fix this.
> 
> 
> Good catch.  Digging back in history it seems that you added
> the change_rx_flags function so that the driver didn't have to
> do it under TX lock, right?


Yes, and to make sure the RTNL is held.

> The problem with this is that the stack can now call
> change_rx_flags and set_multicast_list simultaneously
> which presents a potential headache for the driver
> author (if they were to use change_rx_flags).


The change_rx_flags function was intended to be used by software
devices that want to synchronize flags to a different device,
in which case they shouldn't interfere since set_multicast_list
would only be used for the multicast list and not the flags.

> It seems to me what we could do is in fact separate out the
> part that adds the address and the part that syncs it with
> hardware.


That sounds like a really good idea to get rid of all the
confusion.

> That way we can call the hardware from a process context later
> and use the RTNL to guarantee that we only enter the driver
> once.
> 
> So dev_mc_add would look like:
> 
> 1) Hold some form of lock L.
> 2) Modify mc list A (a copy of the current mc list).
> 3) Drop lock.
> 4) Schedule an update to the hardware.
> 
> The update to the hardware would look lie:
> 
> 1) Hold RTNL.
> 2) Hold lock L.
> 3) Copy list A to list B (B would be our current list).
> 4) Drop lock L.
> 5) Call the hardware.
> 6) Drop RTNL.
> 
> For compatibility, set_multicast_list would still be invoked
> under the TX lock while set_rx_mode would do exactly the same
> thing but would only hold the RTNL.
> 
> What do you think about this approach?


Sounds great :)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-09-29  0:59 [PATCH] rtnl: Simplify ASSERT_RTNL Eric W. Biederman
  2007-09-29  4:31 ` Herbert Xu
@ 2007-10-11  4:16 ` David Miller
  2007-10-11  6:57   ` Eric W. Biederman
  1 sibling, 1 reply; 20+ messages in thread
From: David Miller @ 2007-10-11  4:16 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Fri, 28 Sep 2007 18:59:08 -0600

> 
> Currently we have the call path:
> macvlan_open -> dev_unicast_add -> __dev_set_rx_mode ->
> 	__dev_set_promiscuity -> ASSERT_RTNL -> mutex_trylock
> 
> When mutex debugging is on taking a mutex complains if we are not
> allowed to sleep.  At that point we have called netif_tx_lock_bh
> so we are clearly not allowed to sleep.  Arguably this is not a
> problem for mutex_trylock.
> 
> However we can avoid the complaint and make the ASSERT_RTNL code
> cheaper, faster and more obvious by simply calling mutex_is_locked.
> 
> So this patch adds rtnl_is_locked (which does mutex_is_locked on
> the rtnl_mutex) and changes ASSERT_RTNL to use that.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

There was a lot of discussion about how to do this right and
therefore you'll need to resubmit all of this with this
discussioned issues addressed.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-11  4:16 ` David Miller
@ 2007-10-11  6:57   ` Eric W. Biederman
  2007-10-11  7:12     ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2007-10-11  6:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Herbert Xu, Patrick McHardy

David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Fri, 28 Sep 2007 18:59:08 -0600
>
>> 
>> Currently we have the call path:
>> macvlan_open -> dev_unicast_add -> __dev_set_rx_mode ->
>> 	__dev_set_promiscuity -> ASSERT_RTNL -> mutex_trylock
>> 
>> When mutex debugging is on taking a mutex complains if we are not
>> allowed to sleep.  At that point we have called netif_tx_lock_bh
>> so we are clearly not allowed to sleep.  Arguably this is not a
>> problem for mutex_trylock.
>> 
>> However we can avoid the complaint and make the ASSERT_RTNL code
>> cheaper, faster and more obvious by simply calling mutex_is_locked.
>> 
>> So this patch adds rtnl_is_locked (which does mutex_is_locked on
>> the rtnl_mutex) and changes ASSERT_RTNL to use that.
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>
> There was a lot of discussion about how to do this right and
> therefore you'll need to resubmit all of this with this
> discussioned issues addressed.

Thanks for the update on where this patch sits.

My understanding is that the patch as I submitted it is correct.

The code path that sparked this patch has been seen to be extremely
convoluted from a locking perspective.  Herbert and Patrick have been
discussing that code path and it was my impression they were working
together to figure out how to refactor that code path to make the
locking simpler.  There are enough convoluted details that I
do not feel comfortable refactoring that code path.

There was a practical suggestion by Herbert that ASSERT_RTNL have a
might_sleep() added.  That suggestion will currently result in
ASSERT_RTNL firing unnecessarily from the macvlan_open code path.

So I do not feel comfortable adding a might_sleep() into ASSERT_RTNL,
as it appears that it will warn on currently correct code.  Even if
that code has code has nearly incomprehensible locking.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-11  6:57   ` Eric W. Biederman
@ 2007-10-11  7:12     ` Herbert Xu
  2007-10-11  8:23       ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2007-10-11  7:12 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev, Patrick McHardy

On Thu, Oct 11, 2007 at 12:57:31AM -0600, Eric W. Biederman wrote:
>
> There was a practical suggestion by Herbert that ASSERT_RTNL have a
> might_sleep() added.  That suggestion will currently result in
> ASSERT_RTNL firing unnecessarily from the macvlan_open code path.

As I've already said we should change the macvlan and core
netdev code so that this doesn't happen in the first place.
In gernal checking for the RTNL while holding a spin lock is
a sign of a bug.

So I would object to a patch that caused the RTNL_ASSERT to not
warn about being called in an atomic context.

I don't have a problem with your patch per se, it's the fact
that the patch is removing the warning when it's called in an
atomic context that I don't like.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-11  7:12     ` Herbert Xu
@ 2007-10-11  8:23       ` Eric W. Biederman
  2007-10-11  8:28         ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2007-10-11  8:23 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, netdev, Patrick McHardy

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Thu, Oct 11, 2007 at 12:57:31AM -0600, Eric W. Biederman wrote:
>>
>> There was a practical suggestion by Herbert that ASSERT_RTNL have a
>> might_sleep() added.  That suggestion will currently result in
>> ASSERT_RTNL firing unnecessarily from the macvlan_open code path.
>
> As I've already said we should change the macvlan and core
> netdev code so that this doesn't happen in the first place.

Agreed.  Until that is done I am reluctant to add a warning
to ASSERT_RTNL.  Last I looked at that part of the thread
it looked like you and Patrick were making good progress
towards unraveling that, so I have no problem adding
an extra warning when we don't expect it to ever trigger.

> In gernal checking for the RTNL while holding a spin lock is
> a sign of a bug.
>
> So I would object to a patch that caused the RTNL_ASSERT to not
> warn about being called in an atomic context.

ASSERT_RTNL does not warn about being called in an atomic context
today!

> I don't have a problem with your patch per se, it's the fact
> that the patch is removing the warning when it's called in an
> atomic context that I don't like.

No my patch does not remove a warning.

Way way deep in mutex debugging on the slowpath there is a unreadable
and incomprehensible WARN_ON in muxtex_trylock that will trigger if
you have 10 tons of debugging turned on, and you are in,
interrupt context, and you manage to hit the slow path.  I think that
is a pretty unlikely scenario.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-11  8:23       ` Eric W. Biederman
@ 2007-10-11  8:28         ` Herbert Xu
  2007-10-11 16:33           ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2007-10-11  8:28 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev, Patrick McHardy

On Thu, Oct 11, 2007 at 02:23:35AM -0600, Eric W. Biederman wrote:
>
> > So I would object to a patch that caused the RTNL_ASSERT to not
> > warn about being called in an atomic context.
> 
> ASSERT_RTNL does not warn about being called in an atomic context
> today!

Well it did didn't it or we wouldn't be having this thread :)
 
> Way way deep in mutex debugging on the slowpath there is a unreadable
> and incomprehensible WARN_ON in muxtex_trylock that will trigger if
> you have 10 tons of debugging turned on, and you are in,
> interrupt context, and you manage to hit the slow path.  I think that
> is a pretty unlikely scenario.

Well thanks to that warning we're on our way of improving the
code that triggered it in such a way that this warning will soon
go silent.

That's precisely the reason why I object to having this warning
removed.  Now you have a good point that this warning doesn't
trigger all the time.  The fix to that is to *make* it trigger
always, not removing it.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-11  8:28         ` Herbert Xu
@ 2007-10-11 16:33           ` Eric W. Biederman
  2007-10-12  0:30             ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2007-10-11 16:33 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, netdev, Patrick McHardy

Herbert Xu <herbert@gondor.apana.org.au> writes:

> Well thanks to that warning we're on our way of improving the
> code that triggered it in such a way that this warning will soon
> go silent.
>
> That's precisely the reason why I object to having this warning
> removed.  Now you have a good point that this warning doesn't
> trigger all the time.  The fix to that is to *make* it trigger
> always, not removing it.

I'm almost convinced but.

Where people deliberately use convoluted locking is where we
most need things like ASSERT_RTNL.

Having ASSERT_RTNL warn if you were sleeping does not seem
intuitive from the name.

This instance of convoluted locking seems like a complete
one off to me, and if it will warn about other constructs
currently in the kernel it seems wrong.

Frankly I don't feel comfortable adding the check because I can't
defend the presence of might_sleep() in ASSERT_RTNL.  If I can't
understand a change well enough to defend it I will not take
responsibility for it, and I will not add my Signed-off-by to it.

The patch I wrote was trivial a trivial optimization and obviously
correct.  Adding the might_sleep() and the patch becomes the start
of a crusade for better code that I don't believe in.

So I would rather forget this patch then make that one line addition.

Thanks,
Eric




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-11 16:33           ` Eric W. Biederman
@ 2007-10-12  0:30             ` David Miller
  2007-10-12  3:15               ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2007-10-12  0:30 UTC (permalink / raw)
  To: ebiederm; +Cc: herbert, netdev, kaber

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Thu, 11 Oct 2007 10:33:39 -0600

> Having ASSERT_RTNL warn if you were sleeping does not seem
> intuitive from the name.
> 
> This instance of convoluted locking seems like a complete
> one off to me, and if it will warn about other constructs
> currently in the kernel it seems wrong.

RTNL is a semaphore, therefore it sleeps.

Therefore anything that requires RTNL is held can also assume that it
can do things like GFP_KERNEL allocations and other sleeping actions.

This is why any code path that runs with spinlocks held or interrupts
disabled, and hits an RTNL assertion, is buggy.  It is the chain of
dependencies of what is allowed in such contexts.

	ASSERT_RTNL();
	page = alloc_page(GFP_KERNEL);

That sequence above is absolutely always legal.

The might_sleep() check is just letting us know the problem exists.

If spinlocks or interrupt disabling is needed to implement something
deeper in the call chain, that's fine, it just cannot call back into
the RTNL asserted domain with those spinlocks held or interrupts
disabled.

If the mac_vlan driver, or whichever one has the problem, does things
like this it must be fixed and putting or not putting a proper
might_sleep() check here doesn't change that.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rtnl: Simplify ASSERT_RTNL
  2007-10-12  0:30             ` David Miller
@ 2007-10-12  3:15               ` Eric W. Biederman
  0 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2007-10-12  3:15 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev, kaber


David, all.  My apologies. I'm tired and I don't have the energy to
grok yet another part of the kernel right now.  So I can not
productively participate in this discussion.

I do agree that the locking below dev_unicast_add() that was exposed
by the macvlan driver is unmaintainable even if it is currently
correct.

I don't see any fundamental problems with adding a might_sleep(),
ASSERT_RTNL.

I just don't have the energy to audit everything I feel I would have
to audit to be comfortable taking responsibility for adding the
might_sleep() or fixing the locking in dev_unicast_add() (which seems
more important).

So again my apologies.  This is all beyond what I can deal with at the moment.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-10-12  3:16 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-29  0:59 [PATCH] rtnl: Simplify ASSERT_RTNL Eric W. Biederman
2007-09-29  4:31 ` Herbert Xu
2007-09-29 15:32   ` Patrick McHardy
2007-09-30  0:24     ` Herbert Xu
2007-09-30 15:47       ` Patrick McHardy
2007-10-02  9:28         ` Herbert Xu
2007-10-02 15:29           ` Patrick McHardy
2007-10-03  6:06             ` Herbert Xu
2007-10-08  4:40               ` Patrick McHardy
2007-09-29 17:18   ` Eric W. Biederman
2007-09-29 17:51     ` Patrick McHardy
2007-09-30  0:28     ` Herbert Xu
2007-10-11  4:16 ` David Miller
2007-10-11  6:57   ` Eric W. Biederman
2007-10-11  7:12     ` Herbert Xu
2007-10-11  8:23       ` Eric W. Biederman
2007-10-11  8:28         ` Herbert Xu
2007-10-11 16:33           ` Eric W. Biederman
2007-10-12  0:30             ` David Miller
2007-10-12  3:15               ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).