netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] dev->refcnt long term holder
@ 2009-11-16 18:01 Eric Dumazet
  2009-11-16 19:02 ` Stephen Hemminger
  2009-11-16 21:50 ` PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle Eric Dumazet
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2009-11-16 18:01 UTC (permalink / raw)
  To: David S. Miller, Herbert Xu, Stephen Hemminger; +Cc: Linux Netdev List

time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

real	0m0.266s
user	0m0.000s
sys	0m0.001s

real	0m0.770s
user	0m0.000s
sys	0m0.000s

real	0m1.022s
user	0m0.000s
sys	0m0.000s


One problem of current schem in vlan dismantle phase is the
holding of device done by following chain :

vlan_dev_stop() ->
	netif_carrier_off(dev) ->
		linkwatch_fire_event(dev) ->
			dev_hold() ...

And __linkwatch_run_queue() run up to one second later...

Is following patch one way to avoid the problem, or should
we add a new linkwatch_forgetpro_device(dev) method to immediately
release the device reference (and unlink device from the list) ?
(This would probably need a doubly linked list instead of single link list)

Thanks

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 4198ec5..17216f9 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -512,7 +512,6 @@ static int vlan_dev_stop(struct net_device *dev)
 	if (compare_ether_addr(dev->dev_addr, real_dev->dev_addr))
 		dev_unicast_delete(real_dev, dev->dev_addr);
 
-	netif_carrier_off(dev);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC] dev->refcnt long term holder
  2009-11-16 18:01 [RFC] dev->refcnt long term holder Eric Dumazet
@ 2009-11-16 19:02 ` Stephen Hemminger
  2009-11-16 19:54   ` Eric Dumazet
  2009-11-16 21:50 ` PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle Eric Dumazet
  1 sibling, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2009-11-16 19:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Herbert Xu, Linux Netdev List

On Mon, 16 Nov 2009 19:01:37 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105
> 
> real	0m0.266s
> user	0m0.000s
> sys	0m0.001s
> 
> real	0m0.770s
> user	0m0.000s
> sys	0m0.000s
> 
> real	0m1.022s
> user	0m0.000s
> sys	0m0.000s
> 
> 
> One problem of current schem in vlan dismantle phase is the
> holding of device done by following chain :
> 
> vlan_dev_stop() ->
> 	netif_carrier_off(dev) ->
> 		linkwatch_fire_event(dev) ->
> 			dev_hold() ...
> 
> And __linkwatch_run_queue() run up to one second later...
> 
> Is following patch one way to avoid the problem, or should
> we add a new linkwatch_forgetpro_device(dev) method to immediately
> release the device reference (and unlink device from the list) ?
> (This would probably need a doubly linked list instead of single link list)

Or both..

-- 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] dev->refcnt long term holder
  2009-11-16 19:02 ` Stephen Hemminger
@ 2009-11-16 19:54   ` Eric Dumazet
  2009-11-17  8:30     ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2009-11-16 19:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, Herbert Xu, Linux Netdev List

Stephen Hemminger a écrit :
> 
> Or both..
> 

Well well well :)

I was hoping a fast path, but anyway, a linkwatch_forget_dev(dev) is probably better,
(in case "ip link del" closely follows an "ip link add / up")

I'll post something when tested.

Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread

* PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle
  2009-11-16 18:01 [RFC] dev->refcnt long term holder Eric Dumazet
  2009-11-16 19:02 ` Stephen Hemminger
@ 2009-11-16 21:50 ` Eric Dumazet
  2009-11-16 22:39   ` Stephen Hemminger
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2009-11-16 21:50 UTC (permalink / raw)
  To: David S. Miller, Herbert Xu, Stephen Hemminger; +Cc: Linux Netdev List

time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

real	0m0.266s
user	0m0.000s
sys	0m0.001s

real	0m0.770s
user	0m0.000s
sys	0m0.000s

real	0m1.022s
user	0m0.000s
sys	0m0.000s


One problem of current schem in vlan dismantle phase is the
holding of device done by following chain :

vlan_dev_stop() ->
	netif_carrier_off(dev) ->
		linkwatch_fire_event(dev) ->
			dev_hold() ...

And __linkwatch_run_queue() runs up to one second later...

A generic fix to this problem is to add a linkwatch_forget_dev() method
to unlink the device from the list of watched devices.

dev->link_watch_next becomes dev->link_watch_list (and use a bit more memory),
to be able to unlink device in O(1).

After patch :
time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

real    0m0.024s
user    0m0.000s
sys     0m0.000s

real    0m0.032s
user    0m0.000s
sys     0m0.001s

real    0m0.033s
user    0m0.000s
sys     0m0.000s


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netdevice.h |    3 +-
 net/core/dev.c            |    3 ++
 net/core/link_watch.c     |   42 ++++++++++++++++++++++++------------
 3 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7043f85..4e25730 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -896,7 +896,7 @@ struct net_device {
 	/* device index hash chain */
 	struct hlist_node	index_hlist;
 
-	struct net_device	*link_watch_next;
+	struct list_head	link_watch_list;
 
 	/* register/unregister state machine */
 	enum { NETREG_UNINITIALIZED=0,
@@ -1600,6 +1600,7 @@ static inline void dev_hold(struct net_device *dev)
  */
 
 extern void linkwatch_fire_event(struct net_device *dev);
+extern void linkwatch_forget_dev(struct net_device *dev);
 
 /**
  *	netif_carrier_ok - test if carrier present
diff --git a/net/core/dev.c b/net/core/dev.c
index 4b24d79..649de02 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5066,6 +5066,8 @@ static void netdev_wait_allrefs(struct net_device *dev)
 {
 	unsigned long rebroadcast_time, warning_time;
 
+	linkwatch_forget_dev(dev);
+
 	rebroadcast_time = warning_time = jiffies;
 	while (atomic_read(&dev->refcnt) != 0) {
 		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
@@ -5280,6 +5282,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
+	INIT_LIST_HEAD(&dev->link_watch_list);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE;
 	setup(dev);
 	strcpy(dev->name, name);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index bf8f7af..05fe273 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -35,7 +35,7 @@ static unsigned long linkwatch_nextevent;
 static void linkwatch_event(struct work_struct *dummy);
 static DECLARE_DELAYED_WORK(linkwatch_work, linkwatch_event);
 
-static struct net_device *lweventlist;
+static LIST_HEAD(lweventlist);
 static DEFINE_SPINLOCK(lweventlist_lock);
 
 static unsigned char default_operstate(const struct net_device *dev)
@@ -89,8 +89,10 @@ static void linkwatch_add_event(struct net_device *dev)
 	unsigned long flags;
 
 	spin_lock_irqsave(&lweventlist_lock, flags);
-	dev->link_watch_next = lweventlist;
-	lweventlist = dev;
+	if (list_empty(&dev->link_watch_list)) {
+		list_add_tail(&dev->link_watch_list, &lweventlist);
+		dev_hold(dev);
+	}
 	spin_unlock_irqrestore(&lweventlist_lock, flags);
 }
 
@@ -135,7 +137,8 @@ static void linkwatch_schedule_work(int urgent)
 
 static void __linkwatch_run_queue(int urgent_only)
 {
-	struct net_device *next;
+	struct net_device *dev;
+	LIST_HEAD(wrk);
 
 	/*
 	 * Limit the number of linkwatch events to one
@@ -153,19 +156,18 @@ static void __linkwatch_run_queue(int urgent_only)
 	clear_bit(LW_URGENT, &linkwatch_flags);
 
 	spin_lock_irq(&lweventlist_lock);
-	next = lweventlist;
-	lweventlist = NULL;
-	spin_unlock_irq(&lweventlist_lock);
+	list_splice_init(&lweventlist, &wrk);
 
-	while (next) {
-		struct net_device *dev = next;
+	while (!list_empty(&wrk)) {
 
-		next = dev->link_watch_next;
+		dev = list_first_entry(&wrk, struct net_device, link_watch_list);
+		list_del_init(&dev->link_watch_list);
 
 		if (urgent_only && !linkwatch_urgent_event(dev)) {
-			linkwatch_add_event(dev);
+			list_add_tail(&dev->link_watch_list, &lweventlist);
 			continue;
 		}
+		spin_unlock_irq(&lweventlist_lock);
 
 		/*
 		 * Make sure the above read is complete since it can be
@@ -189,10 +191,24 @@ static void __linkwatch_run_queue(int urgent_only)
 		}
 
 		dev_put(dev);
+		spin_lock_irq(&lweventlist_lock);
 	}
 
-	if (lweventlist)
+	if (!list_empty(&lweventlist))
 		linkwatch_schedule_work(0);
+	spin_unlock_irq(&lweventlist_lock);
+}
+
+void linkwatch_forget_dev(struct net_device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&lweventlist_lock, flags);
+	if (!list_empty(&dev->link_watch_list)) {
+		list_del_init(&dev->link_watch_list);
+		dev_put(dev);
+	}
+	spin_unlock_irqrestore(&lweventlist_lock, flags);
 }
 
 
@@ -216,8 +232,6 @@ void linkwatch_fire_event(struct net_device *dev)
 	bool urgent = linkwatch_urgent_event(dev);
 
 	if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
-		dev_hold(dev);
-
 		linkwatch_add_event(dev);
 	} else if (!urgent)
 		return;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle
  2009-11-16 21:50 ` PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle Eric Dumazet
@ 2009-11-16 22:39   ` Stephen Hemminger
  2009-11-17 12:26     ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2009-11-16 22:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Herbert Xu, Linux Netdev List

On Mon, 16 Nov 2009 22:50:48 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> vlan_dev_stop() ->
> 	netif_carrier_off(dev) ->
> 		linkwatch_fire_event(dev) ->
> 			dev_hold() ...
> 
> And __linkwatch_run_queue() runs up to one second later...
> 
> A generic fix to this problem is to add a linkwatch_forget_dev() method
> to unlink the device from the list of watched devices.
> 
> dev->link_watch_next becomes dev->link_watch_list (and use a bit more memory),
> to be able to unlink device in O(1).
> 
> After patch :
> time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105
> 
> real    0m0.024s
> user    0m0.000s
> sys     0m0.000s
> 
> real    0m0.032s
> user    0m0.000s
> sys     0m0.001s
> 
> real    0m0.033s
> user    0m0.000s
> sys     0m0.000s
> 
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Acked-by: Stephen Hemminger <shemminger@vyatta.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] dev->refcnt long term holder
  2009-11-16 19:54   ` Eric Dumazet
@ 2009-11-17  8:30     ` David Miller
  2009-11-17  8:37       ` Herbert Xu
  2009-11-17 17:58       ` Stephen Hemminger
  0 siblings, 2 replies; 13+ messages in thread
From: David Miller @ 2009-11-17  8:30 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, herbert, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 16 Nov 2009 20:54:29 +0100

> I was hoping a fast path, but anyway, a linkwatch_forget_dev(dev) is
> probably better, (in case "ip link del" closely follows an "ip link
> add / up")
> 
> I'll post something when tested.

Let's face it, linkwatch has been a thorn in our sides for a long
time.  A non-stop source of problems.

I suspect that here in this VLAN case you care about, the work can
even be done synchronously.

I'm trying to remember why we added this asynchronous link state event
processing monster.  It probably has something to do with needing a
sleepable context.  What's amusing is that linkwatch has repeatably
caused RTNL deadlock issues over the years. :-)

If it is purely an issue of doing link state processing outside of
HW irq context:

1) PHY events in most drivers are handled in softirq (NAPI ->poll())
   or a workqueue of some sort these days.  If we can make all of
   the link state management softirq safe (it probably is, except
   for perhaps RTNL), these case can be done synchronously always.

2) The remaining cases are device probe, open, and close.  None of
   which execute in HW irq context, they can sleep, and they hold
   RTNL already.

I'm sure there are a few drivers which still can invoke
netif_carrier_*() in HW interrupt context, but we can create a tiny
workqueue helper or something like that for them.  It won't be the
main way to invoke carrier state changes, just a compatability item.

Anyways, just some food for thought.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] dev->refcnt long term holder
  2009-11-17  8:30     ` David Miller
@ 2009-11-17  8:37       ` Herbert Xu
  2009-11-17 17:58       ` Stephen Hemminger
  1 sibling, 0 replies; 13+ messages in thread
From: Herbert Xu @ 2009-11-17  8:37 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, shemminger, netdev

On Tue, Nov 17, 2009 at 12:30:19AM -0800, David Miller wrote:
> 
> 1) PHY events in most drivers are handled in softirq (NAPI ->poll())
>    or a workqueue of some sort these days.  If we can make all of
>    the link state management softirq safe (it probably is, except
>    for perhaps RTNL), these case can be done synchronously always.

I think one of the reasons it's done this way is to call the function
netdev_state_change which needs to be in process context to walk
through the notifier list and do all the calls.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle
  2009-11-16 22:39   ` Stephen Hemminger
@ 2009-11-17 12:26     ` David Miller
  2009-11-17 12:31       ` Herbert Xu
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2009-11-17 12:26 UTC (permalink / raw)
  To: shemminger; +Cc: eric.dumazet, herbert, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 16 Nov 2009 14:39:17 -0800

> On Mon, 16 Nov 2009 22:50:48 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Is this really valid?

The whole point in emitting the netif_carrier_off() is so
that applications see the event in userspace and therefore
can clean things up.

Sure, the kernel will no longer make the device visible, and therefore
the application can't operate on it any longer.  But the application
is deserved of receiving the event anyways so that it can clean up
internal state and datastructures.

It seem to me that in this ->stop() case we'll now elide the event
emission, and I don't see how that can be right.

Really, the link watch stuff is just due for a redesign.  I don't
think a simple hack is going to cut it this time, sorry Eric :-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle
  2009-11-17 12:26     ` David Miller
@ 2009-11-17 12:31       ` Herbert Xu
  2009-11-17 15:59         ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Herbert Xu @ 2009-11-17 12:31 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, eric.dumazet, netdev

On Tue, Nov 17, 2009 at 04:26:04AM -0800, David Miller wrote:
>
> Really, the link watch stuff is just due for a redesign.  I don't
> think a simple hack is going to cut it this time, sorry Eric :-)

I have no objections against any redesigns, but since the only
caller of linkwatch_forget_dev runs in process context with the
RTNL, it could also legally emit those events.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle
  2009-11-17 12:31       ` Herbert Xu
@ 2009-11-17 15:59         ` Eric Dumazet
  2009-11-18 13:05           ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2009-11-17 15:59 UTC (permalink / raw)
  To: Herbert Xu, David Miller; +Cc: shemminger, netdev

Herbert Xu a écrit :
> On Tue, Nov 17, 2009 at 04:26:04AM -0800, David Miller wrote:
>> Really, the link watch stuff is just due for a redesign.  I don't
>> think a simple hack is going to cut it this time, sorry Eric :-)
> 
> I have no objections against any redesigns, but since the only
> caller of linkwatch_forget_dev runs in process context with the
> RTNL, it could also legally emit those events.

Thanks guys, here an updated version then, before linkwatch surgery ?

In this version, I force the event to be sent synchronously.

[PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle

time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

real	0m0.266s
user	0m0.000s
sys	0m0.001s

real	0m0.770s
user	0m0.000s
sys	0m0.000s

real	0m1.022s
user	0m0.000s
sys	0m0.000s


One problem of current schem in vlan dismantle phase is the
holding of device done by following chain :

vlan_dev_stop() ->
	netif_carrier_off(dev) ->
		linkwatch_fire_event(dev) ->
			dev_hold() ...

And __linkwatch_run_queue() runs up to one second later...

A generic fix to this problem is to add a linkwatch_forget_dev() method
to unlink the device from the list of watched devices.

dev->link_watch_next becomes dev->link_watch_list (and use a bit more memory),
to be able to unlink device in O(1).

After patch :
time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

real    0m0.024s
user    0m0.000s
sys     0m0.000s

real    0m0.032s
user    0m0.000s
sys     0m0.001s

real    0m0.033s
user    0m0.000s
sys     0m0.000s


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netdevice.h |    3 -
 net/core/dev.c            |    3 +
 net/core/link_watch.c     |   94 +++++++++++++++++++++---------------
 3 files changed, 62 insertions(+), 38 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7043f85..4e25730 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -896,7 +896,7 @@ struct net_device {
 	/* device index hash chain */
 	struct hlist_node	index_hlist;
 
-	struct net_device	*link_watch_next;
+	struct list_head	link_watch_list;
 
 	/* register/unregister state machine */
 	enum { NETREG_UNINITIALIZED=0,
@@ -1600,6 +1600,7 @@ static inline void dev_hold(struct net_device *dev)
  */
 
 extern void linkwatch_fire_event(struct net_device *dev);
+extern void linkwatch_forget_dev(struct net_device *dev);
 
 /**
  *	netif_carrier_ok - test if carrier present
diff --git a/net/core/dev.c b/net/core/dev.c
index d867522..9b58c04 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5067,6 +5067,8 @@ static void netdev_wait_allrefs(struct net_device *dev)
 {
 	unsigned long rebroadcast_time, warning_time;
 
+	linkwatch_forget_dev(dev);
+
 	rebroadcast_time = warning_time = jiffies;
 	while (atomic_read(&dev->refcnt) != 0) {
 		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
@@ -5281,6 +5283,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
+	INIT_LIST_HEAD(&dev->link_watch_list);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE;
 	setup(dev);
 	strcpy(dev->name, name);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index bf8f7af..5910b55 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -35,7 +35,7 @@ static unsigned long linkwatch_nextevent;
 static void linkwatch_event(struct work_struct *dummy);
 static DECLARE_DELAYED_WORK(linkwatch_work, linkwatch_event);
 
-static struct net_device *lweventlist;
+static LIST_HEAD(lweventlist);
 static DEFINE_SPINLOCK(lweventlist_lock);
 
 static unsigned char default_operstate(const struct net_device *dev)
@@ -89,8 +89,10 @@ static void linkwatch_add_event(struct net_device *dev)
 	unsigned long flags;
 
 	spin_lock_irqsave(&lweventlist_lock, flags);
-	dev->link_watch_next = lweventlist;
-	lweventlist = dev;
+	if (list_empty(&dev->link_watch_list)) {
+		list_add_tail(&dev->link_watch_list, &lweventlist);
+		dev_hold(dev);
+	}
 	spin_unlock_irqrestore(&lweventlist_lock, flags);
 }
 
@@ -133,9 +135,35 @@ static void linkwatch_schedule_work(int urgent)
 }
 
 
+static void linkwatch_do_dev(struct net_device *dev)
+{
+	/*
+	 * Make sure the above read is complete since it can be
+	 * rewritten as soon as we clear the bit below.
+	 */
+	smp_mb__before_clear_bit();
+
+	/* We are about to handle this device,
+	 * so new events can be accepted
+	 */
+	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
+
+	rfc2863_policy(dev);
+	if (dev->flags & IFF_UP) {
+		if (netif_carrier_ok(dev))
+			dev_activate(dev);
+		else
+			dev_deactivate(dev);
+
+		netdev_state_change(dev);
+	}
+	dev_put(dev);
+}
+
 static void __linkwatch_run_queue(int urgent_only)
 {
-	struct net_device *next;
+	struct net_device *dev;
+	LIST_HEAD(wrk);
 
 	/*
 	 * Limit the number of linkwatch events to one
@@ -153,46 +181,40 @@ static void __linkwatch_run_queue(int urgent_only)
 	clear_bit(LW_URGENT, &linkwatch_flags);
 
 	spin_lock_irq(&lweventlist_lock);
-	next = lweventlist;
-	lweventlist = NULL;
-	spin_unlock_irq(&lweventlist_lock);
+	list_splice_init(&lweventlist, &wrk);
 
-	while (next) {
-		struct net_device *dev = next;
+	while (!list_empty(&wrk)) {
 
-		next = dev->link_watch_next;
+		dev = list_first_entry(&wrk, struct net_device, link_watch_list);
+		list_del_init(&dev->link_watch_list);
 
 		if (urgent_only && !linkwatch_urgent_event(dev)) {
-			linkwatch_add_event(dev);
+			list_add_tail(&dev->link_watch_list, &lweventlist);
 			continue;
 		}
-
-		/*
-		 * Make sure the above read is complete since it can be
-		 * rewritten as soon as we clear the bit below.
-		 */
-		smp_mb__before_clear_bit();
-
-		/* We are about to handle this device,
-		 * so new events can be accepted
-		 */
-		clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
-
-		rfc2863_policy(dev);
-		if (dev->flags & IFF_UP) {
-			if (netif_carrier_ok(dev))
-				dev_activate(dev);
-			else
-				dev_deactivate(dev);
-
-			netdev_state_change(dev);
-		}
-
-		dev_put(dev);
+		spin_unlock_irq(&lweventlist_lock);
+		linkwatch_do_dev(dev);
+		spin_lock_irq(&lweventlist_lock);
 	}
 
-	if (lweventlist)
+	if (!list_empty(&lweventlist))
 		linkwatch_schedule_work(0);
+	spin_unlock_irq(&lweventlist_lock);
+}
+
+void linkwatch_forget_dev(struct net_device *dev)
+{
+	unsigned long flags;
+	int clean = 0;
+
+	spin_lock_irqsave(&lweventlist_lock, flags);
+	if (!list_empty(&dev->link_watch_list)) {
+		list_del_init(&dev->link_watch_list);
+		clean = 1;
+	}
+	spin_unlock_irqrestore(&lweventlist_lock, flags);
+	if (clean)
+		linkwatch_do_dev(dev);
 }
 
 
@@ -216,8 +238,6 @@ void linkwatch_fire_event(struct net_device *dev)
 	bool urgent = linkwatch_urgent_event(dev);
 
 	if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
-		dev_hold(dev);
-
 		linkwatch_add_event(dev);
 	} else if (!urgent)
 		return;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC] dev->refcnt long term holder
  2009-11-17  8:30     ` David Miller
  2009-11-17  8:37       ` Herbert Xu
@ 2009-11-17 17:58       ` Stephen Hemminger
  2009-11-17 19:18         ` David Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2009-11-17 17:58 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, herbert, netdev

On Tue, 17 Nov 2009 00:30:19 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> I'm trying to remember why we added this asynchronous link state event
> processing monster.  It probably has something to do with needing a
> sleepable context.  What's amusing is that linkwatch has repeatably
> caused RTNL deadlock issues over the years. :-)

I thought it was to handle:
 1) carrier on old devices would bounce, so it provides ratelimiting
    of state changes. Modern hardware and CPU's probably makes this a non-issue.
 2) wasn't there some code path with device changes, hotplug, uevent and
    udev that meant that we couldn't do notifiers immediately.

-- 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] dev->refcnt long term holder
  2009-11-17 17:58       ` Stephen Hemminger
@ 2009-11-17 19:18         ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2009-11-17 19:18 UTC (permalink / raw)
  To: shemminger; +Cc: eric.dumazet, herbert, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 17 Nov 2009 09:58:46 -0800

> I thought it was to handle:
>  1) carrier on old devices would bounce, so it provides ratelimiting
>     of state changes. Modern hardware and CPU's probably makes this a non-issue.
>  2) wasn't there some code path with device changes, hotplug, uevent and
>     udev that meant that we couldn't do notifiers immediately.

I did a lot of code review in this area last night and it seems to be
#1 above and simply being able to do sleeping things to send the
events even though the carrier state changes happen in interrupt
context.

Even for what it's designed to do, it's overengineered.

So what I propose is that we simplify the design and also allow direct
invocation for cases where we're already in a sleepable context and/or
holding RTNL.  Similar to how Eric is doing in his latest linkwatch
patch for VLANs.

Note also that linkwatch's current implementation is the sole reason
we do the real work of netdevice destruction after dropping RTNL :-)
Linkwatch and unregister_netdevice() used to deadlock on RTNL.

>From history-2.6 GIT:

commit ff936f4e8148e75b20595eda5de6d3a4bb55b631
Author: David S. Miller <davem@nuts.ninka.net>
Date:   Mon May 19 04:30:48 2003 -0700

    [NET]: Fix netdevice unregister races.
    
    We had two major issues when unregistering networking devices.
    1) Even trying to run hotplug asynchronously could deadlock
       if keventd was currently trying to get the RTNL semaphore
       in order to process linkwatch events.
    2) Unregister needs to wait for the last reference to go away
       before the finalization of the unregister can execute.  This
       cannot occur under the RTNL semaphore as this is deadlock
       prone as well.
    
    The solution is to do all of this stuff after dropping the
    RTNL semaphore.  rtnl_lock, if it is about to protect a region
    of code that could unregister network devices, registers a list
    to which unregistered netdevs are attached.  At rtnl_unlock time
    this list is processed to wait for refcounts to drop to zero and
    then finalize the unregister.




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle
  2009-11-17 15:59         ` Eric Dumazet
@ 2009-11-18 13:05           ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2009-11-18 13:05 UTC (permalink / raw)
  To: eric.dumazet; +Cc: herbert, shemminger, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 17 Nov 2009 16:59:21 +0100

> Herbert Xu a écrit :
>> On Tue, Nov 17, 2009 at 04:26:04AM -0800, David Miller wrote:
>>> Really, the link watch stuff is just due for a redesign.  I don't
>>> think a simple hack is going to cut it this time, sorry Eric :-)
>> 
>> I have no objections against any redesigns, but since the only
>> caller of linkwatch_forget_dev runs in process context with the
>> RTNL, it could also legally emit those events.
> 
> Thanks guys, here an updated version then, before linkwatch surgery ?

Sure, applied to net-next-2.6, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-11-18 13:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-16 18:01 [RFC] dev->refcnt long term holder Eric Dumazet
2009-11-16 19:02 ` Stephen Hemminger
2009-11-16 19:54   ` Eric Dumazet
2009-11-17  8:30     ` David Miller
2009-11-17  8:37       ` Herbert Xu
2009-11-17 17:58       ` Stephen Hemminger
2009-11-17 19:18         ` David Miller
2009-11-16 21:50 ` PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle Eric Dumazet
2009-11-16 22:39   ` Stephen Hemminger
2009-11-17 12:26     ` David Miller
2009-11-17 12:31       ` Herbert Xu
2009-11-17 15:59         ` Eric Dumazet
2009-11-18 13:05           ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).