netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* linkwatch bustage in git-net
@ 2007-05-09  5:20 Andrew Morton
  2007-05-09  5:31 ` Herbert Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2007-05-09  5:20 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller; +Cc: netdev


My Vaio is taking 5-10x longer to boot when git-net as of five minutes ago
is applied.

It's not very clear what's gone wrong.  A few linkwatch things are showing
up in profiles, but it's not burning excessive CPU afaict.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09  5:20 linkwatch bustage in git-net Andrew Morton
@ 2007-05-09  5:31 ` Herbert Xu
  2007-05-09  5:45   ` Herbert Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Herbert Xu @ 2007-05-09  5:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David S. Miller, netdev

On Tue, May 08, 2007 at 10:20:33PM -0700, Andrew Morton wrote:
> 
> My Vaio is taking 5-10x longer to boot when git-net as of five minutes ago
> is applied.
> 
> It's not very clear what's gone wrong.  A few linkwatch things are showing
> up in profiles, but it's not burning excessive CPU afaict.

Hmm, I don't see it here (probably because we use different NICs).
But does this help?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index b5f4579..4674ae5 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -101,8 +101,10 @@ static void linkwatch_schedule_work(unsigned long delay)
 		return;
 
 	/* If we wrap around we'll delay it by at most HZ. */
-	if (delay > HZ)
+	if (delay > HZ) {
+		linkwatch_nextevent = jiffies;
 		delay = 0;
+	}
 
 	schedule_delayed_work(&linkwatch_work, delay);
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09  5:31 ` Herbert Xu
@ 2007-05-09  5:45   ` Herbert Xu
       [not found]     ` <20070508225153.d538f79c.akpm@linux-foundation.org>
  2007-05-09  6:11     ` Andrew Morton
  0 siblings, 2 replies; 12+ messages in thread
From: Herbert Xu @ 2007-05-09  5:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David S. Miller, netdev

On Wed, May 09, 2007 at 03:31:55PM +1000, Herbert Xu wrote:
> 
> Hmm, I don't see it here (probably because we use different NICs).
> But does this help?

Thinking about it again I don't think it will help you because if your
carrier started out as off then it would've been considered an urgent
event anyway.

So what NIC are you using? And where abouts in the boot process is it
hanging? For exmaple, is it hanging when obtaining a DHCP address?

In any case, this patch can't hurt.  So here's one with a changelog:

[NET] link_watch: Eliminate potential delay on wrap-around

When the jiffies wrap around or when the system boots up for the first
time, down events can be delayed indefinitely since we no longer update
linkwatch_nextevent when only urgent events are processed.

This patch fixes this by setting linkwatch_nextevent when a wrap-around
occurs.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index b5f4579..4674ae5 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -101,8 +101,10 @@ static void linkwatch_schedule_work(unsigned long delay)
 		return;
 
 	/* If we wrap around we'll delay it by at most HZ. */
-	if (delay > HZ)
+	if (delay > HZ) {
+		linkwatch_nextevent = jiffies;
 		delay = 0;
+	}
 
 	schedule_delayed_work(&linkwatch_work, delay);
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
       [not found]     ` <20070508225153.d538f79c.akpm@linux-foundation.org>
@ 2007-05-09  5:54       ` Herbert Xu
  2007-05-09  5:58         ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Herbert Xu @ 2007-05-09  5:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David S. Miller, netdev

On Tue, May 08, 2007 at 10:51:53PM -0700, Andrew Morton wrote:
> 
> It's not hanging.  It's just going reeeeeeeaaaaallllllyyyy sslllooowwllyyy.
> 
> This is first noticeable during udev startup and persists all the way
> through initscripts.

OK, does reverting the link watch patches make this go away?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09  5:54       ` Herbert Xu
@ 2007-05-09  5:58         ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2007-05-09  5:58 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Wed, 9 May 2007 15:54:05 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Tue, May 08, 2007 at 10:51:53PM -0700, Andrew Morton wrote:
> > 
> > It's not hanging.  It's just going reeeeeeeaaaaallllllyyyy sslllooowwllyyy.
> > 
> > This is first noticeable during udev startup and persists all the way
> > through initscripts.
> 
> OK, does reverting the link watch patches make this go away?
> 

That's the only thing which is in git-net.  Here it is, in toto:


GIT 4833bb610f5d940d623c99ca5ed83e1faf9a7881 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.git

commit 
Author: Geert Uytterhoeven <geert@linux-m68k.org>
Date:   Tue May 8 18:40:27 2007 -0700

    [MAC80211]: include <linux/delay.h> instead of <asm/delay.h>
    
    |   CC      net/mac80211/ieee80211_sta.o
    | In file included from linux/net/mac80211/ieee80211_sta.c:31:
    | include2/asm/delay.h: In function '__const_udelay':
    | include2/asm/delay.h:33: error: 'loops_per_jiffy' undeclared (first use in this function)
    | include2/asm/delay.h:33: error: (Each undeclared identifier is reported only once
    | include2/asm/delay.h:33: error: for each function it appears in.)
    
    Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 804abf08e316754dc50f10f8dd6957b2e47c879d
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue May 8 18:36:28 2007 -0700

    [NET]: Remove link_watch delay for up even when we're down
    
    Currently all link carrier events are delayed by up to a second
    before they're processed to prevent link storms.  This causes
    unnecessary packet loss during that interval.
    
    In fact, we can achieve the same effect in preventing storms by
    only delaying down events and unnecssary up events.  The latter
    is defined as up events when we're already up.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 119a2edd882f7be9e90defcd57debf9d1658bbd5
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue May 8 18:34:17 2007 -0700

    [NET] link_watch: Move link watch list into net_device
    
    These days the link watch mechanism is an integral part of the
    network subsystem as it manages the carrier status.  So it now
    makes sense to allocate some memory for it in net_device rather
    than allocating it on demand.
    
    In fact, this is necessary because we can't tolerate a memory
    allocation failure since that means we'd have to potentially
    throw a link up event away.
    
    It also simplifies the code greatly.
    
    In doing so I discovered a subtle race condition in the use
    of singleevent.  This race condition still exists (and is
    somewhat magnified) without singleevent but it's now plugged
    thanks to an smp_mb__before_clear_bit.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
 include/linux/netdevice.h    |    2 +
 net/core/link_watch.c        |  134 +++++++++++++++++++++++++-----------------
 net/mac80211/ieee80211_sta.c |    2 -
 3 files changed, 82 insertions(+), 56 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3044622..f671cd2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -467,6 +467,8 @@ #endif
 	/* device index hash chain */
 	struct hlist_node	index_hlist;
 
+	struct net_device	*link_watch_next;
+
 	/* register/unregister state machine */
 	enum { NETREG_UNINITIALIZED=0,
 	       NETREG_REGISTERED,	/* completed register_netdevice */
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index e3c26a9..b5f4579 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -19,7 +19,6 @@ #include <net/pkt_sched.h>
 #include <linux/rtnetlink.h>
 #include <linux/jiffies.h>
 #include <linux/spinlock.h>
-#include <linux/list.h>
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 #include <linux/bitops.h>
@@ -28,7 +27,6 @@ #include <asm/types.h>
 
 enum lw_bits {
 	LW_RUNNING = 0,
-	LW_SE_USED
 };
 
 static unsigned long linkwatch_flags;
@@ -37,17 +35,9 @@ static unsigned long linkwatch_nextevent
 static void linkwatch_event(struct work_struct *dummy);
 static DECLARE_DELAYED_WORK(linkwatch_work, linkwatch_event);
 
-static LIST_HEAD(lweventlist);
+static struct net_device *lweventlist;
 static DEFINE_SPINLOCK(lweventlist_lock);
 
-struct lw_event {
-	struct list_head list;
-	struct net_device *dev;
-};
-
-/* Avoid kmalloc() for most systems */
-static struct lw_event singleevent;
-
 static unsigned char default_operstate(const struct net_device *dev)
 {
 	if (!netif_carrier_ok(dev))
@@ -87,25 +77,73 @@ static void rfc2863_policy(struct net_de
 }
 
 
-/* Must be called with the rtnl semaphore held */
-void linkwatch_run_queue(void)
+static int linkwatch_urgent_event(struct net_device *dev)
+{
+	return netif_running(dev) && netif_carrier_ok(dev) &&
+	       dev->qdisc != dev->qdisc_sleeping;
+}
+
+
+static void linkwatch_add_event(struct net_device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&lweventlist_lock, flags);
+	dev->link_watch_next = lweventlist;
+	lweventlist = dev;
+	spin_unlock_irqrestore(&lweventlist_lock, flags);
+}
+
+
+static void linkwatch_schedule_work(unsigned long delay)
 {
-	struct list_head head, *n, *next;
+	if (test_and_set_bit(LW_RUNNING, &linkwatch_flags))
+		return;
+
+	/* If we wrap around we'll delay it by at most HZ. */
+	if (delay > HZ)
+		delay = 0;
+
+	schedule_delayed_work(&linkwatch_work, delay);
+}
+
+
+static void __linkwatch_run_queue(int urgent_only)
+{
+	struct net_device *next;
+
+	/*
+	 * Limit the number of linkwatch events to one
+	 * per second so that a runaway driver does not
+	 * cause a storm of messages on the netlink
+	 * socket.  This limit does not apply to up events
+	 * while the device qdisc is down.
+	 */
+	if (!urgent_only)
+		linkwatch_nextevent = jiffies + HZ;
+	clear_bit(LW_RUNNING, &linkwatch_flags);
 
 	spin_lock_irq(&lweventlist_lock);
-	list_replace_init(&lweventlist, &head);
+	next = lweventlist;
+	lweventlist = NULL;
 	spin_unlock_irq(&lweventlist_lock);
 
-	list_for_each_safe(n, next, &head) {
-		struct lw_event *event = list_entry(n, struct lw_event, list);
-		struct net_device *dev = event->dev;
+	while (next) {
+		struct net_device *dev = next;
+
+		next = dev->link_watch_next;
 
-		if (event == &singleevent) {
-			clear_bit(LW_SE_USED, &linkwatch_flags);
-		} else {
-			kfree(event);
+		if (urgent_only && !linkwatch_urgent_event(dev)) {
+			linkwatch_add_event(dev);
+			continue;
 		}
 
+		/*
+		 * Make sure the above read is complete since it can be
+		 * rewritten as soon as we clear the bit below.
+		 */
+		smp_mb__before_clear_bit();
+
 		/* We are about to handle this device,
 		 * so new events can be accepted
 		 */
@@ -124,21 +162,23 @@ void linkwatch_run_queue(void)
 
 		dev_put(dev);
 	}
+
+	if (lweventlist)
+		linkwatch_schedule_work(linkwatch_nextevent - jiffies);
 }
 
 
-static void linkwatch_event(struct work_struct *dummy)
+/* Must be called with the rtnl semaphore held */
+void linkwatch_run_queue(void)
 {
-	/* Limit the number of linkwatch events to one
-	 * per second so that a runaway driver does not
-	 * cause a storm of messages on the netlink
-	 * socket
-	 */
-	linkwatch_nextevent = jiffies + HZ;
-	clear_bit(LW_RUNNING, &linkwatch_flags);
+	__linkwatch_run_queue(0);
+}
+
 
+static void linkwatch_event(struct work_struct *dummy)
+{
 	rtnl_lock();
-	linkwatch_run_queue();
+	__linkwatch_run_queue(time_after(linkwatch_nextevent, jiffies));
 	rtnl_unlock();
 }
 
@@ -146,35 +186,19 @@ static void linkwatch_event(struct work_
 void linkwatch_fire_event(struct net_device *dev)
 {
 	if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
-		unsigned long flags;
-		struct lw_event *event;
-
-		if (test_and_set_bit(LW_SE_USED, &linkwatch_flags)) {
-			event = kmalloc(sizeof(struct lw_event), GFP_ATOMIC);
-
-			if (unlikely(event == NULL)) {
-				clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
-				return;
-			}
-		} else {
-			event = &singleevent;
-		}
+		unsigned long delay;
 
 		dev_hold(dev);
-		event->dev = dev;
 
-		spin_lock_irqsave(&lweventlist_lock, flags);
-		list_add_tail(&event->list, &lweventlist);
-		spin_unlock_irqrestore(&lweventlist_lock, flags);
+		linkwatch_add_event(dev);
 
-		if (!test_and_set_bit(LW_RUNNING, &linkwatch_flags)) {
-			unsigned long delay = linkwatch_nextevent - jiffies;
+		delay = linkwatch_nextevent - jiffies;
 
-			/* If we wrap around we'll delay it by at most HZ. */
-			if (delay > HZ)
-				delay = 0;
-			schedule_delayed_work(&linkwatch_work, delay);
-		}
+		/* Minimise down-time: drop delay for up event. */
+		if (linkwatch_urgent_event(dev))
+			delay = 0;
+
+		linkwatch_schedule_work(delay);
 	}
 }
 
diff --git a/net/mac80211/ieee80211_sta.c b/net/mac80211/ieee80211_sta.c
index 822917d..3e07e9d 100644
--- a/net/mac80211/ieee80211_sta.c
+++ b/net/mac80211/ieee80211_sta.c
@@ -17,6 +17,7 @@
  * scan result table filtering (by capability (privacy, IBSS/BSS, WPA/RSN IE,
  *    SSID)
  */
+#include <linux/delay.h>
 #include <linux/if_ether.h>
 #include <linux/skbuff.h>
 #include <linux/netdevice.h>
@@ -27,7 +28,6 @@ #include <linux/etherdevice.h>
 #include <linux/rtnetlink.h>
 #include <net/iw_handler.h>
 #include <asm/types.h>
-#include <asm/delay.h>
 
 #include <net/mac80211.h>
 #include "ieee80211_i.h"


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09  5:45   ` Herbert Xu
       [not found]     ` <20070508225153.d538f79c.akpm@linux-foundation.org>
@ 2007-05-09  6:11     ` Andrew Morton
  2007-05-09  6:22       ` David Miller
  2007-05-09  7:16       ` Herbert Xu
  1 sibling, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2007-05-09  6:11 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Wed, 9 May 2007 15:45:58 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Wed, May 09, 2007 at 03:31:55PM +1000, Herbert Xu wrote:
> > 
> > Hmm, I don't see it here (probably because we use different NICs).
> > But does this help?
> 
> Thinking about it again I don't think it will help you because if your
> carrier started out as off then it would've been considered an urgent
> event anyway.
> 
> So what NIC are you using? And where abouts in the boot process is it
> hanging? For exmaple, is it hanging when obtaining a DHCP address?
> 
> In any case, this patch can't hurt.  So here's one with a changelog:
> 
> [NET] link_watch: Eliminate potential delay on wrap-around

hm, that fixed it.  Do we know why? ;)



btw, looking at the code:

	clear_bit(LW_RUNNING, &linkwatch_flags);

	spin_lock_irq(&lweventlist_lock);
	next = lweventlist;
	lweventlist = NULL;
	spin_unlock_irq(&lweventlist_lock);

	while (next) {
		struct net_device *dev = next;

		next = dev->link_watch_next;

lweventlist_lock protects lweventlist and every netdev's ->link_watch_next.
 But this code is walking that singly-linked list outside the lock and
after clearing LW_RUNNING.  What stops this singly-linked list from getting
altered by another thread of control while this code is traversing it?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09  6:11     ` Andrew Morton
@ 2007-05-09  6:22       ` David Miller
  2007-05-09  7:16       ` Herbert Xu
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2007-05-09  6:22 UTC (permalink / raw)
  To: akpm; +Cc: herbert, netdev

From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 8 May 2007 23:11:43 -0700

> On Wed, 9 May 2007 15:45:58 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > On Wed, May 09, 2007 at 03:31:55PM +1000, Herbert Xu wrote:
> > > 
> > > Hmm, I don't see it here (probably because we use different NICs).
> > > But does this help?
> > 
> > Thinking about it again I don't think it will help you because if your
> > carrier started out as off then it would've been considered an urgent
> > event anyway.
> > 
> > So what NIC are you using? And where abouts in the boot process is it
> > hanging? For exmaple, is it hanging when obtaining a DHCP address?
> > 
> > In any case, this patch can't hurt.  So here's one with a changelog:
> > 
> > [NET] link_watch: Eliminate potential delay on wrap-around
> 
> hm, that fixed it.  Do we know why? ;)

I wonder if, because of how we initialize jiffies on bootup to find
wraparound bugs, urgent_event is always true.

If it is always true, linkwatch_nextevent will never get updated.
Herbert's patch indirectly corrects that.

In the mean time I'll stick Herbert's patch into net-2.6

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09  6:11     ` Andrew Morton
  2007-05-09  6:22       ` David Miller
@ 2007-05-09  7:16       ` Herbert Xu
  2007-05-09  7:18         ` David Miller
  1 sibling, 1 reply; 12+ messages in thread
From: Herbert Xu @ 2007-05-09  7:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David S. Miller, netdev

On Tue, May 08, 2007 at 11:11:43PM -0700, Andrew Morton wrote:
>
> > [NET] link_watch: Eliminate potential delay on wrap-around
> 
> hm, that fixed it.  Do we know why? ;)

Yep :)

> btw, looking at the code:
> 
> 	clear_bit(LW_RUNNING, &linkwatch_flags);
> 
> 	spin_lock_irq(&lweventlist_lock);
> 	next = lweventlist;
> 	lweventlist = NULL;
> 	spin_unlock_irq(&lweventlist_lock);
> 
> 	while (next) {
> 		struct net_device *dev = next;
> 
> 		next = dev->link_watch_next;
> 
> lweventlist_lock protects lweventlist and every netdev's ->link_watch_next.
>  But this code is walking that singly-linked list outside the lock and
> after clearing LW_RUNNING.  What stops this singly-linked list from getting
> altered by another thread of control while this code is traversing it?

It's protected by the __LINK_STATE_LINKWATCH_PENDING bit.

The real problem here is a combination of factors.  First of all e100
does a netif_carrier_off in its open routine *before* the first call
to dev_activate.  This when combined with the wrap-around bug causes
an infinitely delayed event for your NIC to be installed.

When the carrier comes up it notices that we already have an event
queued for that NIC so it just drops it even though it's urgent.

The previous bug fix limits this delay to one second but it is still
sub-optimal.

So here is the additional fix.

[NET] link_watch: Always schedule urgent events

Urgent events may be delayed if we already have a non-urgent event
queued for that device.  This patch changes this by making sure that
an urgent event is always looked at immediately.

I've replaced the LW_RUNNING flag by LW_URGENT since whether work
is scheduled is already kept track by the work queue system.

The only complication is that we have to provide some exclusion for
the setting linkwatch_nextevent which is available in the actual
work function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 4674ae5..a5e372b 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -26,7 +26,7 @@
 
 
 enum lw_bits {
-	LW_RUNNING = 0,
+	LW_URGENT = 0,
 };
 
 static unsigned long linkwatch_flags;
@@ -95,18 +95,41 @@ static void linkwatch_add_event(struct net_device *dev)
 }
 
 
-static void linkwatch_schedule_work(unsigned long delay)
+static void linkwatch_schedule_work(int urgent)
 {
-	if (test_and_set_bit(LW_RUNNING, &linkwatch_flags))
+	unsigned long delay = linkwatch_nextevent - jiffies;
+
+	if (test_bit(LW_URGENT, &linkwatch_flags))
 		return;
 
-	/* If we wrap around we'll delay it by at most HZ. */
-	if (delay > HZ) {
-		linkwatch_nextevent = jiffies;
+	/* Minimise down-time: drop delay for up event. */
+	if (urgent) {
+		if (test_and_set_bit(LW_URGENT, &linkwatch_flags))
+			return;
 		delay = 0;
 	}
 
-	schedule_delayed_work(&linkwatch_work, delay);
+	/* If we wrap around we'll delay it by at most HZ. */
+	if (delay > HZ)
+		delay = 0;
+
+	/*
+	 * This is true if we've scheduled it immeditately or if we don't
+	 * need an immediate execution and it's already pending.
+	 */
+	if (schedule_delayed_work(&linkwatch_work, delay) == !delay)
+		return;
+
+	/* Don't bother if there is nothing urgent. */
+	if (!test_bit(LW_URGENT, &linkwatch_flags))
+		return;
+
+	/* It's already running which is good enough. */
+	if (!cancel_delayed_work(&linkwatch_work))
+		return;
+
+	/* Otherwise we reschedule it again for immediate exection. */
+	schedule_delayed_work(&linkwatch_work, 0);
 }
 
 
@@ -123,7 +146,11 @@ static void __linkwatch_run_queue(int urgent_only)
 	 */
 	if (!urgent_only)
 		linkwatch_nextevent = jiffies + HZ;
-	clear_bit(LW_RUNNING, &linkwatch_flags);
+	/* Limit wrap-around effect on delay. */
+	else if (time_after(linkwatch_nextevent, jiffies + HZ))
+		linkwatch_nextevent = jiffies;
+
+	clear_bit(LW_URGENT, &linkwatch_flags);
 
 	spin_lock_irq(&lweventlist_lock);
 	next = lweventlist;
@@ -166,7 +193,7 @@ static void __linkwatch_run_queue(int urgent_only)
 	}
 
 	if (lweventlist)
-		linkwatch_schedule_work(linkwatch_nextevent - jiffies);
+		linkwatch_schedule_work(0);
 }
 
 
@@ -187,21 +214,16 @@ static void linkwatch_event(struct work_struct *dummy)
 
 void linkwatch_fire_event(struct net_device *dev)
 {
-	if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
-		unsigned long delay;
+	int urgent = linkwatch_urgent_event(dev);
 
+	if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
 		dev_hold(dev);
 
 		linkwatch_add_event(dev);
+	} else if (!urgent)
+		return;
 
-		delay = linkwatch_nextevent - jiffies;
-
-		/* Minimise down-time: drop delay for up event. */
-		if (linkwatch_urgent_event(dev))
-			delay = 0;
-
-		linkwatch_schedule_work(delay);
-	}
+	linkwatch_schedule_work(urgent);
 }
 
 EXPORT_SYMBOL(linkwatch_fire_event);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09  7:16       ` Herbert Xu
@ 2007-05-09  7:18         ` David Miller
  2007-05-09 21:11           ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-05-09  7:18 UTC (permalink / raw)
  To: herbert; +Cc: akpm, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 9 May 2007 17:16:15 +1000

> The real problem here is a combination of factors.  First of all e100
> does a netif_carrier_off in its open routine *before* the first call
> to dev_activate.  This when combined with the wrap-around bug causes
> an infinitely delayed event for your NIC to be installed.

I noticed that behavior of e100 too.

> [NET] link_watch: Always schedule urgent events
> 
> Urgent events may be delayed if we already have a non-urgent event
> queued for that device.  This patch changes this by making sure that
> an urgent event is always looked at immediately.
> 
> I've replaced the LW_RUNNING flag by LW_URGENT since whether work
> is scheduled is already kept track by the work queue system.
> 
> The only complication is that we have to provide some exclusion for
> the setting linkwatch_nextevent which is available in the actual
> work function.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks for working this out, applied and pushed to net-2.6.git

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09  7:18         ` David Miller
@ 2007-05-09 21:11           ` Andrew Morton
  2007-05-09 21:39             ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2007-05-09 21:11 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev

On Wed, 09 May 2007 00:18:14 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Thanks for working this out, applied and pushed to net-2.6.git

I'm all confused.  I just pulled your tree and I see in the log


	commit 4cba637dbb9a13706494a1c85174c8e736914010
	Author: Herbert Xu <herbert@gondor.apana.org.au>
	Date:   Wed May 9 00:17:30 2007 -0700

	    [NET] link_watch: Always schedule urgent events
    

and


	commit fe47cdba83b3041e4ac1aa1418431020a4afe1e0
	Author: Herbert Xu <herbert@gondor.apana.org.au>
	Date:   Tue May 8 23:22:43 2007 -0700

	    [NET] link_watch: Eliminate potential delay on wrap-around


but in the actual diff and in the checked-out tree I see no sign of the
earlier patch (fe47cdba83b3041e4ac1aa1418431020a4afe1e0).  I have the old
code:

        /* If we wrap around we'll delay it by at most HZ. */
        if (delay > HZ)
                delay = 0;


where'd it go?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09 21:11           ` Andrew Morton
@ 2007-05-09 21:39             ` David Miller
  2007-05-09 21:45               ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-05-09 21:39 UTC (permalink / raw)
  To: akpm; +Cc: herbert, netdev

From: Andrew Morton <akpm@linux-foundation.org>
Date: Wed, 9 May 2007 14:11:03 -0700

> On Wed, 09 May 2007 00:18:14 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> > 
> > Thanks for working this out, applied and pushed to net-2.6.git
> 
> I'm all confused.  I just pulled your tree and I see in the log
> 
> 
> 	commit 4cba637dbb9a13706494a1c85174c8e736914010
> 	Author: Herbert Xu <herbert@gondor.apana.org.au>
> 	Date:   Wed May 9 00:17:30 2007 -0700
> 
> 	    [NET] link_watch: Always schedule urgent events
>     
> 
> and
> 
> 
> 	commit fe47cdba83b3041e4ac1aa1418431020a4afe1e0
> 	Author: Herbert Xu <herbert@gondor.apana.org.au>
> 	Date:   Tue May 8 23:22:43 2007 -0700
> 
> 	    [NET] link_watch: Eliminate potential delay on wrap-around
> 
> 
> but in the actual diff and in the checked-out tree I see no sign of the
> earlier patch (fe47cdba83b3041e4ac1aa1418431020a4afe1e0).  I have the old
> code:
> 
>         /* If we wrap around we'll delay it by at most HZ. */
>         if (delay > HZ)
>                 delay = 0;
> 
> 
> where'd it go?

Look at the two individual changes, Herbert implemented the
delay calculations different in the second changeset.  So
the code is much different there now.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: linkwatch bustage in git-net
  2007-05-09 21:39             ` David Miller
@ 2007-05-09 21:45               ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2007-05-09 21:45 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev

On Wed, 09 May 2007 14:39:25 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> > where'd it go?
> 
> Look at the two individual changes, Herbert implemented the
> delay calculations different in the second changeset.  So
> the code is much different there now.

ah, OK, I just haven't learned to read diffs yet ;)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-05-09 21:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-09  5:20 linkwatch bustage in git-net Andrew Morton
2007-05-09  5:31 ` Herbert Xu
2007-05-09  5:45   ` Herbert Xu
     [not found]     ` <20070508225153.d538f79c.akpm@linux-foundation.org>
2007-05-09  5:54       ` Herbert Xu
2007-05-09  5:58         ` Andrew Morton
2007-05-09  6:11     ` Andrew Morton
2007-05-09  6:22       ` David Miller
2007-05-09  7:16       ` Herbert Xu
2007-05-09  7:18         ` David Miller
2007-05-09 21:11           ` Andrew Morton
2007-05-09 21:39             ` David Miller
2007-05-09 21:45               ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).