* Re: [BUG]: problem when shutting down ppp connection since 2.5.70 [not found] <3EFFA1EA.7090502@nortelnetworks.com> @ 2003-07-03 5:17 ` Chris Friesen 2003-07-06 9:43 ` Paul Rolland 0 siblings, 1 reply; 5+ messages in thread From: Chris Friesen @ 2003-07-03 5:17 UTC (permalink / raw) To: paulus; +Cc: linux-kernel, linux-ppp, shemminger, netdev Well, I've upgraded to the latest 2.5.74 kernel and pppd version 2.4.2b3 (still using the rp-pppoe userspace software though). Per Stephen's suggestion I also tried removing the ip address and bringing down the ppp link before shuttind down the adsl connection. Makes no difference. If I start a dsl connection at system init and then as soon as I get a login prompt I shut the connection down, I get the following log: Jul 3 00:59:26 doug adsl-stop: Killing pppd Jul 3 00:59:26 doug pppd[779]: Terminating on signal 15. Jul 3 00:59:26 doug adsl-stop: Killing adsl-connect Jul 3 00:59:26 doug pppd[779]: Connection terminated. Jul 3 00:59:26 doug pppd[779]: Connect time 1.5 minutes. Jul 3 00:59:26 doug pppd[779]: Sent 978 bytes, received 588 bytes. Jul 3 00:59:29 doug pppoe[781]: Session 511 terminated -- received PADT from peer Jul 3 00:59:29 doug pppoe[781]: Sent PADT Jul 3 00:59:36 doug kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1 Jul 3 01:00:16 doug last message repeated 4 times If I start the connection up manually after I'm booted, I get the following: Jul 3 00:03:06 doug adsl-stop: Killing pppd Jul 3 00:03:06 doug pppd[1763]: Terminating on signal 15. Jul 3 00:03:06 doug adsl-stop: Killing adsl-connect Jul 3 00:03:06 doug pppd[1763]: Connection terminated. Jul 3 00:03:06 doug pppd[1763]: Connect time 0.4 minutes. Jul 3 00:03:06 doug pppd[1763]: Sent 64 bytes, received 70 bytes. Jul 3 00:03:06 doug pppoe[1769]: read (asyncReadFromPPP): Session 6990: Input/output error Jul 3 00:03:06 doug pppoe[1769]: Sent PADT Jul 3 00:03:06 doug pppd[1763]: Exit. The main difference I see is that in the success case we don't seem to be receiving a PADT message, but rather we get an error in asyncReadFromPPP. Any ideas where to look? For those on the netdev list who have just tuned in, this started happening with 2.5.70. Chris -- Chris Friesen | MailStop: 043/33/F10 Nortel Networks | work: (613) 765-0557 3500 Carling Avenue | fax: (613) 765-2986 Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG]: problem when shutting down ppp connection since 2.5.70 2003-07-03 5:17 ` [BUG]: problem when shutting down ppp connection since 2.5.70 Chris Friesen @ 2003-07-06 9:43 ` Paul Rolland 2003-07-09 18:43 ` Stephen Hemminger 0 siblings, 1 reply; 5+ messages in thread From: Paul Rolland @ 2003-07-06 9:43 UTC (permalink / raw) To: 'Chris Friesen', paulus Cc: linux-kernel, linux-ppp, shemminger, netdev Hello, > Well, I've upgraded to the latest 2.5.74 kernel and pppd > version 2.4.2b3 > (still using the rp-pppoe userspace software though). > > Per Stephen's suggestion I also tried removing the ip address and > bringing down the ppp link before shuttind down the adsl connection. > > Makes no difference. > To complete on this topic : I've got the problem since 2.5.70, when netdev_wait_allrefs has been introduced in net/core/dev.c I have the same behavior using vtund, configured to create a tap0 interface. At shutdown time, the interface refuses to get freed and I'm stuck. Having vtund started at boot time (within the /etc/rc.d/... stuff) or later doesn't make any difference. Shutting down the interface before stopping the application or halting the machine doesn't make any difference either. The other problem is that the current implementation of netdev_wait_allrefs makes that if you kill an application that is using a device not correctly counted, you lock the console you are working on. e.g., killing vtund will create a printk(... unregister_netdevice...), and the console cannot be used anymore as long as the counter hasn't reached 0 and the device is freed... Paul ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG]: problem when shutting down ppp connection since 2.5.70 2003-07-06 9:43 ` Paul Rolland @ 2003-07-09 18:43 ` Stephen Hemminger 2003-07-14 9:16 ` Paul Rolland 0 siblings, 1 reply; 5+ messages in thread From: Stephen Hemminger @ 2003-07-09 18:43 UTC (permalink / raw) To: Paul Rolland, cfriesen, paulus; +Cc: linux-ppp, netdev The problem is that some protocol is still holding a reference to the device. This is a bug in the protocol, and needs to be fixed (ie not a ppp bug). Try building a kernel with only IPv4, eliminate all others then add back until you find the culprit. The following patch may help also. diff -Nru a/net/core/dev.c b/net/core/dev.c --- a/net/core/dev.c Wed Jul 9 11:40:56 2003 +++ b/net/core/dev.c Wed Jul 9 11:40:56 2003 @@ -72,6 +72,8 @@ * - netif_rx() feedback */ +#define DEBUG 1 + #include <asm/uaccess.h> #include <asm/system.h> #include <asm/bitops.h> @@ -2704,6 +2706,8 @@ goto out; } +extern void dst_dumpref(const struct net_device *dev); + static void netdev_wait_allrefs(struct net_device *dev) { unsigned long rebroadcast_time, warning_time; @@ -2740,6 +2744,30 @@ current->state = TASK_RUNNING; if (time_after(jiffies, warning_time + 10 * HZ)) { +#ifdef DEBUG + dst_dumpref(dev); + + if (dev->atalk_ptr) + printk(KERN_INFO "unregister_netdevice: " + " %s: probably in use as AppleTalk device\n", dev->name); + if (dev->ip_ptr) + printk(KERN_INFO "unregister_netdevice: " + " %s: probably in use as IPv4 device\n", dev->name); + + if (dev->atalk_ptr) + printk(KERN_INFO "unregister_netdevice: " + " %s: probably in use as DECnet device\n", dev->name); + if (dev->ip6_ptr) + printk(KERN_INFO "unregister_netdevice: " + " %s: probably in use as IPv6 device\n", dev->name); + + if (dev->ec_ptr) + printk(KERN_INFO "unregister_netdevice: " + " %s: probably in use as Econet device\n", dev->name); + if (dev->ax25_ptr) + printk(KERN_INFO "unregister_netdevice: " + " %s: probably in use as AX.25 device\n", dev->name); +#endif printk(KERN_EMERG "unregister_netdevice: " "waiting for %s to become free. Usage " "count = %d\n", diff -Nru a/net/core/dst.c b/net/core/dst.c --- a/net/core/dst.c Wed Jul 9 11:40:56 2003 +++ b/net/core/dst.c Wed Jul 9 11:40:56 2003 @@ -41,6 +41,21 @@ static struct timer_list dst_gc_timer = TIMER_INITIALIZER(dst_run_gc, 0, DST_GC_MIN); + +void dst_dumpref(const struct net_device *dev) +{ + struct dst_entry *dst; + int count = 0; + + spin_lock_bh(&dst_lock); + for (dst = dst_garbage_list; dst; dst = dst->next) { + if (dst->dev == dev) ++count; + } + spin_unlock_bh(&dst_lock); + + printk(KERN_INFO "dst route cache has %d references\n", count); +} + static void dst_run_gc(unsigned long dummy) { int delayed = 0; ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG]: problem when shutting down ppp connection since 2.5.70 2003-07-09 18:43 ` Stephen Hemminger @ 2003-07-14 9:16 ` Paul Rolland 0 siblings, 0 replies; 5+ messages in thread From: Paul Rolland @ 2003-07-14 9:16 UTC (permalink / raw) To: 'Stephen Hemminger', cfriesen, paulus; +Cc: linux-ppp, netdev Hello, I've applied the patch from Stephen on top of the 2.6.0-test1 kernel, and here is the result : Jul 14 10:57:01 donald kernel: dst route cache has 0 references Jul 14 10:57:01 donald kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = -3 Jul 14 10:57:10 donald kernel: dst route cache has 0 references Jul 14 10:57:10 donald kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = -3 Jul 14 10:57:20 donald kernel: dst route cache has 0 references Jul 14 10:57:20 donald kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = -3 Jul 14 10:57:31 donald kernel: dst route cache has 0 references Jul 14 10:57:31 donald kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = -3 This is a big change compared to 2.5.74 (.75 not tested). I guess a negative usage count should be considered as valid by unregister_netdevice. What about : --- dev.c 2003-07-14 11:13:23.000000000 +0200 +++ dev.c.orig 2003-07-14 11:13:01.000000000 +0200 @@ -2746,7 +2746,7 @@ unsigned long rebroadcast_time, warning_time; rebroadcast_time = warning_time = jiffies; - while (atomic_read(&dev->refcnt) > 0) { + while (atomic_read(&dev->refcnt) != 0) { if (time_after(jiffies, rebroadcast_time + 1 * HZ)) { rtnl_shlock(); rtnl_exlock(); Regards, Paul ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ REHAB is for quitters. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Paul Rolland, rol@witbe.net Witbe.net SA Directeur Associe -- Please no HTML, I'm not a browser - Pas d'HTML, je ne suis pas un navigateur "Some people dream of success... while others wake up and work hard at it" > -----Original Message----- > From: Stephen Hemminger [mailto:shemminger@osdl.org] > Sent: Wednesday, July 09, 2003 8:44 PM > To: Paul Rolland; cfriesen@nortelnetworks.com; paulus@samba.org > Cc: linux-ppp@vger.kernel.org; netdev@oss.sgi.com > Subject: Re: [BUG]: problem when shutting down ppp connection > since 2.5.70 > > > The problem is that some protocol is still holding a reference to the > device. This is a bug in the protocol, and needs to be fixed (ie not a > ppp bug). > > Try building a kernel with only IPv4, eliminate all others then add > back until you find the culprit. > > The following patch may help also. > > diff -Nru a/net/core/dev.c b/net/core/dev.c > --- a/net/core/dev.c Wed Jul 9 11:40:56 2003 > +++ b/net/core/dev.c Wed Jul 9 11:40:56 2003 > @@ -72,6 +72,8 @@ > * - netif_rx() feedback > */ > > +#define DEBUG 1 > + > #include <asm/uaccess.h> > #include <asm/system.h> > #include <asm/bitops.h> > @@ -2704,6 +2706,8 @@ > goto out; > } > > +extern void dst_dumpref(const struct net_device *dev); > + > static void netdev_wait_allrefs(struct net_device *dev) > { > unsigned long rebroadcast_time, warning_time; > @@ -2740,6 +2744,30 @@ > current->state = TASK_RUNNING; > > if (time_after(jiffies, warning_time + 10 * HZ)) { > +#ifdef DEBUG > + dst_dumpref(dev); > + > + if (dev->atalk_ptr) > + printk(KERN_INFO > "unregister_netdevice: " > + " %s: probably in use as > AppleTalk device\n", dev->name); > + if (dev->ip_ptr) > + printk(KERN_INFO > "unregister_netdevice: " > + " %s: probably in use as > IPv4 device\n", dev->name); > + > + if (dev->atalk_ptr) > + printk(KERN_INFO > "unregister_netdevice: " > + " %s: probably in use as > DECnet device\n", dev->name); > + if (dev->ip6_ptr) > + printk(KERN_INFO > "unregister_netdevice: " > + " %s: probably in use as > IPv6 device\n", dev->name); > + > + if (dev->ec_ptr) > + printk(KERN_INFO > "unregister_netdevice: " > + " %s: probably in use as > Econet device\n", dev->name); > + if (dev->ax25_ptr) > + printk(KERN_INFO > "unregister_netdevice: " > + " %s: probably in use as > AX.25 device\n", dev->name); #endif > printk(KERN_EMERG "unregister_netdevice: " > "waiting for %s to become free. Usage " > "count = %d\n", > diff -Nru a/net/core/dst.c b/net/core/dst.c > --- a/net/core/dst.c Wed Jul 9 11:40:56 2003 > +++ b/net/core/dst.c Wed Jul 9 11:40:56 2003 > @@ -41,6 +41,21 @@ > static struct timer_list dst_gc_timer = > TIMER_INITIALIZER(dst_run_gc, 0, DST_GC_MIN); > > + > +void dst_dumpref(const struct net_device *dev) > +{ > + struct dst_entry *dst; > + int count = 0; > + > + spin_lock_bh(&dst_lock); > + for (dst = dst_garbage_list; dst; dst = dst->next) { > + if (dst->dev == dev) ++count; > + } > + spin_unlock_bh(&dst_lock); > + > + printk(KERN_INFO "dst route cache has %d references\n", > count); } > + > static void dst_run_gc(unsigned long dummy) > { > int delayed = 0; > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG]: problem when shutting down ppp connection since 2.5.70
@ 2003-07-14 11:43 Paul Rolland
0 siblings, 0 replies; 5+ messages in thread
From: Paul Rolland @ 2003-07-14 11:43 UTC (permalink / raw)
To: 'Paul Rolland', 'Stephen Hemminger', cfriesen,
paulus
Cc: linux-ppp, netdev
Hello,
I'm sorry, the patch was not complete, it should have considered the
BUG_ON too...
Here is one that is fine on my system :
--- dev.c.orig 2003-07-14 13:41:33.000000000 +0200
+++ dev.c 2003-07-14 13:34:27.000000000 +0200
@@ -2742,7 +2742,7 @@
unsigned long rebroadcast_time, warning_time;
rebroadcast_time = warning_time = jiffies;
- while (atomic_read(&dev->refcnt) != 0) {
+ while (atomic_read(&dev->refcnt) > 0) {
if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
rtnl_shlock();
rtnl_exlock();
@@ -2836,7 +2836,7 @@
dev->reg_state = NETREG_UNREGISTERED;
netdev_wait_allrefs(dev);
- BUG_ON(atomic_read(&dev->refcnt));
+ BUG_ON(atomic_read(&dev->refcnt) > 0);
netdev_finish_unregister(dev);
break;
Still don't understand why refcnt is really bad (negative value),
but at least the machine is working...
Paul
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I have a vitally important role serving as a bad example.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^ permalink raw reply [flat|nested] 5+ messages in threadend of thread, other threads:[~2003-07-14 11:43 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <3EFFA1EA.7090502@nortelnetworks.com>
2003-07-03 5:17 ` [BUG]: problem when shutting down ppp connection since 2.5.70 Chris Friesen
2003-07-06 9:43 ` Paul Rolland
2003-07-09 18:43 ` Stephen Hemminger
2003-07-14 9:16 ` Paul Rolland
2003-07-14 11:43 Paul Rolland
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).