netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [BUG]:   problem when shutting down ppp connection since 2.5.70
       [not found] <3EFFA1EA.7090502@nortelnetworks.com>
@ 2003-07-03  5:17 ` Chris Friesen
  2003-07-06  9:43   ` Paul Rolland
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Friesen @ 2003-07-03  5:17 UTC (permalink / raw)
  To: paulus; +Cc: linux-kernel, linux-ppp, shemminger, netdev


Well, I've upgraded to the latest 2.5.74 kernel and pppd version 2.4.2b3 
(still using the rp-pppoe userspace software though).

Per Stephen's suggestion I also tried removing the ip address and 
bringing down the ppp link before shuttind down the adsl connection.

Makes no difference.

If I start a dsl connection at system init and then as soon as I get a 
login prompt I shut the connection down, I get the following log:

Jul  3 00:59:26 doug adsl-stop: Killing pppd
Jul  3 00:59:26 doug pppd[779]: Terminating on signal 15.
Jul  3 00:59:26 doug adsl-stop: Killing adsl-connect
Jul  3 00:59:26 doug pppd[779]: Connection terminated.
Jul  3 00:59:26 doug pppd[779]: Connect time 1.5 minutes.
Jul  3 00:59:26 doug pppd[779]: Sent 978 bytes, received 588 bytes.
Jul  3 00:59:29 doug pppoe[781]: Session 511 terminated -- received PADT 
from peer
Jul  3 00:59:29 doug pppoe[781]: Sent PADT
Jul  3 00:59:36 doug kernel: unregister_netdevice: waiting for ppp0 to 
become free. Usage count = 1
Jul  3 01:00:16 doug last message repeated 4 times


If I start the connection up manually after I'm booted, I get the following:

Jul  3 00:03:06 doug adsl-stop: Killing pppd
Jul  3 00:03:06 doug pppd[1763]: Terminating on signal 15.
Jul  3 00:03:06 doug adsl-stop: Killing adsl-connect
Jul  3 00:03:06 doug pppd[1763]: Connection terminated.
Jul  3 00:03:06 doug pppd[1763]: Connect time 0.4 minutes.
Jul  3 00:03:06 doug pppd[1763]: Sent 64 bytes, received 70 bytes.
Jul  3 00:03:06 doug pppoe[1769]: read (asyncReadFromPPP): Session 6990: 
Input/output error
Jul  3 00:03:06 doug pppoe[1769]: Sent PADT
Jul  3 00:03:06 doug pppd[1763]: Exit.


The main difference I see is that in the success case we don't seem to 
be receiving a PADT message, but rather we get an error in asyncReadFromPPP.

Any ideas where to look?  For those on the netdev list who have just 
tuned in, this started happening with 2.5.70.

Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG]:   problem when shutting down ppp connection since 2.5.70
  2003-07-03  5:17 ` Chris Friesen
@ 2003-07-06  9:43   ` Paul Rolland
  2003-07-09 18:43     ` Stephen Hemminger
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Rolland @ 2003-07-06  9:43 UTC (permalink / raw)
  To: 'Chris Friesen', paulus
  Cc: linux-kernel, linux-ppp, shemminger, netdev

Hello,
 
> Well, I've upgraded to the latest 2.5.74 kernel and pppd 
> version 2.4.2b3 
> (still using the rp-pppoe userspace software though).
> 
> Per Stephen's suggestion I also tried removing the ip address and 
> bringing down the ppp link before shuttind down the adsl connection.
> 
> Makes no difference.
> 
To complete on this topic : I've got the problem since 2.5.70, when
netdev_wait_allrefs has been introduced in net/core/dev.c

I have the same behavior using vtund, configured to create a tap0
interface.
At shutdown time, the interface refuses to get freed and I'm stuck.

Having vtund started at boot time (within the /etc/rc.d/... stuff)
or later doesn't make any difference.

Shutting down the interface before stopping the application or halting
the machine doesn't make any difference either.

The other problem is that the current implementation of 
netdev_wait_allrefs makes that if you kill an application that is
using a device not correctly counted, you lock the console you are
working on.
e.g., killing vtund will create a printk(... unregister_netdevice...),
and the console cannot be used anymore as long as the counter hasn't
reached 0 and the device is freed...

Paul


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG]:   problem when shutting down ppp connection since 2.5.70
  2003-07-06  9:43   ` Paul Rolland
@ 2003-07-09 18:43     ` Stephen Hemminger
  2003-07-14  9:16       ` Paul Rolland
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2003-07-09 18:43 UTC (permalink / raw)
  To: Paul Rolland, cfriesen, paulus; +Cc: linux-ppp, netdev

The problem is that some protocol is still holding a reference to the device.
This is a bug in the protocol, and needs to be fixed (ie not a ppp bug).

Try building a kernel with only IPv4, eliminate all others then add back until
you find the culprit.

The following patch may help also.

diff -Nru a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c	Wed Jul  9 11:40:56 2003
+++ b/net/core/dev.c	Wed Jul  9 11:40:56 2003
@@ -72,6 +72,8 @@
  *				        - netif_rx() feedback
  */
 
+#define DEBUG 1
+
 #include <asm/uaccess.h>
 #include <asm/system.h>
 #include <asm/bitops.h>
@@ -2704,6 +2706,8 @@
 	goto out;
 }
 
+extern void dst_dumpref(const struct net_device *dev);
+
 static void netdev_wait_allrefs(struct net_device *dev)
 {
 	unsigned long rebroadcast_time, warning_time;
@@ -2740,6 +2744,30 @@
 		current->state = TASK_RUNNING;
 
 		if (time_after(jiffies, warning_time + 10 * HZ)) {
+#ifdef DEBUG
+			dst_dumpref(dev);
+
+			if (dev->atalk_ptr)
+				printk(KERN_INFO "unregister_netdevice: "
+				       " %s: probably in use as AppleTalk device\n", dev->name);
+			if (dev->ip_ptr)
+				printk(KERN_INFO "unregister_netdevice: "
+				       " %s: probably in use as IPv4 device\n", dev->name);
+
+			if (dev->atalk_ptr)
+				printk(KERN_INFO "unregister_netdevice: "
+				       " %s: probably in use as DECnet device\n", dev->name);
+			if (dev->ip6_ptr)
+				printk(KERN_INFO "unregister_netdevice: "
+				       " %s: probably in use as IPv6 device\n", dev->name);
+
+			if (dev->ec_ptr)
+				printk(KERN_INFO "unregister_netdevice: "
+				       " %s: probably in use as Econet device\n", dev->name);
+			if (dev->ax25_ptr)
+				printk(KERN_INFO "unregister_netdevice: "
+				       " %s: probably in use as AX.25 device\n", dev->name);
+#endif
 			printk(KERN_EMERG "unregister_netdevice: "
 			       "waiting for %s to become free. Usage "
 			       "count = %d\n",
diff -Nru a/net/core/dst.c b/net/core/dst.c
--- a/net/core/dst.c	Wed Jul  9 11:40:56 2003
+++ b/net/core/dst.c	Wed Jul  9 11:40:56 2003
@@ -41,6 +41,21 @@
 static struct timer_list dst_gc_timer =
 	TIMER_INITIALIZER(dst_run_gc, 0, DST_GC_MIN);
 
+
+void dst_dumpref(const struct net_device *dev)
+{
+	struct dst_entry *dst;
+	int count = 0;
+
+	spin_lock_bh(&dst_lock);
+	for (dst = dst_garbage_list; dst; dst = dst->next) {
+		if (dst->dev == dev) ++count;
+	}
+	spin_unlock_bh(&dst_lock);
+
+	printk(KERN_INFO "dst route cache has %d references\n", count);
+}
+
 static void dst_run_gc(unsigned long dummy)
 {
 	int    delayed = 0;

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG]:   problem when shutting down ppp connection since 2.5.70
  2003-07-09 18:43     ` Stephen Hemminger
@ 2003-07-14  9:16       ` Paul Rolland
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Rolland @ 2003-07-14  9:16 UTC (permalink / raw)
  To: 'Stephen Hemminger', cfriesen, paulus; +Cc: linux-ppp, netdev

Hello,

I've applied the patch from Stephen on top of the 2.6.0-test1 kernel,
and here is the result :

Jul 14 10:57:01 donald kernel: dst route cache has 0 references
Jul 14 10:57:01 donald kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = -3
Jul 14 10:57:10 donald kernel: dst route cache has 0 references
Jul 14 10:57:10 donald kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = -3
Jul 14 10:57:20 donald kernel: dst route cache has 0 references
Jul 14 10:57:20 donald kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = -3
Jul 14 10:57:31 donald kernel: dst route cache has 0 references
Jul 14 10:57:31 donald kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = -3

This is a big change compared to 2.5.74 (.75 not tested).

I guess a negative usage count should be considered as valid
by unregister_netdevice.

What about :

--- dev.c       2003-07-14 11:13:23.000000000 +0200
+++ dev.c.orig  2003-07-14 11:13:01.000000000 +0200
@@ -2746,7 +2746,7 @@
        unsigned long rebroadcast_time, warning_time;
 
        rebroadcast_time = warning_time = jiffies;
-       while (atomic_read(&dev->refcnt) > 0) {
+       while (atomic_read(&dev->refcnt) != 0) {
                if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
                        rtnl_shlock();
                        rtnl_exlock();

Regards,
Paul

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
REHAB is for quitters. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Paul Rolland, rol@witbe.net
Witbe.net SA
Directeur Associe

--

Please no HTML, I'm not a browser - Pas d'HTML, je ne suis pas un navigateur

"Some people dream of success... while others wake up and work hard at it"

> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@osdl.org]
> Sent: Wednesday, July 09, 2003 8:44 PM
> To: Paul Rolland; cfriesen@nortelnetworks.com; paulus@samba.org
> Cc: linux-ppp@vger.kernel.org; netdev@oss.sgi.com
> Subject: Re: [BUG]: problem when shutting down ppp connection
> since 2.5.70
> 
> 
> The problem is that some protocol is still holding a reference to the 
> device. This is a bug in the protocol, and needs to be fixed (ie not a 
> ppp bug).
> 
> Try building a kernel with only IPv4, eliminate all others then add 
> back until you find the culprit.
> 
> The following patch may help also.
> 
> diff -Nru a/net/core/dev.c b/net/core/dev.c
> --- a/net/core/dev.c	Wed Jul  9 11:40:56 2003
> +++ b/net/core/dev.c	Wed Jul  9 11:40:56 2003
> @@ -72,6 +72,8 @@
>   *				        - netif_rx() feedback
>   */
>  
> +#define DEBUG 1
> +
>  #include <asm/uaccess.h>
>  #include <asm/system.h>
>  #include <asm/bitops.h>
> @@ -2704,6 +2706,8 @@
>  	goto out;
>  }
>  
> +extern void dst_dumpref(const struct net_device *dev);
> +
>  static void netdev_wait_allrefs(struct net_device *dev)
>  {
>  	unsigned long rebroadcast_time, warning_time;
> @@ -2740,6 +2744,30 @@
>  		current->state = TASK_RUNNING;
>  
>  		if (time_after(jiffies, warning_time + 10 * HZ)) {
> +#ifdef DEBUG
> +			dst_dumpref(dev);
> +
> +			if (dev->atalk_ptr)
> +				printk(KERN_INFO
> "unregister_netdevice: "
> +				       " %s: probably in use as
> AppleTalk device\n", dev->name);
> +			if (dev->ip_ptr)
> +				printk(KERN_INFO
> "unregister_netdevice: "
> +				       " %s: probably in use as
> IPv4 device\n", dev->name);
> +
> +			if (dev->atalk_ptr)
> +				printk(KERN_INFO
> "unregister_netdevice: "
> +				       " %s: probably in use as
> DECnet device\n", dev->name);
> +			if (dev->ip6_ptr)
> +				printk(KERN_INFO
> "unregister_netdevice: "
> +				       " %s: probably in use as
> IPv6 device\n", dev->name);
> +
> +			if (dev->ec_ptr)
> +				printk(KERN_INFO
> "unregister_netdevice: "
> +				       " %s: probably in use as
> Econet device\n", dev->name);
> +			if (dev->ax25_ptr)
> +				printk(KERN_INFO
> "unregister_netdevice: "
> +				       " %s: probably in use as
> AX.25 device\n", dev->name); #endif
>  			printk(KERN_EMERG "unregister_netdevice: "
>  			       "waiting for %s to become free. Usage "
>  			       "count = %d\n",
> diff -Nru a/net/core/dst.c b/net/core/dst.c
> --- a/net/core/dst.c	Wed Jul  9 11:40:56 2003
> +++ b/net/core/dst.c	Wed Jul  9 11:40:56 2003
> @@ -41,6 +41,21 @@
>  static struct timer_list dst_gc_timer =
>  	TIMER_INITIALIZER(dst_run_gc, 0, DST_GC_MIN);
>  
> +
> +void dst_dumpref(const struct net_device *dev)
> +{
> +	struct dst_entry *dst;
> +	int count = 0;
> +
> +	spin_lock_bh(&dst_lock);
> +	for (dst = dst_garbage_list; dst; dst = dst->next) {
> +		if (dst->dev == dev) ++count;
> +	}
> +	spin_unlock_bh(&dst_lock);
> +
> +	printk(KERN_INFO "dst route cache has %d references\n",
> count); }
> +
>  static void dst_run_gc(unsigned long dummy)
>  {
>  	int    delayed = 0;
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG]:   problem when shutting down ppp connection since 2.5.70
@ 2003-07-14 11:43 Paul Rolland
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Rolland @ 2003-07-14 11:43 UTC (permalink / raw)
  To: 'Paul Rolland', 'Stephen Hemminger', cfriesen,
	paulus
  Cc: linux-ppp, netdev

Hello,

I'm sorry, the patch was not complete, it should have considered the
BUG_ON too...

Here is one that is fine on my system :
--- dev.c.orig  2003-07-14 13:41:33.000000000 +0200
+++ dev.c       2003-07-14 13:34:27.000000000 +0200
@@ -2742,7 +2742,7 @@
        unsigned long rebroadcast_time, warning_time;
 
        rebroadcast_time = warning_time = jiffies;
-       while (atomic_read(&dev->refcnt) != 0) {
+       while (atomic_read(&dev->refcnt) > 0) {
                if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
                        rtnl_shlock();
                        rtnl_exlock();
@@ -2836,7 +2836,7 @@
                        dev->reg_state = NETREG_UNREGISTERED;
 
                        netdev_wait_allrefs(dev);
-                       BUG_ON(atomic_read(&dev->refcnt));
+                       BUG_ON(atomic_read(&dev->refcnt) > 0);
 
                        netdev_finish_unregister(dev);
                        break;

Still don't understand why refcnt is really bad (negative value),
but at least the machine is working...

Paul

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I have a vitally important role serving as a bad example.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2003-07-14 11:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-07-14 11:43 [BUG]: problem when shutting down ppp connection since 2.5.70 Paul Rolland
     [not found] <3EFFA1EA.7090502@nortelnetworks.com>
2003-07-03  5:17 ` Chris Friesen
2003-07-06  9:43   ` Paul Rolland
2003-07-09 18:43     ` Stephen Hemminger
2003-07-14  9:16       ` Paul Rolland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).