netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
       [not found] <20040613121514.6b3c1c8a.davem@redhat.com>
@ 2004-06-13 23:35 ` Herbert Xu
  2004-06-13 23:41   ` Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2004-06-13 23:35 UTC (permalink / raw)
  To: David S. Miller; +Cc: schwab, linux-net, netdev, yoshfuji

David S. Miller <davem@redhat.com> wrote:
>
> diff -Nru a/net/core/dst.c b/net/core/dst.c
> --- a/net/core/dst.c    2004-06-13 06:04:49 -07:00
> +++ b/net/core/dst.c    2004-06-13 06:04:49 -07:00
> @@ -230,8 +230,8 @@
>                                if (event!=NETDEV_DOWN &&
>                                    dst->output == dst_discard_out) {

This is a historical question.  What's the dst->output check for?

>                                        dst->dev = &loopback_dev;
> -                                       dev_put(dev);
>                                        dev_hold(&loopback_dev);
> +                                       dev_put(dev);
>                                        dst->output = dst_discard_out;
>                                        if (dst->neighbour && dst->neighbour->dev == dev) {
>                                                dst->neighbour->dev = &loopback_dev;
> @@ -242,6 +242,8 @@
>                                        dst->input = dst_discard_in;
>                                        dst->output = dst_discard_out;
>                                }

The loopback_dev stuff takes care of the case when someone is still
holding onto the dst.  How come we don't have similar code in ifdown?

> +                               if (dst->ops->ifdown)
> +                                       dst->ops->ifdown(dst, event != NETDEV_DOWN);

What's the rationale for doing this for both UNREGISTER and DOWN?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-13 23:35 ` 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1 Herbert Xu
@ 2004-06-13 23:41   ` Herbert Xu
  2004-06-14  1:36     ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2004-06-13 23:41 UTC (permalink / raw)
  To: David S. Miller; +Cc: schwab, netdev, yoshfuji

BTW, it looks like I missed the original IPv4 idev patch to route.c
as well.  So could someone please tell me why it was needed in the
first place?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-13 23:41   ` Herbert Xu
@ 2004-06-14  1:36     ` David S. Miller
  2004-06-14  1:50       ` Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2004-06-14  1:36 UTC (permalink / raw)
  To: Herbert Xu; +Cc: schwab, netdev, yoshfuji

On Mon, 14 Jun 2004 09:41:42 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> BTW, it looks like I missed the original IPv4 idev patch to route.c
> as well.  So could someone please tell me why it was needed in the
> first place?

Because per-device ipv4/ipv6 statistics support is coming.
It's necessary infrastructure.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14  1:36     ` David S. Miller
@ 2004-06-14  1:50       ` Herbert Xu
  2004-06-14  4:07         ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2004-06-14  1:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: schwab, netdev, yoshfuji

On Sun, Jun 13, 2004 at 06:36:22PM -0700, David S. Miller wrote:
> On Mon, 14 Jun 2004 09:41:42 +1000
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > BTW, it looks like I missed the original IPv4 idev patch to route.c
> > as well.  So could someone please tell me why it was needed in the
> > first place?
> 
> Because per-device ipv4/ipv6 statistics support is coming.
> It's necessary infrastructure.

Thanks.  Is there somewhere where I can look at the code that's
going to use this infrastructure?
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14  1:50       ` Herbert Xu
@ 2004-06-14  4:07         ` David S. Miller
  2004-06-14  4:22           ` Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2004-06-14  4:07 UTC (permalink / raw)
  To: Herbert Xu; +Cc: schwab, netdev, yoshfuji

On Mon, 14 Jun 2004 11:50:13 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Sun, Jun 13, 2004 at 06:36:22PM -0700, David S. Miller wrote:
> > On Mon, 14 Jun 2004 09:41:42 +1000
> > Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > 
> > > BTW, it looks like I missed the original IPv4 idev patch to route.c
> > > as well.  So could someone please tell me why it was needed in the
> > > first place?
> > 
> > Because per-device ipv4/ipv6 statistics support is coming.
> > It's necessary infrastructure.
> 
> Thanks.  Is there somewhere where I can look at the code that's
> going to use this infrastructure?

Not written yet, but it will be of the form:

	idev = rt->idev;
	SNMP_BUMP_BLAH_BLAH_STAT(idev);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14  4:07         ` David S. Miller
@ 2004-06-14  4:22           ` Herbert Xu
  2004-06-14  4:27             ` David S. Miller
  2004-06-14 10:28             ` Herbert Xu
  0 siblings, 2 replies; 25+ messages in thread
From: Herbert Xu @ 2004-06-14  4:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: schwab, netdev, yoshfuji

On Sun, Jun 13, 2004 at 09:07:25PM -0700, David S. Miller wrote:
> 
> Not written yet, but it will be of the form:
> 
> 	idev = rt->idev;
> 	SNMP_BUMP_BLAH_BLAH_STAT(idev);

OK, in that case either they all need to check whether idev is
NULL before doing this or we'll need to fix ->ifdown to set the
dev to something other than NULL.

BTW in looking at this I've found two bugs in the dst code wrt
cleaning up net devices.

1. The code in dst_dev_event relies on the entries with dev in it
getting onto the garbage list before it is called.  Unfortunately
for routing entries, the flushing is done asynchronusly so it is
entirely possible for the entries to get onto the garbage list after
dst_dev_event has been called twice (DOWN and then UNREGISTER).

This is not fatal though since another GC run will pick them up in
at most two minutes.  But the user may well be alarmed by those
dreaded "waiting for..." errors.

2. dst_dev_event doesn't clean up dst->child at all.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14  4:22           ` Herbert Xu
@ 2004-06-14  4:27             ` David S. Miller
  2004-06-14  4:42               ` Herbert Xu
  2004-06-14 10:28             ` Herbert Xu
  1 sibling, 1 reply; 25+ messages in thread
From: David S. Miller @ 2004-06-14  4:27 UTC (permalink / raw)
  To: Herbert Xu; +Cc: schwab, netdev, yoshfuji

On Mon, 14 Jun 2004 14:22:16 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> OK, in that case either they all need to check whether idev is
> NULL before doing this or we'll need to fix ->ifdown to set the
> dev to something other than NULL.

That's correct, I don't exactly how to deal with that race
just yet.

> BTW in looking at this I've found two bugs in the dst code wrt
> cleaning up net devices.
> 
> 1. The code in dst_dev_event relies on the entries with dev in it
> getting onto the garbage list before it is called.  Unfortunately
> for routing entries, the flushing is done asynchronusly so it is
> entirely possible for the entries to get onto the garbage list after
> dst_dev_event has been called twice (DOWN and then UNREGISTER).
> 
> This is not fatal though since another GC run will pick them up in
> at most two minutes.  But the user may well be alarmed by those
> dreaded "waiting for..." errors.

Right.

> 2. dst_dev_event doesn't clean up dst->child at all.

DST object is disassembled at GC time, so I classift this just
like case #1 above.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14  4:27             ` David S. Miller
@ 2004-06-14  4:42               ` Herbert Xu
  2004-06-14  4:47                 ` Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2004-06-14  4:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: schwab, netdev, yoshfuji

On Sun, Jun 13, 2004 at 09:27:08PM -0700, David S. Miller wrote:
>
> > 1. The code in dst_dev_event relies on the entries with dev in it
> > getting onto the garbage list before it is called.  Unfortunately
> > for routing entries, the flushing is done asynchronusly so it is
> > entirely possible for the entries to get onto the garbage list after
> > dst_dev_event has been called twice (DOWN and then UNREGISTER).
> > 
> > This is not fatal though since another GC run will pick them up in
> > at most two minutes.  But the user may well be alarmed by those
> > dreaded "waiting for..." errors.
> 
> Right.
> 
> > 2. dst_dev_event doesn't clean up dst->child at all.
> 
> DST object is disassembled at GC time, so I classift this just
> like case #1 above.

Yes.  In fact the loopback_dev code in dst_dev_event is also like
this.  Removing that code will simply defer the dropping of the
dev reference to the GC.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14  4:47                 ` Herbert Xu
@ 2004-06-14  4:43                   ` David S. Miller
  2004-06-14  4:53                     ` Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2004-06-14  4:43 UTC (permalink / raw)
  To: Herbert Xu; +Cc: schwab, netdev, yoshfuji

On Mon, 14 Jun 2004 14:47:17 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> And there's more :) The work done in ifdown also falls in the same
> category.  Its work will be carried out in the GC anyway.
> 
> Actually in all these cases it could take a lot more than two minutes
> if there is a long-living entity (maybe a TCP connection) holding onto
> the dst.  So I guess #1 and #2 should be addressed at some point.

Maybe the solution is to initiate a GC run as soon as possible
instead of however long into the future it would have ended
up running.

Wouldn't that help?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14  4:42               ` Herbert Xu
@ 2004-06-14  4:47                 ` Herbert Xu
  2004-06-14  4:43                   ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2004-06-14  4:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: schwab, netdev, yoshfuji

On Mon, Jun 14, 2004 at 02:42:40PM +1000, herbert wrote:
> On Sun, Jun 13, 2004 at 09:27:08PM -0700, David S. Miller wrote:
> >
> > > 1. The code in dst_dev_event relies on the entries with dev in it
> > > getting onto the garbage list before it is called.  Unfortunately
> > > for routing entries, the flushing is done asynchronusly so it is
> > > entirely possible for the entries to get onto the garbage list after
> > > dst_dev_event has been called twice (DOWN and then UNREGISTER).
> > > 
> > > This is not fatal though since another GC run will pick them up in
> > > at most two minutes.  But the user may well be alarmed by those
> > > dreaded "waiting for..." errors.
> > 
> > Right.
> > 
> > > 2. dst_dev_event doesn't clean up dst->child at all.
> > 
> > DST object is disassembled at GC time, so I classift this just
> > like case #1 above.
> 
> Yes.  In fact the loopback_dev code in dst_dev_event is also like
> this.  Removing that code will simply defer the dropping of the
> dev reference to the GC.

And there's more :) The work done in ifdown also falls in the same
category.  Its work will be carried out in the GC anyway.

Actually in all these cases it could take a lot more than two minutes
if there is a long-living entity (maybe a TCP connection) holding onto
the dst.  So I guess #1 and #2 should be addressed at some point.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14  4:43                   ` David S. Miller
@ 2004-06-14  4:53                     ` Herbert Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Herbert Xu @ 2004-06-14  4:53 UTC (permalink / raw)
  To: David S. Miller; +Cc: schwab, netdev, yoshfuji

On Sun, Jun 13, 2004 at 09:43:55PM -0700, David S. Miller wrote:
> On Mon, 14 Jun 2004 14:47:17 +1000
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > And there's more :) The work done in ifdown also falls in the same
> > category.  Its work will be carried out in the GC anyway.
> > 
> > Actually in all these cases it could take a lot more than two minutes
> > if there is a long-living entity (maybe a TCP connection) holding onto
> > the dst.  So I guess #1 and #2 should be addressed at some point.
> 
> Maybe the solution is to initiate a GC run as soon as possible
> instead of however long into the future it would have ended
> up running.
> 
> Wouldn't that help?

Not really.  The GC is powerless until all those entities drop their
references to the dst.

That's why we've got this hack in dst_dev_event that sets ->dev to
loopback_dev in order to drop the dev reference early.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14  4:22           ` Herbert Xu
  2004-06-14  4:27             ` David S. Miller
@ 2004-06-14 10:28             ` Herbert Xu
  2004-06-14 12:44               ` Herbert Xu
  1 sibling, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2004-06-14 10:28 UTC (permalink / raw)
  To: David S. Miller; +Cc: schwab, netdev, yoshfuji

On Mon, Jun 14, 2004 at 02:22:16PM +1000, herbert wrote:
> 
> 1. The code in dst_dev_event relies on the entries with dev in it
> getting onto the garbage list before it is called.  Unfortunately
> for routing entries, the flushing is done asynchronusly so it is
> entirely possible for the entries to get onto the garbage list after
> dst_dev_event has been called twice (DOWN and then UNREGISTER).

Actually this can't happen since fib_netdev_event will do an
immediate flush.  So it's only #2 or IPsec that's a problem.
 
Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14 10:28             ` Herbert Xu
@ 2004-06-14 12:44               ` Herbert Xu
  2004-06-16 19:37                 ` Alexey Kuznetsov
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2004-06-14 12:44 UTC (permalink / raw)
  To: David S. Miller; +Cc: schwab, netdev, yoshfuji, kuznet

[-- Attachment #1: Type: text/plain, Size: 1312 bytes --]

On Mon, Jun 14, 2004 at 08:28:58PM +1000, herbert wrote:
> On Mon, Jun 14, 2004 at 02:22:16PM +1000, herbert wrote:
> > 
> > 1. The code in dst_dev_event relies on the entries with dev in it
> > getting onto the garbage list before it is called.  Unfortunately
> > for routing entries, the flushing is done asynchronusly so it is
> > entirely possible for the entries to get onto the garbage list after
> > dst_dev_event has been called twice (DOWN and then UNREGISTER).
> 
> Actually this can't happen since fib_netdev_event will do an
> immediate flush.  So it's only #2 or IPsec that's a problem.

And even there it's not too bad as it only happens if there are
more than two IPsec dst's in a row, e.g., ESP/IPCOMP.

Here is a patch to do the same loopback_dev hack for child dst's.
I've also taken the liberty to remove the bogus dst->output check.
I've tested it with ESP/IPCOMP and I can confirm that it allows
me to remove my NIC module if when I've got a TCP connection over
it.  Without it the removal hangs.

Alexey, if I've missed any subtle piece of logic there, please
scream :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 2115 bytes --]

===== net/core/dst.c 1.17 vs edited =====
--- 1.17/net/core/dst.c	2004-06-13 03:51:53 +10:00
+++ edited/net/core/dst.c	2004-06-14 22:39:51 +10:00
@@ -210,6 +210,36 @@
 	return NULL;
 }
 
+/* Dirty hack. We did it in 2.2 (in __dst_free),
+ * we have _very_ good reasons not to repeat
+ * this mistake in 2.3, but we have no choice
+ * now. _It_ _is_ _explicit_ _deliberate_
+ * _race_ _condition_.
+ *
+ * Commented and originally written by Alexey.
+ */
+static void dst_ifdown(struct dst_entry *dst, int unregister)
+{
+	struct net_device *dev = dst->dev;
+
+	do {
+		if (unregister) {
+			dst->dev = &loopback_dev;
+			dev_hold(&loopback_dev);
+			dev_put(dev);
+			if (dst->neighbour && dst->neighbour->dev == dev) {
+				dst->neighbour->dev = &loopback_dev;
+				dev_put(dev);
+				dev_hold(&loopback_dev);
+			}
+		}
+
+		if (dst->ops->ifdown)
+			dst->ops->ifdown(dst, unregister);
+	} while ((dst = dst->child) && dst->flags & DST_NOHASH &&
+		 dst->dev == dev);
+}
+
 static int dst_dev_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
 	struct net_device *dev = ptr;
@@ -221,29 +251,11 @@
 		spin_lock_bh(&dst_lock);
 		for (dst = dst_garbage_list; dst; dst = dst->next) {
 			if (dst->dev == dev) {
-				/* Dirty hack. We did it in 2.2 (in __dst_free),
-				   we have _very_ good reasons not to repeat
-				   this mistake in 2.3, but we have no choice
-				   now. _It_ _is_ _explicit_ _deliberate_
-				   _race_ _condition_.
-				 */
-				if (event!=NETDEV_DOWN &&
-				    dst->output == dst_discard_out) {
-					dst->dev = &loopback_dev;
-					dev_hold(&loopback_dev);
-					dev_put(dev);
-					dst->output = dst_discard_out;
-					if (dst->neighbour && dst->neighbour->dev == dev) {
-						dst->neighbour->dev = &loopback_dev;
-						dev_put(dev);
-						dev_hold(&loopback_dev);
-					}
-				} else {
+				if (event == NETDEV_DOWN) {
 					dst->input = dst_discard_in;
 					dst->output = dst_discard_out;
 				}
-				if (dst->ops->ifdown)
-					dst->ops->ifdown(dst, event != NETDEV_DOWN);
+				dst_ifdown(dst, event != NETDEV_DOWN);
 			}
 		}
 		spin_unlock_bh(&dst_lock);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-14 12:44               ` Herbert Xu
@ 2004-06-16 19:37                 ` Alexey Kuznetsov
  2004-06-16 20:09                   ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kuznetsov @ 2004-06-16 19:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, schwab, netdev, yoshfuji

Hello!

> Alexey, if I've missed any subtle piece of logic there, please
> scream :)

The "logic" behind the check ("logic" is double quoted because
of that comment)  with dst_discard_out was to make
release only on the second attempt to unregister. In the past
unregister event used to be rebroadcast periodically until the unregister
finally succeeds. Now logic is different, so this check really
does not make sense.

Anyway, the comment remains damn true and the place is still
bug. It makes sense to think about waiting for at least one quiescent
state after detaching the stale device from dst and real dev_put() it.
Reference to RCU is not occasional, the job looks natural for it.

Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-16 19:37                 ` Alexey Kuznetsov
@ 2004-06-16 20:09                   ` David S. Miller
  2004-06-16 20:37                     ` Alexey Kuznetsov
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2004-06-16 20:09 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: herbert, schwab, netdev, yoshfuji

On Wed, 16 Jun 2004 23:37:31 +0400
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:

> Anyway, the comment remains damn true and the place is still
> bug. It makes sense to think about waiting for at least one quiescent
> state after detaching the stale device from dst and real dev_put() it.
> Reference to RCU is not occasional, the job looks natural for it.

It seems a simple synchronize_kernel() would clear all the
crap no?

===== net/core/dst.c 1.18 vs edited =====
--- 1.18/net/core/dst.c	2004-06-16 10:26:56 -07:00
+++ edited/net/core/dst.c	2004-06-16 13:10:37 -07:00
@@ -243,6 +243,8 @@
 			dst->ops->ifdown(dst, unregister);
 	} while ((dst = dst->child) && dst->flags & DST_NOHASH &&
 		 dst->dev == dev);
+
+	synchronize_kernel();
 }
 
 static int dst_dev_event(struct notifier_block *this, unsigned long event, void *ptr)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-16 20:09                   ` David S. Miller
@ 2004-06-16 20:37                     ` Alexey Kuznetsov
  2004-06-16 20:47                       ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kuznetsov @ 2004-06-16 20:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: herbert, schwab, netdev, yoshfuji

Hello!

> > bug. It makes sense to think about waiting for at least one quiescent
> > state after detaching the stale device from dst and real dev_put() it.
> > Reference to RCU is not occasional, the job looks natural for it.
> 
> It seems a simple synchronize_kernel() would clear all the
> crap no?

Sort of. I think it should happen after killing reference to dst->dev:

+                       dst->dev = &loopback_dev;
+                       dev_hold(&loopback_dev);


but before dropping reference to dev:

+                       dev_put(dev);


So, I think it should look like:

		dst->dev = &loopback_dev;
		dev_hold(&loopback_dev);
+		synchronize_kernel();
		dev_put(dev);

(sorry, I still do not have Herbert's patch pulled)

Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-16 20:37                     ` Alexey Kuznetsov
@ 2004-06-16 20:47                       ` David S. Miller
  2004-06-17  8:17                         ` Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2004-06-16 20:47 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: herbert, schwab, netdev, yoshfuji

On Thu, 17 Jun 2004 00:37:48 +0400
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:

> Sort of. I think it should happen after killing reference to dst->dev:
 ...
> but before dropping reference to dev:
 ...
> So, I think it should look like:
> 
> 		dst->dev = &loopback_dev;
> 		dev_hold(&loopback_dev);
> +		synchronize_kernel();
> 		dev_put(dev);

Perfect, that's what I put into my tree then, plus some comment.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-16 20:47                       ` David S. Miller
@ 2004-06-17  8:17                         ` Herbert Xu
  2004-06-17  8:33                           ` Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2004-06-17  8:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: Alexey Kuznetsov, schwab, netdev, yoshfuji

On Wed, Jun 16, 2004 at 01:47:11PM -0700, David S. Miller wrote:
> On Thu, 17 Jun 2004 00:37:48 +0400
> Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:
> 
> > Sort of. I think it should happen after killing reference to dst->dev:
>  ...
> > but before dropping reference to dev:
>  ...
> > So, I think it should look like:
> > 
> > 		dst->dev = &loopback_dev;
> > 		dev_hold(&loopback_dev);
> > +		synchronize_kernel();
> > 		dev_put(dev);
> 
> Perfect, that's what I put into my tree then, plus some comment.

Yes we definitely want this call.  But wouldn't it be better to do one
call at the top level, say at the end of netdev_wait_allrefs?

Otherwise we may end up with lots of synchronize_kernel() calls
scattered in each individual event handler.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-17  8:17                         ` Herbert Xu
@ 2004-06-17  8:33                           ` Herbert Xu
  2004-06-17 17:10                             ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2004-06-17  8:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: Alexey Kuznetsov, schwab, netdev, yoshfuji

[-- Attachment #1: Type: text/plain, Size: 514 bytes --]

On Thu, Jun 17, 2004 at 06:17:22PM +1000, herbert wrote:
> 
> Yes we definitely want this call.  But wouldn't it be better to do one
> call at the top level, say at the end of netdev_wait_allrefs?

Something like should be good enough.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
 
Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 961 bytes --]

===== net/core/dev.c 1.145 vs edited =====
--- 1.145/net/core/dev.c	2004-06-04 15:02:47 +10:00
+++ edited/net/core/dev.c	2004-06-17 18:31:12 +10:00
@@ -3033,6 +3033,12 @@
 
 			netdev_wait_allrefs(dev);
 
+			/* Wait for at least one quiescent state before
+			 * destroying dev.  This ensures users like
+			 * dst->dev are completely finished.
+			 */
+			synchronize_net();
+
 			/* paranoia */
 			BUG_ON(atomic_read(&dev->refcnt));
 			BUG_TRAP(!dev->ip_ptr);
===== net/core/dst.c 1.19 vs edited =====
--- 1.19/net/core/dst.c	2004-06-17 06:42:22 +10:00
+++ edited/net/core/dst.c	2004-06-17 18:31:46 +10:00
@@ -231,12 +231,6 @@
 		if (unregister) {
 			dst->dev = &loopback_dev;
 			dev_hold(&loopback_dev);
-
-			/* Wait for at least one quiescent state after
-			 * detaching the stale device from dst.
-			 */
-			synchronize_kernel();
-
 			dev_put(dev);
 			if (dst->neighbour && dst->neighbour->dev == dev) {
 				dst->neighbour->dev = &loopback_dev;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-17  8:33                           ` Herbert Xu
@ 2004-06-17 17:10                             ` David S. Miller
  2004-06-17 17:24                               ` Alexey Kuznetsov
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2004-06-17 17:10 UTC (permalink / raw)
  To: Herbert Xu; +Cc: kuznet, schwab, netdev, yoshfuji


Herbert, you still don't understand what the synchronize_kernel() is
for.

It is to make sure that anyone (other cpus using the dst) who "saw"
the old dst->dev is done doing whatever they were doing _BEFORE_
we put the device reference.

The best we could do is accumulate devices to be put onto
some kind of list, then at the end of dst_ifdown() do one
synchronize_kernel() then walk the list putting all the
devices.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-17 17:10                             ` David S. Miller
@ 2004-06-17 17:24                               ` Alexey Kuznetsov
  2004-06-17 17:53                                 ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kuznetsov @ 2004-06-17 17:24 UTC (permalink / raw)
  To: David S. Miller; +Cc: Herbert Xu, schwab, netdev, yoshfuji

Hello!

> It is to make sure that anyone (other cpus using the dst) who "saw"
> the old dst->dev is done doing whatever they were doing _BEFORE_
> we put the device reference.

Actually, he noticed right thing. I forgot that this function
works in context when dev, which is argument is of the function,
is held, so dev_put() here can be safely replaced with __dev_put().

So, synchonize_kernel() can be moved somewhere to unregister_netdevice(),
(maybe, it is already present there under surface)

Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-17 17:24                               ` Alexey Kuznetsov
@ 2004-06-17 17:53                                 ` David S. Miller
  2004-06-17 19:07                                   ` Alexey Kuznetsov
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2004-06-17 17:53 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: herbert, schwab, netdev, yoshfuji

On Thu, 17 Jun 2004 21:24:49 +0400
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:

> Actually, he noticed right thing. I forgot that this function
> works in context when dev, which is argument is of the function,
> is held, so dev_put() here can be safely replaced with __dev_put().
> 
> So, synchonize_kernel() can be moved somewhere to unregister_netdevice(),
> (maybe, it is already present there under surface)

It occurs, via synchronize_net(), but this happens before NETDEV_UNREGISTER
event is sent out.  Is it sufficient?  I think not, and that we need to
add another call in unregister_netdevice() right before final dev_put().

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-17 17:53                                 ` David S. Miller
@ 2004-06-17 19:07                                   ` Alexey Kuznetsov
  2004-06-17 19:16                                     ` David S. Miller
  2004-06-17 21:29                                     ` Herbert Xu
  0 siblings, 2 replies; 25+ messages in thread
From: Alexey Kuznetsov @ 2004-06-17 19:07 UTC (permalink / raw)
  To: David S. Miller; +Cc: herbert, schwab, netdev, yoshfuji

Hello!

> It occurs, via synchronize_net(), but this happens before NETDEV_UNREGISTER
> event is sent out.  Is it sufficient?  I think not,

Agreed.

>						 and that we need to
> add another call in unregister_netdevice() right before final dev_put().

This should be enough.

Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-17 19:07                                   ` Alexey Kuznetsov
@ 2004-06-17 19:16                                     ` David S. Miller
  2004-06-17 21:29                                     ` Herbert Xu
  1 sibling, 0 replies; 25+ messages in thread
From: David S. Miller @ 2004-06-17 19:16 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: herbert, schwab, netdev, yoshfuji

On Thu, 17 Jun 2004 23:07:39 +0400
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:

> >						 and that we need to
> > add another call in unregister_netdevice() right before final dev_put().
> 
> This should be enough.

Perfect, that's what I'll do in my tree.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1
  2004-06-17 19:07                                   ` Alexey Kuznetsov
  2004-06-17 19:16                                     ` David S. Miller
@ 2004-06-17 21:29                                     ` Herbert Xu
  1 sibling, 0 replies; 25+ messages in thread
From: Herbert Xu @ 2004-06-17 21:29 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: David S. Miller, schwab, netdev, yoshfuji

On Thu, Jun 17, 2004 at 11:07:39PM +0400, Alexey Kuznetsov wrote:
> 
> >						 and that we need to
> > add another call in unregister_netdevice() right before final dev_put().
> 
> This should be enough.

It is enough for this dst case, but it may not be enough in general.

Other event handlers may need a rebroadcast before they drop the dev
reference.  So we really should do the synchronize_net() before
freeing the device in net_run_todo().

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email:  Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2004-06-17 21:29 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20040613121514.6b3c1c8a.davem@redhat.com>
2004-06-13 23:35 ` 2.6.7-rc3: unregister_netdevice: waiting for tun0 to become free. Usage count = 1 Herbert Xu
2004-06-13 23:41   ` Herbert Xu
2004-06-14  1:36     ` David S. Miller
2004-06-14  1:50       ` Herbert Xu
2004-06-14  4:07         ` David S. Miller
2004-06-14  4:22           ` Herbert Xu
2004-06-14  4:27             ` David S. Miller
2004-06-14  4:42               ` Herbert Xu
2004-06-14  4:47                 ` Herbert Xu
2004-06-14  4:43                   ` David S. Miller
2004-06-14  4:53                     ` Herbert Xu
2004-06-14 10:28             ` Herbert Xu
2004-06-14 12:44               ` Herbert Xu
2004-06-16 19:37                 ` Alexey Kuznetsov
2004-06-16 20:09                   ` David S. Miller
2004-06-16 20:37                     ` Alexey Kuznetsov
2004-06-16 20:47                       ` David S. Miller
2004-06-17  8:17                         ` Herbert Xu
2004-06-17  8:33                           ` Herbert Xu
2004-06-17 17:10                             ` David S. Miller
2004-06-17 17:24                               ` Alexey Kuznetsov
2004-06-17 17:53                                 ` David S. Miller
2004-06-17 19:07                                   ` Alexey Kuznetsov
2004-06-17 19:16                                     ` David S. Miller
2004-06-17 21:29                                     ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).