* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
[not found] <20050131162201.GA1000@stilzchen.informatik.tu-chemnitz.de>
@ 2005-02-05 5:24 ` Herbert Xu
2005-02-05 5:38 ` David S. Miller
0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2005-02-05 5:24 UTC (permalink / raw)
To: Mirko Parthey
Cc: linux-kernel, netdev, David S. Miller, YOSHIFUJI Hideaki,
Stephen Hemminger
[-- Attachment #1: Type: text/plain, Size: 2335 bytes --]
On Mon, Jan 31, 2005 at 04:22:02PM +0000, Mirko Parthey wrote:
>
> How to reproduce the problem (I tried this on a Pentium 4 machine):
>
> boot: linux init=/bin/bash
> [...booting...]
> # mount proc -t proc /proc
> # ifconfig lo 127.0.0.1
> # brctl addbr br0
> # modprobe e100 # also reproducible with 3c5x9
> # brctl addif br0 eth0
> # ifconfig br0 192.168.1.1
> # ifconfig eth0 up
> # ifconfig lo down
This is the key to the problem.
It took me a while to find the cause of this. Along the way I
found a few other ref counting bugs in this area as well.
All of these bugs stem from the idev reference held in rtable/rt6_info.
Looking back in my mailbox, it's amazing how many problems this piece
of infrastructure has caused since it was installed last June. AFAIK
to this day there is still no code in the kernel that actually uses
this infrastructure.
Anyway, this particular problem is due to IPv6 adding local addresses
with split devices. That is, routes to local addresses are added with
rt6i_dev set to &loopback_dev and rt6i_idev set to the idev of the
device where the address is added.
This is just not going to work unless IPv6 creates its own dst garbage
collection routine since the generic GC in net/core/dst.c has no way
of finding all the rt6_info's referring to a specific net_device through
rt6i_idev.
It is also different from the IPv4 behaviour where we set both dev
and idev to loopback_dev.
Now this stuff is apparently going to be used for IP statistics. As
it is packets sent to/received from local addresses are counted against
the loopback device. Linux has behaved this way for a long time. So
when these statistics actually get implemented it will be very strange
if they were accounted to the device owning the local addresses instead
of loopback.
It also goes against the Linux philosophy where the addresses are owned
by the host, not the interface.
Therefore I propose the simple solution of not doing the split device
accounting in rt6_info.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
I will send the patches for the other bugs separately.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: q --]
[-- Type: text/plain, Size: 661 bytes --]
===== net/ipv6/route.c 1.105 vs edited =====
--- 1.105/net/ipv6/route.c 2005-01-15 19:44:48 +11:00
+++ edited/net/ipv6/route.c 2005-02-05 16:10:19 +11:00
@@ -1395,13 +1395,12 @@
return ERR_PTR(-ENOMEM);
dev_hold(&loopback_dev);
- in6_dev_hold(idev);
rt->u.dst.flags = DST_HOST;
rt->u.dst.input = ip6_input;
rt->u.dst.output = ip6_output;
rt->rt6i_dev = &loopback_dev;
- rt->rt6i_idev = idev;
+ rt->rt6i_idev = in6_dev_get(&loopback_dev);
rt->u.dst.metrics[RTAX_MTU-1] = ipv6_get_mtu(rt->rt6i_dev);
rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_pmtu(&rt->u.dst));
rt->u.dst.metrics[RTAX_HOPLIMIT-1] = ipv6_get_hoplimit(rt->rt6i_dev);
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 5:24 ` PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0) Herbert Xu
@ 2005-02-05 5:38 ` David S. Miller
2005-02-05 6:11 ` Herbert Xu
0 siblings, 1 reply; 33+ messages in thread
From: David S. Miller @ 2005-02-05 5:38 UTC (permalink / raw)
To: Herbert Xu; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
On Sat, 5 Feb 2005 16:24:07 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> This is the key to the problem.
...
> All of these bugs stem from the idev reference held in rtable/rt6_info.
...
> Anyway, this particular problem is due to IPv6 adding local addresses
> with split devices. That is, routes to local addresses are added with
> rt6i_dev set to &loopback_dev and rt6i_idev set to the idev of the
> device where the address is added.
...
> It also goes against the Linux philosophy where the addresses are owned
> by the host, not the interface.
>
> Therefore I propose the simple solution of not doing the split device
> accounting in rt6_info.
I agree with your analysis, however... this change is not sufficient.
You have to then walk over all the uses of rt6i_dev and sanitize the
cases that still expect the split semantics. For example, things like
this piece of coe in rt6_device_match():
if (dev->flags & IFF_LOOPBACK) {
if (sprt->rt6i_idev == NULL ||
sprt->rt6i_idev->dev->ifindex != oif) {
if (strict && oif)
continue;
if (local && (!oif ||
local->rt6i_idev->dev->ifindex == oif))
continue;
}
local = sprt;
}
It is just the first such thing I found, scanning rt6i_idev uses
will easily find several others.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 5:38 ` David S. Miller
@ 2005-02-05 6:11 ` Herbert Xu
2005-02-05 6:13 ` David S. Miller
` (2 more replies)
0 siblings, 3 replies; 33+ messages in thread
From: Herbert Xu @ 2005-02-05 6:11 UTC (permalink / raw)
To: David S. Miller; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
[-- Attachment #1: Type: text/plain, Size: 800 bytes --]
On Fri, Feb 04, 2005 at 09:38:13PM -0800, David S. Miller wrote:
>
> It is just the first such thing I found, scanning rt6i_idev uses
> will easily find several others.
You're right of course. I thought they were all harmless but I was
obviously wrong about this one.
So here is a patch that essentially reverts the split devices
semantics introduced by these two changesets:
[IPV6] addrconf_dst_alloc() to allocate new route for local address.
[IPV6] take rt6i_idev into account when looking up routes.
Signed-off-by: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p --]
[-- Type: text/plain, Size: 2981 bytes --]
===== include/net/ip6_route.h 1.21 vs edited =====
--- 1.21/include/net/ip6_route.h 2004-10-26 14:27:49 +10:00
+++ edited/include/net/ip6_route.h 2005-02-05 17:02:36 +11:00
@@ -74,8 +74,7 @@
extern int ndisc_dst_gc(int *more);
extern void fib6_force_start_gc(void);
-extern struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
- const struct in6_addr *addr,
+extern struct rt6_info *addrconf_dst_alloc(const struct in6_addr *addr,
int anycast);
/*
===== net/ipv6/addrconf.c 1.129 vs edited =====
--- 1.129/net/ipv6/addrconf.c 2005-01-18 08:13:31 +11:00
+++ edited/net/ipv6/addrconf.c 2005-02-05 17:02:00 +11:00
@@ -509,7 +509,7 @@
goto out;
}
- rt = addrconf_dst_alloc(idev, addr, 0);
+ rt = addrconf_dst_alloc(addr, 0);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
goto out;
===== net/ipv6/anycast.c 1.19 vs edited =====
--- 1.19/net/ipv6/anycast.c 2005-01-15 08:30:07 +11:00
+++ edited/net/ipv6/anycast.c 2005-02-05 17:01:47 +11:00
@@ -340,7 +340,7 @@
goto out;
}
- rt = addrconf_dst_alloc(idev, addr, 1);
+ rt = addrconf_dst_alloc(addr, 1);
if (IS_ERR(rt)) {
kfree(aca);
err = PTR_ERR(rt);
===== net/ipv6/ip6_fib.c 1.34 vs edited =====
--- 1.34/net/ipv6/ip6_fib.c 2005-01-14 15:41:06 +11:00
+++ edited/net/ipv6/ip6_fib.c 2005-02-05 17:04:02 +11:00
@@ -450,7 +450,6 @@
*/
if (iter->rt6i_dev == rt->rt6i_dev &&
- iter->rt6i_idev == rt->rt6i_idev &&
ipv6_addr_equal(&iter->rt6i_gateway,
&rt->rt6i_gateway)) {
if (!(iter->rt6i_flags&RTF_EXPIRES))
===== net/ipv6/route.c 1.105 vs edited =====
--- 1.105/net/ipv6/route.c 2005-01-15 19:44:48 +11:00
+++ edited/net/ipv6/route.c 2005-02-05 17:01:23 +11:00
@@ -189,17 +189,8 @@
struct net_device *dev = sprt->rt6i_dev;
if (dev->ifindex == oif)
return sprt;
- if (dev->flags & IFF_LOOPBACK) {
- if (sprt->rt6i_idev == NULL ||
- sprt->rt6i_idev->dev->ifindex != oif) {
- if (strict && oif)
- continue;
- if (local && (!oif ||
- local->rt6i_idev->dev->ifindex == oif))
- continue;
- }
+ if (dev->flags & IFF_LOOPBACK)
local = sprt;
- }
}
if (local)
@@ -1385,8 +1376,7 @@
* Allocate a dst for local (unicast / anycast) address.
*/
-struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
- const struct in6_addr *addr,
+struct rt6_info *addrconf_dst_alloc(const struct in6_addr *addr,
int anycast)
{
struct rt6_info *rt = ip6_dst_alloc();
@@ -1395,13 +1385,12 @@
return ERR_PTR(-ENOMEM);
dev_hold(&loopback_dev);
- in6_dev_hold(idev);
rt->u.dst.flags = DST_HOST;
rt->u.dst.input = ip6_input;
rt->u.dst.output = ip6_output;
rt->rt6i_dev = &loopback_dev;
- rt->rt6i_idev = idev;
+ rt->rt6i_idev = in6_dev_get(&loopback_dev);
rt->u.dst.metrics[RTAX_MTU-1] = ipv6_get_mtu(rt->rt6i_dev);
rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_pmtu(&rt->u.dst));
rt->u.dst.metrics[RTAX_HOPLIMIT-1] = ipv6_get_hoplimit(rt->rt6i_dev);
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 6:11 ` Herbert Xu
@ 2005-02-05 6:13 ` David S. Miller
2005-02-05 6:46 ` Herbert Xu
2005-02-05 11:14 ` PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0) Andre Tomt
2005-02-06 10:55 ` Andre Tomt
2 siblings, 1 reply; 33+ messages in thread
From: David S. Miller @ 2005-02-05 6:13 UTC (permalink / raw)
To: Herbert Xu; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
On Sat, 5 Feb 2005 17:11:10 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> You're right of course. I thought they were all harmless but I was
> obviously wrong about this one.
>
> So here is a patch that essentially reverts the split devices
> semantics introduced by these two changesets:
>
> [IPV6] addrconf_dst_alloc() to allocate new route for local address.
> [IPV6] take rt6i_idev into account when looking up routes.
>
> Signed-off-by: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Ok.
But Herbert, let's take a step back real quick because I want
to point something out. IPv6 does try to handle the dangling
mismatched idev's, in route.c:ip6_dst_ifdown(), this is called
via net/core/dst.c:dst_ifdown(), and this releases the ipv6
idev correctly in the split device case.
Did your analysis of this bridging release bug take this into
account? That's why we added this dst->ops method, specifically
to handle this problem.
This was added by Yoshifuji-san in ChangeSet 1.1722.137.17 which
has the checking comment:
[NET]: Add dst->ifdown callback.
Use it to release protocol specific objects that may be
tied to a dst cache object, at ifdown time. Currently
this is used to release ipv4/ipv6 specific device state.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 6:13 ` David S. Miller
@ 2005-02-05 6:46 ` Herbert Xu
2005-02-05 10:45 ` Herbert Xu
` (2 more replies)
0 siblings, 3 replies; 33+ messages in thread
From: Herbert Xu @ 2005-02-05 6:46 UTC (permalink / raw)
To: David S. Miller; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
On Fri, Feb 04, 2005 at 10:13:44PM -0800, David S. Miller wrote:
>
> But Herbert, let's take a step back real quick because I want
> to point something out. IPv6 does try to handle the dangling
> mismatched idev's, in route.c:ip6_dst_ifdown(), this is called
> via net/core/dst.c:dst_ifdown(), and this releases the ipv6
> idev correctly in the split device case.
>
> Did your analysis of this bridging release bug take this into
> account? That's why we added this dst->ops method, specifically
> to handle this problem.
This doesn't work because net/core/dst.c can only search based
on dst->dev. For the split device case, dst->dev is set to
loopback_dev while rt6i_idev is set to the real device.
Therefore net/core/dst.c stands no chance of finding the correct
local address routes that it needs to process.
If we wanted to preserve the split device semantics, then we
can create a local GC list in IPv6 so that it can search based
on rt6i_idev as well as the other keys. Alternatively we can
remove the dst->dev == dev check in dst_dev_event and dst_ifdown
and move that test down to the individual ifdown functions.
However, IMHO the split device semantics is inconsistent with
the philosphy that addresses belong to the host and not the
interface. So it doesn't really make sense in the current
networking stack.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 6:46 ` Herbert Xu
@ 2005-02-05 10:45 ` Herbert Xu
2005-02-06 11:41 ` Herbert Xu
2005-02-05 10:50 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 4:10 ` David S. Miller
2 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2005-02-05 10:45 UTC (permalink / raw)
To: David S. Miller; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
On Sat, Feb 05, 2005 at 05:46:43PM +1100, herbert wrote:
>
> This doesn't work because net/core/dst.c can only search based
> on dst->dev. For the split device case, dst->dev is set to
> loopback_dev while rt6i_idev is set to the real device.
Although I still think this is a bug, I'm now starting to suspect
that there is another bug around as well.
There is probably an ifp leak which in turn leads to a split dst
leak that allows the first bug to make its mark.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 6:46 ` Herbert Xu
2005-02-05 10:45 ` Herbert Xu
@ 2005-02-05 10:50 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 18:32 ` Herbert Xu
2005-02-06 4:02 ` David S. Miller
2005-02-06 4:10 ` David S. Miller
2 siblings, 2 replies; 33+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-02-05 10:50 UTC (permalink / raw)
To: herbert; +Cc: davem, mirko.parthey, linux-kernel, netdev, shemminger, yoshfuji
In article <20050205064643.GA29758@gondor.apana.org.au> (at Sat, 5 Feb 2005 17:46:43 +1100), Herbert Xu <herbert@gondor.apana.org.au> says:
> If we wanted to preserve the split device semantics, then we
> can create a local GC list in IPv6 so that it can search based
> on rt6i_idev as well as the other keys. Alternatively we can
> remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> and move that test down to the individual ifdown functions.
Yes, IPv6 needs "split device" semantics
(for per-device statistics such as Ip6InDelivers etc),
and I like later solution.
Thanks.
--yoshfuji
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 6:11 ` Herbert Xu
2005-02-05 6:13 ` David S. Miller
@ 2005-02-05 11:14 ` Andre Tomt
2005-02-05 11:39 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 10:55 ` Andre Tomt
2 siblings, 1 reply; 33+ messages in thread
From: Andre Tomt @ 2005-02-05 11:14 UTC (permalink / raw)
To: Herbert Xu
Cc: David S. Miller, mirko.parthey, linux-kernel, netdev, yoshfuji,
shemminger
Herbert Xu wrote:
> On Fri, Feb 04, 2005 at 09:38:13PM -0800, David S. Miller wrote:
>
>>It is just the first such thing I found, scanning rt6i_idev uses
>>will easily find several others.
>
>
> You're right of course. I thought they were all harmless but I was
> obviously wrong about this one.
>
> So here is a patch that essentially reverts the split devices
> semantics introduced by these two changesets:
<..>
This patch fixes my problems with hangs when dot1q VLAN interfaces gets
removed when loopback is down, as reported in the thread "2.6.10
ipv6/8021q lockup on vconfig on interface removal".
Yay :)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 11:14 ` PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0) Andre Tomt
@ 2005-02-05 11:39 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 11:48 ` Andre Tomt
2005-02-05 18:33 ` Herbert Xu
0 siblings, 2 replies; 33+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-02-05 11:39 UTC (permalink / raw)
To: andre
Cc: herbert, davem, mirko.parthey, linux-kernel, netdev, shemminger,
yoshfuji
In article <4204AA7C.9010509@tomt.net> (at Sat, 05 Feb 2005 12:14:04 +0100), Andre Tomt <andre@tomt.net> says:
> This patch fixes my problems with hangs when dot1q VLAN interfaces gets
> removed when loopback is down, as reported in the thread "2.6.10
> ipv6/8021q lockup on vconfig on interface removal".
Please tell me, why your lo is down...
Anyway, if we really want to "fix" this,
we should do in other way.
I think "Make loopback idev stick around" patches
(for IPv4 and IPv6) could be start of that.
--yoshfuji
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 11:39 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-05 11:48 ` Andre Tomt
2005-02-05 11:55 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 18:33 ` Herbert Xu
1 sibling, 1 reply; 33+ messages in thread
From: Andre Tomt @ 2005-02-05 11:48 UTC (permalink / raw)
To: yoshfuji
Cc: andre, herbert, davem, mirko.parthey, linux-kernel, netdev,
shemminger
YOSHIFUJI Hideaki / 吉藤英明 wrote:
> In article <4204AA7C.9010509@tomt.net> (at Sat, 05 Feb 2005 12:14:04 +0100), Andre Tomt <andre@tomt.net> says:
>
>
>>This patch fixes my problems with hangs when dot1q VLAN interfaces gets
>>removed when loopback is down, as reported in the thread "2.6.10
>>ipv6/8021q lockup on vconfig on interface removal".
>
>
> Please tell me, why your lo is down...
>
> Anyway, if we really want to "fix" this,
> we should do in other way.
>
> I think "Make loopback idev stick around" patches
> (for IPv4 and IPv6) could be start of that.
"ifdown -a" gets run on shutdown and reboot here, and ifdown -a in
Debian brings down loopback before any other interfaces.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 11:48 ` Andre Tomt
@ 2005-02-05 11:55 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 0 replies; 33+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-02-05 11:55 UTC (permalink / raw)
To: andre
Cc: herbert, davem, mirko.parthey, linux-kernel, netdev, shemminger,
yoshfuji
In article <4204B27F.5040202@tomt.net> (at Sat, 05 Feb 2005 12:48:15 +0100), Andre Tomt <andre@tomt.net> says:
> > Please tell me, why your lo is down...
:
> "ifdown -a" gets run on shutdown and reboot here, and ifdown -a in
> Debian brings down loopback before any other interfaces.
Okay, thanks. (I now remember someone told me this before.) hmm...
--yoshfuji
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 10:50 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-05 18:32 ` Herbert Xu
2005-02-06 4:02 ` David S. Miller
1 sibling, 0 replies; 33+ messages in thread
From: Herbert Xu @ 2005-02-05 18:32 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / ?$B5HF#1QL@
Cc: davem, mirko.parthey, linux-kernel, netdev, shemminger
On Sat, Feb 05, 2005 at 07:50:39PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> Yes, IPv6 needs "split device" semantics
> (for per-device statistics such as Ip6InDelivers etc),
> and I like later solution.
OK. Is there any reason why IPv4 should be different from IPv6 in
this respect though?
If the split device dst's are to be kept, we'll need a way to clean
them up. There are two choices:
1) Put the dst's on IPv6's own GC so that we can search by rt6i_idev.
2) Change the generic GC so that dst->ops->ifdown is always called even
if dst->dev does not match with the device going down. We also need to
change the individual ifdown functions to check for ->dev. The IPv6
ifdown function can then check for ->rt6i_idev as well.
What's your preference?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 11:39 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 11:48 ` Andre Tomt
@ 2005-02-05 18:33 ` Herbert Xu
1 sibling, 0 replies; 33+ messages in thread
From: Herbert Xu @ 2005-02-05 18:33 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / ?$B5HF#1QL@
Cc: andre, davem, mirko.parthey, linux-kernel, netdev, shemminger
On Sat, Feb 05, 2005 at 08:39:00PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> I think "Make loopback idev stick around" patches
> (for IPv4 and IPv6) could be start of that.
Unfortunately that patch can't fix this particular problem. This
problem will show up whenever there is a dst on the GC list that
has split devices and a non-zero refcnt.
So if you had a process holding that dst you can still get a dead-lock.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 10:50 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 18:32 ` Herbert Xu
@ 2005-02-06 4:02 ` David S. Miller
2005-02-06 5:01 ` YOSHIFUJI Hideaki / 吉藤英明
1 sibling, 1 reply; 33+ messages in thread
From: David S. Miller @ 2005-02-06 4:02 UTC (permalink / raw)
To: yoshfuji; +Cc: herbert, mirko.parthey, linux-kernel, netdev, shemminger
On Sat, 05 Feb 2005 19:50:39 +0900 (JST)
YOSHIFUJI Hideaki / ^[$B5HF#1QL@^[(B <yoshfuji@linux-ipv6.org> wrote:
> In article <20050205064643.GA29758@gondor.apana.org.au> (at Sat, 5 Feb 2005 17:46:43 +1100), Herbert Xu <herbert@gondor.apana.org.au> says:
>
> > If we wanted to preserve the split device semantics, then we
> > can create a local GC list in IPv6 so that it can search based
> > on rt6i_idev as well as the other keys. Alternatively we can
> > remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> > and move that test down to the individual ifdown functions.
>
> Yes, IPv6 needs "split device" semantics
> (for per-device statistics such as Ip6InDelivers etc),
> and I like later solution.
Ok. I never read whether ipv6, like ipv4, is specified to support
a model of host based ownership of addresses. Does anyone know?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 6:46 ` Herbert Xu
2005-02-05 10:45 ` Herbert Xu
2005-02-05 10:50 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-06 4:10 ` David S. Miller
2005-02-06 4:37 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 6:51 ` Herbert Xu
2 siblings, 2 replies; 33+ messages in thread
From: David S. Miller @ 2005-02-06 4:10 UTC (permalink / raw)
To: Herbert Xu; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
On Sat, 5 Feb 2005 17:46:43 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> This doesn't work because net/core/dst.c can only search based
> on dst->dev. For the split device case, dst->dev is set to
> loopback_dev while rt6i_idev is set to the real device.
Indeed. I didn't catch that.
> If we wanted to preserve the split device semantics, then we
> can create a local GC list in IPv6 so that it can search based
> on rt6i_idev as well as the other keys.
Ok, so this would entail changing each ipv6 dst_free() call
into one to ip6_dst_free(), which would:
ip6_garbage_add(dst);
dst_free(dst);
It would mean that dst_run_gc() would need to have some callback
like dst->ops->gc_destroy() or similar, which would allow ipv6
to delete the dst from it's local garbage list.
> Alternatively we can
> remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> and move that test down to the individual ifdown functions.
I think there is a hole in this idea.... maybe.
If the idea is to scan dst_garbage_list down in ipv6 specific code,
you can't do that since 'dst' objects from every pool in the kernel
get put onto the dst_garbage_list. It is generic.
They have no identity, so it's illegal to treat any member of that
list as an rt_entry, rt6_entry or any specific higher level dst
type.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 4:10 ` David S. Miller
@ 2005-02-06 4:37 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 5:04 ` David S. Miller
2005-02-06 6:51 ` Herbert Xu
1 sibling, 1 reply; 33+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-02-06 4:37 UTC (permalink / raw)
To: davem; +Cc: herbert, mirko.parthey, linux-kernel, netdev, shemminger,
yoshfuji
In article <20050205201044.1b95f4e8.davem@davemloft.net> (at Sat, 5 Feb 2005 20:10:44 -0800), "David S. Miller" <davem@davemloft.net> says:
> > Alternatively we can
> > remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> > and move that test down to the individual ifdown functions.
>
> I think there is a hole in this idea.... maybe.
>
> If the idea is to scan dst_garbage_list down in ipv6 specific code,
> you can't do that since 'dst' objects from every pool in the kernel
> get put onto the dst_garbage_list. It is generic.
How about making dst->ops->dev_check() like this:
static int inline dst_dev_check(struct dst_entry *dst, struct net_device *dev)
{
if (dst->ops->dev_check)
return dst->ops->dev_check(dst, dev)
else
return dst->dev == dev;
}
--yoshfuji
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 4:02 ` David S. Miller
@ 2005-02-06 5:01 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 0 replies; 33+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-02-06 5:01 UTC (permalink / raw)
To: davem; +Cc: herbert, mirko.parthey, linux-kernel, netdev, shemminger,
yoshfuji
In article <20050205200242.2b629de7.davem@davemloft.net> (at Sat, 5 Feb 2005 20:02:42 -0800), "David S. Miller" <davem@davemloft.net> says:
> > Yes, IPv6 needs "split device" semantics
> > (for per-device statistics such as Ip6InDelivers etc),
> > and I like later solution.
>
> Ok. I never read whether ipv6, like ipv4, is specified to support
> a model of host based ownership of addresses. Does anyone know?
I'm not sure it is explicitly specified, but there're some hints:
1. we need to allow multiple addresses on multiple interfaces.
e.g. link-local address
2. if a packet has come from A to link-local address on the other side B,
we should not receive it.
+-------+
---->|A B|
+-------+
Currently, it does not happen in usual because ndisc does NOT handle
addresses on other device.
3. mib document states that we should take statistics on interface which
the address belongs to; not the interface where the packet has come
from:
cited from RFC2011bis:
Local (*) packets on the input side are counted on the interface
associated with their destination address, which may not be the
interface on which they were received. This requirement is caused by
the possibility of losing the original interface during processing,
especially re-assembly.
(*): here it means incoming, but not forwarding.
BTW, BSD has similar reference to interface structure in routeing entry.
--
Hideaki YOSHIFUJI @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 4:37 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-06 5:04 ` David S. Miller
2005-02-06 5:31 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 6:52 ` Herbert Xu
0 siblings, 2 replies; 33+ messages in thread
From: David S. Miller @ 2005-02-06 5:04 UTC (permalink / raw)
To: yoshfuji; +Cc: herbert, mirko.parthey, linux-kernel, netdev, shemminger,
yoshfuji
On Sun, 06 Feb 2005 13:37:23 +0900 (JST)
YOSHIFUJI Hideaki / ^[$B5HF#1QL@^[(B <yoshfuji@linux-ipv6.org> wrote:
> How about making dst->ops->dev_check() like this:
>
> static int inline dst_dev_check(struct dst_entry *dst, struct net_device *dev)
> {
> if (dst->ops->dev_check)
> return dst->ops->dev_check(dst, dev)
> else
> return dst->dev == dev;
> }
Oh I see. That would work, and it seems the simplest, and
lowest risk fix for this problem.
Herbert, what do you think?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 5:04 ` David S. Miller
@ 2005-02-06 5:31 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 5:50 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 6:53 ` Herbert Xu
2005-02-06 6:52 ` Herbert Xu
1 sibling, 2 replies; 33+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-02-06 5:31 UTC (permalink / raw)
To: davem; +Cc: herbert, mirko.parthey, linux-kernel, netdev, shemminger,
yoshfuji
In article <20050205210411.7e18b8e6.davem@davemloft.net> (at Sat, 5 Feb 2005 21:04:11 -0800), "David S. Miller" <davem@davemloft.net> says:
> On Sun, 06 Feb 2005 13:37:23 +0900 (JST)
> YOSHIFUJI Hideaki / ^[$B5HF#1QL@^[(B <yoshfuji@linux-ipv6.org> wrote:
>
> > How about making dst->ops->dev_check() like this:
> >
> > static int inline dst_dev_check(struct dst_entry *dst, struct net_device *dev)
> > {
> > if (dst->ops->dev_check)
> > return dst->ops->dev_check(dst, dev)
> > else
> > return dst->dev == dev;
> > }
>
> Oh I see. That would work, and it seems the simplest, and
> lowest risk fix for this problem.
Well...
Here, lo is going down.
rt->rt6i_dev = lo and rt->rt6i_idev = ethX.
I think we already see dst->dev == dev (==lo) now.
So, I doubt that fix the problem.
The source of problem is entry (*) which still on routing entry,
not on gc list. And, the owner of entry is not routing table but
unicast/anycast address structure(s).
We need to "kill" active address on the other interfaces.
*: rt->rt6i_dev = lo and rt->rt6i_idev = ethX
BTW, I wish we could shut down eth0 during lo is pending...
--yoshfuji
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 5:31 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-06 5:50 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 7:00 ` Herbert Xu
2005-02-06 6:53 ` Herbert Xu
1 sibling, 1 reply; 33+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-02-06 5:50 UTC (permalink / raw)
To: davem; +Cc: herbert, mirko.parthey, linux-kernel, netdev, shemminger,
yoshfuji
In article <20050206.143107.39728239.yoshfuji@linux-ipv6.org> (at Sun, 06 Feb 2005 14:31:07 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> says:
> The source of problem is entry (*) which still on routing entry,
> not on gc list. And, the owner of entry is not routing table but
> unicast/anycast address structure(s).
> We need to "kill" active address on the other interfaces.
Which means in addrconf_notiry(), if the dev == &loopback_dev,
call addrconf_ifdown for every device like this:
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
===== net/ipv6/addrconf.c 1.129 vs edited =====
--- 1.129/net/ipv6/addrconf.c 2005-01-18 06:13:31 +09:00
+++ edited/net/ipv6/addrconf.c 2005-02-06 14:48:25 +09:00
@@ -1961,6 +1961,20 @@
case NETDEV_DOWN:
case NETDEV_UNREGISTER:
/*
+ * If lo is doing down we need to kill
+ * all addresses on the host because it owns
+ * route on lo. --yoshfuji
+ */
+ if (dev == &loopback_dev) {
+ struct net_device *dev1;
+ for (dev1 = dev_base; dev1; dev1 = dev->next) {
+ if (dev1 == &loopback_dev ||
+ __in6_dev_get(dev1) == NULL)
+ continue;
+ addrconf_ifdown(dev1, event != NETDEV_DOWN);
+ }
+ }
+ /*
* Remove all addresses from this interface.
*/
addrconf_ifdown(dev, event != NETDEV_DOWN);
--
Hideaki YOSHIFUJI @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 4:10 ` David S. Miller
2005-02-06 4:37 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-06 6:51 ` Herbert Xu
2005-02-08 1:29 ` [IPSEC] Move dst->child loop from dst_ifdown to xfrm_dst_ifdown Herbert Xu
1 sibling, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2005-02-06 6:51 UTC (permalink / raw)
To: David S. Miller; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
On Sat, Feb 05, 2005 at 08:10:44PM -0800, David S. Miller wrote:
>
> > Alternatively we can
> > remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> > and move that test down to the individual ifdown functions.
>
> I think there is a hole in this idea.... maybe.
>
> If the idea is to scan dst_garbage_list down in ipv6 specific code,
> you can't do that since 'dst' objects from every pool in the kernel
> get put onto the dst_garbage_list. It is generic.
The idea is to move the check into dst->ops->ifdown. By definition
ipv6_dst_ifdown will only see rt6_info entries. So dst_dev_event
will become
for (dst = dst_garbage_list; dst; dst = dst->next) {
dst_ifdown(dst, event != NETDEV_DOWN);
}
dst_ifdown will become
...
do {
if (dst->dev == dev && unregister) {
...
}
dst->ops->ifdown(dst, dev, unregister);
} while ((dst = dst->child) && dst->flags & DST_NOHASH);
...
Note the extra dev argument to ifdown. ipv6_dst_ifdown will be
static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
int how)
{
struct rt6_info *rt = (struct rt6_info *)dst;
struct inet6_dev *idev = rt->rt6i_idev;
if (idev != NULL && idev->dev != &loopback_dev && idev->dev == dev) {
struct inet6_dev *loopback_idev = in6_dev_get(&loopback_dev);
if (loopback_idev != NULL) {
rt->rt6i_idev = loopback_idev;
in6_dev_put(idev);
}
}
}
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 5:04 ` David S. Miller
2005-02-06 5:31 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-06 6:52 ` Herbert Xu
1 sibling, 0 replies; 33+ messages in thread
From: Herbert Xu @ 2005-02-06 6:52 UTC (permalink / raw)
To: David S. Miller; +Cc: yoshfuji, mirko.parthey, linux-kernel, netdev, shemminger
On Sat, Feb 05, 2005 at 09:04:11PM -0800, David S. Miller wrote:
>
> Oh I see. That would work, and it seems the simplest, and
> lowest risk fix for this problem.
>
> Herbert, what do you think?
Yes I agree.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 5:31 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 5:50 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-06 6:53 ` Herbert Xu
1 sibling, 0 replies; 33+ messages in thread
From: Herbert Xu @ 2005-02-06 6:53 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / ?$B5HF#1QL@
Cc: davem, mirko.parthey, linux-kernel, netdev, shemminger
On Sun, Feb 06, 2005 at 02:31:07PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> Here, lo is going down.
> rt->rt6i_dev = lo and rt->rt6i_idev = ethX.
> I think we already see dst->dev == dev (==lo) now.
> So, I doubt that fix the problem.
>
> The source of problem is entry (*) which still on routing entry,
> not on gc list. And, the owner of entry is not routing table but
> unicast/anycast address structure(s).
> We need to "kill" active address on the other interfaces.
>
> *: rt->rt6i_dev = lo and rt->rt6i_idev = ethX
Sorry I don't think this is right. Although lo going down is
required to cause those symptoms, it is not the trigger.
The problem only occurs when eth0 itself is unregistered.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 5:50 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-06 7:00 ` Herbert Xu
0 siblings, 0 replies; 33+ messages in thread
From: Herbert Xu @ 2005-02-06 7:00 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / ?$B5HF#1QL@
Cc: davem, mirko.parthey, linux-kernel, netdev, shemminger
On Sun, Feb 06, 2005 at 02:50:07PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> Which means in addrconf_notiry(), if the dev == &loopback_dev,
> call addrconf_ifdown for every device like this:
This should fix the reported issue. However, I'm not sure if it's
a good idea to stop all IP traffic when lo goes down. We don't do
that for IPv4.
Besides, we'll still need to fix the rt6i_idev GC issue since the
same bug can occur when eth0 goes down and some appliation is holding
a dst to a local address route. It can become a dead-lock if the
said application then invokes a syscall that takes the RTNL.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 6:11 ` Herbert Xu
2005-02-05 6:13 ` David S. Miller
2005-02-05 11:14 ` PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0) Andre Tomt
@ 2005-02-06 10:55 ` Andre Tomt
2005-02-06 11:28 ` YOSHIFUJI Hideaki / 吉藤英明
2 siblings, 1 reply; 33+ messages in thread
From: Andre Tomt @ 2005-02-06 10:55 UTC (permalink / raw)
To: Herbert Xu
Cc: David S. Miller, mirko.parthey, linux-kernel, netdev, yoshfuji,
shemminger
Herbert Xu wrote:
> On Fri, Feb 04, 2005 at 09:38:13PM -0800, David S. Miller wrote:
>
>>It is just the first such thing I found, scanning rt6i_idev uses
>>will easily find several others.
>
>
> You're right of course. I thought they were all harmless but I was
> obviously wrong about this one.
>
> So here is a patch that essentially reverts the split devices
> semantics introduced by these two changesets:
>
> [IPV6] addrconf_dst_alloc() to allocate new route for local address.
> [IPV6] take rt6i_idev into account when looking up routes.
>
> Signed-off-by: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Now that this fix have been written off as probably wrong; how much does
it break? As far as I've understood it "just" reverts to old semantics,
probably not correct semantics, but still not unexpected semantics
(like, say, hang on device unregistration ;) )
I'm contemplating just using it as a quick-fix until 2.6.11 to get this
problem under control.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 10:55 ` Andre Tomt
@ 2005-02-06 11:28 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 0 replies; 33+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-02-06 11:28 UTC (permalink / raw)
To: andre
Cc: herbert, davem, mirko.parthey, linux-kernel, netdev, shemminger,
yoshfuji
In article <4205F797.8080804@tomt.net> (at Sun, 06 Feb 2005 11:55:19 +0100), Andre Tomt <andre@tomt.net> says:
> I'm contemplating just using it as a quick-fix until 2.6.11 to get this
> problem under control.
Would you find if my patch works? Thanks.
--yoshfuji
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-05 10:45 ` Herbert Xu
@ 2005-02-06 11:41 ` Herbert Xu
2005-02-06 12:30 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2005-02-06 11:41 UTC (permalink / raw)
To: David S. Miller; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
On Sat, Feb 05, 2005 at 09:45:59PM +1100, herbert wrote:
>
> Although I still think this is a bug, I'm now starting to suspect
> that there is another bug around as well.
>
> There is probably an ifp leak which in turn leads to a split dst
> leak that allows the first bug to make its mark.
Found it. This is what happens:
lo goes down =>
rt6_ifdown =>
eth0's local address route gets deleted
eth0 goes down =>
__ipv6_ifa_notify =>
ip6_del_rt fails so we fall through to the
dst_free path. At this point the refcount
taken by __ipv6_ifa_notify is leaked.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 11:41 ` Herbert Xu
@ 2005-02-06 12:30 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 21:01 ` Herbert Xu
2005-02-09 20:45 ` Andre Tomt
0 siblings, 2 replies; 33+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-02-06 12:30 UTC (permalink / raw)
To: herbert; +Cc: davem, mirko.parthey, linux-kernel, netdev, shemminger, yoshfuji
In article <20050206114145.GA20883@gondor.apana.org.au> (at Sun, 6 Feb 2005 22:41:45 +1100), Herbert Xu <herbert@gondor.apana.org.au> says:
> On Sat, Feb 05, 2005 at 09:45:59PM +1100, herbert wrote:
> >
> > Although I still think this is a bug, I'm now starting to suspect
> > that there is another bug around as well.
> >
> > There is probably an ifp leak which in turn leads to a split dst
> > leak that allows the first bug to make its mark.
>
> Found it. This is what happens:
>
> lo goes down =>
> rt6_ifdown =>
> eth0's local address route gets deleted
>
> eth0 goes down =>
> __ipv6_ifa_notify =>
> ip6_del_rt fails so we fall through to the
> dst_free path. At this point the refcount
> taken by __ipv6_ifa_notify is leaked.
Oh, you're right! Thanks.
How about this; Ignore entries addrconf_dst_alloc'ed entries in rt6_ifdown()?
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
===== include/linux/ipv6_route.h 1.6 vs edited =====
--- 1.6/include/linux/ipv6_route.h 2004-10-26 12:54:23 +09:00
+++ edited/include/linux/ipv6_route.h 2005-02-06 21:27:02 +09:00
@@ -26,6 +26,7 @@
#define RTF_FLOW 0x02000000 /* flow significant route */
#define RTF_POLICY 0x04000000 /* policy route */
+#define RTF_ANYCAST 0x40000000
#define RTF_LOCAL 0x80000000
struct in6_rtmsg {
===== net/ipv6/route.c 1.105 vs edited =====
--- 1.105/net/ipv6/route.c 2005-01-15 17:44:48 +09:00
+++ edited/net/ipv6/route.c 2005-02-06 21:26:35 +09:00
@@ -1408,7 +1408,9 @@
rt->u.dst.obsolete = -1;
rt->rt6i_flags = RTF_UP | RTF_NONEXTHOP;
- if (!anycast)
+ if (anycast)
+ rt->rt6i_flags |= RTF_ANYCAST;
+ else
rt->rt6i_flags |= RTF_LOCAL;
rt->rt6i_nexthop = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway);
if (rt->rt6i_nexthop == NULL) {
@@ -1427,7 +1429,8 @@
static int fib6_ifdown(struct rt6_info *rt, void *arg)
{
if (((void*)rt->rt6i_dev == arg || arg == NULL) &&
- rt != &ip6_null_entry) {
+ rt != &ip6_null_entry &&
+ !(rt->rt6i_flags & (RTF_LOCAL|RTF_ANYCAST))) {
RT6_TRACE("deleted by ifdown %p\n", rt);
return -1;
}
--
Hideaki YOSHIFUJI @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 12:30 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-02-06 21:01 ` Herbert Xu
2005-02-09 20:45 ` Andre Tomt
1 sibling, 0 replies; 33+ messages in thread
From: Herbert Xu @ 2005-02-06 21:01 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / ?$B5HF#1QL@
Cc: davem, mirko.parthey, linux-kernel, netdev, shemminger
On Sun, Feb 06, 2005 at 09:30:18PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> How about this; Ignore entries addrconf_dst_alloc'ed entries in rt6_ifdown()?
Great, that definitely fixes the local address problem.
I'm not sure about anycast routes though. Who's going to delete them
when the real device goes down?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 33+ messages in thread
* [IPSEC] Move dst->child loop from dst_ifdown to xfrm_dst_ifdown
2005-02-06 6:51 ` Herbert Xu
@ 2005-02-08 1:29 ` Herbert Xu
2005-02-08 1:31 ` Herbert Xu
0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2005-02-08 1:29 UTC (permalink / raw)
To: David S. Miller; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
[-- Attachment #1: Type: text/plain, Size: 622 bytes --]
On Sun, Feb 06, 2005 at 05:51:17PM +1100, herbert wrote:
>
> The idea is to move the check into dst->ops->ifdown. By definition
> ipv6_dst_ifdown will only see rt6_info entries. So dst_dev_event
> will become
Here are the patches to do this. Do they look sane?
This one moves the dst->child processing from dst_ifdown into
xfrm_dst_ifdown.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: dst-patch-1 --]
[-- Type: text/plain, Size: 2519 bytes --]
===== net/core/dst.c 1.26 vs edited =====
--- 1.26/net/core/dst.c 2005-02-06 14:23:59 +11:00
+++ edited/net/core/dst.c 2005-02-08 12:11:39 +11:00
@@ -220,31 +220,26 @@
*
* Commented and originally written by Alexey.
*/
-static void dst_ifdown(struct dst_entry *dst, int unregister)
+static inline void dst_ifdown(struct dst_entry *dst, int unregister)
{
struct net_device *dev = dst->dev;
+ if (dst->ops->ifdown)
+ dst->ops->ifdown(dst, unregister);
+
if (!unregister) {
dst->input = dst_discard_in;
dst->output = dst_discard_out;
- }
-
- do {
- if (unregister) {
- dst->dev = &loopback_dev;
- dev_hold(&loopback_dev);
+ } else {
+ dst->dev = &loopback_dev;
+ dev_hold(&loopback_dev);
+ dev_put(dev);
+ if (dst->neighbour && dst->neighbour->dev == dev) {
+ dst->neighbour->dev = &loopback_dev;
dev_put(dev);
- if (dst->neighbour && dst->neighbour->dev == dev) {
- dst->neighbour->dev = &loopback_dev;
- dev_put(dev);
- dev_hold(&loopback_dev);
- }
+ dev_hold(&loopback_dev);
}
-
- if (dst->ops->ifdown)
- dst->ops->ifdown(dst, unregister);
- } while ((dst = dst->child) && dst->flags & DST_NOHASH &&
- dst->dev == dev);
+ }
}
static int dst_dev_event(struct notifier_block *this, unsigned long event, void *ptr)
===== net/xfrm/xfrm_policy.c 1.63 vs edited =====
--- 1.63/net/xfrm/xfrm_policy.c 2005-01-19 07:08:19 +11:00
+++ edited/net/xfrm/xfrm_policy.c 2005-02-08 12:10:47 +11:00
@@ -1027,6 +1027,20 @@
dst->xfrm = NULL;
}
+static void xfrm_dst_ifdown(struct dst_entry *dst, int unregister)
+{
+ struct net_device *dev = dst->dev;
+
+ if (!unregister)
+ return;
+
+ while ((dst = dst->child) && dst->xfrm && dst->dev == dev) {
+ dst->dev = &loopback_dev;
+ dev_hold(&loopback_dev);
+ dev_put(dev);
+ }
+}
+
static void xfrm_link_failure(struct sk_buff *skb)
{
/* Impossible. Such dst must be popped before reaches point of failure. */
@@ -1150,6 +1164,8 @@
dst_ops->check = xfrm_dst_check;
if (likely(dst_ops->destroy == NULL))
dst_ops->destroy = xfrm_dst_destroy;
+ if (likely(dst_ops->ifdown == NULL))
+ dst_ops->ifdown = xfrm_dst_ifdown;
if (likely(dst_ops->negative_advice == NULL))
dst_ops->negative_advice = xfrm_negative_advice;
if (likely(dst_ops->link_failure == NULL))
@@ -1181,6 +1197,7 @@
dst_ops->kmem_cachep = NULL;
dst_ops->check = NULL;
dst_ops->destroy = NULL;
+ dst_ops->ifdown = NULL;
dst_ops->negative_advice = NULL;
dst_ops->link_failure = NULL;
dst_ops->get_mss = NULL;
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [IPSEC] Move dst->child loop from dst_ifdown to xfrm_dst_ifdown
2005-02-08 1:29 ` [IPSEC] Move dst->child loop from dst_ifdown to xfrm_dst_ifdown Herbert Xu
@ 2005-02-08 1:31 ` Herbert Xu
2005-02-15 22:53 ` David S. Miller
0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2005-02-08 1:31 UTC (permalink / raw)
To: David S. Miller; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
[-- Attachment #1: Type: text/plain, Size: 588 bytes --]
On Tue, Feb 08, 2005 at 12:29:29PM +1100, herbert wrote:
>
> This one moves the dst->child processing from dst_ifdown into
> xfrm_dst_ifdown.
This patch adds a net_device argument to ifdown. After all,
it's a bit silly to notify someone of an ifdown event without
telling them what which device it was for :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: dst-patch-2 --]
[-- Type: text/plain, Size: 4260 bytes --]
===== include/net/dst.h 1.25 vs edited =====
--- 1.25/include/net/dst.h 2005-02-06 14:23:59 +11:00
+++ edited/include/net/dst.h 2005-02-08 12:14:10 +11:00
@@ -89,7 +89,8 @@
int (*gc)(void);
struct dst_entry * (*check)(struct dst_entry *, __u32 cookie);
void (*destroy)(struct dst_entry *);
- void (*ifdown)(struct dst_entry *, int how);
+ void (*ifdown)(struct dst_entry *,
+ struct net_device *dev, int how);
struct dst_entry * (*negative_advice)(struct dst_entry *);
void (*link_failure)(struct sk_buff *);
void (*update_pmtu)(struct dst_entry *dst, u32 mtu);
===== net/core/dst.c 1.27 vs edited =====
--- 1.27/net/core/dst.c 2005-02-08 12:12:21 +11:00
+++ edited/net/core/dst.c 2005-02-08 12:15:03 +11:00
@@ -220,12 +220,14 @@
*
* Commented and originally written by Alexey.
*/
-static inline void dst_ifdown(struct dst_entry *dst, int unregister)
+static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
+ int unregister)
{
- struct net_device *dev = dst->dev;
-
if (dst->ops->ifdown)
- dst->ops->ifdown(dst, unregister);
+ dst->ops->ifdown(dst, dev, unregister);
+
+ if (dev != dst->dev)
+ return;
if (!unregister) {
dst->input = dst_discard_in;
@@ -252,8 +254,7 @@
case NETDEV_DOWN:
spin_lock_bh(&dst_lock);
for (dst = dst_garbage_list; dst; dst = dst->next) {
- if (dst->dev == dev)
- dst_ifdown(dst, event != NETDEV_DOWN);
+ dst_ifdown(dst, dev, event != NETDEV_DOWN);
}
spin_unlock_bh(&dst_lock);
break;
===== net/ipv4/route.c 1.101 vs edited =====
--- 1.101/net/ipv4/route.c 2005-02-03 07:43:48 +11:00
+++ edited/net/ipv4/route.c 2005-02-08 12:14:11 +11:00
@@ -138,7 +138,8 @@
static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie);
static void ipv4_dst_destroy(struct dst_entry *dst);
-static void ipv4_dst_ifdown(struct dst_entry *dst, int how);
+static void ipv4_dst_ifdown(struct dst_entry *dst,
+ struct net_device *dev, int how);
static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
static void ipv4_link_failure(struct sk_buff *skb);
static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu);
@@ -1342,11 +1343,12 @@
}
}
-static void ipv4_dst_ifdown(struct dst_entry *dst, int how)
+static void ipv4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
+ int how)
{
struct rtable *rt = (struct rtable *) dst;
struct in_device *idev = rt->idev;
- if (idev && idev->dev != &loopback_dev) {
+ if (dev != &loopback_dev && idev && idev->dev == dev) {
struct in_device *loopback_idev = in_dev_get(&loopback_dev);
if (loopback_idev) {
rt->idev = loopback_idev;
===== net/ipv6/route.c 1.105 vs edited =====
--- 1.105/net/ipv6/route.c 2005-01-15 19:44:48 +11:00
+++ edited/net/ipv6/route.c 2005-02-08 12:14:11 +11:00
@@ -84,7 +84,8 @@
static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie);
static struct dst_entry *ip6_negative_advice(struct dst_entry *);
static void ip6_dst_destroy(struct dst_entry *);
-static void ip6_dst_ifdown(struct dst_entry *, int how);
+static void ip6_dst_ifdown(struct dst_entry *,
+ struct net_device *dev, int how);
static int ip6_dst_gc(void);
static int ip6_pkt_discard(struct sk_buff *skb);
@@ -153,12 +154,13 @@
}
}
-static void ip6_dst_ifdown(struct dst_entry *dst, int how)
+static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
+ int how)
{
struct rt6_info *rt = (struct rt6_info *)dst;
struct inet6_dev *idev = rt->rt6i_idev;
- if (idev != NULL && idev->dev != &loopback_dev) {
+ if (dev != &loopback_dev && idev != NULL && idev->dev == dev) {
struct inet6_dev *loopback_idev = in6_dev_get(&loopback_dev);
if (loopback_idev != NULL) {
rt->rt6i_idev = loopback_idev;
===== net/xfrm/xfrm_policy.c 1.64 vs edited =====
--- 1.64/net/xfrm/xfrm_policy.c 2005-02-08 12:12:21 +11:00
+++ edited/net/xfrm/xfrm_policy.c 2005-02-08 12:15:35 +11:00
@@ -1027,10 +1027,9 @@
dst->xfrm = NULL;
}
-static void xfrm_dst_ifdown(struct dst_entry *dst, int unregister)
+static void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
+ int unregister)
{
- struct net_device *dev = dst->dev;
-
if (!unregister)
return;
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
2005-02-06 12:30 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 21:01 ` Herbert Xu
@ 2005-02-09 20:45 ` Andre Tomt
1 sibling, 0 replies; 33+ messages in thread
From: Andre Tomt @ 2005-02-09 20:45 UTC (permalink / raw)
To: yoshfuji; +Cc: herbert, davem, mirko.parthey, linux-kernel, netdev, shemminger
YOSHIFUJI Hideaki / 吉藤英明 wrote:
> Oh, you're right! Thanks.
>
> How about this; Ignore entries addrconf_dst_alloc'ed entries in rt6_ifdown()?
>
> Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
This patch seems to work fine here; cleared out the other patches to be
sure. So, herberts first patch fixes it, and so do this one. I have not
tried the one in between.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [IPSEC] Move dst->child loop from dst_ifdown to xfrm_dst_ifdown
2005-02-08 1:31 ` Herbert Xu
@ 2005-02-15 22:53 ` David S. Miller
0 siblings, 0 replies; 33+ messages in thread
From: David S. Miller @ 2005-02-15 22:53 UTC (permalink / raw)
To: Herbert Xu; +Cc: mirko.parthey, linux-kernel, netdev, yoshfuji, shemminger
On Tue, 8 Feb 2005 12:31:40 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Feb 08, 2005 at 12:29:29PM +1100, herbert wrote:
> >
> > This one moves the dst->child processing from dst_ifdown into
> > xfrm_dst_ifdown.
>
> This patch adds a net_device argument to ifdown. After all,
> it's a bit silly to notify someone of an ifdown event without
> telling them what which device it was for :)
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ok, I'm going to try and get these two patches into 2.6.11
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2005-02-15 22:53 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20050131162201.GA1000@stilzchen.informatik.tu-chemnitz.de>
2005-02-05 5:24 ` PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0) Herbert Xu
2005-02-05 5:38 ` David S. Miller
2005-02-05 6:11 ` Herbert Xu
2005-02-05 6:13 ` David S. Miller
2005-02-05 6:46 ` Herbert Xu
2005-02-05 10:45 ` Herbert Xu
2005-02-06 11:41 ` Herbert Xu
2005-02-06 12:30 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 21:01 ` Herbert Xu
2005-02-09 20:45 ` Andre Tomt
2005-02-05 10:50 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 18:32 ` Herbert Xu
2005-02-06 4:02 ` David S. Miller
2005-02-06 5:01 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 4:10 ` David S. Miller
2005-02-06 4:37 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 5:04 ` David S. Miller
2005-02-06 5:31 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 5:50 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 7:00 ` Herbert Xu
2005-02-06 6:53 ` Herbert Xu
2005-02-06 6:52 ` Herbert Xu
2005-02-06 6:51 ` Herbert Xu
2005-02-08 1:29 ` [IPSEC] Move dst->child loop from dst_ifdown to xfrm_dst_ifdown Herbert Xu
2005-02-08 1:31 ` Herbert Xu
2005-02-15 22:53 ` David S. Miller
2005-02-05 11:14 ` PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0) Andre Tomt
2005-02-05 11:39 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 11:48 ` Andre Tomt
2005-02-05 11:55 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 18:33 ` Herbert Xu
2005-02-06 10:55 ` Andre Tomt
2005-02-06 11:28 ` YOSHIFUJI Hideaki / 吉藤英明
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).