* Lockup with tun/tap/bridge interface deregistration.
@ 2008-12-10 14:44 Mathieu SEGAUD
2008-12-11 10:24 ` Jarek Poplawski
2008-12-17 12:23 ` Jarek Poplawski
0 siblings, 2 replies; 8+ messages in thread
From: Mathieu SEGAUD @ 2008-12-10 14:44 UTC
To: netdev; +Cc: herbert, kernel
Hi,
we are experiencing a problem with "virtual" network interfaces, especially
tun/tap devices and bridges, which occasionally get stuck, refuse to
deregister, and hang the process destroying them for a very long time.
I can reliably (*) reproduce it, with every kernel version I have tested,
using this script:
#! /bin/sh
for run in $(seq 1 1000000); do
echo "Run #$run"
brctl addbr vbr$run
tunctl -t vif$run
ifconfig vif$run up
brctl addif vbr$run vif$run
ifconfig vbr$run 30.30.30.30 up
ifconfig vbr$run down
brctl delif vbr$run vif$run
ifconfig vif$run down
tunctl -d vif$run
brctl delbr vbr$run
done
The box stays responsive when these "lockups" occur, but the brctl or
tunctl processes can stay stuck for hours. Here is a link to a complete
task dump taken while brctl was stuck:
http://bugs.gentoo.org/attachment.cgi?id=174835&action=view
in particular, the brctl process dump:
brctl D 00000000 0 19796 30706
c0d72400 00200086 013c5000 00000000 c0441e80 c0441900 c0441900 cada6370
cada6510 c1806900 00000001 0225d2fd 000178f0 c02e5bb8 cada6510 c02e5a3a
f785c000 f785c000 c0129bf2 062bd8e0 d9f5aed8 f785c000 c02e5bb8 c0129d48
Call Trace:
[<c02e5bb8>] _spin_unlock_irqrestore+0xe/0x21
[<c02e5a3a>] _spin_lock_irqsave+0x11/0x2a
[<c0129bf2>] lock_timer_base+0x19/0x35
[<c02e5bb8>] _spin_unlock_irqrestore+0xe/0x21
[<c0129d48>] __mod_timer+0x93/0x9c
[<c02e4855>] schedule_timeout+0x7e/0x99
[<c01298a3>] process_timeout+0x0/0x5
[<c0129d5e>] msleep+0xd/0x12
[<c0266b41>] netdev_run_todo+0xf7/0x19d
[<f8a7f3c5>] br_del_bridge+0x48/0x4c [bridge]
[<f8a7ff61>] br_ioctl_deviceless_stub+0x190/0x19f [bridge]
[<c018a66c>] inotify_d_instantiate+0x12/0x3a
[<c02e5c1b>] _spin_unlock+0xc/0x1f
[<f8a7fdd1>] br_ioctl_deviceless_stub+0x0/0x19f [bridge]
[<c025bf6e>] sock_ioctl+0x11f/0x1d9
[<c025be4f>] sock_ioctl+0x0/0x1d9
[<c0170cdc>] vfs_ioctl+0x1c/0x5f
[<c0170f47>] do_vfs_ioctl+0x228/0x23b
[<c025d76f>] sys_socketcall+0x51/0x19d
[<c0170f86>] sys_ioctl+0x2c/0x42
[<c01038a9>] sysenter_do_call+0x12/0x25
[<c02e0000>] print_cpu_info+0x7e/0x92
(This one was obtained with kernel version 2.6.27.4)
We at Gentoo would welcome any ideas on how to solve this. It is
reproducible, even if reproducing it is time-consuming. I will rerun the
test and try to get back here with a task dump long enough to include all
tasks, along with more debugging info.
Thanks a lot for reading this far.
(*) "reliably" because it has always happened eventually, even though I
may have to wait for hours.
Here is an entry in the Gentoo bugzilla reporting this:
http://bugs.gentoo.org/show_bug.cgi?id=219400
--
Mathieu Segaud
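Mathieu's plan to collect a task dump long enough to include every task could look roughly like the following. This is a minimal sketch, not something posted in the thread: the function name capture_task_dump, its default output file name, and the 1 MiB read size are illustrative choices. It assumes root access, a kernel built with CONFIG_MAGIC_SYSRQ, and a kernel log buffer large enough to hold all tasks (for example by booting with a larger log_buf_len= value); otherwise the oldest entries are overwritten and the dump is truncated again.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): dump the state of every task to
# the kernel log via SysRq-T and save the log to a file.
# Assumes: root privileges, CONFIG_MAGIC_SYSRQ, and a kernel log buffer large
# enough for all tasks (e.g. booted with a larger log_buf_len= value).
capture_task_dump() {
    out=${1:-task-dump.txt}
    # 't' asks the kernel to print the state and stack of all tasks
    echo t > /proc/sysrq-trigger || return 1
    # read up to 1 MiB of the ring buffer; some older dmesg versions read
    # only about 16 KiB by default, which would cut the dump short
    dmesg -s 1048576 > "$out"
}
```

Running `capture_task_dump brctl-stuck.txt` as root while brctl is hung would leave the full dump in that file, ready to attach to the bug report.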
* Re: Lockup with tun/tap/bridge interface deregistration.
2008-12-10 14:44 Lockup with tun/tap/bridge interface deregistration Mathieu SEGAUD
@ 2008-12-11 10:24 ` Jarek Poplawski
2008-12-11 10:43 ` Jarek Poplawski
2008-12-17 12:23 ` Jarek Poplawski
1 sibling, 1 reply; 8+ messages in thread
From: Jarek Poplawski @ 2008-12-11 10:24 UTC
To: Mathieu SEGAUD; +Cc: netdev, herbert, kernel
On 10-12-2008 15:44, Mathieu SEGAUD wrote:
> Hi,
>
> we are experiencing some "virtual" network interfaces problem, especially
> with tun/tap devices and with bridges, which occasionally get stuck and
> won't deregister and hang the destroying process for very long time.
...
> [<c0266b41>] netdev_run_todo+0xf7/0x19d
> [<f8a7f3c5>] br_del_bridge+0x48/0x4c [bridge]
> [<f8a7ff61>] br_ioctl_deviceless_stub+0x190/0x19f [bridge]
> [<c018a66c>] inotify_d_instantiate+0x12/0x3a
> [<c02e5c1b>] _spin_unlock+0xc/0x1f
> [<f8a7fdd1>] br_ioctl_deviceless_stub+0x0/0x19f [bridge]
> [<c025bf6e>] sock_ioctl+0x11f/0x1d9
> [<c025be4f>] sock_ioctl+0x0/0x1d9
> [<c0170cdc>] vfs_ioctl+0x1c/0x5f
> [<c0170f47>] do_vfs_ioctl+0x228/0x23b
> [<c025d76f>] sys_socketcall+0x51/0x19d
> [<c0170f86>] sys_ioctl+0x2c/0x42
> [<c01038a9>] sysenter_do_call+0x12/0x25
> [<c02e0000>] print_cpu_info+0x7e/0x92
>
> (This one was obtained with kernel version 2.6.27.4)
>
Could you try e.g. 2.6.27.7, or another kernel with this patch?
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=58ec3b4db9eb5a28e3aec5f407a54e28f7039c19
Jarek P.
* Re: Lockup with tun/tap/bridge interface deregistration.
2008-12-11 10:24 ` Jarek Poplawski
@ 2008-12-11 10:43 ` Jarek Poplawski
2008-12-11 10:49 ` Herbert Xu
0 siblings, 1 reply; 8+ messages in thread
From: Jarek Poplawski @ 2008-12-11 10:43 UTC
To: Mathieu SEGAUD; +Cc: netdev, herbert, kernel
On Thu, Dec 11, 2008 at 10:24:21AM +0000, Jarek Poplawski wrote:
...
> Could you try e.g. 2.6.27.7 or something with this patch?:
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=58ec3b4db9eb5a28e3aec5f407a54e28f7039c19
Oops! 2.6.27.4 has this too...
Sorry,
Jarek P.
* Re: Lockup with tun/tap/bridge interface deregistration.
2008-12-11 10:43 ` Jarek Poplawski
@ 2008-12-11 10:49 ` Herbert Xu
2008-12-11 11:02 ` Jarek Poplawski
2008-12-11 13:27 ` Mathieu SEGAUD
0 siblings, 2 replies; 8+ messages in thread
From: Herbert Xu @ 2008-12-11 10:49 UTC
To: Jarek Poplawski; +Cc: Mathieu SEGAUD, netdev, kernel
On Thu, Dec 11, 2008 at 10:43:03AM +0000, Jarek Poplawski wrote:
> On Thu, Dec 11, 2008 at 10:24:21AM +0000, Jarek Poplawski wrote:
> ...
> > Could you try e.g. 2.6.27.7 or something with this patch?:
> >
> > http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=58ec3b4db9eb5a28e3aec5f407a54e28f7039c19
>
> Oops! 2.6.27.4 has this too...
Right, this fixes a deadlock that causes the RTNL to be held, which
doesn't seem to be the case here. It sounds like the problem in
this thread is simply due to a non-zero reference count on the netdev
itself.
Mathieu, you should get a message like this
unregister_netdevice: waiting for %s to become free. Usage count = %d
What does your count say?
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
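Herbert's question can be answered directly from the kernel log. The following is a minimal sketch, not something posted in the thread: a hypothetical shell function, netdev_usage_counts, that scans log text on stdin for the unregister_netdevice warning quoted above and prints each stuck device with the last usage count reported for it. It assumes the log lines match that printk format and may carry an arbitrary prefix such as a timestamp.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): list each device that
# unregister_netdevice is waiting on, together with the last usage count
# reported for it. Reads kernel log text on stdin.
netdev_usage_counts() {
    # Match lines produced by the printk format
    # "unregister_netdevice: waiting for %s to become free. Usage count = %d"
    # and reduce each one to "<device> <count>".
    sed -n 's/.*unregister_netdevice: waiting for \([^ ]*\) to become free\. Usage count = \([0-9][0-9]*\).*/\1 \2/p' |
    # Keep only the most recent count per device, in first-seen order.
    awk '{ if (!($1 in count)) order[n++] = $1; count[$1] = $2 }
         END { for (i = 0; i < n; i++) print order[i], count[order[i]] }'
}
```

For example, `dmesg | netdev_usage_counts` run against a log containing the warning Mathieu quotes later in this thread would print `vbr447429 3`.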
* Re: Lockup with tun/tap/bridge interface deregistration.
2008-12-11 10:49 ` Herbert Xu
@ 2008-12-11 11:02 ` Jarek Poplawski
2008-12-11 13:27 ` Mathieu SEGAUD
1 sibling, 0 replies; 8+ messages in thread
From: Jarek Poplawski @ 2008-12-11 11:02 UTC
To: Herbert Xu; +Cc: Mathieu SEGAUD, netdev, kernel
On Thu, Dec 11, 2008 at 09:49:54PM +1100, Herbert Xu wrote:
...
> Right, this fixes a dead-lock that causes the RTNL to be held which
> doesn't seem to be the case here. It sounds like the problem in
> this thread is simply due to a non-zero ref count on the netdev
> itself.
>
> Mathieu, you should get a message like this
>
> unregister_netdevice: waiting for %s to become free. Usage count = %d
>
> What does your count say?
Right, you can see this at the first link of Mathieu's message:
http://bugs.gentoo.org/attachment.cgi?id=174835&action=view
Jarek P.
* Re: Lockup with tun/tap/bridge interface deregistration.
2008-12-11 10:49 ` Herbert Xu
2008-12-11 11:02 ` Jarek Poplawski
@ 2008-12-11 13:27 ` Mathieu SEGAUD
1 sibling, 0 replies; 8+ messages in thread
From: Mathieu SEGAUD @ 2008-12-11 13:27 UTC
To: Herbert Xu; +Cc: Jarek Poplawski, netdev, kernel
You told me recently:
> On Thu, Dec 11, 2008 at 10:43:03AM +0000, Jarek Poplawski wrote:
>> On Thu, Dec 11, 2008 at 10:24:21AM +0000, Jarek Poplawski wrote:
>> ...
>> > Could you try e.g. 2.6.27.7 or something with this patch?:
>> >
>> > http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=58ec3b4db9eb5a28e3aec5f407a54e28f7039c19
>>
>> Oops! 2.6.27.4 has this too...
>
> Right, this fixes a dead-lock that causes the RTNL to be held which
> doesn't seem to be the case here. It sounds like the problem in
> this thread is simply due to a non-zero ref count on the netdev
> itself.
>
> Mathieu, you should get a message like this
>
> unregister_netdevice: waiting for %s to become free. Usage count = %d
>
> What does your count say?
>
> Thanks,
Sorry for the delay. In this dump, the usage count is stuck at 3.
unregister_netdevice: waiting for vbr447429 to become free. Usage count = 3
--
Mathieu
* Re: Lockup with tun/tap/bridge interface deregistration.
2008-12-10 14:44 Lockup with tun/tap/bridge interface deregistration Mathieu SEGAUD
2008-12-11 10:24 ` Jarek Poplawski
@ 2008-12-17 12:23 ` Jarek Poplawski
2008-12-17 12:34 ` Herbert Xu
1 sibling, 1 reply; 8+ messages in thread
From: Jarek Poplawski @ 2008-12-17 12:23 UTC
To: Mathieu SEGAUD; +Cc: netdev, herbert, kernel
On 10-12-2008 15:44, Mathieu SEGAUD wrote:
> Hi,
>
> we are experiencing some "virtual" network interfaces problem, especially
> with tun/tap devices and with bridges, which occasionally get stuck and
> won't deregister and hang the destroying process for very long time.
...
> [<c0266b41>] netdev_run_todo+0xf7/0x19d
> [<f8a7f3c5>] br_del_bridge+0x48/0x4c [bridge]
...
> (This one was obtained with kernel version 2.6.27.4)
>
I wonder whether it's related, but since this was originally reported
against the 2.6.24 kernel, and there was a substantial change in this
area of the dst code, it would be very interesting to try 2.6.23.
Thanks,
Jarek P.
* Re: Lockup with tun/tap/bridge interface deregistration.
2008-12-17 12:23 ` Jarek Poplawski
@ 2008-12-17 12:34 ` Herbert Xu
0 siblings, 0 replies; 8+ messages in thread
From: Herbert Xu @ 2008-12-17 12:34 UTC
To: Jarek Poplawski; +Cc: Mathieu SEGAUD, netdev, kernel
On Wed, Dec 17, 2008 at 12:23:50PM +0000, Jarek Poplawski wrote:
> On 10-12-2008 15:44, Mathieu SEGAUD wrote:
> >
> > we are experiencing some "virtual" network interfaces problem, especially
> > with tun/tap devices and with bridges, which occasionally get stuck and
> > won't deregister and hang the destroying process for very long time.
> ...
> > [<c0266b41>] netdev_run_todo+0xf7/0x19d
> > [<f8a7f3c5>] br_del_bridge+0x48/0x4c [bridge]
> ...
> > (This one was obtained with kernel version 2.6.27.4)
>
> I wonder whether it's related, but since this was originally reported
> against the 2.6.24 kernel, and there was a substantial change in this
> area of the dst code, it would be very interesting to try 2.6.23.
Another thing to try would be to see if it's still reproducible
with IPv6 disabled (assuming IPv6 was enabled to begin with).
Thanks,
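Herbert's suggestion presupposes knowing whether IPv6 is active in the first place. A minimal sketch of that check follows, assuming that /proc/net/if_inet6 is present only when the running kernel has IPv6 support (built in, or the ipv6 module loaded). The function name ipv6_enabled and its optional proc-root argument are hypothetical, added here for illustration and testing.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if IPv6 appears to be
# enabled in the running kernel.
# Assumption: /proc/net/if_inet6 exists only when IPv6 support is present
# (built in, or the ipv6 module loaded).
ipv6_enabled() {
    # Optional first argument overrides the proc root (useful for testing).
    [ -e "${1:-/proc}/net/if_inet6" ]
}
```

For example, `ipv6_enabled && echo "IPv6 appears to be enabled"`. If it is, rerunning the reproducer on a kernel built without CONFIG_IPV6 (or, where IPv6 is a module, with that module kept from loading) would be one way to follow the suggestion.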
Thread overview: 8+ messages
2008-12-10 14:44 Lockup with tun/tap/bridge interface deregistration Mathieu SEGAUD
2008-12-11 10:24 ` Jarek Poplawski
2008-12-11 10:43 ` Jarek Poplawski
2008-12-11 10:49 ` Herbert Xu
2008-12-11 11:02 ` Jarek Poplawski
2008-12-11 13:27 ` Mathieu SEGAUD
2008-12-17 12:23 ` Jarek Poplawski
2008-12-17 12:34 ` Herbert Xu