netdev.vger.kernel.org archive mirror
* [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
@ 2010-11-03  6:26 Dave Chinner
  2010-11-03  7:13 ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2010-11-03  6:26 UTC
  To: linux-kernel; +Cc: netdev

Folks,

Starting up KVM on a current mainline kernel, using the tap device
for networking, results in the ip process that tries to bring up the
tap interface hanging. KVM is started with this networking
config:

....
        -net nic,vlan=0,macaddr=00:e4:b6:63:63:6d,model=virtio \
        -net tap,vlan=0,script=/vm-images/qemu-ifup,downscript=no \
....

And the script is effectively:

switch=br0
if [ -n "$1" ];then
        /usr/bin/sudo /sbin/ip link set $1 up
        sleep 0.5s
        /usr/bin/sudo /usr/sbin/brctl addif $switch $1
        exit 0
fi
exit 1

This is resulting in the command 'ip link set tap0 up' hanging as a zombie:

root      3005     1  0 16:53 pts/3    00:00:00 /bin/sh /vm-images/qemu-ifup tap0
root      3011  3005  0 16:53 pts/3    00:00:00 /usr/bin/sudo /sbin/ip link set tap0 up
root      3012  3011  0 16:53 pts/3    00:00:00 [ip] <defunct>

In do_exit() with this trace:

[ 1630.782255] ip            x ffff88063fcb3600     0  3012   3011 0x00000000
[ 1630.789121]  ffff880631328000 0000000000000046 0000000000000000 ffff880633104380
[ 1630.796524]  0000000000013600 ffff88062f031fd8 0000000000013600 0000000000013600
[ 1630.803925]  ffff8806313282d8 ffff8806313282e0 ffff880631328000 0000000000013600
[ 1630.811324] Call Trace:
[ 1630.813760]  [<ffffffff8104a90d>] ? do_exit+0x716/0x724
[ 1630.818964]  [<ffffffff8104a995>] ? do_group_exit+0x7a/0xa4
[ 1630.824512]  [<ffffffff8104a9d1>] ? sys_exit_group+0x12/0x16
[ 1630.830149]  [<ffffffff81009a82>] ? system_call_fastpath+0x16/0x1b

The address comes down to the schedule() call:

(gdb) l *(do_exit+0x716)
0xffffffff8104a90d is in do_exit (kernel/exit.c:1034).
1029            preempt_disable();
1030            exit_rcu();
1031            /* causes final put_task_struct in finish_task_switch(). */
1032            tsk->state = TASK_DEAD;
1033            schedule();
1034            BUG();
1035            /* Avoid "noreturn function does return".  */
1036            for (;;)
1037                    cpu_relax();    /* For when BUG is null */
1038    }

Needless to say, KVM is not starting up. This works just fine on
2.6.35.1 and so is a regression. I can't do a lot of testing on this as
the host is the machine that hosts all my build and test environments....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-11-03  6:26 [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit() Dave Chinner
@ 2010-11-03  7:13 ` Eric Dumazet
  2010-11-03 10:34   ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2010-11-03  7:13 UTC
  To: Dave Chinner; +Cc: linux-kernel, netdev

On Wednesday, 3 November 2010 at 17:26 +1100, Dave Chinner wrote:
> Needless to say, KVM is not starting up. This works just fine on
> 2.6.35.1 and so is a regression. I can't do a lot of testing on this as
> the host is the machine that hosts all my build and test environments....

Could it be the same problem as

http://kerneltrap.com/mailarchive/linux-netdev/2010/10/23/6288128

Could you try reverting bee31369ce16fc3898ec9a54161248c9eddb06bc?

Thanks





* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-11-03  7:13 ` Eric Dumazet
@ 2010-11-03 10:34   ` Dave Chinner
  2010-11-03 11:29     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2010-11-03 10:34 UTC
  To: Eric Dumazet; +Cc: linux-kernel, netdev

On Wed, Nov 03, 2010 at 08:13:22AM +0100, Eric Dumazet wrote:
> Could it be the same problem as
> 
> http://kerneltrap.com/mailarchive/linux-netdev/2010/10/23/6288128
> 
> Could you try reverting bee31369ce16fc3898ec9a54161248c9eddb06bc?

It's working fine on 2.6.36 right now, so it's something that came in
with the .37 merge cycle...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-11-03 10:34   ` Dave Chinner
@ 2010-11-03 11:29     ` Dave Chinner
  2010-11-04  0:21       ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2010-11-03 11:29 UTC
  To: Eric Dumazet; +Cc: linux-kernel, netdev

On Wed, Nov 03, 2010 at 09:34:48PM +1100, Dave Chinner wrote:
> It's working fine on 2.6.36 right now, so it's something that came in
> with the .37 merge cycle...

Actually, the machine wasn't running a 2.6.36 kernel (it had booted
to the working .35 kernel and I didn't notice). So I've just tested
a 2.6.36 kernel, and the problem _is present_ in 2.6.36. I've
reverted the above commit but that does not fix the problem.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-11-03 11:29     ` Dave Chinner
@ 2010-11-04  0:21       ` Dave Chinner
  2010-11-04  5:47         ` Américo Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2010-11-04  0:21 UTC
  To: Eric Dumazet; +Cc: linux-kernel, netdev

On Wed, Nov 03, 2010 at 10:29:36PM +1100, Dave Chinner wrote:
> Actually, the machine wasn't running a 2.6.36 kernel (it had booted
> to the working .35 kernel and I didn't notice). So I've just tested
> a 2.6.36 kernel, and the problem _is present_ in 2.6.36. I've
> reverted the above commit but that does not fix the problem.

Ok, so further investigation has shown I can reproduce this on
2.6.32 and 2.6.35. It's not a new bug, nor do I think that it is
a networking bug as it is not specific to the ip command.

The trigger for the problem is actually an upgrade of the sudo
package in Debian unstable, which changed the behaviour of sudo (it
has some per-login/pty restriction on it now). Basically, the startup
script I'm running does:

sudo kvm .....

which then executes the qemu-ifup bash script which does:

	sudo ip ....
	sudo brctl ...

because at one point KVM did not create the tap device automatically
and so kvm could be run as a user with only the ifup script
requiring privileges to create the tap device and mark it up. When
KVM started creating the tap device, I added the sudo to the KVM
script, and everything worked again.

Now if I take the 'sudo' out of the ifup script, the hang goes away.
I first removed it from the ip command, and then the brctl command
hung in the same way the ip command was hanging. Hence my thoughts
that it is not directly related to networking utilities.
Unfortunately, it is not trivial to reproduce as I could only
trigger it through this kvm method, not on the command line. e.g:

$ sudo bash -c "sudo ip link set tap1 up"

does not hang.

This sudo package upgrade coincided with kernel upgrades, and so
that led to my confusion about where it occurred and what triggered
it.  Still, it appears to be a bug that has been around for some
time.....
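
As an illustration of the workaround described above (dropping the
inner sudo when the script already runs as root), here is a minimal
Python sketch. It is not part of the original setup: the function name
privileged_argv and its parameters are assumptions made for the
example, and it only sidesteps the trigger rather than explaining the
hang.

```python
import os


def privileged_argv(argv, euid=None, sudo="/usr/bin/sudo"):
    """Return argv unchanged when already root, else prefixed with sudo.

    euid is injectable for testing (hypothetical parameter); by default
    the effective UID of the calling process is used.
    """
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        # Already privileged (e.g. started via "sudo kvm ..."): no nested sudo.
        return list(argv)
    return [sudo] + list(argv)


if __name__ == "__main__":
    # Print the command lines an ifup step would run for tap0 on br0.
    for cmd in (["/sbin/ip", "link", "set", "tap0", "up"],
                ["/usr/sbin/brctl", "addif", "br0", "tap0"]):
        print(" ".join(privileged_argv(cmd)))
```

When the caller is already root the commands are returned untouched,
so no second sudo is spawned; otherwise each command is prefixed once.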

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-11-04  0:21       ` Dave Chinner
@ 2010-11-04  5:47         ` Américo Wang
  2010-12-08  9:02           ` Florian Mickler
  0 siblings, 1 reply; 14+ messages in thread
From: Américo Wang @ 2010-11-04  5:47 UTC
  To: Dave Chinner; +Cc: Eric Dumazet, linux-kernel, netdev

On Thu, Nov 04, 2010 at 11:21:40AM +1100, Dave Chinner wrote:
>Ok, so further investigation has shown I can reproduce this on
>2.6.32 and 2.6.35. It's not a new bug, nor do I think that it is
>a networking bug as it is not specific to the ip command.
>
>The trigger for the problem is actually an upgrade of the sudo
>package in Debian unstable, which changed the behaviour of sudo (it
>has some per-login/pty restriction on it now). Basically, the startup
>script I'm running does:
>
>sudo kvm .....
>
>which then executes the qemu-ifup bash script which does:
>
>	sudo ip ....
>	sudo brctl ...
>
>because at one point KVM did not create the tap device automatically
>and so kvm could be run as a user with only the ifup script
>requiring privileges to create the tap device and mark it up. When
>KVM started creating the tap device, I added the sudo to the KVM
>script, and everything worked again.
>
>Now if I take the 'sudo' out of the ifup script, the hang goes away.
>I first removed it from the ip command, and then the brctl command
>hung in the same way the ip command was hanging. Hence my thoughts
>that it is not directly related to networking utilities.
>Unfortunately, it is not trivial to reproduce as I could only
>trigger it through this kvm method, not on the command line. e.g:
>
>$ sudo bash -c "sudo ip link set tap1 up"
>
>does not hang.
>
>This sudo package upgrade coincided with kernel upgrades, and so
>that led to my confusion about where it occurred and what triggered
>it.  Still, it appears to be a bug that has been around for some
>time.....
>

Interesting, the scheduler failed to take the dead task off the
run queue, so to me this is likely to be a scheduler bug.
I have no idea how sudo can change the behaviour here.

Another guess is that we need an smp_wmb() before schedule() above.

We need to Cc Oleg and Ingo.


* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-11-04  5:47         ` Américo Wang
@ 2010-12-08  9:02           ` Florian Mickler
  2010-12-08 13:41             ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Mickler @ 2010-12-08  9:02 UTC
  To: Ingo Molnar, Oleg Nesterov
  Cc: Américo Wang, Dave Chinner, Eric Dumazet, linux-kernel,
	netdev

[ Cc'ing Ingo and Oleg, as suggested ]

On Thu, 4 Nov 2010 13:47:18 +0800
Américo Wang <xiyou.wangcong@gmail.com> wrote:

> Interesting, the scheduler failed to take the dead task off the
> run queue, so to me this is likely to be a scheduler bug.
> I have no idea how sudo can change the behaviour here.
> 
> Another guess is that we need an smp_wmb() before schedule() above.
> 
> We need to Cc Oleg and Ingo.


* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-12-08  9:02           ` Florian Mickler
@ 2010-12-08 13:41             ` Oleg Nesterov
  2010-12-08 13:47               ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2010-12-08 13:41 UTC
  To: Florian Mickler
  Cc: Ingo Molnar, Américo Wang, Dave Chinner, Eric Dumazet,
	linux-kernel, netdev

On 12/08, Florian Mickler wrote:
>
> [ ccing Ingo and Oleg ] as suggested

Well, of course I can't explain this bug. But, looking at this email,
I do not see anything strange in exit/schedule/etc.

> > >> > > > This is resulting in the command 'ip link set tap0 up' hanging as a zombie:
> > >> > > >
> > >> > > > root      3005     1  0 16:53 pts/3    00:00:00 /bin/sh /vm-images/qemu-ifup tap0
> > >> > > > root      3011  3005  0 16:53 pts/3    00:00:00 /usr/bin/sudo /sbin/ip link set tap0 up
> > >> > > > root      3012  3011  0 16:53 pts/3    00:00:00 [ip] <defunct>

That is. ip is a zombie.

> > >> > > > In do_exit() with this trace:
> > >> > > >
> > >> > > > [ 1630.782255] ip            x ffff88063fcb3600     0  3012   3011 0x00000000
> > >> > > > [ 1630.789121]  ffff880631328000 0000000000000046 0000000000000000 ffff880633104380
> > >> > > > [ 1630.796524]  0000000000013600 ffff88062f031fd8 0000000000013600 0000000000013600
> > >> > > > [ 1630.803925]  ffff8806313282d8 ffff8806313282e0 ffff880631328000 0000000000013600
> > >> > > > [ 1630.811324] Call Trace:
> > >> > > > [ 1630.813760]  [<ffffffff8104a90d>] ? do_exit+0x716/0x724
> > >> > > > [ 1630.818964]  [<ffffffff8104a995>] ? do_group_exit+0x7a/0xa4
> > >> > > > [ 1630.824512]  [<ffffffff8104a9d1>] ? sys_exit_group+0x12/0x16
> > >> > > > [ 1630.830149]  [<ffffffff81009a82>] ? system_call_fastpath+0x16/0x1b
> > >> > > >
> > >> > > > The address comes down to the schedule() call:
> > >> > > >
> > >> > > > (gdb) l *(do_exit+0x716)
> > >> > > > 0xffffffff8104a90d is in do_exit (kernel/exit.c:1034).
> > >> > > > 1029            preempt_disable();
> > >> > > > 1030            exit_rcu();
> > >> > > > 1031            /* causes final put_task_struct in finish_task_switch(). */
> > >> > > > 1032            tsk->state = TASK_DEAD;
> > >> > > > 1033            schedule();
> > >> > > > 1034            BUG();
> > >> > > > 1035            /* Avoid "noreturn function does return".  */
> > >> > > > 1036            for (;;)
> > >> > > > 1037                    cpu_relax();    /* For when BUG is null */
> > >> > > > 1038    }

Everything is correct. The task is dead, but it wasn't released by its
parent, task_struct (and thus the stack) is still visible.

> > Interesting, the scheduler failed to put the dead task out of
> > run queue, so to me this is likely to be a scheduler bug.
> > I have no idea how sudo can change the behaviour here.
> >
> > Another guess is we need a smp_wmb() before schedule() above.

No, everything looks fine.

For example,

	$ perl -le 'print fork || exit; <>'
	17436

	$ ps 17436
	  PID TTY      STAT   TIME COMMAND
	17436 pts/22   Z+     0:00 [perl] <defunct>

	$ cat /proc/17436/stack
	[<ffffffff8104d3a0>] do_exit+0x6c4/0x6d2
	[<ffffffff8104d429>] do_group_exit+0x7b/0xa4
	[<ffffffff8104d469>] sys_exit_group+0x17/0x1b
	[<ffffffff8100bdb2>] system_call_fastpath+0x16/0x1b
	[<ffffffffffffffff>] 0xffffffffffffffff
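
The same state can be observed without root access to /proc/PID/stack.
The following is only a minimal sketch, assuming a Linux /proc and using
hypothetical helper names (proc_state, make_zombie_and_reap): it forks a
child that exits at once, polls the state letter in /proc/PID/stat until
it reads "Z", and then reaps the child with waitpid().

```python
import os
import time


def proc_state(pid):
    """Return the one-letter task state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The format is "pid (comm) state ..."; comm may itself contain spaces
    # or ')', so split after the last closing parenthesis.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie_and_reap(timeout=5.0):
    """Fork a child that exits immediately.

    Returns (state_before_wait, gone_after_wait).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once, skipping Python cleanup

    # The child needs a moment to get through do_exit(); poll until it is
    # reported as a zombie or the timeout expires.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    os.waitpid(pid, 0)  # parent reaps; only now can the task be released
    gone = not os.path.exists(f"/proc/{pid}")
    return state, gone


if __name__ == "__main__":
    print(make_zombie_and_reap())
```

Until the parent calls waitpid(), the entry stays visible in /proc, which
is all the ps output and the stack trace above are showing.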

Oleg.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-12-08 13:41             ` Oleg Nesterov
@ 2010-12-08 13:47               ` Oleg Nesterov
  2010-12-08 14:08                 ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2010-12-08 13:47 UTC (permalink / raw)
  To: Florian Mickler
  Cc: Ingo Molnar, Américo Wang, Dave Chinner, Eric Dumazet,
	linux-kernel, netdev

On 12/08, Oleg Nesterov wrote:
>
> On 12/08, Florian Mickler wrote:
> >
> > [ ccing Ingo and Oleg ] as suggested
>
> Well. Of course I can't explain this bug. But, looking at this email
> I do not see anything strange in exit/schedule/etc.
>
> > > >> > > > This is resulting in the command 'ip link set tap0 up' hanging as a zombie:
> > > >> > > >
> > > >> > > > root      3005     1  0 16:53 pts/3    00:00:00 /bin/sh /vm-images/qemu-ifup tap0
> > > >> > > > root      3011  3005  0 16:53 pts/3    00:00:00 /usr/bin/sudo /sbin/ip link set tap0 up
> > > >> > > > root      3012  3011  0 16:53 pts/3    00:00:00 [ip] <defunct>
>
> That is. ip is a zombie.

And. I do not know if this matters or not, but "the command 'ip link
set tap0 up' hanging as a zombie" does not look right.

This was spawned by

> >> > > > if [ -n "$1" ];then
> >> > > >         /usr/bin/sudo /sbin/ip link set $1 up
> >> > > >         sleep 0.5s
> >> > > >         /usr/bin/sudo /usr/sbin/brctl addif $switch $1
> >> > > >      exit 0
> >> > > > fi

The command does not hang. But it forks the child with pid == 3012,
this child exits.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-12-08 13:47               ` Oleg Nesterov
@ 2010-12-08 14:08                 ` Oleg Nesterov
  2010-12-09 16:47                   ` Américo Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2010-12-08 14:08 UTC (permalink / raw)
  To: Florian Mickler
  Cc: Ingo Molnar, Américo Wang, Dave Chinner, Eric Dumazet,
	linux-kernel, netdev

On 12/08, Oleg Nesterov wrote:
>
> On 12/08, Oleg Nesterov wrote:
> >
> > On 12/08, Florian Mickler wrote:
> > >
> > > [ ccing Ingo and Oleg ] as suggested
> >
> > Well. Of course I can't explain this bug. But, looking at this email
> > I do not see anything strange in exit/schedule/etc.
> >
> > > > >> > > > This is resulting in the command 'ip link set tap0 up' hanging as a zombie:
> > > > >> > > >
> > > > >> > > > root      3005     1  0 16:53 pts/3    00:00:00 /bin/sh /vm-images/qemu-ifup tap0
> > > > >> > > > root      3011  3005  0 16:53 pts/3    00:00:00 /usr/bin/sudo /sbin/ip link set tap0 up
> > > > >> > > > root      3012  3011  0 16:53 pts/3    00:00:00 [ip] <defunct>
> >
> > That is. ip is a zombie.
>
> And. I do not know if this matters or not, but "the command 'ip link
> set tap0 up' hanging as a zombie" does not look right.
>
> This was spawned by
>
> > >> > > > if [ -n "$1" ];then
> > >> > > >         /usr/bin/sudo /sbin/ip link set $1 up
> > >> > > >         sleep 0.5s
> > >> > > >         /usr/bin/sudo /usr/sbin/brctl addif $switch $1
> > >> > > >      exit 0
> > >> > > > fi
>
> The command does not hang. But it forks the child with pid == 3012,
> this child exits.

Damn, sorry for noise, forgot to mention...

The parent's trace (pid == 3011) can be more useful. Say, if it
hangs in do_wait(), then the kernel is obviously wrong.
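
As a rough illustration of how to get from the zombie to the process
that should be reaping it: the sketch below (Linux only, hypothetical
helper names) reads the State and PPid fields from /proc/PID/status for
a freshly created zombie. With the parent's pid in hand, its kernel
stack could then be inspected via /proc/PPID/stack, which normally needs
root and is not attempted here.

```python
import os
import time


def status_fields(pid):
    """Parse /proc/<pid>/status into a dict of field name -> value."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def zombie_child_and_parent(timeout=5.0):
    """Create a zombie child and report (child_state, parent_pid)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    try:
        deadline = time.monotonic() + timeout
        child = status_fields(pid)
        # Wait until the kernel reports the child as "Z (zombie)".
        while not child["State"].startswith("Z") and time.monotonic() < deadline:
            time.sleep(0.01)
            child = status_fields(pid)
        # PPid names the process whose wait() call is still outstanding.
        return child["State"], int(child["PPid"])
    finally:
        os.waitpid(pid, 0)  # clean up the zombie we created


if __name__ == "__main__":
    print(zombie_child_and_parent())
```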

Oleg.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-12-08 14:08                 ` Oleg Nesterov
@ 2010-12-09 16:47                   ` Américo Wang
  2010-12-09 17:07                     ` Eric Dumazet
  2010-12-09 17:59                     ` Jim Bos
  0 siblings, 2 replies; 14+ messages in thread
From: Américo Wang @ 2010-12-09 16:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Florian Mickler, Ingo Molnar, Américo Wang, Dave Chinner,
	Eric Dumazet, linux-kernel, netdev

On Wed, Dec 08, 2010 at 03:08:22PM +0100, Oleg Nesterov wrote:
>On 12/08, Oleg Nesterov wrote:
>>
>> On 12/08, Oleg Nesterov wrote:
>> >
>> > On 12/08, Florian Mickler wrote:
>> > >
>> > > [ ccing Ingo and Oleg ] as suggested
>> >
>> > Well. Of course I can't explain this bug. But, looking at this email
>> > I do not see anything strange in exit/schedule/etc.
>> >
>> > > > >> > > > This is resulting in the command 'ip link set tap0 up' hanging as a zombie:
>> > > > >> > > >
>> > > > >> > > > root      3005     1  0 16:53 pts/3    00:00:00 /bin/sh /vm-images/qemu-ifup tap0
>> > > > >> > > > root      3011  3005  0 16:53 pts/3    00:00:00 /usr/bin/sudo /sbin/ip link set tap0 up
>> > > > >> > > > root      3012  3011  0 16:53 pts/3    00:00:00 [ip] <defunct>
>> >
>> > That is. ip is a zombie.
>>
>> And. I do not know if this matters or not, but "the command 'ip link
>> set tap0 up' hanging as a zombie" does not look right.
>>
>> This was spawned by
>>
>> > >> > > > if [ -n "$1" ];then
>> > >> > > >         /usr/bin/sudo /sbin/ip link set $1 up
>> > >> > > >         sleep 0.5s
>> > >> > > >         /usr/bin/sudo /usr/sbin/brctl addif $switch $1
>> > >> > > >      exit 0
>> > >> > > > fi
>>
>> The command does not hang. But it forks the child with pid == 3012,
>> this child exits.
>
>Damn, sorry for noise, forgot to mention...
>
>The parent's trace (pid == 3011) can be more useful. Say, if it
>hangs in do_wait(), then the kernel is obviously wrong.
>

Yeah, there is no way a zombie can trigger a BUG_ON in the kernel.
But it is still interesting to know why it becomes a zombie...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-12-09 16:47                   ` Américo Wang
@ 2010-12-09 17:07                     ` Eric Dumazet
  2010-12-09 17:09                       ` Eric Dumazet
  2010-12-09 17:59                     ` Jim Bos
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2010-12-09 17:07 UTC (permalink / raw)
  To: Américo Wang
  Cc: Oleg Nesterov, Florian Mickler, Ingo Molnar, Dave Chinner,
	linux-kernel, netdev

On Friday, 10 December 2010 at 00:47 +0800, Américo Wang wrote:

> Yeah, there is no way a zombie can trigger a BUG_ON in the kernel.
> But it is still interesting to know why it becomes a zombie...
> 

A zombie is very easy to get.

Technically speaking, all processes die and become zombies, unless the
parent called signal(SIGCLD, SIG_IGN) before fork().

The parent is buggy (sudo in this case?) and doesn't call wait() to
'free' one of its children.
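
A small sketch of the SIG_IGN case, for illustration only (the function
name is made up, and SIGCHLD is used as the usual spelling of SIGCLD on
Linux): with the signal ignored before fork(), the exited child is
released by the kernel on its own, so a later waitpid() has nothing to
collect and fails with ECHILD.

```python
import os
import signal


def child_reaped_automatically():
    """Return True if waitpid() finds no child to reap under SIG_IGN."""
    # Ignore SIGCHLD before forking, as in signal(SIGCLD, SIG_IGN).
    old = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        try:
            # Blocks until the child has exited, then fails with ECHILD
            # because no zombie was kept around for the parent.
            os.waitpid(pid, 0)
        except ChildProcessError:
            return True
        return False
    finally:
        # Restore the previous disposition (fall back to the default if
        # the old handler was not installed from Python).
        signal.signal(signal.SIGCHLD, old if old is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print(child_reaped_automatically())
```

Without the SIG_IGN line the same waitpid() call would return the
child's exit status, and skipping it would leave the <defunct> entry
seen in the ps output.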

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-12-09 17:07                     ` Eric Dumazet
@ 2010-12-09 17:09                       ` Eric Dumazet
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2010-12-09 17:09 UTC (permalink / raw)
  To: Américo Wang
  Cc: Oleg Nesterov, Florian Mickler, Ingo Molnar, Dave Chinner,
	linux-kernel, netdev

On Thursday, 9 December 2010 at 18:07 +0100, Eric Dumazet wrote:
> On Friday, 10 December 2010 at 00:47 +0800, Américo Wang wrote:
> 
> > Yeah, there is no way a zombie can trigger a BUG_ON in the kernel.
> > But it is still interesting to know why it becomes a zombie...
> > 
> 
> A zombie is very easy to get.
> 
> Technically speaking, all processes die and become zombies, unless the
> parent called signal(SIGCLD, SIG_IGN) before fork().
> 
> The parent is buggy (sudo in this case?) and doesn't call wait() to
> 'free' one of its children.
> 
> 

Before you ask :)

If the parent dies before the child, the task is re-parented to init.

Then, with namespaces, I don't know what happens (is there one init per
namespace ?)
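
The re-parenting itself is easy to watch from userspace. A minimal
sketch, with a hypothetical function name: an intermediate process forks
a child and exits first, and the orphan reports what getppid() returns
afterwards. Which process adopts it depends on the environment: it is
pid 1 in the simple case, the init of the PID namespace inside a
container, or a "child subreaper" on newer kernels, so the sketch only
checks that the parent changed.

```python
import os
import time


def orphan_new_parent(timeout=5.0):
    """Return (old_parent_pid, new_parent_pid) for an orphaned child."""
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Intermediate process: fork the eventual orphan, then exit at once.
        os.close(r)
        my_pid = os.getpid()
        if os.fork() == 0:
            # Orphan-to-be: wait until getppid() stops naming the old parent.
            deadline = time.monotonic() + timeout
            while os.getppid() == my_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(w, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)

    os.close(w)
    os.waitpid(middle, 0)  # reap the intermediate so it is not left a zombie

    # Read the orphan's report until it closes its end of the pipe.
    data = b""
    while True:
        chunk = os.read(r, 64)
        if not chunk:
            break
        data += chunk
    os.close(r)
    return middle, int(data)


if __name__ == "__main__":
    print(orphan_new_parent())
```

The orphan is reaped by whoever adopted it, which is why a dead parent
does not normally leave zombies behind; only a live parent that never
calls wait() does.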

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
  2010-12-09 16:47                   ` Américo Wang
  2010-12-09 17:07                     ` Eric Dumazet
@ 2010-12-09 17:59                     ` Jim Bos
  1 sibling, 0 replies; 14+ messages in thread
From: Jim Bos @ 2010-12-09 17:59 UTC (permalink / raw)
  To: Américo Wang
  Cc: Oleg Nesterov, Florian Mickler, Ingo Molnar, Dave Chinner,
	Eric Dumazet, linux-kernel, netdev

On 12/09/2010 05:47 PM, Américo Wang wrote:
> On Wed, Dec 08, 2010 at 03:08:22PM +0100, Oleg Nesterov wrote:
>> On 12/08, Oleg Nesterov wrote:
>>>
>>> On 12/08, Oleg Nesterov wrote:
>>>>
>>>> On 12/08, Florian Mickler wrote:
>>>>>
>>>>> [ ccing Ingo and Oleg ] as suggested
>>>>
>>>> Well. Of course I can't explain this bug. But, looking at this email
>>>> I do not see anything strange in exit/schedule/etc.
>>>>
>>>>>>>>>>> This is resulting in the command 'ip link set tap0 up' hanging as a zombie:
>>>>>>>>>>>
>>>>>>>>>>> root      3005     1  0 16:53 pts/3    00:00:00 /bin/sh /vm-images/qemu-ifup tap0
>>>>>>>>>>> root      3011  3005  0 16:53 pts/3    00:00:00 /usr/bin/sudo /sbin/ip link set tap0 up
>>>>>>>>>>> root      3012  3011  0 16:53 pts/3    00:00:00 [ip] <defunct>
>>>>
>>>> That is. ip is a zombie.
>>>
>>> And. I do not know if this matters or not, but "the command 'ip link
>>> set tap0 up' hanging as a zombie" does not look right.
>>>
>>> This was spawned by
>>>
>>>>>>>>> if [ -n "$1" ];then
>>>>>>>>>         /usr/bin/sudo /sbin/ip link set $1 up
>>>>>>>>>         sleep 0.5s
>>>>>>>>>         /usr/bin/sudo /usr/sbin/brctl addif $switch $1
>>>>>>>>>      exit 0
>>>>>>>>> fi
>>>
>>> The command does not hang. But it forks the child with pid == 3012,
>>> this child exits.
>>
>> Damn, sorry for noise, forgot to mention...
>>
>> The parent's trace (pid == 3011) can be more useful. Say, if it
>> hangs in do_wait(), then the kernel is obviously wrong.
>>
> 
> Yeah, there is no way a zombie can trigger a BUG_ON in the kernel.
> But it is still interesting to know why it becomes a zombie...
> 

ip link tap0 up not working might be this issue:

   http://marc.info/?l=linux-netdev&m=128783852132311&w=2

(The latest VirtualBox, 3.2.12, works around this issue.)

Jim

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-12-09 17:59 UTC | newest]

Thread overview: 14+ messages
2010-11-03  6:26 [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit() Dave Chinner
2010-11-03  7:13 ` Eric Dumazet
2010-11-03 10:34   ` Dave Chinner
2010-11-03 11:29     ` Dave Chinner
2010-11-04  0:21       ` Dave Chinner
2010-11-04  5:47         ` Américo Wang
2010-12-08  9:02           ` Florian Mickler
2010-12-08 13:41             ` Oleg Nesterov
2010-12-08 13:47               ` Oleg Nesterov
2010-12-08 14:08                 ` Oleg Nesterov
2010-12-09 16:47                   ` Américo Wang
2010-12-09 17:07                     ` Eric Dumazet
2010-12-09 17:09                       ` Eric Dumazet
2010-12-09 17:59                     ` Jim Bos
