From: Oleg Nesterov <oleg@redhat.com>
To: Florian Mickler <florian@mickler.org>
Cc: "Ingo Molnar" <mingo@elte.hu>,
	"Américo Wang" <xiyou.wangcong@gmail.com>,
	"Dave Chinner" <david@fromorbit.com>,
	"Eric Dumazet" <eric.dumazet@gmail.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
Date: Wed, 8 Dec 2010 14:41:16 +0100
Message-ID: <20101208134116.GA16923@redhat.com>
In-Reply-To: <20101208100245.01cf23c5@schatten.dmk.lab>

On 12/08, Florian Mickler wrote:
>
> [ ccing Ingo and Oleg ] as suggested

Well. Of course I can't explain this bug. But, looking at this email,
I do not see anything strange in exit/schedule/etc.

> > >> > > > This is resulting in the command 'ip link set tap0 up' hanging as a zombie:
> > >> > > >
> > >> > > > root      3005     1  0 16:53 pts/3    00:00:00 /bin/sh /vm-images/qemu-ifup tap0
> > >> > > > root      3011  3005  0 16:53 pts/3    00:00:00 /usr/bin/sudo /sbin/ip link set tap0 up
> > >> > > > root      3012  3011  0 16:53 pts/3    00:00:00 [ip] <defunct>

That is, ip is a zombie.

> > >> > > > In do_exit() with this trace:
> > >> > > >
> > >> > > > [ 1630.782255] ip            x ffff88063fcb3600     0  3012   3011 0x00000000
> > >> > > > [ 1630.789121]  ffff880631328000 0000000000000046 0000000000000000 ffff880633104380
> > >> > > > [ 1630.796524]  0000000000013600 ffff88062f031fd8 0000000000013600 0000000000013600
> > >> > > > [ 1630.803925]  ffff8806313282d8 ffff8806313282e0 ffff880631328000 0000000000013600
> > >> > > > [ 1630.811324] Call Trace:
> > >> > > > [ 1630.813760]  [<ffffffff8104a90d>] ? do_exit+0x716/0x724
> > >> > > > [ 1630.818964]  [<ffffffff8104a995>] ? do_group_exit+0x7a/0xa4
> > >> > > > [ 1630.824512]  [<ffffffff8104a9d1>] ? sys_exit_group+0x12/0x16
> > >> > > > [ 1630.830149]  [<ffffffff81009a82>] ? system_call_fastpath+0x16/0x1b
> > >> > > >
> > >> > > > The address comes down to the schedule() call:
> > >> > > >
> > >> > > > (gdb) l *(do_exit+0x716)
> > >> > > > 0xffffffff8104a90d is in do_exit (kernel/exit.c:1034).
> > >> > > > 1029            preempt_disable();
> > >> > > > 1030            exit_rcu();
> > >> > > > 1031            /* causes final put_task_struct in finish_task_switch(). */
> > >> > > > 1032            tsk->state = TASK_DEAD;
> > >> > > > 1033            schedule();
> > >> > > > 1034            BUG();
> > >> > > > 1035            /* Avoid "noreturn function does return".  */
> > >> > > > 1036            for (;;)
> > >> > > > 1037                    cpu_relax();    /* For when BUG is null */
> > >> > > > 1038    }

Everything is correct. The task is dead, but its parent has not released
(reaped) it yet, so its task_struct (and thus its stack) is still visible.

> > Interesting, the scheduler failed to put the dead task out of
> > run queue, so to me this is likely to be a scheduler bug.
> > I have no idea how sudo can change the behaviour here.
> >
> > Another guess is we need a smp_wmb() before schedule() above.

No, everything looks fine.

For example,

	$ perl -le 'print fork || exit; <>'
	17436

	$ ps 17436
	  PID TTY      STAT   TIME COMMAND
	17436 pts/22   Z+     0:00 [perl] <defunct>

	$ cat /proc/17436/stack
	[<ffffffff8104d3a0>] do_exit+0x6c4/0x6d2
	[<ffffffff8104d429>] do_group_exit+0x7b/0xa4
	[<ffffffff8104d469>] sys_exit_group+0x17/0x1b
	[<ffffffff8100bdb2>] system_call_fastpath+0x16/0x1b
	[<ffffffffffffffff>] 0xffffffffffffffff

Oleg.

