From: "H. J. Lu" <hjl@lucon.org>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org
Subject: Re: Hanging process on SMP machines?
Date: Fri, 10 Sep 2004 16:39:27 +0000
Message-ID: <20040910163927.GA8124@lucon.org>
In-Reply-To: <20040910004549.2df18073.akpm@osdl.org>

On Fri, Sep 10, 2004 at 12:45:49AM -0700, Andrew Morton wrote:
> "H. J. Lu" <hjl@lucon.org> wrote:
> >
> > I notice that a process may hang on SMP machines at random:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=3332
> >
> > I can reliably trigger it within 15 minutes under SMP kernel on P4 HT
> > and 4-way ia64 machines. Has anyone else seen it?
>
> It's easy to reproduce on 2-way x86, however it doesn't look like a kernel
> bug.
>
>
> akpm 2503 0.0 0.2 3892 640 pts/0 S 00:34 0:00 | \_ make
> akpm 2504 0.0 0.0 1368 192 pts/0 S 00:34 0:00 | \_ time expect test.exp
> akpm 2505 0.0 0.5 4776 1276 pts/0 S 00:34 0:00 | \_ expect test.exp
> akpm 4726 0.0 0.0 0 0 ? Z 00:35 0:00 | \_ [true] <defunct>
>
> process 4726 is sleeping at the end of do_exit():
>
>
> schedule(); <<- here
> BUG();
> /* Avoid "noreturn function does return". */
> for (;;) ;
> }
>
> So it has completely exited and is waiting for someone to reap its exit
> code and stack slot via wait4(). So what is its parent up to?
>
>
> (gdb) thread 69
> [Switching to thread 69 (Thread 2505)]#0 0xc01653e9 in do_select (n=5, fds=0xcc8b5fa4,
> timeout=0xcc8b5fa0) at fs/select.c:257
> 257 __timeout = schedule_timeout(__timeout);
> (gdb) bt
> #0 0xc01653e9 in do_select (n=5, fds=0xcc8b5fa4, timeout=0xcc8b5fa0) at fs/select.c:257
> #1 0xc016579c in sys_select (n=5, inp=0x804b444, outp=0x804b4c4, exp=0x804b544, tvp=0xbfffe1d0)
> at fs/select.c:354
> #2 0xc0105e39 in sysenter_past_esp () at arch/i386/kernel/semaphore.c:177
>
> The parent is sleeping in select() rather than wait()ing for children.
>
I don't think SIGCHLD is blocked. Shouldn't SIGCHLD interrupt select()?
The UP kernel doesn't have this problem.

H.J.