From: "H. J. Lu" <hjl@lucon.org>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org
Subject: Re: Hanging process on SMP machines?
Date: Fri, 10 Sep 2004 16:39:27 +0000
Message-ID: <20040910163927.GA8124@lucon.org>
In-Reply-To: <20040910004549.2df18073.akpm@osdl.org>

On Fri, Sep 10, 2004 at 12:45:49AM -0700, Andrew Morton wrote:
> "H. J. Lu" <hjl@lucon.org> wrote:
> >
> > I notice that a process may hang on SMP machines at random:
> > 
> >  http://bugzilla.kernel.org/show_bug.cgi?id=3332
> > 
> >  I can reliably trigger it within 15 minutes under SMP kernel on P4 HT
> >  and 4-way ia64 machines. Has anyone else seen it?
> 
> It's easy to reproduce on 2-way x86, however it doesn't look like a kernel
> bug.
> 
> 
> akpm      2503  0.0  0.2  3892  640 pts/0    S    00:34   0:00  |           \_ make
> akpm      2504  0.0  0.0  1368  192 pts/0    S    00:34   0:00  |               \_ time expect test.exp
> akpm      2505  0.0  0.5  4776 1276 pts/0    S    00:34   0:00  |                   \_ expect test.exp
> akpm      4726  0.0  0.0     0    0 ?        Z    00:35   0:00  |                       \_ [true] <defunct>
> 
> process 4726 is sleeping at the end of do_exit():
> 
> 
> 	schedule();		<<- here
> 	BUG();
> 	/* Avoid "noreturn function does return".  */
> 	for (;;) ;
> }
> 
> So it has completely exited and is waiting for someone to reap its exit
> code and stack slot via wait4().  So what is its parent up to?
> 
> 
> (gdb) thread 69
> [Switching to thread 69 (Thread 2505)]#0  0xc01653e9 in do_select (n=5, fds=0xcc8b5fa4, 
>     timeout=0xcc8b5fa0) at fs/select.c:257
> 257                     __timeout = schedule_timeout(__timeout);
> (gdb) bt
> #0  0xc01653e9 in do_select (n=5, fds=0xcc8b5fa4, timeout=0xcc8b5fa0) at fs/select.c:257
> #1  0xc016579c in sys_select (n=5, inp=0x804b444, outp=0x804b4c4, exp=0x804b544, tvp=0xbfffe1d0)
>     at fs/select.c:354
> #2  0xc0105e39 in sysenter_past_esp () at arch/i386/kernel/semaphore.c:177
> 
> The parent is sleeping in select() rather than wait()ing for children.
> 

I don't think SIGCHLD is blocked. Shouldn't SIGCHLD interrupt select()?
A UP kernel doesn't have this problem.


H.J.


