From: Andrew Morton <akpm@osdl.org>
To: "H. J. Lu" <hjl@lucon.org>
Cc: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org
Subject: Re: Hanging process on SMP machines?
Date: Fri, 10 Sep 2004 07:45:49 +0000
Message-ID: <20040910004549.2df18073.akpm@osdl.org>
In-Reply-To: <20040909201321.GA21492@lucon.org>
"H. J. Lu" <hjl@lucon.org> wrote:
>
> I notice that a process may hang on SMP machines at random:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=3332
>
> I can reliably trigger it within 15 minutes under SMP kernel on P4 HT
> and 4-way ia64 machines. Has anyone else seen it?
It's easy to reproduce on 2-way x86, however it doesn't look like a kernel
bug.
akpm 2503 0.0 0.2 3892 640 pts/0 S 00:34 0:00 | \_ make
akpm 2504 0.0 0.0 1368 192 pts/0 S 00:34 0:00 | \_ time expect test.exp
akpm 2505 0.0 0.5 4776 1276 pts/0 S 00:34 0:00 | \_ expect test.exp
akpm 4726 0.0 0.0 0 0 ? Z 00:35 0:00 | \_ [true] <defunct>
process 4726 is sleeping at the end of do_exit():
schedule(); <<- here
BUG();
/* Avoid "noreturn function does return". */
for (;;) ;
}
So it has completely exited and is waiting for someone to reap its exit
code and stack slot via wait4(). So what is its parent up to?
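[Editorial illustration, not part of the original message.] The zombie state described above can be observed from userspace. The following is a minimal sketch, assuming Linux with /proc mounted; the helper names proc_state() and state_of_unreaped_child() are hypothetical and do not come from the thread. It forks a child that exits at once, uses waitid() with WNOWAIT to wait for the exit without reaping, reads the state letter from /proc/<pid>/stat, and only then reaps the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the state letter from /proc/<pid>/stat
 * ('Z' means zombie), or '?' on any error. Linux-specific. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    if (!ok)
        return '?';

    /* Layout is "pid (comm) S ..."; comm may itself contain ')',
     * so search for the closing parenthesis from the right. */
    char *p = strrchr(buf, ')');
    return (p && p[1] == ' ' && p[2]) ? p[2] : '?';
}

/* Hypothetical helper: fork a child that exits immediately (much like
 * "true") and report its state while the parent has not yet reaped it. */
char state_of_unreaped_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(0);

    /* WNOWAIT: block until the child has exited, but leave it unreaped. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return '?';

    char state = proc_state(pid);   /* expected to be 'Z' at this point */
    waitpid(pid, NULL, 0);          /* reap: the process table entry goes away */
    return state;
}
```

Calling state_of_unreaped_child() from a small main() should yield 'Z', the same state ps shows for process 4726, and the entry disappears as soon as the parent calls waitpid().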
(gdb) thread 69
[Switching to thread 69 (Thread 2505)]#0 0xc01653e9 in do_select (n=5, fds=0xcc8b5fa4,
timeout=0xcc8b5fa0) at fs/select.c:257
257 __timeout = schedule_timeout(__timeout);
(gdb) bt
#0 0xc01653e9 in do_select (n=5, fds=0xcc8b5fa4, timeout=0xcc8b5fa0) at fs/select.c:257
#1 0xc016579c in sys_select (n=5, inp=0x804b444, outp=0x804b4c4, exp=0x804b544, tvp=0xbfffe1d0)
at fs/select.c:354
#2 0xc0105e39 in sysenter_past_esp () at arch/i386/kernel/semaphore.c:177
The parent is sleeping in select() rather than wait()ing for children.
(gdb) f 0
#0 0xc01653e9 in do_select (n=5, fds=0xcc8b5fa4, timeout=0xcc8b5fa0) at fs/select.c:257
257 __timeout = schedule_timeout(__timeout);
(gdb) p __timeout
$1 = 3000000
For 3,000 seconds.
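[Editorial illustration, not part of the original message.] The figure follows from the tick rate: schedule_timeout() takes its argument in jiffies, and 3000000 jiffies equal 3,000 seconds only at 1000 ticks per second. That HZ value is an assumption inferred from the two numbers in the message; the message does not state it. A minimal sketch of the arithmetic, with the hypothetical helper jiffies_to_secs():

```c
#include <assert.h>

/* Hypothetical helper: convert a jiffies count to whole seconds.
 * hz is the tick rate in ticks per second (assumed to be 1000 here,
 * inferred from the figures in the message). */
unsigned long jiffies_to_secs(unsigned long jiffies, unsigned long hz)
{
    return jiffies / hz;
}
```

With hz = 1000, jiffies_to_secs(3000000, 1000) gives 3000, matching the script's timeout value of 3000.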
I changed your 3000 to 30 and lo, the script hangs for 30 seconds every now
and then, and then resumes.
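
[Editorial illustration, not part of the original message.] The behaviour described above, a parent that sits out its select() timeout while an exited child waits to be reaped, can be sketched in isolation. This is not a reproduction of the expect hang and says nothing about why expect ended up in select(); it only shows that, with the default SIGCHLD disposition and no ready descriptors, select() sleeps for the full timeout even though a zombie child is already present. The helper names now_ms() and select_sleep_with_dead_child() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: milliseconds on the monotonic clock. */
long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Hypothetical helper: fork a child that exits immediately, wait (without
 * reaping) until it has exited, then sleep in select() with no descriptors
 * and the given timeout. Returns how long select() slept in milliseconds,
 * or -1 on error or if select() did not time out. */
long select_sleep_with_dead_child(long timeout_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);

    /* The child is a zombie once this returns; WNOWAIT leaves it unreaped. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    struct timeval tv = {
        .tv_sec  = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    long start = now_ms();
    int rc = select(0, NULL, NULL, NULL, &tv);  /* nothing can become ready */
    long slept = now_ms() - start;

    waitpid(pid, NULL, 0);                      /* only now is the child reaped */
    return rc == 0 ? slept : -1;
}
```

Calling select_sleep_with_dead_child(200) should return roughly 200: the exited child does not shorten the sleep, which is consistent with the hang lasting as long as the script's timeout.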