* Strange stop-signal behavior in multithreaded program with defunct main
From: Michael Kerrisk @ 2008-10-29 4:29 UTC
To: lkml
Cc: Oleg Nesterov, Alan Cox, Bert Wesarg, Ingo Molnar, Roland McGrath,
Linus Torvalds

Bert Wesarg described a scenario that I quickly replicated on
2.6.28-rc2 (and 2.6.25 -- it's not a regression in 2.6.28-rc)
using the program below: if we have a multithreaded process
with a defunct main thread running on a tty, and that
process is sent a stop signal (either ^Z (SIGTSTP) or a stop
signal sent from another terminal using kill(1)), then:

a) the terminal is locked up; and

b) the program is unresponsive to any other signal, except SIGKILL
   or SIGCONT.

An example run:

    $ ./pthreads_zombie_main 1      # Creates one thread besides main
    0: 0
    0: 1
    0: 2
    ^Z

At this point, no shell prompt appears, and typing ^C (or ^\) has no
effect. The process can be killed (and the terminal restored) by sending
SIGKILL from another terminal. (If one instead types ^C at the terminal,
and then sends SIGCONT from another terminal, then the terminal is restored
and the program can be seen (via $?) to have terminated because of
SIGINT.)

I'm (wildly) guessing that there is some problem in the terminal driver's
understanding of the state and identity of the foreground job, but am not
sure how to analyze this further. (I couldn't find a bug report or LKML
thread that seemed to describe exactly this problem.) Ideas?

Cheers,

Michael

/* pthreads_zombie_main.c */

#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <assert.h>

/* Report a pthreads-style error number via perror() and exit */
#define errExitEN(en, msg) { errno = en; perror(msg); \
                             exit(EXIT_FAILURE); }

/* Each thread prints its number and a counter every 3 seconds, forever */
static void *
thread_start(void *arg)
{
    int tnum = (int) (intptr_t) arg;    /* Thread number passed by value */
    int j;

    for (j = 0; ; j++) {
        sleep(3);
        printf("%d: %d\n", tnum, j);
    }
}

int
main(int argc, char *argv[])
{
    int s, tnum;
    pthread_t thr;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <num-threads>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    for (tnum = 0; tnum < atoi(argv[1]); tnum++) {
        s = pthread_create(&thr, NULL, &thread_start,
                           (void *) (intptr_t) tnum);
        if (s != 0)
            errExitEN(s, "pthread_create");
    }

    /* Main thread terminates here; the other threads keep running */
    pthread_exit(NULL);
}
* Re: Strange stop-signal behavior in multithreaded program with defunct main
From: Oleg Nesterov @ 2008-10-30 11:00 UTC
To: Michael Kerrisk
Cc: lkml, Alan Cox, Bert Wesarg, Ingo Molnar, Roland McGrath,
Linus Torvalds
On 10/28, Michael Kerrisk wrote:
>
> Bert Wesarg described a scenario that I quickly replicated on
> 2.6.28-rc2 (and 2.6.25 -- it's not a regression in 2.6.28-rc)
> using the program below: if we have a multithreaded process
> with a defunct main thread running on a tty, and that
> process is sent a stop signal (either ^Z (SIGTSTP) or a stop
> signal sent from another terminal using kill(1)), then:
>
> a) the terminal is locked up; and
>
> b) the program is unresponsive to any other signal, except SIGKILL
> or SIGCONT.
Yes, known problem. Please look at
[RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
http://marc.info/?t=119713920000003
Oleg.
* Re: Strange stop-signal behavior in multithreaded program with defunct main
From: Michael Kerrisk @ 2008-10-30 15:55 UTC
To: Oleg Nesterov
Cc: lkml, Alan Cox, Bert Wesarg, Ingo Molnar, Roland McGrath,
Linus Torvalds
Hi Oleg,
On Thu, Oct 30, 2008 at 6:00 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 10/28, Michael Kerrisk wrote:
>>
>> Bert Wesarg described a scenario that I quickly replicated on
>> 2.6.28-rc2 (and 2.6.25 -- it's not a regression in 2.6.28-rc)
>> using the program below: if we have a multithreaded process
>> with a defunct main thread running on a tty, and that
>> process is sent a stop signal (either ^Z (SIGTSTP) or a stop
>> signal sent from another terminal using kill(1)), then:
>>
>> a) the terminal is locked up; and
>>
>> b) the program is unresponsive to any other signal, except SIGKILL
>> or SIGCONT.
>
> Yes, known problem. Please look at
>
> [RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
> http://marc.info/?t=119713920000003
Okay -- thanks for the info. I've added some text to man-pages to
cover this bug.
Cheers,
Michael
--- a/man3/pthread_exit.3
+++ b/man3/pthread_exit.3
@@ -21,7 +21,7 @@
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
-.TH PTHREAD_EXIT 3 2008-10-24 "Linux" "Linux Programmer's Manual"
+.TH PTHREAD_EXIT 3 2008-10-30 "Linux" "Linux Programmer's Manual"
.SH NAME
pthread_exit \- terminate calling thread
.SH SYNOPSIS
@@ -87,6 +87,18 @@ The value pointed to by
.IR retval
should not be located on the calling thread's stack,
since the contents of that stack are undefined after the thread terminates.
+.SH BUGS
+Currently,
+.\" Linux 2.6.27
+there are limitations in the kernel implementation logic for
+.BR wait (2)ing
+on a stopped thread group with a dead thread group leader.
+This can manifest in problems such as a locked terminal if a stop signal is
+sent to a foreground process whose thread group leader has already called
+.BR pthread_exit (3).
+.\" FIXME . review a later kernel to see if this gets fixed
+.\" http://thread.gmane.org/gmane.linux.kernel/611611
+.\" http://marc.info/?l=linux-kernel&m=122525468300823&w=2
.SH SEE ALSO
.BR pthread_create (3),
.BR pthread_join (3),
* Re: Strange stop-signal behavior in multithreaded program with defunct main
From: Oleg Nesterov @ 2008-10-30 18:10 UTC
To: Michael Kerrisk
Cc: lkml, Alan Cox, Bert Wesarg, Ingo Molnar, Roland McGrath,
Linus Torvalds
On 10/30, Michael Kerrisk wrote:
>
> On Thu, Oct 30, 2008 at 6:00 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 10/28, Michael Kerrisk wrote:
> >>
> >> Bert Wesarg described a scenario that I quickly replicated on
> >> 2.6.28-rc2 (and 2.6.25 -- it's not a regression in 2.6.28-rc)
> >> using the program below: if we have a multithreaded process
> >> with a defunct main thread running on a tty, and that
> >> process is sent a stop signal (either ^Z (SIGTSTP) or a stop
> >> signal sent from another terminal using kill(1)), then:
> >>
> >> a) the terminal is locked up; and
> >>
> >> b) the program is unresponsive to any other signal, except SIGKILL
> >> or SIGCONT.
> >
> > Yes, known problem. Please look at
> >
> > [RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
> > http://marc.info/?t=119713920000003
>
> Okay -- thanks for the info. I've added some text to man-pages to
> cover this bug.
Well, we should fix this bug, of course.
I'll try to redo my old patch, but fyi I am very busy right now, and
most probably I will be completely offline during the next week.
Oleg.