public inbox for linux-kernel@vger.kernel.org

* main thread pthread_exit/sys_exit bug!
@ 2009-02-01 22:32 Kaz Kylheku
       [not found] ` <20090201174159.4a52e15c.akpm@linux-foundation.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Kaz Kylheku @ 2009-02-01 22:32 UTC (permalink / raw)
  To: linux-kernel

Basically, if you call pthread_exit from the main thread of a process, and keep
other threads running, the behavior is ugly.

I logged this initially as a bug against glibc, but then resolved it
with a kernel patch against linux 2.6.26:

Please see:

http://sources.redhat.com/bugzilla/show_bug.cgi?id=9804

I've known about this for some time, first having reproduced it on 2.6.17;
finally got around to fixing it.

When the main thread of a POSIX threads process calls pthread_exit, the process
should stick around until all the other threads do the same, or until one of
them calls _exit or exit, or until the process terminates abnormally.  During
this time, it would be nice if the process behaved normally: if it did not
appear defunct in the process list and if POSIX job control was possible on it.
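
The situation described above can be reproduced with a few lines of C. What follows is only a minimal sketch, not the reproducer attached to the bug report: the names `worker` and `start_workers`, the `REPRO_MAIN` guard, and the thread count and sleep time are all made up for illustration.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Worker body: sleep for the requested number of seconds, then return. */
void *worker(void *arg)
{
	sleep((unsigned int)(uintptr_t)arg);
	return NULL;
}

/* Start up to n detached workers; returns how many were actually created. */
int start_workers(int n, unsigned int secs)
{
	int started = 0;

	for (int i = 0; i < n; i++) {
		pthread_t t;

		if (pthread_create(&t, NULL, worker, (void *)(uintptr_t)secs) != 0)
			break;
		pthread_detach(t);
		started++;
	}
	return started;
}

#ifdef REPRO_MAIN
int main(void)
{
	printf("pid %ld: started %d workers\n",
	       (long)getpid(), start_workers(5, 60));
	fflush(stdout);
	/* Only the initial thread terminates; the workers keep the process alive. */
	pthread_exit(NULL);
}
#endif
```

Built with something like `cc -pthread -DREPRO_MAIN repro.c`, the program prints its PID and stays around for about a minute after the main thread has exited, which leaves time to inspect it with `ps -L -p <pid>` or to try job control on it.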

An easy way to achieve this is to insert a wait into the top of sys_exit, so
that do_exit is not called unless all the other threads have terminated.  This
is another special case like do_group_exit. In the group exit, we zap the other
threads. In this case, we must not zap the other threads, but neither should we
fall through do_exit and become defunct!

The patch involves a controversial move: returning -ERESTARTSYS from sys_exit.
This is because the main thread may be stuck in sys_exit and have to respond to
a signal, and then go back to sys_exit. It appears to be working fine.

^ permalink raw reply	[flat|nested] 20+ messages in thread

[parent not found: <20090201174159.4a52e15c.akpm@linux-foundation.org>]

* Re: main thread pthread_exit/sys_exit bug!
       [not found] ` <20090201174159.4a52e15c.akpm@linux-foundation.org>
@ 2009-02-02  6:45   ` Oleg Nesterov
  2009-02-02  7:10     ` Kaz Kylheku
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2009-02-02  6:45 UTC (permalink / raw)
  To: Kaz Kylheku, Andrew Morton; +Cc: linux-kernel

Kaz Kylheku wrote:
>
> Basically, if you call pthread_exit from the main thread of a process, and keep
> other threads running, the behavior is ugly.

Yes, known problem.

Please look at

        [RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
        http://marc.info/?t=119713920000003

I'll try to re-do and re-send this patch this week.

Oleg.


* Re: main thread pthread_exit/sys_exit bug!
  2009-02-02  6:45   ` Oleg Nesterov
@ 2009-02-02  7:10     ` Kaz Kylheku
  2009-02-02 16:56       ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Kaz Kylheku @ 2009-02-02  7:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Oleg Nesterov

On Sun, Feb 1, 2009 at 10:45 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> Kaz Kylheku wrote:
>>
>> Basically, if you call pthread_exit from the main thread of a process, and keep
>> other threads running, the behavior is ugly.
>
> Yes, known problem.
>
> Please look at
>
>        [RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
>        http://marc.info/?t=119713920000003
>
> I'll try to re-do and re-send this patch this week.

I believe that my straight-forward fix is pretty much good to go. I
checked it into my distro, so we will see how it holds up.

It's a bad idea to allow the main thread to terminate. It should stick
around because it serves as a facade for the process as a whole. If
the main thread is allowed to bail all the way through do_exit, who
knows what kind of problems may show up because of that.

What if one of my developers is working on a server which has called
pthread_exit in the main thread, and wants to attach gdb to it? Will
that work if the main thread (a.k.a group leader) is a defunct
process?

I just tried this test case and it worked perfectly with my patch! gdb
attached to the process by the pid of the group leader. It correctly
showed that thread as stopped in __exit_thread. I can see the
other threads, etc.

bash:~# /projects/sw/kaz/bug-repro-programs/pthread-exit &
[1] 2093
bash:~# gdb -p 2093
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "mips64-linux".
Attaching to process 2093
Reading symbols from /projects/sw/kaz/bug-repro-programs/pthread-exit...done.
Reading symbols from /lib32/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x2d46c4b0 (LWP 2098)]
[New Thread 0x2cc6c4b0 (LWP 2097)]
[New Thread 0x2c46c4b0 (LWP 2096)]
[New Thread 0x2bc6c4b0 (LWP 2095)]
[New Thread 0x2b46c4b0 (LWP 2094)]
Loaded symbols for /lib32/libpthread.so.0
Reading symbols from /lib32/libc.so.6...done.
Loaded symbols for /lib32/libc.so.6
Reading symbols from /lib32/ld.so.1...done.
Loaded symbols for /lib32/ld.so.1
Reading symbols from /lib32/libgcc_s.so.1...done.
Loaded symbols for /lib32/libgcc_s.so.1
0x2abd3da4 in __exit_thread () from /lib32/libc.so.6
(gdb) where
#0  0x2abd3da4 in __exit_thread () from /lib32/libc.so.6
#1  0x2ab18ab0 in __libc_start_main (main=0x10000710 <main>, argc=1,
    ubp_av=0x7fcd7524, init=<value optimized out>, fini=<value optimized out>,
    rtld_fini=<value optimized out>, stack_end=<value optimized out>)
    at libc-start.c:245
#2  0x100005dc in _ftext ()

If I try this on an unpatched kernel that allows a main thread to bail
through do_exit, this is what happens:

Attaching to process 14651
ptrace: No such process.
/root/14651: No such file or directory.

I don't think that this is solved by any patch that allows the group
leader to bail through do_exit. It's not just a problem of waiting on
a dead group leader. If you want to maintain the illusion that the OS
provides a process that contains threads, and the group leader is the
representation of that process, then you have to keep that leader
alive; the lifetime of that leader cannot be shorter than that of the
process illusion.

Patch:

http://sourceware.org/bugzilla/attachment.cgi?id=3702

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-02  7:10     ` Kaz Kylheku
@ 2009-02-02 16:56       ` Oleg Nesterov
  2009-02-02 20:10         ` Kaz Kylheku
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Oleg Nesterov @ 2009-02-02 16:56 UTC (permalink / raw)
  To: Kaz Kylheku; +Cc: linux-kernel, Andrew Morton, Roland McGrath, Ulrich Drepper

On 02/01, Kaz Kylheku wrote:
>
> On Sun, Feb 1, 2009 at 10:45 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > Kaz Kylheku wrote:
> >>
> >> Basically, if you call pthread_exit from the main thread of a process, and keep
> >> other threads running, the behavior is ugly.
> >
> > Yes, known problem.
> >
> > Please look at
> >
> >        [RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
> >        http://marc.info/?t=119713920000003
> >
> > I'll try to re-do and re-send this patch this week.
>
> I believe that my straight-forward fix is pretty much good to go. I
> checked it into my distro, so we will see how it holds up.

(the patch: http://sourceware.org/bugzilla/attachment.cgi?id=3702)

Well, perhaps something like your patch makes sense.

	+/*
	+ * A single thread is exiting, and it is the leader of the group.
	+ * This corresponds to the main thread calling pthread_exit.
	+ * In this case, the process persists until all the other
	+ * threads call pthread_exit, or someone calls exit or _exit.
	+ * To implement this case, we park the thread leader in
	+ * a loop which waits until the thread list becomes empty,
	+ * or it receives a fatal signal.
                           ^^^^^^^^^^^^^^
The comment is not exactly right, we return on every signal, not only fatal.

	+int
	+do_leader_exit(void)
	+{
	+	int ret = 0;
	+
	+	if (unlikely(!thread_group_empty(current))) {
	+		DECLARE_WAITQUEUE(wait, current);
	+
	+		add_wait_queue(&current->signal->wait_chldexit, &wait);
	+
	+		set_current_state(TASK_INTERRUPTIBLE);
	+
	+		do {
	+			if (thread_group_empty(current))
	+				break;
	+
	+			try_to_freeze();
	+			schedule();
	+			if (signal_pending(current))
	+				ret = -ERESTARTSYS;
	+		} while (ret == 0);
	+
	+		remove_wait_queue(&current->signal->wait_chldexit, &wait);
	+
	+		__set_current_state(TASK_RUNNING);
	+	}
	+
	+	return ret;
	+}

the above is just the open-coded
wait_event_interruptible(current->signal->wait_chldexit,
				thread_group_empty(current));


	 asmlinkage long sys_exit(int error_code)
	 {
	+	if (thread_group_leader(current)) {
	+		int ret = do_leader_exit();
	+		if (ret != 0)
	+			return ret;
	+	}
		do_exit((error_code&0xff)<<8);
	 }

afaics, -ERESTARTSYS is not exactly correct. We can dequeue the
signal without SA_RESTART. But we never should abort sys_exit().
ERESTARTNOINTR is better. But see below.
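
The SA_RESTART point can be seen from user space with an ordinary blocking system call. This is just a sketch under assumptions: `read_with_signal` and `on_alarm` are names invented here, and a pipe read stands in for any call that returns -ERESTARTSYS internally. A handler installed without SA_RESTART makes the call fail with EINTR; with SA_RESTART the call is transparently restarted. ERESTARTNOINTR would restart in both cases.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int pipe_fds[2];

/* SIGALRM handler: make one byte available so a restarted read() can finish. */
static void on_alarm(int sig)
{
	char c = 'x';
	ssize_t n = write(pipe_fds[1], &c, 1);	/* write() is async-signal-safe */

	(void)sig;
	(void)n;
}

/*
 * Block in read() on an empty pipe and let SIGALRM interrupt it.
 * Returns 1 if read() delivered the byte, otherwise -errno (e.g. -EINTR).
 */
int read_with_signal(int use_sa_restart)
{
	struct sigaction sa;
	struct itimerval tv;
	char c;
	ssize_t n;
	int saved;

	if (pipe(pipe_fds) != 0)
		return -errno;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
	sigaction(SIGALRM, &sa, NULL);

	memset(&tv, 0, sizeof(tv));
	tv.it_value.tv_usec = 100000;		/* one shot after 100 ms */
	setitimer(ITIMER_REAL, &tv, NULL);

	n = read(pipe_fds[0], &c, 1);		/* blocks until the signal arrives */
	saved = errno;

	close(pipe_fds[0]);
	close(pipe_fds[1]);
	return n == 1 ? 1 : -saved;
}
```

Calling `read_with_signal(0)` and then `read_with_signal(1)` shows the two outcomes side by side. The analogy to sys_exit() is that a handler without SA_RESTART would turn -ERESTARTSYS into a visible failure of a call that user space never expects to return.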

I am worried this patch can confuse the user-space. Because, when
the main thread does sys_exit(), the user-space has every right
to assume it exits ;) But with this patch the main thread will
continue to handle the signals until the whole group exits, I'm
afraid libpthread.so won't be happy.

And what if the signal handler does siglongjmp() and aborts sys_exit() ?

> It's a bad idea to allow the main thread to terminate. It should stick
> around because it serves as a facade for the process as a whole. If
> the main thread is allowed to bail all the way through do_exit, who
> knows what kind of problems may show up because of that.
>
> What if one of my developers is working on a server which has called
> pthread_exit in the main thread, and wants to attach gdb to it? Will
> that work if the main thread (a.k.a group leader) is a defunct
> process?

And I think gdb is wrong. It can see this process has other live threads,
and attach to them. I didn't check gdb, but iirc "strace -f" works
correctly in this case.
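
For illustration, the other live threads are visible to any tool through /proc/<pid>/task even when the leader is a zombie. The sketch below is not what gdb or strace do internally; `task_state` and `count_live_tasks` are hypothetical helpers that read the state letter from each thread's `stat` file.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the state letter of thread tid in process pid, or 0 on error. */
char task_state(pid_t pid, pid_t tid)
{
	char path[64], buf[512];
	char *p;
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%ld/task/%ld/stat",
		 (long)pid, (long)tid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/* Format is "tid (comm) S ..."; comm may itself contain ')'. */
	p = strrchr(buf, ')');
	return (p && p[1] == ' ' && p[2]) ? p[2] : 0;
}

/* Count threads of pid that are neither zombie ('Z') nor dead ('X'). */
int count_live_tasks(pid_t pid)
{
	char path[64];
	struct dirent *e;
	int live = 0;
	DIR *d;

	snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
	d = opendir(path);
	if (!d)
		return -1;
	while ((e = readdir(d)) != NULL) {
		char s;

		if (e->d_name[0] < '0' || e->d_name[0] > '9')
			continue;	/* skip "." and ".." */
		s = task_state(pid, (pid_t)atol(e->d_name));
		if (s && s != 'Z' && s != 'X')
			live++;
	}
	closedir(d);
	return live;
}
```

A tool could call `count_live_tasks(pid)` on the PID it was given and, if the leader itself reports 'Z', attach to one of the remaining thread IDs instead.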

I cced other people. Let's see what they think.

Oleg.


* Re: main thread pthread_exit/sys_exit bug!
  2009-02-02 16:56       ` Oleg Nesterov
@ 2009-02-02 20:10         ` Kaz Kylheku
  2009-02-02 20:17         ` Ulrich Drepper
  2009-02-05  3:05         ` Roland McGrath
  2 siblings, 0 replies; 20+ messages in thread
From: Kaz Kylheku @ 2009-02-02 20:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Roland McGrath, Ulrich Drepper, Oleg Nesterov

On Mon, Feb 2, 2009 at 8:56 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Well, perhaps something like your patch makes sense.
>
>        +/*
>        + * A single thread is exiting, and it is the leader of the group.
>        + * This corresponds to the main thread calling pthread_exit.
>        + * In this case, the process persists until all the other
>        + * threads call pthread_exit, or someone calls exit or _exit.
>        + * To implement this case, we park the thread leader in
>        + * a loop which waits until the thread list becomes empty,
>        + * or it receives a fatal signal.
>                           ^^^^^^^^^^^^^^
> The comment is not exactly right, we return on every signal, not only fatal.

Yes; that's stale text, referring to some earlier code that had been changed.

> the above is just the open-coded
> wait_event_interruptible(current->signal->wait_chldexit,
>                                thread_group_empty(current));

Terrific! I will make the change locally.

Btw, it's ``hand coded''.  ``Open coding'' is what both programmers and macros
do. Hand coding is the dumb thing I did instead of using the macro to open
code it. :)

> I am worried this patch can confuse the user-space. Because, when
> the main thread does sys_exit(), the user-space has every right
> to assume it exits ;)

Only glibc knows about sys_exit. (Or are there run-times for other language
implementations that have their own binding to sys_exit, bypassing pthreads?)

The POSIX interface used by applications is pthread_exit, and there is no
assumption there about it being an exit-like system call. It has a number of
standard-defined user-space chores to do, in fact.

> But with this patch the main thread will
> continue to handle the signals until the whole group exits, I'm
> afraid libpthread.so won't be happy.

> And what if the signal handler does siglongjmp() and aborts sys_exit() ?

pthread_exit performs cleanup unwinding (required by Unix and POSIX) and
destruction of thread-specific storage.

Glibc does this and then performs its own longjmp to a handler in the startup
code above main. At that point it's no longer correct to longjmp to any of
the frames that have been aborted by that action.  It's still executing user
code; sys_exit has not been called yet, and the signal handler can go
off.

Other than that, there is actually nothing wrong with aborting sys_exit. It
hasn't done any cleanup yet through do_exit, so it can be nicely
reentered later.

I'd say that any programs that are broken by this patch are
probably ``fork in the toaster'' category anyway.

Note that programs can also abort the exit function with signals, too,
and there is nothing that can be done in the kernel about it.

>> It's a bad idea to allow the main thread to terminate. It should stick
>> around because it serves as a facade for the process as a whole. If
>> the main thread is allowed to bail all the way through do_exit, who
>> knows what kind of problems may show up because of that.
>>
>> What if one of my developers is working on a server which has called
>> pthread_exit in the main thread, and wants to attach gdb to it? Will
>> that work if the main thread (a.k.a group leader) is a defunct
>> process?
>
> And I think gdb is wrong. It can see this process has other live threads,
> and attach to them.

But if this problem is patched in this very simple way in the kernel, then gdb
installations ``Just Work'' without even being recompiled.

Probably, the only way the gdb maintainers would accept another Linux-specific
hack would be if they didn't know that a kernel patch will fix it. :)

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-02 16:56       ` Oleg Nesterov
  2009-02-02 20:10         ` Kaz Kylheku
@ 2009-02-02 20:17         ` Ulrich Drepper
  2009-02-02 20:39           ` Kaz Kylheku
  2009-02-05  3:05         ` Roland McGrath
  2 siblings, 1 reply; 20+ messages in thread
From: Ulrich Drepper @ 2009-02-02 20:17 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Kaz Kylheku, linux-kernel, Andrew Morton, Roland McGrath

Oleg Nesterov wrote:
> I am worried this patch can confuse the user-space. Because, when
> the main thread does sys_exit(), the user-space has every right
> to assume it exits ;) But with this patch the main thread will
> continue to handle the signals until the whole group exits, I'm
> afraid libpthread.so won't be happy.

I have neither looked at the patch nor tried it.

If the patch changes the behavior so that the main thread, after calling
sys_exit, still reacts to signals sent to this thread or to the process
as a whole, then the patch is wrong.  The userlevel context of the
thread is not usable anymore.  It will have run all kinds of
destructors.  The current behavior is AFAIK that the main thread won't
react to any signal anymore.  That is absolutely required.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-02 20:17         ` Ulrich Drepper
@ 2009-02-02 20:39           ` Kaz Kylheku
  2009-02-03  2:39             ` Kaz Kylheku
  0 siblings, 1 reply; 20+ messages in thread
From: Kaz Kylheku @ 2009-02-02 20:39 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Oleg Nesterov, linux-kernel, Andrew Morton, Roland McGrath

On Mon, Feb 2, 2009 at 12:17 PM, Ulrich Drepper <drepper@redhat.com> wrote:
> The userlevel context of the
> thread is not usable anymore.  It will have run all kinds of
> destructors.  The current behavior is AFAIK that the main thread won't
> react to any signal anymore.  That is absolutely required.

Hey Ulrich,

Thanks for articulating that requirement. I think it can be met by
extending the patch a little bit. We can keep the main thread parked
inside sys_exit /and/ get it not to react to signals internally, yet
have the external behavior that the process reacts in the normal
way to certain signals like SIGTSTP, SIGCONT, etc.

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-02 20:39           ` Kaz Kylheku
@ 2009-02-03  2:39             ` Kaz Kylheku
  2009-02-03 13:33               ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Kaz Kylheku @ 2009-02-03  2:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: Oleg Nesterov, Andrew Morton, Roland McGrath

On Mon, Feb 2, 2009 at 12:39 PM, Kaz Kylheku <kkylheku@gmail.com> wrote:
> On Mon, Feb 2, 2009 at 12:17 PM, Ulrich Drepper <drepper@redhat.com> wrote:
>> The userlevel context of the
>> thread is not usable anymore.  It will have run all kinds of
>> destructors.  The current behavior is AFAIK that the main thread won't
>> react to any signal anymore.  That is absolutely required.
>
> Hey Ulrich,
>
> Thanks for articulating that requirement. I think it can be met by
> extending the patch a little bit.

I've now done that.

The exiting thread leader, if there are still other
threads alive, gets its own private signal handler array in which
every action is set to SIG_IGN, using the ignore_signals
function.

I experimented with blocking signals, but that approach
breaks the test case of being able to attach GDB to the
exiting thread.

As part of the patch, I found it convenient to extend the
incomplete sys_unshare functionality w.r.t. signal handlers,
rather than reinvent the wheel.

Cheers ...

http://sourceware.org/bugzilla/attachment.cgi?id=3702
http://sourceware.org/bugzilla/attachment.cgi?id=3705

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-03  2:39             ` Kaz Kylheku
@ 2009-02-03 13:33               ` Oleg Nesterov
  2009-02-03 19:51                 ` Kaz Kylheku
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2009-02-03 13:33 UTC (permalink / raw)
  To: Kaz Kylheku; +Cc: linux-kernel, Andrew Morton, Roland McGrath

On 02/02, Kaz Kylheku wrote:
>
> On Mon, Feb 2, 2009 at 12:39 PM, Kaz Kylheku <kkylheku@gmail.com> wrote:
> > On Mon, Feb 2, 2009 at 12:17 PM, Ulrich Drepper <drepper@redhat.com> wrote:
> >> The userlevel context of the
> >> thread is not usable anymore.  It will have run all kinds of
> >> destructors.  The current behavior is AFAIK that the main thread won't
> >> react to any signal anymore.  That is absolutely required.
> >
> > Hey Ulrich,
> >
> > Thanks for articulating that requirement. I think it can be met by
> > extending the patch a little bit.
>
> I've now done that.
>
> The exiting thread leader, if there are still other
> threads alive, gets its own private signal handler array in which
> every action is set to SIG_IGN, using the ignore_signals
> function.
>
> I experimented with blocking signals, but that approach
> breaks the test case of being able to attach GDB to the
> exiting thread.
>
> As part of the patch, I found it convenient to extend the
> incomplete sys_unshare functionality w.r.t. signal handlers,
> rather than reinvent the wheel.

This is wrong, we can not and must not unshare ->sighand.

> Cheers ...
>
> http://sourceware.org/bugzilla/attachment.cgi?id=3702
> http://sourceware.org/bugzilla/attachment.cgi?id=3705

This adds multiple problems. Just for example, fs/proc/ takes
leader->sighand->siglock to protect the list of sub-threads.
Of course this doesn't work any longer after unsharing. And
there are numerous similar problems.

ignore_signals() in do_leader_exit() is not right either. This
thread group should handle the group-wide signals even if
the main thread exits.

atomic_read(&sigh->count) in unshare_sighand() is racy, and
in fact bogus. (yes, the whole unshare_sighand() is bogus,
it never populates new_sighp).

The changing of ->sighand in do_unshare() is very wrong, we
can free the sighand_struct which is currently locked/used/etc.

Kaz, I don't really understand why you are trying to add these
complications to the kernel :(

If the thread exits - it should exit. Yes, we have problems
with the exited main thread, we should fix them.

Yes, gdb refuses to attach to the dead thread (I didn't check
this myself, but I think you are right). But there is nothing
wrong here, because we can't ptrace this thread. But, gdb
_can_ ptrace the process, and it can see it has other threads.

OK, if nothing else. Let's suppose your patch is correct. What
about robust futexes? How can we delay exit_robust_list() ?
I don't think we can.

Oleg.

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-03 13:33               ` Oleg Nesterov
@ 2009-02-03 19:51                 ` Kaz Kylheku
  2009-02-03 21:32                   ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Kaz Kylheku @ 2009-02-03 19:51 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Andrew Morton, Roland McGrath

On Tue, Feb 3, 2009 at 5:33 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/02, Kaz Kylheku wrote:
>>
>> On Mon, Feb 2, 2009 at 12:39 PM, Kaz Kylheku <kkylheku@gmail.com> wrote:
>> > On Mon, Feb 2, 2009 at 12:17 PM, Ulrich Drepper <drepper@redhat.com> wrote:
>> >> The userlevel context of the
>> >> thread is not usable anymore.  It will have run all kinds of
>> >> destructors.  The current behavior is AFAIK that the main thread won't
>> >> react to any signal anymore.  That is absolutely required.
>> >
>> > Hey Ulrich,
>> >
>> > Thanks for articulating that requirement. I think it can be met by
>> > extending the patch a little bit.
>>
>> I've now done that.
>>
>> The exiting thread leader, if there are still other
>> threads alive, gets its own private signal handler array in which
>> every action is set to SIG_IGN, using the ignore_signals
>> function.
>>
>> I experimented with blocking signals, but that approach
>> breaks the test case of being able to attach GDB to the
>> exiting thread.
>>
>> As part of the patch, I found it convenient to extend the
>> incomplete sys_unshare functionality w.r.t. signal handlers,
>> rather than reinvent the wheel.
>
> This is wrong, we can not and must not unshare ->sighand.

You are right; it breaks important invariant conditions which
connect the thread group together, like the one about the
lock, et cetera.  The patch goes too far: rather than simply
delaying the finalization (relatively safe), it's messing with
the shared state (risky).

Well, it doesn't bother me that that has to be thrown out.
In fact, I do not agree with the requirement that the thread
which calls pthread_exit must not respond to signals;
the original patch works for me.

I.e. in my embedded GNU/Linux distro, that requirement
doesn't exist. And since I can't find it in the Single
Unix Specification, so much for that!

Nothing in the spec says that once pthread_exit is called,
signals are stopped. This function invokes cleanup handling,
and thread-specific-storage destruction. During any of those
tasks, signals can still be happening.  Any of those
tasks can easily enter into an indefinite wait. What if
a cleanup handler performs a blocking RPC to a remote
server? Well, there you are, stuck in pthread_exit,
handling signals, and not cleaning up your robust list, etc.

I also don't require robust locks to be cleaned up
instantly if they are owned by a main thread that has
called pthread_exit.

My organization is a heavy user of robust mutexes;
they protect the integrity of a large, ``real time'' database
stored in shared memory. I don't think that this would
affect us in any way. The principal concern is that
a process dies, for whatever reason, while holding locks.
It's more to recover from catastrophic failures, not from
mutex locking mistakes. If a thread locks a mutex and
doesn't release it due to bad program logic, that is a
problem whether or not that thread dies.  It's not
particularly useful that we can resolve that situation
with EOWNERDEAD in the kernel when that thread
happens to die, because that's just one case where
we are lucky, so to speak.
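
For readers unfamiliar with the mechanism, here is a minimal sketch of the EOWNERDEAD case being discussed, using the standard POSIX robust-mutex calls. The function names `die_holding_lock` and `lock_after_owner_death` are invented for this example, and a thread exit within one process stands in for the process death that matters in practice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock;

/* Take the lock and terminate without ever releasing it. */
static void *die_holding_lock(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	return NULL;
}

/*
 * Lock a robust mutex whose owner has already exited and return the
 * result of that pthread_mutex_lock() call (-1 if setup failed).
 */
int lock_after_owner_death(void)
{
	pthread_mutexattr_t attr;
	pthread_t t;
	int rc;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
	pthread_mutex_init(&lock, &attr);
	pthread_mutexattr_destroy(&attr);

	if (pthread_create(&t, NULL, die_holding_lock, NULL) != 0)
		return -1;
	pthread_join(t, NULL);		/* the owner is gone from here on */

	rc = pthread_mutex_lock(&lock);
	if (rc == EOWNERDEAD) {
		/* Application-specific repair of the protected data goes here. */
		pthread_mutex_consistent(&lock);
	}
	if (rc == 0 || rc == EOWNERDEAD)
		pthread_mutex_unlock(&lock);
	pthread_mutex_destroy(&lock);
	return rc;
}
```

The new owner learns about the dead one only because the exiting thread's robust list is walked at exit time, which is the step that a leader parked in sys_exit would postpone.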

> Yes, gdb refuses to attach to the dead thread (I didn't check
> this myself, but I think you are right). But there is nothing
> wrong here, because we can't ptrace this thread. But, gdb
> _can_ ptrace the process, and it can see it has other threads.

Face it, allowing the thread leader to exit is as wrong as doing
other stupid things to the leader, like unsharing the signal
handler.

Either way, you are breaking some little stick which is holding
up the illusion that there is a process which has threads.

> OK, if nothing else. Let's suppose your patch is correct.

Obviously, the second part isn't. I'm very happy to have
any excuse to throw this out, thanks!

The first part isn't Ulrich-compliant. That is significant,
and duly noted; but it's not the law where I'm sitting.

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-03 19:51                 ` Kaz Kylheku
@ 2009-02-03 21:32                   ` Oleg Nesterov
  2009-02-03 23:06                     ` Kaz Kylheku
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2009-02-03 21:32 UTC (permalink / raw)
  To: Kaz Kylheku; +Cc: linux-kernel, Andrew Morton, Roland McGrath

On 02/03, Kaz Kylheku wrote:
>
> Well, it doesn't bother me that that has to be thrown out.
> In fact, I do not agree with the requirement that the thread
> which calls pthread_exit must not respond to signals;
> the original patch works for me.

What about other users? We can't know how much they
depend on the current behaviour.

> I.e. in my embedded GNU/Linux distro, that requirement
> doesn't exist. And since I can't find it in the Single
> Unix Specification, so much for that!
>
> Nothing in the spec says that once pthread_exit is called,
> signals are stopped. This function invokes cleanup handling,
> and thread-specific-storage destruction. During any of those
> tasks, signals can still be happening.  Any of those
> tasks can easily enter into an indefinite wait. What if
> a cleanup handler performs a blocking RPC to a remote
> server? Well, there you are, stuck in pthread_exit,
> handling signals, and not cleaning up your robust list, etc.
>
> I also don't require robust locks to be cleaned up
> instantly if they are owned by a main thread that has
> called pthread_exit.

OK, OK. Please forget about signals, futexes, etc.
Simple program:

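	#include <pthread.h>

	pthread_t main_thread;

	/* Runs in the second thread: wait for the initial thread to terminate. */
	void *tfunc(void *a)
	{
		pthread_join(main_thread, NULL);
		return NULL;
	}

	int main(void)
	{
		pthread_t thr;

		main_thread = pthread_self();
		pthread_create(&thr, NULL, tfunc, NULL);
		/* Terminate only the initial thread; tfunc keeps the process alive. */
		pthread_exit(NULL);
	}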

I bet this will hang with your patch applied. Because
we depend on sys_futex(->clear_child_tid, FUTEX_WAKE, ...).

Kaz, you know, it is not easy to say "your patch is wrong
in any case, no matter how much it will be improved" ;)
But even if the current behaviour is not optimal, we must not
change it unless we think it leads to bugs. We can't know
which application can suffer. The current behaviour is old.

> Face it, allowing the thread leader to exit is as wrong as doing
> other stupid things to the leader, like unsharing the signal
> handler.

Perhaps. That is why I said _something_ like your patch perhaps
makes sense. But this is tricky, and I don't see a simple/clean
way to improve things. And, otoh, I do not see _real_ problems
with the zombie leaders.

As for the original problem, it should be fixed anyway. wait_task_stopped()
should take SIGNAL_STOP_STOPPED into account, not task->state.
Unless we are ptracer, of course.

Oleg.

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-03 21:32                   ` Oleg Nesterov
@ 2009-02-03 23:06                     ` Kaz Kylheku
  0 siblings, 0 replies; 20+ messages in thread
From: Kaz Kylheku @ 2009-02-03 23:06 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Andrew Morton, Roland McGrath

On Tue, Feb 3, 2009 at 1:32 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/03, Kaz Kylheku wrote:
>>
>> Well, it doesn't bother me that that has to be thrown out.
>> In fact, I do not agree with the requirement that the thread
>> which calls pthread_exit must not respond to signals;
>> the original patch works for me.
>
> What about other users? We can't know how much they
> depend on the current behaviour.

If they haven't run into this gaping job control issue,
they haven't done a whole lot of testing, obviously!

Those who have run into it would certainly have to implement
a workaround --- like not calling pthread_exit from the main
thread!

I.e.  ``Q: Doctor, it hurts when I do this; A: So don't
do that!''.

> OK, OK. Please forget about signals, futexes, etc.
> Simple program:
>
>        pthread_t main_thread;
>
>        void *tfunc(void *a)
>        {
>                pthread_join(main_thread, NULL);
>                return NULL;
>        }
>
>        int main(void)
>        {
>                pthread_t thr;
>
>                main_thread = pthread_self();
>                pthread_create(&thr, NULL, tfunc, NULL);
>                pthread_exit(NULL);
>        }

This test case appears to be conforming, so it
has to work.

The initial thread is considered joinable.

For instance a Rationale note in Issue 6 of the
SUS claims that one reason for the existence of
pthread_detach is so that the initial thread could
be detached, which cannot be done through thread
creation attributes for that thread. So the intent
is clearly that the initial thread is joinable!

> I bet this will hang with your patch applied.

It almost certainly will, and it does have to do with
futexes.

The main thread hasn't gone through the step where
it clears the TID, so the lll_wait_tid
futex wait in pthread_join will block. There is no
short-circuit indication in the library to indicate that
the main thread has died.

This TID trick is analogous to the robust list clean up. It's the same
kind of thing: fixing up a value of some registered
user-space memory location, signaling.

> Kaz, you know, it is not easy to say "your patch is wrong
> in any case, no matter how much it will be improved" ;)

Sure it is!

I will save you from that, because I do not believe in piling
hacks on top of hacks to fix something that may be
the wrong approach, even in situations where there is a
good chance that after some finite number of hacks,
it will finally be right.  I did that in LinuxThreads once upon
a time (mind you, that was so great, FreeBSD had to have it!)

I do think there is a clean, non-hacky way to reason
about this.

If we think about the process-containing-threads model that
the kernel is trying to emulate, and what should happen
when threads exit, we come to the following reasoning:

When a thread exits, there does have to be certain cleanup
with respect to that thread. But the process-wide cleanup
is not performed until all the threads are gone (thread
count hits zero).  This is easy to implement under the
process-contains-threads model.

These actions are not cleanly separated in Linux. There
is a do_exit function which handles both the thread-things
and the process-things in one swoop, so to speak.

The zombie problem occurs because do_exit goes too
far, cleaning up things that it shouldn't; things that
are necessary in holding up the POSIX-conforming
illusion that there is a process that contains threads.

My kneejerk approach was: hey wait, let's hold back
from doing /anything/ in do_exit; in fact let's not
call it at all if we're the initial thread and there are
still others. But obviously, it's not just anything in
do_exit that causes problems. Maybe some in-between
approach can work.  The pthread_join test case can
be fixed in a clean way, as can robust cleanup.

The thread can signal pthread_join by resetting its TID
to zero and hitting the futex. It can do the robust
list cleanup, etc.

If you can identify a good separation between what to do
first and what to do later, maybe you can reach some decent
compromise among the concerns. Breaking up the
do_exit logic into

   do_exit_thread
   do_exit_process

would probably not hurt. You then have to pick whether
each action belongs to one or the other and stick
it into the appropriate function, with some clear
guidelines about what goes where.

In sys_exit, the two pieces could be used somehow like this:

   do_exit_thread();

   if (leader and not_empty(group))
     { special_logic(); }

   do_exit_process();

As one rule (for instance), any cleanup that threatens
the integrity of the process/thread model goes into do_exit_process.

So for instance if you ptrace that process, it still has all of its
memory areas intact (you don't have to look for a different
PID in the process list in order to find them).
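
To make the intended rule concrete, here is a toy user-space model in C,
not kernel code. All names (process_model, thread_model, exit_thread_model,
exit_process_model, sys_exit_model) are hypothetical, and special_logic()
is replaced by the simplest possible assumption: the process-wide part is
deferred to whichever thread exits last.

```c
#include <assert.h>
#include <stdbool.h>

struct process_model {
	int  live_threads;	/* threads that have not exited yet */
	bool mm_intact;		/* stands in for memory areas and the like */
};

struct thread_model {
	struct process_model *proc;
	int  tid_word;		/* nonzero while the thread is alive */
	bool exited;
};

/* Per-thread part: always runs when a thread exits. */
static void exit_thread_model(struct thread_model *t)
{
	t->tid_word = 0;	/* lets a pthread_join-style waiter proceed */
	t->exited = true;
	t->proc->live_threads--;
}

/* Process-wide part: must only run once no threads remain. */
static void exit_process_model(struct process_model *p)
{
	p->mm_intact = false;
}

/* Assumed policy: the last thread out performs the process cleanup. */
static void sys_exit_model(struct thread_model *t)
{
	exit_thread_model(t);
	if (t->proc->live_threads == 0)
		exit_process_model(t->proc);
}
```

In this model, calling sys_exit_model() on the leader while another thread
is alive clears the leader's tid_word but leaves mm_intact set; only the
last call tears the process state down.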

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-02 16:56       ` Oleg Nesterov
  2009-02-02 20:10         ` Kaz Kylheku
  2009-02-02 20:17         ` Ulrich Drepper
@ 2009-02-05  3:05         ` Roland McGrath
  2009-02-05  4:55           ` Kaz Kylheku
  2 siblings, 1 reply; 20+ messages in thread
From: Roland McGrath @ 2009-02-05  3:05 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Kaz Kylheku, linux-kernel, Andrew Morton, Ulrich Drepper

I haven't seen the clear explanation of what specific actual problems there
are here.  But I'm quite sure this is not the right approach to address them.

Kaz has said things that seemed to imply that the behavior is erratic or
the semantics are somehow ill-defined when the group leader has died with
other threads living on.  In fact, this case is perfectly well-specified
and there is no mystery about it.

The group leader dies and becomes a zombie.  The zombie group leader is
kept from reaping and parent notification by the delayed_group_leader()
logic and related code, until the last thread in the group dies.  The tgid
(leader's tid), aka PID in POSIX terms, remains as the PID for the process
as a whole and signals to it work fine, etc.
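
A toy model of that rule, in plain user-space C rather than kernel code,
may help. The struct and function names are invented for illustration;
delay_leader() only mirrors the idea behind the delay_group_leader() check
(a leader whose group still has live threads), under the assumption that
this is all the check needs to capture here.

```c
#include <assert.h>
#include <stdbool.h>

struct group_model {
	int live_threads;	/* threads in the group that are still alive */
};

struct task_model {
	bool is_leader;
	bool zombie;
	struct group_model *group;
};

/* A thread dies: it becomes a zombie and leaves the live count. */
static void task_exit(struct task_model *t)
{
	t->zombie = true;
	t->group->live_threads--;
}

/* A leader whose group still has live threads must not be reaped yet. */
static bool delay_leader(const struct task_model *t)
{
	return t->is_leader && t->group->live_threads > 0;
}

/* The parent may reap (and be notified) only when this is true. */
static bool can_reap(const struct task_model *t)
{
	return t->zombie && !delay_leader(t);
}
```

With two threads, can_reap() on the leader stays false after the leader
exits and becomes true only once the other thread has exited too.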

Quite some time ago, there was some /proc bug wherein /proc/pid/task could
not be listed when the group leader had died.  That prevented strace or gdb
from attaching to the process after its initial thread used pthread_exit.
I don't recall when that was fixed, but it's been fine for a good while.
That is the only problem for debuggability of this case that I recall
knowing about.

Certainly long ago there were many problems with job control signals in
multi-thread groups, and there have been many little corner cases fixed in
that over the 2.6.x period.  I'm not aware of any such problems remaining.
But if there are some, they need to be fixed in the signals code.  It's
certainly clear how it's supposed to work, and that's no different when the
group leader is dead.

Thanks,
Roland

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-05  3:05         ` Roland McGrath
@ 2009-02-05  4:55           ` Kaz Kylheku
  2009-02-05 16:15             ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Kaz Kylheku @ 2009-02-05  4:55 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Oleg Nesterov, linux-kernel, Andrew Morton, Ulrich Drepper

On Wed, Feb 4, 2009 at 7:05 PM, Roland McGrath <roland@redhat.com> wrote:
> I haven't seen the clear explanation of what specific actual problems there
> are here.  But I'm quite sure this is not the right approach to address them.
>
> Kaz has said things that seemed to imply that the behavior is erratic or
> the semantics are somehow ill-defined when the group leader has died with
> other threads living on.  In fact, this case is perfectly well-specified
> and there is no mystery about it.

I haven't observed anything that could be called erratic. The behavior that
occurs, occurs reliably.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-05  4:55           ` Kaz Kylheku
@ 2009-02-05 16:15             ` Oleg Nesterov
  2009-02-05 21:22               ` Roland McGrath
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2009-02-05 16:15 UTC (permalink / raw)
  To: Kaz Kylheku; +Cc: Roland McGrath, linux-kernel, Andrew Morton, Ulrich Drepper

On 02/04, Kaz Kylheku wrote:
>
> On Wed, Feb 4, 2009 at 7:05 PM, Roland McGrath <roland@redhat.com> wrote:
> > I haven't seen the clear explanation of what specific actual problems there
> > are here.  But I'm quite sure this is not the right approach to address them.
> >
> > Kaz has said things that seemed to imply that the behavior is erratic or
> > the semantics are somehow ill-defined when the group leader has died with
> > other threads living on.  In fact, this case is perfectly well-specified
> > and there is no mystery about it.
>
> I haven't observed anything that could be called erratic. The behavior that
> occurs, occurs reliably.

Yes we have the bug, and wait_task_stopped() should be fixed. But it is
buggy anyway, even if we delay the death of the main thread. But I also
think we shouldn't.

(and I am sorry, I still can't find the time to redo my old patch, will
 try to do this asap).

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-05 16:15             ` Oleg Nesterov
@ 2009-02-05 21:22               ` Roland McGrath
  2009-02-05 23:22                 ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Roland McGrath @ 2009-02-05 21:22 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Kaz Kylheku, linux-kernel, Andrew Morton, Ulrich Drepper

> Yes we have the bug, and wait_task_stopped() should be fixed. But it is
> buggy anyway, even if we delay the death of the main thread. But I also
> think we shouldn't.

Sorry, I'd missed the actual bug report among all the tangential verbiage.
I wrote this test case for it.  Is there any problem other than this one?


Thanks,
Roland

==========

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <pthread.h>
#include <assert.h>

static void *
thfunc (void *arg)
{
  sleep (2);
  puts ("stopping");
  raise (SIGSTOP);
  puts ("resumed");
  exit (0);
}

int
main (void)
{
  pthread_t th;
  int rc = pthread_create (&th, NULL, &thfunc, NULL);
  assert_perror (rc);
  pthread_exit (0);
  /*NOTREACHED*/
  return 1;
}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-05 21:22               ` Roland McGrath
@ 2009-02-05 23:22                 ` Oleg Nesterov
  2009-02-09  3:33                   ` Roland McGrath
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2009-02-05 23:22 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Kaz Kylheku, linux-kernel, Andrew Morton, Ulrich Drepper

On 02/05, Roland McGrath wrote:
>
> I wrote this test case for it.  Is there any problem other than this one?

I don't know about other problems with the zombie leaders.

Except, I am worried whether the fix I have in mind is correct ;)
It is simple, wait_task_stopped() should do

	if we tracer:

		check ->state, eat ->exit_code

	else:
		check SIGNAL_STOP_STOPPED, use ->group_exit_code

This looks logical, and should fix the problem. But this is
the user-visible change. For example,

	$ sleep 100
	^Z
	[1]+  Stopped                 sleep 100
	$ strace -p `pidof sleep`
	Process 11442 attached - interrupt to quit

strace hangs in do_wait(). But after the fix strace will happily
proceed. I can't know whether this behaviour change is bad or not.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-05 23:22                 ` Oleg Nesterov
@ 2009-02-09  3:33                   ` Roland McGrath
  2009-02-09  4:52                     ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Roland McGrath @ 2009-02-09  3:33 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Kaz Kylheku, linux-kernel, Andrew Morton, Ulrich Drepper

> I don't know about other problems with the zombie leaders.

Ok, then we can just concentrate on the test case I posted.

> Except, I am worried whether the fix I have in mind is correct ;)
> It is simple, wait_task_stopped() should do

I think the first problem is we'll never even get into wait_task_stopped().
We'll be in wait_consider_task() on the group leader, which is EXIT_ZOMBIE.
First we need to adjust this:

	if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
		return wait_task_zombie(p, options, infop, stat_addr, ru);

to maybe:

	if (p->exit_state == EXIT_ZOMBIE) {
		if (delay_group_leader(p))
			return wait_task_zombie_leader(p, options,
						       infop, stat_addr, ru);
		return wait_task_zombie(p, options, infop, stat_addr, ru);
	}

In wait_task_zombie_leader(), it will have to take the siglock and try to
figure out if there is a completed group stop to report.

> 	if we tracer:
> 
> 		check ->state, eat ->exit_code

Being the ptracer does not affect the delay_group_leader logic.
It just affects individual vs group stop reports.  So the existing
code path is right for ptrace.

> 	else:
> 		check SIGNAL_STOP_STOPPED, use ->group_exit_code

We don't want wait to change group_exit_code.  But we need the "reported as
stopped" tracking that wait_task_stopped() gets by clearing exit_code.  So
I think what we need is to get the zombie group_leader->exit_code to be set
to ->group_exit_code as it would have been if the leader were alive and had
participated in the group stop.

> This looks logical, and should fix the problem. But this is
> the user-visible change. For example,
> 
> 	$ sleep 100
> 	^Z
> 	[1]+  Stopped                 sleep 100
> 	$ strace -p `pidof sleep`
> 	Process 11442 attached - interrupt to quit
> 
> strace hangs in do_wait(). But after the fix strace will happily
> proceed. I can't know whether this behaviour change is bad or not.

I think this would only happen if the "reported as stopped" bookkeeping I
mentioned above were broken.  The "Stopped" line means that the shell just
did do_wait(WUNTRACED), so wait_task_stopped() cleared ->exit_code when
reporting it as stopped.  Now strace does PTRACE_ATTACH and then a wait;
it can't see a fresh wait result here because ->exit_code is still zero.
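
The non-ptrace half of this bookkeeping can be observed from user space.
The following is a small sketch, assuming Linux and a single-threaded
child; stop_reported_once is a made-up name, and no ptrace is involved.
The first waitpid() with WUNTRACED plays the role of the shell and consumes
the stop report; a second one with WNOHANG then finds nothing fresh to
report:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the stop is reported exactly once, -1 otherwise. */
static int stop_reported_once(void)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		raise(SIGSTOP);	/* child stops itself, like ^Z on sleep */
		_exit(0);
	}

	/* First wait: what the shell does; this consumes the stop report. */
	if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
		goto fail;

	/* Second wait: the same stop is not reported again. */
	if (waitpid(pid, &status, WUNTRACED | WNOHANG) != 0)
		goto fail;

	/* Resume the child and reap it. */
	kill(pid, SIGCONT);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return 0;
fail:
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return -1;
}
```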

100% untested concept patch follows.


Thanks,
Roland

==========
diff --git a/kernel/exit.c b/kernel/exit.c
index f80dec3..0000000 100644  
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1437,7 +1437,8 @@ static int wait_task_stopped(int ptrace,
 	exit_code = 0;
 	spin_lock_irq(&p->sighand->siglock);
 
-	if (unlikely(!task_is_stopped_or_traced(p)))
+	if (unlikely(!task_is_stopped_or_traced(p)) &&
+	    (ptrace || p->exit_state != EXIT_ZOMBIE || !delay_group_leader(p)))
 		goto unlock_sig;
 
 	if (!ptrace && p->signal->group_stop_count > 0)
@@ -1598,9 +1599,20 @@ static int wait_consider_task(struct tas
 
 	/*
 	 * We don't reap group leaders with subthreads.
+	 * When the group leader is dead, it still serves as
+	 * a moniker for the whole group for stop and continue.
+	 * But for ptrace, stop and continue are reported per-thread.
 	 */
-	if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
-		return wait_task_zombie(p, options, infop, stat_addr, ru);
+	if (p->exit_state == EXIT_ZOMBIE) {
+		if (!delay_group_leader(p))
+			return wait_task_zombie(p, options,
+						infop, stat_addr, ru);
+		*notask_error = 0;
+		if (unlikely(ptrace))
+			return 0;
+		return wait_task_stopped(ptrace, p, options, infop, stat_addr, ru) ?:
+			wait_task_continued(p, options, infop, stat_addr, ru);
+	}
 
 	/*
 	 * It's stopped or running now, so it might
diff --git a/kernel/signal.c b/kernel/signal.c
index b6b3676..0000000 100644  
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1653,6 +1653,27 @@ finish_stop(int stop_count)
 }
 
 /*
+ * Complete group stop bookkeeping after decrementing sig->group_stop_count,
+ * to new value stop_count.  When it reaches zero, mark the process stopped.
+ *
+ * If the group leader is already dead, then it did not participate
+ * normally in the group stop.  But its ->exit_code stands for the whole
+ * group in do_wait() bookkeeping, so we need it to reflect the stop.
+ */
+static inline void complete_group_stop(struct task_struct *tsk,
+				       struct signal_struct *sig,
+				       int stop_count)
+{
+	if (stop_count)
+		return;
+
+	sig->flags = SIGNAL_STOP_STOPPED;
+
+	if (tsk->group_leader->exit_state)
+		tsk->group_leader->exit_code = sig->group_exit_code;
+}
+
+/*
  * This performs the stopping for SIGSTOP and other stop signals.
  * We have to stop all threads in the thread group.
  * Returns nonzero if we've actually stopped and released the siglock.
@@ -1696,8 +1717,7 @@ static int do_signal_stop(int signr)
 		sig->group_stop_count = stop_count;
 	}
 
-	if (stop_count == 0)
-		sig->flags = SIGNAL_STOP_STOPPED;
+	complete_group_stop(current, sig, stop_count);
 	current->exit_code = sig->group_exit_code;
 	__set_current_state(TASK_STOPPED);
 
@@ -1933,9 +1953,8 @@ void exit_signals(struct task_struct *ts
 		if (!signal_pending(t) && !(t->flags & PF_EXITING))
 			recalc_sigpending_and_wake(t);
 
-	if (unlikely(tsk->signal->group_stop_count) &&
-			!--tsk->signal->group_stop_count) {
-		tsk->signal->flags = SIGNAL_STOP_STOPPED;
+	if (unlikely(tsk->signal->group_stop_count)) {
+		complete_group_stop(tsk, tsk->signal, --tsk->signal->group_stop_count);
 		group_stop = 1;
 	}
 out:

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-09  3:33                   ` Roland McGrath
@ 2009-02-09  4:52                     ` Oleg Nesterov
  2009-02-09  5:14                       ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2009-02-09  4:52 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Kaz Kylheku, linux-kernel, Andrew Morton, Ulrich Drepper

I am already asleep, will return tomorrow. Just a couple of quick notes...

On 02/08, Roland McGrath wrote:
>
> I think the first problem is we'll never even get into wait_task_stopped().
> We'll be in wait_consider_task() on the group leader, which is EXIT_ZOMBIE.

Yes sure. I meant, instead of just checking task_is_stopped_or_traced() in
wait_consider_task(), we should do something like

	int wait_is_stopped(p, ptrace)
	{
		if (ptrace)
			// wait_task_stopped() will also check p->exit_code != 0
			return task_is_stopped_or_traced(p);
		else
			// wait_task_stopped() will also check ->group_exit_code != 0
			return !!(p->signal->flags & SIGNAL_STOP_STOPPED);
	}

> In wait_task_zombie_leader(), it will have to take the siglock and try to
> figure out if there is a completed group stop to report.
>
> > 	if we tracer:
> >
> > 		check ->state, eat ->exit_code
>
> Being the ptracer does not affect the delay_group_leader logic.
> It just affects individual vs group stop reports.  So the existing
> code path is right for ptrace.
>
> > 	else:
> > 		check SIGNAL_STOP_STOPPED, use ->group_exit_code
>
> We don't want wait to change group_exit_code.  But we need the "reported as
> stopped" tracking that wait_task_stopped() gets by clearing exit_code.

I never understood this.

Why do we mix the normal group stop with the ptrace "per-thread" stops?

Look. We have the main thread M and the sub-thread T. We stop this process,
its parent does do_wait() and clears M->exit_code.

Now, ptracer can attach to T (it still has ->exit_code != 0), but not to M.
This always looked very strange to me.

Or. ptracer attaches to the main thread and (say) does nothing. We send
SIGSTOP to another thread. The whole group stops, but its parent can't
see this. Why? Then ptracer does PTRACE_DETACH, and the parent still can't
(and never will) see the stop unless the ptracer puts something reasonable
into ->exit_code. But even in this case we lose the notification.

> So
> I think what we need is to get the zombie group_leader->exit_code to be set
> to ->group_exit_code as it would have been if the leader were alive and had
> participated in the group stop.

Please see below.

> > 	$ sleep 100
> > 	^Z
> > 	[1]+  Stopped                 sleep 100
> > 	$ strace -p `pidof sleep`
> > 	Process 11442 attached - interrupt to quit
> >
> > strace hangs in do_wait(). But after the fix strace will happily
> > proceed. I can't know whether this behaviour change is bad or not.
>
> I think this would only happen if the "reported as stopped" bookkeeping I
> mentioned above were broken.  The "Stopped" line means that the shell just
> did do_wait(WUNTRACED), so wait_task_stopped() cleared ->exit_code when
> reporting it as stopped.  Now strace does PTRACE_ATTACH and then a wait;
> it can't see a fresh wait result here because ->exit_code is still zero.

Yes. And why should the ptracer wait?

> 100% untested concept patch follows.

it adds more special cases for the delay_group_leader() zombies :(

> +static inline void complete_group_stop(struct task_struct *tsk,
> +				       struct signal_struct *sig,
> +				       int stop_count)
> +{
> +	if (stop_count)
> +		return;
> +
> +	sig->flags = SIGNAL_STOP_STOPPED;
> +
> +	if (tsk->group_leader->exit_state)
> +		tsk->group_leader->exit_code = sig->group_exit_code;
> +}

This doesn't look exactly right. Unless another thread does sys_exit_group()
later (they all can exit via sys_exit), wait_task_zombie() may report this
->exit_code == SIGSTOP.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: main thread pthread_exit/sys_exit bug!
  2009-02-09  4:52                     ` Oleg Nesterov
@ 2009-02-09  5:14                       ` Oleg Nesterov
  0 siblings, 0 replies; 20+ messages in thread
From: Oleg Nesterov @ 2009-02-09  5:14 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Kaz Kylheku, linux-kernel, Andrew Morton, Ulrich Drepper

On 02/09, Oleg Nesterov wrote:
>
> Yes sure. I meant, instead of just checking task_is_stopped_or_traced() in
> wait_consider_task(), we should do something like

In short, please see the "patch" below. I doubt it can be compiled,
just for the illustration.

Oleg.

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1417,6 +1417,19 @@ static int wait_task_zombie(struct task_
 	return retval;
 }
 
+static int *wait_xxx(struct task_struct *p, int ptrace)
+{
+	if (ptrace) {
+		if (task_is_stopped_or_traced(p))
+			return &p->exit_code;
+	} else {
+		if (p->signal->flags & SIGNAL_STOP_STOPPED)
+			return &p->signal->group_exit_code;
+	}
+
+	return NULL;
+}
+
 /*
  * Handle sys_wait4 work for one task in state TASK_STOPPED.  We hold
  * read_lock(&tasklist_lock) on entry.  If we return zero, we still hold
@@ -1427,7 +1440,7 @@ static int wait_task_stopped(int ptrace,
 			     int options, struct siginfo __user *infop,
 			     int __user *stat_addr, struct rusage __user *ru)
 {
-	int retval, exit_code, why;
+	int retval, exit_code, *p_code, why;
 	uid_t uid = 0; /* unneeded, required by compiler */
 	pid_t pid;
 
@@ -1437,22 +1450,16 @@ static int wait_task_stopped(int ptrace,
 	exit_code = 0;
 	spin_lock_irq(&p->sighand->siglock);
 
-	if (unlikely(!task_is_stopped_or_traced(p)))
-		goto unlock_sig;
-
-	if (!ptrace && p->signal->group_stop_count > 0)
-		/*
-		 * A group stop is in progress and this is the group leader.
-		 * We won't report until all threads have stopped.
-		 */
+	p_code = wait_xxx(p, ptrace);
+	if (unlikely(!p_code))
 		goto unlock_sig;
 
-	exit_code = p->exit_code;
+	exit_code = *p_code;
 	if (!exit_code)
 		goto unlock_sig;
 
 	if (!unlikely(options & WNOWAIT))
-		p->exit_code = 0;
+		*p_code = 0;
 
 	/* don't need the RCU readlock here as we're holding a spinlock */
 	uid = __task_cred(p)->uid;
@@ -1608,7 +1615,7 @@ static int wait_consider_task(struct tas
 	 */
 	*notask_error = 0;
 
-	if (task_is_stopped_or_traced(p))
+	if (wait_xxx(p, ptrace))
 		return wait_task_stopped(ptrace, p, options,
 					 infop, stat_addr, ru);
 


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-02-09  5:17 UTC | newest]

Thread overview: 20+ messages
2009-02-01 22:32 main thread pthread_exit/sys_exit bug! Kaz Kylheku
     [not found] ` <20090201174159.4a52e15c.akpm@linux-foundation.org>
2009-02-02  6:45   ` Oleg Nesterov
2009-02-02  7:10     ` Kaz Kylheku
2009-02-02 16:56       ` Oleg Nesterov
2009-02-02 20:10         ` Kaz Kylheku
2009-02-02 20:17         ` Ulrich Drepper
2009-02-02 20:39           ` Kaz Kylheku
2009-02-03  2:39             ` Kaz Kylheku
2009-02-03 13:33               ` Oleg Nesterov
2009-02-03 19:51                 ` Kaz Kylheku
2009-02-03 21:32                   ` Oleg Nesterov
2009-02-03 23:06                     ` Kaz Kylheku
2009-02-05  3:05         ` Roland McGrath
2009-02-05  4:55           ` Kaz Kylheku
2009-02-05 16:15             ` Oleg Nesterov
2009-02-05 21:22               ` Roland McGrath
2009-02-05 23:22                 ` Oleg Nesterov
2009-02-09  3:33                   ` Roland McGrath
2009-02-09  4:52                     ` Oleg Nesterov
2009-02-09  5:14                       ` Oleg Nesterov
