* [PATCH 2/3] ptrace_stop: remove the wrong ->group_stop_count bookkeeping
@ 2007-12-08 18:38 Oleg Nesterov
2007-12-09 0:31 ` Eric W. Biederman
0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2007-12-08 18:38 UTC (permalink / raw)
To: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
Linus Torvalds, Roland McGrath
Cc: linux-kernel
ptrace_stop() decrements ->group_stop_count to "participate" in group stop.
This looks very wrong to me, the task can in fact decrement this counter twice.
If the tracee returns to the user-space before other threads complete the group
stop, it will notice TIF_SIGPENDING and do it again.
Another problem is that we don't set SIGNAL_STOP_STOPPED if the counter becomes
zero.
I must admit, I don't undestand the reason why this code was added, it is very
old.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
--- PT/kernel/signal.c~5_ptrace_stop 2007-12-08 16:46:37.000000000 +0300
+++ PT/kernel/signal.c 2007-12-08 16:47:53.000000000 +0300
@@ -1579,13 +1579,6 @@ static inline int may_ptrace_stop(void)
*/
static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
{
- /*
- * If there is a group stop in progress,
- * we must participate in the bookkeeping.
- */
- if (current->signal->group_stop_count > 0)
- --current->signal->group_stop_count;
-
current->last_siginfo = info;
current->exit_code = exit_code;
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH 2/3] ptrace_stop: remove the wrong ->group_stop_count bookkeeping 2007-12-08 18:38 [PATCH 2/3] ptrace_stop: remove the wrong ->group_stop_count bookkeeping Oleg Nesterov @ 2007-12-09 0:31 ` Eric W. Biederman 2007-12-09 14:05 ` Oleg Nesterov 0 siblings, 1 reply; 6+ messages in thread From: Eric W. Biederman @ 2007-12-09 0:31 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Davide Libenzi, Ingo Molnar, Linus Torvalds, Roland McGrath, linux-kernel Oleg Nesterov <oleg@tv-sign.ru> writes: > ptrace_stop() decrements ->group_stop_count to "participate" in group stop. > This looks very wrong to me, the task can in fact decrement this counter twice. > If the tracee returns to the user-space before other threads complete the group > stop, it will notice TIF_SIGPENDING and do it again. This is one of those interesting weird cases. The ptrace interface remains per task. So need to handle a simultaneous thread group stop and a per task stop. > > Another problem is that we don't set SIGNAL_STOP_STOPPED if the counter becomes > zero. > > I must admit, I don't undestand the reason why this code was added, it is very > old. I haven't dug in enough yet to understand better, but it is my hunch we need to do something when we have both kinds of stop happening simultaneously. Eric ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] ptrace_stop: remove the wrong ->group_stop_count bookkeeping 2007-12-09 0:31 ` Eric W. Biederman @ 2007-12-09 14:05 ` Oleg Nesterov 2008-01-10 10:41 ` Petr Tesarik 0 siblings, 1 reply; 6+ messages in thread From: Oleg Nesterov @ 2007-12-09 14:05 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Morton, Davide Libenzi, Ingo Molnar, Linus Torvalds, Roland McGrath, linux-kernel On 12/08, Eric W. Biederman wrote: > > Oleg Nesterov <oleg@tv-sign.ru> writes: > > > ptrace_stop() decrements ->group_stop_count to "participate" in group stop. > > This looks very wrong to me, the task can in fact decrement this counter twice. > > If the tracee returns to the user-space before other threads complete the group > > stop, it will notice TIF_SIGPENDING and do it again. > > This is one of those interesting weird cases. The ptrace interface remains per > task. > > So need to handle a simultaneous thread group stop and a per task stop. > > > > > > Another problem is that we don't set SIGNAL_STOP_STOPPED if the counter becomes > > zero. > > > > I must admit, I don't undestand the reason why this code was added, it is very > > old. > > I haven't dug in enough yet to understand better, but it is my hunch we > need to do something when we have both kinds of stop happening simultaneously. Looking further, I think it was done to match the !is_task_stopped_or_traced() check in do_signal_stop(). Still, I don't understand why we really need this decrement. The ptrace interface needs only per-thread TASK_TRACED ot TASK_STOPPED, it doesn't need the completion of the group stop. We can delay the completion of the group stop, but why this is bad? At worse, the tracer recieves the extra CLD_STOPPED when the tracee resumes. And do_signal_stop() probably can s/is_task_stopped_or_traced/is_task_stopped/. OK, it is better to ignore this patch, I don't understand all implications of this change. But this all doesn't look very good. Suppose we have a lot of threads and the task with _TIF_SYSCALL_TRACE does system call. So ptrace_notify() decrements the counter before syscall, after, and before the return to user-space. Hopefully Roland can clarify. Oleg. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] ptrace_stop: remove the wrong ->group_stop_count bookkeeping 2007-12-09 14:05 ` Oleg Nesterov @ 2008-01-10 10:41 ` Petr Tesarik 2008-01-10 21:39 ` Oleg Nesterov 0 siblings, 1 reply; 6+ messages in thread From: Petr Tesarik @ 2008-01-10 10:41 UTC (permalink / raw) To: Oleg Nesterov Cc: Eric W. Biederman, Andrew Morton, Davide Libenzi, Ingo Molnar, Linus Torvalds, Roland McGrath, linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Oleg Nesterov wrote: > On 12/08, Eric W. Biederman wrote: >> Oleg Nesterov <oleg@tv-sign.ru> writes: >> >>> ptrace_stop() decrements ->group_stop_count to "participate" in group stop. >>> This looks very wrong to me, the task can in fact decrement this counter twice. >>> If the tracee returns to the user-space before other threads complete the group >>> stop, it will notice TIF_SIGPENDING and do it again. >> This is one of those interesting weird cases. The ptrace interface remains per >> task. >> >> So need to handle a simultaneous thread group stop and a per task stop. >> >> >>> Another problem is that we don't set SIGNAL_STOP_STOPPED if the counter becomes >>> zero. >>> >>> I must admit, I don't undestand the reason why this code was added, it is very >>> old. >> I haven't dug in enough yet to understand better, but it is my hunch we >> need to do something when we have both kinds of stop happening simultaneously. > > Looking further, I think it was done to match the !is_task_stopped_or_traced() > check in do_signal_stop(). > > Still, I don't understand why we really need this decrement. The ptrace interface > needs only per-thread TASK_TRACED ot TASK_STOPPED, it doesn't need the completion > of the group stop. We can delay the completion of the group stop, but why this is > bad? At worse, the tracer recieves the extra CLD_STOPPED when the tracee resumes. > And do_signal_stop() probably can s/is_task_stopped_or_traced/is_task_stopped/. > > OK, it is better to ignore this patch, I don't understand all implications of this > change. But this all doesn't look very good. Suppose we have a lot of threads and > the task with _TIF_SYSCALL_TRACE does system call. So ptrace_notify() decrements > the counter before syscall, after, and before the return to user-space. > > Hopefully Roland can clarify. Roland, could you help here? I can actually see a bug which may be related: 1. a process creates a thread (or more threads) 2. I attach/detach to that thread with strace several times (each time pressing CTRL-C to quit strace) 3. the whole thread group (except the traced thread) ends in TASK_STOPPED I looked at what strace was doing to that thread, and it sometimes sends SIGSTOP shortly before detaching. This is done when the thread is running, i.e. not waiting in ptrace_stop. Then PTRACE_DETACH returns - -ESRCH because it requires the tracee to be stopped -- just like all PTRACE_* requests except TRACEME and ATTACH. So, strace has no other option than to send an explicit SIGSTOP to the thread to stop it and discard it afterwards. Could this be related? Kind regards, Petr Tesarik -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHhfZ0jpY2ODFi2ogRAmCdAJ0dfhCoXMavThBKF7ZQSQ7J0t3ApACfYxdH ZxEvTOrGrVO6rliHzPxBlEo= =oxx4 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] ptrace_stop: remove the wrong ->group_stop_count bookkeeping 2008-01-10 10:41 ` Petr Tesarik @ 2008-01-10 21:39 ` Oleg Nesterov 2008-01-11 8:50 ` Petr Tesarik 0 siblings, 1 reply; 6+ messages in thread From: Oleg Nesterov @ 2008-01-10 21:39 UTC (permalink / raw) To: Petr Tesarik Cc: Eric W. Biederman, Andrew Morton, Davide Libenzi, Ingo Molnar, Linus Torvalds, Roland McGrath, linux-kernel On 01/10, Petr Tesarik wrote: > > I can actually see a bug which may be related: > > 1. a process creates a thread (or more threads) > 2. I attach/detach to that thread with strace several times > (each time pressing CTRL-C to quit strace) > 3. the whole thread group (except the traced thread) ends in > TASK_STOPPED > > I looked at what strace was doing to that thread, and it sometimes sends > SIGSTOP shortly before detaching. This is done when the thread is > running, i.e. not waiting in ptrace_stop. Then PTRACE_DETACH returns > - -ESRCH because it requires the tracee to be stopped -- just like all > PTRACE_* requests except TRACEME and ATTACH. So, strace has no other > option than to send an explicit SIGSTOP to the thread to stop it and > discard it afterwards. > > Could this be related? Perhaps yes. But there are so many oddities in this area. I don't know what really happens with your test-case, but afaics this can happen even without ptrace_stop() playing with the group stop. Let's suppose that strace detached all sub-threads except T which is running, and now strace does ptrace(PTRACE_DETACH, T). This fails, so strace does kill(T, SIGSTOP). Note that it use kill(), not tkill(). This means another sub-thread can dequeue this signal and initiate the group stop (remember, it was already detached and thus it is not traced any longer). Now strace does wait4(T, __WALL). T notices the group stop in progress, calls handle_group_stop(), and notifies its parent - strace. wait4() returns success, strace does ptrace(PTRACE_DETACH, T) again. Now T is TASK_STOPPED, ptrace() changes the state to TASK_TRACED and finally does ptrace_untrace(). ptrace_untrace() sees TASK_TRACED. But it is possible that the group stop is not completed yet (some sub-thread didn't pass handle_group_stop()), in that case we are doing signal_wake_up(T, 1) so it becomes running. I still think this series makes sense even if not complete. Oleg. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] ptrace_stop: remove the wrong ->group_stop_count bookkeeping 2008-01-10 21:39 ` Oleg Nesterov @ 2008-01-11 8:50 ` Petr Tesarik 0 siblings, 0 replies; 6+ messages in thread From: Petr Tesarik @ 2008-01-11 8:50 UTC (permalink / raw) To: Oleg Nesterov Cc: Eric W. Biederman, Andrew Morton, Davide Libenzi, Ingo Molnar, Linus Torvalds, Roland McGrath, linux-kernel Oleg Nesterov wrote: > On 01/10, Petr Tesarik wrote: >> I can actually see a bug which may be related: >> >> 1. a process creates a thread (or more threads) >> 2. I attach/detach to that thread with strace several times >> (each time pressing CTRL-C to quit strace) >> 3. the whole thread group (except the traced thread) ends in >> TASK_STOPPED >> >> I looked at what strace was doing to that thread, and it sometimes sends >> SIGSTOP shortly before detaching. This is done when the thread is >> running, i.e. not waiting in ptrace_stop. Then PTRACE_DETACH returns >> - -ESRCH because it requires the tracee to be stopped -- just like all >> PTRACE_* requests except TRACEME and ATTACH. So, strace has no other >> option than to send an explicit SIGSTOP to the thread to stop it and >> discard it afterwards. >> >> Could this be related? > > Perhaps yes. But there are so many oddities in this area. I don't know what > really happens with your test-case, but afaics this can happen even without > ptrace_stop() playing with the group stop. > > Let's suppose that strace detached all sub-threads except T which is running, > and now strace does ptrace(PTRACE_DETACH, T). This fails, so strace does > kill(T, SIGSTOP). > > Note that it use kill(), not tkill(). This means another sub-thread can > dequeue this signal and initiate the group stop (remember, it was already > detached and thus it is not traced any longer). In fact, it had been never traced - I attached strace to the PID of the sub-thread, not to the thread group leader. Anyway, I haven't seen the erroneous stop again since I changed detach() to call tkill() instead of kill(). It's not a proof, because the failure was very seldom, so I'll keep testing, but it makes much sense to me. Petr > Now strace does wait4(T, __WALL). T notices the group stop in progress, > calls handle_group_stop(), and notifies its parent - strace. > > wait4() returns success, strace does ptrace(PTRACE_DETACH, T) again. Now > T is TASK_STOPPED, ptrace() changes the state to TASK_TRACED and finally > does ptrace_untrace(). > > ptrace_untrace() sees TASK_TRACED. But it is possible that the group stop > is not completed yet (some sub-thread didn't pass handle_group_stop()), in > that case we are doing signal_wake_up(T, 1) so it becomes running. > > > I still think this series makes sense even if not complete. > > Oleg. > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-01-11 8:46 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-12-08 18:38 [PATCH 2/3] ptrace_stop: remove the wrong ->group_stop_count bookkeeping Oleg Nesterov 2007-12-09 0:31 ` Eric W. Biederman 2007-12-09 14:05 ` Oleg Nesterov 2008-01-10 10:41 ` Petr Tesarik 2008-01-10 21:39 ` Oleg Nesterov 2008-01-11 8:50 ` Petr Tesarik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox