[PATCH ptrace] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH ptrace] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
@ 2011-05-06  9:52 Tejun Heo
  2011-05-08 13:35 ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2011-05-06  9:52 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Linus Torvalds, Andrew Morton

GROUP_STOP_TRAPPING waiting mechanism piggybacks on
signal->wait_chldexit which is primarily used to implement waiting for
wait(2) and friends.  When do_wait() waits on signal->wait_chldexit,
it uses a custom wake up callback, child_wait_callback(), which
expects the child task which is waking up the parent to be passed in
as @key to filter out spurious wakeups.

task_clear_group_stop_trapping() used __wake_up_sync() which uses NULL
@key causing the following oops if the parent was doing do_wait().

  BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
  IP: [<ffffffff810499f9>] child_wait_callback+0x29/0x80
  PGD 1d899067 PUD 1e418067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP
  last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/local_cpus
  CPU 2
  Modules linked in:

  Pid: 4498, comm: test-continued Not tainted 2.6.39-rc6-work+ #32 Bochs Bochs
  RIP: 0010:[<ffffffff810499f9>]  [<ffffffff810499f9>] child_wait_callback+0x29/0x80
  RSP: 0000:ffff88001b889bf8  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff88001fab3af8 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88001d91df20
  RBP: ffff88001b889c08 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
  R13: ffff88001fb70550 R14: 0000000000000000 R15: 0000000000000001
  FS:  00007f26ccae4700(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00000000000002d8 CR3: 000000001b8ac000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process test-continued (pid: 4498, threadinfo ffff88001b888000, task ffff88001fb88000)
  Stack:
   ffff88001b889c18 ffff88001fb70538 ffff88001b889c58 ffffffff810312f9
   0000000000000001 0000000200000001 ffff88001b889c58 ffff88001fb70518
   0000000000000002 0000000000000082 0000000000000001 0000000000000000
  Call Trace:
   [<ffffffff810312f9>] __wake_up_common+0x59/0x90
   [<ffffffff81035263>] __wake_up_sync_key+0x53/0x80
   [<ffffffff810352a0>] __wake_up_sync+0x10/0x20
   [<ffffffff8105a984>] task_clear_jobctl_trapping+0x44/0x50
   [<ffffffff8105bcbc>] ptrace_stop+0x7c/0x290
   [<ffffffff8105c20a>] do_signal_stop+0x28a/0x2d0
   [<ffffffff8105d27f>] get_signal_to_deliver+0x14f/0x5a0
   [<ffffffff81002175>] do_signal+0x75/0x7b0
   [<ffffffff8100292d>] do_notify_resume+0x5d/0x70
   [<ffffffff8182e36a>] retint_signal+0x46/0x8c
  Code: 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 8b 47 d8 83 f8 03 74 3a 85 c0 49 89 c8 75 23 89 c0 48 8b 5f e0 4c 8d 0c 40 31 c0 <4b> 39 9c c8 d8 02 00 00 74 1d 48 83 c4 08 5b c9 c3 66 0f 1f 44

Fix it by using __wake_up_sync_key() and passing in the child as @key.

I still think it's a mistake to piggyback on wait_chldexit for this.
Given the relative low frequency of ptrace use, we would be much
better off leaving already complex wait_chldexit alone and using bit
waitqueue.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/signal.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: work/kernel/signal.c
===================================================================
--- work.orig/kernel/signal.c
+++ work/kernel/signal.c
@@ -238,8 +238,8 @@ static void task_clear_group_stop_trappi
 {
 	if (unlikely(task->group_stop & GROUP_STOP_TRAPPING)) {
 		task->group_stop &= ~GROUP_STOP_TRAPPING;
-		__wake_up_sync(&task->parent->signal->wait_chldexit,
-			       TASK_UNINTERRUPTIBLE, 1);
+		__wake_up_sync_key(&task->parent->signal->wait_chldexit,
+				   TASK_UNINTERRUPTIBLE, 1, task);
 	}
 }
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH ptrace] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
  2011-05-06  9:52 [PATCH ptrace] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping() Tejun Heo
@ 2011-05-08 13:35 ` Oleg Nesterov
  2011-05-08 14:02   ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2011-05-08 13:35 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Linus Torvalds, Andrew Morton

On 05/06, Tejun Heo wrote:
>
> GROUP_STOP_TRAPPING waiting mechanism piggybacks on
> signal->wait_chldexit which is primarily used to implement waiting for
> wait(2) and friends.  When do_wait() waits on signal->wait_chldexit,
> it uses a custom wake up callback, child_wait_callback(), which
> expects the child task which is waking up the parent to be passed in
> as @key to filter out spurious wakeups.
>
> task_clear_group_stop_trapping() used __wake_up_sync() which uses NULL
> @key causing the following oops if the parent was doing do_wait().
>
>   BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
>   IP: [<ffffffff810499f9>] child_wait_callback+0x29/0x80

Argh! Shame on me. Somehow I convinced myself that this needs the cleanup
but safe because we are not going to wake up the TASK_INTTERRUPTIBLE tasks
sleeping in do_wait(). Somehow I forgot that wait_queue_t->func() is called
anyway without checking "mode". I even specially mentioned this should be
safe during the review ;)

> I still think it's a mistake to piggyback on wait_chldexit for this.

Agreed. Previously I thought we should teach __wake_up_parent() to wake
up the TASK_UNINTERRUPTIBLE tasks, now I think this is not needed.

> Given the relative low frequency of ptrace use, we would be much
> better off leaving already complex wait_chldexit alone and using bit
> waitqueue.

Well, I don't think so. wait_on_bit() looks as unnecessary complication
to me. See below.

> --- work.orig/kernel/signal.c
> +++ work/kernel/signal.c
> @@ -238,8 +238,8 @@ static void task_clear_group_stop_trappi
>  {
>  	if (unlikely(task->group_stop & GROUP_STOP_TRAPPING)) {
>  		task->group_stop &= ~GROUP_STOP_TRAPPING;
> -		__wake_up_sync(&task->parent->signal->wait_chldexit,
> -			       TASK_UNINTERRUPTIBLE, 1);
> +		__wake_up_sync_key(&task->parent->signal->wait_chldexit,
> +				   TASK_UNINTERRUPTIBLE, 1, task);
>  	}
>  }

I believe this is correct and should fix the problem.

But. Why do we need signal->wait_chldexit or bit waitqueue at all?
Previously this was needed because wait_event(!GROUP_STOP_TRAPPING)
was called from ptrace_check_attach(), and the tracer can do anything
after ptrace_attach() which sets GROUP_STOP_TRAPPING.

With the current code we know that GROUP_STOP_TRAPPING means: the
tracer can't go away from ptrace_attach() until we clear this bit.

How about the patch below instead?

Hmm. This is minor and off-topic, but perhaps it makes sense to move
the code which sets/waits GROUP_STOP_TRAPPING from ptrace_attach() to
the separate function, it can be called outside of tasklist_lock and
cred_guard_mutex.

And. Could you remind why ptrace_attach() does signal_wake_up() instead
of wake_up_state(TASK_STOPPED) ? OK, in general we shouldn't set
GROUP_STOP_PENDING without TIF_SIGPENDING, but in this case?

Oleg.

--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -256,9 +256,16 @@ unlock_tasklist:
 unlock_creds:
 	mutex_unlock(&task->signal->cred_guard_mutex);
 out:
-	if (wait_trap)
-		wait_event(current->signal->wait_chldexit,
-			   !(task->group_stop & GROUP_STOP_TRAPPING));
+	if (wait_trap) {
+		for (;;) {
+			set_current_state(TASK_UNINTERRUPTIBLE);
+			if (!(task->group_stop & GROUP_STOP_TRAPPING))
+				break;
+			schedule();
+		}
+		__set_current_state(TASK_RUNNING);
+	}
+
 	return retval;
 }

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -238,8 +238,7 @@ static void task_clear_group_stop_trappi
 {
 	if (unlikely(task->group_stop & GROUP_STOP_TRAPPING)) {
 		task->group_stop &= ~GROUP_STOP_TRAPPING;
-		__wake_up_sync(&task->parent->signal->wait_chldexit,
-			       TASK_UNINTERRUPTIBLE, 1);
+		wake_up_process(task->parent);
 	}
 }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH ptrace] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
  2011-05-08 13:35 ` Oleg Nesterov
@ 2011-05-08 14:02   ` Tejun Heo
  2011-05-08 15:34     ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2011-05-08 14:02 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Linus Torvalds, Andrew Morton

Hello, Oleg.

On Sun, May 08, 2011 at 03:35:43PM +0200, Oleg Nesterov wrote:
> > Given the relative low frequency of ptrace use, we would be much
> > better off leaving already complex wait_chldexit alone and using bit
> > waitqueue.
> 
> Well, I don't think so. wait_on_bit() looks as unnecessary complication
> to me. See below.

Why is wait_on_bit() a complication?  It's a well defined event
construct.

> But. Why do we need signal->wait_chldexit or bit waitqueue at all?
> Previously this was needed because wait_event(!GROUP_STOP_TRAPPING)
> was called from ptrace_check_attach(), and the tracer can do anything
> after ptrace_attach() which sets GROUP_STOP_TRAPPING.
>
> With the current code we know that GROUP_STOP_TRAPPING means: the
> tracer can't go away from ptrace_attach() until we clear this bit.

Several reasons.

* Because I'm gonna use TRAPPING for end of group stop notification
  too and move TRAPPING waiting to ptrace_check_attach() and
  wait_task_stopped().

* I dislike adding unqualified wake_up_process() unless the
  interlocked behavior with the waiter is very obvious.  The waiter is
  waiting for an event to happen and the waker is generating the
  event.  It's much more clear and robust to describe that
  relationship explicitly rather than depending on "on yeah, if the
  waiter wasn't expecting this event yet, other sleeping places are
  supposed to be safe against spurious wakeups so we're all good".

* I think it's a bad idea to design such tightly coupled waiter/waker
  relationship as in "because the waiter is not gonna do anything
  else, just waking up is safe".  It's way too fragile and prone to
  error when code gets updated later and it's not like we gain
  anything from it.  Let's make waiter wait for a well defined event
  and waker generate that event.  We have pretty good facilities for
  that kind of things.

> Hmm. This is minor and off-topic, but perhaps it makes sense to move
> the code which sets/waits GROUP_STOP_TRAPPING from ptrace_attach() to
> the separate function, it can be called outside of tasklist_lock and
> cred_guard_mutex.

I'm confused.  It should be set while siglock is held.  Which place
are you suggesting?

> And. Could you remind why ptrace_attach() does signal_wake_up() instead
> of wake_up_state(TASK_STOPPED) ? OK, in general we shouldn't set
> GROUP_STOP_PENDING without TIF_SIGPENDING, but in this case?

I don't know.  I can't find any good reason there.  Feel free to
change it to wake_up_state(TASK_STOPPED) but leaving it as
signal_wake_up() would look prettier after SEIZE addition, and I think
using the same @resume abstraction for all ptrace wakeups make things
easier to understand too.  I'll soon post the patchset.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH ptrace] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
  2011-05-08 14:02   ` Tejun Heo
@ 2011-05-08 15:34     ` Oleg Nesterov
  2011-05-08 15:40       ` Oleg Nesterov
  2011-05-08 15:55       ` Tejun Heo
  0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2011-05-08 15:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Linus Torvalds, Andrew Morton

On 05/08, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Sun, May 08, 2011 at 03:35:43PM +0200, Oleg Nesterov wrote:
> > > Given the relative low frequency of ptrace use, we would be much
> > > better off leaving already complex wait_chldexit alone and using bit
> > > waitqueue.
> >
> > Well, I don't think so. wait_on_bit() looks as unnecessary complication
> > to me. See below.
>
> Why is wait_on_bit() a complication?  It's a well defined event
> construct.

Sure, but in this case wait_chldexit or wake_up_process() looks more
simple and natural to me. OK, this is subjective.

> > But. Why do we need signal->wait_chldexit or bit waitqueue at all?
> > Previously this was needed because wait_event(!GROUP_STOP_TRAPPING)
> > was called from ptrace_check_attach(), and the tracer can do anything
> > after ptrace_attach() which sets GROUP_STOP_TRAPPING.
> >
> > With the current code we know that GROUP_STOP_TRAPPING means: the
> > tracer can't go away from ptrace_attach() until we clear this bit.
>
> Several reasons.
>
> * Because I'm gonna use TRAPPING for end of group stop notification
>   too and move TRAPPING waiting to ptrace_check_attach() and
>   wait_task_stopped().

Hmm. Right now this is not clear to me... OK, nevermind.

> * I dislike adding unqualified wake_up_process() unless the
>   interlocked behavior with the waiter is very obvious.

Imho this is what we currently have.

But this is subjective too, and I agree that the future patches can
change the current trivial contract. So:

Reviewed-by: Oleg Nesterov <oleg@redhat.com>

I'll add this fix to my tree.


> > Hmm. This is minor and off-topic, but perhaps it makes sense to move
> > the code which sets/waits GROUP_STOP_TRAPPING from ptrace_attach() to
> > the separate function, it can be called outside of tasklist_lock and
> > cred_guard_mutex.
>
> I'm confused.  It should be set while siglock is held.  Which place
> are you suggesting?

Of course, we should take siglock. I meant, we could probably make

	static void ptrace_s_stopped_traced(struct task_struct *task)
	{
		bool wait_trap = false;

		spin_lock(&task->sighand->siglock);
		/*
		 * If the task is already STOPPED, set GROUP_STOP_PENDING and
		 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
		 * will be cleared if the child completes the transition or any
		 * event which clears the group stop states happens.  We'll wait
		 * for the transition to complete before returning from this
		 * function.
		 *
		 * This hides STOPPED -> RUNNING -> TRACED transition from the
		 * attaching thread but a different thread in the same group can
		 * still observe the transient RUNNING state.  IOW, if another
		 * thread's WNOHANG wait(2) on the stopped tracee races against
		 * ATTACH, the wait(2) may fail due to the transient RUNNING.
		 *
		 * The following task_is_stopped() test is safe as both transitions
		 * in and out of STOPPED are protected by siglock.
		 */
		if (task_is_stopped(task)) {
			task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
			signal_wake_up(task, 1);
			wait_trap = true;
		}
		spin_unlock(&task->sighand->siglock);

		if (wait_trap)
			wait_event(current->signal->wait_chldexit,
				   !(task->group_stop & GROUP_STOP_TRAPPING));
		return retval;
	}

called by ptrace_attach() at the end.

> > And. Could you remind why ptrace_attach() does signal_wake_up() instead
> > of wake_up_state(TASK_STOPPED) ? OK, in general we shouldn't set
> > GROUP_STOP_PENDING without TIF_SIGPENDING, but in this case?
>
> I don't know.  I can't find any good reason there.  Feel free to
> change it to wake_up_state(TASK_STOPPED)

No, I agree signal_wake_up() looks more consistent. Just I wanted to
ensure I didn't miss something which makes it strictly necessary.

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH ptrace] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
  2011-05-08 15:34     ` Oleg Nesterov
@ 2011-05-08 15:40       ` Oleg Nesterov
  2011-05-08 15:55       ` Tejun Heo
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2011-05-08 15:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Linus Torvalds, Andrew Morton

On 05/08, Oleg Nesterov wrote:
>
> Of course, we should take siglock. I meant, we could probably make
>
> 	static void ptrace_s_stopped_traced(struct task_struct *task)
> 	{
> 		bool wait_trap = false;
>
> 		spin_lock(&task->sighand->siglock);

this needs lock_task_sighand() of course.

And in any case this is minor.

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH ptrace] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
  2011-05-08 15:34     ` Oleg Nesterov
  2011-05-08 15:40       ` Oleg Nesterov
@ 2011-05-08 15:55       ` Tejun Heo
  1 sibling, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2011-05-08 15:55 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Linus Torvalds, Andrew Morton

Hello,

On Sun, May 08, 2011 at 05:34:35PM +0200, Oleg Nesterov wrote:
> But this is subjective too, and I agree that the future patches can
> change the current trivial contract. So:
> 
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> 
> I'll add this fix to my tree.

Great, thanks.

> > > Hmm. This is minor and off-topic, but perhaps it makes sense to move
> > > the code which sets/waits GROUP_STOP_TRAPPING from ptrace_attach() to
> > > the separate function, it can be called outside of tasklist_lock and
> > > cred_guard_mutex.
> >
> > I'm confused.  It should be set while siglock is held.  Which place
> > are you suggesting?
> 
> Of course, we should take siglock. I meant, we could probably make
> 
> 	static void ptrace_s_stopped_traced(struct task_struct *task)
> 	{
> 		bool wait_trap = false;
> 
> 		spin_lock(&task->sighand->siglock);
> 		/*
> 		 * If the task is already STOPPED, set GROUP_STOP_PENDING and
> 		 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
> 		 * will be cleared if the child completes the transition or any
> 		 * event which clears the group stop states happens.  We'll wait
> 		 * for the transition to complete before returning from this
> 		 * function.
> 		 *
> 		 * This hides STOPPED -> RUNNING -> TRACED transition from the
> 		 * attaching thread but a different thread in the same group can
> 		 * still observe the transient RUNNING state.  IOW, if another
> 		 * thread's WNOHANG wait(2) on the stopped tracee races against
> 		 * ATTACH, the wait(2) may fail due to the transient RUNNING.
> 		 *
> 		 * The following task_is_stopped() test is safe as both transitions
> 		 * in and out of STOPPED are protected by siglock.
> 		 */
> 		if (task_is_stopped(task)) {
> 			task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
> 			signal_wake_up(task, 1);
> 			wait_trap = true;
> 		}
> 		spin_unlock(&task->sighand->siglock);
> 
> 		if (wait_trap)
> 			wait_event(current->signal->wait_chldexit,
> 				   !(task->group_stop & GROUP_STOP_TRAPPING));
> 		return retval;
> 	}
> 
> called by ptrace_attach() at the end.

Ah, okay.  Yeah, that might be a good idea but in the just posted
patch series, I implemented a different helper and moved the wait back
into ptrace_check_attach(), so if it needs further cleanup, it would
be great if we could do it on top of the series.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-05-08 15:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-05-06  9:52 [PATCH ptrace] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping() Tejun Heo
2011-05-08 13:35 ` Oleg Nesterov
2011-05-08 14:02   ` Tejun Heo
2011-05-08 15:34     ` Oleg Nesterov
2011-05-08 15:40       ` Oleg Nesterov
2011-05-08 15:55       ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.