public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Yasunori Goto <y-goto@jp.fujitsu.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@elte.hu>
Cc: Dan Carpenter <dan.carpenter@oracle.com>,
	Michael Davidson <md@google.com>,
	Suleiman Souhlal <suleiman@google.com>,
	Julien Tinnes <jln@google.com>, Aaron Durbin <adurbin@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>, Roland McGrath <roland@hack.frob.com>,
	Alexander Gordeev <agordeev@redhat.com>
Subject: TASK_DEAD && ttwu() again (Was: ensure arch_ptrace/ptrace_request can never race with SIGKILL)
Date: Wed, 23 Jan 2013 20:19:46 +0100	[thread overview]
Message-ID: <20130123191946.GA19101@redhat.com> (raw)
In-Reply-To: <20130121194723.GA18775@redhat.com>

To avoid the confusion, this is not connected to ptrace_freeze_traced()
changes...

With or without these changes, there is another problem: a spurious
wakeup from try_to_wake_up(TASK_NORMAL) which doesn't necessarily see
the "right" task->state.

As for ptrace_stop() this is purely theoretical, but I thought that
perhaps it makes sense to extract the "mb + unlock_wait(pi_lock)" code
from do_exit() into the generic helper,
set_current_state_sync_because_we_cant_tolerate_a_wrong_wakeup().
But when I look at this code again I am not sure it is right.

Let me remind the problem. To oversimplify, we have

	try_to_wake_up(task, state)
	{
		lock(task->pi_lock);

		if (task->state & state)
			task->state = TASK_RUNNING;

		unlock(task->pi_lock);
	}

And this means that a task doing

	current->state = STATE_1;
	// no schedule() in between
	current->state = STATE_2;
	schedule();

can be actually woken up by try_to_wake_up(STATE_1) even if it already
sleeps in STATE_2.

Usually this is fine, any wait_event-like code should be careful. But
sometimes we can't afford the false wakeup, that is why do_wait() roughly
does

	do_exit()
	{
		// down_read(mmap_sem) can do this without schedule()
		current->state = TASK_UNINTERRUPTIBLE;
		current->state = TASK_RUNNING;

		mb();

		spin_unlock_wait(current->pi_lock);

		current->state = TASK_DEAD;
		schedule();
	}

This should ensure that any subsequent (after unlock_wait) try_to_wake_up()
can't see state == UNINTERRUPTIBLE and I think this works.

But. Somehow we missed the fact (I think) that we also need to serialize
unlock_wait() and "state = TASK_DEAD". The code above can be reordered,

	do_exit()
	{
		// down_read(mmap_sem) can do this without schedule()
		current->state = TASK_UNINTERRUPTIBLE;
		current->state = TASK_RUNNING;

		mb();

		current->state = TASK_DEAD;
		// !!! ttwu() can change ->state here !!!
		spin_unlock_wait(current->pi_lock);

		schedule();
	}

and we have the same problem again. So _I think_ that we we need another
mb() after unlock_wait() ?

And, afaics, in theory we can't simply move the current mb() down, after
unlock_wait().  (again, only in theory, if nothing else we should have
the implicit barrrers after we played with ->state in the past).

Or perhaps we should modify ttwu_do_wakeup() to not blindly set RUNNING,
say, cmpxchg(old_state, RUNNING). But this is not simple/nice.

Or I missed something?

Oleg.


--- x/kernel/exit.c
+++ x/kernel/exit.c
@@ -869,8 +869,15 @@ void do_exit(long code)
 	 * To avoid it, we have to wait for releasing tsk->pi_lock which
 	 * is held by try_to_wake_up()
 	 */
+
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
+	/*
+	 * The first mb() ensures that after that try_to_wake_up() must see
+	 * state == TASK_RUNNING. We need another one to ensure that we set
+	 * TASK_DEAD only after ->pi_lock is really unlocked.
+	 */
+	smp_mb();
 
 	/* causes final put_task_struct in finish_task_switch(). */
 	tsk->state = TASK_DEAD;


  parent reply	other threads:[~2013-01-23 19:21 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20130115172613.GA3011@redhat.com>
     [not found] ` <20130116181830.GA6469@redhat.com>
     [not found]   ` <CA+55aFzkWqSEzw9oa5JodrM2NWE0H_AF7xyzRhd+DQ=PB=ZT2A@mail.gmail.com>
     [not found]     ` <20130118153700.GA27915@redhat.com>
     [not found]       ` <CA+55aFxEow_-PoX0xFa07yOi6az=6uVx8zeOsfToErmzh7dB8A@mail.gmail.com>
     [not found]         ` <20130118172854.GA29753@redhat.com>
     [not found]           ` <20130118175224.GA520@redhat.com>
     [not found]             ` <CA+55aFyEsU-pkX557A-m+xoGkA_v+fXEyA8z8HbJ5J8K1jObeg@mail.gmail.com>
     [not found]               ` <20130118185559.GA3773@redhat.com>
     [not found]                 ` <CA+55aFy=newnMbx53HipyWbRs2mUUPSqXXCpSfDLW78gkro37g@mail.gmail.com>
2013-01-20 19:24                   ` [PATCH 0/4] (Was: ptrace: prevent PTRACE_SETREGS from corrupting stack) Oleg Nesterov
2013-01-20 19:25                     ` [PATCH 1/4] ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() Oleg Nesterov
2013-01-20 19:25                     ` [PATCH 2/4] ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL Oleg Nesterov
2013-01-20 19:46                       ` [PATCH v2 " Oleg Nesterov
2013-01-20 20:21                       ` [PATCH " Linus Torvalds
2013-01-21 17:21                         ` Oleg Nesterov
2013-01-21 18:27                           ` Linus Torvalds
2013-01-21 19:47                             ` [PATCH v3 0/3] " Oleg Nesterov
2013-01-21 19:47                               ` [PATCH v3 1/3] ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() Oleg Nesterov
2013-01-21 19:48                               ` [PATCH v3 2/3] ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL Oleg Nesterov
2013-01-22 17:52                                 ` [PATCH v4 " Oleg Nesterov
2013-01-21 19:48                               ` [PATCH v3 3/3] wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task Oleg Nesterov
2013-01-22 17:51                               ` [PATCH v3 0/3] ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL Oleg Nesterov
2013-01-23 19:19                               ` Oleg Nesterov [this message]
2013-01-23 19:50                                 ` TASK_DEAD && ttwu() again (Was: ensure arch_ptrace/ptrace_request can never race with SIGKILL) Tejun Heo
2013-01-24 18:50                                   ` Oleg Nesterov
2013-01-20 19:25                     ` [PATCH 3/4] ia64: kill thread_matches(), unexport ptrace_check_attach() Oleg Nesterov
2013-01-20 20:23                       ` Linus Torvalds
2013-01-20 19:26                     ` [PATCH 4/4] wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task Oleg Nesterov
2013-01-20 19:35                     ` [PATCH 0/4] (Was: ptrace: prevent PTRACE_SETREGS from corrupting stack) Oleg Nesterov
2013-01-23 18:00                     ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130123191946.GA19101@redhat.com \
    --to=oleg@redhat.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=adurbin@google.com \
    --cc=agordeev@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.carpenter@oracle.com \
    --cc=jln@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=md@google.com \
    --cc=mingo@elte.hu \
    --cc=roland@hack.frob.com \
    --cc=suleiman@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=y-goto@jp.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox