stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Dave Jones <davej@redhat.com>,
	Peter Ziljstra <peterz@infradead.org>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	dhillf@gmail.com
Subject: [ 04/42] kthread: Prevent unpark race which puts threads on the wrong cpu
Date: Tue, 23 Apr 2013 14:52:02 -0700	[thread overview]
Message-ID: <20130423215206.010265655@linuxfoundation.org> (raw)
In-Reply-To: <20130423215205.523980967@linuxfoundation.org>

3.8-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit f2530dc71cf0822f90bb63ea4600caaef33a66bb upstream.

The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. Though the functionality is racy:

CPU0	       	 	CPU1  	     	    CPU2
unpark(T)				    wake_up_process(T)
  clear(SHOULD_PARK)	T runs
			leave parkme() due to !SHOULD_PARK
  bind_to(CPU2)		BUG_ON(wrong CPU)

We cannot let the tasks move themself to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.

The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.

Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.

Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, its a good hint in ps
and friends that these tasks aren't really there for the moment.

The migration thread has another related issue.

CPU0	      	     	 CPU1
Bring up CPU2
create_thread(T)
park(T)
 wait_for_completion()
			 parkme()
			 complete()
sched_set_stop_task()
			 schedule(TASK_PARKED)

The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronizaion before
sched_set_stop_task().

Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Peter Ziljstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/proc/array.c              |    1 
 include/linux/sched.h        |    5 ++--
 include/trace/events/sched.h |    2 -
 kernel/kthread.c             |   52 +++++++++++++++++++++++--------------------
 4 files changed, 33 insertions(+), 27 deletions(-)

--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -143,6 +143,7 @@ static const char * const task_state_arr
 	"x (dead)",		/*  64 */
 	"K (wakekill)",		/* 128 */
 	"W (waking)",		/* 256 */
+	"P (parked)",		/* 512 */
 };
 
 static inline const char *get_task_state(struct task_struct *tsk)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -163,9 +163,10 @@ print_cfs_rq(struct seq_file *m, int cpu
 #define TASK_DEAD		64
 #define TASK_WAKEKILL		128
 #define TASK_WAKING		256
-#define TASK_STATE_MAX		512
+#define TASK_PARKED		512
+#define TASK_STATE_MAX		1024
 
-#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKW"
+#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
 
 extern char ___assert_task_state[1 - 2*!!(
 		sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -147,7 +147,7 @@ TRACE_EVENT(sched_switch,
 		  __print_flags(__entry->prev_state & (TASK_STATE_MAX-1), "|",
 				{ 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" },
 				{ 16, "Z" }, { 32, "X" }, { 64, "x" },
-				{ 128, "W" }) : "R",
+				{ 128, "K" }, { 256, "W" }, { 512, "P" }) : "R",
 		__entry->prev_state & TASK_STATE_MAX ? "+" : "",
 		__entry->next_comm, __entry->next_pid, __entry->next_prio)
 );
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -124,12 +124,12 @@ void *kthread_data(struct task_struct *t
 
 static void __kthread_parkme(struct kthread *self)
 {
-	__set_current_state(TASK_INTERRUPTIBLE);
+	__set_current_state(TASK_PARKED);
 	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
 		if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
 			complete(&self->parked);
 		schedule();
-		__set_current_state(TASK_INTERRUPTIBLE);
+		__set_current_state(TASK_PARKED);
 	}
 	clear_bit(KTHREAD_IS_PARKED, &self->flags);
 	__set_current_state(TASK_RUNNING);
@@ -256,8 +256,13 @@ struct task_struct *kthread_create_on_no
 }
 EXPORT_SYMBOL(kthread_create_on_node);
 
-static void __kthread_bind(struct task_struct *p, unsigned int cpu)
+static void __kthread_bind(struct task_struct *p, unsigned int cpu, long state)
 {
+	/* Must have done schedule() in kthread() before we set_task_cpu */
+	if (!wait_task_inactive(p, state)) {
+		WARN_ON(1);
+		return;
+	}
 	/* It's safe because the task is inactive. */
 	do_set_cpus_allowed(p, cpumask_of(cpu));
 	p->flags |= PF_THREAD_BOUND;
@@ -274,12 +279,7 @@ static void __kthread_bind(struct task_s
  */
 void kthread_bind(struct task_struct *p, unsigned int cpu)
 {
-	/* Must have done schedule() in kthread() before we set_task_cpu */
-	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE)) {
-		WARN_ON(1);
-		return;
-	}
-	__kthread_bind(p, cpu);
+	__kthread_bind(p, cpu, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(kthread_bind);
 
@@ -324,6 +324,22 @@ static struct kthread *task_get_live_kth
 	return NULL;
 }
 
+static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)
+{
+	clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
+	/*
+	 * We clear the IS_PARKED bit here as we don't wait
+	 * until the task has left the park code. So if we'd
+	 * park before that happens we'd see the IS_PARKED bit
+	 * which might be about to be cleared.
+	 */
+	if (test_and_clear_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
+		if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
+			__kthread_bind(k, kthread->cpu, TASK_PARKED);
+		wake_up_state(k, TASK_PARKED);
+	}
+}
+
 /**
  * kthread_unpark - unpark a thread created by kthread_create().
  * @k:		thread created by kthread_create().
@@ -336,20 +352,8 @@ void kthread_unpark(struct task_struct *
 {
 	struct kthread *kthread = task_get_live_kthread(k);
 
-	if (kthread) {
-		clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
-		/*
-		 * We clear the IS_PARKED bit here as we don't wait
-		 * until the task has left the park code. So if we'd
-		 * park before that happens we'd see the IS_PARKED bit
-		 * which might be about to be cleared.
-		 */
-		if (test_and_clear_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
-			if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
-				__kthread_bind(k, kthread->cpu);
-			wake_up_process(k);
-		}
-	}
+	if (kthread)
+		__kthread_unpark(k, kthread);
 	put_task_struct(k);
 }
 
@@ -407,7 +411,7 @@ int kthread_stop(struct task_struct *k)
 	trace_sched_kthread_stop(k);
 	if (kthread) {
 		set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
-		clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
+		__kthread_unpark(k, kthread);
 		wake_up_process(k);
 		wait_for_completion(&kthread->exited);
 	}



  parent reply	other threads:[~2013-04-23 21:52 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-23 21:51 [ 00/42] 3.8.9-stable review Greg Kroah-Hartman
2013-04-23 21:51 ` [ 01/42] powerpc: add a missing label in resume_kernel Greg Kroah-Hartman
2013-04-23 21:52 ` [ 02/42] kvm/powerpc/e500mc: fix tlb invalidation on cpu migration Greg Kroah-Hartman
2013-04-23 21:52 ` [ 03/42] ARM: Do 15e0d9e37c (ARM: pm: let platforms select cpu_suspend support) properly Greg Kroah-Hartman
2013-04-23 21:52 ` Greg Kroah-Hartman [this message]
2013-04-23 21:52 ` [ 05/42] hrtimer: Dont reinitialize a cpu_base lock on CPU_UP Greg Kroah-Hartman
2013-04-23 21:52 ` [ 06/42] can: mcp251x: add missing IRQF_ONESHOT to request_threaded_irq Greg Kroah-Hartman
2013-04-23 21:52 ` [ 07/42] can: sja1000: fix handling on dt properties on little endian systems Greg Kroah-Hartman
2013-04-23 21:52 ` [ 08/42] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) Greg Kroah-Hartman
2013-04-23 21:52 ` [ 09/42] hugetlbfs: add swap entry check in follow_hugetlb_page() Greg Kroah-Hartman
2013-04-23 21:52 ` [ 10/42] fs/binfmt_elf.c: fix hugetlb memory check in vma_dump_size() Greg Kroah-Hartman
2013-04-23 21:52 ` [ 11/42] kernel/signal.c: stop info leak via the tkill and the tgkill syscalls Greg Kroah-Hartman
2013-04-23 21:52 ` [ 12/42] hfsplus: fix potential overflow in hfsplus_file_truncate() Greg Kroah-Hartman
2013-04-23 21:52 ` [ 13/42] md: raid1,10: Handle REQ_WRITE_SAME flag in write bios Greg Kroah-Hartman
2013-04-23 21:52 ` [ 14/42] KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) Greg Kroah-Hartman
2013-04-23 21:52 ` [ 15/42] KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) Greg Kroah-Hartman
2013-04-23 21:52 ` [ 16/42] KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798) Greg Kroah-Hartman
2013-04-23 21:52 ` [ 17/42] KVM: Allow cross page reads and writes from cached translations Greg Kroah-Hartman
2013-04-23 21:52 ` [ 18/42] ARM: i.MX35: enable MAX clock Greg Kroah-Hartman
2013-04-23 21:52 ` [ 19/42] ARM: clk-imx35: Bugfix iomux clock Greg Kroah-Hartman
2013-04-23 21:52 ` [ 20/42] tg3: Add 57766 device support Greg Kroah-Hartman
2013-04-23 21:52 ` [ 21/42] sched: Convert BUG_ON()s in try_to_wake_up_local() to WARN_ON_ONCE()s Greg Kroah-Hartman
2013-04-23 21:52 ` [ 22/42] sched/debug: Fix sd->*_idx limit range avoiding overflow Greg Kroah-Hartman
2013-05-10  2:14   ` Ben Hutchings
2013-05-10  7:59     ` Ingo Molnar
2013-05-28  1:47       ` Ben Hutchings
2013-04-23 21:52 ` [ 23/42] ARM: 7696/1: Fix kexec by setting outer_cache.inv_all for Feroceon Greg Kroah-Hartman
2013-04-23 21:52 ` [ 24/42] ARM: 7698/1: perf: fix group validation when using enable_on_exec Greg Kroah-Hartman
2013-04-23 21:52 ` [ 25/42] ath9k_htc: accept 1.x firmware newer than 1.3 Greg Kroah-Hartman
2013-04-23 21:52 ` [ 26/42] ath9k_hw: change AR9580 initvals to fix a stability issue Greg Kroah-Hartman
2013-04-23 21:52 ` [ 27/42] mac80211: fix cfg80211 interaction on auth/assoc request Greg Kroah-Hartman
2013-04-23 21:52 ` [ 28/42] ssb: implement spurious tone avoidance Greg Kroah-Hartman
2013-04-23 21:52 ` [ 29/42] crypto: algif - suppress sending source address information in recvmsg Greg Kroah-Hartman
2013-04-23 21:52 ` [ 30/42] perf: Treat attr.config as u64 in perf_swevent_init() Greg Kroah-Hartman
2013-04-23 21:52 ` [ 31/42] perf/x86: Fix offcore_rsp valid mask for SNB/IVB Greg Kroah-Hartman
2013-04-23 21:52 ` [ 32/42] userns: Dont let unprivileged users trick privileged users into setting the id_map Greg Kroah-Hartman
2013-04-23 21:52 ` [ 33/42] userns: Check uid_maps openers fsuid, not the current fsuid Greg Kroah-Hartman
2013-04-23 21:52 ` [ 34/42] userns: Changing any namespace id mappings should require privileges Greg Kroah-Hartman
2013-04-23 21:52 ` [ 35/42] vm: add vm_iomap_memory() helper function Greg Kroah-Hartman
2013-04-23 21:52 ` [ 36/42] vm: convert snd_pcm_lib_mmap_iomem() to vm_iomap_memory() helper Greg Kroah-Hartman
2013-04-23 21:52 ` [ 37/42] vm: convert fb_mmap " Greg Kroah-Hartman
2013-04-23 21:52 ` [ 38/42] vm: convert HPET mmap " Greg Kroah-Hartman
2013-04-23 21:52 ` [ 39/42] vm: convert mtdchar " Greg Kroah-Hartman
2013-04-23 21:52 ` [ 40/42] Btrfs: make sure nbytes are right after log replay Greg Kroah-Hartman
2013-04-23 21:52 ` [ 41/42] s390: move dummy io_remap_pfn_range() to asm/pgtable.h Greg Kroah-Hartman
2013-04-23 21:52 ` [ 42/42] Revert "MIPS: page.h: Provide more readable definition for PAGE_MASK." Greg Kroah-Hartman
2013-04-24 16:23 ` [ 00/42] 3.8.9-stable review Shuah Khan
2013-04-24 16:24   ` Greg Kroah-Hartman
2013-04-25 10:43 ` Satoru Takeuchi
2013-04-25 14:49   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130423215206.010265655@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davej@redhat.com \
    --cc=dhillf@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).