From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	manfred@colorfullife.com, will.deacon@arm.com,
	Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 3.14 22/79] sched/core: Fix TASK_DEAD race in finish_task_switch()
Date: Sat, 17 Oct 2015 19:05:14 -0700
Message-ID: <20151018020214.322850284@linuxfoundation.org>
In-Reply-To: <20151018020213.322172837@linuxfoundation.org>

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <peterz@infradead.org>

commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.

So the problem this patch is trying to address is as follows:

        CPU0                            CPU1

        context_switch(A, B)
                                        ttwu(A)
                                          LOCK A->pi_lock
                                          A->on_cpu == 0
        finish_task_switch(A)
          prev_state = A->state  <-.
          WMB                      |
          A->on_cpu = 0;           |
          UNLOCK rq0->lock         |
                                   |    context_switch(C, A)
                                   `--  A->state = TASK_DEAD
          prev_state == TASK_DEAD
            put_task_struct(A)
                                        context_switch(A, C)
                                        finish_task_switch(A)
                                          A->state == TASK_DEAD
                                            put_task_struct(A)

The argument being that the WMB will allow the load of A->state on CPU0
to cross over and observe CPU1's store of A->state, which will then
result in a double-drop and use-after-free.
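
As an illustration only (not kernel code), the interleaving in the diagram
can be replayed sequentially in userspace C. All names here (demo_task,
demo_put_task, demo_replay, the DEMO_* values) are hypothetical; the
reference count starts at 1 to stand for the single reference that the
final finish_task_switch() is supposed to drop.

```c
#include <assert.h>

/* Hypothetical stand-ins for task states; not the kernel's definitions. */
enum { DEMO_SLEEPING = 1, DEMO_DEAD = 64 };

struct demo_task {
	long state;
	int usage;	/* the one reference left for the final drop */
};

static void demo_put_task(struct demo_task *t)
{
	t->usage--;
}

/*
 * Replay the diagram in one thread. When load_is_late is nonzero, CPU0's
 * load of A->state is modelled as having crossed over CPU1's store.
 * Returns the final reference count: 0 is correct, -1 is a double drop.
 */
static int demo_replay(int load_is_late)
{
	struct demo_task a = { .state = DEMO_SLEEPING, .usage = 1 };
	long prev_state = a.state;	/* CPU0: prev_state = A->state */

	/* CPU0 clears A->on_cpu; CPU1 wakes A, runs it, and A exits. */
	a.state = DEMO_DEAD;

	if (load_is_late)
		prev_state = a.state;	/* CPU0: the load observed CPU1's store */

	if (prev_state == DEMO_DEAD)
		demo_put_task(&a);	/* CPU0: drop that should not happen */

	demo_put_task(&a);		/* CPU1: the legitimate final drop */
	return a.usage;
}

/* Small check of both orderings. */
static void demo_check(void)
{
	assert(demo_replay(0) == 0);	/* ordered load: one drop */
	assert(demo_replay(1) == -1);	/* late load: two drops */
}
```

The second case is the double-drop: the count goes below zero, which in the
kernel would mean freeing the task_struct while CPU1 still uses it.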

Now the comment states (and this was true once upon a long time ago)
that we need to observe A->state while holding rq->lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq->lock; it takes A->pi_lock these days.

We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.

The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.
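
A rough userspace analogue of that ordering, written with C11 atomics
rather than the kernel primitives, might look like the sketch below. The
names (sketch_task, sketch_finish_switch, sketch_try_wake, the SKETCH_*
values) are hypothetical, and the acquire load on the waker side is only a
stand-in for the control dependency and rmb that try_to_wake_up() uses.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for task states; not the kernel's definitions. */
enum { SKETCH_RUNNING = 0, SKETCH_DEAD = 64 };

struct sketch_task {
	atomic_long state;
	atomic_int on_cpu;
};

/*
 * CPU0 side: sample the state first, then clear on_cpu with release
 * semantics. The release store keeps the earlier load from being
 * reordered after it, which a write barrier alone does not guarantee.
 */
static long sketch_finish_switch(struct sketch_task *prev)
{
	long prev_state = atomic_load_explicit(&prev->state,
					       memory_order_relaxed);

	atomic_store_explicit(&prev->on_cpu, 0, memory_order_release);
	return prev_state;
}

/*
 * CPU1 side: only touch the task once on_cpu is observed as 0. The
 * acquire load pairs with the release store above (assumption: this
 * models the waker's ordering, it is not the kernel's code).
 */
static int sketch_try_wake(struct sketch_task *p)
{
	if (atomic_load_explicit(&p->on_cpu, memory_order_acquire) != 0)
		return 0;	/* still being switched out */

	atomic_store_explicit(&p->state, SKETCH_RUNNING,
			      memory_order_relaxed);
	return 1;
}

/* Single-threaded check of the logic only; it does not test ordering. */
static void sketch_self_check(void)
{
	struct sketch_task t;

	atomic_init(&t.state, SKETCH_DEAD);
	atomic_init(&t.on_cpu, 1);

	assert(sketch_try_wake(&t) == 0);
	assert(sketch_finish_switch(&t) == SKETCH_DEAD);
	assert(atomic_load(&t.on_cpu) == 0);
	assert(sketch_try_wake(&t) == 1);
}
```

With this pairing, any store the waker makes after seeing on_cpu == 0 is
ordered after CPU0's load of the state, so CPU0 cannot pick up a later
TASK_DEAD.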

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a18 ("sched: Remove rq->lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/sched/core.c  |   10 +++++-----
 kernel/sched/sched.h |    5 +++--
 2 files changed, 8 insertions(+), 7 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2136,11 +2136,11 @@ static void finish_task_switch(struct rq
 	 * If a task dies, then it sets TASK_DEAD in tsk->state and calls
 	 * schedule one last time. The schedule call will never return, and
 	 * the scheduled task must drop that reference.
-	 * The test for TASK_DEAD must occur while the runqueue locks are
-	 * still held, otherwise prev could be scheduled on another cpu, die
-	 * there before we look at prev->state, and then the reference would
-	 * be dropped twice.
-	 *		Manfred Spraul <manfred@colorfullife.com>
+	 *
+	 * We must observe prev->state before clearing prev->on_cpu (in
+	 * finish_lock_switch), otherwise a concurrent wakeup can get prev
+	 * running on another CPU and we could race with its RUNNING -> DEAD
+	 * transition, resulting in a double drop.
 	 */
 	prev_state = prev->state;
 	vtime_task_switch(prev);
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -994,9 +994,10 @@ static inline void finish_lock_switch(st
 	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
 	 * We must ensure this doesn't happen until the switch is completely
 	 * finished.
+	 *
+	 * Pairs with the control dependency and rmb in try_to_wake_up().
 	 */
-	smp_wmb();
-	prev->on_cpu = 0;
+	smp_store_release(&prev->on_cpu, 0);
 #endif
 #ifdef CONFIG_DEBUG_SPINLOCK
 	/* this is a valid case when another task releases the spinlock */


