From: lizf@kernel.org
To: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
manfred@colorfullife.com, will.deacon@arm.com,
Ingo Molnar <mingo@kernel.org>, Zefan Li <lizefan@huawei.com>
Subject: [PATCH 3.4 42/92] sched/core: Fix TASK_DEAD race in finish_task_switch()
Date: Mon, 18 Apr 2016 18:45:47 +0800 [thread overview]
Message-ID: <1460976397-5688-42-git-send-email-lizf@kernel.org> (raw)
In-Reply-To: <1460976338-5631-1-git-send-email-lizf@kernel.org>
From: Peter Zijlstra <peterz@infradead.org>
3.4.112-rc1 review patch. If anyone has any objections, please let me know.
------------------
commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.
So the problem this patch is trying to address is as follows:
CPU0 CPU1
context_switch(A, B)
ttwu(A)
LOCK A->pi_lock
A->on_cpu == 0
finish_task_switch(A)
prev_state = A->state <-.
WMB |
A->on_cpu = 0; |
UNLOCK rq0->lock |
| context_switch(C, A)
`-- A->state = TASK_DEAD
prev_state == TASK_DEAD
put_task_struct(A)
context_switch(A, C)
finish_task_switch(A)
A->state == TASK_DEAD
put_task_struct(A)
The argument being that the WMB will allow the load of A->state on CPU0
to cross over and observe CPU1's store of A->state, which will then
result in a double-drop and use-after-free.
Now the comment states (and this was true once upon a long time ago)
that we need to observe A->state while holding rq->lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq->lock; it takes A->pi_lock these days.
We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.
The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a18 ("sched: Remove rq->lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[lizf: Backported to 3.4: use smb_mb() instead of smp_store_release(), which
is not defined in 3.4.y]
Signed-off-by: Zefan Li <lizefan@huawei.com>
---
kernel/sched/core.c | 10 +++++-----
kernel/sched/sched.h | 4 +++-
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15be435..609a226 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1949,11 +1949,11 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
* If a task dies, then it sets TASK_DEAD in tsk->state and calls
* schedule one last time. The schedule call will never return, and
* the scheduled task must drop that reference.
- * The test for TASK_DEAD must occur while the runqueue locks are
- * still held, otherwise prev could be scheduled on another cpu, die
- * there before we look at prev->state, and then the reference would
- * be dropped twice.
- * Manfred Spraul <manfred@colorfullife.com>
+ *
+ * We must observe prev->state before clearing prev->on_cpu (in
+ * finish_lock_switch), otherwise a concurrent wakeup can get prev
+ * running on another CPU and we could rave with its RUNNING -> DEAD
+ * transition, resulting in a double drop.
*/
prev_state = prev->state;
finish_arch_switch(prev);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4a5e739..44f4058 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -702,8 +702,10 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
* After ->on_cpu is cleared, the task can be moved to a different CPU.
* We must ensure this doesn't happen until the switch is completely
* finished.
+ *
+ * Pairs with the control dependency and rmb in try_to_wake_up().
*/
- smp_wmb();
+ smp_mb();
prev->on_cpu = 0;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
--
1.9.1
next prev parent reply other threads:[~2016-04-18 10:49 UTC|newest]
Thread overview: 98+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-18 10:45 [PATCH 3.4 00/92] 3.4.112-rc1 review lizf
2016-04-18 10:45 ` [PATCH 3.4 01/92] rc-core: fix remove uevent generation lizf
2016-04-18 10:45 ` [PATCH 3.4 02/92] PCI: Fix TI816X class code quirk lizf
2016-04-18 10:45 ` [PATCH 3.4 03/92] mac80211: enable assoc check for mesh interfaces lizf
2016-04-18 10:45 ` [PATCH 3.4 04/92] PCI: Add dev_flags bit to access VPD through function 0 lizf
2016-04-18 10:45 ` [PATCH 3.4 05/92] PCI: Add VPD function 0 quirk for Intel Ethernet devices lizf
2016-04-18 10:45 ` [PATCH 3.4 06/92] powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers lizf
2016-04-18 10:45 ` [PATCH 3.4 07/92] svcrdma: Fix send_reply() scatter/gather set-up lizf
2016-04-18 10:45 ` [PATCH 3.4 08/92] md/raid0: update queue parameter in a safer location lizf
2016-04-18 10:45 ` [PATCH 3.4 09/92] auxdisplay: ks0108: fix refcount lizf
2016-04-18 10:45 ` [PATCH 3.4 10/92] devres: fix devres_get() lizf
2016-04-18 10:45 ` [PATCH 3.4 11/92] windfarm: decrement client count when unregistering lizf
2016-04-18 10:45 ` [PATCH 3.4 12/92] NFSv4: don't set SETATTR for O_RDONLY|O_EXCL lizf
2016-04-18 10:45 ` [PATCH 3.4 13/92] usb: host: ehci-sys: delete useless bus_to_hcd conversion lizf
2016-04-18 10:45 ` [PATCH 3.4 14/92] USB: ftdi_sio: Added custom PID for CustomWare products lizf
2016-04-18 10:45 ` [PATCH 3.4 15/92] eCryptfs: Invalidate dcache entries when lower i_nlink is zero lizf
2016-04-18 10:45 ` [PATCH 3.4 16/92] DRM - radeon: Don't link train DisplayPort on HPD until we get the dpcd lizf
2016-04-18 10:45 ` [PATCH 3.4 17/92] of/address: Don't loop forever in of_find_matching_node_by_address() lizf
2016-04-18 10:45 ` [PATCH 3.4 18/92] drivercore: Fix unregistration path of platform devices lizf
2016-04-18 10:45 ` [PATCH 3.4 19/92] SUNRPC: xs_reset_transport must mark the connection as disconnected lizf
2016-04-18 10:45 ` [PATCH 3.4 20/92] IB/mlx4: Use correct SL on AH query under RoCE lizf
2016-04-18 10:45 ` [PATCH 3.4 21/92] IB/uverbs: Fix race between ib_uverbs_open and remove_one lizf
2016-04-18 10:45 ` [PATCH 3.4 22/92] Add radeon suspend/resume quirk for HP Compaq dc5750 lizf
2016-04-18 10:45 ` [PATCH 3.4 23/92] IB/uverbs: reject invalid or unknown opcodes lizf
2016-04-18 10:45 ` [PATCH 3.4 24/92] hpfs: update ctime and mtime on directory modification lizf
2016-04-18 10:45 ` [PATCH 3.4 25/92] crypto: ghash-clmulni: specify context size for ghash async algorithm lizf
2016-04-18 10:45 ` [PATCH 3.4 26/92] fs: create and use seq_show_option for escaping lizf
2016-04-18 10:45 ` [PATCH 3.4 27/92] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free lizf
2016-04-18 10:45 ` [PATCH 3.4 28/92] hfs: fix B-tree corruption after insertion at position 0 lizf
2016-04-18 10:45 ` [PATCH 3.4 29/92] scsi_dh: fix randconfig build error lizf
2016-04-18 10:45 ` [PATCH 3.4 30/92] ARM: 8429/1: disable GCC SRA optimization lizf
2016-04-18 10:45 ` [PATCH 3.4 31/92] powerpc/MSI: Fix race condition in tearing down MSI interrupts lizf
2016-04-18 10:45 ` [PATCH 3.4 32/92] perf header: Fixup reading of HEADER_NRCPUS feature lizf
2016-04-18 10:45 ` [PATCH 3.4 33/92] ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode lizf
2016-04-18 10:45 ` [PATCH 3.4 34/92] ARM: fix Thumb2 signal handling when ARMv6 is enabled lizf
2016-04-18 10:45 ` [PATCH 3.4 35/92] x86/platform: Fix Geode LX timekeeping in the generic x86 build lizf
2016-04-18 10:45 ` [PATCH 3.4 36/92] module: Fix locking in symbol_put_addr() lizf
2016-04-18 10:45 ` [PATCH 3.4 37/92] ipv6: Fix IPsec pre-encap fragmentation check lizf
2016-04-18 10:45 ` [PATCH 3.4 38/92] ASoC: fix broken pxa SoC support lizf
2016-04-18 10:45 ` [PATCH 3.4 39/92] MIPS: dma-default: Fix 32-bit fall back to GFP_DMA lizf
2016-04-18 10:45 ` [PATCH 3.4 40/92] md/raid0: apply base queue limits *before* disk_stack_limits lizf
2016-04-18 10:45 ` [PATCH 3.4 41/92] iwlwifi: dvm: fix D3 firmware PN programming lizf
2016-04-18 10:45 ` lizf [this message]
2016-04-18 10:45 ` [PATCH 3.4 43/92] IB/cm: Fix rb-tree duplicate free and use-after-free lizf
2016-04-18 10:45 ` [PATCH 3.4 44/92] powerpc/rtas: Validate rtas.entry before calling enter_rtas() lizf
2016-04-18 10:45 ` [PATCH 3.4 45/92] md/raid10: ensure device failure recorded before write request returns lizf
2016-04-18 10:45 ` [PATCH 3.4 46/92] md/raid10: don't clear bitmap bit when bad-block-list write fails lizf
2016-04-18 10:45 ` [PATCH 3.4 47/92] md/raid1: ensure device failure recorded before write request returns lizf
2016-04-18 10:45 ` [PATCH 3.4 48/92] md/raid1: don't clear bitmap bit when bad-block-list write fails lizf
2016-04-18 10:45 ` [PATCH 3.4 49/92] drm: crtc: integer overflow in drm_property_create_blob() lizf
2016-04-18 10:45 ` [PATCH 3.4 50/92] spi: spi-pxa2xx: Check status register to determine if SSSR_TINT is disabled lizf
2016-04-18 10:45 ` [PATCH 3.4 51/92] spi: Fix documentation of spi_alloc_master() lizf
2016-04-18 10:45 ` [PATCH 3.4 52/92] btrfs: skip waiting on ordered range for special files lizf
2016-04-18 10:45 ` [PATCH 3.4 53/92] regmap: debugfs: Ensure we don't underflow when printing access masks lizf
2016-04-18 10:45 ` [PATCH 3.4 54/92] regmap: debugfs: Don't bother actually printing when calculating max length lizf
2016-04-18 10:46 ` [PATCH 3.4 55/92] KVM: x86: trap AMD MSRs for the TSeg base and mask lizf
2016-04-18 10:46 ` [PATCH 3.4 56/92] usb: Use the USB_SS_MULT() macro to get the burst multiplier lizf
2016-04-18 10:46 ` [PATCH 3.4 57/92] xhci: give command abortion one more chance before killing xhci lizf
2016-04-18 10:46 ` [PATCH 3.4 58/92] usb: xhci: Clear XHCI_STATE_DYING on start lizf
2016-04-18 10:46 ` [PATCH 3.4 59/92] xhci: change xhci 1.0 only restrictions to support xhci 1.1 lizf
2016-04-18 10:46 ` [PATCH 3.4 60/92] cifs: use server timestamp for ntlmv2 authentication lizf
2016-04-18 10:46 ` [PATCH 3.4 61/92] ocfs2/dlm: fix deadlock when dispatch assert master lizf
2016-04-18 11:29 ` Joseph Qi
2016-04-19 0:18 ` Zefan Li
2016-04-18 10:46 ` [PATCH 3.4 62/92] ath9k: declare required extra tx headroom lizf
2016-04-18 10:46 ` [PATCH 3.4 63/92] m68k: Define asmlinkage_protect lizf
2016-04-18 10:46 ` [PATCH 3.4 64/92] x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map lizf
2016-04-18 10:46 ` [PATCH 3.4 65/92] UBI: Validate data_size lizf
2016-04-18 10:46 ` [PATCH 3.4 66/92] UBI: return ENOSPC if no enough space available lizf
2016-04-18 10:46 ` [PATCH 3.4 67/92] x86/process: Add proper bound checks in 64bit get_wchan() lizf
2016-04-18 10:46 ` [PATCH 3.4 68/92] genirq: Fix race in register_irq_proc() lizf
2016-04-18 10:46 ` [PATCH 3.4 69/92] mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault lizf
2016-04-18 10:46 ` [PATCH 3.4 70/92] clocksource: Fix abs() usage w/ 64bit values lizf
2016-04-18 10:46 ` [PATCH 3.4 71/92] USB: Add reset-resume quirk for two Plantronics usb headphones lizf
2016-04-18 10:46 ` [PATCH 3.4 72/92] usb: Add device quirk for Logitech PTZ cameras lizf
2016-04-18 10:46 ` [PATCH 3.4 73/92] tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c lizf
2016-04-18 10:46 ` [PATCH 3.4 74/92] drivers/tty: require read access for controlling terminal lizf
2016-04-18 10:46 ` [PATCH 3.4 75/92] ALSA: synth: Fix conflicting OSS device registration on AWE32 lizf
2016-04-18 10:46 ` [PATCH 3.4 76/92] xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing) lizf
2016-04-18 10:46 ` [PATCH 3.4 77/92] crypto: ahash - ensure statesize is non-zero lizf
2016-04-18 10:46 ` [PATCH 3.4 78/92] iommu/vt-d: fix range computation when making room for large pages lizf
2016-04-18 10:46 ` [PATCH 3.4 79/92] xhci: handle no ping response error properly lizf
2016-04-18 10:46 ` [PATCH 3.4 80/92] xhci: Add spurious wakeup quirk for LynxPoint-LP controllers lizf
2016-04-18 10:46 ` [PATCH 3.4 81/92] crypto: api - Only abort operations on fatal signal lizf
2016-04-18 10:46 ` [PATCH 3.4 82/92] ASoC: wm8904: Correct number of EQ registers lizf
2016-04-18 10:46 ` [PATCH 3.4 83/92] iommu/amd: Don't clear DTE flags when modifying it lizf
2016-04-18 10:46 ` [PATCH 3.4 84/92] drm/nouveau/gem: return only valid domain when there's only one lizf
2016-04-18 10:46 ` [PATCH 3.4 85/92] mm: make sendfile(2) killable lizf
2016-04-18 10:46 ` [PATCH 3.4 86/92] dm btree: fix leak of bufio-backed block in btree_split_beneath error path lizf
2016-04-18 10:46 ` [PATCH 3.4 87/92] mvsas: Fix NULL pointer dereference in mvs_slot_task_free lizf
2016-04-18 10:46 ` [PATCH 3.4 88/92] raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang lizf
2016-04-18 10:46 ` [PATCH 3.4 89/92] usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message lizf
2016-04-18 10:46 ` [PATCH 3.4 90/92] pipe: Fix buffer offset after partially failed read lizf
2016-04-18 10:46 ` [PATCH 3.4 91/92] splice: sendfile() at once fails for big files lizf
2016-04-18 10:57 ` [PATCH 3.4 92/92] x86/iopl/64: Properly context-switch IOPL on Xen PV lizf
2016-04-18 16:37 ` [PATCH 3.4 00/92] 3.4.112-rc1 review Guenter Roeck
2016-04-19 0:18 ` Zefan Li
2016-04-22 16:48 ` Christoph Biedl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1460976397-5688-42-git-send-email-lizf@kernel.org \
--to=lizf@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=manfred@colorfullife.com \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.