From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
Yasunori Goto <y-goto@jp.fujitsu.com>,
Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@elte.hu>
Subject: [ 013/108] sched: Fix ancient race in do_exit()
Date: Sun, 07 Oct 2012 23:58:47 +0100
Message-ID: <20121007225836.524377798@decadent.org.uk>
In-Reply-To: <20121007225834.673681075@decadent.org.uk>
3.2-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yasunori Goto <y-goto@jp.fujitsu.com>
commit b5740f4b2cb3503b436925eb2242bc3d75cd3dfe upstream.
try_to_wake_up() can change a task's state from TASK_DEAD back to
TASK_RUNNING when it races with an SMI, or when the kernel runs as a
guest of a virtual machine. As a result, the exited task is scheduled
again and the kernel panics.

Here is the sequence in which it occurs:
----------------------------------+-----------------------------
                                  |
             CPU A                |             CPU B
----------------------------------+-----------------------------

TASK A calls exit()....

do_exit()
  exit_mm()
    down_read(mm->mmap_sem);
      rwsem_down_failed_common()
        set TASK_UNINTERRUPTIBLE
        set waiter.task <= task A
        list_add to sem->wait_list
             :
        raw_spin_unlock_irq()
        (I/O interruption occurred)
                                      __rwsem_do_wake(mmap_sem)
                                        list_del(&waiter->list);
                                        waiter->task = NULL
                                        wake_up_process(task A)
                                          try_to_wake_up()
                                            (task is still
                                             TASK_UNINTERRUPTIBLE,
                                             p->on_rq is still 1)
                                            ttwu_do_wakeup()
                                               (*A)
                                                 :
    (I/O interruption handler finished)

        if (!waiter.task)
            schedule() is not called
            because waiter.task is NULL.

        tsk->state = TASK_RUNNING
            :
                                            check_preempt_curr();
                                                 :
  task->state = TASK_DEAD
                                               (*B)
                                      <---     set TASK_RUNNING (*C)
  schedule()
  (exited task is running again)
  BUG_ON() is called!
--------------------------------------------------------

The execution time between (*A) and (*B) is usually very short because
interrupts are disabled there, so setting TASK_RUNNING at (*C) normally
happens before TASK_DEAD is set.

HOWEVER, if an SMI interrupts CPU B between (*A) and (*B), (*C) can
execute AFTER TASK_DEAD has been set! The exited task is then scheduled
again, and BUG_ON() is called.

If the system runs as a guest of a virtual machine, the time between
(*A) and (*B) can also be long because the hypervisor may switch to
another guest, and the same problem can occur.

With this patch, do_exit() waits for tsk->pi_lock, which
try_to_wake_up() holds, to be released. This guarantees that the task
becomes TASK_DEAD only after any wakeup already in progress has
finished.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120117174031.3118.E1E9C6FF@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
kernel/exit.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1019,6 +1019,22 @@ NORET_TYPE void do_exit(long code)
 
 	preempt_disable();
 	exit_rcu();
+
+	/*
+	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
+	 * when the following two conditions become true.
+	 *   - There is a race condition on mmap_sem (it is acquired by
+	 *     exit_mm()), and
+	 *   - an SMI occurs before TASK_RUNNING is set
+	 *     (or the hypervisor of a virtual machine switches to another guest).
+	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD.
+	 *
+	 * To avoid it, we have to wait for tsk->pi_lock, which is held by
+	 * try_to_wake_up(), to be released.
+	 */
+	smp_mb();
+	raw_spin_unlock_wait(&tsk->pi_lock);
+
 	/* causes final put_task_struct in finish_task_switch(). */
 	tsk->state = TASK_DEAD;
 	schedule();