linux-kernel.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	alan@lxorguk.ukuu.org.uk, Yasunori Goto <y-goto@jp.fujitsu.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, Michal Hocko <mhocko@suse.cz>
Subject: [ 125/127] sched: Fix ancient race in do_exit()
Date: Fri, 28 Sep 2012 13:34:58 -0700
Message-ID: <20120928203100.517080609@linuxfoundation.org>
In-Reply-To: <20120928203045.835238916@linuxfoundation.org>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yasunori Goto <y-goto@jp.fujitsu.com>

commit b5740f4b2cb3503b436925eb2242bc3d75cd3dfe upstream.

try_to_wake_up() can race with do_exit() and change a task's state from
TASK_DEAD back to TASK_RUNNING when the waking CPU is delayed by an SMI or
by running as a guest in a virtual machine. As a result, the exited task
is scheduled again and a panic occurs.

Here is the sequence in which it occurs:

 ----------------------------------+-----------------------------
                                   |
            CPU A                  |             CPU B
 ----------------------------------+-----------------------------

TASK A calls exit()....

do_exit()

  exit_mm()
    down_read(mm->mmap_sem);

    rwsem_down_failed_common()

      set TASK_UNINTERRUPTIBLE
      set waiter.task <= task A
      list_add to sem->wait_list
           :
      raw_spin_unlock_irq()
      (I/O interrupt occurred)

                                      __rwsem_do_wake(mmap_sem)

                                        list_del(&waiter->list);
                                        waiter->task = NULL
                                        wake_up_process(task A)
                                          try_to_wake_up()
                                             (task is still
                                               TASK_UNINTERRUPTIBLE)
                                             (p->on_rq is still 1.)

                                              ttwu_do_wakeup()
                                                 (*A)
                                                   :
     (I/O interrupt handler finished)

      if (!waiter.task)
          schedule() is not called
          because waiter.task is NULL.

      tsk->state = TASK_RUNNING

          :
                                              check_preempt_curr();
                                                  :
  task->state = TASK_DEAD
                                              (*B)
                                        <---    set TASK_RUNNING (*C)

     schedule()
     (exited task is running again)
     BUG_ON() is called!
 --------------------------------------------------------

The time between (*A) and (*B) is usually very short because interrupts
are disabled, so setting TASK_RUNNING at (*C) normally happens before
TASK_DEAD is set.

HOWEVER, if an SMI occurs between (*A) and (*B),
(*C) can execute AFTER TASK_DEAD is set!
The exited task is then scheduled again, and BUG_ON() is called....

If the system runs as a guest in a virtual machine, the time between
(*A) and (*B) may also be long because of hypervisor scheduling,
and the same phenomenon can occur.
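
As a rough user-space illustration (not kernel code), the interleaving
described above can be replayed as a fixed, single-threaded sequence.
The function replay_exit_race() and the waker_stalled flag are names made
up for this sketch; the enum only borrows the kernel's state names, and
the values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* State names borrowed from the kernel; the values here are arbitrary. */
enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE, TASK_DEAD };

/*
 * Hypothetical helper: replays the CPU A / CPU B sequence in program order.
 * waker_stalled models an SMI (or hypervisor preemption) hitting CPU B
 * between (*A) and (*B).
 */
enum task_state replay_exit_race(bool waker_stalled)
{
	/* Task A is asleep in down_read(). */
	enum task_state state = TASK_UNINTERRUPTIBLE;
	/* CPU B has passed the state check (*A) and still owes the store (*C). */
	bool store_pending = true;

	if (!waker_stalled) {
		/* Usual case: (*C) lands almost immediately. */
		state = TASK_RUNNING;
		store_pending = false;
	}

	/* CPU A: waiter.task is NULL, so schedule() is skipped. */
	state = TASK_RUNNING;
	/* CPU A: end of do_exit(). */
	state = TASK_DEAD;

	if (store_pending) {
		/* CPU B resumes late: (*C) overwrites TASK_DEAD. */
		state = TASK_RUNNING;
	}

	/* In this model only a stalled waker can resurrect the task. */
	assert(state == TASK_DEAD || waker_stalled);
	return state;
}
```

In this model replay_exit_race(false) leaves the task in TASK_DEAD, while
replay_exit_race(true) leaves it in TASK_RUNNING, which is the condition
that trips the BUG_ON().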

With this patch, do_exit() waits for task->pi_lock, which try_to_wake_up()
holds, to be released. This guarantees that the task becomes TASK_DEAD
only after the wakeup has completed.
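
To show why waiting on pi_lock closes the window, here is a small
tick-based model, in user-space C rather than kernel code. The names
replay_do_exit(), stall_ticks and wait_for_pi_lock are invented for this
sketch, and the boolean pi_lock_held only stands in for tsk->pi_lock.
Memory ordering (the smp_mb()) is not modelled; this is an assumption-laden
sketch of the ordering argument, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* State names borrowed from the kernel; the values here are arbitrary. */
enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE, TASK_DEAD };

/*
 * Hypothetical model: each loop iteration is one "tick" in which CPU B
 * (the waker) and then CPU A (the exiting task) each take one step.
 * stall_ticks is how long CPU B is held up (SMI, hypervisor) while it
 * owns pi_lock; wait_for_pi_lock selects the patched behaviour.
 */
enum task_state replay_do_exit(int stall_ticks, bool wait_for_pi_lock)
{
	enum task_state state = TASK_UNINTERRUPTIBLE;
	/* CPU B took pi_lock and already passed the state check (*A). */
	bool pi_lock_held = true;
	/* CPU A: 0 = set RUNNING, 1 = unlock-wait, 2 = set DEAD, 3 = done. */
	int exit_step = 0;

	while (pi_lock_held || exit_step < 3) {
		/* CPU B: burn one stall tick, or do the store (*C) and unlock. */
		if (pi_lock_held) {
			if (stall_ticks > 0) {
				stall_ticks--;
			} else {
				state = TASK_RUNNING;
				pi_lock_held = false;
			}
		}

		/* CPU A: advance through the tail of do_exit(). */
		switch (exit_step) {
		case 0:
			state = TASK_RUNNING;
			exit_step++;
			break;
		case 1:
			/* Patched path spins here until pi_lock is released. */
			if (!wait_for_pi_lock || !pi_lock_held)
				exit_step++;
			break;
		case 2:
			state = TASK_DEAD;
			exit_step++;
			break;
		default:
			break;
		}
	}

	/* With the wait in place the last store is always TASK_DEAD. */
	assert(!wait_for_pi_lock || state == TASK_DEAD);
	return state;
}
```

With stall_ticks = 10, the model returns TASK_RUNNING when
wait_for_pi_lock is false and TASK_DEAD when it is true.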

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120117174031.3118.E1E9C6FF@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/exit.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1049,6 +1049,22 @@ NORET_TYPE void do_exit(long code)
 
 	preempt_disable();
 	exit_rcu();
+
+	/*
+	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
+	 * when the following two conditions become true.
+	 *   - There is race condition of mmap_sem (It is acquired by
+	 *     exit_mm()), and
+	 *   - SMI occurs before setting TASK_RUNNING.
+	 *     (or hypervisor of virtual machine switches to other guest)
+	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
+	 *
+	 * To avoid it, we have to wait for releasing tsk->pi_lock which
+	 * is held by try_to_wake_up()
+	 */
+	smp_mb();
+	raw_spin_unlock_wait(&tsk->pi_lock);
+
 	/* causes final put_task_struct in finish_task_switch(). */
 	tsk->state = TASK_DEAD;
 	schedule();


