From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
Zwane Mwaikambo <zwane@arm.linux.org.uk>,
"Theodore Ts'o" <tytso@mit.edu>,
Randy Dunlap <rdunlap@xenotime.net>,
Dave Jones <davej@redhat.com>,
Chuck Wolber <chuckw@quantumlinux.com>,
Chris Wedgwood <reviews@ml.cw.f00f.org>,
Michael Krufky <mkrufky@linuxtv.org>,
Chuck Ebbert <cebbert@redhat.com>,
Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Jiri Slaby <jirislaby@gmail.com>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
Paul Menage <menage@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Hugh Dickins <hugh@veritas.com>
Subject: [patch 70/71] mm owner: fix race between swapoff and exit
Date: Mon, 6 Oct 2008 17:40:29 -0700
Message-ID: <20081007004029.GS3055@suse.de>
In-Reply-To: <20081007003634.GA3055@suse.de>
[-- Attachment #1: mm-owner-fix-race-between-swapoff-and-exit.patch --]
[-- Type: text/plain, Size: 5407 bytes --]
2.6.26-stable review patch. If anyone has any objections, please let us
know.
------------------
From: Balbir Singh <balbir@linux.vnet.ibm.com>
[Here's a backport of 2.6.27-rc8's 31a78f23bac0069004e69f98808b6988baccb6b6
to 2.6.26 or 2.6.26.5: I wouldn't trouble -stable for the (root only)
swapoff case which uncovered the bug, but the /proc/<pid>/<mmstats> case
is open to all users, so I think it is worth plugging in the next 2.6.26-stable.
- Hugh]
There is a race between mm->owner assignment and swapoff. It is easier to
hit when slab poisoning is enabled for task structs. The condition occurs
when try_to_unuse() runs in parallel with an exiting task. A similar race
can occur with other callers of get_task_mm(), such as /proc/<pid>/<mmstats>,
ptrace, or page migration.
CPU0                                        CPU1
                                            try_to_unuse
                                            looks at mm = task0->mm
                                            increments mm->mm_users
task0 exits
mm->owner needs to be updated, but no
new owner is found (mm_users > 1, but
no other task has task->mm = task0->mm)
mm_update_next_owner() returns, leaving
mm->owner pointing at task0
mmput(mm) decrements mm->mm_users
task0 freed
                                            dereferencing mm->owner fails
The fix is to notify the subsystems through the mm_owner_changed() callback
when no new owner is found, passing NULL as the new task.
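To make the race and the fix concrete, here is a small single-threaded user-space model of the owner hand-off. It is a sketch, not kernel code. `struct model_mm`, `struct model_task`, `model_update_next_owner()` and the `owner_changed_fn` hook are hypothetical stand-ins for mm_struct, task_struct, mm_update_next_owner() and a subsystem's mm_owner_changed() callback. Suppose the only thing keeping mm_users above 1 is a borrowed reference, like the one try_to_unuse() or get_task_mm() takes. Then no task can take over, so the model reports the change with a NULL new owner and clears the pointer instead of leaving it dangling.

```c
#include <stddef.h>

struct model_task;

/* Hypothetical stand-in for struct mm_struct. */
struct model_mm {
	int mm_users;               /* tasks plus borrowed references */
	struct model_task *owner;   /* owning task, or NULL if none */
};

/* Hypothetical stand-in for struct task_struct. */
struct model_task {
	const char *name;
	struct model_mm *mm;
};

/* Hypothetical hook standing in for a subsystem's mm_owner_changed(). */
typedef void (*owner_changed_fn)(struct model_task *old,
				 struct model_task *new);

/*
 * Called when 'exiting' (the current owner) drops its mm. Only act if
 * others still use the mm and we own it, then look for another task
 * that shares the same mm.
 */
static void model_update_next_owner(struct model_mm *mm,
				    struct model_task *exiting,
				    struct model_task *tasks, size_t ntasks,
				    owner_changed_fn cb)
{
	size_t i;

	if (mm->mm_users <= 1 || mm->owner != exiting)
		return;

	for (i = 0; i < ntasks; i++) {
		struct model_task *c = &tasks[i];

		if (c != exiting && c->mm == mm) {
			cb(mm->owner, c);   /* hand over to a real user */
			mm->owner = c;
			return;
		}
	}

	/*
	 * mm_users > 1 but no task uses this mm: the extra reference is a
	 * borrowed one. Notify with a NULL new owner, then clear the
	 * pointer so nobody follows it into a freed task.
	 */
	cb(mm->owner, NULL);
	mm->owner = NULL;
}
```

In the first scenario of the test, task0 exits while a swapoff-style reference holds mm_users at 2. The owner ends up NULL rather than pointing at the soon-to-be-freed task0.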
Jiri Slaby:
mm->owner was set to NULL before calling cgroup_mm_owner_callbacks(), but
it must be set afterwards, so that NULL is not passed as the old owner,
which causes an oops.
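The ordering matters for two reasons. The callback dispatcher dereferences the old owner to find its group, yet after this patch it tolerates a NULL new owner. The sketch below models that dispatcher in user space. `struct model_group`, `model_task_group()`, `model_owner_callbacks()` and `model_clear_owner()` are hypothetical names, loosely following cgroup_mm_owner_callbacks() in the patch. A counter replaces the real subsystem callbacks. Clearing the owner before the call would pass NULL to `model_task_group(old, ...)`, which is the oops described above.

```c
#include <stddef.h>

#define MODEL_NR_SUBSYS 2

/* Hypothetical per-subsystem group, shared by tasks in the same group. */
struct model_group {
	int id;
};

/* Hypothetical task: one group pointer per subsystem. */
struct model_task {
	struct model_group *grp[MODEL_NR_SUBSYS];
};

struct model_mm {
	struct model_task *owner;
};

/* Stand-in for task_cgroup(): dereferences 'task', so it must be non-NULL. */
static struct model_group *model_task_group(struct model_task *task, int ss)
{
	return task->grp[ss];
}

static int notifications;   /* counts simulated mm_owner_changed() calls */

static void model_owner_callbacks(struct model_task *old,
				  struct model_task *new)
{
	int i;

	for (i = 0; i < MODEL_NR_SUBSYS; i++) {
		struct model_group *oldgrp = model_task_group(old, i);
		struct model_group *newgrp = NULL;   /* stays NULL if no new owner */

		if (new)
			newgrp = model_task_group(new, i);
		if (oldgrp == newgrp)
			continue;                    /* same group: nothing to tell */
		notifications++;
	}
}

/* Correct order: notify with the real old owner, then clear it. */
static void model_clear_owner(struct model_mm *mm)
{
	model_owner_callbacks(mm->owner, NULL);
	mm->owner = NULL;
}
```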
Daisuke Nishimura:
mm_update_next_owner() may set mm->owner to NULL, so mem_cgroup_from_task()
and its callers must handle that case to avoid an oops.
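On the consumer side, the pattern is a lookup that accepts a NULL owner, plus a charge path that backs out cleanly. The following is a user-space sketch of that shape. `struct model_memcg`, `model_mem_from_task()` and `model_charge()` are hypothetical, and malloc()/free() stand in for kmem_cache_alloc()/kmem_cache_free() on page_cgroup_cache. If the owner was cleared, the tracking record is freed and the charge is skipped without reporting an error.

```c
#include <stdlib.h>

/* Hypothetical memory controller group and task. */
struct model_memcg {
	long usage;
};

struct model_task {
	struct model_memcg *memcg;
};

/* Like mem_cgroup_from_task() after the patch: a NULL task is allowed. */
static struct model_memcg *model_mem_from_task(struct model_task *p)
{
	if (!p)
		return NULL;
	return p->memcg;
}

/*
 * Hypothetical charge path: allocate a tracking record, look up the
 * owner's group, and give up quietly if the owner was cleared.
 * On success the caller owns *recp and must free it.
 */
static int model_charge(struct model_task *owner, int **recp)
{
	struct model_memcg *mem;
	int *rec;

	*recp = NULL;
	rec = malloc(sizeof(*rec));
	if (!rec)
		return -1;              /* allocation failure */

	mem = model_mem_from_task(owner);
	if (!mem) {
		free(rec);              /* owner gone: drop record, charge nothing */
		return 0;
	}

	mem->usage++;
	*rec = 1;
	*recp = rec;
	return 0;
}
```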
Hugh Dickins:
Testing these patches produced a lockdep warning and a hang below
exec_mmap(). exit_mm() releases mmap_sem (up_read) before calling
mm_update_next_owner(), which now takes mmap_sem for writing, so
exec_mmap() needs to do the same. With the call moved inside the
if (old_mm) block, mm_need_new_owner() no longer needs to allow for a
NULL mm.
Reported-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 fs/exec.c       |    2 +-
 kernel/cgroup.c |    5 +++--
 kernel/exit.c   |   12 ++++++++++--
 mm/memcontrol.c |   13 +++++++++++++
 4 files changed, 27 insertions(+), 5 deletions(-)
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -740,11 +740,11 @@ static int exec_mmap(struct mm_struct *m
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
 	task_unlock(tsk);
-	mm_update_next_owner(old_mm);
 	arch_pick_mmap_layout(mm);
 	if (old_mm) {
 		up_read(&old_mm->mmap_sem);
 		BUG_ON(active_mm != old_mm);
+		mm_update_next_owner(old_mm);
 		mmput(old_mm);
 		return 0;
 	}
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2761,14 +2761,15 @@ void cgroup_fork_callbacks(struct task_s
  */
 void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
 {
-	struct cgroup *oldcgrp, *newcgrp;
+	struct cgroup *oldcgrp, *newcgrp = NULL;
 
 	if (need_mm_owner_callback) {
 		int i;
 		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
 			oldcgrp = task_cgroup(old, ss->subsys_id);
-			newcgrp = task_cgroup(new, ss->subsys_id);
+			if (new)
+				newcgrp = task_cgroup(new, ss->subsys_id);
 			if (oldcgrp == newcgrp)
 				continue;
 			if (ss->mm_owner_changed)
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -577,8 +577,6 @@ mm_need_new_owner(struct mm_struct *mm,
 	 * If there are other users of the mm and the owner (us) is exiting
 	 * we need to find a new owner to take on the responsibility.
 	 */
-	if (!mm)
-		return 0;
 	if (atomic_read(&mm->mm_users) <= 1)
 		return 0;
 	if (mm->owner != p)
@@ -621,6 +619,16 @@ retry:
 	} while_each_thread(g, c);
 
 	read_unlock(&tasklist_lock);
+	/*
+	 * We found no owner yet mm_users > 1: this implies that we are
+	 * most likely racing with swapoff (try_to_unuse()) or /proc or
+	 * ptrace or page migration (get_task_mm()). Mark owner as NULL,
+	 * so that subsystems can understand the callback and take action.
+	 */
+	down_write(&mm->mmap_sem);
+	cgroup_mm_owner_callbacks(mm->owner, NULL);
+	mm->owner = NULL;
+	up_write(&mm->mmap_sem);
 	return;
 
 assign_new_owner:
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -250,6 +250,14 @@ static struct mem_cgroup *mem_cgroup_fro
 
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
 {
+	/*
+	 * mm_update_next_owner() may clear mm->owner to NULL
+	 * if it races with swapoff, page migration, etc.
+	 * So this can be called with p == NULL.
+	 */
+	if (unlikely(!p))
+		return NULL;
+
 	return container_of(task_subsys_state(p, mem_cgroup_subsys_id),
 				struct mem_cgroup, css);
 }
@@ -574,6 +582,11 @@ retry:
 
 		rcu_read_lock();
 		mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
+		if (unlikely(!mem)) {
+			rcu_read_unlock();
+			kmem_cache_free(page_cgroup_cache, pc);
+			return 0;
+		}
 		/*
 		 * For every charge from the cgroup, increment reference count
 		 */
--