From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
alan@lxorguk.ukuu.org.uk, Dave Jones <davej@redhat.com>,
Christoph Lameter <cl@linux.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Mel Gorman <mgorman@suse.de>, Josh Boyer <jwboyer@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [ 75/84] mempolicy: remove mempolicy sharing
Date: Thu, 11 Oct 2012 11:03:59 +0900 [thread overview]
Message-ID: <20121011015430.572152637@linuxfoundation.org> (raw)
In-Reply-To: <20121011015417.017144658@linuxfoundation.org>
3.0-stable review patch. If anyone has any objections, please let me know.
------------------
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
commit 869833f2c5c6e4dd09a5378cfc665ffb4615e5d2 upstream.
Dave Jones' system call fuzz testing tool "trinity" triggered the
following bug error with slab debugging enabled
=============================================================================
BUG numa_policy (Not tainted): Poison overwritten
-----------------------------------------------------------------------------
INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
__slab_alloc+0x3d3/0x445
kmem_cache_alloc+0x29d/0x2b0
mpol_new+0xa3/0x140
sys_mbind+0x142/0x620
system_call_fastpath+0x16/0x1b
INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
__slab_free+0x2e/0x1de
kmem_cache_free+0x25a/0x260
__mpol_put+0x27/0x30
remove_vma+0x68/0x90
exit_mmap+0x118/0x140
mmput+0x73/0x110
exit_mm+0x108/0x130
do_exit+0x162/0xb90
do_group_exit+0x4f/0xc0
sys_exit_group+0x17/0x20
system_call_fastpath+0x16/0x1b
INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x (null) flags=0x20000000004080
INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0
The problem is that the structure is being prematurely freed due to a
reference count imbalance. In the following case mbind(addr, len) should
replace the memory policies of both vma1 and vma2 and thus they will
become to share the same mempolicy and the new mempolicy will have the
MPOL_F_SHARED flag.
+-------------------+-------------------+
| vma1 | vma2(shmem) |
+-------------------+-------------------+
| |
addr addr+len
alloc_pages_vma() uses get_vma_policy() and mpol_cond_put() pair for
maintaining the mempolicy reference count. The current rule is that
get_vma_policy() only increments refcount for shmem VMA and
mpol_conf_put() only decrements refcount if the policy has
MPOL_F_SHARED.
In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED!
The reference count will be decreased even though was not increased
whenever alloc_page_vma() is called. This has been broken since commit
[52cd3b07: mempolicy: rework mempolicy Reference Counting] in 2008.
There is another serious bug with the sharing of memory policies.
Currently, mempolicy rebind logic (it is called from cpuset rebinding)
ignores a refcount of mempolicy and override it forcibly. Thus, any
mempolicy sharing may cause mempolicy corruption. The bug was
introduced by commit [68860ec1: cpusets: automatic numa mempolicy
rebinding].
Ideally, the shared policy handling would be rewritten to either
properly handle COW of the policy structures or at least reference count
MPOL_F_SHARED based exclusively on information within the policy.
However, this patch takes the easier approach of disabling any policy
sharing between VMAs. Each new range allocated with sp_alloc will
allocate a new policy, set the reference count to 1 and drop the
reference count of the old policy. This increases the memory footprint
but is not expected to be a major problem as mbind() is unlikely to be
used for fine-grained ranges. It is also inefficient because it means
we allocate a new policy even in cases where mbind_range() could use the
new_policy passed to it. However, it is more straight-forward and the
change should be invisible to the user.
[mgorman@suse.de: Edited changelog]
Reported-by: Dave Jones <davej@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/mempolicy.c | 52 ++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 38 insertions(+), 14 deletions(-)
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -606,24 +606,39 @@ check_range(struct mm_struct *mm, unsign
return first;
}
-/* Apply policy to a single VMA */
-static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
+/*
+ * Apply policy to a single VMA
+ * This must be called with the mmap_sem held for writing.
+ */
+static int vma_replace_policy(struct vm_area_struct *vma,
+ struct mempolicy *pol)
{
- int err = 0;
- struct mempolicy *old = vma->vm_policy;
+ int err;
+ struct mempolicy *old;
+ struct mempolicy *new;
pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
vma->vm_start, vma->vm_end, vma->vm_pgoff,
vma->vm_ops, vma->vm_file,
vma->vm_ops ? vma->vm_ops->set_policy : NULL);
- if (vma->vm_ops && vma->vm_ops->set_policy)
+ new = mpol_dup(pol);
+ if (IS_ERR(new))
+ return PTR_ERR(new);
+
+ if (vma->vm_ops && vma->vm_ops->set_policy) {
err = vma->vm_ops->set_policy(vma, new);
- if (!err) {
- mpol_get(new);
- vma->vm_policy = new;
- mpol_put(old);
+ if (err)
+ goto err_out;
}
+
+ old = vma->vm_policy;
+ vma->vm_policy = new; /* protected by mmap_sem */
+ mpol_put(old);
+
+ return 0;
+ err_out:
+ mpol_put(new);
return err;
}
@@ -666,7 +681,7 @@ static int mbind_range(struct mm_struct
if (err)
goto out;
}
- err = policy_vma(vma, new_pol);
+ err = vma_replace_policy(vma, new_pol);
if (err)
goto out;
}
@@ -2091,15 +2106,24 @@ static void sp_delete(struct shared_poli
static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
struct mempolicy *pol)
{
- struct sp_node *n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
+ struct sp_node *n;
+ struct mempolicy *newpol;
+ n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
if (!n)
return NULL;
+
+ newpol = mpol_dup(pol);
+ if (IS_ERR(newpol)) {
+ kmem_cache_free(sn_cache, n);
+ return NULL;
+ }
+ newpol->flags |= MPOL_F_SHARED;
+
n->start = start;
n->end = end;
- mpol_get(pol);
- pol->flags |= MPOL_F_SHARED; /* for unref */
- n->policy = pol;
+ n->policy = newpol;
+
return n;
}
next prev parent reply other threads:[~2012-10-11 2:27 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-11 2:01 [ 00/84] 3.0.46-stable review Greg Kroah-Hartman
2012-10-11 2:02 ` [ 01/84] mn10300: only add -mmem-funcs to KBUILD_CFLAGS if gcc supports it Greg Kroah-Hartman
2012-10-11 2:02 ` [ 02/84] kbuild: make: fix if_changed when command contains backslashes Greg Kroah-Hartman
2012-10-11 2:02 ` [ 03/84] media: rc: ite-cir: Initialise ite_dev::rdev earlier Greg Kroah-Hartman
2012-10-11 2:02 ` [ 04/84] ACPI: run _OSC after ACPI_FULL_INITIALIZATION Greg Kroah-Hartman
2012-10-11 2:02 ` [ 05/84] PCI: acpiphp: check whether _ADR evaluation succeeded Greg Kroah-Hartman
2012-10-11 2:02 ` [ 06/84] lib/gcd.c: prevent possible div by 0 Greg Kroah-Hartman
2012-10-11 2:02 ` [ 07/84] kernel/sys.c: call disable_nonboot_cpus() in kernel_restart() Greg Kroah-Hartman
2012-10-11 2:02 ` [ 08/84] drivers/scsi/atp870u.c: fix bad use of udelay Greg Kroah-Hartman
2012-10-11 2:02 ` [ 09/84] workqueue: add missing smp_wmb() in process_one_work() Greg Kroah-Hartman
2012-10-11 2:02 ` [ 10/84] xfrm: Workaround incompatibility of ESN and async crypto Greg Kroah-Hartman
2012-10-11 2:02 ` [ 11/84] xfrm_user: return error pointer instead of NULL Greg Kroah-Hartman
2012-10-11 2:02 ` [ 12/84] xfrm_user: return error pointer instead of NULL #2 Greg Kroah-Hartman
2012-10-11 2:02 ` [ 13/84] xfrm: fix a read lock imbalance in make_blackhole Greg Kroah-Hartman
2012-10-11 2:02 ` [ 14/84] xfrm_user: fix info leak in copy_to_user_auth() Greg Kroah-Hartman
2012-10-11 2:02 ` [ 15/84] xfrm_user: fix info leak in copy_to_user_state() Greg Kroah-Hartman
2012-10-11 2:03 ` [ 16/84] xfrm_user: fix info leak in copy_to_user_policy() Greg Kroah-Hartman
2012-10-11 2:03 ` [ 17/84] xfrm_user: fix info leak in copy_to_user_tmpl() Greg Kroah-Hartman
2012-10-11 2:03 ` [ 18/84] xfrm_user: dont copy esn replay window twice for new states Greg Kroah-Hartman
2012-10-11 2:03 ` [ 19/84] xfrm_user: ensure user supplied esn replay window is valid Greg Kroah-Hartman
2012-10-11 2:03 ` [ 20/84] net: ethernet: davinci_cpdma: decrease the desc count when cleaning up the remaining packets Greg Kroah-Hartman
2012-10-11 2:03 ` [ 21/84] ixp4xx_hss: fix build failure due to missing linux/module.h inclusion Greg Kroah-Hartman
2012-10-11 2:03 ` [ 22/84] netxen: check for root bus in netxen_mask_aer_correctable Greg Kroah-Hartman
2012-10-11 2:03 ` [ 23/84] net-sched: sch_cbq: avoid infinite loop Greg Kroah-Hartman
2012-10-11 2:03 ` [ 24/84] pkt_sched: fix virtual-start-time update in QFQ Greg Kroah-Hartman
2012-10-11 2:03 ` [ 25/84] sierra_net: Endianess bug fix Greg Kroah-Hartman
2012-10-11 2:03 ` [ 26/84] 8021q: fix mac_len recomputation in vlan_untag() Greg Kroah-Hartman
2012-10-11 2:03 ` [ 27/84] ipv6: release reference of ip6_null_entrys dst entry in __ip6_del_rt Greg Kroah-Hartman
2012-10-11 2:03 ` [ 28/84] tcp: flush DMA queue before sk_wait_data if rcv_wnd is zero Greg Kroah-Hartman
2012-10-11 2:03 ` [ 29/84] sctp: Dont charge for data in sndbuf again when transmitting packet Greg Kroah-Hartman
2012-10-11 2:03 ` [ 30/84] pppoe: drop PPPOX_ZOMBIEs in pppoe_release Greg Kroah-Hartman
2012-10-11 2:03 ` [ 31/84] net: small bug on rxhash calculation Greg Kroah-Hartman
2012-10-11 2:03 ` [ 32/84] net: guard tcp_set_keepalive() to tcp sockets Greg Kroah-Hartman
2012-10-11 2:03 ` [ 33/84] ipv4: raw: fix icmp_filter() Greg Kroah-Hartman
2012-10-11 2:03 ` [ 34/84] ipv6: raw: fix icmpv6_filter() Greg Kroah-Hartman
2012-10-11 2:03 ` [ 35/84] ipv6: mip6: fix mip6_mh_filter() Greg Kroah-Hartman
2012-10-11 2:03 ` [ 36/84] l2tp: fix a typo in l2tp_eth_dev_recv() Greg Kroah-Hartman
2012-10-11 2:03 ` [ 37/84] netrom: copy_datagram_iovec can fail Greg Kroah-Hartman
2012-10-11 2:03 ` [ 38/84] net: do not disable sg for packets requiring no checksum Greg Kroah-Hartman
2012-10-11 2:03 ` [ 39/84] aoe: assert AoE packets marked as " Greg Kroah-Hartman
2012-10-11 2:03 ` [ 40/84] tg3: Fix TSO CAP for 5704 devs w / ASF enabled Greg Kroah-Hartman
2012-10-11 2:03 ` [ 41/84] SCSI: zfcp: Make trace record tags unique Greg Kroah-Hartman
2012-10-11 2:03 ` [ 42/84] SCSI: zfcp: Do not wakeup while suspended Greg Kroah-Hartman
2012-10-11 2:03 ` [ 43/84] SCSI: zfcp: remove invalid reference to list iterator variable Greg Kroah-Hartman
2012-10-11 2:03 ` [ 44/84] SCSI: zfcp: restore refcount check on port_remove Greg Kroah-Hartman
2012-10-11 2:03 ` [ 45/84] SCSI: zfcp: only access zfcp_scsi_dev for valid scsi_device Greg Kroah-Hartman
2012-10-11 2:03 ` [ 46/84] PCI: Check P2P bridge for invalid secondary/subordinate range Greg Kroah-Hartman
2012-10-11 2:03 ` [ 47/84] ext4: online defrag is not supported for journaled files Greg Kroah-Hartman
2012-10-11 2:03 ` [ 48/84] ext4: always set i_op in ext4_mknod() Greg Kroah-Hartman
2012-10-11 2:03 ` [ 49/84] ext4: fix fdatasync() for files with only i_size changes Greg Kroah-Hartman
2012-10-11 2:03 ` [ 50/84] ASoC: wm9712: Fix name of Capture Switch Greg Kroah-Hartman
2012-10-11 2:03 ` [ 51/84] mm: fix invalidate_complete_page2() lock ordering Greg Kroah-Hartman
2012-10-11 2:03 ` [ 52/84] mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP Greg Kroah-Hartman
2012-10-11 2:03 ` [ 53/84] ALSA: aloop - add locking to timer access Greg Kroah-Hartman
2012-10-11 2:03 ` [ 54/84] ALSA: usb - disable broken hw volume for Tenx TP6911 Greg Kroah-Hartman
2012-10-11 2:03 ` [ 55/84] ALSA: USB: Support for (original) Xbox Communicator Greg Kroah-Hartman
2012-10-11 2:03 ` [ 56/84] drm/radeon: only adjust default clocks on NI GPUs Greg Kroah-Hartman
2012-10-11 2:03 ` [ 57/84] drm/radeon: Add MSI quirk for gateway RS690 Greg Kroah-Hartman
2012-10-11 2:03 ` [ 58/84] drm/radeon: force MSIs on RS690 asics Greg Kroah-Hartman
2012-10-11 2:03 ` [ 59/84] rcu: Fix day-one dyntick-idle stall-warning bug Greg Kroah-Hartman
2012-10-11 2:03 ` [ 60/84] r8169: fix wake on lan setting for non-8111E Greg Kroah-Hartman
2012-10-11 7:15 ` Jonathan Nieder
2012-10-11 10:59 ` Greg Kroah-Hartman
2012-10-11 2:03 ` [ 61/84] r8169: dont enable rx when shutdown Greg Kroah-Hartman
2012-10-11 2:03 ` [ 62/84] r8169: remove erroneous processing of always set bit Greg Kroah-Hartman
2012-10-11 2:03 ` [ 63/84] r8169: jumbo fixes Greg Kroah-Hartman
2012-10-11 2:03 ` [ 64/84] r8169: expand received packet length indication Greg Kroah-Hartman
2012-10-11 2:03 ` [ 65/84] r8169: increase the delay parameter of pm_schedule_suspend Greg Kroah-Hartman
2012-10-11 2:03 ` [ 66/84] r8169: Rx FIFO overflow fixes Greg Kroah-Hartman
2012-10-11 2:03 ` [ 67/84] r8169: fix Config2 MSIEnable bit setting Greg Kroah-Hartman
2012-10-11 2:03 ` [ 68/84] r8169: missing barriers Greg Kroah-Hartman
2012-10-11 2:03 ` [ 69/84] r8169: runtime resume before shutdown Greg Kroah-Hartman
2012-10-11 2:03 ` [ 70/84] r8169: Config1 is read-only on 8168c and later Greg Kroah-Hartman
2012-10-11 2:03 ` [ 71/84] r8169: 8168c and later require bit 0x20 to be set in Config2 for PME signaling Greg Kroah-Hartman
2012-10-11 2:03 ` [ 72/84] r8169: fix unsigned int wraparound with TSO Greg Kroah-Hartman
2012-10-11 2:03 ` [ 73/84] r8169: call netif_napi_del at errpaths and at driver unload Greg Kroah-Hartman
2012-10-11 2:03 ` [ 74/84] revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages" Greg Kroah-Hartman
2012-10-11 2:03 ` Greg Kroah-Hartman [this message]
2012-10-11 2:04 ` [ 76/84] mempolicy: fix a race in shared_policy_replace() Greg Kroah-Hartman
2012-10-11 2:04 ` [ 77/84] mempolicy: fix refcount leak in mpol_set_shared_policy() Greg Kroah-Hartman
2012-10-11 2:04 ` [ 78/84] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma() Greg Kroah-Hartman
2012-10-11 2:04 ` [ 79/84] CPU hotplug, cpusets, suspend: Dont modify cpusets during suspend/resume Greg Kroah-Hartman
2012-10-11 2:04 ` [ 80/84] mtd: autcpu12-nvram: Fix compile breakage Greg Kroah-Hartman
2012-10-11 2:04 ` [ 81/84] mtd: nandsim: bugfix: fail if overridesize is too big Greg Kroah-Hartman
2012-10-11 2:04 ` [ 82/84] mtd: nand: Use the mirror BBT descriptor when reading its version Greg Kroah-Hartman
2012-10-11 2:04 ` [ 83/84] mtd: omap2: fix omap_nand_remove segfault Greg Kroah-Hartman
2012-10-11 2:04 ` [ 84/84] mtd: omap2: fix module loading Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121011015430.572152637@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=cl@linux.com \
--cc=davej@redhat.com \
--cc=jwboyer@gmail.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox