stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	alan@lxorguk.ukuu.org.uk, Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [ 13/47] tmpfs mempolicy: fix /proc/mounts corrupting memory
Date: Wed,  9 Jan 2013 12:35:08 -0800	[thread overview]
Message-ID: <20130109201731.014766529@linuxfoundation.org> (raw)
In-Reply-To: <20130109201729.435233560@linuxfoundation.org>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit f2a07f40dbc603c15f8b06e6ec7f768af67b424f upstream.

Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.

Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask() when
no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
With slab poisoning, you can then rely on mpol_to_str() to set the bit
for node 0x6b6b, probably in the next page above the caller's stack.

mpol_parse_str() is only called from shmem_parse_options(): no_context
is always true, so call it unused for now, and remove !no_context code.
Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
expect.  Then mpol_to_str() can ignore its no_context argument also,
the mpol being appropriately initialized whether contextualized or not.
Rename its no_context unused too, and let subsequent patch remove them
(that's not needed for stable backporting, which would involve rejects).

I don't understand why MPOL_LOCAL is described as a pseudo-policy:
it's a reasonable policy which suffers from a confusing implementation
in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
much more robust if MPOL_LOCAL were recognized in switch statements
throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
empty) nodes mask like everyone else, instead of its preferred_node
variant (I presume an optimization from the days before MPOL_LOCAL).
But that would take me too long to get right and fully tested.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/mempolicy.c |   64 +++++++++++++++++++++++----------------------------------
 1 file changed, 26 insertions(+), 38 deletions(-)

--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2308,8 +2308,7 @@ void numa_default_policy(void)
  */
 
 /*
- * "local" is pseudo-policy:  MPOL_PREFERRED with MPOL_F_LOCAL flag
- * Used only for mpol_parse_str() and mpol_to_str()
+ * "local" is implemented internally by MPOL_PREFERRED with MPOL_F_LOCAL flag.
  */
 #define MPOL_LOCAL MPOL_MAX
 static const char * const policy_modes[] =
@@ -2324,28 +2323,21 @@ static const char * const policy_modes[]
 
 #ifdef CONFIG_TMPFS
 /**
- * mpol_parse_str - parse string to mempolicy
+ * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
  * @str:  string containing mempolicy to parse
  * @mpol:  pointer to struct mempolicy pointer, returned on success.
- * @no_context:  flag whether to "contextualize" the mempolicy
+ * @unused:  redundant argument, to be removed later.
  *
  * Format of input:
  *	<mode>[=<flags>][:<nodelist>]
  *
- * if @no_context is true, save the input nodemask in w.user_nodemask in
- * the returned mempolicy.  This will be used to "clone" the mempolicy in
- * a specific context [cpuset] at a later time.  Used to parse tmpfs mpol
- * mount option.  Note that if 'static' or 'relative' mode flags were
- * specified, the input nodemask will already have been saved.  Saving
- * it again is redundant, but safe.
- *
  * On success, returns 0, else 1
  */
-int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
+int mpol_parse_str(char *str, struct mempolicy **mpol, int unused)
 {
 	struct mempolicy *new = NULL;
 	unsigned short mode;
-	unsigned short uninitialized_var(mode_flags);
+	unsigned short mode_flags;
 	nodemask_t nodes;
 	char *nodelist = strchr(str, ':');
 	char *flags = strchr(str, '=');
@@ -2433,24 +2425,23 @@ int mpol_parse_str(char *str, struct mem
 	if (IS_ERR(new))
 		goto out;
 
-	if (no_context) {
-		/* save for contextualization */
-		new->w.user_nodemask = nodes;
-	} else {
-		int ret;
-		NODEMASK_SCRATCH(scratch);
-		if (scratch) {
-			task_lock(current);
-			ret = mpol_set_nodemask(new, &nodes, scratch);
-			task_unlock(current);
-		} else
-			ret = -ENOMEM;
-		NODEMASK_SCRATCH_FREE(scratch);
-		if (ret) {
-			mpol_put(new);
-			goto out;
-		}
-	}
+	/*
+	 * Save nodes for mpol_to_str() to show the tmpfs mount options
+	 * for /proc/mounts, /proc/pid/mounts and /proc/pid/mountinfo.
+	 */
+	if (mode != MPOL_PREFERRED)
+		new->v.nodes = nodes;
+	else if (nodelist)
+		new->v.preferred_node = first_node(nodes);
+	else
+		new->flags |= MPOL_F_LOCAL;
+
+	/*
+	 * Save nodes for contextualization: this will be used to "clone"
+	 * the mempolicy in a specific context [cpuset] at a later time.
+	 */
+	new->w.user_nodemask = nodes;
+
 	err = 0;
 
 out:
@@ -2470,13 +2461,13 @@ out:
  * @buffer:  to contain formatted mempolicy string
  * @maxlen:  length of @buffer
  * @pol:  pointer to mempolicy to be formatted
- * @no_context:  "context free" mempolicy - use nodemask in w.user_nodemask
+ * @unused:  redundant argument, to be removed later.
  *
  * Convert a mempolicy into a string.
  * Returns the number of characters in buffer (if positive)
  * or an error (negative)
  */
-int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
+int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
 {
 	char *p = buffer;
 	int l;
@@ -2502,7 +2493,7 @@ int mpol_to_str(char *buffer, int maxlen
 	case MPOL_PREFERRED:
 		nodes_clear(nodes);
 		if (flags & MPOL_F_LOCAL)
-			mode = MPOL_LOCAL;	/* pseudo-policy */
+			mode = MPOL_LOCAL;
 		else
 			node_set(pol->v.preferred_node, nodes);
 		break;
@@ -2510,10 +2501,7 @@ int mpol_to_str(char *buffer, int maxlen
 	case MPOL_BIND:
 		/* Fall through */
 	case MPOL_INTERLEAVE:
-		if (no_context)
-			nodes = pol->w.user_nodemask;
-		else
-			nodes = pol->v.nodes;
+		nodes = pol->v.nodes;
 		break;
 
 	default:



  parent reply	other threads:[~2013-01-09 20:35 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-09 20:34 [ 00/47] 3.0.58-stable review Greg Kroah-Hartman
2013-01-09 20:34 ` [ 01/47] bonding: Bonding driver does not consider the gso_max_size/gso_max_segs setting of slave devices Greg Kroah-Hartman
2013-01-09 20:34 ` [ 02/47] bonding: fix race condition in bonding_store_slaves_active Greg Kroah-Hartman
2013-01-09 20:34 ` [ 03/47] sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails Greg Kroah-Hartman
2013-01-09 20:34 ` [ 04/47] sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall Greg Kroah-Hartman
2013-01-09 20:35 ` [ 05/47] ne2000: add the right platform device Greg Kroah-Hartman
2013-01-09 20:35 ` [ 06/47] irda: sir_dev: Fix copy/paste typo Greg Kroah-Hartman
2013-01-09 20:35 ` [ 07/47] usb/ipheth: Add iPhone 5 support Greg Kroah-Hartman
2013-01-09 20:35 ` [ 08/47] pnpacpi: fix incorrect TEST_ALPHA() test Greg Kroah-Hartman
2013-01-09 20:35 ` [ 09/47] exec: do not leave bprm->interp on stack Greg Kroah-Hartman
2013-01-09 20:35 ` [ 10/47] x86, 8042: Enable A20 using KBC to fix S3 resume on some MSI laptops Greg Kroah-Hartman
2013-01-09 20:35 ` [ 11/47] virtio: force vring descriptors to be allocated from lowmem Greg Kroah-Hartman
2013-01-09 20:35 ` [ 12/47] mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED Greg Kroah-Hartman
2013-01-09 20:35 ` Greg Kroah-Hartman [this message]
2013-01-09 20:35 ` [ 14/47] ALSA: usb-audio: Avoid autopm calls after disconnection Greg Kroah-Hartman
2013-01-09 20:35 ` [ 15/47] ALSA: usb-audio: Fix missing autopm for MIDI input Greg Kroah-Hartman
2013-01-09 20:35 ` [ 16/47] p54usb: add USB ID for T-Com Sinus 154 data II Greg Kroah-Hartman
2013-01-09 20:35 ` [ 17/47] p54usb: add USBIDs for two more p54usb devices Greg Kroah-Hartman
2013-01-09 20:35 ` [ 18/47] usb: gadget: phonet: free requests in pn_bind()s error path Greg Kroah-Hartman
2013-01-09 20:35 ` [ 19/47] ACPI / scan: Do not use dummy HID for system bus ACPI nodes Greg Kroah-Hartman
2013-01-09 20:35 ` [ 20/47] NFS: avoid NULL dereference in nfs_destroy_server Greg Kroah-Hartman
2013-01-09 20:35 ` [ 21/47] NFS: Fix calls to drop_nlink() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 22/47] nfsd4: fix oops on unusual readlike compound Greg Kroah-Hartman
2013-01-09 20:35 ` [ 23/47] nfs: fix null checking in nfs_get_option_str() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 24/47] Input: walkera0701 - fix crash on startup Greg Kroah-Hartman
2013-01-09 20:35 ` [ 25/47] genirq: Always force thread affinity Greg Kroah-Hartman
2013-01-09 20:35 ` [ 26/47] xhci: Add Lynx Point LP to list of Intel switchable hosts Greg Kroah-Hartman
2013-01-09 20:35 ` [ 27/47] cgroup: remove incorrect dget/dput() pair in cgroup_create_dir() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 28/47] x86, amd: Disable way access filter on Piledriver CPUs Greg Kroah-Hartman
2013-01-09 20:35 ` [ 29/47] ftrace: Do not function trace inlined functions Greg Kroah-Hartman
2013-01-09 20:35 ` [ 30/47] sparc: huge_ptep_set_* functions need to call set_huge_pte_at() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 31/47] net: sched: integer overflow fix Greg Kroah-Hartman
2013-01-09 20:35 ` [ 32/47] tcp: implement RFC 5961 3.2 Greg Kroah-Hartman
2013-01-09 20:35 ` [ 33/47] tcp: implement RFC 5961 4.2 Greg Kroah-Hartman
2013-01-09 20:35 ` [ 34/47] tcp: refine SYN handling in tcp_validate_incoming Greg Kroah-Hartman
2013-01-09 20:35 ` [ 35/47] tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 36/47] tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation Greg Kroah-Hartman
2013-01-09 20:35 ` [ 37/47] ARM: mm: use pteval_t to represent page protection values Greg Kroah-Hartman
2013-01-09 20:35 ` [ 38/47] ARM: missing ->mmap_sem around find_vma() in swp_emulate.c Greg Kroah-Hartman
2013-01-09 20:35 ` [ 39/47] solos-pci: fix double-free of TX skb in DMA mode Greg Kroah-Hartman
2013-01-09 20:35 ` [ 40/47] PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz Greg Kroah-Hartman
2013-01-09 20:35 ` [ 41/47] Bluetooth: ath3k: Add support for VAIO VPCEH [0489:e027] Greg Kroah-Hartman
2013-01-09 20:35 ` [ 42/47] Bluetooth: cancel power_on work when unregistering the device Greg Kroah-Hartman
2013-01-09 20:35 ` [ 43/47] CRIS: fix I/O macros Greg Kroah-Hartman
2013-01-09 20:35 ` [ 44/47] drivers/rtc/rtc-vt8500.c: correct handling of CR_24H bitfield Greg Kroah-Hartman
2013-01-09 20:35 ` [ 45/47] drivers/rtc/rtc-vt8500.c: fix handling of data passed in struct rtc_time Greg Kroah-Hartman
2013-01-09 20:35 ` [ 46/47] mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT Greg Kroah-Hartman
2013-01-09 20:35 ` [ 47/47] can: Do not call dev_put if restart timer is running upon close Greg Kroah-Hartman
2013-01-10 18:01 ` [ 00/47] 3.0.58-stable review Shuah Khan
2013-01-11 14:47 ` Satoru Takeuchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130109201731.014766529@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).