From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
alan@lxorguk.ukuu.org.uk, Hugh Dickins <hughd@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [ 13/47] tmpfs mempolicy: fix /proc/mounts corrupting memory
Date: Wed, 9 Jan 2013 12:35:08 -0800 [thread overview]
Message-ID: <20130109201731.014766529@linuxfoundation.org> (raw)
In-Reply-To: <20130109201729.435233560@linuxfoundation.org>
3.0-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins <hughd@google.com>
commit f2a07f40dbc603c15f8b06e6ec7f768af67b424f upstream.
Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
mempolicy testing. Very nasty. Reading /proc/mounts, /proc/pid/mounts
or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
worse. "mpol=prefer" and "mpol=prefer:Node" are equally toxic.
Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask() when
no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
With slab poisoning, you can then rely on mpol_to_str() to set the bit
for node 0x6b6b, probably in the next page above the caller's stack.
mpol_parse_str() is only called from shmem_parse_options(): no_context
is always true, so call it unused for now, and remove !no_context code.
Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
expect. Then mpol_to_str() can ignore its no_context argument also,
the mpol being appropriately initialized whether contextualized or not.
Rename its no_context unused too, and let subsequent patch remove them
(that's not needed for stable backporting, which would involve rejects).
I don't understand why MPOL_LOCAL is described as a pseudo-policy:
it's a reasonable policy which suffers from a confusing implementation
in terms of MPOL_PREFERRED with MPOL_F_LOCAL. I believe this would be
much more robust if MPOL_LOCAL were recognized in switch statements
throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
empty) nodes mask like everyone else, instead of its preferred_node
variant (I presume an optimization from the days before MPOL_LOCAL).
But that would take me too long to get right and fully tested.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/mempolicy.c | 64 +++++++++++++++++++++++----------------------------------
1 file changed, 26 insertions(+), 38 deletions(-)
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2308,8 +2308,7 @@ void numa_default_policy(void)
*/
/*
- * "local" is pseudo-policy: MPOL_PREFERRED with MPOL_F_LOCAL flag
- * Used only for mpol_parse_str() and mpol_to_str()
+ * "local" is implemented internally by MPOL_PREFERRED with MPOL_F_LOCAL flag.
*/
#define MPOL_LOCAL MPOL_MAX
static const char * const policy_modes[] =
@@ -2324,28 +2323,21 @@ static const char * const policy_modes[]
#ifdef CONFIG_TMPFS
/**
- * mpol_parse_str - parse string to mempolicy
+ * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
* @str: string containing mempolicy to parse
* @mpol: pointer to struct mempolicy pointer, returned on success.
- * @no_context: flag whether to "contextualize" the mempolicy
+ * @unused: redundant argument, to be removed later.
*
* Format of input:
* <mode>[=<flags>][:<nodelist>]
*
- * if @no_context is true, save the input nodemask in w.user_nodemask in
- * the returned mempolicy. This will be used to "clone" the mempolicy in
- * a specific context [cpuset] at a later time. Used to parse tmpfs mpol
- * mount option. Note that if 'static' or 'relative' mode flags were
- * specified, the input nodemask will already have been saved. Saving
- * it again is redundant, but safe.
- *
* On success, returns 0, else 1
*/
-int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
+int mpol_parse_str(char *str, struct mempolicy **mpol, int unused)
{
struct mempolicy *new = NULL;
unsigned short mode;
- unsigned short uninitialized_var(mode_flags);
+ unsigned short mode_flags;
nodemask_t nodes;
char *nodelist = strchr(str, ':');
char *flags = strchr(str, '=');
@@ -2433,24 +2425,23 @@ int mpol_parse_str(char *str, struct mem
if (IS_ERR(new))
goto out;
- if (no_context) {
- /* save for contextualization */
- new->w.user_nodemask = nodes;
- } else {
- int ret;
- NODEMASK_SCRATCH(scratch);
- if (scratch) {
- task_lock(current);
- ret = mpol_set_nodemask(new, &nodes, scratch);
- task_unlock(current);
- } else
- ret = -ENOMEM;
- NODEMASK_SCRATCH_FREE(scratch);
- if (ret) {
- mpol_put(new);
- goto out;
- }
- }
+ /*
+ * Save nodes for mpol_to_str() to show the tmpfs mount options
+ * for /proc/mounts, /proc/pid/mounts and /proc/pid/mountinfo.
+ */
+ if (mode != MPOL_PREFERRED)
+ new->v.nodes = nodes;
+ else if (nodelist)
+ new->v.preferred_node = first_node(nodes);
+ else
+ new->flags |= MPOL_F_LOCAL;
+
+ /*
+ * Save nodes for contextualization: this will be used to "clone"
+ * the mempolicy in a specific context [cpuset] at a later time.
+ */
+ new->w.user_nodemask = nodes;
+
err = 0;
out:
@@ -2470,13 +2461,13 @@ out:
* @buffer: to contain formatted mempolicy string
* @maxlen: length of @buffer
* @pol: pointer to mempolicy to be formatted
- * @no_context: "context free" mempolicy - use nodemask in w.user_nodemask
+ * @unused: redundant argument, to be removed later.
*
* Convert a mempolicy into a string.
* Returns the number of characters in buffer (if positive)
* or an error (negative)
*/
-int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
+int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
{
char *p = buffer;
int l;
@@ -2502,7 +2493,7 @@ int mpol_to_str(char *buffer, int maxlen
case MPOL_PREFERRED:
nodes_clear(nodes);
if (flags & MPOL_F_LOCAL)
- mode = MPOL_LOCAL; /* pseudo-policy */
+ mode = MPOL_LOCAL;
else
node_set(pol->v.preferred_node, nodes);
break;
@@ -2510,10 +2501,7 @@ int mpol_to_str(char *buffer, int maxlen
case MPOL_BIND:
/* Fall through */
case MPOL_INTERLEAVE:
- if (no_context)
- nodes = pol->w.user_nodemask;
- else
- nodes = pol->v.nodes;
+ nodes = pol->v.nodes;
break;
default:
next prev parent reply other threads:[~2013-01-09 22:02 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-09 20:34 [ 00/47] 3.0.58-stable review Greg Kroah-Hartman
2013-01-09 20:34 ` [ 01/47] bonding: Bonding driver does not consider the gso_max_size/gso_max_segs setting of slave devices Greg Kroah-Hartman
2013-01-09 20:34 ` [ 02/47] bonding: fix race condition in bonding_store_slaves_active Greg Kroah-Hartman
2013-01-09 20:34 ` [ 03/47] sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails Greg Kroah-Hartman
2013-01-09 20:34 ` [ 04/47] sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall Greg Kroah-Hartman
2013-01-09 20:35 ` [ 05/47] ne2000: add the right platform device Greg Kroah-Hartman
2013-01-09 20:35 ` [ 06/47] irda: sir_dev: Fix copy/paste typo Greg Kroah-Hartman
2013-01-09 20:35 ` [ 07/47] usb/ipheth: Add iPhone 5 support Greg Kroah-Hartman
2013-01-09 20:35 ` [ 08/47] pnpacpi: fix incorrect TEST_ALPHA() test Greg Kroah-Hartman
2013-01-09 20:35 ` [ 09/47] exec: do not leave bprm->interp on stack Greg Kroah-Hartman
2013-01-09 20:35 ` [ 10/47] x86, 8042: Enable A20 using KBC to fix S3 resume on some MSI laptops Greg Kroah-Hartman
2013-01-09 20:35 ` [ 11/47] virtio: force vring descriptors to be allocated from lowmem Greg Kroah-Hartman
2013-01-09 20:35 ` [ 12/47] mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED Greg Kroah-Hartman
2013-01-09 20:35 ` Greg Kroah-Hartman [this message]
2013-01-09 20:35 ` [ 14/47] ALSA: usb-audio: Avoid autopm calls after disconnection Greg Kroah-Hartman
2013-01-09 20:35 ` [ 15/47] ALSA: usb-audio: Fix missing autopm for MIDI input Greg Kroah-Hartman
2013-01-09 20:35 ` [ 16/47] p54usb: add USB ID for T-Com Sinus 154 data II Greg Kroah-Hartman
2013-01-09 20:35 ` [ 17/47] p54usb: add USBIDs for two more p54usb devices Greg Kroah-Hartman
2013-01-09 20:35 ` [ 18/47] usb: gadget: phonet: free requests in pn_bind()s error path Greg Kroah-Hartman
2013-01-09 20:35 ` [ 19/47] ACPI / scan: Do not use dummy HID for system bus ACPI nodes Greg Kroah-Hartman
2013-01-09 20:35 ` [ 20/47] NFS: avoid NULL dereference in nfs_destroy_server Greg Kroah-Hartman
2013-01-09 20:35 ` [ 21/47] NFS: Fix calls to drop_nlink() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 22/47] nfsd4: fix oops on unusual readlike compound Greg Kroah-Hartman
2013-01-09 20:35 ` [ 23/47] nfs: fix null checking in nfs_get_option_str() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 24/47] Input: walkera0701 - fix crash on startup Greg Kroah-Hartman
2013-01-09 20:35 ` [ 25/47] genirq: Always force thread affinity Greg Kroah-Hartman
2013-01-09 20:35 ` [ 26/47] xhci: Add Lynx Point LP to list of Intel switchable hosts Greg Kroah-Hartman
2013-01-09 20:35 ` [ 27/47] cgroup: remove incorrect dget/dput() pair in cgroup_create_dir() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 28/47] x86, amd: Disable way access filter on Piledriver CPUs Greg Kroah-Hartman
2013-01-09 20:35 ` [ 29/47] ftrace: Do not function trace inlined functions Greg Kroah-Hartman
2013-01-09 20:35 ` [ 30/47] sparc: huge_ptep_set_* functions need to call set_huge_pte_at() Greg Kroah-Hartman
2013-01-09 20:35 ` Greg Kroah-Hartman
2013-01-09 20:35 ` [ 31/47] net: sched: integer overflow fix Greg Kroah-Hartman
2013-01-09 20:35 ` [ 32/47] tcp: implement RFC 5961 3.2 Greg Kroah-Hartman
2013-01-09 20:35 ` [ 33/47] tcp: implement RFC 5961 4.2 Greg Kroah-Hartman
2013-01-09 20:35 ` [ 34/47] tcp: refine SYN handling in tcp_validate_incoming Greg Kroah-Hartman
2013-01-09 20:35 ` [ 35/47] tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 36/47] tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation Greg Kroah-Hartman
2013-01-09 20:35 ` [ 37/47] ARM: mm: use pteval_t to represent page protection values Greg Kroah-Hartman
2013-01-09 20:35 ` [ 38/47] ARM: missing ->mmap_sem around find_vma() in swp_emulate.c Greg Kroah-Hartman
2013-01-09 20:35 ` [ 39/47] solos-pci: fix double-free of TX skb in DMA mode Greg Kroah-Hartman
2013-01-09 20:35 ` [ 40/47] PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz Greg Kroah-Hartman
2013-01-09 20:35 ` [ 41/47] Bluetooth: ath3k: Add support for VAIO VPCEH [0489:e027] Greg Kroah-Hartman
2013-01-09 20:35 ` [ 42/47] Bluetooth: cancel power_on work when unregistering the device Greg Kroah-Hartman
2013-01-09 20:35 ` [ 43/47] CRIS: fix I/O macros Greg Kroah-Hartman
2013-01-09 20:35 ` Greg Kroah-Hartman
2013-01-09 20:35 ` [ 44/47] drivers/rtc/rtc-vt8500.c: correct handling of CR_24H bitfield Greg Kroah-Hartman
2013-01-09 20:35 ` [ 45/47] drivers/rtc/rtc-vt8500.c: fix handling of data passed in struct rtc_time Greg Kroah-Hartman
2013-01-09 20:35 ` [ 46/47] mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT Greg Kroah-Hartman
2013-01-09 20:35 ` [ 47/47] can: Do not call dev_put if restart timer is running upon close Greg Kroah-Hartman
2013-01-10 18:01 ` [ 00/47] 3.0.58-stable review Shuah Khan
2013-01-11 14:47 ` Satoru Takeuchi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130109201731.014766529@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.