From: Matthew Dobson <colpatch@us.ibm.com>
To: Andi Kleen <ak@suse.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>,
	"Martin J. Bligh" <mbligh@aracnet.com>
Subject: Re: NUMA API for Linux
Date: Wed, 14 Apr 2004 17:38:37 -0700
Message-ID: <1081989517.1206.206.camel@arrakis>
In-Reply-To: <20040407232712.2595ac16.ak@suse.de>

[-- Attachment #1: Type: text/plain, Size: 2196 bytes --]

Andi,
	I'm sure you're sick of me commenting on your patches without "showing
you the money".  I've attached a patch with some of the changes I think
would be beneficial.  Feel free to let me know which changes you think
are crap and which you think are not.

Changes include:

1) Redefine the values of the MPOL_F_* and MPOL_MF_* flags so they start
at bit 4, reserving the low 4 bits (MPOL_MODE_MASK) for the policy mode
2) Rename check_* to mpol_check_*
3) Remove get_nodes().  Copying the node mask from user space should be
done in the same manner as in sys_sched_setaffinity().  We shouldn't care
about unused high bits.
4) Create mpol_check_flags() to, well, check the flags.  As the number
of flags and modes grows, it will be easier to do this check in its own
function.
5) In the syscalls (sys_mbind() & sys_set_mempolicy()), add __user to the
declaration of 'nmask', change 'maxnode' to 'nmask_len', and fold 'mode'
into the low bits of 'flags'; in sys_mbind(), also change 'len' to a
size_t.  The motivation here is to make these syscalls similar to
sys_sched_setaffinity().  These calls are basically the memory
equivalent of set/getaffinity, and should look and behave that way.
Also, dropping an argument leaves an opening for a pid argument, which I
believe would be good.  We should allow processes (with appropriate
permissions, of course) to mbind other processes.
6) Change how end is calculated as follows:
	end = PAGE_ALIGN(start+len);
	start &= PAGE_MASK;
This allows users to pass in a 'start' that is not page aligned (the old
code returned -EINVAL), and makes sure we mbind every page from the one
containing 'start' through the one containing the last byte of the
range, 'start'+'len'-1.

This patch also shows that sys_mbind() and sys_set_mempolicy() have more
commonalities than differences.  I believe these two syscalls should be
combined into one with the call signature of sys_mbind().  If the user
passes a start address and length of 0 (or maybe even a flag?), we bind
the whole process, otherwise we bind just a region.  This would shrink
the patch even more than the measly 3 lines the current patch saves, and
save a syscall.

[mcd@arrakis source]$ diffstat ~/linux/patches/265-mm4/mcd_mods.patch
 include/linux/mempolicy.h |   12 ++--
 mm/mempolicy.c            |  119 ++++++++++++++++++++++------------------------
 2 files changed, 64 insertions(+), 67 deletions(-)

-Matt

[-- Attachment #2: mcd_mods.patch --]
[-- Type: text/x-patch, Size: 7294 bytes --]

diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.5-mm4/include/linux/mempolicy.h linux-2.6.5-mcd_numa_api/include/linux/mempolicy.h
--- linux-2.6.5-mm4/include/linux/mempolicy.h	2004-04-12 15:07:18.000000000 -0700
+++ linux-2.6.5-mcd_numa_api/include/linux/mempolicy.h	2004-04-14 17:13:22.000000000 -0700
@@ -8,20 +8,22 @@
  * Copyright 2003,2004 Andi Kleen SuSE Labs
  */
 
-/* Policies */
+/* Policies aka 'modes' */
 #define MPOL_DEFAULT	0
 #define MPOL_PREFERRED	1
 #define MPOL_BIND	2
 #define MPOL_INTERLEAVE	3
 
-#define MPOL_MAX MPOL_INTERLEAVE
+#define MPOL_MAX	(MPOL_INTERLEAVE)
+/* Reserve low 4 bits for policies, ie: 16 possible 'modes' */
+#define MPOL_MODE_MASK	(0xf)
 
 /* Flags for get_mem_policy */
-#define MPOL_F_NODE	(1<<0)	/* return next IL mode instead of node mask */
-#define MPOL_F_ADDR	(1<<1)	/* look up vma using address */
+#define MPOL_F_NODE	(1<<4)	/* return next IL mode instead of node mask */
+#define MPOL_F_ADDR	(1<<5)	/* look up vma using address */
 
 /* Flags for mbind */
-#define MPOL_MF_STRICT	(1<<0)	/* Verify existing pages in the mapping */
+#define MPOL_MF_STRICT	(1<<6)	/* Verify existing pages in the mapping */
 
 #ifdef __KERNEL__
 
diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.5-mm4/mm/mempolicy.c linux-2.6.5-mcd_numa_api/mm/mempolicy.c
--- linux-2.6.5-mm4/mm/mempolicy.c	2004-04-12 15:42:30.000000000 -0700
+++ linux-2.6.5-mcd_numa_api/mm/mempolicy.c	2004-04-14 17:22:05.000000000 -0700
@@ -88,13 +88,11 @@ static struct mempolicy default_policy =
 };
 
 /* Check if all specified nodes are online */
-static int check_online(unsigned long *nodes)
+static int mpol_check_online(unsigned long *nodes)
 {
 	DECLARE_BITMAP(offline, MAX_NUMNODES);
 
 	bitmap_copy(offline, node_online_map, MAX_NUMNODES);
-	if (bitmap_empty(offline, MAX_NUMNODES))
-		set_bit(0, offline);
 	bitmap_complement(offline, MAX_NUMNODES);
 	bitmap_and(offline, offline, nodes, MAX_NUMNODES);
 	if (!bitmap_empty(offline, MAX_NUMNODES))
@@ -103,7 +101,7 @@ static int check_online(unsigned long *n
 }
 
 /* Do sanity checking on a policy */
-static int check_policy(int mode, unsigned long *nodes)
+static int mpol_check_policy(int mode, unsigned long *nodes)
 {
 	int empty = bitmap_empty(nodes, MAX_NUMNODES);
 
@@ -120,46 +118,25 @@ static int check_policy(int mode, unsign
 			return -EINVAL;
 		break;
 	}
-	return check_online(nodes);
+	return mpol_check_online(nodes);
 }
 
-/* Copy a node mask from user space. */
-static int get_nodes(unsigned long *nodes, unsigned long *nmask,
-		     unsigned long maxnode, int mode)
-{
-	unsigned long k;
-	unsigned long nlongs;
-	unsigned long endmask;
-
-	--maxnode;
-	nlongs = BITS_TO_LONGS(maxnode);
-	if ((maxnode % BITS_PER_LONG) == 0)
-		endmask = ~0UL;
-	else
-		endmask = (1UL << (maxnode % BITS_PER_LONG)) - 1;
-
-	/* When the user specified more nodes than supported just check
-	   if the non supported part is all zero. */
-	if (nmask && nlongs > BITS_TO_LONGS(MAX_NUMNODES)) {
-		for (k = BITS_TO_LONGS(MAX_NUMNODES); k < nlongs; k++) {
-			unsigned long t;
-			if (get_user(t,  nmask + k))
-				return -EFAULT;
-			if (k == nlongs - 1) {
-				if (t & endmask)
-					return -EINVAL;
-			} else if (t)
-				return -EINVAL;
-		}
-		nlongs = BITS_TO_LONGS(MAX_NUMNODES);
-		endmask = ~0UL;
-	}
+/*
+ * Do sanity checking on flags argument to sys_mbind.
+ * Return 'mode' bits if sane, 0 if bad flags.
+ */
+static int mpol_check_flags(int flags)
+{
+	int mode = flags & MPOL_MODE_MASK;
+	flags &= ~MPOL_MODE_MASK;
 
-	bitmap_zero(nodes, MAX_NUMNODES);
-	if (nmask && copy_from_user(nodes, nmask, nlongs*sizeof(unsigned long)))
-		return -EFAULT;
-	nodes[nlongs-1] &= endmask;
-	return check_policy(mode, nodes);
+	if (flags & ~MPOL_MF_STRICT)
+		return 0;
+
+	if (mode > MPOL_MAX)
+		return 0;
+
+	return mode;
 }
 
 /* Generate a custom zonelist for the BIND policy. */
@@ -259,7 +236,7 @@ verify_pages(unsigned long addr, unsigne
 
 /* Step 1: check the range */
 static struct vm_area_struct *
-check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
+mpol_check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 	    unsigned long *nodes, unsigned long flags)
 {
 	int err;
@@ -334,32 +311,39 @@ static int mbind_range(struct vm_area_st
 }
 
 /* Change policy for a memory range */
-asmlinkage long sys_mbind(unsigned long start, unsigned long len,
-			  unsigned long mode,
-			  unsigned long *nmask, unsigned long maxnode,
-			  unsigned flags)
+asmlinkage long sys_mbind(unsigned long start, size_t len,
+			  unsigned long __user *nmask, unsigned int nmask_len,
+			  int flags)
 {
 	struct vm_area_struct *vma;
 	struct mm_struct *mm = current->mm;
 	struct mempolicy *new;
 	unsigned long end;
 	DECLARE_BITMAP(nodes, MAX_NUMNODES);
-	int err;
+	int err, mode = 0;
 
-	if ((flags & ~(unsigned long)(MPOL_MF_STRICT)) || mode > MPOL_MAX)
-		return -EINVAL;
-	if (start & ~PAGE_MASK)
+	/* Make sure user passed us sane 'flags', and separate the 'mode' */
+	mode = mpol_check_flags(flags);
+	if (mode == 0)
 		return -EINVAL;
+	flags &= ~MPOL_MODE_MASK;
 	if (mode == MPOL_DEFAULT)
 		flags &= ~MPOL_MF_STRICT;
-	len = (len + PAGE_SIZE - 1) & PAGE_MASK;
-	end = start + len;
+
+	/* Ensure start and end are on page boundaries */
+	end = PAGE_ALIGN(start + len);
+	start &= PAGE_MASK;
 	if (end < start)
 		return -EINVAL;
 	if (end == start)
 		return 0;
 
-	err = get_nodes(nodes, nmask, maxnode, mode);
+	/* Copy user's bitmask of nodes */
+	if (nmask_len < sizeof(nodes))
+		return -EINVAL;
+	if (copy_from_user(nodes, nmask, sizeof(nodes)))
+		return -EFAULT;
+	err = mpol_check_policy(mode, nodes);
 	if (err)
 		return err;
 
@@ -367,11 +351,9 @@ asmlinkage long sys_mbind(unsigned long 
 	if (IS_ERR(new))
 		return PTR_ERR(new);
 
-	PDprintk("mbind %lx-%lx mode:%ld nodes:%lx\n",start,start+len,
-			mode,nodes[0]);
-
+	PDprintk("mbind %lx-%lx mode:%d nodes:%lx\n", start, end, mode, nodes[0]);
 	down_write(&mm->mmap_sem);
-	vma = check_range(mm, start, end, nodes, flags);
+	vma = mpol_check_range(mm, start, end, nodes, flags);
 	err = PTR_ERR(vma);
 	if (!IS_ERR(vma))
 		err = mbind_range(vma, start, end, new);
@@ -381,21 +363,34 @@ asmlinkage long sys_mbind(unsigned long 
 }
 
 /* Set the process memory policy */
-asmlinkage long sys_set_mempolicy(int mode, unsigned long *nmask,
-				   unsigned long maxnode)
+asmlinkage long sys_set_mempolicy(unsigned long __user *nmask,
+				  unsigned int nmask_len, int flags)
 {
-	int err;
 	struct mempolicy *new;
 	DECLARE_BITMAP(nodes, MAX_NUMNODES);
+	int err, mode = 0;
 
-	if (mode > MPOL_MAX)
+	/* Make sure user passed us sane 'flags', and separate the 'mode' */
+	mode = mpol_check_flags(flags);
+	if (mode == 0)
 		return -EINVAL;
-	err = get_nodes(nodes, nmask, maxnode, mode);
+	flags &= ~MPOL_MODE_MASK;
+	if (mode == MPOL_DEFAULT)
+		flags &= ~MPOL_MF_STRICT;
+
+	/* Copy user's bitmask of nodes */
+	if (nmask_len < sizeof(nodes))
+		return -EINVAL;
+	if (copy_from_user(nodes, nmask, sizeof(nodes)))
+		return -EFAULT;
+	err = mpol_check_policy(mode, nodes);
 	if (err)
 		return err;
+
 	new = new_policy(mode, nodes);
 	if (IS_ERR(new))
 		return PTR_ERR(new);
+
 	mpol_free(current->mempolicy);
 	current->mempolicy = new;
 	if (new && new->policy == MPOL_INTERLEAVE)
