From: Matthew Dobson <colpatch@us.ibm.com>
To: Andi Kleen <ak@suse.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
"Martin J. Bligh" <mbligh@aracnet.com>
Subject: Re: NUMA API for Linux
Date: Wed, 14 Apr 2004 17:38:37 -0700
Message-ID: <1081989517.1206.206.camel@arrakis>
In-Reply-To: <20040407232712.2595ac16.ak@suse.de>
[-- Attachment #1: Type: text/plain, Size: 2196 bytes --]
Andi,
I'm sure you're sick of me commenting on your patches without "showing
you the money". I've attached a patch with some of the changes I think
would be beneficial. Feel free to let me know which changes you think
are crap and which you think are not.
Changes include:
1) Redefine the values of some of the MPOL_* flags
2) Rename check_* to mpol_check_*
3) Remove get_nodes(). This should be done in the same manner as
sys_sched_setaffinity(). We shouldn't care about unused high bits.
4) Create mpol_check_flags() to, well, check the flags. As the number
of flags and modes grows, it will be easier to do this check in its own
function.
5) In the syscalls (sys_mbind() & sys_set_mempolicy()), change 'len' to
a size_t, add __user to the declaration of 'nmask', change 'maxnode' to
'nmask_len', and condense 'flags' and 'mode' into 'flags'. The
motivation here is to make this syscall similar to
sys_sched_setaffinity(). These calls are basically the memory
equivalent of set/getaffinity, and should look & behave that way. Also,
dropping an argument leaves an opening for a pid argument, which I
believe would be good. We should allow processes (with appropriate
permissions, of course) to mbind other processes.
6) Change how end is calculated as follows:
	end = PAGE_ALIGN(start+len);
	start &= PAGE_MASK;
Basically, this allows users to pass in a non-page-aligned 'start', and
makes sure we mbind all pages from the page containing 'start' through
the page containing the last byte of the range ('start'+'len'-1).
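
A minimal user-space sketch of the rounding in item 6, assuming 4 KiB
pages. The SKETCH_* macros, struct page_range and round_to_pages() are
illustrative names, not part of the patch; the kernel uses its own
PAGE_ALIGN and PAGE_MASK. Note that 'end' has to be computed from the
unmasked 'start', which is why the two statements come in that order.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The kernel takes these from <asm/page.h>. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_PAGE_ALIGN(addr) \
	(((addr) + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK)

/* Hypothetical helper type: a page-aligned half-open range [start, end). */
struct page_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Round [start, start+len) outward to page boundaries.
 * Returns 0 on success, -1 if the range wraps around the address space.
 */
static int round_to_pages(unsigned long start, unsigned long len,
			  struct page_range *out)
{
	/* Derive end from the caller's start before start is rounded down. */
	unsigned long end = SKETCH_PAGE_ALIGN(start + len);

	start &= SKETCH_PAGE_MASK;
	if (end < start)
		return -1;

	out->start = start;
	out->end = end;
	return 0;
}
```

With these definitions, a 0x20-byte request at 0x1ff0 straddles a page
boundary and is widened to the two pages [0x1000, 0x3000), while a
zero-length request yields start == end, which the patch treats as a
no-op.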
This patch also shows that sys_mbind() and sys_set_mempolicy() have more
commonalities than differences. I believe these two syscalls should be
combined into one with the call signature of sys_mbind(). If the user
passes a start address and length of 0 (or maybe even a flag?), we bind
the whole process, otherwise we bind just a region. This would shrink
the patch even more than the measly 3 lines the current patch saves, and
save a syscall.
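
A rough user-space sketch of two ideas above: splitting the combined
'flags' word into mode and option bits (items 1, 4 and 5), and the
"start and length of 0 means whole process" rule for a merged syscall.
The MPOL_* values are the ones from the attached patch; every sketch_*
name is hypothetical. One deliberate deviation: the sketch reports bad
flags as -1 rather than 0, on the assumption that MPOL_DEFAULT (which
is 0) should stay distinguishable from an error.

```c
#include <assert.h>
#include <stddef.h>

/* Values as redefined in the attached patch. */
#define MPOL_DEFAULT	0
#define MPOL_PREFERRED	1
#define MPOL_BIND	2
#define MPOL_INTERLEAVE	3
#define MPOL_MAX	(MPOL_INTERLEAVE)
#define MPOL_MODE_MASK	(0xf)
#define MPOL_MF_STRICT	(1<<6)

/*
 * Split a combined 'flags' word into its mode and its option bits.
 * Returns the mode (0..MPOL_MAX) and stores the option bits in *opts,
 * or returns -1 if an unknown bit or an out-of-range mode is present.
 * Assumption: -1 (not 0) marks an error so MPOL_DEFAULT remains valid.
 */
static int sketch_split_flags(int flags, int *opts)
{
	int mode = flags & MPOL_MODE_MASK;
	int rest = flags & ~MPOL_MODE_MASK;

	if (rest & ~MPOL_MF_STRICT)
		return -1;
	if (mode > MPOL_MAX)
		return -1;

	/* STRICT is dropped when resetting to the default policy. */
	if (mode == MPOL_DEFAULT)
		rest &= ~MPOL_MF_STRICT;

	*opts = rest;
	return mode;
}

enum sketch_scope { SKETCH_SCOPE_PROCESS, SKETCH_SCOPE_RANGE };

/* Hypothetical dispatch rule for a merged syscall: 0/0 = whole process. */
static enum sketch_scope sketch_scope_of(unsigned long start, size_t len)
{
	return (start == 0 && len == 0) ? SKETCH_SCOPE_PROCESS
					: SKETCH_SCOPE_RANGE;
}
```

A caller would pass something like MPOL_BIND | MPOL_MF_STRICT as the
single 'flags' argument. Whether an explicit flag is a better selector
than the 0/0 convention is left open here, as it is in the text above.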
[mcd@arrakis source]$ diffstat ~/linux/patches/265-mm4/mcd_mods.patch
 include/linux/mempolicy.h |   12 ++--
 mm/mempolicy.c            |  119 ++++++++++++++++++++++------------------------
 2 files changed, 64 insertions(+), 67 deletions(-)
-Matt
[-- Attachment #2: mcd_mods.patch --]
[-- Type: text/x-patch, Size: 7294 bytes --]
diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.5-mm4/include/linux/mempolicy.h linux-2.6.5-mcd_numa_api/include/linux/mempolicy.h
--- linux-2.6.5-mm4/include/linux/mempolicy.h 2004-04-12 15:07:18.000000000 -0700
+++ linux-2.6.5-mcd_numa_api/include/linux/mempolicy.h 2004-04-14 17:13:22.000000000 -0700
@@ -8,20 +8,22 @@
* Copyright 2003,2004 Andi Kleen SuSE Labs
*/
-/* Policies */
+/* Policies aka 'modes' */
#define MPOL_DEFAULT 0
#define MPOL_PREFERRED 1
#define MPOL_BIND 2
#define MPOL_INTERLEAVE 3
-#define MPOL_MAX MPOL_INTERLEAVE
+#define MPOL_MAX (MPOL_INTERLEAVE)
+/* Reserve low 4 bits for policies, ie: 16 possible 'modes' */
+#define MPOL_MODE_MASK (0xf)
/* Flags for get_mem_policy */
-#define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */
-#define MPOL_F_ADDR (1<<1) /* look up vma using address */
+#define MPOL_F_NODE (1<<4) /* return next IL mode instead of node mask */
+#define MPOL_F_ADDR (1<<5) /* look up vma using address */
/* Flags for mbind */
-#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
+#define MPOL_MF_STRICT (1<<6) /* Verify existing pages in the mapping */
#ifdef __KERNEL__
diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.5-mm4/mm/mempolicy.c linux-2.6.5-mcd_numa_api/mm/mempolicy.c
--- linux-2.6.5-mm4/mm/mempolicy.c 2004-04-12 15:42:30.000000000 -0700
+++ linux-2.6.5-mcd_numa_api/mm/mempolicy.c 2004-04-14 17:22:05.000000000 -0700
@@ -88,13 +88,11 @@ static struct mempolicy default_policy =
};
/* Check if all specified nodes are online */
-static int check_online(unsigned long *nodes)
+static int mpol_check_online(unsigned long *nodes)
{
DECLARE_BITMAP(offline, MAX_NUMNODES);
bitmap_copy(offline, node_online_map, MAX_NUMNODES);
- if (bitmap_empty(offline, MAX_NUMNODES))
- set_bit(0, offline);
bitmap_complement(offline, MAX_NUMNODES);
bitmap_and(offline, offline, nodes, MAX_NUMNODES);
if (!bitmap_empty(offline, MAX_NUMNODES))
@@ -103,7 +101,7 @@ static int check_online(unsigned long *n
}
/* Do sanity checking on a policy */
-static int check_policy(int mode, unsigned long *nodes)
+static int mpol_check_policy(int mode, unsigned long *nodes)
{
int empty = bitmap_empty(nodes, MAX_NUMNODES);
@@ -120,46 +118,25 @@ static int check_policy(int mode, unsign
return -EINVAL;
break;
}
- return check_online(nodes);
+ return mpol_check_online(nodes);
}
-/* Copy a node mask from user space. */
-static int get_nodes(unsigned long *nodes, unsigned long *nmask,
- unsigned long maxnode, int mode)
-{
- unsigned long k;
- unsigned long nlongs;
- unsigned long endmask;
-
- --maxnode;
- nlongs = BITS_TO_LONGS(maxnode);
- if ((maxnode % BITS_PER_LONG) == 0)
- endmask = ~0UL;
- else
- endmask = (1UL << (maxnode % BITS_PER_LONG)) - 1;
-
- /* When the user specified more nodes than supported just check
- if the non supported part is all zero. */
- if (nmask && nlongs > BITS_TO_LONGS(MAX_NUMNODES)) {
- for (k = BITS_TO_LONGS(MAX_NUMNODES); k < nlongs; k++) {
- unsigned long t;
- if (get_user(t, nmask + k))
- return -EFAULT;
- if (k == nlongs - 1) {
- if (t & endmask)
- return -EINVAL;
- } else if (t)
- return -EINVAL;
- }
- nlongs = BITS_TO_LONGS(MAX_NUMNODES);
- endmask = ~0UL;
- }
+/*
+ * Do sanity checking on flags argument to sys_mbind.
+ * Return 'mode' bits if sane, 0 if bad flags.
+ */
+static int mpol_check_flags(int flags)
+{
+ int mode = flags & MPOL_MODE_MASK;
+ flags &= ~MPOL_MODE_MASK;
- bitmap_zero(nodes, MAX_NUMNODES);
- if (nmask && copy_from_user(nodes, nmask, nlongs*sizeof(unsigned long)))
- return -EFAULT;
- nodes[nlongs-1] &= endmask;
- return check_policy(mode, nodes);
+ if (flags & ~MPOL_MF_STRICT)
+ return 0;
+
+ if (mode > MPOL_MAX)
+ return 0;
+
+ return mode;
}
/* Generate a custom zonelist for the BIND policy. */
@@ -259,7 +236,7 @@ verify_pages(unsigned long addr, unsigne
/* Step 1: check the range */
static struct vm_area_struct *
-check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
+mpol_check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
unsigned long *nodes, unsigned long flags)
{
int err;
@@ -334,32 +311,39 @@ static int mbind_range(struct vm_area_st
}
/* Change policy for a memory range */
-asmlinkage long sys_mbind(unsigned long start, unsigned long len,
- unsigned long mode,
- unsigned long *nmask, unsigned long maxnode,
- unsigned flags)
+asmlinkage long sys_mbind(unsigned long start, size_t len,
+ unsigned long __user *nmask, unsigned int nmask_len,
+ int flags)
{
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
struct mempolicy *new;
unsigned long end;
DECLARE_BITMAP(nodes, MAX_NUMNODES);
- int err;
+ int err, mode = 0;
- if ((flags & ~(unsigned long)(MPOL_MF_STRICT)) || mode > MPOL_MAX)
- return -EINVAL;
- if (start & ~PAGE_MASK)
+ /* Make sure user passed us sane 'flags', and separate the 'mode' */
+ mode = mpol_check_flags(flags);
+ if (mode == 0)
return -EINVAL;
+ flags &= ~MPOL_MODE_MASK;
if (mode == MPOL_DEFAULT)
flags &= ~MPOL_MF_STRICT;
- len = (len + PAGE_SIZE - 1) & PAGE_MASK;
- end = start + len;
+
+ /* Ensure start and end are on page boundaries */
+ end = PAGE_ALIGN(start + len);
+ start &= PAGE_MASK;
if (end < start)
return -EINVAL;
if (end == start)
return 0;
- err = get_nodes(nodes, nmask, maxnode, mode);
+ /* Copy user's bitmask of nodes */
+ if (nmask_len < sizeof(nodes))
+ return -EINVAL;
+ if (copy_from_user(nodes, nmask, sizeof(nodes)))
+ return -EFAULT;
+ err = mpol_check_policy(mode, nodes);
if (err)
return err;
@@ -367,11 +351,9 @@ asmlinkage long sys_mbind(unsigned long
if (IS_ERR(new))
return PTR_ERR(new);
- PDprintk("mbind %lx-%lx mode:%ld nodes:%lx\n",start,start+len,
- mode,nodes[0]);
-
+ PDprintk("mbind %lx-%lx mode:%d nodes:%lx\n", start, end, mode, nodes[0]);
down_write(&mm->mmap_sem);
- vma = check_range(mm, start, end, nodes, flags);
+ vma = mpol_check_range(mm, start, end, nodes, flags);
err = PTR_ERR(vma);
if (!IS_ERR(vma))
err = mbind_range(vma, start, end, new);
@@ -381,21 +363,34 @@ asmlinkage long sys_mbind(unsigned long
}
/* Set the process memory policy */
-asmlinkage long sys_set_mempolicy(int mode, unsigned long *nmask,
- unsigned long maxnode)
+asmlinkage long sys_set_mempolicy(unsigned long __user *nmask,
+ unsigned int nmask_len, int flags)
{
- int err;
struct mempolicy *new;
DECLARE_BITMAP(nodes, MAX_NUMNODES);
+ int err, mode = 0;
- if (mode > MPOL_MAX)
+ /* Make sure user passed us sane 'flags', and separate the 'mode' */
+ mode = mpol_check_flags(flags);
+ if (mode == 0)
return -EINVAL;
- err = get_nodes(nodes, nmask, maxnode, mode);
+ flags &= ~MPOL_MODE_MASK;
+ if (mode == MPOL_DEFAULT)
+ flags &= ~MPOL_MF_STRICT;
+
+ /* Copy user's bitmask of nodes */
+ if (nmask_len < sizeof(nodes))
+ return -EINVAL;
+ if (copy_from_user(nodes, nmask, sizeof(nodes)))
+ return -EFAULT;
+ err = mpol_check_policy(mode, nodes);
if (err)
return err;
+
new = new_policy(mode, nodes);
if (IS_ERR(new))
return PTR_ERR(new);
+
mpol_free(current->mempolicy);
current->mempolicy = new;
if (new && new->policy == MPOL_INTERLEAVE)