From: Paolo Zeppegno <zeppegno.paolo@seat.it>
To: Hugh Dickins <hugh@veritas.com>
Cc: Matthew Dobson <colpatch@us.ibm.com>,
linux-kernel@vger.kernel.org,
"Martin J. Bligh" <mbligh@aracnet.com>,
Andrew Morton <akpm@digeo.com>,
Christoph Hellwig <hch@infradead.org>, Andi Kleen <ak@muc.de>,
lse-tech <lse-tech@lists.sourceforge.net>
Subject: Re: [rfc][patch] Memory Binding Take 2 (1/1)
Date: Thu, 03 Apr 2003 15:25:26 +0200 [thread overview]
Message-ID: <3E8C3646.3010906@seat.it> (raw)
In-Reply-To: <Pine.LNX.4.44.0304031317290.1718-100000@localhost.localdomain>
I suggested mbind for consistency with mmap, munmap, mremap and msync,
that is IFF the mbind operation is in some ways related with these other
syscalls.
Hugh Dickins wrote:
>On Wed, 2 Apr 2003, Matthew Dobson wrote:
>+/*
>+ * membind - Bind a range of a process' VM space to a set of memory blocks according to
>
>membind or mbind? Me, I like mbind (modulo remarks below), but you may
>find Linus does not (he was rather caustic when I suggested that fremap
>should be mseek, and it ended up as remap_file_pages instead).
>
>+ * a predefined policy.
>+ * @start: beginning address of memory region to bind
>+ * @len: length of memory region to bind
>
>Oh really? len is unused in the code below. If you were to use it,
>you'd need to loop over vmas, splitting where necessary.
>
>+ * @mask_ptr: pointer to bitmask of cpus
>+ * @mask_len: length of the bitmask
>+ * @policy: flag specifying the policy to use for the segment
>
>I think you already remarked that policy is currently unused,
>fair enough.
>
>+ */
>+asmlinkage unsigned long sys_mbind(unsigned long start, unsigned long len,
>+ unsigned long *mask_ptr, unsigned int mask_len, unsigned long policy)
>+{
>+ DECLARE_BITMAP(cpu_mask, NR_CPUS);
>+ DECLARE_BITMAP(node_mask, MAX_NUMNODES);
>+ struct vm_area_struct *vma = NULL;
>+ struct address_space *mapping;
>+ int copy_len, error = 0;
>+
>+ /* Deal with getting cpu_mask from userspace & translating to node_mask */
>+ copy_len = min(mask_len, (unsigned int)NR_CPUS);
>+ CLEAR_BITMAP(cpu_mask, NR_CPUS);
>+ CLEAR_BITMAP(node_mask, MAX_NUMNODES);
>+ if (copy_from_user(cpu_mask, mask_ptr, (copy_len+7)/8)) {
>+ error = -EFAULT;
>+ goto out;
>+ }
>
>Shouldn't there be some capability restriction? Is it right that
>anyone who can mmap a file for reading can determine its binding
>(until the next does it differently)?
>
>+ cpumask_to_nodemask(cpu_mask, node_mask);
>+
>+ vma = find_vma(current->mm, start);
>
>You must not scan the vma list without at least
>down_read(¤t->mm->mmap_sem).
>
>+ if (!(vma && vma->vm_file && vma->vm_ops &&
>+ vma->vm_ops->nopage == shmem_nopage)) {
>+ /* This isn't a shm segment. For now, we bail. */
>
>So you're allowing this on any file on tmpfs,
>but on no file on any other filesystem: curious.
>
>+ error = -EINVAL;
>+ goto out;
>+ }
>+
>+ mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
>+ mapping->binding = alloc_binding(node_mask);
>
>Your NUMA machines clearly have more memory than is good for you:
>nowhere is there an equivalent free_binding: which in particular
>would need to be called first here if binding is already set (or
>else old structure reused), and when inode is freed.
>
>So... mapping->binding conditions every page_cache_alloc for that
>inode. Hmm, what on earth does this have to do with mbind or membind?
>It looks to me like fbind, except that you've dressed up the interface
>to use an address in the caller's address space: presumably because you
>couldn't get a file handle on SysV shared memory, and that's what you
>were really wanting to bind, hence the shmem_nopage test?
>
>I think this interface is confused (but it probably thinks I am).
>
>+ if (!mapping->binding)
>+ error = -EFAULT;
>+
>+out:
>+ return error;
>+}
>diff -Nur --exclude-from=/usr/src/.dontdiff linux-2.5.66-pre_membind/mm/swap_state.c linux-2.5.66-membind/mm/swap_state.c
>--- linux-2.5.66-pre_membind/mm/swap_state.c Mon Mar 24 14:00:21 2003
>+++ linux-2.5.66-membind/mm/swap_state.c Tue Apr 1 17:12:00 2003
>@@ -47,6 +47,9 @@
> .i_shared_sem = __MUTEX_INITIALIZER(swapper_space.i_shared_sem),
> .private_lock = SPIN_LOCK_UNLOCKED,
> .private_list = LIST_HEAD_INIT(swapper_space.private_list),
>+#ifdef CONFIG_NUMA
>+ .binding = NULL,
>+#endif
> };
>
> #define INC_CACHE_INFO(x) do { swap_cache_info.x++; } while (0)
>
>Please leave swap_state.c out of it: this patch does nothing but add
>an ugly #ifdef to initialize something to 0 which would be 0 anyway.
>
>Hugh
>
>
>
next prev parent reply other threads:[~2003-04-03 13:32 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-04-03 5:50 [rfc][patch] Memory Binding Take 2 (0/1) Matthew Dobson
2003-04-03 5:56 ` [rfc][patch] Memory Binding Take 2 (1/1) Matthew Dobson
2003-04-03 6:37 ` Andrew Morton
2003-04-03 23:30 ` Matthew Dobson
2003-04-03 12:20 ` Hugh Dickins
2003-04-03 13:25 ` Paolo Zeppegno [this message]
2003-04-03 23:57 ` Matthew Dobson
2003-04-04 13:40 ` Christoph Hellwig
2003-04-04 13:34 ` [rfc][patch] Memory Binding Take 2 (0/1) Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3E8C3646.3010906@seat.it \
--to=zeppegno.paolo@seat.it \
--cc=ak@muc.de \
--cc=akpm@digeo.com \
--cc=colpatch@us.ibm.com \
--cc=hch@infradead.org \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lse-tech@lists.sourceforge.net \
--cc=mbligh@aracnet.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.