From: Andrew Morton <akpm@linux-foundation.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: linux-mm@kvack.org, Ben Widawsky <ben.widawsky@intel.com>,
    Dave Hansen <dave.hansen@linux.intel.com>,
    Feng Tang <feng.tang@intel.com>, Michal Hocko <mhocko@kernel.org>,
    Andrea Arcangeli <aarcange@redhat.com>,
    Mel Gorman <mgorman@techsingularity.net>,
    Mike Kravetz <mike.kravetz@oracle.com>,
    Randy Dunlap <rdunlap@infradead.org>,
    Vlastimil Babka <vbabka@suse.cz>, Andi Kleen <ak@linux.intel.com>,
    Dan Williams <dan.j.williams@intel.com>,
    Huang Ying <ying.huang@intel.com>,
    linux-api@vger.kernel.org
Subject: Re: [PATCH v5 2/3] mm/mempolicy: add set_mempolicy_home_node syscall
Date: Mon, 29 Nov 2021 14:02:15 -0800
Message-ID: <20211129140215.11b7cf9f1034a7fe7017768c@linux-foundation.org>
In-Reply-To: <20211116064238.727454-3-aneesh.kumar@linux.ibm.com>

On Tue, 16 Nov 2021 12:12:37 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:
> This syscall can be used to set a home node for the MPOL_BIND
> and MPOL_PREFERRED_MANY memory policy. Users should use this
> syscall after setting up a memory policy for the specified range
> as shown below.
>
> mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
> new_nodes->size + 1, 0);
> sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size,
> home_node, 0);
>
> The syscall allows specifying a home node/preferred node from which the kernel
> will fulfill memory allocation requests first.
>
> For address range with MPOL_BIND memory policy, if nodemask specifies more
> than one node, page allocations will come from the node in the nodemask
> with sufficient free memory that is closest to the home node/preferred node.
>
> For MPOL_PREFERRED_MANY if the nodemask specifies more than one node,
> page allocation will come from the node in the nodemask with sufficient
> free memory that is closest to the home node/preferred node. If there is
> not enough memory in all the nodes specified in the nodemask, the allocation
> will be attempted from the closest numa node to the home node in the system.
>
> This helps applications hint at a preferred node for memory allocation
> and fall back to _only_ a set of nodes if memory is not available
> on the preferred node. Fallback allocation is attempted from the node that is
> nearest to the preferred node.
>
> This gives applications control over which NUMA nodes memory is allocated from and
> avoids the default fallback to slow memory NUMA nodes. For example, on a system with
> NUMA nodes 1, 2 and 3 of DRAM memory and nodes 10, 11 and 12 of slow memory:
>
> new_nodes = numa_bitmask_alloc(nr_nodes);
>
> numa_bitmask_setbit(new_nodes, 1);
> numa_bitmask_setbit(new_nodes, 2);
> numa_bitmask_setbit(new_nodes, 3);
>
> p = mmap(NULL, nr_pages * page_size, protflag, mapflag, -1, 0);
> mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp, new_nodes->size + 1, 0);
>
> sys_set_mempolicy_home_node(p, nr_pages * page_size, 2, 0);
>
> This will allocate from nodes closer to node 2 and will make sure the kernel will
> only allocate from nodes 1, 2 and 3. Memory will not be allocated from slow memory
> nodes 10, 11 and 12.
>
> MPOL_PREFERRED_MANY, on the other hand, will first try to allocate from the
> closest node to node 2 from the node list 1, 2 and 3. If those nodes don't have
> enough memory, the kernel will allocate from slow memory nodes 10, 11 and 12,
> whichever is closest to node 2.
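
To make the described fallback order concrete, here is a small userspace
model of the two policies as worded above. It is only a sketch of the
changelog text, not the kernel's allocator: MAX_NODES, pick_node(),
alloc_node_bind() and alloc_node_preferred_many() are hypothetical names,
and "sufficient free memory" is reduced to a plain free-page count compared
against the size of the request.

```c
#include <stdbool.h>

#define MAX_NODES 16	/* hypothetical upper bound on node ids for this model */

/*
 * Return the node with at least `need` free pages that has the smallest
 * distance to the home node, or -1 if no node qualifies. When
 * restrict_to_mask is true, only nodes set in `nodemask` are considered.
 */
static int pick_node(const int dist_from_home[MAX_NODES],
		     const unsigned long free_pages[MAX_NODES],
		     unsigned int nodemask, unsigned long need,
		     bool restrict_to_mask)
{
	int best = -1;

	for (int n = 0; n < MAX_NODES; n++) {
		bool in_mask = (nodemask >> n) & 1u;

		if (restrict_to_mask && !in_mask)
			continue;
		if (free_pages[n] < need)
			continue;
		if (best < 0 || dist_from_home[n] < dist_from_home[best])
			best = n;
	}
	return best;
}

/* MPOL_BIND as described: never leave the nodemask. */
static int alloc_node_bind(const int dist_from_home[MAX_NODES],
			   const unsigned long free_pages[MAX_NODES],
			   unsigned int nodemask, unsigned long need)
{
	return pick_node(dist_from_home, free_pages, nodemask, need, true);
}

/*
 * MPOL_PREFERRED_MANY as described: try the nodemask first, then fall
 * back to the nearest node anywhere in the system.
 */
static int alloc_node_preferred_many(const int dist_from_home[MAX_NODES],
				     const unsigned long free_pages[MAX_NODES],
				     unsigned int nodemask, unsigned long need)
{
	int n = pick_node(dist_from_home, free_pages, nodemask, need, true);

	return n >= 0 ? n : pick_node(dist_from_home, free_pages, nodemask,
				      need, false);
}
```

With nodes 1, 2 and 3 in the nodemask and node 2 as the home node, both
helpers return 2 for as long as node 2 has free pages. Once nodes 1 to 3
are all exhausted, alloc_node_bind() returns -1, whereas
alloc_node_preferred_many() returns whichever remaining node has the
smallest distance to node 2.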
>
> ...
>
> @@ -1477,6 +1478,60 @@ static long kernel_mbind(unsigned long start, unsigned long len,
>  	return do_mbind(start, len, lmode, mode_flags, &nodes, flags);
>  }
>
> +SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, len,
> +		unsigned long, home_node, unsigned long, flags)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +	struct mempolicy *new;
> +	unsigned long vmstart;
> +	unsigned long vmend;
> +	unsigned long end;
> +	int err = -ENOENT;
> +
> +	if (start & ~PAGE_MASK)
> +		return -EINVAL;
> +	/*
> +	 * flags is used for future extension if any.
> +	 */
> +	if (flags != 0)
> +		return -EINVAL;
> +
> +	if (!node_online(home_node))
> +		return -EINVAL;

What's the thinking here?  The node can later be offlined and the
kernel takes no action to reset home nodes, so why not permit setting a
presently-offline node as the home node?  Checking here seems rather
arbitrary?
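
For anyone wanting to poke at this from userspace, a minimal sketch of the
mbind() plus home node sequence from the changelog follows. Assumptions,
since none of this is in the patch as posted: the syscall number 450 is the
value the call received when it was later merged (Linux 5.17),
SKETCH_MPOL_BIND hard-codes the uapi value of MPOL_BIND, and
set_home_node() and bind_with_home_node() are hypothetical helper names.
Raw syscall(2) is used so that libnuma is not required.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: number assigned when the syscall was merged, not part of v5. */
#ifndef __NR_set_mempolicy_home_node
#define __NR_set_mempolicy_home_node 450
#endif

/* Assumption: uapi value of MPOL_BIND, hard-coded to avoid extra headers. */
#define SKETCH_MPOL_BIND 2UL

/* Returns 0 on success or a negative errno value. */
static long set_home_node(void *start, unsigned long len,
			  unsigned long home_node)
{
	/* flags must be 0; the quoted patch rejects anything else. */
	long ret = syscall(__NR_set_mempolicy_home_node,
			   (unsigned long)start, len, home_node, 0UL);

	return ret == 0 ? 0 : -errno;
}

/*
 * Map `len` bytes of anonymous memory, bind it to the nodes set in
 * `nodemask`, then set `home_node`. On success the mapping is stored in
 * *out and 0 is returned; on failure the mapping is released and a
 * negative errno value is returned.
 */
static long bind_with_home_node(size_t len, unsigned long nodemask,
				unsigned long home_node, void **out)
{
	long ret;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -errno;

	/* maxnode is passed as bit count + 1, mirroring the changelog. */
	if (syscall(SYS_mbind, p, len, SKETCH_MPOL_BIND, &nodemask,
		    8 * sizeof(nodemask) + 1, 0UL) != 0) {
		ret = -errno;
		munmap(p, len);
		return ret;
	}

	ret = set_home_node(p, len, home_node);
	if (ret != 0) {
		munmap(p, len);
		return ret;
	}

	*out = p;
	return 0;
}
```

Something like bind_with_home_node(nr_pages * page_size, 0xe, 2, &p) would
correspond to the nodes 1, 2 and 3 example in the changelog. Passing a
home node that is currently offline is the case in question: with the
node_online() check as posted, that call comes back with -EINVAL.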
Thread overview: 20+ messages
[not found] <20211116064238.727454-1-aneesh.kumar@linux.ibm.com>
2021-11-16 6:42 ` [PATCH v5 1/3] mm/mempolicy: use policy_node helper with MPOL_PREFERRED_MANY Aneesh Kumar K.V
2021-11-29 10:11 ` Michal Hocko
2021-11-29 10:12 ` [PATCH 4/3] mm: drop node from alloc_pages_vma Michal Hocko
2021-11-16 6:42 ` [PATCH v5 2/3] mm/mempolicy: add set_mempolicy_home_node syscall Aneesh Kumar K.V
2021-11-29 10:32 ` Michal Hocko
2021-11-29 10:46 ` Aneesh Kumar K.V
2021-11-29 12:45 ` Michal Hocko
2021-11-29 13:47 ` Aneesh Kumar K.V
2021-11-29 14:52 ` Michal Hocko
2021-11-29 14:59 ` Aneesh Kumar K.V
2021-11-29 15:19 ` Michal Hocko
2021-11-29 22:02 ` Andrew Morton [this message]
2021-11-30 8:59 ` Aneesh Kumar K.V
2021-11-30 9:59 ` Michal Hocko
2021-12-01 3:00 ` Andrew Morton
2021-12-01 6:22 ` Aneesh Kumar K.V
2021-12-01 0:47 ` Daniel Jordan
2021-12-01 6:15 ` Aneesh Kumar K.V
2021-12-01 16:22 ` Daniel Jordan
2021-11-16 6:42 ` [PATCH v5 3/3] mm/mempolicy: wire up syscall set_mempolicy_home_node Aneesh Kumar K.V