* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy [not found] <20211013094539.962357-1-aneesh.kumar@linux.ibm.com> @ 2021-10-13 10:42 ` Michal Hocko 2021-10-13 10:48 ` Michal Hocko 0 siblings, 1 reply; 10+ messages in thread From: Michal Hocko @ 2021-10-13 10:42 UTC (permalink / raw) To: Aneesh Kumar K.V Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api [Cc linux-api] On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote: > This mempolicy mode can be used with either the set_mempolicy(2) > or mbind(2) interfaces. Like the MPOL_PREFERRED interface, it > allows an application to set a preference node from which the kernel > will fulfill memory allocation requests. Unlike the MPOL_PREFERRED mode, > it takes a set of nodes. The nodes in the nodemask are used as fallback > allocation nodes if memory is not available on the preferred node. > Unlike MPOL_PREFERRED_MANY, it will not fall back memory allocations > to all nodes in the system. Like the MPOL_BIND interface, it works over a > set of nodes and will cause a SIGSEGV or invoke the OOM killer if > memory is not available on those preferred nodes. > > This patch helps applications to hint a memory allocation preference node > and fallback to _only_ a set of nodes if the memory is not available > on the preferred node. Fallback allocation is attempted from the node which is > nearest to the preferred node. > > This new memory policy helps applications to have explicit control on slow > memory allocation and avoids default fallback to slow memory NUMA nodes. > The difference with MPOL_BIND is the ability to specify a preferred node > which is the first node in the nodemask argument passed. > > Cc: Ben Widawsky <ben.widawsky@intel.com> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Feng Tang <feng.tang@intel.com> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: Andrea Arcangeli <aarcange@redhat.com> > Cc: Mel Gorman <mgorman@techsingularity.net> > Cc: Mike Kravetz <mike.kravetz@oracle.com> > Cc: Randy Dunlap <rdunlap@infradead.org> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Andi Kleen <ak@linux.intel.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Huang Ying <ying.huang@intel.com>b > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > --- > .../admin-guide/mm/numa_memory_policy.rst | 7 +++ > include/uapi/linux/mempolicy.h | 1 + > mm/mempolicy.c | 43 +++++++++++++++++-- > 3 files changed, 48 insertions(+), 3 deletions(-) > > diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst > index 64fd0ba0d057..4dfdcbd22d67 100644 > --- a/Documentation/admin-guide/mm/numa_memory_policy.rst > +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst > @@ -252,6 +252,13 @@ MPOL_PREFERRED_MANY > can fall back to all existing numa nodes. This is effectively > MPOL_PREFERRED allowed for a mask rather than a single node. > > +MPOL_PREFERRED_STRICT > + This mode specifies that the allocation should be attempted > + from the first node specified in the nodemask of the policy. > + If that allocation fails, the kernel will search other nodes > + in the nodemask, in order of increasing distance from the > + preferred node based on information provided by the platform firmware. 
> + > NUMA memory policy supports the following optional mode flags: > > MPOL_F_STATIC_NODES > diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h > index 046d0ccba4cd..8aa1d1963235 100644 > --- a/include/uapi/linux/mempolicy.h > +++ b/include/uapi/linux/mempolicy.h > @@ -23,6 +23,7 @@ enum { > MPOL_INTERLEAVE, > MPOL_LOCAL, > MPOL_PREFERRED_MANY, > + MPOL_PREFERRED_STRICT, > MPOL_MAX, /* always last member of enum */ > }; > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > index 1592b081c58e..59080dd1ea69 100644 > --- a/mm/mempolicy.c > +++ b/mm/mempolicy.c > @@ -407,6 +407,10 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = { > .create = mpol_new_nodemask, > .rebind = mpol_rebind_preferred, > }, > + [MPOL_PREFERRED_STRICT] = { > + .create = mpol_new_nodemask, > + .rebind = mpol_rebind_preferred, > + }, > }; > > static int migrate_page_add(struct page *page, struct list_head *pagelist, > @@ -900,6 +904,7 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes) > case MPOL_INTERLEAVE: > case MPOL_PREFERRED: > case MPOL_PREFERRED_MANY: > + case MPOL_PREFERRED_STRICT: > *nodes = p->nodes; > break; > case MPOL_LOCAL: > @@ -1781,7 +1786,7 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) > cpuset_nodemask_valid_mems_allowed(&policy->nodes)) > return &policy->nodes; > > - if (mode == MPOL_PREFERRED_MANY) > + if (mode == MPOL_PREFERRED_MANY || mode == MPOL_PREFERRED_STRICT) > return &policy->nodes; > > return NULL; > @@ -1796,7 +1801,7 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) > */ > static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd) > { > - if (policy->mode == MPOL_PREFERRED) { > + if (policy->mode == MPOL_PREFERRED || policy->mode == MPOL_PREFERRED_STRICT) { > nd = first_node(policy->nodes); > } else { > /* > @@ -1840,6 +1845,7 @@ unsigned int mempolicy_slab_node(void) > > switch (policy->mode) { > case MPOL_PREFERRED: > + case MPOL_PREFERRED_STRICT: > return first_node(policy->nodes); > > case MPOL_INTERLEAVE: > @@ -1952,7 +1958,8 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags, > huge_page_shift(hstate_vma(vma))); > } else { > nid = policy_node(gfp_flags, *mpol, numa_node_id()); > - if (mode == MPOL_BIND || mode == MPOL_PREFERRED_MANY) > + if (mode == MPOL_BIND || mode == MPOL_PREFERRED_MANY || > + mode == MPOL_PREFERRED_STRICT) > *nodemask = &(*mpol)->nodes; > } > return nid; > @@ -1986,6 +1993,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask) > switch (mempolicy->mode) { > case MPOL_PREFERRED: > case MPOL_PREFERRED_MANY: > + case MPOL_PREFERRED_STRICT: > case MPOL_BIND: > case MPOL_INTERLEAVE: > *mask = mempolicy->nodes; > @@ -2072,6 +2080,23 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order, > return page; > } > > +static struct page *alloc_pages_preferred_strict(gfp_t gfp, unsigned int order, > + struct mempolicy *pol) > +{ > + int nid; > + gfp_t preferred_gfp; > + > + /* > + * With MPOL_PREFERRED_STRICT first node in the policy nodemask > + * is picked as the preferred node id and the fallback allocation > + * is still restricted to the preferred nodes in the nodemask. > + */ > + preferred_gfp = gfp | __GFP_NOWARN; > + preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL); > + nid = first_node(pol->nodes); > + return __alloc_pages(preferred_gfp, order, nid, &pol->nodes); > +} > + > /** > * alloc_pages_vma - Allocate a page for a VMA. > * @gfp: GFP flags. 
> @@ -2113,6 +2138,12 @@ struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, > goto out; > } > > + if (pol->mode == MPOL_PREFERRED_STRICT) { > + page = alloc_pages_preferred_strict(gfp, order, pol); > + mpol_cond_put(pol); > + goto out; > + } > + > if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) { > int hpage_node = node; > > @@ -2193,6 +2224,8 @@ struct page *alloc_pages(gfp_t gfp, unsigned order) > else if (pol->mode == MPOL_PREFERRED_MANY) > page = alloc_pages_preferred_many(gfp, order, > numa_node_id(), pol); > + else if (pol->mode == MPOL_PREFERRED_STRICT) > + page = alloc_pages_preferred_strict(gfp, order, pol); > else > page = __alloc_pages(gfp, order, > policy_node(gfp, pol, numa_node_id()), > @@ -2265,6 +2298,7 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b) > case MPOL_INTERLEAVE: > case MPOL_PREFERRED: > case MPOL_PREFERRED_MANY: > + case MPOL_PREFERRED_STRICT: > return !!nodes_equal(a->nodes, b->nodes); > case MPOL_LOCAL: > return true; > @@ -2405,6 +2439,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long > break; > > case MPOL_PREFERRED: > + case MPOL_PREFERRED_STRICT: > if (node_isset(curnid, pol->nodes)) > goto out; > polnid = first_node(pol->nodes); > @@ -2866,6 +2901,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol) > err = 0; > goto out; > case MPOL_PREFERRED_MANY: > + case MPOL_PREFERRED_STRICT: > case MPOL_BIND: > /* > * Insist on a nodelist > @@ -2953,6 +2989,7 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) > break; > case MPOL_PREFERRED: > case MPOL_PREFERRED_MANY: > + case MPOL_PREFERRED_STRICT: > case MPOL_BIND: > case MPOL_INTERLEAVE: > nodes = pol->nodes; > -- > 2.31.1 -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
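For illustration, a minimal userspace sketch of how an application might select the proposed policy with set_mempolicy(2). It assumes a kernel carrying this RFC; the MPOL_PREFERRED_STRICT value is inferred from the patched uapi enum (no released header defines it), and the node numbers are made up.

    #include <stdio.h>
    #include <numaif.h>                 /* set_mempolicy(); build with -lnuma */

    #ifndef MPOL_PREFERRED_STRICT
    #define MPOL_PREFERRED_STRICT 6     /* assumed: next value after MPOL_PREFERRED_MANY in the RFC */
    #endif

    int main(void)
    {
            /*
             * Allow allocations only from nodes 2 and 3.  Under the proposed
             * semantics the first (lowest-numbered) node in the mask, node 2,
             * is the preferred node and node 3 is the in-mask fallback.
             */
            unsigned long nodemask = (1UL << 2) | (1UL << 3);

            if (set_mempolicy(MPOL_PREFERRED_STRICT, &nodemask, 8 * sizeof(nodemask) + 1))
                    perror("set_mempolicy");

            /* Anonymous memory faulted in from here on follows the policy. */
            return 0;
    }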
* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy 2021-10-13 10:42 ` [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy Michal Hocko @ 2021-10-13 10:48 ` Michal Hocko 2021-10-13 12:35 ` Aneesh Kumar K.V 0 siblings, 1 reply; 10+ messages in thread From: Michal Hocko @ 2021-10-13 10:48 UTC (permalink / raw) To: Aneesh Kumar K.V Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api On Wed 13-10-21 12:42:34, Michal Hocko wrote: > [Cc linux-api] > > On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote: > > This mempolicy mode can be used with either the set_mempolicy(2) > > or mbind(2) interfaces. Like the MPOL_PREFERRED interface, it > > allows an application to set a preference node from which the kernel > > will fulfill memory allocation requests. Unlike the MPOL_PREFERRED mode, > > it takes a set of nodes. The nodes in the nodemask are used as fallback > > allocation nodes if memory is not available on the preferred node. > > Unlike MPOL_PREFERRED_MANY, it will not fall back memory allocations > > to all nodes in the system. Like the MPOL_BIND interface, it works over a > > set of nodes and will cause a SIGSEGV or invoke the OOM killer if > > memory is not available on those preferred nodes. > > > > This patch helps applications to hint a memory allocation preference node > > and fallback to _only_ a set of nodes if the memory is not available > > on the preferred node. Fallback allocation is attempted from the node which is > > nearest to the preferred node. > > > > This new memory policy helps applications to have explicit control on slow > > memory allocation and avoids default fallback to slow memory NUMA nodes. > > The difference with MPOL_BIND is the ability to specify a preferred node > > which is the first node in the nodemask argument passed. I am sorry but I do not understand the semantic diffrence from MPOL_BIND. Could you be more specific please? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy 2021-10-13 10:48 ` Michal Hocko @ 2021-10-13 12:35 ` Aneesh Kumar K.V 2021-10-13 12:50 ` Michal Hocko 0 siblings, 1 reply; 10+ messages in thread From: Aneesh Kumar K.V @ 2021-10-13 12:35 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api On 10/13/21 16:18, Michal Hocko wrote: > On Wed 13-10-21 12:42:34, Michal Hocko wrote: >> [Cc linux-api] >> >> On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote: >>> This mempolicy mode can be used with either the set_mempolicy(2) >>> or mbind(2) interfaces. Like the MPOL_PREFERRED interface, it >>> allows an application to set a preference node from which the kernel >>> will fulfill memory allocation requests. Unlike the MPOL_PREFERRED mode, >>> it takes a set of nodes. The nodes in the nodemask are used as fallback >>> allocation nodes if memory is not available on the preferred node. >>> Unlike MPOL_PREFERRED_MANY, it will not fall back memory allocations >>> to all nodes in the system. Like the MPOL_BIND interface, it works over a >>> set of nodes and will cause a SIGSEGV or invoke the OOM killer if >>> memory is not available on those preferred nodes. >>> >>> This patch helps applications to hint a memory allocation preference node >>> and fallback to _only_ a set of nodes if the memory is not available >>> on the preferred node. Fallback allocation is attempted from the node which is >>> nearest to the preferred node. >>> >>> This new memory policy helps applications to have explicit control on slow >>> memory allocation and avoids default fallback to slow memory NUMA nodes. >>> The difference with MPOL_BIND is the ability to specify a preferred node >>> which is the first node in the nodemask argument passed. > > I am sorry but I do not understand the semantic diffrence from > MPOL_BIND. Could you be more specific please? > MPOL_BIND This mode specifies that memory must come from the set of nodes specified by the policy. Memory will be allocated from the node in the set with sufficient free memory that is closest to the node where the allocation takes place. MPOL_PREFERRED_STRICT This mode specifies that the allocation should be attempted from the first node specified in the nodemask of the policy. If that allocation fails, the kernel will search other nodes in the nodemask, in order of increasing distance from the preferred node based on information provided by the platform firmware. The difference is the ability to specify the preferred node as the first node in the nodemask and all fallback allocations are based on the distance from the preferred node. With MPOL_BIND they base based on the node where the allocation takes place. -aneesh -aneesh ^ permalink raw reply [flat|nested] 10+ messages in thread
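To make the contrast concrete, a sketch on a hypothetical topology: nodes 0 and 1 have CPUs, nodes 2 and 3 are memory-only, node 3 is the closer of the two to node 0, and the thread faulting the pages runs on node 0. The two mbind(2) calls are alternatives shown side by side (the second would simply replace the first); MPOL_PREFERRED_STRICT and all node numbers are assumptions for illustration.

    #include <numaif.h>                 /* mbind(); build with -lnuma */

    #ifndef MPOL_PREFERRED_STRICT
    #define MPOL_PREFERRED_STRICT 6     /* assumed value from the RFC's enum */
    #endif

    /* buf must be page aligned, e.g. obtained from mmap(). */
    static void place(void *buf, unsigned long len)
    {
            unsigned long mask = (1UL << 2) | (1UL << 3);
            unsigned long maxnode = 8 * sizeof(mask) + 1;

            /*
             * MPOL_BIND: pages are confined to {2,3}, but the node tried first
             * is whichever of the two is closest to the CPU taking the fault,
             * i.e. node 3 in this made-up topology.
             */
            mbind(buf, len, MPOL_BIND, &mask, maxnode, 0);

            /*
             * MPOL_PREFERRED_STRICT as proposed: node 2, the first node of the
             * mask, is tried first regardless of where the fault happens, and
             * fallback walks the remaining in-mask nodes by distance from node 2.
             */
            mbind(buf, len, MPOL_PREFERRED_STRICT, &mask, maxnode, 0);
    }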
* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy 2021-10-13 12:35 ` Aneesh Kumar K.V @ 2021-10-13 12:50 ` Michal Hocko 2021-10-13 12:58 ` Aneesh Kumar K.V 0 siblings, 1 reply; 10+ messages in thread From: Michal Hocko @ 2021-10-13 12:50 UTC (permalink / raw) To: Aneesh Kumar K.V Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api On Wed 13-10-21 18:05:49, Aneesh Kumar K.V wrote: > On 10/13/21 16:18, Michal Hocko wrote: > > On Wed 13-10-21 12:42:34, Michal Hocko wrote: > > > [Cc linux-api] > > > > > > On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote: > > > > This mempolicy mode can be used with either the set_mempolicy(2) > > > > or mbind(2) interfaces. Like the MPOL_PREFERRED interface, it > > > > allows an application to set a preference node from which the kernel > > > > will fulfill memory allocation requests. Unlike the MPOL_PREFERRED mode, > > > > it takes a set of nodes. The nodes in the nodemask are used as fallback > > > > allocation nodes if memory is not available on the preferred node. > > > > Unlike MPOL_PREFERRED_MANY, it will not fall back memory allocations > > > > to all nodes in the system. Like the MPOL_BIND interface, it works over a > > > > set of nodes and will cause a SIGSEGV or invoke the OOM killer if > > > > memory is not available on those preferred nodes. > > > > > > > > This patch helps applications to hint a memory allocation preference node > > > > and fallback to _only_ a set of nodes if the memory is not available > > > > on the preferred node. Fallback allocation is attempted from the node which is > > > > nearest to the preferred node. > > > > > > > > This new memory policy helps applications to have explicit control on slow > > > > memory allocation and avoids default fallback to slow memory NUMA nodes. > > > > The difference with MPOL_BIND is the ability to specify a preferred node > > > > which is the first node in the nodemask argument passed. > > > > I am sorry but I do not understand the semantic diffrence from > > MPOL_BIND. Could you be more specific please? > > > > > > MPOL_BIND > This mode specifies that memory must come from the set of > nodes specified by the policy. Memory will be allocated from > the node in the set with sufficient free memory that is > closest to the node where the allocation takes place. > > > MPOL_PREFERRED_STRICT > This mode specifies that the allocation should be attempted > from the first node specified in the nodemask of the policy. > If that allocation fails, the kernel will search other nodes > in the nodemask, in order of increasing distance from the > preferred node based on information provided by the platform firmware. > > The difference is the ability to specify the preferred node as the first > node in the nodemask and all fallback allocations are based on the distance > from the preferred node. With MPOL_BIND they base based on the node where > the allocation takes place. OK, this makes it more clear. Thanks! I am still not sure the semantic makes sense though. Why should the lowest node in the nodemask have any special meaning? What if it is a node with a higher number that somebody preferes to start with? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
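The objection in concrete terms: a nodemask is an unordered bitmap, so "the first node" can only ever mean the lowest-numbered set bit, which is what the patch's first_node() call returns. A tiny sketch with made-up node numbers:

    #include <stdio.h>

    int main(void)
    {
            /* The application would like node 3 preferred with node 1 as fallback... */
            unsigned long nodemask = (1UL << 3) | (1UL << 1);

            /*
             * ...but the bitmap cannot record that wish: "first node" is in
             * effect the lowest set bit, i.e. node 1 here.
             */
            printf("treated as preferred: node %d\n", __builtin_ctzl(nodemask));
            return 0;
    }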
* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy 2021-10-13 12:50 ` Michal Hocko @ 2021-10-13 12:58 ` Aneesh Kumar K.V 2021-10-13 13:07 ` Michal Hocko 2021-10-13 13:57 ` Aneesh Kumar K.V 0 siblings, 2 replies; 10+ messages in thread From: Aneesh Kumar K.V @ 2021-10-13 12:58 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api On 10/13/21 18:20, Michal Hocko wrote: > On Wed 13-10-21 18:05:49, Aneesh Kumar K.V wrote: >> On 10/13/21 16:18, Michal Hocko wrote: >>> On Wed 13-10-21 12:42:34, Michal Hocko wrote: >>>> [Cc linux-api] >>>> >>>> On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote: >>>>> This mempolicy mode can be used with either the set_mempolicy(2) >>>>> or mbind(2) interfaces. Like the MPOL_PREFERRED interface, it >>>>> allows an application to set a preference node from which the kernel >>>>> will fulfill memory allocation requests. Unlike the MPOL_PREFERRED mode, >>>>> it takes a set of nodes. The nodes in the nodemask are used as fallback >>>>> allocation nodes if memory is not available on the preferred node. >>>>> Unlike MPOL_PREFERRED_MANY, it will not fall back memory allocations >>>>> to all nodes in the system. Like the MPOL_BIND interface, it works over a >>>>> set of nodes and will cause a SIGSEGV or invoke the OOM killer if >>>>> memory is not available on those preferred nodes. >>>>> >>>>> This patch helps applications to hint a memory allocation preference node >>>>> and fallback to _only_ a set of nodes if the memory is not available >>>>> on the preferred node. Fallback allocation is attempted from the node which is >>>>> nearest to the preferred node. >>>>> >>>>> This new memory policy helps applications to have explicit control on slow >>>>> memory allocation and avoids default fallback to slow memory NUMA nodes. >>>>> The difference with MPOL_BIND is the ability to specify a preferred node >>>>> which is the first node in the nodemask argument passed. >>> >>> I am sorry but I do not understand the semantic diffrence from >>> MPOL_BIND. Could you be more specific please? >>> >> >> >> >> MPOL_BIND >> This mode specifies that memory must come from the set of >> nodes specified by the policy. Memory will be allocated from >> the node in the set with sufficient free memory that is >> closest to the node where the allocation takes place. >> >> >> MPOL_PREFERRED_STRICT >> This mode specifies that the allocation should be attempted >> from the first node specified in the nodemask of the policy. >> If that allocation fails, the kernel will search other nodes >> in the nodemask, in order of increasing distance from the >> preferred node based on information provided by the platform firmware. >> >> The difference is the ability to specify the preferred node as the first >> node in the nodemask and all fallback allocations are based on the distance >> from the preferred node. With MPOL_BIND they base based on the node where >> the allocation takes place. > > OK, this makes it more clear. Thanks! > > I am still not sure the semantic makes sense though. Why should > the lowest node in the nodemask have any special meaning? What if it is > a node with a higher number that somebody preferes to start with? > That is true. I haven't been able to find an easy way to specify the preferred node other than expressing it as first node in the node mask. Yes, it limits the usage of the policy. 
Any alternate suggestion? We could do:

set_mempolicy(MPOL_PREFERRED, nodemask(nodeX))
set_mempolicy(MPOL_PREFERRED_EXTEND, nodemask(fallback nodemask for the above PREFERRED policy))

But that really complicates the interface? -aneesh ^ permalink raw reply [flat|nested] 10+ messages in thread
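Spelled out as a sketch: both MPOL_PREFERRED_EXTEND and the idea that a second set_mempolicy(2) call narrows the fallback of an already-installed policy are hypothetical; no such mode exists, and a real kernel would reject the second call with EINVAL.

    #include <numaif.h>                 /* set_mempolicy(); build with -lnuma */

    #define MPOL_PREFERRED_EXTEND (-1)  /* placeholder for the hypothetical mode */

    static void preferred_with_limited_fallback(void)
    {
            unsigned long preferred = 1UL << 2;                 /* the preferred node   */
            unsigned long fallback  = (1UL << 2) | (1UL << 3);  /* allowed fallback set */
            unsigned long maxnode   = 8 * sizeof(unsigned long) + 1;

            /* 1. install the preference, exactly as applications do today */
            set_mempolicy(MPOL_PREFERRED, &preferred, maxnode);

            /* 2. hypothetical follow-up call restricting where fallback may go */
            set_mempolicy(MPOL_PREFERRED_EXTEND, &fallback, maxnode);
    }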
* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy 2021-10-13 12:58 ` Aneesh Kumar K.V @ 2021-10-13 13:07 ` Michal Hocko 2021-10-13 13:10 ` Aneesh Kumar K.V 2021-10-13 13:57 ` Aneesh Kumar K.V 1 sibling, 1 reply; 10+ messages in thread From: Michal Hocko @ 2021-10-13 13:07 UTC (permalink / raw) To: Aneesh Kumar K.V Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api On Wed 13-10-21 18:28:40, Aneesh Kumar K.V wrote: > On 10/13/21 18:20, Michal Hocko wrote: [...] > > I am still not sure the semantic makes sense though. Why should > > the lowest node in the nodemask have any special meaning? What if it is > > a node with a higher number that somebody preferes to start with? > > > > That is true. I haven't been able to find an easy way to specify the > preferred node other than expressing it as first node in the node mask. Yes, > it limits the usage of the policy. Any alternate suggestion? set_mempolicy is indeed not very suitable for something you are looking for. Could you be more specific why the initial node is so important? Is this because you want to allocate from a cpu less node first before falling back to others? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy 2021-10-13 13:07 ` Michal Hocko @ 2021-10-13 13:10 ` Aneesh Kumar K.V 2021-10-13 14:22 ` Michal Hocko 0 siblings, 1 reply; 10+ messages in thread From: Aneesh Kumar K.V @ 2021-10-13 13:10 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api On 10/13/21 18:37, Michal Hocko wrote: > On Wed 13-10-21 18:28:40, Aneesh Kumar K.V wrote: >> On 10/13/21 18:20, Michal Hocko wrote: > [...] >>> I am still not sure the semantic makes sense though. Why should >>> the lowest node in the nodemask have any special meaning? What if it is >>> a node with a higher number that somebody preferes to start with? >>> >> >> That is true. I haven't been able to find an easy way to specify the >> preferred node other than expressing it as first node in the node mask. Yes, >> it limits the usage of the policy. Any alternate suggestion? > > set_mempolicy is indeed not very suitable for something you are looking > for. Could you be more specific why the initial node is so important? > Is this because you want to allocate from a cpu less node first before > falling back to others? > One of the reason is that the thread that is faulting in pages first is not the one that is going to operate on this page long term. Application wants to hint the allocation node for the same reason they use MPOL_PREFERRED now. -aneesh ^ permalink raw reply [flat|nested] 10+ messages in thread
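The scenario in its simplest form: with the default local policy, placement is decided by whichever thread faults the pages first, not by the thread that will use them, which is why the hint has to be carried by the memory policy itself. A minimal sketch (error handling omitted):

    #include <string.h>
    #include <sys/mman.h>

    static void *prefault(size_t len)
    {
            void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            /*
             * First touch decides placement: these pages land on (or near) the
             * node running this initialisation code, even if a worker thread
             * on another node is the long-term consumer of the buffer.
             */
            memset(buf, 0, len);
            return buf;
    }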
* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy 2021-10-13 13:10 ` Aneesh Kumar K.V @ 2021-10-13 14:22 ` Michal Hocko 0 siblings, 0 replies; 10+ messages in thread From: Michal Hocko @ 2021-10-13 14:22 UTC (permalink / raw) To: Aneesh Kumar K.V Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api On Wed 13-10-21 18:40:26, Aneesh Kumar K.V wrote: > On 10/13/21 18:37, Michal Hocko wrote: > > On Wed 13-10-21 18:28:40, Aneesh Kumar K.V wrote: > > > On 10/13/21 18:20, Michal Hocko wrote: > > [...] > > > > I am still not sure the semantic makes sense though. Why should > > > > the lowest node in the nodemask have any special meaning? What if it is > > > > a node with a higher number that somebody preferes to start with? > > > > > > > > > > That is true. I haven't been able to find an easy way to specify the > > > preferred node other than expressing it as first node in the node mask. Yes, > > > it limits the usage of the policy. Any alternate suggestion? > > > > set_mempolicy is indeed not very suitable for something you are looking > > for. Could you be more specific why the initial node is so important? > > Is this because you want to allocate from a cpu less node first before > > falling back to others? > > > > One of the reason is that the thread that is faulting in pages first is not > the one that is going to operate on this page long term. Application wants > to hint the allocation node for the same reason they use MPOL_PREFERRED now. Why cannot you move the faulting thread to a numa node of the preference during the faulting and them move it out if that is really necessary? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
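What that suggestion amounts to with existing interfaces, as a rough sketch: pin the initialising thread to the preferred node while it faults the buffer in, and let MPOL_BIND keep any fallback inside the allowed set. Node numbers are illustrative, error handling is omitted, and it only works if the preferred node has CPUs to run on, so it does not cover the cpu-less memory node case raised earlier in the thread.

    #include <string.h>
    #include <numa.h>                   /* numa_run_on_node(); build with -lnuma */
    #include <numaif.h>                 /* mbind() */

    /* buf must be page aligned, e.g. obtained from mmap(). */
    static void prefault_near_node(void *buf, size_t len, int preferred_node)
    {
            unsigned long allowed = (1UL << 2) | (1UL << 3);    /* permitted nodes */

            /* Never let the pages land outside the allowed set. */
            mbind(buf, len, MPOL_BIND, &allowed, 8 * sizeof(allowed) + 1, 0);

            /*
             * Run on the preferred node while touching the pages so that
             * MPOL_BIND's "closest to the allocating CPU" rule picks it first.
             */
            numa_run_on_node(preferred_node);
            memset(buf, 0, len);        /* fault the pages in */
            numa_run_on_node(-1);       /* let the thread run anywhere again */
    }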
* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy 2021-10-13 12:58 ` Aneesh Kumar K.V 2021-10-13 13:07 ` Michal Hocko @ 2021-10-13 13:57 ` Aneesh Kumar K.V 2021-10-13 14:26 ` Michal Hocko 1 sibling, 1 reply; 10+ messages in thread From: Aneesh Kumar K.V @ 2021-10-13 13:57 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api On 10/13/21 18:28, Aneesh Kumar K.V wrote: > On 10/13/21 18:20, Michal Hocko wrote: >> On Wed 13-10-21 18:05:49, Aneesh Kumar K.V wrote: >>> On 10/13/21 16:18, Michal Hocko wrote: >>>> On Wed 13-10-21 12:42:34, Michal Hocko wrote: >>>>> [Cc linux-api] >>>>> >>>>> On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote: >>>>>> This mempolicy mode can be used with either the set_mempolicy(2) >>>>>> or mbind(2) interfaces. Like the MPOL_PREFERRED interface, it >>>>>> allows an application to set a preference node from which the kernel >>>>>> will fulfill memory allocation requests. Unlike the MPOL_PREFERRED >>>>>> mode, >>>>>> it takes a set of nodes. The nodes in the nodemask are used as >>>>>> fallback >>>>>> allocation nodes if memory is not available on the preferred node. >>>>>> Unlike MPOL_PREFERRED_MANY, it will not fall back memory allocations >>>>>> to all nodes in the system. Like the MPOL_BIND interface, it works >>>>>> over a >>>>>> set of nodes and will cause a SIGSEGV or invoke the OOM killer if >>>>>> memory is not available on those preferred nodes. >>>>>> >>>>>> This patch helps applications to hint a memory allocation >>>>>> preference node >>>>>> and fallback to _only_ a set of nodes if the memory is not available >>>>>> on the preferred node. Fallback allocation is attempted from the >>>>>> node which is >>>>>> nearest to the preferred node. >>>>>> >>>>>> This new memory policy helps applications to have explicit control >>>>>> on slow >>>>>> memory allocation and avoids default fallback to slow memory NUMA >>>>>> nodes. >>>>>> The difference with MPOL_BIND is the ability to specify a >>>>>> preferred node >>>>>> which is the first node in the nodemask argument passed. >>>> >>>> I am sorry but I do not understand the semantic diffrence from >>>> MPOL_BIND. Could you be more specific please? >>>> >>> >>> >>> >>> MPOL_BIND >>> This mode specifies that memory must come from the set of >>> nodes specified by the policy. Memory will be allocated from >>> the node in the set with sufficient free memory that is >>> closest to the node where the allocation takes place. >>> >>> >>> MPOL_PREFERRED_STRICT >>> This mode specifies that the allocation should be attempted >>> from the first node specified in the nodemask of the policy. >>> If that allocation fails, the kernel will search other nodes >>> in the nodemask, in order of increasing distance from the >>> preferred node based on information provided by the platform >>> firmware. >>> >>> The difference is the ability to specify the preferred node as the first >>> node in the nodemask and all fallback allocations are based on the >>> distance >>> from the preferred node. With MPOL_BIND they base based on the node >>> where >>> the allocation takes place. >> >> OK, this makes it more clear. Thanks! >> >> I am still not sure the semantic makes sense though. Why should >> the lowest node in the nodemask have any special meaning? What if it is >> a node with a higher number that somebody preferes to start with? >> > > That is true. 
I haven't been able to find an easy way to specify the > preferred node other than expressing it as first node in the node mask. > Yes, it limits the usage of the policy. Any alternate suggestion? > > We could do > set_mempolicy(MPOL_PREFERRED, nodemask(nodeX)) > set_mempolicy(MPOL_PREFERRED_EXTEND, nodemask(fallback nodemask for > above PREFERRED policy)) > > But that really complicates the interface? > > Another option is to keep this mbind(2) specific and overload the flags argument to carry the preferred node id:

mbind(va, len, MPOL_PREFERRED_STRICT, nodemask, max_node, preferred_node);

-aneesh ^ permalink raw reply [flat|nested] 10+ messages in thread
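Put next to the current prototype, as a sketch: the six-argument variant is purely hypothetical, and today the last mbind(2) argument carries MPOL_MF_* flag bits, which is what the reply below objects to overloading.

    #include <numaif.h>                 /* mbind(); build with -lnuma */

    #ifndef MPOL_PREFERRED_STRICT
    #define MPOL_PREFERRED_STRICT 6     /* assumed value from the RFC's enum */
    #endif

    /* addr must be page aligned; all arguments here are illustrative. */
    static void contrast(void *addr, unsigned long len, unsigned long *nodemask,
                         unsigned long maxnode, unsigned int preferred_node)
    {
            /* Today's interface: the final argument is a flags word. */
            mbind(addr, len, MPOL_BIND, nodemask, maxnode,
                  MPOL_MF_STRICT | MPOL_MF_MOVE);

            /* The overload floated above: the same slot repurposed to name the
             * preferred node, which would have to coexist with those flag bits. */
            mbind(addr, len, MPOL_PREFERRED_STRICT, nodemask, maxnode, preferred_node);
    }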
* Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy 2021-10-13 13:57 ` Aneesh Kumar K.V @ 2021-10-13 14:26 ` Michal Hocko 0 siblings, 0 replies; 10+ messages in thread From: Michal Hocko @ 2021-10-13 14:26 UTC (permalink / raw) To: Aneesh Kumar K.V Cc: linux-mm, akpm, Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying, linux-api On Wed 13-10-21 19:27:03, Aneesh Kumar K.V wrote: [...] > Another option is to keep this mbind(2) specific and overload flags to be > the preferred nodeid. > > mbind(va, len, MPOL_PREFERRED_STRICT, nodemask, max_node, preferred_node); First of all I do not think you really want to create a new memory policy for this. Not to mention that PREFERRED_STRICT is kinda weird in the first place but one could argue that a preference of the first node to try is not really specific to BIND/PREFERRED_MANY. Overloading flags is a nogo. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread