From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: David Rientjes <rientjes@google.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mm/thp: Always allocate transparent hugepages on local node
Date: Thu, 27 Nov 2014 12:02:01 +0530
Message-ID: <87r3wp887y.fsf@linux.vnet.ibm.com>
In-Reply-To: <alpine.DEB.2.10.1411241317430.21237@chino.kir.corp.google.com>
David Rientjes <rientjes@google.com> writes:
> On Mon, 24 Nov 2014, Kirill A. Shutemov wrote:
>
>> > This makes sure that we try to allocate hugepages from the local node. If
>> > we can't, we fall back to small page allocation based on
>> > mempolicy. This is based on the observation that allocating pages
>> > on the local node is more beneficial than allocating hugepages on a remote node.
>>
>> The local node at allocation time is not necessarily the local node at use.
>> If the policy says to use specific node[s], we should follow it.
>>
>
> True, and the interaction between thp and mempolicies is fragile: if a
> process has a MPOL_BIND mempolicy over a set of nodes, that does not
> necessarily mean that we want to allocate thp remotely if it will always
> be accessed remotely. It's simple to benchmark and show that remote
> access latency of a hugepage can exceed that of local pages. MPOL_BIND
> itself is a policy of exclusion, not inclusion, and it's difficult to
> define when local pages and their cost of allocation are better than remote
> thp.
>
> For MPOL_BIND, if the local node is allowed then thp should be forced from
> that node, if the local node is disallowed then allocate from any node in
> the nodemask. For MPOL_INTERLEAVE, I think we should only allocate thp
> from the next node in order, otherwise fail the allocation and fall back to
> small pages. Is this what you meant as well?
>
Something like the below:
struct page *alloc_hugepage_vma(gfp_t gfp, struct vm_area_struct *vma,
				unsigned long addr, int order)
{
	struct page *page;
	nodemask_t *nmask;
	struct mempolicy *pol;
	int node = numa_node_id();
	unsigned int cpuset_mems_cookie;

retry_cpuset:
	pol = get_vma_policy(vma, addr);
	cpuset_mems_cookie = read_mems_allowed_begin();

	if (unlikely(pol->mode == MPOL_INTERLEAVE)) {
		unsigned nid;

		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order);
		mpol_cond_put(pol);
		page = alloc_page_interleave(gfp, order, nid);
		if (unlikely(!page &&
			     read_mems_allowed_retry(cpuset_mems_cookie)))
			goto retry_cpuset;
		return page;
	}

	nmask = policy_nodemask(gfp, pol);
	if (!nmask || node_isset(node, *nmask)) {
		mpol_cond_put(pol);
		page = alloc_hugepage_exact_node(node, gfp, order);
		if (unlikely(!page &&
			     read_mems_allowed_retry(cpuset_mems_cookie)))
			goto retry_cpuset;
		return page;
	}

	/*
	 * if current node is not part of node mask, try
	 * the allocation from any node, and we can do retry
	 * in that case.
	 */
	page = __alloc_pages_nodemask(gfp, order,
				      policy_zonelist(gfp, pol, node),
				      nmask);
	mpol_cond_put(pol);
	if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie)))
		goto retry_cpuset;
	return page;
}
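
To make the intended node selection easier to check outside the kernel, here is a rough userspace model of the rules from the quoted discussion: local node if the policy allows it, any allowed node otherwise, and only the next interleave node for MPOL_INTERLEAVE. Everything in it (enum policy_mode, struct thp_target, pick_thp_target(), the 64-bit mask standing in for nodemask_t) is a made-up stand-in for illustration, not a kernel interface, and it only models the decision, not the allocation or the cpuset retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
enum policy_mode { POLICY_DEFAULT, POLICY_BIND, POLICY_INTERLEAVE };

#define ANY_ALLOWED_NODE (-1)

struct thp_target {
	int node;	/* node to try, or ANY_ALLOWED_NODE */
	bool exact;	/* true: fail rather than spill to another node */
};

/*
 * nodemask: bit n set means node n is allowed by the policy;
 * 0 models "no nodemask" (the !nmask case in the function above).
 */
struct thp_target pick_thp_target(enum policy_mode mode, uint64_t nodemask,
				  int local_node, int next_interleave_node)
{
	struct thp_target t;

	assert(local_node >= 0 && local_node < 64);

	if (mode == POLICY_INTERLEAVE) {
		/* Only the next node in interleave order is a candidate. */
		t.node = next_interleave_node;
		t.exact = true;
		return t;
	}

	if (nodemask == 0 || (nodemask & (UINT64_C(1) << local_node))) {
		/* Local node is permitted: force the hugepage from it. */
		t.node = local_node;
		t.exact = true;
		return t;
	}

	/* Local node is excluded: any node in the mask will do. */
	t.node = ANY_ALLOWED_NODE;
	t.exact = false;
	return t;
}
```

With a bind mask of nodes 1-2, a task running on node 1 gets an exact-node attempt on node 1, while a task running on node 0 gets an attempt on any node in the mask. In every "exact" case a failure is expected to end in the small page fallback rather than a remote hugepage.
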
-aneesh