From: Vlastimil Babka <vbabka@suse.cz>
To: Dave Hansen <dave@sr71.net>
Cc: akpm@linux-foundation.org, n-horiguchi@ah.jp.nec.com,
mike.kravetz@oracle.com, hillf.zj@alibaba-inc.com,
rientjes@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, dave.hansen@linux.intel.com
Subject: Re: [PATCH] mm, hugetlb: use memory policy when available
Date: Thu, 5 Nov 2015 14:47:21 +0100
Message-ID: <563B5DE9.70803@suse.cz>
In-Reply-To: <20151020195317.ADA052D8@viggo.jf.intel.com>
On 10/20/2015 09:53 PM, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> I have a hugetlbfs user which is never explicitly allocating huge pages
> with 'nr_hugepages'. They only set 'nr_overcommit_hugepages' and then let
> the pages be allocated from the buddy allocator at fault time.
>
> This works, but they noticed that mbind() was not doing them any good and
> the pages were being allocated without respect for the policy they
> specified.
>
> The code in question is this:
>
>> struct page *alloc_huge_page(struct vm_area_struct *vma,
> ...
>> page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, gbl_chg);
>> if (!page) {
>> page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>
> dequeue_huge_page_vma() is smart and will respect the VMA's memory policy.
> But, it only grabs _existing_ huge pages from the huge page pool. If the
> pool is empty, we fall back to alloc_buddy_huge_page() which obviously
> can't do anything with the VMA's policy because it isn't even passed the
> VMA.
>
> Almost everybody preallocates huge pages. That's probably why nobody has
> ever noticed this. Looking back at the git history, I don't think this
> _ever_ worked from when alloc_buddy_huge_page() was introduced in 7893d1d5,
> 8 years ago.
>
> The fix is to pass vma/addr down in to the places where we actually call in
> to the buddy allocator. It's fairly straightforward plumbing. This has
> been lightly tested.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Together with the fix and the NUMA=n cleanup:
Acked-by: Vlastimil Babka <vbabka@suse.cz>
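
For readers of the archive, the fallback behavior described in the quoted
changelog can be illustrated with a small user-space model. This is only a
sketch under stated assumptions, not the kernel code and not the patch itself:
every toy_* name, the two-node layout, and the "local node" default are
hypothetical simplifications introduced here for illustration. It shows that
the pool path sees the VMA's policy, that the old buddy fallback cannot
because it is only handed a "no node" value, and that passing the VMA down
lets the fallback honor the policy.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_NODE  (-1)  /* stands in for NUMA_NO_NODE */
#define TOY_NR_NODES 2     /* assumed two-node machine */

/* Hypothetical stand-in for a VMA that carries a memory policy. */
struct toy_vma {
    int policy_node;       /* node requested via policy, or TOY_NO_NODE */
};

/* Hypothetical preallocated huge page pool: free pages per node. */
struct toy_pool {
    int free_pages[TOY_NR_NODES];
};

/* Pool path: sees the VMA, so it can respect the policy.
 * Returns the node the page came from, or TOY_NO_NODE if nothing fits. */
static int toy_dequeue(struct toy_pool *pool, const struct toy_vma *vma)
{
    assert(pool != NULL && vma != NULL);

    if (vma->policy_node != TOY_NO_NODE) {
        int node = vma->policy_node;
        if (pool->free_pages[node] > 0) {
            pool->free_pages[node]--;
            return node;
        }
        return TOY_NO_NODE;
    }
    for (int node = 0; node < TOY_NR_NODES; node++) {
        if (pool->free_pages[node] > 0) {
            pool->free_pages[node]--;
            return node;
        }
    }
    return TOY_NO_NODE;
}

/* Old fallback: only receives a node id, never the VMA. With TOY_NO_NODE
 * it lands on a default node (modeled here as local_node). */
static int toy_buddy_alloc_old(int nid, int local_node)
{
    return nid == TOY_NO_NODE ? local_node : nid;
}

/* Fallback with the VMA plumbed through: the policy is visible again. */
static int toy_buddy_alloc_new(const struct toy_vma *vma, int local_node)
{
    if (vma != NULL && vma->policy_node != TOY_NO_NODE)
        return vma->policy_node;
    return local_node;
}

/* Models the allocation path: try the pool first, then the buddy fallback.
 * use_fix selects between the old and the VMA-aware fallback. */
static int toy_alloc_huge_page(struct toy_pool *pool,
                               const struct toy_vma *vma,
                               int local_node, int use_fix)
{
    int node = toy_dequeue(pool, vma);

    if (node == TOY_NO_NODE)
        node = use_fix ? toy_buddy_alloc_new(vma, local_node)
                       : toy_buddy_alloc_old(TOY_NO_NODE, local_node);
    return node;
}
```

In this model, a task running on node 0 with a policy for node 1 and an empty
pool gets its page on node 0 through the old fallback and on node 1 once the
VMA is passed down. With a populated pool both variants return node 1, which
matches the observation that users who preallocate would not notice the
problem.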
Thread overview: 14+ messages
2015-10-20 19:53 [PATCH] mm, hugetlb: use memory policy when available Dave Hansen
2015-10-20 19:53 ` Dave Hansen
2015-10-20 22:19 ` Andrew Morton
2015-10-20 22:19 ` Andrew Morton
2015-10-21 15:12 ` Kirill A. Shutemov
2015-10-21 15:12 ` Kirill A. Shutemov
2015-10-22 21:39 ` Sasha Levin
2015-10-22 21:39 ` Sasha Levin
2015-10-22 21:42 ` Dave Hansen
2015-10-22 21:42 ` Dave Hansen
2015-11-03 19:12 ` Sasha Levin
2015-11-03 19:12 ` Sasha Levin
2015-11-05 13:47 ` Vlastimil Babka [this message]
2015-11-05 13:47 ` Vlastimil Babka