From: Mike Kravetz <mike.kravetz@oracle.com>
To: Hillf Danton <hillf.zj@alibaba-inc.com>,
'Naoya Horiguchi' <n-horiguchi@ah.jp.nec.com>,
'Andrew Morton' <akpm@linux-foundation.org>
Cc: 'David Rientjes' <rientjes@google.com>,
'Dave Hansen' <dave.hansen@intel.com>,
'Mel Gorman' <mgorman@suse.de>,
'Joonsoo Kim' <iamjoonsoo.kim@lge.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
'Naoya Horiguchi' <nao.horiguchi@gmail.com>
Subject: Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
Date: Fri, 20 Nov 2015 13:56:18 -0800 [thread overview]
Message-ID: <564F9702.5070007@oracle.com> (raw)
In-Reply-To: <050201d12369$167a0a10$436e1e30$@alibaba-inc.com>
On 11/19/2015 11:57 PM, Hillf Danton wrote:
>>
>> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
>> alloc_buddy_huge_page() to directly create a hugepage from the buddy allocator.
>> In that case, however, if alloc_buddy_huge_page() succeeds we don't decrement
>> h->resv_huge_pages, which means that successful hugetlb_fault() returns without
>> releasing the reserve count. As a result, subsequent hugetlb_fault() might fail
>> despite that there are still free hugepages.
>>
>> This patch simply adds decrementing code on that code path.
In general, I agree with the patch. If we allocate a huge page via the
buddy allocator and that page will be used to satisfy a reservation, then
we need to decrement the reservation count.
As Hillf mentions, this code is not exactly the same in linux-next.
Specifically, there is the new call to take the memory policy of the
vma into account when calling the buddy allocator. I do not think,
this impacts your proposed change but you may want to test with that
in place.
>>
>> I reproduced this problem when testing v4.3 kernel in the following situation:
>> - the test machine/VM is a NUMA system,
>> - hugepage overcommiting is enabled,
>> - most of hugepages are allocated and there's only one free hugepage
>> which is on node 0 (for example),
>> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
>> node 1, tries to allocate a hugepage,
I am curious about this scenario. When this second program attempts to
allocate the page, I assume it creates a reservation first. Is this
reservation before or after setting mempolicy? If the mempolicy was set
first, I would have expected the reservation to allocate a page on
node 1 to satisfy the reservation.
--
Mike Kravetz
>> - the allocation should fail but the reserve count is still hold.
>>
>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: <stable@vger.kernel.org> [3.16+]
>> ---
>> - the reason why I set stable target to "3.16+" is that this patch can be
>> applied easily/automatically on these versions. But this bug seems to be
>> old one, so if you are interested in backporting to older kernels,
>> please let me know.
>> ---
>> mm/hugetlb.c | 5 ++++-
>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git v4.3/mm/hugetlb.c v4.3_patched/mm/hugetlb.c
>> index 9cc7734..77c518c 100644
>> --- v4.3/mm/hugetlb.c
>> +++ v4.3_patched/mm/hugetlb.c
>> @@ -1790,7 +1790,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>> page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>> if (!page)
>> goto out_uncharge_cgroup;
>> -
>> + if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
>> + SetPagePrivate(page);
>> + h->resv_huge_pages--;
>> + }
>
> I am wondering if this patch was prepared against the next tree.
>
>> spin_lock(&hugetlb_lock);
>> list_move(&page->lru, &h->hugepage_activelist);
>> /* Fall through */
>> --
>> 1.7.1
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Hillf Danton <hillf.zj@alibaba-inc.com>,
"'Naoya Horiguchi'" <n-horiguchi@ah.jp.nec.com>,
"'Andrew Morton'" <akpm@linux-foundation.org>
Cc: "'David Rientjes'" <rientjes@google.com>,
"'Dave Hansen'" <dave.hansen@intel.com>,
"'Mel Gorman'" <mgorman@suse.de>,
"'Joonsoo Kim'" <iamjoonsoo.kim@lge.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"'Naoya Horiguchi'" <nao.horiguchi@gmail.com>
Subject: Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
Date: Fri, 20 Nov 2015 13:56:18 -0800 [thread overview]
Message-ID: <564F9702.5070007@oracle.com> (raw)
In-Reply-To: <050201d12369$167a0a10$436e1e30$@alibaba-inc.com>
On 11/19/2015 11:57 PM, Hillf Danton wrote:
>>
>> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
>> alloc_buddy_huge_page() to directly create a hugepage from the buddy allocator.
>> In that case, however, if alloc_buddy_huge_page() succeeds we don't decrement
>> h->resv_huge_pages, which means that successful hugetlb_fault() returns without
>> releasing the reserve count. As a result, subsequent hugetlb_fault() might fail
>> despite that there are still free hugepages.
>>
>> This patch simply adds decrementing code on that code path.
In general, I agree with the patch. If we allocate a huge page via the
buddy allocator and that page will be used to satisfy a reservation, then
we need to decrement the reservation count.
As Hillf mentions, this code is not exactly the same in linux-next.
Specifically, there is the new call to take the memory policy of the
vma into account when calling the buddy allocator. I do not think,
this impacts your proposed change but you may want to test with that
in place.
>>
>> I reproduced this problem when testing v4.3 kernel in the following situation:
>> - the test machine/VM is a NUMA system,
>> - hugepage overcommiting is enabled,
>> - most of hugepages are allocated and there's only one free hugepage
>> which is on node 0 (for example),
>> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
>> node 1, tries to allocate a hugepage,
I am curious about this scenario. When this second program attempts to
allocate the page, I assume it creates a reservation first. Is this
reservation before or after setting mempolicy? If the mempolicy was set
first, I would have expected the reservation to allocate a page on
node 1 to satisfy the reservation.
--
Mike Kravetz
>> - the allocation should fail but the reserve count is still hold.
>>
>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: <stable@vger.kernel.org> [3.16+]
>> ---
>> - the reason why I set stable target to "3.16+" is that this patch can be
>> applied easily/automatically on these versions. But this bug seems to be
>> old one, so if you are interested in backporting to older kernels,
>> please let me know.
>> ---
>> mm/hugetlb.c | 5 ++++-
>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git v4.3/mm/hugetlb.c v4.3_patched/mm/hugetlb.c
>> index 9cc7734..77c518c 100644
>> --- v4.3/mm/hugetlb.c
>> +++ v4.3_patched/mm/hugetlb.c
>> @@ -1790,7 +1790,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>> page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>> if (!page)
>> goto out_uncharge_cgroup;
>> -
>> + if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
>> + SetPagePrivate(page);
>> + h->resv_huge_pages--;
>> + }
>
> I am wondering if this patch was prepared against the next tree.
>
>> spin_lock(&hugetlb_lock);
>> list_move(&page->lru, &h->hugepage_activelist);
>> /* Fall through */
>> --
>> 1.7.1
>
next prev parent reply other threads:[~2015-11-20 21:56 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-20 7:20 [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count Naoya Horiguchi
2015-11-20 7:20 ` Naoya Horiguchi
2015-11-20 7:57 ` Hillf Danton
2015-11-20 7:57 ` Hillf Danton
2015-11-20 21:56 ` Mike Kravetz [this message]
2015-11-20 21:56 ` Mike Kravetz
2015-11-24 5:32 ` Naoya Horiguchi
2015-11-24 5:32 ` Naoya Horiguchi
2015-11-24 18:16 ` Mike Kravetz
2015-11-24 18:16 ` Mike Kravetz
2015-11-20 22:26 ` Andrew Morton
2015-11-20 22:26 ` Andrew Morton
2015-11-24 5:03 ` Naoya Horiguchi
2015-11-24 5:03 ` Naoya Horiguchi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=564F9702.5070007@oracle.com \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=dave.hansen@intel.com \
--cc=hillf.zj@alibaba-inc.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=nao.horiguchi@gmail.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.