From: Mike Kravetz <mike.kravetz@oracle.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFC v4 PATCH 6/9] mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate
Date: Mon, 15 Jun 2015 11:42:25 -0700 [thread overview]
Message-ID: <557F1C91.9080904@oracle.com> (raw)
In-Reply-To: <20150615063444.GA26050@hori1.linux.bs1.fc.nec.co.jp>
On 06/14/2015 11:34 PM, Naoya Horiguchi wrote:
> On Thu, Jun 11, 2015 at 02:01:37PM -0700, Mike Kravetz wrote:
>> Areas hole punched by fallocate will not have entries in the
>> region/reserve map. However, shared mappings with min_size subpool
>> reservations may still have reserved pages. alloc_huge_page needs
>> to handle this special case and do the proper accounting.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> mm/hugetlb.c | 48 +++++++++++++++++++++++++++---------------------
>> 1 file changed, 27 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index ecbaffe..9c295c9 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -692,19 +692,9 @@ static int vma_has_reserves(struct vm_area_struct *vma, long chg)
>> return 0;
>> }
>>
>> - if (vma->vm_flags & VM_MAYSHARE) {
>> - /*
>> - * We know VM_NORESERVE is not set. Therefore, there SHOULD
>> - * be a region map for all pages. The only situation where
>> - * there is no region map is if a hole was punched via
>> - * fallocate. In this case, there really are no reverves to
>> - * use. This situation is indicated if chg != 0.
>> - */
>> - if (chg)
>> - return 0;
>> - else
>> - return 1;
>> - }
>> + /* Shared mappings always use reserves */
>> + if (vma->vm_flags & VM_MAYSHARE)
>> + return 1;
>
> This change completely reverts 5/9, so can you omit 5/9?
That was a mistake. This change should not be in the patch. The
change from 5/9 needs to remain. Sorry for confusion. Thanks for
catching.
>> /*
>> * Only the process that called mmap() has reserves for
>> @@ -1601,6 +1591,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>> struct hstate *h = hstate_vma(vma);
>> struct page *page;
>> long chg, commit;
>> + long gbl_chg;
>> int ret, idx;
>> struct hugetlb_cgroup *h_cg;
>>
>> @@ -1608,24 +1599,39 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>> /*
>> * Processes that did not create the mapping will have no
>> * reserves and will not have accounted against subpool
>> - * limit. Check that the subpool limit can be made before
>> - * satisfying the allocation MAP_NORESERVE mappings may also
>> - * need pages and subpool limit allocated allocated if no reserve
>> - * mapping overlaps.
>> + * limit. Check that the subpool limit will not be exceeded
>> + * before performing the allocation. Allocations for
>> + * MAP_NORESERVE mappings also need to be checked against
>> + * any subpool limit.
>> + *
>> + * NOTE: Shared mappings with holes punched via fallocate
>> + * may still have reservations, even without entries in the
>> + * reserve map as indicated by vma_needs_reservation. This
>> + * would be the case if hugepage_subpool_get_pages returns
>> + * zero to indicate no changes to the global reservation count
>> + * are necessary. In this case, pass the output of
>> + * hugepage_subpool_get_pages (zero) to dequeue_huge_page_vma
>> + * so that the page is not counted against the global limit.
>> + * For MAP_NORESERVE mappings always pass the output of
>> + * vma_needs_reservation. For race detection and error cleanup
>> + * use output of vma_needs_reservation as well.
>> */
>> - chg = vma_needs_reservation(h, vma, addr);
>> + chg = gbl_chg = vma_needs_reservation(h, vma, addr);
>> if (chg < 0)
>> return ERR_PTR(-ENOMEM);
>> - if (chg || avoid_reserve)
>> - if (hugepage_subpool_get_pages(spool, 1) < 0)
>> + if (chg || avoid_reserve) {
>> + gbl_chg = hugepage_subpool_get_pages(spool, 1);
>> + if (gbl_chg < 0)
>> return ERR_PTR(-ENOSPC);
>> + }
>>
>> ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
>> if (ret)
>> goto out_subpool_put;
>>
>> spin_lock(&hugetlb_lock);
>> - page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, chg);
>> + page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve,
>> + avoid_reserve ? chg : gbl_chg);
>
> You use chg or gbl_chg depending on avoid_reserve here, and below this line
> there's code like below
>
> commit = vma_commit_reservation(h, vma, addr);
> if (unlikely(chg > commit)) {
> ...
> }
>
> This also need to be changed to use chg or gbl_chg depending on avoid_reserve?
It should use chg only. I attempted to address this at the end of the
Note above.
" For race detection and error cleanup use output of vma_needs_reservation
as well."
I will add more comments to make it clear.
> # I feel that this reserve-handling code in alloc_huge_page() is too complicated
> # and hard to understand, so some cleanup like separating reserve parts into
> # other new routine(s) might be helpful...
I agree, let me think about ways to split this up and hopefully make
it easier to understand.
--
Mike Kravetz
>
> Thanks,
> Naoya Horiguchi
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFC v4 PATCH 6/9] mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate
Date: Mon, 15 Jun 2015 11:42:25 -0700 [thread overview]
Message-ID: <557F1C91.9080904@oracle.com> (raw)
In-Reply-To: <20150615063444.GA26050@hori1.linux.bs1.fc.nec.co.jp>
On 06/14/2015 11:34 PM, Naoya Horiguchi wrote:
> On Thu, Jun 11, 2015 at 02:01:37PM -0700, Mike Kravetz wrote:
>> Areas hole punched by fallocate will not have entries in the
>> region/reserve map. However, shared mappings with min_size subpool
>> reservations may still have reserved pages. alloc_huge_page needs
>> to handle this special case and do the proper accounting.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> mm/hugetlb.c | 48 +++++++++++++++++++++++++++---------------------
>> 1 file changed, 27 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index ecbaffe..9c295c9 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -692,19 +692,9 @@ static int vma_has_reserves(struct vm_area_struct *vma, long chg)
>> return 0;
>> }
>>
>> - if (vma->vm_flags & VM_MAYSHARE) {
>> - /*
>> - * We know VM_NORESERVE is not set. Therefore, there SHOULD
>> - * be a region map for all pages. The only situation where
>> - * there is no region map is if a hole was punched via
>> - * fallocate. In this case, there really are no reverves to
>> - * use. This situation is indicated if chg != 0.
>> - */
>> - if (chg)
>> - return 0;
>> - else
>> - return 1;
>> - }
>> + /* Shared mappings always use reserves */
>> + if (vma->vm_flags & VM_MAYSHARE)
>> + return 1;
>
> This change completely reverts 5/9, so can you omit 5/9?
That was a mistake. This change should not be in the patch. The
change from 5/9 needs to remain. Sorry for confusion. Thanks for
catching.
>> /*
>> * Only the process that called mmap() has reserves for
>> @@ -1601,6 +1591,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>> struct hstate *h = hstate_vma(vma);
>> struct page *page;
>> long chg, commit;
>> + long gbl_chg;
>> int ret, idx;
>> struct hugetlb_cgroup *h_cg;
>>
>> @@ -1608,24 +1599,39 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>> /*
>> * Processes that did not create the mapping will have no
>> * reserves and will not have accounted against subpool
>> - * limit. Check that the subpool limit can be made before
>> - * satisfying the allocation MAP_NORESERVE mappings may also
>> - * need pages and subpool limit allocated allocated if no reserve
>> - * mapping overlaps.
>> + * limit. Check that the subpool limit will not be exceeded
>> + * before performing the allocation. Allocations for
>> + * MAP_NORESERVE mappings also need to be checked against
>> + * any subpool limit.
>> + *
>> + * NOTE: Shared mappings with holes punched via fallocate
>> + * may still have reservations, even without entries in the
>> + * reserve map as indicated by vma_needs_reservation. This
>> + * would be the case if hugepage_subpool_get_pages returns
>> + * zero to indicate no changes to the global reservation count
>> + * are necessary. In this case, pass the output of
>> + * hugepage_subpool_get_pages (zero) to dequeue_huge_page_vma
>> + * so that the page is not counted against the global limit.
>> + * For MAP_NORESERVE mappings always pass the output of
>> + * vma_needs_reservation. For race detection and error cleanup
>> + * use output of vma_needs_reservation as well.
>> */
>> - chg = vma_needs_reservation(h, vma, addr);
>> + chg = gbl_chg = vma_needs_reservation(h, vma, addr);
>> if (chg < 0)
>> return ERR_PTR(-ENOMEM);
>> - if (chg || avoid_reserve)
>> - if (hugepage_subpool_get_pages(spool, 1) < 0)
>> + if (chg || avoid_reserve) {
>> + gbl_chg = hugepage_subpool_get_pages(spool, 1);
>> + if (gbl_chg < 0)
>> return ERR_PTR(-ENOSPC);
>> + }
>>
>> ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
>> if (ret)
>> goto out_subpool_put;
>>
>> spin_lock(&hugetlb_lock);
>> - page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, chg);
>> + page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve,
>> + avoid_reserve ? chg : gbl_chg);
>
> You use chg or gbl_chg depending on avoid_reserve here, and below this line
> there's code like below
>
> commit = vma_commit_reservation(h, vma, addr);
> if (unlikely(chg > commit)) {
> ...
> }
>
> This also need to be changed to use chg or gbl_chg depending on avoid_reserve?
It should use chg only. I attempted to address this at the end of the
Note above.
" For race detection and error cleanup use output of vma_needs_reservation
as well."
I will add more comments to make it clear.
> # I feel that this reserve-handling code in alloc_huge_page() is too complicated
> # and hard to understand, so some cleanup like separating reserve parts into
> # other new routine(s) might be helpful...
I agree, let me think about ways to split this up and hopefully make
it easier to understand.
--
Mike Kravetz
>
> Thanks,
> Naoya Horiguchi
>
next prev parent reply other threads:[~2015-06-15 18:43 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-11 21:01 [RFC v4 PATCH 0/9] hugetlbfs: add fallocate support Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
2015-06-11 21:01 ` [RFC v4 PATCH 1/9] mm/hugetlb: add region_del() to delete a specific range of entries Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
2015-06-11 21:01 ` [RFC v4 PATCH 2/9] mm/hugetlb: expose hugetlb fault mutex for use by fallocate Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
2015-06-11 22:46 ` Davidlohr Bueso
2015-06-11 22:46 ` Davidlohr Bueso
2015-06-11 23:09 ` Mike Kravetz
2015-06-11 23:09 ` Mike Kravetz
2015-06-17 22:05 ` Mike Kravetz
2015-06-17 22:05 ` Mike Kravetz
2015-06-11 21:01 ` [RFC v4 PATCH 3/9] hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
2015-06-11 21:01 ` [RFC v4 PATCH 4/9] hugetlbfs: truncate_hugepages() takes a range of pages Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
2015-06-11 21:01 ` [RFC v4 PATCH 5/9] mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
2015-06-11 21:01 ` [RFC v4 PATCH 6/9] mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
2015-06-15 6:34 ` Naoya Horiguchi
2015-06-15 6:34 ` Naoya Horiguchi
2015-06-15 18:42 ` Mike Kravetz [this message]
2015-06-15 18:42 ` Mike Kravetz
2015-06-11 21:01 ` [RFC v4 PATCH 7/9] hugetlbfs: New huge_add_to_page_cache helper routine Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
2015-06-11 21:01 ` [RFC v4 PATCH 8/9] hugetlbfs: add hugetlbfs_fallocate() Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
2015-06-11 21:01 ` [RFC v4 PATCH 9/9] mm: madvise allow remove operation for hugetlbfs Mike Kravetz
2015-06-11 21:01 ` Mike Kravetz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=557F1C91.9080904@oracle.com \
--to=mike.kravetz@oracle.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave@stgolabs.net \
--cc=hch@infradead.org \
--cc=hillf.zj@alibaba-inc.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.