From: Mike Kravetz <mike.kravetz@oracle.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
Christoph Hellwig <hch@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>
Subject: Re: [PATCH v3 01/10] mm/hugetlb: add cache of descriptors to resv_map for region_add
Date: Mon, 20 Jul 2015 10:50:12 -0700 [thread overview]
Message-ID: <55AD34D4.2020804@oracle.com> (raw)
In-Reply-To: <20150717090213.GB32135@hori1.linux.bs1.fc.nec.co.jp>
On 07/17/2015 02:02 AM, Naoya Horiguchi wrote:
> On Sun, Jul 12, 2015 at 09:20:59PM -0700, Mike Kravetz wrote:
>> fallocate hole punch will want to remove a specific range of
>> pages. When pages are removed, their associated entries in
>> the region/reserve map will also be removed. This will break
>> an assumption in the region_chg/region_add calling sequence.
>> If a new region descriptor must be allocated, it is done as
>> part of the region_chg processing. In this way, region_add
>> can not fail because it does not need to attempt an allocation.
>>
>> To prepare for fallocate hole punch, create a "cache" of
>> descriptors that can be used by region_add if necessary.
>> region_chg will ensure there are sufficient entries in the
>> cache. It will be necessary to track the number of in progress
>> add operations to know a sufficient number of descriptors
>> reside in the cache. A new routine region_abort is added to
>> adjust this in progress count when add operations are aborted.
>> vma_abort_reservation is also added for callers creating
>> reservations with vma_needs_reservation/vma_commit_reservation.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> include/linux/hugetlb.h | 3 +
>> mm/hugetlb.c | 169 ++++++++++++++++++++++++++++++++++++++++++------
>> 2 files changed, 153 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index d891f94..667cf44 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -35,6 +35,9 @@ struct resv_map {
>> struct kref refs;
>> spinlock_t lock;
>> struct list_head regions;
>> + long adds_in_progress;
>> + struct list_head rgn_cache;
>> + long rgn_cache_count;
>> };
>> extern struct resv_map *resv_map_alloc(void);
>> void resv_map_release(struct kref *ref);
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index a8c3087..241d16d 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -240,11 +240,14 @@ struct file_region {
>>
>> /*
>> * Add the huge page range represented by [f, t) to the reserve
>> - * map. Existing regions will be expanded to accommodate the
>> - * specified range. We know only existing regions need to be
>> - * expanded, because region_add is only called after region_chg
>> - * with the same range. If a new file_region structure must
>> - * be allocated, it is done in region_chg.
>> + * map. In the normal case, existing regions will be expanded
>> + * to accommodate the specified range. Sufficient regions should
>> + * exist for expansion due to the previous call to region_chg
>> + * with the same range. However, it is possible that region_del
>> + * could have been called after region_chg and modifed the map
>> + * in such a way that no region exists to be expanded. In this
>> + * case, pull a region descriptor from the cache associated with
>> + * the map and use that for the new range.
>> *
>> * Return the number of new huge pages added to the map. This
>> * number is greater than or equal to zero.
>> @@ -261,6 +264,27 @@ static long region_add(struct resv_map *resv, long f, long t)
>> if (f <= rg->to)
>> break;
>>
>> + if (&rg->link == head || t < rg->from) {
>> + /*
>> + * No region exists which can be expanded to include the
>> + * specified range. Pull a region descriptor from the
>> + * cache, and use it for this range.
>> + */
>
> This comment mentions this if-block, not the VM_BUG_ON below, so it had
> better be put the above if-line.
OK, I will move and make a minor modification to the comment.
>
>> + VM_BUG_ON(!resv->rgn_cache_count);
>
> resv->rgn_cache_count <= 0 might be safer.
Sure.
> ...
>> @@ -3236,11 +3360,14 @@ retry:
>> * any allocations necessary to record that reservation occur outside
>> * the spinlock.
>> */
>> - if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
>> + if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
>> if (vma_needs_reservation(h, vma, address) < 0) {
>> ret = VM_FAULT_OOM;
>> goto backout_unlocked;
>> }
>> + /* Just decrements count, does not deallocate */
>> + vma_abort_reservation(h, vma, address);
>> + }
>
> This is not "abort reservation" operation, but you use "abort reservation"
> routine, which might confusing and makes future maintenance hard. I think
> this should be done in a simplified variant of vma_commit_reservation()
> (maybe just an alias of your vma_abort_reservation()) or fast path in
> vma_commit_reservation().
I am struggling a bit with the names of these routines. The
routines in question are:
vma_needs_reservation - This is a wrapper for region_chg(), so the
return value is the number of regions needed for the page.
Since there is only one page, the routine effectively
becomes a boolean. Hence the name "needs".
vma_commit_reservation - This is a wrapper for region_add(). It
must be called after a prior call to vma_needs_reservation
and after actual allocation of the page.
We need a way to handle the case where vma_needs_reservation has
been called, but the page allocation is not successful. I chose
the name vma_abort_reservation, but as noted (even in my comments)
it is not an actual abort.
I am not sure if you are suggesting vma_commit_reservation() should
handle this as a special case. I think a separately named routine which
indicates and end of the reservation/allocation process would be
easier to understand.
What about changing the name vma_abort_reservation() to
vma_end_reservation()? This would indicate that the reservation/
allocation process is ended.
> Thanks,
> Naoya Horiguchi
Thank you for your reviews.
--
Mike Kravetz
>
>>
>> ptl = huge_pte_lockptr(h, mm, ptep);
>> spin_lock(ptl);
>> @@ -3387,6 +3514,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>> ret = VM_FAULT_OOM;
>> goto out_mutex;
>> }
>> + /* Just decrements count, does not deallocate */
>> + vma_abort_reservation(h, vma, address);
>>
>> if (!(vma->vm_flags & VM_MAYSHARE))
>> pagecache_page = hugetlbfs_pagecache_page(h,
>> @@ -3726,6 +3855,8 @@ int hugetlb_reserve_pages(struct inode *inode,
>> }
>> return 0;
>> out_err:
>> + if (!vma || vma->vm_flags & VM_MAYSHARE)
>> + region_abort(resv_map, from, to);
>> if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>> kref_put(&resv_map->refs, resv_map_release);
>> return ret;
>> --
>> 2.1.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
Christoph Hellwig <hch@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>
Subject: Re: [PATCH v3 01/10] mm/hugetlb: add cache of descriptors to resv_map for region_add
Date: Mon, 20 Jul 2015 10:50:12 -0700 [thread overview]
Message-ID: <55AD34D4.2020804@oracle.com> (raw)
In-Reply-To: <20150717090213.GB32135@hori1.linux.bs1.fc.nec.co.jp>
On 07/17/2015 02:02 AM, Naoya Horiguchi wrote:
> On Sun, Jul 12, 2015 at 09:20:59PM -0700, Mike Kravetz wrote:
>> fallocate hole punch will want to remove a specific range of
>> pages. When pages are removed, their associated entries in
>> the region/reserve map will also be removed. This will break
>> an assumption in the region_chg/region_add calling sequence.
>> If a new region descriptor must be allocated, it is done as
>> part of the region_chg processing. In this way, region_add
>> can not fail because it does not need to attempt an allocation.
>>
>> To prepare for fallocate hole punch, create a "cache" of
>> descriptors that can be used by region_add if necessary.
>> region_chg will ensure there are sufficient entries in the
>> cache. It will be necessary to track the number of in progress
>> add operations to know a sufficient number of descriptors
>> reside in the cache. A new routine region_abort is added to
>> adjust this in progress count when add operations are aborted.
>> vma_abort_reservation is also added for callers creating
>> reservations with vma_needs_reservation/vma_commit_reservation.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> include/linux/hugetlb.h | 3 +
>> mm/hugetlb.c | 169 ++++++++++++++++++++++++++++++++++++++++++------
>> 2 files changed, 153 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index d891f94..667cf44 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -35,6 +35,9 @@ struct resv_map {
>> struct kref refs;
>> spinlock_t lock;
>> struct list_head regions;
>> + long adds_in_progress;
>> + struct list_head rgn_cache;
>> + long rgn_cache_count;
>> };
>> extern struct resv_map *resv_map_alloc(void);
>> void resv_map_release(struct kref *ref);
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index a8c3087..241d16d 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -240,11 +240,14 @@ struct file_region {
>>
>> /*
>> * Add the huge page range represented by [f, t) to the reserve
>> - * map. Existing regions will be expanded to accommodate the
>> - * specified range. We know only existing regions need to be
>> - * expanded, because region_add is only called after region_chg
>> - * with the same range. If a new file_region structure must
>> - * be allocated, it is done in region_chg.
>> + * map. In the normal case, existing regions will be expanded
>> + * to accommodate the specified range. Sufficient regions should
>> + * exist for expansion due to the previous call to region_chg
>> + * with the same range. However, it is possible that region_del
>> + * could have been called after region_chg and modifed the map
>> + * in such a way that no region exists to be expanded. In this
>> + * case, pull a region descriptor from the cache associated with
>> + * the map and use that for the new range.
>> *
>> * Return the number of new huge pages added to the map. This
>> * number is greater than or equal to zero.
>> @@ -261,6 +264,27 @@ static long region_add(struct resv_map *resv, long f, long t)
>> if (f <= rg->to)
>> break;
>>
>> + if (&rg->link == head || t < rg->from) {
>> + /*
>> + * No region exists which can be expanded to include the
>> + * specified range. Pull a region descriptor from the
>> + * cache, and use it for this range.
>> + */
>
> This comment mentions this if-block, not the VM_BUG_ON below, so it had
> better be put the above if-line.
OK, I will move and make a minor modification to the comment.
>
>> + VM_BUG_ON(!resv->rgn_cache_count);
>
> resv->rgn_cache_count <= 0 might be safer.
Sure.
> ...
>> @@ -3236,11 +3360,14 @@ retry:
>> * any allocations necessary to record that reservation occur outside
>> * the spinlock.
>> */
>> - if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
>> + if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
>> if (vma_needs_reservation(h, vma, address) < 0) {
>> ret = VM_FAULT_OOM;
>> goto backout_unlocked;
>> }
>> + /* Just decrements count, does not deallocate */
>> + vma_abort_reservation(h, vma, address);
>> + }
>
> This is not "abort reservation" operation, but you use "abort reservation"
> routine, which might confusing and makes future maintenance hard. I think
> this should be done in a simplified variant of vma_commit_reservation()
> (maybe just an alias of your vma_abort_reservation()) or fast path in
> vma_commit_reservation().
I am struggling a bit with the names of these routines. The
routines in question are:
vma_needs_reservation - This is a wrapper for region_chg(), so the
return value is the number of regions needed for the page.
Since there is only one page, the routine effectively
becomes a boolean. Hence the name "needs".
vma_commit_reservation - This is a wrapper for region_add(). It
must be called after a prior call to vma_needs_reservation
and after actual allocation of the page.
We need a way to handle the case where vma_needs_reservation has
been called, but the page allocation is not successful. I chose
the name vma_abort_reservation, but as noted (even in my comments)
it is not an actual abort.
I am not sure if you are suggesting vma_commit_reservation() should
handle this as a special case. I think a separately named routine which
indicates and end of the reservation/allocation process would be
easier to understand.
What about changing the name vma_abort_reservation() to
vma_end_reservation()? This would indicate that the reservation/
allocation process is ended.
> Thanks,
> Naoya Horiguchi
Thank you for your reviews.
--
Mike Kravetz
>
>>
>> ptl = huge_pte_lockptr(h, mm, ptep);
>> spin_lock(ptl);
>> @@ -3387,6 +3514,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>> ret = VM_FAULT_OOM;
>> goto out_mutex;
>> }
>> + /* Just decrements count, does not deallocate */
>> + vma_abort_reservation(h, vma, address);
>>
>> if (!(vma->vm_flags & VM_MAYSHARE))
>> pagecache_page = hugetlbfs_pagecache_page(h,
>> @@ -3726,6 +3855,8 @@ int hugetlb_reserve_pages(struct inode *inode,
>> }
>> return 0;
>> out_err:
>> + if (!vma || vma->vm_flags & VM_MAYSHARE)
>> + region_abort(resv_map, from, to);
>> if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>> kref_put(&resv_map->refs, resv_map_release);
>> return ret;
>> --
>> 2.1.0
next prev parent reply other threads:[~2015-07-20 17:50 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-13 4:20 [PATCH v3 00/10] hugetlbfs: add fallocate support Mike Kravetz
2015-07-13 4:20 ` Mike Kravetz
2015-07-13 4:20 ` [PATCH v3 01/10] mm/hugetlb: add cache of descriptors to resv_map for region_add Mike Kravetz
2015-07-13 4:20 ` Mike Kravetz
2015-07-17 9:02 ` Naoya Horiguchi
2015-07-17 9:02 ` Naoya Horiguchi
2015-07-20 17:50 ` Mike Kravetz [this message]
2015-07-20 17:50 ` Mike Kravetz
2015-07-21 4:16 ` Naoya Horiguchi
2015-07-21 4:16 ` Naoya Horiguchi
2015-07-13 4:21 ` [PATCH v3 02/10] mm/hugetlb: add region_del() to delete a specific range of entries Mike Kravetz
2015-07-13 4:21 ` Mike Kravetz
2015-07-13 4:21 ` [PATCH v3 03/10] mm/hugetlb: expose hugetlb fault mutex for use by fallocate Mike Kravetz
2015-07-13 4:21 ` Mike Kravetz
2015-07-13 4:21 ` [PATCH v3 04/10] hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete Mike Kravetz
2015-07-13 4:21 ` Mike Kravetz
2015-07-13 4:21 ` [PATCH v3 05/10] hugetlbfs: truncate_hugepages() takes a range of pages Mike Kravetz
2015-07-13 4:21 ` Mike Kravetz
2015-07-13 4:21 ` [PATCH v3 06/10] mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch Mike Kravetz
2015-07-13 4:21 ` Mike Kravetz
2015-07-13 4:21 ` [PATCH v3 07/10] mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate Mike Kravetz
2015-07-13 4:21 ` Mike Kravetz
2015-07-13 4:21 ` [PATCH v3 08/10] hugetlbfs: New huge_add_to_page_cache helper routine Mike Kravetz
2015-07-13 4:21 ` Mike Kravetz
2015-07-13 4:21 ` [PATCH v3 09/10] hugetlbfs: add hugetlbfs_fallocate() Mike Kravetz
2015-07-13 4:21 ` Mike Kravetz
2015-07-13 4:21 ` [PATCH v3 10/10] mm: madvise allow remove operation for hugetlbfs Mike Kravetz
2015-07-13 4:21 ` Mike Kravetz
2015-07-17 9:01 ` [PATCH v3 00/10] hugetlbfs: add fallocate support Naoya Horiguchi
2015-07-17 9:01 ` Naoya Horiguchi
2015-07-21 3:08 ` Hillf Danton
2015-07-21 3:08 ` Hillf Danton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55AD34D4.2020804@oracle.com \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave@stgolabs.net \
--cc=hch@infradead.org \
--cc=hillf.zj@alibaba-inc.com \
--cc=hughd@google.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.