From: Mike Kravetz <mike.kravetz@oracle.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
David Rientjes <rientjes@google.com>,
Luiz Capitulino <lcapitulino@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 1/2] mm/hugetlb: compute/return the number of regions added by region_add()
Date: Tue, 26 May 2015 09:30:23 -0700 [thread overview]
Message-ID: <55649F9F.9090401@oracle.com> (raw)
In-Reply-To: <20150525061922.GA3751@hori1.linux.bs1.fc.nec.co.jp>
On 05/24/2015 11:19 PM, Naoya Horiguchi wrote:
> On Fri, May 22, 2015 at 08:55:03PM -0700, Mike Kravetz wrote:
>> Modify region_add() to keep track of regions(pages) added to the
>> reserve map and return this value. The return value can be
>> compared to the return value of region_chg() to determine if the
>> map was modified between calls.
>>
>> Add documentation to the reserve/region map routines.
>>
>> Make vma_commit_reservation() also pass along the return value of
>> region_add(). In the normal case, we want vma_commit_reservation
>> to return the same value as the preceding call to vma_needs_reservation.
>> Create a common __vma_reservation_common routine to help keep the
>> special case return values in sync
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> mm/hugetlb.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++-------------
>> 1 file changed, 94 insertions(+), 26 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 54f129d..3855889 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -212,8 +212,16 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
>> * Region tracking -- allows tracking of reservations and instantiated pages
>> * across the pages in a mapping.
>> *
>> - * The region data structures are embedded into a resv_map and
>> - * protected by a resv_map's lock
>> + * The region data structures are embedded into a resv_map and protected
>> + * by a resv_map's lock. The set of regions within the resv_map represent
>> + * reservations for huge pages, or huge pages that have already been
>> + * instantiated within the map.
>
>> The from and to elements are huge page
>> + * indicies into the associated mapping. from indicates the starting index
>> + * of the region. to represents the first index past the end of the region.
>> + * For example, a file region structure with from == 0 and to == 4 represents
>> + * four huge pages in a mapping. It is important to note that the to element
>> + * represents the first element past the end of the region. This is used in
>> + * arithmetic as 4(to) - 0(from) = 4 huge pages in the region.
>
> How about just saying "[from, to)", which implies "from" is inclusive and "to"
> is exclusive. I hope this mathematical notation is widely accepted among kernel
> developers.
OK, I'll add the mathematical notation. But, I think the more explicit
description and example might help those who would not recognize the
notation.
>> */
>> struct file_region {
>> struct list_head link;
>> @@ -221,10 +229,23 @@ struct file_region {
>> long to;
>> };
>>
>> +/*
>> + * Add the huge page range represented by indicies f (from)
>> + * and t (to) to the reserve map. Existing regions will be
>> + * expanded to accommodate the specified range. We know only
>> + * existing regions need to be expanded, because region_add
>> + * is only called after region_chg(with the same range). If
>> + * a new file_region structure must be allocated, it is done
>> + * in region_chg.
>> + *
>> + * Return the number of new huge pages added to the map. This
>> + * number is greater than or equal to zero.
>> + */
>> static long region_add(struct resv_map *resv, long f, long t)
>> {
>> struct list_head *head = &resv->regions;
>> struct file_region *rg, *nrg, *trg;
>> + long add = 0;
>>
>> spin_lock(&resv->lock);
>> /* Locate the region we are either in or before. */
>> @@ -250,16 +271,44 @@ static long region_add(struct resv_map *resv, long f, long t)
>> if (rg->to > t)
>> t = rg->to;
>> if (rg != nrg) {
>> + /* Decrement return value by the deleted range.
>> + * Another range will span this area so that by
>> + * end of routine add will be >= zero
>> + */
>> + add -= (rg->to - rg->from);
>
> I can't say how, but if file_region data were broken for some reason (mainly
> due to bug,) this could return negative value, so how about asserting add >=0
> with VM_BUG_ON() at the end of this function?
>
Sure, that should not be a problem.
>> list_del(&rg->link);
>> kfree(rg);
>> }
>> }
>> +
>> + add += (nrg->from - f); /* Added to beginning of region */
>> nrg->from = f;
>> + add += t - nrg->to; /* Added to end of region */
>> nrg->to = t;
>> +
>> spin_unlock(&resv->lock);
>> - return 0;
>> + return add;
>> }
>>
>> +/*
>> + * Examine the existing reserve map and determine how many
>> + * huge pages in the specified range (f, t) are NOT currently
>
> "[f, t)" would be better.
>
Yes, my plan is to use the mathematical notation throughout with a
more detailed explanation the first time it is used.
>> + * represented. This routine is called before a subsequent
>> + * call to region_add that will actually modify the reserve
>> + * map to add the specified range (f, t). region_chg does
>> + * not change the number of huge pages represented by the
>> + * map. However, if the existing regions in the map can not
>> + * be expanded to represent the new range, a new file_region
>> + * structure is added to the map as a placeholder. This is
>> + * so that the subsequent region_add call will have all
>> + * regions it needs and will not fail.
>> + *
>> + * Returns the number of huge pages that need to be added
>> + * to the existing reservation map for the range (f, t).
>> + * This number is greater or equal to zero. -ENOMEM is
>> + * returned if a new file_region structure can not be
>> + * allocated.
>> + */
>> static long region_chg(struct resv_map *resv, long f, long t)
>> {
>> struct list_head *head = &resv->regions;
>> @@ -326,6 +375,11 @@ out_nrg:
>> return chg;
>> }
>>
>> +/*
>> + * Truncate the reserve map at index 'end'. Modify/truncate any
>> + * region which contains end. Delete any regions past end.
>> + * Return the number of huge pages removed from the map.
>> + */
>> static long region_truncate(struct resv_map *resv, long end)
>> {
>> struct list_head *head = &resv->regions;
>> @@ -361,6 +415,10 @@ out:
>> return chg;
>> }
>>
>> +/*
>> + * Count and return the number of huge pages in the reserve map
>> + * that intersect with the range (f, t).
>> + */
>> static long region_count(struct resv_map *resv, long f, long t)
>> {
>> struct list_head *head = &resv->regions;
>> @@ -1424,46 +1482,56 @@ static void return_unused_surplus_pages(struct hstate *h,
>> }
>>
>> /*
>> - * Determine if the huge page at addr within the vma has an associated
>> - * reservation. Where it does not we will need to logically increase
>> - * reservation and actually increase subpool usage before an allocation
>> - * can occur. Where any new reservation would be required the
>> - * reservation change is prepared, but not committed. Once the page
>> - * has been allocated from the subpool and instantiated the change should
>> - * be committed via vma_commit_reservation. No action is required on
>> - * failure.
>> + * vma_needs_reservation and vma_commit_reservation are used by the huge
>> + * page allocation routines to manage reservations.
>> + *
>> + * vma_needs_reservation is called to determine if the huge page at addr
>> + * within the vma has an associated reservation. If a reservation is
>> + * needed, the value 1 is returned. The caller is then responsible for
>> + * managing the global reservation and subpool usage counts. After
>> + * the huge page has been allocated, vma_commit_reservation is called
>> + * to add the page to the reservation map.
>> + *
>> + * In the normal case, vma_commit_reservation should return the same value
>> + * as the preceding vma_needs_reservation call. The only time this is
>> + * not the case is if a reserve map was changed between calls. It is the
>> + * responsibility of the caller to notice the difference and take appropriate
>> + * action.
>> */
>> -static long vma_needs_reservation(struct hstate *h,
>> - struct vm_area_struct *vma, unsigned long addr)
>> +static long __vma_reservation_common(struct hstate *h,
>> + struct vm_area_struct *vma, unsigned long addr,
>> + bool needs)
>> {
>> struct resv_map *resv;
>> pgoff_t idx;
>> - long chg;
>> + long ret;
>>
>> resv = vma_resv_map(vma);
>> if (!resv)
>> return 1;
>>
>> idx = vma_hugecache_offset(h, vma, addr);
>> - chg = region_chg(resv, idx, idx + 1);
>> + if (needs)
>> + ret = region_chg(resv, idx, idx + 1);
>> + else
>> + ret = region_add(resv, idx, idx + 1);
>
> This code sharing is OK, but the name "needs" looks a bit unclear to me.
> I feel that it's more readable if we name "commit" (or "commits") to the bool
> parameter and call region_add() if "commit" is true.
>
Agree. And Davidlor suggested renaming "needs" in the routine name
to prepare. If renamed prepare, I think it would more clear.
>>
>> if (vma->vm_flags & VM_MAYSHARE)
>> - return chg;
>> + return ret;
>> else
>> - return chg < 0 ? chg : 0;
>> + return ret < 0 ? ret : 0;
>> }
>> -static void vma_commit_reservation(struct hstate *h,
>> +
>> +static long vma_needs_reservation(struct hstate *h,
>> struct vm_area_struct *vma, unsigned long addr)
>> {
>> - struct resv_map *resv;
>> - pgoff_t idx;
>> -
>> - resv = vma_resv_map(vma);
>> - if (!resv)
>> - return;
>> + return __vma_reservation_common(h, vma, addr, (bool)1);
>
> You can simply use literal "true"?
Yes.
Thank you for the review and comments.
--
Mike Kravetz
>
> Thanks,
> Naoya Horiguchi
>
>> +}
>>
>> - idx = vma_hugecache_offset(h, vma, addr);
>> - region_add(resv, idx, idx + 1);
>> +static long vma_commit_reservation(struct hstate *h,
>> + struct vm_area_struct *vma, unsigned long addr)
>> +{
>> + return __vma_reservation_common(h, vma, addr, (bool)0);
>> }
>>
>> static struct page *alloc_huge_page(struct vm_area_struct *vma,
>> --
>> 2.1.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
David Rientjes <rientjes@google.com>,
Luiz Capitulino <lcapitulino@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 1/2] mm/hugetlb: compute/return the number of regions added by region_add()
Date: Tue, 26 May 2015 09:30:23 -0700 [thread overview]
Message-ID: <55649F9F.9090401@oracle.com> (raw)
In-Reply-To: <20150525061922.GA3751@hori1.linux.bs1.fc.nec.co.jp>
On 05/24/2015 11:19 PM, Naoya Horiguchi wrote:
> On Fri, May 22, 2015 at 08:55:03PM -0700, Mike Kravetz wrote:
>> Modify region_add() to keep track of regions(pages) added to the
>> reserve map and return this value. The return value can be
>> compared to the return value of region_chg() to determine if the
>> map was modified between calls.
>>
>> Add documentation to the reserve/region map routines.
>>
>> Make vma_commit_reservation() also pass along the return value of
>> region_add(). In the normal case, we want vma_commit_reservation
>> to return the same value as the preceding call to vma_needs_reservation.
>> Create a common __vma_reservation_common routine to help keep the
>> special case return values in sync
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> mm/hugetlb.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++-------------
>> 1 file changed, 94 insertions(+), 26 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 54f129d..3855889 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -212,8 +212,16 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
>> * Region tracking -- allows tracking of reservations and instantiated pages
>> * across the pages in a mapping.
>> *
>> - * The region data structures are embedded into a resv_map and
>> - * protected by a resv_map's lock
>> + * The region data structures are embedded into a resv_map and protected
>> + * by a resv_map's lock. The set of regions within the resv_map represent
>> + * reservations for huge pages, or huge pages that have already been
>> + * instantiated within the map.
>
>> The from and to elements are huge page
>> + * indicies into the associated mapping. from indicates the starting index
>> + * of the region. to represents the first index past the end of the region.
>> + * For example, a file region structure with from == 0 and to == 4 represents
>> + * four huge pages in a mapping. It is important to note that the to element
>> + * represents the first element past the end of the region. This is used in
>> + * arithmetic as 4(to) - 0(from) = 4 huge pages in the region.
>
> How about just saying "[from, to)", which implies "from" is inclusive and "to"
> is exclusive. I hope this mathematical notation is widely accepted among kernel
> developers.
OK, I'll add the mathematical notation. But, I think the more explicit
description and example might help those who would not recognize the
notation.
>> */
>> struct file_region {
>> struct list_head link;
>> @@ -221,10 +229,23 @@ struct file_region {
>> long to;
>> };
>>
>> +/*
>> + * Add the huge page range represented by indicies f (from)
>> + * and t (to) to the reserve map. Existing regions will be
>> + * expanded to accommodate the specified range. We know only
>> + * existing regions need to be expanded, because region_add
>> + * is only called after region_chg(with the same range). If
>> + * a new file_region structure must be allocated, it is done
>> + * in region_chg.
>> + *
>> + * Return the number of new huge pages added to the map. This
>> + * number is greater than or equal to zero.
>> + */
>> static long region_add(struct resv_map *resv, long f, long t)
>> {
>> struct list_head *head = &resv->regions;
>> struct file_region *rg, *nrg, *trg;
>> + long add = 0;
>>
>> spin_lock(&resv->lock);
>> /* Locate the region we are either in or before. */
>> @@ -250,16 +271,44 @@ static long region_add(struct resv_map *resv, long f, long t)
>> if (rg->to > t)
>> t = rg->to;
>> if (rg != nrg) {
>> + /* Decrement return value by the deleted range.
>> + * Another range will span this area so that by
>> + * end of routine add will be >= zero
>> + */
>> + add -= (rg->to - rg->from);
>
> I can't say how, but if file_region data were broken for some reason (mainly
> due to bug,) this could return negative value, so how about asserting add >=0
> with VM_BUG_ON() at the end of this function?
>
Sure, that should not be a problem.
>> list_del(&rg->link);
>> kfree(rg);
>> }
>> }
>> +
>> + add += (nrg->from - f); /* Added to beginning of region */
>> nrg->from = f;
>> + add += t - nrg->to; /* Added to end of region */
>> nrg->to = t;
>> +
>> spin_unlock(&resv->lock);
>> - return 0;
>> + return add;
>> }
>>
>> +/*
>> + * Examine the existing reserve map and determine how many
>> + * huge pages in the specified range (f, t) are NOT currently
>
> "[f, t)" would be better.
>
Yes, my plan is to use the mathematical notation throughout with a
more detailed explanation the first time it is used.
>> + * represented. This routine is called before a subsequent
>> + * call to region_add that will actually modify the reserve
>> + * map to add the specified range (f, t). region_chg does
>> + * not change the number of huge pages represented by the
>> + * map. However, if the existing regions in the map can not
>> + * be expanded to represent the new range, a new file_region
>> + * structure is added to the map as a placeholder. This is
>> + * so that the subsequent region_add call will have all
>> + * regions it needs and will not fail.
>> + *
>> + * Returns the number of huge pages that need to be added
>> + * to the existing reservation map for the range (f, t).
>> + * This number is greater or equal to zero. -ENOMEM is
>> + * returned if a new file_region structure can not be
>> + * allocated.
>> + */
>> static long region_chg(struct resv_map *resv, long f, long t)
>> {
>> struct list_head *head = &resv->regions;
>> @@ -326,6 +375,11 @@ out_nrg:
>> return chg;
>> }
>>
>> +/*
>> + * Truncate the reserve map at index 'end'. Modify/truncate any
>> + * region which contains end. Delete any regions past end.
>> + * Return the number of huge pages removed from the map.
>> + */
>> static long region_truncate(struct resv_map *resv, long end)
>> {
>> struct list_head *head = &resv->regions;
>> @@ -361,6 +415,10 @@ out:
>> return chg;
>> }
>>
>> +/*
>> + * Count and return the number of huge pages in the reserve map
>> + * that intersect with the range (f, t).
>> + */
>> static long region_count(struct resv_map *resv, long f, long t)
>> {
>> struct list_head *head = &resv->regions;
>> @@ -1424,46 +1482,56 @@ static void return_unused_surplus_pages(struct hstate *h,
>> }
>>
>> /*
>> - * Determine if the huge page at addr within the vma has an associated
>> - * reservation. Where it does not we will need to logically increase
>> - * reservation and actually increase subpool usage before an allocation
>> - * can occur. Where any new reservation would be required the
>> - * reservation change is prepared, but not committed. Once the page
>> - * has been allocated from the subpool and instantiated the change should
>> - * be committed via vma_commit_reservation. No action is required on
>> - * failure.
>> + * vma_needs_reservation and vma_commit_reservation are used by the huge
>> + * page allocation routines to manage reservations.
>> + *
>> + * vma_needs_reservation is called to determine if the huge page at addr
>> + * within the vma has an associated reservation. If a reservation is
>> + * needed, the value 1 is returned. The caller is then responsible for
>> + * managing the global reservation and subpool usage counts. After
>> + * the huge page has been allocated, vma_commit_reservation is called
>> + * to add the page to the reservation map.
>> + *
>> + * In the normal case, vma_commit_reservation should return the same value
>> + * as the preceding vma_needs_reservation call. The only time this is
>> + * not the case is if a reserve map was changed between calls. It is the
>> + * responsibility of the caller to notice the difference and take appropriate
>> + * action.
>> */
>> -static long vma_needs_reservation(struct hstate *h,
>> - struct vm_area_struct *vma, unsigned long addr)
>> +static long __vma_reservation_common(struct hstate *h,
>> + struct vm_area_struct *vma, unsigned long addr,
>> + bool needs)
>> {
>> struct resv_map *resv;
>> pgoff_t idx;
>> - long chg;
>> + long ret;
>>
>> resv = vma_resv_map(vma);
>> if (!resv)
>> return 1;
>>
>> idx = vma_hugecache_offset(h, vma, addr);
>> - chg = region_chg(resv, idx, idx + 1);
>> + if (needs)
>> + ret = region_chg(resv, idx, idx + 1);
>> + else
>> + ret = region_add(resv, idx, idx + 1);
>
> This code sharing is OK, but the name "needs" looks a bit unclear to me.
> I feel that it's more readable if we name "commit" (or "commits") to the bool
> parameter and call region_add() if "commit" is true.
>
Agree. And Davidlor suggested renaming "needs" in the routine name
to prepare. If renamed prepare, I think it would more clear.
>>
>> if (vma->vm_flags & VM_MAYSHARE)
>> - return chg;
>> + return ret;
>> else
>> - return chg < 0 ? chg : 0;
>> + return ret < 0 ? ret : 0;
>> }
>> -static void vma_commit_reservation(struct hstate *h,
>> +
>> +static long vma_needs_reservation(struct hstate *h,
>> struct vm_area_struct *vma, unsigned long addr)
>> {
>> - struct resv_map *resv;
>> - pgoff_t idx;
>> -
>> - resv = vma_resv_map(vma);
>> - if (!resv)
>> - return;
>> + return __vma_reservation_common(h, vma, addr, (bool)1);
>
> You can simply use literal "true"?
Yes.
Thank you for the review and comments.
--
Mike Kravetz
>
> Thanks,
> Naoya Horiguchi
>
>> +}
>>
>> - idx = vma_hugecache_offset(h, vma, addr);
>> - region_add(resv, idx, idx + 1);
>> +static long vma_commit_reservation(struct hstate *h,
>> + struct vm_area_struct *vma, unsigned long addr)
>> +{
>> + return __vma_reservation_common(h, vma, addr, (bool)0);
>> }
>>
>> static struct page *alloc_huge_page(struct vm_area_struct *vma,
>> --
>> 2.1.0
next prev parent reply other threads:[~2015-05-26 16:40 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-23 3:55 [PATCH v2 0/2] alloc_huge_page/hugetlb_reserve_pages race Mike Kravetz
2015-05-23 3:55 ` Mike Kravetz
2015-05-23 3:55 ` [PATCH v2 1/2] mm/hugetlb: compute/return the number of regions added by region_add() Mike Kravetz
2015-05-23 3:55 ` Mike Kravetz
2015-05-25 6:19 ` Naoya Horiguchi
2015-05-25 6:19 ` Naoya Horiguchi
2015-05-26 16:30 ` Mike Kravetz [this message]
2015-05-26 16:30 ` Mike Kravetz
2015-05-25 20:29 ` Davidlohr Bueso
2015-05-25 20:29 ` Davidlohr Bueso
2015-05-26 22:59 ` Mike Kravetz
2015-05-26 22:59 ` Mike Kravetz
2015-05-23 3:55 ` [PATCH v2 2/2] mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages Mike Kravetz
2015-05-23 3:55 ` Mike Kravetz
2015-05-25 19:58 ` [PATCH v2 0/2] alloc_huge_page/hugetlb_reserve_pages race Davidlohr Bueso
2015-05-25 19:58 ` Davidlohr Bueso
2015-05-26 17:10 ` Mike Kravetz
2015-05-26 17:10 ` Mike Kravetz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55649F9F.9090401@oracle.com \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=dave@stgolabs.net \
--cc=lcapitulino@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.