From: zhong jiang <zhongjiang@huawei.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>,
David Rientjes <rientjes@google.com>,
Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>,
Linux Memory Management List <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [RFC] remove unnecessary condition in remove_inode_hugepages
Date: Sat, 24 Sep 2016 10:56:52 +0800
Message-ID: <57E5EB74.5070003@huawei.com>
In-Reply-To: <d1e61e42-b644-478d-6294-3f8099318a3b@oracle.com>
On 2016/9/24 1:19, Mike Kravetz wrote:
> On 09/22/2016 06:53 PM, zhong jiang wrote:
>> At present, we need to call hugetlb_fix_reserve_counts when hugetlb_unreserve_pages fails,
>> and PagePrivate decides how the hugetlb reserve counts are adjusted.
>>
>> We obtain the page from the page cache and hold both lock_page and mutex_lock while using it.
>> alloc_huge_page always holds the page lock when the page is added to the page cache, and
>> ClearPagePrivate is called before the page is unlocked.
>>
>> But I'm not sure whether this is right or whether I am missing the point.
> Let me try to explain the code you suggest is unnecessary.
>
> The PagePrivate flag is used in huge page allocation/deallocation to
> indicate that the page was globally reserved. For example, in
> dequeue_huge_page_vma() there is this code:
>
> 	if (page) {
> 		if (avoid_reserve)
> 			break;
> 		if (!vma_has_reserves(vma, chg))
> 			break;
>
> 		SetPagePrivate(page);
> 		h->resv_huge_pages--;
> 		break;
> 	}
>
> and in free_huge_page():
>
> 	restore_reserve = PagePrivate(page);
> 	ClearPagePrivate(page);
> 	.
> 	<snip>
> 	.
> 	if (restore_reserve)
> 		h->resv_huge_pages++;
>
> This helps maintain the global huge page reserve count.
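
A minimal user-space sketch of the bookkeeping in the two snippets above. It is not kernel code: toy_page, toy_hstate, toy_dequeue, toy_free and toy_demo are hypothetical names, and the vma_has_reserves() check is reduced to a boolean argument. It only shows how a flag set at allocation time lets the free path restore the global count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for struct page and struct hstate (assumed, not kernel types). */
struct toy_page {
	bool private_flag;	/* models PagePrivate */
};

struct toy_hstate {
	long resv_huge_pages;	/* models h->resv_huge_pages */
};

/* Models the quoted part of dequeue_huge_page_vma(): a reservation is
 * consumed only when the caller did not ask to avoid reserves and the
 * VMA actually has one. */
void toy_dequeue(struct toy_hstate *h, struct toy_page *page,
		 bool avoid_reserve, bool vma_has_reserves)
{
	page->private_flag = false;
	if (avoid_reserve || !vma_has_reserves)
		return;
	page->private_flag = true;	/* SetPagePrivate(page) */
	h->resv_huge_pages--;
}

/* Models the quoted part of free_huge_page(): the flag decides whether
 * the reservation is given back. */
void toy_free(struct toy_hstate *h, struct toy_page *page)
{
	bool restore_reserve = page->private_flag;

	page->private_flag = false;	/* ClearPagePrivate(page) */
	if (restore_reserve)
		h->resv_huge_pages++;
}

/* One reserved and one unreserved allocate/free cycle; returns the final count. */
long toy_demo(void)
{
	struct toy_hstate h = { .resv_huge_pages = 1 };
	struct toy_page page = { .private_flag = false };

	toy_dequeue(&h, &page, false, true);	/* consumes the reservation */
	assert(h.resv_huge_pages == 0 && page.private_flag);
	toy_free(&h, &page);			/* gives it back */
	assert(h.resv_huge_pages == 1 && !page.private_flag);

	toy_dequeue(&h, &page, true, true);	/* avoid_reserve: count untouched */
	toy_free(&h, &page);
	return h.resv_huge_pages;
}
```

In this model the count returns to its starting value after each cycle, whether or not the allocation consumed a reservation.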
>
> In addition to the global reserve count, there are per VMA reservation
> structures. Unfortunately, these structures have different meanings
> depending on the context in which they are used.
>
> If there is a VMA reservation entry for a page, and the page has not
> been instantiated in the VMA, this indicates there is a huge page reserved
> and the global resv_huge_pages count reflects that reservation. Even
> if a page was not reserved, a VMA reservation entry is added when a page
> is instantiated in the VMA.
>
> With that background, let's look at the existing code/proposed changes.
That is clear.
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 4ea71eb..010723b 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -462,14 +462,12 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
>>                          * the page, note PagePrivate which is used in case
>>                          * of error.
>>                          */
>> -                       rsv_on_error = !PagePrivate(page);
> This rsv_on_error flag indicates that when the huge page was allocated,
yes
> it was NOT counted against the global reserve count. So, when
> remove_huge_page eventually calls free_huge_page(), the global count
> resv_huge_pages is not incremented. So far, no problem.
But the page comes from the page cache. If so, ClearPagePrivate(page) should
already have been called while the page was locked, so this condition is always true.
The key question is why we still need to check PagePrivate(page) when the page
comes from the page cache and we hold the page lock.

Thank you,
zhongjiang
>>                         remove_huge_page(page);
>>                         freed++;
>>                         if (!truncate_op) {
>>                                 if (unlikely(hugetlb_unreserve_pages(inode,
>>                                                         next, next + 1, 1)))
> We now have this VERY unlikely situation that hugetlb_unreserve_pages fails.
> This means that the VMA reservation entry for the page was not removed.
> So, we are in a bit of a mess. The page has already been removed, but the
> VMA reservation entry can not. This LOOKS like there is a reservation for
> the page in the VMA reservation structure. But, the global count
> resv_huge_pages does not reflect this reservation.
>
> If we do nothing, when the VMA is eventually removed the VMA reservation
> structure will be completely removed and the global count resv_huge_pages
> will be decremented for each entry in the structure. Since there is a
> VMA reservation entry without a corresponding global count, the global
> count will be one less than it should be (will eventually go to -1).
>
> To 'fix' this, hugetlb_fix_reserve_counts is called. In this case, it will
> increment the global count so that it is consistent with the entries in
> the VMA reservation structure.
>
> This is all quite confusing and really unlikely to happen. I tried to
> explain in code comments:
>
> Before removing the page:
> 	/*
> 	 * We must free the huge page and remove from page
> 	 * cache (remove_huge_page) BEFORE removing the
> 	 * region/reserve map (hugetlb_unreserve_pages). In
> 	 * rare out of memory conditions, removal of the
> 	 * region/reserve map could fail. Before free'ing
> 	 * the page, note PagePrivate which is used in case
> 	 * of error.
> 	 */
>
> And, the routine hugetlb_fix_reserve_counts:
> /*
>  * A rare out of memory error was encountered which prevented removal of
>  * the reserve map region for a page. The huge page itself was free'ed
>  * and removed from the page cache. This routine will adjust the subpool
>  * usage count, and the global reserve count if needed. By incrementing
>  * these counts, the reserve map entry which could not be deleted will
>  * appear as a "reserved" entry instead of simply dangling with incorrect
>  * counts.
>  */
>
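
To make the quoted failure path concrete (page already freed, reserve map entry left behind), here is a rough user-space model of the counts. It is a sketch under stated assumptions, not the kernel implementation: toy_state, toy_remove_page, toy_teardown and toy_final_count are hypothetical names, subpool accounting is ignored, and teardown simply decrements the global count once per remaining entry, as in the explanation quoted above. The PagePrivate state is taken as an input, so the model does not settle whether it can ever be set for a page cache page.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two counters discussed in the thread (assumed names). */
struct toy_state {
	long resv_huge_pages;	/* global reserve count */
	long resv_map_entries;	/* entries left in the reserve map */
};

/* Models removing one page on the hole-punch path (!truncate_op). */
void toy_remove_page(struct toy_state *s, bool page_private,
		     bool unreserve_fails, bool apply_fix)
{
	/* Noted before the page is freed, as in the patch context. */
	bool rsv_on_error = !page_private;

	/* remove_huge_page() -> free_huge_page(): restore only if flagged. */
	if (page_private)
		s->resv_huge_pages++;

	if (!unreserve_fails) {
		s->resv_map_entries--;	/* normal case: entry removed */
		return;
	}

	/* Entry could not be removed; optionally repair the global count so
	 * the leftover entry looks like an ordinary reservation. */
	if (apply_fix && rsv_on_error)
		s->resv_huge_pages++;
}

/* Models reserve map teardown: one global decrement per remaining entry. */
void toy_teardown(struct toy_state *s)
{
	assert(s->resv_map_entries >= 0);
	s->resv_huge_pages -= s->resv_map_entries;
	s->resv_map_entries = 0;
}

/* One unreserved, instantiated page: global count 0, one map entry. */
long toy_final_count(bool unreserve_fails, bool apply_fix)
{
	struct toy_state s = { .resv_huge_pages = 0, .resv_map_entries = 1 };

	toy_remove_page(&s, false, unreserve_fails, apply_fix);
	toy_teardown(&s);
	return s.resv_huge_pages;
}
```

In this model toy_final_count(true, false) ends at -1, matching the "will eventually go to -1" case, while toy_final_count(true, true) and the normal toy_final_count(false, false) both end at 0.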