From: Mike Kravetz <mike.kravetz@oracle.com>
To: Hillf Danton <hillf.zj@alibaba-inc.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: 'Hugh Dickins' <hughd@google.com>,
'Naoya Horiguchi' <n-horiguchi@ah.jp.nec.com>,
'Davidlohr Bueso' <dave@stgolabs.net>,
'Dave Hansen' <dave.hansen@linux.intel.com>,
'Andrew Morton' <akpm@linux-foundation.org>,
'Michel Lespinasse' <walken@google.com>
Subject: Re: [PATCH] mm/hugetlbfs: Unmap pages if page fault raced with hole punch
Date: Thu, 7 Jan 2016 20:39:15 -0800 [thread overview]
Message-ID: <568F3D73.60901@oracle.com> (raw)
In-Reply-To: <04d801d14922$5d1e2f30$175a8d90$@alibaba-inc.com>
On 01/07/2016 12:06 AM, Hillf Danton wrote:
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 0444760..0871d70 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -324,11 +324,46 @@ static void remove_huge_page(struct page *page)
>> delete_from_page_cache(page);
>> }
>>
>> +static inline void
>> +hugetlb_vmdelete_list(struct rb_root *root, pgoff_t start, pgoff_t end)
>> +{
>> + struct vm_area_struct *vma;
>> +
>> + /*
>> + * end == 0 indicates that the entire range after
>> + * start should be unmapped.
>> + */
>> + vma_interval_tree_foreach(vma, root, start, end ? end : ULONG_MAX) {
>
> [1] perhaps end can be reused.
>
>> + unsigned long v_offset;
>> +
>> + /*
>> + * Can the expression below overflow on 32-bit arches?
>> + * No, because the interval tree returns us only those vmas
>> + * which overlap the truncated area starting at pgoff,
>> + * and no vma on a 32-bit arch can span beyond the 4GB.
>> + */
>> + if (vma->vm_pgoff < start)
>> + v_offset = (start - vma->vm_pgoff) << PAGE_SHIFT;
>> + else
>> + v_offset = 0;
>> +
>> + if (end) {
>> + end = ((end - start) << PAGE_SHIFT) +
>> + vma->vm_start + v_offset;
>
> [2] end is input to be pgoff_t, but changed to be the type of v_offset.
> Further we cannot handle the case that end is input to be zero.
> See the diff below please.
>
<snip>
>
> --- a/fs/hugetlbfs/inode.c Thu Jan 7 15:04:35 2016
> +++ b/fs/hugetlbfs/inode.c Thu Jan 7 15:31:03 2016
> @@ -461,8 +461,11 @@ hugetlb_vmdelete_list(struct rb_root *ro
> * end == 0 indicates that the entire range after
> * start should be unmapped.
> */
> - vma_interval_tree_foreach(vma, root, start, end ? end : ULONG_MAX) {
> + if (!end)
> + end = ULONG_MAX;
> + vma_interval_tree_foreach(vma, root, start, end) {
> unsigned long v_offset;
> + unsigned long v_end;
>
> /*
> * Can the expression below overflow on 32-bit arches?
> @@ -475,15 +478,12 @@ hugetlb_vmdelete_list(struct rb_root *ro
> else
> v_offset = 0;
>
> - if (end) {
> - end = ((end - start) << PAGE_SHIFT) +
> + v_end = ((end - start) << PAGE_SHIFT) +
> vma->vm_start + v_offset;
> - if (end > vma->vm_end)
> - end = vma->vm_end;
> - } else
> - end = vma->vm_end;
> + if (v_end > vma->vm_end)
> + v_end = vma->vm_end;
>
> - unmap_hugepage_range(vma, vma->vm_start + v_offset, end, NULL);
> + unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end, NULL);
> }
> }
>
> --
Unfortunately, that calculation of v_end is not correct. I know it
is based on the existing code, but the existing code it not correct.
I attempted to fix in a patch earlier today, but that was not correct
either. Below is a proposed new version of hugetlb_vmdelete_list.
Let me know what you think.
static inline void
hugetlb_vmdelete_list(struct rb_root *root, pgoff_t start, pgoff_t end)
{
struct vm_area_struct *vma;
/*
* end == 0 indicates that the entire range after
* start should be unmapped.
*/
vma_interval_tree_foreach(vma, root, start, end ? end : ULONG_MAX) {
unsigned long v_offset;
unsigned long v_end;
/*
* Can the expression below overflow on 32-bit arches?
* No, because the interval tree returns us only those vmas
* which overlap the truncated area starting at pgoff,
* and no vma on a 32-bit arch can span beyond the 4GB.
*/
if (vma->vm_pgoff < start)
v_offset = (start - vma->vm_pgoff) << PAGE_SHIFT;
else
v_offset = 0;
if (!end)
v_end = vma->vm_end;
else {
v_end = ((end - vma->vm_pgoff) << PAGE_SHIFT)
+ vma->vm_start;
if (v_end > vma->vm_end)
v_end = vma->vm_end;
}
unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
NULL);
}
}
--
Mike Kravetz
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Hillf Danton <hillf.zj@alibaba-inc.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "'Hugh Dickins'" <hughd@google.com>,
"'Naoya Horiguchi'" <n-horiguchi@ah.jp.nec.com>,
"'Davidlohr Bueso'" <dave@stgolabs.net>,
"'Dave Hansen'" <dave.hansen@linux.intel.com>,
"'Andrew Morton'" <akpm@linux-foundation.org>,
"'Michel Lespinasse'" <walken@google.com>
Subject: Re: [PATCH] mm/hugetlbfs: Unmap pages if page fault raced with hole punch
Date: Thu, 7 Jan 2016 20:39:15 -0800 [thread overview]
Message-ID: <568F3D73.60901@oracle.com> (raw)
In-Reply-To: <04d801d14922$5d1e2f30$175a8d90$@alibaba-inc.com>
On 01/07/2016 12:06 AM, Hillf Danton wrote:
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 0444760..0871d70 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -324,11 +324,46 @@ static void remove_huge_page(struct page *page)
>> delete_from_page_cache(page);
>> }
>>
>> +static inline void
>> +hugetlb_vmdelete_list(struct rb_root *root, pgoff_t start, pgoff_t end)
>> +{
>> + struct vm_area_struct *vma;
>> +
>> + /*
>> + * end == 0 indicates that the entire range after
>> + * start should be unmapped.
>> + */
>> + vma_interval_tree_foreach(vma, root, start, end ? end : ULONG_MAX) {
>
> [1] perhaps end can be reused.
>
>> + unsigned long v_offset;
>> +
>> + /*
>> + * Can the expression below overflow on 32-bit arches?
>> + * No, because the interval tree returns us only those vmas
>> + * which overlap the truncated area starting at pgoff,
>> + * and no vma on a 32-bit arch can span beyond the 4GB.
>> + */
>> + if (vma->vm_pgoff < start)
>> + v_offset = (start - vma->vm_pgoff) << PAGE_SHIFT;
>> + else
>> + v_offset = 0;
>> +
>> + if (end) {
>> + end = ((end - start) << PAGE_SHIFT) +
>> + vma->vm_start + v_offset;
>
> [2] end is input to be pgoff_t, but changed to be the type of v_offset.
> Further we cannot handle the case that end is input to be zero.
> See the diff below please.
>
<snip>
>
> --- a/fs/hugetlbfs/inode.c Thu Jan 7 15:04:35 2016
> +++ b/fs/hugetlbfs/inode.c Thu Jan 7 15:31:03 2016
> @@ -461,8 +461,11 @@ hugetlb_vmdelete_list(struct rb_root *ro
> * end == 0 indicates that the entire range after
> * start should be unmapped.
> */
> - vma_interval_tree_foreach(vma, root, start, end ? end : ULONG_MAX) {
> + if (!end)
> + end = ULONG_MAX;
> + vma_interval_tree_foreach(vma, root, start, end) {
> unsigned long v_offset;
> + unsigned long v_end;
>
> /*
> * Can the expression below overflow on 32-bit arches?
> @@ -475,15 +478,12 @@ hugetlb_vmdelete_list(struct rb_root *ro
> else
> v_offset = 0;
>
> - if (end) {
> - end = ((end - start) << PAGE_SHIFT) +
> + v_end = ((end - start) << PAGE_SHIFT) +
> vma->vm_start + v_offset;
> - if (end > vma->vm_end)
> - end = vma->vm_end;
> - } else
> - end = vma->vm_end;
> + if (v_end > vma->vm_end)
> + v_end = vma->vm_end;
>
> - unmap_hugepage_range(vma, vma->vm_start + v_offset, end, NULL);
> + unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end, NULL);
> }
> }
>
> --
Unfortunately, that calculation of v_end is not correct. I know it
is based on the existing code, but the existing code it not correct.
I attempted to fix in a patch earlier today, but that was not correct
either. Below is a proposed new version of hugetlb_vmdelete_list.
Let me know what you think.
static inline void
hugetlb_vmdelete_list(struct rb_root *root, pgoff_t start, pgoff_t end)
{
struct vm_area_struct *vma;
/*
* end == 0 indicates that the entire range after
* start should be unmapped.
*/
vma_interval_tree_foreach(vma, root, start, end ? end : ULONG_MAX) {
unsigned long v_offset;
unsigned long v_end;
/*
* Can the expression below overflow on 32-bit arches?
* No, because the interval tree returns us only those vmas
* which overlap the truncated area starting at pgoff,
* and no vma on a 32-bit arch can span beyond the 4GB.
*/
if (vma->vm_pgoff < start)
v_offset = (start - vma->vm_pgoff) << PAGE_SHIFT;
else
v_offset = 0;
if (!end)
v_end = vma->vm_end;
else {
v_end = ((end - vma->vm_pgoff) << PAGE_SHIFT)
+ vma->vm_start;
if (v_end > vma->vm_end)
v_end = vma->vm_end;
}
unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
NULL);
}
}
--
Mike Kravetz
next prev parent reply other threads:[~2016-01-08 4:44 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-06 22:37 [PATCH] mm/hugetlbfs: Unmap pages if page fault raced with hole punch Mike Kravetz
2016-01-06 22:37 ` Mike Kravetz
2016-01-07 8:06 ` Hillf Danton
2016-01-07 8:06 ` Hillf Danton
2016-01-07 16:46 ` Mike Kravetz
2016-01-07 16:46 ` Mike Kravetz
2016-01-08 4:39 ` Mike Kravetz [this message]
2016-01-08 4:39 ` Mike Kravetz
2016-01-08 6:25 ` Hillf Danton
2016-01-08 6:25 ` Hillf Danton
2016-01-11 22:35 ` Andrew Morton
2016-01-11 22:35 ` Andrew Morton
2016-01-11 23:38 ` Mike Kravetz
2016-01-11 23:38 ` Mike Kravetz
2016-01-12 0:29 ` Andrew Morton
2016-01-12 0:29 ` Andrew Morton
2016-01-12 1:36 ` Mike Kravetz
2016-01-12 1:36 ` Mike Kravetz
2016-01-12 2:20 ` Andrew Morton
2016-01-12 2:20 ` Andrew Morton
2016-01-12 3:21 ` Mike Kravetz
2016-01-12 3:21 ` Mike Kravetz
2016-01-12 4:35 ` Andrew Morton
2016-01-12 4:35 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=568F3D73.60901@oracle.com \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=dave.hansen@linux.intel.com \
--cc=dave@stgolabs.net \
--cc=hillf.zj@alibaba-inc.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.