linux-api.vger.kernel.org archive mirror
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Li Xinhai <lixinhai.lxh@gmail.com>,
	John Hubbard <jhubbard@nvidia.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Linux API <linux-api@vger.kernel.org>, akpm <akpm@linux-foundation.org>
Subject: Re: [PATCH] mm: allow checking length for hugetlb mapping in mmap()
Date: Mon, 30 Mar 2020 11:39:44 -0700
Message-ID: <5e02a305-038f-b86c-31e7-85358563cbc5@oracle.com>
In-Reply-To: <2020032916093522557671@gmail.com>

On 3/29/20 1:09 AM, Li Xinhai wrote:
> On 2020-03-29 at 11:53 John Hubbard wrote:
>> On 3/28/20 8:08 PM, Li Xinhai wrote:
>>> In the current code, the vma-related calls on a hugetlb mapping, except
>>> mmap, all treat an incorrectly aligned length as an invalid parameter,
>>> including mprotect, munmap, mlock, etc., by checking through
>>> hugetlb_vm_op_split. So the user will see a failure after a successful
>>> mmap call, even when passing the same length parameter to another
>>> mapping syscall.
>>>
>>> It is desirable for all hugetlb mapping calls to have consistent
>>> behavior, without mmap as an exception (it rounds up length to align to
>>> the underlying hugepage size). In the current
>>> Documentation/admin-guide/mm/hugetlbpage.rst, the description is:
>>> "
>>> Syscalls that operate on memory backed by hugetlb pages only have their
>>> lengths aligned to the native page size of the processor; they will
>>> normally fail with errno set to EINVAL or exclude hugetlb pages that
>>> extend beyond the length if not hugepage aligned. For example, munmap(2)
>>> will fail if memory is backed by a hugetlb page and the length is smaller
>>> than the hugepage size.
>>> "
>>> which expresses the consistent behavior.
>>
>>
>> Missing here is a description of what the patch actually does...
>>
> 
> Right, a statement can be added, such as:
> "
> After this patch, all hugetlb mapping related syscalls will only align
> the length parameter to the native page size of the processor. For mmap(),
> hugetlb_get_unmapped_area() will set errno to EINVAL if length is not
> aligned to the underlying hugepage size.
> "
> 
>>>
>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>> Cc: John Hubbard <jhubbard@nvidia.com>
>>> ---
>>> changes:
>>> 0. patch which introduces a new flag for mmap()
>>>      The new flag should be avoided.
>>> https://lore.kernel.org/linux-mm/1585313944-8627-1-git-send-email-lixinhai.lxh@gmail.com/

It is not exactly clear in your commit message, but this change will cause
mmap() of hugetlb ranges to fail (-EINVAL) if length is not a multiple of
huge page size.  The mmap man page says:

  Huge page (Huge TLB) mappings
       For mappings that employ huge pages, the requirements for the arguments
       of  mmap()  and munmap() differ somewhat from the requirements for map‐
       pings that use the native system page size.

       For mmap(), offset must be a multiple of the underlying huge page size.
       The system automatically aligns length to be a multiple of the underly‐
       ing huge page size.

       For munmap(), addr and length must both be a multiple of the underlying
       huge page size.

So this change may cause application failure.  The code you are removing was
added with commit af73e4d9506d.  The commit message for that commit says:

    hugetlbfs: fix mmap failure in unaligned size request
    
    The current kernel returns -EINVAL unless a given mmap length is
    "almost" hugepage aligned.  This is because in sys_mmap_pgoff() the
    given length is passed to vm_mmap_pgoff() as it is without being aligned
    with hugepage boundary.
    
    This is a regression introduced in commit 40716e29243d ("hugetlbfs: fix
    alignment of huge page requests"), where alignment code is pushed into
    hugetlb_file_setup() and the variable len in caller side is not changed.

The change in commit af73e4d9506d was added because causing mmap to return
-EINVAL if length is not a multiple of huge page size was considered a
regression.  It would still be considered a regression today.

I understand that the behavior is not consistent.  However, it is clearly
documented.  I do not believe we can change the behavior of this code.
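
To illustrate the alignment rule discussed above, here is a minimal sketch
in C of how an application could round a length up to a huge page multiple
itself, so that the same value is valid for both mmap() and munmap(). The
helper names hugepage_round_up() and hugepage_aligned() are hypothetical
(they are not kernel or libc functions), and the huge page size is an
assumption passed in by the caller; a real program would have to query it,
for example from the Hugepagesize line in /proc/meminfo.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round len up to the next multiple of hpage_size.
 * hpage_size must be a non-zero power of two. The caller is assumed to
 * pass a len small enough that len + hpage_size - 1 does not overflow.
 */
static size_t hugepage_round_up(size_t len, size_t hpage_size)
{
	assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);
	return (len + hpage_size - 1) & ~(hpage_size - 1);
}

/*
 * Hypothetical helper: return non-zero if len is already a multiple of
 * hpage_size (again assumed to be a power of two).
 */
static int hugepage_aligned(size_t len, size_t hpage_size)
{
	assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);
	return (len & (hpage_size - 1)) == 0;
}
```

With an assumed 2 MiB huge page size, a request for 3 MiB would be rounded
to 4 MiB. Passing the rounded value to every mapping call keeps an
application independent of whether mmap() rounds the length or rejects it.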

-- 
Mike Kravetz


Thread overview: 6+ messages
2020-03-29  3:08 [PATCH] mm: allow checking length for hugetlb mapping in mmap() Li Xinhai
2020-03-29  3:53 ` John Hubbard
2020-03-29  8:09   ` Li Xinhai
2020-03-30 18:39     ` Mike Kravetz [this message]
2020-03-31  8:35       ` Li Xinhai
2020-03-31 22:04         ` Mike Kravetz
