linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
       [not found] <bug-56881-27@https.bugzilla.kernel.org/>
@ 2013-04-23 20:25 ` Andrew Morton
  2013-04-23 22:26   ` Naoya Horiguchi
  2013-04-24  8:14   ` Johannes Weiner
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2013-04-23 20:25 UTC (permalink / raw)
  To: linux-mm; +Cc: bugzilla-daemon, iceman_dvd


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 20 Apr 2013 03:00:30 +0000 (UTC) bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=56881
> 
>            Summary: MAP_HUGETLB mmap fails for certain sizes
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 3.5.0-27

Thanks.

It's a post-3.4 regression, testcase included.  Does someone want to
take a look, please?

>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: iceman_dvd@yahoo.com
>         Regression: No
> 
> 
> This is on an Ubuntu 12.10 desktop, but the same issue has been found on 12.04
> with 3.5.0 kernel.
> See the sample program. An allocation with MAP_HUGETLB consistently fails with
> certain sizes, while it succeeds with others.
> The allocation sizes are well below the number of free huge pages.
> 
> $ uname -a Linux davide-lnx2 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25
> 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> 
> 
> # echo 100 > /proc/sys/vm/nr_hugepages
> 
> # cat /proc/meminfo
> ...
> AnonHugePages:         0 kB
> HugePages_Total:     100
> HugePages_Free:      100
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> 
> 
> $ ./mmappu $((5 * 2 * 1024 * 1024 - 4096))
> size=10481664    0x9ff000
> hugepage mmap: Invalid argument
> 
> 
> $ ./mmappu $((5 * 2 * 1024 * 1024 - 4095))
> size=10481665    0x9ff001
> OK!
> 
> 
> It seems the trigger point is a normal page size.
> The same binary works flawlessly in previous kernels.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-23 20:25 ` [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes Andrew Morton
@ 2013-04-23 22:26   ` Naoya Horiguchi
  2013-04-24  8:20     ` Jianguo Wu
  2013-04-24  8:14   ` Johannes Weiner
  1 sibling, 1 reply; 21+ messages in thread
From: Naoya Horiguchi @ 2013-04-23 22:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, bugzilla-daemon, iceman_dvd

On Tue, Apr 23, 2013 at 01:25:22PM -0700, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Sat, 20 Apr 2013 03:00:30 +0000 (UTC) bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=56881
> > 
> >            Summary: MAP_HUGETLB mmap fails for certain sizes
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 3.5.0-27
> 
> Thanks.
> 
> It's a post-3.4 regression, testcase included.  Does someone want to
> take a look, please?

Let me try it.

  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
  {                                                                            
          struct inode *inode = file->f_path.dentry->d_inode;
          loff_t len, vma_len;                               
          int ret;                                           
          struct hstate *h = hstate_file(file);              
          ...                                                                               
          if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))              
                  return -EINVAL;                                              

This code checks only whether a given hugetlb vma covers (1 << order)
pages, not whether it's exactly hugepage aligned.
Before 2b37c35e6552 "fs/hugetlbfs/inode.c: fix pgoff alignment
checking on 32-bit", it was

  if (vma->vm_pgoff & ~(huge_page_mask(h) >> PAGE_SHIFT))

, but this made no sense because ~(huge_page_mask(h) >> PAGE_SHIFT) is
0xff for 2M hugepage.
I think the reported problem is not a bug because the behavior before
this change was wrong or not as expected.

If we want to make sure that a given address range fits hugepage size,
something like below can be useful.

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 78bde32..a98304b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -113,11 +113,11 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
 	vma->vm_ops = &hugetlb_vm_ops;
 
-	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
-		return -EINVAL;
-
 	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
 
+	if (vma->len & ~huge_page_mask(h))
+		return -EINVAL;
+
 	mutex_lock(&inode->i_mutex);
 	file_accessed(file);
 

Thanks,
Naoya Horiguchi

> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: akpm@linux-foundation.org
> >         ReportedBy: iceman_dvd@yahoo.com
> >         Regression: No
> > 
> > 
> > This is on an Ubuntu 12.10 desktop, but the same issue has been found on 12.04
> > with 3.5.0 kernel.
> > See the sample program. An allocation with MAP_HUGETLB consistently fails with
> > certain sizes, while it succeeds with others.
> > The allocation sizes are well below the number of free huge pages.
> > 
> > $ uname -a Linux davide-lnx2 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25
> > 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > 
> > # echo 100 > /proc/sys/vm/nr_hugepages
> > 
> > # cat /proc/meminfo
> > ...
> > AnonHugePages:         0 kB
> > HugePages_Total:     100
> > HugePages_Free:      100
> > HugePages_Rsvd:        0
> > HugePages_Surp:        0
> > Hugepagesize:       2048 kB
> > 
> > 
> > $ ./mmappu $((5 * 2 * 1024 * 1024 - 4096))
> > size=10481664    0x9ff000
> > hugepage mmap: Invalid argument
> > 
> > 
> > $ ./mmappu $((5 * 2 * 1024 * 1024 - 4095))
> > size=10481665    0x9ff001
> > OK!
> > 
> > 
> > It seems the trigger point is a normal page size.
> > The same binary works flawlessly in previous kernels.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-23 20:25 ` [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes Andrew Morton
  2013-04-23 22:26   ` Naoya Horiguchi
@ 2013-04-24  8:14   ` Johannes Weiner
  2013-04-24 15:16     ` Naoya Horiguchi
  1 sibling, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2013-04-24  8:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, bugzilla-daemon, iceman_dvd, Steven Truelove

On Tue, Apr 23, 2013 at 01:25:22PM -0700, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Sat, 20 Apr 2013 03:00:30 +0000 (UTC) bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=56881
> > 
> >            Summary: MAP_HUGETLB mmap fails for certain sizes
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 3.5.0-27
> 
> Thanks.
> 
> It's a post-3.4 regression, testcase included.  Does someone want to
> take a look, please?

commit 40716e29243de46720e5773797791466c28904ec
Author: Steven Truelove <steven.truelove@utoronto.ca>
Date:   Wed Mar 21 16:34:14 2012 -0700

    hugetlbfs: fix alignment of huge page requests
    
    When calling shmget() with SHM_HUGETLB, shmget aligns the request size to
    PAGE_SIZE, but this is not sufficient.
    
    Modify hugetlb_file_setup() to align requests to the huge page size, and
    to accept an address argument so that all alignment checks can be
    performed in hugetlb_file_setup(), rather than in its callers.  Change
    newseg() and mmap_pgoff() to match the new prototype and eliminate a now
    redundant alignment check.
    
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Steven Truelove <steven.truelove@utoronto.ca>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

This pushes down the length alignment from mmap_pgoff() into
hugetlb_file_setup() and failed to observe that mmap_pgoff() continues
to work with that now unaligned length parameter.  I.e. this part:

diff --git a/mm/mmap.c b/mm/mmap.c
index 9e0c0de..a19cc27 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1099,9 +1099,9 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 		 * A dummy user value is used because we are not locking
 		 * memory so no accounting is necessary
 		 */
-		len = ALIGN(len, huge_page_size(&default_hstate));
-		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len, VM_NORESERVE,
-						&user, HUGETLB_ANONHUGE_INODE);
+		file = hugetlb_file_setup(HUGETLB_ANON_FILE, addr, len,
+						VM_NORESERVE, &user,
+						HUGETLB_ANONHUGE_INODE);
 		if (IS_ERR(file))
 			return PTR_ERR(file);
 	}

It would probably be best to revert this commit for the most part and
fix up the alignment in the shmem code.

> > # echo 100 > /proc/sys/vm/nr_hugepages
> > 
> > # cat /proc/meminfo
> > ...
> > AnonHugePages:         0 kB
> > HugePages_Total:     100
> > HugePages_Free:      100
> > HugePages_Rsvd:        0
> > HugePages_Surp:        0
> > Hugepagesize:       2048 kB
> > 
> > 
> > $ ./mmappu $((5 * 2 * 1024 * 1024 - 4096))
> > size=10481664    0x9ff000
> > hugepage mmap: Invalid argument
> > 
> > 
> > $ ./mmappu $((5 * 2 * 1024 * 1024 - 4095))
> > size=10481665    0x9ff001
> > OK!

hugetlb_get_unmapped_area() expects a hugepage-aligned size argument.
Before do_mmap_pgoff() calls it, it does len = PAGE_ALIGN(len), which
is why the second case works and the first one does not.

How about this (untested) partial revert of the above patch that keeps
the shm alignment fix?

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 84e3d85..13a7d51 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -929,9 +929,8 @@ static struct dentry_operations anon_ops = {
 	.d_dname = hugetlb_dname
 };
 
-struct file *hugetlb_file_setup(const char *name, unsigned long addr,
-				size_t size, vm_flags_t acctflag,
-				struct user_struct **user,
+struct file *hugetlb_file_setup(const char *name, size_t size,
+				vm_flags_t acctflag, struct user_struct **user,
 				int creat_flags, int page_size_log)
 {
 	struct file *file = ERR_PTR(-ENOMEM);
@@ -939,8 +938,6 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	struct path path;
 	struct super_block *sb;
 	struct qstr quick_string;
-	struct hstate *hstate;
-	unsigned long num_pages;
 	int hstate_idx;
 
 	hstate_idx = get_hstate_idx(page_size_log);
@@ -980,12 +977,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	if (!inode)
 		goto out_dentry;
 
-	hstate = hstate_inode(inode);
-	size += addr & ~huge_page_mask(hstate);
-	num_pages = ALIGN(size, huge_page_size(hstate)) >>
-			huge_page_shift(hstate);
 	file = ERR_PTR(-ENOMEM);
-	if (hugetlb_reserve_pages(inode, 0, num_pages, NULL, acctflag))
+	if (hugetlb_reserve_pages(inode, 0,
+			size >> huge_page_shift(hstate_inode(inode)), NULL,
+			acctflag))
 		goto out_inode;
 
 	d_instantiate(path.dentry, inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 16e4e9a..437875c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -185,8 +185,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
 
 extern const struct file_operations hugetlbfs_file_operations;
 extern const struct vm_operations_struct hugetlb_vm_ops;
-struct file *hugetlb_file_setup(const char *name, unsigned long addr,
-				size_t size, vm_flags_t acct,
+struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
 				struct user_struct **user, int creat_flags,
 				int page_size_log);
 
@@ -205,9 +204,9 @@ static inline int is_file_hugepages(struct file *file)
 
 #define is_file_hugepages(file)			0
 static inline struct file *
-hugetlb_file_setup(const char *name, unsigned long addr, size_t size,
-		vm_flags_t acctflag, struct user_struct **user, int creat_flags,
-		int page_size_log)
+hugetlb_file_setup(const char *name, size_t size, vm_flags_t acctflag,
+		   struct user_struct **user, int creat_flags,
+		   int page_size_log)
 {
 	return ERR_PTR(-ENOSYS);
 }
diff --git a/ipc/shm.c b/ipc/shm.c
index cb858df..c1293d1 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -491,10 +491,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 
 	sprintf (name, "SYSV%08x", key);
 	if (shmflg & SHM_HUGETLB) {
+		unsigned int hugesize;
+
 		/* hugetlb_file_setup applies strict accounting */
 		if (shmflg & SHM_NORESERVE)
 			acctflag = VM_NORESERVE;
-		file = hugetlb_file_setup(name, 0, size, acctflag,
+		hugesize = ALIGN(size, huge_page_size(&default_hstate));
+		file = hugetlb_file_setup(name, hugesize, acctflag,
 				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
 				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
 	} else {
diff --git a/mm/mmap.c b/mm/mmap.c
index 6466699..36342dd 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1335,7 +1335,8 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 		 * A dummy user value is used because we are not locking
 		 * memory so no accounting is necessary
 		 */
-		file = hugetlb_file_setup(HUGETLB_ANON_FILE, addr, len,
+		len = ALIGN(len, huge_page_size(&default_hstate));
+		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
 				VM_NORESERVE,
 				&user, HUGETLB_ANONHUGE_INODE,
 				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-23 22:26   ` Naoya Horiguchi
@ 2013-04-24  8:20     ` Jianguo Wu
  2013-04-24 14:17       ` Naoya Horiguchi
  0 siblings, 1 reply; 21+ messages in thread
From: Jianguo Wu @ 2013-04-24  8:20 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd

On 2013/4/24 6:26, Naoya Horiguchi wrote:

> On Tue, Apr 23, 2013 at 01:25:22PM -0700, Andrew Morton wrote:
>>
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Sat, 20 Apr 2013 03:00:30 +0000 (UTC) bugzilla-daemon@bugzilla.kernel.org wrote:
>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=56881
>>>
>>>            Summary: MAP_HUGETLB mmap fails for certain sizes
>>>            Product: Memory Management
>>>            Version: 2.5
>>>     Kernel Version: 3.5.0-27
>>
>> Thanks.
>>
>> It's a post-3.4 regression, testcase included.  Does someone want to
>> take a look, please?
> 
> Let me try it.
> 
>   static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>   {                                                                            
>           struct inode *inode = file->f_path.dentry->d_inode;
>           loff_t len, vma_len;                               
>           int ret;                                           
>           struct hstate *h = hstate_file(file);              
>           ...                                                                               
>           if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))              
>                   return -EINVAL;                                              
> 
> This code checks only whether a given hugetlb vma covers (1 << order)
> pages, not whether it's exactly hugepage aligned.
> Before 2b37c35e6552 "fs/hugetlbfs/inode.c: fix pgoff alignment
> checking on 32-bit", it was
> 
>   if (vma->vm_pgoff & ~(huge_page_mask(h) >> PAGE_SHIFT))
> 
> , but this made no sense because ~(huge_page_mask(h) >> PAGE_SHIFT) is
> 0xff for 2M hugepage.
> I think the reported problem is not a bug because the behavior before
> this change was wrong or not as expected.
> 
> If we want to make sure that a given address range fits hugepage size,
> something like below can be useful.
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 78bde32..a98304b 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -113,11 +113,11 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
> -	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
> -		return -EINVAL;
> -
>  	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
>  
> +	if (vma->len & ~huge_page_mask(h))
> +		return -EINVAL;
> +
>  	mutex_lock(&inode->i_mutex);
>  	file_accessed(file);
>  
> 
> Thanks,
> Naoya Horiguchi
> 

Hi Naoya,

I think the -EINVAL is returned from hugetlb_get_unmapped_area(),
for the two testcases:
1) $ ./mmappu $((5 * 2 * 1024 * 1024 - 4096))	//len1 = 0x9ff000
2) $ ./mmappu $((5 * 2 * 1024 * 1024 - 4095))	//len2 = 0x9ff001

In do_mmap_pgoff(), after "len = PAGE_ALIGN(len);", len1 = 0x9ff000,
len2 = 0xa00000, so len2 will pass "if (len & ~huge_page_mask(h))" check in
hugetlb_get_unmapped_area(), and len1 will return -EINVAL. As follow:

do_mmap_pgoff()
{
	...
	/* Careful about overflows.. */
	len = PAGE_ALIGN(len);
	...
	get_unmapped_area()
		-->hugetlb_get_unmapped_area()
		   {
			...
			if (len & ~huge_page_mask(h))
				return -EINVAL;
			...
		   }
}

do we need to align len to hugepage size if it's hugetlbfs mmap? something like below:

---
 mm/mmap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 0db0de1..bd42be24 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1188,7 +1188,10 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 		addr = round_hint_to_min(addr);
 
 	/* Careful about overflows.. */
-	len = PAGE_ALIGN(len);
+	if (file && is_file_hugepages(file))
+		len = ALIGN(len, huge_page_size(hstate_file(file)));
+	else
+		len = PAGE_ALIGN(len);
 	if (!len)
 		return -ENOMEM;
 
-- 

Thanks,
Jianguo Wu

>>>           Platform: All
>>>         OS/Version: Linux
>>>               Tree: Mainline
>>>             Status: NEW
>>>           Severity: high
>>>           Priority: P1
>>>          Component: Other
>>>         AssignedTo: akpm@linux-foundation.org
>>>         ReportedBy: iceman_dvd@yahoo.com
>>>         Regression: No
>>>
>>>
>>> This is on an Ubuntu 12.10 desktop, but the same issue has been found on 12.04
>>> with 3.5.0 kernel.
>>> See the sample program. An allocation with MAP_HUGETLB consistently fails with
>>> certain sizes, while it succeeds with others.
>>> The allocation sizes are well below the number of free huge pages.
>>>
>>> $ uname -a Linux davide-lnx2 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25
>>> 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>>
>>> # echo 100 > /proc/sys/vm/nr_hugepages
>>>
>>> # cat /proc/meminfo
>>> ...
>>> AnonHugePages:         0 kB
>>> HugePages_Total:     100
>>> HugePages_Free:      100
>>> HugePages_Rsvd:        0
>>> HugePages_Surp:        0
>>> Hugepagesize:       2048 kB
>>>
>>>
>>> $ ./mmappu $((5 * 2 * 1024 * 1024 - 4096))
>>> size=10481664    0x9ff000
>>> hugepage mmap: Invalid argument
>>>
>>>
>>> $ ./mmappu $((5 * 2 * 1024 * 1024 - 4095))
>>> size=10481665    0x9ff001
>>> OK!
>>>
>>>
>>> It seems the trigger point is a normal page size.
>>> The same binary works flawlessly in previous kernels.
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24  8:20     ` Jianguo Wu
@ 2013-04-24 14:17       ` Naoya Horiguchi
  0 siblings, 0 replies; 21+ messages in thread
From: Naoya Horiguchi @ 2013-04-24 14:17 UTC (permalink / raw)
  To: Jianguo Wu; +Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd

On Wed, Apr 24, 2013 at 04:20:50PM +0800, Jianguo Wu wrote:
...
> Hi Naoya,
> 
> I think the -EINVAL is returned from hugetlb_get_unmapped_area(),
> for the two testcases:
> 1) $ ./mmappu $((5 * 2 * 1024 * 1024 - 4096))	//len1 = 0x9ff000
> 2) $ ./mmappu $((5 * 2 * 1024 * 1024 - 4095))	//len2 = 0x9ff001
> 
> In do_mmap_pgoff(), after "len = PAGE_ALIGN(len);", len1 = 0x9ff000,
> len2 = 0xa00000, so len2 will pass "if (len & ~huge_page_mask(h))" check in
> hugetlb_get_unmapped_area(), and len1 will return -EINVAL. As follow:
> 
> do_mmap_pgoff()
> {
> 	...
> 	/* Careful about overflows.. */
> 	len = PAGE_ALIGN(len);
> 	...
> 	get_unmapped_area()
> 		-->hugetlb_get_unmapped_area()
> 		   {
> 			...
> 			if (len & ~huge_page_mask(h))
> 				return -EINVAL;
> 			...
> 		   }
> }

You are right, Jianguo. Thanks you.
I totally missed the point.

> 
> do we need to align len to hugepage size if it's hugetlbfs mmap? something like below:
> 
> ---
>  mm/mmap.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0db0de1..bd42be24 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1188,7 +1188,10 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
>  		addr = round_hint_to_min(addr);
>  
>  	/* Careful about overflows.. */
> -	len = PAGE_ALIGN(len);
> +	if (file && is_file_hugepages(file))
> +		len = ALIGN(len, huge_page_size(hstate_file(file)));
> +	else
> +		len = PAGE_ALIGN(len);
>  	if (!len)
>  		return -ENOMEM;
>  
> -- 

I like putting this alignment code in if (flags & MAP_HUGETLB) branch in
SYSCALL_DEFINE6(mmap_pgoff) as Johannes pointed out in another subthread,
because it adds no impact on mmap calls with !MAP_HUGETLB.

Thanks,
Naoya Horiguchi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24  8:14   ` Johannes Weiner
@ 2013-04-24 15:16     ` Naoya Horiguchi
  2013-04-24 15:39       ` Johannes Weiner
  0 siblings, 1 reply; 21+ messages in thread
From: Naoya Horiguchi @ 2013-04-24 15:16 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove

On Wed, Apr 24, 2013 at 04:14:54AM -0400, Johannes Weiner wrote:
> On Tue, Apr 23, 2013 at 01:25:22PM -0700, Andrew Morton wrote:
...
> hugetlb_get_unmapped_area() expects a hugepage-aligned size argument.
> Before do_mmap_pgoff() calls it, it does len = PAGE_ALIGN(len), which
> is why the second case works and the first one does not.
> 
> How about this (untested) partial revert of the above patch that keeps
> the shm alignment fix?

I have one comment about the suggested patch.

> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 84e3d85..13a7d51 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -929,9 +929,8 @@ static struct dentry_operations anon_ops = {
>  	.d_dname = hugetlb_dname
>  };
>  
> -struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> -				size_t size, vm_flags_t acctflag,
> -				struct user_struct **user,
> +struct file *hugetlb_file_setup(const char *name, size_t size,
> +				vm_flags_t acctflag, struct user_struct **user,
>  				int creat_flags, int page_size_log)
>  {
>  	struct file *file = ERR_PTR(-ENOMEM);
> @@ -939,8 +938,6 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
>  	struct path path;
>  	struct super_block *sb;
>  	struct qstr quick_string;
> -	struct hstate *hstate;
> -	unsigned long num_pages;
>  	int hstate_idx;
>  
>  	hstate_idx = get_hstate_idx(page_size_log);
> @@ -980,12 +977,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
>  	if (!inode)
>  		goto out_dentry;
>  
> -	hstate = hstate_inode(inode);
> -	size += addr & ~huge_page_mask(hstate);
> -	num_pages = ALIGN(size, huge_page_size(hstate)) >>
> -			huge_page_shift(hstate);
>  	file = ERR_PTR(-ENOMEM);
> -	if (hugetlb_reserve_pages(inode, 0, num_pages, NULL, acctflag))
> +	if (hugetlb_reserve_pages(inode, 0,
> +			size >> huge_page_shift(hstate_inode(inode)), NULL,
> +			acctflag))
>  		goto out_inode;
>  
>  	d_instantiate(path.dentry, inode);
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 16e4e9a..437875c 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -185,8 +185,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
>  
>  extern const struct file_operations hugetlbfs_file_operations;
>  extern const struct vm_operations_struct hugetlb_vm_ops;
> -struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> -				size_t size, vm_flags_t acct,
> +struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
>  				struct user_struct **user, int creat_flags,
>  				int page_size_log);
>  
> @@ -205,9 +204,9 @@ static inline int is_file_hugepages(struct file *file)
>  
>  #define is_file_hugepages(file)			0
>  static inline struct file *
> -hugetlb_file_setup(const char *name, unsigned long addr, size_t size,
> -		vm_flags_t acctflag, struct user_struct **user, int creat_flags,
> -		int page_size_log)
> +hugetlb_file_setup(const char *name, size_t size, vm_flags_t acctflag,
> +		   struct user_struct **user, int creat_flags,
> +		   int page_size_log)
>  {
>  	return ERR_PTR(-ENOSYS);
>  }
> diff --git a/ipc/shm.c b/ipc/shm.c
> index cb858df..c1293d1 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -491,10 +491,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
>  
>  	sprintf (name, "SYSV%08x", key);
>  	if (shmflg & SHM_HUGETLB) {
> +		unsigned int hugesize;
> +
>  		/* hugetlb_file_setup applies strict accounting */
>  		if (shmflg & SHM_NORESERVE)
>  			acctflag = VM_NORESERVE;
> -		file = hugetlb_file_setup(name, 0, size, acctflag,
> +		hugesize = ALIGN(size, huge_page_size(&default_hstate));
> +		file = hugetlb_file_setup(name, hugesize, acctflag,
>  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
>  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
>  	} else {

Would it be better to find proper hstate instead of using default_hstate?

> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6466699..36342dd 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1335,7 +1335,8 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
>  		 * A dummy user value is used because we are not locking
>  		 * memory so no accounting is necessary
>  		 */
> -		file = hugetlb_file_setup(HUGETLB_ANON_FILE, addr, len,
> +		len = ALIGN(len, huge_page_size(&default_hstate));
> +		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
>  				VM_NORESERVE,
>  				&user, HUGETLB_ANONHUGE_INODE,
>  				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> 

ditto.

Thanks,
Naoya Horiguchi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24 15:16     ` Naoya Horiguchi
@ 2013-04-24 15:39       ` Johannes Weiner
  2013-04-24 23:05         ` Naoya Horiguchi
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2013-04-24 15:39 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove

On Wed, Apr 24, 2013 at 11:16:39AM -0400, Naoya Horiguchi wrote:
> On Wed, Apr 24, 2013 at 04:14:54AM -0400, Johannes Weiner wrote:
> > @@ -491,10 +491,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> >  
> >  	sprintf (name, "SYSV%08x", key);
> >  	if (shmflg & SHM_HUGETLB) {
> > +		unsigned int hugesize;
> > +
> >  		/* hugetlb_file_setup applies strict accounting */
> >  		if (shmflg & SHM_NORESERVE)
> >  			acctflag = VM_NORESERVE;
> > -		file = hugetlb_file_setup(name, 0, size, acctflag,
> > +		hugesize = ALIGN(size, huge_page_size(&default_hstate));
> > +		file = hugetlb_file_setup(name, hugesize, acctflag,
> >  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
> >  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
> >  	} else {
> 
> Would it be better to find proper hstate instead of using default_hstate?

You are probably right, I guess we can't assume default_hstate anymore
after page_size_log can be passed in.

Can we have hugetlb_file_setup() return an adjusted length, or an
alignment requirement?

Or pull the hstate lookup into the callsites (since they pass in
page_size_log to begin with)?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24 15:39       ` Johannes Weiner
@ 2013-04-24 23:05         ` Naoya Horiguchi
  2013-04-24 23:13           ` Naoya Horiguchi
                             ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Naoya Horiguchi @ 2013-04-24 23:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove

On Wed, Apr 24, 2013 at 11:39:51AM -0400, Johannes Weiner wrote:
> On Wed, Apr 24, 2013 at 11:16:39AM -0400, Naoya Horiguchi wrote:
> > On Wed, Apr 24, 2013 at 04:14:54AM -0400, Johannes Weiner wrote:
> > > @@ -491,10 +491,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> > >  
> > >  	sprintf (name, "SYSV%08x", key);
> > >  	if (shmflg & SHM_HUGETLB) {
> > > +		unsigned int hugesize;
> > > +
> > >  		/* hugetlb_file_setup applies strict accounting */
> > >  		if (shmflg & SHM_NORESERVE)
> > >  			acctflag = VM_NORESERVE;
> > > -		file = hugetlb_file_setup(name, 0, size, acctflag,
> > > +		hugesize = ALIGN(size, huge_page_size(&default_hstate));
> > > +		file = hugetlb_file_setup(name, hugesize, acctflag,
> > >  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
> > >  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
> > >  	} else {
> > 
> > Would it be better to find proper hstate instead of using default_hstate?
> 
> You are probably right, I guess we can't assume default_hstate anymore
> after page_size_log can be passed in.
> 
> Can we have hugetlb_file_setup() return an adjusted length, or an
> alignment requirement?

Yes, it's possible if callers pass the pointer of size (length) to
hugetlb_file_setup() and make it adjusted inside the function.
And as for alignment, I think it's not a hugetlb_file_setup's job,
so we don't have to do it in this function.

> Or pull the hstate lookup into the callsites (since they pass in
> page_size_log to begin with)?

This is also a possible solution, where we might need to define and
export a function converting hugepage order to hstate.

I like the former one, so wrote a patch like below.
# I added your Signed-off-by: because this's based on your draft patch.
# if you don't like it, please let me know.

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date: Wed, 24 Apr 2013 16:44:19 -0400
Subject: [PATCH] hugetlbfs: fix mmap failure in unaligned size request

As reported in https://bugzilla.kernel.org/show_bug.cgi?id=56881, current
kernel returns -EINVAL unless a given mmap length is "almost" hugepage
aligned. This is because in sys_mmap_pgoff() the given length is passed to
vm_mmap_pgoff() as it is without being aligned with hugepage boundary.

This is a regression introduced in commit 40716e29243d "hugetlbfs: fix
alignment of huge page requests", where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.

To fix this, this patch partially reverts that commit, and changes
the type of parameter size from size_t to (size_t *) in order to
align the size in caller side.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 fs/hugetlbfs/inode.c    | 20 ++++++++++----------
 include/linux/hugetlb.h |  7 +++----
 ipc/shm.c               |  2 +-
 mm/mmap.c               |  2 +-
 4 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 523464e..7adbd7b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -929,9 +929,8 @@ static struct dentry_operations anon_ops = {
 	.d_dname = hugetlb_dname
 };
 
-struct file *hugetlb_file_setup(const char *name, unsigned long addr,
-				size_t size, vm_flags_t acctflag,
-				struct user_struct **user,
+struct file *hugetlb_file_setup(const char *name, size_t *sizeptr,
+				vm_flags_t acctflag, struct user_struct **user,
 				int creat_flags, int page_size_log)
 {
 	struct file *file = ERR_PTR(-ENOMEM);
@@ -939,9 +938,8 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	struct path path;
 	struct super_block *sb;
 	struct qstr quick_string;
-	struct hstate *hstate;
-	unsigned long num_pages;
 	int hstate_idx;
+	size_t size;
 
 	hstate_idx = get_hstate_idx(page_size_log);
 	if (hstate_idx < 0)
@@ -951,6 +949,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	if (!hugetlbfs_vfsmount[hstate_idx])
 		return ERR_PTR(-ENOENT);
 
+	size = 1 << hstate_index_to_shift(hstate_idx);
+	if (sizeptr)
+		*sizeptr = ALIGN(*sizeptr, size);
+
 	if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
 		*user = current_user();
 		if (user_shm_lock(size, *user)) {
@@ -980,12 +982,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	if (!inode)
 		goto out_dentry;
 
-	hstate = hstate_inode(inode);
-	size += addr & ~huge_page_mask(hstate);
-	num_pages = ALIGN(size, huge_page_size(hstate)) >>
-			huge_page_shift(hstate);
 	file = ERR_PTR(-ENOMEM);
-	if (hugetlb_reserve_pages(inode, 0, num_pages, NULL, acctflag))
+	if (hugetlb_reserve_pages(inode, 0,
+			size >> huge_page_shift(hstate_inode(inode)), NULL,
+			acctflag))
 		goto out_inode;
 
 	d_instantiate(path.dentry, inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8220a8a..ca67d42 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -193,8 +193,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
 
 extern const struct file_operations hugetlbfs_file_operations;
 extern const struct vm_operations_struct hugetlb_vm_ops;
-struct file *hugetlb_file_setup(const char *name, unsigned long addr,
-				size_t size, vm_flags_t acct,
+struct file *hugetlb_file_setup(const char *name, size_t *size, vm_flags_t acct,
 				struct user_struct **user, int creat_flags,
 				int page_size_log);
 
@@ -213,8 +212,8 @@ static inline int is_file_hugepages(struct file *file)
 
 #define is_file_hugepages(file)			0
 static inline struct file *
-hugetlb_file_setup(const char *name, unsigned long addr, size_t size,
-		vm_flags_t acctflag, struct user_struct **user, int creat_flags,
+hugetlb_file_setup(const char *name, size_t *size, vm_flags_t acctflag,
+		struct user_struct **user, int creat_flags,
 		int page_size_log)
 {
 	return ERR_PTR(-ENOSYS);
diff --git a/ipc/shm.c b/ipc/shm.c
index cb858df..e2cb809 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -494,7 +494,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 		/* hugetlb_file_setup applies strict accounting */
 		if (shmflg & SHM_NORESERVE)
 			acctflag = VM_NORESERVE;
-		file = hugetlb_file_setup(name, 0, size, acctflag,
+		file = hugetlb_file_setup(name, NULL, acctflag,
 				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
 				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
 	} else {
diff --git a/mm/mmap.c b/mm/mmap.c
index 2664a47..d03a541 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1333,7 +1333,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 		 * A dummy user value is used because we are not locking
 		 * memory so no accounting is necessary
 		 */
-		file = hugetlb_file_setup(HUGETLB_ANON_FILE, addr, len,
+		file = hugetlb_file_setup(HUGETLB_ANON_FILE, &len,
 				VM_NORESERVE,
 				&user, HUGETLB_ANONHUGE_INODE,
 				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24 23:05         ` Naoya Horiguchi
@ 2013-04-24 23:13           ` Naoya Horiguchi
  2013-04-24 23:44             ` Johannes Weiner
  2013-04-24 23:26           ` Johannes Weiner
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Naoya Horiguchi @ 2013-04-24 23:13 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove

On Wed, Apr 24, 2013 at 07:05:35PM -0400, Naoya Horiguchi wrote:
...
> diff --git a/ipc/shm.c b/ipc/shm.c
> index cb858df..e2cb809 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -494,7 +494,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
>  		/* hugetlb_file_setup applies strict accounting */
>  		if (shmflg & SHM_NORESERVE)
>  			acctflag = VM_NORESERVE;
> -		file = hugetlb_file_setup(name, 0, size, acctflag,
> +		file = hugetlb_file_setup(name, NULL, acctflag,
>  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
>  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
>  	} else {

Ugh, NULL is not correct, it should be &size.
But anyway, I want your comment, and after that I'll repost with fixing this.

Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24 23:05         ` Naoya Horiguchi
  2013-04-24 23:13           ` Naoya Horiguchi
@ 2013-04-24 23:26           ` Johannes Weiner
  2013-04-25 21:00             ` Naoya Horiguchi
  2013-04-25  3:02           ` [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes Jianguo Wu
  2013-06-12 12:16           ` Aneesh Kumar K.V
  3 siblings, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2013-04-24 23:26 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove

On Wed, Apr 24, 2013 at 07:05:35PM -0400, Naoya Horiguchi wrote:
> On Wed, Apr 24, 2013 at 11:39:51AM -0400, Johannes Weiner wrote:
> > On Wed, Apr 24, 2013 at 11:16:39AM -0400, Naoya Horiguchi wrote:
> > > On Wed, Apr 24, 2013 at 04:14:54AM -0400, Johannes Weiner wrote:
> > > > @@ -491,10 +491,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> > > >  
> > > >  	sprintf (name, "SYSV%08x", key);
> > > >  	if (shmflg & SHM_HUGETLB) {
> > > > +		unsigned int hugesize;
> > > > +
> > > >  		/* hugetlb_file_setup applies strict accounting */
> > > >  		if (shmflg & SHM_NORESERVE)
> > > >  			acctflag = VM_NORESERVE;
> > > > -		file = hugetlb_file_setup(name, 0, size, acctflag,
> > > > +		hugesize = ALIGN(size, huge_page_size(&default_hstate));
> > > > +		file = hugetlb_file_setup(name, hugesize, acctflag,
> > > >  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
> > > >  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
> > > >  	} else {
> > > 
> > > Would it be better to find proper hstate instead of using default_hstate?
> > 
> > You are probably right, I guess we can't assume default_hstate anymore
> > after page_size_log can be passed in.
> > 
> > Can we have hugetlb_file_setup() return an adjusted length, or an
> > alignment requirement?
> 
> Yes, it's possible if callers pass the pointer of size (length) to
> hugetlb_file_setup() and make it adjusted inside the function.
> And as for alignment, I think it's not a hugetlb_file_setup's job,
> so we don't have to do it in this function.
> 
> > Or pull the hstate lookup into the callsites (since they pass in
> > page_size_log to begin with)?
> 
> This is also a possible solution, where we might need to define and
> export a function converting hugepage order to hstate.

After thinking about it some more, I would actually prefer this.  The
callsites have all the information and the file setup code should not
really care about the alignment requirements of the callers.

I.e. export something like get_hstate_idx() but which returns hstate,
then make the callers look it up, do the alignment, pass in the
aligned size and hstate instead of page_size_log.  Then they are free
to use the aligned size (mmap) or use the original size (shm).

> I like the former one, so wrote a patch like below.
> # I added your Signed-off-by: because this's based on your draft patch.
> # if you don't like it, please let me know.

Thanks, I appreciate it.  But usually if you take and modify a patch
add the original From: line to the changelog to give credit, then add
your own signoff and only add other people's signoff after they agree.

> @@ -929,9 +929,8 @@ static struct dentry_operations anon_ops = {
>  	.d_dname = hugetlb_dname
>  };
>  
> -struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> -				size_t size, vm_flags_t acctflag,
> -				struct user_struct **user,
> +struct file *hugetlb_file_setup(const char *name, size_t *sizeptr,
> +				vm_flags_t acctflag, struct user_struct **user,
>  				int creat_flags, int page_size_log)
>  {
>  	struct file *file = ERR_PTR(-ENOMEM);
> @@ -939,9 +938,8 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
>  	struct path path;
>  	struct super_block *sb;
>  	struct qstr quick_string;
> -	struct hstate *hstate;
> -	unsigned long num_pages;
>  	int hstate_idx;
> +	size_t size;
>  
>  	hstate_idx = get_hstate_idx(page_size_log);
>  	if (hstate_idx < 0)
> @@ -951,6 +949,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
>  	if (!hugetlbfs_vfsmount[hstate_idx])
>  		return ERR_PTR(-ENOENT);
>  
> +	size = 1 << hstate_index_to_shift(hstate_idx);
> +	if (sizeptr)
> +		*sizeptr = ALIGN(*sizeptr, size);

You always assume the file will just be one hugepage in size?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24 23:13           ` Naoya Horiguchi
@ 2013-04-24 23:44             ` Johannes Weiner
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2013-04-24 23:44 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove

On Wed, Apr 24, 2013 at 07:13:08PM -0400, Naoya Horiguchi wrote:
> On Wed, Apr 24, 2013 at 07:05:35PM -0400, Naoya Horiguchi wrote:
> ...
> > diff --git a/ipc/shm.c b/ipc/shm.c
> > index cb858df..e2cb809 100644
> > --- a/ipc/shm.c
> > +++ b/ipc/shm.c
> > @@ -494,7 +494,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> >  		/* hugetlb_file_setup applies strict accounting */
> >  		if (shmflg & SHM_NORESERVE)
> >  			acctflag = VM_NORESERVE;
> > -		file = hugetlb_file_setup(name, 0, size, acctflag,
> > +		file = hugetlb_file_setup(name, NULL, acctflag,
> >  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
> >  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
> >  	} else {
> 
> Ugh, NULL is not correct, it should be &size.

No, shm does not want its size variable aligned, it wants the segment
to be the originally requested size...  only mmap uses the aligned
size later on.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24 23:05         ` Naoya Horiguchi
  2013-04-24 23:13           ` Naoya Horiguchi
  2013-04-24 23:26           ` Johannes Weiner
@ 2013-04-25  3:02           ` Jianguo Wu
  2013-04-25 21:03             ` Naoya Horiguchi
  2013-06-12 12:16           ` Aneesh Kumar K.V
  3 siblings, 1 reply; 21+ messages in thread
From: Jianguo Wu @ 2013-04-25  3:02 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Johannes Weiner, Andrew Morton, linux-mm, bugzilla-daemon,
	iceman_dvd, Steven Truelove

On 2013/4/25 7:05, Naoya Horiguchi wrote:

> On Wed, Apr 24, 2013 at 11:39:51AM -0400, Johannes Weiner wrote:
>> On Wed, Apr 24, 2013 at 11:16:39AM -0400, Naoya Horiguchi wrote:
>>> On Wed, Apr 24, 2013 at 04:14:54AM -0400, Johannes Weiner wrote:
>>>> @@ -491,10 +491,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
>>>>  
>>>>  	sprintf (name, "SYSV%08x", key);
>>>>  	if (shmflg & SHM_HUGETLB) {
>>>> +		unsigned int hugesize;
>>>> +
>>>>  		/* hugetlb_file_setup applies strict accounting */
>>>>  		if (shmflg & SHM_NORESERVE)
>>>>  			acctflag = VM_NORESERVE;
>>>> -		file = hugetlb_file_setup(name, 0, size, acctflag,
>>>> +		hugesize = ALIGN(size, huge_page_size(&default_hstate));
>>>> +		file = hugetlb_file_setup(name, hugesize, acctflag,
>>>>  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
>>>>  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
>>>>  	} else {
>>>
>>> Would it be better to find proper hstate instead of using default_hstate?
>>
>> You are probably right, I guess we can't assume default_hstate anymore
>> after page_size_log can be passed in.
>>
>> Can we have hugetlb_file_setup() return an adjusted length, or an
>> alignment requirement?
> 
> Yes, it's possible if callers pass the pointer of size (length) to
> hugetlb_file_setup() and make it adjusted inside the function.
> And as for alignment, I think it's not a hugetlb_file_setup's job,
> so we don't have to do it in this function.
> 
>> Or pull the hstate lookup into the callsites (since they pass in
>> page_size_log to begin with)?
> 
> This is also a possible solution, where we might need to define and
> export a function converting hugepage order to hstate.
> 
> I like the former one, so wrote a patch like below.
> # I added your Signed-off-by: because this's based on your draft patch.
> # if you don't like it, please let me know.
> 
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Wed, 24 Apr 2013 16:44:19 -0400
> Subject: [PATCH] hugetlbfs: fix mmap failure in unaligned size request
> 
> As reported in https://bugzilla.kernel.org/show_bug.cgi?id=56881, current
> kernel returns -EINVAL unless a given mmap length is "almost" hugepage
> aligned. This is because in sys_mmap_pgoff() the given length is passed to
> vm_mmap_pgoff() as it is without being aligned with hugepage boundary.
> 
> This is a regression introduced in commit 40716e29243d "hugetlbfs: fix
> alignment of huge page requests", where alignment code is pushed into
> hugetlb_file_setup() and the variable len in caller side is not changed.
> 
> To fix this, this patch partially reverts that commit, and changes
> the type of parameter size from size_t to (size_t *) in order to
> align the size in caller side.
> 

Hi Naoya,

This patch only fix anonymous hugetlb mmap case, should also fix hugetlbfs file mmap case?

diff --git a/mm/mmap.c b/mm/mmap.c
index 0db0de1..5ed9561 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1327,6 +1327,8 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 		file = fget(fd);
 		if (!file)
 			goto out;
+		else if (is_file_hugepages(file))
+			len = ALIGN(len, huge_page_size(hstate_file(file)));
 	} else if (flags & MAP_HUGETLB) {
 		struct user_struct *user = NULL;
 		/*

Thanks,
Jianguo Wu

> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  fs/hugetlbfs/inode.c    | 20 ++++++++++----------
>  include/linux/hugetlb.h |  7 +++----
>  ipc/shm.c               |  2 +-
>  mm/mmap.c               |  2 +-
>  4 files changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 523464e..7adbd7b 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -929,9 +929,8 @@ static struct dentry_operations anon_ops = {
>  	.d_dname = hugetlb_dname
>  };
>  
> -struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> -				size_t size, vm_flags_t acctflag,
> -				struct user_struct **user,
> +struct file *hugetlb_file_setup(const char *name, size_t *sizeptr,
> +				vm_flags_t acctflag, struct user_struct **user,
>  				int creat_flags, int page_size_log)
>  {
>  	struct file *file = ERR_PTR(-ENOMEM);
> @@ -939,9 +938,8 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
>  	struct path path;
>  	struct super_block *sb;
>  	struct qstr quick_string;
> -	struct hstate *hstate;
> -	unsigned long num_pages;
>  	int hstate_idx;
> +	size_t size;
>  
>  	hstate_idx = get_hstate_idx(page_size_log);
>  	if (hstate_idx < 0)
> @@ -951,6 +949,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
>  	if (!hugetlbfs_vfsmount[hstate_idx])
>  		return ERR_PTR(-ENOENT);
>  
> +	size = 1 << hstate_index_to_shift(hstate_idx);
> +	if (sizeptr)
> +		*sizeptr = ALIGN(*sizeptr, size);
> +
>  	if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
>  		*user = current_user();
>  		if (user_shm_lock(size, *user)) {
> @@ -980,12 +982,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
>  	if (!inode)
>  		goto out_dentry;
>  
> -	hstate = hstate_inode(inode);
> -	size += addr & ~huge_page_mask(hstate);
> -	num_pages = ALIGN(size, huge_page_size(hstate)) >>
> -			huge_page_shift(hstate);
>  	file = ERR_PTR(-ENOMEM);
> -	if (hugetlb_reserve_pages(inode, 0, num_pages, NULL, acctflag))
> +	if (hugetlb_reserve_pages(inode, 0,
> +			size >> huge_page_shift(hstate_inode(inode)), NULL,
> +			acctflag))
>  		goto out_inode;
>  
>  	d_instantiate(path.dentry, inode);
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 8220a8a..ca67d42 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -193,8 +193,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
>  
>  extern const struct file_operations hugetlbfs_file_operations;
>  extern const struct vm_operations_struct hugetlb_vm_ops;
> -struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> -				size_t size, vm_flags_t acct,
> +struct file *hugetlb_file_setup(const char *name, size_t *size, vm_flags_t acct,
>  				struct user_struct **user, int creat_flags,
>  				int page_size_log);
>  
> @@ -213,8 +212,8 @@ static inline int is_file_hugepages(struct file *file)
>  
>  #define is_file_hugepages(file)			0
>  static inline struct file *
> -hugetlb_file_setup(const char *name, unsigned long addr, size_t size,
> -		vm_flags_t acctflag, struct user_struct **user, int creat_flags,
> +hugetlb_file_setup(const char *name, size_t *size, vm_flags_t acctflag,
> +		struct user_struct **user, int creat_flags,
>  		int page_size_log)
>  {
>  	return ERR_PTR(-ENOSYS);
> diff --git a/ipc/shm.c b/ipc/shm.c
> index cb858df..e2cb809 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -494,7 +494,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
>  		/* hugetlb_file_setup applies strict accounting */
>  		if (shmflg & SHM_NORESERVE)
>  			acctflag = VM_NORESERVE;
> -		file = hugetlb_file_setup(name, 0, size, acctflag,
> +		file = hugetlb_file_setup(name, NULL, acctflag,
>  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
>  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
>  	} else {
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 2664a47..d03a541 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1333,7 +1333,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
>  		 * A dummy user value is used because we are not locking
>  		 * memory so no accounting is necessary
>  		 */
> -		file = hugetlb_file_setup(HUGETLB_ANON_FILE, addr, len,
> +		file = hugetlb_file_setup(HUGETLB_ANON_FILE, &len,
>  				VM_NORESERVE,
>  				&user, HUGETLB_ANONHUGE_INODE,
>  				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24 23:26           ` Johannes Weiner
@ 2013-04-25 21:00             ` Naoya Horiguchi
  2013-04-26  4:35               ` [PATCH v2] hugetlbfs: fix mmap failure in unaligned size request Naoya Horiguchi
  0 siblings, 1 reply; 21+ messages in thread
From: Naoya Horiguchi @ 2013-04-25 21:00 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove

On Wed, Apr 24, 2013 at 07:26:00PM -0400, Johannes Weiner wrote:
> On Wed, Apr 24, 2013 at 07:05:35PM -0400, Naoya Horiguchi wrote:
> > On Wed, Apr 24, 2013 at 11:39:51AM -0400, Johannes Weiner wrote:
> > > On Wed, Apr 24, 2013 at 11:16:39AM -0400, Naoya Horiguchi wrote:
> > > > On Wed, Apr 24, 2013 at 04:14:54AM -0400, Johannes Weiner wrote:
> > > > > @@ -491,10 +491,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> > > > >  
> > > > >  	sprintf (name, "SYSV%08x", key);
> > > > >  	if (shmflg & SHM_HUGETLB) {
> > > > > +		unsigned int hugesize;
> > > > > +
> > > > >  		/* hugetlb_file_setup applies strict accounting */
> > > > >  		if (shmflg & SHM_NORESERVE)
> > > > >  			acctflag = VM_NORESERVE;
> > > > > -		file = hugetlb_file_setup(name, 0, size, acctflag,
> > > > > +		hugesize = ALIGN(size, huge_page_size(&default_hstate));
> > > > > +		file = hugetlb_file_setup(name, hugesize, acctflag,
> > > > >  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
> > > > >  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
> > > > >  	} else {
> > > > 
> > > > Would it be better to find proper hstate instead of using default_hstate?
> > > 
> > > You are probably right, I guess we can't assume default_hstate anymore
> > > after page_size_log can be passed in.
> > > 
> > > Can we have hugetlb_file_setup() return an adjusted length, or an
> > > alignment requirement?
> > 
> > Yes, it's possible if callers pass the pointer of size (length) to
> > hugetlb_file_setup() and make it adjusted inside the function.
> > And as for alignment, I think it's not a hugetlb_file_setup's job,
> > so we don't have to do it in this function.
> > 
> > > Or pull the hstate lookup into the callsites (since they pass in
> > > page_size_log to begin with)?
> > 
> > This is also a possible solution, where we might need to define and
> > export a function converting hugepage order to hstate.
> 
> After thinking about it some more, I would actually prefer this.  The
> callsites have all the information and the file setup code should not
> really care about the alignment requirements of the callers.
> 
> I.e. export something like get_hstate_idx() but which returns hstate,
> then make the callers look it up, do the alignment, pass in the
> aligned size and hstate instead of page_size_log.  Then they are free
> to use the aligned size (mmap) or use the original size (shm).

OK. I'll do this.

> > I like the former one, so wrote a patch like below.
> > # I added your Signed-off-by: because this's based on your draft patch.
> > # if you don't like it, please let me know.
> 
> Thanks, I appreciate it.  But usually if you take and modify a patch
> add the original From: line to the changelog to give credit, then add
> your own signoff and only add other people's signoff after they agree.

OK, got it.

> > @@ -929,9 +929,8 @@ static struct dentry_operations anon_ops = {
> >  	.d_dname = hugetlb_dname
> >  };
> >  
> > -struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> > -				size_t size, vm_flags_t acctflag,
> > -				struct user_struct **user,
> > +struct file *hugetlb_file_setup(const char *name, size_t *sizeptr,
> > +				vm_flags_t acctflag, struct user_struct **user,
> >  				int creat_flags, int page_size_log)
> >  {
> >  	struct file *file = ERR_PTR(-ENOMEM);
> > @@ -939,9 +938,8 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> >  	struct path path;
> >  	struct super_block *sb;
> >  	struct qstr quick_string;
> > -	struct hstate *hstate;
> > -	unsigned long num_pages;
> >  	int hstate_idx;
> > +	size_t size;
> >  
> >  	hstate_idx = get_hstate_idx(page_size_log);
> >  	if (hstate_idx < 0)
> > @@ -951,6 +949,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> >  	if (!hugetlbfs_vfsmount[hstate_idx])
> >  		return ERR_PTR(-ENOENT);
> >  
> > +	size = 1 << hstate_index_to_shift(hstate_idx);
> > +	if (sizeptr)
> > +		*sizeptr = ALIGN(*sizeptr, size);
> 
> You always assume the file will just be one hugepage in size?

No, this line means that *sizeptr (given by the caller) is round up
to the multiple of hugepage's size. So assuming size is 2MB, for example,
if 5MB is given it's round up to 6MB in return (3 hugepages.)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-25  3:02           ` [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes Jianguo Wu
@ 2013-04-25 21:03             ` Naoya Horiguchi
  0 siblings, 0 replies; 21+ messages in thread
From: Naoya Horiguchi @ 2013-04-25 21:03 UTC (permalink / raw)
  To: Jianguo Wu
  Cc: Johannes Weiner, Andrew Morton, linux-mm, bugzilla-daemon,
	iceman_dvd, Steven Truelove

On Thu, Apr 25, 2013 at 11:02:59AM +0800, Jianguo Wu wrote:
> On 2013/4/25 7:05, Naoya Horiguchi wrote:
> 
> > On Wed, Apr 24, 2013 at 11:39:51AM -0400, Johannes Weiner wrote:
> >> On Wed, Apr 24, 2013 at 11:16:39AM -0400, Naoya Horiguchi wrote:
> >>> On Wed, Apr 24, 2013 at 04:14:54AM -0400, Johannes Weiner wrote:
> >>>> @@ -491,10 +491,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> >>>>  
> >>>>  	sprintf (name, "SYSV%08x", key);
> >>>>  	if (shmflg & SHM_HUGETLB) {
> >>>> +		unsigned int hugesize;
> >>>> +
> >>>>  		/* hugetlb_file_setup applies strict accounting */
> >>>>  		if (shmflg & SHM_NORESERVE)
> >>>>  			acctflag = VM_NORESERVE;
> >>>> -		file = hugetlb_file_setup(name, 0, size, acctflag,
> >>>> +		hugesize = ALIGN(size, huge_page_size(&default_hstate));
> >>>> +		file = hugetlb_file_setup(name, hugesize, acctflag,
> >>>>  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
> >>>>  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
> >>>>  	} else {
> >>>
> >>> Would it be better to find proper hstate instead of using default_hstate?
> >>
> >> You are probably right, I guess we can't assume default_hstate anymore
> >> after page_size_log can be passed in.
> >>
> >> Can we have hugetlb_file_setup() return an adjusted length, or an
> >> alignment requirement?
> > 
> > Yes, it's possible if callers pass the pointer of size (length) to
> > hugetlb_file_setup() and make it adjusted inside the function.
> > And as for alignment, I think it's not a hugetlb_file_setup's job,
> > so we don't have to do it in this function.
> > 
> >> Or pull the hstate lookup into the callsites (since they pass in
> >> page_size_log to begin with)?
> > 
> > This is also a possible solution, where we might need to define and
> > export a function converting hugepage order to hstate.
> > 
> > I like the former one, so wrote a patch like below.
> > # I added your Signed-off-by: because this's based on your draft patch.
> > # if you don't like it, please let me know.
> > 
> > Thanks,
> > Naoya Horiguchi
> > ---
> > From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Date: Wed, 24 Apr 2013 16:44:19 -0400
> > Subject: [PATCH] hugetlbfs: fix mmap failure in unaligned size request
> > 
> > As reported in https://bugzilla.kernel.org/show_bug.cgi?id=56881, current
> > kernel returns -EINVAL unless a given mmap length is "almost" hugepage
> > aligned. This is because in sys_mmap_pgoff() the given length is passed to
> > vm_mmap_pgoff() as it is without being aligned with hugepage boundary.
> > 
> > This is a regression introduced in commit 40716e29243d "hugetlbfs: fix
> > alignment of huge page requests", where alignment code is pushed into
> > hugetlb_file_setup() and the variable len in caller side is not changed.
> > 
> > To fix this, this patch partially reverts that commit, and changes
> > the type of parameter size from size_t to (size_t *) in order to
> > align the size in caller side.
> > 
> 
> Hi Naoya,
> 
> This patch only fix anonymous hugetlb mmap case, should also fix hugetlbfs file mmap case?

Right, thank you, Jianguo.

> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0db0de1..5ed9561 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1327,6 +1327,8 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
>  		file = fget(fd);
>  		if (!file)
>  			goto out;
> +		else if (is_file_hugepages(file))
> +			len = ALIGN(len, huge_page_size(hstate_file(file)));
>  	} else if (flags & MAP_HUGETLB) {
>  		struct user_struct *user = NULL;
>  		/*

I'll added this in next post.

Thanks,
Naoya Horiguchi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] hugetlbfs: fix mmap failure in unaligned size request
  2013-04-25 21:00             ` Naoya Horiguchi
@ 2013-04-26  4:35               ` Naoya Horiguchi
  2013-04-30 16:45                 ` Johannes Weiner
  0 siblings, 1 reply; 21+ messages in thread
From: Naoya Horiguchi @ 2013-04-26  4:35 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove, Jianguo Wu

Here is a revised patch.
Thank you for the nice feedback, Johannes, Jianguo.
---
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date: Fri, 26 Apr 2013 00:31:18 -0400
Subject: [PATCH v2] hugetlbfs: fix mmap failure in unaligned size request

As reported in https://bugzilla.kernel.org/show_bug.cgi?id=56881, current
kernel returns -EINVAL unless a given mmap length is "almost" hugepage
aligned. This is because in sys_mmap_pgoff() the given length is passed to
vm_mmap_pgoff() as it is without being aligned with hugepage boundary.

This is a regression introduced in commit 40716e29243d "hugetlbfs: fix
alignment of huge page requests", where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.

To fix this, this patch partially reverts that commit, and adds alignment
code in caller side. And it also introduces hstate_sizelog() in order to
get proper hstate to specified hugepage size.

ChangeLog v2:
 - introduce hstate_sizelog to calculate alignment in caller side
 - add ALIGN also in file mmap path.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 fs/hugetlbfs/inode.c    | 24 ++++++++++--------------
 include/linux/hugetlb.h | 19 +++++++++++++------
 ipc/shm.c               |  6 +++++-
 mm/mmap.c               |  7 ++++++-
 4 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 523464e..a3f868a 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -909,11 +909,8 @@ static int can_do_hugetlb_shm(void)
 
 static int get_hstate_idx(int page_size_log)
 {
-	struct hstate *h;
+	struct hstate *h = hstate_sizelog(page_size_log);
 
-	if (!page_size_log)
-		return default_hstate_idx;
-	h = size_to_hstate(1 << page_size_log);
 	if (!h)
 		return -1;
 	return h - hstates;
@@ -929,9 +926,12 @@ static struct dentry_operations anon_ops = {
 	.d_dname = hugetlb_dname
 };
 
-struct file *hugetlb_file_setup(const char *name, unsigned long addr,
-				size_t size, vm_flags_t acctflag,
-				struct user_struct **user,
+/*
+ * Note that size should be aligned to proper hugepage size in caller side,
+ * otherwise hugetlb_reserve_pages reserves one less hugepages than intended.
+ */
+struct file *hugetlb_file_setup(const char *name, size_t size,
+				vm_flags_t acctflag, struct user_struct **user,
 				int creat_flags, int page_size_log)
 {
 	struct file *file = ERR_PTR(-ENOMEM);
@@ -939,8 +939,6 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	struct path path;
 	struct super_block *sb;
 	struct qstr quick_string;
-	struct hstate *hstate;
-	unsigned long num_pages;
 	int hstate_idx;
 
 	hstate_idx = get_hstate_idx(page_size_log);
@@ -980,12 +978,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	if (!inode)
 		goto out_dentry;
 
-	hstate = hstate_inode(inode);
-	size += addr & ~huge_page_mask(hstate);
-	num_pages = ALIGN(size, huge_page_size(hstate)) >>
-			huge_page_shift(hstate);
 	file = ERR_PTR(-ENOMEM);
-	if (hugetlb_reserve_pages(inode, 0, num_pages, NULL, acctflag))
+	if (hugetlb_reserve_pages(inode, 0,
+			size >> huge_page_shift(hstate_inode(inode)), NULL,
+			acctflag))
 		goto out_inode;
 
 	d_instantiate(path.dentry, inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8220a8a..c78f5a2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -193,8 +193,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
 
 extern const struct file_operations hugetlbfs_file_operations;
 extern const struct vm_operations_struct hugetlb_vm_ops;
-struct file *hugetlb_file_setup(const char *name, unsigned long addr,
-				size_t size, vm_flags_t acct,
+struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
 				struct user_struct **user, int creat_flags,
 				int page_size_log);
 
@@ -213,8 +212,8 @@ static inline int is_file_hugepages(struct file *file)
 
 #define is_file_hugepages(file)			0
 static inline struct file *
-hugetlb_file_setup(const char *name, unsigned long addr, size_t size,
-		vm_flags_t acctflag, struct user_struct **user, int creat_flags,
+hugetlb_file_setup(const char *name, size_t size, vm_flags_t acctflag,
+		struct user_struct **user, int creat_flags,
 		int page_size_log)
 {
 	return ERR_PTR(-ENOSYS);
@@ -294,6 +293,13 @@ static inline struct hstate *hstate_file(struct file *f)
 	return hstate_inode(file_inode(f));
 }
 
+static inline struct hstate *hstate_sizelog(int page_size_log)
+{
+	if (!page_size_log)
+		return &default_hstate;
+	return size_to_hstate(1 << page_size_log);
+}
+
 static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
 {
 	return hstate_file(vma->vm_file);
@@ -361,12 +367,13 @@ static inline int hstate_index(struct hstate *h)
 extern void dissolve_free_huge_pages(unsigned long start_pfn,
 				     unsigned long end_pfn);
 
-#else
+#else /* !CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_bootmem_huge_page(h) NULL
 #define hstate_file(f) NULL
+#define hstate_sizelog(s) NULL
 #define hstate_vma(v) NULL
 #define hstate_inode(i) NULL
 #define huge_page_size(h) PAGE_SIZE
@@ -382,6 +389,6 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 #define hstate_index_to_shift(index) 0
 #define hstate_index(h) 0
 #define dissolve_free_huge_pages(s, e) 0
-#endif
+#endif /* !CONFIG_HUGETLB_PAGE */
 
 #endif /* _LINUX_HUGETLB_H */
diff --git a/ipc/shm.c b/ipc/shm.c
index cb858df..e316cb9 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -491,10 +491,14 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 
 	sprintf (name, "SYSV%08x", key);
 	if (shmflg & SHM_HUGETLB) {
+		struct hstate *hs = hstate_sizelog((shmflg >> SHM_HUGE_SHIFT)
+						& SHM_HUGE_MASK);
+		size_t hugesize = ALIGN(size, huge_page_size(hs));
+
 		/* hugetlb_file_setup applies strict accounting */
 		if (shmflg & SHM_NORESERVE)
 			acctflag = VM_NORESERVE;
-		file = hugetlb_file_setup(name, 0, size, acctflag,
+		file = hugetlb_file_setup(name, hugesize, acctflag,
 				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
 				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
 	} else {
diff --git a/mm/mmap.c b/mm/mmap.c
index 2664a47..212721f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1325,15 +1325,20 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 		file = fget(fd);
 		if (!file)
 			goto out;
+		if (is_file_hugepages(file))
+			len = ALIGN(len, huge_page_size(hstate_file(file)));
 	} else if (flags & MAP_HUGETLB) {
 		struct user_struct *user = NULL;
+		struct hstate *hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT)
+						& MAP_HUGE_MASK);
+		len = ALIGN(len, huge_page_size(hs));
 		/*
 		 * VM_NORESERVE is used because the reservations will be
 		 * taken when vm_ops->mmap() is called
 		 * A dummy user value is used because we are not locking
 		 * memory so no accounting is necessary
 		 */
-		file = hugetlb_file_setup(HUGETLB_ANON_FILE, addr, len,
+		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
 				VM_NORESERVE,
 				&user, HUGETLB_ANONHUGE_INODE,
 				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] hugetlbfs: fix mmap failure in unaligned size request
  2013-04-26  4:35               ` [PATCH v2] hugetlbfs: fix mmap failure in unaligned size request Naoya Horiguchi
@ 2013-04-30 16:45                 ` Johannes Weiner
  2013-04-30 17:02                   ` [PATCH v3] " Naoya Horiguchi
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2013-04-30 16:45 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove, Jianguo Wu

On Fri, Apr 26, 2013 at 12:35:12AM -0400, Naoya Horiguchi wrote:
> Here is a revised patch.
> Thank you for the nice feedback, Johannes, Jianguo.

FWIW, this looks good to me.  Could you include

Reported-by: iceman_dvd@yahoo.com

and resend it to Andrew?  Unless Andrew sees and picks it directly
from this thread.  Hi, Andrew!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3] hugetlbfs: fix mmap failure in unaligned size request
  2013-04-30 16:45                 ` Johannes Weiner
@ 2013-04-30 17:02                   ` Naoya Horiguchi
  2013-05-01  8:00                     ` Sam Ben
  0 siblings, 1 reply; 21+ messages in thread
From: Naoya Horiguchi @ 2013-04-30 17:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove, Jianguo Wu

On Tue, Apr 30, 2013 at 12:45:03PM -0400, Johannes Weiner wrote:
> On Fri, Apr 26, 2013 at 12:35:12AM -0400, Naoya Horiguchi wrote:
> > Here is a revised patch.
> > Thank you for the nice feedback, Johannes, Jianguo.
> 
> FWIW, this looks good to me.  Could you include
> 
> Reported-by: iceman_dvd@yahoo.com

OK, added.
And thank you for the report, iceman_dvd :)

> and resend it to Andrew?  Unless Andrew sees and picks it directly
> from this thread.  Hi, Andrew!

Andrew, could you review and merge this into your tree?

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date: Fri, 26 Apr 2013 00:31:18 -0400
Subject: [PATCH v3] hugetlbfs: fix mmap failure in unaligned size request

As reported in https://bugzilla.kernel.org/show_bug.cgi?id=56881, current
kernel returns -EINVAL unless a given mmap length is "almost" hugepage
aligned. This is because in sys_mmap_pgoff() the given length is passed to
vm_mmap_pgoff() as it is without being aligned with hugepage boundary.

This is a regression introduced in commit 40716e29243d "hugetlbfs: fix
alignment of huge page requests", where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.

To fix this, this patch partially reverts that commit, and adds alignment
code in caller side. And it also introduces hstate_sizelog() in order to
get proper hstate to specified hugepage size.

ChangeLog v3:
 - add Reported-by

ChangeLog v2:
 - introduce hstate_sizelog to calculate alignment in caller side
 - add ALIGN also in file mmap path.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: iceman_dvd@yahoo.com
---
 fs/hugetlbfs/inode.c    | 24 ++++++++++--------------
 include/linux/hugetlb.h | 19 +++++++++++++------
 ipc/shm.c               |  6 +++++-
 mm/mmap.c               |  7 ++++++-
 4 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 523464e..a3f868a 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -909,11 +909,8 @@ static int can_do_hugetlb_shm(void)
 
 static int get_hstate_idx(int page_size_log)
 {
-	struct hstate *h;
+	struct hstate *h = hstate_sizelog(page_size_log);
 
-	if (!page_size_log)
-		return default_hstate_idx;
-	h = size_to_hstate(1 << page_size_log);
 	if (!h)
 		return -1;
 	return h - hstates;
@@ -929,9 +926,12 @@ static struct dentry_operations anon_ops = {
 	.d_dname = hugetlb_dname
 };
 
-struct file *hugetlb_file_setup(const char *name, unsigned long addr,
-				size_t size, vm_flags_t acctflag,
-				struct user_struct **user,
+/*
+ * Note that size should be aligned to proper hugepage size in caller side,
+ * otherwise hugetlb_reserve_pages reserves one less hugepages than intended.
+ */
+struct file *hugetlb_file_setup(const char *name, size_t size,
+				vm_flags_t acctflag, struct user_struct **user,
 				int creat_flags, int page_size_log)
 {
 	struct file *file = ERR_PTR(-ENOMEM);
@@ -939,8 +939,6 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	struct path path;
 	struct super_block *sb;
 	struct qstr quick_string;
-	struct hstate *hstate;
-	unsigned long num_pages;
 	int hstate_idx;
 
 	hstate_idx = get_hstate_idx(page_size_log);
@@ -980,12 +978,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	if (!inode)
 		goto out_dentry;
 
-	hstate = hstate_inode(inode);
-	size += addr & ~huge_page_mask(hstate);
-	num_pages = ALIGN(size, huge_page_size(hstate)) >>
-			huge_page_shift(hstate);
 	file = ERR_PTR(-ENOMEM);
-	if (hugetlb_reserve_pages(inode, 0, num_pages, NULL, acctflag))
+	if (hugetlb_reserve_pages(inode, 0,
+			size >> huge_page_shift(hstate_inode(inode)), NULL,
+			acctflag))
 		goto out_inode;
 
 	d_instantiate(path.dentry, inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8220a8a..c78f5a2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -193,8 +193,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
 
 extern const struct file_operations hugetlbfs_file_operations;
 extern const struct vm_operations_struct hugetlb_vm_ops;
-struct file *hugetlb_file_setup(const char *name, unsigned long addr,
-				size_t size, vm_flags_t acct,
+struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
 				struct user_struct **user, int creat_flags,
 				int page_size_log);
 
@@ -213,8 +212,8 @@ static inline int is_file_hugepages(struct file *file)
 
 #define is_file_hugepages(file)			0
 static inline struct file *
-hugetlb_file_setup(const char *name, unsigned long addr, size_t size,
-		vm_flags_t acctflag, struct user_struct **user, int creat_flags,
+hugetlb_file_setup(const char *name, size_t size, vm_flags_t acctflag,
+		struct user_struct **user, int creat_flags,
 		int page_size_log)
 {
 	return ERR_PTR(-ENOSYS);
@@ -294,6 +293,13 @@ static inline struct hstate *hstate_file(struct file *f)
 	return hstate_inode(file_inode(f));
 }
 
+static inline struct hstate *hstate_sizelog(int page_size_log)
+{
+	if (!page_size_log)
+		return &default_hstate;
+	return size_to_hstate(1 << page_size_log);
+}
+
 static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
 {
 	return hstate_file(vma->vm_file);
@@ -361,12 +367,13 @@ static inline int hstate_index(struct hstate *h)
 extern void dissolve_free_huge_pages(unsigned long start_pfn,
 				     unsigned long end_pfn);
 
-#else
+#else /* !CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_bootmem_huge_page(h) NULL
 #define hstate_file(f) NULL
+#define hstate_sizelog(s) NULL
 #define hstate_vma(v) NULL
 #define hstate_inode(i) NULL
 #define huge_page_size(h) PAGE_SIZE
@@ -382,6 +389,6 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 #define hstate_index_to_shift(index) 0
 #define hstate_index(h) 0
 #define dissolve_free_huge_pages(s, e) 0
-#endif
+#endif /* !CONFIG_HUGETLB_PAGE */
 
 #endif /* _LINUX_HUGETLB_H */
diff --git a/ipc/shm.c b/ipc/shm.c
index cb858df..e316cb9 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -491,10 +491,14 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 
 	sprintf (name, "SYSV%08x", key);
 	if (shmflg & SHM_HUGETLB) {
+		struct hstate *hs = hstate_sizelog((shmflg >> SHM_HUGE_SHIFT)
+						& SHM_HUGE_MASK);
+		size_t hugesize = ALIGN(size, huge_page_size(hs));
+
 		/* hugetlb_file_setup applies strict accounting */
 		if (shmflg & SHM_NORESERVE)
 			acctflag = VM_NORESERVE;
-		file = hugetlb_file_setup(name, 0, size, acctflag,
+		file = hugetlb_file_setup(name, hugesize, acctflag,
 				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
 				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
 	} else {
diff --git a/mm/mmap.c b/mm/mmap.c
index 2664a47..212721f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1325,15 +1325,20 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 		file = fget(fd);
 		if (!file)
 			goto out;
+		if (is_file_hugepages(file))
+			len = ALIGN(len, huge_page_size(hstate_file(file)));
 	} else if (flags & MAP_HUGETLB) {
 		struct user_struct *user = NULL;
+		struct hstate *hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT)
+						& MAP_HUGE_MASK);
+		len = ALIGN(len, huge_page_size(hs));
 		/*
 		 * VM_NORESERVE is used because the reservations will be
 		 * taken when vm_ops->mmap() is called
 		 * A dummy user value is used because we are not locking
 		 * memory so no accounting is necessary
 		 */
-		file = hugetlb_file_setup(HUGETLB_ANON_FILE, addr, len,
+		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
 				VM_NORESERVE,
 				&user, HUGETLB_ANONHUGE_INODE,
 				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] hugetlbfs: fix mmap failure in unaligned size request
  2013-04-30 17:02                   ` [PATCH v3] " Naoya Horiguchi
@ 2013-05-01  8:00                     ` Sam Ben
  0 siblings, 0 replies; 21+ messages in thread
From: Sam Ben @ 2013-05-01  8:00 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Johannes Weiner, linux-mm, bugzilla-daemon,
	iceman_dvd, Steven Truelove, Jianguo Wu

On 05/01/2013 01:02 AM, Naoya Horiguchi wrote:
> On Tue, Apr 30, 2013 at 12:45:03PM -0400, Johannes Weiner wrote:
>> On Fri, Apr 26, 2013 at 12:35:12AM -0400, Naoya Horiguchi wrote:
>>> Here is a revised patch.
>>> Thank you for the nice feedback, Johannes, Jianguo.
>> FWIW, this looks good to me.  Could you include
>>
>> Reported-by: iceman_dvd@yahoo.com
> OK, added.
> And thank you for the report, iceman_dvd :)
>
>> and resend it to Andrew?  Unless Andrew sees and picks it directly
>> from this thread.  Hi, Andrew!
> Andrew, could you review and merge this into your tree?
>
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Fri, 26 Apr 2013 00:31:18 -0400
> Subject: [PATCH v3] hugetlbfs: fix mmap failure in unaligned size request
>
> As reported in https://bugzilla.kernel.org/show_bug.cgi?id=56881, current
> kernel returns -EINVAL unless a given mmap length is "almost" hugepage
> aligned. This is because in sys_mmap_pgoff() the given length is passed to

Hi all,

When I use mmap in userspcae, when will call old_mmap and when will call
sys_mmap_pgoff()?

> vm_mmap_pgoff() as it is without being aligned with hugepage boundary.
>
> This is a regression introduced in commit 40716e29243d "hugetlbfs: fix
> alignment of huge page requests", where alignment code is pushed into
> hugetlb_file_setup() and the variable len in caller side is not changed.
>
> To fix this, this patch partially reverts that commit, and adds alignment
> code in caller side. And it also introduces hstate_sizelog() in order to
> get proper hstate to specified hugepage size.
>
> ChangeLog v3:
>  - add Reported-by
>
> ChangeLog v2:
>  - introduce hstate_sizelog to calculate alignment in caller side
>  - add ALIGN also in file mmap path.
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reported-by: iceman_dvd@yahoo.com
> ---
>  fs/hugetlbfs/inode.c    | 24 ++++++++++--------------
>  include/linux/hugetlb.h | 19 +++++++++++++------
>  ipc/shm.c               |  6 +++++-
>  mm/mmap.c               |  7 ++++++-
>  4 files changed, 34 insertions(+), 22 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 523464e..a3f868a 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -909,11 +909,8 @@ static int can_do_hugetlb_shm(void)
>  
>  static int get_hstate_idx(int page_size_log)
>  {
> -	struct hstate *h;
> +	struct hstate *h = hstate_sizelog(page_size_log);
>  
> -	if (!page_size_log)
> -		return default_hstate_idx;
> -	h = size_to_hstate(1 << page_size_log);
>  	if (!h)
>  		return -1;
>  	return h - hstates;
> @@ -929,9 +926,12 @@ static struct dentry_operations anon_ops = {
>  	.d_dname = hugetlb_dname
>  };
>  
> -struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> -				size_t size, vm_flags_t acctflag,
> -				struct user_struct **user,
> +/*
> + * Note that size should be aligned to proper hugepage size in caller side,
> + * otherwise hugetlb_reserve_pages reserves one less hugepages than intended.
> + */
> +struct file *hugetlb_file_setup(const char *name, size_t size,
> +				vm_flags_t acctflag, struct user_struct **user,
>  				int creat_flags, int page_size_log)
>  {
>  	struct file *file = ERR_PTR(-ENOMEM);
> @@ -939,8 +939,6 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
>  	struct path path;
>  	struct super_block *sb;
>  	struct qstr quick_string;
> -	struct hstate *hstate;
> -	unsigned long num_pages;
>  	int hstate_idx;
>  
>  	hstate_idx = get_hstate_idx(page_size_log);
> @@ -980,12 +978,10 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
>  	if (!inode)
>  		goto out_dentry;
>  
> -	hstate = hstate_inode(inode);
> -	size += addr & ~huge_page_mask(hstate);
> -	num_pages = ALIGN(size, huge_page_size(hstate)) >>
> -			huge_page_shift(hstate);
>  	file = ERR_PTR(-ENOMEM);
> -	if (hugetlb_reserve_pages(inode, 0, num_pages, NULL, acctflag))
> +	if (hugetlb_reserve_pages(inode, 0,
> +			size >> huge_page_shift(hstate_inode(inode)), NULL,
> +			acctflag))
>  		goto out_inode;
>  
>  	d_instantiate(path.dentry, inode);
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 8220a8a..c78f5a2 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -193,8 +193,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
>  
>  extern const struct file_operations hugetlbfs_file_operations;
>  extern const struct vm_operations_struct hugetlb_vm_ops;
> -struct file *hugetlb_file_setup(const char *name, unsigned long addr,
> -				size_t size, vm_flags_t acct,
> +struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
>  				struct user_struct **user, int creat_flags,
>  				int page_size_log);
>  
> @@ -213,8 +212,8 @@ static inline int is_file_hugepages(struct file *file)
>  
>  #define is_file_hugepages(file)			0
>  static inline struct file *
> -hugetlb_file_setup(const char *name, unsigned long addr, size_t size,
> -		vm_flags_t acctflag, struct user_struct **user, int creat_flags,
> +hugetlb_file_setup(const char *name, size_t size, vm_flags_t acctflag,
> +		struct user_struct **user, int creat_flags,
>  		int page_size_log)
>  {
>  	return ERR_PTR(-ENOSYS);
> @@ -294,6 +293,13 @@ static inline struct hstate *hstate_file(struct file *f)
>  	return hstate_inode(file_inode(f));
>  }
>  
> +static inline struct hstate *hstate_sizelog(int page_size_log)
> +{
> +	if (!page_size_log)
> +		return &default_hstate;
> +	return size_to_hstate(1 << page_size_log);
> +}
> +
>  static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
>  {
>  	return hstate_file(vma->vm_file);
> @@ -361,12 +367,13 @@ static inline int hstate_index(struct hstate *h)
>  extern void dissolve_free_huge_pages(unsigned long start_pfn,
>  				     unsigned long end_pfn);
>  
> -#else
> +#else /* !CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
>  #define alloc_huge_page_node(h, nid) NULL
>  #define alloc_bootmem_huge_page(h) NULL
>  #define hstate_file(f) NULL
> +#define hstate_sizelog(s) NULL
>  #define hstate_vma(v) NULL
>  #define hstate_inode(i) NULL
>  #define huge_page_size(h) PAGE_SIZE
> @@ -382,6 +389,6 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  #define hstate_index_to_shift(index) 0
>  #define hstate_index(h) 0
>  #define dissolve_free_huge_pages(s, e) 0
> -#endif
> +#endif /* !CONFIG_HUGETLB_PAGE */
>  
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/ipc/shm.c b/ipc/shm.c
> index cb858df..e316cb9 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -491,10 +491,14 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
>  
>  	sprintf (name, "SYSV%08x", key);
>  	if (shmflg & SHM_HUGETLB) {
> +		struct hstate *hs = hstate_sizelog((shmflg >> SHM_HUGE_SHIFT)
> +						& SHM_HUGE_MASK);
> +		size_t hugesize = ALIGN(size, huge_page_size(hs));
> +
>  		/* hugetlb_file_setup applies strict accounting */
>  		if (shmflg & SHM_NORESERVE)
>  			acctflag = VM_NORESERVE;
> -		file = hugetlb_file_setup(name, 0, size, acctflag,
> +		file = hugetlb_file_setup(name, hugesize, acctflag,
>  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
>  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
>  	} else {
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 2664a47..212721f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1325,15 +1325,20 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
>  		file = fget(fd);
>  		if (!file)
>  			goto out;
> +		if (is_file_hugepages(file))
> +			len = ALIGN(len, huge_page_size(hstate_file(file)));
>  	} else if (flags & MAP_HUGETLB) {
>  		struct user_struct *user = NULL;
> +		struct hstate *hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT)
> +						& MAP_HUGE_MASK);
> +		len = ALIGN(len, huge_page_size(hs));
>  		/*
>  		 * VM_NORESERVE is used because the reservations will be
>  		 * taken when vm_ops->mmap() is called
>  		 * A dummy user value is used because we are not locking
>  		 * memory so no accounting is necessary
>  		 */
> -		file = hugetlb_file_setup(HUGETLB_ANON_FILE, addr, len,
> +		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
>  				VM_NORESERVE,
>  				&user, HUGETLB_ANONHUGE_INODE,
>  				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-04-24 23:05         ` Naoya Horiguchi
                             ` (2 preceding siblings ...)
  2013-04-25  3:02           ` [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes Jianguo Wu
@ 2013-06-12 12:16           ` Aneesh Kumar K.V
  2013-06-13 21:29             ` Andrew Morton
  3 siblings, 1 reply; 21+ messages in thread
From: Aneesh Kumar K.V @ 2013-06-12 12:16 UTC (permalink / raw)
  To: Naoya Horiguchi, Johannes Weiner
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, iceman_dvd,
	Steven Truelove

Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> writes:

> On Wed, Apr 24, 2013 at 11:39:51AM -0400, Johannes Weiner wrote:
>> On Wed, Apr 24, 2013 at 11:16:39AM -0400, Naoya Horiguchi wrote:
>> > On Wed, Apr 24, 2013 at 04:14:54AM -0400, Johannes Weiner wrote:
>> > > @@ -491,10 +491,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
>> > >  
>> > >  	sprintf (name, "SYSV%08x", key);
>> > >  	if (shmflg & SHM_HUGETLB) {
>> > > +		unsigned int hugesize;
>> > > +
>> > >  		/* hugetlb_file_setup applies strict accounting */
>> > >  		if (shmflg & SHM_NORESERVE)
>> > >  			acctflag = VM_NORESERVE;
>> > > -		file = hugetlb_file_setup(name, 0, size, acctflag,
>> > > +		hugesize = ALIGN(size, huge_page_size(&default_hstate));
>> > > +		file = hugetlb_file_setup(name, hugesize, acctflag,
>> > >  				  &shp->mlock_user, HUGETLB_SHMFS_INODE,
>> > >  				(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
>> > >  	} else {
>> > 
>> > Would it be better to find proper hstate instead of using default_hstate?
>> 
>> You are probably right, I guess we can't assume default_hstate anymore
>> after page_size_log can be passed in.
>> 
>> Can we have hugetlb_file_setup() return an adjusted length, or an
>> alignment requirement?
>
> Yes, it's possible if callers pass the pointer of size (length) to
> hugetlb_file_setup() and make it adjusted inside the function.
> And as for alignment, I think it's not a hugetlb_file_setup's job,
> so we don't have to do it in this function.
>
>> Or pull the hstate lookup into the callsites (since they pass in
>> page_size_log to begin with)?
>
> This is also a possible solution, where we might need to define and
> export a function converting hugepage order to hstate.
>
> I like the former one, so wrote a patch like below.
> # I added your Signed-off-by: because this's based on your draft patch.
> # if you don't like it, please let me know.
>
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Wed, 24 Apr 2013 16:44:19 -0400
> Subject: [PATCH] hugetlbfs: fix mmap failure in unaligned size request
>
> As reported in https://bugzilla.kernel.org/show_bug.cgi?id=56881, current
> kernel returns -EINVAL unless a given mmap length is "almost" hugepage
> aligned. This is because in sys_mmap_pgoff() the given length is passed to
> vm_mmap_pgoff() as it is without being aligned with hugepage boundary.
>
> This is a regression introduced in commit 40716e29243d "hugetlbfs: fix
> alignment of huge page requests", where alignment code is pushed into
> hugetlb_file_setup() and the variable len in caller side is not changed.
>
> To fix this, this patch partially reverts that commit, and changes
> the type of parameter size from size_t to (size_t *) in order to
> align the size in caller side.

After the change af73e4d9506d3b797509f3c030e7dcd554f7d9c4 we have
alignment related failures in libhugetlbfs test suite. misalign test
fails with 3.10-rc5, while it works with 3.9.

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-06-12 12:16           ` Aneesh Kumar K.V
@ 2013-06-13 21:29             ` Andrew Morton
  2013-06-18 11:14               ` Aneesh Kumar K.V
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2013-06-13 21:29 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Naoya Horiguchi, Johannes Weiner, linux-mm, bugzilla-daemon,
	iceman_dvd, Steven Truelove

On Wed, 12 Jun 2013 17:46:16 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> > From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Date: Wed, 24 Apr 2013 16:44:19 -0400
> > Subject: [PATCH] hugetlbfs: fix mmap failure in unaligned size request
> >
> > As reported in https://bugzilla.kernel.org/show_bug.cgi?id=56881, current
> > kernel returns -EINVAL unless a given mmap length is "almost" hugepage
> > aligned. This is because in sys_mmap_pgoff() the given length is passed to
> > vm_mmap_pgoff() as it is without being aligned with hugepage boundary.
> >
> > This is a regression introduced in commit 40716e29243d "hugetlbfs: fix
> > alignment of huge page requests", where alignment code is pushed into
> > hugetlb_file_setup() and the variable len in caller side is not changed.
> >
> > To fix this, this patch partially reverts that commit, and changes
> > the type of parameter size from size_t to (size_t *) in order to
> > align the size in caller side.
> 
> After the change af73e4d9506d3b797509f3c030e7dcd554f7d9c4 we have
> alignment related failures in libhugetlbfs test suite. misalign test
> fails with 3.10-rc5, while it works with 3.9.

What does this mean.  Is 3.10-rc5 more strict, or less strict?

If "less strict" then that's expected and old userspace should be OK
with the change and the test should be updated (sorry).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes
  2013-06-13 21:29             ` Andrew Morton
@ 2013-06-18 11:14               ` Aneesh Kumar K.V
  0 siblings, 0 replies; 21+ messages in thread
From: Aneesh Kumar K.V @ 2013-06-18 11:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Naoya Horiguchi, Johannes Weiner, linux-mm, bugzilla-daemon,
	iceman_dvd, Steven Truelove

Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed, 12 Jun 2013 17:46:16 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> > From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> > Date: Wed, 24 Apr 2013 16:44:19 -0400
>> > Subject: [PATCH] hugetlbfs: fix mmap failure in unaligned size request
>> >
>> > As reported in https://bugzilla.kernel.org/show_bug.cgi?id=56881, current
>> > kernel returns -EINVAL unless a given mmap length is "almost" hugepage
>> > aligned. This is because in sys_mmap_pgoff() the given length is passed to
>> > vm_mmap_pgoff() as it is without being aligned with hugepage boundary.
>> >
>> > This is a regression introduced in commit 40716e29243d "hugetlbfs: fix
>> > alignment of huge page requests", where alignment code is pushed into
>> > hugetlb_file_setup() and the variable len in caller side is not changed.
>> >
>> > To fix this, this patch partially reverts that commit, and changes
>> > the type of parameter size from size_t to (size_t *) in order to
>> > align the size in caller side.
>> 
>> After the change af73e4d9506d3b797509f3c030e7dcd554f7d9c4 we have
>> alignment related failures in libhugetlbfs test suite. misalign test
>> fails with 3.10-rc5, while it works with 3.9.
>
> What does this mean.  Is 3.10-rc5 more strict, or less strict?
>
> If "less strict" then that's expected and old userspace should be OK
> with the change and the test should be updated (sorry).

3.10_rc5 is less strict. Also Naoya Horiguchi updated that relevant
changes to libhugetlbfs test is also posted at 

http://www.mail-archive.com/libhugetlbfs-devel@lists.sourceforge.net/msg13317.html

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-06-18 11:15 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <bug-56881-27@https.bugzilla.kernel.org/>
2013-04-23 20:25 ` [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes Andrew Morton
2013-04-23 22:26   ` Naoya Horiguchi
2013-04-24  8:20     ` Jianguo Wu
2013-04-24 14:17       ` Naoya Horiguchi
2013-04-24  8:14   ` Johannes Weiner
2013-04-24 15:16     ` Naoya Horiguchi
2013-04-24 15:39       ` Johannes Weiner
2013-04-24 23:05         ` Naoya Horiguchi
2013-04-24 23:13           ` Naoya Horiguchi
2013-04-24 23:44             ` Johannes Weiner
2013-04-24 23:26           ` Johannes Weiner
2013-04-25 21:00             ` Naoya Horiguchi
2013-04-26  4:35               ` [PATCH v2] hugetlbfs: fix mmap failure in unaligned size request Naoya Horiguchi
2013-04-30 16:45                 ` Johannes Weiner
2013-04-30 17:02                   ` [PATCH v3] " Naoya Horiguchi
2013-05-01  8:00                     ` Sam Ben
2013-04-25  3:02           ` [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes Jianguo Wu
2013-04-25 21:03             ` Naoya Horiguchi
2013-06-12 12:16           ` Aneesh Kumar K.V
2013-06-13 21:29             ` Andrew Morton
2013-06-18 11:14               ` Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).