* More convenient way to grab hugepage memory
@ 2004-05-13  5:55 David Gibson
  2004-05-13  6:05 ` Muli Ben-Yehuda
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: David Gibson @ 2004-05-13  5:55 UTC
  To: Andrew Morton
  Cc: Anton Blanchard, Adam Litke, Benjamin Herrenschmidt, linux-kernel,
	linuxppc64-dev

Andrew, please apply:

At present, getting a block of (quasi-) anonymous memory mapping with
hugepages is a slightly convoluted process, involving creating a dummy
file in a hugetlbfs filesystem.  In particular that means finding
where such a filesystem is mounted, for which there is no standard
mechanism.  Getting hugepage SysV shm segments is easier, just requiring
the SHM_HUGETLB flag.  This patch adds an analogous MAP_HUGETLB mmap()
flag to easily request that a block of anonymous memory come from
hugepages.
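
For illustration, the mount-discovery step described above might look
roughly like this from userspace.  This is only a sketch:
find_hugetlbfs_mount() is a made-up helper name, and it assumes the
mount table is readable in the usual mtab format (e.g. /proc/mounts).
After this an application would still have to create a dummy file under
the returned directory, mmap() it, and unlink it.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the first hugetlbfs mount point listed in
 * mounts_path (e.g. "/proc/mounts") into buf.
 * Returns 0 on success, -1 if no hugetlbfs mount is listed, the path
 * does not fit in buf, or the file cannot be opened.
 */
static int find_hugetlbfs_mount(const char *mounts_path, char *buf,
				size_t buflen)
{
	struct mntent *ent;
	FILE *fp;
	int ret = -1;

	assert(buf != NULL && buflen > 0);

	fp = setmntent(mounts_path, "r");
	if (!fp)
		return -1;

	while ((ent = getmntent(fp)) != NULL) {
		if (strcmp(ent->mnt_type, "hugetlbfs") == 0 &&
		    strlen(ent->mnt_dir) < buflen) {
			strcpy(buf, ent->mnt_dir);
			ret = 0;
			break;
		}
	}

	endmntent(fp);
	return ret;
}
```

A caller would typically pass "/proc/mounts" and a PATH_MAX-sized
buffer, and treat -1 as "no hugetlbfs available".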

[The MAP_HUGETLB flag has the side effect that MAP_SHARED semantics
will apply, even if MAP_PRIVATE is specified - but that's no different
to explicitly mapping hugetlbfs].
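
As a rough sketch of the intended usage, an application could request
hugepage-backed anonymous memory and fall back to ordinary pages when
the request fails.  alloc_maybe_huge() is a hypothetical name, not part
of the patch, and the #ifndef fallback is an assumption for building
against headers that do not define MAP_HUGETLB.  Note that with this
patch the mapping would get MAP_SHARED semantics even though
MAP_PRIVATE is passed, as described above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Headers that predate the flag will not define it.  Defining it as 0
 * (an assumption made for this sketch) turns the first attempt into a
 * plain anonymous mapping.
 */
#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0
#endif

/*
 * Hypothetical helper: map len bytes of anonymous memory, preferring
 * hugepages.  *used_huge is set to 1 if the hugepage request succeeded,
 * 0 if the mapping uses ordinary pages or failed.
 * Returns NULL if both attempts fail.
 */
static void *alloc_maybe_huge(size_t len, int *used_huge)
{
	void *p;

	assert(len > 0 && used_huge != NULL);

	/* First attempt: ask for hugepage backing. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	*used_huge = (p != MAP_FAILED) && (MAP_HUGETLB != 0);

	/* Second attempt: ordinary pages. */
	if (p == MAP_FAILED)
		p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

The length should be a multiple of the hugepage size for the first
attempt to have a chance of succeeding; the caller releases the memory
with munmap() as usual.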

Index: working-2.6/include/asm-i386/mman.h
===================================================================
--- working-2.6.orig/include/asm-i386/mman.h	2003-10-03 11:24:48.000000000 +1000
+++ working-2.6/include/asm-i386/mman.h	2004-04-27 13:40:01.058286584 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed by hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-ppc64/mman.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/mman.h	2003-10-01 11:47:33.000000000 +1000
+++ working-2.6/include/asm-ppc64/mman.h	2004-04-27 13:40:01.058286584 +1000
@@ -26,6 +26,7 @@
 #define MAP_LOCKED	0x80
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
Index: working-2.6/include/linux/mman.h
===================================================================
--- working-2.6.orig/include/linux/mman.h	2003-10-07 10:31:58.000000000 +1000
+++ working-2.6/include/linux/mman.h	2004-04-27 13:40:01.059286432 +1000
@@ -58,6 +58,9 @@
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE) |
+#ifdef CONFIG_HUGETLB_PAGE
+               _calc_vm_trans(flags, MAP_HUGETLB,    VM_HUGETLB   ) |
+#endif
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
 }
 
Index: working-2.6/include/asm-sh/mman.h
===================================================================
--- working-2.6.orig/include/asm-sh/mman.h	2003-10-01 11:47:40.000000000 +1000
+++ working-2.6/include/asm-sh/mman.h	2004-04-27 13:40:01.059286432 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-ia64/mman.h
===================================================================
--- working-2.6.orig/include/asm-ia64/mman.h	2004-01-28 10:55:18.000000000 +1100
+++ working-2.6/include/asm-ia64/mman.h	2004-04-27 13:40:01.059286432 +1000
@@ -24,6 +24,7 @@
 
 #define MAP_GROWSDOWN	0x00100		/* stack-like segment */
 #define MAP_GROWSUP	0x00200		/* register stack-like segment */
+#define MAP_HUGETLB	0x00400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x00800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x01000		/* mark it as an executable */
 #define MAP_LOCKED	0x02000		/* pages are locked */
Index: working-2.6/include/asm-sparc64/mman.h
===================================================================
--- working-2.6.orig/include/asm-sparc64/mman.h	2003-10-01 11:47:45.000000000 +1000
+++ working-2.6/include/asm-sparc64/mman.h	2004-04-27 13:40:01.060286280 +1000
@@ -24,6 +24,7 @@
 #define _MAP_NEW        0x80000000      /* Binary compatibility is fun... */
 
 #define MAP_GROWSDOWN	0x0200		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
Index: working-2.6/include/asm-x86_64/mman.h
===================================================================
--- working-2.6.orig/include/asm-x86_64/mman.h	2003-10-01 11:47:50.000000000 +1000
+++ working-2.6/include/asm-x86_64/mman.h	2004-04-27 13:40:01.061286128 +1000
@@ -17,6 +17,7 @@
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-ppc/mman.h
===================================================================
--- working-2.6.orig/include/asm-ppc/mman.h	2003-10-01 11:47:28.000000000 +1000
+++ working-2.6/include/asm-ppc/mman.h	2004-04-27 14:01:47.067366688 +1000
@@ -19,6 +19,7 @@
 #define MAP_LOCKED	0x80
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_POPULATE	0x8000		/* populate (prefault) pagetables */
Index: working-2.6/include/asm-alpha/mman.h
===================================================================
--- working-2.6.orig/include/asm-alpha/mman.h	2004-04-27 13:41:24.845329144 +1000
+++ working-2.6/include/asm-alpha/mman.h	2004-04-27 13:59:08.242417392 +1000
@@ -28,6 +28,7 @@
 #define MAP_NORESERVE	0x10000		/* don't check for reservations */
 #define MAP_POPULATE	0x20000		/* populate (prefault) pagetables */
 #define MAP_NONBLOCK	0x40000		/* do not block on IO */
+#define MAP_HUGETLB	0x80000		/* Backed with hugetlb pages */
 
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC		2		/* synchronous memory sync */
Index: working-2.6/include/asm-arm/mman.h
===================================================================
--- working-2.6.orig/include/asm-arm/mman.h	2004-04-27 13:41:49.445317984 +1000
+++ working-2.6/include/asm-arm/mman.h	2004-04-27 13:59:22.106305584 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-arm26/mman.h
===================================================================
--- working-2.6.orig/include/asm-arm26/mman.h	2003-10-01 11:47:01.000000000 +1000
+++ working-2.6/include/asm-arm26/mman.h	2004-04-27 13:59:38.194390184 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-cris/mman.h
===================================================================
--- working-2.6.orig/include/asm-cris/mman.h	2003-10-01 11:47:02.000000000 +1000
+++ working-2.6/include/asm-cris/mman.h	2004-04-27 13:59:49.690373160 +1000
@@ -18,6 +18,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-h8300/mman.h
===================================================================
--- working-2.6.orig/include/asm-h8300/mman.h	2003-10-01 11:47:05.000000000 +1000
+++ working-2.6/include/asm-h8300/mman.h	2004-04-27 14:00:01.026380304 +1000
@@ -15,6 +15,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-m68k/mman.h
===================================================================
--- working-2.6.orig/include/asm-m68k/mman.h	2003-10-01 11:47:14.000000000 +1000
+++ working-2.6/include/asm-m68k/mman.h	2004-04-27 14:00:13.418360392 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-mips/mman.h
===================================================================
--- working-2.6.orig/include/asm-mips/mman.h	2003-10-01 11:47:19.000000000 +1000
+++ working-2.6/include/asm-mips/mman.h	2004-04-27 14:01:02.499353552 +1000
@@ -38,6 +38,7 @@
 #define MAP_AUTORSRV	0x100		/* Logical swap reserved on demand */
 
 /* These are linux-specific */
+#define MAP_HUGETLB	0x0200		/* Backed with hugetlb pages */
 #define MAP_NORESERVE	0x0400		/* don't check for reservations */
 #define MAP_ANONYMOUS	0x0800		/* don't use a file */
 #define MAP_GROWSDOWN	0x1000		/* stack-like segment */
Index: working-2.6/include/asm-parisc/mman.h
===================================================================
--- working-2.6.orig/include/asm-parisc/mman.h	2004-04-27 14:01:27.187327864 +1000
+++ working-2.6/include/asm-parisc/mman.h	2004-04-27 14:01:33.227341696 +1000
@@ -15,6 +15,7 @@
 #define MAP_FIXED	0x04		/* Interpret addr exactly */
 #define MAP_ANONYMOUS	0x10		/* don't use a file */
 
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-s390/mman.h
===================================================================
--- working-2.6.orig/include/asm-s390/mman.h	2003-10-01 11:47:38.000000000 +1000
+++ working-2.6/include/asm-s390/mman.h	2004-04-27 14:02:16.636399000 +1000
@@ -24,6 +24,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-sparc/mman.h
===================================================================
--- working-2.6.orig/include/asm-sparc/mman.h	2003-10-01 11:47:43.000000000 +1000
+++ working-2.6/include/asm-sparc/mman.h	2004-04-27 14:02:47.172415328 +1000
@@ -24,6 +24,7 @@
 #define _MAP_NEW        0x80000000      /* Binary compatibility is fun... */
 
 #define MAP_GROWSDOWN	0x0200		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
Index: working-2.6/include/asm-v850/mman.h
===================================================================
--- working-2.6.orig/include/asm-v850/mman.h	2003-10-01 11:47:49.000000000 +1000
+++ working-2.6/include/asm-v850/mman.h	2004-04-27 14:03:19.989353704 +1000
@@ -15,6 +15,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/mm/mmap.c
===================================================================
--- working-2.6.orig/mm/mmap.c	2004-04-20 10:50:09.000000000 +1000
+++ working-2.6/mm/mmap.c	2004-04-27 13:40:01.062285976 +1000
@@ -21,6 +21,7 @@
 #include <linux/profile.h>
 #include <linux/module.h>
 #include <linux/mount.h>
+#include <linux/err.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
@@ -62,6 +63,9 @@
 EXPORT_SYMBOL(sysctl_max_map_count);
 EXPORT_SYMBOL(vm_committed_space);
 
+int mmap_use_hugepages = 0;
+int mmap_hugepages_map_sz = 256;
+
 /*
  * Requires inode->i_mapping->i_shared_sem
  */
@@ -476,13 +480,9 @@
 	return NULL;
 }
 
-/*
- * The caller must hold down_write(current->mm->mmap_sem).
- */
-
-unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
-			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff)
+static inline unsigned long __do_mmap_pgoff(struct file * file, unsigned long addr,
+					    unsigned long len, unsigned long prot,
+					    unsigned long flags, unsigned long pgoff)
 {
 	struct mm_struct * mm = current->mm;
 	struct vm_area_struct * vma, * prev;
@@ -494,40 +494,19 @@
 	int accountable = 1;
 	unsigned long charged = 0;
 
-	if (file) {
-		if (is_file_hugepages(file))
-			accountable = 0;
-
-		if (!file->f_op || !file->f_op->mmap)
-			return -ENODEV;
-
-		if ((prot & PROT_EXEC) && (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
-			return -EPERM;
-	}
-
-	if (!len)
-		return addr;
-
-	/* Careful about overflows.. */
-	len = PAGE_ALIGN(len);
-	if (!len || len > TASK_SIZE)
-		return -EINVAL;
-
-	/* offset overflow? */
-	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
-		return -EINVAL;
-
-	/* Too many mappings? */
-	if (mm->map_count > sysctl_max_map_count)
-		return -ENOMEM;
-
-	/* Obtain the address to map to. we verify (or select) it and ensure
-	 * that it represents a valid section of the address space.
+	/* Obtain the address to map to. we verify (or select) it and
+	 * ensure that it represents a valid section of the address
+	 * space.  VM_HUGETLB will never appear in vm_flags when
+	 * CONFIG_HUGETLB is unset.
 	 */
 	addr = get_unmapped_area(file, addr, len, pgoff, flags);
 	if (addr & ~PAGE_MASK)
 		return addr;
 
+	/* Huge pages aren't accounted for here */
+	if (file && is_file_hugepages(file))
+		accountable = 0;
+
 	/* Do simple checking here so the lower-level routines won't have
 	 * to. we assume access permissions have been handled by the open
 	 * of the memory object, so we don't do any here.
@@ -724,11 +703,17 @@
 unmap_and_free_vma:
 	if (correct_wcount)
 		atomic_inc(&inode->i_writecount);
-	vma->vm_file = NULL;
-	fput(file);
 
-	/* Undo any partial mapping done by a device driver. */
+	/*
+	 * Undo any partial mapping done by a device driver.  
+	 * hugetlb wants to know the vma's file etc. so nuke  
+	 * the file afterward.                                
+	 */                                                   
 	zap_page_range(vma, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
+
+	if (file)
+		fput(vma->vm_file); 
+
 free_vma:
 	kmem_cache_free(vm_area_cachep, vma);
 unacct_error:
@@ -737,6 +722,62 @@
 	return error;
 }
 
+/*
+ * The caller must hold down_write(current->mm->mmap_sem).
+ */
+
+unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
+			unsigned long len, unsigned long prot,
+			unsigned long flags, unsigned long pgoff)
+{
+	struct file *hugetlb_file = NULL;
+	unsigned long result;
+
+	if (file) {
+		if ((flags & MAP_HUGETLB) && !is_file_hugepages(file))
+			return -EINVAL;
+
+		if (!file->f_op || !file->f_op->mmap)
+			return -ENODEV;
+
+		if ((prot & PROT_EXEC) && (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
+			return -EPERM;
+	}
+
+	if (!len)
+		return addr;
+
+	/* Careful about overflows.. */
+	len = PAGE_ALIGN(len);
+	if (!len || len > TASK_SIZE)
+		return -EINVAL;
+
+	/* offset overflow? */
+	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
+		return -EINVAL;
+
+	/* Too many mappings? */
+	if (current->mm->map_count > sysctl_max_map_count)
+		return -ENOMEM;
+
+	/* Create an implicit hugetlb file if necessary */
+	if (!file && (flags & MAP_HUGETLB)) {
+		file = hugetlb_file = hugetlb_zero_setup(len);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+	}
+
+	result = __do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+
+	/* Drop reference to implicit hugetlb file, it's already been
+	 * "gotten" in __do_mmap_pgoff in case of success
+	 */
+	if (hugetlb_file)
+		fput(hugetlb_file);
+
+	return result;
+}
+
 EXPORT_SYMBOL(do_mmap_pgoff);
 
 /* Get an address range which is currently unmapped.

-- 
David Gibson			| For every complex problem there is a
david AT gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson

* Re: More convenient way to grab hugepage memory
  2004-05-13  5:55 More convenient way to grab hugepage memory David Gibson
@ 2004-05-13  6:05 ` Muli Ben-Yehuda
  2004-05-13  6:20   ` David Gibson
  2004-05-13  6:59   ` William Lee Irwin III
  2004-05-13  6:06 ` William Lee Irwin III
  2004-05-13  7:49 ` Christoph Hellwig
  2 siblings, 2 replies; 14+ messages in thread
From: Muli Ben-Yehuda @ 2004-05-13  6:05 UTC
  To: David Gibson, Andrew Morton, Anton Blanchard, Adam Litke,
	Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

On Thu, May 13, 2004 at 03:55:20PM +1000, David Gibson wrote:

> --- working-2.6.orig/mm/mmap.c	2004-04-20 10:50:09.000000000 +1000
> +++ working-2.6/mm/mmap.c	2004-04-27 13:40:01.062285976 +1000
> @@ -21,6 +21,7 @@
>  #include <linux/profile.h>
>  #include <linux/module.h>
>  #include <linux/mount.h>
> +#include <linux/err.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/pgalloc.h>
> @@ -62,6 +63,9 @@
>  EXPORT_SYMBOL(sysctl_max_map_count);
>  EXPORT_SYMBOL(vm_committed_space);
>  
> +int mmap_use_hugepages = 0;
> +int mmap_hugepages_map_sz = 256;

These two global variables do not appear to be used anywhere in this
patch? 

> +static inline unsigned long __do_mmap_pgoff(struct file * file, unsigned long addr,

__do_mmap_pgoff seems rather long to be inline? 

Cheers, 
Muli 
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/


* Re: More convenient way to grab hugepage memory
  2004-05-13  5:55 More convenient way to grab hugepage memory David Gibson
  2004-05-13  6:05 ` Muli Ben-Yehuda
@ 2004-05-13  6:06 ` William Lee Irwin III
  2004-05-13  6:27   ` William Lee Irwin III
  2004-05-13  7:49 ` Christoph Hellwig
  2 siblings, 1 reply; 14+ messages in thread
From: William Lee Irwin III @ 2004-05-13  6:06 UTC
  To: David Gibson, Andrew Morton, Anton Blanchard, Adam Litke,
	Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

On Thu, May 13, 2004 at 03:55:20PM +1000, David Gibson wrote:
> +int mmap_use_hugepages = 0;
> +int mmap_hugepages_map_sz = 256;

These aren't used anywhere else in your patch; any chance you could
nuke them?

Thanks.


-- wli


* Re: More convenient way to grab hugepage memory
  2004-05-13  6:05 ` Muli Ben-Yehuda
@ 2004-05-13  6:20   ` David Gibson
  2004-05-13  6:45     ` Roland Dreier
  2004-05-13  6:59   ` William Lee Irwin III
  1 sibling, 1 reply; 14+ messages in thread
From: David Gibson @ 2004-05-13  6:20 UTC
  To: Muli Ben-Yehuda
  Cc: Andrew Morton, Anton Blanchard, Adam Litke,
	Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

On Thu, May 13, 2004 at 09:05:50AM +0300, Muli Ben-Yehuda wrote:
> On Thu, May 13, 2004 at 03:55:20PM +1000, David Gibson wrote:
> 
> > +++ working-2.6/mm/mmap.c	2004-04-27 13:40:01.062285976 +1000
> > @@ -21,6 +21,7 @@
> >  #include <linux/profile.h>
> >  #include <linux/module.h>
> >  #include <linux/mount.h>
> > +#include <linux/err.h>
> >  
> >  #include <asm/uaccess.h>
> >  #include <asm/pgalloc.h>
> > @@ -62,6 +63,9 @@
> >  EXPORT_SYMBOL(sysctl_max_map_count);
> >  EXPORT_SYMBOL(vm_committed_space);
> >  
> > +int mmap_use_hugepages = 0;
> > +int mmap_hugepages_map_sz = 256;
> 
> These two global variables do not appear to be used anywhere in this
> patch? 

Bother, they leaked in there from the other patch I had on top of
this.  Revised version below.

> > +static inline unsigned long __do_mmap_pgoff(struct file * file, unsigned long addr,
> 
> __do_mmap_pgoff seems rather long to be inline? 

Well, it's only called in one place - it's really the one function,
just split up to let us put the wrapper creating the hugetlb file in
the right place without excessive indentation.  I guess it doesn't
really matter, with -funit-at-a-time it will end up the same anyway.
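
To make the shape of that split concrete, here is a toy userspace
sketch of the pattern: an outer wrapper creates an implicit object when
the caller supplies none, an inner worker takes its own reference on
success, and the wrapper then drops the creation reference.  All the
toy_* names are invented for illustration and only stand in for the
roles of do_mmap_pgoff(), __do_mmap_pgoff() and struct file; this is
not kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct file: just a reference count. */
struct toy_file {
	int refs;
};

/* Toy stand-in for a vma that may hold a file reference. */
struct toy_mapping {
	struct toy_file *file;
};

static struct toy_file *toy_create(void)
{
	struct toy_file *f = malloc(sizeof(*f));

	if (f)
		f->refs = 1;	/* the creator holds the first reference */
	return f;
}

static void toy_get(struct toy_file *f)
{
	f->refs++;
}

static void toy_put(struct toy_file *f)
{
	if (--f->refs == 0)
		free(f);
}

/* Inner worker: on success the mapping takes its own reference. */
static int toy_map_inner(struct toy_mapping *m, struct toy_file *f, int fail)
{
	if (fail)
		return -1;
	toy_get(f);
	m->file = f;
	return 0;
}

/* Outer wrapper: create an implicit file if needed, then drop our ref. */
static int toy_map(struct toy_mapping *m, struct toy_file *f, int fail)
{
	struct toy_file *implicit = NULL;
	int ret;

	assert(m != NULL);

	if (!f) {
		f = implicit = toy_create();
		if (!f)
			return -1;
	}

	ret = toy_map_inner(m, f, fail);

	/* On success the mapping holds its own reference; on failure
	 * this put frees the implicit file. */
	if (implicit)
		toy_put(implicit);

	return ret;
}
```

Either way the wrapper ends up holding no reference to the implicit
object, which is why the unconditional put after the inner call is
enough.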

Add a new MAP_HUGETLB mmap() flag to easily request that a block of
anonymous memory come from hugepages.

Index: working-2.6/include/asm-i386/mman.h
===================================================================
--- working-2.6.orig/include/asm-i386/mman.h	2003-10-03 11:24:48.000000000 +1000
+++ working-2.6/include/asm-i386/mman.h	2004-05-13 14:51:18.698011208 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed by hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-ppc64/mman.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/mman.h	2003-10-01 11:47:33.000000000 +1000
+++ working-2.6/include/asm-ppc64/mman.h	2004-05-13 14:51:18.705010144 +1000
@@ -26,6 +26,7 @@
 #define MAP_LOCKED	0x80
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
Index: working-2.6/include/linux/mman.h
===================================================================
--- working-2.6.orig/include/linux/mman.h	2003-10-07 10:31:58.000000000 +1000
+++ working-2.6/include/linux/mman.h	2004-05-13 14:51:18.712009080 +1000
@@ -58,6 +58,9 @@
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE) |
+#ifdef CONFIG_HUGETLB_PAGE
+               _calc_vm_trans(flags, MAP_HUGETLB,    VM_HUGETLB   ) |
+#endif
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
 }
 
Index: working-2.6/include/asm-sh/mman.h
===================================================================
--- working-2.6.orig/include/asm-sh/mman.h	2003-10-01 11:47:40.000000000 +1000
+++ working-2.6/include/asm-sh/mman.h	2004-05-13 14:51:18.718008168 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-ia64/mman.h
===================================================================
--- working-2.6.orig/include/asm-ia64/mman.h	2004-01-28 10:55:18.000000000 +1100
+++ working-2.6/include/asm-ia64/mman.h	2004-05-13 14:51:18.721007712 +1000
@@ -24,6 +24,7 @@
 
 #define MAP_GROWSDOWN	0x00100		/* stack-like segment */
 #define MAP_GROWSUP	0x00200		/* register stack-like segment */
+#define MAP_HUGETLB	0x00400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x00800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x01000		/* mark it as an executable */
 #define MAP_LOCKED	0x02000		/* pages are locked */
Index: working-2.6/include/asm-sparc64/mman.h
===================================================================
--- working-2.6.orig/include/asm-sparc64/mman.h	2003-10-01 11:47:45.000000000 +1000
+++ working-2.6/include/asm-sparc64/mman.h	2004-05-13 14:51:18.724007256 +1000
@@ -24,6 +24,7 @@
 #define _MAP_NEW        0x80000000      /* Binary compatibility is fun... */
 
 #define MAP_GROWSDOWN	0x0200		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
Index: working-2.6/include/asm-x86_64/mman.h
===================================================================
--- working-2.6.orig/include/asm-x86_64/mman.h	2003-10-01 11:47:50.000000000 +1000
+++ working-2.6/include/asm-x86_64/mman.h	2004-05-13 14:51:18.726006952 +1000
@@ -17,6 +17,7 @@
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-ppc/mman.h
===================================================================
--- working-2.6.orig/include/asm-ppc/mman.h	2003-10-01 11:47:28.000000000 +1000
+++ working-2.6/include/asm-ppc/mman.h	2004-05-13 14:51:18.733005888 +1000
@@ -19,6 +19,7 @@
 #define MAP_LOCKED	0x80
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_POPULATE	0x8000		/* populate (prefault) pagetables */
Index: working-2.6/include/asm-alpha/mman.h
===================================================================
--- working-2.6.orig/include/asm-alpha/mman.h	2003-10-01 11:46:52.000000000 +1000
+++ working-2.6/include/asm-alpha/mman.h	2004-05-13 14:51:18.735005584 +1000
@@ -28,6 +28,7 @@
 #define MAP_NORESERVE	0x10000		/* don't check for reservations */
 #define MAP_POPULATE	0x20000		/* populate (prefault) pagetables */
 #define MAP_NONBLOCK	0x40000		/* do not block on IO */
+#define MAP_HUGETLB	0x80000		/* Backed with hugetlb pages */
 
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC		2		/* synchronous memory sync */
Index: working-2.6/include/asm-arm/mman.h
===================================================================
--- working-2.6.orig/include/asm-arm/mman.h	2003-10-01 11:46:53.000000000 +1000
+++ working-2.6/include/asm-arm/mman.h	2004-05-13 14:51:18.741004672 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-arm26/mman.h
===================================================================
--- working-2.6.orig/include/asm-arm26/mman.h	2003-10-01 11:47:01.000000000 +1000
+++ working-2.6/include/asm-arm26/mman.h	2004-05-13 14:51:18.744004216 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-cris/mman.h
===================================================================
--- working-2.6.orig/include/asm-cris/mman.h	2003-10-01 11:47:02.000000000 +1000
+++ working-2.6/include/asm-cris/mman.h	2004-05-13 14:51:18.751003152 +1000
@@ -18,6 +18,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-h8300/mman.h
===================================================================
--- working-2.6.orig/include/asm-h8300/mman.h	2003-10-01 11:47:05.000000000 +1000
+++ working-2.6/include/asm-h8300/mman.h	2004-05-13 14:51:18.764001176 +1000
@@ -15,6 +15,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-m68k/mman.h
===================================================================
--- working-2.6.orig/include/asm-m68k/mman.h	2003-10-01 11:47:14.000000000 +1000
+++ working-2.6/include/asm-m68k/mman.h	2004-05-13 14:51:18.771000112 +1000
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-mips/mman.h
===================================================================
--- working-2.6.orig/include/asm-mips/mman.h	2003-10-01 11:47:19.000000000 +1000
+++ working-2.6/include/asm-mips/mman.h	2004-05-13 14:51:18.779998744 +1000
@@ -38,6 +38,7 @@
 #define MAP_AUTORSRV	0x100		/* Logical swap reserved on demand */
 
 /* These are linux-specific */
+#define MAP_HUGETLB	0x0200		/* Backed with hugetlb pages */
 #define MAP_NORESERVE	0x0400		/* don't check for reservations */
 #define MAP_ANONYMOUS	0x0800		/* don't use a file */
 #define MAP_GROWSDOWN	0x1000		/* stack-like segment */
Index: working-2.6/include/asm-parisc/mman.h
===================================================================
--- working-2.6.orig/include/asm-parisc/mman.h	2003-10-01 11:47:25.000000000 +1000
+++ working-2.6/include/asm-parisc/mman.h	2004-05-13 14:51:18.785997832 +1000
@@ -15,6 +15,7 @@
 #define MAP_FIXED	0x04		/* Interpret addr exactly */
 #define MAP_ANONYMOUS	0x10		/* don't use a file */
 
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-s390/mman.h
===================================================================
--- working-2.6.orig/include/asm-s390/mman.h	2003-10-01 11:47:38.000000000 +1000
+++ working-2.6/include/asm-s390/mman.h	2004-05-13 14:51:18.791996920 +1000
@@ -24,6 +24,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/include/asm-sparc/mman.h
===================================================================
--- working-2.6.orig/include/asm-sparc/mman.h	2003-10-01 11:47:43.000000000 +1000
+++ working-2.6/include/asm-sparc/mman.h	2004-05-13 14:51:18.797996008 +1000
@@ -24,6 +24,7 @@
 #define _MAP_NEW        0x80000000      /* Binary compatibility is fun... */
 
 #define MAP_GROWSDOWN	0x0200		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
Index: working-2.6/include/asm-v850/mman.h
===================================================================
--- working-2.6.orig/include/asm-v850/mman.h	2003-10-01 11:47:49.000000000 +1000
+++ working-2.6/include/asm-v850/mman.h	2004-05-13 14:51:18.803995096 +1000
@@ -15,6 +15,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
Index: working-2.6/mm/mmap.c
===================================================================
--- working-2.6.orig/mm/mmap.c	2004-04-20 10:50:09.000000000 +1000
+++ working-2.6/mm/mmap.c	2004-05-13 16:13:58.213950408 +1000
@@ -21,6 +21,7 @@
 #include <linux/profile.h>
 #include <linux/module.h>
 #include <linux/mount.h>
+#include <linux/err.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
@@ -476,13 +477,9 @@
 	return NULL;
 }
 
-/*
- * The caller must hold down_write(current->mm->mmap_sem).
- */
-
-unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
-			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff)
+static inline unsigned long __do_mmap_pgoff(struct file * file, unsigned long addr,
+					    unsigned long len, unsigned long prot,
+					    unsigned long flags, unsigned long pgoff)
 {
 	struct mm_struct * mm = current->mm;
 	struct vm_area_struct * vma, * prev;
@@ -494,40 +491,19 @@
 	int accountable = 1;
 	unsigned long charged = 0;
 
-	if (file) {
-		if (is_file_hugepages(file))
-			accountable = 0;
-
-		if (!file->f_op || !file->f_op->mmap)
-			return -ENODEV;
-
-		if ((prot & PROT_EXEC) && (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
-			return -EPERM;
-	}
-
-	if (!len)
-		return addr;
-
-	/* Careful about overflows.. */
-	len = PAGE_ALIGN(len);
-	if (!len || len > TASK_SIZE)
-		return -EINVAL;
-
-	/* offset overflow? */
-	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
-		return -EINVAL;
-
-	/* Too many mappings? */
-	if (mm->map_count > sysctl_max_map_count)
-		return -ENOMEM;
-
-	/* Obtain the address to map to. we verify (or select) it and ensure
-	 * that it represents a valid section of the address space.
+	/* Obtain the address to map to. we verify (or select) it and
+	 * ensure that it represents a valid section of the address
+	 * space.  VM_HUGETLB will never appear in vm_flags when
+	 * CONFIG_HUGETLB is unset.
 	 */
 	addr = get_unmapped_area(file, addr, len, pgoff, flags);
 	if (addr & ~PAGE_MASK)
 		return addr;
 
+	/* Huge pages aren't accounted for here */
+	if (file && is_file_hugepages(file))
+		accountable = 0;
+
 	/* Do simple checking here so the lower-level routines won't have
 	 * to. we assume access permissions have been handled by the open
 	 * of the memory object, so we don't do any here.
@@ -724,11 +700,17 @@
 unmap_and_free_vma:
 	if (correct_wcount)
 		atomic_inc(&inode->i_writecount);
-	vma->vm_file = NULL;
-	fput(file);
 
-	/* Undo any partial mapping done by a device driver. */
+	/*
+	 * Undo any partial mapping done by a device driver.  
+	 * hugetlb wants to know the vma's file etc. so nuke  
+	 * the file afterward.                                
+	 */                                                   
 	zap_page_range(vma, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
+
+	if (file)
+		fput(vma->vm_file); 
+
 free_vma:
 	kmem_cache_free(vm_area_cachep, vma);
 unacct_error:
@@ -737,6 +719,62 @@
 	return error;
 }
 
+/*
+ * The caller must hold down_write(current->mm->mmap_sem).
+ */
+
+unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
+			unsigned long len, unsigned long prot,
+			unsigned long flags, unsigned long pgoff)
+{
+	struct file *hugetlb_file = NULL;
+	unsigned long result;
+
+	if (file) {
+		if ((flags & MAP_HUGETLB) && !is_file_hugepages(file))
+			return -EINVAL;
+
+		if (!file->f_op || !file->f_op->mmap)
+			return -ENODEV;
+
+		if ((prot & PROT_EXEC) && (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
+			return -EPERM;
+	}
+
+	if (!len)
+		return addr;
+
+	/* Careful about overflows.. */
+	len = PAGE_ALIGN(len);
+	if (!len || len > TASK_SIZE)
+		return -EINVAL;
+
+	/* offset overflow? */
+	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
+		return -EINVAL;
+
+	/* Too many mappings? */
+	if (current->mm->map_count > sysctl_max_map_count)
+		return -ENOMEM;
+
+	/* Create an implicit hugetlb file if necessary */
+	if (!file && (flags & MAP_HUGETLB)) {
+		file = hugetlb_file = hugetlb_zero_setup(len);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+	}
+
+	result = __do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+
+	/* Drop reference to implicit hugetlb file, it's already been
+	 * "gotten" in __do_mmap_pgoff in case of success
+	 */
+	if (hugetlb_file)
+		fput(hugetlb_file);
+
+	return result;
+}
+
 EXPORT_SYMBOL(do_mmap_pgoff);
 
 /* Get an address range which is currently unmapped.


-- 
David Gibson			| For every complex problem there is a
david AT gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson



* Re: More convenient way to grab hugepage memory
  2004-05-13  6:06 ` William Lee Irwin III
@ 2004-05-13  6:27   ` William Lee Irwin III
  0 siblings, 0 replies; 14+ messages in thread
From: William Lee Irwin III @ 2004-05-13  6:27 UTC (permalink / raw)
  To: David Gibson, Andrew Morton, Anton Blanchard, Adam Litke,
	Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

On Wed, May 12, 2004 at 11:06:39PM -0700, William Lee Irwin III wrote:
> These aren't used anywhere else in your patch; any chance you could
> nuke them?
> Thanks.

Here is a rediff vs. 2.6.6-mm1 (1 reject resolved) with the removal of
the global variables included:


-- wli

At present, getting a block of (quasi-)anonymous memory mapped with
hugepages is a slightly convoluted process, involving creating a dummy
file in a hugetlbfs filesystem.  In particular that means finding
where such a filesystem is mounted, for which there is no standard
mechanism.  Getting hugepage SysV shm segments is easier, just requiring
the SHM_HUGETLB flag.  This patch adds an analogous MAP_HUGETLB mmap()
flag to easily request that a block of anonymous memory come from
hugepages.

[The MAP_HUGETLB flag has the side effect that MAP_SHARED semantics
will apply, even if MAP_PRIVATE is specified - but that's no different
to explicitly mapping hugetlbfs].
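
For illustration, a userspace caller of such a flag might look like the
sketch below.  It assumes the libc headers define MAP_HUGETLB, as current
Linux headers do; that value is architecture specific and need not match
the 0x0400 proposed in this patch.  alloc_hugepages() and hugepage_align()
are hypothetical helper names, and the huge page size is supplied by the
caller rather than discovered.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Assumption: the libc headers define MAP_HUGETLB.  The fallback below is
 * hypothetical and degrades the request to a plain anonymous mapping.
 */
#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0
#endif

/* Round len up to a multiple of hpage_size, which must be a power of two. */
size_t hugepage_align(size_t len, size_t hpage_size)
{
	return (len + hpage_size - 1) & ~(hpage_size - 1);
}

/*
 * Hypothetical helper: request an anonymous hugepage-backed mapping with
 * no hugetlbfs file involved.  Returns NULL on failure.
 */
void *alloc_hugepages(size_t len, size_t hpage_size)
{
	void *p = mmap(NULL, hugepage_align(len, hpage_size),
		       PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

A caller would check for NULL and fall back to an ordinary anonymous
mapping, since the request can fail when no huge pages are available.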


diff -u wli-2.6.6-mm1/mm/mmap.c wli-2.6.6-mm1/mm/mmap.c
--- wli-2.6.6-mm1/mm/mmap.c	2004-05-12 23:17:22.000000000 -0700
+++ wli-2.6.6-mm1/mm/mmap.c	2004-05-12 23:18:07.000000000 -0700
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/mount.h>
 #include <linux/mempolicy.h>
+#include <linux/err.h>
 
 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
@@ -528,13 +529,9 @@
 	return NULL;
 }
 
-/*
- * The caller must hold down_write(current->mm->mmap_sem).
- */
-
-unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
-			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff)
+static inline unsigned long __do_mmap_pgoff(struct file * file, unsigned long addr,
+					    unsigned long len, unsigned long prot,
+					    unsigned long flags, unsigned long pgoff)
 {
 	struct mm_struct * mm = current->mm;
 	struct vm_area_struct * vma, * prev;
@@ -546,40 +543,19 @@
 	int accountable = 1;
 	unsigned long charged = 0;
 
-	if (file) {
-		if (is_file_hugepages(file))
-			accountable = 0;
-
-		if (!file->f_op || !file->f_op->mmap)
-			return -ENODEV;
-
-		if ((prot & PROT_EXEC) && (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
-			return -EPERM;
-	}
-
-	if (!len)
-		return addr;
-
-	/* Careful about overflows.. */
-	len = PAGE_ALIGN(len);
-	if (!len || len > TASK_SIZE)
-		return -EINVAL;
-
-	/* offset overflow? */
-	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
-		return -EINVAL;
-
-	/* Too many mappings? */
-	if (mm->map_count > sysctl_max_map_count)
-		return -ENOMEM;
-
-	/* Obtain the address to map to. we verify (or select) it and ensure
-	 * that it represents a valid section of the address space.
+	/* Obtain the address to map to. we verify (or select) it and
+	 * ensure that it represents a valid section of the address
+	 * space.  VM_HUGETLB will never appear in vm_flags when
+	 * CONFIG_HUGETLB is unset.
 	 */
 	addr = get_unmapped_area(file, addr, len, pgoff, flags);
 	if (addr & ~PAGE_MASK)
 		return addr;
 
+	/* Huge pages aren't accounted for here */
+	if (file && is_file_hugepages(file))
+		accountable = 0;
+
 	/* Do simple checking here so the lower-level routines won't have
 	 * to. we assume access permissions have been handled by the open
 	 * of the memory object, so we don't do any here.
@@ -776,11 +752,17 @@
 unmap_and_free_vma:
 	if (correct_wcount)
 		atomic_inc(&inode->i_writecount);
-	vma->vm_file = NULL;
-	fput(file);
 
-	/* Undo any partial mapping done by a device driver. */
+	/*
+	 * Undo any partial mapping done by a device driver.  
+	 * hugetlb wants to know the vma's file etc. so nuke  
+	 * the file afterward.                                
+	 */                                                   
 	zap_page_range(vma, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
+
+	if (file)
+		fput(vma->vm_file); 
+
 free_vma:
 	kmem_cache_free(vm_area_cachep, vma);
 unacct_error:
@@ -789,6 +771,62 @@
 	return error;
 }
 
+/*
+ * The caller must hold down_write(current->mm->mmap_sem).
+ */
+
+unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
+			unsigned long len, unsigned long prot,
+			unsigned long flags, unsigned long pgoff)
+{
+	struct file *hugetlb_file = NULL;
+	unsigned long result;
+
+	if (file) {
+		if ((flags & MAP_HUGETLB) && !is_file_hugepages(file))
+			return -EINVAL;
+
+		if (!file->f_op || !file->f_op->mmap)
+			return -ENODEV;
+
+		if ((prot & PROT_EXEC) && (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
+			return -EPERM;
+	}
+
+	if (!len)
+		return addr;
+
+	/* Careful about overflows.. */
+	len = PAGE_ALIGN(len);
+	if (!len || len > TASK_SIZE)
+		return -EINVAL;
+
+	/* offset overflow? */
+	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
+		return -EINVAL;
+
+	/* Too many mappings? */
+	if (current->mm->map_count > sysctl_max_map_count)
+		return -ENOMEM;
+
+	/* Create an implicit hugetlb file if necessary */
+	if (!file && (flags & MAP_HUGETLB)) {
+		file = hugetlb_file = hugetlb_zero_setup(len);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+	}
+
+	result = __do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+
+	/* Drop reference to implicit hugetlb file, it's already been
+	 * "gotten" in __do_mmap_pgoff in case of success
+	 */
+	if (hugetlb_file)
+		fput(hugetlb_file);
+
+	return result;
+}
+
 EXPORT_SYMBOL(do_mmap_pgoff);
 
 /* Get an address range which is currently unmapped.
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-i386/mman.h	2004-05-09 19:32:01.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-i386/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed by hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-ppc64/mman.h	2004-05-09 19:32:28.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-ppc64/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -26,6 +26,7 @@
 #define MAP_LOCKED	0x80
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
unchanged:
--- wli-2.6.6-mm1.orig/include/linux/mman.h	2004-05-09 19:32:01.000000000 -0700
+++ wli-2.6.6-mm1/include/linux/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -58,6 +58,9 @@
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE) |
+#ifdef CONFIG_HUGETLB_PAGE
+               _calc_vm_trans(flags, MAP_HUGETLB,    VM_HUGETLB   ) |
+#endif
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
 }
 
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-sh/mman.h	2004-05-09 19:33:13.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-sh/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-ia64/mman.h	2004-05-09 19:32:37.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-ia64/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -24,6 +24,7 @@
 
 #define MAP_GROWSDOWN	0x00100		/* stack-like segment */
 #define MAP_GROWSUP	0x00200		/* register stack-like segment */
+#define MAP_HUGETLB	0x00400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x00800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x01000		/* mark it as an executable */
 #define MAP_LOCKED	0x02000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-sparc64/mman.h	2004-05-09 19:32:29.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-sparc64/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -24,6 +24,7 @@
 #define _MAP_NEW        0x80000000      /* Binary compatibility is fun... */
 
 #define MAP_GROWSDOWN	0x0200		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-x86_64/mman.h	2004-05-09 19:32:53.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-x86_64/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -17,6 +17,7 @@
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-ppc/mman.h	2004-05-09 19:33:22.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-ppc/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -19,6 +19,7 @@
 #define MAP_LOCKED	0x80
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_POPULATE	0x8000		/* populate (prefault) pagetables */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-alpha/mman.h	2004-05-09 19:31:58.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-alpha/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -28,6 +28,7 @@
 #define MAP_NORESERVE	0x10000		/* don't check for reservations */
 #define MAP_POPULATE	0x20000		/* populate (prefault) pagetables */
 #define MAP_NONBLOCK	0x40000		/* do not block on IO */
+#define MAP_HUGETLB	0x80000		/* Backed with hugetlb pages */
 
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC		2		/* synchronous memory sync */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-arm/mman.h	2004-05-09 19:32:27.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-arm/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-arm26/mman.h	2004-05-09 19:31:59.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-arm26/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-cris/mman.h	2004-05-09 19:33:20.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-cris/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -18,6 +18,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-h8300/mman.h	2004-05-09 19:32:38.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-h8300/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -15,6 +15,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-m68k/mman.h	2004-05-09 19:32:01.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-m68k/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -16,6 +16,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-mips/mman.h	2004-05-09 19:32:27.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-mips/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -38,6 +38,7 @@
 #define MAP_AUTORSRV	0x100		/* Logical swap reserved on demand */
 
 /* These are linux-specific */
+#define MAP_HUGETLB	0x0200		/* Backed with hugetlb pages */
 #define MAP_NORESERVE	0x0400		/* don't check for reservations */
 #define MAP_ANONYMOUS	0x0800		/* don't use a file */
 #define MAP_GROWSDOWN	0x1000		/* stack-like segment */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-parisc/mman.h	2004-05-09 19:32:52.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-parisc/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -15,6 +15,7 @@
 #define MAP_FIXED	0x04		/* Interpret addr exactly */
 #define MAP_ANONYMOUS	0x10		/* don't use a file */
 
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-s390/mman.h	2004-05-09 19:32:00.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-s390/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -24,6 +24,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-sparc/mman.h	2004-05-09 19:33:20.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-sparc/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -24,6 +24,7 @@
 #define _MAP_NEW        0x80000000      /* Binary compatibility is fun... */
 
 #define MAP_GROWSDOWN	0x0200		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
unchanged:
--- wli-2.6.6-mm1.orig/include/asm-v850/mman.h	2004-05-09 19:33:13.000000000 -0700
+++ wli-2.6.6-mm1/include/asm-v850/mman.h	2004-05-12 23:17:22.000000000 -0700
@@ -15,6 +15,7 @@
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
 
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_HUGETLB	0x0400		/* Backed with hugetlb pages */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */


* Re: More convenient way to grab hugepage memory
  2004-05-13  6:20   ` David Gibson
@ 2004-05-13  6:45     ` Roland Dreier
  0 siblings, 0 replies; 14+ messages in thread
From: Roland Dreier @ 2004-05-13  6:45 UTC (permalink / raw)
  To: David Gibson
  Cc: Muli Ben-Yehuda, Andrew Morton, Anton Blanchard, Adam Litke,
	Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

    David> Well, it's only called in one place - it's really the one
    David> function, just split up to let us put the wrapper creating
    David> the hugetlb file in the right place without excessive
    David> indentation.  I guess it doesn't really matter, with
    David> -funit-at-a-time it will end up the same anyway.

We seem to be in a strange situation with respect to -funit-at-a-time:

    arch/i386/Makefile:
    # Disable unit-at-a-time mode, it makes gcc use a lot more stack
    # due to the lack of sharing of stacklots.
    CFLAGS += $(call check_gcc,-fno-unit-at-a-time,)

    arch/x86_64/Makefile:
    # -funit-at-a-time shrinks the kernel .text considerably
    # unfortunately it makes reading oopses harder.
    CFLAGS += $(call check_gcc,-funit-at-a-time,)

    arch/ppc64/Makefile:
    # Enable unit-at-a-time mode when possible. It shrinks the
    # kernel considerably.
    CFLAGS += $(call check_gcc,-funit-at-a-time,)

It looks like i386/x86_64 led the way to -funit-at-a-time, ppc64
followed, and then i386 had second thoughts because of increased stack
usage around the time of 4K stacks.

No other archs have a position one way or another.

 -  Roland


* Re: More convenient way to grab hugepage memory
  2004-05-13  6:05 ` Muli Ben-Yehuda
  2004-05-13  6:20   ` David Gibson
@ 2004-05-13  6:59   ` William Lee Irwin III
  2004-05-13  7:04     ` William Lee Irwin III
  2004-05-13  7:09     ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 14+ messages in thread
From: William Lee Irwin III @ 2004-05-13  6:59 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: David Gibson, Andrew Morton, Anton Blanchard, Adam Litke,
	Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

On Thu, May 13, 2004 at 09:05:50AM +0300, Muli Ben-Yehuda wrote:
> These two global variables do not appear to be used anywhere in this
> patch? 
> > +static inline unsigned long __do_mmap_pgoff(struct file * file, unsigned long addr,
> __do_mmap_pgoff seems rather long to be inline? 

Atop my other patch to nuke the unused global variables, here is a patch
to manually inline __do_mmap_pgoff(), removing the inline usage. Untested.
Are you sure you want this? #ifdef'ing out the hugetlb case is somewhat
more digestible with the inline in place, e.g.:

Index: wli-2.6.6-mm1/mm/mmap.c
===================================================================
--- wli-2.6.6-mm1.orig/mm/mmap.c	2004-05-12 23:29:53.000000000 -0700
+++ wli-2.6.6-mm1/mm/mmap.c	2004-05-12 23:56:40.000000000 -0700
@@ -771,6 +771,25 @@
 	return error;
 }
 
+#ifdef CONFIG_HUGETLB_PAGE
+static inline int create_hugetlb_file(struct file **file, unsigned long flags)
+{
+	/* Create an implicit hugetlb file if necessary */
+	if (*file || !(flags & MAP_HUGETLB))
+		return 0;
+	*file = hugetlb_zero_setup(len);
+	if (!IS_ERR(*file))
+		return 0;
+	else
+		return PTR_ERR(*file);
+}
+#else
+static inline int create_hugetlb_file(struct file **file, unsigned long flags)
+{
+	return 0;
+}
+#endif
+
 /*
  * The caller must hold down_write(current->mm->mmap_sem).
  */
@@ -809,14 +828,10 @@
 	if (current->mm->map_count > sysctl_max_map_count)
 		return -ENOMEM;
 
-	/* Create an implicit hugetlb file if necessary */
-	if (!file && (flags & MAP_HUGETLB)) {
-		file = hugetlb_file = hugetlb_zero_setup(len);
-		if (IS_ERR(file))
-			return PTR_ERR(file);
-	}
-
-	result = __do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+	if (!(result = create_hugetlb_file(&hugetlb_file, flags)))
+		result = __do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+	else
+		return result;
 
 	/* Drop reference to implicit hugetlb file, it's already been
 	 * "gotten" in __do_mmap_pgoff in case of success

though I suppose it's possible in principle to combine things.
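
The helper above leans on the kernel's error-pointer convention
(IS_ERR()/PTR_ERR() from <linux/err.h>).  A rough userspace sketch of that
convention follows.  The err.h helpers are re-implemented in simplified
form, and sketch_file, fake_zero_setup(), create_implicit_file() and
SKETCH_MAP_HUGETLB are hypothetical stand-ins, not kernel names.  Unlike
the fragment above, the sketch passes len to the helper explicitly, which
is an assumption about the intent.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the helpers in <linux/err.h>. */
#define MAX_ERRNO 4095

void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define SKETCH_MAP_HUGETLB 0x0400	/* hypothetical flag value */

struct sketch_file {			/* hypothetical stand-in for struct file */
	unsigned long len;
};

static struct sketch_file fake_file;

/* Hypothetical stand-in for hugetlb_zero_setup(): fails when len is 0. */
struct sketch_file *fake_zero_setup(unsigned long len)
{
	if (!len)
		return ERR_PTR(-ENOMEM);
	fake_file.len = len;
	return &fake_file;
}

/* Returns 0 or a negative errno; *file is only written on success. */
int create_implicit_file(struct sketch_file **file, unsigned long flags,
			 unsigned long len)
{
	struct sketch_file *f;

	if (*file || !(flags & SKETCH_MAP_HUGETLB))
		return 0;
	f = fake_zero_setup(len);
	if (IS_ERR(f))
		return (int)PTR_ERR(f);
	*file = f;
	return 0;
}
```

Writing through the out-parameter only on success keeps an error pointer
from ever being handed on to the mapping code.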


-- wli

Remove __do_mmap_pgoff() in favor of direct case analysis within
do_mmap_pgoff() proper.
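
The single-exit shape this patch moves to can be sketched in userspace as
follows.  It is only an illustration of the reference handling, not the
kernel code: sketch_ref, ref_get(), ref_put() and map_sketch() are
hypothetical names, and the two failure switches stand in for the many
error returns that the patch converts to a goto.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted struct file. */
struct sketch_ref {
	int count;
};

void ref_get(struct sketch_ref *r) { r->count++; }
void ref_put(struct sketch_ref *r) { r->count--; }

/*
 * Every exit taken after the implicit object exists funnels through one
 * label, so the creator's reference is dropped exactly once.
 */
long map_sketch(struct sketch_ref *implicit, int fail_early, int fail_late)
{
	long result;

	if (implicit)
		ref_get(implicit);	/* reference held by the creator */

	if (fail_early) {
		result = -EPERM;
		goto out_check;
	}

	if (implicit)
		ref_get(implicit);	/* reference held by the mapping */

	if (fail_late) {
		if (implicit)
			ref_put(implicit);	/* undo the mapping's reference */
		result = -ENOMEM;
		goto out_check;
	}

	result = 0x1000;	/* arbitrary value standing in for an address */

out_check:
	if (implicit)
		ref_put(implicit);	/* drop the creator's reference */
	return result;
}
```

On success the object is left with the mapping's reference only; on either
failure path the count returns to where it started.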

Index: wli-2.6.6-mm1/mm/mmap.c
===================================================================
--- wli-2.6.6-mm1.orig/mm/mmap.c	2004-05-12 23:29:53.000000000 -0700
+++ wli-2.6.6-mm1/mm/mmap.c	2004-05-12 23:40:37.000000000 -0700
@@ -529,9 +529,12 @@
 	return NULL;
 }
 
-static inline unsigned long __do_mmap_pgoff(struct file * file, unsigned long addr,
-					    unsigned long len, unsigned long prot,
-					    unsigned long flags, unsigned long pgoff)
+/*
+ * The caller must hold down_write(current->mm->mmap_sem).
+ */
+unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
+			unsigned long len, unsigned long prot,
+			unsigned long flags, unsigned long pgoff)
 {
 	struct mm_struct * mm = current->mm;
 	struct vm_area_struct * vma, * prev;
@@ -542,6 +545,42 @@
 	struct rb_node ** rb_link, * rb_parent;
 	int accountable = 1;
 	unsigned long charged = 0;
+	struct file *hugetlb_file = NULL;
+	unsigned long result;
+
+	if (file) {
+		if ((flags & MAP_HUGETLB) && !is_file_hugepages(file))
+			return -EINVAL;
+
+		if (!file->f_op || !file->f_op->mmap)
+			return -ENODEV;
+
+		if ((prot & PROT_EXEC) && (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
+			return -EPERM;
+	}
+
+	if (!len)
+		return addr;
+
+	/* Careful about overflows.. */
+	len = PAGE_ALIGN(len);
+	if (!len || len > TASK_SIZE)
+		return -EINVAL;
+
+	/* offset overflow? */
+	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
+		return -EINVAL;
+
+	/* Too many mappings? */
+	if (mm->map_count > sysctl_max_map_count)
+		return -ENOMEM;
+
+	/* Create an implicit hugetlb file if necessary */
+	if (!file && (flags & MAP_HUGETLB)) {
+		file = hugetlb_file = hugetlb_zero_setup(len);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+	}
 
 	/* Obtain the address to map to. we verify (or select) it and
 	 * ensure that it represents a valid section of the address
@@ -549,9 +588,10 @@
 	 * CONFIG_HUGETLB is unset.
 	 */
 	addr = get_unmapped_area(file, addr, len, pgoff, flags);
-	if (addr & ~PAGE_MASK)
-		return addr;
-
+	if (addr & ~PAGE_MASK) {
+		result = addr;
+		goto out_check_hugetlb;
+	}
 	/* Huge pages aren't accounted for here */
 	if (file && is_file_hugepages(file))
 		accountable = 0;
@@ -564,16 +604,21 @@
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
 	if (flags & MAP_LOCKED) {
-		if (!can_do_mlock())
-			return -EPERM;
-		vm_flags |= VM_LOCKED;
+		if (can_do_mlock())
+			vm_flags |= VM_LOCKED;
+		else {
+			result = -EPERM;
+			goto out_check_hugetlb;
+		}
 	}
 	/* mlock MCL_FUTURE? */
 	if (vm_flags & VM_LOCKED) {
 		unsigned long locked = mm->locked_vm << PAGE_SHIFT;
 		locked += len;
-		if (locked > current->rlim[RLIMIT_MEMLOCK].rlim_cur)
-			return -EAGAIN;
+		if (locked > current->rlim[RLIMIT_MEMLOCK].rlim_cur) {
+			result = -EAGAIN;
+			goto out_check_hugetlb;
+		}
 	}
 
 	inode = file ? file->f_dentry->d_inode : NULL;
@@ -581,21 +626,27 @@
 	if (file) {
 		switch (flags & MAP_TYPE) {
 		case MAP_SHARED:
-			if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE))
-				return -EACCES;
+			if ((prot & PROT_WRITE) && !(file->f_mode & FMODE_WRITE)) {
+				result = -EACCES;
+				goto out_check_hugetlb;
+			}
 
 			/*
 			 * Make sure we don't allow writing to an append-only
 			 * file..
 			 */
-			if (IS_APPEND(inode) && (file->f_mode & FMODE_WRITE))
-				return -EACCES;
+			if (IS_APPEND(inode) && (file->f_mode & FMODE_WRITE)) {
+				result = -EACCES;
+				goto out_check_hugetlb;
+			}
 
 			/*
 			 * Make sure there are no mandatory locks on the file.
 			 */
-			if (locks_verify_locked(inode))
-				return -EAGAIN;
+			if (locks_verify_locked(inode)) {
+				result = -EAGAIN;
+				goto out_check_hugetlb;
+			}
 
 			vm_flags |= VM_SHARED | VM_MAYSHARE;
 			if (!(file->f_mode & FMODE_WRITE))
@@ -603,18 +654,22 @@
 
 			/* fall through */
 		case MAP_PRIVATE:
-			if (!(file->f_mode & FMODE_READ))
-				return -EACCES;
+			if (!(file->f_mode & FMODE_READ)) {
+				result = -EACCES;
+				goto out_check_hugetlb;
+			}
 			break;
 
 		default:
-			return -EINVAL;
+			result = -EINVAL;
+			goto out_check_hugetlb;
 		}
 	} else {
 		vm_flags |= VM_SHARED | VM_MAYSHARE;
 		switch (flags & MAP_TYPE) {
 		default:
-			return -EINVAL;
+			result = -EINVAL;
+			goto out_check_hugetlb;
 		case MAP_PRIVATE:
 			vm_flags &= ~(VM_SHARED | VM_MAYSHARE);
 			/* fall through */
@@ -624,23 +679,29 @@
 	}
 
 	error = security_file_mmap(file, prot, flags);
-	if (error)
-		return error;
-		
+	if (error) {
+		result = error;
+		goto out_check_hugetlb;
+	}
+
 	/* Clear old maps */
 	error = -ENOMEM;
 munmap_back:
 	vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
 	if (vma && vma->vm_start < addr + len) {
-		if (do_munmap(mm, addr, len))
-			return -ENOMEM;
+		if (do_munmap(mm, addr, len)) {
+			result = -ENOMEM;
+			goto out_check_hugetlb;
+		}
 		goto munmap_back;
 	}
 
 	/* Check against address space limit. */
 	if ((mm->total_vm << PAGE_SHIFT) + len
-	    > current->rlim[RLIMIT_AS].rlim_cur)
-		return -ENOMEM;
+	    > current->rlim[RLIMIT_AS].rlim_cur) {
+		result = -ENOMEM;
+		goto out_check_hugetlb;
+	}
 
 	if (accountable && (!(flags & MAP_NORESERVE) ||
 			sysctl_overcommit_memory > 1)) {
@@ -652,8 +713,10 @@
 			 * Private writable mapping: check memory availability
 			 */
 			charged = len >> PAGE_SHIFT;
-			if (security_vm_enough_memory(charged))
-				return -ENOMEM;
+			if (security_vm_enough_memory(charged)) {
+				result = -ENOMEM;
+				goto out_check_hugetlb;
+			}
 			vm_flags |= VM_ACCOUNT;
 		}
 	}
@@ -747,7 +810,8 @@
 					pgoff, flags & MAP_NONBLOCK);
 		down_write(&mm->mmap_sem);
 	}
-	return addr;
+	result = addr;
+	goto out_check_hugetlb;
 
 unmap_and_free_vma:
 	if (correct_wcount)
@@ -768,58 +832,12 @@
 unacct_error:
 	if (charged)
 		vm_unacct_memory(charged);
-	return error;
-}
-
-/*
- * The caller must hold down_write(current->mm->mmap_sem).
- */
-
-unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
-			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff)
-{
-	struct file *hugetlb_file = NULL;
-	unsigned long result;
-
-	if (file) {
-		if ((flags & MAP_HUGETLB) && !is_file_hugepages(file))
-			return -EINVAL;
-
-		if (!file->f_op || !file->f_op->mmap)
-			return -ENODEV;
+	result = error;
 
-		if ((prot & PROT_EXEC) && (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
-			return -EPERM;
-	}
-
-	if (!len)
-		return addr;
-
-	/* Careful about overflows.. */
-	len = PAGE_ALIGN(len);
-	if (!len || len > TASK_SIZE)
-		return -EINVAL;
-
-	/* offset overflow? */
-	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
-		return -EINVAL;
-
-	/* Too many mappings? */
-	if (current->mm->map_count > sysctl_max_map_count)
-		return -ENOMEM;
-
-	/* Create an implicit hugetlb file if necessary */
-	if (!file && (flags & MAP_HUGETLB)) {
-		file = hugetlb_file = hugetlb_zero_setup(len);
-		if (IS_ERR(file))
-			return PTR_ERR(file);
-	}
-
-	result = __do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-
-	/* Drop reference to implicit hugetlb file, it's already been
-	 * "gotten" in __do_mmap_pgoff in case of success
+out_check_hugetlb:
+	/*
+	 * Drop reference to the implicitly-allocated hugetlb file, if
+	 * MAP_HUGETLB was passed and the hugetlbfs inode succeeded.
 	 */
 	if (hugetlb_file)
 		fput(hugetlb_file);


* Re: More convenient way to grab hugepage memory
  2004-05-13  6:59   ` William Lee Irwin III
@ 2004-05-13  7:04     ` William Lee Irwin III
  2004-05-13  7:11       ` William Lee Irwin III
  2004-05-13  7:09     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 14+ messages in thread
From: William Lee Irwin III @ 2004-05-13  7:04 UTC (permalink / raw)
  To: Muli Ben-Yehuda, David Gibson, Andrew Morton, Anton Blanchard,
	Adam Litke, Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

On Wed, May 12, 2004 at 11:59:12PM -0700, William Lee Irwin III wrote:
> +#ifdef CONFIG_HUGETLB_PAGE
> +static inline int create_hugetlb_file(struct file **file, unsigned long flags)

That would be:


Index: wli-2.6.6-mm1/mm/mmap.c
===================================================================
--- wli-2.6.6-mm1.orig/mm/mmap.c	2004-05-12 23:29:53.000000000 -0700
+++ wli-2.6.6-mm1/mm/mmap.c	2004-05-13 00:02:36.000000000 -0700
@@ -771,6 +771,26 @@
 	return error;
 }
 
+#ifdef CONFIG_HUGETLB_PAGE
+static inline int create_hugetlb_file(struct file **file, unsigned long flags,
+							unsigned long len)
+{
+	/* Create an implicit hugetlb file if necessary */
+	if (*file || !(flags & MAP_HUGETLB))
+		return 0;
+	else if (!IS_ERR(*file = hugetlb_zero_setup(len)))
+		return 0;
+	else
+		return PTR_ERR(*file);
+}
+#else
+static inline int create_hugetlb_file(struct file **file, unsigned long flags,
+							unsigned long len)
+{
+	return 0;
+}
+#endif
+
 /*
  * The caller must hold down_write(current->mm->mmap_sem).
  */
@@ -809,14 +829,10 @@
 	if (current->mm->map_count > sysctl_max_map_count)
 		return -ENOMEM;
 
-	/* Create an implicit hugetlb file if necessary */
-	if (!file && (flags & MAP_HUGETLB)) {
-		file = hugetlb_file = hugetlb_zero_setup(len);
-		if (IS_ERR(file))
-			return PTR_ERR(file);
-	}
-
-	result = __do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+	if (!(result = create_hugetlb_file(&hugetlb_file, flags, len)))
+		result = __do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+	else
+		return result;
 
 	/* Drop reference to implicit hugetlb file, it's already been
 	 * "gotten" in __do_mmap_pgoff in case of success


* Re: More convenient way to grab hugepage memory
  2004-05-13  6:59   ` William Lee Irwin III
  2004-05-13  7:04     ` William Lee Irwin III
@ 2004-05-13  7:09     ` Benjamin Herrenschmidt
  2004-05-13  7:13       ` William Lee Irwin III
  1 sibling, 1 reply; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2004-05-13  7:09 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Muli Ben-Yehuda, David Gibson, Andrew Morton, Anton Blanchard,
	Adam Litke, Linux Kernel list, linuxppc64-dev

On Thu, 2004-05-13 at 16:59, William Lee Irwin III wrote:
> On Thu, May 13, 2004 at 09:05:50AM +0300, Muli Ben-Yehuda wrote:
> > These two global variables do not appear to be used anywhere in this
> > patch?
> > > +static inline unsigned long __do_mmap_pgoff(struct file * file, unsigned long addr,
> > __do_mmap_pgoff seems rather long to be inline?
> 
> Atop my other patch to nuke the unused global variables, here is a patch
> to manually inline __do_mmap_pgoff(), removing the inline usage. Untested.
> Are you sure you want this? #ifdef'ing out the hugetlb case is somewhat
> more digestible with the inline in place, e.g.:

Well, I did the breakup in 2 pieces in the first place for 2 reasons:

 - the original patch had some subtle issues with accounting
 - do_mmap_pgoff is already such a mess, let's not make it worse

I mean, it's awful to get anything right in this function, especially
the cleanup/exit path, which is why I think it's more maintainable
cut in 2.
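
A minimal userspace sketch of the "cut in 2" shape described above, not the kernel code: the outer function owns the resource and releases it at a single point, so the inner function can return early from anywhere. The names fill_scratch() and with_scratch() and the error values are hypothetical, chosen only for illustration.

```c
#include <stdlib.h>

/* Inner worker: free to return early, because it owns no resources. */
static long fill_scratch(char *buf, size_t len)
{
	if (len == 0)
		return -1;	/* nothing to do */
	if (len > 4096)
		return -2;	/* larger than this sketch allows */
	for (size_t i = 0; i < len; i++)
		buf[i] = 0;
	return (long)len;
}

/* Outer wrapper: acquires the resource, delegates, releases exactly once. */
long with_scratch(size_t len)
{
	char *buf = malloc(4096);
	long result;

	if (!buf)
		return -3;
	result = fill_scratch(buf, len);
	free(buf);	/* single cleanup point for every inner outcome */
	return result;
}
```

The alternative, one function with a shared exit label, needs every early return rewritten as an assignment plus a goto, which is where the exit path gets hard to audit.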

Ben.




* Re: More convenient way to grab hugepage memory
  2004-05-13  7:04     ` William Lee Irwin III
@ 2004-05-13  7:11       ` William Lee Irwin III
  0 siblings, 0 replies; 14+ messages in thread
From: William Lee Irwin III @ 2004-05-13  7:11 UTC (permalink / raw)
  To: Muli Ben-Yehuda, David Gibson, Andrew Morton, Anton Blanchard,
	Adam Litke, Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

On Thu, May 13, 2004 at 12:04:34AM -0700, William Lee Irwin III wrote:
> That would be:

At some point while I wasn't looking hugetlb_zero_setup() got an
ERR_PTR(-ENOSYS) implementation for #ifndef CONFIG_HUGETLB_PAGE so the
whole create_hugetlb_file() bit is unnecessary.
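
For readers unfamiliar with the idiom, here is a userspace sketch of why such a stub makes the #ifdef wrapper redundant. The ERR_PTR/PTR_ERR/IS_ERR helpers below are loosely modelled re-creations of the kernel's, not the kernel definitions, and hugetlb_zero_setup_stub() and map_anon_huge() are hypothetical names for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* assumption: errno values fit in the top 4095 addresses */

/* Encode a negative errno value in a pointer, and decode it again. */
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct file;	/* opaque in this sketch */

/* Stand-in for the stub used when hugetlb support is compiled out. */
static struct file *hugetlb_zero_setup_stub(size_t len)
{
	(void)len;
	return ERR_PTR(-ENOSYS);
}

/* The caller needs no #ifdef: the ordinary error check covers the stub. */
long map_anon_huge(size_t len)
{
	struct file *f = hugetlb_zero_setup_stub(len);

	if (IS_ERR(f))
		return PTR_ERR(f);
	return 0;
}
```

With the stub in place, a caller that already checks IS_ERR() gets -ENOSYS on kernels built without hugetlb support, with no conditional compilation at the call site.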


-- wli


* Re: More convenient way to grab hugepage memory
  2004-05-13  7:09     ` Benjamin Herrenschmidt
@ 2004-05-13  7:13       ` William Lee Irwin III
  2004-05-13  8:05         ` Muli Ben-Yehuda
  0 siblings, 1 reply; 14+ messages in thread
From: William Lee Irwin III @ 2004-05-13  7:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Muli Ben-Yehuda, David Gibson, Andrew Morton, Anton Blanchard,
	Adam Litke, Linux Kernel list, linuxppc64-dev

On Thu, 2004-05-13 at 16:59, William Lee Irwin III wrote:
>> Atop my other patch to nuke the unused global variables, here is a patch
>> to manually inline __do_mmap_pgoff(), removing the inline usage. Untested.
>> Are you sure you want this? #ifdef'ing out the hugetlb case is somewhat
>> more digestible with the inline in place, e.g.:

On Thu, May 13, 2004 at 05:09:01PM +1000, Benjamin Herrenschmidt wrote:
> Well, I did the breakup in 2 pieces in the first place for 2 reasons:
>  - the original patch had some subtle issues with accounting
>  - do_mmap_pgoff is already such a mess, let's not make it worse
> I mean, it's awful to get anything right in this function, especially
> the cleanup/exit path, which is why I think it's more maintainable
> cut in 2.

Well, writing it vaguely convinced me that it wasn't a great idea; I
suppose now that Muli can look at the result he'll be convinced likewise.


-- wli


* Re: More convenient way to grab hugepage memory
  2004-05-13  5:55 More convenient way to grab hugepage memory David Gibson
  2004-05-13  6:05 ` Muli Ben-Yehuda
  2004-05-13  6:06 ` William Lee Irwin III
@ 2004-05-13  7:49 ` Christoph Hellwig
  2004-05-13  8:06   ` Christoph Hellwig
  2 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2004-05-13  7:49 UTC (permalink / raw)
  To: David Gibson, Andrew Morton, Anton Blanchard, Adam Litke,
	Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

On Thu, May 13, 2004 at 03:55:20PM +1000, David Gibson wrote:
> Andrew, please apply:
> 
> At present, getting a block of (quasi-) anonymous memory mapping with
> hugepages is a slightly convoluted process, involving creating a dummy
> file in a hugetlbfs filesystem.  In particular that means finding
> where such a filesystem is mounted, for which there is no standard
> mechanism.  Getting hugepage SysV shm segments is easier, just requiring
> the SHM_HUGETLB flag.  This patch adds an analogous MAP_HUGETLB mmap()
> flag to easily request that a block of anonymous memory come from
> hugepages.
> 
> [The MAP_HUGETLB flag has the side effect that MAP_SHARED semantics
> will apply, even if MAP_PRIVATE is specified - but that's no different
> to explicitly mapping hugetlbfs].

Please don't do this.  It's messing all over sensitive code paths in the
kernel, creating special cases and bloat for what you could do with a
simple hugetlb_mmap() wrapper, a la (pseudocode)

hugetlb_mmap()
{
	fd = open(file in hugetlbfs)

	mmap(.., fd, ...)
	close(fd)
}

in some library.  The hugetlbfs implementation was chosen exactly because
it kept the impact of hugetlb pages on normal kernel code paths down.
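
A runnable sketch of what such a library wrapper could look like, under stated assumptions: the caller supplies the hugetlbfs mount point as dir (the sketch does not discover it), the scratch file name is arbitrary, and on a real hugetlbfs mount len would need to be a multiple of the huge page size. The signature is hypothetical, not an existing library API.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map len bytes backed by a fresh, already-unlinked file created in dir.
 * dir is assumed to be a hugetlbfs mount point chosen by the caller.
 * Returns MAP_FAILED on any error.
 */
void *hugetlb_mmap(const char *dir, size_t len)
{
	char path[PATH_MAX];
	void *addr;
	int n, fd;

	n = snprintf(path, sizeof(path), "%s/hugetlb-XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return MAP_FAILED;	/* path would not fit */

	fd = mkstemp(path);	/* fd = open(file in hugetlbfs) */
	if (fd < 0)
		return MAP_FAILED;
	unlink(path);		/* the mapping keeps the inode alive */

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);		/* the fd is not needed once mapped */
	return addr;
}
```

The caller releases the memory with munmap(addr, len). Because the file is created inside the mount, ordinary directory permissions on that mount decide who may allocate huge pages.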



* Re: More convenient way to grab hugepage memory
  2004-05-13  7:13       ` William Lee Irwin III
@ 2004-05-13  8:05         ` Muli Ben-Yehuda
  0 siblings, 0 replies; 14+ messages in thread
From: Muli Ben-Yehuda @ 2004-05-13  8:05 UTC (permalink / raw)
  To: William Lee Irwin III, Benjamin Herrenschmidt, David Gibson,
	Andrew Morton, Anton Blanchard, Adam Litke, Linux Kernel list,
	linuxppc64-dev


On Thu, May 13, 2004 at 12:13:59AM -0700, William Lee Irwin III wrote:
> On Thu, 2004-05-13 at 16:59, William Lee Irwin III wrote:
> >> Atop my other patch to nuke the unused global variables, here is a patch
> >> to manually inline __do_mmap_pgoff(), removing the inline usage. Untested.
> >> Are you sure you want this? #ifdef'ing out the hugetlb case is somewhat
> >> more digestible with the inline in place, e.g.:
> 
> On Thu, May 13, 2004 at 05:09:01PM +1000, Benjamin Herrenschmidt wrote:
> > Well, I did the breakup in 2 pieces in the first place for 2 reasons:
> >  - the original patch had some subtle issues with accounting
> >  - do_mmap_pgoff is already such a mess, let's not make it worse
> > I mean, it's awful to get anything right in this function, especially
> > the cleanup/exit path, which is why I think it's more maintainable
> > cut in 2.
> 
> Well, writing it vaguely convinced me that it wasn't a great idea; I
> suppose now that Muli can look at the result he'll be convinced
> likewise.

No need to convince me; my comment was strictly with regard to
inlining the function (as opposed to just leaving it static), not with
regard to splitting up the messy horror that is do_mmap_pgoff, which
I am very much in favor of.

Cheers, 
Muli 
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/




* Re: More convenient way to grab hugepage memory
  2004-05-13  7:49 ` Christoph Hellwig
@ 2004-05-13  8:06   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2004-05-13  8:06 UTC (permalink / raw)
  To: Christoph Hellwig, David Gibson, Andrew Morton, Anton Blanchard,
	Adam Litke, Benjamin Herrenschmidt, linux-kernel, linuxppc64-dev

On Thu, May 13, 2004 at 08:49:03AM +0100, Christoph Hellwig wrote:
> Please don't do this.  It's messing all over sensitive code paths in the
> kernel, creating special cases and bloat for what you could do with a
> simple hugetlb_mmap() wrapper, a la (pseudocode)

Another thing is that you could also simply override the mmap symbol from
glibc to transparently use hugetlb pages.

Another problem with this interface is that hugetlb_zero_setup bypasses
directory-based permissions, i.e. it has the same design bug as the
broken sysv shm extension for hugetlb pages and thus needs privileges
or one of the horrible hacks discussed on lkml over the last few days.



Thread overview: 14+ messages
2004-05-13  5:55 More convenient way to grab hugepage memory David Gibson
2004-05-13  6:05 ` Muli Ben-Yehuda
2004-05-13  6:20   ` David Gibson
2004-05-13  6:45     ` Roland Dreier
2004-05-13  6:59   ` William Lee Irwin III
2004-05-13  7:04     ` William Lee Irwin III
2004-05-13  7:11       ` William Lee Irwin III
2004-05-13  7:09     ` Benjamin Herrenschmidt
2004-05-13  7:13       ` William Lee Irwin III
2004-05-13  8:05         ` Muli Ben-Yehuda
2004-05-13  6:06 ` William Lee Irwin III
2004-05-13  6:27   ` William Lee Irwin III
2004-05-13  7:49 ` Christoph Hellwig
2004-05-13  8:06   ` Christoph Hellwig
