All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: dhowells@redhat.com, torvalds@linux-foundation.org,
	bernds_cb1@t-online.de, rgetz@blackfin.uclinux.org,
	vapier.adi@gmail.com, lethal@linux-sh.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 04/10] NOMMU: Make VMAs per MM as for MMU-mode linux [ver #3]
Date: Tue, 13 Jan 2009 10:51:28 +0000	[thread overview]
Message-ID: <10270.1231843888@redhat.com> (raw)
In-Reply-To: <20090112235550.2d6be428.akpm@linux-foundation.org>

Andrew Morton <akpm@linux-foundation.org> wrote:

> > +		K((unsigned long) atomic_read(&mmap_pages_allocated)),
> 
> This is slightly wrong, as atomic_read() returns "int".  It will show
> weird results on 64-bit nommu machines which allocate more than 2G pages :)

Okay...  I'll make mmap_pages_allocated an atomic_long.  That should be
exactly the same on a 32-bit system.

> > +		   vma->vm_pgoff << PAGE_SHIFT,
> 
> This will overflow at file offsets >4G?

True.  I'll cast it to unsigned long long before shifting and adjust the print
format to %08llx.

> > +	vm_area_cachep = kmem_cache_create("vm_area_struct",
> > +			sizeof(struct vm_area_struct), 0,
> > +			SLAB_PANIC, NULL);
> 
> We have the KMEM_CACHE() macro for this.

Not that fork.c seems to use it.

> (I see this was copy-n-pasted from fork.c.  Why was it moved?)

Because I wanted to do something different for vm_area_struct under nommu for
a time.  I'll move it back.

> open-coded BUG_ON()

All changed.

> `region' could be local to this block, if one feels so inclined.

But then you can't look in the variable with gdb.

> > +#define validate_nommu_regions() do {} while(0)
> 
> There's no need to implement this in cpp...

True.  There's also no need to implement it as a static inline either.

> > +static void free_page_series(unsigned long from, unsigned long to)
> > ...
> > +		atomic_dec(&mmap_pages_allocated);
> 
> This isn't counting "pages allocated".  It's counting page *refcounts*.
> Now, perhaps these pages will always have a refcount of 1 (why?), in
> which case things happen to work out.  But if so, the next line isn't
> needed:

It's actually keeping track of the number of pages we have recorded in
vm_region structs as being allocated.  Once we put the pages here, we can't
track them anymore.

Furthermore, they should have a refcount of 1 at this point, unless there's a
bug in ptrace or direct I/O.

> > +		if (page_count(page) != 1)
> 
> should be "== 1", surely?

Why?  That would be silly.  I want to warn _if_ the page count isn't what I
expect, not if it is.  I'll adjust the text of the warning to make it clear.

> > + * - the caller must hold the region semaphore, which this releases
> 
> "for writing"

Yep.

> > +		printk(KERN_WARNING
> > +		       "munmap of memory not mmapped by process %d (%s):"
> > +		       " 0x%lx-0x%lx\n",
> > +		       current->pid, current->comm, start, start + len - 1);
> 
> Unprivileged userspace can spam logs?  Perhaps not a concern on typical
> nommu systems.  But buggy userspace can spam logs, too, which _is_ a
> problem?

I'll limit it to 5 printks.  But this is NOMMU: if userspace wants to corrupt
the kernel, it can do so at will.


Patch attached.  Now I have to go and catch a plane to Australia.

David
---
From: David Howells <dhowells@redhat.com>
Subject: [PATCH] NOMMU: Fix a number of issues with the per-MM VMA patch

Fix a number of issues with the per-MM VMA patch:

 (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
     a NOMMU system with more than 2G pages.  Makes no difference on a 32-bit
     system.

 (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
     lest it overflow.

 (3) Move the allocation of the vm_area_struct slab back for fork.c.

 (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.

 (5) Use BUG_ON() rather than if () BUG().

 (6) Make the default validate_nommu_regions() a static inline rather than a
     #define.

 (7) Make free_page_series()'s objection to pages with a refcount != 1 more
     informative.

 (8) Adjust the __put_nommu_region() banner comment to indicate that the
     semaphore must be held for writing.

 (9) Limit the number of warnings about munmaps of non-mmapped regions.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/proc/meminfo.c    |    2 +-
 fs/proc/task_nommu.c |    4 ++--
 include/linux/mm.h   |    2 +-
 kernel/fork.c        |    1 +
 mm/mmap.c            |    3 ---
 mm/nommu.c           |   52 ++++++++++++++++++++++++--------------------------
 6 files changed, 30 insertions(+), 34 deletions(-)


diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 43d2394..74ea974 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -120,7 +120,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		K(i.freeram-i.freehigh),
 #endif
 #ifndef CONFIG_MMU
-		K((unsigned long) atomic_read(&mmap_pages_allocated)),
+		K((unsigned long) atomic_long_read(&mmap_pages_allocated)),
 #endif
 		K(i.totalswap),
 		K(i.freeswap),
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 343ea12..370be0a 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -136,14 +136,14 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
 	}
 
 	seq_printf(m,
-		   "%08lx-%08lx %c%c%c%c %08lx %02x:%02x %lu %n",
+		   "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
 		   vma->vm_start,
 		   vma->vm_end,
 		   flags & VM_READ ? 'r' : '-',
 		   flags & VM_WRITE ? 'w' : '-',
 		   flags & VM_EXEC ? 'x' : '-',
 		   flags & VM_MAYSHARE ? flags & VM_SHARED ? 'S' : 's' : 'p',
-		   vma->vm_pgoff << PAGE_SHIFT,
+		   (unsigned long long) vma->vm_pgoff << PAGE_SHIFT,
 		   MAJOR(dev), MINOR(dev), ino, &len);
 
 	if (file) {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b91a73f..d9a832c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1064,7 +1064,7 @@ static inline void setup_per_cpu_pageset(void) {}
 #endif
 
 /* nommu.c */
-extern atomic_t mmap_pages_allocated;
+extern atomic_long_t mmap_pages_allocated;
 
 /* prio_tree.c */
 void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);
diff --git a/kernel/fork.c b/kernel/fork.c
index 1d68f12..119b6ed 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1484,6 +1484,7 @@ void __init proc_caches_init(void)
 	mm_cachep = kmem_cache_create("mm_struct",
 			sizeof(struct mm_struct), ARCH_MIN_MMSTRUCT_ALIGN,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC);
 	mmap_init();
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 7496231..46d126c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2478,7 +2478,4 @@ void mm_drop_all_locks(struct mm_struct *mm)
  */
 void __init mmap_init(void)
 {
-	vm_area_cachep = kmem_cache_create("vm_area_struct",
-			sizeof(struct vm_area_struct), 0,
-			SLAB_PANIC, NULL);
 }
diff --git a/mm/nommu.c b/mm/nommu.c
index 60ed837..427d8c4 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -69,7 +69,7 @@ int sysctl_max_map_count = DEFAULT_MAX_MAP_COUNT;
 int sysctl_nr_trim_pages = 1; /* page trimming behaviour */
 int heap_stack_gap = 0;
 
-atomic_t mmap_pages_allocated;
+atomic_long_t mmap_pages_allocated;
 
 EXPORT_SYMBOL(mem_map);
 EXPORT_SYMBOL(num_physpages);
@@ -445,12 +445,7 @@ asmlinkage unsigned long sys_brk(unsigned long brk)
  */
 void __init mmap_init(void)
 {
-	vm_region_jar = kmem_cache_create("vm_region_jar",
-					  sizeof(struct vm_region), 0,
-					  SLAB_PANIC, NULL);
-	vm_area_cachep = kmem_cache_create("vm_area_struct",
-					   sizeof(struct vm_area_struct), 0,
-					   SLAB_PANIC, NULL);
+	vm_region_jar = KMEM_CACHE(vm_region, SLAB_PANIC);
 }
 
 /*
@@ -468,27 +463,24 @@ static noinline void validate_nommu_regions(void)
 		return;
 
 	last = rb_entry(lastp, struct vm_region, vm_rb);
-	if (unlikely(last->vm_end <= last->vm_start))
-		BUG();
-	if (unlikely(last->vm_top < last->vm_end))
-		BUG();
+	BUG_ON(unlikely(last->vm_end <= last->vm_start));
+	BUG_ON(unlikely(last->vm_top < last->vm_end));
 
 	while ((p = rb_next(lastp))) {
 		region = rb_entry(p, struct vm_region, vm_rb);
 		last = rb_entry(lastp, struct vm_region, vm_rb);
 
-		if (unlikely(region->vm_end <= region->vm_start))
-			BUG();
-		if (unlikely(region->vm_top < region->vm_end))
-			BUG();
-		if (unlikely(region->vm_start < last->vm_top))
-			BUG();
+		BUG_ON(unlikely(region->vm_end <= region->vm_start));
+		BUG_ON(unlikely(region->vm_top < region->vm_end));
+		BUG_ON(unlikely(region->vm_start < last->vm_top));
 
 		lastp = p;
 	}
 }
 #else
-#define validate_nommu_regions() do {} while(0)
+static void validate_nommu_regions(void)
+{
+}
 #endif
 
 /*
@@ -545,16 +537,17 @@ static void free_page_series(unsigned long from, unsigned long to)
 		struct page *page = virt_to_page(from);
 
 		kdebug("- free %lx", from);
-		atomic_dec(&mmap_pages_allocated);
+		atomic_long_dec(&mmap_pages_allocated);
 		if (page_count(page) != 1)
-			kdebug("free page %p [%d]", page, page_count(page));
+			kdebug("free page %p: refcount not one: %d",
+			       page, page_count(page));
 		put_page(page);
 	}
 }
 
 /*
  * release a reference to a region
- * - the caller must hold the region semaphore, which this releases
+ * - the caller must hold the region semaphore for writing, which this releases
  * - the region may not have been added to the tree yet, in which case vm_top
  *   will equal vm_start
  */
@@ -1078,7 +1071,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
 		goto enomem;
 
 	total = 1 << order;
-	atomic_add(total, &mmap_pages_allocated);
+	atomic_long_add(total, &mmap_pages_allocated);
 
 	point = rlen >> PAGE_SHIFT;
 
@@ -1089,7 +1082,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
 			order = ilog2(total - point);
 			n = 1 << order;
 			kdebug("shave %lu/%lu @%lu", n, total - point, total);
-			atomic_sub(n, &mmap_pages_allocated);
+			atomic_long_sub(n, &mmap_pages_allocated);
 			total -= n;
 			set_page_refcounted(pages + total);
 			__free_pages(pages + total, order);
@@ -1518,10 +1511,15 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	/* find the first potentially overlapping VMA */
 	vma = find_vma(mm, start);
 	if (!vma) {
-		printk(KERN_WARNING
-		       "munmap of memory not mmapped by process %d (%s):"
-		       " 0x%lx-0x%lx\n",
-		       current->pid, current->comm, start, start + len - 1);
+		static int limit = 0;
+		if (limit < 5) {
+			printk(KERN_WARNING
+			       "munmap of memory not mmapped by process %d"
+			       " (%s): 0x%lx-0x%lx\n",
+			       current->pid, current->comm,
+			       start, start + len - 1);
+			limit++;
+		}
 		return -EINVAL;
 	}
 

  reply	other threads:[~2009-01-13 10:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-08 12:53 [PATCH 00/10] Alter NOMMU mmap handling [ver #3] David Howells
2009-01-08 12:53 ` [PATCH 01/10] NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area() " David Howells
2009-01-08 12:53 ` [PATCH 02/10] NOMMU: Rename ARM's struct vm_region " David Howells
2009-01-08 12:54 ` [PATCH 03/10] NOMMU: Delete askedalloc and realalloc variables " David Howells
2009-01-08 12:54 ` [PATCH 04/10] NOMMU: Make VMAs per MM as for MMU-mode linux " David Howells
2009-01-13  7:55   ` Andrew Morton
2009-01-13 10:51     ` David Howells [this message]
2009-01-08 12:54 ` [PATCH 05/10] NOMMU: Make mmap allocation page trimming behaviour configurable. " David Howells
2009-01-08 12:54 ` [PATCH 06/10] NOMMU: Improve procfs output using per-MM VMAs " David Howells
2009-01-08 12:54 ` [PATCH 07/10] FDPIC: Don't attempt to expand the userspace stack to fill the space allocated " David Howells
2009-01-08 12:54 ` [PATCH 08/10] FLAT: " David Howells
2009-01-08 12:54 ` [PATCH 09/10] NOMMU: Teach kobjsize() about VMA regions. " David Howells
2009-01-08 12:54 ` [PATCH 10/10] NOMMU: Support XIP on initramfs " David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=10270.1231843888@redhat.com \
    --to=dhowells@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bernds_cb1@t-online.de \
    --cc=lethal@linux-sh.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rgetz@blackfin.uclinux.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vapier.adi@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.