public inbox for kvm@vger.kernel.org
* Re: [PATCH 0/4] Fix vmalloc regression
  2008-11-07 22:35 [PATCH 0/4] Fix vmalloc regression Glauber Costa
@ 2008-11-07 21:56 ` walt
  2008-11-07 22:35 ` [PATCH 1/4] don't call __vmalloc from other vmap internal functions Glauber Costa
  2008-11-08  0:58 ` [PATCH 0/4] Fix vmalloc regression Nick Piggin
  2 siblings, 0 replies; 10+ messages in thread
From: walt @ 2008-11-07 21:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

Glauber Costa wrote:
> Nick,
>
> This is the whole set of patches I was talking about.
> Patch 3 is the one that actually fixes the problem...

Yep, patch 3 works for me, thanks.  Only the 32-bit kernel
seems to need the patch, FWIW.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 0/4] Fix vmalloc regression
@ 2008-11-07 22:35 Glauber Costa
  2008-11-07 21:56 ` walt
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Glauber Costa @ 2008-11-07 22:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, avi, npiggin

Nick,

This is the whole set of patches I was talking about.
Patch 3 is the one that actually fixes the problem.
Patches 1 and 2 are debugging aids I made use of, and could possibly be
useful to others.
Patch 4 removes guard pages entirely for non-debug kernels, as we have
previously discussed.

Hope it's all fine.




* [PATCH 1/4] don't call __vmalloc from other vmap internal functions
  2008-11-07 22:35 [PATCH 0/4] Fix vmalloc regression Glauber Costa
  2008-11-07 21:56 ` walt
@ 2008-11-07 22:35 ` Glauber Costa
  2008-11-07 22:35   ` [PATCH 2/4] show size of failing allocation Glauber Costa
  2008-11-08  0:58 ` [PATCH 0/4] Fix vmalloc regression Nick Piggin
  2 siblings, 1 reply; 10+ messages in thread
From: Glauber Costa @ 2008-11-07 22:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, avi, npiggin

If we do that, the output of files like /proc/vmallocinfo
will show wrappers like "vmalloc_32" or "vmalloc_user" (or
whatever the immediate caller was) as the caller. That is not
as useful as the real caller of the allocation.

So the proposal is to call __vmalloc_node() directly, with
matching parameters, to preserve the caller information.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 mm/vmalloc.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0365369..95856d1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1343,7 +1343,8 @@ void *vmalloc_user(unsigned long size)
 	struct vm_struct *area;
 	void *ret;
 
-	ret = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL);
+	ret = __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
+			     PAGE_KERNEL, -1, __builtin_return_address(0));
 	if (ret) {
 		area = find_vm_area(ret);
 		area->flags |= VM_USERMAP;
@@ -1388,7 +1389,8 @@ EXPORT_SYMBOL(vmalloc_node);
 
 void *vmalloc_exec(unsigned long size)
 {
-	return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC);
+	return __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC,
+			      -1, __builtin_return_address(0));
 }
 
 #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
@@ -1408,7 +1410,8 @@ void *vmalloc_exec(unsigned long size)
  */
 void *vmalloc_32(unsigned long size)
 {
-	return __vmalloc(size, GFP_VMALLOC32, PAGE_KERNEL);
+	return __vmalloc_node(size, GFP_VMALLOC32, PAGE_KERNEL,
+			      -1, __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_32);
 
@@ -1424,7 +1427,8 @@ void *vmalloc_32_user(unsigned long size)
 	struct vm_struct *area;
 	void *ret;
 
-	ret = __vmalloc(size, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL);
+	ret = __vmalloc_node(size, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
+			     -1, __builtin_return_address(0));
 	if (ret) {
 		area = find_vm_area(ret);
 		area->flags |= VM_USERMAP;
-- 
1.5.6.5
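For readers unfamiliar with the mechanism, the caller-tracking idea can be
sketched in userspace C (a toy model, not kernel code; `record_alloc` and
`wrapper_alloc` are made-up names): `__builtin_return_address(0)` yields the
return address of the current function, so a wrapper that passes it down
records its own caller.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: the allocator entry point records the address it is
 * handed, mirroring how __vmalloc_node() stores the caller that
 * /proc/vmallocinfo later reports. */
static const void *last_caller;

static void record_alloc(const void *caller)
{
	last_caller = caller;
}

/* A wrapper like vmalloc_32(): passing __builtin_return_address(0)
 * records the wrapper's *user*; routing through a middle function
 * (as __vmalloc() did) would record the wrapper itself instead. */
__attribute__((noinline)) void wrapper_alloc(void)
{
	record_alloc(__builtin_return_address(0));
}
```

With the old `__vmalloc()` call chain, the equivalent of `last_caller` would
always name the wrapper (e.g. vmalloc_32) rather than its user.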



* [PATCH 2/4] show size of failing allocation
  2008-11-07 22:35 ` [PATCH 1/4] don't call __vmalloc from other vmap internal functions Glauber Costa
@ 2008-11-07 22:35   ` Glauber Costa
  2008-11-07 22:35     ` [PATCH 3/4] restart search at beginning of vmalloc address Glauber Costa
  0 siblings, 1 reply; 10+ messages in thread
From: Glauber Costa @ 2008-11-07 22:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, avi, npiggin

If we can't service a vmalloc allocation, show the size of
the allocation that actually failed. Useful for
debugging.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 mm/vmalloc.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 95856d1..7db493d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -381,8 +381,8 @@ found:
 			goto retry;
 		}
 		if (printk_ratelimit())
-			printk(KERN_WARNING "vmap allocation failed: "
-				 "use vmalloc=<size> to increase size.\n");
+			printk(KERN_WARNING "vmap allocation for size %lu failed: "
+				 "use vmalloc=<size> to increase size.\n", size);
 		return ERR_PTR(-EBUSY);
 	}
 
-- 
1.5.6.5
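Note that `size` is an `unsigned long` in `alloc_vmap_area()`, so the format
specifier should be `%lu`. A standalone sketch of the message formatting
(illustrative only; `format_warning` is a made-up name):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the warning text; %lu matches the unsigned long argument,
 * where %d would be a format/argument mismatch on 64-bit kernels. */
static int format_warning(char *buf, size_t n, unsigned long size)
{
	return snprintf(buf, n,
			"vmap allocation for size %lu failed: "
			"use vmalloc=<size> to increase size.", size);
}
```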



* [PATCH 3/4] restart search at beginning of vmalloc address
  2008-11-07 22:35   ` [PATCH 2/4] show size of failing allocation Glauber Costa
@ 2008-11-07 22:35     ` Glauber Costa
  2008-11-07 22:35       ` [PATCH 4/4] Do not use guard pages in non-debug kernels Glauber Costa
  2008-11-14 17:53       ` [PATCH 3/4] restart search at beginning of vmalloc address walt
  0 siblings, 2 replies; 10+ messages in thread
From: Glauber Costa @ 2008-11-07 22:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, avi, npiggin

Currently, vmalloc restarts the search for a free area if it
cannot find one. The reason is that there are areas which are lazily
freed, and could possibly be freed now. However, the current
implementation restarts the search from the last failing address, which
is pretty much by definition at the end of the address space. So, we fail.

This patch restarts the search from the requested vstart address
instead. This fixes the regression in running KVM virtual machines
for me, described in http://lkml.org/lkml/2008/10/28/349, caused by
commit db64fe02258f1507e13fe5212a989922323685ce.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Nick Piggin <npiggin@suse.de>
---
 mm/vmalloc.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7db493d..6fe2003 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -378,6 +378,7 @@ found:
 		if (!purged) {
 			purge_vmap_area_lazy();
 			purged = 1;
+			addr = ALIGN(vstart, align);
 			goto retry;
 		}
 		if (printk_ratelimit())
-- 
1.5.6.5
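The shape of the fix can be modeled in a few lines of userspace C (a
hypothetical toy: `find_free` stands in for the red-black tree walk, `purge`
for purge_vmap_area_lazy()). Without resetting the cursor, the retry resumes
past the end of the space and fails even though the purge just opened a hole:

```c
#include <assert.h>

#define NSLOTS 8

static int busy[NSLOTS];   /* nonzero = area in use     */
static int lazy_slot = 2;  /* an area pending lazy free */

/* First-fit scan from 'start'; returns a free slot or -1. */
static int find_free(int start)
{
	for (int i = start; i < NSLOTS; i++)
		if (!busy[i])
			return i;
	return -1;
}

static void purge(void)    /* purge_vmap_area_lazy() analogue */
{
	busy[lazy_slot] = 0;
}

/* The patched retry: purge once, then restart from the beginning,
 * as addr = ALIGN(vstart, align) does in the real code. */
static int alloc_slot(void)
{
	int n = find_free(0);
	if (n < 0) {
		purge();
		n = find_free(0);  /* old code: find_free(NSLOTS), still -1 */
	}
	return n;
}
```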



* [PATCH 4/4] Do not use guard pages in non-debug kernels
  2008-11-07 22:35     ` [PATCH 3/4] restart search at beginning of vmalloc address Glauber Costa
@ 2008-11-07 22:35       ` Glauber Costa
  2008-11-14 17:53       ` [PATCH 3/4] restart search at beginning of vmalloc address walt
  1 sibling, 0 replies; 10+ messages in thread
From: Glauber Costa @ 2008-11-07 22:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, avi, npiggin

In mm/vmalloc.c, make the use of guard pages dependent
on CONFIG_DEBUG_PAGEALLOC.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 mm/vmalloc.c |   25 +++++++++++++++----------
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6fe2003..ed73c6f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -28,6 +28,11 @@
 #include <asm/uaccess.h>
 #include <asm/tlbflush.h>
 
+#ifdef CONFIG_DEBUG_PAGEALLOC
+#define GUARD_PAGE_SIZE PAGE_SIZE
+#else
+#define GUARD_PAGE_SIZE 0
+#endif
 
 /*** Page table manipulation functions ***/
 
@@ -363,7 +368,7 @@ retry:
 		}
 
 		while (addr + size >= first->va_start && addr + size <= vend) {
-			addr = ALIGN(first->va_end + PAGE_SIZE, align);
+			addr = ALIGN(first->va_end, align);
 
 			n = rb_next(&first->rb_node);
 			if (n)
@@ -954,7 +959,7 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
 {
 	unsigned long addr = (unsigned long)area->addr;
-	unsigned long end = addr + area->size - PAGE_SIZE;
+	unsigned long end = addr + area->size - GUARD_PAGE_SIZE;
 	int err;
 
 	err = vmap_page_range(addr, end, prot, *pages);
@@ -1003,7 +1008,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 	/*
 	 * We always allocate a guard page.
 	 */
-	size += PAGE_SIZE;
+	size += GUARD_PAGE_SIZE;
 
 	va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
 	if (IS_ERR(va)) {
@@ -1098,7 +1103,7 @@ struct vm_struct *remove_vm_area(const void *addr)
 		struct vm_struct *vm = va->private;
 		struct vm_struct *tmp, **p;
 		free_unmap_vmap_area(va);
-		vm->size -= PAGE_SIZE;
+		vm->size -= GUARD_PAGE_SIZE;
 
 		write_lock(&vmlist_lock);
 		for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next)
@@ -1226,7 +1231,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 
-	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
+	nr_pages = (area->size - GUARD_PAGE_SIZE) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
 	area->nr_pages = nr_pages;
@@ -1451,7 +1456,7 @@ long vread(char *buf, char *addr, unsigned long count)
 	read_lock(&vmlist_lock);
 	for (tmp = vmlist; tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
-		if (addr >= vaddr + tmp->size - PAGE_SIZE)
+		if (addr >= vaddr + tmp->size - GUARD_PAGE_SIZE)
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -1461,7 +1466,7 @@ long vread(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + tmp->size - PAGE_SIZE - addr;
+		n = vaddr + tmp->size - GUARD_PAGE_SIZE - addr;
 		do {
 			if (count == 0)
 				goto finished;
@@ -1489,7 +1494,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 	read_lock(&vmlist_lock);
 	for (tmp = vmlist; tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
-		if (addr >= vaddr + tmp->size - PAGE_SIZE)
+		if (addr >= vaddr + tmp->size - GUARD_PAGE_SIZE)
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -1498,7 +1503,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + tmp->size - PAGE_SIZE - addr;
+		n = vaddr + tmp->size - GUARD_PAGE_SIZE - addr;
 		do {
 			if (count == 0)
 				goto finished;
@@ -1544,7 +1549,7 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 	if (!(area->flags & VM_USERMAP))
 		return -EINVAL;
 
-	if (usize + (pgoff << PAGE_SHIFT) > area->size - PAGE_SIZE)
+	if (usize + (pgoff << PAGE_SHIFT) > area->size - GUARD_PAGE_SIZE)
 		return -EINVAL;
 
 	addr += pgoff << PAGE_SHIFT;
-- 
1.5.6.5
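The accounting change can be checked with a standalone sketch (a userspace
toy; PAGE_SIZE is hardcoded and `area_size`/`usable_size` are made-up helpers
standing in for the arithmetic in __get_vm_area_node() and
__vmalloc_area_node()):

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* As in the patch: a guard page exists only when debugging page
 * allocations; otherwise it collapses to zero bytes. */
#ifdef CONFIG_DEBUG_PAGEALLOC
#define GUARD_PAGE_SIZE PAGE_SIZE
#else
#define GUARD_PAGE_SIZE 0
#endif

/* __get_vm_area_node(): the stored size includes the guard. */
static unsigned long area_size(unsigned long requested)
{
	return requested + GUARD_PAGE_SIZE;
}

/* __vmalloc_area_node() and friends: the usable size excludes it. */
static unsigned long usable_size(unsigned long size)
{
	return size - GUARD_PAGE_SIZE;
}
```

In a non-debug build the two helpers round-trip exactly, so no virtual
address space is wasted on guards.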



* Re: [PATCH 0/4] Fix vmalloc regression
  2008-11-07 22:35 [PATCH 0/4] Fix vmalloc regression Glauber Costa
  2008-11-07 21:56 ` walt
  2008-11-07 22:35 ` [PATCH 1/4] don't call __vmalloc from other vmap internal functions Glauber Costa
@ 2008-11-08  0:58 ` Nick Piggin
  2008-11-08  2:13   ` Glauber Costa
  2 siblings, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2008-11-08  0:58 UTC (permalink / raw)
  To: Glauber Costa; +Cc: linux-kernel, kvm, avi

On Fri, Nov 07, 2008 at 08:35:50PM -0200, Glauber Costa wrote:
> Nick,
> 
> This is the whole set of patches I was talking about.
> Patch 3 is the one that actually fixes the problem.
> Patches 1 and 2 are debugging aids I made use of, and could possibly be
> useful to others.
> Patch 4 removes guard pages entirely for non-debug kernels, as we have
> previously discussed.
> 
> Hope it's all fine.

OK, these all look good, but I may only push 3/4 for Linus in this round,
along with some of the changes from my patch that you tested as well.

With the DEBUG_PAGEALLOC case, I have been thinking that we perhaps should
turn off the lazy unmapping optimisation as well, so it catches use
after free similarly to the page allocator... but probably it is a good
idea at least to avoid the double-guard page for 2.6.28?

Anyway thanks for these, I'll send them up to Andrew/Linus and cc you.



* Re: [PATCH 0/4] Fix vmalloc regression
  2008-11-08  0:58 ` [PATCH 0/4] Fix vmalloc regression Nick Piggin
@ 2008-11-08  2:13   ` Glauber Costa
  2008-11-08  2:54     ` Nick Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Glauber Costa @ 2008-11-08  2:13 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, kvm, avi

On Sat, Nov 08, 2008 at 01:58:32AM +0100, Nick Piggin wrote:
> On Fri, Nov 07, 2008 at 08:35:50PM -0200, Glauber Costa wrote:
> > Nick,
> > 
> > This is the whole set of patches I was talking about.
> > Patch 3 is the one that actually fixes the problem.
> > Patches 1 and 2 are debugging aids I made use of, and could possibly be
> > useful to others.
> > Patch 4 removes guard pages entirely for non-debug kernels, as we have
> > previously discussed.
> > 
> > Hope it's all fine.
> 
> OK, these all look good, but I may only push 3/4 for Linus in this round,
> along with some of the changes from my patch that you tested as well.

Makes total sense.
> 
> With the DEBUG_PAGEALLOC case, I have been thinking that we perhaps should
> turn off the lazy unmapping optimisation as well, so it catches use
> after free similarly to the page allocator... but probably it is a good
> idea at least to avoid the double-guard page for 2.6.28?

Makes sense. Maybe poisoning after free would also be useful?


* Re: [PATCH 0/4] Fix vmalloc regression
  2008-11-08  2:13   ` Glauber Costa
@ 2008-11-08  2:54     ` Nick Piggin
  0 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2008-11-08  2:54 UTC (permalink / raw)
  To: Glauber Costa; +Cc: Nick Piggin, linux-kernel, kvm, avi

On Saturday 08 November 2008 13:13, Glauber Costa wrote:
> On Sat, Nov 08, 2008 at 01:58:32AM +0100, Nick Piggin wrote:
> > On Fri, Nov 07, 2008 at 08:35:50PM -0200, Glauber Costa wrote:
> > > Nick,
> > >
> > > This is the whole set of patches I was talking about.
> > > Patch 3 is the one that actually fixes the problem.
> > > Patches 1 and 2 are debugging aids I made use of, and could possibly
> > > be useful to others.
> > > Patch 4 removes guard pages entirely for non-debug kernels, as we
> > > have previously discussed.
> > >
> > > Hope it's all fine.
> >
> > OK, these all look good, but I may only push 3/4 for Linus in this round,
> > along with some of the changes from my patch that you tested as well.
>
> Makes total sense.

OK, sent. Thanks again.


> > With the DEBUG_PAGEALLOC case, I have been thinking that we perhaps
> > should turn off the lazy unmapping optimisation as well, so it catches
> > use after free similarly to the page allocator... but probably it is a
> > good idea at least to avoid the double-guard page for 2.6.28?
>
> Makes sense. Maybe poisoning after free would also be useful?

It's a problem because we're only dealing with virtual addresses, rather
than real memory. So we don't really have anything to poison (we don't
know what the caller will do with the memory). I guess it would be
possible to poison in the page allocator or in vfree, but.... probably
not worthwhile (after the immediate-unmap debug option).


* Re: [PATCH 3/4] restart search at beginning of vmalloc address
  2008-11-07 22:35     ` [PATCH 3/4] restart search at beginning of vmalloc address Glauber Costa
  2008-11-07 22:35       ` [PATCH 4/4] Do not use guard pages in non-debug kernels Glauber Costa
@ 2008-11-14 17:53       ` walt
  1 sibling, 0 replies; 10+ messages in thread
From: walt @ 2008-11-14 17:53 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

Glauber Costa wrote:
> Currently, vmalloc restarts the search for a free area if it
> cannot find one. The reason is that there are areas which are lazily
> freed, and could possibly be freed now. However, the current
> implementation restarts the search from the last failing address, which
> is pretty much by definition at the end of the address space. So, we fail.
>
> This patch restarts the search from the requested vstart address
> instead. This fixes the regression in running KVM virtual machines
> for me, described in http://lkml.org/lkml/2008/10/28/349, caused by
> commit db64fe02258f1507e13fe5212a989922323685ce.
>
> Signed-off-by: Glauber Costa<glommer@redhat.com>
> CC: Nick Piggin<npiggin@suse.de>
> ---
>   mm/vmalloc.c |    1 +
>   1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7db493d..6fe2003 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -378,6 +378,7 @@ found:
>   		if (!purged) {
>   			purge_vmap_area_lazy();
>   			purged = 1;
> +			addr = ALIGN(vstart, align);
>   			goto retry;
>   		}
>   		if (printk_ratelimit())

Linus doesn't seem to have this patch yet.  Is it still needed?




end of thread [newest: 2008-11-14 17:53 UTC]

Thread overview: 10+ messages:
2008-11-07 22:35 [PATCH 0/4] Fix vmalloc regression Glauber Costa
2008-11-07 21:56 ` walt
2008-11-07 22:35 ` [PATCH 1/4] don't call __vmalloc from other vmap internal functions Glauber Costa
2008-11-07 22:35   ` [PATCH 2/4] show size of failing allocation Glauber Costa
2008-11-07 22:35     ` [PATCH 3/4] restart search at beginning of vmalloc address Glauber Costa
2008-11-07 22:35       ` [PATCH 4/4] Do not use guard pages in non-debug kernels Glauber Costa
2008-11-14 17:53       ` [PATCH 3/4] restart search at beginning of vmalloc address walt
2008-11-08  0:58 ` [PATCH 0/4] Fix vmalloc regression Nick Piggin
2008-11-08  2:13   ` Glauber Costa
2008-11-08  2:54     ` Nick Piggin
