* [PATCH 0/4] Fix vmalloc regression
From: Glauber Costa @ 2008-11-07 22:35 UTC
To: linux-kernel; +Cc: kvm, avi, npiggin
Nick,
This is the whole set of patches I was talking about.
Patch 3 is the one that actually fixes the problem.
Patches 1 and 2 are debugging aids I made use of, and may be useful to
others.
Patch 4 removes guard pages entirely for non-debug kernels, as we have
already discussed.
Hope it's all fine.
* Re: [PATCH 0/4] Fix vmalloc regression
From: walt @ 2008-11-07 21:56 UTC
To: linux-kernel; +Cc: kvm

Glauber Costa wrote:
> Nick,
>
> This is the whole set of patches I was talking about.
> Patch 3 is the one that in fact fixes the problem...

Yep, patch 3 works for me, thanks.  Only the 32-bit kernel seems to need
the patch, FWIW.
* [PATCH 1/4] don't call __vmalloc from other vmap internal functions
From: Glauber Costa @ 2008-11-07 22:35 UTC
To: linux-kernel; +Cc: kvm, avi, npiggin

If we do that, the output of files like /proc/vmallocinfo will show
"vmalloc_32", "vmalloc_user", or whichever wrapper was used as the
caller. That is not as useful as the real caller of the allocation.
So call __vmalloc_node directly, with matching parameters, to preserve
the caller information.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 mm/vmalloc.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0365369..95856d1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1343,7 +1343,8 @@ void *vmalloc_user(unsigned long size)
 	struct vm_struct *area;
 	void *ret;
 
-	ret = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL);
+	ret = __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
+			     PAGE_KERNEL, -1, __builtin_return_address(0));
 	if (ret) {
 		area = find_vm_area(ret);
 		area->flags |= VM_USERMAP;
@@ -1388,7 +1389,8 @@ EXPORT_SYMBOL(vmalloc_node);
 
 void *vmalloc_exec(unsigned long size)
 {
-	return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC);
+	return __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC,
+			      -1, __builtin_return_address(0));
 }
 
 #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
@@ -1408,7 +1410,8 @@ void *vmalloc_exec(unsigned long size)
  */
 void *vmalloc_32(unsigned long size)
 {
-	return __vmalloc(size, GFP_VMALLOC32, PAGE_KERNEL);
+	return __vmalloc_node(size, GFP_VMALLOC32, PAGE_KERNEL,
+			      -1, __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_32);
 
@@ -1424,7 +1427,8 @@ void *vmalloc_32_user(unsigned long size)
 	struct vm_struct *area;
 	void *ret;
 
-	ret = __vmalloc(size, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL);
+	ret = __vmalloc_node(size, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
+			     -1, __builtin_return_address(0));
 	if (ret) {
 		area = find_vm_area(ret);
 		area->flags |= VM_USERMAP;
-- 
1.5.6.5
* [PATCH 2/4] show size of failing allocation
From: Glauber Costa @ 2008-11-07 22:35 UTC
To: linux-kernel; +Cc: kvm, avi, npiggin

If we can't service a vmalloc allocation, show the size of the
allocation that actually failed. Useful for debugging.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 mm/vmalloc.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 95856d1..7db493d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -381,8 +381,8 @@ found:
 		goto retry;
 	}
 	if (printk_ratelimit())
-		printk(KERN_WARNING "vmap allocation failed: "
-			 "use vmalloc=<size> to increase size.\n");
+		printk(KERN_WARNING "vmap allocation for size %lu failed: "
+			 "use vmalloc=<size> to increase size.\n", size);
 	return ERR_PTR(-EBUSY);
 }
-- 
1.5.6.5
* [PATCH 3/4] restart search at beginning of vmalloc address
From: Glauber Costa @ 2008-11-07 22:35 UTC
To: linux-kernel; +Cc: kvm, avi, npiggin

The current vmalloc code restarts the search for a free area when it
can't find one, because some areas are freed lazily and may be free by
now. However, the current implementation restarts the search from the
last failing address, which is pretty much by definition at the end of
the address space. So we fail again.

This patch restarts the search from the beginning of the requested
vstart address instead. This fixes the regression in running KVM
virtual machines for me, described in
http://lkml.org/lkml/2008/10/28/349, caused by commit
db64fe02258f1507e13fe5212a989922323685ce.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Nick Piggin <npiggin@suse.de>
---
 mm/vmalloc.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7db493d..6fe2003 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -378,6 +378,7 @@ found:
 	if (!purged) {
 		purge_vmap_area_lazy();
 		purged = 1;
+		addr = ALIGN(vstart, align);
 		goto retry;
 	}
 	if (printk_ratelimit())
-- 
1.5.6.5
* [PATCH 4/4] Do not use guard pages in non-debug kernels
From: Glauber Costa @ 2008-11-07 22:35 UTC
To: linux-kernel; +Cc: kvm, avi, npiggin

In mm/vmalloc.c, make the use of guard pages dependent on
CONFIG_DEBUG_PAGEALLOC.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 mm/vmalloc.c |   25 +++++++++++++++----------
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6fe2003..ed73c6f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -28,6 +28,11 @@
 #include <asm/uaccess.h>
 #include <asm/tlbflush.h>
 
+#ifdef CONFIG_DEBUG_PAGEALLOC
+#define GUARD_PAGE_SIZE PAGE_SIZE
+#else
+#define GUARD_PAGE_SIZE 0
+#endif
 
 /*** Page table manipulation functions ***/
 
@@ -363,7 +368,7 @@ retry:
 		}
 
 		while (addr + size >= first->va_start && addr + size <= vend) {
-			addr = ALIGN(first->va_end + PAGE_SIZE, align);
+			addr = ALIGN(first->va_end, align);
 
 			n = rb_next(&first->rb_node);
 			if (n)
@@ -954,7 +959,7 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
 {
 	unsigned long addr = (unsigned long)area->addr;
-	unsigned long end = addr + area->size - PAGE_SIZE;
+	unsigned long end = addr + area->size - GUARD_PAGE_SIZE;
 	int err;
 
 	err = vmap_page_range(addr, end, prot, *pages);
@@ -1003,7 +1008,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 	/*
 	 * We always allocate a guard page.
 	 */
-	size += PAGE_SIZE;
+	size += GUARD_PAGE_SIZE;
 
 	va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
 	if (IS_ERR(va)) {
@@ -1098,7 +1103,7 @@ struct vm_struct *remove_vm_area(const void *addr)
 		struct vm_struct *vm = va->private;
 		struct vm_struct *tmp, **p;
 		free_unmap_vmap_area(va);
-		vm->size -= PAGE_SIZE;
+		vm->size -= GUARD_PAGE_SIZE;
 
 		write_lock(&vmlist_lock);
 		for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next)
@@ -1226,7 +1231,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 
-	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
+	nr_pages = (area->size - GUARD_PAGE_SIZE) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
 	area->nr_pages = nr_pages;
@@ -1451,7 +1456,7 @@ long vread(char *buf, char *addr, unsigned long count)
 	read_lock(&vmlist_lock);
 	for (tmp = vmlist; tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
-		if (addr >= vaddr + tmp->size - PAGE_SIZE)
+		if (addr >= vaddr + tmp->size - GUARD_PAGE_SIZE)
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -1461,7 +1466,7 @@ long vread(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + tmp->size - PAGE_SIZE - addr;
+		n = vaddr + tmp->size - GUARD_PAGE_SIZE - addr;
 		do {
 			if (count == 0)
 				goto finished;
@@ -1489,7 +1494,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 	read_lock(&vmlist_lock);
 	for (tmp = vmlist; tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
-		if (addr >= vaddr + tmp->size - PAGE_SIZE)
+		if (addr >= vaddr + tmp->size - GUARD_PAGE_SIZE)
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -1498,7 +1503,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + tmp->size - PAGE_SIZE - addr;
+		n = vaddr + tmp->size - GUARD_PAGE_SIZE - addr;
 		do {
 			if (count == 0)
 				goto finished;
@@ -1544,7 +1549,7 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 	if (!(area->flags & VM_USERMAP))
 		return -EINVAL;
 
-	if (usize + (pgoff << PAGE_SHIFT) > area->size - PAGE_SIZE)
+	if (usize + (pgoff << PAGE_SHIFT) > area->size - GUARD_PAGE_SIZE)
 		return -EINVAL;
 
 	addr += pgoff << PAGE_SHIFT;
-- 
1.5.6.5
* Re: [PATCH 3/4] restart search at beginning of vmalloc address
From: walt @ 2008-11-14 17:53 UTC
To: kvm; +Cc: linux-kernel

Glauber Costa wrote:
> The current vmalloc code restarts the search for a free area when it
> can't find one, because some areas are freed lazily and may be free by
> now. However, the current implementation restarts the search from the
> last failing address, which is pretty much by definition at the end of
> the address space. So we fail again.
>
> This patch restarts the search from the beginning of the requested
> vstart address instead. This fixes the regression in running KVM
> virtual machines for me, described in
> http://lkml.org/lkml/2008/10/28/349, caused by commit
> db64fe02258f1507e13fe5212a989922323685ce.
>
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> CC: Nick Piggin <npiggin@suse.de>
> ---
>  mm/vmalloc.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7db493d..6fe2003 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -378,6 +378,7 @@ found:
>  	if (!purged) {
>  		purge_vmap_area_lazy();
>  		purged = 1;
> +		addr = ALIGN(vstart, align);
>  		goto retry;
>  	}
>  	if (printk_ratelimit())

Linus doesn't seem to have this patch yet.  Is it still needed?
* Re: [PATCH 0/4] Fix vmalloc regression
From: Nick Piggin @ 2008-11-08 0:58 UTC
To: Glauber Costa; +Cc: linux-kernel, kvm, avi

On Fri, Nov 07, 2008 at 08:35:50PM -0200, Glauber Costa wrote:
> Nick,
>
> This is the whole set of patches I was talking about.
> Patch 3 is the one that in fact fixes the problem
> Patches 1 and 2 are debugging aids I made use of, and could be possibly
> useful to others
> Patch 4 removes guard pages entirely for non-debug kernels, as we have
> already previously discussed.
>
> Hope it's all fine.

OK, these all look good, but I may only push 3/4 for Linus in this round,
along with some of the changes from my patch that you tested as well.

With the DEBUG_PAGEALLOC case, I have been thinking that we perhaps should
turn off the lazy unmapping optimisation as well, so it catches use
after free similarly to the page allocator... but probably it is a good
idea at least to avoid the double-guard page for 2.6.28?

Anyway, thanks for these, I'll send them up to Andrew/Linus and cc you.
* Re: [PATCH 0/4] Fix vmalloc regression
From: Glauber Costa @ 2008-11-08 2:13 UTC
To: Nick Piggin; +Cc: linux-kernel, kvm, avi

On Sat, Nov 08, 2008 at 01:58:32AM +0100, Nick Piggin wrote:
> On Fri, Nov 07, 2008 at 08:35:50PM -0200, Glauber Costa wrote:
> > Nick,
> >
> > This is the whole set of patches I was talking about.
> > Patch 3 is the one that in fact fixes the problem
> > Patches 1 and 2 are debugging aids I made use of, and could be
> > possibly useful to others
> > Patch 4 removes guard pages entirely for non-debug kernels, as we
> > have already previously discussed.
> >
> > Hope it's all fine.
>
> OK, these all look good, but I may only push 3/4 for Linus in this round,
> along with some of the changes from my patch that you tested as well.

Makes total sense.

> With the DEBUG_PAGEALLOC case, I have been thinking that we perhaps should
> turn off the lazy unmapping optimisation as well, so it catches use
> after free similarly to the page allocator... but probably it is a good
> idea at least to avoid the double-guard page for 2.6.28?

Makes sense. Maybe poisoning after free would also be useful?
* Re: [PATCH 0/4] Fix vmalloc regression
From: Nick Piggin @ 2008-11-08 2:54 UTC
To: Glauber Costa; +Cc: Nick Piggin, linux-kernel, kvm, avi

On Saturday 08 November 2008 13:13, Glauber Costa wrote:
> On Sat, Nov 08, 2008 at 01:58:32AM +0100, Nick Piggin wrote:
> > OK, these all look good, but I may only push 3/4 for Linus in this
> > round, along with some of the changes from my patch that you tested
> > as well.
>
> Makes total sense.

OK, sent. Thanks again.

> > With the DEBUG_PAGEALLOC case, I have been thinking that we perhaps
> > should turn off the lazy unmapping optimisation as well, so it catches
> > use after free similarly to the page allocator... but probably it is a
> > good idea at least to avoid the double-guard page for 2.6.28?
>
> Makes sense. Maybe poisoning after free would also be useful?

It's a problem because we're only dealing with virtual addresses, rather
than real memory. So we don't really have anything to poison (we don't
know what the caller will do with the memory).

I guess it would be possible to poison in the page allocator or in vfree,
but... probably not worthwhile (after the immediate-unmap debug option).