* %u-order allocation failed
  54+ messages in thread
From: Krzysztof Rusocki @ 2001-10-05 11:07 UTC
To: linux-xfs; +Cc: linux-kernel

Hi,

After simple bash fork bombing (about 200 forks) on my UP Celeron/96MB
I get quite a lot %u-allocations failed, but only when swap is turned
off.

When it's turned on, processes are still forking for some time until
i get messages like 'fork: Resource temporarily unavailable' or
'cannot redirect /dev/null: too many open files in system' (or similar)
and also 'cannot load libdl.so blah blah return code 23' (don't remember
exact message)... load goes up to about 700 but _none_ of the processes
get killed. Machine is almost unresponsive at that time... i hardly
managed to Alt+SysRQ+UB ...

As mentioned in some other mail - no highmem, no lvm, md as module (unused).
2.4.10-xfs cvs co 25th September (not 12th :/ - info in previous mail was
incorrect)

When swap was off first i got some of

  0-order (gfp=0x1d2/0) from c012ac08 (_alloc_pages+24)

beside it, in a few seconds also noticed

  0-order (gfp=0x1f0/0) from c012ac08
  0-order (gfp=0xf0/0) from c012ac08

at random order....

I also saw a really small number of

  1-order (gfp=0x1f0/0) from c012ac08

During that time almost all processes were killed by VM, machine was
more responsive so i could freely do Alt+SysRQ+K and everything went
back to normal...

I'm not familiar with LinuxVM.. so... is it normal behaviour ? or (if not)
what's happening when such messages are printed by kernel ?

Cheers,
Krzysztof

PS
lkml people - please CC, ain't subscribing.
* Re: %u-order allocation failed
From: Rik van Riel @ 2001-10-05 11:59 UTC
To: Krzysztof Rusocki; +Cc: linux-xfs, linux-kernel

On Fri, 5 Oct 2001, Krzysztof Rusocki wrote:

> After simple bash fork bombing (about 200 forks) on my UP Celeron/96MB
> I get quite a lot %u-allocations failed, but only when swap is turned
> off.

> I'm not familiar with LinuxVM.. so... is it normal behaviour ? or (if not)
> what's happening when such messages are printed my kernel ?

This is perfectly normal behaviour:

1) on your system, you have no process limit configured for
   yourself so you can start processes until all resources
   (memory, file descriptors, ...) are used

2) when all processes are used, there really is no way the
   kernel can buy you more hardware on ebay and install it
   on the fly ... all it can do is start failing allocations

On production systems, good admins setup per-user limits for
the various resources so no single user is able to run the
system into the ground.

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/
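The per-user limits mentioned in the message above are usually set by the administrator with `ulimit -u` or the PAM limits configuration; on Linux both are thin wrappers around setrlimit(2) with RLIMIT_NPROC. The following is only a minimal sketch of that mechanism, assuming a Linux/glibc system. The helper name `cap_process_limit` and the example value are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not from the thread): cap the soft limit on the
 * number of processes for the calling user at max_procs, never exceeding
 * the existing hard limit. Returns 0 on success, -1 on error.
 */
static int cap_process_limit(rlim_t max_procs)
{
    struct rlimit lim;

    assert(max_procs > 0);
    if (getrlimit(RLIMIT_NPROC, &lim) != 0)
        return -1;

    /* The soft limit may be moved freely as long as it stays <= hard. */
    lim.rlim_cur = (max_procs < lim.rlim_max) ? max_procs : lim.rlim_max;
    return setrlimit(RLIMIT_NPROC, &lim);
}
```

With a limit like this in place, a fork bomb run by that user starts getting EAGAIN from fork() once the cap is reached, well before the machine runs out of memory.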
* Re: %u-order allocation failed
From: Seth Mos @ 2001-10-05 20:18 UTC
To: Rik van Riel; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

On Fri, 5 Oct 2001, Rik van Riel wrote:

> On Fri, 5 Oct 2001, Krzysztof Rusocki wrote:
>
> > After simple bash fork bombing (about 200 forks) on my UP Celeron/96MB
> > I get quite a lot %u-allocations failed, but only when swap is turned
> > off.
>
> > I'm not familiar with LinuxVM.. so... is it normal behaviour ? or (if not)
> > what's happening when such messages are printed my kernel ?
>
> This is perfectly normal behaviour:
>
> 1) on your system, you have no process limit configured for
>    yourself so you can start processes until all resources
>    (memory, file descriptors, ...) are used

Fair enough.

> 2) when all processes are used, there really is no way the
>    kernel can buy you more hardware on ebay and install it
>    on the fly ... all it can do is start failing allocations

So it needs a handbrake in case of an emergency?
The box at work deadlocks or crashes. I can hardly call that normal
operational behaviour.

I have a Dell PE 2500 (Serverworks LE) with 2GB ram and 2 1.13GHz
processors.

If I disable HIGHMEM (4GB) support the box does not produce these
allocation messages and does not deadlock or die under the same load or
worse.

What I used was a mongo.pl with 5 processes (does not matter if the fs is
ext2 reiserfs or xfs) and the box dies within minutes/seconds after
starting the benchmark.

This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.

Using a single process hides the issue.

> On production systems, good admins setup per-user limits for
> the various resources so no single user is able to run the
> system into the ground.

The system is beefy enough to tolerate something mundane as this. It should
definitely not die.

Cheers
Seth
* Re: %u-order allocation failed
From: Rik van Riel @ 2001-10-05 20:22 UTC
To: Seth Mos; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

On Fri, 5 Oct 2001, Seth Mos wrote:

> This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.

Ohh duh, IIRC there are a bunch of highmem bugs in
-linus which are fixed in -ac.

Can you reproduce the bug with an -ac kernel ?

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/
* Re: %u-order allocation failed
From: Seth Mos @ 2001-10-05 20:31 UTC
To: Rik van Riel; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

On Fri, 5 Oct 2001, Rik van Riel wrote:

> On Fri, 5 Oct 2001, Seth Mos wrote:
>
> > This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.
>
> Ohh duh, IIRC there are a bunch of highmem bugs in
> -linus which are fixed in -ac.

Fitting XFS onto a -ac kernel should be fun :-(

I will try this over the weekend or get a redhat kernel going which is
also -ac based. That would come in handy for other people using XFS since
a lot are using highmem in combination with this fs.

> Can you reproduce the bug with an -ac kernel ?

I am not that good/fast at patching. Expect something over the weekend :-)

Bye
Seth
* Re: %u-order allocation failed
From: Steve Lord @ 2001-10-05 20:43 UTC
To: Seth Mos; +Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel

> On Fri, 5 Oct 2001, Rik van Riel wrote:
>
> > On Fri, 5 Oct 2001, Seth Mos wrote:
> >
> > > This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.
> >
> > Ohh duh, IIRC there are a bunch of highmem bugs in
> > -linus which are fixed in -ac.
>
> Fitting XFS onto a -ac kernel should be fun :-(

It's not that simple - I tried before I got dragged kicking and
screaming back into some Irix stuff. Just running mongo on ext2
on a HIGHMEM ac kernel should show if things are better there - since
the problems seem to be fairly filesystem independent.

Steve

>
> I will try this over the weekend or get a redhat kernel going which is
> also -ac based. That would come in handy for other people using XFS since
> a lot are using highmem in combination with this fs.
>
> > Can you reproduce the bug with an -ac kernel ?
>
> I am not that good/fast at patching. Expect something over the weekend :-)
>
> Bye
> Seth
* Re: %u-order allocation failed
From: Seth Mos @ 2001-10-05 21:09 UTC
To: Steve Lord; +Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel

On Fri, 5 Oct 2001, Steve Lord wrote:

> > On Fri, 5 Oct 2001, Rik van Riel wrote:
> >
> > > On Fri, 5 Oct 2001, Seth Mos wrote:
> > >
> > > > This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.
> > >
> > > Ohh duh, IIRC there are a bunch of highmem bugs in
> > > -linus which are fixed in -ac.
> >
> > Fitting XFS onto a -ac kernel should be fun :-(
>
> Its not that that simple - I tried before I got dragged kicking and
> screaming back into some Irix stuff. Just running mongo on ext2
> on a HIGHMEM ac kernel should show if things are better there - since
> the problems seem to be fairly filesystem independent.

I don't have a HIGHMEM box without XFS filesystems. So I have to merge
both -ac and the xfs tree to test it.

I can reformat the box of course, but that would mean next week.
If I can win a day and spare a reformat I am willing to make that
sacrifice.
* Re: %u-order allocation failed
From: David Schwartz @ 2001-10-05 22:06 UTC
To: knuffie, Rik van Riel; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

>The system is beafy enough to tolerate something mundane as this. It should
>definitely not die.

A fork bomb with no limits attempts to create an infinite number of
processes. No system can be that beefy.

DS
* Re: %u-order allocation failed
From: Seth Mos @ 2001-10-05 22:16 UTC
To: David Schwartz; +Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel

On Fri, 5 Oct 2001, David Schwartz wrote:

>
> >The system is beafy enough to tolerate something mundane as this. It should
> >definitely not die.
>
> A fork bomb with no limits attempts to create an infinite number of
> processes. No system can be that beefy.

I was referring to the mundane load of mongo.pl with 5 processes.
Something the system should withstand. If you have more than 10GB of
database to access you would want it to work.
I am not talking about a lot of processes but a lot of disk IO.

I have just one box running SMP with highmem and that one is acting funny.
All the other SMP or Uni servers have absolutely no problems.

Disable highmem and the problem goes away while halving your ram. That is
not very efficient, is it?

Cheers
* Re: %u-order allocation failed
From: Mikulas Patocka @ 2001-10-06 14:00 UTC
To: Rik van Riel; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

> > After simple bash fork bombing (about 200 forks) on my UP Celeron/96MB
> > I get quite a lot %u-allocations failed, but only when swap is turned
> > off.
>
> > I'm not familiar with LinuxVM.. so... is it normal behaviour ? or (if not)
> > what's happening when such messages are printed my kernel ?
>
> This is perfectly normal behaviour:
>
> 1) on your system, you have no process limit configured for
>    yourself so you can start processes until all resources
>    (memory, file descriptors, ...) are used
>
> 2) when all processes are used, there really is no way the
>    kernel can buy you more hardware on ebay and install it
>    on the fly ... all it can do is start failing allocations
>
> On production systems, good admins setup per-user limits for
> the various resources so no single user is able to run the
> system into the ground.

No, it's not normal. It is a long-standing bug - I think from 2.2 kernels.

You know that without swap and with a certain memory allocation strategy
(when a process in a loop allocates one anonymous page, one file cache
page and again...) this bug can be triggered even when half of the memory
is free.

Buddy allocator is broken - kill it. Or at least do not misuse it for
anything except kernel or driver initialization.

Mikulas
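The failure mode described in the message above can be pictured with a toy model. This is only an illustrative sketch, not kernel code: it assumes that anonymous and file-cache pages end up strictly interleaved in physical memory, that only the file-cache pages can be reclaimed (no swap), and that an order-1 request needs two free pages that are buddies, i.e. an even-indexed page and its successor. The names `NPAGES`, `fragment`, `count_free` and `count_free_order1` are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 1024 /* hypothetical size of the modelled memory, in pages */

/*
 * Model the pattern from the mail: pages are handed out alternately to
 * anonymous memory (even index) and file cache (odd index). Without swap
 * only the file cache pages can be freed again.
 */
static void fragment(bool is_free[NPAGES])
{
    for (size_t i = 0; i < NPAGES; i++)
        is_free[i] = (i % 2 == 1); /* odd pages freed, even pages pinned */
}

/* Number of free single pages (order-0 blocks). */
static size_t count_free(const bool is_free[NPAGES])
{
    size_t n = 0;

    for (size_t i = 0; i < NPAGES; i++)
        n += is_free[i] ? 1 : 0;
    return n;
}

/* Number of free order-1 blocks: buddy pairs (2k, 2k+1) with both free. */
static size_t count_free_order1(const bool is_free[NPAGES])
{
    size_t n = 0;

    assert(NPAGES % 2 == 0);
    for (size_t i = 0; i + 1 < NPAGES; i += 2)
        n += (is_free[i] && is_free[i + 1]) ? 1 : 0;
    return n;
}
```

In this model half of the pages are free, yet not a single order-1 block can be formed, which is the situation in which a two-page task_struct allocation fails although plenty of memory is available.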
* Re: %u-order allocation failed
From: Rik van Riel @ 2001-10-06 14:03 UTC
To: Mikulas Patocka; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

On Sat, 6 Oct 2001, Mikulas Patocka wrote:

> Buddy allocator is broken - kill it. Or at least do not misuse it for
> anything except kernel or driver initialization.

Please send patches to get rid of the buddy allocator while
still making it possible to allocate contiguous chunks of
memory.

If you have any idea on how to fix things, this would be a
good time to let us know.

cheers,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/
* Re: %u-order allocation failed
From: Mikulas Patocka @ 2001-10-06 14:44 UTC
To: Rik van Riel; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 805 bytes --]

On Sat, 6 Oct 2001, Rik van Riel wrote:

> On Sat, 6 Oct 2001, Mikulas Patocka wrote:
>
> > Buddy allocator is broken - kill it. Or at least do not misuse it for
> > anything except kernel or driver initialization.
>
> Please send patches to get rid of the buddy allocator while
> still making it possible to allocate contiguous chunks of
> memory.
>
> If you have any idea on how to fix things, this would be a
> good time to let us know.

Here goes the fix. (note that I didn't try to compile it so there may be
bugs, but you see the point).

kmalloc should be fixed too (used badly for example in select.c - and yes
- I have seen real world bugreports for poll randomly failing with
ENOMEM), but it will be hard to audit all drivers that they do not try to
use dma on kmallocated memory.

Mikulas

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2274 bytes --]

diff -u -r linux-orig/include/asm-i386/processor.h linux/include/asm-i386/processor.h
--- linux-orig/include/asm-i386/processor.h Sat Oct 6 16:21:50 2001
+++ linux/include/asm-i386/processor.h Sat Oct 6 16:31:15 2001
@@ -448,7 +448,7 @@
 #define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)))[1022])
 #define THREAD_SIZE (2*PAGE_SIZE)
-#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1))
+#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL | __GFP_VMALLOC,1))
 #define free_task_struct(p) free_pages((unsigned long) (p), 1)
 #define get_task_struct(tsk) atomic_inc(&virt_to_page(tsk)->count)
diff -u -r linux-orig/include/linux/mm.h linux/include/linux/mm.h
--- linux-orig/include/linux/mm.h Sat Oct 6 16:21:59 2001
+++ linux/include/linux/mm.h Sat Oct 6 16:28:12 2001
@@ -550,6 +550,7 @@
 #define __GFP_IO 0x40 /* Can start low memory physical IO? */
 #define __GFP_HIGHIO 0x80 /* Can start high mem physical IO? */
 #define __GFP_FS 0x100 /* Can call down to low-level FS? */
+#define __GFP_VMALLOC 0x200 /* Can vmalloc pages if buddy allocator fails */
 #define GFP_NOHIGHIO (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
 #define GFP_NOIO (__GFP_HIGH | __GFP_WAIT)
diff -u -r linux-orig/mm/page_alloc.c linux/mm/page_alloc.c
--- linux-orig/mm/page_alloc.c Sat Oct 6 16:21:47 2001
+++ linux/mm/page_alloc.c Sat Oct 6 16:36:28 2001
@@ -18,6 +18,7 @@
 #include <linux/bootmem.h>
 #include <linux/slab.h>
 #include <linux/compiler.h>
+#include <linux/vmalloc.h>
 int nr_swap_pages;
 int nr_active_pages;
@@ -421,9 +422,9 @@
         struct page * page;
         page = alloc_pages(gfp_mask, order);
-        if (!page)
-                return 0;
-        return (unsigned long) page_address(page);
+        if (page) return (unsigned long) page_address(page);
+        if (gfp_mask & __GFP_VMALLOC) return (unsigned long)__vmalloc(PAGE_SIZE << order, gfp_mask, PAGE_KERNEL);
+        return 0;
 }
 unsigned long get_zeroed_page(unsigned int gfp_mask)
@@ -447,6 +448,10 @@
 void free_pages(unsigned long addr, unsigned int order)
 {
+        if (addr >= VMALLOC_START && addr < VMALLOC_END) {
+                vfree((void *)addr);
+                return;
+        }
         if (addr != 0)
                 __free_pages(virt_to_page(addr), order);
 }
* Re: %u-order allocation failed
From: Mikulas Patocka @ 2001-10-06 15:31 UTC
To: Rik van Riel; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 998 bytes --]

On Sat, 6 Oct 2001, Mikulas Patocka wrote:

> On Sat, 6 Oct 2001, Rik van Riel wrote:
>
> > On Sat, 6 Oct 2001, Mikulas Patocka wrote:
> >
> > > Buddy allocator is broken - kill it. Or at least do not misuse it for
> > > anything except kernel or driver initialization.
> >
> > Please send patches to get rid of the buddy allocator while
> > still making it possible to allocate contiguous chunks of
> > memory.
> >
> > If you have any idea on how to fix things, this would be a
> > good time to let us know.
>
> Here goes the fix. (note that I didn't try to compile it so there may be
> bugs, but you see the point).
>
> kmalloc should be fixed too (used badly for example in select.c - and yes
> - I have seen real world bugreports for poll randomly failing with
> ENOMEM), but it will be hard to audit all drivers that they do not try to
> use dma on kmallocated memory.

This is enhanced version of a patch that fixes select and poll as well.
Again - not compiled, not tried.

Mikulas

[-- Attachment #2: Type: TEXT/PLAIN, Size: 4013 bytes --]

diff -u -r linux-orig/fs/select.c linux/fs/select.c
--- linux-orig/fs/select.c Sat Oct 6 16:20:45 2001
+++ linux/fs/select.c Sat Oct 6 16:54:44 2001
@@ -236,7 +236,7 @@
 static void *select_bits_alloc(int size)
 {
-        return kmalloc(6 * size, GFP_KERNEL);
+        return kmalloc(6 * size, GFP_KERNEL | __GFP_VMALLOC);
 }
 static void select_bits_free(void *bits, int size)
@@ -438,7 +438,7 @@
         if (nfds != 0) {
                 fds = (struct pollfd **)kmalloc(
                         (1 + (nfds - 1) / POLLFD_PER_PAGE) * sizeof(struct pollfd *),
-                        GFP_KERNEL);
+                        GFP_KERNEL | __GFP_VMALLOC);
                 if (fds == NULL)
                         goto out;
         }
diff -u -r linux-orig/include/asm-i386/processor.h linux/include/asm-i386/processor.h
--- linux-orig/include/asm-i386/processor.h Sat Oct 6 16:21:50 2001
+++ linux/include/asm-i386/processor.h Sat Oct 6 16:31:15 2001
@@ -448,7 +448,7 @@
 #define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)))[1022])
 #define THREAD_SIZE (2*PAGE_SIZE)
-#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1))
+#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL | __GFP_VMALLOC,1))
 #define free_task_struct(p) free_pages((unsigned long) (p), 1)
 #define get_task_struct(tsk) atomic_inc(&virt_to_page(tsk)->count)
diff -u -r linux-orig/include/linux/mm.h linux/include/linux/mm.h
--- linux-orig/include/linux/mm.h Sat Oct 6 16:21:59 2001
+++ linux/include/linux/mm.h Sat Oct 6 16:28:12 2001
@@ -550,6 +550,7 @@
 #define __GFP_IO 0x40 /* Can start low memory physical IO? */
 #define __GFP_HIGHIO 0x80 /* Can start high mem physical IO? */
 #define __GFP_FS 0x100 /* Can call down to low-level FS? */
+#define __GFP_VMALLOC 0x200 /* Can vmalloc pages if buddy allocator fails */
 #define GFP_NOHIGHIO (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
 #define GFP_NOIO (__GFP_HIGH | __GFP_WAIT)
diff -u -r linux-orig/mm/page_alloc.c linux/mm/page_alloc.c
--- linux-orig/mm/page_alloc.c Sat Oct 6 16:21:47 2001
+++ linux/mm/page_alloc.c Sat Oct 6 16:36:28 2001
@@ -18,6 +18,7 @@
 #include <linux/bootmem.h>
 #include <linux/slab.h>
 #include <linux/compiler.h>
+#include <linux/vmalloc.h>
 int nr_swap_pages;
 int nr_active_pages;
@@ -421,9 +422,9 @@
         struct page * page;
         page = alloc_pages(gfp_mask, order);
-        if (!page)
-                return 0;
-        return (unsigned long) page_address(page);
+        if (page) return (unsigned long) page_address(page);
+        if (gfp_mask & __GFP_VMALLOC) return (unsigned long)__vmalloc(PAGE_SIZE << order, gfp_mask, PAGE_KERNEL);
+        return 0;
 }
 unsigned long get_zeroed_page(unsigned int gfp_mask)
@@ -447,6 +448,10 @@
 void free_pages(unsigned long addr, unsigned int order)
 {
+        if (addr >= VMALLOC_START && addr < VMALLOC_END) {
+                vfree((void *)addr);
+                return;
+        }
         if (addr != 0)
                 __free_pages(virt_to_page(addr), order);
 }
diff -u -r linux-orig/mm/slab.c linux/mm/slab.c
--- linux-orig/mm/slab.c Sat Oct 6 16:21:48 2001
+++ linux/mm/slab.c Sat Oct 6 17:04:37 2001
@@ -73,6 +73,7 @@
 #include <linux/interrupt.h>
 #include <linux/init.h>
 #include <linux/compiler.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 /*
@@ -1536,10 +1537,14 @@
         cache_sizes_t *csizep = cache_sizes;
         for (; csizep->cs_size; csizep++) {
+                void *p;
                 if (size > csizep->cs_size)
                         continue;
-                return __kmem_cache_alloc(flags & GFP_DMA ?
-                         csizep->cs_dmacachep : csizep->cs_cachep, flags);
+                if ((p = __kmem_cache_alloc(flags & GFP_DMA ?
+                         csizep->cs_dmacachep : csizep->cs_cachep, flags & ~__GFP_VMALLOC)))
+                        return p;
+                if (flags & __GFP_VMALLOC) return __vmalloc(size, flags, PAGE_KERNEL);
+                return NULL;
         }
         return NULL;
 }
@@ -1580,6 +1585,10 @@
         if (!objp)
                 return;
+        if ((unsigned long)objp >= VMALLOC_START && (unsigned long)objp < VMALLOC_END) {
+                vfree(objp);
+                return;
+        }
         local_irq_save(flags);
         CHECK_PAGE(virt_to_page(objp));
         c = GET_PAGE_CACHE(virt_to_page(objp));
* Re: %u-order allocation failed
From: Mikulas Patocka @ 2001-10-06 19:05 UTC
To: Rik van Riel; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 254 bytes --]

> This is enhanced version of a patch that fixes select and poll as well.
> Again - not compiled, not tried.

There is a bug that it does not align allocation - so things like
(%esp & ~8191) won't work. This should be applied on the top of it.

Mikulas

[-- Attachment #2: Type: TEXT/PLAIN, Size: 583 bytes --]

--- linux-orig/mm/vmalloc.c Sat Oct 6 16:21:47 2001
+++ linux/mm/vmalloc.c Sat Oct 6 21:01:00 2001
@@ -170,6 +170,9 @@
 {
         unsigned long addr;
         struct vm_struct **p, *tmp, *area;
+        int align = 0;
+
+        if (size > PAGE_SIZE && !(size & (size - 1))) align = size - 1;
         area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL);
         if (!area)
@@ -183,6 +186,7 @@
                 if (size + addr <= (unsigned long) tmp->addr)
                         break;
                 addr = tmp->size + (unsigned long) tmp->addr;
+                addr = (addr + align) & ~align;
                 if (addr > VMALLOC_END-size)
                         goto out;
         }
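The alignment arithmetic in the patch above can be checked in isolation. The sketch below is a user-space restatement under stated assumptions, not the kernel code itself: `align_mask`, `align_up` and `stack_base` are hypothetical names, and the 8 KB figure comes from the `(%esp & ~8191)` expression quoted in the mail, which recovers the base of the two-page task_struct/stack area from a stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mask as computed in the patch: size - 1 for power-of-two sizes larger
 * than one page, otherwise 0 (no extra alignment requested).
 */
static uintptr_t align_mask(uintptr_t size, uintptr_t page_size)
{
    assert(page_size != 0);
    if (size > page_size && (size & (size - 1)) == 0)
        return size - 1;
    return 0;
}

/* Round addr up to the next multiple of (mask + 1); mask must be 2^k - 1. */
static uintptr_t align_up(uintptr_t addr, uintptr_t mask)
{
    assert((mask & (mask + 1)) == 0);
    return (addr + mask) & ~mask;
}

/* The lookup the mail refers to: clear the low 13 bits of a stack pointer. */
static uintptr_t stack_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)8191;
}
```

The lookup only lands on the start of the allocation if that start is itself 8 KB aligned, which is why a fallback allocator that returns page-aligned but not size-aligned addresses would break it.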
* Re: %u-order allocation failed
From: Rik van Riel @ 2001-10-06 16:58 UTC
To: Mikulas Patocka; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 828 bytes --]

On Sat, 6 Oct 2001, Mikulas Patocka wrote:
> On Sat, 6 Oct 2001, Rik van Riel wrote:
> > On Sat, 6 Oct 2001, Mikulas Patocka wrote:
> >
> > > Buddy allocator is broken - kill it. Or at least do not misuse it for
> > > anything except kernel or driver initialization.
> >
> > Please send patches to get rid of the buddy allocator while
> > still making it possible to allocate contiguous chunks of
> > memory.
> >
> > If you have any idea on how to fix things, this would be a
> > good time to let us know.
>
> Here goes the fix. (note that I didn't try to compile it so there may be
> bugs, but you see the point).

So what are you going to do when your 64MB of vmalloc space
runs out ?

Rik
--
DMCA, SSSCA, W3C? Who cares?
http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/
* Re: %u-order allocation failed
From: Mikulas Patocka @ 2001-10-06 17:48 UTC
To: Rik van Riel; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel

On Sat, 6 Oct 2001, Rik van Riel wrote:

> On Sat, 6 Oct 2001, Mikulas Patocka wrote:
> > On Sat, 6 Oct 2001, Rik van Riel wrote:
> > > On Sat, 6 Oct 2001, Mikulas Patocka wrote:
> > >
> > > > Buddy allocator is broken - kill it. Or at least do not misuse it for
> > > > anything except kernel or driver initialization.
> > >
> > > Please send patches to get rid of the buddy allocator while
> > > still making it possible to allocate contiguous chunks of
> > > memory.
> > >
> > > If you have any idea on how to fix things, this would be a
> > > good time to let us know.
> >
> > Here goes the fix. (note that I didn't try to compile it so there may be
> > bugs, but you see the point).
>
> So what are you going to do when your 64MB of vmalloc space
> runs out ?

Make larger vmalloc space :-) Virtual memory costs very little.

Besides 64M / 8k = 8192 - so it runs out at 8192 processes.

Of course vmalloc space can overflow - but it overflows only when the
machine is overloaded with too many processes, too many processes with
many filedescriptors etc. On the other hand, the buddy allocator fails
*RANDOMLY*. Totally randomly, depending on cache access patterns and
page allocation times.

Mikulas
* Re: %u-order allocation failed
From: Anton Blanchard @ 2001-10-06 18:12 UTC
To: Mikulas Patocka; +Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel

> Of course vmalloc space can overflow - but it overflows only when the
> machine is overloaded with too many processes, too many processes with
> many filedescriptors etc. On the other hand, the buddy allocator fails
> *RANDOMLY*. Totally randomly, depending on cache access patterns and
> page allocation times.

vmalloc space is also much worse for tlb usage when the main kernel mapping
uses large hardware ptes. Ingo and davem pointed this out to me recently
when I wanted to allocate the pagecache hash using vmalloc (at the
moment it maxes out at order 10 which is much too small for machines
with large memory).

If you could get away with a single page stack, then you could allocate
the task struct separately and avoid any order 1 allocation. But you
would probably need interrupt stacks to get away with a single page
stack.

Anton
* Re: %u-order allocation failed
From: Mikulas Patocka @ 2001-10-06 19:07 UTC
To: Anton Blanchard; +Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel

> > Of course vmalloc space can overflow - but it overflows only when the
> > machine is overloaded with too many processes, too many processes with
> > many filedescriptors etc. On the other hand, the buddy allocator fails
> > *RANDOMLY*. Totally randomly, depending on cache access patterns and
> > page allocation times.
>
> vmalloc space is also much worse for tlb usage when the main kernel mapping
> uses large hardware ptes. Ingo and davem pointed this out to me recently
> when I wanted to allocate the pagecache hash using vmalloc (at the
> moment it maxes out at order 10 which is much to small for machines
> with large memory).

OK, but my patch uses vmalloc only as a fallback when buddy fails. The
probability that buddy fails is small. It is slower but with very small
probability.

It is perfectly OK to have a bit slower access to task_struct with
probability 1/1000000.

But it is ***BAD*BUG*** if allocation of task_struct fails with
probability 1/1000000.

> If you could get away with a single page stack, then you could allocate
> the task struct separately and avoid any order 1 allocation. But you
> would probably need interrupt stacks to get away with a single page
> stack.

Yes, but there are still other dangerous usages of kmalloc and
__get_free_pages. (The most offending one is in select.c)

It is sad that core VM developers did not write any documentation that
explains that high-order allocations can fail any time and the caller must
not abort his operation when it happens. Instead - they are trying to make
high-order allocations fail less often :-/ How should random
Joe-driver-developer know, that kmalloc(4096) is safe and kmalloc(4097) is
not?

Now parts of a kernel written by people who know about buddy allocator
(page/buffer/dentry/inode hash allocations, filedescriptor array
allocation) are written correctly with the assumption that high-order
allocation may fail. Other parts of kernel written by people who do not
know about buddy allocator (task_struct allocation, select and probably a
lot of drivers) assume that high-order allocation always succeeds.

task_struct and select can be fixed easily, but cleaning the shit in
drivers will be real pain and it will probably never be finished :-(

Mikulas
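The kmalloc(4096) versus kmalloc(4097) remark in the message above comes down to how many contiguous pages the request needs. The sketch below is a simplified user-space model, assuming 4 KB pages as on i386; it ignores the slab size classes that the real kmalloc rounds requests to, and `page_order` is a made-up name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: i386-style 4 KB pages */

/*
 * Smallest buddy order whose block (MODEL_PAGE_SIZE << order bytes)
 * can hold a request of the given size.
 */
static unsigned int page_order(size_t size)
{
    unsigned int order = 0;
    size_t block = MODEL_PAGE_SIZE;

    assert(size > 0);
    while (block < size) {
        block <<= 1;
        order++;
    }
    return order;
}
```

A 4096-byte request fits in one page (order 0), which the allocator can satisfy as long as any page is free. One byte more needs an order-1 block, that is two physically adjacent, aligned free pages, and that is the kind of request that can fail on a fragmented system.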
* Re: %u-order allocation failed 2001-10-06 19:07 ` Mikulas Patocka @ 2001-10-06 20:13 ` Benjamin Herrenschmidt 2001-10-06 22:34 ` Mikulas Patocka 2001-10-06 21:13 ` Alan Cox 1 sibling, 1 reply; 54+ messages in thread From: Benjamin Herrenschmidt @ 2001-10-06 20:13 UTC (permalink / raw) To: Mikulas Patocka; +Cc: linux-kernel, linux-xfs > >OK, but my patch uses vmalloc only as a fallback when buddy fails. The >probability that buddy fails is small. It is slower but with very small >probability. > >It is perfectly OK to have a bit slower access to task_struct with >probability 1/1000000. > >But it is ***BAD*BUG*** if allocation of task_struct fails with >probability 1/1000000. I missed the beginning of the thread, sorry if that question was already answered, What about all the code that still consider kmalloc'ed memory is safe for use with virt_to_bus and friends and is contiguous physically for DMA ? In some cases (non-PCI devices, embedded platforms, etc...), the pci_consistent API is not an option. That means that __GFP_VMALLOC can't be part of GFP_KERNEL or many driver will break in horrible ways (random memory corruption). Ben. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 20:13 ` Benjamin Herrenschmidt @ 2001-10-06 22:34 ` Mikulas Patocka 2001-10-07 1:23 ` Rik van Riel 2001-10-07 11:12 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 54+ messages in thread From: Mikulas Patocka @ 2001-10-06 22:34 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-kernel, linux-xfs > >OK, but my patch uses vmalloc only as a fallback when buddy fails. The > >probability that buddy fails is small. It is slower but with very small > >probability. > > > >It is perfectly OK to have a bit slower access to task_struct with > >probability 1/1000000. > > > >But it is ***BAD*BUG*** if allocation of task_struct fails with > >probability 1/1000000. > > I missed the beginning of the thread, sorry if that question was > already answered, > > What about all the code that still consider kmalloc'ed memory is > safe for use with virt_to_bus and friends and is contiguous > physically for DMA ? In some cases (non-PCI devices, embedded > platforms, etc...), the pci_consistent API is not an option. > That means that __GFP_VMALLOC can't be part of GFP_KERNEL or > many driver will break in horrible ways (random memory corruption). You are right. Code that allocates more than a page and expects it to be physically contiguous is broken by design. Either rewrite the driver or allocate the memory at boot. It will be very hard to audit all drivers for it. Mikulas ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 22:34 ` Mikulas Patocka @ 2001-10-07 1:23 ` Rik van Riel 2001-10-07 11:12 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 54+ messages in thread From: Rik van Riel @ 2001-10-07 1:23 UTC (permalink / raw) To: Mikulas Patocka; +Cc: Benjamin Herrenschmidt, linux-kernel, linux-xfs On Sun, 7 Oct 2001, Mikulas Patocka wrote: > You are right. Code that allocates more than page and expects it to be > physicaly contignuous is broken by design. Even rewrite the driver or > allocate memory on boot. It will be very hard to audit all drivers for it. Better buy us all new hardware, then ;) Some devices really do want physically contiguous buffers for DMA... Rik -- DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed) http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 22:34 ` Mikulas Patocka 2001-10-07 1:23 ` Rik van Riel @ 2001-10-07 11:12 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 54+ messages in thread From: Benjamin Herrenschmidt @ 2001-10-07 11:12 UTC (permalink / raw) To: Mikulas Patocka; +Cc: linux-kernel, linux-xfs > >You are right. Code that allocates more than page and expects it to be >physicaly contignuous is broken by design. Even rewrite the driver or >allocate memory on boot. It will be very hard to audit all drivers for it. Well, the problem here is not code. Some piece of hardware just can't scatter gather, or in some case, they can, but the scatter/gather list itself has to be contiguous and can be larger than a page. The fact that kmalloc returns physically contiguous memory is a feature and can't be modified that easily. If you intend to do so, then you need different GFP flags, for example a GFP_CONTIGUOUS flag, and then make sure that drivers allocating DMA memory use that new flag. Ben. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 19:07 ` Mikulas Patocka 2001-10-06 20:13 ` Benjamin Herrenschmidt @ 2001-10-06 21:13 ` Alan Cox 2001-10-06 22:31 ` Mikulas Patocka 1 sibling, 1 reply; 54+ messages in thread From: Alan Cox @ 2001-10-06 21:13 UTC (permalink / raw) To: mikulas Cc: Anton Blanchard, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel > It is perfectly OK to have a bit slower access to task_struct with > probability 1/1000000. Except that you added a bug where some old driver code would crash the machine by doing so. > Yes, but there are still other dangerous usages of kmalloc and > __get_free_pages. (The most offending one is in select.c) Nothing dangerous there. The -ac VM isn't triggering these cases. > not abort his operation when it happens. Instead - they are trying to make > high-order allocations fail less often :-/ How should random > Joe-driver-developer know, that kmalloc(4096) is safe and kmalloc(4097) is > not? 4096 is not safe - there is no safe size for a kmalloc, you can always run out of memory - deal with it. Alan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 21:13 ` Alan Cox @ 2001-10-06 22:31 ` Mikulas Patocka 2001-10-06 22:42 ` Alan Cox [not found] ` <Pine.LNX.3.96.1011007002406.18004A-100000@artax.karlin.mff.cuni .cz> 0 siblings, 2 replies; 54+ messages in thread From: Mikulas Patocka @ 2001-10-06 22:31 UTC (permalink / raw) To: Alan Cox Cc: Anton Blanchard, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel > > It is perfectly OK to have a bit slower access to task_struct with > > probability 1/1000000. > > Except that you added a bug where some old driver code would crash the > machine by doing so. ? > > Yes, but there are still other dangerous usages of kmalloc and > > __get_free_pages. (The most offending one is in select.c) > > Nothing dangeorus there. The -ac vm isnt triggering these cases. Sorry, but it can be triggered by _ANY_ VM since buddy allocator was introduced. You have no guarantee, that you find two or more consecutive free pages. And if you don't, poll() fails. > > not abort his operation when it happens. Instead - they are trying to make > > high-order allocations fail less often :-/ How should random > > Joe-driver-developer know, that kmalloc(4096) is safe and kmalloc(4097) is > > not? > > 4096 is not safe - there is no safe size for a kmalloc, you can always run > out of memory - deal with it. This is not about running out of memory. It is about free space fragmentation. Think this: You have no swap. Program allocates one file cache page, one anon page, one cache page, one anon page and so on. The memory will look like: cache page anon page cache page anon page cache page anon page etc. Now some driver wants to allocate 4097 and it CAN'T. Even when there's half memory free. Mikulas ^ permalink raw reply [flat|nested] 54+ messages in thread
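The fragmentation scenario in the message above can be tried out in a toy userspace model. The sketch below is only an illustration under stated assumptions: the `page_state` enum and the three helper functions are hypothetical names invented here, not kernel code, and the model assumes that an order-1 block must start at an even page index, as buddy pairs do. It lays out memory as alternating cache and anon pages, reclaims every cache page, and then counts usable two-page blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page states for this toy model; not kernel definitions. */
enum page_state { PAGE_FREE, PAGE_ANON, PAGE_CACHE };

/* Lay out memory as in the example: cache, anon, cache, anon, ... */
static void fill_alternating(enum page_state *pages, size_t n)
{
    assert(pages != NULL);
    for (size_t i = 0; i < n; i++)
        pages[i] = (i % 2 == 0) ? PAGE_CACHE : PAGE_ANON;
}

/* Reclaim every cache page. Anon pages stay put because there is no swap. */
static size_t drop_cache(enum page_state *pages, size_t n)
{
    size_t freed = 0;

    assert(pages != NULL);
    for (size_t i = 0; i < n; i++) {
        if (pages[i] == PAGE_CACHE) {
            pages[i] = PAGE_FREE;
            freed++;
        }
    }
    return freed;
}

/* Count free order-1 blocks; a buddy pair starts at an even page index. */
static size_t count_free_order1(const enum page_state *pages, size_t n)
{
    size_t blocks = 0;

    assert(pages != NULL);
    for (size_t i = 0; i + 1 < n; i += 2) {
        if (pages[i] == PAGE_FREE && pages[i + 1] == PAGE_FREE)
            blocks++;
    }
    return blocks;
}
```

With 64 pages, `drop_cache` frees 32 of them, yet `count_free_order1` still reports zero blocks: half of memory is free and a two-page request cannot be satisfied. Whether real workloads produce such a layout often enough to matter is exactly what the thread disputes.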
* Re: %u-order allocation failed 2001-10-06 22:31 ` Mikulas Patocka @ 2001-10-06 22:42 ` Alan Cox 2001-10-06 22:58 ` Mikulas Patocka [not found] ` <Pine.LNX.3.96.1011007002406.18004A-100000@artax.karlin.mff.cuni.cz> 1 sibling, 1 reply; 54+ messages in thread From: Alan Cox @ 2001-10-06 22:42 UTC (permalink / raw) To: mikulas Cc: Alan Cox, Anton Blanchard, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel > > Nothing dangeorus there. The -ac vm isnt triggering these cases. > > Sorry, but it can be triggered by _ANY_ VM since buddy allocator was > introduced. You have no guarantee, that you find two or more consecutive > free pages. And if you don't, poll() fails. The two page case isn't one you need to worry about. To all intents and purposes it does not happen, and if you do the maths it isn't going to fail in any interesting ways. Once you go to the 4 page set the odds get a lot longer and then rapidly get very bad indeed. Alan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 22:42 ` Alan Cox @ 2001-10-06 22:58 ` Mikulas Patocka [not found] ` <Pine.LNX.3.96.1011007003803.18004D-100000@artax.karlin.mff.cuni .cz> 0 siblings, 1 reply; 54+ messages in thread From: Mikulas Patocka @ 2001-10-06 22:58 UTC (permalink / raw) To: Alan Cox Cc: Anton Blanchard, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel > > > Nothing dangeorus there. The -ac vm isnt triggering these cases. > > > > Sorry, but it can be triggered by _ANY_ VM since buddy allocator was > > introduced. You have no guarantee, that you find two or more consecutive > > free pages. And if you don't, poll() fails. > > The two page case isnt one you need to worry about. To all intents and > purposes it does not happen, How do you know it? I showed a simple case where it may happen. > and if you do the maths it isnt going to > fail in any interesting ways. Once you go to the 4 page set the odds get > a lot longer and then rapidly get very bad indeed, I hope you don't want to count probability that the server will or won't crash (yes, crash, because when poll in main loop fails, the server process has not many choices - it can only terminate itself). This reminds me some Microsoft announcement saying that Windows NT are 3 times more stable than Windows 95 :-) And it does happen - see this: http://www.uwsg.indiana.edu/hypermail/linux/kernel/0012.3/0711.html Maybe probability was reduced somehow, but the problem is still there. Mikulas ^ permalink raw reply [flat|nested] 54+ messages in thread
[parent not found: <Pine.LNX.3.96.1011007003803.18004D-100000@artax.karlin.mff.cuni .cz>]
* Re: %u-order allocation failed [not found] ` <Pine.LNX.3.96.1011007003803.18004D-100000@artax.karlin.mff.cuni .cz> @ 2001-10-06 23:36 ` Alex Bligh - linux-kernel 0 siblings, 0 replies; 54+ messages in thread From: Alex Bligh - linux-kernel @ 2001-10-06 23:36 UTC (permalink / raw) To: Mikulas Patocka, Alan Cox Cc: Anton Blanchard, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel, Alex Bligh - linux-kernel --On Sunday, 07 October, 2001 12:58 AM +0200 Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote: > How do you know it? I showed a simple case where it may happen. Do you know two order=0 allocations with the same GFP_ value would not have also failed? -- Alex Bligh ^ permalink raw reply [flat|nested] 54+ messages in thread
[parent not found: <Pine.LNX.3.96.1011007002406.18004A-100000@artax.karlin.mff.cuni .cz>]
* Re: %u-order allocation failed [not found] ` <Pine.LNX.3.96.1011007002406.18004A-100000@artax.karlin.mff.cuni .cz> @ 2001-10-06 23:34 ` Alex Bligh - linux-kernel 0 siblings, 0 replies; 54+ messages in thread From: Alex Bligh - linux-kernel @ 2001-10-06 23:34 UTC (permalink / raw) To: Mikulas Patocka, Alan Cox Cc: Anton Blanchard, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel, Alex Bligh - linux-kernel --On Sunday, 07 October, 2001 12:31 AM +0200 Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote: > Sorry, but it can be triggered by _ANY_ VM since buddy allocator was > introduced. Just for info, this was circa 1.0.6 :-) (patches were available since 0.99.xxx). And before it was introduced, rather a lot of other things would consistently fail, for instance anything that reassembled packets whose total size was >4k. And currently they still need that. Kernel memory is a limited resource. Contiguous kernel memory more so. Things that need it need to better deal with the lack of it, esp. in transient situations (such as by working round the absence of it, e.g. kiovec in net code, or by causing some freeing and retrying). And, when contiguous kernel memory is short, the allocator could do with some intelligent page freeing to reduce fragmentation. -- Alex Bligh ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 17:48 ` Mikulas Patocka 2001-10-06 18:12 ` Anton Blanchard @ 2001-10-07 7:35 ` Pavel Machek 1 sibling, 0 replies; 54+ messages in thread From: Pavel Machek @ 2001-10-07 7:35 UTC (permalink / raw) To: Mikulas Patocka, Rik van Riel; +Cc: Krzysztof Rusocki, linux-xfs, linux-kernel Hi! > > So what are you going to do when your 64MB of vmalloc space > > runs out ? > > Make larger vmalloc space :-) Virtual memory costs very little. > Besides 64M / 8k = 8192 - so it runs out at 8192 processes. Hard to do on a machine with 1GB of RAM... There, virtual memory costs *very* much. Pavel -- I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care." Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org ^ permalink raw reply [flat|nested] 54+ messages in thread
[parent not found: <Pine.LNX.3.96.1011006164044.29342B-200000@artax.karlin.mff.cuni .cz>]
* Re: %u-order allocation failed [not found] ` <Pine.LNX.3.96.1011006164044.29342B-200000@artax.karlin.mff.cuni .cz> @ 2001-10-06 17:59 ` Alex Bligh - linux-kernel 2001-10-06 19:13 ` Mikulas Patocka 0 siblings, 1 reply; 54+ messages in thread From: Alex Bligh - linux-kernel @ 2001-10-06 17:59 UTC (permalink / raw) To: Mikulas Patocka, Rik van Riel Cc: Krzysztof Rusocki, linux-xfs, linux-kernel, Alex Bligh - linux-kernel --On Saturday, 06 October, 2001 4:44 PM +0200 Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote: > Here goes the fix. (note that I didn't try to compile it so there may be > bugs, but you see the point). (seems to replace high order allocations by vmalloc) & how does vmalloc allocate physically (as opposed to virtually) contiguous memory; can't clearly recall it being IRQ safe either (for GFP_ATOMIC). -- Alex Bligh ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 17:59 ` Alex Bligh - linux-kernel @ 2001-10-06 19:13 ` Mikulas Patocka 2001-10-06 19:22 ` arjan [not found] ` <Pine.LNX.3.96.1011006210743.7808D-100000@artax.karlin.mff.cuni. cz> 0 siblings, 2 replies; 54+ messages in thread From: Mikulas Patocka @ 2001-10-06 19:13 UTC (permalink / raw) To: Alex Bligh - linux-kernel Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel > --On Saturday, 06 October, 2001 4:44 PM +0200 Mikulas Patocka > <mikulas@artax.karlin.mff.cuni.cz> wrote: > > > Here goes the fix. (note that I didn't try to compile it so there may be > > bugs, but you see the point). > > (seems to replace high order allocations by vmalloc) > > & how does vmalloc allocate physically (as opposed to virtually) > contiguous memory; can't clearly recall it being IRQ safe either > (for GFP_ATOMIC). It uses vmalloc only when __GFP_VMALLOC flag is given - and so it is expected to not use __GFP_VMALLOC flag in IRQ. NOTE: no allocations in IRQ are safe. Not only high-order ones. Allocation in IRQ may fail any time and you must recover without lost of functionality (network can lose packets any time, if you are doing some general device driver, you must preallocate all buffers in process context). Mikulas ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 19:13 ` Mikulas Patocka @ 2001-10-06 19:22 ` arjan 2001-10-06 22:36 ` Mikulas Patocka [not found] ` <Pine.LNX.3.96.1011006210743.7808D-100000@artax.karlin.mff.cuni. cz> 1 sibling, 1 reply; 54+ messages in thread From: arjan @ 2001-10-06 19:22 UTC (permalink / raw) To: Mikulas Patocka; +Cc: linux-kernel In article <Pine.LNX.3.96.1011006210743.7808D-100000@artax.karlin.mff.cuni.cz> you wrote: > NOTE: no allocations in IRQ are safe. Not only high-order ones. > Allocation in IRQ may fail any time and you must recover without lost of > functionality (network can lose packets any time, if you are doing some > general device driver, you must preallocate all buffers in process > context). how again do you deal with calling vfree() on the ones where you used vmalloc instead of the buddy allocator ? ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 19:22 ` arjan @ 2001-10-06 22:36 ` Mikulas Patocka 0 siblings, 0 replies; 54+ messages in thread From: Mikulas Patocka @ 2001-10-06 22:36 UTC (permalink / raw) To: arjan; +Cc: linux-kernel > In article <Pine.LNX.3.96.1011006210743.7808D-100000@artax.karlin.mff.cuni.cz> you wrote: > > > NOTE: no allocations in IRQ are safe. Not only high-order ones. > > Allocation in IRQ may fail any time and you must recover without lost of > > functionality (network can lose packets any time, if you are doing some > > general device driver, you must preallocate all buffers in process > > context). > > how again do you deal with calling vfree() on the ones where you used > vmalloc instead of the buddy allocator ? It's in the patch: if someone calls free_pages on vmallocated memory, it will be freed with vfree instead of going back to the buddy allocator. Of course you can't allocate memory in process context and free it in interrupt context - which you could do without __GFP_VMALLOC. Mikulas ^ permalink raw reply [flat|nested] 54+ messages in thread
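The fallback-and-dispatch idea described in the message above (try the contiguous allocator first, fall back only when the caller opts in, and pick the release path by looking at the address) can be sketched in userspace. This is not the patch from the thread, which is not reproduced here. Every name below is a hypothetical stand-in: `contiguous_alloc` plays the buddy allocator using `malloc`, a static bump arena plays the vmalloc range, and the `may_fallback` argument plays the role of the proposed __GFP_VMALLOC flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define FALLBACK_ARENA_SIZE (64u * 1024u)

/* Test knob: when false, the "contiguous" allocator pretends to be fragmented. */
static bool contiguous_available = true;

/* Stand-in for the vmalloc address range; the union only forces alignment. */
static union {
    unsigned char bytes[FALLBACK_ARENA_SIZE];
    long double force_alignment;
} fallback_arena;
static size_t fallback_used;

/* Stand-in for the buddy allocator. */
static void *contiguous_alloc(size_t size)
{
    return contiguous_available ? malloc(size) : NULL;
}

/* Stand-in for vmalloc: a bump allocator over the static arena. */
static void *fallback_alloc(size_t size)
{
    size_t rounded;
    void *p;

    if (size > FALLBACK_ARENA_SIZE)
        return NULL;                        /* also keeps the rounding from overflowing */
    rounded = (size + 15u) & ~(size_t)15u;  /* keep returned pointers 16-byte aligned */
    if (rounded > FALLBACK_ARENA_SIZE - fallback_used)
        return NULL;
    p = fallback_arena.bytes + fallback_used;
    fallback_used += rounded;
    return p;
}

/* Address-range test that decides which release path applies. */
static bool in_fallback_range(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)fallback_arena.bytes;

    return addr >= start && addr < start + FALLBACK_ARENA_SIZE;
}

/* Try the contiguous allocator first; fall back only if the caller allows it. */
static void *alloc_with_fallback(size_t size, bool may_fallback)
{
    void *p;

    assert(size > 0);
    p = contiguous_alloc(size);
    if (p == NULL && may_fallback)
        p = fallback_alloc(size);
    return p;
}

/* Release through whichever allocator produced the pointer. */
static void free_with_fallback(void *p)
{
    if (p == NULL)
        return;
    if (in_fallback_range(p))
        return;   /* bump arena: individual frees are a no-op in this sketch */
    free(p);
}
```

A caller that needs physically contiguous memory, such as the DMA users raised elsewhere in the thread, would pass `may_fallback = false` and must handle a NULL return. The sketch says nothing about interrupt context, which is the restriction the message itself points out.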
[parent not found: <Pine.LNX.3.96.1011006210743.7808D-100000@artax.karlin.mff.cuni. cz>]
* Re: %u-order allocation failed [not found] ` <Pine.LNX.3.96.1011006210743.7808D-100000@artax.karlin.mff.cuni. cz> @ 2001-10-06 23:26 ` Alex Bligh - linux-kernel 2001-10-07 18:30 ` Eric W. Biederman 2001-10-07 18:32 ` Eric W. Biederman 0 siblings, 2 replies; 54+ messages in thread From: Alex Bligh - linux-kernel @ 2001-10-06 23:26 UTC (permalink / raw) To: Mikulas Patocka, Alex Bligh - linux-kernel Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel, Alex Bligh - linux-kernel Mikulas, > It uses vmalloc only when __GFP_VMALLOC flag is given - and so it is > expected to not use __GFP_VMALLOC flag in IRQ. Ah OK. If your point is that people use GFP_ATOMIC when it's not needed, and demand physically contiguous memory when only virtually contiguous memory is needed, in several places in the kernel, then you are correct. [I am not convinced that vmalloc() is the best way to fix it though.] Most of the order>0 users of __get_free_pages() don't 'need' to do that. For instance I was convinced that networking code needed this for larger than 4k packets (pre-fragmentation or post-prefragmentation) until someone pointed out that the kiovec stuff was there, waiting to be used, if someone made the code changes. But the code changes are non-trivial. Note also that something (not sure what) has made fragmentation increasingly prevalent over the years since the buddy allocator was originally put in. (see my earlier patch for measuring fragmentation). There is currently /no/ intelligence in there to defragment stuff, and the 'light touch' patches (ideas I had and posted here) don't appear to work. If we want __get_free_pages to allocate order>0 this is possible to do reliably if we have some intelligent form of page out which attempts to defragment as it runs, or else run a defragmenter. 
It would also be possible to allocate order>0 GFP_ATOMIC far more reliably than at present if we had a target for defragmentation under normal operation, just like we retain a target for pages reserved for atomic allocation. The very original buddy code (circa 94/95 which I wrote) maintained that there should be (from memory) at least one entry on a high order list (I think it was the 64k list), which gave you a few guaranteed 8k allocations (which was what I was interested in). It's trivial to patch this into __get_free_pages though I haven't tried this (i.e. rather than just look at total free pages, look at the existence of a page on any of the order=4, 5, 6... queues). Note you will use memory less efficiently if you do this. In times of cheaper memory costs, it might be worth testing this approach again. -- Alex Bligh ^ permalink raw reply [flat|nested] 54+ messages in thread
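The reserve check suggested in the message above (look for at least one block on the order=4 or higher queues instead of looking only at the total number of free pages) might look roughly like the following. This is a minimal sketch, assuming the free lists are summarised as an array of per-order block counts; `NR_ORDERS`, `total_free_pages` and `high_order_reserve_ok` are names made up for illustration and are not taken from any kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ORDERS 10   /* assumed number of free lists, orders 0..9 */

/* Total free pages across all orders: a block of order k holds 2^k pages. */
static unsigned long total_free_pages(const unsigned long free_blocks[NR_ORDERS])
{
    unsigned long pages = 0;

    assert(free_blocks != NULL);
    for (unsigned order = 0; order < NR_ORDERS; order++)
        pages += free_blocks[order] << order;
    return pages;
}

/* True if at least one free block exists at min_order or above. */
static bool high_order_reserve_ok(const unsigned long free_blocks[NR_ORDERS],
                                  unsigned min_order)
{
    assert(free_blocks != NULL);
    for (unsigned order = min_order; order < NR_ORDERS; order++) {
        if (free_blocks[order] > 0)
            return true;
    }
    return false;
}
```

A zone with 4096 free order-0 pages and nothing on the higher lists passes a total-free-pages test but fails `high_order_reserve_ok(..., 4)`, which is the situation the message wants reclaim to notice. Holding such a reserve uses memory less efficiently, as the message itself notes.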
* Re: %u-order allocation failed 2001-10-06 23:26 ` Alex Bligh - linux-kernel @ 2001-10-07 18:30 ` Eric W. Biederman 2001-10-08 15:01 ` Alex Bligh - linux-kernel 2001-10-07 18:32 ` Eric W. Biederman 1 sibling, 1 reply; 54+ messages in thread From: Eric W. Biederman @ 2001-10-07 18:30 UTC (permalink / raw) To: Alex Bligh - linux-kernel Cc: Mikulas Patocka, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel, Alex Bligh - linux-kernel Alex Bligh - linux-kernel <linux-kernel@alex.org.uk> writes: > Mikulas, > > > It uses vmalloc only when __GFP_VMALLOC flag is given - and so it is > > expected to not use __GFP_VMALLOC flag in IRQ. > > Ah OK. If your point is that people use GFP_ATOMIC when it's > not needed, and demand physically contiguous memory when only > virtually contiguous memory is needed, in several places in > the kernel, then you are correct. [I am not convinced that > vmalloc() is the best way to fix it though.] > > Most of the order>0 users of __get_free_pages() don't > 'need' to do that. For instance I was convinced that networking > code needed this for larger than 4k packets (pre-fragmentation > or post-prefragmentation) until someone pointed out that > the kiovec stuff was there, waiting to be used, if someone > made the code changes. But the code changes are non-trivial. The zero copy stuff introduced in 2.4.4 allows for skb fragments. I haven't seen any of the network drivers using it on their receive path but it should be possible. > Note also that something (not sure what) has made fragmentation > increasingly prevalent over the years since the buddy allocator > was originally put in. Actually it seems to be situations like the stack now being two pages > (see my earlier patch for measuring > fragmentation). There is currently /no/ intelligence in there > to defragment stuff, and the 'light touch' patches (ideas I had > and posted here) don't appear to work. 
If we want __get_free_pages > to allocate order>0 this is possible to do reliably if we > have some intelligent form of page out which attempts > to defragment as it runs, or else run a defragmenter. It's also possible > to do allocate order>0 GFP_ATOMIC far more reliably than at > present if we had a target for defragmentation under normal > operation, just like we retain a target for pages reserved > for atomic allocation. > > The very original buddy code (circa 94/95 which I wrote) maintained > that there should be (from memory) at least one entry on a high > order list (I think it was the 64k list), which gave you a few > guaranteed 8k allocations (which was I was interested in). It's > trivial to patch this into __get_free_pages though I haven't > tried this (i.e. rather than just look at total free pages, > look at the existance of a page on either the order=4, 5, 6... > queues). Note you will use memory less efficiently if you do > this. In times of cheaper memory costs, it might be worth > testing this approach again. > > -- > Alex Bligh ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-07 18:30 ` Eric W. Biederman @ 2001-10-08 15:01 ` Alex Bligh - linux-kernel 0 siblings, 0 replies; 54+ messages in thread From: Alex Bligh - linux-kernel @ 2001-10-08 15:01 UTC (permalink / raw) To: Eric W. Biederman, Alex Bligh - linux-kernel Cc: Mikulas Patocka, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel, Alex Bligh - linux-kernel --On Sunday, October 07, 2001 12:30 PM -0600 "Eric W. Biederman" <ebiederman@uswest.net> wrote: >> Note also that something (not sure what) has made fragmentation >> increasingly prevalent over the years since the buddy allocator >> was originally put in. > > Actually it seems to be situations like the stack now being two pages Instrumentation posted here before appears to correlate fragmentation being /caused/ with I/O activity (single bonnie process and thus a single 8k stack frame). My own guess is that it is due to a different persistence of various caches. I haven't seen anyone before blaming stack frame allocation as a /cause/ of fragmentation - I've heard people say they notice fragmentation more as stack frame allocs start to fail - but that's a symptom. -- Alex Bligh ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 23:26 ` Alex Bligh - linux-kernel 2001-10-07 18:30 ` Eric W. Biederman @ 2001-10-07 18:32 ` Eric W. Biederman 1 sibling, 0 replies; 54+ messages in thread From: Eric W. Biederman @ 2001-10-07 18:32 UTC (permalink / raw) To: Alex Bligh - linux-kernel Cc: Mikulas Patocka, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel, Alex Bligh - linux-kernel Alex Bligh - linux-kernel <linux-kernel@alex.org.uk> writes: > Mikulas, > > > It uses vmalloc only when __GFP_VMALLOC flag is given - and so it is > > expected to not use __GFP_VMALLOC flag in IRQ. > > Ah OK. If your point is that people use GFP_ATOMIC when it's > not needed, and demand physically contiguous memory when only > virtually contiguous memory is needed, in several places in > the kernel, then you are correct. [I am not convinced that > vmalloc() is the best way to fix it though.] > > Most of the order>0 users of __get_free_pages() don't > 'need' to do that. For instance I was convinced that networking > code needed this for larger than 4k packets (pre-fragmentation > or post-prefragmentation) until someone pointed out that > the kiovec stuff was there, waiting to be used, if someone > made the code changes. But the code changes are non-trivial. The zero copy stuff introduced in 2.4.4 allows for skb fragments. I haven't seen any of the network drivers using it on their receive path but it should be possible. > Note also that something (not sure what) has made fragmentation > increasingly prevalent over the years since the buddy allocator > was originally put in. Actually it seems to be situations like the stack now being two contiguous pages instead of one, where the demand for contiguous memory has increased instead of the amount of fragmentation having increased. Eric ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-06 14:44 ` Mikulas Patocka ` (2 preceding siblings ...) [not found] ` <Pine.LNX.3.96.1011006164044.29342B-200000@artax.karlin.mff.cuni.cz> @ 2001-10-07 9:40 ` Alan Cox 2001-10-07 12:28 ` Mikulas Patocka 3 siblings, 1 reply; 54+ messages in thread From: Alan Cox @ 2001-10-07 9:40 UTC (permalink / raw) To: Mikulas Patocka; +Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel > Here goes the fix. (note that I didn't try to compile it so there may be > bugs, but you see the point). It isn't a fix. > kmalloc should be fixed too (used badly for example in select.c - and yes > - I have seen real world bugreports for poll randomly failing with > ENOMEM), but it will be hard to audit all drivers that they do not try to > use dma on kmallocated memory. So you run out of blocks of vmalloc address space instead. The same problem still occurs and always will. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-07 9:40 ` Alan Cox @ 2001-10-07 12:28 ` Mikulas Patocka 2001-10-07 14:12 ` Alan Cox 0 siblings, 1 reply; 54+ messages in thread From: Mikulas Patocka @ 2001-10-07 12:28 UTC (permalink / raw) To: Alan Cox; +Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel On Sun, 7 Oct 2001, Alan Cox wrote: > > Here goes the fix. (note that I didn't try to compile it so there may be > > bugs, but you see the point). > > It isnt a fix > > > kmalloc should be fixed too (used badly for example in select.c - and yes > > - I have seen real world bugreports for poll randomly failing with > > ENOMEM), but it will be hard to audit all drivers that they do not try to > > use dma on kmallocated memory. > > So you run out of blocks of vmalloc address space instead. The same problem > still occurs and always will I already said it in mail to Rik: Yes - you can run out of vmalloc space. But you run out of it only when you create too many processes (8192), load too many modules etc. If someone needs to put such heavy load on linux, we can expect that he is not a luser and he knows how to increase size of vmalloc space. But - you run out of high-order pages randomly. You don't have to overflow any resource - just map a file, touch it whole the first time and then periodically touch every second page of it. Or: alloc periodically one anon page and one cache page - read() (without readahead) does exactly that. You can't run out of vmalloc space just by mapping files and touching pages. The probability math is fine - only if you are sure that pages are allocated and freed randomly. But they are not. Mikulas ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-07 12:28 ` Mikulas Patocka @ 2001-10-07 14:12 ` Alan Cox 2001-10-07 15:42 ` Mikulas Patocka 0 siblings, 1 reply; 54+ messages in thread From: Alan Cox @ 2001-10-07 14:12 UTC (permalink / raw) To: Mikulas Patocka Cc: Alan Cox, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel > Yes - you can run out of vmalloc space. But you run out of it only when > you create too many processes (8192), load too many modules etc. If > someone needs to put such heavy load on linux, we can expect that he is > not a luser and he knows how to increase size of vmalloc space. Not just that - you get fragmentation of it which leads you back to the same situation as kmalloc except that with the guard pages you fragment the address space more. Alan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-07 14:12 ` Alan Cox @ 2001-10-07 15:42 ` Mikulas Patocka 2001-10-07 22:01 ` Alan Cox 0 siblings, 1 reply; 54+ messages in thread From: Mikulas Patocka @ 2001-10-07 15:42 UTC (permalink / raw) To: Alan Cox; +Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel > > Yes - you can run out of vmalloc space. But you run out of it only when > > you create too many processes (8192), load too many modules etc. If > > someone needs to put such heavy load on linux, we can expect that he is > > not a luser and he knows how to increase size of vmalloc space. > > Not just that - you get fragmentation of it which leads you back to the > same situation as kmalloc except that with the guard pages you fragment the > address space more. So - for example if you have 500 processes, each process 8k stack (plus one page for vmalloc alignment). Please tell me some alloc/free strategy that fills up and fragments 64M vmalloc space. You can't find it. The difference between memory and vmalloc space is this: you fill up the whole memory with cache => memory fragments. You don't fill up the whole vmalloc space with anything => vmalloc space doesn't fragment. Mikulas ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed 2001-10-07 15:42 ` Mikulas Patocka @ 2001-10-07 22:01 ` Alan Cox 2001-10-08 15:08 ` Alex Bligh - linux-kernel ` (2 more replies) 0 siblings, 3 replies; 54+ messages in thread From: Alan Cox @ 2001-10-07 22:01 UTC (permalink / raw) To: Mikulas Patocka Cc: Alan Cox, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel > The difference between memory and vmalloc space is this: you fill up the > whole memory with cache => memory fragments. You don't fill up the whole > vmalloc space with anything => vmalloc space doesn't fragment. vmalloc space fragments. You fragment address space rather than pages, that's all. Same problem. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: %u-order allocation failed
  2001-10-07 22:01 ` Alan Cox
@ 2001-10-08 15:08 ` Alex Bligh - linux-kernel
  2001-10-08 16:44 ` Pavel Machek
  2001-10-08 22:21 ` Mikulas Patocka
  2 siblings, 0 replies; 54+ messages in thread
From: Alex Bligh - linux-kernel @ 2001-10-08 15:08 UTC
To: Alan Cox, Mikulas Patocka
Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel,
    Alex Bligh - linux-kernel

--On Sunday, October 07, 2001 11:01 PM +0100 Alan Cox
<alan@lxorguk.ukuu.org.uk> wrote:

> vmalloc space fragments. You fragment address space rather than pages,
> that's all. Same problem.

Actually, fragmented virtual space is theoretically worse, as you have
now lost a possible weapon to defragment stuff (indirection on mapping
to physical RAM - i.e. you could no longer move or swap out physical RAM
and keep the virtual address mapping the same).

--
Alex Bligh
* Re: %u-order allocation failed
  2001-10-07 22:01 ` Alan Cox
  2001-10-08 15:08 ` Alex Bligh - linux-kernel
@ 2001-10-08 16:44 ` Pavel Machek
  2001-10-08 22:21 ` Mikulas Patocka
  2 siblings, 0 replies; 54+ messages in thread
From: Pavel Machek @ 2001-10-08 16:44 UTC
To: Alan Cox
Cc: Mikulas Patocka, Rik van Riel, Krzysztof Rusocki, linux-xfs,
    linux-kernel

Hi!

> > The difference between memory and vmalloc space is this: you fill up the
> > whole memory with cache => memory fragments. You don't fill up the whole
> > vmalloc space with anything => vmalloc space doesn't fragment.
>
> vmalloc space fragments. You fragment address space rather than pages,
> that's all. Same problem.

vmalloc space tends to be empty while RAM tends to be full. That might
be important.

Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
* Re: %u-order allocation failed
  2001-10-07 22:01 ` Alan Cox
  2001-10-08 15:08 ` Alex Bligh - linux-kernel
  2001-10-08 16:44 ` Pavel Machek
@ 2001-10-08 22:21 ` Mikulas Patocka
  2001-10-08 21:16 ` David Lang
  [not found] ` <Pine.LNX.3.96.1011009001720.20446A-100000@artax.karlin.mff.cuni.cz>
  2 siblings, 2 replies; 54+ messages in thread
From: Mikulas Patocka @ 2001-10-08 22:21 UTC
To: Alan Cox
Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel

On Sun, 7 Oct 2001, Alan Cox wrote:

> > The difference between memory and vmalloc space is this: you fill up the
> > whole memory with cache => memory fragments. You don't fill up the whole
> > vmalloc space with anything => vmalloc space doesn't fragment.
>
> vmalloc space fragments. You fragment address space rather than pages,
> that's all. Same problem.

If you have more than half of virtual space free, you can always find two
consecutive free pages. Period.

You can fill up half of virtual space if you start 4096 processes or load
many modules of total size 32M. Is it clear? Do you realize that no one
will ever hit this limit in a typical Linux configuration?

Mikulas
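The "more than half free" claim is a pigeonhole argument, and it can be
checked by brute force on a small address space. The sketch below is only
an illustration: it models the space as an even number of page slots,
ignores guard pages and allocation sizes, and uses made-up function names.

```python
from itertools import product


def has_free_run(layout, length):
    """True if layout (sequence of bools, True = free) contains
    `length` consecutive free slots."""
    run = 0
    for is_free in layout:
        run = run + 1 if is_free else 0
        if run >= length:
            return True
    return False


def pair_always_exists(num_slots):
    """Enumerate every layout with more than half of the slots free
    and check that each one contains two adjacent free slots."""
    for layout in product((False, True), repeat=num_slots):
        if 2 * sum(layout) > num_slots and not has_free_run(layout, 2):
            return False
    return True


def alternating(num_slots):
    """Exactly half of the slots free, none adjacent: the bound is tight."""
    return [i % 2 == 0 for i in range(num_slots)]


if __name__ == "__main__":
    for n in range(2, 13, 2):
        print(n, "slots:", pair_always_exists(n))
    print("alternating layout has a free pair:",
          has_free_run(alternating(12), 2))
```

The check covers only free pairs. An allocation that also needs a guard
page needs a longer run, which this argument does not guarantee.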
* Re: %u-order allocation failed
  2001-10-08 22:21 ` Mikulas Patocka
@ 2001-10-08 21:16 ` David Lang
  [not found] ` <Pine.LNX.3.96.1011009001720.20446A-100000@artax.karlin.mff.cuni.cz>
  1 sibling, 0 replies; 54+ messages in thread
From: David Lang @ 2001-10-08 21:16 UTC
To: Mikulas Patocka
Cc: Alan Cox, Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel

Only 4096 processes sounds low to me (I realize that some of my configs
are not typical, but this isn't that unusual on servers).

Does this limit go up if you raise the max number of processes/threads?

David Lang

On Tue, 9 Oct 2001, Mikulas Patocka wrote:

> On Sun, 7 Oct 2001, Alan Cox wrote:
>
> > > The difference between memory and vmalloc space is this: you fill up the
> > > whole memory with cache => memory fragments. You don't fill up the whole
> > > vmalloc space with anything => vmalloc space doesn't fragment.
> >
> > vmalloc space fragments. You fragment address space rather than pages,
> > that's all. Same problem.
>
> If you have more than half of virtual space free, you can always find two
> consecutive free pages. Period.
>
> You can fill up half of virtual space if you start 4096 processes or load
> many modules of total size 32M. Is it clear? Do you realize that no one
> will ever hit this limit in a typical Linux configuration?
>
> Mikulas
[parent not found: <Pine.LNX.3.96.1011009001720.20446A-100000@artax.karlin.mff.cuni.cz>]
* Re: %u-order allocation failed
  [not found] ` <Pine.LNX.3.96.1011009001720.20446A-100000@artax.karlin.mff.cuni.cz>
@ 2001-10-08 22:53 ` Alex Bligh - linux-kernel
  2001-10-08 23:31 ` Mikulas Patocka
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Bligh - linux-kernel @ 2001-10-08 22:53 UTC
To: Mikulas Patocka, Alan Cox
Cc: Rik van Riel, Krzysztof Rusocki, linux-xfs, linux-kernel,
    Alex Bligh - linux-kernel

--On Tuesday, 09 October, 2001 12:21 AM +0200 Mikulas Patocka
<mikulas@artax.karlin.mff.cuni.cz> wrote:

> If you have more than half of virtual space free, you can always find two
> consecutive free pages. Period.

Now calculate the probability of not being able to do this in physical
space, assuming even page dispersion, and many pages free. You will
find it is very small. This may give you a clue as to what the problem
actually is.

--
Alex Bligh
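One way to do the calculation suggested above is sketched here. It rests
on several assumptions that are not in the thread: each page is free
independently with the same probability (one reading of "even page
dispersion"), an order-1 allocation needs both pages of an aligned buddy
pair, pages are 4 KiB, and the machine has the 96MB mentioned in the
original report. The function name is made up for the example.

```python
# Probability that an order-1 (two-page) buddy allocation cannot be
# satisfied, under a simple independent-pages model.
# Assumptions: pages are free independently with probability p_free,
# and the two pages must form an aligned buddy pair.

PAGE_SIZE = 4 * 1024                        # assumed i386 page size
TOTAL_PAGES = 96 * 1024 ** 2 // PAGE_SIZE   # 96MB machine from the report


def p_no_free_pair(total_pages, p_free):
    """Chance that none of the aligned page pairs is completely free."""
    pairs = total_pages // 2
    # A given pair is fully free with probability p_free ** 2, so it is
    # unusable with probability 1 - p_free ** 2; pairs are independent.
    return (1.0 - p_free * p_free) ** pairs


if __name__ == "__main__":
    for p_free in (0.20, 0.05, 0.01, 0.005):
        print(f"{100 * p_free:5.1f}% of pages free -> "
              f"P(order-1 failure) = {p_no_free_pair(TOTAL_PAGES, p_free):.3g}")
```

In this model the failure probability is negligible while a few percent
of pages are free and becomes noticeable only when free memory is close
to exhausted. Real page usage is not independent, so the numbers are a
rough guide at best.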
* Re: %u-order allocation failed
  2001-10-08 22:53 ` Alex Bligh - linux-kernel
@ 2001-10-08 23:31 ` Mikulas Patocka
  2001-10-08 23:44 ` Alan Cox
  2001-10-08 23:48 ` Linus Torvalds
  0 siblings, 2 replies; 54+ messages in thread
From: Mikulas Patocka @ 2001-10-08 23:31 UTC
To: torvalds
Cc: Alan Cox, Rik van Riel, Alex Bligh - linux-kernel,
    Krzysztof Rusocki, linux-xfs, linux-kernel

On Mon, 8 Oct 2001, Alex Bligh - linux-kernel wrote:

> --On Tuesday, 09 October, 2001 12:21 AM +0200 Mikulas Patocka
> <mikulas@artax.karlin.mff.cuni.cz> wrote:
>
> > If you have more than half of virtual space free, you can always find two
> > consecutive free pages. Period.
>
> Now calculate the probability of not being able to do this in physical
> space, assuming even page dispersion, and many pages free. You will
> find it is very small. This may give you a clue as to what the problem
> actually is.

My patch is not providing "very small probability". It is providing _zero_
probability that fork fails (assuming that there is more than half of
vmalloc space free).

I'm just tired of this stupid flamewar.

Linus, what do you think: is it OK if fork randomly fails with very small
probability or not?

Are you going to accept a patch that maps task_struct into virtual space
if the buddy allocator fails or not?

Mikulas
* Re: %u-order allocation failed
  2001-10-08 23:31 ` Mikulas Patocka
@ 2001-10-08 23:44 ` Alan Cox
  2001-10-08 23:46 ` Mikulas Patocka
  2001-10-08 23:48 ` Linus Torvalds
  1 sibling, 1 reply; 54+ messages in thread
From: Alan Cox @ 2001-10-08 23:44 UTC
To: Mikulas Patocka
Cc: torvalds, Alan Cox, Rik van Riel, Alex Bligh - linux-kernel,
    Krzysztof Rusocki, linux-xfs, linux-kernel

> Linus, what do you think: is it OK if fork randomly fails with very small
> probability or not?

Your code doesn't change that behaviour. Not one iota. Do the mathematics,
work out the failure probabilities for page pairs. Now remember that the
vmalloc one has guard pages too.

You are trying to solve a non-problem with a non-solution.

Alan
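The guard-page remark can be illustrated with a small constructed case.
Assuming a vmalloc'ed 8k stack needs three contiguous page slots (two
pages plus one guard page, with 4 KiB pages), having more than half of
the address space free no longer guarantees that it fits. The layout
below is a hand-built worst case, not a trace from a real system, and
the names are hypothetical.

```python
def longest_free_run(layout):
    """Length of the longest run of 'F' (free) slots in a layout string,
    where 'F' marks a free page slot and 'U' a used one."""
    longest = run = 0
    for slot in layout:
        run = run + 1 if slot == "F" else 0
        longest = max(longest, run)
    return longest


# Assumed requirement: 2 stack pages + 1 guard page.
NEEDED = 3

# Constructed worst case: two free slots, one used slot, repeated.
layout = "FFU" * 8
free_fraction = layout.count("F") / len(layout)

if __name__ == "__main__":
    print(f"free fraction: {free_fraction:.2f}")
    print("longest free run:", longest_free_run(layout))
    print("fits a stack plus guard page:",
          longest_free_run(layout) >= NEEDED)
```

Here two thirds of the slots are free, yet no run of three exists.
Whether such a layout can arise in practice depends on the mix of
allocation sizes, which the sketch does not model.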
* Re: %u-order allocation failed
  2001-10-08 23:44 ` Alan Cox
@ 2001-10-08 23:46 ` Mikulas Patocka
  2001-10-09 9:45 ` Pavel Machek
  0 siblings, 1 reply; 54+ messages in thread
From: Mikulas Patocka @ 2001-10-08 23:46 UTC
To: Alan Cox
Cc: torvalds, Rik van Riel, Alex Bligh - linux-kernel,
    Krzysztof Rusocki, linux-xfs, linux-kernel

On Tue, 9 Oct 2001, Alan Cox wrote:

> > Linus, what do you think: is it OK if fork randomly fails with very small
> > probability or not?
>
> Your code doesn't change that behaviour. Not one iota. Do the mathematics,
> work out the failure probabilities for page pairs. Now remember that the
> vmalloc one has guard pages too.
>
> You are trying to solve a non-problem with a non-solution.

I asked Linus, not you :-/

It's up to him, if he wants "stability-based-on-probability" algorithms in
Linux or not.

Mikulas
* Re: %u-order allocation failed
  2001-10-08 23:46 ` Mikulas Patocka
@ 2001-10-09 9:45 ` Pavel Machek
  0 siblings, 0 replies; 54+ messages in thread
From: Pavel Machek @ 2001-10-09 9:45 UTC
To: Mikulas Patocka
Cc: linux-kernel

Hi!

> > > Linus, what do you think: is it OK if fork randomly fails with very small
> > > probability or not?
> >
> > Your code doesn't change that behaviour. Not one iota. Do the mathematics,
> > work out the failure probabilities for page pairs. Now remember that the
> > vmalloc one has guard pages too.
> >
> > You are trying to solve a non-problem with a non-solution.
>
> I asked Linus, not you :-/
>
> It's up to him, if he wants "stability-based-on-probability" algorithms in
> Linux or not.

You ignored the comment about guard pages.

Pavel
--
Casualities in World Trade Center: 6453 dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.
* Re: %u-order allocation failed
  2001-10-08 23:31 ` Mikulas Patocka
  2001-10-08 23:44 ` Alan Cox
@ 2001-10-08 23:48 ` Linus Torvalds
  2001-10-08 23:54 ` Mikulas Patocka
  2001-10-09 11:48 ` Rik van Riel
  1 sibling, 2 replies; 54+ messages in thread
From: Linus Torvalds @ 2001-10-08 23:48 UTC
To: Mikulas Patocka
Cc: Alan Cox, Rik van Riel, Alex Bligh - linux-kernel,
    Krzysztof Rusocki, linux-xfs, linux-kernel

On Tue, 9 Oct 2001, Mikulas Patocka wrote:
>
> Linus, what do you think: is it OK if fork randomly fails with very small
> probability or not?

I've never seen it, I've never heard it reported, and I _know_ that
vmalloc() causes slowdowns.

In short, I'm not switching to a vmalloc() fork.

Linus
* Re: %u-order allocation failed
  2001-10-08 23:48 ` Linus Torvalds
@ 2001-10-08 23:54 ` Mikulas Patocka
  2001-10-09 11:48 ` Rik van Riel
  1 sibling, 0 replies; 54+ messages in thread
From: Mikulas Patocka @ 2001-10-08 23:54 UTC
To: Linus Torvalds
Cc: Alan Cox, Rik van Riel, Alex Bligh - linux-kernel,
    Krzysztof Rusocki, linux-xfs, linux-kernel

On Mon, 8 Oct 2001, Linus Torvalds wrote:

> On Tue, 9 Oct 2001, Mikulas Patocka wrote:
> >
> > Linus, what do you think: is it OK if fork randomly fails with very small
> > probability or not?
>
> I've never seen it, I've never heard it reported, and I _know_ that
> vmalloc() causes slowdowns.
>
> In short, I'm not switching to a vmalloc() fork.

The patch uses buddy by default and does vmalloc only if buddy fails.
Slowdown is not an issue here.

Mikulas
* Re: %u-order allocation failed
  2001-10-08 23:48 ` Linus Torvalds
  2001-10-08 23:54 ` Mikulas Patocka
@ 2001-10-09 11:48 ` Rik van Riel
  1 sibling, 0 replies; 54+ messages in thread
From: Rik van Riel @ 2001-10-09 11:48 UTC
To: Linus Torvalds
Cc: Mikulas Patocka, Alan Cox, Alex Bligh - linux-kernel,
    Krzysztof Rusocki, linux-xfs, linux-kernel

On Mon, 8 Oct 2001, Linus Torvalds wrote:
> On Tue, 9 Oct 2001, Mikulas Patocka wrote:
> >
> > Linus, what do you think: is it OK if fork randomly fails with very small
> > probability or not?
>
> I've never seen it, I've never heard it reported, and I _know_ that
> vmalloc() causes slowdowns.

I've seen it happen during a stress test of an underpowered test box.

When that point is reached, the system usually is already so far
overloaded there's little point in allowing extra processes to be
started.

> In short, I'm not switching to a vmalloc() fork.

The only real use I could see would be to allow root to start up
some commands to save the box when it's going down the drain.

Probably not worth it since root could have just used ulimit
for the normal users ;)

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)
http://www.surriel.com/ http://distro.conectiva.com/
end of thread, other threads:[~2001-10-13 19:34 UTC | newest]
Thread overview: 54+ messages
2001-10-05 11:07 %u-order allocation failed Krzysztof Rusocki
2001-10-05 11:59 ` Rik van Riel
2001-10-05 20:18 ` Seth Mos
2001-10-05 20:22 ` Rik van Riel
2001-10-05 20:31 ` Seth Mos
2001-10-05 20:43 ` Steve Lord
2001-10-05 21:09 ` Seth Mos
2001-10-05 22:06 ` David Schwartz
2001-10-05 22:16 ` Seth Mos
2001-10-06 14:00 ` Mikulas Patocka
2001-10-06 14:03 ` Rik van Riel
2001-10-06 14:44 ` Mikulas Patocka
2001-10-06 15:31 ` Mikulas Patocka
2001-10-06 19:05 ` Mikulas Patocka
2001-10-06 16:58 ` Rik van Riel
2001-10-06 17:48 ` Mikulas Patocka
2001-10-06 18:12 ` Anton Blanchard
2001-10-06 19:07 ` Mikulas Patocka
2001-10-06 20:13 ` Benjamin Herrenschmidt
2001-10-06 22:34 ` Mikulas Patocka
2001-10-07 1:23 ` Rik van Riel
2001-10-07 11:12 ` Benjamin Herrenschmidt
2001-10-06 21:13 ` Alan Cox
2001-10-06 22:31 ` Mikulas Patocka
2001-10-06 22:42 ` Alan Cox
2001-10-06 22:58 ` Mikulas Patocka
[not found] ` <Pine.LNX.3.96.1011007003803.18004D-100000@artax.karlin.mff.cuni.cz>
2001-10-06 23:36 ` Alex Bligh - linux-kernel
[not found] ` <Pine.LNX.3.96.1011007002406.18004A-100000@artax.karlin.mff.cuni.cz>
2001-10-06 23:34 ` Alex Bligh - linux-kernel
2001-10-07 7:35 ` Pavel Machek
[not found] ` <Pine.LNX.3.96.1011006164044.29342B-200000@artax.karlin.mff.cuni.cz>
2001-10-06 17:59 ` Alex Bligh - linux-kernel
2001-10-06 19:13 ` Mikulas Patocka
2001-10-06 19:22 ` arjan
2001-10-06 22:36 ` Mikulas Patocka
[not found] ` <Pine.LNX.3.96.1011006210743.7808D-100000@artax.karlin.mff.cuni.cz>
2001-10-06 23:26 ` Alex Bligh - linux-kernel
2001-10-07 18:30 ` Eric W. Biederman
2001-10-08 15:01 ` Alex Bligh - linux-kernel
2001-10-07 18:32 ` Eric W. Biederman
2001-10-07 9:40 ` Alan Cox
2001-10-07 12:28 ` Mikulas Patocka
2001-10-07 14:12 ` Alan Cox
2001-10-07 15:42 ` Mikulas Patocka
2001-10-07 22:01 ` Alan Cox
2001-10-08 15:08 ` Alex Bligh - linux-kernel
2001-10-08 16:44 ` Pavel Machek
2001-10-08 22:21 ` Mikulas Patocka
2001-10-08 21:16 ` David Lang
[not found] ` <Pine.LNX.3.96.1011009001720.20446A-100000@artax.karlin.mff.cuni.cz>
2001-10-08 22:53 ` Alex Bligh - linux-kernel
2001-10-08 23:31 ` Mikulas Patocka
2001-10-08 23:44 ` Alan Cox
2001-10-08 23:46 ` Mikulas Patocka
2001-10-09 9:45 ` Pavel Machek
2001-10-08 23:48 ` Linus Torvalds
2001-10-08 23:54 ` Mikulas Patocka
2001-10-09 11:48 ` Rik van Riel