* [PATCH 1/7] mm: introduce simple_malloc()/simple_free() @ 2008-11-16 4:33 Lai Jiangshan 2008-11-16 4:49 ` Alexey Dobriyan 2008-11-16 4:52 ` Arjan van de Ven 0 siblings, 2 replies; 29+ messages in thread From: Lai Jiangshan @ 2008-11-16 4:33 UTC (permalink / raw) To: Andrew Morton Cc: Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen, Linux Kernel Mailing List some subsystem needs vmalloc() when required memory is large. but current kernel has not APIs for this requirement. this patch introduces simple_malloc() and simple_free(). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> --- diff --git a/include/linux/mm.h b/include/linux/mm.h index ffee2f7..e9c11f7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -13,6 +13,7 @@ #include <linux/prio_tree.h> #include <linux/debug_locks.h> #include <linux/mm_types.h> +#include <linux/vmalloc.h> struct mempolicy; struct anon_vma; @@ -278,6 +279,36 @@ static inline int is_vmalloc_addr(const void *x) #endif } +static inline void *__simple_malloc(unsigned long size, int pages_hint) +{ + if (size <= PAGE_SIZE * pages_hint) + return kmalloc(size, GFP_KERNEL); + else + return vmalloc(size); +} + +/** + * simple_malloc - allocate memory by kmalloc() or vmalloc() + * + * if @size <= PAGE_SIZE, memory is allocated by kmalloc(), + * otherwise by vmalloc() + */ +static inline void *simple_malloc(unsigned long size) +{ + return __simple_malloc(size, 1); +} + +/** + * simple_free - free the memory by kfree(), or vfree() if it is vmalloc addr + */ +static inline void simple_free(void *ptr) +{ + if (is_vmalloc_addr(ptr)) + vfree(ptr); + else + kfree(ptr); +} + static inline struct page *compound_head(struct page *page) { if (unlikely(PageTail(page))) ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 4:33 [PATCH 1/7] mm: introduce simple_malloc()/simple_free() Lai Jiangshan @ 2008-11-16 4:49 ` Alexey Dobriyan 2008-11-16 8:14 ` David Miller 2008-11-16 18:42 ` KOSAKI Motohiro 2008-11-16 4:52 ` Arjan van de Ven 1 sibling, 2 replies; 29+ messages in thread From: Alexey Dobriyan @ 2008-11-16 4:49 UTC (permalink / raw) To: Lai Jiangshan Cc: Andrew Morton, Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen, Linux Kernel Mailing List On Sun, Nov 16, 2008 at 12:33:15PM +0800, Lai Jiangshan wrote: > some subsystem needs vmalloc() when required memory is large. > but current kernel has not APIs for this requirement. It doesn't mean there should be such an API. > this patch introduces simple_malloc() and simple_free(). Grossly misnamed. It should be kvmalloc/kvfree, at least. > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -278,6 +279,36 @@ static inline int is_vmalloc_addr(const void *x) > #endif > } > > +static inline void *__simple_malloc(unsigned long size, int pages_hint) > +{ > + if (size <= PAGE_SIZE * pages_hint) > + return kmalloc(size, GFP_KERNEL); > + else > + return vmalloc(size); > +} > + > +/** > + * simple_malloc - allocate memory by kmalloc() or vmalloc() > + * > + * if @size <= PAGE_SIZE, memory is allocated by kmalloc(), > + * otherwise by vmalloc() > + */ > +static inline void *simple_malloc(unsigned long size) > +{ > + return __simple_malloc(size, 1); > +} > + > +/** > + * simple_free - free the memory by kfree(), or vfree() if it is vmalloc addr > + */ > +static inline void simple_free(void *ptr) > +{ > + if (is_vmalloc_addr(ptr)) > + vfree(ptr); > + else > + kfree(ptr); > +} ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 4:49 ` Alexey Dobriyan @ 2008-11-16 8:14 ` David Miller 2008-11-16 18:42 ` KOSAKI Motohiro 1 sibling, 0 replies; 29+ messages in thread From: David Miller @ 2008-11-16 8:14 UTC (permalink / raw) To: adobriyan Cc: laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack, jes, linux-kernel From: Alexey Dobriyan <adobriyan@gmail.com> Date: Sun, 16 Nov 2008 07:49:01 +0300 > On Sun, Nov 16, 2008 at 12:33:15PM +0800, Lai Jiangshan wrote: > > some subsystem needs vmalloc() when required memory is large. > > but current kernel has not APIs for this requirement. > > It doesn't mean there should be such an API. If it leads to lots of code duplication, which this case has, that is enough reason to provide helper functions. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 4:49 ` Alexey Dobriyan 2008-11-16 8:14 ` David Miller @ 2008-11-16 18:42 ` KOSAKI Motohiro 1 sibling, 0 replies; 29+ messages in thread From: KOSAKI Motohiro @ 2008-11-16 18:42 UTC (permalink / raw) To: Alexey Dobriyan Cc: kosaki.motohiro, Lai Jiangshan, Andrew Morton, Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen, Linux Kernel Mailing List Hi > > this patch introduces simple_malloc() and simple_free(). > > Grossly misnamed. It should be kvmalloc/kvfree, at least. I also like the name of kvmalloc because "simple_" prefix seems too ambiguity. Otherthings, looks good to me. this patch is just cleanup. it definitly don't have any side effect. So, I think vmalloc disadvantage discussion isn't point so much. There alaredy is. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 4:33 [PATCH 1/7] mm: introduce simple_malloc()/simple_free() Lai Jiangshan 2008-11-16 4:49 ` Alexey Dobriyan @ 2008-11-16 4:52 ` Arjan van de Ven 2008-11-16 5:03 ` Andrew Morton 2008-11-16 8:19 ` David Miller 1 sibling, 2 replies; 29+ messages in thread From: Arjan van de Ven @ 2008-11-16 4:52 UTC (permalink / raw) To: Lai Jiangshan Cc: Andrew Morton, Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen, Linux Kernel Mailing List On Sun, 16 Nov 2008 12:33:15 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > some subsystem needs vmalloc() when required memory is large. > but current kernel has not APIs for this requirement. > this patch introduces simple_malloc() and simple_free(). Hi I kinda really don't like this approach. vmalloc() (and especially, vfree()) is a really expensive operation, and vmalloc()'d memory is also slower (due to tlb pressure). Realistically, people should try hard to use small datastructure instead.... -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 4:52 ` Arjan van de Ven @ 2008-11-16 5:03 ` Andrew Morton 2008-11-16 5:35 ` Lai Jiangshan 2008-11-16 8:21 ` David Miller 2008-11-16 8:19 ` David Miller 1 sibling, 2 replies; 29+ messages in thread From: Andrew Morton @ 2008-11-16 5:03 UTC (permalink / raw) To: Arjan van de Ven Cc: Lai Jiangshan, Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen, Linux Kernel Mailing List On Sat, 15 Nov 2008 20:52:29 -0800 Arjan van de Ven <arjan@infradead.org> wrote: > On Sun, 16 Nov 2008 12:33:15 +0800 > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > > > some subsystem needs vmalloc() when required memory is large. > > but current kernel has not APIs for this requirement. > > this patch introduces simple_malloc() and simple_free(). > > Hi > > I kinda really don't like this approach. vmalloc() (and especially, > vfree()) is a really expensive operation, and vmalloc()'d memory is > also slower (due to tlb pressure). And it can fragment, which effectively means a dead box. > Realistically, people should try hard > to use small datastructure instead.... Yup, it makes it easier for people to do something which we strongly discourage. The risk got worse with all these 64-bit machines with vast amounts of virtual address space. It makes it easier for people to develop and "test" code which isn't reliable on smaller machines. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 5:03 ` Andrew Morton @ 2008-11-16 5:35 ` Lai Jiangshan 2008-11-16 5:47 ` Andrew Morton 2008-11-16 5:53 ` Arjan van de Ven 2008-11-16 8:21 ` David Miller 1 sibling, 2 replies; 29+ messages in thread From: Lai Jiangshan @ 2008-11-16 5:35 UTC (permalink / raw) To: Andrew Morton Cc: Arjan van de Ven, Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen, Linux Kernel Mailing List Andrew Morton wrote: > On Sat, 15 Nov 2008 20:52:29 -0800 Arjan van de Ven <arjan@infradead.org> wrote: > >> On Sun, 16 Nov 2008 12:33:15 +0800 >> Lai Jiangshan <laijs@cn.fujitsu.com> wrote: >> >>> some subsystem needs vmalloc() when required memory is large. >>> but current kernel has not APIs for this requirement. >>> this patch introduces simple_malloc() and simple_free(). >> Hi >> >> I kinda really don't like this approach. vmalloc() (and especially, >> vfree()) is a really expensive operation, and vmalloc()'d memory is >> also slower (due to tlb pressure). > > And it can fragment, which effectively means a dead box. > >> Realistically, people should try hard >> to use small datastructure instead.... > > Yup, it makes it easier for people to do something which we strongly > discourage. The risk got worse with all these 64-bit machines with > vast amounts of virtual address space. It makes it easier for people > to develop and "test" code which isn't reliable on smaller machines. > > vmalloc() is not good for performance and increasing fragment. but vmalloc() is need for some subsystems' alternative malloc, like cgroup's tasks file and other subsystems(about 20 subsystems). these subsystems use kmalloc() in the most condition, but may need vmalloc() in some rare condition. so they use alternative malloc. So, since these subsystems' maintainer have good reasons for using vmalloc(), they can use simple_malloc() too. simple_malloc() is not for common using. (I should document when we use simple_malloc() in the code) simple_free() is useful. there are several subsystems which use a flags for selecting kfree() or vfree(), and some subsystems recount the size hardy before kfree() or vfree(). Thanks, Lai. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 5:35 ` Lai Jiangshan @ 2008-11-16 5:47 ` Andrew Morton 2008-11-16 5:53 ` Arjan van de Ven 1 sibling, 0 replies; 29+ messages in thread From: Andrew Morton @ 2008-11-16 5:47 UTC (permalink / raw) To: Lai Jiangshan Cc: Arjan van de Ven, Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen, Linux Kernel Mailing List On Sun, 16 Nov 2008 13:35:03 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > Andrew Morton wrote: > > On Sat, 15 Nov 2008 20:52:29 -0800 Arjan van de Ven <arjan@infradead.org> wrote: > > > >> On Sun, 16 Nov 2008 12:33:15 +0800 > >> Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > >> > >>> some subsystem needs vmalloc() when required memory is large. > >>> but current kernel has not APIs for this requirement. > >>> this patch introduces simple_malloc() and simple_free(). > >> Hi > >> > >> I kinda really don't like this approach. vmalloc() (and especially, > >> vfree()) is a really expensive operation, and vmalloc()'d memory is > >> also slower (due to tlb pressure). > > > > And it can fragment, which effectively means a dead box. > > > >> Realistically, people should try hard > >> to use small datastructure instead.... > > > > Yup, it makes it easier for people to do something which we strongly > > discourage. The risk got worse with all these 64-bit machines with > > vast amounts of virtual address space. It makes it easier for people > > to develop and "test" code which isn't reliable on smaller machines. > > > > > > vmalloc() is not good for performance and increasing fragment. > but vmalloc() is need for some subsystems' alternative malloc, > like cgroup's tasks file and other subsystems(about 20 subsystems). > > these subsystems use kmalloc() in the most condition, but may need > vmalloc() in some rare condition. so they use alternative malloc. > > So, since these subsystems' maintainer have good reasons for using vmalloc(), > they can use simple_malloc() too. simple_malloc() is not for common using. > (I should document when we use simple_malloc() in the code) > > simple_free() is useful. there are several subsystems which use a flags > for selecting kfree() or vfree(), and some subsystems recount the size hardy > before kfree() or vfree(). > Sure. Apart from the names of the functions, it's a good cleanup of existing code. It's just that we must *really* discourage the use of vmalloc :( Maybe we should call it i_am_a_hopeless_loser_alloc(). Sending the per-subsystem patches to the maintainers would be fun. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 5:35 ` Lai Jiangshan 2008-11-16 5:47 ` Andrew Morton @ 2008-11-16 5:53 ` Arjan van de Ven 2008-11-16 6:08 ` Eric Dumazet 2008-11-16 8:23 ` David Miller 1 sibling, 2 replies; 29+ messages in thread From: Arjan van de Ven @ 2008-11-16 5:53 UTC (permalink / raw) To: Lai Jiangshan Cc: Andrew Morton, Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen, Linux Kernel Mailing List On Sun, 16 Nov 2008 13:35:03 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > vmalloc() is not good for performance and increasing fragment. > but vmalloc() is need for some subsystems' alternative malloc, > like cgroup's tasks file and other subsystems(about 20 subsystems). actually what you are pointing out is that these areas need improvement to not need such huge blocks of memory... but only a series of smaller blocks instead. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 5:53 ` Arjan van de Ven @ 2008-11-16 6:08 ` Eric Dumazet 2008-11-16 8:23 ` David Miller 1 sibling, 0 replies; 29+ messages in thread From: Eric Dumazet @ 2008-11-16 6:08 UTC (permalink / raw) To: Arjan van de Ven Cc: Lai Jiangshan, Andrew Morton, Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen, Linux Kernel Mailing List Arjan van de Ven a écrit : > On Sun, 16 Nov 2008 13:35:03 +0800 > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > >> vmalloc() is not good for performance and increasing fragment. >> but vmalloc() is need for some subsystems' alternative malloc, >> like cgroup's tasks file and other subsystems(about 20 subsystems). > > actually what you are pointing out is that these areas need improvement > to not need such huge blocks of memory... but only a series of smaller > blocks instead. > Just zap vmalloc()/vfree() then ? More seriously, vmalloc()/vfree() were designed partly to make people life easier, not to be *the* premium interface to manage kernel memory Some parts of the kernel cannot afford the cost of vmalloc()/vfree(), so people must think and design complex algos. I personnaly like this cleanup too. For example bnx2 driver actually uses vmalloc() while a kmalloc() should be OK for bnx2_alloc_rx_mem() # grep bnx2 /proc/vmallocinfo 0xf8260000-0xf8274000 81920 bnx2_init_board+0x104/0xae0 phys=f6000000 ioremap 0xf8280000-0xf8294000 81920 bnx2_init_board+0x104/0xae0 phys=fa000000 ioremap 0xf82c1000-0xf82c3000 8192 bnx2_alloc_rx_mem+0x33/0x310 pages=1 vmalloc 0xf82d9000-0xf82db000 8192 bnx2_alloc_rx_mem+0x33/0x310 pages=1 vmalloc I have two comments : 1) Names should be not "simple" : That is misleading. "convenient" maybe, or "slow"... 2) Since vmalloc()/vfree() is potentially a very expensive operation, we should make slow_malloc()/slow_free() or whatever name is chosen, uninlined. No need to try to save 3 or 4 cpu cycles. This will save icache at least. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 5:53 ` Arjan van de Ven 2008-11-16 6:08 ` Eric Dumazet @ 2008-11-16 8:23 ` David Miller 1 sibling, 0 replies; 29+ messages in thread From: David Miller @ 2008-11-16 8:23 UTC (permalink / raw) To: arjan Cc: laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack, jes, linux-kernel From: Arjan van de Ven <arjan@infradead.org> Date: Sat, 15 Nov 2008 21:53:09 -0800 > On Sun, 16 Nov 2008 13:35:03 +0800 > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > vmalloc() is not good for performance and increasing fragment. > > but vmalloc() is need for some subsystems' alternative malloc, > > like cgroup's tasks file and other subsystems(about 20 subsystems). > > actually what you are pointing out is that these areas need improvement > to not need such huge blocks of memory... but only a series of smaller > blocks instead. Like the IPSEC policy and state hash tables when hundreds of thousands of rules become active? You have to be joking. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 5:03 ` Andrew Morton 2008-11-16 5:35 ` Lai Jiangshan @ 2008-11-16 8:21 ` David Miller 1 sibling, 0 replies; 29+ messages in thread From: David Miller @ 2008-11-16 8:21 UTC (permalink / raw) To: akpm Cc: arjan, laijs, menage, kamezawa.hiroyu, balbir, jens.axboe, jack, jes, linux-kernel From: Andrew Morton <akpm@linux-foundation.org> Date: Sat, 15 Nov 2008 21:03:34 -0800 > On Sat, 15 Nov 2008 20:52:29 -0800 Arjan van de Ven <arjan@infradead.org> wrote: > > > Realistically, people should try hard > > to use small datastructure instead.... > > Yup, it makes it easier for people to do something which we strongly > discourage. The risk got worse with all these 64-bit machines with > vast amounts of virtual address space. It makes it easier for people > to develop and "test" code which isn't reliable on smaller machines. I don't see it as being used for common operations (unlike vmap BTW, which XFS does like crazy). It's mainly for core hash tables and similarly large structures. And also, for those things, the NUMA spreading helps. And, in any event, for those cases: 1) we are already doing this exact thing 2) due to #1 we have N copies of the same damn code So whatever policy fits your fancy, wouldn't it be better to "enforce" this in one spot instead of the 12 copies we have already? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 4:52 ` Arjan van de Ven 2008-11-16 5:03 ` Andrew Morton @ 2008-11-16 8:19 ` David Miller 2008-11-16 18:57 ` Arjan van de Ven 1 sibling, 1 reply; 29+ messages in thread From: David Miller @ 2008-11-16 8:19 UTC (permalink / raw) To: arjan Cc: laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack, jes, linux-kernel From: Arjan van de Ven <arjan@infradead.org> Date: Sat, 15 Nov 2008 20:52:29 -0800 > On Sun, 16 Nov 2008 12:33:15 +0800 > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > > > some subsystem needs vmalloc() when required memory is large. > > but current kernel has not APIs for this requirement. > > this patch introduces simple_malloc() and simple_free(). > > I kinda really don't like this approach. vmalloc() (and especially, > vfree()) is a really expensive operation, and vmalloc()'d memory is > also slower (due to tlb pressure). Realistically, people should try hard > to use small datastructure instead.... This is happening in many places, already, for good reason. There are lots of places where we can't (core hash tables, etc.) and we want NUMA spreading and reliable allocation, and thus vmalloc it is. And with Nick Piggin's recent patches cost of the vmalloc allocation operation itself is no longer an issue. I encouraged people to write this patch because we _ALREADY DO THIS_ in many places and it's pure code duplication. And whatever particular heuristics you think are best for these cases, with a common piece of code you can enforce it in ONE place instead of several which is what is happening right now.. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 8:19 ` David Miller @ 2008-11-16 18:57 ` Arjan van de Ven 2008-11-16 21:39 ` Dave Airlie 0 siblings, 1 reply; 29+ messages in thread From: Arjan van de Ven @ 2008-11-16 18:57 UTC (permalink / raw) To: David Miller Cc: laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack, jes, linux-kernel On Sun, 16 Nov 2008 00:19:26 -0800 (PST) David Miller <davem@davemloft.net> wrote: > From: Arjan van de Ven <arjan@infradead.org> > Date: Sat, 15 Nov 2008 20:52:29 -0800 > > > On Sun, 16 Nov 2008 12:33:15 +0800 > > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > > > > > > some subsystem needs vmalloc() when required memory is large. > > > but current kernel has not APIs for this requirement. > > > this patch introduces simple_malloc() and simple_free(). > > > > I kinda really don't like this approach. vmalloc() (and especially, > > vfree()) is a really expensive operation, and vmalloc()'d memory is > > also slower (due to tlb pressure). Realistically, people should try > > hard to use small datastructure instead.... > > This is happening in many places, already, for good reason. > > There are lots of places where we can't (core hash tables, etc.) and > we want NUMA spreading and reliable allocation, and thus vmalloc it > is. vmalloc() isn't 100% evil; for truely long term stuff it's sometimes a quite reasonable solution. There are some issues with it still: the vmalloc() space is shared with ioremap, modules and others and it's not all that big on 32 bit; on x86 you could well end up with only 64Mb total (after taking out the various ioremap's etc). Yes there's places where it's then totally fine to dip into this space at boot/init time. You mention a few very good users. (There's still the tlb miss cost on use but on modern cpus a tlb miss is actually quite cheap) But this doesn't make vmalloc() the magic bullet that solves the "oh Linux can't allocate large chunks of memory" problem. Specifically in driver space for things that get ported from other OSes. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 18:57 ` Arjan van de Ven @ 2008-11-16 21:39 ` Dave Airlie 2008-11-16 21:51 ` Arjan van de Ven 2008-11-17 4:43 ` Balbir Singh 0 siblings, 2 replies; 29+ messages in thread From: Dave Airlie @ 2008-11-16 21:39 UTC (permalink / raw) To: Arjan van de Ven Cc: David Miller, laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack, jes, linux-kernel On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven <arjan@infradead.org> wrote: > On Sun, 16 Nov 2008 00:19:26 -0800 (PST) > David Miller <davem@davemloft.net> wrote: > >> From: Arjan van de Ven <arjan@infradead.org> >> Date: Sat, 15 Nov 2008 20:52:29 -0800 >> >> > On Sun, 16 Nov 2008 12:33:15 +0800 >> > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: >> > >> > > >> > > some subsystem needs vmalloc() when required memory is large. >> > > but current kernel has not APIs for this requirement. >> > > this patch introduces simple_malloc() and simple_free(). >> > >> > I kinda really don't like this approach. vmalloc() (and especially, >> > vfree()) is a really expensive operation, and vmalloc()'d memory is >> > also slower (due to tlb pressure). Realistically, people should try >> > hard to use small datastructure instead.... >> >> This is happening in many places, already, for good reason. >> >> There are lots of places where we can't (core hash tables, etc.) and >> we want NUMA spreading and reliable allocation, and thus vmalloc it >> is. > > vmalloc() isn't 100% evil; for truely long term stuff it's sometimes a > quite reasonable solution. > > There are some issues with it still: the vmalloc() space is shared > with ioremap, modules and others and it's not all that big on 32 bit; on > x86 you could well end up with only 64Mb total (after taking out the > various ioremap's etc). > > Yes there's places where it's then totally fine to dip into this space > at boot/init time. You mention a few very good users. > (There's still the tlb miss cost on use but on modern cpus a tlb miss > is actually quite cheap) > > But this doesn't make vmalloc() the magic bullet that solves the "oh > Linux can't allocate large chunks of memory" problem. Specifically in > driver space for things that get ported from other OSes. So we keep the duplicated code? or we just audit new callers.... I think this patch makes it easier to spot new callers doing something stupid. As davem said we duplicate this code all over the place, so for that reason along a simple wrapper makes things a lot easier, and also possibly a lot easier to change in the future to a new non-sucky API. So I'm all for it maybe with a non simple name. Dave. > > -- > Arjan van de Ven Intel Open Source Technology Centre > For development, discussion and tips for power savings, > visit http://www.lesswatts.org > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 21:39 ` Dave Airlie @ 2008-11-16 21:51 ` Arjan van de Ven 2008-11-16 22:42 ` Dave Airlie ` (2 more replies) 2008-11-17 4:43 ` Balbir Singh 1 sibling, 3 replies; 29+ messages in thread From: Arjan van de Ven @ 2008-11-16 21:51 UTC (permalink / raw) To: Dave Airlie Cc: David Miller, laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack, jes, linux-kernel On Mon, 17 Nov 2008 07:39:55 +1000 "Dave Airlie" <airlied@gmail.com> wrote: > On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven > <arjan@infradead.org> wrote: > > On Sun, 16 Nov 2008 00:19:26 -0800 (PST) > > David Miller <davem@davemloft.net> wrote: > > > >> From: Arjan van de Ven <arjan@infradead.org> > >> Date: Sat, 15 Nov 2008 20:52:29 -0800 > >> > >> > On Sun, 16 Nov 2008 12:33:15 +0800 > >> > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > >> > > >> > > > >> > > some subsystem needs vmalloc() when required memory is large. > >> > > but current kernel has not APIs for this requirement. > >> > > this patch introduces simple_malloc() and simple_free(). > >> > > >> > I kinda really don't like this approach. vmalloc() (and > >> > especially, vfree()) is a really expensive operation, and > >> > vmalloc()'d memory is also slower (due to tlb pressure). > >> > Realistically, people should try hard to use small datastructure > >> > instead.... > >> > >> This is happening in many places, already, for good reason. > >> > >> There are lots of places where we can't (core hash tables, etc.) > >> and we want NUMA spreading and reliable allocation, and thus > >> vmalloc it is. > > > > vmalloc() isn't 100% evil; for truely long term stuff it's > > sometimes a quite reasonable solution. > > > > There are some issues with it still: the vmalloc() space is shared > > with ioremap, modules and others and it's not all that big on 32 > > bit; on x86 you could well end up with only 64Mb total (after > > taking out the various ioremap's etc). > > > > Yes there's places where it's then totally fine to dip into this > > space at boot/init time. You mention a few very good users. > > (There's still the tlb miss cost on use but on modern cpus a tlb > > miss is actually quite cheap) > > > > But this doesn't make vmalloc() the magic bullet that solves the "oh > > Linux can't allocate large chunks of memory" problem. Specifically > > in driver space for things that get ported from other OSes. > > So we keep the duplicated code? or we just audit new callers.... I > think this patch > makes it easier to spot new callers doing something stupid. As davem > said we duplicate > this code all over the place, so for that reason along a simple > wrapper makes things a lot > easier, and also possibly a lot easier to change in the future to a > new non-sucky API. > > So I'm all for it maybe with a non simple name. > I would go further than this. Make the code just use vmalloc(). Period. But then make vmalloc() smart and try do a direct mapping allocation first, before falling back to a virtual mapping. (and based on size it wouldn't even try it for just big things) -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 21:51 ` Arjan van de Ven @ 2008-11-16 22:42 ` Dave Airlie 2008-11-17 2:08 ` Lai Jiangshan 2008-11-17 4:46 ` Balbir Singh 2 siblings, 0 replies; 29+ messages in thread From: Dave Airlie @ 2008-11-16 22:42 UTC (permalink / raw) To: Arjan van de Ven Cc: David Miller, laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack, jes, linux-kernel On Mon, Nov 17, 2008 at 7:51 AM, Arjan van de Ven <arjan@infradead.org> wrote: > On Mon, 17 Nov 2008 07:39:55 +1000 > "Dave Airlie" <airlied@gmail.com> wrote: > >> On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven >> <arjan@infradead.org> wrote: >> > On Sun, 16 Nov 2008 00:19:26 -0800 (PST) >> > David Miller <davem@davemloft.net> wrote: >> > >> >> From: Arjan van de Ven <arjan@infradead.org> >> >> Date: Sat, 15 Nov 2008 20:52:29 -0800 >> >> >> >> > On Sun, 16 Nov 2008 12:33:15 +0800 >> >> > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: >> >> > >> >> > > >> >> > > some subsystem needs vmalloc() when required memory is large. >> >> > > but current kernel has not APIs for this requirement. >> >> > > this patch introduces simple_malloc() and simple_free(). >> >> > >> >> > I kinda really don't like this approach. vmalloc() (and >> >> > especially, vfree()) is a really expensive operation, and >> >> > vmalloc()'d memory is also slower (due to tlb pressure). >> >> > Realistically, people should try hard to use small datastructure >> >> > instead.... >> >> >> >> This is happening in many places, already, for good reason. >> >> >> >> There are lots of places where we can't (core hash tables, etc.) >> >> and we want NUMA spreading and reliable allocation, and thus >> >> vmalloc it is. >> > >> > vmalloc() isn't 100% evil; for truely long term stuff it's >> > sometimes a quite reasonable solution. >> > >> > There are some issues with it still: the vmalloc() space is shared >> > with ioremap, modules and others and it's not all that big on 32 >> > bit; on x86 you could well end up with only 64Mb total (after >> > taking out the various ioremap's etc). >> > >> > Yes there's places where it's then totally fine to dip into this >> > space at boot/init time. You mention a few very good users. >> > (There's still the tlb miss cost on use but on modern cpus a tlb >> > miss is actually quite cheap) >> > >> > But this doesn't make vmalloc() the magic bullet that solves the "oh >> > Linux can't allocate large chunks of memory" problem. Specifically >> > in driver space for things that get ported from other OSes. >> >> So we keep the duplicated code? or we just audit new callers.... I >> think this patch >> makes it easier to spot new callers doing something stupid. As davem >> said we duplicate >> this code all over the place, so for that reason along a simple >> wrapper makes things a lot >> easier, and also possibly a lot easier to change in the future to a >> new non-sucky API. >> >> So I'm all for it maybe with a non simple name. >> > > I would go further than this. > > Make the code just use vmalloc(). Period. > > But then make vmalloc() smart and try do a direct mapping allocation > first, before falling back to a virtual mapping. (and based on size it > wouldn't even try it for just big things I seem to remember Christoph Lameter having something along these lines before, did vcompound do some of this type of thing... Dave. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 21:51 ` Arjan van de Ven 2008-11-16 22:42 ` Dave Airlie @ 2008-11-17 2:08 ` Lai Jiangshan 2008-11-17 4:53 ` Balbir Singh 2008-11-17 4:46 ` Balbir Singh 2 siblings, 1 reply; 29+ messages in thread From: Lai Jiangshan @ 2008-11-17 2:08 UTC (permalink / raw) To: Arjan van de Ven Cc: Dave Airlie, David Miller, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack, jes, linux-kernel, KOSAKI Motohiro, dada1, Alexey Dobriyan Arjan van de Ven wrote: > On Mon, 17 Nov 2008 07:39:55 +1000 > > I would go further than this. > > Make the code just use vmalloc(). Period. > > But then make vmalloc() smart and try do a direct mapping allocation > first, before falling back to a virtual mapping. (and based on size it > wouldn't even try it for just big things) > > > Hi, Arjan van de Ven (I'll rename simple_malloc/simple_free to kvmalloc/kvfree) I think vmalloc() should only do one thing(virtual mapping). Your idea can be implemented in helper function kvmalloc() if there is a good algorithm provided for it. kvmalloc() is for cleanup mostly. It will remove existed duplicate code. As David and I pointed out, vmalloc() is need for some good reason. If we do not introduce a helper function, these duplicate code are still spread everywhere. kvfree() is not only for free the memory allocated by kvmalloc(). kvfree() frees the memory that we don't know whether it was allocated by kmalloc() nor vmalloc(). Someone use a flag for it, and other guys calculate size of memory again before kfree() or vfree(). these two ways increase complexity, and it is hard to prove the re-calculated size is reliable. kvfree() will remove these needless complexity! Thanx, Lai ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-17 2:08 ` Lai Jiangshan @ 2008-11-17 4:53 ` Balbir Singh 2008-11-17 5:25 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 29+ messages in thread From: Balbir Singh @ 2008-11-17 4:53 UTC (permalink / raw) To: Lai Jiangshan Cc: Arjan van de Ven, Dave Airlie, David Miller, akpm, menage, kamezawa.hiroyu, jens.axboe, jack, jes, linux-kernel, KOSAKI Motohiro, dada1, Alexey Dobriyan Lai Jiangshan wrote: > Arjan van de Ven wrote: >> On Mon, 17 Nov 2008 07:39:55 +1000 >> >> I would go further than this. >> >> Make the code just use vmalloc(). Period. >> >> But then make vmalloc() smart and try do a direct mapping allocation >> first, before falling back to a virtual mapping. (and based on size it >> wouldn't even try it for just big things) >> >> >> > > Hi, Arjan van de Ven > > (I'll rename simple_malloc/simple_free to kvmalloc/kvfree) > I would prefer to find a way to say that one cannot select gfp_mask with this API. -- Balbir ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-17 4:53 ` Balbir Singh @ 2008-11-17 5:25 ` KAMEZAWA Hiroyuki 2008-11-17 6:43 ` KOSAKI Motohiro 0 siblings, 1 reply; 29+ messages in thread From: KAMEZAWA Hiroyuki @ 2008-11-17 5:25 UTC (permalink / raw) To: balbir Cc: Lai Jiangshan, Arjan van de Ven, Dave Airlie, David Miller, akpm, menage, jens.axboe, jack, jes, linux-kernel, KOSAKI Motohiro, dada1, Alexey Dobriyan On Mon, 17 Nov 2008 10:23:24 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > Lai Jiangshan wrote: > > Arjan van de Ven wrote: > >> On Mon, 17 Nov 2008 07:39:55 +1000 > >> > >> I would go further than this. > >> > >> Make the code just use vmalloc(). Period. > >> > >> But then make vmalloc() smart and try do a direct mapping allocation > >> first, before falling back to a virtual mapping. (and based on size it > >> wouldn't even try it for just big things) > >> > >> > >> > > > > Hi, Arjan van de Ven > > > > (I'll rename simple_malloc/simple_free to kvmalloc/kvfree) > > > > I would prefer to find a way to say that one cannot select gfp_mask with this API. > I think gfp_mask must be passed explicitly. Thanks, -Kame ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-17 5:25 ` KAMEZAWA Hiroyuki @ 2008-11-17 6:43 ` KOSAKI Motohiro 2008-11-17 7:13 ` Andrew Morton 0 siblings, 1 reply; 29+ messages in thread From: KOSAKI Motohiro @ 2008-11-17 6:43 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: balbir, Lai Jiangshan, Arjan van de Ven, Dave Airlie, David Miller, akpm, menage, jens.axboe, jack, jes, linux-kernel, dada1, Alexey Dobriyan >> > (I'll rename simple_malloc/simple_free to kvmalloc/kvfree) >> > >> >> I would prefer to find a way to say that one cannot select gfp_mask with this API. >> > I think gfp_mask must be passed explicitly. Agreed. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-17 6:43 ` KOSAKI Motohiro @ 2008-11-17 7:13 ` Andrew Morton 2008-11-17 7:15 ` David Miller 2008-11-18 4:39 ` Nick Piggin 0 siblings, 2 replies; 29+ messages in thread From: Andrew Morton @ 2008-11-17 7:13 UTC (permalink / raw) To: KOSAKI Motohiro Cc: KAMEZAWA Hiroyuki, balbir, Lai Jiangshan, Arjan van de Ven, Dave Airlie, David Miller, menage, jens.axboe, jack, jes, linux-kernel, dada1, Alexey Dobriyan On Mon, 17 Nov 2008 15:43:59 +0900 "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> wrote: > >> > (I'll rename simple_malloc/simple_free to kvmalloc/kvfree) > >> > > >> > >> I would prefer to find a way to say that one cannot select gfp_mask with this API. > >> > > I think gfp_mask must be passed explicitly. > > Agreed. It would only make sense if __vmalloc() can be called in atomic contexts. __vmalloc() cannot be called from irq contexts due to it taking non-irq-safe spinlocks. __vmalloc() kinda looks like it could be called from non-irq atomic contexts with GFP_ATOMIC, but I think it lies. For example, pud_alloc_one/pmd_alloc_one/etc use hard-wired GFP_KERNEL. In which case this new allocation function can only be called from contexts where GFP_KERNEL can be used, hence we don't need to pass that in - it would be misleading to do so. In fact it's not immediately clear why __vmalloc() takes a gfp_t argument either? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-17 7:13 ` Andrew Morton @ 2008-11-17 7:15 ` David Miller 2008-11-17 8:10 ` KOSAKI Motohiro 2008-11-17 8:24 ` Balbir Singh 2008-11-18 4:39 ` Nick Piggin 1 sibling, 2 replies; 29+ messages in thread From: David Miller @ 2008-11-17 7:15 UTC (permalink / raw) To: akpm Cc: kosaki.motohiro, kamezawa.hiroyu, balbir, laijs, arjan, airlied, menage, jens.axboe, jack, jes, linux-kernel, dada1, adobriyan From: Andrew Morton <akpm@linux-foundation.org> Date: Sun, 16 Nov 2008 23:13:01 -0800 > In fact it's not immediately clear why __vmalloc() takes a gfp_t > argument either? Probably for things like GFP_DMA32, GFP_HIGHMEM, et al. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-17 7:15 ` David Miller @ 2008-11-17 8:10 ` KOSAKI Motohiro 2008-11-17 8:24 ` Balbir Singh 1 sibling, 0 replies; 29+ messages in thread From: KOSAKI Motohiro @ 2008-11-17 8:10 UTC (permalink / raw) To: David Miller Cc: akpm, kamezawa.hiroyu, balbir, laijs, arjan, airlied, menage, jens.axboe, jack, jes, linux-kernel, dada1, adobriyan 2008/11/17 David Miller <davem@davemloft.net>: > From: Andrew Morton <akpm@linux-foundation.org> > Date: Sun, 16 Nov 2008 23:13:01 -0800 > >> In fact it's not immediately clear why __vmalloc() takes a gfp_t >> argument either? > > Probably for things like GFP_DMA32, GFP_HIGHMEM, et al. Thanks. I think to __GFP_ZERO or __GFP_COLD etc.. but that also is right, imho. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-17 7:15 ` David Miller 2008-11-17 8:10 ` KOSAKI Motohiro @ 2008-11-17 8:24 ` Balbir Singh 1 sibling, 0 replies; 29+ messages in thread From: Balbir Singh @ 2008-11-17 8:24 UTC (permalink / raw) To: David Miller Cc: akpm, kosaki.motohiro, kamezawa.hiroyu, laijs, arjan, airlied, menage, jens.axboe, jack, jes, linux-kernel, dada1, adobriyan David Miller wrote: > From: Andrew Morton <akpm@linux-foundation.org> > Date: Sun, 16 Nov 2008 23:13:01 -0800 > >> In fact it's not immediately clear why __vmalloc() takes a gfp_t >> argument either? > > Probably for things like GFP_DMA32, GFP_HIGHMEM, et al. > vmalloc() hides away GFP_DMA32 and hard codes gfp_mask to GFP_KERNEL | __GFP_HIGHMEM. __vmalloc() lies like Andrew mentioned. For the use cases mentioned in this thread, we don't really care about GFP_DMA32 (or do we?). I would prefer to avoid passing the gfp_mask and call the API something like blocking_vkmalloc() and blocking_vkzalloc() or something better. -- Balbir ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-17 7:13 ` Andrew Morton 2008-11-17 7:15 ` David Miller @ 2008-11-18 4:39 ` Nick Piggin 2008-11-18 5:16 ` Lai Jiangshan 1 sibling, 1 reply; 29+ messages in thread From: Nick Piggin @ 2008-11-18 4:39 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, KAMEZAWA Hiroyuki, balbir, Lai Jiangshan, Arjan van de Ven, Dave Airlie, David Miller, menage, jens.axboe, jack, jes, linux-kernel, dada1, Alexey Dobriyan On Monday 17 November 2008 18:13, Andrew Morton wrote: > On Mon, 17 Nov 2008 15:43:59 +0900 "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> wrote: > > >> > (I'll rename simple_malloc/simple_free to kvmalloc/kvfree) > > >> > > >> I would prefer to find a way to say that one cannot select gfp_mask > > >> with this API. > > > > > > I think gfp_mask must be passed explicitly. > > > > Agreed. > > It would only make sense if __vmalloc() can be called in atomic contexts. > > __vmalloc() cannot be called from irq contexts due to it taking > non-irq-safe spinlocks. > > __vmalloc() kinda looks like it could be called from non-irq atomic > contexts with GFP_ATOMIC, but I think it lies. For example, > pud_alloc_one/pmd_alloc_one/etc use hard-wired GFP_KERNEL. vmalloc/vfree / vmap/vunmap I think could now be made to be usable even in irq context, I think. Freeing up vmalloc space with global tlb flush can't be done from interrupt context, but now with the lazy unmapping, you only have to mark the area as freed (and possibly kick off a thread to do the actual unmapping). I didn't actually add that, because yes it would increase overheads a bit, and I would still prefer to wait for a real nice problem it solves before adding such a capability... > In which case this new allocation function can only be called from > contexts where GFP_KERNEL can be used, hence we don't need to pass that > in - it would be misleading to do so. > > In fact it's not immediately clear why __vmalloc() takes a gfp_t > argument either? Possibly a bugcheck for !GFP_WAIT || !GFP_FS || !GFP_IO, or a might_sleep() or something would be a good idea to add... ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-18 4:39 ` Nick Piggin @ 2008-11-18 5:16 ` Lai Jiangshan 0 siblings, 0 replies; 29+ messages in thread From: Lai Jiangshan @ 2008-11-18 5:16 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, KOSAKI Motohiro, KAMEZAWA Hiroyuki, balbir, Arjan van de Ven, Dave Airlie, David Miller, menage, jens.axboe, jack, jes, linux-kernel, dada1, Alexey Dobriyan Nick Piggin wrote: > On Monday 17 November 2008 18:13, Andrew Morton wrote: >> On Mon, 17 Nov 2008 15:43:59 +0900 "KOSAKI Motohiro" > <kosaki.motohiro@jp.fujitsu.com> wrote: >>>>>> (I'll rename simple_malloc/simple_free to kvmalloc/kvfree) >>>>> I would prefer to find a way to say that one cannot select gfp_mask >>>>> with this API. >>>> I think gfp_mask must be passed explicitly. >>> Agreed. >> It would only make sense if __vmalloc() can be called in atomic contexts. >> >> __vmalloc() cannot be called from irq contexts due to it taking >> non-irq-safe spinlocks. >> >> __vmalloc() kinda looks like it could be called from non-irq atomic >> contexts with GFP_ATOMIC, but I think it lies. For example, >> pud_alloc_one/pmd_alloc_one/etc use hard-wired GFP_KERNEL. > > vmalloc/vfree / vmap/vunmap I think could now be made to be usable even > in irq context, I think. Freeing up vmalloc space with global tlb flush > can't be done from interrupt context, but now with the lazy unmapping, > you only have to mark the area as freed (and possibly kick off a thread > to do the actual unmapping). > > I didn't actually add that, because yes it would increase overheads a > bit, and I would still prefer to wait for a real nice problem it solves > before adding such a capability... > > >> In which case this new allocation function can only be called from >> contexts where GFP_KERNEL can be used, hence we don't need to pass that >> in - it would be misleading to do so. >> >> In fact it's not immediately clear why __vmalloc() takes a gfp_t >> argument either? > > Possibly a bugcheck for !GFP_WAIT || !GFP_FS || !GFP_IO, or a might_sleep() > or something would be a good idea to add... > > > Thanks. Now, new patch for it is: http://lkml.org/lkml/2008/11/17/137 Lai. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 21:51 ` Arjan van de Ven 2008-11-16 22:42 ` Dave Airlie 2008-11-17 2:08 ` Lai Jiangshan @ 2008-11-17 4:46 ` Balbir Singh 2 siblings, 0 replies; 29+ messages in thread From: Balbir Singh @ 2008-11-17 4:46 UTC (permalink / raw) To: Arjan van de Ven Cc: Dave Airlie, David Miller, laijs, akpm, menage, kamezawa.hiroyu, jens.axboe, jack, jes, linux-kernel Arjan van de Ven wrote: > On Mon, 17 Nov 2008 07:39:55 +1000 > "Dave Airlie" <airlied@gmail.com> wrote: > >> On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven >> <arjan@infradead.org> wrote: >>> On Sun, 16 Nov 2008 00:19:26 -0800 (PST) >>> David Miller <davem@davemloft.net> wrote: >>> >>>> From: Arjan van de Ven <arjan@infradead.org> >>>> Date: Sat, 15 Nov 2008 20:52:29 -0800 >>>> >>>>> On Sun, 16 Nov 2008 12:33:15 +0800 >>>>> Lai Jiangshan <laijs@cn.fujitsu.com> wrote: >>>>> >>>>>> some subsystem needs vmalloc() when required memory is large. >>>>>> but current kernel has not APIs for this requirement. >>>>>> this patch introduces simple_malloc() and simple_free(). >>>>> I kinda really don't like this approach. vmalloc() (and >>>>> especially, vfree()) is a really expensive operation, and >>>>> vmalloc()'d memory is also slower (due to tlb pressure). >>>>> Realistically, people should try hard to use small datastructure >>>>> instead.... >>>> This is happening in many places, already, for good reason. >>>> >>>> There are lots of places where we can't (core hash tables, etc.) >>>> and we want NUMA spreading and reliable allocation, and thus >>>> vmalloc it is. >>> vmalloc() isn't 100% evil; for truely long term stuff it's >>> sometimes a quite reasonable solution. >>> >>> There are some issues with it still: the vmalloc() space is shared >>> with ioremap, modules and others and it's not all that big on 32 >>> bit; on x86 you could well end up with only 64Mb total (after >>> taking out the various ioremap's etc). >>> >>> Yes there's places where it's then totally fine to dip into this >>> space at boot/init time. You mention a few very good users. >>> (There's still the tlb miss cost on use but on modern cpus a tlb >>> miss is actually quite cheap) >>> >>> But this doesn't make vmalloc() the magic bullet that solves the "oh >>> Linux can't allocate large chunks of memory" problem. Specifically >>> in driver space for things that get ported from other OSes. >> So we keep the duplicated code? or we just audit new callers.... I >> think this patch >> makes it easier to spot new callers doing something stupid. As davem >> said we duplicate >> this code all over the place, so for that reason along a simple >> wrapper makes things a lot >> easier, and also possibly a lot easier to change in the future to a >> new non-sucky API. >> >> So I'm all for it maybe with a non simple name. >> > > I would go further than this. > > Make the code just use vmalloc(). Period. > But vmalloc() is always chunks of pages, not always desirable. > But then make vmalloc() smart and try do a direct mapping allocation > first, before falling back to a virtual mapping. (and based on size it > wouldn't even try it for just big things) If only slab/slub could do vmalloc() based caches, but vmalloc() is not the common case worth optimizing for. -- Balbir ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free() 2008-11-16 21:39 ` Dave Airlie 2008-11-16 21:51 ` Arjan van de Ven @ 2008-11-17 4:43 ` Balbir Singh 1 sibling, 0 replies; 29+ messages in thread From: Balbir Singh @ 2008-11-17 4:43 UTC (permalink / raw) To: Dave Airlie Cc: Arjan van de Ven, David Miller, laijs, akpm, menage, kamezawa.hiroyu, jens.axboe, jack, jes, linux-kernel Dave Airlie wrote: > On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven <arjan@infradead.org> wrote: >> On Sun, 16 Nov 2008 00:19:26 -0800 (PST) >> David Miller <davem@davemloft.net> wrote: >> >>> From: Arjan van de Ven <arjan@infradead.org> >>> Date: Sat, 15 Nov 2008 20:52:29 -0800 >>> >>>> On Sun, 16 Nov 2008 12:33:15 +0800 >>>> Lai Jiangshan <laijs@cn.fujitsu.com> wrote: >>>> >>>>> some subsystem needs vmalloc() when required memory is large. >>>>> but current kernel has not APIs for this requirement. >>>>> this patch introduces simple_malloc() and simple_free(). >>>> I kinda really don't like this approach. vmalloc() (and especially, >>>> vfree()) is a really expensive operation, and vmalloc()'d memory is >>>> also slower (due to tlb pressure). Realistically, people should try >>>> hard to use small datastructure instead.... >>> This is happening in many places, already, for good reason. >>> >>> There are lots of places where we can't (core hash tables, etc.) and >>> we want NUMA spreading and reliable allocation, and thus vmalloc it >>> is. >> vmalloc() isn't 100% evil; for truely long term stuff it's sometimes a >> quite reasonable solution. >> >> There are some issues with it still: the vmalloc() space is shared >> with ioremap, modules and others and it's not all that big on 32 bit; on >> x86 you could well end up with only 64Mb total (after taking out the >> various ioremap's etc). >> >> Yes there's places where it's then totally fine to dip into this space >> at boot/init time. You mention a few very good users. >> (There's still the tlb miss cost on use but on modern cpus a tlb miss >> is actually quite cheap) >> >> But this doesn't make vmalloc() the magic bullet that solves the "oh >> Linux can't allocate large chunks of memory" problem. Specifically in >> driver space for things that get ported from other OSes. > > So we keep the duplicated code? or we just audit new callers.... I > think this patch > makes it easier to spot new callers doing something stupid. As davem > said we duplicate > this code all over the place, so for that reason along a simple > wrapper makes things a lot > easier, and also possibly a lot easier to change in the future to a > new non-sucky API. > > So I'm all for it maybe with a non simple name. Yes, tracking vmalloc() spillage all over the place is harder than using one abstraction and fixing that appropriately if needed (specially for 32 bit systems). -- Balbir ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2008-11-18 5:20 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-11-16 4:33 [PATCH 1/7] mm: introduce simple_malloc()/simple_free() Lai Jiangshan 2008-11-16 4:49 ` Alexey Dobriyan 2008-11-16 8:14 ` David Miller 2008-11-16 18:42 ` KOSAKI Motohiro 2008-11-16 4:52 ` Arjan van de Ven 2008-11-16 5:03 ` Andrew Morton 2008-11-16 5:35 ` Lai Jiangshan 2008-11-16 5:47 ` Andrew Morton 2008-11-16 5:53 ` Arjan van de Ven 2008-11-16 6:08 ` Eric Dumazet 2008-11-16 8:23 ` David Miller 2008-11-16 8:21 ` David Miller 2008-11-16 8:19 ` David Miller 2008-11-16 18:57 ` Arjan van de Ven 2008-11-16 21:39 ` Dave Airlie 2008-11-16 21:51 ` Arjan van de Ven 2008-11-16 22:42 ` Dave Airlie 2008-11-17 2:08 ` Lai Jiangshan 2008-11-17 4:53 ` Balbir Singh 2008-11-17 5:25 ` KAMEZAWA Hiroyuki 2008-11-17 6:43 ` KOSAKI Motohiro 2008-11-17 7:13 ` Andrew Morton 2008-11-17 7:15 ` David Miller 2008-11-17 8:10 ` KOSAKI Motohiro 2008-11-17 8:24 ` Balbir Singh 2008-11-18 4:39 ` Nick Piggin 2008-11-18 5:16 ` Lai Jiangshan 2008-11-17 4:46 ` Balbir Singh 2008-11-17 4:43 ` Balbir Singh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox