public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
@ 2008-11-16  4:33 Lai Jiangshan
  2008-11-16  4:49 ` Alexey Dobriyan
  2008-11-16  4:52 ` Arjan van de Ven
  0 siblings, 2 replies; 29+ messages in thread
From: Lai Jiangshan @ 2008-11-16  4:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Paul Menage, kamezawa.hiroyu, Balbir Singh, Jens Axboe,
	David S. Miller, Jan Kara, Jes Sorensen,
	Linux Kernel Mailing List


some subsystem needs vmalloc() when required memory is large.
but current kernel has not APIs for this requirement.
this patch introduces simple_malloc() and simple_free().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ffee2f7..e9c11f7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -13,6 +13,7 @@
 #include <linux/prio_tree.h>
 #include <linux/debug_locks.h>
 #include <linux/mm_types.h>
+#include <linux/vmalloc.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -278,6 +279,36 @@ static inline int is_vmalloc_addr(const void *x)
 #endif
 }
 
+static inline void *__simple_malloc(unsigned long size, int pages_hint)
+{
+	if (size <= PAGE_SIZE * pages_hint)
+		return kmalloc(size, GFP_KERNEL);
+	else
+		return vmalloc(size);
+}
+
+/**
+ * simple_malloc - allocate memory by kmalloc() or vmalloc()
+ *
+ * if @size <= PAGE_SIZE, memory is allocated by kmalloc(),
+ * otherwise by vmalloc()
+ */
+static inline void *simple_malloc(unsigned long size)
+{
+	return __simple_malloc(size, 1);
+}
+
+/**
+ * simple_free - free the memory by kfree(), or vfree() if it is vmalloc addr
+ */
+static inline void simple_free(void *ptr)
+{
+	if (is_vmalloc_addr(ptr))
+		vfree(ptr);
+	else
+		kfree(ptr);
+}
+
 static inline struct page *compound_head(struct page *page)
 {
 	if (unlikely(PageTail(page)))



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  4:33 [PATCH 1/7] mm: introduce simple_malloc()/simple_free() Lai Jiangshan
@ 2008-11-16  4:49 ` Alexey Dobriyan
  2008-11-16  8:14   ` David Miller
  2008-11-16 18:42   ` KOSAKI Motohiro
  2008-11-16  4:52 ` Arjan van de Ven
  1 sibling, 2 replies; 29+ messages in thread
From: Alexey Dobriyan @ 2008-11-16  4:49 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Andrew Morton, Paul Menage, kamezawa.hiroyu, Balbir Singh,
	Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen,
	Linux Kernel Mailing List

On Sun, Nov 16, 2008 at 12:33:15PM +0800, Lai Jiangshan wrote:
> some subsystem needs vmalloc() when required memory is large.
> but current kernel has not APIs for this requirement.

It doesn't mean there should be such an API.

> this patch introduces simple_malloc() and simple_free().

Grossly misnamed. It should be kvmalloc/kvfree, at least.

> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -278,6 +279,36 @@ static inline int is_vmalloc_addr(const void *x)
>  #endif
>  }
>  
> +static inline void *__simple_malloc(unsigned long size, int pages_hint)
> +{
> +	if (size <= PAGE_SIZE * pages_hint)
> +		return kmalloc(size, GFP_KERNEL);
> +	else
> +		return vmalloc(size);
> +}
> +
> +/**
> + * simple_malloc - allocate memory by kmalloc() or vmalloc()
> + *
> + * if @size <= PAGE_SIZE, memory is allocated by kmalloc(),
> + * otherwise by vmalloc()
> + */
> +static inline void *simple_malloc(unsigned long size)
> +{
> +	return __simple_malloc(size, 1);
> +}
> +
> +/**
> + * simple_free - free the memory by kfree(), or vfree() if it is vmalloc addr
> + */
> +static inline void simple_free(void *ptr)
> +{
> +	if (is_vmalloc_addr(ptr))
> +		vfree(ptr);
> +	else
> +		kfree(ptr);
> +}

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  4:33 [PATCH 1/7] mm: introduce simple_malloc()/simple_free() Lai Jiangshan
  2008-11-16  4:49 ` Alexey Dobriyan
@ 2008-11-16  4:52 ` Arjan van de Ven
  2008-11-16  5:03   ` Andrew Morton
  2008-11-16  8:19   ` David Miller
  1 sibling, 2 replies; 29+ messages in thread
From: Arjan van de Ven @ 2008-11-16  4:52 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Andrew Morton, Paul Menage, kamezawa.hiroyu, Balbir Singh,
	Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen,
	Linux Kernel Mailing List

On Sun, 16 Nov 2008 12:33:15 +0800
Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> 
> some subsystem needs vmalloc() when required memory is large.
> but current kernel has not APIs for this requirement.
> this patch introduces simple_malloc() and simple_free().

Hi

I kinda really don't like this approach. vmalloc() (and especially,
vfree()) is a really expensive operation, and vmalloc()'d memory is
also slower (due to tlb pressure). Realistically, people should try hard
to use small datastructure instead....


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  4:52 ` Arjan van de Ven
@ 2008-11-16  5:03   ` Andrew Morton
  2008-11-16  5:35     ` Lai Jiangshan
  2008-11-16  8:21     ` David Miller
  2008-11-16  8:19   ` David Miller
  1 sibling, 2 replies; 29+ messages in thread
From: Andrew Morton @ 2008-11-16  5:03 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Lai Jiangshan, Paul Menage, kamezawa.hiroyu, Balbir Singh,
	Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen,
	Linux Kernel Mailing List

On Sat, 15 Nov 2008 20:52:29 -0800 Arjan van de Ven <arjan@infradead.org> wrote:

> On Sun, 16 Nov 2008 12:33:15 +0800
> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
> > 
> > some subsystem needs vmalloc() when required memory is large.
> > but current kernel has not APIs for this requirement.
> > this patch introduces simple_malloc() and simple_free().
> 
> Hi
> 
> I kinda really don't like this approach. vmalloc() (and especially,
> vfree()) is a really expensive operation, and vmalloc()'d memory is
> also slower (due to tlb pressure).

And it can fragment, which effectively means a dead box.

> Realistically, people should try hard
> to use small datastructure instead....

Yup, it makes it easier for people to do something which we strongly
discourage.  The risk got worse with all these 64-bit machines with
vast amounts of virtual address space.  It makes it easier for people
to develop and "test" code which isn't reliable on smaller machines.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  5:03   ` Andrew Morton
@ 2008-11-16  5:35     ` Lai Jiangshan
  2008-11-16  5:47       ` Andrew Morton
  2008-11-16  5:53       ` Arjan van de Ven
  2008-11-16  8:21     ` David Miller
  1 sibling, 2 replies; 29+ messages in thread
From: Lai Jiangshan @ 2008-11-16  5:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arjan van de Ven, Paul Menage, kamezawa.hiroyu, Balbir Singh,
	Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen,
	Linux Kernel Mailing List

Andrew Morton wrote:
> On Sat, 15 Nov 2008 20:52:29 -0800 Arjan van de Ven <arjan@infradead.org> wrote:
> 
>> On Sun, 16 Nov 2008 12:33:15 +0800
>> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>
>>> some subsystem needs vmalloc() when required memory is large.
>>> but current kernel has not APIs for this requirement.
>>> this patch introduces simple_malloc() and simple_free().
>> Hi
>>
>> I kinda really don't like this approach. vmalloc() (and especially,
>> vfree()) is a really expensive operation, and vmalloc()'d memory is
>> also slower (due to tlb pressure).
> 
> And it can fragment, which effectively means a dead box.
> 
>> Realistically, people should try hard
>> to use small datastructure instead....
> 
> Yup, it makes it easier for people to do something which we strongly
> discourage.  The risk got worse with all these 64-bit machines with
> vast amounts of virtual address space.  It makes it easier for people
> to develop and "test" code which isn't reliable on smaller machines.
> 
>

vmalloc() is not good for performance and increasing fragment.
but vmalloc() is need for some subsystems' alternative malloc,
like cgroup's tasks file and other subsystems(about 20 subsystems).

these subsystems use kmalloc() in the most condition, but may need
vmalloc() in some rare condition. so they use alternative malloc.

So, since these subsystems' maintainer have good reasons for using vmalloc(),
they can use simple_malloc() too. simple_malloc() is not for common using.
(I should document when we use simple_malloc() in the code)

simple_free() is useful. there are several subsystems which use a flags
for selecting kfree() or vfree(), and some subsystems recount the size hardy
before kfree() or vfree().

Thanks, Lai.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  5:35     ` Lai Jiangshan
@ 2008-11-16  5:47       ` Andrew Morton
  2008-11-16  5:53       ` Arjan van de Ven
  1 sibling, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2008-11-16  5:47 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Arjan van de Ven, Paul Menage, kamezawa.hiroyu, Balbir Singh,
	Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen,
	Linux Kernel Mailing List

On Sun, 16 Nov 2008 13:35:03 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> Andrew Morton wrote:
> > On Sat, 15 Nov 2008 20:52:29 -0800 Arjan van de Ven <arjan@infradead.org> wrote:
> > 
> >> On Sun, 16 Nov 2008 12:33:15 +0800
> >> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> >>
> >>> some subsystem needs vmalloc() when required memory is large.
> >>> but current kernel has not APIs for this requirement.
> >>> this patch introduces simple_malloc() and simple_free().
> >> Hi
> >>
> >> I kinda really don't like this approach. vmalloc() (and especially,
> >> vfree()) is a really expensive operation, and vmalloc()'d memory is
> >> also slower (due to tlb pressure).
> > 
> > And it can fragment, which effectively means a dead box.
> > 
> >> Realistically, people should try hard
> >> to use small datastructure instead....
> > 
> > Yup, it makes it easier for people to do something which we strongly
> > discourage.  The risk got worse with all these 64-bit machines with
> > vast amounts of virtual address space.  It makes it easier for people
> > to develop and "test" code which isn't reliable on smaller machines.
> > 
> >
> 
> vmalloc() is not good for performance and increasing fragment.
> but vmalloc() is need for some subsystems' alternative malloc,
> like cgroup's tasks file and other subsystems(about 20 subsystems).
> 
> these subsystems use kmalloc() in the most condition, but may need
> vmalloc() in some rare condition. so they use alternative malloc.
> 
> So, since these subsystems' maintainer have good reasons for using vmalloc(),
> they can use simple_malloc() too. simple_malloc() is not for common using.
> (I should document when we use simple_malloc() in the code)
> 
> simple_free() is useful. there are several subsystems which use a flags
> for selecting kfree() or vfree(), and some subsystems recount the size hardy
> before kfree() or vfree().
> 

Sure.  Apart from the names of the functions, it's a good cleanup of
existing code.

It's just that we must *really* discourage the use of vmalloc :(

Maybe we should call it i_am_a_hopeless_loser_alloc().  Sending the
per-subsystem patches to the maintainers would be fun.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  5:35     ` Lai Jiangshan
  2008-11-16  5:47       ` Andrew Morton
@ 2008-11-16  5:53       ` Arjan van de Ven
  2008-11-16  6:08         ` Eric Dumazet
  2008-11-16  8:23         ` David Miller
  1 sibling, 2 replies; 29+ messages in thread
From: Arjan van de Ven @ 2008-11-16  5:53 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Andrew Morton, Paul Menage, kamezawa.hiroyu, Balbir Singh,
	Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen,
	Linux Kernel Mailing List

On Sun, 16 Nov 2008 13:35:03 +0800
Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> 
> vmalloc() is not good for performance and increasing fragment.
> but vmalloc() is need for some subsystems' alternative malloc,
> like cgroup's tasks file and other subsystems(about 20 subsystems).

actually what you are pointing out is that these areas need improvement
to not need such huge blocks of memory... but only a series of smaller
blocks instead.

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  5:53       ` Arjan van de Ven
@ 2008-11-16  6:08         ` Eric Dumazet
  2008-11-16  8:23         ` David Miller
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Dumazet @ 2008-11-16  6:08 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Lai Jiangshan, Andrew Morton, Paul Menage, kamezawa.hiroyu,
	Balbir Singh, Jens Axboe, David S. Miller, Jan Kara, Jes Sorensen,
	Linux Kernel Mailing List

Arjan van de Ven a écrit :
> On Sun, 16 Nov 2008 13:35:03 +0800
> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
>> vmalloc() is not good for performance and increasing fragment.
>> but vmalloc() is need for some subsystems' alternative malloc,
>> like cgroup's tasks file and other subsystems(about 20 subsystems).
> 
> actually what you are pointing out is that these areas need improvement
> to not need such huge blocks of memory... but only a series of smaller
> blocks instead.
> 


Just zap vmalloc()/vfree() then ?

More seriously, vmalloc()/vfree() were designed partly to make people life easier,
not to be *the* premium interface to manage kernel memory

Some parts of the kernel cannot afford the cost of vmalloc()/vfree(), so people
must think and design complex algos.

I personnaly like this cleanup too. For example bnx2 driver actually uses vmalloc() while
a kmalloc() should be OK for bnx2_alloc_rx_mem()

# grep bnx2 /proc/vmallocinfo
0xf8260000-0xf8274000   81920 bnx2_init_board+0x104/0xae0 phys=f6000000 ioremap
0xf8280000-0xf8294000   81920 bnx2_init_board+0x104/0xae0 phys=fa000000 ioremap
0xf82c1000-0xf82c3000    8192 bnx2_alloc_rx_mem+0x33/0x310 pages=1 vmalloc
0xf82d9000-0xf82db000    8192 bnx2_alloc_rx_mem+0x33/0x310 pages=1 vmalloc

I have two comments :

1) Names should be not "simple" : That is misleading. "convenient" maybe, or "slow"...

2) Since vmalloc()/vfree() is potentially a very expensive operation, we should make
   slow_malloc()/slow_free() or whatever name is chosen, uninlined. No need to try to save
   3 or 4 cpu cycles. This will save icache at least.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  4:49 ` Alexey Dobriyan
@ 2008-11-16  8:14   ` David Miller
  2008-11-16 18:42   ` KOSAKI Motohiro
  1 sibling, 0 replies; 29+ messages in thread
From: David Miller @ 2008-11-16  8:14 UTC (permalink / raw)
  To: adobriyan
  Cc: laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack,
	jes, linux-kernel

From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Sun, 16 Nov 2008 07:49:01 +0300

> On Sun, Nov 16, 2008 at 12:33:15PM +0800, Lai Jiangshan wrote:
> > some subsystem needs vmalloc() when required memory is large.
> > but current kernel has not APIs for this requirement.
> 
> It doesn't mean there should be such an API.

If it leads to lots of code duplication, which this case has, that is
enough reason to provide helper functions.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  4:52 ` Arjan van de Ven
  2008-11-16  5:03   ` Andrew Morton
@ 2008-11-16  8:19   ` David Miller
  2008-11-16 18:57     ` Arjan van de Ven
  1 sibling, 1 reply; 29+ messages in thread
From: David Miller @ 2008-11-16  8:19 UTC (permalink / raw)
  To: arjan
  Cc: laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack,
	jes, linux-kernel

From: Arjan van de Ven <arjan@infradead.org>
Date: Sat, 15 Nov 2008 20:52:29 -0800

> On Sun, 16 Nov 2008 12:33:15 +0800
> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
> > 
> > some subsystem needs vmalloc() when required memory is large.
> > but current kernel has not APIs for this requirement.
> > this patch introduces simple_malloc() and simple_free().
> 
> I kinda really don't like this approach. vmalloc() (and especially,
> vfree()) is a really expensive operation, and vmalloc()'d memory is
> also slower (due to tlb pressure). Realistically, people should try hard
> to use small datastructure instead....

This is happening in many places, already, for good reason.

There are lots of places where we can't (core hash tables, etc.)  and
we want NUMA spreading and reliable allocation, and thus vmalloc it
is.

And with Nick Piggin's recent patches cost of the vmalloc allocation
operation itself is no longer an issue.

I encouraged people to write this patch because we _ALREADY DO THIS_
in many places and it's pure code duplication.

And whatever particular heuristics you think are best for these cases,
with a common piece of code you can enforce it in ONE place instead of
several which is what is happening right now..

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  5:03   ` Andrew Morton
  2008-11-16  5:35     ` Lai Jiangshan
@ 2008-11-16  8:21     ` David Miller
  1 sibling, 0 replies; 29+ messages in thread
From: David Miller @ 2008-11-16  8:21 UTC (permalink / raw)
  To: akpm
  Cc: arjan, laijs, menage, kamezawa.hiroyu, balbir, jens.axboe, jack,
	jes, linux-kernel

From: Andrew Morton <akpm@linux-foundation.org>
Date: Sat, 15 Nov 2008 21:03:34 -0800

> On Sat, 15 Nov 2008 20:52:29 -0800 Arjan van de Ven <arjan@infradead.org> wrote:
> 
> > Realistically, people should try hard
> > to use small datastructure instead....
> 
> Yup, it makes it easier for people to do something which we strongly
> discourage.  The risk got worse with all these 64-bit machines with
> vast amounts of virtual address space.  It makes it easier for people
> to develop and "test" code which isn't reliable on smaller machines.

I don't see it as being used for common operations (unlike vmap BTW,
which XFS does like crazy).  It's mainly for core hash tables and
similarly large structures.  And also, for those things, the NUMA
spreading helps.

And, in any event, for those cases:

1) we are already doing this exact thing

2) due to #1 we have N copies of the same damn code

So whatever policy fits your fancy, wouldn't it be better to "enforce"
this in one spot instead of the 12 copies we have already?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  5:53       ` Arjan van de Ven
  2008-11-16  6:08         ` Eric Dumazet
@ 2008-11-16  8:23         ` David Miller
  1 sibling, 0 replies; 29+ messages in thread
From: David Miller @ 2008-11-16  8:23 UTC (permalink / raw)
  To: arjan
  Cc: laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack,
	jes, linux-kernel

From: Arjan van de Ven <arjan@infradead.org>
Date: Sat, 15 Nov 2008 21:53:09 -0800

> On Sun, 16 Nov 2008 13:35:03 +0800
> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
> > vmalloc() is not good for performance and increasing fragment.
> > but vmalloc() is need for some subsystems' alternative malloc,
> > like cgroup's tasks file and other subsystems(about 20 subsystems).
> 
> actually what you are pointing out is that these areas need improvement
> to not need such huge blocks of memory... but only a series of smaller
> blocks instead.

Like the IPSEC policy and state hash tables when hundreds of thousands
of rules become active?

You have to be joking.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  4:49 ` Alexey Dobriyan
  2008-11-16  8:14   ` David Miller
@ 2008-11-16 18:42   ` KOSAKI Motohiro
  1 sibling, 0 replies; 29+ messages in thread
From: KOSAKI Motohiro @ 2008-11-16 18:42 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: kosaki.motohiro, Lai Jiangshan, Andrew Morton, Paul Menage,
	kamezawa.hiroyu, Balbir Singh, Jens Axboe, David S. Miller,
	Jan Kara, Jes Sorensen, Linux Kernel Mailing List

Hi

> > this patch introduces simple_malloc() and simple_free().
> 
> Grossly misnamed. It should be kvmalloc/kvfree, at least.

I also like the name of kvmalloc because "simple_" prefix seems too ambiguity.

Otherthings, looks good to me.
this patch is just cleanup. it definitly don't have any side effect.

So, I think vmalloc disadvantage discussion isn't point so much.
There alaredy is.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16  8:19   ` David Miller
@ 2008-11-16 18:57     ` Arjan van de Ven
  2008-11-16 21:39       ` Dave Airlie
  0 siblings, 1 reply; 29+ messages in thread
From: Arjan van de Ven @ 2008-11-16 18:57 UTC (permalink / raw)
  To: David Miller
  Cc: laijs, akpm, menage, kamezawa.hiroyu, balbir, jens.axboe, jack,
	jes, linux-kernel

On Sun, 16 Nov 2008 00:19:26 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Arjan van de Ven <arjan@infradead.org>
> Date: Sat, 15 Nov 2008 20:52:29 -0800
> 
> > On Sun, 16 Nov 2008 12:33:15 +0800
> > Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> > 
> > > 
> > > some subsystem needs vmalloc() when required memory is large.
> > > but current kernel has not APIs for this requirement.
> > > this patch introduces simple_malloc() and simple_free().
> > 
> > I kinda really don't like this approach. vmalloc() (and especially,
> > vfree()) is a really expensive operation, and vmalloc()'d memory is
> > also slower (due to tlb pressure). Realistically, people should try
> > hard to use small datastructure instead....
> 
> This is happening in many places, already, for good reason.
> 
> There are lots of places where we can't (core hash tables, etc.)  and
> we want NUMA spreading and reliable allocation, and thus vmalloc it
> is.

vmalloc() isn't 100% evil; for truely long term stuff it's sometimes a
quite reasonable solution.

There are some issues with it still: the vmalloc() space is shared
with ioremap, modules and others and it's not all that big on 32 bit; on
x86 you could well end up with only 64Mb total (after taking out the
various ioremap's etc). 

Yes there's places where it's then totally fine to dip into this space
at boot/init time. You mention a few very good users.
(There's still the tlb miss cost on use but on modern cpus a tlb miss
is actually quite cheap)

But this doesn't make vmalloc() the magic bullet that solves the "oh
Linux can't allocate large chunks of memory" problem. Specifically in
driver space for things that get ported from other OSes.

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16 18:57     ` Arjan van de Ven
@ 2008-11-16 21:39       ` Dave Airlie
  2008-11-16 21:51         ` Arjan van de Ven
  2008-11-17  4:43         ` Balbir Singh
  0 siblings, 2 replies; 29+ messages in thread
From: Dave Airlie @ 2008-11-16 21:39 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: David Miller, laijs, akpm, menage, kamezawa.hiroyu, balbir,
	jens.axboe, jack, jes, linux-kernel

On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven <arjan@infradead.org> wrote:
> On Sun, 16 Nov 2008 00:19:26 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Arjan van de Ven <arjan@infradead.org>
>> Date: Sat, 15 Nov 2008 20:52:29 -0800
>>
>> > On Sun, 16 Nov 2008 12:33:15 +0800
>> > Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>> >
>> > >
>> > > some subsystem needs vmalloc() when required memory is large.
>> > > but current kernel has not APIs for this requirement.
>> > > this patch introduces simple_malloc() and simple_free().
>> >
>> > I kinda really don't like this approach. vmalloc() (and especially,
>> > vfree()) is a really expensive operation, and vmalloc()'d memory is
>> > also slower (due to tlb pressure). Realistically, people should try
>> > hard to use small datastructure instead....
>>
>> This is happening in many places, already, for good reason.
>>
>> There are lots of places where we can't (core hash tables, etc.)  and
>> we want NUMA spreading and reliable allocation, and thus vmalloc it
>> is.
>
> vmalloc() isn't 100% evil; for truely long term stuff it's sometimes a
> quite reasonable solution.
>
> There are some issues with it still: the vmalloc() space is shared
> with ioremap, modules and others and it's not all that big on 32 bit; on
> x86 you could well end up with only 64Mb total (after taking out the
> various ioremap's etc).
>
> Yes there's places where it's then totally fine to dip into this space
> at boot/init time. You mention a few very good users.
> (There's still the tlb miss cost on use but on modern cpus a tlb miss
> is actually quite cheap)
>
> But this doesn't make vmalloc() the magic bullet that solves the "oh
> Linux can't allocate large chunks of memory" problem. Specifically in
> driver space for things that get ported from other OSes.

So we keep the duplicated code? or we just audit new callers.... I
think this patch
makes it easier to spot new callers doing something stupid. As davem
said we duplicate
this code all over the place, so for that reason along a simple
wrapper makes things a lot
easier, and also possibly a lot easier to change in the future to a
new non-sucky API.

So I'm all for it maybe with a non simple name.

Dave.

>
> --
> Arjan van de Ven        Intel Open Source Technology Centre
> For development, discussion and tips for power savings,
> visit http://www.lesswatts.org
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16 21:39       ` Dave Airlie
@ 2008-11-16 21:51         ` Arjan van de Ven
  2008-11-16 22:42           ` Dave Airlie
                             ` (2 more replies)
  2008-11-17  4:43         ` Balbir Singh
  1 sibling, 3 replies; 29+ messages in thread
From: Arjan van de Ven @ 2008-11-16 21:51 UTC (permalink / raw)
  To: Dave Airlie
  Cc: David Miller, laijs, akpm, menage, kamezawa.hiroyu, balbir,
	jens.axboe, jack, jes, linux-kernel

On Mon, 17 Nov 2008 07:39:55 +1000
"Dave Airlie" <airlied@gmail.com> wrote:

> On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven
> <arjan@infradead.org> wrote:
> > On Sun, 16 Nov 2008 00:19:26 -0800 (PST)
> > David Miller <davem@davemloft.net> wrote:
> >
> >> From: Arjan van de Ven <arjan@infradead.org>
> >> Date: Sat, 15 Nov 2008 20:52:29 -0800
> >>
> >> > On Sun, 16 Nov 2008 12:33:15 +0800
> >> > Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> >> >
> >> > >
> >> > > some subsystem needs vmalloc() when required memory is large.
> >> > > but current kernel has not APIs for this requirement.
> >> > > this patch introduces simple_malloc() and simple_free().
> >> >
> >> > I kinda really don't like this approach. vmalloc() (and
> >> > especially, vfree()) is a really expensive operation, and
> >> > vmalloc()'d memory is also slower (due to tlb pressure).
> >> > Realistically, people should try hard to use small datastructure
> >> > instead....
> >>
> >> This is happening in many places, already, for good reason.
> >>
> >> There are lots of places where we can't (core hash tables, etc.)
> >> and we want NUMA spreading and reliable allocation, and thus
> >> vmalloc it is.
> >
> > vmalloc() isn't 100% evil; for truely long term stuff it's
> > sometimes a quite reasonable solution.
> >
> > There are some issues with it still: the vmalloc() space is shared
> > with ioremap, modules and others and it's not all that big on 32
> > bit; on x86 you could well end up with only 64Mb total (after
> > taking out the various ioremap's etc).
> >
> > Yes there's places where it's then totally fine to dip into this
> > space at boot/init time. You mention a few very good users.
> > (There's still the tlb miss cost on use but on modern cpus a tlb
> > miss is actually quite cheap)
> >
> > But this doesn't make vmalloc() the magic bullet that solves the "oh
> > Linux can't allocate large chunks of memory" problem. Specifically
> > in driver space for things that get ported from other OSes.
> 
> So we keep the duplicated code? or we just audit new callers.... I
> think this patch
> makes it easier to spot new callers doing something stupid. As davem
> said we duplicate
> this code all over the place, so for that reason along a simple
> wrapper makes things a lot
> easier, and also possibly a lot easier to change in the future to a
> new non-sucky API.
> 
> So I'm all for it maybe with a non simple name.
> 

I would go further than this.

Make the code just use vmalloc(). Period.

But then make vmalloc() smart and try do a direct mapping allocation
first, before falling back to a virtual mapping. (and based on size it
wouldn't even try it for just big things)



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16 21:51         ` Arjan van de Ven
@ 2008-11-16 22:42           ` Dave Airlie
  2008-11-17  2:08           ` Lai Jiangshan
  2008-11-17  4:46           ` Balbir Singh
  2 siblings, 0 replies; 29+ messages in thread
From: Dave Airlie @ 2008-11-16 22:42 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: David Miller, laijs, akpm, menage, kamezawa.hiroyu, balbir,
	jens.axboe, jack, jes, linux-kernel

On Mon, Nov 17, 2008 at 7:51 AM, Arjan van de Ven <arjan@infradead.org> wrote:
> On Mon, 17 Nov 2008 07:39:55 +1000
> "Dave Airlie" <airlied@gmail.com> wrote:
>
>> On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven
>> <arjan@infradead.org> wrote:
>> > On Sun, 16 Nov 2008 00:19:26 -0800 (PST)
>> > David Miller <davem@davemloft.net> wrote:
>> >
>> >> From: Arjan van de Ven <arjan@infradead.org>
>> >> Date: Sat, 15 Nov 2008 20:52:29 -0800
>> >>
>> >> > On Sun, 16 Nov 2008 12:33:15 +0800
>> >> > Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>> >> >
>> >> > >
>> >> > > some subsystem needs vmalloc() when required memory is large.
>> >> > > but current kernel has not APIs for this requirement.
>> >> > > this patch introduces simple_malloc() and simple_free().
>> >> >
>> >> > I kinda really don't like this approach. vmalloc() (and
>> >> > especially, vfree()) is a really expensive operation, and
>> >> > vmalloc()'d memory is also slower (due to tlb pressure).
>> >> > Realistically, people should try hard to use small datastructure
>> >> > instead....
>> >>
>> >> This is happening in many places, already, for good reason.
>> >>
>> >> There are lots of places where we can't (core hash tables, etc.)
>> >> and we want NUMA spreading and reliable allocation, and thus
>> >> vmalloc it is.
>> >
>> > vmalloc() isn't 100% evil; for truely long term stuff it's
>> > sometimes a quite reasonable solution.
>> >
>> > There are some issues with it still: the vmalloc() space is shared
>> > with ioremap, modules and others and it's not all that big on 32
>> > bit; on x86 you could well end up with only 64Mb total (after
>> > taking out the various ioremap's etc).
>> >
>> > Yes there's places where it's then totally fine to dip into this
>> > space at boot/init time. You mention a few very good users.
>> > (There's still the tlb miss cost on use but on modern cpus a tlb
>> > miss is actually quite cheap)
>> >
>> > But this doesn't make vmalloc() the magic bullet that solves the "oh
>> > Linux can't allocate large chunks of memory" problem. Specifically
>> > in driver space for things that get ported from other OSes.
>>
>> So we keep the duplicated code? or we just audit new callers.... I
>> think this patch
>> makes it easier to spot new callers doing something stupid. As davem
>> said we duplicate
>> this code all over the place, so for that reason along a simple
>> wrapper makes things a lot
>> easier, and also possibly a lot easier to change in the future to a
>> new non-sucky API.
>>
>> So I'm all for it maybe with a non simple name.
>>
>
> I would go further than this.
>
> Make the code just use vmalloc(). Period.
>
> But then make vmalloc() smart and try do a direct mapping allocation
> first, before falling back to a virtual mapping. (and based on size it
> wouldn't even try it for just big things

I seem to remember Christoph Lameter having something along these
lines before, did
vcompound do some of this type of thing...

Dave.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16 21:51         ` Arjan van de Ven
  2008-11-16 22:42           ` Dave Airlie
@ 2008-11-17  2:08           ` Lai Jiangshan
  2008-11-17  4:53             ` Balbir Singh
  2008-11-17  4:46           ` Balbir Singh
  2 siblings, 1 reply; 29+ messages in thread
From: Lai Jiangshan @ 2008-11-17  2:08 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Dave Airlie, David Miller, akpm, menage, kamezawa.hiroyu, balbir,
	jens.axboe, jack, jes, linux-kernel, KOSAKI Motohiro, dada1,
	Alexey Dobriyan

Arjan van de Ven wrote:
> On Mon, 17 Nov 2008 07:39:55 +1000
> 
> I would go further than this.
> 
> Make the code just use vmalloc(). Period.
> 
> But then make vmalloc() smart and try do a direct mapping allocation
> first, before falling back to a virtual mapping. (and based on size it
> wouldn't even try it for just big things)
> 
> 
> 

Hi, Arjan van de Ven

(I'll rename simple_malloc/simple_free to kvmalloc/kvfree)

I think vmalloc() should only do one thing(virtual mapping). Your idea
can be implemented in helper function kvmalloc() if there is a good
algorithm provided for it.

kvmalloc() is for cleanup mostly. It will remove existed duplicate code.
As David and I pointed out, vmalloc() is need for some good reason.

If we do not introduce a helper function, these duplicate code
are still spread everywhere.


kvfree() is not only for free the memory allocated by kvmalloc().

kvfree() frees the memory that we don't know whether it was allocated
by kmalloc() nor vmalloc().

Someone use a flag for it, and other guys calculate size of memory again
before kfree() or vfree(). these two ways increase complexity, and
it is hard to prove the re-calculated size is reliable.

kvfree() will remove these needless complexity!



Thanx, Lai


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16 21:39       ` Dave Airlie
  2008-11-16 21:51         ` Arjan van de Ven
@ 2008-11-17  4:43         ` Balbir Singh
  1 sibling, 0 replies; 29+ messages in thread
From: Balbir Singh @ 2008-11-17  4:43 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Arjan van de Ven, David Miller, laijs, akpm, menage,
	kamezawa.hiroyu, jens.axboe, jack, jes, linux-kernel

Dave Airlie wrote:
> On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven <arjan@infradead.org> wrote:
>> On Sun, 16 Nov 2008 00:19:26 -0800 (PST)
>> David Miller <davem@davemloft.net> wrote:
>>
>>> From: Arjan van de Ven <arjan@infradead.org>
>>> Date: Sat, 15 Nov 2008 20:52:29 -0800
>>>
>>>> On Sun, 16 Nov 2008 12:33:15 +0800
>>>> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>>>
>>>>> some subsystem needs vmalloc() when required memory is large.
>>>>> but current kernel has not APIs for this requirement.
>>>>> this patch introduces simple_malloc() and simple_free().
>>>> I kinda really don't like this approach. vmalloc() (and especially,
>>>> vfree()) is a really expensive operation, and vmalloc()'d memory is
>>>> also slower (due to tlb pressure). Realistically, people should try
>>>> hard to use small datastructure instead....
>>> This is happening in many places, already, for good reason.
>>>
>>> There are lots of places where we can't (core hash tables, etc.)  and
>>> we want NUMA spreading and reliable allocation, and thus vmalloc it
>>> is.
>> vmalloc() isn't 100% evil; for truely long term stuff it's sometimes a
>> quite reasonable solution.
>>
>> There are some issues with it still: the vmalloc() space is shared
>> with ioremap, modules and others and it's not all that big on 32 bit; on
>> x86 you could well end up with only 64Mb total (after taking out the
>> various ioremap's etc).
>>
>> Yes there's places where it's then totally fine to dip into this space
>> at boot/init time. You mention a few very good users.
>> (There's still the tlb miss cost on use but on modern cpus a tlb miss
>> is actually quite cheap)
>>
>> But this doesn't make vmalloc() the magic bullet that solves the "oh
>> Linux can't allocate large chunks of memory" problem. Specifically in
>> driver space for things that get ported from other OSes.
> 
> So we keep the duplicated code? or we just audit new callers.... I
> think this patch
> makes it easier to spot new callers doing something stupid. As davem
> said we duplicate
> this code all over the place, so for that reason along a simple
> wrapper makes things a lot
> easier, and also possibly a lot easier to change in the future to a
> new non-sucky API.
> 
> So I'm all for it maybe with a non simple name.

Yes, tracking vmalloc() spillage all over the place is harder than using one
abstraction and fixing that appropriately if needed (specially for 32 bit systems).

-- 
	Balbir

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-16 21:51         ` Arjan van de Ven
  2008-11-16 22:42           ` Dave Airlie
  2008-11-17  2:08           ` Lai Jiangshan
@ 2008-11-17  4:46           ` Balbir Singh
  2 siblings, 0 replies; 29+ messages in thread
From: Balbir Singh @ 2008-11-17  4:46 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Dave Airlie, David Miller, laijs, akpm, menage, kamezawa.hiroyu,
	jens.axboe, jack, jes, linux-kernel

Arjan van de Ven wrote:
> On Mon, 17 Nov 2008 07:39:55 +1000
> "Dave Airlie" <airlied@gmail.com> wrote:
> 
>> On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven
>> <arjan@infradead.org> wrote:
>>> On Sun, 16 Nov 2008 00:19:26 -0800 (PST)
>>> David Miller <davem@davemloft.net> wrote:
>>>
>>>> From: Arjan van de Ven <arjan@infradead.org>
>>>> Date: Sat, 15 Nov 2008 20:52:29 -0800
>>>>
>>>>> On Sun, 16 Nov 2008 12:33:15 +0800
>>>>> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>>>>
>>>>>> some subsystem needs vmalloc() when required memory is large.
>>>>>> but current kernel has not APIs for this requirement.
>>>>>> this patch introduces simple_malloc() and simple_free().
>>>>> I kinda really don't like this approach. vmalloc() (and
>>>>> especially, vfree()) is a really expensive operation, and
>>>>> vmalloc()'d memory is also slower (due to tlb pressure).
>>>>> Realistically, people should try hard to use small datastructure
>>>>> instead....
>>>> This is happening in many places, already, for good reason.
>>>>
>>>> There are lots of places where we can't (core hash tables, etc.)
>>>> and we want NUMA spreading and reliable allocation, and thus
>>>> vmalloc it is.
>>> vmalloc() isn't 100% evil; for truely long term stuff it's
>>> sometimes a quite reasonable solution.
>>>
>>> There are some issues with it still: the vmalloc() space is shared
>>> with ioremap, modules and others and it's not all that big on 32
>>> bit; on x86 you could well end up with only 64Mb total (after
>>> taking out the various ioremap's etc).
>>>
>>> Yes there's places where it's then totally fine to dip into this
>>> space at boot/init time. You mention a few very good users.
>>> (There's still the tlb miss cost on use but on modern cpus a tlb
>>> miss is actually quite cheap)
>>>
>>> But this doesn't make vmalloc() the magic bullet that solves the "oh
>>> Linux can't allocate large chunks of memory" problem. Specifically
>>> in driver space for things that get ported from other OSes.
>> So we keep the duplicated code? or we just audit new callers.... I
>> think this patch
>> makes it easier to spot new callers doing something stupid. As davem
>> said we duplicate
>> this code all over the place, so for that reason along a simple
>> wrapper makes things a lot
>> easier, and also possibly a lot easier to change in the future to a
>> new non-sucky API.
>>
>> So I'm all for it maybe with a non simple name.
>>
> 
> I would go further than this.
> 
> Make the code just use vmalloc(). Period.
> 

But vmalloc() is always chunks of pages, not always desirable.

> But then make vmalloc() smart and try do a direct mapping allocation
> first, before falling back to a virtual mapping. (and based on size it
> wouldn't even try it for just big things)

If only slab/slub could do vmalloc() based caches, but vmalloc() is not the
common case worth optimizing for.

-- 
	Balbir

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-17  2:08           ` Lai Jiangshan
@ 2008-11-17  4:53             ` Balbir Singh
  2008-11-17  5:25               ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 29+ messages in thread
From: Balbir Singh @ 2008-11-17  4:53 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Arjan van de Ven, Dave Airlie, David Miller, akpm, menage,
	kamezawa.hiroyu, jens.axboe, jack, jes, linux-kernel,
	KOSAKI Motohiro, dada1, Alexey Dobriyan

Lai Jiangshan wrote:
> Arjan van de Ven wrote:
>> On Mon, 17 Nov 2008 07:39:55 +1000
>>
>> I would go further than this.
>>
>> Make the code just use vmalloc(). Period.
>>
>> But then make vmalloc() smart and try do a direct mapping allocation
>> first, before falling back to a virtual mapping. (and based on size it
>> wouldn't even try it for just big things)
>>
>>
>>
> 
> Hi, Arjan van de Ven
> 
> (I'll rename simple_malloc/simple_free to kvmalloc/kvfree)
> 

I would prefer to find a way to say that one cannot select gfp_mask with this API.

-- 
	Balbir

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-17  4:53             ` Balbir Singh
@ 2008-11-17  5:25               ` KAMEZAWA Hiroyuki
  2008-11-17  6:43                 ` KOSAKI Motohiro
  0 siblings, 1 reply; 29+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-17  5:25 UTC (permalink / raw)
  To: balbir
  Cc: Lai Jiangshan, Arjan van de Ven, Dave Airlie, David Miller, akpm,
	menage, jens.axboe, jack, jes, linux-kernel, KOSAKI Motohiro,
	dada1, Alexey Dobriyan

On Mon, 17 Nov 2008 10:23:24 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Lai Jiangshan wrote:
> > Arjan van de Ven wrote:
> >> On Mon, 17 Nov 2008 07:39:55 +1000
> >>
> >> I would go further than this.
> >>
> >> Make the code just use vmalloc(). Period.
> >>
> >> But then make vmalloc() smart and try do a direct mapping allocation
> >> first, before falling back to a virtual mapping. (and based on size it
> >> wouldn't even try it for just big things)
> >>
> >>
> >>
> > 
> > Hi, Arjan van de Ven
> > 
> > (I'll rename simple_malloc/simple_free to kvmalloc/kvfree)
> > 
> 
> I would prefer to find a way to say that one cannot select gfp_mask with this API.
> 
I think gfp_mask must be passed explicitly.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-17  5:25               ` KAMEZAWA Hiroyuki
@ 2008-11-17  6:43                 ` KOSAKI Motohiro
  2008-11-17  7:13                   ` Andrew Morton
  0 siblings, 1 reply; 29+ messages in thread
From: KOSAKI Motohiro @ 2008-11-17  6:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: balbir, Lai Jiangshan, Arjan van de Ven, Dave Airlie,
	David Miller, akpm, menage, jens.axboe, jack, jes, linux-kernel,
	dada1, Alexey Dobriyan

>> > (I'll rename simple_malloc/simple_free to kvmalloc/kvfree)
>> >
>>
>> I would prefer to find a way to say that one cannot select gfp_mask with this API.
>>
> I think gfp_mask must be passed explicitly.

Agreed.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-17  6:43                 ` KOSAKI Motohiro
@ 2008-11-17  7:13                   ` Andrew Morton
  2008-11-17  7:15                     ` David Miller
  2008-11-18  4:39                     ` Nick Piggin
  0 siblings, 2 replies; 29+ messages in thread
From: Andrew Morton @ 2008-11-17  7:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: KAMEZAWA Hiroyuki, balbir, Lai Jiangshan, Arjan van de Ven,
	Dave Airlie, David Miller, menage, jens.axboe, jack, jes,
	linux-kernel, dada1, Alexey Dobriyan

On Mon, 17 Nov 2008 15:43:59 +0900 "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> wrote:

> >> > (I'll rename simple_malloc/simple_free to kvmalloc/kvfree)
> >> >
> >>
> >> I would prefer to find a way to say that one cannot select gfp_mask with this API.
> >>
> > I think gfp_mask must be passed explicitly.
> 
> Agreed.

It would only make sense if __vmalloc() can be called in atomic contexts.

__vmalloc() cannot be called from irq contexts due to it taking
non-irq-safe spinlocks.

__vmalloc() kinda looks like it could be called from non-irq atomic
contexts with GFP_ATOMIC, but I think it lies.  For example,
pud_alloc_one/pmd_alloc_one/etc use hard-wired GFP_KERNEL.

In which case this new allocation function can only be called from
contexts where GFP_KERNEL can be used, hence we don't need to pass that
in - it would be misleading to do so.

In fact it's not immediately clear why __vmalloc() takes a gfp_t
argument either?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-17  7:13                   ` Andrew Morton
@ 2008-11-17  7:15                     ` David Miller
  2008-11-17  8:10                       ` KOSAKI Motohiro
  2008-11-17  8:24                       ` Balbir Singh
  2008-11-18  4:39                     ` Nick Piggin
  1 sibling, 2 replies; 29+ messages in thread
From: David Miller @ 2008-11-17  7:15 UTC (permalink / raw)
  To: akpm
  Cc: kosaki.motohiro, kamezawa.hiroyu, balbir, laijs, arjan, airlied,
	menage, jens.axboe, jack, jes, linux-kernel, dada1, adobriyan

From: Andrew Morton <akpm@linux-foundation.org>
Date: Sun, 16 Nov 2008 23:13:01 -0800

> In fact it's not immediately clear why __vmalloc() takes a gfp_t
> argument either?

Probably for things like GFP_DMA32, GFP_HIGHMEM, et al.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-17  7:15                     ` David Miller
@ 2008-11-17  8:10                       ` KOSAKI Motohiro
  2008-11-17  8:24                       ` Balbir Singh
  1 sibling, 0 replies; 29+ messages in thread
From: KOSAKI Motohiro @ 2008-11-17  8:10 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, kamezawa.hiroyu, balbir, laijs, arjan, airlied, menage,
	jens.axboe, jack, jes, linux-kernel, dada1, adobriyan

2008/11/17 David Miller <davem@davemloft.net>:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Sun, 16 Nov 2008 23:13:01 -0800
>
>> In fact it's not immediately clear why __vmalloc() takes a gfp_t
>> argument either?
>
> Probably for things like GFP_DMA32, GFP_HIGHMEM, et al.

Thanks.

I think to __GFP_ZERO or __GFP_COLD etc..
but that also is right, imho.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-17  7:15                     ` David Miller
  2008-11-17  8:10                       ` KOSAKI Motohiro
@ 2008-11-17  8:24                       ` Balbir Singh
  1 sibling, 0 replies; 29+ messages in thread
From: Balbir Singh @ 2008-11-17  8:24 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, kosaki.motohiro, kamezawa.hiroyu, laijs, arjan, airlied,
	menage, jens.axboe, jack, jes, linux-kernel, dada1, adobriyan

David Miller wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Sun, 16 Nov 2008 23:13:01 -0800
> 
>> In fact it's not immediately clear why __vmalloc() takes a gfp_t
>> argument either?
> 
> Probably for things like GFP_DMA32, GFP_HIGHMEM, et al.
> 

vmalloc() hides away GFP_DMA32 and hard codes gfp_mask to GFP_KERNEL |
__GFP_HIGHMEM. __vmalloc() lies like Andrew mentioned. For the use cases
mentioned in this thread, we don't really care about GFP_DMA32 (or do we?).

I would prefer to avoid passing the gfp_mask and call the API something like
blocking_vkmalloc() and blocking_vkzalloc() or something better.

-- 
	Balbir

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-17  7:13                   ` Andrew Morton
  2008-11-17  7:15                     ` David Miller
@ 2008-11-18  4:39                     ` Nick Piggin
  2008-11-18  5:16                       ` Lai Jiangshan
  1 sibling, 1 reply; 29+ messages in thread
From: Nick Piggin @ 2008-11-18  4:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, KAMEZAWA Hiroyuki, balbir, Lai Jiangshan,
	Arjan van de Ven, Dave Airlie, David Miller, menage, jens.axboe,
	jack, jes, linux-kernel, dada1, Alexey Dobriyan

On Monday 17 November 2008 18:13, Andrew Morton wrote:
> On Mon, 17 Nov 2008 15:43:59 +0900 "KOSAKI Motohiro" 
<kosaki.motohiro@jp.fujitsu.com> wrote:
> > >> > (I'll rename simple_malloc/simple_free to kvmalloc/kvfree)
> > >>
> > >> I would prefer to find a way to say that one cannot select gfp_mask
> > >> with this API.
> > >
> > > I think gfp_mask must be passed explicitly.
> >
> > Agreed.
>
> It would only make sense if __vmalloc() can be called in atomic contexts.
>
> __vmalloc() cannot be called from irq contexts due to it taking
> non-irq-safe spinlocks.
>
> __vmalloc() kinda looks like it could be called from non-irq atomic
> contexts with GFP_ATOMIC, but I think it lies.  For example,
> pud_alloc_one/pmd_alloc_one/etc use hard-wired GFP_KERNEL.

vmalloc/vfree / vmap/vunmap I think could now be made to be usable even
in irq context, I think. Freeing up vmalloc space with global tlb flush
can't be done from interrupt context, but now with the lazy unmapping,
you only have to mark the area as freed (and possibly kick off a thread
to do the actual unmapping).

I didn't actually add that, because yes it would increase overheads a
bit, and I would still prefer to wait for a real nice problem it solves
before adding such a capability...


> In which case this new allocation function can only be called from
> contexts where GFP_KERNEL can be used, hence we don't need to pass that
> in - it would be misleading to do so.
>
> In fact it's not immediately clear why __vmalloc() takes a gfp_t
> argument either?

Possibly a bugcheck for !GFP_WAIT || !GFP_FS || !GFP_IO, or a might_sleep()
or something would be a good idea to add...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()
  2008-11-18  4:39                     ` Nick Piggin
@ 2008-11-18  5:16                       ` Lai Jiangshan
  0 siblings, 0 replies; 29+ messages in thread
From: Lai Jiangshan @ 2008-11-18  5:16 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, KOSAKI Motohiro, KAMEZAWA Hiroyuki, balbir,
	Arjan van de Ven, Dave Airlie, David Miller, menage, jens.axboe,
	jack, jes, linux-kernel, dada1, Alexey Dobriyan

Nick Piggin wrote:
> On Monday 17 November 2008 18:13, Andrew Morton wrote:
>> On Mon, 17 Nov 2008 15:43:59 +0900 "KOSAKI Motohiro" 
> <kosaki.motohiro@jp.fujitsu.com> wrote:
>>>>>> (I'll rename simple_malloc/simple_free to kvmalloc/kvfree)
>>>>> I would prefer to find a way to say that one cannot select gfp_mask
>>>>> with this API.
>>>> I think gfp_mask must be passed explicitly.
>>> Agreed.
>> It would only make sense if __vmalloc() can be called in atomic contexts.
>>
>> __vmalloc() cannot be called from irq contexts due to it taking
>> non-irq-safe spinlocks.
>>
>> __vmalloc() kinda looks like it could be called from non-irq atomic
>> contexts with GFP_ATOMIC, but I think it lies.  For example,
>> pud_alloc_one/pmd_alloc_one/etc use hard-wired GFP_KERNEL.
> 
> vmalloc/vfree / vmap/vunmap I think could now be made to be usable even
> in irq context, I think. Freeing up vmalloc space with global tlb flush
> can't be done from interrupt context, but now with the lazy unmapping,
> you only have to mark the area as freed (and possibly kick off a thread
> to do the actual unmapping).
> 
> I didn't actually add that, because yes it would increase overheads a
> bit, and I would still prefer to wait for a real nice problem it solves
> before adding such a capability...
> 
> 
>> In which case this new allocation function can only be called from
>> contexts where GFP_KERNEL can be used, hence we don't need to pass that
>> in - it would be misleading to do so.
>>
>> In fact it's not immediately clear why __vmalloc() takes a gfp_t
>> argument either?
> 
> Possibly a bugcheck for !GFP_WAIT || !GFP_FS || !GFP_IO, or a might_sleep()
> or something would be a good idea to add...
> 
> 
> 

Thanks.
Now, new patch for it is: http://lkml.org/lkml/2008/11/17/137

Lai.



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2008-11-18  5:20 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-16  4:33 [PATCH 1/7] mm: introduce simple_malloc()/simple_free() Lai Jiangshan
2008-11-16  4:49 ` Alexey Dobriyan
2008-11-16  8:14   ` David Miller
2008-11-16 18:42   ` KOSAKI Motohiro
2008-11-16  4:52 ` Arjan van de Ven
2008-11-16  5:03   ` Andrew Morton
2008-11-16  5:35     ` Lai Jiangshan
2008-11-16  5:47       ` Andrew Morton
2008-11-16  5:53       ` Arjan van de Ven
2008-11-16  6:08         ` Eric Dumazet
2008-11-16  8:23         ` David Miller
2008-11-16  8:21     ` David Miller
2008-11-16  8:19   ` David Miller
2008-11-16 18:57     ` Arjan van de Ven
2008-11-16 21:39       ` Dave Airlie
2008-11-16 21:51         ` Arjan van de Ven
2008-11-16 22:42           ` Dave Airlie
2008-11-17  2:08           ` Lai Jiangshan
2008-11-17  4:53             ` Balbir Singh
2008-11-17  5:25               ` KAMEZAWA Hiroyuki
2008-11-17  6:43                 ` KOSAKI Motohiro
2008-11-17  7:13                   ` Andrew Morton
2008-11-17  7:15                     ` David Miller
2008-11-17  8:10                       ` KOSAKI Motohiro
2008-11-17  8:24                       ` Balbir Singh
2008-11-18  4:39                     ` Nick Piggin
2008-11-18  5:16                       ` Lai Jiangshan
2008-11-17  4:46           ` Balbir Singh
2008-11-17  4:43         ` Balbir Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox