linux-kernel.vger.kernel.org archive mirror
* [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-04-29 13:43                   ` Steven Whitehouse
@ 2010-05-02 17:29                     ` Minchan Kim
  2010-05-05 12:48                       ` Steven Whitehouse
  0 siblings, 1 reply; 14+ messages in thread
From: Minchan Kim @ 2010-05-02 17:29 UTC (permalink / raw)
  To: Steven Whitehouse, Andrew Morton; +Cc: linux-mm, linux-kernel, Nick Piggin

On Thu, 2010-04-29 at 14:43 +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Mon, 2010-04-19 at 23:12 +0900, Minchan Kim wrote:
> > On Mon, Apr 19, 2010 at 9:58 PM, Steven Whitehouse <swhiteho@redhat.com> wrote:
> > > Hi,
> > >
> > > On Mon, 2010-04-19 at 00:14 +0900, Minchan Kim wrote:
> > >> On Fri, 2010-04-16 at 15:10 +0100, Steven Whitehouse wrote:
> > >> > Hi,
> > >> >
> > >> > On Fri, 2010-04-16 at 01:51 +0900, Minchan Kim wrote:
> > >> > [snip]
> > >> > > Thanks for the explanation. It seems to be real issue.
> > >> > >
> > >> > > I tested to see effect with flush during rb tree search.
> > >> > >
> > >> > > Before I applied your patch, the time is 50300661 us.
> > >> > > After your patch, 11569357 us.
> > >> > > After my debug patch, 6104875 us.
> > >> > >
> > >> > > I tested it as changing threshold value.
> > >> > >
> > >> > > threshold time
> > >> > > 1000              13892809
> > >> > > 500               9062110
> > >> > > 200               6714172
> > >> > > 100               6104875
> > >> > > 50                6758316
> > >> > >
> > >> > My results show:
> > >> >
> > >> > threshold        time
> > >> > 100000           139309948
> > >> > 1000             13555878
> > >> > 500              10069801
> > >> > 200              7813667
> > >> > 100              18523172
> > >> > 50               18546256
> > >> >
> > >> > > And perf shows smp_call_function is very low percentage.
> > >> > >
> > >> > > In my cases, 100 is best.
> > >> > >
> > >> > Looks like 200 for me.
> > >> >
> > >> > I think you meant to use the non _minmax version of proc_dointvec too?
> > >>
> > >> Yes. My fault :)
> > >>
> > >> > Although it doesn't make any difference for this basic test.
> > >> >
> > >> > The original reporter also has 8 cpu cores, I've discovered. In his case
> > >> > they are divided across 4 CPUs whereas mine are divided across 2 CPUs, but
> > >> > I think that makes no real difference in this case.
> > >> >
> > >> > I'll try and get some further test results ready shortly. Many thanks
> > >> > for all your efforts in tracking this down,
> > >> >
> > >> > Steve.
> > >>
> > >> I voted "free area cache".
> > > My results with this patch are:
> > >
> > > vmalloc took 5419238 us
> > > vmalloc took 5432874 us
> > > vmalloc took 5425568 us
> > > vmalloc took 5423867 us
> > >
> > > So that's about a third of the time it took with my original patch, so
> > > very much going in the right direction :-)
> > 
> > Good. :)
> > 
> > >
> > > I did get a compile warning:
> > >  CC      mm/vmalloc.o
> > > mm/vmalloc.c: In function ‘__free_vmap_area’:
> > > mm/vmalloc.c:454: warning: unused variable ‘prev’
> > >
> > > ....harmless, but it should be fixed before the final version,
> > 
> > Of course. It's not a formal patch, just something to show the concept. :)
> > 
> > Thanks for spending your precious time on this. :)
> > As Nick commented, I still have further work to do.
> > Maybe Nick could do it faster than me.
> > Anyway, I hope it solves your problem.
> > 
> > Thanks, Steven.
> > 
> > >
> > > Steve.
> > >
> > >
> 
> Your latest patch has now been run through the GFS2 tests which
> originally triggered my investigation. It seems to solve the problem
> completely. Many thanks for your efforts in helping us find and fix the
> problem. The next question is what remains to be done in order to get
> the patch into a form suitable for upstream merge?
> 
> Steve.
> 
Hi, Steven. 

Sorry for the slow response.
I wanted to submit a patch implementing the whole of Nick's request,
but unfortunately I am very busy at the moment.
If it's urgent, I would like to submit this one first and
then submit the remaining TODO items in a later version,
probably after the middle of May.

I don't think this patch can cause regressions for other usages.
Nick, what do you think?

== CUT_HERE ==
>From c93437583b5ff476fcfe13901898f981baa672d8 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan.kim@gmail.com>
Date: Mon, 3 May 2010 01:43:30 +0900
Subject: [PATCH] cache last free vmap_area to avoid restarting beginning.

Steven Whitehouse reported that GFS2 had a performance regression related to vmalloc.
He used a test module to compare vmalloc speed in two cases:

1. lazy TLB flush
2. disable lazy TLB flush by hard coding

1)
vmalloc took 148798983 us
vmalloc took 151664529 us
vmalloc took 152416398 us
vmalloc took 151837733 us

2)
vmalloc took 15363634 us
vmalloc took 15358026 us
vmalloc took 15240955 us
vmalloc took 15402302 us

The test module and Steven's patch are available at
https://bugzilla.redhat.com/show_bug.cgi?id=581459.
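
Roughly, the timing part of such a module can look like the sketch below.
This is only an illustration, not the actual module from the bugzilla; the
iteration count and allocation size are arbitrary.

/*
 * Illustrative sketch only: time a tight vmalloc()/vfree() loop so that
 * the cost of searching for a free vmap_area (while lazily-freed areas
 * pile up) dominates the measurement.
 */
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/ktime.h>

static int __init vmalloc_timing_init(void)
{
	ktime_t start, end;
	void *p;
	int i;

	start = ktime_get();
	for (i = 0; i < 100000; i++) {
		p = vmalloc(4 * PAGE_SIZE);
		if (!p)
			break;
		vfree(p);
	}
	end = ktime_get();

	pr_info("vmalloc took %lld us\n",
		(long long)ktime_to_us(ktime_sub(end, start)));
	return 0;
}

static void __exit vmalloc_timing_exit(void)
{
}

module_init(vmalloc_timing_init);
module_exit(vmalloc_timing_exit);
MODULE_LICENSE("GPL");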

The cause is that the lazy TLB flush can delay the release of vmap_areas,
while the search for a free vmap_area always restarts from the beginning
of the rbtree. So until the lazy TLB flush happens, searching for a free
vmap_area can take a long time.

Steven's experiment runs about 9 times faster than before,
but always disabling the lazy TLB flush is not a good solution.
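
(For reference, disabling it by hard coding presumably amounts to something
like the change below -- a guess at the kind of patch involved, not
necessarily Steven's actual one: make lazy_max_pages() return 0, so freed
vmap areas are purged and their TLB entries flushed immediately instead of
being batched.)

static unsigned long lazy_max_pages(void)
{
	/* threshold of 0: every free triggers a purge and TLB flush */
	return 0;
}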

This patch caches the next free vmap_area to speed up the search.
The results from my test case are as follows.

1) vanilla
elapsed time                    # search of rbtree
vmalloc took 49121724 us                5535
vmalloc took 50675245 us                5535
vmalloc took 48987711 us                5535
vmalloc took 54232479 us                5535
vmalloc took 50258117 us                5535
vmalloc took 49424859 us                5535

2) Steven's patch

elapsed time                    # search of rbtree
vmalloc took 11363341 us                62
vmalloc took 12798868 us                62
vmalloc took 13247942 us                62
vmalloc took 11434647 us                62
vmalloc took 13221733 us                62
vmalloc took 12134019 us                62

3) my patch (vmap cache)
elapsed time                    # search of rbtree
vmalloc took 5159893 us                 8
vmalloc took 5124434 us                 8
vmalloc took 5123291 us                 8
vmalloc took 5145396 us                 12
vmalloc took 5163605 us                 8
vmalloc took 5945663 us                 8

Nick gave some advice:
"
- invalidating the cache in the case of vstart being decreased.
- Don't unconditionally reset the cache to the last vm area freed,
 because you might have a higher area freed after a lower area. Only
 reset if the freed area is lower.
- Do keep a cached hole size, so smaller lookups can restart a full
 search.
- refactoring rbtree search code to manage alloc_vmap_area complexity
"

Now, it's on my TODO list.

Cc: Nick Piggin <npiggin@suse.de>
Reported-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Tested-by: Steven Whitehouse <swhiteho@redhat.com>
---
 mm/vmalloc.c |   49 +++++++++++++++++++++++++++++++++++--------------
 1 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ae00746..56f09ec 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -263,6 +263,7 @@ struct vmap_area {
 
 static DEFINE_SPINLOCK(vmap_area_lock);
 static struct rb_root vmap_area_root = RB_ROOT;
+static struct rb_node *free_vmap_cache;
 static LIST_HEAD(vmap_area_list);
 static unsigned long vmap_area_pcpu_hole;
 
@@ -319,6 +320,7 @@ static void __insert_vmap_area(struct vmap_area *va)
 
 static void purge_vmap_area_lazy(void);
 
+unsigned long max_lookup_count;
 /*
  * Allocate a region of KVA of the specified size and alignment, within the
  * vstart and vend.
@@ -332,6 +334,8 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	struct rb_node *n;
 	unsigned long addr;
 	int purged = 0;
+	int lookup_cache = 0;
+	struct vmap_area *first;
 
 	BUG_ON(!size);
 	BUG_ON(size & ~PAGE_MASK);
@@ -342,29 +346,42 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 		return ERR_PTR(-ENOMEM);
 
 retry:
+	first = NULL;
 	addr = ALIGN(vstart, align);
 
 	spin_lock(&vmap_area_lock);
 	if (addr + size - 1 < addr)
 		goto overflow;
 
-	/* XXX: could have a last_hole cache */
 	n = vmap_area_root.rb_node;
-	if (n) {
-		struct vmap_area *first = NULL;
+	if (free_vmap_cache && !purged) {
+		struct vmap_area *cache;
+		cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
+		if (cache->va_start >= addr && cache->va_end < vend) {
+			lookup_cache = 1;
+			n = free_vmap_cache;
+		}
+	}
 
-		do {
-			struct vmap_area *tmp;
-			tmp = rb_entry(n, struct vmap_area, rb_node);
-			if (tmp->va_end >= addr) {
-				if (!first && tmp->va_start < addr + size)
+	if (n) {
+		if (!lookup_cache) {
+			do {
+				struct vmap_area *tmp;
+				tmp = rb_entry(n, struct vmap_area, rb_node);
+				if (tmp->va_end >= addr) {
+					if (!first && tmp->va_start < addr + size)
+						first = tmp;
+					n = n->rb_left;
+				} else {
 					first = tmp;
-				n = n->rb_left;
-			} else {
-				first = tmp;
-				n = n->rb_right;
-			}
-		} while (n);
+					n = n->rb_right;
+				}
+			} while (n);
+		}
+		else {
+			first = rb_entry(n, struct vmap_area, rb_node);
+			addr = first->va_start;
+		}
 
 		if (!first)
 			goto found;
@@ -396,6 +413,7 @@ overflow:
 		if (!purged) {
 			purge_vmap_area_lazy();
 			purged = 1;
+			lookup_cache = 0;
 			goto retry;
 		}
 		if (printk_ratelimit())
@@ -412,6 +430,7 @@ overflow:
 	va->va_end = addr + size;
 	va->flags = 0;
 	__insert_vmap_area(va);
+	free_vmap_cache = &va->rb_node;
 	spin_unlock(&vmap_area_lock);
 
 	return va;
@@ -426,7 +445,9 @@ static void rcu_free_va(struct rcu_head *head)
 
 static void __free_vmap_area(struct vmap_area *va)
 {
+	struct rb_node *prev;
 	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
+	free_vmap_cache = rb_prev(&va->rb_node);
 	rb_erase(&va->rb_node, &vmap_area_root);
 	RB_CLEAR_NODE(&va->rb_node);
 	list_del_rcu(&va->list);
-- 
1.7.0.5





-- 
Kind regards,
Minchan Kim





* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-02 17:29                     ` [PATCH] cache last free vmap_area to avoid restarting beginning Minchan Kim
@ 2010-05-05 12:48                       ` Steven Whitehouse
  2010-05-05 16:16                         ` Nick Piggin
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Whitehouse @ 2010-05-05 12:48 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, linux-mm, linux-kernel, Nick Piggin

Hi,

On Mon, 2010-05-03 at 02:29 +0900, Minchan Kim wrote:
> Hi, Steven. 
> 
> Sorry for lazy response.
> I wanted to submit the patch which implement Nick's request whole.
> And unfortunately, I am so busy now. 
> But if it's urgent, I want to submit this one firstly and 
> at next version, maybe I will submit remained TODO things 
> after middle of May.
> 
> I think this patch can't make regression other usages.
> Nick. What do you think about?
> 
I guess the question is whether the remaining items are essential for
correct functioning of this patch, or whether they are "it would be nice
if" items. I suspect that they are the latter (I'm not a VM expert, but
from the brief descriptions it looks like that to me) in which case I'd
suggest sending the currently existing patch first and then following up
with the remaining changes later.

We have got a nice speed-up with your current patch and, so far as I'm
aware, it has not introduced any new bugs or regressions.

Nick, does that sound ok?

Steve.

> == CUT_HERE ==
> >From c93437583b5ff476fcfe13901898f981baa672d8 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan.kim@gmail.com>
> Date: Mon, 3 May 2010 01:43:30 +0900
> Subject: [PATCH] cache last free vmap_area to avoid restarting beginning.
> 
> Steven Whitehouse reported that GFS2 had a regression about vmalloc.
> He measured some test module to compare vmalloc speed on the two cases.
> 
> 1. lazy TLB flush
> 2. disable lazy TLB flush by hard coding
> 
> 1)
> vmalloc took 148798983 us
> vmalloc took 151664529 us
> vmalloc took 152416398 us
> vmalloc took 151837733 us
> 
> 2)
> vmalloc took 15363634 us
> vmalloc took 15358026 us
> vmalloc took 15240955 us
> vmalloc took 15402302 us
> 
> You can refer test module and Steven's patch
> with https://bugzilla.redhat.com/show_bug.cgi?id=581459.
> 
> The cause is that lazy TLB flush can delay release vmap_area.
> OTOH, To find free vmap_area is always started from beginnig of rbnode.
> So before lazy TLB flush happens, searching free vmap_area could take
> long time.
> 
> Steven's experiment can do 9 times faster than old.
> But Always disable lazy TLB flush is not good.
> 
> This patch caches next free vmap_area to accelerate.
> In my test case, following as.
> 
> The result is following as.
> 
> 1) vanilla
> elapsed time                    # search of rbtree
> vmalloc took 49121724 us                5535
> vmalloc took 50675245 us                5535
> vmalloc took 48987711 us                5535
> vmalloc took 54232479 us                5535
> vmalloc took 50258117 us                5535
> vmalloc took 49424859 us                5535
> 
> 3) Steven's patch
> 
> elapsed time                    # search of rbtree
> vmalloc took 11363341 us                62
> vmalloc took 12798868 us                62
> vmalloc took 13247942 us                62
> vmalloc took 11434647 us                62
> vmalloc took 13221733 us                62
> vmalloc took 12134019 us                62
> 
> 2) my patch(vmap cache)
> elapsed time                    # search of rbtree
> vmalloc took 5159893 us                 8
> vmalloc took 5124434 us                 8
> vmalloc took 5123291 us                 8
> vmalloc took 5145396 us                 12
> vmalloc took 5163605 us                 8
> vmalloc took 5945663 us                 8
> 
> Nick commented some advise.
> "
> - invalidating the cache in the case of vstart being decreased.
> - Don't unconditionally reset the cache to the last vm area freed,
>  because you might have a higher area freed after a lower area. Only
>  reset if the freed area is lower.
> - Do keep a cached hole size, so smaller lookups can restart a full
>  search.
> - refactoring rbtree search code to manage alloc_vmap_area complexity
> "
> 
> Now, it's on my TODO list.
> 
> Cc: Nick Piggin <npiggin@suse.de>
> Reported-by: Steven Whitehouse <swhiteho@redhat.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Tested-by: Steven Whitehouse <swhiteho@redhat.com>
> ---
>  mm/vmalloc.c |   49 +++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 35 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ae00746..56f09ec 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -263,6 +263,7 @@ struct vmap_area {
>  
>  static DEFINE_SPINLOCK(vmap_area_lock);
>  static struct rb_root vmap_area_root = RB_ROOT;
> +static struct rb_node *free_vmap_cache;
>  static LIST_HEAD(vmap_area_list);
>  static unsigned long vmap_area_pcpu_hole;
>  
> @@ -319,6 +320,7 @@ static void __insert_vmap_area(struct vmap_area *va)
>  
>  static void purge_vmap_area_lazy(void);
>  
> +unsigned long max_lookup_count;
>  /*
>   * Allocate a region of KVA of the specified size and alignment, within the
>   * vstart and vend.
> @@ -332,6 +334,8 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	struct rb_node *n;
>  	unsigned long addr;
>  	int purged = 0;
> +	int lookup_cache = 0;
> +	struct vmap_area *first;
>  
>  	BUG_ON(!size);
>  	BUG_ON(size & ~PAGE_MASK);
> @@ -342,29 +346,42 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  		return ERR_PTR(-ENOMEM);
>  
>  retry:
> +	first = NULL;
>  	addr = ALIGN(vstart, align);
>  
>  	spin_lock(&vmap_area_lock);
>  	if (addr + size - 1 < addr)
>  		goto overflow;
>  
> -	/* XXX: could have a last_hole cache */
>  	n = vmap_area_root.rb_node;
> -	if (n) {
> -		struct vmap_area *first = NULL;
> +	if (free_vmap_cache && !purged) {
> +		struct vmap_area *cache;
> +		cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> +		if (cache->va_start >= addr && cache->va_end < vend) {
> +			lookup_cache = 1;
> +			n = free_vmap_cache;
> +		}
> +	}
>  
> -		do {
> -			struct vmap_area *tmp;
> -			tmp = rb_entry(n, struct vmap_area, rb_node);
> -			if (tmp->va_end >= addr) {
> -				if (!first && tmp->va_start < addr + size)
> +	if (n) {
> +		if (!lookup_cache) {
> +			do {
> +				struct vmap_area *tmp;
> +				tmp = rb_entry(n, struct vmap_area, rb_node);
> +				if (tmp->va_end >= addr) {
> +					if (!first && tmp->va_start < addr + size)
> +						first = tmp;
> +					n = n->rb_left;
> +				} else {
>  					first = tmp;
> -				n = n->rb_left;
> -			} else {
> -				first = tmp;
> -				n = n->rb_right;
> -			}
> -		} while (n);
> +					n = n->rb_right;
> +				}
> +			} while (n);
> +		}
> +		else {
> +			first = rb_entry(n, struct vmap_area, rb_node);
> +			addr = first->va_start;
> +		}
>  
>  		if (!first)
>  			goto found;
> @@ -396,6 +413,7 @@ overflow:
>  		if (!purged) {
>  			purge_vmap_area_lazy();
>  			purged = 1;
> +			lookup_cache = 0;
>  			goto retry;
>  		}
>  		if (printk_ratelimit())
> @@ -412,6 +430,7 @@ overflow:
>  	va->va_end = addr + size;
>  	va->flags = 0;
>  	__insert_vmap_area(va);
> +	free_vmap_cache = &va->rb_node;
>  	spin_unlock(&vmap_area_lock);
>  
>  	return va;
> @@ -426,7 +445,9 @@ static void rcu_free_va(struct rcu_head *head)
>  
>  static void __free_vmap_area(struct vmap_area *va)
>  {
> +	struct rb_node *prev;
>  	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
> +	free_vmap_cache = rb_prev(&va->rb_node);
>  	rb_erase(&va->rb_node, &vmap_area_root);
>  	RB_CLEAR_NODE(&va->rb_node);
>  	list_del_rcu(&va->list);
> -- 
> 1.7.0.5
> 
> 
> 
> 
> 



* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-05 12:48                       ` Steven Whitehouse
@ 2010-05-05 16:16                         ` Nick Piggin
  2010-05-17 12:42                           ` Steven Whitehouse
                                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Nick Piggin @ 2010-05-05 16:16 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: Minchan Kim, Andrew Morton, linux-mm, linux-kernel

On Wed, May 05, 2010 at 01:48:48PM +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Mon, 2010-05-03 at 02:29 +0900, Minchan Kim wrote:
> > Hi, Steven. 
> > 
> > Sorry for lazy response.
> > I wanted to submit the patch which implement Nick's request whole.
> > And unfortunately, I am so busy now. 
> > But if it's urgent, I want to submit this one firstly and 
> > at next version, maybe I will submit remained TODO things 
> > after middle of May.
> > 
> > I think this patch can't make regression other usages.
> > Nick. What do you think about?
> > 
> I guess the question is whether the remaining items are essential for
> correct functioning of this patch, or whether they are "it would be nice
> if" items. I suspect that they are the latter (I'm not a VM expert, but
> from the brief descriptions it looks like that to me) in which case I'd
> suggest send the currently existing patch first and the following up
> with the remaining changes later.
> 
> We have got a nice speed up with your current patch and so far as I'm
> aware not introduced any new bugs or regressions with it.
> 
> Nick, does that sound ok?

Just got around to looking at it again. I definitely agree we need to
fix the regression, however I'm concerned about introducing other
possible problems while doing that.

The following patch should (modulo bugs, but it's somewhat tested) give
no difference in the allocation patterns, so won't introduce virtual
memory layout changes.

Any chance you could test it?

---
 mm/vmalloc.c |   49 +++++++++++++++++++++++++++++++++++--------------
 1 files changed, 35 insertions(+), 14 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c
+++ linux-2.6/mm/vmalloc.c
@@ -262,8 +262,13 @@ struct vmap_area {
 };
 
 static DEFINE_SPINLOCK(vmap_area_lock);
-static struct rb_root vmap_area_root = RB_ROOT;
 static LIST_HEAD(vmap_area_list);
+static struct rb_root vmap_area_root = RB_ROOT;
+
+static struct rb_node *free_vmap_cache;
+static unsigned long cached_hole_size;
+static unsigned long cached_start;
+
 static unsigned long vmap_area_pcpu_hole;
 
 static struct vmap_area *__find_vmap_area(unsigned long addr)
@@ -332,6 +337,7 @@ static struct vmap_area *alloc_vmap_area
 	struct rb_node *n;
 	unsigned long addr;
 	int purged = 0;
+	struct vmap_area *first;
 
 	BUG_ON(!size);
 	BUG_ON(size & ~PAGE_MASK);
@@ -348,11 +354,23 @@ retry:
 	if (addr + size - 1 < addr)
 		goto overflow;
 
-	/* XXX: could have a last_hole cache */
-	n = vmap_area_root.rb_node;
-	if (n) {
-		struct vmap_area *first = NULL;
+	if (size <= cached_hole_size || addr < cached_start || !free_vmap_cache) {
+		cached_hole_size = 0;
+		cached_start = addr;
+		free_vmap_cache = NULL;
+	}
 
+	/* find starting point for our search */
+	if (free_vmap_cache) {
+		first = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
+		addr = ALIGN(first->va_end + PAGE_SIZE, align);
+
+	} else {
+		n = vmap_area_root.rb_node;
+		if (!n)
+			goto found;
+
+		first = NULL;
 		do {
 			struct vmap_area *tmp;
 			tmp = rb_entry(n, struct vmap_area, rb_node);
@@ -369,26 +387,36 @@ retry:
 		if (!first)
 			goto found;
 
-		if (first->va_end < addr) {
+		if (first->va_start < addr) {
+			BUG_ON(first->va_end < addr);
 			n = rb_next(&first->rb_node);
+			addr = ALIGN(first->va_end + PAGE_SIZE, align);
 			if (n)
 				first = rb_entry(n, struct vmap_area, rb_node);
 			else
 				goto found;
 		}
+		BUG_ON(first->va_start < addr);
+		if (addr + cached_hole_size < first->va_start)
+			cached_hole_size = first->va_start - addr;
+	}
 
-		while (addr + size > first->va_start && addr + size <= vend) {
-			addr = ALIGN(first->va_end + PAGE_SIZE, align);
-			if (addr + size - 1 < addr)
-				goto overflow;
+	/* from the starting point, walk areas until a suitable hole is found */
 
-			n = rb_next(&first->rb_node);
-			if (n)
-				first = rb_entry(n, struct vmap_area, rb_node);
-			else
-				goto found;
-		}
+	while (addr + size > first->va_start && addr + size <= vend) {
+		if (addr + cached_hole_size < first->va_start)
+			cached_hole_size = first->va_start - addr;
+		addr = ALIGN(first->va_end + PAGE_SIZE, align);
+		if (addr + size - 1 < addr)
+			goto overflow;
+
+		n = rb_next(&first->rb_node);
+		if (n)
+			first = rb_entry(n, struct vmap_area, rb_node);
+		else
+			goto found;
 	}
+
 found:
 	if (addr + size > vend) {
 overflow:
@@ -412,6 +440,7 @@ overflow:
 	va->va_end = addr + size;
 	va->flags = 0;
 	__insert_vmap_area(va);
+	free_vmap_cache = &va->rb_node;
 	spin_unlock(&vmap_area_lock);
 
 	return va;
@@ -427,6 +456,21 @@ static void rcu_free_va(struct rcu_head
 static void __free_vmap_area(struct vmap_area *va)
 {
 	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
+
+	if (free_vmap_cache) {
+		if (va->va_end < cached_start) {
+			cached_hole_size = 0;
+			cached_start = 0;
+			free_vmap_cache = NULL;
+		} else {
+			struct vmap_area *cache;
+			cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
+			if (va->va_start <= cache->va_start) {
+				free_vmap_cache = rb_prev(&va->rb_node);
+				cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
+			}
+		}
+	}
 	rb_erase(&va->rb_node, &vmap_area_root);
 	RB_CLEAR_NODE(&va->rb_node);
 	list_del_rcu(&va->list);


* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-05 16:16                         ` Nick Piggin
@ 2010-05-17 12:42                           ` Steven Whitehouse
  2010-05-18 13:44                             ` Steven Whitehouse
  2010-05-19 13:54                           ` Steven Whitehouse
  2010-05-22  9:53                           ` Minchan Kim
  2 siblings, 1 reply; 14+ messages in thread
From: Steven Whitehouse @ 2010-05-17 12:42 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Minchan Kim, Andrew Morton, linux-mm, linux-kernel

Hi,

On Thu, 2010-05-06 at 02:16 +1000, Nick Piggin wrote:
> On Wed, May 05, 2010 at 01:48:48PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Mon, 2010-05-03 at 02:29 +0900, Minchan Kim wrote:
> > > Hi, Steven. 
> > > 
> > > Sorry for lazy response.
> > > I wanted to submit the patch which implement Nick's request whole.
> > > And unfortunately, I am so busy now. 
> > > But if it's urgent, I want to submit this one firstly and 
> > > at next version, maybe I will submit remained TODO things 
> > > after middle of May.
> > > 
> > > I think this patch can't make regression other usages.
> > > Nick. What do you think about?
> > > 
> > I guess the question is whether the remaining items are essential for
> > correct functioning of this patch, or whether they are "it would be nice
> > if" items. I suspect that they are the latter (I'm not a VM expert, but
> > from the brief descriptions it looks like that to me) in which case I'd
> > suggest send the currently existing patch first and the following up
> > with the remaining changes later.
> > 
> > We have got a nice speed up with your current patch and so far as I'm
> > aware not introduced any new bugs or regressions with it.
> > 
> > Nick, does that sound ok?
> 
> Just got around to looking at it again. I definitely agree we need to
> fix the regression, however I'm concerned about introducing other
> possible problems while doing that.
> 
> The following patch should (modulo bugs, but it's somewhat tested) give
> no difference in the allocation patterns, so won't introduce virtual
> memory layout changes.
> 
> Any chance you could test it?
> 

Apologies for the delay. I tried the patch on my test box and it worked
perfectly ok. When we ran the original test which triggered the
investigation in the first place, it failed to boot. Since that box is
remote, with limited console access, all I've been able to
find out is "it didn't work", which isn't very helpful.

I'm currently trying to figure out how we can work out what's wrong. It
isn't at all certain that it is an issue with this patch - it could be
almost anything :(

Steve.





* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-17 12:42                           ` Steven Whitehouse
@ 2010-05-18 13:44                             ` Steven Whitehouse
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Whitehouse @ 2010-05-18 13:44 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Minchan Kim, Andrew Morton, linux-mm, linux-kernel

Hi,

On Mon, 2010-05-17 at 13:42 +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Thu, 2010-05-06 at 02:16 +1000, Nick Piggin wrote:
> > On Wed, May 05, 2010 at 01:48:48PM +0100, Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > On Mon, 2010-05-03 at 02:29 +0900, Minchan Kim wrote:
> > > > Hi, Steven. 
> > > > 
> > > > Sorry for lazy response.
> > > > I wanted to submit the patch which implement Nick's request whole.
> > > > And unfortunately, I am so busy now. 
> > > > But if it's urgent, I want to submit this one firstly and 
> > > > at next version, maybe I will submit remained TODO things 
> > > > after middle of May.
> > > > 
> > > > I think this patch can't make regression other usages.
> > > > Nick. What do you think about?
> > > > 
> > > I guess the question is whether the remaining items are essential for
> > > correct functioning of this patch, or whether they are "it would be nice
> > > if" items. I suspect that they are the latter (I'm not a VM expert, but
> > > from the brief descriptions it looks like that to me) in which case I'd
> > > suggest send the currently existing patch first and the following up
> > > with the remaining changes later.
> > > 
> > > We have got a nice speed up with your current patch and so far as I'm
> > > aware not introduced any new bugs or regressions with it.
> > > 
> > > Nick, does that sound ok?
> > 
> > Just got around to looking at it again. I definitely agree we need to
> > fix the regression, however I'm concerned about introducing other
> > possible problems while doing that.
> > 
> > The following patch should (modulo bugs, but it's somewhat tested) give
> > no difference in the allocation patterns, so won't introduce virtual
> > memory layout changes.
> > 
> > Any chance you could test it?
> > 
> 
> Apologies for the delay. I tried the patch on my test box and it worked
> perfectly ok. When the original test was tried which triggered the
> investigation in the first place, it failed to boot. Since that box is
> remote and with limited remote console access, all I've been able to
> find out is "it didn't work" which isn't very helpful.
> 
> I'm currently trying to figure out how we can work out whats wrong. It
> isn't at all certain that it is an issue with this patch - it could be
> almost anything :(
> 
> Steve.
> 
> 

Further tests show that exactly the same kernel without that single
patch works ok, but that with the patch we get the crash on boot. We
are trying to arrange better console access to the test box (which
is remote) and will report back if we manage that and capture any
output,

Steve.




* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-05 16:16                         ` Nick Piggin
  2010-05-17 12:42                           ` Steven Whitehouse
@ 2010-05-19 13:54                           ` Steven Whitehouse
  2010-05-19 13:56                             ` Nick Piggin
  2010-05-25  8:43                             ` Nick Piggin
  2010-05-22  9:53                           ` Minchan Kim
  2 siblings, 2 replies; 14+ messages in thread
From: Steven Whitehouse @ 2010-05-19 13:54 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Minchan Kim, Andrew Morton, linux-mm, linux-kernel

Hi,

On Thu, 2010-05-06 at 02:16 +1000, Nick Piggin wrote:
> On Wed, May 05, 2010 at 01:48:48PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Mon, 2010-05-03 at 02:29 +0900, Minchan Kim wrote:
> > > Hi, Steven. 
> > > 
> > > Sorry for lazy response.
> > > I wanted to submit the patch which implement Nick's request whole.
> > > And unfortunately, I am so busy now. 
> > > But if it's urgent, I want to submit this one firstly and 
> > > at next version, maybe I will submit remained TODO things 
> > > after middle of May.
> > > 
> > > I think this patch can't make regression other usages.
> > > Nick. What do you think about?
> > > 
> > I guess the question is whether the remaining items are essential for
> > correct functioning of this patch, or whether they are "it would be nice
> > if" items. I suspect that they are the latter (I'm not a VM expert, but
> > from the brief descriptions it looks like that to me) in which case I'd
> > suggest send the currently existing patch first and the following up
> > with the remaining changes later.
> > 
> > We have got a nice speed up with your current patch and so far as I'm
> > aware not introduced any new bugs or regressions with it.
> > 
> > Nick, does that sound ok?
> 
> Just got around to looking at it again. I definitely agree we need to
> fix the regression, however I'm concerned about introducing other
> possible problems while doing that.
> 
> The following patch should (modulo bugs, but it's somewhat tested) give
> no difference in the allocation patterns, so won't introduce virtual
> memory layout changes.
> 
> Any chance you could test it?
> 

At last some further info on the failed boot during testing. The
messages look like this:

dracut: Starting plymouth daemon
G------------[ cut here ]------------
kernel BUG at mm/vmalloc.c:391!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/vtconsole/vtcon0/uevent
CPU 7 
Modules linked in:
Pid: 193, comm: modprobe Tainted: G        W  2.6.32-23.el6.bz583026.patches2.3.7.x86_64 #1 ProLiant DL580 G3
RIP: 0010:[<ffffffff8113c161>]  [<ffffffff8113c161>] alloc_vmap_area+0x431/0x440
RSP: 0018:ffff8803dae3bcf8  EFLAGS: 00010287
RAX: ffffc9001232e000 RBX: 0000000000004000 RCX: 0000000000000000
RDX: ffffffffa0000000 RSI: ffff8803db66fdc0 RDI: ffffffff81b6d0a0
RBP: ffff8803dae3bd88 R08: 000000000000000a R09: 00000000000000d0
R10: ffff8803db6b6e40 R11: 0000000000000040 R12: 0000000000000001
R13: ffffffffff000000 R14: ffffffffffffffff R15: ffffffffa0000000
FS:  00007f5872189700(0000) GS:ffff88002c2e0000(0000) knlGS:0000000000000000

and the code around that point is:

static struct vmap_area *alloc_vmap_area(unsigned long size,
                                unsigned long align,
                                unsigned long vstart, unsigned long vend,
                                int node, gfp_t gfp_mask)
{

...

                if (!first)
                        goto found;

                if (first->va_start < addr) {
391>                    BUG_ON(first->va_end < addr);
                        n = rb_next(&first->rb_node);
                        addr = ALIGN(first->va_end + PAGE_SIZE, align);
                        if (n)
                                first = rb_entry(n, struct vmap_area, rb_node);
                        else
                                goto found;
                }


so that seems to pinpoint the line on which the problem occurred. Let us
know if you'd like us to do some more testing. I think we have the
console access issue fixed now. Many thanks for all your help in this
so far,

Steve.






* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-19 13:54                           ` Steven Whitehouse
@ 2010-05-19 13:56                             ` Nick Piggin
  2010-05-25  8:43                             ` Nick Piggin
  1 sibling, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2010-05-19 13:56 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: Minchan Kim, Andrew Morton, linux-mm, linux-kernel

On Wed, May 19, 2010 at 02:54:54PM +0100, Steven Whitehouse wrote:
> On Thu, 2010-05-06 at 02:16 +1000, Nick Piggin wrote:
> > On Wed, May 05, 2010 at 01:48:48PM +0100, Steven Whitehouse wrote:
> > Any chance you could test it?
> > 
> 
> At last some further info on the failed boot during testing. The
> messages look like this:
> 
> dracut: Starting plymouth daemon
> G------------[ cut here ]------------
> kernel BUG at mm/vmalloc.c:391!
> invalid opcode: 0000 [#1] SMP 
> last sysfs file: /sys/devices/virtual/vtconsole/vtcon0/uevent
> CPU 7 
> Modules linked in:
> Pid: 193, comm: modprobe Tainted: G        W  2.6.32-23.el6.bz583026.patches2.3.7.x86_64 #1 ProLiant DL580 G3
> RIP: 0010:[<ffffffff8113c161>]  [<ffffffff8113c161>] alloc_vmap_area+0x431/0x440
> RSP: 0018:ffff8803dae3bcf8  EFLAGS: 00010287
> RAX: ffffc9001232e000 RBX: 0000000000004000 RCX: 0000000000000000
> RDX: ffffffffa0000000 RSI: ffff8803db66fdc0 RDI: ffffffff81b6d0a0
> RBP: ffff8803dae3bd88 R08: 000000000000000a R09: 00000000000000d0
> R10: ffff8803db6b6e40 R11: 0000000000000040 R12: 0000000000000001
> R13: ffffffffff000000 R14: ffffffffffffffff R15: ffffffffa0000000
> FS:  00007f5872189700(0000) GS:ffff88002c2e0000(0000) knlGS:0000000000000000
> 
> and the code around that point is:
> 
> static struct vmap_area *alloc_vmap_area(unsigned long size,
>                                 unsigned long align,
>                                 unsigned long vstart, unsigned long vend,
>                                 int node, gfp_t gfp_mask)
> {
> 
> ...
> 
>                 if (!first)
>                         goto found;
> 
>                 if (first->va_start < addr) {
> 391>                    BUG_ON(first->va_end < addr);
>                         n = rb_next(&first->rb_node);
>                         addr = ALIGN(first->va_end + PAGE_SIZE, align);
>                         if (n)
>                                 first = rb_entry(n, struct vmap_area, rb_node);
>                         else
>                                 goto found;
>                 }
> 
> 
> so that seems to pinpoint the line on which the problem occurred. Let us
> know if you'd like us to do some more testing. I think we have the
> console access issue fixed now. Many thanks for all you help in this
> so far,

Thanks for testing it out. Hmm, I thought I'd shaken out these bugs --
I put the code in a userspace test harness and hammered it pretty hard,
but I must have overlooked something or you're triggering a really
specific sequence.

Let me get back to you if I cannot trigger anything here.
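
(As an aside, the general shape of such a userspace harness can be as simple
as the sketch below. It is purely illustrative -- a trivial first-fit
allocator stands in for alloc_vmap_area() so the example compiles on its own;
a real harness would wrap the kernel code itself and drive it with random
sizes, alignments and vstart/vend ranges.)

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_SLOTS 64
#define VSTART   0x10000UL
#define VEND     0x800000UL

static struct { unsigned long start, end; int used; } areas[NR_SLOTS];

/* stand-in allocator: pick a non-overlapping address at or above VSTART */
static void test_alloc(unsigned long size)
{
	unsigned long addr = VSTART;
	int i, moved;

	do {				/* bump addr past any area it overlaps */
		moved = 0;
		for (i = 0; i < NR_SLOTS; i++) {
			if (areas[i].used && addr < areas[i].end &&
			    addr + size > areas[i].start) {
				addr = areas[i].end;
				moved = 1;
			}
		}
	} while (moved);

	if (addr + size > VEND)
		return;			/* no room, skip this request */

	for (i = 0; i < NR_SLOTS; i++) {
		if (!areas[i].used) {
			areas[i].start = addr;
			areas[i].end = addr + size;
			areas[i].used = 1;
			return;
		}
	}
}

int main(void)
{
	int i, j, iter;

	for (iter = 0; iter < 100000; iter++) {
		if (rand() & 1)
			test_alloc(((rand() % 16) + 1) * 0x1000UL);
		else
			areas[rand() % NR_SLOTS].used = 0;	/* random free */

		/* invariant: no two live areas may overlap */
		for (i = 0; i < NR_SLOTS; i++)
			for (j = i + 1; j < NR_SLOTS; j++)
				if (areas[i].used && areas[j].used)
					assert(areas[i].end <= areas[j].start ||
					       areas[j].end <= areas[i].start);
	}
	printf("no overlapping areas after %d operations\n", iter);
	return 0;
}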




* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-05 16:16                         ` Nick Piggin
  2010-05-17 12:42                           ` Steven Whitehouse
  2010-05-19 13:54                           ` Steven Whitehouse
@ 2010-05-22  9:53                           ` Minchan Kim
  2010-05-24  6:23                             ` Nick Piggin
  2 siblings, 1 reply; 14+ messages in thread
From: Minchan Kim @ 2010-05-22  9:53 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Steven Whitehouse, Andrew Morton, linux-mm, linux-kernel

Hi, Nick.
Sorry for the late review.

On Thu, 2010-05-06 at 02:16 +1000, Nick Piggin wrote:
> On Wed, May 05, 2010 at 01:48:48PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Mon, 2010-05-03 at 02:29 +0900, Minchan Kim wrote:
> > > Hi, Steven. 
> > > 
> > > Sorry for lazy response.
> > > I wanted to submit the patch which implement Nick's request whole.
> > > And unfortunately, I am so busy now. 
> > > But if it's urgent, I want to submit this one firstly and 
> > > at next version, maybe I will submit remained TODO things 
> > > after middle of May.
> > > 
> > > I think this patch can't make regression other usages.
> > > Nick. What do you think about?
> > > 
> > I guess the question is whether the remaining items are essential for
> > correct functioning of this patch, or whether they are "it would be nice
> > if" items. I suspect that they are the latter (I'm not a VM expert, but
> > from the brief descriptions it looks like that to me) in which case I'd
> > suggest send the currently existing patch first and the following up
> > with the remaining changes later.
> > 
> > We have got a nice speed up with your current patch and so far as I'm
> > aware not introduced any new bugs or regressions with it.
> > 
> > Nick, does that sound ok?
> 
> Just got around to looking at it again. I definitely agree we need to
> fix the regression, however I'm concerned about introducing other
> possible problems while doing that.
> 
> The following patch should (modulo bugs, but it's somewhat tested) give
> no difference in the allocation patterns, so won't introduce virtual
> memory layout changes.
> 
> Any chance you could test it?
> 
> ---
>  mm/vmalloc.c |   49 +++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 35 insertions(+), 14 deletions(-)
> 
> Index: linux-2.6/mm/vmalloc.c
> ===================================================================
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -262,8 +262,13 @@ struct vmap_area {
>  };
>  
>  static DEFINE_SPINLOCK(vmap_area_lock);
> -static struct rb_root vmap_area_root = RB_ROOT;
>  static LIST_HEAD(vmap_area_list);
> +static struct rb_root vmap_area_root = RB_ROOT;
> +
> +static struct rb_node *free_vmap_cache;
> +static unsigned long cached_hole_size;
> +static unsigned long cached_start;
> +
>  static unsigned long vmap_area_pcpu_hole;
>  
>  static struct vmap_area *__find_vmap_area(unsigned long addr)
> @@ -332,6 +337,7 @@ static struct vmap_area *alloc_vmap_area
>  	struct rb_node *n;
>  	unsigned long addr;
>  	int purged = 0;
> +	struct vmap_area *first;
>  
>  	BUG_ON(!size);
>  	BUG_ON(size & ~PAGE_MASK);
> @@ -348,11 +354,23 @@ retry:
>  	if (addr + size - 1 < addr)
>  		goto overflow;
>  
> -	/* XXX: could have a last_hole cache */
> -	n = vmap_area_root.rb_node;
> -	if (n) {
> -		struct vmap_area *first = NULL;
> +	if (size <= cached_hole_size || addr < cached_start || !free_vmap_cache) {

Do we need the !free_vmap_cache check?
In __free_vmap_area, we already reset all of these variables when we set free_vmap_cache = NULL.

> +		cached_hole_size = 0;
> +		cached_start = addr;
> +		free_vmap_cache = NULL;
> +	}
>  
> +	/* find starting point for our search */
> +	if (free_vmap_cache) {
> +		first = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> +		addr = ALIGN(first->va_end + PAGE_SIZE, align);
> +
> +	} else {
> +		n = vmap_area_root.rb_node;
> +		if (!n)
> +			goto found;
> +
> +		first = NULL;
>  		do {
>  			struct vmap_area *tmp;
>  			tmp = rb_entry(n, struct vmap_area, rb_node);
> @@ -369,26 +387,36 @@ retry:
>  		if (!first)
>  			goto found;
>  
> -		if (first->va_end < addr) {
> +		if (first->va_start < addr) {

I can't understand your intention.
Why did you change va_end to va_start?

> +			BUG_ON(first->va_end < addr);

And why did you put this BUG_ON here?
Could you elaborate on the logic?

>  			n = rb_next(&first->rb_node);
> +			addr = ALIGN(first->va_end + PAGE_SIZE, align);
>  			if (n)
>  				first = rb_entry(n, struct vmap_area, rb_node);
>  			else
>  				goto found;
>  		}
> +		BUG_ON(first->va_start < addr);

Ditto. 

> +		if (addr + cached_hole_size < first->va_start)
> +			cached_hole_size = first->va_start - addr;
> +	}
>  
> -		while (addr + size > first->va_start && addr + size <= vend) {
> -			addr = ALIGN(first->va_end + PAGE_SIZE, align);
> -			if (addr + size - 1 < addr)
> -				goto overflow;
> +	/* from the starting point, walk areas until a suitable hole is found */
>  
> -			n = rb_next(&first->rb_node);
> -			if (n)
> -				first = rb_entry(n, struct vmap_area, rb_node);
> -			else
> -				goto found;
> -		}
> +	while (addr + size > first->va_start && addr + size <= vend) {
> +		if (addr + cached_hole_size < first->va_start)
> +			cached_hole_size = first->va_start - addr;
> +		addr = ALIGN(first->va_end + PAGE_SIZE, align);
> +		if (addr + size - 1 < addr)
> +			goto overflow;
> +
> +		n = rb_next(&first->rb_node);
> +		if (n)
> +			first = rb_entry(n, struct vmap_area, rb_node);
> +		else
> +			goto found;
>  	}
> +
>  found:
>  	if (addr + size > vend) {
>  overflow:
> @@ -412,6 +440,7 @@ overflow:
>  	va->va_end = addr + size;
>  	va->flags = 0;
>  	__insert_vmap_area(va);
> +	free_vmap_cache = &va->rb_node;
>  	spin_unlock(&vmap_area_lock);
>  
>  	return va;
> @@ -427,6 +456,21 @@ static void rcu_free_va(struct rcu_head
>  static void __free_vmap_area(struct vmap_area *va)
>  {
>  	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
> +
> +	if (free_vmap_cache) {
> +		if (va->va_end < cached_start) {
> +			cached_hole_size = 0;
> +			cached_start = 0;
> +			free_vmap_cache = NULL;
> +		} else {
> +			struct vmap_area *cache;
> +			cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> +			if (va->va_start <= cache->va_start) {
> +				free_vmap_cache = rb_prev(&va->rb_node);
> +				cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> +			}
> +		}
> +	}
>  	rb_erase(&va->rb_node, &vmap_area_root);
>  	RB_CLEAR_NODE(&va->rb_node);
>  	list_del_rcu(&va->list);

Hmm. I will send a refactoring version soon.
If you don't mind, let's discuss it there. :)

-- 
Kind regards,
Minchan Kim




* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-22  9:53                           ` Minchan Kim
@ 2010-05-24  6:23                             ` Nick Piggin
  0 siblings, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2010-05-24  6:23 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Steven Whitehouse, Andrew Morton, linux-mm, linux-kernel

On Sat, May 22, 2010 at 06:53:53PM +0900, Minchan Kim wrote:
> Hi, Nick.
> Sorry for late review. 

No problem, thanks for reviewing.

 
> On Thu, 2010-05-06 at 02:16 +1000, Nick Piggin wrote:
> > On Wed, May 05, 2010 at 01:48:48PM +0100, Steven Whitehouse wrote:
> > @@ -348,11 +354,23 @@ retry:
> >  	if (addr + size - 1 < addr)
> >  		goto overflow;
> >  
> > -	/* XXX: could have a last_hole cache */
> > -	n = vmap_area_root.rb_node;
> > -	if (n) {
> > -		struct vmap_area *first = NULL;
> > +	if (size <= cached_hole_size || addr < cached_start || !free_vmap_cache) {
> 
> Do we need !free_vmap_cache check?
> In __free_vmap_area, we already reset whole of variables when free_vmap_cache = NULL.

You're right.


> > +		cached_hole_size = 0;
> > +		cached_start = addr;
> > +		free_vmap_cache = NULL;
> > +	}
> >  
> > +	/* find starting point for our search */
> > +	if (free_vmap_cache) {
> > +		first = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> > +		addr = ALIGN(first->va_end + PAGE_SIZE, align);
> > +
> > +	} else {
> > +		n = vmap_area_root.rb_node;
> > +		if (!n)
> > +			goto found;
> > +
> > +		first = NULL;
> >  		do {
> >  			struct vmap_area *tmp;
> >  			tmp = rb_entry(n, struct vmap_area, rb_node);
> > @@ -369,26 +387,36 @@ retry:
> >  		if (!first)
> >  			goto found;
> >  
> > -		if (first->va_end < addr) {
> > +		if (first->va_start < addr) {
> 
> I can't understand your intention.
> Why do you change va_end with va_start?

Because we don't want an area which spans the start address (for example,
with addr at 0x3000, an existing area covering 0x2000-0x5000 can come back
from the rbtree walk as "first" even though it starts below addr, so we
advance addr past its va_end). It also makes the subsequent logic simpler.

 
> > +			BUG_ON(first->va_end < addr);
> 
> And Why do you put this BUG_ON in here?
> Could you elaborate on logic?

It seems this is wrong, so I've removed it. This is the BUG that
Steven hit, but there is another bug in there that my stress tester
wasn't triggering.

> 
> >  			n = rb_next(&first->rb_node);
> > +			addr = ALIGN(first->va_end + PAGE_SIZE, align);
> >  			if (n)
> >  				first = rb_entry(n, struct vmap_area, rb_node);
> >  			else
> >  				goto found;
> >  		}
> > +		BUG_ON(first->va_start < addr);
> 
> Ditto. 

Don't want an area spanning start address, as above.


> >  	rb_erase(&va->rb_node, &vmap_area_root);
> >  	RB_CLEAR_NODE(&va->rb_node);
> >  	list_del_rcu(&va->list);
> 
> Hmm. I will send refactoring version soon. 
> If you don't mind, let's discuss in there. :)

I just reworked the initial patch a little bit and fixed a bug in it.
If we could instead do the refactoring on top of it, that would save
me having to rediff.

I'll post it shortly.

Thanks,
Nick


* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-19 13:54                           ` Steven Whitehouse
  2010-05-19 13:56                             ` Nick Piggin
@ 2010-05-25  8:43                             ` Nick Piggin
  2010-05-25 15:00                               ` Minchan Kim
  1 sibling, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2010-05-25  8:43 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: Minchan Kim, Andrew Morton, linux-mm, linux-kernel

On Wed, May 19, 2010 at 02:54:54PM +0100, Steven Whitehouse wrote:
> so that seems to pinpoint the line on which the problem occurred. Let us
> know if you'd like us to do some more testing. I think we have the
> console access issue fixed now. Many thanks for all you help in this
> so far,

Sorry for the delay. Ended up requiring a bit of surgery and several bug
fixes. I added a lot more test cases to my userspace tester, and found
several bugs including the one you hit.

Most of them were due to changing vstart,vend or changing requested
alignment.

I can't guarantee it's going to work for you (it boots here, but the
last version booted as well). But I think it's in much better shape.

It is very careful to reproduce exactly the same allocation behaviour,
so the effectiveness of the cache can be reduced if sizes, alignments,
or start/end ranges change very frequently. But I'd hope that most
vmap-heavy workloads should cache quite well. We could look at doing
smarter things if it isn't effective enough.

--
Provide a free area cache for the vmalloc virtual address allocator, based
on the approach taken in the user virtual memory allocator.

This reduces the number of rbtree operations and linear traversals over
the vmap extents to find a free area. The lazy vmap flushing makes this problem
worse because freed but not yet flushed vmaps tend to build up in
the address space between flushes.

Steven noticed a performance problem with GFS2. Results are as follows...



 mm/vmalloc.c |   49 +++++++++++++++++++++++++++++++++++--------------
 1 files changed, 35 insertions(+), 14 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c
+++ linux-2.6/mm/vmalloc.c
@@ -262,8 +262,14 @@ struct vmap_area {
 };
 
 static DEFINE_SPINLOCK(vmap_area_lock);
-static struct rb_root vmap_area_root = RB_ROOT;
 static LIST_HEAD(vmap_area_list);
+static struct rb_root vmap_area_root = RB_ROOT;
+
+static struct rb_node *free_vmap_cache;
+static unsigned long cached_hole_size;
+static unsigned long cached_start;
+static unsigned long cached_align;
+
 static unsigned long vmap_area_pcpu_hole;
 
 static struct vmap_area *__find_vmap_area(unsigned long addr)
@@ -332,27 +338,52 @@ static struct vmap_area *alloc_vmap_area
 	struct rb_node *n;
 	unsigned long addr;
 	int purged = 0;
+	struct vmap_area *first;
 
 	BUG_ON(!size);
 	BUG_ON(size & ~PAGE_MASK);
+	BUG_ON(!is_power_of_2(align));
 
 	va = kmalloc_node(sizeof(struct vmap_area),
 			gfp_mask & GFP_RECLAIM_MASK, node);
 	if (unlikely(!va))
 		return ERR_PTR(-ENOMEM);
 
+	spin_lock(&vmap_area_lock);
 retry:
-	addr = ALIGN(vstart, align);
+	/* invalidate cache if we have more permissive parameters */
+	if (!free_vmap_cache ||
+			size <= cached_hole_size ||
+			vstart < cached_start ||
+			align < cached_align) {
+		cached_hole_size = 0;
+		free_vmap_cache = NULL;
+	}
+	/* record if we encounter less permissive parameters */
+	cached_start = vstart;
+	cached_align = align;
+
+	/* find starting point for our search */
+	if (free_vmap_cache) {
+		first = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
+		addr = ALIGN(first->va_end + PAGE_SIZE, align);
+		if (addr < vstart) {
+			free_vmap_cache = NULL;
+			goto retry;
+		}
+		if (addr + size - 1 < addr)
+			goto overflow;
 
-	spin_lock(&vmap_area_lock);
-	if (addr + size - 1 < addr)
-		goto overflow;
+	} else {
+		addr = ALIGN(vstart, align);
+		if (addr + size - 1 < addr)
+			goto overflow;
 
-	/* XXX: could have a last_hole cache */
-	n = vmap_area_root.rb_node;
-	if (n) {
-		struct vmap_area *first = NULL;
+		n = vmap_area_root.rb_node;
+		if (!n)
+			goto found;
 
+		first = NULL;
 		do {
 			struct vmap_area *tmp;
 			tmp = rb_entry(n, struct vmap_area, rb_node);
@@ -369,26 +400,37 @@ retry:
 		if (!first)
 			goto found;
 
-		if (first->va_end < addr) {
-			n = rb_next(&first->rb_node);
-			if (n)
-				first = rb_entry(n, struct vmap_area, rb_node);
-			else
-				goto found;
-		}
-
-		while (addr + size > first->va_start && addr + size <= vend) {
-			addr = ALIGN(first->va_end + PAGE_SIZE, align);
+		if (first->va_start < addr) {
+			addr = ALIGN(max(first->va_end + PAGE_SIZE, addr), align);
 			if (addr + size - 1 < addr)
 				goto overflow;
-
 			n = rb_next(&first->rb_node);
 			if (n)
 				first = rb_entry(n, struct vmap_area, rb_node);
 			else
 				goto found;
 		}
+		BUG_ON(first->va_start < addr);
+		if (addr + cached_hole_size < first->va_start)
+			cached_hole_size = first->va_start - addr;
+	}
+
+	/* from the starting point, walk areas until a suitable hole is found */
+
+	while (addr + size > first->va_start && addr + size <= vend) {
+		if (addr + cached_hole_size < first->va_start)
+			cached_hole_size = first->va_start - addr;
+		addr = ALIGN(first->va_end + PAGE_SIZE, align);
+		if (addr + size - 1 < addr)
+			goto overflow;
+
+		n = rb_next(&first->rb_node);
+		if (n)
+			first = rb_entry(n, struct vmap_area, rb_node);
+		else
+			goto found;
 	}
+
 found:
 	if (addr + size > vend) {
 overflow:
@@ -406,14 +448,17 @@ overflow:
 		return ERR_PTR(-EBUSY);
 	}
 
-	BUG_ON(addr & (align-1));
-
 	va->va_start = addr;
 	va->va_end = addr + size;
 	va->flags = 0;
 	__insert_vmap_area(va);
+	free_vmap_cache = &va->rb_node;
 	spin_unlock(&vmap_area_lock);
 
+	BUG_ON(va->va_start & (align-1));
+	BUG_ON(va->va_start < vstart);
+	BUG_ON(va->va_end > vend);
+
 	return va;
 }
 
@@ -427,6 +472,19 @@ static void rcu_free_va(struct rcu_head
 static void __free_vmap_area(struct vmap_area *va)
 {
 	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
+
+	if (free_vmap_cache) {
+		if (va->va_end < cached_start) {
+			free_vmap_cache = NULL;
+		} else {
+			struct vmap_area *cache;
+			cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
+			if (va->va_start <= cache->va_start) {
+				free_vmap_cache = rb_prev(&va->rb_node);
+				cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
+			}
+		}
+	}
 	rb_erase(&va->rb_node, &vmap_area_root);
 	RB_CLEAR_NODE(&va->rb_node);
 	list_del_rcu(&va->list);


* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-25  8:43                             ` Nick Piggin
@ 2010-05-25 15:00                               ` Minchan Kim
  2010-05-25 15:48                                 ` Steven Whitehouse
  0 siblings, 1 reply; 14+ messages in thread
From: Minchan Kim @ 2010-05-25 15:00 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Steven Whitehouse, Andrew Morton, linux-mm, linux-kernel

On Tue, May 25, 2010 at 06:43:23PM +1000, Nick Piggin wrote:
> On Wed, May 19, 2010 at 02:54:54PM +0100, Steven Whitehouse wrote:
> > so that seems to pinpoint the line on which the problem occurred. Let us
> > know if you'd like us to do some more testing. I think we have the
> > console access issue fixed now. Many thanks for all you help in this
> > so far,
> 
> Sorry for the delay. Ended up requiring a bit of surgery and several bug
> fixes. I added a lot more test cases to my userspace tester, and found
> several bugs including the one you hit.
> 
> Most of them were due to changing vstart,vend or changing requested
> alignment.
> 
> I can't guarantee it's going to work for you (it boots here, but the
> last version booted as well). But I think it's in much better shape.
> 
> It is very careful to reproduce exactly the same allocation behaviour,
> so the effectiveness of the cache can be reduced if sizes, alignments,
> or start,end ranges are very frequently changing. But I'd hope that
> for most vmap heavy workloads, they should cache quite well. We could
> look at doing smarter things if it isn't effective enough.
> 
> --
> Provide a free area cache for the vmalloc virtual address allocator, based
> on the approach taken in the user virtual memory allocator.
> 
> This reduces the number of rbtree operations and linear traversals over
> the vmap extents to find a free area. The lazy vmap flushing makes this problem
> worse because because freed but not yet flushed vmaps tend to build up in
> the address space between flushes.
> 
> Steven noticed a performance problem with GFS2. Results are as follows...
> 
> 
> 
>  mm/vmalloc.c |   49 +++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 35 insertions(+), 14 deletions(-)
> 
> Index: linux-2.6/mm/vmalloc.c
> ===================================================================
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -262,8 +262,14 @@ struct vmap_area {
>  };
>  
>  static DEFINE_SPINLOCK(vmap_area_lock);
> -static struct rb_root vmap_area_root = RB_ROOT;
>  static LIST_HEAD(vmap_area_list);
> +static struct rb_root vmap_area_root = RB_ROOT;
> +
> +static struct rb_node *free_vmap_cache;
> +static unsigned long cached_hole_size;
> +static unsigned long cached_start;
> +static unsigned long cached_align;
> +
>  static unsigned long vmap_area_pcpu_hole;
>  
>  static struct vmap_area *__find_vmap_area(unsigned long addr)
> @@ -332,27 +338,52 @@ static struct vmap_area *alloc_vmap_area
>  	struct rb_node *n;
>  	unsigned long addr;
>  	int purged = 0;
> +	struct vmap_area *first;
>  
>  	BUG_ON(!size);
>  	BUG_ON(size & ~PAGE_MASK);
> +	BUG_ON(!is_power_of_2(align));
>  
>  	va = kmalloc_node(sizeof(struct vmap_area),
>  			gfp_mask & GFP_RECLAIM_MASK, node);
>  	if (unlikely(!va))
>  		return ERR_PTR(-ENOMEM);
>  
> +	spin_lock(&vmap_area_lock);

vmap_area_lock is unbalanced with the last spin_unlock in the overflow
case. Maybe you should take the lock after the retry label, and release
it before jumping back to retry.
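
For example (just a sketch of what I mean, not a tested fix), the lock could
be taken right after the label, since the overflow path already drops it
before the goto:

retry:
	spin_lock(&vmap_area_lock);	/* re-acquired on every pass */
	/* invalidate cache if we have more permissive parameters */
	...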

>  retry:
> -	addr = ALIGN(vstart, align);
> +	/* invalidate cache if we have more permissive parameters */
> +	if (!free_vmap_cache ||
> +			size <= cached_hole_size ||
> +			vstart < cached_start ||
> +			align < cached_align) {
> +		cached_hole_size = 0;
> +		free_vmap_cache = NULL;
> +	}
> +	/* record if we encounter less permissive parameters */
> +	cached_start = vstart;
> +	cached_align = align;
> +
> +	/* find starting point for our search */
> +	if (free_vmap_cache) {
> +		first = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> +		addr = ALIGN(first->va_end + PAGE_SIZE, align);
> +		if (addr < vstart) {
> +			free_vmap_cache = NULL;
> +			goto retry;
> +		}
> +		if (addr + size - 1 < addr)
> +			goto overflow;
>  
> -	spin_lock(&vmap_area_lock);
> -	if (addr + size - 1 < addr)
> -		goto overflow;
> +	} else {
> +		addr = ALIGN(vstart, align);
> +		if (addr + size - 1 < addr)
> +			goto overflow;
>  
> -	/* XXX: could have a last_hole cache */
> -	n = vmap_area_root.rb_node;
> -	if (n) {
> -		struct vmap_area *first = NULL;
> +		n = vmap_area_root.rb_node;
> +		if (!n)
> +			goto found;
>  
> +		first = NULL;
>  		do {
>  			struct vmap_area *tmp;
>  			tmp = rb_entry(n, struct vmap_area, rb_node);
> @@ -369,26 +400,37 @@ retry:
>  		if (!first)
>  			goto found;
>  
> -		if (first->va_end < addr) {
> -			n = rb_next(&first->rb_node);
> -			if (n)
> -				first = rb_entry(n, struct vmap_area, rb_node);
> -			else
> -				goto found;
> -		}
> -
> -		while (addr + size > first->va_start && addr + size <= vend) {
> -			addr = ALIGN(first->va_end + PAGE_SIZE, align);
> +		if (first->va_start < addr) {
> +			addr = ALIGN(max(first->va_end + PAGE_SIZE, addr), align);

Frankly speaking, I don't see the benefit you mentioned of making the
subsequent logic simpler. I prefer the old code, which compares va_end.
Does the old code really have a problem in the spanning case?
I think the old code has no problem and reads better than the new one.

>  			if (addr + size - 1 < addr)
>  				goto overflow;
> -
>  			n = rb_next(&first->rb_node);
>  			if (n)
>  				first = rb_entry(n, struct vmap_area, rb_node);
>  			else
>  				goto found;
>  		}
> +		BUG_ON(first->va_start < addr);
> +		if (addr + cached_hole_size < first->va_start)
> +			cached_hole_size = first->va_start - addr;
> +	}
> +
> +	/* from the starting point, walk areas until a suitable hole is found */

Unnecessary empty line :)

> +
> +	while (addr + size > first->va_start && addr + size <= vend) {
> +		if (addr + cached_hole_size < first->va_start)
> +			cached_hole_size = first->va_start - addr;
> +		addr = ALIGN(first->va_end + PAGE_SIZE, align);
> +		if (addr + size - 1 < addr)
> +			goto overflow;
> +
> +		n = rb_next(&first->rb_node);
> +		if (n)
> +			first = rb_entry(n, struct vmap_area, rb_node);
> +		else
> +			goto found;
>  	}
> +
>  found:
>  	if (addr + size > vend) {
>  overflow:
> @@ -406,14 +448,17 @@ overflow:
>  		return ERR_PTR(-EBUSY);
>  	}
>  
> -	BUG_ON(addr & (align-1));
> -
>  	va->va_start = addr;
>  	va->va_end = addr + size;
>  	va->flags = 0;
>  	__insert_vmap_area(va);
> +	free_vmap_cache = &va->rb_node;
>  	spin_unlock(&vmap_area_lock);
>  
> +	BUG_ON(va->va_start & (align-1));
> +	BUG_ON(va->va_start < vstart);
> +	BUG_ON(va->va_end > vend);
> +
>  	return va;
>  }
>  
> @@ -427,6 +472,19 @@ static void rcu_free_va(struct rcu_head
>  static void __free_vmap_area(struct vmap_area *va)
>  {
>  	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
> +
> +	if (free_vmap_cache) {
> +		if (va->va_end < cached_start) {
> +			free_vmap_cache = NULL;
> +		} else {
> +			struct vmap_area *cache;
> +			cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> +			if (va->va_start <= cache->va_start) {
> +				free_vmap_cache = rb_prev(&va->rb_node);
> +				cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> +			}
> +		}
> +	}
>  	rb_erase(&va->rb_node, &vmap_area_root);
>  	RB_CLEAR_NODE(&va->rb_node);
>  	list_del_rcu(&va->list);

Anyway, I am looking forward to seeing Steven's experiment.
If the test shows no problem, I will redo the refactoring patch on top of yours. :)

Thanks, Nick.
-- 
Kind regards,
Minchan Kim


* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-05-25 15:00                               ` Minchan Kim
@ 2010-05-25 15:48                                 ` Steven Whitehouse
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Whitehouse @ 2010-05-25 15:48 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Nick Piggin, Andrew Morton, linux-mm, linux-kernel

Hi,

On Wed, 2010-05-26 at 00:00 +0900, Minchan Kim wrote:

> Anyway, I am looking forward to seeing Steven's experiment.
> If the test shows no problem, I will redo the refactoring patch on top of yours. :)
> 
> Thanks, Nick.

I gather that it might be a couple of days before our tester can run the
tests as he is busy with something else at the moment. I'll get back to
you as soon as I can. Apologies for the delay in testing,

Steve.




* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
       [not found] <118879818.966271277501902144.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
@ 2010-06-25 21:40 ` Bob Peterson
  2010-06-26  8:35   ` Nick Piggin
  0 siblings, 1 reply; 14+ messages in thread
From: Bob Peterson @ 2010-06-25 21:40 UTC (permalink / raw)
  To: linux-kernel, npiggin

On 2010-05-25 8:43:23, Nick Piggin wrote:
> On Wed, May 19, 2010 at 02:54:54PM +0100, Steven Whitehouse wrote:
> > so that seems to pinpoint the line on which the problem occurred. Let us
> > know if you'd like us to do some more testing. I think we have the
> > console access issue fixed now. Many thanks for all your help in this
> > so far,
> 
> Sorry for the delay. Ended up requiring a bit of surgery and several bug
> fixes. I added a lot more test cases to my userspace tester, and found
> several bugs including the one you hit.
> 
> Most of them were due to changing vstart,vend or changing requested
> alignment.
> 
> I can't guarantee it's going to work for you (it boots here, but the
> last version booted as well). But I think it's in much better shape.
> 
> It is very careful to reproduce exactly the same allocation behaviour,
> so the effectiveness of the cache can be reduced if sizes, alignments,
> or start,end ranges are very frequently changing. But I'd hope that
> for most vmap heavy workloads, they should cache quite well. We could
> look at doing smarter things if it isn't effective enough.
> 
> [patch snipped; it is identical to the one quoted in full earlier in the thread]

Hey Nick,

I spotted what seems to be a regression in this patch.
The spin_lock was moved before the retry label:

	spin_lock(&vmap_area_lock);
retry:

But later, after the overflow label, it can unlock the spin_lock and
loop back to retry without the spin_lock locked.  See:

overflow:
		spin_unlock(&vmap_area_lock);
		if (!purged) {
			purge_vmap_area_lazy();
			purged = 1;
			goto retry;

If I'm not mistaken, we should either put the spin_lock back below the
retry label or move the spin_unlock after the if (!purged) check, to
prevent accesses without the spin_lock held.
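
For example, the first option would look roughly like this (an untested
sketch, just to illustrate the idea):

retry:
	spin_lock(&vmap_area_lock);
	...
found:
	if (addr + size > vend) {
overflow:
		spin_unlock(&vmap_area_lock);
		if (!purged) {
			purge_vmap_area_lazy();
			purged = 1;
			goto retry;	/* the lock is re-taken at the top */
		}
		...
	}

That way every path reaching retry takes the lock itself, and the overflow
path can still drop it before calling purge_vmap_area_lazy(), which I assume
cannot run with vmap_area_lock held anyway.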

Regards,

Bob Peterson
Red Hat File Systems


* Re: [PATCH] cache last free vmap_area to avoid restarting beginning
  2010-06-25 21:40 ` [PATCH] cache last free vmap_area to avoid restarting beginning Bob Peterson
@ 2010-06-26  8:35   ` Nick Piggin
  0 siblings, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2010-06-26  8:35 UTC (permalink / raw)
  To: Bob Peterson; +Cc: linux-kernel

On Fri, Jun 25, 2010 at 05:40:22PM -0400, Bob Peterson wrote:
> On 2010-05-25 8:43:23, Nick Piggin wrote:
> Hey Nick,
> 
> I spotted what seems to be a regression in this patch.
> The spin_lock was moved before the retry label:
> 
> 	spin_lock(&vmap_area_lock);
> retry:
> 
> But later, after the overflow label, it can unlock the spin_lock and
> loop back to retry without the spin_lock locked.  See:
> 
> overflow:
> 		spin_unlock(&vmap_area_lock);
> 		if (!purged) {
> 			purge_vmap_area_lazy();
> 			purged = 1;
> 			goto retry;
> 
> If I'm not mistaken we should either put the spin_lock back below the
> retry label or move the spin_unlock after the if (!purged) to prevent
> accesses without the spin_lock held.
> 
> Regards,
> 
> Bob Peterson
> Red Hat File Systems

Hi Bob,

Thanks for having a look. You are right; I think Minchan spotted this
too, and the version in Andrew Morton's tree should have that fixed.

http://userweb.kernel.org/~akpm/mmotm/broken-out/mm-vmap-area-cache.patch

Thanks,
Nick



end of thread

Thread overview: 14+ messages
     [not found] <118879818.966271277501902144.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-06-25 21:40 ` [PATCH] cache last free vmap_area to avoid restarting beginning Bob Peterson
2010-06-26  8:35   ` Nick Piggin
2010-04-12 16:27 vmalloc performance Steven Whitehouse
2010-04-14 12:49 ` Steven Whitehouse
2010-04-14 15:13   ` Minchan Kim
2010-04-14 16:35     ` Minchan Kim
2010-04-15  8:33       ` Steven Whitehouse
2010-04-15 16:51         ` Minchan Kim
2010-04-16 14:10           ` Steven Whitehouse
2010-04-18 15:14             ` Minchan Kim
2010-04-19 12:58               ` Steven Whitehouse
2010-04-19 14:12                 ` Minchan Kim
2010-04-29 13:43                   ` Steven Whitehouse
2010-05-02 17:29                     ` [PATCH] cache last free vmap_area to avoid restarting beginning Minchan Kim
2010-05-05 12:48                       ` Steven Whitehouse
2010-05-05 16:16                         ` Nick Piggin
2010-05-17 12:42                           ` Steven Whitehouse
2010-05-18 13:44                             ` Steven Whitehouse
2010-05-19 13:54                           ` Steven Whitehouse
2010-05-19 13:56                             ` Nick Piggin
2010-05-25  8:43                             ` Nick Piggin
2010-05-25 15:00                               ` Minchan Kim
2010-05-25 15:48                                 ` Steven Whitehouse
2010-05-22  9:53                           ` Minchan Kim
2010-05-24  6:23                             ` Nick Piggin
