From: Minchan Kim <minchan.kim@gmail.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cache last free vmap_area to avoid restarting beginning
Date: Wed, 26 May 2010 00:00:38 +0900
Message-ID: <20100525150038.GA3227@barrios-desktop>
In-Reply-To: <20100525084323.GG5087@laptop>
On Tue, May 25, 2010 at 06:43:23PM +1000, Nick Piggin wrote:
> On Wed, May 19, 2010 at 02:54:54PM +0100, Steven Whitehouse wrote:
> > so that seems to pinpoint the line on which the problem occurred. Let us
> > know if you'd like us to do some more testing. I think we have the
> > console access issue fixed now. Many thanks for all your help in this
> > so far,
>
> Sorry for the delay. Ended up requiring a bit of surgery and several bug
> fixes. I added a lot more test cases to my userspace tester, and found
> several bugs including the one you hit.
>
> Most of them were due to changing vstart,vend or changing requested
> alignment.
>
> I can't guarantee it's going to work for you (it boots here, but the
> last version booted as well). But I think it's in much better shape.
>
> It is very careful to reproduce exactly the same allocation behaviour,
> so the effectiveness of the cache can be reduced if sizes, alignments,
> or start,end ranges are very frequently changing. But I'd hope that
> for most vmap heavy workloads, they should cache quite well. We could
> look at doing smarter things if it isn't effective enough.
>
> --
> Provide a free area cache for the vmalloc virtual address allocator, based
> on the approach taken in the user virtual memory allocator.
>
> This reduces the number of rbtree operations and linear traversals over
> the vmap extents to find a free area. The lazy vmap flushing makes this problem
> worse because freed but not yet flushed vmaps tend to build up in
> the address space between flushes.
>
> Steven noticed a performance problem with GFS2. Results are as follows...
>
>
>
> mm/vmalloc.c | 49 +++++++++++++++++++++++++++++++++++--------------
> 1 files changed, 35 insertions(+), 14 deletions(-)
>
> Index: linux-2.6/mm/vmalloc.c
> ===================================================================
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -262,8 +262,14 @@ struct vmap_area {
> };
>
> static DEFINE_SPINLOCK(vmap_area_lock);
> -static struct rb_root vmap_area_root = RB_ROOT;
> static LIST_HEAD(vmap_area_list);
> +static struct rb_root vmap_area_root = RB_ROOT;
> +
> +static struct rb_node *free_vmap_cache;
> +static unsigned long cached_hole_size;
> +static unsigned long cached_start;
> +static unsigned long cached_align;
> +
> static unsigned long vmap_area_pcpu_hole;
>
> static struct vmap_area *__find_vmap_area(unsigned long addr)
> @@ -332,27 +338,52 @@ static struct vmap_area *alloc_vmap_area
> struct rb_node *n;
> unsigned long addr;
> int purged = 0;
> + struct vmap_area *first;
>
> BUG_ON(!size);
> BUG_ON(size & ~PAGE_MASK);
> + BUG_ON(!is_power_of_2(align));
>
> va = kmalloc_node(sizeof(struct vmap_area),
> gfp_mask & GFP_RECLAIM_MASK, node);
> if (unlikely(!va))
> return ERR_PTR(-ENOMEM);
>
> + spin_lock(&vmap_area_lock);
vmap_area_lock is unbalanced with the final spin_unlock in the overflow case.
The overflow path drops the lock before it jumps back to retry, so maybe the
lock should be taken after the retry label instead of before it.
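A minimal userspace sketch of the control flow in question, not the kernel
code itself. It models only lock balance: a plain counter stands in for
vmap_area_lock, and the names lock_area(), unlock_area() and alloc_model()
are hypothetical. The purge-and-retry step in the overflow path is assumed
from the `purged` flag, since that part of the function is not shown in the
diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for vmap_area_lock: depth must only ever be 0 or 1. */
static int lock_depth;
static bool depth_ok = true;

static void lock_area(void)
{
	if (++lock_depth != 1)
		depth_ok = false;	/* taken twice without a release */
}

static void unlock_area(void)
{
	if (--lock_depth != 0)
		depth_ok = false;	/* released without being held */
}

/*
 * Models the retry flow only. The first pass always "overflows"; the
 * second pass succeeds if the (assumed) purge made space.
 * Returns 0 on success, -1 as a stand-in for ERR_PTR(-EBUSY).
 */
static int alloc_model(bool space_after_purge)
{
	bool purged = false;
	bool have_space = false;

retry:
	lock_area();		/* after the label: every pass holds the lock */
	if (!have_space) {
		/* overflow path: drop the lock before purging and retrying */
		unlock_area();
		if (!purged) {
			purged = true;
			have_space = space_after_purge;	/* assumed purge step */
			goto retry;
		}
		return -1;
	}
	unlock_area();
	return 0;
}
```

With the lock taken before the label instead, the second pass in this model
would run without the lock and its unlock would trip the depth check.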
> retry:
> - addr = ALIGN(vstart, align);
> + /* invalidate cache if we have more permissive parameters */
> + if (!free_vmap_cache ||
> + size <= cached_hole_size ||
> + vstart < cached_start ||
> + align < cached_align) {
> + cached_hole_size = 0;
> + free_vmap_cache = NULL;
> + }
> + /* record if we encounter less permissive parameters */
> + cached_start = vstart;
> + cached_align = align;
> +
> + /* find starting point for our search */
> + if (free_vmap_cache) {
> + first = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> + addr = ALIGN(first->va_end + PAGE_SIZE, align);
> + if (addr < vstart) {
> + free_vmap_cache = NULL;
> + goto retry;
> + }
> + if (addr + size - 1 < addr)
> + goto overflow;
>
> - spin_lock(&vmap_area_lock);
> - if (addr + size - 1 < addr)
> - goto overflow;
> + } else {
> + addr = ALIGN(vstart, align);
> + if (addr + size - 1 < addr)
> + goto overflow;
>
> - /* XXX: could have a last_hole cache */
> - n = vmap_area_root.rb_node;
> - if (n) {
> - struct vmap_area *first = NULL;
> + n = vmap_area_root.rb_node;
> + if (!n)
> + goto found;
>
> + first = NULL;
> do {
> struct vmap_area *tmp;
> tmp = rb_entry(n, struct vmap_area, rb_node);
> @@ -369,26 +400,37 @@ retry:
> if (!first)
> goto found;
>
> - if (first->va_end < addr) {
> - n = rb_next(&first->rb_node);
> - if (n)
> - first = rb_entry(n, struct vmap_area, rb_node);
> - else
> - goto found;
> - }
> -
> - while (addr + size > first->va_start && addr + size <= vend) {
> - addr = ALIGN(first->va_end + PAGE_SIZE, align);
> + if (first->va_start < addr) {
> + addr = ALIGN(max(first->va_end + PAGE_SIZE, addr), align);
Frankly speaking, I don't see the benefit you mentioned, that it makes the
subsequent logic simpler. I prefer the old code, which compares va_end.
Does the old code have a problem in the spanning case?
I think the old code has no problem and looks better than the current one.
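To make the spanning question concrete, here is a simplified userspace
sketch that runs both forms of the check over a sorted array instead of the
rbtree. It is an illustration under stated assumptions, not a proof of
equivalence: search_old(), search_new(), struct area and ALIGN_UP are
hypothetical names, `first` is passed in as an index rather than found by
the tree walk, and the wraparound checks, BUG_ON and cached_hole_size
updates are left out.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 0x1000UL
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical stand-in for struct vmap_area, kept in a sorted array. */
struct area { unsigned long va_start, va_end; };

/* Shared walk: step past areas that collide with [addr, addr + size). */
static unsigned long walk(const struct area *areas, size_t n, size_t first,
			  unsigned long addr, unsigned long size,
			  unsigned long align, unsigned long vend)
{
	while (addr + size > areas[first].va_start && addr + size <= vend) {
		addr = ALIGN_UP(areas[first].va_end + PAGE_SIZE, align);
		if (++first == n)
			break;		/* models "goto found" */
	}
	return addr;
}

/* Old form: skip `first` only when it ends below addr. */
static unsigned long search_old(const struct area *areas, size_t n, size_t first,
				unsigned long addr, unsigned long size,
				unsigned long align, unsigned long vend)
{
	if (areas[first].va_end < addr) {
		if (++first == n)
			return addr;
	}
	return walk(areas, n, first, addr, size, align, vend);
}

/* New form: if `first` starts below addr, move addr past it up front. */
static unsigned long search_new(const struct area *areas, size_t n, size_t first,
				unsigned long addr, unsigned long size,
				unsigned long align, unsigned long vend)
{
	if (areas[first].va_start < addr) {
		unsigned long end = areas[first].va_end + PAGE_SIZE;

		addr = ALIGN_UP(end > addr ? end : addr, align);
		if (++first == n)
			return addr;
	}
	return walk(areas, n, first, addr, size, align, vend);
}
```

For an area [0x1000, 0x5000) that spans addr = 0x3000, both forms in this
model step past the area and settle on the same address. Other layouts,
and the real tree walk that picks `first`, would need to be checked
separately.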
> if (addr + size - 1 < addr)
> goto overflow;
> -
> n = rb_next(&first->rb_node);
> if (n)
> first = rb_entry(n, struct vmap_area, rb_node);
> else
> goto found;
> }
> + BUG_ON(first->va_start < addr);
> + if (addr + cached_hole_size < first->va_start)
> + cached_hole_size = first->va_start - addr;
> + }
> +
> + /* from the starting point, walk areas until a suitable hole is found */
Unnecessary empty line :)
> +
> + while (addr + size > first->va_start && addr + size <= vend) {
> + if (addr + cached_hole_size < first->va_start)
> + cached_hole_size = first->va_start - addr;
> + addr = ALIGN(first->va_end + PAGE_SIZE, align);
> + if (addr + size - 1 < addr)
> + goto overflow;
> +
> + n = rb_next(&first->rb_node);
> + if (n)
> + first = rb_entry(n, struct vmap_area, rb_node);
> + else
> + goto found;
> }
> +
> found:
> if (addr + size > vend) {
> overflow:
> @@ -406,14 +448,17 @@ overflow:
> return ERR_PTR(-EBUSY);
> }
>
> - BUG_ON(addr & (align-1));
> -
> va->va_start = addr;
> va->va_end = addr + size;
> va->flags = 0;
> __insert_vmap_area(va);
> + free_vmap_cache = &va->rb_node;
> spin_unlock(&vmap_area_lock);
>
> + BUG_ON(va->va_start & (align-1));
> + BUG_ON(va->va_start < vstart);
> + BUG_ON(va->va_end > vend);
> +
> return va;
> }
>
> @@ -427,6 +472,19 @@ static void rcu_free_va(struct rcu_head
> static void __free_vmap_area(struct vmap_area *va)
> {
> BUG_ON(RB_EMPTY_NODE(&va->rb_node));
> +
> + if (free_vmap_cache) {
> + if (va->va_end < cached_start) {
> + free_vmap_cache = NULL;
> + } else {
> + struct vmap_area *cache;
> + cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> + if (va->va_start <= cache->va_start) {
> + free_vmap_cache = rb_prev(&va->rb_node);
> + cache = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
> + }
> + }
> + }
> rb_erase(&va->rb_node, &vmap_area_root);
> RB_CLEAR_NODE(&va->rb_node);
> list_del_rcu(&va->list);
Anyway, I am looking forward to seeing the results of Steven's experiment.
If the test shows no problems, I will redo my refactoring patch on top of
your patch. :)
Thanks, Nick.
--
Kind regards,
Minchan Kim