From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Bob Liu <lliubbo@gmail.com>
Cc: keir@xen.org, ian.campbell@citrix.com, andrew.cooper3@citrix.com,
jbeulich@suse.com, xen-devel@lists.xenproject.org
Subject: Re: [RFC PATCH v2] xen: free_domheap_pages: delay page scrub to idle loop
Date: Tue, 20 May 2014 09:56:17 -0400
Message-ID: <20140520135616.GD3045@localhost.localdomain>
In-Reply-To: <1400552132-18654-1-git-send-email-bob.liu@oracle.com>
On Tue, May 20, 2014 at 10:15:31AM +0800, Bob Liu wrote:
> Because of page scrubbing, destroying a domain with a large amount of
> memory is very slow.
> It took around 10 minutes to destroy a guest with nearly 1 TB of memory.
>
> [root@ca-test111 ~]# time xm des 5
> real 10m51.582s
> user 0m0.115s
> sys 0m0.039s
> [root@ca-test111 ~]#
>
> There are two ways to improve this situation.
> 1. Delay the page scrub in free_domheap_pages(), so that 'xl destroy xxx' can
> return earlier.
>
> 2. But the real scrub time doesn't improve much that way, so we should consider
> running the scrub job on all idle cpus in parallel. An obvious solution is to
> add pages to a global list during free_domheap_pages(), and then whenever a cpu
> enters idle_loop() it will try to isolate a page and scrub/free it.
> Unfortunately this solution didn't work as expected in my testing, because
> introducing a global list also means we need a lock to protect that list.
> The cost is too heavy!
You can introduce a per-cpu list which does not need a global lock.
The problem is with insertion of items into it - that would require an IPI
which would then need to take the global lock and populate the local CPU
list.
Interestingly, this is what I had been working on to convert the
tasklets into per-cpu tasklets.
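To make the per-cpu list idea concrete, here is a minimal stand-alone sketch. It is a model, not Xen code: NR_CPUS, struct page, struct scrub_list, queue_for_scrub() and scrub_local() are all hypothetical names, and it assumes each list is only ever touched by its owning CPU (which is why no lock appears). It deliberately sidesteps the cross-CPU insertion problem mentioned above by only queueing on the CPU that frees the page.

```c
/* Stand-alone model, not Xen code. All names below are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

struct page {
    struct page *next;
    int scrubbed;              /* set once the page has been "zeroed" */
};

/* One FIFO per CPU; only the owning CPU ever touches its own list. */
struct scrub_list {
    struct page *head, *tail;
};

static struct scrub_list scrub_lists[NR_CPUS];

/* Called on the freeing CPU: queue a dirty page locally, no lock taken. */
static void queue_for_scrub(unsigned int cpu, struct page *pg)
{
    struct scrub_list *l;

    assert(cpu < NR_CPUS);
    l = &scrub_lists[cpu];
    pg->next = NULL;
    if (l->tail)
        l->tail->next = pg;
    else
        l->head = pg;
    l->tail = pg;
}

/* Called from the same CPU's idle path: scrub at most 'budget' pages. */
static unsigned long scrub_local(unsigned int cpu, unsigned long budget)
{
    struct scrub_list *l;
    unsigned long done = 0;

    assert(cpu < NR_CPUS);
    l = &scrub_lists[cpu];
    while (done < budget && l->head) {
        struct page *pg = l->head;

        l->head = pg->next;
        if (!l->head)
            l->tail = NULL;
        pg->next = NULL;
        pg->scrubbed = 1;      /* stands in for scrub_one_page() */
        done++;
    }
    return done;
}
```

With this shape, pages queued on one CPU are invisible to every other CPU, so an idle CPU with an empty list does no scrubbing at all.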
>
> So I use a percpu scrub page list in this patch; the tradeoff is that we may
> not use all idle cpus. It depends on which cpu free_domheap_pages() runs on.
>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
> xen/arch/x86/domain.c | 1 +
> xen/common/page_alloc.c | 32 +++++++++++++++++++++++++++++++-
> xen/include/xen/mm.h | 1 +
> 3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 6fddd4c..f3f1260 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -119,6 +119,7 @@ static void idle_loop(void)
> (*pm_idle)();
I would actually do it _right_ before we call pm_idle. As in
right before we go into C states instead of after we have been woken up -
as being woken up means either:
a) the timer expired, so a guest needs to be woken up - and we do not
want to use its timeslice to scrub some other guest's memory.
b) an interrupt needs to be processed (do_IRQ gets called,
does its stuff, sets up the right softirq bits and then exits, which
resumes the idle loop).
> do_tasklet();
> do_softirq();
> + scrub_free_pages();
> }
> }
>
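The ordering suggested above for idle_loop() can be sketched as a small stand-alone model. This is only an illustration: the *_stub functions, step() and idle_loop_once() are hypothetical stand-ins for the real calls, and the real loop also has to deal with pending softirqs and offline CPUs, which are left out here.

```c
/* Stand-alone model of the suggested call order; not Xen code. */
#include <assert.h>
#include <string.h>

static char trace[64];         /* records the order in which steps ran */

static void step(const char *name)
{
    /* Guard the fixed-size trace buffer before appending. */
    assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
    strcat(trace, name);
    strcat(trace, " ");
}

/* Hypothetical stand-ins for the real functions in the quoted hunk. */
static void scrub_free_pages_stub(void) { step("scrub"); }
static void pm_idle_stub(void)          { step("idle"); }
static void do_tasklet_stub(void)       { step("tasklet"); }
static void do_softirq_stub(void)       { step("softirq"); }

/* One iteration of the loop body with the scrub moved ahead of pm_idle. */
static void idle_loop_once(void)
{
    scrub_free_pages_stub();   /* use the idle time before sleeping */
    pm_idle_stub();            /* enter a C state until woken */
    do_tasklet_stub();         /* woken up: handle pending work first */
    do_softirq_stub();
}
```

The point of the ordering is that anything that woke the CPU gets serviced immediately after pm_idle returns, rather than waiting behind a scrub batch.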
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 601319c..b2a0fc5 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -79,6 +79,8 @@ PAGE_LIST_HEAD(page_offlined_list);
> /* Broken page list, protected by heap_lock. */
> PAGE_LIST_HEAD(page_broken_list);
>
> +DEFINE_PER_CPU(struct page_list_head , page_scrub_list);
> +
> /*************************
> * BOOT-TIME ALLOCATOR
> */
> @@ -633,6 +635,9 @@ static struct page_info *alloc_heap_pages(
> goto found;
> } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
>
> + if ( scrub_free_pages() )
> + continue;
> +
> if ( memflags & MEMF_exact_node )
> goto not_found;
>
> @@ -1417,6 +1422,23 @@ void free_xenheap_pages(void *v, unsigned int order)
> #endif
>
>
> +unsigned long scrub_free_pages(void)
> +{
> + struct page_info *pg;
> + unsigned long nr_scrubed = 0;
> +
> + /* Scrub around 400M memory every time */
Could you mention why 400M?
> + while ( nr_scrubed < 100000 )
> + {
> + pg = page_list_remove_head( &this_cpu(page_scrub_list) );
> + if (!pg)
> + break;
> + scrub_one_page(pg);
> + free_heap_pages(pg, 0);
> + nr_scrubed++;
> + }
> + return nr_scrubed;
> +}
>
> /*************************
> * DOMAIN-HEAP SUB-ALLOCATOR
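For reference, the arithmetic behind the "around 400M" comment in the quoted scrub_free_pages() can be checked with a small sketch. It assumes 4 KiB pages (PAGE_SIZE_BYTES is a hypothetical constant for this example, as are batch_bytes() and batch_mib()); it only shows what the cap of 100000 pages amounts to, not why that value was picked.

```c
/* Stand-alone arithmetic check; the names here are hypothetical. */
#include <assert.h>
#include <limits.h>

#define PAGE_SIZE_BYTES 4096UL   /* assumption: 4 KiB x86 pages */

/* Bytes covered by one scrub batch of nr_pages pages. */
static unsigned long batch_bytes(unsigned long nr_pages)
{
    assert(nr_pages <= ULONG_MAX / PAGE_SIZE_BYTES);  /* no overflow */
    return nr_pages * PAGE_SIZE_BYTES;
}

/* The same amount in whole MiB, rounded down. */
static unsigned long batch_mib(unsigned long nr_pages)
{
    return batch_bytes(nr_pages) >> 20;
}
```

Under that assumption, batch_bytes(100000) is 409,600,000 bytes and batch_mib(100000) is 390, which matches the comment's rough figure.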
> @@ -1564,8 +1586,15 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
> * domain has died we assume responsibility for erasure.
> */
> if ( unlikely(d->is_dying) )
> + {
> + /*
> + * Add page to page_scrub_list to speed up domain destroy, those
> + * pages will be zeroed later by scrub_page_tasklet.
> + */
> for ( i = 0; i < (1 << order); i++ )
> - scrub_one_page(&pg[i]);
> + page_list_add_tail( &pg[i], &this_cpu(page_scrub_list) );
> + goto out;
> + }
>
> free_heap_pages(pg, order);
> }
> @@ -1583,6 +1612,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
> drop_dom_ref = 0;
> }
>
> +out:
> if ( drop_dom_ref )
> put_domain(d);
> }
> diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
> index b183189..3560335 100644
> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -355,6 +355,7 @@ static inline unsigned int get_order_from_pages(unsigned long nr_pages)
> }
>
> void scrub_one_page(struct page_info *);
> +unsigned long scrub_free_pages(void);
>
> int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
> domid_t foreign_domid,
> --
> 1.7.10.4
>