* [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
@ 2014-05-19 2:57 Bob Liu
2014-05-19 9:59 ` Andrew Cooper
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Bob Liu @ 2014-05-19 2:57 UTC (permalink / raw)
To: xen-devel; +Cc: keir, ian.campbell, andrew.cooper3, jbeulich, boris.ostrovsky
Because of page scrubbing, it's very slow to destroy a domain with a large
amount of memory. It took around 10 minutes to destroy a guest with nearly
1 TB of memory.
[root@ca-test111 ~]# time xm des 5
real 10m51.582s
user 0m0.115s
sys 0m0.039s
[root@ca-test111 ~]#
Using perf we can see what happened; thanks to Boris for his help and for
providing this useful tool for Xen.
[root@x4-4 bob]# perf report
22.32% xl [xen.syms] [k] page_get_owner_and_reference
20.82% xl [xen.syms] [k] relinquish_memory
20.63% xl [xen.syms] [k] put_page
17.10% xl [xen.syms] [k] scrub_one_page
4.74% xl [xen.syms] [k] unmap_domain_page
2.24% xl [xen.syms] [k] get_page
1.49% xl [xen.syms] [k] free_heap_pages
1.06% xl [xen.syms] [k] _spin_lock
0.78% xl [xen.syms] [k] __put_page_type
0.75% xl [xen.syms] [k] map_domain_page
0.57% xl [xen.syms] [k] free_page_type
0.52% xl [xen.syms] [k] is_iomem_page
0.42% xl [xen.syms] [k] free_domheap_pages
0.31% xl [xen.syms] [k] put_page_from_l1e
0.27% xl [xen.syms] [k] check_lock
0.27% xl [xen.syms] [k] __mfn_valid
This patch tries to defer scrub_one_page() to a tasklet which will be scheduled
on all online physical CPUs, so that returning from 'xl/xm destroy xxx' is much
faster.
Tested on a guest with 30G of memory.
Before this patch:
[root@x4-4 bob]# time xl des PV-30G
real 0m16.014s
user 0m0.010s
sys 0m13.976s
[root@x4-4 bob]#
After:
[root@x4-4 bob]# time xl des PV-30G
real 0m3.581s
user 0m0.003s
sys 0m1.554s
[root@x4-4 bob]#
The destroy time reduced from 16s to 3s.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
xen/common/page_alloc.c | 39 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 601319c..2ca59a1 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -79,6 +79,10 @@ PAGE_LIST_HEAD(page_offlined_list);
/* Broken page list, protected by heap_lock. */
PAGE_LIST_HEAD(page_broken_list);
+PAGE_LIST_HEAD(page_scrub_list);
+static DEFINE_SPINLOCK(scrub_list_spinlock);
+static struct tasklet scrub_page_tasklet;
+
/*************************
* BOOT-TIME ALLOCATOR
*/
@@ -1417,6 +1421,25 @@ void free_xenheap_pages(void *v, unsigned int order)
#endif
+static void scrub_free_pages(unsigned long unuse)
+{
+ struct page_info *pg;
+
+ for ( ; ; )
+ {
+ while ( page_list_empty(&page_scrub_list) )
+ cpu_relax();
+
+ spin_lock(&scrub_list_spinlock);
+ pg = page_list_remove_head(&page_scrub_list);
+ spin_unlock(&scrub_list_spinlock);
+ if (pg)
+ {
+ scrub_one_page(pg);
+ free_heap_pages(pg, 0);
+ }
+ }
+}
/*************************
* DOMAIN-HEAP SUB-ALLOCATOR
@@ -1425,6 +1448,7 @@ void free_xenheap_pages(void *v, unsigned int order)
void init_domheap_pages(paddr_t ps, paddr_t pe)
{
unsigned long smfn, emfn;
+ unsigned int cpu;
ASSERT(!in_irq());
@@ -1435,6 +1459,9 @@ void init_domheap_pages(paddr_t ps, paddr_t pe)
return;
init_heap_pages(mfn_to_page(smfn), emfn - smfn);
+ tasklet_init(&scrub_page_tasklet, scrub_free_pages, 0);
+ for_each_online_cpu(cpu)
+ tasklet_schedule_on_cpu(&scrub_page_tasklet, cpu);
}
@@ -1564,8 +1591,17 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
* domain has died we assume responsibility for erasure.
*/
if ( unlikely(d->is_dying) )
+ {
+ /*
+ * Add page to page_scrub_list to speed up domain destroy, those
+ * pages will be zeroed later by scrub_page_tasklet.
+ */
+ spin_lock(&scrub_list_spinlock);
for ( i = 0; i < (1 << order); i++ )
- scrub_one_page(&pg[i]);
+ page_list_add_tail(&pg[i], &page_scrub_list);
+ spin_unlock(&scrub_list_spinlock);
+ goto out;
+ }
free_heap_pages(pg, order);
}
@@ -1583,6 +1619,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
drop_dom_ref = 0;
}
+out:
if ( drop_dom_ref )
put_domain(d);
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
2014-05-19 2:57 [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet Bob Liu
@ 2014-05-19 9:59 ` Andrew Cooper
2014-05-19 10:10 ` Konrad Rzeszutek Wilk
2014-05-19 11:34 ` Jan Beulich
2 siblings, 0 replies; 9+ messages in thread
From: Andrew Cooper @ 2014-05-19 9:59 UTC (permalink / raw)
To: Bob Liu; +Cc: keir, ian.campbell, jbeulich, xen-devel, boris.ostrovsky
On 19/05/14 03:57, Bob Liu wrote:
> Because of page scrub, it's very slow to destroy a domain with large
> memory.
> It took around 10 minutes when destroy a guest of nearly 1 TB of memory.
>
> [root@ca-test111 ~]# time xm des 5
> real 10m51.582s
> user 0m0.115s
> sys 0m0.039s
> [root@ca-test111 ~]#
>
> Use perf we can see what happened, thanks for Boris's help and provide this
> useful tool for xen.
> [root@x4-4 bob]# perf report
> 22.32% xl [xen.syms] [k] page_get_owner_and_reference
> 20.82% xl [xen.syms] [k] relinquish_memory
> 20.63% xl [xen.syms] [k] put_page
> 17.10% xl [xen.syms] [k] scrub_one_page
> 4.74% xl [xen.syms] [k] unmap_domain_page
> 2.24% xl [xen.syms] [k] get_page
> 1.49% xl [xen.syms] [k] free_heap_pages
> 1.06% xl [xen.syms] [k] _spin_lock
> 0.78% xl [xen.syms] [k] __put_page_type
> 0.75% xl [xen.syms] [k] map_domain_page
> 0.57% xl [xen.syms] [k] free_page_type
> 0.52% xl [xen.syms] [k] is_iomem_page
> 0.42% xl [xen.syms] [k] free_domheap_pages
> 0.31% xl [xen.syms] [k] put_page_from_l1e
> 0.27% xl [xen.syms] [k] check_lock
> 0.27% xl [xen.syms] [k] __mfn_valid
>
> This patch try to delay scrub_one_page() to a tasklet which will be scheduled on
> all online physical cpus, so that it's much faster to return from 'xl/xm
> destroy xxx'.
>
> Tested on a guest with 30G memory.
> Before this patch:
> [root@x4-4 bob]# time xl des PV-30G
>
> real 0m16.014s
> user 0m0.010s
> sys 0m13.976s
> [root@x4-4 bob]#
>
> After:
> [root@x4-4 bob]# time xl des PV-30G
>
> real 0m3.581s
> user 0m0.003s
> sys 0m1.554s
> [root@x4-4 bob]#
>
> The destroy time reduced from 16s to 3s.
>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
> xen/common/page_alloc.c | 39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 601319c..2ca59a1 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -79,6 +79,10 @@ PAGE_LIST_HEAD(page_offlined_list);
> /* Broken page list, protected by heap_lock. */
> PAGE_LIST_HEAD(page_broken_list);
>
> +PAGE_LIST_HEAD(page_scrub_list);
> +static DEFINE_SPINLOCK(scrub_list_spinlock);
> +static struct tasklet scrub_page_tasklet;
> +
> /*************************
> * BOOT-TIME ALLOCATOR
> */
> @@ -1417,6 +1421,25 @@ void free_xenheap_pages(void *v, unsigned int order)
> #endif
>
>
> +static void scrub_free_pages(unsigned long unuse)
> +{
> + struct page_info *pg;
> +
> + for ( ; ; )
> + {
A tasklet function is expected to return. I don't see how this works at
all...
> + while ( page_list_empty(&page_scrub_list) )
> + cpu_relax();
> +
> + spin_lock(&scrub_list_spinlock);
> + pg = page_list_remove_head(&page_scrub_list);
> + spin_unlock(&scrub_list_spinlock);
> + if (pg)
> + {
> + scrub_one_page(pg);
> + free_heap_pages(pg, 0);
> + }
> + }
> +}
>
> /*************************
> * DOMAIN-HEAP SUB-ALLOCATOR
> @@ -1425,6 +1448,7 @@ void free_xenheap_pages(void *v, unsigned int order)
> void init_domheap_pages(paddr_t ps, paddr_t pe)
> {
> unsigned long smfn, emfn;
> + unsigned int cpu;
>
> ASSERT(!in_irq());
>
> @@ -1435,6 +1459,9 @@ void init_domheap_pages(paddr_t ps, paddr_t pe)
> return;
>
> init_heap_pages(mfn_to_page(smfn), emfn - smfn);
> + tasklet_init(&scrub_page_tasklet, scrub_free_pages, 0);
> + for_each_online_cpu(cpu)
> + tasklet_schedule_on_cpu(&scrub_page_tasklet, cpu);
So now you have an infinite loop busy-waiting, running on all CPUs in
tasklet context?
> }
>
>
> @@ -1564,8 +1591,17 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
> * domain has died we assume responsibility for erasure.
> */
> if ( unlikely(d->is_dying) )
> + {
> + /*
> + * Add page to page_scrub_list to speed up domain destroy, those
> + * pages will be zeroed later by scrub_page_tasklet.
> + */
Spaces/tabs
~Andrew
> + spin_lock(&scrub_list_spinlock);
> for ( i = 0; i < (1 << order); i++ )
> - scrub_one_page(&pg[i]);
> + page_list_add_tail(&pg[i], &page_scrub_list);
> + spin_unlock(&scrub_list_spinlock);
> + goto out;
> + }
>
> free_heap_pages(pg, order);
> }
> @@ -1583,6 +1619,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
> drop_dom_ref = 0;
> }
>
> +out:
> if ( drop_dom_ref )
> put_domain(d);
> }
* Re: [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
2014-05-19 2:57 [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet Bob Liu
2014-05-19 9:59 ` Andrew Cooper
@ 2014-05-19 10:10 ` Konrad Rzeszutek Wilk
2014-05-19 11:34 ` Jan Beulich
2 siblings, 0 replies; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-05-19 10:10 UTC (permalink / raw)
To: Bob Liu, xen-devel
Cc: keir, ian.campbell, andrew.cooper3, jbeulich, boris.ostrovsky
On May 18, 2014 10:57:56 PM EDT, Bob Liu <lliubbo@gmail.com> wrote:
>Because of page scrub, it's very slow to destroy a domain with large
>memory.
>It took around 10 minutes when destroy a guest of nearly 1 TB of
>memory.
>
>[root@ca-test111 ~]# time xm des 5
>real 10m51.582s
>user 0m0.115s
>sys 0m0.039s
>[root@ca-test111 ~]#
>
>Use perf we can see what happened, thanks for Boris's help and provide
>this
>useful tool for xen.
>[root@x4-4 bob]# perf report
>22.32% xl [xen.syms] [k] page_get_owner_and_reference
> 20.82% xl [xen.syms] [k] relinquish_memory
> 20.63% xl [xen.syms] [k] put_page
> 17.10% xl [xen.syms] [k] scrub_one_page
> 4.74% xl [xen.syms] [k] unmap_domain_page
> 2.24% xl [xen.syms] [k] get_page
> 1.49% xl [xen.syms] [k] free_heap_pages
> 1.06% xl [xen.syms] [k] _spin_lock
> 0.78% xl [xen.syms] [k] __put_page_type
> 0.75% xl [xen.syms] [k] map_domain_page
> 0.57% xl [xen.syms] [k] free_page_type
> 0.52% xl [xen.syms] [k] is_iomem_page
> 0.42% xl [xen.syms] [k] free_domheap_pages
> 0.31% xl [xen.syms] [k] put_page_from_l1e
> 0.27% xl [xen.syms] [k] check_lock
> 0.27% xl [xen.syms] [k] __mfn_valid
>
>This patch try to delay scrub_one_page() to a tasklet which will be
>scheduled on
>all online physical cpus, so that it's much faster to return from
>'xl/xm
>destroy xxx'.
Thank you for digging into this. However, a single tasklet does not run in parallel; it is only executed on one CPU at a time.
>
>Tested on a guest with 30G memory.
>Before this patch:
>[root@x4-4 bob]# time xl des PV-30G
>
>real 0m16.014s
>user 0m0.010s
>sys 0m13.976s
>[root@x4-4 bob]#
>
>After:
>[root@x4-4 bob]# time xl des PV-30G
>
>real 0m3.581s
>user 0m0.003s
>sys 0m1.554s
>[root@x4-4 bob]#
>
>The destroy time reduced from 16s to 3s.
Right. By moving the scrubbing from this function to a tasklet.
>
>Signed-off-by: Bob Liu <bob.liu@oracle.com>
>---
> xen/common/page_alloc.c | 39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
>diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>index 601319c..2ca59a1 100644
>--- a/xen/common/page_alloc.c
>+++ b/xen/common/page_alloc.c
>@@ -79,6 +79,10 @@ PAGE_LIST_HEAD(page_offlined_list);
> /* Broken page list, protected by heap_lock. */
> PAGE_LIST_HEAD(page_broken_list);
>
>+PAGE_LIST_HEAD(page_scrub_list);
>+static DEFINE_SPINLOCK(scrub_list_spinlock);
>+static struct tasklet scrub_page_tasklet;
>+
> /*************************
> * BOOT-TIME ALLOCATOR
> */
>@@ -1417,6 +1421,25 @@ void free_xenheap_pages(void *v, unsigned int
>order)
> #endif
>
>
>+static void scrub_free_pages(unsigned long unuse)
>+{
>+ struct page_info *pg;
>+
>+ for ( ; ; )
>+ {
>+ while ( page_list_empty(&page_scrub_list) )
>+ cpu_relax();
>+
>+ spin_lock(&scrub_list_spinlock);
>+ pg = page_list_remove_head(&page_scrub_list);
>+ spin_unlock(&scrub_list_spinlock);
>+ if (pg)
>+ {
>+ scrub_one_page(pg);
>+ free_heap_pages(pg, 0);
>+ }
>+ }
I fear that means you have added a work item that can run for a very long time and cause security issues (DoS to guests). The VMEXIT code, for example, checks whether a softirq needs to run and will run any pending tasklets. This means the scrubbing could now run in another guest's context and delay that guest significantly.
A couple of ideas:
- have per-CPU tasklets, one for each online CPU; they can all try to do some batched work and, if anything is left, reschedule themselves.
- if a worker detects that it is not running within the idle domain's context, have it reschedule itself for later.
- perhaps also look at having a per-CPU scrubbing list, and then feed them from a per-node list?
Thanks!
>+}
>
> /*************************
> * DOMAIN-HEAP SUB-ALLOCATOR
>@@ -1425,6 +1448,7 @@ void free_xenheap_pages(void *v, unsigned int
>order)
> void init_domheap_pages(paddr_t ps, paddr_t pe)
> {
> unsigned long smfn, emfn;
>+ unsigned int cpu;
>
> ASSERT(!in_irq());
>
>@@ -1435,6 +1459,9 @@ void init_domheap_pages(paddr_t ps, paddr_t pe)
> return;
>
> init_heap_pages(mfn_to_page(smfn), emfn - smfn);
>+ tasklet_init(&scrub_page_tasklet, scrub_free_pages, 0);
>+ for_each_online_cpu(cpu)
>+ tasklet_schedule_on_cpu(&scrub_page_tasklet, cpu);
> }
>
>
>@@ -1564,8 +1591,17 @@ void free_domheap_pages(struct page_info *pg,
>unsigned int order)
> * domain has died we assume responsibility for erasure.
> */
> if ( unlikely(d->is_dying) )
>+ {
>+ /*
>+ * Add page to page_scrub_list to speed up domain destroy,
>those
>+ * pages will be zeroed later by scrub_page_tasklet.
>+ */
>+ spin_lock(&scrub_list_spinlock);
> for ( i = 0; i < (1 << order); i++ )
>- scrub_one_page(&pg[i]);
>+ page_list_add_tail(&pg[i], &page_scrub_list);
>+ spin_unlock(&scrub_list_spinlock);
>+ goto out;
>+ }
>
> free_heap_pages(pg, order);
> }
>@@ -1583,6 +1619,7 @@ void free_domheap_pages(struct page_info *pg,
>unsigned int order)
> drop_dom_ref = 0;
> }
>
>+out:
> if ( drop_dom_ref )
> put_domain(d);
> }
* Re: [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
2014-05-19 2:57 [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet Bob Liu
2014-05-19 9:59 ` Andrew Cooper
2014-05-19 10:10 ` Konrad Rzeszutek Wilk
@ 2014-05-19 11:34 ` Jan Beulich
2014-05-20 2:14 ` Bob Liu
2 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2014-05-19 11:34 UTC (permalink / raw)
To: Bob Liu; +Cc: keir, ian.campbell, andrew.cooper3, xen-devel, boris.ostrovsky
>>> On 19.05.14 at 04:57, <lliubbo@gmail.com> wrote:
> This patch try to delay scrub_one_page() to a tasklet which will be
> scheduled on
> all online physical cpus, so that it's much faster to return from 'xl/xm
> destroy xxx'.
At the price of impacting all other guests. I think this is too simplistic
an approach. For one, I think the behavior ought to be configurable
by the admin: Deferring the scrubbing means you can't use the
memory for creating a new guest right away. And then you should
be doing this only on idle CPUs, or (with care not to introduce
security issues nor exhaustion of the DMA region) on CPUs actively
requesting memory, where the request can't be fulfilled without using
some of the not yet scrubbed memory.
And btw., 10 min of cleanup time for 1 TB seems rather much, independent
of the specific scrubber behavior - did you check whether decreasing the
rate at which relinquish_memory() calls hypercall_preempt_check()
wouldn't already reduce this by quite a bit?
Jan
* Re: [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
2014-05-19 11:34 ` Jan Beulich
@ 2014-05-20 2:14 ` Bob Liu
2014-05-20 6:27 ` Jan Beulich
0 siblings, 1 reply; 9+ messages in thread
From: Bob Liu @ 2014-05-20 2:14 UTC (permalink / raw)
To: Jan Beulich
Cc: Bob Liu, keir, ian.campbell, andrew.cooper3, xen-devel,
boris.ostrovsky
On 05/19/2014 07:34 PM, Jan Beulich wrote:
>>>> On 19.05.14 at 04:57, <lliubbo@gmail.com> wrote:
>> This patch try to delay scrub_one_page() to a tasklet which will be
>> scheduled on
>> all online physical cpus, so that it's much faster to return from 'xl/xm
>> destroy xxx'.
>
> At the price of impacting all other guests. I think this is too simplistic
> an approach. For one, I think the behavior ought to be configurable
> by the admin: Deferring the scrubbing means you can't use the
> memory for creating a new guest right away. And then you should
> be doing this only on idle CPUs, or (with care not to introduce
> security issues nor exhaustion of the DMA region) on CPUs actively
> requesting memory, where the request can't be fulfilled without using
> some of the not yet scrubbed memory.
>
> And btw., 10 min of cleanup time for 1Tb seems rather much
> independent of the specific scrubber behavior - did you check
> whether decreasing the rate at which relinquish_memory() calls
> hypercall_preempt_check() wouldn't already reduce this be quite
> a bit?
>
I tried to call hypercall_preempt_check() only every 10000 pages, but the
time didn't decrease at all.
--
Regards,
-Bob
* Re: [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
2014-05-20 2:14 ` Bob Liu
@ 2014-05-20 6:27 ` Jan Beulich
2014-05-20 7:11 ` Bob Liu
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2014-05-20 6:27 UTC (permalink / raw)
To: Bob Liu
Cc: Bob Liu, keir, ian.campbell, andrew.cooper3, xen-devel,
boris.ostrovsky
>>> On 20.05.14 at 04:14, <bob.liu@oracle.com> wrote:
> On 05/19/2014 07:34 PM, Jan Beulich wrote:
>>>>> On 19.05.14 at 04:57, <lliubbo@gmail.com> wrote:
>>> This patch try to delay scrub_one_page() to a tasklet which will be
>>> scheduled on
>>> all online physical cpus, so that it's much faster to return from 'xl/xm
>>> destroy xxx'.
>>
>> At the price of impacting all other guests. I think this is too simplistic
>> an approach. For one, I think the behavior ought to be configurable
>> by the admin: Deferring the scrubbing means you can't use the
>> memory for creating a new guest right away. And then you should
>> be doing this only on idle CPUs, or (with care not to introduce
>> security issues nor exhaustion of the DMA region) on CPUs actively
>> requesting memory, where the request can't be fulfilled without using
>> some of the not yet scrubbed memory.
>>
>> And btw., 10 min of cleanup time for 1Tb seems rather much
>> independent of the specific scrubber behavior - did you check
>> whether decreasing the rate at which relinquish_memory() calls
>> hypercall_preempt_check() wouldn't already reduce this be quite
>> a bit?
>>
>
> I tried to call hypercall_preempt_check() every 10000 page, but the time
> didn't get any reduced.
So if you have the system scrub 1Tb at boot (via suitable
dom0_mem=), how long does that take?
Jan
* Re: [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
2014-05-20 6:27 ` Jan Beulich
@ 2014-05-20 7:11 ` Bob Liu
2014-05-20 7:26 ` Jan Beulich
0 siblings, 1 reply; 9+ messages in thread
From: Bob Liu @ 2014-05-20 7:11 UTC (permalink / raw)
To: Jan Beulich
Cc: Bob Liu, keir, ian.campbell, andrew.cooper3, xen-devel,
boris.ostrovsky
On 05/20/2014 02:27 PM, Jan Beulich wrote:
>>>> On 20.05.14 at 04:14, <bob.liu@oracle.com> wrote:
>
>> On 05/19/2014 07:34 PM, Jan Beulich wrote:
>>>>>> On 19.05.14 at 04:57, <lliubbo@gmail.com> wrote:
>>>> This patch try to delay scrub_one_page() to a tasklet which will be
>>>> scheduled on
>>>> all online physical cpus, so that it's much faster to return from 'xl/xm
>>>> destroy xxx'.
>>>
>>> At the price of impacting all other guests. I think this is too simplistic
>>> an approach. For one, I think the behavior ought to be configurable
>>> by the admin: Deferring the scrubbing means you can't use the
>>> memory for creating a new guest right away. And then you should
>>> be doing this only on idle CPUs, or (with care not to introduce
>>> security issues nor exhaustion of the DMA region) on CPUs actively
>>> requesting memory, where the request can't be fulfilled without using
>>> some of the not yet scrubbed memory.
>>>
>>> And btw., 10 min of cleanup time for 1Tb seems rather much
>>> independent of the specific scrubber behavior - did you check
>>> whether decreasing the rate at which relinquish_memory() calls
>>> hypercall_preempt_check() wouldn't already reduce this be quite
>>> a bit?
>>>
>>
>> I tried to call hypercall_preempt_check() every 10000 page, but the time
>> didn't get any reduced.
>
> So if you have the system scrub 1Tb at boot (via suitable
> dom0_mem=), how long does that take?
>
I only have a 32G machine; the 1 TB case was reported by our testing engineer.
On the 32G machine, if I set dom0_mem=2G, the scrub time in "(XEN) Scrubbing
Free RAM:" is around 12s at boot.
The xl destroy time for a 30G guest is always around 15s, even after
decreasing the rate of calling hypercall_preempt_check().
--
Regards,
-Bob
* Re: [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
2014-05-20 7:11 ` Bob Liu
@ 2014-05-20 7:26 ` Jan Beulich
2014-05-20 8:14 ` Bob Liu
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2014-05-20 7:26 UTC (permalink / raw)
To: Bob Liu
Cc: Bob Liu, keir, ian.campbell, andrew.cooper3, xen-devel,
boris.ostrovsky
>>> On 20.05.14 at 09:11, <bob.liu@oracle.com> wrote:
> On 05/20/2014 02:27 PM, Jan Beulich wrote:
>> So if you have the system scrub 1Tb at boot (via suitable
>> dom0_mem=), how long does that take?
>>
>
> I only have a 32G machine, the 1Tb bug was reported by our testing engineer.
>
> On 32G machine, if set dom0_mem=2G the scrub time in "(XEN) Scrubbing
> Free RAM:" is around 12s at boot.
>
> The xl destroy time for a 30G guest is always around 15s even decreased
> the rate of calling hypercall_preempt_check().
Okay, so these numbers at least appear to correlate. And in fact I
think 3Gb/s (approximated) isn't that unreasonable a number; at
least it's not orders of magnitude away from theoretical bandwidth.
Which means yes, better dealing with the load resulting from the
post-guest-death scrubbing would be desirable, but otoh it's also
not really unexpected for this to take minutes for huge guests. Any
change here clearly needs proper judgment between latency and the
effect it has on other guests: As said previously, impacting all
other guests just so that the scrubbing gets done quickly doesn't
seem right either.
Jan
* Re: [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
2014-05-20 7:26 ` Jan Beulich
@ 2014-05-20 8:14 ` Bob Liu
0 siblings, 0 replies; 9+ messages in thread
From: Bob Liu @ 2014-05-20 8:14 UTC (permalink / raw)
To: Jan Beulich
Cc: Bob Liu, keir, ian.campbell, andrew.cooper3, xen-devel,
boris.ostrovsky
On 05/20/2014 03:26 PM, Jan Beulich wrote:
>>>> On 20.05.14 at 09:11, <bob.liu@oracle.com> wrote:
>> On 05/20/2014 02:27 PM, Jan Beulich wrote:
>>> So if you have the system scrub 1Tb at boot (via suitable
>>> dom0_mem=), how long does that take?
>>>
>>
>> I only have a 32G machine, the 1Tb bug was reported by our testing engineer.
>>
>> On 32G machine, if set dom0_mem=2G the scrub time in "(XEN) Scrubbing
>> Free RAM:" is around 12s at boot.
>>
>> The xl destroy time for a 30G guest is always around 15s even decreased
>> the rate of calling hypercall_preempt_check().
>
> Okay, so these numbers at least appear to correlate. And in fact I
> think 3Gb/s (approximated) isn't that unreasonable a number; at
> least it's not orders of magnitude away from theoretical bandwidth.
>
> Which means yes, better dealing with the load resulting from the
> post-guest-death scrubbing would be desirable, but otoh it's also
> not really unexpected for this taking minutes for huge guests. Any
> change here clearly need proper judgment between latency and
> the effect on other guests it has: As said previously, impacting all
> other guests just so that the scrubbing would get done quickly
> doesn't seem right either.
>
Yes, so I have sent out a new version mainly based on your suggestions,
with the title "[RFC PATCH v2] xen: free_domheap_pages: delay page scrub
to idle loop".
Pages are added to a per-CPU scrub list in free_domheap_pages(), and the
real scrub work is done in idle_loop(). This way, no scrub work is
assigned to an unrelated CPU which never executes free_domheap_pages().
The trade-off is that we can't use all CPU resources to do the scrub job
in parallel.
But at least we achieve:
1. xl destroy returns faster, ~3s for a 30G guest.
2. Doing the scrub job in idle_loop() is still faster than in
relinquish_memory(), because e.g. there are some atomic instructions in
relinquish_memory() on every loop iteration.
Please take a review.
Thanks,
-Bob