* Huge perf degradation from missing xen_tlb_flush_all
@ 2012-10-26 22:43 Mukesh Rathor
2012-10-26 22:58 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 3+ messages in thread
From: Mukesh Rathor @ 2012-10-26 22:43 UTC (permalink / raw)
To: Xen-devel@lists.xensource.com, Konrad Rzeszutek Wilk,
david.vrabel
Hi,
A customer experienced a huge degradation in migration performance when
moving from a 2.6.32-based dom0 to a 2.6.39-based dom0. We tracked it down
to the missing xen_tlb_flush_all() in the 2.6.39/pv-ops kernel.
To summarize, in 2.6.32, we had
#define flush_tlb_all xen_tlb_flush_all
As a result, when xen_remap_domain_mfn_range() called flush_tlb_all(), it
made a hypercall to Xen:
void xen_tlb_flush_all(void)
{
        struct mmuext_op op;

        op.cmd = MMUEXT_TLB_FLUSH_ALL;
        BUG_ON(HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF) < 0);
}
Xen optimized the IPI to only the relevant CPUs. But in the pvops/2.6.39
kernel, flush_tlb_all() will IPI each VCPU whether it's running or not:
void flush_tlb_all(void)
{
        on_each_cpu(do_flush_tlb_all, NULL, 1);
}
This results in each VCPU being scheduled at least long enough to receive
the event channel. With a large number of VCPUs, the overhead is significant.
It seems the best solution would be to restore xen_tlb_flush_all().
Thoughts?
thanks
Mukesh
* Re: Huge perf degradation from missing xen_tlb_flush_all
From: Konrad Rzeszutek Wilk @ 2012-10-26 22:58 UTC (permalink / raw)
To: Mukesh Rathor; +Cc: Xen-devel@lists.xensource.com, david.vrabel
On Fri, Oct 26, 2012 at 03:43:11PM -0700, Mukesh Rathor wrote:
> Hi,
>
> A customer experienced a huge degradation in migration performance when
> moving from a 2.6.32-based dom0 to a 2.6.39-based dom0. We tracked it down
> to the missing xen_tlb_flush_all() in the 2.6.39/pv-ops kernel.
>
> To summarize, in 2.6.32, we had
>
> #define flush_tlb_all xen_tlb_flush_all
>
> As a result, when xen_remap_domain_mfn_range() called flush_tlb_all(), it
> made a hypercall to Xen:
>
> void xen_tlb_flush_all(void)
> {
>         struct mmuext_op op;
>
>         op.cmd = MMUEXT_TLB_FLUSH_ALL;
>         BUG_ON(HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF) < 0);
> }
>
> Xen optimized the IPI to only the relevant CPUs. But in the pvops/2.6.39
> kernel, flush_tlb_all() will IPI each VCPU whether it's running or not:
>
> void flush_tlb_all(void)
> {
>         on_each_cpu(do_flush_tlb_all, NULL, 1);
> }
>
> This results in each VCPU being scheduled at least long enough to receive
> the event channel. With a large number of VCPUs, the overhead is significant.
>
> It seems the best solution would be to restore xen_tlb_flush_all().
>
> Thoughts?
Like this I presume (not compile tested):
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 6226c99..dd91c3c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1288,6 +1288,23 @@ unsigned long xen_read_cr2_direct(void)
 	return this_cpu_read(xen_vcpu_info.arch.cr2);
 }
 
+void xen_flush_tlb_all(void)
+{
+	struct mmuext_op *op;
+	struct multicall_space mcs;
+
+	preempt_disable();
+
+	mcs = xen_mc_entry(sizeof(*op));
+
+	op = mcs.args;
+	op->cmd = MMUEXT_TLB_FLUSH_ALL;
+	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
+
+	xen_mc_issue(PARAVIRT_LAZY_MMU);
+
+	preempt_enable();
+}
 static void xen_flush_tlb(void)
 {
 	struct mmuext_op *op;
@@ -2518,7 +2535,7 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 	err = 0;
 out:
-	flush_tlb_all();
+	xen_flush_tlb_all();
 	return err;
 }
>
> thanks
> Mukesh
* Re: Huge perf degradation from missing xen_tlb_flush_all
From: Mukesh Rathor @ 2012-10-27 0:02 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: Xen-devel@lists.xensource.com, david.vrabel
On Fri, 26 Oct 2012 18:58:13 -0400
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Fri, Oct 26, 2012 at 03:43:11PM -0700, Mukesh Rathor wrote:
> > Hi,
> >
> Like this I presume (not compile tested):
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 6226c99..dd91c3c 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1288,6 +1288,23 @@ unsigned long xen_read_cr2_direct(void)
>  	return this_cpu_read(xen_vcpu_info.arch.cr2);
>  }
> 
> +void xen_flush_tlb_all(void)
> +{
> +	struct mmuext_op *op;
> +	struct multicall_space mcs;
> +
> +	preempt_disable();
> +
> +	mcs = xen_mc_entry(sizeof(*op));
> +
> +	op = mcs.args;
> +	op->cmd = MMUEXT_TLB_FLUSH_ALL;
> +	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
> +
> +	xen_mc_issue(PARAVIRT_LAZY_MMU);
> +
> +	preempt_enable();
> +}
> static void xen_flush_tlb(void)
> {
> 	struct mmuext_op *op;
> @@ -2518,7 +2535,7 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>  	err = 0;
>  out:
> 
> -	flush_tlb_all();
> +	xen_flush_tlb_all();
We should also examine the other places flush_tlb_all() is called from.