xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] xen/arm: Trap and yield on WFE instructions
@ 2014-07-16 10:32 Anup Patel
  2014-07-16 14:49 ` Ian Campbell
  0 siblings, 1 reply; 3+ messages in thread
From: Anup Patel @ 2014-07-16 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian.Campbell, Anup Patel, stefano.stabellini, patches,
	stefano.stabellini, Pranavkumar Sawargaonkar

If we have a Guest/DomU with two or more of its VCPUs running
on same host CPU then it can quite likely happen that these
VCPUs fight for same spinlock and one of them will waste CPU
cycles in WFE instruction. This patch makes WFE instruction
trap for VCPU and forces VCPU to yield its timeslice.

The KVM ARM/ARM64 also does similar thing for handling WFE
instructions. (Please refer,
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-November/006259.html)

In general, this patch is more of an optimization for an
oversubscribed system having number of VCPUs more than
underlying host CPUs.

Changes since V1:
 - Added separate member in union hsr for decoding WFI/WFE
   related info.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Tested-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
---
 xen/arch/arm/traps.c            |   27 ++++++++++++++++-----------
 xen/include/asm-arm/processor.h |    9 +++++++++
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 686d8b7..632b8ea 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -90,7 +90,7 @@ void __cpuinit init_traps(void)
 
     /* Setup hypervisor traps */
     WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
-                 HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP, HCR_EL2);
+                 HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP, HCR_EL2);
     isb();
 }
 
@@ -1803,16 +1803,21 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs)
             advance_pc(regs, hsr);
             return;
         }
-        /* at the moment we only trap WFI */
-        vcpu_block();
-        /* The ARM spec declares that even if local irqs are masked in
-         * the CPSR register, an irq should wake up a cpu from WFI anyway.
-         * For this reason we need to check for irqs that need delivery,
-         * ignoring the CPSR register, *after* calling SCHEDOP_block to
-         * avoid races with vgic_vcpu_inject_irq.
-         */
-        if ( local_events_need_delivery_nomask() )
-            vcpu_unblock(current);
+        if ( hsr.wfi_wfe.ti ) {
+            /* Yield the VCPU for WFE */
+            vcpu_force_reschedule(current);
+        } else {
+            /* Block the VCPU for WFI */
+            vcpu_block();
+            /* The ARM spec declares that even if local irqs are masked in
+             * the CPSR register, an irq should wake up a cpu from WFI anyway.
+             * For this reason we need to check for irqs that need delivery,
+             * ignoring the CPSR register, *after* calling SCHEDOP_block to
+             * avoid races with vgic_vcpu_inject_irq.
+             */
+            if ( local_events_need_delivery_nomask() )
+                vcpu_unblock(current);
+        }
         advance_pc(regs, hsr);
         break;
     case HSR_EC_CP15_32:
diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
index bdfff4e..cd1db4d 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -276,6 +276,15 @@ union hsr {
         unsigned long ec:6;    /* Exception Class */
     } cond;
 
+    struct hsr_wfi_wfe {
+	unsigned long ti:1;    /* Trapped instruction */
+        unsigned long sbzp:19;
+        unsigned long cc:4;    /* Condition Code */
+        unsigned long ccvalid:1;/* CC Valid */
+        unsigned long len:1;   /* Instruction length */
+        unsigned long ec:6;    /* Exception Class */
+    } wfi_wfe;
+
     /* reg, reg0, reg1 are 4 bits on AArch32, the fifth bit is sbzp. */
     struct hsr_cp32 {
         unsigned long read:1;  /* Direction */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] xen/arm: Trap and yield on WFE instructions
  2014-07-16 10:32 [PATCH v2] xen/arm: Trap and yield on WFE instructions Anup Patel
@ 2014-07-16 14:49 ` Ian Campbell
  2014-07-22 17:58   ` Julien Grall
  0 siblings, 1 reply; 3+ messages in thread
From: Ian Campbell @ 2014-07-16 14:49 UTC (permalink / raw)
  To: Anup Patel
  Cc: Pranavkumar Sawargaonkar, patches, stefano.stabellini,
	stefano.stabellini, xen-devel


On Wed, 2014-07-16 at 16:02 +0530, Anup Patel wrote:
> If we have a Guest/DomU with two or more of its VCPUs running
> on same host CPU then it can quite likely happen that these
> VCPUs fight for same spinlock and one of them will waste CPU
> cycles in WFE instruction. This patch makes WFE instruction
> trap for VCPU and forces VCPU to yield its timeslice.
> 
> The KVM ARM/ARM64 also does similar thing for handling WFE
> instructions. (Please refer,
> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-November/006259.html)
> 
> In general, this patch is more of an optimization for an
> oversubscribed system having number of VCPUs more than
> underlying host CPUs.
> 
> Changes since V1:
>  - Added separate member in union hsr for decoding WFI/WFE
>    related info.
> 
> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> Tested-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>

Acked + applied. There was a conflict with "[PATCH v4 1/2] xen/arm :
Adding helper function for WFI" which I just applied before it. I fixed
it up and the result is below, please check it is ok.

I also nuked a hard tab which had snuck in.

Ian.

commit af82c49116c7bf6857be6bf6b56094b9eb2ef012
Author: Anup Patel <anup.patel@linaro.org>
Date:   Wed Jul 16 16:02:15 2014 +0530

    xen/arm: Trap and yield on WFE instructions
    
    If we have a Guest/DomU with two or more of its VCPUs running
    on same host CPU then it can quite likely happen that these
    VCPUs fight for same spinlock and one of them will waste CPU
    cycles in WFE instruction. This patch makes WFE instruction
    trap for VCPU and forces VCPU to yield its timeslice.
    
    The KVM ARM/ARM64 also does similar thing for handling WFE
    instructions. (Please refer,

https://lists.cs.columbia.edu/pipermail/kvmarm/2013-November/006259.html)
    
    In general, this patch is more of an optimization for an
    oversubscribed system having number of VCPUs more than
    underlying host CPUs.
    
    Signed-off-by: Anup Patel <anup.patel@linaro.org>
    Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
    Tested-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
    [ ijc -- resolved conflict with "Adding helper function for WFI",
             nuked stray hard tab ]

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 5e4c837..3dfabd0 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -90,7 +90,7 @@ void __cpuinit init_traps(void)
 
     /* Setup hypervisor traps */
     WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
-                 HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP, HCR_EL2);
+                 HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP,
HCR_EL2);
     isb();
 }
 
@@ -1803,8 +1803,13 @@ asmlinkage void do_trap_hypervisor(struct
cpu_user_regs *regs)
             advance_pc(regs, hsr);
             return;
         }
-        /* at the moment we only trap WFI */
-        vcpu_block_unless_event_pending(current);
+        if ( hsr.wfi_wfe.ti ) {
+            /* Yield the VCPU for WFE */
+            vcpu_force_reschedule(current);
+        } else {
+            /* Block the VCPU for WFI */
+            vcpu_block_unless_event_pending(current);
+        }
         advance_pc(regs, hsr);
         break;
     case HSR_EC_CP15_32:
diff --git a/xen/include/asm-arm/processor.h
b/xen/include/asm-arm/processor.h
index bdfff4e..9d230f3 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -276,6 +276,15 @@ union hsr {
         unsigned long ec:6;    /* Exception Class */
     } cond;
 
+    struct hsr_wfi_wfe {
+        unsigned long ti:1;    /* Trapped instruction */
+        unsigned long sbzp:19;
+        unsigned long cc:4;    /* Condition Code */
+        unsigned long ccvalid:1;/* CC Valid */
+        unsigned long len:1;   /* Instruction length */
+        unsigned long ec:6;    /* Exception Class */
+    } wfi_wfe;
+
     /* reg, reg0, reg1 are 4 bits on AArch32, the fifth bit is sbzp. */
     struct hsr_cp32 {
         unsigned long read:1;  /* Direction */

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] xen/arm: Trap and yield on WFE instructions
  2014-07-16 14:49 ` Ian Campbell
@ 2014-07-22 17:58   ` Julien Grall
  0 siblings, 0 replies; 3+ messages in thread
From: Julien Grall @ 2014-07-22 17:58 UTC (permalink / raw)
  To: Ian Campbell, Anup Patel
  Cc: stefano.stabellini, xen-devel, stefano.stabellini, patches,
	Pranavkumar Sawargaonkar

Hi Anup and Ian,

On 07/16/2014 03:49 PM, Ian Campbell wrote:
> 
> On Wed, 2014-07-16 at 16:02 +0530, Anup Patel wrote:
>> If we have a Guest/DomU with two or more of its VCPUs running
>> on same host CPU then it can quite likely happen that these
>> VCPUs fight for same spinlock and one of them will waste CPU
>> cycles in WFE instruction. This patch makes WFE instruction
>> trap for VCPU and forces VCPU to yield its timeslice.
>>
>> The KVM ARM/ARM64 also does similar thing for handling WFE
>> instructions. (Please refer,
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-November/006259.html)
>>
>> In general, this patch is more of an optimization for an
>> oversubscribed system having number of VCPUs more than
>> underlying host CPUs.
>>
>> Changes since V1:
>>  - Added separate member in union hsr for decoding WFI/WFE
>>    related info.
>>
>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>> Tested-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> 
> Acked + applied. There was a conflict with "[PATCH v4 1/2] xen/arm :
> Adding helper function for WFI" which I just applied before it. I fixed
> it up and the result is below, please check it is ok.

This patch make Xen unstable on some platform ARM32. hackbench will
crash after few minutes on the guest.

Test hackbench on guest
Running in threaded mode with 10 groups using 40 file descriptors each
(== 400 tasks)
Each sender will pass 100 messages of 100 bytes
Time: 1.135
Running in process mode with 10 groups using 40 file descriptors each
(== 400 tasks)
Each sender will pass 100 messages of 100 bytes
Time: 1.056
Running in threaded mode with 10 groups using 40 file descriptors each
(== 400 tasks)
Each sender will pass 10000 messages of 100 bytes
Time: 105.583
Running in process mode with 10 groups using 40 file descriptors each
(== 400 tasks)
Each sender will pass 10000 messages of 100 bytes
*** Error in `/usr/bin/hackbench': free(): invalid pointer: 0x016831c0 ***
*** Error in `/usr/bin/hackbench': free(): invalid pointer: 0x01682fe0 ***
SENDER: write (error: Connection reset by peer)
SENDER: write (error: Broken pipe)
SENDER: write (error: Broken pipe)
SENDER: write (error: Broken pipe)

This has been tested on midway via Linaro CI loop (failing [1] and
working [2]). For the moment I'm not able to reproduce locally on my
midway node.

I checked Linux ARM32 KVM code and there is no specific change from the
ARM64 implementation.

AFAIU the commit message, this patch doesn't fix any issue but improve
performance. Hence, from the Anup's mail on V1:

"I did not try any benchmarks myself but hackbench shows good
improvement for KVM hence it is a worthy optimization for Xen too.

I found this change missing for Xen hence this patch."

So nobody try to exercise this patch via a benchmark or other things on
Xen ARM{64,32}... Unless someone has an idea how to fix the memory
corruption quickly, I request to revert this patch. I prefer a stable
Xen rather a fast one.

Regards,

[1] based on commit af82c49
https://validation.linaro.org/dashboard/permalink/bundle/ca2c130a5ccdfbb00bb1dec51f50bfdd870588e2/

[2] based on commit c047211
https://validation.linaro.org/dashboard/permalink/bundle/d7fef09bb3b7cbdcfc546c0759b209831bde0cb7/

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-07-22 17:58 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-16 10:32 [PATCH v2] xen/arm: Trap and yield on WFE instructions Anup Patel
2014-07-16 14:49 ` Ian Campbell
2014-07-22 17:58   ` Julien Grall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).