* [PATCH REPOST 0/5] armv7 BE kvm support
@ 2013-12-20 16:48 Victor Kamensky
2013-12-20 16:48 ` [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stmdb and ldmia instrs to enable assembler.h inclusion Victor Kamensky
` (4 more replies)
0 siblings, 5 replies; 26+ messages in thread
From: Victor Kamensky @ 2013-12-20 16:48 UTC (permalink / raw)
To: linux-arm-kernel
[repost: adding kvmarm mailing list as per Christoffer's request]
Hi Guys,
Here is a series that enables KVM support for ARMv7 big endian kernels. Mostly
it deals with BE KVM host support. Marc Zyngier showed before with his patches
how a BE guest could run on top of an LE host. With these patches a BE guest
runs on top of a BE host. Using Marc's kvmtool with a few additional changes,
I tested that a BE host can run an LE guest. I also verified that there were no
regressions in the BE guest on top of LE host case.
Note that the posted series covers only kernel side changes. The changes were
tested inside a bigger setup with additional changes in qemu and kvmtool.
I will post those changes separately to the proper aliases, but for completeness'
sake Appendix A gives pointers to git repositories and branches with all the
needed changes.
Please note that the first patch is not related to BE KVM per se. I ran
into an issue with a conflicting 'push' identifier while trying to include
assembler.h into the KVM .S files. Details of the issue I observed are covered
in Appendix B. The first patch is my take on solving it.
Victor Kamensky (5):
ARM: kvm: replace push and pop with stmdb and ldmia instrs to enable
assembler.h inclusion
ARM: fix KVM assembler files to work in BE case
ARM: kvm one_reg coproc set and get BE fixes
ARM: kvm vgic mmio should return data in BE format in BE case
ARM: kvm MMIO support BE host running LE code
arch/arm/include/asm/assembler.h | 7 +++
arch/arm/include/asm/kvm_asm.h | 4 +-
arch/arm/include/asm/kvm_emulate.h | 22 +++++++--
arch/arm/kvm/coproc.c | 94 ++++++++++++++++++++++++++++----------
arch/arm/kvm/init.S | 7 ++-
arch/arm/kvm/interrupts.S | 50 +++++++++++---------
arch/arm/kvm/interrupts_head.S | 61 +++++++++++++++----------
virt/kvm/arm/vgic.c | 4 +-
8 files changed, 168 insertions(+), 81 deletions(-)
--
1.8.1.4
Thanks,
Victor
Appendix A: Testing and Full Setup Description
----------------------------------------------
I) No mixed mode setup - i.e. BE guest on BE host, and LE guest
on LE host, tested to make sure there are no regressions.
KVM host and guest kernels:
TC2 on top of Linus 3.13-rc4 (this patch series):
git: git://git.linaro.org/people/victor.kamensky/linux-linaro-tracking-be.git
branch: armv7be-kvm-3.13-rc4
TC2 and Arndale on top of Linaro BE tree:
git: git://git.linaro.org/people/victor.kamensky/linux-linaro-tracking-be.git
branch: llct-be-20131216-kvm
- TC1 kernels used as guests
qemu:
git: git://git.linaro.org/people/victor.kamensky/qemu-be.git
branch: armv7be-v1
description: changes to run qemu on the armeb target, and other
changes to work with a BE image on top of a BE host
kvmtool:
git: git://git.linaro.org/people/victor.kamensky/linux-linaro-tracking-be.git
branch: kvmtool-armv7be-v1
description: minimal changes to build kvmtool for the armeb target, and a
tiny change with the virtio magic
II) Mixed mode setup, all possible combinations within V7 (LE guest on BE host;
BE guest on LE host, as in Marc's setup, tested to make sure there are no
regressions), only with kvmtool.
This work is based on Marc Zyngier's work that made a BE guest run on top
of an LE host. For this setup a special version of kvmtool should be used, and
in addition I had to apply a patch to the guest kernel that switches virtio
config reads to be LE only; it is made on top of Rusty Russell's previous
changes. Effectively I only had to make a very minor addition to get an
LE guest to work on a BE host; most of the heavy lifting was done before by Marc.
KVM host kernels: as in previous setup
Guest TC1 kernels with LE virtio config patch:
git: git://git.linaro.org/people/victor.kamensky/linux-linaro-tracking-be.git
branch: virtio-leconfig-3.13-rc4
kvmtool:
git: git://git.linaro.org/people/victor.kamensky/linux-linaro-tracking-be.git
branch: kvmtool-mixed-v1
description: based on git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
branch kvm-arm64/kvmtool-be-on-le; adds missing include fix; above armeb target
build patches; and one fix related to BE mode
qemu:
git: git://git.linaro.org/people/victor.kamensky/qemu-be.git
branch: armv7be-leconfig-v1
description: changes virtio-blk so that qemu can work with a guest image
where the virtio leconfig is applied; note it does not work in mixed mode; to do
so qemu would need a bunch of changes similar to those Marc did in kvmtool
Appendix B: kvm asm file and asm/assembler.h file issue
-------------------------------------------------------
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index ddc1553..5d3b511 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -25,6 +25,7 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_arm.h>
#include <asm/vfpmacros.h>
+#include <asm/assembler.h>
#include "interrupts_head.S"
.text
This change produces the following compilation errors:
/run/media/kamensky/wd/linaro/linux-linaro-core-tracking/092913/linux-linaro-tracking-be/arch/arm/kvm/interrupts.S: Assembler messages:
/run/media/kamensky/wd/linaro/linux-linaro-core-tracking/092913/linux-linaro-tracking-be/arch/arm/kvm/interrupts.S:51: Error: ARM register expected -- `lsr {r2,r3}'
/run/media/kamensky/wd/linaro/linux-linaro-core-tracking/092913/linux-linaro-tracking-be/arch/arm/kvm/interrupts.S:100: Error: ARM register expected -- `lsr {r2}'
/run/media/kamensky/wd/linaro/linux-linaro-core-tracking/092913/linux-linaro-tracking-be/arch/arm/kvm/interrupts.S:100: Error: ARM register expected -- `lsr {r4-r12}'
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stmdb and ldmia instrs to enable assembler.h inclusion
2013-12-20 16:48 [PATCH REPOST 0/5] armv7 BE kvm support Victor Kamensky
@ 2013-12-20 16:48 ` Victor Kamensky
2014-01-21 1:18 ` Christoffer Dall
2013-12-20 16:48 ` [PATCH REPOST 2/5] ARM: fix KVM assembler files to work in BE case Victor Kamensky
` (3 subsequent siblings)
4 siblings, 1 reply; 26+ messages in thread
From: Victor Kamensky @ 2013-12-20 16:48 UTC (permalink / raw)
To: linux-arm-kernel
Before this fix the KVM interrupts.S and interrupts_head.S files used the push
and pop assembler instructions. That causes problems if the <asm/assembler.h>
file needs to be included: in assembler.h "push" is defined as a macro, so it
causes compilation errors like this:
arch/arm/kvm/interrupts.S: Assembler messages:
arch/arm/kvm/interrupts.S:51: Error: ARM register expected -- `lsr {r2,r3}'
The solution implemented by this patch replaces all 'push {...}' instructions
with 'stmdb sp!, {...}', and all 'pop {...}' with 'ldmia sp!, {...}'.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
arch/arm/kvm/interrupts.S | 38 +++++++++++++++++++-------------------
arch/arm/kvm/interrupts_head.S | 34 +++++++++++++++++-----------------
2 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index ddc1553..df19133 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -47,7 +47,7 @@ __kvm_hyp_code_start:
* instead, ignoring the ipa value.
*/
ENTRY(__kvm_tlb_flush_vmid_ipa)
- push {r2, r3}
+ stmdb sp!, {r2, r3}
dsb ishst
add r0, r0, #KVM_VTTBR
@@ -62,7 +62,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
mcrr p15, 6, r2, r3, c2 @ Back to VMID #0
isb @ Not necessary if followed by eret
- pop {r2, r3}
+ ldmia sp!, {r2, r3}
bx lr
ENDPROC(__kvm_tlb_flush_vmid_ipa)
@@ -110,7 +110,7 @@ ENTRY(__kvm_vcpu_run)
#ifdef CONFIG_VFPv3
@ Set FPEXC_EN so the guest doesn't trap floating point instructions
VFPFMRX r2, FPEXC @ VMRS
- push {r2}
+ stmdb sp!, {r2}
orr r2, r2, #FPEXC_EN
VFPFMXR FPEXC, r2 @ VMSR
#endif
@@ -175,7 +175,7 @@ __kvm_vcpu_return:
after_vfp_restore:
@ Restore FPEXC_EN which we clobbered on entry
- pop {r2}
+ ldmia sp!, {r2}
VFPFMXR FPEXC, r2
#endif
@@ -260,7 +260,7 @@ ENTRY(kvm_call_hyp)
/* Handle undef, svc, pabt, or dabt by crashing with a user notice */
.macro bad_exception exception_code, panic_str
- push {r0-r2}
+ stmdb sp!, {r0-r2}
mrrc p15, 6, r0, r1, c2 @ Read VTTBR
lsr r1, r1, #16
ands r1, r1, #0xff
@@ -338,7 +338,7 @@ hyp_hvc:
* Getting here is either becuase of a trap from a guest or from calling
* HVC from the host kernel, which means "switch to Hyp mode".
*/
- push {r0, r1, r2}
+ stmdb sp!, {r0, r1, r2}
@ Check syndrome register
mrc p15, 4, r1, c5, c2, 0 @ HSR
@@ -361,11 +361,11 @@ hyp_hvc:
bne guest_trap @ Guest called HVC
host_switch_to_hyp:
- pop {r0, r1, r2}
+ ldmia sp!, {r0, r1, r2}
- push {lr}
+ stmdb sp!, {lr}
mrs lr, SPSR
- push {lr}
+ stmdb sp!, {lr}
mov lr, r0
mov r0, r1
@@ -375,9 +375,9 @@ host_switch_to_hyp:
THUMB( orr lr, #1)
blx lr @ Call the HYP function
- pop {lr}
+ ldmia sp!, {lr}
msr SPSR_csxf, lr
- pop {lr}
+ ldmia sp!, {lr}
eret
guest_trap:
@@ -418,7 +418,7 @@ guest_trap:
/* Preserve PAR */
mrrc p15, 0, r0, r1, c7 @ PAR
- push {r0, r1}
+ stmdb sp!, {r0, r1}
/* Resolve IPA using the xFAR */
mcr p15, 0, r2, c7, c8, 0 @ ATS1CPR
@@ -431,7 +431,7 @@ guest_trap:
orr r2, r2, r1, lsl #24
/* Restore PAR */
- pop {r0, r1}
+ ldmia sp!, {r0, r1}
mcrr p15, 0, r0, r1, c7 @ PAR
3: load_vcpu @ Load VCPU pointer to r0
@@ -440,10 +440,10 @@ guest_trap:
1: mov r1, #ARM_EXCEPTION_HVC
b __kvm_vcpu_return
-4: pop {r0, r1} @ Failed translation, return to guest
+4: ldmia sp!, {r0, r1} @ Failed translation, return to guest
mcrr p15, 0, r0, r1, c7 @ PAR
clrex
- pop {r0, r1, r2}
+ ldmia sp!, {r0, r1, r2}
eret
/*
@@ -455,7 +455,7 @@ guest_trap:
#ifdef CONFIG_VFPv3
switch_to_guest_vfp:
load_vcpu @ Load VCPU pointer to r0
- push {r3-r7}
+ stmdb sp!, {r3-r7}
@ NEON/VFP used. Turn on VFP access.
set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
@@ -467,15 +467,15 @@ switch_to_guest_vfp:
add r7, r0, #VCPU_VFP_GUEST
restore_vfp_state r7
- pop {r3-r7}
- pop {r0-r2}
+ ldmia sp!, {r3-r7}
+ ldmia sp!, {r0-r2}
clrex
eret
#endif
.align
hyp_irq:
- push {r0, r1, r2}
+ stmdb sp!, {r0, r1, r2}
mov r1, #ARM_EXCEPTION_IRQ
load_vcpu @ Load VCPU pointer to r0
b __kvm_vcpu_return
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 6f18695..c371db7 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -63,7 +63,7 @@ vcpu .req r0 @ vcpu pointer always in r0
mrs r2, SP_\mode
mrs r3, LR_\mode
mrs r4, SPSR_\mode
- push {r2, r3, r4}
+ stmdb sp!, {r2, r3, r4}
.endm
/*
@@ -73,13 +73,13 @@ vcpu .req r0 @ vcpu pointer always in r0
.macro save_host_regs
/* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
mrs r2, ELR_hyp
- push {r2}
+ stmdb sp!, {r2}
/* usr regs */
- push {r4-r12} @ r0-r3 are always clobbered
+ stmdb sp!, {r4-r12} @ r0-r3 are always clobbered
mrs r2, SP_usr
mov r3, lr
- push {r2, r3}
+ stmdb sp!, {r2, r3}
push_host_regs_mode svc
push_host_regs_mode abt
@@ -95,11 +95,11 @@ vcpu .req r0 @ vcpu pointer always in r0
mrs r7, SP_fiq
mrs r8, LR_fiq
mrs r9, SPSR_fiq
- push {r2-r9}
+ stmdb sp!, {r2-r9}
.endm
.macro pop_host_regs_mode mode
- pop {r2, r3, r4}
+ ldmia sp!, {r2, r3, r4}
msr SP_\mode, r2
msr LR_\mode, r3
msr SPSR_\mode, r4
@@ -110,7 +110,7 @@ vcpu .req r0 @ vcpu pointer always in r0
* Clobbers all registers, in all modes, except r0 and r1.
*/
.macro restore_host_regs
- pop {r2-r9}
+ ldmia sp!, {r2-r9}
msr r8_fiq, r2
msr r9_fiq, r3
msr r10_fiq, r4
@@ -125,12 +125,12 @@ vcpu .req r0 @ vcpu pointer always in r0
pop_host_regs_mode abt
pop_host_regs_mode svc
- pop {r2, r3}
+ ldmia sp!, {r2, r3}
msr SP_usr, r2
mov lr, r3
- pop {r4-r12}
+ ldmia sp!, {r4-r12}
- pop {r2}
+ ldmia sp!, {r2}
msr ELR_hyp, r2
.endm
@@ -218,7 +218,7 @@ vcpu .req r0 @ vcpu pointer always in r0
add r2, vcpu, #VCPU_USR_REG(3)
stm r2, {r3-r12}
add r2, vcpu, #VCPU_USR_REG(0)
- pop {r3, r4, r5} @ r0, r1, r2
+ ldmia sp!, {r3, r4, r5} @ r0, r1, r2
stm r2, {r3, r4, r5}
mrs r2, SP_usr
mov r3, lr
@@ -258,7 +258,7 @@ vcpu .req r0 @ vcpu pointer always in r0
mrc p15, 2, r12, c0, c0, 0 @ CSSELR
.if \store_to_vcpu == 0
- push {r2-r12} @ Push CP15 registers
+ stmdb sp!, {r2-r12} @ Push CP15 registers
.else
str r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
str r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
@@ -286,7 +286,7 @@ vcpu .req r0 @ vcpu pointer always in r0
mrc p15, 0, r12, c12, c0, 0 @ VBAR
.if \store_to_vcpu == 0
- push {r2-r12} @ Push CP15 registers
+ stmdb sp!, {r2-r12} @ Push CP15 registers
.else
str r2, [vcpu, #CP15_OFFSET(c13_CID)]
str r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
@@ -305,7 +305,7 @@ vcpu .req r0 @ vcpu pointer always in r0
mrrc p15, 0, r4, r5, c7 @ PAR
.if \store_to_vcpu == 0
- push {r2,r4-r5}
+ stmdb sp!, {r2,r4-r5}
.else
str r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
add r12, vcpu, #CP15_OFFSET(c7_PAR)
@@ -322,7 +322,7 @@ vcpu .req r0 @ vcpu pointer always in r0
*/
.macro write_cp15_state read_from_vcpu
.if \read_from_vcpu == 0
- pop {r2,r4-r5}
+ ldmia sp!, {r2,r4-r5}
.else
ldr r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
add r12, vcpu, #CP15_OFFSET(c7_PAR)
@@ -333,7 +333,7 @@ vcpu .req r0 @ vcpu pointer always in r0
mcrr p15, 0, r4, r5, c7 @ PAR
.if \read_from_vcpu == 0
- pop {r2-r12}
+ ldmia sp!, {r2-r12}
.else
ldr r2, [vcpu, #CP15_OFFSET(c13_CID)]
ldr r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
@@ -361,7 +361,7 @@ vcpu .req r0 @ vcpu pointer always in r0
mcr p15, 0, r12, c12, c0, 0 @ VBAR
.if \read_from_vcpu == 0
- pop {r2-r12}
+ ldmia sp!, {r2-r12}
.else
ldr r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
ldr r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
--
1.8.1.4
* [PATCH REPOST 2/5] ARM: fix KVM assembler files to work in BE case
2013-12-20 16:48 [PATCH REPOST 0/5] armv7 BE kvm support Victor Kamensky
2013-12-20 16:48 ` [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stmdb and ldmia instrs to enable assembler.h inclusion Victor Kamensky
@ 2013-12-20 16:48 ` Victor Kamensky
2014-01-21 1:18 ` Christoffer Dall
2013-12-20 16:48 ` [PATCH REPOST 3/5] ARM: kvm one_reg coproc set and get BE fixes Victor Kamensky
` (2 subsequent siblings)
4 siblings, 1 reply; 26+ messages in thread
From: Victor Kamensky @ 2013-12-20 16:48 UTC (permalink / raw)
To: linux-arm-kernel
ARM v7 KVM assembler file fixes to work in big endian mode:
The vgic h/w registers are little endian; when asm code reads/writes from/to
them, it needs to byteswap after/before. The byteswap code uses the ARM_BE8
wrapper to add the swap only if a BIG_ENDIAN kernel is configured.
The mcrr and mrrc instructions take a couple of 32 bit registers as arguments;
one is supposed to be the high part of a 64 bit value and the other the low
part. Typically those values are loaded/stored with the ldrd and strd
instructions, and those will load the high and low parts into opposite
registers depending on endianness. Introduce and use the rr_lo_hi macro that
swaps the registers in BE mode when they are passed to the mcrr and mrrc
instructions.
The __kvm_vcpu_run function, which returns a 64 bit result in a couple of
registers, has to be adjusted for the BE case.
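The register-pair issue can be sketched in hosted C (an illustration only, not
kernel code; the function names here are made up for the sketch):

```c
#include <stdint.h>
#include <string.h>

/* Illustration of why rr_lo_hi is needed. ldrd loads two consecutive
 * 32-bit words from memory; whether the first word is the low half of
 * the 64-bit value depends on CPU endianness, so on BE8 the register
 * order passed to mcrr must be swapped. */
static int cpu_is_little_endian(void)
{
    union { uint32_t u; uint8_t c[4]; } probe = { .u = 1 };
    return probe.c[0] == 1;
}

static int first_word_is_low(uint64_t v)
{
    uint32_t w[2];              /* w[0] <- [addr], w[1] <- [addr+4] */
    memcpy(w, &v, sizeof(w));
    return w[0] == (uint32_t)v; /* true on LE, false on BE */
}
```

On an LE host the first word read is the low half, on a BE host it is the
high half; rr_lo_hi compensates by reordering the operands at assembly time.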
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
arch/arm/include/asm/assembler.h | 7 +++++++
arch/arm/include/asm/kvm_asm.h | 4 ++--
arch/arm/kvm/init.S | 7 +++++--
arch/arm/kvm/interrupts.S | 12 +++++++++---
arch/arm/kvm/interrupts_head.S | 27 ++++++++++++++++++++-------
5 files changed, 43 insertions(+), 14 deletions(-)
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 5c22851..ad1ad31 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -60,6 +60,13 @@
#define ARM_BE8(code...)
#endif
+/* swap pair of registers position depending on current endianity */
+#ifdef CONFIG_CPU_ENDIAN_BE8
+#define rr_lo_hi(a1, a2) a2, a1
+#else
+#define rr_lo_hi(a1, a2) a1, a2
+#endif
+
/*
* Data preload for architectures that support it
*/
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 661da11..12981d6 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -26,9 +26,9 @@
#define c1_ACTLR 4 /* Auxilliary Control Register */
#define c1_CPACR 5 /* Coprocessor Access Control */
#define c2_TTBR0 6 /* Translation Table Base Register 0 */
-#define c2_TTBR0_high 7 /* TTBR0 top 32 bits */
+#define c2_TTBR0_hilo 7 /* TTBR0 top 32 bits in LE case, low 32 bits in BE case */
#define c2_TTBR1 8 /* Translation Table Base Register 1 */
-#define c2_TTBR1_high 9 /* TTBR1 top 32 bits */
+#define c2_TTBR1_hilo 9 /* TTBR1 top 32 bits in LE case, low 32 bits in BE case */
#define c2_TTBCR 10 /* Translation Table Base Control R. */
#define c3_DACR 11 /* Domain Access Control Register */
#define c5_DFSR 12 /* Data Fault Status Register */
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 1b9844d..2d10b2d 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -22,6 +22,7 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_mmu.h>
+#include <asm/assembler.h>
/********************************************************************
* Hypervisor initialization
@@ -70,8 +71,10 @@ __do_hyp_init:
cmp r0, #0 @ We have a SP?
bne phase2 @ Yes, second stage init
+ARM_BE8(setend be) @ Switch to Big Endian mode if needed
+
@ Set the HTTBR to point to the hypervisor PGD pointer passed
- mcrr p15, 4, r2, r3, c2
+ mcrr p15, 4, rr_lo_hi(r2, r3), c2
@ Set the HTCR and VTCR to the same shareability and cacheability
@ settings as the non-secure TTBCR and with T0SZ == 0.
@@ -137,7 +140,7 @@ phase2:
mov pc, r0
target: @ We're now in the trampoline code, switch page tables
- mcrr p15, 4, r2, r3, c2
+ mcrr p15, 4, rr_lo_hi(r2, r3), c2
isb
@ Invalidate the old TLBs
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index df19133..0784ec3 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -25,6 +25,7 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_arm.h>
#include <asm/vfpmacros.h>
+#include <asm/assembler.h>
#include "interrupts_head.S"
.text
@@ -52,14 +53,14 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
dsb ishst
add r0, r0, #KVM_VTTBR
ldrd r2, r3, [r0]
- mcrr p15, 6, r2, r3, c2 @ Write VTTBR
+ mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR
isb
mcr p15, 0, r0, c8, c3, 0 @ TLBIALLIS (rt ignored)
dsb ish
isb
mov r2, #0
mov r3, #0
- mcrr p15, 6, r2, r3, c2 @ Back to VMID #0
+ mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Back to VMID #0
isb @ Not necessary if followed by eret
ldmia sp!, {r2, r3}
@@ -135,7 +136,7 @@ ENTRY(__kvm_vcpu_run)
ldr r1, [vcpu, #VCPU_KVM]
add r1, r1, #KVM_VTTBR
ldrd r2, r3, [r1]
- mcrr p15, 6, r2, r3, c2 @ Write VTTBR
+ mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR
@ We're all done, just restore the GPRs and go to the guest
restore_guest_regs
@@ -199,8 +200,13 @@ after_vfp_restore:
restore_host_regs
clrex @ Clear exclusive monitor
+#ifndef __ARMEB__
mov r0, r1 @ Return the return code
mov r1, #0 @ Clear upper bits in return value
+#else
+ @ r1 already has return code
+ mov r0, #0 @ Clear upper bits in return value
+#endif /* __ARMEB__ */
bx lr @ return to IOCTL
/********************************************************************
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index c371db7..67b4002 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -251,8 +251,8 @@ vcpu .req r0 @ vcpu pointer always in r0
mrc p15, 0, r3, c1, c0, 2 @ CPACR
mrc p15, 0, r4, c2, c0, 2 @ TTBCR
mrc p15, 0, r5, c3, c0, 0 @ DACR
- mrrc p15, 0, r6, r7, c2 @ TTBR 0
- mrrc p15, 1, r8, r9, c2 @ TTBR 1
+ mrrc p15, 0, rr_lo_hi(r6, r7), c2 @ TTBR 0
+ mrrc p15, 1, rr_lo_hi(r8, r9), c2 @ TTBR 1
mrc p15, 0, r10, c10, c2, 0 @ PRRR
mrc p15, 0, r11, c10, c2, 1 @ NMRR
mrc p15, 2, r12, c0, c0, 0 @ CSSELR
@@ -380,8 +380,8 @@ vcpu .req r0 @ vcpu pointer always in r0
mcr p15, 0, r3, c1, c0, 2 @ CPACR
mcr p15, 0, r4, c2, c0, 2 @ TTBCR
mcr p15, 0, r5, c3, c0, 0 @ DACR
- mcrr p15, 0, r6, r7, c2 @ TTBR 0
- mcrr p15, 1, r8, r9, c2 @ TTBR 1
+ mcrr p15, 0, rr_lo_hi(r6, r7), c2 @ TTBR 0
+ mcrr p15, 1, rr_lo_hi(r8, r9), c2 @ TTBR 1
mcr p15, 0, r10, c10, c2, 0 @ PRRR
mcr p15, 0, r11, c10, c2, 1 @ NMRR
mcr p15, 2, r12, c0, c0, 0 @ CSSELR
@@ -413,13 +413,21 @@ vcpu .req r0 @ vcpu pointer always in r0
ldr r9, [r2, #GICH_ELRSR1]
ldr r10, [r2, #GICH_APR]
+ARM_BE8(rev r3, r3 )
str r3, [r11, #VGIC_CPU_HCR]
+ARM_BE8(rev r4, r4 )
str r4, [r11, #VGIC_CPU_VMCR]
+ARM_BE8(rev r5, r5 )
str r5, [r11, #VGIC_CPU_MISR]
+ARM_BE8(rev r6, r6 )
str r6, [r11, #VGIC_CPU_EISR]
+ARM_BE8(rev r7, r7 )
str r7, [r11, #(VGIC_CPU_EISR + 4)]
+ARM_BE8(rev r8, r8 )
str r8, [r11, #VGIC_CPU_ELRSR]
+ARM_BE8(rev r9, r9 )
str r9, [r11, #(VGIC_CPU_ELRSR + 4)]
+ARM_BE8(rev r10, r10 )
str r10, [r11, #VGIC_CPU_APR]
/* Clear GICH_HCR */
@@ -431,6 +439,7 @@ vcpu .req r0 @ vcpu pointer always in r0
add r3, r11, #VGIC_CPU_LR
ldr r4, [r11, #VGIC_CPU_NR_LR]
1: ldr r6, [r2], #4
+ARM_BE8(rev r6, r6 )
str r6, [r3], #4
subs r4, r4, #1
bne 1b
@@ -459,8 +468,11 @@ vcpu .req r0 @ vcpu pointer always in r0
ldr r4, [r11, #VGIC_CPU_VMCR]
ldr r8, [r11, #VGIC_CPU_APR]
+ARM_BE8(rev r3, r3 )
str r3, [r2, #GICH_HCR]
+ARM_BE8(rev r4, r4 )
str r4, [r2, #GICH_VMCR]
+ARM_BE8(rev r8, r8 )
str r8, [r2, #GICH_APR]
/* Restore list registers */
@@ -468,6 +480,7 @@ vcpu .req r0 @ vcpu pointer always in r0
add r3, r11, #VGIC_CPU_LR
ldr r4, [r11, #VGIC_CPU_NR_LR]
1: ldr r6, [r3], #4
+ARM_BE8(rev r6, r6 )
str r6, [r2], #4
subs r4, r4, #1
bne 1b
@@ -498,7 +511,7 @@ vcpu .req r0 @ vcpu pointer always in r0
mcr p15, 0, r2, c14, c3, 1 @ CNTV_CTL
isb
- mrrc p15, 3, r2, r3, c14 @ CNTV_CVAL
+ mrrc p15, 3, rr_lo_hi(r2, r3), c14 @ CNTV_CVAL
ldr r4, =VCPU_TIMER_CNTV_CVAL
add r5, vcpu, r4
strd r2, r3, [r5]
@@ -538,12 +551,12 @@ vcpu .req r0 @ vcpu pointer always in r0
ldr r2, [r4, #KVM_TIMER_CNTVOFF]
ldr r3, [r4, #(KVM_TIMER_CNTVOFF + 4)]
- mcrr p15, 4, r2, r3, c14 @ CNTVOFF
+ mcrr p15, 4, rr_lo_hi(r2, r3), c14 @ CNTVOFF
ldr r4, =VCPU_TIMER_CNTV_CVAL
add r5, vcpu, r4
ldrd r2, r3, [r5]
- mcrr p15, 3, r2, r3, c14 @ CNTV_CVAL
+ mcrr p15, 3, rr_lo_hi(r2, r3), c14 @ CNTV_CVAL
isb
ldr r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
--
1.8.1.4
* [PATCH REPOST 3/5] ARM: kvm one_reg coproc set and get BE fixes
2013-12-20 16:48 [PATCH REPOST 0/5] armv7 BE kvm support Victor Kamensky
2013-12-20 16:48 ` [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stmdb and ldmia instrs to enable assembler.h inclusion Victor Kamensky
2013-12-20 16:48 ` [PATCH REPOST 2/5] ARM: fix KVM assembler files to work in BE case Victor Kamensky
@ 2013-12-20 16:48 ` Victor Kamensky
2014-01-21 1:18 ` Christoffer Dall
2013-12-20 16:48 ` [PATCH REPOST 4/5] ARM: kvm vgic mmio should return data in BE format in BE case Victor Kamensky
2013-12-20 16:48 ` [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code Victor Kamensky
4 siblings, 1 reply; 26+ messages in thread
From: Victor Kamensky @ 2013-12-20 16:48 UTC (permalink / raw)
To: linux-arm-kernel
This patch fixes an issue with reading and writing
ARM v7 register values from/to user land. The existing code was designed to
work only in the LE case.
struct kvm_one_reg
------------------
Register values are passed through the kvm_one_reg structure. It is used by
the KVM_SET_ONE_REG and KVM_GET_ONE_REG ioctls. Note that by looking at the
structure itself we cannot tell what the size of a register is. Note that the
structure carries 'addr', the address of the user memory where the register
should be read from or written to.
Setting register (from user-land to kvm)
----------------------------------------
kvm_arm_set_reg takes the vcpu and a pointer to a struct kvm_one_reg that was
already read from user space.
kvm_arm_set_reg calls set_core_reg or kvm_arm_coproc_set_reg.
set_core_reg deals only with 4 byte registers; it just reads 4 bytes from
user space and stores them properly into vcpu->arch.regs.
kvm_arm_coproc_set_reg deals with registers of different sizes. At a certain
point the code reaches a phase where it retrieves the description of a register
by id and knows the register size, which could be either 4 or 8 bytes. The
kernel code is ready to read values from user space, but the destination type
may vary. It could be a pointer to a 32 bit integer or a pointer to a 64 bit
integer, and all permutations of size and destination pointer are possible.
Depending on the destination pointer type, 4 bytes or 8 bytes, two new helper
functions are introduced - reg_from_user32 and reg_from_user64. They are used
instead of the reg_from_user function, which could work only in the LE case.
Size sizeof(*DstInt) Function used to read from user
4 4 reg_from_user32
8 4 reg_from_user32 - reads two registers
4 8 reg_from_user64 - needs special handling for BE
8 8 reg_from_user64
Getting register (to user-land from kvm)
----------------------------------------
The situation with reading registers is similar to writing. The integer type
pointed to by the source could be 4 or 8 bytes, the size passed in
struct kvm_one_reg could be 4 or 8, and any permutation is possible.
Depending on the source pointer type, 4 bytes or 8 bytes, two new helper
functions are introduced - reg_to_user32 and reg_to_user64. They are used
instead of the reg_to_user function, which could work only in the LE case.
Size sizeof(*SrcInt) Function used to write to user
4 4 reg_to_user32
8 4 reg_to_user32 - writes two registers
4 8 reg_to_user64 - needs special handling for BE
8 8 reg_to_user64
Note the code assumes that it only deals with 4 or 8 byte registers.
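The union staging trick can be sketched in hosted C, with plain memcpy standing
in for copy_from_user (demo_reg_from_user64 is an illustrative name, not the
kernel function):

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the reg_from_user64 idea: stage the user bytes in a union
 * member whose size matches the register, then widen explicitly.
 * Copying 4 raw bytes straight into the start of a u64 would land them
 * in the high half on a big-endian host. memcpy stands in for
 * copy_from_user here. */
static int demo_reg_from_user64(uint64_t *val, const void *uaddr,
                                unsigned long regsize)
{
    union {
        uint32_t word;
        uint64_t dword;
    } tmp = {0};

    if (regsize != 4 && regsize != 8)
        return -1;              /* only 4 or 8 byte registers exist */
    memcpy(&tmp, uaddr, regsize);
    *val = (regsize == 4) ? tmp.word : tmp.dword;
    return 0;
}
```

Because every union member starts at offset 0, tmp.word holds the correct
32-bit value on both LE and BE hosts, and the widening to u64 is then explicit.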
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
arch/arm/kvm/coproc.c | 94 +++++++++++++++++++++++++++++++++++++--------------
1 file changed, 69 insertions(+), 25 deletions(-)
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 78c0885..64b2b94 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -634,17 +634,61 @@ static struct coproc_reg invariant_cp15[] = {
{ CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
};
-static int reg_from_user(void *val, const void __user *uaddr, u64 id)
+static int reg_from_user64(u64 *val, const void __user *uaddr, u64 id)
+{
+ unsigned long regsize = KVM_REG_SIZE(id);
+ union {
+ u32 word;
+ u64 dword;
+ } tmp = {0};
+
+ if (copy_from_user(&tmp, uaddr, regsize) != 0)
+ return -EFAULT;
+
+ switch (regsize) {
+ case 4:
+ *val = tmp.word;
+ break;
+ case 8:
+ *val = tmp.dword;
+ break;
+ }
+ return 0;
+}
+
+/* Note it may really copy two u32 registers */
+static int reg_from_user32(u32 *val, const void __user *uaddr, u64 id)
{
- /* This Just Works because we are little endian. */
if (copy_from_user(val, uaddr, KVM_REG_SIZE(id)) != 0)
return -EFAULT;
return 0;
}
-static int reg_to_user(void __user *uaddr, const void *val, u64 id)
+static int reg_to_user64(void __user *uaddr, const u64 *val, u64 id)
+{
+ unsigned long regsize = KVM_REG_SIZE(id);
+ union {
+ u32 word;
+ u64 dword;
+ } tmp;
+
+ switch (regsize) {
+ case 4:
+ tmp.word = *val;
+ break;
+ case 8:
+ tmp.dword = *val;
+ break;
+ }
+
+ if (copy_to_user(uaddr, &tmp, regsize) != 0)
+ return -EFAULT;
+ return 0;
+}
+
+/* Note it may really copy two u32 registers */
+static int reg_to_user32(void __user *uaddr, const u32 *val, u64 id)
{
- /* This Just Works because we are little endian. */
if (copy_to_user(uaddr, val, KVM_REG_SIZE(id)) != 0)
return -EFAULT;
return 0;
@@ -662,7 +706,7 @@ static int get_invariant_cp15(u64 id, void __user *uaddr)
if (!r)
return -ENOENT;
- return reg_to_user(uaddr, &r->val, id);
+ return reg_to_user64(uaddr, &r->val, id);
}
static int set_invariant_cp15(u64 id, void __user *uaddr)
@@ -678,7 +722,7 @@ static int set_invariant_cp15(u64 id, void __user *uaddr)
if (!r)
return -ENOENT;
- err = reg_from_user(&val, uaddr, id);
+ err = reg_from_user64(&val, uaddr, id);
if (err)
return err;
@@ -846,7 +890,7 @@ static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
if (vfpid < num_fp_regs()) {
if (KVM_REG_SIZE(id) != 8)
return -ENOENT;
- return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
+ return reg_to_user64(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
id);
}
@@ -856,22 +900,22 @@ static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
switch (vfpid) {
case KVM_REG_ARM_VFP_FPEXC:
- return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
+ return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
case KVM_REG_ARM_VFP_FPSCR:
- return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
+ return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
case KVM_REG_ARM_VFP_FPINST:
- return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst, id);
+ return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpinst, id);
case KVM_REG_ARM_VFP_FPINST2:
- return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst2, id);
+ return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpinst2, id);
case KVM_REG_ARM_VFP_MVFR0:
val = fmrx(MVFR0);
- return reg_to_user(uaddr, &val, id);
+ return reg_to_user32(uaddr, &val, id);
case KVM_REG_ARM_VFP_MVFR1:
val = fmrx(MVFR1);
- return reg_to_user(uaddr, &val, id);
+ return reg_to_user32(uaddr, &val, id);
case KVM_REG_ARM_VFP_FPSID:
val = fmrx(FPSID);
- return reg_to_user(uaddr, &val, id);
+ return reg_to_user32(uaddr, &val, id);
default:
return -ENOENT;
}
@@ -890,8 +934,8 @@ static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)
if (vfpid < num_fp_regs()) {
if (KVM_REG_SIZE(id) != 8)
return -ENOENT;
- return reg_from_user(&vcpu->arch.vfp_guest.fpregs[vfpid],
- uaddr, id);
+ return reg_from_user64(&vcpu->arch.vfp_guest.fpregs[vfpid],
+ uaddr, id);
}
/* FP control registers are all 32 bit. */
@@ -900,28 +944,28 @@ static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)
switch (vfpid) {
case KVM_REG_ARM_VFP_FPEXC:
- return reg_from_user(&vcpu->arch.vfp_guest.fpexc, uaddr, id);
+ return reg_from_user32(&vcpu->arch.vfp_guest.fpexc, uaddr, id);
case KVM_REG_ARM_VFP_FPSCR:
- return reg_from_user(&vcpu->arch.vfp_guest.fpscr, uaddr, id);
+ return reg_from_user32(&vcpu->arch.vfp_guest.fpscr, uaddr, id);
case KVM_REG_ARM_VFP_FPINST:
- return reg_from_user(&vcpu->arch.vfp_guest.fpinst, uaddr, id);
+ return reg_from_user32(&vcpu->arch.vfp_guest.fpinst, uaddr, id);
case KVM_REG_ARM_VFP_FPINST2:
- return reg_from_user(&vcpu->arch.vfp_guest.fpinst2, uaddr, id);
+ return reg_from_user32(&vcpu->arch.vfp_guest.fpinst2, uaddr, id);
/* These are invariant. */
case KVM_REG_ARM_VFP_MVFR0:
- if (reg_from_user(&val, uaddr, id))
+ if (reg_from_user32(&val, uaddr, id))
return -EFAULT;
if (val != fmrx(MVFR0))
return -EINVAL;
return 0;
case KVM_REG_ARM_VFP_MVFR1:
- if (reg_from_user(&val, uaddr, id))
+ if (reg_from_user32(&val, uaddr, id))
return -EFAULT;
if (val != fmrx(MVFR1))
return -EINVAL;
return 0;
case KVM_REG_ARM_VFP_FPSID:
- if (reg_from_user(&val, uaddr, id))
+ if (reg_from_user32(&val, uaddr, id))
return -EFAULT;
if (val != fmrx(FPSID))
return -EINVAL;
@@ -968,7 +1012,7 @@ int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return get_invariant_cp15(reg->id, uaddr);
/* Note: copies two regs if size is 64 bit. */
- return reg_to_user(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
+ return reg_to_user32(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
}
int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@@ -987,7 +1031,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return set_invariant_cp15(reg->id, uaddr);
/* Note: copies two regs if size is 64 bit */
- return reg_from_user(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
+ return reg_from_user32(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
}
static unsigned int num_demux_regs(void)
--
1.8.1.4
* [PATCH REPOST 4/5] ARM: kvm vgic mmio should return data in BE format in BE case
2013-12-20 16:48 [PATCH REPOST 0/5] armv7 BE kvm support Victor Kamensky
` (2 preceding siblings ...)
2013-12-20 16:48 ` [PATCH REPOST 3/5] ARM: kvm one_reg coproc set and get BE fixes Victor Kamensky
@ 2013-12-20 16:48 ` Victor Kamensky
2014-01-21 1:19 ` Christoffer Dall
2013-12-20 16:48 ` [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code Victor Kamensky
4 siblings, 1 reply; 26+ messages in thread
From: Victor Kamensky @ 2013-12-20 16:48 UTC (permalink / raw)
To: linux-arm-kernel
KVM MMIO code in the BE case assumes that the data it receives is in BE
format. The vgic operates in LE, so the data needs to be byteswapped in the
BE case.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
virt/kvm/arm/vgic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 685fc72..7e11458 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -236,12 +236,12 @@ static void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
static u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
{
- return *((u32 *)mmio->data) & mask;
+ return le32_to_cpu(*((u32 *)mmio->data)) & mask;
}
static void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
{
- *((u32 *)mmio->data) = value & mask;
+ *((u32 *)mmio->data) = cpu_to_le32(value) & mask;
}
/**
--
1.8.1.4
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2013-12-20 16:48 [PATCH REPOST 0/5] armv7 BE kvm support Victor Kamensky
` (3 preceding siblings ...)
2013-12-20 16:48 ` [PATCH REPOST 4/5] ARM: kvm vgic mmio should return data in BE format in BE case Victor Kamensky
@ 2013-12-20 16:48 ` Victor Kamensky
2014-01-06 12:37 ` Marc Zyngier
4 siblings, 1 reply; 26+ messages in thread
From: Victor Kamensky @ 2013-12-20 16:48 UTC (permalink / raw)
To: linux-arm-kernel
When the status register E bit is not set (LE mode) and the host runs in
BE mode, we need to byteswap the data so that reads and writes are emulated
correctly.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
arch/arm/include/asm/kvm_emulate.h | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 0fa90c9..69b7469 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -185,9 +185,16 @@ static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
default:
return be32_to_cpu(data);
}
+ } else {
+ switch (len) {
+ case 1:
+ return data & 0xff;
+ case 2:
+ return le16_to_cpu(data & 0xffff);
+ default:
+ return le32_to_cpu(data);
+ }
}
-
- return data; /* Leave LE untouched */
}
static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
@@ -203,9 +210,16 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
default:
return cpu_to_be32(data);
}
+ } else {
+ switch (len) {
+ case 1:
+ return data & 0xff;
+ case 2:
+ return cpu_to_le16(data & 0xffff);
+ default:
+ return cpu_to_le32(data);
+ }
}
-
- return data; /* Leave LE untouched */
}
#endif /* __ARM_KVM_EMULATE_H__ */
--
1.8.1.4
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2013-12-20 16:48 ` [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code Victor Kamensky
@ 2014-01-06 12:37 ` Marc Zyngier
2014-01-06 17:44 ` Victor Kamensky
0 siblings, 1 reply; 26+ messages in thread
From: Marc Zyngier @ 2014-01-06 12:37 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Dec 20 2013 at 04:48:45 PM, Victor Kamensky <victor.kamensky@linaro.org> wrote:
> In case of status register E bit is not set (LE mode) and host runs in
> BE mode we need byteswap data, so read/write is emulated correctly.
I don't think this is correct.
The only reason we byteswap the value in the BE guest case is that it
has byteswapped the data in the first place.
With a LE guest, the value we get in the register is the right one, no
need for further processing. I think your additional byteswap only
hides bugs somewhere else in the stack.
M.
> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
> ---
> arch/arm/include/asm/kvm_emulate.h | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 0fa90c9..69b7469 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -185,9 +185,16 @@ static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
> default:
> return be32_to_cpu(data);
> }
> + } else {
> + switch (len) {
> + case 1:
> + return data & 0xff;
> + case 2:
> + return le16_to_cpu(data & 0xffff);
> + default:
> + return le32_to_cpu(data);
> + }
> }
> -
> - return data; /* Leave LE untouched */
> }
>
> static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
> @@ -203,9 +210,16 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
> default:
> return cpu_to_be32(data);
> }
> + } else {
> + switch (len) {
> + case 1:
> + return data & 0xff;
> + case 2:
> + return cpu_to_le16(data & 0xffff);
> + default:
> + return cpu_to_le32(data);
> + }
> }
> -
> - return data; /* Leave LE untouched */
> }
>
> #endif /* __ARM_KVM_EMULATE_H__ */
--
Jazz is not dead. It just smells funny.
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-06 12:37 ` Marc Zyngier
@ 2014-01-06 17:44 ` Victor Kamensky
2014-01-06 18:20 ` Marc Zyngier
0 siblings, 1 reply; 26+ messages in thread
From: Victor Kamensky @ 2014-01-06 17:44 UTC (permalink / raw)
To: linux-arm-kernel
Hi Marc,
Thank you for looking into this.
On 6 January 2014 04:37, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On Fri, Dec 20 2013 at 04:48:45 PM, Victor Kamensky <victor.kamensky@linaro.org> wrote:
>> In case of status register E bit is not set (LE mode) and host runs in
>> BE mode we need byteswap data, so read/write is emulated correctly.
>
> I don't think this is correct.
>
> The only reason we byteswap the value in the BE guest case is because it
> has byteswapped the data the first place.
>
> With a LE guest, the value we get in the register is the right one, no
> need for further processing. I think your additional byteswap only
> hides bugs somewhere else in the stack.
First, do we agree that this patch has an effect only in the BE host case
(CONFIG_CPU_BIG_ENDIAN=y)? In the LE host case the cpu_to_leXX functions do
nothing but a simple copy, just the same as we had before.

In the BE host case, we have the emulator (qemu, kvm-tool), host kernel, and
hypervisor part of the code all operating in BE mode, and the guest can be
either LE or BE (i.e. E bit clear or set). That is the mirror of the LE host
case, where the emulator (qemu, kvm-tool), host kernel, and hypervisor part
of the code all operate in LE mode. Your changes introduced a byteswap when
the host is LE and the access happens with the E bit set. I don't see why the
symmetry should break for the case where the host is BE and the access
happens with the E bit cleared.

In other words, regardless of the E bit setting of the guest access, the rest
of the stack should bring/see the same value before/after the
vcpu_data_host_to_guest/vcpu_data_guest_to_host functions are applied, i.e.
the rest of the stack should be agnostic to the E bit setting of the access.
Do we agree on that? Now, depending on the E bit setting of the guest access,
the result should differ in its endianness, so in one of the two cases a
byteswap must happen. But it will not happen in the case of a BE host and an
LE access unless my diff is applied. The previously added byteswap code for
the E-bit-set case has no effect, because in the BE host case the cpu_to_beXX
functions don't do anything but copy, and the other branch of the if
statement again just copies the data. So regardless of the E bit setting, the
resulting value of the guest access is the same in the BE host case, and it
cannot be that way. Note that with your changes alone, in the LE host case a
byteswap happens if the E bit is set and none if the E bit is clear, so the
resulting value of the guest access does depend on the E setting.
Also please note that the vcpu_data_host_to_guest/vcpu_data_guest_to_host
functions effectively transfer data between the host kernel and the memory
holding the saved guest CPU registers. Those saved registers will be put back
into CPU registers, or saved from CPU registers to memory, by the hypervisor
part of the code. In the BE host case this hypervisor part of the code
operates in BE mode as well, so the register set shared between the host and
the hypervisor part of the code holds the guest register values in memory in
BE order. The vcpu_data_host_to_guest/vcpu_data_guest_to_host functions are
not interacting with CPU registers directly. I am not sure, but maybe this
point was missed.
Thanks,
Victor
> M.
>
>> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
>> ---
>> arch/arm/include/asm/kvm_emulate.h | 22 ++++++++++++++++++----
>> 1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>> index 0fa90c9..69b7469 100644
>> --- a/arch/arm/include/asm/kvm_emulate.h
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -185,9 +185,16 @@ static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
>> default:
>> return be32_to_cpu(data);
>> }
>> + } else {
>> + switch (len) {
>> + case 1:
>> + return data & 0xff;
>> + case 2:
>> + return le16_to_cpu(data & 0xffff);
>> + default:
>> + return le32_to_cpu(data);
>> + }
>> }
>> -
>> - return data; /* Leave LE untouched */
>> }
>>
>> static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>> @@ -203,9 +210,16 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>> default:
>> return cpu_to_be32(data);
>> }
>> + } else {
>> + switch (len) {
>> + case 1:
>> + return data & 0xff;
>> + case 2:
>> + return cpu_to_le16(data & 0xffff);
>> + default:
>> + return cpu_to_le32(data);
>> + }
>> }
>> -
>> - return data; /* Leave LE untouched */
>> }
>>
>> #endif /* __ARM_KVM_EMULATE_H__ */
>
> --
> Jazz is not dead. It just smells funny.
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-06 17:44 ` Victor Kamensky
@ 2014-01-06 18:20 ` Marc Zyngier
2014-01-06 19:55 ` Victor Kamensky
2014-01-06 22:31 ` Peter Maydell
0 siblings, 2 replies; 26+ messages in thread
From: Marc Zyngier @ 2014-01-06 18:20 UTC (permalink / raw)
To: linux-arm-kernel
Hi Victor,
On Mon, Jan 06 2014 at 05:44:48 PM, Victor Kamensky <victor.kamensky@linaro.org> wrote:
> Hi Marc,
>
> Thank you for looking into this.
>
> On 6 January 2014 04:37, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On Fri, Dec 20 2013 at 04:48:45 PM, Victor Kamensky <victor.kamensky@linaro.org> wrote:
>>> In case of status register E bit is not set (LE mode) and host runs in
>>> BE mode we need byteswap data, so read/write is emulated correctly.
>>
>> I don't think this is correct.
>>
>> The only reason we byteswap the value in the BE guest case is because it
>> has byteswapped the data the first place.
>>
>> With a LE guest, the value we get in the register is the right one, no
>> need for further processing. I think your additional byteswap only
>> hides bugs somewhere else in the stack.
>
> First, do we agree that this patch has effect only in BE host case
> (CONFIG_CPU_BIG_ENDIAN=y), because in LE host case cpu_to_leXX
> function does nothing only simple copy, just the same we had before?
Sure, but that is not the point.
> In BE host case, we have emulator (qemu, kvm-tool), host kernel, and
> hypervisor part of code, all, operating in BE mode; and guest could be either
> LE or BE (i.e E bit not set or set). That is opposite to LE host case,
> where we have emulator (qemu, kvm-tool), host kernel, and hypervisor part
> of code, all, operating in LE mode. Your changes introduced byteswap when
> host is LE and access is happening with E bit set. I don't see why symmetry
> should break for case when host is BE and access is happening with E bit
> cleared.
It is certainly not about symmetry. An IO access is LE, always. Again,
the only reason we byteswap a BE guest is because it tries to write to a
LE device, and thus byteswapping the data before it hits the bus.
When we trap this access, we need to correct that byteswap. And that is
the only case we should handle. A LE guest writes a LE value to a LE
device, and nothing is to be byteswapped.
As for the value you read on the host, it will be exactly the value the
guest has written (registers don't have any endianness).
> In another words, regardless of E bit setting of guest access operation rest
> of the stack should bring/see the same value before/after
> vcpu_data_host_to_guest/vcpu_data_guest_to_host functions are applied. I.e
> the rest of stack should be agnostic to E bit setting of access operation.
> Do we agree on that? Now, depending on E bit setting of guest access operation
> result should differ in its endianity - so in one of two cases byteswap must
> happen. But it will not happen in case of BE host and LE access, unless my diff
> is applied. Previously added byteswap code for E bit set case will not
> have effect
> because in BE host case cpu_to_beXX functions don't do anything just copy, and
> in another branch of if statement again it just copies the data. So regardless
> of E bit setting guest access resulting value is the same in case of
> BE host - it
> cannot be that way. Note, just only with your changes, in LE host case
> byteswap will happen if E bit is set and no byteswap if E bit is clear - so
> guest access resulting value does depend on E setting.
>
> Also please note that vcpu_data_host_to_guest/vcpu_data_guest_to_host functions
> effectively transfer data between host kernel and memory of saved guest CPU
> registers. Those saved registers will be will be put back to CPU registers,
> or saved from CPU registers to memory by hypervisor part of code. In BE host
> case this hypervisor part of code operates in BE mode as well, so register set
> shared between host and hypervisor part of code holds guest registers values in
> memory in BE order. vcpu_data_host_to_guest/vcpu_data_guest_to_host function are
> not interacting with CPU registers directly. I am not sure, but may this
> point was missed.
It wasn't missed. No matter how data is stored in memory (BE, LE, or
even PDP endianness), CPU registers always have a consistent
representation. They are immune to CPU endianness change, and storing
to/reading from memory won't change the value, as long as you use the
same endianness for writing/reading.
What you seem to be missing is that the emulated devices must be
LE. There is no such thing as a BE GIC. So for this to work properly,
you will need to fix the VGIC code (distributor emulation only) to be
host-endianness agnostic, and behave like a LE device, even on a BE
system. And all your other device emulations.
M.
--
Jazz is not dead. It just smells funny.
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-06 18:20 ` Marc Zyngier
@ 2014-01-06 19:55 ` Victor Kamensky
2014-01-06 22:31 ` Peter Maydell
1 sibling, 0 replies; 26+ messages in thread
From: Victor Kamensky @ 2014-01-06 19:55 UTC (permalink / raw)
To: linux-arm-kernel
On 6 January 2014 10:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Hi Victor,
>
> On Mon, Jan 06 2014 at 05:44:48 PM, Victor Kamensky <victor.kamensky@linaro.org> wrote:
>> Hi Marc,
>>
>> Thank you for looking into this.
>>
>> On 6 January 2014 04:37, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> On Fri, Dec 20 2013 at 04:48:45 PM, Victor Kamensky <victor.kamensky@linaro.org> wrote:
>>>> In case of status register E bit is not set (LE mode) and host runs in
>>>> BE mode we need byteswap data, so read/write is emulated correctly.
>>>
>>> I don't think this is correct.
>>>
>>> The only reason we byteswap the value in the BE guest case is because it
>>> has byteswapped the data the first place.
>>>
>>> With a LE guest, the value we get in the register is the right one, no
>>> need for further processing. I think your additional byteswap only
>>> hides bugs somewhere else in the stack.
>>
>> First, do we agree that this patch has effect only in BE host case
>> (CONFIG_CPU_BIG_ENDIAN=y), because in LE host case cpu_to_leXX
>> function does nothing only simple copy, just the same we had before?
>
> Sure, but that is not the point.
>
>> In BE host case, we have emulator (qemu, kvm-tool), host kernel, and
>> hypervisor part of code, all, operating in BE mode; and guest could be either
>> LE or BE (i.e E bit not set or set). That is opposite to LE host case,
>> where we have emulator (qemu, kvm-tool), host kernel, and hypervisor part
>> of code, all, operating in LE mode. Your changes introduced byteswap when
>> host is LE and access is happening with E bit set. I don't see why symmetry
>> should break for case when host is BE and access is happening with E bit
>> cleared.
>
> It is certainly not about symmetry. An IO access is LE, always. Again,
> the only reason we byteswap a BE guest is because it tries to write to a
> LE device, and thus byteswapping the data before it hits the bus.
>
> When we trap this access, we need to correct that byteswap. And that is
> the only case we should handle. A LE guest writes a LE value to a LE
> device, and nothing is to be byteswapped.
>
> As for the value you read on the host, it will be exactly the value the
> guest has written (registers don't have any endianness).
>
>> In another words, regardless of E bit setting of guest access operation rest
>> of the stack should bring/see the same value before/after
>> vcpu_data_host_to_guest/vcpu_data_guest_to_host functions are applied. I.e
>> the rest of stack should be agnostic to E bit setting of access operation.
>> Do we agree on that? Now, depending on E bit setting of guest access operation
>> result should differ in its endianity - so in one of two cases byteswap must
>> happen. But it will not happen in case of BE host and LE access, unless my diff
>> is applied. Previously added byteswap code for E bit set case will not
>> have effect
>> because in BE host case cpu_to_beXX functions don't do anything just copy, and
>> in another branch of if statement again it just copies the data. So regardless
>> of E bit setting guest access resulting value is the same in case of
>> BE host - it
>> cannot be that way. Note, just only with your changes, in LE host case
>> byteswap will happen if E bit is set and no byteswap if E bit is clear - so
>> guest access resulting value does depend on E setting.
>>
>> Also please note that vcpu_data_host_to_guest/vcpu_data_guest_to_host functions
>> effectively transfer data between host kernel and memory of saved guest CPU
>> registers. Those saved registers will be will be put back to CPU registers,
>> or saved from CPU registers to memory by hypervisor part of code. In BE host
>> case this hypervisor part of code operates in BE mode as well, so register set
>> shared between host and hypervisor part of code holds guest registers values in
>> memory in BE order. vcpu_data_host_to_guest/vcpu_data_guest_to_host function are
>> not interacting with CPU registers directly. I am not sure, but may this
>> point was missed.
>
> It wasn't missed. No matter how data is stored in memory (BE, LE, or
> even PDP endianness), CPU registers always have a consistent
> representation. They are immune to CPU endianness change, and storing
> to/reading from memory won't change the value, as long as you use the
> same endianness for writing/reading.
>
> What you seems to be missing is that the emulated devices must be
> LE.
It does not matter whether the emulated devices are LE or BE.
It is about how the E bit should be *emulated* during an access.
For example consider situation
1) BE host case
2a) In some place BE guest (E bit set) accesses LE device like this
ldr r3, [r0, #0]
rev r3, r3
2b) In the same place BE guest (E bit initially set) accesses LE
device like this (which is really equivalent to 2a):
setend le
ldr r3, [r0, #0]
setend be
3) everything else is completely the same
Regardless of how the memory device at address r0 is emulated in the rest of
the stack, if my patch is not applied, in the BE host case after
'ldr r3, [r0, #0]' is trapped and emulated, r3 will contain the same value in
both the 2a) and 2b) cases! That is clearly wrong, because in one case memory
was read with the E bit set and in the other with the E bit cleared. Such
reads should give byteswapped values for the same memory location (device or
not). In the 2a) case, after 'rev r3, r3' executes, the r3 value will be
byteswapped compared to the 2b) case, which differs from what would happen if
the same 2a) and 2b) code pieces were executed in the non-emulated case
against a real LE device.

If you suggest that the E value of the current guest access should be
propagated down to the rest of the stack, I disagree; that is too invasive.
I believe the rest of the stack should emulate the access to r0 memory in the
same way regardless of the current guest access E bit value.
Note that a similar example can be constructed for an LE host, where in some
place an LE guest (E bit initially cleared) has the following (with the r0
address emulated):
4a)
ldr r3, [r0, #0]
4b)
setend be
ldr r3, [r0, #0]
setend le
rev r3, r3
The rest of the stack would emulate the access to the r0 address memory in
the same way (regardless of the current E bit value), and in the 4b) case the
value would be byteswapped by the code you added (E bit set, host in LE) and
then byteswapped again by the rev instruction. As a result, the r3 value will
be the same in both the 4a) and 4b) cases, the same result one would get with
a real, non-emulated device.
Thanks,
Victor
> There is no such thing as a BE GIC. So for this to work properly,
> you will need to fix the VGIC code (distributor emulation only) to be
> host-endianness agnostic, and behave like a LE device, even on a BE
> system. And all your other device emulations.
>
> M.
> --
> Jazz is not dead. It just smells funny.
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-06 18:20 ` Marc Zyngier
2014-01-06 19:55 ` Victor Kamensky
@ 2014-01-06 22:31 ` Peter Maydell
2014-01-06 22:56 ` Christoffer Dall
1 sibling, 1 reply; 26+ messages in thread
From: Peter Maydell @ 2014-01-06 22:31 UTC (permalink / raw)
To: linux-arm-kernel
On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
> No matter how data is stored in memory (BE, LE, or
> even PDP endianness), CPU registers always have a consistent
> representation. They are immune to CPU endianness change, and storing
> to/reading from memory won't change the value, as long as you use the
> same endianness for writing/reading.
Ah, endianness. This always confuses me, but I hope the following
is correct... (in all the following when I say BE I mean BE8, not BE32,
since BE32 and virtualization never occur in the same CPU).
Certainly registers don't have endianness, but the entire point
of the CPSR.E bit is exactly that it changes the value as it is
stored to / read from memory, isn't it? -- that's where and when the
byte-lane flipping happens.
Where this impacts the hypervisor is that instead of actually sending
the data out to the bus via the byte-swapping h/w, we've trapped instead.
The hypervisor reads the original data directly from the guest CPU
registers, and so it's the hypervisor and userspace support code that
between them have to emulate the equivalent of the byte lane
swapping h/w. You could argue that it shouldn't be the kernel's
job, but since the kernel has to do it for the devices it emulates
internally, I'm not sure that makes much sense.
> What you seems to be missing is that the emulated devices must be
> LE. There is no such thing as a BE GIC.
Right, so a BE guest would be internally flipping the 32 bit value
it wants to write so that when it goes through the CPU's byte-lane
swap (because CPSR.E is set) it appears to the GIC with the correct
bit at the bottom, yes?
(At least I think that's what the GIC being LE means; I don't think
it's like the devices on the Private Peripheral Bus on the M-profile
cores which are genuinely on the CPU's side of the byte-lane
swapping h/w and thus always LE regardless of the state of the
endianness bit. Am I wrong there?)
It's not necessary that *all* emulated devices must be LE, of
course -- you could have a QEMU which supported a board
with a bunch of BE devices on it.
thanks
-- PMM
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-06 22:31 ` Peter Maydell
@ 2014-01-06 22:56 ` Christoffer Dall
2014-01-07 1:59 ` Victor Kamensky
0 siblings, 1 reply; 26+ messages in thread
From: Christoffer Dall @ 2014-01-06 22:56 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
> On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > No matter how data is stored in memory (BE, LE, or
> > even PDP endianness), CPU registers always have a consistent
> > representation. They are immune to CPU endianness change, and storing
> > to/reading from memory won't change the value, as long as you use the
> > same endianness for writing/reading.
>
> Ah, endianness. This always confuses me, but I hope the following
> is correct... (in all the following when I say BE I mean BE8, not BE32,
> since BE32 and virtualization never occur in the same CPU).
>
> Certainly registers don't have endianness, but the entire point
> of the CPSR.E bit is exactly that it changes the value as it is
> stored to / read from memory, isn't it? -- that's where and when the
> byte-lane flipping happens.
>
> Where this impacts the hypervisor is that instead of actually sending
> the data out to the bus via the byte-swapping h/w, we've trapped instead.
> The hypervisor reads the original data directly from the guest CPU
> registers, and so it's the hypervisor and userspace support code that
> between them have to emulate the equivalent of the byte lane
> swapping h/w. You could argue that it shouldn't be the kernel's
> job, but since the kernel has to do it for the devices it emulates
> internally, I'm not sure that makes much sense.
As far as I understand, this is exactly what vcpu_data_guest_to_host and
vcpu_data_host_to_guest do; emulate the byte lane swapping.
The problem is that it only works on a little-endian host with the current
code, because be16_to_cpu (for example) actually performs a byteswap, which
is what needs to be emulated. On a big-endian host we do nothing, so we end
up giving a byteswapped value to the emulated device.
I think a cleaner fix than this patch is to just change the
be16_to_cpu() to a __swab16() instead, which clearly indicates that
'here is the byte lane swapping'.
But admittedly this hurts my brain, so I'm not 100% sure I got this last
part right.
-Christoffer
>
> > What you seems to be missing is that the emulated devices must be
> > LE. There is no such thing as a BE GIC.
>
> Right, so a BE guest would be internally flipping the 32 bit value
> it wants to write so that when it goes through the CPU's byte-lane
> swap (because CPSR.E is set) it appears to the GIC with the correct
> bit at the bottom, yes?
>
> (At least I think that's what the GIC being LE means; I don't think
> it's like the devices on the Private Peripheral Bus on the M-profile
> cores which are genuinely on the CPU's side of the byte-lane
> swapping h/w and thus always LE regardless of the state of the
> endianness bit. Am I wrong there?)
>
> It's not necessary that *all* emulated devices must be LE, of
> course -- you could have a QEMU which supported a board
> with a bunch of BE devices on it.
>
> thanks
> -- PMM
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-06 22:56 ` Christoffer Dall
@ 2014-01-07 1:59 ` Victor Kamensky
2014-01-21 1:19 ` Christoffer Dall
0 siblings, 1 reply; 26+ messages in thread
From: Victor Kamensky @ 2014-01-07 1:59 UTC (permalink / raw)
To: linux-arm-kernel
On 6 January 2014 14:56, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
>> On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> > No matter how data is stored in memory (BE, LE, or
>> > even PDP endianness), CPU registers always have a consistent
>> > representation. They are immune to CPU endianness change, and storing
>> > to/reading from memory won't change the value, as long as you use the
>> > same endianness for writing/reading.
>>
>> Ah, endianness. This always confuses me, but I hope the following
>> is correct... (in all the following when I say BE I mean BE8, not BE32,
>> since BE32 and virtualization never occur in the same CPU).
>>
>> Certainly registers don't have endianness, but the entire point
>> of the CPSR.E bit is exactly that it changes the value as it is
>> stored to / read from memory, isn't it? -- that's where and when the
>> byte-lane flipping happens.
>>
>> Where this impacts the hypervisor is that instead of actually sending
>> the data out to the bus via the byte-swapping h/w, we've trapped instead.
>> The hypervisor reads the original data directly from the guest CPU
>> registers, and so it's the hypervisor and userspace support code that
>> between them have to emulate the equivalent of the byte lane
>> swapping h/w. You could argue that it shouldn't be the kernel's
>> job, but since the kernel has to do it for the devices it emulates
>> internally, I'm not sure that makes much sense.
>
> As far as I understand, this is exactly what vcpu_data_guest_to_host and
> vcpu_data_host_to_guest do; emulate the byte lane swapping.
>
> The problem is that it only works on a little-endian host with the
> current code, because be16_to_cpu (for example), actually perform a
> byteswap, which is what needs to be emulated. On a big-endian host, we
> do nothing, so we end up giving a byteswapped value to the emulated
> device.
Yes, that was my point on the thread: for any given host endianness, the
vcpu_data_guest_to_host and vcpu_data_host_to_guest functions should give
opposite-endian results depending on the CPSR E bit value, and currently
that is not happening in the BE host case. It seems that Peter and you agree
with that, and I gave an example in another email, with dynamic E bit
switching, illustrating this problem for a BE host.
> I think a cleaner fix than this patch is to just change the
> be16_to_cpu() to a __swab16() instead, which clearly indicates that
> 'here is the byte lane swapping'.
Yes, that may work, but it is a somewhat orthogonal issue, and I don't think
it is better. For this to work, one needs to change the canonical endianness
on one of the sides around the vcpu_data_guest_to_host and
vcpu_data_host_to_guest functions.

Changing it on the side that faces the hypervisor (the code that handles the
guest's spilled CPU register set) does not make sense at all: if we keep the
guest CPU register set in memory in LE form while the hypervisor runs in BE
(BE host), the code that spills registers would need to do constant
byteswaps. Also, any access by the host kernel and hypervisor (all running
in BE) would need byteswaps while working with the saved guest registers.
Changing the canonical form of the data on the side that faces the
emulator and the mmio part of kvm_run does not make sense either. The
kvm_run mmio.data field is a byte array; by the time it reaches the
host kernel from the emulator, it already contains device memory in
the endian order of the emulated device. For example, for a word read
of an LE device, after the call is emulated, mmio.data will contain
mmio.data[0], mmio.data[1], mmio.data[2], mmio.data[3] in LE order
(mmio.data[3] is the MSB). Now look at the mmio_read_buf function
introduced by Marc's commit 6d89d2d9: it byte-copies this mmio.data
buffer into an integer according to the ongoing mmio access size. Note
that in the BE host case this integer, in the 'data' variable of
kvm_handle_mmio_return, will hold a byteswapped value. When that value
is then passed into vcpu_data_host_to_guest, which emulates a guest
read access with the E bit set, and if we follow your suggestion, it
will be byteswapped.
I.e. the 'data' integer will contain the non-byteswapped value of the
LE device. It will then be stored into some vcpu_reg register, still
in native format (BE store), and later restored into the guest CPU
register, still non-byteswapped (BE hypervisor). That is not what a BE
client reading a word from an LE device expects: a BE client, knowing
that it reads an LE device with the E bit set, will issue an
additional rev instruction to get the device memory as an integer. If
we really want to follow your suggestion, one could introduce
compensating byteswaps in the mmio_read_buf and mmio_write_buf
functions in the BE host case, rather than just doing a memcpy ... but
I am not sure what that buys us - in the BE case it would swap the
data twice.
Note that in the description above, by "canonical" I mean some fixed
form of the data regardless of the CPSR E value of the current access;
it may still differ depending on host endianness.
Also, as far as working with the VGIC is concerned: PATCH 2/5 [1] of
this series reads the real h/w vgic values from #GICH_HCR,
#VGIC_CPU_VMCR, etc., and byteswaps them in the BE host case, so the
VGIC "integer" values are present in the kernel in CPU-native format.
When the mmio_data_read and mmio_data_write functions of vgic.c are
called to fill the mmio.data array - the VGIC values now being in
native format while the mmio.data array should contain memory in
device endianness (LE for the VGIC) - PATCH 4/5 [2] of this series
uses cpu_to_le32 and le32_to_cpu to byteswap. I admit that the PATCH
4/5 comment is a bit obscure.
Thanks,
Victor
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2013-December/221168.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2013-December/221167.html
> But admittedly this hurts my brain, so I'm not 100% sure I got this last
> part right.
>
> -Christoffer
>
>>
>> > What you seems to be missing is that the emulated devices must be
>> > LE. There is no such thing as a BE GIC.
>>
>> Right, so a BE guest would be internally flipping the 32 bit value
>> it wants to write so that when it goes through the CPU's byte-lane
>> swap (because CPSR.E is set) it appears to the GIC with the correct
>> bit at the bottom, yes?
>>
>> (At least I think that's what the GIC being LE means; I don't think
>> it's like the devices on the Private Peripheral Bus on the M-profile
>> cores which are genuinely on the CPU's side of the byte-lane
>> swapping h/w and thus always LE regardless of the state of the
>> endianness bit. Am I wrong there?)
>>
>> It's not necessary that *all* emulated devices must be LE, of
>> course -- you could have a QEMU which supported a board
>> with a bunch of BE devices on it.
>>
>> thanks
>> -- PMM
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stdmb and ldmia instrs to enable assembler.h inclusion
2013-12-20 16:48 ` [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stdmb and ldmia instrs to enable assembler.h inclusion Victor Kamensky
@ 2014-01-21 1:18 ` Christoffer Dall
2014-01-21 9:58 ` Marc Zyngier
0 siblings, 1 reply; 26+ messages in thread
From: Christoffer Dall @ 2014-01-21 1:18 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Dec 20, 2013 at 08:48:41AM -0800, Victor Kamensky wrote:
> Before fix kvm interrupt.S and interrupt_head.S used push and pop assembler
> instruction. It causes problem if <asm/assembler.h> file should be include. In
> assembler.h "push" is defined as macro so it causes compilation errors like
> this:
"Before fix kvm..." doesn't read very pleasantly; consider using
something like "Prior to this commit..."
"causes a problem" or "causes problems"
change "if <asm/assembler.h> file should be include..." to "if
<asm/assembler.h> is included, because assembler.h defines 'push' as a
macro..."
>
> arch/arm/kvm/interrupts.S: Assembler messages:
> arch/arm/kvm/interrupts.S:51: Error: ARM register expected -- `lsr {r2,r3}'
>
> The solution implemented by this patch replaces all 'push {...}' with
> 'stmdb sp!, {...}' instructions, and all 'pop {...}' with 'ldmia sp!, {...}'.
>
> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
> ---
> arch/arm/kvm/interrupts.S | 38 +++++++++++++++++++-------------------
> arch/arm/kvm/interrupts_head.S | 34 +++++++++++++++++-----------------
> 2 files changed, 36 insertions(+), 36 deletions(-)
>
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index ddc1553..df19133 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -47,7 +47,7 @@ __kvm_hyp_code_start:
> * instead, ignoring the ipa value.
> */
> ENTRY(__kvm_tlb_flush_vmid_ipa)
> - push {r2, r3}
> + stmdb sp!, {r2, r3}
>
> dsb ishst
> add r0, r0, #KVM_VTTBR
> @@ -62,7 +62,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
> mcrr p15, 6, r2, r3, c2 @ Back to VMID #0
> isb @ Not necessary if followed by eret
>
> - pop {r2, r3}
> + ldmia sp!, {r2, r3}
> bx lr
> ENDPROC(__kvm_tlb_flush_vmid_ipa)
>
> @@ -110,7 +110,7 @@ ENTRY(__kvm_vcpu_run)
> #ifdef CONFIG_VFPv3
> @ Set FPEXC_EN so the guest doesn't trap floating point instructions
> VFPFMRX r2, FPEXC @ VMRS
> - push {r2}
> + stmdb sp!, {r2}
> orr r2, r2, #FPEXC_EN
> VFPFMXR FPEXC, r2 @ VMSR
> #endif
> @@ -175,7 +175,7 @@ __kvm_vcpu_return:
>
> after_vfp_restore:
> @ Restore FPEXC_EN which we clobbered on entry
> - pop {r2}
> + ldmia sp!, {r2}
> VFPFMXR FPEXC, r2
> #endif
>
> @@ -260,7 +260,7 @@ ENTRY(kvm_call_hyp)
>
> /* Handle undef, svc, pabt, or dabt by crashing with a user notice */
> .macro bad_exception exception_code, panic_str
> - push {r0-r2}
> + stmdb sp!, {r0-r2}
> mrrc p15, 6, r0, r1, c2 @ Read VTTBR
> lsr r1, r1, #16
> ands r1, r1, #0xff
> @@ -338,7 +338,7 @@ hyp_hvc:
> * Getting here is either becuase of a trap from a guest or from calling
> * HVC from the host kernel, which means "switch to Hyp mode".
> */
> - push {r0, r1, r2}
> + stmdb sp!, {r0, r1, r2}
>
> @ Check syndrome register
> mrc p15, 4, r1, c5, c2, 0 @ HSR
> @@ -361,11 +361,11 @@ hyp_hvc:
> bne guest_trap @ Guest called HVC
>
> host_switch_to_hyp:
> - pop {r0, r1, r2}
> + ldmia sp!, {r0, r1, r2}
>
> - push {lr}
> + stmdb sp!, {lr}
> mrs lr, SPSR
> - push {lr}
> + stmdb sp!, {lr}
>
> mov lr, r0
> mov r0, r1
> @@ -375,9 +375,9 @@ host_switch_to_hyp:
> THUMB( orr lr, #1)
> blx lr @ Call the HYP function
>
> - pop {lr}
> + ldmia sp!, {lr}
> msr SPSR_csxf, lr
> - pop {lr}
> + ldmia sp!, {lr}
> eret
>
> guest_trap:
> @@ -418,7 +418,7 @@ guest_trap:
>
> /* Preserve PAR */
> mrrc p15, 0, r0, r1, c7 @ PAR
> - push {r0, r1}
> + stmdb sp!, {r0, r1}
>
> /* Resolve IPA using the xFAR */
> mcr p15, 0, r2, c7, c8, 0 @ ATS1CPR
> @@ -431,7 +431,7 @@ guest_trap:
> orr r2, r2, r1, lsl #24
>
> /* Restore PAR */
> - pop {r0, r1}
> + ldmia sp!, {r0, r1}
> mcrr p15, 0, r0, r1, c7 @ PAR
>
> 3: load_vcpu @ Load VCPU pointer to r0
> @@ -440,10 +440,10 @@ guest_trap:
> 1: mov r1, #ARM_EXCEPTION_HVC
> b __kvm_vcpu_return
>
> -4: pop {r0, r1} @ Failed translation, return to guest
> +4: ldmia sp!, {r0, r1} @ Failed translation, return to guest
> mcrr p15, 0, r0, r1, c7 @ PAR
> clrex
> - pop {r0, r1, r2}
> + ldmia sp!, {r0, r1, r2}
> eret
>
> /*
> @@ -455,7 +455,7 @@ guest_trap:
> #ifdef CONFIG_VFPv3
> switch_to_guest_vfp:
> load_vcpu @ Load VCPU pointer to r0
> - push {r3-r7}
> + stmdb sp!, {r3-r7}
>
> @ NEON/VFP used. Turn on VFP access.
> set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
> @@ -467,15 +467,15 @@ switch_to_guest_vfp:
> add r7, r0, #VCPU_VFP_GUEST
> restore_vfp_state r7
>
> - pop {r3-r7}
> - pop {r0-r2}
> + ldmia sp!, {r3-r7}
> + ldmia sp!, {r0-r2}
> clrex
> eret
> #endif
>
> .align
> hyp_irq:
> - push {r0, r1, r2}
> + stmdb sp!, {r0, r1, r2}
> mov r1, #ARM_EXCEPTION_IRQ
> load_vcpu @ Load VCPU pointer to r0
> b __kvm_vcpu_return
> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> index 6f18695..c371db7 100644
> --- a/arch/arm/kvm/interrupts_head.S
> +++ b/arch/arm/kvm/interrupts_head.S
> @@ -63,7 +63,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> mrs r2, SP_\mode
> mrs r3, LR_\mode
> mrs r4, SPSR_\mode
> - push {r2, r3, r4}
> + stmdb sp!, {r2, r3, r4}
> .endm
>
> /*
> @@ -73,13 +73,13 @@ vcpu .req r0 @ vcpu pointer always in r0
> .macro save_host_regs
> /* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
> mrs r2, ELR_hyp
> - push {r2}
> + stmdb sp!, {r2}
>
> /* usr regs */
> - push {r4-r12} @ r0-r3 are always clobbered
> + stmdb sp!, {r4-r12} @ r0-r3 are always clobbered
> mrs r2, SP_usr
> mov r3, lr
> - push {r2, r3}
> + stmdb sp!, {r2, r3}
>
> push_host_regs_mode svc
> push_host_regs_mode abt
> @@ -95,11 +95,11 @@ vcpu .req r0 @ vcpu pointer always in r0
> mrs r7, SP_fiq
> mrs r8, LR_fiq
> mrs r9, SPSR_fiq
> - push {r2-r9}
> + stmdb sp!, {r2-r9}
> .endm
>
> .macro pop_host_regs_mode mode
> - pop {r2, r3, r4}
> + ldmia sp!, {r2, r3, r4}
> msr SP_\mode, r2
> msr LR_\mode, r3
> msr SPSR_\mode, r4
> @@ -110,7 +110,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> * Clobbers all registers, in all modes, except r0 and r1.
> */
> .macro restore_host_regs
> - pop {r2-r9}
> + ldmia sp!, {r2-r9}
> msr r8_fiq, r2
> msr r9_fiq, r3
> msr r10_fiq, r4
> @@ -125,12 +125,12 @@ vcpu .req r0 @ vcpu pointer always in r0
> pop_host_regs_mode abt
> pop_host_regs_mode svc
>
> - pop {r2, r3}
> + ldmia sp!, {r2, r3}
> msr SP_usr, r2
> mov lr, r3
> - pop {r4-r12}
> + ldmia sp!, {r4-r12}
>
> - pop {r2}
> + ldmia sp!, {r2}
> msr ELR_hyp, r2
> .endm
>
> @@ -218,7 +218,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> add r2, vcpu, #VCPU_USR_REG(3)
> stm r2, {r3-r12}
> add r2, vcpu, #VCPU_USR_REG(0)
> - pop {r3, r4, r5} @ r0, r1, r2
> + ldmia sp!, {r3, r4, r5} @ r0, r1, r2
> stm r2, {r3, r4, r5}
> mrs r2, SP_usr
> mov r3, lr
> @@ -258,7 +258,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> mrc p15, 2, r12, c0, c0, 0 @ CSSELR
>
> .if \store_to_vcpu == 0
> - push {r2-r12} @ Push CP15 registers
> + stmdb sp!, {r2-r12} @ Push CP15 registers
> .else
> str r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> str r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> @@ -286,7 +286,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> mrc p15, 0, r12, c12, c0, 0 @ VBAR
>
> .if \store_to_vcpu == 0
> - push {r2-r12} @ Push CP15 registers
> + stmdb sp!, {r2-r12} @ Push CP15 registers
> .else
> str r2, [vcpu, #CP15_OFFSET(c13_CID)]
> str r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> @@ -305,7 +305,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> mrrc p15, 0, r4, r5, c7 @ PAR
>
> .if \store_to_vcpu == 0
> - push {r2,r4-r5}
> + stmdb sp!, {r2,r4-r5}
> .else
> str r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
> add r12, vcpu, #CP15_OFFSET(c7_PAR)
> @@ -322,7 +322,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> */
> .macro write_cp15_state read_from_vcpu
> .if \read_from_vcpu == 0
> - pop {r2,r4-r5}
> + ldmia sp!, {r2,r4-r5}
> .else
> ldr r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
> add r12, vcpu, #CP15_OFFSET(c7_PAR)
> @@ -333,7 +333,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> mcrr p15, 0, r4, r5, c7 @ PAR
>
> .if \read_from_vcpu == 0
> - pop {r2-r12}
> + ldmia sp!, {r2-r12}
> .else
> ldr r2, [vcpu, #CP15_OFFSET(c13_CID)]
> ldr r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> @@ -361,7 +361,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> mcr p15, 0, r12, c12, c0, 0 @ VBAR
>
> .if \read_from_vcpu == 0
> - pop {r2-r12}
> + ldmia sp!, {r2-r12}
> .else
> ldr r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> ldr r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> --
> 1.8.1.4
>
If you fix the patch to address Dave's comments, then the code change
otherwise looks good.
Thanks,
--
Christoffer
* [PATCH REPOST 2/5] ARM: fix KVM assembler files to work in BE case
2013-12-20 16:48 ` [PATCH REPOST 2/5] ARM: fix KVM assembler files to work in BE case Victor Kamensky
@ 2014-01-21 1:18 ` Christoffer Dall
0 siblings, 0 replies; 26+ messages in thread
From: Christoffer Dall @ 2014-01-21 1:18 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Dec 20, 2013 at 08:48:42AM -0800, Victor Kamensky wrote:
> ARM v7 KVM assembler files fixes to work in big endian mode:
I don't think 'files fixes' is proper English; it could be something
like:
Fix ARM v7 KVM assembler files to work...
>
> vgic h/w registers are little endian; when asm code reads/writes from/to
the vgic h/w registers
> them, it needs to do byteswap after/before. Byte swap code uses ARM_BE8
Byteswap
> wrapper to add swap only if BIG_ENDIAN kernel is configured
what is the config symbol, CONFIG_BIG_ENDIAN?
>
> mcrr and mrrc instructions take couple 32 bit registers as argument, one
The mcrr and mrrc...
a couple of
as their arguments
> is supposed to be high part of 64 bit value and another is low part of
> 64 bit. Typically those values are loaded/stored with ldrd and strd
one is supposed to be?
> instructions and those will load high and low parts in opposite register
> depending on endianity. Introduce and use rr_lo_hi macro that swap
opposite register? This text is more confusing than clarifying; I
think you need to explain how the rr_lo_hi macro is intended to be
used, if anything.
> registers in BE mode when they are passed to mcrr and mrrc instructions.
>
> function that returns 64 bit result __kvm_vcpu_run in couple registers
> has to be adjusted for BE case.
The __kvm_vcpu_run function returns a 64-bit result in two registers,
which has...
>
> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
> ---
> arch/arm/include/asm/assembler.h | 7 +++++++
> arch/arm/include/asm/kvm_asm.h | 4 ++--
> arch/arm/kvm/init.S | 7 +++++--
> arch/arm/kvm/interrupts.S | 12 +++++++++---
> arch/arm/kvm/interrupts_head.S | 27 ++++++++++++++++++++-------
> 5 files changed, 43 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
> index 5c22851..ad1ad31 100644
> --- a/arch/arm/include/asm/assembler.h
> +++ b/arch/arm/include/asm/assembler.h
> @@ -60,6 +60,13 @@
> #define ARM_BE8(code...)
> #endif
>
> +/* swap pair of registers position depending on current endianity */
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +#define rr_lo_hi(a1, a2) a2, a1
> +#else
> +#define rr_lo_hi(a1, a2) a1, a2
> +#endif
> +
I'm not convinced that this is needed generally in the kernel rather
than just locally in KVM, but if it is, then I think it needs to be
documented more. I assume the idea here is that a1 is always the
lower-numbered register in an ldrd instruction loading the values to
write to the register?
> /*
> * Data preload for architectures that support it
> */
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 661da11..12981d6 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -26,9 +26,9 @@
> #define c1_ACTLR 4 /* Auxilliary Control Register */
> #define c1_CPACR 5 /* Coprocessor Access Control */
> #define c2_TTBR0 6 /* Translation Table Base Register 0 */
> -#define c2_TTBR0_high 7 /* TTBR0 top 32 bits */
> +#define c2_TTBR0_hilo 7 /* TTBR0 top 32 bits in LE case, low 32 bits in BE case */
> #define c2_TTBR1 8 /* Translation Table Base Register 1 */
> -#define c2_TTBR1_high 9 /* TTBR1 top 32 bits */
> +#define c2_TTBR1_hilo 9 /* TTBR1 top 32 bits in LE case, low 32 bits in BE case */
These lines far exceed 80 chars, but not sure how to improve on that...
> #define c2_TTBCR 10 /* Translation Table Base Control R. */
> #define c3_DACR 11 /* Domain Access Control Register */
> #define c5_DFSR 12 /* Data Fault Status Register */
> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> index 1b9844d..2d10b2d 100644
> --- a/arch/arm/kvm/init.S
> +++ b/arch/arm/kvm/init.S
> @@ -22,6 +22,7 @@
> #include <asm/kvm_asm.h>
> #include <asm/kvm_arm.h>
> #include <asm/kvm_mmu.h>
> +#include <asm/assembler.h>
>
> /********************************************************************
> * Hypervisor initialization
> @@ -70,8 +71,10 @@ __do_hyp_init:
> cmp r0, #0 @ We have a SP?
> bne phase2 @ Yes, second stage init
>
> +ARM_BE8(setend be) @ Switch to Big Endian mode if needed
> +
> @ Set the HTTBR to point to the hypervisor PGD pointer passed
> - mcrr p15, 4, r2, r3, c2
> + mcrr p15, 4, rr_lo_hi(r2, r3), c2
>
> @ Set the HTCR and VTCR to the same shareability and cacheability
> @ settings as the non-secure TTBCR and with T0SZ == 0.
> @@ -137,7 +140,7 @@ phase2:
> mov pc, r0
>
> target: @ We're now in the trampoline code, switch page tables
> - mcrr p15, 4, r2, r3, c2
> + mcrr p15, 4, rr_lo_hi(r2, r3), c2
> isb
I guess you could switch r2 and r3 (without a third register or using
stack space) on big endian to avoid the need for the macro in a header
file and define the macro locally in the interrupts*.S files... Hmmm,
undecided.
>
> @ Invalidate the old TLBs
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index df19133..0784ec3 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -25,6 +25,7 @@
> #include <asm/kvm_asm.h>
> #include <asm/kvm_arm.h>
> #include <asm/vfpmacros.h>
> +#include <asm/assembler.h>
> #include "interrupts_head.S"
>
> .text
> @@ -52,14 +53,14 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
> dsb ishst
> add r0, r0, #KVM_VTTBR
> ldrd r2, r3, [r0]
> - mcrr p15, 6, r2, r3, c2 @ Write VTTBR
> + mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR
> isb
> mcr p15, 0, r0, c8, c3, 0 @ TLBIALLIS (rt ignored)
> dsb ish
> isb
> mov r2, #0
> mov r3, #0
> - mcrr p15, 6, r2, r3, c2 @ Back to VMID #0
> + mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Back to VMID #0
> isb @ Not necessary if followed by eret
>
> ldmia sp!, {r2, r3}
> @@ -135,7 +136,7 @@ ENTRY(__kvm_vcpu_run)
> ldr r1, [vcpu, #VCPU_KVM]
> add r1, r1, #KVM_VTTBR
> ldrd r2, r3, [r1]
> - mcrr p15, 6, r2, r3, c2 @ Write VTTBR
> + mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR
>
> @ We're all done, just restore the GPRs and go to the guest
> restore_guest_regs
> @@ -199,8 +200,13 @@ after_vfp_restore:
>
> restore_host_regs
> clrex @ Clear exclusive monitor
> +#ifndef __ARMEB__
> mov r0, r1 @ Return the return code
> mov r1, #0 @ Clear upper bits in return value
> +#else
> + @ r1 already has return code
> + mov r0, #0 @ Clear upper bits in return value
> +#endif /* __ARMEB__ */
> bx lr @ return to IOCTL
>
> /********************************************************************
> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> index c371db7..67b4002 100644
> --- a/arch/arm/kvm/interrupts_head.S
> +++ b/arch/arm/kvm/interrupts_head.S
> @@ -251,8 +251,8 @@ vcpu .req r0 @ vcpu pointer always in r0
> mrc p15, 0, r3, c1, c0, 2 @ CPACR
> mrc p15, 0, r4, c2, c0, 2 @ TTBCR
> mrc p15, 0, r5, c3, c0, 0 @ DACR
> - mrrc p15, 0, r6, r7, c2 @ TTBR 0
> - mrrc p15, 1, r8, r9, c2 @ TTBR 1
> + mrrc p15, 0, rr_lo_hi(r6, r7), c2 @ TTBR 0
> + mrrc p15, 1, rr_lo_hi(r8, r9), c2 @ TTBR 1
> mrc p15, 0, r10, c10, c2, 0 @ PRRR
> mrc p15, 0, r11, c10, c2, 1 @ NMRR
> mrc p15, 2, r12, c0, c0, 0 @ CSSELR
> @@ -380,8 +380,8 @@ vcpu .req r0 @ vcpu pointer always in r0
> mcr p15, 0, r3, c1, c0, 2 @ CPACR
> mcr p15, 0, r4, c2, c0, 2 @ TTBCR
> mcr p15, 0, r5, c3, c0, 0 @ DACR
> - mcrr p15, 0, r6, r7, c2 @ TTBR 0
> - mcrr p15, 1, r8, r9, c2 @ TTBR 1
> + mcrr p15, 0, rr_lo_hi(r6, r7), c2 @ TTBR 0
> + mcrr p15, 1, rr_lo_hi(r8, r9), c2 @ TTBR 1
> mcr p15, 0, r10, c10, c2, 0 @ PRRR
> mcr p15, 0, r11, c10, c2, 1 @ NMRR
> mcr p15, 2, r12, c0, c0, 0 @ CSSELR
> @@ -413,13 +413,21 @@ vcpu .req r0 @ vcpu pointer always in r0
> ldr r9, [r2, #GICH_ELRSR1]
> ldr r10, [r2, #GICH_APR]
>
> +ARM_BE8(rev r3, r3 )
> str r3, [r11, #VGIC_CPU_HCR]
> +ARM_BE8(rev r4, r4 )
> str r4, [r11, #VGIC_CPU_VMCR]
> +ARM_BE8(rev r5, r5 )
> str r5, [r11, #VGIC_CPU_MISR]
> +ARM_BE8(rev r6, r6 )
> str r6, [r11, #VGIC_CPU_EISR]
> +ARM_BE8(rev r7, r7 )
> str r7, [r11, #(VGIC_CPU_EISR + 4)]
> +ARM_BE8(rev r8, r8 )
> str r8, [r11, #VGIC_CPU_ELRSR]
> +ARM_BE8(rev r9, r9 )
> str r9, [r11, #(VGIC_CPU_ELRSR + 4)]
> +ARM_BE8(rev r10, r10 )
> str r10, [r11, #VGIC_CPU_APR]
Wouldn't it be semantically cleaner to do the byteswap after the loads
from the hardware instead?
>
> /* Clear GICH_HCR */
> @@ -431,6 +439,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> add r3, r11, #VGIC_CPU_LR
> ldr r4, [r11, #VGIC_CPU_NR_LR]
> 1: ldr r6, [r2], #4
> +ARM_BE8(rev r6, r6 )
> str r6, [r3], #4
> subs r4, r4, #1
> bne 1b
> @@ -459,8 +468,11 @@ vcpu .req r0 @ vcpu pointer always in r0
> ldr r4, [r11, #VGIC_CPU_VMCR]
> ldr r8, [r11, #VGIC_CPU_APR]
>
> +ARM_BE8(rev r3, r3 )
> str r3, [r2, #GICH_HCR]
> +ARM_BE8(rev r4, r4 )
> str r4, [r2, #GICH_VMCR]
> +ARM_BE8(rev r8, r8 )
> str r8, [r2, #GICH_APR]
>
> /* Restore list registers */
> @@ -468,6 +480,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> add r3, r11, #VGIC_CPU_LR
> ldr r4, [r11, #VGIC_CPU_NR_LR]
> 1: ldr r6, [r3], #4
> +ARM_BE8(rev r6, r6 )
> str r6, [r2], #4
> subs r4, r4, #1
> bne 1b
> @@ -498,7 +511,7 @@ vcpu .req r0 @ vcpu pointer always in r0
> mcr p15, 0, r2, c14, c3, 1 @ CNTV_CTL
> isb
>
> - mrrc p15, 3, r2, r3, c14 @ CNTV_CVAL
> + mrrc p15, 3, rr_lo_hi(r2, r3), c14 @ CNTV_CVAL
> ldr r4, =VCPU_TIMER_CNTV_CVAL
> add r5, vcpu, r4
> strd r2, r3, [r5]
> @@ -538,12 +551,12 @@ vcpu .req r0 @ vcpu pointer always in r0
>
> ldr r2, [r4, #KVM_TIMER_CNTVOFF]
> ldr r3, [r4, #(KVM_TIMER_CNTVOFF + 4)]
> - mcrr p15, 4, r2, r3, c14 @ CNTVOFF
> + mcrr p15, 4, rr_lo_hi(r2, r3), c14 @ CNTVOFF
>
> ldr r4, =VCPU_TIMER_CNTV_CVAL
> add r5, vcpu, r4
> ldrd r2, r3, [r5]
> - mcrr p15, 3, r2, r3, c14 @ CNTV_CVAL
> + mcrr p15, 3, rr_lo_hi(r2, r3), c14 @ CNTV_CVAL
> isb
>
> ldr r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
> --
> 1.8.1.4
>
Thanks,
--
Christoffer
* [PATCH REPOST 3/5] ARM: kvm one_reg coproc set and get BE fixes
2013-12-20 16:48 ` [PATCH REPOST 3/5] ARM: kvm one_reg coproc set and get BE fixes Victor Kamensky
@ 2014-01-21 1:18 ` Christoffer Dall
0 siblings, 0 replies; 26+ messages in thread
From: Christoffer Dall @ 2014-01-21 1:18 UTC (permalink / raw)
To: linux-arm-kernel
Hi Victor,
On Fri, Dec 20, 2013 at 08:48:43AM -0800, Victor Kamensky wrote:
> This patch fixes issue of reading and writing
interesting line break.
an issue with
> ARM V7 registers values from/to user land. Existing code was designed to
> work only in LE case.
The existing code...
'LE case'? 'little-endian'?
>
> struct kvm_one_reg
> ------------------
>
> registers value passed through kvm_one_reg structure. It is used by
registers value passed through? What are you trying to say?
> KVM_SET_ONE_REG, KVM_GET_ONE_REG ioctls. Note by looking at structure
by the KVM...
the structure
> itself we cannot tell what is size of register. Note that structure carries
the size of the register
> address of user memory, 'addr' where register should be read or written
I'm a little confused as to the value of this section of the commit
text. I believe the ONE_REG interface is quite well documented
already...
>
> Setting register (from user-land to kvm)
> ----------------------------------------
>
> kvm_arm_set_reg takes vcpu and pointer to struct kvm_one_reg which already
> read from user space
I think you could ditch this first sentence
>
> kvm_arm_set_reg calls set_core_reg or kvm_arm_coproc_set_reg
nit: adding kvm_arm_set_reg() makes it clear that this is the function
you're refering to, and not the ioctl as a concept.
>
> set_core_reg deals only with 4 bytes registers, it just reads 4 bytes from
> user space and store it properly into vcpu->arch.regs
stores
>
> kvm_arm_coproc_set_reg deals with registers of different size. At certain
different sizes
At a certain point
> point code reaches phase where it retrieves description of register by id
the description of a register
> and it knows register size, which could be either 4 or 8 bytes. Kernel code
s/could be/is/
Kernel code is ready?
> is ready to read values from user space, but destination type may vary. It
> could be pointer to 32 bit integer or it could be pointer to 64 bit
> integer. And all possible permutation of size and destination pointer are
permutations
> possible. Depending on destination pointer type, 4 bytes or 8 bytes, two
the destination pointer type
> new helper functions are introduced - reg_from_user32 and reg_from_user64.
> They are used instead of reg_from_user function which could work only in
> LE case.
which only worked in
>
> Size sizeof(*DstInt) Function used to read from user
> 4 4 reg_from_user32
> 8 4 reg_from_user32 - read two registers
> 4 8 reg_from_user64 - need special handling for BE
> 8 8 reg_from_user64
>
> Getting register (to user-land from kvm)
> ----------------------------------------
>
> Situation with reading registers is similar to writing. Integer pointer
The situation
> type of register to be copied could be 4 or 8 bytes. And size passed in
The integer pointer
pointer to be copied? Please clarify what you are referring to.
> struct kvm_one_reg could be 4 or 8. And any permutation is possible.
Any permutation of source pointer type and size is possible.
> Depending on src pointer type, 4 bytes or 8 bytes, two new helper functions
> are introduced - reg_from_user32 and reg_to_user64. They are used instead
reg_to_user32?
> of reg_to_user function, which could work only in LE case.
the reg_to_user, which worked only for LE.
>
> Size sizeof(*SrcInt) Function used to write to user
> 4 4 reg_to_user32
> 8 4 reg_to_user32 - writes two registers
> 4 8 reg_to_user64 - need special handleing for BE
> 8 8 reg_to_user64
I think it could be slightly more helpful to put a comment on the
functions, like "Write to 32-bit user pointer" on reg_to_user32, but
it's up to you.
>
> Note code does assume that it can only deals with 4 or 8 byte registers.
Note: We only support register sizes of 4 or 8 bytes.
>
> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
I don't mean to hammer on your commit message with all my comments. I
really do appreciate you taking the time to document your changes.
However, with the level of detail you are providing in the commit, I
think you have to be slightly more careful with the language, so that it
doesn't end up being misleading instead of helpful. I think you could
sum this up much shorter to simply say that core register handling is
already endian-safe, but coprocessors and vfpregs use reg_to_user which
is not endian-safe, and therefore needs changing.
The motivation about the pointer types and register sizes being
arbitrarily different is important though, so I appreciate you listing
that.
> ---
> arch/arm/kvm/coproc.c | 94 +++++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 69 insertions(+), 25 deletions(-)
>
> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> index 78c0885..64b2b94 100644
> --- a/arch/arm/kvm/coproc.c
> +++ b/arch/arm/kvm/coproc.c
> @@ -634,17 +634,61 @@ static struct coproc_reg invariant_cp15[] = {
> { CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
> };
>
> -static int reg_from_user(void *val, const void __user *uaddr, u64 id)
> +static int reg_from_user64(u64 *val, const void __user *uaddr, u64 id)
> +{
> + unsigned long regsize = KVM_REG_SIZE(id);
> + union {
> + u32 word;
> + u64 dword;
> + } tmp = {0};
> +
> + if (copy_from_user(&tmp, uaddr, regsize) != 0)
> + return -EFAULT;
> +
> + switch (regsize) {
> + case 4:
> + *val = tmp.word;
> + break;
> + case 8:
> + *val = tmp.dword;
> + break;
> + }
> + return 0;
> +}
You stated in the commit message that any permutation of
KVM_REG_SIZE(id) and sizeof(*val) is possible.
So doesn't this totally mess up the kernel if I pass a 32-bit
pointer to reg_from_user64? Or is that not really the case and that's
an exception to all of the permutations?
Basically your KVM_REG_SIZE(id) and the size of your destination
pointer type should always match, but we abuse this slightly so far. I
don't think you should cater to that, but just require callers to
always provide a consistent size/type pair (you could also use a union
as a parameter instead, or have two typed parameters) and simplify
into a single function.
The only special cases you now have to deal with are in:
set_invariant_cp15(): declare two temp variables of u32 and u64 sizes
get_invariant_cp15(): either have temporary values or change val in
coproc_reg to be a union
The current scheme is pretty hard to understand and to make sure we're
not breaking anything...
> +
> +/* Note it may really copy two u32 registers */
> +static int reg_from_user32(u32 *val, const void __user *uaddr, u64 id)
> {
> - /* This Just Works because we are little endian. */
> if (copy_from_user(val, uaddr, KVM_REG_SIZE(id)) != 0)
> return -EFAULT;
> return 0;
> }
>
> -static int reg_to_user(void __user *uaddr, const void *val, u64 id)
> +static int reg_to_user64(void __user *uaddr, const u64 *val, u64 id)
> +{
> + unsigned long regsize = KVM_REG_SIZE(id);
> + union {
> + u32 word;
> + u64 dword;
> + } tmp;
> +
> + switch (regsize) {
> + case 4:
> + tmp.word = *val;
> + break;
> + case 8:
> + tmp.dword = *val;
> + break;
> + }
> +
> + if (copy_to_user(uaddr, &tmp, regsize) != 0)
> + return -EFAULT;
> + return 0;
> +}
> +
> +/* Note it may really copy two u32 registers */
> +static int reg_to_user32(void __user *uaddr, const u32 *val, u64 id)
> {
> - /* This Just Works because we are little endian. */
> if (copy_to_user(uaddr, val, KVM_REG_SIZE(id)) != 0)
> return -EFAULT;
> return 0;
> @@ -662,7 +706,7 @@ static int get_invariant_cp15(u64 id, void __user *uaddr)
> if (!r)
> return -ENOENT;
>
> - return reg_to_user(uaddr, &r->val, id);
> + return reg_to_user64(uaddr, &r->val, id);
> }
>
> static int set_invariant_cp15(u64 id, void __user *uaddr)
> @@ -678,7 +722,7 @@ static int set_invariant_cp15(u64 id, void __user *uaddr)
> if (!r)
> return -ENOENT;
>
> - err = reg_from_user(&val, uaddr, id);
> + err = reg_from_user64(&val, uaddr, id);
> if (err)
> return err;
>
> @@ -846,7 +890,7 @@ static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
> if (vfpid < num_fp_regs()) {
> if (KVM_REG_SIZE(id) != 8)
> return -ENOENT;
> - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
> + return reg_to_user64(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
> id);
> }
>
> @@ -856,22 +900,22 @@ static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
>
> switch (vfpid) {
> case KVM_REG_ARM_VFP_FPEXC:
> - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
> + return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
> case KVM_REG_ARM_VFP_FPSCR:
> - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
> + return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
> case KVM_REG_ARM_VFP_FPINST:
> - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst, id);
> + return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpinst, id);
> case KVM_REG_ARM_VFP_FPINST2:
> - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst2, id);
> + return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpinst2, id);
> case KVM_REG_ARM_VFP_MVFR0:
> val = fmrx(MVFR0);
> - return reg_to_user(uaddr, &val, id);
> + return reg_to_user32(uaddr, &val, id);
> case KVM_REG_ARM_VFP_MVFR1:
> val = fmrx(MVFR1);
> - return reg_to_user(uaddr, &val, id);
> + return reg_to_user32(uaddr, &val, id);
> case KVM_REG_ARM_VFP_FPSID:
> val = fmrx(FPSID);
> - return reg_to_user(uaddr, &val, id);
> + return reg_to_user32(uaddr, &val, id);
> default:
> return -ENOENT;
> }
> @@ -890,8 +934,8 @@ static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)
> if (vfpid < num_fp_regs()) {
> if (KVM_REG_SIZE(id) != 8)
> return -ENOENT;
> - return reg_from_user(&vcpu->arch.vfp_guest.fpregs[vfpid],
> - uaddr, id);
> + return reg_from_user64(&vcpu->arch.vfp_guest.fpregs[vfpid],
> + uaddr, id);
> }
>
> /* FP control registers are all 32 bit. */
> @@ -900,28 +944,28 @@ static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)
>
> switch (vfpid) {
> case KVM_REG_ARM_VFP_FPEXC:
> - return reg_from_user(&vcpu->arch.vfp_guest.fpexc, uaddr, id);
> + return reg_from_user32(&vcpu->arch.vfp_guest.fpexc, uaddr, id);
> case KVM_REG_ARM_VFP_FPSCR:
> - return reg_from_user(&vcpu->arch.vfp_guest.fpscr, uaddr, id);
> + return reg_from_user32(&vcpu->arch.vfp_guest.fpscr, uaddr, id);
> case KVM_REG_ARM_VFP_FPINST:
> - return reg_from_user(&vcpu->arch.vfp_guest.fpinst, uaddr, id);
> + return reg_from_user32(&vcpu->arch.vfp_guest.fpinst, uaddr, id);
> case KVM_REG_ARM_VFP_FPINST2:
> - return reg_from_user(&vcpu->arch.vfp_guest.fpinst2, uaddr, id);
> + return reg_from_user32(&vcpu->arch.vfp_guest.fpinst2, uaddr, id);
> /* These are invariant. */
> case KVM_REG_ARM_VFP_MVFR0:
> - if (reg_from_user(&val, uaddr, id))
> + if (reg_from_user32(&val, uaddr, id))
> return -EFAULT;
> if (val != fmrx(MVFR0))
> return -EINVAL;
> return 0;
> case KVM_REG_ARM_VFP_MVFR1:
> - if (reg_from_user(&val, uaddr, id))
> + if (reg_from_user32(&val, uaddr, id))
> return -EFAULT;
> if (val != fmrx(MVFR1))
> return -EINVAL;
> return 0;
> case KVM_REG_ARM_VFP_FPSID:
> - if (reg_from_user(&val, uaddr, id))
> + if (reg_from_user32(&val, uaddr, id))
> return -EFAULT;
> if (val != fmrx(FPSID))
> return -EINVAL;
> @@ -968,7 +1012,7 @@ int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> return get_invariant_cp15(reg->id, uaddr);
>
> /* Note: copies two regs if size is 64 bit. */
> - return reg_to_user(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
> + return reg_to_user32(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
> }
>
> int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> @@ -987,7 +1031,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> return set_invariant_cp15(reg->id, uaddr);
>
> /* Note: copies two regs if size is 64 bit */
> - return reg_from_user(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
> + return reg_from_user32(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
> }
>
> static unsigned int num_demux_regs(void)
> --
> 1.8.1.4
>
Thanks,
--
Christoffer
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH REPOST 4/5] ARM: kvm vgic mmio should return data in BE format in BE case
2013-12-20 16:48 ` [PATCH REPOST 4/5] ARM: kvm vgic mmio should return data in BE format in BE case Victor Kamensky
@ 2014-01-21 1:19 ` Christoffer Dall
0 siblings, 0 replies; 26+ messages in thread
From: Christoffer Dall @ 2014-01-21 1:19 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Dec 20, 2013 at 08:48:44AM -0800, Victor Kamensky wrote:
> KVM mmio in BE case assumes that data it recieves is in BE format. Vgic
> operates in LE, so need byteswap data in BE case.
>
> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
> ---
> virt/kvm/arm/vgic.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 685fc72..7e11458 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -236,12 +236,12 @@ static void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
>
> static u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
> {
> - return *((u32 *)mmio->data) & mask;
> + return le32_to_cpu(*((u32 *)mmio->data)) & mask;
> }
>
> static void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
> {
> - *((u32 *)mmio->data) = value & mask;
> + *((u32 *)mmio->data) = cpu_to_le32(value) & mask;
> }
>
> /**
> --
> 1.8.1.4
>
The VGIC code is complicated enough without adding endianness logic in
its depths. I would strongly prefer that the VGIC emulation is an
endianness-agnostic software model of a device. In fact, a better fix
for this whole situation would probably be to let the vgic_handle_mmio()
function take a typed union (or a u64) instead of the byte array and
deal with any endianness conversion outside of the vgic itself.
-Christoffer
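The endianness-agnostic accessors this review asks for can be sketched in plain C. This is a hypothetical stand-in for the kernel helpers, not the merged code: it fixes the byte layout of the data buffer (byte 0 is always the LSB) by assembling from explicit byte positions instead of dereferencing a u32 pointer, so the result is the same on LE and BE hosts. Note the mask is applied to the host-order value before the bytes are laid out.

```c
#include <stdint.h>

/* Read a 32-bit value from an mmio-style byte buffer whose layout is
 * fixed as little-endian: data[0] is the LSB, data[3] the MSB. */
static uint32_t mmio_data_read(const uint8_t *data, uint32_t mask)
{
    uint32_t v = (uint32_t)data[0] |
                 ((uint32_t)data[1] << 8) |
                 ((uint32_t)data[2] << 16) |
                 ((uint32_t)data[3] << 24);
    return v & mask;
}

/* Write a 32-bit value into the buffer with the same fixed layout.
 * Mask in host order first, then lay out the bytes explicitly. */
static void mmio_data_write(uint8_t *data, uint32_t mask, uint32_t value)
{
    uint32_t v = value & mask;

    data[0] = v & 0xff;
    data[1] = (v >> 8) & 0xff;
    data[2] = (v >> 16) & 0xff;
    data[3] = (v >> 24) & 0xff;
}
```

Because no integer pointer is ever dereferenced, the device model itself never needs to know the host endianness, which is the property the review is asking for.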
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-07 1:59 ` Victor Kamensky
@ 2014-01-21 1:19 ` Christoffer Dall
2014-01-21 5:24 ` Victor Kamensky
0 siblings, 1 reply; 26+ messages in thread
From: Christoffer Dall @ 2014-01-21 1:19 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
> On 6 January 2014 14:56, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> > On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
> >> On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >> > No matter how data is stored in memory (BE, LE, or
> >> > even PDP endianness), CPU registers always have a consistent
> >> > representation. They are immune to CPU endianness change, and storing
> >> > to/reading from memory won't change the value, as long as you use the
> >> > same endianness for writing/reading.
> >>
> >> Ah, endianness. This always confuses me, but I hope the following
> >> is correct... (in all the following when I say BE I mean BE8, not BE32,
> >> since BE32 and virtualization never occur in the same CPU).
> >>
> >> Certainly registers don't have endianness, but the entire point
> >> of the CPSR.E bit is exactly that it changes the value as it is
> >> stored to / read from memory, isn't it? -- that's where and when the
> >> byte-lane flipping happens.
> >>
> >> Where this impacts the hypervisor is that instead of actually sending
> >> the data out to the bus via the byte-swapping h/w, we've trapped instead.
> >> The hypervisor reads the original data directly from the guest CPU
> >> registers, and so it's the hypervisor and userspace support code that
> >> between them have to emulate the equivalent of the byte lane
> >> swapping h/w. You could argue that it shouldn't be the kernel's
> >> job, but since the kernel has to do it for the devices it emulates
> >> internally, I'm not sure that makes much sense.
> >
> > As far as I understand, this is exactly what vcpu_data_guest_to_host and
> > vcpu_data_host_to_guest do; emulate the byte lane swapping.
> >
> > The problem is that it only works on a little-endian host with the
> > current code, because be16_to_cpu (for example), actually perform a
> > byteswap, which is what needs to be emulated. On a big-endian host, we
> > do nothing, so we end up giving a byteswapped value to the emulated
> > device.
>
> Yes, that was my point on the thread: vcpu_data_guest_to_host and
> vcpu_data_host_to_guest functions for any given host endianity should
> give opposite endian results depending on CPSR E bit value. And
> currently it is not happening in BE host case. It seems that Peter and
> you agree with that and I gave example in another email with
> dynamically switching E bit illustrating this problem for BE host.
>
> > I think a cleaner fix than this patch is to just change the
> > be16_to_cpu() to a __swab16() instead, which clearly indicates that
> > 'here is the byte lane swapping'.
>
> Yes, that may work, but it is a bit orthogonal issue.
Why? I don't think it is, I think it's addressing exactly the point at
hand.
> And I don't think
> it is better. For this to work one need to change canonical endianity on
> one of the sides around vcpu_data_guest_to_host and
> vcpu_data_host_to_guest functions.
You have to simply clearly define which format you want mmio.data to be
in. This is a user space interface across multiple architectures and
therefore something you have to consider carefully and you're limited in
choices to something that works with existing user space code.
>
> Changing it on side that faces hypervisor (code that handles guest spilled
> CPU register set) does not make sense at all - if we will keep guest CPU
> register set in memory in LE form and hypervisor runs in BE (BE host),
> code that spills registers would need to do constant byteswaps. Also any
> access by host kernel and hypervisor (all running in BE) would need to do
> byteswaps while working with guest saved registers.
>
> Changing canonical form of data on side that faces emulator and mmio
> part of kvm_run does not make sense either. kvm_run mmio.data field is
> bytes array, when it comes to host kernel from emulator, it already contains
> device memory in correct endian order that corresponds to endianity of
> emulated device. For example for LE device word read access, after call is
> emulated, mmio.data will contain mmio.data[0], mmio.data[1], mmio.data[2]
> mmio.data[3] values in LE order (mmio.data[3] is MSB). Now look at
> mmio_read_buf function introduced by Marc's 6d89d2d9 commit, this function
> will byte copy this mmio.data buffer into integer according to ongoing mmio
> access size. Note in BE host case such integer, in 'data' variable of
> kvm_handle_mmio_return function, will have byteswapped value. Now when it will
> be passed into vcpu_data_host_to_guest function, and it emulates read access
> of guest with E bit set, and if we follow your suggestion, it will be
> byteswapped.
> I.e 'data' integer will contain non byteswapped value of LE device. It will be
> further stored into some vcpu_reg register, still in native format (BE
> store), and
> further restored into guest CPU register, still non byteswapped (BE hypervisor).
> And that is not what BE client reading word of LE device expects - BE client
> knowing that it reads LE device with E bit set, it will issue additional rev
> instruction to get device memory as integer. If we really want to follow your
> suggestion, one may introduce compensatory byteswaps in mmio_read_buf
> and mmio_write_buf functions in case of BE host, rather than just do
> memcpy ... but I am not sure what it will buy us - in BE case it will swap data
> twice.
>
> Note in above description by "canonical" I mean some form of data regardless
> of current access CPSR E value. But it may differ depending on host endianess.
>
There's a lot of text to digest here, talking about a canonical form
here doesn't help; just define the layout of the destination byte array.
I also got completely lost in what you're referring to when you talk
about 'sides' here.
The thing we must decide is how the data is stored in
kvm_exit_mmio.data. See Peter's recent thread "KVM and
variable-endianness guest CPUs". Once we agree on this, the rest should
be easy (assuming we use the same structure for the data in the kernel's
internal kvm_exit_mmio declared on the stack in io_mem_abort()).
The format you suggest requires any consumer of this data to consider
the host endianness, which I don't think makes anything more clear (see
my comment on the vgic patch).
The in-kernel interface between the io_mem_abort() code and any
in-kernel emulated device must do exactly the same as the interface
between KVM and QEMU must do for KVM_EXIT_MMIO.
--
Christoffer
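The be16_to_cpu() vs. __swab16() point discussed above can be sketched in a few lines of plain C (illustrative names, not the kernel's actual implementation). The byte-lane flipping the hypervisor must emulate depends only on the guest's CPSR.E view of the access, so it is an unconditional swap; a be16_to_cpu() would instead compile to a swap or a no-op depending on *host* endianness, which is the bug on a BE host.

```c
#include <stdint.h>

/* Unconditional 16-bit byte swap - this is what __swab16() does,
 * independent of the endianness of the host it runs on. */
static uint16_t swab16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Emulate the byte-lane flipping hardware for a trapped 16-bit guest
 * access: swap exactly when the guest accesses with CPSR.E set,
 * regardless of host endianness. */
static uint16_t guest_to_host16(int guest_bigendian, uint16_t reg)
{
    return guest_bigendian ? swab16(reg) : reg;
}
```

With be16_to_cpu() in place of swab16(), a BE host would perform no swap at all and hand a byteswapped value to the emulated device, which is exactly the failure described in this thread.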
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-21 1:19 ` Christoffer Dall
@ 2014-01-21 5:24 ` Victor Kamensky
2014-01-21 5:46 ` Anup Patel
2014-01-21 6:03 ` Christoffer Dall
0 siblings, 2 replies; 26+ messages in thread
From: Victor Kamensky @ 2014-01-21 5:24 UTC (permalink / raw)
To: linux-arm-kernel
On 20 January 2014 17:19, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
>> On 6 January 2014 14:56, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>> > On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
>> >> On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> >> > No matter how data is stored in memory (BE, LE, or
>> >> > even PDP endianness), CPU registers always have a consistent
>> >> > representation. They are immune to CPU endianness change, and storing
>> >> > to/reading from memory won't change the value, as long as you use the
>> >> > same endianness for writing/reading.
>> >>
>> >> Ah, endianness. This always confuses me, but I hope the following
>> >> is correct... (in all the following when I say BE I mean BE8, not BE32,
>> >> since BE32 and virtualization never occur in the same CPU).
>> >>
>> >> Certainly registers don't have endianness, but the entire point
>> >> of the CPSR.E bit is exactly that it changes the value as it is
>> >> stored to / read from memory, isn't it? -- that's where and when the
>> >> byte-lane flipping happens.
>> >>
>> >> Where this impacts the hypervisor is that instead of actually sending
>> >> the data out to the bus via the byte-swapping h/w, we've trapped instead.
>> >> The hypervisor reads the original data directly from the guest CPU
>> >> registers, and so it's the hypervisor and userspace support code that
>> >> between them have to emulate the equivalent of the byte lane
>> >> swapping h/w. You could argue that it shouldn't be the kernel's
>> >> job, but since the kernel has to do it for the devices it emulates
>> >> internally, I'm not sure that makes much sense.
>> >
>> > As far as I understand, this is exactly what vcpu_data_guest_to_host and
>> > vcpu_data_host_to_guest do; emulate the byte lane swapping.
>> >
>> > The problem is that it only works on a little-endian host with the
>> > current code, because be16_to_cpu (for example), actually perform a
>> > byteswap, which is what needs to be emulated. On a big-endian host, we
>> > do nothing, so we end up giving a byteswapped value to the emulated
>> > device.
>>
>> Yes, that was my point on the thread: vcpu_data_guest_to_host and
>> vcpu_data_host_to_guest functions for any given host endianity should
>> give opposite endian results depending on CPSR E bit value. And
>> currently it is not happening in BE host case. It seems that Peter and
>> you agree with that and I gave example in another email with
>> dynamically switching E bit illustrating this problem for BE host.
>>
>> > I think a cleaner fix than this patch is to just change the
>> > be16_to_cpu() to a __swab16() instead, which clearly indicates that
>> > 'here is the byte lane swapping'.
>>
>> Yes, that may work, but it is a bit orthogonal issue.
>
> Why? I don't think it is, I think it's addressing exactly the point at
> hand.
>
>> And I don't think
>> it is better. For this to work one need to change canonical endianity on
>> one of the sides around vcpu_data_guest_to_host and
>> vcpu_data_host_to_guest functions.
>
> You have to simply clearly define which format you want mmio.data to be
> in.
I believe it is already decided. 'mmio.data' in 'struct kvm_run' is not
an integer type - it is a byte array, and a byte array has no endianness;
it is endian agnostic. Here is a snippet from linux/kvm.h:
/* KVM_EXIT_MMIO */
struct {
__u64 phys_addr;
__u8 data[8];
__u32 len;
__u8 is_write;
} mmio;
it is very natural to treat it as just a piece of memory. I.e. when code
reads an emulated LE device address as an integer, this array will contain
the integer laid out in memory in LE order (data[3] is the MSB), exactly as
it would be in regular memory. When code reads an emulated BE device
address as an integer, the array will contain the integer laid out in BE
order (data[0] is the MSB).
You can think about it this way: an ARM system emulator runs on x86
(LE) and on PPC (BE). How should the mmio.data array for the same
emulated device look across these two cases? I believe it should
be identical - just a stream of bytes.
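The layout described above can be made concrete with a small sketch (illustrative helpers, not code from the series): for a 32-bit read of value 0x11223344, an LE device leaves the bytes in mmio.data as {0x44, 0x33, 0x22, 0x11}, a BE device as {0x11, 0x22, 0x33, 0x44}, regardless of the host the emulator runs on.

```c
#include <stdint.h>

/* Fill an mmio.data-style buffer for a 32-bit read from a
 * little-endian device: data[3] ends up holding the MSB. */
static void fill_le32(uint8_t *data, uint32_t val)
{
    for (int i = 0; i < 4; i++)
        data[i] = (val >> (8 * i)) & 0xff;
}

/* Same value read from a big-endian device: data[0] holds the MSB. */
static void fill_be32(uint8_t *data, uint32_t val)
{
    for (int i = 0; i < 4; i++)
        data[3 - i] = (val >> (8 * i)) & 0xff;
}
```

Both functions produce the same byte stream on any host, which is the "just a piece of memory" property being argued for.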
Emulator code handles this situation quite nicely. For example, see the
endianness field of qemu's MemoryRegionOps structure. Depending on that
field's value and the emulator's own endianness, the code places results
into the 'mmio.data' array in the right order. See [1] for an example
in qemu where the endianness of certain ARM devices was not declared
correctly - they were marked DEVICE_NATIVE_ENDIAN whereas they
should be DEVICE_LITTLE_ENDIAN. After I changed that, BE qemu
pretty much started working. I strongly suspect that anyone running
ARM system emulation on PPC (BE) would need the same
changes.
Note that the virtio endianness issue is a very different problem: there
it is not clear, for a given host/emulator arrangement, whether to
treat virtio devices as LE or BE, or in what format the data in the
ring descriptors is.
Thanks,
Victor
[1] https://git.linaro.org/people/victor.kamensky/qemu-be.git/commitdiff/8599358f9711b7a546a2bba63b6277fbfb5b8e0c?hp=c4880f08ff9451e3d8020153e1a710ab4acee151
> This is a user space interface across multiple architectures and
> therefore something you have to consider carefully and you're limited in
> choices to something that works with existing user space code.
>
>>
>> Changing it on side that faces hypervisor (code that handles guest spilled
>> CPU register set) does not make sense at all - if we will keep guest CPU
>> register set in memory in LE form and hypervisor runs in BE (BE host),
>> code that spills registers would need to do constant byteswaps. Also any
>> access by host kernel and hypervisor (all running in BE) would need to do
>> byteswaps while working with guest saved registers.
>>
>> Changing canonical form of data on side that faces emulator and mmio
>> part of kvm_run does not make sense either. kvm_run mmio.data field is
>> bytes array, when it comes to host kernel from emulator, it already contains
>> device memory in correct endian order that corresponds to endianity of
>> emulated device. For example for LE device word read access, after call is
>> emulated, mmio.data will contain mmio.data[0], mmio.data[1], mmio.data[2]
>> mmio.data[3] values in LE order (mmio.data[3] is MSB). Now look at
>> mmio_read_buf function introduced by Marc's 6d89d2d9 commit, this function
>> will byte copy this mmio.data buffer into integer according to ongoing mmio
>> access size. Note in BE host case such integer, in 'data' variable of
>> kvm_handle_mmio_return function, will have byteswapped value. Now when it will
>> be passed into vcpu_data_host_to_guest function, and it emulates read access
>> of guest with E bit set, and if we follow your suggestion, it will be
>> byteswapped.
>> I.e 'data' integer will contain non byteswapped value of LE device. It will be
>> further stored into some vcpu_reg register, still in native format (BE
>> store), and
>> further restored into guest CPU register, still non byteswapped (BE hypervisor).
>> And that is not what BE client reading word of LE device expects - BE client
>> knowing that it reads LE device with E bit set, it will issue additional rev
>> instruction to get device memory as integer. If we really want to follow your
>> suggestion, one may introduce compensatory byteswaps in mmio_read_buf
>> and mmio_write_buf functions in case of BE host, rather then just do
>> memcpy ... but I am not sure what it will buy us - in BE case it will swap data
>> twice.
>>
>> Note in above description by "canonical" I mean some form of data regardless
>> of current access CPSR E value. But it may differ depending on host endianess.
>>
>
> There's a lot of text to digest here, talking about a canonical form
> here doesn't help; just define the layout of the destination byte array.
> I also got completely lost in what you're referring to when you talk
> about 'sides' here.
>
> The thing we must decide is how the data is stored in
> kvm_exit_mmio.data. See Peter's recent thread "KVM and
> variable-endianness guest CPUs". Once we agree on this, the rest should
> be easy (assuming we use the same structure for the data in the kernel's
> internal kvm_exit_mmio declared on the stack in io_mem_abort()).
>
> The format you suggest requires any consumer of this data to consider
> the host endianness, which I don't think makes anything more clear (see
> my comment on the vgic patch).
>
> The in-kernel interface between the io_mem_abort() code and any
> in-kernel emulated device must do exactly the same as the interface
> between KVM and QEMU must do for KVM_EXIT_MMIO.
>
> --
> Christoffer
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-21 5:24 ` Victor Kamensky
@ 2014-01-21 5:46 ` Anup Patel
2014-01-21 6:31 ` Christoffer Dall
2014-01-21 6:39 ` Victor Kamensky
2014-01-21 6:03 ` Christoffer Dall
1 sibling, 2 replies; 26+ messages in thread
From: Anup Patel @ 2014-01-21 5:46 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jan 21, 2014 at 10:54 AM, Victor Kamensky
<victor.kamensky@linaro.org> wrote:
> On 20 January 2014 17:19, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>> On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
>>> On 6 January 2014 14:56, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>>> > On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
>>> >> On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> >> > No matter how data is stored in memory (BE, LE, or
>>> >> > even PDP endianness), CPU registers always have a consistent
>>> >> > representation. They are immune to CPU endianness change, and storing
>>> >> > to/reading from memory won't change the value, as long as you use the
>>> >> > same endianness for writing/reading.
>>> >>
>>> >> Ah, endianness. This always confuses me, but I hope the following
>>> >> is correct... (in all the following when I say BE I mean BE8, not BE32,
>>> >> since BE32 and virtualization never occur in the same CPU).
>>> >>
>>> >> Certainly registers don't have endianness, but the entire point
>>> >> of the CPSR.E bit is exactly that it changes the value as it is
>>> >> stored to / read from memory, isn't it? -- that's where and when the
>>> >> byte-lane flipping happens.
>>> >>
>>> >> Where this impacts the hypervisor is that instead of actually sending
>>> >> the data out to the bus via the byte-swapping h/w, we've trapped instead.
>>> >> The hypervisor reads the original data directly from the guest CPU
>>> >> registers, and so it's the hypervisor and userspace support code that
>>> >> between them have to emulate the equivalent of the byte lane
>>> >> swapping h/w. You could argue that it shouldn't be the kernel's
>>> >> job, but since the kernel has to do it for the devices it emulates
>>> >> internally, I'm not sure that makes much sense.
>>> >
>>> > As far as I understand, this is exactly what vcpu_data_guest_to_host and
>>> > vcpu_data_host_to_guest do; emulate the byte lane swapping.
>>> >
>>> > The problem is that it only works on a little-endian host with the
>>> > current code, because be16_to_cpu (for example), actually perform a
>>> > byteswap, which is what needs to be emulated. On a big-endian host, we
>>> > do nothing, so we end up giving a byteswapped value to the emulated
>>> > device.
>>>
>>> Yes, that was my point on the thread: vcpu_data_guest_to_host and
>>> vcpu_data_host_to_guest functions for any given host endianity should
>>> give opposite endian results depending on CPSR E bit value. And
>>> currently it is not happening in BE host case. It seems that Peter and
>>> you agree with that and I gave example in another email with
>>> dynamically switching E bit illustrating this problem for BE host.
>>>
>>> > I think a cleaner fix than this patch is to just change the
>>> > be16_to_cpu() to a __swab16() instead, which clearly indicates that
>>> > 'here is the byte lane swapping'.
>>>
>>> Yes, that may work, but it is a bit orthogonal issue.
>>
>> Why? I don't think it is, I think it's addressing exactly the point at
>> hand.
>>
>>> And I don't think
>>> it is better. For this to work one need to change canonical endianity on
>>> one of the sides around vcpu_data_guest_to_host and
>>> vcpu_data_host_to_guest functions.
>>
>> You have to simply clearly define which format you want mmio.data to be
>> in.
>
> I believe it is already decided. 'mmio.data' in 'struct kvm_run' is not
> an integer type - it is bytes array. Bytes array does not have endianity.
> It is endian agnostic. Here is snippet from linux/kvm.h
>
> /* KVM_EXIT_MMIO */
> struct {
> __u64 phys_addr;
> __u8 data[8];
> __u32 len;
> __u8 is_write;
> } mmio;
>
> it is very natural to treat it as just a piece of memory. I.e when code reads
> emulated LE device address as integer, this array will contain integer
> placed in memory in LE order, data[3] is MSB, as it would be located in
> regular memory. When code reads emulated BE device address as
> integer this array will contain integer placed in memory in BE order,
> data[0] is MSB.
>
> You can think about it in that way: ARM system emulator runs on x86
> (LE) and on PPC (BE). How mmio.data array for the same emulated
> device should look like in across these two cases? I believe it should
> be identical - just a stream of bytes.
>
> Emulator code handles this situation quite nicely. For example check
> in qemu endianness field of MemoryRegionOps structure. Depending
> of the field value and current emulator endianity code will place
> results into 'mmio.data' array in right order. See [1] as an example
> in qemu where endianity of certain ARM devices were not declared
> correctly - it was marked as DEVICE_NATIVE_ENDIAN whereas
> it should be DEVICE_LITTLE_ENDIAN. After I changed that BE qemu
> pretty much started working. I strongly suspect if one would run
> ARM system emulation on PPC (BE) he/she would need the same
> changes.
>
> Note issue with virtio endianity is very different problem - there it
> is not clear for given arrangement of host/emulator how to treat
> virtio devices as LE or BE, and in what format data in rings
> descriptors are.
IMHO, device endianness should be taken care of by the device emulators
only, because a machine model can contain both LE devices and
BE devices. KVM ARM/ARM64 should only worry about the endianness of
in-kernel emulated devices (e.g. the VGIC). In general, QEMU or KVMTOOL
should be responsible for device endianness, and for this QEMU or KVMTOOL
should also know whether the guest (or VM) is little-endian or big-endian.
Regards,
Anup
>
> Thanks,
> Victor
>
> [1] https://git.linaro.org/people/victor.kamensky/qemu-be.git/commitdiff/8599358f9711b7a546a2bba63b6277fbfb5b8e0c?hp=c4880f08ff9451e3d8020153e1a710ab4acee151
>
>> This is a user space interface across multiple architectures and
>> therefore something you have to consider carefully and you're limited in
>> choices to something that works with existing user space code.
>>
>>>
>>> Changing it on side that faces hypervisor (code that handles guest spilled
>>> CPU register set) does not make sense at all - if we will keep guest CPU
>>> register set in memory in LE form and hypervisor runs in BE (BE host),
>>> code that spills registers would need to do constant byteswaps. Also any
>>> access by host kernel and hypervisor (all running in BE) would need to do
>>> byteswaps while working with guest saved registers.
>>>
>>> Changing canonical form of data on side that faces emulator and mmio
>>> part of kvm_run does not make sense either. kvm_run mmio.data field is
>>> bytes array, when it comes to host kernel from emulator, it already contains
>>> device memory in correct endian order that corresponds to endianity of
>>> emulated device. For example for LE device word read access, after call is
>>> emulated, mmio.data will contain mmio.data[0], mmio.data[1], mmio.data[2]
>>> mmio.data[3] values in LE order (mmio.data[3] is MSB). Now look at
>>> mmio_read_buf function introduced by Marc's 6d89d2d9 commit, this function
>>> will byte copy this mmio.data buffer into integer according to ongoing mmio
>>> access size. Note in BE host case such integer, in 'data' variable of
>>> kvm_handle_mmio_return function, will have byteswapped value. Now when it will
>>> be passed into vcpu_data_host_to_guest function, and it emulates read access
>>> of guest with E bit set, and if we follow your suggestion, it will be
>>> byteswapped.
>>> I.e 'data' integer will contain non byteswapped value of LE device. It will be
>>> further stored into some vcpu_reg register, still in native format (BE
>>> store), and
>>> further restored into guest CPU register, still non byteswapped (BE hypervisor).
>>> And that is not what BE client reading word of LE device expects - BE client
>>> knowing that it reads LE device with E bit set, it will issue additional rev
>>> instruction to get device memory as integer. If we really want to follow your
>>> suggestion, one may introduce compensatory byteswaps in mmio_read_buf
>>> and mmio_write_buf functions in case of BE host, rather then just do
>>> memcpy ... but I am not sure what it will buy us - in BE case it will swap data
>>> twice.
>>>
>>> Note in above description by "canonical" I mean some form of data regardless
>>> of current access CPSR E value. But it may differ depending on host endianess.
>>>
>>
>> There's a lot of text to digest here, talking about a canonical form
>> here doesn't help; just define the layout of the destination byte array.
>> I also got completely lost in what you're referring to when you talk
>> about 'sides' here.
>>
>> The thing we must decide is how the data is stored in
>> kvm_exit_mmio.data. See Peter's recent thread "KVM and
>> variable-endianness guest CPUs". Once we agree on this, the rest should
>> be easy (assuming we use the same structure for the data in the kernel's
>> internal kvm_exit_mmio declared on the stack in io_mem_abort()).
>>
>> The format you suggest requires any consumer of this data to consider
>> the host endianness, which I don't think makes anything more clear (see
>> my comment on the vgic patch).
>>
>> The in-kernel interface between the io_mem_abort() code and any
>> in-kernel emulated device must do exactly the same as the interface
>> between KVM and QEMU must do for KVM_EXIT_MMIO.
>>
>> --
>> Christoffer
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-21 5:24 ` Victor Kamensky
2014-01-21 5:46 ` Anup Patel
@ 2014-01-21 6:03 ` Christoffer Dall
1 sibling, 0 replies; 26+ messages in thread
From: Christoffer Dall @ 2014-01-21 6:03 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jan 20, 2014 at 09:24:10PM -0800, Victor Kamensky wrote:
> On 20 January 2014 17:19, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> > On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
> >> On 6 January 2014 14:56, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> >> > On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
> >> >> On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >> >> > No matter how data is stored in memory (BE, LE, or
> >> >> > even PDP endianness), CPU registers always have a consistent
> >> >> > representation. They are immune to CPU endianness change, and storing
> >> >> > to/reading from memory won't change the value, as long as you use the
> >> >> > same endianness for writing/reading.
> >> >>
> >> >> Ah, endianness. This always confuses me, but I hope the following
> >> >> is correct... (in all the following when I say BE I mean BE8, not BE32,
> >> >> since BE32 and virtualization never occur in the same CPU).
> >> >>
> >> >> Certainly registers don't have endianness, but the entire point
> >> >> of the CPSR.E bit is exactly that it changes the value as it is
> >> >> stored to / read from memory, isn't it? -- that's where and when the
> >> >> byte-lane flipping happens.
> >> >>
> >> >> Where this impacts the hypervisor is that instead of actually sending
> >> >> the data out to the bus via the byte-swapping h/w, we've trapped instead.
> >> >> The hypervisor reads the original data directly from the guest CPU
> >> >> registers, and so it's the hypervisor and userspace support code that
> >> >> between them have to emulate the equivalent of the byte lane
> >> >> swapping h/w. You could argue that it shouldn't be the kernel's
> >> >> job, but since the kernel has to do it for the devices it emulates
> >> >> internally, I'm not sure that makes much sense.
> >> >
> >> > As far as I understand, this is exactly what vcpu_data_guest_to_host and
> >> > vcpu_data_host_to_guest do; emulate the byte lane swapping.
> >> >
> >> > The problem is that it only works on a little-endian host with the
> >> > current code, because be16_to_cpu (for example), actually perform a
> >> > byteswap, which is what needs to be emulated. On a big-endian host, we
> >> > do nothing, so we end up giving a byteswapped value to the emulated
> >> > device.
> >>
> >> Yes, that was my point on the thread: vcpu_data_guest_to_host and
> >> vcpu_data_host_to_guest functions for any given host endianity should
> >> give opposite endian results depending on CPSR E bit value. And
> >> currently it is not happening in BE host case. It seems that Peter and
> >> you agree with that and I gave example in another email with
> >> dynamically switching E bit illustrating this problem for BE host.
> >>
> >> > I think a cleaner fix than this patch is to just change the
> >> > be16_to_cpu() to a __swab16() instead, which clearly indicates that
> >> > 'here is the byte lane swapping'.
> >>
> >> Yes, that may work, but it is a bit orthogonal issue.
> >
> > Why? I don't think it is, I think it's addressing exactly the point at
> > hand.
> >
> >> And I don't think
> >> it is better. For this to work one need to change canonical endianity on
> >> one of the sides around vcpu_data_guest_to_host and
> >> vcpu_data_host_to_guest functions.
> >
> > You have to simply clearly define which format you want mmio.data to be
> > in.
>
> I believe it is already decided. 'mmio.data' in 'struct kvm_run' is not
> an integer type - it is bytes array. Bytes array does not have endianity.
Please read through this thread:
https://lists.cs.columbia.edu/pipermail/kvmarm/2014-January/008784.html
> It is endian agnostic. Here is snippet from linux/kvm.h
>
> /* KVM_EXIT_MMIO */
> struct {
> __u64 phys_addr;
> __u8 data[8];
> __u32 len;
> __u8 is_write;
> } mmio;
Thanks, I already knew where to find this though ;)
I realize that it is a byte array. But that doesn't change the fact
that a store of a word would have to put either the most or least
significant byte in data[0].
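(To make that layout question concrete, here is a minimal C sketch - not
kernel code, and the helper names are illustrative only: it fills a
KVM_EXIT_MMIO-style byte array from a 32-bit register value with a plain
host-endian memcpy, roughly what a straight byte copy in the kernel's
mmio write path does, and shows which byte ends up in data[0].)

```c
#include <stdint.h>
#include <string.h>

/* Illustrative helpers only - not the kernel's actual functions. */

/* Copy a 32-bit "register" value into a KVM_EXIT_MMIO-style byte
 * array in host byte order, as a plain memcpy would, and return the
 * byte that lands in data[0]. */
static uint8_t first_byte_of_store(uint32_t reg)
{
    uint8_t data[8] = {0};

    memcpy(data, &reg, sizeof(reg));    /* host-endian store */
    return data[0];
}

/* Runtime check of the host's endianness. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;

    return *(const uint8_t *)&probe;    /* 1 on LE, 0 on BE */
}
```

On a little-endian host, first_byte_of_store(0x11223344) yields 0x44
(LSB first); a big-endian host would put 0x11 there instead - which is
exactly the ambiguity under discussion.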
>
> it is very natural to treat it as just a piece of memory. I.e when code reads
> emulated LE device address as integer, this array will contain integer
> placed in memory in LE order, data[3] is MSB, as it would be located in
> regular memory. When code reads emulated BE device address as
> integer this array will contain integer placed in memory in BE order,
> data[0] is MSB.
I don't understand this. "code reads emulated device address as
integer". The format of the byte array cannot be device-specific,
because the kernel doesn't know about the device. It can only depend on
the endianness of the VM and of the host.
Can you try to specify, in a single sentence, what the format of the
byte array is?
>
> You can think about it in that way: ARM system emulator runs on x86
> (LE) and on PPC (BE). How mmio.data array for the same emulated
> device should look like in across these two cases? I believe it should
> be identical - just a stream of bytes.
Well, KVM/ARM cannot run on PPC for obvious reasons, and this is a KVM
kernel-to-user-space interface.
>
> Emulator code handles this situation quite nicely. For example check
> in qemu endianness field of MemoryRegionOps structure. Depending
> of the field value and current emulator endianity code will place
> results into 'mmio.data' array in right order. See [1] as an example
> in qemu where endianity of certain ARM devices were not declared
> correctly - it was marked as DEVICE_NATIVE_ENDIAN whereas
> it should be DEVICE_LITTLE_ENDIAN. After I changed that BE qemu
> pretty much started working. I strongly suspect if one would run
> ARM system emulation on PPC (BE) he/she would need the same
> changes.
It doesn't really matter what the emulator does if there's no clear
specification of the interface it relies on. It may happen to work in
the cases that are already supported (by chance), but we don't know how
to deal with a new (cross-endianness) situation, because it is not
specified.
>
> Note issue with virtio endianity is very different problem - there it
> is not clear for given arrangement of host/emulator how to treat
> virtio devices as LE or BE, and in what format data in rings
> descriptors are.
>
> Thanks,
> Victor
>
> [1] https://git.linaro.org/people/victor.kamensky/qemu-be.git/commitdiff/8599358f9711b7a546a2bba63b6277fbfb5b8e0c?hp=c4880f08ff9451e3d8020153e1a710ab4acee151
>
> > This is a user space interface across multiple architectures and
> > therefore something you have to consider carefully and you're limited in
> > choices to something that works with existing user space code.
> >
> >>
> >> Changing it on side that faces hypervisor (code that handles guest spilled
> >> CPU register set) does not make sense at all - if we will keep guest CPU
> >> register set in memory in LE form and hypervisor runs in BE (BE host),
> >> code that spills registers would need to do constant byteswaps. Also any
> >> access by host kernel and hypervisor (all running in BE) would need to do
> >> byteswaps while working with guest saved registers.
> >>
> >> Changing canonical form of data on side that faces emulator and mmio
> >> part of kvm_run does not make sense either. kvm_run mmio.data field is
> >> bytes array, when it comes to host kernel from emulator, it already contains
> >> device memory in correct endian order that corresponds to endianity of
> >> emulated device. For example for LE device word read access, after call is
> >> emulated, mmio.data will contain mmio.data[0], mmio.data[1], mmio.data[2]
> >> mmio.data[3] values in LE order (mmio.data[3] is MSB). Now look at
> >> mmio_read_buf function introduced by Marc's 6d89d2d9 commit, this function
> >> will byte copy this mmio.data buffer into integer according to ongoing mmio
> >> access size. Note in BE host case such integer, in 'data' variable of
> >> kvm_handle_mmio_return function, will have byteswapped value. Now when it will
> >> be passed into vcpu_data_host_to_guest function, and it emulates read access
> >> of guest with E bit set, and if we follow your suggestion, it will be
> >> byteswapped.
> >> I.e 'data' integer will contain non byteswapped value of LE device. It will be
> >> further stored into some vcpu_reg register, still in native format (BE
> >> store), and
> >> further restored into guest CPU register, still non byteswapped (BE hypervisor).
> >> And that is not what BE client reading word of LE device expects - BE client
> >> knowing that it reads LE device with E bit set, it will issue additional rev
> >> instruction to get device memory as integer. If we really want to follow your
> >> suggestion, one may introduce compensatory byteswaps in mmio_read_buf
> >> and mmio_write_buf functions in case of BE host, rather then just do
> >> memcpy ... but I am not sure what it will buy us - in BE case it will swap data
> >> twice.
> >>
> >> Note in above description by "canonical" I mean some form of data regardless
> >> of current access CPSR E value. But it may differ depending on host endianess.
> >>
> >
> > There's a lot of text to digest here, talking about a canonical form
> > here doesn't help; just define the layout of the destination byte array.
> > I also got completely lost in what you're referring to when you talk
> > about 'sides' here.
> >
> > The thing we must decide is how the data is stored in
> > kvm_exit_mmio.data. See Peter's recent thread "KVM and
> > variable-endianness guest CPUs". Once we agree on this, the rest should
> > be easy (assuming we use the same structure for the data in the kernel's
> > internal kvm_exit_mmio declared on the stack in io_mem_abort()).
> >
> > The format you suggest requires any consumer of this data to consider
> > the host endianness, which I don't think makes anything more clear (see
> > my comment on the vgic patch).
> >
> > The in-kernel interface between the io_mem_abort() code and any
> > in-kernel emulated device must do exactly the same as the interface
> > between KVM and QEMU must do for KVM_EXIT_MMIO.
> >
> > --
> > Christoffer
--
Christoffer
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-21 5:46 ` Anup Patel
@ 2014-01-21 6:31 ` Christoffer Dall
2014-01-21 6:39 ` Victor Kamensky
1 sibling, 0 replies; 26+ messages in thread
From: Christoffer Dall @ 2014-01-21 6:31 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jan 21, 2014 at 11:16:46AM +0530, Anup Patel wrote:
> On Tue, Jan 21, 2014 at 10:54 AM, Victor Kamensky
> <victor.kamensky@linaro.org> wrote:
> > On 20 January 2014 17:19, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> >> On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
> >>> On 6 January 2014 14:56, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> >>> > On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
> >>> >> On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >>> >> > No matter how data is stored in memory (BE, LE, or
> >>> >> > even PDP endianness), CPU registers always have a consistent
> >>> >> > representation. They are immune to CPU endianness change, and storing
> >>> >> > to/reading from memory won't change the value, as long as you use the
> >>> >> > same endianness for writing/reading.
> >>> >>
> >>> >> Ah, endianness. This always confuses me, but I hope the following
> >>> >> is correct... (in all the following when I say BE I mean BE8, not BE32,
> >>> >> since BE32 and virtualization never occur in the same CPU).
> >>> >>
> >>> >> Certainly registers don't have endianness, but the entire point
> >>> >> of the CPSR.E bit is exactly that it changes the value as it is
> >>> >> stored to / read from memory, isn't it? -- that's where and when the
> >>> >> byte-lane flipping happens.
> >>> >>
> >>> >> Where this impacts the hypervisor is that instead of actually sending
> >>> >> the data out to the bus via the byte-swapping h/w, we've trapped instead.
> >>> >> The hypervisor reads the original data directly from the guest CPU
> >>> >> registers, and so it's the hypervisor and userspace support code that
> >>> >> between them have to emulate the equivalent of the byte lane
> >>> >> swapping h/w. You could argue that it shouldn't be the kernel's
> >>> >> job, but since the kernel has to do it for the devices it emulates
> >>> >> internally, I'm not sure that makes much sense.
> >>> >
> >>> > As far as I understand, this is exactly what vcpu_data_guest_to_host and
> >>> > vcpu_data_host_to_guest do; emulate the byte lane swapping.
> >>> >
> >>> > The problem is that it only works on a little-endian host with the
> >>> > current code, because be16_to_cpu (for example), actually perform a
> >>> > byteswap, which is what needs to be emulated. On a big-endian host, we
> >>> > do nothing, so we end up giving a byteswapped value to the emulated
> >>> > device.
> >>>
> >>> Yes, that was my point on the thread: vcpu_data_guest_to_host and
> >>> vcpu_data_host_to_guest functions for any given host endianity should
> >>> give opposite endian results depending on CPSR E bit value. And
> >>> currently it is not happening in BE host case. It seems that Peter and
> >>> you agree with that and I gave example in another email with
> >>> dynamically switching E bit illustrating this problem for BE host.
> >>>
> >>> > I think a cleaner fix than this patch is to just change the
> >>> > be16_to_cpu() to a __swab16() instead, which clearly indicates that
> >>> > 'here is the byte lane swapping'.
> >>>
> >>> Yes, that may work, but it is a bit orthogonal issue.
> >>
> >> Why? I don't think it is, I think it's addressing exactly the point at
> >> hand.
> >>
> >>> And I don't think
> >>> it is better. For this to work one need to change canonical endianity on
> >>> one of the sides around vcpu_data_guest_to_host and
> >>> vcpu_data_host_to_guest functions.
> >>
> >> You have to simply clearly define which format you want mmio.data to be
> >> in.
> >
> > I believe it is already decided. 'mmio.data' in 'struct kvm_run' is not
> > an integer type - it is bytes array. Bytes array does not have endianity.
> > It is endian agnostic. Here is snippet from linux/kvm.h
> >
> > /* KVM_EXIT_MMIO */
> > struct {
> > __u64 phys_addr;
> > __u8 data[8];
> > __u32 len;
> > __u8 is_write;
> > } mmio;
> >
> > it is very natural to treat it as just a piece of memory. I.e when code reads
> > emulated LE device address as integer, this array will contain integer
> > placed in memory in LE order, data[3] is MSB, as it would be located in
> > regular memory. When code reads emulated BE device address as
> > integer this array will contain integer placed in memory in BE order,
> > data[0] is MSB.
> >
> > You can think about it in that way: ARM system emulator runs on x86
> > (LE) and on PPC (BE). How mmio.data array for the same emulated
> > device should look like in across these two cases? I believe it should
> > be identical - just a stream of bytes.
> >
> > Emulator code handles this situation quite nicely. For example check
> > in qemu endianness field of MemoryRegionOps structure. Depending
> > of the field value and current emulator endianity code will place
> > results into 'mmio.data' array in right order. See [1] as an example
> > in qemu where endianity of certain ARM devices were not declared
> > correctly - it was marked as DEVICE_NATIVE_ENDIAN whereas
> > it should be DEVICE_LITTLE_ENDIAN. After I changed that BE qemu
> > pretty much started working. I strongly suspect if one would run
> > ARM system emulation on PPC (BE) he/she would need the same
> > changes.
> >
> > Note issue with virtio endianity is very different problem - there it
> > is not clear for given arrangement of host/emulator how to treat
> > virtio devices as LE or BE, and in what format data in rings
> > descriptors are.
>
> IMHO, device endianess should be taken care by device emulators only
> because we can have Machine Model containing both LE devices and
> BE devices. KVM ARM/ARM64 should only worry about endianess of
> in-kernel emulated devices (e.g. VGIC). In general, QEMU or KVMTOOL
> should be responsible of device endianess and for this QEMU or KVMTOOL
> should also know whether Guest (or VM) is little-endian or big-endian.
>
Specifying the interface to say that this is a store of the register
value directly using the endianness of the host kernel is an option.
However, user space must fetch the CPSR on each MMIO from the kernel and
look at the E-bit to understand how it should interpret the data, which
may add overhead, and it doesn't change the fact that this needs to be
specified in the API.
The E bit on ARM specifies that the CPU will swap the bytes before
putting the register value on the memory bus. That's all it does.
Something has to emulate this, and given that KVM emulates the CPU, I
think KVM should emulate the E-bit.
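(A sketch of the E-bit emulation being argued for, with names that are
illustrative rather than the kernel's: the byte-lane swap only has to
happen when the endianness of the guest access, per CPSR.E, differs
from the host's.)

```c
#include <stdint.h>

/* Reverse the bytes of a 32-bit value (like the kernel's __swab32). */
static uint32_t swab32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical model of the E-bit byte-lane swap: swap exactly when
 * the guest access's data endianness (CPSR.E) differs from the host's,
 * so the device emulation always sees the value it would have seen on
 * the memory bus. */
static uint32_t emulate_byte_lanes(uint32_t reg, int guest_cpsr_e,
                                   int host_is_be)
{
    return (guest_cpsr_e != host_is_be) ? swab32(reg) : reg;
}
```

With an LE host (host_is_be == 0), a guest access with E set gets its
bytes reversed - which is what the be16_to_cpu() call happens to do
there today; on a BE host the E-set access is left alone and the
E-clear access is the one swapped.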
From my point of view, the mmio.data API represents the signal you
would receive as any consumer of the memory operation external to the
CPU, which would be in the form of a bunch of wires and a length, with
no endianness.
But, the thread I pointed Victor to is focused purely on this
discussion, so you should probably respond there.
-Christoffer
>
> >
> > Thanks,
> > Victor
> >
> > [1] https://git.linaro.org/people/victor.kamensky/qemu-be.git/commitdiff/8599358f9711b7a546a2bba63b6277fbfb5b8e0c?hp=c4880f08ff9451e3d8020153e1a710ab4acee151
> >
> >> This is a user space interface across multiple architectures and
> >> therefore something you have to consider carefully and you're limited in
> >> choices to something that works with existing user space code.
> >>
> >>>
> >>> Changing it on side that faces hypervisor (code that handles guest spilled
> >>> CPU register set) does not make sense at all - if we will keep guest CPU
> >>> register set in memory in LE form and hypervisor runs in BE (BE host),
> >>> code that spills registers would need to do constant byteswaps. Also any
> >>> access by host kernel and hypervisor (all running in BE) would need to do
> >>> byteswaps while working with guest saved registers.
> >>>
> >>> Changing canonical form of data on side that faces emulator and mmio
> >>> part of kvm_run does not make sense either. kvm_run mmio.data field is
> >>> bytes array, when it comes to host kernel from emulator, it already contains
> >>> device memory in correct endian order that corresponds to endianity of
> >>> emulated device. For example for LE device word read access, after call is
> >>> emulated, mmio.data will contain mmio.data[0], mmio.data[1], mmio.data[2]
> >>> mmio.data[3] values in LE order (mmio.data[3] is MSB). Now look at
> >>> mmio_read_buf function introduced by Marc's 6d89d2d9 commit, this function
> >>> will byte copy this mmio.data buffer into integer according to ongoing mmio
> >>> access size. Note in BE host case such integer, in 'data' variable of
> >>> kvm_handle_mmio_return function, will have byteswapped value. Now when it will
> >>> be passed into vcpu_data_host_to_guest function, and it emulates read access
> >>> of guest with E bit set, and if we follow your suggestion, it will be
> >>> byteswapped.
> >>> I.e 'data' integer will contain non byteswapped value of LE device. It will be
> >>> further stored into some vcpu_reg register, still in native format (BE
> >>> store), and
> >>> further restored into guest CPU register, still non byteswapped (BE hypervisor).
> >>> And that is not what BE client reading word of LE device expects - BE client
> >>> knowing that it reads LE device with E bit set, it will issue additional rev
> >>> instruction to get device memory as integer. If we really want to follow your
> >>> suggestion, one may introduce compensatory byteswaps in mmio_read_buf
> >>> and mmio_write_buf functions in case of BE host, rather then just do
> >>> memcpy ... but I am not sure what it will buy us - in BE case it will swap data
> >>> twice.
> >>>
> >>> Note in above description by "canonical" I mean some form of data regardless
> >>> of current access CPSR E value. But it may differ depending on host endianess.
> >>>
> >>
> >> There's a lot of text to digest here, talking about a canonical form
> >> here doesn't help; just define the layout of the destination byte array.
> >> I also got completely lost in what you're referring to when you talk
> >> about 'sides' here.
> >>
> >> The thing we must decide is how the data is stored in
> >> kvm_exit_mmio.data. See Peter's recent thread "KVM and
> >> variable-endianness guest CPUs". Once we agree on this, the rest should
> >> be easy (assuming we use the same structure for the data in the kernel's
> >> internal kvm_exit_mmio declared on the stack in io_mem_abort()).
> >>
> >> The format you suggest requires any consumer of this data to consider
> >> the host endianness, which I don't think makes anything more clear (see
> >> my comment on the vgic patch).
> >>
> >> The in-kernel interface between the io_mem_abort() code and any
> >> in-kernel emulated device must do exactly the same as the interface
> >> between KVM and QEMU must do for KVM_EXIT_MMIO.
> >>
> >> --
> >> Christoffer
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm at lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
--
Christoffer
* [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code
2014-01-21 5:46 ` Anup Patel
2014-01-21 6:31 ` Christoffer Dall
@ 2014-01-21 6:39 ` Victor Kamensky
1 sibling, 0 replies; 26+ messages in thread
From: Victor Kamensky @ 2014-01-21 6:39 UTC (permalink / raw)
To: linux-arm-kernel
Hi Anup,
On 20 January 2014 21:46, Anup Patel <anup@brainfault.org> wrote:
> On Tue, Jan 21, 2014 at 10:54 AM, Victor Kamensky
> <victor.kamensky@linaro.org> wrote:
>> On 20 January 2014 17:19, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>>> On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
>>>> On 6 January 2014 14:56, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>>>> > On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
>>>> >> On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>> >> > No matter how data is stored in memory (BE, LE, or
>>>> >> > even PDP endianness), CPU registers always have a consistent
>>>> >> > representation. They are immune to CPU endianness change, and storing
>>>> >> > to/reading from memory won't change the value, as long as you use the
>>>> >> > same endianness for writing/reading.
>>>> >>
>>>> >> Ah, endianness. This always confuses me, but I hope the following
>>>> >> is correct... (in all the following when I say BE I mean BE8, not BE32,
>>>> >> since BE32 and virtualization never occur in the same CPU).
>>>> >>
>>>> >> Certainly registers don't have endianness, but the entire point
>>>> >> of the CPSR.E bit is exactly that it changes the value as it is
>>>> >> stored to / read from memory, isn't it? -- that's where and when the
>>>> >> byte-lane flipping happens.
>>>> >>
>>>> >> Where this impacts the hypervisor is that instead of actually sending
>>>> >> the data out to the bus via the byte-swapping h/w, we've trapped instead.
>>>> >> The hypervisor reads the original data directly from the guest CPU
>>>> >> registers, and so it's the hypervisor and userspace support code that
>>>> >> between them have to emulate the equivalent of the byte lane
>>>> >> swapping h/w. You could argue that it shouldn't be the kernel's
>>>> >> job, but since the kernel has to do it for the devices it emulates
>>>> >> internally, I'm not sure that makes much sense.
>>>> >
>>>> > As far as I understand, this is exactly what vcpu_data_guest_to_host and
>>>> > vcpu_data_host_to_guest do; emulate the byte lane swapping.
>>>> >
>>>> > The problem is that it only works on a little-endian host with the
>>>> > current code, because be16_to_cpu (for example), actually perform a
>>>> > byteswap, which is what needs to be emulated. On a big-endian host, we
>>>> > do nothing, so we end up giving a byteswapped value to the emulated
>>>> > device.
>>>>
>>>> Yes, that was my point on the thread: vcpu_data_guest_to_host and
>>>> vcpu_data_host_to_guest functions for any given host endianity should
>>>> give opposite endian results depending on CPSR E bit value. And
>>>> currently it is not happening in BE host case. It seems that Peter and
>>>> you agree with that and I gave example in another email with
>>>> dynamically switching E bit illustrating this problem for BE host.
>>>>
>>>> > I think a cleaner fix than this patch is to just change the
>>>> > be16_to_cpu() to a __swab16() instead, which clearly indicates that
>>>> > 'here is the byte lane swapping'.
>>>>
>>>> Yes, that may work, but it is a bit orthogonal issue.
>>>
>>> Why? I don't think it is, I think it's addressing exactly the point at
>>> hand.
>>>
>>>> And I don't think
>>>> it is better. For this to work one need to change canonical endianity on
>>>> one of the sides around vcpu_data_guest_to_host and
>>>> vcpu_data_host_to_guest functions.
>>>
>>> You have to simply clearly define which format you want mmio.data to be
>>> in.
>>
>> I believe it is already decided. 'mmio.data' in 'struct kvm_run' is not
>> an integer type - it is bytes array. Bytes array does not have endianity.
>> It is endian agnostic. Here is snippet from linux/kvm.h
>>
>> /* KVM_EXIT_MMIO */
>> struct {
>> __u64 phys_addr;
>> __u8 data[8];
>> __u32 len;
>> __u8 is_write;
>> } mmio;
>>
>> it is very natural to treat it as just a piece of memory. I.e when code reads
>> emulated LE device address as integer, this array will contain integer
>> placed in memory in LE order, data[3] is MSB, as it would be located in
>> regular memory. When code reads emulated BE device address as
>> integer this array will contain integer placed in memory in BE order,
>> data[0] is MSB.
>>
>> You can think about it in that way: ARM system emulator runs on x86
>> (LE) and on PPC (BE). How mmio.data array for the same emulated
>> device should look like in across these two cases? I believe it should
>> be identical - just a stream of bytes.
>>
>> Emulator code handles this situation quite nicely. For example check
>> in qemu endianness field of MemoryRegionOps structure. Depending
>> of the field value and current emulator endianity code will place
>> results into 'mmio.data' array in right order. See [1] as an example
>> in qemu where endianity of certain ARM devices were not declared
>> correctly - it was marked as DEVICE_NATIVE_ENDIAN whereas
>> it should be DEVICE_LITTLE_ENDIAN. After I changed that BE qemu
>> pretty much started working. I strongly suspect if one would run
>> ARM system emulation on PPC (BE) he/she would need the same
>> changes.
>>
>> Note issue with virtio endianity is very different problem - there it
>> is not clear for given arrangement of host/emulator how to treat
>> virtio devices as LE or BE, and in what format data in rings
>> descriptors are.
>
> IMHO, device endianess should be taken care by device emulators only
> because we can have Machine Model containing both LE devices and
> BE devices. KVM ARM/ARM64 should only worry about endianess of
> in-kernel emulated devices (e.g. VGIC). In general, QEMU or KVMTOOL
> should be responsible of device endianess and for this QEMU or KVMTOOL
> should also know whether Guest (or VM) is little-endian or big-endian.
I agree with most of the above statement, except the last part. I think
the emulator and host KVM should not really care about guest endianness;
they should work the same way in either case. Marc Z illustrated this
earlier with a setup where an LE KVM host ran either an LE guest or a
BE guest.
Also note that endianness, as far as emulation is concerned, is strictly
speaking not a property of the guest; it is rather a property of the
current CPU execution context (i.e. the E bit in the CPSR register on
V7). In fact, access endianness can change on the fly: when a BE V7
image starts, it initially assumes it is running in LE mode, then once
the kernel is entered it switches the CPU into BE mode; the same happens
with the secondary CPU callback. And with the last one I ran into a
situation where such a callback read some emulated device with the E bit
off before switching into BE mode, and later the same kernel read the
same device register with the E bit on.
Thanks,
Victor
> Regards,
> Anup
>
>>
>> Thanks,
>> Victor
>>
>> [1] https://git.linaro.org/people/victor.kamensky/qemu-be.git/commitdiff/8599358f9711b7a546a2bba63b6277fbfb5b8e0c?hp=c4880f08ff9451e3d8020153e1a710ab4acee151
>>
>>> This is a user space interface across multiple architectures and
>>> therefore something you have to consider carefully and you're limited in
>>> choices to something that works with existing user space code.
>>>
>>>>
>>>> Changing it on side that faces hypervisor (code that handles guest spilled
>>>> CPU register set) does not make sense at all - if we will keep guest CPU
>>>> register set in memory in LE form and hypervisor runs in BE (BE host),
>>>> code that spills registers would need to do constant byteswaps. Also any
>>>> access by host kernel and hypervisor (all running in BE) would need to do
>>>> byteswaps while working with guest saved registers.
>>>>
>>>> Changing canonical form of data on side that faces emulator and mmio
>>>> part of kvm_run does not make sense either. kvm_run mmio.data field is
>>>> bytes array, when it comes to host kernel from emulator, it already contains
>>>> device memory in correct endian order that corresponds to endianity of
>>>> emulated device. For example for LE device word read access, after call is
>>>> emulated, mmio.data will contain mmio.data[0], mmio.data[1], mmio.data[2]
>>>> mmio.data[3] values in LE order (mmio.data[3] is MSB). Now look at
>>>> mmio_read_buf function introduced by Marc's 6d89d2d9 commit, this function
>>>> will byte copy this mmio.data buffer into integer according to ongoing mmio
>>>> access size. Note in BE host case such integer, in 'data' variable of
>>>> kvm_handle_mmio_return function, will have byteswapped value. Now when it will
>>>> be passed into vcpu_data_host_to_guest function, and it emulates read access
>>>> of guest with E bit set, and if we follow your suggestion, it will be
>>>> byteswapped.
>>>> I.e 'data' integer will contain non byteswapped value of LE device. It will be
>>>> further stored into some vcpu_reg register, still in native format (BE
>>>> store), and
>>>> further restored into guest CPU register, still non byteswapped (BE hypervisor).
>>>> And that is not what BE client reading word of LE device expects - BE client
>>>> knowing that it reads LE device with E bit set, it will issue additional rev
>>>> instruction to get device memory as integer. If we really want to follow your
>>>> suggestion, one may introduce compensatory byteswaps in mmio_read_buf
>>>> and mmio_write_buf functions in case of BE host, rather then just do
>>>> memcpy ... but I am not sure what it will buy us - in BE case it will swap data
>>>> twice.
>>>>
>>>> Note in above description by "canonical" I mean some form of data regardless
>>>> of current access CPSR E value. But it may differ depending on host endianess.
>>>>
>>>
>>> There's a lot of text to digest here, talking about a canonical form
>>> here doesn't help; just define the layout of the destination byte array.
>>> I also got completely lost in what you're referring to when you talk
>>> about 'sides' here.
>>>
>>> The thing we must decide is how the data is stored in
>>> kvm_exit_mmio.data. See Peter's recent thread "KVM and
>>> variable-endianness guest CPUs". Once we agree on this, the rest should
>>> be easy (assuming we use the same structure for the data in the kernel's
>>> internal kvm_exit_mmio declared on the stack in io_mem_abort()).
>>>
>>> The format you suggest requires any consumer of this data to consider
>>> the host endianness, which I don't think makes anything more clear (see
>>> my comment on the vgic patch).
>>>
>>> The in-kernel interface between the io_mem_abort() code and any
>>> in-kernel emulated device must do exactly the same as the interface
>>> between KVM and QEMU must do for KVM_EXIT_MMIO.
>>>
>>> --
>>> Christoffer
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm at lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
* [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stdmb and ldmia instrs to enable assembler.h inclusion
2014-01-21 1:18 ` Christoffer Dall
@ 2014-01-21 9:58 ` Marc Zyngier
2014-01-22 6:41 ` Victor Kamensky
0 siblings, 1 reply; 26+ messages in thread
From: Marc Zyngier @ 2014-01-21 9:58 UTC (permalink / raw)
To: linux-arm-kernel
On 21/01/14 01:18, Christoffer Dall wrote:
> On Fri, Dec 20, 2013 at 08:48:41AM -0800, Victor Kamensky wrote:
>> Before fix kvm interrupt.S and interrupt_head.S used push and pop assembler
>> instruction. It causes problem if <asm/assembler.h> file should be include. In
>> assembler.h "push" is defined as macro so it causes compilation errors like
>> this:
>
> "Before fix kvm..." doesn't read very pleasently, consider using
> something like "Prior to this commit...."
>
> "causes a problem" or "causes problems"
>
> change "if <asm/assembler.h> file should be include..." to "if
> <asm/assembler.h> is included, because assember.h defines 'push' as a
> macro..."
>
>
>
>>
>> arch/arm/kvm/interrupts.S: Assembler messages:
>> arch/arm/kvm/interrupts.S:51: Error: ARM register expected -- `lsr {r2,r3}'
>>
>> Solution implemented by this patch replaces all 'push {...}' with
>> 'stmdb sp!, {...}'; and all 'pop {...}' with 'ldmia sp!, {...}'.
>>
>> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
>> ---
>> arch/arm/kvm/interrupts.S | 38 +++++++++++++++++++-------------------
>> arch/arm/kvm/interrupts_head.S | 34 +++++++++++++++++-----------------
>> 2 files changed, 36 insertions(+), 36 deletions(-)
>>
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index ddc1553..df19133 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -47,7 +47,7 @@ __kvm_hyp_code_start:
>> * instead, ignoring the ipa value.
>> */
>> ENTRY(__kvm_tlb_flush_vmid_ipa)
>> - push {r2, r3}
>> + stmdb sp!, {r2, r3}
>>
>> dsb ishst
>> add r0, r0, #KVM_VTTBR
>> @@ -62,7 +62,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>> mcrr p15, 6, r2, r3, c2 @ Back to VMID #0
>> isb @ Not necessary if followed by eret
>>
>> - pop {r2, r3}
>> + ldmia sp!, {r2, r3}
>> bx lr
>> ENDPROC(__kvm_tlb_flush_vmid_ipa)
>>
>> @@ -110,7 +110,7 @@ ENTRY(__kvm_vcpu_run)
>> #ifdef CONFIG_VFPv3
>> @ Set FPEXC_EN so the guest doesn't trap floating point instructions
>> VFPFMRX r2, FPEXC @ VMRS
>> - push {r2}
>> + stmdb sp!, {r2}
>> orr r2, r2, #FPEXC_EN
>> VFPFMXR FPEXC, r2 @ VMSR
>> #endif
>> @@ -175,7 +175,7 @@ __kvm_vcpu_return:
>>
>> after_vfp_restore:
>> @ Restore FPEXC_EN which we clobbered on entry
>> - pop {r2}
>> + ldmia sp!, {r2}
>> VFPFMXR FPEXC, r2
>> #endif
>>
>> @@ -260,7 +260,7 @@ ENTRY(kvm_call_hyp)
>>
>> /* Handle undef, svc, pabt, or dabt by crashing with a user notice */
>> .macro bad_exception exception_code, panic_str
>> - push {r0-r2}
>> + stmdb sp!, {r0-r2}
>> mrrc p15, 6, r0, r1, c2 @ Read VTTBR
>> lsr r1, r1, #16
>> ands r1, r1, #0xff
>> @@ -338,7 +338,7 @@ hyp_hvc:
>> * Getting here is either becuase of a trap from a guest or from calling
>> * HVC from the host kernel, which means "switch to Hyp mode".
>> */
>> - push {r0, r1, r2}
>> + stmdb sp!, {r0, r1, r2}
>>
>> @ Check syndrome register
>> mrc p15, 4, r1, c5, c2, 0 @ HSR
>> @@ -361,11 +361,11 @@ hyp_hvc:
>> bne guest_trap @ Guest called HVC
>>
>> host_switch_to_hyp:
>> - pop {r0, r1, r2}
>> + ldmia sp!, {r0, r1, r2}
>>
>> - push {lr}
>> + stmdb sp!, {lr}
>> mrs lr, SPSR
>> - push {lr}
>> + stmdb sp!, {lr}
>>
>> mov lr, r0
>> mov r0, r1
>> @@ -375,9 +375,9 @@ host_switch_to_hyp:
>> THUMB( orr lr, #1)
>> blx lr @ Call the HYP function
>>
>> - pop {lr}
>> + ldmia sp!, {lr}
>> msr SPSR_csxf, lr
>> - pop {lr}
>> + ldmia sp!, {lr}
>> eret
>>
>> guest_trap:
>> @@ -418,7 +418,7 @@ guest_trap:
>>
>> /* Preserve PAR */
>> mrrc p15, 0, r0, r1, c7 @ PAR
>> - push {r0, r1}
>> + stmdb sp!, {r0, r1}
>>
>> /* Resolve IPA using the xFAR */
>> mcr p15, 0, r2, c7, c8, 0 @ ATS1CPR
>> @@ -431,7 +431,7 @@ guest_trap:
>> orr r2, r2, r1, lsl #24
>>
>> /* Restore PAR */
>> - pop {r0, r1}
>> + ldmia sp!, {r0, r1}
>> mcrr p15, 0, r0, r1, c7 @ PAR
>>
>> 3: load_vcpu @ Load VCPU pointer to r0
>> @@ -440,10 +440,10 @@ guest_trap:
>> 1: mov r1, #ARM_EXCEPTION_HVC
>> b __kvm_vcpu_return
>>
>> -4: pop {r0, r1} @ Failed translation, return to guest
>> +4: ldmia sp!, {r0, r1} @ Failed translation, return to guest
>> mcrr p15, 0, r0, r1, c7 @ PAR
>> clrex
>> - pop {r0, r1, r2}
>> + ldmia sp!, {r0, r1, r2}
>> eret
>>
>> /*
>> @@ -455,7 +455,7 @@ guest_trap:
>> #ifdef CONFIG_VFPv3
>> switch_to_guest_vfp:
>> load_vcpu @ Load VCPU pointer to r0
>> - push {r3-r7}
>> + stmdb sp!, {r3-r7}
>>
>> @ NEON/VFP used. Turn on VFP access.
>> set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
>> @@ -467,15 +467,15 @@ switch_to_guest_vfp:
>> add r7, r0, #VCPU_VFP_GUEST
>> restore_vfp_state r7
>>
>> - pop {r3-r7}
>> - pop {r0-r2}
>> + ldmia sp!, {r3-r7}
>> + ldmia sp!, {r0-r2}
>> clrex
>> eret
>> #endif
>>
>> .align
>> hyp_irq:
>> - push {r0, r1, r2}
>> + stmdb sp!, {r0, r1, r2}
>> mov r1, #ARM_EXCEPTION_IRQ
>> load_vcpu @ Load VCPU pointer to r0
>> b __kvm_vcpu_return
>> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
>> index 6f18695..c371db7 100644
>> --- a/arch/arm/kvm/interrupts_head.S
>> +++ b/arch/arm/kvm/interrupts_head.S
>> @@ -63,7 +63,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>> mrs r2, SP_\mode
>> mrs r3, LR_\mode
>> mrs r4, SPSR_\mode
>> - push {r2, r3, r4}
>> + stmdb sp!, {r2, r3, r4}
>> .endm
>>
>> /*
>> @@ -73,13 +73,13 @@ vcpu .req r0 @ vcpu pointer always in r0
>> .macro save_host_regs
>> /* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
>> mrs r2, ELR_hyp
>> - push {r2}
>> + stmdb sp!, {r2}
>>
>> /* usr regs */
>> - push {r4-r12} @ r0-r3 are always clobbered
>> + stmdb sp!, {r4-r12} @ r0-r3 are always clobbered
>> mrs r2, SP_usr
>> mov r3, lr
>> - push {r2, r3}
>> + stmdb sp!, {r2, r3}
>>
>> push_host_regs_mode svc
>> push_host_regs_mode abt
>> @@ -95,11 +95,11 @@ vcpu .req r0 @ vcpu pointer always in r0
>> mrs r7, SP_fiq
>> mrs r8, LR_fiq
>> mrs r9, SPSR_fiq
>> - push {r2-r9}
>> + stmdb sp!, {r2-r9}
>> .endm
>>
>> .macro pop_host_regs_mode mode
>> - pop {r2, r3, r4}
>> + ldmia sp!, {r2, r3, r4}
>> msr SP_\mode, r2
>> msr LR_\mode, r3
>> msr SPSR_\mode, r4
>> @@ -110,7 +110,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>> * Clobbers all registers, in all modes, except r0 and r1.
>> */
>> .macro restore_host_regs
>> - pop {r2-r9}
>> + ldmia sp!, {r2-r9}
>> msr r8_fiq, r2
>> msr r9_fiq, r3
>> msr r10_fiq, r4
>> @@ -125,12 +125,12 @@ vcpu .req r0 @ vcpu pointer always in r0
>> pop_host_regs_mode abt
>> pop_host_regs_mode svc
>>
>> - pop {r2, r3}
>> + ldmia sp!, {r2, r3}
>> msr SP_usr, r2
>> mov lr, r3
>> - pop {r4-r12}
>> + ldmia sp!, {r4-r12}
>>
>> - pop {r2}
>> + ldmia sp!, {r2}
>> msr ELR_hyp, r2
>> .endm
>>
>> @@ -218,7 +218,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>> add r2, vcpu, #VCPU_USR_REG(3)
>> stm r2, {r3-r12}
>> add r2, vcpu, #VCPU_USR_REG(0)
>> - pop {r3, r4, r5} @ r0, r1, r2
>> + ldmia sp!, {r3, r4, r5} @ r0, r1, r2
>> stm r2, {r3, r4, r5}
>> mrs r2, SP_usr
>> mov r3, lr
>> @@ -258,7 +258,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>> mrc p15, 2, r12, c0, c0, 0 @ CSSELR
>>
>> .if \store_to_vcpu == 0
>> - push {r2-r12} @ Push CP15 registers
>> + stmdb sp!, {r2-r12} @ Push CP15 registers
>> .else
>> str r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
>> str r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
>> @@ -286,7 +286,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>> mrc p15, 0, r12, c12, c0, 0 @ VBAR
>>
>> .if \store_to_vcpu == 0
>> - push {r2-r12} @ Push CP15 registers
>> + stmdb sp!, {r2-r12} @ Push CP15 registers
>> .else
>> str r2, [vcpu, #CP15_OFFSET(c13_CID)]
>> str r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
>> @@ -305,7 +305,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>> mrrc p15, 0, r4, r5, c7 @ PAR
>>
>> .if \store_to_vcpu == 0
>> - push {r2,r4-r5}
>> + stmdb sp!, {r2,r4-r5}
>> .else
>> str r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
>> add r12, vcpu, #CP15_OFFSET(c7_PAR)
>> @@ -322,7 +322,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>> */
>> .macro write_cp15_state read_from_vcpu
>> .if \read_from_vcpu == 0
>> - pop {r2,r4-r5}
>> + ldmia sp!, {r2,r4-r5}
>> .else
>> ldr r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
>> add r12, vcpu, #CP15_OFFSET(c7_PAR)
>> @@ -333,7 +333,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>> mcrr p15, 0, r4, r5, c7 @ PAR
>>
>> .if \read_from_vcpu == 0
>> - pop {r2-r12}
>> + ldmia sp!, {r2-r12}
>> .else
>> ldr r2, [vcpu, #CP15_OFFSET(c13_CID)]
>> ldr r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
>> @@ -361,7 +361,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>> mcr p15, 0, r12, c12, c0, 0 @ VBAR
>>
>> .if \read_from_vcpu == 0
>> - pop {r2-r12}
>> + ldmia sp!, {r2-r12}
>> .else
>> ldr r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
>> ldr r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
>> --
>> 1.8.1.4
>>
>
> If you fix to address Dave's comments, then the code change otherwise
> looks good.
How about trying this alternative approach:
It looks like all the users of the push/pop macros are located in
arch/arm/lib (mostly checksumming code). Can't we move these macros to a
separate include file and leave the code that uses push/pop (as defined
by the assembler) alone?
Thanks,
M.
--
Jazz is not dead. It just smells funny...
* [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stdmb and ldmia instrs to enable assembler.h inclusion
2014-01-21 9:58 ` Marc Zyngier
@ 2014-01-22 6:41 ` Victor Kamensky
2014-01-22 10:22 ` Will Deacon
0 siblings, 1 reply; 26+ messages in thread
From: Victor Kamensky @ 2014-01-22 6:41 UTC (permalink / raw)
To: linux-arm-kernel
Russell, Dave, Will, could you please check inline below? I am
looking for your opinion.
Marc, my response is inline.
On 21 January 2014 01:58, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 21/01/14 01:18, Christoffer Dall wrote:
>> On Fri, Dec 20, 2013 at 08:48:41AM -0800, Victor Kamensky wrote:
>>> Before fix kvm interrupt.S and interrupt_head.S used push and pop assembler
>>> instruction. It causes problem if <asm/assembler.h> file should be include. In
>>> assembler.h "push" is defined as macro so it causes compilation errors like
>>> this:
>>
>> "Before fix kvm..." doesn't read very pleasantly; consider using
>> something like "Prior to this commit...."
>>
>> "causes a problem" or "causes problems"
>>
>> change "if <asm/assembler.h> file should be include..." to "if
>> <asm/assembler.h> is included, because assembler.h defines 'push' as a
>> macro..."
>>
>>
>>
>>>
>>> arch/arm/kvm/interrupts.S: Assembler messages:
>>> arch/arm/kvm/interrupts.S:51: Error: ARM register expected -- `lsr {r2,r3}'
>>>
>>> Solution implemented by this patch replaces all 'push {...}' with
>>> 'stmdb sp!, {...}' instruction; and all 'pop {...}' with 'ldmia sp!, {...}'.
>>>
>>> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
>>> ---
>>> arch/arm/kvm/interrupts.S | 38 +++++++++++++++++++-------------------
>>> arch/arm/kvm/interrupts_head.S | 34 +++++++++++++++++-----------------
>>> 2 files changed, 36 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>>> index ddc1553..df19133 100644
>>> --- a/arch/arm/kvm/interrupts.S
>>> +++ b/arch/arm/kvm/interrupts.S
>>> @@ -47,7 +47,7 @@ __kvm_hyp_code_start:
>>> * instead, ignoring the ipa value.
>>> */
>>> ENTRY(__kvm_tlb_flush_vmid_ipa)
>>> - push {r2, r3}
>>> + stmdb sp!, {r2, r3}
>>>
>>> dsb ishst
>>> add r0, r0, #KVM_VTTBR
>>> @@ -62,7 +62,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>>> mcrr p15, 6, r2, r3, c2 @ Back to VMID #0
>>> isb @ Not necessary if followed by eret
>>>
>>> - pop {r2, r3}
>>> + ldmia sp!, {r2, r3}
>>> bx lr
>>> ENDPROC(__kvm_tlb_flush_vmid_ipa)
>>>
>>> @@ -110,7 +110,7 @@ ENTRY(__kvm_vcpu_run)
>>> #ifdef CONFIG_VFPv3
>>> @ Set FPEXC_EN so the guest doesn't trap floating point instructions
>>> VFPFMRX r2, FPEXC @ VMRS
>>> - push {r2}
>>> + stmdb sp!, {r2}
>>> orr r2, r2, #FPEXC_EN
>>> VFPFMXR FPEXC, r2 @ VMSR
>>> #endif
>>> @@ -175,7 +175,7 @@ __kvm_vcpu_return:
>>>
>>> after_vfp_restore:
>>> @ Restore FPEXC_EN which we clobbered on entry
>>> - pop {r2}
>>> + ldmia sp!, {r2}
>>> VFPFMXR FPEXC, r2
>>> #endif
>>>
>>> @@ -260,7 +260,7 @@ ENTRY(kvm_call_hyp)
>>>
>>> /* Handle undef, svc, pabt, or dabt by crashing with a user notice */
>>> .macro bad_exception exception_code, panic_str
>>> - push {r0-r2}
>>> + stmdb sp!, {r0-r2}
>>> mrrc p15, 6, r0, r1, c2 @ Read VTTBR
>>> lsr r1, r1, #16
>>> ands r1, r1, #0xff
>>> @@ -338,7 +338,7 @@ hyp_hvc:
>>> * Getting here is either becuase of a trap from a guest or from calling
>>> * HVC from the host kernel, which means "switch to Hyp mode".
>>> */
>>> - push {r0, r1, r2}
>>> + stmdb sp!, {r0, r1, r2}
>>>
>>> @ Check syndrome register
>>> mrc p15, 4, r1, c5, c2, 0 @ HSR
>>> @@ -361,11 +361,11 @@ hyp_hvc:
>>> bne guest_trap @ Guest called HVC
>>>
>>> host_switch_to_hyp:
>>> - pop {r0, r1, r2}
>>> + ldmia sp!, {r0, r1, r2}
>>>
>>> - push {lr}
>>> + stmdb sp!, {lr}
>>> mrs lr, SPSR
>>> - push {lr}
>>> + stmdb sp!, {lr}
>>>
>>> mov lr, r0
>>> mov r0, r1
>>> @@ -375,9 +375,9 @@ host_switch_to_hyp:
>>> THUMB( orr lr, #1)
>>> blx lr @ Call the HYP function
>>>
>>> - pop {lr}
>>> + ldmia sp!, {lr}
>>> msr SPSR_csxf, lr
>>> - pop {lr}
>>> + ldmia sp!, {lr}
>>> eret
>>>
>>> guest_trap:
>>> @@ -418,7 +418,7 @@ guest_trap:
>>>
>>> /* Preserve PAR */
>>> mrrc p15, 0, r0, r1, c7 @ PAR
>>> - push {r0, r1}
>>> + stmdb sp!, {r0, r1}
>>>
>>> /* Resolve IPA using the xFAR */
>>> mcr p15, 0, r2, c7, c8, 0 @ ATS1CPR
>>> @@ -431,7 +431,7 @@ guest_trap:
>>> orr r2, r2, r1, lsl #24
>>>
>>> /* Restore PAR */
>>> - pop {r0, r1}
>>> + ldmia sp!, {r0, r1}
>>> mcrr p15, 0, r0, r1, c7 @ PAR
>>>
>>> 3: load_vcpu @ Load VCPU pointer to r0
>>> @@ -440,10 +440,10 @@ guest_trap:
>>> 1: mov r1, #ARM_EXCEPTION_HVC
>>> b __kvm_vcpu_return
>>>
>>> -4: pop {r0, r1} @ Failed translation, return to guest
>>> +4: ldmia sp!, {r0, r1} @ Failed translation, return to guest
>>> mcrr p15, 0, r0, r1, c7 @ PAR
>>> clrex
>>> - pop {r0, r1, r2}
>>> + ldmia sp!, {r0, r1, r2}
>>> eret
>>>
>>> /*
>>> @@ -455,7 +455,7 @@ guest_trap:
>>> #ifdef CONFIG_VFPv3
>>> switch_to_guest_vfp:
>>> load_vcpu @ Load VCPU pointer to r0
>>> - push {r3-r7}
>>> + stmdb sp!, {r3-r7}
>>>
>>> @ NEON/VFP used. Turn on VFP access.
>>> set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
>>> @@ -467,15 +467,15 @@ switch_to_guest_vfp:
>>> add r7, r0, #VCPU_VFP_GUEST
>>> restore_vfp_state r7
>>>
>>> - pop {r3-r7}
>>> - pop {r0-r2}
>>> + ldmia sp!, {r3-r7}
>>> + ldmia sp!, {r0-r2}
>>> clrex
>>> eret
>>> #endif
>>>
>>> .align
>>> hyp_irq:
>>> - push {r0, r1, r2}
>>> + stmdb sp!, {r0, r1, r2}
>>> mov r1, #ARM_EXCEPTION_IRQ
>>> load_vcpu @ Load VCPU pointer to r0
>>> b __kvm_vcpu_return
>>> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
>>> index 6f18695..c371db7 100644
>>> --- a/arch/arm/kvm/interrupts_head.S
>>> +++ b/arch/arm/kvm/interrupts_head.S
>>> @@ -63,7 +63,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> mrs r2, SP_\mode
>>> mrs r3, LR_\mode
>>> mrs r4, SPSR_\mode
>>> - push {r2, r3, r4}
>>> + stmdb sp!, {r2, r3, r4}
>>> .endm
>>>
>>> /*
>>> @@ -73,13 +73,13 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> .macro save_host_regs
>>> /* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
>>> mrs r2, ELR_hyp
>>> - push {r2}
>>> + stmdb sp!, {r2}
>>>
>>> /* usr regs */
>>> - push {r4-r12} @ r0-r3 are always clobbered
>>> + stmdb sp!, {r4-r12} @ r0-r3 are always clobbered
>>> mrs r2, SP_usr
>>> mov r3, lr
>>> - push {r2, r3}
>>> + stmdb sp!, {r2, r3}
>>>
>>> push_host_regs_mode svc
>>> push_host_regs_mode abt
>>> @@ -95,11 +95,11 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> mrs r7, SP_fiq
>>> mrs r8, LR_fiq
>>> mrs r9, SPSR_fiq
>>> - push {r2-r9}
>>> + stmdb sp!, {r2-r9}
>>> .endm
>>>
>>> .macro pop_host_regs_mode mode
>>> - pop {r2, r3, r4}
>>> + ldmia sp!, {r2, r3, r4}
>>> msr SP_\mode, r2
>>> msr LR_\mode, r3
>>> msr SPSR_\mode, r4
>>> @@ -110,7 +110,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> * Clobbers all registers, in all modes, except r0 and r1.
>>> */
>>> .macro restore_host_regs
>>> - pop {r2-r9}
>>> + ldmia sp!, {r2-r9}
>>> msr r8_fiq, r2
>>> msr r9_fiq, r3
>>> msr r10_fiq, r4
>>> @@ -125,12 +125,12 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> pop_host_regs_mode abt
>>> pop_host_regs_mode svc
>>>
>>> - pop {r2, r3}
>>> + ldmia sp!, {r2, r3}
>>> msr SP_usr, r2
>>> mov lr, r3
>>> - pop {r4-r12}
>>> + ldmia sp!, {r4-r12}
>>>
>>> - pop {r2}
>>> + ldmia sp!, {r2}
>>> msr ELR_hyp, r2
>>> .endm
>>>
>>> @@ -218,7 +218,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> add r2, vcpu, #VCPU_USR_REG(3)
>>> stm r2, {r3-r12}
>>> add r2, vcpu, #VCPU_USR_REG(0)
>>> - pop {r3, r4, r5} @ r0, r1, r2
>>> + ldmia sp!, {r3, r4, r5} @ r0, r1, r2
>>> stm r2, {r3, r4, r5}
>>> mrs r2, SP_usr
>>> mov r3, lr
>>> @@ -258,7 +258,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> mrc p15, 2, r12, c0, c0, 0 @ CSSELR
>>>
>>> .if \store_to_vcpu == 0
>>> - push {r2-r12} @ Push CP15 registers
>>> + stmdb sp!, {r2-r12} @ Push CP15 registers
>>> .else
>>> str r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
>>> str r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
>>> @@ -286,7 +286,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> mrc p15, 0, r12, c12, c0, 0 @ VBAR
>>>
>>> .if \store_to_vcpu == 0
>>> - push {r2-r12} @ Push CP15 registers
>>> + stmdb sp!, {r2-r12} @ Push CP15 registers
>>> .else
>>> str r2, [vcpu, #CP15_OFFSET(c13_CID)]
>>> str r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
>>> @@ -305,7 +305,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> mrrc p15, 0, r4, r5, c7 @ PAR
>>>
>>> .if \store_to_vcpu == 0
>>> - push {r2,r4-r5}
>>> + stmdb sp!, {r2,r4-r5}
>>> .else
>>> str r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
>>> add r12, vcpu, #CP15_OFFSET(c7_PAR)
>>> @@ -322,7 +322,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> */
>>> .macro write_cp15_state read_from_vcpu
>>> .if \read_from_vcpu == 0
>>> - pop {r2,r4-r5}
>>> + ldmia sp!, {r2,r4-r5}
>>> .else
>>> ldr r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
>>> add r12, vcpu, #CP15_OFFSET(c7_PAR)
>>> @@ -333,7 +333,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> mcrr p15, 0, r4, r5, c7 @ PAR
>>>
>>> .if \read_from_vcpu == 0
>>> - pop {r2-r12}
>>> + ldmia sp!, {r2-r12}
>>> .else
>>> ldr r2, [vcpu, #CP15_OFFSET(c13_CID)]
>>> ldr r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
>>> @@ -361,7 +361,7 @@ vcpu .req r0 @ vcpu pointer always in r0
>>> mcr p15, 0, r12, c12, c0, 0 @ VBAR
>>>
>>> .if \read_from_vcpu == 0
>>> - pop {r2-r12}
>>> + ldmia sp!, {r2-r12}
>>> .else
>>> ldr r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
>>> ldr r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
>>> --
>>> 1.8.1.4
>>>
>>
>> If you fix to address Dave's comments, then the code change otherwise
>> looks good.
>
> How about trying this alternative approach:
>
> It looks like all the users of the push/pop macros are located in
> arch/arm/lib (mostly checksumming code). Can't we move these macros to a
> separate include file and leave the code that uses push/pop (as defined
> by the assembler) alone?
Marc, personally I am OK with such a proposal. I was considering something
along these lines as one of the options; it works for me both ways. If
others agree, I am happy to recode it as you suggested. I chose the
patch proposed above because the kvm arm code came after the push and
pop defines were introduced in asm/assembler.h and used in other places.
I agree that the use of push and pop as define names seems a bit
unfortunate, but I don't have any historic visibility here.
Russell, Dave, Will, do you have any opinion on Marc's proposal to
fix this issue?
Thanks,
Victor
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...
* [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stdmb and ldmia instrs to enable assembler.h inclusion
2014-01-22 6:41 ` Victor Kamensky
@ 2014-01-22 10:22 ` Will Deacon
0 siblings, 0 replies; 26+ messages in thread
From: Will Deacon @ 2014-01-22 10:22 UTC (permalink / raw)
To: linux-arm-kernel
[Adding Nico, as the author of the push/pull macros. Background is that kvm
is using push to store to the stack and would now like to include assembler.h]
On Wed, Jan 22, 2014 at 06:41:09AM +0000, Victor Kamensky wrote:
> On 21 January 2014 01:58, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > How about trying this alternative approach:
> >
> > It looks like all the users of the push/pop macros are located in
> > arch/arm/lib (mostly checksumming code). Can't we move these macros to a
> > separate include file and leave the code that uses push/pop (as defined
> > by the assembler) alone?
>
> Marc, personally I am OK with such proposal. I was considering something
> along these lines as one of the options. It works for me both ways. If
> others agree I am happy to recode it as your suggested. I choose
> proposed above patch because kvm arm code came after push and pop
> defines were introduced in asm/assembler.h and used in other places.
> I am OK either way. I agree that use of push and pop as define names
> seems a bit unfortunate, but I don't have any historic visibility here
>
> Russell, Dave, Will, do you have any opinion on Marc's proposal to
> fix this issue?
I'm perfectly fine with moving those macros into a lib/-local header file.
An alternative is renaming push/pull to something like lspush and lspull
and updating the files under lib.
Will
end of thread, other threads: [~2014-01-22 10:22 UTC | newest]
Thread overview: 26+ messages
2013-12-20 16:48 [PATCH REPOST 0/5] armv7 BE kvm support Victor Kamensky
2013-12-20 16:48 ` [PATCH REPOST 1/5] ARM: kvm: replace push and pop with stdmb and ldmia instrs to enable assembler.h inclusion Victor Kamensky
2014-01-21 1:18 ` Christoffer Dall
2014-01-21 9:58 ` Marc Zyngier
2014-01-22 6:41 ` Victor Kamensky
2014-01-22 10:22 ` Will Deacon
2013-12-20 16:48 ` [PATCH REPOST 2/5] ARM: fix KVM assembler files to work in BE case Victor Kamensky
2014-01-21 1:18 ` Christoffer Dall
2013-12-20 16:48 ` [PATCH REPOST 3/5] ARM: kvm one_reg coproc set and get BE fixes Victor Kamensky
2014-01-21 1:18 ` Christoffer Dall
2013-12-20 16:48 ` [PATCH REPOST 4/5] ARM: kvm vgic mmio should return data in BE format in BE case Victor Kamensky
2014-01-21 1:19 ` Christoffer Dall
2013-12-20 16:48 ` [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code Victor Kamensky
2014-01-06 12:37 ` Marc Zyngier
2014-01-06 17:44 ` Victor Kamensky
2014-01-06 18:20 ` Marc Zyngier
2014-01-06 19:55 ` Victor Kamensky
2014-01-06 22:31 ` Peter Maydell
2014-01-06 22:56 ` Christoffer Dall
2014-01-07 1:59 ` Victor Kamensky
2014-01-21 1:19 ` Christoffer Dall
2014-01-21 5:24 ` Victor Kamensky
2014-01-21 5:46 ` Anup Patel
2014-01-21 6:31 ` Christoffer Dall
2014-01-21 6:39 ` Victor Kamensky
2014-01-21 6:03 ` Christoffer Dall