* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors @ 2012-08-14 14:59 Will Deacon 2012-08-17 3:43 ` Nicolas Pitre 0 siblings, 1 reply; 13+ messages in thread From: Will Deacon @ 2012-08-14 14:59 UTC (permalink / raw) To: linux-arm-kernel Data aborts taken to hyp mode do not provide a valid instruction syndrome field in the HSR if the faulting instruction is a memory access using a writeback addressing mode. For hypervisors emulating MMIO accesses to virtual peripherals, taking such an exception requires disassembling the faulting instruction in order to determine the behaviour of the access. Since this requires manually walking the two stages of translation, the world must be stopped to prevent races against page aging in the guest, where the first-stage translation is invalidated after the hypervisor has translated to an IPA and the physical page is reused for something else. This patch avoids taking this heavy performance penalty when running Linux as a guest by ensuring that our I/O accessors do not make use of writeback addressing modes. Tested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> --- arch/arm/include/asm/io.h | 55 ++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 49 insertions(+), 6 deletions(-) diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index 815c669..b54d687 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -47,13 +47,56 @@ extern void __raw_readsb(const void __iomem *addr, void *data, int bytelen); extern void __raw_readsw(const void __iomem *addr, void *data, int wordlen); extern void __raw_readsl(const void __iomem *addr, void *data, int longlen); -#define __raw_writeb(v,a) ((void)(__chk_io_ptr(a), *(volatile unsigned char __force *)(a) = (v))) -#define __raw_writew(v,a) ((void)(__chk_io_ptr(a), *(volatile unsigned short __force *)(a) = (v))) -#define __raw_writel(v,a) ((void)(__chk_io_ptr(a), *(volatile unsigned int __force *)(a) = (v))) +#if __LINUX_ARM_ARCH__ < 6 +/* + * Half-word accesses are problematic with RiscPC due to limitations of + * the bus. Rather than special-case the machine, just let the compiler + * generate the access for CPUs prior to ARMv6. + */ +#define __raw_readw(a) (__chk_io_ptr(a), *(volatile unsigned short __force *)(a)) +#define __raw_writew(v,a) ((void)(__chk_io_ptr(a), *(volatile unsigned short __force *)(a) = (v))) +#else +/* + * When running under a hypervisor, we want to avoid I/O accesses with + * writeback addressing modes as these incur a significant performance + * overhead (the address generation must be emulated in software). + */ +static inline void __raw_writew(u16 val, volatile void __iomem *addr) +{ + asm volatile("strh %0, [%1]" : : "r" (val), "r" (addr)); +} + +static inline u16 __raw_readw(const volatile void __iomem *addr) +{ + u16 val; + asm volatile("ldrh %0, [%1]" : "=r" (val) : "r" (addr)); + return val; +} +#endif -#define __raw_readb(a) (__chk_io_ptr(a), *(volatile unsigned char __force *)(a)) -#define __raw_readw(a) (__chk_io_ptr(a), *(volatile unsigned short __force *)(a)) -#define __raw_readl(a) (__chk_io_ptr(a), *(volatile unsigned int __force *)(a)) +static inline void __raw_writeb(u8 val, volatile void __iomem *addr) +{ + asm volatile("strb %0, [%1]" : : "r" (val), "r" (addr)); +} + +static inline void __raw_writel(u32 val, volatile void __iomem *addr) +{ + asm volatile("str %0, [%1]" : : "r" (val), "r" (addr)); +} + +static inline u8 __raw_readb(const volatile void __iomem *addr) +{ + u8 val; + asm volatile("ldrb %0, [%1]" : "=r" (val) : "r" (addr)); + return val; +} + +static inline u32 __raw_readl(const volatile void __iomem *addr) +{ + u32 val; + asm volatile("ldr %0, [%1]" : "=r" (val) : "r" (addr)); + return val; +} /* * Architecture ioremap implementation. -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-14 14:59 [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors Will Deacon @ 2012-08-17 3:43 ` Nicolas Pitre 2012-08-20 12:41 ` Will Deacon 0 siblings, 1 reply; 13+ messages in thread From: Nicolas Pitre @ 2012-08-17 3:43 UTC (permalink / raw) To: linux-arm-kernel On Tue, 14 Aug 2012, Will Deacon wrote: > Data aborts taken to hyp mode do not provide a valid instruction > syndrome field in the HSR if the faulting instruction is a memory > access using a writeback addressing mode. > > For hypervisors emulating MMIO accesses to virtual peripherals, taking > such an exception requires disassembling the faulting instruction in > order to determine the behaviour of the access. Since this requires > manually walking the two stages of translation, the world must be > stopped to prevent races against page aging in the guest, where the > first-stage translation is invalidated after the hypervisor has > translated to an IPA and the physical page is reused for something else. > > This patch avoids taking this heavy performance penalty when running > Linux as a guest by ensuring that our I/O accessors do not make use of > writeback addressing modes. How often does this happen? I don't really see writeback as a common pattern for IO access. What does happen quite a lot, though, is pre-indexed addressing. For example, let's take this code which is fairly typical of driver code: #define HW_REG1 0x10 #define HW_REG2 0x14 #define HW_REG3 0x18 #define HW_REG4 0x30 int hw_init(void __iomem *ioaddr) { writel(0, ioaddr + HW_REG1) writel(-1, ioaddr + HW_REG2); writel(readl(ioaddr + HW_REG3) | 0xff, ioaddr + HW_REG4); return 0; } Right now this produces this: hw_init: mov r3, r0 mvn r2, #0 mov r0, #0 str r0, [r3, #16] str r2, [r3, #20] ldr r2, [r3, #24] orr r2, r2, #255 str r2, [r3, #48] bx lr With your patch applied this becomes: hw_init: add r2, r0, #16 mov r3, #0 str r3, [r2] mvn r3, #0 add r2, r0, #20 str r3, [r2] add r3, r0, #24 ldr r3, [r3] orr r3, r3, #255 add r0, r0, #48 str r3, [r0] mov r0, #0 bx lr This basically made every IO access into two instructions instead of only one, as well as increasing register pressure. So, is the performance claim something that you've actually measured with a real system, or was it only theoretical? Nicolas ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-17 3:43 ` Nicolas Pitre @ 2012-08-20 12:41 ` Will Deacon 2012-08-20 13:29 ` Nicolas Pitre 0 siblings, 1 reply; 13+ messages in thread From: Will Deacon @ 2012-08-20 12:41 UTC (permalink / raw) To: linux-arm-kernel Hi Nicolas, [apologies in advance for the long reply] On Fri, Aug 17, 2012 at 04:43:01AM +0100, Nicolas Pitre wrote: > On Tue, 14 Aug 2012, Will Deacon wrote: > > > Data aborts taken to hyp mode do not provide a valid instruction > > syndrome field in the HSR if the faulting instruction is a memory > > access using a writeback addressing mode. > > > > For hypervisors emulating MMIO accesses to virtual peripherals, taking > > such an exception requires disassembling the faulting instruction in > > order to determine the behaviour of the access. Since this requires > > manually walking the two stages of translation, the world must be > > stopped to prevent races against page aging in the guest, where the > > first-stage translation is invalidated after the hypervisor has > > translated to an IPA and the physical page is reused for something else. > > > > This patch avoids taking this heavy performance penalty when running > > Linux as a guest by ensuring that our I/O accessors do not make use of > > writeback addressing modes. > > How often does this happen? I don't really see writeback as a common > pattern for IO access. Building a Thumb-2 kernel with GCC: gcc version 4.6.3 20120201 (prerelease) (crosstool-NG linaro-1.13.1-2012.02-20120222 - Linaro GCC 2012.02) Translates the following code from amba_device_add: for (pid = 0, i = 0; i < 4; i++) pid |= (readl(tmp + size - 0x20 + 4 * i) & 255) << (i * 8); for (cid = 0, i = 0; i < 4; i++) cid |= (readl(tmp + size - 0x10 + 4 * i) & 255) << (i * 8); into: c0177bec: f853 2b04 ldr.w r2, [r3], #4 <--- c0177bf0: f3bf 8f4f dsb sy c0177bf4: b2d2 uxtb r2, r2 c0177bf6: 40a2 lsls r2, r4 c0177bf8: 3408 adds r4, #8 c0177bfa: 2c20 cmp r4, #32 c0177bfc: ea45 0502 orr.w r5, r5, r2 c0177c00: d1f4 bne.n c0177bec <amba_device_add+0x94> c0177c02: f1a8 0810 sub.w r8, r8, #16 c0177c06: 2300 movs r3, #0 c0177c08: 44c8 add r8, r9 c0177c0a: 461c mov r4, r3 c0177c0c: f858 2b04 ldr.w r2, [r8], #4 <--- c0177c10: f3bf 8f4f dsb sy c0177c14: b2d2 uxtb r2, r2 c0177c16: 409a lsls r2, r3 c0177c18: 3308 adds r3, #8 c0177c1a: 2b20 cmp r3, #32 c0177c1c: ea44 0402 orr.w r4, r4, r2 c0177c20: d1f4 bne.n c0177c0c <amba_device_add+0xb4> so this is happening with recent toolchains and current kernel sources. > What does happen quite a lot, though, is pre-indexed addressing. For > example, let's take this code which is fairly typical of driver code: > > #define HW_REG1 0x10 > #define HW_REG2 0x14 > #define HW_REG3 0x18 > #define HW_REG4 0x30 > > int hw_init(void __iomem *ioaddr) > { > writel(0, ioaddr + HW_REG1) > writel(-1, ioaddr + HW_REG2); > writel(readl(ioaddr + HW_REG3) | 0xff, ioaddr + HW_REG4); > return 0; > } > > Right now this produces this: > > hw_init: > mov r3, r0 > mvn r2, #0 > mov r0, #0 > str r0, [r3, #16] > str r2, [r3, #20] > ldr r2, [r3, #24] > orr r2, r2, #255 > str r2, [r3, #48] > bx lr Well, that's not quite true for CONFIG_ARM_DMA_MEM_BUFFERABLE=y. The dsb and compiler barrier ends up creating this monster for that code: 00000280 <hw_init>: 280: 4603 mov r3, r0 282: f3bf 8f4f dsb sy 286: 2000 movs r0, #0 288: 6118 str r0, [r3, #16] 28a: f3bf 8f4f dsb sy 28e: f04f 32ff mov.w r2, #4294967295 292: 615a str r2, [r3, #20] 294: f3bf 8f4f dsb sy 298: 699a ldr r2, [r3, #24] 29a: f3bf 8f4f dsb sy 29e: f042 02ff orr.w r2, r2, #255 ; 0xff 2a2: 631a str r2, [r3, #48] ; 0x30 2a4: 4770 bx lr 2a6: bf00 nop whilst the addressing modes are still nice, the dsb is going to be the performance limitation here. That said, we could try the "Qo" constraints, which I seem to remember don't generate writebacks. I'll have a play. > So, is the performance claim something that you've actually measured > with a real system, or was it only theoretical? The difference is down to the work done by the hypvervisor: an MMIO access will trap to hyp mode, where the HSR describes the instruction (access size, load/store, Rt, signed etc). For the case of a writeback instruction, this information is not provided by the hardware. Instead, the hypervisor has to disassemble the faulting instruction and work out what it's doing at which address prior to emulation. Even if that cost was acceptable, the problem then gets worse. Imagine that a guest MMIO access faults into the hypervisor, where the emulation code tries to decode the instruction because the fault information is incomplete. To do this, it must obtain the *physical* address of the faulting text page so that it can load the instruction. This happens via the ATS12NSOP{R,W} registers, which return a PA for the faulting VA (i.e. both stages of translation). Now, let's say the hypervisor has got hold of a PA but hasn't yet loaded the instruction. Meanwhile, another virtual CPU running the same guest decides (due to page aging or whatnot) to reclaim the text page containing the faulting instruction. It writes a faulting pte and does a TLB invalidation, however this is too late for the hypervisor, who has already translated its address. Furthermore, let's say that the guest then reuses the same physical page for something like a network buffer. The hypervisor goes ahead and grabs what it thinks is the faulting instruction from memory but in fact gets a load of random network data! To deal with this, the hypervisor will likely have to stop the virtual world when emulating any MMIO accesses that report incomplete fault information to avoid racing with a TLB invalidation from another virtual CPU. That will certainly be more expensive than an additional instruction on each access. To answer your question: I haven't measured it (largely because KVM can't pull apart Thumb-2 instructions yet). I believe that the KVM guys were planning to print a diagnostic prior to emulation/world-stop if the fault information is incomplete, complaining that the guest is using writeback addressing modes for I/O. Cheers, Will ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-20 12:41 ` Will Deacon @ 2012-08-20 13:29 ` Nicolas Pitre 2012-08-20 14:49 ` Will Deacon 0 siblings, 1 reply; 13+ messages in thread From: Nicolas Pitre @ 2012-08-20 13:29 UTC (permalink / raw) To: linux-arm-kernel On Mon, 20 Aug 2012, Will Deacon wrote: > Hi Nicolas, > > [apologies in advance for the long reply] > > On Fri, Aug 17, 2012 at 04:43:01AM +0100, Nicolas Pitre wrote: > > On Tue, 14 Aug 2012, Will Deacon wrote: > > > > > Data aborts taken to hyp mode do not provide a valid instruction > > > syndrome field in the HSR if the faulting instruction is a memory > > > access using a writeback addressing mode. > > > > > > For hypervisors emulating MMIO accesses to virtual peripherals, taking > > > such an exception requires disassembling the faulting instruction in > > > order to determine the behaviour of the access. Since this requires > > > manually walking the two stages of translation, the world must be > > > stopped to prevent races against page aging in the guest, where the > > > first-stage translation is invalidated after the hypervisor has > > > translated to an IPA and the physical page is reused for something else. > > > > > > This patch avoids taking this heavy performance penalty when running > > > Linux as a guest by ensuring that our I/O accessors do not make use of > > > writeback addressing modes. > > > > How often does this happen? I don't really see writeback as a common > > pattern for IO access. > > Building a Thumb-2 kernel with GCC: > > gcc version 4.6.3 20120201 (prerelease) (crosstool-NG > linaro-1.13.1-2012.02-20120222 - Linaro GCC 2012.02) > > Translates the following code from amba_device_add: > > for (pid = 0, i = 0; i < 4; i++) > pid |= (readl(tmp + size - 0x20 + 4 * i) & 255) << > (i * 8); > for (cid = 0, i = 0; i < 4; i++) > cid |= (readl(tmp + size - 0x10 + 4 * i) & 255) << > (i * 8); > > into: [...] OK, I can see how the compiler will fold the loop increment into the IO access instruction. But my point is: is this common? And when this happens, is this a critical path? > > What does happen quite a lot, though, is pre-indexed addressing. For > > example, let's take this code which is fairly typical of driver code: > > > > #define HW_REG1 0x10 > > #define HW_REG2 0x14 > > #define HW_REG3 0x18 > > #define HW_REG4 0x30 > > > > int hw_init(void __iomem *ioaddr) > > { > > writel(0, ioaddr + HW_REG1) > > writel(-1, ioaddr + HW_REG2); > > writel(readl(ioaddr + HW_REG3) | 0xff, ioaddr + HW_REG4); > > return 0; > > } > > > > Right now this produces this: > > > > hw_init: > > mov r3, r0 > > mvn r2, #0 > > mov r0, #0 > > str r0, [r3, #16] > > str r2, [r3, #20] > > ldr r2, [r3, #24] > > orr r2, r2, #255 > > str r2, [r3, #48] > > bx lr > > Well, that's not quite true for CONFIG_ARM_DMA_MEM_BUFFERABLE=y. True. I turned those readl() into their raw counterparts to generate the assembly but didn't update the example code in my mailer. > whilst the addressing modes are still nice, the dsb is going to be the > performance limitation here. That said, we could try the "Qo" constraints, > which I seem to remember don't generate writebacks. I'll have a play. OK. That would be excellent. > > So, is the performance claim something that you've actually measured > > with a real system, or was it only theoretical? > > The difference is down to the work done by the hypvervisor: an MMIO access > will trap to hyp mode, where the HSR describes the instruction (access size, > load/store, Rt, signed etc). For the case of a writeback instruction, this > information is not provided by the hardware. Instead, the hypervisor has to > disassemble the faulting instruction and work out what it's doing at which > address prior to emulation. > > Even if that cost was acceptable, the problem then gets worse. Imagine that > a guest MMIO access faults into the hypervisor, where the emulation code > tries to decode the instruction because the fault information is incomplete. > To do this, it must obtain the *physical* address of the faulting text page > so that it can load the instruction. This happens via the ATS12NSOP{R,W} > registers, which return a PA for the faulting VA (i.e. both stages of > translation). > > Now, let's say the hypervisor has got hold of a PA but hasn't yet loaded the > instruction. Meanwhile, another virtual CPU running the same guest decides > (due to page aging or whatnot) to reclaim the text page containing the > faulting instruction. It writes a faulting pte and does a TLB invalidation, > however this is too late for the hypervisor, who has already translated its > address. Furthermore, let's say that the guest then reuses the same physical > page for something like a network buffer. The hypervisor goes ahead and grabs > what it thinks is the faulting instruction from memory but in fact gets a > load of random network data! > > To deal with this, the hypervisor will likely have to stop the virtual world > when emulating any MMIO accesses that report incomplete fault information to > avoid racing with a TLB invalidation from another virtual CPU. That will > certainly be more expensive than an additional instruction on each access. I totally agree with you here. However, for completeness and above all for security reasons, the hypervisor will _ahve_ to support that case anyway. So it is now a matter of compromise between performance and code size. If the pathological case you brought up above is the exception and not the rule then I think that we can live with the performance impact in that case and keep the optimal pre-indexed addressing for the common cases. Nicolas ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-20 13:29 ` Nicolas Pitre @ 2012-08-20 14:49 ` Will Deacon 2012-08-20 16:09 ` Nicolas Pitre 0 siblings, 1 reply; 13+ messages in thread From: Will Deacon @ 2012-08-20 14:49 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 20, 2012 at 02:29:31PM +0100, Nicolas Pitre wrote: > On Mon, 20 Aug 2012, Will Deacon wrote: > > Translates the following code from amba_device_add: > > > > for (pid = 0, i = 0; i < 4; i++) > > pid |= (readl(tmp + size - 0x20 + 4 * i) & 255) << > > (i * 8); > > for (cid = 0, i = 0; i < 4; i++) > > cid |= (readl(tmp + size - 0x10 + 4 * i) & 255) << > > (i * 8); > > > > into: > [...] > > OK, I can see how the compiler will fold the loop increment into the IO > access instruction. But my point is: is this common? And when this > happens, is this a critical path? Sure, this case isn't such a big deal (basically just boot-time probing) but GCC could still generate this stuff in a driver, where it could end up being an I/O bottleneck for a guest OS. > > whilst the addressing modes are still nice, the dsb is going to be the > > performance limitation here. That said, we could try the "Qo" constraints, > > which I seem to remember don't generate writebacks. I'll have a play. > > OK. That would be excellent. Looks like we have a winner (diff against the original patch below). I now see: 00000340 <hw_init>: 340: e1a03000 mov r3, r0 344: e3a00000 mov r0, #0 348: e5830010 str r0, [r3, #16] 34c: e3e02000 mvn r2, #0 350: e5832014 str r2, [r3, #20] 354: e5932018 ldr r2, [r3, #24] 358: e38220ff orr r2, r2, #255 ; 0xff 35c: e5832030 str r2, [r3, #48] ; 0x30 360: e12fff1e bx lr with the new code, which is basically the same as the old code but the mvn and a str have switched places. The same difference occurs when targetting Thumb2. > > To deal with this, the hypervisor will likely have to stop the virtual world > > when emulating any MMIO accesses that report incomplete fault information to > > avoid racing with a TLB invalidation from another virtual CPU. That will > > certainly be more expensive than an additional instruction on each access. > > I totally agree with you here. > > However, for completeness and above all for security reasons, the > hypervisor will _ahve_ to support that case anyway. Support it, yes, but perhaps not efficiently. > So it is now a matter of compromise between performance and code size. > If the pathological case you brought up above is the exception and not > the rule then I think that we can live with the performance impact in > that case and keep the optimal pre-indexed addressing for the common > cases. It looks like the new code does exactly what we want, so I think we could actually have the best of both worlds: pre-index addressing and no writeback. Will --- >8 diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index b54d687..bbc94c2 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -63,38 +63,50 @@ extern void __raw_readsl(const void __iomem *addr, void *data, int longlen); */ static inline void __raw_writew(u16 val, volatile void __iomem *addr) { - asm volatile("strh %0, [%1]" : : "r" (val), "r" (addr)); + asm volatile("strh %1, %0" + : "=Qo" (*(volatile u16 __force *)addr) + : "r" (val)); } static inline u16 __raw_readw(const volatile void __iomem *addr) { u16 val; - asm volatile("ldrh %0, [%1]" : "=r" (val) : "r" (addr)); + asm volatile("ldrh %0, %1" + : "=r" (val) + : "Qo" (*(const volatile u16 __force *)addr)); return val; } #endif static inline void __raw_writeb(u8 val, volatile void __iomem *addr) { - asm volatile("strb %0, [%1]" : : "r" (val), "r" (addr)); + asm volatile("strb %1, %0" + : "=Qo" (*(volatile u8 __force *)addr) + : "r" (val)); } static inline void __raw_writel(u32 val, volatile void __iomem *addr) { - asm volatile("str %0, [%1]" : : "r" (val), "r" (addr)); + asm volatile("str %1, %0" + : "=Qo" (*(volatile u32 __force *)addr) + : "r" (val)); } static inline u8 __raw_readb(const volatile void __iomem *addr) { u8 val; - asm volatile("ldrb %0, [%1]" : "=r" (val) : "r" (addr)); + asm volatile("ldrb %0, %1" + : "=r" (val) + : "Qo" (*(const volatile u8 __force *)addr)); return val; } static inline u32 __raw_readl(const volatile void __iomem *addr) { u32 val; - asm volatile("ldr %0, [%1]" : "=r" (val) : "r" (addr)); + asm volatile("ldr %0, %1" + : "=r" (val) + : "Qo" (*(const volatile u32 __force *)addr)); return val; } ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-20 14:49 ` Will Deacon @ 2012-08-20 16:09 ` Nicolas Pitre 2012-08-20 16:54 ` Will Deacon 0 siblings, 1 reply; 13+ messages in thread From: Nicolas Pitre @ 2012-08-20 16:09 UTC (permalink / raw) To: linux-arm-kernel On Mon, 20 Aug 2012, Will Deacon wrote: > On Mon, Aug 20, 2012 at 02:29:31PM +0100, Nicolas Pitre wrote: > > On Mon, 20 Aug 2012, Will Deacon wrote: > > > That said, we could try the "Qo" constraints, which I seem to > > > remember don't generate writebacks. I'll have a play. > > > > OK. That would be excellent. > > Looks like we have a winner (diff against the original patch below). I now > see: > > 00000340 <hw_init>: > 340: e1a03000 mov r3, r0 > 344: e3a00000 mov r0, #0 > 348: e5830010 str r0, [r3, #16] > 34c: e3e02000 mvn r2, #0 > 350: e5832014 str r2, [r3, #20] > 354: e5932018 ldr r2, [r3, #24] > 358: e38220ff orr r2, r2, #255 ; 0xff > 35c: e5832030 str r2, [r3, #48] ; 0x30 > 360: e12fff1e bx lr > > with the new code, which is basically the same as the old code but the mvn and > a str have switched places. The same difference occurs when targetting Thumb2. OK. Of course the compiler cannot have the same cost evaluation when instructions are hidden inside an asm statement which might change the instruction scheduling slightly. But I don't think that matters much for IO accesses. > --- >8 > > diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h > index b54d687..bbc94c2 100644 > --- a/arch/arm/include/asm/io.h > +++ b/arch/arm/include/asm/io.h > @@ -63,38 +63,50 @@ extern void __raw_readsl(const void __iomem *addr, void *data, int longlen); > */ > static inline void __raw_writew(u16 val, volatile void __iomem *addr) > { > - asm volatile("strh %0, [%1]" : : "r" (val), "r" (addr)); > + asm volatile("strh %1, %0" > + : "=Qo" (*(volatile u16 __force *)addr) > + : "r" (val)); > } > > static inline u16 __raw_readw(const volatile void __iomem *addr) > { > u16 val; > - asm volatile("ldrh %0, [%1]" : "=r" (val) : "r" (addr)); > + asm volatile("ldrh %0, %1" > + : "=r" (val) > + : "Qo" (*(const volatile u16 __force *)addr)); > return val; > } Semantically, I think the qualifier on the Qo constraint should be + as in "+Qo" listed in the input operand list in both cases since we may not assume anything about the memory location when it is referring to IO registers. It is not because you write to it that previous writes can be optimized away, and it is not because you read from it that the accessed memory location will remain the same after the read. Granted, the volatile should take care of that, but it doesn't hurt to be explicit. If you fix that then you can add... Reviewed-by: Nicolas Pitre <nico@linaro.org> Nicolas ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-20 16:09 ` Nicolas Pitre @ 2012-08-20 16:54 ` Will Deacon 2012-08-20 18:04 ` Nicolas Pitre 0 siblings, 1 reply; 13+ messages in thread From: Will Deacon @ 2012-08-20 16:54 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 20, 2012 at 05:09:06PM +0100, Nicolas Pitre wrote: > On Mon, 20 Aug 2012, Will Deacon wrote: > > diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h > > index b54d687..bbc94c2 100644 > > --- a/arch/arm/include/asm/io.h > > +++ b/arch/arm/include/asm/io.h > > @@ -63,38 +63,50 @@ extern void __raw_readsl(const void __iomem *addr, void *data, int longlen); > > */ > > static inline void __raw_writew(u16 val, volatile void __iomem *addr) > > { > > - asm volatile("strh %0, [%1]" : : "r" (val), "r" (addr)); > > + asm volatile("strh %1, %0" > > + : "=Qo" (*(volatile u16 __force *)addr) > > + : "r" (val)); > > } > > > > static inline u16 __raw_readw(const volatile void __iomem *addr) > > { > > u16 val; > > - asm volatile("ldrh %0, [%1]" : "=r" (val) : "r" (addr)); > > + asm volatile("ldrh %0, %1" > > + : "=r" (val) > > + : "Qo" (*(const volatile u16 __force *)addr)); > > return val; > > } > > Semantically, I think the qualifier on the Qo constraint should be + as > in "+Qo" listed in the input operand list in both cases since we may not > assume anything about the memory location when it is referring to IO > registers. It is not because you write to it that previous writes can > be optimized away, and it is not because you read from it that the > accessed memory location will remain the same after the read. Granted, > the volatile should take care of that, but it doesn't hurt to be > explicit. Hmm, ok. I too would hope that the volatile keyword would sort that out but, since the '+' doesn't seem to change the generated code, I can add that. It does, however, mean we have to cast away the `const' in the read accessors which makes the code even uglier. > Reviewed-by: Nicolas Pitre <nico@linaro.org> Cheers Nicolas, I'll post the patch independently with your tag. Will ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-20 16:54 ` Will Deacon @ 2012-08-20 18:04 ` Nicolas Pitre 2012-08-20 18:10 ` Will Deacon 0 siblings, 1 reply; 13+ messages in thread From: Nicolas Pitre @ 2012-08-20 18:04 UTC (permalink / raw) To: linux-arm-kernel On Mon, 20 Aug 2012, Will Deacon wrote: > On Mon, Aug 20, 2012 at 05:09:06PM +0100, Nicolas Pitre wrote: > > On Mon, 20 Aug 2012, Will Deacon wrote: > > > diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h > > > index b54d687..bbc94c2 100644 > > > --- a/arch/arm/include/asm/io.h > > > +++ b/arch/arm/include/asm/io.h > > > @@ -63,38 +63,50 @@ extern void __raw_readsl(const void __iomem *addr, void *data, int longlen); > > > */ > > > static inline void __raw_writew(u16 val, volatile void __iomem *addr) > > > { > > > - asm volatile("strh %0, [%1]" : : "r" (val), "r" (addr)); > > > + asm volatile("strh %1, %0" > > > + : "=Qo" (*(volatile u16 __force *)addr) > > > + : "r" (val)); > > > } > > > > > > static inline u16 __raw_readw(const volatile void __iomem *addr) > > > { > > > u16 val; > > > - asm volatile("ldrh %0, [%1]" : "=r" (val) : "r" (addr)); > > > + asm volatile("ldrh %0, %1" > > > + : "=r" (val) > > > + : "Qo" (*(const volatile u16 __force *)addr)); > > > return val; > > > } > > > > Semantically, I think the qualifier on the Qo constraint should be + as > > in "+Qo" listed in the input operand list in both cases since we may not > > assume anything about the memory location when it is referring to IO > > registers. It is not because you write to it that previous writes can > > be optimized away, and it is not because you read from it that the > > accessed memory location will remain the same after the read. Granted, > > the volatile should take care of that, but it doesn't hurt to be > > explicit. > > Hmm, ok. I too would hope that the volatile keyword would sort that out but, > since the '+' doesn't seem to change the generated code, I can add that. It > does, however, mean we have to cast away the `const' in the read accessors > which makes the code even uglier. Nah... the const is wrong. The way you wrote it means that addr may change but the pointed data is constant. This is obviously wrong since we expect the pointed location to change even from a read. Nicolas ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-20 18:04 ` Nicolas Pitre @ 2012-08-20 18:10 ` Will Deacon 2012-08-20 18:45 ` Nicolas Pitre 0 siblings, 1 reply; 13+ messages in thread From: Will Deacon @ 2012-08-20 18:10 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote: > On Mon, 20 Aug 2012, Will Deacon wrote: > > > > Hmm, ok. I too would hope that the volatile keyword would sort that out but, > > since the '+' doesn't seem to change the generated code, I can add that. It > > does, however, mean we have to cast away the `const' in the read accessors > > which makes the code even uglier. > > Nah... the const is wrong. The way you wrote it means that addr may > change but the pointed data is constant. This is obviously wrong since > we expect the pointed location to change even from a read. That's the prototype for the read accessors though -- a bunch of other architectures define them that way (including asm-generic), so I wonder what the reasoning behind that was? Will ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-20 18:10 ` Will Deacon @ 2012-08-20 18:45 ` Nicolas Pitre 2012-08-21 9:02 ` Will Deacon 0 siblings, 1 reply; 13+ messages in thread From: Nicolas Pitre @ 2012-08-20 18:45 UTC (permalink / raw) To: linux-arm-kernel On Mon, 20 Aug 2012, Will Deacon wrote: > On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote: > > On Mon, 20 Aug 2012, Will Deacon wrote: > > > > > > Hmm, ok. I too would hope that the volatile keyword would sort that out but, > > > since the '+' doesn't seem to change the generated code, I can add that. It > > > does, however, mean we have to cast away the `const' in the read accessors > > > which makes the code even uglier. > > > > Nah... the const is wrong. The way you wrote it means that addr may > > change but the pointed data is constant. This is obviously wrong since > > we expect the pointed location to change even from a read. > > That's the prototype for the read accessors though -- a bunch of other > architectures define them that way (including asm-generic), so I wonder what > the reasoning behind that was? I'm still asserting that they're wrong. Nicolas ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-20 18:45 ` Nicolas Pitre @ 2012-08-21 9:02 ` Will Deacon 2012-08-21 10:11 ` Arnd Bergmann 2012-08-21 12:33 ` Nicolas Pitre 0 siblings, 2 replies; 13+ messages in thread From: Will Deacon @ 2012-08-21 9:02 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 20, 2012 at 07:45:20PM +0100, Nicolas Pitre wrote: > On Mon, 20 Aug 2012, Will Deacon wrote: > > On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote: > > > Nah... the const is wrong. The way you wrote it means that addr may > > > change but the pointed data is constant. This is obviously wrong since > > > we expect the pointed location to change even from a read. > > > > That's the prototype for the read accessors though -- a bunch of other > > architectures define them that way (including asm-generic), so I wonder what > > the reasoning behind that was? > > I'm still asserting that they're wrong. I did a bit of digging around and `const volatile void *' is apparently used because a function with such a parameter type can be passed any old pointer without warnings. Torvalds says something about them here: http://readlist.com/lists/vger.kernel.org/linux-kernel/14/72300.html Personally, I too think that the const is misleading and who on Earth would be passing in pointers to const for an I/O region? However, it's an argument I'd rather avoid so, for the sake of consistency, I'll cast away the const in the asm block. Cheers, Will ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-21 9:02 ` Will Deacon @ 2012-08-21 10:11 ` Arnd Bergmann 2012-08-21 12:33 ` Nicolas Pitre 1 sibling, 0 replies; 13+ messages in thread From: Arnd Bergmann @ 2012-08-21 10:11 UTC (permalink / raw) To: linux-arm-kernel On Tuesday 21 August 2012, Will Deacon wrote: > On Mon, Aug 20, 2012 at 07:45:20PM +0100, Nicolas Pitre wrote: > > On Mon, 20 Aug 2012, Will Deacon wrote: > > > On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote: > > > > Nah... the const is wrong. The way you wrote it means that addr may > > > > change but the pointed data is constant. This is obviously wrong since > > > > we expect the pointed location to change even from a read. > > > > > > That's the prototype for the read accessors though -- a bunch of other > > > architectures define them that way (including asm-generic), so I wonder what > > > the reasoning behind that was? > > > > I'm still asserting that they're wrong. > > I did a bit of digging around and `const volatile void *' is apparently used > because a function with such a parameter type can be passed any old pointer > without warnings. Torvalds says something about them here: > > http://readlist.com/lists/vger.kernel.org/linux-kernel/14/72300.html > > Personally, I too think that the const is misleading and who on Earth would > be passing in pointers to const for an I/O region? However, it's an argument > I'd rather avoid so, for the sake of consistency, I'll cast away the const in > the asm block. > You could have a read-only register area, and I would regard it as sensible to mark a pointer to it as "const", although there should not be any optimizations based on that. There is no need to cast away the constness of the pointer when passing it into the inline assembly, as the "asm volatile" already implies that gcc cannot remove the contents. On a related topic, thank you very much for introducing the inline assemblies here, as they finally solve a lingering problem that has hit us in the past [1] and that could happen again in other drivers. Arnd [1] http://old.nabble.com/ARM-unaligned-MMIO-access-with-attribute%28%28packed%29%29-td30827280.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors 2012-08-21 9:02 ` Will Deacon 2012-08-21 10:11 ` Arnd Bergmann @ 2012-08-21 12:33 ` Nicolas Pitre 1 sibling, 0 replies; 13+ messages in thread From: Nicolas Pitre @ 2012-08-21 12:33 UTC (permalink / raw) To: linux-arm-kernel On Tue, 21 Aug 2012, Will Deacon wrote: > On Mon, Aug 20, 2012 at 07:45:20PM +0100, Nicolas Pitre wrote: > > On Mon, 20 Aug 2012, Will Deacon wrote: > > > On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote: > > > > Nah... the const is wrong. The way you wrote it means that addr may > > > > change but the pointed data is constant. This is obviously wrong since > > > > we expect the pointed location to change even from a read. > > > > > > That's the prototype for the read accessors though -- a bunch of other > > > architectures define them that way (including asm-generic), so I wonder what > > > the reasoning behind that was? > > > > I'm still asserting that they're wrong. > > I did a bit of digging around and `const volatile void *' is apparently used > because a function with such a parameter type can be passed any old pointer > without warnings. Torvalds says something about them here: > > http://readlist.com/lists/vger.kernel.org/linux-kernel/14/72300.html > > Personally, I too think that the const is misleading and who on Earth would > be passing in pointers to const for an I/O region? However, it's an argument > I'd rather avoid so, for the sake of consistency, I'll cast away the const in > the asm block. OK. Either that or your previous patch should do then. Nicolas ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2012-08-21 12:33 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-08-14 14:59 [PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors Will Deacon 2012-08-17 3:43 ` Nicolas Pitre 2012-08-20 12:41 ` Will Deacon 2012-08-20 13:29 ` Nicolas Pitre 2012-08-20 14:49 ` Will Deacon 2012-08-20 16:09 ` Nicolas Pitre 2012-08-20 16:54 ` Will Deacon 2012-08-20 18:04 ` Nicolas Pitre 2012-08-20 18:10 ` Will Deacon 2012-08-20 18:45 ` Nicolas Pitre 2012-08-21 9:02 ` Will Deacon 2012-08-21 10:11 ` Arnd Bergmann 2012-08-21 12:33 ` Nicolas Pitre
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).