* Code generation involving __raw_readl and __raw_writel
@ 2014-11-27 10:40 Mason
2014-11-27 10:48 ` Russell King - ARM Linux
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Mason @ 2014-11-27 10:40 UTC (permalink / raw)
To: linux-arm-kernel
Hello everyone,
Consider the following code (preprocessor output):
static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
{
while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31);
__raw_writel(0, (volatile void *)(0xf0000000 +(0x10024)));
return 0;
}
gcc generates the following code:
(version and command-line in sig below)
00000014 <tangox_target>:
14: e3a03000 mov r3, #0
18: e34f3001 movt r3, #61441 ; 0xf001
1c: e3a02000 mov r2, #0
20: e34f2001 movt r2, #61441 ; 0xf001
24: e5931024 ldr r1, [r3, #36] ; 0x24
28: e3510000 cmp r1, #0
2c: bafffffa blt 1c <tangox_target+0x8>
30: e3a00000 mov r0, #0
34: e5820024 str r0, [r2, #36] ; 0x24
38: e12fff1e bx lr
Do you know why gcc duplicates the address in r2 and r3?
And keeps putting the address in r2 over and over in the loop?
I was expecting something more along these lines:
00000014 <tangox_target>:
14: e3a03000 mov r3, #0
18: e34f3001 movt r3, #61441 ; 0xf001
1c: e5931024 ldr r1, [r3, #36] ; 0x24
20: e3510000 cmp r1, #0
24: bafffffa blt 1c <tangox_target+0x8>
28: e3a00000 mov r0, #0
2c: e5820024 str r0, [r3, #36] ; 0x24
30: e12fff1e bx lr
Regards.
--
gcc version 4.6.3 (Sourcery CodeBench Lite 2012.03-57)
arm-none-linux-gnueabi-gcc -Wp,-MD,arch/arm/mach-tangox/.cpufreq.o.d -nostdinc -isystem /utils/release/sets/6_1_3_8756/sigma-buildroot_2012.02-22/output/host/opt/ext-toolchain/bin/../lib/gcc/arm-none-linux-gnueabi/4.6.3/include -I/home/maze/linux/arch/arm/include -Iarch/arm/include/generated -Iinclude -I/home/maze/linux/arch/arm/include/uapi -Iarch/arm/include/generated/uapi -I/home/maze/linux/include/uapi -Iinclude/generated/uapi -include /home/maze/linux/include/linux/kconfig.h -D__KERNEL__ -mlittle-endian -Iarch/arm/mach-tangox/include -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -fno-dwarf2-cfi-asm -mabi=aapcs-linux -mno-thumb-interwork -mfpu=vfp -funwind-tables -marm -D__LINUX_ARM_ARCH__=7 -march=armv7-a -msoft-float -Uarm -Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -fno-var-tracking-assignments
-
g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(cpufreq)" -D"KBUILD_MODNAME=KBUILD_STR(cpufreq)" -c -o arch/arm/mach-tangox/cpufreq.o arch/arm/mach-tangox/cpufreq.c
^ permalink raw reply [flat|nested] 17+ messages in thread* Code generation involving __raw_readl and __raw_writel 2014-11-27 10:40 Code generation involving __raw_readl and __raw_writel Mason @ 2014-11-27 10:48 ` Russell King - ARM Linux 2014-11-27 11:09 ` Willy Tarreau 2014-11-27 11:23 ` Arnd Bergmann 2 siblings, 0 replies; 17+ messages in thread From: Russell King - ARM Linux @ 2014-11-27 10:48 UTC (permalink / raw) To: linux-arm-kernel On Thu, Nov 27, 2014 at 11:40:34AM +0100, Mason wrote: > Hello everyone, > > Consider the following code (preprocessor output): > > static int tangox_target(struct cpufreq_policy *policy, unsigned int index) > { > while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31); > __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024))); > return 0; > } > > gcc generates the following code: > (version and command-line in sig below) > > 00000014 <tangox_target>: > 14: e3a03000 mov r3, #0 > 18: e34f3001 movt r3, #61441 ; 0xf001 > 1c: e3a02000 mov r2, #0 > 20: e34f2001 movt r2, #61441 ; 0xf001 > 24: e5931024 ldr r1, [r3, #36] ; 0x24 > 28: e3510000 cmp r1, #0 > 2c: bafffffa blt 1c <tangox_target+0x8> > 30: e3a00000 mov r0, #0 > 34: e5820024 str r0, [r2, #36] ; 0x24 > 38: e12fff1e bx lr > > Do you know why gcc duplicates the address in r2 and r3? > And keeps putting the address in r2 over and over in the loop? Because GCC is dumb. GCC has a long history of doing stupid stuff like this. That's why it's often far better to code your functions assuming that GCC isn't going to optimise very well. So, for instance: static int tangox_target(struct cpufreq_policy *policy, unsigned int index) { void __iomem *reg = (void *)(0xf0000000 +(0x10024)); while (__raw_readl(reg) >> 31); __raw_writel(0, reg); return 0; } It's also good practise to add a cpu_relax() to the while loop: while (__raw_readl(reg) >> 31) cpu_relax(); for two reasons - the ';' at the end can easily be overlooked when reading the code, and it also ensures that there are no bugs lurking (eg, some ARM CPUs don't bound their write buffers, which means stores can sit in them permanently while you're looping.) -- FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 10:40 Code generation involving __raw_readl and __raw_writel Mason 2014-11-27 10:48 ` Russell King - ARM Linux @ 2014-11-27 11:09 ` Willy Tarreau 2014-11-27 11:23 ` Arnd Bergmann 2 siblings, 0 replies; 17+ messages in thread From: Willy Tarreau @ 2014-11-27 11:09 UTC (permalink / raw) To: linux-arm-kernel Hi, On Thu, Nov 27, 2014 at 11:40:34AM +0100, Mason wrote: > Hello everyone, > > Consider the following code (preprocessor output): > > static int tangox_target(struct cpufreq_policy *policy, unsigned int index) > { > while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31); > __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024))); > return 0; > } > > gcc generates the following code: > (version and command-line in sig below) > > 00000014 <tangox_target>: > 14: e3a03000 mov r3, #0 > 18: e34f3001 movt r3, #61441 ; 0xf001 > 1c: e3a02000 mov r2, #0 > 20: e34f2001 movt r2, #61441 ; 0xf001 > 24: e5931024 ldr r1, [r3, #36] ; 0x24 > 28: e3510000 cmp r1, #0 > 2c: bafffffa blt 1c <tangox_target+0x8> > 30: e3a00000 mov r0, #0 > 34: e5820024 str r0, [r2, #36] ; 0x24 > 38: e12fff1e bx lr > > Do you know why gcc duplicates the address in r2 and r3? > And keeps putting the address in r2 over and over in the loop? I'm used to see crap like this all the time, which is why I *always* look at the assembly code for any performance-sensible section. In general, I try hard to help gcc do the right thing, or at least make it harder for it to do the wrong thing. Yes that's painful. But with a bit of training, you get automatisms and don't think about it anymore. Above it's very likely that if you compute your offset into a variable and use this variable, it will magically work. Hoping this helps, Willy ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 10:40 Code generation involving __raw_readl and __raw_writel Mason 2014-11-27 10:48 ` Russell King - ARM Linux 2014-11-27 11:09 ` Willy Tarreau @ 2014-11-27 11:23 ` Arnd Bergmann 2014-11-27 13:01 ` Mason 2 siblings, 1 reply; 17+ messages in thread From: Arnd Bergmann @ 2014-11-27 11:23 UTC (permalink / raw) To: linux-arm-kernel On Thursday 27 November 2014 11:40:34 Mason wrote: > Hello everyone, > > Consider the following code (preprocessor output): > > static int tangox_target(struct cpufreq_policy *policy, unsigned int index) > { > while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31); > __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024))); > return 0; > } First of all: - don't use __raw_readl in driver code, use readl or readl_relaxed. - When you do a busy-loop, add a cpu_relax(). - use proper types: 'void __iomem *', not 'volatile void *'. - use of_iomap or devm_ioremap_resource to get to the pointer for a device, don't just hardcode virtual addresses. > gcc generates the following code: > (version and command-line in sig below) > > 00000014 <tangox_target>: > 14: e3a03000 mov r3, #0 > 18: e34f3001 movt r3, #61441 ; 0xf001 > 1c: e3a02000 mov r2, #0 > 20: e34f2001 movt r2, #61441 ; 0xf001 > 24: e5931024 ldr r1, [r3, #36] ; 0x24 > 28: e3510000 cmp r1, #0 > 2c: bafffffa blt 1c <tangox_target+0x8> > 30: e3a00000 mov r0, #0 > 34: e5820024 str r0, [r2, #36] ; 0x24 > 38: e12fff1e bx lr > > Do you know why gcc duplicates the address in r2 and r3? > And keeps putting the address in r2 over and over in the loop? > > I was expecting something more along these lines: > > 00000014 <tangox_target>: > 14: e3a03000 mov r3, #0 > 18: e34f3001 movt r3, #61441 ; 0xf001 > 1c: e5931024 ldr r1, [r3, #36] ; 0x24 > 20: e3510000 cmp r1, #0 > 24: bafffffa blt 1c <tangox_target+0x8> > 28: e3a00000 mov r0, #0 > 2c: e5820024 str r0, [r3, #36] ; 0x24 > 30: e12fff1e bx lr I suspect the use of 'volatile' just makes gcc avoid all optimizations. Try cleaning up the code first and see if it still happens, then use a local variable to store the __iomem token if you have to. Arnd ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 11:23 ` Arnd Bergmann @ 2014-11-27 13:01 ` Mason 2014-11-27 13:12 ` Arnd Bergmann 0 siblings, 1 reply; 17+ messages in thread From: Mason @ 2014-11-27 13:01 UTC (permalink / raw) To: linux-arm-kernel On 27/11/2014 12:23, Arnd Bergmann wrote: > On Thursday 27 November 2014 11:40:34 Mason wrote: > >> Hello everyone, >> >> Consider the following code (preprocessor output): >> >> static int tangox_target(struct cpufreq_policy *policy, unsigned int index) >> { >> while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31); >> __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024))); >> return 0; >> } > > First of all: Disclaimer: I've only recently started writing driver code, so I am very grateful for every tip/suggestion I receive. Please note that what I provided was *preprocessor output* (to help in diagnosing the "C to ASM" translation). The actual code was: while (gbus_read_reg32(SYS_cpuclk_div_ctrl) >> 31); gbus_write_reg32(SYS_cpuclk_div_ctrl, 0); these being macro wrappers: #define gbus_read_reg32(r) __raw_readl((volatile void __iomem *)IO_ADDRESS(r)) #define gbus_write_reg32(r, v) __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r)) > - don't use __raw_readl in driver code, use readl or readl_relaxed. Can you expand a bit on the differences between these variants? __raw_readl, readl, readl_relaxed Is this an appropriate reference? https://www.kernel.org/doc/htmldocs/deviceiobook/index.html ( https://lwn.net/Articles/66938/ ) NB: if it matters, I am accessing memory-mapped registers, across a non-standard (I mean not PCI, ISA, etc) memory bus. arch/arm/include/asm/io.h defines #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) __raw_readl(c)); __r; }) #define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; }) #define ioread32(p) ({ unsigned int __v = le32_to_cpu((__force __le32)__raw_readl(p)); __iormb(); __v; }) static inline u32 __raw_readl(const volatile void __iomem *addr) { u32 val; asm volatile("ldr %1, %0" : "+Qo" (*(volatile u32 __force *)addr), "=r" (val)); return val; } As far as I can tell, readl and ioread32 are equivalent, right? > - When you do a busy-loop, add a cpu_relax(). LDD3 mentions cpu_relax in the context of long delays. My busy wait loop has a maximum latency of 60 ?s, and the cpufreq core is given this information, so that, typically, execution falls through the loop at the first iteration. Should I still use cpu_relax? Doesn't that risk adding latency to the calling thread? On my platform, cpu_relax is defined to a barrier: __asm__ __volatile__("": : :"memory") > - use proper types: 'void __iomem *', not 'volatile void *'. The actual code does use the __iomem attribute. > - use of_iomap or devm_ioremap_resource to get to the pointer for > a device, don't just hardcode virtual addresses. About that. If nothing had been done, 0xf0010024 would be an invalid virtual address, and reading from that address would generate a TLB miss, right? So something must have configured the TLB to accept and translate this address correctly. I'm looking for an iomap or ioremap call, right? I'm asking because I have an idea in mind: on the bus, the first 16 MB contains only memory-mapped registers, so I've been thinking I can map this region at init, and keep it for the lifetime of the system. It would use only one entry in the TLB, since the CPU supports 16 MB super-sections (or whatever they are called). I could even lock that entry in the TLB so that these accesses are guaranteed to never TLB miss, right? I'm also wondering: why did we choose to map physical address PA at virtual address VA = PA + 0xf000000? What happens if I map PA at VA = PA? Is it a problem if I "shadow" the 0 page? (I imagine it is.) Regards. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 13:01 ` Mason @ 2014-11-27 13:12 ` Arnd Bergmann 2014-11-27 14:51 ` Mason 0 siblings, 1 reply; 17+ messages in thread From: Arnd Bergmann @ 2014-11-27 13:12 UTC (permalink / raw) To: linux-arm-kernel On Thursday 27 November 2014 14:01:41 Mason wrote: > On 27/11/2014 12:23, Arnd Bergmann wrote: > > > On Thursday 27 November 2014 11:40:34 Mason wrote: > > > >> Hello everyone, > >> > >> Consider the following code (preprocessor output): > >> > >> static int tangox_target(struct cpufreq_policy *policy, unsigned int index) > >> { > >> while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31); > >> __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024))); > >> return 0; > >> } > > > > First of all: > > Disclaimer: I've only recently started writing driver code, so I am > very grateful for every tip/suggestion I receive. Please note that > what I provided was *preprocessor output* (to help in diagnosing the > "C to ASM" translation). The actual code was: > > while (gbus_read_reg32(SYS_cpuclk_div_ctrl) >> 31); > gbus_write_reg32(SYS_cpuclk_div_ctrl, 0); > > these being macro wrappers: > > #define gbus_read_reg32(r) __raw_readl((volatile void __iomem *)IO_ADDRESS(r)) > #define gbus_write_reg32(r, v) __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r)) Right, that's how things used to be done a while ago. > > - don't use __raw_readl in driver code, use readl or readl_relaxed. > > Can you expand a bit on the differences between these variants? > __raw_readl, readl, readl_relaxed __raw_* is for accessing RAM behind a memory-mapped I/O bus, and will not do any endian-swaps or guarantee atomic access. readl is the standard way to access an MMIO register readl_relaxed is like readl, but it does not guarantee synchronization with device access to DMA buffers, and can be a little faster than readl. > Is this an appropriate reference? > https://www.kernel.org/doc/htmldocs/deviceiobook/index.html > ( https://lwn.net/Articles/66938/ ) > > NB: if it matters, I am accessing memory-mapped registers, across > a non-standard (I mean not PCI, ISA, etc) memory bus. > > arch/arm/include/asm/io.h defines > #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) __raw_readl(c)); __r; }) > #define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; }) > #define ioread32(p) ({ unsigned int __v = le32_to_cpu((__force __le32)__raw_readl(p)); __iormb(); __v; }) > static inline u32 __raw_readl(const volatile void __iomem *addr) > { > u32 val; > asm volatile("ldr %1, %0" > : "+Qo" (*(volatile u32 __force *)addr), > "=r" (val)); > return val; > } > > As far as I can tell, readl and ioread32 are equivalent, right? On arm32 yes. On other architectures, ioread32 may be a little slower but also take __iomem tokens that were returned from ioport_map() or pci_iomap() instead of ioremap(). On arm32 those are all pointers that can be passed into readl. > > - When you do a busy-loop, add a cpu_relax(). > > LDD3 mentions cpu_relax in the context of long delays. > My busy wait loop has a maximum latency of 60 ?s, and the cpufreq core > is given this information, so that, typically, execution falls through > the loop at the first iteration. > > Should I still use cpu_relax? Doesn't that risk adding latency to the > calling thread? > > On my platform, cpu_relax is defined to a barrier: > __asm__ __volatile__("": : :"memory") I would always use it in a busy-loop. > > - use proper types: 'void __iomem *', not 'volatile void *'. > > The actual code does use the __iomem attribute. > > > - use of_iomap or devm_ioremap_resource to get to the pointer for > > a device, don't just hardcode virtual addresses. > > About that. If nothing had been done, 0xf0010024 would be an > invalid virtual address, and reading from that address would > generate a TLB miss, right? So something must have configured > the TLB to accept and translate this address correctly. > > I'm looking for an iomap or ioremap call, right? The IO_ADDRESS() macro on this platform is probably defined to match a address range that is set up from a map_io callback in the platform. On new platforms, you can't do that because the mach/*.h header files are inaccessible to drivers, so you have to use ioremap. > I'm asking because I have an idea in mind: on the bus, the first > 16 MB contains only memory-mapped registers, so I've been thinking > I can map this region at init, and keep it for the lifetime of the > system. It would use only one entry in the TLB, since the CPU > supports 16 MB super-sections (or whatever they are called). > > I could even lock that entry in the TLB so that these accesses > are guaranteed to never TLB miss, right? The map_io callback will set up a mapping like that, and when a driver calls ioremap on the same physical address, you will get the correct pointer using that TLB, you just don't communicate the address through a pointer any more. > I'm also wondering: why did we choose to map physical address > PA at virtual address VA = PA + 0xf000000? What happens if I > map PA at VA = PA? Is it a problem if I "shadow" the 0 page? > (I imagine it is.) The address has to be between VMALLOC_START and VMALLOC_END. 0xf0000000 is a typical address for such mappings. Arnd ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 13:12 ` Arnd Bergmann @ 2014-11-27 14:51 ` Mason 2014-11-27 15:00 ` Arnd Bergmann 0 siblings, 1 reply; 17+ messages in thread From: Mason @ 2014-11-27 14:51 UTC (permalink / raw) To: linux-arm-kernel Arnd, First of all, thanks (a lot) for your highly informative replies! On 27/11/2014 14:12, Arnd Bergmann wrote: > On Thursday 27 November 2014 14:01:41 Mason wrote: > >> #define gbus_read_reg32(r) __raw_readl((volatile void __iomem *)IO_ADDRESS(r)) >> #define gbus_write_reg32(r, v) __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r)) > > Right, that's how things used to be done a while ago. So, IIUC, old code used to call __raw_readl directly, but modern code is supposed to call either readl or readl_relaxed? (BTW, the original code is 4-5 years old, while my target is 3.14.x) >>> use of_iomap or devm_ioremap_resource to get to the pointer for >>> a device, don't just hardcode virtual addresses. >> >> About that. If nothing had been done, 0xf0010024 would be an >> invalid virtual address, and reading from that address would >> generate a TLB miss, right? So something must have configured >> the TLB to accept and translate this address correctly. >> >> I'm looking for an iomap or ioremap call, right? > > The IO_ADDRESS() macro on this platform is probably defined to > match a address range that is set up from a map_io callback in > the platform. #define __IO_START 0xf0000000 #define __IO_SIZE SZ_8M #define __IO_END (__IO_START + __IO_SIZE) #define IO_ADDRESS(x) (__IO_START +(x)) static struct map_desc hw_io_desc[] __initdata = { { .virtual = SCU_VIRT_BASE_ADDR, .pfn =__phys_to_pfn(SCU_BASE_ADDR), .length = SZ_2M, .type = MT_DEVICE, }, { .virtual = IO_ADDRESS(0), .pfn =__phys_to_pfn(0), .length = SZ_8M, .type = MT_DEVICE, }, }; ... iotable_init(hw_io_desc, ARRAY_SIZE(tangox_87xx_io_desc)); I'll take a much closer look at iotable_init, but I suppose it is this function that sets up the TLB? As far as I can see, it is not optimal to map 8 MB, because that will take up 8 entries in the TLB, whereas 16 MB would take only one (in theory). > On new platforms, you can't do that because the mach/*.h header > files are inaccessible to drivers, so you have to use ioremap. What do you mean by new platforms? Indeed, the IO_* macros given above are defined in arch/arm/mach-tangox/include/mach/io.h But my 3.14 driver does see the header. >> I'm asking because I have an idea in mind: on the bus, the first >> 16 MB contains only memory-mapped registers, so I've been thinking >> I can map this region at init, and keep it for the lifetime of the >> system. It would use only one entry in the TLB, since the CPU >> supports 16 MB super-sections (or whatever they are called). >> >> I could even lock that entry in the TLB so that these accesses >> are guaranteed to never TLB miss, right? > > The map_io callback will set up a mapping like that, and when > a driver calls ioremap on the same physical address, you will > get the correct pointer using that TLB, you just don't communicate > the address through a pointer any more. IIUC, you're saying the current method using iotable_init is not appropriate, and I should use the map_io callback? Regards. (And thanks again) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 14:51 ` Mason @ 2014-11-27 15:00 ` Arnd Bergmann 2014-11-27 15:36 ` Måns Rullgård 2014-11-27 15:46 ` Mason 0 siblings, 2 replies; 17+ messages in thread From: Arnd Bergmann @ 2014-11-27 15:00 UTC (permalink / raw) To: linux-arm-kernel On Thursday 27 November 2014 15:51:55 Mason wrote: > Arnd, > > First of all, thanks (a lot) for your highly informative replies! > > On 27/11/2014 14:12, Arnd Bergmann wrote: > > > On Thursday 27 November 2014 14:01:41 Mason wrote: > > > >> #define gbus_read_reg32(r) __raw_readl((volatile void __iomem *)IO_ADDRESS(r)) > >> #define gbus_write_reg32(r, v) __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r)) > > > > Right, that's how things used to be done a while ago. > > So, IIUC, old code used to call __raw_readl directly, but modern > code is supposed to call either readl or readl_relaxed? > > (BTW, the original code is 4-5 years old, while my target is 3.14.x) I meant the IO_ADDRESS stuff. Modern code uses ioremap() instead since the IO_ADDRESS was platform specific, and drivers can no longer use platform headers on CONFIG_ARCH_MULTIPLATFORM, which is used for all new code now. The __raw_readl() was probably considered bad back then already, but a lot of people did it as we were lacking readl_relaxed(). Would you consider submitting the code upstream? > >>> use of_iomap or devm_ioremap_resource to get to the pointer for > >>> a device, don't just hardcode virtual addresses. > >> > >> About that. If nothing had been done, 0xf0010024 would be an > >> invalid virtual address, and reading from that address would > >> generate a TLB miss, right? So something must have configured > >> the TLB to accept and translate this address correctly. > >> > >> I'm looking for an iomap or ioremap call, right? > > > > The IO_ADDRESS() macro on this platform is probably defined to > > match a address range that is set up from a map_io callback in > > the platform. > > #define __IO_START 0xf0000000 > #define __IO_SIZE SZ_8M > #define __IO_END (__IO_START + __IO_SIZE) > #define IO_ADDRESS(x) (__IO_START +(x)) > > static struct map_desc hw_io_desc[] __initdata = { > { > .virtual = SCU_VIRT_BASE_ADDR, > .pfn =__phys_to_pfn(SCU_BASE_ADDR), > .length = SZ_2M, > .type = MT_DEVICE, > }, > { > .virtual = IO_ADDRESS(0), > .pfn =__phys_to_pfn(0), > .length = SZ_8M, > .type = MT_DEVICE, > }, > }; > > ... > iotable_init(hw_io_desc, ARRAY_SIZE(tangox_87xx_io_desc)); > > I'll take a much closer look at iotable_init, but I suppose it is > this function that sets up the TLB? As far as I can see, it is not > optimal to map 8 MB, because that will take up 8 entries in the TLB, > whereas 16 MB would take only one (in theory). Right, it should just map 16 MB. > > On new platforms, you can't do that because the mach/*.h header > > files are inaccessible to drivers, so you have to use ioremap. > > What do you mean by new platforms? > > Indeed, the IO_* macros given above are defined in > arch/arm/mach-tangox/include/mach/io.h Modern platforms no longer have an include directory and use the standard definitions. How many other header files do you have in there? Actually, we also try to get rid of the need for most other files in mach-* as well, though commonly you need a little bit of code for bringing up secondary CPU cores. Everything else should be a device driver. > But my 3.14 driver does see the header. That means you don't use CONFIG_ARCH_MULTIPLATFORM. You should ;-) > >> I'm asking because I have an idea in mind: on the bus, the first > >> 16 MB contains only memory-mapped registers, so I've been thinking > >> I can map this region at init, and keep it for the lifetime of the > >> system. It would use only one entry in the TLB, since the CPU > >> supports 16 MB super-sections (or whatever they are called). > >> > >> I could even lock that entry in the TLB so that these accesses > >> are guaranteed to never TLB miss, right? > > > > The map_io callback will set up a mapping like that, and when > > a driver calls ioremap on the same physical address, you will > > get the correct pointer using that TLB, you just don't communicate > > the address through a pointer any more. > > IIUC, you're saying the current method using iotable_init is not > appropriate, and I should use the map_io callback? map_io and iotable_init is the same thing, the former calls the latter. What I'm saying is that relying on hardcoded virtual addresses is not appropriate, you can use the iotable_init call to set up a hugetlb mapping for performance optimization, but it should really work without that. Arnd ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 15:00 ` Arnd Bergmann @ 2014-11-27 15:36 ` Måns Rullgård 2014-11-27 15:55 ` Mason 2014-11-27 15:46 ` Mason 1 sibling, 1 reply; 17+ messages in thread From: Måns Rullgård @ 2014-11-27 15:36 UTC (permalink / raw) To: linux-arm-kernel Arnd Bergmann <arnd@arndb.de> writes: > On Thursday 27 November 2014 15:51:55 Mason wrote: >> Arnd, >> >> First of all, thanks (a lot) for your highly informative replies! >> >> On 27/11/2014 14:12, Arnd Bergmann wrote: >> >> > On Thursday 27 November 2014 14:01:41 Mason wrote: >> > >> >> #define gbus_read_reg32(r) __raw_readl((volatile void __iomem *)IO_ADDRESS(r)) >> >> #define gbus_write_reg32(r, v) __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r)) >> > >> > Right, that's how things used to be done a while ago. >> >> So, IIUC, old code used to call __raw_readl directly, but modern >> code is supposed to call either readl or readl_relaxed? >> >> (BTW, the original code is 4-5 years old, while my target is 3.14.x) > > I meant the IO_ADDRESS stuff. Modern code uses ioremap() instead > since the IO_ADDRESS was platform specific, and drivers can no longer > use platform headers on CONFIG_ARCH_MULTIPLATFORM, which is used > for all new code now. > > The __raw_readl() was probably considered bad back then already, but > a lot of people did it as we were lacking readl_relaxed(). > > Would you consider submitting the code upstream? This appears to be targeting a Sigma Designs chip, and nobody has to my knowledge ever seen anything go upstream from them. -- M?ns Rullg?rd mans at mansr.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 15:36 ` Måns Rullgård @ 2014-11-27 15:55 ` Mason 2014-11-27 16:18 ` Måns Rullgård 0 siblings, 1 reply; 17+ messages in thread From: Mason @ 2014-11-27 15:55 UTC (permalink / raw) To: linux-arm-kernel On 27/11/2014 16:36, M?ns Rullg?rd wrote: > Arnd Bergmann wrote: > >> Would you consider submitting the code upstream? > > This appears to be targeting a Sigma Designs chip, and nobody has > to my knowledge ever seen anything go upstream from them. Well, I do hope to push as much as I can into the vanilla kernel. (That is what "to submit the code upstream" means, right?) I'm just now starting the learning process though. Regards. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 15:55 ` Mason @ 2014-11-27 16:18 ` Måns Rullgård 2014-11-27 16:51 ` Arnd Bergmann 0 siblings, 1 reply; 17+ messages in thread From: Måns Rullgård @ 2014-11-27 16:18 UTC (permalink / raw) To: linux-arm-kernel Mason <mpeg.blue@free.fr> writes: > On 27/11/2014 16:36, M?ns Rullg?rd wrote: > >> Arnd Bergmann wrote: >> >>> Would you consider submitting the code upstream? >> >> This appears to be targeting a Sigma Designs chip, and nobody has >> to my knowledge ever seen anything go upstream from them. > > Well, I do hope to push as much as I can into the vanilla kernel. > (That is what "to submit the code upstream" means, right?) That's great. You mentioned working against 3.14. Do you have any plans to move onto the current version? That would be a prerequisite for upstream acceptance. -- M?ns Rullg?rd mans at mansr.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 16:18 ` Måns Rullgård @ 2014-11-27 16:51 ` Arnd Bergmann 2014-11-27 21:26 ` Mason 0 siblings, 1 reply; 17+ messages in thread From: Arnd Bergmann @ 2014-11-27 16:51 UTC (permalink / raw) To: linux-arm-kernel On Thursday 27 November 2014 16:18:20 M?ns Rullg?rd wrote: > Mason <mpeg.blue@free.fr> writes: > > > On 27/11/2014 16:36, M?ns Rullg?rd wrote: > > > >> Arnd Bergmann wrote: > >> > >>> Would you consider submitting the code upstream? > >> > >> This appears to be targeting a Sigma Designs chip, and nobody has > >> to my knowledge ever seen anything go upstream from them. > > > > Well, I do hope to push as much as I can into the vanilla kernel. > > (That is what "to submit the code upstream" means, right?) > > That's great. You mentioned working against 3.14. Do you have any > plans to move onto the current version? That would be a prerequisite > for upstream acceptance. FWIW, it's also a good strategy to do the initial bringup on the latest kernel and then backport to the older one for product work, otherwise you end up doing the forward port twice, and it's almost always harder than a backport. Arnd ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 16:51 ` Arnd Bergmann @ 2014-11-27 21:26 ` Mason 2014-11-27 21:49 ` Arnd Bergmann 2014-11-27 21:53 ` Måns Rullgård 0 siblings, 2 replies; 17+ messages in thread From: Mason @ 2014-11-27 21:26 UTC (permalink / raw) To: linux-arm-kernel On 27/11/2014 17:51, Arnd Bergmann wrote: > On Thursday 27 November 2014 16:18:20 M?ns Rullg?rd wrote: > >> That's great. You mentioned working against 3.14. Do you have any >> plans to move onto the current version? That would be a prerequisite >> for upstream acceptance. > > FWIW, it's also a good strategy to do the initial bringup on the latest > kernel and then backport to the older one for product work, otherwise > you end up doing the forward port twice, and it's almost always harder > than a backport. Hmmm, I was told to target 3.14, because that's what we plan to support. Maybe I can push one driver at a time? (I have no idea if this is realistic or not.) Regards. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 21:26 ` Mason @ 2014-11-27 21:49 ` Arnd Bergmann 2014-11-27 21:53 ` Måns Rullgård 1 sibling, 0 replies; 17+ messages in thread From: Arnd Bergmann @ 2014-11-27 21:49 UTC (permalink / raw) To: linux-arm-kernel On Thursday 27 November 2014 22:26:48 Mason wrote: > On 27/11/2014 17:51, Arnd Bergmann wrote: > > > On Thursday 27 November 2014 16:18:20 M?ns Rullg?rd wrote: > > > >> That's great. You mentioned working against 3.14. Do you have any > >> plans to move onto the current version? That would be a prerequisite > >> for upstream acceptance. > > > > FWIW, it's also a good strategy to do the initial bringup on the latest > > kernel and then backport to the older one for product work, otherwise > > you end up doing the forward port twice, and it's almost always harder > > than a backport. > > Hmmm, I was told to target 3.14, because that's what we plan to support. > > Maybe I can push one driver at a time? > (I have no idea if this is realistic or not.) > Doing one driver at a time is the normal way anyway, what I'm saying is just that it's more work for you if you first port everything to 3.14. Arnd ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 21:26 ` Mason 2014-11-27 21:49 ` Arnd Bergmann @ 2014-11-27 21:53 ` Måns Rullgård 1 sibling, 0 replies; 17+ messages in thread From: Måns Rullgård @ 2014-11-27 21:53 UTC (permalink / raw) To: linux-arm-kernel Mason <mpeg.blue@free.fr> writes: > On 27/11/2014 17:51, Arnd Bergmann wrote: > >> On Thursday 27 November 2014 16:18:20 M?ns Rullg?rd wrote: >> >>> That's great. You mentioned working against 3.14. Do you have any >>> plans to move onto the current version? That would be a prerequisite >>> for upstream acceptance. >> >> FWIW, it's also a good strategy to do the initial bringup on the latest >> kernel and then backport to the older one for product work, otherwise >> you end up doing the forward port twice, and it's almost always harder >> than a backport. > > Hmmm, I was told to target 3.14, because that's what we plan to support. > > Maybe I can push one driver at a time? Normally you would start with minimal chip support, just enough to configure memory and a uart, then add drivers for various devices one by one. New ARM systems are expected to use DT, so you should also be switching to that if you haven't already. -- M?ns Rullg?rd mans at mansr.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 15:00 ` Arnd Bergmann 2014-11-27 15:36 ` Måns Rullgård @ 2014-11-27 15:46 ` Mason 2014-11-27 15:59 ` Arnd Bergmann 1 sibling, 1 reply; 17+ messages in thread From: Mason @ 2014-11-27 15:46 UTC (permalink / raw) To: linux-arm-kernel On 27/11/2014 16:00, Arnd Bergmann wrote: > On Thursday 27 November 2014 15:51:55 Mason wrote: > >> (BTW, the original code is 4-5 years old, while my target is 3.14.x) > > I meant the IO_ADDRESS stuff. Modern code uses ioremap() instead > since the IO_ADDRESS was platform specific, and drivers can no longer > use platform headers on CONFIG_ARCH_MULTIPLATFORM, which is used > for all new code now. $ grep CONFIG_ARCH_MULTIPLATFORM .config # CONFIG_ARCH_MULTIPLATFORM is not set One more thing on the TODO list (which is starting to look like a phone directory). Do you have a reference explaining this option? (I did read the short description in arch/arm/Kconfig) > Would you consider submitting the code upstream? This is definitely one of my objectives. I'm still learning the ropes (I rewrote the cpufreq driver to use mostly generic functions, heard of the clk "meta-framework" but haven't even started looking at that, I'm also writing a temperature sensor driver for lm-sensors.) > Modern platforms no longer have an include directory and use the > standard definitions. How many other header files do you have in there? $ find arch/arm/mach-tangox -name *.h | wc -l 44 > Actually, we also try to get rid of the need for most other files > in mach-* as well, though commonly you need a little bit of code for > bringing up secondary CPU cores. Everything else should be a device > driver. I see that other platforms put their cpufreq driver in drivers/cpufreq Ours is in arch/arm/mach-tangox (same for cpuidle.c) (Hmmm, now that I look more closely, I see that we also have a generic clock.c in there... More on the TODO list.) >> IIUC, you're saying the current method using iotable_init is not >> appropriate, and I should use the map_io callback? > > map_io and iotable_init is the same thing, the former calls the latter. > What I'm saying is that relying on hardcoded virtual addresses is > not appropriate, you can use the iotable_init call to set up a > hugetlb mapping for performance optimization, but it should really > work without that. I think I don't fully understand this paragraph yet. But you've given me a lot to think about today, so maybe It'll sink in later. Regards. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Code generation involving __raw_readl and __raw_writel 2014-11-27 15:46 ` Mason @ 2014-11-27 15:59 ` Arnd Bergmann 0 siblings, 0 replies; 17+ messages in thread From: Arnd Bergmann @ 2014-11-27 15:59 UTC (permalink / raw) To: linux-arm-kernel On Thursday 27 November 2014 16:46:30 Mason wrote: > On 27/11/2014 16:00, Arnd Bergmann wrote: > > > On Thursday 27 November 2014 15:51:55 Mason wrote: > > > >> (BTW, the original code is 4-5 years old, while my target is 3.14.x) > > > > I meant the IO_ADDRESS stuff. Modern code uses ioremap() instead > > since the IO_ADDRESS was platform specific, and drivers can no longer > > use platform headers on CONFIG_ARCH_MULTIPLATFORM, which is used > > for all new code now. > > $ grep CONFIG_ARCH_MULTIPLATFORM .config > # CONFIG_ARCH_MULTIPLATFORM is not set > > One more thing on the TODO list (which is starting to look like a > phone directory). Do you have a reference explaining this option? > (I did read the short description in arch/arm/Kconfig) The brief version is that instead of selecting just one platform at a time, you can now build a kernel with (almost) all ARMv6 and ARMv7 platforms enabled at the same time, and new platforms are no longer allowed to be separate options. When you look at any of the modern (e.g. sunxi, rockchip, hisi, ...) platforms you find how their Kconfig is structured. Some older platforms still have a lot of other files in the mach-* directories for historic reasons, but you wouldn't do that for a new platform. > > Would you consider submitting the code upstream? > > This is definitely one of my objectives. I'm still learning the ropes > (I rewrote the cpufreq driver to use mostly generic functions, heard > of the clk "meta-framework" but haven't even started looking at that, > I'm also writing a temperature sensor driver for lm-sensors.) Ok, cool. Can you upload what you have to a public git tree for reference? > > Modern platforms no longer have an include directory and use the > > standard definitions. How many other header files do you have in there? > > $ find arch/arm/mach-tangox -name *.h | wc -l > 44 Ouch. I'm guessing that a lot of these will be driver specific. Those headers should just be merged into whatever driver uses the registers and removed from the platform. Then there is the case of platform_data definitions. For platforms using traditional board files, these should get moved to include/linux/platform_data/, but for an upstream port you will have to use devicetree anyway, and then you won't need platform data. > > Actually, we also try to get rid of the need for most other files > > in mach-* as well, though commonly you need a little bit of code for > > bringing up secondary CPU cores. Everything else should be a device > > driver. > > I see that other platforms put their cpufreq driver in drivers/cpufreq > Ours is in arch/arm/mach-tangox (same for cpuidle.c) > (Hmmm, now that I look more closely, I see that we also have a generic > clock.c in there... More on the TODO list.) Right. We moved almost everything into subsystems. Also clocksource, irqchip, pinctrl, gpio, dma, led, pwm, and probably some more that I forgot. Arnd ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2014-11-27 21:53 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-11-27 10:40 Code generation involving __raw_readl and __raw_writel Mason 2014-11-27 10:48 ` Russell King - ARM Linux 2014-11-27 11:09 ` Willy Tarreau 2014-11-27 11:23 ` Arnd Bergmann 2014-11-27 13:01 ` Mason 2014-11-27 13:12 ` Arnd Bergmann 2014-11-27 14:51 ` Mason 2014-11-27 15:00 ` Arnd Bergmann 2014-11-27 15:36 ` Måns Rullgård 2014-11-27 15:55 ` Mason 2014-11-27 16:18 ` Måns Rullgård 2014-11-27 16:51 ` Arnd Bergmann 2014-11-27 21:26 ` Mason 2014-11-27 21:49 ` Arnd Bergmann 2014-11-27 21:53 ` Måns Rullgård 2014-11-27 15:46 ` Mason 2014-11-27 15:59 ` Arnd Bergmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).