Code generation involving __raw_readl and __raw

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* Code generation involving __raw_readl and __raw_writel
@ 2014-11-27 10:40 Mason
  2014-11-27 10:48 ` Russell King - ARM Linux
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Mason @ 2014-11-27 10:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hello everyone,

Consider the following code (preprocessor output):

static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
{
  while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31);
  __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024)));
  return 0;
}

gcc generates the following code:
(version and command-line in sig below)

00000014 <tangox_target>:
   14:   e3a03000        mov     r3, #0
   18:   e34f3001        movt    r3, #61441      ; 0xf001
   1c:   e3a02000        mov     r2, #0
   20:   e34f2001        movt    r2, #61441      ; 0xf001
   24:   e5931024        ldr     r1, [r3, #36]   ; 0x24
   28:   e3510000        cmp     r1, #0
   2c:   bafffffa        blt     1c <tangox_target+0x8>
   30:   e3a00000        mov     r0, #0
   34:   e5820024        str     r0, [r2, #36]   ; 0x24
   38:   e12fff1e        bx      lr

Do you know why gcc duplicates the address in r2 and r3?
And keeps putting the address in r2 over and over in the loop?

I was expecting something more along these lines:

00000014 <tangox_target>:
   14:   e3a03000        mov     r3, #0
   18:   e34f3001        movt    r3, #61441      ; 0xf001
   1c:   e5931024        ldr     r1, [r3, #36]   ; 0x24
   20:   e3510000        cmp     r1, #0
   24:   bafffffa        blt     1c <tangox_target+0x8>
   28:   e3a00000        mov     r0, #0
   2c:   e5820024        str     r0, [r3, #36]   ; 0x24
   30:   e12fff1e        bx      lr

Regards.

-- 
gcc version 4.6.3 (Sourcery CodeBench Lite 2012.03-57)

arm-none-linux-gnueabi-gcc -Wp,-MD,arch/arm/mach-tangox/.cpufreq.o.d  -nostdinc -isystem /utils/release/sets/6_1_3_8756/sigma-buildroot_2012.02-22/output/host/opt/ext-toolchain/bin/../lib/gcc/arm-none-linux-gnueabi/4.6.3/include -I/home/maze/linux/arch/arm/include -Iarch/arm/include/generated  -Iinclude -I/home/maze/linux/arch/arm/include/uapi -Iarch/arm/include/generated/uapi -I/home/maze/linux/include/uapi -Iinclude/generated/uapi -include /home/maze/linux/include/linux/kconfig.h -D__KERNEL__ -mlittle-endian -Iarch/arm/mach-tangox/include -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -fno-dwarf2-cfi-asm -mabi=aapcs-linux -mno-thumb-interwork -mfpu=vfp -funwind-tables -marm -D__LINUX_ARM_ARCH__=7 -march=armv7-a -msoft-float -Uarm -Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -fno-var-tracking-assignments 
 -
g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes    -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(cpufreq)"  -D"KBUILD_MODNAME=KBUILD_STR(cpufreq)" -c -o arch/arm/mach-tangox/cpufreq.o arch/arm/mach-tangox/cpufreq.c

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 10:40 Code generation involving __raw_readl and __raw_writel Mason
@ 2014-11-27 10:48 ` Russell King - ARM Linux
  2014-11-27 11:09 ` Willy Tarreau
  2014-11-27 11:23 ` Arnd Bergmann
  2 siblings, 0 replies; 17+ messages in thread
From: Russell King - ARM Linux @ 2014-11-27 10:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 27, 2014 at 11:40:34AM +0100, Mason wrote:
> Hello everyone,
> 
> Consider the following code (preprocessor output):
> 
> static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
> {
>  while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31);
>  __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024)));
>  return 0;
> }
> 
> gcc generates the following code:
> (version and command-line in sig below)
> 
> 00000014 <tangox_target>:
>   14:   e3a03000        mov     r3, #0
>   18:   e34f3001        movt    r3, #61441      ; 0xf001
>   1c:   e3a02000        mov     r2, #0
>   20:   e34f2001        movt    r2, #61441      ; 0xf001
>   24:   e5931024        ldr     r1, [r3, #36]   ; 0x24
>   28:   e3510000        cmp     r1, #0
>   2c:   bafffffa        blt     1c <tangox_target+0x8>
>   30:   e3a00000        mov     r0, #0
>   34:   e5820024        str     r0, [r2, #36]   ; 0x24
>   38:   e12fff1e        bx      lr
> 
> Do you know why gcc duplicates the address in r2 and r3?
> And keeps putting the address in r2 over and over in the loop?

Because GCC is dumb.  GCC has a long history of doing stupid stuff like
this.

That's why it's often far better to code your functions assuming that
GCC isn't going to optimise very well.  So, for instance:

static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
{
 void __iomem *reg = (void *)(0xf0000000 +(0x10024));
 while (__raw_readl(reg) >> 31);
 __raw_writel(0, reg);
 return 0;
}

It's also good practise to add a cpu_relax() to the while loop:

 while (__raw_readl(reg) >> 31)
  cpu_relax();

for two reasons - the ';' at the end can easily be overlooked when reading
the code, and it also ensures that there are no bugs lurking (eg, some
ARM CPUs don't bound their write buffers, which means stores can sit in
them permanently while you're looping.)

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 10:40 Code generation involving __raw_readl and __raw_writel Mason
  2014-11-27 10:48 ` Russell King - ARM Linux
@ 2014-11-27 11:09 ` Willy Tarreau
  2014-11-27 11:23 ` Arnd Bergmann
  2 siblings, 0 replies; 17+ messages in thread
From: Willy Tarreau @ 2014-11-27 11:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, Nov 27, 2014 at 11:40:34AM +0100, Mason wrote:
> Hello everyone,
> 
> Consider the following code (preprocessor output):
> 
> static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
> {
>  while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31);
>  __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024)));
>  return 0;
> }
> 
> gcc generates the following code:
> (version and command-line in sig below)
> 
> 00000014 <tangox_target>:
>   14:   e3a03000        mov     r3, #0
>   18:   e34f3001        movt    r3, #61441      ; 0xf001
>   1c:   e3a02000        mov     r2, #0
>   20:   e34f2001        movt    r2, #61441      ; 0xf001
>   24:   e5931024        ldr     r1, [r3, #36]   ; 0x24
>   28:   e3510000        cmp     r1, #0
>   2c:   bafffffa        blt     1c <tangox_target+0x8>
>   30:   e3a00000        mov     r0, #0
>   34:   e5820024        str     r0, [r2, #36]   ; 0x24
>   38:   e12fff1e        bx      lr
> 
> Do you know why gcc duplicates the address in r2 and r3?
> And keeps putting the address in r2 over and over in the loop?

I'm used to see crap like this all the time, which is why I
*always* look at the assembly code for any performance-sensible
section. In general, I try hard to help gcc do the right thing,
or at least make it harder for it to do the wrong thing. Yes
that's painful. But with a bit of training, you get automatisms
and don't think about it anymore.

Above it's very likely that if you compute your offset into a
variable and use this variable, it will magically work.

Hoping this helps,
Willy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 10:40 Code generation involving __raw_readl and __raw_writel Mason
  2014-11-27 10:48 ` Russell King - ARM Linux
  2014-11-27 11:09 ` Willy Tarreau
@ 2014-11-27 11:23 ` Arnd Bergmann
  2014-11-27 13:01   ` Mason
  2 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2014-11-27 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 27 November 2014 11:40:34 Mason wrote:
> Hello everyone,
> 
> Consider the following code (preprocessor output):
> 
> static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
> {
>   while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31);
>   __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024)));
>   return 0;
> }

First of all:
- don't use __raw_readl in driver code, use readl or readl_relaxed.
- When you do a busy-loop, add a cpu_relax().
- use proper types: 'void __iomem *', not 'volatile void *'.
- use of_iomap or devm_ioremap_resource to get to the pointer for
  a device, don't just hardcode virtual addresses.

> gcc generates the following code:
> (version and command-line in sig below)
> 
> 00000014 <tangox_target>:
>    14:   e3a03000        mov     r3, #0
>    18:   e34f3001        movt    r3, #61441      ; 0xf001
>    1c:   e3a02000        mov     r2, #0
>    20:   e34f2001        movt    r2, #61441      ; 0xf001
>    24:   e5931024        ldr     r1, [r3, #36]   ; 0x24
>    28:   e3510000        cmp     r1, #0
>    2c:   bafffffa        blt     1c <tangox_target+0x8>
>    30:   e3a00000        mov     r0, #0
>    34:   e5820024        str     r0, [r2, #36]   ; 0x24
>    38:   e12fff1e        bx      lr
> 
> Do you know why gcc duplicates the address in r2 and r3?
> And keeps putting the address in r2 over and over in the loop?
> 
> I was expecting something more along these lines:
> 
> 00000014 <tangox_target>:
>    14:   e3a03000        mov     r3, #0
>    18:   e34f3001        movt    r3, #61441      ; 0xf001
>    1c:   e5931024        ldr     r1, [r3, #36]   ; 0x24
>    20:   e3510000        cmp     r1, #0
>    24:   bafffffa        blt     1c <tangox_target+0x8>
>    28:   e3a00000        mov     r0, #0
>    2c:   e5820024        str     r0, [r3, #36]   ; 0x24
>    30:   e12fff1e        bx      lr

I suspect the use of 'volatile' just makes gcc avoid all
optimizations. Try cleaning up the code first and see if it
still happens, then use a local variable to store the __iomem
token if you have to.

	Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 11:23 ` Arnd Bergmann
@ 2014-11-27 13:01   ` Mason
  2014-11-27 13:12     ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Mason @ 2014-11-27 13:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 27/11/2014 12:23, Arnd Bergmann wrote:

> On Thursday 27 November 2014 11:40:34 Mason wrote:
>
>> Hello everyone,
>>
>> Consider the following code (preprocessor output):
>>
>> static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
>> {
>>    while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31);
>>    __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024)));
>>    return 0;
>> }
>
> First of all:

Disclaimer: I've only recently started writing driver code, so I am
very grateful for every tip/suggestion I receive. Please note that
what I provided was *preprocessor output* (to help in diagnosing the
"C to ASM" translation). The actual code was:

   while (gbus_read_reg32(SYS_cpuclk_div_ctrl) >> 31);
   gbus_write_reg32(SYS_cpuclk_div_ctrl, 0);

these being macro wrappers:

#define gbus_read_reg32(r)      __raw_readl((volatile void __iomem *)IO_ADDRESS(r))
#define gbus_write_reg32(r, v)  __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r))

> - don't use __raw_readl in driver code, use readl or readl_relaxed.

Can you expand a bit on the differences between these variants?
__raw_readl, readl, readl_relaxed

Is this an appropriate reference?
https://www.kernel.org/doc/htmldocs/deviceiobook/index.html
( https://lwn.net/Articles/66938/ )

NB: if it matters, I am accessing memory-mapped registers, across
a non-standard (I mean not PCI, ISA, etc) memory bus.

arch/arm/include/asm/io.h defines
#define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) __raw_readl(c)); __r; })
#define readl(c)         ({ u32 __v = readl_relaxed(c); __iormb(); __v; })
#define ioread32(p)	 ({ unsigned int __v = le32_to_cpu((__force __le32)__raw_readl(p)); __iormb(); __v; })
static inline u32 __raw_readl(const volatile void __iomem *addr)
{
	u32 val;
	asm volatile("ldr %1, %0"
		     : "+Qo" (*(volatile u32 __force *)addr),
		       "=r" (val));
	return val;
}

As far as I can tell, readl and ioread32 are equivalent, right?

> - When you do a busy-loop, add a cpu_relax().

LDD3 mentions cpu_relax in the context of long delays.
My busy wait loop has a maximum latency of 60 ?s, and the cpufreq core
is given this information, so that, typically, execution falls through
the loop at the first iteration.

Should I still use cpu_relax? Doesn't that risk adding latency to the
calling thread?

On my platform, cpu_relax is defined to a barrier:
__asm__ __volatile__("": : :"memory")

> - use proper types: 'void __iomem *', not 'volatile void *'.

The actual code does use the __iomem attribute.

> - use of_iomap or devm_ioremap_resource to get to the pointer for
>    a device, don't just hardcode virtual addresses.

About that. If nothing had been done, 0xf0010024 would be an
invalid virtual address, and reading from that address would
generate a TLB miss, right? So something must have configured
the TLB to accept and translate this address correctly.

I'm looking for an iomap or ioremap call, right?

I'm asking because I have an idea in mind: on the bus, the first
16 MB contains only memory-mapped registers, so I've been thinking
I can map this region at init, and keep it for the lifetime of the
system. It would use only one entry in the TLB, since the CPU
supports 16 MB super-sections (or whatever they are called).

I could even lock that entry in the TLB so that these accesses
are guaranteed to never TLB miss, right?

I'm also wondering: why did we choose to map physical address
PA at virtual address VA = PA + 0xf000000? What happens if I
map PA at VA = PA? Is it a problem if I "shadow" the 0 page?
(I imagine it is.)

Regards.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 13:01   ` Mason
@ 2014-11-27 13:12     ` Arnd Bergmann
  2014-11-27 14:51       ` Mason
  0 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2014-11-27 13:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 27 November 2014 14:01:41 Mason wrote:
> On 27/11/2014 12:23, Arnd Bergmann wrote:
> 
> > On Thursday 27 November 2014 11:40:34 Mason wrote:
> >
> >> Hello everyone,
> >>
> >> Consider the following code (preprocessor output):
> >>
> >> static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
> >> {
> >>    while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31);
> >>    __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024)));
> >>    return 0;
> >> }
> >
> > First of all:
> 
> Disclaimer: I've only recently started writing driver code, so I am
> very grateful for every tip/suggestion I receive. Please note that
> what I provided was *preprocessor output* (to help in diagnosing the
> "C to ASM" translation). The actual code was:
> 
>    while (gbus_read_reg32(SYS_cpuclk_div_ctrl) >> 31);
>    gbus_write_reg32(SYS_cpuclk_div_ctrl, 0);
> 
> these being macro wrappers:
> 
> #define gbus_read_reg32(r)      __raw_readl((volatile void __iomem *)IO_ADDRESS(r))
> #define gbus_write_reg32(r, v)  __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r))

Right, that's how things used to be done a while ago.

> > - don't use __raw_readl in driver code, use readl or readl_relaxed.
> 
> Can you expand a bit on the differences between these variants?
> __raw_readl, readl, readl_relaxed

__raw_* is for accessing RAM behind a memory-mapped I/O bus,
and will not do any endian-swaps or guarantee atomic access.

readl is the standard way to access an MMIO register

readl_relaxed is like readl, but it does not guarantee synchronization
with device access to DMA buffers, and can be a little faster than
readl.

> Is this an appropriate reference?
> https://www.kernel.org/doc/htmldocs/deviceiobook/index.html
> ( https://lwn.net/Articles/66938/ )
> 
> NB: if it matters, I am accessing memory-mapped registers, across
> a non-standard (I mean not PCI, ISA, etc) memory bus.
> 
> arch/arm/include/asm/io.h defines
> #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) __raw_readl(c)); __r; })
> #define readl(c)         ({ u32 __v = readl_relaxed(c); __iormb(); __v; })
> #define ioread32(p)	 ({ unsigned int __v = le32_to_cpu((__force __le32)__raw_readl(p)); __iormb(); __v; })
> static inline u32 __raw_readl(const volatile void __iomem *addr)
> {
> 	u32 val;
> 	asm volatile("ldr %1, %0"
> 		     : "+Qo" (*(volatile u32 __force *)addr),
> 		       "=r" (val));
> 	return val;
> }
> 
> As far as I can tell, readl and ioread32 are equivalent, right?

On arm32 yes. On other architectures, ioread32 may be a little slower
but also take __iomem tokens that were returned from ioport_map()
or pci_iomap() instead of ioremap(). On arm32 those are all pointers
that can be passed into readl.

> > - When you do a busy-loop, add a cpu_relax().
> 
> LDD3 mentions cpu_relax in the context of long delays.
> My busy wait loop has a maximum latency of 60 ?s, and the cpufreq core
> is given this information, so that, typically, execution falls through
> the loop at the first iteration.
> 
> Should I still use cpu_relax? Doesn't that risk adding latency to the
> calling thread?
> 
> On my platform, cpu_relax is defined to a barrier:
> __asm__ __volatile__("": : :"memory")

I would always use it in a busy-loop.

> > - use proper types: 'void __iomem *', not 'volatile void *'.
> 
> The actual code does use the __iomem attribute.
> 
> > - use of_iomap or devm_ioremap_resource to get to the pointer for
> >    a device, don't just hardcode virtual addresses.
> 
> About that. If nothing had been done, 0xf0010024 would be an
> invalid virtual address, and reading from that address would
> generate a TLB miss, right? So something must have configured
> the TLB to accept and translate this address correctly.
> 
> I'm looking for an iomap or ioremap call, right?

The IO_ADDRESS() macro on this platform is probably defined to
match a address range that is set up from a map_io callback in
the platform.

On new platforms, you can't do that because the mach/*.h header
files are inaccessible to drivers, so you have to use ioremap.

> I'm asking because I have an idea in mind: on the bus, the first
> 16 MB contains only memory-mapped registers, so I've been thinking
> I can map this region at init, and keep it for the lifetime of the
> system. It would use only one entry in the TLB, since the CPU
> supports 16 MB super-sections (or whatever they are called).
> 
> I could even lock that entry in the TLB so that these accesses
> are guaranteed to never TLB miss, right?

The map_io callback will set up a mapping like that, and when
a driver calls ioremap on the same physical address, you will
get the correct pointer using that TLB, you just don't communicate
the address through a pointer any more.

> I'm also wondering: why did we choose to map physical address
> PA at virtual address VA = PA + 0xf000000? What happens if I
> map PA at VA = PA? Is it a problem if I "shadow" the 0 page?
> (I imagine it is.)

The address has to be between VMALLOC_START and VMALLOC_END.
0xf0000000 is a typical address for such mappings.

	Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 13:12     ` Arnd Bergmann
@ 2014-11-27 14:51       ` Mason
  2014-11-27 15:00         ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Mason @ 2014-11-27 14:51 UTC (permalink / raw)
  To: linux-arm-kernel

Arnd,

First of all, thanks (a lot) for your highly informative replies!

On 27/11/2014 14:12, Arnd Bergmann wrote:

> On Thursday 27 November 2014 14:01:41 Mason wrote:
>
>> #define gbus_read_reg32(r)      __raw_readl((volatile void __iomem *)IO_ADDRESS(r))
>> #define gbus_write_reg32(r, v)  __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r))
>
> Right, that's how things used to be done a while ago.

So, IIUC, old code used to call __raw_readl directly, but modern
code is supposed to call either readl or readl_relaxed?

(BTW, the original code is 4-5 years old, while my target is 3.14.x)

>>> use of_iomap or devm_ioremap_resource to get to the pointer for
>>> a device, don't just hardcode virtual addresses.
>>
>> About that. If nothing had been done, 0xf0010024 would be an
>> invalid virtual address, and reading from that address would
>> generate a TLB miss, right? So something must have configured
>> the TLB to accept and translate this address correctly.
>>
>> I'm looking for an iomap or ioremap call, right?
>
> The IO_ADDRESS() macro on this platform is probably defined to
> match a address range that is set up from a map_io callback in
> the platform.

#define __IO_START 0xf0000000
#define __IO_SIZE  SZ_8M
#define __IO_END   (__IO_START + __IO_SIZE)
#define IO_ADDRESS(x) (__IO_START +(x))

static struct map_desc hw_io_desc[] __initdata = {
	{
		.virtual	= SCU_VIRT_BASE_ADDR,
		.pfn		=__phys_to_pfn(SCU_BASE_ADDR),
		.length		= SZ_2M,
		.type 		= MT_DEVICE,
	},
	{
		.virtual	= IO_ADDRESS(0),
		.pfn		=__phys_to_pfn(0),
		.length		= SZ_8M,
		.type 		= MT_DEVICE,
	},
};

   ...
   iotable_init(hw_io_desc, ARRAY_SIZE(tangox_87xx_io_desc));

I'll take a much closer look at iotable_init, but I suppose it is
this function that sets up the TLB? As far as I can see, it is not
optimal to map 8 MB, because that will take up 8 entries in the TLB,
whereas 16 MB would take only one (in theory).

> On new platforms, you can't do that because the mach/*.h header
> files are inaccessible to drivers, so you have to use ioremap.

What do you mean by new platforms?

Indeed, the IO_* macros given above are defined in
arch/arm/mach-tangox/include/mach/io.h

But my 3.14 driver does see the header.

>> I'm asking because I have an idea in mind: on the bus, the first
>> 16 MB contains only memory-mapped registers, so I've been thinking
>> I can map this region at init, and keep it for the lifetime of the
>> system. It would use only one entry in the TLB, since the CPU
>> supports 16 MB super-sections (or whatever they are called).
>>
>> I could even lock that entry in the TLB so that these accesses
>> are guaranteed to never TLB miss, right?
>
> The map_io callback will set up a mapping like that, and when
> a driver calls ioremap on the same physical address, you will
> get the correct pointer using that TLB, you just don't communicate
> the address through a pointer any more.

IIUC, you're saying the current method using iotable_init is not
appropriate, and I should use the map_io callback?

Regards. (And thanks again)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 14:51       ` Mason
@ 2014-11-27 15:00         ` Arnd Bergmann
  2014-11-27 15:36           ` Måns Rullgård
  2014-11-27 15:46           ` Mason
  0 siblings, 2 replies; 17+ messages in thread
From: Arnd Bergmann @ 2014-11-27 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 27 November 2014 15:51:55 Mason wrote:
> Arnd,
> 
> First of all, thanks (a lot) for your highly informative replies!
> 
> On 27/11/2014 14:12, Arnd Bergmann wrote:
> 
> > On Thursday 27 November 2014 14:01:41 Mason wrote:
> >
> >> #define gbus_read_reg32(r)      __raw_readl((volatile void __iomem *)IO_ADDRESS(r))
> >> #define gbus_write_reg32(r, v)  __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r))
> >
> > Right, that's how things used to be done a while ago.
> 
> So, IIUC, old code used to call __raw_readl directly, but modern
> code is supposed to call either readl or readl_relaxed?
> 
> (BTW, the original code is 4-5 years old, while my target is 3.14.x)

I meant the IO_ADDRESS stuff. Modern code uses ioremap() instead
since the IO_ADDRESS was platform specific, and drivers can no longer
use platform headers on CONFIG_ARCH_MULTIPLATFORM, which is used
for all new code now.

The __raw_readl() was probably considered bad back then already, but
a lot of people did it as we were lacking readl_relaxed().

Would you consider submitting the code upstream?

> >>> use of_iomap or devm_ioremap_resource to get to the pointer for
> >>> a device, don't just hardcode virtual addresses.
> >>
> >> About that. If nothing had been done, 0xf0010024 would be an
> >> invalid virtual address, and reading from that address would
> >> generate a TLB miss, right? So something must have configured
> >> the TLB to accept and translate this address correctly.
> >>
> >> I'm looking for an iomap or ioremap call, right?
> >
> > The IO_ADDRESS() macro on this platform is probably defined to
> > match a address range that is set up from a map_io callback in
> > the platform.
> 
> #define __IO_START 0xf0000000
> #define __IO_SIZE  SZ_8M
> #define __IO_END   (__IO_START + __IO_SIZE)
> #define IO_ADDRESS(x) (__IO_START +(x))
> 
> static struct map_desc hw_io_desc[] __initdata = {
> 	{
> 		.virtual	= SCU_VIRT_BASE_ADDR,
> 		.pfn		=__phys_to_pfn(SCU_BASE_ADDR),
> 		.length		= SZ_2M,
> 		.type 		= MT_DEVICE,
> 	},
> 	{
> 		.virtual	= IO_ADDRESS(0),
> 		.pfn		=__phys_to_pfn(0),
> 		.length		= SZ_8M,
> 		.type 		= MT_DEVICE,
> 	},
> };
> 
>    ...
>    iotable_init(hw_io_desc, ARRAY_SIZE(tangox_87xx_io_desc));
> 
> I'll take a much closer look at iotable_init, but I suppose it is
> this function that sets up the TLB? As far as I can see, it is not
> optimal to map 8 MB, because that will take up 8 entries in the TLB,
> whereas 16 MB would take only one (in theory).

Right, it should just map 16 MB.

> > On new platforms, you can't do that because the mach/*.h header
> > files are inaccessible to drivers, so you have to use ioremap.
> 
> What do you mean by new platforms?
> 
> Indeed, the IO_* macros given above are defined in
> arch/arm/mach-tangox/include/mach/io.h

Modern platforms no longer have an include directory and use the
standard definitions. How many other header files do you have in there?

Actually, we also try to get rid of the need for most other files
in mach-* as well, though commonly you need a little bit of code for
bringing up secondary CPU cores. Everything else should be a device
driver.

> But my 3.14 driver does see the header.

That means you don't use CONFIG_ARCH_MULTIPLATFORM. You should ;-)

> >> I'm asking because I have an idea in mind: on the bus, the first
> >> 16 MB contains only memory-mapped registers, so I've been thinking
> >> I can map this region at init, and keep it for the lifetime of the
> >> system. It would use only one entry in the TLB, since the CPU
> >> supports 16 MB super-sections (or whatever they are called).
> >>
> >> I could even lock that entry in the TLB so that these accesses
> >> are guaranteed to never TLB miss, right?
> >
> > The map_io callback will set up a mapping like that, and when
> > a driver calls ioremap on the same physical address, you will
> > get the correct pointer using that TLB, you just don't communicate
> > the address through a pointer any more.
> 
> IIUC, you're saying the current method using iotable_init is not
> appropriate, and I should use the map_io callback?

map_io and iotable_init is the same thing, the former calls the latter.
What I'm saying is that relying on hardcoded virtual addresses is
not appropriate, you can use the iotable_init call to set up a
hugetlb mapping for performance optimization, but it should really
work without that.

	Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 15:00         ` Arnd Bergmann
@ 2014-11-27 15:36           ` Måns Rullgård
  2014-11-27 15:55             ` Mason
  2014-11-27 15:46           ` Mason
  1 sibling, 1 reply; 17+ messages in thread
From: Måns Rullgård @ 2014-11-27 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

Arnd Bergmann <arnd@arndb.de> writes:

> On Thursday 27 November 2014 15:51:55 Mason wrote:
>> Arnd,
>> 
>> First of all, thanks (a lot) for your highly informative replies!
>> 
>> On 27/11/2014 14:12, Arnd Bergmann wrote:
>> 
>> > On Thursday 27 November 2014 14:01:41 Mason wrote:
>> >
>> >> #define gbus_read_reg32(r)      __raw_readl((volatile void __iomem *)IO_ADDRESS(r))
>> >> #define gbus_write_reg32(r, v)  __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r))
>> >
>> > Right, that's how things used to be done a while ago.
>> 
>> So, IIUC, old code used to call __raw_readl directly, but modern
>> code is supposed to call either readl or readl_relaxed?
>> 
>> (BTW, the original code is 4-5 years old, while my target is 3.14.x)
>
> I meant the IO_ADDRESS stuff. Modern code uses ioremap() instead
> since the IO_ADDRESS was platform specific, and drivers can no longer
> use platform headers on CONFIG_ARCH_MULTIPLATFORM, which is used
> for all new code now.
>
> The __raw_readl() was probably considered bad back then already, but
> a lot of people did it as we were lacking readl_relaxed().
>
> Would you consider submitting the code upstream?

This appears to be targeting a Sigma Designs chip, and nobody has to my
knowledge ever seen anything go upstream from them.

-- 
M?ns Rullg?rd
mans at mansr.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 15:36           ` Måns Rullgård
@ 2014-11-27 15:55             ` Mason
  2014-11-27 16:18               ` Måns Rullgård
  0 siblings, 1 reply; 17+ messages in thread
From: Mason @ 2014-11-27 15:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 27/11/2014 16:36, M?ns Rullg?rd wrote:

> Arnd Bergmann wrote:
>
>> Would you consider submitting the code upstream?
>
> This appears to be targeting a Sigma Designs chip, and nobody has
> to my knowledge ever seen anything go upstream from them.

Well, I do hope to push as much as I can into the vanilla kernel.
(That is what "to submit the code upstream" means, right?) I'm
just now starting the learning process though.

Regards.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 15:55             ` Mason
@ 2014-11-27 16:18               ` Måns Rullgård
  2014-11-27 16:51                 ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Måns Rullgård @ 2014-11-27 16:18 UTC (permalink / raw)
  To: linux-arm-kernel

Mason <mpeg.blue@free.fr> writes:

> On 27/11/2014 16:36, M?ns Rullg?rd wrote:
>
>> Arnd Bergmann wrote:
>>
>>> Would you consider submitting the code upstream?
>>
>> This appears to be targeting a Sigma Designs chip, and nobody has
>> to my knowledge ever seen anything go upstream from them.
>
> Well, I do hope to push as much as I can into the vanilla kernel.
> (That is what "to submit the code upstream" means, right?)

That's great.  You mentioned working against 3.14.  Do you have any
plans to move onto the current version?  That would be a prerequisite
for upstream acceptance.

-- 
M?ns Rullg?rd
mans at mansr.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 16:18               ` Måns Rullgård
@ 2014-11-27 16:51                 ` Arnd Bergmann
  2014-11-27 21:26                   ` Mason
  0 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2014-11-27 16:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 27 November 2014 16:18:20 M?ns Rullg?rd wrote:
> Mason <mpeg.blue@free.fr> writes:
> 
> > On 27/11/2014 16:36, M?ns Rullg?rd wrote:
> >
> >> Arnd Bergmann wrote:
> >>
> >>> Would you consider submitting the code upstream?
> >>
> >> This appears to be targeting a Sigma Designs chip, and nobody has
> >> to my knowledge ever seen anything go upstream from them.
> >
> > Well, I do hope to push as much as I can into the vanilla kernel.
> > (That is what "to submit the code upstream" means, right?)
> 
> That's great.  You mentioned working against 3.14.  Do you have any
> plans to move onto the current version?  That would be a prerequisite
> for upstream acceptance.

FWIW, it's also a good strategy to do the initial bringup on the latest
kernel and then backport to the older one for product work, otherwise
you end up doing the forward port twice, and it's almost always harder
than a backport.

	Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 16:51                 ` Arnd Bergmann
@ 2014-11-27 21:26                   ` Mason
  2014-11-27 21:49                     ` Arnd Bergmann
  2014-11-27 21:53                     ` Måns Rullgård
  0 siblings, 2 replies; 17+ messages in thread
From: Mason @ 2014-11-27 21:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 27/11/2014 17:51, Arnd Bergmann wrote:

> On Thursday 27 November 2014 16:18:20 M?ns Rullg?rd wrote:
>
>> That's great.  You mentioned working against 3.14.  Do you have any
>> plans to move onto the current version?  That would be a prerequisite
>> for upstream acceptance.
> 
> FWIW, it's also a good strategy to do the initial bringup on the latest
> kernel and then backport to the older one for product work, otherwise
> you end up doing the forward port twice, and it's almost always harder
> than a backport.

Hmmm, I was told to target 3.14, because that's what we plan to support.

Maybe I can push one driver at a time?
(I have no idea if this is realistic or not.)

Regards.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 21:26                   ` Mason
@ 2014-11-27 21:49                     ` Arnd Bergmann
  2014-11-27 21:53                     ` Måns Rullgård
  1 sibling, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2014-11-27 21:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 27 November 2014 22:26:48 Mason wrote:
> On 27/11/2014 17:51, Arnd Bergmann wrote:
> 
> > On Thursday 27 November 2014 16:18:20 M?ns Rullg?rd wrote:
> >
> >> That's great.  You mentioned working against 3.14.  Do you have any
> >> plans to move onto the current version?  That would be a prerequisite
> >> for upstream acceptance.
> > 
> > FWIW, it's also a good strategy to do the initial bringup on the latest
> > kernel and then backport to the older one for product work, otherwise
> > you end up doing the forward port twice, and it's almost always harder
> > than a backport.
> 
> Hmmm, I was told to target 3.14, because that's what we plan to support.
> 
> Maybe I can push one driver at a time?
> (I have no idea if this is realistic or not.)
> 

Doing one driver at a time is the normal way anyway, what I'm saying
is just that it's more work for you if you first port everything to 3.14.

	Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 21:26                   ` Mason
  2014-11-27 21:49                     ` Arnd Bergmann
@ 2014-11-27 21:53                     ` Måns Rullgård
  1 sibling, 0 replies; 17+ messages in thread
From: Måns Rullgård @ 2014-11-27 21:53 UTC (permalink / raw)
  To: linux-arm-kernel

Mason <mpeg.blue@free.fr> writes:

> On 27/11/2014 17:51, Arnd Bergmann wrote:
>
>> On Thursday 27 November 2014 16:18:20 M?ns Rullg?rd wrote:
>>
>>> That's great.  You mentioned working against 3.14.  Do you have any
>>> plans to move onto the current version?  That would be a prerequisite
>>> for upstream acceptance.
>> 
>> FWIW, it's also a good strategy to do the initial bringup on the latest
>> kernel and then backport to the older one for product work, otherwise
>> you end up doing the forward port twice, and it's almost always harder
>> than a backport.
>
> Hmmm, I was told to target 3.14, because that's what we plan to support.
>
> Maybe I can push one driver at a time?

Normally you would start with minimal chip support, just enough to
configure memory and a uart, then add drivers for various devices one by
one.

New ARM systems are expected to use DT, so you should also be switching
to that if you haven't already.

-- 
M?ns Rullg?rd
mans at mansr.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 15:00         ` Arnd Bergmann
  2014-11-27 15:36           ` Måns Rullgård
@ 2014-11-27 15:46           ` Mason
  2014-11-27 15:59             ` Arnd Bergmann
  1 sibling, 1 reply; 17+ messages in thread
From: Mason @ 2014-11-27 15:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 27/11/2014 16:00, Arnd Bergmann wrote:

> On Thursday 27 November 2014 15:51:55 Mason wrote:
>
>> (BTW, the original code is 4-5 years old, while my target is 3.14.x)
>
> I meant the IO_ADDRESS stuff. Modern code uses ioremap() instead
> since the IO_ADDRESS was platform specific, and drivers can no longer
> use platform headers on CONFIG_ARCH_MULTIPLATFORM, which is used
> for all new code now.

$ grep CONFIG_ARCH_MULTIPLATFORM .config
# CONFIG_ARCH_MULTIPLATFORM is not set

One more thing on the TODO list (which is starting to look like a
phone directory). Do you have a reference explaining this option?
(I did read the short description in arch/arm/Kconfig)

> Would you consider submitting the code upstream?

This is definitely one of my objectives. I'm still learning the ropes
(I rewrote the cpufreq driver to use mostly generic functions, heard
of the clk "meta-framework" but haven't even started looking at that,
I'm also writing a temperature sensor driver for lm-sensors.)

> Modern platforms no longer have an include directory and use the
> standard definitions. How many other header files do you have in there?

$ find arch/arm/mach-tangox -name *.h | wc -l
44

> Actually, we also try to get rid of the need for most other files
> in mach-* as well, though commonly you need a little bit of code for
> bringing up secondary CPU cores. Everything else should be a device
> driver.

I see that other platforms put their cpufreq driver in drivers/cpufreq
Ours is in arch/arm/mach-tangox (same for cpuidle.c)
(Hmmm, now that I look more closely, I see that we also have a generic
clock.c in there... More on the TODO list.)

>> IIUC, you're saying the current method using iotable_init is not
>> appropriate, and I should use the map_io callback?
>
> map_io and iotable_init is the same thing, the former calls the latter.
> What I'm saying is that relying on hardcoded virtual addresses is
> not appropriate, you can use the iotable_init call to set up a
> hugetlb mapping for performance optimization, but it should really
> work without that.

I think I don't fully understand this paragraph yet. But you've given
me a lot to think about today, so maybe It'll sink in later.

Regards.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Code generation involving __raw_readl and __raw_writel
  2014-11-27 15:46           ` Mason
@ 2014-11-27 15:59             ` Arnd Bergmann
  0 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2014-11-27 15:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 27 November 2014 16:46:30 Mason wrote:
> On 27/11/2014 16:00, Arnd Bergmann wrote:
> 
> > On Thursday 27 November 2014 15:51:55 Mason wrote:
> >
> >> (BTW, the original code is 4-5 years old, while my target is 3.14.x)
> >
> > I meant the IO_ADDRESS stuff. Modern code uses ioremap() instead
> > since the IO_ADDRESS was platform specific, and drivers can no longer
> > use platform headers on CONFIG_ARCH_MULTIPLATFORM, which is used
> > for all new code now.
> 
> $ grep CONFIG_ARCH_MULTIPLATFORM .config
> # CONFIG_ARCH_MULTIPLATFORM is not set
> 
> One more thing on the TODO list (which is starting to look like a
> phone directory). Do you have a reference explaining this option?
> (I did read the short description in arch/arm/Kconfig)

The brief version is that instead of selecting just one platform
at a time, you can now build a kernel with (almost) all ARMv6 and
ARMv7 platforms enabled at the same time, and new platforms are no
longer allowed to be separate options. When you look at any of the
modern (e.g. sunxi, rockchip, hisi, ...) platforms you find how
their Kconfig is structured. Some older platforms still have a lot
of other files in the mach-* directories for historic reasons, but
you wouldn't do that for a new platform.

> > Would you consider submitting the code upstream?
> 
> This is definitely one of my objectives. I'm still learning the ropes
> (I rewrote the cpufreq driver to use mostly generic functions, heard
> of the clk "meta-framework" but haven't even started looking at that,
> I'm also writing a temperature sensor driver for lm-sensors.)

Ok, cool. Can you upload what you have to a public git tree for
reference? 

> > Modern platforms no longer have an include directory and use the
> > standard definitions. How many other header files do you have in there?
> 
> $ find arch/arm/mach-tangox -name *.h | wc -l
> 44

Ouch. I'm guessing that a lot of these will be driver specific.
Those headers should just be merged into whatever driver uses the
registers and removed from the platform.

Then there is the case of platform_data definitions. For platforms
using traditional board files, these should get moved to
include/linux/platform_data/, but for an upstream port you will have
to use devicetree anyway, and then you won't need platform data.

> > Actually, we also try to get rid of the need for most other files
> > in mach-* as well, though commonly you need a little bit of code for
> > bringing up secondary CPU cores. Everything else should be a device
> > driver.
> 
> I see that other platforms put their cpufreq driver in drivers/cpufreq
> Ours is in arch/arm/mach-tangox (same for cpuidle.c)
> (Hmmm, now that I look more closely, I see that we also have a generic
> clock.c in there... More on the TODO list.)

Right. We moved almost everything into subsystems. Also clocksource,
irqchip, pinctrl, gpio, dma, led, pwm, and probably some more that
I forgot.

	Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-11-27 21:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-27 10:40 Code generation involving __raw_readl and __raw_writel Mason
2014-11-27 10:48 ` Russell King - ARM Linux
2014-11-27 11:09 ` Willy Tarreau
2014-11-27 11:23 ` Arnd Bergmann
2014-11-27 13:01   ` Mason
2014-11-27 13:12     ` Arnd Bergmann
2014-11-27 14:51       ` Mason
2014-11-27 15:00         ` Arnd Bergmann
2014-11-27 15:36           ` Måns Rullgård
2014-11-27 15:55             ` Mason
2014-11-27 16:18               ` Måns Rullgård
2014-11-27 16:51                 ` Arnd Bergmann
2014-11-27 21:26                   ` Mason
2014-11-27 21:49                     ` Arnd Bergmann
2014-11-27 21:53                     ` Måns Rullgård
2014-11-27 15:46           ` Mason
2014-11-27 15:59             ` Arnd Bergmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).