arm64 memcpy_{from|to}io and memset_io

Linux-ARM-Kernel Archive on lore.kernel.org

* arm64 memcpy_{from|to}io and memset_io
@ 2015-10-14  6:12 Radha Mohan
  2015-10-14  8:17 ` Arnd Bergmann
  2015-10-14 16:12 ` Catalin Marinas
  0 siblings, 2 replies; 4+ messages in thread
From: Radha Mohan @ 2015-10-14  6:12 UTC
  To: linux-arm-kernel

Hi,
I see that memcpy_{from|to}io and memset_io are not implemented in an
optimized manner. I guess they are just a copy from
arch/arm/include/asm/io.h, where there could be problems with different
implementations.
Do we still need these to use byte writes?
Can we convert them to use a more optimized memcpy?

We have some drivers, such as a framebuffer driver, that use these
functions and end up writing byte-by-byte. This causes very poor VGA
performance.
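For reference, the byte-at-a-time pattern described here can be sketched in userspace C as follows. This is not the kernel source: `sketch_writeb()` is a hypothetical stand-in for writeb(), modelled as one volatile byte store, and the "device" is an ordinary buffer. The counter makes the cost visible: copying N bytes issues N separate bus accesses.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of simulated MMIO transactions issued so far. */
static unsigned long io_accesses;

/* Hypothetical stand-in for writeb(): a single volatile 8-bit store. */
static void sketch_writeb(uint8_t val, volatile uint8_t *addr)
{
	*addr = val;
	io_accesses++;
}

/* Byte-by-byte copy to "I/O" memory: one access per byte copied. */
static void bytewise_memcpy_toio(volatile void *to, const void *from,
				 size_t count)
{
	volatile uint8_t *dst = to;
	const uint8_t *src = from;

	while (count) {
		sketch_writeb(*src, dst);
		src++;
		dst++;
		count--;
	}
}
```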

Let me know if there are any concerns about converting these to use memcpy.
I can send a patch.

regards,
Radha Mohan


* arm64 memcpy_{from|to}io and memset_io
  2015-10-14  6:12 arm64 memcpy_{from|to}io and memset_io Radha Mohan
@ 2015-10-14  8:17 ` Arnd Bergmann
  2015-10-14 16:12 ` Catalin Marinas
  1 sibling, 0 replies; 4+ messages in thread
From: Arnd Bergmann @ 2015-10-14  8:17 UTC
  To: linux-arm-kernel

On Tuesday 13 October 2015 23:12:18 Radha Mohan wrote:
> Hi,
> I see that memcpy_{from|to}io and memset_io are not implemented in an
> optimized manner. I guess they are just a copy from
> arch/arm/include/asm/io.h, where there could be problems with different
> implementations.
> Do we still need these to use byte writes?

No.

> Can we convert them to use a more optimized memcpy?

Yes.

> We have some drivers, such as a framebuffer driver, that use these
> functions and end up writing byte-by-byte. This causes very poor VGA
> performance.
> 
> Let me know if there are any concerns about converting these to use memcpy.
> I can send a patch.

A few things to watch out for:

- you cannot use a static inline to do the job, because gcc might
  replace a plain memcpy() with unaligned pointer dereferences
  that are not allowed on __iomem

- when providing an external implementation of the functions, make sure
  they honor the alignment as well

- I think you need the same barriers that readl/writel have, but only
  at the start/end of the loop, not in the middle.
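A minimal userspace sketch of the three points above, applied to memcpy_toio(). Assumptions: `sketch_memcpy_toio()` is a hypothetical name, the "device" is a plain buffer, C11 `atomic_thread_fence()` stands in for the barrier writel() issues before its store, and `noinline` is a GCC/Clang extension used to keep the function out of line. Every device-side access goes through a volatile pointer at its natural alignment, so the compiler cannot merge accesses into wider unaligned ones, and the barrier is issued once before the loop rather than per access.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of simulated MMIO transactions issued so far. */
static unsigned long io_accesses;

/* Stand-in for the barrier writel() places before its store. */
static void sketch_io_wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Kept out of line; device accesses are explicit and naturally aligned. */
static __attribute__((noinline)) void
sketch_memcpy_toio(volatile void *to, const void *from, size_t count)
{
	volatile uint8_t *dst = to;
	const uint8_t *src = from;

	sketch_io_wmb();	/* once at the start, not inside the loop */

	/* Byte stores until the device address is 8-byte aligned. */
	while (count && ((uintptr_t)dst & 7)) {
		*dst++ = *src++;
		io_accesses++;
		count--;
	}

	/* Aligned 64-bit stores; the RAM source may be unaligned. */
	while (count >= 8) {
		uint64_t v;

		memcpy(&v, src, sizeof(v));
		*(volatile uint64_t *)dst = v;
		io_accesses++;
		dst += 8;
		src += 8;
		count -= 8;
	}

	/* Trailing bytes. */
	while (count) {
		*dst++ = *src++;
		io_accesses++;
		count--;
	}
}
```

The same structure applies to memcpy_fromio(), except that the barrier moves to the end of the loop, mirroring readl().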

	Arnd


* arm64 memcpy_{from|to}io and memset_io
  2015-10-14  6:12 arm64 memcpy_{from|to}io and memset_io Radha Mohan
  2015-10-14  8:17 ` Arnd Bergmann
@ 2015-10-14 16:12 ` Catalin Marinas
  2015-10-14 16:16   ` Radha Mohan
  1 sibling, 1 reply; 4+ messages in thread
From: Catalin Marinas @ 2015-10-14 16:12 UTC
  To: linux-arm-kernel

On Tue, Oct 13, 2015 at 11:12:18PM -0700, Radha Mohan wrote:
> I see that memcpy_{from|to}io and memset_io are not implemented in an
> optimized manner. I guess they are just a copy from
> arch/arm/include/asm/io.h, where there could be problems with different
> implementations.

I think you may be looking at an older kernel version. In the latest
mainline, memcpy_*io functions are more optimised in the sense that they
use 64-bit accesses if the alignment permits.
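The "64-bit accesses if the alignment permits" approach can be sketched for the read direction as below. This is a userspace approximation, not the mainline code: `sketch_memcpy_fromio()` is a hypothetical name, the device is a plain buffer, and a C11 fence stands in for the barrier readl() issues after its load, placed once after the loop. Only the device-side address has to be aligned; the RAM destination is written with memcpy(), so it may be unaligned.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of simulated MMIO transactions issued so far. */
static unsigned long io_accesses;

/* Stand-in for the barrier readl() places after its load. */
static void sketch_io_rmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

static void sketch_memcpy_fromio(void *to, const volatile void *from,
				 size_t count)
{
	const volatile uint8_t *src = from;
	uint8_t *dst = to;

	/* Byte loads until the device address is 8-byte aligned. */
	while (count && ((uintptr_t)src & 7)) {
		*dst++ = *src++;
		io_accesses++;
		count--;
	}

	/* One 64-bit load per 8 bytes while the alignment permits. */
	while (count >= 8) {
		uint64_t v = *(const volatile uint64_t *)src;

		memcpy(dst, &v, sizeof(v));
		io_accesses++;
		src += 8;
		dst += 8;
		count -= 8;
	}

	/* Trailing bytes. */
	while (count) {
		*dst++ = *src++;
		io_accesses++;
		count--;
	}

	sketch_io_rmb();	/* once at the end, not inside the loop */
}
```

Reading 64 bytes from an address 3 bytes past an 8-byte boundary takes 15 device accesses this way (5 head bytes, 7 qwords, 3 tail bytes) instead of 64.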

> Do we still need these to use byte writes?

No, but see above.

> Can we convert them to use a more optimized memcpy?

There is a risk in converting them to something like memcpy(), as the
latter does not guarantee aligned accesses. Alignment is mandatory for
Device memory access.
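Because Device memory needs aligned accesses, memset_io() can use the same head/body/tail split, replicating the fill byte across a 64-bit word for the aligned middle. This is again a hypothetical userspace sketch (`sketch_memset_io()` is not a kernel function), with a C11 fence standing in for the I/O write barrier and a plain buffer standing in for the device.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Number of simulated MMIO transactions issued so far. */
static unsigned long io_accesses;

static void sketch_memset_io(volatile void *dst, int c, size_t count)
{
	volatile uint8_t *p = dst;
	uint8_t byte = (uint8_t)c;
	uint64_t pattern = byte;

	/* Replicate the byte into all eight lanes of a 64-bit word. */
	pattern |= pattern << 8;
	pattern |= pattern << 16;
	pattern |= pattern << 32;

	atomic_thread_fence(memory_order_seq_cst);	/* stand-in write barrier */

	/* Byte stores until the device address is 8-byte aligned. */
	while (count && ((uintptr_t)p & 7)) {
		*p++ = byte;
		io_accesses++;
		count--;
	}

	/* Naturally aligned 64-bit stores of the replicated pattern. */
	while (count >= 8) {
		*(volatile uint64_t *)p = pattern;
		io_accesses++;
		p += 8;
		count -= 8;
	}

	/* Trailing bytes. */
	while (count) {
		*p++ = byte;
		io_accesses++;
		count--;
	}
}
```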

> We have some drivers, such as a framebuffer driver, that use these
> functions and end up writing byte-by-byte. This causes very poor VGA
> performance.

You probably have an old kernel version.

-- 
Catalin


* arm64 memcpy_{from|to}io and memset_io
  2015-10-14 16:12 ` Catalin Marinas
@ 2015-10-14 16:16   ` Radha Mohan
  0 siblings, 0 replies; 4+ messages in thread
From: Radha Mohan @ 2015-10-14 16:16 UTC
  To: linux-arm-kernel

On Wed, Oct 14, 2015 at 9:12 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Tue, Oct 13, 2015 at 11:12:18PM -0700, Radha Mohan wrote:
>> I see that memcpy_{from|to}io and memset_io are not implemented in an
>> optimized manner. I guess they are just a copy from
>> arch/arm/include/asm/io.h, where there could be problems with different
>> implementations.
>
> I think you may be looking at an older kernel version. In the latest
> mainline, memcpy_*io functions are more optimised in the sense that they
> use 64-bit accesses if the alignment permits.
>
>> Do we still need these to use byte writes?
>
> No, but see above.
>
>> Can we convert them to use a more optimized memcpy?
>
> There is a risk in converting them to something like memcpy(), as the
> latter does not guarantee aligned accesses. Alignment is mandatory for
> Device memory access.
>
>> We have some drivers, such as a framebuffer driver, that use these
>> functions and end up writing byte-by-byte. This causes very poor VGA
>> performance.
>
> You probably have an old kernel version.

Yes, I was alternating between old and new kernels. The newer
implementation is much better. I will try it. Thanks.

>
> --
> Catalin

