* [QUESTION,RFC] cacheable_memcpy() versus memcpy() ==> 8% improvement on FTP throughput
From: leroy christophe @ 2015-02-11 7:53 UTC
To: LinuxPPC-dev, Scott Wood, Benjamin Herrenschmidt,
Joakim Tjernlund
The powerpc32 architecture has a function called cacheable_memcpy(),
which does the same thing as memcpy() but uses the dcbz/dcbt
instructions for an optimised copy (just like __copy_tofrom_user()).
Strangely, it is used almost nowhere (only in
drivers/net/ethernet/ibm/emac/core.c).
As an experiment, I replaced every memcpy() in include/linux/skbuff.h
and net/core/skbuff.c with cacheable_memcpy() and got around 8% better
FTP throughput on an MPC885.
What could be done to generalise the use of cacheable_memcpy() in place
of memcpy() wherever possible?
To use cacheable_memcpy(), we need:
* The destination to be cacheable
* The source and destination not to overlap on the same cache lines
Could we check, when memcpy() is called, whether the destination is
cacheable and, if so, redirect the call to cacheable_memcpy()?
How can we check that?
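The two conditions listed above, and the redirect being asked about, can
be sketched in plain userspace C. This is only an illustration under
stated assumptions: the 16-byte line size and the names
no_shared_cacheline(), memcpy_dispatch(), copy_fast_stub() and
pretend_cacheable() are all hypothetical, the fast path is a stand-in
that simply calls memcpy() rather than the real dcbz/dcbt routine, and
the cacheability test is left as a caller-supplied hook because how to
implement it is exactly the open question. The line check is
deliberately conservative: dcbz zeroes a whole destination cache line,
so the sketch refuses the fast path whenever source and destination
touch a common line, even if the byte ranges themselves do not overlap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cache line size, for illustration only. */
#define LINE_SIZE 16u

/* Hypothetical hook answering "is this destination cacheable?". */
typedef bool (*cacheable_fn)(const void *dst);

/* Counts fast-path selections so the choice can be observed. */
static unsigned long fast_path_hits;

/* True if no cache line is touched by both [dst, dst+n) and [src, src+n). */
static bool no_shared_cacheline(const void *dst, const void *src, size_t n)
{
	uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;

	if (n == 0)
		return true;

	/* First and last line index covered by each range. */
	uintptr_t d_first = d / LINE_SIZE, d_last = (d + n - 1) / LINE_SIZE;
	uintptr_t s_first = s / LINE_SIZE, s_last = (s + n - 1) / LINE_SIZE;

	return d_last < s_first || s_last < d_first;
}

/* Stand-in for the optimised copy: plain memcpy(), not dcbz/dcbt. */
static void *copy_fast_stub(void *dst, const void *src, size_t n)
{
	fast_path_hits++;
	return memcpy(dst, src, n);
}

/* Example hook that pretends every destination is cacheable. */
static bool pretend_cacheable(const void *dst)
{
	(void)dst;
	return true;
}

/* Take the fast path only when both conditions hold, else fall back. */
static void *memcpy_dispatch(void *dst, const void *src, size_t n,
			     cacheable_fn dst_cacheable)
{
	if (dst_cacheable && dst_cacheable(dst) &&
	    no_shared_cacheline(dst, src, n))
		return copy_fast_stub(dst, src, n);

	return memcpy(dst, src, n);
}
```

For example, with a 16-byte-aligned buffer buf,
memcpy_dispatch(buf + 20, buf + 12, 8, pretend_cacheable) falls back to
memcpy() because both ranges touch the line holding bytes 16..31.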
Christophe
* Re: [QUESTION,RFC] cacheable_memcpy() versus memcpy() ==> 8% improvement on FTP throughput
From: Benjamin Herrenschmidt @ 2015-02-11 8:33 UTC
To: leroy christophe; +Cc: Scott Wood, Anton Blanchard, LinuxPPC-dev
On Wed, 2015-02-11 at 08:53 +0100, leroy christophe wrote:
> The powerpc32 architecture has a function called cacheable_memcpy(),
> which does the same thing as memcpy() but uses the dcbz/dcbt
> instructions for an optimised copy (just like __copy_tofrom_user()).
> Strangely, it is used almost nowhere (only in
> drivers/net/ethernet/ibm/emac/core.c).
>
> As an experiment, I replaced every memcpy() in include/linux/skbuff.h
> and net/core/skbuff.c with cacheable_memcpy() and got around 8% better
> FTP throughput on an MPC885.
>
> What could be done to generalise the use of cacheable_memcpy() in place
> of memcpy() wherever possible?
> To use cacheable_memcpy(), we need:
> * The destination to be cacheable
> * The source and destination not to overlap on the same cache lines
>
> Could we check, when memcpy() is called, whether the destination is
> cacheable and, if so, redirect the call to cacheable_memcpy()?
> How can we check that?
Additionally, we could have a P8 implementation that uses unaligned
vectors. Adding Anton to the CC list.
Cheers,
Ben.