Subject: Re: [QUESTION,RFC] cacheable_memcpy() versus memcpy() ==> 8% improvement on FTP throughput
From: Benjamin Herrenschmidt
To: leroy christophe
Cc: Scott Wood, Anton Blanchard, LinuxPPC-dev
Date: Wed, 11 Feb 2015 19:33:16 +1100
Message-ID: <1423643596.5965.12.camel@kernel.crashing.org>
In-Reply-To: <54DB0A6E.7020105@c-s.fr>
List-Id: Linux on PowerPC Developers Mail List

On Wed, 2015-02-11 at 08:53 +0100, leroy christophe wrote:
> In the powerpc32 architecture there is a function called cacheable_memcpy()
> which does the same thing as memcpy() but uses dcbz/dcbt instructions for
> an optimised copy (just like __copy_tofrom_user()).
> What seems strange is that it is almost never used (its only user is
> drivers/net/ethernet/ibm/emac/core.c).
>
> As a test, I replaced all memcpy() calls in include/linux/skbuff.h and
> net/core/skbuff.c with cacheable_memcpy() and got around an 8% improvement
> in FTP throughput on an MPC885.
>
> What could be done to generalise the use of cacheable_memcpy() instead
> of memcpy() whenever possible?
> Indeed, in order to use cacheable_memcpy(), we need:
> * the destination to be cacheable
> * the source and destination not to overlap on the same cache lines
>
> Could we check, when memcpy() is called, whether the destination is
> cacheable, and if so redirect the call to cacheable_memcpy()?
> How can we check that?

Additionally, we could have a POWER8 (P8) implementation that uses
unaligned vectors.

Adding Anton to the CC list.

Cheers,
Ben.
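To make the two preconditions from the quoted message concrete, here is a minimal portable C sketch. It is not the kernel's cacheable_memcpy(). Assumptions: a 16-byte cache line (HYPOTHETICAL_LINE_BYTES), memset() standing in for dcbz, and a caller-supplied dst_cacheable flag, because the thread leaves open how cacheability could actually be detected. The names share_cache_line(), line_zeroing_copy() and hypothetical_memcpy_dispatch() are illustrative only and are not existing kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant: 16-byte lines are assumed here; other PowerPC
 * cores use different line sizes, so adjust for the target. */
#define HYPOTHETICAL_LINE_BYTES 16u

/* Precondition 2 from the thread: return nonzero if any cache line
 * touched by [dst, dst + n) is also touched by [src, src + n). */
static int share_cache_line(const void *dst, const void *src, size_t n,
                            size_t line)
{
    if (n == 0)
        return 0;
    uintptr_t mask = ~(uintptr_t)(line - 1);
    uintptr_t d_first = (uintptr_t)dst & mask;
    uintptr_t d_last = ((uintptr_t)dst + n - 1) & mask;
    uintptr_t s_first = (uintptr_t)src & mask;
    uintptr_t s_last = ((uintptr_t)src + n - 1) & mask;
    /* Two ranges of line addresses intersect iff each one starts
     * no later than the other one ends. */
    return d_first <= s_last && s_first <= d_last;
}

/* Portable model of a dcbz-style copy. On real hardware, dcbz establishes
 * a destination line in the cache as zeros without fetching it from
 * memory; memset() stands in for that here, and the dcbt source prefetch
 * is not modelled. If the source lived on that same line, the zeroing
 * would wipe source bytes before they were read, which is why the
 * caller must rule that case out first. */
static void line_zeroing_copy(void *dst, const void *src, size_t n,
                              size_t line)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(line != 0 && (line & (line - 1)) == 0); /* power of two */

    /* Head: bytes before the first destination line boundary. */
    size_t head = (line - ((uintptr_t)d & (line - 1))) & (line - 1);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: each whole destination line is "zero-allocated", then filled. */
    while (n >= line) {
        memset(d, 0, line); /* stands in for dcbz */
        memcpy(d, s, line);
        d += line;
        s += line;
        n -= line;
    }

    /* Tail: the remaining partial line is copied normally. */
    memcpy(d, s, n);
}

/* Hypothetical dispatcher. There is no portable way to ask whether dst
 * is cacheable, so the caller has to supply that (precondition 1). */
static void *hypothetical_memcpy_dispatch(void *dst, const void *src,
                                          size_t n, int dst_cacheable)
{
    if (dst_cacheable &&
        !share_cache_line(dst, src, n, HYPOTHETICAL_LINE_BYTES))
        line_zeroing_copy(dst, src, n, HYPOTHETICAL_LINE_BYTES);
    else
        memcpy(dst, src, n);
    return dst;
}
```

Falling back to plain memcpy() whenever either condition is in doubt keeps the dispatcher safe: only the fast path relies on the preconditions, so the open question of how to detect a cacheable destination can be answered conservatively.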