From: Scott Wood <scottwood@freescale.com>
To: christophe leroy <christophe.leroy@c-s.fr>
Cc: Kyle Moffett <kyle@moffetthome.net>,
	linux-kernel@vger.kernel.org, Paul Mackerras <paulus@samba.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 3/4] powerpc32: memset(0): use cacheable_memzero
Date: Thu, 14 May 2015 15:18:00 -0500
Message-ID: <1431634680.3868.200.camel@freescale.com>
In-Reply-To: <555461BF.5020105@c-s.fr>

On Thu, 2015-05-14 at 10:50 +0200, christophe leroy wrote:
> 
> On 14/05/2015 02:55, Scott Wood wrote:
> > On Tue, 2015-05-12 at 15:32 +0200, Christophe Leroy wrote:
> >> cacheable_memzero uses the dcbz instruction and is more efficient than
> >> memset(0) when the destination is in RAM.
> >>
> >> This patch renames memset as generic_memset, and defines memset
> >> as a prolog to cacheable_memzero. This prolog checks if the byte
> >> to set is 0 and if the buffer is in RAM. If not, it falls back to
> >> generic_memset()
> >>
> >> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >> ---
> >>   arch/powerpc/lib/copy_32.S | 15 ++++++++++++++-
> >>   1 file changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
> >> index cbca76c..d8a9a86 100644
> >> --- a/arch/powerpc/lib/copy_32.S
> >> +++ b/arch/powerpc/lib/copy_32.S
> >> @@ -12,6 +12,7 @@
> >>   #include <asm/cache.h>
> >>   #include <asm/errno.h>
> >>   #include <asm/ppc_asm.h>
> >> +#include <asm/page.h>
> >>   
> >>   #define COPY_16_BYTES		\
> >>   	lwz	r7,4(r4);	\
> >> @@ -74,6 +75,18 @@ CACHELINE_MASK = (L1_CACHE_BYTES-1)
> >>    * to set them to zero.  This requires that the destination
> >>    * area is cacheable.  -- paulus
> >>    */
> >> +_GLOBAL(memset)
> >> +	cmplwi	r4,0
> >> +	bne-	generic_memset
> >> +	cmplwi	r5,L1_CACHE_BYTES
> >> +	blt-	generic_memset
> >> +	lis	r8,max_pfn@ha
> >> +	lwz	r8,max_pfn@l(r8)
> >> +	tophys	(r9,r3)
> >> +	srwi	r9,r9,PAGE_SHIFT
> >> +	cmplw	r9,r8
> >> +	bge-	generic_memset
> >> +	mr	r4,r5
> > max_pfn includes highmem, and tophys only works on normal kernel
> > addresses.
> Is there any other simple way to determine whether an address is in RAM
> or not?

If you want to do it based on the virtual address, rather than doing a
tablewalk or TLB search, you need to limit it to lowmem.
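
For illustration only (this is not code from the patch): a minimal
user-space C model of the two checks. The memory layout, the sample
addresses and the helper names naive_is_ram() and lowmem_is_ram() are all
assumptions made up for the sketch. It models tophys() as a plain
subtraction of PAGE_OFFSET, which is only meaningful inside the linear
mapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout only; none of these values come from the thread. */
#define PAGE_SHIFT   12
#define PAGE_OFFSET  0xC0000000u	/* start of the linear mapping */
#define LOWMEM_BYTES 0x30000000u	/* 768 MiB of lowmem */
#define TOTAL_RAM    0x80000000u	/* 2 GiB of RAM, the rest is highmem */

/* max_pfn counts every RAM page, highmem included. */
static const uint32_t max_pfn = TOTAL_RAM >> PAGE_SHIFT;
/* First virtual address past the linear mapping of lowmem. */
static const uint32_t high_memory = PAGE_OFFSET + LOWMEM_BYTES;

/* Check in the style of the patch: subtract PAGE_OFFSET, compare to max_pfn. */
static bool naive_is_ram(uint32_t vaddr)
{
	uint32_t pfn = (vaddr - PAGE_OFFSET) >> PAGE_SHIFT;

	return pfn < max_pfn;
}

/* Check limited to lowmem: the address must lie inside the linear mapping. */
static bool lowmem_is_ram(uint32_t vaddr)
{
	return vaddr >= PAGE_OFFSET && vaddr < high_memory;
}
```

With this layout, a hypothetical ioremap address such as 0xF1000000 lies
above the linear mapping, yet the subtraction yields pfn 0x31000, which is
below max_pfn (0x80000), so naive_is_ram() accepts it. lowmem_is_ram()
rejects it, at the cost of also rejecting real highmem pages.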

> I did that because of the function below, from mm/mem.c
> 
> int page_is_ram(unsigned long pfn)
> {
> #ifndef CONFIG_PPC64	/* XXX for now */
> 	return pfn < max_pfn;
> #else
> 	unsigned long paddr = (pfn << PAGE_SHIFT);
> 	struct memblock_region *reg;
> 
> 	for_each_memblock(memory, reg)
> 		if (paddr >= reg->base && paddr < (reg->base + reg->size))
> 			return 1;
> 	return 0;
> #endif
> }

Right, the problem is figuring out the pfn in the first place.

> > If we were to point memset_io, memcpy_toio, etc. at noncacheable
> > versions, are there any other callers left that can reasonably point at
> > uncacheable memory?
> Do you mean we could just assume that memcpy() and memset() are only
> called with a destination in RAM, and thus avoid the check?

Maybe.  If that's not a safe assumption I hope someone will point it
out.
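
As a rough sketch of what a noncacheable variant could look like (an
assumption for illustration, not the kernel's memset_io implementation):
the hypothetical helper sketch_memset_io() below fills the buffer with
plain byte stores through a volatile pointer, so it never relies on
cache-line zeroing such as dcbz.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: set n bytes at dst to c using ordinary byte stores.
 * The volatile qualifier keeps the compiler from replacing the loop with
 * a call to the regular (possibly dcbz-based) memset().
 */
static void sketch_memset_io(volatile void *dst, int c, size_t n)
{
	volatile uint8_t *p = dst;

	while (n--)
		*p++ = (uint8_t)c;
}
```

A real version would store wider words where alignment allows; the byte
loop only shows the point that I/O callers would get their own routine,
leaving memset() free to assume a cacheable destination.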

> copy_tofrom_user() already makes this assumption (although a user app
> could possibly provide a buffer located in an ALSA mapped IO area)

The user could also pass in NULL.  That's what the fixups are for. :-)

-Scott
