From: Simon Guo <wei.guo.simon@gmail.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org, Paul Mackerras <paulus@ozlabs.org>,
	"Naveen N.  Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	Cyril Bur <cyrilbur@gmail.com>
Subject: Re: [PATCH v6 2/4] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
Date: Wed, 30 May 2018 16:15:33 +0800
Message-ID: <20180530081533.GC5951@simonLocalRHEL7.x64>
In-Reply-To: <87fu2c3s9q.fsf@concordia.ellerman.id.au>

Hi Michael,
On Mon, May 28, 2018 at 09:59:29PM +1000, Michael Ellerman wrote:
> Hi Simon,
> 
> wei.guo.simon@gmail.com writes:
> > diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
> > index f20e883..4ba7bb6 100644
> > --- a/arch/powerpc/lib/memcmp_64.S
> > +++ b/arch/powerpc/lib/memcmp_64.S
> > @@ -174,6 +235,13 @@ _GLOBAL(memcmp)
> >  	blr
> >  
> >  .Llong:
> > +#ifdef CONFIG_ALTIVEC
> > +	/* Try to use vmx loop if length is equal or greater than 4K */
> > +	cmpldi  cr6,r5,VMX_THRESH
> > +	bge	cr6,.Lsameoffset_vmx_cmp
> > +
> 
> Here we decide to use vmx, but we don't do any CPU feature checks.
> 
> 
> > @@ -332,7 +400,94 @@ _GLOBAL(memcmp)
> >  8:
> >  	blr
> >  
> > +#ifdef CONFIG_ALTIVEC
> > +.Lsameoffset_vmx_cmp:
> > +	/* Enter with src/dst addrs has the same offset with 8 bytes
> > +	 * align boundary
> > +	 */
> > +	ENTER_VMX_OPS
> > +	beq     cr1,.Llong_novmx_cmp
> > +
> > +3:
> > +	/* need to check whether r4 has the same offset with r3
> > +	 * for 16 bytes boundary.
> > +	 */
> > +	xor	r0,r3,r4
> > +	andi.	r0,r0,0xf
> > +	bne	.Ldiffoffset_vmx_cmp_start
> > +
> > +	/* len is no less than 4KB. Need to align with 16 bytes further.
> > +	 */
> > +	andi.	rA,r3,8
> > +	LD	rA,0,r3
> > +	beq	4f
> > +	LD	rB,0,r4
> > +	cmpld	cr0,rA,rB
> > +	addi	r3,r3,8
> > +	addi	r4,r4,8
> > +	addi	r5,r5,-8
> > +
> > +	beq	cr0,4f
> > +	/* save and restore cr0 */
> > +	mfocrf  r5,64
> > +	EXIT_VMX_OPS
> > +	mtocrf	64,r5
> > +	b	.LcmpAB_lightweight
> > +
> > +4:
> > +	/* compare 32 bytes for each loop */
> > +	srdi	r0,r5,5
> > +	mtctr	r0
> > +	clrldi  r5,r5,59
> > +	li	off16,16
> > +
> > +.balign 16
> > +5:
> > +	lvx 	v0,0,r3
> > +	lvx 	v1,0,r4
> > +	vcmpequd. v0,v0,v1
> 
> vcmpequd is only available on Power8 and later CPUs.
> 
> Which means this will crash on Power7 or earlier.
> 
> Something like this should fix it I think.
> 
> diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
> index 96eb08b2be2e..0a11ff14dcd9 100644
> --- a/arch/powerpc/lib/memcmp_64.S
> +++ b/arch/powerpc/lib/memcmp_64.S
> @@ -236,9 +236,11 @@ _GLOBAL(memcmp)
>  
>  .Llong:
>  #ifdef CONFIG_ALTIVEC
> +BEGIN_FTR_SECTION
>  	/* Try to use vmx loop if length is equal or greater than 4K */
>  	cmpldi  cr6,r5,VMX_THRESH
>  	bge	cr6,.Lsameoffset_vmx_cmp
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  
>  .Llong_novmx_cmp:
>  #endif
Thanks for the good catch! I will update that.
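
To spell out the intended behaviour of the guarded branch, here is a minimal
C sketch of the decision. It is only a model: the real feature section is
patched at boot (the guarded instructions become nops when
CPU_FTR_ARCH_207S is not set, so execution falls through to
.Llong_novmx_cmp), not a runtime test. The names use_vmx_path,
cpu_has_arch_207s and VMX_THRESH_SKETCH are hypothetical, and the 4096 value
is taken from the "4K" in the quoted comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VMX_THRESH; the quoted comment says 4K. */
#define VMX_THRESH_SKETCH 4096

/*
 * Model of the guarded branch: take the VMX loop only when the CPU
 * has the ISA 2.07 feature AND the length reaches the threshold.
 * Without the feature, every length uses the non-VMX path.
 */
static bool use_vmx_path(size_t len, bool cpu_has_arch_207s)
{
	return cpu_has_arch_207s && len >= VMX_THRESH_SKETCH;
}
```

With this model, a Power7 (no ARCH_207S) never reaches the vcmpequd loop
regardless of length.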
> 
> 
> There's another problem which is that old toolchains don't know about
> vcmpequd. To fix that we'll need to add a macro that uses .long to
> construct the instruction.
Right. I will add the corresponding macros.
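
As a rough sketch of what such a macro has to compute, the C helper below
builds the 32-bit word that a .long directive would emit. It assumes the
VC-form layout (VRT at shift 21, VRA at shift 16, VRB at shift 11, Rc at
shift 10) and a base word of 0x100000c7 for vcmpequd; both should be checked
against the ISA 2.07 document before use. The SKETCH_* names and
encode_vcmpequd_rc are hypothetical, not existing kernel macros.

```c
#include <stdint.h>

/* Assumed base word: primary opcode 4, extended opcode 199. */
#define SKETCH_VCMPEQUD_BASE	0x100000c7u
/* Assumed position of the record (Rc) bit in VC-form. */
#define SKETCH_RC21		(1u << 10)

/*
 * Build the word for "vcmpequd. vrt,vra,vrb" (record form).
 * Each register number is masked to 5 bits before being placed.
 */
static uint32_t encode_vcmpequd_rc(unsigned int vrt, unsigned int vra,
				   unsigned int vrb)
{
	return SKETCH_VCMPEQUD_BASE |
	       ((vrt & 0x1fu) << 21) |
	       ((vra & 0x1fu) << 16) |
	       ((vrb & 0x1fu) << 11) |
	       SKETCH_RC21;
}
```

Under these assumptions, "vcmpequd. v0,v0,v1" from the loop above would be
emitted as .long 0x10000cc7.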

Thanks for your review.

BR,
 - Simon

Thread overview: 17+ messages
2018-05-25  4:07 [PATCH v6 0/4] powerpc/64: memcmp() optimization wei.guo.simon
2018-05-25  4:07 ` [PATCH v6 1/4] powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp() wei.guo.simon
2018-05-28 10:35   ` Segher Boessenkool
2018-05-30  8:11     ` Simon Guo
2018-05-30  8:27       ` Segher Boessenkool
2018-05-30  9:02         ` Simon Guo
2018-05-25  4:07 ` [PATCH v6 2/4] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision wei.guo.simon
2018-05-28 11:05   ` Segher Boessenkool
2018-05-30  8:14     ` Simon Guo
2018-05-30  8:35       ` Segher Boessenkool
2018-05-30  9:03         ` Simon Guo
2018-06-06  6:42           ` Simon Guo
2018-06-06 20:00             ` Segher Boessenkool
2018-05-28 11:59   ` Michael Ellerman
2018-05-30  8:15     ` Simon Guo [this message]
2018-05-25  4:07 ` [PATCH v6 3/4] powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp() wei.guo.simon
2018-05-25  4:07 ` [PATCH v6 4/4] powerpc:selftest update memcmp_64 selftest for VMX implementation wei.guo.simon
