From: Simon Guo <wei.guo.simon@gmail.com>
To: Cyril Bur <cyrilbur@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org,
David Laight <David.Laight@ACULAB.COM>,
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Subject: Re: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
Date: Sun, 24 Sep 2017 05:18:43 +0800
Message-ID: <20170923211843.GA10899@simonLocalRHEL7.x64>
In-Reply-To: <1506089208.1155.32.camel@gmail.com>
Hi Cyril,
On Sat, Sep 23, 2017 at 12:06:48AM +1000, Cyril Bur wrote:
> On Thu, 2017-09-21 at 07:34 +0800, wei.guo.simon@gmail.com wrote:
> > From: Simon Guo <wei.guo.simon@gmail.com>
> >
> > This patch add VMX primitives to do memcmp() in case the compare size
> > exceeds 4K bytes.
> >
>
> Hi Simon,
>
> Sorry I didn't see this sooner, I've actually been working on a kernel
> version of glibc commit dec4a7105e (powerpc: Improve memcmp performance
> for POWER8) unfortunately I've been distracted and it still isn't done.
Thanks for syncing with me. Let's consolidate our efforts :)

I had a quick look at glibc commit dec4a7105e.
It looks like the aligned-case comparison with VSX is launched without
any limit on the rN size, which means it incurs a VSX register load
penalty even when the length is only 9 bytes.

It also does some optimization when the src/dest addresses don't have
the same offset relative to an 8-byte alignment boundary. I need to read
it more closely.
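
To be clear about what "same offset on an 8-byte boundary" means here, a
minimal C sketch follows. The helper name is hypothetical and this is not
taken from the glibc commit or from the patch; it only illustrates the
condition being discussed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when both pointers sit at the same
 * offset within an 8-byte word, i.e. their low three address bits match.
 * When this holds, aligning one pointer also aligns the other.
 */
static bool same_offset_8(const void *s1, const void *s2)
{
	return (((uintptr_t)s1 ^ (uintptr_t)s2) & 7) == 0;
}
```

When the check fails, one of the two buffers stays misaligned no matter how
many leading bytes are consumed, which is presumably the case the glibc code
treats specially.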
> I wonder if we can consolidate our efforts here. One thing I did come
> across in my testing is that for memcmp() that will fail early (I
> haven't narrowed down the optimal number yet) the cost of enabling
> VMX actually turns out to be a performance regression, as such I've
> added a small check of the first 64 bytes to the start before enabling
> VMX to ensure the penalty is worth taking.
Will there still be a penalty if the 65th byte differs?
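
To make the question concrete, here is a minimal C sketch of the pre-check
as I understand it from the description above. It is an assumption, not
Cyril's code: compare_wide() is a hypothetical stand-in for the VMX path,
and wide_calls only exists to show when that path is entered.

```c
#include <stddef.h>
#include <string.h>

#define PRECHECK_BYTES 64	/* size of the scalar pre-check, from the thread */

static int wide_calls;		/* counts entries into the stand-in wide path */

/* Hypothetical stand-in for the VMX compare; the real one would enable VMX. */
static int compare_wide(const unsigned char *a, const unsigned char *b, size_t n)
{
	wide_calls++;
	return memcmp(a, b, n);
}

/* Compare the first PRECHECK_BYTES with scalar code, then go wide. */
static int memcmp_precheck(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;
	size_t head = n < PRECHECK_BYTES ? n : PRECHECK_BYTES;
	int r = memcmp(a, b, head);

	if (r != 0 || head == n)
		return r;	/* early mismatch or short buffer: no wide path */
	return compare_wide(a + head, b + head, n - head);
}
```

In this sketch a difference within the first 64 bytes returns before the
wide path, while a difference at the 65th byte still enters it.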
>
> Also, you should consider doing 4K and greater, KSM (Kernel Samepage
> Merging) uses PAGE_SIZE which can be as small as 4K.
Currently VMX is only applied when the size exceeds 4K. Are you
suggesting a bigger threshold than 4K?
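
For reference, "exceeds 4K" and "4K and greater" only differ at exactly
4096 bytes, which is the size of a 4K-page compare such as the KSM case
mentioned above. A minimal sketch, with hypothetical helper names and a
threshold constant assumed from this thread:

```c
#include <stdbool.h>
#include <stddef.h>

#define VMX_THRESHOLD 4096	/* assumed from the 4K figure in this thread */

/* Current behaviour as described: VMX only when the size exceeds 4K. */
static bool use_vmx_exceeds(size_t n)
{
	return n > VMX_THRESHOLD;
}

/* Alternative reading of "4K and greater": include the 4096-byte case. */
static bool use_vmx_at_least(size_t n)
{
	return n >= VMX_THRESHOLD;
}
```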
We can sync more offline for v3.
Thanks,
- Simon
Thread overview: 18+ messages
2017-09-20 23:34 [PATCH v2 0/3] powerpc/64: memcmp() optimization wei.guo.simon
2017-09-20 23:34 ` [PATCH v2 1/3] powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp() wei.guo.simon
2017-09-20 23:34 ` [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision wei.guo.simon
2017-09-21 0:54 ` Simon Guo
2017-09-22 14:06 ` Cyril Bur
2017-09-23 21:18 ` Simon Guo [this message]
2017-09-25 23:59 ` Cyril Bur
2017-09-26 5:34 ` Michael Ellerman
2017-09-26 11:26 ` Segher Boessenkool
2017-09-27 3:38 ` Michael Ellerman
2017-09-27 9:27 ` Segher Boessenkool
2017-09-27 9:43 ` David Laight
2017-09-27 18:33 ` Simon Guo
2017-09-28 9:24 ` David Laight
2017-09-27 16:22 ` Simon Guo
2017-09-20 23:34 ` [PATCH v2 3/3] powerpc:selftest update memcmp_64 selftest for VMX implementation wei.guo.simon
2017-09-25 9:30 ` David Laight
2017-09-24 6:19 ` Simon Guo