From: gilbertd@treblig.org (Dr. David Alan Gilbert)
To: linux-arm-kernel@lists.infradead.org
Subject: Call for testing/opinions: Optimized memset/memcpy
Date: Sat, 13 Jul 2013 17:48:40 +0100
Message-ID: <20130713164840.GC28473@gallifrey>
In-Reply-To: <loom.20130713T172357-560@post.gmane.org>

* Harm Hanemaaijer (fgenfb at yahoo.com) wrote:
> Hello,
> 
> I've been doing some work on optimizing the memset/memcpy family of
> functions for modern ARM platforms, including copy_page, memset,
> memzero, memcpy, copy_from_user and copy_to_user. It appears that
> there is room for improvement, especially with regard to using an
> optimal preload strategy for armv6/v7 architectures as well as
> aligning the write target. For example, on an armv6-based platform
> (RPi) I am seeing an 80% speed-up in copy_page and large-sized
> memcpy. Gains in the range 10-25% are seen on a Cortex A8 device.
> These optimizations use the regular register file, like the
> previous implementation, and do not use any NEON or vfp registers.
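
To make the "aligning the write target" idea concrete, here is a rough
userspace sketch in plain C. It is not taken from the patch set: the
function name, WRITE_ALIGN and PREFETCH_DIST are hypothetical values
chosen for illustration, and __builtin_prefetch is a GCC/Clang builtin
standing in for a hand-placed pld.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WRITE_ALIGN   32u   /* assumed alignment target, illustration only */
#define PREFETCH_DIST 128u  /* assumed prefetch distance in bytes, tune per core */

/* Hypothetical helper: copy n bytes, aligning the destination first. */
static void *copy_write_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Bytes needed to reach the next WRITE_ALIGN boundary of the destination. */
    size_t head = (size_t)(-(uintptr_t)d & (WRITE_ALIGN - 1));
    if (head > n)
        head = n;

    /* Head: byte copy until the write pointer is aligned. */
    for (size_t i = 0; i < head; i++)
        d[i] = s[i];
    d += head;
    s += head;
    n -= head;

    /* Body: whole chunks, each written to an aligned destination. */
    while (n >= WRITE_ALIGN) {
        /* Hint only; the address is built as an integer and never dereferenced. */
        __builtin_prefetch((const void *)((uintptr_t)s + PREFETCH_DIST));
        memcpy(d, s, WRITE_ALIGN);
        d += WRITE_ALIGN;
        s += WRITE_ALIGN;
        n -= WRITE_ALIGN;
    }

    /* Tail: whatever is left after the last full chunk. */
    memcpy(d, s, n);
    return dst;
}
```

Only the destination is aligned here; the source may stay misaligned,
which is the case where real implementations need the most care.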

You might like to compare with some of the routines at:
https://launchpad.net/cortex-strings
and some of the numbers at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/

(I'm sure Michael Hope who owns that set of stuff would be
interested in seeing your stuff as well).

> To properly benchmark and test these new implementations, I've
> created a userspace testing utility that can be used to compare
> and validate exact copies of the original and optimized kernel
> versions of the functions in userspace. The repository is
> available at https://github.com/hglm/test-arm-kernel-memcpy.git.
> It would be useful to compare the results on different
> platforms and to check whether changes in the prefetch distance
> or write alignment result in optimized performance.

It's quite tricky to compare results across different machines, or
even on the same machine in different setups;

http://ssvb.github.io/2013/06/27/fullhd-x11-desktop-performance-of-the-allwinner-a10.html

is an interesting article on one machine being screwed over by
video bandwidth.
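
For anyone wanting a quick sanity check on their own board, something
like the following is a minimal sketch of a userspace bandwidth
measurement. It is not the benchmark from the repository above;
bench_memcpy_mbps and its parameters are made up for illustration. It
times several batches and keeps the best one, which helps a little with
noise from other bus traffic but cannot remove it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict modes */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: best observed memcpy bandwidth in MB/s for
 * size-byte copies, over `runs` timed batches of `iters` copies each.
 * Returns a negative value if allocation fails or nothing was timed. */
static double bench_memcpy_mbps(size_t size, int iters, int runs)
{
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    double best = -1.0;

    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    /* Touch both buffers so page faults are not part of the timing. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    for (int r = 0; r < runs; r++) {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++) {
            memcpy(dst, src, size);
            /* Compiler barrier so the copies are not optimized away. */
            __asm__ volatile("" : : "r"(dst) : "memory");
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (double)(t1.tv_sec - t0.tv_sec)
                    + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (secs > 0.0) {
            double mbps = (double)size * (double)iters / secs / 1e6;
            if (mbps > best)
                best = mbps;
        }
    }

    free(src);
    free(dst);
    return best;
}
```

Calling it with a buffer much larger than the last-level cache (say
bench_memcpy_mbps(8u << 20, 20, 5)) measures memory bandwidth rather
than cache bandwidth; running it once with the display idle and once
at full resolution shows how much the setup matters.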

I've only had a brief scan through your code; one thing I remember
from a couple of years ago was a theory that ldrd/strd was supposed
to be faster on A15s (but I never had a chance to try it out).

<snip>

> So in short, I am looking for opinions, and test results especially
> from the userspace benchmark, to see the relative merit of these
> optimizations on different platforms.

Maybe NEON is worth a try these days (although be careful of platforms
like Tegra 2 that don't have it); there was a recent patch that enabled
its use in the kernel (I think for some RAID use). The downside is that
it's supposed to be quite power hungry.
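
For the userspace benchmark, a runtime check along these lines would
keep a NEON variant from being run on cores without it. This is a
sketch for 32-bit ARM Linux userspace only, assuming glibc 2.16 or
later for getauxval(); cpu_has_neon is a hypothetical name, and the
in-kernel routines would need a different mechanism.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON on 32-bit ARM */
#endif

/* Hypothetical helper: true if the kernel reports NEON in AT_HWCAP.
 * On anything other than 32-bit ARM Linux this sketch simply returns
 * false, which is conservative rather than accurate. */
static bool cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__) && defined(HWCAP_NEON)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return false;
#endif
}
```

The benchmark could then select the NEON copy routine only when
cpu_has_neon() returns true and fall back to the integer-register
version otherwise.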

Dave
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\ gro.gilbert @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
