Call for testing/opinions: Optimized memset/memcpy

From: gilbertd@treblig.org (Dr. David Alan Gilbert)
To: linux-arm-kernel@lists.infradead.org
Subject: Call for testing/opinions: Optimized memset/memcpy
Date: Sat, 13 Jul 2013 17:48:40 +0100
Message-ID: <20130713164840.GC28473@gallifrey>
In-Reply-To: <loom.20130713T172357-560@post.gmane.org>

* Harm Hanemaaijer (fgenfb at yahoo.com) wrote:
> Hello,
> 
> I've been doing some work on optimizing the memset/memcpy family of
> functions for modern ARM platforms, including copy_page, memset,
> memzero, memcpy, copy_from_user and copy_to_user. It appears that
> there is room for improvement, especially with regard to using an
> optimal preload strategy for armv6/v7 architectures as well as
> aligning the write target. For example, on an armv6-based platform
> (RPi) I am seeing an 80% speed-up in copy_page and large-sized
> memcpy. Gains in the range 10-25% are seen on a Cortex A8 device.
> These optimizations use the regular register file, like the
> previous implementation, and do not use any NEON or vfp registers.
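
For readers unfamiliar with the two ideas mentioned above (preloading
ahead of the read pointer and aligning the write target), here is a
minimal C sketch. It is not the code from the patch set, which is ARM
assembly. PREFETCH_DISTANCE, WRITE_ALIGN and copy_with_preload are
hypothetical names with arbitrary values, and __builtin_prefetch is a
GCC/Clang builtin (on ARM it is normally emitted as a pld instruction).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tuning knobs; good values are platform-specific. */
#define PREFETCH_DISTANCE 64   /* bytes ahead of the read pointer */
#define WRITE_ALIGN       32   /* destination alignment, power of two */

void *copy_with_preload(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: copy just enough bytes to align the destination. */
    size_t head = (size_t)(-(uintptr_t)d & (WRITE_ALIGN - 1));
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: one aligned chunk per iteration, hinting at data that
     * will be read PREFETCH_DISTANCE bytes from now. The hint never
     * faults, even if the address lies past the end of the source. */
    while (n >= WRITE_ALIGN) {
        __builtin_prefetch((const void *)((uintptr_t)s + PREFETCH_DISTANCE), 0, 0);
        memcpy(d, s, WRITE_ALIGN);
        d += WRITE_ALIGN;
        s += WRITE_ALIGN;
        n -= WRITE_ALIGN;
    }

    /* Tail: whatever is left (possibly zero bytes). */
    memcpy(d, s, n);
    return dst;
}
```

Changing the two macros and re-measuring is the kind of experiment the
quoted text describes; the values shown are placeholders, not
recommendations.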

You might like to compare with some of the routines at:
https://launchpad.net/cortex-strings
and some of the numbers at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/

(I'm sure Michael Hope, who owns that set of stuff, would be
interested in seeing your work as well.)

> To properly benchmark and test these new implementations, I've
> created a userspace testing utility that can be used to compare
> and validate exact copies of the original and optimized kernel
> versions of the functions in userspace. The repository is
> available at https://github.com/hglm/test-arm-kernel-memcpy.git.
> It would be useful to compare the results on different
> platforms and to check whether changes in the prefetch distance
> or write alignment result in optimized performance.

It's quite tricky to compare results across different machines, or
even on the same machine in different setups;

http://ssvb.github.io/2013/06/27/fullhd-x11-desktop-performance-of-the-allwinner-a10.html

is an interesting article on one machine being screwed over by
video bandwidth.
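
As an illustration of the kind of userspace comparison discussed above,
here is a minimal timing helper in C. It is a sketch only, not the code
from the test-arm-kernel-memcpy repository. copy_fn and
copy_throughput_mbs are hypothetical names; it measures CPU time with
the standard clock() function, and the empty asm statement is a
GCC/Clang-specific compiler barrier.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any routine with memcpy's signature can be plugged in here. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Returns the throughput of `fn` in MB/s (10^6 bytes per second),
 * or 0.0 if the run was too short for clock() to register. */
double copy_throughput_mbs(copy_fn fn, void *dst, const void *src,
                           size_t size, int iterations)
{
    clock_t start = clock();

    for (int i = 0; i < iterations; i++) {
        fn(dst, src, size);
        /* Compiler barrier so the copies are not optimized away. */
        __asm__ __volatile__("" : : "r"(dst) : "memory");
    }

    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return (double)size * (double)iterations / secs / 1e6;
}
```

Calling it once with memcpy and once with a candidate routine, over a
range of sizes and alignments, gives comparable figures for one machine
in one setup; as noted above, those figures do not necessarily carry
over to another machine or configuration.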

I've only had a brief scan through your code. One thing I remember
from a couple of years ago was a theory that ldrd/strd was supposed
to be faster on A15s (but I never had a chance to try it out).

<snip>

> So in short, I am looking for opinions, and test results especially
> from the userspace benchmark, to see the relative merit of these
> optimizations on different platforms.

Maybe NEON is worth a try these days (although be careful of platforms
like Tegra 2 that don't have it); there was a recent patch that enabled
its use in the kernel (I think for some RAID use). The downside is it's
supposed to be quite power hungry.
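
Since not every ARMv7 platform has NEON, a userspace benchmark would
want to check before running a NEON variant. Here is a minimal C sketch
for 32-bit ARM Linux that looks for "neon" in the "Features" line of
/proc/cpuinfo. line_has_flag and cpu_has_neon are hypothetical helper
names, and in-kernel code would use a different mechanism.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` occurs in `line` as a whole whitespace-delimited
 * word, so that "vfp" does not match "vfpv3". */
bool line_has_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    if (len == 0)
        return false;

    for (const char *p = line; (p = strstr(p, flag)) != NULL; p++) {
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[len] == '\0' || p[len] == ' ' ||
                    p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return true;
    }
    return false;
}

/* Scans /proc/cpuinfo for a "Features" line that lists "neon".
 * Returns false if the file cannot be read or no such line exists. */
bool cpu_has_neon(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f)
        return false;

    char line[1024];
    bool found = false;
    while (!found && fgets(line, sizeof line, f)) {
        if (strncmp(line, "Features", 8) == 0) {
            const char *colon = strchr(line, ':');
            found = colon && line_has_flag(colon + 1, "neon");
        }
    }
    fclose(f);
    return found;
}
```

A benchmark could call cpu_has_neon() once at start-up and skip the
NEON variants when it returns false.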

Dave
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\ gro.gilbert @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
