From: Paul Mackerras <paulus@samba.org>
To: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@ozlabs.org, Gunnar von Boehn <VONBOEHN@de.ibm.com>,
	Michael Ellerman <ellerman@au1.ibm.com>,
	cbe-oss-dev@ozlabs.org
Subject: Re: [Cbe-oss-dev] [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell
Date: Sat, 21 Jun 2008 14:30:02 +1000	[thread overview]
Message-ID: <18524.33738.450400.63491@cargo.ozlabs.ibm.com> (raw)
In-Reply-To: <200806210400.20794.arnd@arndb.de>

Arnd Bergmann writes:

> On Friday 20 June 2008, Paul Mackerras wrote:
> 
> > Transferring data over loopback is possibly an exception to that.
> > However, it's very rare to transfer large amounts of data over
> > loopback, unless you're running a benchmark like iperf or netperf. :-/
> 
> Well, it is the exact case that came up in a real world scenario
> for cell: On a network intensive application where the SPUs are
> supposed to do all the work, we ended up not getting enough
> data in and out through gbit ethernet because the PPU spent
			  ^^^^^^^^^^^^^
Which isn't loopback... :)

I have no objection to improving copy_tofrom_user, memcpy and
copy_page.  I just want to make sure that we don't make things worse
on some platform.

In fact, Mark and I dug up some experiments I had done 5 or 6 years
ago and just ran through all the copy loops I tried back then, on
QS22, POWER6, POWER5+, POWER5, POWER4, 970, and POWER3, and compared
them to the current kernel routines and the proposed new Cell
routines.  So far we have just looked at the copy_page case (i.e. 4kB
on a 4kB alignment) for cache-cold and cache-hot cases.
Interestingly, some of the routines I discarded back then turn out to
do really well on most of the modern platforms, and quite a lot better
on Cell than Gunnar's code does (~10GB/s vs. ~5.5GB/s in the hot-cache
case, IIRC).  Mark is going to summarise the results and also measure
the speed for smaller copies and misaligned copies.
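
Just to be clear about what is being compared: a minimal sketch of the
kind of userspace harness involved (the buffer sizes, iteration counts
and the way the "cold" case evicts the cache are my assumptions here,
not a description of Mark's actual test setup):

/*
 * Rough harness for timing 4kB page-aligned copies, hot-cache vs.
 * cold-cache.  Swap memcpy() for the copy loop under test.
 * Illustrative only -- not the harness behind the numbers above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_SZ		4096
#define ITERS		100000
#define COLD_PAGES	16384	/* 64MB working set, assumed >> L2 */

static double now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
	char *src, *dst;
	double t0, t1;
	long i;

	/* 4kB-aligned buffers, matching the copy_page case */
	posix_memalign((void **)&src, PAGE_SZ, (size_t)PAGE_SZ * COLD_PAGES);
	posix_memalign((void **)&dst, PAGE_SZ, (size_t)PAGE_SZ * COLD_PAGES);
	memset(src, 1, (size_t)PAGE_SZ * COLD_PAGES);
	memset(dst, 2, (size_t)PAGE_SZ * COLD_PAGES);

	/* hot cache: copy the same page over and over */
	t0 = now();
	for (i = 0; i < ITERS; i++)
		memcpy(dst, src, PAGE_SZ);
	t1 = now();
	printf("hot:  %.2f GB/s\n",
	       (double)ITERS * PAGE_SZ / (t1 - t0) / 1e9);

	/* cold cache: walk a working set much larger than the caches */
	t0 = now();
	for (i = 0; i < ITERS; i++)
		memcpy(dst + (i % COLD_PAGES) * (long)PAGE_SZ,
		       src + (i % COLD_PAGES) * (long)PAGE_SZ, PAGE_SZ);
	t1 = now();
	printf("cold: %.2f GB/s\n",
	       (double)ITERS * PAGE_SZ / (t1 - t0) / 1e9);

	return 0;
}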

As for the distribution of sizes, I think it would be worthwhile to
run a fresh set of tests.  As I said, my previous results showed most
copies to be either small (<= 128B) or a multiple of 4k, and I think
that was true for copy_tofrom_user as well as memcpy, but that was a
while ago.
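
If anyone wants to regenerate those numbers, something as crude as a
log2 histogram bumped from a wrapper around __copy_tofrom_user would
do.  A sketch -- the hook and the histogram are hypothetical, not
existing kernel code, and racy increments don't matter for a rough
picture:

/* Bucket copy sizes by power of two; dump the array via xmon/debugfs. */
#include <linux/kernel.h>
#include <linux/log2.h>

#define NBUCKETS 24			/* 1B .. 8MB */

static unsigned long copy_size_hist[NBUCKETS];

static inline void account_copy_size(unsigned long n)
{
	unsigned int bucket = n ? ilog2(n) : 0;

	if (bucket >= NBUCKETS)
		bucket = NBUCKETS - 1;
	copy_size_hist[bucket]++;
}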

> much of its time in copy_to_user.
> 
> Going to 10gbit will make the problem even more apparent.

Is this application really transferring bulk data and using buffers
that aren't a multiple of the page size?  Do you know whether the
copies ended up being misaligned?

Of course, if we really want the fastest copy possible, the thing to
do is to use VMX loads and stores on 970, POWER6 and Cell.  The
overhead of setting up to use VMX in the kernel would probably kill
any advantage, though -- at least, that's what I found when I tried
using VMX for copy_page in the kernel on 970 a few years ago.
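
For reference, the VMX copy itself is conceptually trivial -- written
here with gcc's AltiVec vector types rather than the hand-coded asm
we'd really want, and leaving out the enabling of the vector unit and
the save/restore of the user's VMX state, which is precisely where the
kernel overhead comes from:

/*
 * Illustrative 4kB page copy using VMX (16-byte vector) loads and
 * stores.  Compile with -maltivec; not actual kernel code.
 */
#include <altivec.h>

static void copy_page_vmx(void *dst, const void *src)
{
	const vector unsigned char *s = src;
	vector unsigned char *d = dst;
	int i;

	/* 4096 bytes = 256 vectors of 16 bytes, unrolled by 4 */
	for (i = 0; i < 256; i += 4) {
		d[i + 0] = s[i + 0];
		d[i + 1] = s[i + 1];
		d[i + 2] = s[i + 2];
		d[i + 3] = s[i + 3];
	}
}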

> Doing some static compile-time analysis, I found that most
> of the call sites (which are not necessarily most of
> the run time calls) pass either a small constant size of
> less than a few cache lines, or have a variable size but are
> not at all performance critical.
> Since the prefetching and cache line size awareness was
> most of the improvement for cell (AFAIU), maybe we can
> annotate the few interesting cases, say by introducing a
> new copy_from_user_large() function that can be easily
> optimized for large transfers on a given CPU, while
> the remaining code keeps optimizing for small transfers
> and may even get rid of the full page copy optimization
> in order to save a branch.

Let's see what Mark comes up with.  We may be able to find a way to do
it that works well across all current CPUs and also is OK for small
copies.  If not we might need to do what you suggest.

Regards,
Paul.
