From: Mark Nelson <markn@au1.ibm.com>
To: linuxppc-dev@ozlabs.org, cbe-oss-dev@ozlabs.org
Subject: [RFC 0/2] powerpc: copy_4K_page tweaked for Cell
Date: Thu, 14 Aug 2008 16:17:32 +1000
Message-ID: <200808141617.32033.markn@au1.ibm.com>

Hi All,

What follows is an updated version of copy_4K_page that has been tuned
for the Cell processor. With this new routine, the system time measured
when building a 2.6.26 kernel with pseries_defconfig dropped by ~10s
(two runs of each kernel):

mainline (2.6.27-rc1-00632-g2e1e921):

real    17m8.727s
user    59m48.693s
sys     3m56.089s

real    17m9.350s
user    59m44.822s
sys     3m56.666s

new routine:

real    17m7.311s
user    59m51.339s
sys     3m47.043s

real    17m7.863s
user    59m49.028s
sys     3m46.608s

This same routine also improved performance on 970 CPUs, though by a
much smaller amount (~2s of system time):

mainline (2.6.27-rc1-00632-g2e1e921):

real    16m8.545s
user    14m38.134s
sys     1m55.156s

real    16m7.089s
user    14m37.974s
sys     1m55.010s

new routine:

real    16m11.641s
user    14m37.251s
sys     1m52.618s

real    16m6.139s
user    14m38.282s
sys     1m53.184s


I also tested on Power{3..6} and found that Power3, Power5 and
Power6 did better with this new routine when the dcbt and dcbz
instructions weren't used (in which case they achieved performance
comparable to the existing kernel copy_4K_page routine). Power4, on the
other hand, performed slightly better with the dcbt and dcbz included
(still comparable to the current kernel copy_4K_page).

So in order to get the best performance across the board, I created a
new CPU feature that governs whether the dcbt and dcbz are used
(and uncreatively named it CPU_FTR_CP_USE_DCBTZ). I added it to the
CPU features of Cell, Power4 and 970.
Unfortunately I don't have access to a PA6T, but judging by the
marketing material I could find, it looks like it has a strong enough
hardware prefetcher that it probably wouldn't benefit from the dcbt
and dcbz...
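To illustrate the idea of a feature bit selecting the copy strategy per CPU, here is a minimal C model. Only the name CPU_FTR_CP_USE_DCBTZ and the set of CPUs that get it come from this post; the bit value, the `cpu_spec` table, `find_cpu`, `cpu_has_feature_sketch` and `copy_4k_page_sketch` are hypothetical, and the 128-byte line size is an assumption. The real routine is assembly, where the choice would more likely be made with the powerpc feature-fixup sections (BEGIN_FTR_SECTION / END_FTR_SECTION_IFSET) patched at boot rather than a runtime branch. GCC/Clang's `__builtin_prefetch` stands in for dcbt; portable C has no equivalent of dcbz.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_4K          4096
#define CACHE_LINE_SIZE       128          /* assumed line size */
#define CPU_FTR_CP_USE_DCBTZ  (1UL << 0)   /* hypothetical bit position */

/* Hypothetical, much-simplified stand-in for a per-CPU feature table. */
struct cpu_spec {
    const char   *name;
    unsigned long features;
};

static const struct cpu_spec cpu_specs[] = {
    { "Cell",   CPU_FTR_CP_USE_DCBTZ },
    { "POWER4", CPU_FTR_CP_USE_DCBTZ },
    { "970",    CPU_FTR_CP_USE_DCBTZ },
    { "POWER3", 0 },
    { "POWER5", 0 },
    { "POWER6", 0 },
};

/* Look up a CPU by name; returns NULL if it is not in the table. */
const struct cpu_spec *find_cpu(const char *name)
{
    for (size_t i = 0; i < sizeof cpu_specs / sizeof cpu_specs[0]; i++)
        if (strcmp(cpu_specs[i].name, name) == 0)
            return &cpu_specs[i];
    return NULL;
}

int cpu_has_feature_sketch(const struct cpu_spec *cpu, unsigned long ftr)
{
    return (cpu->features & ftr) != 0;
}

/*
 * Copy one 4K page a cache line at a time. With the feature set, hint
 * that the next source line will be read (the dcbt analogue). dcbz,
 * which establishes a zeroed destination line in cache without reading
 * memory, has no portable C counterpart, so it is only marked below.
 */
void copy_4k_page_sketch(void *dst, const void *src,
                         const struct cpu_spec *cpu)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL && cpu != NULL);
    int use_dcbtz = cpu_has_feature_sketch(cpu, CPU_FTR_CP_USE_DCBTZ);

    for (size_t off = 0; off < PAGE_SIZE_4K; off += CACHE_LINE_SIZE) {
        if (use_dcbtz) {
            if (off + CACHE_LINE_SIZE < PAGE_SIZE_4K)
                __builtin_prefetch(s + off + CACHE_LINE_SIZE, 0, 3);
            /* real hardware: dcbz on d + off would go here */
        }
        memcpy(d + off, s + off, CACHE_LINE_SIZE);
    }
}
```

Both paths copy the same bytes; the feature bit only changes the cache hints, which matches the post's point that CPUs with strong hardware prefetchers may do as well or better without them.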

Okay, that's probably enough prattling along - you can all go and look
at the code now.

All comments appreciated.

[I decided to post the whole copy routine rather than a diff between
it and the current one because I found the diff quite unreadable. I'll post
a real patchset after I've addressed any comments.]

Many thanks!
