From: Andrew Morton <akpm@digeo.com>
To: Mala Anand <manand@us.ibm.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Bill Hartner <bhartner@us.ibm.com>
Subject: Re: 2.5.40-mm1
Date: Wed, 09 Oct 2002 16:32:11 -0700
Message-ID: <3DA4BC7B.EC8D65A3@digeo.com>
In-Reply-To: OF13BF2DC5.95D8249D-ON87256C4C.00509A83@boulder.ibm.com
Mala Anand wrote:
>
> ...
> P4 Xeon CPU 1.50 GHz 4-way - hyperthreading disabled
> Src is aligned and dst is misaligned as follows:
>
> Dst      2.5.40      2.5.40+patch  2.5.40+patch++
> Align    throughput  throughput    throughput
> (bytes)  KB/sec      KB/sec        KB/sec
> 0        1360071     1314783       912359
> 1        323674      340447
> 2        329202      336425
> 4        512955      693170
> 8        523223      615097        506641
> 12       517184      558701        553700
> 16       966598      872080        932736
> 32       846937      838514        845178
Note the tremendous slowdown which the P4 suffers when you're not
cacheline aligned. Even 32-byte-aligned is down a lot.
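To put a number on that slowdown, the 2.5.40 column of the table above can be expressed as a fraction of the aligned (offset 0) figure. This is only a sketch of the arithmetic using the quoted measurements; the names baseline_kb_per_sec and relative_throughput are made up for illustration and are not part of any benchmark in this thread.

```python
# 2.5.40 throughput in KB/sec, keyed by destination misalignment in bytes.
# Values are copied from the "src aligned, dst misaligned" table quoted above.
baseline_kb_per_sec = {
    0: 1360071,
    1: 323674,
    2: 329202,
    4: 512955,
    8: 523223,
    12: 517184,
    16: 966598,
    32: 846937,
}


def relative_throughput(table, aligned_key=0):
    """Return each row's throughput as a fraction of the aligned row."""
    aligned = table[aligned_key]
    return {offset: value / aligned for offset, value in table.items()}


if __name__ == "__main__":
    for offset, fraction in relative_throughput(baseline_kb_per_sec).items():
        print(f"dst offset {offset:2d}: {fraction:6.1%} of aligned throughput")
```

On these figures a 1-byte misalignment leaves roughly 24% of the aligned throughput, and the 32-byte case still reaches only about 62%.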
> I see too much variance in the test results so I ran
> each test 3 times. I tried increasing the iterations
> but it did not reduce the variance.
>
> Dst is aligned and src is misaligned as follows:
>
> Dst      2.5.40      2.5.40+patch
> Align    throughput  throughput
> (bytes)  KB/sec      KB/sec
> 0        1275372     1029815
> 1        529907      511815
> 2        534811      530850
> 4        643196      627013
> 8        568000      626676
> 12       574468      658793
> 16       631707      635979
> 32       741485      592938
This differs a little from my P4 testing - the rep;movsl approach
seemed OK for 8,16,32 alignment.
But still, that's something we can tune later.
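For tuning purposes it may help to see the second table as a per-alignment change of 2.5.40+patch over plain 2.5.40. The sketch below only redoes that arithmetic on the quoted numbers; src_misaligned and percent_change are hypothetical names, and given the run-to-run variance mentioned above, small differences should not be read as significant.

```python
# (2.5.40, 2.5.40+patch) throughput in KB/sec, keyed by the alignment column
# of the "dst aligned, src misaligned" table quoted above.
src_misaligned = {
    0: (1275372, 1029815),
    1: (529907, 511815),
    2: (534811, 530850),
    4: (643196, 627013),
    8: (568000, 626676),
    12: (574468, 658793),
    16: (631707, 635979),
    32: (741485, 592938),
}


def percent_change(table):
    """Return the patched throughput change relative to baseline, in percent."""
    return {
        offset: (patched - base) / base * 100.0
        for offset, (base, patched) in table.items()
    }


if __name__ == "__main__":
    for offset, change in percent_change(src_misaligned).items():
        print(f"align {offset:2d}: {change:+6.1f}% with patch")
```

By this measure the patched kernel is ahead at 8 and 12, roughly even at 16, and behind at 0 and 32.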
>
> However, I have seen that using floating point registers instead of
> integer registers on Pentium IV improves performance to a greater
> extent on some alignments. I need to do more testing and then I will
> create a patch for Pentium IV.
I believe there are "issues" using those registers in-kernel. Related
to the need to save/restore them, or errata; not too sure about that.