All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@digeo.com>
To: Mala Anand <manand@us.ibm.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Bill Hartner <bhartner@us.ibm.com>
Subject: Re: 2.5.40-mm1
Date: Wed, 09 Oct 2002 16:32:11 -0700	[thread overview]
Message-ID: <3DA4BC7B.EC8D65A3@digeo.com> (raw)
In-Reply-To: OF13BF2DC5.95D8249D-ON87256C4C.00509A83@boulder.ibm.com

Mala Anand wrote:
> 
> ...
> P4 Xeon CPU 1.50 GHz 4-way - hyperthreading disabled
> Src is aligned and dst is misaligned as follows:
> 
>  Dst      2.5.40       2.5.40+patch     2.5.40+patch++
> Align    throughout     throughput      throughput
> (bytes)   KB/sec          KB/sec        KB/sec
>   0       1360071         1314783        912359
>   1       323674           340447
>   2       329202           336425
>   4       512955           693170
>   8       523223           615097        506641
>  12       517184           558701        553700
>  16       966598           872080        932736
>  32       846937           838514        845178

Note the tremendous slowdown which the P4 suffers when you're not
cacheline aligned.  Even 32-byte-aligned is down a lot.

 
> I see too much variance in the test results so I ran
> each test 3 times. I tried increasing the iterations
> but it did not reduce the variance.
> 
> Dst is aligned and src is misaligned as follows:
> 
>  Dst      2.5.40       2.5.40+patch
> Align    throughout     throughput
> (bytes)   KB/sec          KB/sec
>   0       1275372       1029815
>   1        529907        511815
>   2        534811        530850
>   4        643196        627013
>   8        568000        626676
>  12        574468        658793
>  16        631707        635979
>  32        741485        592938

This differs a little from my P4 testing - the rep;movsl approach
seemed OK for 8,16,32 alignment.

But still, that's something we can tune later.
 
> 
> However I have seen using floating point registers instead of integer
> registers on Pentium IV improves performance to a greater extent on
> some alignments. I need to do more testing and then I will create a
> patch for pentium IV.

I believe there are "issues" using those registers in-kernel. Related
to the need to save/restore them, or errata; not too sure about that.

WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@digeo.com>
To: Mala Anand <manand@us.ibm.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Bill Hartner <bhartner@us.ibm.com>
Subject: Re: 2.5.40-mm1
Date: Wed, 09 Oct 2002 16:32:11 -0700	[thread overview]
Message-ID: <3DA4BC7B.EC8D65A3@digeo.com> (raw)
In-Reply-To: OF13BF2DC5.95D8249D-ON87256C4C.00509A83@boulder.ibm.com

Mala Anand wrote:
> 
> ...
> P4 Xeon CPU 1.50 GHz 4-way - hyperthreading disabled
> Src is aligned and dst is misaligned as follows:
> 
>  Dst      2.5.40       2.5.40+patch     2.5.40+patch++
> Align    throughout     throughput      throughput
> (bytes)   KB/sec          KB/sec        KB/sec
>   0       1360071         1314783        912359
>   1       323674           340447
>   2       329202           336425
>   4       512955           693170
>   8       523223           615097        506641
>  12       517184           558701        553700
>  16       966598           872080        932736
>  32       846937           838514        845178

Note the tremendous slowdown which the P4 suffers when you're not
cacheline aligned.  Even 32-byte-aligned is down a lot.

 
> I see too much variance in the test results so I ran
> each test 3 times. I tried increasing the iterations
> but it did not reduce the variance.
> 
> Dst is aligned and src is misaligned as follows:
> 
>  Dst      2.5.40       2.5.40+patch
> Align    throughout     throughput
> (bytes)   KB/sec          KB/sec
>   0       1275372       1029815
>   1        529907        511815
>   2        534811        530850
>   4        643196        627013
>   8        568000        626676
>  12        574468        658793
>  16        631707        635979
>  32        741485        592938

This differs a little from my P4 testing - the rep;movsl approach
seemed OK for 8,16,32 alignment.

But still, that's something we can tune later.
 
> 
> However I have seen using floating point registers instead of integer
> registers on Pentium IV improves performance to a greater extent on
> some alignments. I need to do more testing and then I will create a
> patch for pentium IV.

I believe there are "issues" using those registers in-kernel. Related
to the need to save/restore them, or errata; not too sure about that.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

  reply	other threads:[~2002-10-09 23:29 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-10-09 23:20 2.5.40-mm1 Mala Anand
2002-10-09 23:20 ` 2.5.40-mm1 Mala Anand
2002-10-09 23:32 ` Andrew Morton [this message]
2002-10-09 23:32   ` 2.5.40-mm1 Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2002-10-01  9:32 2.5.40-mm1 Andrew Morton
2002-10-01  9:32 ` 2.5.40-mm1 Andrew Morton
2002-10-08  6:46 ` 2.5.40-mm1 Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3DA4BC7B.EC8D65A3@digeo.com \
    --to=akpm@digeo.com \
    --cc=bhartner@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=manand@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.