From: Andrew Morton <andrewm@uow.edu.au>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [beta patch] SSE copy_page() / clear_page()
Date: Sat, 17 Feb 2001 02:27:02 +1100
Message-ID: <3A8D46C6.3873DF22@uow.edu.au>
In-Reply-To: <3A846C84.109F1D7D@colorfullife.com> <200102092240.OAA15902@penguin.transmeta.com> <3A8B08C7.BD79E3B4@colorfullife.com>

Manfred Spraul wrote:
> 
> Intel Pentium III and P 4 have hardcoded "fast stringcopy" operations
> that invalidate whole cachelines during write (documented in the most
> obvious place: multiprocessor management, memory ordering)

Which are dramatically slower than a simple `mov' loop for just
about all alignments, except for source and dest both eight-byte
aligned.

For example, copying an uncached source to an uncached dest,
with the source misaligned, my PIII Coppermine does 108 MBytes/sec
with `rep;movsl' and 149 MBytes/sec with an open-coded variant
of our copy_csum routines.  That's a lot.  Similar results
on a PII and a PIII Katmai.

On the K6-2, however, the string operation is almost always
a win.

It seems that a good approximation for our bulk-copy strategy is:

	if (AMD) {
		string_copy();
	} else if (intel) {
		if ((source|dest) & 7)
			duff_copy();
		else
			string_copy();
	} else {
		quack();
	}

This will make our Intel copies 20-40% faster than
at present, depending upon the distribution of
alignments.  (And for networking, the distribution
is pretty much uniform).
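
As a rough illustration of the strategy above, here is a minimal
user-space C sketch.  It is not kernel code and not the author's
implementation.  The names `cpu_vendor', `copy_kind', `pick_copy',
`open_coded_copy' and `bulk_copy' are hypothetical, memcpy() stands
in for `rep;movsl', and a simple 32-bit word loop stands in for the
open-coded routine.  The "quack()" branch is left undecided in the
original, so the sketch only marks it and falls back to memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vendor tag; real code would derive this from CPUID. */
enum cpu_vendor { VENDOR_AMD, VENDOR_INTEL, VENDOR_OTHER };

/* COPY_UNDECIDED stands in for the open "quack()" branch. */
enum copy_kind { COPY_STRING, COPY_OPEN_CODED, COPY_UNDECIDED };

/* Pick a routine from the vendor and the low three address bits. */
static enum copy_kind pick_copy(enum cpu_vendor v, const void *dst,
				const void *src)
{
	uintptr_t bits = (uintptr_t)dst | (uintptr_t)src;

	if (v == VENDOR_AMD)
		return COPY_STRING;
	if (v == VENDOR_INTEL)
		return (bits & 7) ? COPY_OPEN_CODED : COPY_STRING;
	return COPY_UNDECIDED;
}

/* Stand-in for `rep;movsl'; plain memcpy() keeps the sketch portable. */
static void string_copy(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}

/* Stand-in for the open-coded loop: 32-bit moves, then a byte tail. */
static void open_coded_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	uint32_t w;

	while (n >= sizeof(w)) {
		memcpy(&w, s, sizeof(w));	/* alignment-safe load */
		memcpy(d, &w, sizeof(w));	/* alignment-safe store */
		d += sizeof(w);
		s += sizeof(w);
		n -= sizeof(w);
	}
	while (n--)
		*d++ = *s++;
}

/* Dispatch one copy according to pick_copy(). */
static void bulk_copy(enum cpu_vendor v, void *dst, const void *src, size_t n)
{
	if (pick_copy(v, dst, src) == COPY_OPEN_CODED)
		open_coded_copy(dst, src, n);
	else
		string_copy(dst, src, n);	/* assumption: also used when undecided */
}
```

A caller would use bulk_copy(VENDOR_INTEL, dst, src, len); only the
Intel branch looks at alignment, matching the `(source|dest) & 7'
test above.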

Somewhere on my to-do list is getting lots of people to
test lots of architectures with lots of combinations of
[source/dest][cached/uncached] at lots of alignments
to confirm if this will work.
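
The alignment part of such a test matrix could be sketched as
follows.  This is a hedged user-space illustration only, not the
cptimer tool mentioned below: `copy_fn', `memcpy_copy', `align64',
`copy_rate' and `sweep_alignments' are hypothetical names, timing
uses the standard clock() call, and nothing here flushes caches, so
it covers only the cached case.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Routine under test; swap in a string-op or SSE variant here. */
static void memcpy_copy(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}

/* Round a pointer up to the next 64-byte boundary. */
static unsigned char *align64(unsigned char *p)
{
	return (unsigned char *)(((uintptr_t)p + 63) & ~(uintptr_t)63);
}

/*
 * Time `reps' copies of `len' bytes with the given byte offsets applied
 * to 64-byte-aligned buffers.  Returns MBytes/sec, 0.0 if the run was
 * too short for clock() to measure, or -1.0 on bad arguments or
 * allocation failure.
 */
static double copy_rate(copy_fn fn, size_t src_off, size_t dst_off,
			size_t len, int reps)
{
	unsigned char *raw_s = malloc(len + 128);
	unsigned char *raw_d = malloc(len + 128);
	unsigned char *s, *d;
	volatile unsigned char sink = 0;
	double secs, rate = -1.0;
	clock_t t0, t1;
	int i;

	if (!raw_s || !raw_d || !len || reps <= 0 ||
	    src_off >= 64 || dst_off >= 64)
		goto out;

	s = align64(raw_s) + src_off;
	d = align64(raw_d) + dst_off;
	memset(s, 0xA5, len);		/* touch the pages before timing */
	memset(d, 0, len);

	t0 = clock();
	for (i = 0; i < reps; i++) {
		fn(d, s, len);
		sink ^= d[len - 1];	/* keep the copy from being elided */
	}
	t1 = clock();

	secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
	rate = secs > 0.0 ? (double)len * reps / secs / 1e6 : 0.0;
out:
	free(raw_s);
	free(raw_d);
	return rate;
}

/* Fill out[src][dst] with the rate for every offset pair in 0..7. */
static void sweep_alignments(copy_fn fn, size_t len, int reps,
			     double out[8][8])
{
	size_t so, dof;

	for (so = 0; so < 8; so++)
		for (dof = 0; dof < 8; dof++)
			out[so][dof] = copy_rate(fn, so, dof, len, reps);
}
```

Calling sweep_alignments(memcpy_copy, 4096, 10000, table) gives an
8x8 grid of rates that can be compared across routines and machines.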

If you have time, could you please grab

	http://www.uow.edu.au/~andrewm/linux/cptimer.tar.gz

and teach it how to do SSE copies, in preparation for this
great event?

Thanks.

