* raid5 xor functions and caches
@ 2003-03-21 19:30 Andi Kleen
From: Andi Kleen @ 2003-03-21 19:30 UTC
To: mingo; +Cc: linux-kernel
Hi,
I tuned the xor.h functions for x86-64 a bit. One of the changes I made
was to use streaming stores, which are usually faster. But they are actually
slower in the test that runs at boot, because that test measures cache-hot
behaviour. An NTI store invalidates the cache line, so the test takes a lot
of cache misses.
Does it make sense to test cache-hot behaviour? IIRC RAID checksumming
should be done without polluting the caches. I'm considering changing the
test to benchmark a few MB of data buffers to avoid caching effects.
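
To make the idea concrete, here is a rough userspace sketch, not the
kernel's xor.h code. xor_block_stream() and bench_xor_seconds() are
hypothetical names. It assumes 16-byte aligned buffers whose size is a
multiple of 16, uses the SSE2 intrinsic _mm_stream_si128 for the
non-temporal store when SSE2 is available, and falls back to a plain loop
otherwise.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst ^= src over `bytes` bytes.
 * Assumes both pointers are 16-byte aligned and bytes % 16 == 0. */
static void xor_block_stream(void *dst, const void *src, size_t bytes)
{
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < bytes / 16; i++) {
        __m128i v = _mm_xor_si128(_mm_load_si128(&d[i]),
                                  _mm_load_si128(&s[i]));
        _mm_stream_si128(&d[i], v);   /* non-temporal (streaming) store */
    }
    _mm_sfence();                     /* order the streaming stores */
#else
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < bytes; i++)
        d[i] ^= s[i];                 /* portable fallback, ordinary stores */
#endif
}

/* Hypothetical benchmark: time `rounds` passes over buffers of `bytes`
 * bytes each. Returns elapsed CPU seconds, or -1.0 if allocation fails. */
static double bench_xor_seconds(size_t bytes, int rounds)
{
    unsigned char *dst = aligned_alloc(16, bytes);
    unsigned char *src = aligned_alloc(16, bytes);
    double secs = -1.0;

    if (dst && src) {
        memset(dst, 0x55, bytes);
        memset(src, 0xaa, bytes);
        clock_t t0 = clock();
        for (int r = 0; r < rounds; r++)
            xor_block_stream(dst, src, bytes);
        secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    }
    free(dst);
    free(src);
    return secs;
}
```

Calling bench_xor_seconds() with buffers of a few MB each (for example
4 MB, assuming that is larger than the caches on the machine) should make
the measurement mostly cache cold.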
Also, I'm not quite sure about the logic behind XOR_SELECT_TEMPLATE preferring
SSE: if the CPU supports SSE2, then all functions, including the integer ones,
can be compiled with NTA prefetches and NTI stores (on some CPUs an integer
NTI store is not a good idea, though). For distribution kernels
they can be duplicated. If integer is really faster for some things
it should be used this way.
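
One way to read "use integer where it is really faster" is to measure each
variant and take the winner, rather than preferring one family outright.
The sketch below is only an illustration of that idea in userspace C.
struct xor_variant, pick_fastest() and the two XOR routines are made-up
names, not the kernel's templates or its selection logic.

```c
#include <stddef.h>
#include <time.h>

typedef void (*xor_fn)(unsigned long *dst, const unsigned long *src,
                       size_t words);

/* Hypothetical descriptor for one XOR implementation. */
struct xor_variant {
    const char *name;
    xor_fn fn;
};

/* Plain integer loop. */
static void xor_words(unsigned long *dst, const unsigned long *src,
                      size_t words)
{
    for (size_t i = 0; i < words; i++)
        dst[i] ^= src[i];
}

/* Integer loop unrolled by four, with a tail loop for the remainder. */
static void xor_words_unrolled(unsigned long *dst, const unsigned long *src,
                               size_t words)
{
    size_t i = 0;

    for (; i + 4 <= words; i += 4) {
        dst[i]     ^= src[i];
        dst[i + 1] ^= src[i + 1];
        dst[i + 2] ^= src[i + 2];
        dst[i + 3] ^= src[i + 3];
    }
    for (; i < words; i++)
        dst[i] ^= src[i];
}

/* Time every variant on the same buffers and return the one that used
 * the fewest clock ticks. Returns NULL only if n == 0. */
static const struct xor_variant *
pick_fastest(const struct xor_variant *v, size_t n,
             unsigned long *dst, const unsigned long *src,
             size_t words, int rounds)
{
    const struct xor_variant *best = NULL;
    clock_t best_ticks = 0;

    for (size_t k = 0; k < n; k++) {
        clock_t t0 = clock();
        for (int r = 0; r < rounds; r++)
            v[k].fn(dst, src, words);
        clock_t ticks = clock() - t0;

        if (best == NULL || ticks < best_ticks) {
            best = &v[k];
            best_ticks = ticks;
        }
    }
    return best;
}
```

Which variant wins depends on the CPU and on whether the buffers are cache
hot, so the result is only as good as the benchmark behind it.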
Another thing I noticed is that the prefetch distance for the SSE
functions is a bit too short. On modern CPUs 256 bytes is not long enough
(on P4 that would be only two cachelines), it needs more (3-4 cachelines at
least).
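
For illustration, a longer prefetch distance could look roughly like the
sketch below. It is plain C with the GCC/Clang __builtin_prefetch builtin
instead of the inline assembly in xor.h. PREFETCH_DISTANCE and
xor_block_prefetch() are hypothetical names, and 512 bytes is just an
assumed value standing in for "4 cachelines of 128 bytes". The locality
argument 0 asks for a non-temporal prefetch, which on x86 typically becomes
prefetchnta.

```c
#include <stddef.h>

/* Hypothetical tunable: how far ahead of the current position to
 * prefetch, in bytes. 512 is an assumed value, not a measured one. */
#define PREFETCH_DISTANCE 512

/* dst ^= src over `words` unsigned longs, prefetching ahead. */
static void xor_block_prefetch(unsigned long *dst, const unsigned long *src,
                               size_t words)
{
    const size_t ahead = PREFETCH_DISTANCE / sizeof(unsigned long);
    const size_t step = 64 / sizeof(unsigned long); /* one hint per 64 bytes */

    for (size_t i = 0; i < words; i++) {
        /* Issue hints only while the target is still inside the buffers. */
        if (i % step == 0 && i + ahead < words) {
            __builtin_prefetch(&src[i + ahead], 0, 0); /* read, non-temporal */
            __builtin_prefetch(&dst[i + ahead], 0, 0);
        }
        dst[i] ^= src[i];
    }
}
```

The right distance would have to be found by measurement on each CPU. The
hints never change the result, only the timing.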
-Andi