* [x86] Fix prefetch instruction
From: Christoph Lameter @ 2011-08-05 16:18 UTC
To: Ingo Molnar; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin
The prefetchnta instruction used for prefetching on x86 is a special
instruction intended for streaming accesses; it is normally used to avoid
polluting the L2 and L3 caches. The cacheline will be evicted rapidly.
What we need instead is a prefetch that puts the cacheline in all levels of
the cache hierarchy. Change the instruction to do that.
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
---
arch/x86/include/asm/processor.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6/arch/x86/include/asm/processor.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/processor.h 2011-08-04 13:12:39.000000000 -0500
+++ linux-2.6/arch/x86/include/asm/processor.h 2011-08-04 13:16:31.000000000 -0500
@@ -829,7 +829,7 @@ extern char ignore_fpu_irq;
 static inline void prefetch(const void *x)
 {
 	alternative_input(BASE_PREFETCH,
-			  "prefetchnta (%1)",
+			  "prefetcht0 (%1)",
 			  X86_FEATURE_XMM,
 			  "r" (x));
 }
* Re: [x86] Fix prefetch instruction
From: H. Peter Anvin @ 2011-08-05 20:13 UTC
To: Christoph Lameter; +Cc: Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin
On 08/05/2011 09:18 AM, Christoph Lameter wrote:
> The prefetchnta instruction used for prefetching on x86 is a special
> instruction intended for streaming accesses; it is normally used to avoid
> polluting the L2 and L3 caches. The cacheline will be evicted rapidly.
>
> What we need instead is a prefetch that puts the cacheline in all levels of
> the cache hierarchy. Change the instruction to do that.
>
> Acked-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
Have you done any performance analysis on this versus the null case? I
know there are some workloads where it helps, but if it hurts as many as
it helps...
-hpa
* Re: [x86] Fix prefetch instruction
From: Christoph Lameter @ 2011-08-05 21:10 UTC
To: H. Peter Anvin; +Cc: Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin
On Fri, 5 Aug 2011, H. Peter Anvin wrote:
> On 08/05/2011 09:18 AM, Christoph Lameter wrote:
> > The prefetchnta instruction used for prefetching on x86 is a special
> > instruction intended for streaming accesses; it is normally used to avoid
> > polluting the L2 and L3 caches. The cacheline will be evicted rapidly.
> >
> > What we need instead is a prefetch that puts the cacheline in all levels of
> > the cache hierarchy. Change the instruction to do that.
> >
> > Acked-by: Andi Kleen <ak@linux.intel.com>
> > Signed-off-by: Christoph Lameter <cl@linux.com>
> >
>
> Have you done any performance analysis on this versus the null case? I
> know there are some workloads where it helps, but if it hurts as many as
> it helps...
No, I have not. A prefetch, IMHO, means that the cacheline is fetched early so
that it is fully available to the code like any other cacheline.
prefetchnta fetches the cacheline too, but the line is not treated like other
cachelines; it is preferentially thrown out again. It is a "streamfetch"
designed for apps that scan over large amounts of memory and want to avoid
cache pollution.
This is surprising to the end user as far as I can tell.
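
As a rough illustration of the distinction described above, the sketch below
uses the GCC/Clang builtin __builtin_prefetch, whose third argument is a
temporal-locality hint from 0 to 3. On x86 these compilers typically lower
locality 3 to prefetcht0 and locality 0 to prefetchnta, although the exact
instruction is up to the compiler and target. The wrapper names, struct node,
and the lookahead distance are assumptions made up for this example; this is
not the kernel's prefetch() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrappers for illustration only; not kernel API. */

/* Temporal hint: locality 3 asks for the line to be kept in all cache
 * levels (typically prefetcht0 on x86). */
static inline void prefetch_temporal(const void *p)
{
	__builtin_prefetch(p, 0, 3);
}

/* Non-temporal hint: locality 0 marks the data as use-once
 * (typically prefetchnta on x86). */
static inline void prefetch_streaming(const void *p)
{
	__builtin_prefetch(p, 0, 0);
}

struct node {
	struct node *next;
	long value;
};

/* List walk where the next node is expected to be reused later, so the
 * temporal hint is the natural choice. */
static long sum_list(const struct node *head)
{
	long sum = 0;

	for (const struct node *n = head; n != NULL; n = n->next) {
		if (n->next)
			prefetch_temporal(n->next);
		sum += n->value;
	}
	return sum;
}

#define PREFETCH_AHEAD 16	/* elements of lookahead; an untuned guess */

/* Streaming scan: every element is touched once, which is the case the
 * non-temporal hint was designed for. */
static long sum_array(const long *a, size_t n)
{
	long sum = 0;

	assert(a != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (i + PREFETCH_AHEAD < n)
			prefetch_streaming(&a[i + PREFETCH_AHEAD]);
		sum += a[i];
	}
	return sum;
}
```

Both functions return the same result with or without the hints; a prefetch
only affects cache placement and timing, never program semantics.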
* Re: [x86] Fix prefetch instruction
From: H. Peter Anvin @ 2011-08-05 21:22 UTC
To: Christoph Lameter; +Cc: H. Peter Anvin, Ingo Molnar, linux-kernel, Andi Kleen
On 08/05/2011 02:10 PM, Christoph Lameter wrote:
>>
>> Have you done any performance analysis on this versus the null case? I
>> know there are some workloads where it helps, but if it hurts as many as
>> it helps...
>
> No, I have not. A prefetch, IMHO, means that the cacheline is fetched early so
> that it is fully available to the code like any other cacheline.
> prefetchnta fetches the cacheline too, but the line is not treated like other
> cachelines; it is preferentially thrown out again. It is a "streamfetch"
> designed for apps that scan over large amounts of memory and want to avoid
> cache pollution.
>
> This is surprising to the end user as far as I can tell.
>
Right. However, Linus has brought up the hypothesis that prefetch might
actually be a net loss on x86, because current x86 processors are
generally doing a good job with prefetching in hardware. Directed
prefetches can thus be a net minus.
-hpa
* Re: [x86] Fix prefetch instruction
From: Christoph Lameter @ 2011-08-05 23:32 UTC
To: H. Peter Anvin; +Cc: H. Peter Anvin, Ingo Molnar, linux-kernel, Andi Kleen
On Fri, 5 Aug 2011, H. Peter Anvin wrote:
> On 08/05/2011 02:10 PM, Christoph Lameter wrote:
> >>
> >> Have you done any performance analysis on this versus the null case? I
> >> know there are some workloads where it helps, but if it hurts as many as
> >> it helps...
> >
> > No, I have not. A prefetch, IMHO, means that the cacheline is fetched early so
> > that it is fully available to the code like any other cacheline.
> > prefetchnta fetches the cacheline too, but the line is not treated like other
> > cachelines; it is preferentially thrown out again. It is a "streamfetch"
> > designed for apps that scan over large amounts of memory and want to avoid
> > cache pollution.
> >
> > This is surprising to the end user as far as I can tell.
> >
>
> Right. However, Linus has brought up the hypothesis that prefetch might
> actually be a net loss on x86, because current x86 processors are
> generally doing a good job with prefetching in hardware. Directed
> prefetches can thus be a net minus.
This kind of prefetch is a minus because the cacheline is evicted early. It
was prefetched with a special hint, so it is likely very important. That does
not seem very consistent and may cause regressions. Changing it to a
full prefetch would make the important cacheline stay in the cache longer.
* Re: [x86] Fix prefetch instruction
From: H. Peter Anvin @ 2011-08-05 23:53 UTC
To: Christoph Lameter; +Cc: H. Peter Anvin, Ingo Molnar, linux-kernel, Andi Kleen
On 08/05/2011 04:32 PM, Christoph Lameter wrote:
>>
>> Right. However, Linus has brought up the hypothesis that prefetch might
>> actually be a net loss on x86, because current x86 processors are
>> generally doing a good job with prefetching in hardware. Directed
>> prefetches can thus be a net minus.
>
> This kind of prefetch is a minus because the cacheline is evicted early. It
> was prefetched with a special hint, so it is likely very important. That does
> not seem very consistent and may cause regressions. Changing it to a
> full prefetch would make the important cacheline stay in the cache longer.
>
The argument applies not just to NTA prefetches, though. There is a
pipeline cost to performing the software prefetch, it can cause
evictions if the data is not used, and it can increase TLB pressure.
As such, it would be very interesting to know whether prefetcht0 or nothing is
better; I agree we shouldn't use NTA here.
-hpa
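
The open question above (temporal prefetch versus no prefetch at all) could be
probed with a small user-space experiment along the following lines. This is
only a sketch under stated assumptions: struct item, build_list(),
walk_plain(), walk_prefetch() and compare_walks() are hypothetical names
invented here, the list is linked in pseudo-random order so that a hardware
prefetcher cannot simply follow a sequential pattern, and clock() is a coarse
timer. It says nothing about in-kernel behaviour, and a serious measurement
would repeat the runs, alternate their order, and use performance counters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct item {
	struct item *next;
	long value;
	char pad[48];	/* pad so each item fills roughly one 64-byte line */
};

/* Link the caller-provided items in a pseudo-random order (Fisher-Yates
 * shuffle driven by a simple LCG) and return the head of the list. */
static struct item *build_list(struct item *items, size_t *order, size_t count)
{
	unsigned long long seed = 12345;

	assert(items != NULL && order != NULL && count > 0);
	for (size_t i = 0; i < count; i++)
		order[i] = i;
	for (size_t i = count - 1; i > 0; i--) {
		seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
		size_t j = (size_t)((seed >> 33) % (i + 1));
		size_t tmp = order[i];
		order[i] = order[j];
		order[j] = tmp;
	}
	for (size_t i = 0; i < count; i++) {
		items[order[i]].value = (long)i;
		items[order[i]].next =
			(i + 1 < count) ? &items[order[i + 1]] : NULL;
	}
	return &items[order[0]];
}

/* Baseline: no software prefetch. */
static long walk_plain(const struct item *head)
{
	long sum = 0;

	for (const struct item *it = head; it; it = it->next)
		sum += it->value;
	return sum;
}

/* Same walk with a temporal hint (locality 3) for the next item. */
static long walk_prefetch(const struct item *head)
{
	long sum = 0;

	for (const struct item *it = head; it; it = it->next) {
		if (it->next)
			__builtin_prefetch(it->next, 0, 3);
		sum += it->value;
	}
	return sum;
}

/* Time both walks once; returns 0 if the sums agree, 1 if they differ,
 * -1 on bad input or allocation failure. The first walk also warms the
 * cache for the second, so the printed times are only indicative. */
static int compare_walks(size_t count)
{
	if (count == 0)
		return -1;

	struct item *items = malloc(count * sizeof(*items));
	size_t *order = malloc(count * sizeof(*order));
	if (!items || !order) {
		free(items);
		free(order);
		return -1;
	}

	struct item *head = build_list(items, order, count);
	clock_t t0 = clock();
	long a = walk_plain(head);
	clock_t t1 = clock();
	long b = walk_prefetch(head);
	clock_t t2 = clock();

	printf("plain:    sum=%ld  %.3f ms\n", a,
	       1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC);
	printf("prefetch: sum=%ld  %.3f ms\n", b,
	       1000.0 * (double)(t2 - t1) / CLOCKS_PER_SEC);

	free(order);
	free(items);
	return a == b ? 0 : 1;
}
```

Calling compare_walks(1 << 22) from a main() gives a working set of about
256 MB, larger than typical last-level caches. Because the address of the next
item is only known once the current item has been loaded, the hint in
walk_prefetch() has little lead time; whether it helps, hurts, or does nothing
is exactly what such a test would have to show on the hardware in question.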