* [PATCH] i386: PAE entries must have their low word cleared first
@ 2006-04-26 13:46 Jan Beulich
2006-04-26 14:46 ` Hugh Dickins
0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2006-04-26 13:46 UTC (permalink / raw)
To: linux-kernel; +Cc: Keir Fraser
During Xen development we noticed that writes to 64-bit variables may get
coded as high-word-before-low-word writes by gcc. Consequently, when
invalidating a PAE page table entry, the low word must be explicitly written
first, as otherwise, as pointed out by Keir, speculative execution may result
in a partially modified (and still valid) translation to be used, which has,
according to the specification, the potential of dead-locking the machine.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
diff -Npru /home/jbeulich/tmp/linux-2.6.17-rc2/include/asm-i386/pgtable.h
2.6.17-rc2-i386-pae-ptep_get_and_clear_full/include/asm-i386/pgtable.h
--- /home/jbeulich/tmp/linux-2.6.17-rc2/include/asm-i386/pgtable.h 2006-04-26 10:56:03.000000000 +0200
+++ 2.6.17-rc2-i386-pae-ptep_get_and_clear_full/include/asm-i386/pgtable.h 2006-04-24 12:28:37.000000000 +0200
@@ -268,7 +268,15 @@ static inline pte_t ptep_get_and_clear_f
pte_t pte;
if (full) {
pte = *ptep;
+#ifdef CONFIG_X86_PAE
+ /* Cannot do this in a single step, as the compiler
+ may issue the two stores in either order. */
+ ptep->pte_low = 0;
+ barrier();
+ ptep->pte_high = 0;
+#else
*ptep = __pte(0);
+#endif
} else {
pte = ptep_get_and_clear(mm, addr, ptep);
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 13:46 Jan Beulich
@ 2006-04-26 14:46 ` Hugh Dickins
2006-04-26 15:06 ` Keir Fraser
0 siblings, 1 reply; 14+ messages in thread
From: Hugh Dickins @ 2006-04-26 14:46 UTC (permalink / raw)
To: Jan Beulich; +Cc: linux-kernel, Keir Fraser, Zachary Amsden
On Wed, 26 Apr 2006, Jan Beulich wrote:
> During Xen development we noticed that writes to 64-bit variables may get
> coded as high-word-before-low-word writes by gcc. Consequently, when
> invalidating a PAE page table entry, the low word must be explicitly written
> first, as otherwise, as pointed out by Keir, speculative execution may result
> in a partially modified (and still valid) translation to be used, which has,
> according to the specification, the potential of dead-locking the machine.
If that's so (I don't trust my judgement on matters of speculative
execution), then I think you'd do better to replace the *ptep = __pte(0)
by pte_clear(mm, addr, ptep), and so avoid your ugly #ifdef'ing: please
check, but I think you'll find that reduces to just the barrier you want.
CC'ed Zach since it's his optimization, and he'll judge that spexecution.
Hugh
> Signed-off-by: Jan Beulich <jbeulich@novell.com>
> Cc: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
>
> diff -Npru /home/jbeulich/tmp/linux-2.6.17-rc2/include/asm-i386/pgtable.h
> 2.6.17-rc2-i386-pae-ptep_get_and_clear_full/include/asm-i386/pgtable.h
> --- /home/jbeulich/tmp/linux-2.6.17-rc2/include/asm-i386/pgtable.h 2006-04-26 10:56:03.000000000 +0200
> +++ 2.6.17-rc2-i386-pae-ptep_get_and_clear_full/include/asm-i386/pgtable.h 2006-04-24 12:28:37.000000000 +0200
> @@ -268,7 +268,15 @@ static inline pte_t ptep_get_and_clear_f
> pte_t pte;
> if (full) {
> pte = *ptep;
> +#ifdef CONFIG_X86_PAE
> + /* Cannot do this in a single step, as the compiler
> + may issue the two stores in either order. */
> + ptep->pte_low = 0;
> + barrier();
> + ptep->pte_high = 0;
> +#else
> *ptep = __pte(0);
> +#endif
> } else {
> pte = ptep_get_and_clear(mm, addr, ptep);
> }
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 14:46 ` Hugh Dickins
@ 2006-04-26 15:06 ` Keir Fraser
2006-04-26 15:44 ` Zachary Amsden
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Keir Fraser @ 2006-04-26 15:06 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Jan Beulich, Zachary Amsden, linux-kernel
On 26 Apr 2006, at 15:46, Hugh Dickins wrote:
> If that's so (I don't trust my judgement on matters of speculative
> execution), then I think you'd do better to replace the *ptep =
> __pte(0)
> by pte_clear(mm, addr, ptep), and so avoid your ugly #ifdef'ing: please
> check, but I think you'll find that reduces to just the barrier you
> want.
> CC'ed Zach since it's his optimization, and he'll judge that
> spexecution.
In more detail the problem is that, since we're still running on the
page tables while clearing them, the CPU may choose to prefetch a
half-cleared pte into its TLB, and then execute speculative memory
accesses based on that mapping (including ones that may write-allocate
cachelines, leading to problems like the AMD AGP GART deadlock Linux
had a year or so back).
The barrier is needed to ensure that the bottom half is cleared before
the top half. In fact it probably ought to be wmb() -- that's clearer
in intent and actually reduces to barrier() on all PAE-capable
platforms.
We cannot use pte_clear() unless we redefine it for PAE. Currently it
reduces to set_pte() which explicitly uses the wrong ordering (sets
high *then* low, because it's normally used to introduce a mapping).
Also, I think wmb() should be used in PAE's set_pte(), rather than the
current smp_wmb(). They reduce to the same thing, but wmb() makes it
clear this is is not an issue specific to SMP systems.
-- Keir
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 15:06 ` Keir Fraser
@ 2006-04-26 15:44 ` Zachary Amsden
2006-04-26 15:57 ` Hugh Dickins
2006-04-26 15:45 ` Hugh Dickins
2006-04-26 15:58 ` Nick Piggin
2 siblings, 1 reply; 14+ messages in thread
From: Zachary Amsden @ 2006-04-26 15:44 UTC (permalink / raw)
To: Keir Fraser; +Cc: Hugh Dickins, Jan Beulich, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 656 bytes --]
Keir Fraser wrote:
>
> We cannot use pte_clear() unless we redefine it for PAE. Currently it
> reduces to set_pte() which explicitly uses the wrong ordering (sets
> high *then* low, because it's normally used to introduce a mapping).
Yes, that means pte_clear() has the same bug. Why don't we redefine
pte_clear for PAE? Then the ifdef can go away.
> Also, I think wmb() should be used in PAE's set_pte(), rather than the
> current smp_wmb(). They reduce to the same thing, but wmb() makes it
> clear this is is not an issue specific to SMP systems.
I agree with that. The problem is not related to SMP at all. I would
propose this approach.
[-- Attachment #2: fix-pte-clear --]
[-- Type: text/plain, Size: 2468 bytes --]
Proposed fix for ptep_get_and_clear_full PAE bug. Pte_clear had the same
bug, so use the same fix for both.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Index: linux-2.6.17-rc/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.17-rc.orig/include/asm-i386/pgtable.h 2006-04-25 15:41:06.000000000 -0700
+++ linux-2.6.17-rc/include/asm-i386/pgtable.h 2006-04-26 08:39:42.000000000 -0700
@@ -204,7 +204,6 @@ extern unsigned long long __PAGE_KERNEL,
extern unsigned long pg0[];
#define pte_present(x) ((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))
-#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
/* To avoid harmful races, pmd_none(x) should check only the lower when PAE */
#define pmd_none(x) (!(unsigned long)pmd_val(x))
@@ -268,7 +267,7 @@ static inline pte_t ptep_get_and_clear_f
pte_t pte;
if (full) {
pte = *ptep;
- *ptep = __pte(0);
+ pte_clear(mm, addr, ptep);
} else {
pte = ptep_get_and_clear(mm, addr, ptep);
}
Index: linux-2.6.17-rc/include/asm-i386/pgtable-3level.h
===================================================================
--- linux-2.6.17-rc.orig/include/asm-i386/pgtable-3level.h 2006-04-25 15:41:06.000000000 -0700
+++ linux-2.6.17-rc/include/asm-i386/pgtable-3level.h 2006-04-26 08:38:57.000000000 -0700
@@ -85,6 +85,13 @@ static inline void pud_clear (pud_t * pu
#define pmd_offset(pud, address) ((pmd_t *) pud_page(*(pud)) + \
pmd_index(address))
+static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+ ptep->pte_low = 0;
+ wmb();
+ ptep->pte_high = 0;
+}
+
static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
pte_t res;
Index: linux-2.6.17-rc/include/asm-i386/pgtable-2level.h
===================================================================
--- linux-2.6.17-rc.orig/include/asm-i386/pgtable-2level.h 2006-04-25 15:41:06.000000000 -0700
+++ linux-2.6.17-rc/include/asm-i386/pgtable-2level.h 2006-04-26 08:37:34.000000000 -0700
@@ -18,6 +18,8 @@
#define set_pte_atomic(pteptr, pteval) set_pte(pteptr,pteval)
#define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
+#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
+
#define ptep_get_and_clear(mm,addr,xp) __pte(xchg(&(xp)->pte_low, 0))
#define pte_same(a, b) ((a).pte_low == (b).pte_low)
#define pte_page(x) pfn_to_page(pte_pfn(x))
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 15:06 ` Keir Fraser
2006-04-26 15:44 ` Zachary Amsden
@ 2006-04-26 15:45 ` Hugh Dickins
2006-04-26 16:06 ` Nick Piggin
2006-04-26 15:58 ` Nick Piggin
2 siblings, 1 reply; 14+ messages in thread
From: Hugh Dickins @ 2006-04-26 15:45 UTC (permalink / raw)
To: Keir Fraser; +Cc: Jan Beulich, Zachary Amsden, linux-kernel
On Wed, 26 Apr 2006, Keir Fraser wrote:
>
> We cannot use pte_clear() unless we redefine it for PAE. Currently it reduces
> to set_pte() which explicitly uses the wrong ordering (sets high *then* low,
> because it's normally used to introduce a mapping).
I overlooked that reversal completely. What a very good point.
I think that actually pte_clear() _does_ need to be redefined for PAE,
to reverse that ordering as you point out. Take a look at its use in
mm/highmem.c (where a comment states it's safe against speculative
execution, but a comment can't guarantee that!): what do you think?
Hugh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 15:44 ` Zachary Amsden
@ 2006-04-26 15:57 ` Hugh Dickins
2006-04-26 16:12 ` Keir Fraser
0 siblings, 1 reply; 14+ messages in thread
From: Hugh Dickins @ 2006-04-26 15:57 UTC (permalink / raw)
To: Zachary Amsden; +Cc: Keir Fraser, Jan Beulich, linux-kernel
On Wed, 26 Apr 2006, Zachary Amsden wrote:
> Keir Fraser wrote:
> >
> > We cannot use pte_clear() unless we redefine it for PAE. Currently it
> > reduces to set_pte() which explicitly uses the wrong ordering (sets high
> > *then* low, because it's normally used to introduce a mapping).
>
> Yes, that means pte_clear() has the same bug. Why don't we redefine pte_clear
> for PAE? Then the ifdef can go away.
Good, we agree.
> > Also, I think wmb() should be used in PAE's set_pte(), rather than the
> > current smp_wmb(). They reduce to the same thing, but wmb() makes it clear
> > this is is not an issue specific to SMP systems.
>
> I agree with that. The problem is not related to SMP at all. I would propose
> this approach.
Attachment included...
> Proposed fix for ptep_get_and_clear_full PAE bug. Pte_clear had the same
> bug, so use the same fix for both.
You need to expand that comment with text from Jan & Keir's patch:
Andrew will want to know just what this bug was. The patch looks
good to me, except for the unnecessary do { } while (0) in the
definition of pte_clear: ah, you're only copying what was already
there, can't blame you for that, let's not worry about it.
Hugh
> Signed-off-by: Zachary Amsden <zach@vmware.com>
>
> Index: linux-2.6.17-rc/include/asm-i386/pgtable.h
> ===================================================================
> --- linux-2.6.17-rc.orig/include/asm-i386/pgtable.h 2006-04-25 15:41:06.000000000 -0700
> +++ linux-2.6.17-rc/include/asm-i386/pgtable.h 2006-04-26 08:39:42.000000000 -0700
> @@ -204,7 +204,6 @@ extern unsigned long long __PAGE_KERNEL,
> extern unsigned long pg0[];
>
> #define pte_present(x) ((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))
> -#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
>
> /* To avoid harmful races, pmd_none(x) should check only the lower when PAE */
> #define pmd_none(x) (!(unsigned long)pmd_val(x))
> @@ -268,7 +267,7 @@ static inline pte_t ptep_get_and_clear_f
> pte_t pte;
> if (full) {
> pte = *ptep;
> - *ptep = __pte(0);
> + pte_clear(mm, addr, ptep);
> } else {
> pte = ptep_get_and_clear(mm, addr, ptep);
> }
> Index: linux-2.6.17-rc/include/asm-i386/pgtable-3level.h
> ===================================================================
> --- linux-2.6.17-rc.orig/include/asm-i386/pgtable-3level.h 2006-04-25 15:41:06.000000000 -0700
> +++ linux-2.6.17-rc/include/asm-i386/pgtable-3level.h 2006-04-26 08:38:57.000000000 -0700
> @@ -85,6 +85,13 @@ static inline void pud_clear (pud_t * pu
> #define pmd_offset(pud, address) ((pmd_t *) pud_page(*(pud)) + \
> pmd_index(address))
>
> +static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> +{
> + ptep->pte_low = 0;
> + wmb();
> + ptep->pte_high = 0;
> +}
> +
> static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> {
> pte_t res;
> Index: linux-2.6.17-rc/include/asm-i386/pgtable-2level.h
> ===================================================================
> --- linux-2.6.17-rc.orig/include/asm-i386/pgtable-2level.h 2006-04-25 15:41:06.000000000 -0700
> +++ linux-2.6.17-rc/include/asm-i386/pgtable-2level.h 2006-04-26 08:37:34.000000000 -0700
> @@ -18,6 +18,8 @@
> #define set_pte_atomic(pteptr, pteval) set_pte(pteptr,pteval)
> #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
>
> +#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
> +
> #define ptep_get_and_clear(mm,addr,xp) __pte(xchg(&(xp)->pte_low, 0))
> #define pte_same(a, b) ((a).pte_low == (b).pte_low)
> #define pte_page(x) pfn_to_page(pte_pfn(x))
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 15:06 ` Keir Fraser
2006-04-26 15:44 ` Zachary Amsden
2006-04-26 15:45 ` Hugh Dickins
@ 2006-04-26 15:58 ` Nick Piggin
2006-04-27 10:27 ` Sonny Rao
2 siblings, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2006-04-26 15:58 UTC (permalink / raw)
To: Keir Fraser; +Cc: Hugh Dickins, Jan Beulich, Zachary Amsden, linux-kernel
Keir Fraser wrote:
>
> On 26 Apr 2006, at 15:46, Hugh Dickins wrote:
>
>> If that's so (I don't trust my judgement on matters of speculative
>> execution), then I think you'd do better to replace the *ptep = __pte(0)
>> by pte_clear(mm, addr, ptep), and so avoid your ugly #ifdef'ing: please
>> check, but I think you'll find that reduces to just the barrier you want.
>> CC'ed Zach since it's his optimization, and he'll judge that spexecution.
>
>
> In more detail the problem is that, since we're still running on the
> page tables while clearing them, the CPU may choose to prefetch a
> half-cleared pte into its TLB, and then execute speculative memory
> accesses based on that mapping (including ones that may write-allocate
> cachelines, leading to problems like the AMD AGP GART deadlock Linux had
> a year or so back).
What do you mean, you're still running on the page tables? The CPU can
still walk the pagetables?
Because if ptep_get_and_clear_full is passed non zero in the full
argument, then that specific translation should never see another
access. I didn't know CPUs now actually resolve TLB misses as part of
speculative prefetching... does this really happen?
Do you have a pointer to where the AMD AGP GART deadlock was discussed?
Google is having some trouble finding it.
>
> The barrier is needed to ensure that the bottom half is cleared before
> the top half. In fact it probably ought to be wmb() -- that's clearer in
> intent and actually reduces to barrier() on all PAE-capable platforms.
I think barrier is correct there.
The full argument means that the mm is being torn down and we are the
last thread in the mm. So these pagetables can't be visible to other CPUs,
can they?
>
> We cannot use pte_clear() unless we redefine it for PAE. Currently it
> reduces to set_pte() which explicitly uses the wrong ordering (sets high
> *then* low, because it's normally used to introduce a mapping).
Presumably you would redefine it for PAE also (in this case you may
actually need an smp_wmb rather than just barrier).
Unless you can explain why this is a problem and not the other pte_clear
callers.
>
> Also, I think wmb() should be used in PAE's set_pte(), rather than the
> current smp_wmb(). They reduce to the same thing, but wmb() makes it
> clear this is is not an issue specific to SMP systems.
That is what smp_wmb() means. wmb() means it should also order io memory.
Clear as mud?
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 15:45 ` Hugh Dickins
@ 2006-04-26 16:06 ` Nick Piggin
0 siblings, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2006-04-26 16:06 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Keir Fraser, Jan Beulich, Zachary Amsden, linux-kernel
Hugh Dickins wrote:
> On Wed, 26 Apr 2006, Keir Fraser wrote:
>
>>We cannot use pte_clear() unless we redefine it for PAE. Currently it reduces
>>to set_pte() which explicitly uses the wrong ordering (sets high *then* low,
>>because it's normally used to introduce a mapping).
>
>
> I overlooked that reversal completely. What a very good point.
> I think that actually pte_clear() _does_ need to be redefined for PAE,
> to reverse that ordering as you point out. Take a look at its use in
> mm/highmem.c (where a comment states it's safe against speculative
> execution, but a comment can't guarantee that!): what do you think?
Speculative execution is safe I think (and so is ptep_get_and_clear_full,
because in neither case will the virtual address be visible).
Speculative prefetching + tlb instantiation, apparently no.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 15:57 ` Hugh Dickins
@ 2006-04-26 16:12 ` Keir Fraser
0 siblings, 0 replies; 14+ messages in thread
From: Keir Fraser @ 2006-04-26 16:12 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Zachary Amsden, Jan Beulich, linux-kernel
On 26 Apr 2006, at 16:57, Hugh Dickins wrote:
>> Proposed fix for ptep_get_and_clear_full PAE bug. Pte_clear had the
>> same
>> bug, so use the same fix for both.
>
> You need to expand that comment with text from Jan & Keir's patch:
> Andrew will want to know just what this bug was. The patch looks
> good to me, except for the unnecessary do { } while (0) in the
> definition of pte_clear: ah, you're only copying what was already
> there, can't blame you for that, let's not worry about it.
I agree: the patch comment is rather brief, but the patch itself looks
good. I suppose switching PAE's set_pte() from smp_wmb() to wmb() would
logically belong in a separate patch.
-- Keir
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH] i386: PAE entries must have their low word cleared first
@ 2006-04-26 22:11 Brunner, Richard
2006-04-26 22:22 ` Zachary Amsden
0 siblings, 1 reply; 14+ messages in thread
From: Brunner, Richard @ 2006-04-26 22:11 UTC (permalink / raw)
To: Nick Piggin, Keir Fraser
Cc: Hugh Dickins, Jan Beulich, Zachary Amsden, linux-kernel
> -----Original Message-----
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of
> Nick Piggin wrote
>
> Keir Fraser wrote:
> >
> > [snip]
> >
> > In more detail the problem is that, since we're still running on the
> > page tables while clearing them, the CPU may choose to prefetch a
> > half-cleared pte into its TLB, and then execute speculative memory
> > accesses based on that mapping (including ones that may write-allocate
> > cachelines, leading to problems like the AMD AGP GART deadlock Linux had
> > a year or so back).
>
> What do you mean, you're still running on the page tables? The CPU can
> still walk the pagetables?
>
> Because if ptep_get_and_clear_full is passed non zero in the full
> argument, then that specific translation should never see another
> access. I didn't know CPUs now actually resolve TLB misses as part of
> speculative prefetching... does this really happen?
Yes, on AMD processors, speculative execution is allowed to install TLB translations. Assuming you are not in 64-bit mode, and can not use
mmx or sse instructions to zero out the naturally aligned 64-bit pte,
then what is being suggested makes sense: clear out the valid bit
first by writing the low 32-bits. No invalid translations are ever
installed in the TLB.
> Do you have a pointer to where the AMD AGP GART deadlock was
> discussed? Google is having some trouble finding it.
http://marc.theaimsgroup.com/?l=linux-kernel&m=102376926732464&w=2
> > The barrier is needed to ensure that the bottom half is cleared before
> > the top half. In fact it probably ought to be wmb() --
> that's clearer in intent and actually reduces to barrier() on all PAE capable platforms.
Maybe the barrier is needed for other architectures, but two writes
to WB memory are not going to happen out of order and so no
barrier is needed on x86 to the best of my knowledge.
-Rich Brunner, AMD
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 22:11 [PATCH] i386: PAE entries must have their low word cleared first Brunner, Richard
@ 2006-04-26 22:22 ` Zachary Amsden
2006-04-26 22:27 ` Zachary Amsden
0 siblings, 1 reply; 14+ messages in thread
From: Zachary Amsden @ 2006-04-26 22:22 UTC (permalink / raw)
To: Brunner, Richard
Cc: Nick Piggin, Keir Fraser, Hugh Dickins, Jan Beulich, linux-kernel
Brunner, Richard wrote:
>
>
> Maybe the barrier is needed for other architectures, but two writes
> to WB memory are not going to happen out of order and so no
> barrier is needed on x86 to the best of my knowledge.
>
The barrier here is just a compiler barrier - wmb on x86 is just asm
volatile ("" ::: "memory"); This is needed to stop gcc reordering the
stores - not because the processor does respect them.
Zach
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 22:22 ` Zachary Amsden
@ 2006-04-26 22:27 ` Zachary Amsden
0 siblings, 0 replies; 14+ messages in thread
From: Zachary Amsden @ 2006-04-26 22:27 UTC (permalink / raw)
To: Zachary Amsden
Cc: Brunner, Richard, Nick Piggin, Keir Fraser, Hugh Dickins,
Jan Beulich, linux-kernel
Zachary Amsden wrote:
> Brunner, Richard wrote:
>>
>> Maybe the barrier is needed for other architectures, but two writes
>> to WB memory are not going to happen out of order and so no
>> barrier is needed on x86 to the best of my knowledge.
>>
>
> The barrier here is just a compiler barrier - wmb on x86 is just asm
> volatile ("" ::: "memory"); This is needed to stop gcc reordering the
> stores - not because the processor does respect them.
Please forgive my English - and avoiding the double negative - "because
the processor _does_ respect them". And thanks for confirming the
possibility of this bug.
Zach
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-26 15:58 ` Nick Piggin
@ 2006-04-27 10:27 ` Sonny Rao
2006-04-27 13:39 ` Nick Piggin
0 siblings, 1 reply; 14+ messages in thread
From: Sonny Rao @ 2006-04-27 10:27 UTC (permalink / raw)
To: Nick Piggin
Cc: Keir Fraser, Hugh Dickins, Jan Beulich, Zachary Amsden,
linux-kernel, anton
On Thu, Apr 27, 2006 at 01:58:42AM +1000, Nick Piggin wrote:
> Keir Fraser wrote:
> >
> >On 26 Apr 2006, at 15:46, Hugh Dickins wrote:
> >
> >>If that's so (I don't trust my judgement on matters of speculative
> >>execution), then I think you'd do better to replace the *ptep = __pte(0)
> >>by pte_clear(mm, addr, ptep), and so avoid your ugly #ifdef'ing: please
> >>check, but I think you'll find that reduces to just the barrier you want.
> >>CC'ed Zach since it's his optimization, and he'll judge that spexecution.
> >
> >
> >In more detail the problem is that, since we're still running on the
> >page tables while clearing them, the CPU may choose to prefetch a
> >half-cleared pte into its TLB, and then execute speculative memory
> >accesses based on that mapping (including ones that may write-allocate
> >cachelines, leading to problems like the AMD AGP GART deadlock Linux had
> >a year or so back).
>
> What do you mean, you're still running on the page tables? The CPU can
> still walk the pagetables?
>
> Because if ptep_get_and_clear_full is passed non zero in the full
> argument, then that specific translation should never see another
> access. I didn't know CPUs now actually resolve TLB misses as part of
> speculative prefetching... does this really happen?
For instance, during speculative execution on POWER, we can take a
TLB miss for a speculative load and start a table-walk.
I'm not sure what "speculative prefetching" means in this case... just
regular hardware-initiated prefetching (where I suppose one could use the
modifier "speculative") on POWER will only prefetch to a page-boundary.
So, slightly OT, as this is not about x86 CPUs... but thought people
might be interested.
(Added Anton to CC to call any BS on my part :-)
Sonny
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first
2006-04-27 10:27 ` Sonny Rao
@ 2006-04-27 13:39 ` Nick Piggin
0 siblings, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2006-04-27 13:39 UTC (permalink / raw)
To: Sonny Rao
Cc: Keir Fraser, Hugh Dickins, Jan Beulich, Zachary Amsden,
linux-kernel, anton
Sonny Rao wrote:
> On Thu, Apr 27, 2006 at 01:58:42AM +1000, Nick Piggin wrote:
>
>>Keir Fraser wrote:
>>
>>>In more detail the problem is that, since we're still running on the
>>>page tables while clearing them, the CPU may choose to prefetch a
>>>half-cleared pte into its TLB, and then execute speculative memory
>>>accesses based on that mapping (including ones that may write-allocate
>>>cachelines, leading to problems like the AMD AGP GART deadlock Linux had
>>>a year or so back).
>>
>>What do you mean, you're still running on the page tables? The CPU can
>>still walk the pagetables?
>>
>>Because if ptep_get_and_clear_full is passed non zero in the full
>>argument, then that specific translation should never see another
>>access. I didn't know CPUs now actually resolve TLB misses as part of
>>speculative prefetching... does this really happen?
>
>
> For instance, during speculative execution on POWER, we can take a
> TLB miss for a speculative load and start a table-walk.
>
> I'm not sure what "speculative prefetching" means in this case... just
> regular hardware-initiated prefetching (where I suppose one could use the
> modifier "speculative") on POWER will only prefetch to a page-boundary.
speculative prefetching / hardware prefetching...
I believe this bug wouldn't happen as a result of speculative execution,
because a load from this virtual address should never be in the instruction
stream. So it would require some hardware initiated prefetch to establish
the tlb entry.
>
> So, slightly OT, as this is not about x86 CPUs... but thought people
> might be interested.
Thanks!
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2006-04-27 15:15 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-26 22:11 [PATCH] i386: PAE entries must have their low word cleared first Brunner, Richard
2006-04-26 22:22 ` Zachary Amsden
2006-04-26 22:27 ` Zachary Amsden
-- strict thread matches above, loose matches on Subject: below --
2006-04-26 13:46 Jan Beulich
2006-04-26 14:46 ` Hugh Dickins
2006-04-26 15:06 ` Keir Fraser
2006-04-26 15:44 ` Zachary Amsden
2006-04-26 15:57 ` Hugh Dickins
2006-04-26 16:12 ` Keir Fraser
2006-04-26 15:45 ` Hugh Dickins
2006-04-26 16:06 ` Nick Piggin
2006-04-26 15:58 ` Nick Piggin
2006-04-27 10:27 ` Sonny Rao
2006-04-27 13:39 ` Nick Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).