* RE: [PATCH] i386: PAE entries must have their low word cleared first @ 2006-04-26 22:11 Brunner, Richard 2006-04-26 22:22 ` Zachary Amsden 0 siblings, 1 reply; 14+ messages in thread From: Brunner, Richard @ 2006-04-26 22:11 UTC (permalink / raw) To: Nick Piggin, Keir Fraser Cc: Hugh Dickins, Jan Beulich, Zachary Amsden, linux-kernel > -----Original Message----- > [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of > Nick Piggin wrote > > Keir Fraser wrote: > > > > [snip] > > > > In more detail the problem is that, since we're still running on the > > page tables while clearing them, the CPU may choose to prefetch a > > half-cleared pte into its TLB, and then execute speculative memory > > accesses based on that mapping (including ones that may write-allocate > > cachelines, leading to problems like the AMD AGP GART deadlock Linux had > > a year or so back). > > What do you mean, you're still running on the page tables? The CPU can > still walk the pagetables? > > Because if ptep_get_and_clear_full is passed non zero in the full > argument, then that specific translation should never see another > access. I didn't know CPUs now actually resolve TLB misses as part of > speculative prefetching... does this really happen? Yes, on AMD processors, speculative execution is allowed to install TLB translations. Assuming you are not in 64-bit mode, and can not use mmx or sse instructions to zero out the naturally aligned 64-bit pte, then what is being suggested makes sense: clear out the valid bit first by writing the low 32-bits. No invalid translations are ever installed in the TLB. > Do you have a pointer to where the AMD AGP GART deadlock was > discussed? Google is having some trouble finding it. http://marc.theaimsgroup.com/?l=linux-kernel&m=102376926732464&w=2 > > The barrier is needed to ensure that the bottom half is cleared before > > the top half. In fact it probably ought to be wmb() -- > that's clearer in intent and actually reduces to barrier() on all PAE capable platforms. Maybe the barrier is needed for other architectures, but two writes to WB memory are not going to happen out of order and so no barrier is needed on x86 to the best of my knowledge. -Rich Brunner, AMD ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 22:11 [PATCH] i386: PAE entries must have their low word cleared first Brunner, Richard @ 2006-04-26 22:22 ` Zachary Amsden 2006-04-26 22:27 ` Zachary Amsden 0 siblings, 1 reply; 14+ messages in thread From: Zachary Amsden @ 2006-04-26 22:22 UTC (permalink / raw) To: Brunner, Richard Cc: Nick Piggin, Keir Fraser, Hugh Dickins, Jan Beulich, linux-kernel Brunner, Richard wrote: > > > Maybe the barrier is needed for other architectures, but two writes > to WB memory are not going to happen out of order and so no > barrier is needed on x86 to the best of my knowledge. > The barrier here is just a compiler barrier - wmb on x86 is just asm volatile ("" ::: "memory"); This is needed to stop gcc reordering the stores - not because the processor does respect them. Zach ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 22:22 ` Zachary Amsden @ 2006-04-26 22:27 ` Zachary Amsden 0 siblings, 0 replies; 14+ messages in thread From: Zachary Amsden @ 2006-04-26 22:27 UTC (permalink / raw) To: Zachary Amsden Cc: Brunner, Richard, Nick Piggin, Keir Fraser, Hugh Dickins, Jan Beulich, linux-kernel Zachary Amsden wrote: > Brunner, Richard wrote: >> >> Maybe the barrier is needed for other architectures, but two writes >> to WB memory are not going to happen out of order and so no >> barrier is needed on x86 to the best of my knowledge. >> > > The barrier here is just a compiler barrier - wmb on x86 is just asm > volatile ("" ::: "memory"); This is needed to stop gcc reordering the > stores - not because the processor does respect them. Please forgive my English - and avoiding the double negative - "because the processor _does_ respect them". And thanks for confirming the possibility of this bug. Zach ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] i386: PAE entries must have their low word cleared first
@ 2006-04-26 13:46 Jan Beulich
2006-04-26 14:46 ` Hugh Dickins
0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2006-04-26 13:46 UTC (permalink / raw)
To: linux-kernel; +Cc: Keir Fraser
During Xen development we noticed that writes to 64-bit variables may get
coded as high-word-before-low-word writes by gcc. Consequently, when
invalidating a PAE page table entry, the low word must be explicitly written
first, as otherwise, as pointed out by Keir, speculative execution may result
in a partially modified (and still valid) translation to be used, which has,
according to the specification, the potential of dead-locking the machine.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
diff -Npru /home/jbeulich/tmp/linux-2.6.17-rc2/include/asm-i386/pgtable.h
2.6.17-rc2-i386-pae-ptep_get_and_clear_full/include/asm-i386/pgtable.h
--- /home/jbeulich/tmp/linux-2.6.17-rc2/include/asm-i386/pgtable.h 2006-04-26 10:56:03.000000000 +0200
+++ 2.6.17-rc2-i386-pae-ptep_get_and_clear_full/include/asm-i386/pgtable.h 2006-04-24 12:28:37.000000000 +0200
@@ -268,7 +268,15 @@ static inline pte_t ptep_get_and_clear_f
pte_t pte;
if (full) {
pte = *ptep;
+#ifdef CONFIG_X86_PAE
+ /* Cannot do this in a single step, as the compiler
+ may issue the two stores in either order. */
+ ptep->pte_low = 0;
+ barrier();
+ ptep->pte_high = 0;
+#else
*ptep = __pte(0);
+#endif
} else {
pte = ptep_get_and_clear(mm, addr, ptep);
}
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 13:46 Jan Beulich @ 2006-04-26 14:46 ` Hugh Dickins 2006-04-26 15:06 ` Keir Fraser 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2006-04-26 14:46 UTC (permalink / raw) To: Jan Beulich; +Cc: linux-kernel, Keir Fraser, Zachary Amsden On Wed, 26 Apr 2006, Jan Beulich wrote: > During Xen development we noticed that writes to 64-bit variables may get > coded as high-word-before-low-word writes by gcc. Consequently, when > invalidating a PAE page table entry, the low word must be explicitly written > first, as otherwise, as pointed out by Keir, speculative execution may result > in a partially modified (and still valid) translation to be used, which has, > according to the specification, the potential of dead-locking the machine. If that's so (I don't trust my judgement on matters of speculative execution), then I think you'd do better to replace the *ptep = __pte(0) by pte_clear(mm, addr, ptep), and so avoid your ugly #ifdef'ing: please check, but I think you'll find that reduces to just the barrier you want. CC'ed Zach since it's his optimization, and he'll judge that spexecution. Hugh > Signed-off-by: Jan Beulich <jbeulich@novell.com> > Cc: Keir Fraser <Keir.Fraser@cl.cam.ac.uk> > > diff -Npru /home/jbeulich/tmp/linux-2.6.17-rc2/include/asm-i386/pgtable.h > 2.6.17-rc2-i386-pae-ptep_get_and_clear_full/include/asm-i386/pgtable.h > --- /home/jbeulich/tmp/linux-2.6.17-rc2/include/asm-i386/pgtable.h 2006-04-26 10:56:03.000000000 +0200 > +++ 2.6.17-rc2-i386-pae-ptep_get_and_clear_full/include/asm-i386/pgtable.h 2006-04-24 12:28:37.000000000 +0200 > @@ -268,7 +268,15 @@ static inline pte_t ptep_get_and_clear_f > pte_t pte; > if (full) { > pte = *ptep; > +#ifdef CONFIG_X86_PAE > + /* Cannot do this in a single step, as the compiler > + may issue the two stores in either order. */ > + ptep->pte_low = 0; > + barrier(); > + ptep->pte_high = 0; > +#else > *ptep = __pte(0); > +#endif > } else { > pte = ptep_get_and_clear(mm, addr, ptep); > } > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 14:46 ` Hugh Dickins @ 2006-04-26 15:06 ` Keir Fraser 2006-04-26 15:44 ` Zachary Amsden ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Keir Fraser @ 2006-04-26 15:06 UTC (permalink / raw) To: Hugh Dickins; +Cc: Jan Beulich, Zachary Amsden, linux-kernel On 26 Apr 2006, at 15:46, Hugh Dickins wrote: > If that's so (I don't trust my judgement on matters of speculative > execution), then I think you'd do better to replace the *ptep = > __pte(0) > by pte_clear(mm, addr, ptep), and so avoid your ugly #ifdef'ing: please > check, but I think you'll find that reduces to just the barrier you > want. > CC'ed Zach since it's his optimization, and he'll judge that > spexecution. In more detail the problem is that, since we're still running on the page tables while clearing them, the CPU may choose to prefetch a half-cleared pte into its TLB, and then execute speculative memory accesses based on that mapping (including ones that may write-allocate cachelines, leading to problems like the AMD AGP GART deadlock Linux had a year or so back). The barrier is needed to ensure that the bottom half is cleared before the top half. In fact it probably ought to be wmb() -- that's clearer in intent and actually reduces to barrier() on all PAE-capable platforms. We cannot use pte_clear() unless we redefine it for PAE. Currently it reduces to set_pte() which explicitly uses the wrong ordering (sets high *then* low, because it's normally used to introduce a mapping). Also, I think wmb() should be used in PAE's set_pte(), rather than the current smp_wmb(). They reduce to the same thing, but wmb() makes it clear this is is not an issue specific to SMP systems. -- Keir ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 15:06 ` Keir Fraser @ 2006-04-26 15:44 ` Zachary Amsden 2006-04-26 15:57 ` Hugh Dickins 2006-04-26 15:45 ` Hugh Dickins 2006-04-26 15:58 ` Nick Piggin 2 siblings, 1 reply; 14+ messages in thread From: Zachary Amsden @ 2006-04-26 15:44 UTC (permalink / raw) To: Keir Fraser; +Cc: Hugh Dickins, Jan Beulich, linux-kernel [-- Attachment #1: Type: text/plain, Size: 656 bytes --] Keir Fraser wrote: > > We cannot use pte_clear() unless we redefine it for PAE. Currently it > reduces to set_pte() which explicitly uses the wrong ordering (sets > high *then* low, because it's normally used to introduce a mapping). Yes, that means pte_clear() has the same bug. Why don't we redefine pte_clear for PAE? Then the ifdef can go away. > Also, I think wmb() should be used in PAE's set_pte(), rather than the > current smp_wmb(). They reduce to the same thing, but wmb() makes it > clear this is is not an issue specific to SMP systems. I agree with that. The problem is not related to SMP at all. I would propose this approach. [-- Attachment #2: fix-pte-clear --] [-- Type: text/plain, Size: 2468 bytes --] Proposed fix for ptep_get_and_clear_full PAE bug. Pte_clear had the same bug, so use the same fix for both. Signed-off-by: Zachary Amsden <zach@vmware.com> Index: linux-2.6.17-rc/include/asm-i386/pgtable.h =================================================================== --- linux-2.6.17-rc.orig/include/asm-i386/pgtable.h 2006-04-25 15:41:06.000000000 -0700 +++ linux-2.6.17-rc/include/asm-i386/pgtable.h 2006-04-26 08:39:42.000000000 -0700 @@ -204,7 +204,6 @@ extern unsigned long long __PAGE_KERNEL, extern unsigned long pg0[]; #define pte_present(x) ((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE)) -#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0) /* To avoid harmful races, pmd_none(x) should check only the lower when PAE */ #define pmd_none(x) (!(unsigned long)pmd_val(x)) @@ -268,7 +267,7 @@ static inline pte_t ptep_get_and_clear_f pte_t pte; if (full) { pte = *ptep; - *ptep = __pte(0); + pte_clear(mm, addr, ptep); } else { pte = ptep_get_and_clear(mm, addr, ptep); } Index: linux-2.6.17-rc/include/asm-i386/pgtable-3level.h =================================================================== --- linux-2.6.17-rc.orig/include/asm-i386/pgtable-3level.h 2006-04-25 15:41:06.000000000 -0700 +++ linux-2.6.17-rc/include/asm-i386/pgtable-3level.h 2006-04-26 08:38:57.000000000 -0700 @@ -85,6 +85,13 @@ static inline void pud_clear (pud_t * pu #define pmd_offset(pud, address) ((pmd_t *) pud_page(*(pud)) + \ pmd_index(address)) +static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) +{ + ptep->pte_low = 0; + wmb(); + ptep->pte_high = 0; +} + static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { pte_t res; Index: linux-2.6.17-rc/include/asm-i386/pgtable-2level.h =================================================================== --- linux-2.6.17-rc.orig/include/asm-i386/pgtable-2level.h 2006-04-25 15:41:06.000000000 -0700 +++ linux-2.6.17-rc/include/asm-i386/pgtable-2level.h 2006-04-26 08:37:34.000000000 -0700 @@ -18,6 +18,8 @@ #define set_pte_atomic(pteptr, pteval) set_pte(pteptr,pteval) #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval)) +#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0) + #define ptep_get_and_clear(mm,addr,xp) __pte(xchg(&(xp)->pte_low, 0)) #define pte_same(a, b) ((a).pte_low == (b).pte_low) #define pte_page(x) pfn_to_page(pte_pfn(x)) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 15:44 ` Zachary Amsden @ 2006-04-26 15:57 ` Hugh Dickins 2006-04-26 16:12 ` Keir Fraser 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2006-04-26 15:57 UTC (permalink / raw) To: Zachary Amsden; +Cc: Keir Fraser, Jan Beulich, linux-kernel On Wed, 26 Apr 2006, Zachary Amsden wrote: > Keir Fraser wrote: > > > > We cannot use pte_clear() unless we redefine it for PAE. Currently it > > reduces to set_pte() which explicitly uses the wrong ordering (sets high > > *then* low, because it's normally used to introduce a mapping). > > Yes, that means pte_clear() has the same bug. Why don't we redefine pte_clear > for PAE? Then the ifdef can go away. Good, we agree. > > Also, I think wmb() should be used in PAE's set_pte(), rather than the > > current smp_wmb(). They reduce to the same thing, but wmb() makes it clear > > this is is not an issue specific to SMP systems. > > I agree with that. The problem is not related to SMP at all. I would propose > this approach. Attachment included... > Proposed fix for ptep_get_and_clear_full PAE bug. Pte_clear had the same > bug, so use the same fix for both. You need to expand that comment with text from Jan & Keir's patch: Andrew will want to know just what this bug was. The patch looks good to me, except for the unnecessary do { } while (0) in the definition of pte_clear: ah, you're only copying what was already there, can't blame you for that, let's not worry about it. Hugh > Signed-off-by: Zachary Amsden <zach@vmware.com> > > Index: linux-2.6.17-rc/include/asm-i386/pgtable.h > =================================================================== > --- linux-2.6.17-rc.orig/include/asm-i386/pgtable.h 2006-04-25 15:41:06.000000000 -0700 > +++ linux-2.6.17-rc/include/asm-i386/pgtable.h 2006-04-26 08:39:42.000000000 -0700 > @@ -204,7 +204,6 @@ extern unsigned long long __PAGE_KERNEL, > extern unsigned long pg0[]; > > #define pte_present(x) ((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE)) > -#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0) > > /* To avoid harmful races, pmd_none(x) should check only the lower when PAE */ > #define pmd_none(x) (!(unsigned long)pmd_val(x)) > @@ -268,7 +267,7 @@ static inline pte_t ptep_get_and_clear_f > pte_t pte; > if (full) { > pte = *ptep; > - *ptep = __pte(0); > + pte_clear(mm, addr, ptep); > } else { > pte = ptep_get_and_clear(mm, addr, ptep); > } > Index: linux-2.6.17-rc/include/asm-i386/pgtable-3level.h > =================================================================== > --- linux-2.6.17-rc.orig/include/asm-i386/pgtable-3level.h 2006-04-25 15:41:06.000000000 -0700 > +++ linux-2.6.17-rc/include/asm-i386/pgtable-3level.h 2006-04-26 08:38:57.000000000 -0700 > @@ -85,6 +85,13 @@ static inline void pud_clear (pud_t * pu > #define pmd_offset(pud, address) ((pmd_t *) pud_page(*(pud)) + \ > pmd_index(address)) > > +static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) > +{ > + ptep->pte_low = 0; > + wmb(); > + ptep->pte_high = 0; > +} > + > static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) > { > pte_t res; > Index: linux-2.6.17-rc/include/asm-i386/pgtable-2level.h > =================================================================== > --- linux-2.6.17-rc.orig/include/asm-i386/pgtable-2level.h 2006-04-25 15:41:06.000000000 -0700 > +++ linux-2.6.17-rc/include/asm-i386/pgtable-2level.h 2006-04-26 08:37:34.000000000 -0700 > @@ -18,6 +18,8 @@ > #define set_pte_atomic(pteptr, pteval) set_pte(pteptr,pteval) > #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval)) > > +#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0) > + > #define ptep_get_and_clear(mm,addr,xp) __pte(xchg(&(xp)->pte_low, 0)) > #define pte_same(a, b) ((a).pte_low == (b).pte_low) > #define pte_page(x) pfn_to_page(pte_pfn(x)) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 15:57 ` Hugh Dickins @ 2006-04-26 16:12 ` Keir Fraser 0 siblings, 0 replies; 14+ messages in thread From: Keir Fraser @ 2006-04-26 16:12 UTC (permalink / raw) To: Hugh Dickins; +Cc: Zachary Amsden, Jan Beulich, linux-kernel On 26 Apr 2006, at 16:57, Hugh Dickins wrote: >> Proposed fix for ptep_get_and_clear_full PAE bug. Pte_clear had the >> same >> bug, so use the same fix for both. > > You need to expand that comment with text from Jan & Keir's patch: > Andrew will want to know just what this bug was. The patch looks > good to me, except for the unnecessary do { } while (0) in the > definition of pte_clear: ah, you're only copying what was already > there, can't blame you for that, let's not worry about it. I agree: the patch comment is rather brief, but the patch itself looks good. I suppose switching PAE's set_pte() from smp_wmb() to wmb() would logically belong in a separate patch. -- Keir ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 15:06 ` Keir Fraser 2006-04-26 15:44 ` Zachary Amsden @ 2006-04-26 15:45 ` Hugh Dickins 2006-04-26 16:06 ` Nick Piggin 2006-04-26 15:58 ` Nick Piggin 2 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2006-04-26 15:45 UTC (permalink / raw) To: Keir Fraser; +Cc: Jan Beulich, Zachary Amsden, linux-kernel On Wed, 26 Apr 2006, Keir Fraser wrote: > > We cannot use pte_clear() unless we redefine it for PAE. Currently it reduces > to set_pte() which explicitly uses the wrong ordering (sets high *then* low, > because it's normally used to introduce a mapping). I overlooked that reversal completely. What a very good point. I think that actually pte_clear() _does_ need to be redefined for PAE, to reverse that ordering as you point out. Take a look at its use in mm/highmem.c (where a comment states it's safe against speculative execution, but a comment can't guarantee that!): what do you think? Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 15:45 ` Hugh Dickins @ 2006-04-26 16:06 ` Nick Piggin 0 siblings, 0 replies; 14+ messages in thread From: Nick Piggin @ 2006-04-26 16:06 UTC (permalink / raw) To: Hugh Dickins; +Cc: Keir Fraser, Jan Beulich, Zachary Amsden, linux-kernel Hugh Dickins wrote: > On Wed, 26 Apr 2006, Keir Fraser wrote: > >>We cannot use pte_clear() unless we redefine it for PAE. Currently it reduces >>to set_pte() which explicitly uses the wrong ordering (sets high *then* low, >>because it's normally used to introduce a mapping). > > > I overlooked that reversal completely. What a very good point. > I think that actually pte_clear() _does_ need to be redefined for PAE, > to reverse that ordering as you point out. Take a look at its use in > mm/highmem.c (where a comment states it's safe against speculative > execution, but a comment can't guarantee that!): what do you think? Speculative execution is safe I think (and so is ptep_get_and_clear_full, because in neither case will the virtual address be visible). Speculative prefetching + tlb instantiation, apparently no. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 15:06 ` Keir Fraser 2006-04-26 15:44 ` Zachary Amsden 2006-04-26 15:45 ` Hugh Dickins @ 2006-04-26 15:58 ` Nick Piggin 2006-04-27 10:27 ` Sonny Rao 2 siblings, 1 reply; 14+ messages in thread From: Nick Piggin @ 2006-04-26 15:58 UTC (permalink / raw) To: Keir Fraser; +Cc: Hugh Dickins, Jan Beulich, Zachary Amsden, linux-kernel Keir Fraser wrote: > > On 26 Apr 2006, at 15:46, Hugh Dickins wrote: > >> If that's so (I don't trust my judgement on matters of speculative >> execution), then I think you'd do better to replace the *ptep = __pte(0) >> by pte_clear(mm, addr, ptep), and so avoid your ugly #ifdef'ing: please >> check, but I think you'll find that reduces to just the barrier you want. >> CC'ed Zach since it's his optimization, and he'll judge that spexecution. > > > In more detail the problem is that, since we're still running on the > page tables while clearing them, the CPU may choose to prefetch a > half-cleared pte into its TLB, and then execute speculative memory > accesses based on that mapping (including ones that may write-allocate > cachelines, leading to problems like the AMD AGP GART deadlock Linux had > a year or so back). What do you mean, you're still running on the page tables? The CPU can still walk the pagetables? Because if ptep_get_and_clear_full is passed non zero in the full argument, then that specific translation should never see another access. I didn't know CPUs now actually resolve TLB misses as part of speculative prefetching... does this really happen? Do you have a pointer to where the AMD AGP GART deadlock was discussed? Google is having some trouble finding it. > > The barrier is needed to ensure that the bottom half is cleared before > the top half. In fact it probably ought to be wmb() -- that's clearer in > intent and actually reduces to barrier() on all PAE-capable platforms. I think barrier is correct there. The full argument means that the mm is being torn down and we are the last thread in the mm. So these pagetables can't be visible to other CPUs, can they? > > We cannot use pte_clear() unless we redefine it for PAE. Currently it > reduces to set_pte() which explicitly uses the wrong ordering (sets high > *then* low, because it's normally used to introduce a mapping). Presumably you would redefine it for PAE also (in this case you may actually need an smp_wmb rather than just barrier). Unless you can explain why this is a problem and not the other pte_clear callers. > > Also, I think wmb() should be used in PAE's set_pte(), rather than the > current smp_wmb(). They reduce to the same thing, but wmb() makes it > clear this is is not an issue specific to SMP systems. That is what smp_wmb() means. wmb() means it should also order io memory. Clear as mud? -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-26 15:58 ` Nick Piggin @ 2006-04-27 10:27 ` Sonny Rao 2006-04-27 13:39 ` Nick Piggin 0 siblings, 1 reply; 14+ messages in thread From: Sonny Rao @ 2006-04-27 10:27 UTC (permalink / raw) To: Nick Piggin Cc: Keir Fraser, Hugh Dickins, Jan Beulich, Zachary Amsden, linux-kernel, anton On Thu, Apr 27, 2006 at 01:58:42AM +1000, Nick Piggin wrote: > Keir Fraser wrote: > > > >On 26 Apr 2006, at 15:46, Hugh Dickins wrote: > > > >>If that's so (I don't trust my judgement on matters of speculative > >>execution), then I think you'd do better to replace the *ptep = __pte(0) > >>by pte_clear(mm, addr, ptep), and so avoid your ugly #ifdef'ing: please > >>check, but I think you'll find that reduces to just the barrier you want. > >>CC'ed Zach since it's his optimization, and he'll judge that spexecution. > > > > > >In more detail the problem is that, since we're still running on the > >page tables while clearing them, the CPU may choose to prefetch a > >half-cleared pte into its TLB, and then execute speculative memory > >accesses based on that mapping (including ones that may write-allocate > >cachelines, leading to problems like the AMD AGP GART deadlock Linux had > >a year or so back). > > What do you mean, you're still running on the page tables? The CPU can > still walk the pagetables? > > Because if ptep_get_and_clear_full is passed non zero in the full > argument, then that specific translation should never see another > access. I didn't know CPUs now actually resolve TLB misses as part of > speculative prefetching... does this really happen? For instance, during speculative execution on POWER, we can take a TLB miss for a speculative load and start a table-walk. I'm not sure what "speculative prefetching" means in this case... just regular hardware-initiated prefetching (where I suppose one could use the modifier "speculative") on POWER will only prefetch to a page-boundary. So, slightly OT, as this is not about x86 CPUs... but thought people might be interested. (Added Anton to CC to call any BS on my part :-) Sonny ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386: PAE entries must have their low word cleared first 2006-04-27 10:27 ` Sonny Rao @ 2006-04-27 13:39 ` Nick Piggin 0 siblings, 0 replies; 14+ messages in thread From: Nick Piggin @ 2006-04-27 13:39 UTC (permalink / raw) To: Sonny Rao Cc: Keir Fraser, Hugh Dickins, Jan Beulich, Zachary Amsden, linux-kernel, anton Sonny Rao wrote: > On Thu, Apr 27, 2006 at 01:58:42AM +1000, Nick Piggin wrote: > >>Keir Fraser wrote: >> >>>In more detail the problem is that, since we're still running on the >>>page tables while clearing them, the CPU may choose to prefetch a >>>half-cleared pte into its TLB, and then execute speculative memory >>>accesses based on that mapping (including ones that may write-allocate >>>cachelines, leading to problems like the AMD AGP GART deadlock Linux had >>>a year or so back). >> >>What do you mean, you're still running on the page tables? The CPU can >>still walk the pagetables? >> >>Because if ptep_get_and_clear_full is passed non zero in the full >>argument, then that specific translation should never see another >>access. I didn't know CPUs now actually resolve TLB misses as part of >>speculative prefetching... does this really happen? > > > For instance, during speculative execution on POWER, we can take a > TLB miss for a speculative load and start a table-walk. > > I'm not sure what "speculative prefetching" means in this case... just > regular hardware-initiated prefetching (where I suppose one could use the > modifier "speculative") on POWER will only prefetch to a page-boundary. speculative prefetching / hardware prefetching... I believe this bug wouldn't happen as a result of speculative execution, because a load from this virtual address should never be in the instruction stream. So it would require some hardware initiated prefetch to establish the tlb entry. > > So, slightly OT, as this is not about x86 CPUs... but thought people > might be interested. Thanks! -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2006-04-27 15:15 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-04-26 22:11 [PATCH] i386: PAE entries must have their low word cleared first Brunner, Richard 2006-04-26 22:22 ` Zachary Amsden 2006-04-26 22:27 ` Zachary Amsden -- strict thread matches above, loose matches on Subject: below -- 2006-04-26 13:46 Jan Beulich 2006-04-26 14:46 ` Hugh Dickins 2006-04-26 15:06 ` Keir Fraser 2006-04-26 15:44 ` Zachary Amsden 2006-04-26 15:57 ` Hugh Dickins 2006-04-26 16:12 ` Keir Fraser 2006-04-26 15:45 ` Hugh Dickins 2006-04-26 16:06 ` Nick Piggin 2006-04-26 15:58 ` Nick Piggin 2006-04-27 10:27 ` Sonny Rao 2006-04-27 13:39 ` Nick Piggin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).