* [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault
@ 2016-05-27 6:26 Aneesh Kumar K.V
0 siblings, 0 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2016-05-27 6:26 UTC (permalink / raw)
To: benh, paulus, mpe, Hugh Dickins; +Cc: linuxppc-dev, Aneesh Kumar K.V
When we converted the asm routines to C functions, we missed updating
HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower
bits from pte via.
andi. r3,r30,0x1fe /* Get basic set of flags */
Fixes: 'commit 89ff725051d1 ("powerpc/mm: Convert __hash_page_64K to C")'
We also don't set C bit always with this patch. This was added by
'commit c5cf0e30bf3d8 ("[PATCH] powerpc: Fix buglet with MMU hash management")'
With hash64, we need to make sure that hardware doesn't do a pte update
directly. This is because we do end up with entries in TLB with no hash
page table entry. This happens because when we find hash bucket full,
we "evict" a more/less random entry from it. When we do that we don't
invalidate the TLB (hpte_remove) because we assume the old translation
is still technically "valid". For more info look at
'commit 0608d692463(" powerpc/mm: Always invalidate tlb on hpte invalidate and
update")'. Now that implies that hardware should never do a writeback to
update 'R' or 'C' hpte bits.
Commitc 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always.
We don't really need to do that because we never map a RW pte entry
without setting 'C' bit. on READ fault on a RW pte entry, we still map
it READ only, hence a store update in the page will still cause a hash
pte fault.
This patch reverts the 'C' update part of the c5cf0e30bf3d8
("[PATCH] powerpc: Fix buglet with MMU hash management") and retain
the updatepp part.
- If we hit the updatepp path on native, the old code without that
commit, would fail to set C bcause native_hpte_updatepp()
was implemented to filter the same bits as H_PROTECT and not let C
through thus we would "upgrade" a RO HPTE to RW without setting C
thus causing the bug. So the real fix in that commit was the change
to native_hpte_updatepp
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/mm/hash_utils_64.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 9c594ea99149..76fa5f3eb450 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -159,6 +159,19 @@ static struct mmu_psize_def mmu_psize_defaults_gp[] = {
},
};
+/*
+ * 'R' and 'C' update notes:
+ * - Under pHyp or KVM, the updatepp path will not set C, thus it *will*
+ * create writeable HPTEs without C set, because the hcall H_PROTECT
+ * that we use in that case will not update C
+ * - The above is however not a problem, because we also don't do that
+ * fancy "no flush" variant of eviction and we use H_REMOVE which will
+ * do the right thing and thus we don't have the race I described earlier
+ *
+ * - Under bare metal, we do have the race, so we need R and C set
+ * - We make sure R is always set and never lost
+ * - C is _PAGE_DIRTY, and *should* always be set for a writeable mapping
+ */
unsigned long htab_convert_pte_flags(unsigned long pteflags)
{
unsigned long rflags = 0;
@@ -186,9 +199,14 @@ unsigned long htab_convert_pte_flags(unsigned long pteflags)
rflags |= 0x1;
}
/*
- * Always add "C" bit for perf. Memory coherence is always enabled
+ * We can't allow hardware to update hpte bits. Hence always
+ * set 'R' bit and set 'C' if it is a write fault
+ * Memory coherence is always enabled
*/
- rflags |= HPTE_R_C | HPTE_R_M;
+ rflags |= HPTE_R_R | HPTE_R_M;
+
+ if (pteflags & _PAGE_DIRTY)
+ rflags |= HPTE_R_C;
/*
* Add in WIG bits
*/
--
2.7.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault
[not found] <201605270629.u4R6OCLx014831@mx0a-001b2d01.pphosted.com>
@ 2016-05-30 16:39 ` Hugh Dickins
2016-05-30 22:27 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2016-05-30 16:39 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: benh, paulus, mpe, Hugh Dickins, linuxppc-dev
On Fri, 27 May 2016, Aneesh Kumar K.V wrote:
> When we converted the asm routines to C functions, we missed updating
> HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower
> bits from pte via.
>
> andi. r3,r30,0x1fe /* Get basic set of flags */
>
> Fixes: 'commit 89ff725051d1 ("powerpc/mm: Convert __hash_page_64K to C")'
>
> We also don't set C bit always with this patch. This was added by
> 'commit c5cf0e30bf3d8 ("[PATCH] powerpc: Fix buglet with MMU hash management")'
>
> With hash64, we need to make sure that hardware doesn't do a pte update
> directly. This is because we do end up with entries in TLB with no hash
> page table entry. This happens because when we find hash bucket full,
> we "evict" a more/less random entry from it. When we do that we don't
> invalidate the TLB (hpte_remove) because we assume the old translation
> is still technically "valid". For more info look at
> 'commit 0608d692463(" powerpc/mm: Always invalidate tlb on hpte invalidate and
> update")'. Now that implies that hardware should never do a writeback to
> update 'R' or 'C' hpte bits.
>
> Commitc 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always.
> We don't really need to do that because we never map a RW pte entry
> without setting 'C' bit. on READ fault on a RW pte entry, we still map
> it READ only, hence a store update in the page will still cause a hash
> pte fault.
>
> This patch reverts the 'C' update part of the c5cf0e30bf3d8
> ("[PATCH] powerpc: Fix buglet with MMU hash management") and retain
> the updatepp part.
> - If we hit the updatepp path on native, the old code without that
> commit, would fail to set C bcause native_hpte_updatepp()
> was implemented to filter the same bits as H_PROTECT and not let C
> through thus we would "upgrade" a RO HPTE to RW without setting C
> thus causing the bug. So the real fix in that commit was the change
> to native_hpte_updatepp
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> arch/powerpc/mm/hash_utils_64.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 9c594ea99149..76fa5f3eb450 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -159,6 +159,19 @@ static struct mmu_psize_def mmu_psize_defaults_gp[] = {
> },
> };
>
> +/*
> + * 'R' and 'C' update notes:
> + * - Under pHyp or KVM, the updatepp path will not set C, thus it *will*
> + * create writeable HPTEs without C set, because the hcall H_PROTECT
> + * that we use in that case will not update C
> + * - The above is however not a problem, because we also don't do that
> + * fancy "no flush" variant of eviction and we use H_REMOVE which will
> + * do the right thing and thus we don't have the race I described earlier
> + *
> + * - Under bare metal, we do have the race, so we need R and C set
> + * - We make sure R is always set and never lost
> + * - C is _PAGE_DIRTY, and *should* always be set for a writeable mapping
> + */
> unsigned long htab_convert_pte_flags(unsigned long pteflags)
> {
> unsigned long rflags = 0;
> @@ -186,9 +199,14 @@ unsigned long htab_convert_pte_flags(unsigned long pteflags)
> rflags |= 0x1;
> }
> /*
> - * Always add "C" bit for perf. Memory coherence is always enabled
> + * We can't allow hardware to update hpte bits. Hence always
> + * set 'R' bit and set 'C' if it is a write fault
> + * Memory coherence is always enabled
> */
> - rflags |= HPTE_R_C | HPTE_R_M;
> + rflags |= HPTE_R_R | HPTE_R_M;
> +
> + if (pteflags & _PAGE_DIRTY)
> + rflags |= HPTE_R_C;
> /*
> * Add in WIG bits
> */
> --
> 2.7.4
Thanks a lot for the patch: I'm just starting to try it out.
I had hoped to try it on 4.7-rc1, but found yesterday that rc1 has at
least two issues (not specific to powerpc) with OOMing and leaking
memory, that I've no chance of sustaining my swapping load on it.
I'll hope at least one of them gets fixed by rc2 and try again then.
So I've applied your patch to v4.5 and v4.6, and set v4.6 going for now.
But if all goes well, I won't be able to report back with confidence for
a couple of days.
I don't mean to be churlish, and subtract from your triumph in tracking
this down (assuming you have), but that commit log... okay, it's intended
for powerpc mmu experts, not me, but if it hasn't already gone into git,
then a rewrite could be very helpful.
I gather that C is a bit as well as a language :) and there is a code
comment which helped me by mentioning "C is _PAGE_DIRTY, and *should*
always be set for a writeable mapping" (though I wonder what imposes
that "should" - certainly not core mm); which was the first mention
of "dirty" or "modified" in the whole thing.
And it needs a Cc to stable. And the patch title: "Fix the reference
bit update"? But this is about dirty bit, not referenced bit, isn't it?
If it's just about referenced bit, then I don't see how we would have
corruption before the fix (but again, I'm ignorant of powerpc mmu).
Many thanks, I hope, for the fix: I shall report back.
Hugh
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault
2016-05-30 16:39 ` [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault Hugh Dickins
@ 2016-05-30 22:27 ` Benjamin Herrenschmidt
2016-05-31 16:28 ` Hugh Dickins
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2016-05-30 22:27 UTC (permalink / raw)
To: Hugh Dickins, Aneesh Kumar K.V; +Cc: paulus, mpe, linuxppc-dev
On Mon, 2016-05-30 at 09:39 -0700, Hugh Dickins wrote:
> I don't mean to be churlish, and subtract from your triumph in tracking
> this down (assuming you have), but that commit log... okay, it's intended
> for powerpc mmu experts, not me, but if it hasn't already gone into git,
> then a rewrite could be very helpful.
Something along these lines:
The powerpc hash table has a R (Referenced) and C (Changed) bits that
somewhat correspond to Linux _PAGE_DIRTY and _PAGE_ACCESSED. However we
don't currently use them.
Moreover, we also require them to never be updated by HW. This is due
to an optimization we have in the hash eviction code, which would be
racy vs. a hardware update as the HW updates are done non-atomically.
Thus it's critical that valid hash PTEs always have R set and writeable
ones have C set. We do this by hashing a non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and thus R) when hashing anything else in. Any attempt by Linux at clearing those bits also removes the corresponding hash entry.
The old commit <.....> fixed an issue where we would miss setting C in
the specific case where a Linux PTE was upgraded from read only to
read-write (and appropriately made dirty). The hash code would realize
the hash PTE is already present and would use a different path than the
normal insertion path for updating a hash entry in-place. That path
unfortunately didn't update "C".
That commit however got a bit over zealous and also forced C on any
entry including those that aren't writeable. That was unnecessary.
In commit 89ff725051d1, when converting to C, we mangled that up:
- We kept the useless part of <....> setting C always instead of
only when _PAGE_DIRTY is set
- We never set R thus letting the HW do the racy updates.
This fixes it.
Ben.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault
2016-05-30 22:27 ` Benjamin Herrenschmidt
@ 2016-05-31 16:28 ` Hugh Dickins
2016-06-02 15:12 ` Hugh Dickins
0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2016-05-31 16:28 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Hugh Dickins, Aneesh Kumar K.V, paulus, mpe, linuxppc-dev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2716 bytes --]
On Tue, 31 May 2016, Benjamin Herrenschmidt wrote:
> On Mon, 2016-05-30 at 09:39 -0700, Hugh Dickins wrote:
> > I don't mean to be churlish, and subtract from your triumph in tracking
> > this down (assuming you have), but that commit log... okay, it's intended
> > for powerpc mmu experts, not me, but if it hasn't already gone into git,
> > then a rewrite could be very helpful.
>
> Something along these lines:
>
> The powerpc hash table has a R (Referenced) and C (Changed) bits that
> somewhat correspond to Linux _PAGE_DIRTY and _PAGE_ACCESSED. However we
> don't currently use them.
>
> Moreover, we also require them to never be updated by HW. This is due
> to an optimization we have in the hash eviction code, which would be
> racy vs. a hardware update as the HW updates are done non-atomically.
>
> Thus it's critical that valid hash PTEs always have R set and writeable
> ones have C set. We do this by hashing a non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and thus R) when hashing anything else in. Any attempt by Linux at clearing those bits also removes the corresponding hash entry.
>
> The old commit <.....> fixed an issue where we would miss setting C in
> the specific case where a Linux PTE was upgraded from read only to
> read-write (and appropriately made dirty). The hash code would realize
> the hash PTE is already present and would use a different path than the
> normal insertion path for updating a hash entry in-place. That path
> unfortunately didn't update "C".
>
> That commit however got a bit over zealous and also forced C on any
> entry including those that aren't writeable. That was unnecessary.
>
> In commit 89ff725051d1, when converting to C, we mangled that up:
>
> - We kept the useless part of <....> setting C always instead of
> only when _PAGE_DIRTY is set
>
> - We never set R thus letting the HW do the racy updates.
>
> This fixes it.
Thanks, that helped me to understand a lot better - and the original
commitlog then made much more sense to me after this version.
Plus I can now see that it is inherently a very confusing situation,
made the more confusing by the series of not-quite-right commits,
so all the more difficult to explain to an outsider.
I still won't pretend to understand it fully, but don't mind that:
my main concern is that if the commitlog is confused, then that
might hint that the code is still not quite right.
But all my evidence so far is that it is now right: I'll continue
testing v4.6+fix on a couple of loads until this evening: all is
well so far. And then switch to testing v4.5+fix on those loads
for another day and a half.
Hugh
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault
2016-05-31 16:28 ` Hugh Dickins
@ 2016-06-02 15:12 ` Hugh Dickins
2016-06-07 12:07 ` Michael Ellerman
0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2016-06-02 15:12 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Benjamin Herrenschmidt, paulus, mpe, linuxppc-dev
On Tue, 31 May 2016, Hugh Dickins wrote:
>
> But all my evidence so far is that it is now right: I'll continue
> testing v4.6+fix on a couple of loads until this evening: all is
> well so far. And then switch to testing v4.5+fix on those loads
> for another day and a half.
I'm glad to confirm: your patch to htab_convert_pte_flags() fixes all
the elusive sigsegv issues I was seeing under load on v4.5 and v4.6.
Thank you!
Hugh
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault
2016-06-02 15:12 ` Hugh Dickins
@ 2016-06-07 12:07 ` Michael Ellerman
0 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2016-06-07 12:07 UTC (permalink / raw)
To: Hugh Dickins, Aneesh Kumar K.V
Cc: Benjamin Herrenschmidt, paulus, linuxppc-dev
On Thu, 2016-06-02 at 08:12 -0700, Hugh Dickins wrote:
> On Tue, 31 May 2016, Hugh Dickins wrote:
> >
> > But all my evidence so far is that it is now right: I'll continue
> > testing v4.6+fix on a couple of loads until this evening: all is
> > well so far. And then switch to testing v4.5+fix on those loads
> > for another day and a half.
>
> I'm glad to confirm: your patch to htab_convert_pte_flags() fixes all
> the elusive sigsegv issues I was seeing under load on v4.5 and v4.6.
> Thank you!
Thanks for all the testing Hugh, and sorry for the bug.
cheers
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-06-07 12:07 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <201605270629.u4R6OCLx014831@mx0a-001b2d01.pphosted.com>
2016-05-30 16:39 ` [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault Hugh Dickins
2016-05-30 22:27 ` Benjamin Herrenschmidt
2016-05-31 16:28 ` Hugh Dickins
2016-06-02 15:12 ` Hugh Dickins
2016-06-07 12:07 ` Michael Ellerman
2016-05-27 6:26 Aneesh Kumar K.V
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).