* [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE
@ 2013-11-19 17:35 Steve Capper
2013-11-19 17:35 ` [PATCH 1/3] ARM: mm: Rewire LPAE set_huge_pte_at Steve Capper
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Steve Capper @ 2013-11-19 17:35 UTC (permalink / raw)
To: linux-arm-kernel
Hello,

The following patch series is my attempt at fixing a rather nasty bug
which became visible in 3.12-rc1 when running the libhugetlbfs test
suite. (This problem only came to my attention yesterday.)

For LPAE, set_huge_pte_at calls set_pte_at which then calls
set_pte_ext, which in turn is wired up to call cpu_v7_set_pte_ext,
which is defined in proc-v7-3level.S.

For huge pages, given newprot a pgprot_t value for a shared writable
VMA, and ptep a pointer to a pte belonging to this VMA, the following
behaviour is assumed by core code:

    hugetlb_change_protection(vma, address, end, newprot);
    ...
    huge_pte_write(huge_ptep_get(ptep)); /* should be true! */

Unfortunately, cpu_v7_set_pte_ext will change the bit layout of the
resultant pte, and will set the read-only bit if the dirty bit is not
also enabled.

If one were to allocate a read-only shared huge page, fault it in, and
then mprotect it to be writable, a subsequent write to that huge page
would result in a spurious call to hugetlb_cow, which causes
corruption. Before the following commit, this call was optimised away:

    37a2140 mm, hugetlb: do not use a page in page cache for cow
            optimization

If one runs the libhugetlbfs test suite on v3.12-rc1 upwards, then the
mprotect test will cause the aforementioned corruption and, before the
set of tests completes, the system will be left in an unresponsive
state (calls to fork fail with -ENOMEM).

This was an absolute pig to debug and, as this is the second time I've
run into issues caused by ptes being modified in transit, I've opted to
re-implement set_huge_pte_at such that it just dereferences the pte
(in a similar manner to arm64). This has also allowed me to revert the
pte_same logic change (that removed the NG bit from comparison), by
also setting the NG bit for all new huge ptes.

These patches are against 3.12, and I have tested this series on an
Arndale board with LPAE running libhugetlbfs.

I would really value any comments/critique/flames on this series,
especially as I've omitted the DCCMVAC at the end of set_huge_pte_at
because I couldn't see why it was needed. Please yell at me if it is
needed! :-)

Cheers,
--
Steve

Steve Capper (3):
  ARM: mm: Rewire LPAE set_huge_pte_at
  ARM: mm: Make LPAE huge page ptes NG by default
  Revert "ARM: mm: correct pte_same behaviour for LPAE."

 arch/arm/include/asm/hugetlb-3level.h |  7 ++++++-
 arch/arm/include/asm/pgtable-3level.h | 19 +------------------
 2 files changed, 7 insertions(+), 19 deletions(-)

--
1.8.1.4
* [PATCH 1/3] ARM: mm: Rewire LPAE set_huge_pte_at
2013-11-19 17:35 [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE Steve Capper
@ 2013-11-19 17:35 ` Steve Capper
2013-11-19 17:35 ` [PATCH 2/3] ARM: mm: Make LPAE huge page ptes NG by default Steve Capper
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Steve Capper @ 2013-11-19 17:35 UTC (permalink / raw)
To: linux-arm-kernel
For LPAE, set_huge_pte_at calls set_pte_at which then calls
set_pte_ext, which in turn is wired up to call cpu_v7_set_pte_ext,
which is defined in proc-v7-3level.S.

For huge pages, given newprot a pgprot_t value for a shared writable
VMA, and ptep a pointer to a pte belonging to this VMA, the following
behaviour is assumed by core code:

    hugetlb_change_protection(vma, address, end, newprot);
    ...
    huge_pte_write(huge_ptep_get(ptep)); /* should be true! */

Unfortunately, cpu_v7_set_pte_ext will change the bit layout of the
resultant pte, and will set the read-only bit if the dirty bit is not
also enabled.

If one were to allocate a read-only shared huge page, fault it in, and
then mprotect it to be writable, a subsequent write to that huge page
would result in a spurious call to hugetlb_cow, which causes
corruption. Before the following commit, this call was optimised away:

    37a2140 mm, hugetlb: do not use a page in page cache for cow
            optimization

If one runs the libhugetlbfs test suite on v3.12-rc1 upwards, then the
mprotect test will cause the aforementioned corruption and, before the
set of tests completes, the system will be left in an unresponsive
state (calls to fork fail with -ENOMEM).

This patch re-implements set_huge_pte_at to dereference the pte value
explicitly. hugetlb_cow is no longer called spuriously, and the unit
tests complete successfully.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
arch/arm/include/asm/hugetlb-3level.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index d4014fb..211e9a8 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -40,7 +40,12 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
 static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 				   pte_t *ptep, pte_t pte)
 {
-	set_pte_at(mm, addr, ptep, pte);
+	VM_BUG_ON(addr >= TASK_SIZE);
+
+	if (pte_present_user(pte))
+		__sync_icache_dcache(pte);
+
+	*ptep = pte;
 }
 
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
--
1.8.1.4
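
The round-trip problem described in this patch can be modelled in user space. The following is a minimal sketch, not kernel code: pte_t is reduced to a plain 64-bit integer, the bit positions are illustrative assumptions, and set_pte_mangling, set_pte_direct and write_survives_store are hypothetical names invented for the illustration. It contrasts a setter that forces the read-only bit on clean entries with one that stores the value untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;		/* simplified model, not the kernel type */

/* Illustrative bit positions; assumptions for this model only. */
#define PTE_VALID	(1ULL << 0)
#define PTE_RDONLY	(1ULL << 7)	/* hardware read-only bit */
#define PTE_DIRTY	(1ULL << 55)	/* software dirty bit */

/* Model of a setter that edits the value in transit: a valid pte
 * that is not dirty is stored with the read-only bit forced on. */
static inline void set_pte_mangling(pte_t *ptep, pte_t pte)
{
	if ((pte & PTE_VALID) && !(pte & PTE_DIRTY))
		pte |= PTE_RDONLY;
	*ptep = pte;
}

/* Model of the re-implemented setter: store exactly what was passed. */
static inline void set_pte_direct(pte_t *ptep, pte_t pte)
{
	*ptep = pte;
}

static inline bool pte_write(pte_t pte)
{
	return !(pte & PTE_RDONLY);
}

/* Returns true if a clean, writable pte still reads back as writable. */
static inline bool write_survives_store(void (*setter)(pte_t *, pte_t))
{
	pte_t slot = 0;

	setter(&slot, PTE_VALID);	/* valid, writable, not dirty */
	return pte_write(slot);
}

static inline void check_roundtrip(void)
{
	assert(!write_survives_store(set_pte_mangling));
	assert(write_survives_store(set_pte_direct));
}
```

Calling check_roundtrip() from a small main() exercises both setters. Only the direct store satisfies the assumption that a pte written as writable reads back as writable.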
* [PATCH 2/3] ARM: mm: Make LPAE huge page ptes NG by default
2013-11-19 17:35 [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE Steve Capper
2013-11-19 17:35 ` [PATCH 1/3] ARM: mm: Rewire LPAE set_huge_pte_at Steve Capper
@ 2013-11-19 17:35 ` Steve Capper
2013-11-19 17:35 ` [PATCH 3/3] Revert "ARM: mm: correct pte_same behaviour for LPAE." Steve Capper
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Steve Capper @ 2013-11-19 17:35 UTC (permalink / raw)
To: linux-arm-kernel
Now that we no longer set the NG bit when writing a huge page entry,
set it when the entry is created instead. This simplifies the code and
obviates the need for us to override pte_same.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
arch/arm/include/asm/pgtable-3level.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 5689c18..d1318e1 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -199,7 +199,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,__pte(pte_val(pte)|(ext)))
#define pte_huge(pte) (pte_val(pte) && !(pte_val(pte) & PTE_TABLE_BIT))
-#define pte_mkhuge(pte) (__pte(pte_val(pte) & ~PTE_TABLE_BIT))
+#define pte_mkhuge(pte) (__pte((pte_val(pte) & ~PTE_TABLE_BIT) | PTE_EXT_NG))
#define pmd_young(pmd) (pmd_val(pmd) & PMD_SECT_AF)
--
1.8.1.4
* [PATCH 3/3] Revert "ARM: mm: correct pte_same behaviour for LPAE."
2013-11-19 17:35 [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE Steve Capper
2013-11-19 17:35 ` [PATCH 1/3] ARM: mm: Rewire LPAE set_huge_pte_at Steve Capper
2013-11-19 17:35 ` [PATCH 2/3] ARM: mm: Make LPAE huge page ptes NG by default Steve Capper
@ 2013-11-19 17:35 ` Steve Capper
2013-11-19 18:02 ` [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE Christoffer Dall
2013-12-03 13:46 ` Steve Capper
4 siblings, 0 replies; 7+ messages in thread
From: Steve Capper @ 2013-11-19 17:35 UTC (permalink / raw)
To: linux-arm-kernel
This reverts commit dde1b65110353517816bcbc58539463396202244.
We no longer need to override pte_same for LPAE, as we set the NG bit
on huge pte creation.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
arch/arm/include/asm/pgtable-3level.h | 17 -----------------
1 file changed, 17 deletions(-)
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index d1318e1..7f3fa99 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -179,23 +179,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
clean_pmd_entry(pmdp); \
} while (0)
-/*
- * For 3 levels of paging the PTE_EXT_NG bit will be set for user address ptes
- * that are written to a page table but not for ptes created with mk_pte.
- *
- * In hugetlb_no_page, a new huge pte (new_pte) is generated and passed to
- * hugetlb_cow, where it is compared with an entry in a page table.
- * This comparison test fails erroneously leading ultimately to a memory leak.
- *
- * To correct this behaviour, we mask off PTE_EXT_NG for any pte that is
- * present before running the comparison.
- */
-#define __HAVE_ARCH_PTE_SAME
-#define pte_same(pte_a,pte_b) ((pte_present(pte_a) ? pte_val(pte_a) & ~PTE_EXT_NG \
- : pte_val(pte_a)) \
- == (pte_present(pte_b) ? pte_val(pte_b) & ~PTE_EXT_NG \
- : pte_val(pte_b)))
-
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,__pte(pte_val(pte)|(ext)))
#define pte_huge(pte) (pte_val(pte) && !(pte_val(pte) & PTE_TABLE_BIT))
--
1.8.1.4
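
The interaction between the NG bit and pte_same that patches 2/3 and 3/3 rely on can be sketched in user space as follows. This is a simplified model under stated assumptions: pte_t is a plain integer, the bit positions are illustrative, a "present" check is approximated by a single valid bit, and every function name here (store_adding_ng, store_direct, mkhuge_old, mkhuge_new, pte_same_generic, pte_same_masked) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;		/* simplified model, not the kernel type */

/* Illustrative bit positions; assumptions for this model only. */
#define PTE_VALID	(1ULL << 0)
#define PTE_TABLE	(1ULL << 1)
#define PTE_NG		(1ULL << 11)

/* Old behaviour (model): the setter ORs in NG for user ptes on store. */
static inline void store_adding_ng(pte_t *ptep, pte_t pte)
{
	*ptep = pte | PTE_NG;
}

/* New behaviour (model): the setter stores the value untouched. */
static inline void store_direct(pte_t *ptep, pte_t pte)
{
	*ptep = pte;
}

/* pte_mkhuge before and after patch 2/3 (model). */
static inline pte_t mkhuge_old(pte_t pte)
{
	return pte & ~PTE_TABLE;
}

static inline pte_t mkhuge_new(pte_t pte)
{
	return (pte & ~PTE_TABLE) | PTE_NG;
}

/* Plain equality of the raw values. */
static inline bool pte_same_generic(pte_t a, pte_t b)
{
	return a == b;
}

/* Model of the override removed by patch 3/3: ignore NG when valid. */
static inline bool pte_same_masked(pte_t a, pte_t b)
{
	pte_t ma = (a & PTE_VALID) ? (a & ~PTE_NG) : a;
	pte_t mb = (b & PTE_VALID) ? (b & ~PTE_NG) : b;

	return ma == mb;
}

static inline void check_pte_same(void)
{
	pte_t slot;
	pte_t base = PTE_VALID | PTE_TABLE | (0x1234ULL << 21);
	pte_t made;

	/* Before: NG appears only on store, so plain equality fails. */
	made = mkhuge_old(base);
	store_adding_ng(&slot, made);
	assert(!pte_same_generic(slot, made));
	assert(pte_same_masked(slot, made));

	/* After: NG is set at creation and the store is untouched. */
	made = mkhuge_new(base);
	store_direct(&slot, made);
	assert(pte_same_generic(slot, made));
}
```

In this model, once NG is applied when the huge pte is created and the store no longer alters the value, plain equality is enough and the masked comparison is no longer needed.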
* [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE
2013-11-19 17:35 [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE Steve Capper
` (2 preceding siblings ...)
2013-11-19 17:35 ` [PATCH 3/3] Revert "ARM: mm: correct pte_same behaviour for LPAE." Steve Capper
@ 2013-11-19 18:02 ` Christoffer Dall
2013-12-03 13:46 ` Steve Capper
4 siblings, 0 replies; 7+ messages in thread
From: Christoffer Dall @ 2013-11-19 18:02 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Nov 19, 2013 at 05:35:26PM +0000, Steve Capper wrote:
> Hello,
> The following patch series is my attempt at fixing a rather nasty bug
> which became visible in 3.12-rc1 when running the libhugetlbfs test
> suite. (This problem only just came to my attention yesterday).
>
> For LPAE, set_huge_pte_at calls set_pte_at which then calls
> set_pte_ext, which in turn is wired up to call cpu_v7_set_pte_ext,
> which is defined in proc-v7-3level.S.
>
> For huge pages, given newprot a pgprot_t value for a shared writable
> VMA, and ptep a pointer to a pte belonging to this VMA; the following
> behaviour is assumed by core code:
> hugetlb_change_protection(vma, address, end, newprot);
> ...
>
> huge_pte_write(huge_ptep_get(ptep)); /* should be true! */
>
> Unfortunately, cpu_v7_set_pte_ext will change the bit layout of the
> resultant pte, and will set the read only bit if the dirty bit is not
> also enabled.
>
> If one were to allocate a read only shared huge page, then fault it in,
> and then mprotect it to be writeable. A subsequent write to that huge
> page will result in a spurious call to hugetlb_cow, which causes
> corruption. This call is optimised away prior to:
> 37a2140 mm, hugetlb: do not use a page in page cache for cow
> optimization
>
> If one runs the libhugetlbfs test suite on v3.12-rc1 upwards, then the
> mprotect test will cause the aforementioned corruption and before the
> set of tests completes, the system will be left in an unresponsive
> state. (calls to fork fail with -ENOMEM).
>
> This was an absolute pig to debug and, as this is the second time I've
> run into issues caused by ptes being modified in transit, I've opted to
> re-implement set_huge_pte_at such that it just dereferences the pte.
> (in a similar manner as arm64). This has also allowed me to revert the
> pte_same logic change (that removed the NG bit from comparison), by
> also setting the NG bit for all new huge ptes.
>

For what it's worth, I spent weeks on the infamous KVM 'voodoo bug',
which was also related to the side effect of setting bits in set_pte_at.
I remember thinking then that callers should decide which bits they
want set in their page tables, and that a function to set a pte should
set a pte, not OR random bits into it.

I don't know the full history or rationale behind having this side
effect, but I would certainly welcome a change to move setting those
bits higher in the stack, especially because tracking it down into the
non-trivial assembly code is quite tedious.

-Christoffer
* [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE
2013-11-19 17:35 [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE Steve Capper
` (3 preceding siblings ...)
2013-11-19 18:02 ` [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE Christoffer Dall
@ 2013-12-03 13:46 ` Steve Capper
2013-12-03 15:09 ` Catalin Marinas
4 siblings, 1 reply; 7+ messages in thread
From: Steve Capper @ 2013-12-03 13:46 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Nov 19, 2013 at 05:35:26PM +0000, Steve Capper wrote:
> Hello,
> The following patch series is my attempt at fixing a rather nasty bug
> which became visible in 3.12-rc1 when running the libhugetlbfs test
> suite. (This problem only just came to my attention yesterday).
>
> For LPAE, set_huge_pte_at calls set_pte_at which then calls
> set_pte_ext, which in turn is wired up to call cpu_v7_set_pte_ext,
> which is defined in proc-v7-3level.S.
>
> For huge pages, given newprot a pgprot_t value for a shared writable
> VMA, and ptep a pointer to a pte belonging to this VMA; the following
> behaviour is assumed by core code:
> hugetlb_change_protection(vma, address, end, newprot);
> ...
>
> huge_pte_write(huge_ptep_get(ptep)); /* should be true! */
>
> Unfortunately, cpu_v7_set_pte_ext will change the bit layout of the
> resultant pte, and will set the read only bit if the dirty bit is not
> also enabled.
>
> If one were to allocate a read only shared huge page, then fault it in,
> and then mprotect it to be writeable. A subsequent write to that huge
> page will result in a spurious call to hugetlb_cow, which causes
> corruption. This call is optimised away prior to:
> 37a2140 mm, hugetlb: do not use a page in page cache for cow
> optimization
>
> If one runs the libhugetlbfs test suite on v3.12-rc1 upwards, then the
> mprotect test will cause the aforementioned corruption and before the
> set of tests completes, the system will be left in an unresponsive
> state. (calls to fork fail with -ENOMEM).
>
> This was an absolute pig to debug and, as this is the second time I've
> run into issues caused by ptes being modified in transit, I've opted to
> re-implement set_huge_pte_at such that it just dereferences the pte.
> (in a similar manner as arm64). This has also allowed me to revert the
> pte_same logic change (that removed the NG bit from comparison), by
> also setting the NG bit for all new huge ptes.
>
> These patches are against 3.12, and I have tested this series on an
> Arndale board with LPAE running libhugetlbfs.
>
> I would really value any comments/critique/flames on this series.
> Especially as I've omitted the DCCMVAC at the end of set_huge_pte_at
> as I couldn't see why it was needed, please yell at me if it is needed!
> :-)
>
> Cheers,
> --
> Steve

Hi,

A question has been raised for the arm64 analogue of this series as to
whether or not this is the best approach:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-November/215155.html

I am having a think about this, and will send out a V2 once my brain
has caught up. :-)

Cheers,
--
Steve
* [PATCH 0/3] Simplify set_huge_pte_at, pte_same for LPAE
2013-12-03 13:46 ` Steve Capper
@ 2013-12-03 15:09 ` Catalin Marinas
0 siblings, 0 replies; 7+ messages in thread
From: Catalin Marinas @ 2013-12-03 15:09 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Dec 03, 2013 at 01:46:24PM +0000, Steve Capper wrote:
> On Tue, Nov 19, 2013 at 05:35:26PM +0000, Steve Capper wrote:
> > The following patch series is my attempt at fixing a rather nasty bug
> > which became visible in 3.12-rc1 when running the libhugetlbfs test
> > suite. (This problem only just came to my attention yesterday).
> >
> > For LPAE, set_huge_pte_at calls set_pte_at which then calls
> > set_pte_ext, which in turn is wired up to call cpu_v7_set_pte_ext,
> > which is defined in proc-v7-3level.S.
> >
> > For huge pages, given newprot a pgprot_t value for a shared writable
> > VMA, and ptep a pointer to a pte belonging to this VMA; the following
> > behaviour is assumed by core code:
> > hugetlb_change_protection(vma, address, end, newprot);
> > ...
> >
> > huge_pte_write(huge_ptep_get(ptep)); /* should be true! */
> >
> > Unfortunately, cpu_v7_set_pte_ext will change the bit layout of the
> > resultant pte, and will set the read only bit if the dirty bit is not
> > also enabled.
>
> A question has been raised for the arm64 analogue of this series as to
> whether or not this is the best approach:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2013-November/215155.html
>
> I am having a think about this, and will send out a V2 once my brain
> has caught up. :-)

I think we first need to check we don't actually do a CoW for any clean
anonymous small page just because pte_dirty and pte_write use the same
PTE_RDONLY bit (both arm+LPAE and arm64). Once that's fixed, I don't
think you need any hugetlb changes.

Basically we need to encode these states:

                                PTE_DIRTY       PTE_RDONLY
    !pte_dirty && !pte_write    0               1
    !pte_dirty && pte_write     0               1
    pte_dirty && !pte_write     1               1
    pte_dirty && pte_write      1               0

So we can't distinguish between the first two with just two bits, since
PTE_RDONLY is a hardware bit.

My proposal would be to add a PTE_WRITE bit, with PTE_RDONLY only set
in the set_pte() function (if !pte_dirty || !pte_write).

--
Catalin
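
One way to read the proposal above is sketched below in user space. It is an illustration under stated assumptions, not the eventual kernel implementation: pte_t is a plain integer, the bit positions (in particular the PTE_WRITE position) are hypothetical, and set_pte_model, hw_rdonly and check_all_states are names invented for this sketch. Writability is tracked in a software bit, and the hardware read-only bit is derived only at store time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;		/* simplified model, not the kernel type */

/* Illustrative bit positions; assumptions for this model only. */
#define PTE_RDONLY	(1ULL << 7)	/* hardware bit, read by the MMU */
#define PTE_DIRTY	(1ULL << 55)	/* software bit */
#define PTE_WRITE	(1ULL << 58)	/* software bit, hypothetical */

static inline bool pte_dirty(pte_t pte)
{
	return (pte & PTE_DIRTY) != 0;
}

static inline bool pte_write(pte_t pte)
{
	return (pte & PTE_WRITE) != 0;
}

static inline bool hw_rdonly(pte_t pte)
{
	return (pte & PTE_RDONLY) != 0;
}

/* Derive the hardware bit at store time; callers never set it. */
static inline void set_pte_model(pte_t *ptep, pte_t pte)
{
	if (!pte_dirty(pte) || !pte_write(pte))
		pte |= PTE_RDONLY;
	*ptep = pte;
}

/* All four (dirty, write) states must survive a store and read back. */
static inline void check_all_states(void)
{
	for (int dirty = 0; dirty <= 1; dirty++) {
		for (int write = 0; write <= 1; write++) {
			pte_t slot;
			pte_t pte = (dirty ? PTE_DIRTY : 0) |
				    (write ? PTE_WRITE : 0);

			set_pte_model(&slot, pte);
			assert(pte_dirty(slot) == (dirty != 0));
			assert(pte_write(slot) == (write != 0));
			assert(hw_rdonly(slot) == !(dirty && write));
		}
	}
}
```

With three bits the clean-writable and clean-read-only states no longer collapse into one encoding: both are read-only to the hardware, but the software PTE_WRITE bit still tells them apart.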