* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
@ 2018-03-07 22:54 ` Andrew Morton
2018-03-07 23:02 ` Kani, Toshi
2018-03-07 22:55 ` Andrew Morton
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2018-03-07 22:54 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 7 Mar 2018 11:32:26 -0700 Toshi Kani <toshi.kani@hpe.com> wrote:
> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings. Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
>
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
> which will lead to kernel panic.
>
> This panic is not reproducible on x86. INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
> x86 still has memory leak.
>
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
>
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.
>
> index 004abf9ebf12..942f4fa341f1 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -702,4 +702,24 @@ int pmd_clear_huge(pmd_t *pmd)
>
> return 0;
> }
> +
> +/**
> + * pud_free_pmd_page - clear pud entry and free pmd page
> + *
> + * Returns 1 on success and 0 on failure (pud not cleared).
> + */
> +int pud_free_pmd_page(pud_t *pud)
> +{
> + return pud_none(*pud);
> +}
> +
> +/**
> + * pmd_free_pte_page - clear pmd entry and free pte page
> + *
> + * Returns 1 on success and 0 on failure (pmd not cleared).
> + */
> +int pmd_free_pte_page(pmd_t *pmd)
> +{
> + return pmd_none(*pmd);
> +}
Are these functions well named? I mean, the comment says "clear pud
entry and free pmd page" but the implementatin does neither of those
things. The name implies that the function frees a pmd_page but the
callsites use the function as a way of querying state.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-07 22:54 ` Andrew Morton
@ 2018-03-07 23:02 ` Kani, Toshi
0 siblings, 0 replies; 15+ messages in thread
From: Kani, Toshi @ 2018-03-07 23:02 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 2018-03-07 at 14:54 -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2018 11:32:26 -0700 Toshi Kani <toshi.kani@hpe.com> wrote:
>
> > On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> > may create pud/pmd mappings. Kernel panic was observed on arm64
> > systems with Cortex-A75 in the following steps as described by
> > Hanjun Guo.
> >
> > 1. ioremap a 4K size, valid page table will build,
> > 2. iounmap it, pte0 will set to 0;
> > 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> > then set the a new value for pmd;
> > 4. pte0 is leaked;
> > 5. CPU may meet exception because the old pmd is still in TLB,
> > which will lead to kernel panic.
> >
> > This panic is not reproducible on x86. INVLPG, called from iounmap,
> > purges all levels of entries associated with purged address on x86.
> > x86 still has memory leak.
> >
> > Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> > which clear a given pud/pmd entry and free up a page for the lower
> > level entries.
> >
> > This patch implements their stub functions on x86 and arm64, which
> > work as workaround.
> >
> > index 004abf9ebf12..942f4fa341f1 100644
> > --- a/arch/x86/mm/pgtable.c
> > +++ b/arch/x86/mm/pgtable.c
> > @@ -702,4 +702,24 @@ int pmd_clear_huge(pmd_t *pmd)
> >
> > return 0;
> > }
> > +
> > +/**
> > + * pud_free_pmd_page - clear pud entry and free pmd page
> > + *
> > + * Returns 1 on success and 0 on failure (pud not cleared).
> > + */
> > +int pud_free_pmd_page(pud_t *pud)
> > +{
> > + return pud_none(*pud);
> > +}
> > +
> > +/**
> > + * pmd_free_pte_page - clear pmd entry and free pte page
> > + *
> > + * Returns 1 on success and 0 on failure (pmd not cleared).
> > + */
> > +int pmd_free_pte_page(pmd_t *pmd)
> > +{
> > + return pmd_none(*pmd);
> > +}
>
> Are these functions well named? I mean, the comment says "clear pud
> entry and free pmd page" but the implementatin does neither of those
> things. The name implies that the function frees a pmd_page but the
> callsites use the function as a way of querying state.
This patch 1/2 only implements stubs. Patch 2/2 implements what is
described here.
> Also, as this fixes an arm64 kernel panic, should we be using
> cc:stable?
Right.
Thanks,
-Toshi
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
2018-03-07 22:54 ` Andrew Morton
@ 2018-03-07 22:55 ` Andrew Morton
2018-03-08 4:00 ` Matthew Wilcox
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2018-03-07 22:55 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 7 Mar 2018 11:32:26 -0700 Toshi Kani <toshi.kani@hpe.com> wrote:
> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings. Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
>
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
> which will lead to kernel panic.
>
> This panic is not reproducible on x86. INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
> x86 still has memory leak.
>
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
>
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.
Also, as this fixes an arm64 kernel panic, should we be using cc:stable?
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
2018-03-07 22:54 ` Andrew Morton
2018-03-07 22:55 ` Andrew Morton
@ 2018-03-08 4:00 ` Matthew Wilcox
2018-03-08 15:56 ` Kani, Toshi
2018-03-08 8:08 ` Ingo Molnar
2018-03-08 18:04 ` Will Deacon
4 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-08 4:00 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> +/**
> + * pud_free_pmd_page - clear pud entry and free pmd page
> + *
> + * Returns 1 on success and 0 on failure (pud not cleared).
> + */
> +int pud_free_pmd_page(pud_t *pud)
> +{
> + return pud_none(*pud);
> +}
Wouldn't it be clearer if you returned 'bool' instead of 'int' here?
Also you didn't document the pud parameter, nor use the approved form
for documenting the return type, nor the calling context. So I would
have written it out like this:
/**
* pud_free_pmd_page - Clear pud entry and free pmd page.
* @pud: Pointer to a PUD.
*
* Context: Caller should hold mmap_sem write-locked.
* Return: %true if clearing the entry succeeded.
*/
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-08 4:00 ` Matthew Wilcox
@ 2018-03-08 15:56 ` Kani, Toshi
2018-03-08 22:07 ` Matthew Wilcox
0 siblings, 1 reply; 15+ messages in thread
From: Kani, Toshi @ 2018-03-08 15:56 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 2018-03-07 at 20:00 -0800, Matthew Wilcox wrote:
> On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> > +/**
> > + * pud_free_pmd_page - clear pud entry and free pmd page
> > + *
> > + * Returns 1 on success and 0 on failure (pud not cleared).
> > + */
> > +int pud_free_pmd_page(pud_t *pud)
> > +{
> > + return pud_none(*pud);
> > +}
>
> Wouldn't it be clearer if you returned 'bool' instead of 'int' here?
I thought about it and decided to use 'int' since all other pud/pmd/pte
interfaces, such as pud_none() above, use 'int'.
> Also you didn't document the pud parameter, nor use the approved form
> for documenting the return type, nor the calling context. So I would
> have written it out like this:
>
> /**
> * pud_free_pmd_page - Clear pud entry and free pmd page.
> * @pud: Pointer to a PUD.
> *
> * Context: Caller should hold mmap_sem write-locked.
> * Return: %true if clearing the entry succeeded.
> */
Will do.
Thanks!
-Toshi
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-08 15:56 ` Kani, Toshi
@ 2018-03-08 22:07 ` Matthew Wilcox
2018-03-08 23:27 ` Kani, Toshi
0 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-08 22:07 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Mar 08, 2018 at 03:56:30PM +0000, Kani, Toshi wrote:
> On Wed, 2018-03-07 at 20:00 -0800, Matthew Wilcox wrote:
> > On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> > > +/**
> > > + * pud_free_pmd_page - clear pud entry and free pmd page
> > > + *
> > > + * Returns 1 on success and 0 on failure (pud not cleared).
> > > + */
> > > +int pud_free_pmd_page(pud_t *pud)
> > > +{
> > > + return pud_none(*pud);
> > > +}
> >
> > Wouldn't it be clearer if you returned 'bool' instead of 'int' here?
>
> I thought about it and decided to use 'int' since all other pud/pmd/pte
> interfaces, such as pud_none() above, use 'int'.
These interfaces were introduced before we had bool ... I suspect nobody's
taken the time to go through and convert them all.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-08 22:07 ` Matthew Wilcox
@ 2018-03-08 23:27 ` Kani, Toshi
0 siblings, 0 replies; 15+ messages in thread
From: Kani, Toshi @ 2018-03-08 23:27 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 2018-03-08 at 14:07 -0800, Matthew Wilcox wrote:
> On Thu, Mar 08, 2018 at 03:56:30PM +0000, Kani, Toshi wrote:
> > On Wed, 2018-03-07 at 20:00 -0800, Matthew Wilcox wrote:
> > > On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> > > > +/**
> > > > + * pud_free_pmd_page - clear pud entry and free pmd page
> > > > + *
> > > > + * Returns 1 on success and 0 on failure (pud not cleared).
> > > > + */
> > > > +int pud_free_pmd_page(pud_t *pud)
> > > > +{
> > > > + return pud_none(*pud);
> > > > +}
> > >
> > > Wouldn't it be clearer if you returned 'bool' instead of 'int' here?
> >
> > I thought about it and decided to use 'int' since all other pud/pmd/pte
> > interfaces, such as pud_none() above, use 'int'.
>
> These interfaces were introduced before we had bool ... I suspect nobody's
> taken the time to go through and convert them all.
I see. Since this patchset already changes core, arm and x86, and will
be back ported to stables. So, if you do not mind, I'd like to leave it
consistent with others with 'int', and make the footstep minimum.
Thanks,
-Toshi
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
` (2 preceding siblings ...)
2018-03-08 4:00 ` Matthew Wilcox
@ 2018-03-08 8:08 ` Ingo Molnar
2018-03-08 18:04 ` Will Deacon
4 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2018-03-08 8:08 UTC (permalink / raw)
To: linux-arm-kernel
* Toshi Kani <toshi.kani@hpe.com> wrote:
> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings. Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
>
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
> which will lead to kernel panic.
>
> This panic is not reproducible on x86. INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
Where does x86 iounmap() do that?
> x86 still has memory leak.
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
>
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.
At minimum the ordering of the patches is very confusing: why don't you introduce
the new methods in patch #1, and then use them in patch #2?
Also please double check the coding style of your patches, there's a number of
obvious problems of outright bad patterns and also cases where you clearly don't
try to follow the (correct) style of existing code.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
` (3 preceding siblings ...)
2018-03-08 8:08 ` Ingo Molnar
@ 2018-03-08 18:04 ` Will Deacon
2018-03-08 19:30 ` Kani, Toshi
4 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2018-03-08 18:04 UTC (permalink / raw)
To: linux-arm-kernel
Hi Toshi,
Thanks for the patches!
On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings. Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
>
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
> which will lead to kernel panic.
>
> This panic is not reproducible on x86. INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
> x86 still has memory leak.
>
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
>
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.
>
> Reported-by: Lei Li <lious.lilei@hisilicon.com>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Hanjun Guo <guohanjun@huawei.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Borislav Petkov <bp@suse.de>
> ---
> arch/arm64/mm/mmu.c | 10 ++++++++++
> arch/x86/mm/pgtable.c | 20 ++++++++++++++++++++
> include/asm-generic/pgtable.h | 10 ++++++++++
> lib/ioremap.c | 6 ++++--
> 4 files changed, 44 insertions(+), 2 deletions(-)
[...]
> diff --git a/lib/ioremap.c b/lib/ioremap.c
> index b808a390e4c3..54e5bbaa3200 100644
> --- a/lib/ioremap.c
> +++ b/lib/ioremap.c
> @@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
>
> if (ioremap_pmd_enabled() &&
> ((next - addr) == PMD_SIZE) &&
> - IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
> + IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
> + pmd_free_pte_page(pmd)) {
I find it a bit weird that we're postponing this to the subsequent map. If
we want to address the break-before-make issue that was causing a panic on
arm64, then I think it would be better to do this on the unmap path to avoid
duplicating TLB invalidation.
Will
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
2018-03-08 18:04 ` Will Deacon
@ 2018-03-08 19:30 ` Kani, Toshi
0 siblings, 0 replies; 15+ messages in thread
From: Kani, Toshi @ 2018-03-08 19:30 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 2018-03-08 at 18:04 +0000, Will Deacon wrote:
:
> > diff --git a/lib/ioremap.c b/lib/ioremap.c
> > index b808a390e4c3..54e5bbaa3200 100644
> > --- a/lib/ioremap.c
> > +++ b/lib/ioremap.c
> > @@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
> >
> > if (ioremap_pmd_enabled() &&
> > ((next - addr) == PMD_SIZE) &&
> > - IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
> > + IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
> > + pmd_free_pte_page(pmd)) {
>
> I find it a bit weird that we're postponing this to the subsequent map. If
> we want to address the break-before-make issue that was causing a panic on
> arm64, then I think it would be better to do this on the unmap path to avoid
> duplicating TLB invalidation.
Hi Will,
Yes, I started looking into doing it the unmap path, but found the
following issues:
- The iounmap() path is shared with vunmap(). Since vmap() only
supports pte mappings, making vunmap() to free pte pages is an overhead
for regular vmap users as they do not need pte pages freed up.
- Checking to see if all entries in a pte page are cleared in the unmap
path is racy, and serializing this check is expensive.
- The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
purge.
Hence, I decided to postpone and do it in the ioremap path when a
pud/pmd mapping is set. The "break" on arm64 happens when you update a
pmd entry without purging it. So, the unmap path is not broken. I
understand that arm64 may need extra TLB purge in pmd_free_pte_page(),
but it limits this overhead only when it sets up a pud/pmd mapping.
Thanks,
-Toshi
^ permalink raw reply [flat|nested] 15+ messages in thread