From: Nicholas Piggin <npiggin@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Ding Tianhong <dingtianhong@huawei.com>,
linux-mm@kvack.org
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>,
Christoph Hellwig <hch@infradead.org>,
Christoph Hellwig <hch@lst.de>,
Jonathan Cameron <Jonathan.Cameron@Huawei.com>,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org,
Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: Re: [PATCH v11 01/13] mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page
Date: Tue, 02 Feb 2021 20:22:35 +1000
Message-ID: <1612261080.2gjaa5ecdf.astroid@bobo.none>
In-Reply-To: <2dcbe2c9-c968-4895-fc43-c40dfe9f06d3@huawei.com>
Excerpts from Ding Tianhong's message of January 28, 2021 1:13 pm:
> On 2021/1/26 12:44, Nicholas Piggin wrote:
>> vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
>> Whether or not a vmap is huge depends on the architecture details,
>> alignments, boot options, etc., which the caller can not be expected
>> to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.
>>
>> This change teaches vmalloc_to_page about larger pages, and returns
>> the struct page that corresponds to the offset within the large page.
>> This makes the API agnostic to mapping implementation details.
>>
>> [*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
>> fail gracefully on unexpected huge vmap mappings")
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> mm/vmalloc.c | 41 ++++++++++++++++++++++++++---------------
>> 1 file changed, 26 insertions(+), 15 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index e6f352bf0498..62372f9e0167 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -34,7 +34,7 @@
>> #include <linux/bitops.h>
>> #include <linux/rbtree_augmented.h>
>> #include <linux/overflow.h>
>> -
>> +#include <linux/pgtable.h>
>> #include <linux/uaccess.h>
>> #include <asm/tlbflush.h>
>> #include <asm/shmparam.h>
>> @@ -343,7 +343,9 @@ int is_vmalloc_or_module_addr(const void *x)
>> }
>>
>> /*
>> - * Walk a vmap address to the struct page it maps.
>> + * Walk a vmap address to the struct page it maps. Huge vmap mappings will
>> + * return the tail page that corresponds to the base page address, which
>> + * matches small vmap mappings.
>> */
>> struct page *vmalloc_to_page(const void *vmalloc_addr)
>> {
>> @@ -363,25 +365,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>
>> if (pgd_none(*pgd))
>> return NULL;
>> + if (WARN_ON_ONCE(pgd_leaf(*pgd)))
>> + return NULL; /* XXX: no allowance for huge pgd */
>> + if (WARN_ON_ONCE(pgd_bad(*pgd)))
>> + return NULL;
>> +
>> p4d = p4d_offset(pgd, addr);
>> if (p4d_none(*p4d))
>> return NULL;
>> - pud = pud_offset(p4d, addr);
>> + if (p4d_leaf(*p4d))
>> + return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
>> + if (WARN_ON_ONCE(p4d_bad(*p4d)))
>> + return NULL;
>>
>> - /*
>> - * Don't dereference bad PUD or PMD (below) entries. This will also
>> - * identify huge mappings, which we may encounter on architectures
>> - * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
>> - * identified as vmalloc addresses by is_vmalloc_addr(), but are
>> - * not [unambiguously] associated with a struct page, so there is
>> - * no correct value to return for them.
>> - */
>> - WARN_ON_ONCE(pud_bad(*pud));
>> - if (pud_none(*pud) || pud_bad(*pud))
>> + pud = pud_offset(p4d, addr);
>> + if (pud_none(*pud))
>> + return NULL;
>> + if (pud_leaf(*pud))
>> + return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
>
> Hi Nicho:
>
> /builds/1mzfdQzleCy69KZFb5qHNSEgabZ/mm/vmalloc.c: In function 'vmalloc_to_page':
> /builds/1mzfdQzleCy69KZFb5qHNSEgabZ/include/asm-generic/pgtable-nop4d-hack.h:48:27: error: implicit declaration of function 'pud_page'; did you mean 'put_page'? [-Werror=implicit-function-declaration]
> 48 | #define pgd_page(pgd) (pud_page((pud_t){ pgd }))
> | ^~~~~~~~
>
> pud_page is not defined for aarch32 when the 2-level page table config is enabled, which breaks the build.
Hey, thanks for finding that; not sure why that didn't trigger any CI.

Anyway, newer kernels don't have the pgtable-*-hack.h headers, but even so
it still breaks upstream. arm uses some hand-rolled 2-level folding of its
own (which is fair enough, because most 32-bit archs were 2-level at the
time I added the pgtable-nopud.h header).

This patch seems to at least make it build.
Thanks,
Nick
---
arch/arm/include/asm/pgtable-3level.h | 2 --
arch/arm/include/asm/pgtable.h | 3 +++
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 2b85d175e999..d4edab51a77c 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -186,8 +186,6 @@ static inline pte_t pte_mkspecial(pte_t pte)
#define pmd_write(pmd) (pmd_isclear((pmd), L_PMD_SECT_RDONLY))
#define pmd_dirty(pmd) (pmd_isset((pmd), L_PMD_SECT_DIRTY))
-#define pud_page(pud) pmd_page(__pmd(pud_val(pud)))
-#define pud_write(pud) pmd_write(__pmd(pud_val(pud)))
#define pmd_hugewillfault(pmd) (!pmd_young(pmd) || !pmd_write(pmd))
#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index c02f24400369..d63a5bb6bd0c 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -166,6 +166,9 @@ extern struct page *empty_zero_page;
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
+#define pud_page(pud) pmd_page(__pmd(pud_val(pud)))
+#define pud_write(pud) pmd_write(__pmd(pud_val(pud)))
+
#define pmd_none(pmd) (!pmd_val(pmd))
static inline pte_t *pmd_page_vaddr(pmd_t pmd)
--
2.23.0
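
For readers following the quoted vmalloc_to_page hunk, the leaf-entry
arithmetic "pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT)" can be
tried out in userspace. The following is only a rough sketch, not kernel
code: the sketch_* names, the 4 KiB base page and the 1 GiB PUD leaf size
are assumptions chosen for illustration, and struct sketch_page is a
hypothetical stand-in for struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants only; real values are architecture-specific. */
#define SKETCH_PAGE_SHIFT 12u	/* 4 KiB base pages (assumed) */
#define SKETCH_PUD_SHIFT  30u	/* 1 GiB PUD-level leaf (assumed) */
#define SKETCH_PUD_SIZE   ((uintptr_t)1 << SKETCH_PUD_SHIFT)
#define SKETCH_PUD_MASK   (~(SKETCH_PUD_SIZE - 1))

/* Hypothetical stand-in for struct page: one descriptor per base page. */
struct sketch_page {
	unsigned long flags;
};

/*
 * Index of the base page, within the huge mapping, that contains addr.
 * Same shape as "(addr & ~PUD_MASK) >> PAGE_SHIFT" in the quoted hunk.
 */
static size_t sketch_subpage_index(uintptr_t addr)
{
	return (size_t)((addr & ~SKETCH_PUD_MASK) >> SKETCH_PAGE_SHIFT);
}

/*
 * Given the descriptor of the first base page of a leaf mapping, return
 * the descriptor of the base page that backs addr.
 */
static struct sketch_page *sketch_leaf_to_page(struct sketch_page *head,
					       uintptr_t addr)
{
	size_t idx = sketch_subpage_index(addr);

	/* The index can never exceed the number of base pages in a leaf. */
	assert(idx < (SKETCH_PUD_SIZE >> SKETCH_PAGE_SHIFT));
	return head + idx;
}
```

The point of the sketch is that the caller gets the descriptor of the
same base page whether the address is mapped by a small PTE or by a huge
leaf entry, which is what makes the API agnostic to the mapping size.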