* [PATCH 00/13] mm/treewide: Remove pXd_huge() API
From: peterx @ 2024-03-13 21:47 UTC
To: linux-kernel, linux-mm
Cc: linux-arm-kernel, Matthew Wilcox, linuxppc-dev, Christophe Leroy,
Andrew Morton, x86, peterx, Mike Rapoport, Muchun Song,
sparclinux, Jason Gunthorpe
From: Peter Xu <peterx@redhat.com>
[based on akpm/mm-unstable latest commit 9af2e4c429b5]
v1:
- Rebase, remove RFC tag
- Fixed powerpc patch build issue, enhancing commit message [Michael]
- Optimize patch 1 & 3 on "none || !present" check [Jason]
In previous work [1], we removed the pXd_large() API, which is arch
specific. This patchset further removes the hugetlb pXd_huge() API.
Hugetlb was never special in how it creates huge mappings compared with
other huge mappings. Having a standalone API just to detect such pgtable
entries is more or less redundant, especially now that the pXd_leaf() API
set is available with or without CONFIG_HUGETLB_PAGE.
Looking at this problem also exposed a few issues: we don't have a clear
definition of the *_huge() API variants. This patchset starts by cleaning
up these issues, then replaces all *_huge() users with *_leaf(), then
drops all *_huge() code.
On x86/sparc, pXd_huge() reports "true" for swap entries, while on all
other archs it reports "false". Patches 1-5 clean this up; I suspect
patch 1 can be seen as a bug fix, but I'll leave that to the HMM experts
to decide.
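To make the swap-entry discrepancy concrete, here is a minimal userspace sketch, not kernel code. The bit layout, the `toy_*` names and the two check styles are all hypothetical; they only model how a "not none and not a plain table" test can report true for a non-present (swap-like) entry, while a "present and leaf" test cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for a toy entry; NOT any real arch's layout. */
#define TOY_PRESENT (UINT64_C(1) << 0)
#define TOY_LEAF    (UINT64_C(1) << 7)	/* "this entry maps a huge page" */

typedef struct { uint64_t val; } toy_pmd_t;

static bool toy_pmd_none(toy_pmd_t pmd)
{
	return pmd.val == 0;
}

static bool toy_pmd_present(toy_pmd_t pmd)
{
	return (pmd.val & TOY_PRESENT) != 0;
}

/*
 * "Loose" style: anything that is neither empty nor a plain present
 * table pointer. A non-present, non-zero (swap-like) entry reports true.
 */
static bool toy_pmd_huge_loose(toy_pmd_t pmd)
{
	return !toy_pmd_none(pmd) &&
	       (pmd.val & (TOY_PRESENT | TOY_LEAF)) != TOY_PRESENT;
}

/* "Strict" style: only present entries with the leaf bit set. */
static bool toy_pmd_leaf(toy_pmd_t pmd)
{
	return toy_pmd_present(pmd) && (pmd.val & TOY_LEAF) != 0;
}

/* Small self-check: the two styles disagree only on the swap-like entry. */
int toy_selfcheck(void)
{
	toy_pmd_t swap_like = { 0x1000 };	/* non-zero, not present */
	toy_pmd_t huge = { TOY_PRESENT | TOY_LEAF | 0x200000 };

	assert(toy_pmd_huge_loose(swap_like) && !toy_pmd_leaf(swap_like));
	assert(toy_pmd_huge_loose(huge) && toy_pmd_leaf(huge));
	return 0;
}
```

A caller that treats a "true" result as "safe to read the entry as a mapping" would misread the swap-like entry under the loose check, which is the kind of trap the series wants to rule out.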
Besides, there are three archs (arm, arm64, powerpc) that have slightly
different definitions for the *_huge() vs. *_leaf() variants. I tackled
them separately so that it'll be easier for arch experts to chime in
when necessary. This part is done in patches 6-9.
The final patches 10-13 do the rest of the removal. Since *_leaf() will be
the ultimate API in the future, and we seem to have quite some confusion
about how the *_huge() APIs should be defined, the last patch provides a
rich comment for the *_leaf() API set to define it properly and avoid
future misuse. Hopefully that will also help new archs that start to
support huge mappings avoid the traps (like swap entries or PROT_NONE
entry checks).
The whole series is only lightly tested on x86; as usual, I don't have
the capability to test all the archs that it touches.
[1] https://lore.kernel.org/r/20240305043750.93762-1-peterx@redhat.com
Peter Xu (13):
mm/hmm: Process pud swap entry without pud_huge()
mm/gup: Cache p4d in follow_p4d_mask()
mm/gup: Check p4d presence before going on
mm/x86: Change pXd_huge() behavior to exclude swap entries
mm/sparc: Change pXd_huge() behavior to exclude swap entries
mm/arm: Use macros to define pmd/pud helpers
mm/arm: Redefine pmd_huge() with pmd_leaf()
mm/arm64: Merge pXd_huge() and pXd_leaf() definitions
mm/powerpc: Redefine pXd_huge() with pXd_leaf()
mm/gup: Merge pXd huge mapping checks
mm/treewide: Replace pXd_huge() with pXd_leaf()
mm/treewide: Remove pXd_huge()
mm: Document pXd_leaf() API
arch/arm/include/asm/pgtable-2level.h | 4 +--
arch/arm/include/asm/pgtable-3level-hwdef.h | 1 +
arch/arm/include/asm/pgtable-3level.h | 6 ++--
arch/arm/mm/Makefile | 1 -
arch/arm/mm/hugetlbpage.c | 34 -------------------
arch/arm64/include/asm/pgtable.h | 6 +++-
arch/arm64/mm/hugetlbpage.c | 18 ++--------
arch/loongarch/mm/hugetlbpage.c | 12 +------
arch/mips/include/asm/pgtable-32.h | 2 +-
arch/mips/include/asm/pgtable-64.h | 2 +-
arch/mips/mm/hugetlbpage.c | 10 ------
arch/mips/mm/tlb-r4k.c | 2 +-
arch/parisc/mm/hugetlbpage.c | 11 ------
.../include/asm/book3s/64/pgtable-4k.h | 20 -----------
.../include/asm/book3s/64/pgtable-64k.h | 25 --------------
arch/powerpc/include/asm/book3s/64/pgtable.h | 27 +++++++--------
arch/powerpc/include/asm/nohash/pgtable.h | 10 ------
arch/powerpc/mm/pgtable_64.c | 6 ++--
arch/riscv/mm/hugetlbpage.c | 10 ------
arch/s390/mm/hugetlbpage.c | 10 ------
arch/sh/mm/hugetlbpage.c | 10 ------
arch/sparc/mm/hugetlbpage.c | 12 -------
arch/x86/mm/hugetlbpage.c | 26 --------------
arch/x86/mm/pgtable.c | 4 +--
include/linux/hugetlb.h | 24 -------------
include/linux/pgtable.h | 24 ++++++++++---
mm/gup.c | 24 ++++++-------
mm/hmm.c | 9 ++---
mm/memory.c | 2 +-
29 files changed, 68 insertions(+), 284 deletions(-)
delete mode 100644 arch/arm/mm/hugetlbpage.c
--
2.44.0
* [PATCH 03/13] mm/gup: Check p4d presence before going on
From: peterx @ 2024-03-13 21:47 UTC
To: linux-kernel, linux-mm
Cc: linux-arm-kernel, Matthew Wilcox, linuxppc-dev, Christophe Leroy,
Andrew Morton, x86, peterx, Mike Rapoport, Muchun Song,
sparclinux, Jason Gunthorpe

From: Peter Xu <peterx@redhat.com>

Currently there should be no p4d swap entries, so it may not matter much.
However, this may help us rule out swap entries in the pXd_huge() API,
which will include p4d_huge(). The p4d_present() checks make it 100% clear
that we won't rely on p4d_huge() for swap entries.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 69a777f4fc5c..802987281b2f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -776,7 +776,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
 	p4dp = p4d_offset(pgdp, address);
 	p4d = READ_ONCE(*p4dp);
-	if (p4d_none(p4d))
+	if (!p4d_present(p4d))
 		return no_page_table(vma, flags);
 	BUILD_BUG_ON(p4d_huge(p4d));
 	if (unlikely(p4d_bad(p4d)))
@@ -3069,7 +3069,7 @@ static int gup_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr, unsigned lo
 		p4d_t p4d = READ_ONCE(*p4dp);
 		next = p4d_addr_end(addr, end);
-		if (p4d_none(p4d))
+		if (!p4d_present(p4d))
 			return 0;
 		BUILD_BUG_ON(p4d_huge(p4d));
 		if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) {
--
2.44.0
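The effect of swapping the gate from a "none" check to a "not present" check can be sketched in userspace as follows. This is only a toy model: the `demo_*` names, the present bit position and the two-valued walk result are hypothetical and do not correspond to real kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PRESENT (UINT64_C(1) << 0)	/* hypothetical present bit */

typedef struct { uint64_t val; } demo_p4d_t;

enum demo_step { DEMO_NO_TABLE, DEMO_DESCEND };

static bool demo_p4d_none(demo_p4d_t e)
{
	return e.val == 0;
}

static bool demo_p4d_present(demo_p4d_t e)
{
	return (e.val & DEMO_PRESENT) != 0;
}

/* Old-style gate: only completely empty entries stop the walk. */
static enum demo_step demo_step_none_check(demo_p4d_t e)
{
	return demo_p4d_none(e) ? DEMO_NO_TABLE : DEMO_DESCEND;
}

/* New-style gate: anything that is not present stops the walk. */
static enum demo_step demo_step_present_check(demo_p4d_t e)
{
	return !demo_p4d_present(e) ? DEMO_NO_TABLE : DEMO_DESCEND;
}

/* Self-check: a non-present, non-zero entry is where the gates differ. */
int demo_selfcheck(void)
{
	demo_p4d_t swap_like = { 0x1000 };

	assert(demo_step_none_check(swap_like) == DEMO_DESCEND);
	assert(demo_step_present_check(swap_like) == DEMO_NO_TABLE);
	return 0;
}
```

With the present-based gate, later checks in the walker never see a non-present entry, so they no longer need to define what they return for one.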
* [PATCH 06/13] mm/arm: Use macros to define pmd/pud helpers
From: peterx @ 2024-03-13 21:47 UTC
To: linux-kernel, linux-mm
Cc: linux-arm-kernel, Matthew Wilcox, linuxppc-dev, Christophe Leroy,
Andrew Morton, x86, peterx, Mike Rapoport, Muchun Song,
sparclinux, Jason Gunthorpe, Russell King, Shawn Guo,
Krzysztof Kozlowski, Bjorn Andersson, Arnd Bergmann,
Konrad Dybcio, Fabio Estevam

From: Peter Xu <peterx@redhat.com>

It's already confusing that ARM 2-level vs. 3-level pgtables define the
SECT bit differently on pmds/puds. Always use a macro, which is much
clearer.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Fabio Estevam <festevam@denx.de>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm/include/asm/pgtable-2level.h | 4 ++--
 arch/arm/include/asm/pgtable-3level-hwdef.h | 1 +
 arch/arm/include/asm/pgtable-3level.h | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index b0a262566eb9..4245c2e74720 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -213,8 +213,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #define pmd_pfn(pmd) (__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
-#define pmd_leaf(pmd) (pmd_val(pmd) & 2)
-#define pmd_bad(pmd) (pmd_val(pmd) & 2)
+#define pmd_leaf(pmd) (pmd_val(pmd) & PMD_TYPE_SECT)
+#define pmd_bad(pmd) pmd_leaf(pmd)
 #define pmd_present(pmd) (pmd_val(pmd))
 #define copy_pmd(pmdpd,pmdps) \
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index 2f35b4eddaa8..e7b666cf0060 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -14,6 +14,7 @@
  * + Level 1/2 descriptor
  * - common
  */
+#define PUD_TABLE_BIT (_AT(pmdval_t, 1) << 1)
 #define PMD_TYPE_MASK (_AT(pmdval_t, 3) << 0)
 #define PMD_TYPE_FAULT (_AT(pmdval_t, 0) << 0)
 #define PMD_TYPE_TABLE (_AT(pmdval_t, 3) << 0)
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 4b1d9eb3908a..e7aecbef75c9 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -112,7 +112,7 @@
 #ifndef __ASSEMBLY__
 #define pud_none(pud) (!pud_val(pud))
-#define pud_bad(pud) (!(pud_val(pud) & 2))
+#define pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
 #define pud_present(pud) (pud_val(pud))
 #define pmd_table(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
 PMD_TYPE_TABLE)
@@ -137,7 +137,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
 	return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
 }
-#define pmd_bad(pmd) (!(pmd_val(pmd) & 2))
+#define pmd_bad(pmd) (!(pmd_val(pmd) & PMD_TABLE_BIT))
 #define copy_pmd(pmdpd,pmdps) \
 do { \
--
2.44.0
* [PATCH 07/13] mm/arm: Redefine pmd_huge() with pmd_leaf()
From: peterx @ 2024-03-13 21:47 UTC
To: linux-kernel, linux-mm
Cc: linux-arm-kernel, Matthew Wilcox, linuxppc-dev, Christophe Leroy,
Andrew Morton, x86, peterx, Mike Rapoport, Muchun Song,
sparclinux, Jason Gunthorpe, Russell King, Shawn Guo,
Krzysztof Kozlowski, Bjorn Andersson, Arnd Bergmann,
Konrad Dybcio, Fabio Estevam

From: Peter Xu <peterx@redhat.com>

Most of the archs already define these two APIs the same way. ARM is more
complicated in two aspects:

- For pXd_huge() it's always checking against !PXD_TABLE_BIT, while for
  pXd_leaf() it's always checking against PXD_TYPE_SECT.

- SECT/TABLE bits are defined differently on 2-level vs. 3-level ARM
  pgtables, which makes the whole thing even harder to follow.

Luckily, the second complexity should be hidden by the pmd_leaf()
implementations in the 2-level vs. 3-level headers. Invoke pmd_leaf()
directly for pmd_huge() to remove the first part of the complexity. This
prepares for dropping the pXd_huge() API globally.

While at it, drop the obsolete comment; it's outdated.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Fabio Estevam <festevam@denx.de>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm/mm/hugetlbpage.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
index dd7a0277c5c0..c2fa643f6bb5 100644
--- a/arch/arm/mm/hugetlbpage.c
+++ b/arch/arm/mm/hugetlbpage.c
@@ -18,11 +18,6 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
-/*
- * On ARM, huge pages are backed by pmd's rather than pte's, so we do a lot
- * of type casting from pmd_t * to pte_t *.
- */
-
 int pud_huge(pud_t pud)
 {
 	return 0;
@@ -30,5 +25,5 @@ int pud_huge(pud_t pud)
 int pmd_huge(pmd_t pmd)
 {
-	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
+	return pmd_leaf(pmd);
 }
--
2.44.0
* [PATCH 08/13] mm/arm64: Merge pXd_huge() and pXd_leaf() definitions
From: peterx @ 2024-03-13 21:47 UTC
To: linux-kernel, linux-mm
Cc: linux-arm-kernel, Matthew Wilcox, linuxppc-dev, Christophe Leroy,
Andrew Morton, x86, peterx, Mike Rapoport, Muchun Song,
sparclinux, Jason Gunthorpe, Mark Salter, Catalin Marinas,
Will Deacon

From: Peter Xu <peterx@redhat.com>

Unlike most archs, aarch64 defines pXd_huge() and pXd_leaf() slightly
differently. Redefine pXd_huge() with pXd_leaf().

There used to be two traps in the old aarch64 definitions of these APIs
that I found when reading the surrounding code:

(1) 4797ec2dc83a ("arm64: fix pud_huge() for 2-level pagetables")
(2) 23bc8f69f0ec ("arm64: mm: fix p?d_leaf()")

Defining pXd_huge() with the current pXd_leaf() makes sure (2) isn't a
problem (on PROT_NONE checks). To make sure it also works for (1), we move
the __PAGETABLE_PMD_FOLDED check over to pud_leaf(), allowing it to
constantly return "false" for 2-level pgtables, which looks even safer
since it now covers both.

Cc: Muchun Song <muchun.song@linux.dev>
Cc: Mark Salter <msalter@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/include/asm/pgtable.h | 4 ++++
 arch/arm64/mm/hugetlbpage.c | 8 ++------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 401087e8a43d..14d24c357c7a 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -704,7 +704,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 #define pud_none(pud) (!pud_val(pud))
 #define pud_bad(pud) (!pud_table(pud))
 #define pud_present(pud) pte_present(pud_pte(pud))
+#ifndef __PAGETABLE_PMD_FOLDED
 #define pud_leaf(pud) (pud_present(pud) && !pud_table(pud))
+#else
+#define pud_leaf(pud) false
+#endif
 #define pud_valid(pud) pte_valid(pud_pte(pud))
 #define pud_user(pud) pte_user(pud_pte(pud))
 #define pud_user_exec(pud) pte_user_exec(pud_pte(pud))
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 0f0e10bb0a95..1234bbaef5bf 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -81,16 +81,12 @@ bool arch_hugetlb_migration_supported(struct hstate *h)
 int pmd_huge(pmd_t pmd)
 {
-	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
+	return pmd_leaf(pmd);
 }
 int pud_huge(pud_t pud)
 {
-#ifndef __PAGETABLE_PMD_FOLDED
-	return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
-#else
-	return 0;
-#endif
+	return pud_leaf(pud);
 }
 static int find_num_contig(struct mm_struct *mm, unsigned long addr,
--
2.44.0
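The shape of the change in this patch can be sketched in userspace as below. It is a simplified model under stated assumptions: `SKETCH_PMD_FOLDED` is a hypothetical stand-in for `__PAGETABLE_PMD_FOLDED`, the valid/table bit positions are invented, and the real arm64 `pud_present()` is more involved than a single bit test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for __PAGETABLE_PMD_FOLDED. Define it to model a
 * configuration where the pmd level is folded away.
 */
/* #define SKETCH_PMD_FOLDED */

#define SKETCH_VALID (UINT64_C(1) << 0)	/* invented bit positions */
#define SKETCH_TABLE (UINT64_C(1) << 1)

typedef struct { uint64_t val; } sketch_pud_t;

#define sketch_pud_present(pud)	(((pud).val & SKETCH_VALID) != 0)
#define sketch_pud_table(pud) \
	(((pud).val & (SKETCH_VALID | SKETCH_TABLE)) == \
	 (SKETCH_VALID | SKETCH_TABLE))

/* The folded case is decided once, inside the leaf helper itself. */
#ifndef SKETCH_PMD_FOLDED
#define sketch_pud_leaf(pud) \
	(sketch_pud_present(pud) && !sketch_pud_table(pud))
#else
#define sketch_pud_leaf(pud)	false
#endif

/* Raw-bit style, as in the removed lines: non-zero and no table bit. */
static bool sketch_pud_huge_old(sketch_pud_t pud)
{
	return pud.val && !(pud.val & SKETCH_TABLE);
}

/* After the change, the huge helper simply forwards to the leaf helper. */
static bool sketch_pud_huge(sketch_pud_t pud)
{
	return sketch_pud_leaf(pud);
}

/* Self-check: the old helper is true for a non-zero, non-valid entry. */
int sketch_selfcheck(void)
{
	sketch_pud_t not_valid = { 0x1000 };

	assert(sketch_pud_huge_old(not_valid));
	assert(!sketch_pud_huge(not_valid));
	return 0;
}
```

Keeping the folded-level decision inside the leaf helper means every caller of the leaf check gets the 2-level behavior for free, instead of each wrapper repeating the `#ifndef`.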
* [PATCH 10/13] mm/gup: Merge pXd huge mapping checks
From: peterx @ 2024-03-13 21:47 UTC
To: linux-kernel, linux-mm
Cc: linux-arm-kernel, Matthew Wilcox, linuxppc-dev, Christophe Leroy,
Andrew Morton, x86, peterx, Mike Rapoport, Muchun Song,
sparclinux, Jason Gunthorpe

From: Peter Xu <peterx@redhat.com>

Huge mapping checks in GUP are slightly redundant and can be simplified.

pXd_huge() now is the same as pXd_leaf(). pmd_trans_huge() and
pXd_devmap() should both imply pXd_leaf(). Time to merge them into one.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 802987281b2f..e2415e9789bc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3005,8 +3005,7 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
 		if (!pmd_present(pmd))
 			return 0;
-		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
-			     pmd_devmap(pmd))) {
+		if (unlikely(pmd_leaf(pmd))) {
 			/* See gup_pte_range() */
 			if (pmd_protnone(pmd))
 				return 0;
@@ -3043,7 +3042,7 @@ static int gup_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr, unsigned lo
 		next = pud_addr_end(addr, end);
 		if (unlikely(!pud_present(pud)))
 			return 0;
-		if (unlikely(pud_huge(pud) || pud_devmap(pud))) {
+		if (unlikely(pud_leaf(pud))) {
 			if (!gup_huge_pud(pud, pudp, addr, next, flags, pages, nr))
 				return 0;
@@ -3096,7 +3095,7 @@ static void gup_pgd_range(unsigned long addr, unsigned long end,
 		next = pgd_addr_end(addr, end);
 		if (pgd_none(pgd))
 			return;
-		if (unlikely(pgd_huge(pgd))) {
+		if (unlikely(pgd_leaf(pgd))) {
 			if (!gup_huge_pgd(pgd, pgdp, addr, next, flags, pages, nr))
 				return;
--
2.44.0
* Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
From: Christophe Leroy @ 2024-03-14 8:45 UTC
To: peterx@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Jason Gunthorpe, Michael Ellerman, Nicholas Piggin,
Aneesh Kumar K.V, Naveen N. Rao

On 13/03/2024 at 22:47, peterx@redhat.com wrote:
> From: Peter Xu <peterx@redhat.com>
>
> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> constantly returns 0 for hash MMUs. As Michael Ellerman pointed out [1],
> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
> it will keep returning false.
>
> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
> such huge mappings for 4K hash MMUs. Meanwhile, the major powerpc hugetlb
> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
> mappings.
>
> The goal should be that we will have one API pXd_leaf() to detect all kinds
> of huge mappings. AFAICT we need to use the pXd_leaf() impl (rather than
> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.

All kinds of huge mappings?

pXd_leaf() will detect only leaf mappings (like pXd_huge()). There are
also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
and 512k huge pages. A PGD entry covers 4M, so pgd_leaf() won't report
those huge pages.

>
> This helps to simplify a follow up patch to drop pXd_huge() treewide.
>
> NOTE: *_leaf() definition need to be moved before the inclusion of
> asm/book3s/64/pgtable-4k.h, which defines pXd_huge() with it.
>
> [1] https://lore.kernel.org/r/87v85zo6w7.fsf@mail.lhotse
>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  .../include/asm/book3s/64/pgtable-4k.h | 14 ++--------
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 27 +++++++++----------
>  2 files changed, 14 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> index 48f21820afe2..92545981bb49 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> @@ -8,22 +8,12 @@
>  #ifdef CONFIG_HUGETLB_PAGE
>  static inline int pmd_huge(pmd_t pmd)
>  {
> -	/*
> -	 * leaf pte for huge page
> -	 */
> -	if (radix_enabled())
> -		return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> -	return 0;
> +	return pmd_leaf(pmd);
>  }
>
>  static inline int pud_huge(pud_t pud)
>  {
> -	/*
> -	 * leaf pte for huge page
> -	 */
> -	if (radix_enabled())
> -		return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> -	return 0;
> +	return pud_leaf(pud);
>  }
>
>  /*
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index df66dce8306f..fd7180fded75 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -262,6 +262,18 @@ extern unsigned long __kernel_io_end;
>
>  extern struct page *vmemmap;
>  extern unsigned long pci_io_base;
> +
> +#define pmd_leaf pmd_leaf
> +static inline bool pmd_leaf(pmd_t pmd)
> +{
> +	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> +}
> +
> +#define pud_leaf pud_leaf
> +static inline bool pud_leaf(pud_t pud)
> +{
> +	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> +}
>  #endif /* __ASSEMBLY__ */
>
>  #include <asm/book3s/64/hash.h>
> @@ -1436,20 +1448,5 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_va
>  	return false;
>  }
>
> -/*
> - * Like pmd_huge(), but works regardless of config options
> - */
> -#define pmd_leaf pmd_leaf
> -static inline bool pmd_leaf(pmd_t pmd)
> -{
> -	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> -}
> -
> -#define pud_leaf pud_leaf
> -static inline bool pud_leaf(pud_t pud)
> -{
> -	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> -}
> -
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
* Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
From: Christophe Leroy @ 2024-03-14 13:11 UTC
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Jason Gunthorpe, Michael Ellerman, Nicholas Piggin,
Aneesh Kumar K.V, Naveen N. Rao

On 14/03/2024 at 13:53, Peter Xu wrote:
> On Thu, Mar 14, 2024 at 08:45:34AM +0000, Christophe Leroy wrote:
>>
>> On 13/03/2024 at 22:47, peterx@redhat.com wrote:
>>> From: Peter Xu <peterx@redhat.com>
>>>
>>> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
>>> constantly returns 0 for hash MMUs. As Michael Ellerman pointed out [1],
>>> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
>>> it will keep returning false.
>>>
>>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
>>> such huge mappings for 4K hash MMUs. Meanwhile, the major powerpc hugetlb
>>> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
>>> mappings.
>>>
>>> The goal should be that we will have one API pXd_leaf() to detect all kinds
>>> of huge mappings. AFAICT we need to use the pXd_leaf() impl (rather than
>>> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
>>
>> All kinds of huge mappings ?
>>
>> pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There are
>> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
>> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report
>> those huge pages.
>
> Ah yes, I should always mention this is in the context of leaf huge pages
> only. Are the examples you provided all fall into hugepd category? If so
> I can reword the commit message, as:

On powerpc 8xx, only the 8M huge pages fall into the hugepd case.

The 512k hugepages are at PTE level; they are handled more or less like
CONT_PTE on ARM. See function set_huge_pte_at() for more context.

You can also look at pte_leaf_size() and pgd_leaf_size().

By the way, pgd_leaf_size() looks odd because it is called only when
pgd_leaf() returns true, which never happens for 8M pages.

>
> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to
> create such huge mappings for 4K hash MMUs. Meanwhile, the major
> powerpc hugetlb pgtable walker __find_linux_pte() already used
> pXd_leaf() to check leaf hugetlb mappings.
>
> The goal should be that we will have one API pXd_leaf() to detect
> all kinds of huge mappings except hugepd. AFAICT we need to use
> the pXd_leaf() impl (rather than pXd_huge() ones) to make sure
> ie. THPs on hash MMU will also return true.
>
> Does this look good to you?
>
> Thanks,
>
* Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
From: Jason Gunthorpe @ 2024-03-18 16:15 UTC
To: Christophe Leroy
Cc: Peter Xu, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Michael Ellerman, Nicholas Piggin, Aneesh Kumar K.V,
Naveen N. Rao

On Thu, Mar 14, 2024 at 01:11:59PM +0000, Christophe Leroy wrote:
>
> On 14/03/2024 at 13:53, Peter Xu wrote:
> > On Thu, Mar 14, 2024 at 08:45:34AM +0000, Christophe Leroy wrote:
> >>
> >> On 13/03/2024 at 22:47, peterx@redhat.com wrote:
> >>> From: Peter Xu <peterx@redhat.com>
> >>>
> >>> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> >>> constantly returns 0 for hash MMUs. As Michael Ellerman pointed out [1],
> >>> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
> >>> it will keep returning false.
> >>>
> >>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
> >>> such huge mappings for 4K hash MMUs. Meanwhile, the major powerpc hugetlb
> >>> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
> >>> mappings.
> >>>
> >>> The goal should be that we will have one API pXd_leaf() to detect all kinds
> >>> of huge mappings. AFAICT we need to use the pXd_leaf() impl (rather than
> >>> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
> >>
> >> All kinds of huge mappings ?
> >>
> >> pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There are
> >> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
> >> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report
> >> those huge pages.
> >
> > Ah yes, I should always mention this is in the context of leaf huge pages
> > only. Are the examples you provided all fall into hugepd category? If so
> > I can reword the commit message, as:
>
> On powerpc 8xx, only the 8M huge pages fall into the hugepd case.
>
> The 512k hugepages are at PTE level, they are handled more or less like
> CONT_PTE on ARM. see function set_huge_pte_at() for more context.
>
> You can also look at pte_leaf_size() and pgd_leaf_size().

IMHO leaf should return false if the thing is pointing to a next level
page table, even if that next level is fully populated with contiguous
pages.

This seems more aligned with the contig page direction that hugepd
should be moved over to.

> By the way pgd_leaf_size() looks odd because it is called only when
> pgd_leaf() returns true, which never happens for 8M pages.

Like this, you should reach the actual final leaf that the HW will
load, and leaf_size() should say it is a greater size than the current
table level. Other levels should return 0.

If necessary the core MM code should deal with this by iterating over
adjacent tables.

Jason
* Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
From: Christophe Leroy @ 2024-03-19 23:07 UTC
To: Jason Gunthorpe
Cc: Peter Xu, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Michael Ellerman, Nicholas Piggin, Aneesh Kumar K.V,
Naveen N. Rao

On 18/03/2024 at 17:15, Jason Gunthorpe wrote:
> On Thu, Mar 14, 2024 at 01:11:59PM +0000, Christophe Leroy wrote:
>>
>> On 14/03/2024 at 13:53, Peter Xu wrote:
>>> On Thu, Mar 14, 2024 at 08:45:34AM +0000, Christophe Leroy wrote:
>>>>
>>>> On 13/03/2024 at 22:47, peterx@redhat.com wrote:
>>>>> From: Peter Xu <peterx@redhat.com>
>>>>>
>>>>> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
>>>>> constantly returns 0 for hash MMUs. As Michael Ellerman pointed out [1],
>>>>> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
>>>>> it will keep returning false.
>>>>>
>>>>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
>>>>> such huge mappings for 4K hash MMUs. Meanwhile, the major powerpc hugetlb
>>>>> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
>>>>> mappings.
>>>>>
>>>>> The goal should be that we will have one API pXd_leaf() to detect all kinds
>>>>> of huge mappings. AFAICT we need to use the pXd_leaf() impl (rather than
>>>>> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
>>>>
>>>> All kinds of huge mappings ?
>>>>
>>>> pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There are
>>>> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
>>>> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report
>>>> those huge pages.
>>>
>>> Ah yes, I should always mention this is in the context of leaf huge pages
>>> only. Are the examples you provided all fall into hugepd category? If so
>>> I can reword the commit message, as:
>>
>> On powerpc 8xx, only the 8M huge pages fall into the hugepd case.
>>
>> The 512k hugepages are at PTE level, they are handled more or less like
>> CONT_PTE on ARM. see function set_huge_pte_at() for more context.
>>
>> You can also look at pte_leaf_size() and pgd_leaf_size().
>
> IMHO leaf should return false if the thing is pointing to a next level
> page table, even if that next level is fully populated with contiguous
> pages.
>
> This seems more aligned with the contig page direction that hugepd
> should be moved over to..

Should hugepd really be moved in the contig page direction?

Would it be acceptable that an 8M hugepage requires 2048 contig entries
in 2 page tables, when hugepd allows a single entry? Would it be
acceptable performance-wise?

>
>> By the way pgd_leaf_size() looks odd because it is called only when
>> pgd_leaf() returns true, which never happens for 8M pages.
>
> Like this, you should reach the actual final leaf that the HW will
> load and leaf_size() should say it is greater size than the current
> table level. Other levels should return 0.
>
> If necessary the core MM code should deal with this by iterating over
> adjacent tables.
>
> Jason
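The "2048 contig entries in 2 page tables" figure follows from the sizes quoted in this thread. A minimal arithmetic sketch, assuming a 4K base page size (the thread states the 8M huge page and the 4M PGD span; the base page size and the `ex_*` names are assumptions made for illustration):

```c
#include <assert.h>

/* Sizes from the discussion; the 4K base page size is an assumption. */
#define EX_BASE_PAGE	(4UL * 1024)		/* assumed base page size */
#define EX_PGD_SPAN	(4UL * 1024 * 1024)	/* "A PGD entry covers 4M" */
#define EX_HUGE_PAGE	(8UL * 1024 * 1024)	/* 8M huge page */

/* PTE-level entries needed to map one huge page with base-size PTEs. */
static unsigned long ex_ptes_per_huge_page(void)
{
	return EX_HUGE_PAGE / EX_BASE_PAGE;
}

/* Page tables (one per PGD entry) spanned by one huge page. */
static unsigned long ex_tables_per_huge_page(void)
{
	return EX_HUGE_PAGE / EX_PGD_SPAN;
}

/* PTE-level entries held by a single page table. */
static unsigned long ex_ptes_per_table(void)
{
	return EX_PGD_SPAN / EX_BASE_PAGE;
}

/* Self-check: the per-table count times the table count gives the total. */
int ex_selfcheck(void)
{
	assert(ex_ptes_per_table() * ex_tables_per_huge_page() ==
	       ex_ptes_per_huge_page());
	return 0;
}
```

Under these assumptions one 8M huge page spans two PGD entries, each pointing at a table of 1024 entries, which is where the 2048 total comes from.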
* Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
2024-03-19 23:07 ` Christophe Leroy
@ 2024-03-19 23:26 ` Jason Gunthorpe
2024-03-20 6:16 ` Christophe Leroy
0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2024-03-19 23:26 UTC (permalink / raw)
To: Christophe Leroy
Cc: Peter Xu, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Michael Ellerman, Nicholas Piggin, Aneesh Kumar K.V, Naveen N. Rao

On Tue, Mar 19, 2024 at 11:07:08PM +0000, Christophe Leroy wrote:
>
>
> On 18/03/2024 at 17:15, Jason Gunthorpe wrote:
> > On Thu, Mar 14, 2024 at 01:11:59PM +0000, Christophe Leroy wrote:
> >>
> >>
> >> On 14/03/2024 at 13:53, Peter Xu wrote:
> >>> On Thu, Mar 14, 2024 at 08:45:34AM +0000, Christophe Leroy wrote:
> >>>>
> >>>>
> >>>> On 13/03/2024 at 22:47, peterx@redhat.com wrote:
> >>>>> From: Peter Xu <peterx@redhat.com>
> >>>>>
> >>>>> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> >>>>> constantly returns 0 for hash MMUs. As Michael Ellerman pointed out [1],
> >>>>> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
> >>>>> it will keep returning false.
> >>>>>
> >>>>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
> >>>>> such huge mappings for 4K hash MMUs. Meanwhile, the major powerpc hugetlb
> >>>>> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
> >>>>> mappings.
> >>>>>
> >>>>> The goal should be that we will have one API pXd_leaf() to detect all kinds
> >>>>> of huge mappings. AFAICT we need to use the pXd_leaf() impl (rather than
> >>>>> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
> >>>>
> >>>> All kinds of huge mappings?
> >>>>
> >>>> pXd_leaf() will detect only leaf mappings (like pXd_huge()). There are
> >>>> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
> >>>> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report
> >>>> those huge pages.
> >>>
> >>> Ah yes, I should always mention this is in the context of leaf huge pages
> >>> only. Are the examples you provided all fall into hugepd category? If so
> >>> I can reword the commit message, as:
> >>
> >> On powerpc 8xx, only the 8M huge pages fall into the hugepd case.
> >>
> >> The 512k hugepages are at PTE level, they are handled more or less like
> >> CONT_PTE on ARM. see function set_huge_pte_at() for more context.
> >>
> >> You can also look at pte_leaf_size() and pgd_leaf_size().
> >
> > IMHO leaf should return false if the thing is pointing to a next level
> > page table, even if that next level is fully populated with contiguous
> > pages.
> >
> > This seems more aligned with the contig page direction that hugepd
> > should be moved over to..
>
> Should hugepd be moved to the contig page direction, really?

Sure? Is there any downside for the reading side to do so?

> Would it be acceptable that an 8M hugepage requires 2048 contig entries
> in 2 page tables, when the hugepd allows a single entry?

? I thought we agreed the only difference would be that something new
is needed to merge the two identical sibling page tables into one, ie
you pay 2x the page table memory if that isn't fixed. That is a write
side only change and I imagine it could be done with a single PPC
special API.

Honestly not totally sure that is a big deal, it is already really
memory inefficient compared to every other arch's huge page by needing
the child page table in the first place.

> Would it be acceptable performance-wise?

Isn't this particular PPC sub platform ancient? Are there current real
users that are going to have hugetlbfs special code and care about
this performance detail on a 6.20 era kernel?

In today's world wouldn't it be better for performance if these platforms
could support THP by aligning to the contig API instead of being
special?

Am I wrong to question why we are polluting the core code for this
special optimization?

Jason
* Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
2024-03-19 23:26 ` Jason Gunthorpe
@ 2024-03-20 6:16 ` Christophe Leroy
[not found] ` <ZfsKIResY4YcxkxK@x1n>
0 siblings, 1 reply; 18+ messages in thread
From: Christophe Leroy @ 2024-03-20 6:16 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Peter Xu, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Michael Ellerman, Nicholas Piggin, Aneesh Kumar K.V, Naveen N. Rao

On 20/03/2024 at 00:26, Jason Gunthorpe wrote:
> On Tue, Mar 19, 2024 at 11:07:08PM +0000, Christophe Leroy wrote:
>>
>>
>> On 18/03/2024 at 17:15, Jason Gunthorpe wrote:
>>> On Thu, Mar 14, 2024 at 01:11:59PM +0000, Christophe Leroy wrote:
>>>>
>>>>
>>>> On 14/03/2024 at 13:53, Peter Xu wrote:
>>>>> On Thu, Mar 14, 2024 at 08:45:34AM +0000, Christophe Leroy wrote:
>>>>>>
>>>>>>
>>>>>> On 13/03/2024 at 22:47, peterx@redhat.com wrote:
>>>>>>> From: Peter Xu <peterx@redhat.com>
>>>>>>>
>>>>>>> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
>>>>>>> constantly returns 0 for hash MMUs. As Michael Ellerman pointed out [1],
>>>>>>> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
>>>>>>> it will keep returning false.
>>>>>>>
>>>>>>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
>>>>>>> such huge mappings for 4K hash MMUs. Meanwhile, the major powerpc hugetlb
>>>>>>> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
>>>>>>> mappings.
>>>>>>>
>>>>>>> The goal should be that we will have one API pXd_leaf() to detect all kinds
>>>>>>> of huge mappings. AFAICT we need to use the pXd_leaf() impl (rather than
>>>>>>> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
>>>>>>
>>>>>> All kinds of huge mappings?
>>>>>>
>>>>>> pXd_leaf() will detect only leaf mappings (like pXd_huge()). There are
>>>>>> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
>>>>>> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report
>>>>>> those huge pages.
>>>>>
>>>>> Ah yes, I should always mention this is in the context of leaf huge pages
>>>>> only. Are the examples you provided all fall into hugepd category? If so
>>>>> I can reword the commit message, as:
>>>>
>>>> On powerpc 8xx, only the 8M huge pages fall into the hugepd case.
>>>>
>>>> The 512k hugepages are at PTE level, they are handled more or less like
>>>> CONT_PTE on ARM. see function set_huge_pte_at() for more context.
>>>>
>>>> You can also look at pte_leaf_size() and pgd_leaf_size().
>>>
>>> IMHO leaf should return false if the thing is pointing to a next level
>>> page table, even if that next level is fully populated with contiguous
>>> pages.
>>>
>>> This seems more aligned with the contig page direction that hugepd
>>> should be moved over to..
>>
>> Should hugepd be moved to the contig page direction, really?
>
> Sure? Is there any downside for the reading side to do so?

Probably not.

>
>> Would it be acceptable that an 8M hugepage requires 2048 contig entries
>> in 2 page tables, when the hugepd allows a single entry?
>
> ? I thought we agreed the only difference would be that something new
> is needed to merge the two identical sibling page tables into one, ie
> you pay 2x the page table memory if that isn't fixed. That is a write
> side only change and I imagine it could be done with a single PPC
> special API.
>
> Honestly not totally sure that is a big deal, it is already really
> memory inefficient compared to every other arch's huge page by needing
> the child page table in the first place.
>
>> Would it be acceptable performance-wise?
>
> Isn't this particular PPC sub platform ancient? Are there current real
> users that are going to have hugetlbfs special code and care about
> this performance detail on a 6.20 era kernel?

Ancient, yes, but still widely in use, and with the emergence of voice over
IP in Air Traffic Control, performance is becoming more and more of a
challenge on those old boards that have another 10 years in front of them.

>
> In today's world wouldn't it be better for performance if these platforms
> could support THP by aligning to the contig API instead of being
> special?

Indeed, if we can promote THP that'd be even better.

>
> Am I wrong to question why we are polluting the core code for this
> special optimization?

In the first place that was to get a close fit between the hardware
pagetable topology and the Linux pagetable topology. But obviously we
already stepped back for 512k pages, so let's go one more step aside and
do similar with 8M pages.

I'll give it a try and see how it goes.

Christophe
[parent not found: <ZfsKIResY4YcxkxK@x1n>]
* Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
[not found] ` <ZfsKIResY4YcxkxK@x1n>
@ 2024-03-20 17:40 ` Christophe Leroy
2024-03-20 20:24 ` Peter Xu
0 siblings, 1 reply; 18+ messages in thread
From: Christophe Leroy @ 2024-03-20 17:40 UTC (permalink / raw)
To: Peter Xu, Aneesh Kumar K.V
Cc: Jason Gunthorpe, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Michael Ellerman, Nicholas Piggin, Naveen N. Rao

On 20/03/2024 at 17:09, Peter Xu wrote:
> On Wed, Mar 20, 2024 at 06:16:43AM +0000, Christophe Leroy wrote:
>> In the first place that was to get a close fit between the hardware
>> pagetable topology and the Linux pagetable topology. But obviously we
>> already stepped back for 512k pages, so let's go one more step aside and
>> do similar with 8M pages.
>>
>> I'll give it a try and see how it goes.
>
> So you're talking about 8M only for 8xx, am I right?

Yes I am.

>
> There seem to be other PowerPC systems use hugepd. Is it possible that we
> convert all hugepd into cont_pte form?

Indeed.

Seems like we have hugepd for book3s/64 and for nohash.

For book3s I don't know, maybe Aneesh can answer.

For nohash I think it should be possible because TLB misses are handled
by software. Even the e6500, which has a hardware tablewalk, falls back on
a software walk when it is a hugepage IIUC.

Christophe
* Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
2024-03-20 17:40 ` Christophe Leroy
@ 2024-03-20 20:24 ` Peter Xu
0 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2024-03-20 20:24 UTC (permalink / raw)
To: Christophe Leroy
Cc: Aneesh Kumar K.V, Jason Gunthorpe, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
Matthew Wilcox, linuxppc-dev@lists.ozlabs.org, Andrew Morton,
x86@kernel.org, Mike Rapoport, Muchun Song,
sparclinux@vger.kernel.org, Michael Ellerman, Nicholas Piggin,
Naveen N. Rao

On Wed, Mar 20, 2024 at 05:40:39PM +0000, Christophe Leroy wrote:
>
>
> On 20/03/2024 at 17:09, Peter Xu wrote:
> > On Wed, Mar 20, 2024 at 06:16:43AM +0000, Christophe Leroy wrote:
> >> In the first place that was to get a close fit between the hardware
> >> pagetable topology and the Linux pagetable topology. But obviously we
> >> already stepped back for 512k pages, so let's go one more step aside and
> >> do similar with 8M pages.
> >>
> >> I'll give it a try and see how it goes.
> >
> > So you're talking about 8M only for 8xx, am I right?
>
> Yes I am.
>
> >
> > There seem to be other PowerPC systems use hugepd. Is it possible that we
> > convert all hugepd into cont_pte form?
>
> Indeed.
>
> Seems like we have hugepd for book3s/64 and for nohash.
>
> For book3s I don't know, maybe Aneesh can answer.
>
> For nohash I think it should be possible because TLB misses are handled
> by software. Even the e6500, which has a hardware tablewalk, falls back on
> a software walk when it is a hugepage IIUC.

It'll be great if I can get some answer here, and then I know the path for
hugepd in general. I don't want to add any new code into core mm for
something destined to fade away soon.

One option for me is to check a macro for hugepd existence, so all new
code will only work when hugepd is not supported on such arch. However
that'll start to make some PowerPC systems special (which I still tried
hard to avoid, if that wasn't proved in the past..), meanwhile we'll also
need to keep some generic-mm paths (that I can already remove along with
the new code) only for these hugepd systems. But it's still okay to me,
it'll be just a matter of when to drop that code, sooner or later.

Thanks,

--
Peter Xu
[parent not found: <20240313214719.253873-12-peterx@redhat.com>]
* Re: [PATCH 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf()
[not found] ` <20240313214719.253873-12-peterx@redhat.com>
@ 2024-03-14 8:50 ` Christophe Leroy
2024-03-14 12:59 ` Peter Xu
0 siblings, 1 reply; 18+ messages in thread
From: Christophe Leroy @ 2024-03-14 8:50 UTC (permalink / raw)
To: peterx@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Jason Gunthorpe

On 13/03/2024 at 22:47, peterx@redhat.com wrote:
> From: Peter Xu <peterx@redhat.com>
>
> Now after we're sure all pXd_huge() definitions are the same as pXd_leaf(),
> reuse it. Luckily, pXd_huge() isn't widely used.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  arch/arm/include/asm/pgtable-3level.h | 2 +-
>  arch/arm64/include/asm/pgtable.h      | 2 +-
>  arch/arm64/mm/hugetlbpage.c           | 4 ++--
>  arch/loongarch/mm/hugetlbpage.c       | 2 +-
>  arch/mips/mm/tlb-r4k.c                | 2 +-
>  arch/powerpc/mm/pgtable_64.c          | 6 +++---
>  arch/x86/mm/pgtable.c                 | 4 ++--
>  mm/gup.c                              | 4 ++--
>  mm/hmm.c                              | 2 +-
>  mm/memory.c                           | 2 +-
>  10 files changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index e7aecbef75c9..9e3c44f0aea2 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -190,7 +190,7 @@ static inline pte_t pte_mkspecial(pte_t pte)
>  #define pmd_dirty(pmd)		(pmd_isset((pmd), L_PMD_SECT_DIRTY))
>
>  #define pmd_hugewillfault(pmd)	(!pmd_young(pmd) || !pmd_write(pmd))
> -#define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
> +#define pmd_thp_or_huge(pmd)	(pmd_leaf(pmd) || pmd_trans_huge(pmd))

Previous patch said pmd_trans_huge() implies pmd_leaf().

Or is that only for GUP?

>
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !pmd_table(pmd))
> diff --git a/mm/hmm.c b/mm/hmm.c
> index c95b9ec5d95f..93aebd9cc130 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -429,7 +429,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>  		return hmm_vma_walk_hole(start, end, -1, walk);
>  	}
>
> -	if (pud_huge(pud) && pud_devmap(pud)) {
> +	if (pud_leaf(pud) && pud_devmap(pud)) {

Didn't the previous patch say devmap implies leaf? Or is it only for GUP?

>  		unsigned long i, npages, pfn;
>  		unsigned int required_fault;
>  		unsigned long *hmm_pfns;
* Re: [PATCH 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf()
2024-03-14 8:50 ` [PATCH 11/13] mm/treewide: Replace " Christophe Leroy
@ 2024-03-14 12:59 ` Peter Xu
2024-03-18 16:16 ` Jason Gunthorpe
0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2024-03-14 12:59 UTC (permalink / raw)
To: Christophe Leroy
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Jason Gunthorpe

On Thu, Mar 14, 2024 at 08:50:20AM +0000, Christophe Leroy wrote:
>
>
> On 13/03/2024 at 22:47, peterx@redhat.com wrote:
> > From: Peter Xu <peterx@redhat.com>
> >
> > Now after we're sure all pXd_huge() definitions are the same as pXd_leaf(),
> > reuse it. Luckily, pXd_huge() isn't widely used.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  arch/arm/include/asm/pgtable-3level.h | 2 +-
> >  arch/arm64/include/asm/pgtable.h      | 2 +-
> >  arch/arm64/mm/hugetlbpage.c           | 4 ++--
> >  arch/loongarch/mm/hugetlbpage.c       | 2 +-
> >  arch/mips/mm/tlb-r4k.c                | 2 +-
> >  arch/powerpc/mm/pgtable_64.c          | 6 +++---
> >  arch/x86/mm/pgtable.c                 | 4 ++--
> >  mm/gup.c                              | 4 ++--
> >  mm/hmm.c                              | 2 +-
> >  mm/memory.c                           | 2 +-
> >  10 files changed, 15 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> > index e7aecbef75c9..9e3c44f0aea2 100644
> > --- a/arch/arm/include/asm/pgtable-3level.h
> > +++ b/arch/arm/include/asm/pgtable-3level.h
> > @@ -190,7 +190,7 @@ static inline pte_t pte_mkspecial(pte_t pte)
> >  #define pmd_dirty(pmd)		(pmd_isset((pmd), L_PMD_SECT_DIRTY))
> >
> >  #define pmd_hugewillfault(pmd)	(!pmd_young(pmd) || !pmd_write(pmd))
> > -#define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
> > +#define pmd_thp_or_huge(pmd)	(pmd_leaf(pmd) || pmd_trans_huge(pmd))
>
> Previous patch said pmd_trans_huge() implies pmd_leaf().

Ah here I remember I kept this arm definition there because I think we
should add a patch to drop pmd_thp_or_huge() completely. If you don't mind
I can add one more patch instead of doing it here. Then I keep this patch
purely as a replacement patch without further changes on arch-cleanups.

>
> Or is that only for GUP?

I think it should apply to all.

>
> >
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !pmd_table(pmd))
>
>
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index c95b9ec5d95f..93aebd9cc130 100644
> > --- a/mm/hmm.c
> > +++ b/mm/hmm.c
> > @@ -429,7 +429,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
> >  		return hmm_vma_walk_hole(start, end, -1, walk);
> >  	}
> >
> > -	if (pud_huge(pud) && pud_devmap(pud)) {
> > +	if (pud_leaf(pud) && pud_devmap(pud)) {
>
> Didn't the previous patch say devmap implies leaf? Or is it only for GUP?

This is an extra safety check that I didn't remove. Devmap used separate
bits even though I'm not clear on why. It should still imply a leaf though.

Thanks,

>
> >  		unsigned long i, npages, pfn;
> >  		unsigned int required_fault;
> >  		unsigned long *hmm_pfns;
>

--
Peter Xu
* Re: [PATCH 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf()
2024-03-14 12:59 ` Peter Xu
@ 2024-03-18 16:16 ` Jason Gunthorpe
0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2024-03-18 16:16 UTC (permalink / raw)
To: Peter Xu
Cc: Christophe Leroy, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org

On Thu, Mar 14, 2024 at 08:59:22AM -0400, Peter Xu wrote:

> > > --- a/mm/hmm.c
> > > +++ b/mm/hmm.c
> > > @@ -429,7 +429,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
> > >  		return hmm_vma_walk_hole(start, end, -1, walk);
> > >  	}
> > >
> > > -	if (pud_huge(pud) && pud_devmap(pud)) {
> > > +	if (pud_leaf(pud) && pud_devmap(pud)) {
> >
> > Didn't previous patch say devmap implies leaf ? Or is it only for GUP ?
>
> This is an extra safety check that I didn't remove. Devmap used separate
> bits even though I'm not clear on why. It should still imply a leaf though.

Yes, something is very wrong if devmap is true on non-leaf..

Jason
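The "devmap implies leaf" point in the exchange above can be illustrated with
a toy model. This is a sketch under loud assumptions: the entry type, the flag
bits and every toy_* name are hypothetical and do not correspond to any
architecture's real page table layout or to the kernel's pud_leaf() and
pud_devmap() implementations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for a toy entry; not a real hardware layout. */
#define TOY_LEAF   (1u << 0)
#define TOY_DEVMAP (1u << 1)

typedef struct { unsigned int val; } toy_pud_t;

static bool toy_pud_leaf(toy_pud_t pud)
{
	return (pud.val & TOY_LEAF) != 0;
}

static bool toy_pud_devmap(toy_pud_t pud)
{
	return (pud.val & TOY_DEVMAP) != 0;
}

/* The invariant from the thread: a devmap entry must also be a leaf. */
static bool toy_pud_valid(toy_pud_t pud)
{
	return !toy_pud_devmap(pud) || toy_pud_leaf(pud);
}

/* Shape of the check kept in mm/hmm.c: leaf test plus devmap test. */
static bool toy_is_devmap_leaf(toy_pud_t pud)
{
	return toy_pud_leaf(pud) && toy_pud_devmap(pud);
}
```

For every entry that satisfies toy_pud_valid(), toy_is_devmap_leaf() returns
the same value as toy_pud_devmap() alone, which is why the leaf test reads as
an extra safety check rather than a functional requirement.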
[parent not found: <20240313214719.253873-13-peterx@redhat.com>]
* Re: [PATCH 12/13] mm/treewide: Remove pXd_huge()
[not found] ` <20240313214719.253873-13-peterx@redhat.com>
@ 2024-03-14 8:56 ` Christophe Leroy
0 siblings, 0 replies; 18+ messages in thread
From: Christophe Leroy @ 2024-03-14 8:56 UTC (permalink / raw)
To: peterx@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org, Matthew Wilcox,
linuxppc-dev@lists.ozlabs.org, Andrew Morton, x86@kernel.org,
Mike Rapoport, Muchun Song, sparclinux@vger.kernel.org,
Jason Gunthorpe

On 13/03/2024 at 22:47, peterx@redhat.com wrote:
> From: Peter Xu <peterx@redhat.com>
>
> This API is not used anymore, drop it for the whole tree.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  arch/arm/mm/Makefile                          |  1 -
>  arch/arm/mm/hugetlbpage.c                     | 29 -------------------
>  arch/arm64/mm/hugetlbpage.c                   | 10 -------
>  arch/loongarch/mm/hugetlbpage.c               | 10 -------
>  arch/mips/include/asm/pgtable-32.h            |  2 +-
>  arch/mips/include/asm/pgtable-64.h            |  2 +-
>  arch/mips/mm/hugetlbpage.c                    | 10 -------
>  arch/parisc/mm/hugetlbpage.c                  | 11 -------
>  .../include/asm/book3s/64/pgtable-4k.h        | 10 -------
>  .../include/asm/book3s/64/pgtable-64k.h       | 25 ----------------
>  arch/powerpc/include/asm/nohash/pgtable.h     | 10 -------
>  arch/riscv/mm/hugetlbpage.c                   | 10 -------
>  arch/s390/mm/hugetlbpage.c                    | 10 -------
>  arch/sh/mm/hugetlbpage.c                      | 10 -------
>  arch/sparc/mm/hugetlbpage.c                   | 10 -------
>  arch/x86/mm/hugetlbpage.c                     | 16 ----------
>  include/linux/hugetlb.h                       | 24 ---------------
>  17 files changed, 2 insertions(+), 198 deletions(-)
>  delete mode 100644 arch/arm/mm/hugetlbpage.c
>
> diff --git a/arch/mips/include/asm/pgtable-32.h b/arch/mips/include/asm/pgtable-32.h
> index 0e196650f4f4..92b7591aac2a 100644
> --- a/arch/mips/include/asm/pgtable-32.h
> +++ b/arch/mips/include/asm/pgtable-32.h
> @@ -129,7 +129,7 @@ static inline int pmd_none(pmd_t pmd)
>  static inline int pmd_bad(pmd_t pmd)
>  {
>  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
> -	/* pmd_huge(pmd) but inline */
> +	/* pmd_leaf(pmd) but inline */

Shouldn't this comment have been changed in patch 11?

>  	if (unlikely(pmd_val(pmd) & _PAGE_HUGE))

Unlike pmd_huge(), which is an out-of-line function, pmd_leaf() is a macro,
so it could be used here instead of open-coding.

>  		return 0;
>  #endif
> diff --git a/arch/mips/include/asm/pgtable-64.h b/arch/mips/include/asm/pgtable-64.h
> index 20ca48c1b606..7c28510b3768 100644
> --- a/arch/mips/include/asm/pgtable-64.h
> +++ b/arch/mips/include/asm/pgtable-64.h
> @@ -245,7 +245,7 @@ static inline int pmd_none(pmd_t pmd)
>  static inline int pmd_bad(pmd_t pmd)
>  {
>  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
> -	/* pmd_huge(pmd) but inline */
> +	/* pmd_leaf(pmd) but inline */

Same

>  	if (unlikely(pmd_val(pmd) & _PAGE_HUGE))

Same

>  		return 0;
>  #endif
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> index 2fce3498b000..579a7153857f 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> @@ -4,31 +4,6 @@
>
>  #ifndef __ASSEMBLY__
>  #ifdef CONFIG_HUGETLB_PAGE
> -/*
> - * We have PGD_INDEX_SIZ = 12 and PTE_INDEX_SIZE = 8, so that we can have
> - * 16GB hugepage pte in PGD and 16MB hugepage pte at PMD;
> - *
> - * Defined in such a way that we can optimize away code block at build time
> - * if CONFIG_HUGETLB_PAGE=n.
> - *
> - * returns true for pmd migration entries, THP, devmap, hugetlb
> - * But compile time dependent on CONFIG_HUGETLB_PAGE
> - */

Should we keep this comment somewhere for documentation?

> -static inline int pmd_huge(pmd_t pmd)
> -{
> -	/*
> -	 * leaf pte for huge page
> -	 */
> -	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> -}
> -
> -static inline int pud_huge(pud_t pud)
> -{
> -	/*
> -	 * leaf pte for huge page
> -	 */
> -	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> -}
>
>  /*
>   * With 64k page size, we have hugepage ptes in the pgd and pmd entries. We don't
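The review comment above, that a macro form of pmd_leaf() could replace the
open-coded bit test inside pmd_bad(), can be sketched with a toy model. All
toy_* names, the bit value chosen for the huge flag, the 4K page mask and the
simplified "low bits set means bad" rule are assumptions made for
illustration; the real MIPS definitions are not reproduced here.

```c
#include <assert.h>

/* Hypothetical entry type, flag bit and page mask; real MIPS values differ. */
typedef struct { unsigned long pmd; } toy_pmd_t;

#define TOY_PAGE_HUGE   (1UL << 4)
#define TOY_PAGE_MASK   (~(4096UL - 1))
#define toy_pmd_val(x)  ((x).pmd)
#define toy_pmd_leaf(x) ((toy_pmd_val(x) & TOY_PAGE_HUGE) != 0)

/* Open-coded variant, shaped like the quoted hunk. */
static inline int toy_pmd_bad_open_coded(toy_pmd_t pmd)
{
	/* toy_pmd_leaf(pmd) but inline */
	if (toy_pmd_val(pmd) & TOY_PAGE_HUGE)
		return 0;
	return (toy_pmd_val(pmd) & ~TOY_PAGE_MASK) != 0;
}

/* Variant suggested in the review: reuse the macro instead. */
static inline int toy_pmd_bad_with_macro(toy_pmd_t pmd)
{
	if (toy_pmd_leaf(pmd))
		return 0;
	return (toy_pmd_val(pmd) & ~TOY_PAGE_MASK) != 0;
}
```

Because toy_pmd_leaf() is a macro, both variants expand to the same bit test,
so the second form loses nothing while dropping the "but inline" comment that
has to be kept in sync by hand.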