All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Rich Felker <dalias@libc.org>,
	linux-ia64@vger.kernel.org,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	linux-sh@vger.kernel.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linux-mm@kvack.org, Paul Mackerras <paulus@samba.org>,
	linux-hexagon@vger.kernel.org, Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu, Jonas Bonn <jonas@southpole.se>,
	linux-arch@vger.kernel.org, Brian Cain <bcain@codeaurora.org>,
	Marc Zyngier <maz@kernel.org>,
	Russell King <linux@armlinux.org.uk>,
	Ley Foon Tan <ley.foon.tan@intel.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	uclinux-h8-devel@lists.sourceforge.jp,
	Fenghua Yu <fenghua.yu@intel.com>, Arnd Bergmann <arnd@arndb.de>,
	kvm-ppc@vger.kernel.org,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	openrisc@lists.librecores.org, Stafford Horne <shorne@gmail.com>,
	Guan Xuetao <gxt@pku.edu.cn>,
	linux-arm-kernel@lists.infradead.org,
	Tony Luck <tony.luck@intel.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	linux-kernel@vger.kernel.org,
	Michael Ellerman <mpe@ellerman.id.au>,
	nios2-dev@lists.rocketboards.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 07/13] powerpc: add support for folded p4d page tables
Date: Tue, 18 Feb 2020 10:54:40 +0000	[thread overview]
Message-ID: <20200218105440.GA1698@hump> (raw)
In-Reply-To: <c79b363c-a111-389a-5752-51cf85fa8c44@c-s.fr>

On Sun, Feb 16, 2020 at 11:41:07AM +0100, Christophe Leroy wrote:
> 
> 
> Le 16/02/2020 à 09:18, Mike Rapoport a écrit :
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Implement primitives necessary for the 4th level folding, add walks of p4d
> > level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.
> 
> I don't think it is worth adding all this additionnals walks of p4d, this
> patch could be limited to changes like:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);
> 
> The additionnal walks should be added through another patch the day powerpc
> need them.

Ok, I'll update the patch to reduce walking the p4d.
 
> See below for more comments.
> 
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx
> > ---

...

> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 201a69e6a355..ddddbafff0ab 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> >   #define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> > -#include <asm-generic/5level-fixup.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >   #ifndef __ASSEMBLY__
> >   #include <linux/mmdebug.h>
> > @@ -251,7 +251,7 @@ extern unsigned long __pmd_frag_size_shift;
> >   /* Bits to mask out from a PUD to get to the PMD page */
> >   #define PUD_MASKED_BITS		0xc0000000000000ffUL
> >   /* Bits to mask out from a PGD to get to the PUD page */
> > -#define PGD_MASKED_BITS		0xc0000000000000ffUL
> > +#define P4D_MASKED_BITS		0xc0000000000000ffUL
> >   /*
> >    * Used as an indicator for rcu callback functions
> > @@ -949,54 +949,60 @@ static inline bool pud_access_permitted(pud_t pud, bool write)
> >   	return pte_access_permitted(pud_pte(pud), write);
> >   }
> > -#define pgd_write(pgd)		pte_write(pgd_pte(pgd))
> > +#define __p4d_raw(x)	((p4d_t) { __pgd_raw(x) })
> > +static inline __be64 p4d_raw(p4d_t x)
> > +{
> > +	return pgd_raw(x.pgd);
> > +}
> > +
> 
> Shouldn't this be defined in asm/pgtable-be-types.h, just like other
> __pxx_raw() ?

Ideally yes, but this creates weird header file dependencies and untangling
them would generate way too much churn.
 
> > +#define p4d_write(p4d)		pte_write(p4d_pte(p4d))
> > -static inline void pgd_clear(pgd_t *pgdp)
> > +static inline void p4d_clear(p4d_t *p4dp)
> >   {
> > -	*pgdp = __pgd(0);
> > +	*p4dp = __p4d(0);
> >   }

...

> > @@ -573,9 +596,15 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Traverse the guest's 2nd-level tree, allocate new levels needed */
> >   	pgd = pgtable + pgd_index(gpa);
> > -	pud = NULL;
> > +	p4d = NULL;
> >   	if (pgd_present(*pgd))
> > -		pud = pud_offset(pgd, gpa);
> > +		p4d = p4d_offset(pgd, gpa);
> > +	else
> > +		new_p4d = p4d_alloc_one(kvm->mm, gpa);
> > +
> > +	pud = NULL;
> > +	if (p4d_present(*p4d))
> > +		pud = pud_offset(p4d, gpa);
> 
> Is it worth adding all this new code ?
> 
> My understanding is that the series objective is to get rid of
> __ARCH_HAS_5LEVEL_HACK, to to add support for 5 levels to an architecture
> that not need it (at least for now).
> If we want to add support for 5 levels, it can be done later in another
> patch.
> 
> Here I think your change could be limited to:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);

This won't work. Without __ARCH_USE_5LEVEL_HACK defined pgd_present() is
hardwired to 1 and the actual check for the top level is performed with
p4d_present(). The 'else' clause that allocates p4d will never be taken and
it could be removed, but I prefer to keep it for consistency.
 
> >   	else
> >   		new_pud = pud_alloc_one(kvm->mm, gpa);
> > @@ -597,12 +626,18 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Now traverse again under the lock and change the tree */
> >   	ret = -ENOMEM;
> >   	if (pgd_none(*pgd)) {
> > +		if (!new_p4d)
> > +			goto out_unlock;
> > +		pgd_populate(kvm->mm, pgd, new_p4d);
> > +		new_p4d = NULL;
> > +	}
> > +	if (p4d_none(*p4d)) {
> >   		if (!new_pud)
> >   			goto out_unlock;
> > -		pgd_populate(kvm->mm, pgd, new_pud);
> > +		p4d_populate(kvm->mm, p4d, new_pud);
> >   		new_pud = NULL;
> >   	}
> > -	pud = pud_offset(pgd, gpa);
> > +	pud = pud_offset(p4d, gpa);
> >   	if (pud_is_leaf(*pud)) {
> >   		unsigned long hgpa = gpa & PUD_MASK;
> > @@ -1220,6 +1255,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   	pgd_t *pgt;
> >   	struct kvm_nested_guest *nested;
> >   	pgd_t pgd, *pgdp;
> > +	p4d_t p4d, *p4dp;
> >   	pud_t pud, *pudp;
> >   	pmd_t pmd, *pmdp;
> >   	pte_t *ptep;
> > @@ -1298,7 +1334,14 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   			continue;
> >   		}
> > -		pudp = pud_offset(&pgd, gpa);
> > +		p4dp = p4d_offset(&pgd, gpa);
> > +		p4d = READ_ONCE(*p4dp);
> > +		if (!(p4d_val(p4d) & _PAGE_PRESENT)) {
> > +			gpa = (gpa & P4D_MASK) + P4D_SIZE;
> > +			continue;
> > +		}
> > +
> > +		pudp = pud_offset(&p4d, gpa);
> 
> Same, here you are forcing a useless read with READ_ONCE().
> 
> Your change could be limited to
> 
> -		pudp = pud_offset(&pgd, gpa);
> +		pudp = pud_offset(p4d_offset(&pgd, gpa), gpa);

Here again the actual check must be done against p4d rather than pgd. We
could skip READ_ONCE() for pgd, but since it is a debugfs method I don't
think it is more important than code consistency.
 
> This comment applies to many other places.

I'll make another pass to see where we can take the shortcut and use 

	pudp = pud_offset(p4d_offset(...))
 
> >   		pud = READ_ONCE(*pudp);
> >   		if (!(pud_val(pud) & _PAGE_PRESENT)) {
> >   			gpa = (gpa & PUD_MASK) + PUD_SIZE;
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 3345f039a876..7a59f6863cec 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -107,13 +107,18 @@ static inline int unmap_patch_area(unsigned long addr)
> >   	pte_t *ptep;
> >   	pmd_t *pmdp;
> >   	pud_t *pudp;
> > +	p4d_t *p4dp;
> >   	pgd_t *pgdp;
> >   	pgdp = pgd_offset_k(addr);
> >   	if (unlikely(!pgdp))
> >   		return -EINVAL;
> > -	pudp = pud_offset(pgdp, addr);
> > +	p4dp = p4d_offset(pgdp, addr);
> > +	if (unlikely(!p4dp))
> > +		return -EINVAL;
> > +
> > +	pudp = pud_offset(p4dp, addr);
> >   	if (unlikely(!pudp))
> >   		return -EINVAL;
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 0a1c65a2c565..b2fc3e71165c 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -312,7 +312,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea)
> >   	if (!Hash)
> >   		return;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, ea), ea), ea);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, ea), ea), ea), ea);
> 
> If we continue like this, in ten years this like is going to be many
> kilometers long.
> 
> I think the above would be worth a generic helper.

Agree. My plan was to first unify all the architectures and then start
introducing the generic helpers, like e.g. pmd_offset_mm().
 
> >   	if (!pmd_none(*pmd))
> >   		add_hash_page(mm->context.id, ea, pmd_val(*pmd));
> >   }
> > diff --git a/arch/powerpc/mm/book3s32/tlb.c b/arch/powerpc/mm/book3s32/tlb.c
> > index 2fcd321040ff..175bc33b41b7 100644
> > --- a/arch/powerpc/mm/book3s32/tlb.c
> > +++ b/arch/powerpc/mm/book3s32/tlb.c
> > @@ -87,7 +87,7 @@ static void flush_range(struct mm_struct *mm, unsigned long start,
> >   	if (start >= end)
> >   		return;
> >   	end = (end - 1) | ~PAGE_MASK;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, start), start), start), start);
> >   	for (;;) {
> >   		pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
> >   		if (pmd_end > end)
> > @@ -145,7 +145,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> >   		return;
> >   	}
> >   	mm = (vmaddr < TASK_SIZE)? vma->vm_mm: &init_mm;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr), vmaddr);
> >   	if (!pmd_none(*pmd))
> >   		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
> >   }
> > diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > index 64733b9cb20a..9cd15937e88a 100644
> > --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > @@ -148,6 +148,7 @@ void hash__vmemmap_remove_mapping(unsigned long start,
> >   int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   {
> >   	pgd_t *pgdp;
> > +	p4d_t *p4dp;
> >   	pud_t *pudp;
> >   	pmd_t *pmdp;
> >   	pte_t *ptep;
> > @@ -155,7 +156,8 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   	BUILD_BUG_ON(TASK_SIZE_USER64 > H_PGTABLE_RANGE);
> >   	if (slab_is_available()) {
> >   		pgdp = pgd_offset_k(ea);
> > -		pudp = pud_alloc(&init_mm, pgdp, ea);
> > +		p4dp = p4d_offset(pgdp, ea);
> > +		pudp = pud_alloc(&init_mm, p4dp, ea);
> 
> Could be a single line, without a new var.
> 
> -		pudp = pud_alloc(&init_mm, pgdp, ea);
> +		pudp = pud_alloc(&init_mm, p4d_offset(pgdp, ea), ea);
> 
> 
> Same kind of comments as already done apply to the rest.
> 
> Christophe

-- 
Sincerely yours,
Mike.

WARNING: multiple messages have this Message-ID (diff)
From: Mike Rapoport <rppt@kernel.org>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Rich Felker <dalias@libc.org>,
	linux-ia64@vger.kernel.org,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	linux-sh@vger.kernel.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linux-mm@kvack.org, Paul Mackerras <paulus@samba.org>,
	linux-hexagon@vger.kernel.org, Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu, Jonas Bonn <jonas@southpole.se>,
	linux-arch@vger.kernel.org, Brian Cain <bcain@codeaurora.org>,
	Marc Zyngier <maz@kernel.org>,
	Russell King <linux@armlinux.org.uk>,
	Ley Foon Tan <ley.foon.tan@intel.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	uclinux-h8-devel@lists.sourceforge.jp,
	Fenghua Yu <fenghua.yu@intel.com>, Arnd Bergmann <arnd@arndb.de>,
	kvm-ppc@vger.kernel.org,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	openrisc@lists.librecores.org, Stafford Horne <shorne@gmail.com>,
	Guan Xuetao <gxt@pku.edu.cn>,
	linux-arm-kernel@lists.infradead.org,
	Tony Luck <tony.luck@intel.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	linux-kernel@vger.kernel.org,
	Michael Ellerman <mpe@ellerman.id.au>,
	nios2-dev@lists.rocketboards.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 07/13] powerpc: add support for folded p4d page tables
Date: Tue, 18 Feb 2020 12:54:40 +0200	[thread overview]
Message-ID: <20200218105440.GA1698@hump> (raw)
In-Reply-To: <c79b363c-a111-389a-5752-51cf85fa8c44@c-s.fr>

On Sun, Feb 16, 2020 at 11:41:07AM +0100, Christophe Leroy wrote:
> 
> 
> Le 16/02/2020 à 09:18, Mike Rapoport a écrit :
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Implement primitives necessary for the 4th level folding, add walks of p4d
> > level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.
> 
> I don't think it is worth adding all this additionnals walks of p4d, this
> patch could be limited to changes like:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);
> 
> The additionnal walks should be added through another patch the day powerpc
> need them.

Ok, I'll update the patch to reduce walking the p4d.
 
> See below for more comments.
> 
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx
> > ---

...

> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 201a69e6a355..ddddbafff0ab 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> >   #define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> > -#include <asm-generic/5level-fixup.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >   #ifndef __ASSEMBLY__
> >   #include <linux/mmdebug.h>
> > @@ -251,7 +251,7 @@ extern unsigned long __pmd_frag_size_shift;
> >   /* Bits to mask out from a PUD to get to the PMD page */
> >   #define PUD_MASKED_BITS		0xc0000000000000ffUL
> >   /* Bits to mask out from a PGD to get to the PUD page */
> > -#define PGD_MASKED_BITS		0xc0000000000000ffUL
> > +#define P4D_MASKED_BITS		0xc0000000000000ffUL
> >   /*
> >    * Used as an indicator for rcu callback functions
> > @@ -949,54 +949,60 @@ static inline bool pud_access_permitted(pud_t pud, bool write)
> >   	return pte_access_permitted(pud_pte(pud), write);
> >   }
> > -#define pgd_write(pgd)		pte_write(pgd_pte(pgd))
> > +#define __p4d_raw(x)	((p4d_t) { __pgd_raw(x) })
> > +static inline __be64 p4d_raw(p4d_t x)
> > +{
> > +	return pgd_raw(x.pgd);
> > +}
> > +
> 
> Shouldn't this be defined in asm/pgtable-be-types.h, just like other
> __pxx_raw() ?

Ideally yes, but this creates weird header file dependencies and untangling
them would generate way too much churn.
 
> > +#define p4d_write(p4d)		pte_write(p4d_pte(p4d))
> > -static inline void pgd_clear(pgd_t *pgdp)
> > +static inline void p4d_clear(p4d_t *p4dp)
> >   {
> > -	*pgdp = __pgd(0);
> > +	*p4dp = __p4d(0);
> >   }

...

> > @@ -573,9 +596,15 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Traverse the guest's 2nd-level tree, allocate new levels needed */
> >   	pgd = pgtable + pgd_index(gpa);
> > -	pud = NULL;
> > +	p4d = NULL;
> >   	if (pgd_present(*pgd))
> > -		pud = pud_offset(pgd, gpa);
> > +		p4d = p4d_offset(pgd, gpa);
> > +	else
> > +		new_p4d = p4d_alloc_one(kvm->mm, gpa);
> > +
> > +	pud = NULL;
> > +	if (p4d_present(*p4d))
> > +		pud = pud_offset(p4d, gpa);
> 
> Is it worth adding all this new code ?
> 
> My understanding is that the series objective is to get rid of
> __ARCH_HAS_5LEVEL_HACK, to to add support for 5 levels to an architecture
> that not need it (at least for now).
> If we want to add support for 5 levels, it can be done later in another
> patch.
> 
> Here I think your change could be limited to:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);

This won't work. Without __ARCH_USE_5LEVEL_HACK defined pgd_present() is
hardwired to 1 and the actual check for the top level is performed with
p4d_present(). The 'else' clause that allocates p4d will never be taken and
it could be removed, but I prefer to keep it for consistency.
 
> >   	else
> >   		new_pud = pud_alloc_one(kvm->mm, gpa);
> > @@ -597,12 +626,18 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Now traverse again under the lock and change the tree */
> >   	ret = -ENOMEM;
> >   	if (pgd_none(*pgd)) {
> > +		if (!new_p4d)
> > +			goto out_unlock;
> > +		pgd_populate(kvm->mm, pgd, new_p4d);
> > +		new_p4d = NULL;
> > +	}
> > +	if (p4d_none(*p4d)) {
> >   		if (!new_pud)
> >   			goto out_unlock;
> > -		pgd_populate(kvm->mm, pgd, new_pud);
> > +		p4d_populate(kvm->mm, p4d, new_pud);
> >   		new_pud = NULL;
> >   	}
> > -	pud = pud_offset(pgd, gpa);
> > +	pud = pud_offset(p4d, gpa);
> >   	if (pud_is_leaf(*pud)) {
> >   		unsigned long hgpa = gpa & PUD_MASK;
> > @@ -1220,6 +1255,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   	pgd_t *pgt;
> >   	struct kvm_nested_guest *nested;
> >   	pgd_t pgd, *pgdp;
> > +	p4d_t p4d, *p4dp;
> >   	pud_t pud, *pudp;
> >   	pmd_t pmd, *pmdp;
> >   	pte_t *ptep;
> > @@ -1298,7 +1334,14 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   			continue;
> >   		}
> > -		pudp = pud_offset(&pgd, gpa);
> > +		p4dp = p4d_offset(&pgd, gpa);
> > +		p4d = READ_ONCE(*p4dp);
> > +		if (!(p4d_val(p4d) & _PAGE_PRESENT)) {
> > +			gpa = (gpa & P4D_MASK) + P4D_SIZE;
> > +			continue;
> > +		}
> > +
> > +		pudp = pud_offset(&p4d, gpa);
> 
> Same, here you are forcing a useless read with READ_ONCE().
> 
> Your change could be limited to
> 
> -		pudp = pud_offset(&pgd, gpa);
> +		pudp = pud_offset(p4d_offset(&pgd, gpa), gpa);

Here again the actual check must be done against p4d rather than pgd. We
could skip READ_ONCE() for pgd, but since it is a debugfs method I don't
think it is more important than code consistency.
 
> This comment applies to many other places.

I'll make another pass to see where we can take the shortcut and use 

	pudp = pud_offset(p4d_offset(...))
 
> >   		pud = READ_ONCE(*pudp);
> >   		if (!(pud_val(pud) & _PAGE_PRESENT)) {
> >   			gpa = (gpa & PUD_MASK) + PUD_SIZE;
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 3345f039a876..7a59f6863cec 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -107,13 +107,18 @@ static inline int unmap_patch_area(unsigned long addr)
> >   	pte_t *ptep;
> >   	pmd_t *pmdp;
> >   	pud_t *pudp;
> > +	p4d_t *p4dp;
> >   	pgd_t *pgdp;
> >   	pgdp = pgd_offset_k(addr);
> >   	if (unlikely(!pgdp))
> >   		return -EINVAL;
> > -	pudp = pud_offset(pgdp, addr);
> > +	p4dp = p4d_offset(pgdp, addr);
> > +	if (unlikely(!p4dp))
> > +		return -EINVAL;
> > +
> > +	pudp = pud_offset(p4dp, addr);
> >   	if (unlikely(!pudp))
> >   		return -EINVAL;
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 0a1c65a2c565..b2fc3e71165c 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -312,7 +312,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea)
> >   	if (!Hash)
> >   		return;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, ea), ea), ea);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, ea), ea), ea), ea);
> 
> If we continue like this, in ten years this like is going to be many
> kilometers long.
> 
> I think the above would be worth a generic helper.

Agree. My plan was to first unify all the architectures and then start
introducing the generic helpers, like e.g. pmd_offset_mm().
 
> >   	if (!pmd_none(*pmd))
> >   		add_hash_page(mm->context.id, ea, pmd_val(*pmd));
> >   }
> > diff --git a/arch/powerpc/mm/book3s32/tlb.c b/arch/powerpc/mm/book3s32/tlb.c
> > index 2fcd321040ff..175bc33b41b7 100644
> > --- a/arch/powerpc/mm/book3s32/tlb.c
> > +++ b/arch/powerpc/mm/book3s32/tlb.c
> > @@ -87,7 +87,7 @@ static void flush_range(struct mm_struct *mm, unsigned long start,
> >   	if (start >= end)
> >   		return;
> >   	end = (end - 1) | ~PAGE_MASK;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, start), start), start), start);
> >   	for (;;) {
> >   		pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
> >   		if (pmd_end > end)
> > @@ -145,7 +145,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> >   		return;
> >   	}
> >   	mm = (vmaddr < TASK_SIZE)? vma->vm_mm: &init_mm;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr), vmaddr);
> >   	if (!pmd_none(*pmd))
> >   		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
> >   }
> > diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > index 64733b9cb20a..9cd15937e88a 100644
> > --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > @@ -148,6 +148,7 @@ void hash__vmemmap_remove_mapping(unsigned long start,
> >   int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   {
> >   	pgd_t *pgdp;
> > +	p4d_t *p4dp;
> >   	pud_t *pudp;
> >   	pmd_t *pmdp;
> >   	pte_t *ptep;
> > @@ -155,7 +156,8 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   	BUILD_BUG_ON(TASK_SIZE_USER64 > H_PGTABLE_RANGE);
> >   	if (slab_is_available()) {
> >   		pgdp = pgd_offset_k(ea);
> > -		pudp = pud_alloc(&init_mm, pgdp, ea);
> > +		p4dp = p4d_offset(pgdp, ea);
> > +		pudp = pud_alloc(&init_mm, p4dp, ea);
> 
> Could be a single line, without a new var.
> 
> -		pudp = pud_alloc(&init_mm, pgdp, ea);
> +		pudp = pud_alloc(&init_mm, p4d_offset(pgdp, ea), ea);
> 
> 
> Same kind of comments as already done apply to the rest.
> 
> Christophe

-- 
Sincerely yours,
Mike.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Mike Rapoport <rppt@kernel.org>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Brian Cain <bcain@codeaurora.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	Guan Xuetao <gxt@pku.edu.cn>, James Morse <james.morse@arm.com>,
	Jonas Bonn <jonas@southpole.se>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Ley Foon Tan <ley.foon.tan@intel.com>,
	Marc Zyngier <maz@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Paul Mackerras <paulus@samba.org>, Rich Felker <dalias@libc.org>,
	Russell King <linux@armlinux.org.uk>,
	Stafford Horne <shorne@gmail.com>,
	Stefan Kristiansson <stefan.kristian>
Subject: Re: [PATCH v2 07/13] powerpc: add support for folded p4d page tables
Date: Tue, 18 Feb 2020 12:54:40 +0200	[thread overview]
Message-ID: <20200218105440.GA1698@hump> (raw)
In-Reply-To: <c79b363c-a111-389a-5752-51cf85fa8c44@c-s.fr>

On Sun, Feb 16, 2020 at 11:41:07AM +0100, Christophe Leroy wrote:
> 
> 
> Le 16/02/2020 à 09:18, Mike Rapoport a écrit :
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Implement primitives necessary for the 4th level folding, add walks of p4d
> > level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.
> 
> I don't think it is worth adding all this additionnals walks of p4d, this
> patch could be limited to changes like:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);
> 
> The additionnal walks should be added through another patch the day powerpc
> need them.

Ok, I'll update the patch to reduce walking the p4d.
 
> See below for more comments.
> 
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx
> > ---

...

> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 201a69e6a355..ddddbafff0ab 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> >   #define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> > -#include <asm-generic/5level-fixup.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >   #ifndef __ASSEMBLY__
> >   #include <linux/mmdebug.h>
> > @@ -251,7 +251,7 @@ extern unsigned long __pmd_frag_size_shift;
> >   /* Bits to mask out from a PUD to get to the PMD page */
> >   #define PUD_MASKED_BITS		0xc0000000000000ffUL
> >   /* Bits to mask out from a PGD to get to the PUD page */
> > -#define PGD_MASKED_BITS		0xc0000000000000ffUL
> > +#define P4D_MASKED_BITS		0xc0000000000000ffUL
> >   /*
> >    * Used as an indicator for rcu callback functions
> > @@ -949,54 +949,60 @@ static inline bool pud_access_permitted(pud_t pud, bool write)
> >   	return pte_access_permitted(pud_pte(pud), write);
> >   }
> > -#define pgd_write(pgd)		pte_write(pgd_pte(pgd))
> > +#define __p4d_raw(x)	((p4d_t) { __pgd_raw(x) })
> > +static inline __be64 p4d_raw(p4d_t x)
> > +{
> > +	return pgd_raw(x.pgd);
> > +}
> > +
> 
> Shouldn't this be defined in asm/pgtable-be-types.h, just like other
> __pxx_raw() ?

Ideally yes, but this creates weird header file dependencies and untangling
them would generate way too much churn.
 
> > +#define p4d_write(p4d)		pte_write(p4d_pte(p4d))
> > -static inline void pgd_clear(pgd_t *pgdp)
> > +static inline void p4d_clear(p4d_t *p4dp)
> >   {
> > -	*pgdp = __pgd(0);
> > +	*p4dp = __p4d(0);
> >   }

...

> > @@ -573,9 +596,15 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Traverse the guest's 2nd-level tree, allocate new levels needed */
> >   	pgd = pgtable + pgd_index(gpa);
> > -	pud = NULL;
> > +	p4d = NULL;
> >   	if (pgd_present(*pgd))
> > -		pud = pud_offset(pgd, gpa);
> > +		p4d = p4d_offset(pgd, gpa);
> > +	else
> > +		new_p4d = p4d_alloc_one(kvm->mm, gpa);
> > +
> > +	pud = NULL;
> > +	if (p4d_present(*p4d))
> > +		pud = pud_offset(p4d, gpa);
> 
> Is it worth adding all this new code ?
> 
> My understanding is that the series objective is to get rid of
> __ARCH_HAS_5LEVEL_HACK, to to add support for 5 levels to an architecture
> that not need it (at least for now).
> If we want to add support for 5 levels, it can be done later in another
> patch.
> 
> Here I think your change could be limited to:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);

This won't work. Without __ARCH_USE_5LEVEL_HACK defined pgd_present() is
hardwired to 1 and the actual check for the top level is performed with
p4d_present(). The 'else' clause that allocates p4d will never be taken and
it could be removed, but I prefer to keep it for consistency.
 
> >   	else
> >   		new_pud = pud_alloc_one(kvm->mm, gpa);
> > @@ -597,12 +626,18 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Now traverse again under the lock and change the tree */
> >   	ret = -ENOMEM;
> >   	if (pgd_none(*pgd)) {
> > +		if (!new_p4d)
> > +			goto out_unlock;
> > +		pgd_populate(kvm->mm, pgd, new_p4d);
> > +		new_p4d = NULL;
> > +	}
> > +	if (p4d_none(*p4d)) {
> >   		if (!new_pud)
> >   			goto out_unlock;
> > -		pgd_populate(kvm->mm, pgd, new_pud);
> > +		p4d_populate(kvm->mm, p4d, new_pud);
> >   		new_pud = NULL;
> >   	}
> > -	pud = pud_offset(pgd, gpa);
> > +	pud = pud_offset(p4d, gpa);
> >   	if (pud_is_leaf(*pud)) {
> >   		unsigned long hgpa = gpa & PUD_MASK;
> > @@ -1220,6 +1255,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   	pgd_t *pgt;
> >   	struct kvm_nested_guest *nested;
> >   	pgd_t pgd, *pgdp;
> > +	p4d_t p4d, *p4dp;
> >   	pud_t pud, *pudp;
> >   	pmd_t pmd, *pmdp;
> >   	pte_t *ptep;
> > @@ -1298,7 +1334,14 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   			continue;
> >   		}
> > -		pudp = pud_offset(&pgd, gpa);
> > +		p4dp = p4d_offset(&pgd, gpa);
> > +		p4d = READ_ONCE(*p4dp);
> > +		if (!(p4d_val(p4d) & _PAGE_PRESENT)) {
> > +			gpa = (gpa & P4D_MASK) + P4D_SIZE;
> > +			continue;
> > +		}
> > +
> > +		pudp = pud_offset(&p4d, gpa);
> 
> Same, here you are forcing a useless read with READ_ONCE().
> 
> Your change could be limited to
> 
> -		pudp = pud_offset(&pgd, gpa);
> +		pudp = pud_offset(p4d_offset(&pgd, gpa), gpa);

Here again the actual check must be done against p4d rather than pgd. We
could skip READ_ONCE() for pgd, but since it is a debugfs method I don't
think it is more important than code consistency.
 
> This comment applies to many other places.

I'll make another pass to see where we can take the shortcut and use 

	pudp = pud_offset(p4d_offset(...))
 
> >   		pud = READ_ONCE(*pudp);
> >   		if (!(pud_val(pud) & _PAGE_PRESENT)) {
> >   			gpa = (gpa & PUD_MASK) + PUD_SIZE;
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 3345f039a876..7a59f6863cec 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -107,13 +107,18 @@ static inline int unmap_patch_area(unsigned long addr)
> >   	pte_t *ptep;
> >   	pmd_t *pmdp;
> >   	pud_t *pudp;
> > +	p4d_t *p4dp;
> >   	pgd_t *pgdp;
> >   	pgdp = pgd_offset_k(addr);
> >   	if (unlikely(!pgdp))
> >   		return -EINVAL;
> > -	pudp = pud_offset(pgdp, addr);
> > +	p4dp = p4d_offset(pgdp, addr);
> > +	if (unlikely(!p4dp))
> > +		return -EINVAL;
> > +
> > +	pudp = pud_offset(p4dp, addr);
> >   	if (unlikely(!pudp))
> >   		return -EINVAL;
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 0a1c65a2c565..b2fc3e71165c 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -312,7 +312,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea)
> >   	if (!Hash)
> >   		return;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, ea), ea), ea);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, ea), ea), ea), ea);
> 
> If we continue like this, in ten years this like is going to be many
> kilometers long.
> 
> I think the above would be worth a generic helper.

Agree. My plan was to first unify all the architectures and then start
introducing the generic helpers, like e.g. pmd_offset_mm().
 
> >   	if (!pmd_none(*pmd))
> >   		add_hash_page(mm->context.id, ea, pmd_val(*pmd));
> >   }
> > diff --git a/arch/powerpc/mm/book3s32/tlb.c b/arch/powerpc/mm/book3s32/tlb.c
> > index 2fcd321040ff..175bc33b41b7 100644
> > --- a/arch/powerpc/mm/book3s32/tlb.c
> > +++ b/arch/powerpc/mm/book3s32/tlb.c
> > @@ -87,7 +87,7 @@ static void flush_range(struct mm_struct *mm, unsigned long start,
> >   	if (start >= end)
> >   		return;
> >   	end = (end - 1) | ~PAGE_MASK;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, start), start), start), start);
> >   	for (;;) {
> >   		pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
> >   		if (pmd_end > end)
> > @@ -145,7 +145,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> >   		return;
> >   	}
> >   	mm = (vmaddr < TASK_SIZE)? vma->vm_mm: &init_mm;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr), vmaddr);
> >   	if (!pmd_none(*pmd))
> >   		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
> >   }
> > diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > index 64733b9cb20a..9cd15937e88a 100644
> > --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > @@ -148,6 +148,7 @@ void hash__vmemmap_remove_mapping(unsigned long start,
> >   int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   {
> >   	pgd_t *pgdp;
> > +	p4d_t *p4dp;
> >   	pud_t *pudp;
> >   	pmd_t *pmdp;
> >   	pte_t *ptep;
> > @@ -155,7 +156,8 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   	BUILD_BUG_ON(TASK_SIZE_USER64 > H_PGTABLE_RANGE);
> >   	if (slab_is_available()) {
> >   		pgdp = pgd_offset_k(ea);
> > -		pudp = pud_alloc(&init_mm, pgdp, ea);
> > +		p4dp = p4d_offset(pgdp, ea);
> > +		pudp = pud_alloc(&init_mm, p4dp, ea);
> 
> Could be a single line, without a new var.
> 
> -		pudp = pud_alloc(&init_mm, pgdp, ea);
> +		pudp = pud_alloc(&init_mm, p4d_offset(pgdp, ea), ea);
> 
> 
> Same kind of comments as already done apply to the rest.
> 
> Christophe

-- 
Sincerely yours,
Mike.

WARNING: multiple messages have this Message-ID (diff)
From: Mike Rapoport <rppt@kernel.org>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Brian Cain <bcain@codeaurora.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	Guan Xuetao <gxt@pku.edu.cn>, James Morse <james.morse@arm.com>,
	Jonas Bonn <jonas@southpole.se>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Ley Foon Tan <ley.foon.tan@intel.com>,
	Marc Zyngier <maz@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Paul Mackerras <paulus@samba.org>, Rich Felker <dalias@libc.org>,
	Russell King <linux@armlinux.org.uk>,
	Stafford Horne <shorne@gmail.com>,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Tony Luck <tony.luck@intel.com>, Will Deacon <will@kernel.org>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	kvmarm@lists.cs.columbia.edu, kvm-ppc@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org,
	linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
	linux-sh@vger.kernel.org, nios2-dev@lists.rocketboards.org,
	openrisc@lists.librecores.org,
	uclinux-h8-devel@lists.sourceforge.jp,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCH v2 07/13] powerpc: add support for folded p4d page tables
Date: Tue, 18 Feb 2020 12:54:40 +0200	[thread overview]
Message-ID: <20200218105440.GA1698@hump> (raw)
Message-ID: <20200218105440.8jcHhfPWz0KlbQXfAFWr66N6QxKYxXEJb3BzPOVw2o4@z> (raw)
In-Reply-To: <c79b363c-a111-389a-5752-51cf85fa8c44@c-s.fr>

On Sun, Feb 16, 2020 at 11:41:07AM +0100, Christophe Leroy wrote:
> 
> 
> Le 16/02/2020 à 09:18, Mike Rapoport a écrit :
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Implement primitives necessary for the 4th level folding, add walks of p4d
> > level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.
> 
> I don't think it is worth adding all this additionnals walks of p4d, this
> patch could be limited to changes like:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);
> 
> The additionnal walks should be added through another patch the day powerpc
> need them.

Ok, I'll update the patch to reduce walking the p4d.
 
> See below for more comments.
> 
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx
> > ---

...

> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 201a69e6a355..ddddbafff0ab 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> >   #define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> > -#include <asm-generic/5level-fixup.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >   #ifndef __ASSEMBLY__
> >   #include <linux/mmdebug.h>
> > @@ -251,7 +251,7 @@ extern unsigned long __pmd_frag_size_shift;
> >   /* Bits to mask out from a PUD to get to the PMD page */
> >   #define PUD_MASKED_BITS		0xc0000000000000ffUL
> >   /* Bits to mask out from a PGD to get to the PUD page */
> > -#define PGD_MASKED_BITS		0xc0000000000000ffUL
> > +#define P4D_MASKED_BITS		0xc0000000000000ffUL
> >   /*
> >    * Used as an indicator for rcu callback functions
> > @@ -949,54 +949,60 @@ static inline bool pud_access_permitted(pud_t pud, bool write)
> >   	return pte_access_permitted(pud_pte(pud), write);
> >   }
> > -#define pgd_write(pgd)		pte_write(pgd_pte(pgd))
> > +#define __p4d_raw(x)	((p4d_t) { __pgd_raw(x) })
> > +static inline __be64 p4d_raw(p4d_t x)
> > +{
> > +	return pgd_raw(x.pgd);
> > +}
> > +
> 
> Shouldn't this be defined in asm/pgtable-be-types.h, just like other
> __pxx_raw() ?

Ideally yes, but this creates weird header file dependencies and untangling
them would generate way too much churn.
 
> > +#define p4d_write(p4d)		pte_write(p4d_pte(p4d))
> > -static inline void pgd_clear(pgd_t *pgdp)
> > +static inline void p4d_clear(p4d_t *p4dp)
> >   {
> > -	*pgdp = __pgd(0);
> > +	*p4dp = __p4d(0);
> >   }

...

> > @@ -573,9 +596,15 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Traverse the guest's 2nd-level tree, allocate new levels needed */
> >   	pgd = pgtable + pgd_index(gpa);
> > -	pud = NULL;
> > +	p4d = NULL;
> >   	if (pgd_present(*pgd))
> > -		pud = pud_offset(pgd, gpa);
> > +		p4d = p4d_offset(pgd, gpa);
> > +	else
> > +		new_p4d = p4d_alloc_one(kvm->mm, gpa);
> > +
> > +	pud = NULL;
> > +	if (p4d_present(*p4d))
> > +		pud = pud_offset(p4d, gpa);
> 
> Is it worth adding all this new code ?
> 
> My understanding is that the series objective is to get rid of
> __ARCH_HAS_5LEVEL_HACK, to to add support for 5 levels to an architecture
> that not need it (at least for now).
> If we want to add support for 5 levels, it can be done later in another
> patch.
> 
> Here I think your change could be limited to:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);

This won't work. Without __ARCH_USE_5LEVEL_HACK defined pgd_present() is
hardwired to 1 and the actual check for the top level is performed with
p4d_present(). The 'else' clause that allocates p4d will never be taken and
it could be removed, but I prefer to keep it for consistency.
 
> >   	else
> >   		new_pud = pud_alloc_one(kvm->mm, gpa);
> > @@ -597,12 +626,18 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Now traverse again under the lock and change the tree */
> >   	ret = -ENOMEM;
> >   	if (pgd_none(*pgd)) {
> > +		if (!new_p4d)
> > +			goto out_unlock;
> > +		pgd_populate(kvm->mm, pgd, new_p4d);
> > +		new_p4d = NULL;
> > +	}
> > +	if (p4d_none(*p4d)) {
> >   		if (!new_pud)
> >   			goto out_unlock;
> > -		pgd_populate(kvm->mm, pgd, new_pud);
> > +		p4d_populate(kvm->mm, p4d, new_pud);
> >   		new_pud = NULL;
> >   	}
> > -	pud = pud_offset(pgd, gpa);
> > +	pud = pud_offset(p4d, gpa);
> >   	if (pud_is_leaf(*pud)) {
> >   		unsigned long hgpa = gpa & PUD_MASK;
> > @@ -1220,6 +1255,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   	pgd_t *pgt;
> >   	struct kvm_nested_guest *nested;
> >   	pgd_t pgd, *pgdp;
> > +	p4d_t p4d, *p4dp;
> >   	pud_t pud, *pudp;
> >   	pmd_t pmd, *pmdp;
> >   	pte_t *ptep;
> > @@ -1298,7 +1334,14 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   			continue;
> >   		}
> > -		pudp = pud_offset(&pgd, gpa);
> > +		p4dp = p4d_offset(&pgd, gpa);
> > +		p4d = READ_ONCE(*p4dp);
> > +		if (!(p4d_val(p4d) & _PAGE_PRESENT)) {
> > +			gpa = (gpa & P4D_MASK) + P4D_SIZE;
> > +			continue;
> > +		}
> > +
> > +		pudp = pud_offset(&p4d, gpa);
> 
> Same, here you are forcing a useless read with READ_ONCE().
> 
> Your change could be limited to
> 
> -		pudp = pud_offset(&pgd, gpa);
> +		pudp = pud_offset(p4d_offset(&pgd, gpa), gpa);

Here again the actual check must be done against p4d rather than pgd. We
could skip READ_ONCE() for pgd, but since it is a debugfs method I don't
think it is more important than code consistency.
 
> This comment applies to many other places.

I'll make another pass to see where we can take the shortcut and use 

	pudp = pud_offset(p4d_offset(...))
 
> >   		pud = READ_ONCE(*pudp);
> >   		if (!(pud_val(pud) & _PAGE_PRESENT)) {
> >   			gpa = (gpa & PUD_MASK) + PUD_SIZE;
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 3345f039a876..7a59f6863cec 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -107,13 +107,18 @@ static inline int unmap_patch_area(unsigned long addr)
> >   	pte_t *ptep;
> >   	pmd_t *pmdp;
> >   	pud_t *pudp;
> > +	p4d_t *p4dp;
> >   	pgd_t *pgdp;
> >   	pgdp = pgd_offset_k(addr);
> >   	if (unlikely(!pgdp))
> >   		return -EINVAL;
> > -	pudp = pud_offset(pgdp, addr);
> > +	p4dp = p4d_offset(pgdp, addr);
> > +	if (unlikely(!p4dp))
> > +		return -EINVAL;
> > +
> > +	pudp = pud_offset(p4dp, addr);
> >   	if (unlikely(!pudp))
> >   		return -EINVAL;
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 0a1c65a2c565..b2fc3e71165c 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -312,7 +312,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea)
> >   	if (!Hash)
> >   		return;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, ea), ea), ea);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, ea), ea), ea), ea);
> 
> If we continue like this, in ten years this like is going to be many
> kilometers long.
> 
> I think the above would be worth a generic helper.

Agree. My plan was to first unify all the architectures and then start
introducing the generic helpers, like e.g. pmd_offset_mm().
 
> >   	if (!pmd_none(*pmd))
> >   		add_hash_page(mm->context.id, ea, pmd_val(*pmd));
> >   }
> > diff --git a/arch/powerpc/mm/book3s32/tlb.c b/arch/powerpc/mm/book3s32/tlb.c
> > index 2fcd321040ff..175bc33b41b7 100644
> > --- a/arch/powerpc/mm/book3s32/tlb.c
> > +++ b/arch/powerpc/mm/book3s32/tlb.c
> > @@ -87,7 +87,7 @@ static void flush_range(struct mm_struct *mm, unsigned long start,
> >   	if (start >= end)
> >   		return;
> >   	end = (end - 1) | ~PAGE_MASK;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, start), start), start), start);
> >   	for (;;) {
> >   		pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
> >   		if (pmd_end > end)
> > @@ -145,7 +145,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> >   		return;
> >   	}
> >   	mm = (vmaddr < TASK_SIZE)? vma->vm_mm: &init_mm;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr), vmaddr);
> >   	if (!pmd_none(*pmd))
> >   		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
> >   }
> > diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > index 64733b9cb20a..9cd15937e88a 100644
> > --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > @@ -148,6 +148,7 @@ void hash__vmemmap_remove_mapping(unsigned long start,
> >   int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   {
> >   	pgd_t *pgdp;
> > +	p4d_t *p4dp;
> >   	pud_t *pudp;
> >   	pmd_t *pmdp;
> >   	pte_t *ptep;
> > @@ -155,7 +156,8 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   	BUILD_BUG_ON(TASK_SIZE_USER64 > H_PGTABLE_RANGE);
> >   	if (slab_is_available()) {
> >   		pgdp = pgd_offset_k(ea);
> > -		pudp = pud_alloc(&init_mm, pgdp, ea);
> > +		p4dp = p4d_offset(pgdp, ea);
> > +		pudp = pud_alloc(&init_mm, p4dp, ea);
> 
> Could be a single line, without a new var.
> 
> -		pudp = pud_alloc(&init_mm, pgdp, ea);
> +		pudp = pud_alloc(&init_mm, p4d_offset(pgdp, ea), ea);
> 
> 
> Same kind of comments as already done apply to the rest.
> 
> Christophe

-- 
Sincerely yours,
Mike.

WARNING: multiple messages have this Message-ID (diff)
From: Mike Rapoport <rppt@kernel.org>
To: openrisc@lists.librecores.org
Subject: [OpenRISC] [PATCH v2 07/13] powerpc: add support for folded p4d page tables
Date: Tue, 18 Feb 2020 12:54:40 +0200	[thread overview]
Message-ID: <20200218105440.GA1698@hump> (raw)
In-Reply-To: <c79b363c-a111-389a-5752-51cf85fa8c44@c-s.fr>

On Sun, Feb 16, 2020 at 11:41:07AM +0100, Christophe Leroy wrote:
> 
> 
> Le 16/02/2020 à 09:18, Mike Rapoport a écrit :
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Implement primitives necessary for the 4th level folding, add walks of p4d
> > level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.
> 
> I don't think it is worth adding all this additionnals walks of p4d, this
> patch could be limited to changes like:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);
> 
> The additionnal walks should be added through another patch the day powerpc
> need them.

Ok, I'll update the patch to reduce walking the p4d.
 
> See below for more comments.
> 
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx
> > ---

...

> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 201a69e6a355..ddddbafff0ab 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> >   #define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> > -#include <asm-generic/5level-fixup.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >   #ifndef __ASSEMBLY__
> >   #include <linux/mmdebug.h>
> > @@ -251,7 +251,7 @@ extern unsigned long __pmd_frag_size_shift;
> >   /* Bits to mask out from a PUD to get to the PMD page */
> >   #define PUD_MASKED_BITS		0xc0000000000000ffUL
> >   /* Bits to mask out from a PGD to get to the PUD page */
> > -#define PGD_MASKED_BITS		0xc0000000000000ffUL
> > +#define P4D_MASKED_BITS		0xc0000000000000ffUL
> >   /*
> >    * Used as an indicator for rcu callback functions
> > @@ -949,54 +949,60 @@ static inline bool pud_access_permitted(pud_t pud, bool write)
> >   	return pte_access_permitted(pud_pte(pud), write);
> >   }
> > -#define pgd_write(pgd)		pte_write(pgd_pte(pgd))
> > +#define __p4d_raw(x)	((p4d_t) { __pgd_raw(x) })
> > +static inline __be64 p4d_raw(p4d_t x)
> > +{
> > +	return pgd_raw(x.pgd);
> > +}
> > +
> 
> Shouldn't this be defined in asm/pgtable-be-types.h, just like other
> __pxx_raw() ?

Ideally yes, but this creates weird header file dependencies and untangling
them would generate way too much churn.
 
> > +#define p4d_write(p4d)		pte_write(p4d_pte(p4d))
> > -static inline void pgd_clear(pgd_t *pgdp)
> > +static inline void p4d_clear(p4d_t *p4dp)
> >   {
> > -	*pgdp = __pgd(0);
> > +	*p4dp = __p4d(0);
> >   }

...

> > @@ -573,9 +596,15 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Traverse the guest's 2nd-level tree, allocate new levels needed */
> >   	pgd = pgtable + pgd_index(gpa);
> > -	pud = NULL;
> > +	p4d = NULL;
> >   	if (pgd_present(*pgd))
> > -		pud = pud_offset(pgd, gpa);
> > +		p4d = p4d_offset(pgd, gpa);
> > +	else
> > +		new_p4d = p4d_alloc_one(kvm->mm, gpa);
> > +
> > +	pud = NULL;
> > +	if (p4d_present(*p4d))
> > +		pud = pud_offset(p4d, gpa);
> 
> Is it worth adding all this new code ?
> 
> My understanding is that the series objective is to get rid of
> __ARCH_HAS_5LEVEL_HACK, to to add support for 5 levels to an architecture
> that not need it (at least for now).
> If we want to add support for 5 levels, it can be done later in another
> patch.
> 
> Here I think your change could be limited to:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);

This won't work. Without __ARCH_USE_5LEVEL_HACK defined pgd_present() is
hardwired to 1 and the actual check for the top level is performed with
p4d_present(). The 'else' clause that allocates p4d will never be taken and
it could be removed, but I prefer to keep it for consistency.
 
> >   	else
> >   		new_pud = pud_alloc_one(kvm->mm, gpa);
> > @@ -597,12 +626,18 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Now traverse again under the lock and change the tree */
> >   	ret = -ENOMEM;
> >   	if (pgd_none(*pgd)) {
> > +		if (!new_p4d)
> > +			goto out_unlock;
> > +		pgd_populate(kvm->mm, pgd, new_p4d);
> > +		new_p4d = NULL;
> > +	}
> > +	if (p4d_none(*p4d)) {
> >   		if (!new_pud)
> >   			goto out_unlock;
> > -		pgd_populate(kvm->mm, pgd, new_pud);
> > +		p4d_populate(kvm->mm, p4d, new_pud);
> >   		new_pud = NULL;
> >   	}
> > -	pud = pud_offset(pgd, gpa);
> > +	pud = pud_offset(p4d, gpa);
> >   	if (pud_is_leaf(*pud)) {
> >   		unsigned long hgpa = gpa & PUD_MASK;
> > @@ -1220,6 +1255,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   	pgd_t *pgt;
> >   	struct kvm_nested_guest *nested;
> >   	pgd_t pgd, *pgdp;
> > +	p4d_t p4d, *p4dp;
> >   	pud_t pud, *pudp;
> >   	pmd_t pmd, *pmdp;
> >   	pte_t *ptep;
> > @@ -1298,7 +1334,14 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   			continue;
> >   		}
> > -		pudp = pud_offset(&pgd, gpa);
> > +		p4dp = p4d_offset(&pgd, gpa);
> > +		p4d = READ_ONCE(*p4dp);
> > +		if (!(p4d_val(p4d) & _PAGE_PRESENT)) {
> > +			gpa = (gpa & P4D_MASK) + P4D_SIZE;
> > +			continue;
> > +		}
> > +
> > +		pudp = pud_offset(&p4d, gpa);
> 
> Same, here you are forcing a useless read with READ_ONCE().
> 
> Your change could be limited to
> 
> -		pudp = pud_offset(&pgd, gpa);
> +		pudp = pud_offset(p4d_offset(&pgd, gpa), gpa);

Here again the actual check must be done against p4d rather than pgd. We
could skip READ_ONCE() for pgd, but since it is a debugfs method I don't
think it is more important than code consistency.
 
> This comment applies to many other places.

I'll make another pass to see where we can take the shortcut and use 

	pudp = pud_offset(p4d_offset(...))
 
> >   		pud = READ_ONCE(*pudp);
> >   		if (!(pud_val(pud) & _PAGE_PRESENT)) {
> >   			gpa = (gpa & PUD_MASK) + PUD_SIZE;
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 3345f039a876..7a59f6863cec 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -107,13 +107,18 @@ static inline int unmap_patch_area(unsigned long addr)
> >   	pte_t *ptep;
> >   	pmd_t *pmdp;
> >   	pud_t *pudp;
> > +	p4d_t *p4dp;
> >   	pgd_t *pgdp;
> >   	pgdp = pgd_offset_k(addr);
> >   	if (unlikely(!pgdp))
> >   		return -EINVAL;
> > -	pudp = pud_offset(pgdp, addr);
> > +	p4dp = p4d_offset(pgdp, addr);
> > +	if (unlikely(!p4dp))
> > +		return -EINVAL;
> > +
> > +	pudp = pud_offset(p4dp, addr);
> >   	if (unlikely(!pudp))
> >   		return -EINVAL;
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 0a1c65a2c565..b2fc3e71165c 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -312,7 +312,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea)
> >   	if (!Hash)
> >   		return;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, ea), ea), ea);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, ea), ea), ea), ea);
> 
> If we continue like this, in ten years this like is going to be many
> kilometers long.
> 
> I think the above would be worth a generic helper.

Agree. My plan was to first unify all the architectures and then start
introducing the generic helpers, like e.g. pmd_offset_mm().
 
> >   	if (!pmd_none(*pmd))
> >   		add_hash_page(mm->context.id, ea, pmd_val(*pmd));
> >   }
> > diff --git a/arch/powerpc/mm/book3s32/tlb.c b/arch/powerpc/mm/book3s32/tlb.c
> > index 2fcd321040ff..175bc33b41b7 100644
> > --- a/arch/powerpc/mm/book3s32/tlb.c
> > +++ b/arch/powerpc/mm/book3s32/tlb.c
> > @@ -87,7 +87,7 @@ static void flush_range(struct mm_struct *mm, unsigned long start,
> >   	if (start >= end)
> >   		return;
> >   	end = (end - 1) | ~PAGE_MASK;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, start), start), start), start);
> >   	for (;;) {
> >   		pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
> >   		if (pmd_end > end)
> > @@ -145,7 +145,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> >   		return;
> >   	}
> >   	mm = (vmaddr < TASK_SIZE)? vma->vm_mm: &init_mm;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr), vmaddr);
> >   	if (!pmd_none(*pmd))
> >   		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
> >   }
> > diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > index 64733b9cb20a..9cd15937e88a 100644
> > --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > @@ -148,6 +148,7 @@ void hash__vmemmap_remove_mapping(unsigned long start,
> >   int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   {
> >   	pgd_t *pgdp;
> > +	p4d_t *p4dp;
> >   	pud_t *pudp;
> >   	pmd_t *pmdp;
> >   	pte_t *ptep;
> > @@ -155,7 +156,8 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   	BUILD_BUG_ON(TASK_SIZE_USER64 > H_PGTABLE_RANGE);
> >   	if (slab_is_available()) {
> >   		pgdp = pgd_offset_k(ea);
> > -		pudp = pud_alloc(&init_mm, pgdp, ea);
> > +		p4dp = p4d_offset(pgdp, ea);
> > +		pudp = pud_alloc(&init_mm, p4dp, ea);
> 
> Could be a single line, without a new var.
> 
> -		pudp = pud_alloc(&init_mm, pgdp, ea);
> +		pudp = pud_alloc(&init_mm, p4d_offset(pgdp, ea), ea);
> 
> 
> Same kind of comments as already done apply to the rest.
> 
> Christophe

-- 
Sincerely yours,
Mike.

WARNING: multiple messages have this Message-ID (diff)
From: Mike Rapoport <rppt@kernel.org>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Rich Felker <dalias@libc.org>,
	linux-ia64@vger.kernel.org,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	linux-sh@vger.kernel.org, linux-mm@kvack.org,
	Paul Mackerras <paulus@samba.org>,
	linux-hexagon@vger.kernel.org, Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu, Jonas Bonn <jonas@southpole.se>,
	linux-arch@vger.kernel.org, Brian Cain <bcain@codeaurora.org>,
	Marc Zyngier <maz@kernel.org>,
	Russell King <linux@armlinux.org.uk>,
	Ley Foon Tan <ley.foon.tan@intel.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	uclinux-h8-devel@lists.sourceforge.jp,
	Fenghua Yu <fenghua.yu@intel.com>, Arnd Bergmann <arnd@arndb.de>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	kvm-ppc@vger.kernel.org,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	openrisc@lists.librecores.org, Stafford Horne <shorne@gmail.com>,
	Guan Xuetao <gxt@pku.edu.cn>,
	linux-arm-kernel@lists.infradead.org,
	Tony Luck <tony.luck@intel.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	linux-kernel@vger.kernel.org, James Morse <james.morse@arm.com>,
	nios2-dev@lists.rocketboards.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 07/13] powerpc: add support for folded p4d page tables
Date: Tue, 18 Feb 2020 12:54:40 +0200	[thread overview]
Message-ID: <20200218105440.GA1698@hump> (raw)
In-Reply-To: <c79b363c-a111-389a-5752-51cf85fa8c44@c-s.fr>

On Sun, Feb 16, 2020 at 11:41:07AM +0100, Christophe Leroy wrote:
> 
> 
> Le 16/02/2020 à 09:18, Mike Rapoport a écrit :
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Implement primitives necessary for the 4th level folding, add walks of p4d
> > level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.
> 
> I don't think it is worth adding all this additionnals walks of p4d, this
> patch could be limited to changes like:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);
> 
> The additionnal walks should be added through another patch the day powerpc
> need them.

Ok, I'll update the patch to reduce walking the p4d.
 
> See below for more comments.
> 
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx
> > ---

...

> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 201a69e6a355..ddddbafff0ab 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> >   #define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> > -#include <asm-generic/5level-fixup.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >   #ifndef __ASSEMBLY__
> >   #include <linux/mmdebug.h>
> > @@ -251,7 +251,7 @@ extern unsigned long __pmd_frag_size_shift;
> >   /* Bits to mask out from a PUD to get to the PMD page */
> >   #define PUD_MASKED_BITS		0xc0000000000000ffUL
> >   /* Bits to mask out from a PGD to get to the PUD page */
> > -#define PGD_MASKED_BITS		0xc0000000000000ffUL
> > +#define P4D_MASKED_BITS		0xc0000000000000ffUL
> >   /*
> >    * Used as an indicator for rcu callback functions
> > @@ -949,54 +949,60 @@ static inline bool pud_access_permitted(pud_t pud, bool write)
> >   	return pte_access_permitted(pud_pte(pud), write);
> >   }
> > -#define pgd_write(pgd)		pte_write(pgd_pte(pgd))
> > +#define __p4d_raw(x)	((p4d_t) { __pgd_raw(x) })
> > +static inline __be64 p4d_raw(p4d_t x)
> > +{
> > +	return pgd_raw(x.pgd);
> > +}
> > +
> 
> Shouldn't this be defined in asm/pgtable-be-types.h, just like other
> __pxx_raw() ?

Ideally yes, but this creates weird header file dependencies and untangling
them would generate way too much churn.
 
> > +#define p4d_write(p4d)		pte_write(p4d_pte(p4d))
> > -static inline void pgd_clear(pgd_t *pgdp)
> > +static inline void p4d_clear(p4d_t *p4dp)
> >   {
> > -	*pgdp = __pgd(0);
> > +	*p4dp = __p4d(0);
> >   }

...

> > @@ -573,9 +596,15 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Traverse the guest's 2nd-level tree, allocate new levels needed */
> >   	pgd = pgtable + pgd_index(gpa);
> > -	pud = NULL;
> > +	p4d = NULL;
> >   	if (pgd_present(*pgd))
> > -		pud = pud_offset(pgd, gpa);
> > +		p4d = p4d_offset(pgd, gpa);
> > +	else
> > +		new_p4d = p4d_alloc_one(kvm->mm, gpa);
> > +
> > +	pud = NULL;
> > +	if (p4d_present(*p4d))
> > +		pud = pud_offset(p4d, gpa);
> 
> Is it worth adding all this new code ?
> 
> My understanding is that the series objective is to get rid of
> __ARCH_HAS_5LEVEL_HACK, to to add support for 5 levels to an architecture
> that not need it (at least for now).
> If we want to add support for 5 levels, it can be done later in another
> patch.
> 
> Here I think your change could be limited to:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);

This won't work. Without __ARCH_USE_5LEVEL_HACK defined pgd_present() is
hardwired to 1 and the actual check for the top level is performed with
p4d_present(). The 'else' clause that allocates p4d will never be taken and
it could be removed, but I prefer to keep it for consistency.
 
> >   	else
> >   		new_pud = pud_alloc_one(kvm->mm, gpa);
> > @@ -597,12 +626,18 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Now traverse again under the lock and change the tree */
> >   	ret = -ENOMEM;
> >   	if (pgd_none(*pgd)) {
> > +		if (!new_p4d)
> > +			goto out_unlock;
> > +		pgd_populate(kvm->mm, pgd, new_p4d);
> > +		new_p4d = NULL;
> > +	}
> > +	if (p4d_none(*p4d)) {
> >   		if (!new_pud)
> >   			goto out_unlock;
> > -		pgd_populate(kvm->mm, pgd, new_pud);
> > +		p4d_populate(kvm->mm, p4d, new_pud);
> >   		new_pud = NULL;
> >   	}
> > -	pud = pud_offset(pgd, gpa);
> > +	pud = pud_offset(p4d, gpa);
> >   	if (pud_is_leaf(*pud)) {
> >   		unsigned long hgpa = gpa & PUD_MASK;
> > @@ -1220,6 +1255,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   	pgd_t *pgt;
> >   	struct kvm_nested_guest *nested;
> >   	pgd_t pgd, *pgdp;
> > +	p4d_t p4d, *p4dp;
> >   	pud_t pud, *pudp;
> >   	pmd_t pmd, *pmdp;
> >   	pte_t *ptep;
> > @@ -1298,7 +1334,14 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   			continue;
> >   		}
> > -		pudp = pud_offset(&pgd, gpa);
> > +		p4dp = p4d_offset(&pgd, gpa);
> > +		p4d = READ_ONCE(*p4dp);
> > +		if (!(p4d_val(p4d) & _PAGE_PRESENT)) {
> > +			gpa = (gpa & P4D_MASK) + P4D_SIZE;
> > +			continue;
> > +		}
> > +
> > +		pudp = pud_offset(&p4d, gpa);
> 
> Same, here you are forcing a useless read with READ_ONCE().
> 
> Your change could be limited to
> 
> -		pudp = pud_offset(&pgd, gpa);
> +		pudp = pud_offset(p4d_offset(&pgd, gpa), gpa);

Here again the actual check must be done against p4d rather than pgd. We
could skip READ_ONCE() for pgd, but since it is a debugfs method I don't
think it is more important than code consistency.
 
> This comment applies to many other places.

I'll make another pass to see where we can take the shortcut and use 

	pudp = pud_offset(p4d_offset(...))
 
> >   		pud = READ_ONCE(*pudp);
> >   		if (!(pud_val(pud) & _PAGE_PRESENT)) {
> >   			gpa = (gpa & PUD_MASK) + PUD_SIZE;
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 3345f039a876..7a59f6863cec 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -107,13 +107,18 @@ static inline int unmap_patch_area(unsigned long addr)
> >   	pte_t *ptep;
> >   	pmd_t *pmdp;
> >   	pud_t *pudp;
> > +	p4d_t *p4dp;
> >   	pgd_t *pgdp;
> >   	pgdp = pgd_offset_k(addr);
> >   	if (unlikely(!pgdp))
> >   		return -EINVAL;
> > -	pudp = pud_offset(pgdp, addr);
> > +	p4dp = p4d_offset(pgdp, addr);
> > +	if (unlikely(!p4dp))
> > +		return -EINVAL;
> > +
> > +	pudp = pud_offset(p4dp, addr);
> >   	if (unlikely(!pudp))
> >   		return -EINVAL;
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 0a1c65a2c565..b2fc3e71165c 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -312,7 +312,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea)
> >   	if (!Hash)
> >   		return;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, ea), ea), ea);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, ea), ea), ea), ea);
> 
> If we continue like this, in ten years this like is going to be many
> kilometers long.
> 
> I think the above would be worth a generic helper.

Agree. My plan was to first unify all the architectures and then start
introducing the generic helpers, like e.g. pmd_offset_mm().
 
> >   	if (!pmd_none(*pmd))
> >   		add_hash_page(mm->context.id, ea, pmd_val(*pmd));
> >   }
> > diff --git a/arch/powerpc/mm/book3s32/tlb.c b/arch/powerpc/mm/book3s32/tlb.c
> > index 2fcd321040ff..175bc33b41b7 100644
> > --- a/arch/powerpc/mm/book3s32/tlb.c
> > +++ b/arch/powerpc/mm/book3s32/tlb.c
> > @@ -87,7 +87,7 @@ static void flush_range(struct mm_struct *mm, unsigned long start,
> >   	if (start >= end)
> >   		return;
> >   	end = (end - 1) | ~PAGE_MASK;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, start), start), start), start);
> >   	for (;;) {
> >   		pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
> >   		if (pmd_end > end)
> > @@ -145,7 +145,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> >   		return;
> >   	}
> >   	mm = (vmaddr < TASK_SIZE)? vma->vm_mm: &init_mm;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr), vmaddr);
> >   	if (!pmd_none(*pmd))
> >   		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
> >   }
> > diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > index 64733b9cb20a..9cd15937e88a 100644
> > --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > @@ -148,6 +148,7 @@ void hash__vmemmap_remove_mapping(unsigned long start,
> >   int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   {
> >   	pgd_t *pgdp;
> > +	p4d_t *p4dp;
> >   	pud_t *pudp;
> >   	pmd_t *pmdp;
> >   	pte_t *ptep;
> > @@ -155,7 +156,8 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   	BUILD_BUG_ON(TASK_SIZE_USER64 > H_PGTABLE_RANGE);
> >   	if (slab_is_available()) {
> >   		pgdp = pgd_offset_k(ea);
> > -		pudp = pud_alloc(&init_mm, pgdp, ea);
> > +		p4dp = p4d_offset(pgdp, ea);
> > +		pudp = pud_alloc(&init_mm, p4dp, ea);
> 
> Could be a single line, without a new var.
> 
> -		pudp = pud_alloc(&init_mm, pgdp, ea);
> +		pudp = pud_alloc(&init_mm, p4d_offset(pgdp, ea), ea);
> 
> 
> Same kind of comments as already done apply to the rest.
> 
> Christophe

-- 
Sincerely yours,
Mike.

WARNING: multiple messages have this Message-ID (diff)
From: Mike Rapoport <rppt@kernel.org>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Rich Felker <dalias@libc.org>,
	linux-ia64@vger.kernel.org,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	linux-sh@vger.kernel.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linux-mm@kvack.org, Paul Mackerras <paulus@samba.org>,
	linux-hexagon@vger.kernel.org, Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu, Jonas Bonn <jonas@southpole.se>,
	linux-arch@vger.kernel.org, Brian Cain <bcain@codeaurora.org>,
	Marc Zyngier <maz@kernel.org>,
	Russell King <linux@armlinux.org.uk>,
	Ley Foon Tan <ley.foon.tan@intel.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	uclinux-h8-devel@lists.sourceforge.jp,
	Fenghua Yu <fenghua.yu@intel.com>, Arnd Bergmann <arnd@arndb.de>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	kvm-ppc@vger.kernel.org,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	openrisc@lists.librecores.org, Stafford Horne <shorne@gmail.com>,
	Guan Xuetao <gxt@pku.edu.cn>,
	linux-arm-kernel@lists.infradead.org,
	Tony Luck <tony.luck@intel.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	linux-kernel@vger.kernel.org, James Morse <james.morse@arm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	nios2-dev@lists.rocketboards.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 07/13] powerpc: add support for folded p4d page tables
Date: Tue, 18 Feb 2020 12:54:40 +0200	[thread overview]
Message-ID: <20200218105440.GA1698@hump> (raw)
In-Reply-To: <c79b363c-a111-389a-5752-51cf85fa8c44@c-s.fr>

On Sun, Feb 16, 2020 at 11:41:07AM +0100, Christophe Leroy wrote:
> 
> 
> Le 16/02/2020 à 09:18, Mike Rapoport a écrit :
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Implement primitives necessary for the 4th level folding, add walks of p4d
> > level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.
> 
> I don't think it is worth adding all this additionnals walks of p4d, this
> patch could be limited to changes like:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);
> 
> The additionnal walks should be added through another patch the day powerpc
> need them.

Ok, I'll update the patch to reduce walking the p4d.
 
> See below for more comments.
> 
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx
> > ---

...

> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 201a69e6a355..ddddbafff0ab 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> >   #define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> > -#include <asm-generic/5level-fixup.h>
> > +#include <asm-generic/pgtable-nop4d.h>
> >   #ifndef __ASSEMBLY__
> >   #include <linux/mmdebug.h>
> > @@ -251,7 +251,7 @@ extern unsigned long __pmd_frag_size_shift;
> >   /* Bits to mask out from a PUD to get to the PMD page */
> >   #define PUD_MASKED_BITS		0xc0000000000000ffUL
> >   /* Bits to mask out from a PGD to get to the PUD page */
> > -#define PGD_MASKED_BITS		0xc0000000000000ffUL
> > +#define P4D_MASKED_BITS		0xc0000000000000ffUL
> >   /*
> >    * Used as an indicator for rcu callback functions
> > @@ -949,54 +949,60 @@ static inline bool pud_access_permitted(pud_t pud, bool write)
> >   	return pte_access_permitted(pud_pte(pud), write);
> >   }
> > -#define pgd_write(pgd)		pte_write(pgd_pte(pgd))
> > +#define __p4d_raw(x)	((p4d_t) { __pgd_raw(x) })
> > +static inline __be64 p4d_raw(p4d_t x)
> > +{
> > +	return pgd_raw(x.pgd);
> > +}
> > +
> 
> Shouldn't this be defined in asm/pgtable-be-types.h, just like other
> __pxx_raw() ?

Ideally yes, but this creates weird header file dependencies and untangling
them would generate way too much churn.
 
> > +#define p4d_write(p4d)		pte_write(p4d_pte(p4d))
> > -static inline void pgd_clear(pgd_t *pgdp)
> > +static inline void p4d_clear(p4d_t *p4dp)
> >   {
> > -	*pgdp = __pgd(0);
> > +	*p4dp = __p4d(0);
> >   }

...

> > @@ -573,9 +596,15 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Traverse the guest's 2nd-level tree, allocate new levels needed */
> >   	pgd = pgtable + pgd_index(gpa);
> > -	pud = NULL;
> > +	p4d = NULL;
> >   	if (pgd_present(*pgd))
> > -		pud = pud_offset(pgd, gpa);
> > +		p4d = p4d_offset(pgd, gpa);
> > +	else
> > +		new_p4d = p4d_alloc_one(kvm->mm, gpa);
> > +
> > +	pud = NULL;
> > +	if (p4d_present(*p4d))
> > +		pud = pud_offset(p4d, gpa);
> 
> Is it worth adding all this new code ?
> 
> My understanding is that the series objective is to get rid of
> __ARCH_HAS_5LEVEL_HACK, to to add support for 5 levels to an architecture
> that not need it (at least for now).
> If we want to add support for 5 levels, it can be done later in another
> patch.
> 
> Here I think your change could be limited to:
> 
> -		pud = pud_offset(pgd, gpa);
> +		pud = pud_offset(p4d_offset(pgd, gpa), gpa);

This won't work. Without __ARCH_USE_5LEVEL_HACK defined pgd_present() is
hardwired to 1 and the actual check for the top level is performed with
p4d_present(). The 'else' clause that allocates p4d will never be taken and
it could be removed, but I prefer to keep it for consistency.
 
> >   	else
> >   		new_pud = pud_alloc_one(kvm->mm, gpa);
> > @@ -597,12 +626,18 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
> >   	/* Now traverse again under the lock and change the tree */
> >   	ret = -ENOMEM;
> >   	if (pgd_none(*pgd)) {
> > +		if (!new_p4d)
> > +			goto out_unlock;
> > +		pgd_populate(kvm->mm, pgd, new_p4d);
> > +		new_p4d = NULL;
> > +	}
> > +	if (p4d_none(*p4d)) {
> >   		if (!new_pud)
> >   			goto out_unlock;
> > -		pgd_populate(kvm->mm, pgd, new_pud);
> > +		p4d_populate(kvm->mm, p4d, new_pud);
> >   		new_pud = NULL;
> >   	}
> > -	pud = pud_offset(pgd, gpa);
> > +	pud = pud_offset(p4d, gpa);
> >   	if (pud_is_leaf(*pud)) {
> >   		unsigned long hgpa = gpa & PUD_MASK;
> > @@ -1220,6 +1255,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   	pgd_t *pgt;
> >   	struct kvm_nested_guest *nested;
> >   	pgd_t pgd, *pgdp;
> > +	p4d_t p4d, *p4dp;
> >   	pud_t pud, *pudp;
> >   	pmd_t pmd, *pmdp;
> >   	pte_t *ptep;
> > @@ -1298,7 +1334,14 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
> >   			continue;
> >   		}
> > -		pudp = pud_offset(&pgd, gpa);
> > +		p4dp = p4d_offset(&pgd, gpa);
> > +		p4d = READ_ONCE(*p4dp);
> > +		if (!(p4d_val(p4d) & _PAGE_PRESENT)) {
> > +			gpa = (gpa & P4D_MASK) + P4D_SIZE;
> > +			continue;
> > +		}
> > +
> > +		pudp = pud_offset(&p4d, gpa);
> 
> Same, here you are forcing a useless read with READ_ONCE().
> 
> Your change could be limited to
> 
> -		pudp = pud_offset(&pgd, gpa);
> +		pudp = pud_offset(p4d_offset(&pgd, gpa), gpa);

Here again the actual check must be done against p4d rather than pgd. We
could skip READ_ONCE() for pgd, but since it is a debugfs method I don't
think it is more important than code consistency.
 
> This comment applies to many other places.

I'll make another pass to see where we can take the shortcut and use 

	pudp = pud_offset(p4d_offset(...))
 
> >   		pud = READ_ONCE(*pudp);
> >   		if (!(pud_val(pud) & _PAGE_PRESENT)) {
> >   			gpa = (gpa & PUD_MASK) + PUD_SIZE;
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 3345f039a876..7a59f6863cec 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -107,13 +107,18 @@ static inline int unmap_patch_area(unsigned long addr)
> >   	pte_t *ptep;
> >   	pmd_t *pmdp;
> >   	pud_t *pudp;
> > +	p4d_t *p4dp;
> >   	pgd_t *pgdp;
> >   	pgdp = pgd_offset_k(addr);
> >   	if (unlikely(!pgdp))
> >   		return -EINVAL;
> > -	pudp = pud_offset(pgdp, addr);
> > +	p4dp = p4d_offset(pgdp, addr);
> > +	if (unlikely(!p4dp))
> > +		return -EINVAL;
> > +
> > +	pudp = pud_offset(p4dp, addr);
> >   	if (unlikely(!pudp))
> >   		return -EINVAL;
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 0a1c65a2c565..b2fc3e71165c 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -312,7 +312,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea)
> >   	if (!Hash)
> >   		return;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, ea), ea), ea);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, ea), ea), ea), ea);
> 
> If we continue like this, in ten years this like is going to be many
> kilometers long.
> 
> I think the above would be worth a generic helper.

Agree. My plan was to first unify all the architectures and then start
introducing the generic helpers, like e.g. pmd_offset_mm().
 
> >   	if (!pmd_none(*pmd))
> >   		add_hash_page(mm->context.id, ea, pmd_val(*pmd));
> >   }
> > diff --git a/arch/powerpc/mm/book3s32/tlb.c b/arch/powerpc/mm/book3s32/tlb.c
> > index 2fcd321040ff..175bc33b41b7 100644
> > --- a/arch/powerpc/mm/book3s32/tlb.c
> > +++ b/arch/powerpc/mm/book3s32/tlb.c
> > @@ -87,7 +87,7 @@ static void flush_range(struct mm_struct *mm, unsigned long start,
> >   	if (start >= end)
> >   		return;
> >   	end = (end - 1) | ~PAGE_MASK;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, start), start), start), start);
> >   	for (;;) {
> >   		pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
> >   		if (pmd_end > end)
> > @@ -145,7 +145,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> >   		return;
> >   	}
> >   	mm = (vmaddr < TASK_SIZE)? vma->vm_mm: &init_mm;
> > -	pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
> > +	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr), vmaddr);
> >   	if (!pmd_none(*pmd))
> >   		flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
> >   }
> > diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > index 64733b9cb20a..9cd15937e88a 100644
> > --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > @@ -148,6 +148,7 @@ void hash__vmemmap_remove_mapping(unsigned long start,
> >   int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   {
> >   	pgd_t *pgdp;
> > +	p4d_t *p4dp;
> >   	pud_t *pudp;
> >   	pmd_t *pmdp;
> >   	pte_t *ptep;
> > @@ -155,7 +156,8 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
> >   	BUILD_BUG_ON(TASK_SIZE_USER64 > H_PGTABLE_RANGE);
> >   	if (slab_is_available()) {
> >   		pgdp = pgd_offset_k(ea);
> > -		pudp = pud_alloc(&init_mm, pgdp, ea);
> > +		p4dp = p4d_offset(pgdp, ea);
> > +		pudp = pud_alloc(&init_mm, p4dp, ea);
> 
> Could be a single line, without a new var.
> 
> -		pudp = pud_alloc(&init_mm, pgdp, ea);
> +		pudp = pud_alloc(&init_mm, p4d_offset(pgdp, ea), ea);
> 
> 
> Same kind of comments as already done apply to the rest.
> 
> Christophe

-- 
Sincerely yours,
Mike.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-02-18 10:54 UTC|newest]

Thread overview: 189+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-16  8:18 [PATCH v2 00/13] mm: remove __ARCH_HAS_5LEVEL_HACK Mike Rapoport
2020-02-16  8:18 ` Mike Rapoport
2020-02-16  8:18 ` Mike Rapoport
2020-02-16  8:18 ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18 ` Mike Rapoport
2020-02-16  8:18 ` Mike Rapoport
2020-02-16  8:18 ` Mike Rapoport
2020-02-16  8:18 ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 01/13] arm/arm64: add support for folded p4d page tables Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 02/13] h8300: remove usage of __ARCH_USE_5LEVEL_HACK Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 03/13] hexagon: remove __ARCH_USE_5LEVEL_HACK Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 04/13] ia64: add support for folded p4d page tables Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 05/13] nios2: " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 06/13] openrisc: " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 07/13] powerpc: " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16 10:41   ` Christophe Leroy
2020-02-16 10:41     ` Christophe Leroy
2020-02-16 10:41     ` Christophe Leroy
2020-02-16 10:41     ` [OpenRISC] " Christophe Leroy
2020-02-16 10:41     ` Christophe Leroy
2020-02-16 10:41     ` Christophe Leroy
2020-02-16 10:41     ` Christophe Leroy
2020-02-18 10:54     ` Mike Rapoport [this message]
2020-02-18 10:54       ` Mike Rapoport
2020-02-18 10:54       ` Mike Rapoport
2020-02-18 10:54       ` [OpenRISC] " Mike Rapoport
2020-02-18 10:54       ` Mike Rapoport
2020-02-18 10:54       ` Mike Rapoport
2020-02-18 10:54       ` Mike Rapoport
2020-02-26  9:13       ` Mike Rapoport
2020-02-26  9:13         ` Mike Rapoport
2020-02-26  9:13         ` Mike Rapoport
2020-02-26  9:13         ` [OpenRISC] " Mike Rapoport
2020-02-26  9:13         ` Mike Rapoport
2020-02-26  9:13         ` Mike Rapoport
2020-02-26  9:13         ` Mike Rapoport
2020-02-26  9:46         ` Christophe Leroy
2020-02-26  9:46           ` Christophe Leroy
2020-02-26  9:46           ` Christophe Leroy
2020-02-26  9:46           ` [OpenRISC] " Christophe Leroy
2020-02-26  9:46           ` Christophe Leroy
2020-02-26  9:46           ` Christophe Leroy
2020-02-26  9:46           ` Christophe Leroy
2020-02-26 10:56           ` Mike Rapoport
2020-02-26 10:56             ` Mike Rapoport
2020-02-26 10:56             ` Mike Rapoport
2020-02-26 10:56             ` [OpenRISC] " Mike Rapoport
2020-02-26 10:56             ` Mike Rapoport
2020-02-26 10:56             ` Mike Rapoport
2020-02-26 10:56             ` Mike Rapoport
2020-02-26 11:20             ` Christophe Leroy
2020-02-26 11:20               ` Christophe Leroy
2020-02-26 11:20               ` Christophe Leroy
2020-02-26 11:20               ` [OpenRISC] " Christophe Leroy
2020-02-26 11:20               ` Christophe Leroy
2020-02-26 11:20               ` Christophe Leroy
2020-02-26 11:20               ` Christophe Leroy
2020-02-19 12:07   ` Christophe Leroy
2020-02-19 12:07     ` Christophe Leroy
2020-02-19 12:07     ` Christophe Leroy
2020-02-19 12:07     ` [OpenRISC] " Christophe Leroy
2020-02-19 12:07     ` Christophe Leroy
2020-02-19 12:07     ` Christophe Leroy
2020-02-19 12:07     ` Christophe Leroy
2020-02-19 13:24     ` Mike Rapoport
2020-02-19 13:24       ` Mike Rapoport
2020-02-19 13:24       ` Mike Rapoport
2020-02-19 13:24       ` [OpenRISC] " Mike Rapoport
2020-02-19 13:24       ` Mike Rapoport
2020-02-19 13:24       ` Mike Rapoport
2020-02-19 13:24       ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 08/13] sh: fault: Modernize printing of kernel messages Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 09/13] sh: drop __pXd_offset() macros that duplicate pXd_index() ones Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 10/13] sh: add support for folded p4d page tables Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 11/13] unicore32: remove __ARCH_USE_5LEVEL_HACK Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 12/13] asm-generic: remove pgtable-nop4d-hack.h Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18 ` [PATCH v2 13/13] mm: remove __ARCH_HAS_5LEVEL_HACK and include/asm-generic/5level-fixup.h Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` [OpenRISC] " Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:18   ` Mike Rapoport
2020-02-16  8:22 ` [PATCH v2 00/13] mm: remove __ARCH_HAS_5LEVEL_HACK Russell King - ARM Linux admin
2020-02-16  8:22   ` Russell King - ARM Linux admin
2020-02-16  8:22   ` Russell King - ARM Linux admin
2020-02-16  8:22   ` [OpenRISC] " Russell King - ARM Linux admin
2020-02-16  8:22   ` Russell King - ARM Linux admin
2020-02-16  8:22   ` Russell King - ARM Linux admin
2020-02-16  8:22   ` Russell King - ARM Linux admin
2020-02-16 10:45   ` Christophe Leroy
2020-02-16 10:45     ` Christophe Leroy
2020-02-16 10:45     ` Christophe Leroy
2020-02-16 10:45     ` [OpenRISC] " Christophe Leroy
2020-02-16 10:45     ` Christophe Leroy
2020-02-16 10:45     ` Christophe Leroy
2020-02-16 10:45     ` Christophe Leroy
2020-02-17  6:23   ` Mike Rapoport
2020-02-17  6:23     ` Mike Rapoport
2020-02-17  6:23     ` Mike Rapoport
2020-02-17  6:23     ` [OpenRISC] " Mike Rapoport
2020-02-17  6:23     ` Mike Rapoport
2020-02-17  6:23     ` Mike Rapoport
2020-02-17  6:23     ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200218105440.GA1698@hump \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bcain@codeaurora.org \
    --cc=benh@kernel.crashing.org \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@c-s.fr \
    --cc=dalias@libc.org \
    --cc=fenghua.yu@intel.com \
    --cc=geert+renesas@glider.be \
    --cc=gxt@pku.edu.cn \
    --cc=jonas@southpole.se \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=ley.foon.tan@intel.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-hexagon@vger.kernel.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maz@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=nios2-dev@lists.rocketboards.org \
    --cc=openrisc@lists.librecores.org \
    --cc=paulus@samba.org \
    --cc=rppt@linux.ibm.com \
    --cc=shorne@gmail.com \
    --cc=stefan.kristiansson@saunalahti.fi \
    --cc=tony.luck@intel.com \
    --cc=uclinux-h8-devel@lists.sourceforge.jp \
    --cc=will@kernel.org \
    --cc=ysato@users.sourceforge.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.