LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] powerpc: use swap() to make code cleaner
From: Stephen Rothwell @ 2021-11-04 10:06 UTC (permalink / raw)
  To: davidcomponentone
  Cc: sfr, sxwjean, Zeal Robot, linux-kernel, nathan, yang.guang5,
	paulus, aneesh.kumar, linuxppc-dev
In-Reply-To: <20211104061709.1505592-1-yang.guang5@zte.com.cn>

[-- Attachment #1: Type: text/plain, Size: 327 bytes --]

Hi,

On Thu,  4 Nov 2021 14:17:09 +0800 davidcomponentone@gmail.com wrote:
>
> From: Yang Guang <yang.guang5@zte.com.cn>
> 
> Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
> opencoding it.

So if swap() is in the above include file, then you should include it.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: Fwd: Fwd: X stopped working with 5.14 on iBook
From: Andreas Schwab @ 2021-11-04  9:01 UTC (permalink / raw)
  To: Stanley Johnson
  Cc: Christopher M. Riedl, linuxppc-dev, Riccardo Mottola, Finn Thain
In-Reply-To: <m0OyfpLa146ICa5BY-R6updiBgoAJmoZY4ywxwIMXAXKPzX6HQrpXInpO7V28XnYOm87T4pTbY4VTmJxeY2c_L4u7s4D84cvxQ3Ab-D86_M=@protonmail.com>

I don't use debian.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply

* Re: [PATCH stable 4.19 1/1] arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
From: Greg KH @ 2021-11-04  8:34 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: open list:MIPS, Palmer Dabbelt, Stefan Agner, Paul Mackerras,
	stable, open list:RISC-V ARCHITECTURE, Sasha Levin, Russell King,
	Mike Rapoport, James Hogan, open list:SYNOPSYS ARC ARCHITECTURE,
	open list:GENERIC INCLUDE/ASM HEADER FILES, Albert Ou,
	Arnd Bergmann, moderated list:ARM PORT, Thomas Bogendoerfer,
	Vineet Gupta, linux-kernel, Ralf Baechle, Paul Burton,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20211103205656.374678-1-f.fainelli@gmail.com>

On Wed, Nov 03, 2021 at 01:56:56PM -0700, Florian Fainelli wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> [ Upstream commit cef397038167ac15d085914493d6c86385773709 ]
> 
> Stefan Agner reported a bug when using zsram on 32-bit Arm machines
> with RAM above the 4GB address boundary:
> 
>   Unable to handle kernel NULL pointer dereference at virtual address 00000000
>   pgd = a27bd01c
>   [00000000] *pgd=236a0003, *pmd=1ffa64003
>   Internal error: Oops: 207 [#1] SMP ARM
>   Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
>   CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
>   Hardware name: BCM2711
>   PC is at zs_map_object+0x94/0x338
>   LR is at zram_bvec_rw.constprop.0+0x330/0xa64
>   pc : [<c0602b38>]    lr : [<c0bda6a0>]    psr: 60000013
>   sp : e376bbe0  ip : 00000000  fp : c1e2921c
>   r10: 00000002  r9 : c1dda730  r8 : 00000000
>   r7 : e8ff7a00  r6 : 00000000  r5 : 02f9ffa0  r4 : e3710000
>   r3 : 000fdffe  r2 : c1e0ce80  r1 : ebf979a0  r0 : 00000000
>   Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
>   Control: 30c5383d  Table: 235c2a80  DAC: fffffffd
>   Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
>   Stack: (0xe376bbe0 to 0xe376c000)
> 
> As it turns out, zsram needs to know the maximum memory size, which
> is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
> MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.
> 
> The same problem will be hit on all 32-bit architectures that have a
> physical address space larger than 4GB and happen to not enable sparsemem
> and include asm/sparsemem.h from asm/pgtable.h.
> 
> After the initial discussion, I suggested just always defining
> MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
> set, or provoking a build error otherwise. This addresses all
> configurations that can currently have this runtime bug, but
> leaves all other configurations unchanged.
> 
> I looked up the possible number of bits in source code and
> datasheets, here is what I found:
> 
>  - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
>  - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
>    support more than 32 bits, even though supersections in theory allow
>    up to 40 bits as well.
>  - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
>    XPA supports up to 60 bits in theory, but 40 bits are more than
>    anyone will ever ship
>  - On PowerPC, there are three different implementations of 36 bit
>    addressing, but 32-bit is used without CONFIG_PTE_64BIT
>  - On RISC-V, the normal page table format can support 34 bit
>    addressing. There is no highmem support on RISC-V, so anything
>    above 2GB is unused, but it might be useful to eventually support
>    CONFIG_ZRAM for high pages.
> 
> Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
> Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Reviewed-by: Stefan Agner <stefan@agner.ch>
> Tested-by: Stefan Agner <stefan@agner.ch>
> Acked-by: Mike Rapoport <rppt@linux.ibm.com>
> Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> [florian: patch arch/powerpc/include/asm/pte-common.h for 4.19.y]
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

All now queued up, thanks.

greg k-h

^ permalink raw reply

* Patch "mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS" has been added to the 4.14-stable tree
From: gregkh @ 2021-11-04  8:34 UTC (permalink / raw)
  To: arnd, benh, bp, f.fainelli, gregkh, hpa, kirill.shutemov,
	linux-arm-kernel, linux-mips, linux-mm, linux-snps-arc, linux,
	linuxppc-dev, luto, minchan, mingo, mingo, mpe, ngupta, paulus,
	peterz, ralf, rppt, sashal, sergey.senozhatsky.work, stefan, tglx,
	torvalds, tsbogend, vgupta, x86
  Cc: stable-commits
In-Reply-To: <20211103205704.374734-2-f.fainelli@gmail.com>


This is a note to let you know that I've just added the patch titled

    mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS

to the 4.14-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-zsmalloc-prepare-to-variable-max_physmem_bits.patch
and it can be found in the queue-4.14 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From foo@baz Thu Nov  4 09:33:05 AM CET 2021
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed,  3 Nov 2021 13:57:03 -0700
Subject: mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
To: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Sasha Levin <sashal@kernel.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Nitin Gupta <ngupta@vflare.org>, Minchan Kim <minchan@kernel.org>, Andy Lutomirski <luto@amacapital.net>, Borislav Petkov <bp@suse.de>, Linus Torvalds <torvalds@linux-foundation.org>, Peter Zijlstra <peterz@infradead.org>, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>, Thomas Gleixner <tglx@linutronix.de>, linux-mm@kvack.org, Ingo Molnar <mingo@kernel.org>, Florian Fainelli <f.fainelli@gmail.com>, Vineet Gupta <vgupta@synopsys.com>, Russell King <linux@armlinux.org.uk>, Ralf Baechle <ralf@linux-mips.org>, Benjamin Herrenschmidt <benh@kernel.crashing.org>, Paul Mackerras <paulus@samba.org>, Michael Ellerman <mpe@ellerman.id.au>, Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Arnd Bergmann <arnd@arndb.de>, Stefan Agner <stefan@agn
 er.ch>, Thomas Bogendoerfer <tsbogend@alpha.franken.de>, Mike Rapoport <rppt@linux.ibm.com>, linux-snps-arc@lists.infradead.org (open list:SYNOPSYS ARC ARCHITECTURE), linux-arm-kernel@lists.infradead.org (moderated list:ARM PORT), linux-mips@linux-mips.org (open list:MIPS), linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)), linux-arch@vger.kernel.org (open list:GENERIC INCLUDE/ASM HEADER FILES)
Message-ID: <20211103205704.374734-2-f.fainelli@gmail.com>

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit 02390b87a9459937cdb299e6b34ff33992512ec7 upstream

With boot-time switching between paging mode we will have variable
MAX_PHYSMEM_BITS.

Let's use the maximum variable possible for CONFIG_X86_5LEVEL=y
configuration to define zsmalloc data structures.

The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover such case.
It also suits well to handle PAE special case.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable-3level_types.h |    1 +
 arch/x86/include/asm/pgtable_64_types.h     |    2 ++
 mm/zsmalloc.c                               |   13 +++++++------
 3 files changed, 10 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -44,5 +44,6 @@ typedef union {
  */
 #define PTRS_PER_PTE	512
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	36
 
 #endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -40,6 +40,8 @@ typedef struct { pteval_t pte; } pte_t;
 #define P4D_SIZE	(_AC(1, UL) << P4D_SHIFT)
 #define P4D_MASK	(~(P4D_SIZE - 1))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	52
+
 #else /* CONFIG_X86_5LEVEL */
 
 /*
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -83,18 +83,19 @@
  * This is made more complicated by various memory models and PAE.
  */
 
-#ifndef MAX_PHYSMEM_BITS
-#ifdef CONFIG_HIGHMEM64G
-#define MAX_PHYSMEM_BITS 36
-#else /* !CONFIG_HIGHMEM64G */
+#ifndef MAX_POSSIBLE_PHYSMEM_BITS
+#ifdef MAX_PHYSMEM_BITS
+#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
+#else
 /*
  * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
  * be PAGE_SHIFT
  */
-#define MAX_PHYSMEM_BITS BITS_PER_LONG
+#define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
 #endif
 #endif
-#define _PFN_BITS		(MAX_PHYSMEM_BITS - PAGE_SHIFT)
+
+#define _PFN_BITS		(MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
 
 /*
  * Memory for allocating for handle keeps object position by


Patches currently in stable-queue which might be from f.fainelli@gmail.com are

queue-4.14/mm-zsmalloc-prepare-to-variable-max_physmem_bits.patch
queue-4.14/arch-pgtable-define-max_possible_physmem_bits-where-needed.patch

^ permalink raw reply

* Patch "arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed" has been added to the 4.14-stable tree
From: gregkh @ 2021-11-04  8:34 UTC (permalink / raw)
  To: arnd, benh, f.fainelli, gregkh, hpa, kirill.shutemov,
	linux-arm-kernel, linux-mips, linux-mm, linux-snps-arc, linux,
	linuxppc-dev, minchan, mingo, mpe, ngupta, paulus, ralf, rppt,
	sashal, sergey.senozhatsky.work, stefan, tglx, tsbogend, vgupta,
	x86
  Cc: stable-commits
In-Reply-To: <20211103205704.374734-3-f.fainelli@gmail.com>


This is a note to let you know that I've just added the patch titled

    arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed

to the 4.14-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     arch-pgtable-define-max_possible_physmem_bits-where-needed.patch
and it can be found in the queue-4.14 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From foo@baz Thu Nov  4 09:33:05 AM CET 2021
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed,  3 Nov 2021 13:57:04 -0700
Subject: arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
To: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Sasha Levin <sashal@kernel.org>, Arnd Bergmann <arnd@arndb.de>, Thomas Bogendoerfer <tsbogend@alpha.franken.de>, Stefan Agner <stefan@agner.ch>, Mike Rapoport <rppt@linux.ibm.com>, Florian Fainelli <f.fainelli@gmail.com>, Vineet Gupta <vgupta@synopsys.com>, Russell King <linux@armlinux.org.uk>, Ralf Baechle <ralf@linux-mips.org>, Benjamin Herrenschmidt <benh@kernel.crashing.org>, Paul Mackerras <paulus@samba.org>, Michael Ellerman <mpe@ellerman.id.au>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Minchan Kim <minchan@kernel.org>, Nitin Gupta <ngupta@vflare.org>, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, linux-snps-arc@lists.infradead.org (open list:SYNOPSYS ARC ARCHITECTURE), linux-arm-kernel@lists.infradead.org (mod
 erated list:ARM PORT), linux-mips@linux-mips.org (open list:MIPS), linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)), linux-arch@vger.kernel.org (open list:GENERIC INCLUDE/ASM HEADER FILES), linux-mm@kvack.org (open list:ZSMALLOC COMPRESSED SLAB MEMORY ALLOCATOR)
Message-ID: <20211103205704.374734-3-f.fainelli@gmail.com>

From: Arnd Bergmann <arnd@arndb.de>

[ Upstream commit cef397038167ac15d085914493d6c86385773709 ]

Stefan Agner reported a bug when using zsram on 32-bit Arm machines
with RAM above the 4GB address boundary:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = a27bd01c
  [00000000] *pgd=236a0003, *pmd=1ffa64003
  Internal error: Oops: 207 [#1] SMP ARM
  Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
  CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
  Hardware name: BCM2711
  PC is at zs_map_object+0x94/0x338
  LR is at zram_bvec_rw.constprop.0+0x330/0xa64
  pc : [<c0602b38>]    lr : [<c0bda6a0>]    psr: 60000013
  sp : e376bbe0  ip : 00000000  fp : c1e2921c
  r10: 00000002  r9 : c1dda730  r8 : 00000000
  r7 : e8ff7a00  r6 : 00000000  r5 : 02f9ffa0  r4 : e3710000
  r3 : 000fdffe  r2 : c1e0ce80  r1 : ebf979a0  r0 : 00000000
  Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5383d  Table: 235c2a80  DAC: fffffffd
  Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
  Stack: (0xe376bbe0 to 0xe376c000)

As it turns out, zsram needs to know the maximum memory size, which
is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.

The same problem will be hit on all 32-bit architectures that have a
physical address space larger than 4GB and happen to not enable sparsemem
and include asm/sparsemem.h from asm/pgtable.h.

After the initial discussion, I suggested just always defining
MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
set, or provoking a build error otherwise. This addresses all
configurations that can currently have this runtime bug, but
leaves all other configurations unchanged.

I looked up the possible number of bits in source code and
datasheets, here is what I found:

 - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
 - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
   support more than 32 bits, even though supersections in theory allow
   up to 40 bits as well.
 - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
   XPA supports up to 60 bits in theory, but 40 bits are more than
   anyone will ever ship
 - On PowerPC, there are three different implementations of 36 bit
   addressing, but 32-bit is used without CONFIG_PTE_64BIT
 - On RISC-V, the normal page table format can support 34 bit
   addressing. There is no highmem support on RISC-V, so anything
   above 2GB is unused, but it might be useful to eventually support
   CONFIG_ZRAM for high pages.

Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Tested-by: Stefan Agner <stefan@agner.ch>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[florian: patch arch/powerpc/include/asm/pte-common.h for 4.14.y
removed arch/riscv/include/asm/pgtable.h which does not exist]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arc/include/asm/pgtable.h        |    2 ++
 arch/arm/include/asm/pgtable-2level.h |    2 ++
 arch/arm/include/asm/pgtable-3level.h |    2 ++
 arch/mips/include/asm/pgtable-32.h    |    3 +++
 arch/powerpc/include/asm/pte-common.h |    2 ++
 include/asm-generic/pgtable.h         |   13 +++++++++++++
 6 files changed, 24 insertions(+)

--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -138,8 +138,10 @@
 
 #ifdef CONFIG_ARC_HAS_PAE40
 #define PTE_BITS_NON_RWX_IN_PD1	(0xff00000000 | PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #else
 #define PTE_BITS_NON_RWX_IN_PD1	(PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /**************************************************************************
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -78,6 +78,8 @@
 #define PTE_HWTABLE_OFF		(PTE_HWTABLE_PTRS * sizeof(pte_t))
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u32))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	32
+
 /*
  * PMD_SHIFT determines the size of the area a second-level page table can map
  * PGDIR_SHIFT determines what a third-level page table entry can map
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -37,6 +37,8 @@
 #define PTE_HWTABLE_OFF		(0)
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u64))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
+
 /*
  * PGDIR_SHIFT determines the size a top-level page table entry can map.
  */
--- a/arch/mips/include/asm/pgtable-32.h
+++ b/arch/mips/include/asm/pgtable-32.h
@@ -111,6 +111,7 @@ static inline void pmd_clear(pmd_t *pmdp
 
 #if defined(CONFIG_XPA)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #define pte_pfn(x)		(((unsigned long)((x).pte_high >> _PFN_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT))
 static inline pte_t
 pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -126,6 +127,7 @@ pfn_pte(unsigned long pfn, pgprot_t prot
 
 #elif defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #define pte_pfn(x)		((unsigned long)((x).pte_high >> 6))
 
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -140,6 +142,7 @@ static inline pte_t pfn_pte(unsigned lon
 
 #else
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #ifdef CONFIG_CPU_VR41XX
 #define pte_pfn(x)		((unsigned long)((x).pte >> (PAGE_SHIFT + 2)))
 #define pfn_pte(pfn, prot)	__pte(((pfn) << (PAGE_SHIFT + 2)) | pgprot_val(prot))
--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -102,8 +102,10 @@ static inline bool pte_user(pte_t pte)
  */
 #if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
 #define PTE_RPN_MASK	(~((1ULL<<PTE_RPN_SHIFT)-1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #else
 #define PTE_RPN_MASK	(~((1UL<<PTE_RPN_SHIFT)-1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /* _PAGE_CHG_MASK masks of bits that are to be preserved across
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1069,6 +1069,19 @@ static inline bool arch_has_pfn_modify_c
 
 #endif /* !__ASSEMBLY__ */
 
+#if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT)
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+/*
+ * ZSMALLOC needs to know the highest PFN on 32-bit architectures
+ * with physical address space extension, but falls back to
+ * BITS_PER_LONG otherwise.
+ */
+#error Missing MAX_POSSIBLE_PHYSMEM_BITS definition
+#else
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
+#endif
+#endif
+
 #ifndef has_transparent_hugepage
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define has_transparent_hugepage() 1


Patches currently in stable-queue which might be from f.fainelli@gmail.com are

queue-4.14/mm-zsmalloc-prepare-to-variable-max_physmem_bits.patch
queue-4.14/arch-pgtable-define-max_possible_physmem_bits-where-needed.patch

^ permalink raw reply

* Patch "mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS" has been added to the 4.9-stable tree
From: gregkh @ 2021-11-04  8:34 UTC (permalink / raw)
  To: arnd, benh, bp, f.fainelli, gregkh, hpa, kirill.shutemov,
	linux-arm-kernel, linux-mips, linux-mm, linux-snps-arc, linux,
	linuxppc-dev, luto, minchan, mingo, mingo, mpe, ngupta, paulus,
	peterz, ralf, rppt, sashal, sergey.senozhatsky.work, stefan, tglx,
	torvalds, tsbogend, vgupta, x86
  Cc: stable-commits
In-Reply-To: <20211103205714.374801-2-f.fainelli@gmail.com>


This is a note to let you know that I've just added the patch titled

    mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-zsmalloc-prepare-to-variable-max_physmem_bits.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From foo@baz Thu Nov  4 09:33:49 AM CET 2021
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed,  3 Nov 2021 13:57:13 -0700
Subject: mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
To: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Sasha Levin <sashal@kernel.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Nitin Gupta <ngupta@vflare.org>, Minchan Kim <minchan@kernel.org>, Andy Lutomirski <luto@amacapital.net>, Borislav Petkov <bp@suse.de>, Linus Torvalds <torvalds@linux-foundation.org>, Peter Zijlstra <peterz@infradead.org>, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>, Thomas Gleixner <tglx@linutronix.de>, linux-mm@kvack.org, Ingo Molnar <mingo@kernel.org>, Florian Fainelli <f.fainelli@gmail.com>, Vineet Gupta <vgupta@synopsys.com>, Russell King <linux@armlinux.org.uk>, Ralf Baechle <ralf@linux-mips.org>, Benjamin Herrenschmidt <benh@kernel.crashing.org>, Paul Mackerras <paulus@samba.org>, Michael Ellerman <mpe@ellerman.id.au>, Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Arnd Bergmann <arnd@arndb.de>, Thomas Bogendoerfer <tsb
 ogend@alpha.franken.de>, Mike Rapoport <rppt@linux.ibm.com>, Stefan Agner <stefan@agner.ch>, linux-snps-arc@lists.infradead.org (open list:SYNOPSYS ARC ARCHITECTURE), linux-arm-kernel@lists.infradead.org (moderated list:ARM PORT), linux-mips@linux-mips.org (open list:MIPS), linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)), linux-arch@vger.kernel.org (open list:GENERIC INCLUDE/ASM HEADER FILES)
Message-ID: <20211103205714.374801-2-f.fainelli@gmail.com>

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit 02390b87a9459937cdb299e6b34ff33992512ec7 upstream

With boot-time switching between paging mode we will have variable
MAX_PHYSMEM_BITS.

Let's use the maximum variable possible for CONFIG_X86_5LEVEL=y
configuration to define zsmalloc data structures.

The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover such case.
It also suits well to handle PAE special case.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[florian: drop arch/x86/include/asm/pgtable_64_types.h changes since
there is no CONFIG_X86_5LEVEL]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable-3level_types.h |    1 +
 mm/zsmalloc.c                               |   13 +++++++------
 2 files changed, 8 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -42,5 +42,6 @@ typedef union {
  */
 #define PTRS_PER_PTE	512
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	36
 
 #endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -83,18 +83,19 @@
  * This is made more complicated by various memory models and PAE.
  */
 
-#ifndef MAX_PHYSMEM_BITS
-#ifdef CONFIG_HIGHMEM64G
-#define MAX_PHYSMEM_BITS 36
-#else /* !CONFIG_HIGHMEM64G */
+#ifndef MAX_POSSIBLE_PHYSMEM_BITS
+#ifdef MAX_PHYSMEM_BITS
+#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
+#else
 /*
  * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
  * be PAGE_SHIFT
  */
-#define MAX_PHYSMEM_BITS BITS_PER_LONG
+#define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
 #endif
 #endif
-#define _PFN_BITS		(MAX_PHYSMEM_BITS - PAGE_SHIFT)
+
+#define _PFN_BITS		(MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
 
 /*
  * Memory for allocating for handle keeps object position by


Patches currently in stable-queue which might be from f.fainelli@gmail.com are

queue-4.9/mm-zsmalloc-prepare-to-variable-max_physmem_bits.patch
queue-4.9/arch-pgtable-define-max_possible_physmem_bits-where-needed.patch

^ permalink raw reply

* Patch "arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed" has been added to the 4.9-stable tree
From: gregkh @ 2021-11-04  8:34 UTC (permalink / raw)
  To: arnd, benh, f.fainelli, gregkh, hpa, kirill.shutemov,
	linux-arm-kernel, linux-mips, linux-mm, linux-snps-arc, linux,
	linuxppc-dev, minchan, mingo, mpe, ngupta, paulus, ralf, rppt,
	sashal, sergey.senozhatsky.work, stefan, tglx, tsbogend, vgupta,
	x86
  Cc: stable-commits
In-Reply-To: <20211103205714.374801-3-f.fainelli@gmail.com>


This is a note to let you know that I've just added the patch titled

    arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     arch-pgtable-define-max_possible_physmem_bits-where-needed.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From foo@baz Thu Nov  4 09:33:49 AM CET 2021
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed,  3 Nov 2021 13:57:14 -0700
Subject: arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
To: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Sasha Levin <sashal@kernel.org>, Arnd Bergmann <arnd@arndb.de>, Thomas Bogendoerfer <tsbogend@alpha.franken.de>, Stefan Agner <stefan@agner.ch>, Mike Rapoport <rppt@linux.ibm.com>, Florian Fainelli <f.fainelli@gmail.com>, Vineet Gupta <vgupta@synopsys.com>, Russell King <linux@armlinux.org.uk>, Ralf Baechle <ralf@linux-mips.org>, Benjamin Herrenschmidt <benh@kernel.crashing.org>, Paul Mackerras <paulus@samba.org>, Michael Ellerman <mpe@ellerman.id.au>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Minchan Kim <minchan@kernel.org>, Nitin Gupta <ngupta@vflare.org>, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, linux-snps-arc@lists.infradead.org (open list:SYNOPSYS ARC ARCHITECTURE), linux-arm-kernel@lists.infradead.org (mod
 erated list:ARM PORT), linux-mips@linux-mips.org (open list:MIPS), linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)), linux-arch@vger.kernel.org (open list:GENERIC INCLUDE/ASM HEADER FILES), linux-mm@kvack.org (open list:ZSMALLOC COMPRESSED SLAB MEMORY ALLOCATOR)
Message-ID: <20211103205714.374801-3-f.fainelli@gmail.com>

From: Arnd Bergmann <arnd@arndb.de>

[ Upstream commit cef397038167ac15d085914493d6c86385773709 ]

Stefan Agner reported a bug when using zsram on 32-bit Arm machines
with RAM above the 4GB address boundary:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = a27bd01c
  [00000000] *pgd=236a0003, *pmd=1ffa64003
  Internal error: Oops: 207 [#1] SMP ARM
  Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
  CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
  Hardware name: BCM2711
  PC is at zs_map_object+0x94/0x338
  LR is at zram_bvec_rw.constprop.0+0x330/0xa64
  pc : [<c0602b38>]    lr : [<c0bda6a0>]    psr: 60000013
  sp : e376bbe0  ip : 00000000  fp : c1e2921c
  r10: 00000002  r9 : c1dda730  r8 : 00000000
  r7 : e8ff7a00  r6 : 00000000  r5 : 02f9ffa0  r4 : e3710000
  r3 : 000fdffe  r2 : c1e0ce80  r1 : ebf979a0  r0 : 00000000
  Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5383d  Table: 235c2a80  DAC: fffffffd
  Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
  Stack: (0xe376bbe0 to 0xe376c000)

As it turns out, zsram needs to know the maximum memory size, which
is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.

The same problem will be hit on all 32-bit architectures that have a
physical address space larger than 4GB and happen to not enable sparsemem
and include asm/sparsemem.h from asm/pgtable.h.

After the initial discussion, I suggested just always defining
MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
set, or provoking a build error otherwise. This addresses all
configurations that can currently have this runtime bug, but
leaves all other configurations unchanged.

I looked up the possible number of bits in source code and
datasheets, here is what I found:

 - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
 - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
   support more than 32 bits, even though supersections in theory allow
   up to 40 bits as well.
 - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
   XPA supports up to 60 bits in theory, but 40 bits are more than
   anyone will ever ship
 - On PowerPC, there are three different implementations of 36 bit
   addressing, but 32-bit is used without CONFIG_PTE_64BIT
 - On RISC-V, the normal page table format can support 34 bit
   addressing. There is no highmem support on RISC-V, so anything
   above 2GB is unused, but it might be useful to eventually support
   CONFIG_ZRAM for high pages.

Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Tested-by: Stefan Agner <stefan@agner.ch>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[florian: patch arch/powerpc/include/asm/pte-common.h for 4.9.y
removed arch/riscv/include/asm/pgtable.h which does not exist]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arc/include/asm/pgtable.h        |    2 ++
 arch/arm/include/asm/pgtable-2level.h |    2 ++
 arch/arm/include/asm/pgtable-3level.h |    2 ++
 arch/mips/include/asm/pgtable-32.h    |    3 +++
 arch/powerpc/include/asm/pte-common.h |    2 ++
 include/asm-generic/pgtable.h         |   13 +++++++++++++
 6 files changed, 24 insertions(+)

--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -137,8 +137,10 @@
 
 #ifdef CONFIG_ARC_HAS_PAE40
 #define PTE_BITS_NON_RWX_IN_PD1	(0xff00000000 | PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #else
 #define PTE_BITS_NON_RWX_IN_PD1	(PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /**************************************************************************
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -78,6 +78,8 @@
 #define PTE_HWTABLE_OFF		(PTE_HWTABLE_PTRS * sizeof(pte_t))
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u32))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	32
+
 /*
  * PMD_SHIFT determines the size of the area a second-level page table can map
  * PGDIR_SHIFT determines what a third-level page table entry can map
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -37,6 +37,8 @@
 #define PTE_HWTABLE_OFF		(0)
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u64))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
+
 /*
  * PGDIR_SHIFT determines the size a top-level page table entry can map.
  */
--- a/arch/mips/include/asm/pgtable-32.h
+++ b/arch/mips/include/asm/pgtable-32.h
@@ -110,6 +110,7 @@ static inline void pmd_clear(pmd_t *pmdp
 
 #if defined(CONFIG_XPA)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #define pte_pfn(x)		(((unsigned long)((x).pte_high >> _PFN_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT))
 static inline pte_t
 pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -125,6 +126,7 @@ pfn_pte(unsigned long pfn, pgprot_t prot
 
 #elif defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #define pte_pfn(x)		((unsigned long)((x).pte_high >> 6))
 
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -139,6 +141,7 @@ static inline pte_t pfn_pte(unsigned lon
 
 #else
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #ifdef CONFIG_CPU_VR41XX
 #define pte_pfn(x)		((unsigned long)((x).pte >> (PAGE_SHIFT + 2)))
 #define pfn_pte(pfn, prot)	__pte(((pfn) << (PAGE_SHIFT + 2)) | pgprot_val(prot))
--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -101,8 +101,10 @@ static inline bool pte_user(pte_t pte)
  */
 #if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
 #define PTE_RPN_MASK	(~((1ULL<<PTE_RPN_SHIFT)-1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #else
 #define PTE_RPN_MASK	(~((1UL<<PTE_RPN_SHIFT)-1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /* _PAGE_CHG_MASK masks of bits that are to be preserved across
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -847,6 +847,19 @@ static inline bool arch_has_pfn_modify_c
 #define io_remap_pfn_range remap_pfn_range
 #endif
 
+#if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT)
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+/*
+ * ZSMALLOC needs to know the highest PFN on 32-bit architectures
+ * with physical address space extension, but falls back to
+ * BITS_PER_LONG otherwise.
+ */
+#error Missing MAX_POSSIBLE_PHYSMEM_BITS definition
+#else
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
+#endif
+#endif
+
 #ifndef has_transparent_hugepage
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define has_transparent_hugepage() 1


Patches currently in stable-queue which might be from f.fainelli@gmail.com are

queue-4.9/mm-zsmalloc-prepare-to-variable-max_physmem_bits.patch
queue-4.9/arch-pgtable-define-max_possible_physmem_bits-where-needed.patch

^ permalink raw reply

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-11-04  8:15 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-214913-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=214913

Michal Suchanek (hramrach@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hramrach@gmail.com

--- Comment #3 from Michal Suchanek (hramrach@gmail.com) ---
What CPU is this?

Does it go away if you boot with ppc_tm=off

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: [PATCH v2 RESEND] powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
From: Michael Ellerman @ 2021-11-04  6:20 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Paul Moore, patch-notifications, linux-kernel, linux-audit,
	Paul Mackerras, Eric Paris, linuxppc-dev
In-Reply-To: <20211102235449.rwtbgmddkzdaodhv@meerkat.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> On Wed, Nov 03, 2021 at 10:18:57AM +1100, Michael Ellerman wrote:
>> It's not in next, that notification is from the b4 thanks script, which
>> didn't notice that the commit has since been reverted.
>
> Yeah... I'm not sure how to catch that, but I'm open to suggestions.

I think that's probably the first time I've had a commit and a revert of
the commit in the same batch of thanks mails.

And the notification is not wrong, the commit was applied with that SHA,
it is in the tree.

So I'm not sure it's very common to have a commit & a revert in the tree
at the same time.


On the other hand being able to generate a mail for an arbitrary revert
would be helpful, ie. independent of any thanks state.

eg, picking a random commit from the past:

  e95ad5f21693 ("powerpc/head_check: Fix shellcheck errors")


If I revert that in my tree today, it'd be cool if I could run something
that would detect the revert, backtrack to the reverted commit, extract
the message-id from the Link: tag, and generate a reply to the original
submission noting that it's now been reverted.

In fact we could write a bot to do that across all commits ever ...

cheers

^ permalink raw reply

* [PATCH] powerpc: use swap() to make code cleaner
From: davidcomponentone @ 2021-11-04  6:17 UTC (permalink / raw)
  To: mpe
  Cc: sfr, sxwjean, Zeal Robot, davidcomponentone, linux-kernel, nathan,
	yang.guang5, paulus, aneesh.kumar, linuxppc-dev

From: Yang Guang <yang.guang5@zte.com.cn>

Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
---
 arch/powerpc/kernel/fadump.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b7ceb041743c..5b40e2d46090 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1265,7 +1265,6 @@ static void fadump_release_reserved_area(u64 start, u64 end)
 static void sort_and_merge_mem_ranges(struct fadump_mrange_info *mrange_info)
 {
 	struct fadump_memory_range *mem_ranges;
-	struct fadump_memory_range tmp_range;
 	u64 base, size;
 	int i, j, idx;
 
@@ -1281,9 +1280,7 @@ static void sort_and_merge_mem_ranges(struct fadump_mrange_info *mrange_info)
 				idx = j;
 		}
 		if (idx != i) {
-			tmp_range = mem_ranges[idx];
-			mem_ranges[idx] = mem_ranges[i];
-			mem_ranges[i] = tmp_range;
+			swap(mem_ranges[idx], mem_ranges[i]);
 		}
 	}
 
-- 
2.30.2


^ permalink raw reply related

* Re: [PATCH 2/2] soc: fsl: guts: Add a missing memory allocation failure check
From: Michael Ellerman @ 2021-11-04  6:03 UTC (permalink / raw)
  To: Christophe JAILLET, leoyang.li, tyreld
  Cc: kernel-janitors, Christophe JAILLET, linuxppc-dev, linux-kernel,
	linux-arm-kernel
In-Reply-To: <4890990418ecbcfb8921efe8adb2019a03e5a1c1.1635969326.git.christophe.jaillet@wanadoo.fr>

Christophe JAILLET <christophe.jaillet@wanadoo.fr> writes:
> If 'devm_kstrdup()' fails, we should return -ENOMEM.
>
> While at it, move the 'of_node_put()' call in the error handling path and
> after the 'machine' has been copied.
> Better safe than sorry.
>
> Suggested-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
> Not sure of which Fixes tag to add. Should be a6fc3b698130, but since
> another commit needs to be reverted for this patch to make sense, I'm
> unsure of what to do. :(
> So, none is given.

I think it's still correct to add:

  Fixes: a6fc3b698130 ("soc: fsl: add GUTS driver for QorIQ platforms")

That is where the bug was introduced, and adding the tag creates a link
between the fix and the bug, which is what we want.

The fact that it also requires the revert in order to apply is kind of
orthogonal, it means an automated backport of this commit will probably
fail, but that's OK it just means someone might have to do it manually.

There is some use of "Depends-on:" to flag a commit that is depended on,
but you can't use that in a patch submission because you don't know the
SHA of the parent commit.

Possibly whoever applies this can add a Depends-on: pointing to patch 1.

cheers

> ---
>  drivers/soc/fsl/guts.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> index af7741eafc57..5ed2fc1c53a0 100644
> --- a/drivers/soc/fsl/guts.c
> +++ b/drivers/soc/fsl/guts.c
> @@ -158,9 +158,14 @@ static int fsl_guts_probe(struct platform_device *pdev)
>  	root = of_find_node_by_path("/");
>  	if (of_property_read_string(root, "model", &machine))
>  		of_property_read_string_index(root, "compatible", 0, &machine);
> -	of_node_put(root);
> -	if (machine)
> +	if (machine) {
>  		soc_dev_attr.machine = devm_kstrdup(dev, machine, GFP_KERNEL);
> +		if (!soc_dev_attr.machine) {
> +			of_node_put(root);
> +			return -ENOMEM;
> +		}
> +	}
> +	of_node_put(root);
>  
>  	svr = fsl_guts_get_svr();
>  	soc_die = fsl_soc_die_match(svr, fsl_soc_die);
> -- 
> 2.30.2

^ permalink raw reply

* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
From: Michael Ellerman @ 2021-11-04  5:55 UTC (permalink / raw)
  To: Nathan Lynch, Nicholas Piggin
  Cc: kjain, Athira Rajeev, maddy, linuxppc-dev, rnsastry
In-Reply-To: <8735odx7us.fsf@linux.ibm.com>

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> Excerpts from Michael Ellerman's message of October 29, 2021 11:15 pm:
>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>>>>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>>>>  	if (ret)
>>>>>  		return ret;
>>>>>  
>>>>> +	/* Disable PMU before suspend */
>>>>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>>>>
>>>> Why was this moved out of stop machine and to an IPI?
>>>>
>>>> My concern would be, what are the other CPUs doing at this time? Is it 
>>>> possible they could take interrupts and schedule? Could that mess up the
>>>> perf state here?
>>> 
>>> pseries_migrate_partition() is called directly from migration_store(),
>>> which is the sysfs store function, which can be called concurrently by
>>> different CPUs.
>>> 
>>> It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
>>> from sys_rtas(), again with no locking.
>>> 
>>> So we could have two CPUs calling into here at the same time, which
>>> might not crash, but is unlikely to work well.
>>> 
>>> I think the lack of locking might have been OK in the past because only
>>> one CPU will successfully get the other CPUs to call do_join() in
>>> pseries_suspend(). But I could be wrong.
>>> 
>>> Anyway, now that we're mutating the PMU state before suspending we need
>>> to be more careful. So I think we need a lock around the whole
>>> sequence.
>
> Regardless of the outcome here, generally agreed that some serialization
> should be imposed in this path. The way the platform works (and some
> extra measures by the drmgr utility) make it so that this code isn't
> entered concurrently in usual operation, but it's possible to make it
> happen if you are root.

Yeah I agree it's unlikely to be a problem in practice.

> A file-static mutex should be OK.

Ack.

>> My concern is still that we wouldn't necessarily have the other CPUs 
>> under control at that point even if we serialize the migrate path.
>> They could take interrupts, possibly call into perf subsystem after
>> the mobility_pmu_disable (e.g., via syscall or context switch) which 
>> might mess things up.
>>
>> I think the stop machine is a reasonable place for the code in this 
>> case. It's a low level disabling of hardware facility and saving off 
>> registers.
>
> That makes sense, but I can't help feeling concerned still. For this to
> be safe, power_pmu_enable() and power_pmu_disable() must never sleep or
> re-enable interrupts or send IPIs. I don't see anything obviously unsafe
> right now, but is that already part of their contract? Is there much
> risk they could change in the future to violate those constraints?
>
> That aside, the proposed change seems like we would be hacking around a
> more generic perf/pmu limitation in a powerpc-specific way. I see the
> same behavior on x86 across suspend/resume.
>
> # perf stat -a -e cache-misses -I 1000 & sleep 2 ; systemctl suspend ; sleep 20 ; kill $(jobs -p)
> [1] 189806
> #           time             counts unit events
>      1.000501710          9,983,649      cache-misses
>      2.002620321         14,131,072      cache-misses
>      3.004579071         23,010,971      cache-misses
>      9.971854783 140,737,491,680,853      cache-misses
>     10.982669250                  0      cache-misses
>     11.984660498                  0      cache-misses
>     12.986648392                  0      cache-misses
>     13.988561766                  0      cache-misses
>     14.992670615                  0      cache-misses
>     15.994938111                  0      cache-misses
>     16.996703952                  0      cache-misses
>     17.999092812                  0      cache-misses
>     19.000602677                  0      cache-misses
>     20.003272216                  0      cache-misses
>     21.004770295                  0      cache-misses
> # uname -r
> 5.13.19-100.fc33.x86_64

That is interesting.

Athira, I guess we should bring that to the perf maintainers and see if
there's any interest in solving the issue in a generic fashion.

cheers

^ permalink raw reply

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-11-04  5:45 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-214913-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=214913

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |michael@ellerman.id.au

--- Comment #2 from Michael Ellerman (michael@ellerman.id.au) ---
Thanks for the report, I agree this looks like a powerpc bug not an XFS bug.

I won't have time to look at this until next week probably, unless someone
beats me to it.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: Fwd: Fwd: X stopped working with 5.14 on iBook
From: Stanley Johnson @ 2021-11-04  2:00 UTC (permalink / raw)
  To: Finn Thain
  Cc: Christopher M. Riedl, linuxppc-dev, Andreas Schwab,
	Riccardo Mottola
In-Reply-To: <3723058-87f1-cba7-c8ad-ac8dc5722abe@linux-m68k.org>

On Wednesday, November 3rd, 2021 at 4:26 PM, Finn Thain <fthain@linux-m68k.org> wrote:

> On Wed, 3 Nov 2021, Andreas Schwab wrote:
>
> > On Nov 02 2021, Finn Thain wrote:
> >
> > > After many builds and tests, Stan and I were able to determine that this
> > >
> > > regression only affects builds with CONFIG_USER_NS=y. That is,
> > >
> > > d3ccc9781560 + CONFIG_USER_NS=y --> fail
> > >
> > > d3ccc9781560 + CONFIG_USER_NS=n --> okay
> > >
> > > d3ccc9781560~ + CONFIG_USER_NS=y --> okay
> > >
> > > d3ccc9781560~ + CONFIG_USER_NS=n --> okay
> >
> > On my iBook G4, X is working alright with 5.15 and CONFIG_USER_NS=y.
>
> Stan said his Cube has these packages installed:
>
> dpkg --list | grep Xorg
> =======================
>
> ii xserver-xorg-core 2:1.20.11-1
>
> powerpc Xorg X server - core server
>
> ii xserver-xorg-legacy 2:1.20.11-1
>
> powerpc setuid root Xorg server wrapper
>
> I gather that Riccardo also runs Debian SID.
>
> Perhaps there is some interaction between d3ccc9781560, CONFIG_USER_NS and
>
> the SUID wrapper...
>
> Does your Xorg installation use --enable-suid-wrapper, Andreas?

Hi Andreas,

Does X work for you if you use the current Debian SID installation with their current default kernel? That's how this all started. The problem was eventually isolated via a git bisect and an exhaustive search of kernel options to the identified "bad commit" and the kernel option CONFIG_USER_NS. The kernel just before the bad commit works with or without CONFIG_USER_NS set, but as of the bad commit, X works only with CONFIG_USER_NS not set, at least on my G4 Cube.

Hi Riccardo,

The G3 system I used for testing was a PowerBook Series II Wallstreet, using the same kernel and Xorg versions that I'm using on the Cube. The same test that failed on the Cube worked on the Wallstreet. Of course, this result may not be consistent with other G3 systems. On your iBook G4, if you recompile the kernel (the one that results in an X that doesn't work on your system) and set CONFIG_USER_NS=n, does X then work?

-Stan


^ permalink raw reply

* [PATCH] use swap() to make code cleaner
From: davidcomponentone @ 2021-11-04  1:14 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: zealci, yang.guang5, davidcomponentone, linux-kernel

From: Yang Guang <yang.guang5@zte.com.cn>

Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
---
 drivers/macintosh/adbhid.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/macintosh/adbhid.c b/drivers/macintosh/adbhid.c
index 994ba5cb3678..379ab49a8107 100644
--- a/drivers/macintosh/adbhid.c
+++ b/drivers/macintosh/adbhid.c
@@ -817,9 +817,7 @@ adbhid_input_register(int id, int default_id, int original_handler_id,
 		case 0xC4: case 0xC7:
 			keyboard_type = "ISO, swapping keys";
 			input_dev->id.version = ADB_KEYBOARD_ISO;
-			i = hid->keycode[10];
-			hid->keycode[10] = hid->keycode[50];
-			hid->keycode[50] = i;
+			swap(hid->keycode[10], hid->keycode[50]);
 			break;
 
 		case 0x12: case 0x15: case 0x16: case 0x17: case 0x1A:
-- 
2.30.2


^ permalink raw reply related

* [PATCH] powerpc: use swap() to make code cleaner
From: davidcomponentone @ 2021-11-04  1:09 UTC (permalink / raw)
  To: benh
  Cc: zealci, davidcomponentone, linux-kernel, yang.guang5, paulus,
	linuxppc-dev

From: Yang Guang <yang.guang5@zte.com.cn>

Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
---
 arch/powerpc/platforms/powermac/pic.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/pic.c b/arch/powerpc/platforms/powermac/pic.c
index 4921bccf0376..75d8d7ec53db 100644
--- a/arch/powerpc/platforms/powermac/pic.c
+++ b/arch/powerpc/platforms/powermac/pic.c
@@ -311,11 +311,8 @@ static void __init pmac_pic_probe_oldstyle(void)
 
 		/* Check ordering of master & slave */
 		if (of_device_is_compatible(master, "gatwick")) {
-			struct device_node *tmp;
 			BUG_ON(slave == NULL);
-			tmp = master;
-			master = slave;
-			slave = tmp;
+			swap(master, slave);
 		}
 
 		/* We found a slave */
-- 
2.30.2


^ permalink raw reply related

* Re: Fwd: Fwd: X stopped working with 5.14 on iBook
From: Andreas Schwab @ 2021-11-03 22:55 UTC (permalink / raw)
  To: Finn Thain
  Cc: Christopher M. Riedl, Stan Johnson, linuxppc-dev,
	Riccardo Mottola
In-Reply-To: <3723058-87f1-cba7-c8ad-ac8dc5722abe@linux-m68k.org>

On Nov 04 2021, Finn Thain wrote:

> Does your Xorg installation use --enable-suid-wrapper, Andreas?

Doesn't look like.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply

* Re: Fwd: Fwd: X stopped working with 5.14 on iBook
From: Finn Thain @ 2021-11-03 22:26 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Christopher M. Riedl, Stan Johnson, linuxppc-dev,
	Riccardo Mottola
In-Reply-To: <87cznh39lk.fsf@igel.home>

On Wed, 3 Nov 2021, Andreas Schwab wrote:

> On Nov 02 2021, Finn Thain wrote:
> 
> > After many builds and tests, Stan and I were able to determine that this 
> > regression only affects builds with CONFIG_USER_NS=y. That is,
> >
> > d3ccc9781560  + CONFIG_USER_NS=y  -->  fail
> > d3ccc9781560  + CONFIG_USER_NS=n  -->  okay
> > d3ccc9781560~ + CONFIG_USER_NS=y  -->  okay
> > d3ccc9781560~ + CONFIG_USER_NS=n  -->  okay
> 
> On my iBook G4, X is working alright with 5.15 and CONFIG_USER_NS=y.
> 

Stan said his Cube has these packages installed:

# dpkg --list | grep Xorg
ii  xserver-xorg-core                      2:1.20.11-1
      powerpc      Xorg X server - core server
ii  xserver-xorg-legacy                    2:1.20.11-1
      powerpc      setuid root Xorg server wrapper

I gather that Riccardo also runs Debian SID.

Perhaps there is some interaction between d3ccc9781560, CONFIG_USER_NS and 
the SUID wrapper...

Does your Xorg installation use --enable-suid-wrapper, Andreas?

^ permalink raw reply

* Re: [PATCH -next] powerpc/44x/fsp2: add missing of_node_put
From: Bixuan Cui @ 2021-11-03  7:30 UTC (permalink / raw)
  To: Michael Ellerman, linux-kernel, linuxppc-dev; +Cc: ivan, paulus
In-Reply-To: <163584792552.1845480.16701207323198181302.b4-ty@ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]


在 2021/11/2 下午6:12, Michael Ellerman 写道:
>> Early exits from for_each_compatible_node() should decrement the
>> node reference counter.  Reported by Coccinelle:
>>
>> ./arch/powerpc/platforms/44x/fsp2.c:206:1-25: WARNING: Function
>> "for_each_compatible_node" should have of_node_put() before return
>> around line 218.
>>
>> [...]
> Applied to powerpc/next.
>
> [1/1] powerpc/44x/fsp2: add missing of_node_put
>        https://git.kernel.org/powerpc/c/290fe8aa69ef5c51c778c0bb33f8ef0181c769f5

Thanks. :-)


Bixuan Cui

[-- Attachment #2: Type: text/html, Size: 1192 bytes --]

^ permalink raw reply

* Re: ppc64le STRICT_MODULE_RWX and livepatch apply_relocate_add() crashes
From: Suraj Jitindar Singh @ 2021-11-03 21:33 UTC (permalink / raw)
  To: Russell Currey, Joe Lawrence, live-patching, linuxppc-dev
  Cc: Peter Zijlstra, Jordan Niethe, Josh Poimboeuf, Jessica Yu
In-Reply-To: <7ee0c84650617e6307b29674dd0a12c7258417cf.camel@russell.cc>

Hi Russell,

On Mon, 2021-11-01 at 19:20 +1000, Russell Currey wrote:
> On Sun, 2021-10-31 at 22:43 -0400, Joe Lawrence wrote:
> > Starting with 5.14 kernels, I can reliably reproduce a crash [1] on
> > ppc64le when loading livepatches containing late klp-relocations
> > [2].
> > These are relocations, specific to livepatching, that are resolved
> > not
> > when a livepatch module is loaded, but only when a livepatch-target
> > module is loaded.
> 
> Hey Joe, thanks for the report.
> 
> > I haven't started looking at a fix yet, but in the case of the x86
> > code
> > update, its apply_relocate_add() implementation was modified to use
> > a
> > common text_poke() function to allowed us to drop
> > module_{en,dis}ble_ro() games by the livepatching code.
> 
> It should be a similar fix for Power, our patch_instruction() uses a
> text poke area but apply_relocate_add() doesn't use it and does its
> own
> raw patching instead.
> 
> > I can take a closer look this week, but thought I'd send out a
> > report
> > in case this may be a known todo for STRICT_MODULE_RWX on Power.
> 
> I'm looking into this now, will update when there's progress.  I
> personally wasn't aware but Jordan flagged this as an issue back in
> August [0].  Are the selftests in the klp-convert tree sufficient for
> testing?  I'm not especially familiar with livepatching & haven't
> used
> the userspace tools.
> 

You can test this by livepatching any module since this only occurs
when writing relocations for modules since the vmlinux relocations are
written earlier before the module text is mapped read-only.

- Suraj

> - Russell
> 
> [0] https://github.com/linuxppc/issues/issues/375
> 
> > 
> > -- Joe
> 
> 


^ permalink raw reply

* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
From: Nathan Lynch @ 2021-11-03 21:11 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Athira Rajeev, rnsastry, kjain, maddy, linuxppc-dev
In-Reply-To: <1635852231.aebe6lt6u4.astroid@bobo.none>

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from Michael Ellerman's message of October 29, 2021 11:15 pm:
>> Nicholas Piggin <npiggin@gmail.com> writes:
>>> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>>>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>>>  	if (ret)
>>>>  		return ret;
>>>>  
>>>> +	/* Disable PMU before suspend */
>>>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>>>
>>> Why was this moved out of stop machine and to an IPI?
>>>
>>> My concern would be, what are the other CPUs doing at this time? Is it 
>>> possible they could take interrupts and schedule? Could that mess up the
>>> perf state here?
>> 
>> pseries_migrate_partition() is called directly from migration_store(),
>> which is the sysfs store function, which can be called concurrently by
>> different CPUs.
>> 
>> It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
>> from sys_rtas(), again with no locking.
>> 
>> So we could have two CPUs calling into here at the same time, which
>> might not crash, but is unlikely to work well.
>> 
>> I think the lack of locking might have been OK in the past because only
>> one CPU will successfully get the other CPUs to call do_join() in
>> pseries_suspend(). But I could be wrong.
>> 
>> Anyway, now that we're mutating the PMU state before suspending we need
>> to be more careful. So I think we need a lock around the whole
>> sequence.

Regardless of the outcome here, generally agreed that some serialization
should be imposed in this path. The way the platform works (and some
extra measures by the drmgr utility) make it so that this code isn't
entered concurrently in usual operation, but it's possible to make it
happen if you are root.

A file-static mutex should be OK.

> My concern is still that we wouldn't necessarily have the other CPUs 
> under control at that point even if we serialize the migrate path.
> They could take interrupts, possibly call into perf subsystem after
> the mobility_pmu_disable (e.g., via syscall or context switch) which 
> might mess things up.
>
> I think the stop machine is a reasonable place for the code in this 
> case. It's a low level disabling of hardware facility and saving off 
> registers.

That makes sense, but I can't help feeling concerned still. For this to
be safe, power_pmu_enable() and power_pmu_disable() must never sleep or
re-enable interrupts or send IPIs. I don't see anything obviously unsafe
right now, but is that already part of their contract? Is there much
risk they could change in the future to violate those constraints?

That aside, the proposed change seems like we would be hacking around a
more generic perf/pmu limitation in a powerpc-specific way. I see the
same behavior on x86 across suspend/resume.

# perf stat -a -e cache-misses -I 1000 & sleep 2 ; systemctl suspend ; sleep 20 ; kill $(jobs -p)
[1] 189806
#           time             counts unit events
     1.000501710          9,983,649      cache-misses
     2.002620321         14,131,072      cache-misses
     3.004579071         23,010,971      cache-misses
     9.971854783 140,737,491,680,853      cache-misses
    10.982669250                  0      cache-misses
    11.984660498                  0      cache-misses
    12.986648392                  0      cache-misses
    13.988561766                  0      cache-misses
    14.992670615                  0      cache-misses
    15.994938111                  0      cache-misses
    16.996703952                  0      cache-misses
    17.999092812                  0      cache-misses
    19.000602677                  0      cache-misses
    20.003272216                  0      cache-misses
    21.004770295                  0      cache-misses
# uname -r
5.13.19-100.fc33.x86_64



^ permalink raw reply

* [PATCH stable 4.9 2/2] arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
From: Florian Fainelli @ 2021-11-03 20:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: open list:MIPS, Sergey Senozhatsky, Stefan Agner,
	open list:ZSMALLOC COMPRESSED SLAB MEMORY ALLOCATOR,
	Paul Mackerras, stable, H. Peter Anvin, Sasha Levin,
	Florian Fainelli, maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT,
	Russell King, Mike Rapoport, Ingo Molnar,
	open list:SYNOPSYS ARC ARCHITECTURE, Nitin Gupta,
	open list:GENERIC INCLUDE/ASM HEADER FILES, Arnd Bergmann,
	Thomas Gleixner, moderated list:ARM PORT, Thomas Bogendoerfer,
	Greg Kroah-Hartman, Ralf Baechle, Minchan Kim, Vineet Gupta,
	open list:LINUX FOR POWERPC 32-BIT AND 64-BIT, Kirill A. Shutemov
In-Reply-To: <20211103205714.374801-1-f.fainelli@gmail.com>

From: Arnd Bergmann <arnd@arndb.de>

[ Upstream commit cef397038167ac15d085914493d6c86385773709 ]

Stefan Agner reported a bug when using zsram on 32-bit Arm machines
with RAM above the 4GB address boundary:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = a27bd01c
  [00000000] *pgd=236a0003, *pmd=1ffa64003
  Internal error: Oops: 207 [#1] SMP ARM
  Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
  CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
  Hardware name: BCM2711
  PC is at zs_map_object+0x94/0x338
  LR is at zram_bvec_rw.constprop.0+0x330/0xa64
  pc : [<c0602b38>]    lr : [<c0bda6a0>]    psr: 60000013
  sp : e376bbe0  ip : 00000000  fp : c1e2921c
  r10: 00000002  r9 : c1dda730  r8 : 00000000
  r7 : e8ff7a00  r6 : 00000000  r5 : 02f9ffa0  r4 : e3710000
  r3 : 000fdffe  r2 : c1e0ce80  r1 : ebf979a0  r0 : 00000000
  Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5383d  Table: 235c2a80  DAC: fffffffd
  Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
  Stack: (0xe376bbe0 to 0xe376c000)

As it turns out, zsram needs to know the maximum memory size, which
is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.

The same problem will be hit on all 32-bit architectures that have a
physical address space larger than 4GB and happen to not enable sparsemem
and include asm/sparsemem.h from asm/pgtable.h.

After the initial discussion, I suggested just always defining
MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
set, or provoking a build error otherwise. This addresses all
configurations that can currently have this runtime bug, but
leaves all other configurations unchanged.

I looked up the possible number of bits in source code and
datasheets, here is what I found:

 - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
 - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
   support more than 32 bits, even though supersections in theory allow
   up to 40 bits as well.
 - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
   XPA supports up to 60 bits in theory, but 40 bits are more than
   anyone will ever ship
 - On PowerPC, there are three different implementations of 36 bit
   addressing, but 32-bit is used without CONFIG_PTE_64BIT
 - On RISC-V, the normal page table format can support 34 bit
   addressing. There is no highmem support on RISC-V, so anything
   above 2GB is unused, but it might be useful to eventually support
   CONFIG_ZRAM for high pages.

Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Tested-by: Stefan Agner <stefan@agner.ch>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[florian: patch arch/powerpc/include/asm/pte-common.h for 4.9.y
removed arch/riscv/include/asm/pgtable.h which does not exist]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 arch/arc/include/asm/pgtable.h        |  2 ++
 arch/arm/include/asm/pgtable-2level.h |  2 ++
 arch/arm/include/asm/pgtable-3level.h |  2 ++
 arch/mips/include/asm/pgtable-32.h    |  3 +++
 arch/powerpc/include/asm/pte-common.h |  2 ++
 include/asm-generic/pgtable.h         | 13 +++++++++++++
 6 files changed, 24 insertions(+)

diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index c10f5cb203e6..81198a6773c6 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -137,8 +137,10 @@
 
 #ifdef CONFIG_ARC_HAS_PAE40
 #define PTE_BITS_NON_RWX_IN_PD1	(0xff00000000 | PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #else
 #define PTE_BITS_NON_RWX_IN_PD1	(PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /**************************************************************************
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 92fd2c8a9af0..6154902bed83 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -78,6 +78,8 @@
 #define PTE_HWTABLE_OFF		(PTE_HWTABLE_PTRS * sizeof(pte_t))
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u32))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	32
+
 /*
  * PMD_SHIFT determines the size of the area a second-level page table can map
  * PGDIR_SHIFT determines what a third-level page table entry can map
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 2a029bceaf2f..35807e611b6e 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -37,6 +37,8 @@
 #define PTE_HWTABLE_OFF		(0)
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u64))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
+
 /*
  * PGDIR_SHIFT determines the size a top-level page table entry can map.
  */
diff --git a/arch/mips/include/asm/pgtable-32.h b/arch/mips/include/asm/pgtable-32.h
index c0be540e83cb..2c6df5a92e1e 100644
--- a/arch/mips/include/asm/pgtable-32.h
+++ b/arch/mips/include/asm/pgtable-32.h
@@ -110,6 +110,7 @@ static inline void pmd_clear(pmd_t *pmdp)
 
 #if defined(CONFIG_XPA)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #define pte_pfn(x)		(((unsigned long)((x).pte_high >> _PFN_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT))
 static inline pte_t
 pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -125,6 +126,7 @@ pfn_pte(unsigned long pfn, pgprot_t prot)
 
 #elif defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #define pte_pfn(x)		((unsigned long)((x).pte_high >> 6))
 
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -139,6 +141,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
 
 #else
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #ifdef CONFIG_CPU_VR41XX
 #define pte_pfn(x)		((unsigned long)((x).pte >> (PAGE_SHIFT + 2)))
 #define pfn_pte(pfn, prot)	__pte(((pfn) << (PAGE_SHIFT + 2)) | pgprot_val(prot))
diff --git a/arch/powerpc/include/asm/pte-common.h b/arch/powerpc/include/asm/pte-common.h
index 4ba26dd259fd..0d81cd9dd60e 100644
--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -101,8 +101,10 @@ static inline bool pte_user(pte_t pte)
  */
 #if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
 #define PTE_RPN_MASK	(~((1ULL<<PTE_RPN_SHIFT)-1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #else
 #define PTE_RPN_MASK	(~((1UL<<PTE_RPN_SHIFT)-1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /* _PAGE_CHG_MASK masks of bits that are to be preserved across
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 0a4c2d4d9f8d..b43fa9d95a7a 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -847,6 +847,19 @@ static inline bool arch_has_pfn_modify_check(void)
 #define io_remap_pfn_range remap_pfn_range
 #endif
 
+#if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT)
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+/*
+ * ZSMALLOC needs to know the highest PFN on 32-bit architectures
+ * with physical address space extension, but falls back to
+ * BITS_PER_LONG otherwise.
+ */
+#error Missing MAX_POSSIBLE_PHYSMEM_BITS definition
+#else
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
+#endif
+#endif
+
 #ifndef has_transparent_hugepage
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define has_transparent_hugepage() 1
-- 
2.25.1


^ permalink raw reply related

* [PATCH stable 4.9 1/2] mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
From: Florian Fainelli @ 2021-11-03 20:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: open list:MIPS, Sergey Senozhatsky, Peter Zijlstra, Stefan Agner,
	linux-mm, Paul Mackerras, stable, H. Peter Anvin, Ingo Molnar,
	Sasha Levin, Florian Fainelli,
	open list:SYNOPSYS ARC ARCHITECTURE,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, Russell King,
	Mike Rapoport, Ingo Molnar, Borislav Petkov, Nitin Gupta,
	open list:GENERIC INCLUDE/ASM HEADER FILES, Arnd Bergmann,
	open list:LINUX FOR POWERPC 32-BIT AND 64-BIT, Thomas Gleixner,
	moderated list:ARM PORT, Thomas Bogendoerfer, Greg Kroah-Hartman,
	Ralf Baechle, Andy Lutomirski, Minchan Kim, Vineet Gupta,
	Linus Torvalds, Kirill A. Shutemov
In-Reply-To: <20211103205714.374801-1-f.fainelli@gmail.com>

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit 02390b87a9459937cdb299e6b34ff33992512ec7 upstream

With boot-time switching between paging mode we will have variable
MAX_PHYSMEM_BITS.

Let's use the maximum variable possible for CONFIG_X86_5LEVEL=y
configuration to define zsmalloc data structures.

The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover such case.
It also suits well to handle PAE special case.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[florian: drop arch/x86/include/asm/pgtable_64_types.h changes since
there is no CONFIG_X86_5LEVEL]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 arch/x86/include/asm/pgtable-3level_types.h |  1 +
 mm/zsmalloc.c                               | 13 +++++++------
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index bcc89625ebe5..f3f719d59e61 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -42,5 +42,6 @@ typedef union {
  */
 #define PTRS_PER_PTE	512
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	36
 
 #endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8db3c2b27a17..2b7bfd97587a 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -83,18 +83,19 @@
  * This is made more complicated by various memory models and PAE.
  */
 
-#ifndef MAX_PHYSMEM_BITS
-#ifdef CONFIG_HIGHMEM64G
-#define MAX_PHYSMEM_BITS 36
-#else /* !CONFIG_HIGHMEM64G */
+#ifndef MAX_POSSIBLE_PHYSMEM_BITS
+#ifdef MAX_PHYSMEM_BITS
+#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
+#else
 /*
  * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
  * be PAGE_SHIFT
  */
-#define MAX_PHYSMEM_BITS BITS_PER_LONG
+#define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
 #endif
 #endif
-#define _PFN_BITS		(MAX_PHYSMEM_BITS - PAGE_SHIFT)
+
+#define _PFN_BITS		(MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
 
 /*
  * Memory for allocating for handle keeps object position by
-- 
2.25.1


^ permalink raw reply related

* [PATCH stable 4.9 0/2] zsmalloc MAX_POSSIBLE_PHYSMEM_BITS
From: Florian Fainelli @ 2021-11-03 20:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: open list:MIPS, Sergey Senozhatsky, Stefan Agner,
	open list:ZSMALLOC COMPRESSED SLAB MEMORY ALLOCATOR,
	Paul Mackerras, Ralf Baechle, H. Peter Anvin, Sasha Levin,
	Florian Fainelli, maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT,
	Russell King, Mike Rapoport, Ingo Molnar,
	open list:SYNOPSYS ARC ARCHITECTURE, Nitin Gupta,
	open list:GENERIC INCLUDE/ASM HEADER FILES, Arnd Bergmann,
	Thomas Gleixner, moderated list:ARM PORT, Thomas Bogendoerfer,
	Greg Kroah-Hartman, stable, Minchan Kim, Vineet Gupta,
	open list:LINUX FOR POWERPC 32-BIT AND 64-BIT, Kirill A. Shutemov

This patch series is a back port necessary to address the problem
reported by Stefan Agner:

https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/

but which ended up being addressed by Arnd in a slightly different way
from Stefan's submission.

The first patch from Kirill is back ported in order to have
MAX_POSSIBLE_PHYSMEM_BITS be acted on my the zsmalloc.c code.

Arnd Bergmann (1):
  arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed

Kirill A. Shutemov (1):
  mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS

 arch/arc/include/asm/pgtable.h              |  2 ++
 arch/arm/include/asm/pgtable-2level.h       |  2 ++
 arch/arm/include/asm/pgtable-3level.h       |  2 ++
 arch/mips/include/asm/pgtable-32.h          |  3 +++
 arch/powerpc/include/asm/pte-common.h       |  2 ++
 arch/x86/include/asm/pgtable-3level_types.h |  1 +
 include/asm-generic/pgtable.h               | 13 +++++++++++++
 mm/zsmalloc.c                               | 13 +++++++------
 8 files changed, 32 insertions(+), 6 deletions(-)

-- 
2.25.1


^ permalink raw reply

* [PATCH stable 4.14 2/2] arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
From: Florian Fainelli @ 2021-11-03 20:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: open list:MIPS, Sergey Senozhatsky, Stefan Agner,
	open list:ZSMALLOC COMPRESSED SLAB MEMORY ALLOCATOR,
	Paul Mackerras, stable, H. Peter Anvin, Sasha Levin,
	Florian Fainelli, maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT,
	Russell King, Mike Rapoport, Ingo Molnar,
	open list:SYNOPSYS ARC ARCHITECTURE, Nitin Gupta,
	open list:GENERIC INCLUDE/ASM HEADER FILES, Arnd Bergmann,
	Thomas Gleixner, moderated list:ARM PORT, Thomas Bogendoerfer,
	Greg Kroah-Hartman, Ralf Baechle, Minchan Kim, Vineet Gupta,
	open list:LINUX FOR POWERPC 32-BIT AND 64-BIT, Kirill A. Shutemov
In-Reply-To: <20211103205704.374734-1-f.fainelli@gmail.com>

From: Arnd Bergmann <arnd@arndb.de>

[ Upstream commit cef397038167ac15d085914493d6c86385773709 ]

Stefan Agner reported a bug when using zsram on 32-bit Arm machines
with RAM above the 4GB address boundary:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = a27bd01c
  [00000000] *pgd=236a0003, *pmd=1ffa64003
  Internal error: Oops: 207 [#1] SMP ARM
  Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
  CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
  Hardware name: BCM2711
  PC is at zs_map_object+0x94/0x338
  LR is at zram_bvec_rw.constprop.0+0x330/0xa64
  pc : [<c0602b38>]    lr : [<c0bda6a0>]    psr: 60000013
  sp : e376bbe0  ip : 00000000  fp : c1e2921c
  r10: 00000002  r9 : c1dda730  r8 : 00000000
  r7 : e8ff7a00  r6 : 00000000  r5 : 02f9ffa0  r4 : e3710000
  r3 : 000fdffe  r2 : c1e0ce80  r1 : ebf979a0  r0 : 00000000
  Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5383d  Table: 235c2a80  DAC: fffffffd
  Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
  Stack: (0xe376bbe0 to 0xe376c000)

As it turns out, zsram needs to know the maximum memory size, which
is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.

The same problem will be hit on all 32-bit architectures that have a
physical address space larger than 4GB and happen to not enable sparsemem
and include asm/sparsemem.h from asm/pgtable.h.

After the initial discussion, I suggested just always defining
MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
set, or provoking a build error otherwise. This addresses all
configurations that can currently have this runtime bug, but
leaves all other configurations unchanged.

I looked up the possible number of bits in source code and
datasheets, here is what I found:

 - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
 - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
   support more than 32 bits, even though supersections in theory allow
   up to 40 bits as well.
 - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
   XPA supports up to 60 bits in theory, but 40 bits are more than
   anyone will ever ship
 - On PowerPC, there are three different implementations of 36 bit
   addressing, but 32-bit is used without CONFIG_PTE_64BIT
 - On RISC-V, the normal page table format can support 34 bit
   addressing. There is no highmem support on RISC-V, so anything
   above 2GB is unused, but it might be useful to eventually support
   CONFIG_ZRAM for high pages.

Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Tested-by: Stefan Agner <stefan@agner.ch>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[florian: patch arch/powerpc/include/asm/pte-common.h for 4.14.y
removed arch/riscv/include/asm/pgtable.h which does not exist]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 arch/arc/include/asm/pgtable.h        |  2 ++
 arch/arm/include/asm/pgtable-2level.h |  2 ++
 arch/arm/include/asm/pgtable-3level.h |  2 ++
 arch/mips/include/asm/pgtable-32.h    |  3 +++
 arch/powerpc/include/asm/pte-common.h |  2 ++
 include/asm-generic/pgtable.h         | 13 +++++++++++++
 6 files changed, 24 insertions(+)

diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index 77676e18da69..a31ae69da639 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -138,8 +138,10 @@
 
 #ifdef CONFIG_ARC_HAS_PAE40
 #define PTE_BITS_NON_RWX_IN_PD1	(0xff00000000 | PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #else
 #define PTE_BITS_NON_RWX_IN_PD1	(PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /**************************************************************************
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 92fd2c8a9af0..6154902bed83 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -78,6 +78,8 @@
 #define PTE_HWTABLE_OFF		(PTE_HWTABLE_PTRS * sizeof(pte_t))
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u32))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	32
+
 /*
  * PMD_SHIFT determines the size of the area a second-level page table can map
  * PGDIR_SHIFT determines what a third-level page table entry can map
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 2a029bceaf2f..35807e611b6e 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -37,6 +37,8 @@
 #define PTE_HWTABLE_OFF		(0)
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u64))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
+
 /*
  * PGDIR_SHIFT determines the size a top-level page table entry can map.
  */
diff --git a/arch/mips/include/asm/pgtable-32.h b/arch/mips/include/asm/pgtable-32.h
index 74afe8c76bdd..215fb48f644b 100644
--- a/arch/mips/include/asm/pgtable-32.h
+++ b/arch/mips/include/asm/pgtable-32.h
@@ -111,6 +111,7 @@ static inline void pmd_clear(pmd_t *pmdp)
 
 #if defined(CONFIG_XPA)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #define pte_pfn(x)		(((unsigned long)((x).pte_high >> _PFN_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT))
 static inline pte_t
 pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -126,6 +127,7 @@ pfn_pte(unsigned long pfn, pgprot_t prot)
 
 #elif defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #define pte_pfn(x)		((unsigned long)((x).pte_high >> 6))
 
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -140,6 +142,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
 
 #else
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #ifdef CONFIG_CPU_VR41XX
 #define pte_pfn(x)		((unsigned long)((x).pte >> (PAGE_SHIFT + 2)))
 #define pfn_pte(pfn, prot)	__pte(((pfn) << (PAGE_SHIFT + 2)) | pgprot_val(prot))
diff --git a/arch/powerpc/include/asm/pte-common.h b/arch/powerpc/include/asm/pte-common.h
index ce142ef99ba7..18ebe9a4728e 100644
--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -102,8 +102,10 @@ static inline bool pte_user(pte_t pte)
  */
 #if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
 #define PTE_RPN_MASK	(~((1ULL<<PTE_RPN_SHIFT)-1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #else
 #define PTE_RPN_MASK	(~((1UL<<PTE_RPN_SHIFT)-1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /* _PAGE_CHG_MASK masks of bits that are to be preserved across
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index cdee19314061..dd65e925f7cf 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1069,6 +1069,19 @@ static inline bool arch_has_pfn_modify_check(void)
 
 #endif /* !__ASSEMBLY__ */
 
+#if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT)
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+/*
+ * ZSMALLOC needs to know the highest PFN on 32-bit architectures
+ * with physical address space extension, but falls back to
+ * BITS_PER_LONG otherwise.
+ */
+#error Missing MAX_POSSIBLE_PHYSMEM_BITS definition
+#else
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
+#endif
+#endif
+
 #ifndef has_transparent_hugepage
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define has_transparent_hugepage() 1
-- 
2.25.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox