LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/irq: Drop forward declaration of struct irqaction
From: Christophe Leroy @ 2020-08-06 12:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

Since the commit identified below, the forward declaration of
struct irqaction is useless. Drop it.

Fixes: b709c0832824 ("ppc64: move stack switching up in interrupt processing")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/irq.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 814dfab7e392..4f983ca4030a 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -35,7 +35,6 @@ static __inline__ int irq_canonicalize(int irq)
 
 extern int distribute_irqs;
 
-struct irqaction;
 struct pt_regs;
 
 #define __ARCH_HAS_DO_SOFTIRQ
-- 
2.25.0


^ permalink raw reply related

* [PATCH] powerpc/hwirq: Remove stale forward irq_chip declaration
From: Christophe Leroy @ 2020-08-06 12:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

Since commit identified below, the forward declaration of
struct irq_chip is useless (was struct hw_interrupt_type at that time)

Remove it, together with the associated comment.

Fixes: c0ad90a32fb6 ("[PATCH] genirq: add ->retrigger() irq op to consolidate hw_irq_resend()")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/hw_irq.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 3a0db7b0b46e..538698facb80 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -372,12 +372,6 @@ static inline void may_hard_irq_enable(void) { }
 
 #define ARCH_IRQ_INIT_FLAGS	IRQ_NOREQUEST
 
-/*
- * interrupt-retrigger: should we handle this via lost interrupts and IPIs
- * or should we not care like we do now ? --BenH.
- */
-struct irq_chip;
-
 #endif  /* __ASSEMBLY__ */
 #endif	/* __KERNEL__ */
 #endif	/* _ASM_POWERPC_HW_IRQ_H */
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH v2] powerpc/uaccess: simplify the get_fs() set_fs() logic
From: Christophe Leroy @ 2020-08-06  8:29 UTC (permalink / raw)
  To: Michael Ellerman, Christophe Leroy, Benjamin Herrenschmidt,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <87mu3nyh3w.fsf@mpe.ellerman.id.au>



Le 25/07/2020 à 13:22, Michael Ellerman a écrit :
> Hi Christophe,
> 
> Unfortunately this would collide messily with "uaccess: remove
> segment_eq" in linux-next, so I'll ask you to do a respin based on that,
> some comments below.

Done, sent as v3, together with the 2 patchs from Linux next to get it 
build and boot.

> 
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>> On powerpc, we only have USER_DS and KERNEL_DS
>>
>> Today, this is managed as an 'unsigned long' data space limit
>> which is used to compare the passed address with, plus a bit
>> in the thread_info flags that is set whenever modifying the limit
>> to enable the verification in addr_limit_user_check()
>>
>> The limit is either the last address of user space when USER_DS is
>> set, and the last address of address space when KERNEL_DS is set.
>> In both cases, the limit is a compiletime constant.
>>
>> get_fs() returns the limit, which is part of thread_info struct
>> set_fs() updates the limit then set the TI_FSCHECK flag.
>> addr_limit_user_check() check the flag, and if it is set it checks
>> the limit is the user limit, then unsets the TI_FSCHECK flag.
>>
>> In addition, when the flag is set the syscall exit work is involved.
>> This exit work is heavy compared to normal syscall exit as it goes
>> through normal exception exit instead of the fast syscall exit.
>>
>> Rename this TI_FSCHECK flag to TIF_KERNEL_DS flag which tells whether
>> KERNEL_DS or USER_DS is set. Get mm_segment_t be redifined as a bool
>> struct that is either false (for USER_DS) or true (for KERNEL_DS).
>> When TIF_KERNEL_DS is set, the limit is ~0UL. Otherwise it is
>> TASK_SIZE_USER (resp TASK_SIZE_USER64 on PPC64). When KERNEL_DS is
>> set, there is no range to check. Define TI_FSCHECK as an alias to
>> TIF_KERNEL_DS.
> 
> I'd rather avoid the "DS" name any more than we have to. Maybe it means
> "data space" but that's not a very common term.

I thought it was a reference to the ds/fs/gs ... segment registers in 
the 8086 ?

> 
> The generic helper these days is called uaccess_kernel(), which returns
> true when uaccess routines are allowed to access the kernel.
> 
> So calling it TIF_UACCESS_KERNEL would work I think?

ok

> 
> The bool could be called uaccess_kernel.
> And END_OF_USER_DS could be USER_ADDR_MAX.

ok

> 
>> On exit, involve exit work when the bit is set, i.e. when KERNEL_DS
>> is set. addr_limit_user_check() will clear the bit and kill the
>> user process.
> 
> I guess this is safe. The check was added to make sure we never return
> to userspace with KERNEL_DS set, but using the actual TIF flag to
> determine the address limit should be equally safe, and avoid the
> overhead of the check in the good case.

That's the purpose indeed, yes.


christophe

^ permalink raw reply

* Re: [PATCH] powerpc/40x: Fix assembler warning about r0
From: Christophe Leroy @ 2020-08-06  8:26 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <87o8noy0sc.fsf@mpe.ellerman.id.au>



Le 06/08/2020 à 04:18, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 22/07/2020 à 04:24, Michael Ellerman a écrit :
>>> The assembler says:
>>>     arch/powerpc/kernel/head_40x.S:623: Warning: invalid register expression
>>
>> I get exactly the same with head_32.S, for the exact same reason.
> 
> Ah yep, I see it. I mostly build pmac32_defconfig which doesn't have
> BDI_SWITCH enabled.
> 
> Send a patch? :)

Done.

> 
> Do we still need the BDI_SWITCH code? Is it likely anyone still has one,
> that works?

I have three (One for 83xx and two for 8xx) and they work, allthough I'm 
using them only for Uboot and for very very very early Linux boot 
debugging (Last time I used it with Linux was when implementing KASAN)

Christophe

^ permalink raw reply

* [PATCH v3 1/3] syscalls: use uaccess_kernel in addr_limit_user_check
From: Christophe Leroy @ 2020-08-06  8:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

From: Christoph Hellwig <hch@lst.de>

Patch series "clean up address limit helpers", v2.

In preparation for eventually phasing out direct use of set_fs(), this
series removes the segment_eq() arch helper that is only used to implement
or duplicate the uaccess_kernel() API, and then adds descriptive helpers
to force the kernel address limit.

This patch (of 6):

Use the uaccess_kernel helper instead of duplicating it.

Link: http://lkml.kernel.org/r/20200714105505.935079-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200710135706.537715-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200710135706.537715-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 include/linux/syscalls.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b951a87da987..e933a43d4a69 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -263,7 +263,7 @@ static inline void addr_limit_user_check(void)
 		return;
 #endif
 
-	if (CHECK_DATA_CORRUPTION(!segment_eq(get_fs(), USER_DS),
+	if (CHECK_DATA_CORRUPTION(uaccess_kernel(),
 				  "Invalid address limit on user-mode return"))
 		force_sig(SIGKILL);
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH v3 2/3] uaccess: remove segment_eq
From: Christophe Leroy @ 2020-08-06  8:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <a6e62627d25fb7ae9b91d8bf553e707689e37498.1596702117.git.christophe.leroy@csgroup.eu>

From: Christoph Hellwig <hch@lst.de>

segment_eq is only used to implement uaccess_kernel.  Just open code
uaccess_kernel in the arch uaccess headers and remove one layer of
indirection.

Link: http://lkml.kernel.org/r/20200710135706.537715-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Greentime Hu <green.hu@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 arch/alpha/include/asm/uaccess.h      | 2 +-
 arch/arc/include/asm/segment.h        | 3 +--
 arch/arm/include/asm/uaccess.h        | 4 ++--
 arch/arm64/include/asm/uaccess.h      | 2 +-
 arch/csky/include/asm/segment.h       | 2 +-
 arch/h8300/include/asm/segment.h      | 2 +-
 arch/ia64/include/asm/uaccess.h       | 2 +-
 arch/m68k/include/asm/segment.h       | 2 +-
 arch/microblaze/include/asm/uaccess.h | 2 +-
 arch/mips/include/asm/uaccess.h       | 2 +-
 arch/nds32/include/asm/uaccess.h      | 2 +-
 arch/nios2/include/asm/uaccess.h      | 2 +-
 arch/openrisc/include/asm/uaccess.h   | 2 +-
 arch/parisc/include/asm/uaccess.h     | 2 +-
 arch/powerpc/include/asm/uaccess.h    | 3 +--
 arch/riscv/include/asm/uaccess.h      | 4 +---
 arch/s390/include/asm/uaccess.h       | 2 +-
 arch/sh/include/asm/segment.h         | 3 +--
 arch/sparc/include/asm/uaccess_32.h   | 2 +-
 arch/sparc/include/asm/uaccess_64.h   | 2 +-
 arch/x86/include/asm/uaccess.h        | 2 +-
 arch/xtensa/include/asm/uaccess.h     | 2 +-
 include/asm-generic/uaccess.h         | 4 ++--
 include/linux/uaccess.h               | 2 --
 24 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/arch/alpha/include/asm/uaccess.h b/arch/alpha/include/asm/uaccess.h
index 1fe2b56cb861..1b6f25efa247 100644
--- a/arch/alpha/include/asm/uaccess.h
+++ b/arch/alpha/include/asm/uaccess.h
@@ -20,7 +20,7 @@
 #define get_fs()  (current_thread_info()->addr_limit)
 #define set_fs(x) (current_thread_info()->addr_limit = (x))
 
-#define segment_eq(a, b)	((a).seg == (b).seg)
+#define uaccess_kernel()	(get_fs().seg == KERNEL_DS.seg)
 
 /*
  * Is a address valid? This does a straightforward calculation rather
diff --git a/arch/arc/include/asm/segment.h b/arch/arc/include/asm/segment.h
index 6a2a5be5026d..871f8ab11bfd 100644
--- a/arch/arc/include/asm/segment.h
+++ b/arch/arc/include/asm/segment.h
@@ -14,8 +14,7 @@ typedef unsigned long mm_segment_t;
 
 #define KERNEL_DS		MAKE_MM_SEG(0)
 #define USER_DS			MAKE_MM_SEG(TASK_SIZE)
-
-#define segment_eq(a, b)	((a) == (b))
+#define uaccess_kernel()	(get_fs() == KERNEL_DS)
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ASMARC_SEGMENT_H */
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 98c6b91be4a8..b19c9bec1f7a 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -76,7 +76,7 @@ static inline void set_fs(mm_segment_t fs)
 	modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER);
 }
 
-#define segment_eq(a, b)	((a) == (b))
+#define uaccess_kernel()	(get_fs() == KERNEL_DS)
 
 /* We use 33-bit arithmetic here... */
 #define __range_ok(addr, size) ({ \
@@ -263,7 +263,7 @@ extern int __put_user_8(void *, unsigned long long);
  */
 #define USER_DS			KERNEL_DS
 
-#define segment_eq(a, b)		(1)
+#define uaccess_kernel()	(true)
 #define __addr_ok(addr)		((void)(addr), 1)
 #define __range_ok(addr, size)	((void)(addr), 0)
 #define get_fs()		(KERNEL_DS)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index bc5c7b091152..fcb8174de505 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -49,7 +49,7 @@ static inline void set_fs(mm_segment_t fs)
 				CONFIG_ARM64_UAO));
 }
 
-#define segment_eq(a, b)	((a) == (b))
+#define uaccess_kernel()	(get_fs() == KERNEL_DS)
 
 /*
  * Test whether a block of memory is a valid user space address.
diff --git a/arch/csky/include/asm/segment.h b/arch/csky/include/asm/segment.h
index db2640d5f575..79ede9b1a646 100644
--- a/arch/csky/include/asm/segment.h
+++ b/arch/csky/include/asm/segment.h
@@ -13,6 +13,6 @@ typedef struct {
 #define USER_DS			((mm_segment_t) { 0x80000000UL })
 #define get_fs()		(current_thread_info()->addr_limit)
 #define set_fs(x)		(current_thread_info()->addr_limit = (x))
-#define segment_eq(a, b)	((a).seg == (b).seg)
+#define uaccess_kernel()	(get_fs().seg == KERNEL_DS.seg)
 
 #endif /* __ASM_CSKY_SEGMENT_H */
diff --git a/arch/h8300/include/asm/segment.h b/arch/h8300/include/asm/segment.h
index a407978f9f9f..37950725d9b9 100644
--- a/arch/h8300/include/asm/segment.h
+++ b/arch/h8300/include/asm/segment.h
@@ -33,7 +33,7 @@ static inline mm_segment_t get_fs(void)
 	return USER_DS;
 }
 
-#define segment_eq(a, b)	((a).seg == (b).seg)
+#define uaccess_kernel()	(get_fs().seg == KERNEL_DS.seg)
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 8aa473a4b0f4..179243c3dfc7 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -50,7 +50,7 @@
 #define get_fs()  (current_thread_info()->addr_limit)
 #define set_fs(x) (current_thread_info()->addr_limit = (x))
 
-#define segment_eq(a, b)	((a).seg == (b).seg)
+#define uaccess_kernel()	(get_fs().seg == KERNEL_DS.seg)
 
 /*
  * When accessing user memory, we need to make sure the entire area really is in
diff --git a/arch/m68k/include/asm/segment.h b/arch/m68k/include/asm/segment.h
index c6686559e9b7..2b5e68a71ef7 100644
--- a/arch/m68k/include/asm/segment.h
+++ b/arch/m68k/include/asm/segment.h
@@ -52,7 +52,7 @@ static inline void set_fs(mm_segment_t val)
 #define set_fs(x)	(current_thread_info()->addr_limit = (x))
 #endif
 
-#define segment_eq(a, b) ((a).seg == (b).seg)
+#define uaccess_kernel()	(get_fs().seg == KERNEL_DS.seg)
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index 6723c56ec378..304b04ffea2f 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -41,7 +41,7 @@
 # define get_fs()	(current_thread_info()->addr_limit)
 # define set_fs(val)	(current_thread_info()->addr_limit = (val))
 
-# define segment_eq(a, b)	((a).seg == (b).seg)
+# define uaccess_kernel()	(get_fs().seg == KERNEL_DS.seg)
 
 #ifndef CONFIG_MMU
 
diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index 62b298c50905..61fc01f177a6 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -72,7 +72,7 @@ extern u64 __ua_limit;
 #define get_fs()	(current_thread_info()->addr_limit)
 #define set_fs(x)	(current_thread_info()->addr_limit = (x))
 
-#define segment_eq(a, b)	((a).seg == (b).seg)
+#define uaccess_kernel()	(get_fs().seg == KERNEL_DS.seg)
 
 /*
  * eva_kernel_access() - determine whether kernel memory access on an EVA system
diff --git a/arch/nds32/include/asm/uaccess.h b/arch/nds32/include/asm/uaccess.h
index 3a9219f53ee0..010ba5f1d7dd 100644
--- a/arch/nds32/include/asm/uaccess.h
+++ b/arch/nds32/include/asm/uaccess.h
@@ -44,7 +44,7 @@ static inline void set_fs(mm_segment_t fs)
 	current_thread_info()->addr_limit = fs;
 }
 
-#define segment_eq(a, b)	((a) == (b))
+#define uaccess_kernel()	(get_fs() == KERNEL_DS)
 
 #define __range_ok(addr, size) (size <= get_fs() && addr <= (get_fs() -size))
 
diff --git a/arch/nios2/include/asm/uaccess.h b/arch/nios2/include/asm/uaccess.h
index e83f831a76f9..a741abbed6fb 100644
--- a/arch/nios2/include/asm/uaccess.h
+++ b/arch/nios2/include/asm/uaccess.h
@@ -30,7 +30,7 @@
 #define get_fs()		(current_thread_info()->addr_limit)
 #define set_fs(seg)		(current_thread_info()->addr_limit = (seg))
 
-#define segment_eq(a, b)	((a).seg == (b).seg)
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 
 #define __access_ok(addr, len)			\
 	(((signed long)(((long)get_fs().seg) &	\
diff --git a/arch/openrisc/include/asm/uaccess.h b/arch/openrisc/include/asm/uaccess.h
index 17c24f14615f..48b691530d3e 100644
--- a/arch/openrisc/include/asm/uaccess.h
+++ b/arch/openrisc/include/asm/uaccess.h
@@ -43,7 +43,7 @@
 #define get_fs()	(current_thread_info()->addr_limit)
 #define set_fs(x)	(current_thread_info()->addr_limit = (x))
 
-#define segment_eq(a, b)	((a) == (b))
+#define uaccess_kernel()	(get_fs() == KERNEL_DS)
 
 /* Ensure that the range from addr to addr+size is all within the process'
  * address space
diff --git a/arch/parisc/include/asm/uaccess.h b/arch/parisc/include/asm/uaccess.h
index ebbb9ffe038c..ed2cd4fb479b 100644
--- a/arch/parisc/include/asm/uaccess.h
+++ b/arch/parisc/include/asm/uaccess.h
@@ -14,7 +14,7 @@
 #define KERNEL_DS	((mm_segment_t){0})
 #define USER_DS 	((mm_segment_t){1})
 
-#define segment_eq(a, b) ((a).seg == (b).seg)
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 
 #define get_fs()	(current_thread_info()->addr_limit)
 #define set_fs(x)	(current_thread_info()->addr_limit = (x))
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 64c04ab09112..00699903f1ef 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -38,8 +38,7 @@ static inline void set_fs(mm_segment_t fs)
 	set_thread_flag(TIF_FSCHECK);
 }
 
-#define segment_eq(a, b)	((a).seg == (b).seg)
-
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 #define user_addr_max()	(get_fs().seg)
 
 #ifdef __powerpc64__
diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index 8ce9d607b53d..1bdf663e5932 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -62,11 +62,9 @@ static inline void set_fs(mm_segment_t fs)
 	current_thread_info()->addr_limit = fs;
 }
 
-#define segment_eq(a, b) ((a).seg == (b).seg)
-
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 #define user_addr_max()	(get_fs().seg)
 
-
 /**
  * access_ok: - Checks if a user space pointer is valid
  * @addr: User space pointer to start of block to check
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index 324438889fe1..f09444d6aeab 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -32,7 +32,7 @@
 #define USER_DS_SACF	(3)
 
 #define get_fs()        (current->thread.mm_segment)
-#define segment_eq(a,b) (((a) & 2) == ((b) & 2))
+#define uaccess_kernel() ((get_fs() & 2) == KERNEL_DS)
 
 void set_fs(mm_segment_t fs);
 
diff --git a/arch/sh/include/asm/segment.h b/arch/sh/include/asm/segment.h
index 33d1d28057cb..02e54a3335d6 100644
--- a/arch/sh/include/asm/segment.h
+++ b/arch/sh/include/asm/segment.h
@@ -24,8 +24,7 @@ typedef struct {
 #define USER_DS		KERNEL_DS
 #endif
 
-#define segment_eq(a, b) ((a).seg == (b).seg)
-
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 
 #define get_fs()	(current_thread_info()->addr_limit)
 #define set_fs(x)	(current_thread_info()->addr_limit = (x))
diff --git a/arch/sparc/include/asm/uaccess_32.h b/arch/sparc/include/asm/uaccess_32.h
index d6d8413eca83..0a2d3ebc4bb8 100644
--- a/arch/sparc/include/asm/uaccess_32.h
+++ b/arch/sparc/include/asm/uaccess_32.h
@@ -28,7 +28,7 @@
 #define get_fs()	(current->thread.current_ds)
 #define set_fs(val)	((current->thread.current_ds) = (val))
 
-#define segment_eq(a, b) ((a).seg == (b).seg)
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 
 /* We have there a nice not-mapped page at PAGE_OFFSET - PAGE_SIZE, so that this test
  * can be fairly lightweight.
diff --git a/arch/sparc/include/asm/uaccess_64.h b/arch/sparc/include/asm/uaccess_64.h
index bf9d330073b2..698cf69f74e9 100644
--- a/arch/sparc/include/asm/uaccess_64.h
+++ b/arch/sparc/include/asm/uaccess_64.h
@@ -32,7 +32,7 @@
 
 #define get_fs() ((mm_segment_t){(current_thread_info()->current_ds)})
 
-#define segment_eq(a, b)  ((a).seg == (b).seg)
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 
 #define set_fs(val)								\
 do {										\
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 18dfa07d3ef0..dd3261f9f4ea 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -33,7 +33,7 @@ static inline void set_fs(mm_segment_t fs)
 	set_thread_flag(TIF_FSCHECK);
 }
 
-#define segment_eq(a, b)	((a).seg == (b).seg)
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 #define user_addr_max() (current->thread.addr_limit.seg)
 
 /*
diff --git a/arch/xtensa/include/asm/uaccess.h b/arch/xtensa/include/asm/uaccess.h
index e57f0d0a88d8..b9758119feca 100644
--- a/arch/xtensa/include/asm/uaccess.h
+++ b/arch/xtensa/include/asm/uaccess.h
@@ -35,7 +35,7 @@
 #define get_fs()	(current->thread.current_ds)
 #define set_fs(val)	(current->thread.current_ds = (val))
 
-#define segment_eq(a, b)	((a).seg == (b).seg)
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 
 #define __kernel_ok (uaccess_kernel())
 #define __user_ok(addr, size) \
diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
index e935318804f8..ba68ee4dabfa 100644
--- a/include/asm-generic/uaccess.h
+++ b/include/asm-generic/uaccess.h
@@ -86,8 +86,8 @@ static inline void set_fs(mm_segment_t fs)
 }
 #endif
 
-#ifndef segment_eq
-#define segment_eq(a, b) ((a).seg == (b).seg)
+#ifndef uaccess_kernel
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
 #endif
 
 #define access_ok(addr, size) __access_ok((unsigned long)(addr),(size))
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 0a76ddc07d59..5c62d0c6f15b 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -6,8 +6,6 @@
 #include <linux/sched.h>
 #include <linux/thread_info.h>
 
-#define uaccess_kernel() segment_eq(get_fs(), KERNEL_DS)
-
 #include <asm/uaccess.h>
 
 /*
-- 
2.25.0


^ permalink raw reply related

* [PATCH v3 3/3] powerpc/uaccess: simplify the get_fs() set_fs() logic
From: Christophe Leroy @ 2020-08-06  8:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <a6e62627d25fb7ae9b91d8bf553e707689e37498.1596702117.git.christophe.leroy@csgroup.eu>

On powerpc, we only have USER_DS and KERNEL_DS

Today, this is managed as an 'unsigned long' data space limit
which is used to compare the passed address with, plus a bit
in the thread_info flags that is set whenever modifying the limit
to enable the verification in addr_limit_user_check()

The limit is either the last address of user space when USER_DS is
set, and the last address of address space when KERNEL_DS is set.
In both cases, the limit is a compiletime constant.

get_fs() returns the limit, which is part of thread_info struct
set_fs() updates the limit then set the TI_FSCHECK flag.
addr_limit_user_check() check the flag, and if it is set it checks
the limit is the user limit, then unsets the TI_FSCHECK flag.

In addition, when the flag is set the syscall exit work is involved.
This exit work is heavy compared to normal syscall exit as it goes
through normal exception exit instead of the fast syscall exit.

Rename this TI_FSCHECK flag to TIF_UACCESS_KERNEL flag which tells
whether KERNEL_DS or USER_DS is set. Get mm_segment_t be redifined as
a bool struct that is either false (for USER_DS) or true (for
KERNEL_DS). When TIF_UACCESS_KERNEL is set, the limit is ~0UL.
Otherwise it is TASK_SIZE_USER (resp TASK_SIZE_USER64 on PPC64). When
KERNEL_DS is set, there is no range to check. Define TI_FSCHECK as an
alias to TIF_UACCESS_KERNEL.

On exit, involve exit work when the bit is set, i.e. when KERNEL_DS
is set. addr_limit_user_check() will clear the bit and kill the
user process.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v3: Rebased and taken into account removal of segment_eq() and comments from mpe
---
 arch/powerpc/include/asm/processor.h   |  5 +---
 arch/powerpc/include/asm/thread_info.h |  9 ++++---
 arch/powerpc/include/asm/uaccess.h     | 35 +++++++++++++-------------
 arch/powerpc/lib/sstep.c               |  2 +-
 4 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..86a9c4395b99 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -84,7 +84,7 @@ void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp);
 void release_thread(struct task_struct *);
 
 typedef struct {
-	unsigned long seg;
+	bool uaccess_kernel;
 } mm_segment_t;
 
 #define TS_FPR(i) fp_state.fpr[i][TS_FPROFFSET]
@@ -148,7 +148,6 @@ struct thread_struct {
 	unsigned long	ksp_vsid;
 #endif
 	struct pt_regs	*regs;		/* Pointer to saved register state */
-	mm_segment_t	addr_limit;	/* for get_fs() validation */
 #ifdef CONFIG_BOOKE
 	/* BookE base exception scratch space; align on cacheline */
 	unsigned long	normsave[8] ____cacheline_aligned;
@@ -295,7 +294,6 @@ struct thread_struct {
 #define INIT_THREAD { \
 	.ksp = INIT_SP, \
 	.ksp_limit = INIT_SP_LIMIT, \
-	.addr_limit = KERNEL_DS, \
 	.pgdir = swapper_pg_dir, \
 	.fpexc_mode = MSR_FE0 | MSR_FE1, \
 	SPEFSCR_INIT \
@@ -303,7 +301,6 @@ struct thread_struct {
 #else
 #define INIT_THREAD  { \
 	.ksp = INIT_SP, \
-	.addr_limit = KERNEL_DS, \
 	.fpexc_mode = 0, \
 }
 #endif
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index ca6c97025704..123232a63ee7 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -69,7 +69,7 @@ struct thread_info {
 #define INIT_THREAD_INFO(tsk)			\
 {						\
 	.preempt_count = INIT_PREEMPT_COUNT,	\
-	.flags =	0,			\
+	.flags =	_TIF_UACCESS_KERNEL,		\
 }
 
 #define THREAD_SIZE_ORDER	(THREAD_SHIFT - PAGE_SHIFT)
@@ -90,7 +90,8 @@ void arch_setup_new_exec(void);
 #define TIF_SYSCALL_TRACE	0	/* syscall trace active */
 #define TIF_SIGPENDING		1	/* signal pending */
 #define TIF_NEED_RESCHED	2	/* rescheduling necessary */
-#define TIF_FSCHECK		3	/* Check FS is USER_DS on return */
+#define TIF_UACCESS_KERNEL	3	/* KERNEL_DS is set */
+#define TIF_FSCHECK	TIF_UACCESS_KERNEL
 #define TIF_SYSCALL_EMU		4	/* syscall emulation active */
 #define TIF_RESTORE_TM		5	/* need to restore TM FP/VEC/VSX */
 #define TIF_PATCH_PENDING	6	/* pending live patching update */
@@ -130,7 +131,7 @@ void arch_setup_new_exec(void);
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_EMULATE_STACK_STORE	(1<<TIF_EMULATE_STACK_STORE)
 #define _TIF_NOHZ		(1<<TIF_NOHZ)
-#define _TIF_FSCHECK		(1<<TIF_FSCHECK)
+#define _TIF_UACCESS_KERNEL	(1 << TIF_UACCESS_KERNEL)
 #define _TIF_SYSCALL_EMU	(1<<TIF_SYSCALL_EMU)
 #define _TIF_SYSCALL_DOTRACE	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
@@ -139,7 +140,7 @@ void arch_setup_new_exec(void);
 #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
 				 _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
 				 _TIF_RESTORE_TM | _TIF_PATCH_PENDING | \
-				 _TIF_FSCHECK)
+				 _TIF_UACCESS_KERNEL)
 #define _TIF_PERSYSCALL_MASK	(_TIF_RESTOREALL|_TIF_NOERROR)
 
 /* Bits in local_flags */
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 00699903f1ef..8567bec6f939 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -15,48 +15,49 @@
  *
  * For historical reasons, these macros are grossly misnamed.
  *
- * The fs/ds values are now the highest legal address in the "segment".
+ * The fs/ds values are now a bool which tells the "segment" is user or kernel.
  * This simplifies the checking in the routines below.
  */
 
 #define MAKE_MM_SEG(s)  ((mm_segment_t) { (s) })
 
-#define KERNEL_DS	MAKE_MM_SEG(~0UL)
-#ifdef __powerpc64__
-/* We use TASK_SIZE_USER64 as TASK_SIZE is not constant */
-#define USER_DS		MAKE_MM_SEG(TASK_SIZE_USER64 - 1)
-#else
-#define USER_DS		MAKE_MM_SEG(TASK_SIZE - 1)
-#endif
+#define KERNEL_DS	MAKE_MM_SEG(true)
+#define USER_DS		MAKE_MM_SEG(false)
 
-#define get_fs()	(current->thread.addr_limit)
+#define get_fs()	(MAKE_MM_SEG(test_thread_flag(TIF_UACCESS_KERNEL)))
 
 static inline void set_fs(mm_segment_t fs)
 {
-	current->thread.addr_limit = fs;
-	/* On user-mode return check addr_limit (fs) is correct */
-	set_thread_flag(TIF_FSCHECK);
+	update_thread_flag(TIF_UACCESS_KERNEL, fs.uaccess_kernel);
 }
 
-#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
-#define user_addr_max()	(get_fs().seg)
+#define uaccess_kernel() (get_fs().uaccess_kernel)
+#define user_addr_max()	(get_fs().uaccess_kernel ? ~0UL : USER_ADDR_MAX - 1)
 
 #ifdef __powerpc64__
+
+#define USER_ADDR_MAX		TASK_SIZE_USER64
+
 /*
  * This check is sufficient because there is a large enough
  * gap between user addresses and the kernel addresses
  */
 #define __access_ok(addr, size, segment)	\
-	(((addr) <= (segment).seg) && ((size) <= (segment).seg))
+	segment.uaccess_kernel ?	\
+	1 : (addr) < USER_ADDR_MAX && ((size) < USER_ADDR_MAX)
 
 #else
 
+#define USER_ADDR_MAX		TASK_SIZE
+
 static inline int __access_ok(unsigned long addr, unsigned long size,
 			mm_segment_t seg)
 {
-	if (addr > seg.seg)
+	if (seg.uaccess_kernel)
+		return 1;
+	if (addr >= USER_ADDR_MAX)
 		return 0;
-	return (size == 0 || size - 1 <= seg.seg - addr);
+	return addr + size <= USER_ADDR_MAX;
 }
 
 #endif
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index caee8cc77e19..e10b642566ba 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -112,7 +112,7 @@ static nokprobe_inline long address_ok(struct pt_regs *regs,
 		return 1;
 	if (__access_ok(ea, 1, USER_DS))
 		/* Access overlaps the end of the user region */
-		regs->dar = USER_DS.seg;
+		regs->dar = USER_ADDR_MAX;
 	else
 		regs->dar = ea;
 	return 0;
-- 
2.25.0


^ permalink raw reply related

* [PATCH] powerpc/32s: Fix assembler warning about r0
From: Christophe Leroy @ 2020-08-06  6:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

The assembler says:
  arch/powerpc/kernel/head_32.S:1095: Warning: invalid register expression

It's objecting to the use of r0 as the RA argument. That's because
when RA = 0 the literal value 0 is used, rather than the content of
r0, making the use of r0 in the source potentially confusing.

Fix it to use a literal 0, the generated code is identical.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_32.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index f3ab94d73936..5624db0e09a1 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -1092,7 +1092,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
 	 */
 	lis	r5, abatron_pteptrs@h
 	ori	r5, r5, abatron_pteptrs@l
-	stw	r5, 0xf0(r0)	/* This much match your Abatron config */
+	stw	r5, 0xf0(0)	/* This much match your Abatron config */
 	lis	r6, swapper_pg_dir@h
 	ori	r6, r6, swapper_pg_dir@l
 	tophys(r5, r5)
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH] powerpc/signal: Move and simplify get_clean_sp()
From: Christoph Hellwig @ 2020-08-06  9:25 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <04169f40c09682ce5747518268ca84285bc17fbc.1596703345.git.christophe.leroy@csgroup.eu>

On Thu, Aug 06, 2020 at 08:50:20AM +0000, Christophe Leroy wrote:
> get_clean_sp() is only used in kernel/signal.c . Move it there.
> 
> And GCC is smart enough to reduce the function when on PPC32, no
> need of a special PPC32 simple version.

What about just open coding it in the only caller, which would seem even
cleaner?

^ permalink raw reply

* [PATCH] powerpc/book3s64/radix: Make radix_mem_block_size 64bit
From: Aneesh Kumar K.V @ 2020-08-06  8:14 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V

Similar to commit: 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic")
make sure we update different variables tracking lmb_size are updated
to be 64 bit.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 2 +-
 arch/powerpc/include/asm/drmem.h         | 2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 55442d45c597..1a0c9d09950f 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -85,7 +85,7 @@ extern unsigned int mmu_base_pid;
 /*
  * memory block size used with radix translation.
  */
-extern unsigned int __ro_after_init radix_mem_block_size;
+extern unsigned long __ro_after_init radix_mem_block_size;
 
 #define PRTB_SIZE_SHIFT	(mmu_pid_bits + 4)
 #define PRTB_ENTRIES	(1ul << mmu_pid_bits)
diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 17ccc6474ab6..07c158c5f939 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -21,7 +21,7 @@ struct drmem_lmb {
 struct drmem_lmb_info {
 	struct drmem_lmb        *lmbs;
 	int                     n_lmbs;
-	u32                     lmb_size;
+	u64                     lmb_size;
 };
 
 extern struct drmem_lmb_info *drmem_info;
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 28c784976bed..ca76d9d6372a 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -34,7 +34,7 @@
 
 unsigned int mmu_pid_bits;
 unsigned int mmu_base_pid;
-unsigned int radix_mem_block_size __ro_after_init;
+unsigned long radix_mem_block_size __ro_after_init;
 
 static __ref void *early_alloc_pgtable(unsigned long size, int nid,
 			unsigned long region_start, unsigned long region_end)
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH v3 3/3] powerpc/uaccess: simplify the get_fs() set_fs() logic
From: Christoph Hellwig @ 2020-08-06  9:17 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <cf39cb8e42cffe323393b8cecdc59a7230298eab.1596702117.git.christophe.leroy@csgroup.eu>

Do you urgently need this?  My plan for 5.10 is to rebased and submit
the remaining bits of this branch:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/set_fs-removal

which will kill off set_fs/get_fs entirely.

^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-06  5:46 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <20200805184054.GQ6753@gate.crashing.org>

Hi,

On 08/05/2020 06:40 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Aug 05, 2020 at 04:40:16PM +0000, Christophe Leroy wrote:
>>> It cannot optimise it because it does not know shift < 32.  The code
>>> below is incorrect for shift equal to 32, fwiw.
>>
>> Is there a way to tell it ?
> 
> Sure, for example the &31 should work (but it doesn't, with the GCC
> version you used -- which version is that?)

GCC 10.1

> 
>>> What does the compiler do for just
>>>
>>> static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
>>> 	return ns >> (shift & 31);
>>> }
>>>
>>
>> Worse:
> 
> I cannot make heads or tails of all that branch spaghetti, sorry.
> 
>>   73c:	55 8c 06 fe 	clrlwi  r12,r12,27
>>   740:	7f c8 f0 14 	addc    r30,r8,r30
>>   744:	7c c6 4a 14 	add     r6,r6,r9
>>   748:	7c c6 e1 14 	adde    r6,r6,r28
>>   74c:	34 6c ff e0 	addic.  r3,r12,-32
>>   750:	41 80 00 70 	blt     7c0 <__c_kernel_clock_gettime+0x114>
> 
> This branch is always true.  Hrm.

As a standalone function:

With your suggestion:

000006ac <vdso_shift_ns>:
  6ac:	54 a5 06 fe 	clrlwi  r5,r5,27
  6b0:	35 25 ff e0 	addic.  r9,r5,-32
  6b4:	41 80 00 10 	blt     6c4 <vdso_shift_ns+0x18>
  6b8:	7c 64 4c 30 	srw     r4,r3,r9
  6bc:	38 60 00 00 	li      r3,0
  6c0:	4e 80 00 20 	blr
  6c4:	54 69 08 3c 	rlwinm  r9,r3,1,0,30
  6c8:	21 45 00 1f 	subfic  r10,r5,31
  6cc:	7c 84 2c 30 	srw     r4,r4,r5
  6d0:	7d 29 50 30 	slw     r9,r9,r10
  6d4:	7c 63 2c 30 	srw     r3,r3,r5
  6d8:	7d 24 23 78 	or      r4,r9,r4
  6dc:	4e 80 00 20 	blr


With the version as is in my series:

000006ac <vdso_shift_ns>:
  6ac:	21 25 00 20 	subfic  r9,r5,32
  6b0:	7c 69 48 30 	slw     r9,r3,r9
  6b4:	7c 84 2c 30 	srw     r4,r4,r5
  6b8:	7d 24 23 78 	or      r4,r9,r4
  6bc:	7c 63 2c 30 	srw     r3,r3,r5
  6c0:	4e 80 00 20 	blr


Christophe

^ permalink raw reply

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
From: Michael Ellerman @ 2020-08-06  5:32 UTC (permalink / raw)
  To: peterz, Srikar Dronamraju
  Cc: Gautham R Shenoy, Michael Neuling, Vincent Guittot, Rik van Riel,
	linuxppc-dev, LKML, Ingo Molnar, Thomas Gleixner, Mel Gorman,
	Valentin Schneider, Dietmar Eggemann
In-Reply-To: <20200804124755.GJ2674@hirez.programming.kicks-ass.net>

peterz@infradead.org writes:
> On Tue, Aug 04, 2020 at 05:40:07PM +0530, Srikar Dronamraju wrote:
>> * peterz@infradead.org <peterz@infradead.org> [2020-08-04 12:45:20]:
>> 
>> > On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
>> > > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
>> > > most architectures. One of the users of cpu_smt_mask(), would be to
>> > > identify idle-cores. On Power9, a pair of cores can be presented by the
>> > > firmware as a big-core for backward compatibility reasons.
>> > > 
>> > > In order to maintain userspace backward compatibility with previous
>> > > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
>> > > is option to the firmware to advertise a pair of SMT4 cores as a fused
>> > > cores (referred to as the big_core mode in the Linux Kernel). On Power9
>> > > this pair shares the L2 cache as well. However, from the scheduler's point
>> > > of view, a core should be determined by SMT4. The load-balancer already
>> > > does this. Hence allow PowerPc architecture to override the default
>> > > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
>> > 
>> > I'm utterly confused.
>> > 
>> > Why can't you set your regular siblings mask to the smt4 thing? Who
>> > cares about the compat stuff, I thought that was an LPAR/AIX thing.

To be clear this stuff is all for when we're running on the PowerVM machines,
ie. as LPARs.

That brings with it a bunch of problems, such as existing software that
has been developed/configured for Power8 and expects to see SMT8.

We also allow LPARs to be live migrated from Power8 to Power9 (and back), so
maintaining the illusion of SMT8 is considered a requirement to make that work.

>> There are no technical challenges to set the sibling mask to SMT4.
>> This is for Linux running on PowerVM. When these Power9 boxes are sold /
>> marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
>> world, everything is in SMT8 mode, the device tree properties still mark
>> the system to be running on 8 thread core. There are a number of utilities
>> like ppc64_cpu that directly read from device-tree. They would get core
>> count and thread count which is SMT8 based.
>> 
>> If the sibling_mask is set to small core, then same user when looking at
>
> FWIW, I find the small/big core naming utterly confusing vs the
> big/little naming from ARM. When you say small, I'm thinking of
> asymmetric cores, not SMT4/SMT8.

Yeah I agree the naming is confusing.

Let's call them "SMT4 cores" and "SMT8 cores"?

>> output from lscpu and other utilities that look at sysfs will start seeing
>> 2x number of cores to what he had provisioned and what the utilities from
>> the device-tree show. This can gets users confused.
>
> One will report SMT8 and the other SMT4, right? So only users that
> cannot read will be confused, but if you can't read, confusion is
> guaranteed anyway.

It's partly users, but also software that would see different values depending
on where it looks.

> Also, by exposing the true (SMT4) topology to userspace, userspace
> applications could behave better -- for those few that actually parse
> the topology information.

Agreed, though as you say there aren't that many that actually use the low-level
topology information.

>> So to keep the device-tree properties, utilities depending on device-tree,
>> sysfs and utilities depending on sysfs on the same page, userspace are only
>> exposed as SMT8.
>
> I'm not convinced it makes sense to lie to userspace just to accomodate
> a few users that cannot read.

The problem is we are already lying to userspace, because firmware lies to us.

ie. the firmware on these systems shows us an SMT8 core, and so current kernels
show SMT8 to userspace. I don't think we can realistically change that fact now,
as these systems are already out in the field.

What this patch tries to do is undo some of the mess, and at least give the
scheduler the right information.

cheers

^ permalink raw reply

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
From: peterz @ 2020-08-06  8:54 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Gautham R Shenoy, Michael Neuling, Vincent Guittot,
	Srikar Dronamraju, Rik van Riel, linuxppc-dev, LKML,
	Valentin Schneider, Thomas Gleixner, Mel Gorman, Ingo Molnar,
	Dietmar Eggemann
In-Reply-To: <87ft90z6dy.fsf@mpe.ellerman.id.au>

On Thu, Aug 06, 2020 at 03:32:25PM +1000, Michael Ellerman wrote:

> That brings with it a bunch of problems, such as existing software that
> has been developed/configured for Power8 and expects to see SMT8.
> 
> We also allow LPARs to be live migrated from Power8 to Power9 (and back), so
> maintaining the illusion of SMT8 is considered a requirement to make that work.

So how does that work if the kernel booted on P9 and demuxed the SMT8
into 2xSMT4? If you migrate that state onto a P8 with actual SMT8 you're
toast again.

> Yeah I agree the naming is confusing.
> 
> Let's call them "SMT4 cores" and "SMT8 cores"?

Works for me, thanks!

> The problem is we are already lying to userspace, because firmware lies to us.
> 
> ie. the firmware on these systems shows us an SMT8 core, and so current kernels
> show SMT8 to userspace. I don't think we can realistically change that fact now,
> as these systems are already out in the field.
> 
> What this patch tries to do is undo some of the mess, and at least give the
> scheduler the right information.

What a mess... I think it depends on what you do with that P9 to P8
migration case. Does it make sense to have a "p8_compat" boot arg for
the case where you want LPAR migration back onto P8 systems -- in which
case it simply takes the firmware's word as gospel and doesn't untangle
things, because it can actually land on a P8.

^ permalink raw reply

* [powerpc:next] BUILD SUCCESS a7aaa2f26bfd932a654706b19859e7adf802bee2
From: kernel test robot @ 2020-08-06  4:14 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  next
branch HEAD: a7aaa2f26bfd932a654706b19859e7adf802bee2  selftests/powerpc: Fix pkey syscall redefinitions

elapsed time: 843m

configs tested: 134
configs skipped: 14

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                                 defconfig
arm64                               defconfig
arm                              allyesconfig
arm                              allmodconfig
arm64                            allyesconfig
arm                       mainstone_defconfig
sh                   sh7770_generic_defconfig
mips                       rbtx49xx_defconfig
ia64                         bigsur_defconfig
arm                          simpad_defconfig
sh                          polaris_defconfig
mips                  mips_paravirt_defconfig
microblaze                      mmu_defconfig
arm                         axm55xx_defconfig
xtensa                    xip_kc705_defconfig
m68k                       m5475evb_defconfig
powerpc                       holly_defconfig
powerpc                     pq2fads_defconfig
c6x                                 defconfig
arm                           u8500_defconfig
powerpc                    amigaone_defconfig
sh                           se7712_defconfig
arc                           tb10x_defconfig
sh                           se7780_defconfig
mips                   sb1250_swarm_defconfig
mips                         tb0287_defconfig
arc                              allyesconfig
sh                        dreamcast_defconfig
m68k                       m5249evb_defconfig
alpha                               defconfig
arc                      axs103_smp_defconfig
mips                      fuloong2e_defconfig
powerpc                    mvme5100_defconfig
mips                        bcm47xx_defconfig
m68k                             alldefconfig
arm                     am200epdkit_defconfig
arm                         nhk8815_defconfig
powerpc                      pasemi_defconfig
arm                             rpc_defconfig
m68k                        mvme147_defconfig
m68k                          multi_defconfig
powerpc                  mpc885_ads_defconfig
sh                   secureedge5410_defconfig
ia64                          tiger_defconfig
arm                         mv78xx0_defconfig
powerpc                     mpc5200_defconfig
mips                         tb0219_defconfig
powerpc                      ppc6xx_defconfig
s390                          debug_defconfig
powerpc                     powernv_defconfig
arm                        spear6xx_defconfig
mips                         db1xxx_defconfig
sh                         apsh4a3a_defconfig
mips                            gpr_defconfig
arm                       imx_v4_v5_defconfig
arm                     eseries_pxa_defconfig
arm                            pleb_defconfig
arm                        keystone_defconfig
arm                          collie_defconfig
arm                         shannon_defconfig
arm                     davinci_all_defconfig
xtensa                generic_kc705_defconfig
arm                          moxart_defconfig
sh                           sh2007_defconfig
sh                           se7751_defconfig
m68k                       m5275evb_defconfig
arc                        vdk_hs38_defconfig
arm                      tct_hammer_defconfig
mips                     loongson1b_defconfig
powerpc                       ppc64_defconfig
arm                             ezx_defconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                            allyesconfig
nios2                               defconfig
nds32                             allnoconfig
c6x                              allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
parisc                              defconfig
s390                             allyesconfig
parisc                           allyesconfig
s390                                defconfig
sparc                            allyesconfig
sparc                               defconfig
i386                                defconfig
i386                             allyesconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                             defconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
i386                 randconfig-a005-20200805
i386                 randconfig-a004-20200805
i386                 randconfig-a001-20200805
i386                 randconfig-a003-20200805
i386                 randconfig-a002-20200805
i386                 randconfig-a006-20200805
x86_64               randconfig-a013-20200805
x86_64               randconfig-a011-20200805
x86_64               randconfig-a012-20200805
x86_64               randconfig-a016-20200805
x86_64               randconfig-a015-20200805
x86_64               randconfig-a014-20200805
i386                 randconfig-a011-20200805
i386                 randconfig-a012-20200805
i386                 randconfig-a013-20200805
i386                 randconfig-a014-20200805
i386                 randconfig-a015-20200805
i386                 randconfig-a016-20200805
i386                 randconfig-a011-20200806
i386                 randconfig-a012-20200806
i386                 randconfig-a015-20200806
i386                 randconfig-a016-20200806
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                            allmodconfig
x86_64                                   rhel
x86_64                           allyesconfig
x86_64                    rhel-7.6-kselftests
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [powerpc:merge] BUILD SUCCESS 3cd2184115b85cc8242fec3d42529cd112962984
From: kernel test robot @ 2020-08-06  4:14 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  merge
branch HEAD: 3cd2184115b85cc8242fec3d42529cd112962984  Automatic merge of 'master', 'next' and 'fixes' (2020-08-05 22:24)

elapsed time: 844m

configs tested: 91
configs skipped: 6

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                                 defconfig
arm64                            allyesconfig
arm64                               defconfig
arm                              allyesconfig
arm                              allmodconfig
sh                          polaris_defconfig
mips                  mips_paravirt_defconfig
microblaze                      mmu_defconfig
mips                 pnx8335_stb225_defconfig
arm                       aspeed_g5_defconfig
mips                      bmips_stb_defconfig
sh                   sh7770_generic_defconfig
m68k                       m5249evb_defconfig
alpha                               defconfig
sh                        dreamcast_defconfig
m68k                             alldefconfig
arm                     am200epdkit_defconfig
arm                         nhk8815_defconfig
powerpc                     mpc5200_defconfig
mips                         tb0219_defconfig
powerpc                      ppc6xx_defconfig
arm                       imx_v4_v5_defconfig
arm                     eseries_pxa_defconfig
arm                            pleb_defconfig
arm                        keystone_defconfig
arm                          collie_defconfig
arc                        vdk_hs38_defconfig
arm                      tct_hammer_defconfig
mips                     loongson1b_defconfig
powerpc                       ppc64_defconfig
arm                             ezx_defconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
parisc                              defconfig
s390                             allyesconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
sparc                               defconfig
i386                                defconfig
nios2                               defconfig
arc                              allyesconfig
nds32                             allnoconfig
c6x                              allyesconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
powerpc                             defconfig
i386                 randconfig-a005-20200805
i386                 randconfig-a004-20200805
i386                 randconfig-a001-20200805
i386                 randconfig-a003-20200805
i386                 randconfig-a002-20200805
i386                 randconfig-a006-20200805
x86_64               randconfig-a013-20200805
x86_64               randconfig-a011-20200805
x86_64               randconfig-a012-20200805
x86_64               randconfig-a016-20200805
x86_64               randconfig-a015-20200805
x86_64               randconfig-a014-20200805
i386                 randconfig-a011-20200805
i386                 randconfig-a012-20200805
i386                 randconfig-a013-20200805
i386                 randconfig-a014-20200805
i386                 randconfig-a015-20200805
i386                 randconfig-a016-20200805
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                            allmodconfig
x86_64                                   rhel
x86_64                           allyesconfig
x86_64                    rhel-7.6-kselftests
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [PATCH] ASoC: fsl-asoc-card: Get "extal" clock rate by clk_get_rate
From: Shengjiu Wang @ 2020-08-06  7:39 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, lgirdwood, broonie,
	perex, tiwai, alsa-devel, linuxppc-dev, linux-kernel

On some platform(.e.g. i.MX8QM MEK), the "extal" clock is different
with the mclk of codec, then the clock rate is also different.
So it is better to get clock rate of "extal" rate by clk_get_rate,
don't reuse the clock rate of mclk.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
 sound/soc/fsl/fsl-asoc-card.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 52adedc03245..2c92a5efad61 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -696,6 +696,13 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
 			goto asrc_fail;
 		}
 	} else if (of_node_name_eq(cpu_np, "esai")) {
+		struct clk *esai_clk = clk_get(&cpu_pdev->dev, "extal");
+
+		if (!IS_ERR(esai_clk)) {
+			priv->cpu_priv.sysclk_freq[TX] = clk_get_rate(esai_clk);
+			priv->cpu_priv.sysclk_freq[RX] = clk_get_rate(esai_clk);
+			clk_put(esai_clk);
+		}
 		priv->cpu_priv.sysclk_id[1] = ESAI_HCKT_EXTAL;
 		priv->cpu_priv.sysclk_id[0] = ESAI_HCKR_EXTAL;
 	} else if (of_node_name_eq(cpu_np, "sai")) {
-- 
2.27.0


^ permalink raw reply related

* [PATCH] powerpc/signal: Move and simplify get_clean_sp()
From: Christophe Leroy @ 2020-08-06  8:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

get_clean_sp() is only used in kernel/signal.c . Move it there.

And GCC is smart enough to reduce the function when on PPC32, no
need of a special PPC32 simple version.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/processor.h | 14 --------------
 arch/powerpc/kernel/signal.c         |  7 +++++++
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..5c20b6d509ae 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -404,20 +404,6 @@ static inline void prefetchw(const void *x)
 
 #define HAVE_ARCH_PICK_MMAP_LAYOUT
 
-#ifdef CONFIG_PPC64
-static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
-{
-	if (is_32)
-		return sp & 0x0ffffffffUL;
-	return sp;
-}
-#else
-static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
-{
-	return sp;
-}
-#endif
-
 /* asm stubs */
 extern unsigned long isa300_idle_stop_noloss(unsigned long psscr_val);
 extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val);
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index d15a98c758b8..bd0ba7c5e2cf 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -171,6 +171,13 @@ inline unsigned long copy_ckfpr_from_user(struct task_struct *task,
 
 int show_unhandled_signals = 1;
 
+static unsigned long get_clean_sp(unsigned long sp, int is_32)
+{
+	if (is_32)
+		return sp & 0x0ffffffffUL;
+	return sp;
+}
+
 /*
  * Allocate space for the signal frame
  */
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH v3 3/3] powerpc/uaccess: simplify the get_fs() set_fs() logic
From: Christophe Leroy @ 2020-08-06  9:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <20200806091758.GA653@infradead.org>



Le 06/08/2020 à 11:17, Christoph Hellwig a écrit :
> Do you urgently need this?  My plan for 5.10 is to rebased and submit
> the remaining bits of this branch:
> 
>      http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/set_fs-removal
> 
> which will kill off set_fs/get_fs entirely.
> 

No this isn't needed urgently at all I think.

It was sleeping in Patchwork since January, and I received comments from 
Michael a few days ago asking me to re-submit, see 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/dd2876b808ea38eb7b7f760ecd6ce06096c61fb5.1580295551.git.christophe.leroy@c-s.fr/

But if you are killing set_fs/get_fs entirely, that's even better I 
guess. Thanks for the hands up.

Christophe

^ permalink raw reply

* [PATCH][V2] macintosh: windfarm: remove detatch debug containing spelling mistakes
From: Colin King @ 2020-08-06 10:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Wolfram Sang, linuxppc-dev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

There are spelling mistakes in two debug messages. As recommended
by Wolfram Sang, these can be removed as there is plenty of debug
in the driver core.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---

V2: remove the debug rather than fixing the spelling

---
 drivers/macintosh/windfarm_lm75_sensor.c | 2 --
 drivers/macintosh/windfarm_lm87_sensor.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/drivers/macintosh/windfarm_lm75_sensor.c b/drivers/macintosh/windfarm_lm75_sensor.c
index 1e5fa09845e7..29f48c2028b6 100644
--- a/drivers/macintosh/windfarm_lm75_sensor.c
+++ b/drivers/macintosh/windfarm_lm75_sensor.c
@@ -152,8 +152,6 @@ static int wf_lm75_remove(struct i2c_client *client)
 {
 	struct wf_lm75_sensor *lm = i2c_get_clientdata(client);
 
-	DBG("wf_lm75: i2c detatch called for %s\n", lm->sens.name);
-
 	/* Mark client detached */
 	lm->i2c = NULL;
 
diff --git a/drivers/macintosh/windfarm_lm87_sensor.c b/drivers/macintosh/windfarm_lm87_sensor.c
index d011899c0a8a..9fab0b47cd3d 100644
--- a/drivers/macintosh/windfarm_lm87_sensor.c
+++ b/drivers/macintosh/windfarm_lm87_sensor.c
@@ -149,8 +149,6 @@ static int wf_lm87_remove(struct i2c_client *client)
 {
 	struct wf_lm87_sensor *lm = i2c_get_clientdata(client);
 
-	DBG("wf_lm87: i2c detatch called for %s\n", lm->sens.name);
-
 	/* Mark client detached */
 	lm->i2c = NULL;
 
-- 
2.27.0


^ permalink raw reply related

* Re: [RFC PATCH 1/2] powerpc/numa: Introduce logical numa id
From: Aneesh Kumar K.V @ 2020-08-06 10:44 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: Nathan Lynch, linuxppc-dev
In-Reply-To: <20200804072507.GI24375@linux.vnet.ibm.com>

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> * Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2020-08-02 19:51:41]:
>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>> > * Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2020-07-31 16:49:14]:
>> >
>> >
>> > If its just to eliminate node 0, then we have 2 other probably better
>> > solutions.
>> > 1. Dont mark node 0 as spl (currently still in mm-tree and a result in
>> > linux-next)
>> > 2. powerpc specific: explicitly clear node 0 during numa bringup.
>> >
>> 
>> 
>> I am not sure I consider them better. But yes, those patches are good
>> and also resolves the node 0 initialization when the firmware didn't
>> indicate the presence of such a node.
>> 
>> This patch in addition make sure that we get the same topolgy report
>> across reboot on a virtualized partitions as longs as the cpu/memory
>> ratio per powervm domains remain the same. This should also help to
>> avoid confusion after an LPM migration once we start applying topology
>> updates. 
>> 
>
> What do we mean by cpu/memory ratio. The topology across reboot would have
> changed only if PowerVM would have allocated resources differently by
> scrambling/unscrambling. We are no more processing topology updates at
> runtime. As far as I know, after LPM, the source topology is maintained.

A LPAR running with one numa node and 10GB of memory on PowerVM domain
10 will report node 10 and 10GB memory in the current scheme. After LPM
migration or a CEC shutdown/reboot if the domain from which the resource
allocated becomes 11, then the LPAR will report node 11 and 10GB memory.

Having a logical node number means in the both the above cases we report
node 0, 10GB memory. 


>
>> >> This can  be resolved by mapping the firmware provided group id to a logical Linux
>> >> NUMA id. In this patch, we do this only for pseries platforms considering the
>> >
>> > On PowerVM, as you would know the nid is already a logical or a flattened
>> > chip-id and not the actual hardware chip-id.
>> 
>> Yes. But then they are derived based on PowerVM resources AKA domains.
>> Now based on the available resource on a system, we could end up with
>> different node numbers with same toplogy across reboots. Making it
>> logical at OS level prevent that. 
>
> The above statement kind of gives an impression, that topology changes
> across every reboot.  We only end up with different node numbers if and only
> if the underlying topology has changed and that case is very rare. Or am I
> missing something?

IIUC it also depends on availability of resources within the
domain at the time of LPAR start.

>
>> 
>> >> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> >> index e437a9ac4956..6c659aada55b 100644
>> >> --- a/arch/powerpc/mm/numa.c
>> >> +++ b/arch/powerpc/mm/numa.c
>> >> @@ -221,25 +221,51 @@ static void initialize_distance_lookup_table(int nid,
>> >>  	}
>> >>  }
>> >> 
>> >> +static u32 nid_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] =  NUMA_NO_NODE};
>> >> +
>> >> +int firmware_group_id_to_nid(int firmware_gid)
>> >> +{
>> >> +	static int last_nid = 0;
>> >> +
>> >> +	/*
>> >> +	 * For PowerNV we don't change the node id. This helps to avoid
>> >> +	 * confusion w.r.t the expected node ids. On pseries, node numbers
>> >> +	 * are virtualized. Hence do logical node id for pseries.
>> >> +	 */
>> >> +	if (!firmware_has_feature(FW_FEATURE_LPAR))
>> >> +		return firmware_gid;
>> >> +
>> >> +	if (firmware_gid ==  -1)
>> >> +		return NUMA_NO_NODE;
>> >> +
>> >> +	if (nid_map[firmware_gid] == NUMA_NO_NODE)
>> >> +		nid_map[firmware_gid] = last_nid++;
>> >
>> > How do we ensure 2 simultaneous firmware_group_id_to_nid() calls dont end up
>> > at this place in parallel?
>> 
>> Do we have a code path where we do that? All the node id init should
>> happen early and there should not be two cpus doing node init at the
>> same time. I might be mistaken. Can you point to the code path where you
>> expect this to be called in parallel?
>> 
>
> associativity_to_nid gets called the first time a cpu is being made present
> from offline. So it need not be in boot path. We may to verify if cpu
> hotplug, dlpar, operations are synchronized. For example a memory hotadd and
> cpu hotplug are they synchronized? I am not sure if they are synchronized at
> this time.

But you don't online cpu or memory to a non existent node post boot
right?. If the node is existent we have already initialized the nid_map.

However i am not sure whether we do a parallel initialization of devices. ie,
of_device_add getting called in parallel. if it can then we need the
below?

@@ -226,6 +226,7 @@ static u32 nid_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] =  NUMA_NO_NODE};
 int firmware_group_id_to_nid(int firmware_gid)
 {
        static int last_nid = 0;
+       static DEFINE_SPINLOCK(node_id_lock);
 
        /*
         * For PowerNV we don't change the node id. This helps to avoid
@@ -238,8 +239,13 @@ int firmware_group_id_to_nid(int firmware_gid)
        if (firmware_gid ==  -1)
                return NUMA_NO_NODE;
 
-       if (nid_map[firmware_gid] == NUMA_NO_NODE)
-               nid_map[firmware_gid] = last_nid++;
+       if (nid_map[firmware_gid] == NUMA_NO_NODE) {
+               spin_lock(&node_id_lock);
+               /*  recheck with lock held */
+               if (nid_map[firmware_gid] == NUMA_NO_NODE)
+                       nid_map[firmware_gid] = last_nid++;
+               spin_unlock(&node_id_lock);
+       }
 
        return nid_map[firmware_gid];
 }



>
>> >
>> >> +
>> >> +	return nid_map[firmware_gid];
>> >> +}
>> >> +
>> >>  /* Returns nid in the range [0..MAX_NUMNODES-1], or -1 if no useful numa
>> >>   * info is found.
>> >>   */
>> >>  static int associativity_to_nid(const __be32 *associativity)
>> >>  {
>> >>  	int nid = NUMA_NO_NODE;
>> >> +	int firmware_gid = -1;
>> >> 
>> >>  	if (!numa_enabled)
>> >>  		goto out;
>> >> 
>> >>  	if (of_read_number(associativity, 1) >= min_common_depth)
>> >> -		nid = of_read_number(&associativity[min_common_depth], 1);
>> >> +		firmware_gid = of_read_number(&associativity[min_common_depth], 1);
>> >> 
>> >>  	/* POWER4 LPAR uses 0xffff as invalid node */
>> >> -	if (nid == 0xffff || nid >= MAX_NUMNODES)
>> >> -		nid = NUMA_NO_NODE;
>> >> +	if (firmware_gid == 0xffff || firmware_gid >= MAX_NUMNODES)
>> >> +		firmware_gid = -1;
>> >
>> > Lets assume two or more invocations of associativity_to_nid for the same
>> > associativity, end up with -1, In each case aren't giving different
>> > nids?
>> 
>> 
>> I didn't quiet get the comment here. But I assume you are indicating the
>> same one you mentioned above?
>> 
>
> No its not related to the above comment.
> We are incrementing the nid_map table for every unique firmware_gid or for
> every -1 (aka invalid associativities). If there are sufficiently large
> number of associativities that end up returning invalid associativities,
> then don't we quickly overflow the nid_map table? Not only about the
> overflow but a 8 node machine may soon look like a 80 node machine.

Not sure I follow. What does a large number of associativies imply? Are
you looking at ibm,associativity-lookup-arrays that got entries which
are invalid? Even there we are not parsing the full array, we lookup
only a specific firmware_gid (in case of lookup-arrays we use aa_index
value from drmem_lmb).

I will also add a las_nid > MAX_NUMNODES check in
firmware_group_id_to_nid() to handle the case where we find more numa
nodes than MAX_NUMANODES in device tree.

-aneesh

^ permalink raw reply

* Re: [PATCH v2 2/5] powerpc/lib: Initialize a temporary mm for code patching
From: Daniel Axtens @ 2020-08-06  3:24 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev; +Cc: kernel-hardening
In-Reply-To: <20200709040316.12789-3-cmr@informatik.wtf>

"Christopher M. Riedl" <cmr@informatik.wtf> writes:

> When code patching a STRICT_KERNEL_RWX kernel the page containing the
> address to be patched is temporarily mapped with permissive memory
> protections. Currently, a per-cpu vmalloc patch area is used for this
> purpose. While the patch area is per-cpu, the temporary page mapping is
> inserted into the kernel page tables for the duration of the patching.
> The mapping is exposed to CPUs other than the patching CPU - this is
> undesirable from a hardening perspective.
>
> Use the `poking_init` init hook to prepare a temporary mm and patching
> address. Initialize the temporary mm by copying the init mm. Choose a
> randomized patching address inside the temporary mm userspace address
> portion. The next patch uses the temporary mm and patching address for
> code patching.
>
> Based on x86 implementation:
>
> commit 4fc19708b165
> ("x86/alternatives: Initialize temporary mm for patching")
>
> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> ---
>  arch/powerpc/lib/code-patching.c | 33 ++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
>
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index 0a051dfeb177..8ae1a9e5fe6e 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -11,6 +11,8 @@
>  #include <linux/cpuhotplug.h>
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
> +#include <linux/sched/task.h>
> +#include <linux/random.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/page.h>
> @@ -44,6 +46,37 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
>  }
>  
>  #ifdef CONFIG_STRICT_KERNEL_RWX
> +
> +static struct mm_struct *patching_mm __ro_after_init;
> +static unsigned long patching_addr __ro_after_init;
> +
> +void __init poking_init(void)
> +{
> +	spinlock_t *ptl; /* for protecting pte table */
> +	pte_t *ptep;
> +
> +	/*
> +	 * Some parts of the kernel (static keys for example) depend on
> +	 * successful code patching. Code patching under STRICT_KERNEL_RWX
> +	 * requires this setup - otherwise we cannot patch at all. We use
> +	 * BUG_ON() here and later since an early failure is preferred to
> +	 * buggy behavior and/or strange crashes later.
> +	 */
> +	patching_mm = copy_init_mm();
> +	BUG_ON(!patching_mm);
> +
> +	/*
> +	 * In hash we cannot go above DEFAULT_MAP_WINDOW easily.
> +	 * XXX: Do we want additional bits of entropy for radix?
> +	 */
> +	patching_addr = (get_random_long() & PAGE_MASK) %
> +		(DEFAULT_MAP_WINDOW - PAGE_SIZE);

It took me a while to understand this calculation. I see that it's
calculating a base address for a page in which to do patching. It does
the following:

 - get a random long

 - mask with PAGE_MASK so as to get a page aligned value

 - make sure that the base address is at least one PAGE_SIZE below
   DEFAULT_MAP_WINDOW so we have a clear page between the base and
   DEFAULT_MAP_WINDOW.

On 64-bit Book3S with 64K pages, that works out to be

PAGE_SIZE = 0x0000 0000 0001 0000
PAGE_MASK = 0xFFFF FFFF FFFF 0000

DEFAULT_MAP_WINDOW = DEFAULT_MAP_WINDOW_USER64 = TASK_SIZE_128TB
                   = 0x0000_8000_0000_0000

DEFAULT_MAP_WINDOW - PAGE_SIZE = 0x0000 7FFF FFFF 0000

It took a while (and a conversation with my wife who studied pure
maths!) but I am convinced that the modulo preserves the page-alignement
of the patching address.

One thing I did realise is that patching_addr can be zero at the end of
this process. That seems dubious and slightly error-prone to me - is
the patching process robust to that or should we exclude it?

Anyway, if I have the maths right, that there are 0x7fffffff or ~2
billion possible locations for the patching page, which is just shy of
31 bits of entropy.

I think this compares pretty favourably to most (K)ASLR implementations?

What's the range if built with 4k pages?

Kind regards,
Daniel

> +
> +	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> +	BUG_ON(!ptep);
> +	pte_unmap_unlock(ptep, ptl);
> +}
> +
>  static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
>  
>  static int text_area_cpu_up(unsigned int cpu)
> -- 
> 2.27.0

^ permalink raw reply

* Re: [PATCH] powerpc/40x: Fix assembler warning about r0
From: Michael Ellerman @ 2020-08-06  2:18 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <bc70ddd8-d48d-d939-dce0-5aa8b8cf87d8@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 22/07/2020 à 04:24, Michael Ellerman a écrit :
>> The assembler says:
>>    arch/powerpc/kernel/head_40x.S:623: Warning: invalid register expression
>
> I get exactly the same with head_32.S, for the exact same reason.

Ah yep, I see it. I mostly build pmac32_defconfig which doesn't have
BDI_SWITCH enabled.

Send a patch? :)

Do we still need the BDI_SWITCH code? Is it likely anyone still has one,
that works?

cheers

^ permalink raw reply

* Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Michael Ellerman @ 2020-08-06  2:03 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christophe Leroy, nathanl, arnd, linux-kernel,
	Tulio Magno Quites Machado Filho, Paul Mackerras, luto,
	linux-arch, tglx, vincenzo.frascino, linuxppc-dev
In-Reply-To: <20200805133505.GN6753@gate.crashing.org>

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
>> > frame whenever it has anything to same.
>> 
>> Yeah OK that would explain it.
>> 
>> > Here is what I have in libc.so:
>> >
>> > 000fbb60 <__clock_gettime>:
>> >     fbb60:	94 21 ff e0 	stwu    r1,-32(r1)
>
> This is the *only* place where you can use a negative offset from r1:
> in the stwu to extend the stack (set up a new stack frame, or make the
> current one bigger).

(You're talking about 32-bit code here right?)

>> > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
>> > b/arch/powerpc/include/asm/vdso/gettimeofday.h
>> > index a0712a6e80d9..0b6fa245d54e 100644
>> > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
>> > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
>> > @@ -10,6 +10,7 @@
>> >     .cfi_startproc
>> >   	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
>> >   	mflr		r0
>> > +	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
>> >     .cfi_register lr, r0
>> 
>> The cfi_register should come directly after the mflr I think.
>
> That is the idiomatic way to write it, and most obviously correct.  But
> as long as the value in LR at function entry is available in multiple
> places (like, in LR and in R0 here), it is fine to use either for
> unwinding.  Sometimes you can use this to optimise the unwind tables a
> bit -- not really worth it in hand-written code, it's more important to
> make it legible ;-)

OK. Because LR still holds the LR value until it's clobbered later, by
which point the cfi_register has taken effect.

But yeah I think for readability it's best to keep the cfi_register next
to the mflr.

>> >> There's also no code to load/restore the TOC pointer on BE, which I
>> >> think we'll need to handle.
>> >
>> > I see no code in the generated vdso64.so doing anything with r2, but if 
>> > you think that's needed, just let's do it:
>> 
>> Hmm, true.
>> 
>> The compiler will use the toc for globals (and possibly also for large
>> constants?)
>
> And anything else it bloody well wants to, yeah :-)

Haha yeah OK.

>> AFAIK there's no way to disable use of the toc, or make it a build error
>> if it's needed.
>
> Yes.
>
>> At the same time it's much safer for us to just save/restore r2, and
>> probably in the noise performance wise.
>
> If you want a function to be able to work with ABI-compliant code safely
> (in all cases), you'll have to make it itself ABI-compliant as well,
> yes :-)

True. Except this is the VDSO which has previously been a bit wild west
as far as ABI goes :)

>> So yeah we should probably do as below.
>
> [ snip ]
>
> Looks good yes.

Thanks for reviewing.

cheers

^ permalink raw reply

* Re: How would I code a Write to a proc file from within the kernel without reading anything from user space?
From: Michael Ellerman @ 2020-08-06  1:30 UTC (permalink / raw)
  To: thefirst ECS, linuxppc-dev
In-Reply-To: <32112832.26959.1596664366002.JavaMail.Dan@DanHP>

thefirst ECS <ecs_dn@yahoo.com> writes:
> In order to help debug a certain discrepancy, I need to "simulate" an "echo 1 > /proc/file" but doing it from kernel even when root file system is unavailable. 
>
> I have simulated it just fine via call_usermodehelper (with argv etc of "echo 1 > /proc/file") from inside the kernel which triggers:
> [ee897ec0] [c0122704] proc_reg_write+0x80/0xb4
> [ee897ef0] [c00d3b7c] vfs_write+0xb4/0x184
>
> just as I had wanted. But now I need to trigger the "vfs_write" and "proc_reg_write" but without using call_usermodehelper since I will be doing it when root "/" is unavailable and so I can no longer access /bin/echo and call the usermodehelper etc. So my question is how can I do that in kernel?
>
> Not sure if I'm supposed to look in fs read_write.c and fs.h and write a method based on those or if there's some other way etc.

You don't say which proc file, which would be useful information.

I'll assume it's /proc/sysrq-trigger based on your previous mail.

Rather than trying to invoke the write path from inside the kernel, the
simplest option is to just call __handle_sysrq() directly.

cheers

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox