LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 05/10] powerpc: dts: akebono: Harmonize EHCI/OHCI DT nodes name
From: Serge Semin @ 2021-02-08 13:51 UTC (permalink / raw)
  To: Felipe Balbi, Bjorn Andersson, Krzysztof Kozlowski,
	Florian Fainelli, Rob Herring, Greg Kroah-Hartman,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: Serge Semin, devicetree, linuxppc-dev, linux-kernel, Serge Semin
In-Reply-To: <20210208135154.6645-1-Sergey.Semin@baikalelectronics.ru>

In accordance with the Generic EHCI/OHCI bindings the corresponding node
name is suppose to comply with the Generic USB HCD DT schema, which
requires the USB nodes to have the name acceptable by the regexp:
"^usb(@.*)?" . Make sure the "generic-ehci" and "generic-ohci"-compatible
nodes are correctly named.

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
---
 arch/powerpc/boot/dts/akebono.dts | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/dts/akebono.dts b/arch/powerpc/boot/dts/akebono.dts
index df18f8dc4642..343326c30380 100644
--- a/arch/powerpc/boot/dts/akebono.dts
+++ b/arch/powerpc/boot/dts/akebono.dts
@@ -126,7 +126,7 @@ SATA0: sata@30000010000 {
 			interrupts = <93 2>;
 		};
 
-		EHCI0: ehci@30010000000 {
+		EHCI0: usb@30010000000 {
 			compatible = "ibm,476gtr-ehci", "generic-ehci";
 			reg = <0x300 0x10000000 0x0 0x10000>;
 			interrupt-parent = <&MPIC>;
@@ -140,14 +140,14 @@ SD0: sd@30000000000 {
 			interrupt-parent = <&MPIC>;
 		};
 
-		OHCI0: ohci@30010010000 {
+		OHCI0: usb@30010010000 {
 			compatible = "ibm,476gtr-ohci", "generic-ohci";
 			reg = <0x300 0x10010000 0x0 0x10000>;
 			interrupt-parent = <&MPIC>;
 			interrupts = <89 1>;
 			};
 
-		OHCI1: ohci@30010020000 {
+		OHCI1: usb@30010020000 {
 			compatible = "ibm,476gtr-ohci", "generic-ohci";
 			reg = <0x300 0x10020000 0x0 0x10000>;
 			interrupt-parent = <&MPIC>;
-- 
2.29.2


^ permalink raw reply related

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
From: Thiago Jung Bauermann @ 2021-02-08 23:58 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: riel, iamjoonsoo.kim, Ram Pai, linux-kernel, mhocko, linux-mm,
	Satheesh Rajendran, guro, Konrad Rzeszutek Wilk, Andrew Morton,
	linuxppc-dev, kernel-team
In-Reply-To: <20210124073421.GG6332@kernel.org>


Mike Rapoport <rppt@kernel.org> writes:

> On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
>> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
>> 
>> > Mike Rapoport <rppt@kernel.org> writes:
>> > 
>> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
>> > > 
>> > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
>> > 
>> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
>> > patch. This happens on some ppc64le bare metal (powernv) server machines with
>> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
>> > to solve this issue in a different way:
>> > 
>> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
>> > 
>> > Since this patch solves that problem, is it possible to include it in the next
>> > feasible v5.11-rcX, with the following tag?
>> 
>> We could do this, if we're confident that this patch doesn't depend on
>> [1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...
>
> A think it does not depend on cma bottom-up allocation, it's rather the other
> way around: without this CMA bottom-up allocation could fail with KASLR
> enabled.

I noticed that this patch is now upstream as:

2dcb39645441 memblock: do not start bottom-up allocations with kernel_end

> Still, this patch may need updates to the way x86 does early reservations:
>
> https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org

... but the patches from this link still aren't. Isn't this a potential
problem for x86?

The patch series on the link above is now superseded by v2:

https://lore.kernel.org/linux-mm/20210128105711.10428-1-rppt@kernel.org/

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

* Re: [PATCH v5 00/22] powerpc/32: Implement C syscall entry/exit
From: Nicholas Piggin @ 2021-02-09  1:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> This series implements C syscall entry/exit for PPC32. It reuses
> the work already done for PPC64.
> 
> This series is based on today's merge-test (b6f72fc05389e3fc694bf5a5fa1bbd33f61879e0)
> 
> In terms on performance we have the following number of cycles on an
> 8xx running null_syscall benchmark:
> - mainline: 296 cycles
> - after patch 4: 283 cycles
> - after patch 16: 304 cycles
> - after patch 17: 348 cycles
> - at the end of the series: 320 cycles
> 
> So in summary, we have a degradation of performance of 8% on null_syscall.
> 
> I think it is not a big degradation, it is worth it.

I guess it's 13% from 283. But it's very nice to use the shared C code.

There might be a few more percent speedup in there we can find later.

Thanks,
Nick


^ permalink raw reply

* Re: [PATCH v5 05/22] powerpc/irq: Add helper to set regs->softe
From: Nicholas Piggin @ 2021-02-09  1:11 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <5f37d1177a751fdbca79df461d283850ca3a34a2.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> regs->softe doesn't exist on PPC32.
> 
> Add irq_soft_mask_regs_set_state() helper to set regs->softe.
> This helper will void on PPC32.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/include/asm/hw_irq.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 614957f74cee..ed0c3b049dfd 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -38,6 +38,8 @@
>  #define PACA_IRQ_MUST_HARD_MASK	(PACA_IRQ_EE)
>  #endif
>  
> +#endif /* CONFIG_PPC64 */
> +
>  /*
>   * flags for paca->irq_soft_mask
>   */
> @@ -46,8 +48,6 @@
>  #define IRQS_PMI_DISABLED	2
>  #define IRQS_ALL_DISABLED	(IRQS_DISABLED | IRQS_PMI_DISABLED)
>  
> -#endif /* CONFIG_PPC64 */
> -
>  #ifndef __ASSEMBLY__
>  
>  #ifdef CONFIG_PPC64
> @@ -287,6 +287,10 @@ extern void irq_set_pending_from_srr1(unsigned long srr1);
>  
>  extern void force_external_irq_replay(void);
>  
> +static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned long val)
> +{
> +	regs->softe = val;
> +}
>  #else /* CONFIG_PPC64 */
>  
>  static inline unsigned long arch_local_save_flags(void)
> @@ -355,6 +359,9 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
>  
>  static inline void may_hard_irq_enable(void) { }
>  
> +static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned long val)
> +{
> +}
>  #endif /* CONFIG_PPC64 */
>  
>  #define ARCH_IRQ_INIT_FLAGS	IRQ_NOREQUEST

What I don't like about this where you use it is it kind of pollutes
the ppc32 path with this function which is not valid to use.

I would prefer if you had this purely so it could compile with:

  if (IS_ENABLED(CONFIG_PPC64)))
      irq_soft_mask_regs_set_state(regs, blah);

And then you could make the ppc32 cause a link error if it did not
get eliminated at compile time (e.g., call an undefined function).

You could do the same with the kuap_ functions to change some ifdefs
to IS_ENABLED.

That's just my preference but if you prefer this way I guess that's
okay.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 06/22] powerpc/irq: Rework helpers that manipulate MSR[EE/RI]
From: Nicholas Piggin @ 2021-02-09  1:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <0e290372a0e7dc2ae657b4a01aec85f8de7fdf77.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> In preparation of porting PPC32 to C syscall entry/exit,
> rewrite the following helpers as static inline functions and
> add support for PPC32 in them:
> 	__hard_irq_enable()
> 	__hard_irq_disable()
> 	__hard_EE_RI_disable()
> 	__hard_RI_enable()
> 
> Then use them in PPC32 version of arch_local_irq_disable()
> and arch_local_irq_enable() to avoid code duplication.
> 

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/include/asm/hw_irq.h | 75 +++++++++++++++++++++----------
>  arch/powerpc/include/asm/reg.h    |  1 +
>  2 files changed, 52 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index ed0c3b049dfd..4739f61e632c 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -50,6 +50,55 @@
>  
>  #ifndef __ASSEMBLY__
>  
> +static inline void __hard_irq_enable(void)
> +{
> +	if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x))
> +		wrtee(MSR_EE);
> +	else if (IS_ENABLED(CONFIG_PPC_8xx))
> +		wrtspr(SPRN_EIE);
> +	else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
> +		__mtmsrd(MSR_EE | MSR_RI, 1);
> +	else
> +		mtmsr(mfmsr() | MSR_EE);
> +}
> +
> +static inline void __hard_irq_disable(void)
> +{
> +	if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x))
> +		wrtee(0);
> +	else if (IS_ENABLED(CONFIG_PPC_8xx))
> +		wrtspr(SPRN_EID);
> +	else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
> +		__mtmsrd(MSR_RI, 1);
> +	else
> +		mtmsr(mfmsr() & ~MSR_EE);
> +}
> +
> +static inline void __hard_EE_RI_disable(void)
> +{
> +	if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x))
> +		wrtee(0);
> +	else if (IS_ENABLED(CONFIG_PPC_8xx))
> +		wrtspr(SPRN_NRI);
> +	else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
> +		__mtmsrd(0, 1);
> +	else
> +		mtmsr(mfmsr() & ~(MSR_EE | MSR_RI));
> +}
> +
> +static inline void __hard_RI_enable(void)
> +{
> +	if (IS_ENABLED(CONFIG_BOOKE) || IS_ENABLED(CONFIG_40x))
> +		return;
> +
> +	if (IS_ENABLED(CONFIG_PPC_8xx))
> +		wrtspr(SPRN_EID);
> +	else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
> +		__mtmsrd(MSR_RI, 1);
> +	else
> +		mtmsr(mfmsr() | MSR_RI);
> +}
> +
>  #ifdef CONFIG_PPC64
>  #include <asm/paca.h>
>  
> @@ -212,18 +261,6 @@ static inline bool arch_irqs_disabled(void)
>  
>  #endif /* CONFIG_PPC_BOOK3S */
>  
> -#ifdef CONFIG_PPC_BOOK3E
> -#define __hard_irq_enable()	wrtee(MSR_EE)
> -#define __hard_irq_disable()	wrtee(0)
> -#define __hard_EE_RI_disable()	wrtee(0)
> -#define __hard_RI_enable()	do { } while (0)
> -#else
> -#define __hard_irq_enable()	__mtmsrd(MSR_EE|MSR_RI, 1)
> -#define __hard_irq_disable()	__mtmsrd(MSR_RI, 1)
> -#define __hard_EE_RI_disable()	__mtmsrd(0, 1)
> -#define __hard_RI_enable()	__mtmsrd(MSR_RI, 1)
> -#endif
> -
>  #define hard_irq_disable()	do {					\
>  	unsigned long flags;						\
>  	__hard_irq_disable();						\
> @@ -322,22 +359,12 @@ static inline unsigned long arch_local_irq_save(void)
>  
>  static inline void arch_local_irq_disable(void)
>  {
> -	if (IS_ENABLED(CONFIG_BOOKE))
> -		wrtee(0);
> -	else if (IS_ENABLED(CONFIG_PPC_8xx))
> -		wrtspr(SPRN_EID);
> -	else
> -		mtmsr(mfmsr() & ~MSR_EE);
> +	__hard_irq_disable();
>  }
>  
>  static inline void arch_local_irq_enable(void)
>  {
> -	if (IS_ENABLED(CONFIG_BOOKE))
> -		wrtee(MSR_EE);
> -	else if (IS_ENABLED(CONFIG_PPC_8xx))
> -		wrtspr(SPRN_EIE);
> -	else
> -		mtmsr(mfmsr() | MSR_EE);
> +	__hard_irq_enable();
>  }
>  
>  static inline bool arch_irqs_disabled_flags(unsigned long flags)
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index c5a3e856191c..bc4305ba00d0 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1375,6 +1375,7 @@
>  #define mtmsr(v)	asm volatile("mtmsr %0" : \
>  				     : "r" ((unsigned long)(v)) \
>  				     : "memory")
> +#define __mtmsrd(v, l)	BUILD_BUG()
>  #define __MTMSR		"mtmsr"
>  #endif
>  
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 07/22] powerpc/irq: Add stub irq_soft_mask_return() for PPC32
From: Nicholas Piggin @ 2021-02-09  1:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <9b9f62c5e2e63cc121fd749a923aaaee92ee0da4.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> To allow building syscall_64.c smoothly on PPC32, add stub version
> of irq_soft_mask_return().
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Same kind of comment as the other soft mask stuff. Again not a big deal 
but there might be a way to improve it. For example make a
debug_syscall_entry(regs) function that ppc64 could put the soft mask
checks into.

No big deal, if you don't make any changes now I might see about doing 
something like that after your series goes in.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> ---
>  arch/powerpc/include/asm/hw_irq.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 4739f61e632c..56a98936a6a9 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -330,6 +330,11 @@ static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned l
>  }
>  #else /* CONFIG_PPC64 */
>  
> +static inline notrace unsigned long irq_soft_mask_return(void)
> +{
> +	return 0;
> +}
> +
>  static inline unsigned long arch_local_save_flags(void)
>  {
>  	return mfmsr();
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 08/22] powerpc/syscall: Rename syscall_64.c into interrupt.c
From: Nicholas Piggin @ 2021-02-09  1:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cddc2deaa8f049d3ec419738e69804934919b935.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> syscall_64.c will be reused almost as is for PPC32.
> 
> As this file also contains functions to handle other types
> of interrupts rename it interrupt.c
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> ---
>  arch/powerpc/kernel/Makefile                      | 2 +-
>  arch/powerpc/kernel/{syscall_64.c => interrupt.c} | 0
>  2 files changed, 1 insertion(+), 1 deletion(-)
>  rename arch/powerpc/kernel/{syscall_64.c => interrupt.c} (100%)
> 
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index c173efd66c00..26ff8c6e06b7 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -60,7 +60,7 @@ obj-y				:= cputable.o syscalls.o \
>  				   hw_breakpoint_constraints.o
>  obj-y				+= ptrace/
>  obj-$(CONFIG_PPC64)		+= setup_64.o \
> -				   paca.o nvram_64.o note.o syscall_64.o
> +				   paca.o nvram_64.o note.o interrupt.o
>  obj-$(CONFIG_COMPAT)		+= sys_ppc32.o signal_32.o
>  obj-$(CONFIG_VDSO32)		+= vdso32_wrapper.o
>  obj-$(CONFIG_PPC_WATCHDOG)	+= watchdog.o
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/interrupt.c
> similarity index 100%
> rename from arch/powerpc/kernel/syscall_64.c
> rename to arch/powerpc/kernel/interrupt.c
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 09/22] powerpc/syscall: Make interrupt.c buildable on PPC32
From: Nicholas Piggin @ 2021-02-09  1:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <ba073ad67bd971a88ce331b65d6655523b54c794.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> To allow building interrupt.c on PPC32, ifdef out specific PPC64
> code or use helpers which are available on both PP32 and PPC64
> 
> Modify Makefile to always build interrupt.o
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> v5:
> - Also for interrupt exit preparation
> - Opted out kuap related code, ppc32 keeps it in ASM for the time being
> ---
>  arch/powerpc/kernel/Makefile    |  4 ++--
>  arch/powerpc/kernel/interrupt.c | 31 ++++++++++++++++++++++++-------
>  2 files changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 26ff8c6e06b7..163755b1cef4 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -57,10 +57,10 @@ obj-y				:= cputable.o syscalls.o \
>  				   prom.o traps.o setup-common.o \
>  				   udbg.o misc.o io.o misc_$(BITS).o \
>  				   of_platform.o prom_parse.o firmware.o \
> -				   hw_breakpoint_constraints.o
> +				   hw_breakpoint_constraints.o interrupt.o
>  obj-y				+= ptrace/
>  obj-$(CONFIG_PPC64)		+= setup_64.o \
> -				   paca.o nvram_64.o note.o interrupt.o
> +				   paca.o nvram_64.o note.o
>  obj-$(CONFIG_COMPAT)		+= sys_ppc32.o signal_32.o
>  obj-$(CONFIG_VDSO32)		+= vdso32_wrapper.o
>  obj-$(CONFIG_PPC_WATCHDOG)	+= watchdog.o
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index d6be4f9a67e5..2dac4d2bb1cf 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -39,7 +39,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  		BUG_ON(!(regs->msr & MSR_RI));
>  	BUG_ON(!(regs->msr & MSR_PR));
>  	BUG_ON(!FULL_REGS(regs));
> -	BUG_ON(regs->softe != IRQS_ENABLED);
> +	BUG_ON(arch_irq_disabled_regs(regs));
>  
>  #ifdef CONFIG_PPC_PKEY
>  	if (mmu_has_feature(MMU_FTR_PKEY)) {
> @@ -65,7 +65,9 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  			isync();
>  	} else
>  #endif
> +#ifdef CONFIG_PPC64
>  		kuap_check_amr();
> +#endif

Wouldn't mind trying to get rid of these ifdefs at some point, but 
there's some kuap / keys changes going on recently so I'm happy enough 
to let this settle then look at whether we can refactor.

>  
>  	account_cpu_user_entry();
>  
> @@ -77,7 +79,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  	 * frame, or if the unwinder was taught the first stack frame always
>  	 * returns to user with IRQS_ENABLED, this store could be avoided!
>  	 */
> -	regs->softe = IRQS_ENABLED;
> +	irq_soft_mask_regs_set_state(regs, IRQS_ENABLED);
>  
>  	local_irq_enable();
>  
> @@ -151,6 +153,7 @@ static notrace inline bool __prep_irq_for_enabled_exit(bool clear_ri)
>  		__hard_EE_RI_disable();
>  	else
>  		__hard_irq_disable();
> +#ifdef CONFIG_PPC64
>  	if (unlikely(lazy_irq_pending_nocheck())) {
>  		/* Took an interrupt, may have more exit work to do. */
>  		if (clear_ri)
> @@ -162,7 +165,7 @@ static notrace inline bool __prep_irq_for_enabled_exit(bool clear_ri)
>  	}
>  	local_paca->irq_happened = 0;
>  	irq_soft_mask_set(IRQS_ENABLED);
> -
> +#endif
>  	return true;
>  }
>  

Do we prefer space before return except in trivial cases?

> @@ -216,7 +219,9 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>  
>  	CT_WARN_ON(ct_state() == CONTEXT_USER);
>  
> +#ifdef CONFIG_PPC64
>  	kuap_check_amr();
> +#endif
>  
>  	regs->result = r3;
>  
> @@ -309,7 +314,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>  
>  	account_cpu_user_exit();
>  
> -#ifdef CONFIG_PPC_BOOK3S /* BOOK3E not yet using this */
> +#ifdef CONFIG_PPC_BOOK3S_64 /* BOOK3E and ppc32 not using this */
>  	/*
>  	 * We do this at the end so that we do context switch with KERNEL AMR
>  	 */
> @@ -318,7 +323,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>  	return ret;
>  }
>  
> -#ifdef CONFIG_PPC_BOOK3S /* BOOK3E not yet using this */
> +#ifndef CONFIG_PPC_BOOK3E_64 /* BOOK3E not yet using this */
>  notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr)
>  {
>  #ifdef CONFIG_PPC_BOOK3E

Why are you building this for 32? I don't mind if it's just to keep 
things similar and make it build for now, but you're not using it yet,
right?
 
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> @@ -333,14 +338,16 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
>  		BUG_ON(!(regs->msr & MSR_RI));
>  	BUG_ON(!(regs->msr & MSR_PR));
>  	BUG_ON(!FULL_REGS(regs));
> -	BUG_ON(regs->softe != IRQS_ENABLED);
> +	BUG_ON(arch_irq_disabled_regs(regs));
>  	CT_WARN_ON(ct_state() == CONTEXT_USER);
>  
>  	/*
>  	 * We don't need to restore AMR on the way back to userspace for KUAP.
>  	 * AMR can only have been unlocked if we interrupted the kernel.
>  	 */
> +#ifdef CONFIG_PPC64
>  	kuap_check_amr();
> +#endif
>  
>  	local_irq_save(flags);
>  
> @@ -407,7 +414,9 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
>  	/*
>  	 * We do this at the end so that we do context switch with KERNEL AMR
>  	 */
> +#ifdef CONFIG_PPC64
>  	kuap_user_restore(regs);
> +#endif
>  	return ret;
>  }
>  
> @@ -419,7 +428,9 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
>  	unsigned long *ti_flagsp = &current_thread_info()->flags;
>  	unsigned long flags;
>  	unsigned long ret = 0;
> +#ifdef CONFIG_PPC64
>  	unsigned long amr;
> +#endif
>  
>  	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
>  		unrecoverable_exception(regs);
> @@ -432,7 +443,9 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
>  	if (TRAP(regs) != 0x700)
>  		CT_WARN_ON(ct_state() == CONTEXT_USER);
>  
> +#ifdef CONFIG_PPC64
>  	amr = kuap_get_and_check_amr();
> +#endif
>  
>  	if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
>  		clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
> @@ -441,7 +454,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
>  
>  	local_irq_save(flags);
>  
> -	if (regs->softe == IRQS_ENABLED) {
> +	if (!arch_irq_disabled_regs(regs)) {
>  		/* Returning to a kernel context with local irqs enabled. */
>  		WARN_ON_ONCE(!(regs->msr & MSR_EE));
>  again:
> @@ -458,8 +471,10 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
>  	} else {
>  		/* Returning to a kernel context with local irqs disabled. */
>  		__hard_EE_RI_disable();
> +#ifdef CONFIG_PPC64
>  		if (regs->msr & MSR_EE)
>  			local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
> +#endif
>  	}
>  
>  
> @@ -472,7 +487,9 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
>  	 * which would cause Read-After-Write stalls. Hence, we take the AMR
>  	 * value from the check above.
>  	 */
> +#ifdef CONFIG_PPC64
>  	kuap_kernel_restore(regs, amr);
> +#endif
>  
>  	return ret;
>  }
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 10/22] powerpc/syscall: Use is_compat_task()
From: Nicholas Piggin @ 2021-02-09  1:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <c8094662199337a7200fea9f6e1d1f8b1b6d5f69.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> Instead of hard comparing task flags with _TIF_32BIT, use
> is_compat_task(). The advantage is that it returns 0 on PPC32
> allthough _TIF_32BIT is always set.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>


> ---
>  arch/powerpc/kernel/interrupt.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 2dac4d2bb1cf..46fd195ca659 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -2,6 +2,8 @@
>  
>  #include <linux/context_tracking.h>
>  #include <linux/err.h>
> +#include <linux/compat.h>
> +
>  #include <asm/asm-prototypes.h>
>  #include <asm/kup.h>
>  #include <asm/cputime.h>
> @@ -118,7 +120,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  	/* May be faster to do array_index_nospec? */
>  	barrier_nospec();
>  
> -	if (unlikely(is_32bit_task())) {
> +	if (unlikely(is_compat_task())) {
>  		f = (void *)compat_sys_call_table[r0];
>  
>  		r3 &= 0x00000000ffffffffULL;
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 11/22] powerpc/syscall: Save r3 in regs->orig_r3
From: Nicholas Piggin @ 2021-02-09  1:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <9a90805ab6b9101b46daf56470f457a57acd86fc.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> Save r3 in regs->orig_r3 in system_call_exception()
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> ---
> v5: Removed the assembly one on SCV type system call
> ---
>  arch/powerpc/kernel/entry_64.S  | 2 --
>  arch/powerpc/kernel/interrupt.c | 2 ++
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 33ddfeef4fe9..a91c2def165d 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -108,7 +108,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
>  	li	r11,\trapnr
>  	std	r11,_TRAP(r1)
>  	std	r12,_CCR(r1)
> -	std	r3,ORIG_GPR3(r1)
>  	addi	r10,r1,STACK_FRAME_OVERHEAD
>  	ld	r11,exception_marker@toc(r2)
>  	std	r11,-16(r10)		/* "regshere" marker */
> @@ -278,7 +277,6 @@ END_BTB_FLUSH_SECTION
>  	std	r10,_LINK(r1)
>  	std	r11,_TRAP(r1)
>  	std	r12,_CCR(r1)
> -	std	r3,ORIG_GPR3(r1)
>  	addi	r10,r1,STACK_FRAME_OVERHEAD
>  	ld	r11,exception_marker@toc(r2)
>  	std	r11,-16(r10)		/* "regshere" marker */
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 46fd195ca659..1a2dec49f811 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -29,6 +29,8 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  {
>  	syscall_fn f;
>  
> +	regs->orig_gpr3 = r3;
> +
>  	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
>  		BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
>  
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 12/22] powerpc/syscall: Change condition to check MSR_RI
From: Nicholas Piggin @ 2021-02-09  1:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <67820fada8dd6a8fe9d7b666f175d4cc9d8de87e.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> In system_call_exception(), MSR_RI also needs to be checked on 8xx.
> Only booke and 40x doesn't have MSR_RI.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

...
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> v5: Also in interrupt exit prepare
> ---
>  arch/powerpc/kernel/interrupt.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 1a2dec49f811..107ec39f05cb 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -39,7 +39,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  
>  	trace_hardirqs_off(); /* finish reconciling */
>  
> -	if (IS_ENABLED(CONFIG_PPC_BOOK3S))
> +	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
>  		BUG_ON(!(regs->msr & MSR_RI));
>  	BUG_ON(!(regs->msr & MSR_PR));
>  	BUG_ON(!FULL_REGS(regs));
> @@ -338,7 +338,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
>  	unsigned long flags;
>  	unsigned long ret = 0;
>  
> -	if (IS_ENABLED(CONFIG_PPC_BOOK3S))
> +	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
>  		BUG_ON(!(regs->msr & MSR_RI));
>  	BUG_ON(!(regs->msr & MSR_PR));
>  	BUG_ON(!FULL_REGS(regs));
> @@ -436,7 +436,8 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
>  	unsigned long amr;
>  #endif
>  
> -	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
> +	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x) &&
> +	    unlikely(!(regs->msr & MSR_RI)))
>  		unrecoverable_exception(regs);
>  	BUG_ON(regs->msr & MSR_PR);
>  	BUG_ON(!FULL_REGS(regs));
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()
From: Nicholas Piggin @ 2021-02-09  1:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <981edfd50d4c980634b74c4bb76b765c499a87ec.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> When r3 is not modified, reload it from regs->orig_r3 to free
> volatile registers. This avoids a stack frame for the likely part
> of system_call_exception()

This doesn't on my 64s build, but it does reduce one non volatile
register save/restore. With quite a bit more register pressure
reduction 64s can avoid the stack frame as well.

It's a cool trick but quite code and compiler specific so I don't know 
how worthwhile it is to keep considering we're calling out into random
kernel C code after this.

Maybe just keep it PPC32 specific for the moment, will have to do more
tuning for 64 and we have other stuff to do there first.

If you are happy to make it 32-bit only then

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> 
> Before the patch:
> 
> c000b4d4 <system_call_exception>:
> c000b4d4:	7c 08 02 a6 	mflr    r0
> c000b4d8:	94 21 ff e0 	stwu    r1,-32(r1)
> c000b4dc:	93 e1 00 1c 	stw     r31,28(r1)
> c000b4e0:	90 01 00 24 	stw     r0,36(r1)
> c000b4e4:	90 6a 00 88 	stw     r3,136(r10)
> c000b4e8:	81 6a 00 84 	lwz     r11,132(r10)
> c000b4ec:	69 6b 00 02 	xori    r11,r11,2
> c000b4f0:	55 6b ff fe 	rlwinm  r11,r11,31,31,31
> c000b4f4:	0f 0b 00 00 	twnei   r11,0
> c000b4f8:	81 6a 00 a0 	lwz     r11,160(r10)
> c000b4fc:	55 6b 07 fe 	clrlwi  r11,r11,31
> c000b500:	0f 0b 00 00 	twnei   r11,0
> c000b504:	7c 0c 42 e6 	mftb    r0
> c000b508:	83 e2 00 08 	lwz     r31,8(r2)
> c000b50c:	81 82 00 28 	lwz     r12,40(r2)
> c000b510:	90 02 00 24 	stw     r0,36(r2)
> c000b514:	7d 8c f8 50 	subf    r12,r12,r31
> c000b518:	7c 0c 02 14 	add     r0,r12,r0
> c000b51c:	90 02 00 08 	stw     r0,8(r2)
> c000b520:	7c 10 13 a6 	mtspr   80,r0
> c000b524:	81 62 00 70 	lwz     r11,112(r2)
> c000b528:	71 60 86 91 	andi.   r0,r11,34449
> c000b52c:	40 82 00 34 	bne     c000b560 <system_call_exception+0x8c>
> c000b530:	2b 89 01 b6 	cmplwi  cr7,r9,438
> c000b534:	41 9d 00 64 	bgt     cr7,c000b598 <system_call_exception+0xc4>
> c000b538:	3d 40 c0 5c 	lis     r10,-16292
> c000b53c:	55 29 10 3a 	rlwinm  r9,r9,2,0,29
> c000b540:	39 4a 41 e8 	addi    r10,r10,16872
> c000b544:	80 01 00 24 	lwz     r0,36(r1)
> c000b548:	7d 2a 48 2e 	lwzx    r9,r10,r9
> c000b54c:	7c 08 03 a6 	mtlr    r0
> c000b550:	7d 29 03 a6 	mtctr   r9
> c000b554:	83 e1 00 1c 	lwz     r31,28(r1)
> c000b558:	38 21 00 20 	addi    r1,r1,32
> c000b55c:	4e 80 04 20 	bctr
> 
> After the patch:
> 
> c000b4d4 <system_call_exception>:
> c000b4d4:	81 6a 00 84 	lwz     r11,132(r10)
> c000b4d8:	90 6a 00 88 	stw     r3,136(r10)
> c000b4dc:	69 6b 00 02 	xori    r11,r11,2
> c000b4e0:	55 6b ff fe 	rlwinm  r11,r11,31,31,31
> c000b4e4:	0f 0b 00 00 	twnei   r11,0
> c000b4e8:	80 6a 00 a0 	lwz     r3,160(r10)
> c000b4ec:	54 63 07 fe 	clrlwi  r3,r3,31
> c000b4f0:	0f 03 00 00 	twnei   r3,0
> c000b4f4:	7d 6c 42 e6 	mftb    r11
> c000b4f8:	81 82 00 08 	lwz     r12,8(r2)
> c000b4fc:	80 02 00 28 	lwz     r0,40(r2)
> c000b500:	91 62 00 24 	stw     r11,36(r2)
> c000b504:	7c 00 60 50 	subf    r0,r0,r12
> c000b508:	7d 60 5a 14 	add     r11,r0,r11
> c000b50c:	91 62 00 08 	stw     r11,8(r2)
> c000b510:	7c 10 13 a6 	mtspr   80,r0
> c000b514:	80 62 00 70 	lwz     r3,112(r2)
> c000b518:	70 6b 86 91 	andi.   r11,r3,34449
> c000b51c:	40 82 00 28 	bne     c000b544 <system_call_exception+0x70>
> c000b520:	2b 89 01 b6 	cmplwi  cr7,r9,438
> c000b524:	41 9d 00 84 	bgt     cr7,c000b5a8 <system_call_exception+0xd4>
> c000b528:	80 6a 00 88 	lwz     r3,136(r10)
> c000b52c:	3d 40 c0 5c 	lis     r10,-16292
> c000b530:	55 29 10 3a 	rlwinm  r9,r9,2,0,29
> c000b534:	39 4a 41 e4 	addi    r10,r10,16868
> c000b538:	7d 2a 48 2e 	lwzx    r9,r10,r9
> c000b53c:	7d 29 03 a6 	mtctr   r9
> c000b540:	4e 80 04 20 	bctr
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/kernel/interrupt.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 107ec39f05cb..205902052112 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -117,6 +117,9 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  			return regs->gpr[3];
>  		}
>  		return -ENOSYS;
> +	} else {
> +		/* Restore r3 from orig_gpr3 to free up a volatile reg */
> +		r3 = regs->orig_gpr3;
>  	}
>  
>  	/* May be faster to do array_index_nospec? */
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 17/22] powerpc/syscall: Do not check unsupported scv vector on PPC32
From: Nicholas Piggin @ 2021-02-09  2:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <fc3afe1870f943b2010805fcb045b718a638b3c6.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> Only PPC64 has scv. No need to check the 0x7ff0 trap on PPC32.
> For that, add a helper trap_is_unsupported_scv() similar to
> trap_is_scv().
> 
> And ignore the scv parameter in syscall_exit_prepare (Save 14 cycles
> 346 => 332 cycles)
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> v5: Added a helper trap_is_unsupported_scv()
> ---
>  arch/powerpc/include/asm/ptrace.h | 5 +++++
>  arch/powerpc/kernel/entry_32.S    | 1 -
>  arch/powerpc/kernel/interrupt.c   | 7 +++++--
>  3 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
> index 58f9dc060a7b..2c842b11a924 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -229,6 +229,11 @@ static inline bool trap_is_scv(struct pt_regs *regs)
>  	return (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && TRAP(regs) == 0x3000);
>  }
>  
> +static inline bool trap_is_unsupported_scv(struct pt_regs *regs)
> +{
> +	return (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && TRAP(regs) == 0x7ff0);
> +}

This change is good.

> +
>  static inline bool trap_is_syscall(struct pt_regs *regs)
>  {
>  	return (trap_is_scv(regs) || TRAP(regs) == 0xc00);
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index cffe58e63356..7c824e8928d0 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -344,7 +344,6 @@ transfer_to_syscall:
>  
>  ret_from_syscall:
>  	addi    r4,r1,STACK_FRAME_OVERHEAD
> -	li	r5,0
>  	bl	syscall_exit_prepare

For this one, I think it would be nice to do the "right" thing and make 
the function prototypes different on !64S. They could then declare a
local const bool scv = 0.

We could have syscall_exit_prepare and syscall_exit_prepare_maybe_scv
or something like that, 64s can use the latter one and the former can be
a wrapper that passes constant 0 for scv. Then we don't have different
prototypes for the same function, but you just have to make the 32-bit
version static inline and the 64-bit version exported to asm.

Thanks,
Nick

>  #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
>  	/* If the process has its own DBCR0 value, load it up.  The internal
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 205902052112..8fafca727b8b 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -88,7 +88,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  	local_irq_enable();
>  
>  	if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
> -		if (unlikely(regs->trap == 0x7ff0)) {
> +		if (unlikely(trap_is_unsupported_scv(regs))) {
>  			/* Unsupported scv vector */
>  			_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
>  			return regs->gpr[3];
> @@ -111,7 +111,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  		r8 = regs->gpr[8];
>  
>  	} else if (unlikely(r0 >= NR_syscalls)) {
> -		if (unlikely(regs->trap == 0x7ff0)) {
> +		if (unlikely(trap_is_unsupported_scv(regs))) {
>  			/* Unsupported scv vector */
>  			_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
>  			return regs->gpr[3];
> @@ -224,6 +224,9 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>  	unsigned long ti_flags;
>  	unsigned long ret = 0;
>  
> +	if (IS_ENABLED(CONFIG_PPC32))
> +		scv = 0;
> +
>  	CT_WARN_ON(ct_state() == CONTEXT_USER);
>  
>  #ifdef CONFIG_PPC64
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 18/22] powerpc/syscall: Remove FULL_REGS verification in system_call_exception
From: Nicholas Piggin @ 2021-02-09  2:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <6bef4d9ba0cba50160d13e344ee4627ebdf801dc.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> For book3s/64, FULL_REGS() is 'true' at all time, so the test voids.
> For others, non volatile registers are saved inconditionally.
> 
> So the verification is pointless.
> 
> Should one fail to do it, it would anyway be caught by the
> CHECK_FULL_REGS() in copy_thread() as we have removed the
> special versions ppc_fork() and friends.
> 
> null_syscall benchmark reduction 4 cycles (332 => 328 cycles)

I wonder if we rather make a CONFIG option for a bunch of these simpler
debug checks here (and also in interrupt exit, wrappers, etc) rather
than remove them entirely.

Thanks,
Nick

> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/kernel/interrupt.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 8fafca727b8b..55e1aa18cdb9 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -42,7 +42,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
>  		BUG_ON(!(regs->msr & MSR_RI));
>  	BUG_ON(!(regs->msr & MSR_PR));
> -	BUG_ON(!FULL_REGS(regs));
>  	BUG_ON(arch_irq_disabled_regs(regs));
>  
>  #ifdef CONFIG_PPC_PKEY
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 19/22] powerpc/syscall: Optimise checks in beginning of system_call_exception()
From: Nicholas Piggin @ 2021-02-09  2:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <3e48bb439357c6f72ae4343bf93bd29f0980eeb1.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> Combine all tests of regs->msr into a single logical one.

Okay by me unless we choose to do the config option and put these all 
under it. I think I would prefer that because sometimes the registers
are in a state you can't easily see what the values in the expression
were. In this case it doesn't matter so much because they should be in
regs in the interrupt frame.

Thanks,
Nick

> 
> Before the patch:
> 
>    0:	81 6a 00 84 	lwz     r11,132(r10)
>    4:	90 6a 00 88 	stw     r3,136(r10)
>    8:	69 60 00 02 	xori    r0,r11,2
>    c:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
>   10:	0f 00 00 00 	twnei   r0,0
>   14:	69 63 40 00 	xori    r3,r11,16384
>   18:	54 63 97 fe 	rlwinm  r3,r3,18,31,31
>   1c:	0f 03 00 00 	twnei   r3,0
>   20:	69 6b 80 00 	xori    r11,r11,32768
>   24:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
>   28:	0f 0b 00 00 	twnei   r11,0
> 
> After the patch:
> 
>    0:	81 6a 00 84 	lwz     r11,132(r10)
>    4:	90 6a 00 88 	stw     r3,136(r10)
>    8:	7d 6b 58 f8 	not     r11,r11
>    c:	71 6b c0 02 	andi.   r11,r11,49154
>   10:	0f 0b 00 00 	twnei   r11,0
> 
> 6 cycles less on powerpc 8xx (328 => 322 cycles).
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/kernel/interrupt.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 55e1aa18cdb9..8c38e8c95be2 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -28,6 +28,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  				   unsigned long r0, struct pt_regs *regs)
>  {
>  	syscall_fn f;
> +	unsigned long expected_msr;
>  
>  	regs->orig_gpr3 = r3;
>  
> @@ -39,10 +40,13 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  
>  	trace_hardirqs_off(); /* finish reconciling */
>  
> +	expected_msr = MSR_PR;
>  	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
> -		BUG_ON(!(regs->msr & MSR_RI));
> -	BUG_ON(!(regs->msr & MSR_PR));
> -	BUG_ON(arch_irq_disabled_regs(regs));
> +		expected_msr |= MSR_RI;
> +	if (IS_ENABLED(CONFIG_PPC32))
> +		expected_msr |= MSR_EE;
> +	BUG_ON((regs->msr & expected_msr) ^ expected_msr);
> +	BUG_ON(IS_ENABLED(CONFIG_PPC64) && arch_irq_disabled_regs(regs));
>  
>  #ifdef CONFIG_PPC_PKEY
>  	if (mmu_has_feature(MMU_FTR_PKEY)) {
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer
From: Nicholas Piggin @ 2021-02-09  2:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <24804747098369ebcdac38970b8f7a1260bdd248.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> By saving the pointer pointing to thread_info.flags, gcc copies r2
> in a non-volatile register.
> 
> We know 'current' doesn't change, so avoid that intermediaite pointer.
> 
> Reduces null_syscall benchmark by 2 cycles (322 => 320 cycles)
> 
> On PPC64, gcc seems to know that 'current' is not changing, and it keeps
> it in a non volatile register to avoid multiple read of 'current' in paca.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

What if you did this?

---
 arch/powerpc/include/asm/current.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
index bbfb94800415..59ab327972a5 100644
--- a/arch/powerpc/include/asm/current.h
+++ b/arch/powerpc/include/asm/current.h
@@ -23,16 +23,19 @@ static inline struct task_struct *get_current(void)
 
 	return task;
 }
-#define current	get_current()
 
 #else
 
-/*
- * We keep `current' in r2 for speed.
- */
-register struct task_struct *current asm ("r2");
+static inline struct task_struct *get_current(void)
+{
+	register struct task_struct *task asm ("r2");
+
+	return task;
+}
 
 #endif
 
+#define current	get_current()
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CURRENT_H */
-- 

^ permalink raw reply related

* Re: [PATCH v7 32/42] powerpc/64: context tracking move to interrupt wrappers
From: Christophe Leroy @ 2021-02-09  5:49 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Athira Rajeev
In-Reply-To: <20210130130852.2952424-33-npiggin@gmail.com>



Le 30/01/2021 à 14:08, Nicholas Piggin a écrit :
> This moves exception_enter/exit calls to wrapper functions for
> synchronous interrupts. More interrupt handlers are covered by
> this than previously.

Why did you enclose everything in #ifdef CONFIG_PPC64 ? As far as I understand, before this patch 
exception_enter() and exception_exit() are called also on PPC32.

Christophe


> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/interrupt.h  |  9 ++++
>   arch/powerpc/kernel/traps.c           | 74 ++++++---------------------
>   arch/powerpc/mm/book3s64/hash_utils.c |  3 --
>   arch/powerpc/mm/fault.c               |  9 +---
>   4 files changed, 27 insertions(+), 68 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> index 488bdd5bd922..e65ce3e2b071 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -7,10 +7,16 @@
>   #include <asm/ftrace.h>
>   
>   struct interrupt_state {
> +#ifdef CONFIG_PPC64
> +	enum ctx_state ctx_state;
> +#endif
>   };
>   
>   static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrupt_state *state)
>   {
> +#ifdef CONFIG_PPC64
> +	state->ctx_state = exception_enter();
> +#endif
>   }
>   
>   /*
> @@ -29,6 +35,9 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
>    */
>   static inline void interrupt_exit_prepare(struct pt_regs *regs, struct interrupt_state *state)
>   {
> +#ifdef CONFIG_PPC64
> +	exception_exit(state->ctx_state);
> +#endif
>   }
>   
>   static inline void interrupt_async_enter_prepare(struct pt_regs *regs, struct interrupt_state *state)
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index da488e62fb5f..21fd14828827 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1087,41 +1087,28 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(handle_hmi_exception)
>   
>   DEFINE_INTERRUPT_HANDLER(unknown_exception)
>   {
> -	enum ctx_state prev_state = exception_enter();
> -
>   	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
>   	       regs->nip, regs->msr, regs->trap);
>   
>   	_exception(SIGTRAP, regs, TRAP_UNK, 0);
> -
> -	exception_exit(prev_state);
>   }
>   
>   DEFINE_INTERRUPT_HANDLER_ASYNC(unknown_async_exception)
>   {
> -	enum ctx_state prev_state = exception_enter();
> -
>   	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
>   	       regs->nip, regs->msr, regs->trap);
>   
>   	_exception(SIGTRAP, regs, TRAP_UNK, 0);
> -
> -	exception_exit(prev_state);
>   }
>   
>   DEFINE_INTERRUPT_HANDLER(instruction_breakpoint_exception)
>   {
> -	enum ctx_state prev_state = exception_enter();
> -
>   	if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5,
>   					5, SIGTRAP) == NOTIFY_STOP)
> -		goto bail;
> +		return;
>   	if (debugger_iabr_match(regs))
> -		goto bail;
> +		return;
>   	_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
> -
> -bail:
> -	exception_exit(prev_state);
>   }
>   
>   DEFINE_INTERRUPT_HANDLER(RunModeException)
> @@ -1131,8 +1118,6 @@ DEFINE_INTERRUPT_HANDLER(RunModeException)
>   
>   DEFINE_INTERRUPT_HANDLER(single_step_exception)
>   {
> -	enum ctx_state prev_state = exception_enter();
> -
>   	clear_single_step(regs);
>   	clear_br_trace(regs);
>   
> @@ -1141,14 +1126,11 @@ DEFINE_INTERRUPT_HANDLER(single_step_exception)
>   
>   	if (notify_die(DIE_SSTEP, "single_step", regs, 5,
>   					5, SIGTRAP) == NOTIFY_STOP)
> -		goto bail;
> +		return;
>   	if (debugger_sstep(regs))
> -		goto bail;
> +		return;
>   
>   	_exception(SIGTRAP, regs, TRAP_TRACE, regs->nip);
> -
> -bail:
> -	exception_exit(prev_state);
>   }
>   NOKPROBE_SYMBOL(single_step_exception);
>   
> @@ -1476,7 +1458,6 @@ static inline int emulate_math(struct pt_regs *regs) { return -1; }
>   
>   DEFINE_INTERRUPT_HANDLER(program_check_exception)
>   {
> -	enum ctx_state prev_state = exception_enter();
>   	unsigned int reason = get_reason(regs);
>   
>   	/* We can now get here via a FP Unavailable exception if the core
> @@ -1485,22 +1466,22 @@ DEFINE_INTERRUPT_HANDLER(program_check_exception)
>   	if (reason & REASON_FP) {
>   		/* IEEE FP exception */
>   		parse_fpe(regs);
> -		goto bail;
> +		return;
>   	}
>   	if (reason & REASON_TRAP) {
>   		unsigned long bugaddr;
>   		/* Debugger is first in line to stop recursive faults in
>   		 * rcu_lock, notify_die, or atomic_notifier_call_chain */
>   		if (debugger_bpt(regs))
> -			goto bail;
> +			return;
>   
>   		if (kprobe_handler(regs))
> -			goto bail;
> +			return;
>   
>   		/* trap exception */
>   		if (notify_die(DIE_BPT, "breakpoint", regs, 5, 5, SIGTRAP)
>   				== NOTIFY_STOP)
> -			goto bail;
> +			return;
>   
>   		bugaddr = regs->nip;
>   		/*
> @@ -1512,10 +1493,10 @@ DEFINE_INTERRUPT_HANDLER(program_check_exception)
>   		if (!(regs->msr & MSR_PR) &&  /* not user-mode */
>   		    report_bug(bugaddr, regs) == BUG_TRAP_TYPE_WARN) {
>   			regs->nip += 4;
> -			goto bail;
> +			return;
>   		}
>   		_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
> -		goto bail;
> +		return;
>   	}
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   	if (reason & REASON_TM) {
> @@ -1536,7 +1517,7 @@ DEFINE_INTERRUPT_HANDLER(program_check_exception)
>   		 */
>   		if (user_mode(regs)) {
>   			_exception(SIGILL, regs, ILL_ILLOPN, regs->nip);
> -			goto bail;
> +			return;
>   		} else {
>   			printk(KERN_EMERG "Unexpected TM Bad Thing exception "
>   			       "at %lx (msr 0x%lx) tm_scratch=%llx\n",
> @@ -1567,7 +1548,7 @@ DEFINE_INTERRUPT_HANDLER(program_check_exception)
>   	 * pattern to occurrences etc. -dgibson 31/Mar/2003
>   	 */
>   	if (!emulate_math(regs))
> -		goto bail;
> +		return;
>   
>   	/* Try to emulate it if we should. */
>   	if (reason & (REASON_ILLEGAL | REASON_PRIVILEGED)) {
> @@ -1575,10 +1556,10 @@ DEFINE_INTERRUPT_HANDLER(program_check_exception)
>   		case 0:
>   			regs->nip += 4;
>   			emulate_single_step(regs);
> -			goto bail;
> +			return;
>   		case -EFAULT:
>   			_exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip);
> -			goto bail;
> +			return;
>   		}
>   	}
>   
> @@ -1587,9 +1568,6 @@ DEFINE_INTERRUPT_HANDLER(program_check_exception)
>   		_exception(SIGILL, regs, ILL_PRVOPC, regs->nip);
>   	else
>   		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
> -
> -bail:
> -	exception_exit(prev_state);
>   }
>   NOKPROBE_SYMBOL(program_check_exception);
>   
> @@ -1606,14 +1584,12 @@ NOKPROBE_SYMBOL(emulation_assist_interrupt);
>   
>   DEFINE_INTERRUPT_HANDLER(alignment_exception)
>   {
> -	enum ctx_state prev_state = exception_enter();
>   	int sig, code, fixed = 0;
>   	unsigned long  reason;
>   
>   	interrupt_cond_local_irq_enable(regs);
>   
>   	reason = get_reason(regs);
> -
>   	if (reason & REASON_BOUNDARY) {
>   		sig = SIGBUS;
>   		code = BUS_ADRALN;
> @@ -1621,7 +1597,7 @@ DEFINE_INTERRUPT_HANDLER(alignment_exception)
>   	}
>   
>   	if (tm_abort_check(regs, TM_CAUSE_ALIGNMENT | TM_CAUSE_PERSISTENT))
> -		goto bail;
> +		return;
>   
>   	/* we don't implement logging of alignment exceptions */
>   	if (!(current->thread.align_ctl & PR_UNALIGN_SIGBUS))
> @@ -1631,7 +1607,7 @@ DEFINE_INTERRUPT_HANDLER(alignment_exception)
>   		/* skip over emulated instruction */
>   		regs->nip += inst_length(reason);
>   		emulate_single_step(regs);
> -		goto bail;
> +		return;
>   	}
>   
>   	/* Operand address was bad */
> @@ -1647,9 +1623,6 @@ DEFINE_INTERRUPT_HANDLER(alignment_exception)
>   		_exception(sig, regs, code, regs->dar);
>   	else
>   		bad_page_fault(regs, sig);
> -
> -bail:
> -	exception_exit(prev_state);
>   }
>   
>   DEFINE_INTERRUPT_HANDLER(StackOverflow)
> @@ -1663,41 +1636,28 @@ DEFINE_INTERRUPT_HANDLER(StackOverflow)
>   
>   DEFINE_INTERRUPT_HANDLER(stack_overflow_exception)
>   {
> -	enum ctx_state prev_state = exception_enter();
> -
>   	die("Kernel stack overflow", regs, SIGSEGV);
> -
> -	exception_exit(prev_state);
>   }
>   
>   DEFINE_INTERRUPT_HANDLER(kernel_fp_unavailable_exception)
>   {
> -	enum ctx_state prev_state = exception_enter();
> -
>   	printk(KERN_EMERG "Unrecoverable FP Unavailable Exception "
>   			  "%lx at %lx\n", regs->trap, regs->nip);
>   	die("Unrecoverable FP Unavailable Exception", regs, SIGABRT);
> -
> -	exception_exit(prev_state);
>   }
>   
>   DEFINE_INTERRUPT_HANDLER(altivec_unavailable_exception)
>   {
> -	enum ctx_state prev_state = exception_enter();
> -
>   	if (user_mode(regs)) {
>   		/* A user program has executed an altivec instruction,
>   		   but this kernel doesn't support altivec. */
>   		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
> -		goto bail;
> +		return;
>   	}
>   
>   	printk(KERN_EMERG "Unrecoverable VMX/Altivec Unavailable Exception "
>   			"%lx at %lx\n", regs->trap, regs->nip);
>   	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
> -
> -bail:
> -	exception_exit(prev_state);
>   }
>   
>   DEFINE_INTERRUPT_HANDLER(vsx_unavailable_exception)
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index d681dc5a7b1c..fb7c10524bcd 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1514,7 +1514,6 @@ EXPORT_SYMBOL_GPL(hash_page);
>   DECLARE_INTERRUPT_HANDLER_RET(__do_hash_fault);
>   DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
>   {
> -	enum ctx_state prev_state = exception_enter();
>   	unsigned long ea = regs->dar;
>   	unsigned long dsisr = regs->dsisr;
>   	unsigned long access = _PAGE_PRESENT | _PAGE_READ;
> @@ -1563,8 +1562,6 @@ DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
>   		err = 0;
>   	}
>   
> -	exception_exit(prev_state);
> -
>   	return err;
>   }
>   
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 9c4220efc20f..b26a7643fc6e 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -564,14 +564,7 @@ NOKPROBE_SYMBOL(__do_page_fault);
>   
>   DEFINE_INTERRUPT_HANDLER_RET(do_page_fault)
>   {
> -	enum ctx_state prev_state = exception_enter();
> -	long err;
> -
> -	err = __do_page_fault(regs);
> -
> -	exception_exit(prev_state);
> -
> -	return err;
> +	return __do_page_fault(regs);
>   }
>   NOKPROBE_SYMBOL(do_page_fault);
>   
> 

^ permalink raw reply

* Re: [PATCH v5 05/22] powerpc/irq: Add helper to set regs->softe
From: Christophe Leroy @ 2021-02-09  5:57 UTC (permalink / raw)
  To: Nicholas Piggin, Benjamin Herrenschmidt, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1612832745.vhjk6358hf.astroid@bobo.none>



Le 09/02/2021 à 02:11, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
>> regs->softe doesn't exist on PPC32.
>>
>> Add irq_soft_mask_regs_set_state() helper to set regs->softe.
>> This helper will void on PPC32.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>>   arch/powerpc/include/asm/hw_irq.h | 11 +++++++++--
>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
>> index 614957f74cee..ed0c3b049dfd 100644
>> --- a/arch/powerpc/include/asm/hw_irq.h
>> +++ b/arch/powerpc/include/asm/hw_irq.h
>> @@ -38,6 +38,8 @@
>>   #define PACA_IRQ_MUST_HARD_MASK	(PACA_IRQ_EE)
>>   #endif
>>   
>> +#endif /* CONFIG_PPC64 */
>> +
>>   /*
>>    * flags for paca->irq_soft_mask
>>    */
>> @@ -46,8 +48,6 @@
>>   #define IRQS_PMI_DISABLED	2
>>   #define IRQS_ALL_DISABLED	(IRQS_DISABLED | IRQS_PMI_DISABLED)
>>   
>> -#endif /* CONFIG_PPC64 */
>> -
>>   #ifndef __ASSEMBLY__
>>   
>>   #ifdef CONFIG_PPC64
>> @@ -287,6 +287,10 @@ extern void irq_set_pending_from_srr1(unsigned long srr1);
>>   
>>   extern void force_external_irq_replay(void);
>>   
>> +static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned long val)
>> +{
>> +	regs->softe = val;
>> +}
>>   #else /* CONFIG_PPC64 */
>>   
>>   static inline unsigned long arch_local_save_flags(void)
>> @@ -355,6 +359,9 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
>>   
>>   static inline void may_hard_irq_enable(void) { }
>>   
>> +static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned long val)
>> +{
>> +}
>>   #endif /* CONFIG_PPC64 */
>>   
>>   #define ARCH_IRQ_INIT_FLAGS	IRQ_NOREQUEST
> 
> What I don't like about this where you use it is it kind of pollutes
> the ppc32 path with this function which is not valid to use.
> 
> I would prefer if you had this purely so it could compile with:
> 
>    if (IS_ENABLED(CONFIG_PPC64)))
>        irq_soft_mask_regs_set_state(regs, blah);
> 
> And then you could make the ppc32 cause a link error if it did not
> get eliminated at compile time (e.g., call an undefined function).
> 
> You could do the same with the kuap_ functions to change some ifdefs
> to IS_ENABLED.
> 
> That's just my preference but if you prefer this way I guess that's
> okay.

I see you didn't change your mind since last April :)

I'll see what I can do.

Christophe

^ permalink raw reply

* Re: [PATCH v5 09/22] powerpc/syscall: Make interrupt.c buildable on PPC32
From: Christophe Leroy @ 2021-02-09  6:02 UTC (permalink / raw)
  To: Nicholas Piggin, Benjamin Herrenschmidt, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1612833796.dl9doe6njg.astroid@bobo.none>



Le 09/02/2021 à 02:27, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
>> To allow building interrupt.c on PPC32, ifdef out specific PPC64
>> code or use helpers which are available on both PP32 and PPC64
>>
>> Modify Makefile to always build interrupt.o
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>> v5:
>> - Also for interrupt exit preparation
>> - Opted out kuap related code, ppc32 keeps it in ASM for the time being
>> ---
>>   arch/powerpc/kernel/Makefile    |  4 ++--
>>   arch/powerpc/kernel/interrupt.c | 31 ++++++++++++++++++++++++-------
>>   2 files changed, 26 insertions(+), 9 deletions(-)
>>

>> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
>> index d6be4f9a67e5..2dac4d2bb1cf 100644
>> --- a/arch/powerpc/kernel/interrupt.c
>> +++ b/arch/powerpc/kernel/interrupt.c
>> @@ -39,7 +39,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>>   		BUG_ON(!(regs->msr & MSR_RI));
>>   	BUG_ON(!(regs->msr & MSR_PR));
>>   	BUG_ON(!FULL_REGS(regs));
>> -	BUG_ON(regs->softe != IRQS_ENABLED);
>> +	BUG_ON(arch_irq_disabled_regs(regs));
>>   
>>   #ifdef CONFIG_PPC_PKEY
>>   	if (mmu_has_feature(MMU_FTR_PKEY)) {
>> @@ -65,7 +65,9 @@ notrace long system_call_exception(long r3, long r4, long r5,
>>   			isync();
>>   	} else
>>   #endif
>> +#ifdef CONFIG_PPC64
>>   		kuap_check_amr();
>> +#endif
> 
> Wouldn't mind trying to get rid of these ifdefs at some point, but
> there's some kuap / keys changes going on recently so I'm happy enough
> to let this settle then look at whether we can refactor.

I have a follow up series that implements interrupts entries/exits in C and that removes all kuap 
assembly, I will likely release it as RFC later today.

> 
>>   
>>   	account_cpu_user_entry();
>>   
>> @@ -318,7 +323,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>>   	return ret;
>>   }
>>   
>> -#ifdef CONFIG_PPC_BOOK3S /* BOOK3E not yet using this */
>> +#ifndef CONFIG_PPC_BOOK3E_64 /* BOOK3E not yet using this */
>>   notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr)
>>   {
>>   #ifdef CONFIG_PPC_BOOK3E
> 
> Why are you building this for 32? I don't mind if it's just to keep
> things similar and make it build for now, but you're not using it yet,
> right?

The series using that will follow, I thought it would be worth doing this at once.

>   
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 

^ permalink raw reply

* Re: [PATCH v5 17/22] powerpc/syscall: Do not check unsupported scv vector on PPC32
From: Christophe Leroy @ 2021-02-09  6:13 UTC (permalink / raw)
  To: Nicholas Piggin, Benjamin Herrenschmidt, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1612835741.qmlhg8iwmj.astroid@bobo.none>



Le 09/02/2021 à 03:00, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
>> Only PPC64 has scv. No need to check the 0x7ff0 trap on PPC32.
>> For that, add a helper trap_is_unsupported_scv() similar to
>> trap_is_scv().
>>
>> And ignore the scv parameter in syscall_exit_prepare (Save 14 cycles
>> 346 => 332 cycles)
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>> v5: Added a helper trap_is_unsupported_scv()
>> ---
>>   arch/powerpc/include/asm/ptrace.h | 5 +++++
>>   arch/powerpc/kernel/entry_32.S    | 1 -
>>   arch/powerpc/kernel/interrupt.c   | 7 +++++--
>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
>> index 58f9dc060a7b..2c842b11a924 100644
>> --- a/arch/powerpc/include/asm/ptrace.h
>> +++ b/arch/powerpc/include/asm/ptrace.h
>> @@ -229,6 +229,11 @@ static inline bool trap_is_scv(struct pt_regs *regs)
>>   	return (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && TRAP(regs) == 0x3000);
>>   }
>>   
>> +static inline bool trap_is_unsupported_scv(struct pt_regs *regs)
>> +{
>> +	return (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && TRAP(regs) == 0x7ff0);
>> +}
> 
> This change is good.
> 
>> +
>>   static inline bool trap_is_syscall(struct pt_regs *regs)
>>   {
>>   	return (trap_is_scv(regs) || TRAP(regs) == 0xc00);
>> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
>> index cffe58e63356..7c824e8928d0 100644
>> --- a/arch/powerpc/kernel/entry_32.S
>> +++ b/arch/powerpc/kernel/entry_32.S
>> @@ -344,7 +344,6 @@ transfer_to_syscall:
>>   
>>   ret_from_syscall:
>>   	addi    r4,r1,STACK_FRAME_OVERHEAD
>> -	li	r5,0
>>   	bl	syscall_exit_prepare
> 
> For this one, I think it would be nice to do the "right" thing and make
> the function prototypes different on !64S. They could then declare a
> local const bool scv = 0.
> 
> We could have syscall_exit_prepare and syscall_exit_prepare_maybe_scv
> or something like that, 64s can use the latter one and the former can be
> a wrapper that passes constant 0 for scv. Then we don't have different
> prototypes for the same function, but you just have to make the 32-bit
> version static inline and the 64-bit version exported to asm.

You can't call a static inline function from ASM, I don't understand you.

What is wrong for you really here ? Is that the fact we leave scv random, or is that the below 
IS_ENABLED() ?

I don't mind keeping the 'li r5,0' before calling the function if you find it cleaner, the real 
performance gain is with setting scv to 0 below for PPC32 (and maybe it should be set to zero for 
book3e/64 too ?).

Other solution would be to do replace (!scv) by (!scv || !IS_ENABLED(CONFIG_PPC_BOOK3S_64)) in the 
two places it is used in syscall_exit_prepare().

Any preference ?

Thanks
Christophe

>> @@ -224,6 +224,9 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>>   	unsigned long ti_flags;
>>   	unsigned long ret = 0;
>>   
>> +	if (IS_ENABLED(CONFIG_PPC32))
>> +		scv = 0;
>> +
>>   	CT_WARN_ON(ct_state() == CONTEXT_USER);
>>   
>>   #ifdef CONFIG_PPC64
>> -- 
>> 2.25.0
>>
>>

^ permalink raw reply

* Re: [PATCH v5 05/22] powerpc/irq: Add helper to set regs->softe
From: Christophe Leroy @ 2021-02-09  6:18 UTC (permalink / raw)
  To: Nicholas Piggin, Benjamin Herrenschmidt, Michael Ellerman,
	msuchanek, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1612832745.vhjk6358hf.astroid@bobo.none>



Le 09/02/2021 à 02:11, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
>> regs->softe doesn't exist on PPC32.
>>
>> Add irq_soft_mask_regs_set_state() helper to set regs->softe.
>> This helper will void on PPC32.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
> 
> You could do the same with the kuap_ functions to change some ifdefs
> to IS_ENABLED.
> 
> That's just my preference but if you prefer this way I guess that's
> okay.
> 


That's also my preference on the long term.

Here it is ephemeral, I have a follow up series implementing interrupt exit/entry in C and getting 
rid of all the assembly kuap hence getting rid of those ifdefs.

The issue I see when using IS_ENABLED() is that you have to indent to the right, then you interfere 
with the file history and 'git blame'

Thanks for reviewing my series and looking forward to your feedback on my series on the interrupt 
entry/exit that I will likely release later today.

Christophe

^ permalink raw reply

* [PATCH v4 00/14] Restricted DMA
From: Claire Chang @ 2021-02-09  6:21 UTC (permalink / raw)
  To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
	Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
	Marek Szyprowski
  Cc: heikki.krogerus, peterz, grant.likely, paulus, mingo, sstabellini,
	Saravana Kannan, xypron.glpk, Rafael J . Wysocki,
	Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
	linuxppc-dev, Nicolas Boichat, Claire Chang, Dan Williams,
	Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
	list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, bauerman

This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).

[1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

Claire Chang (14):
  swiotlb: Remove external access to io_tlb_start
  swiotlb: Move is_swiotlb_buffer() to swiotlb.c
  swiotlb: Add struct swiotlb
  swiotlb: Refactor swiotlb_late_init_with_tbl
  swiotlb: Add DMA_RESTRICTED_POOL
  swiotlb: Add restricted DMA pool
  swiotlb: Update swiotlb API to gain a struct device argument
  swiotlb: Use restricted DMA pool if available
  swiotlb: Refactor swiotlb_tbl_{map,unmap}_single
  dma-direct: Add a new wrapper __dma_direct_free_pages()
  swiotlb: Add is_dev_swiotlb_force()
  swiotlb: Add restricted DMA alloc/free support.
  dt-bindings: of: Add restricted DMA pool
  of: Add plumbing for restricted DMA pool

 .../reserved-memory/reserved-memory.txt       |  24 +
 arch/powerpc/platforms/pseries/svm.c          |   4 +-
 drivers/iommu/dma-iommu.c                     |  12 +-
 drivers/of/address.c                          |  25 +
 drivers/of/device.c                           |   3 +
 drivers/of/of_private.h                       |   5 +
 drivers/xen/swiotlb-xen.c                     |   4 +-
 include/linux/device.h                        |   4 +
 include/linux/swiotlb.h                       |  32 +-
 kernel/dma/Kconfig                            |  14 +
 kernel/dma/direct.c                           |  51 +-
 kernel/dma/direct.h                           |   8 +-
 kernel/dma/swiotlb.c                          | 636 ++++++++++++------
 13 files changed, 582 insertions(+), 240 deletions(-)

-- 

v4:
  - Fix spinlock bad magic
  - Use rmem->name for debugfs entry
  - Address the comments in v3

v3:
  Using only one reserved memory region for both streaming DMA and memory
  allocation.
  https://lore.kernel.org/patchwork/cover/1360992/

v2:
  Building on top of swiotlb.
  https://lore.kernel.org/patchwork/cover/1280705/

v1:
  Using dma_map_ops.
  https://lore.kernel.org/patchwork/cover/1271660/

2.30.0.478.g8a0d178c01-goog


^ permalink raw reply

* [PATCH v4 01/14] swiotlb: Remove external access to io_tlb_start
From: Claire Chang @ 2021-02-09  6:21 UTC (permalink / raw)
  To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
	Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
	Marek Szyprowski
  Cc: heikki.krogerus, peterz, grant.likely, paulus, mingo, sstabellini,
	Saravana Kannan, xypron.glpk, Rafael J . Wysocki,
	Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
	linuxppc-dev, Nicolas Boichat, Claire Chang, Dan Williams,
	Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
	list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, bauerman
In-Reply-To: <20210209062131.2300005-1-tientzu@chromium.org>

Add a new function, get_swiotlb_start(), and remove external access to
io_tlb_start, so we can entirely hide struct swiotlb inside of swiotlb.c
in the following patches.

Signed-off-by: Claire Chang <tientzu@chromium.org>
---
 arch/powerpc/platforms/pseries/svm.c | 4 ++--
 drivers/xen/swiotlb-xen.c            | 4 ++--
 include/linux/swiotlb.h              | 1 +
 kernel/dma/swiotlb.c                 | 5 +++++
 4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
index 7b739cc7a8a9..c10c51d49f3d 100644
--- a/arch/powerpc/platforms/pseries/svm.c
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -55,8 +55,8 @@ void __init svm_swiotlb_init(void)
 	if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, false))
 		return;
 
-	if (io_tlb_start)
-		memblock_free_early(io_tlb_start,
+	if (vstart)
+		memblock_free_early(vstart,
 				    PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
 	panic("SVM: Cannot allocate SWIOTLB buffer");
 }
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 2b385c1b4a99..91f8c68d1a9b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -192,8 +192,8 @@ int __ref xen_swiotlb_init(int verbose, bool early)
 	/*
 	 * IO TLB memory already allocated. Just use it.
 	 */
-	if (io_tlb_start != 0) {
-		xen_io_tlb_start = phys_to_virt(io_tlb_start);
+	if (is_swiotlb_active()) {
+		xen_io_tlb_start = phys_to_virt(get_swiotlb_start());
 		goto end;
 	}
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d9c9fc9ca5d2..83200f3b042a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -81,6 +81,7 @@ void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(void);
+phys_addr_t get_swiotlb_start(void);
 void __init swiotlb_adjust_size(unsigned long new_size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..e180211f6ad9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -719,6 +719,11 @@ bool is_swiotlb_active(void)
 	return io_tlb_end != 0;
 }
 
+phys_addr_t get_swiotlb_start(void)
+{
+	return io_tlb_start;
+}
+
 #ifdef CONFIG_DEBUG_FS
 
 static int __init swiotlb_create_debugfs(void)
--

This can be dropped if Christoph's swiotlb cleanups are landed.
https://lore.kernel.org/linux-iommu/20210207160934.2955931-1-hch@lst.de/T/#m7124f29b6076d462101fcff6433295157621da09 

2.30.0.478.g8a0d178c01-goog


^ permalink raw reply related

* [PATCH v4 02/14] swiotlb: Move is_swiotlb_buffer() to swiotlb.c
From: Claire Chang @ 2021-02-09  6:21 UTC (permalink / raw)
  To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
	Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
	Marek Szyprowski
  Cc: heikki.krogerus, peterz, grant.likely, paulus, mingo, sstabellini,
	Saravana Kannan, xypron.glpk, Rafael J . Wysocki,
	Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
	linuxppc-dev, Nicolas Boichat, Claire Chang, Dan Williams,
	Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
	list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, bauerman
In-Reply-To: <20210209062131.2300005-1-tientzu@chromium.org>

Move is_swiotlb_buffer() to swiotlb.c and make io_tlb_{start,end}
static, so we can entirely hide struct swiotlb inside of swiotlb.c in
the following patches.

Signed-off-by: Claire Chang <tientzu@chromium.org>
---
 include/linux/swiotlb.h | 7 +------
 kernel/dma/swiotlb.c    | 7 ++++++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 83200f3b042a..041611bf3c2a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -70,13 +70,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
-extern phys_addr_t io_tlb_start, io_tlb_end;
-
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
-{
-	return paddr >= io_tlb_start && paddr < io_tlb_end;
-}
 
+bool is_swiotlb_buffer(phys_addr_t paddr);
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e180211f6ad9..678490d39e55 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -69,7 +69,7 @@ enum swiotlb_force swiotlb_force;
  * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
  * API.
  */
-phys_addr_t io_tlb_start, io_tlb_end;
+static phys_addr_t io_tlb_start, io_tlb_end;
 
 /*
  * The number of IO TLB blocks (in groups of 64) between io_tlb_start and
@@ -719,6 +719,11 @@ bool is_swiotlb_active(void)
 	return io_tlb_end != 0;
 }
 
+bool is_swiotlb_buffer(phys_addr_t paddr)
+{
+	return paddr >= io_tlb_start && paddr < io_tlb_end;
+}
+
 phys_addr_t get_swiotlb_start(void)
 {
 	return io_tlb_start;
-- 
2.30.0.478.g8a0d178c01-goog


^ permalink raw reply related

* [PATCH v4 03/14] swiotlb: Add struct swiotlb
From: Claire Chang @ 2021-02-09  6:21 UTC (permalink / raw)
  To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
	Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
	Marek Szyprowski
  Cc: heikki.krogerus, peterz, grant.likely, paulus, mingo, sstabellini,
	Saravana Kannan, xypron.glpk, Rafael J . Wysocki,
	Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
	linuxppc-dev, Nicolas Boichat, Claire Chang, Dan Williams,
	Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
	list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, bauerman
In-Reply-To: <20210209062131.2300005-1-tientzu@chromium.org>

Added a new struct, swiotlb, as the IO TLB memory pool descriptor and
moved relevant global variables into that struct.
This will be useful later to allow for restricted DMA pool.

Signed-off-by: Claire Chang <tientzu@chromium.org>
---
 kernel/dma/swiotlb.c | 327 +++++++++++++++++++++++--------------------
 1 file changed, 172 insertions(+), 155 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 678490d39e55..28b7bfe7a2a8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -61,33 +61,43 @@
  * allocate a contiguous 1MB, we're probably in trouble anyway.
  */
 #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
+#define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
 enum swiotlb_force swiotlb_force;
 
 /*
- * Used to do a quick range check in swiotlb_tbl_unmap_single and
- * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
- * API.
- */
-static phys_addr_t io_tlb_start, io_tlb_end;
-
-/*
- * The number of IO TLB blocks (in groups of 64) between io_tlb_start and
- * io_tlb_end.  This is command line adjustable via setup_io_tlb_npages.
- */
-static unsigned long io_tlb_nslabs;
-
-/*
- * The number of used IO TLB block
- */
-static unsigned long io_tlb_used;
-
-/*
- * This is a free list describing the number of free entries available from
- * each index
+ * struct swiotlb - Software IO TLB Memory Pool Descriptor
+ *
+ * @start:      The start address of the swiotlb memory pool. Used to do a quick
+ *              range check to see if the memory was in fact allocated by this
+ *              API.
+ * @end:        The end address of the swiotlb memory pool. Used to do a quick
+ *              range check to see if the memory was in fact allocated by this
+ *              API.
+ * @nslabs:     The number of IO TLB blocks (in groups of 64) between @start and
+ *              @end. This is command line adjustable via setup_io_tlb_npages.
+ * @used:       The number of used IO TLB block.
+ * @list:       The free list describing the number of free entries available
+ *              from each index.
+ * @index:      The index to start searching in the next round.
+ * @orig_addr:  The original address corresponding to a mapped entry for the
+ *              sync operations.
+ * @lock:       The lock to protect the above data structures in the map and
+ *              unmap calls.
+ * @debugfs:    The dentry to debugfs.
  */
-static unsigned int *io_tlb_list;
-static unsigned int io_tlb_index;
+struct swiotlb {
+	phys_addr_t start;
+	phys_addr_t end;
+	unsigned long nslabs;
+	unsigned long used;
+	unsigned int *list;
+	unsigned int index;
+	phys_addr_t *orig_addr;
+	spinlock_t lock;
+	struct dentry *debugfs;
+};
+static struct swiotlb default_swiotlb;
 
 /*
  * Max segment that we can provide which (if pages are contingous) will
@@ -95,27 +105,17 @@ static unsigned int io_tlb_index;
  */
 static unsigned int max_segment;
 
-/*
- * We need to save away the original address corresponding to a mapped entry
- * for the sync operations.
- */
-#define INVALID_PHYS_ADDR (~(phys_addr_t)0)
-static phys_addr_t *io_tlb_orig_addr;
-
-/*
- * Protect the above data structures in the map and unmap calls
- */
-static DEFINE_SPINLOCK(io_tlb_lock);
-
 static int late_alloc;
 
 static int __init
 setup_io_tlb_npages(char *str)
 {
+	struct swiotlb *swiotlb = &default_swiotlb;
+
 	if (isdigit(*str)) {
-		io_tlb_nslabs = simple_strtoul(str, &str, 0);
+		swiotlb->nslabs = simple_strtoul(str, &str, 0);
 		/* avoid tail segment of size < IO_TLB_SEGSIZE */
-		io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+		swiotlb->nslabs = ALIGN(swiotlb->nslabs, IO_TLB_SEGSIZE);
 	}
 	if (*str == ',')
 		++str;
@@ -123,7 +123,7 @@ setup_io_tlb_npages(char *str)
 		swiotlb_force = SWIOTLB_FORCE;
 	} else if (!strcmp(str, "noforce")) {
 		swiotlb_force = SWIOTLB_NO_FORCE;
-		io_tlb_nslabs = 1;
+		swiotlb->nslabs = 1;
 	}
 
 	return 0;
@@ -134,7 +134,7 @@ static bool no_iotlb_memory;
 
 unsigned long swiotlb_nr_tbl(void)
 {
-	return unlikely(no_iotlb_memory) ? 0 : io_tlb_nslabs;
+	return unlikely(no_iotlb_memory) ? 0 : default_swiotlb.nslabs;
 }
 EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
 
@@ -156,13 +156,14 @@ unsigned long swiotlb_size_or_default(void)
 {
 	unsigned long size;
 
-	size = io_tlb_nslabs << IO_TLB_SHIFT;
+	size = default_swiotlb.nslabs << IO_TLB_SHIFT;
 
 	return size ? size : (IO_TLB_DEFAULT_SIZE);
 }
 
 void __init swiotlb_adjust_size(unsigned long new_size)
 {
+	struct swiotlb *swiotlb = &default_swiotlb;
 	unsigned long size;
 
 	/*
@@ -170,10 +171,10 @@ void __init swiotlb_adjust_size(unsigned long new_size)
 	 * architectures such as those supporting memory encryption to
 	 * adjust/expand SWIOTLB size for their use.
 	 */
-	if (!io_tlb_nslabs) {
+	if (!swiotlb->nslabs) {
 		size = ALIGN(new_size, 1 << IO_TLB_SHIFT);
-		io_tlb_nslabs = size >> IO_TLB_SHIFT;
-		io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+		swiotlb->nslabs = size >> IO_TLB_SHIFT;
+		swiotlb->nslabs = ALIGN(swiotlb->nslabs, IO_TLB_SEGSIZE);
 
 		pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
 	}
@@ -181,14 +182,15 @@ void __init swiotlb_adjust_size(unsigned long new_size)
 
 void swiotlb_print_info(void)
 {
-	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+	struct swiotlb *swiotlb = &default_swiotlb;
+	unsigned long bytes = swiotlb->nslabs << IO_TLB_SHIFT;
 
 	if (no_iotlb_memory) {
 		pr_warn("No low mem\n");
 		return;
 	}
 
-	pr_info("mapped [mem %pa-%pa] (%luMB)\n", &io_tlb_start, &io_tlb_end,
+	pr_info("mapped [mem %pa-%pa] (%luMB)\n", &swiotlb->start, &swiotlb->end,
 	       bytes >> 20);
 }
 
@@ -200,57 +202,61 @@ void swiotlb_print_info(void)
  */
 void __init swiotlb_update_mem_attributes(void)
 {
+	struct swiotlb *swiotlb = &default_swiotlb;
 	void *vaddr;
 	unsigned long bytes;
 
 	if (no_iotlb_memory || late_alloc)
 		return;
 
-	vaddr = phys_to_virt(io_tlb_start);
-	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+	vaddr = phys_to_virt(swiotlb->start);
+	bytes = PAGE_ALIGN(swiotlb->nslabs << IO_TLB_SHIFT);
 	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
 	memset(vaddr, 0, bytes);
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
+	struct swiotlb *swiotlb = &default_swiotlb;
 	unsigned long i, bytes;
 	size_t alloc_size;
 
 	bytes = nslabs << IO_TLB_SHIFT;
 
-	io_tlb_nslabs = nslabs;
-	io_tlb_start = __pa(tlb);
-	io_tlb_end = io_tlb_start + bytes;
+	swiotlb->nslabs = nslabs;
+	swiotlb->start = __pa(tlb);
+	swiotlb->end = swiotlb->start + bytes;
 
 	/*
 	 * Allocate and initialize the free list array.  This array is used
 	 * to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
-	 * between io_tlb_start and io_tlb_end.
+	 * between swiotlb->start and swiotlb->end.
 	 */
-	alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(int));
-	io_tlb_list = memblock_alloc(alloc_size, PAGE_SIZE);
-	if (!io_tlb_list)
+	alloc_size = PAGE_ALIGN(swiotlb->nslabs * sizeof(int));
+	swiotlb->list = memblock_alloc(alloc_size, PAGE_SIZE);
+	if (!swiotlb->list)
 		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
 		      __func__, alloc_size, PAGE_SIZE);
 
-	alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t));
-	io_tlb_orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
-	if (!io_tlb_orig_addr)
+	alloc_size = PAGE_ALIGN(swiotlb->nslabs * sizeof(phys_addr_t));
+	swiotlb->orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
+	if (!swiotlb->orig_addr)
 		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
 		      __func__, alloc_size, PAGE_SIZE);
 
-	for (i = 0; i < io_tlb_nslabs; i++) {
-		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
-		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+	for (i = 0; i < swiotlb->nslabs; i++) {
+		swiotlb->list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+		swiotlb->orig_addr[i] = INVALID_PHYS_ADDR;
 	}
-	io_tlb_index = 0;
+	swiotlb->index = 0;
 	no_iotlb_memory = false;
 
 	if (verbose)
 		swiotlb_print_info();
 
-	swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
+	swiotlb_set_max_segment(swiotlb->nslabs << IO_TLB_SHIFT);
+	spin_lock_init(&swiotlb->lock);
+
 	return 0;
 }
 
@@ -261,26 +267,27 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 void  __init
 swiotlb_init(int verbose)
 {
+	struct swiotlb *swiotlb = &default_swiotlb;
 	size_t default_size = IO_TLB_DEFAULT_SIZE;
 	unsigned char *vstart;
 	unsigned long bytes;
 
-	if (!io_tlb_nslabs) {
-		io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
-		io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+	if (!swiotlb->nslabs) {
+		swiotlb->nslabs = (default_size >> IO_TLB_SHIFT);
+		swiotlb->nslabs = ALIGN(swiotlb->nslabs, IO_TLB_SEGSIZE);
 	}
 
-	bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+	bytes = swiotlb->nslabs << IO_TLB_SHIFT;
 
 	/* Get IO TLB memory from the low pages */
 	vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
-	if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
+	if (vstart && !swiotlb_init_with_tbl(vstart, swiotlb->nslabs, verbose))
 		return;
 
-	if (io_tlb_start) {
-		memblock_free_early(io_tlb_start,
-				    PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
-		io_tlb_start = 0;
+	if (swiotlb->start) {
+		memblock_free_early(swiotlb->start,
+				    PAGE_ALIGN(swiotlb->nslabs << IO_TLB_SHIFT));
+		swiotlb->start = 0;
 	}
 	pr_warn("Cannot allocate buffer");
 	no_iotlb_memory = true;
@@ -294,22 +301,23 @@ swiotlb_init(int verbose)
 int
 swiotlb_late_init_with_default_size(size_t default_size)
 {
-	unsigned long bytes, req_nslabs = io_tlb_nslabs;
+	struct swiotlb *swiotlb = &default_swiotlb;
+	unsigned long bytes, req_nslabs = swiotlb->nslabs;
 	unsigned char *vstart = NULL;
 	unsigned int order;
 	int rc = 0;
 
-	if (!io_tlb_nslabs) {
-		io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
-		io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+	if (!swiotlb->nslabs) {
+		swiotlb->nslabs = (default_size >> IO_TLB_SHIFT);
+		swiotlb->nslabs = ALIGN(swiotlb->nslabs, IO_TLB_SEGSIZE);
 	}
 
 	/*
 	 * Get IO TLB memory from the low pages
 	 */
-	order = get_order(io_tlb_nslabs << IO_TLB_SHIFT);
-	io_tlb_nslabs = SLABS_PER_PAGE << order;
-	bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+	order = get_order(swiotlb->nslabs << IO_TLB_SHIFT);
+	swiotlb->nslabs = SLABS_PER_PAGE << order;
+	bytes = swiotlb->nslabs << IO_TLB_SHIFT;
 
 	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
 		vstart = (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN,
@@ -320,15 +328,15 @@ swiotlb_late_init_with_default_size(size_t default_size)
 	}
 
 	if (!vstart) {
-		io_tlb_nslabs = req_nslabs;
+		swiotlb->nslabs = req_nslabs;
 		return -ENOMEM;
 	}
 	if (order != get_order(bytes)) {
 		pr_warn("only able to allocate %ld MB\n",
 			(PAGE_SIZE << order) >> 20);
-		io_tlb_nslabs = SLABS_PER_PAGE << order;
+		swiotlb->nslabs = SLABS_PER_PAGE << order;
 	}
-	rc = swiotlb_late_init_with_tbl(vstart, io_tlb_nslabs);
+	rc = swiotlb_late_init_with_tbl(vstart, swiotlb->nslabs);
 	if (rc)
 		free_pages((unsigned long)vstart, order);
 
@@ -337,22 +345,25 @@ swiotlb_late_init_with_default_size(size_t default_size)
 
 static void swiotlb_cleanup(void)
 {
-	io_tlb_end = 0;
-	io_tlb_start = 0;
-	io_tlb_nslabs = 0;
+	struct swiotlb *swiotlb = &default_swiotlb;
+
+	swiotlb->end = 0;
+	swiotlb->start = 0;
+	swiotlb->nslabs = 0;
 	max_segment = 0;
 }
 
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
+	struct swiotlb *swiotlb = &default_swiotlb;
 	unsigned long i, bytes;
 
 	bytes = nslabs << IO_TLB_SHIFT;
 
-	io_tlb_nslabs = nslabs;
-	io_tlb_start = virt_to_phys(tlb);
-	io_tlb_end = io_tlb_start + bytes;
+	swiotlb->nslabs = nslabs;
+	swiotlb->start = virt_to_phys(tlb);
+	swiotlb->end = swiotlb->start + bytes;
 
 	set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
 	memset(tlb, 0, bytes);
@@ -360,39 +371,40 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 	/*
 	 * Allocate and initialize the free list array.  This array is used
 	 * to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
-	 * between io_tlb_start and io_tlb_end.
+	 * between swiotlb->start and swiotlb->end.
 	 */
-	io_tlb_list = (unsigned int *)__get_free_pages(GFP_KERNEL,
-	                              get_order(io_tlb_nslabs * sizeof(int)));
-	if (!io_tlb_list)
+	swiotlb->list = (unsigned int *)__get_free_pages(GFP_KERNEL,
+	                              get_order(swiotlb->nslabs * sizeof(int)));
+	if (!swiotlb->list)
 		goto cleanup3;
 
-	io_tlb_orig_addr = (phys_addr_t *)
+	swiotlb->orig_addr = (phys_addr_t *)
 		__get_free_pages(GFP_KERNEL,
-				 get_order(io_tlb_nslabs *
+				 get_order(swiotlb->nslabs *
 					   sizeof(phys_addr_t)));
-	if (!io_tlb_orig_addr)
+	if (!swiotlb->orig_addr)
 		goto cleanup4;
 
-	for (i = 0; i < io_tlb_nslabs; i++) {
-		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
-		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+	for (i = 0; i < swiotlb->nslabs; i++) {
+		swiotlb->list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+		swiotlb->orig_addr[i] = INVALID_PHYS_ADDR;
 	}
-	io_tlb_index = 0;
+	swiotlb->index = 0;
 	no_iotlb_memory = false;
 
 	swiotlb_print_info();
 
 	late_alloc = 1;
 
-	swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
+	swiotlb_set_max_segment(swiotlb->nslabs << IO_TLB_SHIFT);
+	spin_lock_init(&swiotlb->lock);
 
 	return 0;
 
 cleanup4:
-	free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
-	                                                 sizeof(int)));
-	io_tlb_list = NULL;
+	free_pages((unsigned long)swiotlb->list,
+		   get_order(swiotlb->nslabs * sizeof(int)));
+	swiotlb->list = NULL;
 cleanup3:
 	swiotlb_cleanup();
 	return -ENOMEM;
@@ -400,23 +412,25 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 
 void __init swiotlb_exit(void)
 {
-	if (!io_tlb_orig_addr)
+	struct swiotlb *swiotlb = &default_swiotlb;
+
+	if (!swiotlb->orig_addr)
 		return;
 
 	if (late_alloc) {
-		free_pages((unsigned long)io_tlb_orig_addr,
-			   get_order(io_tlb_nslabs * sizeof(phys_addr_t)));
-		free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
-								 sizeof(int)));
-		free_pages((unsigned long)phys_to_virt(io_tlb_start),
-			   get_order(io_tlb_nslabs << IO_TLB_SHIFT));
+		free_pages((unsigned long)swiotlb->orig_addr,
+			   get_order(swiotlb->nslabs * sizeof(phys_addr_t)));
+		free_pages((unsigned long)swiotlb->list,
+			   get_order(swiotlb->nslabs * sizeof(int)));
+		free_pages((unsigned long)phys_to_virt(swiotlb->start),
+			   get_order(swiotlb->nslabs << IO_TLB_SHIFT));
 	} else {
-		memblock_free_late(__pa(io_tlb_orig_addr),
-				   PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
-		memblock_free_late(__pa(io_tlb_list),
-				   PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
-		memblock_free_late(io_tlb_start,
-				   PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+		memblock_free_late(__pa(swiotlb->orig_addr),
+				   PAGE_ALIGN(swiotlb->nslabs * sizeof(phys_addr_t)));
+		memblock_free_late(__pa(swiotlb->list),
+				   PAGE_ALIGN(swiotlb->nslabs * sizeof(int)));
+		memblock_free_late(swiotlb->start,
+				   PAGE_ALIGN(swiotlb->nslabs << IO_TLB_SHIFT));
 	}
 	swiotlb_cleanup();
 }
@@ -465,7 +479,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 		size_t mapping_size, size_t alloc_size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
+	struct swiotlb *swiotlb = &default_swiotlb;
+	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, swiotlb->start);
 	unsigned long flags;
 	phys_addr_t tlb_addr;
 	unsigned int nslots, stride, index, wrap;
@@ -516,13 +531,13 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 	 * Find suitable number of IO TLB entries size that will fit this
 	 * request and allocate a buffer from that IO TLB pool.
 	 */
-	spin_lock_irqsave(&io_tlb_lock, flags);
+	spin_lock_irqsave(&swiotlb->lock, flags);
 
-	if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
+	if (unlikely(nslots > swiotlb->nslabs - swiotlb->used))
 		goto not_found;
 
-	index = ALIGN(io_tlb_index, stride);
-	if (index >= io_tlb_nslabs)
+	index = ALIGN(swiotlb->index, stride);
+	if (index >= swiotlb->nslabs)
 		index = 0;
 	wrap = index;
 
@@ -530,7 +545,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 		while (iommu_is_span_boundary(index, nslots, offset_slots,
 					      max_slots)) {
 			index += stride;
-			if (index >= io_tlb_nslabs)
+			if (index >= swiotlb->nslabs)
 				index = 0;
 			if (index == wrap)
 				goto not_found;
@@ -541,40 +556,40 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 		 * contiguous buffers, we allocate the buffers from that slot
 		 * and mark the entries as '0' indicating unavailable.
 		 */
-		if (io_tlb_list[index] >= nslots) {
+		if (swiotlb->list[index] >= nslots) {
 			int count = 0;
 
 			for (i = index; i < (int) (index + nslots); i++)
-				io_tlb_list[i] = 0;
-			for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
-				io_tlb_list[i] = ++count;
-			tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
+				swiotlb->list[i] = 0;
+			for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && swiotlb->list[i]; i--)
+				swiotlb->list[i] = ++count;
+			tlb_addr = swiotlb->start + (index << IO_TLB_SHIFT);
 
 			/*
 			 * Update the indices to avoid searching in the next
 			 * round.
 			 */
-			io_tlb_index = ((index + nslots) < io_tlb_nslabs
-					? (index + nslots) : 0);
+			swiotlb->index = ((index + nslots) < swiotlb->nslabs
+				      ? (index + nslots) : 0);
 
 			goto found;
 		}
 		index += stride;
-		if (index >= io_tlb_nslabs)
+		if (index >= swiotlb->nslabs)
 			index = 0;
 	} while (index != wrap);
 
 not_found:
-	tmp_io_tlb_used = io_tlb_used;
+	tmp_io_tlb_used = swiotlb->used;
 
-	spin_unlock_irqrestore(&io_tlb_lock, flags);
+	spin_unlock_irqrestore(&swiotlb->lock, flags);
 	if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
 		dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
-			 alloc_size, io_tlb_nslabs, tmp_io_tlb_used);
+			 alloc_size, swiotlb->nslabs, tmp_io_tlb_used);
 	return (phys_addr_t)DMA_MAPPING_ERROR;
 found:
-	io_tlb_used += nslots;
-	spin_unlock_irqrestore(&io_tlb_lock, flags);
+	swiotlb->used += nslots;
+	spin_unlock_irqrestore(&swiotlb->lock, flags);
 
 	/*
 	 * Save away the mapping from the original address to the DMA address.
@@ -582,7 +597,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 	 * needed.
 	 */
 	for (i = 0; i < nslots; i++)
-		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+		swiotlb->orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
@@ -597,10 +612,11 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      size_t mapping_size, size_t alloc_size,
 			      enum dma_data_direction dir, unsigned long attrs)
 {
+	struct swiotlb *swiotlb = &default_swiotlb;
 	unsigned long flags;
 	int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
-	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
-	phys_addr_t orig_addr = io_tlb_orig_addr[index];
+	int index = (tlb_addr - swiotlb->start) >> IO_TLB_SHIFT;
+	phys_addr_t orig_addr = swiotlb->orig_addr[index];
 
 	/*
 	 * First, sync the memory before unmapping the entry
@@ -616,36 +632,37 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 	 * While returning the entries to the free list, we merge the entries
 	 * with slots below and above the pool being returned.
 	 */
-	spin_lock_irqsave(&io_tlb_lock, flags);
+	spin_lock_irqsave(&swiotlb->lock, flags);
 	{
 		count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
-			 io_tlb_list[index + nslots] : 0);
+			 swiotlb->list[index + nslots] : 0);
 		/*
 		 * Step 1: return the slots to the free list, merging the
 		 * slots with superceeding slots
 		 */
 		for (i = index + nslots - 1; i >= index; i--) {
-			io_tlb_list[i] = ++count;
-			io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+			swiotlb->list[i] = ++count;
+			swiotlb->orig_addr[i] = INVALID_PHYS_ADDR;
 		}
 		/*
 		 * Step 2: merge the returned slots with the preceding slots,
 		 * if available (non zero)
 		 */
-		for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
-			io_tlb_list[i] = ++count;
+		for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && swiotlb->list[i]; i--)
+			swiotlb->list[i] = ++count;
 
-		io_tlb_used -= nslots;
+		swiotlb->used -= nslots;
 	}
-	spin_unlock_irqrestore(&io_tlb_lock, flags);
+	spin_unlock_irqrestore(&swiotlb->lock, flags);
 }
 
 void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 			     size_t size, enum dma_data_direction dir,
 			     enum dma_sync_target target)
 {
-	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
-	phys_addr_t orig_addr = io_tlb_orig_addr[index];
+	struct swiotlb *swiotlb = &default_swiotlb;
+	int index = (tlb_addr - swiotlb->start) >> IO_TLB_SHIFT;
+	phys_addr_t orig_addr = swiotlb->orig_addr[index];
 
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
@@ -713,31 +730,31 @@ size_t swiotlb_max_mapping_size(struct device *dev)
 bool is_swiotlb_active(void)
 {
 	/*
-	 * When SWIOTLB is initialized, even if io_tlb_start points to physical
-	 * address zero, io_tlb_end surely doesn't.
+	 * When SWIOTLB is initialized, even if swiotlb->start points to
+	 * physical address zero, swiotlb->end surely doesn't.
 	 */
-	return io_tlb_end != 0;
+	return default_swiotlb.end != 0;
 }
 
 bool is_swiotlb_buffer(phys_addr_t paddr)
 {
-	return paddr >= io_tlb_start && paddr < io_tlb_end;
+	return paddr >= default_swiotlb.start && paddr < default_swiotlb.end;
 }
 
 phys_addr_t get_swiotlb_start(void)
 {
-	return io_tlb_start;
+	return default_swiotlb.start;
 }
 
 #ifdef CONFIG_DEBUG_FS
 
 static int __init swiotlb_create_debugfs(void)
 {
-	struct dentry *root;
+	struct swiotlb *swiotlb = &default_swiotlb;
 
-	root = debugfs_create_dir("swiotlb", NULL);
-	debugfs_create_ulong("io_tlb_nslabs", 0400, root, &io_tlb_nslabs);
-	debugfs_create_ulong("io_tlb_used", 0400, root, &io_tlb_used);
+	swiotlb->debugfs = debugfs_create_dir("swiotlb", NULL);
+	debugfs_create_ulong("io_tlb_nslabs", 0400, swiotlb->debugfs, &swiotlb->nslabs);
+	debugfs_create_ulong("io_tlb_used", 0400, swiotlb->debugfs, &swiotlb->used);
 	return 0;
 }
 
-- 
2.30.0.478.g8a0d178c01-goog


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox