LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE
From: Christophe Leroy @ 2021-02-01 11:39 UTC (permalink / raw)
  To: PLATTNER Christoph, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: REITHER Robert - Contractor, HAMETNER Reinhard, KOENIG Werner,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
In-Reply-To: <1b194840-d4e6-4660-94d9-6bac623442cf@THSDC1IRIMBX13P.iris.infra.thales>



Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :
> Hello to all, and thank you very much for first and second fast response.
> 
> I do not have a long history on PowerPC MMU environment, I hacked into this topic
> for about 3 months for analyzing that problem- so, sorry, if I am wrong in some points ...

Yes you are wrong on some points, sorry, see below.


> 
> What I learn so far from this MPC5121e (variant of e300c4 core):
> - It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so always the
>     branch  "if (! Hash) ...." is taken, so, I assume that "key 0" and "key 1" setups are not
>     used on this CPU (not supporting MMU_FTR_HPTE_TABLE)

hash method is not used, this is SW TLB loading that is used, but still, all the PP and Ks/Kp keys 
defined in the segment register are used, see e300 core reference manual §6.4.2 Page Memory Protection

> - The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does not react.
>     As far I have understood, the TLB miss routines are responsible for checking permissions.
>     The TLB miss routines check the Linux PTE styled entries and generates the PP bits
>     for the TLB entry. The PowerPC PP bits are never check elsewhere on that CPU models ...

PP bits ARE checked hoppefully. If it was not the case, then the TLB miss routines would install a 
TLB on a read, then the user could do a write without any verification being done ?

Refer to e300 Core reference Manual, §6.1.4 Memory Protection Facilities

As I explained in the patch, the problem is not that the HW doesn't check the permission. It is that 
user accessed been done with key 0 as programmed in the segment registers, PP 00 means RW access.

> - The PTE entries in Linux are fully "void" in sense of this CPU type, as this CPU does not
>     read any PTEs from RAM (no HW support in contrast to x86 or ARM or later ppc...).

No, the PTE are read by the TLB miss exception handlers and writen into TLB entries.

> 
> In summary - as far as I understand it now - we have to handle the PTE bits differently
> (Linux style) for PROT_NONE permissions - OR - we have to expand the permission
> checking like my proposed experimental patch. (PROT_NONE is not NUMA related only,
> but may not used very often ...).

Yes, expanding the permission checking is the easiest solution, hence the patch I sent out based on 
your proposal.

> 
> Another related point:
> According e300 RM (manual) the ACCESSED bit in the PTE shall be set on TLB miss, as
> it is an indication, that page is used. In 4.4 kernel this write back of the _PAGE_ACCESSED
> bit was performed after successful permission check:
> 
>          bne-    DataAddressInvalid      /* return if access not permitted */
>          ori     r0,r0,_PAGE_ACCESSED    /* set _PAGE_ACCESSED in pte */
>          /*
>           * NOTE! We are assuming this is not an SMP system, otherwise
>           * we would need to update the pte atomically with lwarx/stwcx.
>           */
>          stw     r0,0(r2)                /* update PTE (accessed bit) */
>          /* Convert linux-style PTE to low word of PPC-style PTE */
> 
> Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, this is not needed, as the
> PTE is never seen by the PPC chip. But I do not understand, WHY the PAGE_ACCCESSED
> is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):
> 
> 	cmplw	0,r1,r3
>   	mfspr	r2, SPRN_SDR1
> 	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
> 	rlwinm	r2, r2, 28, 0xfffff000
>   	bgt-	112f
> 
> What is the reason or relevance for checking this here ?
> Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
> Do you know the reason of change on this point ?

PAGE_ACCESSED is important for memory management, linux kernel need it.

But instead of spending time at every miss to perform a write which will be a no-op in 99% of cases, 
we prefer bailing out to the page_fault logic when the accessed bit is not set. Then the page_fault 
logic will set the bit.
This also allowed to simplify the handling in __set_pte()_at function by avoiding races in the 
update of PTEs.

> 
> Another remark to Core manual relevant for this:
> There is the reference manual for e300 core available (e300 RM). It includes
> many remarks in range of Memory Management section, that many features
> are optional or variable for dedicated implementations. On the other hand,
> the MPC5121e reference manual refers to the e300 core RM, but DOES NOT
> information, which of the optional points are there or nor. According my
> analysis, MPC5121e does not include any of the optional features.
> 

Not sure what you mean. As far as I understand, that chapter tells you that some functionnalities 
are optional for the powerpc architectecture, and provided (or not) by the e300 core. The MPC5121 
supports all the things that are defined by e300 core.


> 
> Thanks a lot for first reactions

You are welcome, don't hesitate if you have additional questions.

Christophe

^ permalink raw reply

* Re: [PATCH 04/13] module: use RCU to synchronize find_module
From: Christoph Hellwig @ 2021-02-01 11:46 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, linuxppc-dev, Christoph Hellwig
In-Reply-To: <alpine.LSU.2.21.2101291626080.22237@pobox.suse.cz>

On Fri, Jan 29, 2021 at 04:29:02PM +0100, Miroslav Benes wrote:
> >  
> > -	mutex_lock(&module_mutex);
> > +	rcu_read_lock_sched();
> >  	/*
> >  	 * We do not want to block removal of patched modules and therefore
> >  	 * we do not take a reference here. The patches are removed by
> > @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object *obj)
> >  	if (mod && mod->klp_alive)
> 
> RCU always baffles me a bit, so I'll ask. Don't we need 
> rcu_dereference_sched() here? "mod" comes from a RCU-protected list, so I 
> wonder.

rcu_dereference* is only used for dereferencing points where that
reference itself is RCU protected, that is the lookup of mod itself down
in find_module_all in this case.

^ permalink raw reply

* Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
From: Christoph Hellwig @ 2021-02-01 11:47 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jiri Kosina, Andrew Donnellan, linux-kbuild, David Airlie,
	Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst, linux-kernel,
	Maxime Ripard, live-patching, Michal Marek, Joe Lawrence,
	dri-devel, Thomas Zimmermann, Jessica Yu, Frederic Barrat,
	Daniel Vetter, Miroslav Benes, linuxppc-dev, Christoph Hellwig
In-Reply-To: <YBPYyEvesLMrRtZM@alley>

On Fri, Jan 29, 2021 at 10:43:36AM +0100, Petr Mladek wrote:
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -164,12 +164,8 @@ static int klp_find_object_symbol(const char *objname, const char *name,
> >  		.pos = sympos,
> >  	};
> >  
> > -	mutex_lock(&module_mutex);
> > -	if (objname)
> > +	if (objname || !kallsyms_on_each_symbol(klp_find_callback, &args))
> >  		module_kallsyms_on_each_symbol(klp_find_callback, &args);
> > -	else
> > -		kallsyms_on_each_symbol(klp_find_callback, &args);
> > -	mutex_unlock(&module_mutex);
> 
> This change is not needed. (objname == NULL) means that we are
> interested only in symbols in "vmlinux".
> 
> module_kallsyms_on_each_symbol(klp_find_callback, &args)
> will always fail when objname == NULL.

I just tried to keep the old behavior.  I can respin it with your
recommended change noting the change in behavior, though.

^ permalink raw reply

* Re: [RFC 11/20] mm/tlb: remove arch-specific tlb_start/end_vma()
From: Peter Zijlstra @ 2021-02-01 12:09 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrea Arcangeli, linux-s390, x86, Yu Zhao, linuxppc-dev,
	Dave Hansen, linux-kernel, Nick Piggin, linux-mm, Nadav Amit,
	linux-csky, Andy Lutomirski, Andrew Morton, Will Deacon,
	Thomas Gleixner
In-Reply-To: <20210131001132.3368247-12-namit@vmware.com>

On Sat, Jan 30, 2021 at 04:11:23PM -0800, Nadav Amit wrote:

> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 427bfcc6cdec..b97136b7010b 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -334,8 +334,8 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
>  
>  #ifdef CONFIG_MMU_GATHER_NO_RANGE
>  
> -#if defined(tlb_flush) || defined(tlb_start_vma) || defined(tlb_end_vma)
> -#error MMU_GATHER_NO_RANGE relies on default tlb_flush(), tlb_start_vma() and tlb_end_vma()
> +#if defined(tlb_flush)
> +#error MMU_GATHER_NO_RANGE relies on default tlb_flush()
>  #endif
>  
>  /*
> @@ -362,10 +362,6 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
>  
>  #ifndef tlb_flush
>  
> -#if defined(tlb_start_vma) || defined(tlb_end_vma)
> -#error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
> -#endif

#ifdef CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING
#error ....
#endif

goes here...


>  static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
>  {
>  	if (tlb->fullmm)
>  		return;
>  
> +	if (IS_ENABLED(CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING))
> +		return;

Also, can you please stick to the CONFIG_MMU_GATHER_* namespace?

I also don't think AGRESSIVE_FLUSH_BATCHING quite captures what it does.
How about:

	CONFIG_MMU_GATHER_NO_PER_VMA_FLUSH

?



^ permalink raw reply

* Re: [PATCH 04/13] module: use RCU to synchronize find_module
From: Jessica Yu @ 2021-02-01 12:10 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Petr Mladek, Joe Lawrence, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Jiri Kosina, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	dri-devel, Thomas Zimmermann, Josh Poimboeuf, Frederic Barrat,
	Daniel Vetter, linuxppc-dev, Christoph Hellwig
In-Reply-To: <alpine.LSU.2.21.2101291626080.22237@pobox.suse.cz>

+++ Miroslav Benes [29/01/21 16:29 +0100]:
>On Thu, 28 Jan 2021, Christoph Hellwig wrote:
>
>> Allow for a RCU-sched critical section around find_module, following
>> the lower level find_module_all helper, and switch the two callers
>> outside of module.c to use such a RCU-sched critical section instead
>> of module_mutex.
>
>That's a nice idea.
>
>> @@ -57,7 +58,7 @@ static void klp_find_object_module(struct klp_object *obj)
>>  	if (!klp_is_module(obj))
>>  		return;
>>
>> -	mutex_lock(&module_mutex);
>> +	rcu_read_lock_sched();
>>  	/*
>>  	 * We do not want to block removal of patched modules and therefore
>>  	 * we do not take a reference here. The patches are removed by
>> @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object *obj)
>>  	if (mod && mod->klp_alive)
>
>RCU always baffles me a bit, so I'll ask. Don't we need
>rcu_dereference_sched() here? "mod" comes from a RCU-protected list, so I
>wonder.

Same here :-) I had to double check the RCU documentation. For our
modules list case I believe the rcu list API should take care of that
for us. Worth noting is this snippet from Documentation/RCU/whatisRCU.txt:

    rcu_dereference() is typically used indirectly, via the _rcu
    list-manipulation primitives, such as list_for_each_entry_rcu()



^ permalink raw reply

* RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE
From: PLATTNER Christoph @ 2021-02-01 12:35 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: christoph.plattner@gmx.at, KOENIG Werner, HAMETNER Reinhard,
	linux-kernel@vger.kernel.org, REITHER Robert - Contractor,
	PLATTNER Christoph, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <035a7cde-7ffd-5f27-81e1-a8d3648e4c1c@csgroup.eu>

Thank you very much, I appreciate your fast responses.
Thank you also for clarification, I did completely oversee
the permission settings in the segment setup and expected
the fault reaction on the PP bits in the TLB.
And I will re-read the chapters, got get deeper into this topic.

Greetings
Christoph 


-----Original Message-----
From: Christophe Leroy <christophe.leroy@csgroup.eu> 
Sent: Montag, 1. Februar 2021 12:39
To: PLATTNER Christoph <christoph.plattner@thalesgroup.com>; Benjamin Herrenschmidt <benh@kernel.crashing.org>; Paul Mackerras <paulus@samba.org>; Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-kernel@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; HAMETNER Reinhard <reinhard.hametner@thalesgroup.com>; REITHER Robert - Contractor <robert.reither@external.thalesgroup.com>; KOENIG Werner <werner.koenig@thalesgroup.com>
Subject: Re: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE



Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :
> Hello to all, and thank you very much for first and second fast response.
> 
> I do not have a long history on PowerPC MMU environment, I hacked into 
> this topic for about 3 months for analyzing that problem- so, sorry, if I am wrong in some points ...

Yes you are wrong on some points, sorry, see below.


> 
> What I learn so far from this MPC5121e (variant of e300c4 core):
> - It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so always the
>     branch  "if (! Hash) ...." is taken, so, I assume that "key 0" and "key 1" setups are not
>     used on this CPU (not supporting MMU_FTR_HPTE_TABLE)

hash method is not used, this is SW TLB loading that is used, but still, all the PP and Ks/Kp keys defined in the segment register are used, see e300 core reference manual §6.4.2 Page Memory Protection

> - The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does not react.
>     As far I have understood, the TLB miss routines are responsible for checking permissions.
>     The TLB miss routines check the Linux PTE styled entries and generates the PP bits
>     for the TLB entry. The PowerPC PP bits are never check elsewhere on that CPU models ...

PP bits ARE checked hoppefully. If it was not the case, then the TLB miss routines would install a TLB on a read, then the user could do a write without any verification being done ?

Refer to e300 Core reference Manual, §6.1.4 Memory Protection Facilities

As I explained in the patch, the problem is not that the HW doesn't check the permission. It is that user accessed been done with key 0 as programmed in the segment registers, PP 00 means RW access.

> - The PTE entries in Linux are fully "void" in sense of this CPU type, as this CPU does not
>     read any PTEs from RAM (no HW support in contrast to x86 or ARM or later ppc...).

No, the PTE are read by the TLB miss exception handlers and writen into TLB entries.

> 
> In summary - as far as I understand it now - we have to handle the PTE 
> bits differently (Linux style) for PROT_NONE permissions - OR - we 
> have to expand the permission checking like my proposed experimental 
> patch. (PROT_NONE is not NUMA related only, but may not used very often ...).

Yes, expanding the permission checking is the easiest solution, hence the patch I sent out based on your proposal.

> 
> Another related point:
> According e300 RM (manual) the ACCESSED bit in the PTE shall be set on 
> TLB miss, as it is an indication, that page is used. In 4.4 kernel 
> this write back of the _PAGE_ACCESSED bit was performed after successful permission check:
> 
>          bne-    DataAddressInvalid      /* return if access not permitted */
>          ori     r0,r0,_PAGE_ACCESSED    /* set _PAGE_ACCESSED in pte */
>          /*
>           * NOTE! We are assuming this is not an SMP system, otherwise
>           * we would need to update the pte atomically with lwarx/stwcx.
>           */
>          stw     r0,0(r2)                /* update PTE (accessed bit) */
>          /* Convert linux-style PTE to low word of PPC-style PTE */
> 
> Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, 
> this is not needed, as the PTE is never seen by the PPC chip. But I do 
> not understand, WHY the PAGE_ACCCESSED is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):
> 
> 	cmplw	0,r1,r3
>   	mfspr	r2, SPRN_SDR1
> 	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
> 	rlwinm	r2, r2, 28, 0xfffff000
>   	bgt-	112f
> 
> What is the reason or relevance for checking this here ?
> Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
> Do you know the reason of change on this point ?

PAGE_ACCESSED is important for memory management, linux kernel need it.

But instead of spending time at every miss to perform a write which will be a no-op in 99% of cases, we prefer bailing out to the page_fault logic when the accessed bit is not set. Then the page_fault logic will set the bit.
This also allowed to simplify the handling in __set_pte()_at function by avoiding races in the update of PTEs.

> 
> Another remark to Core manual relevant for this:
> There is the reference manual for e300 core available (e300 RM). It includes
> many remarks in range of Memory Management section, that many features
> are optional or variable for dedicated implementations. On the other hand,
> the MPC5121e reference manual refers to the e300 core RM, but DOES NOT
> information, which of the optional points are there or nor. According my
> analysis, MPC5121e does not include any of the optional features.
> 

Not sure what you mean. As far as I understand, that chapter tells you that some functionnalities 
are optional for the powerpc architectecture, and provided (or not) by the e300 core. The MPC5121 
supports all the things that are defined by e300 core.


> 
> Thanks a lot for first reactions

You are welcome, don't hesitate if you have additional questions.

Christophe

^ permalink raw reply

* Re: [RFC 00/20] TLB batching consolidation and enhancements
From: Peter Zijlstra @ 2021-02-01 12:44 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrea Arcangeli, linux-s390, X86 ML, Yu Zhao, Will Deacon,
	Mel Gorman, Dave Hansen, LKML, Nicholas Piggin, Linux-MM,
	linux-csky@vger.kernel.org, Andy Lutomirski, Andrew Morton,
	linuxppc-dev, Thomas Gleixner
In-Reply-To: <A1589669-34AE-4E15-8358-79BAD7C72520@vmware.com>

On Sun, Jan 31, 2021 at 07:57:01AM +0000, Nadav Amit wrote:
> > On Jan 30, 2021, at 7:30 PM, Nicholas Piggin <npiggin@gmail.com> wrote:

> > I'll go through the patches a bit more closely when they all come 
> > through. Sparc and powerpc of course need the arch lazy mode to get 
> > per-page/pte information for operations that are not freeing pages, 
> > which is what mmu gather is designed for.
> 
> IIUC you mean any PTE change requires a TLB flush. Even setting up a new PTE
> where no previous PTE was set, right?

These are the HASH architectures. Their hardware doesn't walk the
page-tables, but it consults a hash-table to resolve page translations.

They _MUST_ flush the entries under the PTL to avoid ever seeing
conflicting information, which will make them really unhappy. They can
do this because they have TLBI broadcast.

There's a few more details I'm sure, but those seem to have slipped from
my mind.

^ permalink raw reply

* Re: [PATCH 04/13] module: use RCU to synchronize find_module
From: Miroslav Benes @ 2021-02-01 13:16 UTC (permalink / raw)
  To: Jessica Yu
  Cc: Petr Mladek, Joe Lawrence, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Jiri Kosina, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	dri-devel, Thomas Zimmermann, Josh Poimboeuf, Frederic Barrat,
	Daniel Vetter, linuxppc-dev, Christoph Hellwig
In-Reply-To: <YBfvvdna9pSeu+1g@gunter>

On Mon, 1 Feb 2021, Jessica Yu wrote:

> +++ Miroslav Benes [29/01/21 16:29 +0100]:
> >On Thu, 28 Jan 2021, Christoph Hellwig wrote:
> >
> >> Allow for a RCU-sched critical section around find_module, following
> >> the lower level find_module_all helper, and switch the two callers
> >> outside of module.c to use such a RCU-sched critical section instead
> >> of module_mutex.
> >
> >That's a nice idea.
> >
> >> @@ -57,7 +58,7 @@ static void klp_find_object_module(struct klp_object
> >> *obj)
> >>   if (!klp_is_module(obj))
> >>    return;
> >>
> >> -	mutex_lock(&module_mutex);
> >> +	rcu_read_lock_sched();
> >>   /*
> >>    * We do not want to block removal of patched modules and therefore
> >>    * we do not take a reference here. The patches are removed by
> >> @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object
> >> *obj)
> >>   if (mod && mod->klp_alive)
> >
> >RCU always baffles me a bit, so I'll ask. Don't we need
> >rcu_dereference_sched() here? "mod" comes from a RCU-protected list, so I
> >wonder.
> 
> Same here :-) I had to double check the RCU documentation. For our
> modules list case I believe the rcu list API should take care of that
> for us. Worth noting is this snippet from Documentation/RCU/whatisRCU.txt:
> 
>    rcu_dereference() is typically used indirectly, via the _rcu
>    list-manipulation primitives, such as list_for_each_entry_rcu()

Ok, thanks to both for checking and explanation.

Ack to the patch then.

Miroslav

^ permalink raw reply

* Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
From: Miroslav Benes @ 2021-02-01 13:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, linuxppc-dev
In-Reply-To: <20210201114749.GB19696@lst.de>

On Mon, 1 Feb 2021, Christoph Hellwig wrote:

> On Fri, Jan 29, 2021 at 10:43:36AM +0100, Petr Mladek wrote:
> > > --- a/kernel/livepatch/core.c
> > > +++ b/kernel/livepatch/core.c
> > > @@ -164,12 +164,8 @@ static int klp_find_object_symbol(const char *objname, const char *name,
> > >  		.pos = sympos,
> > >  	};
> > >  
> > > -	mutex_lock(&module_mutex);
> > > -	if (objname)
> > > +	if (objname || !kallsyms_on_each_symbol(klp_find_callback, &args))
> > >  		module_kallsyms_on_each_symbol(klp_find_callback, &args);
> > > -	else
> > > -		kallsyms_on_each_symbol(klp_find_callback, &args);
> > > -	mutex_unlock(&module_mutex);
> > 
> > This change is not needed. (objname == NULL) means that we are
> > interested only in symbols in "vmlinux".
> > 
> > module_kallsyms_on_each_symbol(klp_find_callback, &args)
> > will always fail when objname == NULL.
> 
> I just tried to keep the old behavior.  I can respin it with your
> recommended change noting the change in behavior, though.

Yes, please. It would be cleaner that way.

Miroslav

^ permalink raw reply

* Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
From: Miroslav Benes @ 2021-02-01 14:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, linuxppc-dev
In-Reply-To: <20210128181421.2279-6-hch@lst.de>

One more thing...

> @@ -4379,8 +4379,7 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
>  	unsigned int i;
>  	int ret;
>  
> -	module_assert_mutex();
> -
> +	mutex_lock(&module_mutex);
>  	list_for_each_entry(mod, &modules, list) {
>  		/* We hold module_mutex: no need for rcu_dereference_sched */
>  		struct mod_kallsyms *kallsyms = mod->kallsyms;

This was the last user of module_assert_mutex(), which can be removed now.

Miroslav

^ permalink raw reply

* RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Christopher M. Riedl @ 2021-02-01 15:55 UTC (permalink / raw)
  To: David Laight, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <6a6ce1a53fcf4669a9848114d3460fef@AcuMS.aculab.com>

On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote:
> From: Christopher M. Riedl
> > Sent: 28 January 2021 04:04
> > 
> > Reuse the "safe" implementation from signal.c except for calling
> > unsafe_copy_from_user() to copy into a local buffer.
> > 
> > Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
> > ---
> >  arch/powerpc/kernel/signal.h | 33 +++++++++++++++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> > 
> > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
> > index 2559a681536e..c18402d625f1 100644
> > --- a/arch/powerpc/kernel/signal.h
> > +++ b/arch/powerpc/kernel/signal.h
> > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
> >  				&buf[i], label);\
> >  } while (0)
> > 
> > +#define unsafe_copy_fpr_from_user(task, from, label)	do {		\
> > +	struct task_struct *__t = task;					\
> > +	u64 __user *__f = (u64 __user *)from;				\
> > +	u64 buf[ELF_NFPREG];						\
>
> How big is that buffer?
> Isn't is likely to be reasonably large compared to a reasonable
> kernel stack frame.
> Especially since this isn't even a leaf function.
>

I think Christophe answered this - I don't really have an opinion either
way. What would be a 'reasonable' kernel stack frame for reference?

> > +	int i;								\
> > +									\
> > +	unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),	\
>
> That really ought to be sizeof(buf).
>

Agreed, I will fix this. Thanks!

> David
>
>
> > +				label);					\
> > +	for (i = 0; i < ELF_NFPREG - 1; i++)				\
> > +		__t->thread.TS_FPR(i) = buf[i];				\
> > +	__t->thread.fp_state.fpscr = buf[i];				\
> > +} while (0)
> > +
> > +#define unsafe_copy_vsx_from_user(task, from, label)	do {		\
> > +	struct task_struct *__t = task;					\
> > +	u64 __user *__f = (u64 __user *)from;				\
> > +	u64 buf[ELF_NVSRHALFREG];					\
> > +	int i;								\
> > +									\
> > +	unsafe_copy_from_user(buf, __f,					\
> > +				ELF_NVSRHALFREG * sizeof(double),	\
> > +				label);					\
> > +	for (i = 0; i < ELF_NVSRHALFREG ; i++)				\
> > +		__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];	\
> > +} while (0)
> > +
> > +
> >  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> >  #define unsafe_copy_ckfpr_to_user(to, task, label)	do {		\
> >  	struct task_struct *__t = task;					\
> > @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
> >  	unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,	\
> >  			    ELF_NFPREG * sizeof(double), label)
> > 
> > +#define unsafe_copy_fpr_from_user(task, from, label)			\
> > +	unsafe_copy_from_user((task)->thread.fp_state.fpr, from,	\
> > +			    ELF_NFPREG * sizeof(double), label)
> > +
> >  static inline unsigned long
> >  copy_fpr_to_user(void __user *to, struct task_struct *task)
> >  {
> > @@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from)
> >  #else
> >  #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0)
> > 
> > +#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0)
> > +
> >  static inline unsigned long
> >  copy_fpr_to_user(void __user *to, struct task_struct *task)
> >  {
> > --
> > 2.26.1
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> MK1 1PT, UK
> Registration No: 1397386 (Wales)


^ permalink raw reply

* RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: David Laight @ 2021-02-01 16:15 UTC (permalink / raw)
  To: 'Christopher M. Riedl', linuxppc-dev@lists.ozlabs.org
In-Reply-To: <C8YBET4IGYGF.3QYANVRRHMV0R@geist>

From: Christopher M. Riedl
> Sent: 01 February 2021 15:56
> 
> On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote:
> > From: Christopher M. Riedl
> > > Sent: 28 January 2021 04:04
> > >
> > > Reuse the "safe" implementation from signal.c except for calling
> > > unsafe_copy_from_user() to copy into a local buffer.
> > >
> > > Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
> > > ---
> > >  arch/powerpc/kernel/signal.h | 33 +++++++++++++++++++++++++++++++++
> > >  1 file changed, 33 insertions(+)
> > >
> > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
> > > index 2559a681536e..c18402d625f1 100644
> > > --- a/arch/powerpc/kernel/signal.h
> > > +++ b/arch/powerpc/kernel/signal.h
> > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
> > >  				&buf[i], label);\
> > >  } while (0)
> > >
> > > +#define unsafe_copy_fpr_from_user(task, from, label)	do {		\
> > > +	struct task_struct *__t = task;					\
> > > +	u64 __user *__f = (u64 __user *)from;				\
> > > +	u64 buf[ELF_NFPREG];						\
> >
> > How big is that buffer?
> > Isn't is likely to be reasonably large compared to a reasonable
> > kernel stack frame.
> > Especially since this isn't even a leaf function.
> >
> 
> I think Christophe answered this - I don't really have an opinion either
> way. What would be a 'reasonable' kernel stack frame for reference?

Zero :-)

> 
> > > +	int i;								\
> > > +									\
> > > +	unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),	\
> > > +				label);					\
> > > +	for (i = 0; i < ELF_NFPREG - 1; i++)				\
> > > +		__t->thread.TS_FPR(i) = buf[i];				\
> > > +	__t->thread.fp_state.fpscr = buf[i];				\
> > > +} while (0)

On further reflection, since you immediately loop through the buffer
why not just use user_access_begin() and unsafe_get_user() in the loop.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply

* Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
From: Christoph Hellwig @ 2021-02-01 16:28 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, linuxppc-dev, Christoph Hellwig
In-Reply-To: <alpine.LSU.2.21.2102011436320.21637@pobox.suse.cz>

On Mon, Feb 01, 2021 at 02:37:12PM +0100, Miroslav Benes wrote:
> > > This change is not needed. (objname == NULL) means that we are
> > > interested only in symbols in "vmlinux".
> > > 
> > > module_kallsyms_on_each_symbol(klp_find_callback, &args)
> > > will always fail when objname == NULL.
> > 
> > I just tried to keep the old behavior.  I can respin it with your
> > recommended change noting the change in behavior, though.
> 
> Yes, please. It would be cleaner that way.

Let me know if this works for you:

---
From 18af41e88d088cfb8680d1669fcae2bc2ede5328 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Wed, 20 Jan 2021 16:23:16 +0100
Subject: kallsyms: refactor {,module_}kallsyms_on_each_symbol

Require an explicit call to module_kallsyms_on_each_symbol to look
for symbols in modules instead of the call from kallsyms_on_each_symbol,
and acquire module_mutex inside of module_kallsyms_on_each_symbol instead
of leaving that up to the caller.  Note that this slightly changes the
behavior for the livepatch code in that the symbols from vmlinux are not
iterated anymore if objname is set, but that actually is the desired
behavior in this case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/kallsyms.c       |  6 +++++-
 kernel/livepatch/core.c |  2 --
 kernel/module.c         | 13 ++++---------
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index fe9de067771c34..a0d3f0865916f9 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -177,6 +177,10 @@ unsigned long kallsyms_lookup_name(const char *name)
 	return module_kallsyms_lookup_name(name);
 }
 
+/*
+ * Iterate over all symbols in vmlinux.  For symbols from modules use
+ * module_kallsyms_on_each_symbol instead.
+ */
 int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 				      unsigned long),
 			    void *data)
@@ -192,7 +196,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 		if (ret != 0)
 			return ret;
 	}
-	return module_kallsyms_on_each_symbol(fn, data);
+	return 0;
 }
 
 static unsigned long get_symbol_pos(unsigned long addr,
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 262cd9b003b9f0..335d988bd81117 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -164,12 +164,10 @@ static int klp_find_object_symbol(const char *objname, const char *name,
 		.pos = sympos,
 	};
 
-	mutex_lock(&module_mutex);
 	if (objname)
 		module_kallsyms_on_each_symbol(klp_find_callback, &args);
 	else
 		kallsyms_on_each_symbol(klp_find_callback, &args);
-	mutex_unlock(&module_mutex);
 
 	/*
 	 * Ensure an address was found. If sympos is 0, ensure symbol is unique;
diff --git a/kernel/module.c b/kernel/module.c
index 6772fb2680eb3e..25345792c770d1 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -255,11 +255,6 @@ static void mod_update_bounds(struct module *mod)
 struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */
 #endif /* CONFIG_KGDB_KDB */
 
-static void module_assert_mutex(void)
-{
-	lockdep_assert_held(&module_mutex);
-}
-
 static void module_assert_mutex_or_preempt(void)
 {
 #ifdef CONFIG_LOCKDEP
@@ -4379,8 +4374,7 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
 	unsigned int i;
 	int ret;
 
-	module_assert_mutex();
-
+	mutex_lock(&module_mutex);
 	list_for_each_entry(mod, &modules, list) {
 		/* We hold module_mutex: no need for rcu_dereference_sched */
 		struct mod_kallsyms *kallsyms = mod->kallsyms;
@@ -4396,10 +4390,11 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
 			ret = fn(data, kallsyms_symbol_name(kallsyms, i),
 				 mod, kallsyms_symbol_value(sym));
 			if (ret != 0)
-				return ret;
+				break;
 		}
 	}
-	return 0;
+	mutex_unlock(&module_mutex);
+	return ret;
 }
 #endif /* CONFIG_KALLSYMS */
 
-- 
2.29.2


^ permalink raw reply related

* RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Christopher M. Riedl @ 2021-02-01 16:55 UTC (permalink / raw)
  To: David Laight, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <0433d40adacc47a3a27bc8bc35e076e3@AcuMS.aculab.com>

On Mon Feb 1, 2021 at 10:15 AM CST, David Laight wrote:
> From: Christopher M. Riedl
> > Sent: 01 February 2021 15:56
> > 
> > On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote:
> > > From: Christopher M. Riedl
> > > > Sent: 28 January 2021 04:04
> > > >
> > > > Reuse the "safe" implementation from signal.c except for calling
> > > > unsafe_copy_from_user() to copy into a local buffer.
> > > >
> > > > Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
> > > > ---
> > > >  arch/powerpc/kernel/signal.h | 33 +++++++++++++++++++++++++++++++++
> > > >  1 file changed, 33 insertions(+)
> > > >
> > > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
> > > > index 2559a681536e..c18402d625f1 100644
> > > > --- a/arch/powerpc/kernel/signal.h
> > > > +++ b/arch/powerpc/kernel/signal.h
> > > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
> > > >  				&buf[i], label);\
> > > >  } while (0)
> > > >
> > > > +#define unsafe_copy_fpr_from_user(task, from, label)	do {		\
> > > > +	struct task_struct *__t = task;					\
> > > > +	u64 __user *__f = (u64 __user *)from;				\
> > > > +	u64 buf[ELF_NFPREG];						\
> > >
> > > How big is that buffer?
> > > Isn't is likely to be reasonably large compared to a reasonable
> > > kernel stack frame.
> > > Especially since this isn't even a leaf function.
> > >
> > 
> > I think Christophe answered this - I don't really have an opinion either
> > way. What would be a 'reasonable' kernel stack frame for reference?
>
> Zero :-)
>

Hehe good point!

> > 
> > > > +	int i;								\
> > > > +									\
> > > > +	unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),	\
> > > > +				label);					\
> > > > +	for (i = 0; i < ELF_NFPREG - 1; i++)				\
> > > > +		__t->thread.TS_FPR(i) = buf[i];				\
> > > > +	__t->thread.fp_state.fpscr = buf[i];				\
> > > > +} while (0)
>
> On further reflection, since you immediately loop through the buffer
> why not just use user_access_begin() and unsafe_get_user() in the loop.

Christophe had suggested this a few revisions ago as well. When I tried
this approach, the signal handling performance took a pretty big hit:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-October/219351.html

I included some numbers on v3 as well but decided to drop the approach
altogether for this one since it just didn't seem worth the hit.

>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> MK1 1PT, UK
> Registration No: 1397386 (Wales)


^ permalink raw reply

* Re: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Gabriel Paubert @ 2021-02-01 16:54 UTC (permalink / raw)
  To: Christopher M. Riedl; +Cc: David Laight, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <C8YBET4IGYGF.3QYANVRRHMV0R@geist>

On Mon, Feb 01, 2021 at 09:55:44AM -0600, Christopher M. Riedl wrote:
> On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote:
> > From: Christopher M. Riedl
> > > Sent: 28 January 2021 04:04
> > > 
> > > Reuse the "safe" implementation from signal.c except for calling
> > > unsafe_copy_from_user() to copy into a local buffer.
> > > 
> > > Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
> > > ---
> > >  arch/powerpc/kernel/signal.h | 33 +++++++++++++++++++++++++++++++++
> > >  1 file changed, 33 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
> > > index 2559a681536e..c18402d625f1 100644
> > > --- a/arch/powerpc/kernel/signal.h
> > > +++ b/arch/powerpc/kernel/signal.h
> > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
> > >  				&buf[i], label);\
> > >  } while (0)
> > > 
> > > +#define unsafe_copy_fpr_from_user(task, from, label)	do {		\
> > > +	struct task_struct *__t = task;					\
> > > +	u64 __user *__f = (u64 __user *)from;				\
> > > +	u64 buf[ELF_NFPREG];						\
> >
> > How big is that buffer?
> > Isn't is likely to be reasonably large compared to a reasonable
> > kernel stack frame.
> > Especially since this isn't even a leaf function.
> >
> 
> I think Christophe answered this - I don't really have an opinion either
> way. What would be a 'reasonable' kernel stack frame for reference?

See include/linux/poll.h, where the limit is of the order of 800 bytes
and the number of entries in an on stack array is chosen at compile time
(different between 32 and 64 bit for example).

The values are used in do_sys_poll, which, with almost 1000 bytes of
stack footprint, appears close to the top of "make checkstack". In
addition do_sys_poll has to call the ->poll function of every file
descriptor in its table, so it is not a tail function.

This 264 bytes array looks reasonable, but please use 'make checkstack'
to verify that the function's total stack usage stays within reason.

	Gabriel

> 
> > > +	int i;								\
> > > +									\
> > > +	unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),	\
> >
> > That really ought to be sizeof(buf).
> >
> 
> Agreed, I will fix this. Thanks!
> 
> > David
> >
> >
> > > +				label);					\
> > > +	for (i = 0; i < ELF_NFPREG - 1; i++)				\
> > > +		__t->thread.TS_FPR(i) = buf[i];				\
> > > +	__t->thread.fp_state.fpscr = buf[i];				\
> > > +} while (0)
> > > +
> > > +#define unsafe_copy_vsx_from_user(task, from, label)	do {		\
> > > +	struct task_struct *__t = task;					\
> > > +	u64 __user *__f = (u64 __user *)from;				\
> > > +	u64 buf[ELF_NVSRHALFREG];					\
> > > +	int i;								\
> > > +									\
> > > +	unsafe_copy_from_user(buf, __f,					\
> > > +				ELF_NVSRHALFREG * sizeof(double),	\
> > > +				label);					\
> > > +	for (i = 0; i < ELF_NVSRHALFREG ; i++)				\
> > > +		__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];	\
> > > +} while (0)
> > > +
> > > +
> > >  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> > >  #define unsafe_copy_ckfpr_to_user(to, task, label)	do {		\
> > >  	struct task_struct *__t = task;					\
> > > @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
> > >  	unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,	\
> > >  			    ELF_NFPREG * sizeof(double), label)
> > > 
> > > +#define unsafe_copy_fpr_from_user(task, from, label)			\
> > > +	unsafe_copy_from_user((task)->thread.fp_state.fpr, from,	\
> > > +			    ELF_NFPREG * sizeof(double), label)
> > > +
> > >  static inline unsigned long
> > >  copy_fpr_to_user(void __user *to, struct task_struct *task)
> > >  {
> > > @@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from)
> > >  #else
> > >  #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0)
> > > 
> > > +#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0)
> > > +
> > >  static inline unsigned long
> > >  copy_fpr_to_user(void __user *to, struct task_struct *task)
> > >  {
> > > --
> > > 2.26.1
> >
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> > MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> 
 


^ permalink raw reply

* RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: David Laight @ 2021-02-01 17:37 UTC (permalink / raw)
  To: 'Christopher M. Riedl', linuxppc-dev@lists.ozlabs.org
In-Reply-To: <C8YCOH19N9EX.3LXG80WZT1N37@geist>

From: Christopher M. Riedl
> Sent: 01 February 2021 16:55
...
> > > > > +	int i;								\
> > > > > +									\
> > > > > +	unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),	\
> > > > > +				label);					\
> > > > > +	for (i = 0; i < ELF_NFPREG - 1; i++)				\
> > > > > +		__t->thread.TS_FPR(i) = buf[i];				\
> > > > > +	__t->thread.fp_state.fpscr = buf[i];				\
> > > > > +} while (0)
> >
> > On further reflection, since you immediately loop through the buffer
> > why not just use user_access_begin() and unsafe_get_user() in the loop.
> 
> Christophe had suggested this a few revisions ago as well. When I tried
> this approach, the signal handling performance took a pretty big hit:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-October/219351.html
> 
> I included some numbers on v3 as well but decided to drop the approach
> altogether for this one since it just didn't seem worth the hit.

Was that using unsafe_get_user (which relies on user_access_begin()
having 'opened up' user accesses) or just get_user() that does
it for every access?

The former should be ok, the latter will be horrid.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply

* RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Christopher M. Riedl @ 2021-02-01 17:43 UTC (permalink / raw)
  To: David Laight, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <f6234b06ccb54cffb3583f40635636d3@AcuMS.aculab.com>

On Mon Feb 1, 2021 at 11:37 AM CST, David Laight wrote:
> From: Christopher M. Riedl
> > Sent: 01 February 2021 16:55
> ...
> > > > > > +	int i;								\
> > > > > > +									\
> > > > > > +	unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),	\
> > > > > > +				label);					\
> > > > > > +	for (i = 0; i < ELF_NFPREG - 1; i++)				\
> > > > > > +		__t->thread.TS_FPR(i) = buf[i];				\
> > > > > > +	__t->thread.fp_state.fpscr = buf[i];				\
> > > > > > +} while (0)
> > >
> > > On further reflection, since you immediately loop through the buffer
> > > why not just use user_access_begin() and unsafe_get_user() in the loop.
> > 
> > Christophe had suggested this a few revisions ago as well. When I tried
> > this approach, the signal handling performance took a pretty big hit:
> > https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-October/219351.html
> > 
> > I included some numbers on v3 as well but decided to drop the approach
> > altogether for this one since it just didn't seem worth the hit.
>
> Was that using unsafe_get_user (which relies on user_access_begin()
> having 'opened up' user accesses) or just get_user() that does
> it for every access?
>
> The former should be ok, the latter will be horrid.

It was using unsafe_get_user() whereas unsafe_copy_from_user() will just
call the optimized __copy_tofrom_user() a single time - assuming that
user access is open.

>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> MK1 1PT, UK
> Registration No: 1397386 (Wales)


^ permalink raw reply

* Re: [PATCH v15 09/10] arm64: Call kmalloc() to allocate DTB buffer
From: Thiago Jung Bauermann @ 2021-02-01 18:58 UTC (permalink / raw)
  To: Joe Perches
  Cc: mark.rutland, bhsharma, tao.li, zohar, paulus, vincenzo.frascino,
	frowand.list, sashal, robh, Lakshmi Ramasubramanian, masahiroy,
	jmorris, takahiro.akashi, linux-arm-kernel, catalin.marinas,
	serge, devicetree, pasha.tatashin, Will Deacon, prsriva, hsinyi,
	allison, christophe.leroy, mbrugger, balajib, dmitry.kasatkin,
	linux-kernel, james.morse, gregkh, linux-integrity, linuxppc-dev
In-Reply-To: <d6ddcab2f9db25f49e89f37e1cb4f59ad42651e6.camel@perches.com>


Joe Perches <joe@perches.com> writes:

> On Thu, 2021-01-28 at 00:52 -0300, Thiago Jung Bauermann wrote:
>> The problem is that this patch implements only part of the suggestion,
>> which isn't useful in itself. So the patch series should either drop
>> this patch or consolidate the FDT allocation between the arches.
>> 
>> I just tested on powernv and pseries platforms and powerpc can use
>> vmalloc for the FDT buffer.
>
> Perhaps more sensible to use kvmalloc/kvfree.

That's true. Converting both arm64 to powerpc to kvmalloc/kvfree is a
good option. I don't think it's that much better though, because
kexec_file_load() is called infrequently and doesn't need to be fast so
the vmalloc() overhead isn't important in practice.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

* [PATCH] powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64 semantics
From: Raoni Fassina Firmino @ 2021-02-01 20:05 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Tested on powerpc64 and powerpc64le, with a glibc build and running the
affected glibc's testcase[2], inspected that glibc's backtrace() now gives
the correct result and gdb backtrace also keeps working as before.

I believe this should be backported to releases 5.9 and 5.10 as userspace
is affected in this releases.

---- 8< ----

A Change[1] in __kernel_sigtramp_rt64 VDSO and trampoline code introduced a
regression in the way glibc's backtrace()[2] detects the signal-handler
stack frame.  Apart from the practical implications, __kernel_sigtram_rt64
was a VDSO with the semantics that it is a function you can call from
userspace to end a signal handling.  Now this semantics are no longer
valid.

I believe the aforementioned change affects all releases since 5.9.

This patch tries to fix both the semantics and practical aspect of
__kernel_sigtramp_rt64 returning it to the previous code, whilst keeping
the intended behavior from[1] by adding a new symbol to serve as the jump
target from the kernel to the trampoline. Now the trampoline has two parts,
an new entry point and the old return point.

[1] commit 0138ba5783ae0dcc799ad401a1e8ac8333790df9 ("powerpc/64/signal:
    Balance return predictor stack in signal trampoline")
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-January/223194.html

Fixes: 0138ba5783ae ("powerpc/64/signal: Balance return predictor stack in signal trampoline")
Signed-off-by: Raoni Fassina Firmino <raoni@linux.ibm.com>
---
 arch/powerpc/kernel/vdso64/sigtramp.S   | 9 ++++++++-
 arch/powerpc/kernel/vdso64/vdso64.lds.S | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso64/sigtramp.S b/arch/powerpc/kernel/vdso64/sigtramp.S
index bbf68cd01088..f0fd8d2a9fc4 100644
--- a/arch/powerpc/kernel/vdso64/sigtramp.S
+++ b/arch/powerpc/kernel/vdso64/sigtramp.S
@@ -15,11 +15,18 @@

 	.text

+/* __kernel_start_sigtramp_rt64 and __kernel_sigtramp_rt64 together
+   are one function split in two parts. The kernel jumps to the former
+   and the signal handler indirectly (by blr) returns to the latter.
+   __kernel_sigtramp_rt64 needs to point to the return address so
+   glibc can correctly identify the trampoline stack frame.  */
 	.balign 8
 	.balign IFETCH_ALIGN_BYTES
-V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
+V_FUNCTION_BEGIN(__kernel_start_sigtramp_rt64)
 .Lsigrt_start:
 	bctrl	/* call the handler */
+V_FUNCTION_END(__kernel_start_sigtramp_rt64)
+V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
 	addi	r1, r1, __SIGNAL_FRAMESIZE
 	li	r0,__NR_rt_sigreturn
 	sc
diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S b/arch/powerpc/kernel/vdso64/vdso64.lds.S
index 6164d1a1ba11..2f3c359cacd3 100644
--- a/arch/powerpc/kernel/vdso64/vdso64.lds.S
+++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S
@@ -131,4 +131,4 @@ VERSION
 /*
  * Make the sigreturn code visible to the kernel.
  */
-VDSO_sigtramp_rt64	= __kernel_sigtramp_rt64;
+VDSO_sigtramp_rt64	= __kernel_start_sigtramp_rt64;

base-commit: 76c057c84d286140c6c416c3b4ba832cd1d8984e
-- 
2.26.2

^ permalink raw reply related

* Re: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Christopher M. Riedl @ 2021-02-01 20:55 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: David Laight, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20210201165440.GA8929@lt-gp.iram.es>

On Mon Feb 1, 2021 at 10:54 AM CST, Gabriel Paubert wrote:
> On Mon, Feb 01, 2021 at 09:55:44AM -0600, Christopher M. Riedl wrote:
> > On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote:
> > > From: Christopher M. Riedl
> > > > Sent: 28 January 2021 04:04
> > > > 
> > > > Reuse the "safe" implementation from signal.c except for calling
> > > > unsafe_copy_from_user() to copy into a local buffer.
> > > > 
> > > > Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
> > > > ---
> > > >  arch/powerpc/kernel/signal.h | 33 +++++++++++++++++++++++++++++++++
> > > >  1 file changed, 33 insertions(+)
> > > > 
> > > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
> > > > index 2559a681536e..c18402d625f1 100644
> > > > --- a/arch/powerpc/kernel/signal.h
> > > > +++ b/arch/powerpc/kernel/signal.h
> > > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
> > > >  				&buf[i], label);\
> > > >  } while (0)
> > > > 
> > > > +#define unsafe_copy_fpr_from_user(task, from, label)	do {		\
> > > > +	struct task_struct *__t = task;					\
> > > > +	u64 __user *__f = (u64 __user *)from;				\
> > > > +	u64 buf[ELF_NFPREG];						\
> > >
> > > How big is that buffer?
> > > Isn't is likely to be reasonably large compared to a reasonable
> > > kernel stack frame.
> > > Especially since this isn't even a leaf function.
> > >
> > 
> > I think Christophe answered this - I don't really have an opinion either
> > way. What would be a 'reasonable' kernel stack frame for reference?
>
> See include/linux/poll.h, where the limit is of the order of 800 bytes
> and the number of entries in an on stack array is chosen at compile time
> (different between 32 and 64 bit for example).
>
> The values are used in do_sys_poll, which, with almost 1000 bytes of
> stack footprint, appears close to the top of "make checkstack". In
> addition do_sys_poll has to call the ->poll function of every file
> descriptor in its table, so it is not a tail function.
>
> This 264 bytes array looks reasonable, but please use 'make checkstack'
> to verify that the function's total stack usage stays within reason.

Neat, looks like total usage is a bit larger but still reasonable and
less than half of 800B:

0xc000000000017e900 __unsafe_restore_sigcontext.constprop.0 [vmlinux]:352

Thanks for the tip!

>
> Gabriel
>
> > 
> > > > +	int i;								\
> > > > +									\
> > > > +	unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),	\
> > >
> > > That really ought to be sizeof(buf).
> > >
> > 
> > Agreed, I will fix this. Thanks!
> > 
> > > David
> > >
> > >
> > > > +				label);					\
> > > > +	for (i = 0; i < ELF_NFPREG - 1; i++)				\
> > > > +		__t->thread.TS_FPR(i) = buf[i];				\
> > > > +	__t->thread.fp_state.fpscr = buf[i];				\
> > > > +} while (0)
> > > > +
> > > > +#define unsafe_copy_vsx_from_user(task, from, label)	do {		\
> > > > +	struct task_struct *__t = task;					\
> > > > +	u64 __user *__f = (u64 __user *)from;				\
> > > > +	u64 buf[ELF_NVSRHALFREG];					\
> > > > +	int i;								\
> > > > +									\
> > > > +	unsafe_copy_from_user(buf, __f,					\
> > > > +				ELF_NVSRHALFREG * sizeof(double),	\
> > > > +				label);					\
> > > > +	for (i = 0; i < ELF_NVSRHALFREG ; i++)				\
> > > > +		__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];	\
> > > > +} while (0)
> > > > +
> > > > +
> > > >  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> > > >  #define unsafe_copy_ckfpr_to_user(to, task, label)	do {		\
> > > >  	struct task_struct *__t = task;					\
> > > > @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
> > > >  	unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,	\
> > > >  			    ELF_NFPREG * sizeof(double), label)
> > > > 
> > > > +#define unsafe_copy_fpr_from_user(task, from, label)			\
> > > > +	unsafe_copy_from_user((task)->thread.fp_state.fpr, from,	\
> > > > +			    ELF_NFPREG * sizeof(double), label)
> > > > +
> > > >  static inline unsigned long
> > > >  copy_fpr_to_user(void __user *to, struct task_struct *task)
> > > >  {
> > > > @@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from)
> > > >  #else
> > > >  #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0)
> > > > 
> > > > +#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0)
> > > > +
> > > >  static inline unsigned long
> > > >  copy_fpr_to_user(void __user *to, struct task_struct *task)
> > > >  {
> > > > --
> > > > 2.26.1
> > >
> > > -
> > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> > > MK1 1PT, UK
> > > Registration No: 1397386 (Wales)
> > 
>  


^ permalink raw reply

* Re: [PATCH] ASoC: fsl_xcvr: remove unneeded semicolon
From: Shengjiu Wang @ 2021-02-02  2:30 UTC (permalink / raw)
  To: Yang Li
  Cc: alsa-devel, Timur Tabi, Xiubo Li, linuxppc-dev, Takashi Iwai,
	Liam Girdwood, perex, Nicolin Chen, Mark Brown, p.zabel,
	Fabio Estevam, linux-kernel
In-Reply-To: <1612166909-129900-1-git-send-email-yang.lee@linux.alibaba.com>

On Mon, Feb 1, 2021 at 4:08 PM Yang Li <yang.lee@linux.alibaba.com> wrote:
>
> Eliminate the following coccicheck warning:
> ./sound/soc/fsl/fsl_xcvr.c:739:2-3: Unneeded semicolon
>
> Reported-by: Abaci Robot <abaci@linux.alibaba.com>
> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>

Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com>

^ permalink raw reply

* Re: [PATCH] ASoC: fsl_spdif: Utilize the defined parameter to clear code
From: Shengjiu Wang @ 2021-02-02  2:34 UTC (permalink / raw)
  To: Tang Bin
  Cc: alsa-devel, Timur Tabi, Xiubo Li, linux-kernel, Takashi Iwai,
	Liam Girdwood, perex, Nicolin Chen, Mark Brown, linuxppc-dev
In-Reply-To: <20210128112714.16324-1-tangbin@cmss.chinamobile.com>

On Thu, Jan 28, 2021 at 7:28 PM Tang Bin <tangbin@cmss.chinamobile.com> wrote:
>
> Utilize the defined parameter 'dev' to make the code cleaner.
>
> Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>

Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com>

^ permalink raw reply

* Re: [PATCH net v2] ibmvnic: device remove has higher precedence over reset
From: patchwork-bot+netdevbpf @ 2021-02-02  2:40 UTC (permalink / raw)
  To: Lijun Pan
  Cc: gregkh, julietk, netdev, u.kleine-koenig, paulus, kernel, drt,
	kuba, sukadev, linuxppc-dev, davem
In-Reply-To: <20210129043402.95744-1-ljp@linux.ibm.com>

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Thu, 28 Jan 2021 22:34:01 -0600 you wrote:
> Returning -EBUSY in ibmvnic_remove() does not actually hold the
> removal procedure since driver core doesn't care for the return
> value (see __device_release_driver() in drivers/base/dd.c
> calling dev->bus->remove()) though vio_bus_remove
> (in arch/powerpc/platforms/pseries/vio.c) records the
> return value and passes it on. [1]
> 
> [...]

Here is the summary with links:
  - [net,v2] ibmvnic: device remove has higher precedence over reset
    https://git.kernel.org/netdev/net/c/5e9eff5dfa46

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [RFC PATCH 0/9] KVM: PPC: Book3S: C-ify the P9 entry/exit code
From: Nicholas Piggin @ 2021-02-02  3:03 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin

This series goes on top of the KVM patches I sent for the next
merge window. It's pretty rough and incomplete at the moment but
has started booting a simple guest into busybox in sim.

One of the main points of this is to avoid running KVM in a special
mode (aka "realmode") which is hostile to the rest of Linux and can't
be instrumented well by core facilities or generally use core code.

The asm itself is also tricky to follow. It's nothing compared with the
old HV path when you sit down and go through it, but it's not trivial
and sharing code paths makes things painful too.

One issue is what to do about PR-KVM and pre-ISAv3.0 / HPT HV-KVM.

The old path could also be converted similarly to C, although that's a
far bigger job. At least removing the asm code sharing makes a (slight)
simplification to the old path for now.

This change is a pretty clean break in the low level exit/entry code,
and the rest of the code already has many divergences, so I don't think
this exacerbates the problem, but if those PR / old-HV are to be
maintained well then we should eat the cost early to modernise that
code too. If they stay walled off with these interface shims then
they'll just be harder to maintain and the problem will definitely not
get better.

Now, the number of people who understand PR-KVM and have time to
maintain it is roughly zero, and for the old HV path it's approaching
zero. The radix, LPAR-per-thread programming model by comparison is so
nice and simple, maybe its better to just keep the rest on life support
and keep them going for as long as we can with the bare minimum.

Thanks,
Nick

Nicholas Piggin (9):
  KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point
  KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM
  KVM: PPC: Book3S 64: add hcall interrupt handler
  KVM: PPC: Book3S HV: Move hcall early register setup to KVM
  powerpc/64s: Remove EXSLB interrupt save area
  KVM: PPC: Book3S HV: Move interrupt early register setup to KVM
  KVM: PPC: Book3S HV: move bad_host_intr check to HV handler
  KVM: PPC: Book3S HV: Minimise hcall handler calling convention
    differences
  KVM: PPC: Book3S HV: Implement the rest of the P9 entry/exit handling
    in C

 arch/powerpc/include/asm/asm-prototypes.h |   2 +-
 arch/powerpc/include/asm/exception-64s.h  |  13 ++
 arch/powerpc/include/asm/kvm_asm.h        |   3 +-
 arch/powerpc/include/asm/kvm_book3s_64.h  |   2 +
 arch/powerpc/include/asm/kvm_ppc.h        |   2 +
 arch/powerpc/include/asm/paca.h           |   3 +-
 arch/powerpc/kernel/asm-offsets.c         |   1 -
 arch/powerpc/kernel/exceptions-64s.S      | 259 +++-------------------
 arch/powerpc/kvm/Makefile                 |   6 +
 arch/powerpc/kvm/book3s_64_entry.S        | 232 +++++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c              |  21 +-
 arch/powerpc/kvm/book3s_hv_interrupt.c    | 164 ++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 136 ++----------
 arch/powerpc/kvm/book3s_segment.S         |   7 +
 arch/powerpc/kvm/book3s_xive.c            |  32 +++
 15 files changed, 523 insertions(+), 360 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_entry.S
 create mode 100644 arch/powerpc/kvm/book3s_hv_interrupt.c

-- 
2.23.0

^ permalink raw reply

* [RFC PATCH 1/9] KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point
From: Nicholas Piggin @ 2021-02-02  3:03 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210202030313.3509446-1-npiggin@gmail.com>

Rather than bifurcate the call depending on whether or not HV is
possible, and have the HV entry test for PR, just make a single
common point which does the demultiplexing. This makes it simpler
to add another type of exit handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S    |  8 +-----
 arch/powerpc/kvm/Makefile               |  3 +++
 arch/powerpc/kvm/book3s_64_entry.S      | 34 +++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 ++------
 4 files changed, 40 insertions(+), 16 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_entry.S

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index e02ad6fefa46..65659ea3cec4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -212,7 +212,6 @@ do_define_int n
 .endm
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
  * All interrupts which set HSRR registers, as well as SRESET and MCE and
  * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken,
@@ -242,13 +241,8 @@ do_define_int n
 
 /*
  * If an interrupt is taken while a guest is running, it is immediately routed
- * to KVM to handle. If both HV and PR KVM arepossible, KVM interrupts go first
- * to kvmppc_interrupt_hv, which handles the PR guest case.
+ * to KVM to handle.
  */
-#define kvmppc_interrupt kvmppc_interrupt_hv
-#else
-#define kvmppc_interrupt kvmppc_interrupt_pr
-#endif
 
 .macro KVMTEST name
 	lbz	r10,HSTATE_IN_GUEST(r13)
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 2bfeaa13befb..cdd119028f64 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -59,6 +59,9 @@ kvm-pr-y := \
 kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
 	tm.o
 
+kvm-book3s_64-builtin-objs-y += \
+	book3s_64_entry.o
+
 ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
 	book3s_rmhandlers.o
diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
new file mode 100644
index 000000000000..22e34b95f478
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_entry.S
@@ -0,0 +1,34 @@
+#include <asm/cache.h>
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/asm-offsets.h>
+#include <asm/kvm_book3s_asm.h>
+
+/*
+ * We come here from the first-level interrupt handlers.
+ */
+.global	kvmppc_interrupt
+.balign IFETCH_ALIGN_BYTES
+kvmppc_interrupt:
+	/*
+	 * Register contents:
+	 * R12		= (guest CR << 32) | interrupt vector
+	 * R13		= PACA
+	 * guest R12 saved in shadow VCPU SCRATCH0
+	 * guest R13 saved in SPRN_SCRATCH0
+	 */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	std	r9, HSTATE_SCRATCH2(r13)
+	lbz	r9, HSTATE_IN_GUEST(r13)
+	cmpwi	r9, KVM_GUEST_MODE_HOST_HV
+	beq	kvmppc_bad_host_intr
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
+	cmpwi	r9, KVM_GUEST_MODE_GUEST
+	ld	r9, HSTATE_SCRATCH2(r13)
+	beq	kvmppc_interrupt_pr
+#endif
+	b	kvmppc_interrupt_hv
+#else
+	b	kvmppc_interrupt_pr
+#endif
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 8cf1f69f442e..b9c4acd747f7 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1255,16 +1255,8 @@ kvmppc_interrupt_hv:
 	 * R13		= PACA
 	 * guest R12 saved in shadow VCPU SCRATCH0
 	 * guest R13 saved in SPRN_SCRATCH0
+	 * guest R9 saved in HSTATE_SCRATCH2
 	 */
-	std	r9, HSTATE_SCRATCH2(r13)
-	lbz	r9, HSTATE_IN_GUEST(r13)
-	cmpwi	r9, KVM_GUEST_MODE_HOST_HV
-	beq	kvmppc_bad_host_intr
-#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
-	cmpwi	r9, KVM_GUEST_MODE_GUEST
-	ld	r9, HSTATE_SCRATCH2(r13)
-	beq	kvmppc_interrupt_pr
-#endif
 	/* We're now back in the host but in guest MMU context */
 	li	r9, KVM_GUEST_MODE_HOST_HV
 	stb	r9, HSTATE_IN_GUEST(r13)
@@ -3253,6 +3245,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_P9_TM_HV_ASSIST)
  * cfar is saved in HSTATE_CFAR(r13)
  * ppr is saved in HSTATE_PPR(r13)
  */
+.global kvmppc_bad_host_intr
 kvmppc_bad_host_intr:
 	/*
 	 * Switch to the emergency stack, but start half-way down in
-- 
2.23.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox