LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
From: Miroslav Benes @ 2021-02-01 13:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, linuxppc-dev
In-Reply-To: <20210201114749.GB19696@lst.de>

On Mon, 1 Feb 2021, Christoph Hellwig wrote:

> On Fri, Jan 29, 2021 at 10:43:36AM +0100, Petr Mladek wrote:
> > > --- a/kernel/livepatch/core.c
> > > +++ b/kernel/livepatch/core.c
> > > @@ -164,12 +164,8 @@ static int klp_find_object_symbol(const char *objname, const char *name,
> > >  		.pos = sympos,
> > >  	};
> > >  
> > > -	mutex_lock(&module_mutex);
> > > -	if (objname)
> > > +	if (objname || !kallsyms_on_each_symbol(klp_find_callback, &args))
> > >  		module_kallsyms_on_each_symbol(klp_find_callback, &args);
> > > -	else
> > > -		kallsyms_on_each_symbol(klp_find_callback, &args);
> > > -	mutex_unlock(&module_mutex);
> > 
> > This change is not needed. (objname == NULL) means that we are
> > interested only in symbols in "vmlinux".
> > 
> > module_kallsyms_on_each_symbol(klp_find_callback, &args)
> > will always fail when objname == NULL.
> 
> I just tried to keep the old behavior.  I can respin it with your
> recommended change noting the change in behavior, though.

Yes, please. It would be cleaner that way.

Miroslav

^ permalink raw reply

* Re: [PATCH 04/13] module: use RCU to synchronize find_module
From: Miroslav Benes @ 2021-02-01 13:16 UTC (permalink / raw)
  To: Jessica Yu
  Cc: Petr Mladek, Joe Lawrence, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Jiri Kosina, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	dri-devel, Thomas Zimmermann, Josh Poimboeuf, Frederic Barrat,
	Daniel Vetter, linuxppc-dev, Christoph Hellwig
In-Reply-To: <YBfvvdna9pSeu+1g@gunter>

On Mon, 1 Feb 2021, Jessica Yu wrote:

> +++ Miroslav Benes [29/01/21 16:29 +0100]:
> >On Thu, 28 Jan 2021, Christoph Hellwig wrote:
> >
> >> Allow for a RCU-sched critical section around find_module, following
> >> the lower level find_module_all helper, and switch the two callers
> >> outside of module.c to use such a RCU-sched critical section instead
> >> of module_mutex.
> >
> >That's a nice idea.
> >
> >> @@ -57,7 +58,7 @@ static void klp_find_object_module(struct klp_object
> >> *obj)
> >>   if (!klp_is_module(obj))
> >>    return;
> >>
> >> -	mutex_lock(&module_mutex);
> >> +	rcu_read_lock_sched();
> >>   /*
> >>    * We do not want to block removal of patched modules and therefore
> >>    * we do not take a reference here. The patches are removed by
> >> @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object
> >> *obj)
> >>   if (mod && mod->klp_alive)
> >
> >RCU always baffles me a bit, so I'll ask. Don't we need
> >rcu_dereference_sched() here? "mod" comes from a RCU-protected list, so I
> >wonder.
> 
> Same here :-) I had to double check the RCU documentation. For our
> modules list case I believe the rcu list API should take care of that
> for us. Worth noting is this snippet from Documentation/RCU/whatisRCU.txt:
> 
>    rcu_dereference() is typically used indirectly, via the _rcu
>    list-manipulation primitives, such as list_for_each_entry_rcu()

Ok, thanks to both for checking and explanation.

Ack to the patch then.

Miroslav

^ permalink raw reply

* Re: [RFC 00/20] TLB batching consolidation and enhancements
From: Peter Zijlstra @ 2021-02-01 12:44 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrea Arcangeli, linux-s390, X86 ML, Yu Zhao, Will Deacon,
	Mel Gorman, Dave Hansen, LKML, Nicholas Piggin, Linux-MM,
	linux-csky@vger.kernel.org, Andy Lutomirski, Andrew Morton,
	linuxppc-dev, Thomas Gleixner
In-Reply-To: <A1589669-34AE-4E15-8358-79BAD7C72520@vmware.com>

On Sun, Jan 31, 2021 at 07:57:01AM +0000, Nadav Amit wrote:
> > On Jan 30, 2021, at 7:30 PM, Nicholas Piggin <npiggin@gmail.com> wrote:

> > I'll go through the patches a bit more closely when they all come 
> > through. Sparc and powerpc of course need the arch lazy mode to get 
> > per-page/pte information for operations that are not freeing pages, 
> > which is what mmu gather is designed for.
> 
> IIUC you mean any PTE change requires a TLB flush. Even setting up a new PTE
> where no previous PTE was set, right?

These are the HASH architectures. Their hardware doesn't walk the
page-tables, but it consults a hash-table to resolve page translations.

They _MUST_ flush the entries under the PTL to avoid ever seeing
conflicting information, which will make them really unhappy. They can
do this because they have TLBI broadcast.

There's a few more details I'm sure, but those seem to have slipped from
my mind.

^ permalink raw reply

* RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE
From: PLATTNER Christoph @ 2021-02-01 12:35 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: christoph.plattner@gmx.at, KOENIG Werner, HAMETNER Reinhard,
	linux-kernel@vger.kernel.org, REITHER Robert - Contractor,
	PLATTNER Christoph, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <035a7cde-7ffd-5f27-81e1-a8d3648e4c1c@csgroup.eu>

Thank you very much, I appreciate your fast responses.
Thank you also for clarification, I did completely oversee
the permission settings in the segment setup and expected
the fault reaction on the PP bits in the TLB.
And I will re-read the chapters, got get deeper into this topic.

Greetings
Christoph 


-----Original Message-----
From: Christophe Leroy <christophe.leroy@csgroup.eu> 
Sent: Montag, 1. Februar 2021 12:39
To: PLATTNER Christoph <christoph.plattner@thalesgroup.com>; Benjamin Herrenschmidt <benh@kernel.crashing.org>; Paul Mackerras <paulus@samba.org>; Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-kernel@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; HAMETNER Reinhard <reinhard.hametner@thalesgroup.com>; REITHER Robert - Contractor <robert.reither@external.thalesgroup.com>; KOENIG Werner <werner.koenig@thalesgroup.com>
Subject: Re: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE



Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :
> Hello to all, and thank you very much for first and second fast response.
> 
> I do not have a long history on PowerPC MMU environment, I hacked into 
> this topic for about 3 months for analyzing that problem- so, sorry, if I am wrong in some points ...

Yes you are wrong on some points, sorry, see below.


> 
> What I learn so far from this MPC5121e (variant of e300c4 core):
> - It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so always the
>     branch  "if (! Hash) ...." is taken, so, I assume that "key 0" and "key 1" setups are not
>     used on this CPU (not supporting MMU_FTR_HPTE_TABLE)

hash method is not used, this is SW TLB loading that is used, but still, all the PP and Ks/Kp keys defined in the segment register are used, see e300 core reference manual §6.4.2 Page Memory Protection

> - The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does not react.
>     As far I have understood, the TLB miss routines are responsible for checking permissions.
>     The TLB miss routines check the Linux PTE styled entries and generates the PP bits
>     for the TLB entry. The PowerPC PP bits are never check elsewhere on that CPU models ...

PP bits ARE checked hoppefully. If it was not the case, then the TLB miss routines would install a TLB on a read, then the user could do a write without any verification being done ?

Refer to e300 Core reference Manual, §6.1.4 Memory Protection Facilities

As I explained in the patch, the problem is not that the HW doesn't check the permission. It is that user accessed been done with key 0 as programmed in the segment registers, PP 00 means RW access.

> - The PTE entries in Linux are fully "void" in sense of this CPU type, as this CPU does not
>     read any PTEs from RAM (no HW support in contrast to x86 or ARM or later ppc...).

No, the PTE are read by the TLB miss exception handlers and writen into TLB entries.

> 
> In summary - as far as I understand it now - we have to handle the PTE 
> bits differently (Linux style) for PROT_NONE permissions - OR - we 
> have to expand the permission checking like my proposed experimental 
> patch. (PROT_NONE is not NUMA related only, but may not used very often ...).

Yes, expanding the permission checking is the easiest solution, hence the patch I sent out based on your proposal.

> 
> Another related point:
> According e300 RM (manual) the ACCESSED bit in the PTE shall be set on 
> TLB miss, as it is an indication, that page is used. In 4.4 kernel 
> this write back of the _PAGE_ACCESSED bit was performed after successful permission check:
> 
>          bne-    DataAddressInvalid      /* return if access not permitted */
>          ori     r0,r0,_PAGE_ACCESSED    /* set _PAGE_ACCESSED in pte */
>          /*
>           * NOTE! We are assuming this is not an SMP system, otherwise
>           * we would need to update the pte atomically with lwarx/stwcx.
>           */
>          stw     r0,0(r2)                /* update PTE (accessed bit) */
>          /* Convert linux-style PTE to low word of PPC-style PTE */
> 
> Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, 
> this is not needed, as the PTE is never seen by the PPC chip. But I do 
> not understand, WHY the PAGE_ACCCESSED is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):
> 
> 	cmplw	0,r1,r3
>   	mfspr	r2, SPRN_SDR1
> 	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
> 	rlwinm	r2, r2, 28, 0xfffff000
>   	bgt-	112f
> 
> What is the reason or relevance for checking this here ?
> Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
> Do you know the reason of change on this point ?

PAGE_ACCESSED is important for memory management, linux kernel need it.

But instead of spending time at every miss to perform a write which will be a no-op in 99% of cases, we prefer bailing out to the page_fault logic when the accessed bit is not set. Then the page_fault logic will set the bit.
This also allowed to simplify the handling in __set_pte()_at function by avoiding races in the update of PTEs.

> 
> Another remark to Core manual relevant for this:
> There is the reference manual for e300 core available (e300 RM). It includes
> many remarks in range of Memory Management section, that many features
> are optional or variable for dedicated implementations. On the other hand,
> the MPC5121e reference manual refers to the e300 core RM, but DOES NOT
> information, which of the optional points are there or nor. According my
> analysis, MPC5121e does not include any of the optional features.
> 

Not sure what you mean. As far as I understand, that chapter tells you that some functionnalities 
are optional for the powerpc architectecture, and provided (or not) by the e300 core. The MPC5121 
supports all the things that are defined by e300 core.


> 
> Thanks a lot for first reactions

You are welcome, don't hesitate if you have additional questions.

Christophe

^ permalink raw reply

* Re: [PATCH 04/13] module: use RCU to synchronize find_module
From: Jessica Yu @ 2021-02-01 12:10 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Petr Mladek, Joe Lawrence, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Jiri Kosina, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	dri-devel, Thomas Zimmermann, Josh Poimboeuf, Frederic Barrat,
	Daniel Vetter, linuxppc-dev, Christoph Hellwig
In-Reply-To: <alpine.LSU.2.21.2101291626080.22237@pobox.suse.cz>

+++ Miroslav Benes [29/01/21 16:29 +0100]:
>On Thu, 28 Jan 2021, Christoph Hellwig wrote:
>
>> Allow for a RCU-sched critical section around find_module, following
>> the lower level find_module_all helper, and switch the two callers
>> outside of module.c to use such a RCU-sched critical section instead
>> of module_mutex.
>
>That's a nice idea.
>
>> @@ -57,7 +58,7 @@ static void klp_find_object_module(struct klp_object *obj)
>>  	if (!klp_is_module(obj))
>>  		return;
>>
>> -	mutex_lock(&module_mutex);
>> +	rcu_read_lock_sched();
>>  	/*
>>  	 * We do not want to block removal of patched modules and therefore
>>  	 * we do not take a reference here. The patches are removed by
>> @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object *obj)
>>  	if (mod && mod->klp_alive)
>
>RCU always baffles me a bit, so I'll ask. Don't we need
>rcu_dereference_sched() here? "mod" comes from a RCU-protected list, so I
>wonder.

Same here :-) I had to double check the RCU documentation. For our
modules list case I believe the rcu list API should take care of that
for us. Worth noting is this snippet from Documentation/RCU/whatisRCU.txt:

    rcu_dereference() is typically used indirectly, via the _rcu
    list-manipulation primitives, such as list_for_each_entry_rcu()



^ permalink raw reply

* Re: [RFC 11/20] mm/tlb: remove arch-specific tlb_start/end_vma()
From: Peter Zijlstra @ 2021-02-01 12:09 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrea Arcangeli, linux-s390, x86, Yu Zhao, linuxppc-dev,
	Dave Hansen, linux-kernel, Nick Piggin, linux-mm, Nadav Amit,
	linux-csky, Andy Lutomirski, Andrew Morton, Will Deacon,
	Thomas Gleixner
In-Reply-To: <20210131001132.3368247-12-namit@vmware.com>

On Sat, Jan 30, 2021 at 04:11:23PM -0800, Nadav Amit wrote:

> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 427bfcc6cdec..b97136b7010b 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -334,8 +334,8 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
>  
>  #ifdef CONFIG_MMU_GATHER_NO_RANGE
>  
> -#if defined(tlb_flush) || defined(tlb_start_vma) || defined(tlb_end_vma)
> -#error MMU_GATHER_NO_RANGE relies on default tlb_flush(), tlb_start_vma() and tlb_end_vma()
> +#if defined(tlb_flush)
> +#error MMU_GATHER_NO_RANGE relies on default tlb_flush()
>  #endif
>  
>  /*
> @@ -362,10 +362,6 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
>  
>  #ifndef tlb_flush
>  
> -#if defined(tlb_start_vma) || defined(tlb_end_vma)
> -#error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
> -#endif

#ifdef CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING
#error ....
#endif

goes here...


>  static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
>  {
>  	if (tlb->fullmm)
>  		return;
>  
> +	if (IS_ENABLED(CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING))
> +		return;

Also, can you please stick to the CONFIG_MMU_GATHER_* namespace?

I also don't think AGRESSIVE_FLUSH_BATCHING quite captures what it does.
How about:

	CONFIG_MMU_GATHER_NO_PER_VMA_FLUSH

?



^ permalink raw reply

* Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
From: Christoph Hellwig @ 2021-02-01 11:47 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jiri Kosina, Andrew Donnellan, linux-kbuild, David Airlie,
	Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst, linux-kernel,
	Maxime Ripard, live-patching, Michal Marek, Joe Lawrence,
	dri-devel, Thomas Zimmermann, Jessica Yu, Frederic Barrat,
	Daniel Vetter, Miroslav Benes, linuxppc-dev, Christoph Hellwig
In-Reply-To: <YBPYyEvesLMrRtZM@alley>

On Fri, Jan 29, 2021 at 10:43:36AM +0100, Petr Mladek wrote:
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -164,12 +164,8 @@ static int klp_find_object_symbol(const char *objname, const char *name,
> >  		.pos = sympos,
> >  	};
> >  
> > -	mutex_lock(&module_mutex);
> > -	if (objname)
> > +	if (objname || !kallsyms_on_each_symbol(klp_find_callback, &args))
> >  		module_kallsyms_on_each_symbol(klp_find_callback, &args);
> > -	else
> > -		kallsyms_on_each_symbol(klp_find_callback, &args);
> > -	mutex_unlock(&module_mutex);
> 
> This change is not needed. (objname == NULL) means that we are
> interested only in symbols in "vmlinux".
> 
> module_kallsyms_on_each_symbol(klp_find_callback, &args)
> will always fail when objname == NULL.

I just tried to keep the old behavior.  I can respin it with your
recommended change noting the change in behavior, though.

^ permalink raw reply

* Re: [PATCH 04/13] module: use RCU to synchronize find_module
From: Christoph Hellwig @ 2021-02-01 11:46 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, linuxppc-dev, Christoph Hellwig
In-Reply-To: <alpine.LSU.2.21.2101291626080.22237@pobox.suse.cz>

On Fri, Jan 29, 2021 at 04:29:02PM +0100, Miroslav Benes wrote:
> >  
> > -	mutex_lock(&module_mutex);
> > +	rcu_read_lock_sched();
> >  	/*
> >  	 * We do not want to block removal of patched modules and therefore
> >  	 * we do not take a reference here. The patches are removed by
> > @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object *obj)
> >  	if (mod && mod->klp_alive)
> 
> RCU always baffles me a bit, so I'll ask. Don't we need 
> rcu_dereference_sched() here? "mod" comes from a RCU-protected list, so I 
> wonder.

rcu_dereference* is only used for dereferencing points where that
reference itself is RCU protected, that is the lookup of mod itself down
in find_module_all in this case.

^ permalink raw reply

* Re: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE
From: Christophe Leroy @ 2021-02-01 11:39 UTC (permalink / raw)
  To: PLATTNER Christoph, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: REITHER Robert - Contractor, HAMETNER Reinhard, KOENIG Werner,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
In-Reply-To: <1b194840-d4e6-4660-94d9-6bac623442cf@THSDC1IRIMBX13P.iris.infra.thales>



Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :
> Hello to all, and thank you very much for first and second fast response.
> 
> I do not have a long history on PowerPC MMU environment, I hacked into this topic
> for about 3 months for analyzing that problem- so, sorry, if I am wrong in some points ...

Yes you are wrong on some points, sorry, see below.


> 
> What I learn so far from this MPC5121e (variant of e300c4 core):
> - It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so always the
>     branch  "if (! Hash) ...." is taken, so, I assume that "key 0" and "key 1" setups are not
>     used on this CPU (not supporting MMU_FTR_HPTE_TABLE)

hash method is not used, this is SW TLB loading that is used, but still, all the PP and Ks/Kp keys 
defined in the segment register are used, see e300 core reference manual §6.4.2 Page Memory Protection

> - The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does not react.
>     As far I have understood, the TLB miss routines are responsible for checking permissions.
>     The TLB miss routines check the Linux PTE styled entries and generates the PP bits
>     for the TLB entry. The PowerPC PP bits are never check elsewhere on that CPU models ...

PP bits ARE checked hoppefully. If it was not the case, then the TLB miss routines would install a 
TLB on a read, then the user could do a write without any verification being done ?

Refer to e300 Core reference Manual, §6.1.4 Memory Protection Facilities

As I explained in the patch, the problem is not that the HW doesn't check the permission. It is that 
user accessed been done with key 0 as programmed in the segment registers, PP 00 means RW access.

> - The PTE entries in Linux are fully "void" in sense of this CPU type, as this CPU does not
>     read any PTEs from RAM (no HW support in contrast to x86 or ARM or later ppc...).

No, the PTE are read by the TLB miss exception handlers and writen into TLB entries.

> 
> In summary - as far as I understand it now - we have to handle the PTE bits differently
> (Linux style) for PROT_NONE permissions - OR - we have to expand the permission
> checking like my proposed experimental patch. (PROT_NONE is not NUMA related only,
> but may not used very often ...).

Yes, expanding the permission checking is the easiest solution, hence the patch I sent out based on 
your proposal.

> 
> Another related point:
> According e300 RM (manual) the ACCESSED bit in the PTE shall be set on TLB miss, as
> it is an indication, that page is used. In 4.4 kernel this write back of the _PAGE_ACCESSED
> bit was performed after successful permission check:
> 
>          bne-    DataAddressInvalid      /* return if access not permitted */
>          ori     r0,r0,_PAGE_ACCESSED    /* set _PAGE_ACCESSED in pte */
>          /*
>           * NOTE! We are assuming this is not an SMP system, otherwise
>           * we would need to update the pte atomically with lwarx/stwcx.
>           */
>          stw     r0,0(r2)                /* update PTE (accessed bit) */
>          /* Convert linux-style PTE to low word of PPC-style PTE */
> 
> Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, this is not needed, as the
> PTE is never seen by the PPC chip. But I do not understand, WHY the PAGE_ACCCESSED
> is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):
> 
> 	cmplw	0,r1,r3
>   	mfspr	r2, SPRN_SDR1
> 	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
> 	rlwinm	r2, r2, 28, 0xfffff000
>   	bgt-	112f
> 
> What is the reason or relevance for checking this here ?
> Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
> Do you know the reason of change on this point ?

PAGE_ACCESSED is important for memory management, linux kernel need it.

But instead of spending time at every miss to perform a write which will be a no-op in 99% of cases, 
we prefer bailing out to the page_fault logic when the accessed bit is not set. Then the page_fault 
logic will set the bit.
This also allowed to simplify the handling in __set_pte()_at function by avoiding races in the 
update of PTEs.

> 
> Another remark to Core manual relevant for this:
> There is the reference manual for e300 core available (e300 RM). It includes
> many remarks in range of Memory Management section, that many features
> are optional or variable for dedicated implementations. On the other hand,
> the MPC5121e reference manual refers to the e300 core RM, but DOES NOT
> information, which of the optional points are there or nor. According my
> analysis, MPC5121e does not include any of the optional features.
> 

Not sure what you mean. As far as I understand, that chapter tells you that some functionnalities 
are optional for the powerpc architectecture, and provided (or not) by the e300 core. The MPC5121 
supports all the things that are defined by e300 core.


> 
> Thanks a lot for first reactions

You are welcome, don't hesitate if you have additional questions.

Christophe

^ permalink raw reply

* RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE
From: PLATTNER Christoph @ 2021-02-01 10:22 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: KOENIG Werner, HAMETNER Reinhard, linux-kernel@vger.kernel.org,
	REITHER Robert - Contractor, PLATTNER Christoph,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <4a0c6e3bb8f0c162457bf54d9bc6fd8d7b55129f.1612160907.git.christophe.leroy@csgroup.eu>

Hello to all, and thank you very much for first and second fast response.

I do not have a long history on PowerPC MMU environment, I hacked into this topic
for about 3 months for analyzing that problem- so, sorry, if I am wrong in some points ...

What I learn so far from this MPC5121e (variant of e300c4 core):
- It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so always the 
   branch  "if (! Hash) ...." is taken, so, I assume that "key 0" and "key 1" setups are not
   used on this CPU (not supporting MMU_FTR_HPTE_TABLE)
- The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does not react.
   As far I have understood, the TLB miss routines are responsible for checking permissions.
   The TLB miss routines check the Linux PTE styled entries and generates the PP bits
   for the TLB entry. The PowerPC PP bits are never check elsewhere on that CPU models ...
- The PTE entries in Linux are fully "void" in sense of this CPU type, as this CPU does not
   read any PTEs from RAM (no HW support in contrast to x86 or ARM or later ppc...).

In summary - as far as I understand it now - we have to handle the PTE bits differently
(Linux style) for PROT_NONE permissions - OR - we have to expand the permission 
checking like my proposed experimental patch. (PROT_NONE is not NUMA related only,
but may not used very often ...).

Another related point:
According e300 RM (manual) the ACCESSED bit in the PTE shall be set on TLB miss, as
it is an indication, that page is used. In 4.4 kernel this write back of the _PAGE_ACCESSED 
bit was performed after successful permission check:

        bne-    DataAddressInvalid      /* return if access not permitted */
        ori     r0,r0,_PAGE_ACCESSED    /* set _PAGE_ACCESSED in pte */
        /*
         * NOTE! We are assuming this is not an SMP system, otherwise
         * we would need to update the pte atomically with lwarx/stwcx.
         */
        stw     r0,0(r2)                /* update PTE (accessed bit) */
        /* Convert linux-style PTE to low word of PPC-style PTE */

Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, this is not needed, as the
PTE is never seen by the PPC chip. But I do not understand, WHY the PAGE_ACCCESSED 
is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):

	cmplw	0,r1,r3
 	mfspr	r2, SPRN_SDR1
	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
	rlwinm	r2, r2, 28, 0xfffff000
 	bgt-	112f

What is the reason or relevance for checking this here ?
Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
Do you know the reason of change on this point ?

Another remark to Core manual relevant for this:
There is the reference manual for e300 core available (e300 RM). It includes
many remarks in range of Memory Management section, that many features
are optional or variable for dedicated implementations. On the other hand, 
the MPC5121e reference manual refers to the e300 core RM, but DOES NOT 
information, which of the optional points are there or nor. According my
analysis, MPC5121e does not include any of the optional features.

Thanks a lot for first reactions
Christoph

-----Original Message-----
From: Christophe Leroy <christophe.leroy@csgroup.eu> 
Sent: Montag, 1. Februar 2021 07:30
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>; Paul Mackerras <paulus@samba.org>; Michael Ellerman <mpe@ellerman.id.au>; PLATTNER Christoph <christoph.plattner@thalesgroup.com>
Cc: linux-kernel@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

On book3s/32, page protection is defined by the PP bits in the PTE which provide the following protection depending on the access keys defined in the matching segment register:
- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.

Since the implementation of kernel userspace access protection, PP bits have been set as follows:
- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0 and user accesses are performed with key 1. As PP00 is used for non _PAGE_USER pages, user can't access kernel pages not flagged _PAGE_USER while kernel can.

For userspace segments, both kernel and user accesses are performed with key 0, therefore pages not flagged _PAGE_USER are still accessible to the user.

This shouldn't be an issue, because userspace is expected to be accessible to the user. But unlike most other architectures, powerpc implements PROT_NONE protection by removing _PAGE_USER flag instead of flagging the page as not valid. This means that pages in userspace that are not flagged _PAGE_USER shall remain inaccessible.

To get the expected behaviour, just mimic other architectures in the TLB miss handler by checking _PAGE_USER permission on userspace accesses as if it was the _PAGE_PRESENT bit.

Note that this problem only is only for 603 cores. The 604+ have an hash table, and hash_page() function already implement the verification of _PAGE_USER permission on userspace pages.

Reported-by: Christoph Plattner <christoph.plattner@thalesgroup.com>
Fixes: f342adca3afc ("powerpc/32s: Prepare Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_book3s_32.S | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 858fbc8b19f3..0004e8a6a58e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -453,11 +453,12 @@ InstructionTLBMiss:
 	cmplw	0,r1,r3
 #endif
 	mfspr	r2, SPRN_SDR1
-	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
+	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
 #ifdef CONFIG_MODULES
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
+	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
 #endif
 112:	rlwimi	r2,r3,12,20,29		/* insert top 10 bits of address */
@@ -516,10 +517,11 @@ DataLoadTLBMiss:
 	lis	r1, TASK_SIZE@h		/* check if kernel address */
 	cmplw	0,r1,r3
 	mfspr	r2, SPRN_SDR1
-	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
+	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
+	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
 112:	rlwimi	r2,r3,12,20,29		/* insert top 10 bits of address */
 	lwz	r2,0(r2)		/* get pmd entry */
@@ -593,10 +595,11 @@ DataStoreTLBMiss:
 	lis	r1, TASK_SIZE@h		/* check if kernel address */
 	cmplw	0,r1,r3
 	mfspr	r2, SPRN_SDR1
-	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
+	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
+	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
 112:	rlwimi	r2,r3,12,20,29		/* insert top 10 bits of address */
 	lwz	r2,0(r2)		/* get pmd entry */
--
2.25.0

^ permalink raw reply related

* Re: FSL P5040: KVM HV doesn't work with the RC5 of kernel 5.11
From: Christian Zigotzky @ 2021-02-01  8:29 UTC (permalink / raw)
  To: linuxppc-dev, tglx, kvm-ppc@vger.kernel.org
  Cc: Darren Stevens, mad skateman, R.T.Dickinson
In-Reply-To: <b0ba0690-507f-7660-79e2-5268cc6684bf@xenosoft.de>

Hello,

I compiled the RC6 of kernel 5.11 today and KVM HV works again. 
Therefore I don't need the patch below anymore.

Many thanks for solving the issue,
Christian


On 27 January 2021 at 05:07pm, Christian Zigotzky wrote:
> Hello,
>
> I compiled the RC5 of kernel 5.11 today. Unfortunately KVM HV doesn't 
> work anymore on my FSL P5040 board [1]. I tested it with QEMU 5.0.0 
> today [2]. The virtual e5500 QEMU machine works with the "RC4 with KVM 
> HV" and with the "RC5 without KVM HV". The complete system freezes if 
> I use KVM HV with the RC5.
>
> I have bisected and 785025820a6a565185ce9d47fdd8d23dbf91dee8 
> (powerpc/mm/highmem: use __set_pte_at() for kmap_local()) [3] is the 
> first bad commit.
>
> I was able to revert this bad commit and after a new compiling, KVM HV 
> works again.  I created a patch for reverting the commit. [4] Please 
> find attached the kernel config. I use one uImage for the virtual 
> machine and for the P5040 board.
>
> Please check the first bad commit.
>
> Thanks,
> Christian
>
>
> [1] http://wiki.amiga.org/index.php?title=X5000
> [2] qemu-system-ppc64 -M ppce500 -cpu e5500 -enable-kvm -m 1024 
> -kernel uImage-5.11 -drive 
> format=raw,file=MintPPC32-X5000.img,index=0,if=virtio -netdev 
> user,id=mynet0 -device e1000,netdev=mynet0 -append "rw root=/dev/vda" 
> -device virtio-vga -device virtio-mouse-pci -device 
> virtio-keyboard-pci -device pci-ohci,id=newusb -device 
> usb-audio,bus=newusb.0 -smp 4
> [3] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.11-rc5&id=785025820a6a565185ce9d47fdd8d23dbf91dee8
> [4]
> diff -rupN a/arch/powerpc/include/asm/highmem.h 
> b/arch/powerpc/include/asm/highmem.h
> --- a/arch/powerpc/include/asm/highmem.h        2021-01-27 
> 16:12:40.382164118 +0100
> +++ b/arch/powerpc/include/asm/highmem.h        2021-01-27 
> 16:10:54.055249957 +0100
> @@ -58,8 +58,6 @@ extern pte_t *pkmap_page_table;
>
>  #define flush_cache_kmaps()    flush_cache_all()
>
> -#define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev) \
> -       __set_pte_at(mm, vaddr, ptep, ptev, 1)
>  #define arch_kmap_local_post_map(vaddr, pteval)        \
>         local_flush_tlb_page(NULL, vaddr)
>  #define arch_kmap_local_post_unmap(vaddr)      \


^ permalink raw reply

* [PATCH] ASoC: fsl_xcvr: remove unneeded semicolon
From: Yang Li @ 2021-02-01  8:08 UTC (permalink / raw)
  To: timur
  Cc: alsa-devel, linuxppc-dev, Xiubo.Lee, festevam, tiwai, lgirdwood,
	perex, nicoleotsuka, broonie, Yang Li, p.zabel, shengjiu.wang,
	linux-kernel

Eliminate the following coccicheck warning:
./sound/soc/fsl/fsl_xcvr.c:739:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
---
 sound/soc/fsl/fsl_xcvr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index 3d58c88..65b388a 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -736,7 +736,7 @@ static int fsl_xcvr_load_firmware(struct fsl_xcvr *xcvr)
 			/* clean current page, including data memory */
 			memset_io(xcvr->ram_addr, 0, size);
 		}
-	};
+	}
 
 err_firmware:
 	release_firmware(fw);
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE
From: Christophe Leroy @ 2021-02-01  6:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Christoph Plattner
  Cc: linuxppc-dev, linux-kernel

On book3s/32, page protection is defined by the PP bits in the PTE
which provide the following protection depending on the access
keys defined in the matching segment register:
- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.

Since the implementation of kernel userspace access protection,
PP bits have been set as follows:
- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0
and user accesses are performed with key 1. As PP00 is used for
non _PAGE_USER pages, user can't access kernel pages not flagged
_PAGE_USER while kernel can.

For userspace segments, both kernel and user accesses are performed
with key 0, therefore pages not flagged _PAGE_USER are still
accessible to the user.

This shouldn't be an issue, because userspace is expected to be
accessible to the user. But unlike most other architectures, powerpc
implements PROT_NONE protection by removing _PAGE_USER flag instead of
flagging the page as not valid. This means that pages in userspace
that are not flagged _PAGE_USER shall remain inaccessible.

To get the expected behaviour, just mimic other architectures in the
TLB miss handler by checking _PAGE_USER permission on userspace
accesses as if it was the _PAGE_PRESENT bit.

Note that this problem only is only for 603 cores. The 604+ have
an hash table, and hash_page() function already implement the
verification of _PAGE_USER permission on userspace pages.

Reported-by: Christoph Plattner <christoph.plattner@thalesgroup.com>
Fixes: f342adca3afc ("powerpc/32s: Prepare Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_book3s_32.S | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 858fbc8b19f3..0004e8a6a58e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -453,11 +453,12 @@ InstructionTLBMiss:
 	cmplw	0,r1,r3
 #endif
 	mfspr	r2, SPRN_SDR1
-	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
+	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
 #ifdef CONFIG_MODULES
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
+	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
 #endif
 112:	rlwimi	r2,r3,12,20,29		/* insert top 10 bits of address */
@@ -516,10 +517,11 @@ DataLoadTLBMiss:
 	lis	r1, TASK_SIZE@h		/* check if kernel address */
 	cmplw	0,r1,r3
 	mfspr	r2, SPRN_SDR1
-	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
+	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
+	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
 112:	rlwimi	r2,r3,12,20,29		/* insert top 10 bits of address */
 	lwz	r2,0(r2)		/* get pmd entry */
@@ -593,10 +595,11 @@ DataStoreTLBMiss:
 	lis	r1, TASK_SIZE@h		/* check if kernel address */
 	cmplw	0,r1,r3
 	mfspr	r2, SPRN_SDR1
-	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
+	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
+	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
 112:	rlwimi	r2,r3,12,20,29		/* insert top 10 bits of address */
 	lwz	r2,0(r2)		/* get pmd entry */
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH] powerpc/akebono: Fix unmet dependency errors
From: Randy Dunlap @ 2021-02-01  4:11 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: f.fainelli, yury.norov
In-Reply-To: <20210201012503.940145-1-mpe@ellerman.id.au>

On 1/31/21 5:25 PM, Michael Ellerman wrote:
> The AKEBONO config has various selects under it, including some with
> user-selectable dependencies, which means those dependencies can be
> disabled. This leads to warnings from Kconfig.
> 
> 
> The problem is that AKEBONO is using select to enable things that are
> not true dependencies, but rather things you probably want enabled in
> an AKEBONO kernel. That is what a defconfig is for.
> 
> So drop those selects and instead move those symbols into the
> defconfig. This fixes all the kconfig warnings, and the result of make
> 44x/akebono_defconfig is the same before and after the patch.
> 
> Reported-by: Yury Norov <yury.norov@gmail.com>
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Reported-by: Florian Fainelli <f.fainelli@gmail.com>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  arch/powerpc/configs/44x/akebono_defconfig | 5 +++++
>  arch/powerpc/platforms/44x/Kconfig         | 7 -------
>  2 files changed, 5 insertions(+), 7 deletions(-)

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>

Thanks.

-- 
~Randy

^ permalink raw reply

* Re: [PATCH] powerpc/akebono: Fix unmet dependency errors
From: Florian Fainelli @ 2021-02-01  2:17 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: rdunlap, yury.norov
In-Reply-To: <20210201012503.940145-1-mpe@ellerman.id.au>



On 1/31/2021 5:25 PM, Michael Ellerman wrote:
> The AKEBONO config has various selects under it, including some with
> user-selectable dependencies, which means those dependencies can be
> disabled. This leads to warnings from Kconfig.
> 
> This can be seen with eg:
> 
>   $ make allnoconfig
>   $ ./scripts/config --file build~/.config -k -e CONFIG_44x -k -e CONFIG_PPC_47x -e CONFIG_AKEBONO
>   $ make olddefconfig
> 
>   WARNING: unmet direct dependencies detected for ATA
>     Depends on [n]: HAS_IOMEM [=y] && BLOCK [=n]
>     Selected by [y]:
>     - AKEBONO [=y] && PPC_47x [=y]
> 
>   WARNING: unmet direct dependencies detected for NETDEVICES
>     Depends on [n]: NET [=n]
>     Selected by [y]:
>     - AKEBONO [=y] && PPC_47x [=y]
> 
>   WARNING: unmet direct dependencies detected for ETHERNET
>     Depends on [n]: NETDEVICES [=y] && NET [=n]
>     Selected by [y]:
>     - AKEBONO [=y] && PPC_47x [=y]
> 
>   WARNING: unmet direct dependencies detected for MMC_SDHCI
>     Depends on [n]: MMC [=n] && HAS_DMA [=y]
>     Selected by [y]:
>     - AKEBONO [=y] && PPC_47x [=y]
> 
>   WARNING: unmet direct dependencies detected for MMC_SDHCI_PLTFM
>     Depends on [n]: MMC [=n] && MMC_SDHCI [=y]
>     Selected by [y]:
>     - AKEBONO [=y] && PPC_47x [=y]
> 
> The problem is that AKEBONO is using select to enable things that are
> not true dependencies, but rather things you probably want enabled in
> an AKEBONO kernel. That is what a defconfig is for.
> 
> So drop those selects and instead move those symbols into the
> defconfig. This fixes all the kconfig warnings, and the result of make
> 44x/akebono_defconfig is the same before and after the patch.
> 
> Reported-by: Yury Norov <yury.norov@gmail.com>
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Reported-by: Florian Fainelli <f.fainelli@gmail.com>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

Much better than my patch, thanks!
-- 
Florian

^ permalink raw reply

* Re: [PATCH] cxl: Simplify bool conversion
From: Andrew Donnellan @ 2021-02-01  1:44 UTC (permalink / raw)
  To: Yang Li, fbarrat; +Cc: gregkh, linuxppc-dev, linux-kernel, arnd
In-Reply-To: <1611908705-98507-1-git-send-email-yang.lee@linux.alibaba.com>

On 29/1/21 7:25 pm, Yang Li wrote:
> Fix the following coccicheck warning:
> ./drivers/misc/cxl/sysfs.c:181:48-53: WARNING: conversion to bool not
> needed here
> 
> Reported-by: Abaci Robot <abaci@linux.alibaba.com>
> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

Thanks!

> ---
>   drivers/misc/cxl/sysfs.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
> index d97a243..c173a5e 100644
> --- a/drivers/misc/cxl/sysfs.c
> +++ b/drivers/misc/cxl/sysfs.c
> @@ -178,7 +178,7 @@ static ssize_t perst_reloads_same_image_store(struct device *device,
>   	if ((rc != 1) || !(val == 1 || val == 0))
>   		return -EINVAL;
>   
> -	adapter->perst_same_image = (val == 1 ? true : false);
> +	adapter->perst_same_image = (val == 1);
>   	return count;
>   }
>   
> 

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited

^ permalink raw reply

* [PATCH] powerpc/akebono: Fix unmet dependency errors
From: Michael Ellerman @ 2021-02-01  1:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: f.fainelli, rdunlap, yury.norov

The AKEBONO config has various selects under it, including some with
user-selectable dependencies, which means those dependencies can be
disabled. This leads to warnings from Kconfig.

This can be seen with eg:

  $ make allnoconfig
  $ ./scripts/config --file build~/.config -k -e CONFIG_44x -k -e CONFIG_PPC_47x -e CONFIG_AKEBONO
  $ make olddefconfig

  WARNING: unmet direct dependencies detected for ATA
    Depends on [n]: HAS_IOMEM [=y] && BLOCK [=n]
    Selected by [y]:
    - AKEBONO [=y] && PPC_47x [=y]

  WARNING: unmet direct dependencies detected for NETDEVICES
    Depends on [n]: NET [=n]
    Selected by [y]:
    - AKEBONO [=y] && PPC_47x [=y]

  WARNING: unmet direct dependencies detected for ETHERNET
    Depends on [n]: NETDEVICES [=y] && NET [=n]
    Selected by [y]:
    - AKEBONO [=y] && PPC_47x [=y]

  WARNING: unmet direct dependencies detected for MMC_SDHCI
    Depends on [n]: MMC [=n] && HAS_DMA [=y]
    Selected by [y]:
    - AKEBONO [=y] && PPC_47x [=y]

  WARNING: unmet direct dependencies detected for MMC_SDHCI_PLTFM
    Depends on [n]: MMC [=n] && MMC_SDHCI [=y]
    Selected by [y]:
    - AKEBONO [=y] && PPC_47x [=y]

The problem is that AKEBONO is using select to enable things that are
not true dependencies, but rather things you probably want enabled in
an AKEBONO kernel. That is what a defconfig is for.

So drop those selects and instead move those symbols into the
defconfig. This fixes all the kconfig warnings, and the result of make
44x/akebono_defconfig is the same before and after the patch.

Reported-by: Yury Norov <yury.norov@gmail.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/configs/44x/akebono_defconfig | 5 +++++
 arch/powerpc/platforms/44x/Kconfig         | 7 -------
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/configs/44x/akebono_defconfig b/arch/powerpc/configs/44x/akebono_defconfig
index 3894ba8f8ffc..ae9b7beefdd6 100644
--- a/arch/powerpc/configs/44x/akebono_defconfig
+++ b/arch/powerpc/configs/44x/akebono_defconfig
@@ -21,6 +21,7 @@ CONFIG_IRQ_ALL_CPUS=y
 # CONFIG_COMPACTION is not set
 # CONFIG_SUSPEND is not set
 CONFIG_NET=y
+CONFIG_NETDEVICES=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
 CONFIG_INET=y
@@ -41,7 +42,9 @@ CONFIG_BLK_DEV_RAM_SIZE=35000
 # CONFIG_SCSI_PROC_FS is not set
 CONFIG_BLK_DEV_SD=y
 # CONFIG_SCSI_LOWLEVEL is not set
+CONFIG_ATA=y
 # CONFIG_SATA_PMP is not set
+CONFIG_SATA_AHCI_PLATFORM=y
 # CONFIG_ATA_SFF is not set
 # CONFIG_NET_VENDOR_3COM is not set
 # CONFIG_NET_VENDOR_ADAPTEC is not set
@@ -98,6 +101,8 @@ CONFIG_USB_OHCI_HCD=y
 # CONFIG_USB_OHCI_HCD_PCI is not set
 CONFIG_USB_STORAGE=y
 CONFIG_MMC=y
+CONFIG_MMC_SDHCI=y
+CONFIG_MMC_SDHCI_PLTFM=y
 CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_M41T80=y
 CONFIG_EXT2_FS=y
diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
index 78ac6d67a935..7d41e9264510 100644
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -206,17 +206,10 @@ config AKEBONO
 	select PPC4xx_HSTA_MSI
 	select I2C
 	select I2C_IBM_IIC
-	select NETDEVICES
-	select ETHERNET
-	select NET_VENDOR_IBM
 	select IBM_EMAC_EMAC4 if IBM_EMAC
 	select USB if USB_SUPPORT
 	select USB_OHCI_HCD_PLATFORM if USB_OHCI_HCD
 	select USB_EHCI_HCD_PLATFORM if USB_EHCI_HCD
-	select MMC_SDHCI
-	select MMC_SDHCI_PLTFM
-	select ATA
-	select SATA_AHCI_PLATFORM
 	help
 	  This option enables support for the IBM Akebono (476gtr) evaluation board
 
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] powerpc: fix AKEBONO build failures
From: Michael Ellerman @ 2021-02-01  1:24 UTC (permalink / raw)
  To: Randy Dunlap, Yury Norov, linuxppc-dev, Linux Kernel Mailing List
  Cc: Paul Mackerras
In-Reply-To: <4572579a-7208-628d-cbe2-b70a74a84ae7@infradead.org>

Randy Dunlap <rdunlap@infradead.org> writes:
> On 1/21/21 5:14 PM, Michael Ellerman wrote:
>> Randy Dunlap <rdunlap@infradead.org> writes:
>>> On 1/20/21 1:29 PM, Yury Norov wrote:
>>>> Hi all,
>>>>
>>>> I found the power pc build broken on today's
>>>> linux-next (647060f3b592).
>>>
>>> Darn, I was building linux-5.11-rc4.
>>>
>>> I'll try linux-next after I send this.
>>>
>>> ---
>>> From: Randy Dunlap <rdunlap@infradead.org>
>>>
>>> Fulfill AKEBONO Kconfig requirements.
>>>
>>> Fixes these Kconfig warnings (and more) and fixes the subsequent
>>> build errors:
>>>
>>> WARNING: unmet direct dependencies detected for NETDEVICES
>>>    Depends on [n]: NET [=n]
>>>    Selected by [y]:
>>>    - AKEBONO [=y] && PPC_47x [=y]
>>>
>>> WARNING: unmet direct dependencies detected for MMC_SDHCI
>>>    Depends on [n]: MMC [=n] && HAS_DMA [=y]
>>>    Selected by [y]:
>>>    - AKEBONO [=y] && PPC_47x [=y]
>>>
>>> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> Cc: Paul Mackerras <paulus@samba.org>
>>> Cc: linuxppc-dev@lists.ozlabs.org
>>> Cc: Yury Norov <yury.norov@gmail.com>
>>> ---
>>>   arch/powerpc/platforms/44x/Kconfig |    2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> --- lnx-511-rc4.orig/arch/powerpc/platforms/44x/Kconfig
>>> +++ lnx-511-rc4/arch/powerpc/platforms/44x/Kconfig
>>> @@ -206,6 +206,7 @@ config AKEBONO
>>>   	select PPC4xx_HSTA_MSI
>>>   	select I2C
>>>   	select I2C_IBM_IIC
>>> +	select NET
>>>   	select NETDEVICES
>>>   	select ETHERNET
>>>   	select NET_VENDOR_IBM
>> 
>> I think the problem here is too much use of select, for things that
>> should instead be in the defconfig.
>> 
>> The patch below results in the same result for make
>> 44x/akebono_defconfig. Does it fix the original issue?
>
> Hi Michael,
> Sorry for the delay.
>
> Changing the akebono_defconfig doesn't cause the missing symbols
> to be set -- the defconfig is not being used here.

Yep, but that's OK. None of those selected symbols are hard dependencies
of AKEBONO, they're just things you probably want in your kernel to
actually boot on an akebono board.

> I guess that if you have users who set CONFIG_AKEBONO and expect
> it to build cleanly, you will need something like my patch or the
> patch that Florian just posted.

It will build cleanly, it just won't necessarily boot on a real board.
Users who enable AKEBONO manually need to know what they're doing, or
they should just use the defconfig.

> Changing the akebono_defconfig also would not help 'make randconfig'
> builds to build cleanly if they had happened to enable AKEBONO.

Changing the defconfig doesn't help randconfig, but dropping the selects
does.

Anyway I'll send a proper version of my patch, which I'm pretty
confident will fix all the issues.

cheers

^ permalink raw reply

* Re: [PATCH v7 19/42] powerpc/perf: move perf irq/nmi handling details into traps.c
From: Athira Rajeev @ 2021-01-31 12:30 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev
In-Reply-To: <20210130130852.2952424-20-npiggin@gmail.com>

[-- Attachment #1: Type: text/html, Size: 13382 bytes --]

^ permalink raw reply

* Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.11-6 tag
From: Linus Torvalds @ 2021-01-31 19:38 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Linux Kernel Mailing List, Nick Piggin
In-Reply-To: <87lfc9cnuw.fsf@mpe.ellerman.id.au>

On Sun, Jan 31, 2021 at 4:06 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Please pull another powerpc fix for 5.11:

Manual pr-tracker-bot says thanks,

           Linus

^ permalink raw reply

* Re: [RFC 00/20] TLB batching consolidation and enhancements
From: Nadav Amit @ 2021-01-31  8:14 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrea Arcangeli, linux-s390, X86 ML, Yu Zhao, Will Deacon,
	Peter Zijlstra, Mel Gorman, Dave Hansen, LKML, Nicholas Piggin,
	Linux-MM, linux-csky@vger.kernel.org, Andy Lutomirski,
	Andrew Morton, linuxppc-dev, Thomas Gleixner
In-Reply-To: <A1589669-34AE-4E15-8358-79BAD7C72520@vmware.com>

> On Jan 30, 2021, at 11:57 PM, Nadav Amit <namit@vmware.com> wrote:
> 
>> On Jan 30, 2021, at 7:30 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> Excerpts from Nadav Amit's message of January 31, 2021 10:11 am:
>>> From: Nadav Amit <namit@vmware.com>
>>> 
>>> There are currently (at least?) 5 different TLB batching schemes in the
>>> kernel:
>>> 
>>> 1. Using mmu_gather (e.g., zap_page_range()).
>>> 
>>> 2. Using {inc|dec}_tlb_flush_pending() to inform other threads on the
>>>  ongoing deferred TLB flush and flushing the entire range eventually
>>>  (e.g., change_protection_range()).
>>> 
>>> 3. arch_{enter|leave}_lazy_mmu_mode() for sparc and powerpc (and Xen?).
>>> 
>>> 4. Batching per-table flushes (move_ptes()).
>>> 
>>> 5. By setting a flag on that a deferred TLB flush operation takes place,
>>>  flushing when (try_to_unmap_one() on x86).
>>> 
>>> It seems that (1)-(4) can be consolidated. In addition, it seems that
>>> (5) is racy. It also seems there can be many redundant TLB flushes, and
>>> potentially TLB-shootdown storms, for instance during batched
>>> reclamation (using try_to_unmap_one()) if at the same time mmu_gather
>>> defers TLB flushes.
>>> 
>>> More aggressive TLB batching may be possible, but this patch-set does
>>> not add such batching. The proposed changes would enable such batching
>>> in a later time.
>>> 
>>> Admittedly, I do not understand how things are not broken today, which
>>> frightens me to make further batching before getting things in order.
>>> For instance, why is ok for zap_pte_range() to batch dirty-PTE flushes
>>> for each page-table (but not in greater granularity). Can't
>>> ClearPageDirty() be called before the flush, causing writes after
>>> ClearPageDirty() and before the flush to be lost?
>> 
>> Because it's holding the page table lock which stops page_mkclean from 
>> cleaning the page. Or am I misunderstanding the question?
> 
> Thanks. I understood this part. Looking again at the code, I now understand
> my confusion: I forgot that the reverse mapping is removed after the PTE is
> zapped.
> 
> Makes me wonder whether it is ok to defer the TLB flush to tlb_finish_mmu(),
> by performing set_page_dirty() for the batched pages when needed in
> tlb_finish_mmu() [ i.e., by marking for each batched page whether
> set_page_dirty() should be issued for that page while collecting them ].

Correcting myself (I hope): no we cannot do so, since the buffers might be
remove from the page at that point.


^ permalink raw reply

* Re: [RFC 00/20] TLB batching consolidation and enhancements
From: Nadav Amit @ 2021-01-31  7:57 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Andrea Arcangeli, linux-s390, X86 ML, Yu Zhao, Will Deacon,
	Peter Zijlstra, Mel Gorman, Dave Hansen, LKML,
	linux-csky@vger.kernel.org, Linux-MM, Andy Lutomirski,
	Andrew Morton, linuxppc-dev, Thomas Gleixner
In-Reply-To: <1612063149.2awdsvvmhj.astroid@bobo.none>

> On Jan 30, 2021, at 7:30 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Excerpts from Nadav Amit's message of January 31, 2021 10:11 am:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> There are currently (at least?) 5 different TLB batching schemes in the
>> kernel:
>> 
>> 1. Using mmu_gather (e.g., zap_page_range()).
>> 
>> 2. Using {inc|dec}_tlb_flush_pending() to inform other threads on the
>>   ongoing deferred TLB flush and flushing the entire range eventually
>>   (e.g., change_protection_range()).
>> 
>> 3. arch_{enter|leave}_lazy_mmu_mode() for sparc and powerpc (and Xen?).
>> 
>> 4. Batching per-table flushes (move_ptes()).
>> 
>> 5. By setting a flag on that a deferred TLB flush operation takes place,
>>   flushing when (try_to_unmap_one() on x86).
>> 
>> It seems that (1)-(4) can be consolidated. In addition, it seems that
>> (5) is racy. It also seems there can be many redundant TLB flushes, and
>> potentially TLB-shootdown storms, for instance during batched
>> reclamation (using try_to_unmap_one()) if at the same time mmu_gather
>> defers TLB flushes.
>> 
>> More aggressive TLB batching may be possible, but this patch-set does
>> not add such batching. The proposed changes would enable such batching
>> in a later time.
>> 
>> Admittedly, I do not understand how things are not broken today, which
>> frightens me to make further batching before getting things in order.
>> For instance, why is ok for zap_pte_range() to batch dirty-PTE flushes
>> for each page-table (but not in greater granularity). Can't
>> ClearPageDirty() be called before the flush, causing writes after
>> ClearPageDirty() and before the flush to be lost?
> 
> Because it's holding the page table lock which stops page_mkclean from 
> cleaning the page. Or am I misunderstanding the question?

Thanks. I understood this part. Looking again at the code, I now understand
my confusion: I forgot that the reverse mapping is removed after the PTE is
zapped.

Makes me wonder whether it is ok to defer the TLB flush to tlb_finish_mmu(),
by performing set_page_dirty() for the batched pages when needed in
tlb_finish_mmu() [ i.e., by marking for each batched page whether
set_page_dirty() should be issued for that page while collecting them ].

> I'll go through the patches a bit more closely when they all come 
> through. Sparc and powerpc of course need the arch lazy mode to get 
> per-page/pte information for operations that are not freeing pages, 
> which is what mmu gather is designed for.

IIUC you mean any PTE change requires a TLB flush. Even setting up a new PTE
where no previous PTE was set, right?

> I wouldn't mind using a similar API so it's less of a black box when 
> reading generic code, but it might not quite fit the mmu gather API
> exactly (most of these paths don't want a full mmu_gather on stack).

I see your point. It may be possible to create two mmu_gather structs: a
small one that only holds the flush information and another that also holds
the pages. 

>> This patch-set therefore performs the following changes:
>> 
>> 1. Change mprotect, task_mmu and mapping_dirty_helpers to use mmu_gather
>>   instead of {inc|dec}_tlb_flush_pending().
>> 
>> 2. Avoid TLB flushes if PTE permission is not demoted.
>> 
>> 3. Cleans up mmu_gather to be less arch-dependant.
>> 
>> 4. Uses mm's generations to track in finer granularity, either per-VMA
>>   or per page-table, whether a pending mmu_gather operation is
>>   outstanding. This should allow to avoid some TLB flushes when KSM or
>>   memory reclamation takes place while another operation such as
>>   munmap() or mprotect() is running.
>> 
>> 5. Changes try_to_unmap_one() flushing scheme, as the current seems
>>   broken to track in a bitmap which CPUs have outstanding TLB flushes
>>   instead of having a flag.
> 
> Putting fixes first, and cleanups and independent patches (like #2) next
> would help with getting stuff merged and backported.

I tried to do it mostly this way. There are some theoretical races which
I did not manage (or try hard enough) to create, so I did not include in
the “fixes” section. I will restructure the patch-set according to the
feedback.

Thanks,
Nadav

^ permalink raw reply

* Re: [RFC 00/20] TLB batching consolidation and enhancements
From: Nicholas Piggin @ 2021-01-31  3:30 UTC (permalink / raw)
  To: linux-kernel, linux-mm, Nadav Amit
  Cc: Andrea Arcangeli, linux-s390, x86, Yu Zhao, Will Deacon,
	Peter Zijlstra, Mel Gorman, Dave Hansen, linux-csky, Nadav Amit,
	Andy Lutomirski, Andrew Morton, linuxppc-dev, Thomas Gleixner
In-Reply-To: <20210131001132.3368247-1-namit@vmware.com>

Excerpts from Nadav Amit's message of January 31, 2021 10:11 am:
> From: Nadav Amit <namit@vmware.com>
> 
> There are currently (at least?) 5 different TLB batching schemes in the
> kernel:
> 
> 1. Using mmu_gather (e.g., zap_page_range()).
> 
> 2. Using {inc|dec}_tlb_flush_pending() to inform other threads on the
>    ongoing deferred TLB flush and flushing the entire range eventually
>    (e.g., change_protection_range()).
> 
> 3. arch_{enter|leave}_lazy_mmu_mode() for sparc and powerpc (and Xen?).
> 
> 4. Batching per-table flushes (move_ptes()).
> 
> 5. By setting a flag on that a deferred TLB flush operation takes place,
>    flushing when (try_to_unmap_one() on x86).
> 
> It seems that (1)-(4) can be consolidated. In addition, it seems that
> (5) is racy. It also seems there can be many redundant TLB flushes, and
> potentially TLB-shootdown storms, for instance during batched
> reclamation (using try_to_unmap_one()) if at the same time mmu_gather
> defers TLB flushes.
> 
> More aggressive TLB batching may be possible, but this patch-set does
> not add such batching. The proposed changes would enable such batching
> in a later time.
> 
> Admittedly, I do not understand how things are not broken today, which
> frightens me to make further batching before getting things in order.
> For instance, why is ok for zap_pte_range() to batch dirty-PTE flushes
> for each page-table (but not in greater granularity). Can't
> ClearPageDirty() be called before the flush, causing writes after
> ClearPageDirty() and before the flush to be lost?

Because it's holding the page table lock which stops page_mkclean from 
cleaning the page. Or am I misunderstanding the question?

I'll go through the patches a bit more closely when they all come 
through. Sparc and powerpc of course need the arch lazy mode to get 
per-page/pte information for operations that are not freeing pages, 
which is what mmu gather is designed for.

I wouldn't mind using a similar API so it's less of a black box when 
reading generic code, but it might not quite fit the mmu gather API
exactly (most of these paths don't want a full mmu_gather on stack).

> 
> This patch-set therefore performs the following changes:
> 
> 1. Change mprotect, task_mmu and mapping_dirty_helpers to use mmu_gather
>    instead of {inc|dec}_tlb_flush_pending().
> 
> 2. Avoid TLB flushes if PTE permission is not demoted.
> 
> 3. Cleans up mmu_gather to be less arch-dependant.
> 
> 4. Uses mm's generations to track in finer granularity, either per-VMA
>    or per page-table, whether a pending mmu_gather operation is
>    outstanding. This should allow to avoid some TLB flushes when KSM or
>    memory reclamation takes place while another operation such as
>    munmap() or mprotect() is running.
> 
> 5. Changes try_to_unmap_one() flushing scheme, as the current seems
>    broken to track in a bitmap which CPUs have outstanding TLB flushes
>    instead of having a flag.

Putting fixes first, and cleanups and independent patches (like #2) next
would help with getting stuff merged and backported.

Thanks,
Nick

^ permalink raw reply

* Re: [RFC 00/20] TLB batching consolidation and enhancements
From: Nadav Amit @ 2021-01-31  1:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrea Arcangeli, linux-s390, X86 ML, Yu Zhao, Will Deacon,
	Peter Zijlstra, Mel Gorman, Dave Hansen, LKML,
	linux-csky@vger.kernel.org, Linux-MM, Nick Piggin, Andrew Morton,
	linuxppc-dev, Thomas Gleixner
In-Reply-To: <CALCETrUUe_iijSoTYMjibqxUveaYrG3sVTWawbi3HWCjx_ySYg@mail.gmail.com>

> On Jan 30, 2021, at 4:39 PM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> There are currently (at least?) 5 different TLB batching schemes in the
>> kernel:
>> 
>> 1. Using mmu_gather (e.g., zap_page_range()).
>> 
>> 2. Using {inc|dec}_tlb_flush_pending() to inform other threads on the
>>   ongoing deferred TLB flush and flushing the entire range eventually
>>   (e.g., change_protection_range()).
>> 
>> 3. arch_{enter|leave}_lazy_mmu_mode() for sparc and powerpc (and Xen?).
>> 
>> 4. Batching per-table flushes (move_ptes()).
>> 
>> 5. By setting a flag on that a deferred TLB flush operation takes place,
>>   flushing when (try_to_unmap_one() on x86).
> 
> Are you referring to the arch_tlbbatch_add_mm/flush mechanism?

Yes.

^ permalink raw reply

* Re: [RFC 00/20] TLB batching consolidation and enhancements
From: Andy Lutomirski @ 2021-01-31  0:39 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrea Arcangeli, linux-s390, X86 ML, Yu Zhao, Will Deacon,
	Peter Zijlstra, Mel Gorman, Dave Hansen, LKML, linux-csky,
	Linux-MM, Nadav Amit, Nick Piggin, Andy Lutomirski, Andrew Morton,
	linuxppc-dev, Thomas Gleixner
In-Reply-To: <20210131001132.3368247-1-namit@vmware.com>

On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>
> From: Nadav Amit <namit@vmware.com>
>
> There are currently (at least?) 5 different TLB batching schemes in the
> kernel:
>
> 1. Using mmu_gather (e.g., zap_page_range()).
>
> 2. Using {inc|dec}_tlb_flush_pending() to inform other threads on the
>    ongoing deferred TLB flush and flushing the entire range eventually
>    (e.g., change_protection_range()).
>
> 3. arch_{enter|leave}_lazy_mmu_mode() for sparc and powerpc (and Xen?).
>
> 4. Batching per-table flushes (move_ptes()).
>
> 5. By setting a flag on that a deferred TLB flush operation takes place,
>    flushing when (try_to_unmap_one() on x86).

Are you referring to the arch_tlbbatch_add_mm/flush mechanism?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox