LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-28 21:06 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linuxppc-dev, Catalin Marinas, Arnd Bergmann, paulmck, Jann Horn,
	Peter Zijlstra, x86, Russell King, ARM Linux, Nicholas Piggin,
	linux-kernel, Paul Mackerras, stable, Andy Lutomirski,
	Will Deacon, linux-arm-kernel
In-Reply-To: <1086654515.3607.1609187556216.JavaMail.zimbra@efficios.com>

On Mon, Dec 28, 2020 at 12:32 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Dec 28, 2020, at 2:44 PM, Andy Lutomirski luto@kernel.org wrote:
>
> > On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
> >>
> >> On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
> >> > After chatting with rmk about this (but without claiming that any of
> >> > this is his opinion), based on the manpage, I think membarrier()
> >> > currently doesn't really claim to be synchronizing caches? It just
> >> > serializes cores. So arguably if userspace wants to use membarrier()
> >> > to synchronize code changes, userspace should first do the code
> >> > change, then flush icache as appropriate for the architecture, and
> >> > then do the membarrier() to ensure that the old code is unused?
>
> ^ exactly, yes.
>
> >> >
> >> > For 32-bit arm, rmk pointed out that that would be the cacheflush()
> >> > syscall. That might cause you to end up with two IPIs instead of one
> >> > in total, but we probably don't care _that_ much about extra IPIs on
> >> > 32-bit arm?
>
> This was the original thinking, yes. The cacheflush IPI will flush specific
> regions of code, and the membarrier IPI issues context synchronizing
> instructions.
>
> Architectures with coherent i/d caches don't need the cacheflush step.

There are different levels of coherency -- VIVT architectures may have
differing requirements compared to PIPT, etc.

In any case, I feel like the approach taken by the documentation is
fundamentally confusing.  Architectures don't all speak the same
language  How about something like:

The SYNC_CORE operation causes all threads in the caller's address
space (including the caller) to execute an architecture-defined
barrier operation.  membarrier() will ensure that this barrier is
executed at a time such that all data writes done by the calling
thread before membarrier() are made visible by the barrier.
Additional architecture-dependent cache management operations may be
required to use this for JIT code.

x86: SYNC_CORE executes a barrier that will cause subsequent
instruction fetches to observe prior writes.  Currently this will be a
"serializing" instruction, but, if future improved CPU documentation
becomes available and relaxes this requirement, the barrier may
change.  The kernel guarantees that writing new or modified
instructions to normal memory (and issuing SFENCE if the writes were
non-temporal) then doing a membarrier SYNC_CORE operation is
sufficient to cause all threads in the caller's address space to
execute the new or modified instructions.  This is true regardless of
whether or not those instructions are written at the same virtual
address from which they are subsequently executed.  No additional
cache management is required on x86.

arm: Something about the cache management syscalls.

arm64: Ditto

powerpc: I have no idea.

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Mathieu Desnoyers @ 2020-12-28 20:40 UTC (permalink / raw)
  To: Russell King, ARM Linux
  Cc: linuxppc-dev, Catalin Marinas, Arnd Bergmann, Jann Horn, x86,
	linux-kernel, Nicholas Piggin, Paul Mackerras, stable,
	Andy Lutomirski, Will Deacon, linux-arm-kernel
In-Reply-To: <20201228202451.GJ1551@shell.armlinux.org.uk>

----- On Dec 28, 2020, at 3:24 PM, Russell King, ARM Linux linux@armlinux.org.uk wrote:

> On Mon, Dec 28, 2020 at 11:44:33AM -0800, Andy Lutomirski wrote:
>> On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
>> <linux@armlinux.org.uk> wrote:
>> >
>> > On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
>> > > After chatting with rmk about this (but without claiming that any of
>> > > this is his opinion), based on the manpage, I think membarrier()
>> > > currently doesn't really claim to be synchronizing caches? It just
>> > > serializes cores. So arguably if userspace wants to use membarrier()
>> > > to synchronize code changes, userspace should first do the code
>> > > change, then flush icache as appropriate for the architecture, and
>> > > then do the membarrier() to ensure that the old code is unused?
>> > >
>> > > For 32-bit arm, rmk pointed out that that would be the cacheflush()
>> > > syscall. That might cause you to end up with two IPIs instead of one
>> > > in total, but we probably don't care _that_ much about extra IPIs on
>> > > 32-bit arm?
>> > >
>> > > For arm64, I believe userspace can flush icache across the entire
>> > > system with some instructions from userspace - "DC CVAU" followed by
>> > > "DSB ISH", or something like that, I think? (See e.g.
>> > > compat_arm_syscall(), the arm64 compat code that implements the 32-bit
>> > > arm cacheflush() syscall.)
>> >
>> > Note that the ARM cacheflush syscall calls flush_icache_user_range()
>> > over the range of addresses that userspace has passed - it's intention
>> > since day one is to support cases where userspace wants to change
>> > executable code.
>> >
>> > It will issue the appropriate write-backs to the data cache (DCCMVAU),
>> > the invalidates to the instruction cache (ICIMVAU), invalidate the
>> > branch target buffer (BPIALLIS or BPIALL as appropriate), and issue
>> > the appropriate barriers (DSB ISHST, ISB).
>> >
>> > Note that neither flush_icache_user_range() nor flush_icache_range()
>> > result in IPIs; cache operations are broadcast across all CPUs (which
>> > is one of the minimums we require for SMP systems.)
>> >
>> > Now, that all said, I think the question that has to be asked is...
>> >
>> >         What is the basic purpose of membarrier?
>> >
>> > Is the purpose of it to provide memory barriers, or is it to provide
>> > memory coherence?
>> >
>> > If it's the former and not the latter, then cache flushes are out of
>> > scope, and expecting memory written to be visible to the instruction
>> > stream is totally out of scope of the membarrier interface, whether
>> > or not the writes happen on the same or a different CPU to the one
>> > executing the rewritten code.
>> >
>> > The documentation in the kernel does not seem to describe what it's
>> > supposed to be doing - the only thing I could find is this:
>> > Documentation/features/sched/membarrier-sync-core/arch-support.txt
>> > which describes it as "arch supports core serializing membarrier"
>> > whatever that means.
>> >
>> > Seems to be the standard and usual case of utterly poor to non-existent
>> > documentation within the kernel tree, or even a pointer to where any
>> > useful documentation can be found.
>> >
>> > Reading the membarrier(2) man page, I find nothing in there that talks
>> > about any kind of cache coherency for self-modifying code - it only
>> > seems to be about _barriers_ and nothing more, and barriers alone do
>> > precisely nothing to save you from non-coherent Harvard caches.
>> >
>> > So, either Andy has a misunderstanding, or the man page is wrong, or
>> > my rudimentary understanding of what membarrier is supposed to be
>> > doing is wrong...
>> 
>> Look at the latest man page:
>> 
>> https://man7.org/linux/man-pages/man2/membarrier.2.html
>> 
>> for MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.  The result may not be
>> all that enlightening.
> 
>       MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (since Linux 4.16)
>              In  addition  to  providing  the  memory ordering guarantees de■
>              scribed in MEMBARRIER_CMD_PRIVATE_EXPEDITED,  upon  return  from
>              system call the calling thread has a guarantee that all its run■
>              ning thread siblings have executed a core  serializing  instruc■
>              tion.   This  guarantee is provided only for threads in the same
>              process as the calling thread.
> 
>              The "expedited" commands complete faster than the  non-expedited
>              ones,  they  never block, but have the downside of causing extra
>              overhead.
> 
>              A process must register its intent to use the private  expedited
>              sync core command prior to using it.
> 
> This just says that the siblings have executed a serialising
> instruction, in other words a barrier. It makes no claims concerning
> cache coherency - and without some form of cache maintenance, there
> can be no expectation that the I and D streams to be coherent with
> each other.

Right, membarrier is not doing anything wrt I/D caches. On architectures
without coherent caches, users should use other system calls or instructions
provided by the architecture to synchronize the appropriate address ranges. 

> This description is also weird in another respect. "guarantee that
> all its running thread siblings have executed a core serializing
> instruction" ... "The expedited commands ... never block".
> 
> So, the core executing this call is not allowed to block, but the
> other part indicates that the other CPUs _have_ executed a serialising
> instruction before this call returns... one wonders how that happens
> without blocking. Maybe the CPU spins waiting for completion instead?

Membarrier expedited sync-core issues IPIs to all CPUs running sibling
threads. AFAIR the IPI mechanism uses the "csd lock" which is basically
busy waiting. So it does not "block", it busy-waits.

For completeness of the explanation, other (non-running) threads acting
on the same mm will eventually issue the context synchronizing instruction
before returning to user-space whenever they are scheduled back.

Thanks,

Mathieu


> 
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Mathieu Desnoyers @ 2020-12-28 20:32 UTC (permalink / raw)
  To: Andy Lutomirski, paulmck, Peter Zijlstra
  Cc: linuxppc-dev, Catalin Marinas, Arnd Bergmann, Jann Horn, x86,
	linux-kernel, Nicholas Piggin, Russell King, ARM Linux,
	Paul Mackerras, stable, Will Deacon, linux-arm-kernel
In-Reply-To: <CALCETrVpvrBufrJgXNY=ogtZQLo7zgxQmD7k9eVCFjcdcvarmA@mail.gmail.com>

----- On Dec 28, 2020, at 2:44 PM, Andy Lutomirski luto@kernel.org wrote:

> On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
>>
>> On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
>> > After chatting with rmk about this (but without claiming that any of
>> > this is his opinion), based on the manpage, I think membarrier()
>> > currently doesn't really claim to be synchronizing caches? It just
>> > serializes cores. So arguably if userspace wants to use membarrier()
>> > to synchronize code changes, userspace should first do the code
>> > change, then flush icache as appropriate for the architecture, and
>> > then do the membarrier() to ensure that the old code is unused?

^ exactly, yes.

>> >
>> > For 32-bit arm, rmk pointed out that that would be the cacheflush()
>> > syscall. That might cause you to end up with two IPIs instead of one
>> > in total, but we probably don't care _that_ much about extra IPIs on
>> > 32-bit arm?

This was the original thinking, yes. The cacheflush IPI will flush specific
regions of code, and the membarrier IPI issues context synchronizing
instructions.

Architectures with coherent i/d caches don't need the cacheflush step.

>> >
>> > For arm64, I believe userspace can flush icache across the entire
>> > system with some instructions from userspace - "DC CVAU" followed by
>> > "DSB ISH", or something like that, I think? (See e.g.
>> > compat_arm_syscall(), the arm64 compat code that implements the 32-bit
>> > arm cacheflush() syscall.)
>>
>> Note that the ARM cacheflush syscall calls flush_icache_user_range()
>> over the range of addresses that userspace has passed - it's intention
>> since day one is to support cases where userspace wants to change
>> executable code.
>>
>> It will issue the appropriate write-backs to the data cache (DCCMVAU),
>> the invalidates to the instruction cache (ICIMVAU), invalidate the
>> branch target buffer (BPIALLIS or BPIALL as appropriate), and issue
>> the appropriate barriers (DSB ISHST, ISB).
>>
>> Note that neither flush_icache_user_range() nor flush_icache_range()
>> result in IPIs; cache operations are broadcast across all CPUs (which
>> is one of the minimums we require for SMP systems.)

But the ISB AFAIU is not a broadcast. Based on
https://developer.arm.com/documentation/ddi0406/c/Appendices/Barrier-Litmus-Tests/Cache-and-TLB-maintenance-operations-and-barriers/Instruction-cache-maintenance-operations

"Ensuring the visibility of updates to instructions for a multiprocessor"

the ISB needs to be issued on each processor in a multiprocessor system.
This is where membarrier sync-core comes handy.

>>
>> Now, that all said, I think the question that has to be asked is...
>>
>>         What is the basic purpose of membarrier?

For basic membarrier (not sync-core), the purpose is to provide memory
barriers across a given set of threads.

For sync-core membarrier, the purpose is to guarantee context synchronizing
instructions on all threads belonging to the same mm.

>>
>> Is the purpose of it to provide memory barriers, or is it to provide
>> memory coherence?

The purpose has never been to provide any kind of memory coherence, as
this would duplicate what is already achieved by other architecture-specific
system calls.

>>
>> If it's the former and not the latter, then cache flushes are out of
>> scope, and expecting memory written to be visible to the instruction
>> stream is totally out of scope of the membarrier interface, whether
>> or not the writes happen on the same or a different CPU to the one
>> executing the rewritten code.
>>
>> The documentation in the kernel does not seem to describe what it's
>> supposed to be doing - the only thing I could find is this:
>> Documentation/features/sched/membarrier-sync-core/arch-support.txt
>> which describes it as "arch supports core serializing membarrier"
>> whatever that means.

It's described in include/uapi/linux/membarrier.h.

>>
>> Seems to be the standard and usual case of utterly poor to non-existent
>> documentation within the kernel tree, or even a pointer to where any
>> useful documentation can be found.
>>
>> Reading the membarrier(2) man page, I find nothing in there that talks
>> about any kind of cache coherency for self-modifying code - it only
>> seems to be about _barriers_ and nothing more, and barriers alone do
>> precisely nothing to save you from non-coherent Harvard caches.

There is no mention of cache coherency because membarrier does not
do anything wrt cache coherency. We should perhaps explicitly point
it out though.

>>
>> So, either Andy has a misunderstanding, or the man page is wrong, or
>> my rudimentary understanding of what membarrier is supposed to be
>> doing is wrong...
> 
> Look at the latest man page:
> 
> https://man7.org/linux/man-pages/man2/membarrier.2.html
> 
> for MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.  The result may not be
> all that enlightening.

As described in the membarrier(2) man page and in include/uapi/linux/membarrier.h,
membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE guarantees core serializing
instructions in addition to the memory ordering guaranteed by MEMBARRIER_CMD_PRIVATE_EXPEDITED.

It does not guarantee anything wrt i/d cache coherency. I initially considered
adding such guarantees when we introduced membarrier sync-core, but decided
against it because it would basically replicate what architectures already
expose to user-space, e.g. flush_icache_user_range on arm32.

So between code modification and allowing other threads to jump to that code,
it should be expected that architectures without coherent i/d cache will need
to flush caches to ensure coherency *and* to issue membarrier to make sure
core serializing instructions will be issued by every thread acting on the
same mm either immediately by means of the IPI, or before they return to
user-space if they do not happen to be currently running when the membarrier
system call is invoked.

Hoping this clarifies things. I suspect we will need to clarify documentation
about what membarrier *does not* guarantee, given that you mistakenly expected
membarrier to take care of cache flushing.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Russell King - ARM Linux admin @ 2020-12-28 20:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linuxppc-dev, Catalin Marinas, Arnd Bergmann, Jann Horn, x86,
	linux-kernel, Nicholas Piggin, Mathieu Desnoyers, stable,
	Paul Mackerras, Will Deacon, linux-arm-kernel
In-Reply-To: <CALCETrVpvrBufrJgXNY=ogtZQLo7zgxQmD7k9eVCFjcdcvarmA@mail.gmail.com>

On Mon, Dec 28, 2020 at 11:44:33AM -0800, Andy Lutomirski wrote:
> On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
> > > After chatting with rmk about this (but without claiming that any of
> > > this is his opinion), based on the manpage, I think membarrier()
> > > currently doesn't really claim to be synchronizing caches? It just
> > > serializes cores. So arguably if userspace wants to use membarrier()
> > > to synchronize code changes, userspace should first do the code
> > > change, then flush icache as appropriate for the architecture, and
> > > then do the membarrier() to ensure that the old code is unused?
> > >
> > > For 32-bit arm, rmk pointed out that that would be the cacheflush()
> > > syscall. That might cause you to end up with two IPIs instead of one
> > > in total, but we probably don't care _that_ much about extra IPIs on
> > > 32-bit arm?
> > >
> > > For arm64, I believe userspace can flush icache across the entire
> > > system with some instructions from userspace - "DC CVAU" followed by
> > > "DSB ISH", or something like that, I think? (See e.g.
> > > compat_arm_syscall(), the arm64 compat code that implements the 32-bit
> > > arm cacheflush() syscall.)
> >
> > Note that the ARM cacheflush syscall calls flush_icache_user_range()
> > over the range of addresses that userspace has passed - it's intention
> > since day one is to support cases where userspace wants to change
> > executable code.
> >
> > It will issue the appropriate write-backs to the data cache (DCCMVAU),
> > the invalidates to the instruction cache (ICIMVAU), invalidate the
> > branch target buffer (BPIALLIS or BPIALL as appropriate), and issue
> > the appropriate barriers (DSB ISHST, ISB).
> >
> > Note that neither flush_icache_user_range() nor flush_icache_range()
> > result in IPIs; cache operations are broadcast across all CPUs (which
> > is one of the minimums we require for SMP systems.)
> >
> > Now, that all said, I think the question that has to be asked is...
> >
> >         What is the basic purpose of membarrier?
> >
> > Is the purpose of it to provide memory barriers, or is it to provide
> > memory coherence?
> >
> > If it's the former and not the latter, then cache flushes are out of
> > scope, and expecting memory written to be visible to the instruction
> > stream is totally out of scope of the membarrier interface, whether
> > or not the writes happen on the same or a different CPU to the one
> > executing the rewritten code.
> >
> > The documentation in the kernel does not seem to describe what it's
> > supposed to be doing - the only thing I could find is this:
> > Documentation/features/sched/membarrier-sync-core/arch-support.txt
> > which describes it as "arch supports core serializing membarrier"
> > whatever that means.
> >
> > Seems to be the standard and usual case of utterly poor to non-existent
> > documentation within the kernel tree, or even a pointer to where any
> > useful documentation can be found.
> >
> > Reading the membarrier(2) man page, I find nothing in there that talks
> > about any kind of cache coherency for self-modifying code - it only
> > seems to be about _barriers_ and nothing more, and barriers alone do
> > precisely nothing to save you from non-coherent Harvard caches.
> >
> > So, either Andy has a misunderstanding, or the man page is wrong, or
> > my rudimentary understanding of what membarrier is supposed to be
> > doing is wrong...
> 
> Look at the latest man page:
> 
> https://man7.org/linux/man-pages/man2/membarrier.2.html
> 
> for MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.  The result may not be
> all that enlightening.

       MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (since Linux 4.16)
              In  addition  to  providing  the  memory ordering guarantees de■
              scribed in MEMBARRIER_CMD_PRIVATE_EXPEDITED,  upon  return  from
              system call the calling thread has a guarantee that all its run■
              ning thread siblings have executed a core  serializing  instruc■
              tion.   This  guarantee is provided only for threads in the same
              process as the calling thread.

              The "expedited" commands complete faster than the  non-expedited
              ones,  they  never block, but have the downside of causing extra
              overhead.

              A process must register its intent to use the private  expedited
              sync core command prior to using it.

This just says that the siblings have executed a serialising
instruction, in other words a barrier. It makes no claims concerning
cache coherency - and without some form of cache maintenance, there
can be no expectation that the I and D streams to be coherent with
each other.

This description is also weird in another respect. "guarantee that
all its running thread siblings have executed a core serializing
instruction" ... "The expedited commands ... never block".

So, the core executing this call is not allowed to block, but the
other part indicates that the other CPUs _have_ executed a serialising
instruction before this call returns... one wonders how that happens
without blocking. Maybe the CPU spins waiting for completion instead?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-28 19:44 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: linuxppc-dev, Catalin Marinas, Arnd Bergmann, Jann Horn, x86,
	linux-kernel, Nicholas Piggin, Mathieu Desnoyers, stable,
	Andy Lutomirski, Paul Mackerras, Will Deacon, linux-arm-kernel
In-Reply-To: <20201228190852.GI1551@shell.armlinux.org.uk>

On Mon, Dec 28, 2020 at 11:09 AM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
> > After chatting with rmk about this (but without claiming that any of
> > this is his opinion), based on the manpage, I think membarrier()
> > currently doesn't really claim to be synchronizing caches? It just
> > serializes cores. So arguably if userspace wants to use membarrier()
> > to synchronize code changes, userspace should first do the code
> > change, then flush icache as appropriate for the architecture, and
> > then do the membarrier() to ensure that the old code is unused?
> >
> > For 32-bit arm, rmk pointed out that that would be the cacheflush()
> > syscall. That might cause you to end up with two IPIs instead of one
> > in total, but we probably don't care _that_ much about extra IPIs on
> > 32-bit arm?
> >
> > For arm64, I believe userspace can flush icache across the entire
> > system with some instructions from userspace - "DC CVAU" followed by
> > "DSB ISH", or something like that, I think? (See e.g.
> > compat_arm_syscall(), the arm64 compat code that implements the 32-bit
> > arm cacheflush() syscall.)
>
> Note that the ARM cacheflush syscall calls flush_icache_user_range()
> over the range of addresses that userspace has passed - it's intention
> since day one is to support cases where userspace wants to change
> executable code.
>
> It will issue the appropriate write-backs to the data cache (DCCMVAU),
> the invalidates to the instruction cache (ICIMVAU), invalidate the
> branch target buffer (BPIALLIS or BPIALL as appropriate), and issue
> the appropriate barriers (DSB ISHST, ISB).
>
> Note that neither flush_icache_user_range() nor flush_icache_range()
> result in IPIs; cache operations are broadcast across all CPUs (which
> is one of the minimums we require for SMP systems.)
>
> Now, that all said, I think the question that has to be asked is...
>
>         What is the basic purpose of membarrier?
>
> Is the purpose of it to provide memory barriers, or is it to provide
> memory coherence?
>
> If it's the former and not the latter, then cache flushes are out of
> scope, and expecting memory written to be visible to the instruction
> stream is totally out of scope of the membarrier interface, whether
> or not the writes happen on the same or a different CPU to the one
> executing the rewritten code.
>
> The documentation in the kernel does not seem to describe what it's
> supposed to be doing - the only thing I could find is this:
> Documentation/features/sched/membarrier-sync-core/arch-support.txt
> which describes it as "arch supports core serializing membarrier"
> whatever that means.
>
> Seems to be the standard and usual case of utterly poor to non-existent
> documentation within the kernel tree, or even a pointer to where any
> useful documentation can be found.
>
> Reading the membarrier(2) man page, I find nothing in there that talks
> about any kind of cache coherency for self-modifying code - it only
> seems to be about _barriers_ and nothing more, and barriers alone do
> precisely nothing to save you from non-coherent Harvard caches.
>
> So, either Andy has a misunderstanding, or the man page is wrong, or
> my rudimentary understanding of what membarrier is supposed to be
> doing is wrong...

Look at the latest man page:

https://man7.org/linux/man-pages/man2/membarrier.2.html

for MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.  The result may not be
all that enlightening.

--Andy

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Russell King - ARM Linux admin @ 2020-12-28 19:08 UTC (permalink / raw)
  To: Jann Horn
  Cc: linuxppc-dev, Catalin Marinas, Arnd Bergmann, x86, linux-kernel,
	Nicholas Piggin, Mathieu Desnoyers, stable, Andy Lutomirski,
	Paul Mackerras, Will Deacon, linux-arm-kernel
In-Reply-To: <CAG48ez0YZ_iy6qZpdGUj38wqeg_NzLHHhU-mBCBf5hcopYGVPg@mail.gmail.com>

On Mon, Dec 28, 2020 at 07:29:34PM +0100, Jann Horn wrote:
> After chatting with rmk about this (but without claiming that any of
> this is his opinion), based on the manpage, I think membarrier()
> currently doesn't really claim to be synchronizing caches? It just
> serializes cores. So arguably if userspace wants to use membarrier()
> to synchronize code changes, userspace should first do the code
> change, then flush icache as appropriate for the architecture, and
> then do the membarrier() to ensure that the old code is unused?
> 
> For 32-bit arm, rmk pointed out that that would be the cacheflush()
> syscall. That might cause you to end up with two IPIs instead of one
> in total, but we probably don't care _that_ much about extra IPIs on
> 32-bit arm?
> 
> For arm64, I believe userspace can flush icache across the entire
> system with some instructions from userspace - "DC CVAU" followed by
> "DSB ISH", or something like that, I think? (See e.g.
> compat_arm_syscall(), the arm64 compat code that implements the 32-bit
> arm cacheflush() syscall.)

Note that the ARM cacheflush syscall calls flush_icache_user_range()
over the range of addresses that userspace has passed - it's intention
since day one is to support cases where userspace wants to change
executable code.

It will issue the appropriate write-backs to the data cache (DCCMVAU),
the invalidates to the instruction cache (ICIMVAU), invalidate the
branch target buffer (BPIALLIS or BPIALL as appropriate), and issue
the appropriate barriers (DSB ISHST, ISB).

Note that neither flush_icache_user_range() nor flush_icache_range()
result in IPIs; cache operations are broadcast across all CPUs (which
is one of the minimums we require for SMP systems.)

Now, that all said, I think the question that has to be asked is...

	What is the basic purpose of membarrier?

Is the purpose of it to provide memory barriers, or is it to provide
memory coherence?

If it's the former and not the latter, then cache flushes are out of
scope, and expecting memory written to be visible to the instruction
stream is totally out of scope of the membarrier interface, whether
or not the writes happen on the same or a different CPU to the one
executing the rewritten code.

The documentation in the kernel does not seem to describe what it's
supposed to be doing - the only thing I could find is this:
Documentation/features/sched/membarrier-sync-core/arch-support.txt
which describes it as "arch supports core serializing membarrier"
whatever that means.

Seems to be the standard and usual case of utterly poor to non-existent
documentation within the kernel tree, or even a pointer to where any
useful documentation can be found.

Reading the membarrier(2) man page, I find nothing in there that talks
about any kind of cache coherency for self-modifying code - it only
seems to be about _barriers_ and nothing more, and barriers alone do
precisely nothing to save you from non-coherent Harvard caches.

So, either Andy has a misunderstanding, or the man page is wrong, or
my rudimentary understanding of what membarrier is supposed to be
doing is wrong...

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-28 18:50 UTC (permalink / raw)
  To: Jann Horn
  Cc: linuxppc-dev, Catalin Marinas, Arnd Bergmann, x86,
	Russell King - ARM Linux admin, Nicholas Piggin, linux-kernel,
	Mathieu Desnoyers, stable, Andy Lutomirski, Paul Mackerras,
	Will Deacon, linux-arm-kernel
In-Reply-To: <CAG48ez0YZ_iy6qZpdGUj38wqeg_NzLHHhU-mBCBf5hcopYGVPg@mail.gmail.com>

On Mon, Dec 28, 2020 at 10:30 AM Jann Horn <jannh@google.com> wrote:
>
> On Mon, Dec 28, 2020 at 6:14 PM Andy Lutomirski <luto@kernel.org> wrote:
> > On Mon, Dec 28, 2020 at 2:25 AM Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
> > >
> > > On Sun, Dec 27, 2020 at 01:36:13PM -0800, Andy Lutomirski wrote:
> > > > On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers
> > > > <mathieu.desnoyers@efficios.com> wrote:
> > > > >
> > > > > ----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@kernel.org wrote:
> > > > >
> > > >
> > > > > >
> > > > > > I admit that I'm rather surprised that the code worked at all on arm64,
> > > > > > and I'm suspicious that it has never been very well tested.  My apologies
> > > > > > for not reviewing this more carefully in the first place.
> > > > >
> > > > > Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
> > > > >
> > > > > It clearly states that only arm, arm64, powerpc and x86 support the membarrier
> > > > > sync core feature as of now:
> > > >
> > > > Sigh, I missed arm (32).  Russell or ARM folks, what's the right
> > > > incantation to make the CPU notice instruction changes initiated by
> > > > other cores on 32-bit ARM?
> > >
> > > You need to call flush_icache_range(), since the changes need to be
> > > flushed from the data cache to the point of unification (of the Harvard
> > > I and D), and the instruction cache needs to be invalidated so it can
> > > then see those updated instructions. This will also take care of the
> > > necessary barriers that the CPU requires for you.
> >
> > With what parameters?   From looking at the header, this is for the
> > case in which the kernel writes some memory and then intends to
> > execute it.  That's not what membarrier() does at all.  membarrier()
> > works like this:
> >
> > User thread 1:
> >
> > write to RWX memory *or* write to an RW alias of an X region.
> > membarrier(...);
> > somehow tell thread 2 that we're ready (with a store release, perhaps,
> > or even just a relaxed store.)
> >
> > User thread 2:
> >
> > wait for the indication from thread 1.
> > barrier();
> > jump to the code.
> >
> > membarrier() is, for better or for worse, not given a range of addresses.
> >
> > On x86, the documentation is a bit weak, but a strict reading
> > indicates that thread 2 must execute a serializing instruction at some
> > point after thread 1 writes the code and before thread 2 runs it.
> > membarrier() does this by sending an IPI and ensuring that a
> > "serializing" instruction (thanks for great terminology, Intel) is
> > executed.  Note that flush_icache_range() is a no-op on x86, and I've
> > asked Intel's architects to please clarify their precise rules.  No
> > response yet.
> >
> > On arm64, flush_icache_range() seems to send an IPI, and that's not
> > what I want.  membarrier() already does an IPI.
>
> After chatting with rmk about this (but without claiming that any of
> this is his opinion), based on the manpage, I think membarrier()
> currently doesn't really claim to be synchronizing caches? It just
> serializes cores. So arguably if userspace wants to use membarrier()
> to synchronize code changes, userspace should first do the code
> change, then flush icache as appropriate for the architecture, and
> then do the membarrier() to ensure that the old code is unused?

I haven't the faintest clue what "serializes cores" means.  It seems
to be a bit of a mishmash of x86 SDM terminology and Linux x86
"sync_core" terminology.  The latter means very little to me, even as
an x86 person.

I'm moderately confident that the *intent* is that a multithreaded
program can write some JIT code to memory, do this membarrier()
operation, and then execute the code, and it will work.  Maybe it's
even intended to work cross-architecture without any additional help
from the program.  But maybe the existing API is simply incorrect for
this.

>
> For 32-bit arm, rmk pointed out that that would be the cacheflush()
> syscall. That might cause you to end up with two IPIs instead of one
> in total, but we probably don't care _that_ much about extra IPIs on
> 32-bit arm?
>
> For arm64, I believe userspace can flush icache across the entire
> system with some instructions from userspace - "DC CVAU" followed by
> "DSB ISH", or something like that, I think? (See e.g.
> compat_arm_syscall(), the arm64 compat code that implements the 32-bit
> arm cacheflush() syscall.)

I have no idea what DC anything does.   Based on my very cursory
reading of the manual, ISB is the right approach, but I don't pretend
I understand all the details.

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Jann Horn @ 2020-12-28 18:29 UTC (permalink / raw)
  To: Andy Lutomirski, Will Deacon
  Cc: Catalin Marinas, Arnd Bergmann, x86,
	Russell King - ARM Linux admin, Nicholas Piggin, linux-kernel,
	Mathieu Desnoyers, stable, Paul Mackerras, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <CALCETrWQx0qwthBc5pJBxs2PWAQo-roAz-6g=7HOs+dsiokVsg@mail.gmail.com>

On Mon, Dec 28, 2020 at 6:14 PM Andy Lutomirski <luto@kernel.org> wrote:
> On Mon, Dec 28, 2020 at 2:25 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Sun, Dec 27, 2020 at 01:36:13PM -0800, Andy Lutomirski wrote:
> > > On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers
> > > <mathieu.desnoyers@efficios.com> wrote:
> > > >
> > > > ----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@kernel.org wrote:
> > > >
> > >
> > > > >
> > > > > I admit that I'm rather surprised that the code worked at all on arm64,
> > > > > and I'm suspicious that it has never been very well tested.  My apologies
> > > > > for not reviewing this more carefully in the first place.
> > > >
> > > > Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
> > > >
> > > > It clearly states that only arm, arm64, powerpc and x86 support the membarrier
> > > > sync core feature as of now:
> > >
> > > Sigh, I missed arm (32).  Russell or ARM folks, what's the right
> > > incantation to make the CPU notice instruction changes initiated by
> > > other cores on 32-bit ARM?
> >
> > You need to call flush_icache_range(), since the changes need to be
> > flushed from the data cache to the point of unification (of the Harvard
> > I and D), and the instruction cache needs to be invalidated so it can
> > then see those updated instructions. This will also take care of the
> > necessary barriers that the CPU requires for you.
>
> With what parameters?   From looking at the header, this is for the
> case in which the kernel writes some memory and then intends to
> execute it.  That's not what membarrier() does at all.  membarrier()
> works like this:
>
> User thread 1:
>
> write to RWX memory *or* write to an RW alias of an X region.
> membarrier(...);
> somehow tell thread 2 that we're ready (with a store release, perhaps,
> or even just a relaxed store.)
>
> User thread 2:
>
> wait for the indication from thread 1.
> barrier();
> jump to the code.
>
> membarrier() is, for better or for worse, not given a range of addresses.
>
> On x86, the documentation is a bit weak, but a strict reading
> indicates that thread 2 must execute a serializing instruction at some
> point after thread 1 writes the code and before thread 2 runs it.
> membarrier() does this by sending an IPI and ensuring that a
> "serializing" instruction (thanks for great terminology, Intel) is
> executed.  Note that flush_icache_range() is a no-op on x86, and I've
> asked Intel's architects to please clarify their precise rules.  No
> response yet.
>
> On arm64, flush_icache_range() seems to send an IPI, and that's not
> what I want.  membarrier() already does an IPI.

After chatting with rmk about this (but without claiming that any of
this is his opinion), based on the manpage, I think membarrier()
currently doesn't really claim to be synchronizing caches? It just
serializes cores. So arguably if userspace wants to use membarrier()
to synchronize code changes, userspace should first do the code
change, then flush icache as appropriate for the architecture, and
then do the membarrier() to ensure that the old code is unused?

For 32-bit arm, rmk pointed out that that would be the cacheflush()
syscall. That might cause you to end up with two IPIs instead of one
in total, but we probably don't care _that_ much about extra IPIs on
32-bit arm?

For arm64, I believe userspace can flush icache across the entire
system with some instructions from userspace - "DC CVAU" followed by
"DSB ISH", or something like that, I think? (See e.g.
compat_arm_syscall(), the arm64 compat code that implements the 32-bit
arm cacheflush() syscall.)

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-28 18:10 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Catalin Marinas, Arnd Bergmann, x86, linux-kernel,
	Nicholas Piggin, Will Deacon, Mathieu Desnoyers, stable,
	Andy Lutomirski, Paul Mackerras, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20201228172301.GH1551@shell.armlinux.org.uk>

On Mon, Dec 28, 2020 at 9:23 AM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Mon, Dec 28, 2020 at 09:14:23AM -0800, Andy Lutomirski wrote:
> > On Mon, Dec 28, 2020 at 2:25 AM Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
> > >
> > > On Sun, Dec 27, 2020 at 01:36:13PM -0800, Andy Lutomirski wrote:
> > > > On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers
> > > > <mathieu.desnoyers@efficios.com> wrote:
> > > > >
> > > > > ----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@kernel.org wrote:
> > > > >
> > > >
> > > > > >
> > > > > > I admit that I'm rather surprised that the code worked at all on arm64,
> > > > > > and I'm suspicious that it has never been very well tested.  My apologies
> > > > > > for not reviewing this more carefully in the first place.
> > > > >
> > > > > Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
> > > > >
> > > > > It clearly states that only arm, arm64, powerpc and x86 support the membarrier
> > > > > sync core feature as of now:
> > > >
> > > > Sigh, I missed arm (32).  Russell or ARM folks, what's the right
> > > > incantation to make the CPU notice instruction changes initiated by
> > > > other cores on 32-bit ARM?
> > >
> > > You need to call flush_icache_range(), since the changes need to be
> > > flushed from the data cache to the point of unification (of the Harvard
> > > I and D), and the instruction cache needs to be invalidated so it can
> > > then see those updated instructions. This will also take care of the
> > > necessary barriers that the CPU requires for you.
> >
> > With what parameters?   From looking at the header, this is for the
> > case in which the kernel writes some memory and then intends to
> > execute it.  That's not what membarrier() does at all.  membarrier()
> > works like this:
>
> You didn't specify that you weren't looking at kernel memory.
>
> If you're talking about userspace, then the interface you require
> is flush_icache_user_range(), which does the same as
> flush_icache_range() but takes userspace addresses. Note that this
> requires that the memory is currently mapped at those userspace
> addresses.
>
> If that doesn't fit your needs, there isn't an interface to do what
> you require, and it basically means creating something brand new
> on every architecture.
>
> What you are asking for is not "just a matter of a few instructions".
> I have stated the required steps to achieve what you require above;
> that is the minimum when you have non-snooping harvard caches, which
> the majority of 32-bit ARMs have.
>
> > User thread 1:
> >
> > write to RWX memory *or* write to an RW alias of an X region.
> > membarrier(...);
> > somehow tell thread 2 that we're ready (with a store release, perhaps,
> > or even just a relaxed store.)
> >
> > User thread 2:
> >
> > wait for the indication from thread 1.
> > barrier();
> > jump to the code.
> >
> > membarrier() is, for better or for worse, not given a range of addresses.
>
> Then, I'm sorry, it can't work on 32-bit ARM.

Is there a way to flush the *entire* user icache?  If so, and if it
has reasonable performance, then it could probably be used here.
Otherwise I'll just send a revert for this whole mechanism on 32-bit
ARM.

--Andy

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Russell King - ARM Linux admin @ 2020-12-28 17:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Catalin Marinas, Arnd Bergmann, x86, linux-kernel,
	Nicholas Piggin, Will Deacon, Mathieu Desnoyers, stable,
	Paul Mackerras, linuxppc-dev, linux-arm-kernel
In-Reply-To: <CALCETrWQx0qwthBc5pJBxs2PWAQo-roAz-6g=7HOs+dsiokVsg@mail.gmail.com>

On Mon, Dec 28, 2020 at 09:14:23AM -0800, Andy Lutomirski wrote:
> On Mon, Dec 28, 2020 at 2:25 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Sun, Dec 27, 2020 at 01:36:13PM -0800, Andy Lutomirski wrote:
> > > On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers
> > > <mathieu.desnoyers@efficios.com> wrote:
> > > >
> > > > ----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@kernel.org wrote:
> > > >
> > >
> > > > >
> > > > > I admit that I'm rather surprised that the code worked at all on arm64,
> > > > > and I'm suspicious that it has never been very well tested.  My apologies
> > > > > for not reviewing this more carefully in the first place.
> > > >
> > > > Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
> > > >
> > > > It clearly states that only arm, arm64, powerpc and x86 support the membarrier
> > > > sync core feature as of now:
> > >
> > > Sigh, I missed arm (32).  Russell or ARM folks, what's the right
> > > incantation to make the CPU notice instruction changes initiated by
> > > other cores on 32-bit ARM?
> >
> > You need to call flush_icache_range(), since the changes need to be
> > flushed from the data cache to the point of unification (of the Harvard
> > I and D), and the instruction cache needs to be invalidated so it can
> > then see those updated instructions. This will also take care of the
> > necessary barriers that the CPU requires for you.
> 
> With what parameters?   From looking at the header, this is for the
> case in which the kernel writes some memory and then intends to
> execute it.  That's not what membarrier() does at all.  membarrier()
> works like this:

You didn't specify that you weren't looking at kernel memory.

If you're talking about userspace, then the interface you require
is flush_icache_user_range(), which does the same as
flush_icache_range() but takes userspace addresses. Note that this
requires that the memory is currently mapped at those userspace
addresses.

If that doesn't fit your needs, there isn't an interface to do what
you require, and it basically means creating something brand new
on every architecture.

What you are asking for is not "just a matter of a few instructions".
I have stated the required steps to achieve what you require above;
that is the minimum when you have non-snooping harvard caches, which
the majority of 32-bit ARMs have.

> User thread 1:
> 
> write to RWX memory *or* write to an RW alias of an X region.
> membarrier(...);
> somehow tell thread 2 that we're ready (with a store release, perhaps,
> or even just a relaxed store.)
> 
> User thread 2:
> 
> wait for the indication from thread 1.
> barrier();
> jump to the code.
> 
> membarrier() is, for better or for worse, not given a range of addresses.

Then, I'm sorry, it can't work on 32-bit ARM.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-28 17:14 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Catalin Marinas, Arnd Bergmann, x86, linux-kernel,
	Nicholas Piggin, Will Deacon, Mathieu Desnoyers, stable,
	Andy Lutomirski, Paul Mackerras, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20201228102537.GG1551@shell.armlinux.org.uk>

On Mon, Dec 28, 2020 at 2:25 AM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Sun, Dec 27, 2020 at 01:36:13PM -0800, Andy Lutomirski wrote:
> > On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> > >
> > > ----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@kernel.org wrote:
> > >
> >
> > > >
> > > > I admit that I'm rather surprised that the code worked at all on arm64,
> > > > and I'm suspicious that it has never been very well tested.  My apologies
> > > > for not reviewing this more carefully in the first place.
> > >
> > > Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
> > >
> > > It clearly states that only arm, arm64, powerpc and x86 support the membarrier
> > > sync core feature as of now:
> >
> > Sigh, I missed arm (32).  Russell or ARM folks, what's the right
> > incantation to make the CPU notice instruction changes initiated by
> > other cores on 32-bit ARM?
>
> You need to call flush_icache_range(), since the changes need to be
> flushed from the data cache to the point of unification (of the Harvard
> I and D), and the instruction cache needs to be invalidated so it can
> then see those updated instructions. This will also take care of the
> necessary barriers that the CPU requires for you.

With what parameters?   From looking at the header, this is for the
case in which the kernel writes some memory and then intends to
execute it.  That's not what membarrier() does at all.  membarrier()
works like this:

User thread 1:

write to RWX memory *or* write to an RW alias of an X region.
membarrier(...);
somehow tell thread 2 that we're ready (with a store release, perhaps,
or even just a relaxed store.)

User thread 2:

wait for the indication from thread 1.
barrier();
jump to the code.

membarrier() is, for better or for worse, not given a range of addresses.

On x86, the documentation is a bit weak, but a strict reading
indicates that thread 2 must execute a serializing instruction at some
point after thread 1 writes the code and before thread 2 runs it.
membarrier() does this by sending an IPI and ensuring that a
"serializing" instruction (thanks for great terminology, Intel) is
executed.  Note that flush_icache_range() is a no-op on x86, and I've
asked Intel's architects to please clarify their precise rules.  No
response yet.

On arm64, flush_icache_range() seems to send an IPI, and that's not
what I want.  membarrier() already does an IPI.

I suppose one option is to revert support for this membarrier()
operation on arm, but it would be nice to just implement it instead.

--Andy

^ permalink raw reply

* [Bug 200055] [Bisected][Regression] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3214 .__lockdep_init_map+0x260/0x270
From: bugzilla-daemon @ 2020-12-28 14:23 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-200055-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=200055

--- Comment #23 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 294377
  --> https://bugzilla.kernel.org/attachment.cgi?id=294377&action=edit
kernel .config (kernel 5.11-rc1, G5 11,2)

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* [Bug 200055] [Bisected][Regression] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3214 .__lockdep_init_map+0x260/0x270
From: bugzilla-daemon @ 2020-12-28 14:23 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-200055-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=200055

Erhard F. (erhard_f@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #283675|0                           |1
        is obsolete|                            |
 Attachment #283677|0                           |1
        is obsolete|                            |

--- Comment #22 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 294375
  --> https://bugzilla.kernel.org/attachment.cgi?id=294375&action=edit
dmesg (kernel 5.11-rc1, G5 11,2)

Still in 5.11-rc1.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Russell King - ARM Linux admin @ 2020-12-28 10:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Catalin Marinas, Arnd Bergmann, x86, linux-kernel,
	Nicholas Piggin, Will Deacon, Mathieu Desnoyers, stable,
	Paul Mackerras, linuxppc-dev, linux-arm-kernel
In-Reply-To: <CALCETrVdcn2r2Jvd1=-bM=FQ8KbX4aH-v4ytdojL7r7Nb6k8YQ@mail.gmail.com>

On Sun, Dec 27, 2020 at 01:36:13PM -0800, Andy Lutomirski wrote:
> On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> >
> > ----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@kernel.org wrote:
> >
> 
> > >
> > > I admit that I'm rather surprised that the code worked at all on arm64,
> > > and I'm suspicious that it has never been very well tested.  My apologies
> > > for not reviewing this more carefully in the first place.
> >
> > Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
> >
> > It clearly states that only arm, arm64, powerpc and x86 support the membarrier
> > sync core feature as of now:
> 
> Sigh, I missed arm (32).  Russell or ARM folks, what's the right
> incantation to make the CPU notice instruction changes initiated by
> other cores on 32-bit ARM?

You need to call flush_icache_range(), since the changes need to be
flushed from the data cache to the point of unification (of the Harvard
I and D), and the instruction cache needs to be invalidated so it can
then see those updated instructions. This will also take care of the
necessary barriers that the CPU requires for you.

... as documented in Documentation/core-api/cachetlb.rst and so
should be available on every kernel supported CPU.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [RFC Qemu PATCH v2 1/2] spapr: drc: Add support for async hcalls at the drc level
From: David Gibson @ 2020-12-28  8:38 UTC (permalink / raw)
  To: Greg Kurz
  Cc: xiaoguangrong.eric, Shivaprasad G Bhat, mst, aneesh.kumar,
	linux-nvdimm, qemu-devel, kvm-ppc, shivaprasadbhat, qemu-ppc,
	bharata, imammedo, linuxppc-dev
In-Reply-To: <20201221130853.15c8ddfd@bahia.lan>

[-- Attachment #1: Type: text/plain, Size: 11196 bytes --]

On Mon, Dec 21, 2020 at 01:08:53PM +0100, Greg Kurz wrote:
> Hi Shiva,
> 
> On Mon, 30 Nov 2020 09:16:39 -0600
> Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:
> 
> > The patch adds support for async hcalls at the DRC level for the
> > spapr devices. To be used by spapr-scm devices in the patch/es to follow.
> > 
> > Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> > ---
> 
> The overall idea looks good but I think you should consider using
> a thread pool to implement it. See below.

I am not convinced, however.  Specifically, attaching this to the DRC
doesn't make sense to me.  We're adding exactly one DRC related async
hcall, and I can't really see much call for another one.  We could
have other async hcalls - indeed we already have one for HPT resizing
- but attaching this to DRCs doesn't help for those.

> 
> >  hw/ppc/spapr_drc.c         |  149 ++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr_drc.h |   25 +++++++
> >  2 files changed, 174 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index 77718cde1f..4ecd04f686 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -15,6 +15,7 @@
> >  #include "qapi/qmp/qnull.h"
> >  #include "cpu.h"
> >  #include "qemu/cutils.h"
> > +#include "qemu/guest-random.h"
> >  #include "hw/ppc/spapr_drc.h"
> >  #include "qom/object.h"
> >  #include "migration/vmstate.h"
> > @@ -421,6 +422,148 @@ void spapr_drc_detach(SpaprDrc *drc)
> >      spapr_drc_release(drc);
> >  }
> >  
> > +
> > +/*
> > + * @drc : device DRC targetting which the async hcalls to be made.
> > + *
> > + * All subsequent requests to run/query the status should use the
> > + * unique token returned here.
> > + */
> > +uint64_t spapr_drc_get_new_async_hcall_token(SpaprDrc *drc)
> > +{
> > +    Error *err = NULL;
> > +    uint64_t token;
> > +    SpaprDrcDeviceAsyncHCallState *tmp, *next, *state;
> > +
> > +    state = g_malloc0(sizeof(*state));
> > +    state->pending = true;
> > +
> > +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> > +retry:
> > +    if (qemu_guest_getrandom(&token, sizeof(token), &err) < 0) {
> > +        error_report_err(err);
> > +        g_free(state);
> > +        qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +        return 0;
> > +    }
> > +
> > +    if (!token) /* Token should be non-zero */
> > +        goto retry;
> > +
> > +    if (!QLIST_EMPTY(&drc->async_hcall_states)) {
> > +        QLIST_FOREACH_SAFE(tmp, &drc->async_hcall_states, node, next) {
> > +            if (tmp->continue_token == token) {
> > +                /* If the token already in use, get a new one */
> > +                goto retry;
> > +            }
> > +        }
> > +    }
> > +
> > +    state->continue_token = token;
> > +    QLIST_INSERT_HEAD(&drc->async_hcall_states, state, node);
> > +
> > +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +
> > +    return state->continue_token;
> > +}
> > +
> > +static void *spapr_drc_async_hcall_runner(void *opaque)
> > +{
> > +    int response = -1;
> > +    SpaprDrcDeviceAsyncHCallState *state = opaque;
> > +
> > +    /*
> > +     * state is freed only after this thread finishes(after pthread_join()),
> > +     * don't worry about it becoming NULL.
> > +     */
> > +
> > +    response = state->func(state->data);
> > +
> > +    state->hcall_ret = response;
> > +    state->pending = 0;
> > +
> > +    return NULL;
> > +}
> > +
> > +/*
> > + * @drc  : device DRC targetting which the async hcalls to be made.
> > + * token : The continue token to be used for tracking as recived from
> > + *         spapr_drc_get_new_async_hcall_token
> > + * @func() : the worker function which needs to be executed asynchronously
> > + * @data : data to be passed to the asynchronous function. Worker is supposed
> > + *         to free/cleanup the data that is passed here
> 
> It'd be cleaner to pass a completion callback and have free/cleanup handled there.
> 
> > + */
> > +void spapr_drc_run_async_hcall(SpaprDrc *drc, uint64_t token,
> > +                               SpaprDrcAsyncHcallWorkerFunc *func, void *data)
> > +{
> > +    SpaprDrcDeviceAsyncHCallState *state;
> > +
> > +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> > +    QLIST_FOREACH(state, &drc->async_hcall_states, node) {
> > +        if (state->continue_token == token) {
> > +            state->func = func;
> > +            state->data = data;
> > +            qemu_thread_create(&state->thread, "sPAPR Async HCALL",
> > +                               spapr_drc_async_hcall_runner, state,
> > +                               QEMU_THREAD_JOINABLE);
> 
> qemu_thread_create() exits on failure, it shouldn't be called on
> a guest triggerable path, eg. a buggy guest could call it up to
> the point that pthread_create() returns EAGAIN.
> 
> Please use a thread pool (see thread_pool_submit_aio()). This takes care
> of all the thread housekeeping for you in a safe way, and it provides a
> completion callback API. The implementation could then be just about
> having two lists: one for pending requests (fed here) and one for
> completed requests (fed by the completion callback).
> 
> > +            break;
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +}
> > +
> > +/*
> > + * spapr_drc_finish_async_hcalls
> > + *      Waits for all pending async requests to complete
> > + *      thier execution and free the states
> > + */
> > +static void spapr_drc_finish_async_hcalls(SpaprDrc *drc)
> > +{
> > +    SpaprDrcDeviceAsyncHCallState *state, *next;
> > +
> > +    if (QLIST_EMPTY(&drc->async_hcall_states)) {
> > +        return;
> > +    }
> > +
> > +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> > +    QLIST_FOREACH_SAFE(state, &drc->async_hcall_states, node, next) {
> > +        qemu_thread_join(&state->thread);
> 
> With a thread-pool, you'd just need to aio_poll() until the pending list
> is empty and then clear the completed list.
> 
> > +        QLIST_REMOVE(state, node);
> > +        g_free(state);
> > +    }
> > +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +}
> > +
> > +/*
> > + * spapr_drc_get_async_hcall_status
> > + *      Fetches the status of the hcall worker and returns H_BUSY
> > + *      if the worker is still running.
> > + */
> > +int spapr_drc_get_async_hcall_status(SpaprDrc *drc, uint64_t token)
> > +{
> > +    int ret = H_PARAMETER;
> > +    SpaprDrcDeviceAsyncHCallState *state, *node;
> > +
> > +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> > +    QLIST_FOREACH_SAFE(state, &drc->async_hcall_states, node, node) {
> > +        if (state->continue_token == token) {
> > +            if (state->pending) {
> > +                ret = H_BUSY;
> > +                break;
> > +            } else {
> > +                ret = state->hcall_ret;
> > +                qemu_thread_join(&state->thread);
> 
> Like for qemu_thread_create(), the guest shouldn't be responsible for
> thread housekeeping. Getting the hcall status should just be about
> finding the token in the pending or completed lists.
> 
> > +                QLIST_REMOVE(state, node);
> > +                g_free(state);
> > +                break;
> > +            }
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +
> > +    return ret;
> > +}
> > +
> >  void spapr_drc_reset(SpaprDrc *drc)
> >  {
> >      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > @@ -448,6 +591,7 @@ void spapr_drc_reset(SpaprDrc *drc)
> >          drc->ccs_offset = -1;
> >          drc->ccs_depth = -1;
> >      }
> > +    spapr_drc_finish_async_hcalls(drc);
> >  }
> >  
> >  static bool spapr_drc_unplug_requested_needed(void *opaque)
> > @@ -558,6 +702,7 @@ SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> >      drc->owner = owner;
> >      prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
> >                                  spapr_drc_index(drc));
> > +
> 
> Unrelated change.
> 
> >      object_property_add_child(owner, prop_name, OBJECT(drc));
> >      object_unref(OBJECT(drc));
> >      qdev_realize(DEVICE(drc), NULL, NULL);
> > @@ -577,6 +722,10 @@ static void spapr_dr_connector_instance_init(Object *obj)
> >      object_property_add(obj, "fdt", "struct", prop_get_fdt,
> >                          NULL, NULL, NULL);
> >      drc->state = drck->empty_state;
> > +
> > +    qemu_mutex_init(&drc->async_hcall_states_lock);
> > +    QLIST_INIT(&drc->async_hcall_states);
> > +
> 
> Empty line not needed.
> 
> >  }
> >  
> >  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > index 165b281496..77f6e4386c 100644
> > --- a/include/hw/ppc/spapr_drc.h
> > +++ b/include/hw/ppc/spapr_drc.h
> > @@ -18,6 +18,7 @@
> >  #include "sysemu/runstate.h"
> >  #include "hw/qdev-core.h"
> >  #include "qapi/error.h"
> > +#include "block/thread-pool.h"
> >  
> >  #define TYPE_SPAPR_DR_CONNECTOR "spapr-dr-connector"
> >  #define SPAPR_DR_CONNECTOR_GET_CLASS(obj) \
> > @@ -168,6 +169,21 @@ typedef enum {
> >      SPAPR_DRC_STATE_PHYSICAL_CONFIGURED = 8,
> >  } SpaprDrcState;
> >  
> > +typedef struct SpaprDrc SpaprDrc;
> > +
> > +typedef int SpaprDrcAsyncHcallWorkerFunc(void *opaque);
> > +typedef struct SpaprDrcDeviceAsyncHCallState {
> > +    uint64_t continue_token;
> > +    bool pending;
> > +
> > +    int hcall_ret;
> > +    SpaprDrcAsyncHcallWorkerFunc *func;
> > +    void *data;
> > +
> > +    QemuThread thread;
> > +
> > +    QLIST_ENTRY(SpaprDrcDeviceAsyncHCallState) node;
> > +} SpaprDrcDeviceAsyncHCallState;
> >  typedef struct SpaprDrc {
> >      /*< private >*/
> >      DeviceState parent;
> > @@ -182,6 +198,10 @@ typedef struct SpaprDrc {
> >      int ccs_offset;
> >      int ccs_depth;
> >  
> > +    /* async hcall states */
> > +    QemuMutex async_hcall_states_lock;
> > +    QLIST_HEAD(, SpaprDrcDeviceAsyncHCallState) async_hcall_states;
> > +
> >      /* device pointer, via link property */
> >      DeviceState *dev;
> >      bool unplug_requested;
> > @@ -241,6 +261,11 @@ void spapr_drc_detach(SpaprDrc *drc);
> >  /* Returns true if a hot plug/unplug request is pending */
> >  bool spapr_drc_transient(SpaprDrc *drc);
> >  
> > +uint64_t spapr_drc_get_new_async_hcall_token(SpaprDrc *drc);
> > +void spapr_drc_run_async_hcall(SpaprDrc *drc, uint64_t token,
> > +                               SpaprDrcAsyncHcallWorkerFunc, void *data);
> > +int spapr_drc_get_async_hcall_status(SpaprDrc *drc, uint64_t token);
> > +
> >  static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
> >  {
> >      return drc->unplug_requested;
> > 
> > 
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v3] powerpc/perf/hv-24x7: Dont create sysfs event files for dummy events
From: Kajol Jain @ 2020-12-28  8:52 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: kjain, suka, maddy, atrajeev

hv_24x7 performance monitoring unit creates list of supported events
from the event catalog obtained via HCALL. hv_24x7 catalog could also
contain invalid or dummy events with names like RESERVED*.
These events does not have any hardware counters backing them.
So patch adds a check to string compare the event names
to filter out them.

Result in power9 machine:

Before this patch:
.....
  hv_24x7/PM_XLINK2_OUT_ODD_CYC,chip=?/              [Kernel PMU event]
  hv_24x7/PM_XLINK2_OUT_ODD_DATA_COUNT,chip=?/       [Kernel PMU event]
  hv_24x7/PM_XLINK2_OUT_ODD_TOTAL_UTIL,chip=?/       [Kernel PMU event]
  hv_24x7/PM_XTS_ATR_DEMAND_CHECKOUT,chip=?/         [Kernel PMU event]
  hv_24x7/PM_XTS_ATR_DEMAND_CHECKOUT_MISS,chip=?/    [Kernel PMU event]
  hv_24x7/PM_XTS_ATSD_SENT,chip=?/                   [Kernel PMU event]
  hv_24x7/PM_XTS_ATSD_TLBI_RCV,chip=?/               [Kernel PMU event]
  hv_24x7/RESERVED_NEST1,chip=?/                     [Kernel PMU event]
  hv_24x7/RESERVED_NEST10,chip=?/                    [Kernel PMU event]
  hv_24x7/RESERVED_NEST11,chip=?/                    [Kernel PMU event]
  hv_24x7/RESERVED_NEST12,chip=?/                    [Kernel PMU event]
  hv_24x7/RESERVED_NEST13,chip=?/                    [Kernel PMU event]
......

Dmesg:
[    0.000362] printk: console [hvc0] enabled
[    0.815452] hv-24x7: read 1530 catalog entries, created 537 event attrs
(0 failures), 275 descs

After this patch:
......
  hv_24x7/PM_XLINK2_OUT_ODD_AVLBL_CYC,chip=?/        [Kernel PMU event]
  hv_24x7/PM_XLINK2_OUT_ODD_CYC,chip=?/              [Kernel PMU event]
  hv_24x7/PM_XLINK2_OUT_ODD_DATA_COUNT,chip=?/       [Kernel PMU event]
  hv_24x7/PM_XLINK2_OUT_ODD_TOTAL_UTIL,chip=?/       [Kernel PMU event]
  hv_24x7/PM_XTS_ATR_DEMAND_CHECKOUT,chip=?/         [Kernel PMU event]
  hv_24x7/PM_XTS_ATR_DEMAND_CHECKOUT_MISS,chip=?/    [Kernel PMU event]
  hv_24x7/PM_XTS_ATSD_SENT,chip=?/                   [Kernel PMU event]
  hv_24x7/PM_XTS_ATSD_TLBI_RCV,chip=?/               [Kernel PMU event]
  hv_24x7/TOD,chip=?/                                [Kernel PMU event]
......

Demsg:
[    0.000357] printk: console [hvc0] enabled
[    0.808592] hv-24x7: read 1530 catalog entries, created 509 event attrs
(0 failures), 275 descs

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

---
Changelog
v2 -> v3
- Removed "FREE_" check from "ignore_event".

v1 -> v2
- Include "RESERVED*" as part of the invalid event check as
  suggested by Madhavan Srinivasan
- Add new helper function "ignore_event" to check invalid/dummy
  events as suggested by Michael Ellerman
- Remove pr_info to print each invalid event as suggested by
  Michael Ellerman
---
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 6e7e820508df..2f32b532b359 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -764,6 +764,14 @@ static ssize_t catalog_event_len_validate(struct hv_24x7_event_data *event,
 	return ev_len;
 }
 
+/*
+ * Return true incase of invalid or dummy events with names like RESERVED*
+ */
+static bool ignore_event(const char *name)
+{
+	return (!strncmp(name, "RESERVED", 8)) ? true : false;
+}
+
 #define MAX_4K (SIZE_MAX / 4096)
 
 static int create_events_from_catalog(struct attribute ***events_,
@@ -894,6 +902,10 @@ static int create_events_from_catalog(struct attribute ***events_,
 
 		name = event_name(event, &nl);
 
+		if (ignore_event(name)) {
+			junk_events++;
+			continue;
+		}
 		if (event->event_group_record_len == 0) {
 			pr_devel("invalid event %zu (%.*s): group_record_len == 0, skipping\n",
 					event_idx, nl, name);
@@ -955,6 +967,9 @@ static int create_events_from_catalog(struct attribute ***events_,
 			continue;
 
 		name  = event_name(event, &nl);
+		if (ignore_event(name))
+			continue;
+
 		nonce = event_uniq_add(&ev_uniq, name, nl, event->domain);
 		ct    = event_data_to_attrs(event_idx, events + event_attr_ct,
 					    event, nonce);
-- 
2.27.0


^ permalink raw reply related

* [PATCH] selftests/powerpc: make the test check in eeh-basic.sh posix compliant
From: Po-Hsu Lin @ 2020-12-28  4:34 UTC (permalink / raw)
  To: linux-kernel, linux-kselftest, linuxppc-dev
  Cc: oohall, po-hsu.lin, shuah, paulus

The == operand is a bash extension, thus this will fail on Ubuntu with

As the /bin/sh on Ubuntu is pointed to DASH.

Use -eq to fix this posix compatibility issue.

Fixes: 996f9e0f93f162 ("selftests/powerpc: Fix eeh-basic.sh exit codes")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
---
 tools/testing/selftests/powerpc/eeh/eeh-basic.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
index 0d783e1..64779f0 100755
--- a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
@@ -86,5 +86,5 @@ echo "$failed devices failed to recover ($dev_count tested)"
 lspci | diff -u $pre_lspci -
 rm -f $pre_lspci
 
-test "$failed" == 0
+test "$failed" -eq 0
 exit $?
-- 
2.7.4


^ permalink raw reply related

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-27 21:36 UTC (permalink / raw)
  To: Mathieu Desnoyers, Russell King
  Cc: Catalin Marinas, Arnd Bergmann, x86, linux-kernel,
	Nicholas Piggin, Will Deacon, Paul Mackerras, stable,
	Andy Lutomirski, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1836294649.3345.1609100294833.JavaMail.zimbra@efficios.com>

On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@kernel.org wrote:
>

> >
> > I admit that I'm rather surprised that the code worked at all on arm64,
> > and I'm suspicious that it has never been very well tested.  My apologies
> > for not reviewing this more carefully in the first place.
>
> Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
>
> It clearly states that only arm, arm64, powerpc and x86 support the membarrier
> sync core feature as of now:

Sigh, I missed arm (32).  Russell or ARM folks, what's the right
incantation to make the CPU notice instruction changes initiated by
other cores on 32-bit ARM?

>
>
> # Architecture requirements
> #
> # * arm/arm64/powerpc
> #
> # Rely on implicit context synchronization as a result of exception return
> # when returning from IPI handler, and when returning to user-space.
> #
> # * x86
> #
> # x86-32 uses IRET as return from interrupt, which takes care of the IPI.
> # However, it uses both IRET and SYSEXIT to go back to user-space. The IRET
> # instruction is core serializing, but not SYSEXIT.
> #
> # x86-64 uses IRET as return from interrupt, which takes care of the IPI.
> # However, it can return to user-space through either SYSRETL (compat code),
> # SYSRETQ, or IRET.
> #
> # Given that neither SYSRET{L,Q}, nor SYSEXIT, are core serializing, we rely
> # instead on write_cr3() performed by switch_mm() to provide core serialization
> # after changing the current mm, and deal with the special case of kthread ->
> # uthread (temporarily keeping current mm into active_mm) by issuing a
> # sync_core_before_usermode() in that specific case.
>

I need to update that document as part of my series.

> This is based on direct feedback from the architecture maintainers.
>
> You seem to have noticed odd cases on arm64 where this guarantee does not
> match reality. Where exactly can we find this in the code, and which part
> of the architecture manual can you point us to which supports your concern ?
>
> Based on the notes I have, use of `eret` on aarch64 guarantees a context synchronizing
> instruction when returning to user-space.

Based on my reading of the manual, ERET on ARM doesn't synchronize
anything at all.  I can't find any evidence that it synchronizes data
or instructions, and I've seen reports that the CPU will happily
speculate right past it.

--Andy

^ permalink raw reply

* Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Mathieu Desnoyers @ 2020-12-27 20:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Catalin Marinas, Arnd Bergmann, x86, linux-kernel,
	Nicholas Piggin, Will Deacon, Paul Mackerras, stable,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <bf59ecb5487171a852bcc8cdd553ec797aedc485.1609093476.git.luto@kernel.org>

----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@kernel.org wrote:

> The old sync_core_before_usermode() comments said that a non-icache-syncing
> return-to-usermode instruction is x86-specific and that all other
> architectures automatically notice cross-modified code on return to
> userspace.  Based on my general understanding of how CPUs work and based on
> my atttempt to read the ARM manual, this is not true at all.  In fact, x86
> seems to be a bit of an anomaly in the other direction: x86's IRET is
> unusually heavyweight for a return-to-usermode instruction.
> 
> So let's drop any pretense that we can have a generic way implementation
> behind membarrier's SYNC_CORE flush and require all architectures that opt
> in to supply their own.

Removing the generic implementation is OK with me, as this will really require
architecture maintainers to think hard about it when porting this feature.

> This means x86, arm64, and powerpc for now.  Let's
> also rename the function from sync_core_before_usermode() to
> membarrier_sync_core_before_usermode() because the precise flushing details
> may very well be specific to membarrier, and even the concept of
> "sync_core" in the kernel is mostly an x86-ism.

Work for me too.

> 
> I admit that I'm rather surprised that the code worked at all on arm64,
> and I'm suspicious that it has never been very well tested.  My apologies
> for not reviewing this more carefully in the first place.

Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt

It clearly states that only arm, arm64, powerpc and x86 support the membarrier
sync core feature as of now:


# Architecture requirements
#
# * arm/arm64/powerpc
#
# Rely on implicit context synchronization as a result of exception return
# when returning from IPI handler, and when returning to user-space.
#
# * x86
#
# x86-32 uses IRET as return from interrupt, which takes care of the IPI.
# However, it uses both IRET and SYSEXIT to go back to user-space. The IRET
# instruction is core serializing, but not SYSEXIT.
#
# x86-64 uses IRET as return from interrupt, which takes care of the IPI.
# However, it can return to user-space through either SYSRETL (compat code),
# SYSRETQ, or IRET.
#
# Given that neither SYSRET{L,Q}, nor SYSEXIT, are core serializing, we rely
# instead on write_cr3() performed by switch_mm() to provide core serialization
# after changing the current mm, and deal with the special case of kthread ->
# uthread (temporarily keeping current mm into active_mm) by issuing a
# sync_core_before_usermode() in that specific case.

This is based on direct feedback from the architecture maintainers.

You seem to have noticed odd cases on arm64 where this guarantee does not
match reality. Where exactly can we find this in the code, and which part
of the architecture manual can you point us to which supports your concern ?

Based on the notes I have, use of `eret` on aarch64 guarantees a context synchronizing
instruction when returning to user-space.

Thanks,

Mathieu

> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: x86@kernel.org
> Cc: stable@vger.kernel.org
> Fixes: 70216e18e519 ("membarrier: Provide core serializing command,
> *_SYNC_CORE")
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> 
> Hi arm64 and powerpc people-
> 
> This is part of a series here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/fixes
> 
> Before I send out the whole series, I'm hoping that some arm64 and powerpc
> people can help me verify that I did this patch right.  Once I get
> some feedback on this patch, I'll send out the whole pile.  And once
> *that's* done, I'll start giving the mm lazy stuff some serious thought.
> 
> The x86 part is already fixed in Linus' tree.
> 
> Thanks,
> Andy
> 
> arch/arm64/include/asm/sync_core.h   | 21 +++++++++++++++++++++
> arch/powerpc/include/asm/sync_core.h | 20 ++++++++++++++++++++
> arch/x86/Kconfig                     |  1 -
> arch/x86/include/asm/sync_core.h     |  7 +++----
> include/linux/sched/mm.h             |  1 -
> include/linux/sync_core.h            | 21 ---------------------
> init/Kconfig                         |  3 ---
> kernel/sched/membarrier.c            | 15 +++++++++++----
> 8 files changed, 55 insertions(+), 34 deletions(-)
> create mode 100644 arch/arm64/include/asm/sync_core.h
> create mode 100644 arch/powerpc/include/asm/sync_core.h
> delete mode 100644 include/linux/sync_core.h
> 
> diff --git a/arch/arm64/include/asm/sync_core.h
> b/arch/arm64/include/asm/sync_core.h
> new file mode 100644
> index 000000000000..5be4531caabd
> --- /dev/null
> +++ b/arch/arm64/include/asm/sync_core.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_ARM64_SYNC_CORE_H
> +#define _ASM_ARM64_SYNC_CORE_H
> +
> +#include <asm/barrier.h>
> +
> +/*
> + * Ensure that the CPU notices any instruction changes before the next time
> + * it returns to usermode.
> + */
> +static inline void membarrier_sync_core_before_usermode(void)
> +{
> +	/*
> +	 * XXX: is this enough or do we need a DMB first to make sure that
> +	 * writes from other CPUs become visible to this CPU?  We have an
> +	 * smp_mb() already, but that's not quite the same thing.
> +	 */
> +	isb();
> +}
> +
> +#endif /* _ASM_ARM64_SYNC_CORE_H */
> diff --git a/arch/powerpc/include/asm/sync_core.h
> b/arch/powerpc/include/asm/sync_core.h
> new file mode 100644
> index 000000000000..71dfbe7794e5
> --- /dev/null
> +++ b/arch/powerpc/include/asm/sync_core.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_SYNC_CORE_H
> +#define _ASM_POWERPC_SYNC_CORE_H
> +
> +#include <asm/barrier.h>
> +
> +/*
> + * Ensure that the CPU notices any instruction changes before the next time
> + * it returns to usermode.
> + */
> +static inline void membarrier_sync_core_before_usermode(void)
> +{
> +	/*
> +	 * XXX: I know basically nothing about powerpc cache management.
> +	 * Is this correct?
> +	 */
> +	isync();
> +}
> +
> +#endif /* _ASM_POWERPC_SYNC_CORE_H */
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b5137cc5b7b4..895f70fd4a61 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -81,7 +81,6 @@ config X86
> 	select ARCH_HAS_SET_DIRECT_MAP
> 	select ARCH_HAS_STRICT_KERNEL_RWX
> 	select ARCH_HAS_STRICT_MODULE_RWX
> -	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
> 	select ARCH_HAS_SYSCALL_WRAPPER
> 	select ARCH_HAS_UBSAN_SANITIZE_ALL
> 	select ARCH_HAS_DEBUG_WX
> diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
> index ab7382f92aff..c665b453969a 100644
> --- a/arch/x86/include/asm/sync_core.h
> +++ b/arch/x86/include/asm/sync_core.h
> @@ -89,11 +89,10 @@ static inline void sync_core(void)
> }
> 
> /*
> - * Ensure that a core serializing instruction is issued before returning
> - * to user-mode. x86 implements return to user-space through sysexit,
> - * sysrel, and sysretq, which are not core serializing.
> + * Ensure that the CPU notices any instruction changes before the next time
> + * it returns to usermode.
>  */
> -static inline void sync_core_before_usermode(void)
> +static inline void membarrier_sync_core_before_usermode(void)
> {
> 	/* With PTI, we unconditionally serialize before running user code. */
> 	if (static_cpu_has(X86_FEATURE_PTI))
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 48640db6ca86..81ba47910a73 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -7,7 +7,6 @@
> #include <linux/sched.h>
> #include <linux/mm_types.h>
> #include <linux/gfp.h>
> -#include <linux/sync_core.h>
> 
> /*
>  * Routines for handling mm_structs
> diff --git a/include/linux/sync_core.h b/include/linux/sync_core.h
> deleted file mode 100644
> index 013da4b8b327..000000000000
> --- a/include/linux/sync_core.h
> +++ /dev/null
> @@ -1,21 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef _LINUX_SYNC_CORE_H
> -#define _LINUX_SYNC_CORE_H
> -
> -#ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
> -#include <asm/sync_core.h>
> -#else
> -/*
> - * This is a dummy sync_core_before_usermode() implementation that can be used
> - * on all architectures which return to user-space through core serializing
> - * instructions.
> - * If your architecture returns to user-space through non-core-serializing
> - * instructions, you need to write your own functions.
> - */
> -static inline void sync_core_before_usermode(void)
> -{
> -}
> -#endif
> -
> -#endif /* _LINUX_SYNC_CORE_H */
> -
> diff --git a/init/Kconfig b/init/Kconfig
> index c9446911cf41..eb9772078cd4 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -2334,9 +2334,6 @@ source "kernel/Kconfig.locks"
> config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> 	bool
> 
> -config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
> -	bool
> -
> # It may be useful for an architecture to override the definitions of the
> # SYSCALL_DEFINE() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>
> # and the COMPAT_ variants in <linux/compat.h>, in particular to use a
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index b3a82d7635da..db4945e1ec94 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -5,6 +5,9 @@
>  * membarrier system call
>  */
> #include "sched.h"
> +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
> +#include <asm/sync_core.h>
> +#endif
> 
> /*
>  * The basic principle behind the regular memory barrier mode of membarrier()
> @@ -221,6 +224,7 @@ static void ipi_mb(void *info)
> 	smp_mb();	/* IPIs should be serializing but paranoid. */
> }
> 
> +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
> static void ipi_sync_core(void *info)
> {
> 	/*
> @@ -230,13 +234,14 @@ static void ipi_sync_core(void *info)
> 	 * the big comment at the top of this file.
> 	 *
> 	 * A sync_core() would provide this guarantee, but
> -	 * sync_core_before_usermode() might end up being deferred until
> -	 * after membarrier()'s smp_mb().
> +	 * membarrier_sync_core_before_usermode() might end up being deferred
> +	 * until after membarrier()'s smp_mb().
> 	 */
> 	smp_mb();	/* IPIs should be serializing but paranoid. */
> 
> -	sync_core_before_usermode();
> +	membarrier_sync_core_before_usermode();
> }
> +#endif
> 
> static void ipi_rseq(void *info)
> {
> @@ -368,12 +373,14 @@ static int membarrier_private_expedited(int flags, int
> cpu_id)
> 	smp_call_func_t ipi_func = ipi_mb;
> 
> 	if (flags == MEMBARRIER_FLAG_SYNC_CORE) {
> -		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
> +#ifndef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
> 			return -EINVAL;
> +#else
> 		if (!(atomic_read(&mm->membarrier_state) &
> 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
> 			return -EPERM;
> 		ipi_func = ipi_sync_core;
> +#endif
> 	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
> 		if (!IS_ENABLED(CONFIG_RSEQ))
> 			return -EINVAL;
> --
> 2.29.2

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* [RFC please help] membarrier: Rewrite sync_core_before_usermode()
From: Andy Lutomirski @ 2020-12-27 18:28 UTC (permalink / raw)
  To: x86, Mathieu Desnoyers
  Cc: Catalin Marinas, Arnd Bergmann, LKML, Nicholas Piggin,
	Will Deacon, Paul Mackerras, stable, Andy Lutomirski,
	linuxppc-dev, linux-arm-kernel

The old sync_core_before_usermode() comments said that a non-icache-syncing
return-to-usermode instruction is x86-specific and that all other
architectures automatically notice cross-modified code on return to
userspace.  Based on my general understanding of how CPUs work and based on
my atttempt to read the ARM manual, this is not true at all.  In fact, x86
seems to be a bit of an anomaly in the other direction: x86's IRET is
unusually heavyweight for a return-to-usermode instruction.

So let's drop any pretense that we can have a generic way implementation
behind membarrier's SYNC_CORE flush and require all architectures that opt
in to supply their own.  This means x86, arm64, and powerpc for now.  Let's
also rename the function from sync_core_before_usermode() to
membarrier_sync_core_before_usermode() because the precise flushing details
may very well be specific to membarrier, and even the concept of
"sync_core" in the kernel is mostly an x86-ism.

I admit that I'm rather surprised that the code worked at all on arm64,
and I'm suspicious that it has never been very well tested.  My apologies
for not reviewing this more carefully in the first place.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Hi arm64 and powerpc people-

This is part of a series here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/fixes

Before I send out the whole series, I'm hoping that some arm64 and powerpc
people can help me verify that I did this patch right.  Once I get
some feedback on this patch, I'll send out the whole pile.  And once
*that's* done, I'll start giving the mm lazy stuff some serious thought.

The x86 part is already fixed in Linus' tree.

Thanks,
Andy

 arch/arm64/include/asm/sync_core.h   | 21 +++++++++++++++++++++
 arch/powerpc/include/asm/sync_core.h | 20 ++++++++++++++++++++
 arch/x86/Kconfig                     |  1 -
 arch/x86/include/asm/sync_core.h     |  7 +++----
 include/linux/sched/mm.h             |  1 -
 include/linux/sync_core.h            | 21 ---------------------
 init/Kconfig                         |  3 ---
 kernel/sched/membarrier.c            | 15 +++++++++++----
 8 files changed, 55 insertions(+), 34 deletions(-)
 create mode 100644 arch/arm64/include/asm/sync_core.h
 create mode 100644 arch/powerpc/include/asm/sync_core.h
 delete mode 100644 include/linux/sync_core.h

diff --git a/arch/arm64/include/asm/sync_core.h b/arch/arm64/include/asm/sync_core.h
new file mode 100644
index 000000000000..5be4531caabd
--- /dev/null
+++ b/arch/arm64/include/asm/sync_core.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_SYNC_CORE_H
+#define _ASM_ARM64_SYNC_CORE_H
+
+#include <asm/barrier.h>
+
+/*
+ * Ensure that the CPU notices any instruction changes before the next time
+ * it returns to usermode.
+ */
+static inline void membarrier_sync_core_before_usermode(void)
+{
+	/*
+	 * XXX: is this enough or do we need a DMB first to make sure that
+	 * writes from other CPUs become visible to this CPU?  We have an
+	 * smp_mb() already, but that's not quite the same thing.
+	 */
+	isb();
+}
+
+#endif /* _ASM_ARM64_SYNC_CORE_H */
diff --git a/arch/powerpc/include/asm/sync_core.h b/arch/powerpc/include/asm/sync_core.h
new file mode 100644
index 000000000000..71dfbe7794e5
--- /dev/null
+++ b/arch/powerpc/include/asm/sync_core.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_SYNC_CORE_H
+#define _ASM_POWERPC_SYNC_CORE_H
+
+#include <asm/barrier.h>
+
+/*
+ * Ensure that the CPU notices any instruction changes before the next time
+ * it returns to usermode.
+ */
+static inline void membarrier_sync_core_before_usermode(void)
+{
+	/*
+	 * XXX: I know basically nothing about powerpc cache management.
+	 * Is this correct?
+	 */
+	isync();
+}
+
+#endif /* _ASM_POWERPC_SYNC_CORE_H */
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b5137cc5b7b4..895f70fd4a61 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -81,7 +81,6 @@ config X86
 	select ARCH_HAS_SET_DIRECT_MAP
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
-	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_SYSCALL_WRAPPER
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_DEBUG_WX
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index ab7382f92aff..c665b453969a 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -89,11 +89,10 @@ static inline void sync_core(void)
 }
 
 /*
- * Ensure that a core serializing instruction is issued before returning
- * to user-mode. x86 implements return to user-space through sysexit,
- * sysrel, and sysretq, which are not core serializing.
+ * Ensure that the CPU notices any instruction changes before the next time
+ * it returns to usermode.
  */
-static inline void sync_core_before_usermode(void)
+static inline void membarrier_sync_core_before_usermode(void)
 {
 	/* With PTI, we unconditionally serialize before running user code. */
 	if (static_cpu_has(X86_FEATURE_PTI))
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 48640db6ca86..81ba47910a73 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -7,7 +7,6 @@
 #include <linux/sched.h>
 #include <linux/mm_types.h>
 #include <linux/gfp.h>
-#include <linux/sync_core.h>
 
 /*
  * Routines for handling mm_structs
diff --git a/include/linux/sync_core.h b/include/linux/sync_core.h
deleted file mode 100644
index 013da4b8b327..000000000000
--- a/include/linux/sync_core.h
+++ /dev/null
@@ -1,21 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_SYNC_CORE_H
-#define _LINUX_SYNC_CORE_H
-
-#ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
-#include <asm/sync_core.h>
-#else
-/*
- * This is a dummy sync_core_before_usermode() implementation that can be used
- * on all architectures which return to user-space through core serializing
- * instructions.
- * If your architecture returns to user-space through non-core-serializing
- * instructions, you need to write your own functions.
- */
-static inline void sync_core_before_usermode(void)
-{
-}
-#endif
-
-#endif /* _LINUX_SYNC_CORE_H */
-
diff --git a/init/Kconfig b/init/Kconfig
index c9446911cf41..eb9772078cd4 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2334,9 +2334,6 @@ source "kernel/Kconfig.locks"
 config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	bool
 
-config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
-	bool
-
 # It may be useful for an architecture to override the definitions of the
 # SYSCALL_DEFINE() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>
 # and the COMPAT_ variants in <linux/compat.h>, in particular to use a
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index b3a82d7635da..db4945e1ec94 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -5,6 +5,9 @@
  * membarrier system call
  */
 #include "sched.h"
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
+#include <asm/sync_core.h>
+#endif
 
 /*
  * The basic principle behind the regular memory barrier mode of membarrier()
@@ -221,6 +224,7 @@ static void ipi_mb(void *info)
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
 static void ipi_sync_core(void *info)
 {
 	/*
@@ -230,13 +234,14 @@ static void ipi_sync_core(void *info)
 	 * the big comment at the top of this file.
 	 *
 	 * A sync_core() would provide this guarantee, but
-	 * sync_core_before_usermode() might end up being deferred until
-	 * after membarrier()'s smp_mb().
+	 * membarrier_sync_core_before_usermode() might end up being deferred
+	 * until after membarrier()'s smp_mb().
 	 */
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 
-	sync_core_before_usermode();
+	membarrier_sync_core_before_usermode();
 }
+#endif
 
 static void ipi_rseq(void *info)
 {
@@ -368,12 +373,14 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 	smp_call_func_t ipi_func = ipi_mb;
 
 	if (flags == MEMBARRIER_FLAG_SYNC_CORE) {
-		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
+#ifndef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
 			return -EINVAL;
+#else
 		if (!(atomic_read(&mm->membarrier_state) &
 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
 			return -EPERM;
 		ipi_func = ipi_sync_core;
+#endif
 	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
 		if (!IS_ENABLED(CONFIG_RSEQ))
 			return -EINVAL;
-- 
2.29.2


^ permalink raw reply related

* [PATCH v2] arch: consolidate pm_power_off callback
From: Enrico Weigelt, metux IT consult @ 2020-12-27 14:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: dalias, linux-ia64, linux-sh, James.Bottomley, jcmvbkbc, paulus,
	linux-csky, hpa, linux-riscv, will, tglx, jonas, linux-s390,
	sstabellini, linux-c6x-dev, ysato, linux-hexagon, deller, x86,
	ley.foon.tan, mingo, geert, catalin.marinas, linux-snps-arc,
	linux-xtensa, linux-pm, msalter, jacquiot.aurelien, linux-m68k,
	openrisc, bp, shorne, stefan.kristiansson, christian, chris,
	tsbogend, linux-parisc, linux-mips, linux-alpha, linuxppc-dev

Move the pm_power_off callback into one global place and also add an
function for conditionally calling it (when not NULL), in order to remove
code duplication in all individual archs.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>

----

changes v2:
  * fix forgotten removal of pm_power_off in arch/powerpc, as reported by lkp
---
 arch/alpha/kernel/process.c        |  6 ------
 arch/arc/kernel/reset.c            |  3 ---
 arch/arm/kernel/reboot.c           |  6 ++----
 arch/arm64/kernel/process.c        |  6 +-----
 arch/c6x/kernel/process.c          | 10 ++--------
 arch/csky/kernel/power.c           | 10 +++-------
 arch/h8300/kernel/process.c        |  3 ---
 arch/hexagon/kernel/reset.c        |  3 ---
 arch/ia64/kernel/process.c         |  5 +----
 arch/m68k/kernel/process.c         |  3 ---
 arch/microblaze/kernel/process.c   |  3 ---
 arch/mips/kernel/reset.c           |  6 +-----
 arch/nds32/kernel/process.c        |  7 ++-----
 arch/nios2/kernel/process.c        |  3 ---
 arch/openrisc/kernel/process.c     |  3 ---
 arch/parisc/kernel/process.c       |  9 +++------
 arch/powerpc/kernel/setup-common.c |  8 ++------
 arch/powerpc/xmon/xmon.c           |  4 ++--
 arch/riscv/kernel/reset.c          |  9 ++++-----
 arch/s390/kernel/setup.c           |  3 ---
 arch/sh/kernel/reboot.c            |  6 +-----
 arch/x86/kernel/reboot.c           | 15 ++++-----------
 arch/x86/xen/enlighten_pv.c        |  4 ++--
 arch/xtensa/kernel/process.c       |  4 ----
 include/linux/pm.h                 |  2 ++
 kernel/reboot.c                    | 10 ++++++++++
 26 files changed, 42 insertions(+), 109 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 6c71554206cc..df0df869751d 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -43,12 +43,6 @@
 #include "proto.h"
 #include "pci_impl.h"
 
-/*
- * Power off function, if any
- */
-void (*pm_power_off)(void) = machine_power_off;
-EXPORT_SYMBOL(pm_power_off);
-
 #ifdef CONFIG_ALPHA_WTINT
 /*
  * Sleep the CPU.
diff --git a/arch/arc/kernel/reset.c b/arch/arc/kernel/reset.c
index fd6c3eb930ba..3a27b6a202d4 100644
--- a/arch/arc/kernel/reset.c
+++ b/arch/arc/kernel/reset.c
@@ -26,6 +26,3 @@ void machine_power_off(void)
 	/* FIXME ::  power off ??? */
 	machine_halt();
 }
-
-void (*pm_power_off) (void) = NULL;
-EXPORT_SYMBOL(pm_power_off);
diff --git a/arch/arm/kernel/reboot.c b/arch/arm/kernel/reboot.c
index 0ce388f15422..9e1bf0e9b3e0 100644
--- a/arch/arm/kernel/reboot.c
+++ b/arch/arm/kernel/reboot.c
@@ -6,6 +6,7 @@
 #include <linux/cpu.h>
 #include <linux/delay.h>
 #include <linux/reboot.h>
+#include <linux/pm.h>
 
 #include <asm/cacheflush.h>
 #include <asm/idmap.h>
@@ -19,8 +20,6 @@ typedef void (*phys_reset_t)(unsigned long, bool);
  * Function pointers to optional machine specific functions
  */
 void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
-void (*pm_power_off)(void);
-EXPORT_SYMBOL(pm_power_off);
 
 /*
  * A temporary stack to use for CPU reset. This is static so that we
@@ -118,8 +117,7 @@ void machine_power_off(void)
 	local_irq_disable();
 	smp_send_stop();
 
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off();
 }
 
 /*
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6616486a58fe..a5d4c1e80abd 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -67,9 +67,6 @@ EXPORT_SYMBOL(__stack_chk_guard);
 /*
  * Function pointers to optional machine specific functions
  */
-void (*pm_power_off)(void);
-EXPORT_SYMBOL_GPL(pm_power_off);
-
 void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
 
 static void noinstr __cpu_do_idle(void)
@@ -172,8 +169,7 @@ void machine_power_off(void)
 {
 	local_irq_disable();
 	smp_send_stop();
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off();
 }
 
 /*
diff --git a/arch/c6x/kernel/process.c b/arch/c6x/kernel/process.c
index 9f4fd6a40a10..8b4b24476162 100644
--- a/arch/c6x/kernel/process.c
+++ b/arch/c6x/kernel/process.c
@@ -15,6 +15,7 @@
 #include <linux/reboot.h>
 #include <linux/sched/task.h>
 #include <linux/sched/task_stack.h>
+#include <linux/pm.h>
 
 #include <asm/syscalls.h>
 
@@ -25,12 +26,6 @@ void	(*c6x_halt)(void);
 extern asmlinkage void ret_from_fork(void);
 extern asmlinkage void ret_from_kernel_thread(void);
 
-/*
- * power off function, if any
- */
-void (*pm_power_off)(void);
-EXPORT_SYMBOL(pm_power_off);
-
 void arch_cpu_idle(void)
 {
 	unsigned long tmp;
@@ -71,8 +66,7 @@ void machine_halt(void)
 
 void machine_power_off(void)
 {
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off();
 	halt_loop();
 }
 
diff --git a/arch/csky/kernel/power.c b/arch/csky/kernel/power.c
index 923ee4e381b8..c702e66ce03a 100644
--- a/arch/csky/kernel/power.c
+++ b/arch/csky/kernel/power.c
@@ -2,23 +2,19 @@
 // Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
 
 #include <linux/reboot.h>
-
-void (*pm_power_off)(void);
-EXPORT_SYMBOL(pm_power_off);
+#include <linux/pm.h>
 
 void machine_power_off(void)
 {
 	local_irq_disable();
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off();
 	asm volatile ("bkpt");
 }
 
 void machine_halt(void)
 {
 	local_irq_disable();
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off();
 	asm volatile ("bkpt");
 }
 
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index bc1364db58fe..020bf78a779c 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -46,9 +46,6 @@
 #include <asm/traps.h>
 #include <asm/setup.h>
 
-void (*pm_power_off)(void) = NULL;
-EXPORT_SYMBOL(pm_power_off);
-
 asmlinkage void ret_from_fork(void);
 asmlinkage void ret_from_kernel_thread(void);
 
diff --git a/arch/hexagon/kernel/reset.c b/arch/hexagon/kernel/reset.c
index da36114d928f..8370ddbcdfd9 100644
--- a/arch/hexagon/kernel/reset.c
+++ b/arch/hexagon/kernel/reset.c
@@ -19,6 +19,3 @@ void machine_halt(void)
 void machine_restart(char *cmd)
 {
 }
-
-void (*pm_power_off)(void) = NULL;
-EXPORT_SYMBOL(pm_power_off);
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 4ebbfa076a26..72104b967668 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -57,8 +57,6 @@ void (*ia64_mark_idle)(int);
 
 unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
 EXPORT_SYMBOL(boot_option_idle_override);
-void (*pm_power_off) (void);
-EXPORT_SYMBOL(pm_power_off);
 
 static void
 ia64_do_show_stack (struct unw_frame_info *info, void *arg)
@@ -602,8 +600,7 @@ machine_halt (void)
 void
 machine_power_off (void)
 {
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off()
 	machine_halt();
 }
 
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index 08359a6e058f..b8dc10a630e1 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -72,9 +72,6 @@ void machine_power_off(void)
 	for (;;);
 }
 
-void (*pm_power_off)(void) = machine_power_off;
-EXPORT_SYMBOL(pm_power_off);
-
 void show_regs(struct pt_regs * regs)
 {
 	pr_info("Format %02x  Vector: %04x  PC: %08lx  Status: %04x    %s\n",
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 657c2beb665e..f1dd66a14ab6 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -46,9 +46,6 @@ void show_regs(struct pt_regs *regs)
 				regs->msr, regs->ear, regs->esr, regs->fsr);
 }
 
-void (*pm_power_off)(void) = NULL;
-EXPORT_SYMBOL(pm_power_off);
-
 void flush_thread(void)
 {
 }
diff --git a/arch/mips/kernel/reset.c b/arch/mips/kernel/reset.c
index 6288780b779e..73e32eba422f 100644
--- a/arch/mips/kernel/reset.c
+++ b/arch/mips/kernel/reset.c
@@ -25,9 +25,6 @@
  */
 void (*_machine_restart)(char *command);
 void (*_machine_halt)(void);
-void (*pm_power_off)(void);
-
-EXPORT_SYMBOL(pm_power_off);
 
 static void machine_hang(void)
 {
@@ -114,8 +111,7 @@ void machine_halt(void)
 
 void machine_power_off(void)
 {
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off();
 
 #ifdef CONFIG_SMP
 	preempt_disable();
diff --git a/arch/nds32/kernel/process.c b/arch/nds32/kernel/process.c
index e01ad5d17224..624e2a563082 100644
--- a/arch/nds32/kernel/process.c
+++ b/arch/nds32/kernel/process.c
@@ -12,6 +12,7 @@
 #include <asm/fpu.h>
 #include <linux/ptrace.h>
 #include <linux/reboot.h>
+#include <linux/pm.h>
 
 #if IS_ENABLED(CONFIG_LAZY_FPU)
 struct task_struct *last_task_used_math;
@@ -27,9 +28,6 @@ extern inline void arch_reset(char mode)
 	}
 }
 
-void (*pm_power_off) (void);
-EXPORT_SYMBOL(pm_power_off);
-
 static char reboot_mode_nds32 = 'h';
 
 int __init reboot_setup(char *str)
@@ -54,8 +52,7 @@ EXPORT_SYMBOL(machine_halt);
 
 void machine_power_off(void)
 {
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off();
 }
 
 EXPORT_SYMBOL(machine_power_off);
diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c
index 50b4eb19a6cc..a6195cc02ea4 100644
--- a/arch/nios2/kernel/process.c
+++ b/arch/nios2/kernel/process.c
@@ -28,9 +28,6 @@
 asmlinkage void ret_from_fork(void);
 asmlinkage void ret_from_kernel_thread(void);
 
-void (*pm_power_off)(void) = NULL;
-EXPORT_SYMBOL(pm_power_off);
-
 void arch_cpu_idle(void)
 {
 	raw_local_irq_enable();
diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c
index 3c98728cce24..c02343bacf59 100644
--- a/arch/openrisc/kernel/process.c
+++ b/arch/openrisc/kernel/process.c
@@ -84,9 +84,6 @@ void arch_cpu_idle(void)
 		mtspr(SPR_PMR, mfspr(SPR_PMR) | SPR_PMR_DME);
 }
 
-void (*pm_power_off) (void) = machine_power_off;
-EXPORT_SYMBOL(pm_power_off);
-
 /*
  * When a process does an "exec", machine state like FPU and debug
  * registers need to be reset.  This is a hook function for that.
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index a92a23d6acd9..8b94599c9480 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -41,6 +41,7 @@
 #include <linux/rcupdate.h>
 #include <linux/random.h>
 #include <linux/nmi.h>
+#include <linux/pm.h>
 
 #include <asm/io.h>
 #include <asm/asm-offsets.h>
@@ -117,9 +118,8 @@ void machine_power_off(void)
 	pdc_chassis_send_status(PDC_CHASSIS_DIRECT_SHUTDOWN);
 
 	/* ipmi_poweroff may have been installed. */
-	if (pm_power_off)
-		pm_power_off();
-		
+	do_power_off();
+
 	/* It seems we have no way to power the system off via
 	 * software. The user has to press the button himself. */
 
@@ -132,9 +132,6 @@ void machine_power_off(void)
 	for (;;);
 }
 
-void (*pm_power_off)(void);
-EXPORT_SYMBOL(pm_power_off);
-
 void machine_halt(void)
 {
 	machine_power_off();
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 71f38e9248be..3ed44b6ee232 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -32,6 +32,7 @@
 #include <linux/of_platform.h>
 #include <linux/hugetlb.h>
 #include <linux/pgtable.h>
+#include <linux/pm.h>
 #include <asm/debugfs.h>
 #include <asm/io.h>
 #include <asm/paca.h>
@@ -163,18 +164,13 @@ void machine_restart(char *cmd)
 void machine_power_off(void)
 {
 	machine_shutdown();
-	if (pm_power_off)
-		pm_power_off();
-
+	do_power_off();
 	smp_send_stop();
 	machine_hang();
 }
 /* Used by the G5 thermal driver */
 EXPORT_SYMBOL_GPL(machine_power_off);
 
-void (*pm_power_off)(void);
-EXPORT_SYMBOL_GPL(pm_power_off);
-
 void machine_halt(void)
 {
 	machine_shutdown();
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index dcd817ca2edf..38d76c283412 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -26,6 +26,7 @@
 #include <linux/ctype.h>
 #include <linux/highmem.h>
 #include <linux/security.h>
+#include <linux/pm.h>
 
 #include <asm/debugfs.h>
 #include <asm/ptrace.h>
@@ -1237,8 +1238,7 @@ static void bootcmds(void)
 	} else if (cmd == 'h') {
 		ppc_md.halt();
 	} else if (cmd == 'p') {
-		if (pm_power_off)
-			pm_power_off();
+		do_power_off();
 	}
 }
 
diff --git a/arch/riscv/kernel/reset.c b/arch/riscv/kernel/reset.c
index ee5878d968cc..f8bcf4d8b19b 100644
--- a/arch/riscv/kernel/reset.c
+++ b/arch/riscv/kernel/reset.c
@@ -12,9 +12,6 @@ static void default_power_off(void)
 		wait_for_interrupt();
 }
 
-void (*pm_power_off)(void) = default_power_off;
-EXPORT_SYMBOL(pm_power_off);
-
 void machine_restart(char *cmd)
 {
 	do_kernel_restart(cmd);
@@ -23,10 +20,12 @@ void machine_restart(char *cmd)
 
 void machine_halt(void)
 {
-	pm_power_off();
+	do_power_off();
+	default_power_off();
 }
 
 void machine_power_off(void)
 {
-	pm_power_off();
+	do_power_off();
+	default_power_off();
 }
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 1fbed91c73bc..4e348d3b711f 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -302,9 +302,6 @@ void machine_power_off(void)
 /*
  * Dummy power off function.
  */
-void (*pm_power_off)(void) = machine_power_off;
-EXPORT_SYMBOL_GPL(pm_power_off);
-
 void *restart_stack;
 
 unsigned long stack_alloc(void)
diff --git a/arch/sh/kernel/reboot.c b/arch/sh/kernel/reboot.c
index 5c33f036418b..8c9b63e1dbba 100644
--- a/arch/sh/kernel/reboot.c
+++ b/arch/sh/kernel/reboot.c
@@ -10,9 +10,6 @@
 #include <asm/tlbflush.h>
 #include <asm/traps.h>
 
-void (*pm_power_off)(void);
-EXPORT_SYMBOL(pm_power_off);
-
 static void watchdog_trigger_immediate(void)
 {
 	sh_wdt_write_cnt(0xFF);
@@ -46,8 +43,7 @@ static void native_machine_shutdown(void)
 
 static void native_machine_power_off(void)
 {
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off();
 }
 
 static void native_machine_halt(void)
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index db115943e8bd..cddf9ca4e6f6 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -34,12 +34,6 @@
 #include <asm/efi.h>
 
 /*
- * Power off function, if any
- */
-void (*pm_power_off)(void);
-EXPORT_SYMBOL(pm_power_off);
-
-/*
  * This is set if we need to go through the 'emergency' path.
  * When machine_emergency_restart() is called, we may be on
  * an inconsistent state and won't be able to do a clean cleanup
@@ -747,11 +741,10 @@ static void native_machine_halt(void)
 
 static void native_machine_power_off(void)
 {
-	if (pm_power_off) {
-		if (!reboot_force)
-			machine_shutdown();
-		pm_power_off();
-	}
+	if (!reboot_force)
+		machine_shutdown();
+	do_power_off();
+
 	/* A fallback in case there is no PM info available */
 	tboot_shutdown(TB_SHUTDOWN_HALT);
 }
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4409306364dc..7e5416c316d3 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -33,6 +33,7 @@
 #include <linux/gfp.h>
 #include <linux/edd.h>
 #include <linux/objtool.h>
+#include <linux/pm.h>
 
 #include <xen/xen.h>
 #include <xen/events.h>
@@ -1084,8 +1085,7 @@ static void xen_machine_halt(void)
 
 static void xen_machine_power_off(void)
 {
-	if (pm_power_off)
-		pm_power_off();
+	do_power_off();
 	xen_reboot(SHUTDOWN_poweroff);
 }
 
diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index 397a7de56377..fb8d5e9829ba 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -51,10 +51,6 @@
 extern void ret_from_fork(void);
 extern void ret_from_kernel_thread(void);
 
-void (*pm_power_off)(void) = NULL;
-EXPORT_SYMBOL(pm_power_off);
-
-
 #ifdef CONFIG_STACKPROTECTOR
 #include <linux/stackprotector.h>
 unsigned long __stack_chk_guard __read_mostly;
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 47aca6bac1d6..78627c970be0 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -22,6 +22,8 @@
 extern void (*pm_power_off)(void);
 extern void (*pm_power_off_prepare)(void);
 
+extern void do_power_off(void);
+
 struct device; /* we have a circular dep with device.h */
 #ifdef CONFIG_VT_CONSOLE_SLEEP
 extern void pm_vt_switch_required(struct device *dev, bool required);
diff --git a/kernel/reboot.c b/kernel/reboot.c
index eb1b15850761..ec4cd66dd1ae 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -53,6 +53,16 @@ int reboot_force;
 void (*pm_power_off_prepare)(void);
 EXPORT_SYMBOL_GPL(pm_power_off_prepare);
 
+void (*pm_power_off)(void);
+EXPORT_SYMBOL_GPL(pm_power_off);
+
+void do_power_off(void)
+{
+	if (pm_power_off)
+		pm_power_off();
+}
+EXPORT_SYMBOL_GPL(do_power_off);
+
 /**
  *	emergency_restart - reboot the system
  *
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH v1 05/15] powerpc: Remove address argument from bad_page_fault()
From: Nicholas Piggin @ 2020-12-27  3:43 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <181cb8d6899a298e8ddab3b8f669a48c11b43cca.1608641533.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of December 22, 2020 11:28 pm:
> The address argument is not used by bad_page_fault().
> 
> Remove it.
> 
> Suggested-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/include/asm/bug.h             | 4 ++--
>  arch/powerpc/kernel/entry_32.S             | 4 +---
>  arch/powerpc/kernel/exceptions-64e.S       | 3 +--
>  arch/powerpc/kernel/exceptions-64s.S       | 8 +++-----
>  arch/powerpc/kernel/traps.c                | 2 +-
>  arch/powerpc/mm/book3s64/hash_utils.c      | 2 +-
>  arch/powerpc/mm/book3s64/slb.c             | 2 +-
>  arch/powerpc/mm/fault.c                    | 6 +++---
>  arch/powerpc/platforms/8xx/machine_check.c | 2 +-
>  9 files changed, 14 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 464f8ca8a5c9..af8c164254d0 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -112,8 +112,8 @@
>  
>  struct pt_regs;
>  extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
> -extern void bad_page_fault(struct pt_regs *, unsigned long, int);
> -void __bad_page_fault(struct pt_regs *regs, unsigned long address, int sig);
> +void bad_page_fault(struct pt_regs *regs, int sig);
> +void __bad_page_fault(struct pt_regs *regs, int sig);
>  extern void _exception(int, struct pt_regs *, int, unsigned long);
>  extern void _exception_pkey(struct pt_regs *, unsigned long, int);
>  extern void die(const char *, struct pt_regs *, long);
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 238eacfda7b0..abd95aebe73a 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -671,15 +671,13 @@ ppc_swapcontext:
>  handle_page_fault:
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
>  	bl	do_page_fault
> -	cmpwi	r3,0
> +	mr.	r4,r3

This looks like an unrelated change so I'll leave it out. Nice little 
improvement though.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v1 06/15] powerpc: Remove address and errorcode arguments from do_break()
From: Nicholas Piggin @ 2020-12-27  3:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <0246430576c2ff0aed1d35ccbd6f44e658908102.1608641533.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of December 22, 2020 11:28 pm:
> Let do_break() retrieve address and errorcode from regs.
> 
> This simplifies the code and shouldn't impeed performance as
> address and errorcode are likely still hot in the cache.
> 
> Suggested-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/include/asm/debug.h     | 3 +--
>  arch/powerpc/kernel/exceptions-64s.S | 2 --
>  arch/powerpc/kernel/head_8xx.S       | 5 -----
>  arch/powerpc/kernel/process.c        | 8 +++-----
>  4 files changed, 4 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
> index ec57daf87f40..0550eceab3ca 100644
> --- a/arch/powerpc/include/asm/debug.h
> +++ b/arch/powerpc/include/asm/debug.h
> @@ -52,8 +52,7 @@ extern void do_send_trap(struct pt_regs *regs, unsigned long address,
>  			 unsigned long error_code, int brkpt);
>  #else
>  
> -extern void do_break(struct pt_regs *regs, unsigned long address,
> -		     unsigned long error_code);
> +void do_break(struct pt_regs *regs);
>  #endif
>  
>  #endif /* _ASM_POWERPC_DEBUG_H */
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index cfbd1d690033..3ea067bcbb95 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -3262,8 +3262,6 @@ handle_page_fault:
>  
>  /* We have a data breakpoint exception - handle it */
>  handle_dabr_fault:
> -	ld      r4,_DAR(r1)
> -	ld      r5,_DSISR(r1)
>  	addi    r3,r1,STACK_FRAME_OVERHEAD
>  	bl      do_break
>  	/*
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index 52702f3db6df..81f3c984f50c 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -364,11 +364,6 @@ do_databreakpoint:
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
>  	mfspr	r4,SPRN_BAR
>  	stw	r4,_DAR(r11)
> -#ifdef CONFIG_VMAP_STACK
> -	lwz	r5,_DSISR(r11)
> -#else
> -	mfspr	r5,SPRN_DSISR
> -#endif

I didn't think you can do this (at leastuntil after your patch 10). I have my
!VMAP path doing mfspr r5,DSISR ; stw r3,_DSISR(r11); 

Thanks,
Nick


^ permalink raw reply

* [Bug 210911] New: error: implicit declaration of function 'cleanup_cpu_mmu_context' [-Werror=implicit-function-declaration]
From: bugzilla-daemon @ 2020-12-26 18:02 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=210911

            Bug ID: 210911
           Summary: error: implicit declaration of function
                    'cleanup_cpu_mmu_context'
                    [-Werror=implicit-function-declaration]
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: 5.10.3
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PPC-32
          Assignee: platform_ppc-32@kernel-bugs.osdl.org
          Reporter: jason@bluehome.net
        Regression: No

Created attachment 294347
  --> https://bugzilla.kernel.org/attachment.cgi?id=294347&action=edit
Kernel config file

This began to appear starting with 5.10 and continues with 5.10.3. I had no
problems with the 5.9 series. I am building it with GCC 10.2. I have also tried
going back to 9.3 but that makes no difference.

arch/powerpc/platforms/powermac/smp.c: In function 'smp_core99_cpu_disable':
arch/powerpc/platforms/powermac/smp.c:914:2: error: implicit declaration of
function 'cleanup_cpu_mmu_context' [-Werror=implicit-function-declaration]
  914 |  cleanup_cpu_mmu_context();
      |  ^~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
scripts/Makefile.build:279: recipe for target
'arch/powerpc/platforms/powermac/smp.o' failed
make[3]: *** [arch/powerpc/platforms/powermac/smp.o] Error 1
scripts/Makefile.build:496: recipe for target 'arch/powerpc/platforms/powermac'
failed
make[2]: *** [arch/powerpc/platforms/powermac] Error 2
scripts/Makefile.build:496: recipe for target 'arch/powerpc/platforms' failed
make[1]: *** [arch/powerpc/platforms] Error 2
Makefile:1805: recipe for target 'arch/powerpc' failed
make: *** [arch/powerpc] Error 2

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: [PATCH v3 03/19] powerpc: bad_page_fault, do_break get registers from regs
From: Nicholas Piggin @ 2020-12-26 10:58 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <1608970380.delquel806.astroid@bobo.none>

Excerpts from Nicholas Piggin's message of December 26, 2020 6:19 pm:
> Excerpts from Christophe Leroy's message of December 23, 2020 12:42 am:
>> 
>> 
>> Le 28/11/2020 à 15:40, Nicholas Piggin a écrit :
>>> Similar to the previous patch this makes interrupt handler function
>>> types more regular so they can be wrapped with the next patch.
>>> 
>>> bad_page_fault and do_break are not performance critical.
>> 
>> I partly took your changes into one of my series, in different order though.
>> 
>> Please have a look at https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=221656 patches 
>> 4 to 7
> 
> Thanks, I had a look. Seems like the result is basically the same as my 
> series, so that's good if you like the end result now :)
> 
>> I think some of the changes are missing in your series, especially the changes in entry_32.S from 
>> patch 7.
> 
> Okay I could take them in. In your patch 7/15, why do you leave this 
> load of DSISR?
> 
> diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
> index 15e6003fd3b8..0133a02d1d47 100644
> --- a/arch/powerpc/kernel/head_book3s_32.S
> +++ b/arch/powerpc/kernel/head_book3s_32.S
> @@ -369,9 +369,9 @@  BEGIN_MMU_FTR_SECTION
>  END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
>  #endif
>  #endif	/* CONFIG_VMAP_STACK */
> -1:	mr	r4,r12
>  	andis.	r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */
> -	stw	r4, _DAR(r11)
> +	stw	r12, _DAR(r11)
> +	stw	r5, _DSISR(r11)
>  	EXC_XFER_LITE(0x400, handle_page_fault)
>  
>  /* External interrupt */
> @@ -693,7 +693,6 @@  handle_page_fault_tramp_1:
>  #ifdef CONFIG_VMAP_STACK
>  	EXCEPTION_PROLOG_2 handle_dar_dsisr=1
>  #endif
> -	lwz	r4, _DAR(r11)
>  	lwz	r5, _DSISR(r11)
> 	^^^^^^^^^^^^^^^^^^^^^^
>  	/* fall through */
>  handle_page_fault_tramp_2:
> 
> ?

Ah never mind, this needs to come back after your DABR match move
patch, which you have earlier in the series. I confused myself.

I'll rebase my series on your patch 4 rather than have it squashed
in with other do_break stuff.

Thanks,
Nick

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox