All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	"Russell King (Oracle)" <linux@armlinux.org.uk>,
	the arch/x86 maintainers <x86@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 7/8] membarrier: Remove arm (32) support for SYNC_CORE
Date: Thu, 17 Jun 2021 10:16:12 -0400	[thread overview]
Message-ID: <20210617141612.GA18005@localhost> (raw)
In-Reply-To: <20210617135133.GA86101@C02TD0UTHF1T.local>

On 17-Jun-2021 02:51:33 PM, Mark Rutland wrote:
> On Thu, Jun 17, 2021 at 06:41:41AM -0700, Andy Lutomirski wrote:
> > 
> > 
> > On Thu, Jun 17, 2021, at 4:33 AM, Mark Rutland wrote:
> > > On Thu, Jun 17, 2021 at 12:23:05PM +0100, Russell King (Oracle) wrote:
> > > > On Thu, Jun 17, 2021 at 11:40:46AM +0100, Mark Rutland wrote:
> > > > > On Tue, Jun 15, 2021 at 08:21:12PM -0700, Andy Lutomirski wrote:
> > > > > > On arm32, the only way to safely flush icache from usermode is to call
> > > > > > cacheflush(2).  This also handles any required pipeline flushes, so
> > > > > > membarrier's SYNC_CORE feature is useless on arm.  Remove it.
> > > > > 
> > > > > Unfortunately, it's a bit more complicated than that, and these days
> > > > > SYNC_CORE is equally necessary on arm as on arm64. This is something
> > > > > that changed in the architecture over time, but since ARMv7 we generally
> > > > > need both the cache maintenance *and* a context synchronization event
> > > > > (the latter must occur on the CPU which will execute the instructions).
> > > > > 
> > > > > If you look at the latest ARMv7-AR manual (ARM DDI 406C.d), section
> > > > > A3.5.4 "Concurrent modification and execution of instructions" covers
> > > > > this. That manual can be found at:
> > > > > 
> > > > > 	https://developer.arm.com/documentation/ddi0406/latest/
> > > > 
> > > > Looking at that, sys_cacheflush() meets this. The manual details a
> > > > series of cache maintenance calls in "step 1" that the modifying thread
> > > > must issue - this is exactly what sys_cacheflush() does. The same is
> > > > true for ARMv6, except the "ISB" terminology is replaced by a
> > > > "PrefetchFlush" terminology. (I checked DDI0100I).
> > > > 
> > > > "step 2" requires an ISB on the "other CPU" prior to executing that
> > > > code. As I understand it, in ARMv7, userspace can issue an ISB itself.
> > > > 
> > > > For ARMv6K, it doesn't have ISB, but instead has a CP15 instruction
> > > > for this that isn't availble to userspace. This is where we come to
> > > > the situation about ARM 11MPCore, and whether we continue to support
> > > > it or not.
> > > > 
> > > > So, I think we're completely fine with ARMv7 under 32-bit ARM kernels
> > > > as userspace has everything that's required. ARMv6K is a different
> > > > matter as we've already identified for several reasons.
> > > 
> > > Sure, and I agree we should not change cacheflush().
> > > 
> > > The point of membarrier(SYNC_CORE) is that you can move the cost of that
> > > ISB out of the fast-path in the executing thread(s) and into the
> > > slow-path on the thread which generated the code.
> > > 
> > > So e.g. rather than an executing thread always having to do:
> > > 
> > > 	LDR	<reg>, [<funcptr>]
> > > 	ISB	// in case funcptr was just updated
> > > 	BLR	<reg>
> > > 
> > > ... you have the thread generating the code use membarrier(SYNC_CORE)
> > > prior to plublishing the funcptr, and the fast-path on all the executing
> > > threads can be:
> > > 
> > > 	LDR	<reg> [<funcptr>]
> > > 	BLR	<reg>
> > > 
> > > ... and thus I think we still want membarrier(SYNC_CORE) so that people
> > > can do this, even if there are other means to achieve the same
> > > functionality.
> > 
> > I had the impression that sys_cacheflush() did that.  Am I wrong?
> 
> Currently sys_cacheflush() doesn't do this, and IIUC it has never done
> remote context synchronization even for architectures that need that
> (e.g. x86 requiring a serializing instruction).
> 
> > In any event, I’m even more convinced that no new SYNC_CORE arches
> > should be added. We need a new API that just does the right thing. 
> 
> My intuition is the other way around, and that this is a gnereally
> useful thing for architectures that require context synchronization.
> 
> It's not clear to me what "the right thing" would mean specifically, and
> on architectures with userspace cache maintenance JITs can usually do
> the most optimal maintenance, and only need help for the context
> synchronization.

If I can attempt to summarize the current situation for ARMv7:

- In addition to the cache flushing on the core doing the code update,
  the architecture requires every core to perform a context synchronizing
  instruction before executing the updated code.

- sys_cacheflush() don't do this core sync on every core. It also takes a
  single address range as parameter.

- ARM, ARM64, powerpc, powerpc64, x86, x86-64 all currently handle the
  context synchronization requirement for updating user-space code on
  SMP with sys_membarrier SYNC_CORE. It's not, however, meant to replace
  explicit cache flushing operations if those are needed.

So removing membarrier SYNC_CORE from ARM would be a step backward here.
On ARMv7, the SYNC_CORE is needed _in addition_ to sys_cacheflush.

Adding a sync-core operation at the end of sys_cacheflush would be
inefficient for common GC use-cases where a rather large set of address
ranges are invalidated in one go: for this, we either want the GC to:

- Invoke sys_cacheflush for each targeted range, and then issue a single
  sys_membarrier SYNC_CORE, or

- Implement a new "sys_cacheflush_iov" which takes an iovec input. There
  I see that it could indeed invalidate all relevant cache lines *and*
  issue the SYNC_CORE at the end.

But shoehorning the SYNC_CORE in the pre-existing sys_cacheflush after
the fact seems like a bad idea.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	"Russell King (Oracle)" <linux@armlinux.org.uk>,
	the arch/x86 maintainers <x86@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 7/8] membarrier: Remove arm (32) support for SYNC_CORE
Date: Thu, 17 Jun 2021 10:16:12 -0400	[thread overview]
Message-ID: <20210617141612.GA18005@localhost> (raw)
In-Reply-To: <20210617135133.GA86101@C02TD0UTHF1T.local>

On 17-Jun-2021 02:51:33 PM, Mark Rutland wrote:
> On Thu, Jun 17, 2021 at 06:41:41AM -0700, Andy Lutomirski wrote:
> > 
> > 
> > On Thu, Jun 17, 2021, at 4:33 AM, Mark Rutland wrote:
> > > On Thu, Jun 17, 2021 at 12:23:05PM +0100, Russell King (Oracle) wrote:
> > > > On Thu, Jun 17, 2021 at 11:40:46AM +0100, Mark Rutland wrote:
> > > > > On Tue, Jun 15, 2021 at 08:21:12PM -0700, Andy Lutomirski wrote:
> > > > > > On arm32, the only way to safely flush icache from usermode is to call
> > > > > > cacheflush(2).  This also handles any required pipeline flushes, so
> > > > > > membarrier's SYNC_CORE feature is useless on arm.  Remove it.
> > > > > 
> > > > > Unfortunately, it's a bit more complicated than that, and these days
> > > > > SYNC_CORE is equally necessary on arm as on arm64. This is something
> > > > > that changed in the architecture over time, but since ARMv7 we generally
> > > > > need both the cache maintenance *and* a context synchronization event
> > > > > (the latter must occur on the CPU which will execute the instructions).
> > > > > 
> > > > > If you look at the latest ARMv7-AR manual (ARM DDI 406C.d), section
> > > > > A3.5.4 "Concurrent modification and execution of instructions" covers
> > > > > this. That manual can be found at:
> > > > > 
> > > > > 	https://developer.arm.com/documentation/ddi0406/latest/
> > > > 
> > > > Looking at that, sys_cacheflush() meets this. The manual details a
> > > > series of cache maintenance calls in "step 1" that the modifying thread
> > > > must issue - this is exactly what sys_cacheflush() does. The same is
> > > > true for ARMv6, except the "ISB" terminology is replaced by a
> > > > "PrefetchFlush" terminology. (I checked DDI0100I).
> > > > 
> > > > "step 2" requires an ISB on the "other CPU" prior to executing that
> > > > code. As I understand it, in ARMv7, userspace can issue an ISB itself.
> > > > 
> > > > For ARMv6K, it doesn't have ISB, but instead has a CP15 instruction
> > > > for this that isn't availble to userspace. This is where we come to
> > > > the situation about ARM 11MPCore, and whether we continue to support
> > > > it or not.
> > > > 
> > > > So, I think we're completely fine with ARMv7 under 32-bit ARM kernels
> > > > as userspace has everything that's required. ARMv6K is a different
> > > > matter as we've already identified for several reasons.
> > > 
> > > Sure, and I agree we should not change cacheflush().
> > > 
> > > The point of membarrier(SYNC_CORE) is that you can move the cost of that
> > > ISB out of the fast-path in the executing thread(s) and into the
> > > slow-path on the thread which generated the code.
> > > 
> > > So e.g. rather than an executing thread always having to do:
> > > 
> > > 	LDR	<reg>, [<funcptr>]
> > > 	ISB	// in case funcptr was just updated
> > > 	BLR	<reg>
> > > 
> > > ... you have the thread generating the code use membarrier(SYNC_CORE)
> > > prior to plublishing the funcptr, and the fast-path on all the executing
> > > threads can be:
> > > 
> > > 	LDR	<reg> [<funcptr>]
> > > 	BLR	<reg>
> > > 
> > > ... and thus I think we still want membarrier(SYNC_CORE) so that people
> > > can do this, even if there are other means to achieve the same
> > > functionality.
> > 
> > I had the impression that sys_cacheflush() did that.  Am I wrong?
> 
> Currently sys_cacheflush() doesn't do this, and IIUC it has never done
> remote context synchronization even for architectures that need that
> (e.g. x86 requiring a serializing instruction).
> 
> > In any event, I’m even more convinced that no new SYNC_CORE arches
> > should be added. We need a new API that just does the right thing. 
> 
> My intuition is the other way around, and that this is a gnereally
> useful thing for architectures that require context synchronization.
> 
> It's not clear to me what "the right thing" would mean specifically, and
> on architectures with userspace cache maintenance JITs can usually do
> the most optimal maintenance, and only need help for the context
> synchronization.

If I can attempt to summarize the current situation for ARMv7:

- In addition to the cache flushing on the core doing the code update,
  the architecture requires every core to perform a context synchronizing
  instruction before executing the updated code.

- sys_cacheflush() don't do this core sync on every core. It also takes a
  single address range as parameter.

- ARM, ARM64, powerpc, powerpc64, x86, x86-64 all currently handle the
  context synchronization requirement for updating user-space code on
  SMP with sys_membarrier SYNC_CORE. It's not, however, meant to replace
  explicit cache flushing operations if those are needed.

So removing membarrier SYNC_CORE from ARM would be a step backward here.
On ARMv7, the SYNC_CORE is needed _in addition_ to sys_cacheflush.

Adding a sync-core operation at the end of sys_cacheflush would be
inefficient for common GC use-cases where a rather large set of address
ranges are invalidated in one go: for this, we either want the GC to:

- Invoke sys_cacheflush for each targeted range, and then issue a single
  sys_membarrier SYNC_CORE, or

- Implement a new "sys_cacheflush_iov" which takes an iovec input. There
  I see that it could indeed invalidate all relevant cache lines *and*
  issue the SYNC_CORE at the end.

But shoehorning the SYNC_CORE in the pre-existing sys_cacheflush after
the fact seems like a bad idea.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


  parent reply	other threads:[~2021-06-17 14:18 UTC|newest]

Thread overview: 155+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-16  3:21 [PATCH 0/8] membarrier cleanups Andy Lutomirski
2021-06-16  3:21 ` [PATCH 1/8] membarrier: Document why membarrier() works Andy Lutomirski
2021-06-16  4:00   ` Nicholas Piggin
2021-06-16  7:30     ` Peter Zijlstra
2021-06-17 23:45       ` Andy Lutomirski
2021-06-16  3:21 ` [PATCH 2/8] x86/mm: Handle unlazying membarrier core sync in the arch code Andy Lutomirski
2021-06-16  4:25   ` Nicholas Piggin
2021-06-16 18:31     ` Andy Lutomirski
2021-06-16 17:49   ` Mathieu Desnoyers
2021-06-16 18:31     ` Andy Lutomirski
2021-06-16  3:21 ` [PATCH 3/8] membarrier: Remove membarrier_arch_switch_mm() prototype in core code Andy Lutomirski
2021-06-16  4:26   ` Nicholas Piggin
2021-06-16 17:52   ` Mathieu Desnoyers
2021-06-16  3:21 ` [PATCH 4/8] membarrier: Make the post-switch-mm barrier explicit Andy Lutomirski
2021-06-16  4:19   ` Nicholas Piggin
2021-06-16  7:35     ` Peter Zijlstra
2021-06-16 18:41       ` Andy Lutomirski
2021-06-17  1:37         ` Nicholas Piggin
2021-06-17  2:57           ` Andy Lutomirski
2021-06-17  5:32             ` Andy Lutomirski
2021-06-17  6:51               ` Nicholas Piggin
2021-06-17 23:49                 ` Andy Lutomirski
2021-06-19  2:53                   ` Nicholas Piggin
2021-06-19  3:20                     ` Andy Lutomirski
2021-06-19  4:27                       ` Nicholas Piggin
2021-06-17  9:08               ` [RFC][PATCH] sched: Use lightweight hazard pointers to grab lazy mms Peter Zijlstra
2021-06-17  9:10                 ` Peter Zijlstra
2021-06-17 10:00                   ` Nicholas Piggin
2021-06-17  9:13                 ` Peter Zijlstra
2021-06-17 14:06                   ` Andy Lutomirski
2021-06-17  9:28                 ` Peter Zijlstra
2021-06-17 14:03                   ` Andy Lutomirski
2021-06-17 14:10                 ` Andy Lutomirski
2021-06-17 15:45                   ` Peter Zijlstra
2021-06-18  3:29                 ` Paul E. McKenney
2021-06-18  5:04                   ` Andy Lutomirski
2021-06-17 15:02               ` [PATCH 4/8] membarrier: Make the post-switch-mm barrier explicit Paul E. McKenney
2021-06-18  0:06                 ` Andy Lutomirski
2021-06-18  3:35                   ` Paul E. McKenney
2021-06-17  8:45         ` Peter Zijlstra
2021-06-16  3:21 ` [PATCH 5/8] membarrier, kthread: Use _ONCE accessors for task->mm Andy Lutomirski
2021-06-16  4:28   ` Nicholas Piggin
2021-06-16 18:08   ` Mathieu Desnoyers
2021-06-16 18:45     ` Andy Lutomirski
2021-06-16  3:21 ` [PATCH 6/8] powerpc/membarrier: Remove special barrier on mm switch Andy Lutomirski
2021-06-16  3:21   ` Andy Lutomirski
2021-06-16  4:36   ` Nicholas Piggin
2021-06-16  4:36     ` Nicholas Piggin
2021-06-16  3:21 ` [PATCH 7/8] membarrier: Remove arm (32) support for SYNC_CORE Andy Lutomirski
2021-06-16  3:21   ` Andy Lutomirski
2021-06-16  9:28   ` Russell King (Oracle)
2021-06-16  9:28     ` Russell King (Oracle)
2021-06-16 10:16   ` Peter Zijlstra
2021-06-16 10:16     ` Peter Zijlstra
2021-06-16 10:20     ` Peter Zijlstra
2021-06-16 10:20       ` Peter Zijlstra
2021-06-16 10:34       ` Russell King (Oracle)
2021-06-16 10:34         ` Russell King (Oracle)
2021-06-16 11:10         ` Peter Zijlstra
2021-06-16 11:10           ` Peter Zijlstra
2021-06-16 13:22           ` Russell King (Oracle)
2021-06-16 13:22             ` Russell King (Oracle)
2021-06-16 15:04             ` Catalin Marinas
2021-06-16 15:04               ` Catalin Marinas
2021-06-16 15:23               ` Russell King (Oracle)
2021-06-16 15:23                 ` Russell King (Oracle)
2021-06-16 15:45                 ` Catalin Marinas
2021-06-16 15:45                   ` Catalin Marinas
2021-06-16 16:00                   ` Catalin Marinas
2021-06-16 16:00                     ` Catalin Marinas
2021-06-16 16:27                     ` Russell King (Oracle)
2021-06-16 16:27                       ` Russell King (Oracle)
2021-06-17  8:55                       ` Krzysztof Hałasa
2021-06-17  8:55                         ` Krzysztof Hałasa
2021-06-18 12:54                       ` Linus Walleij
2021-06-18 12:54                         ` Linus Walleij
2021-06-18 13:19                         ` Russell King (Oracle)
2021-06-18 13:19                           ` Russell King (Oracle)
2021-06-18 13:36                         ` Arnd Bergmann
2021-06-18 13:36                           ` Arnd Bergmann
2021-06-17 10:40   ` Mark Rutland
2021-06-17 10:40     ` Mark Rutland
2021-06-17 11:23     ` Russell King (Oracle)
2021-06-17 11:23       ` Russell King (Oracle)
2021-06-17 11:33       ` Mark Rutland
2021-06-17 11:33         ` Mark Rutland
2021-06-17 13:41         ` Andy Lutomirski
2021-06-17 13:41           ` Andy Lutomirski
2021-06-17 13:51           ` Mark Rutland
2021-06-17 13:51             ` Mark Rutland
2021-06-17 14:00             ` Andy Lutomirski
2021-06-17 14:00               ` Andy Lutomirski
2021-06-17 14:20               ` Mark Rutland
2021-06-17 14:20                 ` Mark Rutland
2021-06-17 15:01               ` Peter Zijlstra
2021-06-17 15:01                 ` Peter Zijlstra
2021-06-17 15:13                 ` Peter Zijlstra
2021-06-17 15:13                   ` Peter Zijlstra
2021-06-17 14:16             ` Mathieu Desnoyers [this message]
2021-06-17 14:16               ` Mathieu Desnoyers
2021-06-17 14:05           ` Peter Zijlstra
2021-06-17 14:05             ` Peter Zijlstra
2021-06-18  0:07   ` Andy Lutomirski
2021-06-18  0:07     ` Andy Lutomirski
2021-06-16  3:21 ` [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation Andy Lutomirski
2021-06-16  3:21   ` Andy Lutomirski
2021-06-16  3:21   ` Andy Lutomirski
2021-06-16  4:45   ` Nicholas Piggin
2021-06-16  4:45     ` Nicholas Piggin
2021-06-16  4:45     ` Nicholas Piggin
2021-06-16 18:52     ` Andy Lutomirski
2021-06-16 18:52       ` Andy Lutomirski
2021-06-16 18:52       ` Andy Lutomirski
2021-06-16 23:48       ` Andy Lutomirski
2021-06-16 23:48         ` Andy Lutomirski
2021-06-16 23:48         ` Andy Lutomirski
2021-06-18 15:27       ` Christophe Leroy
2021-06-18 15:27         ` Christophe Leroy
2021-06-18 15:27         ` Christophe Leroy
2021-06-16 10:20   ` Will Deacon
2021-06-16 10:20     ` Will Deacon
2021-06-16 10:20     ` Will Deacon
2021-06-16 23:58     ` Andy Lutomirski
2021-06-16 23:58       ` Andy Lutomirski
2021-06-16 23:58       ` Andy Lutomirski
2021-06-17 14:47   ` Mathieu Desnoyers
2021-06-17 14:47     ` Mathieu Desnoyers
2021-06-17 14:47     ` Mathieu Desnoyers
2021-06-18  0:12     ` Andy Lutomirski
2021-06-18  0:12       ` Andy Lutomirski
2021-06-18  0:12       ` Andy Lutomirski
2021-06-18 16:31       ` Mathieu Desnoyers
2021-06-18 16:31         ` Mathieu Desnoyers
2021-06-18 16:31         ` Mathieu Desnoyers
2021-06-18 19:58         ` Andy Lutomirski
2021-06-18 19:58           ` Andy Lutomirski
2021-06-18 19:58           ` Andy Lutomirski
2021-06-18 20:09           ` Mathieu Desnoyers
2021-06-18 20:09             ` Mathieu Desnoyers
2021-06-18 20:09             ` Mathieu Desnoyers
2021-06-19  6:02             ` Nicholas Piggin
2021-06-19  6:02               ` Nicholas Piggin
2021-06-19  6:02               ` Nicholas Piggin
2021-06-19 15:50               ` Andy Lutomirski
2021-06-19 15:50                 ` Andy Lutomirski
2021-06-19 15:50                 ` Andy Lutomirski
2021-06-20  2:10                 ` Nicholas Piggin
2021-06-20  2:10                   ` Nicholas Piggin
2021-06-20  2:10                   ` Nicholas Piggin
2021-06-17 15:16   ` Mathieu Desnoyers
2021-06-17 15:16     ` Mathieu Desnoyers
2021-06-17 15:16     ` Mathieu Desnoyers
2021-06-18  0:13     ` Andy Lutomirski
2021-06-18  0:13       ` Andy Lutomirski
2021-06-18  0:13       ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210617141612.GA18005@localhost \
    --to=mathieu.desnoyers@efficios.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@armlinux.org.uk \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.