* Rough notes from sys_membarrier() lightning BoF @ 2017-09-17 22:36 Paul E. McKenney 2017-09-18 19:04 ` Alan Stern 2017-09-20 16:02 ` Andy Lutomirski 0 siblings, 2 replies; 17+ messages in thread From: Paul E. McKenney @ 2017-09-17 22:36 UTC (permalink / raw) To: peterz, mathieu.desnoyers, will.deacon, stern Cc: luto, mpe, linux-kernel, linux-arch, davejwatson, maged.michael Hello! Rough notes from our discussion last Thursday. Please reply to the group with any needed elaborations or corrections. Adding Andy and Michael on CC since this most closely affects their architectures. Also adding Dave Watson and Maged Michael because the preferred approach requires that processes wanting to use the lightweight sys_membarrier() do a registration step. Thanx, Paul ------------------------------------------------------------------------ Problem: 1. The current sys_membarrier() introduces an smp_mb() that is not otherwise required on powerpc. 2. The envisioned JIT variant of sys_membarrier() assumes that the return-to-user instruction sequence handling any change to the usermode instruction stream, and Andy Lutomirski's upcoming changes invalidate this assumption. It is believed that powerpc has a similar issue. Here are diagrams indicating the memory-ordering requirements: Scenario 1: Access preceding sys_membarrier() must see changes from thread that concurrently switches in. ---------------------------------------------------------------- Scheduler sys_membarrier() --------- ---------------- smp_mb(); usermode load or store to Y /* begin system call */ sys_membarrier() smp_mb(); Check rq->curr rq->curr = new_thread; smp_mb(); /* not powerpc! */ /* return to user */ usermode load or store to X smp_mb(); ---------------------------------------------------------------- Due to the fr link from the check of rq->curr to the scheduler's write, we need full memory barriers on both sides. However, we don't want to lose the powerpc optimization, at least not in the common case. Scenario 2: Access following sys_membarrier() must see changes from thread that concurrently switches out. ---------------------------------------------------------------- Scheduler sys_membarrier() --------- ---------------- /* begin system call */ sys_membarrier() smp_mb(); usermode load or store to X /* Schedule from user */ smp_mb(); rq->curr = new_thread; Check rq->curr smp_mb(); smp_mb(); /* not powerpc! */ /* return to user */ usermode load or store to Y ---------------------------------------------------------------- Here less ordering is required, given that a read is returning the value previously written. Weaker barriers could be used, but full memory barriers are in place in any case. Potential resolutions, including known stupid ones: A. IPI all CPUs all the time. Not so good for real-time workloads, and a usermode-induced set of IPIs could potentially be used for a denial-of-service (DoS) attack. B. Lock all runqueues all the time. This could potentially also be used in a usermode-induced DoS attack. C. Explicitly interact with all threads rather than with CPUs. This can be quite expensive for the surprisingly common case where applications have very large numbers of thread. (Java, we are looking at you!!!) D. Just keep the redundant smp_mb() and just say "no" to Andy's x86 optimizations. We would like to avoid the performance degradation in both cases. E. Require that threads register before using sys_membarrier() for private or JIT usage. (The historical implementation using synchronize_sched() would continue to -not- require registration, both for compatibility and because there is no need to do so.) For x86 and powerpc, this registration would set a TIF flag on all of the current process's threads. This flag would be inherited by any later thread creation within that process, and would be cleared by fork() and exec(). When this TIF flag is set, the return-to-user path would execute additional code that would ensure that ordering and newly JITed code was handled correctly. We believe that checks for these TIF flags could be combined with existing checks to avoid adding any overhead in the common case where the process was not using these sys_membarrier() features. For all other architecture, the registration step would be a no-op. Does anyone have any better solution? If so, please don't keep it a secret! Thanx, Paul ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-17 22:36 Rough notes from sys_membarrier() lightning BoF Paul E. McKenney @ 2017-09-18 19:04 ` Alan Stern 2017-09-20 16:02 ` Andy Lutomirski 1 sibling, 0 replies; 17+ messages in thread From: Alan Stern @ 2017-09-18 19:04 UTC (permalink / raw) To: Paul E. McKenney Cc: peterz, mathieu.desnoyers, will.deacon, luto, mpe, linux-kernel, linux-arch, davejwatson, maged.michael On Sun, 17 Sep 2017, Paul E. McKenney wrote: > Hello! > > Rough notes from our discussion last Thursday. Please reply to the > group with any needed elaborations or corrections. > > Adding Andy and Michael on CC since this most closely affects their > architectures. Also adding Dave Watson and Maged Michael because > the preferred approach requires that processes wanting to use the > lightweight sys_membarrier() do a registration step. > > Thanx, Paul > > ------------------------------------------------------------------------ > > Problem: > > 1. The current sys_membarrier() introduces an smp_mb() that > is not otherwise required on powerpc. > > 2. The envisioned JIT variant of sys_membarrier() assumes that > the return-to-user instruction sequence handling any change > to the usermode instruction stream, and Andy Lutomirski's > upcoming changes invalidate this assumption. It is believed > that powerpc has a similar issue. > E. Require that threads register before using sys_membarrier() for > private or JIT usage. (The historical implementation using > synchronize_sched() would continue to -not- require registration, > both for compatibility and because there is no need to do so.) > > For x86 and powerpc, this registration would set a TIF flag > on all of the current process's threads. This flag would be > inherited by any later thread creation within that process, and > would be cleared by fork() and exec(). When this TIF flag is set, Why a TIF flag, and why clear it during fork()? If a process registers to use private expedited sys_membarrier, shouldn't that apply to threads it will create in the future just as much as to threads it has already created? > the return-to-user path would execute additional code that would > ensure that ordering and newly JITed code was handled correctly. > We believe that checks for these TIF flags could be combined with > existing checks to avoid adding any overhead in the common case > where the process was not using these sys_membarrier() features. > > For all other architecture, the registration step would be > a no-op. Don't we want to fail private expedited sys_membarrier calls if the process hasn't registered for them? This requires the registration call to set a flag for the process, even on architectures where no additional memory barriers are actually needed. It can't be a no-op. Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF @ 2017-09-18 19:04 ` Alan Stern 0 siblings, 0 replies; 17+ messages in thread From: Alan Stern @ 2017-09-18 19:04 UTC (permalink / raw) To: Paul E. McKenney Cc: peterz, mathieu.desnoyers, will.deacon, luto, mpe, linux-kernel, linux-arch, davejwatson, maged.michael On Sun, 17 Sep 2017, Paul E. McKenney wrote: > Hello! > > Rough notes from our discussion last Thursday. Please reply to the > group with any needed elaborations or corrections. > > Adding Andy and Michael on CC since this most closely affects their > architectures. Also adding Dave Watson and Maged Michael because > the preferred approach requires that processes wanting to use the > lightweight sys_membarrier() do a registration step. > > Thanx, Paul > > ------------------------------------------------------------------------ > > Problem: > > 1. The current sys_membarrier() introduces an smp_mb() that > is not otherwise required on powerpc. > > 2. The envisioned JIT variant of sys_membarrier() assumes that > the return-to-user instruction sequence handling any change > to the usermode instruction stream, and Andy Lutomirski's > upcoming changes invalidate this assumption. It is believed > that powerpc has a similar issue. > E. Require that threads register before using sys_membarrier() for > private or JIT usage. (The historical implementation using > synchronize_sched() would continue to -not- require registration, > both for compatibility and because there is no need to do so.) > > For x86 and powerpc, this registration would set a TIF flag > on all of the current process's threads. This flag would be > inherited by any later thread creation within that process, and > would be cleared by fork() and exec(). When this TIF flag is set, Why a TIF flag, and why clear it during fork()? If a process registers to use private expedited sys_membarrier, shouldn't that apply to threads it will create in the future just as much as to threads it has already created? > the return-to-user path would execute additional code that would > ensure that ordering and newly JITed code was handled correctly. > We believe that checks for these TIF flags could be combined with > existing checks to avoid adding any overhead in the common case > where the process was not using these sys_membarrier() features. > > For all other architecture, the registration step would be > a no-op. Don't we want to fail private expedited sys_membarrier calls if the process hasn't registered for them? This requires the registration call to set a flag for the process, even on architectures where no additional memory barriers are actually needed. It can't be a no-op. Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-18 19:04 ` Alan Stern (?) @ 2017-09-18 19:10 ` Mathieu Desnoyers -1 siblings, 0 replies; 17+ messages in thread From: Mathieu Desnoyers @ 2017-09-18 19:10 UTC (permalink / raw) To: Alan Stern Cc: Paul E. McKenney, Peter Zijlstra, Will Deacon, Andy Lutomirski, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael ----- On Sep 18, 2017, at 3:04 PM, Alan Stern stern@rowland.harvard.edu wrote: > On Sun, 17 Sep 2017, Paul E. McKenney wrote: > >> Hello! >> >> Rough notes from our discussion last Thursday. Please reply to the >> group with any needed elaborations or corrections. >> >> Adding Andy and Michael on CC since this most closely affects their >> architectures. Also adding Dave Watson and Maged Michael because >> the preferred approach requires that processes wanting to use the >> lightweight sys_membarrier() do a registration step. >> >> Thanx, Paul >> >> ------------------------------------------------------------------------ >> >> Problem: >> >> 1. The current sys_membarrier() introduces an smp_mb() that >> is not otherwise required on powerpc. >> >> 2. The envisioned JIT variant of sys_membarrier() assumes that >> the return-to-user instruction sequence handling any change >> to the usermode instruction stream, and Andy Lutomirski's >> upcoming changes invalidate this assumption. It is believed >> that powerpc has a similar issue. > >> E. Require that threads register before using sys_membarrier() for >> private or JIT usage. (The historical implementation using >> synchronize_sched() would continue to -not- require registration, >> both for compatibility and because there is no need to do so.) >> >> For x86 and powerpc, this registration would set a TIF flag >> on all of the current process's threads. This flag would be >> inherited by any later thread creation within that process, and >> would be cleared by fork() and exec(). When this TIF flag is set, > > Why a TIF flag, and why clear it during fork()? If a process registers > to use private expedited sys_membarrier, shouldn't that apply to > threads it will create in the future just as much as to threads it has > already created? In my implementation posted today, I'm not clearing it on fork. The child inherits from the parent. Why TIF flag ? It appears to be a convenient way to add an architecture-specific single-bit state for each thread. We also don't want to do too much pointer chasing on the scheduler fast-path (current->mm->..). > >> the return-to-user path would execute additional code that would >> ensure that ordering and newly JITed code was handled correctly. >> We believe that checks for these TIF flags could be combined with >> existing checks to avoid adding any overhead in the common case >> where the process was not using these sys_membarrier() features. >> >> For all other architecture, the registration step would be >> a no-op. > > Don't we want to fail private expedited sys_membarrier calls if the > process hasn't registered for them? This requires the registration > call to set a flag for the process, even on architectures where no > additional memory barriers are actually needed. It can't be a no-op. My implementation posted today fails the private expedited command if the process is not registered yet. We indeed add a new flag in mm_struct for all architectures to do so. So why not re-use this flag instead of the TIF on powerpc ? See my pointer chasing on fast-path argument above. Thanks, Mathieu > > Alan Stern -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-18 19:04 ` Alan Stern (?) (?) @ 2017-09-18 19:29 ` Paul E. McKenney 2017-09-18 19:37 ` Mathieu Desnoyers -1 siblings, 1 reply; 17+ messages in thread From: Paul E. McKenney @ 2017-09-18 19:29 UTC (permalink / raw) To: Alan Stern Cc: peterz, mathieu.desnoyers, will.deacon, luto, mpe, linux-kernel, linux-arch, davejwatson, maged.michael On Mon, Sep 18, 2017 at 03:04:21PM -0400, Alan Stern wrote: > On Sun, 17 Sep 2017, Paul E. McKenney wrote: > > > Hello! > > > > Rough notes from our discussion last Thursday. Please reply to the > > group with any needed elaborations or corrections. > > > > Adding Andy and Michael on CC since this most closely affects their > > architectures. Also adding Dave Watson and Maged Michael because > > the preferred approach requires that processes wanting to use the > > lightweight sys_membarrier() do a registration step. > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > Problem: > > > > 1. The current sys_membarrier() introduces an smp_mb() that > > is not otherwise required on powerpc. > > > > 2. The envisioned JIT variant of sys_membarrier() assumes that > > the return-to-user instruction sequence handling any change > > to the usermode instruction stream, and Andy Lutomirski's > > upcoming changes invalidate this assumption. It is believed > > that powerpc has a similar issue. > > > E. Require that threads register before using sys_membarrier() for > > private or JIT usage. (The historical implementation using > > synchronize_sched() would continue to -not- require registration, > > both for compatibility and because there is no need to do so.) > > > > For x86 and powerpc, this registration would set a TIF flag > > on all of the current process's threads. This flag would be > > inherited by any later thread creation within that process, and > > would be cleared by fork() and exec(). When this TIF flag is set, > > Why a TIF flag, and why clear it during fork()? If a process registers > to use private expedited sys_membarrier, shouldn't that apply to > threads it will create in the future just as much as to threads it has > already created? The reason for a TIF flag is to keep this per-architecture, as only powerpc and x86 need it. The reason for clearing it during fork() is that fork() creates a new process initially having but a single thread, which might or might not use sys_membarrier(). Usually not, as most instances of fork() are quickly followed by exec(). In addition, if we give an error for unregistered use of private sys_membarrier(), clearing on fork() gets an unambiguous error instead of a silent likely failure (due to libraries being confused by the fork()). That said, pthread_create() should preserve the flag, as the new thread will be part of this same process. > > the return-to-user path would execute additional code that would > > ensure that ordering and newly JITed code was handled correctly. > > We believe that checks for these TIF flags could be combined with > > existing checks to avoid adding any overhead in the common case > > where the process was not using these sys_membarrier() features. > > > > For all other architecture, the registration step would be > > a no-op. > > Don't we want to fail private expedited sys_membarrier calls if the > process hasn't registered for them? This requires the registration > call to set a flag for the process, even on architectures where no > additional memory barriers are actually needed. It can't be a no-op. Good point, and we did discuss that. Color me forgetful!!! Thanx, Paul ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-18 19:29 ` Paul E. McKenney @ 2017-09-18 19:37 ` Mathieu Desnoyers 2017-09-18 20:34 ` Paul E. McKenney 0 siblings, 1 reply; 17+ messages in thread From: Mathieu Desnoyers @ 2017-09-18 19:37 UTC (permalink / raw) To: Paul E. McKenney Cc: Alan Stern, Peter Zijlstra, Will Deacon, Andy Lutomirski, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael ----- On Sep 18, 2017, at 3:29 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote: > On Mon, Sep 18, 2017 at 03:04:21PM -0400, Alan Stern wrote: >> On Sun, 17 Sep 2017, Paul E. McKenney wrote: >> >> > Hello! >> > >> > Rough notes from our discussion last Thursday. Please reply to the >> > group with any needed elaborations or corrections. >> > >> > Adding Andy and Michael on CC since this most closely affects their >> > architectures. Also adding Dave Watson and Maged Michael because >> > the preferred approach requires that processes wanting to use the >> > lightweight sys_membarrier() do a registration step. >> > >> > Thanx, Paul >> > >> > ------------------------------------------------------------------------ >> > >> > Problem: >> > >> > 1. The current sys_membarrier() introduces an smp_mb() that >> > is not otherwise required on powerpc. >> > >> > 2. The envisioned JIT variant of sys_membarrier() assumes that >> > the return-to-user instruction sequence handling any change >> > to the usermode instruction stream, and Andy Lutomirski's >> > upcoming changes invalidate this assumption. It is believed >> > that powerpc has a similar issue. >> >> > E. Require that threads register before using sys_membarrier() for >> > private or JIT usage. (The historical implementation using >> > synchronize_sched() would continue to -not- require registration, >> > both for compatibility and because there is no need to do so.) >> > >> > For x86 and powerpc, this registration would set a TIF flag >> > on all of the current process's threads. This flag would be >> > inherited by any later thread creation within that process, and >> > would be cleared by fork() and exec(). When this TIF flag is set, >> >> Why a TIF flag, and why clear it during fork()? If a process registers >> to use private expedited sys_membarrier, shouldn't that apply to >> threads it will create in the future just as much as to threads it has >> already created? > > The reason for a TIF flag is to keep this per-architecture, as only > powerpc and x86 need it. > > The reason for clearing it during fork() is that fork() creates a new > process initially having but a single thread, which might or might > not use sys_membarrier(). Usually not, as most instances of fork() > are quickly followed by exec(). In addition, if we give an error for > unregistered use of private sys_membarrier(), clearing on fork() gets an > unambiguous error instead of a silent likely failure (due to libraries > being confused by the fork()). I think clearing that state on fork() would be unexpected. The child process inherits from the parent flag in my current implementation. Clearing the flag is only provided through exec(). Libraries don't get re-initialized on fork, only on exec(). Therefore, it makes sense for the child process to inherit the state from its parent. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-18 19:37 ` Mathieu Desnoyers @ 2017-09-18 20:34 ` Paul E. McKenney 0 siblings, 0 replies; 17+ messages in thread From: Paul E. McKenney @ 2017-09-18 20:34 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Alan Stern, Peter Zijlstra, Will Deacon, Andy Lutomirski, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael On Mon, Sep 18, 2017 at 07:37:22PM +0000, Mathieu Desnoyers wrote: > ----- On Sep 18, 2017, at 3:29 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote: > > > On Mon, Sep 18, 2017 at 03:04:21PM -0400, Alan Stern wrote: > >> On Sun, 17 Sep 2017, Paul E. McKenney wrote: > >> > >> > Hello! > >> > > >> > Rough notes from our discussion last Thursday. Please reply to the > >> > group with any needed elaborations or corrections. > >> > > >> > Adding Andy and Michael on CC since this most closely affects their > >> > architectures. Also adding Dave Watson and Maged Michael because > >> > the preferred approach requires that processes wanting to use the > >> > lightweight sys_membarrier() do a registration step. > >> > > >> > Thanx, Paul > >> > > >> > ------------------------------------------------------------------------ > >> > > >> > Problem: > >> > > >> > 1. The current sys_membarrier() introduces an smp_mb() that > >> > is not otherwise required on powerpc. > >> > > >> > 2. The envisioned JIT variant of sys_membarrier() assumes that > >> > the return-to-user instruction sequence handling any change > >> > to the usermode instruction stream, and Andy Lutomirski's > >> > upcoming changes invalidate this assumption. It is believed > >> > that powerpc has a similar issue. > >> > >> > E. Require that threads register before using sys_membarrier() for > >> > private or JIT usage. (The historical implementation using > >> > synchronize_sched() would continue to -not- require registration, > >> > both for compatibility and because there is no need to do so.) > >> > > >> > For x86 and powerpc, this registration would set a TIF flag > >> > on all of the current process's threads. This flag would be > >> > inherited by any later thread creation within that process, and > >> > would be cleared by fork() and exec(). When this TIF flag is set, > >> > >> Why a TIF flag, and why clear it during fork()? If a process registers > >> to use private expedited sys_membarrier, shouldn't that apply to > >> threads it will create in the future just as much as to threads it has > >> already created? > > > > The reason for a TIF flag is to keep this per-architecture, as only > > powerpc and x86 need it. > > > > The reason for clearing it during fork() is that fork() creates a new > > process initially having but a single thread, which might or might > > not use sys_membarrier(). Usually not, as most instances of fork() > > are quickly followed by exec(). In addition, if we give an error for > > unregistered use of private sys_membarrier(), clearing on fork() gets an > > unambiguous error instead of a silent likely failure (due to libraries > > being confused by the fork()). > > I think clearing that state on fork() would be unexpected. The child process > inherits from the parent flag in my current implementation. Clearing the > flag is only provided through exec(). > > Libraries don't get re-initialized on fork, only on exec(). Therefore, it > makes sense for the child process to inherit the state from its parent. Fair enough! Thanx, Paul ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-17 22:36 Rough notes from sys_membarrier() lightning BoF Paul E. McKenney 2017-09-18 19:04 ` Alan Stern @ 2017-09-20 16:02 ` Andy Lutomirski 2017-09-20 18:13 ` Mathieu Desnoyers 1 sibling, 1 reply; 17+ messages in thread From: Andy Lutomirski @ 2017-09-20 16:02 UTC (permalink / raw) To: Paul E. McKenney Cc: Peter Zijlstra, Mathieu Desnoyers, Will Deacon, Alan Stern, Andrew Lutomirski, Michael Ellerman, linux-kernel@vger.kernel.org, linux-arch, Dave Watson, Maged Michael On Sun, Sep 17, 2017 at 3:36 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > Hello! > > Rough notes from our discussion last Thursday. Please reply to the > group with any needed elaborations or corrections. > > Adding Andy and Michael on CC since this most closely affects their > architectures. Also adding Dave Watson and Maged Michael because > the preferred approach requires that processes wanting to use the > lightweight sys_membarrier() do a registration step. Not to be too much of a curmudgeon, but I think that there should be a real implementation of the isync membarrier before this get merged. This series purports to solve two problems, ppc barriers and x86 exit-without-isync, but it's very hard to evaluate whether it actually solves the latter problem given the complete lack of x86 or isync code in the current RFC. It still seems to me that you won't get any particular advantage for using this registration mechanism on x86 even when you implement isync. Unless I've misunderstood, the only real issue on x86 is that you need a helper like arch_force_isync_before_usermode(), and that helper doesn't presently exist. That means that this whole patchset is standing on very dangerous ground: you'll end up with an efficient implementation that works just fine without even requesting registration on every architecture except ppc. That way lies userspace bugs. Also, can you elaborate on the PPC issue? PPC appears to track mm_cpumask more or less just like x86. Is the issue just that this tracking has no implied barriers? If so, how does TLB flush on ppc work? It really does seem impressive to me that an architecture can efficiently support munmap() but not an expedited private membarrier. --Andy ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-20 16:02 ` Andy Lutomirski @ 2017-09-20 18:13 ` Mathieu Desnoyers 2017-09-20 18:18 ` Andy Lutomirski ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Mathieu Desnoyers @ 2017-09-20 18:13 UTC (permalink / raw) To: Andy Lutomirski Cc: Paul E. McKenney, Peter Zijlstra, Will Deacon, Alan Stern, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael ----- On Sep 20, 2017, at 12:02 PM, Andy Lutomirski luto@kernel.org wrote: > On Sun, Sep 17, 2017 at 3:36 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: >> Hello! >> >> Rough notes from our discussion last Thursday. Please reply to the >> group with any needed elaborations or corrections. >> >> Adding Andy and Michael on CC since this most closely affects their >> architectures. Also adding Dave Watson and Maged Michael because >> the preferred approach requires that processes wanting to use the >> lightweight sys_membarrier() do a registration step. > > Not to be too much of a curmudgeon, but I think that there should be a > real implementation of the isync membarrier before this get merged. > This series purports to solve two problems, ppc barriers and x86 > exit-without-isync, but it's very hard to evaluate whether it actually > solves the latter problem given the complete lack of x86 or isync code > in the current RFC. > > It still seems to me that you won't get any particular advantage for > using this registration mechanism on x86 even when you implement > isync. Unless I've misunderstood, the only real issue on x86 is that > you need a helper like arch_force_isync_before_usermode(), and that > helper doesn't presently exist. That means that this whole patchset > is standing on very dangerous ground: you'll end up with an efficient > implementation that works just fine without even requesting > registration on every architecture except ppc. That way lies > userspace bugs. My proposed RFC for private expedited membarrier enforces that all architectures perform the registration step. Using the "PRIVATE_EXPEDITED" command without prior process registration returns an error on all architectures. The goal here is to make all architectures behave in the same way, and it allows us to rely on process registration to deal with future arch-specific optimizations. Adding the "core_sync" behavior could then be done for the next kernel merge window. I'm currently foreseeing two possible ABI approaches to expose it: Approach 1: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands. This allows us to return their availability through MEMBARRIER_CMD_QUERY. Approach 2: Add a "MEMBARRIER_FLAG_SYNC_CORE" as flag parameter. It could be set when issuing both MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED and MEMBARRIER_CMD_PRIVATE_EXPEDITED, thus ensuring core serializing behavior. Querying whether core serialization is supported could be done by issuing the MEMBARRIER_CMD_QUERY command with the MEMBARRIER_FLAG_SYNC_CORE flag set. Any other ideas ? Any approach seems better ? > > Also, can you elaborate on the PPC issue? PPC appears to track > mm_cpumask more or less just like x86. Is the issue just that this > tracking has no implied barriers? If so, how does TLB flush on ppc > work? It really does seem impressive to me that an architecture can > efficiently support munmap() but not an expedited private membarrier. I'll leave this question to the PPC experts :) Thanks, Mathieu > > --Andy -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-20 18:13 ` Mathieu Desnoyers @ 2017-09-20 18:18 ` Andy Lutomirski 2017-09-20 19:57 ` Mathieu Desnoyers 2017-09-21 13:09 ` Peter Zijlstra 2017-09-21 13:15 ` Peter Zijlstra 2 siblings, 1 reply; 17+ messages in thread From: Andy Lutomirski @ 2017-09-20 18:18 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Andy Lutomirski, Paul E. McKenney, Peter Zijlstra, Will Deacon, Alan Stern, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael On Wed, Sep 20, 2017 at 11:13 AM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > ----- On Sep 20, 2017, at 12:02 PM, Andy Lutomirski luto@kernel.org wrote: > > > On Sun, Sep 17, 2017 at 3:36 PM, Paul E. McKenney > > <paulmck@linux.vnet.ibm.com> wrote: > >> Hello! > >> > >> Rough notes from our discussion last Thursday. Please reply to the > >> group with any needed elaborations or corrections. > >> > >> Adding Andy and Michael on CC since this most closely affects their > >> architectures. Also adding Dave Watson and Maged Michael because > >> the preferred approach requires that processes wanting to use the > >> lightweight sys_membarrier() do a registration step. > > > > Not to be too much of a curmudgeon, but I think that there should be a > > real implementation of the isync membarrier before this get merged. > > This series purports to solve two problems, ppc barriers and x86 > > exit-without-isync, but it's very hard to evaluate whether it actually > > solves the latter problem given the complete lack of x86 or isync code > > in the current RFC. > > > > It still seems to me that you won't get any particular advantage for > > using this registration mechanism on x86 even when you implement > > isync. Unless I've misunderstood, the only real issue on x86 is that > > you need a helper like arch_force_isync_before_usermode(), and that > > helper doesn't presently exist. That means that this whole patchset > > is standing on very dangerous ground: you'll end up with an efficient > > implementation that works just fine without even requesting > > registration on every architecture except ppc. That way lies > > userspace bugs. > > My proposed RFC for private expedited membarrier enforces that all > architectures perform the registration step. Using the "PRIVATE_EXPEDITED" > command without prior process registration returns an error on all > architectures. The goal here is to make all architectures behave in the > same way, and it allows us to rely on process registration to deal > with future arch-specific optimizations. Fair enough. That being said, on same architectures (which may well be all but PPC), it might be nice if the registration call literally just sets a flag in the mm saying that it happened so that the registration enforcement can be done. > > > Adding the "core_sync" behavior could then be done for the next kernel > merge window. I'm currently foreseeing two possible ABI approaches to > expose it: > > Approach 1: > > Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and > MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands. This > allows us to return their availability through MEMBARRIER_CMD_QUERY. > > Approach 2: > > Add a "MEMBARRIER_FLAG_SYNC_CORE" as flag parameter. It could be set > when issuing both MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED and > MEMBARRIER_CMD_PRIVATE_EXPEDITED, thus ensuring core serializing > behavior. Querying whether core serialization is supported could > be done by issuing the MEMBARRIER_CMD_QUERY command with the > MEMBARRIER_FLAG_SYNC_CORE flag set. > > Any other ideas ? Any approach seems better ? It doesn't seem to make much difference to me. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-20 18:18 ` Andy Lutomirski @ 2017-09-20 19:57 ` Mathieu Desnoyers 0 siblings, 0 replies; 17+ messages in thread From: Mathieu Desnoyers @ 2017-09-20 19:57 UTC (permalink / raw) To: Andy Lutomirski Cc: Paul E. McKenney, Peter Zijlstra, Will Deacon, Alan Stern, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael ----- On Sep 20, 2017, at 2:18 PM, Andy Lutomirski luto@kernel.org wrote: > On Wed, Sep 20, 2017 at 11:13 AM, Mathieu Desnoyers > <mathieu.desnoyers@efficios.com> wrote: >> >> ----- On Sep 20, 2017, at 12:02 PM, Andy Lutomirski luto@kernel.org wrote: >> >> > On Sun, Sep 17, 2017 at 3:36 PM, Paul E. McKenney >> > <paulmck@linux.vnet.ibm.com> wrote: >> >> Hello! >> >> >> >> Rough notes from our discussion last Thursday. Please reply to the >> >> group with any needed elaborations or corrections. >> >> >> >> Adding Andy and Michael on CC since this most closely affects their >> >> architectures. Also adding Dave Watson and Maged Michael because >> >> the preferred approach requires that processes wanting to use the >> >> lightweight sys_membarrier() do a registration step. >> > >> > Not to be too much of a curmudgeon, but I think that there should be a >> > real implementation of the isync membarrier before this get merged. >> > This series purports to solve two problems, ppc barriers and x86 >> > exit-without-isync, but it's very hard to evaluate whether it actually >> > solves the latter problem given the complete lack of x86 or isync code >> > in the current RFC. >> > >> > It still seems to me that you won't get any particular advantage for >> > using this registration mechanism on x86 even when you implement >> > isync. Unless I've misunderstood, the only real issue on x86 is that >> > you need a helper like arch_force_isync_before_usermode(), and that >> > helper doesn't presently exist. That means that this whole patchset >> > is standing on very dangerous ground: you'll end up with an efficient >> > implementation that works just fine without even requesting >> > registration on every architecture except ppc. That way lies >> > userspace bugs. >> >> My proposed RFC for private expedited membarrier enforces that all >> architectures perform the registration step. Using the "PRIVATE_EXPEDITED" >> command without prior process registration returns an error on all >> architectures. The goal here is to make all architectures behave in the >> same way, and it allows us to rely on process registration to deal >> with future arch-specific optimizations. > > Fair enough. > > That being said, on same architectures (which may well be all but > PPC), it might be nice if the registration call literally just sets a > flag in the mm saying that it happened so that the registration > enforcement can be done. My RFC patch does exactly that. :-) Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-20 18:13 ` Mathieu Desnoyers 2017-09-20 18:18 ` Andy Lutomirski @ 2017-09-21 13:09 ` Peter Zijlstra 2017-09-21 17:23 ` James Bottomley 2017-09-22 5:08 ` Michael Ellerman 2017-09-21 13:15 ` Peter Zijlstra 2 siblings, 2 replies; 17+ messages in thread From: Peter Zijlstra @ 2017-09-21 13:09 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Andy Lutomirski, Paul E. McKenney, Will Deacon, Alan Stern, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael On Wed, Sep 20, 2017 at 06:13:50PM +0000, Mathieu Desnoyers wrote: > > Also, can you elaborate on the PPC issue? PPC appears to track > > mm_cpumask more or less just like x86. Is the issue just that this > > tracking has no implied barriers? If so, how does TLB flush on ppc > > work? It really does seem impressive to me that an architecture can > > efficiently support munmap() but not an expedited private membarrier. > > I'll leave this question to the PPC experts :) IIRC PPC does not keep a tight mm_cpumask, it only sets bit, it never clears bits. The atomic op required to set bits does not imply any memory barrier on PPC. TLB invalidation is a TLBI instruction, it sends TLBI broadcast packets over the interconnect, it doesn't require IPIs like x86. The only optimization PPC does is that if the mm_cpumask has only a single bit set, it uses a TLBI instruction without broadcast, which is cheaper. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-21 13:09 ` Peter Zijlstra @ 2017-09-21 17:23 ` James Bottomley 2017-09-22 9:33 ` Peter Zijlstra 2017-09-22 5:08 ` Michael Ellerman 1 sibling, 1 reply; 17+ messages in thread From: James Bottomley @ 2017-09-21 17:23 UTC (permalink / raw) To: Peter Zijlstra, Mathieu Desnoyers Cc: Andy Lutomirski, Paul E. McKenney, Will Deacon, Alan Stern, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael On Thu, 2017-09-21 at 15:09 +0200, Peter Zijlstra wrote: > On Wed, Sep 20, 2017 at 06:13:50PM +0000, Mathieu Desnoyers wrote: > > > > > > > > > Also, can you elaborate on the PPC issue? PPC appears to track > > > mm_cpumask more or less just like x86. Is the issue just that > > > this > > > tracking has no implied barriers? If so, how does TLB flush on > > > ppc > > > work? It really does seem impressive to me that an architecture > > > can > > > efficiently support munmap() but not an expedited private > > > membarrier. > > > > I'll leave this question to the PPC experts :) > > IIRC PPC does not keep a tight mm_cpumask, it only sets bit, it never > clears bits. The atomic op required to set bits does not imply any > memory barrier on PPC. > > TLB invalidation is a TLBI instruction, it sends TLBI broadcast > packets over the interconnect, it doesn't require IPIs like x86. I believe this to be true for all SMP RISC systems ... it's certainly true for PA-RISC as well. There are so many RISC coherency issues that the CPUs pretty much have to have a private bus to broadcast and interlock coherency operations. We have one system that locks up if multiple CPUs have outstanding coherency operations on the private bus, but that's only one annoying CPU (which we manage with a special lock inside the PA-RISC mmu code). James ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-21 17:23 ` James Bottomley @ 2017-09-22 9:33 ` Peter Zijlstra 0 siblings, 0 replies; 17+ messages in thread From: Peter Zijlstra @ 2017-09-22 9:33 UTC (permalink / raw) To: James Bottomley Cc: Mathieu Desnoyers, Andy Lutomirski, Paul E. McKenney, Will Deacon, Alan Stern, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael On Thu, Sep 21, 2017 at 10:23:39AM -0700, James Bottomley wrote: > We have one system that locks up if > multiple CPUs have outstanding coherency operations on the private bus, > but that's only one annoying CPU (which we manage with a special lock > inside the PA-RISC mmu code). Hehe, yeah. I saw that code recently and had a WTF moment ;-) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-21 13:09 ` Peter Zijlstra 2017-09-21 17:23 ` James Bottomley @ 2017-09-22 5:08 ` Michael Ellerman 1 sibling, 0 replies; 17+ messages in thread From: Michael Ellerman @ 2017-09-22 5:08 UTC (permalink / raw) To: Peter Zijlstra, Mathieu Desnoyers Cc: Andy Lutomirski, Paul E. McKenney, Will Deacon, Alan Stern, linux-kernel, linux-arch, Dave Watson, maged michael Peter Zijlstra <peterz@infradead.org> writes: > On Wed, Sep 20, 2017 at 06:13:50PM +0000, Mathieu Desnoyers wrote: > >> > Also, can you elaborate on the PPC issue? PPC appears to track >> > mm_cpumask more or less just like x86. Is the issue just that this >> > tracking has no implied barriers? If so, how does TLB flush on ppc >> > work? It really does seem impressive to me that an architecture can >> > efficiently support munmap() but not an expedited private membarrier. >> >> I'll leave this question to the PPC experts :) > > IIRC PPC does not keep a tight mm_cpumask, it only sets bit, it never > clears bits. The atomic op required to set bits does not imply any > memory barrier on PPC. Yep. We do have a full barrier now when we set a bit, but not if the bit was already set. > TLB invalidation is a TLBI instruction, it sends TLBI broadcast packets > over the interconnect, it doesn't require IPIs like x86. Yep. > The only optimization PPC does is that if the mm_cpumask has only a > single bit set, it uses a TLBI instruction without broadcast, which is > cheaper. Yep. We would like to trim the mm_cpumask, but it's one of those hairy optimisations we have never quite found time to do. I've been away for two weeks so I've not been able to keep up with the membarrier discussions. Will try and page it back in next week. cheers ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-20 18:13 ` Mathieu Desnoyers 2017-09-20 18:18 ` Andy Lutomirski 2017-09-21 13:09 ` Peter Zijlstra @ 2017-09-21 13:15 ` Peter Zijlstra 2017-09-21 18:03 ` Mathieu Desnoyers 2 siblings, 1 reply; 17+ messages in thread From: Peter Zijlstra @ 2017-09-21 13:15 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Andy Lutomirski, Paul E. McKenney, Will Deacon, Alan Stern, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael On Wed, Sep 20, 2017 at 06:13:50PM +0000, Mathieu Desnoyers wrote: > My proposed RFC for private expedited membarrier enforces that all > architectures perform the registration step. Using the "PRIVATE_EXPEDITED" > command without prior process registration returns an error on all > architectures. The goal here is to make all architectures behave in the > same way, and it allows us to rely on process registration to deal > with future arch-specific optimizations. > > Adding the "core_sync" behavior could then be done for the next kernel > merge window. I'm currently foreseeing two possible ABI approaches to > expose it: > > Approach 1: > > Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and > MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands. This > allows us to return their availability through MEMBARRIER_CMD_QUERY. > > Approach 2: > > Add a "MEMBARRIER_FLAG_SYNC_CORE" as flag parameter. It could be set > when issuing both MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED and > MEMBARRIER_CMD_PRIVATE_EXPEDITED, thus ensuring core serializing > behavior. Querying whether core serialization is supported could > be done by issuing the MEMBARRIER_CMD_QUERY command with the > MEMBARRIER_FLAG_SYNC_CORE flag set. > > Any other ideas ? Any approach seems better ? So we really need another FLAG for that? AFAICT the current PRIVATE_EXPEDITED is already sufficient for the cross modifying code, since the IPI triggers an exception return on all currently running CPUs and the future running CPUs will have the return to userspace doing the exception return. The only issue is Andy fudging our x86 ret-to-userspace to not use IRET, which we can fix by forcing it into the slowpath (that needs to exist anyway) using that new TIF flag. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Rough notes from sys_membarrier() lightning BoF 2017-09-21 13:15 ` Peter Zijlstra @ 2017-09-21 18:03 ` Mathieu Desnoyers 0 siblings, 0 replies; 17+ messages in thread From: Mathieu Desnoyers @ 2017-09-21 18:03 UTC (permalink / raw) To: Peter Zijlstra Cc: Andy Lutomirski, Paul E. McKenney, Will Deacon, Alan Stern, Michael Ellerman, linux-kernel, linux-arch, Dave Watson, maged michael ----- On Sep 21, 2017, at 9:15 AM, Peter Zijlstra peterz@infradead.org wrote: > On Wed, Sep 20, 2017 at 06:13:50PM +0000, Mathieu Desnoyers wrote: >> My proposed RFC for private expedited membarrier enforces that all >> architectures perform the registration step. Using the "PRIVATE_EXPEDITED" >> command without prior process registration returns an error on all >> architectures. The goal here is to make all architectures behave in the >> same way, and it allows us to rely on process registration to deal >> with future arch-specific optimizations. >> >> Adding the "core_sync" behavior could then be done for the next kernel >> merge window. I'm currently foreseeing two possible ABI approaches to >> expose it: >> >> Approach 1: >> >> Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and >> MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands. This >> allows us to return their availability through MEMBARRIER_CMD_QUERY. >> >> Approach 2: >> >> Add a "MEMBARRIER_FLAG_SYNC_CORE" as flag parameter. It could be set >> when issuing both MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED and >> MEMBARRIER_CMD_PRIVATE_EXPEDITED, thus ensuring core serializing >> behavior. Querying whether core serialization is supported could >> be done by issuing the MEMBARRIER_CMD_QUERY command with the >> MEMBARRIER_FLAG_SYNC_CORE flag set. >> >> Any other ideas ? Any approach seems better ? > > So we really need another FLAG for that? AFAICT the current > PRIVATE_EXPEDITED is already sufficient for the cross modifying code, > since the IPI triggers an exception return on all currently running CPUs > and the future running CPUs will have the return to userspace doing the > exception return. > > The only issue is Andy fudging our x86 ret-to-userspace to not use IRET, > which we can fix by forcing it into the slowpath (that needs to exist > anyway) using that new TIF flag. I agree that x86, as it stands today, would provide core serialization with the private expedited membarrier command. And we can deal with future optimization of ret-to-userspace using the TIF flag set on registration. I'm wondering whether all architectures guarantee core serialization on return from interrupt triggered by the IPI, and on ret-to-userspace ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2017-09-22 9:33 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-09-17 22:36 Rough notes from sys_membarrier() lightning BoF Paul E. McKenney 2017-09-18 19:04 ` Alan Stern 2017-09-18 19:04 ` Alan Stern 2017-09-18 19:10 ` Mathieu Desnoyers 2017-09-18 19:29 ` Paul E. McKenney 2017-09-18 19:37 ` Mathieu Desnoyers 2017-09-18 20:34 ` Paul E. McKenney 2017-09-20 16:02 ` Andy Lutomirski 2017-09-20 18:13 ` Mathieu Desnoyers 2017-09-20 18:18 ` Andy Lutomirski 2017-09-20 19:57 ` Mathieu Desnoyers 2017-09-21 13:09 ` Peter Zijlstra 2017-09-21 17:23 ` James Bottomley 2017-09-22 9:33 ` Peter Zijlstra 2017-09-22 5:08 ` Michael Ellerman 2017-09-21 13:15 ` Peter Zijlstra 2017-09-21 18:03 ` Mathieu Desnoyers
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.