* Re: Alternative to signals/sys_membarrier() in liburcu
       [not found] <CANW5cDmTCM9ZmhN7-2eWUEYvD+Y=sGt2i7mecdPTTLHMcT8fPg@mail.gmail.com>
@ 2015-02-12 13:57 ` Duncan Sands
  2015-03-12 14:57 ` Mathieu Desnoyers
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Duncan Sands @ 2015-02-12 13:57 UTC (permalink / raw)
  To: lttng-dev

Hi Michael,

On 11/02/15 01:03, Michael Sullivan wrote:
> I've been looking at the RCU library (as part of gathering examples for my
> research on weak memory models) and was thinking about ways to force other
> threads to issue barriers. Since it seems like sys_membarrier() never made it
> into the kernel, I was pondering whether there was some other way to more or
> less get its effect; as it turns out, there is, but it is a hack: mprotect(2).

is it clear that sys_membarrier is really dead?

Ciao, Duncan.

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Alternative to signals/sys_membarrier() in liburcu
       [not found] <CANW5cDmTCM9ZmhN7-2eWUEYvD+Y=sGt2i7mecdPTTLHMcT8fPg@mail.gmail.com>
  2015-02-12 13:57 ` Alternative to signals/sys_membarrier() in liburcu Duncan Sands
@ 2015-03-12 14:57 ` Mathieu Desnoyers
       [not found] ` <54DCB15F.80505@free.fr>
       [not found] ` <867044376.285926.1426172227750.JavaMail.zimbra@efficios.com>
  3 siblings, 0 replies; 16+ messages in thread
From: Mathieu Desnoyers @ 2015-03-12 14:57 UTC (permalink / raw)
  To: Michael Sullivan; +Cc: lttng-dev

----- Original Message -----
> From: "Michael Sullivan" <sully@msully.net>
> To: lttng-dev@lists.lttng.org
> Sent: Tuesday, February 10, 2015 7:03:53 PM
> Subject: [lttng-dev] Alternative to signals/sys_membarrier() in liburcu

> I've been looking at the RCU library (as part of gathering examples for my
> research on weak memory models) and was thinking about ways to force other
> threads to issue barriers. Since it seems like sys_membarrier() never made
> it into the kernel, I was pondering whether there was some other way to more
> or less get its effect; as it turns out, there is, but it is a hack:
> mprotect(2).

> When a thread revokes write permissions on a page, the kernel needs to do a
> TLB shootdown to make sure that none of the other CPUs running code in that
> address space have a writable mapping for that page cached. In Linux, this
> is done by forcing code to invalidate the mappings to run on every CPU in
> the address space, and waiting for completion. The code for the "run this
> function on another CPU" mechanism forces the target CPU to issue an
> smp_mb().

> (In practice TLB shootdowns are done when permissions are added, not just
> when they are removed, but they needn't be; faults caused by using a cached
> entry with fewer permissions can be fixed up by the page fault handler.
> They're also needed when unmapping memory, but mprotect() seems cheaper than
> having to mmap() and munmap(). Also TLB shootdowns aren't needed if the page
> is non-present because it's never been backed or has been swapped out, so
> mlock(2) is used to keep it in place).

> I hacked this up and in my limited testing, it does seem to have way better
> write side performance than the signal version has. That said, it is also
> super hacky and is certainly depending on behaviors of mprotect() that are
> not actually specified. It would be unusual, I think, to implement
> mprotect() in a way where this didn't work? It may well be too janky to
> actually be useful, though.

> I can send the code if you're interested.

That's certainly an interesting way to mimic sys_membarrier!

Even though it depends on internal behavior not currently specified by mprotect,
I'd very much like to see the prototype you have.

Thanks!

Mathieu

> -Michael Sullivan

> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
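The mprotect(2) trick described in the message above can be sketched roughly as follows. This is a minimal illustration, not the code from Michael's patch or from liburcu: the names `mprotect_barrier_init` and `mprotect_barrier` are hypothetical, and the whole approach assumes (as the thread stresses) that revoking write permission on a present, locked page makes the kernel interrupt every CPU running the process, which mprotect() does not promise.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names; not taken from liburcu or from the posted patch. */
static void *barrier_page;
static size_t barrier_page_size;

/* Map one private page and lock it so it stays present. Returns 0 or -1. */
int mprotect_barrier_init(void)
{
	long sz = sysconf(_SC_PAGESIZE);
	void *p;

	if (sz <= 0)
		return -1;
	p = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	/* mlock() can fail, e.g. when RLIMIT_MEMLOCK is exhausted. */
	if (mlock(p, (size_t)sz) != 0) {
		munmap(p, (size_t)sz);
		return -1;
	}
	barrier_page = p;
	barrier_page_size = (size_t)sz;
	return 0;
}

/* Attempt a process-wide barrier as a side effect. Returns 0 or -1. */
int mprotect_barrier(void)
{
	assert(barrier_page != NULL);	/* init must have succeeded */

	/* Restore write access and dirty the page so that a present,
	 * writable mapping exists before it is revoked. */
	if (mprotect(barrier_page, barrier_page_size,
		     PROT_READ | PROT_WRITE) != 0)
		return -1;
	*(volatile char *)barrier_page = 1;

	/* Revoke write permission: the step assumed to trigger the
	 * TLB shootdown (unspecified behavior, see the thread). */
	if (mprotect(barrier_page, barrier_page_size, PROT_READ) != 0)
		return -1;
	return 0;
}
```

A caller would run `mprotect_barrier_init()` once and switch to another scheme (for example signals) if it returns -1; a return of 0 from `mprotect_barrier()` only means the system calls succeeded, not that remote CPUs actually executed a barrier.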
[parent not found: <54DCB15F.80505@free.fr>]
* Re: Alternative to signals/sys_membarrier() in liburcu
       [not found] ` <54DCB15F.80505@free.fr>
@ 2015-03-12 14:58   ` Mathieu Desnoyers
  0 siblings, 0 replies; 16+ messages in thread
From: Mathieu Desnoyers @ 2015-03-12 14:58 UTC (permalink / raw)
  To: Duncan Sands; +Cc: lttng-dev

----- Original Message -----
> From: "Duncan Sands" <baldrick@free.fr>
> To: lttng-dev@lists.lttng.org
> Sent: Thursday, February 12, 2015 8:57:51 AM
> Subject: Re: [lttng-dev] Alternative to signals/sys_membarrier() in liburcu
>
> Hi Michael,
>
> On 11/02/15 01:03, Michael Sullivan wrote:
> > I've been looking at the RCU library (as part of gathering examples for my
> > research on weak memory models) and was thinking about ways to force other
> > threads to issue barriers. Since it seems like sys_membarrier() never made
> > it
> > into the kernel, I was pondering whether there was some other way to more
> > or
> > less get its effect; as it turns out, there is, but it is a hack:
> > mprotect(2).
>
> is it clear that sys_membarrier is really dead?

There were no technical objections to sys_membarrier. The only objection
left was that only liburcu needed it, and kernel maintainers were
concerned about introducing a kernel API for a single user.

It would be good if we could come up with other uses of sys_membarrier.

Thoughts ?

Thanks,

Mathieu

>
> Ciao, Duncan.
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <867044376.285926.1426172227750.JavaMail.zimbra@efficios.com>]
* Re: Alternative to signals/sys_membarrier() in liburcu
       [not found] ` <867044376.285926.1426172227750.JavaMail.zimbra@efficios.com>
@ 2015-03-12 16:04   ` Michael Sullivan
       [not found]   ` <CANW5cDkiZoysNM3rqb4v6Tj996ocsaSh=OZoBLfp4h7ZGb4bxg@mail.gmail.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Michael Sullivan @ 2015-03-12 16:04 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev

On Thu, Mar 12, 2015 at 10:57 AM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> Even though it depends on internal behavior not currently specified by
> mprotect,
> I'd very much like to see the prototype you have,
>

I ended up posting my code at
https://github.com/msullivan/userspace-rcu/tree/msync-barrier .
The interesting patch is
https://github.com/msullivan/userspace-rcu/commit/04656b468d418efbc5d934ab07954eb8395a7ab0 .

Quick blog post I wrote about it at
http://www.msully.net/blog/2015/02/24/forcing-memory-barriers-on-other-cpus-with-mprotect2/ .
(I talked briefly about sys_membarrier in the post as best as I could piece
together from LKML; if my comment on it is inaccurate I can edit the post.)

-Michael Sullivan

^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CANW5cDkiZoysNM3rqb4v6Tj996ocsaSh=OZoBLfp4h7ZGb4bxg@mail.gmail.com>]
* Re: Alternative to signals/sys_membarrier() in liburcu
       [not found]   ` <CANW5cDkiZoysNM3rqb4v6Tj996ocsaSh=OZoBLfp4h7ZGb4bxg@mail.gmail.com>
@ 2015-03-12 20:53     ` Mathieu Desnoyers
       [not found]     ` <666590480.287502.1426193588471.JavaMail.zimbra@efficios.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Mathieu Desnoyers @ 2015-03-12 20:53 UTC (permalink / raw)
  To: Michael Sullivan
  Cc: Peter Zijlstra, LKML, Steven Rostedt, lttng-dev, Thomas Gleixner,
      Paul E. McKenney, Linus Torvalds, Ingo Molnar

Hi,

Michael Sullivan proposed a clever hack abusing mprotect() to
perform the same effect as sys_membarrier() I submitted a few
years ago ( https://lkml.org/lkml/2010/4/18/15 ).

At that time, the sys_membarrier implementation was deemed
technically sound, but there were not enough users of the system call
to justify its inclusion.

So far, the number of users of liburcu has increased, but liburcu
still appears to be the only direct user of sys_membarrier. On this
front, we could argue that many other system calls have only
one user: glibc. In that respect, liburcu is quite similar to glibc.

So the question as it stands appears to be: would you be comfortable
having users abuse mprotect(), relying on its side-effect of issuing
a smp_mb() on each targeted CPU for the TLB shootdown, as
an effective implementation of process-wide memory barrier ?

Thoughts ?

Thanks!

Mathieu

----- Original Message -----
> From: "Michael Sullivan" <sully@msully.net>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: lttng-dev@lists.lttng.org
> Sent: Thursday, March 12, 2015 12:04:07 PM
> Subject: Re: [lttng-dev] Alternative to signals/sys_membarrier() in liburcu

> On Thu, Mar 12, 2015 at 10:57 AM, Mathieu Desnoyers <
> mathieu.desnoyers@efficios.com > wrote:

> > Even though it depends on internal behavior not currently specified by
> > mprotect,
> > I'd very much like to see the prototype you have,

> I ended up posting my code at
> https://github.com/msullivan/userspace-rcu/tree/msync-barrier .
> The interesting patch is
> https://github.com/msullivan/userspace-rcu/commit/04656b468d418efbc5d934ab07954eb8395a7ab0
> .

> Quick blog post I wrote about it at
> http://www.msully.net/blog/2015/02/24/forcing-memory-barriers-on-other-cpus-with-mprotect2/
> .
> (I talked briefly about sys_membarrier in the post as best as I could piece
> together from LKML; if my comment on it is inaccurate I can edit the post.)

> -Michael Sullivan

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <666590480.287502.1426193588471.JavaMail.zimbra@efficios.com>]
* Re: Alternative to signals/sys_membarrier() in liburcu
       [not found]     ` <666590480.287502.1426193588471.JavaMail.zimbra@efficios.com>
@ 2015-03-12 20:56       ` Mathieu Desnoyers
  2015-03-12 21:12         ` Paul E. McKenney
  2015-03-12 23:59         ` One Thousand Gnomes
  2015-03-12 21:47       ` Linus Torvalds
  1 sibling, 2 replies; 16+ messages in thread
From: Mathieu Desnoyers @ 2015-03-12 20:56 UTC (permalink / raw)
  To: Michael Sullivan
  Cc: Peter Zijlstra, LKML, Steven Rostedt, lttng-dev, Thomas Gleixner,
      Paul E. McKenney, Linus Torvalds, Ingo Molnar

(sorry for re-send, my mail client tricked me into posting HTML
to lkml)

Hi,

Michael Sullivan proposed a clever hack abusing mprotect() to
perform the same effect as sys_membarrier() I submitted a few
years ago ( https://lkml.org/lkml/2010/4/18/15 ).

At that time, the sys_membarrier implementation was deemed
technically sound, but there were not enough users of the system call
to justify its inclusion.

So far, the number of users of liburcu has increased, but liburcu
still appears to be the only direct user of sys_membarrier. On this
front, we could argue that many other system calls have only
one user: glibc. In that respect, liburcu is quite similar to glibc.

So the question as it stands appears to be: would you be comfortable
having users abuse mprotect(), relying on its side-effect of issuing
a smp_mb() on each targeted CPU for the TLB shootdown, as
an effective implementation of process-wide memory barrier ?

Thoughts ?

Thanks!

Mathieu

From: "Michael Sullivan" <sully@msully.net>
To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
Cc: lttng-dev@lists.lttng.org
Sent: Thursday, March 12, 2015 12:04:07 PM
Subject: Re: [lttng-dev] Alternative to signals/sys_membarrier() in liburcu

On Thu, Mar 12, 2015 at 10:57 AM, Mathieu Desnoyers < mathieu.desnoyers@efficios.com > wrote:

Even though it depends on internal behavior not currently specified by mprotect,
I'd very much like to see the prototype you have,

I ended up posting my code at https://github.com/msullivan/userspace-rcu/tree/msync-barrier .
The interesting patch is https://github.com/msullivan/userspace-rcu/commit/04656b468d418efbc5d934ab07954eb8395a7ab0 .

Quick blog post I wrote about it at http://www.msully.net/blog/2015/02/24/forcing-memory-barriers-on-other-cpus-with-mprotect2/ .
(I talked briefly about sys_membarrier in the post as best as I could piece together from LKML; if my comment on it is inaccurate I can edit the post.)

-Michael Sullivan

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 20:56       ` Mathieu Desnoyers
@ 2015-03-12 21:12         ` Paul E. McKenney
  2015-03-14 21:06           ` Benjamin Herrenschmidt
  2015-03-12 23:59         ` One Thousand Gnomes
  1 sibling, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2015-03-12 21:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Michael Sullivan, Peter Zijlstra, LKML, Steven Rostedt, lttng-dev,
      Thomas Gleixner, Linus Torvalds, Ingo Molnar, linux-arch

On Thu, Mar 12, 2015 at 08:56:00PM +0000, Mathieu Desnoyers wrote:
> (sorry for re-send, my mail client tricked me into posting HTML
> to lkml)
>
> Hi,
>
> Michael Sullivan proposed a clever hack abusing mprotect() to
> perform the same effect as sys_membarrier() I submitted a few
> years ago ( https://lkml.org/lkml/2010/4/18/15 ).
>
> At that time, the sys_membarrier implementation was deemed
> technically sound, but there were not enough users of the system call
> to justify its inclusion.
>
> So far, the number of users of liburcu has increased, but liburcu
> still appears to be the only direct user of sys_membarrier. On this
> front, we could argue that many other system calls have only
> one user: glibc. In that respect, liburcu is quite similar to glibc.
>
> So the question as it stands appears to be: would you be comfortable
> having users abuse mprotect(), relying on its side-effect of issuing
> a smp_mb() on each targeted CPU for the TLB shootdown, as
> an effective implementation of process-wide memory barrier ?
>
> Thoughts ?

Are there any architectures left that use hardware-assisted global
TLB invalidation? On such an architecture, you might not get a memory
barrier except on the CPU executing the mprotect() or munmap().

(Here is hoping that no one does -- it is a cute abuse^Whack
otherwise!)

							Thanx, Paul

> Thanks!
>
> Mathieu
>
> From: "Michael Sullivan" <sully@msully.net>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: lttng-dev@lists.lttng.org
> Sent: Thursday, March 12, 2015 12:04:07 PM
> Subject: Re: [lttng-dev] Alternative to signals/sys_membarrier() in liburcu
>
> On Thu, Mar 12, 2015 at 10:57 AM, Mathieu Desnoyers < mathieu.desnoyers@efficios.com > wrote:
>
> Even though it depends on internal behavior not currently specified by mprotect,
> I'd very much like to see the prototype you have,
>
> I ended up posting my code at https://github.com/msullivan/userspace-rcu/tree/msync-barrier .
> The interesting patch is https://github.com/msullivan/userspace-rcu/commit/04656b468d418efbc5d934ab07954eb8395a7ab0 .
>
> Quick blog post I wrote about it at http://www.msully.net/blog/2015/02/24/forcing-memory-barriers-on-other-cpus-with-mprotect2/ .
> (I talked briefly about sys_membarrier in the post as best as I could piece together from LKML; if my comment on it is inaccurate I can edit the post.)
>
> -Michael Sullivan
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 21:12         ` Paul E. McKenney
@ 2015-03-14 21:06           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2015-03-14 21:06 UTC (permalink / raw)
  To: paulmck
  Cc: Mathieu Desnoyers, Michael Sullivan, Peter Zijlstra, LKML,
      Steven Rostedt, lttng-dev, Thomas Gleixner, Linus Torvalds,
      Ingo Molnar, linux-arch

On Thu, 2015-03-12 at 14:12 -0700, Paul E. McKenney wrote:
>
> Are there any architectures left that use hardware-assisted global
> TLB invalidation?

ARM and PowerPC at least...

Cheers,
Ben.

> On such an architecture, you might not get a memory
> barrier except on the CPU executing the mprotect() or munmap().
>
> (Here is hoping that no one does -- it is a cute abuse^Whack
> otherwise!)
>

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 20:56       ` Mathieu Desnoyers
  2015-03-12 21:12         ` Paul E. McKenney
@ 2015-03-12 23:59         ` One Thousand Gnomes
  2015-03-13  0:43           ` Mathieu Desnoyers
  1 sibling, 1 reply; 16+ messages in thread
From: One Thousand Gnomes @ 2015-03-12 23:59 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Michael Sullivan, Peter Zijlstra, LKML, Steven Rostedt, lttng-dev,
      Thomas Gleixner, Paul E. McKenney, Linus Torvalds, Ingo Molnar

On Thu, 12 Mar 2015 20:56:00 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> (sorry for re-send, my mail client tricked me into posting HTML
> to lkml)
>
> Hi,
>
> Michael Sullivan proposed a clever hack abusing mprotect() to
> perform the same effect as sys_membarrier() I submitted a few
> years ago ( https://lkml.org/lkml/2010/4/18/15 ).
>
> At that time, the sys_membarrier implementation was deemed
> technically sound, but there were not enough users of the system call
> to justify its inclusion.
>
> So far, the number of users of liburcu has increased, but liburcu
> still appears to be the only direct user of sys_membarrier. On this
> front, we could argue that many other system calls have only
> one user: glibc. In that respect, liburcu is quite similar to glibc.
>
> So the question as it stands appears to be: would you be comfortable
> having users abuse mprotect(), relying on its side-effect of issuing
> a smp_mb() on each targeted CPU for the TLB shootdown, as
> an effective implementation of process-wide memory barrier ?

What are you going to do if some future ARM or x86 CPU update with
hardware TLB shootdown appears ? All your code will start to fail on new
kernels using that property, and in nasty insidious ways.

Also doesn't sun4d have hardware shootdown for 16 processors or less ?

I would have thought a membarrier was a lot safer and it can be made to
do whatever horrible things are needed on different processors (indeed it
could even be a pure libc hotpath if some future cpu grows this ability)

Alan

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 23:59         ` One Thousand Gnomes
@ 2015-03-13  0:43           ` Mathieu Desnoyers
  0 siblings, 0 replies; 16+ messages in thread
From: Mathieu Desnoyers @ 2015-03-13 0:43 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Michael Sullivan, Peter Zijlstra, LKML, Steven Rostedt, lttng-dev,
      Thomas Gleixner, Paul E. McKenney, Linus Torvalds, Ingo Molnar

----- Original Message -----
> From: "One Thousand Gnomes" <gnomes@lxorguk.ukuu.org.uk>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: "Michael Sullivan" <sully@msully.net>, "Peter Zijlstra" <peterz@infradead.org>, "LKML"
> <linux-kernel@vger.kernel.org>, "Steven Rostedt" <rostedt@goodmis.org>, lttng-dev@lists.lttng.org, "Thomas Gleixner"
> <tglx@linutronix.de>, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>, "Linus Torvalds"
> <torvalds@linux-foundation.org>, "Ingo Molnar" <mingo@kernel.org>
> Sent: Thursday, March 12, 2015 7:59:38 PM
> Subject: Re: Alternative to signals/sys_membarrier() in liburcu
>
> On Thu, 12 Mar 2015 20:56:00 +0000 (UTC)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
> > (sorry for re-send, my mail client tricked me into posting HTML
> > to lkml)
> >
> > Hi,
> >
> > Michael Sullivan proposed a clever hack abusing mprotect() to
> > perform the same effect as sys_membarrier() I submitted a few
> > years ago ( https://lkml.org/lkml/2010/4/18/15 ).
> >
> > At that time, the sys_membarrier implementation was deemed
> > technically sound, but there were not enough users of the system call
> > to justify its inclusion.
> >
> > So far, the number of users of liburcu has increased, but liburcu
> > still appears to be the only direct user of sys_membarrier. On this
> > front, we could argue that many other system calls have only
> > one user: glibc. In that respect, liburcu is quite similar to glibc.
> >
> > So the question as it stands appears to be: would you be comfortable
> > having users abuse mprotect(), relying on its side-effect of issuing
> > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > an effective implementation of process-wide memory barrier ?
>
> What are you going to do if some future ARM or x86 CPU update with
> hardware TLB shootdown appears ? All your code will start to fail on new
> kernels using that property, and in nasty insidious ways.

I'd claim that removing the IPIs breaks userspace, of course. :-P

If we start relying on mprotect() implying memory barriers issued
on all CPUs associated with the memory mapping in core user-space
libraries, then whenever those shiny new CPUs show up, we might be
stuck with the IPIs, otherwise we could claim that removing them
breaks userspace. I would really hate to tie in an assumption like
that on mprotect, because that would really be painting ourselves
into a corner.

>
> Also doesn't sun4d have hardware shootdown for 16 processors or less ?

That's possible. I'm no sun expert though.

>
> I would have thought a membarrier was a lot safer and it can be made to
> do whatever horrible things are needed on different processors (indeed it
> could even be a pure libc hotpath if some future cpu grows this ability)

I'd really prefer a well-documented system call for that purpose too.

Thanks,

Mathieu

>
> Alan
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Alternative to signals/sys_membarrier() in liburcu
       [not found]     ` <666590480.287502.1426193588471.JavaMail.zimbra@efficios.com>
  2015-03-12 20:56       ` Mathieu Desnoyers
@ 2015-03-12 21:47       ` Linus Torvalds
  2015-03-12 22:30         ` Mathieu Desnoyers
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2015-03-12 21:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Michael Sullivan, lttng-dev, LKML, Paul E. McKenney,
      Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Steven Rostedt

On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> So the question as it stands appears to be: would you be comfortable
> having users abuse mprotect(), relying on its side-effect of issuing
> a smp_mb() on each targeted CPU for the TLB shootdown, as
> an effective implementation of process-wide memory barrier ?

Be *very* careful.

Just yesterday, in another thread (discussing the auto-numa TLB
performance regression), we were discussing skipping the TLB
invalidates entirely if the mprotect relaxes the protections.

Because if you *used* to be read-only, and then mprotect() something
so that it is read-write, there really is no need to send a TLB
invalidate, at least on x86. You can just change the page tables, and
*if* any entries are stale in the TLB they'll take a microfault on
access and then just reload the TLB.

So mprotect() to a more permissive mode is not necessarily serializing.

Also, you need to make sure that your page is actually in memory,
because otherwise the kernel may end up seeing "oh, it's not even
present", and never flush the TLB at all.

So now you need to mlock that page. Which can be problematic for non-root.

In other words, I'd be a bit leery about it. There may be other
gotchas about it.

                        Linus

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 21:47       ` Linus Torvalds
@ 2015-03-12 22:30         ` Mathieu Desnoyers
  2015-03-13  8:07           ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2015-03-12 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michael Sullivan, lttng-dev, LKML, Paul E. McKenney,
      Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Steven Rostedt

----- Original Message -----
> From: "Linus Torvalds" <torvalds@linux-foundation.org>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: "Michael Sullivan" <sully@msully.net>, lttng-dev@lists.lttng.org, "LKML" <linux-kernel@vger.kernel.org>, "Paul E.
> McKenney" <paulmck@linux.vnet.ibm.com>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@kernel.org>,
> "Thomas Gleixner" <tglx@linutronix.de>, "Steven Rostedt" <rostedt@goodmis.org>
> Sent: Thursday, March 12, 2015 5:47:05 PM
> Subject: Re: Alternative to signals/sys_membarrier() in liburcu
>
> On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> >
> > So the question as it stands appears to be: would you be comfortable
> > having users abuse mprotect(), relying on its side-effect of issuing
> > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > an effective implementation of process-wide memory barrier ?
>
> Be *very* careful.
>
> Just yesterday, in another thread (discussing the auto-numa TLB
> performance regression), we were discussing skipping the TLB
> invalidates entirely if the mprotect relaxes the protections.
>
> Because if you *used* to be read-only, and then mprotect() something
> so that it is read-write, there really is no need to send a TLB
> invalidate, at least on x86. You can just change the page tables, and
> *if* any entries are stale in the TLB they'll take a microfault on
> access and then just reload the TLB.
>
> So mprotect() to a more permissive mode is not necessarily serializing.

The idea here is to always mprotect() to a more restrictive mode,
which should trigger the TLB shootdown.

>
> Also, you need to make sure that your page is actually in memory,
> because otherwise the kernel may end up seeing "oh, it's not even
> present", and never flush the TLB at all.
>
> So now you need to mlock that page. Which can be problematic for non-root.

I'm aware the default amount of locked memory is usually quite low
(64kB here). So we'd need to handle cases where we run out of locked
memory. We could fall back to a slower userspace RCU scheme if this
occurs.

>
> In other words, I'd be a bit leery about it. There may be other
> gotchas about it.

Looking again at this old proposed patch
(https://lkml.org/lkml/2010/4/18/15), which adds a few memory barriers
around updates to mm_cpumask for sys_membarrier, makes me wonder
whether mprotect() might skip some CPU from the mask that would
actually need to be taken care of in very narrow race scenarios.

Thanks,

Mathieu

>
> Linus
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 22:30         ` Mathieu Desnoyers
@ 2015-03-13  8:07           ` Ingo Molnar
  2015-03-13 14:18             ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2015-03-13 8:07 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Michael Sullivan, lttng-dev, LKML,
      Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Steven Rostedt

* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- Original Message -----
> > From: "Linus Torvalds" <torvalds@linux-foundation.org>
> > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > Cc: "Michael Sullivan" <sully@msully.net>, lttng-dev@lists.lttng.org, "LKML" <linux-kernel@vger.kernel.org>, "Paul E.
> > McKenney" <paulmck@linux.vnet.ibm.com>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@kernel.org>,
> > "Thomas Gleixner" <tglx@linutronix.de>, "Steven Rostedt" <rostedt@goodmis.org>
> > Sent: Thursday, March 12, 2015 5:47:05 PM
> > Subject: Re: Alternative to signals/sys_membarrier() in liburcu
> >
> > On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> > >
> > > So the question as it stands appears to be: would you be comfortable
> > > having users abuse mprotect(), relying on its side-effect of issuing
> > > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > > an effective implementation of process-wide memory barrier ?
> >
> > Be *very* careful.
> >
> > Just yesterday, in another thread (discussing the auto-numa TLB
> > performance regression), we were discussing skipping the TLB
> > invalidates entirely if the mprotect relaxes the protections.

We have such code already in mm/mprotect.c, introduced in:

  10c1045f28e8 mm: numa: avoid unnecessary TLB flushes when setting NUMA hinting entries

which does:

        /* Avoid TLB flush if possible */
        if (pte_protnone(oldpte))
                continue;

> > Because if you *used* to be read-only, and then mprotect()
> > something so that it is read-write, there really is no need to
> > send a TLB invalidate, at least on x86. You can just change the
> > page tables, and *if* any entries are stale in the TLB they'll
> > take a microfault on access and then just reload the TLB.
> >
> > So mprotect() to a more permissive mode is not necessarily
> > serializing.
>
> The idea here is to always mprotect() to a more restrictive mode,
> which should trigger the TLB shootdown.

So what happens if a CPU comes around that integrates TLB shootdown
management into its cache coherency protocol? In such a case IPI
traffic can be skipped: the memory bus messages take care of TLB
flushes in most cases.

It's a natural optimization IMHO, because TLB flushes are conceptually
pretty close to the synchronization mechanisms inherent in data cache
coherency protocols:

This could be implemented for example by a CPU that knows about ptes
and handles their modification differently: when a pte is modified it
will broadcast a MESI invalidation message not just for the cacheline
belonging to the pte's physical address, but also an 'invalidate TLB'
MESI message for the pte value's page.

The TLB shootdown would either be guaranteed within the MESI
transaction, or there would be either a deterministic timing
guarantee or some explicit synchronization mechanism (new
instruction) to make sure the remote TLB(s) got shot down.

Every form of this would be way faster than sending interrupts. New
OSs could support this by the hardware telling them in which cases the
TLBs are 'auto-flushed', while old OSs would still be compatible by
sending (now pointless) TLB shootdown IPIs.

So it's a relatively straightforward hardware optimization IMHO:
assuming TLB flushes are considered important enough to complicate the
cacheline state machine (which I think they currently aren't).

So in this case there's no interrupt and no other interruption of the
remote CPU's flow of execution in any fashion that could advance the
RCU state machine.

What do you think?

Thanks,

	Ingo

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-13  8:07 ` Ingo Molnar
@ 2015-03-13 14:18 ` Paul E. McKenney
  2015-03-23  9:35 ` [lttng-dev] " Duncan Sands
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2015-03-13 14:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, Linus Torvalds, Michael Sullivan, lttng-dev, LKML, Peter Zijlstra, Thomas Gleixner, Steven Rostedt

On Fri, Mar 13, 2015 at 09:07:43AM +0100, Ingo Molnar wrote:
>
> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
> > ----- Original Message -----
> > > From: "Linus Torvalds" <torvalds@linux-foundation.org>
> > > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > > Cc: "Michael Sullivan" <sully@msully.net>, lttng-dev@lists.lttng.org, "LKML" <linux-kernel@vger.kernel.org>, "Paul E.
> > > McKenney" <paulmck@linux.vnet.ibm.com>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@kernel.org>,
> > > "Thomas Gleixner" <tglx@linutronix.de>, "Steven Rostedt" <rostedt@goodmis.org>
> > > Sent: Thursday, March 12, 2015 5:47:05 PM
> > > Subject: Re: Alternative to signals/sys_membarrier() in liburcu
> > >
> > > On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
> > > <mathieu.desnoyers@efficios.com> wrote:
> > > >
> > > > So the question as it stands appears to be: would you be comfortable
> > > > having users abuse mprotect(), relying on its side-effect of issuing
> > > > an smp_mb() on each targeted CPU for the TLB shootdown, as
> > > > an effective implementation of process-wide memory barrier?
> > >
> > > Be *very* careful.
> > >
> > > Just yesterday, in another thread (discussing the auto-numa TLB
> > > performance regression), we were discussing skipping the TLB
> > > invalidates entirely if the mprotect relaxes the protections.
>
> We have such code already in mm/mprotect.c, introduced in:
>
>   10c1045f28e8 mm: numa: avoid unnecessary TLB flushes when setting NUMA hinting entries
>
> which does:
>
>                         /* Avoid TLB flush if possible */
>                         if (pte_protnone(oldpte))
>                                 continue;
>
> > > Because if you *used* to be read-only, and then mprotect()
> > > something so that it is read-write, there really is no need to
> > > send a TLB invalidate, at least on x86. You can just change the
> > > page tables, and *if* any entries are stale in the TLB they'll
> > > take a microfault on access and then just reload the TLB.
> > >
> > > So mprotect() to a more permissive mode is not necessarily
> > > serializing.
> >
> > The idea here is to always mprotect() to a more restrictive mode,
> > which should trigger the TLB shootdown.
>
> So what happens if a CPU comes around that integrates TLB shootdown
> management into its cache coherency protocol? In such a case IPI
> traffic can be skipped: the memory bus messages take care of TLB
> flushes in most cases.
>
> It's a natural optimization IMHO, because TLB flushes are conceptually
> pretty close to the synchronization mechanisms inherent in data cache
> coherency protocols:
>
> This could be implemented for example by a CPU that knows about ptes
> and handles their modification differently: when a pte is modified it
> will broadcast a MESI invalidation message not just for the cacheline
> belonging to the pte's physical address, but also an 'invalidate TLB'
> MESI message for the pte value's page.
>
> The TLB shootdown would either be guaranteed within the MESI
> transaction, or there would be a deterministic timing
> guarantee, or some explicit synchronization mechanism (new
> instruction) to make sure the remote TLB(s) got shot down.
>
> Every form of this would be way faster than sending interrupts. New
> OSs could support this by the hardware telling them in which cases the
> TLBs are 'auto-flushed', while old OSs would still be compatible by
> sending (now pointless) TLB shootdown IPIs.
>
> So it's a relatively straightforward hardware optimization IMHO:
> assuming TLB flushes are considered important enough to complicate the
> cacheline state machine (which I think they currently aren't).
>
> So in this case there's no interrupt and no other interruption of the
> remote CPU's flow of execution in any fashion that could advance the
> RCU state machine.
>
> What do you think?

I agree -- there really have been systems able to flush remote TLBs
without interrupting the remote CPU.

So, given the fact that the userspace RCU library does now see
some real-world use, is it now time for Mathieu to resubmit his
sys_membarrier() patch?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [lttng-dev] Alternative to signals/sys_membarrier() in liburcu
  2015-03-13 14:18 ` Paul E. McKenney
@ 2015-03-23  9:35 ` Duncan Sands
  0 siblings, 0 replies; 16+ messages in thread
From: Duncan Sands @ 2015-03-23  9:35 UTC (permalink / raw)
  To: paulmck, Ingo Molnar
  Cc: Michael Sullivan, Peter Zijlstra, LKML, Steven Rostedt, lttng-dev, Thomas Gleixner, Linus Torvalds

> So, given the fact that the userspace RCU library does now see
> some real-world use, is it now time for Mathieu to resubmit his
> sys_membarrier() patch?

I'm using userspace RCU with success in financial software, so the LTTng project
isn't the only user. It works well, but it's not as fast as I'd like. My
profiling shows that the performance hit is coming from the memory barriers. So
I would very much like to see sys_membarrier go in.

Best wishes, Duncan.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Alternative to signals/sys_membarrier() in liburcu
@ 2015-02-11  0:03 Michael Sullivan
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Sullivan @ 2015-02-11  0:03 UTC (permalink / raw)
  To: lttng-dev

I've been looking at the RCU library (as part of gathering examples for my
research on weak memory models) and was thinking about ways to force other
threads to issue barriers. Since it seems like sys_membarrier() never made it
into the kernel, I was pondering whether there was some other way to more or
less get its effect; as it turns out, there is, but it is a hack: mprotect(2).

When a thread revokes write permissions on a page, the kernel needs to do a
TLB shootdown to make sure that none of the other CPUs running code in that
address space have a writable mapping for that page cached. In Linux, this is
done by forcing code to invalidate the mappings to run on every CPU in the
address space, and waiting for completion. The code for the "run this function
on another CPU" mechanism forces the target CPU to issue an smp_mb().

(In practice TLB shootdowns are done when permissions are added, not just when
they are removed, but they needn't be; faults caused by using a cached entry
with fewer permissions can be fixed up by the page fault handler. They're also
needed when unmapping memory, but mprotect() seems cheaper than having to
mmap() and munmap(). Also TLB shootdowns aren't needed if the page is
non-present because it's never been backed or has been swapped out, so
mlock(2) is used to keep it in place).

I hacked this up and in my limited testing, it does seem to have way better
write-side performance than the signal version has. That said, it is also
super hacky and is certainly depending on behaviors of mprotect() that are not
actually specified. It would be unusual, I think, to implement mprotect() in a
way where this didn't work? It may well be too janky to actually be useful,
though.

I can send the code if you're interested.

-Michael Sullivan

^ permalink raw reply	[flat|nested] 16+ messages in thread
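The code Michael offers to send is not included in the thread. As a rough illustration only, the mechanism he describes might be sketched in C as below. This is not his implementation and not liburcu code: the names `mprotect_barrier_init()`, `mprotect_barrier()`, `barrier_page` and `barrier_page_size` are hypothetical, and treating an mlock() failure as non-fatal is an assumption of this sketch. The page is made writable and dirtied, then write permission is revoked, which is the step Michael expects to trigger the TLB shootdown.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names: none of these come from liburcu or from the thread. */
static void *barrier_page;
static size_t barrier_page_size;

/* Map one private page and try to pin it. Returns 0 on success, -1 on error. */
static int mprotect_barrier_init(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    void *p;

    if (sz <= 0)
        return -1;
    p = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    /*
     * Best effort (an assumption of this sketch): mlock() keeps the page
     * resident so that its mapping stays present. It can fail under a low
     * RLIMIT_MEMLOCK; the write in mprotect_barrier() faults the page in
     * again in that case.
     */
    (void)mlock(p, (size_t)sz);
    barrier_page = p;
    barrier_page_size = (size_t)sz;
    return 0;
}

/*
 * Try to provoke a TLB shootdown on the CPUs running this address space.
 * Returns 0 if both mprotect() calls succeeded, -1 otherwise. Success says
 * nothing about whether a remote barrier was actually issued.
 */
static int mprotect_barrier(void)
{
    assert(barrier_page != NULL); /* mprotect_barrier_init() must run first */

    if (mprotect(barrier_page, barrier_page_size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    /* Dirty the page so that a present, writable mapping exists to revoke. */
    *(volatile char *)barrier_page = 1;
    /* Revoking write permission is the step expected to force the shootdown. */
    return mprotect(barrier_page, barrier_page_size, PROT_READ);
}
```

A writer would call `mprotect_barrier_init()` once and then `mprotect_barrier()` at the points where it needs the other threads to have executed a barrier. As Michael says, this depends on mprotect() behaviour that is not specified, and the replies from Linus and Ingo describe cases in which no remote barrier would result.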
Thread overview: 16+ messages
[not found] <CANW5cDmTCM9ZmhN7-2eWUEYvD+Y=sGt2i7mecdPTTLHMcT8fPg@mail.gmail.com>
2015-02-12 13:57 ` Alternative to signals/sys_membarrier() in liburcu Duncan Sands
2015-03-12 14:57 ` Mathieu Desnoyers
[not found] ` <54DCB15F.80505@free.fr>
2015-03-12 14:58 ` Mathieu Desnoyers
[not found] ` <867044376.285926.1426172227750.JavaMail.zimbra@efficios.com>
2015-03-12 16:04 ` Michael Sullivan
[not found] ` <CANW5cDkiZoysNM3rqb4v6Tj996ocsaSh=OZoBLfp4h7ZGb4bxg@mail.gmail.com>
2015-03-12 20:53 ` Mathieu Desnoyers
[not found] ` <666590480.287502.1426193588471.JavaMail.zimbra@efficios.com>
2015-03-12 20:56 ` Mathieu Desnoyers
2015-03-12 21:12 ` Paul E. McKenney
2015-03-14 21:06 ` Benjamin Herrenschmidt
2015-03-12 23:59 ` One Thousand Gnomes
2015-03-13 0:43 ` Mathieu Desnoyers
2015-03-12 21:47 ` Linus Torvalds
2015-03-12 22:30 ` Mathieu Desnoyers
2015-03-13 8:07 ` Ingo Molnar
2015-03-13 14:18 ` Paul E. McKenney
2015-03-23 9:35 ` [lttng-dev] " Duncan Sands
2015-02-11 0:03 Michael Sullivan