From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Nicholas Piggin <npiggin@gmail.com>, paulmck <paulmck@kernel.org>,
Anton Blanchard <anton@ozlabs.org>, Arnd Bergmann <arnd@arndb.de>,
linux-arch <linux-arch@vger.kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>, x86 <x86@kernel.org>
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
Date: Fri, 17 Jul 2020 11:39:08 -0400 (EDT) [thread overview]
Message-ID: <1697220787.18880.1595000348405.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20200717145102.GC1147780@rowland.harvard.edu>
----- On Jul 17, 2020, at 10:51 AM, Alan Stern stern@rowland.harvard.edu wrote:
> On Fri, Jul 17, 2020 at 09:39:25AM -0400, Mathieu Desnoyers wrote:
>> ----- On Jul 16, 2020, at 5:24 PM, Alan Stern stern@rowland.harvard.edu wrote:
>>
>> > On Thu, Jul 16, 2020 at 02:58:41PM -0400, Mathieu Desnoyers wrote:
>> >> ----- On Jul 16, 2020, at 12:03 PM, Mathieu Desnoyers
>> >> mathieu.desnoyers@efficios.com wrote:
>> >>
>> >> > ----- On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers
>> >> > mathieu.desnoyers@efficios.com wrote:
>> >> >
>> >> >> ----- On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npiggin@gmail.com wrote:
>> >> >>> I should be more complete here, especially since I was complaining
>> >> >>> about unclear barrier comment :)
>> >> >>>
>> >> >>>
>> >> >>> CPU0 CPU1
>> >> >>> a. user stuff 1. user stuff
>> >> >>> b. membarrier() 2. enter kernel
>> >> >>> c. smp_mb() 3. smp_mb__after_spinlock(); // in __schedule
>> >> >>> d. read rq->curr 4. rq->curr switched to kthread
>> >> >>> e. is kthread, skip IPI 5. switch_to kthread
>> >> >>> f. return to user 6. rq->curr switched to user thread
>> >> >>> g. user stuff 7. switch_to user thread
>> >> >>> 8. exit kernel
>> >> >>> 9. more user stuff
>
> ...
>
>> >> Requiring a memory barrier between update of rq->curr (back to current process's
>> >> thread) and following user-space memory accesses does not seem to guarantee
>> >> anything more than what the initial barrier at the beginning of __schedule
>> >> already
>> >> provides, because the guarantees are only about accesses to user-space memory.
>
> ...
>
>> > Is it correct to say that the switch_to operations in 5 and 7 include
>> > memory barriers? If they do, then skipping the IPI should be okay.
>> >
>> > The reason is as follows: The guarantee you need to enforce is that
>> > anything written by CPU0 before the membarrier() will be visible to CPU1
>> > after it returns to user mode. Let's say that a writes to X and 9
>> > reads from X.
>> >
>> > Then we have an instance of the Store Buffer pattern:
>> >
>> > CPU0 CPU1
>> > a. Write X 6. Write rq->curr for user thread
>> > c. smp_mb() 7. switch_to memory barrier
>> > d. Read rq->curr 9. Read X
>> >
>> > In this pattern, the memory barriers make it impossible for both reads
>> > to miss their corresponding writes. Since d does fail to read 6 (it
>> > sees the earlier value stored by 4), 9 must read a.
>> >
>> > The other guarantee you need is that g on CPU0 will observe anything
>> > written by CPU1 in 1. This is easier to see, using the fact that 3 is a
>> > memory barrier and d reads from 4.
>>
>> Right, and Nick's reply involving pairs of loads/stores on each side
>> clarifies the situation even further.
>
> The key part of my reply was the question: "Is it correct to say that
> the switch_to operations in 5 and 7 include memory barriers?" From the
> text quoted above and from Nick's reply, it seems clear that they do
> not.
I remember that switch_mm implies it, but not switch_to.
The scenario that triggered this discussion is when the scheduler does a
lazy tlb entry/exit, which is basically switch from a user task to
a kernel thread without changing the mm, and usually switching back afterwards.
This optimization means the rq->curr mm temporarily differs, which prevent
IPIs from being sent by membarrier, but without involving a switch_mm.
This requires explicit memory barriers either on entry/exit of lazy tlb
mode, or explicit barriers in the scheduler for those special-cases.
> I agree with Nick: A memory barrier is needed somewhere between the
> assignment at 6 and the return to user mode at 8. Otherwise you end up
> with the Store Buffer pattern having a memory barrier on only one side,
> and it is well known that this arrangement does not guarantee any
> ordering.
Yes, I see this now. I'm still trying to wrap my head around why the memory
barrier at the end of membarrier() needs to be paired with a scheduler
barrier though.
> One thing I don't understand about all this: Any context switch has to
> include a memory barrier somewhere, but both you and Nick seem to be
> saying that steps 6 and 7 don't include (or don't need) any memory
> barriers. What am I missing?
All context switch have the smp_mb__before_spinlock at the beginning of
__schedule(), which I suspect is what you refer to. However this barrier
is before the store to rq->curr, not after.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
WARNING: multiple messages have this Message-ID (diff)
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-arch <linux-arch@vger.kernel.org>,
paulmck <paulmck@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
Peter Zijlstra <peterz@infradead.org>, x86 <x86@kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Nicholas Piggin <npiggin@gmail.com>,
linux-mm <linux-mm@kvack.org>, Andy Lutomirski <luto@kernel.org>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
Date: Fri, 17 Jul 2020 11:39:08 -0400 (EDT) [thread overview]
Message-ID: <1697220787.18880.1595000348405.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20200717145102.GC1147780@rowland.harvard.edu>
----- On Jul 17, 2020, at 10:51 AM, Alan Stern stern@rowland.harvard.edu wrote:
> On Fri, Jul 17, 2020 at 09:39:25AM -0400, Mathieu Desnoyers wrote:
>> ----- On Jul 16, 2020, at 5:24 PM, Alan Stern stern@rowland.harvard.edu wrote:
>>
>> > On Thu, Jul 16, 2020 at 02:58:41PM -0400, Mathieu Desnoyers wrote:
>> >> ----- On Jul 16, 2020, at 12:03 PM, Mathieu Desnoyers
>> >> mathieu.desnoyers@efficios.com wrote:
>> >>
>> >> > ----- On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers
>> >> > mathieu.desnoyers@efficios.com wrote:
>> >> >
>> >> >> ----- On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npiggin@gmail.com wrote:
>> >> >>> I should be more complete here, especially since I was complaining
>> >> >>> about unclear barrier comment :)
>> >> >>>
>> >> >>>
>> >> >>> CPU0 CPU1
>> >> >>> a. user stuff 1. user stuff
>> >> >>> b. membarrier() 2. enter kernel
>> >> >>> c. smp_mb() 3. smp_mb__after_spinlock(); // in __schedule
>> >> >>> d. read rq->curr 4. rq->curr switched to kthread
>> >> >>> e. is kthread, skip IPI 5. switch_to kthread
>> >> >>> f. return to user 6. rq->curr switched to user thread
>> >> >>> g. user stuff 7. switch_to user thread
>> >> >>> 8. exit kernel
>> >> >>> 9. more user stuff
>
> ...
>
>> >> Requiring a memory barrier between update of rq->curr (back to current process's
>> >> thread) and following user-space memory accesses does not seem to guarantee
>> >> anything more than what the initial barrier at the beginning of __schedule
>> >> already
>> >> provides, because the guarantees are only about accesses to user-space memory.
>
> ...
>
>> > Is it correct to say that the switch_to operations in 5 and 7 include
>> > memory barriers? If they do, then skipping the IPI should be okay.
>> >
>> > The reason is as follows: The guarantee you need to enforce is that
>> > anything written by CPU0 before the membarrier() will be visible to CPU1
>> > after it returns to user mode. Let's say that a writes to X and 9
>> > reads from X.
>> >
>> > Then we have an instance of the Store Buffer pattern:
>> >
>> > CPU0 CPU1
>> > a. Write X 6. Write rq->curr for user thread
>> > c. smp_mb() 7. switch_to memory barrier
>> > d. Read rq->curr 9. Read X
>> >
>> > In this pattern, the memory barriers make it impossible for both reads
>> > to miss their corresponding writes. Since d does fail to read 6 (it
>> > sees the earlier value stored by 4), 9 must read a.
>> >
>> > The other guarantee you need is that g on CPU0 will observe anything
>> > written by CPU1 in 1. This is easier to see, using the fact that 3 is a
>> > memory barrier and d reads from 4.
>>
>> Right, and Nick's reply involving pairs of loads/stores on each side
>> clarifies the situation even further.
>
> The key part of my reply was the question: "Is it correct to say that
> the switch_to operations in 5 and 7 include memory barriers?" From the
> text quoted above and from Nick's reply, it seems clear that they do
> not.
I remember that switch_mm implies it, but not switch_to.
The scenario that triggered this discussion is when the scheduler does a
lazy tlb entry/exit, which is basically switch from a user task to
a kernel thread without changing the mm, and usually switching back afterwards.
This optimization means the rq->curr mm temporarily differs, which prevent
IPIs from being sent by membarrier, but without involving a switch_mm.
This requires explicit memory barriers either on entry/exit of lazy tlb
mode, or explicit barriers in the scheduler for those special-cases.
> I agree with Nick: A memory barrier is needed somewhere between the
> assignment at 6 and the return to user mode at 8. Otherwise you end up
> with the Store Buffer pattern having a memory barrier on only one side,
> and it is well known that this arrangement does not guarantee any
> ordering.
Yes, I see this now. I'm still trying to wrap my head around why the memory
barrier at the end of membarrier() needs to be paired with a scheduler
barrier though.
> One thing I don't understand about all this: Any context switch has to
> include a memory barrier somewhere, but both you and Nick seem to be
> saying that steps 6 and 7 don't include (or don't need) any memory
> barriers. What am I missing?
All context switch have the smp_mb__before_spinlock at the beginning of
__schedule(), which I suspect is what you refer to. However this barrier
is before the store to rq->curr, not after.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
next prev parent reply other threads:[~2020-07-17 15:39 UTC|newest]
Thread overview: 118+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-10 1:56 [RFC PATCH 0/7] mmu context cleanup, lazy tlb cleanup, Nicholas Piggin
2020-07-10 1:56 ` Nicholas Piggin
2020-07-10 1:56 ` [RFC PATCH 1/7] asm-generic: add generic MMU versions of mmu context functions Nicholas Piggin
2020-07-10 1:56 ` Nicholas Piggin
2020-07-10 1:56 ` [RFC PATCH 2/7] arch: use asm-generic mmu context for no-op implementations Nicholas Piggin
2020-07-10 1:56 ` Nicholas Piggin
2020-07-10 1:56 ` [RFC PATCH 3/7] mm: introduce exit_lazy_tlb Nicholas Piggin
2020-07-10 1:56 ` Nicholas Piggin
2020-07-10 1:56 ` [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode Nicholas Piggin
2020-07-10 1:56 ` Nicholas Piggin
2020-07-10 9:42 ` Peter Zijlstra
2020-07-10 9:42 ` Peter Zijlstra
2020-07-10 14:02 ` Mathieu Desnoyers
2020-07-10 14:02 ` Mathieu Desnoyers
2020-07-10 17:04 ` Andy Lutomirski
2020-07-10 17:04 ` Andy Lutomirski
2020-07-13 4:45 ` Nicholas Piggin
2020-07-13 4:45 ` Nicholas Piggin
2020-07-13 13:47 ` Nicholas Piggin
2020-07-13 13:47 ` Nicholas Piggin
2020-07-13 14:13 ` Mathieu Desnoyers
2020-07-13 14:13 ` Mathieu Desnoyers
2020-07-13 15:48 ` Andy Lutomirski
2020-07-13 15:48 ` Andy Lutomirski
2020-07-13 16:37 ` Nicholas Piggin
2020-07-13 16:37 ` Nicholas Piggin
2020-07-16 4:15 ` Nicholas Piggin
2020-07-16 4:15 ` Nicholas Piggin
2020-07-16 4:42 ` Nicholas Piggin
2020-07-16 4:42 ` Nicholas Piggin
2020-07-16 15:46 ` Mathieu Desnoyers
2020-07-16 15:46 ` Mathieu Desnoyers
2020-07-16 16:03 ` Mathieu Desnoyers
2020-07-16 16:03 ` Mathieu Desnoyers
2020-07-16 18:58 ` Mathieu Desnoyers
2020-07-16 18:58 ` Mathieu Desnoyers
2020-07-16 21:24 ` Alan Stern
2020-07-16 21:24 ` Alan Stern
2020-07-17 13:39 ` Mathieu Desnoyers
2020-07-17 13:39 ` Mathieu Desnoyers
2020-07-17 14:51 ` Alan Stern
2020-07-17 14:51 ` Alan Stern
2020-07-17 15:39 ` Mathieu Desnoyers [this message]
2020-07-17 15:39 ` Mathieu Desnoyers
2020-07-17 16:11 ` Alan Stern
2020-07-17 16:11 ` Alan Stern
2020-07-17 16:22 ` Mathieu Desnoyers
2020-07-17 16:22 ` Mathieu Desnoyers
2020-07-17 17:44 ` Alan Stern
2020-07-17 17:44 ` Alan Stern
2020-07-17 17:52 ` Mathieu Desnoyers
2020-07-17 17:52 ` Mathieu Desnoyers
2020-07-17 0:00 ` Nicholas Piggin
2020-07-17 0:00 ` Nicholas Piggin
2020-07-16 5:18 ` Andy Lutomirski
2020-07-16 5:18 ` Andy Lutomirski
2020-07-16 6:06 ` Nicholas Piggin
2020-07-16 6:06 ` Nicholas Piggin
2020-07-16 8:50 ` Peter Zijlstra
2020-07-16 8:50 ` Peter Zijlstra
2020-07-16 10:03 ` Nicholas Piggin
2020-07-16 10:03 ` Nicholas Piggin
2020-07-16 11:00 ` peterz
2020-07-16 11:00 ` peterz
2020-07-16 15:34 ` Mathieu Desnoyers
2020-07-16 15:34 ` Mathieu Desnoyers
2020-07-16 23:26 ` Nicholas Piggin
2020-07-16 23:26 ` Nicholas Piggin
2020-07-17 13:42 ` Mathieu Desnoyers
2020-07-17 13:42 ` Mathieu Desnoyers
2020-07-20 3:03 ` Nicholas Piggin
2020-07-20 3:03 ` Nicholas Piggin
2020-07-20 16:46 ` Mathieu Desnoyers
2020-07-20 16:46 ` Mathieu Desnoyers
2020-07-21 10:04 ` Nicholas Piggin
2020-07-21 10:04 ` Nicholas Piggin
2020-07-21 13:11 ` Mathieu Desnoyers
2020-07-21 13:11 ` Mathieu Desnoyers
2020-07-21 14:30 ` Nicholas Piggin
2020-07-21 14:30 ` Nicholas Piggin
2020-07-21 15:06 ` peterz
2020-07-21 15:06 ` peterz
2020-07-21 15:15 ` Mathieu Desnoyers
2020-07-21 15:15 ` Mathieu Desnoyers
2020-07-21 15:19 ` Peter Zijlstra
2020-07-21 15:19 ` Peter Zijlstra
2020-07-21 15:22 ` Mathieu Desnoyers
2020-07-21 15:22 ` Mathieu Desnoyers
2020-07-10 1:56 ` [RFC PATCH 5/7] lazy tlb: introduce lazy mm refcount helper functions Nicholas Piggin
2020-07-10 1:56 ` Nicholas Piggin
2020-07-10 9:48 ` Peter Zijlstra
2020-07-10 9:48 ` Peter Zijlstra
2020-07-10 1:56 ` [RFC PATCH 6/7] lazy tlb: allow lazy tlb mm switching to be configurable Nicholas Piggin
2020-07-10 1:56 ` Nicholas Piggin
2020-07-10 1:56 ` [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option Nicholas Piggin
2020-07-10 1:56 ` Nicholas Piggin
2020-07-10 9:35 ` Peter Zijlstra
2020-07-10 9:35 ` Peter Zijlstra
2020-07-13 4:58 ` Nicholas Piggin
2020-07-13 4:58 ` Nicholas Piggin
2020-07-13 15:59 ` Andy Lutomirski
2020-07-13 15:59 ` Andy Lutomirski
2020-07-13 16:48 ` Nicholas Piggin
2020-07-13 16:48 ` Nicholas Piggin
2020-07-13 18:18 ` Andy Lutomirski
2020-07-13 18:18 ` Andy Lutomirski
2020-07-14 5:04 ` Nicholas Piggin
2020-07-14 5:04 ` Nicholas Piggin
2020-07-14 6:31 ` Nicholas Piggin
2020-07-14 6:31 ` Nicholas Piggin
2020-07-14 12:46 ` Andy Lutomirski
2020-07-14 12:46 ` Andy Lutomirski
2020-07-14 13:23 ` Peter Zijlstra
2020-07-14 13:23 ` Peter Zijlstra
2020-07-16 2:26 ` Nicholas Piggin
2020-07-16 2:26 ` Nicholas Piggin
2020-07-16 2:35 ` Nicholas Piggin
2020-07-16 2:35 ` Nicholas Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1697220787.18880.1595000348405.JavaMail.zimbra@efficios.com \
--to=mathieu.desnoyers@efficios.com \
--cc=anton@ozlabs.org \
--cc=arnd@arndb.de \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=luto@kernel.org \
--cc=npiggin@gmail.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=stern@rowland.harvard.edu \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.