All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Nicholas Piggin <npiggin@gmail.com>, paulmck <paulmck@kernel.org>,
	Anton Blanchard <anton@ozlabs.org>, Arnd Bergmann <arnd@arndb.de>,
	linux-arch <linux-arch@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>, x86 <x86@kernel.org>
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
Date: Fri, 17 Jul 2020 09:39:25 -0400 (EDT)	[thread overview]
Message-ID: <1770378591.18523.1594993165391.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20200716212416.GA1126458@rowland.harvard.edu>

----- On Jul 16, 2020, at 5:24 PM, Alan Stern stern@rowland.harvard.edu wrote:

> On Thu, Jul 16, 2020 at 02:58:41PM -0400, Mathieu Desnoyers wrote:
>> ----- On Jul 16, 2020, at 12:03 PM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>> 
>> > ----- On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers
>> > mathieu.desnoyers@efficios.com wrote:
>> > 
>> >> ----- On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npiggin@gmail.com wrote:
>> >>> I should be more complete here, especially since I was complaining
>> >>> about unclear barrier comment :)
>> >>> 
>> >>> 
>> >>> CPU0                     CPU1
>> >>> a. user stuff            1. user stuff
>> >>> b. membarrier()          2. enter kernel
>> >>> c. smp_mb()              3. smp_mb__after_spinlock(); // in __schedule
>> >>> d. read rq->curr         4. rq->curr switched to kthread
>> >>> e. is kthread, skip IPI  5. switch_to kthread
>> >>> f. return to user        6. rq->curr switched to user thread
>> >>> g. user stuff            7. switch_to user thread
>> >>>                         8. exit kernel
>> >>>                         9. more user stuff
>> >>> 
>> >>> What you're really ordering is a, g vs 1, 9 right?
>> >>> 
>> >>> In other words, 9 must see a if it sees g, g must see 1 if it saw 9,
>> >>> etc.
>> >>> 
>> >>> Userspace does not care where the barriers are exactly or what kernel
>> >>> memory accesses might be being ordered by them, so long as there is a
>> >>> mb somewhere between a and g, and 1 and 9. Right?
>> >> 
>> >> This is correct.
>> > 
>> > Actually, sorry, the above is not quite right. It's been a while
>> > since I looked into the details of membarrier.
>> > 
>> > The smp_mb() at the beginning of membarrier() needs to be paired with a
>> > smp_mb() _after_ rq->curr is switched back to the user thread, so the
>> > memory barrier is between store to rq->curr and following user-space
>> > accesses.
>> > 
>> > The smp_mb() at the end of membarrier() needs to be paired with the
>> > smp_mb__after_spinlock() at the beginning of schedule, which is
>> > between accesses to userspace memory and switching rq->curr to kthread.
>> > 
>> > As to *why* this ordering is needed, I'd have to dig through additional
>> > scenarios from https://lwn.net/Articles/573436/. Or maybe Paul remembers ?
>> 
>> Thinking further about this, I'm beginning to consider that maybe we have been
>> overly cautious by requiring memory barriers before and after store to rq->curr.
>> 
>> If CPU0 observes a CPU1's rq->curr->mm which differs from its own process
>> (current)
>> while running the membarrier system call, it necessarily means that CPU1 had
>> to issue smp_mb__after_spinlock when entering the scheduler, between any
>> user-space
>> loads/stores and update of rq->curr.
>> 
>> Requiring a memory barrier between update of rq->curr (back to current process's
>> thread) and following user-space memory accesses does not seem to guarantee
>> anything more than what the initial barrier at the beginning of __schedule
>> already
>> provides, because the guarantees are only about accesses to user-space memory.
>> 
>> Therefore, with the memory barrier at the beginning of __schedule, just
>> observing that
>> CPU1's rq->curr differs from current should guarantee that a memory barrier was
>> issued
>> between any sequentially consistent instructions belonging to the current
>> process on
>> CPU1.
>> 
>> Or am I missing/misremembering an important point here ?
> 
> Is it correct to say that the switch_to operations in 5 and 7 include
> memory barriers?  If they do, then skipping the IPI should be okay.
> 
> The reason is as follows: The guarantee you need to enforce is that
> anything written by CPU0 before the membarrier() will be visible to CPU1
> after it returns to user mode.  Let's say that a writes to X and 9
> reads from X.
> 
> Then we have an instance of the Store Buffer pattern:
> 
>	CPU0			CPU1
>	a. Write X		6. Write rq->curr for user thread
>	c. smp_mb()		7. switch_to memory barrier
>	d. Read rq->curr	9. Read X
> 
> In this pattern, the memory barriers make it impossible for both reads
> to miss their corresponding writes.  Since d does fail to read 6 (it
> sees the earlier value stored by 4), 9 must read a.
> 
> The other guarantee you need is that g on CPU0 will observe anything
> written by CPU1 in 1.  This is easier to see, using the fact that 3 is a
> memory barrier and d reads from 4.

Right, and Nick's reply involving pairs of loads/stores on each side
clarifies the situation even further.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

WARNING: multiple messages have this Message-ID (diff)
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-arch <linux-arch@vger.kernel.org>,
	paulmck <paulmck@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Peter Zijlstra <peterz@infradead.org>, x86 <x86@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	linux-mm <linux-mm@kvack.org>, Andy Lutomirski <luto@kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
Date: Fri, 17 Jul 2020 09:39:25 -0400 (EDT)	[thread overview]
Message-ID: <1770378591.18523.1594993165391.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20200716212416.GA1126458@rowland.harvard.edu>

----- On Jul 16, 2020, at 5:24 PM, Alan Stern stern@rowland.harvard.edu wrote:

> On Thu, Jul 16, 2020 at 02:58:41PM -0400, Mathieu Desnoyers wrote:
>> ----- On Jul 16, 2020, at 12:03 PM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>> 
>> > ----- On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers
>> > mathieu.desnoyers@efficios.com wrote:
>> > 
>> >> ----- On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npiggin@gmail.com wrote:
>> >>> I should be more complete here, especially since I was complaining
>> >>> about unclear barrier comment :)
>> >>> 
>> >>> 
>> >>> CPU0                     CPU1
>> >>> a. user stuff            1. user stuff
>> >>> b. membarrier()          2. enter kernel
>> >>> c. smp_mb()              3. smp_mb__after_spinlock(); // in __schedule
>> >>> d. read rq->curr         4. rq->curr switched to kthread
>> >>> e. is kthread, skip IPI  5. switch_to kthread
>> >>> f. return to user        6. rq->curr switched to user thread
>> >>> g. user stuff            7. switch_to user thread
>> >>>                         8. exit kernel
>> >>>                         9. more user stuff
>> >>> 
>> >>> What you're really ordering is a, g vs 1, 9 right?
>> >>> 
>> >>> In other words, 9 must see a if it sees g, g must see 1 if it saw 9,
>> >>> etc.
>> >>> 
>> >>> Userspace does not care where the barriers are exactly or what kernel
>> >>> memory accesses might be being ordered by them, so long as there is a
>> >>> mb somewhere between a and g, and 1 and 9. Right?
>> >> 
>> >> This is correct.
>> > 
>> > Actually, sorry, the above is not quite right. It's been a while
>> > since I looked into the details of membarrier.
>> > 
>> > The smp_mb() at the beginning of membarrier() needs to be paired with a
>> > smp_mb() _after_ rq->curr is switched back to the user thread, so the
>> > memory barrier is between store to rq->curr and following user-space
>> > accesses.
>> > 
>> > The smp_mb() at the end of membarrier() needs to be paired with the
>> > smp_mb__after_spinlock() at the beginning of schedule, which is
>> > between accesses to userspace memory and switching rq->curr to kthread.
>> > 
>> > As to *why* this ordering is needed, I'd have to dig through additional
>> > scenarios from https://lwn.net/Articles/573436/. Or maybe Paul remembers ?
>> 
>> Thinking further about this, I'm beginning to consider that maybe we have been
>> overly cautious by requiring memory barriers before and after store to rq->curr.
>> 
>> If CPU0 observes a CPU1's rq->curr->mm which differs from its own process
>> (current)
>> while running the membarrier system call, it necessarily means that CPU1 had
>> to issue smp_mb__after_spinlock when entering the scheduler, between any
>> user-space
>> loads/stores and update of rq->curr.
>> 
>> Requiring a memory barrier between update of rq->curr (back to current process's
>> thread) and following user-space memory accesses does not seem to guarantee
>> anything more than what the initial barrier at the beginning of __schedule
>> already
>> provides, because the guarantees are only about accesses to user-space memory.
>> 
>> Therefore, with the memory barrier at the beginning of __schedule, just
>> observing that
>> CPU1's rq->curr differs from current should guarantee that a memory barrier was
>> issued
>> between any sequentially consistent instructions belonging to the current
>> process on
>> CPU1.
>> 
>> Or am I missing/misremembering an important point here ?
> 
> Is it correct to say that the switch_to operations in 5 and 7 include
> memory barriers?  If they do, then skipping the IPI should be okay.
> 
> The reason is as follows: The guarantee you need to enforce is that
> anything written by CPU0 before the membarrier() will be visible to CPU1
> after it returns to user mode.  Let's say that a writes to X and 9
> reads from X.
> 
> Then we have an instance of the Store Buffer pattern:
> 
>	CPU0			CPU1
>	a. Write X		6. Write rq->curr for user thread
>	c. smp_mb()		7. switch_to memory barrier
>	d. Read rq->curr	9. Read X
> 
> In this pattern, the memory barriers make it impossible for both reads
> to miss their corresponding writes.  Since d does fail to read 6 (it
> sees the earlier value stored by 4), 9 must read a.
> 
> The other guarantee you need is that g on CPU0 will observe anything
> written by CPU1 in 1.  This is easier to see, using the fact that 3 is a
> memory barrier and d reads from 4.

Right, and Nick's reply involving pairs of loads/stores on each side
clarifies the situation even further.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2020-07-17 13:39 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-10  1:56 [RFC PATCH 0/7] mmu context cleanup, lazy tlb cleanup, Nicholas Piggin
2020-07-10  1:56 ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 1/7] asm-generic: add generic MMU versions of mmu context functions Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 2/7] arch: use asm-generic mmu context for no-op implementations Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 3/7] mm: introduce exit_lazy_tlb Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  9:42   ` Peter Zijlstra
2020-07-10  9:42     ` Peter Zijlstra
2020-07-10 14:02   ` Mathieu Desnoyers
2020-07-10 14:02     ` Mathieu Desnoyers
2020-07-10 17:04   ` Andy Lutomirski
2020-07-10 17:04     ` Andy Lutomirski
2020-07-13  4:45     ` Nicholas Piggin
2020-07-13  4:45       ` Nicholas Piggin
2020-07-13 13:47       ` Nicholas Piggin
2020-07-13 13:47         ` Nicholas Piggin
2020-07-13 14:13         ` Mathieu Desnoyers
2020-07-13 14:13           ` Mathieu Desnoyers
2020-07-13 15:48           ` Andy Lutomirski
2020-07-13 15:48             ` Andy Lutomirski
2020-07-13 16:37             ` Nicholas Piggin
2020-07-13 16:37               ` Nicholas Piggin
2020-07-16  4:15           ` Nicholas Piggin
2020-07-16  4:15             ` Nicholas Piggin
2020-07-16  4:42             ` Nicholas Piggin
2020-07-16  4:42               ` Nicholas Piggin
2020-07-16 15:46               ` Mathieu Desnoyers
2020-07-16 15:46                 ` Mathieu Desnoyers
2020-07-16 16:03                 ` Mathieu Desnoyers
2020-07-16 16:03                   ` Mathieu Desnoyers
2020-07-16 18:58                   ` Mathieu Desnoyers
2020-07-16 18:58                     ` Mathieu Desnoyers
2020-07-16 21:24                     ` Alan Stern
2020-07-16 21:24                       ` Alan Stern
2020-07-17 13:39                       ` Mathieu Desnoyers [this message]
2020-07-17 13:39                         ` Mathieu Desnoyers
2020-07-17 14:51                         ` Alan Stern
2020-07-17 14:51                           ` Alan Stern
2020-07-17 15:39                           ` Mathieu Desnoyers
2020-07-17 15:39                             ` Mathieu Desnoyers
2020-07-17 16:11                             ` Alan Stern
2020-07-17 16:11                               ` Alan Stern
2020-07-17 16:22                               ` Mathieu Desnoyers
2020-07-17 16:22                                 ` Mathieu Desnoyers
2020-07-17 17:44                                 ` Alan Stern
2020-07-17 17:44                                   ` Alan Stern
2020-07-17 17:52                                   ` Mathieu Desnoyers
2020-07-17 17:52                                     ` Mathieu Desnoyers
2020-07-17  0:00                     ` Nicholas Piggin
2020-07-17  0:00                       ` Nicholas Piggin
2020-07-16  5:18             ` Andy Lutomirski
2020-07-16  5:18               ` Andy Lutomirski
2020-07-16  6:06               ` Nicholas Piggin
2020-07-16  6:06                 ` Nicholas Piggin
2020-07-16  8:50               ` Peter Zijlstra
2020-07-16  8:50                 ` Peter Zijlstra
2020-07-16 10:03                 ` Nicholas Piggin
2020-07-16 10:03                   ` Nicholas Piggin
2020-07-16 11:00                   ` peterz
2020-07-16 11:00                     ` peterz
2020-07-16 15:34                     ` Mathieu Desnoyers
2020-07-16 15:34                       ` Mathieu Desnoyers
2020-07-16 23:26                     ` Nicholas Piggin
2020-07-16 23:26                       ` Nicholas Piggin
2020-07-17 13:42                       ` Mathieu Desnoyers
2020-07-17 13:42                         ` Mathieu Desnoyers
2020-07-20  3:03                         ` Nicholas Piggin
2020-07-20  3:03                           ` Nicholas Piggin
2020-07-20 16:46                           ` Mathieu Desnoyers
2020-07-20 16:46                             ` Mathieu Desnoyers
2020-07-21 10:04                             ` Nicholas Piggin
2020-07-21 10:04                               ` Nicholas Piggin
2020-07-21 13:11                               ` Mathieu Desnoyers
2020-07-21 13:11                                 ` Mathieu Desnoyers
2020-07-21 14:30                                 ` Nicholas Piggin
2020-07-21 14:30                                   ` Nicholas Piggin
2020-07-21 15:06                               ` peterz
2020-07-21 15:06                                 ` peterz
2020-07-21 15:15                                 ` Mathieu Desnoyers
2020-07-21 15:15                                   ` Mathieu Desnoyers
2020-07-21 15:19                                   ` Peter Zijlstra
2020-07-21 15:19                                     ` Peter Zijlstra
2020-07-21 15:22                                     ` Mathieu Desnoyers
2020-07-21 15:22                                       ` Mathieu Desnoyers
2020-07-10  1:56 ` [RFC PATCH 5/7] lazy tlb: introduce lazy mm refcount helper functions Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  9:48   ` Peter Zijlstra
2020-07-10  9:48     ` Peter Zijlstra
2020-07-10  1:56 ` [RFC PATCH 6/7] lazy tlb: allow lazy tlb mm switching to be configurable Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  9:35   ` Peter Zijlstra
2020-07-10  9:35     ` Peter Zijlstra
2020-07-13  4:58     ` Nicholas Piggin
2020-07-13  4:58       ` Nicholas Piggin
2020-07-13 15:59   ` Andy Lutomirski
2020-07-13 15:59     ` Andy Lutomirski
2020-07-13 16:48     ` Nicholas Piggin
2020-07-13 16:48       ` Nicholas Piggin
2020-07-13 18:18       ` Andy Lutomirski
2020-07-13 18:18         ` Andy Lutomirski
2020-07-14  5:04         ` Nicholas Piggin
2020-07-14  5:04           ` Nicholas Piggin
2020-07-14  6:31           ` Nicholas Piggin
2020-07-14  6:31             ` Nicholas Piggin
2020-07-14 12:46             ` Andy Lutomirski
2020-07-14 12:46               ` Andy Lutomirski
2020-07-14 13:23               ` Peter Zijlstra
2020-07-14 13:23                 ` Peter Zijlstra
2020-07-16  2:26               ` Nicholas Piggin
2020-07-16  2:26                 ` Nicholas Piggin
2020-07-16  2:35               ` Nicholas Piggin
2020-07-16  2:35                 ` Nicholas Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1770378591.18523.1594993165391.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=anton@ozlabs.org \
    --cc=arnd@arndb.de \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=luto@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=stern@rowland.harvard.edu \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.