From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
paulmck <paulmck@linux.ibm.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Oleg Nesterov <oleg@redhat.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"Russell King, ARM Linux" <linux@armlinux.org.uk>,
Chris Metcalf <cmetcalf@ezchip.com>, Chris Lameter <cl@linux.com>,
Kirill Tkhai <tkhai@yandex.ru>, Mike Galbraith <efault@gmx.de>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC PATCH 1/2] Fix: sched/membarrier: p->mm->membarrier_state racy load
Date: Thu, 5 Sep 2019 20:51:31 -0400 (EDT)
Message-ID: <1713923102.3325.1567731091553.JavaMail.zimbra@efficios.com>
In-Reply-To: <20190904182607.GG17205@worktop.programming.kicks-ass.net>

----- On Sep 4, 2019, at 2:26 PM, Peter Zijlstra peterz@infradead.org wrote:
> On Wed, Sep 04, 2019 at 01:12:53PM -0400, Mathieu Desnoyers wrote:
>> ----- On Sep 4, 2019, at 12:09 PM, Peter Zijlstra peterz@infradead.org wrote:
>>
>> > On Wed, Sep 04, 2019 at 11:19:00AM -0400, Mathieu Desnoyers wrote:
>> >> ----- On Sep 3, 2019, at 4:36 PM, Linus Torvalds torvalds@linux-foundation.org wrote:
>> >
>> >> > I wonder if the easiest model might be to just use a percpu variable
>> >> > instead for the membarrier stuff? It's not like it has to be in
>> >> > 'struct task_struct' at all, I think. We only care about the current
>> >> > runqueues, and those are percpu anyway.
>> >>
>> >> One issue here is that membarrier iterates over all runqueues without
>> >> grabbing any runqueue lock. If we copy that state from mm to rq on
>> >> sched switch prepare, we would need to ensure we have the proper
>> >> memory barriers between:
>> >>
>> >> prior user-space memory accesses / setting the runqueue membarrier state
>> >>
>> >> and
>> >>
>> >> setting the runqueue membarrier state / following user-space memory accesses
>> >>
>> >> Copying the membarrier state into the task struct leverages the fact that
>> >> we have documented and guaranteed those barriers around the rq->curr update
>> >> in the scheduler.
>> >
>> > Should be the same as the barriers we already rely on for rq->curr, no?
>> > That is, if we put this before switch_mm() then we have
>> > smp_mb__after_spinlock() and switch_mm() itself.
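A minimal userspace model of the ordering being discussed, assuming C11 atomics as stand-ins for the kernel primitives. `model_set_rq_state()` plays the scheduler storing a per-runqueue copy of the membarrier state; its two full fences stand in for the barriers already provided around the rq->curr update (smp_mb__after_spinlock() and switch_mm()). `model_cpus_to_ipi()` plays membarrier's lockless scan over all runqueues. All names, the flag value and the CPU count are hypothetical; this is not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_MODEL 4                      /* hypothetical CPU count */
#define MEMBARRIER_STATE_PRIVATE_EXPEDITED 1 /* hypothetical flag bit */

/* Hypothetical per-runqueue copy of the membarrier state, one slot per CPU. */
static atomic_int rq_membarrier_state[NR_CPUS_MODEL];

/*
 * Scheduler side (model): publish @state for @cpu. The fences stand in for
 * the existing barriers around the rq->curr update: prior user-space
 * accesses are ordered before the store, and the store is ordered before
 * the following user-space accesses.
 */
static void model_set_rq_state(int cpu, int state)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&rq_membarrier_state[cpu], state,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * membarrier side (model): scan every runqueue without taking any lock and
 * return a bitmask of the CPUs that would receive an IPI.
 */
static unsigned int model_cpus_to_ipi(void)
{
	unsigned int mask = 0;

	/* Order the caller's prior accesses before the lockless scan. */
	atomic_thread_fence(memory_order_seq_cst);
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		int state = atomic_load_explicit(&rq_membarrier_state[cpu],
						 memory_order_relaxed);
		if (state & MEMBARRIER_STATE_PRIVATE_EXPEDITED)
			mask |= 1u << cpu;
	}
	return mask;
}
```

Relaxed accesses bracketed by explicit full fences roughly mirror the READ_ONCE()/WRITE_ONCE() plus smp_mb() style; the model only captures where the barriers sit, not the scheduler itself.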
>>
>> Yes, I think we can piggy-back on the barriers already documented around
>> the rq->curr store.
>>
>> > Also, if we place mm->membarrier_state in the same cacheline as mm->pgd
>> > (which switch_mm() is bound to load) then we should be fine, I think.
>>
>> Yes, if we make sure membarrier_prepare_task_switch only updates the
>> rq->membarrier_state if prev->mm != next->mm, we should be able to avoid
>> loading next->mm->membarrier_state when switch_mm() is not invoked.
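A minimal sketch of the skip-on-same-mm idea, assuming the hook runs before switch_mm() and that both tasks own a user mm (kernel threads and lazy-TLB active_mm handling are left out). `model_prepare_task_switch()` and the `*_model` structures are hypothetical stand-ins for membarrier_prepare_task_switch() and the kernel's runqueue, mm and task structures; the load counter exists only to make the skipped load observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct mm_model {
	atomic_int membarrier_state;
};

struct task_model {
	struct mm_model *mm;	/* user tasks only in this model */
};

struct rq_model {
	int membarrier_state;	/* per-runqueue copy read by membarrier */
	int nr_state_loads;	/* times next->mm->membarrier_state was read */
};

/*
 * Hypothetical hook run on context switch, before switch_mm().
 * When prev and next share an mm (threads of one process), switch_mm() is
 * not invoked, so the load of next->mm->membarrier_state is skipped too and
 * the runqueue copy is left as it is.
 */
static void model_prepare_task_switch(struct rq_model *rq,
				      const struct task_model *prev,
				      const struct task_model *next)
{
	assert(prev->mm != NULL && next->mm != NULL);
	if (prev->mm == next->mm)
		return;
	rq->nr_state_loads++;
	rq->membarrier_state = atomic_load_explicit(&next->mm->membarrier_state,
						    memory_order_relaxed);
}
```

If the runqueue copy starts at 0 while a task whose mm already has the state set keeps running (or switches only to sibling threads), the copy stays stale until the next switch to a different mm. That is the registration problem raised in the next reply.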
>>
>> I'll prepare an RFC patch implementing this approach.
>
> Thinking about this a bit; switching it 'on' still requires some
> thinking. Consider registering on an already-threaded process of which
> multiple threads are 'current' on multiple CPUs. In that case none of
> the rq bits will be set.
>
> Not even synchronize_rcu() is sufficient to force it either, since we
> only update on switch_mm() and nothing guarantees we pass that.
>
> One possible approach would be to IPI broadcast (after setting the
> ->mm->membarrier_state) and have the IPI update the rq state from
> 'current->mm'.
>
> But possibly I'm just confusing everyone again. I'm not having a good day
> today.
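A minimal single-threaded sketch of the registration path described above, assuming the cross-CPU call can be modeled as a loop that runs the handler once per CPU (a real implementation would broadcast an IPI). The mm state is set first; each "IPI" then refreshes its runqueue copy only if the task currently running there uses that mm. `model_register_private_expedited()`, `model_ipi_sync_rq_state()`, the flag value and the CPU count are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4                      /* hypothetical CPU count */
#define MEMBARRIER_STATE_PRIVATE_EXPEDITED 1 /* hypothetical flag bit */

struct mm_model {
	atomic_int membarrier_state;
};

/* Per-CPU view: the mm current on that CPU and the runqueue's state copy. */
struct cpu_model {
	struct mm_model *curr_mm;
	atomic_int rq_membarrier_state;
};

static struct cpu_model cpus[NR_CPUS_MODEL];

/* "IPI handler" (model): sync one CPU's runqueue copy from current->mm. */
static void model_ipi_sync_rq_state(struct cpu_model *cpu, struct mm_model *mm)
{
	/* Another mm is current here; in the design discussed, this CPU
	 * picks the state up at its next switch to a different mm. */
	if (cpu->curr_mm != mm)
		return;
	atomic_store_explicit(&cpu->rq_membarrier_state,
			      atomic_load_explicit(&mm->membarrier_state,
						   memory_order_relaxed),
			      memory_order_relaxed);
	/* Order the store before the CPU returns to user space. */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Registration (model): set the mm state, then "broadcast" to every CPU. */
static void model_register_private_expedited(struct mm_model *mm)
{
	assert(mm != NULL);
	atomic_fetch_or(&mm->membarrier_state,
			MEMBARRIER_STATE_PRIVATE_EXPEDITED);
	/* The mm state store must be visible before the "IPIs" run. */
	atomic_thread_fence(memory_order_seq_cst);
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		model_ipi_sync_rq_state(&cpus[i], mm);
}
```

For example, with `p` current on CPUs 0 and 2 and another mm on CPU 1, only CPUs 0 and 2 end up with the new state after registration; the others keep 0.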

Indeed, switching 'on' requires some care. I implemented the IPI-based
approach as you suggested.

Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com