From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Linux-Next Mailing List <linux-next@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: manual merge of the rcu tree with the tip tree
Date: Tue, 1 Aug 2017 14:02:32 +0000 (UTC)	[thread overview]
Message-ID: <1639218309.1091.1501596152868.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <CALCETrWO5g5HuNpxa4Phxg--fDPWpuCVDTVr-UfuzrK5wn-8dQ@mail.gmail.com>

----- On Aug 1, 2017, at 9:43 AM, Andy Lutomirski luto@kernel.org wrote:

> On Mon, Jul 31, 2017 at 9:03 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
>> On Tue, Aug 01, 2017 at 12:04:05AM +0000, Mathieu Desnoyers wrote:
>>> ----- On Jul 31, 2017, at 12:13 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com
>>> wrote:
>>>
> 
>>                                                         Thanx, Paul
>>
>> ------------------------------------------------------------------------
>>
>> commit fde19879b6bd1abc0c1d4d5f945efed61bf7eb8c
>> Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Date:   Fri Jul 28 16:40:40 2017 -0400
>>
>>     membarrier: Expedited private command
>>
>>     Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs, using a cpumask
>>     built from all runqueues for which the current thread's mm is the same
>>     as that of the thread calling sys_membarrier. It executes faster than
>>     the non-expedited
>>     variant (no blocking). It also works on NOHZ_FULL configurations.
>>
>>     Scheduler-wise, it requires a memory barrier before and after context
>>     switching between processes (which have different mms). The memory
>>     barrier before context switch is already present. For the barrier after
>>     context switch:
>>
>>     * Our TSO archs can do RELEASE without being a full barrier. Look at
>>       x86 spin_unlock() being a regular STORE for example.  But for those
>>       archs, all atomics imply smp_mb and all of them have atomic ops in
>>       switch_mm() for mm_cpumask().
> 
> I think that, on x86, context switches, even without mm changes, must
> at least flush the store buffer (maybe SFENCE is okay) to avoid
> visible inconsistency due to store-buffer forwarding.
> 
> Anyway, can you document whatever property you require with a comment
> in switch_mm() or wherever you're finding that property so that future
> arch changes don't break it?

As I asked Paul in my reply to his proposed manual merge,
we should indeed have a comment in switch_mm() stating something
like this just before the line invoking cpumask_set_cpu():

/*
 * The full memory barrier implied by mm_cpumask update operations
 * is required by the membarrier system call.
 */
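
For concreteness, a rough sketch of where that comment would sit in
x86's switch_mm path (the surrounding logic is elided and only assumed
here; this is not the exact upstream code):

static void switch_mm_irqs_off(struct mm_struct *prev,
			       struct mm_struct *next,
			       struct task_struct *tsk)
{
	unsigned int cpu = smp_processor_id();

	if (likely(prev != next)) {
		/* ... TLB state bookkeeping elided ... */

		/*
		 * The full memory barrier implied by mm_cpumask update
		 * operations is required by the membarrier system call.
		 * (On x86, set_bit is a LOCK-prefixed RMW, hence a
		 * full barrier.)
		 */
		cpumask_set_cpu(cpu, mm_cpumask(next));
		load_cr3(next->pgd);
		cpumask_clear_cpu(cpu, mm_cpumask(prev));
	}
}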

What we want to order here is:

prev userspace memory accesses
schedule
  <full mb> (it's already there) [A]
  update to rq->curr changing the rq->curr->mm value
  <full mb> (provided by mm_cpumask updates in switch_mm on x86) [B]
next userspace memory accesses

with respect to:

userspace memory accesses
sys_membarrier
  <full mb> [C]
  iterate over each CPU's rq->curr, comparing its "mm" to current->mm
    IPI each CPU that matches
  <full mb> [D]
userspace memory accesses

[A] pairs with [D] and [B] pairs with [C].
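
To illustrate what this pairing buys at the userspace level (a hedged
sketch, not part of the patch; membarrier() is just a raw syscall
wrapper here): a hot fast path can get away with a compiler barrier,
while a rare slow path promotes those compiler barriers to full
barriers on every running thread of the process:

#include <linux/membarrier.h>	/* MEMBARRIER_CMD_* (with this patch) */
#include <sys/syscall.h>
#include <unistd.h>

/* Raw wrapper; glibc provides no membarrier() wrapper. */
static int membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

/* Fast path, run frequently by many threads: a compiler barrier
 * suffices, because the slow path's IPIs act as a full memory
 * barrier on every CPU running a thread of this process. */
static void fast_path(void)
{
	asm volatile("" ::: "memory");
	/* ... ordered accesses ... */
}

/* Slow path, run rarely: */
static void slow_path(void)
{
	/* ... */
	membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
	/* ... */
}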


> 
>> +static void membarrier_private_expedited(void)
>> +{
>> +       int cpu;
>> +       bool fallback = false;
>> +       cpumask_var_t tmpmask;
>> +
>> +       if (num_online_cpus() == 1)
>> +               return;
>> +
>> +       /*
>> +        * Matches memory barriers around rq->curr modification in
>> +        * scheduler.
>> +        */
>> +       smp_mb();       /* system call entry is not a mb. */
>> +
>> +       /*
>> +        * Expedited membarrier commands guarantee that they won't
>> +        * block, hence the GFP_NOWAIT allocation flag and fallback
>> +        * implementation.
>> +        */
>> +       if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
>> +               /* Fallback for OOM. */
>> +               fallback = true;
>> +       }
>> +
>> +       cpus_read_lock();
>> +       for_each_online_cpu(cpu) {
>> +               struct task_struct *p;
>> +
>> +               /*
>> +                * Skipping the current CPU is OK even though we can be
>> +                * migrated at any point. The current CPU, at the point
>> +                * where we read raw_smp_processor_id(), is ensured to
>> +                * be in program order with respect to the caller
>> +                * thread. Therefore, we can skip this CPU from the
>> +                * iteration.
>> +                */
>> +               if (cpu == raw_smp_processor_id())
>> +                       continue;
>> +               rcu_read_lock();
>> +               p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> +               if (p && p->mm == current->mm) {
> 
> I'm a bit surprised you're iterating all CPUs instead of just CPUs in
> mm_cpumask().

I see two reasons for this. The first is that architectures like
ARM64 don't even bother populating mm_cpumask. The second is that
I don't think all architectures ensure that updates to mm_cpumask
imply full memory barriers. Therefore, if we choose to use this
bitmask as an optimization, we would need to revisit each
architecture's switch_mm to ensure that mm_cpumask bit-set operations
either imply a full memory barrier or are followed by an explicit
one.
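
For illustration only, the optimization Andy suggests might look like
the hypothetical sketch below; it would only be correct after that
per-architecture audit (names match the quoted patch, the loop body
is abridged):

	/*
	 * HYPOTHETICAL: iterate only the CPUs in mm_cpumask()
	 * instead of all online CPUs. Correct only if every
	 * architecture's switch_mm orders its mm_cpumask bit-set
	 * operations with a full memory barrier.
	 */
	for_each_cpu(cpu, mm_cpumask(current->mm)) {
		struct task_struct *p;

		if (cpu == raw_smp_processor_id())
			continue;
		rcu_read_lock();
		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
		if (p && p->mm == current->mm)
			__cpumask_set_cpu(cpu, tmpmask);
		rcu_read_unlock();
	}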

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
