Re: [PATCH v2 4/4] membarrier: Execute SYNC_CORE on the calling thread

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: x86 <x86@kernel.org>, linux-kernel <linux-kernel@vger.kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	Arnd Bergmann <arnd@arndb.de>, Anton Blanchard <anton@ozlabs.org>,
	stable <stable@vger.kernel.org>
Subject: Re: [PATCH v2 4/4] membarrier: Execute SYNC_CORE on the calling thread
Date: Wed, 2 Dec 2020 14:39:32 -0500 (EST)	[thread overview]
Message-ID: <169451685.20.1606937972996.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <8ee6202360fa1d1e2ab6e18846513bdbe20bc29c.1606923183.git.luto@kernel.org>

----- On Dec 2, 2020, at 10:35 AM, Andy Lutomirski luto@kernel.org wrote:

> membarrier()'s MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is documented
> as syncing the core on all sibling threads but not necessarily the
> calling thread.  This behavior is fundamentally buggy and cannot be used
> safely.  Suppose a user program has two threads.  Thread A is on CPU 0
> and thread B is on CPU 1.  Thread A modifies some text and calls
> membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE).  Then thread B
> executes the modified code.  If, at any point after membarrier() decides
> which CPUs to target, thread A could be preempted and replaced by thread
> B on CPU 0.  This could even happen on exit from the membarrier()
> syscall.  If this happens, thread B will end up running on CPU 0 without
> having synced.

Indeed, good catch! We only have sync core in the scheduler when switching
between mm, so indeed we cannot rely on the scheduler to issue a sync core
for us when switching between threads with the same mm.

> In principle, this could be fixed by arranging for the scheduler to
> sync_core_before_usermode() whenever switching between two threads in
> the same mm if there is any possibility of a concurrent membarrier()
> call, but this would have considerable overhead.  Instead, make
> membarrier() sync the calling CPU as well.

Yes, I agree that sync core on self is the right approach here.

> As an optimization, this avoids an extra smp_mb() in the default
> barrier-only mode.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> kernel/sched/membarrier.c | 49 +++++++++++++++++++++++++--------------
> 1 file changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 01538b31f27e..7df7c0e60647 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -352,8 +352,6 @@ static int membarrier_private_expedited(int flags, int
> cpu_id)
> 

There is one small optimization you will want to adapt here:

        if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
                return 0;

should become:

        if (flags != MEMBARRIER_FLAG_SYNC_CORE && atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
                return 0;

So we issue a core sync for self in single-threaded applications,
to make things consistent. We can then document that membarrier sync core
issues a core sync on all thread siblings including the caller thread.

> 		if (cpu_id >= nr_cpu_ids || !cpu_online(cpu_id))
> 			goto out;
> -		if (cpu_id == raw_smp_processor_id())
> -			goto out;
> 		rcu_read_lock();
> 		p = rcu_dereference(cpu_rq(cpu_id)->curr);
> 		if (!p || p->mm != mm) {
> @@ -368,16 +366,6 @@ static int membarrier_private_expedited(int flags, int
> cpu_id)
> 		for_each_online_cpu(cpu) {
> 			struct task_struct *p;
> 
> -			/*
> -			 * Skipping the current CPU is OK even through we can be
> -			 * migrated at any point. The current CPU, at the point
> -			 * where we read raw_smp_processor_id(), is ensured to
> -			 * be in program order with respect to the caller
> -			 * thread. Therefore, we can skip this CPU from the
> -			 * iteration.
> -			 */
> -			if (cpu == raw_smp_processor_id())
> -				continue;
> 			p = rcu_dereference(cpu_rq(cpu)->curr);
> 			if (p && p->mm == mm)
> 				__cpumask_set_cpu(cpu, tmpmask);
> @@ -385,12 +373,39 @@ static int membarrier_private_expedited(int flags, int
> cpu_id)
> 		rcu_read_unlock();
> 	}
> 
> -	preempt_disable();
> -	if (cpu_id >= 0)
> +	if (cpu_id >= 0) {
> +		/*
> +		 * smp_call_function_single() will call ipi_func() if cpu_id
> +		 * is the calling CPU.
> +		 */
> 		smp_call_function_single(cpu_id, ipi_func, NULL, 1);
> -	else
> -		smp_call_function_many(tmpmask, ipi_func, NULL, 1);
> -	preempt_enable();
> +	} else {
> +		/*
> +		 * For regular membarrier, we can save a few cycles by
> +		 * skipping the current cpu -- we're about to do smp_mb()
> +		 * below, and if we migrate to a different cpu, this cpu
> +		 * and the new cpu will execute a full barrier in the
> +		 * scheduler.
> +		 *
> +		 * For CORE_SYNC, we do need a barrier on the current cpu --
> +		 * otherwise, if we are migrated and replaced by a different
> +		 * task in the same mm just before, during, or after
> +		 * membarrier, we will end up with some thread in the mm
> +		 * running without a core sync.
> +		 *
> +		 * For RSEQ, it seems polite to target the calling thread
> +		 * as well, although it's not clear it makes much difference
> +		 * either way.  Users aren't supposed to run syscalls in an
> +		 * rseq critical section.

Considering that we want a consistent behavior between single and multi-threaded
programs (as I pointed out above wrt the optimization change), I think it would
be better to skip self for the rseq ipi, in the same way we'd want to return
early for a membarrier-rseq-private on a single-threaded mm. Users are _really_
not supposed to run system calls in rseq critical sections. The kernel even kills
the offending applications when run on kernels with CONFIG_DEBUG_RSEQ=y. So it seems
rather pointless to waste cycles doing a rseq fence on self considering this.

Thanks,

Mathieu

> +		 */
> +		if (ipi_func == ipi_mb) {
> +			preempt_disable();
> +			smp_call_function_many(tmpmask, ipi_func, NULL, true);
> +			preempt_enable();
> +		} else {
> +			on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
> +		}
> +	}
> 
> out:
> 	if (cpu_id < 0)
> --
> 2.28.0

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

     prev parent reply	other threads:[~2020-12-02 19:40 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-02 15:35 [PATCH v2 0/4] membarrier fixes Andy Lutomirski
2020-12-02 15:35 ` [PATCH v2 1/4] x86/membarrier: Get rid of a dubious optimization Andy Lutomirski
2020-12-02 15:35 ` [PATCH v2 2/4] membarrier: Add an actual barrier before rseq_preempt() Andy Lutomirski
2020-12-02 19:40   ` Mathieu Desnoyers
2020-12-02 15:35 ` [PATCH v2 3/4] membarrier: Explicitly sync remote cores when SYNC_CORE is requested Andy Lutomirski
2020-12-02 19:43   ` Mathieu Desnoyers
2020-12-02 15:35 ` [PATCH v2 4/4] membarrier: Execute SYNC_CORE on the calling thread Andy Lutomirski
2020-12-02 19:39   ` Mathieu Desnoyers [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=169451685.20.1606937972996.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=anton@ozlabs.org \
    --cc=arnd@arndb.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.