From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752863AbdJSRsw (ORCPT <rfc822;w@1wt.eu>);
        Thu, 19 Oct 2017 13:48:52 -0400
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:55378 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
        by vger.kernel.org with ESMTP id S1751882AbdJSRsu (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 19 Oct 2017 13:48:50 -0400
Date: Thu, 19 Oct 2017 10:48:46 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        linux-kernel@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
        Ingo Molnar <mingo@redhat.com>,
        Alexander Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH for 4.14] membarrier: Provide register expedited private
 command
Reply-To: paulmck@linux.vnet.ibm.com
References: <20171019173015.8274-1-mathieu.desnoyers@efficios.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171019173015.8274-1-mathieu.desnoyers@efficios.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20171019174846.GZ3521@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 19, 2017 at 01:30:15PM -0400, Mathieu Desnoyers wrote:
> [ This patch is sent directly to Linus, because it needs to be merged
>   before the end of 4.14 rc cycle. It introduces a "register private
>   expedited" membarrier command which allows eventual removal of
>   important memory barrier constraints on the scheduler fast-paths. It
>   changes how the "private expedited" membarrier command (new to 4.14)
>   is used from user-space. Sorry to send this late in the cycle. ]
> 
> Provide a command allowing processes to register their intent to use
> the private expedited command. This affects how the private expedited
> command introduced in 4.14-rc is meant to be used, and should be merged
> before 4.14 final.
> 
> Processes are now required to register before using
> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
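For illustration, here is a minimal user-space sketch of the sequence this
implies: query, register once (ideally before creating any threads), then
issue private expedited barriers. The MB_CMD_* macros and the helper names
are hypothetical; their numeric values are copied from enum membarrier_cmd
in this patch so the sketch does not depend on the installed
<linux/membarrier.h> being new enough. It assumes a libc whose
<sys/syscall.h> defines SYS_membarrier.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from enum membarrier_cmd in this patch (hypothetical local names). */
#define MB_CMD_QUERY				0
#define MB_CMD_PRIVATE_EXPEDITED		(1 << 3)
#define MB_CMD_REGISTER_PRIVATE_EXPEDITED	(1 << 4)

/* Thin wrapper: returns the syscall result, or -1 with errno set. */
static int membarrier_call(int cmd, int flags)
{
#ifdef SYS_membarrier
	return (int)syscall(SYS_membarrier, cmd, flags);
#else
	(void)cmd;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Hypothetical init helper: call once, early. Returns 0 when private
 * expedited barriers may be used, -1 otherwise.
 */
static int mb_private_expedited_init(void)
{
	const int needed = MB_CMD_PRIVATE_EXPEDITED |
			   MB_CMD_REGISTER_PRIVATE_EXPEDITED;
	int mask = membarrier_call(MB_CMD_QUERY, 0);

	/* QUERY failed, or the kernel does not advertise both commands. */
	if (mask < 0 || (mask & needed) != needed)
		return -1;
	return membarrier_call(MB_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
}

/* Issue the barrier; fails with errno == EPERM if init was skipped. */
static int mb_private_expedited(void)
{
	return membarrier_call(MB_CMD_PRIVATE_EXPEDITED, 0);
}
```

On a kernel whose QUERY result lacks bit 4, the init helper conservatively
reports failure instead of letting a later barrier fail with EPERM.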
> 
> This fixes a problem that arose when designing requested extensions to
> sys_membarrier() to allow JITs to efficiently flush old code from
> instruction caches.  Several potential algorithms are much less painful
> if the user registers intent to use this functionality early on, for
> example, before the process spawns the second thread.  Registering at
> this time removes the need to interrupt each and every thread in that
> process at the first expedited sys_membarrier() system call.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

This looks much less intrusive than the earlier series!

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: Alexander Viro <viro@zeniv.linux.org.uk>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>  fs/exec.c                       |  1 +
>  include/linux/mm_types.h        |  3 +++
>  include/linux/sched/mm.h        | 16 ++++++++++++++++
>  include/uapi/linux/membarrier.h | 23 ++++++++++++++++-------
>  kernel/sched/membarrier.c       | 34 ++++++++++++++++++++++++++++++----
>  5 files changed, 66 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 5470d3c1892a..3e14ba25f678 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1802,6 +1802,7 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	/* execve succeeded */
>  	current->fs->in_exec = 0;
>  	current->in_execve = 0;
> +	membarrier_execve(current);
>  	acct_update_integrals(current);
>  	task_numa_free(current);
>  	free_bprm(bprm);
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 46f4ecf5479a..1861ea8dba77 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -445,6 +445,9 @@ struct mm_struct {
>  	unsigned long flags; /* Must use atomic bitops to access the bits */
> 
>  	struct core_state *core_state; /* coredumping support */
> +#ifdef CONFIG_MEMBARRIER
> +	atomic_t membarrier_state;
> +#endif
>  #ifdef CONFIG_AIO
>  	spinlock_t			ioctx_lock;
>  	struct kioctx_table __rcu	*ioctx_table;
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index ae53e413fb13..ab9bf7b73954 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -211,4 +211,20 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
>  	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
>  }
> 
> +#ifdef CONFIG_MEMBARRIER
> +enum {
> +	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY	= (1U << 0),
> +	MEMBARRIER_STATE_SWITCH_MM			= (1U << 1),
> +};
> +
> +static inline void membarrier_execve(struct task_struct *t)
> +{
> +	atomic_set(&t->mm->membarrier_state, 0);
> +}
> +#else
> +static inline void membarrier_execve(struct task_struct *t)
> +{
> +}
> +#endif
> +
>  #endif /* _LINUX_SCHED_MM_H */
> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
> index 6d47b3249d8a..4e01ad7ffe98 100644
> --- a/include/uapi/linux/membarrier.h
> +++ b/include/uapi/linux/membarrier.h
> @@ -52,21 +52,30 @@
>   *                          (non-running threads are de facto in such a
>   *                          state). This only covers threads from the
>   *                          same processes as the caller thread. This
> - *                          command returns 0. The "expedited" commands
> - *                          complete faster than the non-expedited ones,
> - *                          they never block, but have the downside of
> - *                          causing extra overhead.
> + *                          command returns 0 on success. The
> + *                          "expedited" commands complete faster than
> + *                          the non-expedited ones, they never block,
> + *                          but have the downside of causing extra
> + *                          overhead. A process needs to register its
> + *                          intent to use the private expedited command
> + *                          prior to using it, otherwise this command
> + *                          returns -EPERM.
> + * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
> + *                          Register the process's intent to use
> + *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED. Always
> + *                          returns 0.
>   *
>   * Command to be passed to the membarrier system call. The commands need to
>   * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
>   * the value 0.
>   */
>  enum membarrier_cmd {
> -	MEMBARRIER_CMD_QUERY			= 0,
> -	MEMBARRIER_CMD_SHARED			= (1 << 0),
> +	MEMBARRIER_CMD_QUERY				= 0,
> +	MEMBARRIER_CMD_SHARED				= (1 << 0),
>  	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
>  	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
> -	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
> +	MEMBARRIER_CMD_PRIVATE_EXPEDITED		= (1 << 3),
> +	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED	= (1 << 4),
>  };
> 
>  #endif /* _UAPI_LINUX_MEMBARRIER_H */
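The header comment requires every command except MEMBARRIER_CMD_QUERY to be
a single bit, which is what lets QUERY report supported commands as a
bitmask. A small compile-time check of that rule for the post-patch values
might look like this; the EX_* names are hypothetical local copies of the
enum, not the uapi identifiers.

```c
#include <assert.h>

/* Hypothetical local copy of enum membarrier_cmd as of this patch. */
enum ex_membarrier_cmd {
	EX_CMD_QUERY				= 0,
	EX_CMD_SHARED				= (1 << 0),
	/* bit 1 reserved for MEMBARRIER_CMD_SHARED_EXPEDITED */
	/* bit 2 reserved for MEMBARRIER_CMD_PRIVATE */
	EX_CMD_PRIVATE_EXPEDITED		= (1 << 3),
	EX_CMD_REGISTER_PRIVATE_EXPEDITED	= (1 << 4),
};

/* Nonzero when x has exactly one bit set. */
#define EX_SINGLE_BIT(x)	((x) != 0 && ((x) & ((x) - 1)) == 0)
#define EX_RESERVED_BITS	((1 << 1) | (1 << 2))

_Static_assert(EX_CMD_QUERY == 0, "QUERY must be 0");
_Static_assert(EX_SINGLE_BIT(EX_CMD_SHARED), "SHARED must be one bit");
_Static_assert(EX_SINGLE_BIT(EX_CMD_PRIVATE_EXPEDITED),
	       "PRIVATE_EXPEDITED must be one bit");
_Static_assert(EX_SINGLE_BIT(EX_CMD_REGISTER_PRIVATE_EXPEDITED),
	       "REGISTER_PRIVATE_EXPEDITED must be one bit");
_Static_assert(((EX_CMD_SHARED | EX_CMD_PRIVATE_EXPEDITED |
		 EX_CMD_REGISTER_PRIVATE_EXPEDITED) & EX_RESERVED_BITS) == 0,
	       "commands must not reuse reserved bits");

/* Runtime form of the same check, for values not known at compile time. */
static int ex_is_single_bit(unsigned int x)
{
	return x != 0 && (x & (x - 1)) == 0;
}
```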
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index a92fddc22747..dd7908743dab 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -18,6 +18,7 @@
>  #include <linux/membarrier.h>
>  #include <linux/tick.h>
>  #include <linux/cpumask.h>
> +#include <linux/atomic.h>
> 
>  #include "sched.h"	/* for cpu_rq(). */
> 
> @@ -26,21 +27,26 @@
>   * except MEMBARRIER_CMD_QUERY.
>   */
>  #define MEMBARRIER_CMD_BITMASK	\
> -	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
> +	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
> +	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
> 
>  static void ipi_mb(void *info)
>  {
>  	smp_mb();	/* IPIs should be serializing but paranoid. */
>  }
> 
> -static void membarrier_private_expedited(void)
> +static int membarrier_private_expedited(void)
>  {
>  	int cpu;
>  	bool fallback = false;
>  	cpumask_var_t tmpmask;
> 
> +	if (!(atomic_read(&current->mm->membarrier_state)
> +			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
> +		return -EPERM;
> +
>  	if (num_online_cpus() == 1)
> -		return;
> +		return 0;
> 
>  	/*
>  	 * Matches memory barriers around rq->curr modification in
> @@ -94,6 +100,24 @@ static void membarrier_private_expedited(void)
>  	 * rq->curr modification in scheduler.
>  	 */
>  	smp_mb();	/* exit from system call is not a mb */
> +	return 0;
> +}
> +
> +static void membarrier_register_private_expedited(void)
> +{
> +	struct task_struct *p = current;
> +	struct mm_struct *mm = p->mm;
> +
> +	/*
> +	 * We need to consider threads belonging to different thread
> +	 * groups that share the same mm (CLONE_VM but not
> +	 * CLONE_THREAD).
> +	 */
> +	if (atomic_read(&mm->membarrier_state)
> +			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
> +		return;
> +	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
> +			&mm->membarrier_state);
>  }
> 
>  /**
> @@ -144,7 +168,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>  			synchronize_sched();
>  		return 0;
>  	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
> -		membarrier_private_expedited();
> +		return membarrier_private_expedited();
> +	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
> +		membarrier_register_private_expedited();
>  		return 0;
>  	default:
>  		return -EINVAL;
> -- 
> 2.11.0
>
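To make the new semantics easy to poke at outside the kernel, here is a
hedged user-space model of the per-mm state and the syscall dispatch. All
names are hypothetical; C11 atomic_int stands in for atomic_t, with relaxed
ordering to match the unordered atomic_read()/atomic_set()/atomic_or(). The
parts that actually order memory (IPIs, synchronize_sched(), the smp_mb()
calls) and the nohz_full special cases are deliberately left out, so this
models only which calls succeed and which get -EPERM or -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Command values from the uapi enum in this patch (hypothetical names). */
enum {
	M_CMD_QUERY				= 0,
	M_CMD_SHARED				= 1 << 0,
	M_CMD_PRIVATE_EXPEDITED			= 1 << 3,
	M_CMD_REGISTER_PRIVATE_EXPEDITED	= 1 << 4,
};
#define M_CMD_BITMASK	(M_CMD_SHARED | M_CMD_PRIVATE_EXPEDITED | \
			 M_CMD_REGISTER_PRIVATE_EXPEDITED)

/* Per-mm state bit, as in include/linux/sched/mm.h above. */
enum { M_STATE_PRIVATE_EXPEDITED_READY = 1 << 0 };

/* Stand-in for struct mm_struct, shared by every CLONE_VM user. */
struct model_mm {
	atomic_int membarrier_state;
};

static void model_register_private_expedited(struct model_mm *mm)
{
	/* Skip the RMW when already set, mirroring the patch's read-first check. */
	if (atomic_load_explicit(&mm->membarrier_state, memory_order_relaxed)
	    & M_STATE_PRIVATE_EXPEDITED_READY)
		return;
	atomic_fetch_or_explicit(&mm->membarrier_state,
				 M_STATE_PRIVATE_EXPEDITED_READY,
				 memory_order_relaxed);
}

static int model_private_expedited(struct model_mm *mm)
{
	if (!(atomic_load_explicit(&mm->membarrier_state, memory_order_relaxed)
	      & M_STATE_PRIVATE_EXPEDITED_READY))
		return -EPERM;
	/* The real command would now IPI CPUs running this mm; omitted. */
	return 0;
}

/* Mirrors membarrier_execve(): the new program image starts unregistered. */
static void model_execve(struct model_mm *mm)
{
	atomic_store_explicit(&mm->membarrier_state, 0, memory_order_relaxed);
}

/* Dispatch modeled on SYSCALL_DEFINE2(membarrier, int, cmd, int, flags). */
static int model_membarrier(struct model_mm *mm, int cmd, int flags)
{
	if (flags)
		return -EINVAL;
	switch (cmd) {
	case M_CMD_QUERY:
		return M_CMD_BITMASK;
	case M_CMD_SHARED:
		return 0;	/* synchronize_sched() not modeled */
	case M_CMD_PRIVATE_EXPEDITED:
		return model_private_expedited(mm);
	case M_CMD_REGISTER_PRIVATE_EXPEDITED:
		model_register_private_expedited(mm);
		return 0;
	default:
		return -EINVAL;
	}
}
```

Because the state lives in the shared model_mm, a task created with CLONE_VM
but not CLONE_THREAD is covered by a single registration, which appears to
be the case the comment in membarrier_register_private_expedited() is
guarding.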