From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752863AbdJSRsw (ORCPT ); Thu, 19 Oct 2017 13:48:52 -0400
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:55378 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751882AbdJSRsu (ORCPT ); Thu, 19 Oct 2017 13:48:50 -0400
Date: Thu, 19 Oct 2017 10:48:46 -0700
From: "Paul E. McKenney"
To: Mathieu Desnoyers
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, Peter Zijlstra, Ingo Molnar, Alexander Viro
Subject: Re: [PATCH for 4.14] membarrier: Provide register expedited private command
Reply-To: paulmck@linux.vnet.ibm.com
References: <20171019173015.8274-1-mathieu.desnoyers@efficios.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171019173015.8274-1-mathieu.desnoyers@efficios.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
x-cbid: 17101917-0048-0000-0000-000001F839DD
X-IBM-SpamModules-Scores:
X-IBM-SpamModules-Versions: BY=3.00007919; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000238; SDB=6.00933475; UDB=6.00470181; IPR=6.00713764; BA=6.00005651; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00017610; XFM=3.00000015; UTC=2017-10-19 17:48:48
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17101917-0049-0000-0000-000042EC2177
Message-Id: <20171019174846.GZ3521@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-10-19_09:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=2 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1710190245
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 19, 2017 at 01:30:15PM -0400, Mathieu Desnoyers wrote:
> [ This patch is sent directly to Linus, because it needs to be merged
>   before the end of 4.14 rc cycle. It introduces a "register private
>   expedited" membarrier command which allows eventual removal of
>   important memory barrier constraints on the scheduler fast-paths. It
>   changes how the "private expedited" membarrier command (new to 4.14)
>   is used from user-space. Sorry to send this late in the cycle. ]
>
> Provide a command allowing processes to register their intent to use
> the private expedited command. This affects how the expedited private
> command introduced in 4.14-rc is meant to be used, and should be merged
> before 4.14 final.
>
> Processes are now required to register before using
> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
>
> This fixes a problem that arose when designing requested extensions to
> sys_membarrier() to allow JITs to efficiently flush old code from
> instruction caches. Several potential algorithms are much less painful
> if the user register intent to use this functionality early on, for
> example, before the process spawns the second thread. Registering at
> this time removes the need to interrupt each and every thread in that
> process at the first expedited sys_membarrier() system call.
>
> Signed-off-by: Mathieu Desnoyers
> CC: Paul E. McKenney

This looks much less intrusive than the earlier series!

Acked-by: Paul E. McKenney

> CC: Peter Zijlstra
> CC: Ingo Molnar
> CC: Alexander Viro
> CC: Linus Torvalds
> ---
>  fs/exec.c                       |  1 +
>  include/linux/mm_types.h        |  3 +++
>  include/linux/sched/mm.h        | 16 ++++++++++++++++
>  include/uapi/linux/membarrier.h | 23 ++++++++++++++++-------
>  kernel/sched/membarrier.c       | 34 ++++++++++++++++++++++++++++++----
>  5 files changed, 66 insertions(+), 11 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 5470d3c1892a..3e14ba25f678 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1802,6 +1802,7 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	/* execve succeeded */
>  	current->fs->in_exec = 0;
>  	current->in_execve = 0;
> +	membarrier_execve(current);
>  	acct_update_integrals(current);
>  	task_numa_free(current);
>  	free_bprm(bprm);
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 46f4ecf5479a..1861ea8dba77 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -445,6 +445,9 @@ struct mm_struct {
>  	unsigned long flags; /* Must use atomic bitops to access the bits */
>
>  	struct core_state *core_state; /* coredumping support */
> +#ifdef CONFIG_MEMBARRIER
> +	atomic_t membarrier_state;
> +#endif
>  #ifdef CONFIG_AIO
>  	spinlock_t ioctx_lock;
>  	struct kioctx_table __rcu *ioctx_table;
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index ae53e413fb13..ab9bf7b73954 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -211,4 +211,20 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
>  	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
>  }
>
> +#ifdef CONFIG_MEMBARRIER
> +enum {
> +	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY = (1U << 0),
> +	MEMBARRIER_STATE_SWITCH_MM               = (1U << 1),
> +};
> +
> +static inline void membarrier_execve(struct task_struct *t)
> +{
> +	atomic_set(&t->mm->membarrier_state, 0);
> +}
> +#else
> +static inline void membarrier_execve(struct task_struct *t)
> +{
> +}
> +#endif
> +
>  #endif /* _LINUX_SCHED_MM_H */
> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
> index 6d47b3249d8a..4e01ad7ffe98 100644
> --- a/include/uapi/linux/membarrier.h
> +++ b/include/uapi/linux/membarrier.h
> @@ -52,21 +52,30 @@
>   *                          (non-running threads are de facto in such a
>   *                          state). This only covers threads from the
>   *                          same processes as the caller thread. This
> - *                          command returns 0. The "expedited" commands
> - *                          complete faster than the non-expedited ones,
> - *                          they never block, but have the downside of
> - *                          causing extra overhead.
> + *                          command returns 0 on success. The
> + *                          "expedited" commands complete faster than
> + *                          the non-expedited ones, they never block,
> + *                          but have the downside of causing extra
> + *                          overhead. A process needs to register its
> + *                          intent to use the private expedited command
> + *                          prior to using it, otherwise this command
> + *                          returns -EPERM.
> + * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
> + *                          Register the process intent to use
> + *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED. Always
> + *                          returns 0.
>   *
>   * Command to be passed to the membarrier system call. The commands need to
>   * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
>   * the value 0.
>   */
>  enum membarrier_cmd {
> -	MEMBARRIER_CMD_QUERY = 0,
> -	MEMBARRIER_CMD_SHARED = (1 << 0),
> +	MEMBARRIER_CMD_QUERY                      = 0,
> +	MEMBARRIER_CMD_SHARED                     = (1 << 0),
>  	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
>  	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
> -	MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3),
> +	MEMBARRIER_CMD_PRIVATE_EXPEDITED          = (1 << 3),
> +	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED = (1 << 4),
>  };
>
>  #endif /* _UAPI_LINUX_MEMBARRIER_H */
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index a92fddc22747..dd7908743dab 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -18,6 +18,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "sched.h"	/* for cpu_rq(). */
>
> @@ -26,21 +27,26 @@
>   * except MEMBARRIER_CMD_QUERY.
>   */
>  #define MEMBARRIER_CMD_BITMASK	\
> -	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
> +	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
> +	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
>
>  static void ipi_mb(void *info)
>  {
>  	smp_mb();	/* IPIs should be serializing but paranoid. */
>  }
>
> -static void membarrier_private_expedited(void)
> +static int membarrier_private_expedited(void)
>  {
>  	int cpu;
>  	bool fallback = false;
>  	cpumask_var_t tmpmask;
>
> +	if (!(atomic_read(&current->mm->membarrier_state)
> +			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
> +		return -EPERM;
> +
>  	if (num_online_cpus() == 1)
> -		return;
> +		return 0;
>
>  	/*
>  	 * Matches memory barriers around rq->curr modification in
> @@ -94,6 +100,24 @@ static void membarrier_private_expedited(void)
>  	 * rq->curr modification in scheduler.
>  	 */
>  	smp_mb();	/* exit from system call is not a mb */
> +	return 0;
> +}
> +
> +static void membarrier_register_private_expedited(void)
> +{
> +	struct task_struct *p = current;
> +	struct mm_struct *mm = p->mm;
> +
> +	/*
> +	 * We need to consider threads belonging to different thread
> +	 * groups, which use the same mm. (CLONE_VM but not
> +	 * CLONE_THREAD).
> +	 */
> +	if (atomic_read(&mm->membarrier_state)
> +			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
> +		return;
> +	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
> +			&mm->membarrier_state);
>  }
>
>  /**
> @@ -144,7 +168,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>  		synchronize_sched();
>  		return 0;
>  	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
> -		membarrier_private_expedited();
> +		return membarrier_private_expedited();
> +	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
> +		membarrier_register_private_expedited();
>  		return 0;
>  	default:
>  		return -EINVAL;
> --
> 2.11.0
>
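
For anyone following along from user space, the register-then-use rule in the changelog might look roughly like the sketch below. This is not part of the patch. It assumes a Linux build environment whose headers define __NR_membarrier; the sketch_* names are hypothetical, and the command values are copied from the enum in the quoted uapi header so that the code builds even against pre-4.14 headers. Error handling is reduced to returning -errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Command values copied from enum membarrier_cmd in the quoted patch.
 * The SKETCH_ names are local to this example (an assumption), not the
 * uapi names.
 */
enum {
	SKETCH_CMD_QUERY                      = 0,
	SKETCH_CMD_PRIVATE_EXPEDITED          = (1 << 3),
	SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED = (1 << 4),
};

/* Thin wrapper: returns the syscall result, or -errno on failure. */
static long sketch_membarrier(int cmd, int flags)
{
	long ret = syscall(__NR_membarrier, cmd, flags);

	return ret < 0 ? -errno : ret;
}

/* Declare intent to use the private expedited command; do this early. */
static long sketch_register(void)
{
	return sketch_membarrier(SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
}

/* Issue the barrier; expected to fail with -EPERM if not registered. */
static long sketch_barrier(void)
{
	return sketch_membarrier(SKETCH_CMD_PRIVATE_EXPEDITED, 0);
}
```

A process would call sketch_register() once, for example before it spawns its second thread as the changelog suggests, and sketch_barrier() from then on. Checking the bitmask returned by SKETCH_CMD_QUERY first lets the caller fall back when the running kernel does not offer the register command.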