From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751622AbdHaRAc (ORCPT); Thu, 31 Aug 2017 13:00:32 -0400
Received: from foss.arm.com ([217.140.101.70]:58390 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750813AbdHaRAb (ORCPT); Thu, 31 Aug 2017 13:00:31 -0400
Date: Thu, 31 Aug 2017 18:00:35 +0100
From: Will Deacon
To: Mathieu Desnoyers
Cc: Andy Lutomirski, "Paul E. McKenney", Peter Zijlstra, linux-kernel,
	Boqun Feng, Andrew Hunter, maged michael, gromer, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Andy Lutomirski, Hans Boehm
Subject: Re: [PATCH v2] membarrier: provide register sync core cmd
Message-ID: <20170831170035.GC26273@arm.com>
References: <20170827205035.25620-1-mathieu.desnoyers@efficios.com>
	<1463521395.16945.1503889546934.JavaMail.zimbra@efficios.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1463521395.16945.1503889546934.JavaMail.zimbra@efficios.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 28, 2017 at 03:05:46AM +0000, Mathieu Desnoyers wrote:
> ----- On Aug 27, 2017, at 3:53 PM, Andy Lutomirski luto@amacapital.net wrote:
> 
> >> On Aug 27, 2017, at 1:50 PM, Mathieu Desnoyers wrote:
> >> 
> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
> >> system call. It allows processes to register their intent to have their
> >> threads issue core serializing barriers in addition to memory barriers
> >> whenever a membarrier command is performed.
> >> 
> > 
> > Why is this stateful? That is, why not just have a new membarrier
> > command to sync every thread's icache?
> 
> If we'd do it on every CPU icache, it would be as trivial as you say.
> The concern here is sending IPIs only to CPUs running threads that belong
> to the same process, so we don't disturb unrelated processes.
> 
> If we could just grab each CPU's runqueue lock, it would be fairly simple
> to do. But we want to avoid hitting each runqueue with exclusive atomic
> access associated with grabbing the lock. (cache-line bouncing)

I'm still trying to get my head around this for arm64, where we have the
following properties:

  * Return to userspace is context-synchronizing

  * We have a heavy barrier in switch_to

so it would seem to me that we could avoid taking RQ locks if the
mm_cpumask was kept up to date. The problematic case is where a CPU is not
observed in the mask (maybe the write is buffered), but it is running in
userspace. However, that can't occur with the barrier in switch_to.

So we only need to IPI those CPUs that were in userspace for this task at
the point when the syscall was made, and the mm_cpumask should reflect
that. What am I missing?

Will