From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758055AbbEaMxf (ORCPT ); Sun, 31 May 2015 08:53:35 -0400
Received: from mail.efficios.com ([78.47.125.74]:39956 "EHLO mail.efficios.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754390AbbEaMxX (ORCPT ); Sun, 31 May 2015 08:53:23 -0400
Date: Sun, 31 May 2015 12:53:05 +0000 (UTC)
From: Mathieu Desnoyers
To: Andrew Morton
Cc: "Paul E. McKenney", linux-kernel@vger.kernel.org, linux-api,
	KOSAKI Motohiro, rostedt, Nicholas Miell, Linus Torvalds,
	Ingo Molnar, One Thousand Gnomes, Lai Jiangshan,
	Stephen Hemminger, Thomas Gleixner, Peter Zijlstra,
	David Howells, Pranith Kumar, Michael Kerrisk
Message-ID: <1260710044.536.1433076785629.JavaMail.zimbra@efficios.com>
In-Reply-To: <20150529154036.9695b6c153ed37aed3343f3b@linux-foundation.org>
References: <1431820100-17040-1-git-send-email-mathieu.desnoyers@efficios.com>
	<1431820100-17040-2-git-send-email-mathieu.desnoyers@efficios.com>
	<20150529154036.9695b6c153ed37aed3343f3b@linux-foundation.org>
Subject: Re: [PATCH for v4.2 v18 1/3] sys_membarrier(): system-wide memory
	barrier (generic, x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [78.47.125.74]
X-Mailer: Zimbra 8.6.0_GA_1153 (ZimbraWebClient - FF38 (Linux)/8.6.0_GA_1153)
Thread-Topic: sys_membarrier(): system-wide memory barrier (generic, x86)
Thread-Index: PVsubsTu1xwGzv6LBKWk+9LZtmjjGQ==
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

----- On May 30, 2015, at 12:40 AM, Andrew Morton akpm@linux-foundation.org wrote:

> On Sat, 16 May 2015 19:48:18 -0400 Mathieu Desnoyers wrote:
>
>> Here is an implementation of a new system call, sys_membarrier(), which
>> executes a memory barrier on all threads running on the system. It is
>> implemented by calling synchronize_sched().
>> It can be used to distribute
>> the cost of user-space memory barriers asymmetrically by transforming
>> pairs of memory barriers into pairs consisting of sys_membarrier() and a
>> compiler barrier. For synchronization primitives that distinguish
>> between read-side and write-side (e.g. userspace RCU [1], rwlocks), the
>> read-side can be accelerated significantly by moving the bulk of the
>> memory barrier overhead to the write-side.
>>
>> ...
>>
>
> It would be nice to hear about the real world value of this syscall to
> our users. I'm seeing test results for a microbenchmark but so what.
> What actual applications or application classes are calling for this and
> what results can they expect to see?

AFAIK, the existing open source applications that would be improved by
this system call are as follows:

* Through Userspace RCU library (http://urcu.so)
  - DNS server (Knot DNS) https://www.knot-dns.cz/
  - Network sniffer (http://netsniff-ng.org/)
  - Distributed object storage (https://sheepdog.github.io/sheepdog/)
  - User-space tracing (http://lttng.org)
  - Network storage system (https://www.gluster.org/)

Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking. Especially in the case of RCU used
by libraries, sys_membarrier can speed up the read-side by moving the
bulk of the memory barrier cost to synchronize_rcu().

* Direct users of sys_membarrier
  - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)

Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement
Windows FlushProcessWriteBuffers() on Linux. They are referring to
sys_membarrier in their github thread, specifically stating that
sys_membarrier() is what they are looking for.
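
For illustration, the asymmetric pairing described above could look
roughly like this from user space. This is only a sketch under stated
assumptions: the sketch_* names and the use_membarrier flag are
hypothetical, the command values mirror the ones proposed in this patch,
and falling back to full barriers on both sides when the syscall is
missing is one possible policy, not necessarily what any of the projects
listed above does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Command values as proposed in this patch (assumption: kept as-is in
 * the final uapi header). Local names are hypothetical. */
enum {
	SKETCH_CMD_QUERY  = 0,
	SKETCH_CMD_SHARED = 1 << 0,
};

/* Compiler barrier: forbids compiler reordering, emits no instruction. */
#define barrier()	__asm__ __volatile__("" : : : "memory")

/* Thin syscall wrapper. Fails with ENOSYS when the build environment
 * does not know the syscall number. */
static int sketch_membarrier(int cmd, int flags)
{
#ifdef __NR_membarrier
	return (int) syscall(__NR_membarrier, cmd, flags);
#else
	(void) cmd;
	(void) flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* Hypothetical flag, set once at startup before readers run. */
static int use_membarrier;

static void sketch_init(void)
{
	/* QUERY returns a bitmask of supported commands, or -1 on error. */
	int mask = sketch_membarrier(SKETCH_CMD_QUERY, 0);

	use_membarrier = mask >= 0 && (mask & SKETCH_CMD_SHARED) != 0;
}

/* Read side (fast path): only a compiler barrier when the write side
 * can issue sys_membarrier(), a full barrier otherwise. */
static inline void reader_barrier(void)
{
	if (use_membarrier)
		barrier();
	else
		__sync_synchronize();
}

/* Write side (slow path): pays the ordering cost for all readers. */
static void writer_barrier(void)
{
	if (use_membarrier)
		(void) sketch_membarrier(SKETCH_CMD_SHARED, 0);
	else
		__sync_synchronize();
}
```

A library would call sketch_init() once at startup, then use
reader_barrier() on its read-side fast path and writer_barrier() on its
update path (the synchronize_rcu() equivalent).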
>
>>
>> membarrier(2) man page:
>> --------------- snip -------------------
>> MEMBARRIER(2)          Linux Programmer's Manual         MEMBARRIER(2)
>>
>> NAME
>>        membarrier - issue memory barriers on a set of threads
>>
>> SYNOPSIS
>>        #include
>>
>>        int membarrier(int cmd, int flags);
>>
>> DESCRIPTION
>>        The cmd argument is one of the following:
>>
>>        MEMBARRIER_CMD_QUERY
>>               Query the set of supported commands. It returns a bitmask of
>>               supported commands.
>>
>>        MEMBARRIER_CMD_SHARED
>>               Execute a memory barrier on all threads running on the system.
>>               Upon return from system call, the caller thread is ensured that
>>               all running threads have passed through a state where all memory
>>               accesses to user-space addresses match program order between
>>               entry to and return from the system call (non-running threads
>>               are de facto in such a state). This covers threads from all
>>               processes running on the system. This command returns 0.
>>
>>        The flags argument needs to be 0. For future extensions.
>>
>>        All memory accesses performed in program order from each targeted
>>        thread is guaranteed to be ordered with respect to sys_membarrier(). If
>>        we use the semantic "barrier()" to represent a compiler barrier forcing
>>        memory accesses to be performed in program order across the barrier,
>>        and smp_mb() to represent explicit memory barriers forcing full memory
>>        ordering across the barrier, we have the following ordering table for
>>        each pair of barrier(), sys_membarrier() and smp_mb():
>>
>>        The pair ordering is detailed as (O: ordered, X: not ordered):
>>
>>                               barrier()   smp_mb()   sys_membarrier()
>>               barrier()          X           X              O
>>               smp_mb()           X           O              O
>>               sys_membarrier()   O           O              O
>>
>> RETURN VALUE
>>        On success, these system calls return zero. On error, -1 is returned,
>>        and errno is set appropriately. For a given command, with flags
>>        argument set to 0, this system call is guaranteed to always return the
>>        same value until reboot.
>
> I suggest "with flags argument set to MEMBARRIER_CMD_QUERY" here.

No, the enum is for the "cmd" argument (see above), not the flags
argument. We really mean flags = 0 (the value) here.

>
>>
>> ERRORS
>>        ENOSYS System call is not implemented.
>>
>>        EINVAL Invalid arguments.
>>
>> ...
>>
>> +SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>> +{
>> +	if (flags)
>> +		return -EINVAL;
>
> I'm not a huge fan of this "add a flags arg to syscalls" rule. Is
> there any realistic expectation that we'll ever *use* this thing? If
> not, why add it?

I can see this system call evolving in a few ways in the future, such as
having an expedited version (using IPIs), targeting the local thread
group, and targeting all threads mapping a specific shared memory
mapping. I guess that the cmd argument should be enough to cover
that, but when in doubt, it might be better to keep a flags argument
there for future needs we might be overlooking right now, so we never
end up needing a sys_membarrier2 system call.

>
> You may as well put an unlikely() in there btw.

Will do.

Thanks!

Mathieu

>
>> +	switch (cmd) {
>> +	case MEMBARRIER_CMD_QUERY:
>> +		return MEMBARRIER_CMD_BITMASK;
>> +	case MEMBARRIER_CMD_SHARED:
>> +		if (num_online_cpus() > 1)
>> +			synchronize_sched();
>> +		return 0;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com