All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
To: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: "Paul E. McKenney"
	<paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-api <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	KOSAKI Motohiro
	<kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org>,
	rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>,
	Nicholas Miell <nmiell-Wuw85uim5zDR7s880joybQ@public.gmane.org>,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	One Thousand Gnomes
	<gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>,
	Lai Jiangshan <laijs-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>,
	Stephen Hemminger
	<stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>,
	Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
	Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Pranith Kumar
	<bobby.prani-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Michael Kerrisk
	<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH for v4.2 v18 1/3] sys_membarrier(): system-wide memory barrier (generic, x86)
Date: Sun, 31 May 2015 12:53:05 +0000 (UTC)	[thread overview]
Message-ID: <1260710044.536.1433076785629.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20150529154036.9695b6c153ed37aed3343f3b-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>

----- On May 30, 2015, at 12:40 AM, Andrew Morton akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org wrote:

> On Sat, 16 May 2015 19:48:18 -0400 Mathieu Desnoyers
> <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
> 
>> Here is an implementation of a new system call, sys_membarrier(), which
>> executes a memory barrier on all threads running on the system. It is
>> implemented by calling synchronize_sched(). It can be used to distribute
>> the cost of user-space memory barriers asymmetrically by transforming
>> pairs of memory barriers into pairs consisting of sys_membarrier() and a
>> compiler barrier. For synchronization primitives that distinguish
>> between read-side and write-side (e.g. userspace RCU [1], rwlocks), the
>> read-side can be accelerated significantly by moving the bulk of the
>> memory barrier overhead to the write-side.
>>
>> ...
>>
> 
> It would be nice to hear about the real world value of this syscall to
> our users.  I'm seeing test results for a microbenchmark but so what.
> What actual applications or application classes are calling for this and
> what results can they expect to see?

AFAIK, the existing open source applications that would be improved by this
system call are as follows:

* Through Userspace RCU library (http://urcu.so)
  - DNS server (Knot DNS) https://www.knot-dns.cz/
  - Network sniffer (http://netsniff-ng.org/)
  - Distributed object storage (https://sheepdog.github.io/sheepdog/)
  - User-space tracing (http://lttng.org)
  - Network storage system (https://www.gluster.org/)

Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking. Especially in the case of RCU used
by libraries, sys_membarrier can speed up the read-side by moving the
bulk of the memory barrier cost to synchronize_rcu().

* Direct users of sys_membarrier
  - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)

Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement Windows
FlushProcessWriteBuffers() on Linux. They are referring to sys_membarrier in their
github thread, specifically stating that sys_membarrier() is what they are looking
for.

> 
>> 
>> membarrier(2) man page:
>> --------------- snip -------------------
>> MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)
>> 
>> NAME
>>        membarrier - issue memory barriers on a set of threads
>> 
>> SYNOPSIS
>>        #include <linux/membarrier.h>
>> 
>>        int membarrier(int cmd, int flags);
>> 
>> DESCRIPTION
>>        The cmd argument is one of the following:
>> 
>>        MEMBARRIER_CMD_QUERY
>>               Query  the  set  of  supported commands. It returns a bitmask of
>>               supported commands.
>> 
>>        MEMBARRIER_CMD_SHARED
>>               Execute a memory barrier on all threads running on  the  system.
>>               Upon  return from system call, the caller thread is ensured that
>>               all running threads have passed through a state where all memory
>>               accesses  to  user-space  addresses  match program order between
>>               entry to and return from the system  call  (non-running  threads
>>               are de facto in such a state). This covers threads from all pro___
>>               cesses running on the system.  This command returns 0.
>> 
>>        The flags argument needs to be 0. For future extensions.
>> 
>>        All memory accesses performed  in  program  order  from  each  targeted
>>        thread is guaranteed to be ordered with respect to sys_membarrier(). If
>>        we use the semantic "barrier()" to represent a compiler barrier forcing
>>        memory  accesses  to  be performed in program order across the barrier,
>>        and smp_mb() to represent explicit memory barriers forcing full  memory
>>        ordering  across  the barrier, we have the following ordering table for
>>        each pair of barrier(), sys_membarrier() and smp_mb():
>> 
>>        The pair ordering is detailed as (O: ordered, X: not ordered):
>> 
>>                               barrier()   smp_mb() sys_membarrier()
>>               barrier()          X           X            O
>>               smp_mb()           X           O            O
>>               sys_membarrier()   O           O            O
>> 
>> RETURN VALUE
>>        On success, these system calls return zero.  On error, -1 is  returned,
>>        and errno is set appropriately. For a given command, with flags
>>        argument set to 0, this system call is guaranteed to always return the
>>        same value until reboot.
> 
> I suggest "with flags argument set to MEMBARRIER_CMD_QUERY" here.

No, the enum is for the "cmd" argument (see above) not the flags argument. We
really mean flags = 0 (the value) here.

> 
>> 
>> ERRORS
>>        ENOSYS System call is not implemented.
>> 
>>        EINVAL Invalid arguments.
>> 
>> ...
>>
>> +SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>> +{
>> +	if (flags)
>> +		return -EINVAL;
> 
> I'm not a huge fan of this "add a flags arg to syscalls" rule.  Is
> there any realistic expectation that we'll ever *use* this thing?  If
> not, why add it?

I can see this system call evolve in a few ways in the future, such as
having an expedited version (using IPIs), targeting the local thread
group, and targeting all threads mapping a specific shared memory mapping.
I guess that the cmd argument should be enough to cover that, but
in doubt, it might be better to keep a flags argument there for future
needs we might be overlooking right now, so we never end up needing a
sys_membarrier2 system call.

> 
> You may as well put an unlikely() in there btw.

Will do.

Thanks!

Mathieu

> 
>> +	switch (cmd) {
>> +	case MEMBARRIER_CMD_QUERY:
>> +		return MEMBARRIER_CMD_BITMASK;
>> +	case MEMBARRIER_CMD_SHARED:
>> +		if (num_online_cpus() > 1)
>> +			synchronize_sched();
>> +		return 0;
>> +	default:
>> +		return -EINVAL;
>> +	}
> > +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

WARNING: multiple messages have this Message-ID (diff)
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org,
	linux-api <linux-api@vger.kernel.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	rostedt <rostedt@goodmis.org>,
	Nicholas Miell <nmiell@comcast.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>,
	One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	David Howells <dhowells@redhat.com>,
	Pranith Kumar <bobby.prani@gmail.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Re: [PATCH for v4.2 v18 1/3] sys_membarrier(): system-wide memory barrier (generic, x86)
Date: Sun, 31 May 2015 12:53:05 +0000 (UTC)	[thread overview]
Message-ID: <1260710044.536.1433076785629.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20150529154036.9695b6c153ed37aed3343f3b@linux-foundation.org>

----- On May 30, 2015, at 12:40 AM, Andrew Morton akpm@linux-foundation.org wrote:

> On Sat, 16 May 2015 19:48:18 -0400 Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> 
>> Here is an implementation of a new system call, sys_membarrier(), which
>> executes a memory barrier on all threads running on the system. It is
>> implemented by calling synchronize_sched(). It can be used to distribute
>> the cost of user-space memory barriers asymmetrically by transforming
>> pairs of memory barriers into pairs consisting of sys_membarrier() and a
>> compiler barrier. For synchronization primitives that distinguish
>> between read-side and write-side (e.g. userspace RCU [1], rwlocks), the
>> read-side can be accelerated significantly by moving the bulk of the
>> memory barrier overhead to the write-side.
>>
>> ...
>>
> 
> It would be nice to hear about the real world value of this syscall to
> our users.  I'm seeing test results for a microbenchmark but so what.
> What actual applications or application classes are calling for this and
> what results can they expect to see?

AFAIK, the existing open source applications that would be improved by this
system call are as follows:

* Through Userspace RCU library (http://urcu.so)
  - DNS server (Knot DNS) https://www.knot-dns.cz/
  - Network sniffer (http://netsniff-ng.org/)
  - Distributed object storage (https://sheepdog.github.io/sheepdog/)
  - User-space tracing (http://lttng.org)
  - Network storage system (https://www.gluster.org/)

Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking. Especially in the case of RCU used
by libraries, sys_membarrier can speed up the read-side by moving the
bulk of the memory barrier cost to synchronize_rcu().

* Direct users of sys_membarrier
  - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)

Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement Windows
FlushProcessWriteBuffers() on Linux. They are referring to sys_membarrier in their
github thread, specifically stating that sys_membarrier() is what they are looking
for.

> 
>> 
>> membarrier(2) man page:
>> --------------- snip -------------------
>> MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)
>> 
>> NAME
>>        membarrier - issue memory barriers on a set of threads
>> 
>> SYNOPSIS
>>        #include <linux/membarrier.h>
>> 
>>        int membarrier(int cmd, int flags);
>> 
>> DESCRIPTION
>>        The cmd argument is one of the following:
>> 
>>        MEMBARRIER_CMD_QUERY
>>               Query  the  set  of  supported commands. It returns a bitmask of
>>               supported commands.
>> 
>>        MEMBARRIER_CMD_SHARED
>>               Execute a memory barrier on all threads running on  the  system.
>>               Upon  return from system call, the caller thread is ensured that
>>               all running threads have passed through a state where all memory
>>               accesses  to  user-space  addresses  match program order between
>>               entry to and return from the system  call  (non-running  threads
>>               are de facto in such a state). This covers threads from all pro___
>>               cesses running on the system.  This command returns 0.
>> 
>>        The flags argument needs to be 0. For future extensions.
>> 
>>        All memory accesses performed  in  program  order  from  each  targeted
>>        thread is guaranteed to be ordered with respect to sys_membarrier(). If
>>        we use the semantic "barrier()" to represent a compiler barrier forcing
>>        memory  accesses  to  be performed in program order across the barrier,
>>        and smp_mb() to represent explicit memory barriers forcing full  memory
>>        ordering  across  the barrier, we have the following ordering table for
>>        each pair of barrier(), sys_membarrier() and smp_mb():
>> 
>>        The pair ordering is detailed as (O: ordered, X: not ordered):
>> 
>>                               barrier()   smp_mb() sys_membarrier()
>>               barrier()          X           X            O
>>               smp_mb()           X           O            O
>>               sys_membarrier()   O           O            O
>> 
>> RETURN VALUE
>>        On success, these system calls return zero.  On error, -1 is  returned,
>>        and errno is set appropriately. For a given command, with flags
>>        argument set to 0, this system call is guaranteed to always return the
>>        same value until reboot.
> 
> I suggest "with flags argument set to MEMBARRIER_CMD_QUERY" here.

No, the enum is for the "cmd" argument (see above) not the flags argument. We
really mean flags = 0 (the value) here.

> 
>> 
>> ERRORS
>>        ENOSYS System call is not implemented.
>> 
>>        EINVAL Invalid arguments.
>> 
>> ...
>>
>> +SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>> +{
>> +	if (flags)
>> +		return -EINVAL;
> 
> I'm not a huge fan of this "add a flags arg to syscalls" rule.  Is
> there any realistic expectation that we'll ever *use* this thing?  If
> not, why add it?

I can see this system call evolve in a few ways in the future, such as
having an expedited version (using IPIs), targeting the local thread
group, and targeting all threads mapping a specific shared memory mapping.
I guess that the cmd argument should be enough to cover that, but
in doubt, it might be better to keep a flags argument there for future
needs we might be overlooking right now, so we never end up needing a
sys_membarrier2 system call.

> 
> You may as well put an unlikely() in there btw.

Will do.

Thanks!

Mathieu

> 
>> +	switch (cmd) {
>> +	case MEMBARRIER_CMD_QUERY:
>> +		return MEMBARRIER_CMD_BITMASK;
>> +	case MEMBARRIER_CMD_SHARED:
>> +		if (num_online_cpus() > 1)
>> +			synchronize_sched();
>> +		return 0;
>> +	default:
>> +		return -EINVAL;
>> +	}
> > +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  parent reply	other threads:[~2015-05-31 12:53 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-16 23:48 [PATCH for v4.2 0/3] membarrier system call Mathieu Desnoyers
2015-05-16 23:48 ` Mathieu Desnoyers
     [not found] ` <1431820100-17040-1-git-send-email-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
2015-05-16 23:48   ` [PATCH for v4.2 v18 1/3] sys_membarrier(): system-wide memory barrier (generic, x86) Mathieu Desnoyers
2015-05-16 23:48     ` Mathieu Desnoyers
     [not found]     ` <1431820100-17040-2-git-send-email-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
2015-05-29 22:40       ` Andrew Morton
2015-05-29 22:40         ` Andrew Morton
     [not found]         ` <20150529154036.9695b6c153ed37aed3343f3b-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2015-05-31 12:53           ` Mathieu Desnoyers [this message]
2015-05-31 12:53             ` Mathieu Desnoyers
2015-05-16 23:48 ` [PATCH for v4.2 2/3] selftests: add membarrier syscall test Mathieu Desnoyers
2015-05-16 23:48 ` [PATCH for v4.2 3/3] selftests: enhance " Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1260710044.536.1433076785629.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers-vg+e7yoek/dwk0htik3j/w@public.gmane.org \
    --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=bobby.prani-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org \
    --cc=kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org \
    --cc=laijs-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org \
    --cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=nmiell-Wuw85uim5zDR7s880joybQ@public.gmane.org \
    --cc=paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
    --cc=peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org \
    --cc=stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org \
    --cc=tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org \
    --cc=torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.