From: Josh Triplett <josh@joshtriplett.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Paul Turner <pjt@google.com>, Andrew Hunter <ahh@google.com>,
	Ben Maurer <bmaurer@fb.com>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH] percpu system call: fast userspace percpu critical sections
Date: Thu, 21 May 2015 08:46:55 -0700
Message-ID: <20150521154655.GA17956@x>
In-Reply-To: <1432219487-13364-1-git-send-email-mathieu.desnoyers@efficios.com>

On Thu, May 21, 2015 at 10:44:47AM -0400, Mathieu Desnoyers wrote:
> Expose a new system call allowing userspace threads to register
> a TLS area used as an ABI between the kernel and userspace to
> share information required to create efficient per-cpu critical
> sections in user-space.
> 
> This ABI consists of a thread-local structure containing:
> 
> - a nesting count surrounding the critical section,
> - a signal number to be sent to the thread when preempting a thread
>   with non-zero nesting count,
> - a flag indicating whether the signal has been sent within the
>   critical section,
> - an integer in which to store the current CPU number, updated whenever
>   the thread is preempted. This CPU number cache is not strictly
>   needed, but performs better than the getcpu vDSO call.
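
A minimal sketch of what such a thread-local structure might look like, with a userspace model of the preemption-time check. The struct name, field names, field types, and helper functions below are all hypothetical and are not taken from the patch; a real implementation would also need compiler barriers around the counter updates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the registered TLS area (names and types assumed). */
struct percpu_tls_area {
    uint32_t nesting;      /* nesting count surrounding the critical section */
    int32_t  signo;        /* signal to send when preempted with nesting != 0 */
    uint32_t signal_sent;  /* nonzero if the signal was sent within the section */
    int32_t  cpu;          /* cached current CPU number */
};

/* Userspace side: enter and leave a (possibly nested) critical section. */
void percpu_enter(struct percpu_tls_area *a)
{
    a->nesting++;
}

void percpu_exit(struct percpu_tls_area *a)
{
    assert(a->nesting > 0);  /* guard against unbalanced exit */
    a->nesting--;
}

/*
 * Model of what the kernel would do at preemption: refresh the CPU
 * number cache, and flag the thread only if it is inside a critical
 * section. The real kernel would also deliver a->signo; this model
 * only records the flag.
 */
void model_preempt(struct percpu_tls_area *a, int32_t new_cpu)
{
    a->cpu = new_cpu;
    if (a->nesting != 0)
        a->signal_sent = 1;
}
```

In this model, a thread preempted outside any critical section only has its cached CPU number refreshed; the flag is set only when the nesting count is non-zero.
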
> 
> This approach is inspired by Paul Turner and Andrew Hunter's work
> on percpu atomics, which lets the kernel handle restart of critical
> sections, ref. http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
> 
> What is done differently here compared to percpu atomics: we track
> a single nesting counter per thread rather than many ranges of
> instruction pointer values. We deliver a signal to user-space and
> let the logic of restart be handled in user-space, thus moving
> the complexity out of the kernel. The nesting counter approach
> allows us to skip the complexity of interacting with signals that
> would be otherwise needed with the percpu atomics approach, which
> needs to know which instruction pointers are preempted, including
> when preemption occurs on a signal handler nested over an instruction
> pointer of interest.
> 
> Advantages of this approach over percpu atomics:
> - kernel code is relatively simple: complexity of restart sections
>   is in user-space,
> - easy to port to other architectures: just need to reserve a new
>   system call,
> - for threads which have registered a TLS structure, the fast-path
>   at preemption is only a nesting counter check, along with the
>   optional store of the current CPU number, rather than comparing
>   instruction pointer with possibly many registered ranges.
> 
> Caveats of this approach compared to the percpu atomics:
> - We need a signal number for this, so it cannot be done without
>   designing the application accordingly,
> - Handling restart in user-space is currently performed with page
>   protection, for which we install a SIGSEGV signal handler. Again,
>   this requires designing the application accordingly, especially
>   if the application installs its own segmentation fault handler,
> - It cannot be used for tracing of processes by injection of code
>   into their address space, due to interactions with application
>   signal handlers.
> 
> The user-space proof of concept code implementing the restart section
> can be found here: https://github.com/compudj/percpu-dev
> 
> Benchmarking sched_getcpu() vs tls cache approach. Getting the
> current CPU number:
> 
> - With Linux vdso:            12.7 ns
> - With TLS-cached cpu number:  0.3 ns
> 
> We will use the TLS-cached cpu number for the following
> benchmarks.
> 
> On an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, comparison
> with a baseline running very few load/stores (no locking,
> no getcpu, assuming one thread per CPU with affinity),
> against locking schemes based on "lock; cmpxchg", "cmpxchg"
> (using restart signal), and load-store (using restart signal).
> This is performed with 32 threads on a 16-core, hyperthreaded
> system:
> 
>                  ns/loop      overhead (ns)
> Baseline:          3.7           0.0
> lock; cmpxchg:    22.0          18.3
> cmpxchg:          11.1           7.4
> load-store:        9.4           5.7
> 
> Therefore, the load-store scheme has a speedup of 3.2x over the
> "lock; cmpxchg" scheme if both are using the tls-cache for the
> CPU number. If we use Linux sched_getcpu() for "lock; cmpxchg"
> we reach a speedup of 5.4x for load-store+tls-cache vs
> "lock; cmpxchg"+vdso-getcpu.
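
The arithmetic behind these two ratios can be checked with a short sketch. The function names are hypothetical. The 5.4x figure is reproduced here under the assumption that the vdso variant swaps the 0.3 ns TLS-cached read for the 12.7 ns sched_getcpu() call; that reading is inferred from the numbers above rather than stated in the message.

```c
#include <assert.h>

/* Overhead of a scheme relative to the baseline loop, in ns. */
double overhead_ns(double ns_per_loop, double baseline_ns)
{
    return ns_per_loop - baseline_ns;
}

/* Ratio of two overheads: how much cheaper the fast scheme is. */
double speedup(double slow_overhead_ns, double fast_overhead_ns)
{
    assert(fast_overhead_ns > 0.0);
    return slow_overhead_ns / fast_overhead_ns;
}

/* Both schemes using the TLS-cached CPU number: 18.3 / 5.7. */
double speedup_tls_cache(void)
{
    return speedup(overhead_ns(22.0, 3.7), overhead_ns(9.4, 3.7));
}

/*
 * "lock; cmpxchg" with sched_getcpu() instead of the TLS cache
 * (assumed: add 12.7 ns, remove 0.3 ns): (18.3 + 12.7 - 0.3) / 5.7.
 */
double speedup_vs_vdso_getcpu(void)
{
    double slow = overhead_ns(22.0, 3.7) + 12.7 - 0.3;
    return speedup(slow, overhead_ns(9.4, 3.7));
}
```

Rounded to one decimal place, the two functions give 3.2 and 5.4, matching the figures quoted above.
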
> 
> I'm sending this out to trigger discussion, and hopefully to see
> Paul and Andrew's patches being posted publicly at some point, so
> we can compare our approaches.

The idea seems sensible.  One quick comment: as with any new syscall,
please include a flags argument.
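
To illustrate the point, here is a minimal userspace sketch of the usual pattern: the call takes a flags argument and rejects any bit it does not understand, so that new behaviour can later be added without a new syscall number. The function name, the PERCPU_VALID_FLAGS mask, and the signature are hypothetical and are not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: no flags are defined yet, so every set bit is invalid. */
#define PERCPU_VALID_FLAGS 0u

/*
 * Model of a registration entry point with a flags argument.
 * Unknown flag bits fail with -EINVAL rather than being silently
 * ignored, which keeps those bits available for future extensions.
 */
long percpu_register_model(void *tls_area, uint32_t flags)
{
    if (flags & ~PERCPU_VALID_FLAGS)
        return -EINVAL;
    if (tls_area == NULL)
        return -EFAULT;
    return 0;
}
```

With this shape, a caller passing flags == 0 works today, and a caller probing for a future flag gets -EINVAL on kernels that do not support it.
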

- Josh Triplett
