public inbox for linux-kernel@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Dave Watson <davejwatson@fb.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-api <linux-api@vger.kernel.org>,
	Paul Turner <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Russell King <linux@arm.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Hunter <ahh@google.com>, Andi Kleen <andi@firstfloor.org>,
	Chris Lameter <cl@linux.com>, Ben Maurer <bmaurer@fb.com>,
	rostedt <rostedt@goodmis.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Re: [RFC PATCH for 4.18 12/23] cpu_opv: Provide cpu_opv system call (v7)
Date: Fri, 4 May 2018 10:32:53 -0400 (EDT)
Message-ID: <1883133260.11283.1525444373208.JavaMail.zimbra@efficios.com>
In-Reply-To: <1248652824.11527.1523912317964.JavaMail.zimbra@efficios.com>

----- On Apr 16, 2018, at 4:58 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Apr 16, 2018, at 3:26 PM, Linus Torvalds torvalds@linux-foundation.org
> wrote:
> 
>> On Mon, Apr 16, 2018 at 12:21 PM, Mathieu Desnoyers
>> <mathieu.desnoyers@efficios.com> wrote:
>>>
>>> And I try very hard to avoid being told I'm the one breaking
>>> user-space. ;-)
>> 
>> You *can't* be breaking user space. User space doesn't use this yet.
>> 
>> That's actually why I'd like to start with the minimal set - to make
>> sure we don't introduce features that will come back to bite us later.
>> 
>> The one compelling use case I saw was a memory allocator that used
>> this for getting per-CPU (vs per-thread) memory scaling.
>> 
>> That code didn't need the cpu_opv system call at all.
>> 
>> And if somebody does a ldload of a malloc library, and then wants to
>> analyze the behavior of a program, maybe they should ldload their own
>> malloc routines first? That's pretty much par for the course for those
>> kinds of projects.
>> 
>> So I'd much rather we first merge the non-contentious parts that
>> actually have some numbers for "this improves performance and makes a
>> nice fancy malloc possible".
>> 
>> As it is, the cpu_opv seems to be all about theory, not about actual need.
> 
> I fully get your point about getting the minimal feature in. So let's focus
> on rseq only.
> 
> I will rework the patchset so the rseq selftests don't depend on cpu_opv,
> and remove the cpu_opv stuff. I think it would be a good start for the
> Facebook guys (jemalloc), given that just rseq seems to be enough for them
> for now. It should be enough for the arm64 performance counters as well.
> 
> Then we'll figure out what is needed to make other projects use it based on
> their needs (e.g. lttng-ust, liburcu, glibc malloc), and whether jemalloc
> end up requiring cpu_opv for memory migration between per-cpu pools after all.

So, having done this, I find myself in need of advice regarding smoothly
transitioning existing user-space programs/libraries to rseq. Let's consider
a situation where only rseq (without cpu_opv) eventually gets merged into
4.18.

The proposed rseq implementation presents the following constraints:

- Only a single rseq TLS area can be registered per thread, so rseq needs
  to be "owned" by a single library (let's say it's librseq.so),
- User-space rseq critical sections need to be inlined into applications and
  libraries for performance reasons (extra branches and calls significantly
  degrade performance of those fast-paths).

I have a ring buffer "space reservation" use-case in my user-space tracer
which requires both rseq and cpu_opv.

My original plan to transition this fast-path to rseq was to test the
@cpu_id field value from the rseq TLS and use a fallback based on
atomic instructions if it is negative. rseq is already designed to ensure
we can compare @cpu_id against @cpu_id_start and detect both migration
(cpu id differs) and rseq ENOSYS with a single branch in the fast path.
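
As a rough illustration of the single-branch check described above, here is
a minimal sketch in plain C. The struct `rseq_tls_sketch` and the function
`rseq_sketch_try_fast_path` are hypothetical names, not the uapi layout or
any library API. The sketch assumes that @cpu_id_start always holds a valid
CPU index (0 when rseq is not registered) and that @cpu_id is negative when
rseq is unavailable. A real critical section would perform this comparison
inside inline assembly; this version only shows the logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the rseq TLS area (not the uapi layout). */
struct rseq_tls_sketch {
	uint32_t cpu_id_start;	/* assumed: always a valid CPU index, 0 if unregistered */
	int32_t cpu_id;		/* assumed: negative if rseq is unavailable */
};

/*
 * Returns true if the fast path may proceed on the CPU stored in *cpu,
 * false if the caller should take the atomic-instruction fallback.
 */
static bool rseq_sketch_try_fast_path(const struct rseq_tls_sketch *tls,
				      uint32_t *cpu)
{
	/* Snapshot the CPU index that will be used to pick per-CPU data. */
	*cpu = tls->cpu_id_start;
	/*
	 * Single comparison: it fails both when the thread migrated
	 * (cpu_id differs) and when cpu_id is negative, because a negative
	 * value converted to uint32_t cannot equal a valid CPU index.
	 */
	return (uint32_t)tls->cpu_id == *cpu;
}
```

Under these assumptions the caller needs no separate "is rseq available"
test: an unregistered thread simply fails the comparison and falls back.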

Once rseq is merged and deployed in kernels, librseq.so will actually
populate the rseq TLS, and the @cpu_id field will be >= 0. If kernels are
released with rseq but without cpu_opv, then I cannot use the @cpu_id
field to detect whether *both* rseq and cpu_opv are available.

I see a few possible ways to handle this, none of which are particularly
great:

1) Duplicate the entire implementation of the user-space functions where
   the rseq critical sections are inlined, and dynamically detect whether
   cpu_opv is available, and select the right function at runtime. If those
   functions are relatively small this could be acceptable,

2) Code patching based on asm goto. There is no user-space library for
   this at the moment AFAIK, and patching user-space code triggers COW,
   which is bad for TLB and cache locality,

3) Add an extra branch in the rseq fast-path. I would like to avoid this
   especially on arm32, where the cost of an extra branch is significant
   enough to outweigh the benefit of rseq compared to ll/sc.

So far, only option (1) seems relatively acceptable from my perspective,
but that's only because my functions using rseq are relatively small.
If this code bloat is not seen as acceptable, then we should revisit
merging both rseq and cpu_opv at the same time, and make sure CONFIG_RSEQ
selects CONFIG_CPU_OPV.
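
For reference, option (1) could look roughly like the sketch below. All
names (`reserve_fn`, `reserve_rseq_cpu_opv`, `reserve_atomic`,
`select_reserve`) are hypothetical, the function bodies are placeholders
that only return a tag, and how cpu_opv availability is actually probed is
left out; the two booleans are assumed to come from some detection step at
library initialization.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for a space-reservation fast path. */
typedef int (*reserve_fn)(size_t len);

/* Placeholder: variant with the rseq critical section inlined, cpu_opv as slow path. */
static int reserve_rseq_cpu_opv(size_t len)
{
	(void)len;
	return 1;	/* tag identifying this variant in the sketch */
}

/* Placeholder: variant built only on atomic instructions. */
static int reserve_atomic(size_t len)
{
	(void)len;
	return 2;	/* tag identifying this variant in the sketch */
}

/* Select the implementation once; callers keep and reuse the returned pointer. */
static reserve_fn select_reserve(bool have_rseq, bool have_cpu_opv)
{
	/* The rseq variant is only usable when both features are present. */
	return (have_rseq && have_cpu_opv) ? reserve_rseq_cpu_opv
					   : reserve_atomic;
}
```

The cost of this approach is the duplicated function bodies plus an
indirect call at each call site, in exchange for keeping the feature test
out of the rseq critical section itself.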

Thoughts?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

