public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Dave Watson <davejwatson@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Russell King <linux@arm.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org,
	linux-api <linux-api@vger.kernel.org>,
	Paul Turner <pjt@google.com>, Andrew Hunter <ahh@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Andi Kleen <andi@firstfloor.org>, Chris Lameter <cl@linux.com>,
	Ben Maurer <bmaurer@fb.com>, rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>
Subject: Re: [RFC PATCH v7 7/7] Restartable sequences: self-tests
Date: Mon, 25 Jul 2016 18:12:50 +0000 (UTC)	[thread overview]
Message-ID: <2085982979.81688.1469470370270.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <CO1PR15MB09822FC140F84DCEEF2004CDDD0B0@CO1PR15MB0982.namprd15.prod.outlook.com>

----- On Jul 23, 2016, at 5:26 PM, Dave Watson davejwatson@fb.com wrote:

[...]

> +static inline __attribute__((always_inline))
> +bool rseq_finish(struct rseq_lock *rlock,
> + intptr_t *p, intptr_t to_write,
> + struct rseq_state start_value)
> +{
> + RSEQ_INJECT_C(9)
> +
> + if (unlikely(start_value.lock_state != RSEQ_LOCK_STATE_RESTART)) {
> + if (start_value.lock_state == RSEQ_LOCK_STATE_LOCK)
> + rseq_fallback_wait(rlock);
> + return false;
> + }
> +
> +#ifdef __x86_64__
> + /*
> + * The __rseq_table section can be used by debuggers to better
> + * handle single-stepping through the restartable critical
> + * sections.
> + */
> + __asm__ __volatile__ goto (
> + ".pushsection __rseq_table, \"aw\"\n\t"
> + ".balign 8\n\t"
> + "4:\n\t"
> + ".quad 1f, 2f, 3f\n\t"
> + ".popsection\n\t"

> Is there a reason we're also passing the start ip? It looks unused.
> I see the "for debuggers" comment, but it looks like all the debugger
> support is done in userspace.

I notice I did not answer this question. This __rseq_table section is
populated by struct rseq_cs elements. This has two uses:

1) Interface with the kernel: only fields "post_commit_ip" and "abort_ip"
   are currently used by the kernel. This is a "critical section descriptor".
   User-space stores a pointer to this descriptor in the struct rseq
   "rseq_cs" field to tell the kernel that it needs to handle the
   rseq assembly block critical section, and it is set back to 0 when
   exiting the critical section.

2) Interface for debuggers: all three fields are used: "start_ip",
   "post_commit_ip", and "abort_ip". When a debugger single-steps
   through the rseq assembly block by placing breakpoints at the
   following instruction (I observed this behavior with gdb on arm32),
   it needs to be made aware that, when single-stepping instructions
   between "start_ip" (included) and "post_commit_ip" (excluded), it
   should also place a breakpoint at "abort_ip", or single-stepping
   would be fooled by an abort.

On 32-bit and 64-bit x86, we can combine the structures for (1) and (2)
and only keep one structure for both. The assembly fast-path can therefore
use the address within the __rseq_table section as pointer to descriptor.

On 32-bit ARM, as an optimization, we keep two copies of this structure:
one is in the __rseq_table section (for debuggers), and the other is
placed near the instruction pointer, so a cheaper ip-relative "adr"
instruction can be used to calculate the address of the descriptor.

If my understanding if correct, you suggest we do the following instead:

struct rseq_cs {
        RSEQ_FIELD_u32_u64(post_commit_ip);
        RSEQ_FIELD_u32_u64(abort_ip);
};

struct rseq_debug_cs {
        struct rseq_cs rseq_cs;
        RSEQ_FIELD_u32_u64(start_ip);
};

So we put struct rseq_debug_cs elements within the __rseq_table section,
and only expose the struct rseq_cs part to the kernel ABI.

Thinking a bit more about this, I think we should use the start_ip in the
kernel too. The idea here is that the end of critical sections (user-space fast
path) currently looks like this tangled mess (e.g. x86-64):

                "cmpl %[start_event_counter], %[current_event_counter]\n\t"
                "jnz 3f\n\t"                         <--- branch in case of failure
                "movq %[to_write], (%[target])\n\t"  <--- commit instruction
                "2:\n\t"
                "movq $0, (%[rseq_cs])\n\t"
                "jmp %l[succeed]\n\t"                <--- jump over the failure path
                "3: movq $0, (%[rseq_cs])\n\t"

Where we basically need to jump over the failure path at the end of the
successful fast path, all because the failure path needs to be placed
at addresses greater or equal to the post_commit_ip.

In the kernel, if rather than testing for:

if ((void __user *)instruction_pointer(regs) < post_commit_ip) {

we could test for both start_ip and post_commit_ip:

if ((void __user *)instruction_pointer(regs) < post_commit_ip
    && (void __user *)instruction_pointer(regs) >= start_ip) {

We could perform the failure path (storing NULL into the rseq_cs
field of struct rseq) in C rather than being required to do it in
assembly at addresses >= to post_commit_ip, all because the kernel
would test whether we are within the assembly block address range
using both the lower and upper bounds (start_ip and post_commit_ip).

The extra check with start_ip in the kernel is only done in a slow
path (in the notify resume handler), and only if the rseq_cs field
of struct rseq is non-NULL (when the kernel actually preempts or
delivers a signal over a rseq critical section), so it should not
matter at all in terms of kernel performance.

Removing this extra jump from the user-space fast-path might not have
much impact on x86, but I expect it to be more important on
architectures like ARM32, where the architecture is less forgiving
when fed sub-optimal assembly.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

      parent reply	other threads:[~2016-07-25 18:12 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-21 21:14 [RFC PATCH v7 0/7] Restartable sequences system call Mathieu Desnoyers
2016-07-21 21:14 ` [RFC PATCH v7 1/7] " Mathieu Desnoyers
2016-07-25 23:02   ` Andy Lutomirski
2016-07-26  3:02     ` Mathieu Desnoyers
2016-08-03 12:27       ` Peter Zijlstra
2016-08-03 16:37         ` Andy Lutomirski
2016-08-03 18:31           ` Christoph Lameter
2016-08-04  5:01             ` Andy Lutomirski
2016-08-04  4:27           ` Boqun Feng
2016-08-04  5:03             ` Andy Lutomirski
2016-08-09 16:13               ` Boqun Feng
2016-08-10  8:01                 ` Andy Lutomirski
2016-08-10 17:40                   ` Mathieu Desnoyers
2016-08-10 17:33                 ` Mathieu Desnoyers
2016-08-11  4:54                   ` Boqun Feng
2016-08-10  8:13               ` Andy Lutomirski
2016-08-03 18:29       ` Christoph Lameter
2016-08-10 16:47         ` Mathieu Desnoyers
2016-08-10 16:59           ` Christoph Lameter
2016-07-27 15:03   ` Boqun Feng
2016-07-27 15:05     ` [RFC 1/4] rseq/param_test: Convert test_data_entry::count to intptr_t Boqun Feng
2016-07-27 15:05       ` [RFC 2/4] Restartable sequences: powerpc architecture support Boqun Feng
2016-07-28  3:13         ` Mathieu Desnoyers
2016-07-27 15:05       ` [RFC 3/4] Restartable sequences: Wire up powerpc system call Boqun Feng
2016-07-28  3:13         ` Mathieu Desnoyers
2016-07-27 15:05       ` [RFC 4/4] Restartable sequences: Add self-tests for PPC Boqun Feng
2016-07-28  2:59         ` Mathieu Desnoyers
2016-07-28  4:43           ` Boqun Feng
2016-07-28  7:37             ` [RFC v2] " Boqun Feng
2016-07-28 14:04               ` Mathieu Desnoyers
2016-07-28 13:42             ` [RFC 4/4] " Mathieu Desnoyers
2016-07-28  3:07       ` [RFC 1/4] rseq/param_test: Convert test_data_entry::count to intptr_t Mathieu Desnoyers
2016-07-28  3:10     ` [RFC PATCH v7 1/7] Restartable sequences system call Mathieu Desnoyers
2016-08-03 13:19   ` Peter Zijlstra
2016-08-03 14:53     ` Paul E. McKenney
2016-08-03 15:45     ` Boqun Feng
2016-08-07 15:36       ` Mathieu Desnoyers
2016-08-07 23:35         ` Boqun Feng
2016-08-09 13:22           ` Mathieu Desnoyers
2016-08-09 20:06     ` Mathieu Desnoyers
2016-08-09 21:33       ` Peter Zijlstra
2016-08-09 22:41         ` Mathieu Desnoyers
2016-08-10  7:50           ` Peter Zijlstra
2016-08-10 13:26             ` Mathieu Desnoyers
2016-08-10 13:33               ` Peter Zijlstra
2016-08-10 14:04                 ` Mathieu Desnoyers
2016-08-10  8:10       ` Andy Lutomirski
2016-08-10 19:04         ` Mathieu Desnoyers
2016-08-10 19:16           ` Andy Lutomirski
2016-08-10 20:06             ` Mathieu Desnoyers
2016-08-10 20:09               ` Andy Lutomirski
2016-08-10 21:01                 ` Mathieu Desnoyers
2016-08-11  7:23                   ` Andy Lutomirski
2016-08-10  8:43       ` Peter Zijlstra
2016-08-10 13:57         ` Mathieu Desnoyers
2016-08-10 14:28           ` Peter Zijlstra
2016-08-10 14:44             ` Mathieu Desnoyers
2016-08-10 13:29       ` Peter Zijlstra
2016-07-21 21:14 ` [RFC PATCH v7 2/7] tracing: instrument restartable sequences Mathieu Desnoyers
2016-07-21 21:14 ` [RFC PATCH v7 3/7] Restartable sequences: ARM 32 architecture support Mathieu Desnoyers
2016-07-21 21:14 ` [RFC PATCH v7 4/7] Restartable sequences: wire up ARM 32 system call Mathieu Desnoyers
2016-07-21 21:14 ` [RFC PATCH v7 5/7] Restartable sequences: x86 32/64 architecture support Mathieu Desnoyers
2016-07-21 21:14 ` [RFC PATCH v7 6/7] Restartable sequences: wire up x86 32/64 system call Mathieu Desnoyers
2016-07-21 21:14 ` [RFC PATCH v7 7/7] Restartable sequences: self-tests Mathieu Desnoyers
     [not found]   ` <CO1PR15MB09822FC140F84DCEEF2004CDDD0B0@CO1PR15MB0982.namprd15.prod.outlook.com>
2016-07-24  3:09     ` Mathieu Desnoyers
2016-07-24 18:01       ` Dave Watson
2016-07-25 16:43         ` Mathieu Desnoyers
2016-08-11 23:26         ` Mathieu Desnoyers
2016-08-12  1:28           ` Boqun Feng
2016-08-12  3:10             ` Mathieu Desnoyers
2016-08-12  3:13               ` Mathieu Desnoyers
2016-08-12  5:30               ` Boqun Feng
2016-08-12 16:35                 ` Boqun Feng
2016-08-12 18:11                   ` Mathieu Desnoyers
2016-08-13  1:28                     ` Boqun Feng
2016-08-14 15:02                       ` Mathieu Desnoyers
2016-08-15  0:56                         ` Boqun Feng
2016-08-15 18:06                           ` Mathieu Desnoyers
2016-08-12 19:36           ` Mathieu Desnoyers
2016-08-12 20:05             ` Dave Watson
2016-08-14 17:09               ` Mathieu Desnoyers
2016-07-25 18:12     ` Mathieu Desnoyers [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2085982979.81688.1469470370270.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=ahh@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=bmaurer@fb.com \
    --cc=boqun.feng@gmail.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=davejwatson@fb.com \
    --cc=hpa@zytor.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox