From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>, Andrew Hunter <ahh@google.com>,
	Ben Maurer <bmaurer@fb.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH] percpu system call: fast userspace percpu critical sections
Date: Thu, 21 May 2015 19:08:02 +0000 (UTC)
Message-ID: <511681386.5786.1432235282365.JavaMail.zimbra@efficios.com>
In-Reply-To: <20150521183240.GW3644@twins.programming.kicks-ass.net>

----- Original Message -----
> On Thu, May 21, 2015 at 10:44:47AM -0400, Mathieu Desnoyers wrote:
> 
> > +struct thread_percpu_user {
> > +	int32_t nesting;
> > +	int32_t signal_sent;
> > +	int32_t signo;
> > +	int32_t current_cpu;
> > +};
> 
> I would require this thing be naturally aligned, such that it does not
> cross cacheline boundaries.

Good point. Adding a comment into the code to that effect.

> 
> > +
> > +static void percpu_user_sched_in(struct preempt_notifier *notifier, int cpu)
> > +{
> > +	struct thread_percpu_user __user *tpu_user;
> > +	struct thread_percpu_user tpu;
> > +	struct task_struct *t = current;
> > +
> > +	tpu_user = t->percpu_user;
> > +	if (tpu_user == NULL)
> > +		return;
> > +	if (unlikely(t->flags & PF_EXITING))
> > +		return;
> > +	/*
> > +	 * access_ok() of tpu_user has already been checked by sys_percpu().
> > +	 */
> > +	if (__put_user(smp_processor_id(), &tpu_user->current_cpu)) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> 
> This seems a waste; you already read the number unconditionally, might
> as well double check and avoid the store.
> 
> > +	if (__copy_from_user(&tpu, tpu_user, sizeof(tpu))) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> 
> 	if (tpu.current_cpu != smp_processor_id())
> 		__put_user();

Yep, and I could even use the "cpu" parameter received by the
function rather than smp_processor_id().
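
A rough userspace model of that reordering, as a sketch only: the
struct is the one from the patch, but model_put_cpu(), store_count and
update_current_cpu() are hypothetical stand-ins invented for
illustration (model_put_cpu() plays the role of __put_user()). The
snapshot argument models the copy already read with
__copy_from_user().

```c
#include <assert.h>
#include <stdint.h>

struct thread_percpu_user {
	int32_t nesting;
	int32_t signal_sent;
	int32_t signo;
	int32_t current_cpu;
};

/* Counts stores so the effect of the check is observable (illustration only). */
static int store_count;

/* Hypothetical stand-in for __put_user(); returns 0 on success. */
static int model_put_cpu(int32_t cpu, int32_t *dst)
{
	*dst = cpu;
	store_count++;
	return 0;
}

/*
 * Update current_cpu only when the snapshot shows a stale value.
 * "cpu" models the parameter handed to the sched-in callback.
 */
static int update_current_cpu(struct thread_percpu_user *tpu_user,
			      const struct thread_percpu_user *tpu, int cpu)
{
	assert(tpu_user && tpu);
	if (tpu->current_cpu == cpu)
		return 0;	/* already correct: skip the store */
	return model_put_cpu(cpu, &tpu_user->current_cpu);
}
```

When the thread is scheduled back on the same CPU, the user page is
then only read, never written.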

> 
> 
> 
> > +	if (!tpu.nesting || tpu.signal_sent)
> > +		return;
> > +	if (do_send_sig_info(tpu.signo, SEND_SIG_PRIV, t, 0)) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> > +	tpu.signal_sent = 1;
> > +	if (__copy_to_user(tpu_user, &tpu, sizeof(tpu))) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> > +}
> 
> Please do not use preempt notifiers for this.

Would you recommend calling a function directly from the scheduler's
finish_task_switch() instead?

> 
> Second, this all is done with preemption disabled, this means that all
> that user access can fail.

OK, this is one part I was worried about.

> 
> You print useless WARNs and misbehave. If you detect a desire to fault,
> you could delay until return to userspace and try again there. But it
> all adds complexity.

We could keep a flag, and then call the function again if we detect a
desire to fault.
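
As a sketch of the idea only (a userspace model, not kernel code):
struct task_model and both functions below are hypothetical names,
and the page_resident field merely stands in for "the access would
succeed without faulting".

```c
#include <assert.h>
#include <stdbool.h>

struct task_model {
	bool page_resident;	/* user page can be touched without faulting */
	bool update_pending;	/* set when the sched-in update was deferred */
	int  current_cpu;	/* models the user-visible current_cpu field */
};

/* Models the sched-in path: faulting is not allowed, so defer on failure. */
static void model_sched_in(struct task_model *t, int cpu)
{
	assert(t);
	if (!t->page_resident) {
		t->update_pending = true;	/* remember to retry later */
		return;
	}
	t->current_cpu = cpu;
}

/* Models return to userspace: faulting is allowed here, so retry. */
static void model_return_to_user(struct task_model *t, int cpu)
{
	assert(t);
	if (!t->update_pending)
		return;
	t->page_resident = true;	/* stands in for the fault paging it in */
	t->current_cpu = cpu;
	t->update_pending = false;
}
```

The cost is the extra state and the second call site, which is the
added complexity mentioned above.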

> 
> The big advantage pjt's scheme had is that we have the instruction
> pointer, we do not need to go read userspace memory that might not be
> there. And it being limited to a single range, while inconvenient,
> simplifies the entire kernel side to:
> 
> 	if ((unsigned long)(ip - offset) < size)
> 		do_magic();
> 
> Which is still simpler than the above.

There is one big aspect of pjt's approach that I still don't grasp
after all this time, and it worries me. How does it interact with
the following scenario?

Userspace thread
  - within the code region that needs to be restarted
    - signal handler nested on top
      - running within the signal handler code
        - preempted by kernel
          - checking instruction pointer misses the userspace stack
            underneath the signal handler.

Given this scenario, is the kernel code really as simple as a pointer check
on pt_regs, or do we need a stack walk over all signal frames? Another way
would be to check the pt_regs instruction pointer whenever we deliver
a signal, but that would require per-architecture modifications, and
suddenly becomes less straightforward.
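
To make the concern concrete, here is a small userspace model. It is
a sketch under assumptions: ip_in_region() is the quoted single-compare
range test, while the ips[] array (live instruction pointer first,
then the ones saved in nested signal frames) and the two
restart_needed_*() helpers are hypothetical and only illustrate the
scenario.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Single-compare range test: true iff offset <= ip < offset + size.
 * If ip < offset, the unsigned subtraction wraps to a huge value and
 * the comparison fails. Assumes offset + size does not overflow.
 */
static int ip_in_region(uintptr_t ip, uintptr_t offset, uintptr_t size)
{
	return (ip - offset) < size;
}

/* Looks only at the live instruction pointer (ips[0], as in pt_regs). */
static int restart_needed_regs_only(const uintptr_t *ips, int n,
				    uintptr_t offset, uintptr_t size)
{
	assert(ips && n > 0);
	return ip_in_region(ips[0], offset, size);
}

/* Also looks at the instruction pointers saved in nested signal frames. */
static int restart_needed_frame_walk(const uintptr_t *ips, int n,
				     uintptr_t offset, uintptr_t size)
{
	int i;

	assert(ips && n > 0);
	for (i = 0; i < n; i++)
		if (ip_in_region(ips[i], offset, size))
			return 1;
	return 0;
}
```

With a thread interrupted inside the region and currently running its
signal handler, the regs-only check reports nothing to restart, while
the frame walk finds the interrupted critical section.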

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
