public inbox for linux-kernel@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>, Andrew Hunter <ahh@google.com>,
	Ben Maurer <bmaurer@fb.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH] percpu system call: fast userspace percpu critical sections
Date: Thu, 21 May 2015 19:08:02 +0000 (UTC)
Message-ID: <511681386.5786.1432235282365.JavaMail.zimbra@efficios.com>
In-Reply-To: <20150521183240.GW3644@twins.programming.kicks-ass.net>

----- Original Message -----
> On Thu, May 21, 2015 at 10:44:47AM -0400, Mathieu Desnoyers wrote:
> 
> > +struct thread_percpu_user {
> > +	int32_t nesting;
> > +	int32_t signal_sent;
> > +	int32_t signo;
> > +	int32_t current_cpu;
> > +};
> 
> I would require this thing be naturally aligned, such that it does not
> cross cacheline boundaries.

Good point. I will add a comment to the code to that effect.
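One way to express the natural-alignment requirement is sketched below in userspace C. It assumes GCC/Clang (for `__attribute__((aligned))`) and a cache line size that is a multiple of 16 bytes (e.g. 64). Aligning the 16-byte structure to its own size means it can never straddle such a line. The `crosses_cacheline()` helper is hypothetical and only illustrates the boundary test.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace view of the per-thread area. Aligning it to its own size
 * (16 bytes) keeps it inside one cache line, assuming the line size is
 * a multiple of 16 (e.g. 64 bytes on x86-64).
 */
struct thread_percpu_user {
	int32_t nesting;
	int32_t signal_sent;
	int32_t signo;
	int32_t current_cpu;
} __attribute__((aligned(16)));

static_assert(sizeof(struct thread_percpu_user) == 16,
	      "struct size must match its alignment");
static_assert(alignof(struct thread_percpu_user) == 16,
	      "struct must be naturally aligned");

/* Hypothetical helper: does [addr, addr + len) span two cache lines? */
static int crosses_cacheline(uintptr_t addr, size_t len, size_t line)
{
	return addr / line != (addr + len - 1) / line;
}
```

The compile-time checks make a later change to the layout (for example, adding a field) fail loudly instead of silently breaking the alignment guarantee.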

> 
> > +
> > +static void percpu_user_sched_in(struct preempt_notifier *notifier, int cpu)
> > +{
> > +	struct thread_percpu_user __user *tpu_user;
> > +	struct thread_percpu_user tpu;
> > +	struct task_struct *t = current;
> > +
> > +	tpu_user = t->percpu_user;
> > +	if (tpu_user == NULL)
> > +		return;
> > +	if (unlikely(t->flags & PF_EXITING))
> > +		return;
> > +	/*
> > +	 * access_ok() of tpu_user has already been checked by sys_percpu().
> > +	 */
> > +	if (__put_user(smp_processor_id(), &tpu_user->current_cpu)) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> 
> This seems a waste; you already read the number unconditionally, might
> as well double check and avoid the store.
> 
> > +	if (__copy_from_user(&tpu, tpu_user, sizeof(tpu))) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> 
> 	if (tpu.current_cpu != smp_processor_id())
> 		__put_user(smp_processor_id(), &tpu_user->current_cpu);

Yep, and I could even use the "cpu" parameter received by the
function rather than smp_processor_id().
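The suggested ordering (read the structure once, then store `current_cpu` only if it differs from the CPU the hook was given) can be modelled in userspace as follows. Plain memory accesses stand in for `__copy_from_user()` and `__put_user()`, and `update_current_cpu()` is a hypothetical name. Its boolean result, reporting whether a store was issued, exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct thread_percpu_user {
	int32_t nesting;
	int32_t signal_sent;
	int32_t signo;
	int32_t current_cpu;
};

/*
 * Sketch of the sched-in path. 'cpu' plays the role of the parameter the
 * hook already receives, so smp_processor_id() is not needed. Returns true
 * if a store to user memory would have been issued.
 */
static bool update_current_cpu(struct thread_percpu_user *tpu_user, int cpu)
{
	struct thread_percpu_user tpu = *tpu_user;	/* models __copy_from_user() */

	if (tpu.current_cpu == cpu)
		return false;				/* already correct: skip the store */
	tpu_user->current_cpu = cpu;			/* models __put_user() */
	return true;
}
```

When a thread is rescheduled on the same CPU, the common case, this saves one user-memory write per context switch.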

>
> > +	if (!tpu.nesting || tpu.signal_sent)
> > +		return;
> > +	if (do_send_sig_info(tpu.signo, SEND_SIG_PRIV, t, 0)) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> > +	tpu.signal_sent = 1;
> > +	if (__copy_to_user(tpu_user, &tpu, sizeof(tpu))) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> > +}
> 
> Please do not use preempt notifiers for this.

Do you recommend that we instead call a function directly from the
scheduler's finish_task_switch()?

> 
> Second, this all is done with preemption disabled, this means that all
> that user access can fail.

OK, this is one part I was worried about.

> 
> You print useless WARNs and misbehave. If you detect a desire to fault,
> you could delay until return to userspace and try again there. But it
> all adds complexity.

We could keep a flag, and call the function again on return to userspace
if we detect a desire to fault.
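A minimal userspace model of this deferral is shown below. The scheduler-side hook runs with preemption disabled and must not fault, so when the user page is not accessible it only records a pending flag. The update is then retried on the way back to userspace, where a fault can be serviced. `struct sim_task`, its `page_present` field (modelling whether a non-faulting access would succeed), and all function names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sim_task {
	int32_t user_cpu;	/* models tpu_user->current_cpu in user memory */
	bool page_present;	/* models whether a non-faulting access can succeed */
	bool retry_pending;	/* models a per-task "redo on return to user" flag */
};

/* Models a user store with page faults disabled: fails if the page is absent. */
static bool store_nofault(struct sim_task *t, int32_t cpu)
{
	if (!t->page_present)
		return false;
	t->user_cpu = cpu;
	return true;
}

/* Called with preemption disabled: never fault, never WARN; defer instead. */
static void sim_sched_in(struct sim_task *t, int32_t cpu)
{
	if (!store_nofault(t, cpu))
		t->retry_pending = true;
}

/* Called on return to userspace, where a fault can be serviced. */
static void sim_resume_user(struct sim_task *t, int32_t cpu)
{
	if (!t->retry_pending)
		return;
	t->page_present = true;	/* models the fault handler paging the memory in */
	t->user_cpu = cpu;
	t->retry_pending = false;
}
```

The cost Peter mentions is visible even in this model: the hook now has two paths, and the retry site must use the CPU current at return time rather than the one seen at sched-in.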

> 
> The big advantage pjt's scheme had is that we have the instruction
> pointer, we do not need to go read userspace memory that might not be
> there. And it being limited  to a single range, while inconvenient,
> simplifies the entire kernel side to:
> 
> 	if ((unsigned long)(ip - offset) < size)
> 		do_magic();
> 
> Which is still simpler than the above.
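The single comparison works because the subtraction is done in unsigned arithmetic. If `ip` is below `offset`, `ip - offset` wraps around to a very large value and fails the `< size` test, so one compare checks both bounds of `[offset, offset + size)`. A small standalone illustration, using the hypothetical name `ip_in_range()`:

```c
#include <assert.h>
#include <stdbool.h>

/* True iff offset <= ip < offset + size, using one unsigned comparison. */
static bool ip_in_range(unsigned long ip, unsigned long offset, unsigned long size)
{
	return ip - offset < size;	/* wraps to a huge value when ip < offset */
}
```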

There is one big aspect of pjt's approach that I still don't grasp
after all this time, and it worries me. How does it interact with
the following scenario?

Userspace thread
  - within the code region that needs to be restarted
    - signal handler nested on top
      - running within the signal handler code
        - preempted by kernel
          - checking the instruction pointer misses the userspace stack
            underneath the signal handler.

Given this scenario, is the kernel code really as simple as a pointer check
on pt_regs, or do we need a stack walk over all signal frames? Another way
would be to check the pt_regs instruction pointer whenever a signal is
delivered, but that would require per-architecture modifications, and it
suddenly becomes less straightforward.
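The signal-delivery alternative can be sketched as a userspace model. Before the handler's frame is built, the interrupted instruction pointer is compared against the critical range. If it falls inside, the instruction pointer is moved to a restart address, so the saved context never points into the region. `struct sim_regs`, `struct sim_region`, `restart_ip` and `sim_setup_signal_frame()` are all hypothetical. A real kernel would perform this check in each architecture's signal-delivery code.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_regs {
	unsigned long ip;	/* models the pt_regs instruction pointer */
};

struct sim_region {
	unsigned long start;		/* first instruction of the restartable region */
	unsigned long size;		/* length of the region in bytes */
	unsigned long restart_ip;	/* hypothetical address to resume at instead */
};

static bool in_region(const struct sim_region *r, unsigned long ip)
{
	return ip - r->start < r->size;
}

/*
 * Models signal delivery: fix up the interrupted context *before* the
 * handler's frame is built, so the saved ip never points into the region.
 * Returns the ip that would be saved in the signal frame.
 */
static unsigned long sim_setup_signal_frame(struct sim_regs *regs,
					    const struct sim_region *r)
{
	if (in_region(r, regs->ip))
		regs->ip = r->restart_ip;
	return regs->ip;
}
```

In this model, a preemption that occurs while the handler runs only needs to check the current pt_regs. Any frame underneath the handler was already moved out of the region when the signal was delivered, so no stack walk is required.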

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Thread overview: 28+ messages
2015-05-21 14:44 [RFC PATCH] percpu system call: fast userspace percpu critical sections Mathieu Desnoyers
2015-05-21 15:46 ` Josh Triplett
2015-05-21 18:58   ` Mathieu Desnoyers
2015-05-21 18:32 ` Peter Zijlstra
2015-05-21 19:08   ` Mathieu Desnoyers [this message]
2015-05-21 19:31     ` Paul Turner
2015-05-21 20:07 ` Paul Turner
2015-05-22 20:12   ` Mathieu Desnoyers
2015-05-22 20:26 ` Michael Kerrisk
2015-05-22 20:53   ` Andy Lutomirski
2015-05-22 21:34     ` Mathieu Desnoyers
2015-05-22 22:24       ` Andy Lutomirski
2015-05-23 17:09         ` Mathieu Desnoyers
2015-05-23 19:15           ` Andy Lutomirski
2015-05-25 18:30             ` Mathieu Desnoyers
2015-05-25 18:54               ` Andy Lutomirski
2015-05-26 19:57                 ` Andy Lutomirski
2015-05-26 21:04                   ` Mathieu Desnoyers
2015-05-26 21:18                     ` Andy Lutomirski
2015-05-26 21:44                       ` Andy Lutomirski
2015-05-26 20:38                 ` Mathieu Desnoyers
2015-05-26 20:58                   ` Andy Lutomirski
2015-05-26 21:20                 ` Andi Kleen
2015-05-26 21:26                   ` Andy Lutomirski
2015-05-22 22:06     ` Andrew Hunter
2015-05-23 20:11 ` Linus Torvalds
2015-05-25 20:21   ` Mathieu Desnoyers
2015-05-29 16:46     ` Christoph Lameter
