linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: fweisbec@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
	X86 ML <x86@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Rik van Riel <riel@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	williams@redhat.com
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
Date: Fri, 8 May 2015 08:37:11 +0200	[thread overview]
Message-ID: <20150508063711.GA4934@gmail.com> (raw)
In-Reply-To: <CALCETrX8zEcAXBAsut8MGU7CwEMd_dFYDsRyysY0qR18qAC0rg@mail.gmail.com>


* Andy Lutomirski <luto@amacapital.net> wrote:

> > So do you mean:
> >
> >    this_cpu_set(rcu_state) = IN_KERNEL;
> >    ...
> >    this_cpu_inc(rcu_qs_ctr);
> >    this_cpu_set(rcu_state) = IN_USER;
> >
> > ?
> >
> > So in your proposal we'd have an INC and two MOVs. I think we can make
> > it just two simple stores into a byte flag, one on entry and one on
> > exit:
> >
> >    this_cpu_set(rcu_state) = IN_KERNEL;
> >    ...
> >    this_cpu_set(rcu_state) = IN_USER;
> >
> 
> I was thinking that either a counter or a state flag could make sense.
> Doing both would be pointless.  The counter could use the low bits to
> indicate the state.  The benefit of the counter would be that the
> RCU-waiting CPU could observe that the counter has incremented and
> that therefore a grace period has elapsed.  Getting it right would
> require lots of care.

So if you mean:

       <syscall entry>
       ...
       this_cpu_inc(rcu_qs_ctr);
       <syscall exit>

I don't see how this would work reliably: how do you handle the case 
of a SCHED_FIFO task never returning from user-space (common technique 
in RT apps)? synchronize_rcu() would block indefinitely as it would 
never see rcu_qs_ctr increase.

We have to be able to observe user-mode anyway, for system-time 
statistics purposes, and that flag could IMHO also drive the RCU GC 
machinery.

> > > The problem is that I don't see how TIF_RCU_THINGY can work 
> > > reliably. If the remote CPU sets it, it'll be too late and we'll 
> > > still enter user mode without seeing it.  If it's just an 
> > > optimization, though, then it should be fine.
> >
> > Well, after setting it, the remote CPU has to re-check whether the 
> > RT CPU has entered user-mode - before it goes to wait.
> 
> How?
> 
> Suppose the exit path looked like:
> 
> this_cpu_write(rcu_state, IN_USER);
> 
> if (ti->flags & _TIF_RCU_NOTIFY) {
>     if (test_and_clear_bit(TIF_RCU_NOTIFY, &ti->flags))
>         slow_notify_rcu_that_we_are_exiting();
> }
> 
> iret or sysret;

No, it would look like this:

   this_cpu_write(rcu_state, IN_USER);
   iret or sysret;

I.e. IN_USER is set well after all notifications are checked. No 
kernel execution happens afterwards. (No extra checks added - the 
regular return-to-user-work checks would handle TIF_RCU_NOTIFY.)

( Same goes for idle: we just mark it IN_IDLE and move it back to 
  IN_KERNEL after the idling ends. )

> The RCU-waiting CPU sees that rcu_state == IN_KERNEL and sets 
> _TIF_RCU_NOTIFY.  This could happen arbitrarily late before IRET 
> because stores can be delayed.  (It could even happen after sysret, 
> IIRC, but IRET is serializing.)

All it has to do in the synchronize_rcu() slowpath is something like:

	if (per_cpu(rcu_state, rt_cpu) == IN_KERNEL) {
		smp_mb__before_atomic();
		set_tsk_thread_flag(remote_task, TIF_RCU_NOTIFY);
		smp_rmb();
		if (per_cpu(rcu_state, rt_cpu) == IN_KERNEL)
			... go wait ...
	}
	/* Cool, we observed quiescent state: */

The cost of the trivial barrier is nothing compared to the 'go wait' 
cost which we will pay in 99.9% of the cases!

> If we put an mfence after this_cpu_set or did an unconditional 
> test_and_clear_bit on ti->flags then this problem goes away, but 
> that would probably be slower than we'd like.

We are talking about a dozen cycles, while a typical synchronize_rcu() 
will wait millions (sometimes billions) of cycles. There's absolutely 
zero performance concern here and it's all CPU local in any case.

In fact a user-mode/kernel-mode flag speeds up naive implementations 
of synchronize_rcu(): because it's able to observe extended quiescent 
state immediately, without having to wait for a counter to increase 
(which was always the classic way to observe grace periods).

If all CPUs are in user mode or are idle (which is rather common!) 
then synchronize_rcu() could return almost immediately - while 
previously it had to wait for scheduling or periodic timer irqs to 
trigger on all CPUs - adding many millisecs of delay even in the best 
of cases.

Thanks,

	Ingo

  reply	other threads:[~2015-05-08  6:37 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-30 21:23 [PATCH 0/3] reduce nohz_full syscall overhead by 10% riel
2015-04-30 21:23 ` [PATCH 1/3] reduce indentation in __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 2/3] remove local_irq_save from __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry riel
2015-04-30 21:56   ` Andy Lutomirski
2015-05-01  6:40   ` Ingo Molnar
2015-05-01 15:20     ` Rik van Riel
2015-05-01 15:59       ` Ingo Molnar
2015-05-01 16:03         ` Andy Lutomirski
2015-05-01 16:21           ` Ingo Molnar
2015-05-01 16:26             ` Rik van Riel
2015-05-01 16:34               ` Ingo Molnar
2015-05-01 18:05                 ` Rik van Riel
2015-05-01 18:40                   ` Ingo Molnar
2015-05-01 19:11                     ` Rik van Riel
2015-05-01 19:37                       ` Andy Lutomirski
2015-05-02  5:27                         ` Ingo Molnar
2015-05-02 18:27                           ` Rik van Riel
2015-05-03 18:41                           ` Andy Lutomirski
2015-05-07 10:35                             ` Ingo Molnar
2015-05-04  9:26                           ` Paolo Bonzini
2015-05-04 13:30                             ` Rik van Riel
2015-05-04 14:06                             ` Rik van Riel
2015-05-04 14:19                             ` Rik van Riel
2015-05-04 15:59                             ` question about RCU dynticks_nesting Rik van Riel
2015-05-04 18:39                               ` Paul E. McKenney
2015-05-04 19:39                                 ` Rik van Riel
2015-05-04 20:02                                   ` Paul E. McKenney
2015-05-04 20:13                                     ` Rik van Riel
2015-05-04 20:38                                       ` Paul E. McKenney
2015-05-04 20:53                                         ` Rik van Riel
2015-05-05  5:54                                           ` Paul E. McKenney
2015-05-06  1:49                                             ` Mike Galbraith
2015-05-06  3:44                                               ` Mike Galbraith
2015-05-06  6:06                                                 ` Paul E. McKenney
2015-05-06  6:52                                                   ` Mike Galbraith
2015-05-06  7:01                                                     ` Mike Galbraith
2015-05-07  0:59                                           ` Frederic Weisbecker
2015-05-07 15:44                                             ` Rik van Riel
2015-05-04 19:00                               ` Rik van Riel
2015-05-04 19:39                                 ` Paul E. McKenney
2015-05-04 19:59                                   ` Rik van Riel
2015-05-04 20:40                                     ` Paul E. McKenney
2015-05-05 10:53                                   ` Peter Zijlstra
2015-05-05 12:34                                     ` Paul E. McKenney
2015-05-05 13:00                                       ` Peter Zijlstra
2015-05-05 18:35                                         ` Paul E. McKenney
2015-05-05 21:09                                           ` Rik van Riel
2015-05-06  5:41                                             ` Paul E. McKenney
2015-05-05 10:48                                 ` Peter Zijlstra
2015-05-05 10:51                                   ` Peter Zijlstra
2015-05-05 12:30                                     ` Paul E. McKenney
2015-05-02  4:06                   ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry Mike Galbraith
2015-05-01 16:37             ` Ingo Molnar
2015-05-01 16:40               ` Rik van Riel
2015-05-01 16:45                 ` Ingo Molnar
2015-05-01 16:54                   ` Rik van Riel
2015-05-01 17:12                     ` Ingo Molnar
2015-05-01 17:22                       ` Rik van Riel
2015-05-01 17:59                         ` Ingo Molnar
2015-05-01 16:22           ` Rik van Riel
2015-05-01 16:27             ` Ingo Molnar
2015-05-03 13:23       ` Mike Galbraith
2015-05-03 17:30         ` Rik van Riel
2015-05-03 18:24           ` Andy Lutomirski
2015-05-03 18:52             ` Rik van Riel
2015-05-07 10:48               ` Ingo Molnar
2015-05-07 12:18                 ` Frederic Weisbecker
2015-05-07 12:29                   ` Ingo Molnar
2015-05-07 15:47                     ` Rik van Riel
2015-05-08  7:58                       ` Ingo Molnar
2015-05-07 12:22                 ` Andy Lutomirski
2015-05-07 12:44                   ` Ingo Molnar
2015-05-07 12:49                     ` Ingo Molnar
2015-05-08  6:17                       ` Paul E. McKenney
2015-05-07 12:52                     ` Andy Lutomirski
2015-05-07 15:08                       ` Ingo Molnar
2015-05-07 17:47                         ` Andy Lutomirski
2015-05-08  6:37                           ` Ingo Molnar [this message]
2015-05-08 10:59                             ` Andy Lutomirski
2015-05-08 11:27                               ` Ingo Molnar
2015-05-08 12:56                                 ` Andy Lutomirski
2015-05-08 13:27                                   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150508063711.GA4934@gmail.com \
    --to=mingo@kernel.org \
    --cc=fweisbec@redhat.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    --cc=williams@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).