From: Ingo Molnar <mingo@kernel.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: fweisbec@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
X86 ML <x86@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Ingo Molnar <mingo@redhat.com>,
Mike Galbraith <umgwanakikbuti@gmail.com>,
Rik van Riel <riel@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
williams@redhat.com
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
Date: Fri, 8 May 2015 08:37:11 +0200 [thread overview]
Message-ID: <20150508063711.GA4934@gmail.com> (raw)
In-Reply-To: <CALCETrX8zEcAXBAsut8MGU7CwEMd_dFYDsRyysY0qR18qAC0rg@mail.gmail.com>
* Andy Lutomirski <luto@amacapital.net> wrote:
> > So do you mean:
> >
> > this_cpu_set(rcu_state) = IN_KERNEL;
> > ...
> > this_cpu_inc(rcu_qs_ctr);
> > this_cpu_set(rcu_state) = IN_USER;
> >
> > ?
> >
> > So in your proposal we'd have an INC and two MOVs. I think we can make
> > it just two simple stores into a byte flag, one on entry and one on
> > exit:
> >
> > this_cpu_set(rcu_state) = IN_KERNEL;
> > ...
> > this_cpu_set(rcu_state) = IN_USER;
> >
>
> I was thinking that either a counter or a state flag could make sense.
> Doing both would be pointless. The counter could use the low bits to
> indicate the state. The benefit of the counter would be that the
> RCU-waiting CPU could observe that the counter has incremented and
> that therefore a grace period has elapsed. Getting it right would
> require lots of care.
So if you mean:
<syscall entry>
...
this_cpu_inc(rcu_qs_ctr);
<syscall exit>
I don't see how this would work reliably: how do you handle the case
of a SCHED_FIFO task never returning from user-space (common technique
in RT apps)? synchronize_rcu() would block indefinitely as it would
never see rcu_qs_ctr increase.
We have to be able to observe user-mode anyway, for system-time
statistics purposes, and that flag could IMHO also drive the RCU GC
machinery.
> > > The problem is that I don't see how TIF_RCU_THINGY can work
> > > reliably. If the remote CPU sets it, it'll be too late and we'll
> > > still enter user mode without seeing it. If it's just an
> > > optimization, though, then it should be fine.
> >
> > Well, after setting it, the remote CPU has to re-check whether the
> > RT CPU has entered user-mode - before it goes to wait.
>
> How?
>
> Suppose the exit path looked like:
>
> this_cpu_write(rcu_state, IN_USER);
>
> if (ti->flags & _TIF_RCU_NOTIFY) {
> if (test_and_clear_bit(TIF_RCU_NOTIFY, &ti->flags))
> slow_notify_rcu_that_we_are_exiting();
> }
>
> iret or sysret;
No, it would look like this:
this_cpu_write(rcu_state, IN_USER);
iret or sysret;
I.e. IN_USER is set well after all notifications are checked. No
kernel execution happens afterwards. (No extra checks added - the
regular return-to-user-work checks would handle TIF_RCU_NOTIFY.)
( Same goes for idle: we just mark it IN_IDLE and move it back to
IN_KERNEL after the idling ends. )
> The RCU-waiting CPU sees that rcu_state == IN_KERNEL and sets
> _TIF_RCU_NOTIFY. This could happen arbitrarily late before IRET
> because stores can be delayed. (It could even happen after sysret,
> IIRC, but IRET is serializing.)
All it has to do in the synchronize_rcu() slowpath is something like:
if (per_cpu(rcu_state, rt_cpu) == IN_KERNEL) {
smp_mb__before_atomic();
set_tsk_thread_flag(remote_task, TIF_RCU_NOTIFY);
smp_rmb();
if (per_cpu(rcu_state, rt_cpu) == IN_KERNEL)
... go wait ...
}
/* Cool, we observed quiescent state: */
The cost of the trivial barrier is nothing compared to the 'go wait'
cost which we will pay in 99.9% of the cases!
> If we put an mfence after this_cpu_set or did an unconditional
> test_and_clear_bit on ti->flags then this problem goes away, but
> that would probably be slower than we'd like.
We are talking about a dozen cycles, while a typical synchronize_rcu()
will wait millions (sometimes billions) of cycles. There's absolutely
zero performance concern here and it's all CPU local in any case.
In fact a user-mode/kernel-mode flag speeds up naive implementations
of synchronize_rcu(): because it's able to observe extended quiescent
state immediately, without having to wait for a counter to increase
(which was always the classic way to observe grace periods).
If all CPUs are in user mode or are idle (which is rather common!)
then synchronize_rcu() could return almost immediately - while
previously it had to wait for scheduling or periodic timer irqs to
trigger on all CPUs - adding many millisecs of delay even in the best
of cases.
Thanks,
Ingo
next prev parent reply other threads:[~2015-05-08 6:37 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-30 21:23 [PATCH 0/3] reduce nohz_full syscall overhead by 10% riel
2015-04-30 21:23 ` [PATCH 1/3] reduce indentation in __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 2/3] remove local_irq_save from __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry riel
2015-04-30 21:56 ` Andy Lutomirski
2015-05-01 6:40 ` Ingo Molnar
2015-05-01 15:20 ` Rik van Riel
2015-05-01 15:59 ` Ingo Molnar
2015-05-01 16:03 ` Andy Lutomirski
2015-05-01 16:21 ` Ingo Molnar
2015-05-01 16:26 ` Rik van Riel
2015-05-01 16:34 ` Ingo Molnar
2015-05-01 18:05 ` Rik van Riel
2015-05-01 18:40 ` Ingo Molnar
2015-05-01 19:11 ` Rik van Riel
2015-05-01 19:37 ` Andy Lutomirski
2015-05-02 5:27 ` Ingo Molnar
2015-05-02 18:27 ` Rik van Riel
2015-05-03 18:41 ` Andy Lutomirski
2015-05-07 10:35 ` Ingo Molnar
2015-05-04 9:26 ` Paolo Bonzini
2015-05-04 13:30 ` Rik van Riel
2015-05-04 14:06 ` Rik van Riel
2015-05-04 14:19 ` Rik van Riel
2015-05-04 15:59 ` question about RCU dynticks_nesting Rik van Riel
2015-05-04 18:39 ` Paul E. McKenney
2015-05-04 19:39 ` Rik van Riel
2015-05-04 20:02 ` Paul E. McKenney
2015-05-04 20:13 ` Rik van Riel
2015-05-04 20:38 ` Paul E. McKenney
2015-05-04 20:53 ` Rik van Riel
2015-05-05 5:54 ` Paul E. McKenney
2015-05-06 1:49 ` Mike Galbraith
2015-05-06 3:44 ` Mike Galbraith
2015-05-06 6:06 ` Paul E. McKenney
2015-05-06 6:52 ` Mike Galbraith
2015-05-06 7:01 ` Mike Galbraith
2015-05-07 0:59 ` Frederic Weisbecker
2015-05-07 15:44 ` Rik van Riel
2015-05-04 19:00 ` Rik van Riel
2015-05-04 19:39 ` Paul E. McKenney
2015-05-04 19:59 ` Rik van Riel
2015-05-04 20:40 ` Paul E. McKenney
2015-05-05 10:53 ` Peter Zijlstra
2015-05-05 12:34 ` Paul E. McKenney
2015-05-05 13:00 ` Peter Zijlstra
2015-05-05 18:35 ` Paul E. McKenney
2015-05-05 21:09 ` Rik van Riel
2015-05-06 5:41 ` Paul E. McKenney
2015-05-05 10:48 ` Peter Zijlstra
2015-05-05 10:51 ` Peter Zijlstra
2015-05-05 12:30 ` Paul E. McKenney
2015-05-02 4:06 ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry Mike Galbraith
2015-05-01 16:37 ` Ingo Molnar
2015-05-01 16:40 ` Rik van Riel
2015-05-01 16:45 ` Ingo Molnar
2015-05-01 16:54 ` Rik van Riel
2015-05-01 17:12 ` Ingo Molnar
2015-05-01 17:22 ` Rik van Riel
2015-05-01 17:59 ` Ingo Molnar
2015-05-01 16:22 ` Rik van Riel
2015-05-01 16:27 ` Ingo Molnar
2015-05-03 13:23 ` Mike Galbraith
2015-05-03 17:30 ` Rik van Riel
2015-05-03 18:24 ` Andy Lutomirski
2015-05-03 18:52 ` Rik van Riel
2015-05-07 10:48 ` Ingo Molnar
2015-05-07 12:18 ` Frederic Weisbecker
2015-05-07 12:29 ` Ingo Molnar
2015-05-07 15:47 ` Rik van Riel
2015-05-08 7:58 ` Ingo Molnar
2015-05-07 12:22 ` Andy Lutomirski
2015-05-07 12:44 ` Ingo Molnar
2015-05-07 12:49 ` Ingo Molnar
2015-05-08 6:17 ` Paul E. McKenney
2015-05-07 12:52 ` Andy Lutomirski
2015-05-07 15:08 ` Ingo Molnar
2015-05-07 17:47 ` Andy Lutomirski
2015-05-08 6:37 ` Ingo Molnar [this message]
2015-05-08 10:59 ` Andy Lutomirski
2015-05-08 11:27 ` Ingo Molnar
2015-05-08 12:56 ` Andy Lutomirski
2015-05-08 13:27 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150508063711.GA4934@gmail.com \
--to=mingo@kernel.org \
--cc=fweisbec@redhat.com \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=umgwanakikbuti@gmail.com \
--cc=williams@redhat.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).