From: Chris Metcalf <cmetcalf@tilera.com>
To: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Christoph Lameter <cl@linux.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
<linaro-sched-sig@lists.linaro.org>,
Alessio Igor Bogani <abogani@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Avi Kivity <avi@redhat.com>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Geoff Levand <geoff@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Max Krasnyansky <maxk@qualcomm.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Stephen Hemminger <shemminger@vyatta.com>,
Steven Rostedt <rostedt@goodmis.org>,
Sven-Thorsten Dietrich <thebigcorporation@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Zen Lin <zen@openhuawei.org>
Subject: Re: [PATCH 11/32] nohz/cpuset: Don't turn off the tick if rcu needs it
Date: Tue, 27 Mar 2012 11:43:14 -0400
Message-ID: <4F71E012.7050907@tilera.com>
In-Reply-To: <CAOtvUMdtprdVxG3d5B-W6Y98SNg1kZms=Q+tmamqN9YkWsiWBg@mail.gmail.com>
On 3/27/2012 11:31 AM, Gilad Ben-Yossef wrote:
> On Thu, Mar 22, 2012 at 7:18 PM, Chris Metcalf <cmetcalf@tilera.com> wrote:
>> On 3/22/2012 3:38 AM, Gilad Ben-Yossef wrote:
>>> On Wed, Mar 21, 2012 at 4:54 PM, Christoph Lameter <cl@linux.com> wrote:
>>>> On Wed, 21 Mar 2012, Frederic Weisbecker wrote:
>>>>
>>>>> If RCU is waiting for the current CPU to complete a grace
>>>>> period, don't turn off the tick. Unlike dyntick-idle, we
>>>>> are not necessarily going to enter into rcu extended quiescent
>>>>> state, so we may need to keep the tick to note current CPU's
>>>>> quiescent states.
>>>> Is there any way for userspace to know that the tick is not off yet due to
>>>> this? It would make sense for us to have busy loop in user space that
>>>> waits until the OS has completed all processing if that avoids future
>>>> latencies for the application.
>>>>
>>> I previously suggested having the user register to receive a signal
>>> when the tick is turned off. Since the tick is always turned off while
>>> the user task is the current task by design, *I think* you can simply
>>> mark the signal pending when you turn the tick off.
>>>
>>> The user would register a signal handler to set a flag when it is
>>> called and then busy loop waiting for a flag to clear.
>> This sounds plausible, but the kernel would have to know that the tick not
>> only was stopped currently, but also would still be stopped when the signal
>> handler's sigreturn syscall was performed.
> Well, I'd say send a signal when the tick is turned off and another
> signal when it's turned on again.
The thing is, what our customers seem to want is to be able to tell the
kernel to go away and not bother them again, ever, as long as their
application is running correctly. Obviously if it crashes, or if some
intervention is required, or whatever, they want the kernel to step in, but
otherwise the proposed signal mechanisms don't seem to help the case that
they're interested in. I don't think we've seen a customer application
where the signal mechanism would be helpful (unfortunately, since it does
seem like a cool idea).
Basically if the kernel interrupts a nohz application core, that's a fail.
It's interesting to know that such a fail has happened, but sending a
signal just makes it an even worse fail: more overhead. One thing I could
imagine being useful would be to register a region of user memory
that the kernel could put statistics into: obviously the "bool" flag
that says whether you're running tickless, but also things like
a count of the number of interrupts (e.g. ticks, but really anything) the
kernel had to deliver, the time of the last interrupt that was delivered,
maybe some breakdown by type of interrupt, etc. Then if the application
detects an interruption, or perhaps just periodically, it can inspect that
state area and report on any bad developments. These would basically be
kernel bugs from failing to protect the nohz core the way it had asked, or
else application bugs from unintentionally requesting a kernel service.
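To make the shape of such a state area concrete, here is a minimal
userspace sketch. Everything in it is an assumption made for
illustration: struct nohz_stats, its fields, and nohz_new_interrupts()
are hypothetical names, and no such interface exists in the kernel. It
only shows how an application could notice, by polling, that interrupts
were delivered since it last looked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout of the registered region; not a real kernel ABI. */
struct nohz_stats {
    bool     tickless;     /* true while the tick is stopped on this core */
    uint64_t irq_count;    /* interrupts delivered since registration */
    uint64_t last_irq_ns;  /* time of the most recent interrupt */
};

/*
 * Return how many interrupts were delivered since the previous call and
 * update the caller's snapshot. The region is read through a volatile
 * pointer because, in this sketch, the kernel is assumed to be the writer.
 */
static uint64_t nohz_new_interrupts(const volatile struct nohz_stats *region,
                                    uint64_t *seen)
{
    assert(region != NULL && seen != NULL);
    uint64_t now = region->irq_count;
    uint64_t delta = now - *seen;
    *seen = now;
    return delta;
}
```

An application would call nohz_new_interrupts() periodically, or after
it suspects an interruption, and treat any nonzero result as something
to report.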
>> The problem we've seen is that
>> it's sometimes somewhat nondeterministic when the kernel might decide it
>> needed some more ticking, once you let kernel code start to run. For
>> example, for RCU ops the kernel can choose to ignore the nohz cpuset cores
>> when they're running userspace code only, but as soon as they get back into
>> the kernel for any reason, you may need to schedule a grace period, and so
>> just returning from the "you have no more ticks!" signal handler ends up
>> causing ticks to be scheduled.
> There is no real difference from the user's standpoint between the
> return signal syscall doing something that causes the tick to be
> turned on and an IPI or timer that turns on the tick a nanosecond
> after the signal return syscall returned.
>
> The return signal syscall setting the tick on is just a private,
> though annoying, case of the tick getting turned on by something.
Yes, but see above: the claim I'm making is that we can arrange for a
well-behaved application to *expect* not to get kernel interrupts, so if
they happen, something has gone wrong.
>> The approach we took for the Tilera dataplane mode was to have a syscall
>> that would hold the task in the kernel until any ticks were done, and only
>> then return to userspace. (This is the same set_dataplane() syscall that
>> also offers some flags to control and debug the dataplane stuff in general;
>> in fact the "hold in kernel" support is a mode we set for all syscalls, to
>> keep things deterministic.) This way the "busy loop" is done in the
>> kernel, but in fact we explicitly go into idle until the next tick, so it's
>> lower-power.
>>
> Yes, I saw that. My gripe with it is that it puts the policy of what to do
> while we wait for the tick to go away in the kernel. I usually hate the
> kernel to take decisions on what to do. I want it to give mechanisms
> and let the programmer set the policy, e.g. have an LED blink while
> you're waiting for the tick to go away so that the poor end user will
> know we are still waiting for the stars to align just right...
This is a fair point. On the other hand, the way we implemented it is
basically just a mode flag that is checked on all returns from the kernel,
that allows userspace to invoke kernel functions "synchronously", but
slowly, and not get hammered later by unexpected interrupts. So from that
point of view, we don't expect userspace to have anything useful to do on
return from syscalls or page faults other than wait in the kernel anyway.
But if the application did want to do something fancy for those few
hundredths of a second while the ticks settle, you could imagine not using
this "wait in kernel" mode, and instead spinning on the proposed data
structure described above.
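As a rough illustration of that alternative, the sketch below spins in
userspace on a flag that the kernel is assumed to set once the tick has
stopped. The flag, wait_for_tickless(), and the max_spins bound are all
hypothetical; the point is only that the waiting policy lives in the
application rather than in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Poll *tickless until it reads true or max_spins polls have been made.
 * Returns true if the flag was observed set, false if the bound ran out.
 */
static bool wait_for_tickless(atomic_bool *tickless, size_t max_spins)
{
    assert(tickless != NULL);
    for (size_t i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(tickless, memory_order_acquire))
            return true;
        /* Application policy goes here, e.g. toggling an LED. */
    }
    return false;
}
```

Unlike the "wait in kernel" mode, this burns power while it spins, so it
only makes sense when the application has something to do in the loop.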
> I'm not sure that is so big a deal, but that is why I thought of a
> signal handler.
>
>> An alternative approach, not so good for power but at least avoiding the
>> "use the kernel to avoid the kernel" aspect of signals, would be to
>> register a location in userspace that the kernel would write to when it
>> disabled the tick, and userspace could then just spin reading memory.
>>
> That's cool for letting you know when the tick goes away but not for alarming
> you when it suddenly came back... :-)
Yes, and in fact delivering a signal is not a bad way to let the
application know that either it, or the kernel, just screwed up. Currently
our dataplane code just handles this case with console backtraces (for the
"debug" mode) or by shooting down the application with SIGKILL (in "strict"
mode when it's said it wasn't going to use the kernel any more).
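For the signal variant, a minimal sketch of the application side might
look like the following. The choice of signal is an assumption (the
caller passes it in; nothing in this thread defines one), and
install_violation_handler() and nohz_violations are hypothetical names.
The handler only bumps a counter, so the cost stays small and the
application can decide later how to report it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Number of "you were interrupted" notifications received so far. */
static volatile sig_atomic_t nohz_violations;

/* Async-signal-safe handler: it only increments the counter. */
static void on_nohz_violation(int signo)
{
    (void)signo;
    nohz_violations++;
}

/* Install the handler for signo; returns 0 on success, -1 on error. */
static int install_violation_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_nohz_violation;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

The main loop would then check nohz_violations at a convenient point
instead of doing any work in signal context.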
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com