From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
David Miller <davem@davemloft.net>,
Chris Metcalf <cmetcalf@tilera.com>,
Guan Xuetao <gxt@mprc.pku.edu.cn>,
Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>,
Mike Frysinger <vapier@gentoo.org>,
Ralf Baechle <ralf@linux-mips.org>, Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Russell King <linux@arm.linux.org.uk>,
Paul Mackerras <paulus@samba.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Paul Mundt <lethal@linux-sh.org>
Subject: Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
Date: Sun, 4 Sep 2011 16:36:43 -0700 [thread overview]
Message-ID: <20110904233643.GN2411@linux.vnet.ibm.com> (raw)
In-Reply-To: <1313861452-24783-2-git-send-email-fweisbec@gmail.com>
On Sat, Aug 20, 2011 at 07:30:49PM +0200, Frederic Weisbecker wrote:
> It is assumed that rcu won't be used once we switch to tickless
> mode and until we restart the tick. However this is not always
> true, as in x86-64 where we dereference the idle notifiers after
> the tick is stopped.
>
> To prepare for fixing this, split the tickless mode switching and
> RCU extended quiescent state logics.
> Make tick_nohz_stop/restart_sched_tick() RCU agnostic but provide
> a new pair of APIs tick_nohz_enter/exit_idle() that keep the
> old behaviour by handling both the nohz mode and RCU extended
> quiescent states, then convert every archs to use these.
>
> Archs that want to switch to RCU extended QS to some custom points
> can do it later by changing the parameter in tick_nohz_enter,exit_idle()
> to false and call rcu_enter,exit() separately.
This approach looks quite good to me! A few comments below.
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Paul Mundt <lethal@linux-sh.org>
> ---
[ . . . ]
> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index 39a2baa..b30ddf1 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -56,7 +56,7 @@ void cpu_idle(void)
>
> set_thread_flag(TIF_POLLING_NRFLAG);
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
This needs to pass false, as some variants of powerpc use event tracing
(which in turn uses RCU) just before returning to the hypervisor.
> while (!need_resched() && !cpu_should_die()) {
> ppc64_runlatch_off();
>
> @@ -93,7 +93,7 @@ void cpu_idle(void)
>
> HMT_medium();
> ppc64_runlatch_on();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
As does this, as some variants of powerpc use event tracing just after
the hypervisor returns control to the OS.
> preempt_enable_no_resched();
> if (cpu_should_die())
> cpu_die();
> diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
> index c25a081..aa2be7d 100644
> --- a/arch/powerpc/platforms/iseries/setup.c
> +++ b/arch/powerpc/platforms/iseries/setup.c
> @@ -562,7 +562,7 @@ static void yield_shared_processor(void)
> static void iseries_shared_idle(void)
> {
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
But I don't know enough about iseries to know whether or not this works.
> while (!need_resched() && !hvlpevent_is_pending()) {
> local_irq_disable();
> ppc64_runlatch_off();
> @@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
> }
>
> ppc64_runlatch_on();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
>
> if (hvlpevent_is_pending())
> process_iSeries_events();
> @@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
> set_thread_flag(TIF_POLLING_NRFLAG);
>
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> if (!need_resched()) {
> while (!need_resched()) {
> ppc64_runlatch_off();
> @@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
> }
>
> ppc64_runlatch_on();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
[ . . . ]
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index eb98e55..609bb20 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -253,12 +253,13 @@ EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
> * Called either from the idle loop or from irq_exit() when an idle period was
> * just interrupted by an interrupt which did not cause a reschedule.
> */
> -void tick_nohz_stop_sched_tick(int inidle)
> +bool tick_nohz_stop_sched_tick(int inidle)
> {
> unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
> struct tick_sched *ts;
> ktime_t last_update, expires, now;
> struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
> + int stopped = false;
> u64 time_delta;
> int cpu;
>
> @@ -405,7 +406,7 @@ void tick_nohz_stop_sched_tick(int inidle)
> ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
> ts->tick_stopped = 1;
> ts->idle_jiffies = last_jiffies;
> - rcu_enter_nohz();
> + stopped = true;
> }
>
> ts->idle_sleeps++;
> @@ -445,6 +446,24 @@ out:
> ts->sleep_length = ktime_sub(dev->next_event, now);
> end:
> local_irq_restore(flags);
> +
> + return stopped;
> +}
> +
> +
> +/**
> + * tick_nohz_enter_idle - stop the tick from the idle task
> + * @rcu_ext_qs: enter rcu dynticks idle mode
> + *
> + * When an arch doesn't make any use of rcu read side critical section
> + * between tick_nohz_enter_idle() and tick_nohz_exit_idle() it can set
> + * rcu_ext_qs to 1. Otherwise it needs to call rcu_enter_nohz() itself
> + * later before the CPU goes to sleep.
How about something like this?
+ * When an arch doesn't make any use of rcu read side critical section
+ * between tick_nohz_enter_idle() and the time the CPU is put to sleep,
+ * it can set rcu_ext_qs to true. Otherwise it needs set rcu_ext_qs to
+ * false, and then to call rcu_enter_nohz() itself later, but before the
+ * CPU goes to sleep and after its last use of RCU.
> + */
> +void tick_nohz_enter_idle(bool rcu_ext_qs)
> +{
> + if (tick_nohz_stop_sched_tick(1) && rcu_ext_qs)
> + rcu_enter_nohz();
> }
>
> /**
> @@ -486,11 +505,12 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
> }
>
> /**
> - * tick_nohz_restart_sched_tick - restart the idle tick from the idle task
> + * tick_nohz_exit_idle - restart the idle tick from the idle task
> + * @rcu_ext_qs: exit rcu dynticks idle mode
> *
> * Restart the idle tick when the CPU is woken up from idle
How about something like the following?
+ * When an arch doesn't make any use of rcu read side critical section
+ * between the time the CPU wakes up and tick_nohz_exit_idle(), it can set
+ * rcu_ext_qs to true. Otherwise it needs set rcu_ext_qs to false, and it
+ * must also have called rcu_enter_nohz() itself earlier, after the CPU
+ * was awakened, but before its first use of RCU.
> */
> -void tick_nohz_restart_sched_tick(void)
> +void tick_nohz_exit_idle(bool rcu_ext_qs)
> {
> int cpu = smp_processor_id();
> struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> @@ -514,7 +534,8 @@ void tick_nohz_restart_sched_tick(void)
>
> ts->inidle = 0;
>
> - rcu_exit_nohz();
> + if (rcu_ext_qs)
> + rcu_exit_nohz();
>
> /* Update jiffies first */
> select_nohz_load_balancer(0);
> --
> 1.7.5.4
>
next prev parent reply other threads:[~2011-09-04 23:37 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-20 17:30 [PATCH 0/4 v2] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
2011-08-20 17:39 ` Mike Frysinger
2011-08-22 2:02 ` Guan Xuetao
2011-09-04 21:01 ` Paul E. McKenney
2011-09-04 21:05 ` Frederic Weisbecker
2011-09-04 23:36 ` Paul E. McKenney [this message]
2011-09-06 14:58 ` Frederic Weisbecker
2011-09-07 14:22 ` Paul E. McKenney
2011-09-07 21:39 ` Frederic Weisbecker
2011-09-13 0:06 ` Frederic Weisbecker
2011-09-13 16:49 ` Paul E. McKenney
2011-08-20 17:30 ` [PATCH 2/4] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 3/4] x86: Call idle notifier after irq_enter() Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 4/4] rcu: Fix early call to rcu_irq_exit() Frederic Weisbecker
2011-09-04 23:38 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110904233643.GN2411@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=a.p.zijlstra@chello.nl \
--cc=cmetcalf@tilera.com \
--cc=davem@davemloft.net \
--cc=fweisbec@gmail.com \
--cc=gxt@mprc.pku.edu.cn \
--cc=hans-christian.egtvedt@atmel.com \
--cc=heiko.carstens@de.ibm.com \
--cc=hpa@zytor.com \
--cc=lethal@linux-sh.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@arm.linux.org.uk \
--cc=mingo@elte.hu \
--cc=paulus@samba.org \
--cc=ralf@linux-mips.org \
--cc=tglx@linutronix.de \
--cc=vapier@gentoo.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.