* [PATCH 0/4 v2] rcu: Fix some rcu uses in extended quiescent state
@ 2011-08-20 17:30 Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2011-08-20 17:30 UTC
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
H. Peter Anvin, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle,
Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt,
Ingo Molnar
I have changed the first patch a bit by adding an "rcu_ext_qs"
parameter to tick_nohz_enter_idle(), so we just need to set that
parameter to false when rcu_enter_nohz() is called separately.
That's less confusing and more consistent than calling
tick_nohz_stop_sched_tick() instead of tick_nohz_enter_idle()
when the RCU dynticks idle mode is handled separately.
It's an important enough change that I thought I had to drop
all the acks on the first patch.
This version also adds, in patch 4, a fix for a newly detected bug.
The powerpc case is trickier, so I prefer to get this pile
handled first.
Thanks.
Frederic Weisbecker (4):
nohz: Split extended quiescent state handling from nohz switch
x86: Enter rcu extended qs after idle notifier call
x86: Call idle notifier after irq_enter()
rcu: Fix early call to rcu_irq_exit()
arch/arm/kernel/process.c | 4 +-
arch/avr32/kernel/process.c | 4 +-
arch/blackfin/kernel/process.c | 4 +-
arch/microblaze/kernel/process.c | 4 +-
arch/mips/kernel/process.c | 4 +-
arch/powerpc/kernel/idle.c | 4 +-
arch/powerpc/platforms/iseries/setup.c | 8 +++---
arch/s390/kernel/process.c | 4 +-
arch/sh/kernel/idle.c | 2 +-
arch/sparc/kernel/process_64.c | 4 +-
arch/tile/kernel/process.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/unicore32/kernel/process.c | 4 +-
arch/x86/kernel/apic/apic.c | 6 ++--
arch/x86/kernel/apic/io_apic.c | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
arch/x86/kernel/cpu/mcheck/therm_throt.c | 2 +-
arch/x86/kernel/cpu/mcheck/threshold.c | 2 +-
arch/x86/kernel/irq.c | 6 ++--
arch/x86/kernel/process_32.c | 4 +-
arch/x86/kernel/process_64.c | 9 ++++++-
include/linux/tick.h | 9 ++++---
kernel/softirq.c | 2 +-
kernel/time/tick-sched.c | 31 +++++++++++++++++++++++++----
24 files changed, 78 insertions(+), 51 deletions(-)
--
1.7.5.4
* [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-08-20 17:30 [PATCH 0/4 v2] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
@ 2011-08-20 17:30 ` Frederic Weisbecker
2011-08-20 17:39 ` Mike Frysinger
` (3 more replies)
2011-08-20 17:30 ` [PATCH 2/4] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
` (2 subsequent siblings)
3 siblings, 4 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2011-08-20 17:30 UTC
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, David Miller, Chris Metcalf,
Guan Xuetao, Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle,
Ingo Molnar, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin,
Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt
It is assumed that RCU won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true: on x86-64, for instance, we dereference the idle notifiers
after the tick is stopped.
To prepare for fixing this, split the tickless mode switching and
the RCU extended quiescent state logic.
Make tick_nohz_stop/restart_sched_tick() RCU agnostic, but provide
a new pair of APIs, tick_nohz_enter/exit_idle(), that keep the
old behaviour by handling both the nohz mode and the RCU extended
quiescent state, then convert every arch to use them.
Archs that want to switch to the RCU extended QS at some custom
point can do so later by changing the parameter of
tick_nohz_enter,exit_idle() to false and calling rcu_enter,exit_nohz()
separately.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
---
arch/arm/kernel/process.c | 4 ++--
arch/avr32/kernel/process.c | 4 ++--
arch/blackfin/kernel/process.c | 4 ++--
arch/microblaze/kernel/process.c | 4 ++--
arch/mips/kernel/process.c | 4 ++--
arch/powerpc/kernel/idle.c | 4 ++--
arch/powerpc/platforms/iseries/setup.c | 8 ++++----
arch/s390/kernel/process.c | 4 ++--
arch/sh/kernel/idle.c | 2 +-
arch/sparc/kernel/process_64.c | 4 ++--
arch/tile/kernel/process.c | 4 ++--
arch/um/kernel/process.c | 4 ++--
arch/unicore32/kernel/process.c | 4 ++--
arch/x86/kernel/process_32.c | 4 ++--
arch/x86/kernel/process_64.c | 4 ++--
include/linux/tick.h | 9 +++++----
kernel/time/tick-sched.c | 31 ++++++++++++++++++++++++++-----
17 files changed, 62 insertions(+), 40 deletions(-)
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 5e1e541..4d2e33e 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -182,7 +182,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
leds_event(led_idle_start);
while (!need_resched()) {
#ifdef CONFIG_HOTPLUG_CPU
@@ -208,7 +208,7 @@ void cpu_idle(void)
}
}
leds_event(led_idle_end);
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index ef5a2a0..a8fd60b 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched())
cpu_idle_sleep();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 6a660fa..e032471 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@ void cpu_idle(void)
#endif
if (!idle)
idle = default_idle;
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched())
idle();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 968648a..7134887 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@ void cpu_idle(void)
if (!idle)
idle = default_idle;
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched())
idle();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d2112d3..d79e7d3 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched() && cpu_online(cpu)) {
#ifdef CONFIG_MIPS_MT_SMTC
extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
system_state == SYSTEM_BOOTING))
play_dead();
#endif
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..b30ddf1 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched() && !cpu_should_die()) {
ppc64_runlatch_off();
@@ -93,7 +93,7 @@ void cpu_idle(void)
HMT_medium();
ppc64_runlatch_on();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
if (cpu_should_die())
cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index c25a081..aa2be7d 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
static void iseries_shared_idle(void)
{
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched() && !hvlpevent_is_pending()) {
local_irq_disable();
ppc64_runlatch_off();
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
}
ppc64_runlatch_on();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
if (hvlpevent_is_pending())
process_iSeries_events();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
if (!need_resched()) {
while (!need_resched()) {
ppc64_runlatch_off();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
}
ppc64_runlatch_on();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 541a750..b19f7b2 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@ static void default_idle(void)
void cpu_idle(void)
{
for (;;) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched())
default_idle();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 425d604..d8997ac 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -88,7 +88,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched()) {
check_pgt_cache();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index c158a95..986d019 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while(1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched() && !cpu_is_offline(cpu))
sparc64_yield(cpu);
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 9c45d8b..63fc9c0 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched()) {
if (cpu_is_offline(cpu))
BUG(); /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@ void cpu_idle(void)
local_irq_enable();
current_thread_info()->status |= TS_POLLING;
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index fab4371..c18786f 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@ void default_idle(void)
if (need_resched())
schedule();
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
nsecs = disable_timer();
idle_sleep(nsecs);
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
}
}
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index ba401df..353b047 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched()) {
local_irq_disable();
stop_critical_timings();
@@ -63,7 +63,7 @@ void cpu_idle(void)
local_irq_enable();
start_critical_timings();
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a3d0dc5..2cbf235 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -97,7 +97,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched()) {
check_pgt_cache();
@@ -112,7 +112,7 @@ void cpu_idle(void)
pm_idle();
start_critical_timings();
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ca6f7ab..3d56f0d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -120,7 +120,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle(true);
while (!need_resched()) {
rmb();
@@ -145,7 +145,7 @@ void cpu_idle(void)
__exit_idle();
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle(true);
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index b232ccc..4cbbcbb 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -121,14 +121,15 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
#endif /* !CONFIG_GENERIC_CLOCKEVENTS */
# ifdef CONFIG_NO_HZ
-extern void tick_nohz_stop_sched_tick(int inidle);
-extern void tick_nohz_restart_sched_tick(void);
+extern bool tick_nohz_stop_sched_tick(int inidle);
+extern void tick_nohz_enter_idle(bool rcu_ext_qs);
+extern void tick_nohz_exit_idle(bool rcu_ext_qs);
extern ktime_t tick_nohz_get_sleep_length(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
# else
-static inline void tick_nohz_stop_sched_tick(int inidle) { }
-static inline void tick_nohz_restart_sched_tick(void) { }
+static inline void tick_nohz_enter_idle(bool rcu_ext_qs) { }
+static inline void tick_nohz_exit_idle(bool rcu_ext_qs) { }
static inline ktime_t tick_nohz_get_sleep_length(void)
{
ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index eb98e55..609bb20 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -253,12 +253,13 @@ EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
* Called either from the idle loop or from irq_exit() when an idle period was
* just interrupted by an interrupt which did not cause a reschedule.
*/
-void tick_nohz_stop_sched_tick(int inidle)
+bool tick_nohz_stop_sched_tick(int inidle)
{
unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
struct tick_sched *ts;
ktime_t last_update, expires, now;
struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
+	bool stopped = false;
u64 time_delta;
int cpu;
@@ -405,7 +406,7 @@ void tick_nohz_stop_sched_tick(int inidle)
ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
ts->tick_stopped = 1;
ts->idle_jiffies = last_jiffies;
- rcu_enter_nohz();
+ stopped = true;
}
ts->idle_sleeps++;
@@ -445,6 +446,24 @@ out:
ts->sleep_length = ktime_sub(dev->next_event, now);
end:
local_irq_restore(flags);
+
+ return stopped;
+}
+
+
+/**
+ * tick_nohz_enter_idle - stop the tick from the idle task
+ * @rcu_ext_qs: enter rcu dynticks idle mode
+ *
+ * When an arch doesn't make any use of RCU read side critical sections
+ * between tick_nohz_enter_idle() and tick_nohz_exit_idle(), it can set
+ * rcu_ext_qs to true. Otherwise it needs to call rcu_enter_nohz() itself
+ * later, before the CPU goes to sleep.
+ */
+void tick_nohz_enter_idle(bool rcu_ext_qs)
+{
+ if (tick_nohz_stop_sched_tick(1) && rcu_ext_qs)
+ rcu_enter_nohz();
}
/**
@@ -486,11 +505,12 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
}
/**
- * tick_nohz_restart_sched_tick - restart the idle tick from the idle task
+ * tick_nohz_exit_idle - restart the idle tick from the idle task
+ * @rcu_ext_qs: exit rcu dynticks idle mode
*
* Restart the idle tick when the CPU is woken up from idle
*/
-void tick_nohz_restart_sched_tick(void)
+void tick_nohz_exit_idle(bool rcu_ext_qs)
{
int cpu = smp_processor_id();
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
@@ -514,7 +534,8 @@ void tick_nohz_restart_sched_tick(void)
ts->inidle = 0;
- rcu_exit_nohz();
+ if (rcu_ext_qs)
+ rcu_exit_nohz();
/* Update jiffies first */
select_nohz_load_balancer(0);
--
1.7.5.4
* [PATCH 2/4] x86: Enter rcu extended qs after idle notifier call
2011-08-20 17:30 [PATCH 0/4 v2] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
@ 2011-08-20 17:30 ` Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 3/4] x86: Call idle notifier after irq_enter() Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 4/4] rcu: Fix early call to rcu_irq_exit() Frederic Weisbecker
3 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2011-08-20 17:30 UTC
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin
The idle notifier, called by enter_idle(), enters an RCU read
side critical section, but at that time we have already switched
into RCU dynticks idle mode, and it's illegal to use rcu_read_lock()
in that state.
This results in RCU reporting its bad mood:
[ 1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[ 1.275635] Hardware name: AMD690VM-FMH
[ 1.275635] Modules linked in:
[ 1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
[ 1.275635] Call Trace:
[ 1.275635] [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[ 1.275635] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[ 1.275635] [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
[ 1.275635] [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
[ 1.275635] [<ffffffff810018a0>] enter_idle+0x20/0x30
[ 1.275635] [<ffffffff81001995>] cpu_idle+0xa5/0x110
[ 1.275635] [<ffffffff817a7465>] rest_init+0xe5/0x140
[ 1.275635] [<ffffffff817a73c8>] ? rest_init+0x48/0x140
[ 1.275635] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[ 1.275635] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[ 1.275635] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
[ 1.275635] ---[ end trace a22d306b065d4a66 ]---
Fix this by entering the RCU extended quiescent state later, just
before the CPU goes to sleep.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/process_64.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d56f0d..1efd185 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -120,7 +120,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_enter_idle(true);
+ tick_nohz_enter_idle(false);
while (!need_resched()) {
rmb();
@@ -136,7 +136,12 @@ void cpu_idle(void)
enter_idle();
/* Don't trace irqs off for idle */
stop_critical_timings();
+
+ /* enter_idle() needs rcu for notifiers */
+ rcu_enter_nohz();
pm_idle();
+ rcu_exit_nohz();
+
start_critical_timings();
/* In many cases the interrupt that ended idle
@@ -145,7 +150,7 @@ void cpu_idle(void)
__exit_idle();
}
- tick_nohz_exit_idle(true);
+ tick_nohz_exit_idle(false);
preempt_enable_no_resched();
schedule();
preempt_disable();
--
1.7.5.4
* [PATCH 3/4] x86: Call idle notifier after irq_enter()
2011-08-20 17:30 [PATCH 0/4 v2] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 2/4] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
@ 2011-08-20 17:30 ` Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 4/4] rcu: Fix early call to rcu_irq_exit() Frederic Weisbecker
3 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2011-08-20 17:30 UTC
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin
Interrupts notify the idle exit state before calling irq_enter().
But the notifier code calls rcu_read_lock(), and this is not
allowed while RCU is in an extended quiescent state. We need
to wait for rcu_irq_enter() to be called before doing so,
otherwise we get a grumpy RCU:
[ 0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[ 0.099991] Hardware name: AMD690VM-FMH
[ 0.099991] Modules linked in:
[ 0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
[ 0.099991] Call Trace:
[ 0.099991] <IRQ> [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[ 0.099991] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[ 0.099991] [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
[ 0.099991] [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
[ 0.099991] [<ffffffff81001873>] exit_idle+0x43/0x50
[ 0.099991] [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
[ 0.099991] [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
[ 0.099991] <EOI> [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
[ 0.099991] [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
[ 0.099991] [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
[ 0.099991] [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
[ 0.099991] [<ffffffff810019a0>] cpu_idle+0xb0/0x110
[ 0.099991] [<ffffffff817a7505>] rest_init+0xe5/0x140
[ 0.099991] [<ffffffff817a7468>] ? rest_init+0x48/0x140
[ 0.099991] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[ 0.099991] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[ 0.099991] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/apic/apic.c | 6 +++---
arch/x86/kernel/apic/io_apic.c | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
arch/x86/kernel/cpu/mcheck/therm_throt.c | 2 +-
arch/x86/kernel/cpu/mcheck/threshold.c | 2 +-
arch/x86/kernel/irq.c | 6 +++---
6 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b9338b8..82af921 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -856,8 +856,8 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
* Besides, if we don't timer interrupts ignore the global
* interrupt lock, which is the WrongThing (tm) to do.
*/
- exit_idle();
irq_enter();
+ exit_idle();
local_apic_timer_interrupt();
irq_exit();
@@ -1790,8 +1790,8 @@ void smp_spurious_interrupt(struct pt_regs *regs)
{
u32 v;
- exit_idle();
irq_enter();
+ exit_idle();
/*
* Check if this really is a spurious interrupt and ACK it
* if it is a vectored one. Just in case...
@@ -1827,8 +1827,8 @@ void smp_error_interrupt(struct pt_regs *regs)
"Illegal register address", /* APIC Error Bit 7 */
};
- exit_idle();
irq_enter();
+ exit_idle();
/* First tickle the hardware, only then report what went on. -- REW */
v0 = apic_read(APIC_ESR);
apic_write(APIC_ESR, 0);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index e529339..e1b5eec 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2275,8 +2275,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
unsigned vector, me;
ack_APIC_irq();
- exit_idle();
irq_enter();
+ exit_idle();
me = smp_processor_id();
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index ff1ae9b..7ba9757 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -470,8 +470,8 @@ static inline void mce_get_rip(struct mce *m, struct pt_regs *regs)
asmlinkage void smp_mce_self_interrupt(struct pt_regs *regs)
{
ack_APIC_irq();
- exit_idle();
irq_enter();
+ exit_idle();
mce_notify_irq();
mce_schedule_work();
irq_exit();
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 27c6251..f6bbc64 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -396,8 +396,8 @@ static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
{
- exit_idle();
irq_enter();
+ exit_idle();
inc_irq_stat(irq_thermal_count);
smp_thermal_vector();
irq_exit();
diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
index d746df2..aa578ca 100644
--- a/arch/x86/kernel/cpu/mcheck/threshold.c
+++ b/arch/x86/kernel/cpu/mcheck/threshold.c
@@ -19,8 +19,8 @@ void (*mce_threshold_vector)(void) = default_threshold_interrupt;
asmlinkage void smp_threshold_interrupt(void)
{
- exit_idle();
irq_enter();
+ exit_idle();
inc_irq_stat(irq_threshold_count);
mce_threshold_vector();
irq_exit();
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 6c0802e..73cf928 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -180,8 +180,8 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
unsigned vector = ~regs->orig_ax;
unsigned irq;
- exit_idle();
irq_enter();
+ exit_idle();
irq = __this_cpu_read(vector_irq[vector]);
@@ -208,10 +208,10 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
ack_APIC_irq();
- exit_idle();
-
irq_enter();
+ exit_idle();
+
inc_irq_stat(x86_platform_ipis);
if (x86_platform_ipi_callback)
--
1.7.5.4
* [PATCH 4/4] rcu: Fix early call to rcu_irq_exit()
2011-08-20 17:30 [PATCH 0/4 v2] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
` (2 preceding siblings ...)
2011-08-20 17:30 ` [PATCH 3/4] x86: Call idle notifier after irq_enter() Frederic Weisbecker
@ 2011-08-20 17:30 ` Frederic Weisbecker
2011-09-04 23:38 ` Paul E. McKenney
3 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2011-08-20 17:30 UTC
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
Peter Zijlstra
On the irq exit path, tick_nohz_stop_sched_tick()
may raise a softirq, which leads to the wakeup
path and to select_task_rq_fair(), which uses RCU
to iterate the scheduler domains.
This is an illegal use of RCU because we may be in dyntick
idle mode:
[ 132.978883] ===============================
[ 132.978883] [ INFO: suspicious RCU usage. ]
[ 132.978883] -------------------------------
[ 132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
[ 132.978883]
[ 132.978883] other info that might help us debug this:
[ 132.978883]
[ 132.978883]
[ 132.978883] rcu_scheduler_active = 1, debug_locks = 0
[ 132.978883] RCU used illegally from extended quiescent state!
[ 132.978883] 2 locks held by swapper/0:
[ 132.978883] #0: (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
[ 132.978883] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
[ 132.978883]
[ 132.978883] stack backtrace:
[ 132.978883] Pid: 0, comm: swapper Tainted: G W 3.0.0+ #178
[ 132.978883] Call Trace:
[ 132.978883] <IRQ> [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
[ 132.978883] [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
[ 132.978883] [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
[ 132.978883] [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
[ 132.978883] [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[ 132.978883] [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
[ 132.978883] [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
[ 132.978883] [<ffffffff8105aa35>] wake_up_process+0x15/0x20
[ 132.978883] [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
[ 132.978883] [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
[ 132.978883] [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
[ 132.978883] [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
[ 132.978883] [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
[ 132.978883] [<ffffffff810694f2>] irq_exit+0xd2/0x100
[ 132.978883] [<ffffffff81829e96>] do_IRQ+0x66/0xe0
[ 132.978883] [<ffffffff81820d53>] common_interrupt+0x13/0x13
[ 132.978883] <EOI> [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
[ 132.978883] [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[ 132.978883] [<ffffffff810144ea>] default_idle+0xba/0x370
[ 132.978883] [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
[ 132.978883] [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
[ 132.978883] [<ffffffff817f217f>] rest_init+0xef/0x150
[ 132.978883] [<ffffffff817f20e2>] ? rest_init+0x52/0x150
[ 132.978883] [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
[ 132.978883] [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
[ 132.978883] [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112
Fix this by calling rcu_irq_exit() after tick_nohz_stop_sched_tick().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/softirq.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index fca82c3..d4047b8 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -347,12 +347,12 @@ void irq_exit(void)
if (!in_interrupt() && local_softirq_pending())
invoke_softirq();
- rcu_irq_exit();
#ifdef CONFIG_NO_HZ
/* Make sure that timer wheel updates are propagated */
if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
tick_nohz_stop_sched_tick(0);
#endif
+ rcu_irq_exit();
preempt_enable_no_resched();
}
--
1.7.5.4
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-08-20 17:30 ` [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
@ 2011-08-20 17:39 ` Mike Frysinger
2011-08-22 2:02 ` Guan Xuetao
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: Mike Frysinger @ 2011-08-20 17:39 UTC
To: Frederic Weisbecker
Cc: Paul E. McKenney, LKML, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt
Acked-by: Mike Frysinger <vapier@gentoo.org>
-mike
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-08-20 17:30 ` [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
2011-08-20 17:39 ` Mike Frysinger
@ 2011-08-22 2:02 ` Guan Xuetao
2011-09-04 21:01 ` Paul E. McKenney
2011-09-04 23:36 ` Paul E. McKenney
3 siblings, 0 replies; 16+ messages in thread
From: Guan Xuetao @ 2011-08-22 2:02 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Paul E. McKenney, LKML, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle, Ingo Molnar,
Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, Russell King,
Paul Mackerras, Heiko Carstens, Paul Mundt
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Thanks,
Guan Xuetao
On Sat, 2011-08-20 at 19:30 +0200, Frederic Weisbecker wrote:
> It is assumed that RCU won't be used once we switch to tickless
> mode and until we restart the tick. However this is not always
> true, as on x86-64 where we dereference the idle notifiers after
> the tick is stopped.
>
> To prepare for fixing this, split the tickless mode switching from
> the RCU extended quiescent state logic.
> Make tick_nohz_stop/restart_sched_tick() RCU agnostic, but provide
> a new pair of APIs, tick_nohz_enter/exit_idle(), that keep the
> old behaviour by handling both the nohz mode and the RCU extended
> quiescent state, then convert every arch to use these.
>
> Archs that want to switch to the RCU extended QS at some custom point
> can do it later by changing the parameter to tick_nohz_enter,exit_idle()
> to false and calling rcu_enter,exit_nohz() separately.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Paul Mundt <lethal@linux-sh.org>
> ---
> arch/arm/kernel/process.c | 4 ++--
> arch/avr32/kernel/process.c | 4 ++--
> arch/blackfin/kernel/process.c | 4 ++--
> arch/microblaze/kernel/process.c | 4 ++--
> arch/mips/kernel/process.c | 4 ++--
> arch/powerpc/kernel/idle.c | 4 ++--
> arch/powerpc/platforms/iseries/setup.c | 8 ++++----
> arch/s390/kernel/process.c | 4 ++--
> arch/sh/kernel/idle.c | 2 +-
> arch/sparc/kernel/process_64.c | 4 ++--
> arch/tile/kernel/process.c | 4 ++--
> arch/um/kernel/process.c | 4 ++--
> arch/unicore32/kernel/process.c | 4 ++--
> arch/x86/kernel/process_32.c | 4 ++--
> arch/x86/kernel/process_64.c | 4 ++--
> include/linux/tick.h | 9 +++++----
> kernel/time/tick-sched.c | 31 ++++++++++++++++++++++++++-----
> 17 files changed, 62 insertions(+), 40 deletions(-)
>
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index 5e1e541..4d2e33e 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -182,7 +182,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> leds_event(led_idle_start);
> while (!need_resched()) {
> #ifdef CONFIG_HOTPLUG_CPU
> @@ -208,7 +208,7 @@ void cpu_idle(void)
> }
> }
> leds_event(led_idle_end);
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
> index ef5a2a0..a8fd60b 100644
> --- a/arch/avr32/kernel/process.c
> +++ b/arch/avr32/kernel/process.c
> @@ -34,10 +34,10 @@ void cpu_idle(void)
> {
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched())
> cpu_idle_sleep();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
> index 6a660fa..e032471 100644
> --- a/arch/blackfin/kernel/process.c
> +++ b/arch/blackfin/kernel/process.c
> @@ -88,10 +88,10 @@ void cpu_idle(void)
> #endif
> if (!idle)
> idle = default_idle;
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched())
> idle();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
> index 968648a..7134887 100644
> --- a/arch/microblaze/kernel/process.c
> +++ b/arch/microblaze/kernel/process.c
> @@ -103,10 +103,10 @@ void cpu_idle(void)
> if (!idle)
> idle = default_idle;
>
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched())
> idle();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
>
> preempt_enable_no_resched();
> schedule();
> diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
> index d2112d3..d79e7d3 100644
> --- a/arch/mips/kernel/process.c
> +++ b/arch/mips/kernel/process.c
> @@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched() && cpu_online(cpu)) {
> #ifdef CONFIG_MIPS_MT_SMTC
> extern void smtc_idle_loop_hook(void);
> @@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
> system_state == SYSTEM_BOOTING))
> play_dead();
> #endif
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index 39a2baa..b30ddf1 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -56,7 +56,7 @@ void cpu_idle(void)
>
> set_thread_flag(TIF_POLLING_NRFLAG);
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched() && !cpu_should_die()) {
> ppc64_runlatch_off();
>
> @@ -93,7 +93,7 @@ void cpu_idle(void)
>
> HMT_medium();
> ppc64_runlatch_on();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> if (cpu_should_die())
> cpu_die();
> diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
> index c25a081..aa2be7d 100644
> --- a/arch/powerpc/platforms/iseries/setup.c
> +++ b/arch/powerpc/platforms/iseries/setup.c
> @@ -562,7 +562,7 @@ static void yield_shared_processor(void)
> static void iseries_shared_idle(void)
> {
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched() && !hvlpevent_is_pending()) {
> local_irq_disable();
> ppc64_runlatch_off();
> @@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
> }
>
> ppc64_runlatch_on();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
>
> if (hvlpevent_is_pending())
> process_iSeries_events();
> @@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
> set_thread_flag(TIF_POLLING_NRFLAG);
>
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> if (!need_resched()) {
> while (!need_resched()) {
> ppc64_runlatch_off();
> @@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
> }
>
> ppc64_runlatch_on();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
> index 541a750..b19f7b2 100644
> --- a/arch/s390/kernel/process.c
> +++ b/arch/s390/kernel/process.c
> @@ -90,10 +90,10 @@ static void default_idle(void)
> void cpu_idle(void)
> {
> for (;;) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched())
> default_idle();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
> index 425d604..d8997ac 100644
> --- a/arch/sh/kernel/idle.c
> +++ b/arch/sh/kernel/idle.c
> @@ -88,7 +88,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
>
> while (!need_resched()) {
> check_pgt_cache();
> diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
> index c158a95..986d019 100644
> --- a/arch/sparc/kernel/process_64.c
> +++ b/arch/sparc/kernel/process_64.c
> @@ -95,12 +95,12 @@ void cpu_idle(void)
> set_thread_flag(TIF_POLLING_NRFLAG);
>
> while(1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
>
> while (!need_resched() && !cpu_is_offline(cpu))
> sparc64_yield(cpu);
>
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
>
> preempt_enable_no_resched();
>
> diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
> index 9c45d8b..63fc9c0 100644
> --- a/arch/tile/kernel/process.c
> +++ b/arch/tile/kernel/process.c
> @@ -85,7 +85,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched()) {
> if (cpu_is_offline(cpu))
> BUG(); /* no HOTPLUG_CPU */
> @@ -105,7 +105,7 @@ void cpu_idle(void)
> local_irq_enable();
> current_thread_info()->status |= TS_POLLING;
> }
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
> index fab4371..c18786f 100644
> --- a/arch/um/kernel/process.c
> +++ b/arch/um/kernel/process.c
> @@ -245,10 +245,10 @@ void default_idle(void)
> if (need_resched())
> schedule();
>
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> nsecs = disable_timer();
> idle_sleep(nsecs);
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> }
> }
>
> diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
> index ba401df..353b047 100644
> --- a/arch/unicore32/kernel/process.c
> +++ b/arch/unicore32/kernel/process.c
> @@ -55,7 +55,7 @@ void cpu_idle(void)
> {
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched()) {
> local_irq_disable();
> stop_critical_timings();
> @@ -63,7 +63,7 @@ void cpu_idle(void)
> local_irq_enable();
> start_critical_timings();
> }
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index a3d0dc5..2cbf235 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -97,7 +97,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched()) {
>
> check_pgt_cache();
> @@ -112,7 +112,7 @@ void cpu_idle(void)
> pm_idle();
> start_critical_timings();
> }
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index ca6f7ab..3d56f0d 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -120,7 +120,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> while (!need_resched()) {
>
> rmb();
> @@ -145,7 +145,7 @@ void cpu_idle(void)
> __exit_idle();
> }
>
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index b232ccc..4cbbcbb 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -121,14 +121,15 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
> #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
>
> # ifdef CONFIG_NO_HZ
> -extern void tick_nohz_stop_sched_tick(int inidle);
> -extern void tick_nohz_restart_sched_tick(void);
> +extern bool tick_nohz_stop_sched_tick(int inidle);
> +extern void tick_nohz_enter_idle(bool rcu_ext_qs);
> +extern void tick_nohz_exit_idle(bool rcu_ext_qs);
> extern ktime_t tick_nohz_get_sleep_length(void);
> extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> # else
> -static inline void tick_nohz_stop_sched_tick(int inidle) { }
> -static inline void tick_nohz_restart_sched_tick(void) { }
> +static inline void tick_nohz_enter_idle(bool rcu_ext_qs) { }
> +static inline void tick_nohz_exit_idle(bool rcu_ext_qs) { }
> static inline ktime_t tick_nohz_get_sleep_length(void)
> {
> ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index eb98e55..609bb20 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -253,12 +253,13 @@ EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
> * Called either from the idle loop or from irq_exit() when an idle period was
> * just interrupted by an interrupt which did not cause a reschedule.
> */
> -void tick_nohz_stop_sched_tick(int inidle)
> +bool tick_nohz_stop_sched_tick(int inidle)
> {
> unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
> struct tick_sched *ts;
> ktime_t last_update, expires, now;
> struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
> + int stopped = false;
> u64 time_delta;
> int cpu;
>
> @@ -405,7 +406,7 @@ void tick_nohz_stop_sched_tick(int inidle)
> ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
> ts->tick_stopped = 1;
> ts->idle_jiffies = last_jiffies;
> - rcu_enter_nohz();
> + stopped = true;
> }
>
> ts->idle_sleeps++;
> @@ -445,6 +446,24 @@ out:
> ts->sleep_length = ktime_sub(dev->next_event, now);
> end:
> local_irq_restore(flags);
> +
> + return stopped;
> +}
> +
> +
> +/**
> + * tick_nohz_enter_idle - stop the tick from the idle task
> + * @rcu_ext_qs: enter rcu dynticks idle mode
> + *
> + * When an arch doesn't make any use of RCU read-side critical sections
> + * between tick_nohz_enter_idle() and tick_nohz_exit_idle() it can set
> + * rcu_ext_qs to true. Otherwise it needs to call rcu_enter_nohz() itself
> + * later before the CPU goes to sleep.
> + */
> +void tick_nohz_enter_idle(bool rcu_ext_qs)
> +{
> + if (tick_nohz_stop_sched_tick(1) && rcu_ext_qs)
> + rcu_enter_nohz();
> }
>
> /**
> @@ -486,11 +505,12 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
> }
>
> /**
> - * tick_nohz_restart_sched_tick - restart the idle tick from the idle task
> + * tick_nohz_exit_idle - restart the idle tick from the idle task
> + * @rcu_ext_qs: exit rcu dynticks idle mode
> *
> * Restart the idle tick when the CPU is woken up from idle
> */
> -void tick_nohz_restart_sched_tick(void)
> +void tick_nohz_exit_idle(bool rcu_ext_qs)
> {
> int cpu = smp_processor_id();
> struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> @@ -514,7 +534,8 @@ void tick_nohz_restart_sched_tick(void)
>
> ts->inidle = 0;
>
> - rcu_exit_nohz();
> + if (rcu_ext_qs)
> + rcu_exit_nohz();
>
> /* Update jiffies first */
> select_nohz_load_balancer(0);
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-08-20 17:30 ` [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
2011-08-20 17:39 ` Mike Frysinger
2011-08-22 2:02 ` Guan Xuetao
@ 2011-09-04 21:01 ` Paul E. McKenney
2011-09-04 21:05 ` Frederic Weisbecker
2011-09-04 23:36 ` Paul E. McKenney
3 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2011-09-04 21:01 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle, Ingo Molnar,
Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, Russell King,
Paul Mackerras, Heiko Carstens, Paul Mundt
On Sat, Aug 20, 2011 at 07:30:49PM +0200, Frederic Weisbecker wrote:
> It is assumed that RCU won't be used once we switch to tickless
> mode and until we restart the tick. However this is not always
> true, as on x86-64 where we dereference the idle notifiers after
> the tick is stopped.
>
> To prepare for fixing this, split the tickless mode switching from
> the RCU extended quiescent state logic.
> Make tick_nohz_stop/restart_sched_tick() RCU agnostic, but provide
> a new pair of APIs, tick_nohz_enter/exit_idle(), that keep the
> old behaviour by handling both the nohz mode and the RCU extended
> quiescent state, then convert every arch to use these.
>
> Archs that want to switch to the RCU extended QS at some custom point
> can do it later by changing the parameter to tick_nohz_enter,exit_idle()
> to false and calling rcu_enter,exit_nohz() separately.
Hello, Frederic,
Some of my testing indicates that it is necessary to push
tick_nohz_enter_idle() and tick_nohz_exit_idle() further down in the
powerpc case due to tracing that happens in functions called from
ppc_md.power_save(). My current guess is that for the pSeries flavor
of powerpc, these must be called from __trace_hcall_entry() and
__trace_hcall_exit(), probably controlled by a powerpc-specific per-CPU
variable.
The following functions likely need help.
cbe_power_save()
cpm_idle()
e500_idle()
idle_doze()
idle_spin()
ppc44x_idle()
ppc6xx_idle()
ps3_power_save()
pseries_dedicated_idle_sleep()
pseries_shared_idle_sleep()
I am continuing testing. In the meantime, other thoughts?
Thanx, Paul
> [...]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-09-04 21:01 ` Paul E. McKenney
@ 2011-09-04 21:05 ` Frederic Weisbecker
0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2011-09-04 21:05 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle, Ingo Molnar,
Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, Russell King,
Paul Mackerras, Heiko Carstens, Paul Mundt
On Sun, Sep 04, 2011 at 02:01:30PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 20, 2011 at 07:30:49PM +0200, Frederic Weisbecker wrote:
> > It is assumed that rcu won't be used once we switch to tickless
> > mode and until we restart the tick. However this is not always
> > true, as in x86-64 where we dereference the idle notifiers after
> > the tick is stopped.
> >
> > To prepare for fixing this, split the tickless mode switching and
> > RCU extended quiescent state logic.
> > Make tick_nohz_stop/restart_sched_tick() RCU agnostic but provide
> > a new pair of APIs tick_nohz_enter/exit_idle() that keep the
> > old behaviour by handling both the nohz mode and RCU extended
> > quiescent states, then convert every arch to use these.
> >
> > Archs that want to switch to RCU extended QS at some custom points
> > can do so later by changing the parameter in tick_nohz_enter,exit_idle()
> > to false and calling rcu_enter,exit_nohz() separately.
>
> Hello, Frederic,
>
> Some of my testing indicates that it is necessary to push
> tick_nohz_enter_idle() and tick_nohz_exit_idle() further down in the
> powerpc case due to tracing that happens in functions called from
> ppc_md.power_save(). My current guess is that for the pSeries flavor
> of powerpc, these must be called from __trace_hcall_entry() and
> __trace_hcall_exit(), probably controlled by a powerpc-specific per-CPU
> variable.
>
> The following functions likely need help.
>
> cbe_power_save()
> cpm_idle()
> e500_idle()
> idle_doze()
> idle_spin()
> ppc44x_idle()
> ppc6xx_idle()
> ps3_power_save()
> pseries_dedicated_idle_sleep()
> pseries_shared_idle_sleep()
>
> I am continuing testing. In the meantime, other thoughts?
>
> Thanx, Paul
Yeah sure, I still have that on my todo list. I wanted to make sure
first that you are fine with the new interface before I go and fix
the other bad call sites that have been detected.
Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-08-20 17:30 ` [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
` (2 preceding siblings ...)
2011-09-04 21:01 ` Paul E. McKenney
@ 2011-09-04 23:36 ` Paul E. McKenney
2011-09-06 14:58 ` Frederic Weisbecker
2011-09-07 14:22 ` Paul E. McKenney
3 siblings, 2 replies; 16+ messages in thread
From: Paul E. McKenney @ 2011-09-04 23:36 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle, Ingo Molnar,
Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, Russell King,
Paul Mackerras, Heiko Carstens, Paul Mundt
On Sat, Aug 20, 2011 at 07:30:49PM +0200, Frederic Weisbecker wrote:
> It is assumed that rcu won't be used once we switch to tickless
> mode and until we restart the tick. However this is not always
> true, as in x86-64 where we dereference the idle notifiers after
> the tick is stopped.
>
> To prepare for fixing this, split the tickless mode switching and
> RCU extended quiescent state logic.
> Make tick_nohz_stop/restart_sched_tick() RCU agnostic but provide
> a new pair of APIs tick_nohz_enter/exit_idle() that keep the
> old behaviour by handling both the nohz mode and RCU extended
> quiescent states, then convert every arch to use these.
>
> Archs that want to switch to RCU extended QS at some custom points
> can do so later by changing the parameter in tick_nohz_enter,exit_idle()
> to false and calling rcu_enter,exit_nohz() separately.
This approach looks quite good to me! A few comments below.
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Paul Mundt <lethal@linux-sh.org>
> ---
[ . . . ]
> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index 39a2baa..b30ddf1 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -56,7 +56,7 @@ void cpu_idle(void)
>
> set_thread_flag(TIF_POLLING_NRFLAG);
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
This needs to pass false, as some variants of powerpc use event tracing
(which in turn uses RCU) just before returning to the hypervisor.
> while (!need_resched() && !cpu_should_die()) {
> ppc64_runlatch_off();
>
> @@ -93,7 +93,7 @@ void cpu_idle(void)
>
> HMT_medium();
> ppc64_runlatch_on();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
As does this, as some variants of powerpc use event tracing just after
the hypervisor returns control to the OS.
> preempt_enable_no_resched();
> if (cpu_should_die())
> cpu_die();
> diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
> index c25a081..aa2be7d 100644
> --- a/arch/powerpc/platforms/iseries/setup.c
> +++ b/arch/powerpc/platforms/iseries/setup.c
> @@ -562,7 +562,7 @@ static void yield_shared_processor(void)
> static void iseries_shared_idle(void)
> {
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
But I don't know enough about iseries to know whether or not this works.
> while (!need_resched() && !hvlpevent_is_pending()) {
> local_irq_disable();
> ppc64_runlatch_off();
> @@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
> }
>
> ppc64_runlatch_on();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
>
> if (hvlpevent_is_pending())
> process_iSeries_events();
> @@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
> set_thread_flag(TIF_POLLING_NRFLAG);
>
> while (1) {
> - tick_nohz_stop_sched_tick(1);
> + tick_nohz_enter_idle(true);
> if (!need_resched()) {
> while (!need_resched()) {
> ppc64_runlatch_off();
> @@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
> }
>
> ppc64_runlatch_on();
> - tick_nohz_restart_sched_tick();
> + tick_nohz_exit_idle(true);
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
[ . . . ]
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index eb98e55..609bb20 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -253,12 +253,13 @@ EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
> * Called either from the idle loop or from irq_exit() when an idle period was
> * just interrupted by an interrupt which did not cause a reschedule.
> */
> -void tick_nohz_stop_sched_tick(int inidle)
> +bool tick_nohz_stop_sched_tick(int inidle)
> {
> unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
> struct tick_sched *ts;
> ktime_t last_update, expires, now;
> struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
> > +	bool stopped = false;
> u64 time_delta;
> int cpu;
>
> @@ -405,7 +406,7 @@ void tick_nohz_stop_sched_tick(int inidle)
> ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
> ts->tick_stopped = 1;
> ts->idle_jiffies = last_jiffies;
> - rcu_enter_nohz();
> + stopped = true;
> }
>
> ts->idle_sleeps++;
> @@ -445,6 +446,24 @@ out:
> ts->sleep_length = ktime_sub(dev->next_event, now);
> end:
> local_irq_restore(flags);
> +
> + return stopped;
> +}
> +
> +
> +/**
> + * tick_nohz_enter_idle - stop the tick from the idle task
> + * @rcu_ext_qs: enter rcu dynticks idle mode
> + *
> + * When an arch doesn't make any use of rcu read side critical section
> + * between tick_nohz_enter_idle() and tick_nohz_exit_idle() it can set
> + * rcu_ext_qs to 1. Otherwise it needs to call rcu_enter_nohz() itself
> + * later before the CPU goes to sleep.
How about something like this?
+ * When an arch doesn't make any use of rcu read side critical section
+ * between tick_nohz_enter_idle() and the time the CPU is put to sleep,
+ * it can set rcu_ext_qs to true. Otherwise it needs to set rcu_ext_qs to
+ * false, and then to call rcu_enter_nohz() itself later, but before the
+ * CPU goes to sleep and after its last use of RCU.
> + */
> +void tick_nohz_enter_idle(bool rcu_ext_qs)
> +{
> + if (tick_nohz_stop_sched_tick(1) && rcu_ext_qs)
> + rcu_enter_nohz();
> }
>
> /**
> @@ -486,11 +505,12 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
> }
>
> /**
> - * tick_nohz_restart_sched_tick - restart the idle tick from the idle task
> + * tick_nohz_exit_idle - restart the idle tick from the idle task
> + * @rcu_ext_qs: exit rcu dynticks idle mode
> *
> * Restart the idle tick when the CPU is woken up from idle
How about something like the following?
+ * When an arch doesn't make any use of rcu read side critical section
+ * between the time the CPU wakes up and tick_nohz_exit_idle(), it can set
+ * rcu_ext_qs to true. Otherwise it needs to set rcu_ext_qs to false, and it
+ * must also have called rcu_enter_nohz() itself earlier, after the CPU
+ * was awakened, but before its first use of RCU.
> */
> -void tick_nohz_restart_sched_tick(void)
> +void tick_nohz_exit_idle(bool rcu_ext_qs)
> {
> int cpu = smp_processor_id();
> struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> @@ -514,7 +534,8 @@ void tick_nohz_restart_sched_tick(void)
>
> ts->inidle = 0;
>
> - rcu_exit_nohz();
> + if (rcu_ext_qs)
> + rcu_exit_nohz();
>
> /* Update jiffies first */
> select_nohz_load_balancer(0);
> --
> 1.7.5.4
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/4] rcu: Fix early call to rcu_irq_exit()
2011-08-20 17:30 ` [PATCH 4/4] rcu: Fix early call to rcu_irq_exit() Frederic Weisbecker
@ 2011-09-04 23:38 ` Paul E. McKenney
0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2011-09-04 23:38 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra
On Sat, Aug 20, 2011 at 07:30:52PM +0200, Frederic Weisbecker wrote:
> On the irq exit path, tick_nohz_stop_sched_tick()
> may raise a softirq, which action leads to the wake up
> path and select_task_rq_fair() that makes use of rcu
> to iterate the domains.
>
> This is an illegal use of RCU because we may be in dyntick
> idle mode:
The rest of these look good!
Thanx, Paul
> [ 132.978883] ===============================
> [ 132.978883] [ INFO: suspicious RCU usage. ]
> [ 132.978883] -------------------------------
> [ 132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
> [ 132.978883]
> [ 132.978883] other info that might help us debug this:
> [ 132.978883]
> [ 132.978883]
> [ 132.978883] rcu_scheduler_active = 1, debug_locks = 0
> [ 132.978883] RCU used illegally from extended quiescent state!
> [ 132.978883] 2 locks held by swapper/0:
> [ 132.978883] #0: (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
> [ 132.978883] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
> [ 132.978883]
> [ 132.978883] stack backtrace:
> [ 132.978883] Pid: 0, comm: swapper Tainted: G W 3.0.0+ #178
> [ 132.978883] Call Trace:
> [ 132.978883] <IRQ> [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
> [ 132.978883] [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
> [ 132.978883] [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
> [ 132.978883] [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
> [ 132.978883] [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
> [ 132.978883] [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
> [ 132.978883] [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
> [ 132.978883] [<ffffffff8105aa35>] wake_up_process+0x15/0x20
> [ 132.978883] [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
> [ 132.978883] [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
> [ 132.978883] [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
> [ 132.978883] [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
> [ 132.978883] [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
> [ 132.978883] [<ffffffff810694f2>] irq_exit+0xd2/0x100
> [ 132.978883] [<ffffffff81829e96>] do_IRQ+0x66/0xe0
> [ 132.978883] [<ffffffff81820d53>] common_interrupt+0x13/0x13
> [ 132.978883] <EOI> [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
> [ 132.978883] [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
> [ 132.978883] [<ffffffff810144ea>] default_idle+0xba/0x370
> [ 132.978883] [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
> [ 132.978883] [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
> [ 132.978883] [<ffffffff817f217f>] rest_init+0xef/0x150
> [ 132.978883] [<ffffffff817f20e2>] ? rest_init+0x52/0x150
> [ 132.978883] [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
> [ 132.978883] [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
> [ 132.978883] [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112
>
> Fix this by calling rcu_irq_exit() after tick_nohz_stop_sched_tick().
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> kernel/softirq.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index fca82c3..d4047b8 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -347,12 +347,12 @@ void irq_exit(void)
> if (!in_interrupt() && local_softirq_pending())
> invoke_softirq();
>
> - rcu_irq_exit();
> #ifdef CONFIG_NO_HZ
> /* Make sure that timer wheel updates are propagated */
> if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
> tick_nohz_stop_sched_tick(0);
> #endif
> + rcu_irq_exit();
> preempt_enable_no_resched();
> }
>
> --
> 1.7.5.4
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-09-04 23:36 ` Paul E. McKenney
@ 2011-09-06 14:58 ` Frederic Weisbecker
2011-09-07 14:22 ` Paul E. McKenney
1 sibling, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2011-09-06 14:58 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle, Ingo Molnar,
Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, Russell King,
Paul Mackerras, Heiko Carstens, Paul Mundt
On Sun, Sep 04, 2011 at 04:36:43PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 20, 2011 at 07:30:49PM +0200, Frederic Weisbecker wrote:
> > It is assumed that rcu won't be used once we switch to tickless
> > mode and until we restart the tick. However this is not always
> > true, as in x86-64 where we dereference the idle notifiers after
> > the tick is stopped.
> >
> > To prepare for fixing this, split the tickless mode switching and
> > RCU extended quiescent state logic.
> > Make tick_nohz_stop/restart_sched_tick() RCU agnostic but provide
> > a new pair of APIs tick_nohz_enter/exit_idle() that keep the
> > old behaviour by handling both the nohz mode and RCU extended
> > quiescent states, then convert every arch to use these.
> >
> > Archs that want to switch to RCU extended QS at some custom points
> > can do so later by changing the parameter in tick_nohz_enter,exit_idle()
> > to false and calling rcu_enter,exit_nohz() separately.
>
> This approach looks quite good to me! A few comments below.
>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: David Miller <davem@davemloft.net>
> > Cc: Chris Metcalf <cmetcalf@tilera.com>
> > Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> > Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
> > Cc: Mike Frysinger <vapier@gentoo.org>
> > Cc: Ralf Baechle <ralf@linux-mips.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Russell King <linux@arm.linux.org.uk>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Paul Mundt <lethal@linux-sh.org>
> > ---
>
> [ . . . ]
>
> > diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> > index 39a2baa..b30ddf1 100644
> > --- a/arch/powerpc/kernel/idle.c
> > +++ b/arch/powerpc/kernel/idle.c
> > @@ -56,7 +56,7 @@ void cpu_idle(void)
> >
> > set_thread_flag(TIF_POLLING_NRFLAG);
> > while (1) {
> > - tick_nohz_stop_sched_tick(1);
> > + tick_nohz_enter_idle(true);
>
> This needs to pass false, as some variants of powerpc use event tracing
> (which in turn uses RCU) just before returning to the hypervisor.
Right. But I wanted this patch to only perform the API conversion, and
then push the rcu_enter_nohz() calls down later.
> > while (!need_resched() && !cpu_should_die()) {
> > ppc64_runlatch_off();
> >
> > @@ -93,7 +93,7 @@ void cpu_idle(void)
> >
> > HMT_medium();
> > ppc64_runlatch_on();
> > - tick_nohz_restart_sched_tick();
> > + tick_nohz_exit_idle(true);
>
> As does this, as some variants of powerpc use event tracing just after
> the hypervisor returns control to the OS.
Right
<snip>
> > +/**
> > + * tick_nohz_enter_idle - stop the tick from the idle task
> > + * @rcu_ext_qs: enter rcu dynticks idle mode
> > + *
> > + * When an arch doesn't make any use of rcu read side critical section
> > + * between tick_nohz_enter_idle() and tick_nohz_exit_idle() it can set
> > + * rcu_ext_qs to 1. Otherwise it needs to call rcu_enter_nohz() itself
> > + * later before the CPU goes to sleep.
>
> How about something like this?
>
> + * When an arch doesn't make any use of rcu read side critical section
> + * between tick_nohz_enter_idle() and the time the CPU is put to sleep,
> + * it can set rcu_ext_qs to true. Otherwise it needs to set rcu_ext_qs to
> + * false, and then to call rcu_enter_nohz() itself later, but before the
> + * CPU goes to sleep and after its last use of RCU.
Better yeah, I'll use your version ;)
> > + */
> > +void tick_nohz_enter_idle(bool rcu_ext_qs)
> > +{
> > + if (tick_nohz_stop_sched_tick(1) && rcu_ext_qs)
> > + rcu_enter_nohz();
> > }
> >
> > /**
> > @@ -486,11 +505,12 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
> > }
> >
> > /**
> > - * tick_nohz_restart_sched_tick - restart the idle tick from the idle task
> > + * tick_nohz_exit_idle - restart the idle tick from the idle task
> > + * @rcu_ext_qs: exit rcu dynticks idle mode
> > *
> > * Restart the idle tick when the CPU is woken up from idle
>
> How about something like the following?
>
> + * When an arch doesn't make any use of rcu read side critical section
> + * between the time the CPU wakes up and tick_nohz_exit_idle(), it can set
> + * rcu_ext_qs to true. Otherwise it needs to set rcu_ext_qs to false, and it
> + * must also have called rcu_enter_nohz() itself earlier, after the CPU
> + * was awakened, but before its first use of RCU.
Right, but to keep things clear: as long as we call tick_nohz_enter_idle(false)
followed by rcu_enter_nohz() later, we should keep the mirrored ordering of
rcu_exit_nohz() followed by tick_nohz_exit_idle(false).
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-09-04 23:36 ` Paul E. McKenney
2011-09-06 14:58 ` Frederic Weisbecker
@ 2011-09-07 14:22 ` Paul E. McKenney
2011-09-07 21:39 ` Frederic Weisbecker
2011-09-13 0:06 ` Frederic Weisbecker
1 sibling, 2 replies; 16+ messages in thread
From: Paul E. McKenney @ 2011-09-07 14:22 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle, Ingo Molnar,
Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, Russell King,
Paul Mackerras, Heiko Carstens, Paul Mundt
On Sun, Sep 04, 2011 at 04:36:43PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 20, 2011 at 07:30:49PM +0200, Frederic Weisbecker wrote:
> > It is assumed that rcu won't be used once we switch to tickless
> > mode and until we restart the tick. However this is not always
> > true, as in x86-64 where we dereference the idle notifiers after
> > the tick is stopped.
> >
> > To prepare for fixing this, split the tickless mode switching and
> > RCU extended quiescent state logic.
> > Make tick_nohz_stop/restart_sched_tick() RCU agnostic but provide
> > a new pair of APIs tick_nohz_enter/exit_idle() that keep the
> > old behaviour by handling both the nohz mode and RCU extended
> > quiescent states, then convert every arch to use these.
> >
> > Archs that want to switch to RCU extended QS at some custom points
> > can do so later by changing the parameter in tick_nohz_enter,exit_idle()
> > to false and calling rcu_enter,exit_nohz() separately.
>
> This approach looks quite good to me! A few comments below.
But I get RCU stall warnings when running it on powerpc on top of
the patch set at:
https://lkml.org/lkml/2011/9/7/64
At first glance, it appears that CPUs are entering dyntick-idle
mode without RCU being informed. Any thoughts on diagnostics?
Thanx, Paul
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: David Miller <davem@davemloft.net>
> > Cc: Chris Metcalf <cmetcalf@tilera.com>
> > Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> > Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
> > Cc: Mike Frysinger <vapier@gentoo.org>
> > Cc: Ralf Baechle <ralf@linux-mips.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Russell King <linux@arm.linux.org.uk>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Paul Mundt <lethal@linux-sh.org>
> > ---
>
> [ . . . ]
>
> > diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> > index 39a2baa..b30ddf1 100644
> > --- a/arch/powerpc/kernel/idle.c
> > +++ b/arch/powerpc/kernel/idle.c
> > @@ -56,7 +56,7 @@ void cpu_idle(void)
> >
> > set_thread_flag(TIF_POLLING_NRFLAG);
> > while (1) {
> > - tick_nohz_stop_sched_tick(1);
> > + tick_nohz_enter_idle(true);
>
> This needs to pass false, as some variants of powerpc use event tracing
> (which in turn uses RCU) just before returning to the hypervisor.
>
> > while (!need_resched() && !cpu_should_die()) {
> > ppc64_runlatch_off();
> >
> > @@ -93,7 +93,7 @@ void cpu_idle(void)
> >
> > HMT_medium();
> > ppc64_runlatch_on();
> > - tick_nohz_restart_sched_tick();
> > + tick_nohz_exit_idle(true);
>
> As does this, as some variants of powerpc use event tracing just after
> the hypervisor returns control to the OS.
>
> > preempt_enable_no_resched();
> > if (cpu_should_die())
> > cpu_die();
> > diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
> > index c25a081..aa2be7d 100644
> > --- a/arch/powerpc/platforms/iseries/setup.c
> > +++ b/arch/powerpc/platforms/iseries/setup.c
> > @@ -562,7 +562,7 @@ static void yield_shared_processor(void)
> > static void iseries_shared_idle(void)
> > {
> > while (1) {
> > - tick_nohz_stop_sched_tick(1);
> > + tick_nohz_enter_idle(true);
>
> But I don't know enough about iseries to know whether or not this works.
>
> > while (!need_resched() && !hvlpevent_is_pending()) {
> > local_irq_disable();
> > ppc64_runlatch_off();
> > @@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
> > }
> >
> > ppc64_runlatch_on();
> > - tick_nohz_restart_sched_tick();
> > + tick_nohz_exit_idle(true);
> >
> > if (hvlpevent_is_pending())
> > process_iSeries_events();
> > @@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
> > set_thread_flag(TIF_POLLING_NRFLAG);
> >
> > while (1) {
> > - tick_nohz_stop_sched_tick(1);
> > + tick_nohz_enter_idle(true);
> > if (!need_resched()) {
> > while (!need_resched()) {
> > ppc64_runlatch_off();
> > @@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
> > }
> >
> > ppc64_runlatch_on();
> > - tick_nohz_restart_sched_tick();
> > + tick_nohz_exit_idle(true);
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
>
> [ . . . ]
>
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index eb98e55..609bb20 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -253,12 +253,13 @@ EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
> > * Called either from the idle loop or from irq_exit() when an idle period was
> > * just interrupted by an interrupt which did not cause a reschedule.
> > */
> > -void tick_nohz_stop_sched_tick(int inidle)
> > +bool tick_nohz_stop_sched_tick(int inidle)
> > {
> > unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
> > struct tick_sched *ts;
> > ktime_t last_update, expires, now;
> > struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
> > > +	bool stopped = false;
> > u64 time_delta;
> > int cpu;
> >
> > @@ -405,7 +406,7 @@ void tick_nohz_stop_sched_tick(int inidle)
> > ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
> > ts->tick_stopped = 1;
> > ts->idle_jiffies = last_jiffies;
> > - rcu_enter_nohz();
> > + stopped = true;
> > }
> >
> > ts->idle_sleeps++;
> > @@ -445,6 +446,24 @@ out:
> > ts->sleep_length = ktime_sub(dev->next_event, now);
> > end:
> > local_irq_restore(flags);
> > +
> > + return stopped;
> > +}
> > +
> > +
> > +/**
> > + * tick_nohz_enter_idle - stop the tick from the idle task
> > + * @rcu_ext_qs: enter rcu dynticks idle mode
> > + *
> > + * When an arch doesn't make any use of rcu read side critical section
> > + * between tick_nohz_enter_idle() and tick_nohz_exit_idle() it can set
> > + * rcu_ext_qs to 1. Otherwise it needs to call rcu_enter_nohz() itself
> > + * later before the CPU goes to sleep.
>
> How about something like this?
>
> + * When an arch doesn't make any use of rcu read side critical section
> + * between tick_nohz_enter_idle() and the time the CPU is put to sleep,
> + * it can set rcu_ext_qs to true. Otherwise it needs to set rcu_ext_qs to
> + * false, and then to call rcu_enter_nohz() itself later, but before the
> + * CPU goes to sleep and after its last use of RCU.
>
> > + */
> > +void tick_nohz_enter_idle(bool rcu_ext_qs)
> > +{
> > + if (tick_nohz_stop_sched_tick(1) && rcu_ext_qs)
> > + rcu_enter_nohz();
> > }
> >
> > /**
> > @@ -486,11 +505,12 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
> > }
> >
> > /**
> > - * tick_nohz_restart_sched_tick - restart the idle tick from the idle task
> > + * tick_nohz_exit_idle - restart the idle tick from the idle task
> > + * @rcu_ext_qs: exit rcu dynticks idle mode
> > *
> > * Restart the idle tick when the CPU is woken up from idle
>
> How about something like the following?
>
> + * When an arch doesn't make any use of rcu read side critical section
> + * between the time the CPU wakes up and tick_nohz_exit_idle(), it can set
> + * rcu_ext_qs to true. Otherwise it needs to set rcu_ext_qs to false, and it
> + * must also have called rcu_enter_nohz() itself earlier, after the CPU
> + * was awakened, but before its first use of RCU.
>
> > */
> > -void tick_nohz_restart_sched_tick(void)
> > +void tick_nohz_exit_idle(bool rcu_ext_qs)
> > {
> > int cpu = smp_processor_id();
> > struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> > @@ -514,7 +534,8 @@ void tick_nohz_restart_sched_tick(void)
> >
> > ts->inidle = 0;
> >
> > - rcu_exit_nohz();
> > + if (rcu_ext_qs)
> > + rcu_exit_nohz();
> >
> > /* Update jiffies first */
> > select_nohz_load_balancer(0);
> > --
> > 1.7.5.4
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-09-07 14:22 ` Paul E. McKenney
@ 2011-09-07 21:39 ` Frederic Weisbecker
2011-09-13 0:06 ` Frederic Weisbecker
1 sibling, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2011-09-07 21:39 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle, Ingo Molnar,
Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, Russell King,
Paul Mackerras, Heiko Carstens, Paul Mundt
On Wed, Sep 07, 2011 at 07:22:33AM -0700, Paul E. McKenney wrote:
> On Sun, Sep 04, 2011 at 04:36:43PM -0700, Paul E. McKenney wrote:
> > On Sat, Aug 20, 2011 at 07:30:49PM +0200, Frederic Weisbecker wrote:
> > > It is assumed that rcu won't be used once we switch to tickless
> > > mode and until we restart the tick. However this is not always
> > > true, as in x86-64 where we dereference the idle notifiers after
> > > the tick is stopped.
> > >
> > > To prepare for fixing this, split the tickless mode switching and
> > > RCU extended quiescent state logic.
> > > Make tick_nohz_stop/restart_sched_tick() RCU agnostic but provide
> > > a new pair of APIs tick_nohz_enter/exit_idle() that keep the
> > > old behaviour by handling both the nohz mode and RCU extended
> > > quiescent states, then convert every arch to use these.
> > >
> > > Archs that want to switch to RCU extended QS at some custom points
> > > can do so later by changing the parameter in tick_nohz_enter,exit_idle()
> > > to false and calling rcu_enter,exit_nohz() separately.
> >
> > This approach looks quite good to me! A few comments below.
>
> But I get RCU stall warnings when running it on powerpc on top of
> the patch set at:
>
> https://lkml.org/lkml/2011/9/7/64
>
> At first glance, it appears that CPUs are entering dyntick-idle
> mode without RCU being informed. Any thoughts on diagnostics?
>
> Thanx, Paul
Nothing looks wrong to me in the powerpc part of my patchset. But I
have no machine to test on other than x86. That said, I can't even
fetch your tree to test on x86 because of the kernel.org outage.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-09-07 14:22 ` Paul E. McKenney
2011-09-07 21:39 ` Frederic Weisbecker
@ 2011-09-13 0:06 ` Frederic Weisbecker
2011-09-13 16:49 ` Paul E. McKenney
1 sibling, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2011-09-13 0:06 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle, Ingo Molnar,
Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, Russell King,
Paul Mackerras, Heiko Carstens, Paul Mundt
On Wed, Sep 07, 2011 at 07:22:33AM -0700, Paul E. McKenney wrote:
> On Sun, Sep 04, 2011 at 04:36:43PM -0700, Paul E. McKenney wrote:
> > On Sat, Aug 20, 2011 at 07:30:49PM +0200, Frederic Weisbecker wrote:
> > > It is assumed that rcu won't be used once we switch to tickless
> > > mode and until we restart the tick. However this is not always
> > > true, as in x86-64 where we dereference the idle notifiers after
> > > the tick is stopped.
> > >
> > > To prepare for fixing this, split the tickless mode switching and
> > > RCU extended quiescent state logic.
> > > Make tick_nohz_stop/restart_sched_tick() RCU agnostic but provide
> > > a new pair of APIs tick_nohz_enter/exit_idle() that keep the
> > > old behaviour by handling both the nohz mode and RCU extended
> > > quiescent states, then convert every arch to use these.
> > >
> > > Archs that want to switch to RCU extended QS at some custom points
> > > can do so later by changing the parameter in tick_nohz_enter,exit_idle()
> > > to false and calling rcu_enter,exit_nohz() separately.
> >
> > This approach looks quite good to me! A few comments below.
>
> But I get RCU stall warnings when running it on powerpc on top of
> the patch set at:
>
> https://lkml.org/lkml/2011/9/7/64
>
> At first glance, it appears that CPUs are entering dyntick-idle
> mode without RCU being informed. Any thoughts on diagnostics?
>
> Thanx, Paul
I've just tested your rcu/next branch from GitHub with my patches
applied on top, but did not experience a stall on x86.
The problem may be specific to powerpc, or it may hit any arch
under some particular condition.
* Re: [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch
2011-09-13 0:06 ` Frederic Weisbecker
@ 2011-09-13 16:49 ` Paul E. McKenney
0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2011-09-13 16:49 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, David Miller, Chris Metcalf, Guan Xuetao,
Hans-Christian Egtvedt, Mike Frysinger, Ralf Baechle, Ingo Molnar,
Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, Russell King,
Paul Mackerras, Heiko Carstens, Paul Mundt
On Tue, Sep 13, 2011 at 02:06:45AM +0200, Frederic Weisbecker wrote:
> On Wed, Sep 07, 2011 at 07:22:33AM -0700, Paul E. McKenney wrote:
> > On Sun, Sep 04, 2011 at 04:36:43PM -0700, Paul E. McKenney wrote:
> > > On Sat, Aug 20, 2011 at 07:30:49PM +0200, Frederic Weisbecker wrote:
> > > > It is assumed that RCU won't be used once we switch to tickless
> > > > mode and until we restart the tick. However this is not always
> > > > true, as on x86-64 where we dereference the idle notifiers after
> > > > the tick is stopped.
> > > >
> > > > To prepare for fixing this, split the tickless mode switching and
> > > > the RCU extended quiescent state logic.
> > > > Make tick_nohz_stop/restart_sched_tick() RCU-agnostic but provide
> > > > a new pair of APIs, tick_nohz_enter/exit_idle(), that keep the
> > > > old behaviour by handling both the nohz mode and RCU extended
> > > > quiescent states, then convert every arch to use these.
> > > >
> > > > Archs that want to switch to RCU extended QS at some custom points
> > > > can do it later by changing the parameter in tick_nohz_enter,exit_idle()
> > > > to false and calling rcu_enter,exit_nohz() separately.
> > >
> > > This approach looks quite good to me! A few comments below.
> >
> > But I get RCU stall warnings when running it on powerpc on top of
> > the patch set at:
> >
> > https://lkml.org/lkml/2011/9/7/64
> >
> > At first glance, it appears that CPUs are entering dyntick-idle
> > mode without RCU being informed. Any thoughts on diagnostics?
> >
> > Thanx, Paul
>
> I've just tested your rcu/next branch from GitHub with my patches
> applied on top, but did not experience a stall on x86.
Thank you for checking this!
> The problem may be specific to powerpc, or it may hit any arch
> under some particular condition.
OK, I guess my next step is to put a WARN_ON_ONCE() in the idle loop
to check to see if there might be some other idle loop buried in
PowerPC somewhere...
Thanx, Paul
end of thread, other threads:[~2011-09-13 16:49 UTC | newest]
Thread overview: 16+ messages
2011-08-20 17:30 [PATCH 0/4 v2] rcu: Fix some rcu uses in extended quiescent state Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 1/4] nohz: Split extended quiescent state handling from nohz switch Frederic Weisbecker
2011-08-20 17:39 ` Mike Frysinger
2011-08-22 2:02 ` Guan Xuetao
2011-09-04 21:01 ` Paul E. McKenney
2011-09-04 21:05 ` Frederic Weisbecker
2011-09-04 23:36 ` Paul E. McKenney
2011-09-06 14:58 ` Frederic Weisbecker
2011-09-07 14:22 ` Paul E. McKenney
2011-09-07 21:39 ` Frederic Weisbecker
2011-09-13 0:06 ` Frederic Weisbecker
2011-09-13 16:49 ` Paul E. McKenney
2011-08-20 17:30 ` [PATCH 2/4] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 3/4] x86: Call idle notifier after irq_enter() Frederic Weisbecker
2011-08-20 17:30 ` [PATCH 4/4] rcu: Fix early call to rcu_irq_exit() Frederic Weisbecker
2011-09-04 23:38 ` Paul E. McKenney