* [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5
@ 2011-10-07 16:21 Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 01/11] rcu: Detect illegal rcu dereference in extended quiescent state Frederic Weisbecker
` (11 more replies)
0 siblings, 12 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:21 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
Peter Zijlstra, H. Peter Anvin, Andy Henroid, Mike Frysinger,
Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
Ralf Baechle, Russell King, Paul Mackerras, Heiko Carstens,
Paul Mundt, Lai Jiangshan
Hi Paul,
Here is the rebase. It's more than just a rebase, though, given the
semantic changes after your patch that tracks idleness, and also
because of the new API naming (a new tick_nohz_idle_enter() /
tick_nohz_idle_exit() pair, with *_norcu suffixes).
I have reused and updated some comments (that you made on earlier
versions of my patchset) that assumed that extended quiescent state =
dynticks idle, as that's no longer always true. I've tried to introduce
new wording to express what we are dealing with: "RCU-free window in
idle" or "RCU-idle window". I'll let you update the comments if you
think that's confusing or too scarce.
It has passed several hours of rcutorture with NO_HZ && SMP, and at least
booted fine with all your configs.
Frederic Weisbecker (9):
rcu: Detect illegal rcu dereference in extended quiescent state
rcu: Inform the user about extended quiescent state on PROVE_RCU warning
rcu: Warn when rcu_read_lock() is used in extended quiescent state
rcu: Make srcu_read_lock_held() call common lockdep-enabled function
nohz: Separate out irq exit and idle loop dyntick logic
nohz: Allow rcu extended quiescent state handling separately from tick stop
x86: Enter rcu extended qs after idle notifier call
x86: Call idle notifier after irq_enter()
rcu: Fix early call to rcu_idle_enter()
Paul E. McKenney (2):
rcu: Remove one layer of abstraction from PROVE_RCU checking
rcu: Warn when srcu_read_lock() is used in an extended quiescent state
arch/arm/kernel/process.c | 4 +-
arch/avr32/kernel/process.c | 4 +-
arch/blackfin/kernel/process.c | 4 +-
arch/microblaze/kernel/process.c | 4 +-
arch/mips/kernel/process.c | 4 +-
arch/openrisc/kernel/idle.c | 4 +-
arch/powerpc/kernel/idle.c | 4 +-
arch/powerpc/platforms/iseries/setup.c | 8 +-
arch/s390/kernel/process.c | 4 +-
arch/sh/kernel/idle.c | 4 +-
arch/sparc/kernel/process_64.c | 4 +-
arch/tile/kernel/process.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/unicore32/kernel/process.c | 4 +-
arch/x86/kernel/apic/apic.c | 6 +-
arch/x86/kernel/apic/io_apic.c | 2 +-
arch/x86/kernel/cpu/mcheck/therm_throt.c | 2 +-
arch/x86/kernel/cpu/mcheck/threshold.c | 2 +-
arch/x86/kernel/irq.c | 6 +-
arch/x86/kernel/process_32.c | 4 +-
arch/x86/kernel/process_64.c | 10 +++-
include/linux/rcupdate.h | 74 +++++++++++++++--------
include/linux/srcu.h | 38 ++++++++----
include/linux/tick.h | 52 ++++++++++++++--
kernel/lockdep.c | 22 +++++++
kernel/rcupdate.c | 4 +
kernel/rcutiny.c | 1 +
kernel/rcutree.c | 19 +++++-
kernel/softirq.c | 4 +-
kernel/time/tick-sched.c | 96 ++++++++++++++++++-----------
30 files changed, 273 insertions(+), 129 deletions(-)
--
1.7.5.4
* [PATCH 01/11] rcu: Detect illegal rcu dereference in extended quiescent state
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 02/11] rcu: Inform the user about extended quiescent state on PROVE_RCU warning Frederic Weisbecker
` (10 subsequent siblings)
11 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, Lai Jiangshan
Report that none of the rcu read lock maps are held while in an RCU
extended quiescent state (the section between rcu_idle_enter()
and rcu_idle_exit()). This helps detect any use of rcu_dereference()
and friends from within the section in idle where RCU is not allowed.
This way we can guarantee an extended quiescent window where the CPU
can be put in dyntick idle mode or can simply avoid taking part in any
global grace period completion while in the idle loop.
Uses of RCU from within this mode are totally ignored by RCU, hence the
importance of these checks.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
include/linux/rcupdate.h | 29 +++++++++++++++++++++++++++++
kernel/rcupdate.c | 4 ++++
kernel/rcutiny.c | 1 +
kernel/rcutree.c | 19 ++++++++++++++++---
4 files changed, 50 insertions(+), 3 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index a90a850..2f8e0a4 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -218,6 +218,15 @@ static inline void destroy_rcu_head_on_stack(struct rcu_head *head)
#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#ifdef CONFIG_PROVE_RCU
+extern int rcu_is_cpu_idle(void);
+#else /* !CONFIG_PROVE_RCU */
+static inline int rcu_is_cpu_idle(void)
+{
+ return 0;
+}
+#endif /* else !CONFIG_PROVE_RCU */
+
extern struct lockdep_map rcu_lock_map;
# define rcu_read_acquire() \
lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
@@ -252,6 +261,10 @@ static inline int rcu_read_lock_held(void)
{
if (!debug_lockdep_rcu_enabled())
return 1;
+
+ if (rcu_is_cpu_idle())
+ return 0;
+
return lock_is_held(&rcu_lock_map);
}
@@ -275,6 +288,18 @@ extern int rcu_read_lock_bh_held(void);
*
* Check debug_lockdep_rcu_enabled() to prevent false positives during boot
* and while lockdep is disabled.
+ *
+ * Note that if the CPU is in the idle loop from an RCU point of view
+ * (ie: in the section between rcu_idle_enter() and rcu_idle_exit())
+ * then rcu_read_lock_held() returns false even if the CPU did an rcu_read_lock().
+ * The reason for this is that RCU ignores CPUs that are in such a section,
+ * considering these as in extended quiescent state, so such a CPU is effectively
+ * never in an RCU read-side critical section regardless of what RCU primitives it
+ * invokes. This state of affairs is required: we need to keep an RCU-free
+ * window in idle where the CPU may possibly enter into low power mode. This way
+ * we can report an extended quiescent state to other CPUs that have started a
+ * grace period. Otherwise we would delay any grace period as long as we run
+ * in the idle task.
+ */
#ifdef CONFIG_PREEMPT_COUNT
static inline int rcu_read_lock_sched_held(void)
@@ -283,6 +308,10 @@ static inline int rcu_read_lock_sched_held(void)
if (!debug_lockdep_rcu_enabled())
return 1;
+
+ if (rcu_is_cpu_idle())
+ return 0;
+
if (debug_locks)
lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index ca0d23b..d348e3a 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -93,6 +93,10 @@ int rcu_read_lock_bh_held(void)
{
if (!debug_lockdep_rcu_enabled())
return 1;
+
+ if (rcu_is_cpu_idle())
+ return 0;
+
return in_softirq() || irqs_disabled();
}
EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 124bd38..1f75d53 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -83,6 +83,7 @@ int rcu_is_cpu_idle(void)
{
return !rcu_dynticks_nesting;
}
+EXPORT_SYMBOL(rcu_is_cpu_idle);
#endif /* #ifdef CONFIG_PROVE_RCU */
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 6279479..ed6371c 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -444,13 +444,26 @@ void rcu_nmi_exit(void)
* rcu_is_cpu_idle - see if RCU thinks that the current CPU is idle
*
* If the current CPU is in its idle loop and is neither in an interrupt
- * or NMI handler, return true. The caller must have at least disabled
- * preemption.
+ * or NMI handler, return true.
*/
int rcu_is_cpu_idle(void)
{
- return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
+ /*
+ * Disable preemption so that we can call it outside the idle task.
+ * This way we can check if we missed a call to rcu_idle_exit() before
+ * exiting idle.
+ */
+ struct rcu_dynticks *rdtp = &get_cpu_var(rcu_dynticks);
+ int idle = 0;
+
+ if ((atomic_read(&rdtp->dynticks) & 0x1) == 0)
+ idle = 1;
+
+ put_cpu_var(rcu_dynticks);
+
+ return idle;
}
+EXPORT_SYMBOL(rcu_is_cpu_idle);
#endif /* #ifdef CONFIG_PROVE_RCU */
--
1.7.5.4
* [PATCH 02/11] rcu: Inform the user about extended quiescent state on PROVE_RCU warning
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 01/11] rcu: Detect illegal rcu dereference in extended quiescent state Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 22:47 ` Paul E. McKenney
2011-10-07 16:22 ` [PATCH 03/11] rcu: Warn when rcu_read_lock() is used in extended quiescent state Frederic Weisbecker
` (9 subsequent siblings)
11 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, Lai Jiangshan
Inform the user if an RCU usage error is detected by lockdep while in
an extended quiescent state (in this case, the RCU-free window in idle).
This is accomplished by adding a line to the RCU lockdep splat indicating
whether or not the splat occurred in extended quiescent state.
Uses of RCU from within extended quiescent state mode are totally ignored
by RCU, hence the importance of this diagnostic.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
kernel/lockdep.c | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 1e48f1c..8873f6e 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4026,6 +4026,28 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
printk("%s:%d %s!\n", file, line, s);
printk("\nother info that might help us debug this:\n\n");
printk("\nrcu_scheduler_active = %d, debug_locks = %d\n", rcu_scheduler_active, debug_locks);
+
+ /*
+ * If a CPU is in the RCU-free window in idle (ie: in the section
+ * between rcu_idle_enter() and rcu_idle_exit()), then RCU
+ * considers that CPU to be in an "extended quiescent state",
+ * which means that RCU will be completely ignoring that CPU.
+ * Therefore, rcu_read_lock() and friends have absolutely no
+ * effect on a CPU running in that state. In other words, even if
+ * such an RCU-idle CPU has called rcu_read_lock(), RCU might well
+ * delete data structures out from under it. RCU really has no
+ * choice here: we need to keep an RCU-free window in idle where
+ * the CPU may possibly enter into low power mode. This way we can
+ * report an extended quiescent state to other CPUs that have started a grace
+ * period. Otherwise we would delay any grace period as long as we run
+ * in the idle task.
+ *
+ * So complain bitterly if someone does call rcu_read_lock(),
+ * rcu_read_lock_bh() and so on from extended quiescent states.
+ */
+ if (rcu_is_cpu_idle())
+ printk("RCU used illegally from extended quiescent state!\n");
+
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
dump_stack();
--
1.7.5.4
* [PATCH 03/11] rcu: Warn when rcu_read_lock() is used in extended quiescent state
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 01/11] rcu: Detect illegal rcu dereference in extended quiescent state Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 02/11] rcu: Inform the user about extended quiescent state on PROVE_RCU warning Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 04/11] rcu: Remove one layer of abstraction from PROVE_RCU checking Frederic Weisbecker
` (8 subsequent siblings)
11 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, Lai Jiangshan
We are currently able to detect uses of rcu_dereference_check() inside
extended quiescent states (such as the RCU-free window in idle).
But rcu_read_lock() and friends can be used without rcu_dereference(),
so the earlier commit's check for uses of rcu_dereference() and
friends while in RCU idle mode misses some error conditions. This commit
therefore adds extended quiescent state checking to rcu_read_lock() and
friends.
Uses of RCU from within RCU-idle mode are totally ignored by
RCU, hence the importance of these checks.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
include/linux/rcupdate.h | 52 +++++++++++++++++++++++++++++++++++++--------
1 files changed, 42 insertions(+), 10 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 2f8e0a4..fe4f8976 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -227,21 +227,53 @@ static inline int rcu_is_cpu_idle(void)
}
#endif /* else !CONFIG_PROVE_RCU */
+static inline void rcu_lock_acquire(struct lockdep_map *map)
+{
+ WARN_ON_ONCE(rcu_is_cpu_idle());
+ lock_acquire(map, 0, 0, 2, 1, NULL, _THIS_IP_);
+}
+
+static inline void rcu_lock_release(struct lockdep_map *map)
+{
+ WARN_ON_ONCE(rcu_is_cpu_idle());
+ lock_release(map, 1, _THIS_IP_);
+}
+
extern struct lockdep_map rcu_lock_map;
-# define rcu_read_acquire() \
- lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
+
+static inline void rcu_read_acquire(void)
+{
+ rcu_lock_acquire(&rcu_lock_map);
+}
+
+static inline void rcu_read_release(void)
+{
+ rcu_lock_release(&rcu_lock_map);
+}
extern struct lockdep_map rcu_bh_lock_map;
-# define rcu_read_acquire_bh() \
- lock_acquire(&rcu_bh_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release_bh() lock_release(&rcu_bh_lock_map, 1, _THIS_IP_)
+
+static inline void rcu_read_acquire_bh(void)
+{
+ rcu_lock_acquire(&rcu_bh_lock_map);
+}
+
+static inline void rcu_read_release_bh(void)
+{
+ rcu_lock_release(&rcu_bh_lock_map);
+}
extern struct lockdep_map rcu_sched_lock_map;
-# define rcu_read_acquire_sched() \
- lock_acquire(&rcu_sched_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release_sched() \
- lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
+
+static inline void rcu_read_acquire_sched(void)
+{
+ rcu_lock_acquire(&rcu_sched_lock_map);
+}
+
+static inline void rcu_read_release_sched(void)
+{
+ rcu_lock_release(&rcu_sched_lock_map);
+}
extern int debug_lockdep_rcu_enabled(void);
--
1.7.5.4
* [PATCH 04/11] rcu: Remove one layer of abstraction from PROVE_RCU checking
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
` (2 preceding siblings ...)
2011-10-07 16:22 ` [PATCH 03/11] rcu: Warn when rcu_read_lock() is used in extended quiescent state Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 05/11] rcu: Warn when srcu_read_lock() is used in an extended quiescent state Frederic Weisbecker
` (7 subsequent siblings)
11 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: LKML
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Simplify things a bit by substituting the definitions of the single-line
rcu_read_acquire(), rcu_read_release(), rcu_read_acquire_bh(),
rcu_read_release_bh(), rcu_read_acquire_sched(), and
rcu_read_release_sched() functions at their call points.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
include/linux/rcupdate.h | 53 +++++++---------------------------------------
1 files changed, 8 insertions(+), 45 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index fe4f8976..e550f2a 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -240,41 +240,8 @@ static inline void rcu_lock_release(struct lockdep_map *map)
}
extern struct lockdep_map rcu_lock_map;
-
-static inline void rcu_read_acquire(void)
-{
- rcu_lock_acquire(&rcu_lock_map);
-}
-
-static inline void rcu_read_release(void)
-{
- rcu_lock_release(&rcu_lock_map);
-}
-
extern struct lockdep_map rcu_bh_lock_map;
-
-static inline void rcu_read_acquire_bh(void)
-{
- rcu_lock_acquire(&rcu_bh_lock_map);
-}
-
-static inline void rcu_read_release_bh(void)
-{
- rcu_lock_release(&rcu_bh_lock_map);
-}
-
extern struct lockdep_map rcu_sched_lock_map;
-
-static inline void rcu_read_acquire_sched(void)
-{
- rcu_lock_acquire(&rcu_sched_lock_map);
-}
-
-static inline void rcu_read_release_sched(void)
-{
- rcu_lock_release(&rcu_sched_lock_map);
-}
-
extern int debug_lockdep_rcu_enabled(void);
/**
@@ -357,12 +324,8 @@ static inline int rcu_read_lock_sched_held(void)
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
-# define rcu_read_acquire() do { } while (0)
-# define rcu_read_release() do { } while (0)
-# define rcu_read_acquire_bh() do { } while (0)
-# define rcu_read_release_bh() do { } while (0)
-# define rcu_read_acquire_sched() do { } while (0)
-# define rcu_read_release_sched() do { } while (0)
+# define rcu_lock_acquire(a) do { } while (0)
+# define rcu_lock_release(a) do { } while (0)
static inline int rcu_read_lock_held(void)
{
@@ -683,7 +646,7 @@ static inline void rcu_read_lock(void)
{
__rcu_read_lock();
__acquire(RCU);
- rcu_read_acquire();
+ rcu_lock_acquire(&rcu_lock_map);
}
/*
@@ -703,7 +666,7 @@ static inline void rcu_read_lock(void)
*/
static inline void rcu_read_unlock(void)
{
- rcu_read_release();
+ rcu_lock_release(&rcu_lock_map);
__release(RCU);
__rcu_read_unlock();
}
@@ -724,7 +687,7 @@ static inline void rcu_read_lock_bh(void)
{
local_bh_disable();
__acquire(RCU_BH);
- rcu_read_acquire_bh();
+ rcu_lock_acquire(&rcu_bh_lock_map);
}
/*
@@ -734,7 +697,7 @@ static inline void rcu_read_lock_bh(void)
*/
static inline void rcu_read_unlock_bh(void)
{
- rcu_read_release_bh();
+ rcu_lock_release(&rcu_bh_lock_map);
__release(RCU_BH);
local_bh_enable();
}
@@ -751,7 +714,7 @@ static inline void rcu_read_lock_sched(void)
{
preempt_disable();
__acquire(RCU_SCHED);
- rcu_read_acquire_sched();
+ rcu_lock_acquire(&rcu_sched_lock_map);
}
/* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
@@ -768,7 +731,7 @@ static inline notrace void rcu_read_lock_sched_notrace(void)
*/
static inline void rcu_read_unlock_sched(void)
{
- rcu_read_release_sched();
+ rcu_lock_release(&rcu_sched_lock_map);
__release(RCU_SCHED);
preempt_enable();
}
--
1.7.5.4
* [PATCH 05/11] rcu: Warn when srcu_read_lock() is used in an extended quiescent state
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
` (3 preceding siblings ...)
2011-10-07 16:22 ` [PATCH 04/11] rcu: Remove one layer of abstraction from PROVE_RCU checking Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 06/11] rcu: Make srcu_read_lock_held() call common lockdep-enabled function Frederic Weisbecker
` (6 subsequent siblings)
11 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: LKML, Frederic Weisbecker
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Catch SRCU up to the other variants of RCU by making PROVE_RCU
complain if either srcu_read_lock() or srcu_read_lock_held() is
used from within RCU-idle mode.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
include/linux/srcu.h | 35 ++++++++++++++++++++++-------------
1 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 58971e8..553c003 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -28,6 +28,7 @@
#define _LINUX_SRCU_H
#include <linux/mutex.h>
+#include <linux/rcupdate.h>
struct srcu_struct_array {
int c[2];
@@ -60,18 +61,10 @@ int __init_srcu_struct(struct srcu_struct *sp, const char *name,
__init_srcu_struct((sp), #sp, &__srcu_key); \
})
-# define srcu_read_acquire(sp) \
- lock_acquire(&(sp)->dep_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define srcu_read_release(sp) \
- lock_release(&(sp)->dep_map, 1, _THIS_IP_)
-
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
int init_srcu_struct(struct srcu_struct *sp);
-# define srcu_read_acquire(sp) do { } while (0)
-# define srcu_read_release(sp) do { } while (0)
-
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
void cleanup_srcu_struct(struct srcu_struct *sp);
@@ -90,12 +83,28 @@ long srcu_batches_completed(struct srcu_struct *sp);
* read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC,
* this assumes we are in an SRCU read-side critical section unless it can
* prove otherwise.
+ *
+ * Note that if the CPU is in the idle loop from an RCU point of view
+ * (ie: in the section between rcu_idle_enter() and rcu_idle_exit())
+ * then srcu_read_lock_held() returns false even if the CPU did an srcu_read_lock().
+ * The reason for this is that RCU ignores CPUs that are in such a section,
+ * considering these as in extended quiescent state, so such a CPU is effectively
+ * never in an RCU read-side critical section regardless of what RCU primitives it
+ * invokes. This state of affairs is required: we need to keep an RCU-free
+ * window in idle where the CPU may possibly enter into low power mode. This way
+ * we can report an extended quiescent state to other CPUs that have started a
+ * grace period. Otherwise we would delay any grace period as long as we run in
+ * the idle task.
*/
static inline int srcu_read_lock_held(struct srcu_struct *sp)
{
- if (debug_locks)
- return lock_is_held(&sp->dep_map);
- return 1;
+ if (rcu_is_cpu_idle())
+ return 0;
+
+ if (!debug_locks)
+ return 1;
+
+ return lock_is_held(&sp->dep_map);
}
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
@@ -150,7 +159,7 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
{
int retval = __srcu_read_lock(sp);
- srcu_read_acquire(sp);
+ rcu_lock_acquire(&(sp)->dep_map);
return retval;
}
@@ -164,7 +173,7 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
__releases(sp)
{
- srcu_read_release(sp);
+ rcu_lock_release(&(sp)->dep_map);
__srcu_read_unlock(sp, idx);
}
--
1.7.5.4
* [PATCH 06/11] rcu: Make srcu_read_lock_held() call common lockdep-enabled function
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
` (4 preceding siblings ...)
2011-10-07 16:22 ` [PATCH 05/11] rcu: Warn when srcu_read_lock() is used in an extended quiescent state Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 07/11] nohz: Separate out irq exit and idle loop dyntick logic Frederic Weisbecker
` (5 subsequent siblings)
11 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: LKML, Frederic Weisbecker
A common debug_lockdep_rcu_enabled() function is used to check whether
RCU lockdep splats should be reported, but srcu_read_lock_held() does not
use it. This commit therefore brings srcu_read_lock_held() up to date.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
include/linux/srcu.h | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 553c003..05d34cf 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -84,6 +84,9 @@ long srcu_batches_completed(struct srcu_struct *sp);
* this assumes we are in an SRCU read-side critical section unless it can
* prove otherwise.
*
+ * Checks debug_lockdep_rcu_enabled() to prevent false positives during boot
+ * and while lockdep is disabled.
+ *
* Note that if the CPU is in the idle loop from an RCU point of view
* (ie: that we are in the section between rcu_idle_enter() and rcu_idle_exit())
* then srcu_read_lock_held() returns false even if the CPU did an srcu_read_lock().
@@ -101,7 +104,7 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
if (rcu_is_cpu_idle())
return 0;
- if (!debug_locks)
+ if (!debug_lockdep_rcu_enabled())
return 1;
return lock_is_held(&sp->dep_map);
--
1.7.5.4
* [PATCH 07/11] nohz: Separate out irq exit and idle loop dyntick logic
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
` (5 preceding siblings ...)
2011-10-07 16:22 ` [PATCH 06/11] rcu: Make srcu_read_lock_held() call common lockdep-enabled function Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 08/11] nohz: Allow rcu extended quiescent state handling separately from tick stop Frederic Weisbecker
` (4 subsequent siblings)
11 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Mike Frysinger, Guan Xuetao,
David Miller, Chris Metcalf, Hans-Christian Egtvedt, Ralf Baechle,
Ingo Molnar, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin,
Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt
The tick_nohz_stop_sched_tick() function, which tries to delay
the next timer tick as long as possible, can be called from two
places:
- From the idle loop to start the dyntick idle mode
- From interrupt exit if we have interrupted the dyntick
idle mode, so that we reprogram the next tick event in
case the irq changed some internal state that requires this
action.
There are only a few minor differences between the two cases,
handled inside that function and driven by the ts->inidle
per-CPU variable and the inidle parameter. Together these guarantee
that we only update the dyntick mode on irq exit if we actually
interrupted the dyntick idle mode, and that we enter the RCU extended
quiescent state only from idle loop entry.
Split this function into:
- tick_nohz_idle_enter(), which sets ts->inidle to 1, unconditionally
enters dynticks idle mode when possible, and enters the RCU
extended quiescent state.
- tick_nohz_irq_exit() which only updates the dynticks idle mode
when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).
To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
to tick_nohz_idle_exit().
This simplifies the code and micro-optimizes the irq exit path (no need
for local_irq_save there). This also prepares for the split between the
dynticks and rcu extended quiescent state logic. We'll need this split to
further fix illegal uses of RCU in extended quiescent states in the idle
loop.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
---
arch/arm/kernel/process.c | 4 +-
arch/avr32/kernel/process.c | 4 +-
arch/blackfin/kernel/process.c | 4 +-
arch/microblaze/kernel/process.c | 4 +-
arch/mips/kernel/process.c | 4 +-
arch/openrisc/kernel/idle.c | 4 +-
arch/powerpc/kernel/idle.c | 4 +-
arch/powerpc/platforms/iseries/setup.c | 8 ++--
arch/s390/kernel/process.c | 4 +-
arch/sh/kernel/idle.c | 4 +-
arch/sparc/kernel/process_64.c | 4 +-
arch/tile/kernel/process.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/unicore32/kernel/process.c | 4 +-
arch/x86/kernel/process_32.c | 4 +-
arch/x86/kernel/process_64.c | 4 +-
include/linux/tick.h | 13 +++--
kernel/softirq.c | 2 +-
kernel/time/tick-sched.c | 93 +++++++++++++++++++------------
19 files changed, 99 insertions(+), 77 deletions(-)
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 1a347f4..f9261d0 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -183,7 +183,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
leds_event(led_idle_start);
while (!need_resched()) {
#ifdef CONFIG_HOTPLUG_CPU
@@ -210,7 +210,7 @@ void cpu_idle(void)
}
}
leds_event(led_idle_end);
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index ef5a2a0..6ee7952 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched())
cpu_idle_sleep();
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 6a80a9e..7b141b5 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@ void cpu_idle(void)
#endif
if (!idle)
idle = default_idle;
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched())
idle();
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index dbb8124..6dc123e 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@ void cpu_idle(void)
if (!idle)
idle = default_idle;
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched())
idle();
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index b30cb25..d50a005 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched() && cpu_online(cpu)) {
#ifdef CONFIG_MIPS_MT_SMTC
extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
system_state == SYSTEM_BOOTING))
play_dead();
#endif
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/openrisc/kernel/idle.c b/arch/openrisc/kernel/idle.c
index d5bc5f8..fb6a9bf 100644
--- a/arch/openrisc/kernel/idle.c
+++ b/arch/openrisc/kernel/idle.c
@@ -51,7 +51,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched()) {
check_pgt_cache();
@@ -69,7 +69,7 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..878572f 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched() && !cpu_should_die()) {
ppc64_runlatch_off();
@@ -93,7 +93,7 @@ void cpu_idle(void)
HMT_medium();
ppc64_runlatch_on();
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
if (cpu_should_die())
cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index c25a081..e2f5fad 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
static void iseries_shared_idle(void)
{
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched() && !hvlpevent_is_pending()) {
local_irq_disable();
ppc64_runlatch_off();
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
}
ppc64_runlatch_on();
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
if (hvlpevent_is_pending())
process_iSeries_events();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
if (!need_resched()) {
while (!need_resched()) {
ppc64_runlatch_off();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
}
ppc64_runlatch_on();
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 541a750..db3e930 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@ static void default_idle(void)
void cpu_idle(void)
{
for (;;) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched())
default_idle();
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index db4ecd7..6015743 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -89,7 +89,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched()) {
check_pgt_cache();
@@ -111,7 +111,7 @@ void cpu_idle(void)
start_critical_timings();
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index c158a95..1235f63 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while(1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched() && !cpu_is_offline(cpu))
sparc64_yield(cpu);
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 9c45d8b..920e674 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched()) {
if (cpu_is_offline(cpu))
BUG(); /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@ void cpu_idle(void)
local_irq_enable();
current_thread_info()->status |= TS_POLLING;
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 21c1ae7..41acf59 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@ void default_idle(void)
if (need_resched())
schedule();
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
nsecs = disable_timer();
idle_sleep(nsecs);
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
}
}
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index ba401df..9999b9a 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched()) {
local_irq_disable();
stop_critical_timings();
@@ -63,7 +63,7 @@ void cpu_idle(void)
local_irq_enable();
start_critical_timings();
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7a3b651..ad93205 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -98,7 +98,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched()) {
check_pgt_cache();
@@ -114,7 +114,7 @@ void cpu_idle(void)
pm_idle();
start_critical_timings();
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index f693e44..9ca714e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,7 +121,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_idle_enter();
while (!need_resched()) {
rmb();
@@ -147,7 +147,7 @@ void cpu_idle(void)
__exit_idle();
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index ca40838..0df1d50 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -121,21 +121,22 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
#endif /* !CONFIG_GENERIC_CLOCKEVENTS */
# ifdef CONFIG_NO_HZ
-extern void tick_nohz_stop_sched_tick(int inidle);
-extern void tick_nohz_restart_sched_tick(void);
+extern void tick_nohz_idle_enter(void);
+extern void tick_nohz_idle_exit(void);
+extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
# else
-static inline void tick_nohz_stop_sched_tick(int inidle)
+static inline void tick_nohz_idle_enter(void)
{
- if (inidle)
- rcu_idle_enter();
+ rcu_idle_enter();
}
-static inline void tick_nohz_restart_sched_tick(void)
+static inline void tick_nohz_idle_exit(void)
{
rcu_idle_exit();
}
+
static inline ktime_t tick_nohz_get_sleep_length(void)
{
ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c0120d5..5c11dab 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -351,7 +351,7 @@ void irq_exit(void)
#ifdef CONFIG_NO_HZ
/* Make sure that timer wheel updates are propagated */
if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
- tick_nohz_stop_sched_tick(0);
+ tick_nohz_irq_exit();
#endif
preempt_enable_no_resched();
}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 4692907..52b7ace 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -246,42 +246,17 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
}
EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
-/**
- * tick_nohz_stop_sched_tick - stop the idle tick from the idle task
- *
- * When the next event is more than a tick into the future, stop the idle tick
- * Called either from the idle loop or from irq_exit() when an idle period was
- * just interrupted by an interrupt which did not cause a reschedule.
- */
-void tick_nohz_stop_sched_tick(int inidle)
+static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
{
- unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
- struct tick_sched *ts;
+ unsigned long seq, last_jiffies, next_jiffies, delta_jiffies;
ktime_t last_update, expires, now;
struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
u64 time_delta;
int cpu;
- local_irq_save(flags);
-
cpu = smp_processor_id();
ts = &per_cpu(tick_cpu_sched, cpu);
- /*
- * Call to tick_nohz_start_idle stops the last_update_time from being
- * updated. Thus, it must not be called in the event we are called from
- * irq_exit() with the prior state different than idle.
- */
- if (!inidle && !ts->inidle)
- goto end;
-
- /*
- * Set ts->inidle unconditionally. Even if the system did not
- * switch to NOHZ mode the cpu frequency governers rely on the
- * update of the idle time accounting in tick_nohz_start_idle().
- */
- ts->inidle = 1;
-
now = tick_nohz_start_idle(cpu, ts);
/*
@@ -297,10 +272,10 @@ void tick_nohz_stop_sched_tick(int inidle)
}
if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
- goto end;
+ return;
if (need_resched())
- goto end;
+ return;
if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
static int ratelimit;
@@ -310,7 +285,7 @@ void tick_nohz_stop_sched_tick(int inidle)
(unsigned int) local_softirq_pending());
ratelimit++;
}
- goto end;
+ return;
}
ts->idle_calls++;
@@ -442,10 +417,54 @@ out:
ts->next_jiffies = next_jiffies;
ts->last_jiffies = last_jiffies;
ts->sleep_length = ktime_sub(dev->next_event, now);
-end:
- if (inidle)
- rcu_idle_enter();
- local_irq_restore(flags);
+}
+
+/**
+ * tick_nohz_idle_enter - stop the idle tick from the idle task
+ *
+ * When the next event is more than a tick into the future, stop the idle tick
+ * Called when we start the idle loop.
+ * This also enters the RCU extended quiescent state so that this CPU
+ * no longer needs to take part in global grace period completion. This way
+ * the tick can be stopped safely, as we no longer need to report quiescent states.
+ */
+void tick_nohz_idle_enter(void)
+{
+ struct tick_sched *ts;
+
+ WARN_ON_ONCE(irqs_disabled());
+
+ local_irq_disable();
+
+ ts = &__get_cpu_var(tick_cpu_sched);
+ /*
+ * Set ts->inidle unconditionally. Even if the system did not
+ * switch to NOHZ mode, the cpu frequency governors rely on the
+ * update of the idle time accounting in tick_nohz_start_idle().
+ */
+ ts->inidle = 1;
+ tick_nohz_stop_sched_tick(ts);
+ rcu_idle_enter();
+
+ local_irq_enable();
+}
+
+/**
+ * tick_nohz_irq_exit - update next tick event from interrupt exit
+ *
+ * When an interrupt fires while we are idle and it doesn't cause
+ * a reschedule, it may still add, modify or delete a timer, enqueue
+ * an RCU callback, etc...
+ * So we need to re-calculate and reprogram the next tick event.
+ */
+void tick_nohz_irq_exit(void)
+{
+ struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+ if (!ts->inidle)
+ return;
+
+ tick_nohz_stop_sched_tick(ts);
}
/**
@@ -487,11 +506,13 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
}
/**
- * tick_nohz_restart_sched_tick - restart the idle tick from the idle task
+ * tick_nohz_idle_exit - restart the idle tick from the idle task
*
* Restart the idle tick when the CPU is woken up from idle
 * This also exits the RCU extended quiescent state. The CPU
 * can use RCU again after this function returns.
*/
-void tick_nohz_restart_sched_tick(void)
+void tick_nohz_idle_exit(void)
{
int cpu = smp_processor_id();
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
--
1.7.5.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 08/11] nohz: Allow rcu extended quiescent state handling separately from tick stop
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
` (6 preceding siblings ...)
2011-10-07 16:22 ` [PATCH 07/11] nohz: Separate out irq exit and idle loop dyntick logic Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-08 13:43 ` Frederic Weisbecker
2011-10-08 14:01 ` [PATCH 08/11 v2] " Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 09/11] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
` (3 subsequent siblings)
11 siblings, 2 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Mike Frysinger, Guan Xuetao,
David Miller, Chris Metcalf, Hans-Christian Egtvedt, Ralf Baechle,
Ingo Molnar, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin,
Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt
It is assumed that rcu won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as on x86-64 where we dereference the idle notifiers after
the tick is stopped.
To prepare for fixing this, add two new APIs:
tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
If no use of RCU is made in the idle loop between the
tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the arch
can instead call the new *_norcu() versions, so that it doesn't
need to call rcu_idle_enter() and rcu_idle_exit().
Otherwise the arch must call tick_nohz_idle_enter() and
tick_nohz_idle_exit() and also explicitly call:
- rcu_idle_enter() after its last use of RCU before the CPU is put
to sleep.
- rcu_idle_exit() before the first use of RCU after the CPU is woken
up.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
---
arch/arm/kernel/process.c | 4 +-
arch/avr32/kernel/process.c | 4 +-
arch/blackfin/kernel/process.c | 4 +-
arch/microblaze/kernel/process.c | 4 +-
arch/mips/kernel/process.c | 4 +-
arch/openrisc/kernel/idle.c | 4 +-
arch/powerpc/kernel/idle.c | 4 +-
arch/powerpc/platforms/iseries/setup.c | 8 +++---
arch/s390/kernel/process.c | 4 +-
arch/sh/kernel/idle.c | 4 +-
arch/sparc/kernel/process_64.c | 4 +-
arch/tile/kernel/process.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/unicore32/kernel/process.c | 4 +-
arch/x86/kernel/process_32.c | 4 +-
arch/x86/kernel/process_64.c | 4 +-
include/linux/tick.h | 45 +++++++++++++++++++++++++++++--
kernel/time/tick-sched.c | 25 +++++++++--------
18 files changed, 89 insertions(+), 49 deletions(-)
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index f9261d0..4f83362 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -183,7 +183,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
leds_event(led_idle_start);
while (!need_resched()) {
#ifdef CONFIG_HOTPLUG_CPU
@@ -210,7 +210,7 @@ void cpu_idle(void)
}
}
leds_event(led_idle_end);
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index 6ee7952..34c8c70 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched())
cpu_idle_sleep();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 7b141b5..57e0749 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@ void cpu_idle(void)
#endif
if (!idle)
idle = default_idle;
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched())
idle();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 6dc123e..c6ece38 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@ void cpu_idle(void)
if (!idle)
idle = default_idle;
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched())
idle();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d50a005..7df2ffc 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched() && cpu_online(cpu)) {
#ifdef CONFIG_MIPS_MT_SMTC
extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
system_state == SYSTEM_BOOTING))
play_dead();
#endif
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/openrisc/kernel/idle.c b/arch/openrisc/kernel/idle.c
index fb6a9bf..2e82cd0 100644
--- a/arch/openrisc/kernel/idle.c
+++ b/arch/openrisc/kernel/idle.c
@@ -51,7 +51,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
check_pgt_cache();
@@ -69,7 +69,7 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 878572f..2e782a3 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched() && !cpu_should_die()) {
ppc64_runlatch_off();
@@ -93,7 +93,7 @@ void cpu_idle(void)
HMT_medium();
ppc64_runlatch_on();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
if (cpu_should_die())
cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index e2f5fad..350a7fb 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
static void iseries_shared_idle(void)
{
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched() && !hvlpevent_is_pending()) {
local_irq_disable();
ppc64_runlatch_off();
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
}
ppc64_runlatch_on();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
if (hvlpevent_is_pending())
process_iSeries_events();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
if (!need_resched()) {
while (!need_resched()) {
ppc64_runlatch_off();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
}
ppc64_runlatch_on();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index db3e930..44028ae 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@ static void default_idle(void)
void cpu_idle(void)
{
for (;;) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched())
default_idle();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 6015743..ad58e75 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -89,7 +89,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
check_pgt_cache();
@@ -111,7 +111,7 @@ void cpu_idle(void)
start_critical_timings();
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index 1235f63..78b1bc0 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while(1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched() && !cpu_is_offline(cpu))
sparc64_yield(cpu);
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 920e674..53ac895 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
if (cpu_is_offline(cpu))
BUG(); /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@ void cpu_idle(void)
local_irq_enable();
current_thread_info()->status |= TS_POLLING;
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 41acf59..9e7176b 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@ void default_idle(void)
if (need_resched())
schedule();
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
nsecs = disable_timer();
idle_sleep(nsecs);
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
}
}
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index 9999b9a..095ff5a 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
local_irq_disable();
stop_critical_timings();
@@ -63,7 +63,7 @@ void cpu_idle(void)
local_irq_enable();
start_critical_timings();
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index ad93205..f311d096 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -98,7 +98,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
check_pgt_cache();
@@ -114,7 +114,7 @@ void cpu_idle(void)
pm_idle();
start_critical_timings();
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9ca714e..e72daf9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,7 +121,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
rmb();
@@ -147,7 +147,7 @@ void cpu_idle(void)
__exit_idle();
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 0df1d50..7224396 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -7,6 +7,7 @@
#define _LINUX_TICK_H
#include <linux/clockchips.h>
+#include <linux/irqflags.h>
#ifdef CONFIG_GENERIC_CLOCKEVENTS
@@ -121,18 +122,56 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
#endif /* !CONFIG_GENERIC_CLOCKEVENTS */
# ifdef CONFIG_NO_HZ
-extern void tick_nohz_idle_enter(void);
+extern void __tick_nohz_idle_enter(void);
+static inline void tick_nohz_idle_enter(void)
+{
+ local_irq_disable();
+ __tick_nohz_idle_enter();
+ local_irq_enable();
+}
extern void tick_nohz_idle_exit(void);
+
+/*
+ * Call this pair of functions if the arch doesn't make any use
+ * of RCU in between. Then you won't need to call rcu_idle_enter()
+ * and rcu_idle_exit().
+ * Otherwise you need to call tick_nohz_idle_enter() and tick_nohz_idle_exit()
+ * and explicitly tell RCU about the window around the place the CPU enters low
+ * power mode where no RCU use is made. This is done by calling rcu_idle_enter()
+ * after the last use of RCU before the CPU is put to sleep and by calling
+ * rcu_idle_exit() before the first use of RCU after the CPU woke up.
+ */
+static inline void tick_nohz_idle_enter_norcu(void)
+{
+ /*
+ * Call rcu_idle_enter() inside the irq disabled section even
+ * though rcu_idle_enter() disables irqs itself.
+ * This is just an optimization: it prevents an interrupt arriving
+ * between __tick_nohz_idle_enter() and rcu_idle_enter() from wasting
+ * time helping to complete a grace period while this CPU could
+ * already be in the extended quiescent state.
+ */
+ local_irq_disable();
+ __tick_nohz_idle_enter();
+ rcu_idle_enter();
+ local_irq_enable();
+}
+static inline void tick_nohz_idle_exit_norcu(void)
+{
+ rcu_idle_exit();
+ tick_nohz_idle_exit();
+}
extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
# else
-static inline void tick_nohz_idle_enter(void)
+static inline void tick_nohz_idle_enter(void) { }
+static inline void tick_nohz_idle_exit(void) { }
+static inline void tick_nohz_idle_enter_norcu(void)
{
rcu_idle_enter();
}
-static inline void tick_nohz_idle_exit(void)
+static inline void tick_nohz_idle_exit_norcu(void)
{
rcu_idle_exit();
}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 52b7ace..360d028 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -424,18 +424,22 @@ out:
*
* When the next event is more than a tick into the future, stop the idle tick
* Called when we start the idle loop.
- * This also enters the RCU extended quiescent state so that this CPU
- * no longer needs to take part in global grace period completion. This way
- * the tick can be stopped safely, as we no longer need to report quiescent states.
+ *
+ * If no use of RCU is made in the idle loop between
+ * tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, then
+ * tick_nohz_idle_enter_norcu() should be called instead and the arch
+ * doesn't need to call rcu_idle_enter() and rcu_idle_exit() explicitly.
+ *
+ * Otherwise the arch is responsible for calling:
+ *
+ * - rcu_idle_enter() after its last use of RCU before the CPU is put
+ * to sleep.
+ * - rcu_idle_exit() before the first use of RCU after the CPU is woken up.
*/
-void tick_nohz_idle_enter(void)
+void __tick_nohz_idle_enter(void)
{
struct tick_sched *ts;
- WARN_ON_ONCE(irqs_disabled());
-
- local_irq_disable();
-
ts = &__get_cpu_var(tick_cpu_sched);
/*
 * Set ts->inidle unconditionally. Even if the system did not
@@ -444,9 +448,6 @@ void tick_nohz_idle_enter(void)
*/
ts->inidle = 1;
tick_nohz_stop_sched_tick(ts);
- rcu_idle_enter();
-
- local_irq_enable();
}
/**
@@ -522,7 +523,7 @@ void tick_nohz_idle_exit(void)
ktime_t now;
local_irq_disable();
- rcu_idle_exit();
+
if (ts->idle_active || (ts->inidle && ts->tick_stopped))
now = ktime_get();
--
1.7.5.4
* [PATCH 09/11] x86: Enter rcu extended qs after idle notifier call
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
` (7 preceding siblings ...)
2011-10-07 16:22 ` [PATCH 08/11] nohz: Allow rcu extended quiescent state handling separately from tick stop Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 10/11] x86: Call idle notifier after irq_enter() Frederic Weisbecker
` (2 subsequent siblings)
11 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin
The idle notifier, called by enter_idle(), enters an rcu read
side critical section, but at that time we have already switched
into the RCU-idle window (rcu_idle_enter() has been called), and
it is illegal to use rcu_read_lock() in that state.
This results in rcu reporting its bad mood:
[ 1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[ 1.275635] Hardware name: AMD690VM-FMH
[ 1.275635] Modules linked in:
[ 1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
[ 1.275635] Call Trace:
[ 1.275635] [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[ 1.275635] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[ 1.275635] [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
[ 1.275635] [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
[ 1.275635] [<ffffffff810018a0>] enter_idle+0x20/0x30
[ 1.275635] [<ffffffff81001995>] cpu_idle+0xa5/0x110
[ 1.275635] [<ffffffff817a7465>] rest_init+0xe5/0x140
[ 1.275635] [<ffffffff817a73c8>] ? rest_init+0x48/0x140
[ 1.275635] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[ 1.275635] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[ 1.275635] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
[ 1.275635] ---[ end trace a22d306b065d4a66 ]---
Fix this by entering rcu extended quiescent state later, just before
the CPU goes to sleep.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/process_64.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e72daf9..4a1535a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,7 +121,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter_norcu();
+ tick_nohz_idle_enter();
while (!need_resched()) {
rmb();
@@ -137,8 +137,14 @@ void cpu_idle(void)
enter_idle();
/* Don't trace irqs off for idle */
stop_critical_timings();
+
+ /* enter_idle() needs rcu for notifiers */
+ rcu_idle_enter();
+
if (cpuidle_idle_call())
pm_idle();
+
+ rcu_idle_exit();
start_critical_timings();
/* In many cases the interrupt that ended idle
@@ -147,7 +153,7 @@ void cpu_idle(void)
__exit_idle();
}
- tick_nohz_idle_exit_norcu();
+ tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
--
1.7.5.4
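The ordering at stake in the patch above can be modeled as a tiny userspace sketch (toy stand-in functions, not the kernel implementations): enter_idle() fires atomic notifiers that take rcu_read_lock(), so it must run while RCU is still watching, and rcu_idle_enter() has to be deferred until just before the CPU sleeps.

```c
#include <assert.h>

/* Toy flag standing in for RCU's per-CPU "watching" state. */
static int rcu_watching = 1;

static void rcu_idle_enter(void) { rcu_watching = 0; }
static void rcu_idle_exit(void)  { rcu_watching = 1; }

/* enter_idle() runs atomic notifiers under rcu_read_lock();
 * return 1 if that use of RCU was legal, 0 if it ran in the EQS. */
static int enter_idle(void) { return rcu_watching; }

/* Old ordering: tick_nohz_idle_enter_norcu() entered the EQS before
 * enter_idle(), so the notifier chain ran RCU-idle -> splat. */
static int idle_loop_buggy(void)
{
    rcu_watching = 1;
    rcu_idle_enter();
    int ok = enter_idle();
    rcu_idle_exit();
    return ok;
}

/* Fixed ordering: notifiers first, EQS just before the CPU sleeps. */
static int idle_loop_fixed(void)
{
    rcu_watching = 1;
    int ok = enter_idle();
    rcu_idle_enter();
    /* cpuidle_idle_call()/pm_idle() would put the CPU to sleep here */
    rcu_idle_exit();
    return ok;
}
```

The model only checks the relative order of the calls; everything else about RCU's EQS tracking is abstracted away.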
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 10/11] x86: Call idle notifier after irq_enter()
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
` (8 preceding siblings ...)
2011-10-07 16:22 ` [PATCH 09/11] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 11/11] rcu: Fix early call to rcu_idle_enter() Frederic Weisbecker
2011-10-07 23:32 ` [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Paul E. McKenney
11 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Andy Henroid
Interrupts notify the idle exit state before calling irq_enter().
But the notifier code calls rcu_read_lock() and this is not
allowed while RCU is in an extended quiescent state. We need
to wait for irq_enter() -> rcu_idle_exit() to be called before
running the notifiers, otherwise RCU gets grumpy:
[ 0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[ 0.099991] Hardware name: AMD690VM-FMH
[ 0.099991] Modules linked in:
[ 0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
[ 0.099991] Call Trace:
[ 0.099991] <IRQ> [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[ 0.099991] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[ 0.099991] [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
[ 0.099991] [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
[ 0.099991] [<ffffffff81001873>] exit_idle+0x43/0x50
[ 0.099991] [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
[ 0.099991] [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
[ 0.099991] <EOI> [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
[ 0.099991] [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
[ 0.099991] [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
[ 0.099991] [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
[ 0.099991] [<ffffffff810019a0>] cpu_idle+0xb0/0x110
[ 0.099991] [<ffffffff817a7505>] rest_init+0xe5/0x140
[ 0.099991] [<ffffffff817a7468>] ? rest_init+0x48/0x140
[ 0.099991] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[ 0.099991] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[ 0.099991] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Henroid <andrew.d.henroid@intel.com>
---
arch/x86/kernel/apic/apic.c | 6 +++---
arch/x86/kernel/apic/io_apic.c | 2 +-
arch/x86/kernel/cpu/mcheck/therm_throt.c | 2 +-
arch/x86/kernel/cpu/mcheck/threshold.c | 2 +-
arch/x86/kernel/irq.c | 6 +++---
5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 52fa563..54e3472 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -857,8 +857,8 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
* Besides, if we don't timer interrupts ignore the global
* interrupt lock, which is the WrongThing (tm) to do.
*/
- exit_idle();
irq_enter();
+ exit_idle();
local_apic_timer_interrupt();
irq_exit();
@@ -1791,8 +1791,8 @@ void smp_spurious_interrupt(struct pt_regs *regs)
{
u32 v;
- exit_idle();
irq_enter();
+ exit_idle();
/*
* Check if this really is a spurious interrupt and ACK it
* if it is a vectored one. Just in case...
@@ -1828,8 +1828,8 @@ void smp_error_interrupt(struct pt_regs *regs)
"Illegal register address", /* APIC Error Bit 7 */
};
- exit_idle();
irq_enter();
+ exit_idle();
/* First tickle the hardware, only then report what went on. -- REW */
v0 = apic_read(APIC_ESR);
apic_write(APIC_ESR, 0);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 8eb863e..3f4a706 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2316,8 +2316,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
unsigned vector, me;
ack_APIC_irq();
- exit_idle();
irq_enter();
+ exit_idle();
me = smp_processor_id();
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 27c6251..f6bbc64 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -396,8 +396,8 @@ static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
{
- exit_idle();
irq_enter();
+ exit_idle();
inc_irq_stat(irq_thermal_count);
smp_thermal_vector();
irq_exit();
diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
index d746df2..aa578ca 100644
--- a/arch/x86/kernel/cpu/mcheck/threshold.c
+++ b/arch/x86/kernel/cpu/mcheck/threshold.c
@@ -19,8 +19,8 @@ void (*mce_threshold_vector)(void) = default_threshold_interrupt;
asmlinkage void smp_threshold_interrupt(void)
{
- exit_idle();
irq_enter();
+ exit_idle();
inc_irq_stat(irq_threshold_count);
mce_threshold_vector();
irq_exit();
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 6c0802e..73cf928 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -180,8 +180,8 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
unsigned vector = ~regs->orig_ax;
unsigned irq;
- exit_idle();
irq_enter();
+ exit_idle();
irq = __this_cpu_read(vector_irq[vector]);
@@ -208,10 +208,10 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
ack_APIC_irq();
- exit_idle();
-
irq_enter();
+ exit_idle();
+
inc_irq_stat(x86_platform_ipis);
if (x86_platform_ipi_callback)
--
1.7.5.4
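The handler reordering above can be sketched as a small userspace toy model (stand-in functions, not the kernel's): irq_enter() is what pulls an interrupted idle CPU out of RCU's extended quiescent state, so the exit_idle() notifiers are only legal after it.

```c
#include <assert.h>

/* Toy flag for RCU's per-CPU "watching" state. */
static int rcu_watching;

/* irq_enter() -> rcu_idle_exit(): RCU starts watching again. */
static void irq_enter(void) { rcu_watching = 1; }
static void irq_exit(void)  { rcu_watching = 0; }

/* exit_idle() calls the notifier chain under rcu_read_lock();
 * returns 1 if RCU was watching at that point, 0 otherwise. */
static int exit_idle(void) { return rcu_watching; }

/* Old handler shape: exit_idle() before irq_enter(), so notifiers run
 * while the interrupted idle CPU is still RCU-idle. */
static int handler_buggy(void)
{
    rcu_watching = 0;          /* we interrupted an RCU-idle window */
    int ok = exit_idle();
    irq_enter();
    irq_exit();
    return ok;
}

/* Fixed handler shape, as in the patch: irq_enter() first. */
static int handler_fixed(void)
{
    rcu_watching = 0;
    irq_enter();
    int ok = exit_idle();
    irq_exit();
    return ok;
}
```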
* [PATCH 11/11] rcu: Fix early call to rcu_idle_enter()
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
` (9 preceding siblings ...)
2011-10-07 16:22 ` [PATCH 10/11] x86: Call idle notifier after irq_enter() Frederic Weisbecker
@ 2011-10-07 16:22 ` Frederic Weisbecker
2011-10-07 23:32 ` [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Paul E. McKenney
11 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-07 16:22 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
Peter Zijlstra
On the irq exit path, tick_nohz_irq_exit()
may raise a softirq, which leads to the wakeup
path and to select_task_rq_fair(), which uses RCU
to iterate over the scheduler domains.
This is an illegal use of RCU because we may be in an RCU
extended quiescent state if we interrupted an RCU-idle
window in the idle loop:
[ 132.978883] ===============================
[ 132.978883] [ INFO: suspicious RCU usage. ]
[ 132.978883] -------------------------------
[ 132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
[ 132.978883]
[ 132.978883] other info that might help us debug this:
[ 132.978883]
[ 132.978883]
[ 132.978883] rcu_scheduler_active = 1, debug_locks = 0
[ 132.978883] RCU used illegally from extended quiescent state!
[ 132.978883] 2 locks held by swapper/0:
[ 132.978883] #0: (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
[ 132.978883] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
[ 132.978883]
[ 132.978883] stack backtrace:
[ 132.978883] Pid: 0, comm: swapper Tainted: G W 3.0.0+ #178
[ 132.978883] Call Trace:
[ 132.978883] <IRQ> [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
[ 132.978883] [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
[ 132.978883] [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
[ 132.978883] [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
[ 132.978883] [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[ 132.978883] [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
[ 132.978883] [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
[ 132.978883] [<ffffffff8105aa35>] wake_up_process+0x15/0x20
[ 132.978883] [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
[ 132.978883] [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
[ 132.978883] [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
[ 132.978883] [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
[ 132.978883] [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
[ 132.978883] [<ffffffff810694f2>] irq_exit+0xd2/0x100
[ 132.978883] [<ffffffff81829e96>] do_IRQ+0x66/0xe0
[ 132.978883] [<ffffffff81820d53>] common_interrupt+0x13/0x13
[ 132.978883] <EOI> [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
[ 132.978883] [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[ 132.978883] [<ffffffff810144ea>] default_idle+0xba/0x370
[ 132.978883] [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
[ 132.978883] [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
[ 132.978883] [<ffffffff817f217f>] rest_init+0xef/0x150
[ 132.978883] [<ffffffff817f20e2>] ? rest_init+0x52/0x150
[ 132.978883] [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
[ 132.978883] [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
[ 132.978883] [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112
Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/softirq.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 5c11dab..cca87e9 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -347,12 +347,12 @@ void irq_exit(void)
if (!in_interrupt() && local_softirq_pending())
invoke_softirq();
- rcu_idle_enter();
#ifdef CONFIG_NO_HZ
/* Make sure that timer wheel updates are propagated */
if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
tick_nohz_irq_exit();
#endif
+ rcu_idle_enter();
preempt_enable_no_resched();
}
--
1.7.5.4
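As with the previous two patches, the irq_exit() reordering can be modeled as a userspace sketch (toy stand-ins, not the kernel functions): tick_nohz_irq_exit() may raise a softirq whose wakeup path uses RCU, so rcu_idle_enter() must only run after it.

```c
#include <assert.h>

/* Toy flag for RCU's per-CPU "watching" state. */
static int rcu_watching;

static void rcu_idle_enter(void) { rcu_watching = 0; }

/* tick_nohz_irq_exit() may raise a softirq whose wakeup path uses RCU;
 * returns 1 if that RCU read side was legal. */
static int tick_nohz_irq_exit(void) { return rcu_watching; }

/* Old irq_exit() shape: EQS entered before the tick logic runs. */
static int irq_exit_buggy(void)
{
    rcu_watching = 1;            /* RCU watching while in the irq */
    rcu_idle_enter();            /* EQS entered too early */
    return tick_nohz_irq_exit(); /* softirq wakeup uses RCU: illegal */
}

/* Fixed irq_exit() shape, as in the patch. */
static int irq_exit_fixed(void)
{
    rcu_watching = 1;
    int ok = tick_nohz_irq_exit(); /* RCU still watching: legal */
    rcu_idle_enter();              /* enter the EQS afterwards */
    return ok;
}
```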
* Re: [PATCH 02/11] rcu: Inform the user about extended quiescent state on PROVE_RCU warning
2011-10-07 16:22 ` [PATCH 02/11] rcu: Inform the user about extended quiescent state on PROVE_RCU warning Frederic Weisbecker
@ 2011-10-07 22:47 ` Paul E. McKenney
0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2011-10-07 22:47 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Lai Jiangshan
On Fri, Oct 07, 2011 at 06:22:01PM +0200, Frederic Weisbecker wrote:
> Inform the user if an RCU usage error is detected by lockdep while in
> an extended quiescent state (in this case, the RCU-free window in idle).
> This is accomplished by adding a line to the RCU lockdep splat indicating
> whether or not the splat occurred in extended quiescent state.
>
> Uses of RCU from within extended quiescent state mode are totally ignored
> by RCU, hence the importance of this diagnostic.
Looks good!
At some point, we may need to add an additional argument to
lockdep_rcu_suspicious() to suppress this new message, but let's
worry about that when and if we come to it.
(An example would be an rcu_dereference_protected() that checks for
a lock being held, and thus can be legally called from an extended
quiescent state. But the message would print only if the lock was
not held, so this could not be a false positive, only a possible
source of confusion. Not worth worrying about yet.)
Thanx, Paul
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
> kernel/lockdep.c | 22 ++++++++++++++++++++++
> 1 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 1e48f1c..8873f6e 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -4026,6 +4026,28 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> printk("%s:%d %s!\n", file, line, s);
> printk("\nother info that might help us debug this:\n\n");
> printk("\nrcu_scheduler_active = %d, debug_locks = %d\n", rcu_scheduler_active, debug_locks);
> +
> + /*
> + * If a CPU is in the RCU-free window in idle (ie: in the section
> + * between rcu_idle_enter() and rcu_idle_exit(), then RCU
> + * considers that CPU to be in an "extended quiescent state",
> + * which means that RCU will be completely ignoring that CPU.
> + * Therefore, rcu_read_lock() and friends have absolutely no
> + * effect on a CPU running in that state. In other words, even if
> + * such an RCU-idle CPU has called rcu_read_lock(), RCU might well
> + * delete data structures out from under it. RCU really has no
> + * choice here: we need to keep an RCU-free window in idle where
> + * the CPU may possibly enter into low power mode. This way, other
> + * CPUs that started a grace period can notice our extended quiescent
> + * state. Otherwise we would delay every grace period for as long as
> + * we run in the idle task.
> + *
> + * So complain bitterly if someone does call rcu_read_lock(),
> + * rcu_read_lock_bh() and so on from extended quiescent states.
> + */
> + if (rcu_is_cpu_idle())
> + printk("RCU used illegally from extended quiescent state!\n");
> +
> lockdep_print_held_locks(curr);
> printk("\nstack backtrace:\n");
> dump_stack();
> --
> 1.7.5.4
>
* Re: [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
` (10 preceding siblings ...)
2011-10-07 16:22 ` [PATCH 11/11] rcu: Fix early call to rcu_idle_enter() Frederic Weisbecker
@ 2011-10-07 23:32 ` Paul E. McKenney
11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2011-10-07 23:32 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
H. Peter Anvin, Andy Henroid, Mike Frysinger, Guan Xuetao,
David Miller, Chris Metcalf, Hans-Christian Egtvedt, Ralf Baechle,
Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt,
Lai Jiangshan
On Fri, Oct 07, 2011 at 06:21:59PM +0200, Frederic Weisbecker wrote:
> Hi Paul,
>
> Here is the rebase. And it's more than just a rebase given the
> semantical changes after your patch that tracks idleness. And also
> because of the new API naming (with a new pair of tick_nohz_idle_enter()
> tick_nohz_idle_exit() with *_norcu suffixes).
>
> I have reused and updated some comments (that you made on earlier
> versions of my patchset) that assumed that extended qs = dynticks
> idle, as it's not always true anymore. I've tried to bring a new
> wording to express what we are dealing with: "RCU-free window in idle"
> or "RCU-idle window". I let you update the comments if you think
> that's confusing or too scarce.
>
> It has passed several hours of rcutorture with NO_HZ && SMP, and at least
> booted fine with all your configs.
Thank you, Frederic! I have queued these locally and will start
testing them.
Thanx, Paul
> Frederic Weisbecker (9):
> rcu: Detect illegal rcu dereference in extended quiescent state
> rcu: Inform the user about extended quiescent state on PROVE_RCU warning
> rcu: Warn when rcu_read_lock() is used in extended quiescent state
> rcu: Make srcu_read_lock_held() call common lockdep-enabled function
> nohz: Separate out irq exit and idle loop dyntick logic
> nohz: Allow rcu extended quiescent state handling seperately from tick stop
> x86: Enter rcu extended qs after idle notifier call
> x86: Call idle notifier after irq_enter()
> rcu: Fix early call to rcu_idle_enter()
>
> Paul E. McKenney (2):
> rcu: Remove one layer of abstraction from PROVE_RCU checking
> rcu: Warn when srcu_read_lock() is used in an extended quiescent state
>
> arch/arm/kernel/process.c | 4 +-
> arch/avr32/kernel/process.c | 4 +-
> arch/blackfin/kernel/process.c | 4 +-
> arch/microblaze/kernel/process.c | 4 +-
> arch/mips/kernel/process.c | 4 +-
> arch/openrisc/kernel/idle.c | 4 +-
> arch/powerpc/kernel/idle.c | 4 +-
> arch/powerpc/platforms/iseries/setup.c | 8 +-
> arch/s390/kernel/process.c | 4 +-
> arch/sh/kernel/idle.c | 4 +-
> arch/sparc/kernel/process_64.c | 4 +-
> arch/tile/kernel/process.c | 4 +-
> arch/um/kernel/process.c | 4 +-
> arch/unicore32/kernel/process.c | 4 +-
> arch/x86/kernel/apic/apic.c | 6 +-
> arch/x86/kernel/apic/io_apic.c | 2 +-
> arch/x86/kernel/cpu/mcheck/therm_throt.c | 2 +-
> arch/x86/kernel/cpu/mcheck/threshold.c | 2 +-
> arch/x86/kernel/irq.c | 6 +-
> arch/x86/kernel/process_32.c | 4 +-
> arch/x86/kernel/process_64.c | 10 +++-
> include/linux/rcupdate.h | 74 +++++++++++++++--------
> include/linux/srcu.h | 38 ++++++++----
> include/linux/tick.h | 52 ++++++++++++++--
> kernel/lockdep.c | 22 +++++++
> kernel/rcupdate.c | 4 +
> kernel/rcutiny.c | 1 +
> kernel/rcutree.c | 19 +++++-
> kernel/softirq.c | 4 +-
> kernel/time/tick-sched.c | 96 ++++++++++++++++++-----------
> 30 files changed, 273 insertions(+), 129 deletions(-)
>
> --
> 1.7.5.4
>
* Re: [PATCH 08/11] nohz: Allow rcu extended quiescent state handling seperately from tick stop
2011-10-07 16:22 ` [PATCH 08/11] nohz: Allow rcu extended quiescent state handling seperately from tick stop Frederic Weisbecker
@ 2011-10-08 13:43 ` Frederic Weisbecker
2011-10-08 14:01 ` [PATCH 08/11 v2] " Frederic Weisbecker
1 sibling, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-08 13:43 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt
On Fri, Oct 07, 2011 at 06:22:07PM +0200, Frederic Weisbecker wrote:
> diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
> index e2f5fad..350a7fb 100644
> --- a/arch/powerpc/platforms/iseries/setup.c
> +++ b/arch/powerpc/platforms/iseries/setup.c
> @@ -562,7 +562,7 @@ static void yield_shared_processor(void)
> static void iseries_shared_idle(void)
> {
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu_nohz();
Ah too much nohz there. Seems that I skidded while walking
with my sed.
I'm sending you an updated version of that patch.
* [PATCH 08/11 v2] nohz: Allow rcu extended quiescent state handling seperately from tick stop
2011-10-07 16:22 ` [PATCH 08/11] nohz: Allow rcu extended quiescent state handling seperately from tick stop Frederic Weisbecker
2011-10-08 13:43 ` Frederic Weisbecker
@ 2011-10-08 14:01 ` Frederic Weisbecker
2011-10-13 6:57 ` Paul E. McKenney
1 sibling, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-08 14:01 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Mike Frysinger, Guan Xuetao,
David Miller, Chris Metcalf, Hans-Christian Egtvedt, Ralf Baechle,
Ingo Molnar, Peter Zijlstra, Thomas Gleixner, H. Peter Anvin,
Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt
It is assumed that RCU won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as on x86-64, where we dereference the idle notifiers after
the tick is stopped.
To prepare for fixing this, add two new APIs:
tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
If no use of RCU is made in the idle loop between the
tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the arch
must instead call the new *_norcu() versions so that it doesn't
need to call rcu_idle_enter() and rcu_idle_exit().
Otherwise the arch must call tick_nohz_idle_enter() and
tick_nohz_idle_exit() and also explicitly call:
- rcu_idle_enter() after its last use of RCU before the CPU is put
to sleep.
- rcu_idle_exit() before the first use of RCU after the CPU is woken
up.
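The contract described above can be sketched in userspace C (a toy model whose tick/RCU functions are stand-ins that only check balancing, not the kernel implementation): the _norcu pair folds the RCU calls in, while the plain pair leaves them to the arch so RCU can still be used after the tick stops.

```c
#include <assert.h>

static int rcu_idle_depth;   /* toy counter for the RCU-idle window */
static int depth_at_sleep;   /* window state observed at sleep time */

static void rcu_idle_enter(void) { rcu_idle_depth++; }
static void rcu_idle_exit(void)  { rcu_idle_depth--; }
static void __tick_nohz_idle_enter(void) { /* stop the tick */ }
static void tick_nohz_idle_exit(void)    { /* restart the tick */ }

/* The _norcu variants fold the RCU calls into the tick calls. */
static void tick_nohz_idle_enter_norcu(void)
{
    __tick_nohz_idle_enter();
    rcu_idle_enter();
}
static void tick_nohz_idle_exit_norcu(void)
{
    rcu_idle_exit();
    tick_nohz_idle_exit();
}

static void cpu_sleep(void) { depth_at_sleep = rcu_idle_depth; }

/* Arch with no RCU use in its idle loop: one balanced _norcu pair. */
static int simple_arch_idle(void)
{
    tick_nohz_idle_enter_norcu();
    cpu_sleep();
    tick_nohz_idle_exit_norcu();
    return depth_at_sleep;
}

/* Arch that still uses RCU after the tick stops (e.g. idle notifiers):
 * plain tick calls plus explicit rcu_idle_enter()/rcu_idle_exit(). */
static int notifier_arch_idle(void)
{
    __tick_nohz_idle_enter();
    /* last uses of RCU (notifiers, ...) go here, tick already off */
    rcu_idle_enter();
    cpu_sleep();
    rcu_idle_exit();
    /* first uses of RCU after wakeup go here */
    tick_nohz_idle_exit();
    return depth_at_sleep;
}
```

Either pattern leaves the CPU inside exactly one RCU-idle window while it sleeps and back outside it afterwards; the difference is only who makes the rcu_idle_enter()/rcu_idle_exit() calls.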
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
---
arch/arm/kernel/process.c | 4 +-
arch/avr32/kernel/process.c | 4 +-
arch/blackfin/kernel/process.c | 4 +-
arch/microblaze/kernel/process.c | 4 +-
arch/mips/kernel/process.c | 4 +-
arch/openrisc/kernel/idle.c | 4 +-
arch/powerpc/kernel/idle.c | 4 +-
arch/powerpc/platforms/iseries/setup.c | 8 +++---
arch/s390/kernel/process.c | 4 +-
arch/sh/kernel/idle.c | 4 +-
arch/sparc/kernel/process_64.c | 4 +-
arch/tile/kernel/process.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/unicore32/kernel/process.c | 4 +-
arch/x86/kernel/process_32.c | 4 +-
arch/x86/kernel/process_64.c | 4 +-
include/linux/tick.h | 45 +++++++++++++++++++++++++++++--
kernel/time/tick-sched.c | 25 +++++++++--------
18 files changed, 89 insertions(+), 49 deletions(-)
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index f9261d0..4f83362 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -183,7 +183,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
leds_event(led_idle_start);
while (!need_resched()) {
#ifdef CONFIG_HOTPLUG_CPU
@@ -210,7 +210,7 @@ void cpu_idle(void)
}
}
leds_event(led_idle_end);
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index 6ee7952..34c8c70 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched())
cpu_idle_sleep();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 7b141b5..57e0749 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@ void cpu_idle(void)
#endif
if (!idle)
idle = default_idle;
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched())
idle();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 6dc123e..c6ece38 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@ void cpu_idle(void)
if (!idle)
idle = default_idle;
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched())
idle();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d50a005..7df2ffc 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched() && cpu_online(cpu)) {
#ifdef CONFIG_MIPS_MT_SMTC
extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
system_state == SYSTEM_BOOTING))
play_dead();
#endif
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/openrisc/kernel/idle.c b/arch/openrisc/kernel/idle.c
index fb6a9bf..2e82cd0 100644
--- a/arch/openrisc/kernel/idle.c
+++ b/arch/openrisc/kernel/idle.c
@@ -51,7 +51,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
check_pgt_cache();
@@ -69,7 +69,7 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 878572f..2e782a3 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched() && !cpu_should_die()) {
ppc64_runlatch_off();
@@ -93,7 +93,7 @@ void cpu_idle(void)
HMT_medium();
ppc64_runlatch_on();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
if (cpu_should_die())
cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index e2f5fad..77ff6eb 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
static void iseries_shared_idle(void)
{
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched() && !hvlpevent_is_pending()) {
local_irq_disable();
ppc64_runlatch_off();
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
}
ppc64_runlatch_on();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
if (hvlpevent_is_pending())
process_iSeries_events();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
if (!need_resched()) {
while (!need_resched()) {
ppc64_runlatch_off();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
}
ppc64_runlatch_on();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index db3e930..44028ae 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@ static void default_idle(void)
void cpu_idle(void)
{
for (;;) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched())
default_idle();
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 6015743..ad58e75 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -89,7 +89,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
check_pgt_cache();
@@ -111,7 +111,7 @@ void cpu_idle(void)
start_critical_timings();
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index 1235f63..78b1bc0 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);
while(1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched() && !cpu_is_offline(cpu))
sparc64_yield(cpu);
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 920e674..53ac895 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
if (cpu_is_offline(cpu))
BUG(); /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@ void cpu_idle(void)
local_irq_enable();
current_thread_info()->status |= TS_POLLING;
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 41acf59..9e7176b 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@ void default_idle(void)
if (need_resched())
schedule();
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
nsecs = disable_timer();
idle_sleep(nsecs);
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
}
}
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index 9999b9a..095ff5a 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
local_irq_disable();
stop_critical_timings();
@@ -63,7 +63,7 @@ void cpu_idle(void)
local_irq_enable();
start_critical_timings();
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index ad93205..f311d096 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -98,7 +98,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
check_pgt_cache();
@@ -114,7 +114,7 @@ void cpu_idle(void)
pm_idle();
start_critical_timings();
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9ca714e..e72daf9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,7 +121,7 @@ void cpu_idle(void)
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_idle_enter();
+ tick_nohz_idle_enter_norcu();
while (!need_resched()) {
rmb();
@@ -147,7 +147,7 @@ void cpu_idle(void)
__exit_idle();
}
- tick_nohz_idle_exit();
+ tick_nohz_idle_exit_norcu();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 0df1d50..7224396 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -7,6 +7,7 @@
#define _LINUX_TICK_H
#include <linux/clockchips.h>
+#include <linux/irqflags.h>
#ifdef CONFIG_GENERIC_CLOCKEVENTS
@@ -121,18 +122,56 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
#endif /* !CONFIG_GENERIC_CLOCKEVENTS */
# ifdef CONFIG_NO_HZ
-extern void tick_nohz_idle_enter(void);
+extern void __tick_nohz_idle_enter(void);
+static inline void tick_nohz_idle_enter(void)
+{
+ local_irq_disable();
+ __tick_nohz_idle_enter();
+ local_irq_enable();
+}
extern void tick_nohz_idle_exit(void);
+
+/*
+ * Call this pair of functions if the arch doesn't make any use
+ * of RCU in-between. Then you won't need to call rcu_idle_enter()
+ * and rcu_idle_exit().
+ * Otherwise you need to call tick_nohz_idle_enter() and tick_nohz_idle_exit()
+ * and explicitly tell RCU about the window, around the CPU's entry into low
+ * power mode, in which no RCU use is made. This is done by calling
+ * rcu_idle_enter() after the last use of RCU before the CPU is put to sleep
+ * and rcu_idle_exit() before the first use of RCU after the CPU woke up.
+ */
+static inline void tick_nohz_idle_enter_norcu(void)
+{
+ /*
+ * Call rcu_idle_enter() inside the irq disabled section even
+ * though rcu_idle_enter() disables irqs itself.
+ * This is just an optimization: it prevents an interrupt arriving in
+ * between from wasting time helping to complete a grace period while
+ * this CPU could already be in an RCU-idle window.
+ */
+ local_irq_disable();
+ __tick_nohz_idle_enter();
+ rcu_idle_enter();
+ local_irq_enable();
+}
+static inline void tick_nohz_idle_exit_norcu(void)
+{
+ rcu_idle_exit();
+ tick_nohz_idle_exit();
+}
extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
# else
-static inline void tick_nohz_idle_enter(void)
+static inline void tick_nohz_idle_enter(void) { }
+static inline void tick_nohz_idle_exit(void) { }
+static inline void tick_nohz_idle_enter_norcu(void)
{
rcu_idle_enter();
}
-static inline void tick_nohz_idle_exit(void)
+static inline void tick_nohz_idle_exit_norcu(void)
{
rcu_idle_exit();
}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 52b7ace..360d028 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -424,18 +424,22 @@ out:
*
* When the next event is more than a tick into the future, stop the idle tick
* Called when we start the idle loop.
- * This also enters into RCU extended quiescent state so that this CPU doesn't
- * need anymore to be part of any global grace period completion. This way
- * the tick can be stopped safely as we don't need to report quiescent states.
+ *
+ * If no use of RCU is made in the idle loop between
+ * tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, then
+ * tick_nohz_idle_enter_norcu() should be called instead and the arch
+ * doesn't need to call rcu_idle_enter() and rcu_idle_exit() explicitly.
+ *
+ * Otherwise the arch is responsible for calling:
+ *
+ * - rcu_idle_enter() after its last use of RCU before the CPU is put
+ * to sleep.
+ * - rcu_idle_exit() before the first use of RCU after the CPU is woken up.
*/
-void tick_nohz_idle_enter(void)
+void __tick_nohz_idle_enter(void)
{
struct tick_sched *ts;
- WARN_ON_ONCE(irqs_disabled());
-
- local_irq_disable();
-
ts = &__get_cpu_var(tick_cpu_sched);
/*
* set ts->inidle unconditionally. even if the system did not
@@ -444,9 +448,6 @@ void tick_nohz_idle_enter(void)
*/
ts->inidle = 1;
tick_nohz_stop_sched_tick(ts);
- rcu_idle_enter();
-
- local_irq_enable();
}
/**
@@ -522,7 +523,7 @@ void tick_nohz_idle_exit(void)
ktime_t now;
local_irq_disable();
- rcu_idle_exit();
+
if (ts->idle_active || (ts->inidle && ts->tick_stopped))
now = ktime_get();
--
1.7.5.4
* Re: [PATCH 08/11 v2] nohz: Allow rcu extended quiescent state handling separately from tick stop
2011-10-08 14:01 ` [PATCH 08/11 v2] " Frederic Weisbecker
@ 2011-10-13 6:57 ` Paul E. McKenney
2011-10-13 7:03 ` Paul E. McKenney
0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2011-10-13 6:57 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt
On Sat, Oct 08, 2011 at 04:01:00PM +0200, Frederic Weisbecker wrote:
> It is assumed that RCU won't be used once we switch to tickless
> mode and until we restart the tick. However this is not always
> true, as on x86-64 where we dereference the idle notifiers after
> the tick is stopped.
>
> To prepare for fixing this, add two new APIs:
> tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
>
> If no use of RCU is made in the idle loop between
> tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the arch
> must instead call the new *_norcu() versions so that it doesn't
> need to call rcu_idle_enter() and rcu_idle_exit().
>
> Otherwise the arch must call tick_nohz_idle_enter() and
> tick_nohz_idle_exit() and also explicitly call:
>
> - rcu_idle_enter() after its last use of RCU before the CPU is put
> to sleep.
> - rcu_idle_exit() before the first use of RCU after the CPU is woken
> up.
Thank you, Frederic! I have queued this to replace the earlier
version. The set is available on branch rcu/dyntick of
https://github.com/paulmckrcu/linux
Thanx, Paul
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: David Miller <davem@davemloft.net>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Paul Mundt <lethal@linux-sh.org>
> ---
> arch/arm/kernel/process.c | 4 +-
> arch/avr32/kernel/process.c | 4 +-
> arch/blackfin/kernel/process.c | 4 +-
> arch/microblaze/kernel/process.c | 4 +-
> arch/mips/kernel/process.c | 4 +-
> arch/openrisc/kernel/idle.c | 4 +-
> arch/powerpc/kernel/idle.c | 4 +-
> arch/powerpc/platforms/iseries/setup.c | 8 +++---
> arch/s390/kernel/process.c | 4 +-
> arch/sh/kernel/idle.c | 4 +-
> arch/sparc/kernel/process_64.c | 4 +-
> arch/tile/kernel/process.c | 4 +-
> arch/um/kernel/process.c | 4 +-
> arch/unicore32/kernel/process.c | 4 +-
> arch/x86/kernel/process_32.c | 4 +-
> arch/x86/kernel/process_64.c | 4 +-
> include/linux/tick.h | 45 +++++++++++++++++++++++++++++--
> kernel/time/tick-sched.c | 25 +++++++++--------
> 18 files changed, 89 insertions(+), 49 deletions(-)
>
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index f9261d0..4f83362 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -183,7 +183,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> leds_event(led_idle_start);
> while (!need_resched()) {
> #ifdef CONFIG_HOTPLUG_CPU
> @@ -210,7 +210,7 @@ void cpu_idle(void)
> }
> }
> leds_event(led_idle_end);
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
> index 6ee7952..34c8c70 100644
> --- a/arch/avr32/kernel/process.c
> +++ b/arch/avr32/kernel/process.c
> @@ -34,10 +34,10 @@ void cpu_idle(void)
> {
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched())
> cpu_idle_sleep();
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
> index 7b141b5..57e0749 100644
> --- a/arch/blackfin/kernel/process.c
> +++ b/arch/blackfin/kernel/process.c
> @@ -88,10 +88,10 @@ void cpu_idle(void)
> #endif
> if (!idle)
> idle = default_idle;
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched())
> idle();
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
> index 6dc123e..c6ece38 100644
> --- a/arch/microblaze/kernel/process.c
> +++ b/arch/microblaze/kernel/process.c
> @@ -103,10 +103,10 @@ void cpu_idle(void)
> if (!idle)
> idle = default_idle;
>
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched())
> idle();
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
>
> preempt_enable_no_resched();
> schedule();
> diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
> index d50a005..7df2ffc 100644
> --- a/arch/mips/kernel/process.c
> +++ b/arch/mips/kernel/process.c
> @@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched() && cpu_online(cpu)) {
> #ifdef CONFIG_MIPS_MT_SMTC
> extern void smtc_idle_loop_hook(void);
> @@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
> system_state == SYSTEM_BOOTING))
> play_dead();
> #endif
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/openrisc/kernel/idle.c b/arch/openrisc/kernel/idle.c
> index fb6a9bf..2e82cd0 100644
> --- a/arch/openrisc/kernel/idle.c
> +++ b/arch/openrisc/kernel/idle.c
> @@ -51,7 +51,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
>
> while (!need_resched()) {
> check_pgt_cache();
> @@ -69,7 +69,7 @@ void cpu_idle(void)
> set_thread_flag(TIF_POLLING_NRFLAG);
> }
>
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index 878572f..2e782a3 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -56,7 +56,7 @@ void cpu_idle(void)
>
> set_thread_flag(TIF_POLLING_NRFLAG);
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched() && !cpu_should_die()) {
> ppc64_runlatch_off();
>
> @@ -93,7 +93,7 @@ void cpu_idle(void)
>
> HMT_medium();
> ppc64_runlatch_on();
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> if (cpu_should_die())
> cpu_die();
> diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
> index e2f5fad..77ff6eb 100644
> --- a/arch/powerpc/platforms/iseries/setup.c
> +++ b/arch/powerpc/platforms/iseries/setup.c
> @@ -562,7 +562,7 @@ static void yield_shared_processor(void)
> static void iseries_shared_idle(void)
> {
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched() && !hvlpevent_is_pending()) {
> local_irq_disable();
> ppc64_runlatch_off();
> @@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
> }
>
> ppc64_runlatch_on();
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
>
> if (hvlpevent_is_pending())
> process_iSeries_events();
> @@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
> set_thread_flag(TIF_POLLING_NRFLAG);
>
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> if (!need_resched()) {
> while (!need_resched()) {
> ppc64_runlatch_off();
> @@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
> }
>
> ppc64_runlatch_on();
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
> index db3e930..44028ae 100644
> --- a/arch/s390/kernel/process.c
> +++ b/arch/s390/kernel/process.c
> @@ -90,10 +90,10 @@ static void default_idle(void)
> void cpu_idle(void)
> {
> for (;;) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched())
> default_idle();
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
> index 6015743..ad58e75 100644
> --- a/arch/sh/kernel/idle.c
> +++ b/arch/sh/kernel/idle.c
> @@ -89,7 +89,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
>
> while (!need_resched()) {
> check_pgt_cache();
> @@ -111,7 +111,7 @@ void cpu_idle(void)
> start_critical_timings();
> }
>
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
> index 1235f63..78b1bc0 100644
> --- a/arch/sparc/kernel/process_64.c
> +++ b/arch/sparc/kernel/process_64.c
> @@ -95,12 +95,12 @@ void cpu_idle(void)
> set_thread_flag(TIF_POLLING_NRFLAG);
>
> while(1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
>
> while (!need_resched() && !cpu_is_offline(cpu))
> sparc64_yield(cpu);
>
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
>
> preempt_enable_no_resched();
>
> diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
> index 920e674..53ac895 100644
> --- a/arch/tile/kernel/process.c
> +++ b/arch/tile/kernel/process.c
> @@ -85,7 +85,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched()) {
> if (cpu_is_offline(cpu))
> BUG(); /* no HOTPLUG_CPU */
> @@ -105,7 +105,7 @@ void cpu_idle(void)
> local_irq_enable();
> current_thread_info()->status |= TS_POLLING;
> }
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
> index 41acf59..9e7176b 100644
> --- a/arch/um/kernel/process.c
> +++ b/arch/um/kernel/process.c
> @@ -245,10 +245,10 @@ void default_idle(void)
> if (need_resched())
> schedule();
>
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> nsecs = disable_timer();
> idle_sleep(nsecs);
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> }
> }
>
> diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
> index 9999b9a..095ff5a 100644
> --- a/arch/unicore32/kernel/process.c
> +++ b/arch/unicore32/kernel/process.c
> @@ -55,7 +55,7 @@ void cpu_idle(void)
> {
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched()) {
> local_irq_disable();
> stop_critical_timings();
> @@ -63,7 +63,7 @@ void cpu_idle(void)
> local_irq_enable();
> start_critical_timings();
> }
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index ad93205..f311d096 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -98,7 +98,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched()) {
>
> check_pgt_cache();
> @@ -114,7 +114,7 @@ void cpu_idle(void)
> pm_idle();
> start_critical_timings();
> }
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 9ca714e..e72daf9 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -121,7 +121,7 @@ void cpu_idle(void)
>
> /* endless idle loop with no priority at all */
> while (1) {
> - tick_nohz_idle_enter();
> + tick_nohz_idle_enter_norcu();
> while (!need_resched()) {
>
> rmb();
> @@ -147,7 +147,7 @@ void cpu_idle(void)
> __exit_idle();
> }
>
> - tick_nohz_idle_exit();
> + tick_nohz_idle_exit_norcu();
> preempt_enable_no_resched();
> schedule();
> preempt_disable();
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 0df1d50..7224396 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -7,6 +7,7 @@
> #define _LINUX_TICK_H
>
> #include <linux/clockchips.h>
> +#include <linux/irqflags.h>
>
> #ifdef CONFIG_GENERIC_CLOCKEVENTS
>
> @@ -121,18 +122,56 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
> #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
>
> # ifdef CONFIG_NO_HZ
> -extern void tick_nohz_idle_enter(void);
> +extern void __tick_nohz_idle_enter(void);
> +static inline void tick_nohz_idle_enter(void)
> +{
> + local_irq_disable();
> + __tick_nohz_idle_enter();
> + local_irq_enable();
> +}
> extern void tick_nohz_idle_exit(void);
> +
> +/*
> + * Call this pair of functions if the arch makes no use of RCU
> + * in between them. You then won't need to call rcu_idle_enter() and
> + * rcu_idle_exit().
> + * Otherwise you need to call tick_nohz_idle_enter() and tick_nohz_idle_exit()
> + * and explicitly tell RCU about the window around the low power mode entry
> + * during which no use of RCU is made. This is done by calling rcu_idle_enter()
> + * after the last use of RCU before the CPU is put to sleep and by calling
> + * rcu_idle_exit() before the first use of RCU after the CPU wakes up.
> + */
> +static inline void tick_nohz_idle_enter_norcu(void)
> +{
> + /*
> + * Call rcu_idle_enter() inside the irq disabled section even
> + * though rcu_idle_enter() disables irqs itself.
> + * This is just an optimization: it prevents an interrupt arriving in
> + * between from wasting time helping to complete a grace period while
> + * this CPU could already be in an RCU-idle window.
> + */
> + local_irq_disable();
> + __tick_nohz_idle_enter();
> + rcu_idle_enter();
> + local_irq_enable();
> +}
> +static inline void tick_nohz_idle_exit_norcu(void)
> +{
> + rcu_idle_exit();
> + tick_nohz_idle_exit();
> +}
> extern void tick_nohz_irq_exit(void);
> extern ktime_t tick_nohz_get_sleep_length(void);
> extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> # else
> -static inline void tick_nohz_idle_enter(void)
> +static inline void tick_nohz_idle_enter(void) { }
> +static inline void tick_nohz_idle_exit(void) { }
> +static inline void tick_nohz_idle_enter_norcu(void)
> {
> rcu_idle_enter();
> }
> -static inline void tick_nohz_idle_exit(void)
> +static inline void tick_nohz_idle_exit_norcu(void)
> {
> rcu_idle_exit();
> }
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 52b7ace..360d028 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -424,18 +424,22 @@ out:
> *
> * When the next event is more than a tick into the future, stop the idle tick
> * Called when we start the idle loop.
> - * This also enters into RCU extended quiescent state so that this CPU doesn't
> - * need anymore to be part of any global grace period completion. This way
> - * the tick can be stopped safely as we don't need to report quiescent states.
> + *
> + * If no use of RCU is made in the idle loop between
> + * tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, then
> + * tick_nohz_idle_enter_norcu() should be called instead and the arch
> + * doesn't need to call rcu_idle_enter() and rcu_idle_exit() explicitly.
> + *
> + * Otherwise the arch is responsible for calling:
> + *
> + * - rcu_idle_enter() after its last use of RCU before the CPU is put
> + * to sleep.
> + * - rcu_idle_exit() before the first use of RCU after the CPU is woken up.
> */
> -void tick_nohz_idle_enter(void)
> +void __tick_nohz_idle_enter(void)
> {
> struct tick_sched *ts;
>
> - WARN_ON_ONCE(irqs_disabled());
> -
> - local_irq_disable();
> -
> ts = &__get_cpu_var(tick_cpu_sched);
> /*
> * set ts->inidle unconditionally. even if the system did not
> @@ -444,9 +448,6 @@ void tick_nohz_idle_enter(void)
> */
> ts->inidle = 1;
> tick_nohz_stop_sched_tick(ts);
> - rcu_idle_enter();
> -
> - local_irq_enable();
> }
>
> /**
> @@ -522,7 +523,7 @@ void tick_nohz_idle_exit(void)
> ktime_t now;
>
> local_irq_disable();
> - rcu_idle_exit();
> +
> if (ts->idle_active || (ts->inidle && ts->tick_stopped))
> now = ktime_get();
>
> --
> 1.7.5.4
>
* Re: [PATCH 08/11 v2] nohz: Allow rcu extended quiescent state handling separately from tick stop
2011-10-13 6:57 ` Paul E. McKenney
@ 2011-10-13 7:03 ` Paul E. McKenney
2011-10-13 12:50 ` Frederic Weisbecker
0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2011-10-13 7:03 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt
On Wed, Oct 12, 2011 at 11:57:52PM -0700, Paul E. McKenney wrote:
> On Sat, Oct 08, 2011 at 04:01:00PM +0200, Frederic Weisbecker wrote:
> > It is assumed that RCU won't be used once we switch to tickless
> > mode and until we restart the tick. However this is not always
> > true, as on x86-64 where we dereference the idle notifiers after
> > the tick is stopped.
> >
> > To prepare for fixing this, add two new APIs:
> > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> >
> > If no use of RCU is made in the idle loop between
> > tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the arch
> > must instead call the new *_norcu() versions so that it doesn't
> > need to call rcu_idle_enter() and rcu_idle_exit().
> >
> > Otherwise the arch must call tick_nohz_idle_enter() and
> > tick_nohz_idle_exit() and also explicitly call:
> >
> > - rcu_idle_enter() after its last use of RCU before the CPU is put
> > to sleep.
> > - rcu_idle_exit() before the first use of RCU after the CPU is woken
> > up.
>
> Thank you, Frederic! I have queued this to replace the earlier
> version. The set is available on branch rcu/dyntick of
>
> https://github.com/paulmckrcu/linux
Which reminds me... About the ultimate objective, getting tick-free
operation. (Or, for the guys who want to eliminate the tick entirely,
shutting up the hrtimer stuff that they want to replace it with.)
I believe that you will then need to have two levels of not-in-dynticks
for processes, one for idle vs. not and another for when a process
switches from user-space to kernel execution. Correct, or am I
confused?
The reason I ask is that commit e11f5981 currently only allows one
level of not-in-dynticks for processes. It is easy to add another
level, but thought I should check beforehand.
Thanx, Paul
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Mike Frysinger <vapier@gentoo.org>
> > Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> > Cc: David Miller <davem@davemloft.net>
> > Cc: Chris Metcalf <cmetcalf@tilera.com>
> > Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
> > Cc: Ralf Baechle <ralf@linux-mips.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Russell King <linux@arm.linux.org.uk>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Paul Mundt <lethal@linux-sh.org>
> > ---
> > arch/arm/kernel/process.c | 4 +-
> > arch/avr32/kernel/process.c | 4 +-
> > arch/blackfin/kernel/process.c | 4 +-
> > arch/microblaze/kernel/process.c | 4 +-
> > arch/mips/kernel/process.c | 4 +-
> > arch/openrisc/kernel/idle.c | 4 +-
> > arch/powerpc/kernel/idle.c | 4 +-
> > arch/powerpc/platforms/iseries/setup.c | 8 +++---
> > arch/s390/kernel/process.c | 4 +-
> > arch/sh/kernel/idle.c | 4 +-
> > arch/sparc/kernel/process_64.c | 4 +-
> > arch/tile/kernel/process.c | 4 +-
> > arch/um/kernel/process.c | 4 +-
> > arch/unicore32/kernel/process.c | 4 +-
> > arch/x86/kernel/process_32.c | 4 +-
> > arch/x86/kernel/process_64.c | 4 +-
> > include/linux/tick.h | 45 +++++++++++++++++++++++++++++--
> > kernel/time/tick-sched.c | 25 +++++++++--------
> > 18 files changed, 89 insertions(+), 49 deletions(-)
> >
> > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> > index f9261d0..4f83362 100644
> > --- a/arch/arm/kernel/process.c
> > +++ b/arch/arm/kernel/process.c
> > @@ -183,7 +183,7 @@ void cpu_idle(void)
> >
> > /* endless idle loop with no priority at all */
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > leds_event(led_idle_start);
> > while (!need_resched()) {
> > #ifdef CONFIG_HOTPLUG_CPU
> > @@ -210,7 +210,7 @@ void cpu_idle(void)
> > }
> > }
> > leds_event(led_idle_end);
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
> > index 6ee7952..34c8c70 100644
> > --- a/arch/avr32/kernel/process.c
> > +++ b/arch/avr32/kernel/process.c
> > @@ -34,10 +34,10 @@ void cpu_idle(void)
> > {
> > /* endless idle loop with no priority at all */
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched())
> > cpu_idle_sleep();
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
> > index 7b141b5..57e0749 100644
> > --- a/arch/blackfin/kernel/process.c
> > +++ b/arch/blackfin/kernel/process.c
> > @@ -88,10 +88,10 @@ void cpu_idle(void)
> > #endif
> > if (!idle)
> > idle = default_idle;
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched())
> > idle();
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
> > index 6dc123e..c6ece38 100644
> > --- a/arch/microblaze/kernel/process.c
> > +++ b/arch/microblaze/kernel/process.c
> > @@ -103,10 +103,10 @@ void cpu_idle(void)
> > if (!idle)
> > idle = default_idle;
> >
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched())
> > idle();
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> >
> > preempt_enable_no_resched();
> > schedule();
> > diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
> > index d50a005..7df2ffc 100644
> > --- a/arch/mips/kernel/process.c
> > +++ b/arch/mips/kernel/process.c
> > @@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
> >
> > /* endless idle loop with no priority at all */
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched() && cpu_online(cpu)) {
> > #ifdef CONFIG_MIPS_MT_SMTC
> > extern void smtc_idle_loop_hook(void);
> > @@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
> > system_state == SYSTEM_BOOTING))
> > play_dead();
> > #endif
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/openrisc/kernel/idle.c b/arch/openrisc/kernel/idle.c
> > index fb6a9bf..2e82cd0 100644
> > --- a/arch/openrisc/kernel/idle.c
> > +++ b/arch/openrisc/kernel/idle.c
> > @@ -51,7 +51,7 @@ void cpu_idle(void)
> >
> > /* endless idle loop with no priority at all */
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> >
> > while (!need_resched()) {
> > check_pgt_cache();
> > @@ -69,7 +69,7 @@ void cpu_idle(void)
> > set_thread_flag(TIF_POLLING_NRFLAG);
> > }
> >
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> > index 878572f..2e782a3 100644
> > --- a/arch/powerpc/kernel/idle.c
> > +++ b/arch/powerpc/kernel/idle.c
> > @@ -56,7 +56,7 @@ void cpu_idle(void)
> >
> > set_thread_flag(TIF_POLLING_NRFLAG);
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched() && !cpu_should_die()) {
> > ppc64_runlatch_off();
> >
> > @@ -93,7 +93,7 @@ void cpu_idle(void)
> >
> > HMT_medium();
> > ppc64_runlatch_on();
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > if (cpu_should_die())
> > cpu_die();
> > diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
> > index e2f5fad..77ff6eb 100644
> > --- a/arch/powerpc/platforms/iseries/setup.c
> > +++ b/arch/powerpc/platforms/iseries/setup.c
> > @@ -562,7 +562,7 @@ static void yield_shared_processor(void)
> > static void iseries_shared_idle(void)
> > {
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched() && !hvlpevent_is_pending()) {
> > local_irq_disable();
> > ppc64_runlatch_off();
> > @@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
> > }
> >
> > ppc64_runlatch_on();
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> >
> > if (hvlpevent_is_pending())
> > process_iSeries_events();
> > @@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
> > set_thread_flag(TIF_POLLING_NRFLAG);
> >
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > if (!need_resched()) {
> > while (!need_resched()) {
> > ppc64_runlatch_off();
> > @@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
> > }
> >
> > ppc64_runlatch_on();
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
> > index db3e930..44028ae 100644
> > --- a/arch/s390/kernel/process.c
> > +++ b/arch/s390/kernel/process.c
> > @@ -90,10 +90,10 @@ static void default_idle(void)
> > void cpu_idle(void)
> > {
> > for (;;) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched())
> > default_idle();
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
> > index 6015743..ad58e75 100644
> > --- a/arch/sh/kernel/idle.c
> > +++ b/arch/sh/kernel/idle.c
> > @@ -89,7 +89,7 @@ void cpu_idle(void)
> >
> > /* endless idle loop with no priority at all */
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> >
> > while (!need_resched()) {
> > check_pgt_cache();
> > @@ -111,7 +111,7 @@ void cpu_idle(void)
> > start_critical_timings();
> > }
> >
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
> > index 1235f63..78b1bc0 100644
> > --- a/arch/sparc/kernel/process_64.c
> > +++ b/arch/sparc/kernel/process_64.c
> > @@ -95,12 +95,12 @@ void cpu_idle(void)
> > set_thread_flag(TIF_POLLING_NRFLAG);
> >
> > while(1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> >
> > while (!need_resched() && !cpu_is_offline(cpu))
> > sparc64_yield(cpu);
> >
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> >
> > preempt_enable_no_resched();
> >
> > diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
> > index 920e674..53ac895 100644
> > --- a/arch/tile/kernel/process.c
> > +++ b/arch/tile/kernel/process.c
> > @@ -85,7 +85,7 @@ void cpu_idle(void)
> >
> > /* endless idle loop with no priority at all */
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched()) {
> > if (cpu_is_offline(cpu))
> > BUG(); /* no HOTPLUG_CPU */
> > @@ -105,7 +105,7 @@ void cpu_idle(void)
> > local_irq_enable();
> > current_thread_info()->status |= TS_POLLING;
> > }
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
> > index 41acf59..9e7176b 100644
> > --- a/arch/um/kernel/process.c
> > +++ b/arch/um/kernel/process.c
> > @@ -245,10 +245,10 @@ void default_idle(void)
> > if (need_resched())
> > schedule();
> >
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > nsecs = disable_timer();
> > idle_sleep(nsecs);
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > }
> > }
> >
> > diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
> > index 9999b9a..095ff5a 100644
> > --- a/arch/unicore32/kernel/process.c
> > +++ b/arch/unicore32/kernel/process.c
> > @@ -55,7 +55,7 @@ void cpu_idle(void)
> > {
> > /* endless idle loop with no priority at all */
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched()) {
> > local_irq_disable();
> > stop_critical_timings();
> > @@ -63,7 +63,7 @@ void cpu_idle(void)
> > local_irq_enable();
> > start_critical_timings();
> > }
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> > index ad93205..f311d096 100644
> > --- a/arch/x86/kernel/process_32.c
> > +++ b/arch/x86/kernel/process_32.c
> > @@ -98,7 +98,7 @@ void cpu_idle(void)
> >
> > /* endless idle loop with no priority at all */
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched()) {
> >
> > check_pgt_cache();
> > @@ -114,7 +114,7 @@ void cpu_idle(void)
> > pm_idle();
> > start_critical_timings();
> > }
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> > index 9ca714e..e72daf9 100644
> > --- a/arch/x86/kernel/process_64.c
> > +++ b/arch/x86/kernel/process_64.c
> > @@ -121,7 +121,7 @@ void cpu_idle(void)
> >
> > /* endless idle loop with no priority at all */
> > while (1) {
> > - tick_nohz_idle_enter();
> > + tick_nohz_idle_enter_norcu();
> > while (!need_resched()) {
> >
> > rmb();
> > @@ -147,7 +147,7 @@ void cpu_idle(void)
> > __exit_idle();
> > }
> >
> > - tick_nohz_idle_exit();
> > + tick_nohz_idle_exit_norcu();
> > preempt_enable_no_resched();
> > schedule();
> > preempt_disable();
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index 0df1d50..7224396 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -7,6 +7,7 @@
> > #define _LINUX_TICK_H
> >
> > #include <linux/clockchips.h>
> > +#include <linux/irqflags.h>
> >
> > #ifdef CONFIG_GENERIC_CLOCKEVENTS
> >
> > @@ -121,18 +122,56 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
> > #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
> >
> > # ifdef CONFIG_NO_HZ
> > -extern void tick_nohz_idle_enter(void);
> > +extern void __tick_nohz_idle_enter(void);
> > +static inline void tick_nohz_idle_enter(void)
> > +{
> > + local_irq_disable();
> > + __tick_nohz_idle_enter();
> > + local_irq_enable();
> > +}
> > extern void tick_nohz_idle_exit(void);
> > +
> > +/*
> > + * Call this pair of function if the arch doesn't make any use
> > + * of RCU in-between. You won't need to call rcu_idle_enter() and
> > + * rcu_idle_exit().
> > + * Otherwise you need to call tick_nohz_idle_enter() and tick_nohz_idle_exit()
> > + * and explicitly tell RCU about the window around the place the CPU enters low
> > + * power mode where no RCU use is made. This is done by calling rcu_idle_enter()
> > + * after the last use of RCU before the CPU is put to sleep and by calling
> > + * rcu_idle_exit() before the first use of RCU after the CPU woke up.
> > + */
> > +static inline void tick_nohz_idle_enter_norcu(void)
> > +{
> > + /*
> > + * Also call rcu_idle_enter() in the irq disabled section even
> > + * if it disables irq itself.
> > + * Just an optimization that prevents from an interrupt happening
> > + * between it and __tick_nohz_idle_enter() to lose time to help completing
> > + * a grace period while we could be in extended grace period already.
> > + */
> > + local_irq_disable();
> > + __tick_nohz_idle_enter();
> > + rcu_idle_enter();
> > + local_irq_enable();
> > +}
> > +static inline void tick_nohz_idle_exit_norcu(void)
> > +{
> > + rcu_idle_exit();
> > + tick_nohz_idle_exit();
> > +}
> > extern void tick_nohz_irq_exit(void);
> > extern ktime_t tick_nohz_get_sleep_length(void);
> > extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> > extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> > # else
> > -static inline void tick_nohz_idle_enter(void)
> > +static inline void tick_nohz_idle_enter(void) { }
> > +static inline void tick_nohz_idle_exit(void) { }
> > +static inline void tick_nohz_idle_enter_norcu(void)
> > {
> > rcu_idle_enter();
> > }
> > -static inline void tick_nohz_idle_exit(void)
> > +static inline void tick_nohz_idle_exit_norcu(void)
> > {
> > rcu_idle_exit();
> > }
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 52b7ace..360d028 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -424,18 +424,22 @@ out:
> > *
> > * When the next event is more than a tick into the future, stop the idle tick
> > * Called when we start the idle loop.
> > - * This also enters into RCU extended quiescent state so that this CPU doesn't
> > - * need anymore to be part of any global grace period completion. This way
> > - * the tick can be stopped safely as we don't need to report quiescent states.
> > + *
> > + * If no use of RCU is made in the idle loop between
> > + * tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, then
> > + * tick_nohz_idle_enter_norcu() should be called instead and the arch
> > + * doesn't need to call rcu_idle_enter() and rcu_idle_exit() explicitly.
> > + *
> > + * Otherwise the arch is responsible of calling:
> > + *
> > + * - rcu_idle_enter() after its last use of RCU before the CPU is put
> > + * to sleep.
> > + * - rcu_idle_exit() before the first use of RCU after the CPU is woken up.
> > */
> > -void tick_nohz_idle_enter(void)
> > +void __tick_nohz_idle_enter(void)
> > {
> > struct tick_sched *ts;
> >
> > - WARN_ON_ONCE(irqs_disabled());
> > -
> > - local_irq_disable();
> > -
> > ts = &__get_cpu_var(tick_cpu_sched);
> > /*
> > * set ts->inidle unconditionally. even if the system did not
> > @@ -444,9 +448,6 @@ void tick_nohz_idle_enter(void)
> > */
> > ts->inidle = 1;
> > tick_nohz_stop_sched_tick(ts);
> > - rcu_idle_enter();
> > -
> > - local_irq_enable();
> > }
> >
> > /**
> > @@ -522,7 +523,7 @@ void tick_nohz_idle_exit(void)
> > ktime_t now;
> >
> > local_irq_disable();
> > - rcu_idle_exit();
> > +
> > if (ts->idle_active || (ts->inidle && ts->tick_stopped))
> > now = ktime_get();
> >
> > --
> > 1.7.5.4
> >
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 08/11 v2] nohz: Allow rcu extended quiescent state handling seperately from tick stop
2011-10-13 7:03 ` Paul E. McKenney
@ 2011-10-13 12:50 ` Frederic Weisbecker
2011-10-13 22:51 ` Paul E. McKenney
0 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-13 12:50 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt
On Thu, Oct 13, 2011 at 12:03:57AM -0700, Paul E. McKenney wrote:
> On Wed, Oct 12, 2011 at 11:57:52PM -0700, Paul E. McKenney wrote:
> > On Sat, Oct 08, 2011 at 04:01:00PM +0200, Frederic Weisbecker wrote:
> > > It is assumed that rcu won't be used once we switch to tickless
> > > mode and until we restart the tick. However this is not always
> > > true, as in x86-64 where we dereference the idle notifiers after
> > > the tick is stopped.
> > >
> > > To prepare for fixing this, add two new APIs:
> > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > >
> > > If no use of RCU is made in the idle loop between
> > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > must instead call the new *_norcu() version such that the arch doesn't
> > > need to call rcu_idle_enter() and rcu_idle_exit().
> > >
> > > Otherwise the arch must call tick_nohz_enter_idle() and
> > > tick_nohz_exit_idle() and also call explicitly:
> > >
> > > - rcu_idle_enter() after its last use of RCU before the CPU is put
> > > to sleep.
> > > - rcu_idle_exit() before the first use of RCU after the CPU is woken
> > > up.
> >
> > Thank you, Frederic! I have queued this to replace the earlier
> > version. The set is available on branch rcu/dyntick of
> >
> > https://github.com/paulmckrcu/linux
>
> Which reminds me... About the ultimate objective, getting tick-free
> operation. (Or, for the guys who want to eliminate the tick entirely,
> shutting up the hrtimer stuff that they want to replace it with.)
>
> I believe that you will then need to have two levels of not-in-dynticks
> for processes, one for idle vs. not and another for when a process
> switches from user-space to kernel execution. Correct, or am I
> confused?
>
> The reason I ask is that commit e11f5981 currently only allows one
> level of not-in-dynticks for processes. It is easy to add another
> level, but thought I should check beforehand.
Hmm, yeah looking at that patch, it's going to be hard to have a nesting
that looks like:
rcu_irq_enter();
rcu_user_enter();
rcu_irq_exit(); <-- with effective extended quiescent state starting there
I also need to be able to call rcu_user_enter() from a non-irq path.
I don't truly understand the problem with the usermode helpers that
mess up the dynticks counts. Maybe we can somehow fix it differently,
from the offending callsite?
* Re: [PATCH 08/11 v2] nohz: Allow rcu extended quiescent state handling seperately from tick stop
2011-10-13 12:50 ` Frederic Weisbecker
@ 2011-10-13 22:51 ` Paul E. McKenney
2011-10-14 12:08 ` Frederic Weisbecker
0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2011-10-13 22:51 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt
On Thu, Oct 13, 2011 at 02:50:20PM +0200, Frederic Weisbecker wrote:
> On Thu, Oct 13, 2011 at 12:03:57AM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 12, 2011 at 11:57:52PM -0700, Paul E. McKenney wrote:
> > > On Sat, Oct 08, 2011 at 04:01:00PM +0200, Frederic Weisbecker wrote:
> > > > It is assumed that rcu won't be used once we switch to tickless
> > > > mode and until we restart the tick. However this is not always
> > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > the tick is stopped.
> > > >
> > > > To prepare for fixing this, add two new APIs:
> > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > >
> > > > If no use of RCU is made in the idle loop between
> > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > >
> > > > Otherwise the arch must call tick_nohz_enter_idle() and
> > > > tick_nohz_exit_idle() and also call explicitly:
> > > >
> > > > - rcu_idle_enter() after its last use of RCU before the CPU is put
> > > > to sleep.
> > > > - rcu_idle_exit() before the first use of RCU after the CPU is woken
> > > > up.
> > >
> > > Thank you, Frederic! I have queued this to replace the earlier
> > > version. The set is available on branch rcu/dyntick of
> > >
> > > https://github.com/paulmckrcu/linux
> >
> > Which reminds me... About the ultimate objective, getting tick-free
> > operation. (Or, for the guys who want to eliminate the tick entirely,
> > shutting up the hrtimer stuff that they want to replace it with.)
> >
> > I believe that you will then need to have two levels of not-in-dynticks
> > for processes, one for idle vs. not and another for when a process
> > switches from user-space to kernel execution. Correct, or am I
> > confused?
> >
> > The reason I ask is that commit e11f5981 currently only allows one
> > level of not-in-dynticks for processes. It is easy to add another
> > level, but thought I should check beforehand.
>
> Hmm, yeah looking at that patch, it's going to be hard to have a nesting
> that looks like:
>
> rcu_irq_enter();
> rcu_user_enter();
> rcu_irq_exit(); <-- with effective extended quiescent state starting there
OK, so the idea here is that there have been two runnable processes on
the current CPU, but during the irq handler one of them moves or some
such? If so, how about a rcu_user_enter_fromirq() that sets the counter
to 1 so that the rcu_irq_exit() cleans up properly? If need be, I could
of course provide an argument to allow you to specify the count offset.
Or am I misunderstanding the problem?
> I also need to be able to call rcu_user_enter() from non-irq path.
Then rcu_user_enter_fromirq() would be for the irq path and
rcu_user_enter() from the non-irq path.
Would that work for you?
> I don't truly understand the problem of the usermode helpers that
> mess up the dynticks counts. May be we can somewhow fix it differently
> from the offending callsite?
I tried a few approaches along these lines, but there were way too
many opportunities for interruption and preemption along the way.
The problem is that unless the fixup happens under a no-preempt
region of code that includes the rcu_irq_enter() or rcu_irq_exit()
call (as the case may be), then you end up messing up the idle-depth
count of two CPUs rather than just one. :-(
But maybe I am missing something -- suggestions more than welcome!
Thanx, Paul
* Re: [PATCH 08/11 v2] nohz: Allow rcu extended quiescent state handling seperately from tick stop
2011-10-13 22:51 ` Paul E. McKenney
@ 2011-10-14 12:08 ` Frederic Weisbecker
2011-10-14 17:00 ` Paul E. McKenney
0 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-14 12:08 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt
On Thu, Oct 13, 2011 at 03:51:36PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 13, 2011 at 02:50:20PM +0200, Frederic Weisbecker wrote:
> > On Thu, Oct 13, 2011 at 12:03:57AM -0700, Paul E. McKenney wrote:
> > > On Wed, Oct 12, 2011 at 11:57:52PM -0700, Paul E. McKenney wrote:
> > > > On Sat, Oct 08, 2011 at 04:01:00PM +0200, Frederic Weisbecker wrote:
> > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > mode and until we restart the tick. However this is not always
> > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > the tick is stopped.
> > > > >
> > > > > To prepare for fixing this, add two new APIs:
> > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > >
> > > > > If no use of RCU is made in the idle loop between
> > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > >
> > > > > Otherwise the arch must call tick_nohz_enter_idle() and
> > > > > tick_nohz_exit_idle() and also call explicitly:
> > > > >
> > > > > - rcu_idle_enter() after its last use of RCU before the CPU is put
> > > > > to sleep.
> > > > > - rcu_idle_exit() before the first use of RCU after the CPU is woken
> > > > > up.
> > > >
> > > > Thank you, Frederic! I have queued this to replace the earlier
> > > > version. The set is available on branch rcu/dyntick of
> > > >
> > > > https://github.com/paulmckrcu/linux
> > >
> > > Which reminds me... About the ultimate objective, getting tick-free
> > > operation. (Or, for the guys who want to eliminate the tick entirely,
> > > shutting up the hrtimer stuff that they want to replace it with.)
> > >
> > > I believe that you will then need to have two levels of not-in-dynticks
> > > for processes, one for idle vs. not and another for when a process
> > > switches from user-space to kernel execution. Correct, or am I
> > > confused?
> > >
> > > The reason I ask is that commit e11f5981 currently only allows one
> > > level of not-in-dynticks for processes. It is easy to add another
> > > level, but thought I should check beforehand.
> >
> > Hmm, yeah looking at that patch, it's going to be hard to have a nesting
> > that looks like:
> >
> > rcu_irq_enter();
> > rcu_user_enter();
> > rcu_irq_exit(); <-- with effective extended quiescent state starting there
>
> OK, so the idea here is that there has been two runnable processes on
> the current CPU, but during the irq handler one of them moves or some
> such?
No, it happens when we get an irq in userspace and we stop the tick
from that irq. Noticing we are in userspace, we want to be in an
extended quiescent state when we resume from the interrupt to userspace.
> If so, how about a rcu_user_enter_fromirq() that sets the counter
> to 1 so that the rcu_irq_exit() cleans up properly? If need be, I could
> of course provide an argument to allow you to specify the count offset.
Yeah I think that should work.
> > I also need to be able to call rcu_user_enter() from non-irq path.
>
> Then rcu_user_enter_fromirq() would be for the irq path and
> rcu_user_enter() from the non-irq path.
>
> Would that work for you?
Yep!
>
> > I don't truly understand the problem of the usermode helpers that
> > mess up the dynticks counts. May be we can somewhow fix it differently
> > from the offending callsite?
>
> I tried a few approaches along these lines, but there were way too
> many opportunities for interruption and preemption along the way.
> The problem is that unless the fixup happens under a no-preempt
> region of code that includes the rcu_irq_enter() or rcu_irq_exit()
> call (as the case may be), then you end up messing up the idle-depth
> count of two CPUs rather than just one. :-(
>
> But maybe I am missing something -- suggestions more than welcome!
It's rather me missing everything :)
Does it happen when we call call_usermodehelper()? If so, how? Do we
have a call to rcu_irq_enter() that lacks an rcu_irq_exit()?
* Re: [PATCH 08/11 v2] nohz: Allow rcu extended quiescent state handling seperately from tick stop
2011-10-14 12:08 ` Frederic Weisbecker
@ 2011-10-14 17:00 ` Paul E. McKenney
2011-10-16 13:28 ` Frederic Weisbecker
0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2011-10-14 17:00 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt, anton
On Fri, Oct 14, 2011 at 02:08:36PM +0200, Frederic Weisbecker wrote:
> On Thu, Oct 13, 2011 at 03:51:36PM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 13, 2011 at 02:50:20PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Oct 13, 2011 at 12:03:57AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Oct 12, 2011 at 11:57:52PM -0700, Paul E. McKenney wrote:
> > > > > On Sat, Oct 08, 2011 at 04:01:00PM +0200, Frederic Weisbecker wrote:
> > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > mode and until we restart the tick. However this is not always
> > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > the tick is stopped.
> > > > > >
> > > > > > To prepare for fixing this, add two new APIs:
> > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > >
> > > > > > If no use of RCU is made in the idle loop between
> > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > >
> > > > > > Otherwise the arch must call tick_nohz_enter_idle() and
> > > > > > tick_nohz_exit_idle() and also call explicitly:
> > > > > >
> > > > > > - rcu_idle_enter() after its last use of RCU before the CPU is put
> > > > > > to sleep.
> > > > > > - rcu_idle_exit() before the first use of RCU after the CPU is woken
> > > > > > up.
> > > > >
> > > > > Thank you, Frederic! I have queued this to replace the earlier
> > > > > version. The set is available on branch rcu/dyntick of
> > > > >
> > > > > https://github.com/paulmckrcu/linux
> > > >
> > > > Which reminds me... About the ultimate objective, getting tick-free
> > > > operation. (Or, for the guys who want to eliminate the tick entirely,
> > > > shutting up the hrtimer stuff that they want to replace it with.)
> > > >
> > > > I believe that you will then need to have two levels of not-in-dynticks
> > > > for processes, one for idle vs. not and another for when a process
> > > > switches from user-space to kernel execution. Correct, or am I
> > > > confused?
> > > >
> > > > The reason I ask is that commit e11f5981 currently only allows one
> > > > level of not-in-dynticks for processes. It is easy to add another
> > > > level, but thought I should check beforehand.
> > >
> > > Hmm, yeah looking at that patch, it's going to be hard to have a nesting
> > > that looks like:
> > >
> > > rcu_irq_enter();
> > > rcu_user_enter();
> > > rcu_irq_exit(); <-- with effective extended quiescent state starting there
> >
> > OK, so the idea here is that there has been two runnable processes on
> > the current CPU, but during the irq handler one of them moves or some
> > such?
>
> No it happens when we have an irq in userspace and we stop the tick
> from that irq. Noticing we are in userspace, we want to be in extended
> quiescent state when we resume from the interrupt to userspace.
Ah, OK!
> > If so, how about a rcu_user_enter_fromirq() that sets the counter
> > to 1 so that the rcu_irq_exit() cleans up properly? If need be, I could
> > of course provide an argument to allow you to specify the count offset.
>
> Yeah I think that should work.
Very good. I will start off with no argument, easy enough to add it
later if needed.
> > > I also need to be able to call rcu_user_enter() from non-irq path.
> >
> > Then rcu_user_enter_fromirq() would be for the irq path and
> > rcu_user_enter() from the non-irq path.
> >
> > Would that work for you?
>
> Yep!
Very good, I will take a whack at it. BTW, testing is going quite
well thus far with your current patches combined with my paranoid
idle-count approach. One test in particular that previously failed
reliably within minutes just successfully completed a ten-hour run.
So things are looking up! (Famous last words...)
> > > I don't truly understand the problem of the usermode helpers that
> > > mess up the dynticks counts. May be we can somewhow fix it differently
> > > from the offending callsite?
> >
> > I tried a few approaches along these lines, but there were way too
> > many opportunities for interruption and preemption along the way.
> > The problem is that unless the fixup happens under a no-preempt
> > region of code that includes the rcu_irq_enter() or rcu_irq_exit()
> > call (as the case may be), then you end up messing up the idle-depth
> > count of two CPUs rather than just one. :-(
> >
> > But maybe I am missing something -- suggestions more than welcome!
>
> It's rather me missing everything :)
> It happens when we call call_usermodehelper()? If so how? We have a
> call to rcu_irq_enter() that lacks an rcu_irq_exit() ?
On powerpc, it executes the "sc" ("system call") instruction from
kernel mode, which results in an exception. But from what I can see,
there is no corresponding return from exception, so my not-so-paranoid
counting scheme would lose count. That said, please keep in mind that
I in no way fully understand that code. It is also far from clear to
me why my earlier dyntick-idle code worked in this situation -- perhaps
the value of preempt_count() gets fixed up somehow -- I haven't really
studied all the assembly language involved in detail, so there is lots
of opportunity for such a fixup somewhere.
You asked! ;-)
Thanx, Paul
* Re: [PATCH 08/11 v2] nohz: Allow rcu extended quiescent state handling seperately from tick stop
2011-10-14 17:00 ` Paul E. McKenney
@ 2011-10-16 13:28 ` Frederic Weisbecker
2011-10-17 2:26 ` Paul E. McKenney
0 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2011-10-16 13:28 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt, anton
On Fri, Oct 14, 2011 at 10:00:19AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 14, 2011 at 02:08:36PM +0200, Frederic Weisbecker wrote:
> > On Thu, Oct 13, 2011 at 03:51:36PM -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 13, 2011 at 02:50:20PM +0200, Frederic Weisbecker wrote:
> > > > On Thu, Oct 13, 2011 at 12:03:57AM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Oct 12, 2011 at 11:57:52PM -0700, Paul E. McKenney wrote:
> > > > > > On Sat, Oct 08, 2011 at 04:01:00PM +0200, Frederic Weisbecker wrote:
> > > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > > mode and until we restart the tick. However this is not always
> > > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > > the tick is stopped.
> > > > > > >
> > > > > > > To prepare for fixing this, add two new APIs:
> > > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > >
> > > > > > > If no use of RCU is made in the idle loop between
> > > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > > >
> > > > > > > Otherwise the arch must call tick_nohz_enter_idle() and
> > > > > > > tick_nohz_exit_idle() and also call explicitly:
> > > > > > >
> > > > > > > - rcu_idle_enter() after its last use of RCU before the CPU is put
> > > > > > > to sleep.
> > > > > > > - rcu_idle_exit() before the first use of RCU after the CPU is woken
> > > > > > > up.
> > > > > >
> > > > > > Thank you, Frederic! I have queued this to replace the earlier
> > > > > > version. The set is available on branch rcu/dyntick of
> > > > > >
> > > > > > https://github.com/paulmckrcu/linux
> > > > >
> > > > > Which reminds me... About the ultimate objective, getting tick-free
> > > > > operation. (Or, for the guys who want to eliminate the tick entirely,
> > > > > shutting up the hrtimer stuff that they want to replace it with.)
> > > > >
> > > > > I believe that you will then need to have two levels of not-in-dynticks
> > > > > for processes, one for idle vs. not and another for when a process
> > > > > switches from user-space to kernel execution. Correct, or am I
> > > > > confused?
> > > > >
> > > > > The reason I ask is that commit e11f5981 currently only allows one
> > > > > level of not-in-dynticks for processes. It is easy to add another
> > > > > level, but thought I should check beforehand.
> > > >
> > > > Hmm, yeah looking at that patch, it's going to be hard to have a nesting
> > > > that looks like:
> > > >
> > > > rcu_irq_enter();
> > > > rcu_user_enter();
> > > > rcu_irq_exit(); <-- with effective extended quiescent state starting there
> > >
> > > OK, so the idea here is that there has been two runnable processes on
> > > the current CPU, but during the irq handler one of them moves or some
> > > such?
> >
> > No it happens when we have an irq in userspace and we stop the tick
> > from that irq. Noticing we are in userspace, we want to be in extended
> > quiescent state when we resume from the interrupt to userspace.
>
> Ah, OK!
>
> > > If so, how about a rcu_user_enter_fromirq() that sets the counter
> > > to 1 so that the rcu_irq_exit() cleans up properly? If need be, I could
> > > of course provide an argument to allow you to specify the count offset.
> >
> > Yeah I think that should work.
>
> Very good. I will start off with no argument, easy enough to add it
> later if needed.
Yeah, but for now this is out-of-tree code. The functions seem quite
trivial to write, so I guess I can sort it out in my tree when I rebase
on 3.2 (if the current queue goes into the next merge window).
> > > > I also need to be able to call rcu_user_enter() from non-irq path.
> > >
> > > Then rcu_user_enter_fromirq() would be for the irq path and
> > > rcu_user_enter() from the non-irq path.
> > >
> > > Would that work for you?
> >
> > Yep!
>
> Very good, I will take a whack at it. BTW, testing is going quite
> well thus far with your current patches combined with my paranoid
> idle-count approach. One test in particular that previously failed
> reliably within minutes just successfully completed a ten-hour run.
> So things are looking up! (Famous last words...)
Great, fingers crossed! :)
>
> > > > I don't truly understand the problem of the usermode helpers that
> > > > mess up the dynticks counts. May be we can somewhow fix it differently
> > > > from the offending callsite?
> > >
> > > I tried a few approaches along these lines, but there were way too
> > > many opportunities for interruption and preemption along the way.
> > > The problem is that unless the fixup happens under a no-preempt
> > > region of code that includes the rcu_irq_enter() or rcu_irq_exit()
> > > call (as the case may be), then you end up messing up the idle-depth
> > > count of two CPUs rather than just one. :-(
> > >
> > > But maybe I am missing something -- suggestions more than welcome!
> >
> > It's rather me missing everything :)
> > It happens when we call call_usermodehelper()? If so how? We have a
> > call to rcu_irq_enter() that lacks an rcu_irq_exit() ?
>
> On powerpc, it executes the "sc" ("system call") instruction from
> kernel mode, which results in an exception. But from what I can see,
> there is no corresponding return from exception, so my not-so-paranoid
> counting scheme would lose count. That said, please keep in mind that
> I in no way fully understand that code. It is also far from clear to
> me why my earlier dyntick-idle code worked in this situation -- perhaps
> the value of preempt_count() gets fixed up somehow -- I haven't really
> studied all the assembly language involved in detail, so there is lots
> of opportunity for such a fixup somewhere.
>
> You asked! ;-)
Haha! ;)
* Re: [PATCH 08/11 v2] nohz: Allow rcu extended quiescent state handling seperately from tick stop
2011-10-16 13:28 ` Frederic Weisbecker
@ 2011-10-17 2:26 ` Paul E. McKenney
0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2011-10-17 2:26 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar, Peter Zijlstra,
Thomas Gleixner, H. Peter Anvin, Russell King, Paul Mackerras,
Heiko Carstens, Paul Mundt, anton
On Sun, Oct 16, 2011 at 03:28:08PM +0200, Frederic Weisbecker wrote:
> On Fri, Oct 14, 2011 at 10:00:19AM -0700, Paul E. McKenney wrote:
> > On Fri, Oct 14, 2011 at 02:08:36PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Oct 13, 2011 at 03:51:36PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 13, 2011 at 02:50:20PM +0200, Frederic Weisbecker wrote:
> > > > > On Thu, Oct 13, 2011 at 12:03:57AM -0700, Paul E. McKenney wrote:
> > > > > > On Wed, Oct 12, 2011 at 11:57:52PM -0700, Paul E. McKenney wrote:
> > > > > > > On Sat, Oct 08, 2011 at 04:01:00PM +0200, Frederic Weisbecker wrote:
> > > > > > > > It is assumed that RCU won't be used once we switch to tickless
> > > > > > > > mode and until we restart the tick. However, this is not always
> > > > > > > > true: on x86-64 we dereference the idle notifiers after
> > > > > > > > the tick is stopped.
> > > > > > > >
> > > > > > > > To prepare for fixing this, add two new APIs:
> > > > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > > >
> > > > > > > > If no use of RCU is made in the idle loop between
> > > > > > > > tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the arch
> > > > > > > > must instead call the new *_norcu() versions, so that it doesn't
> > > > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > > > >
> > > > > > > > Otherwise the arch must call tick_nohz_idle_enter() and
> > > > > > > > tick_nohz_idle_exit() and also explicitly call:
> > > > > > > >
> > > > > > > > - rcu_idle_enter() after its last use of RCU before the CPU is put
> > > > > > > > to sleep.
> > > > > > > > - rcu_idle_exit() before the first use of RCU after the CPU is woken
> > > > > > > > up.
> > > > > > >
> > > > > > > Thank you, Frederic! I have queued this to replace the earlier
> > > > > > > version. The set is available on branch rcu/dyntick of
> > > > > > >
> > > > > > > https://github.com/paulmckrcu/linux
> > > > > >
> > > > > > Which reminds me... About the ultimate objective, getting tick-free
> > > > > > operation. (Or, for the guys who want to eliminate the tick entirely,
> > > > > > shutting up the hrtimer stuff that they want to replace it with.)
> > > > > >
> > > > > > I believe that you will then need to have two levels of not-in-dynticks
> > > > > > for processes, one for idle vs. not and another for when a process
> > > > > > switches from user-space to kernel execution. Correct, or am I
> > > > > > confused?
> > > > > >
> > > > > > The reason I ask is that commit e11f5981 currently only allows one
> > > > > > level of not-in-dynticks for processes. It is easy to add another
> > > > > > level, but thought I should check beforehand.
> > > > >
> > > > > Hmm, yeah looking at that patch, it's going to be hard to have a nesting
> > > > > that looks like:
> > > > >
> > > > > rcu_irq_enter();
> > > > > rcu_user_enter();
> > > > > rcu_irq_exit(); <-- with effective extended quiescent state starting there
> > > >
> > > > OK, so the idea here is that there have been two runnable processes on
> > > > the current CPU, but during the irq handler one of them moves or some
> > > > such?
> > >
> > > No it happens when we have an irq in userspace and we stop the tick
> > > from that irq. Noticing we are in userspace, we want to be in extended
> > > quiescent state when we resume from the interrupt to userspace.
> >
> > Ah, OK!
> >
> > > > If so, how about a rcu_user_enter_fromirq() that sets the counter
> > > > to 1 so that the rcu_irq_exit() cleans up properly? If need be, I could
> > > > of course provide an argument to allow you to specify the count offset.
> > >
> > > Yeah I think that should work.
> >
> > Very good. I will start off with no argument, easy enough to add it
> > later if needed.
>
> Yeah, but for now, this is out-of-tree code. The functions seem quite trivial
> to write, so I guess I can sort it out in my tree when I rebase on 3.2
> (if the current queue goes to the next merge window).
I will be pushing rcu/next to get -next exposure later this week (once
I reach a geography that permits github access), and if all goes well...
> > > > > I also need to be able to call rcu_user_enter() from non-irq path.
> > > >
> > > > Then rcu_user_enter_fromirq() would be for the irq path and
> > > > rcu_user_enter() from the non-irq path.
> > > >
> > > > Would that work for you?
> > >
> > > Yep!
> >
> > Very good, I will take a whack at it. BTW, testing is going quite
> > well thus far with your current patches combined with my paranoid
> > idle-count approach. One test in particular that previously failed
> > reliably within minutes just successfully completed a ten-hour run.
> > So things are looking up! (Famous last words...)
>
> Great, I cross my fingers! :)
Good results for that one. Here's hoping for continued test passing.
> > > > > I don't truly understand the problem of the usermode helpers that
> > > > > mess up the dynticks counts. Maybe we can somehow fix it differently
> > > > > from the offending callsite?
> > > >
> > > > I tried a few approaches along these lines, but there were way too
> > > > many opportunities for interruption and preemption along the way.
> > > > The problem is that unless the fixup happens under a no-preempt
> > > > region of code that includes the rcu_irq_enter() or rcu_irq_exit()
> > > > call (as the case may be), then you end up messing up the idle-depth
> > > > count of two CPUs rather than just one. :-(
> > > >
> > > > But maybe I am missing something -- suggestions more than welcome!
> > >
> > > It's rather me missing everything :)
> > > It happens when we call call_usermodehelper()? If so, how? We have a
> > > call to rcu_irq_enter() that lacks an rcu_irq_exit()?
> >
> > On powerpc, it executes the "sc" ("system call") instruction from
> > kernel mode, which results in an exception. But from what I can see,
> > there is no corresponding return from exception, so my not-so-paranoid
> > counting scheme would lose count. That said, please keep in mind that
> > I in no way fully understand that code. It is also far from clear to
> > me why my earlier dyntick-idle code worked in this situation -- perhaps
> > the value of preempt_count() gets fixed up somehow -- I haven't really
> > studied all the assembly language involved in detail, so there is lots
> > of opportunity for such a fixup somewhere.
> >
> > You asked! ;-)
>
> Haha! ;)
;-)
Thanx, Paul
Thread overview: 24+ messages
2011-10-07 16:21 [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 01/11] rcu: Detect illegal rcu dereference in extended quiescent state Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 02/11] rcu: Inform the user about extended quiescent state on PROVE_RCU warning Frederic Weisbecker
2011-10-07 22:47 ` Paul E. McKenney
2011-10-07 16:22 ` [PATCH 03/11] rcu: Warn when rcu_read_lock() is used in extended quiescent state Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 04/11] rcu: Remove one layer of abstraction from PROVE_RCU checking Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 05/11] rcu: Warn when srcu_read_lock() is used in an extended quiescent state Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 06/11] rcu: Make srcu_read_lock_held() call common lockdep-enabled function Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 07/11] nohz: Separate out irq exit and idle loop dyntick logic Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 08/11] nohz: Allow rcu extended quiescent state handling seperately from tick stop Frederic Weisbecker
2011-10-08 13:43 ` Frederic Weisbecker
2011-10-08 14:01 ` [PATCH 08/11 v2] " Frederic Weisbecker
2011-10-13 6:57 ` Paul E. McKenney
2011-10-13 7:03 ` Paul E. McKenney
2011-10-13 12:50 ` Frederic Weisbecker
2011-10-13 22:51 ` Paul E. McKenney
2011-10-14 12:08 ` Frederic Weisbecker
2011-10-14 17:00 ` Paul E. McKenney
2011-10-16 13:28 ` Frederic Weisbecker
2011-10-17 2:26 ` Paul E. McKenney
2011-10-07 16:22 ` [PATCH 09/11] x86: Enter rcu extended qs after idle notifier call Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 10/11] x86: Call idle notifier after irq_enter() Frederic Weisbecker
2011-10-07 16:22 ` [PATCH 11/11] rcu: Fix early call to rcu_idle_enter() Frederic Weisbecker
2011-10-07 23:32 ` [PATCH 00/11] rcu: Detect illegal uses of RCU in idle and fix some v5 Paul E. McKenney