* [RFC][PATCH RT 1/6] rt,rcu: Have rcu_read_lock_sched() use locks for PREEMPT_RT
From: Steven Rostedt @ 2013-06-26 19:28 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
Clark Williams, Paul E. McKenney
[-- Attachment #1: rcu-sched-preempt.patch --]
[-- Type: text/plain, Size: 4533 bytes --]
There are several critical sections that require synchronization with
synchronize_sched(). Usually these are implemented by disabling preemption,
and synchronize_sched() just waits for the kernel to schedule on each of
the CPUs.
rcu_read_lock_sched() is the preferred API to use, but some areas still
use preempt_disable() and local_irq_*() directly to prevent preemption.
Our main concern, however, is with users of rcu_read_lock_sched() that may
also take spin_locks(), which turn into mutexes on PREEMPT_RT. For these
cases, we need to allow rcu_read_lock_sched() sections to schedule out.
To allow rcu_read_lock_sched() sections to be preempted when PREEMPT_RT is
enabled, grab a local_lock() instead of disabling preemption. synchronize_sched()
then grabs and releases every CPU's local_lock() before doing its normal
work, as there may still be places that disable preemption or irqs that it
needs to synchronize with. By grabbing all the locks and releasing them, it
properly synchronizes with the readers that use the locks instead of
disabling preemption or interrupts.
Note: The rcu_read_lock_sched_notrace() version still only disables
preemption, because it is used by lockdep and tracing, which require real
preemption disabling and not mutexes.
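To illustrate the case being addressed, a reader that ends up taking a
sleeping lock inside its RCU-sched section looks roughly like the sketch
below (my_lock and my_reader() are made up for illustration):

static DEFINE_SPINLOCK(my_lock);	/* becomes a sleeping rtmutex on PREEMPT_RT */

static void my_reader(void)
{
	rcu_read_lock_sched();		/* mainline: preempt_disable() */
	spin_lock(&my_lock);		/* may sleep on -rt: illegal with preemption off */
	/* read data that updaters protect with synchronize_sched() */
	spin_unlock(&my_lock);
	rcu_read_unlock_sched();
}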
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-rt.git/include/linux/rcupdate.h
===================================================================
--- linux-rt.git.orig/include/linux/rcupdate.h
+++ linux-rt.git/include/linux/rcupdate.h
@@ -36,6 +36,7 @@
#include <linux/types.h>
#include <linux/cache.h>
#include <linux/spinlock.h>
+#include <linux/locallock.h>
#include <linux/threads.h>
#include <linux/cpumask.h>
#include <linux/seqlock.h>
@@ -870,6 +871,28 @@ static inline void rcu_read_unlock_bh(vo
local_bh_enable();
}
+/* asm-offsets.c gets confused with local_lock here */
+#if defined(CONFIG_PREEMPT_RT_FULL)
+DECLARE_LOCAL_IRQ_LOCK(rcu_sched_lock);
+static inline void rcu_read_lock_sched_disable(void)
+{
+ local_lock(rcu_sched_lock);
+}
+static inline void rcu_read_lock_sched_enable(void)
+{
+ local_unlock(rcu_sched_lock);
+}
+#else
+static inline void rcu_read_lock_sched_disable(void)
+{
+ preempt_disable();
+}
+static inline void rcu_read_lock_sched_enable(void)
+{
+ preempt_enable();
+}
+#endif
+
/**
* rcu_read_lock_sched() - mark the beginning of a RCU-sched critical section
*
@@ -885,7 +908,7 @@ static inline void rcu_read_unlock_bh(vo
*/
static inline void rcu_read_lock_sched(void)
{
- preempt_disable();
+ rcu_read_lock_sched_disable();
__acquire(RCU_SCHED);
rcu_lock_acquire(&rcu_sched_lock_map);
rcu_lockdep_assert(!rcu_is_cpu_idle(),
@@ -910,7 +933,7 @@ static inline void rcu_read_unlock_sched
"rcu_read_unlock_sched() used illegally while idle");
rcu_lock_release(&rcu_sched_lock_map);
__release(RCU_SCHED);
- preempt_enable();
+ rcu_read_lock_sched_enable();
}
/* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
Index: linux-rt.git/kernel/rcutree.c
===================================================================
--- linux-rt.git.orig/kernel/rcutree.c
+++ linux-rt.git/kernel/rcutree.c
@@ -2491,6 +2491,31 @@ static inline int rcu_blocking_is_gp(voi
return ret;
}
+#ifdef CONFIG_PREEMPT_RT_FULL
+DEFINE_LOCAL_IRQ_LOCK(rcu_sched_lock);
+/*
+ * Real-time allows for synchronize sched to sleep but not migrate.
+ * This is done via the local locks. When calling synchronize_sched(),
+ * all the local locks need to be taken and released. This will ensure
+ * that all users of rcu_read_lock_sched() will be out of their critical
+ * sections at the completion of this function. synchronize_sched() will
+ * still perform the normal RCU sched operations to synchronize with
+ * locations that use disabling preemption or interrupts.
+ */
+static void rcu_synchronize_sched_rt(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ spin_lock(&per_cpu(rcu_sched_lock, cpu).lock);
+ spin_unlock(&per_cpu(rcu_sched_lock, cpu).lock);
+ }
+}
+#else
+static inline void rcu_synchronize_sched_rt(void)
+{
+}
+#endif
/**
* synchronize_sched - wait until an rcu-sched grace period has elapsed.
*
@@ -2538,6 +2563,9 @@ void synchronize_sched(void)
!lock_is_held(&rcu_lock_map) &&
!lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_sched() in RCU-sched read-side critical section");
+
+ rcu_synchronize_sched_rt();
+
if (rcu_blocking_is_gp())
return;
if (rcu_expedited)
* Re: [RFC][PATCH RT 1/6] rt,rcu: Have rcu_read_lock_sched() use locks for PREEMPT_RT
From: Paul E. McKenney @ 2013-06-26 20:53 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde,
Sebastian Andrzej Siewior, Clark Williams
On Wed, Jun 26, 2013 at 03:28:07PM -0400, Steven Rostedt wrote:
> There are several critical sections that require synchronization with
> synchronize_sched(). Usually these are done by disabling preemption and
> the synchronize_sched() just waits for the kernel to schedule on each
> of the CPUs.
>
> The rcu_read_lock_sched() is the preferred API to use, but some areas
> still use preempt_disable() and local_irq_*() to prevent preemption
> from happening. But our main concern is with those users of
> rcu_read_lock_sched(), where they may also call spin_locks() that turn
> into a mutex for PREEMPT_RT. For these cases, we need to allow
> rcu_read_lock_sched() to schedule out.
>
> To allow rcu_read_lock_sched() sections to preempt when PREEMPT_RT is enabled,
> instead of disabling preemption, it will grab a local_lock(). Then the
> synchronize_sched() will grab all CPUs local_locks() and release them.
> After that, it still does the normal synchronize_sched() as there may be
> places that still disable preemption or irqs that it needs to
> synchronize with. By grabbing all the locks and releasing them, it will
> properly synchronize with those that use the locks instead of disabling
> preemption or interrupts.
>
> Note: The rcu_read_lock_sched_notrace() version still only disables
> preemption, because they are used for lockdep and tracing, which require
> real preemption disabling and not mutexes.
This looks much better!
A few more questions and comments below.
Thanx, Paul
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>
> Index: linux-rt.git/include/linux/rcupdate.h
> ===================================================================
> --- linux-rt.git.orig/include/linux/rcupdate.h
> +++ linux-rt.git/include/linux/rcupdate.h
> @@ -36,6 +36,7 @@
> #include <linux/types.h>
> #include <linux/cache.h>
> #include <linux/spinlock.h>
> +#include <linux/locallock.h>
> #include <linux/threads.h>
> #include <linux/cpumask.h>
> #include <linux/seqlock.h>
> @@ -870,6 +871,28 @@ static inline void rcu_read_unlock_bh(vo
> local_bh_enable();
> }
>
> +/* asm-offsets.c gets confused with local_lock here */
> +#if defined(CONFIG_PREEMPT_RT_FULL)
> +DECLARE_LOCAL_IRQ_LOCK(rcu_sched_lock);
> +static inline void rcu_read_lock_sched_disable(void)
> +{
> + local_lock(rcu_sched_lock);
> +}
> +static inline void rcu_read_lock_sched_enable(void)
> +{
> + local_unlock(rcu_sched_lock);
> +}
> +#else
> +static inline void rcu_read_lock_sched_disable(void)
> +{
> + preempt_disable();
> +}
> +static inline void rcu_read_lock_sched_enable(void)
> +{
> + preempt_enable();
> +}
> +#endif
> +
> /**
> * rcu_read_lock_sched() - mark the beginning of a RCU-sched critical section
> *
> @@ -885,7 +908,7 @@ static inline void rcu_read_unlock_bh(vo
> */
How about having an rcu_read_lock_sched_rt() and rcu_read_unlock_sched_rt()?
Leave rcu_read_lock_sched() and rcu_read_unlock_sched() with their prior
semantics and deadlock immunity, with a header comment for the _rt()
variants that gives their properties and where they should be used.
> static inline void rcu_read_lock_sched(void)
> {
> - preempt_disable();
> + rcu_read_lock_sched_disable();
> __acquire(RCU_SCHED);
> rcu_lock_acquire(&rcu_sched_lock_map);
> rcu_lockdep_assert(!rcu_is_cpu_idle(),
> @@ -910,7 +933,7 @@ static inline void rcu_read_unlock_sched
> "rcu_read_unlock_sched() used illegally while idle");
> rcu_lock_release(&rcu_sched_lock_map);
> __release(RCU_SCHED);
> - preempt_enable();
> + rcu_read_lock_sched_enable();
> }
>
> /* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
> Index: linux-rt.git/kernel/rcutree.c
> ===================================================================
> --- linux-rt.git.orig/kernel/rcutree.c
> +++ linux-rt.git/kernel/rcutree.c
> @@ -2491,6 +2491,31 @@ static inline int rcu_blocking_is_gp(voi
> return ret;
> }
>
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +DEFINE_LOCAL_IRQ_LOCK(rcu_sched_lock);
> +/*
> + * Real-time allows for synchronize sched to sleep but not migrate.
> + * This is done via the local locks. When calling synchronize_sched(),
> + * all the local locks need to be taken and released. This will ensure
> + * that all users of rcu_read_lock_sched() will be out of their critical
> + * sections at the completion of this function. synchronize_sched() will
> + * still perform the normal RCU sched operations to synchronize with
> + * locations that use disabling preemption or interrupts.
> + */
> +static void rcu_synchronize_sched_rt(void)
The name synchronize_sched_rt() would fit better with the companion
synchronize_sched() function.
> +{
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + spin_lock(&per_cpu(rcu_sched_lock, cpu).lock);
> + spin_unlock(&per_cpu(rcu_sched_lock, cpu).lock);
> + }
> +}
> +#else
> +static inline void rcu_synchronize_sched_rt(void)
> +{
I bet you want a synchronize_sched() here. ;-)
But looking below...
> +}
> +#endif
> /**
> * synchronize_sched - wait until an rcu-sched grace period has elapsed.
> *
> @@ -2538,6 +2563,9 @@ void synchronize_sched(void)
> !lock_is_held(&rcu_lock_map) &&
> !lock_is_held(&rcu_sched_lock_map),
> "Illegal synchronize_sched() in RCU-sched read-side critical section");
> +
> + rcu_synchronize_sched_rt();
> +
Are you sure you want a single primitive to wait on both types of
read-side critical sections? I can see arguments on either side...
For completeness, another approach would be to use SRCU instead of locking
for the preemptible RCU-sched read-side critical sections. One benefit
of doing this is that SRCU avoids introducing the potential deadlocks
that involve locks acquired both within and across read-side critical
sections.
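A minimal sketch of that approach (the srcu_struct and functions here are
purely illustrative, not existing code) would be something like:

static struct srcu_struct my_srcu;	/* init_srcu_struct(&my_srcu) at init time */

static void my_reader(void)
{
	int idx;

	idx = srcu_read_lock(&my_srcu);	/* preemptible, may take rt-mutexes */
	/* read-side critical section */
	srcu_read_unlock(&my_srcu, idx);
}

static void my_updater(void)
{
	/* unpublish the object, then wait only for my_srcu readers */
	synchronize_srcu(&my_srcu);
}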
> if (rcu_blocking_is_gp())
> return;
> if (rcu_expedited)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
* Re: [RFC][PATCH RT 1/6] rt,rcu: Have rcu_read_lock_sched() use locks for PREEMPT_RT
From: Steven Rostedt @ 2013-06-26 21:32 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde,
Sebastian Andrzej Siewior, Clark Williams, Tejun Heo
On Wed, 2013-06-26 at 13:53 -0700, Paul E. McKenney wrote:
> On Wed, Jun 26, 2013 at 03:28:07PM -0400, Steven Rostedt wrote:
> > There are several critical sections that require synchronization with
> > synchronize_sched(). Usually these are done by disabling preemption and
> > the synchronize_sched() just waits for the kernel to schedule on each
> > of the CPUs.
> >
> > The rcu_read_lock_sched() is the preferred API to use, but some areas
> > still use preempt_disable() and local_irq_*() to prevent preemption
> > from happening. But our main concern is with those users of
> > rcu_read_lock_sched(), where they may also call spin_locks() that turn
> > into a mutex for PREEMPT_RT. For these cases, we need to allow
> > rcu_read_lock_sched() to schedule out.
> >
> > To allow rcu_read_lock_sched() sections to preempt when PREEMPT_RT is enabled,
> > instead of disabling preemption, it will grab a local_lock(). Then the
> > synchronize_sched() will grab all CPUs local_locks() and release them.
> > After that, it still does the normal synchronize_sched() as there may be
> > places that still disable preemption or irqs that it needs to
> > synchronize with. By grabbing all the locks and releasing them, it will
> > properly synchronize with those that use the locks instead of disabling
> > preemption or interrupts.
> >
> > Note: The rcu_read_lock_sched_notrace() version still only disables
> > preemption, because they are used for lockdep and tracing, which require
> > real preemption disabling and not mutexes.
>
> This looks much better!
Really, I didn't think I changed it at all ;-)
>
> A few more questions and comments below.
>
> Thanx, Paul
>
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> >
> > Index: linux-rt.git/include/linux/rcupdate.h
> > ===================================================================
> > --- linux-rt.git.orig/include/linux/rcupdate.h
> > +++ linux-rt.git/include/linux/rcupdate.h
> > @@ -36,6 +36,7 @@
> > #include <linux/types.h>
> > #include <linux/cache.h>
> > #include <linux/spinlock.h>
> > +#include <linux/locallock.h>
> > #include <linux/threads.h>
> > #include <linux/cpumask.h>
> > #include <linux/seqlock.h>
> > @@ -870,6 +871,28 @@ static inline void rcu_read_unlock_bh(vo
> > local_bh_enable();
> > }
> >
> > +/* asm-offsets.c gets confused with local_lock here */
> > +#if defined(CONFIG_PREEMPT_RT_FULL)
> > +DECLARE_LOCAL_IRQ_LOCK(rcu_sched_lock);
> > +static inline void rcu_read_lock_sched_disable(void)
> > +{
> > + local_lock(rcu_sched_lock);
> > +}
> > +static inline void rcu_read_lock_sched_enable(void)
> > +{
> > + local_unlock(rcu_sched_lock);
> > +}
> > +#else
> > +static inline void rcu_read_lock_sched_disable(void)
> > +{
> > + preempt_disable();
> > +}
> > +static inline void rcu_read_lock_sched_enable(void)
> > +{
> > + preempt_enable();
> > +}
> > +#endif
> > +
> > /**
> > * rcu_read_lock_sched() - mark the beginning of a RCU-sched critical section
> > *
> > @@ -885,7 +908,7 @@ static inline void rcu_read_unlock_bh(vo
> > */
>
> How about having an rcu_read_lock_sched_rt() and rcu_read_unlock_sched_rt()?
> Leave rcu_read_lock_sched() and rcu_read_unlock_sched() with their prior
> semantics and deadlock immunity, with a header comment for the _rt()
> variants that gives their properties and where they should be used.
I can do that instead, just so that we don't introduce a deadlock
somewhere unknowingly.
>
> > static inline void rcu_read_lock_sched(void)
> > {
> > - preempt_disable();
> > + rcu_read_lock_sched_disable();
> > __acquire(RCU_SCHED);
> > rcu_lock_acquire(&rcu_sched_lock_map);
> > rcu_lockdep_assert(!rcu_is_cpu_idle(),
> > @@ -910,7 +933,7 @@ static inline void rcu_read_unlock_sched
> > "rcu_read_unlock_sched() used illegally while idle");
> > rcu_lock_release(&rcu_sched_lock_map);
> > __release(RCU_SCHED);
> > - preempt_enable();
> > + rcu_read_lock_sched_enable();
> > }
> >
> > /* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
> > Index: linux-rt.git/kernel/rcutree.c
> > ===================================================================
> > --- linux-rt.git.orig/kernel/rcutree.c
> > +++ linux-rt.git/kernel/rcutree.c
> > @@ -2491,6 +2491,31 @@ static inline int rcu_blocking_is_gp(voi
> > return ret;
> > }
> >
> > +#ifdef CONFIG_PREEMPT_RT_FULL
> > +DEFINE_LOCAL_IRQ_LOCK(rcu_sched_lock);
> > +/*
> > + * Real-time allows for synchronize sched to sleep but not migrate.
> > + * This is done via the local locks. When calling synchronize_sched(),
> > + * all the local locks need to be taken and released. This will ensure
> > + * that all users of rcu_read_lock_sched() will be out of their critical
> > + * sections at the completion of this function. synchronize_sched() will
> > + * still perform the normal RCU sched operations to synchronize with
> > + * locations that use disabling preemption or interrupts.
> > + */
> > +static void rcu_synchronize_sched_rt(void)
>
> The name synchronize_sched_rt() would fit better with the companion
> synchronize_sched() function.
See below.
>
> > +{
> > + int cpu;
> > +
> > + for_each_possible_cpu(cpu) {
> > + spin_lock(&per_cpu(rcu_sched_lock, cpu).lock);
> > + spin_unlock(&per_cpu(rcu_sched_lock, cpu).lock);
> > + }
> > +}
> > +#else
> > +static inline void rcu_synchronize_sched_rt(void)
> > +{
>
> I bet you want a synchronize_sched() here. ;-)
>
> But looking below...
>
> > +}
> > +#endif
> > /**
> > * synchronize_sched - wait until an rcu-sched grace period has elapsed.
> > *
> > @@ -2538,6 +2563,9 @@ void synchronize_sched(void)
> > !lock_is_held(&rcu_lock_map) &&
> > !lock_is_held(&rcu_sched_lock_map),
> > "Illegal synchronize_sched() in RCU-sched read-side critical section");
> > +
> > + rcu_synchronize_sched_rt();
> > +
>
> Are you sure you want a single primitive to wait on both types of
> read-side critical sections? I can see arguments on either side...
Actually, the problem is that the location I need this to work in is
generic infrastructure (workqueues). I don't even see a call to
synchronize_sched() there anyway; I'm assuming that it's the workqueue users
that are doing this.
I'm handling the change from commit fa1b54e69bc "workqueue: update
synchronization rules on worker_pool_idr", which states that it protects
the reads with synchronize_sched() (disabling interrupts).
>
> For completeness, another approach would be to use SRCU instead of locking
> for the preemptible RCU-sched read-side critical sections. One benefit
> of doing this is that SRCU avoids introducing the potential deadlocks
> that involve locks acquired both within and across read-side critical
> sections.
>
> > if (rcu_blocking_is_gp())
> > return;
> > if (rcu_expedited)
> >
I never fully understood SRCU. Perhaps it would work; I'd just have
to look into it. But for now, I think I'll go with your original idea,
using the locks and rcu_read_lock_sched_rt().
-- Steve
* Re: [RFC][PATCH RT 1/6] rt,rcu: Have rcu_read_lock_sched() use locks for PREEMPT_RT
From: Mike Galbraith @ 2013-06-27 3:52 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde,
Sebastian Andrzej Siewior, Clark Williams, Paul E. McKenney
On Wed, 2013-06-26 at 15:28 -0400, Steven Rostedt wrote:
> @@ -2491,6 +2491,31 @@ static inline int rcu_blocking_is_gp(voi
> return ret;
> }
>
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +DEFINE_LOCAL_IRQ_LOCK(rcu_sched_lock);
> +/*
> + * Real-time allows for synchronize sched to sleep but not migrate.
> + * This is done via the local locks. When calling synchronize_sched(),
> + * all the local locks need to be taken and released. This will ensure
> + * that all users of rcu_read_lock_sched() will be out of their critical
> + * sections at the completion of this function. synchronize_sched() will
> + * still perform the normal RCU sched operations to synchronize with
> + * locations that use disabling preemption or interrupts.
> + */
> +static void rcu_synchronize_sched_rt(void)
> +{
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + spin_lock(&per_cpu(rcu_sched_lock, cpu).lock);
> + spin_unlock(&per_cpu(rcu_sched_lock, cpu).lock);
> + }
> +}
Does that have to be possible vs online?
-Mike
* Re: [RFC][PATCH RT 1/6] rt,rcu: Have rcu_read_lock_sched() use locks for PREEMPT_RT
From: Steven Rostedt @ 2013-06-27 11:28 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde,
Sebastian Andrzej Siewior, Clark Williams, Paul E. McKenney
On Thu, 2013-06-27 at 05:52 +0200, Mike Galbraith wrote:
> On Wed, 2013-06-26 at 15:28 -0400, Steven Rostedt wrote:
>
> > @@ -2491,6 +2491,31 @@ static inline int rcu_blocking_is_gp(voi
> > return ret;
> > }
> >
> > +#ifdef CONFIG_PREEMPT_RT_FULL
> > +DEFINE_LOCAL_IRQ_LOCK(rcu_sched_lock);
> > +/*
> > + * Real-time allows for synchronize sched to sleep but not migrate.
> > + * This is done via the local locks. When calling synchronize_sched(),
> > + * all the local locks need to be taken and released. This will ensure
> > + * that all users of rcu_read_lock_sched() will be out of their critical
> > + * sections at the completion of this function. synchronize_sched() will
> > + * still perform the normal RCU sched operations to synchronize with
> > + * locations that use disabling preemption or interrupts.
> > + */
> > +static void rcu_synchronize_sched_rt(void)
> > +{
> > + int cpu;
> > +
> > + for_each_possible_cpu(cpu) {
> > + spin_lock(&per_cpu(rcu_sched_lock, cpu).lock);
> > + spin_unlock(&per_cpu(rcu_sched_lock, cpu).lock);
> > + }
> > +}
>
> Does that have to be possible vs online?
No, I was doing the "paranoid" approach, as I didn't know how much
hotplug may be using synchronize_sched() and wanted to make sure I hit
all CPUs. But I think online would be sufficient.
-- Steve
* [RFC][PATCH RT 2/6] workqueue: Use rcu_read_lock_sched() to denote RCU synchronize_sched() location
From: Steven Rostedt @ 2013-06-26 19:28 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
Clark Williams, Paul E. McKenney
[-- Attachment #1: workqueue-rcu-sched.patch --]
[-- Type: text/plain, Size: 2073 bytes --]
Some functions in the workqueue code use local_irq_disable() for their
RCU-sched critical sections, that is, to make sure that updates done
with synchronize_sched() will synchronize with those reader locations.
Unfortunately, local_irq_disable() is not very descriptive of what it
is protecting, and it disables interrupts in locations that do not
require interrupts to be disabled.
By using rcu_read_lock_sched(), we not only allow interrupts to be disabled
for shorter amounts of time, but also document nicely how the critical
section is protected.
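To illustrate the pairing this documents, here is a simplified sketch
(not the exact workqueue code) of a work_busy()-style reader:

static bool my_work_running(struct work_struct *work)
{
	struct worker_pool *pool;
	bool running = false;

	rcu_read_lock_sched();			/* RCU-sched read side, irqs stay on */
	pool = get_work_pool(work);		/* pool cannot be freed inside this section */
	if (pool) {
		spin_lock(&pool->lock);
		running = !!find_worker_executing_work(pool, work);
		spin_unlock(&pool->lock);
	}
	rcu_read_unlock_sched();
	return running;
}

The updater side unpublishes the pool and then uses synchronize_sched()
(or call_rcu_sched()) to wait for all such read sections to finish before
the pool memory is released.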
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-rt.git/kernel/workqueue.c
===================================================================
--- linux-rt.git.orig/kernel/workqueue.c
+++ linux-rt.git/kernel/workqueue.c
@@ -2766,14 +2766,15 @@ static bool start_flush_work(struct work
might_sleep();
- local_irq_disable();
+ rcu_read_lock_sched();
+
pool = get_work_pool(work);
if (!pool) {
- local_irq_enable();
+ rcu_read_unlock_sched();
return false;
}
- spin_lock(&pool->lock);
+ spin_lock_irq(&pool->lock);
/* see the comment in try_to_grab_pending() with the same code */
pwq = get_work_pwq(work);
if (pwq) {
@@ -2788,6 +2789,7 @@ static bool start_flush_work(struct work
insert_wq_barrier(pwq, barr, work, worker);
spin_unlock_irq(&pool->lock);
+ rcu_read_unlock_sched();
/*
* If @max_active is 1 or rescuer is in use, flushing another work
@@ -2804,6 +2806,7 @@ static bool start_flush_work(struct work
return true;
already_gone:
spin_unlock_irq(&pool->lock);
+ rcu_read_unlock_sched();
return false;
}
@@ -4366,7 +4369,7 @@ unsigned int work_busy(struct work_struc
if (work_pending(work))
ret |= WORK_BUSY_PENDING;
- local_irq_save(flags);
+ rcu_read_lock_sched();
pool = get_work_pool(work);
if (pool) {
spin_lock(&pool->lock);
@@ -4374,7 +4377,7 @@ unsigned int work_busy(struct work_struc
ret |= WORK_BUSY_RUNNING;
spin_unlock(&pool->lock);
}
- local_irq_restore(flags);
+ rcu_read_unlock_sched();
return ret;
}
* [RFC][PATCH RT 3/6] idr: Use migrate_disable() to stay on the current CPU
From: Steven Rostedt @ 2013-06-26 19:28 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
Clark Williams, Paul E. McKenney
[-- Attachment #1: idr-migrate-disable.patch --]
[-- Type: text/plain, Size: 1343 bytes --]
idr_preload() and idr_preload_end() use preempt_disable() to keep the process
from migrating. For mainline that's fine, but for -rt it is a problem, as
the spin_locks taken in between are changed to mutexes.
Use migrate_disable()/migrate_enable() instead.
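For reference, the typical caller pattern affected here looks roughly like
this (my_idr, my_idr_lock and my_add_object() are made up):

static DEFINE_IDR(my_idr);
static DEFINE_SPINLOCK(my_idr_lock);	/* a sleeping lock on -rt */

static int my_add_object(void *ptr)
{
	int id;

	idr_preload(GFP_KERNEL);	/* fill the per-cpu preload buffer, pin the CPU */
	spin_lock(&my_idr_lock);	/* may sleep on -rt, but we must stay on this CPU */
	id = idr_alloc(&my_idr, ptr, 0, 0, GFP_NOWAIT);
	spin_unlock(&my_idr_lock);
	idr_preload_end();		/* drop the CPU pin */

	return id;
}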
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-rt.git/include/linux/idr.h
===================================================================
--- linux-rt.git.orig/include/linux/idr.h
+++ linux-rt.git/include/linux/idr.h
@@ -94,7 +94,7 @@ void idr_init(struct idr *idp);
*/
static inline void idr_preload_end(void)
{
- preempt_enable();
+ migrate_enable();
}
/**
Index: linux-rt.git/lib/idr.c
===================================================================
--- linux-rt.git.orig/lib/idr.c
+++ linux-rt.git/lib/idr.c
@@ -423,7 +423,7 @@ void idr_preload(gfp_t gfp_mask)
WARN_ON_ONCE(in_interrupt());
might_sleep_if(gfp_mask & __GFP_WAIT);
- preempt_disable();
+ migrate_disable();
/*
* idr_alloc() is likely to succeed w/o full idr_layer buffer and
@@ -435,9 +435,9 @@ void idr_preload(gfp_t gfp_mask)
while (__this_cpu_read(idr_preload_cnt) < MAX_IDR_FREE) {
struct idr_layer *new;
- preempt_enable();
+ migrate_enable();
new = kmem_cache_zalloc(idr_layer_cache, gfp_mask);
- preempt_disable();
+ migrate_disable();
if (!new)
break;
* [RFC][PATCH RT 4/6] rt,workqueue: Add local_lock_irq() call to put_pwq_unlocked()
From: Steven Rostedt @ 2013-06-26 19:28 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
Clark Williams, Paul E. McKenney
[-- Attachment #1: workqueue-recursive-locallock.patch --]
[-- Type: text/plain, Size: 6356 bytes --]
Hitting the following lockdep splat:
[ 5.272234] ======================================================
[ 5.272234] [ INFO: possible circular locking dependency detected ]
[ 5.272237] 3.10.0-rc7-test-rt10+ #58 Not tainted
[ 5.272237] -------------------------------------------------------
[ 5.272238] umount/413 is trying to acquire lock:
[ 5.272247] ((pendingb_lock).lock#7){+.+...}, at: [<ffffffff8106d5e6>] queue_work_on+0x76/0x1d0
[ 5.272248]
[ 5.272248] but task is already holding lock:
[ 5.272252] (&pool->lock/1){+.+...}, at: [<ffffffff8106d803>] put_pwq_unlocked+0x23/0x40
[ 5.272252]
[ 5.272252] which lock already depends on the new lock.
[ 5.272252]
[ 5.272253]
[ 5.272253] the existing dependency chain (in reverse order) is:
[ 5.272255]
[ 5.272255] -> #1 (&pool->lock/1){+.+...}:
[ 5.272259] [<ffffffff810b6817>] lock_acquire+0x87/0x150
[ 5.272265] [<ffffffff81665467>] rt_spin_lock+0x47/0x60
[ 5.272267] [<ffffffff8106d435>] __queue_work+0x295/0x3b0
[ 5.272269] [<ffffffff8106d6d6>] queue_work_on+0x166/0x1d0
[ 5.272273] [<ffffffff813e6cb1>] driver_deferred_probe_trigger.part.2+0x81/0x90
[ 5.272275] [<ffffffff813e6f15>] driver_bound+0x95/0xf0
[ 5.272277] [<ffffffff813e71b8>] driver_probe_device+0x108/0x3b0
[ 5.272279] [<ffffffff813e750b>] __driver_attach+0xab/0xb0
[ 5.272281] [<ffffffff813e517e>] bus_for_each_dev+0x5e/0x90
[ 5.272283] [<ffffffff813e6b9e>] driver_attach+0x1e/0x20
[ 5.272285] [<ffffffff813e6611>] bus_add_driver+0x101/0x290
[ 5.272288] [<ffffffff813e7a8a>] driver_register+0x7a/0x160
[ 5.272292] [<ffffffff8132d4fa>] __pci_register_driver+0x8a/0x90
[ 5.272295] [<ffffffffa01a6041>] 0xffffffffa01a6041
[ 5.272300] [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
[ 5.272303] [<ffffffff810c4135>] load_module+0x1315/0x1b10
[ 5.272305] [<ffffffff810c4a11>] SyS_init_module+0xe1/0x130
[ 5.272310] [<ffffffff8166dc82>] system_call_fastpath+0x16/0x1b
[ 5.272312]
[ 5.272312] -> #0 ((pendingb_lock).lock#7){+.+...}:
[ 5.272314] [<ffffffff810b61d8>] __lock_acquire+0x1c98/0x1ca0
[ 5.272316] [<ffffffff810b6817>] lock_acquire+0x87/0x150
[ 5.272319] [<ffffffff81665467>] rt_spin_lock+0x47/0x60
[ 5.272321] [<ffffffff8106d5e6>] queue_work_on+0x76/0x1d0
[ 5.272323] [<ffffffff8106d7b8>] put_pwq+0x78/0xa0
[ 5.272325] [<ffffffff8106d80b>] put_pwq_unlocked+0x2b/0x40
[ 5.272327] [<ffffffff8106d9f4>] destroy_workqueue+0x1d4/0x250
[ 5.272329] [<ffffffff81247af3>] ext4_put_super+0x53/0x360
[ 5.272333] [<ffffffff811a0e92>] generic_shutdown_super+0x62/0x100
[ 5.272336] [<ffffffff811a0f60>] kill_block_super+0x30/0x80
[ 5.272339] [<ffffffff811a124d>] deactivate_locked_super+0x4d/0x80
[ 5.272341] [<ffffffff811a1ffe>] deactivate_super+0x4e/0x70
[ 5.272346] [<ffffffff811bf0fc>] mntput_no_expire+0xdc/0x140
[ 5.272348] [<ffffffff811c03bf>] SyS_umount+0xaf/0x3a0
[ 5.272351] [<ffffffff8166dc82>] system_call_fastpath+0x16/0x1b
[ 5.272351]
[ 5.272351] other info that might help us debug this:
[ 5.272351]
[ 5.272352] Possible unsafe locking scenario:
[ 5.272352]
[ 5.272352] CPU0 CPU1
[ 5.272353] ---- ----
[ 5.272354] lock(&pool->lock/1);
[ 5.272356] lock((pendingb_lock).lock#7);
[ 5.272357] lock(&pool->lock/1);
[ 5.272358] lock((pendingb_lock).lock#7);
[ 5.272359]
[ 5.272359] *** DEADLOCK ***
[ 5.272359]
[ 5.272360] 2 locks held by umount/413:
[ 5.272364] #0: (&type->s_umount_key#19){+.+...}, at: [<ffffffff811a1ff6>] deactivate_super+0x46/0x70
[ 5.272368] #1: (&pool->lock/1){+.+...}, at: [<ffffffff8106d803>] put_pwq_unlocked+0x23/0x40
This was caused by put_pwq_unlocked(), which grabs the pool->lock and then calls
put_pwq(), which does a schedule_work() that grabs the local_lock(pendingb_lock)
and later grabs another pool->lock. There's a comment in put_pwq() about how the
second pool->lock has a subclass of 1 to get around lockdep complaining, but the
addition of the local_lock() that replaces local_irq_save() in -rt makes this a
real deadlock scenario.
As local_locks() are recursive, grabbing the local_lock() before the pool->lock
in put_pwq_unlocked() keeps the correct locking order.
As this is only needed for -rt, a helper function pool_lock_irq() is added that
grabs the local lock before grabbing the pool->lock when PREEMPT_RT_FULL is
defined, and just grabs the pool->lock otherwise.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-rt.git/kernel/workqueue.c
===================================================================
--- linux-rt.git.orig/kernel/workqueue.c
+++ linux-rt.git/kernel/workqueue.c
@@ -1053,6 +1053,33 @@ static void put_pwq(struct pool_workqueu
schedule_work(&pwq->unbound_release_work);
}
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * RT uses local_lock instead of disabling interrupts.
+ * put_pwq() may grab this lock within the pool lock, which will
+ * reverse the general order. We need to grab it first before
+ * taking the pool lock.
+ */
+static inline void pool_lock_irq(struct pool_workqueue *pwq)
+{
+ local_lock_irq(pendingb_lock);
+ spin_lock(&pwq->pool->lock);
+}
+static inline void pool_unlock_irq(struct pool_workqueue *pwq)
+{
+ spin_unlock_irq(&pwq->pool->lock);
+ local_unlock_irq(pendingb_lock);
+}
+#else
+static inline void pool_lock_irq(struct pool_workqueue *pwq)
+{
+ spin_lock_irq(&pwq->pool->lock);
+}
+static inline void pool_unlock_irq(struct pool_workqueue *pwq)
+{
+ spin_unlock_irq(&pwq->pool->lock);
+}
+#endif
/**
* put_pwq_unlocked - put_pwq() with surrounding pool lock/unlock
* @pwq: pool_workqueue to put (can be %NULL)
@@ -1066,9 +1093,9 @@ static void put_pwq_unlocked(struct pool
* As both pwqs and pools are sched-RCU protected, the
* following lock operations are safe.
*/
- spin_lock_irq(&pwq->pool->lock);
+ pool_lock_irq(pwq);
put_pwq(pwq);
- spin_unlock_irq(&pwq->pool->lock);
+ pool_unlock_irq(pwq);
}
}
* [RFC][PATCH RT 5/6] rt,ntp: Move call to schedule_delayed_work() to helper thread
From: Steven Rostedt @ 2013-06-26 19:28 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
Clark Williams, Paul E. McKenney
[-- Attachment #1: ntp-sched-delay-thread.patch --]
[-- Type: text/plain, Size: 2425 bytes --]
The ntp code for notify_cmos_timer() is called from hard interrupt
context. schedule_delayed_work() under PREEMPT_RT_FULL takes spinlocks
that have been converted to mutexes, thus calling schedule_delayed_work()
from interrupt context is not safe.
Add a helper thread that does the call to schedule_delayed_work(), and wake
that thread up instead of calling schedule_delayed_work() directly.
This is only done for CONFIG_PREEMPT_RT_FULL; otherwise the code still calls
schedule_delayed_work() directly in irq context.
Note: There are a few places in the kernel that do this. Perhaps the RT
code should have a dedicated thread that does the checks. Just register
a notifier on boot up for your check and wake up the thread when
needed. This will be a todo.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-rt.git/kernel/time/ntp.c
===================================================================
--- linux-rt.git.orig/kernel/time/ntp.c
+++ linux-rt.git/kernel/time/ntp.c
@@ -10,6 +10,7 @@
#include <linux/workqueue.h>
#include <linux/hrtimer.h>
#include <linux/jiffies.h>
+#include <linux/kthread.h>
#include <linux/math64.h>
#include <linux/timex.h>
#include <linux/time.h>
@@ -516,10 +517,49 @@ static void sync_cmos_clock(struct work_
schedule_delayed_work(&sync_cmos_work, timespec_to_jiffies(&next));
}
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * RT can not call schedule_delayed_work from real interrupt context.
+ * Need to make a thread to do the real work.
+ */
+static struct task_struct *cmos_delay_thread;
+static bool do_cmos_delay;
+
+static int run_cmos_delay(void *ignore)
+{
+ while (!kthread_should_stop()) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (do_cmos_delay) {
+ do_cmos_delay = false;
+ schedule_delayed_work(&sync_cmos_work, 0);
+ }
+ schedule();
+ }
+ __set_current_state(TASK_RUNNING);
+ return 0;
+}
+
+static void notify_cmos_timer(void)
+{
+ do_cmos_delay = true;
+ /* Make visible before waking up process */
+ smp_wmb();
+ wake_up_process(cmos_delay_thread);
+}
+
+static __init int create_cmos_delay_thread(void)
+{
+ cmos_delay_thread = kthread_run(run_cmos_delay, NULL, "kcmosdelayd");
+ BUG_ON(!cmos_delay_thread);
+ return 0;
+}
+early_initcall(create_cmos_delay_thread);
+#else
static void notify_cmos_timer(void)
{
schedule_delayed_work(&sync_cmos_work, 0);
}
+#endif /* CONFIG_PREEMPT_RT_FULL */
#else
static inline void notify_cmos_timer(void) { }
* [RFC][PATCH RT 6/6] vtime: Convert vtime_seqlock into raw_spinlock_t and seqcount combo
From: Steven Rostedt @ 2013-06-26 19:28 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
Clark Williams, Paul E. McKenney
[-- Attachment #1: vtime-lock.patch --]
[-- Type: text/plain, Size: 6793 bytes --]
The vtime seqlock needs to be taken in true interrupt context on -rt.
The normal seqlocks are converted to mutexes when PREEMPT_RT_FULL is
enabled, which breaks the vtime code, as the calls are made from
interrupt context.
Convert the vtime seqlock into a raw_spinlock_t and seqcount combo,
which can be taken in interrupt context.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-rt.git/include/linux/sched.h
===================================================================
--- linux-rt.git.orig/include/linux/sched.h
+++ linux-rt.git/include/linux/sched.h
@@ -1176,7 +1176,8 @@ struct task_struct {
struct cputime prev_cputime;
#endif
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
- seqlock_t vtime_seqlock;
+ raw_spinlock_t vtime_lock;
+ seqcount_t vtime_seq;
unsigned long long vtime_snap;
enum {
VTIME_SLEEPING = 0,
Index: linux-rt.git/kernel/sched/cputime.c
===================================================================
--- linux-rt.git.orig/kernel/sched/cputime.c
+++ linux-rt.git/kernel/sched/cputime.c
@@ -668,9 +668,11 @@ void vtime_account_system(struct task_st
if (!vtime_accounting_enabled())
return;
- write_seqlock(&tsk->vtime_seqlock);
+ raw_spin_lock(&tsk->vtime_lock);
+ write_seqcount_begin(&tsk->vtime_seq);
__vtime_account_system(tsk);
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seq);
+ raw_spin_unlock(&tsk->vtime_lock);
}
void vtime_account_irq_exit(struct task_struct *tsk)
@@ -678,11 +680,13 @@ void vtime_account_irq_exit(struct task_
if (!vtime_accounting_enabled())
return;
- write_seqlock(&tsk->vtime_seqlock);
+ raw_spin_lock(&tsk->vtime_lock);
+ write_seqcount_begin(&tsk->vtime_seq);
if (context_tracking_in_user())
tsk->vtime_snap_whence = VTIME_USER;
__vtime_account_system(tsk);
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seq);
+ raw_spin_unlock(&tsk->vtime_lock);
}
void vtime_account_user(struct task_struct *tsk)
@@ -694,10 +698,12 @@ void vtime_account_user(struct task_stru
delta_cpu = get_vtime_delta(tsk);
- write_seqlock(&tsk->vtime_seqlock);
+ raw_spin_lock(&tsk->vtime_lock);
+ write_seqcount_begin(&tsk->vtime_seq);
tsk->vtime_snap_whence = VTIME_SYS;
account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seq);
+ raw_spin_unlock(&tsk->vtime_lock);
}
void vtime_user_enter(struct task_struct *tsk)
@@ -705,26 +711,32 @@ void vtime_user_enter(struct task_struct
if (!vtime_accounting_enabled())
return;
- write_seqlock(&tsk->vtime_seqlock);
+ raw_spin_lock(&tsk->vtime_lock);
+ write_seqcount_begin(&tsk->vtime_seq);
tsk->vtime_snap_whence = VTIME_USER;
__vtime_account_system(tsk);
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seq);
+ raw_spin_unlock(&tsk->vtime_lock);
}
void vtime_guest_enter(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ raw_spin_lock(&tsk->vtime_lock);
+ write_seqcount_begin(&tsk->vtime_seq);
__vtime_account_system(tsk);
current->flags |= PF_VCPU;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seq);
+ raw_spin_unlock(&tsk->vtime_lock);
}
void vtime_guest_exit(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ raw_spin_lock(&tsk->vtime_lock);
+ write_seqcount_begin(&tsk->vtime_seq);
__vtime_account_system(tsk);
current->flags &= ~PF_VCPU;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seq);
+ raw_spin_unlock(&tsk->vtime_lock);
}
void vtime_account_idle(struct task_struct *tsk)
@@ -741,24 +753,30 @@ bool vtime_accounting_enabled(void)
void arch_vtime_task_switch(struct task_struct *prev)
{
- write_seqlock(&prev->vtime_seqlock);
+ raw_spin_lock(&prev->vtime_lock);
+ write_seqcount_begin(&prev->vtime_seq);
prev->vtime_snap_whence = VTIME_SLEEPING;
- write_sequnlock(&prev->vtime_seqlock);
+ write_seqcount_end(&prev->vtime_seq);
+ raw_spin_unlock(&prev->vtime_lock);
- write_seqlock(¤t->vtime_seqlock);
+ raw_spin_lock(¤t->vtime_lock);
+ write_seqcount_begin(¤t->vtime_seq);
current->vtime_snap_whence = VTIME_SYS;
current->vtime_snap = sched_clock_cpu(smp_processor_id());
- write_sequnlock(¤t->vtime_seqlock);
+ write_seqcount_end(¤t->vtime_seq);
+ raw_spin_unlock(¤t->vtime_lock);
}
void vtime_init_idle(struct task_struct *t, int cpu)
{
unsigned long flags;
- write_seqlock_irqsave(&t->vtime_seqlock, flags);
+ raw_spin_lock_irqsave(&t->vtime_lock, flags);
+ write_seqcount_begin(&t->vtime_seq);
t->vtime_snap_whence = VTIME_SYS;
t->vtime_snap = sched_clock_cpu(cpu);
- write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
+ write_seqcount_end(&t->vtime_seq);
+ raw_spin_unlock_irqrestore(&t->vtime_lock, flags);
}
cputime_t task_gtime(struct task_struct *t)
@@ -767,13 +785,13 @@ cputime_t task_gtime(struct task_struct
cputime_t gtime;
do {
- seq = read_seqbegin(&t->vtime_seqlock);
+ seq = read_seqcount_begin(&t->vtime_seq);
gtime = t->gtime;
if (t->flags & PF_VCPU)
gtime += vtime_delta(t);
- } while (read_seqretry(&t->vtime_seqlock, seq));
+ } while (read_seqcount_retry(&t->vtime_seq, seq));
return gtime;
}
@@ -796,7 +814,7 @@ fetch_task_cputime(struct task_struct *t
*udelta = 0;
*sdelta = 0;
- seq = read_seqbegin(&t->vtime_seqlock);
+ seq = read_seqcount_begin(&t->vtime_seq);
if (u_dst)
*u_dst = *u_src;
@@ -820,7 +838,7 @@ fetch_task_cputime(struct task_struct *t
if (t->vtime_snap_whence == VTIME_SYS)
*sdelta = delta;
}
- } while (read_seqretry(&t->vtime_seqlock, seq));
+ } while (read_seqcount_retry(&t->vtime_seq, seq));
}
Index: linux-rt.git/include/linux/init_task.h
===================================================================
--- linux-rt.git.orig/include/linux/init_task.h
+++ linux-rt.git/include/linux/init_task.h
@@ -151,7 +151,8 @@ extern struct task_group root_task_group
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
# define INIT_VTIME(tsk) \
- .vtime_seqlock = __SEQLOCK_UNLOCKED(tsk.vtime_seqlock), \
+ .vtime_lock = __RAW_SPIN_LOCK_UNLOCKED(tsk.vtime_lock), \
+ .vtime_seq = SEQCNT_ZERO, \
.vtime_snap = 0, \
.vtime_snap_whence = VTIME_SYS,
#else
Index: linux-rt.git/kernel/fork.c
===================================================================
--- linux-rt.git.orig/kernel/fork.c
+++ linux-rt.git/kernel/fork.c
@@ -1268,7 +1268,8 @@ static struct task_struct *copy_process(
p->prev_cputime.utime = p->prev_cputime.stime = 0;
#endif
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
- seqlock_init(&p->vtime_seqlock);
+ raw_spin_lock_init(&p->vtime_lock);
+ seqcount_init(&p->vtime_seq);
p->vtime_snap = 0;
p->vtime_snap_whence = VTIME_SLEEPING;
#endif
* Re: [RFC][PATCH RT 6/6] vtime: Convert vtime_seqlock into raw_spinlock_t and seqcount combo
From: Paul Gortmaker @ 2013-07-10 17:12 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde,
Sebastian Andrzej Siewior, Clark Williams, Paul E. McKenney
[[RFC][PATCH RT 6/6] vtime: Convert vtime_seqlock into raw_spinlock_t and seqcount combo] On 26/06/2013 (Wed 15:28) Steven Rostedt wrote:
> The vtime seqlock needs to be taken in true interrupt context on -rt.
> The normal seqlocks are converted to mutexes when PREEMPT_RT_FULL is
> enabled, which will break the vtime code as the calls are done from
> interrupt context.
>
> Convert the vtime seqlock into the raw_spinlock_t and seqcount combo
> that can be done in interrupt context.
Alternatively, we could revive the raw seqlock patch from Thomas?
https://lkml.org/lkml/2010/2/17/238
Below is a version updated to 3.8.x-RT. One downside is that
current mainline kernels have a raw_seqcount_begin() function, which has
nothing to do with preempt-rt, so the seqcount function namespace can
get confusing unless we rename raw_seqcount_begin to something that
doesn't sound RT-ish.
Paul.
--
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 939ea1a..5a3f6fd 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -185,6 +185,11 @@ static inline void write_seqcount_barrier(seqcount_t *s)
typedef struct {
struct seqcount seqcount;
+ raw_spinlock_t lock;
+} raw_seqlock_t;
+
+typedef struct {
+ struct seqcount seqcount;
spinlock_t lock;
} seqlock_t;
@@ -192,6 +197,21 @@ typedef struct {
* These macros triggered gcc-3.x compile-time problems. We think these are
* OK now. Be cautious.
*/
+#define __RAW_SEQLOCK_UNLOCKED(lockname) \
+ { \
+ .seqcount = SEQCNT_ZERO, \
+ .lock = __RAW_SPIN_LOCK_UNLOCKED(lockname) \
+ }
+
+#define raw_seqlock_init(x) \
+ do { \
+ seqcount_init(&(x)->seqcount); \
+ raw_spin_lock_init(&(x)->lock); \
+ } while (0)
+
+#define DEFINE_RAW_SEQLOCK(x) \
+ raw_seqlock_t x = __RAW_SEQLOCK_UNLOCKED(x)
+
#define __SEQLOCK_UNLOCKED(lockname) \
{ \
.seqcount = SEQCNT_ZERO, \
@@ -210,6 +230,11 @@ typedef struct {
/*
* Read side functions for starting and finalizing a read side section.
*/
+static inline unsigned read_raw_seqbegin(const raw_seqlock_t *sl)
+{
+ return read_seqcount_begin(&sl->seqcount);
+}
+
#ifndef CONFIG_PREEMPT_RT_FULL
static inline unsigned read_seqbegin(const seqlock_t *sl)
{
@@ -238,6 +263,11 @@ repeat:
}
#endif
+static inline unsigned read_raw_seqretry(const raw_seqlock_t *sl, unsigned start)
+{
+ return read_seqcount_retry(&sl->seqcount, start);
+}
+
static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
{
return read_seqcount_retry(&sl->seqcount, start);
@@ -248,6 +278,64 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
* Acts like a normal spin_lock/unlock.
* Don't need preempt_disable() because that is in the spin_lock already.
*/
+static inline void write_raw_seqlock(raw_seqlock_t *sl)
+{
+ raw_spin_lock(&sl->lock);
+ __write_seqcount_begin(&sl->seqcount);
+}
+
+static inline void write_raw_sequnlock(raw_seqlock_t *sl)
+{
+ __write_seqcount_end(&sl->seqcount);
+ raw_spin_unlock(&sl->lock);
+}
+
+static inline void write_raw_seqlock_bh(raw_seqlock_t *sl)
+{
+ raw_spin_lock_bh(&sl->lock);
+ __write_seqcount_begin(&sl->seqcount);
+}
+
+static inline void write_raw_sequnlock_bh(raw_seqlock_t *sl)
+{
+ __write_seqcount_end(&sl->seqcount);
+ raw_spin_unlock_bh(&sl->lock);
+}
+
+static inline void write_raw_seqlock_irq(raw_seqlock_t *sl)
+{
+ raw_spin_lock_irq(&sl->lock);
+ __write_seqcount_begin(&sl->seqcount);
+}
+
+static inline void write_raw_sequnlock_irq(raw_seqlock_t *sl)
+{
+ __write_seqcount_end(&sl->seqcount);
+ raw_spin_unlock_irq(&sl->lock);
+}
+
+static inline unsigned long __write_raw_seqlock_irqsave(raw_seqlock_t *sl)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&sl->lock, flags);
+ __write_seqcount_begin(&sl->seqcount);
+ return flags;
+}
+
+#define write_raw_seqlock_irqsave(lock, flags) \
+ do { flags = __write_raw_seqlock_irqsave(lock); } while (0)
+
+static inline void
+write_raw_sequnlock_irqrestore(raw_seqlock_t *sl, unsigned long flags)
+{
+ __write_seqcount_end(&sl->seqcount);
+ raw_spin_unlock_irqrestore(&sl->lock, flags);
+}
+
+/*
+ * non raw versions
+ */
static inline void write_seqlock(seqlock_t *sl)
{
spin_lock(&sl->lock);
* Re: [RFC][PATCH RT 6/6] vtime: Convert vtime_seqlock into raw_spinlock_t and seqcount combo
From: Steven Rostedt @ 2013-07-11 14:30 UTC (permalink / raw)
To: Paul Gortmaker
Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde,
Sebastian Andrzej Siewior, Clark Williams, Paul E. McKenney
On Wed, 2013-07-10 at 13:12 -0400, Paul Gortmaker wrote:
> [[RFC][PATCH RT 6/6] vtime: Convert vtime_seqlock into raw_spinlock_t and seqcount combo] On 26/06/2013 (Wed 15:28) Steven Rostedt wrote:
>
> > The vtime seqlock needs to be taken in true interrupt context on -rt.
> > The normal seqlocks are converted to mutexes when PREEMPT_RT_FULL is
> > enabled, which will break the vtime code as the calls are done from
> > interrupt context.
> >
> > Convert the vtime seqlock into the raw_spinlock_t and seqcount combo
> > that can be done in interrupt context.
>
> Alternatively, we could revive the raw seqlock patch from Thomas?
>
> https://lkml.org/lkml/2010/2/17/238
>
> Below is a version updating it to 3.8.x-RT. One downside is that
> current mainline kernels have raw_seqcount_begin() function, which has
> nothing to do with preempt-rt, so the seqcount function namespace can
> get confusing unless we rename raw_seqcount_begin to something that
> doesn't sound RT-ish.
There's a reason that Thomas didn't bring that patch forward, but I
don't recall what it was.
Thomas?
-- Steve
* [RFC][PATCH RT 0.5/6] locallock: Add include of percpu.h
From: Steven Rostedt @ 2013-06-26 19:43 UTC (permalink / raw)
To: linux-kernel
Cc: linux-rt-users, Thomas Gleixner, Carsten Emde,
Sebastian Andrzej Siewior, Clark Williams, Paul E. McKenney
[ I cut and pasted the series file but missed this patch to start ]
locallock.h uses per cpu variables, and if it is included by code
that does not include percpu.h, the compiler complains.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-rt.git/include/linux/locallock.h
===================================================================
--- linux-rt.git.orig/include/linux/locallock.h
+++ linux-rt.git/include/linux/locallock.h
@@ -1,6 +1,7 @@
#ifndef _LINUX_LOCALLOCK_H
#define _LINUX_LOCALLOCK_H
+#include <linux/percpu.h>
#include <linux/spinlock.h>
#ifdef CONFIG_PREEMPT_RT_BASE