- * [PATCH RT 01/16] sched/workqueue: Only wake up idle workers if not blocked on sleeping spin lock
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 02/16] x86/mce: fix mce timer interval Steven Rostedt
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt
[-- Attachment #1: 0001-sched-workqueue-Only-wake-up-idle-workers-if-not-blo.patch --]
[-- Type: text/plain, Size: 1430 bytes --]
From: Steven Rostedt <rostedt@goodmis.org>
In -rt, most spin_locks() turn into mutexes. One of these spin_lock
conversions is performed on the workqueue gcwq->lock. When the idle
worker is woken, the first thing it will do is grab that same lock and
it too will block, possibly jumping into the same code, but because
nr_running would already be decremented it prevents an infinite loop.
But this is still a waste of CPU cycles, and it doesn't follow the method
of mainline, as new workers should only be woken when a worker thread is
truly going to sleep, and not just blocked on a spin_lock().
Check the saved_state too before waking up new workers.
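As a rough illustration of the check described above, here is a minimal
userspace sketch of the condition. It is not the kernel code: struct
task_model, MODEL_PF_WQ_WORKER and should_notify_workqueue() are
hypothetical names, and it assumes (as the patch does) that a nonzero
saved_state means the task is only blocked on a sleeping spin lock.

```c
#include <stdbool.h>

/* Hypothetical stand-in for PF_WQ_WORKER; the value is illustrative only. */
#define MODEL_PF_WQ_WORKER 0x20u

/* Hypothetical, minimal model of the two task fields the patch inspects. */
struct task_model {
	unsigned int flags;
	long saved_state;	/* assumed nonzero while blocked on a sleeping spin lock */
};

/* True only for a workqueue worker that is really going to sleep. */
static bool should_notify_workqueue(const struct task_model *tsk)
{
	return (tsk->flags & MODEL_PF_WQ_WORKER) && !tsk->saved_state;
}
```

In this model a worker with saved_state set is skipped, so no idle
worker is woken just because the current one is waiting for a lock.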
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 96dd9c2..59bb8bc 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4436,8 +4436,10 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	/*
 	 * If a worker went to sleep, notify and ask workqueue whether
 	 * it wants to wake up a task to maintain concurrency.
+	 * Only call wake up if prev isn't blocked on a sleeping
+	 * spin lock.
 	 */
-	if (tsk->flags & PF_WQ_WORKER)
+	if (tsk->flags & PF_WQ_WORKER && !tsk->saved_state)
 		wq_worker_sleeping(tsk);
 
 	/*
-- 
1.7.10.4
- * [PATCH RT 02/16] x86/mce: fix mce timer interval
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 01/16] sched/workqueue: Only wake up idle workers if not blocked on sleeping spin lock Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 03/16] genirq: Set irq thread to RT priority on creation Steven Rostedt
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt, Mike Galbraith
[-- Attachment #1: 0002-x86-mce-fix-mce-timer-interval.patch --]
[-- Type: text/plain, Size: 1401 bytes --]
From: Mike Galbraith <bitbucket@online.de>
It seems the mce timer has fired at the wrong frequency in -rt kernels
since roughly forever, due to a 32 bit overflow.  3.8-rt is also missing
a multiplier.
Add missing us -> ns conversion and 32 bit overflow prevention.
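To make the overflow concrete, here is a small standalone sketch. It
assumes the microsecond value is a 32 bit unsigned quantity, as with
the return value of jiffies_to_usecs(); the function names are
hypothetical and only illustrate the arithmetic.

```c
#include <stdint.h>

/*
 * Microseconds to nanoseconds in 32 bit arithmetic, like the old
 * "* 1000": the product wraps modulo 2^32 before it is widened.
 */
static uint64_t usecs_to_ns_32bit(uint32_t usecs)
{
	uint32_t ns = usecs * 1000u;

	return ns;
}

/* The same conversion in 64 bit arithmetic, like the fixed "* 1000ULL". */
static uint64_t usecs_to_ns_64bit(uint32_t usecs)
{
	return usecs * 1000ULL;
}
```

Any interval longer than about 4.29 seconds (2^32 ns) wraps in the
32 bit variant, so a 5 second interval would come out as roughly 0.7
seconds.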
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: use ULL instead of u64 cast]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index c859bb4..e51191f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1171,7 +1171,7 @@ static enum hrtimer_restart mce_start_timer(struct hrtimer *timer)
 		*n = min(*n*2, round_jiffies_relative(check_interval*HZ));
 
 	hrtimer_forward(timer, timer->base->get_time(),
-			ns_to_ktime(jiffies_to_usecs(*n) * 1000));
+			ns_to_ktime(jiffies_to_usecs(*n) * 1000ULL));
 	return HRTIMER_RESTART;
 }
 
@@ -1452,7 +1452,7 @@ static void __mcheck_cpu_init_timer(void)
 	if (!*n)
 		return;
 
-	hrtimer_start_range_ns(t, ns_to_ktime(jiffies_to_usecs(*n) * 1000),
+	hrtimer_start_range_ns(t, ns_to_ktime(jiffies_to_usecs(*n) * 1000ULL),
 			       0 , HRTIMER_MODE_REL_PINNED);
 }
 
-- 
1.7.10.4
- * [PATCH RT 03/16] genirq: Set irq thread to RT priority on creation
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 01/16] sched/workqueue: Only wake up idle workers if not blocked on sleeping spin lock Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 02/16] x86/mce: fix mce timer interval Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 04/16] list_bl.h: make list head locking RT safe Steven Rostedt
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt, Ivo Sieben
[-- Attachment #1: 0003-genirq-Set-irq-thread-to-RT-priority-on-creation.patch --]
[-- Type: text/plain, Size: 2248 bytes --]
From: Ivo Sieben <meltedpianoman@gmail.com>
When a threaded irq handler is installed the irq thread is initially
created on normal scheduling priority. Only after the irq thread is
woken up it sets its priority to RT_FIFO MAX_USER_RT_PRIO/2 itself.
This means that interrupts that occur directly after the irq handler
is installed will be handled on a normal scheduling priority instead
of the realtime priority that one would expect.
Fix this by setting the RT priority on creation of the irq_thread.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
Cc: Sebastian Andrzej Siewior  <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1370254322-17240-1-git-send-email-meltedpianoman@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/irq/manage.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index d750268..9d702cf 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -780,9 +780,6 @@ static irqreturn_t irq_thread_fn(struct irq_desc *desc,
  */
 static int irq_thread(void *data)
 {
-	static const struct sched_param param = {
-		.sched_priority = MAX_USER_RT_PRIO/2,
-	};
 	struct irqaction *action = data;
 	struct irq_desc *desc = irq_to_desc(action->irq);
 	irqreturn_t (*handler_fn)(struct irq_desc *desc,
@@ -795,7 +792,6 @@ static int irq_thread(void *data)
 	else
 		handler_fn = irq_thread_fn;
 
-	sched_setscheduler(current, SCHED_FIFO, &param);
 	current->irqaction = action;
 
 	while (!irq_wait_for_interrupt(action)) {
@@ -932,11 +928,17 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 	 */
 	if (new->thread_fn && !nested) {
 		struct task_struct *t;
+		static const struct sched_param param = {
+			.sched_priority = MAX_USER_RT_PRIO/2,
+		};
 
 		t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
 				   new->name);
 		if (IS_ERR(t))
 			return PTR_ERR(t);
+
+		sched_setscheduler(t, SCHED_FIFO, &param);
+
 		/*
 		 * We keep the reference to the task struct even if
 		 * the thread dies to avoid that the interrupt code
-- 
1.7.10.4
- * [PATCH RT 04/16] list_bl.h: make list head locking RT safe
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (2 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 03/16] genirq: Set irq thread to RT priority on creation Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 05/16] list_bl.h: fix it for for !SMP && !DEBUG_SPINLOCK Steven Rostedt
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt, Paul Gortmaker
[-- Attachment #1: 0004-list_bl.h-make-list-head-locking-RT-safe.patch --]
[-- Type: text/plain, Size: 3645 bytes --]
From: Paul Gortmaker <paul.gortmaker@windriver.com>
As per changes in include/linux/jbd_common.h for avoiding the
bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
head lock rt safe") we do the same thing here.
We use the non atomic __set_bit and __clear_bit inside the scope of
the lock to preserve the ability of the existing LIST_DEBUG code to
use the zero'th bit in the sanity checks.
As a bit spinlock, we had no lockdep visibility into the usage
of the list head locking.  Now, if we were to implement it as a
standard non-raw spinlock, we would see:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
 #0:  (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
 #1:  (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
 #2:  (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
 #3:  (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
 #4:  (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
 [<ffffffff810b9624>] __might_sleep+0x134/0x1f0
 [<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
 [<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
 [<ffffffff811a1b2d>] __d_drop+0x1d/0x40
 [<ffffffff811a24be>] __d_move+0x8e/0x320
 [<ffffffff811a278e>] d_move+0x3e/0x60
 [<ffffffff81199598>] vfs_rename+0x198/0x4c0
 [<ffffffff8119b093>] sys_renameat+0x213/0x240
 [<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
 [<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
 [<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
 [<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8119b0db>] sys_rename+0x1b/0x20
 [<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f
Since we are only taking the lock during short lived list operations,
let's assume for now that it being raw won't be a significant latency
concern.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/list_bl.h |   24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h
index 31f9d75..ddfd46a 100644
--- a/include/linux/list_bl.h
+++ b/include/linux/list_bl.h
@@ -2,6 +2,7 @@
 #define _LINUX_LIST_BL_H
 
 #include <linux/list.h>
+#include <linux/spinlock.h>
 #include <linux/bit_spinlock.h>
 
 /*
@@ -32,13 +33,22 @@
 
 struct hlist_bl_head {
 	struct hlist_bl_node *first;
+#ifdef CONFIG_PREEMPT_RT_BASE
+	raw_spinlock_t lock;
+#endif
 };
 
 struct hlist_bl_node {
 	struct hlist_bl_node *next, **pprev;
 };
-#define INIT_HLIST_BL_HEAD(ptr) \
-	((ptr)->first = NULL)
+
+static inline void INIT_HLIST_BL_HEAD(struct hlist_bl_head *h)
+{
+	h->first = NULL;
+#ifdef CONFIG_PREEMPT_RT_BASE
+	raw_spin_lock_init(&h->lock);
+#endif
+}
 
 static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h)
 {
@@ -117,12 +127,22 @@ static inline void hlist_bl_del_init(struct hlist_bl_node *n)
 
 static inline void hlist_bl_lock(struct hlist_bl_head *b)
 {
+#ifndef CONFIG_PREEMPT_RT_BASE
 	bit_spin_lock(0, (unsigned long *)b);
+#else
+	raw_spin_lock(&b->lock);
+	__set_bit(0, (unsigned long *)b);
+#endif
 }
 
 static inline void hlist_bl_unlock(struct hlist_bl_head *b)
 {
+#ifndef CONFIG_PREEMPT_RT_BASE
 	__bit_spin_unlock(0, (unsigned long *)b);
+#else
+	__clear_bit(0, (unsigned long *)b);
+	raw_spin_unlock(&b->lock);
+#endif
 }
 
 /**
-- 
1.7.10.4
- * [PATCH RT 05/16] list_bl.h: fix it for for !SMP && !DEBUG_SPINLOCK
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (3 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 04/16] list_bl.h: make list head locking RT safe Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 06/16] timers: prepare for full preemption improve Steven Rostedt
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt, Paul Gortmaker, Uwe Kleine-König
[-- Attachment #1: 0005-list_bl.h-fix-it-for-for-SMP-DEBUG_SPINLOCK.patch --]
[-- Type: TEXT/PLAIN, Size: 2052 bytes --]
From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
The patch "list_bl.h: make list head locking RT safe" introduced
an unconditional
	__set_bit(0, (unsigned long *)b);
in void hlist_bl_lock(struct hlist_bl_head *b). This clobbers the value
of b->first. When the value of b->first is retrieved using
hlist_bl_first the clobbering is undone using
	(unsigned long)h->first & ~LIST_BL_LOCKMASK
and so depends on LIST_BL_LOCKMASK being one. But LIST_BL_LOCKMASK is
only one if at least one of CONFIG_SMP and CONFIG_DEBUG_SPINLOCK is
defined. Without these the value returned by hlist_bl_first has the
zeroth bit set which likely results in a crash.
So only do the clobbering in the cases where LIST_BL_LOCKMASK is one.
An alternative would be to always define LIST_BL_LOCKMASK to one with
CONFIG_PREEMPT_RT_BASE.
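The clobbering can be reproduced in a small userspace model. This is
only a sketch: struct head_model, model_set_lock_bit() and
model_first() are hypothetical names, and the lock mask is passed in
as a parameter instead of being the compile-time LIST_BL_LOCKMASK.

```c
#include <stdint.h>

/* Hypothetical model of a list head whose bit 0 doubles as a lock flag. */
struct head_model {
	uintptr_t first;	/* pointer value, possibly with bit 0 set */
};

/* Models the __set_bit(0, ...) done while the lock is held. */
static void model_set_lock_bit(struct head_model *h)
{
	h->first |= 1u;
}

/*
 * Models the masking in hlist_bl_first(): the lock bit is stripped
 * only if the lock mask actually covers it.
 */
static uintptr_t model_first(const struct head_model *h, uintptr_t lockmask)
{
	return h->first & ~lockmask;
}
```

With a lock mask of one the original pointer comes back; with a lock
mask of zero the returned value still has bit 0 set and no longer
points at the node.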
Cc: stable-rt@vger.kernel.org
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/list_bl.h |    4 ++++
 1 file changed, 4 insertions(+)
diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h
index ddfd46a..becd7a6 100644
--- a/include/linux/list_bl.h
+++ b/include/linux/list_bl.h
@@ -131,8 +131,10 @@ static inline void hlist_bl_lock(struct hlist_bl_head *b)
 	bit_spin_lock(0, (unsigned long *)b);
 #else
 	raw_spin_lock(&b->lock);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	__set_bit(0, (unsigned long *)b);
 #endif
+#endif
 }
 
 static inline void hlist_bl_unlock(struct hlist_bl_head *b)
@@ -140,7 +142,9 @@ static inline void hlist_bl_unlock(struct hlist_bl_head *b)
 #ifndef CONFIG_PREEMPT_RT_BASE
 	__bit_spin_unlock(0, (unsigned long *)b);
 #else
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	__clear_bit(0, (unsigned long *)b);
+#endif
 	raw_spin_unlock(&b->lock);
 #endif
 }
-- 
1.7.10.4
- * [PATCH RT 06/16] timers: prepare for full preemption improve
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (4 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 05/16] list_bl.h: fix it for for !SMP && !DEBUG_SPINLOCK Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 07/16] kernel/cpu: fix cpu down problem if kthreads cpu is going down Steven Rostedt
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt, Zhao Hongjiang
[-- Attachment #1: 0006-timers-prepare-for-full-preemption-improve.patch --]
[-- Type: text/plain, Size: 1740 bytes --]
From: Zhao Hongjiang <zhaohongjiang@huawei.com>
wake_up should do nothing on non-RT, so we should use wakeup_timer_waiters.
Also fix a spelling mistake.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
[bigeasy: s/CONFIG_PREEMPT_RT_BASE/CONFIG_PREEMPT_RT_FULL/]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/timer.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index 2e21a6c..07070cb 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -76,7 +76,9 @@ struct tvec_root {
 struct tvec_base {
 	spinlock_t lock;
 	struct timer_list *running_timer;
+#ifdef CONFIG_PREEMPT_RT_FULL
 	wait_queue_head_t wait_for_running_timer;
+#endif
 	unsigned long timer_jiffies;
 	unsigned long next_timer;
 	struct tvec_root tv1;
@@ -930,7 +932,7 @@ static void wait_for_running_timer(struct timer_list *timer)
 			   base->running_timer != timer);
 }
 
-# define wakeup_timer_waiters(b)	wake_up(&(b)->wait_for_tunning_timer)
+# define wakeup_timer_waiters(b)	wake_up(&(b)->wait_for_running_timer)
 #else
 static inline void wait_for_running_timer(struct timer_list *timer)
 {
@@ -1183,7 +1185,7 @@ static inline void __run_timers(struct tvec_base *base)
 			spin_lock_irq(&base->lock);
 		}
 	}
-	wake_up(&base->wait_for_running_timer);
+	wakeup_timer_waiters(base);
 	spin_unlock_irq(&base->lock);
 }
 
@@ -1706,7 +1708,9 @@ static int __cpuinit init_timers_cpu(int cpu)
 			base = &boot_tvec_bases;
 		}
 		spin_lock_init(&base->lock);
+#ifdef CONFIG_PREEMPT_RT_FULL
 		init_waitqueue_head(&base->wait_for_running_timer);
+#endif
 		tvec_base_done[cpu] = 1;
 	} else {
 		base = per_cpu(tvec_bases, cpu);
-- 
1.7.10.4
- * [PATCH RT 07/16] kernel/cpu: fix cpu down problem if kthreads cpu is going down
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (5 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 06/16] timers: prepare for full preemption improve Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 08/16] kernel/hotplug: restore original cpu mask oncpu/down Steven Rostedt
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt
[-- Attachment #1: 0007-kernel-cpu-fix-cpu-down-problem-if-kthread-s-cpu-is-.patch --]
[-- Type: text/plain, Size: 2635 bytes --]
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
If kthread is pinned to CPUx and CPUx is going down then we get into
trouble:
- first the unplug thread is created
- it will set itself to hp->unplug. As a result, every task that is
  going to take a lock, has to leave the CPU.
- the CPU_DOWN_PREPARE notifiers are started. The worker thread will
  start a new process for the "high priority worker".
  Now kthread would like to take a lock but since it can't leave the CPU
  it will never complete its task.
We could fire the unplug thread after the notifier but then the cpu is
no longer marked "online" and the unplug thread will run on CPU0 which
was fixed before :)
So instead the unplug thread is started and kept waiting until the
notifiers complete their work.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/cpu.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 3bcbf99..25853dc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -78,6 +78,7 @@ struct hotplug_pcp {
 	int refcount;
 	int grab_lock;
 	struct completion synced;
+	struct completion unplug_wait;
 #ifdef CONFIG_PREEMPT_RT_FULL
 	spinlock_t lock;
 #else
@@ -175,6 +176,7 @@ static int sync_unplug_thread(void *data)
 {
 	struct hotplug_pcp *hp = data;
 
+	wait_for_completion(&hp->unplug_wait);
 	preempt_disable();
 	hp->unplug = current;
 	wait_for_pinned_cpus(hp);
@@ -240,6 +242,14 @@ static void __cpu_unplug_sync(struct hotplug_pcp *hp)
 	wait_for_completion(&hp->synced);
 }
 
+static void __cpu_unplug_wait(unsigned int cpu)
+{
+	struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
+
+	complete(&hp->unplug_wait);
+	wait_for_completion(&hp->synced);
+}
+
 /*
  * Start the sync_unplug_thread on the target cpu and wait for it to
  * complete.
@@ -263,6 +273,7 @@ static int cpu_unplug_begin(unsigned int cpu)
 	tell_sched_cpu_down_begin(cpu);
 
 	init_completion(&hp->synced);
+	init_completion(&hp->unplug_wait);
 
 	hp->sync_tsk = kthread_create(sync_unplug_thread, hp, "sync_unplug/%d", cpu);
 	if (IS_ERR(hp->sync_tsk)) {
@@ -278,8 +289,7 @@ static int cpu_unplug_begin(unsigned int cpu)
 	 * wait for tasks that are going to enter these sections and
 	 * we must not have them block.
 	 */
-	__cpu_unplug_sync(hp);
-
+	wake_up_process(hp->sync_tsk);
 	return 0;
 }
 
@@ -524,6 +534,8 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 		goto out_release;
 	}
 
+	__cpu_unplug_wait(cpu);
+
 	/* Notifiers are done. Don't let any more tasks pin this CPU. */
 	cpu_unplug_sync(cpu);
 
-- 
1.7.10.4
- * [PATCH RT 08/16] kernel/hotplug: restore original cpu mask oncpu/down
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (6 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 07/16] kernel/cpu: fix cpu down problem if kthreads cpu is going down Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 09/16] drm/i915: drop trace_i915_gem_ring_dispatch on rt Steven Rostedt
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt
[-- Attachment #1: 0008-kernel-hotplug-restore-original-cpu-mask-oncpu-down.patch --]
[-- Type: text/plain, Size: 1818 bytes --]
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
If a task which is allowed to run only on CPU X puts CPU Y down then it
will be allowed on all CPUs but CPU Y after it comes back from the
kernel. This patch ensures that we don't lose the initial setting unless
the CPU the task is running on is going down.
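The save and restore idea can be sketched with plain bit masks. This
is a simplified model, not the kernel's cpumask API: cpumask_model,
struct task_affinity_model, model_leave_cpu() and model_restore() are
hypothetical, and the model assumes at most 64 CPUs.

```c
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model;

struct task_affinity_model {
	cpumask_model allowed;
};

/*
 * Move the task to every online CPU except `cpu` and hand back the
 * previous mask so the caller can restore it later.
 */
static cpumask_model model_leave_cpu(struct task_affinity_model *t,
				     cpumask_model online, unsigned int cpu)
{
	cpumask_model saved = t->allowed;

	t->allowed = online & ~((cpumask_model)1 << cpu);
	return saved;
}

/* Put the original affinity back once the CPU is down. */
static void model_restore(struct task_affinity_model *t, cpumask_model saved)
{
	t->allowed = saved;
}
```

Without the restore step, a task that started out pinned to CPU 0
would stay allowed on every remaining online CPU.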
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/cpu.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 25853dc..4abfd5d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -497,6 +497,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 		.hcpu = hcpu,
 	};
 	cpumask_var_t cpumask;
+	cpumask_var_t cpumask_org;
 
 	if (num_online_cpus() == 1)
 		return -EBUSY;
@@ -507,6 +508,12 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	/* Move the downtaker off the unplug cpu */
 	if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
 		return -ENOMEM;
+	if (!alloc_cpumask_var(&cpumask_org, GFP_KERNEL))  {
+		free_cpumask_var(cpumask);
+		return -ENOMEM;
+	}
+
+	cpumask_copy(cpumask_org, tsk_cpus_allowed(current));
 	cpumask_andnot(cpumask, cpu_online_mask, cpumask_of(cpu));
 	set_cpus_allowed_ptr(current, cpumask);
 	free_cpumask_var(cpumask);
@@ -515,7 +522,8 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	if (mycpu == cpu) {
 		printk(KERN_ERR "Yuck! Still on unplug CPU\n!");
 		migrate_enable();
-		return -EBUSY;
+		err = -EBUSY;
+		goto restore_cpus;
 	}
 
 	cpu_hotplug_begin();
@@ -573,6 +581,9 @@ out_cancel:
 	cpu_hotplug_done();
 	if (!err)
 		cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu);
+restore_cpus:
+	set_cpus_allowed_ptr(current, cpumask_org);
+	free_cpumask_var(cpumask_org);
 	return err;
 }
 
-- 
1.7.10.4
- * [PATCH RT 09/16] drm/i915: drop trace_i915_gem_ring_dispatch on rt
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (7 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 08/16] kernel/hotplug: restore original cpu mask oncpu/down Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 10/16] rt,ntp: Move call to schedule_delayed_work() to helper thread Steven Rostedt
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt, Joakim Hernberg
[-- Attachment #1: 0009-drm-i915-drop-trace_i915_gem_ring_dispatch-on-rt.patch --]
[-- Type: text/plain, Size: 2238 bytes --]
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
This tracepoint is responsible for:
|[<814cc358>] __schedule_bug+0x4d/0x59
|[<814d24cc>] __schedule+0x88c/0x930
|[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50
|[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50
|[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250
|[<814d27d9>] schedule+0x29/0x70
|[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278
|[<814d3786>] rt_spin_lock+0x26/0x30
|[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915]
|[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915]
|[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915]
|[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915]
|[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60
|[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180
|[<8101d063>] ? native_sched_clock+0x13/0x80
|[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915]
|[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm]
|[<8101d0d9>] ? sched_clock+0x9/0x10
|[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915]
|[<810f1c18>] ? rb_commit+0x68/0xa0
|[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0
|[<81197467>] do_vfs_ioctl+0x97/0x540
|[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130
|[<811979a1>] sys_ioctl+0x91/0xb0
|[<814db931>] tracesys+0xe1/0xe6
Chris Wilson does not like to move i915_trace_irq_get() out of the macro
|No. This enables the IRQ, as well as making a number of
|very expensively serialised read, unconditionally.
so it is gone now on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Joakim Hernberg <jbh@alchemy.lu>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1ca53ff..4d04a9f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1189,7 +1189,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		}
 	}
 
+#ifndef CONFIG_PREEMPT_RT_BASE
 	trace_i915_gem_ring_dispatch(ring, seqno);
+#endif
 
 	exec_start = batch_obj->gtt_offset + args->batch_start_offset;
 	exec_len = args->batch_len;
-- 
1.7.10.4
- * [PATCH RT 10/16] rt,ntp: Move call to schedule_delayed_work() to helper thread
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (8 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 09/16] drm/i915: drop trace_i915_gem_ring_dispatch on rt Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 11/16] hwlat-detector: Update hwlat_detector to add outer loop detection Steven Rostedt
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt
[-- Attachment #1: 0010-rt-ntp-Move-call-to-schedule_delayed_work-to-helper-.patch --]
[-- Type: text/plain, Size: 2645 bytes --]
From: Steven Rostedt <rostedt@goodmis.org>
The ntp code for notify_cmos_timer() is called from a hard interrupt
context. schedule_delayed_work() under PREEMPT_RT_FULL calls spinlocks
that have been converted to mutexes, thus calling schedule_delayed_work()
from interrupt is not safe.
Add a helper thread that does the call to schedule_delayed_work and wake
up that thread instead of calling schedule_delayed_work() directly.
This is only for CONFIG_PREEMPT_RT_FULL, otherwise the code still calls
schedule_delayed_work() directly in irq context.
Note: There are a few places in the kernel that do this. Perhaps the RT
code should have a dedicated thread that does the checks. Just register
a notifier on boot up for your check and wake up the thread when
needed. This will be a todo.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/time/ntp.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 8b3a185..fa0c206 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -10,6 +10,7 @@
 #include <linux/workqueue.h>
 #include <linux/hrtimer.h>
 #include <linux/jiffies.h>
+#include <linux/kthread.h>
 #include <linux/math64.h>
 #include <linux/timex.h>
 #include <linux/time.h>
@@ -494,11 +495,52 @@ static void sync_cmos_clock(struct work_struct *work)
 	schedule_delayed_work(&sync_cmos_work, timespec_to_jiffies(&next));
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * RT can not call schedule_delayed_work from real interrupt context.
+ * Need to make a thread to do the real work.
+ */
+static struct task_struct *cmos_delay_thread;
+static bool do_cmos_delay;
+
+static int run_cmos_delay(void *ignore)
+{
+	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (do_cmos_delay) {
+			do_cmos_delay = false;
+			schedule_delayed_work(&sync_cmos_work, 0);
+		}
+		schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+	return 0;
+}
+
+static void notify_cmos_timer(void)
+{
+	if (!no_sync_cmos_clock) {
+		do_cmos_delay = true;
+		/* Make visible before waking up process */
+		smp_wmb();
+		wake_up_process(cmos_delay_thread);
+	}
+}
+
+static __init int create_cmos_delay_thread(void)
+{
+	cmos_delay_thread = kthread_run(run_cmos_delay, NULL, "kcmosdelayd");
+	BUG_ON(!cmos_delay_thread);
+	return 0;
+}
+early_initcall(create_cmos_delay_thread);
+#else
 static void notify_cmos_timer(void)
 {
 	if (!no_sync_cmos_clock)
 		schedule_delayed_work(&sync_cmos_work, 0);
 }
+#endif /* CONFIG_PREEMPT_RT_FULL */
 
 #else
 static inline void notify_cmos_timer(void) { }
-- 
1.7.10.4
- * [PATCH RT 11/16] hwlat-detector: Update hwlat_detector to add outer loop detection
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (9 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 10/16] rt,ntp: Move call to schedule_delayed_work() to helper thread Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 12/16] hwlat-detect/trace: Export trace_clock_local for hwlat-detector Steven Rostedt
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt, Steven Rostedt
[-- Attachment #1: 0011-hwlat-detector-Update-hwlat_detector-to-add-outer-lo.patch --]
[-- Type: text/plain, Size: 3993 bytes --]
From: Steven Rostedt <rostedt@goodmis.org>
The hwlat_detector reads two timestamps in a row, then reports any
gap between those calls. The problem is that it misses everything between
the second reading of the time stamp and the first reading of the time stamp
in the next loop. That's where most of the time is spent, which means
it will likely miss all hardware latencies. This
defeats the purpose.
By also comparing the first time stamp against the second time stamp of
the previous loop (the outer loop), we are more likely to find a latency.
Setting the threshold to 1, here's what the report now looks like
(columns: timestamp, inner loop duration, outer loop duration, with
durations in microseconds):
1347415723.0232202770	0	2
1347415725.0234202822	0	2
1347415727.0236202875	0	2
1347415729.0238202928	0	2
1347415731.0240202980	0	2
1347415734.0243203061	0	2
1347415736.0245203113	0	2
1347415738.0247203166	2	0
1347415740.0249203219	0	3
1347415742.0251203272	0	3
1347415743.0252203299	0	3
1347415745.0254203351	0	2
1347415747.0256203404	0	2
1347415749.0258203457	0	2
1347415751.0260203510	0	2
1347415754.0263203589	0	2
1347415756.0265203642	0	2
1347415758.0267203695	0	2
1347415760.0269203748	0	2
1347415762.0271203801	0	2
1347415764.0273203853	2	0
There's some hardware latency that takes 2 microseconds to run.
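To make the inner/outer distinction concrete, here is a minimal userspace
sketch of the same bookkeeping. It is not the driver code: struct reading,
struct gaps and max_gaps() are hypothetical names, and the timestamps are
passed in as plain microsecond values rather than read from a clock, so
the result is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One loop iteration: two back-to-back clock reads, in microseconds. */
struct reading {
	uint64_t t1;
	uint64_t t2;
};

struct gaps {
	uint64_t inner;	/* largest t2 - t1 within a single iteration */
	uint64_t outer;	/* largest t1 - previous t2 across iterations */
};

/* Hypothetical helper for illustration; not part of hwlat_detector.c. */
static struct gaps max_gaps(const struct reading *r, size_t n)
{
	struct gaps g = { 0, 0 };
	uint64_t last_t2 = 0;
	int have_last = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Time must not run backwards inside an iteration. */
		assert(r[i].t2 >= r[i].t1);

		if (have_last) {
			/* Outer loop: previous t2 to this iteration's t1. */
			assert(r[i].t1 >= last_t2);
			if (r[i].t1 - last_t2 > g.outer)
				g.outer = r[i].t1 - last_t2;
		}

		/* Inner loop: t1 to t2 of the same iteration. */
		if (r[i].t2 - r[i].t1 > g.inner)
			g.inner = r[i].t2 - r[i].t1;

		last_t2 = r[i].t2;
		have_last = 1;
	}
	return g;
}
```

With readings {10,10}, {11,11}, {14,14}, {15,17} the inner maximum is 2
and the outer maximum is 3. A detector that only looked at the inner gap
would never see the 3 microsecond stall between the second and third
iterations.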
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/misc/hwlat_detector.c |   32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/misc/hwlat_detector.c b/drivers/misc/hwlat_detector.c
index b7b7c90..f93b8ef 100644
--- a/drivers/misc/hwlat_detector.c
+++ b/drivers/misc/hwlat_detector.c
@@ -143,6 +143,7 @@ static void detector_exit(void);
 struct sample {
 	u64		seqnum;		/* unique sequence */
 	u64		duration;	/* ktime delta */
+	u64		outer_duration;	/* ktime delta (outer loop) */
 	struct timespec	timestamp;	/* wall time */
 	unsigned long   lost;
 };
@@ -219,11 +220,13 @@ static struct sample *buffer_get_sample(struct sample *sample)
  */
 static int get_sample(void *unused)
 {
-	ktime_t start, t1, t2;
+	ktime_t start, t1, t2, last_t2;
 	s64 diff, total = 0;
 	u64 sample = 0;
+	u64 outer_sample = 0;
 	int ret = 1;
 
+	last_t2.tv64 = 0;
 	start = ktime_get(); /* start timestamp */
 
 	do {
@@ -231,7 +234,22 @@ static int get_sample(void *unused)
 		t1 = ktime_get();	/* we'll look for a discontinuity */
 		t2 = ktime_get();
 
+		if (last_t2.tv64) {
+			/* Check the delta from the outer loop (t2 to next t1) */
+			diff = ktime_to_us(ktime_sub(t1, last_t2));
+			/* This shouldn't happen */
+			if (diff < 0) {
+				printk(KERN_ERR BANNER "time running backwards\n");
+				goto out;
+			}
+			if (diff > outer_sample)
+				outer_sample = diff;
+		}
+		last_t2 = t2;
+
 		total = ktime_to_us(ktime_sub(t2, start)); /* sample width */
+
+		/* This checks the inner loop (t1 to t2) */
 		diff = ktime_to_us(ktime_sub(t2, t1));     /* current diff */
 
 		/* This shouldn't happen */
@@ -246,12 +264,13 @@ static int get_sample(void *unused)
 	} while (total <= data.sample_width);
 
 	/* If we exceed the threshold value, we have found a hardware latency */
-	if (sample > data.threshold) {
+	if (sample > data.threshold || outer_sample > data.threshold) {
 		struct sample s;
 
 		data.count++;
 		s.seqnum = data.count;
 		s.duration = sample;
+		s.outer_duration = outer_sample;
 		s.timestamp = CURRENT_TIME;
 		__buffer_add_sample(&s);
 
@@ -738,10 +757,11 @@ static ssize_t debug_sample_fread(struct file *filp, char __user *ubuf,
 		}
 	}
 
-	len = snprintf(buf, sizeof(buf), "%010lu.%010lu\t%llu\n",
-		      sample->timestamp.tv_sec,
-		      sample->timestamp.tv_nsec,
-		      sample->duration);
+	len = snprintf(buf, sizeof(buf), "%010lu.%010lu\t%llu\t%llu\n",
+		       sample->timestamp.tv_sec,
+		       sample->timestamp.tv_nsec,
+		       sample->duration,
+		       sample->outer_duration);
 
 
 	/* handling partial reads is more trouble than it's worth */
-- 
1.7.10.4
- * [PATCH RT 12/16] hwlat-detect/trace: Export trace_clock_local for hwlat-detector
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (10 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 11/16] hwlat-detector: Update hwlat_detector to add outer loop detection Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 13/16] hwlat-detector: Use trace_clock_local if available Steven Rostedt
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur
[-- Attachment #1: 0012-hwlat-detect-trace-Export-trace_clock_local-for-hwla.patch --]
[-- Type: text/plain, Size: 735 bytes --]
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
The hwlat-detector needs a better clock than just ktime_get() as that
can induce its own latencies. The trace clock is perfect for it, but
it needs to be exported for use by modules.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_clock.c |    1 +
 1 file changed, 1 insertion(+)
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 6302747..e5163ab 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -44,6 +44,7 @@ u64 notrace trace_clock_local(void)
 
 	return clock;
 }
+EXPORT_SYMBOL_GPL(trace_clock_local);
 
 /*
  * trace_clock(): 'between' trace clock. Not completely serialized,
-- 
1.7.10.4
- * [PATCH RT 13/16] hwlat-detector: Use trace_clock_local if available
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (11 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 12/16] hwlat-detect/trace: Export trace_clock_local for hwlat-detector Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 14/16] hwlat-detector: Use thread instead of stop machine Steven Rostedt
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt, Steven Rostedt
[-- Attachment #1: 0013-hwlat-detector-Use-trace_clock_local-if-available.patch --]
[-- Type: text/plain, Size: 3095 bytes --]
From: Steven Rostedt <rostedt@goodmis.org>
As ktime_get() calls into the timing code which does a read_seq(), it
may be affected by other CPUs that touch that lock. To remove this
dependency, use the trace_clock_local() which is already exported
for module use. If CONFIG_TRACING is enabled, use that as the clock,
otherwise use ktime_get().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/misc/hwlat_detector.c |   34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/drivers/misc/hwlat_detector.c b/drivers/misc/hwlat_detector.c
index f93b8ef..13443e9 100644
--- a/drivers/misc/hwlat_detector.c
+++ b/drivers/misc/hwlat_detector.c
@@ -51,6 +51,7 @@
 #include <linux/version.h>
 #include <linux/delay.h>
 #include <linux/slab.h>
+#include <linux/trace_clock.h>
 
 #define BUF_SIZE_DEFAULT	262144UL		/* 8K*(sizeof(entry)) */
 #define BUF_FLAGS		(RB_FL_OVERWRITE)	/* no block on full */
@@ -211,6 +212,21 @@ static struct sample *buffer_get_sample(struct sample *sample)
 	return sample;
 }
 
+#ifndef CONFIG_TRACING
+#define time_type	ktime_t
+#define time_get()	ktime_get()
+#define time_to_us(x)	ktime_to_us(x)
+#define time_sub(a, b)	ktime_sub(a, b)
+#define init_time(a, b)	(a).tv64 = b
+#define time_u64(a)	(a).tv64
+#else
+#define time_type	u64
+#define time_get()	trace_clock_local()
+#define time_to_us(x)	div_u64(x, 1000)
+#define time_sub(a, b)	((a) - (b))
+#define init_time(a, b)	a = b
+#define time_u64(a)	a
+#endif
 /**
  * get_sample - sample the CPU TSC and look for likely hardware latencies
  * @unused: This is not used but is a part of the stop_machine API
@@ -220,23 +236,23 @@ static struct sample *buffer_get_sample(struct sample *sample)
  */
 static int get_sample(void *unused)
 {
-	ktime_t start, t1, t2, last_t2;
+	time_type start, t1, t2, last_t2;
 	s64 diff, total = 0;
 	u64 sample = 0;
 	u64 outer_sample = 0;
 	int ret = 1;
 
-	last_t2.tv64 = 0;
-	start = ktime_get(); /* start timestamp */
+	init_time(last_t2, 0);
+	start = time_get(); /* start timestamp */
 
 	do {
 
-		t1 = ktime_get();	/* we'll look for a discontinuity */
-		t2 = ktime_get();
+		t1 = time_get();	/* we'll look for a discontinuity */
+		t2 = time_get();
 
-		if (last_t2.tv64) {
+		if (time_u64(last_t2)) {
 			/* Check the delta from the outer loop (t2 to next t1) */
-			diff = ktime_to_us(ktime_sub(t1, last_t2));
+			diff = time_to_us(time_sub(t1, last_t2));
 			/* This shouldn't happen */
 			if (diff < 0) {
 				printk(KERN_ERR BANNER "time running backwards\n");
@@ -247,10 +263,10 @@ static int get_sample(void *unused)
 		}
 		last_t2 = t2;
 
-		total = ktime_to_us(ktime_sub(t2, start)); /* sample width */
+		total = time_to_us(time_sub(t2, start)); /* sample width */
 
 		/* This checks the inner loop (t1 to t2) */
-		diff = ktime_to_us(ktime_sub(t2, t1));     /* current diff */
+		diff = time_to_us(time_sub(t2, t1));     /* current diff */
 
 		/* This shouldn't happen */
 		if (diff < 0) {
-- 
1.7.10.4
- * [PATCH RT 14/16] hwlat-detector: Use thread instead of stop machine
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (12 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 13/16] hwlat-detector: Use trace_clock_local if available Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 15/16] genirq: do not invoke the affinity callback via a workqueue Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 16/16] Linux 3.0.89-rt118-rc1 Steven Rostedt
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt
[-- Attachment #1: 0014-hwlat-detector-Use-thread-instead-of-stop-machine.patch --]
[-- Type: text/plain, Size: 6412 bytes --]
From: Steven Rostedt <rostedt@goodmis.org>
There's no reason to use stop machine to search for hardware latency.
Simply disabling interrupts while running the loop is enough to
check whether something comes in that isn't masked by interrupts being
off, which is exactly what stop machine does.
Instead of using stop machine, just have the thread disable interrupts
while it checks for hardware latency.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/misc/hwlat_detector.c |   59 +++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 34 deletions(-)
diff --git a/drivers/misc/hwlat_detector.c b/drivers/misc/hwlat_detector.c
index 13443e9..6f61d5f 100644
--- a/drivers/misc/hwlat_detector.c
+++ b/drivers/misc/hwlat_detector.c
@@ -41,7 +41,6 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/ring_buffer.h>
-#include <linux/stop_machine.h>
 #include <linux/time.h>
 #include <linux/hrtimer.h>
 #include <linux/kthread.h>
@@ -107,7 +106,6 @@ struct data;					/* Global state */
 /* Sampling functions */
 static int __buffer_add_sample(struct sample *sample);
 static struct sample *buffer_get_sample(struct sample *sample);
-static int get_sample(void *unused);
 
 /* Threading and state */
 static int kthread_fn(void *unused);
@@ -149,7 +147,7 @@ struct sample {
 	unsigned long   lost;
 };
 
-/* keep the global state somewhere. Mostly used under stop_machine. */
+/* keep the global state somewhere. */
 static struct data {
 
 	struct mutex lock;		/* protect changes */
@@ -172,7 +170,7 @@ static struct data {
  * @sample: The new latency sample value
  *
  * This receives a new latency sample and records it in a global ring buffer.
- * No additional locking is used in this case - suited for stop_machine use.
+ * No additional locking is used in this case.
  */
 static int __buffer_add_sample(struct sample *sample)
 {
@@ -229,18 +227,17 @@ static struct sample *buffer_get_sample(struct sample *sample)
 #endif
 /**
  * get_sample - sample the CPU TSC and look for likely hardware latencies
- * @unused: This is not used but is a part of the stop_machine API
  *
  * Used to repeatedly capture the CPU TSC (or similar), looking for potential
- * hardware-induced latency. Called under stop_machine, with data.lock held.
+ * hardware-induced latency. Called with interrupts disabled and with data.lock held.
  */
-static int get_sample(void *unused)
+static int get_sample(void)
 {
 	time_type start, t1, t2, last_t2;
 	s64 diff, total = 0;
 	u64 sample = 0;
 	u64 outer_sample = 0;
-	int ret = 1;
+	int ret = -1;
 
 	init_time(last_t2, 0);
 	start = time_get(); /* start timestamp */
@@ -279,10 +276,14 @@ static int get_sample(void *unused)
 
 	} while (total <= data.sample_width);
 
+	ret = 0;
+
 	/* If we exceed the threshold value, we have found a hardware latency */
 	if (sample > data.threshold || outer_sample > data.threshold) {
 		struct sample s;
 
+		ret = 1;
+
 		data.count++;
 		s.seqnum = data.count;
 		s.duration = sample;
@@ -295,7 +296,6 @@ static int get_sample(void *unused)
 			data.max_sample = sample;
 	}
 
-	ret = 0;
 out:
 	return ret;
 }
@@ -305,32 +305,30 @@ out:
  * @unused: A required part of the kthread API.
  *
  * Used to periodically sample the CPU TSC via a call to get_sample. We
- * use stop_machine, whith does (intentionally) introduce latency since we
+ * disable interrupts, which does (intentionally) introduce latency since we
  * need to ensure nothing else might be running (and thus pre-empting).
  * Obviously this should never be used in production environments.
  *
- * stop_machine will schedule us typically only on CPU0 which is fine for
- * almost every real-world hardware latency situation - but we might later
- * generalize this if we find there are any actualy systems with alternate
- * SMI delivery or other non CPU0 hardware latencies.
+ * Currently this runs on whichever CPU it was scheduled on, but most
+ * real-world hardware latency situations occur across several CPUs,
+ * but we might later generalize this if we find there are any actual
+ * systems with alternate SMI delivery or other hardware latencies.
  */
 static int kthread_fn(void *unused)
 {
-	int err = 0;
-	u64 interval = 0;
+	int ret;
+	u64 interval;
 
 	while (!kthread_should_stop()) {
 
 		mutex_lock(&data.lock);
 
-		err = stop_machine(get_sample, unused, 0);
-		if (err) {
-			/* Houston, we have a problem */
-			mutex_unlock(&data.lock);
-			goto err_out;
-		}
+		local_irq_disable();
+		ret = get_sample();
+		local_irq_enable();
 
-		wake_up(&data.wq); /* wake up reader(s) */
+		if (ret > 0)
+			wake_up(&data.wq); /* wake up reader(s) */
 
 		interval = data.sample_window - data.sample_width;
 		do_div(interval, USEC_PER_MSEC); /* modifies interval value */
@@ -338,15 +336,10 @@ static int kthread_fn(void *unused)
 		mutex_unlock(&data.lock);
 
 		if (msleep_interruptible(interval))
-			goto out;
+			break;
 	}
-		goto out;
-err_out:
-	printk(KERN_ERR BANNER "could not call stop_machine, disabling\n");
-	enabled = 0;
-out:
-	return err;
 
+	return 0;
 }
 
 /**
@@ -442,8 +435,7 @@ out:
  * This function provides a generic read implementation for the global state
  * "data" structure debugfs filesystem entries. It would be nice to use
  * simple_attr_read directly, but we need to make sure that the data.lock
- * spinlock is held during the actual read (even though we likely won't ever
- * actually race here as the updater runs under a stop_machine context).
+ * is held during the actual read.
  */
 static ssize_t simple_data_read(struct file *filp, char __user *ubuf,
 				size_t cnt, loff_t *ppos, const u64 *entry)
@@ -478,8 +470,7 @@ static ssize_t simple_data_read(struct file *filp, char __user *ubuf,
  * This function provides a generic write implementation for the global state
  * "data" structure debugfs filesystem entries. It would be nice to use
  * simple_attr_write directly, but we need to make sure that the data.lock
- * spinlock is held during the actual write (even though we likely won't ever
- * actually race here as the updater runs under a stop_machine context).
+ * is held during the actual write.
  */
 static ssize_t simple_data_write(struct file *filp, const char __user *ubuf,
 				 size_t cnt, loff_t *ppos, u64 *entry)
-- 
1.7.10.4
- * [PATCH RT 15/16] genirq: do not invoke the affinity callback via a workqueue
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (13 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 14/16] hwlat-detector: Use thread instead of stop machine Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  2013-09-09 14:35 ` [PATCH RT 16/16] Linux 3.0.89-rt118-rc1 Steven Rostedt
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, stable-rt
[-- Attachment #1: 0015-genirq-do-not-invoke-the-affinity-callback-via-a-wor.patch --]
[-- Type: text/plain, Size: 4445 bytes --]
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Joe Korty reported that __irq_set_affinity_locked() schedules a
workqueue while holding a rawlock, which results in a might_sleep()
warning.
This patch moves the invocation into a process context so that we only
wake up a process while holding the lock.
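As a rough illustration of the queue-then-drain pattern described above,
here is a userspace sketch. It is an assumption-laden model, not the
kernel code: struct list_node, struct notifier, enqueue() and drain() are
hypothetical stand-ins, and the raw spinlock and the helper thread wakeup
are left out so the example stays single-threaded and deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Insert @entry just before @head, i.e. at the tail. Entry comes first. */
static void list_push_tail(struct list_node *entry, struct list_node *head)
{
	assert(entry != head);
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink @entry and reinitialise it so it reads as "not queued" again. */
static void list_unlink(struct list_node *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

/* Hypothetical notifier; calls counts how often the callback ran. */
struct notifier {
	int calls;
	struct list_node node;
};

/* Caller side: queue the notifier unless it is already pending. */
static int enqueue(struct list_node *pending, struct notifier *n)
{
	if (!list_is_empty(&n->node))
		return 0;
	list_push_tail(&n->node, pending);
	return 1;
}

/* Helper side: pop entries one at a time and run the callback. */
static int drain(struct list_node *pending)
{
	int handled = 0;

	while (!list_is_empty(pending)) {
		struct list_node *first = pending->next;
		struct notifier *n = (struct notifier *)
			((char *)first - offsetof(struct notifier, node));

		list_unlink(first);
		n->calls++;	/* stands in for the real callback */
		handled++;
	}
	return handled;
}
```

The point of the "already pending" check is that a notifier queued twice
before the helper runs is still delivered only once, and unlinking with
reinitialisation is what lets the same notifier be queued again later.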
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/interrupt.h |    1 +
 kernel/irq/manage.c       |   79 +++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 3 deletions(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 71c2c0b..9f67f91 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -255,6 +255,7 @@ struct irq_affinity_notify {
 	unsigned int irq;
 	struct kref kref;
 	struct work_struct work;
+	struct list_head list;
 	void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
 	void (*release)(struct kref *ref);
 };
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 9d702cf..3d7d5f6 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -141,6 +141,62 @@ static inline void
 irq_get_pending(struct cpumask *mask, struct irq_desc *desc) { }
 #endif
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+static void _irq_affinity_notify(struct irq_affinity_notify *notify);
+static struct task_struct *set_affinity_helper;
+static LIST_HEAD(affinity_list);
+static DEFINE_RAW_SPINLOCK(affinity_list_lock);
+
+static int set_affinity_thread(void *unused)
+{
+	while (1) {
+		struct irq_affinity_notify *notify;
+		int empty;
+
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		raw_spin_lock_irq(&affinity_list_lock);
+		empty = list_empty(&affinity_list);
+		raw_spin_unlock_irq(&affinity_list_lock);
+
+		if (empty)
+			schedule();
+		if (kthread_should_stop())
+			break;
+		set_current_state(TASK_RUNNING);
+try_next:
+		notify = NULL;
+
+		raw_spin_lock_irq(&affinity_list_lock);
+		if (!list_empty(&affinity_list)) {
+			notify = list_first_entry(&affinity_list,
+					struct irq_affinity_notify, list);
+			list_del_init(&notify->list);
+		}
+		raw_spin_unlock_irq(&affinity_list_lock);
+
+		if (!notify)
+			continue;
+		_irq_affinity_notify(notify);
+		goto try_next;
+	}
+	return 0;
+}
+
+static void init_helper_thread(void)
+{
+	if (set_affinity_helper)
+		return;
+	set_affinity_helper = kthread_run(set_affinity_thread, NULL,
+			"affinity-cb");
+	WARN_ON(IS_ERR(set_affinity_helper));
+}
+#else
+
+static inline void init_helper_thread(void) { }
+
+#endif
+
 int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
 {
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
@@ -166,7 +222,17 @@ int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
 
 	if (desc->affinity_notify) {
 		kref_get(&desc->affinity_notify->kref);
+
+#ifdef CONFIG_PREEMPT_RT_FULL
+		raw_spin_lock(&affinity_list_lock);
+		if (list_empty(&desc->affinity_notify->list))
+			list_add_tail(&desc->affinity_notify->list,
+					&affinity_list);
+		raw_spin_unlock(&affinity_list_lock);
+		wake_up_process(set_affinity_helper);
+#else
 		schedule_work(&desc->affinity_notify->work);
+#endif
 	}
 	irqd_set(data, IRQD_AFFINITY_SET);
 
@@ -207,10 +273,8 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 }
 EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
 
-static void irq_affinity_notify(struct work_struct *work)
+static void _irq_affinity_notify(struct irq_affinity_notify *notify)
 {
-	struct irq_affinity_notify *notify =
-		container_of(work, struct irq_affinity_notify, work);
 	struct irq_desc *desc = irq_to_desc(notify->irq);
 	cpumask_var_t cpumask;
 	unsigned long flags;
@@ -232,6 +296,13 @@ out:
 	kref_put(&notify->kref, notify->release);
 }
 
+static void irq_affinity_notify(struct work_struct *work)
+{
+	struct irq_affinity_notify *notify =
+		container_of(work, struct irq_affinity_notify, work);
+	_irq_affinity_notify(notify);
+}
+
 /**
  *	irq_set_affinity_notifier - control notification of IRQ affinity changes
  *	@irq:		Interrupt for which to enable/disable notification
@@ -261,6 +332,8 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
 		notify->irq = irq;
 		kref_init(&notify->kref);
 		INIT_WORK(&notify->work, irq_affinity_notify);
+		INIT_LIST_HEAD(&notify->list);
+		init_helper_thread();
 	}
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
-- 
1.7.10.4
- * [PATCH RT 16/16] Linux 3.0.89-rt118-rc1
  2013-09-09 14:35 [PATCH RT 00/16] 3.0.89-rt118-rc1 stable review Steven Rostedt
                   ` (14 preceding siblings ...)
  2013-09-09 14:35 ` [PATCH RT 15/16] genirq: do not invoke the affinity callback via a workqueue Steven Rostedt
@ 2013-09-09 14:35 ` Steven Rostedt
  15 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2013-09-09 14:35 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur
[-- Attachment #1: 0016-Linux-3.0.89-rt118-rc1.patch --]
[-- Type: text/plain, Size: 301 bytes --]
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
---
 localversion-rt |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/localversion-rt b/localversion-rt
index 9788245..f2ec5ab 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt117
+-rt118-rc1
-- 
1.7.10.4