* [PATCH RT 00/10] 3.8.13-rt16-rc1 stable review
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur
Dear RT Folks,
This is the RT stable review cycle of patch 3.8.13-rt16-rc1.
Please scream at me if I messed something up. Please test the patches too.
The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).
The pre-releases will not be pushed to the git repository; only the
final release will be.
If all goes well, this patch will be converted to the next main release
on 9/11/2013.
Enjoy,
-- Steve
To build 3.8.13-rt16-rc1 directly, the following patches should be applied:
http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.8.tar.xz
http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.8.13.xz
http://www.kernel.org/pub/linux/kernel/projects/rt/3.8/patch-3.8.13-rt16-rc1.patch.xz
You can also build from 3.8.13-rt15 by applying the incremental patch:
http://www.kernel.org/pub/linux/kernel/projects/rt/3.8/incr/patch-3.8.13-rt15-rt16-rc1.patch.xz
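For example, one way to fetch and apply them (ordinary wget/xz/patch
invocations; adjust paths to taste):

  $ wget http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.8.tar.xz
  $ wget http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.8.13.xz
  $ wget http://www.kernel.org/pub/linux/kernel/projects/rt/3.8/patch-3.8.13-rt16-rc1.patch.xz
  $ tar xf linux-3.8.tar.xz && cd linux-3.8
  $ xz -dc ../patch-3.8.13.xz | patch -p1
  $ xz -dc ../patch-3.8.13-rt16-rc1.patch.xz | patch -p1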
Changes from 3.8.13-rt15:
---
John Kacur (1):
hpsa: fix warning with smp_processor_id() in preemptible section
Sebastian Andrzej Siewior (2):
powerpc: 52xx: provide a default in mpc52xx_irqhost_map()
genirq: do not invoke the affinity callback via a workqueue
Steven Rostedt (4):
rt,ntp: Move call to schedule_delayed_work() to helper thread
hwlat-detector: Update hwlat_detector to add outer loop detection
hwlat-detector: Use trace_clock_local if available
hwlat-detector: Use thread instead of stop machine
Steven Rostedt (Red Hat) (2):
hwlat-detect/trace: Export trace_clock_local for hwlat-detector
Linux 3.8.13-rt16-rc1
Thomas Gleixner (1):
sched: Distangle worker accounting from rqlock
----
arch/powerpc/platforms/52xx/mpc52xx_pic.c | 3 +-
drivers/misc/hwlat_detector.c | 117 ++++++++++++++++++-----------
drivers/scsi/hpsa.c | 2 +-
include/linux/interrupt.h | 1 +
kernel/irq/manage.c | 79 ++++++++++++++++++-
kernel/sched/core.c | 70 ++++-------------
kernel/time/ntp.c | 40 ++++++++++
kernel/trace/trace_clock.c | 1 +
kernel/workqueue.c | 63 +++++++---------
kernel/workqueue_sched.h | 5 +-
localversion-rt | 2 +-
11 files changed, 240 insertions(+), 143 deletions(-)
* [PATCH RT 01/10] sched: Distangle worker accounting from rqlock
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur, Peter Zijlstra, Tejun Heo, Jens Axboe, stable-rt
From: Thomas Gleixner <tglx@linutronix.de>
The worker accounting for cpu bound workers is plugged into the core
scheduler code and the wakeup code. This is not a hard requirement and
can be avoided by keeping track of the state in the workqueue code
itself.
Keep track of the sleeping state in the worker itself and call the
notifier before entering the core scheduler. There might be false
positives when the task is woken between that call and actually
scheduling, but that's not really different from scheduling and being
woken immediately after switching away. There is also no harm from
updating nr_running when the task returns from scheduling instead of
accounting it in the wakeup code.
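For orientation before the full diff: the schedule() path that results,
copied in condensed form from the hunks below.

asmlinkage void __sched schedule(void)
{
	struct task_struct *tsk = current;

	sched_submit_work(tsk);		/* PF_WQ_WORKER: wq_worker_sleeping() */
	__schedule();
	sched_update_worker(tsk);	/* PF_WQ_WORKER: wq_worker_running() */
}

Both notifiers now run without the rq lock held, which is the whole point
of the patch.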
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20110622174919.135236139@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/sched/core.c | 70 ++++++++++------------------------------------
kernel/workqueue.c | 63 +++++++++++++++++++----------------------
kernel/workqueue_sched.h | 5 ++--
3 files changed, 46 insertions(+), 92 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 505e08f..d7c1471 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1321,10 +1321,6 @@ static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
{
activate_task(rq, p, en_flags);
p->on_rq = 1;
-
- /* if a worker is waking up, notify workqueue */
- if (p->flags & PF_WQ_WORKER)
- wq_worker_waking_up(p, cpu_of(rq));
}
/*
@@ -1551,42 +1547,6 @@ out:
}
/**
- * try_to_wake_up_local - try to wake up a local task with rq lock held
- * @p: the thread to be awakened
- *
- * Put @p on the run-queue if it's not already there. The caller must
- * ensure that this_rq() is locked, @p is bound to this_rq() and not
- * the current task.
- */
-static void try_to_wake_up_local(struct task_struct *p)
-{
- struct rq *rq = task_rq(p);
-
- if (WARN_ON_ONCE(rq != this_rq()) ||
- WARN_ON_ONCE(p == current))
- return;
-
- lockdep_assert_held(&rq->lock);
-
- if (!raw_spin_trylock(&p->pi_lock)) {
- raw_spin_unlock(&rq->lock);
- raw_spin_lock(&p->pi_lock);
- raw_spin_lock(&rq->lock);
- }
-
- if (!(p->state & TASK_NORMAL))
- goto out;
-
- if (!p->on_rq)
- ttwu_activate(rq, p, ENQUEUE_WAKEUP);
-
- ttwu_do_wakeup(rq, p, 0);
- ttwu_stat(p, smp_processor_id(), 0);
-out:
- raw_spin_unlock(&p->pi_lock);
-}
-
-/**
* wake_up_process - Wake up a specific process
* @p: The process to be woken up.
*
@@ -3139,21 +3099,6 @@ need_resched:
} else {
deactivate_task(rq, prev, DEQUEUE_SLEEP);
prev->on_rq = 0;
-
- /*
- * If a worker went to sleep, notify and ask workqueue
- * whether it wants to wake up a task to maintain
- * concurrency.
- * Only call wake up if prev isn't blocked on a sleeping
- * spin lock.
- */
- if (prev->flags & PF_WQ_WORKER && !prev->saved_state) {
- struct task_struct *to_wakeup;
-
- to_wakeup = wq_worker_sleeping(prev, cpu);
- if (to_wakeup)
- try_to_wake_up_local(to_wakeup);
- }
}
switch_count = &prev->nvcsw;
}
@@ -3197,6 +3142,14 @@ static inline void sched_submit_work(struct task_struct *tsk)
{
if (!tsk->state || tsk_is_pi_blocked(tsk))
return;
+
+ /*
+ * If a worker went to sleep, notify and ask workqueue whether
+ * it wants to wake up a task to maintain concurrency.
+ */
+ if (tsk->flags & PF_WQ_WORKER)
+ wq_worker_sleeping(tsk);
+
/*
* If we are going to sleep and we have plugged IO queued,
* make sure to submit it to avoid deadlocks.
@@ -3205,12 +3158,19 @@ static inline void sched_submit_work(struct task_struct *tsk)
blk_schedule_flush_plug(tsk);
}
+static inline void sched_update_worker(struct task_struct *tsk)
+{
+ if (tsk->flags & PF_WQ_WORKER)
+ wq_worker_running(tsk);
+}
+
asmlinkage void __sched schedule(void)
{
struct task_struct *tsk = current;
sched_submit_work(tsk);
__schedule();
+ sched_update_worker(tsk);
}
EXPORT_SYMBOL(schedule);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 99855b3..788b017 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -149,6 +149,7 @@ struct worker {
unsigned long last_active; /* L: last active timestamp */
unsigned int flags; /* X: flags */
int id; /* I: worker id */
+ int sleeping; /* None */
/* for rebinding worker to CPU */
struct work_struct rebind_work; /* L: for busy worker */
@@ -730,54 +731,45 @@ static void wake_up_worker(struct worker_pool *pool)
}
/**
- * wq_worker_waking_up - a worker is waking up
- * @task: task waking up
- * @cpu: CPU @task is waking up to
+ * wq_worker_running - a worker is running again
+ * @task: task returning from sleep
*
- * This function is called during try_to_wake_up() when a worker is
- * being awoken.
- *
- * CONTEXT:
- * spin_lock_irq(rq->lock)
+ * This function is called when a worker returns from schedule()
*/
-void wq_worker_waking_up(struct task_struct *task, unsigned int cpu)
+void wq_worker_running(struct task_struct *task)
{
struct worker *worker = kthread_data(task);
- if (!(worker->flags & WORKER_NOT_RUNNING)) {
- WARN_ON_ONCE(worker->pool->gcwq->cpu != cpu);
+ if (!worker->sleeping)
+ return;
+ if (!(worker->flags & WORKER_NOT_RUNNING))
atomic_inc(get_pool_nr_running(worker->pool));
- }
+ worker->sleeping = 0;
}
/**
* wq_worker_sleeping - a worker is going to sleep
* @task: task going to sleep
- * @cpu: CPU in question, must be the current CPU number
- *
- * This function is called during schedule() when a busy worker is
- * going to sleep. Worker on the same cpu can be woken up by
- * returning pointer to its task.
- *
- * CONTEXT:
- * spin_lock_irq(rq->lock)
- *
- * RETURNS:
- * Worker task on @cpu to wake up, %NULL if none.
+ * This function is called from schedule() when a busy worker is
+ * going to sleep.
*/
-struct task_struct *wq_worker_sleeping(struct task_struct *task,
- unsigned int cpu)
+void wq_worker_sleeping(struct task_struct *task)
{
- struct worker *worker = kthread_data(task), *to_wakeup = NULL;
- struct worker_pool *pool = worker->pool;
- atomic_t *nr_running = get_pool_nr_running(pool);
+ struct worker *next, *worker = kthread_data(task);
+ struct worker_pool *pool;
+ atomic_t *nr_running;
if (worker->flags & WORKER_NOT_RUNNING)
- return NULL;
+ return;
+
+ if (WARN_ON_ONCE(worker->sleeping))
+ return;
- /* this can only happen on the local cpu */
- BUG_ON(cpu != raw_smp_processor_id());
+ pool = worker->pool;
+ nr_running = get_pool_nr_running(pool);
+ worker->sleeping = 1;
+ spin_lock_irq(&pool->gcwq->lock);
/*
* The counterpart of the following dec_and_test, implied mb,
* worklist not empty test sequence is in insert_work().
@@ -789,9 +781,12 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task,
* manipulating idle_list, so dereferencing idle_list without gcwq
* lock is safe.
*/
- if (atomic_dec_and_test(nr_running) && !list_empty(&pool->worklist))
- to_wakeup = first_worker(pool);
- return to_wakeup ? to_wakeup->task : NULL;
+ if (atomic_dec_and_test(nr_running) && !list_empty(&pool->worklist)) {
+ next = first_worker(pool);
+ if (next)
+ wake_up_process(next->task);
+ }
+ spin_unlock_irq(&pool->gcwq->lock);
}
/**
diff --git a/kernel/workqueue_sched.h b/kernel/workqueue_sched.h
index 2d10fc9..3bf73e2 100644
--- a/kernel/workqueue_sched.h
+++ b/kernel/workqueue_sched.h
@@ -4,6 +4,5 @@
* Scheduler hooks for concurrency managed workqueue. Only to be
* included from sched.c and workqueue.c.
*/
-void wq_worker_waking_up(struct task_struct *task, unsigned int cpu);
-struct task_struct *wq_worker_sleeping(struct task_struct *task,
- unsigned int cpu);
+void wq_worker_running(struct task_struct *task);
+void wq_worker_sleeping(struct task_struct *task);
--
1.7.10.4
* [PATCH RT 02/10] rt,ntp: Move call to schedule_delayed_work() to helper thread
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur, stable-rt
From: Steven Rostedt <rostedt@goodmis.org>
The ntp code for notify_cmos_timer() is called from hard interrupt
context. Under PREEMPT_RT_FULL, schedule_delayed_work() takes spinlocks
that have been converted to sleeping mutexes, so calling
schedule_delayed_work() from interrupt context is not safe.
Add a helper thread that does the call to schedule_delayed_work(), and
wake that thread up instead of calling schedule_delayed_work() directly.
This is only for CONFIG_PREEMPT_RT_FULL, otherwise the code still calls
schedule_delayed_work() directly in irq context.
Note: there are a few places in the kernel that do this. Perhaps the RT
code should have a dedicated thread that does such checks: just register
a notifier on boot up for your check and wake up the thread when
needed. This remains a todo; a rough sketch of the idea follows.
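A purely hypothetical sketch of that generalized helper; none of these
names exist in the tree, and registration is assumed to happen once at
boot so the list is never modified afterwards:

struct rt_deferred_check {
	struct list_head list;
	void (*func)(void);	/* the deferred check, runs in the helper */
	bool pending;
};

static LIST_HEAD(rt_check_list);
static struct task_struct *rt_check_thread;

/* List is populated once at boot, so iteration needs no lock here. */
static int rt_check_fn(void *unused)
{
	struct rt_deferred_check *c;

	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		list_for_each_entry(c, &rt_check_list, list) {
			if (c->pending) {
				c->pending = false;
				c->func();
			}
		}
		schedule();
	}
	__set_current_state(TASK_RUNNING);
	return 0;
}

/* Callable from hard irq context: set a flag and kick the thread. */
static void rt_check_raise(struct rt_deferred_check *c)
{
	c->pending = true;
	smp_wmb();	/* make the flag visible before the wakeup */
	wake_up_process(rt_check_thread);
}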
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/time/ntp.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index bb1edfa..3e93516 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -10,6 +10,7 @@
#include <linux/workqueue.h>
#include <linux/hrtimer.h>
#include <linux/jiffies.h>
+#include <linux/kthread.h>
#include <linux/math64.h>
#include <linux/timex.h>
#include <linux/time.h>
@@ -529,10 +530,49 @@ static void sync_cmos_clock(struct work_struct *work)
schedule_delayed_work(&sync_cmos_work, timespec_to_jiffies(&next));
}
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * RT can not call schedule_delayed_work from real interrupt context.
+ * Need to make a thread to do the real work.
+ */
+static struct task_struct *cmos_delay_thread;
+static bool do_cmos_delay;
+
+static int run_cmos_delay(void *ignore)
+{
+ while (!kthread_should_stop()) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (do_cmos_delay) {
+ do_cmos_delay = false;
+ schedule_delayed_work(&sync_cmos_work, 0);
+ }
+ schedule();
+ }
+ __set_current_state(TASK_RUNNING);
+ return 0;
+}
+
+static void notify_cmos_timer(void)
+{
+ do_cmos_delay = true;
+ /* Make visible before waking up process */
+ smp_wmb();
+ wake_up_process(cmos_delay_thread);
+}
+
+static __init int create_cmos_delay_thread(void)
+{
+ cmos_delay_thread = kthread_run(run_cmos_delay, NULL, "kcmosdelayd");
+ BUG_ON(!cmos_delay_thread);
+ return 0;
+}
+early_initcall(create_cmos_delay_thread);
+#else
static void notify_cmos_timer(void)
{
schedule_delayed_work(&sync_cmos_work, 0);
}
+#endif /* CONFIG_PREEMPT_RT_FULL */
#else
static inline void notify_cmos_timer(void) { }
--
1.7.10.4
* [PATCH RT 03/10] hpsa: fix warning with smp_processor_id() in preemptible section
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur, stable-rt, Stephen
From: John Kacur <jkacur@redhat.com>
On a 3.6-rt (real-time patch) kernel we are seeing the following BUG.
However, it appears to be relevant for non-realtime (mainline) kernels as well.
|hpsa 0000:03:00.0: hpsa0: <0x323a> at IRQ 67 using DAC
|scsi0 : hpsa
|BUG: using smp_processor_id() in preemptible [0000000000000000] code: kworker/u:0/6
|caller is enqueue_cmd_and_start_io+0x5a/0x100 [hpsa]
|Pid: 6, comm: kworker/u:0 Not tainted 3.6.11.5-rt37.52.el6rt.x86_64.debug #1
|Call Trace:
| [<ffffffff812abe83>] debug_smp_processor_id+0x123/0x150
| [<ffffffffa009043a>] enqueue_cmd_and_start_io+0x5a/0x100 [hpsa]
| [<ffffffffa00905cb>] hpsa_scsi_do_simple_cmd_core+0xeb/0x110 [hpsa]
| [<ffffffff812b09c8>] ? swiotlb_dma_mapping_error+0x18/0x30
| [<ffffffff812b09c8>] ? swiotlb_dma_mapping_error+0x18/0x30
| [<ffffffffa0090701>] hpsa_scsi_do_simple_cmd_with_retry+0x91/0x280 [hpsa]
| [<ffffffffa0093558>] hpsa_scsi_do_report_luns.clone.2+0xd8/0x130 [hpsa]
| [<ffffffffa00935ea>] hpsa_gather_lun_info.clone.3+0x3a/0x1a0 [hpsa]
| [<ffffffffa00963df>] hpsa_update_scsi_devices+0x11f/0x4f0 [hpsa]
| [<ffffffff81592019>] ? sub_preempt_count+0xa9/0xe0
| [<ffffffffa00968ad>] hpsa_scan_start+0xfd/0x150 [hpsa]
| [<ffffffff8158cba8>] ? rt_spin_lock_slowunlock+0x78/0x90
| [<ffffffff813b04d7>] do_scsi_scan_host+0x37/0xa0
| [<ffffffff813b05da>] do_scan_async+0x1a/0x30
| [<ffffffff8107c4ab>] async_run_entry_fn+0x9b/0x1d0
| [<ffffffff8106ae92>] process_one_work+0x1f2/0x620
| [<ffffffff8106ae20>] ? process_one_work+0x180/0x620
| [<ffffffff8106d4fe>] ? worker_thread+0x5e/0x3a0
| [<ffffffff8107c410>] ? async_schedule+0x20/0x20
| [<ffffffff8106d5d3>] worker_thread+0x133/0x3a0
| [<ffffffff8106d4a0>] ? manage_workers+0x190/0x190
| [<ffffffff81073236>] kthread+0xa6/0xb0
| [<ffffffff815970a4>] kernel_thread_helper+0x4/0x10
| [<ffffffff81082a7c>] ? finish_task_switch+0x8c/0x110
| [<ffffffff8158e44b>] ? _raw_spin_unlock_irq+0x3b/0x70
| [<ffffffff8158e85d>] ? retint_restore_args+0xe/0xe
| [<ffffffff81073190>] ? kthreadd+0x1e0/0x1e0
| [<ffffffff815970a0>] ? gs_change+0xb/0xb
-------------------
This is caused by the call chain
enqueue_cmd_and_start_io() ->
set_performant_mode() ->
smp_processor_id()
which, if you have debugging enabled, goes through debug_smp_processor_id() and triggers the warning.
The code here is
c->Header.ReplyQueue = smp_processor_id() % h->nreply_queues;
It is not critical that the code complete on the same processor: the cpu
number is only a hint used in generating the ReplyQueue and still works if
the cpu migrates or is preempted. It is therefore safe to use
raw_smp_processor_id() to suppress the false-positive warning.
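Restated as a hypothetical helper (pick_reply_queue() is an invented
name; only the raw_smp_processor_id() call is from the actual patch):

static u32 pick_reply_queue(struct ctlr_info *h)
{
	/*
	 * smp_processor_id() asserts that the caller cannot migrate and,
	 * with CONFIG_DEBUG_PREEMPT, warns when called from preemptible
	 * context. raw_smp_processor_id() skips that check, which is
	 * fine here: a stale CPU number only changes which reply queue
	 * gets picked, never correctness.
	 */
	return raw_smp_processor_id() % h->nreply_queues;
}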
Cc: stable-rt@vger.kernel.org
Signed-off-by: John Kacur <jkacur@redhat.com>
Acked-by: Stephen <scameron@beardog.cce.hp.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
drivers/scsi/hpsa.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 4f33806..80a44b1 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -583,7 +583,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c)
c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
if (likely(h->msix_vector))
c->Header.ReplyQueue =
- smp_processor_id() % h->nreply_queues;
+ raw_smp_processor_id() % h->nreply_queues;
}
}
--
1.7.10.4
* [PATCH RT 04/10] powerpc: 52xx: provide a default in mpc52xx_irqhost_map()
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur, stable-rt
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
My gcc-4.3.5 fails to compile due to:
|cc1: warnings being treated as errors
|arch/powerpc/platforms/52xx/mpc52xx_pic.c: In function 'mpc52xx_irqhost_map':
|arch/powerpc/platforms/52xx/mpc52xx_pic.c:343: error: 'irqchip' may be used uninitialized in this function
since commit e34298c ("powerpc: 52xx: nop out unsupported critical
IRQs"). This warning is complete crap since only values 0â¦3 are possible
which are checked but gcc fails to understand that. I wouldn't care much
but since this is compiled with -Werror I made this patch.
While add it, I replaced the warning from l2irq to l1irq since this is
the number that is evaluated.
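An illustrative reduction of the gcc behaviour, not taken from the
kernel tree:

struct irq_chip;
extern struct irq_chip a_chip, b_chip;

static struct irq_chip *pick_chip(int level)	/* callers only pass 0 or 1 */
{
	struct irq_chip *chip;			/* deliberately uninitialized */

	switch (level) {
	case 0: chip = &a_chip; break;
	case 1: chip = &b_chip; break;
	/*
	 * Without a default label gcc cannot prove that every path
	 * assigns chip, so "may be used uninitialized" fires even
	 * though the callers never pass another value.
	 */
	}
	return chip;
}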
Cc: stable-rt@vger.kernel.org # 3.8
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
arch/powerpc/platforms/52xx/mpc52xx_pic.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/52xx/mpc52xx_pic.c b/arch/powerpc/platforms/52xx/mpc52xx_pic.c
index b89ef65..b69221b 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_pic.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_pic.c
@@ -373,8 +373,9 @@ static int mpc52xx_irqhost_map(struct irq_domain *h, unsigned int virq,
case MPC52xx_IRQ_L1_PERP: irqchip = &mpc52xx_periph_irqchip; break;
case MPC52xx_IRQ_L1_SDMA: irqchip = &mpc52xx_sdma_irqchip; break;
case MPC52xx_IRQ_L1_CRIT:
+ default:
pr_warn("%s: Critical IRQ #%d is unsupported! Nopping it.\n",
- __func__, l2irq);
+ __func__, l1irq);
irq_set_chip(virq, &no_irq_chip);
return 0;
}
--
1.7.10.4
* [PATCH RT 05/10] hwlat-detector: Update hwlat_detector to add outer loop detection
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur, stable-rt, Steven Rostedt
From: Steven Rostedt <rostedt@goodmis.org>
The hwlat_detector reads two timestamps in a row, then reports any
gap between those calls. The problem is that it misses everything from
the second timestamp read to the first timestamp read of the next loop
iteration. That is where most of the time is spent, which means chances
are it will miss most hardware latencies. This defeats the purpose.
By also testing the delta between the previous iteration's second
timestamp and the current iteration's first timestamp (the outer loop),
we are far more likely to find a latency.
Setting the threshold to 1, here's what the report now looks like:
1347415723.0232202770 0 2
1347415725.0234202822 0 2
1347415727.0236202875 0 2
1347415729.0238202928 0 2
1347415731.0240202980 0 2
1347415734.0243203061 0 2
1347415736.0245203113 0 2
1347415738.0247203166 2 0
1347415740.0249203219 0 3
1347415742.0251203272 0 3
1347415743.0252203299 0 3
1347415745.0254203351 0 2
1347415747.0256203404 0 2
1347415749.0258203457 0 2
1347415751.0260203510 0 2
1347415754.0263203589 0 2
1347415756.0265203642 0 2
1347415758.0267203695 0 2
1347415760.0269203748 0 2
1347415762.0271203801 0 2
1347415764.0273203853 2 0
There is some hardware latency here that lasts 2 microseconds.
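The structure of the sampling loop after this patch, in condensed
pseudo-C (us() and max() are shorthand here; see the diff below for the
real code):

	do {
		t1 = ktime_get();	/* inner pair, read back to back */
		t2 = ktime_get();

		if (have_last_t2)	/* outer gap: spans the loop body */
			outer_sample = max(outer_sample, us(t1 - last_t2));
		last_t2 = t2;

		sample = max(sample, us(t2 - t1));	/* inner gap */
	} while (us(t2 - start) <= sample_width);
	/* report if either sample or outer_sample exceeds the threshold */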
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
drivers/misc/hwlat_detector.c | 32 ++++++++++++++++++++++++++------
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/misc/hwlat_detector.c b/drivers/misc/hwlat_detector.c
index b7b7c90..f93b8ef 100644
--- a/drivers/misc/hwlat_detector.c
+++ b/drivers/misc/hwlat_detector.c
@@ -143,6 +143,7 @@ static void detector_exit(void);
struct sample {
u64 seqnum; /* unique sequence */
u64 duration; /* ktime delta */
+ u64 outer_duration; /* ktime delta (outer loop) */
struct timespec timestamp; /* wall time */
unsigned long lost;
};
@@ -219,11 +220,13 @@ static struct sample *buffer_get_sample(struct sample *sample)
*/
static int get_sample(void *unused)
{
- ktime_t start, t1, t2;
+ ktime_t start, t1, t2, last_t2;
s64 diff, total = 0;
u64 sample = 0;
+ u64 outer_sample = 0;
int ret = 1;
+ last_t2.tv64 = 0;
start = ktime_get(); /* start timestamp */
do {
@@ -231,7 +234,22 @@ static int get_sample(void *unused)
t1 = ktime_get(); /* we'll look for a discontinuity */
t2 = ktime_get();
+ if (last_t2.tv64) {
+ /* Check the delta from the outer loop (t2 to next t1) */
+ diff = ktime_to_us(ktime_sub(t1, last_t2));
+ /* This shouldn't happen */
+ if (diff < 0) {
+ printk(KERN_ERR BANNER "time running backwards\n");
+ goto out;
+ }
+ if (diff > outer_sample)
+ outer_sample = diff;
+ }
+ last_t2 = t2;
+
total = ktime_to_us(ktime_sub(t2, start)); /* sample width */
+
+ /* This checks the inner loop (t1 to t2) */
diff = ktime_to_us(ktime_sub(t2, t1)); /* current diff */
/* This shouldn't happen */
@@ -246,12 +264,13 @@ static int get_sample(void *unused)
} while (total <= data.sample_width);
/* If we exceed the threshold value, we have found a hardware latency */
- if (sample > data.threshold) {
+ if (sample > data.threshold || outer_sample > data.threshold) {
struct sample s;
data.count++;
s.seqnum = data.count;
s.duration = sample;
+ s.outer_duration = outer_sample;
s.timestamp = CURRENT_TIME;
__buffer_add_sample(&s);
@@ -738,10 +757,11 @@ static ssize_t debug_sample_fread(struct file *filp, char __user *ubuf,
}
}
- len = snprintf(buf, sizeof(buf), "%010lu.%010lu\t%llu\n",
- sample->timestamp.tv_sec,
- sample->timestamp.tv_nsec,
- sample->duration);
+ len = snprintf(buf, sizeof(buf), "%010lu.%010lu\t%llu\t%llu\n",
+ sample->timestamp.tv_sec,
+ sample->timestamp.tv_nsec,
+ sample->duration,
+ sample->outer_duration);
/* handling partial reads is more trouble than it's worth */
--
1.7.10.4
* [PATCH RT 06/10] hwlat-detect/trace: Export trace_clock_local for hwlat-detector
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
The hwlat-detector needs a better clock than just ktime_get() as that
can induce its own latencies. The trace clock is perfect for it, but
it needs to be exported for use by modules.
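With the export in place, any GPL module can call it directly; a
minimal hypothetical snippet (the helper name is invented):

#include <linux/trace_clock.h>

static u64 hypothetical_delta_ns(void)
{
	u64 t1 = trace_clock_local();	/* lockless per-CPU clock, in ns */
	u64 t2 = trace_clock_local();

	return t2 - t1;	/* meaningful only when read on the same CPU */
}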
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace_clock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 3947835..1bbb1b2 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -44,6 +44,7 @@ u64 notrace trace_clock_local(void)
return clock;
}
+EXPORT_SYMBOL_GPL(trace_clock_local);
/*
* trace_clock(): 'between' trace clock. Not completely serialized,
--
1.7.10.4
* [PATCH RT 07/10] hwlat-detector: Use trace_clock_local if available
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur, stable-rt, Steven Rostedt
From: Steven Rostedt <rostedt@goodmis.org>
As ktime_get() calls into the timekeeping code, which does a seqlock
read, it may be affected by other CPUs that touch that lock. To remove
this dependency, use trace_clock_local(), which is already exported
for module use. If CONFIG_TRACING is enabled, use that as the clock;
otherwise fall back to ktime_get().
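The cross-CPU dependency comes from the seqlock read loop inside the
timekeeping code. Schematically (lock and helper names are invented for
illustration):

static seqlock_t timekeeper_lock_example;
extern u64 compute_ns_example(void);

static u64 read_time_example(void)
{
	unsigned int seq;
	u64 ns;

	do {
		seq = read_seqbegin(&timekeeper_lock_example);
		ns = compute_ns_example();
	} while (read_seqretry(&timekeeper_lock_example, seq));

	/*
	 * A writer on another CPU (e.g. the timekeeping tick) forces
	 * this reader to spin and retry; the sampler would count that
	 * delay as if it were hardware-induced.
	 */
	return ns;
}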
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
drivers/misc/hwlat_detector.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/drivers/misc/hwlat_detector.c b/drivers/misc/hwlat_detector.c
index f93b8ef..13443e9 100644
--- a/drivers/misc/hwlat_detector.c
+++ b/drivers/misc/hwlat_detector.c
@@ -51,6 +51,7 @@
#include <linux/version.h>
#include <linux/delay.h>
#include <linux/slab.h>
+#include <linux/trace_clock.h>
#define BUF_SIZE_DEFAULT 262144UL /* 8K*(sizeof(entry)) */
#define BUF_FLAGS (RB_FL_OVERWRITE) /* no block on full */
@@ -211,6 +212,21 @@ static struct sample *buffer_get_sample(struct sample *sample)
return sample;
}
+#ifndef CONFIG_TRACING
+#define time_type ktime_t
+#define time_get() ktime_get()
+#define time_to_us(x) ktime_to_us(x)
+#define time_sub(a, b) ktime_sub(a, b)
+#define init_time(a, b) (a).tv64 = b
+#define time_u64(a) (a).tv64
+#else
+#define time_type u64
+#define time_get() trace_clock_local()
+#define time_to_us(x) div_u64(x, 1000)
+#define time_sub(a, b) ((a) - (b))
+#define init_time(a, b) a = b
+#define time_u64(a) a
+#endif
/**
* get_sample - sample the CPU TSC and look for likely hardware latencies
* @unused: This is not used but is a part of the stop_machine API
@@ -220,23 +236,23 @@ static struct sample *buffer_get_sample(struct sample *sample)
*/
static int get_sample(void *unused)
{
- ktime_t start, t1, t2, last_t2;
+ time_type start, t1, t2, last_t2;
s64 diff, total = 0;
u64 sample = 0;
u64 outer_sample = 0;
int ret = 1;
- last_t2.tv64 = 0;
- start = ktime_get(); /* start timestamp */
+ init_time(last_t2, 0);
+ start = time_get(); /* start timestamp */
do {
- t1 = ktime_get(); /* we'll look for a discontinuity */
- t2 = ktime_get();
+ t1 = time_get(); /* we'll look for a discontinuity */
+ t2 = time_get();
- if (last_t2.tv64) {
+ if (time_u64(last_t2)) {
/* Check the delta from the outer loop (t2 to next t1) */
- diff = ktime_to_us(ktime_sub(t1, last_t2));
+ diff = time_to_us(time_sub(t1, last_t2));
/* This shouldn't happen */
if (diff < 0) {
printk(KERN_ERR BANNER "time running backwards\n");
@@ -247,10 +263,10 @@ static int get_sample(void *unused)
}
last_t2 = t2;
- total = ktime_to_us(ktime_sub(t2, start)); /* sample width */
+ total = time_to_us(time_sub(t2, start)); /* sample width */
/* This checks the inner loop (t1 to t2) */
- diff = ktime_to_us(ktime_sub(t2, t1)); /* current diff */
+ diff = time_to_us(time_sub(t2, t1)); /* current diff */
/* This shouldn't happen */
if (diff < 0) {
--
1.7.10.4
* [PATCH RT 08/10] hwlat-detector: Use thread instead of stop machine
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur, stable-rt
From: Steven Rostedt <rostedt@goodmis.org>
There's no reason to use stop_machine() to search for hardware latency.
Simply disabling interrupts while running the loop is enough to detect
anything that sneaks in despite interrupts being off, and that is all
stop_machine() provided anyway.
Instead of using stop_machine(), just have the thread disable interrupts
while it checks for hardware latency.
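The heart of the change, condensed from the kthread_fn() hunk below:

	/* old: every other CPU is parked first, a latency event in itself */
	err = stop_machine(get_sample, NULL, NULL);

	/* new: only the sampling CPU is disturbed */
	local_irq_disable();
	ret = get_sample();
	local_irq_enable();
	if (ret > 0)
		wake_up(&data.wq);	/* wake up reader(s) */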
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
drivers/misc/hwlat_detector.c | 59 +++++++++++++++++------------------------
1 file changed, 25 insertions(+), 34 deletions(-)
diff --git a/drivers/misc/hwlat_detector.c b/drivers/misc/hwlat_detector.c
index 13443e9..6f61d5f 100644
--- a/drivers/misc/hwlat_detector.c
+++ b/drivers/misc/hwlat_detector.c
@@ -41,7 +41,6 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/ring_buffer.h>
-#include <linux/stop_machine.h>
#include <linux/time.h>
#include <linux/hrtimer.h>
#include <linux/kthread.h>
@@ -107,7 +106,6 @@ struct data; /* Global state */
/* Sampling functions */
static int __buffer_add_sample(struct sample *sample);
static struct sample *buffer_get_sample(struct sample *sample);
-static int get_sample(void *unused);
/* Threading and state */
static int kthread_fn(void *unused);
@@ -149,7 +147,7 @@ struct sample {
unsigned long lost;
};
-/* keep the global state somewhere. Mostly used under stop_machine. */
+/* keep the global state somewhere. */
static struct data {
struct mutex lock; /* protect changes */
@@ -172,7 +170,7 @@ static struct data {
* @sample: The new latency sample value
*
* This receives a new latency sample and records it in a global ring buffer.
- * No additional locking is used in this case - suited for stop_machine use.
+ * No additional locking is used in this case.
*/
static int __buffer_add_sample(struct sample *sample)
{
@@ -229,18 +227,17 @@ static struct sample *buffer_get_sample(struct sample *sample)
#endif
/**
* get_sample - sample the CPU TSC and look for likely hardware latencies
- * @unused: This is not used but is a part of the stop_machine API
*
* Used to repeatedly capture the CPU TSC (or similar), looking for potential
- * hardware-induced latency. Called under stop_machine, with data.lock held.
+ * hardware-induced latency. Called with interrupts disabled and with data.lock held.
*/
-static int get_sample(void *unused)
+static int get_sample(void)
{
time_type start, t1, t2, last_t2;
s64 diff, total = 0;
u64 sample = 0;
u64 outer_sample = 0;
- int ret = 1;
+ int ret = -1;
init_time(last_t2, 0);
start = time_get(); /* start timestamp */
@@ -279,10 +276,14 @@ static int get_sample(void *unused)
} while (total <= data.sample_width);
+ ret = 0;
+
/* If we exceed the threshold value, we have found a hardware latency */
if (sample > data.threshold || outer_sample > data.threshold) {
struct sample s;
+ ret = 1;
+
data.count++;
s.seqnum = data.count;
s.duration = sample;
@@ -295,7 +296,6 @@ static int get_sample(void *unused)
data.max_sample = sample;
}
- ret = 0;
out:
return ret;
}
@@ -305,32 +305,30 @@ out:
* @unused: A required part of the kthread API.
*
* Used to periodically sample the CPU TSC via a call to get_sample. We
- * use stop_machine, whith does (intentionally) introduce latency since we
+ * disable interrupts, which does (intentionally) introduce latency since we
* need to ensure nothing else might be running (and thus pre-empting).
* Obviously this should never be used in production environments.
*
- * stop_machine will schedule us typically only on CPU0 which is fine for
- * almost every real-world hardware latency situation - but we might later
- * generalize this if we find there are any actualy systems with alternate
- * SMI delivery or other non CPU0 hardware latencies.
+ * Currently this runs on whichever CPU it was scheduled on, but most
+ * real-world hardware latency situations occur across several CPUs.
+ * We might later generalize this if we find there are any actual
+ * systems with alternate SMI delivery or other hardware latencies.
*/
static int kthread_fn(void *unused)
{
- int err = 0;
- u64 interval = 0;
+ int ret;
+ u64 interval;
while (!kthread_should_stop()) {
mutex_lock(&data.lock);
- err = stop_machine(get_sample, unused, 0);
- if (err) {
- /* Houston, we have a problem */
- mutex_unlock(&data.lock);
- goto err_out;
- }
+ local_irq_disable();
+ ret = get_sample();
+ local_irq_enable();
- wake_up(&data.wq); /* wake up reader(s) */
+ if (ret > 0)
+ wake_up(&data.wq); /* wake up reader(s) */
interval = data.sample_window - data.sample_width;
do_div(interval, USEC_PER_MSEC); /* modifies interval value */
@@ -338,15 +336,10 @@ static int kthread_fn(void *unused)
mutex_unlock(&data.lock);
if (msleep_interruptible(interval))
- goto out;
+ break;
}
- goto out;
-err_out:
- printk(KERN_ERR BANNER "could not call stop_machine, disabling\n");
- enabled = 0;
-out:
- return err;
+ return 0;
}
/**
@@ -442,8 +435,7 @@ out:
* This function provides a generic read implementation for the global state
* "data" structure debugfs filesystem entries. It would be nice to use
* simple_attr_read directly, but we need to make sure that the data.lock
- * spinlock is held during the actual read (even though we likely won't ever
- * actually race here as the updater runs under a stop_machine context).
+ * is held during the actual read.
*/
static ssize_t simple_data_read(struct file *filp, char __user *ubuf,
size_t cnt, loff_t *ppos, const u64 *entry)
@@ -478,8 +470,7 @@ static ssize_t simple_data_read(struct file *filp, char __user *ubuf,
* This function provides a generic write implementation for the global state
* "data" structure debugfs filesystem entries. It would be nice to use
* simple_attr_write directly, but we need to make sure that the data.lock
- * spinlock is held during the actual write (even though we likely won't ever
- * actually race here as the updater runs under a stop_machine context).
+ * is held during the actual write.
*/
static ssize_t simple_data_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *ppos, u64 *entry)
--
1.7.10.4
* [PATCH RT 09/10] genirq: do not invoke the affinity callback via a workqueue
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur, stable-rt
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Joe Korty reported that __irq_set_affinity_locked() schedules a
workqueue while holding a raw lock, which results in a might_sleep()
warning.
This patch moves the invocation into process context so that we only
wake up a process while holding the lock.
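The resulting flow on PREEMPT_RT_FULL, condensed from the diff below:

/*
 *  __irq_set_affinity_locked()          helper kthread "affinity-cb"
 *  (raw desc->lock held, no sleeping)   (process context, may sleep)
 *
 *  kref_get(&notify->kref)
 *  add notify to affinity_list
 *  wake_up_process(helper)  --------->  pop notify from affinity_list
 *                                       _irq_affinity_notify(notify)
 */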
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
include/linux/interrupt.h | 1 +
kernel/irq/manage.c | 79 +++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 77 insertions(+), 3 deletions(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 11bdb1e..838d4bd 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -263,6 +263,7 @@ struct irq_affinity_notify {
unsigned int irq;
struct kref kref;
struct work_struct work;
+ struct list_head list;
void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
void (*release)(struct kref *ref);
};
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 81a28dd..cf5b5f7 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -163,6 +163,62 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
return ret;
}
+#ifdef CONFIG_PREEMPT_RT_FULL
+static void _irq_affinity_notify(struct irq_affinity_notify *notify);
+static struct task_struct *set_affinity_helper;
+static LIST_HEAD(affinity_list);
+static DEFINE_RAW_SPINLOCK(affinity_list_lock);
+
+static int set_affinity_thread(void *unused)
+{
+ while (1) {
+ struct irq_affinity_notify *notify;
+ int empty;
+
+ set_current_state(TASK_INTERRUPTIBLE);
+
+ raw_spin_lock_irq(&affinity_list_lock);
+ empty = list_empty(&affinity_list);
+ raw_spin_unlock_irq(&affinity_list_lock);
+
+ if (empty)
+ schedule();
+ if (kthread_should_stop())
+ break;
+ set_current_state(TASK_RUNNING);
+try_next:
+ notify = NULL;
+
+ raw_spin_lock_irq(&affinity_list_lock);
+ if (!list_empty(&affinity_list)) {
+ notify = list_first_entry(&affinity_list,
+ struct irq_affinity_notify, list);
+ list_del_init(&notify->list);
+ }
+ raw_spin_unlock_irq(&affinity_list_lock);
+
+ if (!notify)
+ continue;
+ _irq_affinity_notify(notify);
+ goto try_next;
+ }
+ return 0;
+}
+
+static void init_helper_thread(void)
+{
+ if (set_affinity_helper)
+ return;
+ set_affinity_helper = kthread_run(set_affinity_thread, NULL,
+ "affinity-cb");
+ WARN_ON(IS_ERR(set_affinity_helper));
+}
+#else
+
+static inline void init_helper_thread(void) { }
+
+#endif
+
int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
{
struct irq_chip *chip = irq_data_get_irq_chip(data);
@@ -181,7 +237,17 @@ int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
if (desc->affinity_notify) {
kref_get(&desc->affinity_notify->kref);
+
+#ifdef CONFIG_PREEMPT_RT_FULL
+ raw_spin_lock(&affinity_list_lock);
+ if (list_empty(&desc->affinity_notify->list))
+ list_add_tail(&affinity_list,
+ &desc->affinity_notify->list);
+ raw_spin_unlock(&affinity_list_lock);
+ wake_up_process(set_affinity_helper);
+#else
schedule_work(&desc->affinity_notify->work);
+#endif
}
irqd_set(data, IRQD_AFFINITY_SET);
@@ -222,10 +288,8 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
}
EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
-static void irq_affinity_notify(struct work_struct *work)
+static void _irq_affinity_notify(struct irq_affinity_notify *notify)
{
- struct irq_affinity_notify *notify =
- container_of(work, struct irq_affinity_notify, work);
struct irq_desc *desc = irq_to_desc(notify->irq);
cpumask_var_t cpumask;
unsigned long flags;
@@ -247,6 +311,13 @@ out:
kref_put(&notify->kref, notify->release);
}
+static void irq_affinity_notify(struct work_struct *work)
+{
+ struct irq_affinity_notify *notify =
+ container_of(work, struct irq_affinity_notify, work);
+ _irq_affinity_notify(notify);
+}
+
/**
* irq_set_affinity_notifier - control notification of IRQ affinity changes
* @irq: Interrupt for which to enable/disable notification
@@ -276,6 +347,8 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
notify->irq = irq;
kref_init(&notify->kref);
INIT_WORK(&notify->work, irq_affinity_notify);
+ INIT_LIST_HEAD(&notify->list);
+ init_helper_thread();
}
raw_spin_lock_irqsave(&desc->lock, flags);
--
1.7.10.4
* [PATCH RT 10/10] Linux 3.8.13-rt16-rc1
From: Steven Rostedt @ 2013-09-08 12:25 UTC
To: linux-kernel, linux-rt-users
Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
John Kacur
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
---
localversion-rt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/localversion-rt b/localversion-rt
index 18777ec..5d3eaf3 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt15
+-rt16-rc1
--
1.7.10.4