[PATCH RT 2/4] workqueue: Prevent deadlock/stall on RT

From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org,
	linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Carsten Emde <C.Emde@osadl.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	John Kacur <jkacur@redhat.com>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	Richard Weinberger <richard.weinberger@gmail.com>,
	stable-rt@vger.kernel.org
Subject: [PATCH RT 2/4] workqueue: Prevent deadlock/stall on RT
Date: Mon, 14 Jul 2014 16:06:36 -0400
Message-ID: <20140714200652.239622566@goodmis.org>
In-Reply-To: 20140714200634.590977709@goodmis.org

[-- Attachment #1: 0002-workqueue-Prevent-deadlock-stall-on-RT.patch --]
[-- Type: text/plain, Size: 5412 bytes --]

3.2.60-rt89-rc1 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

Austin reported an XFS deadlock/stall on RT where scheduled work never
gets executed and tasks wait for each other forever.

The underlying problem is the RT modification to the handling of
workers which are about to go to sleep. In mainline, a worker thread
which goes to sleep wakes an idle worker if there is more work to do.
This happens from the guts of the schedule() function. On RT this must
happen outside of the scheduler, and the accessed data structures are
not protected against scheduling due to the spinlock to rtmutex
conversion. So the first solution was to move the code outside of the
scheduler and protect the data structures with the pool lock. That
approach turned out to be a little naive: we cannot call into that
code when the thread blocks on a lock, as a thread is not allowed to
block on two locks in parallel. So we don't call into the worker
wakeup magic when the worker is blocked on a lock, which causes the
deadlock/stall observed by Austin and Mike.

Looking deeper into that worker code it turns out that the only
relevant data structure which needs to be protected is the list of
idle workers which can be woken up.

So the solution is to protect the list manipulation operations with
preempt_disable()/preempt_enable() pairs on RT and to call
unconditionally into the worker code, even when the worker is blocked
on a lock. The preemption protection is safe as nothing can fiddle
with the list outside of thread context.

Reported-and-tested-by: Austin Schuh <austin@peloton-tech.com>
Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://vger.kernel.org/r/alpine.DEB.2.10.1406271249510.5170@nanos
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/sched.c     | 10 +++++-----
 kernel/workqueue.c | 41 +++++++++++++++++++++++++++++++++++------
 2 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index b0e67590ba62..7fb61d32d771 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4795,9 +4795,8 @@ need_resched:
 
 static inline void sched_submit_work(struct task_struct *tsk)
 {
-	if (!tsk->state || tsk_is_pi_blocked(tsk))
+	if (!tsk->state)
 		return;
-
 	/*
 	 * If a worker went to sleep, notify and ask workqueue whether
 	 * it wants to wake up a task to maintain concurrency.
@@ -4807,6 +4806,10 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	if (tsk->flags & PF_WQ_WORKER && !tsk->saved_state)
 		wq_worker_sleeping(tsk);
 
+
+	if (tsk_is_pi_blocked(tsk))
+		return;
+
 	/*
 	 * If we are going to sleep and we have plugged IO queued,
 	 * make sure to submit it to avoid deadlocks.
@@ -4817,9 +4820,6 @@ static inline void sched_submit_work(struct task_struct *tsk)
 
 static inline void sched_update_worker(struct task_struct *tsk)
 {
-	if (tsk_is_pi_blocked(tsk))
-		return;
-
 	if (tsk->flags & PF_WQ_WORKER)
 		wq_worker_running(tsk);
 }
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 62ebed73dfa2..505b50dff3b5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -319,6 +319,31 @@ static inline int __next_wq_cpu(int cpu, const struct cpumask *mask,
 	     (cpu) < WORK_CPU_NONE;					\
 	     (cpu) = __next_wq_cpu((cpu), cpu_possible_mask, (wq)))
 
+#ifdef CONFIG_PREEMPT_RT_BASE
+static inline void rt_lock_idle_list(struct global_cwq *gcwq)
+{
+	preempt_disable();
+}
+static inline void rt_unlock_idle_list(struct global_cwq *gcwq)
+{
+	preempt_enable();
+}
+static inline void sched_lock_idle_list(struct global_cwq *gcwq) { }
+static inline void sched_unlock_idle_list(struct global_cwq *gcwq) { }
+#else
+static inline void rt_lock_idle_list(struct global_cwq *gcwq) { }
+static inline void rt_unlock_idle_list(struct global_cwq *gcwq) { }
+static inline void sched_lock_idle_list(struct global_cwq *gcwq)
+{
+	spin_lock_irq(&gcwq->lock);
+}
+static inline void sched_unlock_idle_list(struct global_cwq *gcwq)
+{
+	spin_unlock_irq(&gcwq->lock);
+}
+#endif
+
+
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
 
 static struct debug_obj_descr work_debug_descr;
@@ -655,10 +680,16 @@ static struct worker *first_worker(struct global_cwq *gcwq)
  */
 static void wake_up_worker(struct global_cwq *gcwq)
 {
-	struct worker *worker = first_worker(gcwq);
+	struct worker *worker;
+
+	rt_lock_idle_list(gcwq);
+
+	worker = first_worker(gcwq);
 
 	if (likely(worker))
 		wake_up_process(worker->task);
+
+	rt_unlock_idle_list(gcwq);
 }
 
 /**
@@ -701,7 +732,6 @@ void wq_worker_sleeping(struct task_struct *task)
 
 	cpu = smp_processor_id();
 	gcwq = get_gcwq(cpu);
-	spin_lock_irq(&gcwq->lock);
 	/*
 	 * The counterpart of the following dec_and_test, implied mb,
 	 * worklist not empty test sequence is in insert_work().
@@ -709,11 +739,10 @@ void wq_worker_sleeping(struct task_struct *task)
 	 */
 	if (atomic_dec_and_test(get_gcwq_nr_running(cpu)) &&
 	    !list_empty(&gcwq->worklist)) {
-		worker = first_worker(gcwq);
-		if (worker)
-			wake_up_process(worker->task);
+		sched_lock_idle_list(gcwq);
+		wake_up_worker(gcwq);
+		sched_unlock_idle_list(gcwq);
 	}
-	spin_unlock_irq(&gcwq->lock);
 }
 
 /**
-- 
2.0.0
