[PATCH v2 1/2] workqueue: skip lockdep wq dependency in cancel_work_sync()
From: Johannes Berg @ 2018-08-22 9:49 UTC
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel, linux-wireless, Byungchul Park, Johannes Berg
From: Johannes Berg <johannes.berg@intel.com>
In cancel_work_sync(), we can only have one of two cases, even
with an ordered workqueue:
 * the work isn't running, just cancelled before it started
 * the work is running, but then nothing else can be on the
   workqueue before it
Thus, we need to skip the lockdep workqueue dependency handling;
otherwise we get false-positive reports from lockdep saying that
we have a potential deadlock when the workqueue also has other
work items that take locks, e.g.
  work1_function() { mutex_lock(&mutex); ... }
  work2_function() { /* nothing */ }

  other_function() {
          queue_work(ordered_wq, &work1);
          queue_work(ordered_wq, &work2);
          mutex_lock(&mutex);
          cancel_work_sync(&work2);
  }
As described above, this isn't a problem, but lockdep will
currently flag it as if cancel_work_sync() were flush_work(),
which *is* a problem.
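For reference, a minimal module sketch that reproduces the false
positive; the module setup and all identifiers here are
illustrative only, not part of this patch:

  #include <linux/module.h>
  #include <linux/workqueue.h>
  #include <linux/mutex.h>

  static DEFINE_MUTEX(mutex);
  static struct workqueue_struct *ordered_wq;

  static void work1_function(struct work_struct *work)
  {
          mutex_lock(&mutex);
          /* ... */
          mutex_unlock(&mutex);
  }

  static void work2_function(struct work_struct *work)
  {
          /* nothing */
  }

  static DECLARE_WORK(work1, work1_function);
  static DECLARE_WORK(work2, work2_function);

  static int __init repro_init(void)
  {
          ordered_wq = alloc_ordered_workqueue("repro", 0);
          if (!ordered_wq)
                  return -ENOMEM;

          queue_work(ordered_wq, &work1);
          queue_work(ordered_wq, &work2);

          mutex_lock(&mutex);
          /* safe, but lockdep reports a deadlock without this patch */
          cancel_work_sync(&work2);
          mutex_unlock(&mutex);
          return 0;
  }

  static void __exit repro_exit(void)
  {
          destroy_workqueue(ordered_wq);
  }

  module_init(repro_init);
  module_exit(repro_exit);
  MODULE_LICENSE("GPL");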
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
kernel/workqueue.c | 37 ++++++++++++++++++++++---------------
1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78b192071ef7..a6c2b823f348 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2843,7 +2843,8 @@ void drain_workqueue(struct workqueue_struct *wq)
}
EXPORT_SYMBOL_GPL(drain_workqueue);
-static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
+static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
+ bool from_cancel)
{
struct worker *worker = NULL;
struct worker_pool *pool;
@@ -2885,7 +2886,8 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
* workqueues the deadlock happens when the rescuer stalls, blocking
* forward progress.
*/
- if (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer) {
+ if (!from_cancel &&
+ (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer)) {
lock_map_acquire(&pwq->wq->lockdep_map);
lock_map_release(&pwq->wq->lockdep_map);
}
@@ -2896,6 +2898,22 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
return false;
}
+static bool __flush_work(struct work_struct *work, bool from_cancel)
+{
+ struct wq_barrier barr;
+
+ if (WARN_ON(!wq_online))
+ return false;
+
+ if (start_flush_work(work, &barr, from_cancel)) {
+ wait_for_completion(&barr.done);
+ destroy_work_on_stack(&barr.work);
+ return true;
+ } else {
+ return false;
+ }
+}
+
/**
* flush_work - wait for a work to finish executing the last queueing instance
* @work: the work to flush
@@ -2909,18 +2927,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
*/
bool flush_work(struct work_struct *work)
{
- struct wq_barrier barr;
-
- if (WARN_ON(!wq_online))
- return false;
-
- if (start_flush_work(work, &barr)) {
- wait_for_completion(&barr.done);
- destroy_work_on_stack(&barr.work);
- return true;
- } else {
- return false;
- }
+ return __flush_work(work, false);
}
EXPORT_SYMBOL_GPL(flush_work);
@@ -2986,7 +2993,7 @@ static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
* isn't executing.
*/
if (wq_online)
- flush_work(work);
+ __flush_work(work, true);
clear_work_data(work);
--
2.14.4
[PATCH v2 2/2] workqueue: re-add lockdep dependencies for flushing
From: Johannes Berg @ 2018-08-22 9:49 UTC
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel, linux-wireless, Byungchul Park, Johannes Berg
From: Johannes Berg <johannes.berg@intel.com>
In flush_work(), we need to create a lockdep dependency so that
the following scenario is appropriately tagged as a problem:
  work_function()
  {
          mutex_lock(&mutex);
          ...
  }

  other_function()
  {
          mutex_lock(&mutex);
          flush_work(&work); // or cancel_work_sync(&work);
  }
This is a problem since the work might be running and blocked on
trying to acquire the mutex, while other_function() holds that
same mutex and waits for the work to finish. The same applies to
flush_workqueue().
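As an illustration, a minimal sketch (hypothetical identifiers,
not part of this patch) of the pattern that lockdep should flag:

  #include <linux/module.h>
  #include <linux/workqueue.h>
  #include <linux/mutex.h>

  static DEFINE_MUTEX(mutex);

  static void work_function(struct work_struct *work)
  {
          mutex_lock(&mutex);
          /* ... */
          mutex_unlock(&mutex);
  }

  static DECLARE_WORK(work, work_function);

  static int __init repro_init(void)
  {
          schedule_work(&work);

          mutex_lock(&mutex);
          /*
           * If work_function() is already running and blocked on
           * the mutex we hold, this waits forever; the re-added
           * lockdep dependency catches this potential deadlock.
           */
          flush_work(&work);
          mutex_unlock(&mutex);
          return 0;
  }

  module_init(repro_init);
  MODULE_LICENSE("GPL");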
These dependencies were removed because cross-release partially
caught such problems, but cross-release has since been reverted.
IMHO the removal was erroneous in any case: lockdep should be
able to catch potential problems, not just actual ones, and
cross-release would only have caught the problem when
wait_for_completion() was actually invoked.
Fixes: fd1a5b04dfb8 ("workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
kernel/workqueue.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a6c2b823f348..60e80198c3df 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2652,6 +2652,9 @@ void flush_workqueue(struct workqueue_struct *wq)
if (WARN_ON(!wq_online))
return;
+ lock_map_acquire(&wq->lockdep_map);
+ lock_map_release(&wq->lockdep_map);
+
mutex_lock(&wq->mutex);
/*
@@ -2905,6 +2908,11 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
if (WARN_ON(!wq_online))
return false;
+ if (!from_cancel) {
+ lock_map_acquire(&work->lockdep_map);
+ lock_map_release(&work->lockdep_map);
+ }
+
if (start_flush_work(work, &barr, from_cancel)) {
wait_for_completion(&barr.done);
destroy_work_on_stack(&barr.work);
--
2.14.4