All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roland Dreier <rdreier-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
To: Bart Van Assche
	<bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
Subject: [NEW PATCH] IB/mad: Fix possible lock-lock-timer deadlock
Date: Mon, 07 Sep 2009 08:37:26 -0700	[thread overview]
Message-ID: <adaws4an4uh.fsf@cisco.com> (raw)

A new interface was added to the core workqueue API to make handling
cancel_delayed_work() deadlocks easier, so a simpler fix for bug 13757
as below becomes possible.  Bart, it would be great if you could retest
this, since it is what I am planning on sending upstream for 2.6.31.
(This patch depends on 4e49627b, "workqueues: introduce
__cancel_delayed_work()", which was merged for 2.6.31-rc9; alternatively
my for-next branch is now rebased on top of -rc9 and has this patch plus
everything else queued for 2.6.32).

Thanks,
  Roland


Lockdep reported a possible deadlock with cm_id_priv->lock,
mad_agent_priv->lock and mad_agent_priv->timed_work.timer; this
happens because the mad module does

	cancel_delayed_work(&mad_agent_priv->timed_work);

while holding mad_agent_priv->lock.  cancel_delayed_work() internally
does del_timer_sync(&mad_agent_priv->timed_work.timer).

This can turn into a deadlock because mad_agent_priv->lock is taken
inside cm_id_priv->lock, so we can get the following set of contexts
that deadlock each other:

 A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
 B: holding mad_agent_priv->lock, waiting for del_timer_sync()
 C: interrupt during mad_agent_priv->timed_work.timer that takes
    cm_id_priv->lock

Fix this by using the new __cancel_delayed_work() interface (which
internally does del_timer() instead of del_timer_sync()) in all the
places where we are holding a lock.

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Signed-off-by: Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index de922a0..bc30c00 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1974,7 +1974,7 @@ static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv)
 	unsigned long delay;
 
 	if (list_empty(&mad_agent_priv->wait_list)) {
-		cancel_delayed_work(&mad_agent_priv->timed_work);
+		__cancel_delayed_work(&mad_agent_priv->timed_work);
 	} else {
 		mad_send_wr = list_entry(mad_agent_priv->wait_list.next,
 					 struct ib_mad_send_wr_private,
@@ -1983,7 +1983,7 @@ static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv)
 		if (time_after(mad_agent_priv->timeout,
 			       mad_send_wr->timeout)) {
 			mad_agent_priv->timeout = mad_send_wr->timeout;
-			cancel_delayed_work(&mad_agent_priv->timed_work);
+			__cancel_delayed_work(&mad_agent_priv->timed_work);
 			delay = mad_send_wr->timeout - jiffies;
 			if ((long)delay <= 0)
 				delay = 1;
@@ -2023,7 +2023,7 @@ static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
 
 	/* Reschedule a work item if we have a shorter timeout */
 	if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list) {
-		cancel_delayed_work(&mad_agent_priv->timed_work);
+		__cancel_delayed_work(&mad_agent_priv->timed_work);
 		queue_delayed_work(mad_agent_priv->qp_info->port_priv->wq,
 				   &mad_agent_priv->timed_work, delay);
 	}
-- 
1.6.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

             reply	other threads:[~2009-09-07 15:37 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-07 15:37 Roland Dreier [this message]
     [not found] ` <adaws4an4uh.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2009-09-07 20:27   ` [NEW PATCH] IB/mad: Fix possible lock-lock-timer deadlock Bart Van Assche
     [not found]     ` <e2e108260909071327o7f521876s60d643b455e7c6ec-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-08  4:21       ` Roland Dreier
     [not found]         ` <adaskeym5gu.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2009-09-08  6:25           ` Bart Van Assche
2009-09-08 17:01             ` [ofa-general] " Bart Van Assche
     [not found]               ` <e2e108260909081001u5c31fcf0lca909c488831ec4b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-08 17:17                 ` Roland Dreier
     [not found]             ` <e2e108260909072325w2e0b4b1na0aa01a74f2341e4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-08 17:15               ` Roland Dreier
     [not found]                 ` <adabpllmk7c.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2009-09-08 19:09                   ` Bart Van Assche
     [not found]                     ` <e2e108260909081209t36bfef12m24ce000686ed116e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-09 20:42                       ` [PATCH/RFC] IB/mad: Fix lock-lock-timer deadlock in RMPP code (was: [NEW PATCH] IB/mad: Fix possible lock-lock-timer deadlock) Roland Dreier
     [not found]                         ` <adavdjrkfyq.fsf_-_-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2009-09-09 21:22                           ` Sean Hefty
     [not found]                             ` <F658AB9802E54F9887A2753721FA7882-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-09-09 21:34                               ` [PATCH/RFC] IB/mad: Fix lock-lock-timer deadlock in RMPP code Roland Dreier
2009-09-22 18:27                               ` Roland Dreier
     [not found]                                 ` <ada7hvq7s36.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2009-09-22 22:27                                   ` Sean Hefty
     [not found]                                     ` <9DA1536B0B4943E7BC52280C977F1D23-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-09-23 18:09                                       ` Roland Dreier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adaws4an4uh.fsf@cisco.com \
    --to=rdreier-fyb4gu1cfyuavxtiumwx3w@public.gmane.org \
    --cc=bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.