[PATCH 00/10] IPoIB locking fixes and a couple minor other patches

public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 00/10] IPoIB locking fixes and a couple minor other patches
@ 2014-12-10 16:46 Doug Ledford
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:46 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

As the locking fixes have missed two releases now, I'm resending them.
The other two items are minor (the pr_* function one at least makes
the system more usable by making output message point to the module
making all of the dmesg noise, the other is one I contend is the right
thing to do until we have a replacement for the cache).

These can also be pulled from my github repo as:

git://github.com/dledford/linux.git for-next

Bart Van Assche (1):
  IB/srp: Use P_Key cache for P_Key lookups

Doug Ledford (9):
  IPoIB: Consolidate rtnl_lock tasks in workqueue
  IPoIB: Make the carrier_on_task race aware
  IPoIB: fix MCAST_FLAG_BUSY usage
  IPoIB: fix mcast_dev_flush/mcast_restart_task race
  IPoIB: change init sequence ordering
  IPoIB: Use dedicated workqueues per interface
  IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
  IPoIB: No longer use flush as a parameter
  ib_srpt: convert printk's to pr_* functions

 drivers/infiniband/ulp/srp/ib_srp.c   |   9 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c | 188 ++++++++++++++++------------------
 2 files changed, 96 insertions(+), 101 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

[parent not found: <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* [PATCH 01/10] IPoIB: Consolidate rtnl_lock tasks in workqueue
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-12-10 16:46   ` Doug Ledford
       [not found]     ` <84fa20ea8c6763fee66b378adda1da52cd48e342.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:46   ` [PATCH 02/10] IPoIB: Make the carrier_on_task race aware Doug Ledford
                     ` (9 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:46 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

Setting the mtu can safely be moved to the carrier_on_task, which keeps
us from needing to take the rtnl lock in the join_finish section.

Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index ffb83b5f7e8..eee66d13e5b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -190,12 +190,6 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
 		spin_unlock_irq(&priv->lock);
 		priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
 		set_qkey = 1;
-
-		if (!ipoib_cm_admin_enabled(dev)) {
-			rtnl_lock();
-			dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
-			rtnl_unlock();
-		}
 	}
 
 	if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
@@ -371,6 +365,8 @@ void ipoib_mcast_carrier_on_task(struct work_struct *work)
 	}
 
 	rtnl_lock();
+	if (!ipoib_cm_admin_enabled(priv->dev))
+		dev_set_mtu(priv->dev, min(priv->mcast_mtu, priv->admin_mtu));
 	netif_carrier_on(priv->dev);
 	rtnl_unlock();
 }
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

[parent not found: <84fa20ea8c6763fee66b378adda1da52cd48e342.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* RE: [PATCH 01/10] IPoIB: Consolidate rtnl_lock tasks in workqueue
       [not found]     ` <84fa20ea8c6763fee66b378adda1da52cd48e342.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-12-12 14:02       ` Estrin, Alex
  0 siblings, 0 replies; 20+ messages in thread
From: Estrin, Alex @ 2014-12-12 14:02 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On
> Behalf Of Doug Ledford
> Sent: Wednesday, December 10, 2014 11:47 AM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: Doug Ledford
> Subject: [PATCH 01/10] IPoIB: Consolidate rtnl_lock tasks in workqueue
> 
> Setting the mtu can safely be moved to the carrier_on_task, which keeps
> us from needing to take the rtnl lock in the join_finish section.
> 
> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Acked-by: Alex Estrin <alex.estrin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index ffb83b5f7e8..eee66d13e5b 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -190,12 +190,6 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
>  		spin_unlock_irq(&priv->lock);
>  		priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
>  		set_qkey = 1;
> -
> -		if (!ipoib_cm_admin_enabled(dev)) {
> -			rtnl_lock();
> -			dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
> -			rtnl_unlock();
> -		}
>  	}
> 
>  	if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
> @@ -371,6 +365,8 @@ void ipoib_mcast_carrier_on_task(struct work_struct *work)
>  	}
> 
>  	rtnl_lock();
> +	if (!ipoib_cm_admin_enabled(priv->dev))
> +		dev_set_mtu(priv->dev, min(priv->mcast_mtu, priv->admin_mtu));
>  	netif_carrier_on(priv->dev);
>  	rtnl_unlock();
>  }
> --
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 02/10] IPoIB: Make the carrier_on_task race aware
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:46   ` [PATCH 01/10] IPoIB: Consolidate rtnl_lock tasks in workqueue Doug Ledford
@ 2014-12-10 16:46   ` Doug Ledford
       [not found]     ` <9da7866273a37e8927eb079d90c96074a118c5bd.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:47   ` [PATCH 03/10] IPoIB: fix MCAST_FLAG_BUSY usage Doug Ledford
                     ` (8 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:46 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

We blindly assume that we can just take the rtnl lock and that will
prevent races with downing this interface.  Unfortunately, that's not
the case.  In ipoib_mcast_stop_thread() we will call flush_workqueue()
in an attempt to clear out all remaining instances of ipoib_join_task.
But, since this task is put on the same workqueue as the join task, the
flush_workqueue waits on this thread too.  But this thread is deadlocked
on the rtnl lock.  The better thing here is to use trylock and loop on
that until we either get the lock or we see that FLAG_ADMIN_UP has
been cleared, in which case we don't need to do anything anyway and we
just return.

Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index eee66d13e5b..9862c76a83f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -353,18 +353,27 @@ void ipoib_mcast_carrier_on_task(struct work_struct *work)
 						   carrier_on_task);
 	struct ib_port_attr attr;
 
-	/*
-	 * Take rtnl_lock to avoid racing with ipoib_stop() and
-	 * turning the carrier back on while a device is being
-	 * removed.
-	 */
 	if (ib_query_port(priv->ca, priv->port, &attr) ||
 	    attr.state != IB_PORT_ACTIVE) {
 		ipoib_dbg(priv, "Keeping carrier off until IB port is active\n");
 		return;
 	}
 
-	rtnl_lock();
+	/*
+	 * Take rtnl_lock to avoid racing with ipoib_stop() and
+	 * turning the carrier back on while a device is being
+	 * removed.  However, ipoib_stop() will attempt to flush
+	 * the workqueue while holding the rtnl lock, so loop
+	 * on trylock until either we get the lock or we see
+	 * FLAG_ADMIN_UP go away as that signals that we are bailing
+	 * and can safely ignore the carrier on work
+	 */
+	while (!rtnl_trylock()) {
+		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+			return;
+		else
+			msleep(20);
+	}
 	if (!ipoib_cm_admin_enabled(priv->dev))
 		dev_set_mtu(priv->dev, min(priv->mcast_mtu, priv->admin_mtu));
 	netif_carrier_on(priv->dev);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

[parent not found: <9da7866273a37e8927eb079d90c96074a118c5bd.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* RE: [PATCH 02/10] IPoIB: Make the carrier_on_task race aware
       [not found]     ` <9da7866273a37e8927eb079d90c96074a118c5bd.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-12-12 14:02       ` Estrin, Alex
  0 siblings, 0 replies; 20+ messages in thread
From: Estrin, Alex @ 2014-12-12 14:02 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On
> Behalf Of Doug Ledford
> Sent: Wednesday, December 10, 2014 11:47 AM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: Doug Ledford
> Subject: [PATCH 02/10] IPoIB: Make the carrier_on_task race aware
> 
> We blindly assume that we can just take the rtnl lock and that will
> prevent races with downing this interface.  Unfortunately, that's not
> the case.  In ipoib_mcast_stop_thread() we will call flush_workqueue()
> in an attempt to clear out all remaining instances of ipoib_join_task.
> But, since this task is put on the same workqueue as the join task, the
> flush_workqueue waits on this thread too.  But this thread is deadlocked
> on the rtnl lock.  The better thing here is to use trylock and loop on
> that until we either get the lock or we see that FLAG_ADMIN_UP has
> been cleared, in which case we don't need to do anything anyway and we
> just return.
> 
> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Acked-by: Alex Estrin <alex.estrin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index eee66d13e5b..9862c76a83f 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -353,18 +353,27 @@ void ipoib_mcast_carrier_on_task(struct work_struct *work)
>  						   carrier_on_task);
>  	struct ib_port_attr attr;
> 
> -	/*
> -	 * Take rtnl_lock to avoid racing with ipoib_stop() and
> -	 * turning the carrier back on while a device is being
> -	 * removed.
> -	 */
>  	if (ib_query_port(priv->ca, priv->port, &attr) ||
>  	    attr.state != IB_PORT_ACTIVE) {
>  		ipoib_dbg(priv, "Keeping carrier off until IB port is active\n");
>  		return;
>  	}
> 
> -	rtnl_lock();
> +	/*
> +	 * Take rtnl_lock to avoid racing with ipoib_stop() and
> +	 * turning the carrier back on while a device is being
> +	 * removed.  However, ipoib_stop() will attempt to flush
> +	 * the workqueue while holding the rtnl lock, so loop
> +	 * on trylock until either we get the lock or we see
> +	 * FLAG_ADMIN_UP go away as that signals that we are bailing
> +	 * and can safely ignore the carrier on work
> +	 */
> +	while (!rtnl_trylock()) {
> +		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
> +			return;
> +		else
> +			msleep(20);
> +	}
>  	if (!ipoib_cm_admin_enabled(priv->dev))
>  		dev_set_mtu(priv->dev, min(priv->mcast_mtu, priv->admin_mtu));
>  	netif_carrier_on(priv->dev);
> --
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 03/10] IPoIB: fix MCAST_FLAG_BUSY usage
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:46   ` [PATCH 01/10] IPoIB: Consolidate rtnl_lock tasks in workqueue Doug Ledford
  2014-12-10 16:46   ` [PATCH 02/10] IPoIB: Make the carrier_on_task race aware Doug Ledford
@ 2014-12-10 16:47   ` Doug Ledford
       [not found]     ` <3287eaddaf7948ccf6f471daa3312e27649ef798.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:47   ` [PATCH 04/10] IPoIB: fix mcast_dev_flush/mcast_restart_task race Doug Ledford
                     ` (7 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:47 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

Commit a9c8ba5884 (IPoIB: Fix usage of uninitialized multicast objects)
added a new flag MCAST_JOIN_STARTED, but was not very strict in how it
was used.  We didn't always initialize the completion struct before we
set the flag, and we didn't always call complete on the completion
struct from all paths that complete it.  This made it less than totally
effective, and certainly made its use confusing.  And in the flush
function we would use the presence of this flag to signal that we should
wait on the completion struct, but we never cleared this flag, ever.
This is further muddied by the fact that we overload the MCAST_FLAG_BUSY
flag to mean two different things: we have a join in flight, and we have
succeeded in getting an ib_sa_join_multicast.

In order to make things clearer and aid in resolving the rtnl deadlock
bug I've been chasing, I cleaned this up a bit.

1) Remove the MCAST_JOIN_STARTED flag entirely
2) Un-overload MCAST_FLAG_BUSY so it now only means a join is in-flight
3) Test on mcast->mc directly to see if we have completed
ib_sa_join_multicast (using IS_ERR_OR_NULL)
4) Make sure that before setting MCAST_FLAG_BUSY we always initialize
the mcast->done completion struct
5) Make sure that before calling complete(&mcast->done), we always clear
the MCAST_FLAG_BUSY bit
6) Take the mcast_mutex before we call ib_sa_multicast_join and also
take the mutex in our join callback.  This forces ib_sa_multicast_join
to return and set mcast->mc before we process the callback.  This way,
our callback can safely clear mcast->mc if there is an error on the join
and we will do the right thing as a result in mcast_dev_flush.
7) Because we need the mutex to synchronize mcast->mc, we can no longer
call mcast_sendonly_join directly from mcast_send and instead must add
sendonly join processing to the mcast_join_task

A number of different races are resolved with these changes.  These
races existed with the old MCAST_FLAG_BUSY usage, the
MCAST_JOIN_STARTED flag was an attempt to address them, and while it
helped, a determined effort could still trip things up.

One race looks something like this:

Thread 1                             Thread 2
ib_sa_join_multicast (as part of running restart mcast task)
  alloc member
  call callback
                                     ifconfig ib0 down
				     wait_for_completion
    callback call completes
                                     wait_for_completion in
				     mcast_dev_flush completes
				       mcast->mc is PTR_ERR_OR_NULL
				       so we skip ib_sa_leave_multicast
    return from callback
  return from ib_sa_join_multicast
set mcast->mc = return from ib_sa_multicast

We now have a permanently unbalanced join/leave issue that trips up the
refcounting in core/multicast.c

Another like this:

Thread 1                   Thread 2         Thread 3
ib_sa_multicast_join
                                            ifconfig ib0 down
					    priv->broadcast = NULL
                           join_complete
			                    wait_for_completion
			   mcast->mc is not yet set, so don't clear
return from ib_sa_join_multicast and set mcast->mc
			   complete
			   return -EAGAIN (making mcast->mc invalid)
			   		    call ib_sa_multicast_leave
					    on invalid mcast->mc, hang
					    forever

By holding the mutex around ib_sa_multicast_join and taking the mutex
early in the callback, we force mcast->mc to be valid at the time we run
the callback.  This allows us to clear mcast->mc if there is an error
and the join is going to fail.  We do this before we complete the mcast.
In this way, mcast_dev_flush always sees consistent state in regards to
mcast->mc membership at the time that the wait_for_completion() returns.

Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |  10 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 148 ++++++++++++++++---------
 2 files changed, 101 insertions(+), 57 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index d7562beb542..f4c1b20b23b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -98,9 +98,15 @@ enum {
 
 	IPOIB_MCAST_FLAG_FOUND	  = 0,	/* used in set_multicast_list */
 	IPOIB_MCAST_FLAG_SENDONLY = 1,
-	IPOIB_MCAST_FLAG_BUSY	  = 2,	/* joining or already joined */
+	/*
+	 * For IPOIB_MCAST_FLAG_BUSY
+	 * When set, in flight join and mcast->mc is unreliable
+	 * When clear and mcast->mc IS_ERR_OR_NULL, need to restart or
+	 *   haven't started yet
+	 * When clear and mcast->mc is valid pointer, join was successful
+	 */
+	IPOIB_MCAST_FLAG_BUSY	  = 2,
 	IPOIB_MCAST_FLAG_ATTACHED = 3,
-	IPOIB_MCAST_JOIN_STARTED  = 4,
 
 	MAX_SEND_CQE		  = 16,
 	IPOIB_CM_COPYBREAK	  = 256,
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 9862c76a83f..a52c9f3f7e4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -271,16 +271,27 @@ ipoib_mcast_sendonly_join_complete(int status,
 	struct ipoib_mcast *mcast = multicast->context;
 	struct net_device *dev = mcast->dev;
 
+	/*
+	 * We have to take the mutex to force mcast_sendonly_join to
+	 * return from ib_sa_multicast_join and set mcast->mc to a
+	 * valid value.  Otherwise we were racing with ourselves in
+	 * that we might fail here, but get a valid return from
+	 * ib_sa_multicast_join after we had cleared mcast->mc here,
+	 * resulting in mis-matched joins and leaves and a deadlock
+	 */
+	mutex_lock(&mcast_mutex);
+
 	/* We trap for port events ourselves. */
 	if (status == -ENETRESET)
-		return 0;
+		goto out;
 
 	if (!status)
 		status = ipoib_mcast_join_finish(mcast, &multicast->rec);
 
 	if (status) {
 		if (mcast->logcount++ < 20)
-			ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for %pI6, status %d\n",
+			ipoib_dbg_mcast(netdev_priv(dev), "sendonly multicast "
+					"join failed for %pI6, status %d\n",
 					mcast->mcmember.mgid.raw, status);
 
 		/* Flush out any queued packets */
@@ -290,11 +301,15 @@ ipoib_mcast_sendonly_join_complete(int status,
 			dev_kfree_skb_any(skb_dequeue(&mcast->pkt_queue));
 		}
 		netif_tx_unlock_bh(dev);
-
-		/* Clear the busy flag so we try again */
-		status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY,
-					    &mcast->flags);
 	}
+out:
+	clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+	if (status)
+		mcast->mc = NULL;
+	complete(&mcast->done);
+	if (status == -ENETRESET)
+		status = 0;
+	mutex_unlock(&mcast_mutex);
 	return status;
 }
 
@@ -312,12 +327,14 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast)
 	int ret = 0;
 
 	if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags)) {
-		ipoib_dbg_mcast(priv, "device shutting down, no multicast joins\n");
+		ipoib_dbg_mcast(priv, "device shutting down, no sendonly "
+				"multicast joins\n");
 		return -ENODEV;
 	}
 
-	if (test_and_set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) {
-		ipoib_dbg_mcast(priv, "multicast entry busy, skipping\n");
+	if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) {
+		ipoib_dbg_mcast(priv, "multicast entry busy, skipping "
+				"sendonly join\n");
 		return -EBUSY;
 	}
 
@@ -325,6 +342,9 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast)
 	rec.port_gid = priv->local_gid;
 	rec.pkey     = cpu_to_be16(priv->pkey);
 
+	mutex_lock(&mcast_mutex);
+	init_completion(&mcast->done);
+	set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
 	mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca,
 					 priv->port, &rec,
 					 IB_SA_MCMEMBER_REC_MGID	|
@@ -337,12 +357,14 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast)
 	if (IS_ERR(mcast->mc)) {
 		ret = PTR_ERR(mcast->mc);
 		clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
-		ipoib_warn(priv, "ib_sa_join_multicast failed (ret = %d)\n",
-			   ret);
+		complete(&mcast->done);
+		ipoib_warn(priv, "ib_sa_join_multicast for sendonly join "
+			   "failed (ret = %d)\n", ret);
 	} else {
-		ipoib_dbg_mcast(priv, "no multicast record for %pI6, starting join\n",
-				mcast->mcmember.mgid.raw);
+		ipoib_dbg_mcast(priv, "no multicast record for %pI6, starting "
+				"sendonly join\n", mcast->mcmember.mgid.raw);
 	}
+	mutex_unlock(&mcast_mutex);
 
 	return ret;
 }
@@ -390,22 +412,28 @@ static int ipoib_mcast_join_complete(int status,
 	ipoib_dbg_mcast(priv, "join completion for %pI6 (status %d)\n",
 			mcast->mcmember.mgid.raw, status);
 
+	/*
+	 * We have to take the mutex to force mcast_join to
+	 * return from ib_sa_multicast_join and set mcast->mc to a
+	 * valid value.  Otherwise we were racing with ourselves in
+	 * that we might fail here, but get a valid return from
+	 * ib_sa_multicast_join after we had cleared mcast->mc here,
+	 * resulting in mis-matched joins and leaves and a deadlock
+	 */
+	mutex_lock(&mcast_mutex);
+
 	/* We trap for port events ourselves. */
-	if (status == -ENETRESET) {
-		status = 0;
+	if (status == -ENETRESET)
 		goto out;
-	}
 
 	if (!status)
 		status = ipoib_mcast_join_finish(mcast, &multicast->rec);
 
 	if (!status) {
 		mcast->backoff = 1;
-		mutex_lock(&mcast_mutex);
 		if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
 			queue_delayed_work(ipoib_workqueue,
 					   &priv->mcast_task, 0);
-		mutex_unlock(&mcast_mutex);
 
 		/*
 		 * Defer carrier on work to ipoib_workqueue to avoid a
@@ -413,37 +441,35 @@ static int ipoib_mcast_join_complete(int status,
 		 */
 		if (mcast == priv->broadcast)
 			queue_work(ipoib_workqueue, &priv->carrier_on_task);
-
-		status = 0;
-		goto out;
-	}
-
-	if (mcast->logcount++ < 20) {
-		if (status == -ETIMEDOUT || status == -EAGAIN) {
-			ipoib_dbg_mcast(priv, "multicast join failed for %pI6, status %d\n",
-					mcast->mcmember.mgid.raw, status);
-		} else {
-			ipoib_warn(priv, "multicast join failed for %pI6, status %d\n",
-				   mcast->mcmember.mgid.raw, status);
+	} else {
+		if (mcast->logcount++ < 20) {
+			if (status == -ETIMEDOUT || status == -EAGAIN) {
+				ipoib_dbg_mcast(priv, "multicast join failed for %pI6, status %d\n",
+						mcast->mcmember.mgid.raw, status);
+			} else {
+				ipoib_warn(priv, "multicast join failed for %pI6, status %d\n",
+					   mcast->mcmember.mgid.raw, status);
+			}
 		}
-	}
-
-	mcast->backoff *= 2;
-	if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
-		mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
-
-	/* Clear the busy flag so we try again */
-	status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
 
-	mutex_lock(&mcast_mutex);
+		mcast->backoff *= 2;
+		if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
+			mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
+	}
+out:
 	spin_lock_irq(&priv->lock);
-	if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
+	clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+	if (status)
+		mcast->mc = NULL;
+	complete(&mcast->done);
+	if (status == -ENETRESET)
+		status = 0;
+	if (status && test_bit(IPOIB_MCAST_RUN, &priv->flags))
 		queue_delayed_work(ipoib_workqueue, &priv->mcast_task,
 				   mcast->backoff * HZ);
 	spin_unlock_irq(&priv->lock);
 	mutex_unlock(&mcast_mutex);
-out:
-	complete(&mcast->done);
+
 	return status;
 }
 
@@ -492,10 +518,9 @@ static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast,
 		rec.hop_limit	  = priv->broadcast->mcmember.hop_limit;
 	}
 
-	set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+	mutex_lock(&mcast_mutex);
 	init_completion(&mcast->done);
-	set_bit(IPOIB_MCAST_JOIN_STARTED, &mcast->flags);
-
+	set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
 	mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port,
 					 &rec, comp_mask, GFP_KERNEL,
 					 ipoib_mcast_join_complete, mcast);
@@ -509,13 +534,12 @@ static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast,
 		if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
 			mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
 
-		mutex_lock(&mcast_mutex);
 		if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
 			queue_delayed_work(ipoib_workqueue,
 					   &priv->mcast_task,
 					   mcast->backoff * HZ);
-		mutex_unlock(&mcast_mutex);
 	}
+	mutex_unlock(&mcast_mutex);
 }
 
 void ipoib_mcast_join_task(struct work_struct *work)
@@ -568,7 +592,8 @@ void ipoib_mcast_join_task(struct work_struct *work)
 	}
 
 	if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
-		if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags))
+		if (IS_ERR_OR_NULL(priv->broadcast->mc) &&
+		    !test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags))
 			ipoib_mcast_join(dev, priv->broadcast, 0);
 		return;
 	}
@@ -576,23 +601,33 @@ void ipoib_mcast_join_task(struct work_struct *work)
 	while (1) {
 		struct ipoib_mcast *mcast = NULL;
 
+		/*
+		 * Need the mutex so our flags are consistent, need the
+		 * priv->lock so we don't race with list removals in either
+		 * mcast_dev_flush or mcast_restart_task
+		 */
+		mutex_lock(&mcast_mutex);
 		spin_lock_irq(&priv->lock);
 		list_for_each_entry(mcast, &priv->multicast_list, list) {
-			if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)
-			    && !test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)
-			    && !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
+			if (IS_ERR_OR_NULL(mcast->mc) &&
+			    !test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags) &&
+			    !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
 				/* Found the next unjoined group */
 				break;
 			}
 		}
 		spin_unlock_irq(&priv->lock);
+		mutex_unlock(&mcast_mutex);
 
 		if (&mcast->list == &priv->multicast_list) {
 			/* All done */
 			break;
 		}
 
-		ipoib_mcast_join(dev, mcast, 1);
+		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
+			ipoib_mcast_sendonly_join(mcast);
+		else
+			ipoib_mcast_join(dev, mcast, 1);
 		return;
 	}
 
@@ -638,6 +673,9 @@ static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast)
 	int ret = 0;
 
 	if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
+		ipoib_warn(priv, "ipoib_mcast_leave on an in-flight join\n");
+
+	if (!IS_ERR_OR_NULL(mcast->mc))
 		ib_sa_free_multicast(mcast->mc);
 
 	if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
@@ -690,6 +728,8 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
 		memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid));
 		__ipoib_mcast_add(dev, mcast);
 		list_add_tail(&mcast->list, &priv->multicast_list);
+		if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
+			queue_delayed_work(ipoib_workqueue, &priv->mcast_task, 0);
 	}
 
 	if (!mcast->ah) {
@@ -703,8 +743,6 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
 		if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
 			ipoib_dbg_mcast(priv, "no address vector, "
 					"but multicast join already started\n");
-		else if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
-			ipoib_mcast_sendonly_join(mcast);
 
 		/*
 		 * If lookup completes between here and out:, don't
@@ -766,7 +804,7 @@ void ipoib_mcast_dev_flush(struct net_device *dev)
 
 	/* seperate between the wait to the leave*/
 	list_for_each_entry_safe(mcast, tmcast, &remove_list, list)
-		if (test_bit(IPOIB_MCAST_JOIN_STARTED, &mcast->flags))
+		if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
 			wait_for_completion(&mcast->done);
 
 	list_for_each_entry_safe(mcast, tmcast, &remove_list, list) {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

[parent not found: <3287eaddaf7948ccf6f471daa3312e27649ef798.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* RE: [PATCH 03/10] IPoIB: fix MCAST_FLAG_BUSY usage
       [not found]     ` <3287eaddaf7948ccf6f471daa3312e27649ef798.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-12-12 14:03       ` Estrin, Alex
  0 siblings, 0 replies; 20+ messages in thread
From: Estrin, Alex @ 2014-12-12 14:03 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On
> Behalf Of Doug Ledford
> Sent: Wednesday, December 10, 2014 11:47 AM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: Doug Ledford
> Subject: [PATCH 03/10] IPoIB: fix MCAST_FLAG_BUSY usage
> 
> Commit a9c8ba5884 (IPoIB: Fix usage of uninitialized multicast objects)
> added a new flag MCAST_JOIN_STARTED, but was not very strict in how it
> was used.  We didn't always initialize the completion struct before we
> set the flag, and we didn't always call complete on the completion
> struct from all paths that complete it.  This made it less than totally
> effective, and certainly made its use confusing.  And in the flush
> function we would use the presence of this flag to signal that we should
> wait on the completion struct, but we never cleared this flag, ever.
> This is further muddied by the fact that we overload the MCAST_FLAG_BUSY
> flag to mean two different things: we have a join in flight, and we have
> succeeded in getting an ib_sa_join_multicast.
> 
> In order to make things clearer and aid in resolving the rtnl deadlock
> bug I've been chasing, I cleaned this up a bit.
> 
> 1) Remove the MCAST_JOIN_STARTED flag entirely
> 2) Un-overload MCAST_FLAG_BUSY so it now only means a join is in-flight
> 3) Test on mcast->mc directly to see if we have completed
> ib_sa_join_multicast (using IS_ERR_OR_NULL)
> 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize
> the mcast->done completion struct
> 5) Make sure that before calling complete(&mcast->done), we always clear
> the MCAST_FLAG_BUSY bit
> 6) Take the mcast_mutex before we call ib_sa_multicast_join and also
> take the mutex in our join callback.  This forces ib_sa_multicast_join
> to return and set mcast->mc before we process the callback.  This way,
> our callback can safely clear mcast->mc if there is an error on the join
> and we will do the right thing as a result in mcast_dev_flush.
> 7) Because we need the mutex to synchronize mcast->mc, we can no longer
> call mcast_sendonly_join directly from mcast_send and instead must add
> sendonly join processing to the mcast_join_task
> 
> A number of different races are resolved with these changes.  These
> races existed with the old MCAST_FLAG_BUSY usage, the
> MCAST_JOIN_STARTED flag was an attempt to address them, and while it
> helped, a determined effort could still trip things up.
> 
> One race looks something like this:
> 
> Thread 1                             Thread 2
> ib_sa_join_multicast (as part of running restart mcast task)
>   alloc member
>   call callback
>                                      ifconfig ib0 down
> 				     wait_for_completion
>     callback call completes
>                                      wait_for_completion in
> 				     mcast_dev_flush completes
> 				       mcast->mc is PTR_ERR_OR_NULL
> 				       so we skip ib_sa_leave_multicast
>     return from callback
>   return from ib_sa_join_multicast
> set mcast->mc = return from ib_sa_multicast
> 
> We now have a permanently unbalanced join/leave issue that trips up the
> refcounting in core/multicast.c
> 
> Another like this:
> 
> Thread 1                   Thread 2         Thread 3
> ib_sa_multicast_join
>                                             ifconfig ib0 down
> 					    priv->broadcast = NULL
>                            join_complete
> 			                    wait_for_completion
> 			   mcast->mc is not yet set, so don't clear
> return from ib_sa_join_multicast and set mcast->mc
> 			   complete
> 			   return -EAGAIN (making mcast->mc invalid)
> 			   		    call ib_sa_multicast_leave
> 					    on invalid mcast->mc, hang
> 					    forever
> 
> By holding the mutex around ib_sa_multicast_join and taking the mutex
> early in the callback, we force mcast->mc to be valid at the time we run
> the callback.  This allows us to clear mcast->mc if there is an error
> and the join is going to fail.  We do this before we complete the mcast.
> In this way, mcast_dev_flush always sees consistent state in regards to
> mcast->mc membership at the time that the wait_for_completion() returns.
> 
> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Acked-by: Alex Estrin <alex.estrin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> ---
>  drivers/infiniband/ulp/ipoib/ipoib.h           |  10 +-
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 148 ++++++++++++++++---------
>  2 files changed, 101 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
> b/drivers/infiniband/ulp/ipoib/ipoib.h
> index d7562beb542..f4c1b20b23b 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -98,9 +98,15 @@ enum {
> 
>  	IPOIB_MCAST_FLAG_FOUND	  = 0,	/* used in set_multicast_list */
>  	IPOIB_MCAST_FLAG_SENDONLY = 1,
> -	IPOIB_MCAST_FLAG_BUSY	  = 2,	/* joining or already joined */
> +	/*
> +	 * For IPOIB_MCAST_FLAG_BUSY
> +	 * When set, in flight join and mcast->mc is unreliable
> +	 * When clear and mcast->mc IS_ERR_OR_NULL, need to restart or
> +	 *   haven't started yet
> +	 * When clear and mcast->mc is valid pointer, join was successful
> +	 */
> +	IPOIB_MCAST_FLAG_BUSY	  = 2,
>  	IPOIB_MCAST_FLAG_ATTACHED = 3,
> -	IPOIB_MCAST_JOIN_STARTED  = 4,
> 
>  	MAX_SEND_CQE		  = 16,
>  	IPOIB_CM_COPYBREAK	  = 256,
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index 9862c76a83f..a52c9f3f7e4 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -271,16 +271,27 @@ ipoib_mcast_sendonly_join_complete(int status,
>  	struct ipoib_mcast *mcast = multicast->context;
>  	struct net_device *dev = mcast->dev;
> 
> +	/*
> +	 * We have to take the mutex to force mcast_sendonly_join to
> +	 * return from ib_sa_multicast_join and set mcast->mc to a
> +	 * valid value.  Otherwise we were racing with ourselves in
> +	 * that we might fail here, but get a valid return from
> +	 * ib_sa_multicast_join after we had cleared mcast->mc here,
> +	 * resulting in mis-matched joins and leaves and a deadlock
> +	 */
> +	mutex_lock(&mcast_mutex);
> +
>  	/* We trap for port events ourselves. */
>  	if (status == -ENETRESET)
> -		return 0;
> +		goto out;
> 
>  	if (!status)
>  		status = ipoib_mcast_join_finish(mcast, &multicast->rec);
> 
>  	if (status) {
>  		if (mcast->logcount++ < 20)
> -			ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for
> %pI6, status %d\n",
> +			ipoib_dbg_mcast(netdev_priv(dev), "sendonly multicast "
> +					"join failed for %pI6, status %d\n",
>  					mcast->mcmember.mgid.raw, status);
> 
>  		/* Flush out any queued packets */
> @@ -290,11 +301,15 @@ ipoib_mcast_sendonly_join_complete(int status,
>  			dev_kfree_skb_any(skb_dequeue(&mcast->pkt_queue));
>  		}
>  		netif_tx_unlock_bh(dev);
> -
> -		/* Clear the busy flag so we try again */
> -		status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY,
> -					    &mcast->flags);
>  	}
> +out:
> +	clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> +	if (status)
> +		mcast->mc = NULL;
> +	complete(&mcast->done);
> +	if (status == -ENETRESET)
> +		status = 0;
> +	mutex_unlock(&mcast_mutex);
>  	return status;
>  }
> 
> @@ -312,12 +327,14 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast
> *mcast)
>  	int ret = 0;
> 
>  	if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags)) {
> -		ipoib_dbg_mcast(priv, "device shutting down, no multicast joins\n");
> +		ipoib_dbg_mcast(priv, "device shutting down, no sendonly "
> +				"multicast joins\n");
>  		return -ENODEV;
>  	}
> 
> -	if (test_and_set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) {
> -		ipoib_dbg_mcast(priv, "multicast entry busy, skipping\n");
> +	if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) {
> +		ipoib_dbg_mcast(priv, "multicast entry busy, skipping "
> +				"sendonly join\n");
>  		return -EBUSY;
>  	}
> 
> @@ -325,6 +342,9 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast)
>  	rec.port_gid = priv->local_gid;
>  	rec.pkey     = cpu_to_be16(priv->pkey);
> 
> +	mutex_lock(&mcast_mutex);
> +	init_completion(&mcast->done);
> +	set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
>  	mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca,
>  					 priv->port, &rec,
>  					 IB_SA_MCMEMBER_REC_MGID	|
> @@ -337,12 +357,14 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast
> *mcast)
>  	if (IS_ERR(mcast->mc)) {
>  		ret = PTR_ERR(mcast->mc);
>  		clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> -		ipoib_warn(priv, "ib_sa_join_multicast failed (ret = %d)\n",
> -			   ret);
> +		complete(&mcast->done);
> +		ipoib_warn(priv, "ib_sa_join_multicast for sendonly join "
> +			   "failed (ret = %d)\n", ret);
>  	} else {
> -		ipoib_dbg_mcast(priv, "no multicast record for %pI6, starting
> join\n",
> -				mcast->mcmember.mgid.raw);
> +		ipoib_dbg_mcast(priv, "no multicast record for %pI6, starting "
> +				"sendonly join\n", mcast->mcmember.mgid.raw);
>  	}
> +	mutex_unlock(&mcast_mutex);
> 
>  	return ret;
>  }
> @@ -390,22 +412,28 @@ static int ipoib_mcast_join_complete(int status,
>  	ipoib_dbg_mcast(priv, "join completion for %pI6 (status %d)\n",
>  			mcast->mcmember.mgid.raw, status);
> 
> +	/*
> +	 * We have to take the mutex to force mcast_join to
> +	 * return from ib_sa_multicast_join and set mcast->mc to a
> +	 * valid value.  Otherwise we were racing with ourselves in
> +	 * that we might fail here, but get a valid return from
> +	 * ib_sa_multicast_join after we had cleared mcast->mc here,
> +	 * resulting in mis-matched joins and leaves and a deadlock
> +	 */
> +	mutex_lock(&mcast_mutex);
> +
>  	/* We trap for port events ourselves. */
> -	if (status == -ENETRESET) {
> -		status = 0;
> +	if (status == -ENETRESET)
>  		goto out;
> -	}
> 
>  	if (!status)
>  		status = ipoib_mcast_join_finish(mcast, &multicast->rec);
> 
>  	if (!status) {
>  		mcast->backoff = 1;
> -		mutex_lock(&mcast_mutex);
>  		if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
>  			queue_delayed_work(ipoib_workqueue,
>  					   &priv->mcast_task, 0);
> -		mutex_unlock(&mcast_mutex);
> 
>  		/*
>  		 * Defer carrier on work to ipoib_workqueue to avoid a
> @@ -413,37 +441,35 @@ static int ipoib_mcast_join_complete(int status,
>  		 */
>  		if (mcast == priv->broadcast)
>  			queue_work(ipoib_workqueue, &priv->carrier_on_task);
> -
> -		status = 0;
> -		goto out;
> -	}
> -
> -	if (mcast->logcount++ < 20) {
> -		if (status == -ETIMEDOUT || status == -EAGAIN) {
> -			ipoib_dbg_mcast(priv, "multicast join failed for %pI6, status
> %d\n",
> -					mcast->mcmember.mgid.raw, status);
> -		} else {
> -			ipoib_warn(priv, "multicast join failed for %pI6, status %d\n",
> -				   mcast->mcmember.mgid.raw, status);
> +	} else {
> +		if (mcast->logcount++ < 20) {
> +			if (status == -ETIMEDOUT || status == -EAGAIN) {
> +				ipoib_dbg_mcast(priv, "multicast join failed for %pI6,
> status %d\n",
> +						mcast->mcmember.mgid.raw, status);
> +			} else {
> +				ipoib_warn(priv, "multicast join failed for %pI6, status
> %d\n",
> +					   mcast->mcmember.mgid.raw, status);
> +			}
>  		}
> -	}
> -
> -	mcast->backoff *= 2;
> -	if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
> -		mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
> -
> -	/* Clear the busy flag so we try again */
> -	status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> 
> -	mutex_lock(&mcast_mutex);
> +		mcast->backoff *= 2;
> +		if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
> +			mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
> +	}
> +out:
>  	spin_lock_irq(&priv->lock);
> -	if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
> +	clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> +	if (status)
> +		mcast->mc = NULL;
> +	complete(&mcast->done);
> +	if (status == -ENETRESET)
> +		status = 0;
> +	if (status && test_bit(IPOIB_MCAST_RUN, &priv->flags))
>  		queue_delayed_work(ipoib_workqueue, &priv->mcast_task,
>  				   mcast->backoff * HZ);
>  	spin_unlock_irq(&priv->lock);
>  	mutex_unlock(&mcast_mutex);
> -out:
> -	complete(&mcast->done);
> +
>  	return status;
>  }
> 
> @@ -492,10 +518,9 @@ static void ipoib_mcast_join(struct net_device *dev, struct
> ipoib_mcast *mcast,
>  		rec.hop_limit	  = priv->broadcast->mcmember.hop_limit;
>  	}
> 
> -	set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> +	mutex_lock(&mcast_mutex);
>  	init_completion(&mcast->done);
> -	set_bit(IPOIB_MCAST_JOIN_STARTED, &mcast->flags);
> -
> +	set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
>  	mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port,
>  					 &rec, comp_mask, GFP_KERNEL,
>  					 ipoib_mcast_join_complete, mcast);
> @@ -509,13 +534,12 @@ static void ipoib_mcast_join(struct net_device *dev, struct
> ipoib_mcast *mcast,
>  		if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
>  			mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
> 
> -		mutex_lock(&mcast_mutex);
>  		if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
>  			queue_delayed_work(ipoib_workqueue,
>  					   &priv->mcast_task,
>  					   mcast->backoff * HZ);
> -		mutex_unlock(&mcast_mutex);
>  	}
> +	mutex_unlock(&mcast_mutex);
>  }
> 
>  void ipoib_mcast_join_task(struct work_struct *work)
> @@ -568,7 +592,8 @@ void ipoib_mcast_join_task(struct work_struct *work)
>  	}
> 
>  	if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
> -		if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags))
> +		if (IS_ERR_OR_NULL(priv->broadcast->mc) &&
> +		    !test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags))
>  			ipoib_mcast_join(dev, priv->broadcast, 0);
>  		return;
>  	}
> @@ -576,23 +601,33 @@ void ipoib_mcast_join_task(struct work_struct *work)
>  	while (1) {
>  		struct ipoib_mcast *mcast = NULL;
> 
> +		/*
> +		 * Need the mutex so our flags are consistent, need the
> +		 * priv->lock so we don't race with list removals in either
> +		 * mcast_dev_flush or mcast_restart_task
> +		 */
> +		mutex_lock(&mcast_mutex);
>  		spin_lock_irq(&priv->lock);
>  		list_for_each_entry(mcast, &priv->multicast_list, list) {
> -			if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)
> -			    && !test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)
> -			    && !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
> +			if (IS_ERR_OR_NULL(mcast->mc) &&
> +			    !test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags) &&
> +			    !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
>  				/* Found the next unjoined group */
>  				break;
>  			}
>  		}
>  		spin_unlock_irq(&priv->lock);
> +		mutex_unlock(&mcast_mutex);
> 
>  		if (&mcast->list == &priv->multicast_list) {
>  			/* All done */
>  			break;
>  		}
> 
> -		ipoib_mcast_join(dev, mcast, 1);
> +		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
> +			ipoib_mcast_sendonly_join(mcast);
> +		else
> +			ipoib_mcast_join(dev, mcast, 1);
>  		return;
>  	}
> 
> @@ -638,6 +673,9 @@ static int ipoib_mcast_leave(struct net_device *dev, struct
> ipoib_mcast *mcast)
>  	int ret = 0;
> 
>  	if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
> +		ipoib_warn(priv, "ipoib_mcast_leave on an in-flight join\n");
> +
> +	if (!IS_ERR_OR_NULL(mcast->mc))
>  		ib_sa_free_multicast(mcast->mc);
> 
>  	if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
> @@ -690,6 +728,8 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct
> sk_buff *skb)
>  		memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid));
>  		__ipoib_mcast_add(dev, mcast);
>  		list_add_tail(&mcast->list, &priv->multicast_list);
> +		if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
> +			queue_delayed_work(ipoib_workqueue, &priv->mcast_task, 0);
>  	}
> 
>  	if (!mcast->ah) {
> @@ -703,8 +743,6 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct
> sk_buff *skb)
>  		if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
>  			ipoib_dbg_mcast(priv, "no address vector, "
>  					"but multicast join already started\n");
> -		else if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
> -			ipoib_mcast_sendonly_join(mcast);
> 
>  		/*
>  		 * If lookup completes between here and out:, don't
> @@ -766,7 +804,7 @@ void ipoib_mcast_dev_flush(struct net_device *dev)
> 
>  	/* seperate between the wait to the leave*/
>  	list_for_each_entry_safe(mcast, tmcast, &remove_list, list)
> -		if (test_bit(IPOIB_MCAST_JOIN_STARTED, &mcast->flags))
> +		if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
>  			wait_for_completion(&mcast->done);
> 
>  	list_for_each_entry_safe(mcast, tmcast, &remove_list, list) {
> --
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 04/10] IPoIB: fix mcast_dev_flush/mcast_restart_task race
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2014-12-10 16:47   ` [PATCH 03/10] IPoIB: fix MCAST_FLAG_BUSY usage Doug Ledford
@ 2014-12-10 16:47   ` Doug Ledford
       [not found]     ` <fd20f0aae0dce26162aba317d2571e46bdbae919.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:47   ` [PATCH 05/10] IPoIB: change init sequence ordering Doug Ledford
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:47 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

Our mcast_dev_flush routine and our mcast_restart_task can race against
each other.  In particular, they both hold the priv->lock while
manipulating the rbtree and while removing mcast entries from the
multicast_list and while adding entries to the remove_list, but they
also both drop their locks prior to doing the actual removes.  The
mcast_dev_flush routine is run entirely under the rtnl lock and so has
at least some locking.  The actual race condition is like this:

Thread 1                                Thread 2
ifconfig ib0 up
  start multicast join for broadcast
  multicast join completes for broadcast
  start to add more multicast joins
    call mcast_restart_task to add new entries
                                        ifconfig ib0 down
					  mcast_dev_flush
					    mcast_leave(mcast A)
    mcast_leave(mcast A)

As mcast_leave calls ib_sa_multicast_leave, and as member in
core/multicast.c is ref counted, we run into an unbalanced refcount
issue.  To avoid stomping on each others removes, take the rtnl lock
specifically when we are deleting the entries from the remove list.

Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 37 ++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index a52c9f3f7e4..41325960e4e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -802,7 +802,10 @@ void ipoib_mcast_dev_flush(struct net_device *dev)
 
 	spin_unlock_irqrestore(&priv->lock, flags);
 
-	/* seperate between the wait to the leave*/
+	/*
+	 * make sure the in-flight joins have finished before we attempt
+	 * to leave
+	 */
 	list_for_each_entry_safe(mcast, tmcast, &remove_list, list)
 		if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
 			wait_for_completion(&mcast->done);
@@ -923,14 +926,38 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 	netif_addr_unlock(dev);
 	local_irq_restore(flags);
 
-	/* We have to cancel outside of the spinlock */
+	/*
+	 * make sure the in-flight joins have finished before we attempt
+	 * to leave
+	 */
+	list_for_each_entry_safe(mcast, tmcast, &remove_list, list)
+		if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
+			wait_for_completion(&mcast->done);
+
+	/*
+	 * We have to cancel outside of the spinlock, but we have to
+	 * take the rtnl lock or else we race with the removal of
+	 * entries from the remove list in mcast_dev_flush as part
+	 * of ipoib_stop() which will call mcast_stop_thread with
+	 * flush == 1 while holding the rtnl lock, and the
+	 * flush_workqueue won't complete until this restart_mcast_task
+	 * completes.  So do like the carrier on task and attempt to
+	 * take the rtnl lock, but if we can't before the ADMIN_UP flag
+	 * goes away, then just return and know that the remove list will
+	 * get flushed later by mcast_dev_flush.
+	 */
+	while (!rtnl_trylock()) {
+		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+			return;
+		else
+			msleep(20);
+	}
 	list_for_each_entry_safe(mcast, tmcast, &remove_list, list) {
 		ipoib_mcast_leave(mcast->dev, mcast);
 		ipoib_mcast_free(mcast);
 	}
-
-	if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
-		ipoib_mcast_start_thread(dev);
+	ipoib_mcast_start_thread(dev);
+	rtnl_unlock();
 }
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

[parent not found: <fd20f0aae0dce26162aba317d2571e46bdbae919.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* RE: [PATCH 04/10] IPoIB: fix mcast_dev_flush/mcast_restart_task race
       [not found]     ` <fd20f0aae0dce26162aba317d2571e46bdbae919.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-12-12 14:04       ` Estrin, Alex
  0 siblings, 0 replies; 20+ messages in thread
From: Estrin, Alex @ 2014-12-12 14:04 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On
> Behalf Of Doug Ledford
> Sent: Wednesday, December 10, 2014 11:47 AM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: Doug Ledford
> Subject: [PATCH 04/10] IPoIB: fix mcast_dev_flush/mcast_restart_task race
> 
> Our mcast_dev_flush routine and our mcast_restart_task can race against
> each other.  In particular, they both hold the priv->lock while
> manipulating the rbtree and while removing mcast entries from the
> multicast_list and while adding entries to the remove_list, but they
> also both drop their locks prior to doing the actual removes.  The
> mcast_dev_flush routine is run entirely under the rtnl lock and so has
> at least some locking.  The actual race condition is like this:
> 
> Thread 1                                Thread 2
> ifconfig ib0 up
>   start multicast join for broadcast
>   multicast join completes for broadcast
>   start to add more multicast joins
>     call mcast_restart_task to add new entries
>                                         ifconfig ib0 down
> 					  mcast_dev_flush
> 					    mcast_leave(mcast A)
>     mcast_leave(mcast A)
> 
> As mcast_leave calls ib_sa_multicast_leave, and as member in
> core/multicast.c is ref counted, we run into an unbalanced refcount
> issue.  To avoid stomping on each others removes, take the rtnl lock
> specifically when we are deleting the entries from the remove list.
> 
> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Acked-by: Alex Estrin <alex.estrin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 37 ++++++++++++++++++++++----
>  1 file changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index a52c9f3f7e4..41325960e4e 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -802,7 +802,10 @@ void ipoib_mcast_dev_flush(struct net_device *dev)
> 
>  	spin_unlock_irqrestore(&priv->lock, flags);
> 
> -	/* seperate between the wait to the leave*/
> +	/*
> +	 * make sure the in-flight joins have finished before we attempt
> +	 * to leave
> +	 */
>  	list_for_each_entry_safe(mcast, tmcast, &remove_list, list)
>  		if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
>  			wait_for_completion(&mcast->done);
> @@ -923,14 +926,38 @@ void ipoib_mcast_restart_task(struct work_struct *work)
>  	netif_addr_unlock(dev);
>  	local_irq_restore(flags);
> 
> -	/* We have to cancel outside of the spinlock */
> +	/*
> +	 * make sure the in-flight joins have finished before we attempt
> +	 * to leave
> +	 */
> +	list_for_each_entry_safe(mcast, tmcast, &remove_list, list)
> +		if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
> +			wait_for_completion(&mcast->done);
> +
> +	/*
> +	 * We have to cancel outside of the spinlock, but we have to
> +	 * take the rtnl lock or else we race with the removal of
> +	 * entries from the remove list in mcast_dev_flush as part
> +	 * of ipoib_stop() which will call mcast_stop_thread with
> +	 * flush == 1 while holding the rtnl lock, and the
> +	 * flush_workqueue won't complete until this restart_mcast_task
> +	 * completes.  So do like the carrier on task and attempt to
> +	 * take the rtnl lock, but if we can't before the ADMIN_UP flag
> +	 * goes away, then just return and know that the remove list will
> +	 * get flushed later by mcast_dev_flush.
> +	 */
> +	while (!rtnl_trylock()) {
> +		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
> +			return;
> +		else
> +			msleep(20);
> +	}
>  	list_for_each_entry_safe(mcast, tmcast, &remove_list, list) {
>  		ipoib_mcast_leave(mcast->dev, mcast);
>  		ipoib_mcast_free(mcast);
>  	}
> -
> -	if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
> -		ipoib_mcast_start_thread(dev);
> +	ipoib_mcast_start_thread(dev);
> +	rtnl_unlock();
>  }
> 
>  #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
> --
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 05/10] IPoIB: change init sequence ordering
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2014-12-10 16:47   ` [PATCH 04/10] IPoIB: fix mcast_dev_flush/mcast_restart_task race Doug Ledford
@ 2014-12-10 16:47   ` Doug Ledford
       [not found]     ` <540b071b56bdc3b72d59e80707164da07243dc80.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:47   ` [PATCH 06/10] IPoIB: Use dedicated workqueues per interface Doug Ledford
                     ` (5 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:47 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

In preparation for using per device work queues, we need to move the
start of the neighbor thread task to after ipoib_ib_dev_init and move
the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
Otherwise we will end up freeing our workqueue with work possibly still
on it.

Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 58b5aa3b6f2..2cf81ef5141 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1262,15 +1262,13 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
-	if (ipoib_neigh_hash_init(priv) < 0)
-		goto out;
 	/* Allocate RX/TX "rings" to hold queued skbs */
 	priv->rx_ring =	kzalloc(ipoib_recvq_size * sizeof *priv->rx_ring,
 				GFP_KERNEL);
 	if (!priv->rx_ring) {
 		printk(KERN_WARNING "%s: failed to allocate RX ring (%d entries)\n",
 		       ca->name, ipoib_recvq_size);
-		goto out_neigh_hash_cleanup;
+		goto out;
 	}
 
 	priv->tx_ring = vzalloc(ipoib_sendq_size * sizeof *priv->tx_ring);
@@ -1285,16 +1283,24 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 	if (ipoib_ib_dev_init(dev, ca, port))
 		goto out_tx_ring_cleanup;
 
+	/*
+	 * Must be after ipoib_ib_dev_init so we can allocate a per
+	 * device wq there and use it here
+	 */
+	if (ipoib_neigh_hash_init(priv) < 0)
+		goto out_dev_uninit;
+
 	return 0;
 
+out_dev_uninit:
+	ipoib_ib_dev_cleanup();
+
 out_tx_ring_cleanup:
 	vfree(priv->tx_ring);
 
 out_rx_ring_cleanup:
 	kfree(priv->rx_ring);
 
-out_neigh_hash_cleanup:
-	ipoib_neigh_hash_uninit(dev);
 out:
 	return -ENOMEM;
 }
@@ -1317,6 +1323,12 @@ void ipoib_dev_cleanup(struct net_device *dev)
 	}
 	unregister_netdevice_many(&head);
 
+	/*
+	 * Must be before ipoib_ib_dev_cleanup or we delete an in use
+	 * work queue
+	 */
+	ipoib_neigh_hash_uninit(dev);
+
 	ipoib_ib_dev_cleanup(dev);
 
 	kfree(priv->rx_ring);
@@ -1324,8 +1336,6 @@ void ipoib_dev_cleanup(struct net_device *dev)
 
 	priv->rx_ring = NULL;
 	priv->tx_ring = NULL;
-
-	ipoib_neigh_hash_uninit(dev);
 }
 
 static const struct header_ops ipoib_header_ops = {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

[parent not found: <540b071b56bdc3b72d59e80707164da07243dc80.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* RE: [PATCH 05/10] IPoIB: change init sequence ordering
       [not found]     ` <540b071b56bdc3b72d59e80707164da07243dc80.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-12-12 14:04       ` Estrin, Alex
  0 siblings, 0 replies; 20+ messages in thread
From: Estrin, Alex @ 2014-12-12 14:04 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On
> Behalf Of Doug Ledford
> Sent: Wednesday, December 10, 2014 11:47 AM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: Doug Ledford
> Subject: [PATCH 05/10] IPoIB: change init sequence ordering
> 
> In preparation for using per device work queues, we need to move the
> start of the neighbor thread task to after ipoib_ib_dev_init and move
> the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
> Otherwise we will end up freeing our workqueue with work possibly still
> on it.
> 
> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Acked-by: Alex Estrin <alex.estrin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 58b5aa3b6f2..2cf81ef5141 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1262,15 +1262,13 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device
> *ca, int port)
>  {
>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
> 
> -	if (ipoib_neigh_hash_init(priv) < 0)
> -		goto out;
>  	/* Allocate RX/TX "rings" to hold queued skbs */
>  	priv->rx_ring =	kzalloc(ipoib_recvq_size * sizeof *priv->rx_ring,
>  				GFP_KERNEL);
>  	if (!priv->rx_ring) {
>  		printk(KERN_WARNING "%s: failed to allocate RX ring (%d entries)\n",
>  		       ca->name, ipoib_recvq_size);
> -		goto out_neigh_hash_cleanup;
> +		goto out;
>  	}
> 
>  	priv->tx_ring = vzalloc(ipoib_sendq_size * sizeof *priv->tx_ring);
> @@ -1285,16 +1283,24 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device
> *ca, int port)
>  	if (ipoib_ib_dev_init(dev, ca, port))
>  		goto out_tx_ring_cleanup;
> 
> +	/*
> +	 * Must be after ipoib_ib_dev_init so we can allocate a per
> +	 * device wq there and use it here
> +	 */
> +	if (ipoib_neigh_hash_init(priv) < 0)
> +		goto out_dev_uninit;
> +
>  	return 0;
> 
> +out_dev_uninit:
> +	ipoib_ib_dev_cleanup();
> +
>  out_tx_ring_cleanup:
>  	vfree(priv->tx_ring);
> 
>  out_rx_ring_cleanup:
>  	kfree(priv->rx_ring);
> 
> -out_neigh_hash_cleanup:
> -	ipoib_neigh_hash_uninit(dev);
>  out:
>  	return -ENOMEM;
>  }
> @@ -1317,6 +1323,12 @@ void ipoib_dev_cleanup(struct net_device *dev)
>  	}
>  	unregister_netdevice_many(&head);
> 
> +	/*
> +	 * Must be before ipoib_ib_dev_cleanup or we delete an in use
> +	 * work queue
> +	 */
> +	ipoib_neigh_hash_uninit(dev);
> +
>  	ipoib_ib_dev_cleanup(dev);
> 
>  	kfree(priv->rx_ring);
> @@ -1324,8 +1336,6 @@ void ipoib_dev_cleanup(struct net_device *dev)
> 
>  	priv->rx_ring = NULL;
>  	priv->tx_ring = NULL;
> -
> -	ipoib_neigh_hash_uninit(dev);
>  }
> 
>  static const struct header_ops ipoib_header_ops = {
> --
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 06/10] IPoIB: Use dedicated workqueues per interface
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2014-12-10 16:47   ` [PATCH 05/10] IPoIB: change init sequence ordering Doug Ledford
@ 2014-12-10 16:47   ` Doug Ledford
       [not found]     ` <f733fa8b780b1ac512efb748683dca001fb2edf7.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:47   ` [PATCH 07/10] IPoIB: Make ipoib_mcast_stop_thread flush the workqueue Doug Ledford
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:47 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up.  It turns out
that this is because no matter how well we protect against races on a
single device, the fact that all devices use the same workqueue, and
flush_workqueue() flushes *everything* from that workqueue, we can have
one device in the middle of a down and holding the rtnl lock and another
totally unrelated device needing to run mcast_restart_task, which wants
the rtnl lock and will loop trying to take it unless is sees its own
FLAG_ADMIN_UP flag go away.  Because the unrelated interface will never
see its own ADMIN_UP flag drop, the interface going down will deadlock
trying to flush the queue.  There are several possible solutions to this
problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail.  This runs the real
risk of dropping work on the floor, which can end up being its own
separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail.  Again, this can
drop work on the floor.  I suppose if our own ADMIN_UP flag doesn't go
away, then maybe after a few tries on the rtnl lock we can queue our own
task back up as a delayed work and return and avoid dropping work on the
floor that way.  But I'm not 100% convinced that we won't cause other
problems.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface.  In addition, keep the global workqueue,
but now limit it to only flush tasks.  In this way, the flush tasks can
always flush the device specific work queues without having deadlock
issues.

Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |  1 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        | 18 +++++++++---------
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  6 +++---
 drivers/infiniband/ulp/ipoib/ipoib_main.c      | 19 ++++++++++++-------
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 26 ++++++++++++--------------
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     | 22 +++++++++++++++++++++-
 6 files changed, 58 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index f4c1b20b23b..45fd10a72ec 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -323,6 +323,7 @@ struct ipoib_dev_priv {
 	struct list_head multicast_list;
 	struct rb_root multicast_tree;
 
+	struct workqueue_struct *wq;
 	struct delayed_work mcast_task;
 	struct work_struct carrier_on_task;
 	struct work_struct flush_light;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 933efcea0d0..56959adb6c7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -474,7 +474,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	}
 
 	spin_lock_irq(&priv->lock);
-	queue_delayed_work(ipoib_workqueue,
+	queue_delayed_work(priv->wq,
 			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
 	/* Add this entry to passive ids list head, but do not re-add it
 	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
@@ -576,7 +576,7 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 			spin_lock_irqsave(&priv->lock, flags);
 			list_splice_init(&priv->cm.rx_drain_list, &priv->cm.rx_reap_list);
 			ipoib_cm_start_rx_drain(priv);
-			queue_work(ipoib_workqueue, &priv->cm.rx_reap_task);
+			queue_work(priv->wq, &priv->cm.rx_reap_task);
 			spin_unlock_irqrestore(&priv->lock, flags);
 		} else
 			ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n",
@@ -603,7 +603,7 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 				spin_lock_irqsave(&priv->lock, flags);
 				list_move(&p->list, &priv->cm.rx_reap_list);
 				spin_unlock_irqrestore(&priv->lock, flags);
-				queue_work(ipoib_workqueue, &priv->cm.rx_reap_task);
+				queue_work(priv->wq, &priv->cm.rx_reap_task);
 			}
 			return;
 		}
@@ -827,7 +827,7 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
 			list_move(&tx->list, &priv->cm.reap_list);
-			queue_work(ipoib_workqueue, &priv->cm.reap_task);
+			queue_work(priv->wq, &priv->cm.reap_task);
 		}
 
 		clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags);
@@ -1255,7 +1255,7 @@ static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
 
 		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
 			list_move(&tx->list, &priv->cm.reap_list);
-			queue_work(ipoib_workqueue, &priv->cm.reap_task);
+			queue_work(priv->wq, &priv->cm.reap_task);
 		}
 
 		spin_unlock_irqrestore(&priv->lock, flags);
@@ -1284,7 +1284,7 @@ struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path
 	tx->dev = dev;
 	list_add(&tx->list, &priv->cm.start_list);
 	set_bit(IPOIB_FLAG_INITIALIZED, &tx->flags);
-	queue_work(ipoib_workqueue, &priv->cm.start_task);
+	queue_work(priv->wq, &priv->cm.start_task);
 	return tx;
 }
 
@@ -1295,7 +1295,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
 	if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
 		spin_lock_irqsave(&priv->lock, flags);
 		list_move(&tx->list, &priv->cm.reap_list);
-		queue_work(ipoib_workqueue, &priv->cm.reap_task);
+		queue_work(priv->wq, &priv->cm.reap_task);
 		ipoib_dbg(priv, "Reap connection for gid %pI6\n",
 			  tx->neigh->daddr + 4);
 		tx->neigh = NULL;
@@ -1417,7 +1417,7 @@ void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb,
 
 	skb_queue_tail(&priv->cm.skb_queue, skb);
 	if (e)
-		queue_work(ipoib_workqueue, &priv->cm.skb_task);
+		queue_work(priv->wq, &priv->cm.skb_task);
 }
 
 static void ipoib_cm_rx_reap(struct work_struct *work)
@@ -1450,7 +1450,7 @@ static void ipoib_cm_stale_task(struct work_struct *work)
 	}
 
 	if (!list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
+		queue_delayed_work(priv->wq,
 				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
 	spin_unlock_irq(&priv->lock);
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 72626c34817..bfd17d41b5f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -655,7 +655,7 @@ void ipoib_reap_ah(struct work_struct *work)
 	__ipoib_reap_ah(dev);
 
 	if (!test_bit(IPOIB_STOP_REAPER, &priv->flags))
-		queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task,
+		queue_delayed_work(priv->wq, &priv->ah_reap_task,
 				   round_jiffies_relative(HZ));
 }
 
@@ -696,7 +696,7 @@ int ipoib_ib_dev_open(struct net_device *dev, int flush)
 	}
 
 	clear_bit(IPOIB_STOP_REAPER, &priv->flags);
-	queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task,
+	queue_delayed_work(priv->wq, &priv->ah_reap_task,
 			   round_jiffies_relative(HZ));
 
 	if (!test_and_set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
@@ -881,7 +881,7 @@ timeout:
 	set_bit(IPOIB_STOP_REAPER, &priv->flags);
 	cancel_delayed_work(&priv->ah_reap_task);
 	if (flush)
-		flush_workqueue(ipoib_workqueue);
+		flush_workqueue(priv->wq);
 
 	begin = jiffies;
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 2cf81ef5141..42e5c278f48 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -839,7 +839,7 @@ static void ipoib_set_mcast_list(struct net_device *dev)
 		return;
 	}
 
-	queue_work(ipoib_workqueue, &priv->restart_task);
+	queue_work(priv->wq, &priv->restart_task);
 }
 
 static u32 ipoib_addr_hash(struct ipoib_neigh_hash *htbl, u8 *daddr)
@@ -954,7 +954,7 @@ static void ipoib_reap_neigh(struct work_struct *work)
 	__ipoib_reap_neigh(priv);
 
 	if (!test_bit(IPOIB_STOP_NEIGH_GC, &priv->flags))
-		queue_delayed_work(ipoib_workqueue, &priv->neigh_reap_task,
+		queue_delayed_work(priv->wq, &priv->neigh_reap_task,
 				   arp_tbl.gc_interval);
 }
 
@@ -1133,7 +1133,7 @@ static int ipoib_neigh_hash_init(struct ipoib_dev_priv *priv)
 
 	/* start garbage collection */
 	clear_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
-	queue_delayed_work(ipoib_workqueue, &priv->neigh_reap_task,
+	queue_delayed_work(priv->wq, &priv->neigh_reap_task,
 			   arp_tbl.gc_interval);
 
 	return 0;
@@ -1293,7 +1293,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 	return 0;
 
 out_dev_uninit:
-	ipoib_ib_dev_cleanup();
+	ipoib_ib_dev_cleanup(dev);
 
 out_tx_ring_cleanup:
 	vfree(priv->tx_ring);
@@ -1646,7 +1646,7 @@ register_failed:
 	/* Stop GC if started before flush */
 	set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
 	cancel_delayed_work(&priv->neigh_reap_task);
-	flush_workqueue(ipoib_workqueue);
+	flush_workqueue(priv->wq);
 
 event_failed:
 	ipoib_dev_cleanup(priv->dev);
@@ -1717,7 +1717,7 @@ static void ipoib_remove_one(struct ib_device *device)
 		/* Stop GC */
 		set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
 		cancel_delayed_work(&priv->neigh_reap_task);
-		flush_workqueue(ipoib_workqueue);
+		flush_workqueue(priv->wq);
 
 		unregister_netdev(priv->dev);
 		free_netdev(priv->dev);
@@ -1758,8 +1758,13 @@ static int __init ipoib_init_module(void)
 	 * unregister_netdev() and linkwatch_event take the rtnl lock,
 	 * so flush_scheduled_work() can deadlock during device
 	 * removal.
+	 *
+	 * In addition, bringing one device up and another down at the
+	 * same time can deadlock a single workqueue, so we have this
+	 * global fallback workqueue, but we also attempt to open a
+	 * per device workqueue each time we bring an interface up
 	 */
-	ipoib_workqueue = create_singlethread_workqueue("ipoib");
+	ipoib_workqueue = create_singlethread_workqueue("ipoib_flush");
 	if (!ipoib_workqueue) {
 		ret = -ENOMEM;
 		goto err_fs;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 41325960e4e..845f910eb21 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -388,7 +388,7 @@ void ipoib_mcast_carrier_on_task(struct work_struct *work)
 	 * the workqueue while holding the rtnl lock, so loop
 	 * on trylock until either we get the lock or we see
 	 * FLAG_ADMIN_UP go away as that signals that we are bailing
-	 * and can safely ignore the carrier on work
+	 * and can safely ignore the carrier on work.
 	 */
 	while (!rtnl_trylock()) {
 		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
@@ -432,15 +432,14 @@ static int ipoib_mcast_join_complete(int status,
 	if (!status) {
 		mcast->backoff = 1;
 		if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
-			queue_delayed_work(ipoib_workqueue,
-					   &priv->mcast_task, 0);
+			queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 
 		/*
-		 * Defer carrier on work to ipoib_workqueue to avoid a
+		 * Defer carrier on work to priv->wq to avoid a
 		 * deadlock on rtnl_lock here.
 		 */
 		if (mcast == priv->broadcast)
-			queue_work(ipoib_workqueue, &priv->carrier_on_task);
+			queue_work(priv->wq, &priv->carrier_on_task);
 	} else {
 		if (mcast->logcount++ < 20) {
 			if (status == -ETIMEDOUT || status == -EAGAIN) {
@@ -465,7 +464,7 @@ out:
 	if (status == -ENETRESET)
 		status = 0;
 	if (status && test_bit(IPOIB_MCAST_RUN, &priv->flags))
-		queue_delayed_work(ipoib_workqueue, &priv->mcast_task,
+		queue_delayed_work(priv->wq, &priv->mcast_task,
 				   mcast->backoff * HZ);
 	spin_unlock_irq(&priv->lock);
 	mutex_unlock(&mcast_mutex);
@@ -535,8 +534,7 @@ static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast,
 			mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
 
 		if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
-			queue_delayed_work(ipoib_workqueue,
-					   &priv->mcast_task,
+			queue_delayed_work(priv->wq, &priv->mcast_task,
 					   mcast->backoff * HZ);
 	}
 	mutex_unlock(&mcast_mutex);
@@ -576,8 +574,8 @@ void ipoib_mcast_join_task(struct work_struct *work)
 			ipoib_warn(priv, "failed to allocate broadcast group\n");
 			mutex_lock(&mcast_mutex);
 			if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
-				queue_delayed_work(ipoib_workqueue,
-						   &priv->mcast_task, HZ);
+				queue_delayed_work(priv->wq, &priv->mcast_task,
+						   HZ);
 			mutex_unlock(&mcast_mutex);
 			return;
 		}
@@ -644,7 +642,7 @@ int ipoib_mcast_start_thread(struct net_device *dev)
 
 	mutex_lock(&mcast_mutex);
 	if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
-		queue_delayed_work(ipoib_workqueue, &priv->mcast_task, 0);
+		queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 	mutex_unlock(&mcast_mutex);
 
 	return 0;
@@ -662,7 +660,7 @@ int ipoib_mcast_stop_thread(struct net_device *dev, int flush)
 	mutex_unlock(&mcast_mutex);
 
 	if (flush)
-		flush_workqueue(ipoib_workqueue);
+		flush_workqueue(priv->wq);
 
 	return 0;
 }
@@ -729,7 +727,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
 		__ipoib_mcast_add(dev, mcast);
 		list_add_tail(&mcast->list, &priv->multicast_list);
 		if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
-			queue_delayed_work(ipoib_workqueue, &priv->mcast_task, 0);
+			queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 	}
 
 	if (!mcast->ah) {
@@ -944,7 +942,7 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 	 * completes.  So do like the carrier on task and attempt to
 	 * take the rtnl lock, but if we can't before the ADMIN_UP flag
 	 * goes away, then just return and know that the remove list will
-	 * get flushed later by mcast_dev_flush.
+	 * get flushed later by mcast_stop_thread.
 	 */
 	while (!rtnl_trylock()) {
 		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index c56d5d44c53..b72a753eb41 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -145,10 +145,20 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 	int ret, size;
 	int i;
 
+	/*
+	 * the various IPoIB tasks assume they will never race against
+	 * themselves, so always use a single thread workqueue
+	 */
+	priv->wq = create_singlethread_workqueue("ipoib_wq");
+	if (!priv->wq) {
+		printk(KERN_WARNING "ipoib: failed to allocate device WQ\n");
+		return -ENODEV;
+	}
+
 	priv->pd = ib_alloc_pd(priv->ca);
 	if (IS_ERR(priv->pd)) {
 		printk(KERN_WARNING "%s: failed to allocate PD\n", ca->name);
-		return -ENODEV;
+		goto out_free_wq;
 	}
 
 	priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE);
@@ -242,6 +252,10 @@ out_free_mr:
 
 out_free_pd:
 	ib_dealloc_pd(priv->pd);
+
+out_free_wq:
+	destroy_workqueue(priv->wq);
+	priv->wq = NULL;
 	return -ENODEV;
 }
 
@@ -270,6 +284,12 @@ void ipoib_transport_dev_cleanup(struct net_device *dev)
 
 	if (ib_dealloc_pd(priv->pd))
 		ipoib_warn(priv, "ib_dealloc_pd failed\n");
+
+	if (priv->wq) {
+		flush_workqueue(priv->wq);
+		destroy_workqueue(priv->wq);
+		priv->wq = NULL;
+	}
 }
 
 void ipoib_event(struct ib_event_handler *handler,
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

[parent not found: <f733fa8b780b1ac512efb748683dca001fb2edf7.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* RE: [PATCH 06/10] IPoIB: Use dedicated workqueues per interface
       [not found]     ` <f733fa8b780b1ac512efb748683dca001fb2edf7.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-12-12 14:05       ` Estrin, Alex
  0 siblings, 0 replies; 20+ messages in thread
From: Estrin, Alex @ 2014-12-12 14:05 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On
> Behalf Of Doug Ledford
> Sent: Wednesday, December 10, 2014 11:47 AM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: Doug Ledford
> Subject: [PATCH 06/10] IPoIB: Use dedicated workqueues per interface
> 
> During my recent work on the rtnl lock deadlock in the IPoIB driver, I
> saw that even once I fixed the apparent races for a single device, as
> soon as that device had any children, new races popped up.  It turns out
> that this is because no matter how well we protect against races on a
> single device, the fact that all devices use the same workqueue, and
> flush_workqueue() flushes *everything* from that workqueue, we can have
> one device in the middle of a down and holding the rtnl lock and another
> totally unrelated device needing to run mcast_restart_task, which wants
> the rtnl lock and will loop trying to take it unless is sees its own
> FLAG_ADMIN_UP flag go away.  Because the unrelated interface will never
> see its own ADMIN_UP flag drop, the interface going down will deadlock
> trying to flush the queue.  There are several possible solutions to this
> problem:
> 
> Make carrier_on_task and mcast_restart_task try to take the rtnl for
> some set period of time and if they fail, then bail.  This runs the real
> risk of dropping work on the floor, which can end up being its own
> separate kind of deadlock.
> 
> Set some global flag in the driver that says some device is in the
> middle of going down, letting all tasks know to bail.  Again, this can
> drop work on the floor.  I suppose if our own ADMIN_UP flag doesn't go
> away, then maybe after a few tries on the rtnl lock we can queue our own
> task back up as a delayed work and return and avoid dropping work on the
> floor that way.  But I'm not 100% convinced that we won't cause other
> problems.
> 
> Or the method this patch attempts to use, which is when we bring an
> interface up, create a workqueue specifically for that interface, so
> that when we take it back down, we are flushing only those tasks
> associated with our interface.  In addition, keep the global workqueue,
> but now limit it to only flush tasks.  In this way, the flush tasks can
> always flush the device specific work queues without having deadlock
> issues.
> 
> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Acked-by: Alex Estrin <alex.estrin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> ---
>  drivers/infiniband/ulp/ipoib/ipoib.h           |  1 +
>  drivers/infiniband/ulp/ipoib/ipoib_cm.c        | 18 +++++++++---------
>  drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  6 +++---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c      | 19 ++++++++++++-------
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 26 ++++++++++++--------------
>  drivers/infiniband/ulp/ipoib/ipoib_verbs.c     | 22 +++++++++++++++++++++-
>  6 files changed, 58 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
> b/drivers/infiniband/ulp/ipoib/ipoib.h
> index f4c1b20b23b..45fd10a72ec 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -323,6 +323,7 @@ struct ipoib_dev_priv {
>  	struct list_head multicast_list;
>  	struct rb_root multicast_tree;
> 
> +	struct workqueue_struct *wq;
>  	struct delayed_work mcast_task;
>  	struct work_struct carrier_on_task;
>  	struct work_struct flush_light;
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 933efcea0d0..56959adb6c7 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -474,7 +474,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct
> ib_cm_event *even
>  	}
> 
>  	spin_lock_irq(&priv->lock);
> -	queue_delayed_work(ipoib_workqueue,
> +	queue_delayed_work(priv->wq,
>  			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	/* Add this entry to passive ids list head, but do not re-add it
>  	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
> @@ -576,7 +576,7 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc
> *wc)
>  			spin_lock_irqsave(&priv->lock, flags);
>  			list_splice_init(&priv->cm.rx_drain_list, &priv-
> >cm.rx_reap_list);
>  			ipoib_cm_start_rx_drain(priv);
> -			queue_work(ipoib_workqueue, &priv->cm.rx_reap_task);
> +			queue_work(priv->wq, &priv->cm.rx_reap_task);
>  			spin_unlock_irqrestore(&priv->lock, flags);
>  		} else
>  			ipoib_warn(priv, "cm recv completion event with wrid %d (>
> %d)\n",
> @@ -603,7 +603,7 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc
> *wc)
>  				spin_lock_irqsave(&priv->lock, flags);
>  				list_move(&p->list, &priv->cm.rx_reap_list);
>  				spin_unlock_irqrestore(&priv->lock, flags);
> -				queue_work(ipoib_workqueue, &priv->cm.rx_reap_task);
> +				queue_work(priv->wq, &priv->cm.rx_reap_task);
>  			}
>  			return;
>  		}
> @@ -827,7 +827,7 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc
> *wc)
> 
>  		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
>  			list_move(&tx->list, &priv->cm.reap_list);
> -			queue_work(ipoib_workqueue, &priv->cm.reap_task);
> +			queue_work(priv->wq, &priv->cm.reap_task);
>  		}
> 
>  		clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags);
> @@ -1255,7 +1255,7 @@ static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
> 
>  		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
>  			list_move(&tx->list, &priv->cm.reap_list);
> -			queue_work(ipoib_workqueue, &priv->cm.reap_task);
> +			queue_work(priv->wq, &priv->cm.reap_task);
>  		}
> 
>  		spin_unlock_irqrestore(&priv->lock, flags);
> @@ -1284,7 +1284,7 @@ struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device
> *dev, struct ipoib_path
>  	tx->dev = dev;
>  	list_add(&tx->list, &priv->cm.start_list);
>  	set_bit(IPOIB_FLAG_INITIALIZED, &tx->flags);
> -	queue_work(ipoib_workqueue, &priv->cm.start_task);
> +	queue_work(priv->wq, &priv->cm.start_task);
>  	return tx;
>  }
> 
> @@ -1295,7 +1295,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
>  	if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
>  		spin_lock_irqsave(&priv->lock, flags);
>  		list_move(&tx->list, &priv->cm.reap_list);
> -		queue_work(ipoib_workqueue, &priv->cm.reap_task);
> +		queue_work(priv->wq, &priv->cm.reap_task);
>  		ipoib_dbg(priv, "Reap connection for gid %pI6\n",
>  			  tx->neigh->daddr + 4);
>  		tx->neigh = NULL;
> @@ -1417,7 +1417,7 @@ void ipoib_cm_skb_too_long(struct net_device *dev, struct
> sk_buff *skb,
> 
>  	skb_queue_tail(&priv->cm.skb_queue, skb);
>  	if (e)
> -		queue_work(ipoib_workqueue, &priv->cm.skb_task);
> +		queue_work(priv->wq, &priv->cm.skb_task);
>  }
> 
>  static void ipoib_cm_rx_reap(struct work_struct *work)
> @@ -1450,7 +1450,7 @@ static void ipoib_cm_stale_task(struct work_struct *work)
>  	}
> 
>  	if (!list_empty(&priv->cm.passive_ids))
> -		queue_delayed_work(ipoib_workqueue,
> +		queue_delayed_work(priv->wq,
>  				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	spin_unlock_irq(&priv->lock);
>  }
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index 72626c34817..bfd17d41b5f 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -655,7 +655,7 @@ void ipoib_reap_ah(struct work_struct *work)
>  	__ipoib_reap_ah(dev);
> 
>  	if (!test_bit(IPOIB_STOP_REAPER, &priv->flags))
> -		queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task,
> +		queue_delayed_work(priv->wq, &priv->ah_reap_task,
>  				   round_jiffies_relative(HZ));
>  }
> 
> @@ -696,7 +696,7 @@ int ipoib_ib_dev_open(struct net_device *dev, int flush)
>  	}
> 
>  	clear_bit(IPOIB_STOP_REAPER, &priv->flags);
> -	queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task,
> +	queue_delayed_work(priv->wq, &priv->ah_reap_task,
>  			   round_jiffies_relative(HZ));
> 
>  	if (!test_and_set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
> @@ -881,7 +881,7 @@ timeout:
>  	set_bit(IPOIB_STOP_REAPER, &priv->flags);
>  	cancel_delayed_work(&priv->ah_reap_task);
>  	if (flush)
> -		flush_workqueue(ipoib_workqueue);
> +		flush_workqueue(priv->wq);
> 
>  	begin = jiffies;
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 2cf81ef5141..42e5c278f48 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -839,7 +839,7 @@ static void ipoib_set_mcast_list(struct net_device *dev)
>  		return;
>  	}
> 
> -	queue_work(ipoib_workqueue, &priv->restart_task);
> +	queue_work(priv->wq, &priv->restart_task);
>  }
> 
>  static u32 ipoib_addr_hash(struct ipoib_neigh_hash *htbl, u8 *daddr)
> @@ -954,7 +954,7 @@ static void ipoib_reap_neigh(struct work_struct *work)
>  	__ipoib_reap_neigh(priv);
> 
>  	if (!test_bit(IPOIB_STOP_NEIGH_GC, &priv->flags))
> -		queue_delayed_work(ipoib_workqueue, &priv->neigh_reap_task,
> +		queue_delayed_work(priv->wq, &priv->neigh_reap_task,
>  				   arp_tbl.gc_interval);
>  }
> 
> @@ -1133,7 +1133,7 @@ static int ipoib_neigh_hash_init(struct ipoib_dev_priv *priv)
> 
>  	/* start garbage collection */
>  	clear_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
> -	queue_delayed_work(ipoib_workqueue, &priv->neigh_reap_task,
> +	queue_delayed_work(priv->wq, &priv->neigh_reap_task,
>  			   arp_tbl.gc_interval);
> 
>  	return 0;
> @@ -1293,7 +1293,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device
> *ca, int port)
>  	return 0;
> 
>  out_dev_uninit:
> -	ipoib_ib_dev_cleanup();
> +	ipoib_ib_dev_cleanup(dev);
> 
>  out_tx_ring_cleanup:
>  	vfree(priv->tx_ring);
> @@ -1646,7 +1646,7 @@ register_failed:
>  	/* Stop GC if started before flush */
>  	set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
>  	cancel_delayed_work(&priv->neigh_reap_task);
> -	flush_workqueue(ipoib_workqueue);
> +	flush_workqueue(priv->wq);
> 
>  event_failed:
>  	ipoib_dev_cleanup(priv->dev);
> @@ -1717,7 +1717,7 @@ static void ipoib_remove_one(struct ib_device *device)
>  		/* Stop GC */
>  		set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
>  		cancel_delayed_work(&priv->neigh_reap_task);
> -		flush_workqueue(ipoib_workqueue);
> +		flush_workqueue(priv->wq);
> 
>  		unregister_netdev(priv->dev);
>  		free_netdev(priv->dev);
> @@ -1758,8 +1758,13 @@ static int __init ipoib_init_module(void)
>  	 * unregister_netdev() and linkwatch_event take the rtnl lock,
>  	 * so flush_scheduled_work() can deadlock during device
>  	 * removal.
> +	 *
> +	 * In addition, bringing one device up and another down at the
> +	 * same time can deadlock a single workqueue, so we have this
> +	 * global fallback workqueue, but we also attempt to open a
> +	 * per device workqueue each time we bring an interface up
>  	 */
> -	ipoib_workqueue = create_singlethread_workqueue("ipoib");
> +	ipoib_workqueue = create_singlethread_workqueue("ipoib_flush");
>  	if (!ipoib_workqueue) {
>  		ret = -ENOMEM;
>  		goto err_fs;
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index 41325960e4e..845f910eb21 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -388,7 +388,7 @@ void ipoib_mcast_carrier_on_task(struct work_struct *work)
>  	 * the workqueue while holding the rtnl lock, so loop
>  	 * on trylock until either we get the lock or we see
>  	 * FLAG_ADMIN_UP go away as that signals that we are bailing
> -	 * and can safely ignore the carrier on work
> +	 * and can safely ignore the carrier on work.
>  	 */
>  	while (!rtnl_trylock()) {
>  		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
> @@ -432,15 +432,14 @@ static int ipoib_mcast_join_complete(int status,
>  	if (!status) {
>  		mcast->backoff = 1;
>  		if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
> -			queue_delayed_work(ipoib_workqueue,
> -					   &priv->mcast_task, 0);
> +			queue_delayed_work(priv->wq, &priv->mcast_task, 0);
> 
>  		/*
> -		 * Defer carrier on work to ipoib_workqueue to avoid a
> +		 * Defer carrier on work to priv->wq to avoid a
>  		 * deadlock on rtnl_lock here.
>  		 */
>  		if (mcast == priv->broadcast)
> -			queue_work(ipoib_workqueue, &priv->carrier_on_task);
> +			queue_work(priv->wq, &priv->carrier_on_task);
>  	} else {
>  		if (mcast->logcount++ < 20) {
>  			if (status == -ETIMEDOUT || status == -EAGAIN) {
> @@ -465,7 +464,7 @@ out:
>  	if (status == -ENETRESET)
>  		status = 0;
>  	if (status && test_bit(IPOIB_MCAST_RUN, &priv->flags))
> -		queue_delayed_work(ipoib_workqueue, &priv->mcast_task,
> +		queue_delayed_work(priv->wq, &priv->mcast_task,
>  				   mcast->backoff * HZ);
>  	spin_unlock_irq(&priv->lock);
>  	mutex_unlock(&mcast_mutex);
> @@ -535,8 +534,7 @@ static void ipoib_mcast_join(struct net_device *dev, struct
> ipoib_mcast *mcast,
>  			mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
> 
>  		if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
> -			queue_delayed_work(ipoib_workqueue,
> -					   &priv->mcast_task,
> +			queue_delayed_work(priv->wq, &priv->mcast_task,
>  					   mcast->backoff * HZ);
>  	}
>  	mutex_unlock(&mcast_mutex);
> @@ -576,8 +574,8 @@ void ipoib_mcast_join_task(struct work_struct *work)
>  			ipoib_warn(priv, "failed to allocate broadcast group\n");
>  			mutex_lock(&mcast_mutex);
>  			if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
> -				queue_delayed_work(ipoib_workqueue,
> -						   &priv->mcast_task, HZ);
> +				queue_delayed_work(priv->wq, &priv->mcast_task,
> +						   HZ);
>  			mutex_unlock(&mcast_mutex);
>  			return;
>  		}
> @@ -644,7 +642,7 @@ int ipoib_mcast_start_thread(struct net_device *dev)
> 
>  	mutex_lock(&mcast_mutex);
>  	if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
> -		queue_delayed_work(ipoib_workqueue, &priv->mcast_task, 0);
> +		queue_delayed_work(priv->wq, &priv->mcast_task, 0);
>  	mutex_unlock(&mcast_mutex);
> 
>  	return 0;
> @@ -662,7 +660,7 @@ int ipoib_mcast_stop_thread(struct net_device *dev, int flush)
>  	mutex_unlock(&mcast_mutex);
> 
>  	if (flush)
> -		flush_workqueue(ipoib_workqueue);
> +		flush_workqueue(priv->wq);
> 
>  	return 0;
>  }
> @@ -729,7 +727,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct
> sk_buff *skb)
>  		__ipoib_mcast_add(dev, mcast);
>  		list_add_tail(&mcast->list, &priv->multicast_list);
>  		if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
> -			queue_delayed_work(ipoib_workqueue, &priv->mcast_task, 0);
> +			queue_delayed_work(priv->wq, &priv->mcast_task, 0);
>  	}
> 
>  	if (!mcast->ah) {
> @@ -944,7 +942,7 @@ void ipoib_mcast_restart_task(struct work_struct *work)
>  	 * completes.  So do like the carrier on task and attempt to
>  	 * take the rtnl lock, but if we can't before the ADMIN_UP flag
>  	 * goes away, then just return and know that the remove list will
> -	 * get flushed later by mcast_dev_flush.
> +	 * get flushed later by mcast_stop_thread.
>  	 */
>  	while (!rtnl_trylock()) {
>  		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> index c56d5d44c53..b72a753eb41 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> @@ -145,10 +145,20 @@ int ipoib_transport_dev_init(struct net_device *dev, struct
> ib_device *ca)
>  	int ret, size;
>  	int i;
> 
> +	/*
> +	 * the various IPoIB tasks assume they will never race against
> +	 * themselves, so always use a single thread workqueue
> +	 */
> +	priv->wq = create_singlethread_workqueue("ipoib_wq");
> +	if (!priv->wq) {
> +		printk(KERN_WARNING "ipoib: failed to allocate device WQ\n");
> +		return -ENODEV;
> +	}
> +
>  	priv->pd = ib_alloc_pd(priv->ca);
>  	if (IS_ERR(priv->pd)) {
>  		printk(KERN_WARNING "%s: failed to allocate PD\n", ca->name);
> -		return -ENODEV;
> +		goto out_free_wq;
>  	}
> 
>  	priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE);
> @@ -242,6 +252,10 @@ out_free_mr:
> 
>  out_free_pd:
>  	ib_dealloc_pd(priv->pd);
> +
> +out_free_wq:
> +	destroy_workqueue(priv->wq);
> +	priv->wq = NULL;
>  	return -ENODEV;
>  }
> 
> @@ -270,6 +284,12 @@ void ipoib_transport_dev_cleanup(struct net_device *dev)
> 
>  	if (ib_dealloc_pd(priv->pd))
>  		ipoib_warn(priv, "ib_dealloc_pd failed\n");
> +
> +	if (priv->wq) {
> +		flush_workqueue(priv->wq);
> +		destroy_workqueue(priv->wq);
> +		priv->wq = NULL;
> +	}
>  }
> 
>  void ipoib_event(struct ib_event_handler *handler,
> --
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 07/10] IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2014-12-10 16:47   ` [PATCH 06/10] IPoIB: Use dedicated workqueues per interface Doug Ledford
@ 2014-12-10 16:47   ` Doug Ledford
       [not found]     ` <8c03e7569e7da863c0507dbf95e96ad06280b938.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:47   ` [PATCH 08/10] IPoIB: No longer use flush as a parameter Doug Ledford
                     ` (3 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:47 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

We used to pass a flush variable to mcast_stop_thread to indicate if we
should flush the workqueue or not.  This was due to some code trying to
flush a workqueue that it was currently running on which is a no-no.
Now that we have per-device work queues, and now that
ipoib_mcast_restart_task has taken the fact that it is queued on a
single thread workqueue with all of the ipoib_mcast_join_task's and
therefore has no need to stop the join task while it runs, we can do
away with the flush parameter and unilaterally flush always.

Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |  2 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  4 ++--
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 21 +++++++++------------
 3 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 45fd10a72ec..28dc927c0e8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -499,7 +499,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb);
 
 void ipoib_mcast_restart_task(struct work_struct *work);
 int ipoib_mcast_start_thread(struct net_device *dev);
-int ipoib_mcast_stop_thread(struct net_device *dev, int flush);
+int ipoib_mcast_stop_thread(struct net_device *dev);
 
 void ipoib_mcast_dev_down(struct net_device *dev);
 void ipoib_mcast_dev_flush(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index bfd17d41b5f..66096787119 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -747,7 +747,7 @@ int ipoib_ib_dev_down(struct net_device *dev, int flush)
 	clear_bit(IPOIB_FLAG_OPER_UP, &priv->flags);
 	netif_carrier_off(dev);
 
-	ipoib_mcast_stop_thread(dev, flush);
+	ipoib_mcast_stop_thread(dev);
 	ipoib_mcast_dev_flush(dev);
 
 	ipoib_flush_paths(dev);
@@ -1097,7 +1097,7 @@ void ipoib_ib_dev_cleanup(struct net_device *dev)
 	 */
 	ipoib_flush_paths(dev);
 
-	ipoib_mcast_stop_thread(dev, 1);
+	ipoib_mcast_stop_thread(dev);
 	ipoib_mcast_dev_flush(dev);
 
 	ipoib_transport_dev_cleanup(dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 845f910eb21..bc50dd0d0e4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -648,7 +648,7 @@ int ipoib_mcast_start_thread(struct net_device *dev)
 	return 0;
 }
 
-int ipoib_mcast_stop_thread(struct net_device *dev, int flush)
+int ipoib_mcast_stop_thread(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
@@ -659,8 +659,7 @@ int ipoib_mcast_stop_thread(struct net_device *dev, int flush)
 	cancel_delayed_work(&priv->mcast_task);
 	mutex_unlock(&mcast_mutex);
 
-	if (flush)
-		flush_workqueue(priv->wq);
+	flush_workqueue(priv->wq);
 
 	return 0;
 }
@@ -838,8 +837,6 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 
 	ipoib_dbg_mcast(priv, "restarting multicast task\n");
 
-	ipoib_mcast_stop_thread(dev, 0);
-
 	local_irq_save(flags);
 	netif_addr_lock(dev);
 	spin_lock(&priv->lock);
@@ -936,13 +933,10 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 	 * We have to cancel outside of the spinlock, but we have to
 	 * take the rtnl lock or else we race with the removal of
 	 * entries from the remove list in mcast_dev_flush as part
-	 * of ipoib_stop() which will call mcast_stop_thread with
-	 * flush == 1 while holding the rtnl lock, and the
-	 * flush_workqueue won't complete until this restart_mcast_task
-	 * completes.  So do like the carrier on task and attempt to
-	 * take the rtnl lock, but if we can't before the ADMIN_UP flag
-	 * goes away, then just return and know that the remove list will
-	 * get flushed later by mcast_stop_thread.
+	 * of ipoib_stop().  We detect the drop of the ADMIN_UP flag
+	 * to signal that we have hit this particular race, and we
+	 * return since we know we don't need to do anything else
+	 * anyway.
 	 */
 	while (!rtnl_trylock()) {
 		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
@@ -954,6 +948,9 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 		ipoib_mcast_leave(mcast->dev, mcast);
 		ipoib_mcast_free(mcast);
 	}
+	/*
+	 * Restart our join task if needed
+	 */
 	ipoib_mcast_start_thread(dev);
 	rtnl_unlock();
 }
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

[parent not found: <8c03e7569e7da863c0507dbf95e96ad06280b938.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* RE: [PATCH 07/10] IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
       [not found]     ` <8c03e7569e7da863c0507dbf95e96ad06280b938.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-12-12 14:05       ` Estrin, Alex
  0 siblings, 0 replies; 20+ messages in thread
From: Estrin, Alex @ 2014-12-12 14:05 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On
> Behalf Of Doug Ledford
> Sent: Wednesday, December 10, 2014 11:47 AM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: Doug Ledford
> Subject: [PATCH 07/10] IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
> 
> We used to pass a flush variable to mcast_stop_thread to indicate if we
> should flush the workqueue or not.  This was due to some code trying to
> flush a workqueue that it was currently running on which is a no-no.
> Now that we have per-device work queues, and now that
> ipoib_mcast_restart_task has taken the fact that it is queued on a
> single thread workqueue with all of the ipoib_mcast_join_task's and
> therefore has no need to stop the join task while it runs, we can do
> away with the flush parameter and unilaterally flush always.
> 
> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Acked-by: Alex Estrin <alex.estrin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> ---
>  drivers/infiniband/ulp/ipoib/ipoib.h           |  2 +-
>  drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  4 ++--
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 21 +++++++++------------
>  3 files changed, 12 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
> b/drivers/infiniband/ulp/ipoib/ipoib.h
> index 45fd10a72ec..28dc927c0e8 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -499,7 +499,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct
> sk_buff *skb);
> 
>  void ipoib_mcast_restart_task(struct work_struct *work);
>  int ipoib_mcast_start_thread(struct net_device *dev);
> -int ipoib_mcast_stop_thread(struct net_device *dev, int flush);
> +int ipoib_mcast_stop_thread(struct net_device *dev);
> 
>  void ipoib_mcast_dev_down(struct net_device *dev);
>  void ipoib_mcast_dev_flush(struct net_device *dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index bfd17d41b5f..66096787119 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -747,7 +747,7 @@ int ipoib_ib_dev_down(struct net_device *dev, int flush)
>  	clear_bit(IPOIB_FLAG_OPER_UP, &priv->flags);
>  	netif_carrier_off(dev);
> 
> -	ipoib_mcast_stop_thread(dev, flush);
> +	ipoib_mcast_stop_thread(dev);
>  	ipoib_mcast_dev_flush(dev);
> 
>  	ipoib_flush_paths(dev);
> @@ -1097,7 +1097,7 @@ void ipoib_ib_dev_cleanup(struct net_device *dev)
>  	 */
>  	ipoib_flush_paths(dev);
> 
> -	ipoib_mcast_stop_thread(dev, 1);
> +	ipoib_mcast_stop_thread(dev);
>  	ipoib_mcast_dev_flush(dev);
> 
>  	ipoib_transport_dev_cleanup(dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index 845f910eb21..bc50dd0d0e4 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -648,7 +648,7 @@ int ipoib_mcast_start_thread(struct net_device *dev)
>  	return 0;
>  }
> 
> -int ipoib_mcast_stop_thread(struct net_device *dev, int flush)
> +int ipoib_mcast_stop_thread(struct net_device *dev)
>  {
>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
> 
> @@ -659,8 +659,7 @@ int ipoib_mcast_stop_thread(struct net_device *dev, int flush)
>  	cancel_delayed_work(&priv->mcast_task);
>  	mutex_unlock(&mcast_mutex);
> 
> -	if (flush)
> -		flush_workqueue(priv->wq);
> +	flush_workqueue(priv->wq);
> 
>  	return 0;
>  }
> @@ -838,8 +837,6 @@ void ipoib_mcast_restart_task(struct work_struct *work)
> 
>  	ipoib_dbg_mcast(priv, "restarting multicast task\n");
> 
> -	ipoib_mcast_stop_thread(dev, 0);
> -
>  	local_irq_save(flags);
>  	netif_addr_lock(dev);
>  	spin_lock(&priv->lock);
> @@ -936,13 +933,10 @@ void ipoib_mcast_restart_task(struct work_struct *work)
>  	 * We have to cancel outside of the spinlock, but we have to
>  	 * take the rtnl lock or else we race with the removal of
>  	 * entries from the remove list in mcast_dev_flush as part
> -	 * of ipoib_stop() which will call mcast_stop_thread with
> -	 * flush == 1 while holding the rtnl lock, and the
> -	 * flush_workqueue won't complete until this restart_mcast_task
> -	 * completes.  So do like the carrier on task and attempt to
> -	 * take the rtnl lock, but if we can't before the ADMIN_UP flag
> -	 * goes away, then just return and know that the remove list will
> -	 * get flushed later by mcast_stop_thread.
> +	 * of ipoib_stop().  We detect the drop of the ADMIN_UP flag
> +	 * to signal that we have hit this particular race, and we
> +	 * return since we know we don't need to do anything else
> +	 * anyway.
>  	 */
>  	while (!rtnl_trylock()) {
>  		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
> @@ -954,6 +948,9 @@ void ipoib_mcast_restart_task(struct work_struct *work)
>  		ipoib_mcast_leave(mcast->dev, mcast);
>  		ipoib_mcast_free(mcast);
>  	}
> +	/*
> +	 * Restart our join task if needed
> +	 */
>  	ipoib_mcast_start_thread(dev);
>  	rtnl_unlock();
>  }
> --
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 08/10] IPoIB: No longer use flush as a parameter
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (6 preceding siblings ...)
  2014-12-10 16:47   ` [PATCH 07/10] IPoIB: Make ipoib_mcast_stop_thread flush the workqueue Doug Ledford
@ 2014-12-10 16:47   ` Doug Ledford
       [not found]     ` <28c67ea96dd0a0c3f39bfa78176537c3eb983c50.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2014-12-10 16:47   ` [PATCH 09/10] IB/srp: Use P_Key cache for P_Key lookups Doug Ledford
                     ` (2 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:47 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

Various places in the IPoIB code had a deadlock related to flushing the
ipoib workqueue.  Now that we have per device workqueues and a specific
flush workqueue, there is no longer a deadlock issue with flushing the
device specific workqueues and we can do so unilaterally.

Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  6 +++---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   | 19 +++++++++----------
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  8 ++++----
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 28dc927c0e8..8ba80a6d3a4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -484,10 +484,10 @@ void ipoib_ib_dev_flush_heavy(struct work_struct *work);
 void ipoib_pkey_event(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
 
-int ipoib_ib_dev_open(struct net_device *dev, int flush);
+int ipoib_ib_dev_open(struct net_device *dev);
 int ipoib_ib_dev_up(struct net_device *dev);
-int ipoib_ib_dev_down(struct net_device *dev, int flush);
-int ipoib_ib_dev_stop(struct net_device *dev, int flush);
+int ipoib_ib_dev_down(struct net_device *dev);
+int ipoib_ib_dev_stop(struct net_device *dev);
 void ipoib_pkey_dev_check_presence(struct net_device *dev);
 
 int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 66096787119..fe65abb5150 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -664,7 +664,7 @@ static void ipoib_ib_tx_timer_func(unsigned long ctx)
 	drain_tx_cq((struct net_device *)ctx);
 }
 
-int ipoib_ib_dev_open(struct net_device *dev, int flush)
+int ipoib_ib_dev_open(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	int ret;
@@ -706,7 +706,7 @@ int ipoib_ib_dev_open(struct net_device *dev, int flush)
 dev_stop:
 	if (!test_and_set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
 		napi_enable(&priv->napi);
-	ipoib_ib_dev_stop(dev, flush);
+	ipoib_ib_dev_stop(dev);
 	return -1;
 }
 
@@ -738,7 +738,7 @@ int ipoib_ib_dev_up(struct net_device *dev)
 	return ipoib_mcast_start_thread(dev);
 }
 
-int ipoib_ib_dev_down(struct net_device *dev, int flush)
+int ipoib_ib_dev_down(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
@@ -807,7 +807,7 @@ void ipoib_drain_cq(struct net_device *dev)
 	local_bh_enable();
 }
 
-int ipoib_ib_dev_stop(struct net_device *dev, int flush)
+int ipoib_ib_dev_stop(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_qp_attr qp_attr;
@@ -880,8 +880,7 @@ timeout:
 	/* Wait for all AHs to be reaped */
 	set_bit(IPOIB_STOP_REAPER, &priv->flags);
 	cancel_delayed_work(&priv->ah_reap_task);
-	if (flush)
-		flush_workqueue(priv->wq);
+	flush_workqueue(priv->wq);
 
 	begin = jiffies;
 
@@ -918,7 +917,7 @@ int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 		    (unsigned long) dev);
 
 	if (dev->flags & IFF_UP) {
-		if (ipoib_ib_dev_open(dev, 1)) {
+		if (ipoib_ib_dev_open(dev)) {
 			ipoib_transport_dev_cleanup(dev);
 			return -ENODEV;
 		}
@@ -1040,12 +1039,12 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 	}
 
 	if (level >= IPOIB_FLUSH_NORMAL)
-		ipoib_ib_dev_down(dev, 0);
+		ipoib_ib_dev_down(dev);
 
 	if (level == IPOIB_FLUSH_HEAVY) {
 		if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
-			ipoib_ib_dev_stop(dev, 0);
-		if (ipoib_ib_dev_open(dev, 0) != 0)
+			ipoib_ib_dev_stop(dev);
+		if (ipoib_ib_dev_open(dev) != 0)
 			return;
 		if (netif_queue_stopped(dev))
 			netif_start_queue(dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 42e5c278f48..6bad17d4d58 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -108,7 +108,7 @@ int ipoib_open(struct net_device *dev)
 
 	set_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
 
-	if (ipoib_ib_dev_open(dev, 1)) {
+	if (ipoib_ib_dev_open(dev)) {
 		if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags))
 			return 0;
 		goto err_disable;
@@ -139,7 +139,7 @@ int ipoib_open(struct net_device *dev)
 	return 0;
 
 err_stop:
-	ipoib_ib_dev_stop(dev, 1);
+	ipoib_ib_dev_stop(dev);
 
 err_disable:
 	clear_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
@@ -157,8 +157,8 @@ static int ipoib_stop(struct net_device *dev)
 
 	netif_stop_queue(dev);
 
-	ipoib_ib_dev_down(dev, 1);
-	ipoib_ib_dev_stop(dev, 0);
+	ipoib_ib_dev_down(dev);
+	ipoib_ib_dev_stop(dev);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
 		struct ipoib_dev_priv *cpriv;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

[parent not found: <28c67ea96dd0a0c3f39bfa78176537c3eb983c50.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* RE: [PATCH 08/10] IPoIB: No longer use flush as a parameter
       [not found]     ` <28c67ea96dd0a0c3f39bfa78176537c3eb983c50.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-12-12 14:05       ` Estrin, Alex
  0 siblings, 0 replies; 20+ messages in thread
From: Estrin, Alex @ 2014-12-12 14:05 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On
> Behalf Of Doug Ledford
> Sent: Wednesday, December 10, 2014 11:47 AM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: Doug Ledford
> Subject: [PATCH 08/10] IPoIB: No longer use flush as a parameter
> 
> Various places in the IPoIB code had a deadlock related to flushing the
> ipoib workqueue.  Now that we have per device workqueues and a specific
> flush workqueue, there is no longer a deadlock issue with flushing the
> device specific workqueues and we can do so unilaterally.
> 
> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Acked-by: Alex Estrin <alex.estrin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> ---
>  drivers/infiniband/ulp/ipoib/ipoib.h      |  6 +++---
>  drivers/infiniband/ulp/ipoib/ipoib_ib.c   | 19 +++++++++----------
>  drivers/infiniband/ulp/ipoib/ipoib_main.c |  8 ++++----
>  3 files changed, 16 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
> b/drivers/infiniband/ulp/ipoib/ipoib.h
> index 28dc927c0e8..8ba80a6d3a4 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -484,10 +484,10 @@ void ipoib_ib_dev_flush_heavy(struct work_struct *work);
>  void ipoib_pkey_event(struct work_struct *work);
>  void ipoib_ib_dev_cleanup(struct net_device *dev);
> 
> -int ipoib_ib_dev_open(struct net_device *dev, int flush);
> +int ipoib_ib_dev_open(struct net_device *dev);
>  int ipoib_ib_dev_up(struct net_device *dev);
> -int ipoib_ib_dev_down(struct net_device *dev, int flush);
> -int ipoib_ib_dev_stop(struct net_device *dev, int flush);
> +int ipoib_ib_dev_down(struct net_device *dev);
> +int ipoib_ib_dev_stop(struct net_device *dev);
>  void ipoib_pkey_dev_check_presence(struct net_device *dev);
> 
>  int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index 66096787119..fe65abb5150 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -664,7 +664,7 @@ static void ipoib_ib_tx_timer_func(unsigned long ctx)
>  	drain_tx_cq((struct net_device *)ctx);
>  }
> 
> -int ipoib_ib_dev_open(struct net_device *dev, int flush)
> +int ipoib_ib_dev_open(struct net_device *dev)
>  {
>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
>  	int ret;
> @@ -706,7 +706,7 @@ int ipoib_ib_dev_open(struct net_device *dev, int flush)
>  dev_stop:
>  	if (!test_and_set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
>  		napi_enable(&priv->napi);
> -	ipoib_ib_dev_stop(dev, flush);
> +	ipoib_ib_dev_stop(dev);
>  	return -1;
>  }
> 
> @@ -738,7 +738,7 @@ int ipoib_ib_dev_up(struct net_device *dev)
>  	return ipoib_mcast_start_thread(dev);
>  }
> 
> -int ipoib_ib_dev_down(struct net_device *dev, int flush)
> +int ipoib_ib_dev_down(struct net_device *dev)
>  {
>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
> 
> @@ -807,7 +807,7 @@ void ipoib_drain_cq(struct net_device *dev)
>  	local_bh_enable();
>  }
> 
> -int ipoib_ib_dev_stop(struct net_device *dev, int flush)
> +int ipoib_ib_dev_stop(struct net_device *dev)
>  {
>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
>  	struct ib_qp_attr qp_attr;
> @@ -880,8 +880,7 @@ timeout:
>  	/* Wait for all AHs to be reaped */
>  	set_bit(IPOIB_STOP_REAPER, &priv->flags);
>  	cancel_delayed_work(&priv->ah_reap_task);
> -	if (flush)
> -		flush_workqueue(priv->wq);
> +	flush_workqueue(priv->wq);
> 
>  	begin = jiffies;
> 
> @@ -918,7 +917,7 @@ int ipoib_ib_dev_init(struct net_device *dev, struct ib_device
> *ca, int port)
>  		    (unsigned long) dev);
> 
>  	if (dev->flags & IFF_UP) {
> -		if (ipoib_ib_dev_open(dev, 1)) {
> +		if (ipoib_ib_dev_open(dev)) {
>  			ipoib_transport_dev_cleanup(dev);
>  			return -ENODEV;
>  		}
> @@ -1040,12 +1039,12 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv
> *priv,
>  	}
> 
>  	if (level >= IPOIB_FLUSH_NORMAL)
> -		ipoib_ib_dev_down(dev, 0);
> +		ipoib_ib_dev_down(dev);
> 
>  	if (level == IPOIB_FLUSH_HEAVY) {
>  		if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
> -			ipoib_ib_dev_stop(dev, 0);
> -		if (ipoib_ib_dev_open(dev, 0) != 0)
> +			ipoib_ib_dev_stop(dev);
> +		if (ipoib_ib_dev_open(dev) != 0)
>  			return;
>  		if (netif_queue_stopped(dev))
>  			netif_start_queue(dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 42e5c278f48..6bad17d4d58 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -108,7 +108,7 @@ int ipoib_open(struct net_device *dev)
> 
>  	set_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
> 
> -	if (ipoib_ib_dev_open(dev, 1)) {
> +	if (ipoib_ib_dev_open(dev)) {
>  		if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags))
>  			return 0;
>  		goto err_disable;
> @@ -139,7 +139,7 @@ int ipoib_open(struct net_device *dev)
>  	return 0;
> 
>  err_stop:
> -	ipoib_ib_dev_stop(dev, 1);
> +	ipoib_ib_dev_stop(dev);
> 
>  err_disable:
>  	clear_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
> @@ -157,8 +157,8 @@ static int ipoib_stop(struct net_device *dev)
> 
>  	netif_stop_queue(dev);
> 
> -	ipoib_ib_dev_down(dev, 1);
> -	ipoib_ib_dev_stop(dev, 0);
> +	ipoib_ib_dev_down(dev);
> +	ipoib_ib_dev_stop(dev);
> 
>  	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
>  		struct ipoib_dev_priv *cpriv;
> --
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 09/10] IB/srp: Use P_Key cache for P_Key lookups
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (7 preceding siblings ...)
  2014-12-10 16:47   ` [PATCH 08/10] IPoIB: No longer use flush as a parameter Doug Ledford
@ 2014-12-10 16:47   ` Doug Ledford
  2014-12-10 16:47   ` [PATCH 10/10] ib_srpt: convert printk's to pr_* functions Doug Ledford
  2014-12-12 14:01   ` [PATCH 00/10] IPoIB locking fixes and a couple minor other patches Estrin, Alex
  10 siblings, 0 replies; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:47 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Bart Van Assche, Sebastian Parschauer, Doug Ledford

From: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>

This change slightly reduces the time needed to log in.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 5461924c9f1..8fa3da255cf 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -40,6 +40,7 @@
 #include <linux/parser.h>
 #include <linux/random.h>
 #include <linux/jiffies.h>
+#include <rdma/ib_cache.h>
 
 #include <linux/atomic.h>
 
@@ -265,10 +266,10 @@ static int srp_init_qp(struct srp_target_port *target,
 	if (!attr)
 		return -ENOMEM;
 
-	ret = ib_find_pkey(target->srp_host->srp_dev->dev,
-			   target->srp_host->port,
-			   be16_to_cpu(target->pkey),
-			   &attr->pkey_index);
+	ret = ib_find_cached_pkey(target->srp_host->srp_dev->dev,
+				  target->srp_host->port,
+				  be16_to_cpu(target->pkey),
+				  &attr->pkey_index);
 	if (ret)
 		goto out;
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 10/10] ib_srpt: convert printk's to pr_* functions
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (8 preceding siblings ...)
  2014-12-10 16:47   ` [PATCH 09/10] IB/srp: Use P_Key cache for P_Key lookups Doug Ledford
@ 2014-12-10 16:47   ` Doug Ledford
  2014-12-12 14:01   ` [PATCH 00/10] IPoIB locking fixes and a couple minor other patches Estrin, Alex
  10 siblings, 0 replies; 20+ messages in thread
From: Doug Ledford @ 2014-12-10 16:47 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: Doug Ledford

The driver already defined the pr_format, it just hadn't
been converted to use pr_info, pr_warn, and pr_err instead
of the equivalent printks.  Convert so that messages from
the driver are now properly tagged with their driver name
and can be more easily debugged.

In addition, a number of these printk's were not newline
terminated, so fix that at the same time.

Reviewed-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 188 ++++++++++++++++------------------
 1 file changed, 91 insertions(+), 97 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index dc829682701..a40367556d7 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -207,7 +207,7 @@ static void srpt_event_handler(struct ib_event_handler *handler,
 		}
 		break;
 	default:
-		printk(KERN_ERR "received unrecognized IB event %d\n",
+		pr_err("received unrecognized IB event %d\n",
 		       event->event);
 		break;
 	}
@@ -218,7 +218,7 @@ static void srpt_event_handler(struct ib_event_handler *handler,
  */
 static void srpt_srq_event(struct ib_event *event, void *ctx)
 {
-	printk(KERN_INFO "SRQ event %d\n", event->event);
+	pr_info("SRQ event %d\n", event->event);
 }
 
 /**
@@ -242,8 +242,7 @@ static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch)
 				 ch->sess_name, srpt_get_ch_state(ch));
 		break;
 	default:
-		printk(KERN_ERR "received unrecognized IB QP event %d\n",
-		       event->event);
+		pr_err("received unrecognized IB QP event %d\n", event->event);
 		break;
 	}
 }
@@ -602,7 +601,7 @@ static void srpt_unregister_mad_agent(struct srpt_device *sdev)
 		sport = &sdev->port[i - 1];
 		WARN_ON(sport->port != i);
 		if (ib_modify_port(sdev->device, i, 0, &port_modify) < 0)
-			printk(KERN_ERR "disabling MAD processing failed.\n");
+			pr_err("disabling MAD processing failed.\n");
 		if (sport->mad_agent) {
 			ib_unregister_mad_agent(sport->mad_agent);
 			sport->mad_agent = NULL;
@@ -810,7 +809,7 @@ static int srpt_post_send(struct srpt_rdma_ch *ch,
 
 	ret = -ENOMEM;
 	if (unlikely(atomic_dec_return(&ch->sq_wr_avail) < 0)) {
-		printk(KERN_WARNING "IB send queue full (needed 1)\n");
+		pr_warn("IB send queue full (needed 1)\n");
 		goto out;
 	}
 
@@ -912,7 +911,7 @@ static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx,
 
 		if (ioctx->n_rbuf >
 		    (srp_cmd->data_out_desc_cnt + srp_cmd->data_in_desc_cnt)) {
-			printk(KERN_ERR "received unsupported SRP_CMD request"
+			pr_err("received unsupported SRP_CMD request"
 			       " type (%u out + %u in != %u / %zu)\n",
 			       srp_cmd->data_out_desc_cnt,
 			       srp_cmd->data_in_desc_cnt,
@@ -1432,7 +1431,7 @@ static void srpt_handle_send_comp(struct srpt_rdma_ch *ch,
 		srpt_unmap_sg_to_ib_sge(ch, ioctx);
 		transport_generic_free_cmd(&ioctx->cmd, 0);
 	} else {
-		printk(KERN_ERR "IB completion has been received too late for"
+		pr_err("IB completion has been received too late for"
 		       " wr_id = %u.\n", ioctx->ioctx.index);
 	}
 }
@@ -1457,7 +1456,7 @@ static void srpt_handle_rdma_comp(struct srpt_rdma_ch *ch,
 						SRPT_STATE_DATA_IN))
 			target_execute_cmd(&ioctx->cmd);
 		else
-			printk(KERN_ERR "%s[%d]: wrong state = %d\n", __func__,
+			pr_err("%s[%d]: wrong state = %d\n", __func__,
 			       __LINE__, srpt_get_cmd_state(ioctx));
 	} else if (opcode == SRPT_RDMA_ABORT) {
 		ioctx->rdma_aborted = true;
@@ -1481,7 +1480,7 @@ static void srpt_handle_rdma_err_comp(struct srpt_rdma_ch *ch,
 	switch (opcode) {
 	case SRPT_RDMA_READ_LAST:
 		if (ioctx->n_rdma <= 0) {
-			printk(KERN_ERR "Received invalid RDMA read"
+			pr_err("Received invalid RDMA read"
 			       " error completion with idx %d\n",
 			       ioctx->ioctx.index);
 			break;
@@ -1490,14 +1489,13 @@ static void srpt_handle_rdma_err_comp(struct srpt_rdma_ch *ch,
 		if (state == SRPT_STATE_NEED_DATA)
 			srpt_abort_cmd(ioctx);
 		else
-			printk(KERN_ERR "%s[%d]: wrong state = %d\n",
+			pr_err("%s[%d]: wrong state = %d\n",
 			       __func__, __LINE__, state);
 		break;
 	case SRPT_RDMA_WRITE_LAST:
 		break;
 	default:
-		printk(KERN_ERR "%s[%d]: opcode = %u\n", __func__,
-		       __LINE__, opcode);
+		pr_err("%s[%d]: opcode = %u\n", __func__, __LINE__, opcode);
 		break;
 	}
 }
@@ -1549,8 +1547,8 @@ static int srpt_build_cmd_rsp(struct srpt_rdma_ch *ch,
 		BUILD_BUG_ON(MIN_MAX_RSP_SIZE <= sizeof(*srp_rsp));
 		max_sense_len = ch->max_ti_iu_len - sizeof(*srp_rsp);
 		if (sense_data_len > max_sense_len) {
-			printk(KERN_WARNING "truncated sense data from %d to %d"
-			       " bytes\n", sense_data_len, max_sense_len);
+			pr_warn("truncated sense data from %d to %d"
+				" bytes\n", sense_data_len, max_sense_len);
 			sense_data_len = max_sense_len;
 		}
 
@@ -1628,8 +1626,8 @@ static uint64_t srpt_unpack_lun(const uint8_t *lun, int len)
 	int addressing_method;
 
 	if (unlikely(len < 2)) {
-		printk(KERN_ERR "Illegal LUN length %d, expected 2 bytes or "
-		       "more", len);
+		pr_err("Illegal LUN length %d, expected 2 bytes or more\n",
+		       len);
 		goto out;
 	}
 
@@ -1663,7 +1661,7 @@ static uint64_t srpt_unpack_lun(const uint8_t *lun, int len)
 
 	case SCSI_LUN_ADDR_METHOD_EXTENDED_LUN:
 	default:
-		printk(KERN_ERR "Unimplemented LUN addressing method %u",
+		pr_err("Unimplemented LUN addressing method %u\n",
 		       addressing_method);
 		break;
 	}
@@ -1672,8 +1670,7 @@ out:
 	return res;
 
 out_err:
-	printk(KERN_ERR "Support for multi-level LUNs has not yet been"
-	       " implemented");
+	pr_err("Support for multi-level LUNs has not yet been implemented\n");
 	goto out;
 }
 
@@ -1723,7 +1720,7 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch,
 	}
 
 	if (srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &data_len)) {
-		printk(KERN_ERR "0x%llx: parsing SRP descriptor table failed.\n",
+		pr_err("0x%llx: parsing SRP descriptor table failed.\n",
 		       srp_cmd->tag);
 		ret = TCM_INVALID_CDB_FIELD;
 		goto send_sense;
@@ -1912,7 +1909,7 @@ static void srpt_handle_new_iu(struct srpt_rdma_ch *ch,
 		srpt_handle_tsk_mgmt(ch, recv_ioctx, send_ioctx);
 		break;
 	case SRP_I_LOGOUT:
-		printk(KERN_ERR "Not yet implemented: SRP_I_LOGOUT\n");
+		pr_err("Not yet implemented: SRP_I_LOGOUT\n");
 		break;
 	case SRP_CRED_RSP:
 		pr_debug("received SRP_CRED_RSP\n");
@@ -1921,10 +1918,10 @@ static void srpt_handle_new_iu(struct srpt_rdma_ch *ch,
 		pr_debug("received SRP_AER_RSP\n");
 		break;
 	case SRP_RSP:
-		printk(KERN_ERR "Received SRP_RSP\n");
+		pr_err("Received SRP_RSP\n");
 		break;
 	default:
-		printk(KERN_ERR "received IU with unknown opcode 0x%x\n",
+		pr_err("received IU with unknown opcode 0x%x\n",
 		       srp_cmd->opcode);
 		break;
 	}
@@ -1948,12 +1945,12 @@ static void srpt_process_rcv_completion(struct ib_cq *cq,
 
 		req_lim = atomic_dec_return(&ch->req_lim);
 		if (unlikely(req_lim < 0))
-			printk(KERN_ERR "req_lim = %d < 0\n", req_lim);
+			pr_err("req_lim = %d < 0\n", req_lim);
 		ioctx = sdev->ioctx_ring[index];
 		srpt_handle_new_iu(ch, ioctx, NULL);
 	} else {
-		printk(KERN_INFO "receiving failed for idx %u with status %d\n",
-		       index, wc->status);
+		pr_info("receiving failed for idx %u with status %d\n",
+			index, wc->status);
 	}
 }
 
@@ -1993,12 +1990,12 @@ static void srpt_process_send_completion(struct ib_cq *cq,
 		}
 	} else {
 		if (opcode == SRPT_SEND) {
-			printk(KERN_INFO "sending response for idx %u failed"
-			       " with status %d\n", index, wc->status);
+			pr_info("sending response for idx %u failed"
+				" with status %d\n", index, wc->status);
 			srpt_handle_send_err_comp(ch, wc->wr_id);
 		} else if (opcode != SRPT_RDMA_MID) {
-			printk(KERN_INFO "RDMA t %d for idx %u failed with"
-				" status %d", opcode, index, wc->status);
+			pr_info("RDMA t %d for idx %u failed with"
+				" status %d\n", opcode, index, wc->status);
 			srpt_handle_rdma_err_comp(ch, send_ioctx, opcode);
 		}
 	}
@@ -2062,15 +2059,15 @@ static int srpt_compl_thread(void *arg)
 
 	ch = arg;
 	BUG_ON(!ch);
-	printk(KERN_INFO "Session %s: kernel thread %s (PID %d) started\n",
-	       ch->sess_name, ch->thread->comm, current->pid);
+	pr_info("Session %s: kernel thread %s (PID %d) started\n",
+		ch->sess_name, ch->thread->comm, current->pid);
 	while (!kthread_should_stop()) {
 		wait_event_interruptible(ch->wait_queue,
 			(srpt_process_completion(ch->cq, ch),
 			 kthread_should_stop()));
 	}
-	printk(KERN_INFO "Session %s: kernel thread %s (PID %d) stopped\n",
-	       ch->sess_name, ch->thread->comm, current->pid);
+	pr_info("Session %s: kernel thread %s (PID %d) stopped\n",
+		ch->sess_name, ch->thread->comm, current->pid);
 	return 0;
 }
 
@@ -2097,7 +2094,7 @@ retry:
 			      ch->rq_size + srp_sq_size, 0);
 	if (IS_ERR(ch->cq)) {
 		ret = PTR_ERR(ch->cq);
-		printk(KERN_ERR "failed to create CQ cqe= %d ret= %d\n",
+		pr_err("failed to create CQ cqe= %d ret= %d\n",
 		       ch->rq_size + srp_sq_size, ret);
 		goto out;
 	}
@@ -2123,7 +2120,7 @@ retry:
 				goto retry;
 			}
 		}
-		printk(KERN_ERR "failed to create_qp ret= %d\n", ret);
+		pr_err("failed to create_qp ret= %d\n", ret);
 		goto err_destroy_cq;
 	}
 
@@ -2143,7 +2140,7 @@ retry:
 
 	ch->thread = kthread_run(srpt_compl_thread, ch, "ib_srpt_compl");
 	if (IS_ERR(ch->thread)) {
-		printk(KERN_ERR "failed to create kernel thread %ld\n",
+		pr_err("failed to create kernel thread %ld\n",
 		       PTR_ERR(ch->thread));
 		ch->thread = NULL;
 		goto err_destroy_qp;
@@ -2204,7 +2201,7 @@ static void __srpt_close_ch(struct srpt_rdma_ch *ch)
 		/* fall through */
 	case CH_LIVE:
 		if (ib_send_cm_dreq(ch->cm_id, NULL, 0) < 0)
-			printk(KERN_ERR "sending CM DREQ failed.\n");
+			pr_err("sending CM DREQ failed.\n");
 		break;
 	case CH_DISCONNECTING:
 		break;
@@ -2291,7 +2288,7 @@ static void srpt_drain_channel(struct ib_cm_id *cm_id)
 
 		ret = srpt_ch_qp_err(ch);
 		if (ret < 0)
-			printk(KERN_ERR "Setting queue pair in error state"
+			pr_err("Setting queue pair in error state"
 			       " failed: %d\n", ret);
 	}
 }
@@ -2435,17 +2432,17 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 
 	it_iu_len = be32_to_cpu(req->req_it_iu_len);
 
-	printk(KERN_INFO "Received SRP_LOGIN_REQ with i_port_id 0x%llx:0x%llx,"
-	       " t_port_id 0x%llx:0x%llx and it_iu_len %d on port %d"
-	       " (guid=0x%llx:0x%llx)\n",
-	       be64_to_cpu(*(__be64 *)&req->initiator_port_id[0]),
-	       be64_to_cpu(*(__be64 *)&req->initiator_port_id[8]),
-	       be64_to_cpu(*(__be64 *)&req->target_port_id[0]),
-	       be64_to_cpu(*(__be64 *)&req->target_port_id[8]),
-	       it_iu_len,
-	       param->port,
-	       be64_to_cpu(*(__be64 *)&sdev->port[param->port - 1].gid.raw[0]),
-	       be64_to_cpu(*(__be64 *)&sdev->port[param->port - 1].gid.raw[8]));
+	pr_info("Received SRP_LOGIN_REQ with i_port_id 0x%llx:0x%llx,"
+		" t_port_id 0x%llx:0x%llx and it_iu_len %d on port %d"
+		" (guid=0x%llx:0x%llx)\n",
+		be64_to_cpu(*(__be64 *)&req->initiator_port_id[0]),
+		be64_to_cpu(*(__be64 *)&req->initiator_port_id[8]),
+		be64_to_cpu(*(__be64 *)&req->target_port_id[0]),
+		be64_to_cpu(*(__be64 *)&req->target_port_id[8]),
+		it_iu_len,
+		param->port,
+		be64_to_cpu(*(__be64 *)&sdev->port[param->port - 1].gid.raw[0]),
+		be64_to_cpu(*(__be64 *)&sdev->port[param->port - 1].gid.raw[8]));
 
 	rsp = kzalloc(sizeof *rsp, GFP_KERNEL);
 	rej = kzalloc(sizeof *rej, GFP_KERNEL);
@@ -2460,7 +2457,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 		rej->reason = __constant_cpu_to_be32(
 				SRP_LOGIN_REJ_REQ_IT_IU_LENGTH_TOO_LARGE);
 		ret = -EINVAL;
-		printk(KERN_ERR "rejected SRP_LOGIN_REQ because its"
+		pr_err("rejected SRP_LOGIN_REQ because its"
 		       " length (%d bytes) is out of range (%d .. %d)\n",
 		       it_iu_len, 64, srp_max_req_size);
 		goto reject;
@@ -2470,7 +2467,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 		rej->reason = __constant_cpu_to_be32(
 			     SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES);
 		ret = -EINVAL;
-		printk(KERN_ERR "rejected SRP_LOGIN_REQ because the target port"
+		pr_err("rejected SRP_LOGIN_REQ because the target port"
 		       " has not yet been enabled\n");
 		goto reject;
 	}
@@ -2516,7 +2513,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 		rej->reason = __constant_cpu_to_be32(
 				SRP_LOGIN_REJ_UNABLE_ASSOCIATE_CHANNEL);
 		ret = -ENOMEM;
-		printk(KERN_ERR "rejected SRP_LOGIN_REQ because it"
+		pr_err("rejected SRP_LOGIN_REQ because it"
 		       " has an invalid target port identifier.\n");
 		goto reject;
 	}
@@ -2525,7 +2522,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 	if (!ch) {
 		rej->reason = __constant_cpu_to_be32(
 					SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES);
-		printk(KERN_ERR "rejected SRP_LOGIN_REQ because no memory.\n");
+		pr_err("rejected SRP_LOGIN_REQ because no memory.\n");
 		ret = -ENOMEM;
 		goto reject;
 	}
@@ -2562,7 +2559,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 	if (ret) {
 		rej->reason = __constant_cpu_to_be32(
 				SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES);
-		printk(KERN_ERR "rejected SRP_LOGIN_REQ because creating"
+		pr_err("rejected SRP_LOGIN_REQ because creating"
 		       " a new RDMA channel failed.\n");
 		goto free_ring;
 	}
@@ -2571,7 +2568,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 	if (ret) {
 		rej->reason = __constant_cpu_to_be32(
 				SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES);
-		printk(KERN_ERR "rejected SRP_LOGIN_REQ because enabling"
+		pr_err("rejected SRP_LOGIN_REQ because enabling"
 		       " RTR failed (error code = %d)\n", ret);
 		goto destroy_ib;
 	}
@@ -2586,8 +2583,8 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 
 	nacl = srpt_lookup_acl(sport, ch->i_port_id);
 	if (!nacl) {
-		printk(KERN_INFO "Rejected login because no ACL has been"
-		       " configured yet for initiator %s.\n", ch->sess_name);
+		pr_info("Rejected login because no ACL has been"
+			" configured yet for initiator %s.\n", ch->sess_name);
 		rej->reason = __constant_cpu_to_be32(
 				SRP_LOGIN_REJ_CHANNEL_LIMIT_REACHED);
 		goto destroy_ib;
@@ -2631,7 +2628,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 
 	ret = ib_send_cm_rep(cm_id, rep_param);
 	if (ret) {
-		printk(KERN_ERR "sending SRP_LOGIN_REQ response failed"
+		pr_err("sending SRP_LOGIN_REQ response failed"
 		       " (error code = %d)\n", ret);
 		goto release_channel;
 	}
@@ -2679,7 +2676,7 @@ out:
 
 static void srpt_cm_rej_recv(struct ib_cm_id *cm_id)
 {
-	printk(KERN_INFO "Received IB REJ for cm_id %p.\n", cm_id);
+	pr_info("Received IB REJ for cm_id %p.\n", cm_id);
 	srpt_drain_channel(cm_id);
 }
 
@@ -2714,13 +2711,13 @@ static void srpt_cm_rtu_recv(struct ib_cm_id *cm_id)
 
 static void srpt_cm_timewait_exit(struct ib_cm_id *cm_id)
 {
-	printk(KERN_INFO "Received IB TimeWait exit for cm_id %p.\n", cm_id);
+	pr_info("Received IB TimeWait exit for cm_id %p.\n", cm_id);
 	srpt_drain_channel(cm_id);
 }
 
 static void srpt_cm_rep_error(struct ib_cm_id *cm_id)
 {
-	printk(KERN_INFO "Received IB REP error for cm_id %p.\n", cm_id);
+	pr_info("Received IB REP error for cm_id %p.\n", cm_id);
 	srpt_drain_channel(cm_id);
 }
 
@@ -2755,9 +2752,9 @@ static void srpt_cm_dreq_recv(struct ib_cm_id *cm_id)
 
 	if (send_drep) {
 		if (ib_send_cm_drep(ch->cm_id, NULL, 0) < 0)
-			printk(KERN_ERR "Sending IB DREP failed.\n");
-		printk(KERN_INFO "Received DREQ and sent DREP for session %s.\n",
-		       ch->sess_name);
+			pr_err("Sending IB DREP failed.\n");
+		pr_info("Received DREQ and sent DREP for session %s.\n",
+			ch->sess_name);
 	}
 }
 
@@ -2766,8 +2763,7 @@ static void srpt_cm_dreq_recv(struct ib_cm_id *cm_id)
  */
 static void srpt_cm_drep_recv(struct ib_cm_id *cm_id)
 {
-	printk(KERN_INFO "Received InfiniBand DREP message for cm_id %p.\n",
-	       cm_id);
+	pr_info("Received InfiniBand DREP message for cm_id %p.\n", cm_id);
 	srpt_drain_channel(cm_id);
 }
 
@@ -2811,14 +2807,13 @@ static int srpt_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 		srpt_cm_rep_error(cm_id);
 		break;
 	case IB_CM_DREQ_ERROR:
-		printk(KERN_INFO "Received IB DREQ ERROR event.\n");
+		pr_info("Received IB DREQ ERROR event.\n");
 		break;
 	case IB_CM_MRA_RECEIVED:
-		printk(KERN_INFO "Received IB MRA event\n");
+		pr_info("Received IB MRA event\n");
 		break;
 	default:
-		printk(KERN_ERR "received unrecognized IB CM event %d\n",
-		       event->event);
+		pr_err("received unrecognized IB CM event %d\n", event->event);
 		break;
 	}
 
@@ -2848,8 +2843,8 @@ static int srpt_perform_rdmas(struct srpt_rdma_ch *ch,
 		ret = -ENOMEM;
 		sq_wr_avail = atomic_sub_return(n_rdma, &ch->sq_wr_avail);
 		if (sq_wr_avail < 0) {
-			printk(KERN_WARNING "IB send queue full (needed %d)\n",
-			       n_rdma);
+			pr_warn("IB send queue full (needed %d)\n",
+				n_rdma);
 			goto out;
 		}
 	}
@@ -2889,7 +2884,7 @@ static int srpt_perform_rdmas(struct srpt_rdma_ch *ch,
 	}
 
 	if (ret)
-		printk(KERN_ERR "%s[%d]: ib_post_send() returned %d for %d/%d",
+		pr_err("%s[%d]: ib_post_send() returned %d for %d/%d\n",
 				 __func__, __LINE__, ret, i, n_rdma);
 	if (ret && i > 0) {
 		wr.num_sge = 0;
@@ -2897,12 +2892,12 @@ static int srpt_perform_rdmas(struct srpt_rdma_ch *ch,
 		wr.send_flags = IB_SEND_SIGNALED;
 		while (ch->state == CH_LIVE &&
 			ib_post_send(ch->qp, &wr, &bad_wr) != 0) {
-			printk(KERN_INFO "Trying to abort failed RDMA transfer [%d]",
+			pr_info("Trying to abort failed RDMA transfer [%d]\n",
 				ioctx->ioctx.index);
 			msleep(1000);
 		}
 		while (ch->state != CH_RELEASING && !ioctx->rdma_aborted) {
-			printk(KERN_INFO "Waiting until RDMA abort finished [%d]",
+			pr_info("Waiting until RDMA abort finished [%d]\n",
 				ioctx->ioctx.index);
 			msleep(1000);
 		}
@@ -2923,17 +2918,17 @@ static int srpt_xfer_data(struct srpt_rdma_ch *ch,
 
 	ret = srpt_map_sg_to_ib_sge(ch, ioctx);
 	if (ret) {
-		printk(KERN_ERR "%s[%d] ret=%d\n", __func__, __LINE__, ret);
+		pr_err("%s[%d] ret=%d\n", __func__, __LINE__, ret);
 		goto out;
 	}
 
 	ret = srpt_perform_rdmas(ch, ioctx);
 	if (ret) {
 		if (ret == -EAGAIN || ret == -ENOMEM)
-			printk(KERN_INFO "%s[%d] queue full -- ret=%d\n",
-				   __func__, __LINE__, ret);
+			pr_info("%s[%d] queue full -- ret=%d\n",
+				__func__, __LINE__, ret);
 		else
-			printk(KERN_ERR "%s[%d] fatal error -- ret=%d\n",
+			pr_err("%s[%d] fatal error -- ret=%d\n",
 			       __func__, __LINE__, ret);
 		goto out_unmap;
 	}
@@ -3058,7 +3053,7 @@ static void srpt_queue_response(struct se_cmd *cmd)
 	    !ioctx->queue_status_only) {
 		ret = srpt_xfer_data(ch, ioctx);
 		if (ret) {
-			printk(KERN_ERR "xfer_data failed for tag %llu\n",
+			pr_err("xfer_data failed for tag %llu\n",
 			       ioctx->tag);
 			return;
 		}
@@ -3075,7 +3070,7 @@ static void srpt_queue_response(struct se_cmd *cmd)
 	}
 	ret = srpt_post_send(ch, ioctx, resp_len);
 	if (ret) {
-		printk(KERN_ERR "sending cmd response failed for tag %llu\n",
+		pr_err("sending cmd response failed for tag %llu\n",
 		       ioctx->tag);
 		srpt_unmap_sg_to_ib_sge(ch, ioctx);
 		srpt_set_cmd_state(ioctx, SRPT_STATE_DONE);
@@ -3154,7 +3149,7 @@ static int srpt_release_sdev(struct srpt_device *sdev)
 	res = wait_event_interruptible(sdev->ch_releaseQ,
 				       srpt_ch_list_empty(sdev));
 	if (res)
-		printk(KERN_ERR "%s: interrupted.\n", __func__);
+		pr_err("%s: interrupted.\n", __func__);
 
 	return 0;
 }
@@ -3293,7 +3288,7 @@ static void srpt_add_one(struct ib_device *device)
 		spin_lock_init(&sport->port_acl_lock);
 
 		if (srpt_refresh_port(sport)) {
-			printk(KERN_ERR "MAD registration failed for %s-%d.\n",
+			pr_err("MAD registration failed for %s-%d.\n",
 			       srpt_sdev_name(sdev), i);
 			goto err_ring;
 		}
@@ -3330,7 +3325,7 @@ free_dev:
 	kfree(sdev);
 err:
 	sdev = NULL;
-	printk(KERN_INFO "%s(%s) failed.\n", __func__, device->name);
+	pr_info("%s(%s) failed.\n", __func__, device->name);
 	goto out;
 }
 
@@ -3344,8 +3339,7 @@ static void srpt_remove_one(struct ib_device *device)
 
 	sdev = ib_get_client_data(device, &srpt_client);
 	if (!sdev) {
-		printk(KERN_INFO "%s(%s): nothing to do.\n", __func__,
-		       device->name);
+		pr_info("%s(%s): nothing to do.\n", __func__, device->name);
 		return;
 	}
 
@@ -3464,7 +3458,7 @@ static struct se_node_acl *srpt_alloc_fabric_acl(struct se_portal_group *se_tpg)
 
 	nacl = kzalloc(sizeof(struct srpt_node_acl), GFP_KERNEL);
 	if (!nacl) {
-		printk(KERN_ERR "Unable to allocate struct srpt_node_acl\n");
+		pr_err("Unable to allocate struct srpt_node_acl\n");
 		return NULL;
 	}
 
@@ -3615,7 +3609,7 @@ static struct se_node_acl *srpt_make_nodeacl(struct se_portal_group *tpg,
 	u8 i_port_id[16];
 
 	if (srpt_parse_i_port_id(i_port_id, name) < 0) {
-		printk(KERN_ERR "invalid initiator port ID %s\n", name);
+		pr_err("invalid initiator port ID %s\n", name);
 		ret = -EINVAL;
 		goto err;
 	}
@@ -3816,12 +3810,12 @@ static ssize_t srpt_tpg_store_enable(
 
 	ret = kstrtoul(page, 0, &tmp);
 	if (ret < 0) {
-		printk(KERN_ERR "Unable to extract srpt_tpg_store_enable\n");
+		pr_err("Unable to extract srpt_tpg_store_enable\n");
 		return -EINVAL;
 	}
 
 	if ((tmp != 0) && (tmp != 1)) {
-		printk(KERN_ERR "Illegal value for srpt_tpg_store_enable: %lu\n", tmp);
+		pr_err("Illegal value for srpt_tpg_store_enable: %lu\n", tmp);
 		return -EINVAL;
 	}
 	if (tmp == 1)
@@ -3980,7 +3974,7 @@ static int __init srpt_init_module(void)
 
 	ret = -EINVAL;
 	if (srp_max_req_size < MIN_MAX_REQ_SIZE) {
-		printk(KERN_ERR "invalid value %d for kernel module parameter"
+		pr_err("invalid value %d for kernel module parameter"
 		       " srp_max_req_size -- must be at least %d.\n",
 		       srp_max_req_size, MIN_MAX_REQ_SIZE);
 		goto out;
@@ -3988,7 +3982,7 @@ static int __init srpt_init_module(void)
 
 	if (srpt_srq_size < MIN_SRPT_SRQ_SIZE
 	    || srpt_srq_size > MAX_SRPT_SRQ_SIZE) {
-		printk(KERN_ERR "invalid value %d for kernel module parameter"
+		pr_err("invalid value %d for kernel module parameter"
 		       " srpt_srq_size -- must be in the range [%d..%d].\n",
 		       srpt_srq_size, MIN_SRPT_SRQ_SIZE, MAX_SRPT_SRQ_SIZE);
 		goto out;
@@ -3996,7 +3990,7 @@ static int __init srpt_init_module(void)
 
 	srpt_target = target_fabric_configfs_init(THIS_MODULE, "srpt");
 	if (IS_ERR(srpt_target)) {
-		printk(KERN_ERR "couldn't register\n");
+		pr_err("couldn't register\n");
 		ret = PTR_ERR(srpt_target);
 		goto out;
 	}
@@ -4018,13 +4012,13 @@ static int __init srpt_init_module(void)
 
 	ret = target_fabric_configfs_register(srpt_target);
 	if (ret < 0) {
-		printk(KERN_ERR "couldn't register\n");
+		pr_err("couldn't register\n");
 		goto out_free_target;
 	}
 
 	ret = ib_register_client(&srpt_client);
 	if (ret) {
-		printk(KERN_ERR "couldn't register IB client\n");
+		pr_err("couldn't register IB client\n");
 		goto out_unregister_target;
 	}
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH 00/10] IPoIB locking fixes and a couple minor other patches
       [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (9 preceding siblings ...)
  2014-12-10 16:47   ` [PATCH 10/10] ib_srpt: convert printk's to pr_* functions Doug Ledford
@ 2014-12-12 14:01   ` Estrin, Alex
  10 siblings, 0 replies; 20+ messages in thread
From: Estrin, Alex @ 2014-12-12 14:01 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org

Hi Roland, Doug,

I've looked into ipoib patch set (earlier posted one), also ran it on 3.17.
I second it because ipoib locking fixes really straighten out multicast 
initialization flow, and dedicated workqueue per interface is a great 
improvement for overall ipoib stability and scalability.

Thanks,
Alex Estrin.

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On
> Behalf Of Doug Ledford
> Sent: Wednesday, December 10, 2014 11:47 AM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: Doug Ledford
> Subject: [PATCH 00/10] IPoIB locking fixes and a couple minor other patches
> 
> As the locking fixes have missed two releases now, I'm resending them.
> The other two items are minor (the pr_* function one at least makes
> the system more usable by making output message point to the module
> making all of the dmesg noise, the other is one I contend is the right
> thing to do until we have a replacement for the cache).
> 
> These can also be pulled from my github repo as:
> 
> git://github.com/dledford/linux.git for-next
> 
> Bart Van Assche (1):
>   IB/srp: Use P_Key cache for P_Key lookups
> 
> Doug Ledford (9):
>   IPoIB: Consolidate rtnl_lock tasks in workqueue
>   IPoIB: Make the carrier_on_task race aware
>   IPoIB: fix MCAST_FLAG_BUSY usage
>   IPoIB: fix mcast_dev_flush/mcast_restart_task race
>   IPoIB: change init sequence ordering
>   IPoIB: Use dedicated workqueues per interface
>   IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
>   IPoIB: No longer use flush as a parameter
>   ib_srpt: convert printk's to pr_* functions
> 
>  drivers/infiniband/ulp/srp/ib_srp.c   |   9 +-
>  drivers/infiniband/ulp/srpt/ib_srpt.c | 188 ++++++++++++++++------------------
>  2 files changed, 96 insertions(+), 101 deletions(-)
> 
> --
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-12-12 14:05 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-10 16:46 [PATCH 00/10] IPoIB locking fixes and a couple minor other patches Doug Ledford
     [not found] ` <cover.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-12-10 16:46   ` [PATCH 01/10] IPoIB: Consolidate rtnl_lock tasks in workqueue Doug Ledford
     [not found]     ` <84fa20ea8c6763fee66b378adda1da52cd48e342.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-12-12 14:02       ` Estrin, Alex
2014-12-10 16:46   ` [PATCH 02/10] IPoIB: Make the carrier_on_task race aware Doug Ledford
     [not found]     ` <9da7866273a37e8927eb079d90c96074a118c5bd.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-12-12 14:02       ` Estrin, Alex
2014-12-10 16:47   ` [PATCH 03/10] IPoIB: fix MCAST_FLAG_BUSY usage Doug Ledford
     [not found]     ` <3287eaddaf7948ccf6f471daa3312e27649ef798.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-12-12 14:03       ` Estrin, Alex
2014-12-10 16:47   ` [PATCH 04/10] IPoIB: fix mcast_dev_flush/mcast_restart_task race Doug Ledford
     [not found]     ` <fd20f0aae0dce26162aba317d2571e46bdbae919.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-12-12 14:04       ` Estrin, Alex
2014-12-10 16:47   ` [PATCH 05/10] IPoIB: change init sequence ordering Doug Ledford
     [not found]     ` <540b071b56bdc3b72d59e80707164da07243dc80.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-12-12 14:04       ` Estrin, Alex
2014-12-10 16:47   ` [PATCH 06/10] IPoIB: Use dedicated workqueues per interface Doug Ledford
     [not found]     ` <f733fa8b780b1ac512efb748683dca001fb2edf7.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-12-12 14:05       ` Estrin, Alex
2014-12-10 16:47   ` [PATCH 07/10] IPoIB: Make ipoib_mcast_stop_thread flush the workqueue Doug Ledford
     [not found]     ` <8c03e7569e7da863c0507dbf95e96ad06280b938.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-12-12 14:05       ` Estrin, Alex
2014-12-10 16:47   ` [PATCH 08/10] IPoIB: No longer use flush as a parameter Doug Ledford
     [not found]     ` <28c67ea96dd0a0c3f39bfa78176537c3eb983c50.1418229671.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-12-12 14:05       ` Estrin, Alex
2014-12-10 16:47   ` [PATCH 09/10] IB/srp: Use P_Key cache for P_Key lookups Doug Ledford
2014-12-10 16:47   ` [PATCH 10/10] ib_srpt: convert printk's to pr_* functions Doug Ledford
2014-12-12 14:01   ` [PATCH 00/10] IPoIB locking fixes and a couple minor other patches Estrin, Alex

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox