public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [git patch review 2/4] IPoIB: Fix another send-only join race
  2006-02-11 20:22 [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped Roland Dreier
@ 2006-02-11 20:22 ` Roland Dreier
  2006-02-11 20:22   ` [git patch review 3/4] IB/mthca: Don't print debugging info until we have all values Roland Dreier
  2006-02-11 22:02 ` [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Roland Dreier @ 2006-02-11 20:22 UTC (permalink / raw)
  To: linux-kernel, openib-general

Further, there's an additional issue that I saw in testing:
ipoib_mcast_send may get called when priv->broadcast is NULL (e.g. if
the device was downed and then upped internally because of a port
event).

If this happends and the send-only join request gets completed before
priv->broadcast is set, we get an oops.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

---

 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

7bcb974ef6a0ae903888272c92c66ea779388c01
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 1c71482..932bf13 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -701,7 +701,7 @@ void ipoib_mcast_send(struct net_device 
 	 */
 	spin_lock(&priv->lock);
 
-	if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags)) {
+	if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags) || !priv->broadcast) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
 		goto unlock;
-- 
1.1.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [git patch review 4/4] IPoIB: Yet another fix for send-only joins
  2006-02-11 20:22   ` [git patch review 3/4] IB/mthca: Don't print debugging info until we have all values Roland Dreier
@ 2006-02-11 20:22     ` Roland Dreier
  0 siblings, 0 replies; 11+ messages in thread
From: Roland Dreier @ 2006-02-11 20:22 UTC (permalink / raw)
  To: linux-kernel, openib-general

Even after the last fix, it's still possible for a send-only join to
start before the join for the broadcast group has finished.  This
could cause us to create a multicast group using attributes from the
broadcast group that haven't been initialized yet, so we would use
garbage for the Q_Key, etc.  Fix this by waiting until the broadcast
group's attached flag is set before starting send-only joins.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

---

 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

20b83382d1c5d4d1a73fc5671261db5239d1dbb3
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 932bf13..a2408d7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -533,8 +533,10 @@ void ipoib_mcast_join_task(void *dev_ptr
 	}
 
 	if (!priv->broadcast) {
-		priv->broadcast = ipoib_mcast_alloc(dev, 1);
-		if (!priv->broadcast) {
+		struct ipoib_mcast *broadcast;
+
+		broadcast = ipoib_mcast_alloc(dev, 1);
+		if (!broadcast) {
 			ipoib_warn(priv, "failed to allocate broadcast group\n");
 			mutex_lock(&mcast_mutex);
 			if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
@@ -544,10 +546,11 @@ void ipoib_mcast_join_task(void *dev_ptr
 			return;
 		}
 
-		memcpy(priv->broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
+		spin_lock_irq(&priv->lock);
+		memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
 		       sizeof (union ib_gid));
+		priv->broadcast = broadcast;
 
-		spin_lock_irq(&priv->lock);
 		__ipoib_mcast_add(dev, priv->broadcast);
 		spin_unlock_irq(&priv->lock);
 	}
@@ -701,7 +704,9 @@ void ipoib_mcast_send(struct net_device 
 	 */
 	spin_lock(&priv->lock);
 
-	if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags) || !priv->broadcast) {
+	if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags)	||
+	    !priv->broadcast					||
+	    !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
 		goto unlock;
-- 
1.1.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [git patch review 3/4] IB/mthca: Don't print debugging info until we have all values
  2006-02-11 20:22 ` [git patch review 2/4] IPoIB: Fix another send-only join race Roland Dreier
@ 2006-02-11 20:22   ` Roland Dreier
  2006-02-11 20:22     ` [git patch review 4/4] IPoIB: Yet another fix for send-only joins Roland Dreier
  0 siblings, 1 reply; 11+ messages in thread
From: Roland Dreier @ 2006-02-11 20:22 UTC (permalink / raw)
  To: linux-kernel, openib-general

When debugging is enabled, the mthca_QUERY_DEV_LIM() firmware command
function prints out some of the device limits that it queries.
However the debugging prints happen before all of the fields are
extracted from the firmware response, so some of the values that get
printed are uninitialized junk.  Move the prints to the end of the
function to fix this.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

---

 drivers/infiniband/hw/mthca/mthca_cmd.c |   38 ++++++++++++++++---------------
 1 files changed, 19 insertions(+), 19 deletions(-)

f295c79b6766b25fe8c1aad88211c54d1caa7e0b
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index f9b9b93..2825615 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -1029,25 +1029,6 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev
 	MTHCA_GET(size, outbox, QUERY_DEV_LIM_UAR_ENTRY_SZ_OFFSET);
 	dev_lim->uar_scratch_entry_sz = size;
 
-	mthca_dbg(dev, "Max QPs: %d, reserved QPs: %d, entry size: %d\n",
-		  dev_lim->max_qps, dev_lim->reserved_qps, dev_lim->qpc_entry_sz);
-	mthca_dbg(dev, "Max SRQs: %d, reserved SRQs: %d, entry size: %d\n",
-		  dev_lim->max_srqs, dev_lim->reserved_srqs, dev_lim->srq_entry_sz);
-	mthca_dbg(dev, "Max CQs: %d, reserved CQs: %d, entry size: %d\n",
-		  dev_lim->max_cqs, dev_lim->reserved_cqs, dev_lim->cqc_entry_sz);
-	mthca_dbg(dev, "Max EQs: %d, reserved EQs: %d, entry size: %d\n",
-		  dev_lim->max_eqs, dev_lim->reserved_eqs, dev_lim->eqc_entry_sz);
-	mthca_dbg(dev, "reserved MPTs: %d, reserved MTTs: %d\n",
-		  dev_lim->reserved_mrws, dev_lim->reserved_mtts);
-	mthca_dbg(dev, "Max PDs: %d, reserved PDs: %d, reserved UARs: %d\n",
-		  dev_lim->max_pds, dev_lim->reserved_pds, dev_lim->reserved_uars);
-	mthca_dbg(dev, "Max QP/MCG: %d, reserved MGMs: %d\n",
-		  dev_lim->max_pds, dev_lim->reserved_mgms);
-	mthca_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n",
-		  dev_lim->max_cq_sz, dev_lim->max_qp_sz, dev_lim->max_srq_sz);
-
-	mthca_dbg(dev, "Flags: %08x\n", dev_lim->flags);
-
 	if (mthca_is_memfree(dev)) {
 		MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_SRQ_SZ_OFFSET);
 		dev_lim->max_srq_sz = 1 << field;
@@ -1093,6 +1074,25 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev
 		dev_lim->mpt_entry_sz = MTHCA_MPT_ENTRY_SIZE;
 	}
 
+	mthca_dbg(dev, "Max QPs: %d, reserved QPs: %d, entry size: %d\n",
+		  dev_lim->max_qps, dev_lim->reserved_qps, dev_lim->qpc_entry_sz);
+	mthca_dbg(dev, "Max SRQs: %d, reserved SRQs: %d, entry size: %d\n",
+		  dev_lim->max_srqs, dev_lim->reserved_srqs, dev_lim->srq_entry_sz);
+	mthca_dbg(dev, "Max CQs: %d, reserved CQs: %d, entry size: %d\n",
+		  dev_lim->max_cqs, dev_lim->reserved_cqs, dev_lim->cqc_entry_sz);
+	mthca_dbg(dev, "Max EQs: %d, reserved EQs: %d, entry size: %d\n",
+		  dev_lim->max_eqs, dev_lim->reserved_eqs, dev_lim->eqc_entry_sz);
+	mthca_dbg(dev, "reserved MPTs: %d, reserved MTTs: %d\n",
+		  dev_lim->reserved_mrws, dev_lim->reserved_mtts);
+	mthca_dbg(dev, "Max PDs: %d, reserved PDs: %d, reserved UARs: %d\n",
+		  dev_lim->max_pds, dev_lim->reserved_pds, dev_lim->reserved_uars);
+	mthca_dbg(dev, "Max QP/MCG: %d, reserved MGMs: %d\n",
+		  dev_lim->max_pds, dev_lim->reserved_mgms);
+	mthca_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n",
+		  dev_lim->max_cq_sz, dev_lim->max_qp_sz, dev_lim->max_srq_sz);
+
+	mthca_dbg(dev, "Flags: %08x\n", dev_lim->flags);
+
 out:
 	mthca_free_mailbox(dev, mailbox);
 	return err;
-- 
1.1.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
@ 2006-02-11 20:22 Roland Dreier
  2006-02-11 20:22 ` [git patch review 2/4] IPoIB: Fix another send-only join race Roland Dreier
  2006-02-11 22:02 ` [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped Andrew Morton
  0 siblings, 2 replies; 11+ messages in thread
From: Roland Dreier @ 2006-02-11 20:22 UTC (permalink / raw)
  To: linux-kernel, openib-general

Fix the following race scenario:
  - Device is up.
  - Port event or set mcast list triggers ipoib_mcast_stop_thread,
    this cancels the query and waits on mcast "done" completion.
  - Completion is called and "done" is set.
  - Meanwhile, ipoib_mcast_send arrives and starts a new query,
    re-initializing "done".

Fix this by adding a "multicast started" bit and checking it before
starting a send-only join.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

---

 drivers/infiniband/ulp/ipoib/ipoib.h           |    1 +
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   15 +++++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

479a079663bd4c5f3d2714643b1b8c406aaba3e0
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index e0a5412..2f85a9a 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -78,6 +78,7 @@ enum {
 	IPOIB_FLAG_SUBINTERFACE   = 4,
 	IPOIB_MCAST_RUN 	  = 5,
 	IPOIB_STOP_REAPER         = 6,
+	IPOIB_MCAST_STARTED       = 7,
 
 	IPOIB_MAX_BACKOFF_SECONDS = 16,
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index ccaa0c3..1c71482 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -601,6 +601,10 @@ int ipoib_mcast_start_thread(struct net_
 		queue_work(ipoib_workqueue, &priv->mcast_task);
 	mutex_unlock(&mcast_mutex);
 
+	spin_lock_irq(&priv->lock);
+	set_bit(IPOIB_MCAST_STARTED, &priv->flags);
+	spin_unlock_irq(&priv->lock);
+
 	return 0;
 }
 
@@ -611,6 +615,10 @@ int ipoib_mcast_stop_thread(struct net_d
 
 	ipoib_dbg_mcast(priv, "stopping multicast thread\n");
 
+	spin_lock_irq(&priv->lock);
+	clear_bit(IPOIB_MCAST_STARTED, &priv->flags);
+	spin_unlock_irq(&priv->lock);
+
 	mutex_lock(&mcast_mutex);
 	clear_bit(IPOIB_MCAST_RUN, &priv->flags);
 	cancel_delayed_work(&priv->mcast_task);
@@ -693,6 +701,12 @@ void ipoib_mcast_send(struct net_device 
 	 */
 	spin_lock(&priv->lock);
 
+	if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags)) {
+		++priv->stats.tx_dropped;
+		dev_kfree_skb_any(skb);
+		goto unlock;
+	}
+
 	mcast = __ipoib_mcast_find(dev, mgid);
 	if (!mcast) {
 		/* Let's create a new send only group now */
@@ -754,6 +768,7 @@ out:
 		ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN);
 	}
 
+unlock:
 	spin_unlock(&priv->lock);
 }
 
-- 
1.1.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
  2006-02-11 20:22 [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped Roland Dreier
  2006-02-11 20:22 ` [git patch review 2/4] IPoIB: Fix another send-only join race Roland Dreier
@ 2006-02-11 22:02 ` Andrew Morton
  2006-02-12  2:03   ` Roland Dreier
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2006-02-11 22:02 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-kernel, openib-general

Roland Dreier <rolandd@cisco.com> wrote:
>
>  +	spin_lock_irq(&priv->lock);
>  +	set_bit(IPOIB_MCAST_STARTED, &priv->flags);
>  +	spin_unlock_irq(&priv->lock);

Strange to put a lock around an atomic op like that.

Sometimes it's valid.   If another cpu was doing:

	spin_lock(lock);

	if (test_bit(IPOIB_MCAST_STARTED))
		something();
	...
	if (test_bit(IPOIB_MCAST_STARTED))
		something_else();

	spin_unlock(lock);

then the locked set_bit() makes sense.

But often it doesn't ;)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
  2006-02-11 22:02 ` [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped Andrew Morton
@ 2006-02-12  2:03   ` Roland Dreier
  2006-02-12  7:50     ` [openib-general] " Michael S. Tsirkin
  0 siblings, 1 reply; 11+ messages in thread
From: Roland Dreier @ 2006-02-12  2:03 UTC (permalink / raw)
  To: Andrew Morton, mst; +Cc: Roland Dreier, linux-kernel, openib-general

 > Roland Dreier <rolandd@cisco.com> wrote:
 > >
 > >  +	spin_lock_irq(&priv->lock);
 > >  +	set_bit(IPOIB_MCAST_STARTED, &priv->flags);
 > >  +	spin_unlock_irq(&priv->lock);
 > 
 > Strange to put a lock around an atomic op like that.
 > 
 > Sometimes it's valid.   If another cpu was doing:
 > 
 > 	spin_lock(lock);
 > 
 > 	if (test_bit(IPOIB_MCAST_STARTED))
 > 		something();
 > 	...
 > 	if (test_bit(IPOIB_MCAST_STARTED))
 > 		something_else();
 > 
 > 	spin_unlock(lock);
 > 
 > then the locked set_bit() makes sense.
 > 
 > But often it doesn't ;)

Good point.  Michael, any reason why the lock is there around the
set_bit()?  (And similarly for the corresponding clear_bit())

Thanks,
 Roland

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
  2006-02-12  2:03   ` Roland Dreier
@ 2006-02-12  7:50     ` Michael S. Tsirkin
  2006-02-12  8:13       ` Kyle Moffett
  2006-02-12 16:37       ` [openib-general] " Roland Dreier
  0 siblings, 2 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2006-02-12  7:50 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Andrew Morton, linux-kernel, openib-general

Quoting r. Roland Dreier <rdreier@cisco.com>:
> Subject: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
> 
>  > Roland Dreier <rolandd@cisco.com> wrote:
>  > >
>  > >  +	spin_lock_irq(&priv->lock);
>  > >  +	set_bit(IPOIB_MCAST_STARTED, &priv->flags);
>  > >  +	spin_unlock_irq(&priv->lock);
>  > 
>  > Strange to put a lock around an atomic op like that.
>  > 
>  > Sometimes it's valid.   If another cpu was doing:
>  > 
>  > 	spin_lock(lock);
>  > 
>  > 	if (test_bit(IPOIB_MCAST_STARTED))
>  > 		something();
>  > 	...
>  > 	if (test_bit(IPOIB_MCAST_STARTED))
>  > 		something_else();
>  > 
>  > 	spin_unlock(lock);
>  > 
>  > then the locked set_bit() makes sense.
>  > 
>  > But often it doesn't ;)
> 
> Good point.  Michael, any reason why the lock is there around the
> set_bit()?  (And similarly for the corresponding clear_bit())
> 
> Thanks,
>  Roland

Basically, its as Andrew said: the lock around clear_bit is there to ensure that
ipoib_mcast_send isnt running already when we stop the thread.  Thats why
test_bit has to be inside the lock, too.

This was discussed with Krishna Kumar when I posted the patch originally.
For more detail, please review this thread:
http://www.mail-archive.com/openib-general@openib.org/msg13206.html
or here
http://openib.org/pipermail/openib-general/2005-December/014370.html


-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
  2006-02-12  7:50     ` [openib-general] " Michael S. Tsirkin
@ 2006-02-12  8:13       ` Kyle Moffett
  2006-02-12  8:19         ` Michael S. Tsirkin
  2006-02-12 16:37       ` [openib-general] " Roland Dreier
  1 sibling, 1 reply; 11+ messages in thread
From: Kyle Moffett @ 2006-02-12  8:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Roland Dreier, Andrew Morton, linux-kernel, openib-general

On Feb 12, 2006, at 02:50, Michael S. Tsirkin wrote:
> Basically, its as Andrew said: the lock around clear_bit is there  
> to ensure that ipoib_mcast_send isnt running already when we stop  
> the thread.  Thats why test_bit has to be inside the lock, too.

Looks like you guys could use nonatomic versions to improve bus  
efficiency slightly, but they appear to be relying on the fact that  
when the function calling set_bit() returns, the multicast thread  
will be guaranteed to be finished and never run again.  The set_bit()  
can only happen when the thread is not doing work (due to the lock),  
and since the thread firsts checks the bit before doing any work, it  
provides more guarantees than just the atomics would.

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/E/U d- s++: a18 C++++>$ ULBX*++++(+++)>$ P++++(+++)>$ L++++ 
(+++)>$ !E- W+++(++) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP 
+ t+(+++) 5 X R? !tv-(--) b++++(++) DI+(++) D+++ G e>++++$ h*(+)>++$ r 
%(--)  !y?-(--)
------END GEEK CODE BLOCK------




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
  2006-02-12  8:13       ` Kyle Moffett
@ 2006-02-12  8:19         ` Michael S. Tsirkin
  0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2006-02-12  8:19 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Roland Dreier, Andrew Morton, linux-kernel, openib-general

Quoting r. Kyle Moffett <mrmacman_g4@mac.com>:
> On Feb 12, 2006, at 02:50, Michael S. Tsirkin wrote:
> >Basically, its as Andrew said: the lock around clear_bit is there  
> >to ensure that ipoib_mcast_send isnt running already when we stop  
> >the thread.  Thats why test_bit has to be inside the lock, too.
> 
> Looks like you guys could use nonatomic versions to improve bus  
> efficiency slightly

I think we need atomics since other places touch bits in the same word without
taking the lock.

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
  2006-02-12  7:50     ` [openib-general] " Michael S. Tsirkin
  2006-02-12  8:13       ` Kyle Moffett
@ 2006-02-12 16:37       ` Roland Dreier
  2006-02-12 16:56         ` Michael S. Tsirkin
  1 sibling, 1 reply; 11+ messages in thread
From: Roland Dreier @ 2006-02-12 16:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Andrew Morton, linux-kernel, openib-general

    Michael> Basically, its as Andrew said: the lock around clear_bit
    Michael> is there to ensure that ipoib_mcast_send isnt running
    Michael> already when we stop the thread.  Thats why test_bit has
    Michael> to be inside the lock, too.

Makes sense I guess.  If I'm understanding correctly, the lock isn't
really there to serialize the bit ops, but rather to make sure
ipoib_mcast_send() won't do anything after we clear the bit.

Does that mean that there's no reason to take the lock around the set_bit()?

 - R.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
  2006-02-12 16:37       ` [openib-general] " Roland Dreier
@ 2006-02-12 16:56         ` Michael S. Tsirkin
  0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2006-02-12 16:56 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Andrew Morton, linux-kernel, openib-general

Quoting r. Roland Dreier <rdreier@cisco.com>:
> Subject: Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
> 
>     Michael> Basically, its as Andrew said: the lock around clear_bit
>     Michael> is there to ensure that ipoib_mcast_send isnt running
>     Michael> already when we stop the thread.  Thats why test_bit has
>     Michael> to be inside the lock, too.
> 
> Makes sense I guess.  If I'm understanding correctly, the lock isn't
> really there to serialize the bit ops, but rather to make sure
> ipoib_mcast_send() won't do anything after we clear the bit.

Right. Thats one way to put it.

> Does that mean that there's no reason to take the lock around the set_bit()?

Ugh, sorry, I dont really remember why I put it there.

I guess I just have easier time reasoning about locks than barriers and atomic
operations. "bit is protected by priv->lock" is a simple rule, and we are not on
data path here. The fact that the race went unnoticed for a while validates
this approach in my eyes.

I guess longer term we will replace mcast_mutex with priv->lock anyway, so it
doesnt matter much.

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-02-12 16:55 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-11 20:22 [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped Roland Dreier
2006-02-11 20:22 ` [git patch review 2/4] IPoIB: Fix another send-only join race Roland Dreier
2006-02-11 20:22   ` [git patch review 3/4] IB/mthca: Don't print debugging info until we have all values Roland Dreier
2006-02-11 20:22     ` [git patch review 4/4] IPoIB: Yet another fix for send-only joins Roland Dreier
2006-02-11 22:02 ` [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped Andrew Morton
2006-02-12  2:03   ` Roland Dreier
2006-02-12  7:50     ` [openib-general] " Michael S. Tsirkin
2006-02-12  8:13       ` Kyle Moffett
2006-02-12  8:19         ` Michael S. Tsirkin
2006-02-12 16:37       ` [openib-general] " Roland Dreier
2006-02-12 16:56         ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox