linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
@ 2012-08-30 10:38 Or Gerlitz
       [not found] ` <1346323085-2665-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-08-30 10:38 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas, Or Gerlitz

From: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The mlx4 CQ completion handler, mlx4_cq_completion doesn't bother to lock 
the radix tree which is used to manage the table of CQs, nor to increase
the reference count of the CQ before invoking the user provided callback.

This is racy and can cause use after free, null pointer dereference, etc various 
crashes. Fix that by wrapping the radix tree lookup with taking the table lock, 
and increase the ref count for the time of the callback.

Signed-off-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---

Hi Roland,

Doing some code review on the related parts, I didn't see any way
the handler can be protected for the tree lockup and the user callback
invokation other then using these methods, e.g as done by the "event"
callback. Looking on mthca, I see the same behaviour -- mthca_cq_completion
takes no locks and doesn't increase the refcount while mthca_cq_event does
go by the book and do what ever needed, anything we miss here? its there
from day one... 

Or.

 drivers/net/ethernet/mellanox/mlx4/cq.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 7e64033..05d9529 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -56,9 +56,14 @@
 void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
 {
 	struct mlx4_cq *cq;
+	struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
 
+	spin_lock(&cq_table->lock);
 	cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree,
 			       cqn & (dev->caps.num_cqs - 1));
+	if (cq)
+		atomic_inc(&cq->refcount);
+	spin_unlock(&cq_table->lock);
 	if (!cq) {
 		mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn);
 		return;
@@ -67,6 +72,8 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
 	++cq->arm_sn;
 
 	cq->comp(cq);
+	if (atomic_dec_and_test(&cq->refcount))
+		complete(&cq->free);
 }
 
 void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found] ` <1346323085-2665-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2012-08-30 22:17   ` Or Gerlitz
       [not found]     ` <CAJZOPZJOWuoBQx49-O2WLk240GWGRsqfJpCXfhxRV6Cmdw7W0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-08-30 22:17 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

Roland Dreier ‎<roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>  wrote:

> Can you be explicit about the race you're worried about?

few

1. on the  time CQ A is deleted an interrupt that relates to CQ B
takes place and a radix
tree lookup is running while an element is being deleted from the
tree, looking on the radix tree API, I don't see that this is allowed.

2. while a CQ is being freed an interrupt takes place and the driver
attempts to run the comp handler which can turn to use after free,
null pointer deref, etc. This can happen even if the ULP made sure to
consume all the WCs related to flushed/etc, e.g an "empty"
interrupt

BTW, your email missed the list somehow

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]     ` <CAJZOPZJOWuoBQx49-O2WLk240GWGRsqfJpCXfhxRV6Cmdw7W0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-08-30 22:35       ` Roland Dreier
       [not found]         ` <CAL1RGDX7kx7aypOqYvvwu8KBjMra2Gqq+4OGLz22k0RVmeZ1eQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-08-31  7:02       ` Or Gerlitz
  1 sibling, 1 reply; 15+ messages in thread
From: Roland Dreier @ 2012-08-30 22:35 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Thu, Aug 30, 2012 at 3:17 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Roland Dreier ‎<roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>  wrote:
>
>> Can you be explicit about the race you're worried about?
>
> few
>
> 1. on the  time CQ A is deleted an interrupt that relates to CQ B
> takes place and a radix
> tree lookup is running while an element is being deleted from the
> tree, looking on the radix tree API, I don't see that this is allowed.

I don't think this is a real problem; the radix tree code is
explicitly designed for RCU use, and the data structure is pretty
clearly safe for looking up one slot while another slot is being
cleared.  In fact it's hard to see how this could screw up.

> 2. while a CQ is being freed an interrupt takes place and the driver
> attempts to run the comp handler which can turn to use after free,
> null pointer deref, etc. This can happen even if the ULP made sure to
> consume all the WCs related to flushed/etc, e.g an "empty"
> interrupt

So in mlx4_cq_free() we do

        mlx4_HW2SW_CQ(dev, NULL, cq->cqn);
        //...
        synchronize_irq(priv->eq_table.eq[cq->vector].irq);

before we touch the cq table.  I don't think we should get a CQ
completion event for the CQ we're freeing after we've done HW2SW_CQ on
it and then waited for any outstanding completion interrupts to
finish.

Also we know that there are no QPs attached to this CQ so there
shouldn't be any completion events anyway...

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]         ` <CAL1RGDX7kx7aypOqYvvwu8KBjMra2Gqq+4OGLz22k0RVmeZ1eQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-08-31  6:53           ` Or Gerlitz
  2012-08-31  7:12           ` Or Gerlitz
  1 sibling, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-08-31  6:53 UTC (permalink / raw)
  To: Roland Dreier, Yishai Hadas
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dotan Barak

On Fri, Aug 31, 2012 at 1:35 AM, Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, Aug 30, 2012 at 3:17 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

>> 1. on the  time CQ A is deleted an interrupt that relates to CQ B
>> takes place and a radix
>> tree lookup is running while an element is being deleted from the
>> tree, looking on the radix tree API, I don't see that this is allowed.
>
> I don't think this is a real problem; the radix tree code is
> explicitly designed for RCU use, and the data structure is pretty
> clearly safe for looking up one slot while another slot is being
> cleared.  In fact it's hard to see how this could screw up.

OK, I'll give it a 2nd look, thanks for elaborating on the design, one
thing which probably confused us was the driver cq event callback
which does take the lock before searching the radix tree, and increase
the refcount before invoking the user event handler, any reason for
the driver event callback to use different practice vs. the comp one?

>> 2. while a CQ is being freed an interrupt takes place and the driver
>> attempts to run the comp handler which can turn to use after free,
>> null pointer deref, etc. This can happen even if the ULP made sure to
>> consume all the WCs related to flushed/etc, e.g an "empty" interrupt

> So in mlx4_cq_free() we do
>         mlx4_HW2SW_CQ(dev, NULL, cq->cqn);
>         //...
>         synchronize_irq(priv->eq_table.eq[cq->vector].irq);
>
> before we touch the cq table.  I don't think we should get a CQ
> completion event for the CQ we're freeing after we've done HW2SW_CQ on
> it and then waited for any outstanding completion interrupts to finish.

yes, makes sense, this wouldn't handle use cases where the user does
context switch, e.g from hard_irq to softirq/tasklet and let their
handler touch the CQ, but the refcount in the driver wouldn't help
either in that case.

> Also we know that there are no QPs attached to this CQ so there
> shouldn't be any completion events anyway...

All to all, your reasoning makes much sense, we probably need to look
deeper into the crash report that triggered this RFC, see next email.

Yishai - were you thinking on other possible races that the patch
could address? also, can you double check the point Roland made on
HW2SW_CQ.


Or.
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]     ` <CAJZOPZJOWuoBQx49-O2WLk240GWGRsqfJpCXfhxRV6Cmdw7W0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-08-30 22:35       ` Roland Dreier
@ 2012-08-31  7:02       ` Or Gerlitz
  1 sibling, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-08-31  7:02 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas, Dotan Barak

> Roland Dreier ‎<roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>  wrote:
>> Can you be explicit about the race you're worried about?

Roland,

This patch was made after we got the below report from the field.

Dotan, can we get Roland access to the full vmcore image?

Here's the report I got:

"[...] box panic'ed when trying to find a completion queue. There is
no corruption but there is a possible race which could result in
mlx4_cq_completion getting wrong height of the radix tree and
following a bit too deep into the chains. In the other code which uses
this radix tree the access is protected by the lock but
mlx4_cq_completion is running win the interrupt context and cannot
take locks, so instead it run without any protection whatsoever."

The below is the stack trace, the kernel is an RHEL5
2.6.18-274.something, so in that respect, maybe your comment on the
radix tree RCU doesn't take effect there? you can see the problem was
somewhere into the radix tree code.

crash> bt
PID: 8178   TASK: ffff810b91a52400  CPU: 11  COMMAND: "universal_consu"
#0 [ffff81182fe3bc70] crash_kexec at ffffffff800ba60c
#1 [ffff81182fe3bd30] __die at ffffffff80069047
#2 [ffff81182fe3bd70] die at ffffffff8007075b
#3 [ffff81182fe3bda0] do_general_protection at ffffffff80069375
#4 [ffff81182fe3bde0] error_exit at ffffffff80060e9d

    [exception RIP: radix_tree_lookup+38]
    RIP: ffffffff8016405f  RSP: ffff81182fe3be90  RFLAGS: 00210002
    RAX: 6b6b6b6b6b6b6b8b  RBX: ffff810c2fb90000  RCX: 0000000000000006
    RDX: 0000000000000001  RSI: 00000000000000c0  RDI: 6b6b6b6b6b6b6b6b
    RBP: 00000000000000c0   R8: 0000000000010000   R9: 000000000920ea94
    R10: 00000000000000b3  R11: ffffffff800c7ea5  R12: 00000000000000b3
    R13: ffff810c2b15a280  R14: ffff810b7a98ff58  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000

#5 [ffff81182fe3be90] mlx4_cq_completion at ffffffff886f2367
#6 [ffff81182fe3beb0] mlx4_eq_int at ffffffff886f2bd3
#7 [ffff81182fe3bf10] mlx4_msi_x_interrupt at ffffffff886f3078
#8 [ffff81182fe3bf20] handle_IRQ_event at ffffffff8001167c
#9 [ffff81182fe3bf50] __do_IRQ at ffffffff800c7f40
#10 [ffff81182fe3bf90] do_IRQ at ffffffff80071659

--- <IRQ stack> ---

#11 [ffff810b7a98ff58] ret_from_intr at ffffffff80060652
    RIP: 00000000f5cd8859  RSP: 00000000f3eafa28  RFLAGS: 00200202
    RAX: 0000000000000000  RBX: 00000000f3eafaf8  RCX: 000000000b6d4f00
    RDX: 000000000000001c  RSI: 000000000000001c  RDI: 0000000000000000
    RBP: ffff810bf1a0a120   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000001
    R13: 0000000000000000  R14: ffff810b9e7d5bf0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffff4c  CS: 0023  SS: 002b
crash>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]         ` <CAL1RGDX7kx7aypOqYvvwu8KBjMra2Gqq+4OGLz22k0RVmeZ1eQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-08-31  6:53           ` Or Gerlitz
@ 2012-08-31  7:12           ` Or Gerlitz
  1 sibling, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-08-31  7:12 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Aug 31, 2012 at 1:35 AM, Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> I don't think this is a real problem; the radix tree code is [...]

So maybe this patch wouldn't land in 3.7, we'll see, however, so far
no other patch sits in the for-next branch of your tree for the next
window... any plan to start reviewing / picking patches anytime soon?

Also, there are three more fixes for 3.6 I sent you earlier this week,
which need to get in as they address real bugs (deadlock in IPoIB,
crashes in mlx4 when attempting to register > 512GB, more).

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found] ` <20549.47088.805176.786765-AFFH1GffN5hPR4JQBCEnsQ@public.gmane.org>
@ 2012-09-09 14:08   ` Or Gerlitz
  0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-09-09 14:08 UTC (permalink / raw)
  To: Max Matveev; +Cc: linux-rdma

On Tue, Sep 4, 2012 at 11:12 AM, Max Matveev <makc-puGfsi27rH1aa/9Udqfwiw@public.gmane.org> wrote:
>
> On Thu, Aug 30, 2012 at 22:35 PM Roland Dreier wrote:
>
>> On Thu, Aug 30, 2012 at 3:17 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>  wrote:
>>>
>>>> Can you be explicit about the race you're worried about?
>>>
>>> few
>>>
>>> 1. on the  time CQ A is deleted an interrupt that relates to CQ B
>>> takes place and a radix
>>> tree lookup is running while an element is being deleted from the
>>> tree, looking on the radix tree API, I don't see that this is
>>> allowed.
>>
>> I don't think this is a real problem; the radix tree code is
>> explicitly designed for RCU use, and the data structure is pretty
>> clearly safe for looking up one slot while another slot is being
>> cleared.  In fact it's hard to see how this could screw up.
>
> What about races between radix_tree_extend and radix_tree_lookup?
>
> As far as I can see (even on newer kernels with rcu support) there is
> nothing which protects changes to node->height and it is used to
> decide how deep lookup must go to find the actual data.


Max,

I'm putting your response on linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org as it was
wrongly sent to linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ...

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]     ` <20551.21182.486376.898809-AFFH1GffN5hPR4JQBCEnsQ@public.gmane.org>
@ 2012-09-09 14:09       ` Or Gerlitz
       [not found]         ` <CAJZOPZLXktTREd3xmE=v14HpP1O+EgiqA54LHwQvPNXNX9Rr7A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-09-09 14:09 UTC (permalink / raw)
  To: Max Matveev; +Cc: linux-rdma

On Wed, Sep 5, 2012 at 4:25 PM, Max Matveev <makc-puGfsi27rH1aa/9Udqfwiw@public.gmane.org> wrote:
> On Tue, 4 Sep 2012 11:02:32 -0700, Roland Dreier wrote:
>
>  roland> On Tue, Sep 4, 2012 at 1:12 AM, Max Matveev <makc-puGfsi27rH1aa/9Udqfwiw@public.gmane.org> wrote:
>  >> What about races between radix_tree_extend and radix_tree_lookup?
>  >>
>  >> As far as I can see (even on newer kernels with rcu support) there is
>  >> nothing which protects changes to node->height and it is used to
>  >> decide how deep lookup must go to find the actual data.
>
>  roland> I assume things are ordered so that lookups that follow the
>  roland> old data can still find old entries, and that updates become
>  roland> visible in a consistent order.
>
> I honestly do see how - the height of the root node is updated
> indepedently of the slots, so if someone managed to get the updated
> height there is nothing from stoping radix_tree_lookup from going too
> deep into the chain of slots.
>
> Also most other places which call radix_tree_lookup do use some kind
> of lock around it.


same here putting your response on linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org as it was
wrongly sent to linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ...

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]         ` <CAJZOPZLXktTREd3xmE=v14HpP1O+EgiqA54LHwQvPNXNX9Rr7A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-09-09 15:10           ` Roland Dreier
       [not found]             ` <CAL1RGDXZtqWcFhPXnit=O81wACdXsyDziv6ivkXMOcTf+8_X6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Roland Dreier @ 2012-09-09 15:10 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Max Matveev, linux-rdma

On Sun, Sep 9, 2012 at 7:09 AM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> I honestly do see how - the height of the root node is updated
>> indepedently of the slots, so if someone managed to get the updated
>> height there is nothing from stoping radix_tree_lookup from going too
>> deep into the chain of slots.

Please look at commit 7cf9c2c76c1a ("[PATCH] radix-tree: RCU lockless readside")

    Make radix tree lookups safe to be performed without locks.  Readers are
    protected against nodes being deleted by using RCU based freeing.  Readers
    are protected against new node insertion by using memory barriers to ensure
    the node itself will be properly written before it is visible in the radix
    tree.

etc.  Also the API description in <linux/radix-tree.h> is helpful.

Before we randomly throw in locking, let's please have at least a theory
about what race we're actually protecting against.

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]             ` <CAL1RGDXZtqWcFhPXnit=O81wACdXsyDziv6ivkXMOcTf+8_X6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-09-10  6:57               ` Jack Morgenstein
       [not found]                 ` <201209100957.08509.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jack Morgenstein @ 2012-09-10  6:57 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Or Gerlitz, Max Matveev, linux-rdma

On Sunday 09 September 2012 18:10, Roland Dreier wrote:
> 
> Please look at commit 7cf9c2c76c1a ("[PATCH] radix-tree: RCU lockless readside")
> 
Roland,

What about the following note (from the commit diff mentioned above):
+/**
+ * Radix-tree synchronization
+ *
+ * The radix-tree API requires that users provide all synchronisation (with
+ * specific exceptions, noted below).
+ *
+ * Synchronization of access to the data items being stored in the tree, and
+ * management of their lifetimes must be completely managed by API users.
+ *
+ * For API usage, in general,
+ * - any function _modifying_ the the tree or tags (inserting or deleting
+ *   items, setting or clearing tags must exclude other modifications, and
+ *   exclude any functions reading the tree.
+ * - any function _reading_ the the tree or tags (looking up items or tags,
+ *   gang lookups) must exclude modifications to the tree, but may occur
+ *   concurrently with other readers.
+ *
+ * The notable exceptions to this rule are the following functions:
+ * radix_tree_lookup
+ * radix_tree_tag_get
+ * radix_tree_gang_lookup
+ * radix_tree_gang_lookup_tag
+ * radix_tree_tagged
+ *

======= JPM -- Note the following text! =========================
+ * The first 4 functions are able to be called locklessly, using RCU. The
+ * caller must ensure calls to these functions are made within rcu_read_lock()
+ * regions.
===========================================

+ * Other readers (lock-free or otherwise) and modifications may be
+ * running concurrently.

Roland, the above states explicitly that radix_tree_lookup should be called only under rcu_read_lock()!
This is NOT the case in our driver, in mlx4_eq_int()/mlx4_cq_completion() .

Looks to me, then, that the spinlock in mlx4_cq_completion() is required.

- Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]                 ` <201209100957.08509.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2012-09-10 10:35                   ` Or Gerlitz
       [not found]                     ` <504DC25B.1030501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-09-10 10:35 UTC (permalink / raw)
  To: Jack Morgenstein, Max Matveev; +Cc: Roland Dreier, linux-rdma, Shlomo Pongratz

On 10/09/2012 09:57, Jack Morgenstein wrote:
> + *
> + * For API usage, in general,
> + * - any function _modifying_ the the tree or tags (inserting or deleting
> + *   items, setting or clearing tags must exclude other modifications, and
> + *   exclude any functions reading the tree.
> + * - any function _reading_ the the tree or tags (looking up items or tags,
> + *   gang lookups) must exclude modifications to the tree, but may occur
> + *   concurrently with other readers.
> + *
> + * The notable exceptions to this rule are the following functions:
> + * radix_tree_lookup
> + * radix_tree_tag_get
> + * radix_tree_gang_lookup
> + * radix_tree_gang_lookup_tag
> + * radix_tree_tagged
> + *
>
> ======= JPM -- Note the following text! =========================
> + * The first 4 functions are able to be called locklessly, using RCU. The
> + * caller must ensure calls to these functions are made within rcu_read_lock()
> + * regions.
> ===========================================
>
> + * Other readers (lock-free or otherwise) and modifications may be
> + * running concurrently.
>
> Roland, the above states explicitly that radix_tree_lookup should be called only under rcu_read_lock()!
> This is NOT the case in our driver, in mlx4_eq_int()/mlx4_cq_completion() .
>
> Looks to me, then, that the spinlock in mlx4_cq_completion() is required.
>
>

Jack,  Max

Actually, can't we do well with rcu_read_lock() in mlx4_cq_completion() 
as that commit documentation suggests?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]                     ` <504DC25B.1030501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2012-09-10 13:17                       ` Jack Morgenstein
       [not found]                         ` <201209101617.46581.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jack Morgenstein @ 2012-09-10 13:17 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Max Matveev, Roland Dreier, linux-rdma, Shlomo Pongratz,
	dotanb-VPRAkNaXOzVS1MOuV/RT9w

On Monday 10 September 2012 13:35, Or Gerlitz wrote:
> Jack,  Max
> 
> Actually, can't we do well with rcu_read_lock() in mlx4_cq_completion() 
> as that commit documentation suggests?
> 
I don't know. I do notice (in file include/linux/rcupdate.h) that rcu_read_lock/unlock
is meant to be used in the interrupt context. Would it be sufficient (besides rcu_read_lock/unlock calls) to add
a call rcu_synchronize() in mlx4_cq_free (after calling synchronize_irq)?

Could we also then dispense with the spinlocks in mlx4_cq_event() as well?

Acquiring the SINGLE cq_table->lock spinlock for EVERY completion event
of EVERY cq seems very nasty to me (probably why Roland did not do this), and it would
clearly be desirable not to have to do this. 

-Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]                         ` <201209101617.46581.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2012-09-10 13:27                           ` Or Gerlitz
       [not found]                             ` <504DEAB7.2070101-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-09-10 13:27 UTC (permalink / raw)
  To: Jack Morgenstein, Max Matveev, Roland Dreier
  Cc: linux-rdma, Shlomo Pongratz, dotanb-VPRAkNaXOzVS1MOuV/RT9w

On 10/09/2012 16:17, Jack Morgenstein wrote:
> I don't know. I do notice (in file include/linux/rcupdate.h) that 
> rcu_read_lock/unlock is meant to be used in the interrupt context. 
> Would it be sufficient (besides rcu_read_lock/unlock calls) to add a 
> call rcu_synchronize() in mlx4_cq_free (after calling synchronize_irq)? 

I  took a look on the practice/wrapping used over the mm subsystem for 
radix_tree_lookup calls, whose maintainer,
Andrew Morton is signed on the patch Roland pointed to, its just 
rcu_read_lock/unlock, seems this is what to do as well.


> mm/readahead.c-179- rcu_read_lock();
> mm/readahead.c:180:             page = 
> radix_tree_lookup(&mapping->page_tree, page_offset);
> mm/readahead.c-181-             rcu_read_unlock();
> --
> mm/shmem.c-278- rcu_read_lock();
> mm/shmem.c:279: item = radix_tree_lookup(&mapping->page_tree, index);
> mm/shmem.c-280- rcu_read_unlock();
> --
> mm/vmalloc.c-986-       rcu_read_lock();
> mm/vmalloc.c:987:       vb = radix_tree_lookup(&vmap_block_tree, vb_idx);
> mm/vmalloc.c-988-       rcu_read_unlock();


> Could we also then dispense with the spinlocks in mlx4_cq_event() as 
> well? 

I don't see why mlx4_cq_event should be treated differently.

> Acquiring the SINGLE cq_table->lock spinlock for EVERY completion 
> event of EVERY cq seems very nasty to me (probably why Roland did not 
> do this), and it would clearly be desirable not to have to do this

so we have a way to avoid it, with rcu_read_lock, Max/Roland, agree?

Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]                             ` <504DEAB7.2070101-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2012-09-11  6:03                               ` Jack Morgenstein
       [not found]                                 ` <201209110903.39789.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jack Morgenstein @ 2012-09-11  6:03 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Max Matveev, Roland Dreier, linux-rdma, Shlomo Pongratz,
	dotanb-VPRAkNaXOzVS1MOuV/RT9w

On Monday 10 September 2012 16:27, Or Gerlitz wrote:
> I  took a look on the practice/wrapping used over the mm subsystem for 
> radix_tree_lookup calls, whose maintainer,
> Andrew Morton is signed on the patch Roland pointed to, its just 
> rcu_read_lock/unlock, seems this is what to do as well.
> 
In addition, need to do a synchronize_rcu when deleting, per
the comment in include/linux/rcupdate.h:

* It is still required that the caller manage the synchronization and lifetimes
* of the items. So if RCU lock-free lookups are used, typically this would mean
* that the items have their own locks, or are amenable to lock-free access; and
* that the items are freed by RCU (or only freed after having been deleted from
* the radix tree *and* a synchronize_rcu() grace period).

-Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
       [not found]                                 ` <201209110903.39789.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2012-09-11  7:50                                   ` Or Gerlitz
  0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-09-11  7:50 UTC (permalink / raw)
  To: Jack Morgenstein
  Cc: Or Gerlitz, Max Matveev, Roland Dreier, linux-rdma,
	Shlomo Pongratz, dotanb-VPRAkNaXOzVS1MOuV/RT9w

On Tue, Sep 11, 2012 at 9:03 AM, Jack Morgenstein
<jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> On Monday 10 September 2012 16:27, Or Gerlitz wrote:
>> I  took a look on the practice/wrapping used over the mm subsystem for
>> radix_tree_lookup calls, whose maintainer,
>> Andrew Morton is signed on the patch Roland pointed to, its just
>> rcu_read_lock/unlock, seems this is what to do as well.
>>
> In addition, need to do a synchronize_rcu when deleting

patch?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-09-11  7:50 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20549.47088.805176.786765@iinet.net.au>
     [not found] ` <20549.47088.805176.786765-AFFH1GffN5hPR4JQBCEnsQ@public.gmane.org>
2012-09-09 14:08   ` [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler Or Gerlitz
     [not found] ` <CAG4TOxOQvAMFSTzM1vRS4XqMTiTPQ2Ofgbm7nEHPpPHD_tNwYw@mail.gmail.com>
     [not found]   ` <20551.21182.486376.898809@iinet.net.au>
     [not found]     ` <20551.21182.486376.898809-AFFH1GffN5hPR4JQBCEnsQ@public.gmane.org>
2012-09-09 14:09       ` Or Gerlitz
     [not found]         ` <CAJZOPZLXktTREd3xmE=v14HpP1O+EgiqA54LHwQvPNXNX9Rr7A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-09-09 15:10           ` Roland Dreier
     [not found]             ` <CAL1RGDXZtqWcFhPXnit=O81wACdXsyDziv6ivkXMOcTf+8_X6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-09-10  6:57               ` Jack Morgenstein
     [not found]                 ` <201209100957.08509.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-09-10 10:35                   ` Or Gerlitz
     [not found]                     ` <504DC25B.1030501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-09-10 13:17                       ` Jack Morgenstein
     [not found]                         ` <201209101617.46581.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-09-10 13:27                           ` Or Gerlitz
     [not found]                             ` <504DEAB7.2070101-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-09-11  6:03                               ` Jack Morgenstein
     [not found]                                 ` <201209110903.39789.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-09-11  7:50                                   ` Or Gerlitz
2012-08-30 10:38 Or Gerlitz
     [not found] ` <1346323085-2665-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-08-30 22:17   ` Or Gerlitz
     [not found]     ` <CAJZOPZJOWuoBQx49-O2WLk240GWGRsqfJpCXfhxRV6Cmdw7W0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-08-30 22:35       ` Roland Dreier
     [not found]         ` <CAL1RGDX7kx7aypOqYvvwu8KBjMra2Gqq+4OGLz22k0RVmeZ1eQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-08-31  6:53           ` Or Gerlitz
2012-08-31  7:12           ` Or Gerlitz
2012-08-31  7:02       ` Or Gerlitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).