* [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
@ 2012-08-30 10:38 Or Gerlitz
[not found] ` <1346323085-2665-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-08-30 10:38 UTC (permalink / raw)
To: roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas, Or Gerlitz
From: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
The mlx4 CQ completion handler, mlx4_cq_completion doesn't bother to lock
the radix tree which is used to manage the table of CQs, nor to increase
the reference count of the CQ before invoking the user provided callback.
This is racy and can cause use after free, null pointer dereference, etc various
crashes. Fix that by wrapping the radix tree lookup with taking the table lock,
and increase the ref count for the time of the callback.
Signed-off-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
Hi Roland,
Doing some code review on the related parts, I didn't see any way
the handler can be protected for the tree lockup and the user callback
invokation other then using these methods, e.g as done by the "event"
callback. Looking on mthca, I see the same behaviour -- mthca_cq_completion
takes no locks and doesn't increase the refcount while mthca_cq_event does
go by the book and do what ever needed, anything we miss here? its there
from day one...
Or.
drivers/net/ethernet/mellanox/mlx4/cq.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 7e64033..05d9529 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -56,9 +56,14 @@
void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
{
struct mlx4_cq *cq;
+ struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
+ spin_lock(&cq_table->lock);
cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree,
cqn & (dev->caps.num_cqs - 1));
+ if (cq)
+ atomic_inc(&cq->refcount);
+ spin_unlock(&cq_table->lock);
if (!cq) {
mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn);
return;
@@ -67,6 +72,8 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
++cq->arm_sn;
cq->comp(cq);
+ if (atomic_dec_and_test(&cq->refcount))
+ complete(&cq->free);
}
void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type)
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <1346323085-2665-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2012-08-30 22:17 ` Or Gerlitz
[not found] ` <CAJZOPZJOWuoBQx49-O2WLk240GWGRsqfJpCXfhxRV6Cmdw7W0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-08-30 22:17 UTC (permalink / raw)
To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas
Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Can you be explicit about the race you're worried about?
few
1. on the time CQ A is deleted an interrupt that relates to CQ B
takes place and a radix
tree lookup is running while an element is being deleted from the
tree, looking on the radix tree API, I don't see that this is allowed.
2. while a CQ is being freed an interrupt takes place and the driver
attempts to run the comp handler which can turn to use after free,
null pointer deref, etc. This can happen even if the ULP made sure to
consume all the WCs related to flushed/etc, e.g an "empty"
interrupt
BTW, your email missed the list somehow
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <CAJZOPZJOWuoBQx49-O2WLk240GWGRsqfJpCXfhxRV6Cmdw7W0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-08-30 22:35 ` Roland Dreier
[not found] ` <CAL1RGDX7kx7aypOqYvvwu8KBjMra2Gqq+4OGLz22k0RVmeZ1eQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-08-31 7:02 ` Or Gerlitz
1 sibling, 1 reply; 15+ messages in thread
From: Roland Dreier @ 2012-08-30 22:35 UTC (permalink / raw)
To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas
On Thu, Aug 30, 2012 at 3:17 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
>> Can you be explicit about the race you're worried about?
>
> few
>
> 1. on the time CQ A is deleted an interrupt that relates to CQ B
> takes place and a radix
> tree lookup is running while an element is being deleted from the
> tree, looking on the radix tree API, I don't see that this is allowed.
I don't think this is a real problem; the radix tree code is
explicitly designed for RCU use, and the data structure is pretty
clearly safe for looking up one slot while another slot is being
cleared. In fact it's hard to see how this could screw up.
> 2. while a CQ is being freed an interrupt takes place and the driver
> attempts to run the comp handler which can turn to use after free,
> null pointer deref, etc. This can happen even if the ULP made sure to
> consume all the WCs related to flushed/etc, e.g an "empty"
> interrupt
So in mlx4_cq_free() we do
mlx4_HW2SW_CQ(dev, NULL, cq->cqn);
//...
synchronize_irq(priv->eq_table.eq[cq->vector].irq);
before we touch the cq table. I don't think we should get a CQ
completion event for the CQ we're freeing after we've done HW2SW_CQ on
it and then waited for any outstanding completion interrupts to
finish.
Also we know that there are no QPs attached to this CQ so there
shouldn't be any completion events anyway...
- R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <CAL1RGDX7kx7aypOqYvvwu8KBjMra2Gqq+4OGLz22k0RVmeZ1eQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-08-31 6:53 ` Or Gerlitz
2012-08-31 7:12 ` Or Gerlitz
1 sibling, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-08-31 6:53 UTC (permalink / raw)
To: Roland Dreier, Yishai Hadas
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dotan Barak
On Fri, Aug 31, 2012 at 1:35 AM, Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, Aug 30, 2012 at 3:17 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> 1. on the time CQ A is deleted an interrupt that relates to CQ B
>> takes place and a radix
>> tree lookup is running while an element is being deleted from the
>> tree, looking on the radix tree API, I don't see that this is allowed.
>
> I don't think this is a real problem; the radix tree code is
> explicitly designed for RCU use, and the data structure is pretty
> clearly safe for looking up one slot while another slot is being
> cleared. In fact it's hard to see how this could screw up.
OK, I'll give it a 2nd look, thanks for elaborating on the design, one
thing which probably confused us was the driver cq event callback
which does take the lock before searching the radix tree, and increase
the refcount before invoking the user event handler, any reason for
the driver event callback to use different practice vs. the comp one?
>> 2. while a CQ is being freed an interrupt takes place and the driver
>> attempts to run the comp handler which can turn to use after free,
>> null pointer deref, etc. This can happen even if the ULP made sure to
>> consume all the WCs related to flushed/etc, e.g an "empty" interrupt
> So in mlx4_cq_free() we do
> mlx4_HW2SW_CQ(dev, NULL, cq->cqn);
> //...
> synchronize_irq(priv->eq_table.eq[cq->vector].irq);
>
> before we touch the cq table. I don't think we should get a CQ
> completion event for the CQ we're freeing after we've done HW2SW_CQ on
> it and then waited for any outstanding completion interrupts to finish.
yes, makes sense, this wouldn't handle use cases where the user does
context switch, e.g from hard_irq to softirq/tasklet and let their
handler touch the CQ, but the refcount in the driver wouldn't help
either in that case.
> Also we know that there are no QPs attached to this CQ so there
> shouldn't be any completion events anyway...
All to all, your reasoning makes much sense, we probably need to look
deeper into the crash report that triggered this RFC, see next email.
Yishai - were you thinking on other possible races that the patch
could address? also, can you double check the point Roland made on
HW2SW_CQ.
Or.
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <CAJZOPZJOWuoBQx49-O2WLk240GWGRsqfJpCXfhxRV6Cmdw7W0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-08-30 22:35 ` Roland Dreier
@ 2012-08-31 7:02 ` Or Gerlitz
1 sibling, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-08-31 7:02 UTC (permalink / raw)
To: Roland Dreier
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas, Dotan Barak
> Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> Can you be explicit about the race you're worried about?
Roland,
This patch was made after we got the below report from the field.
Dotan, can we get Roland access to the full vmcore image?
Here's the report I got:
"[...] box panic'ed when trying to find a completion queue. There is
no corruption but there is a possible race which could result in
mlx4_cq_completion getting wrong height of the radix tree and
following a bit too deep into the chains. In the other code which uses
this radix tree the access is protected by the lock but
mlx4_cq_completion is running win the interrupt context and cannot
take locks, so instead it run without any protection whatsoever."
The below is the stack trace, the kernel is an RHEL5
2.6.18-274.something, so in that respect, maybe your comment on the
radix tree RCU doesn't take effect there? you can see the problem was
somewhere into the radix tree code.
crash> bt
PID: 8178 TASK: ffff810b91a52400 CPU: 11 COMMAND: "universal_consu"
#0 [ffff81182fe3bc70] crash_kexec at ffffffff800ba60c
#1 [ffff81182fe3bd30] __die at ffffffff80069047
#2 [ffff81182fe3bd70] die at ffffffff8007075b
#3 [ffff81182fe3bda0] do_general_protection at ffffffff80069375
#4 [ffff81182fe3bde0] error_exit at ffffffff80060e9d
[exception RIP: radix_tree_lookup+38]
RIP: ffffffff8016405f RSP: ffff81182fe3be90 RFLAGS: 00210002
RAX: 6b6b6b6b6b6b6b8b RBX: ffff810c2fb90000 RCX: 0000000000000006
RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 6b6b6b6b6b6b6b6b
RBP: 00000000000000c0 R8: 0000000000010000 R9: 000000000920ea94
R10: 00000000000000b3 R11: ffffffff800c7ea5 R12: 00000000000000b3
R13: ffff810c2b15a280 R14: ffff810b7a98ff58 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#5 [ffff81182fe3be90] mlx4_cq_completion at ffffffff886f2367
#6 [ffff81182fe3beb0] mlx4_eq_int at ffffffff886f2bd3
#7 [ffff81182fe3bf10] mlx4_msi_x_interrupt at ffffffff886f3078
#8 [ffff81182fe3bf20] handle_IRQ_event at ffffffff8001167c
#9 [ffff81182fe3bf50] __do_IRQ at ffffffff800c7f40
#10 [ffff81182fe3bf90] do_IRQ at ffffffff80071659
--- <IRQ stack> ---
#11 [ffff810b7a98ff58] ret_from_intr at ffffffff80060652
RIP: 00000000f5cd8859 RSP: 00000000f3eafa28 RFLAGS: 00200202
RAX: 0000000000000000 RBX: 00000000f3eafaf8 RCX: 000000000b6d4f00
RDX: 000000000000001c RSI: 000000000000001c RDI: 0000000000000000
RBP: ffff810bf1a0a120 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000000 R14: ffff810b9e7d5bf0 R15: 0000000000000000
ORIG_RAX: ffffffffffffff4c CS: 0023 SS: 002b
crash>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <CAL1RGDX7kx7aypOqYvvwu8KBjMra2Gqq+4OGLz22k0RVmeZ1eQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-08-31 6:53 ` Or Gerlitz
@ 2012-08-31 7:12 ` Or Gerlitz
1 sibling, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-08-31 7:12 UTC (permalink / raw)
To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Fri, Aug 31, 2012 at 1:35 AM, Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> I don't think this is a real problem; the radix tree code is [...]
So maybe this patch wouldn't land in 3.7, we'll see, however, so far
no other patch sits in the for-next branch of your tree for the next
window... any plan to start reviewing / picking patches anytime soon?
Also, there are three more fixes for 3.6 I sent you earlier this week,
which need to get in as they address real bugs (deadlock in IPoIB,
crashes in mlx4 when attempting to register > 512GB, more).
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <20549.47088.805176.786765-AFFH1GffN5hPR4JQBCEnsQ@public.gmane.org>
@ 2012-09-09 14:08 ` Or Gerlitz
0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-09-09 14:08 UTC (permalink / raw)
To: Max Matveev; +Cc: linux-rdma
On Tue, Sep 4, 2012 at 11:12 AM, Max Matveev <makc-puGfsi27rH1aa/9Udqfwiw@public.gmane.org> wrote:
>
> On Thu, Aug 30, 2012 at 22:35 PM Roland Dreier wrote:
>
>> On Thu, Aug 30, 2012 at 3:17 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>
>>>> Can you be explicit about the race you're worried about?
>>>
>>> few
>>>
>>> 1. on the time CQ A is deleted an interrupt that relates to CQ B
>>> takes place and a radix
>>> tree lookup is running while an element is being deleted from the
>>> tree, looking on the radix tree API, I don't see that this is
>>> allowed.
>>
>> I don't think this is a real problem; the radix tree code is
>> explicitly designed for RCU use, and the data structure is pretty
>> clearly safe for looking up one slot while another slot is being
>> cleared. In fact it's hard to see how this could screw up.
>
> What about races between radix_tree_extend and radix_tree_lookup?
>
> As far as I can see (even on newer kernels with rcu support) there is
> nothing which protects changes to node->height and it is used to
> decide how deep lookup must go to find the actual data.
Max,
I'm putting your response on linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org as it was
wrongly sent to linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ...
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <20551.21182.486376.898809-AFFH1GffN5hPR4JQBCEnsQ@public.gmane.org>
@ 2012-09-09 14:09 ` Or Gerlitz
[not found] ` <CAJZOPZLXktTREd3xmE=v14HpP1O+EgiqA54LHwQvPNXNX9Rr7A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-09-09 14:09 UTC (permalink / raw)
To: Max Matveev; +Cc: linux-rdma
On Wed, Sep 5, 2012 at 4:25 PM, Max Matveev <makc-puGfsi27rH1aa/9Udqfwiw@public.gmane.org> wrote:
> On Tue, 4 Sep 2012 11:02:32 -0700, Roland Dreier wrote:
>
> roland> On Tue, Sep 4, 2012 at 1:12 AM, Max Matveev <makc-puGfsi27rH1aa/9Udqfwiw@public.gmane.org> wrote:
> >> What about races between radix_tree_extend and radix_tree_lookup?
> >>
> >> As far as I can see (even on newer kernels with rcu support) there is
> >> nothing which protects changes to node->height and it is used to
> >> decide how deep lookup must go to find the actual data.
>
> roland> I assume things are ordered so that lookups that follow the
> roland> old data can still find old entries, and that updates become
> roland> visible in a consistent order.
>
> I honestly do see how - the height of the root node is updated
> indepedently of the slots, so if someone managed to get the updated
> height there is nothing from stoping radix_tree_lookup from going too
> deep into the chain of slots.
>
> Also most other places which call radix_tree_lookup do use some kind
> of lock around it.
same here putting your response on linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org as it was
wrongly sent to linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ...
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <CAJZOPZLXktTREd3xmE=v14HpP1O+EgiqA54LHwQvPNXNX9Rr7A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-09-09 15:10 ` Roland Dreier
[not found] ` <CAL1RGDXZtqWcFhPXnit=O81wACdXsyDziv6ivkXMOcTf+8_X6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Roland Dreier @ 2012-09-09 15:10 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Max Matveev, linux-rdma
On Sun, Sep 9, 2012 at 7:09 AM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> I honestly do see how - the height of the root node is updated
>> indepedently of the slots, so if someone managed to get the updated
>> height there is nothing from stoping radix_tree_lookup from going too
>> deep into the chain of slots.
Please look at commit 7cf9c2c76c1a ("[PATCH] radix-tree: RCU lockless readside")
Make radix tree lookups safe to be performed without locks. Readers are
protected against nodes being deleted by using RCU based freeing. Readers
are protected against new node insertion by using memory barriers to ensure
the node itself will be properly written before it is visible in the radix
tree.
etc. Also the API description in <linux/radix-tree.h> is helpful.
Before we randomly throw in locking, let's please have at least a theory
about what race we're actually protecting against.
- R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <CAL1RGDXZtqWcFhPXnit=O81wACdXsyDziv6ivkXMOcTf+8_X6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-09-10 6:57 ` Jack Morgenstein
[not found] ` <201209100957.08509.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Jack Morgenstein @ 2012-09-10 6:57 UTC (permalink / raw)
To: Roland Dreier; +Cc: Or Gerlitz, Max Matveev, linux-rdma
On Sunday 09 September 2012 18:10, Roland Dreier wrote:
>
> Please look at commit 7cf9c2c76c1a ("[PATCH] radix-tree: RCU lockless readside")
>
Roland,
What about the following note (from the commit diff mentioned above):
+/**
+ * Radix-tree synchronization
+ *
+ * The radix-tree API requires that users provide all synchronisation (with
+ * specific exceptions, noted below).
+ *
+ * Synchronization of access to the data items being stored in the tree, and
+ * management of their lifetimes must be completely managed by API users.
+ *
+ * For API usage, in general,
+ * - any function _modifying_ the the tree or tags (inserting or deleting
+ * items, setting or clearing tags must exclude other modifications, and
+ * exclude any functions reading the tree.
+ * - any function _reading_ the the tree or tags (looking up items or tags,
+ * gang lookups) must exclude modifications to the tree, but may occur
+ * concurrently with other readers.
+ *
+ * The notable exceptions to this rule are the following functions:
+ * radix_tree_lookup
+ * radix_tree_tag_get
+ * radix_tree_gang_lookup
+ * radix_tree_gang_lookup_tag
+ * radix_tree_tagged
+ *
======= JPM -- Note the following text! =========================
+ * The first 4 functions are able to be called locklessly, using RCU. The
+ * caller must ensure calls to these functions are made within rcu_read_lock()
+ * regions.
===========================================
+ * Other readers (lock-free or otherwise) and modifications may be
+ * running concurrently.
Roland, the above states explicitly that radix_tree_lookup should be called only under rcu_read_lock()!
This is NOT the case in our driver, in mlx4_eq_int()/mlx4_cq_completion() .
Looks to me, then, that the spinlock in mlx4_cq_completion() is required.
- Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <201209100957.08509.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2012-09-10 10:35 ` Or Gerlitz
[not found] ` <504DC25B.1030501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-09-10 10:35 UTC (permalink / raw)
To: Jack Morgenstein, Max Matveev; +Cc: Roland Dreier, linux-rdma, Shlomo Pongratz
On 10/09/2012 09:57, Jack Morgenstein wrote:
> + *
> + * For API usage, in general,
> + * - any function _modifying_ the the tree or tags (inserting or deleting
> + * items, setting or clearing tags must exclude other modifications, and
> + * exclude any functions reading the tree.
> + * - any function _reading_ the the tree or tags (looking up items or tags,
> + * gang lookups) must exclude modifications to the tree, but may occur
> + * concurrently with other readers.
> + *
> + * The notable exceptions to this rule are the following functions:
> + * radix_tree_lookup
> + * radix_tree_tag_get
> + * radix_tree_gang_lookup
> + * radix_tree_gang_lookup_tag
> + * radix_tree_tagged
> + *
>
> ======= JPM -- Note the following text! =========================
> + * The first 4 functions are able to be called locklessly, using RCU. The
> + * caller must ensure calls to these functions are made within rcu_read_lock()
> + * regions.
> ===========================================
>
> + * Other readers (lock-free or otherwise) and modifications may be
> + * running concurrently.
>
> Roland, the above states explicitly that radix_tree_lookup should be called only under rcu_read_lock()!
> This is NOT the case in our driver, in mlx4_eq_int()/mlx4_cq_completion() .
>
> Looks to me, then, that the spinlock in mlx4_cq_completion() is required.
>
>
Jack, Max
Actually, can't we do well with rcu_read_lock() in mlx4_cq_completion()
as that commit documentation suggests?
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <504DC25B.1030501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2012-09-10 13:17 ` Jack Morgenstein
[not found] ` <201209101617.46581.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Jack Morgenstein @ 2012-09-10 13:17 UTC (permalink / raw)
To: Or Gerlitz
Cc: Max Matveev, Roland Dreier, linux-rdma, Shlomo Pongratz,
dotanb-VPRAkNaXOzVS1MOuV/RT9w
On Monday 10 September 2012 13:35, Or Gerlitz wrote:
> Jack, Max
>
> Actually, can't we do well with rcu_read_lock() in mlx4_cq_completion()
> as that commit documentation suggests?
>
I don't know. I do notice (in file include/linux/rcupdate.h) that rcu_read_lock/unlock
is meant to be used in the interrupt context. Would it be sufficient (besides rcu_read_lock/unlock calls) to add
a call rcu_synchronize() in mlx4_cq_free (after calling synchronize_irq)?
Could we also then dispense with the spinlocks in mlx4_cq_event() as well?
Acquiring the SINGLE cq_table->lock spinlock for EVERY completion event
of EVERY cq seems very nasty to me (probably why Roland did not do this), and it would
clearly be desirable not to have to do this.
-Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <201209101617.46581.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2012-09-10 13:27 ` Or Gerlitz
[not found] ` <504DEAB7.2070101-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2012-09-10 13:27 UTC (permalink / raw)
To: Jack Morgenstein, Max Matveev, Roland Dreier
Cc: linux-rdma, Shlomo Pongratz, dotanb-VPRAkNaXOzVS1MOuV/RT9w
On 10/09/2012 16:17, Jack Morgenstein wrote:
> I don't know. I do notice (in file include/linux/rcupdate.h) that
> rcu_read_lock/unlock is meant to be used in the interrupt context.
> Would it be sufficient (besides rcu_read_lock/unlock calls) to add a
> call rcu_synchronize() in mlx4_cq_free (after calling synchronize_irq)?
I took a look on the practice/wrapping used over the mm subsystem for
radix_tree_lookup calls, whose maintainer,
Andrew Morton is signed on the patch Roland pointed to, its just
rcu_read_lock/unlock, seems this is what to do as well.
> mm/readahead.c-179- rcu_read_lock();
> mm/readahead.c:180: page =
> radix_tree_lookup(&mapping->page_tree, page_offset);
> mm/readahead.c-181- rcu_read_unlock();
> --
> mm/shmem.c-278- rcu_read_lock();
> mm/shmem.c:279: item = radix_tree_lookup(&mapping->page_tree, index);
> mm/shmem.c-280- rcu_read_unlock();
> --
> mm/vmalloc.c-986- rcu_read_lock();
> mm/vmalloc.c:987: vb = radix_tree_lookup(&vmap_block_tree, vb_idx);
> mm/vmalloc.c-988- rcu_read_unlock();
> Could we also then dispense with the spinlocks in mlx4_cq_event() as
> well?
I don't see why mlx4_cq_event should be treated differently.
> Acquiring the SINGLE cq_table->lock spinlock for EVERY completion
> event of EVERY cq seems very nasty to me (probably why Roland did not
> do this), and it would clearly be desirable not to have to do this
so we have a way to avoid it, with rcu_read_lock, Max/Roland, agree?
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <504DEAB7.2070101-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2012-09-11 6:03 ` Jack Morgenstein
[not found] ` <201209110903.39789.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Jack Morgenstein @ 2012-09-11 6:03 UTC (permalink / raw)
To: Or Gerlitz
Cc: Max Matveev, Roland Dreier, linux-rdma, Shlomo Pongratz,
dotanb-VPRAkNaXOzVS1MOuV/RT9w
On Monday 10 September 2012 16:27, Or Gerlitz wrote:
> I took a look on the practice/wrapping used over the mm subsystem for
> radix_tree_lookup calls, whose maintainer,
> Andrew Morton is signed on the patch Roland pointed to, its just
> rcu_read_lock/unlock, seems this is what to do as well.
>
In addition, need to do a synchronize_rcu when deleting, per
the comment in include/linux/rcupdate.h:
* It is still required that the caller manage the synchronization and lifetimes
* of the items. So if RCU lock-free lookups are used, typically this would mean
* that the items have their own locks, or are amenable to lock-free access; and
* that the items are freed by RCU (or only freed after having been deleted from
* the radix tree *and* a synchronize_rcu() grace period).
-Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
[not found] ` <201209110903.39789.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2012-09-11 7:50 ` Or Gerlitz
0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2012-09-11 7:50 UTC (permalink / raw)
To: Jack Morgenstein
Cc: Or Gerlitz, Max Matveev, Roland Dreier, linux-rdma,
Shlomo Pongratz, dotanb-VPRAkNaXOzVS1MOuV/RT9w
On Tue, Sep 11, 2012 at 9:03 AM, Jack Morgenstein
<jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> On Monday 10 September 2012 16:27, Or Gerlitz wrote:
>> I took a look on the practice/wrapping used over the mm subsystem for
>> radix_tree_lookup calls, whose maintainer,
>> Andrew Morton is signed on the patch Roland pointed to, its just
>> rcu_read_lock/unlock, seems this is what to do as well.
>>
> In addition, need to do a synchronize_rcu when deleting
patch?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2012-09-11 7:50 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20549.47088.805176.786765@iinet.net.au>
[not found] ` <20549.47088.805176.786765-AFFH1GffN5hPR4JQBCEnsQ@public.gmane.org>
2012-09-09 14:08 ` [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler Or Gerlitz
[not found] ` <CAG4TOxOQvAMFSTzM1vRS4XqMTiTPQ2Ofgbm7nEHPpPHD_tNwYw@mail.gmail.com>
[not found] ` <20551.21182.486376.898809@iinet.net.au>
[not found] ` <20551.21182.486376.898809-AFFH1GffN5hPR4JQBCEnsQ@public.gmane.org>
2012-09-09 14:09 ` Or Gerlitz
[not found] ` <CAJZOPZLXktTREd3xmE=v14HpP1O+EgiqA54LHwQvPNXNX9Rr7A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-09-09 15:10 ` Roland Dreier
[not found] ` <CAL1RGDXZtqWcFhPXnit=O81wACdXsyDziv6ivkXMOcTf+8_X6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-09-10 6:57 ` Jack Morgenstein
[not found] ` <201209100957.08509.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-09-10 10:35 ` Or Gerlitz
[not found] ` <504DC25B.1030501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-09-10 13:17 ` Jack Morgenstein
[not found] ` <201209101617.46581.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-09-10 13:27 ` Or Gerlitz
[not found] ` <504DEAB7.2070101-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-09-11 6:03 ` Jack Morgenstein
[not found] ` <201209110903.39789.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-09-11 7:50 ` Or Gerlitz
2012-08-30 10:38 Or Gerlitz
[not found] ` <1346323085-2665-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-08-30 22:17 ` Or Gerlitz
[not found] ` <CAJZOPZJOWuoBQx49-O2WLk240GWGRsqfJpCXfhxRV6Cmdw7W0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-08-30 22:35 ` Roland Dreier
[not found] ` <CAL1RGDX7kx7aypOqYvvwu8KBjMra2Gqq+4OGLz22k0RVmeZ1eQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-08-31 6:53 ` Or Gerlitz
2012-08-31 7:12 ` Or Gerlitz
2012-08-31 7:02 ` Or Gerlitz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).