public inbox for linux-rdma@vger.kernel.org
* race in ULPs when processing RDMA_CM_EVENT_DEVICE_REMOVAL
From: Or Gerlitz @ 2013-05-05 14:05 UTC
  To: Hefty, Sean
  Cc: Roi Dayan,
	linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)

Hi Sean,

When the low-level driver exercises its hot-unplug path (e.g. if the
user does modprobe -r, PCI hot unplug, etc.), it calls the rdma-cm
remove_one callback, which generates an RDMA_CM_EVENT_DEVICE_REMOVAL
event for the cma consumers. Now, if a consumer doesn't make sure it
destroys all the IB objects created on that low-level device instance
(e.g. mlx4_0) before it finishes processing the DEVICE_REMOVAL
callback, the rdma-cm will give the low-level driver the green light
to finalize its deregistration with the IB core (destruction of the
IB device instance, etc.). A later call from the consumer to (say)
ib_destroy_cq(dev, cq) will then crash, since that dev object is
practically null, or the call points to a function/module which no
longer exists in the kernel address space - agree?

What would be the correct way to go for consumers? Is it to make sure
they first destroy all their IB objects (PDs, MRs, CQs, QPs, etc.)
before destroying the last rdma_cm id on a device removal event?
Any other ideas?

In iSER we don't make sure to destroy all the IB objects before acking
this event to the rdma-cm, and we see crashes under the RoCE link
layer but not under the IB link layer. Most likely some timing differs
between the link layers, but the architectural question is still valid.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* RE: race in ULPs when processing RDMA_CM_EVENT_DEVICE_REMOVAL
From: Hefty, Sean @ 2013-05-06 15:46 UTC
  To: Or Gerlitz
  Cc: Roi Dayan,
	linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)

> What would be the correct way to go for consumers? Is it to make sure
> they first destroy all their IB objects (PDs, MRs, CQs, QPs, etc.)
> before destroying the last rdma_cm id on a device removal event?
> Any other ideas?

The user should free any objects that are associated with the device associated with the rdma_cm id.  If the user only accesses the device using the rdma_cm id, then all objects should be freed.

* Re: race in ULPs when processing RDMA_CM_EVENT_DEVICE_REMOVAL
From: Or Gerlitz @ 2013-05-06 20:25 UTC
  To: Hefty, Sean
  Cc: Or Gerlitz, Roi Dayan,
	linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)

On Mon, May 6, 2013 at 6:46 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> The user should free any objects that are associated with the device associated with
> the rdma_cm id.  If the user only accesses the device using the rdma_cm id, then all
> objects should be freed.

So think about the case, e.g. for iser, where a ULP maintains a
per-device context in which it stores the IB objects used for all
connections on that device (e.g. PD, CQs, DMA-MR, etc.) and a
per-connection object in which the QP and RDMA-CM ID are stored.

The per-device context is created on demand while establishing the
first connection over a device (e.g. after address resolution, where
the IB device to be used for that connection is "resolved") and is
destroyed after the last connection is disconnected.

Could we come up with an elegant (or even a not-so-elegant..) way to
safely tear down the IB elements of that global context in the
context of the device removal event?

Or.
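
The per-device context described above (created on the first connection
over a device, destroyed after the last one) can be modelled in plain
user-space C roughly as follows. This is only a sketch and is not taken
from iser: struct fake_ib_device, struct ulp_device, ulp_device_get()
and ulp_device_put() are hypothetical names, the pd_allocated flag merely
stands in for the PD, CQs and DMA-MR, and there is no locking, which
real kernel code would need around the list and the count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ib_device; only its identity matters. */
struct fake_ib_device { const char *name; };

/* Hypothetical per-device context shared by all connections on a device. */
struct ulp_device {
    struct fake_ib_device *ibdev;
    int pd_allocated;          /* stands in for the PD, CQs, DMA-MR */
    int refcount;              /* number of connections using this context */
    struct ulp_device *next;
};

static struct ulp_device *ulp_devices;  /* list of live per-device contexts */

/* Find the context for ibdev, creating it on first use; takes a reference. */
struct ulp_device *ulp_device_get(struct fake_ib_device *ibdev)
{
    struct ulp_device *d;

    for (d = ulp_devices; d; d = d->next) {
        if (d->ibdev == ibdev) {
            d->refcount++;
            return d;
        }
    }

    d = calloc(1, sizeof(*d));
    if (!d)
        return NULL;
    d->ibdev = ibdev;
    d->pd_allocated = 1;       /* the shared IB objects would be created here */
    d->refcount = 1;
    d->next = ulp_devices;
    ulp_devices = d;
    return d;
}

/* Drop one reference; returns 1 if this call destroyed the context. */
int ulp_device_put(struct ulp_device *dev)
{
    struct ulp_device **pp;

    assert(dev->refcount > 0);
    if (--dev->refcount > 0)
        return 0;

    /* Last user: unlink, release the shared objects, free the context. */
    for (pp = &ulp_devices; *pp; pp = &(*pp)->next) {
        if (*pp == dev) {
            *pp = dev->next;
            break;
        }
    }
    dev->pd_allocated = 0;     /* the shared IB objects would be destroyed here */
    free(dev);
    return 1;
}
```

With this shape, every connection calls ulp_device_get() once its IB
device is known and ulp_device_put() when it goes away, so the shared
objects live exactly as long as some connection still uses the device.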

* RE: race in ULPs when processing RDMA_CM_EVENT_DEVICE_REMOVAL
From: Hefty, Sean @ 2013-05-06 20:33 UTC
  To: Or Gerlitz
  Cc: Or Gerlitz, Roi Dayan,
	linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)

> So think about the case, e.g. for iser, where a ULP maintains a
> per-device context in which it stores the IB objects used for all
> connections on that device (e.g. PD, CQs, DMA-MR, etc.) and a
> per-connection object in which the QP and RDMA-CM ID are stored.
>
> The per-device context is created on demand while establishing the
> first connection over a device (e.g. after address resolution, where
> the IB device to be used for that connection is "resolved") and is
> destroyed after the last connection is disconnected.
>
> Could we come up with an elegant (or even a not-so-elegant..) way to
> safely tear down the IB elements of that global context in the
> context of the device removal event?

I would think that this would do it:

- call rdma_destroy_qp() from the callback
- decrement the global reference count
- destroy the global context from the callback when count = 0
- return non-zero from the callback to destroy the rdma cm id

- Sean
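
The four steps listed above can be sketched as a user-space model of
the event callback. This is an illustration under stated assumptions,
not code from the thread or from any ULP: struct dev_ctx, struct conn,
conn_event_handler() and the stub_* helpers are hypothetical, and the
stubs only record the order of operations where kernel code would call
rdma_destroy_qp() and release the PD, CQs and so on.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the rdma-cm event types; only removal matters here. */
enum { EV_OTHER, EV_DEVICE_REMOVAL };

/* Order-of-operations trace; large enough for this small example only. */
static char trace[128];

static void log_step(const char *s)
{
    strcat(trace, s);
    strcat(trace, ";");
}

/* Hypothetical global (per-device) context and per-connection object. */
struct dev_ctx { int refcount; int alive; };
struct conn    { struct dev_ctx *dev; int has_qp; };

/* Stub for where kernel code would call rdma_destroy_qp(). */
static void stub_destroy_qp(struct conn *c)
{
    c->has_qp = 0;
    log_step("destroy_qp");
}

/* Stub for destroying the shared objects (PD, CQs, DMA-MR, ...). */
static void stub_destroy_dev_ctx(struct dev_ctx *d)
{
    d->alive = 0;
    log_step("destroy_dev_ctx");
}

/*
 * Model of the per-connection event callback. A non-zero return asks
 * the caller (the rdma-cm in the real case) to destroy the cm id.
 */
int conn_event_handler(struct conn *c, int event)
{
    if (event != EV_DEVICE_REMOVAL)
        return 0;

    /* 1. Destroy the QP from the callback. */
    if (c->has_qp)
        stub_destroy_qp(c);

    /* 2. Drop this connection's reference on the global context. */
    assert(c->dev->refcount > 0);

    /* 3. The last connection out destroys the global context. */
    if (--c->dev->refcount == 0)
        stub_destroy_dev_ctx(c->dev);
    c->dev = NULL;

    /* 4. Non-zero return: have the cm id destroyed for us. */
    return 1;
}
```

The point of the ordering is that everything created on the device is
gone before the callback for the last connection returns, so nothing
touches the device after its removal is allowed to complete.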

* Re: race in ULPs when processing RDMA_CM_EVENT_DEVICE_REMOVAL
From: Or Gerlitz @ 2013-05-06 21:08 UTC
  To: Hefty, Sean
  Cc: Or Gerlitz, Roi Dayan,
	linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)

On Mon, May 6, 2013 at 11:33 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> I would think that this would do it:
>
> - call rdma_destroy_qp() from the callback
> - decrement the global reference count
> - destroy the global context from the callback when count = 0
> - return non-zero from the callback to destroy the rdma cm id

Yep, generally speaking this makes sense. We need to see how to
really make it work in the presence of some other considerations
specific to iser. At least now we clearly understand the problem with
the current code. I just wonder why we don't hit the crash over IB...
oh well.

Or.